# How can I extract / match information from huge txt files please?



## eVILRigby (Nov 25, 2003)

Hello,

Firstly, thanks for taking the time to have a look at this. I realise I'm asking something that's quite involved, but if anyone can even point me in the right direction it would be greatly appreciated. It's probably easiest if I explain what I'm trying to do in its entirety, then if anyone can help with any of it, that would be brilliant, thank you.

It's a two or three stage problem really, with each part concerning how to get or match data from files way too large to open in Excel or Access etc., with a view to adding considerable numbers of extra product lines to my Amazon pro seller account, then keeping the stock and price figures updated. I get the impression this sort of thing, or at least parts of it, should be possible by parsing txt files, but I have no idea how to do that in practice unfortunately.

Stage 1
Basically I first want to extract certain information from a 2 GB .txt file. The text file contains approximately 6.2 million product records (books mostly) and is provided to me weekly. I want to extract the same single piece of information (the 'I3' field) for every product which has one of a certain set of values for a different piece of information (the 'GC' field).

The I3 (that's a capital I not a 1 btw) field gives the 13 digit unique product identifier. The GC is basically a category code.

So all I want to end up with is a column of 13 digit codes. Every other bit of information is irrelevant. 

Below is an example of three products from the text file. The ** is the separator between products.

**START
IB 0702026174
BI Hardback
AU EVANS, WILLIAM CHARLES
BC GBC
CO UNITED KINGDOM
EI 15 REV ED
IU 275 ILLS.
PD 20011213
NP 600
RP 69.99
RI 69.99
RE 69.99
DI 279 x 221
PU ELSEVIER HEALTH SCIENCES
YP 2001
RC U
RS TERTIARY EDUCATION (US: COLLEGE)
TI TREASE AND EVANS PHARMACOGNOSY
DE Serves as the encyclopaedic reference work on pharmacognosy, the study of those natural substances, principally plants, that find a use in medicine. This book balances between classical and modern aspects of this branch of science, and covers the importance of complementary medicines, including herbal, homeopathic and aromatherapy.
EA 9780702026171
RF R
WE 1780
SG 1
GC O00
I3 9780702026171
PC S6.1T
**
IB 0237526301
BI Paperback
AU LEVINE, KAREN
BC YFC
CO UNITED KINGDOM
IU PHOTOGRAPHS, ILLUSTRATIONS
PD 20030627
NP 128
RP 5.99
RI 5.99
RE 5.99
DI 198 x 130
PU EVANS PUBLISHING GROUP
YP 2003
RC J
RS CHILDREN / JUVENILE
TI HANA'S SUITCASE
DE In March 2000, a suitcase arrived at a children's Holocaust education centre in Tokyo. It belonged to a orphan girl called Hana Brady. Everyone was desperate to discover the story of Hana - Who was she? What had happened to her? This is her true story. 
EA 9780237526306
RF R
WE 220
SG 4
GC C09
I3 9780237526306
PC Y2.1
**
IB 1857152816
BI Hardback
AU ALLENDE, ISABEL
BC FBC
CO UNITED KINGDOM
EI NEW ED
PD 20050310
TP THE
NP 520
RP 10.99
RI 10.99
RE 10.99
DI 210 x 133
PU EVERYMAN'S LIBRARY
YP 2005
RC G
RS GENERAL (US: TRADE)
SR EVERYMAN'S LIBRARY CONTEMPORARY CLASSICS
TI HOUSE OF THE SPIRITS
TR BOGIN, MAGDA
PI Chilean writer Isabel Allende's classic bestseller is a richly symbolic
PI family saga that is also the riveting story of an unnamed Latin American
PI country's turbulent history. Translated by Magda Bogin, introduced by
PI Christopher Hitchens.
EA 9781857152814
RF R
WE 562
SG 3
GC F00
I3 9781857152814
PC F1.1
** 
(the final line in the real file will be
**END, a number 
where the number is the number of products)

Stage 2
I want to take that column of 13-digit codes and run it against a second .txt file (about 300 MB in size) to pull certain records from it. This is a price and stock file. So I am in effect checking availability of the products in my list of 13-digit codes.

I also want to do a couple of other operations at the same time to finish up with two slightly different files.

Here are a few lines from the second text file below. There are seven variables per line.

I want to retain the first variable, the second variable is the 13 digit code for matching, I also want to retain the third and fifth variables, and lose the fourth, sixth and seventh variables. There isn't always a seventh variable, it depends on what code is in the sixth part. For those lines where there isn't a seventh variable you'll see they just end in a comma. 

However I am ONLY interested in lines where there is NO entry for the SIXTH column (because the codes mostly pertain to unavailable items), as these are items which are definitely available. You can spot those lines because they end in two commas, i.e. the sixth field is empty as well. (If there is no sixth variable, there is never a seventh btw. If there is a sixth variable, sometimes there is a seventh, sometimes not.)

(It is actually feasible that at some point in the future I might also want to match against lines containing certain specific codes in the sixth variable field. But not now.)

HEADER,START
3839149940,9783839149942,12.90,25.00,0,MD,07/11/11
3839137489,9783839137482,9.50,25.00,0,MD,07/11/11
1402080557,9781402080555,99.50,21.50,0,MD,14/11/11
3839133327,9783839133323,9.50,25.00,0,MD,07/11/11
3839121558,9783839121559,8.50,25.00,0,MD,07/11/11
818448884X,9788184488845,54.00,20.00,1,GXC,
8184488947,9788184488944,90.00,20.00,0,GXC,
1907816054,9781907816055,9.95,20.00,0,NYP,30/11/2011
9350255170,9789350255179,29.00,20.00,0,NYP,15/12/2011
9350254220,9789350254226,29.00,20.00,4,,
1906779465,9781906779467,14.99,0.00,2,,
1906779635,9781906779634,9.99,0.00,0,GXC,
(there will also be a final line in the big file which will be
TRAILER, a number 
which is the number of products)

So here, immediately we would only be looking to match against these two lines to see if the second variable was in the list of 13 digit codes from our first stage as they are the only two which don't have a code in the sixth variable.

9350254220,9789350254226,29.00,20.00,4,,
1906779465,9781906779467,14.99,0.00,2,,

So I would therefore, as mentioned above, like to end up with the following information from those two lines

9350254220,9789350254226,29.00,4
1906779465,9781906779467,14.99,2
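
The filter described above (keep a line only when the sixth field is empty, then keep fields 1, 2, 3 and 5) can be sketched in a few lines. This is Python rather than anything supplied with the data, the function name `available_items` is made up for illustration, and it assumes the real file is plain comma-separated text exactly like the sample.

```python
import csv

def available_items(lines):
    """Yield (field1, code13, price, quantity) for lines with an empty sixth field."""
    for fields in csv.reader(lines):
        if len(fields) < 6:
            # HEADER,START, the TRAILER line and blank lines have too few fields.
            continue
        if fields[5] != "":
            # A code such as MD, GXC or NYP is present, so skip the line.
            continue
        yield fields[0], fields[1], fields[2], fields[4]

# A few lines copied from the sample above.
sample = [
    "HEADER,START",
    "818448884X,9788184488845,54.00,20.00,1,GXC,",
    "1907816054,9781907816055,9.95,20.00,0,NYP,30/11/2011",
    "9350254220,9789350254226,29.00,20.00,4,,",
    "1906779465,9781906779467,14.99,0.00,2,,",
]

for item in available_items(sample):
    print(",".join(item))
```

Because `csv.reader` accepts any iterable of lines, the same function could be handed an open file object, so the 300 MB file would be read one line at a time rather than loaded whole.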

Now it is from this point that I would like to finish up with two slightly different versions of the above information.

1) A version of the .txt file where the first remaining variable is stripped out, and a header line inserted, just leaving me with, for example...

sku,price,quantity
9789350254226,29.00,4
9781906779467,14.99,2

This txt file would be then effectively my daily file for refreshing price and stock which I just drop in a local folder on my machine which is a VPN link to Amazon.

2) A version where we have five extra supplemental columns of information which are ALWAYS identical... 1,1,11,3,6
And a different header line inserted.

sku,product-id,price,quantity,product-id-type,item-condition,expedited-shipping,will-ship-internationally
9350254220,9789350254226,29.00,4,1,1,11,3,6
1906779465,9781906779467,14.99,2,1,1,11,3,6

This would be the file for adding (and also updating) records to my Amazon inventory which I upload weekly.
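
For what it's worth, producing the two files from the trimmed rows could look roughly like the sketch below. It is Python, not something from the supplier or Amazon, and the function names are hypothetical. The header and the constant values for the second file are passed in as arguments and copied from this post as written; note the example rows carry nine values against eight header names, so that is worth checking against Amazon's template before uploading.

```python
import csv
import io

def write_refresh_file(rows, out):
    """Daily price/stock file: header line, then sku,price,quantity."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sku", "price", "quantity"])
    for _first, code13, price, quantity in rows:
        writer.writerow([code13, price, quantity])

def write_inventory_file(rows, out, header, constants):
    """Weekly inventory file: all four kept fields plus the fixed extra columns."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for first, code13, price, quantity in rows:
        writer.writerow([first, code13, price, quantity, *constants])

# The two trimmed rows from the example above.
rows = [
    ("9350254220", "9789350254226", "29.00", "4"),
    ("1906779465", "9781906779467", "14.99", "2"),
]

# Header text copied verbatim from the post.
inventory_header = [
    "sku", "product-id", "price", "quantity", "product-id-type",
    "item-condition", "expedited-shipping", "will-ship-internationally",
]

refresh = io.StringIO()
write_refresh_file(rows, refresh)
print(refresh.getvalue())

inventory = io.StringIO()
write_inventory_file(rows, inventory, inventory_header, ["1", "1", "11", "3", "6"])
print(inventory.getvalue())
```

The sketch writes to in-memory buffers so it can be run anywhere; passing a file opened with `open(path, "w", newline="")` instead would write the real thing.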

Any assistance would be very gratefully received. 

Jonathan


----------



## Squashman (Apr 4, 2003)

2GB text file! You better have a whole bunch of RAM in your computer! When you parse a text file in a FOR loop it puts the entire file into memory. I only have 1GB of RAM on my PC at work and my batch files will usually crap out at about 600MB.

Excel 2007 and above should be able to handle it and a database would be a really good idea if you know Access really well.
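
On the memory point: outside of batch, most scripting languages can walk a file one line at a time, so memory use stays roughly flat however big the file is. A minimal sketch in Python (my substitution, not batch), with a made-up function name and a tiny temporary file standing in for the real 2GB one:

```python
import os
import tempfile

def count_matching_lines(path, prefix):
    """Count lines starting with prefix, reading one line at a time."""
    count = 0
    # latin-1 is an assumption: it decodes any byte, so odd characters
    # in the data will not stop the scan.
    with open(path, "r", encoding="latin-1") as handle:
        for line in handle:
            if line.startswith(prefix):
                count += 1
    return count

# Small stand-in for the real file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("**START\nGC O00\nI3 9780702026171\n**\nGC C09\nI3 9780237526306\n**\n")
    demo_path = tmp.name

print(count_matching_lines(demo_path, "I3 "))  # prints 2
os.remove(demo_path)
```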


----------



## eVILRigby (Nov 25, 2003)

Hi, many thanks for replying. 

Well fortunately I do have a machine with 4GB RAM for things like this! Do you think that would work?

I don't think I can use Excel because there are going to be about 20 million rows and the row limit is about a million, which rules it out for even the 300 MB file as that will have 6.2 million rows. Similarly the absolute file size for Access is 2GB. I am going to check with the supplier if the file will open in Access. The 2GB size was quoted to me, I don't know if it's slightly less or more or what. I don't get the file until I sign up for the service. It's an annual fee so I was a little reluctant to sign up and start paying until I was ready to rock. But I'm sure they would tell me that. Given I could probably open the 300MB file in Access, if I could chop the 2GB file down to the list of 13 digits codes by some other means, that would easily open in Access too. So I could probably do most of what I need in Access. It's just that first stage.

Sadly, I have no idea what a FOR loop is. Any chance you could explain it to me for extracting the I3 data I need based on specifying certain values for the GC variable? The GC line is always immediately above the I3 variable as you can see in the example text above if that helps in terms of coding?

many thanks, I really appreciate it.

Jonathan


----------



## eVILRigby (Nov 25, 2003)

I wouldn't have an issue with the GC lines being left in either if it's problematic to strip that as well because it is being referred to. Because the file size afterwards will definitely be small enough to open in Access.


----------



## Squashman (Apr 4, 2003)

I shouldn't have a problem getting the I3 codes out. Need you to do me a favor from here on out.

Post anything relevant to CODE or input and output of your data inside BB CODE TAGS.
You can see some examples of how to use them on this site.
http://en.wikipedia.org/wiki/BBCode


----------



## Squashman (Apr 4, 2003)

So the easy way to get the I3 codes out is to just use the FINDSTR command.

```
C:\Users\Squash\batch files\Rigby>findstr /b /C:"I3 " Data.txt>output.txt
```
Now if we look at the output file we will see the three I3 codes from the examples you provided, but you will notice that the field name is in there as well.

```
C:\Users\Squash\batch files\Rigby>type output.txt
I3 9780702026171
I3 9780237526306
I3 9781857152814
```
So if we want to remove the I3 from the output data we basically have to use a FOR loop to remove it. We can put this into a batch file and execute it.

```
@echo off
IF EXIST output.txt del /q output.txt
FOR /F "tokens=2 delims= " %%I in ('findstr /b /C:"I3 " Data.txt') do echo %%I>>output.txt
```
So now when we run that batch file we will get this output. I am just displaying the output with the type command for brevity. You would probably want to look at it in Notepad after you are done.

```
C:\Users\Squash\batch files\Rigby>rigby.bat

C:\Users\Squash\batch files\Rigby>type output.txt
9780702026171
9780237526306
9781857152814
```


----------



## Squashman (Apr 4, 2003)

You know what would have been really nice to have?
Valid data!
The three I3 numbers you gave me from the first text file are not in the 2nd text file. I will add them to the 2nd test file to test with. I am a real stickler for details. I like things to be clear cut and black and white.


----------



## Squashman (Apr 4, 2003)

You are not clearly defining your data again.
In this one you say that 9789350254226 is the SKU


> 1) A version of the .txt file where the first remaining variable is stripped out, and a header line inserted, just leaving me with, for example...
> 
> sku,price,quantity
> 9789350254226,29.00,4
> ...


In this one you are saying that 9789350254226 is the Product ID


> 2) A version where we have five extra supplemental columns of information which are ALWAYS identical... 1,1,11,3,6
> And a different header line inserted.
> 
> sku,product-id,price,quantity,product-id-type,item-condition,expedited-shipping,will-ship-internationally
> ...


I have, for the most part, the first two stages done below.

```
@echo off
IF EXIST MatchOutput.txt del /q MatchOutput.txt
FOR /F "tokens=2 delims= " %%A in ('findstr /b /C:"I3 " Data.txt') do CALL :MATCH %%A

exit /b

:MATCH
REM Tokens 1-5 land in %%I..%%M; whatever follows the fifth field lands in %%N.
FOR /F "tokens=1-5* delims=," %%I in ('findstr "%~1" Match.txt') do (
	IF "%%~J"=="%~1" (
		REM Keep the line only when nothing follows the fifth field
		IF "%%N"=="" echo %%I,%%J,%%K,%%M>>MatchOutput.txt
		EXIT /B
	)
)
EXIT /b
```
When I run it with these three I3 codes in the 1st text file (I do have all the other lines in the 1st text file, I am just shortening it for the sake of brevity):

```
I3 9780702026171
I3 9780237526306
I3 9781857152814
```
And this as the 2nd file.

```
HEADER,START
3839149940,9780702026171,12.90,25.00,0,MD,07/11/11
3839137489,9783839137482,9.50,25.00,0,MD,07/11/11
1402080557,9781402080555,99.50,21.50,0,MD,14/11/11
3839133327,9783839133323,9.50,25.00,0,MD,07/11/11
3839121558,9783839121559,8.50,25.00,0,MD,07/11/11
818448884X,9788184488845,54.00,20.00,1,GXC,
8184488947,9788184488944,90.00,20.00,0,GXC,
1907816054,9781907816055,9.95,20.00,0,NYP,30/11/2011
9350255170,9789350255179,29.00,20.00,0,NYP,15/12/2011
9350254220,9780237526306,29.00,20.00,4,,
1906779465,9781906779467,14.99,0.00,2,,
1906779635,9781857152814,9.99,0.00,0,GXC,
TRAILER,10
```
I get this for the output.

```
9350254220,9780237526306,29.00,4
```
I will finish up the other code after you clarify how you need your output for the other two files, which will be rather easy to add. You could probably figure it out yourself. Just need to add two lines of echo command in the 2nd FOR loop.
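
One thing to watch at full scale: the batch file runs findstr against Match.txt once per I3 code, so the second file gets re-read for every code. A possible alternative, sketched here in Python rather than batch (the function name is made up, and this is an illustration rather than a tested replacement), is to load the codes into a set and make a single pass over the second file. It only does the matching on the second field; filtering on the sixth field would be a separate step.

```python
def rows_matching_codes(codes, lines):
    """Yield the split fields of every line whose second field is a wanted code."""
    wanted = set(codes)  # set membership checks do not slow down as the list grows
    for line in lines:
        fields = line.rstrip("\r\n").split(",")
        if len(fields) >= 2 and fields[1] in wanted:
            yield fields

codes = ["9780702026171", "9780237526306", "9781857152814"]

# A few lines from the 2nd test file above.
match_lines = [
    "HEADER,START",
    "3839149940,9780702026171,12.90,25.00,0,MD,07/11/11",
    "3839137489,9783839137482,9.50,25.00,0,MD,07/11/11",
    "9350254220,9780237526306,29.00,20.00,4,,",
    "1906779465,9781906779467,14.99,0.00,2,,",
    "1906779635,9781857152814,9.99,0.00,0,GXC,",
    "TRAILER,10",
]

for fields in rows_matching_codes(codes, match_lines):
    print(",".join(fields))
```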


----------



## eVILRigby (Nov 25, 2003)

Hi Squashman.

I really appreciate this.

Yeah, sorry about the poor quality data. The example files they sent are a bit basic to say the least. I will be getting my hand in my wallet for the annual subscription on Monday I think, especially given how close to a solution you're getting me already, so I have some proper data.

With respect to extracting the I3 data, the minor wrinkle is that rather than extracting every single I3 line, I only wish to extract those lines where the line above, the GC variable, is equal to certain values. I guess one would use the FINDSTR command to look for the "GC O00" and "GC F00" to find the two correct records in this instance, but how would one code it to have a) multiple values of the variable FINDSTR is looking for, and b) output the I3 variable for the same record rather than the GC variable itself? I can't simply pull all the I3 data because it would include many items I wouldn't want to list, including items that aren't released yet etc.

Also, if I can risk being a right, royal pain in the behind at this point, I've just been advised (I've been trying to clarify a few things today which absolutely would have been a lot easier to establish myself with a full data set to work with) it would be preferable to output the EA variable rather than the I3 variable. On the face of it with my limited data, they are always the same value, but apparently this only holds absolutely true for books. For DVDs, CDs and ebooks, this is not true. Therefore I should be pulling the EA variable (which is the correct value for ordering from my supplier for all mediums), which is 4 lines above the GC variable. Sorry about that, I hope that doesn't complicate matters too much?

Re: the error you spotted in my labelling, the first file is correct, the second was labelled wrongly.



> 1) A version of the .txt file where the first remaining variable is stripped out, and a header line inserted, just leaving me with, for example...
> 
> sku,price,quantity
> 9789350254226,29.00,4
> ...


The second one I had transposed the sku and product-id labels. It SHOULD be as follows. Basically the SKU is always 13 digits, the product-id (for Amazon) is 10 digits.



> 2) A version where we have five extra supplemental columns of information which are ALWAYS identical... 1,1,11,3,6
> And a different header line inserted.
> 
> product-id,sku,price,quantity,product-id-type,item-condition,expedited-shipping,will-ship-internationally
> ...


As much as I would love to say I could take it on from what you've given me so far, err... I'll have to admit I couldn't. So if you have got time to finish off I would be extremely grateful. The nice thing is, once I've been shown how to do stuff, I am usually capable of tweaking and adjusting stuff, so from that point on if I need to fettle it slightly going forward I'll have a fighting chance.

Again, many thanks for your help. If it makes you feel any less exploited by me and feel you're contributing to the lighter side of life, I'm actually a comic shop owner. 

Jonathan


----------



## Squashman (Apr 4, 2003)

You are now asking for the next to impossible. I don't believe there is any way for me to pull the EA out and then look 4 lines further down at the GC to see if it equals something. Batch files process line by line. I will see what I can do but you are talking about a lot of extra code and the only reason I had free time today was because I was home sick.
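
Outside of batch the usual way round this is to hold on to each record's fields until the `**` separator turns up, and only then decide whether to keep the EA. A rough sketch of that idea in Python (not batch; the function name is hypothetical and it assumes every line is a two-letter tag, a space, then the value, as in the sample):

```python
def ea_for_wanted_gc(lines, wanted_gc):
    """Yield the EA value of each record whose GC value is in wanted_gc."""
    record = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("**"):
            # "**START", "**" and the "**END" line all close the record read so far.
            if record.get("GC") in wanted_gc and "EA" in record:
                yield record["EA"]
            record = {}
        else:
            tag, _, value = line.partition(" ")
            # For repeated tags such as AU or PI only the first value is kept.
            record.setdefault(tag, value.strip())
    # Covers a file that does not finish with a "**" line.
    if record.get("GC") in wanted_gc and "EA" in record:
        yield record["EA"]

# The tail end of the three sample records.
sample = [
    "**START",
    "EA 9780702026171", "RF R", "WE 1780", "SG 1", "GC O00", "I3 9780702026171", "PC S6.1T",
    "**",
    "EA 9780237526306", "RF R", "WE 220", "SG 4", "GC C09", "I3 9780237526306", "PC Y2.1",
    "**",
    "EA 9781857152814", "RF R", "WE 562", "SG 3", "GC F00", "I3 9781857152814", "PC F1.1",
    "**",
]

for code in ea_for_wanted_gc(sample, {"O00", "F00"}):
    print(code)
```

Since the decision is made at the end of each record, it does not matter how many lines sit between the EA and GC fields, or which comes first.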

In regards to Access being limited to 2GB, you could split the data up into 2 databases that reference each other. I know I did that years ago when I had to take a class on Access to get my degree, but hell if I remember how to do it. There is also the free MS SQL Express database you could use as well. I believe that has a 4GB limit.

Back to your Regular Show viewing.


----------



## eVILRigby (Nov 25, 2003)

Well look please, please don't expend a ridiculous amount of time on this Squashman. It really wasn't my intention to get someone to write something really ridiculously complicated for free. I just was hoping there was a simple way to do it. But I realise what you mean about batch files going line by line. I wasn't really aware of that, but it's obvious now you point it out. And I see that's obviously a big issue in terms of lines that you've already gone past.

So... possibly a stupid question... 

But I'm thinking screw everything that isn't a book (which the vast majority of the products in the file will be anyway) and let's just go with the I3 variable rather than the EA variable. Given that the I3 is the line directly below the GC line, is it at all possible to (easily) say: if the GC variable equals X, Y or Z, then pull the I3 variable on the line below; if not, ignore it and go on to look at the next GC variable?
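
To make the idea concrete, here is how I imagine that logic would go, written out as a Python sketch rather than batch (the function name is made up, and it assumes the I3 line always sits directly under the GC line as in the sample). It only ever looks forward, remembering whether the line just read was a wanted GC:

```python
def i3_after_wanted_gc(lines, wanted_gc):
    """Yield the I3 value whenever the line directly above it is a wanted GC."""
    take_next = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("GC "):
            # Remember whether this category is one of the wanted ones.
            take_next = line[3:].strip() in wanted_gc
        elif line.startswith("I3 ") and take_next:
            yield line[3:].strip()
            take_next = False
        else:
            # Any other line breaks the GC-then-I3 pairing.
            take_next = False

sample = [
    "SG 1", "GC O00", "I3 9780702026171", "PC S6.1T", "**",
    "SG 4", "GC C09", "I3 9780237526306", "PC Y2.1", "**",
    "SG 3", "GC F00", "I3 9781857152814", "PC F1.1", "**",
]

for code in i3_after_wanted_gc(sample, {"O00", "F00"}):
    print(code)
```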

My big motto in life is 'with the ideal comes the actual.' In other words, deal in possibilities, not impossibilities.

Sorry to hear you're off sick, hope it's nothing too bad, and apologies if I'm making you feel worse!


----------



## eVILRigby (Nov 25, 2003)

If it was a stupid question, double apologies in advance...


----------



## eVILRigby (Nov 25, 2003)

And many thanks for the tip on the free MS SQL Express database. I will check into that and it's extremely useful to know anyway re: the file size limit.


----------



## Squashman (Apr 4, 2003)

I got some ideas on how to accomplish this but this is probably going to be real advanced batch code that most people who write batch files never see. I need to spend some time testing it.


----------



## eVILRigby (Nov 25, 2003)

Many thanks man, I do genuinely appreciate it. I'm hopefully going to get hold of the data early next week so if you'd like some larger data sets to test with I could forward them across. Or just give you ftp details if you'd want the whole thing.

I am also going to try MS SQL Express as you kindly suggested when I get the files, simply because it would be very useful to have something that can actually open the file, as inevitably I am going to want to do other things with it, including matching what I currently stock against what they can supply en masse etc.


----------



## eVILRigby (Nov 25, 2003)

Hey Squashman,

I'm still waiting for access to the bulk data but what I have done is constructed some better test data. Find below two files of 41 records each with product codes that actually match. 


```
**START
IB 0702026174
BI Hardback
AU EVANS, WILLIAM CHARLES
BC GBC
CO UNITED KINGDOM
EI 15 REV ED
IU 275 ILLS.
PD 20011213
NP 600
RP 69.99
RI 69.99
RE 69.99
DI 279 x 221
PU ELSEVIER HEALTH SCIENCES
YP 2001
RC U
RS TERTIARY EDUCATION (US: COLLEGE)
TI TREASE AND EVANS PHARMACOGNOSY
DE Serves as the encyclopaedic reference work on pharmacognosy, the study of those natural substances, principally plants, that find a use in medicine. This book balances between classical and modern aspects of this branch of science, and covers the importance of complementary medicines, including herbal, homeopathic and aromatherapy.
EA 9780702026171
RF R
WE 1780
SG 1
GC O00
I3 9780702026171
PC S6.1T
**
IB 0237526301
BI Paperback
AU LEVINE, KAREN
BC YFC
CO UNITED KINGDOM
IU PHOTOGRAPHS, ILLUSTRATIONS
PD 20030627
NP 128
RP 5.99
RI 5.99
RE 5.99
DI 198 x 130
PU EVANS PUBLISHING GROUP
YP 2003
RC J
RS CHILDREN / JUVENILE
TI HANA'S SUITCASE
DE In March 2000, a suitcase arrived at a children's Holocaust education centre in Tokyo. It belonged to a orphan girl called Hana Brady. Everyone was desperate to discover the story of Hana - Who was she? What had happened to her? This is her true story.
EA 9780237526306
RF R
WE 220
SG 4
GC C09
I3 9780237526306
PC Y2.1
**
IB 1857152816
BI Hardback
AU ALLENDE, ISABEL
BC FBC
CO UNITED KINGDOM
EI NEW ED
PD 20050310
TP THE
NP 520
RP 10.99
RI 10.99
RE 10.99
DI 210 x 133
PU EVERYMAN'S LIBRARY
YP 2005
RC G
RS GENERAL (US: TRADE)
SR EVERYMAN'S LIBRARY CONTEMPORARY CLASSICS
TI HOUSE OF THE SPIRITS
TR BOGIN, MAGDA
PI Chilean writer Isabel Allende's classic bestseller is a richly symbolic
PI family saga that is also the riveting story of an unnamed Latin American
PI country's turbulent history. Translated by Magda Bogin, introduced by
PI Christopher Hitchens.
EA 9781857152814
RF R
WE 562
SG 3
GC F00
I3 9781857152814
PC F1.1
**
IB 0091780721
AV R/P
BI hardback
AU SMITH, JANET
BC YWHB
CO UK
IU 100 COLOUR PHOTOS
PD 19930715
NP 64
RP 7.99
RI 7.99
RE 7.99
DI 272 x 193
PU EBURY PRESS
YP 1993
RC C
RS CHILDREN
TI "GOOD HOUSEKEEPING" KID'S COOK BOOK
DE A cookery book designed to allow children themselves to take the lead, with fun and healthy ideas for all sorts of food they can make themselves, with the added bonus of teaching them the basics of cookery in the process. Each tested recipe is in easy-to-follow steps and illustrated in colour.
EA 9780091780722
RF R
WE 452
SG 3
GC J00
I3 9780091780722
**
IB 041507875X
BI Paperback
AU FISKE, JOHN
BC GRB
CO UNITED KINGDOM
EI NEW ED
IU ILLUSTRATIONS, 5 B&W PHOTOGRAPHS, REFERENCES, INDEX
PD 19890824
NP 240
RP 16.99
RI 16.99
RE 16.99
DI 216 x 138
PU TAYLOR & FRANCIS LTD
YP 1989
RC UP
RS POSTGRADUATE, RESEARCH & SCHOLARLY
TI READING THE POPULAR
DE Designed as a companion to 'Understanding Popular Culture', 'Reading the Popular' is a series of readings about today's cultural phenomena. The book highlights the conflicting responses popular culture can evoke.
EA 9780415078757
RF R
WE 308
SG 1
GC I01
I3 9780415078757
PC S3.6
**
IB 0415078768
BI Paperback
AU FISKE, JOHN
BC GRB
CO UNITED KINGDOM
EI NEW ED
IU B&W PHOTOGRAPHS, BIBLIOGRAPHY, INDEXES
PD 19890824
NP 224
RP 17.99
RI 17.99
RE 17.99
DI 216 x 138
PU TAYLOR & FRANCIS LTD
YP 1989
RC UP
RS POSTGRADUATE, RESEARCH & SCHOLARLY
TI UNDERSTANDING POPULAR CULTURE
DE In this companion volume to Reading the Popular, Fiske presents a radical theory of what it means for culture to be popular.
EA 9780415078764
RF R
WE 32
GC K04
I3 9780415078764
**
IB 0415078989
AV GXC
BI Paperback
AU BRYMAN, ALAN
BC JBB
CO UNITED KINGDOM
EI NEW ED
PD 19880623
NP 208
RP 24.99
RI 24.99
RE 24.99
DI 216 x 138
PU TAYLOR & FRANCIS LTD
YP 1988
RC P
RS PROFESSIONAL & VOCATIONAL
SR CONTEMPORARY SOCIAL RESEARCH S.
TI QUANTITY AND QUALITY IN SOCIAL RESEARCH
EA 9780415078986
RF R
WE 272
SG 1
GC K02
I3 9780415078986
PC S3.2
**
IB 1588262634
BI Paperback
AU WALKER, LIZ
AU REID, GRAEME
AU CORNELL, MORNA
BC JBNP9
CO UNITED STATES
IU COLOUR PHOTOGRAPHS
PD 20040228
NP 145
RP 17.50
RI 17.50
RE 17.50
DI 241 x 171
PU LYNNE RIENNER PUBLISHERS INC,US
YP 2004
RC G
RS GENERAL (US: TRADE)
TI WAITING TO HAPPEN
ST HIV/AIDS IN SOUTH AFRICA - THE BIGGER PICTURE
DE Why are more women than men in South Africa HIV positive? This work - incorporating evocative photographs and the voices of scholars, practitioners, and victims of the epidemic - looks at the social, cultural, and historical aspects of HIV/AIDS in South Africa.
EA 9781588262639
RF R
WE 415
SG 1
GC K02
I3 9781588262639
PC S3.3
**
IB 0006744958
AV R/P
BI paperback (B format)
AU JARMAN, JULIA
BC YF
CO UK
IL BURNARD, DAMON
EI NEW ED
IU ILLUSTRATIONS
PD 19940302
NP 64
RP 3.99
RI 3.99
RE 3.99
DI 197 x 132
PU HARPERCOLLINS PUBLISHERS
YP 1994
RC C
RS CHILDREN
SR JETS S.
TI GEORGIE AND THE PLANET RAIDER
DE Part of a series for the child who is just beginning to enjoy reading or for reluctant older readers. In this story, Georgie the computer whiz pits her wits against a new and even more deadly enemy. Can she rescue her sister from the evil clutches of Planet Raider?
EA 9780006744955
RF F
WE 60
SG 3
GC C00
I3 9780006744955
**
IB 0006745113
BI Paperback
AU MORPURGO, MICHAEL, M.B.E.
BC YF
CO UNITED KINGDOM
EI NEW ED
IU ILLUSTRATIONS
PD 20030303
TP THE
NP 64
RP 4.99
RI 4.99
RE 4.99
DI 197 x 130
PU HARPERCOLLINS PUBLISHERS
YP 2003
RC J
RS CHILDREN / JUVENILE
TI DANCING BEAR
DE A gentle and deeply moving story of a young girl and her bear, told with great charm by a master storyteller.
EA 9780006745112
RF R
WE 59
SG 3
GC C00
I3 9780006745112
PC Y2.1
**
IB 0006745121
AV R/P
BI paperback (B format)
AU GIRLING, BROUGH
BC YFP
CO UK
IL BLUNDELL, TONY
EI NEW ED
IU ILLUSTRATIONS
PD 19931130
NP 64
RP 3.99
RI 3.99
RE 3.99
DI 196 x 128
PU HARPERCOLLINS PUBLISHERS
YP 1993
RC C
RS CHILDREN
SR JETS S.
TI NORA BONE
DE Nora Bone is a police dog with a nose for sniffing out trouble and causing chaos. But she has to prove to the Chief Inspector that she really is the perfect police dog when she's asked to police the school fete.
EA 9780006745129
RF R
WE 62
SG 2
GC C00
I3 9780006745129
**
IB 1402711468
BI Paperback
AU WILSON, FRED
AU ALBERSTON, BRUCE
BC WDMG1
CO UNITED STATES
IU ILLUSTRATIONS
PD 20050420
NP 224
RP 9.99
RI 9.99
RE 9.99
DI 210 x 140
PU STERLING PUBLISHING CO INC
YP 2005
RC G
RS GENERAL (US: TRADE)
SR OFFICIAL MENSA PUZZLE BOOK
TI 303 PRACTICAL CHESS PUZZLES
PI 100 scenarios for each of 3 levels - advanced beginner, intermediate and
PI tournament. The sequel to '303 Tactical Chess Puzzles'.
EA 9781402711466
RF R
WE 245
SG 2
GC S00
I3 9781402711466
PC T11.5
**
IB 1405120533
BI Paperback
AU GARRISON, PHILIP
BC TBC
CO UNITED KINGDOM
IU  163
PD 20050525
NP 296
RP 24.99
RI 24.99
RE 24.99
DI 244 x 173
PU BLACKWELL PUBLISHING LTD
YP 2005
RC P
RS PROFESSIONAL & VOCATIONAL
TI BASIC STRUCTURES FOR ENGINEERS AND ARCHITECTS
DE Offers students of civil engineering and architecture with a grounding in the fundamentals of structures, and a 'feel' for the way buildings behave structurally. This book intends to explain structural concepts, using analogies and examples to illustrate the points.
EA 9781405120531
RF R
WE 142
SG 1
GC K00
I3 9781405120531
PC S9.0
**
IB 0415247691
BI Paperback
AU MCKENZIE, JON (UNIVERSITY OF THE ARTS, USA)
BC CFG
CO UNITED STATES
IU ILLUSTRATIONS
PD 20010322
NP 320
RP 21.99
RI 21.99
RE 21.99
DI 216 x 138
PU TAYLOR & FRANCIS LTD
YP 2001
RC UP
RS POSTGRADUATE, RESEARCH & SCHOLARLY
TI PERFORM OR ELSE
ST FROM DISCIPLINE TO PERFORMANCE
DE This text examines the meaning of the word "performance" in the 21st century. The author asserts that there is a relationship between cultural, organizational and technological performance and demonstrates that all three operate together to create powerful and contradictory pressures.
EA 9780415247696
RF R
WE 454
SG 1
GC I02
I3 9780415247696
**
IB 1417944579
AV MD 
BI Paperback
AU COLTON, CALVIN
BC CVG
CO UNITED STATES
PD 20040819
TP THE
NP 508
RP 25.95
RI 25.95
RE 25.95
DI 229 x 152
PU KESSINGER PUBLISHING CO
YP 2004
RC G
RS GENERAL (US: TRADE)
TI LIFE AND TIMES OF HENRY CLAY PART TWO
EA 9781417944576
RF F
WE 740
GC Q00
I3 9781417944576
PC Z99.9
**
IB 1417944560
AV MD 
BI Paperback
AU COLTON, CALVIN
BC CVG
CO UNITED STATES
PD 20040819
TP THE
NP 508
RP 23.95
RI 23.95
RE 23.95
DI 229 x 152
PU KESSINGER PUBLISHING CO
YP 2004
RC G
RS GENERAL (US: TRADE)
TI LIFE AND TIMES OF HENRY CLAY PART ONE
EA 9781417944569
RF F
WE 740
GC Q00
I3 9781417944569
PC Z99.9
**
IB 0443072116
AV NYP
BI Hardback
AU BURNAND, KEVIN G.
AU YOUNG ANTHONY E.
AU LUCAS, JONATHAN D.
AU ROWLANDS, BRIAN (PROFESSOR OF SURGERY,QUEEN'S MEDICAL CENTRE NOTTINGHAM)
AU SCHOLEFIELD, JOHN
BC MN
CO UNITED KINGDOM
EI 3 REV ED
IU 1400 ILLS.
PD 20050609
TP THE
NP 1200
RP 125.00
RI 125.00
RE 125.00
DI 276 x 219
PU ELSEVIER HEALTH SCIENCES
YP 2005
RC P
RS PROFESSIONAL & VOCATIONAL
SR MRCS STUDY GUIDES
TI NEW AIRD'S COMPANION IN SURGICAL STUDIES
DE Offers a grounding in different aspects of surgical training. This book can fulfill the needs of candidates taking the MRCS examination, and also be a useful reference for the established surgeon who wishes to keep abreast of modern surgical thought and practice. It also includes key references which appear on each page for quick reference.
EA 9780443072116
RF R
WE 3850
SG 1
GC O00
I3 9780443072116
PC S6.3
**
IB 1417944552
AV MD 
BI Paperback
AU LILLY, WILLIAM SAMUEL
BC HRAB
CO UNITED STATES
PD 20040819
NP 400
RP 21.95
RI 21.95
RE 21.95
DI 229 x 152
PU KESSINGER PUBLISHING CO
YP 2004
RC G
RS GENERAL (US: TRADE)
TI ANCIENT RELIGION AND MODERN THOUGHT
EA 9781417944552
RF F
WE 586
GC R00
I3 9781417944552
**
IB 1417944544
AV MD 
BI Paperback
AU JOYNEVILLE, C.
BC HBD
CO UNITED STATES
PD 20040819
NP 380
RP 20.95
RI 20.95
RE 20.95
DI 229 x 152
PU KESSINGER PUBLISHING CO
YP 2004
RC G
RS GENERAL (US: TRADE)
TI LIFE AND TIMES OF ALEXANDER I EMPEROR OF ALL THE RUSSIAS PART THREE
EA 9781417944545
RF F
WE 557
GC Q00
I3 9781417944545
**
IB 1417944528
AV MD 
BI Paperback
AU JOYNEVILLE, C.
BC HBD
CO UNITED STATES
PD 20040819
NP 404
RP 21.95
RI 21.95
RE 21.95
DI 229 x 152
PU KESSINGER PUBLISHING CO
YP 2004
RC G
RS GENERAL (US: TRADE)
TI LIFE AND TIMES OF ALEXANDER I EMPEROR OF ALL THE RUSSIAS PART ONE
EA 9781417944521
RF F
WE 592
GC Q00
I3 9781417944521
**
IB 141794451X
AV MD 
BI Paperback
AU JORDAN, W. G.
BC HRBM3
CO UNITED STATES
PD 20040819
NP 336
RP 19.95
RI 19.95
RE 19.95
DI 229 x 152
PU KESSINGER PUBLISHING CO
YP 2004
RC G
RS GENERAL (US: TRADE)
TI BIBLICAL CRITICISM AND MODERN THOUGHT OR THE PLACE OF THE OLD TESTAMENT DOCUMENTS IN THE LIFE OF TODAY
EA 9781417944514
RF F
WE 495
GC R00
I3 9781417944514
**
IB 1417944501
AV MD 
BI Paperback
AU HOLROYD, CHARLES
BC CV
CO UNITED STATES
PD 20040819
NP 420
RP 21.95
RI 21.95
RE 21.95
DI 229 x 152
PU KESSINGER PUBLISHING CO
YP 2004
RC G
RS GENERAL (US: TRADE)
TI MICHAEL ANGELO BUONARROTI
EA 9781417944507
RF F
WE 615
GC Q00
I3 9781417944507
**
IB 0750649534
AV NYP
BI Paperback
AU BENJAMIN
AU JAMES
BC MJQ
CO UNITED KINGDOM
IU 40 ILLUSTRATIONS
PD 20051201
NP 160
RP 32.50
RI 32.50
RE 32.50
PU ELSEVIER HEALTH SCIENCES
YP 2005
RC P
RS PROFESSIONAL & VOCATIONAL
TI ADVANCED OPHTHALMIC INVESTIGATIVE TECHNIQUES
DE This medical text provides a basic understanding of the underlying principles and indications for all the investigative techniques in ophthalmology.
EA 9780750649537
RF R
WE 505
SG 1
GC O00
I3 9780750649537
**
IB 1417944463
AV MD 
BI Paperback
AU CREIGHTON, MANDELL
BC HRB
CO UNITED STATES
PD 20040819
TP A
NP 500
RP 23.95
RI 23.95
RE 23.95
DI 229 x 152
PU KESSINGER PUBLISHING CO
YP 2004
RC G
RS GENERAL (US: TRADE)
TI HISTORY OF THE PAPACY FROM THE GREAT SCHISM TO THE SACK OF ROME PART SIX
EA 9781417944460
RF F
WE 728
GC R00
I3 9781417944460
PC Z99.9
**
IB 0306428970
AV MD 
BI Hardback
AU MEHLMANN, ALEXANDER
BC KM
CO NETHERLANDS
PD 19880601
NP 212
RP 69.00
RI 69.00
RE 69.00
DI 234 x 156
PU KLUWER ACADEMIC PUBLISHERS GROUP
YP 1988
RC P
RS PROFESSIONAL & VOCATIONAL
TI APPLIED DIFFERENTIAL GAMES
EA 9780306428975
RF F
WE 477
GC B00
I3 9780306428975
PC S4.2
**
IB 1417944455
AV MD 
BI Paperback
AU CREIGHTON, MANDELL
BC HRB
CO UNITED STATES
PD 20040819
TP A
NP 356
RP 19.95
RI 19.95
RE 19.95
DI 229 x 152
PU KESSINGER PUBLISHING CO
YP 2004
RC G
RS GENERAL (US: TRADE)
TI HISTORY OF THE PAPACY FROM THE GREAT SCHISM TO THE SACK OF ROME PART FIVE
EA 9781417944453
RF F
WE 523
GC R00
I3 9781417944453
PC Z99.9
**
IB 0702025534
BI Paperback
AU HYDE, JULIE
AU COOK, MICHAEL J.
BC MBPM
CO UNITED KINGDOM
IU 20 ILLS.
PD 20040209
NP 256
RP 21.99
RI 21.99
RE 21.99
DI 220 x 150
PU ELSEVIER HEALTH SCIENCES
YP 2004
RC U
RS TERTIARY EDUCATION (US: COLLEGE)
SR SIX STEPS TO EFFECTIVE MANAGEMENT S.
TI MANAGING AND SUPPORTING PEOPLE IN HEALTH CARE
DE Focuses on the importance of managing and supporting people in health care services. Human resources are a significant aspect of health care budgets and the attraction of quality staff is a pressing concern. This book addresses this issue and provides a theoretical framework and practical guidance in this aspect of health care management.
EA 9780702025532
RF R
WE 370
SG 1
GC O00
I3 9780702025532
PC S6.0
**
IB 0803987366
BI Paperback
AU ACKROYD, STEPHEN
AU THOMPSON, PAUL
BC JBJG
CO UNITED KINGDOM
EI ILLUSTRATED ED
IU 1
PD 19990330
NP 192
RP 22.99
RI 22.99
RE 22.99
DI 234 x 156
PU SAGE PUBLICATIONS LTD
YP 1999
RC P
RS PROFESSIONAL & VOCATIONAL
TI ORGANIZATIONAL MISBEHAVIOUR
DE This text provides an analysis of forms of organizational subversion, including "absenteeism", humour and the politics of sexuality. The authors examine the interaction between the pursuit of self-interest and the processes of "identity formation" by workgroups.
EA 9780803987364
RF R
WE 295
GC B00
I3 9780803987364
PC S4.1
**
IB 1405116420
BI Paperback
AU BROUGH, HELEN
AU ALKURDI, ROLA
AU NATARAJA, RAM
AU SURENDRANATHAN, AJENTHAN
BC MJW
CO UNITED KINGDOM
PD 20040922
NP 224
RP 15.99
RI 15.99
RE 15.99
DI 216 x 140
PU BLACKWELL PUBLISHING LTD
YP 2004
RC P
RS PROFESSIONAL & VOCATIONAL
SR RAPID
TI RAPID PAEDIATRICS AND CHILD HEALTH
DE A resource in everyday paediatrics and child health practice. It is suitable for medical students preparing for a major exam.
EA 9781405116428
RF R
WE 154
SG 1
GC O00
I3 9781405116428
PC S6.2
**
IB 1417944439
AV MD 
BI Paperback
AU CREIGHTON, MANDELL
BC HRB
CO UNITED STATES
PD 20040819
TP A
NP 372
RP 20.95
RI 20.95
RE 20.95
DI 229 x 152
PU KESSINGER PUBLISHING CO
YP 2004
RC G
RS GENERAL (US: TRADE)
TI HISTORY OF THE PAPACY FROM THE GREAT SCHISM TO THE SACK OF ROME PART THREE
EA 9781417944439
RF F
WE 546
GC R00
I3 9781417944439
PC Z99.9
**
IB 1418482269
AV MD 
BI Paperback
AU MULLER, ALEX
BC FGB
CO UNITED STATES
PD 20050616
TP THE
NP 84
RP 8.99
RI 8.99
RE 8.99
DI 229 x 152
PU AUTHORHOUSE
YP 2005
RC G
RS GENERAL (US: TRADE)
TI X FAIRY
EA 9781418482268
RF F
WE 137
GC F05
I3 9781418482268
PC F2.1
**
IB 0595348572
AV MD 
BI Paperback
AU AEBI, ERNST W
BC WTL
CO UNITED STATES
PD 20050615
NP 272
RP 10.93
RI 10.93
RE 10.93
DI 216 x 140
PU IUNIVERSE.COM
YP 2005
RC G
RS GENERAL (US: TRADE)
TI SEASONS OF SAND SAHARA
ST ONE MAN'S QUEST TO SAVE A DYING SAHARA VILLAGE
EA 9780595348572
RF F
WE 350
GC T03
I3 9780595348572
PC T8.5
**
IB 0595348246
AV MD 
BI Paperback
AU BUNCH  PH.D., CHARLES K.
BC VFPB
CO UNITED STATES
IU 1
PD 20050615
NP 128
RP 7.64
RI 7.64
RE 7.64
DI 229 x 152
PU IUNIVERSE.COM
YP 2005
RC G
RS GENERAL (US: TRADE)
TI SOFT BIPOLAR
ST VIVID THOUGHTS, MOOD SHIFTS AND SWINGS, DEPRESSION, AND ANXIETY OF THE MILD MOOD DISORDERS AFFECTING MILLIONS OF AMERICANS
EA 9780595348244
RF F
WE 216
GC O03
I3 9780595348244
PC T9.0
**
IB 0595347916
AV MD 
BI Paperback
AU MCCANN, JEANNE
BC FBC
CO UNITED STATES
PD 20050615
NP 156
RP 7.10
RI 7.10
RE 7.10
DI 229 x 152
PU IUNIVERSE.COM
YP 2005
RC G
RS GENERAL (US: TRADE)
TI REAL LOVE
EA 9780595347919
RF F
WE 239
GC F00
I3 9780595347919
PC F1.1
**
IB 0595345565
AV MD 
BI Paperback
AU ROY, SUJOYA
BC FBC
CO UNITED STATES
PD 20050615
NP 244
RP 8.74
RI 8.74
RE 8.74
DI 229 x 152
PU IUNIVERSE.COM
YP 2005
RC G
RS GENERAL (US: TRADE)
TI FOR GANESH, REMOVER OF OBSTACLES
ST A NOVEL
EA 9780595345564
RF F
WE 364
GC F00
I3 9780595345564
PC F1.1
**
IB 1420852930
AV MD 
BI Paperback
AU JOSEPH, ANDRE
BC CV
CO UNITED STATES
PD 20050610
NP 88
RP 10.49
RI 10.49
RE 10.49
DI 203 x 127
PU AUTHORHOUSE
YP 2005
RC G
RS GENERAL (US: TRADE)
TI LEGEND OF THE STREET FIGHTER
EA 9781420852936
RF F
WE 105
GC Q00
I3 9781420852936
PC T4.0
**
IB 1405114355
BI Paperback
AU OBI, EBUBE
AU BAKER, CARA
AU TEO, DR MARK
AU TEO, JAMES
BC MN
CO UNITED KINGDOM
IU  52
PD 20041221
NP 232
RP 15.99
RI 15.99
RE 15.99
DI 216 x 143
PU BLACKWELL PUBLISHING LTD
YP 2004
RC P
RS PROFESSIONAL & VOCATIONAL
SR RAPID
TI RAPID SURGERY
DE Suitable for medical students preparing for a major exam in clinical surgery or needing a reminder during a clinical attachment.
EA 9781405114356
RF R
WE 364
SG 1
GC O00
I3 9781405114356
PC S6.3
**
IB 1417944420
AV MD 
BI Paperback
AU CREIGHTON, MANDELL
BC HRB
CO UNITED STATES
PD 20040819
TP A
NP 408
RP 21.95
RI 21.95
RE 21.95
DI 229 x 152
PU KESSINGER PUBLISHING CO
YP 2004
RC G
RS GENERAL (US: TRADE)
TI HISTORY OF THE PAPACY FROM THE GREAT SCHISM TO THE SACK OF ROME PART TWO
EA 9781417944422
RF F
WE 597
GC R00
I3 9781417944422
PC Z99.9
**
IB 0001384155
AV R/P
BI Board book
AU KERR, JUDITH
BC YF
CO UNITED KINGDOM
EI NEW ED
IU COLOUR ILLUSTRATIONS
PD 19850328
NP 16
RP 3.99
RI 3.99
RE 3.99
DI 118 x 118
PU HARPERCOLLINS PUBLISHERS
YP 1985
RC JN
RS PRESCHOOL (0-5)
SR COLLINS BABY & TODDLER
TI MOG'S FAMILY OF CATS
DE Readers are introduced to Mog's odd assortment of relatives - from her parents in the farmyard to her aunt who looks after the shop. This family album should delight young children.
EA 9780001384156
RF R
WE 106
SG 3
GC C07
I3 9780001384156
PC Y1.1
**
IB 0001384163
AV R/P
BI Board book
AU KERR, JUDITH
BC YF
CO UNITED KINGDOM
EI NEW ED
IU COLOUR ILLUSTRATIONS
PD 19841108
NP 16
RP 3.99
RI 3.99
RE 3.99
DI 118 x 118
PU HARPERCOLLINS PUBLISHERS
YP 1984
RC J
RS CHILDREN / JUVENILE
SR COLLINS BABY & TODDLER
TI MOG AND ME
DE Mog and Nicky do everything together - they get up, wash, play lots of different games, eat, and finally, sleep. babies and toddlers should enjoy the stories about Mog and Nicky's familiar activities.
EA 9780001384163
RF R
WE 102
SG 3
GC C07
I3 9780001384163
**
IB 142084833X
AV MD 
BI Paperback
AU CHALFIN, JACK
BC FBC
CO UNITED STATES
PD 20050610
NP 268
RP 11.49
RI 11.49
RE 11.49
DI 229 x 152
PU AUTHORHOUSE
YP 2005
RC G
RS GENERAL (US: TRADE)
TI MEMOIRS OF A MOTH
EA 9781420848335
RF F
GC F00
I3 9781420848335
PC F1.1
**END,42
```


```
HEADER,START
0702026174,9780702026171,69.99,25.00,1,,
0237526301,9780237526306,5.99,25.00,9,MD,07/12/11
1857152816,9781857152814,10.99,25.00,4,MD,07/12/11
0091780721,9780091780722,7.99,25.00,11,GXC,
041507875X,9780415078757,16.99,25.00,1,,
0415078768,9780415078764,17.99,25.00,32,,
0415078989,9780415078986,24.99,25.00,1,GXC,
1588262634,9781588262639,17.50,25.00,1,,
0006744958,9780006744955,3.99,20.00,0,MD,07/12/11
0006745113,9780006745112,4.99,20.00,1,,
0006745121,9780006745129,3.99,20.00,1,MD,07/12/11
1402711468,9781402711466,9.99,20.00,7,,
1405120533,9781405120531,24.99,20.00,0,,
0415247691,9780415247696,21.99,20.00,4,MD,07/12/11
1417944579,9781417944576,25.95,20.00,4,MD,07/12/11
1417944560,9781417944569,23.95,20.00,0,,
0443072116,9780443072116,125.00,20.00,89,,
1417944552,9781417944552,21.95,20.00,4,,
1417944544,9781417944545,20.95,20.00,1,,
1417944528,9781417944521,21.95,20.00,1,,
141794451X,9781417944514,19.95,20.00,0,,
1417944501,9781417944507,21.95,0.00,0,,
0750649534,9780750649537,32.50,0.00,1,MD,07/12/11
1417944463,9781417944460,23.95,0.00,1,,
0306428970,9780306428975,69.00,0.00,4,MD,07/12/11
1417944455,9781417944453,19.95,0.00,4,,
0702025534,9780702025532,21.99,0.00,5,,
0803987366,9780803987364,22.99,0.00,0,,
1405116420,9781405116428,15.99,0.00,1,,
1417944439,9781417944439,20.95,0.00,7,MD,07/12/11
1418482269,9781418482268,8.99,0.00,5,MD,07/12/11
0595348572,9780595348572,10.93,0.00,0,,
0595348246,9780595348244,7.64,0.00,1,,
0595347916,9780595347919,7.10,0.00,2,MD,07/12/11
0595345565,9780595345564,8.74,0.00,3,,
1420852930,9781420852936,10.49,0.00,3,,
1405114355,9781405114356,15.99,0.00,3,,
1417944420,9781417944422,21.95,0.00,0,MD,07/12/11
0001384155,9780001384156,3.99,0.00,1,MD,07/12/11
0001384163,9780001384163,3.99,0.00,1,,
142084833X,9781420848335,11.49,0.00,1,,
```
So that's 41 product lines, of which I would expect to see just 26 output, based solely on the condition of NOT having a value in the 6th column of the second set of data.
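As a rough illustration of that 6th-column condition, here is a minimal Python sketch. It assumes the second file is plain comma-separated text laid out exactly like the sample above; the function name `codes_without_status` is made up for this example and is not from the thread.

```python
import csv
import io


def codes_without_status(lines):
    """Yield the 13-digit code (2nd column) of rows whose 6th column is empty."""
    for row in csv.reader(lines):
        # The HEADER,START line and any blank or short rows have fewer
        # than six columns, so they are skipped here.
        if len(row) < 6:
            continue
        if row[5].strip() == "":
            yield row[1]


# Three rows copied from the sample data above, plus the header line.
sample = io.StringIO(
    "HEADER,START\n"
    "0702026174,9780702026171,69.99,25.00,1,,\n"
    "0237526301,9780237526306,5.99,25.00,9,MD,07/12/11\n"
    "0091780721,9780091780722,7.99,25.00,11,GXC,\n"
)
print(list(codes_without_status(sample)))  # ['9780702026171']
```

Passing an open file object instead of the `StringIO` sample should work the same way, since `csv.reader` pulls rows one at a time.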

Now, if you can apply the pre-condition to only pull I3 (or EA) data lines with a GC code equal to one of the five below from the first set of data to go forward...

GC O00
GC K02
GC Q00
GC F05
GC C00

... I would expect to see just 12 product lines output after applying the condition of not having a value in the 6th column of the second set of data.

Hopefully this data will make it easier. Plus as I've constructed it I know exactly what the results should be. I wouldn't know that with the huge real files.
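The two conditions together could be sketched in Python roughly as below. This is only an outline under stated assumptions: records are separated by lines starting with `**`, each field line is a two-letter tag, a space, then the value, and the names `parse_records`, `select_codes`, `catalogue` and `stock` are invented for the example. The sample text is a trimmed-down extract of the data above (most fields removed, closing line shortened to `**END`).

```python
WANTED_GC = {"O00", "K02", "Q00", "F05", "C00"}


def parse_records(lines):
    """Yield one dict per product; any line starting with ** closes a record."""
    record = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("**"):
            if record:
                yield record
            record = {}
        elif len(line) > 3:
            # First two characters are the tag, the rest is the value.
            # Repeated tags (several AU lines, say) keep only the last value.
            record[line[:2]] = line[3:].strip()
    if record:
        yield record


def select_codes(catalogue_lines, stock_lines, wanted=WANTED_GC):
    """Return I3 codes in a wanted category whose stock row has an empty 6th column."""
    in_category = {
        r["I3"] for r in parse_records(catalogue_lines)
        if r.get("GC") in wanted and "I3" in r
    }
    result = []
    for line in stock_lines:
        cols = line.rstrip("\n").split(",")
        if len(cols) >= 6 and cols[5] == "" and cols[1] in in_category:
            result.append(cols[1])
    return result


catalogue = """**START
IB 1417944501
GC Q00
I3 9781417944507
**
IB 0750649534
GC O00
I3 9780750649537
**
IB 1417944463
GC R00
I3 9781417944460
**END"""

stock = """HEADER,START
1417944501,9781417944507,21.95,0.00,0,,
0750649534,9780750649537,32.50,0.00,1,MD,07/12/11
1417944463,9781417944460,23.95,0.00,1,,"""

print(select_codes(catalogue.splitlines(), stock.splitlines()))
# ['9781417944507']
```

Only the set of matching codes is held in memory, so this shape should scale better than loading either file whole, though that is untested against the real multi-gigabyte files.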

thanks,

Jonathan


----------



## Squashman (Apr 4, 2003)

You could have just uploaded the text files.


----------



## Squashman (Apr 4, 2003)

When I run STAGE 1 of your example to just pull the EA codes that have a corresponding GC code I am only getting 11 EA codes as my output. If you are expecting 12, could you tell me which one I am missing?
These are the 11 EA codes I am getting.

```
9780702026171
9780415078986
9781588262639
9780006744955
9780006745112
9780006745129
9780443072116
9780750649537
9780702025532
9781405116428
9781405114356
```


----------



## Squashman (Apr 4, 2003)

You just gave me invalid information again!!!!! I sure am wasting a lot of time. You told me the GC code was always 4 lines after the EA code. It is not! Sometimes it is the 2nd line after the EA code and sometimes it is the 3rd line after the EA code.


----------



## eVILRigby (Nov 25, 2003)

Hi Squashman,

yeah, I do apologise for that. My problem is I have been working more than a little blind before getting access to the FTP data on this, which I finally got early this morning. Having downloaded an 8GB file for a little look (in My SQL Express and thanks for the tip on that btw) it is clear that some other variables are most definitely NOT always present. Which isn't what I was advised.

It would have been useful if they had just put blank lines in, or provided better example data in advance. What I can see however, is that the I3 line ALWAYS, without fail, follows the GC line for a product. I do think despite the fact it might mean I miss out on some products in different mediums where the EA is the master product code, at the end of the day, my core business is books. Always has been, always will be. And for that, the I3 code is perfect. Given that I can confirm it ALWAYS follows the GC code, I would suggest using that, unless you have worked around it in some other way.
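If the I3 line really does always come straight after the GC line, the extraction only needs to remember one previous line. The following is a hedged Python sketch of that idea, not something anyone in the thread ran; `i3_for_categories` is a made-up name and the sample lines are copied from the data above.

```python
def i3_for_categories(lines, wanted):
    """Yield the I3 value from each line that directly follows a wanted GC line."""
    previous_gc = None
    for line in lines:
        if line.startswith("GC "):
            previous_gc = line[3:].strip()
            continue
        if line.startswith("I3 ") and previous_gc in wanted:
            yield line[3:].strip()
        # Only the line immediately after GC counts, so forget it now.
        previous_gc = None


sample = [
    "WE 615", "GC Q00", "I3 9781417944507", "**",
    "WE 505", "SG 1", "GC O00", "I3 9780750649537", "**",
    "WE 728", "GC R00", "I3 9781417944460", "PC Z99.9", "**",
]
wanted = {"O00", "K02", "Q00", "F05", "C00"}
print(list(i3_for_categories(sample, wanted)))
# ['9781417944507', '9780750649537']
```

Iterating over an open file object in Python reads one line at a time, so passing the real file in place of `sample` should keep memory use flat whatever the file size.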

I realise you must be tearing your hair out by now... I can only apologise. It's just as irritating for me. I can't understand why I couldn't just sign a non-disclosure agreement and look at the files beforehand. The US company I am thinking about doing a similar project with in about three months time do that so you can actually do your development with all the facts at hand. Makes far far more sense. Anyway, apologies again.

The twelve product codes I would expect to see would be...


```
9780702026171
9781588262639
9780006745112
9781417944569
9780443072116
9781417944545
9781417944521
9781417944507
9780702025532
9781405116428
9781420852936
9781405114356
```
many thanks,
Jonathan


----------



## Squashman (Apr 4, 2003)

An 8GB file. That is not going to work. A batch file is going to crash if that is put inside a FOR loop, because your computer doesn't have enough memory to load the entire file.

I can't work with the I3 code being after the GC code. Findstr can be tweaked to search across multiple lines but it will always output the first line of the match.

The database file they provide you as a text file is really a poor implementation. I have seen such things before when I worked support for a Tax Software company.

Right now you are basically trying to use the wrong tool for the job. I don't think I can help you with this. You would be better off trying to import this into Microsoft's SQL Express, which I know nothing about.
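To give a feel for what the database route might look like, here is a small Python sketch that uses SQLite (via the standard `sqlite3` module) as a stand-in. SQLite is not the product mentioned above, and the table name `product`, its columns and the function `load_catalogue` are assumptions made up for this illustration.

```python
import sqlite3


def load_catalogue(conn, lines):
    """Store one (i3, gc) row per product record; a ** line ends a record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product (i3 TEXT PRIMARY KEY, gc TEXT)"
    )
    gc = i3 = None
    for line in lines:
        if line.startswith("GC "):
            gc = line[3:].strip()
        elif line.startswith("I3 "):
            i3 = line[3:].strip()
        elif line.startswith("**"):
            # Records without an I3 line are skipped. A final record that is
            # not followed by a ** line would be lost in this sketch.
            if i3 is not None:
                conn.execute(
                    "INSERT OR REPLACE INTO product VALUES (?, ?)", (i3, gc)
                )
            gc = i3 = None
    conn.commit()


sample = [
    "**START",
    "GC Q00", "I3 9781417944507", "**",
    "GC O00", "I3 9780750649537", "**",
    "GC R00", "I3 9781417944460", "PC Z99.9", "**END",
]
conn = sqlite3.connect(":memory:")
load_catalogue(conn, sample)
rows = conn.execute(
    "SELECT i3 FROM product WHERE gc IN ('O00', 'Q00') ORDER BY i3"
).fetchall()
print([r[0] for r in rows])  # ['9780750649537', '9781417944507']
```

Once the codes and categories are in a table, the category filter becomes a single query, and the stock file could be loaded into a second table and joined on the 13-digit code.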


----------



## eVILRigby (Nov 25, 2003)

Okay Squashman, a massive thank you for all your effort anyway. I think you're probably right, it's not necessarily the right way to go about it, particularly with this data. They were actually honest enough to admit they are running some legacy systems and that is probably why the data is the way it is. But please don't feel you've wasted your time. As often as not, working out how not to do something, and why you shouldn't do it that way, is just as important in getting to a final solution. I think maybe I was thinking there was an easy solution with batch files based on what I'd seen them do before, but I can understand that isn't the case. I will have a crack at it with My SQL, now I know I can actually get the file open.
Again, many thanks, 
Jonathan


----------



## Squashman (Apr 4, 2003)

Can they guarantee you that every product has an EA and GC code?
See if this will run on your data.

```
findstr /b /C:"EA " /C:"GC " Input.txt>>Output.txt
```
Change Input.txt to whatever the name of your text file is. In theory this should pull all the EA and GC lines, in their original order, into one file. They will each still be on a separate line, but it would make the file more manageable for what you need. Note that `>>` appends to Output.txt, so delete that file before running the command again.
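Once a file holding only the EA and GC lines exists, the two still need pairing up. One possible way to do that is sketched in Python below. It assumes EA comes before GC within each product, as in the sample data, and it copes with a product that has no GC line by dropping its EA. The function name `pair_ea_with_gc` is hypothetical, and the sample is constructed for the example (the middle product has had its GC line removed on purpose).

```python
def pair_ea_with_gc(lines):
    """Yield (ea, gc) pairs from a file holding only EA and GC lines, in order."""
    pending_ea = None
    for line in lines:
        tag, value = line[:3], line[3:].strip()
        if tag == "EA ":
            # A new EA replaces an earlier one that never got a GC line.
            pending_ea = value
        elif tag == "GC " and pending_ea is not None:
            yield pending_ea, value
            pending_ea = None


sample = [
    "EA 9781417944521",
    "GC Q00",
    "EA 9781417944514",  # GC line deliberately missing for this product
    "EA 9781417944507",
    "GC Q00",
]
print(list(pair_ea_with_gc(sample)))
# [('9781417944521', 'Q00'), ('9781417944507', 'Q00')]
```

A GC line with no EA before it is ignored as well, so a product missing either field simply drops out rather than shifting every later pair.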


----------

