# Rookie Unix Shell scripter



## kmel (Sep 10, 2009)

I need to find the matches between two files based on a key (ID): read each ID from one file, find every instance of that ID in the second file, and print the entire matching record. Here's my script:

for FILEN in ID.txt; do
ID=`nawk '$1' $FILEN`
grep ${ID} /home/user/match_file.txt >> raw_match.txt
done

it will correctly output the first ID that it encounters, but then stops with the following:

grep: 0652-033 Cannot open 4841
grep: 0652-033 Cannot open 4831
grep: 0652-033 Cannot open 4851
grep: 0652-033 Cannot open 3257
grep: 0652-033 Cannot open 4790
grep: 0652-033 Cannot open 4831
grep: 0652-033 Cannot open 5118
grep: 0652-033 Cannot open 4833
grep: 0652-033 Cannot open 4800

Those last 4 digits are the actual IDs from the ID file.


----------



## Squashman (Apr 4, 2003)

I don't think you need the curly braces.
What shell are you using?


----------



## kmel (Sep 10, 2009)

kshell

tried it without curlys, same result


----------



## Squashman (Apr 4, 2003)

Is ID.txt nothing more than the 4-digit numbers you are looking for, listed one per line?

I have never used Korn Shell, and I know the syntax for some things is sometimes different.


----------



## kmel (Sep 10, 2009)

yes, ID.txt is a list of ten unique 4-digit ID's.

match_file.txt has appx 3000 records.

Each ID from ID.txt is in match_file.txt anywhere from 8-12 times each.


----------



## Squashman (Apr 4, 2003)

You wouldn't need to use AWK or the for loop then.

Just use the -f option in grep.
grep -f ID.txt /home/user/match_file.txt >> rawmatch.txt
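
For reference, the original loop fails because `for FILEN in ID.txt` runs once over the literal file name, and `nawk '$1'` prints every line, so `ID` ends up holding all ten IDs. Unquoted, grep then takes the first ID as the pattern and the rest as file names, which is where "Cannot open 4841" comes from. If a loop is still wanted, here is a minimal sketch that reads one ID per line. The scratch directory and sample data are made up for illustration, and `mktemp -d` is assumed to be available on your system:

```shell
# Illustrative sample data in a scratch directory (not the poster's real files).
workdir=$(mktemp -d)
printf '4791\n4841\n' > "$workdir/ID.txt"
printf '4791 John Doe\n4831 Jerry Doe\n4841 Jack Doe\n' > "$workdir/match_file.txt"

# Read ID.txt one line at a time; each pass hands grep exactly one pattern.
while read -r ID; do
    # Skip blank lines so an empty pattern does not match every record.
    [ -n "$ID" ] || continue
    grep -- "$ID" "$workdir/match_file.txt"
done < "$workdir/ID.txt" > "$workdir/raw_match.txt"

cat "$workdir/raw_match.txt"
```

The redirect sits on the whole loop, so the output file is opened once instead of being appended to on every pass.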


----------



## kmel (Sep 10, 2009)

great idea, but it didn't work. Now I'm not even getting the first record to return.


----------



## Squashman (Apr 4, 2003)

As far as I know, the syntax of both scripts looks OK. I have never worked in a Korn shell, but I wouldn't think that would change the syntax of grep. I would really need to see what the two files look like to be able to troubleshoot it.


----------



## kmel (Sep 10, 2009)

ID.txt = 

4791
4841
4831
4851
3257
4790
4831
5118
4833
4800

Match File = 

4791 John Doe 
4831 Jerry Doe
4851 Jane Doe
3257 Joe Doe
4790 Justin Doe
4831 Justine Doe
5118 Justina Doe
4833 Jessica Doe
4800 Jesse Doe
4791 John Doe 
4841 Jack Doe
4831 Jerry Doe
4851 Jane Doe
3257 Joe Doe
4831 Justine Doe
5118 Justina Doe
4833 Jessica Doe
4800 Jesse Doe
4791 John Doe 
4841 Jack Doe
4831 Jerry Doe
4851 Jane Doe
3257 Joe Doe
4790 Justin Doe
5118 Justina Doe
4833 Jessica Doe
4800 Jesse Doe

Some of these names are on this list 3 times apiece, some only two. But everything on ID.txt is on Match_file more than once


----------



## Squashman (Apr 4, 2003)

I still maintain a shell account over at freeshell. This worked just fine on my shell account.



> $ ls *.txt
> 3.1.x credits.txt htpasswd.pl.txt match.txt readme.txt id.txt newline.txt
> $ grep -f id.txt match.txt >> matchnew.txt
> $ cat matchnew.txt
> ...


----------



## Squashman (Apr 4, 2003)

And for good measure my shell is


> $ echo $SHELL
> /bin/ksh


----------



## Squashman (Apr 4, 2003)

Where did your text files come from? Are they true Unix text files with just an LF terminator? If they have CR/LF line endings, your script may die.
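
One way to check for and strip carriage returns, as a rough sketch. The file names are hypothetical and the DOS-style file is simulated with `printf`; `mktemp -d` is assumed to exist on your system:

```shell
workdir=$(mktemp -d)
# Simulate an ID file saved with DOS (CR LF) line endings.
printf '4791\r\n4841\r\n' > "$workdir/ID_dos.txt"

# Hold a literal carriage return in a variable to use as the grep pattern.
cr=$(printf '\r')

# Count lines containing a CR; anything above zero means DOS line endings.
cr_lines=$(grep -c "$cr" "$workdir/ID_dos.txt" || true)
echo "lines with CR before: $cr_lines"

# Delete every carriage return and write a clean copy.
tr -d '\r' < "$workdir/ID_dos.txt" > "$workdir/ID.txt"

# grep -c prints 0 and exits non-zero when nothing matches, hence the || true.
clean_lines=$(grep -c "$cr" "$workdir/ID.txt" || true)
echo "lines with CR after: $clean_lines"
```

With a trailing CR on each pattern, `grep -f` looks for the ID followed by a CR, which never appears in a file that has plain LF endings, so nothing matches.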


----------



## kmel (Sep 10, 2009)

this worked! Turns out I had to do some minor reformatting to the ID file. Thanks for sticking with me.

You wouldn't happen to know a grep parm that will keep it from exporting the path along with the records would you?


----------



## Squashman (Apr 4, 2003)

kmel said:


> this worked! Turns out I had to do some minor reformatting to the ID file. Thanks for sticking with me.
> 
> You wouldn't happen to know a grep parm that will keep it from exporting the path along with the records would you?


Don't know. It isn't doing that on the system I am working on. Again, I am using the Korn Shell on NetBSD, so I'm not sure why yours would do that. If you look at the man page for grep you will see this.


> -H, --with-filename
> Print the filename for each match.
> 
> -h, --no-filename
> ...


grep -f id.txt match.txt >> /path/matchnew.txt

This works just like what I posted above. It doesn't and shouldn't output anything more than what it finds in the match.txt file.
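
A small sketch of when the file name shows up and how the `-h` option quoted above suppresses it. grep normally adds the name only when it is given more than one file to search. The sample files are made up, `mktemp -d` is assumed to be available, and `-h` should be confirmed in your own system's man page:

```shell
workdir=$(mktemp -d)
printf '4791\n' > "$workdir/id.txt"
printf '4791 John Doe\n4831 Jerry Doe\n' > "$workdir/match.txt"
printf '4791 John Doe\n' > "$workdir/match2.txt"

# Two files searched: each hit is prefixed with the path of the file it came from.
with_names=$(grep -f "$workdir/id.txt" "$workdir/match.txt" "$workdir/match2.txt")
echo "$with_names"

# Same search with -h: only the matching records are printed.
without_names=$(grep -h -f "$workdir/id.txt" "$workdir/match.txt" "$workdir/match2.txt")
echo "$without_names"
```

If the path appears in your output, it is worth checking whether an extra file name is reaching grep on the command line.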

How did you have the ID file formatted before?


----------



## ghostdog74 (Dec 7, 2005)

this is simple to do with awk

```
awk 'FNR==NR{a[$1];next}($1 in a)'  ID.txt file
```
FNR is the record number within the current file. NR is the total number of records read so far across all files.
FNR==NR is therefore true only while awk is processing the first file passed to it (i.e. ID.txt).
a[$1] stores the ID as a key in array "a", and next skips straight to the next record. Once the first file is finished and awk moves on to the next file, FNR!=NR, so that block no longer runs.
( $1 in a ) prints every record from "file" whose first field is a key in array "a".

Nobody explains it better than the official gawk documentation, so I would still suggest that anyone interested read the manual.
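
A quick way to try the one-liner on throwaway data. The sample records, including the extra `14791` line, are invented for illustration, and `mktemp -d` is assumed to be available:

```shell
workdir=$(mktemp -d)
printf '4791\n4841\n' > "$workdir/ID.txt"
printf '4791 John Doe\n4831 Jerry Doe\n4841 Jack Doe\n14791 Other Doe\n' > "$workdir/file"

# First file: remember each ID as an array key.
# Second file: print records whose first field is one of those keys.
matches=$(awk 'FNR==NR{a[$1];next}($1 in a)' "$workdir/ID.txt" "$workdir/file")
echo "$matches"
```

Because awk compares the whole first field, the `14791` record is not printed, whereas a plain `grep 4791` would also pick it up as a substring match.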


----------

