# Append to end of line



## Squashman

Batch files certainly aren't my strong suit. I can never figure out the linux bash equivalent in DOS.

I need to append some text to the end of every line in a file. In theory I guess I would like to just append the filename to the end of every line within the file. This is real easy with SED or AWK on Linux but can't seem to figure out the dos equivalent.


----------



## devil_himself

Greetings Squashman

Try This



Code:


@echo off
set addtext=hello!
if exist c:\tmpfile.txt del /q c:\tmpfile.txt
for /f "delims=" %%l in (c:\myfile.txt) Do (
      echo %%l %addtext% >> c:\tmpfile.txt
)
del /q c:\myfile.txt
ren c:\tmpfile.txt myfile.txt


----------



## TheOutcaste

devil_himself said:


> echo %%l %addtext% >> c:\tmpfile.txt


Note that the space between %%l and %addtext% will insert a space. If you don't want the space added, run them together:
echo %%l%addtext% >> c:\tmpfile.txt

I came up with something similar, allowing for filenames/paths with spaces:


Code:


@echo off
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:_t0 is the path to where the file you want to work on is
:located and _t1 is the file name. _t3 is the string you want to tack on
:You can always use command line parameters as well by using
:Set _t0=%1, Set _t1=%2, and Set _t3=%3
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Set _t0=c:\Test1
Set _t1=File1.txt
Set _t3=Add this string
PUSHD %_t0%
If EXIST tmp.txt del tmp.txt
For /F "usebackq delims=" %%A in ("%_t1%") do echo %%A%_t3% >>tmp.txt
del "%_t1%"
rename tmp.txt "%_t1%"
For %%A in (0 1 3) do Set _t%%A=
POPD

Problem is with both of these, any blank lines are removed from the file, so this:



Code:


Line1

Line3

becomes this



Code:


Line1 hello!
Line3 hello!

Instead of this


Code:


Line1 hello!
 hello!
Line3 hello!

Still trying to figure that part out, and/or a way to leave in the blank lines without adding anything, so the result will be this:


Code:


Line1 hello!

Line3 hello!

Also tried *For /F "usebackq delims=" %%A in (`type "%_t1%"`) do...* as the type command does send the blank lines to the screen, but the for statement ignores them.
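
For comparison, the two blank-line behaviours described above can be written down in awk, one of the Unix tools mentioned at the start of the thread. This is only a sketch and not part of the original posts: the function names are made up for illustration, and it assumes an awk is available and that the suffix contains no backslashes (awk would interpret them).

```shell
#!/bin/sh
# Hypothetical helper names; the suffix is passed as the first argument.

# Append the suffix to every line, blank lines included.
append_all() {
    awk -v suffix="$1" '{ print $0 suffix }'
}

# Append the suffix only to non-empty lines; blank lines pass through unchanged.
append_nonblank() {
    awk -v suffix="$1" '{ if (length($0) > 0) print $0 suffix; else print }'
}

printf 'Line1\n\nLine3\n' | append_all ' hello!'
printf 'Line1\n\nLine3\n' | append_nonblank ' hello!'
```

The first call gives the "add to every line" result and the second leaves the empty line empty, which are the two outcomes shown in the samples above.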

I've got an idea though....

Jerry


----------



## devil_himself

I came up With This



Code:


@echo off
setlocal 
set addtext=hello!
for /f "delims=" %%a in (myfile.txt) do (echo/|set /p =%%a%addtext% & echo\ & echo\) >>new.txt

text file



Code:


Line1

Line3

output 


Code:


Line1hello!  

Line3hello!

Edit :- Sorry It Adds A Blank Line if There's No Blank Line ...


----------



## devil_himself

Ok ..This Should do it



Code:


:bof

    @echo off
    setlocal enabledelayedexpansion
    set addtext=hello!

:init

    for /f "delims=" %%a in ('findstr /n /v /c:"&$&$&$123" myfile.txt') do (
      set str=%%a
      set /a LineCount+=1
      set /a mod = LineCount/10 + 2
      call :PROCESS "!str!" !mod!
     )
  endlocal & goto :eof

  :PROCESS
  setlocal
  set str=%~1
  set offset=%2
  set str=!str:~%offset%!
  echo.!str! %addtext%
  endlocal & goto :eof

:eof


----------



## TheOutcaste

Ah, so many ways to do things. Neat approach to stripping the leading line numbers from the findstr command, devil_himself.

Here's what I've come up with. One for statement will add the string to every line, the other will leave blank lines blank. Using the 3rd parameter on the command line chooses to add the string to blank lines.

If the last line in the file does not have a carriage return at the end, this will add one. Feature of the echo command. I've got a batch file someplace that will strip that last carriage return, but can't find it at the moment.



Code:


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:Adds a text string onto the end of every line in a file
:Save code as a .bat or .cmd file, eg, addon.cmd
:usage: addon [drive:][path]filename string_to_add [flag]
:If filename or string_to_add contain spaces, they must be in quotes
:The flag parameter can be any character. If not specified, the string_to_add
:will not be added to blank lines.
:example: addon myfile.txt " Hello World" Y
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
@echo off
If [%1]==[] If [%2]==[] echo.Usage: & echo %0 [drive:][path]filename string_to_add [flag] &goto:eof
Set _t0=%~dp1
Set _t1=%~nx1
Set _t2=%2
Set _t2=%_t2:"=%
PUSHD %_t0%
If EXIST _f0.txt del _f0.txt
If [%3]==[] goto _noblank
For /F "usebackq skip=2 tokens=1* delims=]" %%A in (`find /v /n "" "%_t1%"`) do echo %%B%_t2%>>_f0.txt
goto _cleanup
:_noblank
For /F "usebackq skip=2 tokens=1* delims=]" %%A in (`find /v /n "" "%_t1%"`) do (if [%%B]==[] (echo.>>_f0.txt) Else echo.%%B%_t2%>>_f0.txt)
:_cleanup
del "%_t1%"
rename _f0.txt "%_t1%"
For /L %%A in (0,1,3) do Set _t%%A=
POPD

Jerry


----------



## Squashman

You both are so gung ho I guess I will tell you what my ultimate plan is. I do a lot of data processing on a mainframe, and we are sometimes limited in what we can do with some of the software we have there. So I sometimes like to preprocess the files before I put them on the mainframe to do our regular data processing.

A lot of the files we get don't have an identifying keycode on them to tell us what file is what. So I thought I could just append the filename to every record so I know what original input file the record came from. When I say record I mean a line of text. We refer to them as records because each one is information about a person or company.

So on some jobs I get about 7 different files in that eventually have to be merged into one file after I have made a couple of different changes to them. But I would like to merge them all together on my PC before I put the file on the mainframe, and I will need to identify what lines (records) came from each file. That is why I need to append the filename to the end of each line: as the batch file parses each file, it appends the appropriate filename to the end of each line (record).

Here is one catch though. Each file has a header line at the beginning of each file. It identifies where each field starts within the line. I only need the header record once and it would be the first line in the output of the combined files.

I figured I could do a 
set /P _Header=<somefile.txt

This would give me the first line of that file, but I really want the batch file to be generic enough that I never have to enter file names. I just want to drop the batch file into the directory and execute it, and have it parse all the files in the current directory. So how would I do that without knowing the filenames, and output the header record only once, from the first file it finds?

The other idea behind that command was to use it with the find command to do a kind of reverse find on the file. That way it would output all the records except the header of every file.

find /V "%_Header%" somefile.txt >> combined_files.txt

So I now need to add the information I just talked about to your existing batch files you did for me.

If you guys want to know how easy it is to do this with sed....

sed -e "s/.$/filename.txt/g" filename.txt (This will add filename.txt to the end of every line in a file.) I believe the .$ means find the CRLF (carriage return line feed). In Unix it is just the dollar sign because they just use line feeds.
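
A note on that pattern, offered as a sketch rather than as a claim about how the command behaved on Squashman's machine: in sed, `.` matches any single character and `$` anchors the end of the line, so `s/.$/text/` replaces the last character of each line with the text. If the tool sees the carriage return of a CRLF file as that last character, the effect is the one described; on a file with plain LF endings the same command would eat the last real character instead. Anchoring on `$` alone appends without consuming anything. The sample data and function names below are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helpers contrasting the two substitutions on LF-terminated input.
# Assumes the text contains no /, & or backslash, which are special in a
# sed replacement.

# Replaces the LAST CHARACTER of each line with the text (what s/.$/.../ does).
replace_last_char() {
    sed -e "s/.\$/$1/"
}

# Appends the text after the last character; nothing is consumed.
append_text() {
    sed -e "s/\$/$1/"
}

# Each line loses its last character: ab,file.txt and de,file.txt
printf 'abc\ndef\n' | replace_last_char ',file.txt'
# Lines are kept whole: abc,file.txt and def,file.txt
printf 'abc\ndef\n' | append_text ',file.txt'
```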

You can download native ports of a lot of Unix utilities from SourceForge. You can use them in batch files just like any other DOS command.
http://unxutils.sourceforge.net/

I find using these things a lot more useful sometimes than trying to figure out how to do it natively in DOS.

Just an FYI, my files will never have a whole line full of spaces. Sometimes the files come in with an End of File marker, but that is about it. The end of file is usually just a line feed or a CRLF.


----------



## devil_himself

> Each file has a header line at the beginning of each file. It identifies where each field starts within the line. I only need the header record once and it would be the first line in the output of the combined files.


Sorry , I Don't Understand ... Can You Explain Giving An Example ...

Or Do You Mean To Say

Different Files May Have Different Header .. Just Extract The First Line From The First File Being Processed And Place It At The Top Of The Output File ...

Then Append The File Names To The End Of The Every Line ! And Merge All The Log Files ?


----------



## Squashman

The header records are the same for all files. I just need the header record once. It should be the first record of the combined output. 

Append the filename to the end of every record and merge all data files together.

First file will output the header record and all remaining records. Each line will have the filename it belongs to appended to the end of every line.

Name,Address,City,ST (This is the header record)
John Doe, 535 Brule Rd, San Jose, CA,filename1.txt
Jim Andersen, 512 Grand Ave, San Diego, CA,filename1.txt
Jane Smith, 423 Maple St, Seattle, WA,filename2.txt (This would be the second line of file 2. Skips the header record because I only need it once)
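
The layout described above (header once, then every record tagged with its source file) can be sketched with the Unix tools already mentioned in the thread. This is an illustration under stated assumptions, not the script anyone in the thread ran: the function name `merge_with_filename` is made up, the files are assumed to use LF line endings and share the same header, and the output file must not be one of the inputs.

```shell
#!/bin/sh
# merge_with_filename OUTPUT INPUT... (hypothetical name)
# Writes the header of the first input once, then every non-header line of
# each input with ",<filename>" appended.
merge_with_filename() {
    out=$1
    shift
    # The first line of the first input becomes the single header record.
    head -n 1 "$1" > "$out"
    for f in "$@"; do
        # NR > 1 skips each file's own header line.
        awk -v name="$f" 'NR > 1 { print $0 "," name }' "$f" >> "$out"
    done
}

# Example call (file names are placeholders):
#   merge_with_filename combined.out filename1.txt filename2.txt
```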


----------



## devil_himself

Try This



Code:


@echo off
setlocal enabledelayedexpansion
set tp=c:\tp.txt
if exist c:\tmpfile.txt del /q c:\tmpfile.txt
(for /f "tokens=*" %%a in ('dir /b /a-d *.txt') do (
   (for /f "tokens=*" %%i in ('cmd /c for /f "tokens=*" %%j in (%%a^) do echo %%j ^^^& exit') do set f=%%i)
     (for /f "skip=1 usebackq delims=" %%b in ("%%~dpnxa") do (
     echo %%b,%%~nxa >> "%tp%"
  ))
))

echo %f% >c:\tmpfile.txt
type "%tp%" >>c:\tmpfile.txt
del /q "%tp%"


----------



## Squashman

I tried running it last night before I left work but I got an error that said tp.txt does not exist.

But I was able to figure it out with some of the existing Unix utilities that have been ported to Windows.



Code:


:: delete any existing header.txt file
if exist header.txt del /q header.txt

:: delete any existing combined_output.dat files
if exist combined_output.dat del /q combined_output.dat

:: create header record to reverse match against all files and output as first record to combined_output file
:: All Input files must end in the file extension CHR
head -q -n1 *.CHR | head -n1 > header.txt

:: Add Filename Field to end of header line
:: Also Adds the delimiter and surround characters around the Field name
cat header.txt | sed -e "s/.$/:\"Filename\"/g" > combined_output.dat

:: Append the Filename to the end of each line of every file and combine all files together
:: Also adds the delimiter and surround character around the variable
:: All Input Files must end in the File extension CHR
:: FOR /F "tokens=*" %%A IN ('dir /b /a-d *.CHR') DO cat %%A | grep -v -f header.txt | sed -e "s/.$/:\"%%A\"/g" >> combined_output.dat
FOR /F "tokens=*" %%A IN ('dir /b /a-d *.CHR') DO cat %%A | grep -v -f header.txt | sed "s/.$/\":\"%%A\""/" >> combined_output.dat


:: Clean UP
del /q header.txt

Had one heck of a time with the second SED statement that appends the filename to end of every line. For some reason I had to escape the quotes a couple of times. I didn't need to do it on the header record but for some reason it made me do it on the other lines. I kept the original statement in there for reference, I still don't understand why it doesn't work.


----------



## devil_himself

So Are You Planning To Use The "Unix Utilities" Or Batch File

I tested the batch, it's running fine ... Try Running It with "Echo On" and Two Lines In Two Text Files, then Copy And Paste The Output From The Command Window .


----------



## TheOutcaste

devil_himself's file seems to work fine for me as well. One comment, it does tack on a space after the filename; If that's a problem, just need to remove the space in this line:
echo %%b,%%~nxa >> "%tp%"
so it looks like this:
echo %%b,%%~nxa>> "%tp%"

Here's what I came up with:



Code:


@Echo off
::Set Output file name here
Set _f{1}=Combined.txt
If EXIST "%_f{1}%" Del "%_f{1}%"
::Gets first filename in alphabetical order, excluding the batch file
For /F "tokens=*" %%A In ('dir /b /a-d /o:-n ^|Find /I /V "%~nx0"') Do Set _t0=%%A
::Read Header from first line in first file
For /F "usebackq tokens=*" %%A In (`Find /V /N "" "%_t0%" ^|Findstr /B /C:"[1]"`) Do Set _t1=%%A
::Output Header to temp file
>%temp%\_f{0} Echo.%_t1:~3%
::Read lines from each file excluding the batch file and excluding the header line
::Output to temp file adding ,filename to end of line
For /F "tokens=*" %%A In ('dir /b /a-d /o:n ^|Find /I /V "%~nx0"') Do (
For /F "usebackq skip=2 tokens=1* delims=]" %%B In (`Find /V /N "" "%%A" ^|Find /I /V "%_t1%"`) Do @Echo %%C,%%A>>%temp%\_f{0}
)
Move %temp%\_f{0} "%_f{1}%"
For /L %%A In (0,1,1) Do Set _t%%A=

Jerry


----------



## Squashman

I am using those Unix utilities in a batch file. They are quite powerful, as you can see from the small amount of code I had to write vs. the nice batch files you guys wrote. I guess I would like to use what you guys wrote; I have a hard time getting anyone to change at work. I will try testing it again when I get back to work. But all I can tell you is that it said it could not find that text file. I think it is because all my data files end in the file extension .CHR. Would that wreak havoc with the stuff you guys wrote? I would assume so. I was trying to figure out a way to write mine without having to know the file extension of the data files but couldn't figure it out.

I probably should mention that the filename needs to be delimited and the header record needs the word Filename appended to the end as well. The header record and the data are colon delimited and have quotes surrounding the data.

The header record and data come in like this.

"Name":"Address":"City":"State":"ZipCode"
"John Doe":"503 Grand Ave":"San Jose":"CA":"93456"

Need it to look like this:
"Name":"Address":"City":"State":"ZipCode":"Filename"
"John Doe":"503 Grand Ave":"San Jose":"CA":"93456":"File1.txt"


----------



## TheOutcaste

My script doesn't care about the extension, devil's looks for *\*.txt* -- changing that to *\*.chr* in the For statement would fix that. Don't know how his would work with quoted data, or adding a comma, but shouldn't take much of a change.

Tweaked mine to work with quoted data strings and using the colon, and this should work. Be sure to change the output file name (*Combined.txt* in the *Set _f{1}=* line) to use the extension you want:



Code:


@Echo off
::Set Output file name here
Set _f{1}=Combined.txt
If EXIST "%_f{1}%" Del "%_f{1}%"
::Gets first filename in alphabetical order, excluding the batch file
For /F "tokens=*" %%A In ('dir /b /a-d /o:-n ^|Find /I /V "%~nx0"') Do Set _t0=%%A
::Read Header from first line in first file
For /F "usebackq tokens=*" %%A In (`Find /V /N "" "%_t0%" ^|Findstr /B /C:"[1]"`) Do Set _t1=%%A
::Output Header to temp file
>%temp%\_f{0} Echo.%_t1:~3%:"Filename"
::Read lines from each file excluding the batch file and excluding the header line
::Output to temp file adding :"filename" to end of line
For /F "tokens=*" %%A In ('dir /b /a-d /o:n ^|Find /I /V "%~nx0"') Do (
For /F "usebackq skip=2 tokens=1* delims=]" %%B In (`Find /V /N "" "%%A" ^|Findstr /I /V /B /C:"[1"`) Do @Echo %%C:"%%A">>%temp%\_f{0}
)
Move %temp%\_f{0} "%_f{1}%"
For /L %%A In (0,1,1) Do Set _t%%A=

HTH

Jerry


----------



## devil_himself

Now This Script Doesn't Care About The File Type



Code:


@echo off
setlocal enabledelayedexpansion
set tp=c:\tp.txt
if exist c:\tmpfile.txt del /q c:\tmpfile.txt
(for /f "tokens=*" %%a in ('dir /b /a-d') do (
   if not %%a==%~nx0 (
   (for /f "delims=" %%i in ('cmd /c for /f "delims=" %%j in (%%a^) do echo %%j ^^^& exit') do set f=%%i)
     (for /f "skip=1 usebackq delims=" %%b in ("%%~dpnxa") do (
     echo %%b:"%%~nxa" >> "%tp%"
    )
  ))
))

echo %f%:"Filename">c:\tmpfile.txt
type "%tp%">>c:\tmpfile.txt
del /q "%tp%"

file1.txt


Code:


"Name":"Address":"City":"State":"ZipCode"
"John Doe":"503 Grand Ave":"San Jose":"CA":"93456"

file2.txt


Code:


"Name":"Address":"City":"State":"ZipCode"
"Jim Andersen":"512 Grand Ave":"San Diego":"WA":"93456"

output.txt


Code:


"Name":"Address":"City":"State":"ZipCode" :"Filename"
"John Doe":"503 Grand Ave":"San Jose":"CA":"93456":"file1.txt" 
"Jim Andersen":"512 Grand Ave":"San Diego":"WA":"93456":"file2.txt"


----------



## devil_himself

if you want the output to be like this



Code:


"Name"         :"Address"      :"City"         :"State"        :"ZipCode"      :"Filename"      
"John Doe"     :"503 Grand Ave":"San Jose"     :"CA"           :"93456"        :"file1.txt"     
"Jim Andersen" :"512 Grand Ave":"San Diego"    :"WA"           :"93456"        :"file2.txt"

then this should do it 


Code:


@echo off & setlocal enableextensions enabledelayedexpansion
  for /f "tokens=1-6 delims=:" %%a in ('type Myfile.txt') do (
   set l1=%%a               .
   set l2=%%b               .
   set l3=%%c               .
   set l4=%%d               .
   set l5=%%e               .
   set l6=%%f               .
   echo !l1:~0,15!:!l2:~0,15!:!l3:~0,15!:!l4:~0,15!:!l5:~0,15!:!l6:~0,15! >>newfile.txt
   )
  endlocal & goto :EOF


----------



## Squashman

Thanks guys. I will give it a try when I get back to work on Monday. I don't want any spaces in between the delimiters but that is pretty cool. I think I might have a use for that on another order I coordinate. How does it know to expand it out to the longest variable of each field?


----------



## devil_himself

> How does it know to expand it out to the longest variable of each field?

It's fixed at 15 characters. The spaces after the variable are what make it work, and *!l1:~0,15!* then keeps only the first 15 characters.



Code:


set l1=%%a               .

I think we can also expand it by finding the length of the longest record !
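
The same fixed-width idea can be sketched with awk for comparison. This is an illustration only, not something posted in the thread: the function name `pad_fields` is hypothetical, the width is passed in as an argument, and like the batch version it splits on every colon, so a colon inside a quoted field would be treated as a delimiter.

```shell
#!/bin/sh
# pad_fields WIDTH (hypothetical name): reads colon-delimited lines on stdin
# and pads or truncates every field to exactly WIDTH characters.
pad_fields() {
    awk -F: -v w="$1" '{
        line = ""
        for (i = 1; i <= NF; i++) {
            # Build a format such as "%-15.15s": left-justify, pad to w,
            # and cut anything longer than w.
            field = sprintf("%-" w "." w "s", $i)
            line = line (i > 1 ? ":" : "") field
        }
        print line
    }'
}

printf '"Name":"City"\n"John Doe":"San Jose"\n' | pad_fields 15
```

Padding to the longest value in each column instead of a fixed width would need a first pass over the data to measure the fields, which this sketch does not attempt.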


----------



## Squashman

I gave Devil's a test at home. I noticed in your example above that it put a Space in the Header record after Zipcode. I can't have any spaces in between the delimiters. 
"Name":"Address":"City":"State":"ZipCode" :"Filename"
"Name":"Address":"City":"State":"ZipCode":"Filename"

Also need the output written back to the directory that the input files are located. These files need to stay on the Network file Server.


----------



## Squashman

TheOutcaste said:


> My script doesn't care about the extension, devil's looks for *\*.txt* -- changing that to *\*.chr* in the For statement would fix that. Don't know how his would work with quoted data, or adding a comma, but shouldn't take much of a change.
> 
> Tweaked mine to work with quoted data strings and using the colon, and this should work. Be sure to change the output file name (*Combined.txt* in the *Set _f{1}=* line) to use the extension you want:
> 
> 
> 
> Code:
> 
> 
> @Echo off
> ::Set Output file name here
> Set _f{1}=Combined.txt
> If EXIST "%_f{1}%" Del "%_f{1}%"
> ::Gets first filename in alphabetical order, excluding the batch file
> For /F "tokens=*" %%A In ('dir /b /a-d /o:-n ^|Find /I /V "%~nx0"') Do Set _t0=%%A
> ::Read Header from first line in first file
> For /F "usebackq tokens=*" %%A In (`Find /V /N "" "%_t0%" ^|Findstr /B /C:"[1]"`) Do Set _t1=%%A
> ::Output Header to temp file
> >%temp%\_f{0} Echo.%_t1:~3%:"Filename"
> ::Read lines from each file excluding the batch file and excluding the header line
> ::Output to temp file adding :"filename" to end of line
> For /F "tokens=*" %%A In ('dir /b /a-d /o:n ^|Find /I /V "%~nx0"') Do (
> For /F "usebackq skip=2 tokens=1* delims=]" %%B In (`Find /V /N "" "%%A" ^|Findstr /I /V /B /C:"[1"`) Do @Echo %%C:"%%A">>%temp%\_f{0}
> )
> Move %temp%\_f{0} "%_f{1}%"
> For /L %%A In (0,1,1) Do Set _t%%A=
> 
> HTH
> 
> Jerry


Thanks. That worked just fine. I will test it out on our live data when I get to work this afternoon. I will have to walk thru all the code to make sure I understand it all, just in case I ever need to tweak it. I am going to do a little write up on the code you guys provided, whatever I don't understand I will post back in here and maybe you can explain it a little better. If I could throw a few echo and pause statements in there to see what each statement is doing.


----------



## TheOutcaste

Be glad to answer any questions. I guess I should use more descriptive temp variable and temp file names, as it would make it easier to follow, but using numbered ones makes clean-up real easy. Not a lot of them used in this script though.

Would be nice if the For statement would let you use something other than just a letter.

For the main output loop, instead of %%A, %%B, %%C, using StrFileName, StrRecordInput, StrRecordOutput would make it easier to follow.

Also just noticed I left an *@echo* in that line, the @ is not needed and can be removed. I tend to use @Echo a lot in testing to suppress the Echo command line and just leave the output:


Code:


For /F "tokens=*" %%A In ('dir /b /a-d /o:n ^|Find /I /V "%~nx0"') Do (
For /F "usebackq skip=2 tokens=1* delims=]" %%B In (`Find /V /N "" "%%A" ^|Findstr /I /V /B /C:"[1"`) Do @Echo %%C:"%%A">>%temp%\_f{0}
)

Jerry


----------



## TheOutcaste

Squashman said:


> I gave Devil's a test at home. I noticed in your example above that it put a Space in the Header record after Zipcode. I can't have any spaces in between the delimiters.
> "Name":"Address":"City":"State":"ZipCode" :"Filename"
> "Name":"Address":"City":"State":"ZipCode":"Filename"
> 
> Also need the output written back to the directory that the input files are located. These files need to stay on the Network file Server.


Hope Devil_himself doesn't mind me jumping in, but to remove the space, you need to remove the space between *%%j* and *^^^* in this line:


Code:


(for /f "delims=" %%i in ('cmd /c for /f "delims=" %%j in (%%a^) do echo %%j ^^^& exit') do set f=%%i)

To keep the files in the same directory, remove *c:\* from in front of the tp.txt and tempfile.txt filenames.

If your file names have spaces (file 1.chr instead of file1.chr) you will get the error "*The system cannot find the file file.*" on each filename with a space. If ALL the files have spaces in the names, the header line won't get read, but if at least one filename has no spaces, it will read the header line from that file.
That can be fixed by adding *usebackq* and putting quotes around *%%a*:


Code:


(for /f "delims=" %%i in ('cmd /c for /f "usebackq delims=" %%j in ("%%a"^) do echo %%j ^^^& exit') do set f=%%i)

Jerry

Edit: Just found out that since Devil's file doesn't delete tp.txt when it starts, if you remove the c:\ from the *set tp=c:\tp.txt* line and run this file twice, it will never end, as it will read the tp.txt file from the previous run. As that is also the output file, it may never end. So leave that line alone.
off to delete a 2,000,000+ line file....


----------



## Squashman

Where could I put in an echo statement to see what file it is working on at the moment? A lot of the files I work with are really big. Sometimes millions of records. Was hoping I could see the progression of files so I know how far along it is.


----------



## Squashman

Houston we have a problem.
I ran Outcaste's batch file on some data I have here at work and it took an eternity to run. Roughly about 3 hours for it to run. It then didn't output all the records from the input files. I should have had roughly 572,458 lines but I only ended up with 409,667. Not sure how I can troubleshoot this. 

The other weird thing is that the software I use to view large files is having a heck of a time handling the output file. It takes forever to open it and this software is designed to open large files. It chunks them into smaller sections and shows you one chunk at a time. It was taking about 15 seconds to go from chunk to chunk when it should only take about 2.

I ran the data thru my script with the Unix utilities and it doesn't seem to have any problems handling the output file from that and I also got all the output records.

Outcaste what can we do to debug your batch file? I unfortunately can't send you our customer data to test your batch file with so we are going to have to do it all on my end. Still hoping I can use your batch file or Devil's to do this.

I am going to test Devil's batch file next. Will let you know how that one comes out.


----------



## Squashman

Well here are some more results. Devil's batch file took about 32 minutes to run thru those 500,000 lines.

Then I ran my script. I put time stamps into a log file when each batch file started and stopped.

Devil's
Mon 05/05/2008 21:07:53.31
Mon 05/05/2008 21:39:11.28
Squashman's
Mon 05/05/2008 21:51:33.20
Mon 05/05/2008 21:51:43.56

I really can't explain why mine only takes 10 seconds. It is beyond my comprehension.

I ran it with the data on the Network drive vs my hard drive and it took about 4 minutes.


----------



## devil_himself

hmm .. let me see if i can tweak it .. i think the nested for loops is the problem


----------



## devil_himself

Squashman said:


> Where could I put in an echo statement to see what file it is working on at the moment? A lot of the files I work with are really big. Sometimes millions of records. Was hoping I could see the progression of files so I know how far along it is.


Run The Script With "ECHO ON" << First Line

It Also Helps To Understand How The Batch Works ... But The Command Shell Only Displays A Limited Amount of Text


----------



## devil_himself

Can You Test This Script . At This Moment it Will Not Output the First Line ...



Code:


@echo off
setlocal enabledelayedexpansion
for /f "tokens=*" %%a in ('dir /b /a-d *.chr') do (
        for /f "skip=1 usebackq delims=" %%b in ("%%~dpnxa") do (
     echo %%b:"%%~nxa" >> Output.txt
  )
)

How Large Are The Text Files ? 
Your Batch Uses SED - Stream Editor .. which is made for text manipulation .. i think that's why it takes less time


----------



## TheOutcaste

Squashman said:


> Where could I put in an echo statement to see what file it is working on at the moment? A lot of the files I work with are really big. Sometimes millions of records. Was hoping I could see the progression of files so I know how far along it is.


You can add an Echo command right after the *DO* portion of the statement in the for loop if you want to see the For variable as it progresses. Just add *@Echo %%X &&* after the DO part, where %%X is the loop variable (be sure to leave a space after the *DO*).


Squashman said:


> Houston we have a problem.
> I ran Outcaste's batch file on some data I have here at work and it took an eternity to run. Roughly about 3 hours for it to run. It then didn't output all the records from the input files. I should have had roughly 572,458 lines but I only ended up with 409,667. Not sure how I can troubleshoot this.
> 
> The other weird thing is that the software I use to view large files is having a heck of a time handling the output file. It takes forever to open it and this software is designed to open large files. It chunks them into smaller sections and shows you one chunk at a time. It was taking about 15 seconds to go from chunk to chunk when it should only take about 2.
> 
> I ran the data thru my script with the Unix utilities and it doesn't seem to have any problems handling the output file from that and I also got all the output records.
> 
> Outcaste what can we do to debug your batch file? I unfortunately can't send you our customer data to test your batch file with so we are going to have to do it all on my end. Still hoping I can use your batch file or Devil's to do this.
> 
> I am going to test Devil's batch file next. Will let you know how that one comes out.


I was going to say if the files are large, or there are a large number of them, Devil's code will be much more efficient. Mine is so slow because it basically reads each file twice: once to generate a numbered list of all records, then a second time to remove the header line. I started that way to cover a more generic situation where the header line may be repeated every XXX number of records, such as a file ready to print with the header on each page. I just modified that to exclude the line that is numbered "one" -- should have just skipped the first line and not used the Find and findstr statements.
Plus I read the entire first file just to get the header line (once with find, then with findstr), then read it again to actually process the file, so it was getting read 4 times.
I can change those, but then the only difference between devil's file and mine are our choice of variable names.

The missing lines are due to a typo in my file. There is a missing *]* in the 3rd line from the end:


Code:


For /F "usebackq skip=2 tokens=1* delims=]" %%B In (`Find /V /N "" "%%A" ^|Findstr /I /V /B /C:"[1"`) Do @Echo %%C:"%%A">>%temp%\_f{0}
should be 
For /F "usebackq skip=2 tokens=1* delims=]" %%B In (`Find /V /N "" "%%A" ^|Findstr /I /V /B /C:"[1]"`) Do @Echo %%C:"%%A">>%temp%\_f{0}

The *]* goes directly before the closing quote, with no space in between.

I was using find to number lines (adds [number] to the start of each line), then findstr to exclude line 1; without the last bracket, it excludes every line that starts with 1, so lines 10-19, 100-199, 1000-1999, etc. were dropped, which would account for 111,110 lines out of the 162,791 missing lines. Not sure about the rest of the missing lines.
I'd specifically created files with 20 lines to check that, but never noticed the change in the output file when the *]* got dropped. You could add that in and see if you get all the lines, but it will take even longer.
This will be much more efficient, which is the same way devil's file processes the records:


Code:


For /F "usebackq skip=1 delims=" %%B In ("%%A") Do Echo.%%B:"%%A">>%temp%\_f{0}
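
The dropped lines described above are easy to reproduce outside the batch file. A small sketch, with grep standing in for Findstr /V /B (an assumption made purely for illustration) and generated lines shaped like the [number] prefix that Find /N adds; the function names are hypothetical.

```shell
#!/bin/sh
# numbered N (hypothetical helper): emit N lines shaped like Find /N output,
# i.e. "[1]record", "[2]record", ...
numbered() {
    seq 1 "$1" | sed -e 's/^/[/' -e 's/$/]record/'
}

# Typo version: drops every line whose number merely STARTS with 1.
survivors_prefix_only() {
    numbered "$1" | grep -v -c '^\[1'
}

# Fixed version: drops only the line numbered exactly 1.
survivors_exact() {
    numbered "$1" | grep -v -c '^\[1]'
}

survivors_prefix_only 25   # prints 14 (line 1 and lines 10-19 are gone)
survivors_exact 25         # prints 24 (only line 1 is gone)
```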




Squashman said:


> Well here are some more results. Devil's batch file took about 32 minutes to run thru those 500,000 lines.
> 
> Then I ran my script. I put time stamps into a log file when each batch file started and stopped.
> 
> Devil's
> Mon 05/05/2008 21:07:53.31
> Mon 05/05/2008 21:39:11.28
> Squashman's
> Mon 05/05/2008 21:51:33.20
> Mon 05/05/2008 21:51:43.56
> 
> I really can't explain why mine only takes 10 seconds. It is beyond my comprehension.
> 
> I ran it with the data on the Network drive vs my hard drive and it took about 4 minutes.


A Command Prompt (aka DOS) is going to be much slower. It was never really meant to deal with the _contents_ of files, just the files themselves. The batch file commands have to be interpreted, and the external commands like find have to be called and passed parameters, whereas SED will have machine language routines to do its manipulation all internally, which can easily be hundreds of times faster as you can see.

I'm also just guessing that your software may be taking so long with the output file because DOS uses Carriage Return/LineFeed (CR/LF) to end lines. Most *nix systems just use LF. Find and For will read in lines terminated with just LF, but when the filename is added to each record, and the line written to the combined output file, each line will end with CR/LF instead of just the LF. If your software has to convert the CR/LF to just LF before displaying each chunk, it will slow it considerably.
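
Whether the two output files really differ in line endings can be checked directly. A sketch, assuming a POSIX-style shell with grep and tr is at hand (for example from the ported Unix utilities); the function names are hypothetical and this only tests the guess above, it does not confirm it.

```shell
#!/bin/sh
# count_crlf FILE (hypothetical name): number of lines ending in CR LF.
count_crlf() {
    cr=$(printf '\r')
    # A line "ends in CR" when the carriage return sits right before the LF.
    grep -c "${cr}\$" "$1"
}

# strip_cr FILE (hypothetical name): write FILE to stdout with every
# carriage return removed, leaving LF-only line endings.
strip_cr() {
    tr -d '\r' < "$1"
}

# Example calls (file names are placeholders):
#   count_crlf Combined.txt
#   strip_cr Combined.txt > Combined_lf.txt
```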

If you can hard code the header line in the batch file, devil's code above or the one I show below will be about the fastest you can get in a batch script.
Devil's method of reading the header only takes about 0.55 to 0.60 seconds for a 700,000 line (avg 88 char/line) file on my system. That shouldn't change on a per file basis, so hard coding the header would only shave about one minute off the time to process about 100 large files.
A visual basic script _might_ be a bit faster, but I'm not at all proficient writing those.

Pick one of the two For blocks below, depending on whether you want to process all files except the batch file (Option 1) or only the one extension (Option 2), and delete the other.



Code:


@Echo off
::Set Output file name here
Set _f{1}=Combined.txt
If EXIST "%_f{1}%" Del "%_f{1}%"
::Output Header to temp file
>%temp%\_f{0} Echo."Name":"Street Address":"City":"St":"Zip":"Filename"
::Read lines from each file excluding the batch file and excluding the header line
::Output to temp file adding :"filename" to end of line

::OPTION 1: processes every file in the folder except this batch file
For /F "tokens=*" %%A In ('dir /b /a-d /o:n ^|Find /I /V "%~nx0"') Do (
  For /F "usebackq skip=1 delims=" %%B In ("%%A") Do Echo.%%B:"%%A">>%temp%\_f{0}
)

::OPTION 2: processes only files with a .CHR extension
For /F "tokens=*" %%A In ('dir /b /a-d /o:n "*.chr"') Do (
  For /F "usebackq skip=1 delims=" %%B In ("%%A") Do Echo.%%B:"%%A">>%temp%\_f{0}
)
Move %temp%\_f{0} "%_f{1}%"
For /L %%A In (0,1,1) Do Set _t%%A=

I'm running a test with this, using the "more efficient" line shown above, on a sample file with 600,000 lines to see how long it takes.
I will then try this script with the header hard coded to see the difference.
Then I will try devil's file.
Running on a 3.0 GHz Pentium D.

Jerry


----------



## TheOutcaste

Ok, test file is 52,620,043 bytes, 700,000 lines with one header line.
Elapsed times are displayed below the code I ran:

Using a hard coded Header line:


Code:


@Echo off
echo.%time%>add1.log
::Set Output file name here
Set _f{1}=Combined.txt
If EXIST "%_f{1}%" Del "%_f{1}%"
::Output Header to temp file
>%temp%\_f{0} Echo."Name":"Street Address":"City":"St":"Zip":"Filename"
::Read lines from each file excluding the batch file and excluding the header line
::Output to temp file adding :"filename" to end of line
For /F "tokens=*" %%A In ('dir /b /a-d /o:n ^|Find /I /V "%~n0"') Do (
For /F "usebackq skip=1 delims=" %%B In ("%%A") Do Echo.%%B:"%%A">>%temp%\_f{0}
)
Move %temp%\_f{0} "%_f{1}%"
For /L %%A In (0,1,1) Do Set _t%%A=
echo.%time%>>add1.log

*0:37:26:53*

Reading one file to extract the header line:


Code:


@Echo off
echo.%time%>add1.log
::Set Output file name here
Set _f{1}=Combined.txt
If EXIST "%_f{1}%" Del "%_f{1}%"
::Gets first filename in alphabetical order, excluding the batch file
For /F "tokens=*" %%A In ('dir /b /a-d /o:-n ^|Find /I /V "%~n0"') Do Set _t0=%%A
::Read Header from first line in first file
For /F "usebackq tokens=*" %%A In (`Find /V /N "" "%_t0%" ^|Findstr /B /C:"[1]"`) Do Set _t1=%%A
::Output Header to temp file
>%temp%\_f{0} Echo.%_t1:~3%:"Filename"
::Read lines from each file excluding the batch file and excluding the header line
::Output to temp file adding :"filename" to end of line
For /F "tokens=*" %%A In ('dir /b /a-d /o:n ^|Find /I /V "%~n0"') Do (
For /F "usebackq skip=1 delims=" %%B In ("%%A") Do Echo.%%B:"%%A">>%temp%\_f{0}
)
Move %temp%\_f{0} "%_f{1}%"
For /L %%A In (0,1,1) Do Set _t%%A=
echo.%time%>>add1.log

*0:38:41:90*

And this is Devil's code.


Code:


@echo off
echo.%time%>add1.log
setlocal enabledelayedexpansion
set tp=c:\tp.txt
if exist tmpfile.txt del /q tmpfile.txt
(for /f "tokens=*" %%a in ('dir /b /a-d *.txt') do (
   (for /f "tokens=*" %%i in ('cmd /c for /f "usebackq tokens=*" %%j in ("%%a"^) do echo %%j^^^& exit') do set f=%%i)
     (for /f "skip=1 usebackq delims=" %%b in ("%%~dpnxa") do (
     echo %%b:%%~nxa>> "%tp%"
  ))
))

echo %f%:"Filename">tmpfile.txt
type "%tp%" >>tmpfile.txt
del /q "%tp%"
echo.%time%>>add1.log

*42:28:04*

I was very surprised that Devil's code took longer. I was running WMP10 listening to music and browsing the forums while I ran the tests on my two files. And during the first test, a scheduled backup was running (batch runs on another PC to back up this one). While Devil's code was running, I was not running WMP, was just browsing.

Hard coding the header line vs reading the file didn't make much difference, an extra 75 seconds or so. The slow part seems to be tacking on the filename and writing it to disk.

I'll try this again with two files (the 2nd will be a copy of the first) to see if the times roughly double, and repeat this test, just to see if the times are consistent.

Jerry


----------



## TheOutcaste

And just for grins, here's the data file I've been using. Just take the 10 address lines and copy and paste until there are 700,000 lines of addresses:



Code:


"Name":"Street Address":"City":"St":"Zip"
"TechGuy Inc":"[PO] Box 268":"Waynesboro":"PA":"17268"
"John Smith":"128 SE Main St":"Seatle":"WA":"98687"
"RALPH C WILSON JR STADIUM":"[1] BILLS DR":"ORCHARD PARK":"NY":"14127-2237"
"Diplomatic Representation of Afghanistan in the US":"2341 Wyoming Avenue NW":"Washington":"DC":"20008"
"Diplomatic Representation of Andorra in the US: Chancery":"United Nations Plaza 25th Floor":"New York":"NY":"10017"
"General Consulate of Argentina in Houston":"1990 South Post Oak Road Suite 770":"Houston":"TX":"77056"
"Honorary Consulate of Fiji":"2050 W. 190th St. Suite 102":"Torrance":"CA":"90504"
"Honorary Consul Consulate of Finland":"1230 Peachtree Street NE Suite 3100":"Atlanta":"Georgia":"30309"
"Honorary Consulate of Ireland":"2511 NE 31st Court Lighthouse Point":"Fort Lauderdale":"FL":"33064"
"George W. Bush":" 1600 Pennsylvania Ave":" Washington":" DC":"20008"


----------



## Squashman

I retested Devil's new code and it took 27 minutes.

All my files do have CR/LF. I think I am just going to have to make everyone bite the bullet and use my batch file with the ported utilities, at least if they want their job to finish quickly. They may be OK with it running for half an hour.

I was thinking of just putting all my ported utilities out on our shared drive and then putting the full pathnames to the commands into my batch file. That way they won't have to worry about copying any of them over to their working folder, or copying them to their PC and adding the directory to their path statement.

Could I create a temporary path statement to the utilities on the network drive so that I wouldn't have to put the full path to all the commands in my batch file? That way they wouldn't have to install them onto their PC and add to their path statement.

Here is my current batch file. I added some logging into it.


Code:


@echo off
:: Time Stamp Start Time
if exist appendlog.log del /q appendlog.log
Echo exit|cmd /q /k prompt $D $T>>appendlog.log

:: comment log
echo.>>appendlog.log
echo.>>appendlog.log 
echo Input>>appendlog.log

:: Get Quantity of Input Records for each file
echo "Calculating input quantities"
for /f "tokens=2,3 delims=: " %%i in ('find /v /c "SomeStringNotToBeFound" *.CHR') do (
for  /f "tokens=1" %%l in ('set /A _NumLines^=%%j-1') do echo %%i %%l>>appendlog.log)

:: delete any existing header.txt file
if exist header.txt del /q header.txt

:: delete any existing combined_output.dat files
if exist combined_output.dat del /q combined_output.dat

:: create header record to reverse match against all files and output as first record to combined_output file
:: All Input files must end in the file extension CHR
head -q -n1 *.CHR | head -n1 > header.txt

:: Add Filename Field to end of header line
:: Also Adds the delimiter and surround characters around the Field name
cat header.txt | sed "s/.$/:\"Filename\"/" > combined_output.dat


:: Append the Filename to the end of each line of every file and combine all files together
:: Also adds the delimiter and surround character around the variable
:: All Input Files must end in the File extension CHR
:: FOR /F "tokens=*" %%A IN ('dir /b /a-d *.CHR') DO cat %%A | grep -v -f header.txt | sed -e "s/.$/:\"%%A\"/g" >> combined_output.dat
FOR /F "tokens=*" %%A IN ('dir /b /a-d *.CHR') DO echo "Working on file: %%A" && cat %%A | grep -v -f header.txt | sed "s/.$/\":\"%%A\""/" >> combined_output.dat


:: Clean UP
del /q header.txt

:: comment log
echo.>>appendlog.log
echo Output>>appendlog.log

:: Get Quantity of Output file
echo "Calculating Output Quantity"
for /f "tokens=2,3 delims=: " %%i in ('find /v /c "SomeStringNotToBeFound" combined_output.dat') do (
for  /f "tokens=1" %%l in ('set /A _OutLines^=%%j-1') do echo %%i %%l>>appendlog.log)


:: Time Stamp finish time.
Echo exit|cmd /q /k prompt $D $T>>appendlog.log

echo "All done!!!!"
echo "Do you want to view the log file? Y/N"
set /P _Answer=

:: Open AppendLog if Answer equals Y. Not case sensitive.
if /I "%_Answer%"=="Y" appendlog.log

I am currently using the unix utils from here:
http://unxutils.sourceforge.net/

But maybe I can get everyone to install Windows Services for Unix. I could go back to my good ole days of writing shell scripts again.
http://www.microsoft.com/downloads/...88-601B-44F1-81A4-02878FF11778&displaylang=en


----------



## TheOutcaste

When you run a batch file, it starts cmd.exe. All changes made to System Variables are local to that process, and are discarded when the program ends and the cmd.exe process is ended.
If you open a command prompt, run a batch file that changes the Path variable, then when the batch file ends, the change is still present, but will be discarded when the command prompt window is closed.
You can also start your file with a setlocal command. Then all variables are discarded when the file ends.
Or you can save the current path, change it as desired, then set it back as part of the cleanup. This gives a little assurance that it will work with differing versions of Windows/DOS

You can also set a variable to the path, i.e. *set pth=\\server\share\*, then call the commands using that variable: %pth%sed, %pth%head and so on.

Just be sure, if you change the Path variable, that you include at least *%systemroot%\system32;%systemroot%* if not the entire current path, else Windows won't be able to find the external commands like find and findstr.
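
A minimal sketch of the setlocal approach, in case it helps. The share name \\server\share\unxutils is only a placeholder (not a real location), and the head command is borrowed from the batch file posted earlier; treat it as an outline rather than a tested script:

```bat
@echo off
rem setlocal keeps the PATH change private to this batch file.
setlocal
rem \\server\share\unxutils is a hypothetical location; use your real share.
rem The existing PATH is kept, so find, findstr and friends are still found.
rem Appending (not prepending) keeps the Windows find.exe ahead of any
rem find.exe that ships with the ported utilities.
set "PATH=%PATH%;\\server\share\unxutils"
rem head now resolves from the share without a full path.
head -q -n1 *.CHR | head -n1 > header.txt
rem endlocal (or simply reaching the end of the file) restores the old PATH.
endlocal
```

Since the change is undone when the batch file ends, nobody has to install anything locally or edit their own path statement.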

I re-ran my timing tests, rebooting my PC before each test, and all three versions took 35 minutes +/- 15 seconds. I did have a couple of backup processes that started during two of the tests and ran for about 2 minutes, so there might be a wider variance, but it will be nowhere near as fast as using the Unix utilities.
I haven't tried converting the batch file to an exe file, that might make it a bit faster, but I'm not sure how much.
This roughly matches your time taking file size into account: (562000/700000)*35 minutes = 28.1 minutes

In your script above for Calculating Output Quantity, as you are only checking the one output file (combined_output.dat) and not multiple input files, the second For loop can be removed. This should give the same result:


Code:


for /f "tokens=2,3 delims=: " %%i in ('find /v /c "SomeStringNotToBeFound" combined_output.dat') do (
  set /A _OutLines=%%j-1
  >>appendlog.log call echo %%i %%_OutLines%%
)

probably won't make more than a .02 second difference though.

Now to find my bat2exe file...

Jerry


----------



## Squashman

We used the bat2exe thing when I worked for the school district. There seem to be a lot of different ones out on the web. I am wondering if it would be able to compile my batch file that uses my unix utilities.


----------



## Squashman

Thanks for reminding me about the Bat 2 exe thingy. I was able to compile my Batch script with all the ported Utilities into one Executable. 

I appreciate all the help you guys have given me. I wouldn't have been able to write half the stuff I added to my batch file without looking at some of your code. 

I am hoping both of you can help me with a few more things. This one may require the use of grep, but I am not sure. I think we might be able to use Findstr instead, if I am correct that it can take all of its search parameters from a file.

My next order of business is to do a profanity search. I have a file with a list of profanity words in it, one word per line. I need to search a data file and output the lines that match any of the profanity words to one file and the ones that don't match to another file. I think I could do that with the For, Type, Echo and Findstr commands: get the errorlevel of findstr and echo the line to the appropriate output file. Something like

for /f %%a in (type somefile.txt) do echo %%a | findstr /g:profanitywords.txt | if errorlevel==0 echo %%a >> profanity_output.txt else echo %%a>>not_profanity.txt

Something like that. I know the syntax isn't correct. Just trying to convey what I am thinking in a more technical manner.
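
For what it's worth, here is a rough sketch of how that might look with Findstr alone, skipping the For loop entirely. The file names are the ones from my line above; the choice of switches is my assumption about what the job needs, and I have not run it against real data:

```bat
@echo off
rem /G:file reads the search strings from a file, one per line.
rem /I ignores case and /L treats each word literally (no regular expressions).
findstr /I /L /G:profanitywords.txt somefile.txt > profanity_output.txt
rem /V inverts the match: only lines containing none of the words.
findstr /I /L /V /G:profanitywords.txt somefile.txt > not_profanity.txt
```

Note that this matches substrings, so a listed word will also hit longer words that contain it.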

I should really start a new thread.


----------



## Squashman

devil_himself said:


> Now This Script Doesn't Care About The File Type
> 
> 
> 
> Code:
> 
> 
> @echo off
> setlocal enabledelayedexpansion
> set tp=c:\tp.txt
> if exist c:\tmpfile.txt del /q c:\tmpfile.txt
> (for /f "tokens=*" %%a in ('dir /b /a-d') do (
> if not %%a==%~nx0 (
> (for /f "delims=" %%i in ('cmd /c for /f "delims=" %%j in (%%a^) do echo %%j ^^^& exit') do set f=%%i)
> (for /f "skip=1 usebackq delims=" %%b in ("%%~dpnxa") do (
> echo %%b:"%%~nxa" >> "%tp%"
> )
> ))
> ))
> 
> echo %f%:"Filename">c:\tmpfile.txt
> type "%tp%">>c:\tmpfile.txt
> del /q "%tp%"


Trying to understand the syntax of your batch file. Could you explain a few things for me?

if not %%a==%~nx0 (You are comparing the current filename to what?)

What are these variables getting assigned:
%%~dpnxa
%%~nxa

Not quite understanding the whole Tilde thing.


----------



## devil_himself

To use the FOR command in a batch program, specify %%variable instead
of %variable. Variable names are case sensitive, so %i is different from %I.

Inside a batch script, two % signs are required in %%variable. The first % escapes the second,
so that the for command actually works. On the command line itself, you only use one %.
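
A small illustration of the difference (the loop values are just made-up words):

```bat
@echo off
rem In a batch file the FOR variable needs two percent signs.
for %%I in (one two three) do echo %%I
rem Typed directly at the command prompt, the same loop uses one:
rem   for %I in (one two three) do echo %I
```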

*Complete Modifier List - for /?*



Code:


    %~I         - expands %I removing any surrounding quotes (")
    %~fI        - expands %I to a fully qualified path name
    %~dI        - expands %I to a drive letter only
    %~pI        - expands %I to a path only
    %~nI        - expands %I to a file name only
    %~xI        - expands %I to a file extension only
    %~sI        - expanded path contains short names only
    %~aI        - expands %I to file attributes of file
    %~tI        - expands %I to date/time of file
    %~zI        - expands %I to size of file
    %~$PATH:I   - searches the directories listed in the PATH
                   environment variable and expands %I to the
                   fully qualified name of the first one found.
                   If the environment variable name is not
                   defined or the file is not found by the
                   search, then this modifier expands to the
                   empty string

The modifiers can be combined to get compound results:

    %~dpI       - expands %I to a drive letter and path only
    %~nxI       - expands %I to a file name and extension only
    %~fsI       - expands %I to a full path name with short names only
    %~dp$PATH:I - searches the directories listed in the PATH
                   environment variable for %I and expands to the
                   drive letter and path of the first one found.
    %~ftzaI     - expands %I to a DIR like output line

==
1.if not %%a==%~nx0 (You are comparing the current filename to what?)



Code:


     %~nI        - expands %I to a file name only
     %~xI        - expands %I to a file extension only
     %0           - batch file

>> %~nx0 --> name and extension of batch file
>> %%~dpnxa --> drive,path,name,extension of the current file
>> %%~nxa --> name and extension of the current file
==


----------



## TheOutcaste

The %0 variable will give you the name of the batch file *as it was called*.
This is important, as %0 by itself may not include the extension or path, so you have to use the modifiers to be sure of what you are getting. Note that %0 by itself will return with the same case as you called the file.

Examples for a batch file called Test.cmd located in the c:\scripts\temp folder:
C:\scripts\temp>test
%0 will be test
C:\scripts\temp>teSt
%0 will be teSt
C:\scripts\temp>.\test
%0 will be .\test
C:\scripts\temp>..\temp\test
%0 will be ..\temp\test
C:\scripts>temp\test
%0 will be temp\test

%~nx0 will always return as Test.cmd including the case of the filename.
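
If you want to try this yourself, a throwaway batch file along these lines should show the difference. The labels in the echo lines are just my wording; save it as Test.cmd and call it in the different ways listed above:

```bat
@echo off
rem %%0 prints a literal %0; the plain %0 is the name exactly as typed.
echo As called (%%0):       %0
rem The modifiers resolve the real file, whatever was typed.
echo Name and extension:    %~nx0
echo Drive and path:        %~dp0
echo Fully qualified name:  %~f0
```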

Jerry


----------



## Squashman

Just like $0 in a bash script. I was thinking that is what it was but wasn't sure. The modifier list certainly comes in handy.


----------



## Squashman

Can the modifier list only be used in a For statement?


----------



## devil_himself

Squashman said:


> Can the modifier list only be used in a For statement?


hmm .. no



Code:


@echo off
  setlocal
  if "%~1"=="" (
    echo Please specify the name of the file
    goto :eof
  )
 
  if %~x1==.pdf echo this is PDF 
  if %~x1==.xls echo this is xls




Code:


::Determine The Path That a Batch File Is Run From
Echo %~dp0


----------



## Squashman

I guess I should have phrased that differently. Can you use the modifiers with a regular variable? Let's say you already set a variable to the full path with the filename. Can you use that variable with the modifiers?


----------



## devil_himself

Squashman said:


> Let's say you already set a variable to the full path with the filename. Can you use that variable with the modifiers?


If I Understand Correctly Then



Code:


::extract the last part of a path
@echo off 
setlocal
set var=C:\ABC\DEF\GHI\IJK
echo %var%
for %%a in ("%var%") do echo %%~na


----------



## Squashman

I figured as much. I thought it would be easier than that. I was hoping I could just do it in a set statement but I guess you can't use the modifiers with a variable you already set. I was hoping you could do it like this.
set var=C:\ABC\DEF\GHI\IJK\some.txt
set new_var=%%~dpvar


----------



## TheOutcaste

Would be nice if it was that easy, but it only works with *For* variables or the batch parameters %0-%9. A *For* statement is probably easiest, but you can also use call:



Code:


set _filename=c:\test1\file.txt
call :_getfnext %_filename%
call :_getpath %_filename%
echo.Filename and extension is %_fn%
echo.Path to %_fn% is %_pn%
set var=C:\ABC\DEF\something.pdf
call :_getfnext %var%
echo.Filename and extension is now %_fn%
goto:eof
:_getfnext
set _fn=%~nx1
goto:eof
:_getpath
set _pn=%~dp1
goto:EOF

A little oddity is that you don't seem to need the surrounding *%* symbols to get the path name; *call :_getpath _filename* gave me the same result as *call :_getpath %_filename%*. That may only be because %~dp1 resolves a bare name against the current folder, so it would match only when the file lives there.

I haven't tried all the modifiers to see which ones behave that way, but best use the *%* symbols properly in case other versions of cmd don't behave the same.

EDIT: Just to avoid confusion, the character after %~nx and %~dp is the number one (the first batch parameter), not a lower case "L".


----------



## devil_himself

you can do it this way



Code:


@echo off
set var=%1
set new_var=%~dp1
echo %new_var%

edit - i'm a slow typer


----------



## Squashman

You both sure are schooling me on batch files. Thanks for all the help.


----------



## Squashman

devil_himself said:


> if you want the output to be like this
> 
> 
> 
> Code:
> 
> 
> "Name"         :"Address"      :"City"         :"State"        :"ZipCode"      :"Filename"
> "John Doe"     :"503 Grand Ave":"San Jose"     :"CA"           :"93456"        :"file1.txt"
> "Jim Andersen" :"512 Grand Ave":"San Diego"    :"WA"           :"93456"        :"file2.txt"
> 
> then this should do it
> 
> 
> Code:
> 
> 
> @echo off & setlocal enableextensions enabledelayedexpansion
> for /f "tokens=1-6 delims=:" %%a in ('type Myfile.txt') do (
> set l1=%%a               .
> set l2=%%b               .
> set l3=%%c               .
> set l4=%%d               .
> set l5=%%e               .
> set l6=%%f               .
> echo !l1:~0,15!:!l2:~0,15!:!l3:~0,15!:!l4:~0,15!:!l5:~0,15!:!l6:~0,15! >>newfile.txt
> )
> endlocal & goto :EOF


As I was saying earlier, I think I could use this example with a different order I work on. Again, I want to append the filename to the end of each line, but this file is already in a fixed format. The filenames are anywhere from 5 to 8 positions long, but I want to maintain the same line length for all records. So I am thinking I could integrate your example above with my current batch file. I would just expand the filename out to 10 positions by adding the appropriate number of blanks to the end. I kind of see that your set statement adds 16 positions to the variable, and I assume the echo statement then outputs the first 15 positions of the variable. I am thinking I might be able to put that into my SED statement to do the replace. Not sure. Will have to test.
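
A minimal sketch of that padding idea on its own, before wiring it into anything else. The variable name _padded and the square brackets in the echo are only there for illustration, and I am assuming the input files are the *.txt files in the current folder:

```bat
@echo off
setlocal enabledelayedexpansion
for %%A in (*.txt) do (
    rem Tack ten spaces onto the name, then keep only the first ten characters.
    set "_padded=%%~nA          "
    set "_padded=!_padded:~0,10!"
    rem The brackets just make the trailing blanks visible on screen.
    echo [!_padded!]
)
endlocal
```

Delayed expansion (the ! marks) is needed because the variable changes on every pass through the loop.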


----------



## Squashman

found this neat little utility called Swiss File Knife.

Has a nice option called addtail. Works like a charm.



Code:


FOR /F "tokens=1* delims=." %%A IN ('dir /b /a-d *.TXT') DO echo "Working on file: %%A" && type "%%A.TXT" | sfk addtail %%A >> FILES_COMBINED.txt






----------



## Squashman

Not quite sure why this isn't working the way I want it to. It doesn't seem to add the spaces to the end of the filename. It only adds the filename. My filenames are 6 to 8 characters long but I want to make the filenames 10 characters so I add some spaces. The reason I am doing that is so the line lengths are the same for all lines on the output.



Code:


:: Append the Filename to the end of each line of every file and combine all files together
FOR /F "tokens=1* delims=." %%A IN ('dir /b /a-d *.TXT') DO (
     set _Tfile=%%A               .
     set _Filename=!_Tfile:~0,10!
     echo "Working on file: %%A"
     type "%%A.TXT"|sfk addtail !_Filename! >> FILES_COMBINED.tmp
     )


----------



## Squashman

Ignore the stupid person posting. Why do I always forget quotes? Devil, thanks a lot for that tip on substrings.


----------



## TheOutcaste

Just a suggestion:
With the FOR /F "tokens=1* delims=." you can't use names with a period. Example, *my.file.txt* will end up with *_Filename* being "*my* " (*my* followed by 8 spaces) and the type statement will be *type my.txt* which may not exist, and the File not found error may go unnoticed. Worse, it may actually exist, so it will get processed twice while *my.file.txt* gets skipped. You may not be using filenames with that format now, but who knows what someone will try 6 months from now.

The following will get the complete filename into the %%A variable, then uses %%~nA to specify the name portion only, so names with a period will be preserved. I haven't had a chance to download and play with sfk, so I'm guessing the missing quotes should be around *!_Filename!*.
Looks like a very useful tool. Thanks for finding that!



Code:


:: Append the Filename to the end of each line of every file and combine all files together
FOR /F "tokens=*" %%A IN ('dir /b /a-d *.TXT') DO (
     set _Tfile=%%~nA               .
     set _Filename=!_Tfile:~0,10!
     echo "Working on file: %%~nA"
     type "%%A"|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp
     )

HTH

Jerry


----------



## Squashman

I like how you think. That should make it just about foolproof.

Still wondering if I could do my file counts differently. Would be nice if I could count the records as I am processing the files, instead of counting them before and after. But I think if I do that I will start running into memory issues like we did with the profanity batch. I just do that to verify my input and output file counts.

Wasn't sure if I could put something in there that would increment a counter for each filename as it is processed. I assume this wouldn't work because I am echoing(typing) the file to another command and then redirecting output. I assume I couldn't put a counter in there because it would increment the counter until it was done typing the file. I thought maybe I could use the filename variable as part of the counter variable and then echo that variable to my Log file at the end of the batch in another loop. This would only work I assume if the loop was parsing one line of the file at a time. Instead the loop is controlled by the filename being processed.
set /a !_Filename!_count+=1

I suppose I could create another For Loop inside the first For Loop that types the file but again if it tries to load the entire file into memory, I think that it will choke on memory again.



Code:


FOR /F "tokens=*" %%A IN ('dir /b /a-d *.TXT') DO (
     set _Tfile=%%~nA               .
     set _Filename=!_Tfile:~0,10!
     echo "Working on file: %%~nA"
     FOR /F %%B IN ('type %%A') DO (
     set /a !_Filename!_count+=1
     echo.%%B|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp
     )
     echo.!_Filename!_count >>Appendlog.log
     )

Not even sure if that will work. I am at home right now. Would have to test it on a large file later on when I get to work tonight.


----------



## TheOutcaste

There ya go forgetting those quotes again. Also, you need to specify "tokens=*" or remove the delimiters with "delims=", or any line containing a space or tab will be truncated.

FOR /F "tokens=*" %%B IN ('type "%%A"') DO (

Not sure about the memory issue. Testing will tell. The extra FOR loop will likely slow it down quite a bit though.

Find can be used to count lines, but it's not very fast. I didn't time it counting my file with 28,799,994 records (about 80 bytes each), but it seemed like about 10-20 minutes. I had two counts running, and the PC was recording TV at the same time though. I just typed the commands; I should have run a batch file and logged the times.
Find was faster than findstr though, and didn't hang like findstr did when it finished.
You might try starting a separate batch file to do the count, and let it run in another process. Haven't tested this, might have to use start so that it doesn't wait for it to finish, but that would open another command prompt window.



Code:


FOR /F "tokens=*" %%A IN ('dir /b /a-d *.TXT') DO (
     rem New line: hand the count off to count.bat
     cmd /c count.bat "%%A"
     set _Tfile=%%~nA               .
     set _Filename=!_Tfile:~0,10!
     echo "Working on file: %%~nA"
     FOR /F "tokens=*" %%B IN ('type "%%A"') DO (
     set /a !_Filename!_count+=1
     echo.%%B|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp
     )
     echo.!_Filename!_count >>Appendlog.log
     )

count.bat would contain:


Code:


>>appendlog.log find /v /c "SomeStringNotToBeFound" %1

Or whatever method you are using to count records. Find outputs the file name, if your counting some other way just add


Code:


>>appendlog.log echo.File %1 count is:

to the start of count.bat

Jerry


----------



## EAFiedler

Thread reopened on request.


----------



## Squashman

Still haven't gotten around to testing the above changes but I am hoping it works because I would assume it has to be faster than parsing the data twice. Once to count the input and then appending the filename the second time.

Currently I have been using the find command to figure the number of records in the file. This has worked quite fine except for when I get a file in with an end of file byte. The last line will be a hex 1A. Now the utility that I use to append to the end of the line seems to be smart enough and drops the EOF marker when it is appending all the data together. But as I said before, I count all the data first before it is appended, and find seems to think it is another record and makes the input count one higher than it is supposed to be.

The output count comes out correct though because SFK which appends to the end of the record seems to ignore the hex 1A at the end of the file and just moves to the next file.


----------



## Squashman

So I finally got around to testing this but it didn't work on the data. If the data had an ampersand in it, it would basically stream the line as far as the ampersand and then append the filename at that point. So if the line has somebody's name with a dual title, like Mr & Mrs John Smith, it would output it like Mr filename instead of Mr & Mrs John Smith filename. I would end up with the rest of the line just dropped from the data.

Here is the current code I am using. The meat of the code that does all the processing is the nested for loop. I get a Missing Operator error when running the code but it never bombs. It does process all the data but truncates it whenever it finds an ampersand. It also doesn't output the file counts correctly.

I would expect
Filename 1450
But it outputs
Filename _count 


Code:


@echo off & setlocal enableextensions enabledelayedexpansion

Set _Opsheet=appendlog.log

:: Time Stamp Start Time
if exist %_Opsheet% del %_Opsheet%
Echo exit|cmd /q /k prompt $D $T>>%_Opsheet%

:: comment log
echo.>>%_Opsheet%
echo.>>%_Opsheet% 
echo Input>>%_Opsheet%

:: delete any existing EASTER_SEALS_FILES_COMBINED files
if exist FILES_COMBINED.tmp del /q FILES_COMBINED.tmp
if exist FILES_COMBINED.txt del /q FILES_COMBINED.txt

::  This is the old CODE for counting and appending
:: Get Quantity of Input Records for each file
:: echo "Calculating input quantities"
::for /f "tokens=2,3 delims=: " %%i in ('find /v /c "SomeStringNotToBeFound" *.TXT') do (
::     set _Ifile=%%i               .
::     set _Ifilename=!_Ifile:~0,20!
::     echo !_Ifilename! %%j>>%_Opsheet%
::     )
::
:: Append the Filename to the end of each line of every file and combine all files together
:: FOR /F "tokens=1* delims=." %%A IN ('dir /b /a-d *.TXT') DO (
::     set _Tfile=%%A               .
::     set _Filename=!_Tfile:~0,15!
::     echo "Working on file: %%A"
::     type "%%A.TXT"|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp
::     )
:: End of OLD code for counting and appending

:: NEW CODE for Counting and Appending
:: Trying to do both in these nested for loops.
FOR /F "tokens=*" %%A IN ('dir /b /a-d *.TXT') DO (
     set _Tfile=%%~nA               .
     set _Filename=!_Tfile:~0,15!
     echo "Working on file: %%~nA"
     FOR /F "tokens=*" %%B IN ('type "%%A"') DO (
     set /a !_Filename!_count+=1
     echo.%%B|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp
     )
     echo.!_Filename!_count >>%_Opsheet%
     )



:: comment log
echo.>>%_Opsheet%
echo Output>>%_Opsheet%

:: Rename Temp file
ren FILES_COMBINED.tmp FILES_COMBINED.txt


:: Get Quantity of Output file
echo "Calculating Output Quantity"
for /f "tokens=2,3 delims=: " %%i in ('find /v /c "SomeStringNotToBeFound" FILES_COMBINED.txt') do echo %%i %%j>>%_Opsheet%


:: Time Stamp finish time.
Echo exit|cmd /q /k prompt $D $T>>%_Opsheet%

echo "All done!!!!"
echo "Do you want to view the log file? Y/N"
set /P _Answer=

:: Open AppendLog if Answer equals Y. Not case sensitive.
if /I "%_Answer%"=="Y" start notepad %_Opsheet%


----------



## Squashman

So I tried putting quotes around the %%B thinking this is what was giving me the missing operator error. But when I did that it output the data with quotes and I still got the missing operator. It did give me all the data but it looked like this.

"Mr & Mrs John Smith" filename
So I got all my data but with quotes around it. I can't have the quotes. I suppose I could tack on another pipe to get rid of the quotes with sed but I fear that some day I may get data in that has quotes in it. Sometimes we get data in that is delimited and surrounded with quotes. So I assume the missing operator error is from the set command.
set /a !_Filename!_count+=1


----------



## TheOutcaste

Another quirk of Echo
You can echo a line with an ampersand, but if you pipe it, it sees the ampersand as the command "joining" symbol, to run multiple commands on one line.

Easiest way around it is quote the %%B variable:
echo.*"*%%B*"*|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp

That adds the quotes to the output, though.
This version avoids them:



Code:


 FOR /F "tokens=*" %%A IN ('dir /b /a-d *.TXT') DO (
 set _Tfile=%%~nA               .
 set _Filename=!_Tfile:~0,15!
 echo "Working on file: %%~nA"
 FOR /F "tokens=*" %%B IN ('type "%%A"') DO (
  set /a !_Filename!_count+=1
  Set _tmpstr=%%B
  Set _tmpstr=!_tmpstr:^&=^^^&!
  echo.!_tmpstr!|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp
 )
     echo.!_Filename!_count >>%_Opsheet%
     )

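!_tmpstr:^&=^^^&!
To use the ampersand and the caret literally, each has to be escaped with a caret. Reading left to right: ^& is an escaped ampersand (the text to find), and ^^ followed by ^& is an escaped caret plus an escaped ampersand (the replacement text).
So this line changes & to ^&, so the ampersand is escaped and actually echoed when you pipe the echo command.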

EDIT: Guess I should have read the whole thing, didn't look into the missing operator part yet

HTH

Jerry


----------



## TheOutcaste

Seems Set doesn't like using a variable to specify part of the name when using /A. I'm guessing it sees *!_Filename!* as one variable and expects an operator to follow, rather than tacking *_count* onto *!_Filename!* to create the variable name.
Not sure why you need the filename as part of the variable, as you only output the count, not the filename.
So use this instead:


Code:


set /a _count+=1
...
echo.!_count! >>%_Opsheet%

or if you need the filename on the line with the count:
echo.!_Filename! count is !_count! >>%_Opsheet%
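
Another possible cause of the missing operator error, offered as an assumption rather than something confirmed in this thread: _Filename is padded with spaces to 15 characters, so the set /a line expands to something like "set /a File1          _count+=1", and SET /A then sees two names with no operator between them. A minimal sketch to check this, using a made-up file name File1 and a hypothetical helper variable _Name:

```bat
@echo off
setlocal enabledelayedexpansion

rem Assumed sample name, padded to 15 characters the same way as in the loop
set "_Tfile=File1               ."
set "_Filename=!_Tfile:~0,15!"

rem The brackets make the trailing padding visible
echo [!_Filename!]

rem Expands to: set /a File1          _count+=1
rem Two names separated only by spaces, so SET /A is expected to complain
set /a !_Filename!_count+=1

rem With an unpadded name the same construct should work
set "_Name=File1"
set /a !_Name!_count+=1
echo File1_count is !File1_count!
```

If that is the cause, a file name that contains spaces or hyphens would trip the same error, so a plain _count variable is still the simpler choice.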

HTH

Jerry


----------



## Squashman

Yeah, I realized yesterday that I was only outputting the count. I actually need to output the filename and the count. At some point I need to reset the count back to zero, though, so that it doesn't keep counting up when it changes to a different filename. That is why I was using the filename as part of the variable.


----------



## TheOutcaste

Just initialize the _count variable to zero for each file:


Code:


FOR /F "tokens=*" %%A IN ('dir /b /a-d *.TXT') DO (
 set _count=0
 set _Tfile=%%~nA               .
 set _Filename=!_Tfile:~0,15!
 echo "Working on file: %%~nA"
 FOR /F "tokens=*" %%B IN ('type "%%A"') DO (
  set /a _count+=1
  Set _tmpstr=%%B
  Set _tmpstr=!_tmpstr:^&=^^^&!
  echo.!_tmpstr!|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp
  )
 echo.!_Filename! count is !_count! >>%_Opsheet%
)

Jerry


----------



## Squashman

I modified the code a bit more just to have a total input count as well. I ran about 2143 records thru the batch file and it took 7 minutes to complete reading and writing to the network drive. I then ran it locally with all the data on my hard drive and it still took 1:52. Not sure what is slowing it down. It never ran that slow before over the network. But I guess I can't complain. It works the way I want it to now. I have to assume that adding in that additional set command to escape the "&" is slowing things down a bit. I still do a find command at the end as well to determine the quantity of the output file. That command just takes a couple seconds to run. My line length is about 819 bytes that I am appending the filename to. I think I am better off going back to the old way with counting first with find.


Code:


set _Tcount=0
FOR /F "tokens=*" %%A IN ('dir /b /a-d *.TXT') DO (
 set _count=0
 set _Tfile=%%~nA               .
 set _Filename=!_Tfile:~0,15!
 echo "Working on file: %%~nA"
 FOR /F "tokens=*" %%B IN ('type "%%A"') DO (
  set /a _count+=1
  set /a _Tcount+=1
  Set _tmpstr=%%B
  Set _tmpstr=!_tmpstr:^&=^^^&!
  echo.!_tmpstr!|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp
  )
 echo.!_Filename! count is !_count! >>%_Opsheet%
)
echo.Input Count is !_Tcount! >>%_Opsheet%


----------



## TheOutcaste

Instead of incrementing _Tcount for each line, try just adding the _Count values for each file. Should be faster.



Code:


set _Tcount=0
FOR /F "tokens=*" %%A IN ('dir /b /a-d *.TXT') DO (
 set _count=0
 set _Tfile=%%~nA               .
 set _Filename=!_Tfile:~0,15!
 echo "Working on file: %%~nA"
 FOR /F "tokens=*" %%B IN ('type "%%A"') DO (
  set /a _count+=1
  Set _tmpstr=%%B
  Set _tmpstr=!_tmpstr:^&=^^^&!
  echo.!_tmpstr!|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp
  )
 echo.!_Filename! count is !_count! >>%_Opsheet%
 set /a _Tcount=!_Tcount!+!_Count!
)
echo.Input Count is !_Tcount! >>%_Opsheet%
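
If it is still slow after that, the per-line pipe may matter more than the extra set commands: each pipe starts new processes, and sfk is launched once for every record. The following is only a sketch of one alternative, assuming plain echo with a single redirection per file is acceptable in place of sfk addtail. The _line variable is a hypothetical helper. Known limits of this sketch: lines containing exclamation marks are altered because delayed expansion is on, and FOR /F skips blank lines and lines that start with a semicolon.

```bat
@echo off
setlocal enableextensions enabledelayedexpansion

rem Start clean so old output is not picked up by *.TXT
if exist FILES_COMBINED.tmp del /q FILES_COMBINED.tmp
if exist FILES_COMBINED.txt del /q FILES_COMBINED.txt

set _Tcount=0
for /f "delims=" %%A in ('dir /b /a-d *.TXT') do (
  set _count=0
  set "_Tfile=%%~nA               ."
  set "_Filename=!_Tfile:~0,15!"
  rem The whole inner loop shares one redirection, with no pipe per line
  (
    for /f "usebackq delims=" %%B in ("%%A") do (
      set /a _count+=1
      rem _line is a hypothetical helper variable holding the current record
      set "_line=%%B"
      echo(!_line!!_Filename!
    )
  ) >>FILES_COMBINED.tmp
  echo !_Filename! count is !_count!
  set /a _Tcount+=_count
)
echo Input Count is !_Tcount!
```

Because the line is expanded with delayed expansion and never passes through a pipe, an ampersand in the data should not need escaping here.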

Jerry


----------



## Squashman

I will give that a try, but I still think the old way is going to be faster. I ran a million records thru the old way with the files on my hard drive and it only took 7:30.


----------



## Squashman

It actually took 1:59 with the count change on 2000 records. So they are relatively the same. That is just way too slow. If I had to process a million records it would take way too long. Gonna have to test it with just echoing quotes around the data and removing those 2 set commands that fix the ampersands. I might be able to just pipe it to one more command to remove the quotes with SED. In fact I really only need to remove the first quote at the beginning of the line. The quote at the end of the line doesn't matter.


----------



## Squashman

Well, I thought I finally had this figured out. I tested using SFK echo and putting the variable in quotes. When you use sfk echo and put something in quotes, it will only echo what is in the quotes.
So if I do this at a cmd prompt
sfk echo "&"
It just echoes back
&

I was ecstatic when I figured that out. But for some reason SFK doesn't like it when there are spaces at the beginning of a variable. Let's say I had the first name in positions 1 thru 10 of a line and the last name started in position 11. When it gets to a line with nothing in the first position, it won't start echoing the line until it finds a printable character. I have no idea why. You would think it should echo whatever is in the variable.

So basically this:


Code:


                    GORMAN
JEANINE             GROSS

Came out like this on my output file.


Code:


GORMAN          Filename
JEANINE             GROSS Filename

What I really don't understand is that this seems to work fine from the cmd prompt.



Code:


H:\>sfk echo "     4 spaces before"
     4 spaces before

H:\>

Here is the full code I am using now.


Code:


@echo off & setlocal enableextensions enabledelayedexpansion

Set _Opsheet=appendlog.log

:: Time Stamp Start Time
if exist %_Opsheet% del %_Opsheet%
Echo exit|cmd /q /k prompt $D $T>>%_Opsheet%

:: comment log
echo.>>%_Opsheet%
echo.>>%_Opsheet% 
echo Input>>%_Opsheet%

:: delete any existing EASTER_SEALS_FILES_COMBINED files
if exist FILES_COMBINED.tmp del /q FILES_COMBINED.tmp
if exist FILES_COMBINED.txt del /q FILES_COMBINED.txt

::  This is the old CODE for counting and appending
:: Get Quantity of Input Records for each file
:: echo "Calculating input quantities"
::for /f "tokens=2,3 delims=: " %%i in ('find /v /c "SomeStringNotToBeFound" *.TXT') do (
::     set _Ifile=%%i               .
::     set _Ifilename=!_Ifile:~0,20!
::     echo !_Ifilename! %%j>>%_Opsheet%
::     )
::
:: Append the Filename to the end of each line of every file and combine all files together
:: FOR /F "tokens=1* delims=." %%A IN ('dir /b /a-d *.TXT') DO (
::     set _Tfile=%%A               .
::     set _Filename=!_Tfile:~0,15!
::     echo "Working on file: %%A"
::     type "%%A.TXT"|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp
::     )
:: End of OLD code for counting and appending

:: NEW CODE for Counting and Appending
:: Trying to do both in these nested for loops.
set _Tcount=0
FOR /F "tokens=*" %%A IN ('dir /b /a-d *.TXT') DO (
 set _count=0
 set _Tfile=%%~nA               .
 set _Filename=!_Tfile:~0,15!
 echo "Working on file: %%~nA"
 FOR /F "tokens=*" %%B IN ('type "%%A"') DO (
  set /a _count+=1
  sfk echo "%%B"|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp
  )
 echo.!_Filename! count is !_count! >>%_Opsheet%
 set /a _Tcount=!_Tcount!+!_count!

)
echo.Input Count is !_Tcount! >>%_Opsheet%
echo.>>%_Opsheet%
echo.>>%_Opsheet%



:: comment log
echo.>>%_Opsheet%
echo Output>>%_Opsheet%

:: Rename Temp file
ren FILES_COMBINED.tmp FILES_COMBINED.txt


:: Get Quantity of Output file
echo "Calculating Output Quantity"
for /f "tokens=2,3 delims=: " %%i in ('find /v /c "SomeStringNotToBeFound" FILES_COMBINED.txt') do echo %%i %%j>>%_Opsheet%


:: Time Stamp finish time.
Echo exit|cmd /q /k prompt $D $T>>%_Opsheet%

echo "All done!!!!"
echo "Do you want to view the log file? Y/N"
set /P _Answer=

:: Open AppendLog if Answer equals Y. Not case sensitive.
if /I "%_Answer%"=="Y" start notepad %_Opsheet%


----------



## TheOutcaste

Try this:



Code:


sfk echo -noblank "" "%%B"|sfk addtail "!_Filename!" >> FILES_COMBINED.tmp

This outputs a null string, then the variable and doesn't add a space between the null and the variable.
At least that worked at the command line:


Code:


set _temp=    This line starts with 4 spaces
sfk echo -noblank "" %_temp%
    This line starts with 4 spaces
12345

Same result using:


Code:


set _temp="    This line starts with 4 spaces"
sfk echo -noblank "" %_temp%
    This line starts with 4 spaces
12345

Quotes were removed
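
If the leading spaces still go missing inside the batch file, one thing worth checking is the FOR /F loop rather than SFK. With "tokens=*" and the default delimiters, FOR /F drops leading spaces and tabs before %%B is ever handed to sfk echo, which would also explain why the same command works when typed at the cmd prompt. This is an assumption about the cause, not something verified against the real data. A small sketch that compares "tokens=*" with "delims=", using a hypothetical scratch file named by _demo:

```bat
@echo off
setlocal

rem Hypothetical scratch file, used only for this demonstration
set "_demo=%TEMP%\leadspace_demo.txt"
echo(                    GORMAN>"%_demo%"
echo(JEANINE             GROSS>>"%_demo%"

rem tokens=* keeps the rest of the line but strips leading delimiters
echo With tokens=*:
for /f "usebackq tokens=*" %%B in ("%_demo%") do echo([%%B]

rem delims= turns off delimiter handling, so the line is passed through whole
echo With delims=:
for /f "usebackq delims=" %%B in ("%_demo%") do echo([%%B]

del "%_demo%"
```

If the brackets show the padding only in the second loop, changing the inner loop in the batch file from "tokens=*" to "delims=" would be the thing to try.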

Jerry


----------



## Squashman

This is just really goofy. It is still not working.


----------

