# Merging HTML Files, How do I?



## HowdeeDoodee (Aug 26, 2004)

I need to merge thousands of HTML files. I have stripped out the header info and the footer info so basically what I have are text files with an html extension. I have tried changing the file extension from html to txt, but doing this creates files with corrupted text.

I have tried inserting html files into a Word document but that only works for a few files and does not work for as many files as I have.

I have not been able to find a file merge utility that will merge all the html files together.

So...

Question 1:
Is there a way, perhaps by VBA code, of inserting a directory of html files into a Word document regardless of how many files are in the directory? The average directory size is about 5 MB.

OR

Question 2:
Is there a way, any way, of merging or combining html files?

OR

Question 3:
Is there a way, any way, of inserting a directory of html files into an editor?

Thank you in advance for any replies.


----------



## Rockn (Jul 29, 2001)

How do you want to merge them? Do you want a Word document or an HTML document? What about PDF format?


----------



## cristobal03 (Aug 5, 2005)

If you're merging a 5MB directory exclusively of HTML files, you must be talking about hundreds of files. Why do you want to do this? If I understand correctly, you want to merge everything from `<body>` to `</body>` in each file into one single file?

chris.


----------



## HowdeeDoodee (Aug 26, 2004)

Thank you for the responses. OK, here are the questions and answers.



> How do you want to merge them? Do you want a Word document or an HTML document?
> What about PDF format?


I would prefer a Word document or txt. The problem I am having is that when I try to change the file extension, the file contents can be corrupted. No PDF. If I have a large html file I think I can open the file up in FrontPage and get at the contents that way.



> If you're merging a 5MB directory exclusively of HTML files, you must be talking about hundreds of files. Why do you want to do this? If I understand correctly, you want to merge everything from `<body>` to `</body>` in each file into one single file?


Actually the number of files is over 12,000. I want to do this so I can convert the body section of each file into txt, which will be placed in Excel and then conformed to a file for MySQL input. The content of the files becomes part of a MySQL database. I have deleted all the header information in each file so all I have is the non-header and non-footer section you see on the screen.
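For anyone following the same path toward a database import, here is a minimal Python sketch of the "one file becomes one row" step, writing a CSV file that Excel can open. It is only an illustration under assumptions: the function name `files_to_csv`, the column names, the UTF-8 encoding, and the demo file names are all hypothetical and not part of the original setup.

```python
import csv
import tempfile
from pathlib import Path


def files_to_csv(source_dir: Path, csv_path: Path, pattern: str = "*.html") -> int:
    """Write one CSV row (filename, contents) per matching file; return the row count."""
    rows = 0
    # newline="" lets the csv module control line endings and quote embedded newlines.
    with csv_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "body"])  # hypothetical column names
        for path in sorted(source_dir.glob(pattern)):
            # errors="replace" keeps going if a file is not valid UTF-8
            # (the encoding itself is an assumption).
            text = path.read_text(encoding="utf-8", errors="replace")
            writer.writerow([path.name, text.strip()])
            rows += 1
    return rows


if __name__ == "__main__":
    # Demo on throwaway files; the names here are made up.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp)
        (src / "a1.html").write_text("first entry", encoding="utf-8")
        (src / "a2.html").write_text("second, with a comma", encoding="utf-8")
        count = files_to_csv(src, src / "import.csv")
        print(count, "rows written")
```

The csv module quotes commas and line breaks inside a field, so each file stays in a single cell.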

I have tried stripping out the html tags with a tag stripper but some files still ended up corrupted because the file names were changed.

Thank you again for your time and the replies.


----------



## Rockn (Jul 29, 2001)

Do you have a sample of the HTML you can post here?


----------



## cristobal03 (Aug 5, 2005)

I don't see how changing the extension from *html* to *txt* would cause file corruption. What did you use to generate the HTML files?

chris.


----------



## Rockn (Jul 29, 2001)

Neither do I. Parsing out all of the HTML would make it even more unreadable as the output would be all strung together without any formatting.
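One way around the "all strung together" problem is to emit a line break wherever a block-level tag appears while stripping. A minimal sketch with Python's standard `html.parser` module follows; the `TextExtractor` class, the `html_to_text` helper, and the choice of tags in `BLOCK_TAGS` are illustrative assumptions, not a complete or definitive tag stripper.

```python
from html.parser import HTMLParser

# Tags treated as line breaks; this set is an illustrative assumption, not a complete list.
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect text content, emitting a newline at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags from html, keeping one output line per block of text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(html_to_text("<p>First <b>bold</b> line</p><p>Second &amp; last<br>break</p>"))
```

Note that this sketch does not skip the contents of script or style elements, so files containing those would need extra handling.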


----------



## HowdeeDoodee (Aug 26, 2004)

Thank you for the replies.

This issue has been solved.

Here is the solution.

Remember DOS?

Go to...

> Start > Programs > Accessories > Command Prompt

Type in the following command and confirm by hitting the Enter key:

`C:\>copy c:\TempStore\a*.html c:\TempStore\AllLettTwo.html`

All files beginning with the letter "a" will be merged and joined into the file AllLettTwo.html
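For readers not on Windows, or who want the merge inside a script, roughly the same thing can be sketched in Python. This is an approximation under assumptions, not an exact reproduction of what `copy` does: the function name `merge_files` is hypothetical, files are copied byte for byte, a newline is added after each file as a separator, and the output file is skipped if its own name happens to match the pattern.

```python
import tempfile
from pathlib import Path


def merge_files(source_dir: Path, pattern: str, output_name: str) -> int:
    """Concatenate files in source_dir matching pattern into output_name; return files merged."""
    output_path = source_dir / output_name
    # Collect the inputs before opening the output, and skip the output file
    # in case its name matches the pattern (for example on a second run).
    inputs = sorted(
        p for p in source_dir.glob(pattern)
        if p.is_file() and p.name.lower() != output_name.lower()
    )
    with output_path.open("wb") as out:
        for path in inputs:
            out.write(path.read_bytes())
            # Separator newline (an addition of this sketch) so the last line
            # of one file does not run into the first line of the next.
            out.write(b"\n")
    return len(inputs)


if __name__ == "__main__":
    # Demo on throwaway files; the names here are made up.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "a1.html").write_bytes(b"one")
        (folder / "a2.html").write_bytes(b"two")
        (folder / "b1.html").write_bytes(b"not merged")
        print(merge_files(folder, "a*.html", "all.html"), "files merged")
```

Files are merged in sorted name order, so the result is repeatable from run to run.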


After the html files are merged, I can open the merged file with FrontPage or another html editor and copy its contents into the Word document.

You know that little message you get when you try to change a filename extension that says something to the effect "You may lose data"? There is a reason for that message.

Keywords: Merging html files, joining html files, combining html files, merge html files, join html files


----------



## Rockn (Jul 29, 2001)

Well that was a simple and unexpected solution...COOL! Sometimes ya gotta take a step back and approach things from a different angle.


----------

