# Solved: Separate huge .doc file into multiple files



## Miss HK (Jan 13, 2009)

Hi,

I have a massive Word file which I need to split into multiple files.

Basically they look like this:

NEW FILE

RE: Tuesday, December 18, 2007 4:09 AM
To: [email protected]
Subject: france

TEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTVTEXT
TEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXT
[email protected] <mailto:[email protected]> 



NEW FILE
 
RE: Tuesday, December 18, 2007 5:09 AM
To: [email protected]
Subject: france

TEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTVTEXTTEXT
TEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXT
[email protected] <mailto:[email protected]>

NEW FILE

RE: Tuesday, December 18, 2007 6:09 AM
To: [email protected]
Subject: france

TEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTVTEXTTEXTTEXT
TEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXT
[email protected] <mailto:[email protected]>

And so on and so forth...

I need each NEW FILE to be split and saved as a new file, ideally the name of the new saved file would be the email address (eg. [email protected])

I tried modifying the macro created by Rollin_Again in the post "MS Word Help - Macro", but to no avail. I also tried finding and replacing the words in my file with those from Rollin's file, which seemed to almost work.

Problem is I know nothing about programming, any help welcome!
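
To make the goal concrete, here is a minimal sketch of the split itself. It is written in Python rather than Word VBA and assumes the document has first been saved as plain text; the function name `split_on_marker` is hypothetical, and the sample text only mimics the layout shown above.

```python
def split_on_marker(text, marker="NEW FILE"):
    """Split text into records, starting a new record at each line equal to marker."""
    records = []
    current = None  # lines of the record being built; None until the first marker
    for line in text.splitlines():
        if line.strip() == marker:
            if current is not None:
                records.append("\n".join(current).strip())
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        records.append("\n".join(current).strip())
    # Drop records that contained nothing but whitespace.
    return [r for r in records if r]


sample = """NEW FILE

RE: Tuesday, December 18, 2007 4:09 AM
Subject: france

NEW FILE

RE: Tuesday, December 18, 2007 5:09 AM
Subject: france
"""

for number, record in enumerate(split_on_marker(sample), start=1):
    print(f"--- record {number} ---")
    print(record)
```

Anything before the first marker is ignored, which suits a file that starts with "NEW FILE" but would silently drop a leading record otherwise.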


----------



## 1002richards (Jan 29, 2006)

Hi and Welcome to TSG,
I've not tried that process myself, but here are some free progs suggested at Portablefreeware.com.
They can be run from a USB so you can avoid cluttering your hard drive if they don't do what you want.

http://www.portablefreeware.com/index.php?q=file+splitter&m=Search&so=p

I'm sure you'll make a couple of back-ups of your original document just in case something goes haywire!!

Richard


----------



## slurpee55 (Oct 20, 2004)

We need to find some item that is uniform across all sections at either the beginning or the end of where you want to break the file up.
It would appear that you have an email address along the lines of
<mailto:[email protected]>
at the end of each one. Is that correct?


----------



## Chip M (Jan 13, 2009)

If it's just a pure split you need, there is loads of freeware that will do that. You specify the exact size of each piece and it makes non-working files that can be re-joined to make the original. I use this when I'm trapped using floppies to transfer data. Find one of these at www.filesplitter.org. This particular app won't install on your PC (which I like!). It just runs as a direct .exe from wherever you download it. However, if you want the separate "chunks" to do anything at all (e.g. execute, format, etc.), this simple type of program won't produce files that can operate separately. For pure DOS text, it should work fine.


----------



## slurpee55 (Oct 20, 2004)

Filesplitter was useful back in the day, but it doesn't split files according to text, just according to size. Tell it you need 100 1 MB files made from a 100 MB file and you will get them, but they are not in a format that can be read; they have to be reassembled first.
Basically all of 1002richards's suggestions do the same thing.
No, you need a VBA pro to script this for you.


----------



## Elvandil (Aug 1, 2003)

WD2000: How to Programmatically Save Each Page or Section of a Document As Separate File

MS Word Split (Not free.)

You could also convert to PDF and use a PDF splitter. Or Copy and Paste into smaller docs.


----------



## computerman29642 (Dec 4, 2007)

I found this code:


```
Public Sub SplitWordDoc()
 
Dim sPath As String
Dim sName As String
Dim p As Long
Dim docNew As Document
Dim rngSource As Range
 
'gets document application path to provide saving location
sPath = "C:\"
sName = Replace(ActiveDocument.Name, ".doc", "", Compare:=vbTextCompare)
 
'go to start of document
Selection.HomeKey wdStory
 
Application.ScreenUpdating = False
 
'get current page count
ActiveDocument.Repaginate
 
'for each page in the document
For p = 1 To ActiveDocument.BuiltInDocumentProperties(wdPropertyPages)
 
    'select the page
    ActiveDocument.Bookmarks("\Page").Range.Select
    'move left 1 character
    Selection.MoveLeft wdCharacter, 1, wdExtend
    'set the range to be copied to new document
    Set rngSource = Selection.Range.FormattedText
 
    'create new document
    Set docNew = Documents.Add
    'copy page contents
    docNew.Range.FormattedText = rngSource
    'save the document
    docNew.SaveAs sPath & sName & "_Page" & p & ".doc"
    'close the document
    docNew.Close True
 
    'go to the next page
    Selection.GoTo What:=wdGoToPage, Which:=wdGoToNext
 
Next p
 
'go to start of document
Selection.HomeKey wdStory
 
Application.ScreenUpdating = True
 
End Sub
```
This does not save the file by e-mail address.


----------



## slurpee55 (Oct 20, 2004)

But doesn't that require that the document have page breaks between each file? I could have posted that, old buddy. 
But how does he break the file up into new pages according to the emails?


----------



## slurpee55 (Oct 20, 2004)

How about this:
Do an Edit, Replace: find each instance of > and replace it with >^m. That will leave the > where it was and insert a manual page break right after it.


----------



## slurpee55 (Oct 20, 2004)

Then this code might work....

```
Sub BreakOnSection()
    'Used to set criteria for moving through the document by section.
    Application.Browser.Target = wdBrowseSection

    'A mail merge document ends with a next-page section break.
    'Subtracting one from the section count stops the error message.
    For i = 1 To ((ActiveDocument.Sections.Count) - 1)

        'Select and copy the section text to the clipboard
        ActiveDocument.Bookmarks("\Section").Range.Copy

        'Create a new document to paste text from clipboard.
        Documents.Add
        Selection.Paste

        'Removes the break that is copied at the end of the section, if any.
        Selection.MoveUp Unit:=wdLine, Count:=1, Extend:=wdExtend
        Selection.Delete Unit:=wdCharacter, Count:=1

        ChangeFileOpenDirectory "C:\"
        DocNum = DocNum + 1
        ActiveDocument.SaveAs FileName:="test_" & DocNum & ".doc"
        ActiveDocument.Close
        'Move the selection to the next section in the document
        Application.Browser.Next
    Next i
    ActiveDocument.Close savechanges:=wdDoNotSaveChanges
End Sub
```
(Got this at http://word.tips.net/Pages/T001538_Merging_to_Individual_Files.html)


----------



## Miss HK (Jan 13, 2009)

Thank you all for your input!
I haven't been successful, though.

This code was created by Rollin_Again:

Sub SplitRecords()

vPath = ActiveDocument.Path & "\"

Selection.HomeKey Unit:=wdStory
Selection.Find.ClearFormatting
Selection.Find.Text = "OFFICE RECORD"

Do While Selection.Find.Execute = True

vCount = vCount + 1

If vCount > 1 Then

Selection.EndKey Unit:=wdLine

Selection.HomeKey Unit:=wdStory, Extend:=wdExtend
Selection.Cut

Do While Selection.Text = "" Or Selection.Text = " "
Selection.Delete Unit:=wdCharacter, Count:=1
Loop

Documents.Add DocumentType:=wdNewBlankDocument
Selection.PasteAndFormat (wdPasteDefault)
Selection.HomeKey Unit:=wdStory
With Selection.Find
.Text = "OFFICE RECORD"
.Replacement.Text = " "
.Forward = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.HomeKey Unit:=wdStory
Do While Selection.Text = "" Or Selection = " "
Selection.Delete Unit:=wdCharacter, Count:=1
Loop

Selection.Find.Text = "RE:"

Do While Selection.Find.Execute = True
Selection.EndKey Unit:=wdLine, Extend:=wdExtend

If InStr(1, Mid(Selection.Text, 4), "") > 0 Then
vFile = Trim(Replace(Mid(Selection.Text, 4), "", ""))
Else
vFile = Trim(Mid(Selection.Text, 4))
End If
ActiveDocument.SaveAs (vPath & vFile & ".doc")
ActiveDocument.Close
Selection.Find.Text = "OFFICE RECORD"
Exit Do
Loop

End If

Loop

End Sub

My idea was to switch "OFFICE RECORD" with my text "NEW FILE"
and Selection.Find.Text = "RE:" to 
Selection.Find.Text = " <mailto:[email protected]>"
but that didn't work...


----------



## slurpee55 (Oct 20, 2004)

I have sent a message to Rollin asking him to come check this out - only makes sense....


----------



## computerman29642 (Dec 4, 2007)

Can we not get a sample file of the Word document?


----------



## computerman29642 (Dec 4, 2007)

slurpee55 said:


> But doesn't that require that the document have page breaks between each file? I could have posted that, old buddy.
> But how does he break the file up into new pages according to the emails?


Sorry. I thought each one was already on a separate page. I will continue to take a look, and see if I can figure something out.


----------



## slurpee55 (Oct 20, 2004)

Well, see my post - #9. It will put a manual page break in at every place the file has something like
<mailto:[email protected]> (specifically, it will replace the ">" with ">" and a page break).
It appears that there is such an item at the end of each group of text.


----------



## computerman29642 (Dec 4, 2007)

So, we need automatic page breaks? Let me see what I can come up with.


----------



## Rollin_Again (Sep 4, 2003)

Does the phrase "NEW FILE" actually appear between each new entry in the Word file or did you just place it there to help with your explanation?

Also, you will not be able to use the exact email address as the filename since it contains a special character (@) 

You can always replace this symbol with an underscore or other character and save it that way. Would that work?

Regards,
Rollin
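
On the filename question: as far as I know, the characters Windows rejects in filenames are `\ / : * ? " < > |` plus control characters such as tab and carriage return, and "@" is not among them, so the underscore substitution may be a precaution rather than a requirement. The sketch below, in Python, shows one way to clean a candidate name and to take only the part before the "@". The helper names `safe_filename` and `local_part` and the `example.com` address are hypothetical.

```python
import re

# Characters Windows does not allow in filenames, plus ASCII control characters
# (tab, carriage return, vertical tab and so on).
_FORBIDDEN = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(raw, replacement="_"):
    """Return raw with forbidden characters replaced and outer blanks/dots trimmed."""
    cleaned = _FORBIDDEN.sub(replacement, raw)
    return cleaned.strip(" .")


def local_part(address):
    """Return the part of an email address before the '@'."""
    return address.split("@", 1)[0]


print(safe_filename("kris@example.com"))
print(safe_filename("a:b?c"))
print(local_part("kris@example.com"))
```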


----------



## Miss HK (Jan 13, 2009)

Hi Rollin_Again!

Basically, it's 100 resumes all in one Word file that I need to split up into as many files.

The beginning of each resume is always the same: 
From: [email protected]

I'd like to save by name or email ID. Changing @ to an underscore isn't ideal because I'd have to manually rename each file to put the @ back afterwards. Could I just save each file with the name before the @?

Here is an actual sample of the file:

From: [email protected]
Sent: Wednesday, October 31, 2007 2:04 PM
To: [email protected]
Subject: CANADA 
----------------------------------------------------------------------
#11111111
----------------------------------------------------------------------
Name: kris 
Address: 
Email: [email protected]
----------------------------------------------------------------------

CONTACT INFO
kris 
[email protected]
RESUME
#11111111
Resume 
Text Missing

From: [email protected] 
Sent: Wednesday, October 31, 2007 2:04 PM
To: [email protected]
Subject: DESIGN
----------------------------------------------------------------------
#88888888
----------------------------------------------------------------------
----------------------------------------------------------------------
Name: jane
Address
Email: [email protected]
----------------------------------------------------------------------

CONTACT INFO
Jane
[email protected]
RESUME
#88888888
Headline: 
Text Missing
----------------------------------------------------------------------

From: [email protected]
Sent: Wednesday, October 31, 2007 2:04 PM
To: [email protected]

I tried modifying the code you created but I know nothing about VBA!


----------



## slurpee55 (Oct 20, 2004)

That is much more useful for practical work. So, you could insert a page break before every point where "From: bob" appears?


----------



## Anne Troy (Feb 14, 1999)

First run this macro, which I recorded and may not be incredibly efficient, but should be fine for your purposes:


```
Sub MakeBreaks()
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "From: "
        .Replacement.Text = "DELIMITERFrom: "
        'Your delimiter is the word DELIMITER, above
        .Forward = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Selection.HomeKey Unit:=wdStory
    Selection.Delete Unit:=wdCharacter, Count:=1
End Sub
```
Then use the word DELIMITER in this macro, to separate the files:

http://vbaexpress.com/kb/getarticle.php?kb_id=922

If you have questions or problems, ask Steve Lucas, who wrote the macro, at vbaexpress. He's a great guy.


----------



## slurpee55 (Oct 20, 2004)

Nice Anne, but all it does is give you a place to insert page breaks. That is just as easily accomplished with a variation of what I posted in #9

Replace "From: bob" with "^mFrom: bob" and you will have manual page breaks in front of every place that "From: bob" occurs.
("^m" is the code Word's Find and Replace uses for a manual page break.)


----------



## Rollin_Again (Sep 4, 2003)

Try this macro. Make sure to ALWAYS save a backup copy of the original Word document before running any macro code on it.


```
Sub SplitEmails()

vPath = ActiveDocument.Path & "\"

Selection.HomeKey Unit:=wdStory
Selection.Find.ClearFormatting
Selection.Find.Text = "From:"

vFirstRecord = True

Do While Selection.Find.Execute = True

If vFirstRecord = False Then

Selection.MoveUp Unit:=wdLine, Count:=1
Selection.HomeKey Unit:=wdStory, Extend:=wdExtend
Selection.Cut
Documents.Add DocumentType:=wdNewBlankDocument
Selection.Paste
Selection.HomeKey Unit:=wdStory

Do While Selection.Text <> "F"
Selection.Delete Unit:=wdCharacter, Count:=1
Loop

Selection.EndKey Unit:=wdLine, Extend:=wdExtend
vFrom = Replace(Left(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"), Len(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"))), Chr(11), "")
ActiveDocument.SaveAs (vPath & vFrom & ".doc")
ActiveDocument.Close
vFirstRecord = True

Else

vFirstRecord = False

End If

Loop

Selection.EndKey Unit:=wdLine, Extend:=wdExtend
vFrom = Replace(Left(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"), Len(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"))), Chr(11), "")
Selection.HomeKey Unit:=wdStory

Do While Selection.Text <> "F"
Selection.Delete Unit:=wdCharacter, Count:=1
Loop

ActiveDocument.SaveAs (vPath & vFrom & ".doc")
Application.Quit

End Sub
```
Regards,
Rollin


----------



## slurpee55 (Oct 20, 2004)

Looks good, Rollin! I hope you are enjoying this nice warm winter!


----------



## Miss HK (Jan 13, 2009)

Sadly it's not working. Error message: 4198

Debugger points to: ActiveDocument.SaveAs (vPath & vFrom & ".doc")

Am I doing something wrong?


----------



## Rollin_Again (Sep 4, 2003)

Add the following line just before the line that is throwing the error and then tell us what the message box says. What version of MS Word are you using? The only other thing I can think of is that you are using Word 2007, which would require that you change the file extension from .doc to .docx or similar. I tested the code on my earlier version of Word and did not receive any errors.


```
Msgbox(vPath & vFrom & ".doc")
```
Regards,
Rollin


----------



## Miss HK (Jan 13, 2009)

Duh...

Actually I tried converting my file from docx to doc and changing the code from .doc to .docx, but it always seems to split the first CV and then I get the error message.

When I entered the line as you advised I first get a box that says:

C:\Documents and Settings: User\My Documents\ filename\ route.xxxxx.com.doc

When I click OK it is followed by another box: Run Time Error '4198'

Command failed

And so only the first file has been split


----------



## Rollin_Again (Sep 4, 2003)

Without an actual sample doc it will be hard to determine why the macro is not working. I copied and pasted the text from one of your previous posts straight into a new doc and resaved it before running the macro and it worked fine. If you want to send me a sample via email I'd be happy to take a look. Just replace any sensitive data and reduce the total size to include only about 5 or six entries.

Does the filename *route.xxxxx.com.doc* contain any special characters such as apostrophe or single quote mark?

When you get the runtime error it should give you the option to debug. Click debug and tell me which line of code is highlighted.

Rollin_Again at Hotmail dot com


----------



## Rollin_Again (Sep 4, 2003)

OK, I got the sample document you sent me via email. It looks like the code was failing because the *FROM:* line contains two special characters that cannot be used in the SaveAs filename. These special characters are the *Tab* character and the *Carriage Return* character. I have modified the code to replace these characters with nothing. Just add the following two lines of code before each of the two save commands (ActiveDocument.SaveAs).


```
vFrom = Replace(vFrom, Chr(9), "")
vFrom = Replace(vFrom, Chr(13), "")
```
Your final code should appear as it does below. Make sure that you ALWAYS maintain duplicate copies of documents prior to running any macros on them. Also, the document that you are running the macro on should also be saved locally to your disk instead of opening directly from an email and running the macro. If you open a document straight from your email and do not save to your hard disk the individual docs will be saved to the temp files directory where they may be lost after being written.


```
Sub SplitFiles()

vPath = ActiveDocument.Path & "\"

Selection.HomeKey Unit:=wdStory
Selection.Find.ClearFormatting
Selection.Find.Text = "From:"

vFirstRecord = True

Do While Selection.Find.Execute = True

If vFirstRecord = False Then

Selection.MoveUp Unit:=wdLine, Count:=1
Selection.HomeKey Unit:=wdStory, Extend:=wdExtend
Selection.Cut
Documents.Add DocumentType:=wdNewBlankDocument
Selection.Paste
Selection.HomeKey Unit:=wdStory

Do While Selection.Text <> "F"
Selection.Delete Unit:=wdCharacter, Count:=1
Loop

Selection.EndKey Unit:=wdLine, Extend:=wdExtend
vFrom = Replace(Left(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"), Len(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"))), Chr(11), "")
vFrom = Replace(vFrom, Chr(9), "")
vFrom = Replace(vFrom, Chr(13), "")
ActiveDocument.SaveAs (vPath & vFrom & ".doc")
ActiveDocument.Close
vFirstRecord = True

Else

vFirstRecord = False

End If

Loop

Selection.EndKey Unit:=wdLine, Extend:=wdExtend
vFrom = Replace(Left(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"), Len(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"))), Chr(11), "")
vFrom = Replace(vFrom, Chr(9), "")
vFrom = Replace(vFrom, Chr(13), "")
Selection.HomeKey Unit:=wdStory

Do While Selection.Text <> "F"
Selection.Delete Unit:=wdCharacter, Count:=1
Loop

ActiveDocument.SaveAs (vPath & vFrom & ".doc")
Application.Quit

End Sub
```
Regards,
Rollin
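
The clean-up the macro applies to the "From:" line can be sketched outside Word as follows. This is an illustration in Python of the same idea, not a translation of the macro; the function name `name_from_header` and the `example.com` address are hypothetical.

```python
def name_from_header(line, prefix="From:"):
    """Build a filename stem from a 'From:' header line."""
    if not line.startswith(prefix):
        raise ValueError(f"line does not start with {prefix!r}")
    value = line[len(prefix):]
    # Remove tab (9), vertical tab (11) and carriage return (13), as the macro does.
    for code in (9, 11, 13):
        value = value.replace(chr(code), "")
    return value.strip().replace("@", "_")


print(repr(name_from_header("From:\tbob@example.com\r")))
```

Printing with `repr` makes any leftover invisible characters show up as escape sequences, which is the kind of check the message box suggested earlier cannot provide.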


----------



## slurpee55 (Oct 20, 2004)

sweet work, Rollin! hope it works!


----------



## Miss HK (Jan 13, 2009)

Almost there thank you! I run the macro, and it seems to work, but I can't find the files, I'm not sure where they have been saved.


----------



## slurpee55 (Oct 20, 2004)

The code says
ActiveDocument.SaveAs (vPath & vFrom & ".doc")
where vPath is earlier defined as
vPath = ActiveDocument.Path & "\"
so it either should be in the same folder or one folder deeper in that folder (I'm not sure which, but I think the first) as where the original document is.


----------



## Rollin_Again (Sep 4, 2003)

You are correct, Slurpee. The files get saved to the same path as the original document. That is why you should save the original document to a location of your own and not run the macro directly on a doc opened from email.

Regards,
Rollin


----------



## slurpee55 (Oct 20, 2004)

Rollin, just a question. Given that all the resumes start with "[email protected]" won't they all be named the same, e.g. C:\\desktop\[email protected]?
Or will it automatically call them [email protected], [email protected], [email protected], etc.?


----------



## Rollin_Again (Sep 4, 2003)

slurpee55 said:


> Rollin, just a question. Given that all the resumes start with "[email protected]" won't they all be named the same, e.g. C:\\desktop\[email protected]?
> Or will it automatically call them [email protected], [email protected], [email protected], etc.?


All the docs would be named the same. I just assumed that the addresses in the real doc would be different and that the op used the same address for demonstration purpose in the sample document. I guess we need more clarification about that.

Regards,
Rollin


----------



## Miss HK (Jan 13, 2009)

Hi All,

Actually, the files to split do all start with the same email ID, as they are emailed CVs from the same sender which were saved from Outlook into a single .txt file that I have converted to .doc.

Each file that needs to be split is a different CV, so each contains one differentiating email ID, but the beginning (where the split should occur) does not change; it's always [email protected]
Ideally it would be great if they were saved by the different email IDs, but if they are saved under a number that works too.

When I run the macro, it seems to work (I can see lots of files Document1... Document2... etc. being opened) but I end up with one single file under the name bob_bobsaddress.com.doc. My guess is they are overwriting each other as they are being saved under the same name.


Thank you all for your input and help!


----------



## Rollin_Again (Sep 4, 2003)

Just save each file with an incrementing counter variable. See updated code below.


```
Sub SplitFiles()

vPath = ActiveDocument.Path & "\"

Selection.HomeKey Unit:=wdStory
Selection.Find.ClearFormatting
Selection.Find.Text = "From:"

vFirstRecord = True
i = 1

Do While Selection.Find.Execute = True

If vFirstRecord = False Then

Selection.MoveUp Unit:=wdLine, Count:=1
Selection.HomeKey Unit:=wdStory, Extend:=wdExtend
Selection.Cut
Documents.Add DocumentType:=wdNewBlankDocument
Selection.Paste
Selection.HomeKey Unit:=wdStory

Do While Selection.Text <> "F"
Selection.Delete Unit:=wdCharacter, Count:=1
Loop

Selection.EndKey Unit:=wdLine, Extend:=wdExtend
vFrom = Replace(Left(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"), Len(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"))), Chr(11), "")
vFrom = Replace(vFrom, Chr(9), "")
vFrom = Replace(vFrom, Chr(13), "")

ActiveDocument.SaveAs (vPath & vFrom & "_" & i & ".doc")
i = i + 1
ActiveDocument.Close
vFirstRecord = True

Else

vFirstRecord = False

End If

Loop

Selection.EndKey Unit:=wdLine, Extend:=wdExtend
vFrom = Replace(Left(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"), Len(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"))), Chr(11), "")
vFrom = Replace(vFrom, Chr(9), "")
vFrom = Replace(vFrom, Chr(13), "")
Selection.HomeKey Unit:=wdStory

Do While Selection.Text <> "F"
Selection.Delete Unit:=wdCharacter, Count:=1
Loop

ActiveDocument.SaveAs (vPath & vFrom & "_" & i & ".doc")
Application.Quit

End Sub
```
Regards,
Rollin
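
A variation on the counter idea, sketched in Python: instead of numbering every file, append a number only when a name has already been used. This is not what the macro above does; it is an alternative shown under the assumption that the name stems have already been collected into a list, and `unique_names` is a hypothetical helper.

```python
def unique_names(stems, extension=".doc"):
    """Return one filename per stem, appending _2, _3, ... to repeated stems."""
    seen = {}  # how many times each stem has appeared so far
    names = []
    for stem in stems:
        seen[stem] = seen.get(stem, 0) + 1
        suffix = "" if seen[stem] == 1 else f"_{seen[stem]}"
        names.append(f"{stem}{suffix}{extension}")
    return names


print(unique_names(["bob", "bob", "jane", "bob"]))
```

The sketch does not guard against a stem that itself ends in a number, such as "bob_2" appearing alongside a repeated "bob".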


----------



## slurpee55 (Oct 20, 2004)

Great, now they will be named [email protected], [email protected], [email protected], etc.
It will be your problem, Miss HK, to open each and then rename them as you desire, but Rollin (once again) has done most of the work with code. 
I am not sure Rollin, but could you not, given the facts of the layout of the big doc, add a new loop that would select "To:" and call that, say, vTo, and then change the naming to be
ActiveDocument.SaveAs (vPath & vTo & "_" & i & ".doc") ?


----------



## Miss HK (Jan 13, 2009)

This is fantastic thank you soooo much! I will be sure to make a donation!!! I'm so ecstatic, genius, thank you, thank you, thank you!!!

Just one thing: I then use a piece of software to parse the files, but for some reason it doesn't work. For the parsing software to function I need to re-save the doc file as another doc file... any suggestions why that may be?


----------



## slurpee55 (Oct 20, 2004)

What kind of parsing are you trying to do with your software? And what software are you using? We may be able to assist more....


----------



## Rollin_Again (Sep 4, 2003)

> For the parsing software to function i need to re-save the doc file into another doc file.


Can you elaborate on this?

Are you saying you have to reopen the doc and then re-save with a new filename?

Are you opening the doc and copying and pasting the contents into a new doc?

Are you opening a new doc and inserting the old doc via the insert menu?

Regards,
Rollin


----------



## Miss HK (Jan 13, 2009)

Yes I am opening the file in word and re-saving under a different name.

I am testing this resume parsing software: http://go4resume.com/uploaddemo/demo.aspx to extract essential contact info and resumes

I first run the macro, which works fine. But when I use the parsing trial from the link above on any of the files that have been split, the parsing result is this:

[Content_Types].xml
_rels/.rels
word/_rels/document.xml.rels
word/document.xml
word/theme/theme1.xml
word/settings.xml
word/fontTable.xml
word/webSettings.xml
docProps/app.xml
docProps/core.xml
word/styles.xml

So I open the same document, go to Save As, and save it under a different name as MS Word 97-2003, and when I test the parsing software again, it works fine. I have checked the properties and both the original and renamed files are shown as MS Word 97-2003 Document type.

I am contacting the software's support, will see what their feedback is.

Cheers!


----------



## Miss HK (Jan 13, 2009)

A thought:

Each file begins with the same as the sender and the recipient (me) are always the same:

To: [email protected]
From: [email protected]

The layout of each email is pretty much the same

As each file that needs to be split is someone's cv, each file contains a line:

 [email protected] <mailto:[email protected]>

the next file to split would also contain:

 [email protected] <mailto: jo[email protected]>

Would it not be possible to save using <mailto:[email protected]>

so that each file gets saved with a different email ID? If the @ had to be changed to _ that wouldn't be a problem, but it would be great if I could differentiate the files by name/email ID rather than have each one saved as [email protected], [email protected] etc.

Thanks so much!


----------



## Rollin_Again (Sep 4, 2003)

Here is the updated code to save the documents with the correct filename.


```
Sub SplitFiles()

vPath = ActiveDocument.Path & "\"

Selection.HomeKey Unit:=wdStory
Selection.Find.ClearFormatting
Selection.Find.Text = "From:"

vFirstRecord = True

Do While Selection.Find.Execute = True

If vFirstRecord = False Then

Selection.MoveUp Unit:=wdLine, Count:=1
Selection.HomeKey Unit:=wdStory, Extend:=wdExtend
Selection.Cut
Documents.Add DocumentType:=wdNewBlankDocument
Selection.Paste

Selection.HomeKey Unit:=wdStory

Do While Selection.Text <> "F"
Selection.Delete Unit:=wdCharacter, Count:=1
Loop

Selection.Find.Text = "mailto:"
Selection.Find.Execute
Selection.EndKey Unit:=wdLine, Extend:=wdExtend
vFrom = Replace(Replace(Replace(Selection.Text, "mailto:", ""), ">", ""), "@", "_")
vFrom = Replace(vFrom, Chr(9), "")
vFrom = Trim(Replace(vFrom, Chr(13), ""))
Selection.HomeKey Unit:=wdStory

ActiveDocument.SaveAs (vPath & vFrom & "_" & ".doc")

ActiveDocument.Close
vFirstRecord = True
Selection.Find.Text = "From:"
Else

vFirstRecord = False

End If

Loop

Selection.HomeKey Unit:=wdStory
Selection.Find.Text = "mailto:"
If Selection.Find.Execute = False Then
MsgBox ("Can't auto-save Last Document.  Please Save Manually")
End
Else
Selection.EndKey Unit:=wdLine, Extend:=wdExtend
vFrom = Replace(Replace(Replace(Selection.Text, "mailto:", ""), ">", ""), "@", "_")
vFrom = Replace(vFrom, Chr(9), "")
vFrom = Trim(Replace(vFrom, Chr(13), ""))
Selection.HomeKey Unit:=wdStory
End If

Do While Selection.Text <> "F"
Selection.Delete Unit:=wdCharacter, Count:=1
Loop

ActiveDocument.SaveAs (vPath & vFrom & "_" & ".doc")
Application.Quit

End Sub
```
Regards,
Rollin
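
The same lookup, finding the first "mailto:" entry in a record and turning it into a filename, can be sketched in Python with a regular expression. The pattern, the helper name `first_mailto` and the `example.com` address are assumptions made for illustration; the pattern tolerates a space after the colon, as seen in one of the sample lines.

```python
import re

# Matches "<mailto:address>" and tolerates spaces after the colon.
_MAILTO = re.compile(r"<mailto:\s*([^>\s]+)\s*>")


def first_mailto(text):
    """Return the first address found in a <mailto:...> tag, or None."""
    match = _MAILTO.search(text)
    return match.group(1) if match else None


record = "CONTACT INFO\nkris\nkris@example.com <mailto:kris@example.com>\nRESUME"
address = first_mailto(record)
print(address.replace("@", "_") + ".doc" if address else "no address found")
```

Returning `None` when nothing matches mirrors the macro's fallback of asking for a manual save when no "mailto:" is found.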


----------



## Miss HK (Jan 13, 2009)

Hi,

This is great, thank you!!!!! Do you have any suggestions with regards to my parsing issue mentioned earlier?



----------



## Rollin_Again (Sep 4, 2003)

I am not sure why the documents aren't getting parsed correctly. When I run the macro on the sample file you emailed me and then try to parse one of the resulting documents it appears to work fine. I am unable to duplicate your results and without having a "real" document to test it's going to be hard to troubleshoot this issue. Have you been able to get any help from the software vendor? What version of MS Word are you using? 

Regards,
Rollin


----------



## slurpee55 (Oct 20, 2004)

Oh, I tinkered around and I get it. Word 2007 saves its files as zip-compressed packages of XML files, with different subfiles controlling various aspects of what you see when you open, in this case, a document. If you change the extension from .docx to .zip and open it, you will see those different files that the parser is displaying.
For instance, word/fontTable.xml lists the fonts used in the document.
The file you want to parse is actually
word/document.xml
If you extract this from the zipped file, you should be able to parse it properly.
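
One way to check which container format a file really uses, whatever its extension says, is sketched below in Python. If the split files were written by Word 2007 without an explicit file format, they may be zipped packages carrying a .doc extension, which would explain the listing the parser produced; that is an assumption, not something confirmed in this thread. The function name `describe_word_file` is hypothetical, and the sample package is built in memory rather than read from a real document.

```python
import io
import zipfile


def describe_word_file(data):
    """Classify the bytes of a Word file by container format."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            if "word/document.xml" in package.namelist():
                return "docx (zipped XML package)"
        return "zip archive, but not a Word package"
    # Binary Word 97-2003 files begin with the OLE compound-file signature.
    if data.startswith(bytes.fromhex("D0CF11E0A1B11AE1")):
        return "doc (Word 97-2003 binary)"
    return "unknown"


# Build a tiny stand-in package in memory instead of reading a real file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<w:document/>")
print(describe_word_file(buffer.getvalue()))
```

For a file on disk, the bytes could be obtained with `open(path, "rb").read()` and passed to the same function.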


----------



## Miss HK (Jan 13, 2009)

Hi,

I changed the options in Word 2007 so that it would save everything in 2003 format by default and now the parsing is working fine.

Guys, thank you so much for all the help, time and effort you've given me, I am so grateful you have no idea!!! You are really doing a great job and I am really surprised that there are people out there who so generously give without expecting anything back. I will be sure to make a decent donation, how much do people usually give, I don't have a great budget but all your help has been a huge time saver for me.

A very big thank you I will be recommending this site to my buddies!!!


----------



## Rollin_Again (Sep 4, 2003)

Miss HK,

Thanks for the update, we're glad you finally got this sorted. We enjoy helping people and it brings smiles to our faces to know we've made life easier for someone else. While some people enjoy doing crossword puzzles, Slurpee and I and many others on this site enjoy solving the technical challenges we are presented with here. There is no standard amount that you should donate. This site is 100% run by volunteers and I'm sure that any donation received would be greatly appreciated. If you are really feeling generous, Slurpee and I would love to go to Hawaii for a week or two.

Regards,
Rollin


----------



## slurpee55 (Oct 20, 2004)

LOL! I just got back from the Caribbean, Rollin, and haven't a lot of vacation left at this moment! Darn!
And you know what, Miss HK? I have learned far more here than I have - or will ever be able to - give back in advice and help. Hang around and you could become "one of us"!


----------



## Miss HK (Jan 13, 2009)

OK, let me sort out winning the lottery first and 2 tickets to Hawaii will be at the top of my to-do list!


----------



## slurpee55 (Oct 20, 2004)

Glad to have helped - good to work with Rollin (at least some) again, too!


----------



## Miss HK (Jan 13, 2009)

Greetings from a girl who likes to split files (and not hairs).

I've been playing around but without much success and have been going through a lot of the postings.

Same problem: a doc file needs splitting, ideally with each piece saved under an incrementing number.

Here's a sample; the green is the first file, which needs to be separated from the second, in blue.

I initially thought that Nationality: could be the word to split from, but there are some files where it doesn't appear.

Internal Auditor with 15 Years experience
</xxxxxxx/xxxxxx.html?xxxx=12345&abc=965412&kind=private&folderID=45486746&section=9&source=0>

Internal audit,assessing internal controls,system review,reporting and
Follow up
CA (Any) , Institute for Clever People

Last Active: long time ago 



Salary per annum
Exp: 15 Years

Received Date: 16 Sep 2007


John Smith

Resume ID: 598856
Mobile: 1234567
Telephone: 891011
[email protected]
Nationality: Martian




CA with 9+ years experience and ERP exposure
</dbwedewhd/rfburwe.html?verq=46848&gver=8486&type=private&folderID=596745&section=8&source=0>

Chartered Accountant (CA), SAP user, ERP Implementation, Budgeting,
Accounts, Finance, MIS, Direct & Indirect Taxation & Commercial functions
CA (Any) , Institute of Numbers123
Languages Known: Many
Last Active: also not 



Succes compnay Limited
Too much salary per annum
Exp: 11 Years

Received Date: 16 Sep 2007


Jane White

Resume ID: 623213
Mobile: 987654
Telephone: 321000
[email protected]
Far Away

Nationality: Unknown

The Document then continues in the same manner, a different person, then another...

(More) help greatly appreciated!


----------



## slurpee55 (Oct 20, 2004)

Do they all have a line such as that beginning
</xxxxxxx/xxxxxx... as the second line?
Perhaps you could find that, move up one or two lines and split the file there?


----------



## Miss HK (Jan 13, 2009)

Hi Slurpee, thanks for getting back to me so quickly.

The second lines all start with </ blahblahblahblah> but the text in between the brackets changes every time.

I think if the split occurs at Nationality: it should be ok. It's in most of the files. If the word doesn't appear, the error message will come up, and I can manually save and re-run the code. What do you think?

Cheers!


----------



## Miss HK (Jan 13, 2009)

Actually, the text inside the </ blahblahblahblah> brackets starts the same for everyone and then changes for the next person

ex </ blahblahblahblah*john*>

</ blahblahblahblah*jane*>

</ blahblahblahblah*peter*>

Is it possible, in the same code, to delete the beginning of the file? Basically I have two pages of text (more or less identical in format and content) that I need to delete at the beginning of each file I plan to run this code on.

I have attached a sample copy of the file

In red is what needs to be removed
In yellow are the places we could split from.
In green is what you suggested could be moved above.
The other pretty colours are just to differentiate the parts of the text that need to be split into a new file.


----------



## slurpee55 (Oct 20, 2004)

Is this an XML file - or originally from an XML file?


----------



## slurpee55 (Oct 20, 2004)

However, if you just inserted gibberish and it ended up looking like XML by chance, (it was all those "</..." at the start of your file that made me think it was XML) then probably Rollin's code could be altered fairly easily to do the job.


----------



## slurpee55 (Oct 20, 2004)

Rollin, I tried this small change on your code

```
Sub SplitFiles()

    ' Folder of the document being split; the pieces are saved alongside it
    vPath = ActiveDocument.Path & "\"

    ' Search from the top of the document for the marker text
    Selection.HomeKey Unit:=wdStory
    Selection.Find.ClearFormatting
    Selection.Find.Text = "Nationality:"

    vFirstRecord = True
    i = 1

    Do While Selection.Find.Execute = True

        If vFirstRecord = False Then

            ' Cut everything from the top of the document down to one line
            ' below this match and paste it into a new document
            Selection.MoveDown Unit:=wdLine, Count:=1
            Selection.HomeKey Unit:=wdStory, Extend:=wdExtend
            Selection.Cut
            Documents.Add DocumentType:=wdNewBlankDocument
            Selection.Paste
            Selection.HomeKey Unit:=wdStory

            ' Delete leading characters until the new document starts with "N"
            Do While Selection.Text <> "N"
                Selection.Delete Unit:=wdCharacter, Count:=1
            Loop

            ' Build a file name from the first line, skipping its first five
            ' characters, swapping "@" for "_" and stripping control characters
            Selection.EndKey Unit:=wdLine, Extend:=wdExtend
            vFrom = Replace(Left(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"), Len(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"))), Chr(11), "")
            vFrom = Replace(vFrom, Chr(9), "")
            vFrom = Replace(vFrom, Chr(13), "")

            ActiveDocument.SaveAs (vPath & vFrom & "_" & i & ".doc")
            i = i + 1
            ActiveDocument.Close
            vFirstRecord = True

        Else

            vFirstRecord = False

        End If

    Loop

    ' Whatever is left in the original document is saved as the last piece
    Selection.EndKey Unit:=wdLine, Extend:=wdExtend
    vFrom = Replace(Left(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"), Len(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"))), Chr(11), "")
    vFrom = Replace(vFrom, Chr(9), "")
    vFrom = Replace(vFrom, Chr(13), "")
    Selection.HomeKey Unit:=wdStory

    Do While Selection.Text <> "N"
        Selection.Delete Unit:=wdCharacter, Count:=1
    Loop

    ActiveDocument.SaveAs (vPath & vFrom & "_" & i & ".doc")
    Application.Quit

End Sub
```
It takes the first set of data down to the line after Nationality (I converted all the lines in yellow to say Nationality:...) and saves it (if that beginning stuff in red is not something you want, just delete it). But then it gives me error code 5152 and says the name is not a valid file name.
I'm just running it on my desktop, and the first file gets saved there....
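
A possible explanation for the 5152, offered as a guess rather than something confirmed against the sample file: `Mid(Selection.Text, 6)` was written for the header line in the original document, so on a "Nationality: ..." line the text that is left over still contains the colon, and Windows does not allow a colon in a file name. A minimal sketch of a clean-up helper follows; `SafeFileName` is a made-up name, not part of Rollin's macro.

```vba
' Hypothetical helper (not part of Rollin's macro): makes a string safe to
' use as a Windows file name.
Function SafeFileName(ByVal s As String) As String
    Dim ch As Variant

    ' Drop tabs, line breaks and paragraph marks picked up from the selection
    For Each ch In Array(Chr(9), Chr(10), Chr(11), Chr(13))
        s = Replace(s, ch, "")
    Next ch

    ' Replace the characters Windows does not allow in file names
    For Each ch In Array("\", "/", ":", "*", "?", """", "<", ">", "|")
        s = Replace(s, ch, "_")
    Next ch

    SafeFileName = Trim(s)
End Function
```

With that in the same module, the save line could become `ActiveDocument.SaveAs (vPath & SafeFileName(vFrom) & "_" & i & ".doc")`.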


----------



## slurpee55 (Oct 20, 2004)

Okay, another move onward:
I removed this from the above
vFrom = Replace(Left(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"), Len(Replace(Trim(Mid(Selection.Text, 6)), "@", "_"))), Chr(11), "")
vFrom = Replace(vFrom, Chr(9), "")
vFrom = Replace(vFrom, Chr(13), "")
and altered this line to read
ActiveDocument.SaveAs (vPath & "_" & i & ".doc")
and now I am getting files called _1.doc, _2.doc
Unfortunately, I am only getting files consisting of the 2nd and 4th chunks of info - the first and 3rd just vanish except for the phrase "Nationality: Mndmz" appearing at the top of the file....
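
My reading of why every other chunk vanishes (not verified on the attached sample): the vFirstRecord flag makes the macro cut only on every second match, which suited a marker sitting at the start of each record, and the "delete until N" loop then strips everything above the first capital N in the pasted text. Since Nationality: sits at the end of each record here, a simpler route might be to copy everything up to and including each Nationality: paragraph. A rough sketch under that assumption; the macro name and the "record_" file prefix are placeholders of mine.

```vba
' Sketch only: copies each block ending in a "Nationality:" paragraph into
' its own numbered .doc file. The source document is left unchanged.
Sub SplitAfterNationality()
    Dim src As Document, dest As Document
    Dim hit As Range, chunk As Range
    Dim chunkStart As Long, i As Long
    Dim vPath As String

    Set src = ActiveDocument
    vPath = src.Path & "\"            ' the source must already be saved
    chunkStart = src.Content.Start
    i = 1

    Set hit = src.Content
    With hit.Find
        .ClearFormatting
        .Text = "Nationality:"
        .Forward = True
        .Wrap = wdFindStop
    End With

    Do While hit.Find.Execute
        ' Extend the match to the end of its paragraph
        hit.End = hit.Paragraphs(1).Range.End
        Set chunk = src.Range(Start:=chunkStart, End:=hit.End)

        ' Copy the chunk, with formatting, into a new document and save it
        Set dest = Documents.Add
        dest.Content.FormattedText = chunk.FormattedText
        dest.SaveAs FileName:=vPath & "record_" & i & ".doc", _
                    FileFormat:=wdFormatDocument
        dest.Close

        ' The next chunk starts where this one ended
        chunkStart = hit.End
        i = i + 1
        hit.Collapse Direction:=wdCollapseEnd
    Loop
End Sub
```

Two things to keep in mind: a record without a Nationality: line ends up in the same file as the record after it, and anything after the last Nationality: line is not saved at all.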


----------



## slurpee55 (Oct 20, 2004)

Note that I have inserted the junk at the beginning - I suspect it needs to be deleted first. :up:


----------



## stuswitzer (Apr 6, 2009)

I used Slurpee's post on page 1 and it worked great, but I want to modify it a bit. I want to run the process where i = 1 to 5 but then grab the rest of the document and put it into a new document regardless of how many sections are left. I can get the 1-5 to run but then I am a little lost on how my code should look.


----------



## turbodante (Dec 19, 2008)

stuswitzer said:


> I used Slurpee's post on page 1 and it worked great, but I want to modify it a bit. I want to run the process where i = 1 to 5 but then grab the rest of the document and put it into a new document regardless of how many sections are left. I can get the 1-5 to run but then I am a little lost on how my code should look.


Looking at Slurpee's code on page 1, replace the line after


```
Next i
```
 and before

```
End Sub
```
with


```
ActiveDocument.SaveAs FileName:="test_" & DocNum & ".doc"
ActiveDocument.Close
```
HTH


----------



## stuswitzer (Apr 6, 2009)

that sort of worked, it grabbed *all* of the sections though. What I am looking to do is take sections 1-5 and save them all as individual documents and then take sections 6-x and have them as 1 big document.

My guess is that I need to somehow move to section 6 and make a selection of /EndOfDoc like we do for /section to complete this....

I've made a few modifications to the code, but here is what I am doing:

Function BreakOnSection()
'
' BreakOnSection Macro
' Macro created 4/3/2009 by switze_s
'
' Used to set criteria for moving through the document by section.
Application.Browser.Target = wdBrowseSection

'A mailmerge document ends with a section break next page.
'Subtracting one from the section count stop error message.
For i = 1 To 5

'Select and copy the section text to the clipboard
ActiveDocument.Bookmarks("\Section").Range.Copy

'Create a new document to paste text from clipboard.
Application.DisplayAlerts = False
Documents.Add
Selection.Paste

' Removes the break that is copied at the end of the section, if any.
Selection.MoveUp Unit:=wdLine, Count:=1, Extend:=wdExtend
Selection.Delete Unit:=wdCharacter, Count:=1

ChangeFileOpenDirectory "u:\"
DocNum = DocNum + 1
ActiveDocument.Save
ActiveDocument.Close
' Move the selection to the next section in the document
Application.Browser.Next
Next i
ActiveDocument.Save
ActiveDocument.Close
End Function


----------



## slurpee55 (Oct 20, 2004)

stuswitzer said:


> I used Slurpee's post on page 1 and it worked great


Stu, did you use the code from page 1 or page 4?


----------



## stuswitzer (Apr 6, 2009)

I grabbed the stuff from page 1 and have since made a few minor modifications. The code that was posted above includes all of the changes that I've made....I quite honestly didn't make it to page 4 because this code worked sufficiently until a new development popped up on me..


----------



## slurpee55 (Oct 20, 2004)

In the first posting, this "For i = 1 To ((ActiveDocument.Sections.Count) - 1)" set the limit.
"For i = 1 To 5" works the same way: For...Next starts i at 1 and increments it at each Next i, so you don't need to add i=i+1 yourself (doing so would skip every other section).


----------



## stuswitzer (Apr 6, 2009)

The 1-5 part is working though...what I need is to be able to get sections 6 and beyond into one document...The bit that Turbodante gave :

ActiveDocument.Save
ActiveDocument.Close


saved the entire document again, which looking at it now makes sense that it would do that...

I guess my biggest disconnect is that I don't know how to tell it to copy the range from section 6 to the end of the document and paste it into the new document.


----------



## slurpee55 (Oct 20, 2004)

I am not sure - try this with a test file, but what if you changed:
'Select and copy the section text to the clipboard
ActiveDocument.Bookmarks("\Section").Range.Copy
to
'Select and copy the section text to the clipboard
ActiveDocument.Bookmarks("\Section").Range.Cut
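
The idea behind Cut is that each pass removes the section from the source, so after five passes whatever is left in the source is sections 6 onward. It also changes the source document, hence the test file. If you would rather leave the source untouched, here is a rough sketch of another way to grab section 6 to the end in one go. The macro name and "remainder.doc" are placeholders of mine, and it assumes the source has been saved so that its Path is not empty.

```vba
' Sketch only: copies everything from the start of section 6 to the end of
' the document into one new file, without modifying the source.
Sub CopyRemainderFromSection6()
    Dim src As Document, dest As Document
    Dim rest As Range

    Set src = ActiveDocument

    ' Nothing to do if there is no sixth section
    If src.Sections.Count < 6 Then Exit Sub

    ' Range running from the start of section 6 to the end of the document
    Set rest = src.Range(Start:=src.Sections(6).Range.Start, _
                         End:=src.Content.End)

    ' Copy it, with formatting, into a new document and save that
    Set dest = Documents.Add
    dest.Content.FormattedText = rest.FormattedText
    dest.SaveAs FileName:=src.Path & "\remainder.doc", _
                FileFormat:=wdFormatDocument
    dest.Close
End Sub
```

This would be run on the original document, separately from the 1 to 5 loop, and only makes sense if that loop still uses Copy rather than Cut.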


----------



## stuswitzer (Apr 6, 2009)

thought you might have been on to something there but it still gave me the entire document.

Edit: I take that back, hitting the save button after changing the code would have been handy....you're an evil genius and I am truly appreciative....thanks so much!!


----------



## slurpee55 (Oct 20, 2004)

stuswitzer said:


> you're an evil genius


I am truly flattered! 
But really, you worked it out with the help of my search - have fun with your document! :up:


----------

