# Replacing HTML in Excel cells with formatted text



## JNTM (May 2, 2010)

I have as Excel 2007 spreadsheet exported from another piece of software that includes a column of 1000 or so short reports, one report per Excel cell. The reports were originally formatted documents (headings, bullet points, etc. - even one or two small graphics), but the exporting process has converted them into HTML. 

I want to retain the structure of the spreadsheet (really just a three column data-base of text items) but convert the HTML back into readable formatted text with the original headings, etc. Since the spreadsheet export happens repeatedly, I'm ideally looking for a simple process that could be done as a regular routine.

Suggestions gratefully received!


----------



## Keebellah (Mar 27, 2008)

welcome to the forum,
To get an idea, could you post an Excel sheet with let's say one or two rows (non sensitve data) so that we can picture it?

Maybe its simple but it's easier to 'talk' about if you can see it.
Thanks.


----------



## JNTM (May 2, 2010)

Thanks for your interest. 

The attached demo is one row of the data-base, and contains an artificial 'sampler' report that uses many more style options that most students would use. It also includes one graphics link (with URL disabled).

It is a kind of 'lab-book' for students, with each row giving the name of the activity, the student's (usually quite brief) report, and the ID of the student. Since you might have, say, a dozen activities and a hundred students, the data-base could have 1-2000 reports.

The students enter their reports via a standard text-entry window that allows quite a range of style elements. The system holds the reports internally in HTML, and so its Export facility (which generates the data-base I am concerned with) provides the reports in the HTML format shown in my example.

The people who have to mark the reports have to use the data-base, so offering them raw HTML or even cleaned up plain text isn't likely to be appreciated - nor would it be fair to the student who has taken the trouble to lay out their report neatly. Hence the need for a way to get the database to present its content as properly formatted reports.

While there may well be proprietary products that could do this for me, I'd prefer to stick with something like Excel that is universally available.

I know that a single Excel cell is capable of holding (and displaying) a suitably formatted version of one of the HTML reports, because I've manually created a formatted report in a single cell. But so far I've failed to find a way to do this automatically (and my skills don't yet run to macro-writing!). 

I've tried copying the Excel cells into another package, formatting them, and pasting them back into Excel, but I get caught in a Catch 22: If I paste formatted text into Excel, it puts each line in a separate cell. If I try to temporarily create a single paragraph, so that it goes into a single cell, with the intention of then restoring the paragraphs, I lose the formatting!

Help gratefully received!


----------



## Keebellah (Mar 27, 2008)

Okay I have it, I'm not that good with 2007 or higher but will see if I understand it and can help.
I can't promise success yet.


----------



## Keebellah (Mar 27, 2008)

Okay, I hope this does what you want.
I added a module to your sheet, the modules name and the function's name is HTML2Txt
If you look in the code I think you'll get the idea.
It searches and replaces html code that you have to update if run into codes that aren't present.
I vertiaclly alligned the cells to top and you will have to format the cells to wrap text etc etc for you final version but that will be simple.

Pass the cell value to the function and itt will return plain text
The HTML text is in cell A2 then this is the formaula for let's say B2:

Syntax: =HTML2Txt(Range("A2")


----------



## JNTM (May 2, 2010)

Many thanks, Keebellah! Looks most interesting - I'll have a play. I'm most impressed by the speed of response!


----------



## JNTM (May 2, 2010)

I've had a chance to look at your suggestion more closely. It has done a great job of clearing the tags to leave plain text (thank you!).

However, what I was really hoping for was something with the original styling. See attached thumbnail, which was created by copying the report in the demo I sent you into Notepad, getting rid of duplicate quote marks, changing the file extension from '.txt' to '.htm' and then viewing in a browser (my attachment to this message is a screen dump of that).

I can certainly view the fully styled reports using the Notepad/Browser route, but I'm not sure how well that will scale up to 1000 reports or more, and it loses the manipulability of a spreadsheet.

So in an ideal world the styled copy would still be inside an Excel cell - just as in your reply, but with styled text rather than plain text.

Any suggestions?


----------



## Keebellah (Mar 27, 2008)

This last one I'm not so sure.
I think the vba code could carry out something like that but then there will have be quite some programming to be done.
You would have to go through the whole text and then based upon the formatting code * xxxxxxx * you would in this case select that part of the text en set the attribute to BOLD.
The same with all the other codes.

It will be a search and replace set the attributes and then remove teh * and the *
_ Italics _

 underline
These are some of the clues and ideas but I don't know if I can put it together for you.
I suggest Google and something like enterig HTML formatted text in cells ?
If I find something I'll letb you know. I'll see if I can do something for the bold part and if that works you will have to add the other coding yourslef. Okay? I'll try


----------



## slurpee55 (Oct 20, 2004)

Hans, VBA help has this - I don't know if you could use it:
WebPreFormattedTextToColumns Property
See AlsoApplies ToExampleSpecificsReturns or sets whether data contained within HTML tags in the Web page is parsed into columns when you import the page into a query table. The default is True. Read/write Boolean.

Remarks
This property is used only when the QueryType property of the query table is xlWebQuery and the query returns a HTML document.

Example
This example adds a new Web query table to the first worksheet in the first workbook. Note that the example doesn't parse into columns any data located between the HTML tags.

Set shFirstQtr = Workbooks(1).Worksheets(1) 
Set qtQtrResults = shFirstQtr.QueryTables _
.Add(Connection := "URL;http://datasvr/98q1/19980331.htm", _
Destination := shFirstQtr.Cells(1,1))
With qtQtrResults
.WebFormatting = xlNone
.WebPreFormattedTextToColumns = False
.Refresh
End With


----------



## Keebellah (Mar 27, 2008)

Hi Slurpee,
I must confess that I didn't take the time for that.
I use the function I posted because I have a helpdesk support programma that exports RTF coded text and that needs to 'undone' of all the formatting, so that's why I used it in this case.
It's true that HTML is more structured than RTF.
I'll see if I can do something with it.

Thanks, I'll let you know too.:up:


----------



## Keebellah (Mar 27, 2008)

Since I don't have the source documents I think you should test what Sulpee has suggested and see it it does what you expect.
Reading the code I expect it does.
Let us know.


----------



## JNTM (May 2, 2010)

Hi, Keebellah and Slurpee55

Many thanks for your obviously erudite comments on this. I'm afraid I'm well out of my depth here, but I'll see if one of my more tech literate colleagues can interpret it for me!


----------



## Keebellah (Mar 27, 2008)

if you attach one of the documents I'll give it a swing, no promises though.
Okay?


----------



## JNTM (May 2, 2010)

Thank you! Your offer to have a go is much appreciated! 

I attach a version of my earlier demo now with five rows rather than just one - the real thing could have 1000 rows or more, but I hope that 5 gives you enough to work with. 

The five rows are essentially duplicates, but with small changes to each cell so that they are uniquely identifiable. 

I've also added a graphic (with apologies to Wikipedia, but it seemed a safe and neutral source to link to).

Many thanks! JNTM


----------



## Keebellah (Mar 27, 2008)

Well, It's a little out of my league but I'm going to give it some thought.
You want formatted text and the image? You want to see it in the cell?
Like I said, itá a challenge and i'll take it step by step.
Like I said in one of the other replies in another post, 'miracles take a little longer'.


----------



## JNTM (May 2, 2010)

Thanks - 'No promises' fully understood! 

Cheers, JNTM


----------



## Keebellah (Mar 27, 2008)

I can't make it work with what I have found.
To program every cell will be a huge chunck of code and then do somethinmg with the attached image even more.
I have to admit it's quite a job.
The way will be slow since I do this in my own time, I'll keep you posted as I find things


----------



## JNTM (May 2, 2010)

Hi, Keebellah

I'm very grateful to you for giving this so much time (and I feel rather guilty about landing it on you!). I assumed originally that my difficulty was simply a reflection of my ignorance of Excel refinements, and one of you would be able to say: 'Oh, you just have to do xxx.' 

But this is clearly a much tougher problem than that!!!

So if you are enjoying the challenge, I'd be delighted to see where you get to (and very happy to wait!). But if it is being a real pain please don't feel you have to stick it out to the bitter end! Life is too short!

Though the solution you are working would be very much preferred, I do have a crude alternative that can get me by 'on the ground' (essentially reorganising the Excel in some simple ways, doing a plain text copy-and-paste into Notepad, getting rid of some surplus quote marks that this seems to create, saving it as a TXT file, relabelling it as an HTM file, and viewing in my browser). 

Cheers

JNTM


----------



## Keebellah (Mar 27, 2008)

Well you could use my parat of the code to extract the part that has to go to the notepad., once you have the contents (in te cell) you could also reexport this cell to a file with htm as extension, but then you would have to rebuild the code.
I'll keep this in my backhead as an alternative.
Like I said, a challenge is fun


----------



## JNTM (May 2, 2010)

> Like I said, a challenge is fun


Great! I'll look foreward to whatever emerges (or not!)!


----------

