# Batch file help! How to read data from URL



## mrsen99 (Nov 8, 2010)

Hi there!

I was wondering whether there is a script that can read data from a URL given as input and save it in .csv format.

Any help is appreciated!

Thanks!
Sen


----------



## TheOutcaste (Aug 8, 2007)

So you input *www.google.com*, and the output is a csv file containing *www,google,com*?
*Edit:* Or does the URL point to a website you want to download data from?

You can read anything you want from a csv file but you have to know the format of the file first.
How many items per line?
Always the same number of items per line?
Which Item do you want to retrieve?
If the number of items varies, is(are) the one(s) you want always in the same position?
Any special characters in the data? (*!"%^&()<>?*).
If there are, a VBScript would be better than a batch file.
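Those questions matter because they decide what a script can reliably pull out. As a rough illustration, here is a sketch in Python rather than batch or VBScript, using made-up sample data. It reports how many items each line holds and fetches the item at one position, returning `None` where a line is too short:

```python
import csv
import io


def field_counts(text):
    """Return the number of comma-separated items on each line."""
    return [len(row) for row in csv.reader(io.StringIO(text))]


def item_at(text, position):
    """Return the item at a 0-based position on each line (None if the line is too short)."""
    return [row[position] if len(row) > position else None
            for row in csv.reader(io.StringIO(text))]


sample = "a,b,c\nd,e\nf,g,h,i\n"   # hypothetical data with varying item counts
print(field_counts(sample))  # [3, 2, 4]
print(item_at(sample, 2))    # ['c', None, 'h']
```

If the counts vary like this, "always the same position?" becomes the deciding question.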

*(Thanks, Ent; I could have sworn I'd typed that line the first time around.)*


----------



## Ent (Apr 11, 2009)

Do you want to process the URL itself, as TheOutcaste is asking about, or do you want to download data from the site that URL points to and process that data as CSV?


----------



## mrsen99 (Nov 8, 2010)

Thanks for replying, guys.
Sorry for not being clear. No, it's not supposed to process the URL; it should go to the web page the URL points to and save the text data it contains into a .csv or .txt file.

So is there a way to scan through the web page and save it into a file?

Say I have this URL, http://webcode.xyz.com/scripts/sen/, an intranet site within a company.
What I do now is go to the File menu, click Save As with a .csv extension, and then open the file with Notepad.

This URL opens a page with text data.

So I'm wondering if this can be automated. The data does contain special characters like " % . !

Any insights?

Hope I am clear! If not, please let me know.

Thanks,


----------



## TheOutcaste (Aug 8, 2007)

You mean you save the webpage as a text file, and give it a .csv extension?
Or the URL is to a CSV file that is on a Web Server?

Neither Firefox nor IE has a *Save as CSV* option, only Web Page (.html/.htm), Web Archive (.mhtml), or Text File (.txt).
Saving as a text file will simply save the source of the page as text, it won't convert it into a CSV file.
You can use wget in a batch file to download a web page or file.
How are you converting the html to csv?
Once converted, you need to know the answers to all of these questions:


TheOutcaste said:


> How many items per line?
> Always the same number of items per line?
> Which item do you want to retrieve, i.e. what are you doing with the data?
> If the number of items varies, is(are) the one(s) you want always in the same position?


Another question: could any of the data itself contain a comma? That would make that item appear as two separate fields.
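Here is the comma problem with a made-up line. Splitting on every comma breaks a field that contains one, while a real CSV parser keeps it whole. This only works if whoever produced the file put quotes around such fields, which is an assumption; nothing in the thread says this data is quoted. A minimal Python sketch:

```python
import csv
import io

line = 'Smith,"Springfield, IL",42'   # hypothetical record with a comma inside a field

# Naive splitting breaks the quoted field at its embedded comma.
naive = line.split(",")
print(naive)   # ['Smith', '"Springfield', ' IL"', '42']

# The csv module honours the double quotes and keeps the field intact.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['Smith', 'Springfield, IL', '42']
```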


----------



## mrsen99 (Nov 8, 2010)

Thanks, Outcaste, for replying.
What I usually do is save the page as .csv: I give it that extension and save it as xyz.csv. Then I convert this file to .txt. This flat file acts as a source in Informatica and goes through the ETL process. We do it this way because when we saved the page directly as .txt, we had problems loading the file into the target.
So that's how it is.
Coming to your questions:
No, there is not always the same number of items per line; it has no format. There are all kinds of special characters in the file, along with a lot of text data.
If you don't get the picture, I can send a screenshot of how the page looks, if that helps.

Well, does this info help?

Thanks


----------



## Squashman (Apr 4, 2003)

Sounds like you are just trying to download a text file that is comma separated.
If that is the case, just use wget.


----------



## TheOutcaste (Aug 8, 2007)

If there is no format to the data, a batch file or VBScript won't be able to do much. There has to be some way of identifying the data you want to process.

Maybe I'm reading too much into this, as Squashman says. Do you just want something that will download a web page and save it as a CSV file? That's easy.
Download and install wget, then run:


```
wget -l 1 -O C:\Test\Test.csv http://webcode.xyz.com/scripts/sen/index.html
```
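If wget isn't available, the same fetch-and-save step can be sketched with Python's standard library. This is an alternative the thread didn't use, and `download` is a hypothetical helper. Like `wget -O`, it writes the response bytes unchanged, so the saved file is still HTML whatever extension it gets.

```python
import shutil
import urllib.request


def download(url, dest_path, timeout=30):
    """Fetch url and write the raw response bytes to dest_path (similar to wget -O)."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)  # stream the body to disk unchanged
    return dest_path
```

Calling `download("http://webcode.xyz.com/scripts/sen/index.html", r"C:\Test\Test.csv")` mirrors the wget line above. Network and HTTP errors (a 404, for example) are raised as exceptions rather than printed.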


----------



## mrsen99 (Nov 8, 2010)

Hi Squashman and Outcaste, thanks for the replies.
Well, the text file has all kinds of special characters, not just commas.
I did install wget.
A couple of things I'd like to ask:
Can we use wget in a batch script? (I tried, but it won't work.)
What I am trying to do here is automate the process by just clicking a batch file, so that the script executes the series of steps I mentioned above.
Outcaste, I tried wget at the command prompt too, and it says server not found! Am I missing something?
It worked fine for other sites.
Excuse me if my questions are lame; being a BA, this whole thing is like Chinese to me.
Thanks for your time and patience,


I tried it at the command prompt and it worked for regular sites, but when I tried my URL it says host not found: unable to resolve host address.


----------



## Squashman (Apr 4, 2003)

How about you post the code for the batch file and the examples you were trying to use, so that we can see the syntax you are using? We are not clairvoyant.


----------



## TheOutcaste (Aug 8, 2007)

When you type that url into your web browser, does it open the page? Does the address in the address bar get changed from what you type?

*Wget* should work just fine in a batch file. I'm not familiar with *informatica*, so I don't know if you can give that program a filename to process from the command line.

Sounds like you just want to download the info, save it, and send it to *informatica*, so the content won't matter, as the batch file isn't processing any of the content.


----------



## mrsen99 (Nov 8, 2010)

Squashman, of course not. I just tried a few simple scripts like the following:
[ @echo off
cls
rem Sample url
wget http://webcode.xyz.com/scripts/sen/ ] and saved it as a .bat file, which does nothing.

[@echo off
cls
start http://webcode.xyz.com/scripts/sen/ ] This opens the web page.
Very basic stuff.

@Outcaste Precisely: the batch script has nothing to do with the content; I have logic in Informatica to deal with the flat file. All I need is to download the data into a CSV file when I click the batch file.

And yes, when I type the URL into the browser it opens the page, and the address doesn't change.
From the command line it said resolving failed, unable to resolve host server.

Well, is the batch script I mentioned outlandish, or anywhere near what I want to accomplish?

Thanks!!


----------



## TheOutcaste (Aug 8, 2007)

mrsen99 said:


> from command line it said resolving failed, unable resolve host server.


I don't know why it's not working for you, it works for me:

```
C:\Temp Dir>wget http://webcode.xyz.com/scripts/sen/
--2010-11-10 03:40:28--  http://webcode.xyz.com/scripts/sen/
Resolving webcode.xyz.com... 208.73.210.29
Connecting to webcode.xyz.com|208.73.210.29|:80... connected.
HTTP request sent, awaiting response... 200 (OK)
Length: 1115 (1.1K) [text/html]
Saving to: `index.html'

100%[==============================================================================>] 1,115       --.-K/s   in 0s

2010-11-10 03:40:28 (10.5 MB/s) - `index.html' saved [1115/1115]


C:\Temp Dir>
```
Try using the *-d* switch to output debugging information. You might also try changing the timeout (*-T*); it may be timing out before it gets a response from DNS. The debug output may help with that by letting you see how long it's actually waiting.
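"Unable to resolve host address" means the name lookup failed before any HTTP request was sent. As an extra check that isn't part of the thread's advice, this Python sketch asks the operating system's resolver for the same name. If it also raises `socket.gaierror` on the machine running the batch file, the lookup itself is failing there. `resolve` is a hypothetical helper.

```python
import socket


def resolve(host, port=80):
    """Return the sorted IP addresses that host resolves to.

    Raises socket.gaierror when the lookup fails, the same kind of
    failure wget reports as "unable to resolve host address".
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})  # info[4] is the socket address


# Hypothetical check against the host from the thread:
# try:
#     print(resolve("webcode.xyz.com"))
# except socket.gaierror as err:
#     print("DNS lookup failed:", err)
```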


----------



## mrsen99 (Nov 8, 2010)

@Outcaste Yes, I get the same thing when I try what you did, but when I try the same with the actual URL it says
Error 404: Not Found.
[ wget -d -t 45 http://icwebprod.xy.xyz.com/scripts/sen]

When entered in the browser it works fine; it's an intranet website. Is that why there's this problem? xyz is the company name, so if I type that, this problem arises.


> ---request end---
> HTTP request sent, awaiting response...
> ---response begin---
> HTTP/1.1 404 Not Found
> ...


If this won't work, is there any other way to do it?

Thanks for the patience,


----------



## TheOutcaste (Aug 8, 2007)

OK, that's a different error message. 404 means it was able to resolve the server, but the Server couldn't find the page.

Try the *--user-agent=AGENT* switch. The server may be set to ignore wget requests.
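If the server does filter on the client, the idea behind `--user-agent` is simply to send a browser-like `User-Agent` header. Here is the same request sketched in Python as an alternative to wget (not from the thread). `make_request` is a hypothetical helper, and the agent string is an arbitrary example of an old IE identifier.

```python
import urllib.request


def make_request(url, agent):
    """Build a GET request that sends a custom User-Agent header (like wget --user-agent=AGENT)."""
    return urllib.request.Request(url, headers={"User-Agent": agent})


req = make_request(
    "http://icwebprod.wv.xyz.com/scripts/sen",               # URL from the thread
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)",   # example agent string
)
print(req.get_header("User-agent"))  # urllib stores header names capitalized this way
# urllib.request.urlopen(req) would actually send it.
```

The wget equivalent is `wget --user-agent="Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)" URL`.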

Are there any non-English letters in the URL? It could be that the code page for the command prompt is not set for your region. An accented *a* may be sent as a plain *a*, so the web server can't find the page.
Check the Get line right after the request begin line, make sure it shows the correct characters:
*GET /scripts/sen HTTP/1.0*
Copy that part and paste it into your text editor, see if it displays the correct characters.

You can check the code page using the *chcp* command. By itself it displays the current code page. Specify a code page and it will change it:

```
C:\Users\TheOutcaste>chcp
Active code page: 437

C:\Users\TheOutcaste>chcp 850
Active code page: 850

C:\Users\TheOutcaste>
```
437 is the usual default on English installs; 850, 1250, or 1251 may be needed instead.
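Here is why the code page matters. The same accented letter is stored as different bytes depending on the encoding, so a URL typed under one code page can reach the server as something else. This Python sketch prints the bytes for *á* under a few code pages. It also shows the percent-encoded form of a hypothetical path `/scripts/sá`, which spells the character out explicitly. Encoding it as UTF-8 is an assumption about what the server expects.

```python
from urllib.parse import quote

ch = "\u00e1"  # 'á', an accented a

# The same character becomes different bytes under different code pages.
for codec in ("cp437", "cp850", "cp1252", "utf-8"):
    print(codec, ch.encode(codec))
# cp437 b'\xa0'
# cp850 b'\xa0'
# cp1252 b'\xe1'
# utf-8 b'\xc3\xa1'

# Percent-encoding (of the UTF-8 bytes, by default) gives an unambiguous ASCII form.
print(quote("/scripts/s\u00e1"))  # /scripts/s%C3%A1
```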

Does your URL specify the actual HTML page name, or just the domain and path?
You may have to specify it; for example, a bare *www.google.com* may not work where *www.google.com/index.html* does.

You may have to go through the web server logs to see just exactly what it thinks wget is sending and compare it to what a browser sends. Might have to enable more verbose logging to capture enough info.

Is this by any chance an https site? I'd expect an error about authentication if that were the case though.

As this is an internal site, any reason you can't just access the file directly rather than going through the Web Server? Or is it a page that is created on the fly at the time of the request?


----------



## mrsen99 (Nov 8, 2010)

@Outcaste I was able to resolve that by putting double quotes around the URL: [wget "http://URL"]
So this works at the command prompt, thanks to you! But the same thing won't work when I save that code as a .bat file; the batch file does nothing.

As for the code you mentioned for saving as .csv, I tried that at the command prompt and got the following message:


> C:\Documents and Settings\ssen\My Documents\GnuWin32\bin>wget -l 1 c:\atest.csv "http://icwebprod.wv.xyz.com/scripts/sen"
> SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
> syswgetrc = C:\Documents and Settings\ssen\My Documents\GnuWin32/etc/wgetrc
> ...


Without [-l 1 c:\atest.csv] it works just fine! Am I missing something? Why won't this work in a batch file? Is there a procedure for that?

Thanks a lot for your patience.

yea,
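About the quoting fix: cmd treats characters such as `&`, `|`, `<` and `>` as operators unless they sit inside double quotes. Inside a .bat file it also expands `%...%` as variables, even within quotes, so a literal `%` (as in `%20`) must be written `%%`. The hypothetical Python helper below writes such a line, with `-O` placed directly before the output file as wget expects:

```python
def _batch_escape(text):
    # Inside a .bat file cmd expands %...% even within double quotes,
    # so every literal % has to be doubled.
    return text.replace("%", "%%")


def batch_wget_line(url, out_path):
    """Build a wget command line suitable for pasting into a .bat file.

    Both arguments are wrapped in double quotes so cmd does not treat
    & | < > (or spaces in the path) as special characters.
    """
    return f'wget -O "{_batch_escape(out_path)}" "{_batch_escape(url)}"'


# Hypothetical URL with a query string and an encoded space:
print(batch_wget_line("http://icwebprod.wv.xyz.com/scripts/sen?a=1&b=x%20y",
                      r"C:\Test\Test.csv"))
# wget -O "C:\Test\Test.csv" "http://icwebprod.wv.xyz.com/scripts/sen?a=1&b=x%%20y"
```

If the .bat file still appears to do nothing, check whether wget is on the PATH (a guess, not something the thread diagnosed). The command-prompt test above was run from inside the `GnuWin32\bin` folder.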


----------



## mrsen99 (Nov 8, 2010)

@Outcaste My bad, I missed -O in the syntax; that's why I got that error, I guess! I am ever thankful to you for the help: this is downloading the file and saving it as .csv! Thanks.
One quick question: this .csv is now saved in a directory, and when I open it in Notepad
I see some HTML tags. Can I put in some code to convert this .csv into .txt by eliminating the HTML tags?
Is that possible?

Thanks!


----------



## TheOutcaste (Aug 8, 2007)

wget is just saving it with a .csv extension; it's still an HTML file. Both CSV and HTML are text files. You'd need to use something like Notepad++ or Notetab Light to strip the HTML tags from the files. Notetab Light can strip the tags but preserve any URLs, so it may be a better choice. It depends on what you want to do with the data.

After stripping the tags using Notepad++, this is all that is left of the file downloaded from http://icwebprod.wv.xyz.com/scripts/sen:

```
xyz.com

Click here to go to xyz.com .
```
Using Notetab Light with its URL-preserving option leaves this, which keeps the URL of the frame page:

```
xyz.com <http://icwebprod.wv.xyz.com?epl=2pCw0E7sBYbFRdej6L6kHmxWh79BQuEUyV381ZeyVL8EYw-qBITcBMQYy2pDMjX48TuZkyqDSpgNQ83upcQV4BYmOjMpOYMnjfAaTfKrBBFzGFLNz31C2jQaIHqiyVMPRWg0gLSB9GhKj54iagAgcNynvwAA4H8BAABAgNsHAACb1eFvWVMmWUExNmhaQmcAAADw> Click here to go to xyz.com .
```
Or you can pass the file to a VBScript that can strip the tags. Some examples here:
http://www.4guysfromrolla.com/webtech/042501-1.shtml
http://authors.aspalliance.com/brettb/VBScriptRegularExpressions.asp
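As an alternative to the VBScript regex approach, here is a minimal sketch using Python's standard `html.parser`, a swapped-in technique rather than the one in those articles. It drops tags along with `<script>` and `<style>` contents and starts a new line at a few block-level tags. With `keep_urls=True` it roughly imitates Notetab Light's option by keeping each link target in angle brackets. `strip_tags`, `html_file_to_txt`, the tag list, and the UTF-8 default are all assumptions.

```python
from html.parser import HTMLParser

# Tags whose end should begin a new line of text (<br> is handled separately).
BLOCK_TAGS = {"p", "div", "tr", "li", "title", "h1", "h2", "h3", "table"}


class TagStripper(HTMLParser):
    """Collect the text of an HTML document, dropping the markup."""

    def __init__(self, keep_urls=False):
        super().__init__(convert_charrefs=True)
        self.keep_urls = keep_urls
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "br":
            self.parts.append("\n")
        elif tag == "a" and self.keep_urls:
            href = dict(attrs).get("href")
            if href:
                self.parts.append(f" <{href}> ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_tags(html, keep_urls=False):
    """Return the visible text of html, one non-empty line per block."""
    parser = TagStripper(keep_urls)
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).splitlines()
    # Collapse runs of whitespace inside each line and drop blank lines.
    cleaned = (" ".join(line.split()) for line in lines)
    return "\n".join(line for line in cleaned if line)


def html_file_to_txt(src, dst, encoding="utf-8"):
    """Read the downloaded page (whatever its extension) and write plain text."""
    with open(src, encoding=encoding, errors="replace") as f:
        text = strip_tags(f.read())
    with open(dst, "w", encoding=encoding) as f:
        f.write(text + "\n")
```

For example, `html_file_to_txt(r"C:\Test\Test.csv", r"C:\Test\Test.txt")` would turn the wget output into a tag-free text file (hypothetical paths).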


----------



## mrsen99 (Nov 8, 2010)

@Outcaste Thank you very much for the help! I was able to solve it with your help!
Once again, thanks a lot, buddy!


----------



## TheOutcaste (Aug 8, 2007)

You're Welcome!

Jerry


----------

