# Reading strings from a Unicode file using getline in C++



## dev_tyagi

Hi everyone,
I am using C++ to read a Unicode-formatted file. I have to read strings up to a specified delimiter, but getline is not working properly with the Unicode file, so please help me out.
The code I'm writing to read the strings is:
```cpp
fstream fp,fp1;
fp.open("filename",ios::in);
if (!fp.is_open())
{
    cout << "couldn't open file" << endl;
    exit(3);
}
else
    while(!fp.eof())
    {
        fp.getline(buffer,2500,'¶');
        fp1.open("c:\\notfounddata2.txt",ios::out);
        fp1.write(buffer,2500);
        cout<<buffer;
    }
fp.close();
fp1.close();
```
But the problem is that it's not reading the file properly. What should I do? I'm new to C++.

With regards,
dev


----------



## AGCurry

I'm a C (not C++) programmer, but doesn't getline() read only until it encounters a newline character (or CR/LF for DOS)?

Also, I see a problem in that you are opening your output file inside the loop and closing it outside the loop. Open it once and close it once.


----------



## Shadow2531

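^^ Actually, both versions stop at '\n' by default and both accept a different delimiter; the non-member std::getline(stream, str, delim) reads into a std::string.

For the member getline(), you specify both how many characters to read (it fills a char array) and the delimiter.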

@dev_tyagi
If it were a plain single-byte file (ANSI/Latin-1), you could just use *182* (*0xB6*) as the getline delimiter, but I'm not sure about Unicode.

For Visual C++, I think you can use wfstream, but I'm not sure.


----------



## dev_tyagi

Hi,
Thanks for your reply, but that was just sample code and I wrote it like that by mistake. What I actually wrote is this, but it's still not reading:
```cpp
char buffer[300][2500];
fstream fp,fp1;
int i=0;
fp.open("c:\\notfounddata1.txt",ios::in|std::ios::binary);
fp1.open("c:\\notfounddata2.txt",ios::out|std::ios::binary);
if (!fp.is_open())
{
    std::cout<<"couldn't open file" << endl;
    exit(3);
}
else
    while(!fp.eof())
    {
        fp.getline(buffer[i],2500);
        fp1<<buffer[i]<<endl;
        std::cout<<buffer[i]<<endl;
        i++;
    }

fp.close();
fp1.close();
```


----------



## Shadow2531

Still not sure about the Unicode part and getline, but you can ask here and you'll get your answer.

A couple of tips though.

Instead of

```cpp
fstream instance1, instance2;
instance1.open("in.txt", ios::in | ios::binary);
instance2.open("out.txt", ios::out | ios::binary);
```

you can do

```cpp
ifstream instance1("in.txt", ios::binary);
ofstream instance2("out.txt", ios::binary);
```

Also, you shouldn't use exit() here. It ends the program without running the destructors of local (automatic) objects. In main(), when you want to exit with an error, use return so the destructors get called.

Here's just a simple text file copier as an example.



Code:


```cpp
#include <iostream>
#include <string>
#include <fstream>

using namespace std;

int main() {
    ifstream in("in.txt");
    if (!in) {
        cout << "\n" << "Error reading in.txt" << endl;
        return 1;
    }
    ofstream out("out.txt");
    if (!out) {
        cout << "\n" << "Error writing to out.txt" << endl;
        return 1;
    }
    for (string s; getline(in,s) ; ) {
        out << s;
        // Re-add the newline getline stripped, except after a last line that had none.
        if ( !in.eof() ) {
            out << "\n";
        }
    }
}
```

In this case, I don't have to call .close() on the streams, because they are destroyed (and closed) when main() reaches the end of its scope. If a stream is still open and I need to, say, delete the file, I'd have to .close() it first; but instead of calling .close(), it's usually better to put the stream operations in a function and let the stream be closed when the function reaches the end of its scope.


----------



## Shadow2531

I did figure this out though.



Code:


```cpp
#include <fstream>

using namespace std;

int main() {
    // Write ¶ to a file as UTF-16LE: the byte-order mark FF FE,
    // then U+00B6 stored low byte first (B6 00).
    ofstream out("out.txt", ios::binary);
    if (!out) {
        return 1;
    }
    const char bytes[] = { '\xFF', '\xFE', '\xB6', '\x00' };
    out.write(bytes, sizeof bytes);
}
```

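When you save as "Unicode" in EditPlus, for example, each character is stored as 16 bits (UTF-16) or 32 bits (UTF-32), depending on the encoding; on Windows, "Unicode" typically means UTF-16 little-endian with a byte-order mark. So if ¶ is the first character in such a file, the first four bytes you read are FF FE (the byte-order mark) followed by B6 00, which together make up the ¶.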

Are you talking about UTF-8, UTF-16 or UTF-32 specifically?

If UTF-8, you'd write ¶ to a file like this (U+00B6 is the two bytes C2 B6 in UTF-8):



Code:


```cpp
out << '\xC2';   // first byte of ¶ (U+00B6) in UTF-8
out << '\xB6';   // second byte
```

That'll give you some hints, but I would ask at that link I posted.

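(For these particular bytes you don't really need binary mode, but it's the safer choice for UTF-16 in general: on Windows, text mode turns any 0x0A byte into 0x0D 0x0A, which corrupts 16-bit characters containing that byte.)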


----------

