CS 4 Lecture Supplement

I/O Streams


Check out basic_ios, ios_base, smanip in the C++ Standard Library Documentation


Overview of the C++ I/O Class Library

The C++ Standard Library includes utilities that define a suite of classes for performing I/O.

These include classes for reading and writing the standard input and output streams, files, and strings (in-memory buffers).

The basic abstraction is a stream, a sequence of bytes.

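Buffering is used to increase the efficiency of I/O.

Writing: characters are collected in a buffer and transferred to the device all at once when the buffer fills or is flushed.

Reading: a block of characters is transferred from the device into a buffer, and extractions take characters from the buffer until it must be refilled.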

In the C++ library, these buffers are built from a class called streambuf.

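Inheritance is used:

ios_base: defines mode and control features like field width, base, precision
basic_ios: derived from ios_base; adds the error state and the pointer to the stream's streambuf
basic_istream and basic_ostream: derived from basic_ios; add extraction and insertion, respectively
basic_iostream: derived from both basic_istream and basic_ostream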

Other classes inherit from these to add the functionality needed to read and write files (basic_ifstream, basic_ofstream, basic_fstream) or strings, that is, in-memory buffers (basic_istringstream, basic_ostringstream, basic_stringstream).

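Typedefs are provided for the common case of 8-bit characters: istream, ostream, iostream, streambuf, ifstream, ofstream, fstream, istringstream, ostringstream, and stringstream are the corresponding basic_ templates instantiated with char (for example, istream is basic_istream<char>).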
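The typedefs and the inheritance relationships can be checked by the compiler itself. The sketch below, which assumes a C++11 or later compiler, uses static_assert with std::is_same and std::is_base_of from <type_traits>; readsAnInt is a hypothetical helper showing that one function taking an istream & accepts any derived stream.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <type_traits>

// Each narrow-character name is a typedef for a basic_ template
// instantiated with char. These checks are evaluated by the compiler.
static_assert(std::is_same<std::istream, std::basic_istream<char>>::value, "istream");
static_assert(std::is_same<std::ostream, std::basic_ostream<char>>::value, "ostream");
static_assert(std::is_same<std::ifstream, std::basic_ifstream<char>>::value, "ifstream");
static_assert(std::is_same<std::istringstream, std::basic_istringstream<char>>::value,
              "istringstream");
static_assert(std::is_same<std::streambuf, std::basic_streambuf<char>>::value, "streambuf");

// The inheritance relationships described above.
static_assert(std::is_base_of<std::ios_base, std::istream>::value, "ios_base -> istream");
static_assert(std::is_base_of<std::istream, std::ifstream>::value, "istream -> ifstream");
static_assert(std::is_base_of<std::istream, std::iostream>::value &&
                  std::is_base_of<std::ostream, std::iostream>::value,
              "iostream has both bases");

// Because ifstream and istringstream derive from istream, one function
// taking an istream & works with all of them (hypothetical helper).
bool readsAnInt(std::istream &in, int &value) {
    return static_cast<bool>(in >> value);
}
```

Passing an istringstream holding "42" to readsAnInt succeeds and stores 42.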

Functionality of istream

Input from an istream object is called extraction, because you are pulling characters out of the stream and into your program.

For example, cin is an istream object.

istream objects keep track of the stream error state and several values related to formatting.

The error state specifies whether extractions can be done on the stream, or whether an error has occurred.

Errors include reaching end of file, formatting errors (e.g. getting non-digits when digits were expected), and serious errors (e.g. no more room, file not found).

Two types of extractions are supported: formatted and unformatted.

Formatted extractions

The extraction operator

The >> operator is used for formatted extractions.

Example: cin >> x;

With overloaded operators, the compiler automatically selects the proper operator based on the type of the value being read.

In the example above, if x is an integer, then the integer extraction operator is used.

This makes extractions type safe—it is impossible to accidentally extract an integer value and store it in, say, a character array.

There are overloaded >> operators for the built-in C++ types, and you can define your own >> operators for classes you have written.

Steps in an extraction

There are four steps in a formatted extraction.

  1. The stream's error state is checked. If it is nonzero (indicating that end of file was reached or some error occurred), the remaining steps are skipped, the error state is set to indicate the failure, but the stream is otherwise left unchanged.
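  2. If the extraction is from cin, then the cout stream is flushed first, because cin is tied to cout.
    NOTE: You can tie your own streams together in this manner with the tie() member function; see the on-line documentation.
  3. Leading whitespace is skipped.
  4. Characters are extracted as needed to obtain the desired value. The extraction stops at the first character that cannot be part of the value (for example, whitespace, or a letter when an integer is being read); that character is left in the stream.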
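The following sketch illustrates steps 2 through 4, using an istringstream in place of cin so that the input is fixed. The names ExtractResult, extractIntThenPeek, and cinIsTiedToCout are hypothetical; peek() returns the next character without extracting it.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Result of one int extraction plus the character left behind.
struct ExtractResult {
    int value;
    int next;   // next character still in the stream, as returned by peek()
};

// Steps 3 and 4: leading whitespace is skipped, and the extraction stops at
// the first character that cannot be part of an int, leaving it unread.
ExtractResult extractIntThenPeek(const std::string &text) {
    std::istringstream in(text);
    ExtractResult r{0, 0};
    in >> r.value;
    r.next = in.peek();
    return r;
}

// Step 2: cin is tied to cout by default, so cout is flushed before cin is read.
bool cinIsTiedToCout() {
    return std::cin.tie() == &std::cout;
}
```

With the input "  123abc", the two spaces are skipped, 123 is stored, and 'a' remains as the next character.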

The error state in more detail

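Three bits are used to record the state of the stream: eofbit (end of file was reached), failbit (an operation failed, for example because of a formatting error), and badbit (a serious error occurred).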

The functions below are used to test the stream state.

Function       Returns true if
istr.good()    none of the error bits are set
istr.eof()     eofbit is set
istr.fail()    failbit or badbit is set
istr.bad()     badbit is set
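A small sketch of how the state functions report different outcomes of a single int extraction, assuming an istringstream as the source (StreamState and stateAfterReadingInt are illustrative names):

```cpp
#include <sstream>

struct StreamState {
    bool good, eof, fail, bad;
};

// Attempts one int extraction from text and reports the resulting state.
StreamState stateAfterReadingInt(const char *text) {
    std::istringstream in(text);
    int i = 0;
    in >> i;
    return StreamState{in.good(), in.eof(), in.fail(), in.bad()};
}
```

Reading "42 " leaves the stream good; reading "42" sets eofbit but not failbit; reading "abc" sets failbit only.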

When the expression

cin >> i;

is evaluated, i is modified as a side effect; the result of the expression itself is the stream cin.

This fact allows extractions to be chained, like this:

cin >> a >> b >> c;
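The chaining can be made visible by keeping a reference to the result of the first extraction. In this sketch (extractionReturnsStream is a hypothetical name), that reference refers to the very same stream object:

```cpp
#include <sstream>

// operator>> returns its left operand, so in >> a >> b >> c
// is evaluated as ((in >> a) >> b) >> c.
bool extractionReturnsStream() {
    std::istringstream in("1 2 3");
    int a = 0, b = 0, c = 0;
    std::istream &afterA = (in >> a);   // the result is the stream itself
    (afterA >> b) >> c;                 // so extraction can continue from it
    return &afterA == &in && a == 1 && b == 2 && c == 3;
}
```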

How does EOF work?

EOF is not set when you've read the last character from the stream; it is set when you try to read one character past the end.

If EOF is encountered while skipping leading white space, failbit is set.

Checking the result of an extraction

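Consider this input sequence (where SP is a space, and NL is a newline):

SP SP 1 2 3 NL

An extraction

cin >> i;

works like this: the two leading spaces are skipped, the characters 1, 2, and 3 are extracted and converted to the value 123, and the extraction stops at the newline, which is left in the stream. No error bits are set.

If another extraction occurs now, the newline is skipped as leading whitespace, end of file is reached before any digits are found, and both eofbit and failbit are set; i does not receive a new value.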

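Now consider this input sequence:

SP SP 1 2 3

An extraction

cin >> i;

works like this: the leading spaces are skipped and the characters 1, 2, and 3 are extracted, but the attempt to read a fourth character reaches end of file, so eofbit is set. The value 123 is nevertheless stored in i, and failbit is not set.

This means that if the stream's state is not fail, even if eof is set, the extraction succeeded!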
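The two input sequences above can be replayed with an istringstream. This sketch (TwoReads and readTwice are illustrative names) records failbit and eofbit after each of two extractions:

```cpp
#include <sstream>
#include <string>

// Stream state after each of two int extractions.
struct TwoReads {
    int value;                  // result of the first extraction
    bool firstFail, firstEof;
    bool secondFail, secondEof;
};

TwoReads readTwice(const std::string &text) {
    std::istringstream in(text);
    TwoReads r{};
    in >> r.value;
    r.firstFail = in.fail();
    r.firstEof = in.eof();
    int other = 0;
    in >> other;                // nothing left but (possibly) whitespace
    r.secondFail = in.fail();
    r.secondEof = in.eof();
    return r;
}
```

For "  123\n" the first extraction leaves both bits clear and the second sets both. For "  123" eofbit is already set after the first extraction, yet failbit is not, and 123 is stored.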

It is therefore incorrect to check for end of file after reading: you may miss the last value.

Instead, check for failure, as shown below:

cin >> i;
while( !cin.fail() ){
	// process the value in i
	cin >> i;
}

or, using shortcuts,

while( cin >> i ){
	// process the value in i
}

If end of file is encountered while skipping leading white space, failbit is set, terminating the loop.

If end of file is encountered after extracting the value (i.e. there is no trailing whitespace), eofbit is set, but this does not terminate the loop so the last value is correctly processed.

The next extraction will set failbit while trying to skip leading whitespace, and this will terminate the loop.
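As a sketch, the shortcut form of the loop can be packaged as a function that sums integers from a string (sumInts is a hypothetical name); it gives the same result whether or not the input ends with whitespace:

```cpp
#include <sstream>
#include <string>

// Sums integers using the test-after-extraction idiom.
int sumInts(const std::string &text) {
    std::istringstream in(text);
    int sum = 0;
    int i;
    while (in >> i) {   // the body runs only if this extraction succeeded
        sum += i;
    }
    return sum;
}
```

sumInts("1 2 3") and sumInts("1 2 3\n") both return 6.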

KEY POINT:

Note also that you must test the stream after doing the extraction, not before. If the stream is ok, that's no guarantee that there's another value.

The following loop is incorrect. Can you see why?

while( cin ){
	cin >> i;
	// process the value in i
}

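It is incorrect because the test happens before the extraction. After the last value has been read, the stream is still good (or has only eofbit set), so the loop runs once more; that extraction fails without storing a new value in i, and the last value read in is processed twice.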
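The extra pass can be observed by counting loop iterations. This sketch (the function names are illustrative) runs the incorrect loop and the correct idiom over the same three values:

```cpp
#include <sstream>
#include <string>

// Incorrect: tests the stream before extracting.
int countWrongLoop(const std::string &text) {
    std::istringstream in(text);
    int i = 0, count = 0;
    while (in) {
        in >> i;
        ++count;        // also runs after the final, failed extraction
    }
    return count;
}

// Correct: tests the result of each extraction.
int countRightLoop(const std::string &text) {
    std::istringstream in(text);
    int i = 0, count = 0;
    while (in >> i) {
        ++count;
    }
    return count;
}
```

For "1 2 3\n" the incorrect loop body runs 4 times and the correct one 3 times.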

Extractions of basic types

Extraction operators are overloaded for each of the following types:

type                what is extracted
char                a single character (leading whitespace is skipped first)
char*               a sequence of characters, terminated by whitespace
short, int, long    characters that get converted to an integer value
float, double       characters that get converted to a floating-point value
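A brief sketch of formatted extraction into char, assuming an istringstream as input. For the final word it uses a std::string instead of a char array; std::string is a standard alternative to char* that cannot overflow, chosen here for safety rather than taken from the table (CharDemo and extractChars are hypothetical names):

```cpp
#include <sstream>
#include <string>

struct CharDemo {
    char first, second;
    std::string word;
};

// Each >> into a char skips leading whitespace; >> into a string
// stops at the next whitespace, as for char*.
CharDemo extractChars(const std::string &text) {
    std::istringstream in(text);
    CharDemo d{};
    in >> d.first >> d.second >> d.word;
    return d;
}
```

extractChars("  x  y  hello world") yields 'x', 'y', and "hello".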

Format state manipulators

The stream's format state doesn't affect extractions very much. However, there are a few things you should be aware of.

In standard C++, a stream's base flag is initialized to dec, so integral input is read as decimal by default. However, if the base flags are cleared (for example with cin.unsetf( ios::basefield )), integral values are converted to binary in the same way that integral constants in C++ programs are interpreted:

  1. If the value begins with 0x or 0X, it is assumed to be a hexadecimal number.
  2. Otherwise, if the value begins with 0, it is assumed to be an octal number.
  3. Otherwise, the value is assumed to be a decimal number.

This can lead to unexpected results. For example, consider the code below, which reads dates in the form mm/dd/yyyy:

int month, day, year;
char slash;
cin >> month >> slash >> day >> slash >> year;

With the base flags cleared, a month or day of 01 through 07 works fine (octal 1-7 is equal to decimal 1-7), but 08 or 09 is read incorrectly: the leading 0 is taken as an octal number, and the extraction stops at the 8 or 9 because those are not valid octal digits.

To force conversion in a specific base, you can set the appropriate format flags for the stream, or use the following manipulators:

Manipulator   Sets ios_base flag   Forces subsequent conversions to be done in
dec           ios::dec             decimal
oct           ios::oct             octal
hex           ios::hex             hexadecimal

The corrected code for the example above is:

cin >> dec >> month >> slash >> day >> slash >> year;
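The sketch below contrasts the default decimal conversion with the behavior after clearing the base flags, assuming a standard-conforming library; Date, readDate, readWithAutoBase, and readForcedDecimal are hypothetical names.

```cpp
#include <ios>
#include <sstream>
#include <string>

struct Date {
    int month, day, year;
};

// Reads mm/dd/yyyy with the stream's default (decimal) base.
Date readDate(const std::string &text) {
    std::istringstream in(text);
    Date d{0, 0, 0};
    char slash;
    in >> d.month >> slash >> d.day >> slash >> d.year;
    return d;
}

// With the base flags cleared, the prefix selects the base,
// as for integer constants in C++ source.
int readWithAutoBase(const std::string &text) {
    std::istringstream in(text);
    in.unsetf(std::ios::basefield);
    int value = 0;
    in >> value;
    return value;
}

// The dec manipulator forces decimal conversion again.
int readForcedDecimal(const std::string &text) {
    std::istringstream in(text);
    in.unsetf(std::ios::basefield);
    int value = 0;
    in >> std::dec >> value;
    return value;
}
```

readDate("08/09/2024") yields 8, 9, and 2024. With the base flags cleared, "010" is read as 8 and "0x1F" as 31, while readForcedDecimal("010") yields 10.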

Unformatted extractions

Three major differences between formatted and unformatted extractions:

  1. No conversion is done in unformatted extractions: bytes are copied, that is all.
  2. Leading white space is not skipped.
  3. Calls are made to member functions rather than overloaded operators.

The following functions are provided. With the exception of the first form of get() (which returns an int), they all return the stream itself, and so they can also be tested as booleans.

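Read input one character at a time, using the form of get() that returns the next character as an int, or EOF at end of file:

int ch = cin.get();   // int, not char, so that EOF can be told apart from every character
while( ch != EOF ){
	// process the character
	ch = cin.get();
}

Read input one character at a time, using get( char & ), which stores the character in its argument and returns the stream:

char c;
while( cin.get( c ) ){
	// process the character
}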
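Both character-at-a-time loops can be written as functions over any istream. This sketch (countWithIntGet and countWithGetChar are hypothetical names) counts characters, including whitespace, since unformatted extraction does not skip it:

```cpp
#include <cstdio>
#include <istream>
#include <sstream>

// Uses int get(); EOF (from <cstdio>) marks the end of the input.
int countWithIntGet(std::istream &in) {
    int count = 0;
    int ch = in.get();
    while (ch != EOF) {
        ++count;
        ch = in.get();
    }
    return count;
}

// Uses get(char &), which returns the stream, tested as a boolean.
int countWithGetChar(std::istream &in) {
    int count = 0;
    char ch;
    while (in.get(ch)) {
        ++count;
    }
    return count;
}
```

Applied to an istringstream holding " a b\n", each function returns 5.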

Read characters up to (but not including) a particular delimiter (',' in this example):

const int BUFLEN( ... );
char buffer[ BUFLEN ];
cin.get( buffer, BUFLEN, ',' );

Characters are read until a comma is found, or until (BUFLEN - 1) have been stored in the buffer. The buffer is then null terminated. The comma itself is not extracted; it remains in the stream as the next character to be read.

Read input a line at a time:

while( cin.getline( buffer, BUFLEN ) ){
	// process the line in buffer
}

This is similar to cin.get( buffer, BUFLEN, '\n' ); except that in getline the delimiter is extracted from the stream and discarded. Note that the delimiter argument defaults to a newline in both cases.
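The difference between get() and getline() lies in what happens to the delimiter. This sketch (DelimDemo and compareGetAndGetline are illustrative names, and the buffer size of 32 is an arbitrary assumption) peeks at the next character after each call:

```cpp
#include <sstream>
#include <string>

struct DelimDemo {
    std::string field;
    int nextAfterGet;      // get(): the delimiter is still in the stream
    std::string line;
    int nextAfterGetline;  // getline(): the delimiter was extracted and discarded
};

DelimDemo compareGetAndGetline() {
    const int BUFLEN = 32;
    char buffer[BUFLEN];
    DelimDemo d;

    std::istringstream a("alpha,beta");
    a.get(buffer, BUFLEN, ',');
    d.field = buffer;
    d.nextAfterGet = a.peek();

    std::istringstream b("first line\nsecond line\n");
    b.getline(buffer, BUFLEN);
    d.line = buffer;
    d.nextAfterGetline = b.peek();
    return d;
}
```

After get() the next character is ','; after getline() it is 's', the start of the second line.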

Read binary data from a stream:

const int N_VALUES( ... );
SomeObj values[ N_VALUES ];
cin.read( reinterpret_cast<char *>( values ), N_VALUES * sizeof( SomeObj ) );

Note that we're not reading characters and converting them to binary; we're reading the binary bytes themselves, as written earlier in binary, perhaps by another program. This is a very crude form of serialization: it does not allow you to define exactly how the object is read, and it only works for simple (trivially copyable) types that contain no pointers, read on a system with the same data layout as the one that wrote them.

Functionality of ostream

Output to an ostream is called an insertion, because you are inserting characters into the stream.

For example, cout is an ostream object.

ostream objects have state information too.

The error state is not nearly as useful, as there is less that can go wrong on output than on input.

The format state is much more useful, as it controls the details of the conversions for formatted insertions.

Formatted insertions

The insertion operator

The << operator is used for formatted insertions.

Example:

cout << x;

Steps in an insertion

The following steps are performed in a formatted insertion:

  1. The stream's error state is checked. If the stream is not in a good state (for example, failbit or badbit is set), the remaining steps are skipped and the stream is left unchanged.
  2. The value is converted to the appropriate characters, possibly padded, and then inserted into the stream.
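This sketch (insertAfterFailure is a hypothetical name) sets failbit by hand on an ostringstream with setstate() and shows that a later insertion writes nothing:

```cpp
#include <ios>
#include <sstream>
#include <string>

std::string insertAfterFailure() {
    std::ostringstream out;
    out << "ok ";
    out.setstate(std::ios::failbit);   // simulate an earlier failure
    out << 42;                         // skipped: the stream is not good()
    return out.str();
}
```

The function returns "ok ".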

Format state

The format state consists of a number of flags and three additional values: the fill character, precision, and width.

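By default, the fill character is a space, the precision is 6, the width is 0 (meaning that only as many characters as needed are produced), and integers are converted in decimal. The following manipulators can be used with any ostream: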

Manipulator       Purpose
cout << dec;      Equivalent to cout.setf( ios::dec, ios::basefield );
cout << oct;      Equivalent to cout.setf( ios::oct, ios::basefield );
cout << hex;      Equivalent to cout.setf( ios::hex, ios::basefield );
cout << flush;    Flushes the output buffer
cout << endl;     Ends a line by inserting a newline and flushing the output buffer

If you #include the file <iomanip>, you get the following additional manipulators:

Manipulator                 Purpose
cout << setw(w);            Sets the width to w
cout << setfill(c);         Sets the fill character to c
cout << setprecision(p);    Sets the precision to p
cout << setiosflags(f);     Equivalent to cout.setf( f );
cout << resetiosflags(f);   Equivalent to cout.unsetf( f );

The precise meaning of these settings depends on the type of value being converted.

Note that any insertion that uses the width resets it to zero. Other than this, insertions do not change the format state, so a flag you set stays in effect until you unset it.
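A few sketches of the format state on an ostringstream (the function names are illustrative): the width applies only to the next insertion, the base stays in effect until changed, and the default precision of 6 gives six significant digits in the default floating-point format.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

std::string widthDemo() {
    std::ostringstream out;
    out << std::setw(5) << std::setfill('*') << 42 << 7;   // width applies to 42 only
    return out.str();
}

std::string basePersistsDemo() {
    std::ostringstream out;
    out << std::hex << 255 << ' ' << 16 << std::dec << ' ' << 16;   // hex stays until dec
    return out.str();
}

std::string precisionDemo() {
    std::ostringstream out;
    out << 3.14159265 << ' ' << std::setprecision(3) << 3.14159265;
    return out.str();
}
```

widthDemo() returns "***427", basePersistsDemo() returns "ff 10 16", and precisionDemo() returns "3.14159 3.14".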

Unformatted insertions

As with input, unformatted insertions simply transfer bytes; no conversion is done. Also as with input, standard member functions are called rather than overloaded operators.

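Two functions are provided: put( c ), which inserts the single character c, and write( buf, n ), which inserts n bytes starting at buf. Both return the stream.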
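The sketch below uses put() and write() on an ostringstream, then writes an array of a hypothetical trivially copyable struct, Sample, as raw bytes and reads it back with read(), illustrating the crude binary serialization described earlier for input. The function names are also illustrative.

```cpp
#include <cstring>
#include <ios>
#include <sstream>
#include <string>

std::string putAndWrite() {
    std::ostringstream out;
    out.put('H').put('i');                 // put returns the stream, so calls chain
    const char text[] = ", there";
    out.write(text, static_cast<std::streamsize>(std::strlen(text)));
    return out.str();
}

// Hypothetical record type; raw byte copying is only meaningful for
// trivially copyable types read back with the same layout.
struct Sample {
    int id;
    double value;
};

bool binaryRoundTrip() {
    const Sample original[2] = {{1, 2.5}, {2, -1.0}};
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    buf.write(reinterpret_cast<const char *>(original),
              static_cast<std::streamsize>(sizeof original));
    Sample copy[2] = {};
    buf.read(reinterpret_cast<char *>(copy), static_cast<std::streamsize>(sizeof copy));
    return !buf.fail() && copy[0].id == 1 && copy[0].value == 2.5 &&
           copy[1].id == 2 && copy[1].value == -1.0;
}
```

putAndWrite() returns "Hi, there", and binaryRoundTrip() returns true.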

The fstream classes

The <fstream> collection of classes adds functionality to access files. There are three classes of interest: ifstream (input), ofstream (output), and fstream (both input and output).

Constructors, opening and closing files

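The default constructors

ifstream();
ofstream();
fstream();

each create an unopened stream of the specified type. The open function may be called later to associate the stream with a file.

The constructors

ifstream( const char *name, ios::openmode mode = ios::in );
ofstream( const char *name, ios::openmode mode = ios::out );
fstream( const char *name, ios::openmode mode = ios::in | ios::out );

create a stream of the appropriate type and then call open (see below) with the given parameters to open the file. ios::failbit is set if the open failed.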

The function

void open( const char *name, ios::openmode mode );

attempts to open the given file and attach it to the stream. If the open fails, failbit is set; the is_open() member function reports whether a file is currently attached.

Like the constructors, open supplies a default mode appropriate to the type of stream; if you give a mode explicitly, be sure it matches the type of the stream.

The function

close();

closes the file associated with a stream. The stream can be reopened with another file after this.
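A minimal sketch of the open/close cycle, assuming the program may create a file in the current directory; the file names used with it are hypothetical. It writes two numbers with an ofstream, closes it, reads them back with an ifstream, and then deletes the file with std::remove from <cstdio>:

```cpp
#include <cstdio>
#include <fstream>

int writeThenReadSum(const char *name) {
    std::ofstream out(name);            // mode defaults to ios::out
    if (!out) return -1;                // failbit is set if the file could not be created
    out << 19 << ' ' << 23 << '\n';
    out.close();                        // flush the buffer and release the file

    std::ifstream in(name);             // mode defaults to ios::in
    int a = 0, b = 0;
    bool ok = static_cast<bool>(in >> a >> b);
    in.close();
    std::remove(name);                  // delete the temporary file
    return ok ? a + b : -1;
}

// Opening a file that does not exist sets failbit.
bool openFails(const char *name) {
    std::ifstream in;                   // unopened stream
    in.open(name);
    return in.fail() && !in.is_open();
}
```

writeThenReadSum("iostream_demo.txt") returns 42, and openFails returns true for a name that does not exist.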

Using fstreams

The fstream classes inherit from istream and ostream (ifstream from istream, ofstream from ostream, and fstream from iostream), so you use the same techniques to read and write fstreams as you use on other istreams and ostreams.

Random access I/O can be achieved with the following functions. Note that these are actually implemented in istream and ostream, and can be used on any stream that is associated with a seekable device (e.g. a file, not a keyboard).

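istream &istream::seekg( streamoff offset, ios::seekdir where );

ostream &ostream::seekp( streamoff offset, ios::seekdir where );

The where argument is ios::beg, ios::cur, or ios::end, and the offset is measured from that point. The current read and write positions are returned by tellg() and tellp().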
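Because a stringstream is seekable, it can stand in for a file in a small seekg/seekp sketch (seekDemo is a hypothetical name):

```cpp
#include <ios>
#include <sstream>
#include <string>

std::string seekDemo() {
    std::stringstream s("0123456789");
    s.seekg(4, std::ios::beg);          // read position: offset 4 from the start
    char c1 = static_cast<char>(s.get());
    s.seekg(-2, std::ios::end);         // read position: two before the end
    char c2 = static_cast<char>(s.get());
    s.seekp(0, std::ios::beg);          // write position: the first character
    s.put('X');                         // overwrite it
    return std::string(1, c1) + c2 + s.str();
}
```

The function returns "48X123456789": the characters at offset 4 and two before the end, followed by the contents after 'X' overwrote the first character.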

The String Stream classes

The strstream collection of classes adds functionality to do “input” and “output” from/to arrays of characters in memory. These classes are deprecated in standard C++; the <sstream> classes, which use a std::string as the array, provide the same functionality and are preferred in new code.

Note that no actual I/O is done: the character array acts as the buffer, but there is nowhere else for the characters to go. However, the conversions performed are exactly the same as those we have already covered.

This provides some interesting new ways to process input and output.

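There are three classes of interest: istringstream (input), ostringstream (output), and stringstream (both), declared in <sstream>. Their deprecated counterparts in <strstream> are istrstream, ostrstream, and strstream.

Using string streams

One reason to use an ostringstream (or ostrstream) is to convert a value from its internal form (binary) to its external form (characters) without actually doing any output.

One reason to use an istringstream (or istrstream) is to do conversions on data that have already been read in.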
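A sketch of both uses with the <sstream> classes (toText and parseInt are hypothetical names):

```cpp
#include <sstream>
#include <string>

// Internal (binary) form to external (character) form, with no output done.
std::string toText(double value) {
    std::ostringstream out;
    out << "value=" << value;
    return out.str();
}

// Characters already read in, converted back to binary form.
bool parseInt(const std::string &text, int &result) {
    std::istringstream in(text);
    return static_cast<bool>(in >> result);
}
```

toText(2.5) returns "value=2.5"; parseInt("123", r) succeeds with r equal to 123, while parseInt("abc", r) fails.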

Example: consider a program whose input is supposed to contain four values on each line, like this:

1 2 3 4
5 6 7 8
9 10 11 12

Consider the obvious approach:

cin >> a >> b >> c >> d;

The extractions will skip white space automatically, including newlines. If the input is erroneous, for example (the second line is missing a value):

1 2 3 4
5 6 7
9 10 11 12

you won't be able to detect it: d simply receives the 9 from the following line, and all the output from the erroneous line to the end of the file will be incorrect.

The alternative is to read each line into a character array (or a std::string), and then use an istrstream (or an istringstream) to convert the values. If we run out of values on the line, the last extraction will fail rather than automatically going on to the next line.
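A sketch of the line-at-a-time approach. Instead of a character array and istrstream, it assumes std::getline into a std::string and an istringstream per line; findBadLines is a hypothetical name. A line is reported if it has too few values, a non-numeric value, or extra text after the fourth value:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based numbers of the lines that do not hold exactly four ints.
std::vector<int> findBadLines(std::istream &input) {
    std::vector<int> bad;
    std::string line;
    int lineNo = 0;
    while (std::getline(input, line)) {
        ++lineNo;
        std::istringstream fields(line);   // extractions cannot run past this line
        int a, b, c, d;
        char extra;
        if (!(fields >> a >> b >> c >> d) || (fields >> extra)) {
            bad.push_back(lineNo);
        }
    }
    return bad;
}
```

For the erroneous input above, only line 2 is reported.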

Writing >> and << operators for your classes

You can write functions for your classes to do extractions and insertions on those objects.

To do this, you overload the >> and << operators.

First, remember how operator functions work. Consider a class Point that contains two data members, x and y, and a Point object called center.

The compiler will try to translate

cout << center;

to a call to a member function, like this:

cout.operator<<( center );

However, this won't work because the author of the ostream class could not have predicted that you would write a class called Point, and therefore could not have provided the appropriate member function for you.

Therefore, the compiler also looks for a non-member function, which it calls like this:

operator<<( cout, center );

Note that this cannot be a member function in class Point because the first argument is not the Point object.

Writing an insertor

Here is how you write this function.

  1. The first argument should be a reference to an ostream. The second argument should be a constant reference to a Point.
  2. The function should return a reference to an ostream so that insertions can be chained like this:

    cout << "Center=" << center << ", radius=" << radius << endl;
  3. The body of the function should perform whatever output is appropriate for a Point, but nothing more! Don't do any additional formatting or labelling unless it is appropriate for all Points that you ever want to print.
  4. If you need to access the data members of the Point directly (e.g. you don't have accessor functions to all of them), then the Point class must declare this function to be a friend, like so

    friend ostream &operator<<( ostream &out, const Point &p );

Here is an example of an insertor. What does its output look like?

ostream &operator<<( ostream &out, const Point &p ){
	out << '(' << p.x << ',' << p.y << ')';
	return out;
}

Writing an extractor

Extractors are more interesting because of the possibility of errors in the input.

The same basic steps listed above are followed, with istream in place of ostream, except that the second argument to the function cannot be const, because the extractor must modify the object.

The challenge comes in error handling. What do you do if one of the elementary reads fails part way through the execution of your extractor?

C++ insertors and extractors could actually be designed to work like serialization of objects. However, for insertors (output) this would contradict the idea that they should work more like Java's toString(), i.e., the output should be informative to a user. A human reader could probably not look at a file of serialized Java objects and comprehend it without a great deal of tedious work.

For additional information on this topic, see section 21.3 in the Stroustrup text.


Examples