Java Streams
Like all programming languages, Java has a way for programs to make use
of the data streams provided by the operating system. Java's streams are
defined in the java.io package.
Here are references to several online book chapters that review using Java streams, mostly for file I/O.
- Chapter 11: Files and Streams (PDF) from Java Java Java
- Chapter 2, Input and Output from Core Java, Volume II--Advanced Features, 11th Edition
- Section 11.1, I/O Streams (PDF) (Source code)
- Chapter 7, Input/Output from Java Language Features
- Chapter 11, Saving and Loading Information (code) from Carleton University
- Chapter 6, Streams and File/Device I/O (code) from Carleton University
- I/O Streams from the Java Tutorials
All the example code mentioned in this document is in the subfolder called "java_streams" from the following zip file.
I/O streams vs. Functional streams
The Java language now has two types of objects that are called
"streams". The two types are very different from each other with entirely
different uses. First, there are the traditional I/O streams that we discuss
in this document. Second, starting in Java 8, Java defined a Stream interface
that implements the Stream abstract data type (ADT) from functional
programming languages. The new Stream interface is not for doing I/O. It
provides a modern way to process data structures from the Java
Collections Framework.
Here are links to the documentation for the package of "functional streams" and the package of "I/O streams".
Here are the basic "stream" types. You can see that the
java.util.stream.Stream interface is nothing like the java.io.InputStream
or java.io.OutputStream classes.
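To make the contrast concrete, here is a small sketch that is not part of the example code. The class name StreamKinds and its two helper methods are made up for illustration. The first method uses an I/O stream to pull bytes from a source. The second uses a functional stream to process a collection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.stream.Collectors;

class StreamKinds
{
   // An I/O stream: pull bytes, one at a time, until read() returns -1.
   static int sumBytes(final InputStream in) throws IOException
   {
      int sum = 0;
      int b;
      while (-1 != (b = in.read()))
      {
         sum += b;
      }
      return sum;
   }

   // A functional stream: a pipeline of operations over a collection.
   static List<Integer> evenSquares(final List<Integer> numbers)
   {
      return numbers.stream()
                    .filter(n -> n % 2 == 0)
                    .map(n -> n * n)
                    .collect(Collectors.toList());
   }

   public static void main(String[] args) throws IOException
   {
      final InputStream in = new ByteArrayInputStream(new byte[]{1, 2, 3, 4});
      System.out.println(sumBytes(in));                     // 10
      System.out.println(evenSquares(List.of(1, 2, 3, 4))); // [4, 16]
   }
}
```

The only thing the two methods share is the word "stream". Nothing in the second method reads from or writes to the outside world.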
The Stream abstract data type is becoming an important part of modern programming languages. It plays a big part in modern Java.
- https://en.wikipedia.org/wiki/Stream_(abstract_data_type)
- https://dev.java/learn/api/streams/
- https://www.baeldung.com/java-8-streams
- https://link.springer.com/content/pdf/10.1007/978-1-4842-7080-6_8
- https://link.springer.com/content/pdf/10.1007/978-1-4842-7135-3_6
Basic I/O streams
We have seen that all the data that enters a Java process must go through
an InputStream object and all the data that leaves a Java process must
go through an OutputStream object. Both InputStream and OutputStream
objects can only work with bytes of data. No other data type can be read
or written by these objects.
We have seen how we can read or write primitive data types over streams
by making repeated calls to the read() or write() method and by using
a ByteBuffer object to convert primitive data type values into
byte arrays, and back.
In theory, we can do all the I/O a Java program needs using only
InputStream and OutputStream objects. But our code would be
tedious to write and difficult to maintain. The designers of the Java
language found an elegant way to solve this problem.
We mentioned earlier that it is not a good design to make InputStream and
OutputStream more complicated by giving them a bunch of methods for reading
and writing other data types. How many data types should be coded directly into
these two classes? What happens when a new data type is developed? How would it
be integrated into these two stream classes?
Java solves this problem by designing a layered system of stream classes. Every stream class is given a single responsibility, or task. We solve complex I/O problems by "wrapping" stream objects around other stream objects, creating a kind of chain of stream objects, with each stream object doing one specific I/O task for us. The whole chain of stream objects solves our I/O problem. This layered system is very flexible and adaptive. It is straightforward to create a stream class that implements some new I/O responsibility and then combine the new kind of stream object with other stream objects.
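As a sketch of what "creating a stream class with a new responsibility" might look like, here is a made-up class, UpperCaseOutputStream, that is not part of Java or of the example code. It assumes ASCII text and has one job, converting lowercase letters to uppercase. Everything else is delegated to whatever stream it is wrapped around.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stream class with a single responsibility:
// convert lowercase ASCII letters to uppercase on their way out.
class UpperCaseOutputStream extends FilterOutputStream
{
   UpperCaseOutputStream(final OutputStream out)
   {
      super(out); // FilterOutputStream stores the reference to the inner stream
   }

   @Override
   public void write(final int b) throws IOException
   {
      final int c = ('a' <= b && b <= 'z') ? b - ('a' - 'A') : b;
      out.write(c); // the inner stream does the actual writing
   }

   // Demo helper: send a string through the wrapper into a byte array.
   static String shout(final String text) throws IOException
   {
      final ByteArrayOutputStream sink = new ByteArrayOutputStream();
      try (OutputStream out = new UpperCaseOutputStream(sink))
      {
         out.write(text.getBytes(StandardCharsets.US_ASCII));
      }
      return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(shout("hello, streams")); // HELLO, STREAMS
   }
}
```

Because the new class only knows about the OutputStream interface, it can be wrapped around a file stream, a buffered stream, or any other output stream without changes.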
Let us look at a simple example. Recall that in the "data_formats" folder
there's a pair of programs called "IntClient.java" and "IntServer.java".
The "IntServer.java" program writes, to its standard output, the four byte
values that make up an int value. The "IntClient.java" program reads four
byte values from its standard input and interprets those bytes as an int
value. Each program in this pair uses a ByteBuffer object to translate
between byte and int values.
Here is the code from "IntClient.java" that reads four bytes and uses
a ByteBuffer to interpret those bytes as an int value.
final int b0 = System.in.read();
final int b1 = System.in.read();
final int b2 = System.in.read();
final int b3 = System.in.read();
final byte[] bytes = {(byte)b0,
                      (byte)b1,
                      (byte)b2,
                      (byte)b3};
final int n = ByteBuffer.wrap(bytes)
                        .getInt();
Java has a stream class called DataInputStream that can do the byte
translation that ByteBuffer does.
Here is the code that uses a DataInputStream to read four bytes and
interpret those bytes as an int value (this code is in the
"IntClient.java" program in the "java_streams" folder).
final DataInputStream in = new DataInputStream( System.in );
final int n = in.readInt();
Notice how we no longer call read() directly on System.in. Instead we "wrap"
a DataInputStream object around the System.in object. Then we call the
readInt() method on the DataInputStream object. That object, which has a
reference to the System.in object, calls the read() method in System.in
four times to get four bytes of data, puts the four bytes together as a single
int and then returns the int value to us.
This code uses two kinds of stream objects, each with its own responsibility.
The System.in object has the responsibility of actually getting byte data
from the source of bytes. The DataInputStream object has the responsibility
of grouping those bytes into our desired data type. The resulting code is easy
to read and maintain.
We can draw a picture that helps us understand this "layered" stream design. Here is our usual picture of a Java process with its three standard streams.
                  IntClient
         +-------------------------+
         |                         |
>---->> System.in          System.out >>---->
         |                         |
         |                 System.err >>---->
         |                         |
         +-------------------------+
The next picture visualizes the idea that a DataInputStream object is
"wrapped" around the System.in object.
                               IntClient
            +---------------------------------------------+
            |       +-----------------+                   |
            |       | DataInputStream |                   |
      +-----+-------+                 |                   |
>---->>  System.in  >>                >> in       System.out >>---->
      +-----+-------+                 |                   |
            |       |                 |                   |
            |       +-----------------+           System.err >>---->
            |                                             |
            +---------------------------------------------+
The System.in object gets data from outside the process. The DataInputStream
object gets data from the System.in object it is "wrapped" around.
We do not in fact "wrap" one object around another object. What we really do is give the "outer" object a reference to the "inner" object. When we make this constructor call,
final DataInputStream in = new DataInputStream( System.in );
we are passing a reference to the System.in object to the DataInputStream
object that we are constructing. The DataInputStream object stores that
reference value in some instance field of the object. When we call readInt()
on the DataInputStream object, it uses its reference to the System.in
object to call the read() method in System.in.
Every time we call in.readInt() on the DataInputStream object, readInt()
calls the read() method on its System.in object four times to read four
bytes. Then readInt() converts those four bytes into an int value and
returns that int to us.
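The conversion step can be sketched by hand. The method readIntByHand() below is a made-up stand-in, not the library's source code. It follows the documented behavior of readInt(), where the first byte read becomes the most significant byte of the int.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class AssembleInt
{
   // A hand-written stand-in for readInt(): four reads, then bit shifts.
   static int readIntByHand(final InputStream in) throws IOException
   {
      final int b0 = in.read(); // most significant byte comes first
      final int b1 = in.read();
      final int b2 = in.read();
      final int b3 = in.read();
      if ((b0 | b1 | b2 | b3) < 0) // some call to read() returned -1
      {
         throw new EOFException();
      }
      return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
   }

   public static void main(String[] args) throws IOException
   {
      final byte[] data = {0x12, 0x34, 0x56, 0x78};
      final int byHand = readIntByHand(new ByteArrayInputStream(data));
      final int byLibrary = new DataInputStream(new ByteArrayInputStream(data)).readInt();
      System.out.println(Integer.toHexString(byHand)); // 12345678
      System.out.println(byHand == byLibrary);         // true
   }
}
```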
This picture leads to another concept. The "stream" metaphor implies that data
bytes "flow" like a stream of water. We can even use a "pipe" to carry this flow.
But this metaphor can fool you. Data bytes do not "flow" into the System.in
stream from their source. Rather, data bytes are "pulled" into System.in from
their source. The data bytes come from some source (maybe the keyboard, maybe
a file). We must call the read() method in System.in in order for it to
pull a data byte from the source. If we never call read(), then no data
bytes will ever leave the source.
When we call readInt() on the in object, we are "pulling" an int value
out of the DataInputStream stream. It needs to "pull" four bytes out of its
System.in stream. System.in pulls those bytes out of whatever is its data
source.
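We can watch the "pull" happen with a small experiment. This is only a sketch. It uses a ByteArrayInputStream holding eight bytes as a stand-in for the data source, and the class name PullDemo is made up. The available() method tells us how many bytes are still sitting in the source.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class PullDemo
{
   // Returns how many bytes remain in the source before and after one readInt().
   static int[] bytesLeftAroundReadInt() throws IOException
   {
      final ByteArrayInputStream source = new ByteArrayInputStream(new byte[8]);
      final DataInputStream in = new DataInputStream(source);
      final int before = source.available(); // wrapping the source pulled nothing
      in.readInt();                          // pulls four bytes through the chain
      final int after = source.available();
      return new int[]{before, after};
   }

   public static void main(String[] args) throws IOException
   {
      final int[] left = bytesLeftAroundReadInt();
      System.out.println(left[0] + " bytes, then " + left[1] + " bytes"); // 8 bytes, then 4 bytes
   }
}
```

Constructing the DataInputStream moves no data. Bytes leave the source only when a read method is called on the outermost stream.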
Let's look at an example that works in the other direction and converts
an int value into four byte values.
Here is the code from the "IntServer.java" program in the "data_formats"
folder. It reads an integer from its standard input as a string of digits,
uses a ByteBuffer to convert the int value into four byte values,
and then writes the four bytes to its standard output.
final int n = new java.util.Scanner(System.in).nextInt();
byte[] bytes = ByteBuffer.allocate(Integer.BYTES)
                         .putInt(n)
                         .array();
System.out.write(bytes[0]);
System.out.write(bytes[1]);
System.out.write(bytes[2]);
System.out.write(bytes[3]);
System.out.flush();
Java has a DataOutputStream class that lets us write primitive values as
sequences of byte values. Let us replace the ByteBuffer in the previous
code with a DataOutputStream.
final Scanner in = new Scanner(System.in);
final DataOutputStream out = new DataOutputStream(System.out);
final int i = in.nextInt();
out.writeInt(i);
out.flush();
Notice how we no longer call write() directly on System.out. Instead we
"wrap" a DataOutputStream object around the System.out object. Then we
call the writeInt() method on the DataOutputStream object. That object
takes our int value, breaks it up into its four byte values, then, using
its reference to the System.out object, calls the write() method in
System.out four times to put the four bytes into their destination.
We are again using two kinds of stream objects, each with its own responsibility.
The DataOutputStream object has the responsibility of destructuring the
given data type into the appropriate number of bytes. The System.out object
has the responsibility of actually putting byte data into its destination.
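To see the four bytes that writeInt() produces, we can wrap the DataOutputStream around a ByteArrayOutputStream instead of System.out. This is a sketch for experimenting, with a made-up class name WriteIntDemo. It is not code from the "java_streams" folder.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

class WriteIntDemo
{
   // Break an int into the four bytes that writeInt() sends to its inner stream.
   static byte[] toBytes(final int n) throws IOException
   {
      final ByteArrayOutputStream sink = new ByteArrayOutputStream();
      final DataOutputStream out = new DataOutputStream(sink);
      out.writeInt(n);
      out.flush();
      return sink.toByteArray();
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(Arrays.toString(toBytes(258))); // [0, 0, 1, 2]
   }
}
```

The number 258 is 0x00000102, so the most significant byte is written first. That is the same byte order that ByteBuffer uses by default.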
Notice that we are also using two other stream objects, the System.in
object and a Scanner object. The Scanner object is "wrapped" around the
System.in object. As before, the responsibility of System.in is to pull
bytes out of the data source. The responsibility of Scanner is to build
primitive values out of their representations as a string of digits. When we
call in.nextInt() on the Scanner object, it pulls enough bytes out of the
System.in object to build up an integer string, and then parses that string
into an int value. Notice that the Scanner object does not know how many bytes it might
need. The number "423123456" needs a lot more bytes than the number "42".
The next picture visualizes the idea that a DataOutputStream object is
"wrapped" around the System.out object, and a Scanner object is
"wrapped" around the System.in object.
                                        IntServer
            +---------------------------------------------------------------+
            |       +----------------+          +------------------+        |
            |       |    Scanner     |          | DataOutputStream |        |
      +-----+-------+                |          |                  +--------+-----+
>---->>  System.in  >>               >> in out >>                 >>  System.out  >>---->
      +-----+-------+                |          |                  +--------+-----+
            |       |                |          |                  |        |
            |       +----------------+          +------------------+        |
            |                                                               |
            |                                                       System.err >>---->
            |                                                               |
            +---------------------------------------------------------------+
When we call out.writeInt(), the DataOutputStream object takes our int
value and reduces it to four byte values. Then it uses its reference to the
System.out object to call the write() method four times. Each time it
calls write(), one byte value is put into the data destination.
Let's look carefully at how System.in is being used in the last two programs,
"IntClient.java" and "IntServer.java".
The "IntClient.java" program wraps a DataInputStream around System.in.
final DataInputStream in = new DataInputStream( System.in );
The "IntServer.java" program wraps a Scanner around System.in.
final Scanner in = new Scanner( System.in );
In both cases we use the name in to refer to the outer stream object.
In "IntClient.java" we call the in.readInt() method from the
DataInputStream class.
In "IntServer.java" we call the in.nextInt() method from the
Scanner class.
These two methods, readInt() and nextInt(), have similar names, and
somewhat similar responsibilities, but they behave in very different ways.
The readInt() method from DataInputStream reads exactly four bytes from
its inner (contained) input stream and bundles those four bytes into an int
value and then returns that int. Since any four byte values make up a
valid int value, the readInt() method cannot fail unless its inner
stream cannot produce four byte values. In that case, readInt() throws
an EOFException.
The nextInt() method from the Scanner class reads bytes from its inner
stream until it has enough bytes to build a String token. Then it tries to
parse that token into an int value. If the parsing succeeds, the resulting
int value is returned. If the parsing fails, then the nextInt() method
throws an InputMismatchException. This can happen for several reasons. The
token may not represent an integer number or the token may represent an
integer number that is too big to fit in an int value. If the inner stream
runs out of bytes while nextInt() is building a token, then nextInt()
just uses what it has so far as the token and tries to parse it. If the inner
stream has no data bytes, so nextInt() cannot build any String token,
then nextInt() throws NoSuchElementException. Notice that nextInt()
never denotes end-of-file. That is part of the responsibility of the
hasNextInt() method, which should be called before calling nextInt().
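The following sketch exercises each of these cases. It uses in-memory byte arrays in place of System.in, and the class name IntReadFailures and its helper methods are made up for this illustration. Each helper returns either the value that was read or the name of the exception that was thrown.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.InputMismatchException;
import java.util.NoSuchElementException;
import java.util.Scanner;

class IntReadFailures
{
   static String tryReadInt(final byte[] data)
   {
      try
      {
         return String.valueOf(new DataInputStream(new ByteArrayInputStream(data)).readInt());
      }
      catch (EOFException e) { return "EOFException"; }
      catch (IOException e)  { return "IOException"; }
   }

   static String tryNextInt(final String text)
   {
      final byte[] data = text.getBytes(StandardCharsets.US_ASCII);
      try
      {
         return String.valueOf(new Scanner(new ByteArrayInputStream(data)).nextInt());
      }
      // InputMismatchException is a subclass of NoSuchElementException,
      // so it must be caught first.
      catch (InputMismatchException e) { return "InputMismatchException"; }
      catch (NoSuchElementException e) { return "NoSuchElementException"; }
   }

   public static void main(String[] args)
   {
      System.out.println(tryReadInt(new byte[]{0, 0, 0, 42})); // 42
      System.out.println(tryReadInt(new byte[]{0, 0}));        // EOFException
      System.out.println(tryNextInt("42"));                    // 42
      System.out.println(tryNextInt("forty-two"));             // InputMismatchException
      System.out.println(tryNextInt("99999999999"));           // InputMismatchException
      System.out.println(tryNextInt(""));                      // NoSuchElementException
   }
}
```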
Let's look at one more example. In the "java_streams" folder there is the
"IntServerN.java" program. It reads up to "N" integers (as strings of digits)
from its standard input stream and writes their equivalent byte values to
its standard output stream. If the value of "N" is very large, then this
program will write a lot of data to standard output. We can make the program
run faster by giving it a large output buffer.
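final var in = new Scanner(System.in);
final var out = new DataOutputStream(
                   new BufferedOutputStream(
                      System.out, 4096));
// The loop counter is named k so that it does not clash with
// the variable i that holds each integer read from the input.
for (int k = 0; k < n; ++k)
{
   final int i = in.nextInt();
   out.writeInt(i);
}
out.close(); // closing the outer stream flushes the buffer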
Here is a picture illustrating this composition of three output streams.
                                           IntServerN
            +-----------------------------------------------------------------------+
            |                                 +----------------------+              |
            |                                 |   DataOutputStream   |              |
            |       +--------------+          |     +----------------+-----+        |
            |       |   Scanner    |          |     | BufferedOutputStream |        |
      +-----+-------+              |          |     |               +---+  +--------+---+
>---->>  System.in  >>             >> in out >>    >>               |buf| >> System.out >>---->
      +-----+-------+              |          |     |               |   |  +--------+---+
            |       |              |          |     |               +---+  |        |
            |       +--------------+          |     +----------------+-----+        |
            |                                 |                      |              |
            |                                 +----------------------+              |
            |                                                                       |
            |                                                               System.err >>---->
            |                                                                       |
            +-----------------------------------------------------------------------+
When we call out.writeInt(i) on the DataOutputStream object, the writeInt()
method takes the int value and breaks it up into four byte values. Then it
calls the write() method four times on the BufferedOutputStream object. Each
call to write() deposits a byte value in the internal buffer array, called
buf in the picture. When the buf array becomes full, the write() method in
BufferedOutputStream calls the write(byte[], int, int) method on System.out, which
sends the whole array of data to its destination. This final write is usually an
operating system call to write to a file. Operating system calls are slow, so it
helps to minimize how many of them we make. That is what the buf array in
BufferedOutputStream accomplishes. Without the BufferedOutputStream object
between the DataOutputStream object and the System.out object, every time
writeInt() writes a single byte to System.out, that write causes an
operating system call to write to a file. The buf array, since it has size
4,096 bytes, lets as many as 4,096 byte writes share a single operating
system call. It's not unusual for the buffered version of this program to be
10 times faster than the unbuffered version.
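We can get a feel for the savings without timing anything. The sketch below replaces System.out with a made-up CountingSink class that throws the data away and only counts how often it is asked to write. The exact counts depend on the JDK version, since newer versions of DataOutputStream hand the inner stream four bytes at a time, so the code only compares the two counts.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class WriteCallCounter
{
   // Hypothetical stand-in for a slow destination: it discards the data
   // and counts how many times it is asked to write.
   static class CountingSink extends OutputStream
   {
      int calls = 0;

      @Override
      public void write(final int b) { ++calls; }

      @Override
      public void write(final byte[] b, final int off, final int len) { ++calls; }
   }

   // Write howManyInts int values and report how many writes reached the sink.
   static int countCalls(final int howManyInts, final boolean buffered) throws IOException
   {
      final CountingSink sink = new CountingSink();
      final OutputStream inner = buffered ? new BufferedOutputStream(sink, 4096)
                                          : sink;
      final DataOutputStream out = new DataOutputStream(inner);
      for (int i = 0; i < howManyInts; ++i)
      {
         out.writeInt(i);
      }
      out.flush();
      return sink.calls;
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println("unbuffered writes: " + countCalls(100_000, false));
      System.out.println("buffered writes:   " + countCalls(100_000, true));
   }
}
```

With the buffer in place, the sink should see roughly one write for every 4,096 bytes of output instead of one write for every few bytes.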
Exercise: Explain what is wrong with this stream composition.
final var out = new BufferedOutputStream(
                   new DataOutputStream(
                      System.out), 4096);
Recall that in the "data_formats" folder there's the program "CreateData.java"
that writes 16 bytes of "raw" data to standard output, and a sequence of
programs called "ReadDataAs__.java" that each read 16 bytes from their
standard input stream and interpret the bytes as a stream of one particular
primitive data type. The "ReadDataAs__.java" programs in the "data_formats"
folder each use a ByteBuffer object to translate the "raw" bytes from
"CreateData.java" into the appropriate primitive data type.
In the folder "java_streams" there is a copy of "CreateData.java" and
another sequence of programs called "ReadDataAs__.java". These programs in
the "java_streams" folder each use a DataInputStream object to translate
the "raw" byte data from "CreateData.java" into the appropriate primitive
data type.
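Here is a sketch of the idea behind these programs. It does not reproduce the actual "CreateData.java" or "ReadDataAs__.java" code. The class name ReadDataAs and the sample bytes 0 through 15 are assumptions made for this illustration. The same 16 bytes are read once as four int values and once as two long values.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class ReadDataAs
{
   // Hypothetical "raw" data: the sixteen byte values 0, 1, 2, ..., 15.
   static byte[] sixteenBytes()
   {
      final byte[] raw = new byte[16];
      for (int i = 0; i < raw.length; ++i)
      {
         raw[i] = (byte)i;
      }
      return raw;
   }

   static int[] asInts(final byte[] raw) throws IOException
   {
      final DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
      final int[] result = new int[raw.length / Integer.BYTES];
      for (int i = 0; i < result.length; ++i)
      {
         result[i] = in.readInt(); // four bytes per value
      }
      return result;
   }

   static long[] asLongs(final byte[] raw) throws IOException
   {
      final DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
      final long[] result = new long[raw.length / Long.BYTES];
      for (int i = 0; i < result.length; ++i)
      {
         result[i] = in.readLong(); // eight bytes per value
      }
      return result;
   }

   public static void main(String[] args) throws IOException
   {
      for (final int n : asInts(sixteenBytes()))
      {
         System.out.println(Integer.toHexString(n));
      }
      for (final long n : asLongs(sixteenBytes()))
      {
         System.out.println(Long.toHexString(n));
      }
   }
}
```

The bytes themselves carry no type information. The choice of read method decides how they are grouped and interpreted.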
Filter streams
Java has quite a few stream classes. The stream classes are organized to all
derive from four abstract base classes, InputStream, OutputStream, Reader,
and Writer.
Streams derived from InputStream and OutputStream are meant to handle
byte data (often referred to as "binary streams"). Streams derived from
Reader and Writer are meant to handle text data (strings).
Streams derived from InputStream and Reader are meant to handle data input.
Streams derived from OutputStream and Writer are meant to handle data output.
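The difference between a byte stream and a text stream shows up as soon as the text contains a non-ASCII character. In this sketch (the class name BytesVersusChars is made up) the same UTF-8 encoded data is counted once with an InputStream and once with a Reader. An InputStreamReader is the stream that translates from bytes to characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BytesVersusChars
{
   // Count the units delivered by a byte stream.
   static int countBytes(final byte[] data) throws IOException
   {
      final InputStream in = new ByteArrayInputStream(data);
      int count = 0;
      while (in.read() != -1)
      {
         ++count;
      }
      return count;
   }

   // Count the units delivered by a character stream over the same data.
   static int countChars(final byte[] data) throws IOException
   {
      final Reader in = new InputStreamReader(new ByteArrayInputStream(data),
                                              StandardCharsets.UTF_8);
      int count = 0;
      while (in.read() != -1)
      {
         ++count;
      }
      return count;
   }

   public static void main(String[] args) throws IOException
   {
      // "h\u00e9llo" is "hello" with an accented e, which needs two bytes in UTF-8.
      final byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
      System.out.println(countBytes(data)); // 6
      System.out.println(countChars(data)); // 5
   }
}
```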
Most stream classes have a single responsibility. An exception is
RandomAccessFile.
There are roughly three kinds of stream classes. Let's look at input streams (output streams are similar).
- There are input stream classes that cannot be instantiated, like InputStream, FilterInputStream, Reader, or FilterReader.
- There are input stream classes that must be instantiated by connecting them directly to a data source, like FileInputStream, FileReader, ByteArrayInputStream, CharArrayReader, or StringReader.
- There are input streams that can only be instantiated by wrapping them around some other input stream. You can see this in the documentation for their constructors. Their constructors all require another input stream object that the constructed stream can read data from. Examples are DataInputStream, BufferedInputStream, ObjectInputStream, InputStreamReader, and BufferedReader.
The first set of input streams are not useful because we can never create them. We study them because they document the methods that their subclasses inherit.
The second set of input streams represent sources of data. But this list is very incomplete. There are many sources of data for which there is no concrete input stream class. For example, the keyboard is a source of input data, but it is not represented by any of these concrete stream classes. Another example is a network connection, which is a source of input data, but is not represented by a concrete input stream class.
The third set of input streams is the most interesting set. These are the
streams that let us build up sophisticated input streams by composing multiple
kinds of streams that each provide some needed functionality. We saw examples
of this kind of composition above using DataInputStream and, on the output
side, DataOutputStream and BufferedOutputStream. There is another example
below using ZipOutputStream.
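Here is a sketch that uses one stream from the second kind and two streams from the third kind together. It is not from the "java_streams" folder. The class name LayeredInput and the temporary file are assumptions made for this illustration. A FileInputStream connects to the data source, a BufferedInputStream adds a buffer, and a DataInputStream groups bytes into an int.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LayeredInput
{
   // Write an int to a temporary file, then read it back through three layers.
   static int roundTrip(final int value) throws IOException
   {
      final Path file = Files.createTempFile("layered", ".bin");
      try
      {
         try (DataOutputStream out = new DataOutputStream(
                                        new BufferedOutputStream(
                                           new FileOutputStream(file.toFile()))))
         {
            out.writeInt(value);
         }
         try (DataInputStream in = new DataInputStream(     // groups bytes into an int
                                      new BufferedInputStream( // reduces system calls
                                         new FileInputStream(file.toFile())))) // the data source
         {
            return in.readInt();
         }
      }
      finally
      {
         Files.deleteIfExists(file);
      }
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(roundTrip(123456789)); // 123456789
   }
}
```

Closing the outermost stream closes the whole chain, which is why only the outer object appears in each try statement.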
Not all of Java's stream classes are in the java.io package. For example,
the ZipInputStream and ZipOutputStream classes are in the java.util.zip
package.
We can build an example using the ZipOutputStream class that shows how
versatile the Java stream model is. In the "java_streams" folder there is
the "ZipServer.java" program. This program wraps a ZipOutputStream around
the System.out stream. The program then writes to the zip stream all the
bytes it reads from its standard input stream. The ZipOutputStream object
compresses all the bytes it receives and writes the compressed data to the
System.out stream.
The code is simple. It looks like this.
final ZipOutputStream zip = new ZipOutputStream(System.out);
zip.putNextEntry(new ZipEntry("data"));
int b = 0;
while (-1 != (b = System.in.read()))
{
   final byte[] bytes = { (byte)b };
   zip.write(bytes, 0, 1);
}
zip.close();
Here is a picture that illustrates how simple this program's structure is.
All the logic for compressing data is in the object that "wraps" around
System.out.
                                 ZipServer
              +---------------------------------------------+
              |                    +-----------------+      |
              |                    | ZipOutputStream |      |
              |                    |        +--------+------|
data >---->> System.in        zip >>       >>  System.out   >>----> zipped data
              |                    |        +--------+------|
              |                    |                 |      |
              |                    +-----------------+      |
              |                                             |
              |                                     System.err >>---->
              |                                             |
              +---------------------------------------------+
This is a good example of how extensible the Java stream classes are. Imagine
if this zip feature needed to be added to a monolithic OutputStream class
that "conveniently" tries to do everything. Java's composable stream classes
make it easy to create new stream features. They also make it straightforward
to add a new feature to an existing stream. This design does add a layer of
complexity to using Java streams. You need to understand the overall design
of Java's stream classes to use them effectively.
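To check that the zip stream really is compressing, we can wrap a ZipOutputStream around a ByteArrayOutputStream and look at how many bytes come out the other end. This is a sketch with a made-up class name, ZipSizeDemo, and it uses highly repetitive sample data that compresses very well. Real data will usually compress less.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipSizeDemo
{
   // Return the number of bytes that reach the inner stream
   // when the given data is written through a ZipOutputStream.
   static int zippedSize(final byte[] data) throws IOException
   {
      final ByteArrayOutputStream sink = new ByteArrayOutputStream();
      try (ZipOutputStream zip = new ZipOutputStream(sink))
      {
         zip.putNextEntry(new ZipEntry("data"));
         zip.write(data, 0, data.length);
         zip.closeEntry();
      }
      return sink.size();
   }

   public static void main(String[] args) throws IOException
   {
      final byte[] data = new byte[100_000]; // one hundred thousand zero bytes
      System.out.println("original size: " + data.length);
      System.out.println("zipped size:   " + zippedSize(data));
   }
}
```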
Exercise: Modify "ZipServer.java" to be a program called "IntZipServer.java"
that reads integers as strings of digits from standard input (up until eof),
converts each int value to four bytes, compresses all the byte data, and
then writes the compressed data to standard output.
Exercise: Write a zip client program that uses a ZipInputStream to read
zipped data through the standard input stream and then write the uncompressed
data to the standard output stream.
Exercise: Explain the difference between the following stream compositions. Are they both useful? If so, is one more useful than the other?
final ZipOutputStream zip = new ZipOutputStream(
                               new BufferedOutputStream(
                                  System.out));

final BufferedOutputStream bos = new BufferedOutputStream(
                                    new ZipOutputStream(
                                       System.out));
Exercise: Explain the relationship between the uses of the word "filter"
in FilterInputStream (or FilterOutputStream) and a "filter process" that
reads from the standard input stream and writes to the standard output stream.
End-of-file (eof)
End-of-file means, more or less, that an input stream is in a state where no data can be read from it. This is not a good definition for eof, but I do not know a general, always correct definition for eof.
Java has four broad ways of telling us that an input stream is in the eof state. From a practical point of view, an input stream is in the eof state whenever the JVM tells us it is.
End-of-file is a state that any input stream can be in, even if the stream has nothing to do with reading from a file. An input stream that is not connected to a file can still have the end-of-file condition being true. It would be better if this state were called "end-of-stream". Some parts of the Java documentation do use the term end-of-stream instead of end-of-file.
End-of-file (end of stream) is much trickier than you might think. For example, an input stream can be in the end-of-file state and then something can change the stream's state and the stream is no longer in the eof state (see the program "End_of_file_Demo.java" in the "java_streams" folder). It would be wrong to say that eof means that you can no longer read data from a stream. While that is true for many input streams, it is not true for all of them.
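One way to see an eof state go away is with a file that grows while we are reading it. The sketch below is not the "End_of_file_Demo.java" program, which may use a different approach. The class name EofCanChange is made up, and the behavior shown is what we would expect on typical desktop operating systems. The stream reaches eof on an empty temporary file, another part of the program appends a byte to the file, and the same stream can then read that byte.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class EofCanChange
{
   // Returns the results of two read() calls on the same stream,
   // one before and one after the underlying file grows.
   static int[] readAppendRead() throws IOException
   {
      final Path file = Files.createTempFile("eof", ".bin"); // an empty file
      try (FileInputStream in = new FileInputStream(file.toFile()))
      {
         final int first = in.read();  // nothing to read, so eof
         Files.write(file, new byte[]{42}, StandardOpenOption.APPEND);
         final int second = in.read(); // the file grew, so a byte is available
         return new int[]{first, second};
      }
      finally
      {
         Files.deleteIfExists(file);
      }
   }

   public static void main(String[] args) throws IOException
   {
      final int[] results = readAppendRead();
      System.out.println(results[0] + " then " + results[1]);
   }
}
```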
Here are the four ways that methods in Java's I/O classes can denote that an input stream is in the eof state.
- The read() methods in InputStream, and all the classes derived from it, return -1.
- The hasNext() method in the Scanner class returns false.
- The readLine() method in the BufferedReader class returns null.
- The various versions of "read" in the DataInputStream class throw an EOFException.
- InputStream.read() returns -1
- Scanner.hasNext() returns false
- BufferedReader.readLine() returns null
- DataInputStream.readInt() throws EOFException
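All four signals can be seen side by side by reading from an empty data source. In this sketch the class name EofSignals and its helper methods are made up, and an empty byte array stands in for a stream that is already in the eof state.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Scanner;

class EofSignals
{
   // A fresh input stream with no data in it.
   static InputStream empty()
   {
      return new ByteArrayInputStream(new byte[0]);
   }

   static int viaRead() throws IOException
   {
      return empty().read(); // -1 denotes eof
   }

   static boolean viaHasNext()
   {
      return new Scanner(empty()).hasNext(); // false denotes eof
   }

   static String viaReadLine() throws IOException
   {
      return new BufferedReader(new InputStreamReader(empty())).readLine(); // null denotes eof
   }

   static boolean readIntThrowsEof() throws IOException
   {
      try
      {
         new DataInputStream(empty()).readInt();
         return false;
      }
      catch (EOFException e) // the exception denotes eof
      {
         return true;
      }
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(viaRead());          // -1
      System.out.println(viaHasNext());       // false
      System.out.println(viaReadLine());      // null
      System.out.println(readIntThrowsEof()); // true
   }
}
```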
Here are the most important cases where we can get end-of-file on an input stream.
- The input stream is connected to a file.
- The input stream is connected to the keyboard.
- The input stream is connected to a pipe.
- The input stream is connected to a network connection.
We will look at the details of each of these cases.