Streams

Processes use streams for all of their I/O (input/output) operations. Streams are an abstraction created by the operating system. A stream represents a sequence of bytes. The bytes can represent any kind of data, for example, text, images, video, audio. Processes use streams to move data into and out of hardware I/O devices (like the keyboard), files, or even other processes.

Streams are one-directional. A process can either read from a given stream or write to it, but not both. If a process can read from a stream, we say it is an "input stream". If a process can write to a stream, we say it is an "output stream".

In introductory programming courses, streams are mostly associated with files. A program reads or writes a stream of data from a file on a storage device. But we will see that streams are much more versatile. We will show how programs can read or write streams of data from other programs. In other words, we will see that streams can be used to implement Inter-process Communication.

The operating system creates the file and stream abstractions and then makes them available to programming languages for writing programs. Here are references to a few operating systems textbooks that explain how operating systems create these abstractions.

In this document we will emphasize how streams are used to build up complex command-lines. In another document we will look at how we can use the Java programming language to write code that uses streams.

Standard I/O Streams

When a process is created by the operating system, the process is always supplied with three open streams. These three streams are called the "standard streams". They are

  • standard input (stdin)
  • standard output (stdout)
  • standard error (stderr)

We can visualize a process as an object with three "connections" where data (bytes) can either flow into the process or flow out from the process.

                       process
                 +-----------------+
                 |                 |
        >------->> stdin    stdout >>------->
                 |                 |
                 |          stderr >>------->
                 |                 |
                 +-----------------+

A console application will usually have its stdin stream connected to the computer's keyboard and its stdout and stderr streams connected to the console window.

                       process
                 +-----------------+
                 |                 |
   keyboard >--->> stdin    stdout >>-----+---> console window
                 |                 |      |
                 |          stderr >>-----+
                 |                 |
                 +-----------------+

It is important to realize that the above picture is independent of the programming language used to write the program which is running in the process. Every process looks like this. It is up to each programming language to allow programs, written in that language, to make use of this setup provided by the operating system.

Every operating system has its own way of giving each process access to the internal data structures the operating system uses to keep track of what each standard stream is "connected" to.

The Linux operating system gives every process three file descriptors,

    #define  STDIN_FILENO   0
    #define  STDOUT_FILENO  1
    #define  STDERR_FILENO  2

Linux provides the read() and write() system calls to let a process read from and write to these file descriptors.
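As a rough illustration of working with file descriptors directly, the following Python sketch uses os.write() and os.read(), which are thin wrappers around the write() and read() system calls. The helper name write_all and the scratch file are assumptions made only for this example.

```python
import os
import tempfile

# Conventional descriptor numbers of the three standard streams.
STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO = 0, 1, 2


def write_all(fd, data):
    """Write every byte of data to the descriptor fd (hypothetical helper).

    os.write() may accept fewer bytes than it was given, so we loop.
    """
    written = 0
    while written < len(data):
        written += os.write(fd, data[written:])
    return written


# Open a scratch file at the descriptor level, write to it, and read it back.
scratch_fd, scratch_path = tempfile.mkstemp()
write_all(scratch_fd, b"hello, descriptors\n")
os.close(scratch_fd)

fd = os.open(scratch_path, os.O_RDONLY)
received = os.read(fd, 1024)          # read up to 1024 bytes
os.close(fd)
os.remove(scratch_path)

# The same helper works on the descriptor of a standard stream.
write_all(STDOUT_FILENO, received)
```

Notice that the code that writes to the scratch file and the code that writes to standard output are the same. Only the descriptor number changes.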

The Windows operating system gives every process three handles. We retrieve the handles using the GetStdHandle() function with one of these input parameters.

     STD_INPUT_HANDLE, STD_OUTPUT_HANDLE, STD_ERROR_HANDLE

Windows provides the ReadFile() and WriteFile() system calls to let a process read from and write to these handles.

Every programming language must have a way of representing the three standard streams and every language must provide a way to read from the standard input stream and a way to write to the standard output and standard error streams.

For example, here is how the three standard I/O streams are represented by some common programming languages.

    Java uses Stream objects.
      java.io.InputStream  System.in
      java.io.PrintStream  System.out
      java.io.PrintStream  System.err
    These are static fields in the java.lang.System class.

    Standard C uses pointers to FILE objects.
      FILE* stdin;
      FILE* stdout;
      FILE* stderr;
    These are defined in the stdio.h header file.

    Python uses text File objects.
      sys.stdin
      sys.stdout
      sys.stderr
    These are in the sys module.

    C++ uses stream objects.
      istream std::cin;
      ostream std::cout;
      ostream std::cerr;
    These are defined in the <iostream> header.

    .NET uses TextReader and TextWriter objects.
      System.IO.TextReader  Console.In
      System.IO.TextWriter  Console.Out
      System.IO.TextWriter  Console.Error
    These are static properties of the System.Console class.
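One way to see that these language-level objects sit on top of the operating system's streams is to ask them which descriptor they wrap. The Python sketch below starts a child Python process and has it report fileno() for each of its three standard stream objects. The function name standard_stream_numbers and the child program text are made up for this example.

```python
import subprocess
import sys

# Program text run by the child process: print the descriptor number
# behind each of its three standard stream objects.
CHILD_CODE = (
    "import sys; "
    "print(sys.stdin.fileno(), sys.stdout.fileno(), sys.stderr.fileno())"
)


def standard_stream_numbers():
    """Run a child Python process and return the numbers it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.DEVNULL,   # give the child a valid, empty stdin
        stdout=subprocess.PIPE,     # collect what the child prints
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.split()


print(standard_stream_numbers())
```

The child reports the numbers 0, 1, and 2, no matter what the parent connected those three streams to.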

Most programming languages define their basic I/O functions to automatically work with the standard input and output streams. For example, in almost every programming language, the basic print function writes to the standard output stream. The print function itself is written to use the write() system call in Linux or the WriteFile() system call in Windows.

The C language provides functions like getchar(), scanf(), and fscanf() to read from stdin and it provides printf() and fprintf() to write to stdout and stderr. On a Windows computer, the C language's printf() function will be implemented using Windows' WriteFile() system call with the STD_OUTPUT_HANDLE handle. On a Linux computer, the C language's printf() function will be implemented using Linux's write() system call with the STDOUT_FILENO file descriptor.
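The same division of labor can be sketched in Python, where print() writes to sys.stdout unless its file argument says otherwise. In the sketch below, the child program text and the helper name run_and_capture are hypothetical. The parent captures the two output streams separately only so that we can inspect what went where.

```python
import subprocess
import sys

# Program text for the child process.
CHILD_CODE = """
import sys
print("normal result")                           # standard output
print("something went wrong", file=sys.stderr)   # standard error
"""


def run_and_capture():
    """Run the child and return (stdout text, stderr text)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout, result.stderr


out, err = run_and_capture()
print("stdout carried:", out.strip())
print("stderr carried:", err.strip())
```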

Every operating system provides a way for processes to open new streams. For example, in the following picture, a process, while it was running, opened three new streams, two input streams and one output stream. All three streams are connected to files.

                        process
                +-----------------------+
                |                       |
  keyboard >--->> stdin          stdout >>-----+---> console window
                |                       |      |
                |                stderr >>-----+
                |                       |
                | in1     in2     out   |
                +-/|\-----/|\-----\ /---+
                   |       |       |
    input1.txt >---+       |       +----------> output.txt
                           |
            input2.txt >---+

This process can now read data from any of its three input streams and it can write data to any of its three output streams. For example, it might copy data from the two input files into the output file.

After the process has read all the data it needs from the file input1.txt, the process can close the stream.

                        process
                +-----------------------+
                |                       |
  keyboard >--->> stdin          stdout >>-----+---> console window
                |                       |      |
                |                stderr >>-----+
                |                       |
                |         in2     out   |
                +---------/|\-----\ /---+
                           |       |
                           |       +----------> output.txt
                           |
            input2.txt >---+

As long as a process is running, it can continue to open and close input and output streams. Opening and closing streams to files is what most introductory programming textbooks cover in their chapters on file I/O.
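The pictures above can be approximated by a short Python sketch that copies two input files into one output file. The function name copy_two_inputs is an assumption for this example, and, unlike the first picture, the sketch opens the second input file only after it has closed the first one.

```python
import os
import shutil
import tempfile


def copy_two_inputs(input1_path, input2_path, output_path):
    """Copy the bytes of two input files, in order, into one output file."""
    with open(output_path, "wb") as out:          # open the output stream
        with open(input1_path, "rb") as in1:      # open the first input stream
            shutil.copyfileobj(in1, out)
        # in1 is closed here, but out is still open.
        with open(input2_path, "rb") as in2:      # open the second input stream
            shutil.copyfileobj(in2, out)
    # All three file streams are closed here.


# Small demonstration using scratch files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, "input1.txt")
    second = os.path.join(folder, "input2.txt")
    combined = os.path.join(folder, "output.txt")
    with open(first, "wb") as f:
        f.write(b"first file\n")
    with open(second, "wb") as f:
        f.write(b"second file\n")
    copy_two_inputs(first, second, combined)
    with open(combined, "rb") as f:
        combined_bytes = f.read()

print(combined_bytes)
```

While this code runs, the process's three standard streams remain open and connected to wherever the parent process connected them.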

I/O Redirection

Every process is created by the operating system at the request of some other process, the parent process. When the parent process asks the operating system to create a child process, the parent must tell the operating system how to "connect" the child's three standard streams. The parent telling the operating system how to connect the child's three standard streams is usually referred to as I/O redirection.

At a shell command prompt, if we type a command like this,

    > foo > result.txt

then the shell program (cmd.exe on Windows, or bash on Linux) is the parent process. The above command tells the shell process to ask the operating system to create a child process from the foo program. But in addition to asking the operating system to create the child process, the shell process also instructs the operating system to redirect the child process's standard output to the file result.txt. So when the foo process runs, it looks like this.

                       foo
                +-----------------+
                |                 |
   keyboard >-->> stdin    stdout >>----> result.txt
                |                 |
                |          stderr >>----> console window
                |                 |
                +-----------------+

Stdin and stderr have their default connections, and stdout is redirected to the file result.txt.

If we type a command like this,

    > foo > result.txt < data.txt

then the shell process asks the operating system to create a child process from the foo program and it also asks the operating system to redirect the child process's standard output to the file result.txt and redirect the child process's standard input to the file data.txt. So when the foo process runs, it looks like this.

                        foo
                +-----------------+
                |                 |
   data.txt >-->> stdin    stdout >>----> result.txt
                |                 |
                |          stderr >>----> console window
                |                 |
                +-----------------+
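A parent process written in Python can request the same redirections that the shell requests. The following is a minimal sketch that assumes a stand-in for the foo program (a one-line Python program that copies its standard input to its standard output in upper case). The names FOO_CODE and run_foo_redirected are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the foo program: copy stdin to stdout in upper case.
FOO_CODE = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def run_foo_redirected(data_path, result_path):
    """Rough equivalent of the command-line:  foo < data.txt > result.txt"""
    with open(data_path, "rb") as data, open(result_path, "wb") as result:
        subprocess.run(
            [sys.executable, "-c", FOO_CODE],
            stdin=data,       # child's standard input comes from this file
            stdout=result,    # child's standard output goes to this file
            check=True,       # stderr is not mentioned, so it is inherited
        )


with tempfile.TemporaryDirectory() as folder:
    data_path = os.path.join(folder, "data.txt")
    result_path = os.path.join(folder, "result.txt")
    with open(data_path, "w") as f:
        f.write("hello\n")
    run_foo_redirected(data_path, result_path)
    with open(result_path) as f:
        result_text = f.read()

print(result_text, end="")
```

The stand-in foo program contains no file names at all. It only reads its standard input and writes its standard output.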

It is very important to know that the foo process is not told that its standard streams have been redirected. Unless it explicitly asks the operating system, the foo process cannot tell if its standard output stream is connected to the console window (the default connection) or if it is connected to some file. If standard output is connected to a file, then foo is doing file I/O without even knowing it. When the foo process calls the print function, it is "printing" on a file (which does not, literally, make sense). The function name "print" is a holdover from many years ago when a computer's output was always printed on paper. The name "print" for the default output function is misleading, since the modern print function has nothing to do with printing on paper. But, as we have mentioned many times, once a name is chosen for something in a programming language (in this case, an I/O function), the name is never changed, no matter how outdated it becomes.
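Strictly speaking, a program can ask the operating system whether one of its streams is connected to a terminal, even though most programs never do. In Python the question is asked with the isatty() method of a stream object. The sketch below runs a child whose standard output has been connected to something other than the console window. The helper name stdout_is_terminal_when_captured is hypothetical.

```python
import subprocess
import sys

# The child asks whether its standard output is a terminal and prints the answer.
CHILD_CODE = "import sys; print(sys.stdout.isatty())"


def stdout_is_terminal_when_captured():
    """Run the child with stdout redirected away from the console."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,   # not the console window
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(stdout_is_terminal_when_captured())
```

A program that never makes this kind of check behaves the same way whether or not its streams have been redirected.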

The order in which we place redirections in the command-line does not matter. The following two commands are equivalent.

    > foo > result.txt < data.txt
    > foo < data.txt > result.txt

When we use the input redirection operator, if the specified input file does not exist, then we get an error message and the command-line fails.

When we use the output redirection operator, if the specified output file does not exist, then the operating system creates an empty file for us with that name. However, be careful. If the specified output file does exist, then it is emptied of all its contents, and the command-line is given the empty file, so we lose any data that was in the specified output file.

There is a very useful alternative to the > output redirection operator. The >> append output redirection operator will, like >, create the specified output file if it does not exist, but instead of emptying an existing output file, this operator writes new data at the end of the previous data in the output file. One important use of this operator is for one file to accumulate results from several command-lines.
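The difference between the two operators corresponds to how the output file is opened before the child process starts. Here is a sketch in Python, where opening a file in mode "wb" empties it and opening it in mode "ab" appends to it. The helper name run_echo and the file name results.txt are assumptions for this example.

```python
import os
import subprocess
import sys
import tempfile


def run_echo(message, path, append):
    """Run a child that prints message, with its stdout sent to the file at path.

    append=False behaves like  >  and append=True behaves like  >>.
    """
    mode = "ab" if append else "wb"
    with open(path, mode) as out:
        subprocess.run(
            [sys.executable, "-c", "import sys; print(sys.argv[1])", message],
            stdout=out,
            check=True,
        )


with tempfile.TemporaryDirectory() as folder:
    log = os.path.join(folder, "results.txt")
    run_echo("first", log, append=True)    # file does not exist, so it is created
    run_echo("second", log, append=True)   # added after the existing data
    with open(log) as f:
        appended = f.read()
    run_echo("third", log, append=False)   # existing contents are discarded
    with open(log) as f:
        truncated = f.read()

print(repr(appended), repr(truncated))
```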

We can have the shell process redirect the standard error stream of a process. The following command-line,

    > bar < data.txt 2> errors.txt

tells the shell process to ask the operating system to create a child process from the bar program, redirect the child's standard input stream to the file data.txt, and redirect the child's standard error stream to the file errors.txt. The child's standard output stream will be connected to the console window. When the bar process runs, it looks like this.

                        bar
                +-----------------+
                |                 |
   data.txt >-->> stdin    stdout >>----> console window
                |                 |
                |          stderr >>----> errors.txt
                |                 |
                +-----------------+

The order of the redirections in the command-line does not matter. The following two commands are equivalent.

    > bar < data.txt 2> errors.txt
    > bar 2> errors.txt < data.txt

In fact, the following command-lines are all equivalent.

    > bar < data.txt > output.txt 2> errors.txt
    > bar > output.txt 2> errors.txt < data.txt
    > bar > output.txt < data.txt 2> errors.txt

What if we want to redirect both the standard output and standard error streams to a single file? The following command-line does not work reliably, because each of the two redirections opens the file independently of the other.

    > bar > allOutput.txt 2> allOutput.txt

The Linux bash shell allows us to use the &> redirection operator.

    $ bar &> allOutput.txt

This creates the following picture.

                        bar
                +-----------------+
                |                 |
   keyboard >-->> stdin    stdout >>----+----> allOutput.txt
                |                 |     |
                |          stderr >>----+
                |                 |
                +-----------------+

With the Windows cmd shell, we need to use this slightly more complex command (which also works with bash).

    > bar > allOutput.txt 2>&1

This command says to redirect the standard output stream to the file allOutput.txt and then, in addition, redirect the standard error stream to the same place as the standard output stream. The 2>&1 operator must come after the redirection of standard output, because the shell processes redirections from left to right.
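Python's subprocess module has a direct counterpart to 2>&1, the special value subprocess.STDOUT. The sketch below is only an illustration. The stand-in for the bar program and the helper name run_bar_combined are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the bar program: one line on each output stream.
BAR_CODE = """
import sys
print("to stdout")
print("to stderr", file=sys.stderr)
"""


def run_bar_combined(path):
    """Rough equivalent of the command-line:  bar > allOutput.txt 2>&1"""
    with open(path, "wb") as all_output:
        subprocess.run(
            [sys.executable, "-c", BAR_CODE],
            stdout=all_output,          # stdout goes to the file
            stderr=subprocess.STDOUT,   # stderr goes wherever stdout goes
            check=True,
        )


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "allOutput.txt")
    run_bar_combined(path)
    with open(path) as f:
        all_lines = sorted(f.read().splitlines())

print(all_lines)
```

Both lines end up in the one file. Their order within the file can differ from the order in which the program printed them, because the two streams are usually buffered differently.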

Where do the numbers 1 and 2 in the I/O redirection operators come from? They are from the Unix operating system's implementation of file I/O. In Unix (and in Linux) every open file is given a small non-negative integer called a file descriptor. The file descriptor numbers are used by all the Unix (and Linux) file I/O functions. When a process is created, its standard input, output, and error streams are given the file descriptors 0, 1, and 2, respectively.

                       process
                 +-----------------+
                 |                 |
        >------->> 0             1 >>------->
                 |                 |
                 |               2 >>------->
                 |                 |
                 +-----------------+

The bash and cmd shells use these file descriptor numbers as part of their I/O redirection operators. This is an example of a "leaky abstraction". The shell program is supposed to let us manipulate processes and files without knowing about the details of how the underlying operating system handles processes and files. The Windows operating system does not even use file descriptors, but it still exposes them in the syntax of the cmd shell (in order to be consistent with bash). A leaky abstraction is when a lower level implementation detail appears in the interface of a higher level abstraction.

Do not confuse I/O redirection with the idea of opening a new stream to a file. The above foo process, that has its stdin redirected to the file data.txt, and its stdout redirected to the file result.txt, can still open new streams connected to other files.

                          foo
                +-----------------------+
                |                       |
  data.txt >--->> stdin          stdout >>-----> result.txt
                |                       |
                |                stderr >>------> console window
                |                       |
                |     in         out    |
                +-----/|\--------\ /----+
                       |          |
                       |          |
         input.txt >---+          +----------> output.txt

Opening (and closing) new file streams does not change the fact that this process has had its standard input and output streams redirected.

Shared streams

At a shell command prompt, if we type this command-line,

    > foo

then we are asking the shell process to create and run a foo process. The shell process (cmd.exe on Windows, or bash on Linux) is the parent process and foo is its child process. The shell process causes the foo process to have its standard streams connected in the following, usual, way.

                        foo
                 +-----------------+
                 |                 |
   keyboard >--->> stdin    stdout >>----+----> console window
                 |                 |     |
                 |          stderr >>----+
                 |                 |
                 +-----------------+

But this picture is incomplete. It does not show the relationship between the foo process and the shell process, its parent process. The shell process is itself a command-line program, so it uses the keyboard for its input and the console window for its output.

Here is how the two processes are related to each other. The two processes "share" the input stream for the keyboard and they share the output stream to the console window.

                               shell
                         +-----------------+
                         |                 |
                  +----->> stdin    stdout >>-----+
                  |      |                 |      |
                  |      |          stderr >>-----+
                  |      |                 |      |
                  |      +-----------------+      |
    keyboard >----+                               +----> console window
                  |            foo                |
                  |      +-----------------+      |
                  |      |                 |      |
                  +----->> stdin    stdout >>-----+
                         |                 |      |
                         |          stderr >>-----+
                         |                 |
                         +-----------------+

If, at a shell command prompt, we type this command-line,

    > foo > result.txt

then the shell process is the parent process and the foo process is the child process. The child has its standard output stream redirected to a file, but it uses the default input stream (and default error stream), which it shares with the shell process. The two processes and their streams will look like this.

                               shell
                         +-----------------+
                         |                 |
                  +----->> stdin    stdout >>-----+----> console window
                  |      |                 |      |
                  |      |          stderr >>-----+
                  |      |                 |      |
                  |      +-----------------+      |
    keyboard >----+                               |
                  |                foo            |
                  |      +-----------------+      |
                  |      |                 |      |
                  +----->> stdin    stdout >>----------> result.txt
                         |                 |      |
                         |          stderr >>-----+
                         |                 |
                         +-----------------+

If, at a shell command prompt, we type this command-line,

    > foo 2> errors.txt

then we get the following picture. The foo process shares its standard input and output streams with the shell process.

                               shell
                         +-----------------+
                         |                 |
                  +----->> stdin    stdout >>-----+----> console window
                  |      |                 |      |
                  |      |          stderr >>-----+
                  |      |                 |      |
                  |      +-----------------+      |
    keyboard >----+                               |
                  |                foo            |
                  |      +-----------------+      |
                  |      |                 |      |
                  +----->> stdin    stdout >>-----+
                         |                 |
                         |          stderr >>----------> errors.txt
                         |                 |
                         +-----------------+

When two processes share a stream, it is usually the case that one of the two processes is idle while the other process uses the shared stream (the idle process will often be waiting for the other process to terminate). If two processes are simultaneously using a shared stream, the results can be confusing and unpredictable.

If two processes simultaneously use an output stream, then their outputs will be, more or less, randomly intermingled in the stream's final destination. This can lead to unusable results.

If two processes simultaneously use an input stream, as in the following picture, then it is not the case that every input byte flows into each process. Each input byte can only be consumed by one of the two processes. Which process gets a particular byte of input depends on the ordering of when each process calls its read() function on the input stream. This is almost never a desirable situation. Processes almost never simultaneously use a shared input stream. Shared input streams are very common, but the two processes almost always have a way to synchronize their use of the stream so that they are never reading from it simultaneously. The most common way for two processes to share an input stream is for the parent process to wait for the child process to terminate. Then the parent process can resume reading from the input stream.

                        parent
                  +-----------------+
                  |                 |
           +----->> stdin    stdout >>-------->
           |      |                 |
           |      |          stderr >>----->
           |      |                 |
           |      +-----------------+
      >----+
           |
           |               child
           |         +-----------------+
           |         |                 |
           +-------->> stdin    stdout >>------>
                     |                 |
                     |          stderr >>---->
                     |                 |
                     +-----------------+

Pipes

So far, we have seen that streams can connect a process to either a file or an I/O device (like the keyboard or a console window).

It would be useful if the output stream of one process could be connected to the input stream of another process, something like this.

                 foo                            bar
          +-----------------+            +-----------------+
          |                 |            |                 |
    >---->> stdin    stdout >>---------->> stdin    stdout >>----->
          |                 |            |                 |
          |          stderr >>--->       |          stderr >>---->
          |                 |            |                 |
          +-----------------+            +-----------------+

This picture is supposed to represent the idea that the foo process can send information to the bar process by foo printing to its standard output stream and bar reading from its standard input stream.

The above picture is not possible. The operating system does not allow the output stream of one process to be connected directly to the input stream of another process. But the idea is very useful, so the operating system provides an object, called a pipe, that can be placed between two processes, and can allow the output from one process to be used as input to another process.

Consider the following command-line.

    > foo | bar

The | character is (in the context of a command-line) called the pipe symbol. This command-line asks the shell process to create two child processes, one from the foo program and the other from the bar program. In addition, the shell process will ask the operating system to create a pipe object and have the standard output stream of the foo process redirected to the input of the pipe, and have the standard input stream of the bar process redirected to the output of the pipe. This creates a picture that looks like the following. Notice that foo shares the keyboard with the shell, and bar shares the console window with the shell. Also notice that the error stream from foo is combined with the output and error streams from both the shell and bar.

                                      shell
                               +-----------------+
                               |                 |
                +------------->> stdin    stdout >>--------------------+
                |              |                 |                     |
                |              |          stderr >>--------------------+
                |              |                 |                     |
                |              +-----------------+                     |
                |                                                      |
   keyboard >---+                                                      +---> console window
                |                                                      |
                |          foo                           bar           |
                |   +----------------+            +----------------+   |
                |   |                |    pipe    |                |   |
                +-->> stdin   stdout >>--======-->> stdin   stdout >>--+
                    |                |            |                |   |
                    |         stderr >>--+        |         stderr >>--+
                    |                |   |        |                |   |
                    +----------------+   |        +----------------+   |
                                         |                             |
                                         |                             |
                                         +-----------------------------+

The shell process will wait for both child processes to terminate before the shell will resume using the shared keyboard and console window.
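A parent process written in Python can build the same kind of two-process pipeline. The following is a minimal sketch, assuming stand-ins for the foo and bar programs (foo prints three words and bar prints each of its input lines in upper case). The names FOO_CODE, BAR_CODE, and run_pipeline are hypothetical, and bar's standard output is captured, rather than left connected to the console window, only so that we can inspect the result.

```python
import subprocess
import sys

# Stand-ins for the foo and bar programs.
FOO_CODE = "print('apple'); print('banana'); print('cherry')"
BAR_CODE = "import sys\nfor line in sys.stdin: print(line.strip().upper())"


def run_pipeline():
    """Rough equivalent of the command-line:  foo | bar"""
    # foo's standard output is connected to the write end of a new pipe.
    foo = subprocess.Popen(
        [sys.executable, "-c", FOO_CODE],
        stdout=subprocess.PIPE,
    )
    # bar's standard input is connected to the read end of that same pipe.
    bar = subprocess.Popen(
        [sys.executable, "-c", BAR_CODE],
        stdin=foo.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    foo.stdout.close()             # the parent does not read from the pipe itself
    output, _ = bar.communicate()  # wait for bar and collect its output
    foo.wait()                     # wait for foo as well
    return output


pipeline_output = run_pipeline()
print(pipeline_output, end="")
```

Like the shell, this parent waits for both child processes to terminate before it continues.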

If we type a command-line like this,

    > foo < data.txt | bar > result.txt

then the shell process will ask the operating system to create two child processes, one from the foo program and the other from the bar program. In addition, the shell process will ask the operating system to create a pipe object and have stdout of the foo process redirected to the input of the pipe, and have stdin of the bar process redirected to the output of the pipe. Finally, the shell process will ask the operating system to redirect the foo process's standard input to the file data.txt and redirect the bar process's standard output to the file result.txt. While this command is executing, it looks like the following picture (this picture doesn't show the parent shell process and its streams).

                      foo                           bar
               +----------------+            +----------------+
               |                |    pipe    |                |
  data.txt >-->> stdin   stdout >>--======-->> stdin   stdout >>-----> result.txt
               |                |            |                |
               |         stderr >>--+        |         stderr >>---+-> console window
               |                |   |        |                |    |
               +----------------+   |        +----------------+    |
                                    |                              |
                                    +------------------------------+

In the above command, the two processes, foo and bar, are running simultaneously (in parallel) with each other. The pipe object acts as a "buffer" between the two processes. Whenever the foo process writes something to its output stream, that something gets put in the pipe "buffer". Then when the bar process wants to read some input data, it reads whatever is currently in the pipe "buffer".

If the foo process writes data into the pipe buffer faster than the bar process can read data out of the pipe buffer, then data accumulates in the buffer. If the foo process writes data so fast that it fills up the buffer, then the operating system makes the foo process "block" and wait for the bar process to read some data from the pipe buffer. When the bar process reads some data from the buffer, then the operating system "unblocks" the foo process so that it can resume writing data into the buffer.

If the bar process reads data out of the pipe buffer faster than the foo process can write data into the buffer, then the bar process will often find the pipe empty when bar wants to read some data. In that case, the operating system "blocks" the bar process and makes it wait until some data shows up in the pipe. When the foo process writes some data to the pipe, then the operating system "unblocks" the bar process so that it can resume reading data from the pipe buffer. (You should compare this to what happens when a process tries to pop() an empty stack data structure.)

When foo terminates, it may be that data still remains in the pipe. In that case bar will continue to run until it has emptied the pipe. When bar reads the last byte of data from the pipe buffer, then the operating system tells bar that it has reached the end-of-file on its input stream.
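The end-of-file behavior can be observed with an operating system pipe inside a single Python process. In this sketch, os.pipe() creates the pipe, and closing the write end plays the role of foo terminating. The function name drain_pipe_after_writer_closes is made up for this example.

```python
import os


def drain_pipe_after_writer_closes():
    """Write three bytes into a pipe, close the write end, then read to EOF."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"abc")      # data now sits in the pipe buffer
    os.close(write_fd)              # the writer is finished

    chunks = []
    while True:
        chunk = os.read(read_fd, 2)  # ask for at most two bytes at a time
        if chunk == b"":             # an empty result means end-of-file
            break
        chunks.append(chunk)
    os.close(read_fd)
    return chunks


print(drain_pipe_after_writer_closes())
```

The reader still receives all three bytes that were left in the buffer, and only then sees end-of-file.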

It is possible for the bar process to terminate before the foo process does. In that case, it is not a good idea to let the foo process fill up the pipe buffer and then block (forever). If the bar process terminates and the foo process then writes data into the pipe buffer, the operating system reports an error to the foo process (on Linux, a SIGPIPE signal or an EPIPE error from the write() call). Any data left in the pipe buffer is considered lost.
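On a POSIX system such as Linux, this situation can be reproduced inside a single Python process. The sketch assumes the default Python behavior of ignoring the SIGPIPE signal, so that the failed write shows up as a BrokenPipeError exception. Closing the read end of the pipe plays the role of bar terminating, and the function name write_after_reader_closed is hypothetical.

```python
import os


def write_after_reader_closed():
    """Close the read end of a pipe, then try to write into the pipe."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)               # the reader is gone
    try:
        os.write(write_fd, b"data that nobody will read")
        return "write succeeded"
    except BrokenPipeError:         # assumption: POSIX system, SIGPIPE ignored
        return "broken pipe"
    finally:
        os.close(write_fd)


print(write_after_reader_closed())
```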

This coordination that we just described, between the two processes on the ends of a pipe, is referred to in computer science as "bounded buffer synchronization" or the "producer-consumer problem".

In the Linux bash shell there is another version of the pipe operator, the |& operator. If we type this command-line,

    $ foo |& bar

then the bash process will ask the operating system to create foo and bar child processes, then bash will ask the operating system to create a pipe object and have stdout and stderr of the foo process redirected to the input of the pipe, and have stdin of the bar process redirected to the output of the pipe. While this command is executing, it looks like the following picture (this picture doesn't show the bash shell process and its streams).

                      foo                             bar
               +----------------+              +----------------+
               |                |      pipe    |                |
  keyboard >-->> stdin   stdout >>--+-======-->> stdin   stdout >>---+-> console window
               |                |   |          |                |    |
               |         stderr >>--+          |         stderr >>---+
               |                |              |                |
               +----------------+              +----------------+

This would be useful if the bar process needs to know about and handle errors from the foo process. The Windows cmd shell does not have this version of the pipe operator but it can be implemented with this slightly more complex command-line (which works on Linux too).

    > foo 2>&1 | bar

Here is another way to think about the shell's pipeline operator. The shell process could run the two programs, foo and bar, sequentially, one after the other. In other words, the shell process could interpret this command,

    > foo < data.txt | bar > result.txt

as the following three commands.

    > foo < data.txt > temp
    > bar < temp > result.txt
    > del temp

These three commands would have a picture that looks like this.

                        foo
                +-----------------+
                |                 |
   data.txt >-->> stdin    stdout >>----> temp
                |                 |
                |          stderr >>----> console window
                |                 |
                +-----------------+

                        bar
                +-----------------+
                |                 |
       temp >-->> stdin    stdout >>----> result.txt
                |                 |
                |          stderr >>----> console window
                |                 |
                +-----------------+

First the foo process runs with its output stored in a temporary file called temp. Then the bar process runs with its input coming from the temp file. Then the temp file is deleted.

Notice that this sequential interpretation of the pipeline command might be considerably slower than the parallel interpretation. And since the sequential interpretation needs to store all the intermediate data in a temp file, the sequential interpretation may require far more storage space than the parallel interpretation.

One final remark. Do not confuse the shell's pipe operator, the | character, with the operating system's pipe object. The operating system's pipe object is an object provided by the OS to efficiently implement one kind of interprocess communication. The shell's pipe operator is a way for the shell's user to request that two processes communicate. The shell may or may not implement its pipe operator using an OS pipe object (see the last few paragraphs).

Here is the documentation for the Linux and Windows operating system functions that create pipe objects.

Here are references to the bash and cmd.exe pipe operators.

Filters and Pipelines

All the example code mentioned in this section is in the sub folder called filter_programs from the following zip file.

Pipes are extremely useful. Their usefulness comes from combining them with a kind of program called a filter. When pipes and filters are combined together, we call these systems data pipelines.

A filter is a program that reads data from its standard input stream, does some kind of operation on the data, and then writes the converted data to its standard output stream.
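To make the definition concrete, here is a minimal sketch of what a filter can look like in Java. It is not the ToUpperCase.java program from the filter_programs folder. The class name UpperCaseFilter and the helper method filter are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class UpperCaseFilter {
    // Copies every character from in to out, converting it to upper case.
    static void filter(Reader in, Writer out) throws IOException {
        int c;
        while ((c = in.read()) != -1) {          // -1 signals end-of-file
            out.write(Character.toUpperCase(c));
        }
        out.flush();                             // push any buffered output downstream
    }

    public static void main(String[] args) throws IOException {
        // A filter reads only standard input and writes only standard output,
        // so the shell is free to connect it to files, the keyboard, or pipes.
        filter(new BufferedReader(new InputStreamReader(System.in)),
               new BufferedWriter(new OutputStreamWriter(System.out)));
    }
}
```

Because the program never opens a file itself, the same compiled class works in all of these command-lines.

    > java UpperCaseFilter
    > java UpperCaseFilter < Readme.txt
    > java UpperCaseFilter < Readme.txt > result.txt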

Data pipelines are usually implemented on a very large scale, processing gigabytes of data. But pipelines can also be useful at a small scale, while working with files on your personal computer. The Windows and Linux operating systems both come with many filter programs installed. Filter programs can be used, for example, to sort, search, format, or convert files.

To get a feel for working with pipes and filters, it helps to experiment with actual filter programs. In this section we will work with a collection of simple filter programs, written in Java and C, contained in the folder filter_programs.

In the filter_programs folder there are Java programs that act as filters. They are all short programs that do simple manipulations of their input characters. Look at the source code to these programs. Compile them and then run them using command-lines like the following.

    > java Reverse < Readme.txt > result.txt
    > java Double < Readme.txt | java Reverse
    > java Double | java ToUpperCase | java Reverse
    > java ShiftN 2 | java ToUpperCase | java Reverse
    > java Twiddle < Readme.txt | java ToUpperCase | java Double | java RemoveVowels > result2.txt
    > java Find pipe < Readme.txt | java CountLines
    > java OneWordPerLine < Readme.txt | java Find pipe | java CountLines

Run a couple of the programs by themselves, without any I/O redirection or pipes, to see how they manipulate input data (from the keyboard) to produce output data (in the console window).

    > java ToUpperCase
    > java Double
    > java Reverse
    > java MakeOneLine

Notice that you need to tap the Enter key to send input from the keyboard to the process. Sometimes you see immediate output. Sometimes, for example with CountLines.java or LongestLine.java, there is no output until the input is terminated (end-of-file). You denote the end of your input to the process by typing Control-z on Windows or Control-d on Linux. Do not use Control-C. That terminates the process (instead of terminating just the process's input) and causes the process's output to be lost. Sometimes (for example MakeOneLine.java) the results in the console window seem not to explain how the program works as a filter.
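The reason a line-counting filter stays silent until end-of-file can be seen in a small sketch. This is not the course's CountLines.java. It is an assumed, simplified version with a made-up class name, LineCounter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class LineCounter {
    // Reads lines until end-of-file and returns how many lines were read.
    static int countLines(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int count = 0;
        while (reader.readLine() != null) {   // null is returned only at end-of-file
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Nothing can be printed until countLines() returns, and it cannot
        // return until standard input reaches end-of-file.
        System.out.println(countLines(new InputStreamReader(System.in)));
    }
}
```

The only output statement comes after the read loop, so the program has nothing to print until it sees end-of-file. If the process is terminated before that point, the count is never written.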

Command-line Syntax

We have seen that command-lines can be made up of, among other things, program names, command-line arguments, file names, I/O redirection operators (the < and > characters), and pipes (the | character). In this section we will look at the syntax of building complex command-lines that combine all of these elements along with a few new elements.

We need to be careful when we use the phrase "command-line argument". Here is why.

Consider the following "command-line". It uses the Java program Find.java from the filter_programs directory. How many "command-line arguments" are there? The answer is, of course, "It depends!".

    > java Find pipe < Readme.txt > temp.txt

On the one hand, we can say that there are "no command-line arguments" because this is just an input string that the shell process reads from its standard input stream. The shell process parses this string and then builds a command-line to give to the operating system. The command-line for the operating system asks the OS to create a java process with two command-line arguments, "Find" and "pipe". The rest of the input string is used by the shell process to decide to ask the OS to redirect the standard input and output streams for the java process.

From the point of view of the java process we can say that there are "two command-line arguments".

But there is still a third point of view. The java process implements the Java Virtual Machine (JVM), and the Find.class file is an executable file from the point of view of the JVM. The JVM (virtually) executes a Find process. The main() method of the Find process is passed "one command-line argument", the string "pipe".

So the answer to the question, "How many command-line arguments are there?" is none, from the point of view of the shell process, two from the point of view of the java process, and one from the point of view of the (virtual) Find process.

Another way to say this is that the tokens "Find" and "pipe" are definitely command-line arguments, and the "java", "<", "Readme.txt", ">", and "temp.txt" tokens are not command-line arguments, they are tokens used by the shell process.
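The third point of view can be checked with a few lines of Java. The program below is a hypothetical helper, not one of the programs in filter_programs, that reports exactly what its main() method receives.

```java
public class ShowArgs {
    // Builds a report of the arguments that were passed to main().
    static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        sb.append(args.length).append(" argument(s)");
        for (int i = 0; i < args.length; i++) {
            sb.append('\n').append("args[").append(i).append("] = ").append(args[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(args));
    }
}
```

If we run it with the same shape of command-line as above,

    > java ShowArgs pipe < Readme.txt > temp.txt

then temp.txt ends up containing the two lines "1 argument(s)" and "args[0] = pipe". The redirection tokens never reach main() because the shell consumed them.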

Here are some references for the CMD shell syntax.

Here are some references for the Bash shell syntax.

Here are some review problems that ask you to use the material discussed in this document.

Problem 1

Explain what each of the following possible command-lines means. In each problem, you need to associate an appropriate meaning to the symbols a, b and c. Each symbol can represent either a program, a file, or a command-line argument.

For example "a is the name of a program, b and c are the names of files", or "a and b are the names of programs and c is the name of a file", or "a is the name of a program, b and c are arguments to the program". Also give a specific example of a runnable command-line with the given format using Windows command-line programs like dir, more, sort, find, echo, etc.

    > a > b < c
    > a < b > c
    > a | b > c
    > a < b | c
    > a   b   c
    > a   b > c
    > a   b | c
    > a & b < c
    > a < b   c
    > a < b & c
    > a & b | c
    > a &(b | c)
    >(a & b)| c
    >(a & b)> c
    > a & b   c
    > a & b & c

Problem 2

Draw a picture illustrating the processes, streams, pipes, and files in each of the following command-lines.

(a)

    > b < a | c > d

(b)

    > a < b | c 2> d | e > f 2> d

Problem 3

Draw a picture that illustrates all the processes, pipes, files, and (possibly shared) streams in the following situation. A process p1 opens the file a.txt for input and then it opens the file b.txt for output. Then process p1 creates a pipe. Then p1 creates a child process p2 with p2 inheriting a.txt, the pipe's input, and p1's stderr as p2's stdin, stdout and stderr streams. Then p1 creates another child process p3 with p3 inheriting the pipe's output, b.txt, and p1's stderr as p3's stdin, stdout and stderr streams. Then p1 closes its stream to a.txt and the pipe's output.

Problem 4

For the Windows cmd.exe shell, the dir command is a builtin command so the cmd.exe process does all the work for the directory listing (there is no dir.exe program). On the other hand, the sort and find commands are not builtin (so there are sort.exe and find.exe programs).

For each of the following cmd.exe command-lines, draw a picture of all the relevant processes that shows the difference between a pipeline with a builtin command and a pipeline with non-builtin commands.

(a)

    > dir | find "oops"

(b)

    > sort /? | find "oops"

Problem 5

What problem is there with each of the following two command lines? Hint: Try to draw a picture of all the associated processes, streams, pipes, and files.

    > a | b < c
    > a > b | c

Creating a pipe

For CS 33600, this section is optional.

Using Java to create a pipe is an interesting topic and is useful for solving a number of practical problems (in fact, we will use it later in the course when we implement an HTTP application server), but we need to move on to other topics.

In this section we will show how a Java process can create a pipeline of two other processes (the other two processes need not be Java processes). We will approach this in two steps. In the first step, we will show how a Java process can start a child process and feed data into the child and draw data from the child. In the second step, we will show how a Java process can start a pipeline of two child processes and feed data into the beginning of the pipeline and draw data from the end of the pipeline.

For the first step, we want a Java process to create the following picture.

                      Java process
                +-----------------------+
                |                       |
   keyboard >-->> stdin          stdout >>-----+---> console window
                |                       |      |
                |                stderr >>-----+
                |                       |      |
                |    out          in    |      |
                +----\|/---------/|\----+      |
                      |           |            |
                      |           |            |
               +------+           +------+     |
               |          child          |     |
               |    +---------------+    |     |
               |    |               |    |     |
               +--->> stdin  stdout >>---+     |
                    |               |          |
                    |        stderr >>---------+
                    |               |
                    +---------------+

The Java process should create a new input stream, a new output stream, and a child process, and then redirect the child's standard input to the new output stream, and redirect the child's standard output to the new input stream.
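One way to get this picture with the standard library is java.lang.ProcessBuilder. The following is a minimal sketch under stated assumptions, not necessarily the approach developed in the course. It assumes the child program is sort (which exists on both Windows and Linux) and that the data is small enough to fit in the pipe buffers. The class name FeedChild and the method runWithInput are made up for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class FeedChild {
    // Starts the command as a child process, writes input to the child's
    // stdin, and returns everything the child wrote to its stdout.
    static String runWithInput(String input, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // The child's stderr shares the parent's stderr, as in the picture.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process child = pb.start();

        // "out" in the picture: the parent's output stream feeds the child's stdin.
        try (OutputStream out = child.getOutputStream()) {
            out.write(input.getBytes(StandardCharsets.UTF_8));
        }   // closing the stream gives the child its end-of-file

        // "in" in the picture: the parent's input stream drains the child's stdout.
        String result;
        try (InputStream in = child.getInputStream()) {
            result = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        child.waitFor();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.print(runWithInput("pear\napple\nfig\n", "sort"));
    }
}
```

Writing all the input before reading any output is safe only for small amounts of data. With large inputs both pipe buffers can fill up and the two processes block each other, so the writing and the reading would need to happen in separate threads.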

For the second step, we want a Java process to create the following picture.

                      Java process
                +-----------------------+
                |                       |
   keyboard >-->> stdin          stdout >>---------+---> console window
                |                       |          |
                |                stderr >>---------+
                |                       |          |
                |    out          in    |          |
                +----\|/---------/|\----+          +------------+
                      |           |                             |
                      |           |                             |
    +-----------------+           +------------------------+    |
    |                                                      |    |
    |        child1                        child2          |    |
    |   +----------------+            +----------------+   |    |
    |   |                |    pipe    |                |   |    |
    +-->> stdin   stdout >>--======-->> stdin   stdout >>--+    |
        |                |            |                |        |
        |         stderr >>--+        |         stderr >>-------+
        |                |   |        |                |        |
        +----------------+   |        +----------------+        |
                             |                                  |
                             |                                  |
                             +----------------------------------+

The Java process should create a new input stream, a new output stream, two child processes, and a pipe, and then redirect the first child's standard input to the new output stream, redirect the second child's standard output to the new input stream, and connect the two child processes with the pipe.
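Since Java 9, the static method ProcessBuilder.startPipeline() can build this second picture directly. The following is a minimal sketch, not necessarily the approach developed in the course. It assumes a Linux system where the programs sort and uniq are available, and data small enough to fit in the pipe buffers. The class name FeedPipeline and the method runPipeline are made up for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class FeedPipeline {
    // Starts "first | second", writes input into the beginning of the
    // pipeline, and returns what comes out of the end of the pipeline.
    static String runPipeline(String input, List<String> first, List<String> second)
            throws IOException, InterruptedException {
        // Both children share the parent's stderr, as in the picture.
        ProcessBuilder pb1 = new ProcessBuilder(first)
                .redirectError(ProcessBuilder.Redirect.INHERIT);
        ProcessBuilder pb2 = new ProcessBuilder(second)
                .redirectError(ProcessBuilder.Redirect.INHERIT);

        // Connects child1's stdout to child2's stdin with a pipe.
        List<Process> children = ProcessBuilder.startPipeline(List.of(pb1, pb2));
        Process child1 = children.get(0);
        Process child2 = children.get(1);

        // "out" in the picture feeds child1's stdin.
        try (OutputStream out = child1.getOutputStream()) {
            out.write(input.getBytes(StandardCharsets.UTF_8));
        }   // closing the stream gives child1 its end-of-file

        // "in" in the picture drains child2's stdout.
        String result;
        try (InputStream in = child2.getInputStream()) {
            result = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        child1.waitFor();
        child2.waitFor();
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a Linux system with sort and uniq on the PATH.
        System.out.print(runPipeline("b\na\nb\n", List.of("sort"), List.of("uniq")));
    }
}
```

The parent never touches the data flowing between the two children. It only holds the stream into the first child and the stream out of the second child.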