CS 404 Distributed Systems
Final Exam
Version 1.0

Write brief answers (a couple of paragraphs each) to the following questions. You can write your solutions in any kind of document that you prefer (text file, Word document, TeX file, Maple worksheet, a piece of paper, etc.). Please feel free to ask me questions by email, or ask for an appointment sometime in the next week. You can either email your answers to me or put them in my mailbox in CLO 316. This is due by Saturday, December 16.

1.) In class we described a message passing algorithm that a network of computers can use to discover a spanning tree for the graph of their network (we did both a generic spanning tree version and a depth-first search version). Come up with, and then describe, two message passing algorithms that the computers in the network can use to count how many computers there are. In the first algorithm, have the network perform its spanning tree algorithm, then have the network perform the counting algorithm while making use of the spanning tree data. For the second algorithm, do not use the spanning tree algorithm; the counting algorithm should work without assuming the existence of any spanning tree data or a spanning tree algorithm. For both algorithms you can assume that each network computer knows its adjacency list, that one special node of the network knows that it is the "root" of the network, and (for the first algorithm) that each computer knows the spanning tree algorithms.

2.) A common example of a distributed application is distributed compilation. Suppose that you have a source program that consists of m files and you have n > m computers to use to do the compilation (so there are more computers to do the compiling than files that need to be compiled). The best that you can achieve is an m-fold speedup over using a single computer for the compilation.
What factors might cause the speedup in the distributed compilation to be less than (even much less than) the ideal speedup of m? (We are considering only the compilation step here, not the linking step. Assume that all of the files need to be compiled. Also, assume that the computers are otherwise idle.) Note: The factors can vary from one language to another, for example, between C and Java. Think of things like file size, communication overhead, dependencies, etc. (If you are interested in the idea of distributed compilation, look at http://distcc.samba.org/ or http://en.wikipedia.org/wiki/Distcc)

3.) In all of our socket examples, our servers listened on only a single port. Can a server process listen on more than one port simultaneously? If not, explain why; if so, briefly sketch out how this would be done.

4.) The C language has the notion of a union. For example, you can define

    union weirdStuff { float x; char buf[5]; };

and then create a variable like this

    union weirdStuff w;

and then do the following

    w.x = 1.0/3.0;
    w.buf[3] = '#';

If you follow these last two lines with

    printf( "%f\n", w.x );

you will not get 0.333333 as the output. The main idea here is that the space allocated to the variable w is the maximum of the space needed for a float or an array of five chars (which will probably be the array of five chars, possibly padded for alignment). The two fields x and buf of weirdStuff live in the exact same space, essentially on top of each other. You can check how much space a compiler allocates for w with

    printf( "%zu\n", sizeof(union weirdStuff) );

(sizeof yields a size_t, so %zu is the matching format.)

What kind of problems do union data types cause for Remote Procedure Call systems? Do you think that it would even be possible for an RPC system to allow unions? You do not need to say how these problems might be solved; just describe some examples of what problems might come up.