This problem set has seven problems for a total of 100 points. All problems except the first have several parts. As before, this assignment is to be handed in at the beginning of class on paper, not submitted by email.
You know of the Friendster network from class. Orkut is another network that is similar in many ways, but currently, membership is by invitation only. You must indicate that someone already in the network is your friend in order to join. A large number of people joined right after the beta version of the service was announced. Until late January is was impossible to "unfriend" someone — you could not remove your link to someone whom you had previously indicated as a friend. You also cannot could not close your account once you had joined.
Given only the information above what is known about the minimum degree of the graph? What is known about the maximum path length? Briefly justify your answers. What we can say for certain is this: the minimum degree is at least one, and the maximum path length is finite. In both cases, this is because the graph is connected. To join, you must be connected to an existing node, and links and nodes cannot be removed. These statements are adequate for full credit. We could also say, more specifically, that the maximum path length does not exceed the total number of people who joined minus one. Finally, we could say that the minimum degree is not more than the total number of people who joined minus one.
Describe an algorithm which generates an Erdös-Renyí random graph with n vertices and m edges. The algorithm you are to write must begin with N disconnected vertices and add only one edge at a time. The class of random graphs that are generated by this process should be one in which the distribution of degree is uniform and every possible link is equally likely. No real programming language syntax is required in your answer, although you can use pseudocode if you like. Just make it completely clear what is to be done at each step. You should end up with a set V of N vertices (which will be the same one that you start with) and a set E of m edges. If your algorithm can possibly add the same edge twice or can add a loop (an "edge" between a vertex and itself), it isn't correct. Several correct answers are possible, e.g.: [Example 1] Begin with V containing the N vertices, E empty, and the set P which initially contains all choose(N,2) sets of two vertices — defining all possible edges. For m steps, remove an element from P uniformly at random and add it to V. [Example 2] Begin with V containing the N vertices and E empty. Until there are m elements in E, do the following: Pick vertex a uniformly at random from V, and pick vertex b uniformly at random from V. If a=b (that is, they are the same vertex) or the edge (a,b) is already in E, do nothing; otherwise, add the edge (a,b) to E. [Example 3, just for fun] Begin with V containing the N vertices, and E containing all choose(N,2) sets of two vertices — (V,E) is a complete graph. For (choose(n,2) - m) steps, remove an element from E uniformly at random.
Simulate your algorithm for N=6 and m=10. You can use random.org to get random numbers. Write down what the graph looks like in all eleven configurations: m=0 ... 10. E.g., edges could be added in this order: (1, 6) (1, 4) (2, 5) (1, 5) (3, 6) <- this edge connects the graph (1, 3) (5, 6) (3, 5) (3, 4) (4, 5) The point of your doing the simulation is to make sure that your algorithm above actually works. Some algorithms are easier to simulate than others; you can request all the numbers from random.org at once if you use the algorithm in example 2, for instance, while you will have to make requests one at a time using the algorithms in example 1 or 3.
Is your graph connected? If so, at what step did it become connected? In the example above, yes; the fifth edge connects it. It is extremely unlikely, but possible, to generate a graph that is not fully connected.
Generate a graph using the Barabási-Albert preferential attachment model, which Watts calls preferential growth (see pages 108-111). Begin with a pair of nodes with a single link between them. Go for eight steps. At each step, add one node and a single link connecting it to the previous graph.
Draw each step, writing down the appropriate probability next to each
node to indicate how likely it is that the new node will attach there.
(Note: When you have a set of preferences for different options, for
instance, things you prefer with weights 5, 2, and 1, you convert them into
probabilities by summing them, 5+2+1 = 8, and dividing each one by the sum:
5/8, 2/8, 1/8. Again, you can use random.org as a source of random
numbers.) One way to use random.org to do this involves numbering "both ends" of all the edges, assigning the numbers 1, 2, 3, ... 2*m. In the following picture these numbers are shown in red.
Then generate a random number in the range 1 to 2*m, where 2*m is the highest one of these red numbers. Now add a new node and connect it with a new edge to the node next to the random number just drawn. The probability of each node to be connected to the next node is obtained by dividing its degree by twice the number of edges in the graph.

What is the degree sequence of your final graph? There are 2 nodes with degree 3, 2 nodes with degree 2 and 4 nodes with degree 1, so the degree sequence is 3, 3, 2, 2, 1, 1, 1, 1.
What is the expected degree sequence of a graph with this number of edges and this number of vertices that is generated in this manner?Approximately 2.9326, 2.9326, 1.9551, 1.5641, 1.3406, 1.1917, 1.0833.
Begin by drawing a regular graph: arrange 12 nodes in a circle and connect each node to the four nodes nearest to it — that is, the two nodes to one side of the node and the two nodes to the other side along the circle.
Compute the average distance between all pairs of different nodes. As the graph is symmetric, it is enough to compute the average distance only from one node to all the others. (1+1+2+2+3+3+3+2+2+1+1)/11=21/11 (In the graph below the red numbers display the distance of each node to vertex 1)

Make a single rewiring which minimizes the average distance. One correct answer is to remove edge (1,2) and add an edge (1,7).
Why is it now much harder to compute the average distance on this modified graph?By rewiring the graph we destroyed the symmetry.
In this question, we consider the World Wide Web as a graph, with HTML Web pages being vertices and hyperlinks being edges. For this problem, we'll consider only HTML Web pages (not Flash files or PDFs or anything else) as vertices and only plain, static hyperlinks (not Google's search facility or any other fancy way of linking) as edges. If a page A has one or more hyperlinks to page B, we consider that there is a single directed edge from page A to page B; it does not matter whether there is one link or ten. We'll ignore any links from a page to itself, and any links that lead to or from things other than HTML Web pages.
Is the graph of the Web, as described above:
One page on the Web, and one node in our graph, is this homework page itself: http://www.cis.upenn.edu/~mkearns/teaching/NetworkedLife/homework2.html
You will now do a search on the Web graph, beginning at the node above. You are to only use local information (the HTML page you are examining at each step) to determine where to go next. Do no use Google or any other Web resource to direct your search: simply pick the link that seems most likely to take you to your destination, while looking only at the current page. Your target is the official Web site of Janet Jackson.
For ten steps, or until you reach Janet-Jackson.com, if you reach the site in fewer than ten steps, record all of the following information:
Note that you do not need to reach Janet-Jackson.com to get full credit for this part of the problem. You just need to record all the links that you followed and why you followed them. If you hit a dead end of any sort or seem to be on an unhelpful path, you can backtrack and start from an earlier point on your path; just note clearly that you are doing this. This is an example (note that I mistakenly started from the solution page rather than the homework page, but this still provides an example) — http://www.cis.upenn.edu/~mkearns: I know Prof. Kearns' home page has many links to different organizations and articles. I followed a link to Time Digital hoping to find a news item with a link: http://www.time.com/time/digital/magazine/articles/0,4753,45789,00.html. The article had expired, but there was a link to the main Time page: http://www.time.com/. I clicked on "Entertainment" looking for Janet Jackson news: http://www.time.com/time/arts/. Score! She's currently the top story, in an article named "The Hypocrisy Bowl." I click the link to that story: http://www.time.com/time/magazine/article/0,9171,1101040216-588871,00.html. Unfortunately, no link here, but I can skip to page 3 hoping that there are some external links at the end — I'm worried that this corporate media site may have few or no external links now: http://www.time.com/time/magazine/article/0,9171,1101040216-588871-3,00.html. Sure enough, no external links. I now think it will be impossible to leave the Time site through a link, so I return to Prof. Kearns' home page. Despite just getting burned, I go to http://www.nytimes.com/library/tech/00/02/circuits/articles/10matc.html since I know The New York Times does have external links. I click on "Home" to go to http://www.nytimes.com/yr/mo/day/ and there's an article "My Hero: Janet Jackson" http://www.nytimes.com/2004/02/15/arts/15RICH.html. Again, I have to click on page 2 to get to the end of this, where the links usually are: http://www.nytimes.com/2004/02/15/arts/15RICH.html?pagewanted=2. But there are no links there. And that was the 10th link I followed, anyway.
Consider this graph:
What is the chromatic number? From part (1) you know that this graph has a 5-clique and hence a chromatic number of at least 5. That it is in fact 5 can be checked with the applet mentioned in class:


Consider the two "small worlds" network generation models we examined in class — the α-model of Watts, and the model of Kleinberg in which vertices lie on a grid, and a vertex is connected to all others within grid distance p, as well as having q "long distance" connections. A vertex at grid distance d has probability proportional to 1/dr of being chosen as a long-distance connection, for some r > 0.