Problem Set 2

CSE 112: Networked Life, Spring 2004

Handed out Feb 17 - Due 9:00 am Feb 26

This problem set has seven problems for a total of 100 points. All problems except the first have several parts. As before, this assignment is to be handed in at the beginning of class on paper, not submitted by email.

1. (10 points)

You know of the Friendster network from class. Orkut is another network that is similar in many ways, but currently, membership is by invitation only. You must indicate that someone already in the network is your friend in order to join. A large number of people joined right after the beta version of the service was announced. Until late January is was impossible to "unfriend" someone — you could not remove your link to someone whom you had previously indicated as a friend. You also cannot could not close your account once you had joined.

Given only the information above what is known about the minimum degree of the graph? What is known about the maximum path length? Briefly justify your answers.

2. (14 points)
1. Describe an algorithm which generates an Erdös-Renyí random graph with n vertices and m edges. The algorithm you are to write must begin with N disconnected vertices and add only one edge at a time. The class of random graphs that are generated by this process should be one in which the distribution of degree is uniform and every possible link is equally likely. No real programming language syntax is required in your answer, although you can use pseudocode if you like. Just make it completely clear what is to be done at each step.

2. Simulate your algorithm for N=6 and m=10. You can use random.org to get random numbers. Write down what the graph looks like in all eleven configurations: m=0 ... 10.

3. Is your graph connected? If so, at what step did it become connected?

3. (20 points)

Generate a graph using the Barabási-Albert preferential attachment model, which Watts calls preferential growth (see pages 108-111). Begin with a pair of nodes with a single link between them. Go for eight steps. At each step, add one node and a single link connecting it to the previous graph.

1. Draw each step, writing down the appropriate probability next to each node to indicate how likely it is that the new node will attach there. (Note: When you have a set of preferences for different options, for instance, things you prefer with weights 5, 2, and 1, you convert them into probabilities by summing them, 5+2+1 = 8, and dividing each one by the sum: 5/8, 2/8, 1/8. Again, you can use random.org as a source of random numbers.)

2. What is the degree sequence of your final graph?

3. What is the expected degree sequence of a graph with this number of edges and this number of vertices that is generated in this manner?

4. (12 points)

Begin by drawing a regular graph: arrange 12 nodes in a circle and connect each node to the four nodes nearest to it — that is, the two nodes to one side of the node and the two nodes to the other side along the circle.

1. Compute the average distance between all pairs of different nodes.

2. Make a single rewiring which minimizes the average distance.

3. Why is it now much harder to compute the average distance on this modified graph?

5. (12 points)

In this question, we consider the World Wide Web as a graph, with HTML Web pages being vertices and hyperlinks being edges. For this problem, we'll consider only HTML Web pages (not Flash files or PDFs or anything else) as vertices and only plain, static hyperlinks (not Google's search facility or any other fancy way of linking) as edges. If a page A has one or more hyperlinks to page B, we consider that there is a single directed edge from page A to page B; it does not matter whether there is one link or ten. We'll ignore any links from a page to itself, and any links that lead to or from things other than HTML Web pages.

1. Is the graph of the Web, as described above:

• complete?
• acyclic?
• of uniform out-degree distribution, over the entire range of possible out-degrees?
2. One page on the Web, and one node in our graph, is this homework page itself: http://www.cis.upenn.edu/~mkearns/teaching/NetworkedLife/homework2.html

• What is this node's out-degree?
• What is its in-degree?
• How did you determine its out-degree and in-degree?
3. You will now do a search on the Web graph, beginning at the node above. You are to only use local information (the HTML page you are examining at each step) to determine where to go next. Do no use Google or any other Web resource to direct your search: simply pick the link that seems most likely to take you to your destination, while looking only at the current page. Your target is the official Web site of Janet Jackson.

For ten steps, or until you reach Janet-Jackson.com, if you reach the site in fewer than ten steps, record all of the following information:

• The URL of the link you followed.
• A phrase or sentence explaining why you followed that link.

Note that you do not need to reach Janet-Jackson.com to get full credit for this part of the problem. You just need to record all the links that you followed and why you followed them. If you hit a dead end of any sort or seem to be on an unhelpful path, you can backtrack and start from an earlier point on your path; just note clearly that you are doing this.

6. (12 points)

Consider this graph:

1. How large is a maximum clique?
2. What is the average degree?
3. Write the numbers of the nodes that are in a maximum matching.
4. What is the chromatic number?

5. What vertices are in a maximum independent set?
6. Draw a spanning tree of this graph.
7. (20 points)

Consider the two "small worlds" network generation models we examined in class — the α-model of Watts, and the model of Kleinberg in which vertices lie on a grid, and a vertex is connected to all others within grid distance p, as well as having q "long distance" connections. A vertex at grid distance d has probability proportional to 1/dr of being chosen as a long-distance connection, for some r > 0.

1. Discuss how the Erdös-Renyí model can be approximated in both the α-model and Kleinberg's model. What settings of the parameters of each model realizes this approximation? Briefly justify your answer.
2. Briefly discuss the similarities and differences between the α-model and Kleinberg's model. Be as precise as possible, referring to the different roles of the two models' parameters.
3. In the Kleinberg model, briefly explain why the value of r must be carefully chosen in order to permit efficient decentralized navigation.