mkse150  Homework 5

Overview

Do these exercises from the E&K textbook: 4, 5, and 6 in section 19.8.
A programming problem follows below.
The paper problems should be handed in at the beginning of class on Apr 23,
and the program should be submitted via turnin by midnight the same day.

Programming Background

In this assignment you will be seeding a social network to cause maximum adoption of a new technology. Your job is to implement the algorithm to choose the initial set by editing the file Marketer.java.

Setting up the Project

Download this eclipse project hw5.zip and Import it into Eclipse.
In the Package Explorer, open hw5/src/DefaultPackage.
Highlight Driver.java and run it. You should get a dozen lines, ending with this:
reading graph file example1.data
budget for early seeding: 2
showing graph:
4: [(0.30;5)]
2: [(0.60;1), (0.60;3), (0.60;5), (0.60;4)]
1: [(0.40;3), (0.50;2)]
choices for early seeding: []
This means that you have succeeded to compile and run the provided programs.

The 3 rows above describe the edges in the network. Node 4 has influence on node 5, and with probability 0.30 node 5 will adopt after node 4 does. Node 2 has influence on nodes 1, 3, 5 and 4; with probability 0.60 any of those will adopt after node 2 does. Node 1 has influence on node 3 with probability 0.30, and on node 2 with probability 0.50. The final row is the set of nodes to seed, and since you haven't done the work yet, that set is empty right now.

Getting Oriented

The file Driver.java generates different examples of social networks to seed. (Initially, most are commented out.) After constructing one of these objects, it will print out all the edges, and then call selectSeedNodes (which is what you will be writing). The object you need to know most about is the InfluenceGraph, which is a normal directed graph object with edges that carry a floating point number on them that specifies the probability of infection occurring along that edge. Open InfluenceGraph.java and you will see:

	public Set<Integer> influentials() { ...
	public Set<Sway> getNhbrs(int node) {...
	public List<Integer> runInfection(Set<Integer> EA) { ...
The first method returns a list of all the interesting nodes in the graph. The only ones that might not appear there are nodes that have no outgoing edges, and thus have no influence over any others. The second method, getNhbrs, returns a list of outgoing edges for any node you query.

The method runInfection will simulate the infection process for you. Give it a set of initial adopters and it will return the final set of infected nodes.

You might also want to use these extra features:

	public List<Integer> runInfections(Set<Integer> EA, int iterations) { ...
	public void fixFlipper(int s) { ...
As a convenience, runInfections (note the 's') will call runInfection for you a bunch of times and return a count of how many nodes got infected on each run.

Whenever runInfection is executed, coins are flipped and the outcome is probabilistic. This can make some types of debugging hard. If you wish to temporarily force the coin flipping to be deterministic, simply call the fixFlipper method with a positive argument. When you get your code debugged, take this call out again. When we run your code the fixFlipper will have no effect, so your program should work regardless of how the flipping proceeds.

What You Get To Do

Open the file Marketer.java, which is the only file you will be editing. Inside the class, you will see these members:

	static class record { ...
	public static float mean(List<Integer> list) { ...
	public static double stddev(List<Integer> list) { ...
All of these are completely irrelevant to understanding the problem. You might find you want to use them in your solution, but you do not need to. By the time you think you need these, then you will understand them already.

The only data you need comes as parameters to the selectSeedNodes method:

	public static Vector<Integer> selectSeedNodes(InfluenceGraph ig, int budget) { ...
The first argument describes the network you are given, and the second is the budget you have for seeding that network with initial adopters.

Your job is just to finish writing selectSeedNodes. (You may also alter anything else that you please in this one file.) Note that you will be using runInfection() or runInfections() to find out how well the seed nodes managed to infect the rest of the network. There is a limit to how often you can do this, though; you do not have infinite computer resources to calculate a good set either in the real world or in this assignment. You may use runInfection only 100,000 times. After that, your program will bomb, so don't be TOO greedy!     ;-)

When you get your algorithm working, go back to Driver and uncomment those lines in main so that you get more test cases.

Note that there is not going to be a single correct answer to this problem. There may be many different best sets; the coin-flipping will make every answer just an estimate; and the CPU budget limitation will keep you from overcoming that issue to find an exact answer. What you should be satisfied with is that the seed sets you choose result in large infections. You may share your results on Piazza for others to compare. Just report the infection sizes you got with your program.

Use turnin to turn in your Marketer.java file.

The latest edits to this file happened 2013/4/17 12:50. Typos in the formatting of code sections were fixed.