PENN CIS 620, FALL 2006: SEMINAR ON SPONSORED SEARCH

Prof Michael Kearns
mkearns@cis.upenn.edu
Mondays 12-3 PM
307 Levine Hall


COURSE DESCRIPTION

Keyword auctions for sponsored search, such as those offered by Google and Yahoo!, have become profitable and influential. They are also a fascinating new kind of market --- namely, a market in natural language phrases. While the structure of these new markets is evolving rapidly, there is also a diverse and nascent science emerging around the challenges and issues they raise. In this seminar course we will attempt to survey the many academic and commercial views of sponsored search that are under rapid development. The list of candidate topics includes an examination of the details of the various commercial auction mechanisms, their pricing schemes, and the analytic tools and data avaiable to advertisers to track and optimize their keyword portfolios; game-theoretic analyses of the equilibrium and other properties of keyword auctions; and interactions between the economic/financial and natural language aspects of sponsored search. See Candidate Topics below for more details.

Since statistical language processing, theoretical computer science, game theory and economics, and machine learning are all topics of central importance to sponsored search, it is my hope that we will have strong representation from each of Penn's excellent groups in these areas. The interests of course participants will play a major role in determining the precise content.


COURSE FORMAT AND REQUIREMENTS

The seminar will be run as an informal and conversational reading/research group. We will meet once a week for an extended period (Mondays 12-3) to encourage discussion and brainstorming. Requirements of those taking the course for credit consist of active participation and presentation in the meetings, and a course project. Students will be expected to work with me to select some number of papers or topics to lead the group through during one or more meetings. More details will emerge gradually in the Detailed Meeting Schedule.

For the course projects, I'd like most of these to be "online" in the sense that they are conducted during the term, not after or at the end, and thus can be shared with the group gradually as they develop. I am hoping that some number of these projects might include empirical experiments with data from one of the keyword auctions. For potential ideas, see Project Ideas below.

We also have a number of very interesting guest speakers/visitors to the seminar; see the Detailed Meeting Schedule for details as they emerge.

This is a new field, and there are no absolute prerequisites to the course. However, I expect the ability to follow a mathematical argument to be handy at various points (e.g. we will examine some formal game-theoretic analyses of keyword auctions), as will some basic knowledge of algorithm design and analysis, statistics, natural language processsing, game theory and economics. General web savvy will also be helpful. Overall the spirit will be to make things self-contained so that everyone can follow. Again, check out Candidate Topics below for just a sample of the type of material I am hoping to cover.

Auditors and occasional participants are welcome. Strong undergraduates are also encouraged to join.


DETAILED MEETING SCHEDULE

Mon Sep 11

Organizational and introductory meeting. We'll introduce ourselves, I'll overview much of the material that is collected on this web site, and we'll brainstorm on topics and material we'd like to cover in the course. Everyone should come to the first meeting with basic familiarity with sponsored search auctions, which can be easily obtained by sampling some of the many readings and materials on this site. You can also look at "Sponsored Search: A Brief History" by Fain and Pedersen. Informal outline of what we'll cover:

  • Introductions
  • Course mechanics
  • Brief intro to sponsored search
  • Tour of Google AdWords, Overture ViewBids tool
  • Discussion of Yahoo! auction data
  • Tour of material on this site
  • Group discussion

    Mon Sep 18

    Material for discussion:

  • "Internet Advertising and the Generalized Second Price Auction: Selling Billions of Dollars of Keywords". Edelman, Ostrovsky, Schwarz.
  • "Position Auctions". Varian.
  • Strategic Bidder Behavior in Sponsored Search Auctions. Edelman, Ostrovsky.
  • "Truthful Auctions for Pricing Search Keywords". Aggarwal, Goel, Motwani. (Here is a related paper by Archer and Tardos. )
  • "An Analysis of Alternative Slot Auction Designs for Sponsored Search". Lahaie.

    Some course mechanics announcements: If you're taking the seminar for credit, please look at the announcement regarding projects under the Sep 25 entry (next week). Also, you'll notice that due to new visitors/speakers, the sessions are already filled until the Oct 30 gathering. Based on the remaining slots and the current number of registered students, this probably means that everyone will have to run a session at most once, and in fact you can/should probably work in teams. The basic protocol will be that I will determine the approximate content I'd like covered in advance, and then will request volunteers for the date in question.

    Mon Sep 25

    This week Ryan Gabbard will lead us through an investigation of the various services and tools provided to advertisers by Google and Yahoo!, as well as those provided by third parties. As preparation, please go and do some experimentation with the following tools:

  • Google's Keyword Tool and Traffic Estimator for advertisers
  • Yahoo!/Overture View Bids tool for advertisers
  • Any and all other such services you can find at Google!, Yahoo, or other vendors
  • Here are Ryan's most excellent slides on Google and Yahoo! advertiser services and on third party offerings. The discussion on the latter, particularly on the Atlas BidManager tool, led to a new suggestion under Project Ideas .

    IMPORTANT ANNOUNCEMENT FOR THOSE TAKING THE SEMINAR FOR CREDIT: In order to get people thinking about their course projects early, all seminar credit participants should send me (via email) an initial proposal(s) for your project by Monday, Sep 25. This is just the first round, so you can have more than one distinct proposal, and they can be brief.

    Mon Oct 2

    This week we have guest speaker Chris Dixon, founder of McAfee's SiteAdvisor, an interesting web trust and anti-fraud service. He will talk about various kinds of web fraud, including those directly related to sponsored search, such as click fraud.

    Readings (please check for updates):

  • Business Week article on click fraud. Following the main article are a few supplementary readings of related interest. Many thanks to seminar participants Edi Bice and Boulos Harb for pointing it out.
  • The Tuzhilin Report analyzing Google's click fraud efforts in the Lane's Gifts vs. Google lawsuit. This was mentioned briefly in Ryan Gabbard's presentation last week.
  • Please sample some of the readings on the SiteAdvisor blog.

    Of general interest to the seminar: at the ACM Electronic Commerce conference this past summer, there was an interesting panel discussion on models for sponsored search; a transcript has recently been made available.

    Mon Oct 9

    Guest speaker/visitor Alon Altman.

    An Axiomatic Approach to Ranking Systems

    Abstract: This talk will survey our recent work in applying the axiomatic approach to ranking systems. Ranking systems are systems in which agents rank each other to produce a social ranking. In the axiomatic approach we study ranking systems under the light of basic properties, or axioms. In this talk I will present our axiomatization theorem for the PageRank ranking system, prove an impossibility and possibility result for general ranking systems, and discuss the issue of incentives in ranking systems. Finally, I will show initial results regarding personalized ranking systems, where a specialized ranking is generated for each agent.

    IMPORTANT ANNOUNCEMENT FOR THOSE TAKING THE SEMINAR FOR CREDIT: In order to get the creative juices flowing for empirical projects, I am giving a little assignment required of all students in the seminar. You should have all received Kuzman's Script in an earlier email (let me know if you need a resend). You should all use this script to obtain data for some moderately large (say, at least in the dozens) set of phrases of your own choosing. Presumably these phrases will be "related" in some interesting way (e.g. the names of all 50 states). Please try to present an analysis of the data, what explains similarities, differences, rankings, etc. I will discuss this example in Excel in class today to help clarify. I'd like everyone to send me their brief analysis by this coming Sunday night, Oct 15, for discussion in next Monday's session.

    Mon Oct 16

    This time we'll hear from Kartik Hosanagar, who is both a Wharton faculty member and closely involved with a very interesting startup called NatPal, which he will tell us about. Because they do large-scale keyword auction bidding on behalf of many clients, NatPal addresses many important components of sponsored search, including various keyword and portfolio optimization problems, bid aggregation, and supply-demand matching. NatPal may also provide us some data for course projects, so we may hear about this as well.

    Mon Oct 23

    Fall break; no meeting

    Mon Oct 30

    This week we get a brief break from our flurry of guest speakers; so we'll cover the following paper:

  • "Quality Uncertainty and the Performance of Online Sponsored Search Markets: An Empirical Investigation". Animesh, Ramachandran, Viswanathan.

    Please read the paper in advance of our meeting. Our discussion leaders will be Qian Liu and Michael Zavlanos. The paper will probably not consume our entire meeting, so we will then go on to take a look at the empirical studies many of you recently did using Kuzman's Script; please be prepared to say a little about yours.

    Mon Nov 6

    Today we will have guest speaker/visitor David Pennock of Yahoo! Research. Here is a description from David about what he will cover; please read the papers he cites in advance:

    I will give a brief and biased history of the sponsored search industry. I will survey some of the sponsored search research being conducted in the microeconomics group at Yahoo! Research. I will summarize some of the activities at the annual Workshops on Sponsored Search Auctions. Among other topics, I will discuss these papers:

  • Implementing sponsored search in web search engines: Computational evaluation of alternative mechanisms. Jane Feng, Hemant Bhargava, David M. Pennock Informs Journal on Computing, forthcoming.
  • An Analysis of Alternative Slot Auction Designs for Sponsored Search. Sebastien Lahaie. ACM Conference on Electronic Commerce, 2006. [MK note: This paper was on our Sep 18 agenda but we never got to it.]

    Mon Nov 13

    This week we will be led by Kuzman Ganchev, Alex Kulesza, and Jenn Wortman through the following papers, but concentrating on the sections cited; please read accordingly. The common theme is bidding strategies under extended models (budgets, multiple keywords, dynamic behavior).

  • An Adaptive Algorithm for Selecting Profitable Keywords for Search-Based Advertising Services. Rusmevichientong, Williamson. Sections 1, 2, 4.2, 5.
  • Bid Optimization in Online Advertisement Auctions. Borgs, Chayes, Etesami, Immorlica, Jain, Mahadian. All sections, but not the proofs.
  • Dynamics of Bidding in Sponsored Search Auctions: An Analytical Investigation. Kursad Asdemir. Sections 1, 3, 5 but skip Propositions 2 and 3, 6.

    Mon Nov 20

    Today's meeting will be devoted to the discussion and development of course projects. Everyone is encouraged to come and contribute, but attendance and participation are required of registered students.

    More specifically, all registered students should be prepared to make a brief presentation in this session on their proposed course project. If you feel you already have an independent project idea that you have discussed with me and obtained approval for, you should present that idea to the group. You should clearly frame the question(s) you are investigating, motivate them, describe your planned methods (theoretical or experimental), etc. Each presentation will be followed by brief discussion and suggestions from the group.

    Alternately, you will all recall that in our Oct 23 meeting (see above), we examined some of the many interesting empirical experiments you did using Kuzman's Script, and I suggested that another possibility was a group of us joining forces for a larger project and possible paper along similar lines. If you are interested in joining such an effort, you should come prepared with some suggested empirical experiments for group discussion.

    To put some structure/discipline around all this, regardless of which category you fall into above, I'd like everyone to send me a couple of slides we can put up for discussion during the meeting; please send these to me by sometime on Sunday, Nov 19.

    Finally, once we have settled the project ideas in this meeting, I expect at least some nontrivial work to begin on them before the end of the term, so our final meeting on Dec 4 will be devoted to progress reports on the projects.

    ADDENDUM posted Nov 15: Due to a death in my family last night, I will be flying to CA this weekend and not returning until later Monday. So though I will not be present, I would still like everyone to carry on with the plan outlined above --- we need to get working on the projects.

    Fernando has generously offered to informally oversee the proceedings on Monday. I suggest having people who are planning on doing solo projects present first, followed by the brainstorming and presentation of ideas for the larger group project. In all cases the point is to generate ideas and get feedback, and for the group project, I'd like you all to generate a reasonably specific plan of action and division of labor.

    In addition to sending your materials to me by Sunday night so I can see what everyone is planning, please send them to Jenn (wortmanj@seas.upenn.edu) so she can put them all on a common machine for projection on Monday.

    Mon Nov 27

    Today we have the fascinating Ben Edelman as our visitor and speaker for a real extravaganza. First, Ben will present a more formal talk 11 AM Mon Nov 27 in 307 Levine. This talk is open to the general community, but should also be of specific interest to Sponsored Search Seminar participants. Here is the abstract for that talk:

    Adverse Selection in Online "Trust" Certifications

    Ben Edelman
    Department of Economics
    Harvard University

    Widely-used online "trust" authorities issue certifications without substantial verification of the actual trustworthiness of recipients. Their lax approach gives rise to adverse selection: The sites that seek and obtain trust certifications are actually significantly less trustworthy than those that forego certification. I demonstrate this adverse selection empirically via a new dataset on web site characteristics and safety. I find that TRUSTe-certified sites are more than twice as likely to be untrustworthy as uncertified sites, a difference which remains statistically and economically significant when restricted to "complex" commercial sites. I also present analogous results of adverse selection in search engine advertising - finding ads at leading search engines to be more than twice as likely to be untrustworthy as corresponding organic search results for the same search terms. (Here is a related paper. )

    Following the talk above, Ben will continue and present in our usual informal seminar setting. Here is a description of what he plans to cover:

    Guest speaker Ben Edelman will present his work on equilibria in sponsored search: Estimates of search engine revenues from unstable allocations, theoretical derivation of one equilibrium, and new simulation results to test convergence as well as to compare bidder outcomes and to compute policy counterfactuals.  What happens to search engine revenues and advertiser surpluses if a search engine changes its minimum bid rule?  Reduces the number of ads shown in results?  Changes the relative prominence of ads in different positions?  And if you had to choose a strategy for a PPC advertiser, what strategy would you recommend?  Ben's simulations attempt to offer some answers.

    Ben will also show us a bit of the darker side of sponsored search. Ben has caught spyware vendors showing Yahoo PPC ads in troubling and allegedly prohibited ways -- like covering merchants' sites with their own ads.  He also has video proof of spyware performing click-fraud.  We'll look at some of these examples, and we'll untangle the relationships that have created this mess.  Ben may also tell us a bit about Draucker Development et al. v. Yahoo (in which he serves as co-counsel), advertiser class action litigation claiming that Yahoo breached its contract with advertisers when it put ads into spyware, typosquatting sites, untargeted banner ads, and other unsavory placements.

    Mon Dec 4

    It has been a great ride, and this is our final meeting. As promised, we will devote it to progress reports on course projects; please send me your materials for projection by Sunday night. For those not doing projects, I am also happy if anyone wants to share any final thoughts on sponsored search, research ideas, and so on. Again, just send me anything you want projected and I will consolidate.

    Here is the directory of project materials.

    FINAL POST

    Course project write-ups are due to me via email Wed Dec 20. I look forward to reading them. Thanks to all of you for participating in the seminar --- I hope you all learned as much as I did. Happy Holidays!


    CANDIDATE TOPICS AND MATERIAL

    What follows below is my attempt to organize thoughts and material on some sponsored search topics I find interesting. It is by no means a definitive or exhaustive list of what we will examine. In some places I have not yet found much or any literature on the imagined topic, but think there might (or should) be some. My anticipation is that this list will grow and evolve as we discover more topics and material, and that much of the material will eventually be incorporated into the Detailed Meeting Schedule above.

    The Advertiser's View. Given that our investigation will include not just the science but also the practice of sponsored search, it seems natural to begin with a detailed look at what sponsored search really is, and how it works. In particular, what is it like to be an advertiser using sponsored search? We'll look at the portions of advertiser documentation provided by the various keyword auctions that describe how the auctions are run, how prices and charges are set, and related topics. We'll also examine the extensive tools available to advertisers from both the auctioneers and third parties for keyword performance tracking, keyword selection, bid placement and management, and other functionality. A central focus here will be what data is available to advertisers across the different auctions.

    Some sample material:

  • Google's information for advertisers and keyword tool.
  • Yahoo!'s search marketing resources , including the View Bids Tool.
  • Ask's sponsored listing basics.
  • Third-party optimization and management tools and services such as Efficient Frontier,   Did-It,   Atlas Search,   Bloofusion,   SearchRev, and Hitwise.
  • Some keyword bidding robots: Atlas Search,   Did-It's Maestro Client,   BidRank,   Dynamic BidMaximizer,   Apex Pacific,   PPC Management, and Search Marketing Tools PPC BidTracker.
  • The Search Engine Marketing Professional Organization.
  • "An Adaptive Algorithm for Selecting Profitable Keywords for Search-Based Advertising Services. Rusmevichientong, Williamson.
  • Optimal Bidding on Keyword Auctions. Kitts, Leblanc.

    Game-Theoretic Views of Keyword Auctions. The field of algorithmic mechanism design has exploded within the last several years, and there is now a body of literature on the algorithmic and game-theoretic properties of sponsored search in particular (sometimes referred to as "slot auctions"). We'll sample selectively in this growing area, including formal equilibrium analyses of existing keyword auction mechanisms, examinations of alternative mechanisms, and empirical analyses.

    Some sample material:

  • Papers from the Second Workshop on Sponsored Search Auctions from the 2006 ACM Electronic Commerce conference, and the First Workshop from ACM EC 2005.
  • "Internet Advertising and the Generalized Second Price Auction: Selling Billions of Dollars of Keywords". Edelman, Ostrovsky, Schwarz.
  • "Position Auctions". Varian.
  • "Truthful Auctions for Pricing Search Keywords". Aggarwal, Goel, Motwani.
  • "An Analysis of Alternative Slot Auction Designs for Sponsored Search". Lahaie.
  • Strategic Bidder Behavior in Sponsored Search Auctions. Edelman, Ostrovsky.

    Sponsored Search and Natural Language. While there is of course a very large literature on natural language search and information retrieval, and a smaller but growing one on the economic aspsects of keyword auctions, my limited survey so far suggests to me that there is a dearth of works truly in the intersection of these two topics. How can we best blend the tools of economics and natural language processing/IR to understand sponsored search? As a simple and concrete instance of the type of question one might ask, how would we define and detect when two terms synonymous with each other in the semantic sense are "mispriced" by the market relative to each other? (See Project Ideas below.)

    Some sample material:

  • "Logistic Regression and Collaborative Filtering for Sponsored Search Term Recommendation". Bartz, Murthi, Sebastian.

    The Sponsored Search Jungle. The topics above mainly focus on keyword auctions and their advertisers. But the success of the sponsored search markets and the web more generally has spawned an elaborate ecosystem in which all manner of methods are employed by advertisers, search engines and others to advance their own interests. This jungle gives rise to phenomena like link farms, web spam, click fraud and many others --- some of it intentional, some of it not, and much of it "undesirable" in one or another sense. We should spend some time examining the broader environment and impacts of sponsored search.

    Some sample material:

  • Writings of Ben Edelman, including some fascinating analyses [1]   [2] of how money flows between advertisers, auctions, and syndicates.
  • SiteAdvisor report on The Safety of Internet Search Engines.
  • "On Advertiser's Bidding Strategies for Search, Experience, and Credence Goods: an Empirical Investigation". Animesh, Ramachandran, Viswanathan.
  • Text Link Ads, a company selling hard links.
  • The commercial service Click Monkeys. Apparently a joke, but still a real issue.
  • "Pay-Per-Percentage of Impressions: An Advertising Method that is Highly Robust to Fraud". Goodman.

    SEM vs. SEO. Search engine optimization (SEO) refers to the process of tailoring a web site to optimize its (unpaid, or "left side", or "organic") ranking for a given set of keywords or phrases. Search engine marketing (SEM) refers to paid (or "right side") or sponsored search. While our emphasis will be on SEM, it is inevitable that we consider some SEO issues and methods as well, in particular because advertisers engage in both kinds of activity. I haven't gotten around to assembling SEO material yet, but here seems to be one extensive site and I am sure we can find more together.

    Sponsored Search and Traditional Finance. It is interesting to think about whether ideas and methods from traditional finance can be imported to sponsored search in a useful way. For instance, how could one apply classical mean-variance portfolio optimization ideas to keyword auctions? How could one support a secondary market for keywords, and who would benefit? What if the keyword auctions offered options, so that for a price a florist could lock in now their position for the term "flowers" for the week preceding Mother's Day 2007?

    Some sample material: None found/posted yet.

    Other Topics and Material.

  • The first Pay Per Click Bidding Agent Competition, from ACM EC 2006

    A number of readings suggested by David Pennock (need to assemble links and integrate with above):

  • Weber & Zeng. A model of search intermediaries and paid referrals
  • Bhargava & Feng. Preferential placement in Internet search engines
  • Feng, Bhargava, & Pennock. Implementing sponsored search in web search engines: Computational evaluation of alternative mechanisms
  • Feng. Optimal allocation mechanisms when bidders’ ranking for objects is common
  • Asdemir. Internet advertising pricing models
  • Asdemir. A theory of bidding in search phrase auctions: Can bidding wars be collusive?
  • Mehta, Saberi, Vazirani, & Vazirani. AdWords and generalized on-line matching
  • Paid Search as an Information Seeking Paradigm. Jansen.
  • The effectiveness of Web search engines for retrieving relevant ecommerce links. Jansen, Molina.


    PROJECT IDEAS

    Mispricing in Keyword Auctions. In traditional finance, there are many settings in which it is possible to define and detect "mispricings" between two or more investment instruments that are directly or indirectly related to each other; perhaps the canonical example is that of the current (spot) price for the share of a stock and the price of an option to buy or sell that stock at a fixed price and future time (the subject of the classical Black-Sholes theory). Mispricings, or more accurately the absence of them, are one measure of the "efficiency" of a given market. What are the analogues of this line of thought in sponsored search? One direction is to view different keywords as being related to each other via their semantic (definitional) or usage aspects. A semi-specific project would be to compile sets of semantically or operationally synonymous words/phrases, and empircally investigate how prices vary across synonymous terms. A next step would be to measure how much of ths variance can be explained by overall search popularity of the terms.

    Discovering Supply-Demand Imbalances. (Suggested by Chris Dixon ) One can view the "demand" for a given search term as being roughly represented by the volume and quality of people searching on it (where quality might be measured by how much searchers spend), and the "supply" as being roughly represented by the volume and quality of advertisers purchasing the search term (where quality might be measured by whether the advertisers provide relevant/good products). An interesting project would quantify these vague notions and empirically investigate the nature and extent of search terms for which there are supply-demand imbalances.

    Risk Mitigation Services for Advertisers. (Based on conversations with David Pennock and John Langford of Yahoo!) Suppose you are a Philadelphia florist; you might be quite willing to pay some amount now for the guarantee that during the week leading up to Mother's Day 2007, your bid of 50 cents per click on the phrase "Philadelphia flowers" will be one of the top four sponsored search results. We can think of this guarantee as an option --- the option to purchase a spot in the top four sponsored results at a future time for a fixed price. If (say) the top bid in the week before Mother's day is only 25 cents, you obviously won't exercise this option; if the fourth highest bid is a dollar, you will. (For simplicity here we are ignoring "quality" components to the ranking of sponsored ads.) What can we say about how to price such options? Can we import or apply the classical theory of options pricing in finance such as Black-Scholes, and how?

    Note that although such options are not offered by any of the major sponsored search auctions (yet), it is interesting to think about the issues involved in setting up an independent service offering them. Such a service would enter option agreements with advertisers and then be responsible for meeting the resulting guarantees on behalf of those advertisers, bidding whatever was necessary to meet them, possibly at a loss. From the perspective of this service, the goal is to profit from the combination of correct pricing of the options on average and the aggregation of risk across many advertisers.

    Another interesting idea is the use of independent information markets as a hedge against future price movement. Suppose the same Philadelphia florist knows that it can make profits only up to 50 cents per click, and is worried that in the future the competitive prices for the sponsored results on "Philadelphia flowers" will greatly exceed this limit. If there were a market in which participants could simply bet on the rise or fall of sponsored search prices, the florist could invest in the event that "Philadelphia flowers" will appreciate in value (that is, take a long position of shares in that phrase in the information market). If the florist gets priced out of the advertising market in the future, they will at least profit in the information market. See the Tech Buzz Game for an interesting example of a similar information market.

    "Agent-Based" Analysis of Sponsored Search. During Ryan Gabbard's presentation of third-party advertiser services , there was a brief discussion of the various heuristics available to users of Atlas' BidManager. One can easily imagine a future (or present?) in which the vast majority of bidding is done not by "fully rational" agents, but by large populations of such simple agents. One could then imagine doing modified equilibria analysis in which the strategies available to advertisers are limited to such heuristics. Such analysis could be either theoretical or based on simulations, the latter possibly informed by empirical auction data.