From: ryan.gabbard@gmail.com on behalf of Ryan Gabbard [ryang@audhumla.org]
Sent: Monday, October 16, 2006 11:34 AM
To: Michael Kearns
Subject: Worrisome SS analysis results...
Attachments: sample_search_stats.ods; argh.ods

Sorry for the late submission - as you can see below, my intended analysis got a bit derailed, but it might provide some very useful warnings for people's projects.

My project idea was to look at "sparse" auctions with few participants and ask whether the reserve prices could be fiddled with in such a way as to make more money for the search engines, so I chose 100 queries at random from the AOL query data. After removing the clearly sexually explicit ones, I reran these queries on Yahoo and examined what ads came up. Since the instances my project could potentially apply to were those with 2-8 ads, and especially those with 5-8, I was hoping there would be a significant number in that range, and indeed there were:

# ads   % queries
0       27%
1       16%
2       6%
3       2%
4       3%
5       9%
6       5%
7       7%
8       0%
9       24%

So 49% of the queries were in my target range, and 21% in the prime target range.

So I took those 49% of the queries, broke each one down into the multiple possible keywords that query might match, and then ran those keywords with Kuzman's tool. I then attempted to reassemble the data in order to reverse-engineer the prices of the ads which actually appeared in the search. This proved to be extremely difficult. A variety of observations:

1. Yahoo's broad match equivalent makes things very difficult - it brings into the "auction" bids on keywords which don't appear in the query at all. Sometimes I wasn't able to find any plausibly related query which would trigger the ads which actually appeared. Sometimes the relationship seems to be at some level of semantic clustering (e.g.
a search for www.wwe [a big chunk of search queries are people typing or mistyping domain names; the WWE is a wrestling group] turned up lots of ads for things related to the NFL, even though no NFL ads bid on WWE keywords).
2. Sometimes Yahoo won't display some ads which the bid tool shows are over the reserve price, even though it has room.
3. Yahoo does seem to do some sort of Google-type fiddling with the bids even though they supposedly don't. Among other things, eBay's ads would consistently place much higher than their bids would justify.
4. The rate of change of bids and ads is extremely rapid. Just over the couple of hours it took me to gather the data, the number of ads displayed for about half the queries changed, sometimes dramatically (e.g. "illustrator brushes" went from 2 ads to over 8!). This may very much complicate people's data gathering attempts.
5. There's a very large difference between the number of ads Google displays and the number Yahoo displays for different queries, which was very unexpected. And it's not clearly in one direction or the other, either.

In view of these difficulties, actually matching bids from the bid tool to the displayed ads is extremely difficult and sometimes not possible. In the cases where I was able to obtain the bids for the bottom ads, those bids were consistently right at or above the reserve, which is good evidence I should abandon this project idea and choose another one.

The spreadsheets with the raw data are attached. Apologies again for lateness!

Ryan

On 10/9/06, Michael Kearns wrote:
>
> All --- In order to get the creative juices flowing for empirical projects,
> I am giving a little assignment required of all students taking the seminar
> for credit. I am attaching an update of Kuzman's Script for obtaining Overture
> price data, which he has kindly modified to give nicely formatted output.
> You need to have access to a unix or linux box in order to run it, which
> I assume you all do.
>
> You should all use this script to obtain Overture prices for some moderately
> large (say, at least in the dozens) set of phrases of your own choosing.
> Presumably these phrases will be "related" in some interesting way (e.g. the
> names of all 50 states). Please try to present an analysis of the prices, what
> explains their similarities, differences, rankings, etc. I will give an example
> in class today to help clarify. I'd like everyone to send me their brief analysis
> by this coming Sunday night, Oct 16, for discussion in next Monday's session.
>
> Best
> Prof Kearns

--
Ryan Gabbard
Department of Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~gabbard