
 
 Saul Gorn Memorial Lecture, 2007 


 

Monday, April 9th, 2007

 

The Saul Gorn Memorial Lecture Series was established in honor of the late Professor Saul Gorn who played a key role in the establishment of the Computer Science Graduate Group in the Moore School, which later became the Department of Computer and Information Science.

The Department of Computer and Information Science and the Institute for Research in Cognitive Science are proud to present distinguished lecturer:

 

Hector Garcia-Molina

Professor, Departments of Computer Science

and Electrical Engineering

Stanford University

Time: 3:00 - 4:30 pm
Place: Wu and Chen Auditorium, 101 Levine Hall

             http://infolab.stanford.edu/people/hector.html


Generic Entity Resolution

Abstract:


Entity resolution (ER) is a problem that arises in many information integration scenarios: we have two or more sources containing records on the same set of real-world entities (e.g., customers).  However, there are no unique identifiers that tell us which records from one source correspond to those in the other sources.  Furthermore, the records representing the same entity may have differing information, e.g., one record may have the address misspelled, while another may be missing some fields.  An ER algorithm attempts to identify the matching records from multiple sources (i.e., those corresponding to the same real-world entity), and merges the matching records as best it can.


In this talk I will describe a "generic" ER approach where the functions for comparing and merging records are black boxes, invoked on pairs of records.  I will describe a set of important properties of the black boxes that enable efficient ER.  I will also introduce three algorithms for ER: one for the general case, one for the case where the properties hold, and one for when the computations can be distributed across multiple processors.  If time permits, I will show some experimental comparisons of the algorithms, based on comparison shopping data provided by Yahoo.
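As a rough illustration of the black-box idea (not one of the three algorithms from the talk), the following sketch resolves a small set of records using only two caller-supplied functions. The names `resolve`, `share_a_value`, and `union`, the record representation, and the sample data are all assumptions made for this example. The sketch also assumes that a merged record can safely replace the two records it was built from, which need not hold for arbitrary match and merge functions.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical representation: a record is a set of (attribute, value) pairs.
Record = FrozenSet[Tuple[str, str]]


def resolve(
    records: Iterable[Record],
    match: Callable[[Record, Record], bool],
    merge: Callable[[Record, Record], Record],
) -> List[Record]:
    """Repeatedly merge matching records until no two resolved records match.

    `match` and `merge` are treated as black boxes and are only ever
    called on pairs of records.
    """
    pending = list(records)
    resolved: List[Record] = []
    while pending:
        current = pending.pop()
        # Look for an already-resolved record that matches the current one.
        partner = next((r for r in resolved if match(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # Replace the pair by its merge and re-examine the merged record,
            # since it may now match records that neither original matched.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved


# Illustrative black boxes: two records match if they share any
# (attribute, value) pair; merging takes the union of their pairs.
def share_a_value(a: Record, b: Record) -> bool:
    return not a.isdisjoint(b)


def union(a: Record, b: Record) -> Record:
    return a | b


# Made-up sample data.
r1 = frozenset({("name", "J. Doe"), ("phone", "555-1234")})
r2 = frozenset({("name", "John Doe"), ("phone", "555-1234")})
r3 = frozenset({("name", "John Doe"), ("email", "jd@example.com")})
r4 = frozenset({("name", "Ann Lee")})

result = resolve([r1, r2, r3, r4], share_a_value, union)
```

In this sample, `r1` and `r3` share no value directly, but both match `r2`, so all three end up merged into a single record while `r4` stays on its own.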

Brief Bio:

Hector Garcia-Molina is the Leonard Bosack and Sandra Lerner Professor in the Departments of Computer Science and Electrical Engineering at Stanford. He served as the Chairman of the Computer Science Department from 2001 to 2004, and was the Director of the Computer Systems Laboratory from 1994 to 1997. He was a faculty member of the Computer Science Department at Princeton University from 1979 to 1991, and a member of the President's Information Technology Advisory Committee (PITAC) from 1997 to 2001. Garcia-Molina's research interests include distributed computing systems, digital libraries, and database systems. He received his BS in electrical engineering from the Instituto Tecnologico de Monterrey, Mexico in 1974, and earned an MS in electrical engineering in 1975 and a PhD in computer science in 1979 from Stanford University.

Garcia-Molina is involved in numerous professional activities. He is a member of the National Academy of Engineering, and a Fellow of the Association for Computing Machinery and the American Academy of Arts and Sciences. He is on the Technical Advisory Boards of DoCoMo Labs USA and Yahoo Search & Marketplace. A Venture Advisor for Diamondhead Ventures, Garcia-Molina is a member of the Boards of Directors of Oracle and Kintera.  He received the ACM SIGMOD Innovations Award in 1999.

 

 

 


 




 
 