Integrating and sharing data among multiple sources is one of the longest-standing problems facing the data management community. The need to share and integrate data arises in a wide variety of settings, including enterprise data management, large science projects, collaboration among government agencies, and data management on the World-Wide Web. Building data integration systems is both technically deep and organizationally challenging. I will begin by reviewing some of the recent advances in this area and their significant impact on the commercial world, and will then suggest that we have been thinking about data integration too narrowly.
I will propose dataspaces as a new abstraction for data management that is appropriate for data integration and sharing environments. Unlike a data integration system, a dataspace system does not require its users to fully specify the semantic relationships among its component systems before it can be used. Instead, a dataspace system offers some query and management capabilities right out of the box, and semantic integration evolves over time, as needed, to provide additional querying and management functionality. I will discuss an initial set of components for dataspace systems. In particular, I will argue that machine learning methods will play a key role in evolving the structure of dataspaces by capturing and reusing users' attention and work.
Joint work with Mike Franklin, David Maier and Jennifer Widom