Cyber Center Lecture - Liangcai (Steven) Shu
March 12 @ 12:00 PM - 1:00 PM - Lawson 1142
Efficient Blocking for Entity Resolution
In many telecom and web applications, there is a need to determine whether data objects from the same source or from different sources represent the same real-world entity. This problem arises for subscribers to multiple services, customers in supply chain management, and users of social networks. It becomes even harder to solve when data is integrated from multiple sources, because the system usually lacks a unique identifier for each real-world entity. Entity resolution is the task of identifying objects in the data sets that refer to the same real-world entity.
We investigate the entity resolution problem for large data sets, where efficient and scalable solutions are needed. We propose an unsupervised blocking method that divides a data set into blocks such that candidate objects representing the same entity appear in the same block. Our experimental results with real-world data show that the approach is promising.
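
For readers unfamiliar with blocking, the general idea can be illustrated with a minimal sketch. This is not the speaker's method, which the abstract does not detail. It is a generic key-based blocking scheme, and the record fields, the sample names, and the `blocking_key` rule (first three letters of the last name token) are all assumptions chosen for illustration. Only records that share a block are compared, so the number of candidate pairs can be far smaller than the number of all possible pairs.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first three letters of the last name token, lowercased.
    return record["name"].split()[-1][:3].lower()


def build_blocks(records):
    # Group records by blocking key; each group is one block.
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    return blocks


def candidate_pairs(blocks):
    # Only records inside the same block are paired for later comparison.
    for members in blocks.values():
        for a, b in combinations(members, 2):
            yield a["id"], b["id"]


# Made-up sample records, for illustration only.
records = [
    {"id": 1, "name": "Maria Garcia"},
    {"id": 2, "name": "M. Garcia"},
    {"id": 3, "name": "John Smith"},
    {"id": 4, "name": "J. Smith"},
    {"id": 5, "name": "Wei Chen"},
]

blocks = build_blocks(records)
pairs = sorted(candidate_pairs(blocks))
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}: {pairs}")
```

In this toy data set, blocking reduces the comparisons from 10 possible pairs to 2 candidate pairs. A fixed key like this one can miss matches when the key field itself contains errors, which is one reason blocking quality matters for large, noisy data sets.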