Projects
My research interests lie at the intersection of machine learning (ML) and databases. I apply ML techniques to data integration and interactive analytics while keeping a human in the loop. In this direction, I have worked on the following projects:
- Human Intent Prediction for Data Exploration
- Unified Active Learning for Entity Matching
- Data Integration of Electric System Schemata
- Rule Discovery in Knowledge Bases
- Interpretable Entity Matching
- Statistical Data Cleaning
Human Intent Prediction for Data Exploration
In this project, we aim to make human-database interaction seamless during a data exploration session by predicting the user's dynamically changing intent.
Publications: ICDE 2018 (Lightning Talk Abstract), EDBT 2019 (Short Paper), TODS 2021 (Research Paper)
Collaborators: Kanchan Chowdhury, Mohamed Sarwat
Unified Active Learning for Entity Matching
We build a unified active learning framework for entity matching that evaluates combinations of learners and example selectors with respect to quality, latency, number of labels, and interpretability metrics. We also compare the active learning strategies against state-of-the-art supervised learning approaches.
Publications: SIGMOD 2020 (Research Paper)
Collaborators: Prithviraj Sen, Lucian Popa, Mohamed Sarwat
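As a rough illustration of the learner/selector pairing described in this project, the sketch below runs one possible combination: a toy threshold learner over Jaccard similarity and an uncertainty-based example selector. All names here (jaccard, fit_threshold, select_most_uncertain, active_learning_loop) and the example records are hypothetical and are not taken from the framework itself.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two record strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def fit_threshold(labeled: List[Tuple[float, bool]], default: float = 0.5) -> float:
    """Toy learner (assumption): place the decision threshold midway between
    the least similar labeled match and the most similar labeled non-match."""
    match_sims = [sim for sim, is_match in labeled if is_match]
    non_match_sims = [sim for sim, is_match in labeled if not is_match]
    if not match_sims or not non_match_sims:
        return default  # not enough evidence yet, keep the default
    return (min(match_sims) + max(non_match_sims)) / 2


def select_most_uncertain(unlabeled: Sequence[Pair], threshold: float) -> Pair:
    """Example selector: pick the pair whose similarity is closest to the threshold."""
    return min(unlabeled, key=lambda pair: abs(jaccard(*pair) - threshold))


def active_learning_loop(
    pairs: Sequence[Pair], oracle: Callable[[Pair], bool], budget: int
) -> Tuple[float, int]:
    """Query the oracle for up to `budget` labels; return (threshold, labels used)."""
    unlabeled = list(pairs)
    labeled: List[Tuple[float, bool]] = []
    threshold = 0.5
    for _ in range(min(budget, len(unlabeled))):
        pair = select_most_uncertain(unlabeled, threshold)
        unlabeled.remove(pair)
        labeled.append((jaccard(*pair), oracle(pair)))
        threshold = fit_threshold(labeled)
    return threshold, len(labeled)


# Hypothetical candidate pairs with ground-truth labels standing in for a human labeler.
TRUTH = {
    ("apple iphone 12", "apple iphone 12"): True,
    ("apple iphone 12", "iphone 12 apple phone"): True,
    ("dell xps 13", "dell xps 15"): False,
    ("dell xps 13", "hp spectre"): False,
}

if __name__ == "__main__":
    print(active_learning_loop(list(TRUTH), TRUTH.__getitem__, budget=3))
```

Swapping fit_threshold or select_most_uncertain for another implementation is what would let such a loop compare learner/selector combinations on label count and quality.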
Data Integration of Electric System Schemata
We integrate real-world schemata that contain many inconsistencies, applying approximate entity matching and schema alignment techniques to reconcile electric system transmission, distribution, and location data stored in diverse formats. This project was a collaborative effort between the CASCADE team at ASU and Salt River Project (SRP), one of the primary electricity distributors in Arizona.
Collaborators: Stewart Nunn, Dragan Boscovic, Mohamed Sarwat
Rule Discovery in Knowledge Bases
We mine positive and negative rules that satisfy or negate pre-specified relationships between the subject and object entities in a Knowledge Graph. This is done by traversing the paths between several instances (RDF triples) of the relationship and generalizing them into rules.
Publications: ICDE 2018 (Research Paper), VLDB 2018 (Demo Paper), JDIQ 2019 (Research Paper)
Collaborators: Stefano Ortona, Paolo Papotti, Naser Ahmadi, Viet-Phi Huynh
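To make the path-generalization idea in this project concrete, here is a minimal sketch under simplifying assumptions: it mines only positive rules, follows only forward edges, and bounds path length at two hops. The function names (build_index, path_patterns, mine_rules, format_rule) and the toy triples are hypothetical and do not reflect the published system.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Pattern = Tuple[str, ...]       # sequence of predicates along a path


def build_index(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each subject to its outgoing (predicate, object) edges."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def path_patterns(index, subject: str, obj: str, target: str, max_len: int = 2) -> Set[Pattern]:
    """Collect predicate sequences of up to max_len hops leading from subject
    to obj, ignoring edges labeled with the target relationship itself."""
    patterns: Set[Pattern] = set()
    frontier = [(subject, ())]
    for _ in range(max_len):
        next_frontier = []
        for node, preds in frontier:
            for predicate, neighbor in index.get(node, []):
                if predicate == target:
                    continue
                extended = preds + (predicate,)
                if neighbor == obj:
                    patterns.add(extended)
                next_frontier.append((neighbor, extended))
        frontier = next_frontier
    return patterns


def mine_rules(triples: Iterable[Triple], target: str,
               min_support: int = 2, max_len: int = 2) -> Dict[Pattern, int]:
    """Generalize paths between instances of `target` into patterns and keep
    those supported by at least min_support instances."""
    triples = list(triples)
    index = build_index(triples)
    support: Counter = Counter()
    for s, p, o in triples:
        if p == target:
            support.update(path_patterns(index, s, o, target, max_len))
    return {pattern: n for pattern, n in support.items() if n >= min_support}


def format_rule(pattern: Pattern, target: str) -> str:
    """Render a pattern as a rule, replacing entities with variables."""
    variables = ["x"] + [f"z{i}" for i in range(len(pattern) - 1)] + ["y"]
    body = " & ".join(
        f"{pred}({variables[i]}, {variables[i + 1]})" for i, pred in enumerate(pattern)
    )
    return f"{body} => {target}(x, y)"


# Hypothetical toy knowledge graph.
TRIPLES = [
    ("a", "parentOf", "b"), ("b", "parentOf", "c"), ("a", "grandparentOf", "c"),
    ("d", "parentOf", "e"), ("e", "parentOf", "f"), ("d", "grandparentOf", "f"),
]

if __name__ == "__main__":
    for pattern in mine_rules(TRIPLES, "grandparentOf"):
        print(format_rule(pattern, "grandparentOf"))
```

A negative-rule variant could apply the same traversal to entity pairs known not to stand in the target relationship, which this sketch leaves out.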
Interpretable Entity Matching
We use program synthesis, via a solver named Sketch, to generate concise and interpretable Boolean expressions (rules) that satisfy matching and non-matching assertions on the training data, and we use these rules to perform entity matching.
Publications: PVLDB 2017 (Research Paper) presented at VLDB 2018, SIGMOD 2017 (Demo Paper)
Collaborators: Rohit Singh, Paolo Papotti, Nan Tang, Armando Solar-Lezama, Samuel Madden, Ahmed K. Elmagarmid, Jorge-Arnulfo Quiane-Ruiz
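The flavor of rule this project produces can be illustrated with a small brute-force search. This is not Sketch and not the synthesis procedure from the papers: it simply enumerates conjunctions of threshold predicates over similarity features and returns the first (smallest) one consistent with every labeled example. The feature names, thresholds, and functions below are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Atom = Tuple[str, float]               # (feature, threshold) meaning feature >= threshold
Example = Tuple[Dict[str, float], bool]  # (similarity features, is_match)


def evaluate(rule: Sequence[Atom], features: Dict[str, float]) -> bool:
    """A rule is a conjunction: every atom must hold for the pair to match."""
    return all(features[name] >= threshold for name, threshold in rule)


def synthesize_rule(
    examples: List[Example],
    thresholds: Sequence[float] = (0.3, 0.5, 0.7, 0.9),
    max_atoms: int = 2,
) -> Optional[Tuple[Atom, ...]]:
    """Return the smallest conjunction that agrees with all matching and
    non-matching examples, or None if none exists within max_atoms."""
    feature_names = sorted(examples[0][0])
    atoms = [(name, t) for name in feature_names for t in thresholds]
    for size in range(1, max_atoms + 1):
        for rule in combinations(atoms, size):
            if all(evaluate(rule, feats) == label for feats, label in examples):
                return rule
    return None


# Hypothetical training assertions over two similarity features.
EXAMPLES: List[Example] = [
    ({"name_sim": 0.9, "addr_sim": 0.8}, True),
    ({"name_sim": 0.95, "addr_sim": 0.6}, True),
    ({"name_sim": 0.9, "addr_sim": 0.2}, False),
    ({"name_sim": 0.4, "addr_sim": 0.9}, False),
]

if __name__ == "__main__":
    print(synthesize_rule(EXAMPLES))
```

Searching smaller conjunctions first is one simple way to bias the result toward concise, readable rules; a constraint solver explores a far richer expression space than this enumeration does.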
Statistical Data Cleaning
We design and develop a statistical data cleaning framework called BayesWipe, which obviates the need for clean master data. Instead, it learns a model of the clean data from the dirty data itself in a probabilistically principled manner.
Publications: JDIQ 2016 (Research Paper)
Collaborators: Sushovan De, Yuheng Hu, Yi Chen, Subbarao Kambhampati