Projects

My research interests lie at the intersection of machine learning (ML) and databases. I apply ML techniques to data integration and interactive analytics with a human in the loop. In this direction, I have worked on the following projects.

Human Intent Prediction for Data Exploration

In this project, we aim to make human-database interaction seamless during a data exploration session by predicting the user's dynamically changing intent.

Publications: ICDE 2018 (Lightning Talk Abstract), EDBT 2019 (Short Paper), TODS 2021 (Research Paper)
Collaborators: Kanchan Chowdhury, Mohamed Sarwat


Unified Active Learning for Entity Matching

We build a unified active learning framework for entity matching that evaluates combinations of learners and example selectors with respect to quality, latency, number of labels, and interpretability. We also compare these active learning strategies against state-of-the-art supervised learning approaches.
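
The loop below is a minimal sketch of one learner/example-selector combination that a framework like this could evaluate; it is not the framework itself. It assumes a single token-Jaccard similarity feature, a threshold learner, and uncertainty sampling as the example selector. All names (`jaccard`, `fit_threshold`, `uncertainty_selector`, `active_learn`) and the toy product pairs are hypothetical, and the `select` parameter is where a different example selector would be plugged in.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]
# Hypothetical selector signature: (unlabeled indices, scores, threshold) -> index to label
Selector = Callable[[List[int], List[float], float], int]


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity, the single matching feature in this sketch."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def fit_threshold(labeled: List[Tuple[float, bool]]) -> float:
    """Learner: the similarity threshold with the fewest errors on labeled pairs."""
    candidates = sorted({s for s, _ in labeled}) + [1.01]  # 1.01 means "never match"
    return min(candidates, key=lambda t: sum((s >= t) != y for s, y in labeled))


def uncertainty_selector(unlabeled: List[int], scores: List[float], t: float) -> int:
    """Example selector: the unlabeled pair closest to the decision boundary."""
    return min(unlabeled, key=lambda i: abs(scores[i] - t))


def active_learn(pairs: List[Pair], oracle: Callable[[int], bool], seed: List[int],
                 budget: int, select: Selector = uncertainty_selector
                 ) -> Tuple[float, List[bool], int]:
    scores = [jaccard(a, b) for a, b in pairs]
    labels: Dict[int, bool] = {i: oracle(i) for i in seed}
    t = fit_threshold([(scores[i], y) for i, y in labels.items()])
    while len(labels) < budget:
        unlabeled = [i for i in range(len(pairs)) if i not in labels]
        if not unlabeled:
            break
        i = select(unlabeled, scores, t)
        labels[i] = oracle(i)  # one more label from the human in the loop
        t = fit_threshold([(scores[j], y) for j, y in labels.items()])
    return t, [s >= t for s in scores], len(labels)


pairs = [
    ("apple iphone 12 64gb", "apple iphone 12 64 gb black"),
    ("apple iphone 12 64gb", "samsung galaxy s21"),
    ("sony wh-1000xm4 headphones", "sony wh-1000xm4 wireless headphones"),
    ("sony wh-1000xm4 headphones", "bose qc35 headphones"),
    ("dell xps 13 laptop", "dell xps 13 9310 laptop"),
    ("dell xps 13 laptop", "hp spectre x360 laptop"),
]
truth = [True, False, True, False, True, False]
t, preds, n_labels = active_learn(pairs, lambda i: truth[i], seed=[0, 1], budget=4)
print(preds == truth, n_labels)  # True 4
```

Swapping `uncertainty_selector` for another selector (for example, random sampling) and comparing label counts and accuracy is, at toy scale, the kind of learner/selector comparison described above.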


Publications: SIGMOD 2020 (Research Paper)
Collaborators: Prithviraj Sen, Lucian Popa, Mohamed Sarwat


Data Integration of Electric System Schemata

We integrate real-world schemata riddled with inconsistencies, applying approximate entity matching and schema alignment techniques to reconcile electric system transmission, distribution, and location data that come in diverse formats. This project was a collaborative effort between the CASCADE team at ASU and Salt River Project (SRP), one of the primary electricity distributors in Arizona.

Collaborators: Stewart Nunn, Dragan Boscovic, Mohamed Sarwat


Rule Discovery in Knowledge Bases

We mine positive and negative rules that, respectively, imply or negate a pre-specified relationship between the subject and object entities in a knowledge graph. We do this by traversing the paths that connect the subject and object of several instances (RDF triples) of the relationship and generalizing those paths into rules.
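
The idea of generalizing paths into rules can be illustrated with a small sketch. It assumes a tiny in-memory list of triples, a depth-first search over paths of at most `max_len` edges, and a simple support filter; the `parentOf`/`grandparentOf` data, the function names, and the `max_len` and `min_support` values are hypothetical, and the sketch does not reproduce the project's actual search strategy or rule-scoring function. Positive rules are mined from pairs where the target relation holds, negative rules from pairs where it does not.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Step = Tuple[str, bool]        # (predicate, traversed subject->object?)
Path = Tuple[Step, ...]        # entity-free, i.e. already generalized


def build_index(kg: List[Triple]) -> Dict[str, List[Tuple[Step, str]]]:
    """Adjacency list that allows walking every edge in both directions."""
    adj: Dict[str, List[Tuple[Step, str]]] = defaultdict(list)
    for s, p, o in kg:
        adj[s].append(((p, True), o))
        adj[o].append(((p, False), s))
    return adj


def paths_between(adj, src: str, dst: str, max_len: int) -> Set[Path]:
    """Simple paths of at most max_len edges from src to dst; only the
    (predicate, direction) sequence is kept, so entities become variables."""
    found: Set[Path] = set()

    def dfs(node: str, path: Path, visited: Set[str]) -> None:
        if node == dst and path:
            found.add(path)
            return
        if len(path) == max_len:
            return
        for step, nxt in adj.get(node, []):
            if nxt not in visited:
                dfs(nxt, path + (step,), visited | {nxt})

    dfs(src, (), {src})
    return found


def mine_rules(adj, examples, counter_examples, max_len=2, min_support=2):
    """Keep generalized paths shared by at least min_support examples that also
    cover more examples than counter-examples."""
    def coverage(pairs: List[Tuple[str, str]]) -> Counter:
        counts: Counter = Counter()
        for s, o in pairs:
            counts.update(paths_between(adj, s, o, max_len))
        return counts

    own, other = coverage(examples), coverage(counter_examples)
    return {p: n for p, n in own.items() if n >= min_support and n > other[p]}


def render(path: Path, target: str, negated: bool) -> str:
    """Print a generalized path as a Horn-style rule over variables x, v1, ..., y."""
    names = ["x"] + [f"v{i}" for i in range(1, len(path))] + ["y"]
    body = []
    for i, (pred, forward) in enumerate(path):
        a, b = names[i], names[i + 1]
        body.append(f"{pred}({a}, {b})" if forward else f"{pred}({b}, {a})")
    head = f"{target}(x, y)"
    return " & ".join(body) + " => " + ("NOT " + head if negated else head)


kg = [("alice", "parentOf", "bob"), ("bob", "parentOf", "carol"),
      ("alice", "parentOf", "dan"), ("dan", "parentOf", "eve"),
      ("frank", "parentOf", "gina"), ("gina", "parentOf", "hal")]
adj = build_index(kg)
pos = [("alice", "carol"), ("alice", "eve"), ("frank", "hal")]  # grandparentOf holds
neg = [("bob", "carol"), ("dan", "eve"), ("alice", "bob")]      # grandparentOf does not hold
positive_rules = [render(p, "grandparentOf", False) for p in mine_rules(adj, pos, neg)]
negative_rules = [render(p, "grandparentOf", True) for p in mine_rules(adj, neg, pos)]
print(positive_rules)  # ['parentOf(x, v1) & parentOf(v1, y) => grandparentOf(x, y)']
print(negative_rules)  # ['parentOf(x, y) => NOT grandparentOf(x, y)']
```

With `min_support = 2`, a path observed for only one example is discarded as noise; this is the simplest possible stand-in for the coverage-based scoring a full rule miner would apply.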


Publications: ICDE 2018 (Research Paper), VLDB 2018 (Demo Paper), JDIQ 2019 (Research Paper)
Collaborators: Stefano Ortona, Paolo Papotti, Naser Ahmadi, Viet-Phi Huynh


Interpretable Entity Matching

We use program synthesis, with the Sketch solver, to generate concise and interpretable Boolean expressions (rules) that satisfy the matching and non-matching assertions in the training data; these rules are then used to perform entity matching.
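
To show what "a rule that satisfies every matching and non-matching assertion" means, the sketch below swaps Sketch for plain enumerative search: it tries rules from smallest to largest and returns the first consistent one. The rule grammar (a single AND or OR over `sim(attribute) >= threshold` atoms), the token-Jaccard similarity, the threshold grid, and all function names are illustrative assumptions, not the project's grammar or encoding.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, str]
Example = Tuple[Record, Record, bool]  # (left record, right record, is_match)
Atom = Tuple[str, float]               # sim(attribute) >= threshold
Rule = Tuple[str, Tuple[Atom, ...]]    # ("AND" | "OR", atoms)


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two attribute values."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def holds(rule: Rule, left: Record, right: Record) -> bool:
    """Evaluate a rule on one record pair."""
    op, atoms = rule
    results = [jaccard(left[attr], right[attr]) >= t for attr, t in atoms]
    return all(results) if op == "AND" else any(results)


def synthesize(examples: List[Example], attrs: Sequence[str],
               thresholds: Sequence[float] = (0.5, 0.8),
               max_atoms: int = 2) -> Optional[Rule]:
    """Enumerate rules from smallest to largest and return the first one that
    satisfies every matching and non-matching assertion."""
    atoms = [(a, t) for a in attrs for t in thresholds]
    for size in range(1, max_atoms + 1):
        for combo in combinations(atoms, size):
            for op in ("AND", "OR"):
                rule = (op, combo)
                if all(holds(rule, left, right) == match
                       for left, right, match in examples):
                    return rule
    return None


def render(rule: Optional[Rule]) -> str:
    if rule is None:
        return "no consistent rule found"
    op, atoms = rule
    return f" {op} ".join(f"sim({a}) >= {t}" for a, t in atoms)


examples = [
    ({"name": "john smith", "address": "12 main st"},
     {"name": "john smith", "address": "12 main street"}, True),
    ({"name": "john smith", "address": "12 main st"},
     {"name": "john smith", "address": "99 oak ave"}, False),
    ({"name": "mary jones", "address": "5 elm rd"},
     {"name": "mary k jones", "address": "5 elm rd"}, True),
    ({"name": "mary jones", "address": "5 elm rd"},
     {"name": "peter brown", "address": "5 elm rd"}, False),
]
rule = synthesize(examples, attrs=["name", "address"])
print(render(rule))  # sim(name) >= 0.5 AND sim(address) >= 0.5
```

Because enumeration proceeds by rule size, the first consistent rule is also among the smallest, which keeps it concise and readable. A SAT-based synthesizer such as Sketch instead encodes the search space as constraints, which lets it handle much richer grammars than brute-force enumeration.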


Publications: PVLDB 2017 (Research Paper, presented at VLDB 2018), SIGMOD 2017 (Demo Paper)
Collaborators: Rohit Singh, Paolo Papotti, Nan Tang, Armando Solar-Lezama, Samuel Madden, Ahmed K. Elmagarmid, Jorge-Arnulfo Quiane-Ruiz


Statistical Data Cleaning

We design and develop BayesWipe, a statistical data cleaning framework that obviates the need for clean master data. Instead, it learns a model of the clean data from the dirty data itself in a principled, probabilistic manner.
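
BayesWipe's actual models are richer than anything shown here. As a rough, single-column illustration of learning the clean distribution from the dirty data itself, the sketch below treats cleaning as a noisy channel. It estimates a prior over clean values from value frequencies in the dirty column (assuming most values are correct), combines it with a hypothetical error model in which each character edit multiplies the likelihood by `p_edit`, and replaces every value with its most probable clean candidate. The function names, `p_edit = 0.3`, and the city-name data are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def clean_column(values: List[str], p_edit: float = 0.3) -> List[str]:
    """Replace each value v with argmax_c P(c) * P(v | c), where the prior P(c)
    is estimated from the dirty column itself."""
    counts = Counter(values)
    prior = {v: n / len(values) for v, n in counts.items()}

    def likelihood(observed: str, clean: str) -> float:
        # Hypothetical error model: every edit multiplies the likelihood by p_edit
        return p_edit ** edit_distance(observed, clean)

    best: Dict[str, str] = {}  # each distinct dirty value is corrected once
    for v in counts:
        best[v] = max(prior, key=lambda c: prior[c] * likelihood(v, c))
    return [best[v] for v in values]


dirty = ["Phoenix"] * 8 + ["Phoenx"] + ["Tempe"] * 5 + ["Tmpe", "Mesa"]
print(sorted(set(clean_column(dirty))))  # ['Mesa', 'Phoenix', 'Tempe']
```

The rare but legitimate value `Mesa` survives because no frequent value lies within an edit or two of it. With a smaller `p_edit` (for example 0.1), the typos would survive as well, so this parameter trades the number of corrections against the risk of overwriting rare values.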


Publications: JDIQ 2016 (Research Paper)
Collaborators: Sushovan De, Yuheng Hu, Yi Chen, Subbarao Kambhampati