Projects

My research interests lie at the intersection of machine learning (ML) and databases. I apply ML techniques to data integration and interactive analytics while keeping a human in the loop. In this direction, I have worked on the following projects.

Human Intent Prediction for Data Exploration
Unified Active Learning for Entity Matching
Data Integration of Electric System Schemata
Rule Discovery in Knowledge Bases
Interpretable Entity Matching
Statistical Data Cleaning

Human Intent Prediction for Data Exploration

In this project, we aim to make human-database interaction seamless during a data exploration session by predicting the user's dynamically changing intent.

Publications: ICDE 2018 (Lightning Talk Abstract), EDBT 2019 (Short Paper), TODS 2021 (Research Paper)
Collaborators: Kanchan Chowdhury, Mohamed Sarwat

Unified Active Learning for Entity Matching

We build a unified active learning framework for entity matching to evaluate combinations of learners and example selectors with respect to quality, latency, number of labels, and interpretability. We also compare the active learning strategies against state-of-the-art supervised learning approaches.
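As a rough illustration of how a learner and an example selector are paired in an active learning loop, here is a minimal sketch. It is not the framework from the paper: the threshold learner, the uncertainty-based selector, the toy record pairs, and all function names are assumptions chosen for brevity.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def fit_threshold(labeled):
    """Learner (assumed): the similarity threshold with the fewest training errors."""
    best_t, best_err = 0.5, len(labeled) + 1
    for t in sorted({s for s, _ in labeled}):
        err = sum((s >= t) != y for s, y in labeled)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def select_most_uncertain(pool, threshold):
    """Example selector (assumed): the unlabeled pair closest to the boundary."""
    return min(pool, key=lambda p: abs(jaccard(*p) - threshold))


def active_learn(pairs, oracle, seed, budget):
    """Alternate between fitting the learner and querying one new label."""
    labeled = [(jaccard(*p), oracle(p)) for p in seed]
    pool = [p for p in pairs if p not in seed]
    for _ in range(min(budget, len(pool))):
        threshold = fit_threshold(labeled)
        pick = select_most_uncertain(pool, threshold)
        pool.remove(pick)
        labeled.append((jaccard(*pick), oracle(pick)))
    return fit_threshold(labeled), len(labeled)


# Hypothetical record pairs with ground-truth match labels (the "human" oracle).
TRUTH = {
    ("apple iphone 12", "apple iphone 12"): True,
    ("apple iphone 12 black", "apple iphone 12"): True,
    ("samsung galaxy s21", "samsung galaxy s20"): False,
    ("dell xps 13", "hp spectre 13"): False,
    ("sony wh-1000xm4 headphones", "sony wh-1000xm4"): True,
    ("lenovo thinkpad x1", "apple macbook air"): False,
}
PAIRS = list(TRUTH)
SEED = [PAIRS[0], PAIRS[5]]  # one known match and one known non-match

threshold, n_labels = active_learn(PAIRS, TRUTH.get, SEED, budget=3)
predictions = {p: jaccard(*p) >= threshold for p in PAIRS}
```

Swapping `fit_threshold` or `select_most_uncertain` for another implementation is the kind of learner and selector combination the framework is meant to compare; the number of labels consumed is returned alongside the model so it can be reported as a cost.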

Publications: SIGMOD 2020 (Research Paper)
Collaborators: Prithviraj Sen, Lucian Popa, Mohamed Sarwat

Data Integration of Electric System Schemata

We integrate real-world schemata that contain many inconsistencies, applying approximate entity matching and schema alignment techniques to reconcile electric system transmission, distribution, and location data stored in diverse formats. This project was a collaborative effort between the CASCADE team at ASU and Salt River Project (SRP), one of the primary electricity distributors in Arizona.

Collaborators: Stewart Nunn, Dragan Boscovic, Mohamed Sarwat

Rule Discovery in Knowledge Bases

We mine positive and negative rules that satisfy or negate pre-specified relationships between the subject and object entities in a Knowledge Graph. This is done by traversing the paths between several instances (RDF triples) of the relationship and generalizing them into rules.
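The path-generalization idea can be sketched on a toy graph. The example below is an illustrative simplification, not the published algorithm: it covers only the positive case, considers only two-hop paths, and scores each pattern by the fraction of target instances it connects. The triples and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge graph of (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "spouse", "bob"),
    ("carol", "spouse", "dave"),
    ("alice", "parentOf", "erin"),
    ("bob", "parentOf", "erin"),
    ("carol", "parentOf", "frank"),
    ("dave", "parentOf", "frank"),
    ("alice", "livesIn", "paris"),
    ("bob", "livesIn", "paris"),
    ("carol", "livesIn", "rome"),
    ("dave", "livesIn", "oslo"),
]


def build_index(triples):
    """Adjacency lists with inverse edges, so paths may run against edge direction."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
        index[o].append((p + "^-1", s))
    return index


def path_patterns(index, subj, obj, target):
    """Predicate sequences of two-hop paths subj -> z -> obj that avoid the target edge."""
    banned = {target, target + "^-1"}
    patterns = set()
    for p1, z in index[subj]:
        if p1 in banned or z in (subj, obj):
            continue
        for p2, end in index[z]:
            if end == obj and p2 not in banned:
                patterns.add((p1, p2))
    return patterns


def mine_rules(triples, target):
    """Generalize instance-level paths into patterns, scored by coverage of the target."""
    index = build_index(triples)
    examples = [(s, o) for s, p, o in triples if p == target]
    support = Counter()
    for s, o in examples:
        support.update(path_patterns(index, s, o, target))
    return {pattern: n / len(examples) for pattern, n in support.items()}


rules = mine_rules(TRIPLES, "spouse")
```

Here the pattern `("parentOf", "parentOf^-1")` reads as "x and y are parents of the same z", a candidate body for a rule whose head is `spouse(x, y)`.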

Publications: ICDE 2018 (Research Paper), VLDB 2018 (Demo Paper), JDIQ 2019 (Research Paper)
Collaborators: Stefano Ortona, Paolo Papotti, Naser Ahmadi, Viet-Phi Huynh

Interpretable Entity Matching

We use program synthesis, with a solver named Sketch, to generate concise and interpretable Boolean expressions (rules) that satisfy matching and non-matching assertions on the training data. These rules are then used to perform entity matching.
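To make the notion of a rule that satisfies matching and non-matching assertions concrete, the following sketch replaces the Sketch solver with plain brute-force enumeration over small conjunctions of similarity thresholds. The attribute names, similarity scores, threshold grid, and rule shape are all assumptions made for illustration.

```python
from itertools import combinations, product

# Hypothetical per-attribute similarity scores for labeled record pairs.
EXAMPLES = [
    ({"name": 0.9, "address": 0.8, "phone": 1.0}, True),
    ({"name": 0.95, "address": 0.3, "phone": 1.0}, True),
    ({"name": 0.9, "address": 0.9, "phone": 0.0}, False),
    ({"name": 0.2, "address": 0.1, "phone": 0.0}, False),
    ({"name": 0.4, "address": 0.9, "phone": 1.0}, False),
]
THRESHOLDS = [0.5, 0.8]


def evaluate(rule, sims):
    """A rule is a conjunction of (attribute, threshold) atoms."""
    return all(sims[attr] >= t for attr, t in rule)


def synthesize(examples, thresholds, max_atoms=2):
    """Return the first, and therefore smallest, conjunction consistent with every example."""
    attrs = sorted(examples[0][0])
    for size in range(1, max_atoms + 1):
        for chosen in combinations(attrs, size):
            for ts in product(thresholds, repeat=size):
                rule = tuple(zip(chosen, ts))
                if all(evaluate(rule, sims) == label for sims, label in examples):
                    return rule
    return None  # no consistent rule within the size bound


rule = synthesize(EXAMPLES, THRESHOLDS)
```

Searching by increasing size favors short rules, which is one simple way to keep the result readable; a synthesis solver explores a much richer space of expressions than this enumeration does.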

Publications: PVLDB 2017 (Research Paper) presented at VLDB 2018, SIGMOD 2017 (Demo Paper)
Collaborators: Rohit Singh, Paolo Papotti, Nan Tang, Armando Solar-Lezama, Samuel Madden, Ahmed K. Elmagarmid, Jorge-Arnulfo Quiane-Ruiz

Statistical Data Cleaning

We design and develop a statistical data cleaning framework called BayesWipe, which obviates the need for clean master data. Instead, it learns a model of the clean data from the dirty data itself in a probabilistically principled manner.
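The idea of learning a clean-data model from the dirty data itself can be illustrated with a small noisy-channel sketch. This is not the BayesWipe implementation: the value frequencies stand in for the clean-data model, and the string-similarity error model, the `error_rate` parameter, and the sample column are assumptions made for this example.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical dirty column: mostly correct values plus a few typos.
DIRTY = ["Phoenix"] * 5 + ["Tempe"] * 4 + ["Phoenx", "Tmpe", "Pheonix"]


def learn_prior(values):
    """Estimate P(clean value) from the dirty data itself via relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()}


def clean_value(observed, prior, error_rate=0.3):
    """Pick the candidate c that maximizes P(c) * P(observed | c)."""

    def likelihood(candidate):
        # Assumed error model: values are usually left intact; when corrupted,
        # the result tends to stay textually similar to the clean value.
        if candidate == observed:
            return 1.0 - error_rate
        return error_rate * SequenceMatcher(None, candidate, observed).ratio()

    return max(prior, key=lambda c: prior[c] * likelihood(c))


prior = learn_prior(DIRTY)
cleaned = [clean_value(v, prior) for v in DIRTY]
```

Frequent values keep their spelling because their prior dominates, while rare values that closely resemble a frequent one are rewritten to it. No external master table is consulted at any point.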

Publications: JDIQ 2016 (Research Paper)
Collaborators: Sushovan De, Yuheng Hu, Yi Chen, Subbarao Kambhampati

Venkata VamsiKrishna Meduri