Projects
- Delivery of Query Optimizer in Watsonx.data 2.0
- Initiation and GA of IBM Data Lakehouse (Watsonx.data)
- Guided Data Analysis for Conversational Business Intelligence
- Active Learning for Ontology Alignment
- Human Intent Prediction for Data Exploration
- Unified Active Learning for Entity Matching
- Data Integration of Electric System Schemata
- Rule Discovery in Knowledge Bases
- Interpretable Entity Matching
- Statistical Data Cleaning
- Write-efficient sort for PCM
- Sub-query Plan reuse-based Query Optimization
Delivery of Query Optimizer in Watsonx.data 2.0
IBM Research initiated and delivered enterprise-grade query optimization in Watsonx.data. We initiated the idea of using the Db2 query optimizer as a disaggregated optimizer for complex Presto SQL queries, prototyped the initial proof-of-concept, and collaborated with the Data & AI Business Unit to deliver the technology in Watsonx.data 2.0. In internal testing, we achieved better price-performance than Databricks' Photon engine on a query benchmark derived from the public 100TB TPC-DS. Using Watsonx.data 2.0 with the query optimizer and Presto C++ v0.286 on IBM Fusion HCI, we matched the query runtime at less than 60% of the cost.
Team: Berthold Reinwald, Hamid Pirahesh, Michael Kaufmann, Nasrullah Sheikh, Richard Sidle, Venkata Vamsikrishna Meduri, Zoltan Arnold Nagy, Ronald Barber, Pascal Spoerri, Gregory Kishi, Aditi Pandit, Ajay Gupta, Arin Mathew, Ashok Kumar, Austin Clifford, Calisto Zuzarte, Christian Zentgraf, Deepak Majeti, Ethan Zang, George Lapis, Jason Sizto, Sudheesh Kairali
Initiation and General Availability (GA) of IBM Data Lakehouse (Watsonx.data)
IBM Data Lakehouse became GA in July 2023. IBM Research (Almaden Research Center) initiated the effort for IBM to enter the growing data lakehouse market. Research worked closely with the Data and AI Business Unit (Silicon Valley Lab, Toronto, India) on setting the strategy and delivering the product. IBM Data Lakehouse is built on open-source PrestoDB, enriched with IBM technologies to make it enterprise-ready. IBM Data Lakehouse forms the foundation for Watsonx.data.
Team: Hamid Pirahesh, Berthold Reinwald, Larry Chiu, Ronald Barber, Richard Sidle, Scott Guthridge, Venkata Vamsikrishna Meduri, Nasrullah Sheikh, Frank Schmuck
Guided Data Analysis for Conversational Business Intelligence
We built a Business Intelligence (BI) query recommender system that guides analysts towards the interesting segments of the data during a conversational data analysis session.
Technical Report: BI-REC
Authors: Venkata Vamsikrishna Meduri, Abdul Quamar, Chuan Lei, Vasilis Efthymiou, Fatma Özcan
Active Learning for Ontology Alignment
We built an active learning framework for ontology alignment using Graph Neural Networks (GNNs).
Publications: VLDBJ 2024
Authors: Venkata Vamsikrishna Meduri, Abdul Quamar, Chuan Lei, Xiao Qin, Berthold Reinwald
Human Intent Prediction for Data Exploration
In this project, we aim to make human-database interaction seamless during a data exploration session by predicting the user's dynamically changing intent.
Publications: ICDE 2018 (Lightning Talk Abstract), EDBT 2019 (Short Paper), TODS 2021 (Research Paper)
Authors: Venkata Vamsikrishna Meduri, Kanchan Chowdhury, Mohamed Sarwat
Unified Active Learning for Entity Matching
We build a unified active learning framework for entity matching to evaluate combinations of learners and example selectors with respect to quality, latency, number of labels, and interpretability. We also compare the active learning strategies against state-of-the-art supervised learning approaches.
Publications: SIGMOD 2020 (Research Paper)
Authors: Venkata Vamsikrishna Meduri, Prithviraj Sen, Lucian Popa, Mohamed Sarwat
Data Integration of Electric System Schemata
We integrate real-world schemata that contain many inconsistencies, applying approximate entity matching and schema alignment techniques to reconcile electric system transmission, distribution, and location data in diverse formats. This project was a collaborative effort between the CASCADE team at ASU and Salt River Project (SRP), one of the primary electricity distributors in Arizona.
Collaborators: Stewart Nunn, Dragan Boscovic, Mohamed Sarwat
Rule Discovery in Knowledge Bases
We mine positive and negative rules that satisfy or negate pre-specified relationships between the subject and object entities in a Knowledge Graph. This is done by traversing the paths between several instances (RDF triples) of the relationship and generalizing them into rules.
Publications: ICDE 2018 (Research Paper), VLDB 2018 (Demo Paper), JDIQ 2019 (Research Paper)
Authors: Stefano Ortona, Venkata Vamsikrishna Meduri, Paolo Papotti, Naser Ahmadi, Viet-Phi Huynh
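The path-generalization idea described above can be illustrated with a small sketch. It is not the published system: it assumes triples are plain (subject, predicate, object) tuples, follows only forward edges, considers only two-hop paths, and mines positive rules only. The function name mine_path_rules and the sample facts are hypothetical.

```python
from collections import Counter, defaultdict


def mine_path_rules(triples, target):
    """Count two-hop rule bodies p1(x, z) AND p2(z, y) that connect the
    subject x and object y of instances of `target`.

    Simplifying assumptions (not from the original work): forward edges
    only, paths of length exactly two, positive rules only.
    """
    # Index outgoing edges, excluding the target predicate itself.
    outgoing = defaultdict(list)
    for s, p, o in triples:
        if p != target:
            outgoing[s].append((p, o))

    support = Counter()
    for s, p, o in triples:
        if p != target:
            continue
        # Collect every distinct predicate pair that links s to o via some z.
        bodies = set()
        for p1, z in outgoing.get(s, []):
            for p2, y in outgoing.get(z, []):
                if y == o:
                    bodies.add((p1, p2))
        # Each instance contributes at most once per rule body.
        support.update(bodies)
    return support


facts = [
    ("ann", "bornIn", "paris"), ("paris", "cityOf", "france"),
    ("ann", "citizenOf", "france"),
    ("bob", "bornIn", "rome"), ("rome", "cityOf", "italy"),
    ("bob", "citizenOf", "italy"),
    ("cat", "bornIn", "oslo"), ("cat", "citizenOf", "norway"),
]
rules = mine_path_rules(facts, "citizenOf")
```

Here the body bornIn(x, z) AND cityOf(z, y) is supported by two of the three citizenOf instances, so it becomes a candidate rule for citizenOf(x, y).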
Interpretable Entity Matching
We use program synthesis, with a solver named Sketch, to generate concise and interpretable Boolean expressions (rules) that satisfy matching and non-matching assertions on the training data, and we use these rules to perform entity matching.
Publications: PVLDB 2017 (Research Paper) presented at VLDB 2018, SIGMOD 2017 (Demo Paper)
Authors: Rohit Singh, Venkata Vamsikrishna Meduri, Paolo Papotti, Nan Tang, Armando Solar-Lezama, Samuel Madden, Ahmed K. Elmagarmid, Jorge-Arnulfo Quiane-Ruiz
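To give a feel for what "a concise rule consistent with the labeled pairs" means, here is a minimal sketch. It swaps in brute-force enumeration for the Sketch solver, and it assumes each record pair has already been reduced to per-attribute similarity scores. The names synthesize_rule and rule_matches, the threshold grid, and the sample data are all hypothetical.

```python
from itertools import combinations, product


def rule_matches(rule, sims):
    """A rule is a conjunction of (attribute, threshold) terms."""
    return all(sims[attr] >= threshold for attr, threshold in rule)


def synthesize_rule(examples, attributes, thresholds, max_terms=2):
    """Return the first (hence shortest) conjunction that agrees with every
    labeled example, or None if no rule within the size bound fits.

    Brute-force enumeration stands in for a real synthesis solver here.
    """
    for size in range(1, max_terms + 1):
        for attrs in combinations(attributes, size):
            for ths in product(thresholds, repeat=size):
                rule = tuple(zip(attrs, ths))
                if all(rule_matches(rule, sims) == label
                       for sims, label in examples):
                    return rule
    return None


# Each example: (similarity scores for a record pair, is it a true match?)
labeled_pairs = [
    ({"name": 0.90, "city": 0.8}, True),
    ({"name": 0.95, "city": 0.2}, False),
    ({"name": 0.30, "city": 0.9}, False),
    ({"name": 0.85, "city": 0.7}, True),
]
rule = synthesize_rule(labeled_pairs, ["name", "city"], [0.5, 0.8])
```

No single-attribute rule separates these examples, so the search returns a two-term conjunction over name and city. Trying smaller rules first is what keeps the result concise and readable.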
Statistical Data Cleaning
We design and develop a statistical data cleaning framework called BayesWipe, which obviates the need for clean master data. Instead, it learns a model of the clean data from the dirty data itself in a probabilistically principled manner.
Publications: JDIQ 2016 (Research Paper)
Authors: Sushovan De, Yuheng Hu, Venkata Vamsikrishna Meduri, Yi Chen, Subbarao Kambhampati
Write-efficient Sort for Phase Change Memory
We design a sort algorithm that minimizes writes to Phase Change Memory (PCM), without sacrificing latency, in a hybrid main-memory setting comprising a large PCM and a tiny DRAM. The goal is to accommodate the limited write endurance of PCM.
Publications: DEXA 2012 (Research Paper)
Authors: Venkata Vamsikrishna Meduri, Zhan Su, Kian-Lee Tan
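The trade-off between PCM writes and extra read passes can be illustrated with a generic multi-pass selection sort. This is a textbook-style sketch, not the algorithm from the paper: it assumes the input stays read-only in PCM, that a DRAM buffer holds dram_capacity elements, and that Python lists stand in for both memories. The name write_once_sort is hypothetical.

```python
import heapq


def write_once_sort(pcm_input, dram_capacity):
    """Sort by repeated read-only scans, writing each element to the
    output region exactly once.

    Each pass selects the `dram_capacity` smallest elements not yet
    emitted. Ties are broken by position so duplicates are handled.
    Returns (sorted_output, number_of_output_writes).
    """
    n = len(pcm_input)
    output = [None] * n      # stands in for the PCM output region
    writes = 0
    position = 0
    last_key = None          # (value, index) of the last element emitted

    while position < n:
        # Read-only scan: keep only keys strictly after the last one emitted.
        candidates = (
            (value, index)
            for index, value in enumerate(pcm_input)
            if last_key is None or (value, index) > last_key
        )
        # nsmallest keeps at most `dram_capacity` items in memory and
        # returns them in ascending order.
        batch = heapq.nsmallest(dram_capacity, candidates)
        for value, _ in batch:
            output[position] = value
            position += 1
            writes += 1
        last_key = batch[-1]
    return output, writes


sorted_values, write_count = write_once_sort([5, 3, 8, 3, 1], dram_capacity=2)
```

With n elements and a buffer of k, this sketch performs n output writes and roughly n/k read passes, so a smaller DRAM buys write savings at the price of more reads. It assumes dram_capacity is at least 1.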
Sub-query Plan reuse-based Query Optimization
We detect near-isomorphic subquery graphs with similar selectivities and reuse the optimal plan generated for one candidate subquery for another near-isomorphic subquery enumerated during the search for the optimal plan of a complex query. This was implemented in the PostgreSQL engine to reduce the latency of the (Iterative) Dynamic Programming query optimizer.
Publications: COMAD 2011 (Research Paper)
Authors: Venkata Vamsikrishna Meduri, Kian-Lee Tan
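The plan-reuse idea can be sketched as a cache keyed by a coarse signature of the subquery's join graph. This is only an illustration under loud assumptions: the signature (sorted vertex degrees plus selectivities bucketed by order of magnitude) is a cheap heuristic rather than a true isomorphism test, the cached plan is an opaque value, and none of this reflects the actual PostgreSQL implementation. The names subquery_signature and PlanCache are hypothetical.

```python
import math


def subquery_signature(join_edges):
    """Coarse signature of a join graph.

    join_edges: list of (table_a, table_b, selectivity), 0 < selectivity <= 1.
    Graphs with equal signatures are treated as near-isomorphic. This is a
    heuristic assumption: equal degree sequences do not guarantee isomorphism.
    """
    degree = {}
    for a, b, _ in join_edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    shape = tuple(sorted(degree.values()))
    # Bucket selectivities by order of magnitude so "similar" values collide.
    buckets = tuple(sorted(round(math.log10(sel)) for _, _, sel in join_edges))
    return shape, buckets


class PlanCache:
    """Reuse a previously optimized plan for a look-alike subquery."""

    def __init__(self):
        self._plans = {}
        self.hits = 0

    def get_or_optimize(self, join_edges, optimize):
        key = subquery_signature(join_edges)
        if key in self._plans:
            self.hits += 1
            return self._plans[key]
        plan = optimize(join_edges)   # the expensive step we want to skip
        self._plans[key] = plan
        return plan


optimizer_calls = []

def toy_optimizer(join_edges):
    optimizer_calls.append(join_edges)
    return "chain-plan"               # placeholder for a real plan

cache = PlanCache()
chain_1 = [("a", "b", 0.01), ("b", "c", 0.1)]
chain_2 = [("x", "y", 0.012), ("y", "z", 0.09)]
plan_1 = cache.get_or_optimize(chain_1, toy_optimizer)
plan_2 = cache.get_or_optimize(chain_2, toy_optimizer)
```

The second chain has the same shape and selectivities of the same magnitude, so it is served from the cache without invoking the optimizer again. A real system would also need to map the tables of the new subquery onto the cached plan, which this sketch omits.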