Schema-GraphRAG: Bridging Hybrid Search and Graph Traversal for Complex Retrieval Tasks (accepted in demo track)

Published in IEEE ICDE, 2026

Abstract

Large-language-model–powered systems are increasingly used over heterogeneous enterprise data, but hybrid search (lexical + dense) alone cannot enforce structured filters and multi-hop predicates when schemas are complex. In practice, many user predicates refer to relationships (keys, joins, and paths) that must be followed to constrain results. We address this by executing graph traversal on top of hybrid search. We develop Schema-GraphRAG, a retrieval-only backend that couples hybrid text search with schema-aware traversal. Each row is serialized to JSON for BM25 and vector indexing. Primary–foreign-key links form an instance graph with a schema overlay. Queries accept lexical terms, a dense text query, and optional Gremlin traversals that express filter predicates and path constraints. The engine first runs hybrid search to produce seed entities, then applies the traversal to enforce filters, expand along permissible paths, and assemble evidence; results are returned as {id, json} with provenance. We demonstrate the system on a retail slice through four scenarios: hybrid search only, deterministic path expansion, flexible end-type traversal, and a live insert that is reflected in results immediately.
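The seed-then-traverse flow described in the abstract can be illustrated with a small in-memory sketch. It is not the Schema-GraphRAG implementation, and everything in it is an assumption made for illustration: the retail rows, the edge labels, and the function names are hypothetical. Token overlap stands in for BM25, a bag-of-words cosine stands in for dense embeddings, and a plain list of edge labels stands in for a Gremlin traversal.

```python
import json
import math
from collections import Counter, defaultdict

# Hypothetical retail rows keyed by "<table>:<primary key>" (illustrative only).
ROWS = {
    "customer:1": {"table": "customer", "name": "Ada", "city": "Berlin"},
    "order:10": {"table": "order", "customer_id": 1, "status": "shipped"},
    "order:11": {"table": "order", "customer_id": 1, "status": "cancelled"},
    "product:7": {"table": "product", "title": "trail running shoes"},
    "item:100": {"table": "item", "order_id": 10, "product_id": 7},
}

# Assumed schema overlay: (child table, FK column, parent table, edge label).
FKS = [
    ("order", "customer_id", "customer", "placed_by"),
    ("item", "order_id", "order", "in_order"),
    ("item", "product_id", "product", "of_product"),
]


def tokens(text):
    """Lowercase and split on every non-alphanumeric character."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


# Each row is serialized to JSON; the same string feeds both toy scorers.
DOCS = {rid: json.dumps(row, sort_keys=True) for rid, row in ROWS.items()}
BAGS = {rid: Counter(tokens(doc)) for rid, doc in DOCS.items()}


def build_graph():
    """Turn primary-foreign-key links into labeled edges, in both directions."""
    graph = defaultdict(list)
    for rid, row in ROWS.items():
        for child, col, parent, label in FKS:
            if row["table"] == child and col in row:
                target = f"{parent}:{row[col]}"
                if target in ROWS:
                    graph[rid].append((label, target))
                    graph[target].append(("~" + label, rid))  # reverse edge
    return graph


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hybrid_search(terms, dense_query, k=2, alpha=0.5):
    """Blend a lexical score and a 'dense' score; return up to k seed ids."""
    q_lex = set(tokens(" ".join(terms)))
    q_dense = Counter(tokens(dense_query))
    scored = []
    for rid, bag in BAGS.items():
        lexical = len(q_lex & set(bag)) / len(q_lex) if q_lex else 0.0
        dense = cosine(q_dense, bag)
        scored.append((alpha * lexical + (1 - alpha) * dense, rid))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rid for score, rid in scored[:k] if score > 0]


def traverse(graph, seeds, path, keep=lambda row: True):
    """Follow the edge labels in `path` from each seed, then filter end rows."""
    frontier = [(seed, [seed]) for seed in seeds]
    for label in path:
        frontier = [
            (nxt, trail + [label, nxt])
            for node, trail in frontier
            for edge_label, nxt in graph[node]
            if edge_label == label
        ]
    return [
        {"id": node, "json": DOCS[node], "provenance": trail}
        for node, trail in frontier
        if keep(ROWS[node])
    ]


if __name__ == "__main__":
    graph = build_graph()
    seeds = hybrid_search(["shoes"], "running shoes", k=1)
    # Which customer bought the matching product? product -> item -> order -> customer
    for hit in traverse(graph, seeds, ["~of_product", "in_order", "placed_by"]):
        print(hit["id"], hit["provenance"])
```

In this sketch the provenance is simply the alternating list of node ids and edge labels walked from the seed, and the filter is applied only to the end rows. A real traversal language such as Gremlin can apply predicates at any step and match paths of variable length, which this toy version does not attempt.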