
Chapter 13: Composing Graphs

Start a Claude Code session. Add two MCP servers: one for a kgraph-derived Postgres graph of recent endocrinology literature, one for DBpedia. The model sees twelve tools: the six BFS-QL tools under the server prefix bfs-ql, and the same six under the server prefix dbpedia. The servers are identically structured. The model does not know or care that one is backed by Postgres and the other by Virtuoso. It knows only that each gives it six tools for navigating a graph.

This is not a specially engineered federation capability. No protocol extension is required. No shared schema is negotiated in advance. The two servers are independent; they know nothing about each other. What makes them composable is not the protocol -- it is the identifiers.

Identity Bridging

Desmopressin in the kgraph Postgres graph has the canonical ID RxNorm:3251. Desmopressin in DBpedia has the URI <http://dbpedia.org/resource/Desmopressin>, which the SPARQL backend normalizes to DBpedia:Desmopressin. These are different identifiers -- the graphs use different ID schemes. But both entities carry an RxNorm property. An LLM that knows to look for it can recognize that RxNorm:3251 in one graph and RxNorm:3251 in the DBpedia record refer to the same compound.
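The comparison described above can be sketched in a few lines. This is a minimal illustration, not the backend's actual normalization code: the helper names (normalize_uri, normalize_curie, same_compound) and the "rxnorm" property key are assumptions made for the example, and the record is toy data.

```python
# Prefix of DBpedia resource URIs, as seen in the URI quoted in the text.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def normalize_uri(uri: str) -> str:
    """Turn <http://dbpedia.org/resource/X> into DBpedia:X (hypothetical helper)."""
    bare = uri.strip().strip("<>")
    if bare.startswith(DBPEDIA_RESOURCE):
        return "DBpedia:" + bare[len(DBPEDIA_RESOURCE):]
    return bare


def normalize_curie(raw: str) -> str:
    """Drop stray whitespace around the colon of a prefix:local identifier."""
    prefix, _, local = raw.partition(":")
    return f"{prefix.strip()}:{local.strip()}"


def same_compound(canonical_id: str, other_props: dict) -> bool:
    """Compare one graph's canonical ID with the RxNorm property of a record
    from another graph. The 'rxnorm' key is an assumed property name."""
    rxnorm = other_props.get("rxnorm")
    if rxnorm is None:
        return False
    return normalize_curie(canonical_id) == normalize_curie(rxnorm)


# Toy record standing in for what a describe_entity call might return.
dbpedia_record = {
    "id": normalize_uri("<http://dbpedia.org/resource/Desmopressin>"),
    "rxnorm": "RxNorm: 3251",
}
```

With this toy record, same_compound("RxNorm:3251", dbpedia_record) holds even though the two graphs assign the entity different primary identifiers.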

When graphs share a canonical ID scheme -- both use RxNorm for drugs, both use MeSH for diseases -- bridging is automatic. The LLM queries the first graph, finds RxNorm:3251, uses that ID as a seed in the second graph's bfs_query, and traverses the boundary. No mapping table. No federation protocol. The shared ID is the bridge.
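A small sketch of the shared-ID case follows. It uses two in-memory adjacency lists in place of real servers, and the local bfs_query function is only a stand-in for the tool of the same name; its signature, the node names other than RxNorm:3251, and the bridge helper are assumptions for illustration.

```python
from collections import deque

# Toy graphs: both use the same canonical ID scheme for the drug.
LITERATURE = {
    "Paper:example-1": ["RxNorm:3251"],
    "RxNorm:3251": ["Paper:example-1"],
}
BACKBONE = {
    "RxNorm:3251": ["Class:example-drug-class"],
    "Class:example-drug-class": ["RxNorm:3251"],
}


def bfs_query(graph: dict, seeds: list, max_hops: int) -> set:
    """Return all nodes within max_hops of any seed present in the graph."""
    seen = {s for s in seeds if s in graph}
    frontier = deque((s, 0) for s in seen)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def bridge(first: dict, second: dict, start: str, max_hops: int = 1):
    """Traverse the first graph, then reuse any IDs the second graph also
    knows as seeds there. No mapping table is consulted."""
    found = bfs_query(first, [start], max_hops)
    shared = sorted(node for node in found if node in second)
    return shared, bfs_query(second, shared, max_hops)
```

Calling bridge(LITERATURE, BACKBONE, "Paper:example-1") crosses the boundary through RxNorm:3251 alone; the only thing the two graphs have in common is that identifier.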

When graphs use different ID schemes, bridging requires a step: take the entity's label from the first graph ("desmopressin"), call search_entities in the second graph with that label, inspect the results, and pick the right match. This is the same disambiguation step the LLM performs at the start of any session. The difference is that it is now cross-graph.
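The label-matching step can be sketched as follows. The search_entities function here is a local stand-in with an assumed signature and assumed result fields (id, label, type); the second index entry is invented to show ambiguity. Where the model would normally inspect candidates and use judgment, this sketch substitutes a simple rule: accept only a single exact label match of the expected type.

```python
from typing import Optional

# Toy search index for the second graph. The second entry is made up.
SECOND_GRAPH_INDEX = [
    {"id": "DBpedia:Desmopressin", "label": "Desmopressin", "type": "Drug"},
    {"id": "Example:DesmopressinAcetate", "label": "Desmopressin acetate", "type": "Drug"},
]


def search_entities(index: list, query: str) -> list:
    """Stand-in search: case-insensitive substring match on labels."""
    needle = query.casefold()
    return [entry for entry in index if needle in entry["label"].casefold()]


def bridge_by_label(index: list, label: str, expected_type: str) -> Optional[str]:
    """Return the ID of the unique exact-label match of the expected type,
    or None when the result is missing or ambiguous."""
    candidates = search_entities(index, label)
    exact = [
        c for c in candidates
        if c["label"].casefold() == label.casefold() and c["type"] == expected_type
    ]
    if len(exact) == 1:
        return exact[0]["id"]
    return None  # leave ambiguous cases for further inspection
```

Returning None rather than guessing reflects why this path is slower than a shared ID: each ambiguous result needs another look before the traversal can continue.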

Composability is proportional to shared canonical identity. Two graphs that both use RxNorm for drugs and MeSH for diseases can be traversed as a single logical graph for any query that stays within those domains. Two graphs with entirely bespoke ID schemes can be bridged only by label matching, which is slower and more ambiguous. The degree of composability is not a property of the BFS-QL protocol. It is a property of the graphs.

What the LLM Actually Sees

In a session with two BFS-QL servers connected, the model sees something like this in its tool list:

bfs-ql.describe_schema()
  -- medlit Postgres graph
bfs-ql.search_entities(query, ...)
bfs-ql.bfs_query(seeds, ...)
bfs-ql.describe_entity(id)
bfs-ql.describe_entities([id, ...])
bfs-ql.intersect_subgraphs(seeds, k, ...)

dbpedia.describe_schema()
  -- DBpedia SPARQL endpoint
dbpedia.search_entities(query, ...)
dbpedia.bfs_query(seeds, ...)
dbpedia.describe_entity(id)
dbpedia.describe_entities([id, ...])
dbpedia.intersect_subgraphs(seeds, k, ...)

The server name prefix is the only differentiator. The tool signatures are identical. The session workflow is identical. A query that begins in the kgraph graph -- orient, resolve desmopressin to RxNorm:3251, traverse two hops -- can continue in DBpedia by using RxNorm:3251 (or the label "desmopressin") as the query for dbpedia.search_entities. The model bridges graphs the same way a human researcher bridges databases: by carrying a known identifier across sources.
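To make the namespacing concrete, here is a minimal sketch of how a client could assemble the combined tool list from two identically structured servers. The registry layout and the helper names are assumptions for illustration, and the exact prefix format a given MCP client uses may differ from the dotted form shown in this chapter.

```python
# The six tool names listed above, shared by every BFS-QL server.
TOOL_NAMES = (
    "describe_schema",
    "search_entities",
    "bfs_query",
    "describe_entity",
    "describe_entities",
    "intersect_subgraphs",
)


def namespaced_tools(servers: list) -> dict:
    """Map each 'server.tool' name to its (server, tool) pair."""
    registry = {}
    for server in servers:
        for tool in TOOL_NAMES:
            registry[f"{server}.{tool}"] = (server, tool)
    return registry


def route(registry: dict, qualified_name: str) -> tuple:
    """Resolve a qualified tool name to the server that should handle it."""
    return registry[qualified_name]


registry = namespaced_tools(["bfs-ql", "dbpedia"])
```

The registry holds twelve entries, and route(registry, "dbpedia.bfs_query") resolves to the dbpedia server. Nothing in the registry records which backend sits behind either name.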

The research literature graph knows what papers say about desmopressin -- which studies, which findings, which patient populations, which confidence scores. The encyclopedic backbone knows what desmopressin is -- its pharmacological class, its mechanism of action, its related compounds, its place in the drug taxonomy. Together they give the LLM both the frontier and the foundation. Neither graph has both. The composition does.

The Canonical ID Argument, Revisited

The companion volume argues for canonical IDs as a quality concern: an entity that is anchored to a MeSH term or RxNorm code is unambiguous, verifiable, and connected to a community of expert judgment. Here, the same argument appears as a composition argument: that anchoring is what makes the entity bridgeable across graphs.

The two arguments are not separate. They are the same observation from different vantage points. A canonical ID is not just a unique key for deduplication. It is a pointer into a shared epistemic commons -- the accumulated judgment of a community about how to name and classify things in a domain. When two graphs both point to that commons, they become connected through it, without any bilateral coordination.

The biomedical, legal, chemistry, and geography communities built their identifier infrastructures -- MeSH, RxNorm, HGNC, ChEBI, PubChem, Wikidata, GeoNames -- over decades for their own internal purposes: literature indexing, regulatory compliance, compound tracking, geographic reference. They were not building an interoperability layer for LLM reasoning. But that is what they built, as a side effect of building a shared commons. The emergent property was always latent in the infrastructure. BFS-QL makes it accessible.