Appendix A: BFS-QL Protocol Reference
BFS-QL is a graph query protocol designed for consumption by large language models (LLMs). It exposes a knowledge graph through six MCP tools with a flat query format. The LLM resolves natural-language names to canonical entity IDs. It then traverses the graph by calling tools with those IDs as seeds, plus structured filters, and receives subgraphs shaped for context-window efficiency.
The six tools
| Tool | Purpose |
|---|---|
| describe_schema() | Orient: learn what the graph contains |
| search_entities(query, ...) | Resolve: map a name to canonical IDs |
| bfs_query(seeds, max_hops, ...) | Traverse: expand a neighborhood |
| describe_entity(id) | Expand: full metadata for one stub |
| describe_entities(ids) | Expand (batch): full metadata for multiple stubs |
| intersect_subgraphs(seeds, k, ...) | Intersect: nodes reachable from every seed |
describe_schema()
Arguments: none
Returns:
{
"graph_description": "...",
"comprehensive": true,
"entity_types": ["Disease", "Drug", "Gene"],
"predicates": ["TREATS", "INHIBITS", "ENCODES"],
"next_steps": "...",
"tool_usage_notes": "..."
}
Fields:
| Field | Description |
|---|---|
| graph_description | Human-readable summary of the graph domain and data source. |
| comprehensive | true if the entity_types and predicates lists are exhaustive; false for large or open-world graphs, where they are samples only. |
| entity_types | Valid type names for the bfs_query node_types filter. May be empty. |
| predicates | Valid predicate names for the bfs_query predicates filter. May be empty. |
| next_steps | Backend-authored workflow instructions. Follow these in preference to any generic default. |
| tool_usage_notes | Reference guide for all BFS-QL tools: parameter meanings, filtering rules, and critical usage patterns. |
When comprehensive is false, use schema_summary from an initial
bfs_query result to discover valid type and predicate values for the
local neighborhood.
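The choice between the two vocabulary sources can be sketched as follows. This is a minimal sketch. It assumes the describe_schema() and bfs_query responses have already been parsed into Python dicts with the field names shown in this appendix. The helper name filter_vocabulary is hypothetical.

```python
from typing import Optional


def filter_vocabulary(schema: dict, bfs_result: Optional[dict] = None) -> tuple[list[str], list[str]]:
    """Return (entity_types, predicates) usable as bfs_query filter values.

    Hypothetical helper: `schema` is a parsed describe_schema() response and
    `bfs_result` a parsed BfsResult from an initial bfs_query.
    """
    if schema.get("comprehensive"):
        # Exhaustive lists: the schema alone is authoritative.
        return list(schema["entity_types"]), list(schema["predicates"])
    if bfs_result is None:
        raise ValueError("schema is not comprehensive; run a topology survey first")
    # Open-world graph: use what the local neighborhood actually contains.
    summary = bfs_result["schema_summary"]
    return list(summary["entity_types_found"]), list(summary["predicates_found"])
```

Raising instead of silently falling back to the sampled lists keeps a caller from filtering on values that may not occur in the neighborhood.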
search_entities(query, node_types=None)
Resolves a natural-language name or alias to canonical entity IDs.
Always call before bfs_query when you do not already have a
canonical ID.
Arguments:
| Field | Required | Description |
|---|---|---|
| query | Yes | Name, alias, or partial name to look up. |
| node_types | No | Restrict results to these entity types. Use to exclude high-volume types (e.g., papers) when resolving concept names. |
Returns: array of EntityStub
[
{
"id": "MeSH:D003480",
"entity_type": "Disease",
"name": "Cushing Syndrome",
"score": 0.94
},
{
"id": "MeSH:D047748",
"entity_type": "Disease",
"name": "Cushing Disease",
"score": 0.91
}
]
name is the entity's display name. score is the vector
similarity score (0–1, higher is better) when the backend uses
embedding-based search, and null when it uses full-text search instead.
Inspect results before choosing a seed, because common names are often
ambiguous. Use entity_type to distinguish concept entities from
papers or authors that share the same name.
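Inspecting candidates before seeding can be done with a small filter-and-rank step. This sketch assumes the response is a parsed list of EntityStub dicts. rank_candidates is a hypothetical helper. Placing null-scored (full-text) stubs after scored ones is an assumption about sensible ordering, not protocol behavior.

```python
def rank_candidates(stubs: list[dict], entity_types: set[str]) -> list[dict]:
    """Keep stubs of the wanted types, best vector score first.

    Stubs with a null score (full-text backends) rank after scored ones;
    sorted() is stable even with reverse=True, so their backend order is kept.
    """
    wanted = [s for s in stubs if s.get("entity_type") in entity_types]
    return sorted(
        wanted,
        key=lambda s: s["score"] if s.get("score") is not None else float("-inf"),
        reverse=True,
    )
```

The helper returns the ranked list rather than a single ID. Close scores, such as 0.94 against 0.91 in the example above, rarely settle on their own which entity the user meant.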
bfs_query(seeds, max_hops, ...)
Performs a breadth-first search from one or more seed entities and
returns the union of their neighborhoods as a BfsResult. The
node_types and predicates filters control the detail level of items
in the result, not which items are included: non-matching nodes and
edges still appear as lightweight stubs. Only exclude_node_types and
min_mentions remove items; limit and offset select a page.
Arguments:
| Field | Required | Default | Description |
|---|---|---|---|
| seeds | Yes | -- | One or more canonical entity IDs. All seeds expand simultaneously; the result is the union of their neighborhoods. |
| max_hops | Yes | -- | Maximum graph distance from any seed. Values 1–3 are typical. |
| node_types | No | all | Matching nodes receive full metadata; others become stubs. |
| predicates | No | all | Matching edges receive full metadata; others become bare triples. |
| topology_only | No | false | When true, every node is a bare {id, entity_type} and every edge a bare triple. Overrides node_types and predicates. |
| exclude_node_types | No | none | Remove these types and all edges touching them entirely. Topology is no longer guaranteed complete. Use for high-volume types that add no conceptual value. |
| min_mentions | No | 1 | Remove nodes with total_mentions below this threshold, along with the edges touching them. Nodes without a total_mentions field are always included. Filters the result, not the traversal. |
| limit | No | none | Cap the number of nodes returned. Counts always reflect the full traversal. |
| offset | No | 0 | Skip the first N nodes. Use with limit to page through large results. |
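The union and stubbing semantics can be seen in a small in-memory model. This is an illustrative sketch, not the server's implementation, and toy_bfs_query is a hypothetical name. Several things are assumed:

- Nodes are shaped like the full-node example under Response format.
- Edges are followed in both directions. The implementation notes mention both edges_from and edges_to, but the reference does not state traversal direction.
- Stubs are bare {id, entity_type} dicts.
- Only seeds, max_hops, node_types, predicates and topology_only are modeled.

```python
from collections import deque
from typing import Optional


def toy_bfs_query(nodes: dict, edges: list, seeds: list, max_hops: int,
                  node_types: Optional[set] = None,
                  predicates: Optional[set] = None,
                  topology_only: bool = False) -> dict:
    """Illustrative bfs_query over in-memory data (hypothetical, not server code)."""
    # Multi-source BFS: every seed starts at distance 0, so the visited set
    # is the union of the seeds' max_hops neighborhoods.
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if dist[current] == max_hops:
            continue
        for e in edges:
            # Assumption: edges are followed in both directions.
            if e["subject"] == current:
                nxt = e["object"]
            elif e["object"] == current:
                nxt = e["subject"]
            else:
                continue
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)

    sub_edges = [e for e in edges if e["subject"] in dist and e["object"] in dist]

    def render_node(nid: str) -> dict:
        node = nodes[nid]
        full = not topology_only and (node_types is None or node["entity_type"] in node_types)
        return node if full else {"id": nid, "entity_type": node["entity_type"]}

    def render_edge(e: dict) -> dict:
        full = not topology_only and (predicates is None or e["predicate"] in predicates)
        return e if full else {k: e[k] for k in ("subject", "predicate", "object")}

    return {
        "seeds": list(seeds),
        "max_hops": max_hops,
        "node_count": len(dist),
        "edge_count": len(sub_edges),
        "nodes": [render_node(n) for n in dist],
        "edges": [render_edge(e) for e in sub_edges],
        # Built from the unfiltered traversal, so it lists everything present.
        "schema_summary": {
            "entity_types_found": sorted({nodes[n]["entity_type"] for n in dist}),
            "predicates_found": sorted({e["predicate"] for e in sub_edges}),
        },
    }
```

Filters only decide how each visited item is rendered, never whether it is visited. That is why node_count and schema_summary are unaffected by node_types and predicates.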
describe_entity(id)
Retrieves full metadata for a single entity by canonical ID. Use
when bfs_query returns a stub and you want the full record for
that node.
Arguments:
| Field | Required | Description |
|---|---|---|
| id | Yes | Canonical entity ID. |
Returns: full node metadata as a flat dict, with id,
entity_type, and all metadata fields merged at the top level.
It has the same keys as the metadata dict inside a full node, plus
id and entity_type, without nesting.
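The relationship between the nested and flat shapes can be written as a small conversion. This sketch assumes the full-node shape shown under Response format, and flatten_node is a hypothetical name. The reference does not say what happens if a metadata key collides with id or entity_type. Letting the top-level fields win is an assumption.

```python
def flatten_node(node: dict) -> dict:
    """Convert a nested full node into the flat describe_entity shape."""
    flat = dict(node.get("metadata", {}))
    # Assumption: top-level identity fields override same-named metadata keys.
    flat["id"] = node["id"]
    flat["entity_type"] = node["entity_type"]
    return flat
```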
describe_entities(ids)
Retrieves full metadata for multiple entities in a single call. Use
instead of sequential describe_entity calls when expanding several
stubs at once.
Arguments:
| Field | Required | Description |
|---|---|---|
| ids | Yes | List of canonical entity IDs. |
Returns: list of full node metadata dicts, same shape as full
nodes in bfs_query results. IDs that do not exist in the graph
are silently omitted from the output.
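Because unknown IDs are dropped rather than reported, the output list can be shorter than the request, and positions do not line up. A caller-side reconciliation might look like this sketch. It assumes each returned record carries an id key, and index_by_id is a hypothetical helper.

```python
def index_by_id(requested: list[str], returned: list[dict]) -> tuple[dict, list[str]]:
    """Key describe_entities output by ID and list requested IDs that were omitted."""
    by_id = {record["id"]: record for record in returned}
    missing = [i for i in requested if i not in by_id]
    return by_id, missing
```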
intersect_subgraphs(seeds, k, ...)
Returns only nodes within k undirected hops of every seed
simultaneously: the intersection of their neighborhoods rather
than the union. Use when a multi-seed bfs_query returns too many
nodes for the LLM to intersect manually.
Arguments:
| Field | Required | Default | Description |
|---|---|---|---|
| seeds | Yes | -- | Two or more canonical entity IDs. |
| k | Yes | -- | Hop radius (1–5). Every result node must be reachable from all seeds within this distance, treating edges as undirected. |
| node_types | No | all | Matching nodes receive full metadata; others become stubs. |
| exclude_node_types | No | none | Remove these types and all edges touching them. |
| predicates | No | all | Matching edges receive full metadata; others become bare triples. |
| min_mentions | No | 1 | Remove nodes with total_mentions below this threshold. |
| topology_only | No | false | When true, return IDs and types only. |
Returns an IntersectionResult:
{
"seeds": ["MeSH:D003480", "MeSH:D049970"],
"k": 2,
"node_count": 12,
"edge_count": 15,
"nodes": [...],
"edges": [...],
"schema_summary": {
"entity_types_found": ["Drug", "Gene"],
"predicates_found": ["TREATS", "INHIBITS"]
}
}
Note: intersect_subgraphs does not support limit/offset
pagination.
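The intersection rule can be computed with one breadth-first search per seed, keeping only nodes reachable from every seed within k undirected hops. This sketch works on (subject, predicate, object) triples and returns node IDs only. It is illustrative rather than the server's algorithm, and the helper names are hypothetical.

```python
from collections import deque


def build_adjacency(triples: list) -> dict:
    """Undirected adjacency from (subject, predicate, object) triples."""
    adj: dict = {}
    for subj, _pred, obj in triples:
        adj.setdefault(subj, set()).add(obj)
        adj.setdefault(obj, set()).add(subj)
    return adj


def within_k(adj: dict, seed: str, k: int) -> set:
    """All nodes reachable from seed in at most k undirected hops."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        cur = queue.popleft()
        if dist[cur] == k:
            continue
        for nxt in adj.get(cur, ()):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return set(dist)


def intersect_node_ids(triples: list, seeds: list, k: int) -> set:
    """Node IDs within k undirected hops of every seed."""
    if len(seeds) < 2:
        raise ValueError("intersect_subgraphs takes two or more seeds")
    if not 1 <= k <= 5:
        raise ValueError("k must be between 1 and 5")
    adj = build_adjacency(triples)
    return set.intersection(*(within_k(adj, s, k) for s in seeds))
```

A seed is included whenever it lies within k hops of every other seed. At k=1, two diseases treated by the same drug share only that drug. At k=2, each disease also reaches the other.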
Response format
bfs_query returns a BfsResult:
{
"seeds": ["MeSH:D003480"],
"max_hops": 2,
"node_count": 84,
"edge_count": 99,
"nodes": [...],
"edges": [...],
"schema_summary": {
"entity_types_found": ["Disease", "Drug", "Gene"],
"predicates_found": ["TREATS", "INHIBITS"]
}
}
node_count and edge_count reflect the full traversal regardless
of limit/offset. schema_summary also reflects the full traversal
regardless of filters; it always contains the actual types and
predicates present in the subgraph.
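node_count covers the full traversal while nodes holds only the current page. A client can therefore page until it has collected node_count nodes. This sketch assumes a hypothetical callable query(limit, offset) that wraps bfs_query with fixed seeds and filters. How that callable reaches the MCP tool depends on the client and is not shown.

```python
from typing import Callable


def collect_all_nodes(query: Callable[[int, int], dict], page_size: int = 50) -> list[dict]:
    """Page through a bfs_query result using limit/offset."""
    collected: list[dict] = []
    offset = 0
    while True:
        page = query(page_size, offset)
        collected.extend(page["nodes"])
        offset += len(page["nodes"])
        # node_count reflects the full traversal, so it marks the end;
        # an empty page guards against a count that is never reached.
        if not page["nodes"] or offset >= page["node_count"]:
            return collected
```

This loop applies to bfs_query only, since intersect_subgraphs does not support limit/offset.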
Full node (entity type matches node_types, or no filter):
{
"id": "PUB:PMC2386281",
"entity_type": "Publication",
"metadata": {
"name": "The Diagnosis of Cushing's Syndrome",
"source": "pubmed",
"canonical_url": "https://pubmed.ncbi.nlm.nih.gov/18493314/",
"confidence": 0.99,
"total_mentions": 12
}
}
Metadata keys vary by entity type and backend. Common keys across
backends: name, source, canonical_url, confidence,
usage_count, total_mentions, synonyms.
Stub node (entity type does not match node_types):
{
"id": "DRUG:rxnorm:41493",
"entity_type": "Drug"
}
Full edge (predicate matches predicates, or no filter):
{
"subject": "DRUG:rxnorm:41493",
"predicate": "TREATS",
"object": "MeSH:D003480",
"metadata": {
"confidence": 0.91,
"source_documents": ["PMC2386281", "PMC3367558"]
}
}
Edge metadata includes confidence and source_documents
(a list of document IDs supporting the relationship) when the
backend provides them. Full provenance text is stored in the backend
but stripped from MCP responses to manage context size; call
describe_entity(id) on the source document node to retrieve it.
Stub edge (predicate does not match predicates):
{
"subject": "DRUG:rxnorm:41493",
"predicate": "TREATS",
"object": "MeSH:D003480"
}
Session workflow
The recommended sequence for any BFS-QL session:
1. describe_schema()
→ learn entity types, predicates, graph description
→ follow next_steps instructions
2. search_entities(name, node_types=[...])
→ resolve name to one or more canonical IDs
→ use node_types to suppress high-volume types
3. bfs_query(seeds, max_hops=1, topology_only=True)
→ survey structure cheaply
→ read schema_summary for valid filter values
4. describe_entities([id, id, ...])
→ batch-expand stubs identified as significant
→ one call regardless of how many IDs
5. bfs_query(seeds, max_hops=1,
node_types=[...], predicates=[...])
→ targeted re-query using filters from schema_summary
Step 1 may be skipped when the server injects the schema into tool descriptions at startup, and step 2 when canonical IDs are already known. Steps 3–5 are iterative: each traversal may reveal stubs that motivate further expansion or re-query.
For large literature-derived graphs where a topology survey exceeds the context budget, replace step 3 with a direct concept query:
Use intersect_subgraphs in place of bfs_query when the question
is "what do all of these entities share?" and the result would be
too large for the LLM to intersect manually.
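One pass through the five-step sequence can be strung together as a driver. This is a sketch under several assumptions, none of which is part of BFS-QL:

- A hypothetical call(tool_name, **arguments) function invokes an MCP tool and returns parsed JSON.
- The seed is simply the top candidate.
- Stubs count as "significant" when their type is one of the caller's concept types.

```python
from typing import Any, Callable

# Hypothetical: call(tool_name, **arguments) invokes an MCP tool, returns parsed JSON.
ToolCall = Callable[..., Any]


def run_session(call: ToolCall, name: str, concept_types: list, wanted_predicates: list) -> dict:
    """One pass through the recommended BFS-QL sequence (illustrative)."""
    schema = call("describe_schema")  # 1. orient (a real agent would also follow next_steps)
    candidates = call("search_entities", query=name, node_types=concept_types)  # 2. resolve
    if not candidates:
        raise LookupError(f"no entity found for {name!r}")
    seeds = [candidates[0]["id"]]  # naive pick; inspect candidates in practice

    survey = call("bfs_query", seeds=seeds, max_hops=1, topology_only=True)  # 3. survey
    summary = survey["schema_summary"]
    types = [t for t in summary["entity_types_found"] if t in concept_types]
    preds = [p for p in summary["predicates_found"] if p in wanted_predicates]

    # 4. Batch-expand the stubs judged significant (here: concept-typed ones).
    ids = [n["id"] for n in survey["nodes"] if n["entity_type"] in types]
    details = call("describe_entities", ids=ids) if ids else []

    # 5. Targeted re-query with filter values taken from schema_summary.
    focused = call("bfs_query", seeds=seeds, max_hops=1, node_types=types, predicates=preds)
    return {"schema": schema, "details": details, "focused": focused}
```

Filter values come from schema_summary rather than describe_schema, so the driver works even when the schema is not comprehensive.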
Design properties
Topology is complete by default. The node_types
and predicates filters control detail level, not presence, so
a stub node is not a missing node. exclude_node_types and min_mentions
are the only filters that remove items; use them deliberately.
Stubs are navigational handles. A stub in a
bfs_query result carries a canonical ID. Call describe_entity(id)
or describe_entities(ids) for full metadata. Seed a new bfs_query
at a stub to expand its neighborhood.
schema_summary closes the vocabulary loop.
For open-world backends where describe_schema returns
comprehensive: false, schema_summary provides the valid
node_types and predicates values for the actual neighborhood.
Read it after a topology survey before issuing filtered follow-up
queries.
Multi-seed queries express relational questions.
bfs_query with multiple seeds returns the union of neighborhoods.
intersect_subgraphs with multiple seeds returns the intersection.
Use bfs_query for "what connects to any of these?" and
intersect_subgraphs for "what do all of these share?"
Implementation notes
Caching. Cache at the GraphDbInterface primitive level --
edges_from, edges_to, get_node, metadata_for_node -- keyed
on (backend_id, method, args). All traversal intelligence in the
server layer benefits automatically. Cache entity_types() and
predicates() results for the lifetime of a session.
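Primitive-level caching might be sketched as a decorator over async methods. The sketch assumes that primitives take hashable positional arguments and that each backend object exposes a backend_id attribute. The decorator, the cache dict and ToyBackend are hypothetical. Two concurrent misses on the same key both reach the backend; caching the pending task instead of the result would deduplicate them.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable

# Shared across backends, which is why the key must include backend_id.
PRIMITIVE_CACHE: dict[tuple, Any] = {}


def cached_primitive(cache: dict) -> Callable:
    """Decorator: memoize an async primitive on (backend_id, method, args)."""
    def decorate(method: Callable[..., Awaitable[Any]]):
        @functools.wraps(method)
        async def wrapper(self, *args):
            key = (self.backend_id, method.__name__, args)
            if key not in cache:
                cache[key] = await method(self, *args)
            return cache[key]
        return wrapper
    return decorate


class ToyBackend:
    """Hypothetical backend exposing one GraphDbInterface-style primitive."""

    def __init__(self, backend_id: str, edges: list[tuple[str, str, str]]):
        self.backend_id = backend_id
        self.edges = edges
        self.calls = 0  # counts real (uncached) lookups

    @cached_primitive(PRIMITIVE_CACHE)
    async def edges_from(self, node_id: str) -> tuple:
        self.calls += 1
        await asyncio.sleep(0)  # stands in for a network round trip
        return tuple(e for e in self.edges if e[0] == node_id)


async def demo() -> int:
    backend = ToyBackend("toy", [("DRUG:a", "TREATS", "MeSH:D003480")])
    await backend.edges_from("DRUG:a")
    await backend.edges_from("DRUG:a")  # served from the cache
    return backend.calls
```

Results are returned as tuples so that a cached value cannot be mutated by one caller and seen by another.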
Schema injection. At
startup, if the schema has fewer than ~20 entity types and ~30
predicates, inject valid values into the bfs_query tool description.
FastMCP supports dynamic tool descriptions. Above the threshold,
suppress injection and rely on describe_schema().
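The threshold rule can be sketched as a function that builds the description string. The base wording is hypothetical. The thresholds come from the approximate values above, read as strict "fewer than" limits. Registering the resulting string with FastMCP depends on the version in use and is not shown.

```python
INJECT_MAX_TYPES = 20        # "fewer than ~20 entity types"
INJECT_MAX_PREDICATES = 30   # "fewer than ~30 predicates"

BASE_DESCRIPTION = "Breadth-first search from canonical seed IDs."  # hypothetical wording


def bfs_query_description(entity_types: list[str], predicates: list[str]) -> str:
    """Build the bfs_query tool description, injecting small vocabularies."""
    if len(entity_types) < INJECT_MAX_TYPES and len(predicates) < INJECT_MAX_PREDICATES:
        return (f"{BASE_DESCRIPTION}\n"
                f"Valid node_types: {', '.join(entity_types)}\n"
                f"Valid predicates: {', '.join(predicates)}")
    # Too large to inline: point the model at describe_schema() instead.
    return f"{BASE_DESCRIPTION}\nCall describe_schema() for valid node_types and predicates."
```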
Async concurrency. All
GraphDbInterface methods are async. During BFS expansion, call
edges_from/edges_to and get_node/metadata_for_node
concurrently via asyncio to minimize latency on I/O-bound backends.
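A level-synchronous BFS issues each hop's edge lookups as one concurrent batch with asyncio.gather. The sketch uses a hypothetical in-memory backend whose sleep() stands in for I/O latency. It covers only edges_from/edges_to; get_node and metadata_for_node could be batched the same way.

```python
import asyncio


class ToyGraphDb:
    """Hypothetical async backend over (subject, predicate, object) triples."""

    def __init__(self, edges):
        self.edges = edges

    async def edges_from(self, node_id):
        await asyncio.sleep(0.01)  # simulated I/O latency
        return [e for e in self.edges if e[0] == node_id]

    async def edges_to(self, node_id):
        await asyncio.sleep(0.01)
        return [e for e in self.edges if e[2] == node_id]


async def expand_frontier(db, frontier):
    """Fetch outgoing and incoming edges for every frontier node concurrently."""
    tasks = [db.edges_from(n) for n in frontier] + [db.edges_to(n) for n in frontier]
    results = await asyncio.gather(*tasks)
    neighbors = set()
    for edge_list in results:
        for subj, _pred, obj in edge_list:
            neighbors.update((subj, obj))
    return neighbors - set(frontier)


async def bfs_ids(db, seeds, max_hops):
    """Level-synchronous BFS: one concurrent batch of lookups per hop."""
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(max_hops):
        frontier = await expand_frontier(db, frontier) - visited
        if not frontier:
            break
        visited |= frontier
    return visited
```

With N frontier nodes, each hop costs roughly one round trip of latency instead of 2N sequential ones.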
Context-window budget. Production deployments against
well-connected graphs should accept an optional max_tokens hint
and truncate or stub additional items when the estimated response
size approaches the budget. Approximate response sizes for a 2-hop
traversal over a moderately connected graph: ~110K characters with
full metadata, ~57K with provenance stripped, ~14K with
topology_only=true.
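Budget-driven stubbing might look like the sketch below. Assumptions:

- About four characters per token, a rough heuristic rather than a BFS-QL property.
- JSON serialization length as the size estimate.
- fit_to_budget is a hypothetical name.

Every node starts as an {id, entity_type} stub, so topology stays complete. Nodes are upgraded to full metadata in order until the next one would exceed the budget. Edges are left untouched, and an all-stub response can still exceed a very small budget.

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic, not a BFS-QL constant


def fit_to_budget(result: dict, max_tokens: int) -> dict:
    """Return a copy of a BfsResult whose nodes are stubbed to fit max_tokens."""
    budget = max_tokens * CHARS_PER_TOKEN
    stubs = [{"id": n["id"], "entity_type": n["entity_type"]} for n in result["nodes"]]
    # Start from an all-stub response so every node is present (topology intact).
    out = dict(result, nodes=list(stubs))
    used = len(json.dumps(out))
    for i, node in enumerate(result["nodes"]):
        # Added characters when this slot goes from stub to full node.
        extra = len(json.dumps(node)) - len(json.dumps(stubs[i]))
        if used + extra > budget:
            break  # first node that does not fit; the remainder stay stubs
        out["nodes"][i] = node
        used += extra
    return out
```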