Appendix A: BFS-QL Protocol Reference¶

BFS-QL is a graph query protocol designed for consumption by large language models (LLMs). It exposes a knowledge graph through six MCP tools with a flat query format. The LLM traverses the graph by calling tools with natural-language seeds and structured filters, and receives subgraphs shaped for context-window efficiency.

The six tools¶

Tool	Purpose
`describe_schema()`	Orient: learn what the graph contains
`search_entities(query, ...)`	Resolve: map a name to canonical IDs
`bfs_query(seeds, max_hops, ...)`	Traverse: expand a neighborhood
`describe_entity(id)`	Expand: full metadata for one stub
`describe_entities(ids)`	Expand (batch): full metadata for multiple stubs
`intersect_subgraphs(seeds, k, ...)`	Intersect: nodes reachable from every seed

`describe_schema()`¶

Arguments: none

Returns:

{
  "graph_description": "...",
  "comprehensive": true,
  "entity_types": ["Disease", "Drug", "Gene"],
  "predicates": ["TREATS", "INHIBITS", "ENCODES"],
  "next_steps": "...",
  "tool_usage_notes": "..."
}

Fields:

Field	Description
`graph_description`	Human-readable summary of the graph domain and data source.
`comprehensive`	`true` if lists are complete and exhaustive; `false` for large or open-world graphs where they are samples only.
`entity_types`	Valid type names for `bfs_query` `node_types`. May be empty.
`predicates`	Valid predicate names for `bfs_query` `predicates`. May be empty.
`next_steps`	Backend-authored workflow instructions. Follow these in preference to any generic default.
`tool_usage_notes`	Reference guide for all BFS-QL tools: parameter meanings, filtering rules, and critical usage patterns.

When comprehensive is false, use schema_summary from an initial bfs_query result to discover valid type and predicate values for the local neighborhood.
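
The fallback described above can be sketched as a small client-side helper. This is an illustration only: `filter_vocabulary` is a hypothetical name, and the function assumes the response shapes shown in this appendix.

```python
from typing import Optional


def filter_vocabulary(schema: dict, bfs_result: Optional[dict] = None) -> dict:
    """Choose the type/predicate vocabulary to use in bfs_query filters.

    Hypothetical helper: `schema` is a describe_schema() response and
    `bfs_result` is an optional BfsResult from an initial survey query.
    """
    if schema.get("comprehensive") or bfs_result is None:
        # Lists are exhaustive, or no local survey exists yet: use the schema.
        return {
            "node_types": list(schema.get("entity_types", [])),
            "predicates": list(schema.get("predicates", [])),
        }
    # Open-world graph: trust what the local neighborhood actually contains.
    summary = bfs_result["schema_summary"]
    return {
        "node_types": list(summary["entity_types_found"]),
        "predicates": list(summary["predicates_found"]),
    }
```

The helper returns plain lists so the result can be passed directly as `node_types` and `predicates` in a follow-up query.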

`search_entities(query, node_types=None)`¶

Resolves a natural-language name or alias to canonical entity IDs. Always call it before bfs_query when you do not already have a canonical ID.

Arguments:

Field	Required	Description
`query`	Yes	Name, alias, or partial name to look up.
`node_types`	No	Restrict results to these entity types. Use to exclude high-volume types (e.g., papers) when resolving concept names.

Returns: array of EntityStub

[
  {
    "id": "MeSH:D003480",
    "entity_type": "Disease",
    "name": "Cushing Syndrome",
    "score": 0.94
  },
  {
    "id": "MeSH:D047748",
    "entity_type": "Disease",
    "name": "Cushing Disease",
    "score": 0.91
  }
]

name is the entity's display name. score is the vector similarity score (0 to 1, higher is better) when the backend uses embedding-based search; it is null when full-text search is used instead. Inspect results before choosing a seed, because common names are often ambiguous. Use entity_type to distinguish concept entities from papers or authors that share the same name.
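
One way a client might inspect the returned stubs before committing to a seed is sketched below. `pick_seed` and the `min_gap` threshold are assumptions made for illustration; the protocol itself does not define an ambiguity rule.

```python
def pick_seed(stubs, entity_type=None, min_gap=0.05):
    """Return (best_stub, ambiguous) from a search_entities() result.

    Hypothetical helper. `min_gap` is an arbitrary illustrative threshold:
    if the top two scores are closer than this, the match is flagged as
    ambiguous so the caller can inspect the candidates further.
    """
    candidates = [
        s for s in stubs
        if entity_type is None or s["entity_type"] == entity_type
    ]
    if not candidates:
        return None, False
    if all(s.get("score") is not None for s in candidates):
        # Embedding-based search: rank by similarity score.
        candidates = sorted(candidates, key=lambda s: s["score"], reverse=True)
        ambiguous = (
            len(candidates) > 1
            and candidates[0]["score"] - candidates[1]["score"] < min_gap
        )
    else:
        # Full-text search returns null scores: keep backend order and
        # treat any multi-candidate result as ambiguous.
        ambiguous = len(candidates) > 1
    return candidates[0], ambiguous
```

With the two Cushing results shown above, the scores differ by about 0.03, so this sketch would flag the match as ambiguous rather than silently choosing the first entry.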

`bfs_query(seeds, max_hops, ...)`¶

Performs a breadth-first search from one or more seed entities and returns the union of their neighborhoods as a BfsResult. The node_types and predicates filters control the detail level of items in the result, not which items are included: non-matching nodes and edges still appear as lightweight stubs. Only exclude_node_types and min_mentions remove items outright.

Arguments:

Field	Required	Default	Description
`seeds`	Yes	--	One or more canonical entity IDs. All seeds expand simultaneously; the result is the union of their neighborhoods.
`max_hops`	Yes	--	Maximum graph distance from any seed. Values of 1 to 3 are typical.
`node_types`	No	all	Matching nodes receive full metadata; others become stubs.
`predicates`	No	all	Matching edges receive full metadata; others become bare triples.
`topology_only`	No	`false`	When `true`, every node is a bare `{id, entity_type}` and every edge a bare triple. Overrides `node_types` and `predicates`.
`exclude_node_types`	No	none	Remove these types and all edges touching them entirely. Topology is no longer guaranteed complete. Use for high-volume types that add no conceptual value.
`min_mentions`	No	`1`	Remove nodes with `total_mentions` below this threshold (and touching edges). Nodes without a `total_mentions` field are always included. Filters the result, not the traversal.
`limit`	No	none	Cap the number of nodes returned. Counts always reflect the full traversal.
`offset`	No	`0`	Skip the first N nodes. Use with `limit` to page through large results.
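
Paging with `limit` and `offset` could look like the following sketch. It assumes a caller-supplied `bfs_query` callable that accepts the arguments in the table above and returns a BfsResult dict; `iter_pages` and `page_size` are hypothetical names.

```python
def iter_pages(bfs_query, seeds, max_hops, page_size=50, **filters):
    """Yield successive BfsResult pages until the traversal is exhausted.

    `bfs_query` is any callable that takes the documented arguments and
    returns a BfsResult dict; how it reaches the server is up to the caller.
    """
    offset = 0
    while True:
        page = bfs_query(
            seeds=seeds,
            max_hops=max_hops,
            limit=page_size,
            offset=offset,
            **filters,
        )
        yield page
        offset += page_size
        # node_count reflects the full traversal, so it gives a stable upper
        # bound; an empty page also ends the loop (for example when
        # min_mentions removed nodes that node_count still counts).
        if not page["nodes"] or offset >= page["node_count"]:
            break
```

Because the counts reflect the full traversal, the first page already tells the caller how many pages to expect.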

`describe_entity(id)`¶

Retrieves full metadata for a single entity by canonical ID. Use when bfs_query returns a stub and you want the full record for that node.

Arguments:

Field	Required	Description
`id`	Yes	Canonical entity ID.

Returns: full node metadata as a flat dict -- the id, entity_type, and all metadata fields merged at the top level. Same keys as the metadata dict inside a full Node, but without nesting.
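
To make the shape difference concrete, the sketch below converts a nested full Node into the flat form described above. `flatten_node` is a hypothetical helper, not part of the protocol.

```python
def flatten_node(node: dict) -> dict:
    """Merge a full Node's nested metadata into a single flat dict."""
    flat = {"id": node["id"], "entity_type": node["entity_type"]}
    for key, value in node.get("metadata", {}).items():
        # Never let a metadata key overwrite the canonical id or type.
        flat.setdefault(key, value)
    return flat
```

A stub (a node with no `metadata` key) passes through unchanged apart from being copied.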

`describe_entities(ids)`¶

Retrieves full metadata for multiple entities in a single call. Use instead of sequential describe_entity calls when expanding several stubs at once.

Arguments:

Field	Required	Description
`ids`	Yes	List of canonical entity IDs.

Returns: list of full node metadata dicts, same shape as full nodes in bfs_query results. IDs that do not exist in the graph are silently omitted from the output.
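
Because unknown IDs are dropped without an error, a caller may want to check which requests went unanswered. A minimal sketch, with `missing_ids` as a hypothetical helper name:

```python
def missing_ids(requested, returned):
    """Return the requested IDs that are absent from a describe_entities() result."""
    found = {record["id"] for record in returned}
    # Preserve the caller's order so the gaps are easy to report.
    return [entity_id for entity_id in requested if entity_id not in found]
```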

`intersect_subgraphs(seeds, k, ...)`¶

Returns only nodes within k undirected hops of every seed simultaneously -- the intersection of their neighborhoods rather than the union. Use when a multi-seed bfs_query returns too many nodes for the LLM to intersect manually.

Arguments:

Field	Required	Default	Description
`seeds`	Yes	--	Two or more canonical entity IDs.
`k`	Yes	--	Hop radius (1 to 5). Every result node must be reachable from all seeds within this distance, treating edges as undirected.
`node_types`	No	all	Matching nodes receive full metadata; others become stubs.
`exclude_node_types`	No	none	Remove these types and all edges touching them.
`predicates`	No	all	Matching edges receive full metadata; others become bare triples.
`min_mentions`	No	`1`	Remove nodes with `total_mentions` below this threshold.
`topology_only`	No	`false`	When `true`, return IDs and types only.

Returns an IntersectionResult:

{
  "seeds":      ["MeSH:D003480", "MeSH:D049970"],
  "k":          2,
  "node_count": 12,
  "edge_count": 15,
  "nodes":      [...],
  "edges":      [...],
  "schema_summary": {
    "entity_types_found": ["Drug", "Gene"],
    "predicates_found":   ["TREATS", "INHIBITS"]
  }
}

Note: intersect_subgraphs does not support limit/offset pagination.
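
The intersection semantics can be illustrated over an in-memory edge list. This is a sketch of the idea, not the server's implementation: the function names are hypothetical, edges are assumed to be `(subject, predicate, object)` triples, and whether the server includes seeds in its own results is not specified here (this sketch keeps a seed only if it lies within `k` hops of every other seed).

```python
from collections import defaultdict, deque


def within_k(adjacency, seed, k):
    """Return every node at most k undirected hops from seed (seed included)."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand past the hop radius
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)


def intersect_neighborhoods(edges, seeds, k):
    """Nodes within k undirected hops of every seed."""
    if not seeds:
        return set()
    adjacency = defaultdict(set)
    for subject, _predicate, obj in edges:
        # Treat every edge as undirected, as the k parameter requires.
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return set.intersection(*(within_k(adjacency, s, k) for s in seeds))
```

Replacing `set.intersection` with `set.union` in the last line gives the multi-seed `bfs_query` behavior instead.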

Response format¶

bfs_query returns a BfsResult:

{
  "seeds":      ["MeSH:D003480"],
  "max_hops":   2,
  "node_count": 84,
  "edge_count": 99,
  "nodes":      [...],
  "edges":      [...],
  "schema_summary": {
    "entity_types_found": ["Disease", "Drug", "Gene"],
    "predicates_found":   ["TREATS", "INHIBITS"]
  }
}

node_count and edge_count reflect the full traversal regardless of limit/offset. schema_summary reflects the full traversal regardless of filters -- it always contains the actual types and predicates present in the subgraph.

Full node (entity type matches node_types, or no filter):

{
  "id":          "PUB:PMC2386281",
  "entity_type": "Publication",
  "metadata": {
    "name":   "The Diagnosis of Cushing's Syndrome",
    "source": "pubmed",
    "canonical_url": "https://pubmed.ncbi.nlm.nih.gov/18493314/",
    "confidence": 0.99,
    "total_mentions": 12
  }
}

Metadata keys vary by entity type and backend. Common keys across backends: name, source, canonical_url, confidence, usage_count, total_mentions, synonyms.

Stub node (entity type does not match node_types):

{"id": "PERSON:67890", "entity_type": "Person"}

Full edge (predicate matches predicates, or no filter):

{
  "subject":   "DRUG:rxnorm:41493",
  "predicate": "TREATS",
  "object":    "MeSH:D003480",
  "metadata": {
    "confidence":       0.91,
    "source_documents": ["PMC2386281", "PMC3367558"]
  }
}

Edge metadata includes confidence and source_documents (a list of document IDs supporting the relationship) whenever the backend provides them. Full provenance text is stored in the backend but stripped from MCP responses to manage context size; use describe_entity(id) on the source document node to retrieve it.

Stub edge (predicate does not match predicates):

{
  "subject":   "DRUG:rxnorm:41493",
  "predicate": "INTERACTS_WITH",
  "object":    "DRUG:rxnorm:88014"
}
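
Given the four shapes above, a client can tell full items from stubs by the presence of the `metadata` key. The helpers below are a hypothetical sketch of that check.

```python
def split_by_detail(items):
    """Partition nodes or edges into (full, stubs) by presence of metadata."""
    full = [item for item in items if "metadata" in item]
    stubs = [item for item in items if "metadata" not in item]
    return full, stubs


def stub_ids(result):
    """IDs of stub nodes in a BfsResult, ready to pass to describe_entities."""
    _full, stubs = split_by_detail(result["nodes"])
    return [node["id"] for node in stubs]
```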

Session workflow¶

The recommended sequence for any BFS-QL session:

1. describe_schema()
   → learn entity types, predicates, graph description
   → follow next_steps instructions

2. search_entities(name, node_types=[...])
   → resolve name to one or more canonical IDs
   → use node_types to suppress high-volume types

3. bfs_query(seeds, max_hops=1, topology_only=True)
   → survey structure cheaply
   → read schema_summary for valid filter values

4. describe_entities([id, id, ...])
   → batch-expand stubs identified as significant
   → one call regardless of how many IDs

5. bfs_query(seeds, max_hops=1,
             node_types=[...], predicates=[...])
   → targeted re-query using filters from schema_summary

Step 1 may be skipped when the server injects the schema into tool descriptions at startup, and step 2 when canonical IDs are already known. Steps 3 to 5 are iterative: each traversal may reveal stubs that motivate further expansion or re-query.

For large literature-derived graphs where a topology survey exceeds the context budget, replace step 3 with a direct concept query:

bfs_query(
    seeds=[seed_id],
    max_hops=1,
    exclude_node_types=["paper", "author"],
    min_mentions=2,
)

Use intersect_subgraphs in place of bfs_query when the question is "what do all of these entities share?" and the result would be too large for the LLM to intersect manually.
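
The five-step sequence can be sketched as one pass of client code. Everything here is illustrative: `client` is a hypothetical synchronous object whose methods mirror the six tools (a real MCP client is likely asynchronous), `max_expand` is an arbitrary cap, and taking the first search hit stands in for the inspection that step 2 calls for.

```python
def run_session(client, name, node_types=None, max_expand=10):
    """One pass through the recommended BFS-QL workflow (illustrative only)."""
    schema = client.describe_schema()                                  # step 1
    stubs = client.search_entities(name, node_types=node_types)        # step 2
    if not stubs:
        return None
    seed = stubs[0]["id"]  # placeholder: inspect candidates in real use

    survey = client.bfs_query([seed], max_hops=1, topology_only=True)  # step 3
    summary = survey["schema_summary"]

    # Step 4: batch-expand a handful of neighbors in a single call.
    neighbor_ids = [n["id"] for n in survey["nodes"] if n["id"] != seed]
    details = client.describe_entities(neighbor_ids[:max_expand])

    # Step 5: targeted re-query using vocabulary taken from schema_summary.
    focused = client.bfs_query(
        [seed],
        max_hops=1,
        node_types=summary["entity_types_found"],
        predicates=summary["predicates_found"],
    )
    return {"schema": schema, "seed": seed, "details": details, "result": focused}
```

In practice step 5 would narrow the filters to the types and predicates relevant to the question, rather than passing everything the survey found.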

Design properties¶

Topology is complete by default. node_types and predicates filters control detail level, not presence. A stub node is not a missing node. Only exclude_node_types and min_mentions remove items from the result, so use them deliberately.
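
The distinction between detail-level filters and removal can be sketched as a node-shaping step. This is an assumed, simplified model of the behavior described in this appendix, not the server's code; `shape_nodes` is a hypothetical name.

```python
def shape_nodes(nodes, node_types=None, exclude_node_types=(), topology_only=False):
    """Apply detail-level and exclusion rules to a list of full nodes."""
    shaped = []
    for node in nodes:
        if node["entity_type"] in exclude_node_types:
            continue  # exclusion drops the node entirely
        wants_full = not topology_only and (
            node_types is None or node["entity_type"] in node_types
        )
        if wants_full:
            shaped.append(node)
        else:
            # Non-matching nodes stay in the result as navigable stubs.
            shaped.append({"id": node["id"], "entity_type": node["entity_type"]})
    return shaped
```

Note that a `node_types` filter changes how much each node carries but never the number of nodes returned.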

Stubs are navigational handles. A stub in a bfs_query result carries a canonical ID. Call describe_entity(id) or describe_entities(ids) for full metadata. Seed a new bfs_query at a stub to expand its neighborhood.

schema_summary closes the vocabulary loop. For open-world backends where describe_schema returns comprehensive: false, schema_summary provides the valid node_types and predicates values for the actual neighborhood. Read it after a topology survey before issuing filtered follow-up queries.

Multi-seed queries express relational questions. bfs_query with multiple seeds returns the union of neighborhoods. intersect_subgraphs with multiple seeds returns the intersection. Use bfs_query for "what connects to any of these?" and intersect_subgraphs for "what do all of these share?"

Implementation notes¶

Caching. Cache at the GraphDbInterface primitive level -- edges_from, edges_to, get_node, metadata_for_node -- keyed on (backend_id, method, args). All traversal intelligence in the server layer benefits automatically. Cache entity_types() and predicates() results for the lifetime of a session.
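
A minimal sketch of such a cache follows. `PrimitiveCache` is a hypothetical wrapper, and the backend method signatures (one hashable positional argument each) are assumptions; the sketch has no eviction, and two concurrent misses on the same key may both reach the backend.

```python
class PrimitiveCache:
    """Memoize async backend primitives keyed on (backend_id, method, args)."""

    def __init__(self, backend, backend_id):
        self._backend = backend
        self._backend_id = backend_id
        self._store = {}

    async def call(self, method, *args):
        # args must be hashable (for example, canonical ID strings).
        key = (self._backend_id, method, args)
        if key not in self._store:
            self._store[key] = await getattr(self._backend, method)(*args)
        return self._store[key]
```

Traversal code would then call, for example, `await cache.call("edges_from", node_id)` instead of calling the backend directly.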

Schema injection. At startup, if the schema has fewer than ~20 entity types and ~30 predicates, inject valid values into the bfs_query tool description. FastMCP supports dynamic tool descriptions. Above the threshold, suppress injection and rely on describe_schema().
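
The threshold decision might be expressed as below. The constants mirror the approximate limits stated above, `bfs_query_description` is a hypothetical helper, and how the resulting string is registered with the MCP server is left out because it depends on the framework.

```python
MAX_ENTITY_TYPES = 20  # approximate threshold from the text above
MAX_PREDICATES = 30    # approximate threshold from the text above


def bfs_query_description(base, entity_types, predicates):
    """Return the bfs_query tool description, with the schema injected if small."""
    if len(entity_types) >= MAX_ENTITY_TYPES or len(predicates) >= MAX_PREDICATES:
        # Too large to inline: callers fall back to describe_schema().
        return base
    return (
        f"{base}\n"
        f"Valid node_types: {', '.join(entity_types)}\n"
        f"Valid predicates: {', '.join(predicates)}"
    )
```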

Async concurrency. All GraphDbInterface methods are async. During BFS expansion, call edges_from/edges_to and get_node/metadata_for_node concurrently via asyncio to minimize latency on I/O-bound backends.
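
One hop of concurrent expansion could be sketched with `asyncio.gather`. The backend method signatures used here (a node ID in, a list of edges out) are assumptions, and `expand_frontier` is a hypothetical name.

```python
import asyncio


async def expand_frontier(backend, frontier):
    """Fetch outgoing and incoming edges for every frontier node concurrently."""
    calls = []
    for node_id in frontier:
        calls.append(backend.edges_from(node_id))
        calls.append(backend.edges_to(node_id))
    # gather runs all lookups concurrently and preserves call order.
    batches = await asyncio.gather(*calls)
    return [edge for batch in batches for edge in batch]
```

A full traversal would call this once per hop, deduplicate the returned edges, and build the next frontier from newly seen endpoints.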

Context-window budget. Production deployments against well-connected graphs should accept an optional max_tokens hint and truncate or stub additional items when the estimated response size approaches the budget. Approximate response sizes for a 2-hop traversal over a moderately connected graph: ~110K characters with full metadata, ~57K with provenance stripped, ~14K with topology_only=true.
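
A budget-aware truncation step might look like the sketch below. It is an assumption-laden illustration: `fit_to_budget` is a hypothetical name, and the default of four characters per token is a rough heuristic rather than a figure from this document.

```python
import json


def fit_to_budget(nodes, max_tokens, chars_per_token=4):
    """Downgrade nodes to stubs once the estimated size exceeds the budget.

    chars_per_token is a rough heuristic used only for illustration.
    """
    budget = max_tokens * chars_per_token
    shaped, used = [], 0
    for node in nodes:
        size = len(json.dumps(node))
        if used + size > budget:
            # Keep the node in the topology, but drop its metadata.
            node = {"id": node["id"], "entity_type": node["entity_type"]}
            size = len(json.dumps(node))
        shaped.append(node)
        used += size
    return shaped
```

Stubbing rather than dropping keeps every node addressable, so the caller can still expand any of them later with describe_entities.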
