Part II: The Protocol

Chapter 4: Six Tools

In 1977, the architect Christopher Alexander and his colleagues published A Pattern Language, a catalogue of 253 design patterns for buildings and towns. The book's argument was not that architects should memorize 253 patterns. It was that good design recurs -- that the same solutions to the same problems appear across different scales and contexts, and that naming them makes them easier to recognize, teach, and apply. The patterns ranged from urban planning ("City Country Fingers") to room layout ("The Flow Through Rooms") to the placement of a window seat. Each was a named, composable solution to a recurring problem.

What Alexander discovered, and what software engineers rediscovered nearly two decades later when they adapted his framework for code, is that the value of a pattern library is not in its size. It is in the coverage-to-complexity ratio. A small set of well-chosen patterns that together cover the full space of common problems is more useful than a large set that covers the same space redundantly or inconsistently. The goal is completeness with economy.

BFS-QL has six tools. The choice is not arbitrary and not conservative -- it is the result of asking, for every candidate tool, whether it covers something the others do not, and whether the space it covers is one an LLM actually needs.

Why Six

The full space of what an LLM needs to do with a knowledge graph can be decomposed into six operations, each distinct, together exhaustive:

Orientation. The LLM arrives at a graph it has never seen. It does not know what kinds of entities the graph contains, what relationships are represented, or how they are named. Before it can navigate, it needs a map. This is describe_schema.

Resolution. The LLM has a name -- a drug, a disease, an author. It needs the canonical ID that the graph uses for that entity. Names are ambiguous; canonical IDs are not. The operation of mapping a name to an ID is fundamental and cannot be collapsed into traversal without introducing the hallucination problem Chapter 2 described. This is search_entities.

Traversal. The LLM has a seed -- one or more canonical IDs. It wants to know what they connect to. This is the core operation, the one that makes graph knowledge accessible. Everything else is setup or follow-up. This is bfs_query.

Expansion. The traversal returns stubs -- lightweight placeholders for nodes that were present in the topology but did not warrant full metadata. The LLM sees that something is there and wants to know what it is. For a single stub, this is describe_entity.

Batch expansion. A single bfs_query call typically surfaces several stubs worth inspecting. Calling describe_entity on each in sequence means one round-trip per entity: the LLM issues a call, waits, reads the result, decides to expand the next stub, and repeats. Each round-trip carries the full overhead of a tool invocation in an LLM session -- not a database round-trip, but a model reasoning step. In practice, Claude Code flagged this explicitly as friction: sequential single-entity expansion is slow and accumulates latency when several stubs warrant attention. describe_entities accepts a list of IDs and returns full records for all of them in a single call. It is not a convenience alias for a loop; it is the operation that makes batch expansion a first-class primitive rather than an emergent pattern the model has to construct.
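The round-trip argument can be made concrete with a toy sketch. Everything here is illustrative: CountingClient and the RECORDS table are hypothetical stand-ins rather than the BFS-QL client, and only the two tool names come from the text.

```python
from typing import Dict, List

# Hypothetical entity records keyed by canonical ID (illustrative data only).
RECORDS: Dict[str, dict] = {
    "ex:1": {"name": "alpha"},
    "ex:2": {"name": "beta"},
    "ex:3": {"name": "gamma"},
}


class CountingClient:
    """Toy stand-in for a tool client that counts tool invocations."""

    def __init__(self) -> None:
        self.calls = 0

    def describe_entity(self, entity_id: str) -> dict:
        self.calls += 1  # one tool invocation per entity
        return RECORDS[entity_id]

    def describe_entities(self, entity_ids: List[str]) -> List[dict]:
        self.calls += 1  # one tool invocation for the whole batch
        return [RECORDS[i] for i in entity_ids]


stubs = ["ex:1", "ex:2", "ex:3"]

sequential = CountingClient()
one_by_one = [sequential.describe_entity(i) for i in stubs]

batched = CountingClient()
all_at_once = batched.describe_entities(stubs)

print(sequential.calls, batched.calls)  # 3 1
```

The records returned are identical; what changes is the number of model reasoning steps spent obtaining them.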

Intersection. The LLM has a set of seeds and wants to know what is common to all of them -- not the union of their neighborhoods, but the nodes reachable from every seed simultaneously. bfs_query returns the union; the LLM cannot reliably do the set intersection itself over hundreds of nodes. This is intersect_subgraphs.

Orient, resolve, traverse, expand, batch-expand, intersect. The protocol has grown by one each time a real gap appeared -- intersect_subgraphs when multi-seed reasoning proved unreliable without it, describe_entities when sequential single-entity expansion proved too costly. Candidate additions like "find shortest path" or "list all entities of type X" reduce to compositions of existing tools without material cost, or add query-oriented answers rather than navigational handles. The bar is real demonstrated need, not speculation.

The Session Workflow

The six tools define a natural sequence that a well-behaved LLM follows against any BFS-QL graph:

1. describe_schema()
   → learn entity types, predicates, graph description

2. search_entities(name, node_types=[...])
   → resolve a name to one or more canonical IDs
   → pass node_types to avoid noise in results
   → inspect entity_type to disambiguate if needed

3. bfs_query(seeds, max_hops, ...)
   → traverse from the resolved ID
   → start with topology_only=True for large graphs, OR
   → use exclude_node_types=["paper","author"], min_mentions=2
     for a concept-only result on literature graphs
   → use node_types and predicates to focus metadata detail

4. describe_entities([id, id, ...])
   → expand any stubs that warrant closer inspection
   → batch multiple IDs in a single call

Step 1 may be redundant if the BFS-QL server injects schema into tool descriptions at startup -- in that case the LLM may skip the explicit describe_schema call. Steps 3 and 4 are iterative: the output of one bfs_query call identifies stubs that motivate describe_entities (or single describe_entity) calls, which may motivate further bfs_query calls seeded at newly discovered nodes. The workflow is a loop, not a pipeline.

This matters for how the tools were designed. Each tool must be callable in any order, with the outputs of earlier calls serving as inputs to later ones. bfs_query takes canonical IDs -- which search_entities produces. describe_entity takes canonical IDs -- which appear in bfs_query results. The interface is compositional by construction.

What Is Not a Tool

The choice of six tools is also a choice of what not to include. Some candidates worth examining:

A "shortest path" tool. Useful for certain graph analyses. Not needed for LLM reasoning, which doesn't navigate to specific destinations -- it explores neighborhoods. An LLM that needs to know whether two entities are connected can issue a multi-hop bfs_query and inspect the result. The two-step answer is not materially worse than a dedicated tool, and adding the tool adds one more surface for the LLM to reason about.

A "list all entities of type X" tool. The medlit demo has 119 disease entities. A tool that returns all of them is not useful to an LLM trying to reason; it is a context flood. The right operation is bfs_query from a relevant seed with node_types=["disease"], which returns the disease entities that are connected to something the LLM already cares about. Relevance is structural, not taxonomic.

A "count" tool. Useful for human analysts building dashboards. Not useful for LLM reasoning. An LLM that receives "there are 119 disease entities" has not learned anything it can act on. The count tells it nothing about which diseases matter, how they connect, or what the graph structure implies about the domain.

The pattern in all three cases is the same: the candidate tool answers a query-oriented question rather than a traversal-oriented one. It gives the LLM a fact rather than a navigational handle. BFS-QL is designed for navigation. The six tools reflect that.

Chapter 5: describe_schema -- Self-Orienting Graphs

In the early days of the web, connecting to a new API meant reading its documentation. The documentation was a separate artifact -- a PDF, a wiki page, a sequence of example curl commands -- maintained by humans, often out of sync with the actual API, and unavailable to the software that needed it. A client that wanted to know what endpoints were available had to be told by a human who had read the docs.

This was not a fundamental limitation. Roy Fielding's REST dissertation, published in 2000, included hypermedia as a first-class constraint: a well-designed REST API should carry, in its responses, the information a client needs to navigate it. Links, not documentation. The API tells you what it can do; you don't need to be told separately. This principle -- that interfaces should be self-describing -- has become standard in modern API design. OpenAPI specifications, GraphQL introspection, FastAPI's /docs endpoint: all are expressions of the same idea.

describe_schema is BFS-QL's implementation of this principle for knowledge graphs. An LLM connecting to a graph it has never seen -- a private Fuseki instance, a domain-specific SPARQL endpoint, a kgraph-derived Postgres store for a hospital's clinical data -- needs to know what entity types and predicates exist before it can construct a meaningful query. In the SPARQL world, this required reading documentation. In BFS-QL, it requires one tool call.

What It Returns

A describe_schema response contains three things:

  • graph_description: A human-readable string describing the graph and its domain -- what the data represents, where it came from, what kinds of questions it is meant to answer. This is provided by the graph operator when the BFS-QL server is configured. A well-written description tells the LLM whether this is the right graph for its current question.

  • entity_types: The complete list of valid entity type names in the graph. These are exactly the values the LLM can pass as node_types in a bfs_query call. Not approximate names, not documentation -- the actual strings the query engine understands.

  • predicates: The complete list of valid predicate names. These are exactly the values the LLM can pass as predicates in a bfs_query call.

The medlit graph, for example, returns 19 entity types and 16 predicates. After one call, the LLM knows that drug, disease, and procedure are valid node types -- and that protein and enzyme are also present, which tells it something about the level of mechanistic detail in the graph. It knows that TREATS, CAUSES, and INHIBITS are valid predicates -- and that CITES and AUTHORED are also present, which tells it that the graph includes bibliographic structure alongside clinical knowledge.

This is orientation in the strict sense. The LLM knows what it is looking at before it starts navigating.
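A minimal sketch of how a client might hold this response and use it to check filter values before querying. The three field names come from the text; the SchemaResponse dataclass, the unknown_filters helper, and the partial lists (only the values named above, not the full 19 and 16) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SchemaResponse:
    """Hypothetical container mirroring the three fields described above."""
    graph_description: str
    entity_types: List[str]
    predicates: List[str]


def unknown_filters(schema: SchemaResponse,
                    node_types: List[str],
                    predicates: List[str]) -> List[str]:
    """Return requested filter values that the schema does not list."""
    bad_types = [t for t in node_types if t not in schema.entity_types]
    bad_preds = [p for p in predicates if p not in schema.predicates]
    return bad_types + bad_preds


# Partial, illustrative subset of the medlit schema named in the text.
schema = SchemaResponse(
    graph_description="36 PubMed papers on Cushing disease and related endocrinology.",
    entity_types=["drug", "disease", "procedure", "protein", "enzyme"],
    predicates=["TREATS", "CAUSES", "INHIBITS", "CITES", "AUTHORED"],
)

print(unknown_filters(schema, ["drug", "planet"], ["TREATS", "ORBITS"]))
# ['planet', 'ORBITS']
```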

Two Delivery Modes

The describe_schema tool can be called explicitly or made unnecessary through a second mechanism: schema injection.

At startup, the BFS-QL server calls entity_types() and predicates() on the backend and holds the results in memory. If the schema is small enough -- the implementation uses a threshold of 20 entity types and 30 predicates -- the server injects the valid values directly into the bfs_query tool description. The LLM reads the tool description before it calls the tool, so it arrives at bfs_query already knowing what node_types and predicates values are valid. No explicit describe_schema call required.

This is a zero-cost optimization for small schemas. The LLM doesn't spend a tool call on orientation; the orientation is already embedded in the interface.

The tradeoff is tool description size. A graph with 19 entity types and 16 predicates adds roughly 200 characters to the bfs_query description -- negligible. A graph with 200 entity types and 500 predicates would make the tool description unwieldy and consume context before the LLM has done anything. Above the threshold, injection is suppressed and explicit calling is the path.

Both modes are supported transparently. The server chooses based on schema size. The LLM's behavior is the same either way: it starts a session knowing the schema, whether that knowledge came from injection or from a tool call.
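The injection decision might look roughly like the following. The two thresholds are the ones stated above; the function name, the description wording, and the choice of an inclusive comparison are assumptions, not the server's actual code.

```python
from typing import List

MAX_INJECTED_ENTITY_TYPES = 20  # threshold stated in the text
MAX_INJECTED_PREDICATES = 30    # threshold stated in the text

# Hypothetical base wording for the tool description.
BASE_DESCRIPTION = "Breadth-first traversal from one or more seed IDs."


def build_bfs_query_description(entity_types: List[str],
                                predicates: List[str]) -> str:
    """Inject valid filter values when the schema is small; otherwise defer."""
    small = (len(entity_types) <= MAX_INJECTED_ENTITY_TYPES
             and len(predicates) <= MAX_INJECTED_PREDICATES)
    if not small:
        # Above the threshold: keep the description short and point to the tool.
        return BASE_DESCRIPTION + " Call describe_schema for valid filter values."
    return (BASE_DESCRIPTION
            + " Valid node_types: " + ", ".join(sorted(entity_types)) + "."
            + " Valid predicates: " + ", ".join(sorted(predicates)) + ".")


small_desc = build_bfs_query_description(["drug", "disease"], ["TREATS"])
large_desc = build_bfs_query_description(
    [f"type_{i}" for i in range(200)], [f"pred_{i}" for i in range(500)]
)
print(small_desc)
print(large_desc)
```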

The graph_description as a First-Class Signal

The graph description is worth more attention than it usually receives. In the medlit example, it reads: "36 PubMed papers on Cushing disease and related endocrinology." That sentence tells an LLM several things that affect how it should reason:

  • The corpus is small (36 papers). Claims that seem universal may be specific to this literature.
  • The domain is focused (Cushing disease). Entities and relationships outside that domain are unlikely to be well-represented.
  • The data source is biomedical literature. Relationships have provenance and carry confidence scores.

A graph operator deploying BFS-QL should treat the description as they would treat a system prompt: an opportunity to shape how the LLM approaches the data. "This graph contains inferred relationships; verify important claims against source documents." "The entity type provisional indicates entities whose canonical IDs could not be resolved." "Predicates are directional; TREATS runs from drug to disease, not the reverse."

The server instructions mechanism serves a similar function. BFS-QL's server sends a block of instructions to the LLM at session initialization, before any tool calls. These instructions can include graph-specific guidance that doesn't fit in the tool descriptions -- in the medlit deployment, for example, the instructions note that entity IDs beginning with prov: are provisional artifacts from the ingestion pipeline, carry no external canonical meaning, and should be treated as anonymous placeholders. Without that note, an LLM might waste reasoning cycles wondering what a provisional ID like prov:2e02b663d97c45499d4ce644abf81b8a refers to.

Self-description is not just schema. It is everything the graph operator knows about the data that the LLM would benefit from knowing before it starts.

Chapter 6: The Query Model

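The core of BFS-QL is a single query structure with five core parameters (seeds, max_hops, node_types, predicates, topology_only) and four optional result controls (exclude_node_types, min_mentions, limit, offset). Understanding why each parameter is present -- and why others are not -- is the key to using the protocol well and to implementing it correctly.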

The Parameters

seeds is a list of canonical entity IDs. This is the starting point of the traversal. Multiple seeds are supported because many useful questions are inherently relational: not "what connects to this entity?" but "what do these two entities have in common?" A multi-seed query issues a single BFS from all seeds simultaneously and returns their combined neighborhood, deduplicated. The LLM doesn't need to issue separate queries and merge the results manually.

max_hops is an integer controlling traversal depth. A value of 1 returns only immediate neighbors; 2 returns neighbors of neighbors; and so on up to a maximum of 5. The practical guidance is to start at 1 and expand only if the first result doesn't contain what you need. A 2-hop traversal from a well-connected node in the medlit graph returns 84 nodes and 99 edges. A 3-hop traversal from the same node would return most of the graph. Depth is a context budget decision, not a correctness decision -- the graph is the same either way.
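A minimal sketch of a depth-bounded, multi-seed breadth-first traversal over a toy edge list. The bfs function, the edge tuples, and the assumption that edges are followed in both directions are illustrative; the text does not specify how a backend implements traversal.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def bfs(edges: List[Edge], seeds: Iterable[str], max_hops: int) -> Dict[str, int]:
    """Return {node_id: hop distance} for nodes within max_hops of any seed."""
    neighbors: Dict[str, Set[str]] = {}
    for subj, _, obj in edges:
        neighbors.setdefault(subj, set()).add(obj)
        neighbors.setdefault(obj, set()).add(subj)  # assumption: both directions

    dist = {seed: 0 for seed in seeds}  # all seeds start at depth 0
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # depth budget spent; do not expand further
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:  # deduplicate across seeds
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


EDGES = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d"), ("x", "r", "c")]

print(sorted(bfs(EDGES, ["a"], 1)))       # ['a', 'b']
print(sorted(bfs(EDGES, ["a"], 2)))       # ['a', 'b', 'c']
print(sorted(bfs(EDGES, ["a", "x"], 1)))  # ['a', 'b', 'c', 'x']
```

Raising max_hops only widens the returned set; the underlying graph is unchanged, which is why depth is a budget decision.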

node_types is an optional list of entity type names. Nodes whose type matches receive full metadata in the response. Nodes whose type does not match are returned as stubs -- present in the result with their ID and type, but no metadata. Omitting node_types gives full metadata for all nodes, which is appropriate when the graph is small or when the LLM needs comprehensive information. Providing node_types focuses the context budget on what matters.

predicates is an optional list of predicate names. Edges whose predicate matches receive full metadata in the response, including confidence scores, source documents, and provenance. Edges whose predicate does not match are returned as bare subject-predicate-object triples. The behavior is symmetric with node_types: topology is always present, detail is selectively paid for.

topology_only is a boolean that, when true, suppresses all metadata from the response. Every node is returned as a bare ID and type; every edge as a bare subject-predicate-object triple. No node metadata, no edge metadata, no provenance. The response is pure structural skeleton.
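The stub-versus-full distinction for nodes can be sketched as a presentation step applied after traversal. The present_nodes function and the dictionary layout of a node are assumptions for illustration; edges and the predicates filter would follow the same pattern.

```python
from typing import List, Optional

# Illustrative nodes; the "metadata" layout is an assumption.
NODES = [
    {"id": "ex:drug1", "type": "drug", "metadata": {"name": "example drug"}},
    {"id": "ex:dis1", "type": "disease", "metadata": {"name": "example disease"}},
]


def present_nodes(nodes: List[dict],
                  node_types: Optional[List[str]] = None,
                  topology_only: bool = False) -> List[dict]:
    """Keep every node; decide only how much detail each one carries."""
    out = []
    for node in nodes:
        full = not topology_only and (node_types is None
                                      or node["type"] in node_types)
        if full:
            out.append(dict(node))  # full record, metadata included
        else:
            out.append({"id": node["id"], "type": node["type"]})  # stub
    return out


focused = present_nodes(NODES, node_types=["drug"])
skeleton = present_nodes(NODES, topology_only=True)
print(focused)
print(skeleton)
```

In both calls the node count is unchanged; only the amount of metadata differs.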

exclude_node_types is an optional list of entity type names to remove entirely from the result. Unlike node_types (which demotes non-matching nodes to stubs but keeps them), exclude_node_types removes the specified types and all edges that touch them. The topology is no longer guaranteed complete when this parameter is used -- that is the point. Use it to suppress high-volume types that dominate large traversals without adding conceptual value. The canonical use case is exclude_node_types=["paper", "author"] on a concept-oriented query: papers and authors are the connective tissue of a literature-derived graph and account for the majority of nodes in a deep traversal, but an LLM reasoning about disease mechanisms rarely needs them.
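In contrast to stubbing, exclusion removes nodes and every edge that touches them. A rough sketch, assuming nodes are dictionaries and edges are (subject, predicate, object) tuples; the IDs and the MENTIONS predicate are made up for the example.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)

NODES = [
    {"id": "ex:p1", "type": "paper"},
    {"id": "ex:a1", "type": "author"},
    {"id": "ex:d1", "type": "drug"},
    {"id": "ex:s1", "type": "disease"},
]
EDGES: List[Edge] = [
    ("ex:d1", "TREATS", "ex:s1"),
    ("ex:p1", "MENTIONS", "ex:d1"),   # hypothetical predicate
    ("ex:a1", "AUTHORED", "ex:p1"),
]


def exclude_types(nodes: List[dict], edges: List[Edge],
                  exclude_node_types: List[str]):
    """Drop nodes of the given types and any edge with a dropped endpoint."""
    kept = [n for n in nodes if n["type"] not in exclude_node_types]
    kept_ids = {n["id"] for n in kept}
    kept_edges = [e for e in edges if e[0] in kept_ids and e[2] in kept_ids]
    return kept, kept_edges


concept_nodes, concept_edges = exclude_types(NODES, EDGES, ["paper", "author"])
print([n["id"] for n in concept_nodes])  # ['ex:d1', 'ex:s1']
print(concept_edges)                     # [('ex:d1', 'TREATS', 'ex:s1')]
```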

min_mentions is an optional integer (default 1, no filtering) that removes nodes whose total_mentions field in metadata is below the threshold, along with all edges touching them. This suppresses low-confidence provisional entities that appear in only one or two source documents and are structurally present but semantically unreliable. Nodes without a total_mentions field are always included regardless of threshold, so the filter is safe on backends that do not populate it. Note that min_mentions filters the result, not the traversal -- a low-mention node can still serve as a bridge to high-mention nodes at deeper hops, but it will not appear in the returned result.
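The filtering rule, including the pass-through for nodes that lack a total_mentions field, can be sketched as follows. The apply_min_mentions helper and the node layout are assumptions; only the parameter name, the field name, and the rule itself come from the text.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)

NODES = [
    {"id": "a", "type": "disease", "metadata": {"total_mentions": 5}},
    {"id": "b", "type": "gene", "metadata": {"total_mentions": 1}},
    {"id": "c", "type": "drug", "metadata": {}},  # backend did not populate the field
]
EDGES: List[Edge] = [("a", "r", "b"), ("a", "r", "c")]


def apply_min_mentions(nodes: List[dict], edges: List[Edge],
                       min_mentions: int = 1):
    """Filter the result (not the traversal) by mention count."""

    def keep(node: dict) -> bool:
        mentions = node.get("metadata", {}).get("total_mentions")
        # Nodes without the field are always kept.
        return mentions is None or mentions >= min_mentions

    kept = [n for n in nodes if keep(n)]
    kept_ids = {n["id"] for n in kept}
    kept_edges = [e for e in edges if e[0] in kept_ids and e[2] in kept_ids]
    return kept, kept_edges


nodes2, edges2 = apply_min_mentions(NODES, EDGES, min_mentions=2)
print([n["id"] for n in nodes2])  # ['a', 'c']
print(edges2)                     # [('a', 'r', 'c')]
```

Because the filter runs on the finished result, a node such as "b" could still have carried the traversal to deeper hops before being dropped.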

limit and offset are optional integers for paginating large results. limit caps the number of nodes returned; offset skips the first N nodes. Together they allow an LLM to page through a large neighborhood without requesting everything at once. node_count and edge_count always reflect the full traversal regardless of pagination, so the LLM can see the total size and decide whether to request more pages. Edges are restricted to those with both endpoints in the returned node window, so each page is a self-consistent subgraph. When neither parameter is specified, the full result is returned unchanged.
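A sketch of the pagination rule, assuming a stable node order and a response represented as a plain dictionary. The paginate function is hypothetical; the field names node_count, edge_count, nodes, and edges follow the text.

```python
from typing import List, Optional, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)

NODES = [{"id": f"n{i}", "type": "concept"} for i in range(5)]
EDGES: List[Edge] = [(f"n{i}", "r", f"n{i + 1}") for i in range(4)]


def paginate(nodes: List[dict], edges: List[Edge],
             limit: Optional[int] = None, offset: int = 0) -> dict:
    """Return one page of nodes plus the edges fully inside that page."""
    end = None if limit is None else offset + limit
    window = nodes[offset:end]
    ids = {n["id"] for n in window}
    page_edges = [e for e in edges if e[0] in ids and e[2] in ids]
    return {
        "node_count": len(nodes),  # totals describe the full traversal
        "edge_count": len(edges),
        "nodes": window,
        "edges": page_edges,
    }


page = paginate(NODES, EDGES, limit=2, offset=1)
print([n["id"] for n in page["nodes"]])      # ['n1', 'n2']
print(page["edges"])                         # [('n1', 'r', 'n2')]
print(page["node_count"], page["edge_count"])  # 5 4
```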

The Flat Format

These parameters are passed as a flat JSON object. There is no nesting, no sub-query structure, no boolean expression language. The query either specifies seeds, a depth, and optional filters, or it doesn't. This flatness is a deliberate choice.

Query languages like SPARQL and GraphQL support arbitrarily nested structures because they need to -- they are designed to express complex constraints precisely. BFS-QL is not designed for precise constraint expression. It is designed for reliable generation by a language model. Every level of nesting in a query format is an opportunity for the model to make a structural error -- a misplaced bracket, a wrong level of indentation, a filter applied at the wrong scope. A flat format has no levels. The model either provides the parameter or it doesn't.

This is not a limitation on expressiveness. The parameters above cover the full space of what BFS-QL needs to express. The flatness is expressiveness appropriate to the operation.
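To make "flat" concrete, here is an example query object and a small check that no value is a nested object. The parameter names and the desmopressin seed ID appear elsewhere in this chapter; the is_flat helper is an illustrative assumption, not part of the protocol.

```python
import json

# Example query using parameter names from the text.
query = {
    "seeds": ["RxNorm:3251"],
    "max_hops": 2,
    "node_types": ["drug", "disease"],
    "predicates": ["TREATS"],
    "topology_only": False,
}


def is_flat(obj: dict) -> bool:
    """True if every value is a scalar or a list of scalars (no nested objects)."""
    scalars = (str, int, float, bool, type(None))
    for value in obj.values():
        if isinstance(value, scalars):
            continue
        if isinstance(value, list) and all(isinstance(v, scalars) for v in value):
            continue
        return False
    return True


print(is_flat(query))                         # True
print(is_flat({"seeds": [{"id": "x"}]}))      # False
print(json.dumps(query))
```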

Context Budget Management

The central design constraint of the query model is the context window. Every token in the response consumes context budget; too many tokens degrade reasoning. The query parameters are the mechanism for managing that budget.

The recommended query progression reflects this:

First: topology survey. Call bfs_query with topology_only=True and max_hops=2. This returns the complete structural skeleton of the neighborhood -- every node and edge -- at minimum token cost. For the medlit desmopressin example, this is 14,000 characters for 84 nodes and 99 edges. The LLM can read the full topology and identify what matters before committing context budget to metadata.

Second: selective expansion. Call describe_entities with the IDs of the nodes the topology survey identified as significant. This retrieves full metadata for multiple nodes in a single call. The LLM pays for exactly the information it has decided it needs, and nothing else. (The single-node describe_entity remains available for one-off lookups; use describe_entities when expanding several stubs at once.)

Third: targeted re-query. If a follow-up traversal is needed -- perhaps the topology survey revealed an unexpected cluster that warrants its own exploration -- issue a new bfs_query with node_types and predicates filters focused on what matters. The third query is more expensive than the first but more targeted: it retrieves full metadata only for the entity types and predicates the LLM has decided are relevant.

This progression from cheap-and-broad to expensive-and-targeted is the working set principle in practice. The first query establishes the topological working set. The second and third queries fill in detail selectively.

Alternative for concept-dense graphs. On large literature-derived graphs, a topology survey at max_hops=2 may itself exceed the context budget -- hundreds of paper and author nodes dominate the result. In this case, skip the topology survey and issue a direct concept-only query:

bfs_query(
    seeds=[seed_id],
    max_hops=1,
    exclude_node_types=["paper", "author"],
    min_mentions=2,
)

This returns only concept entities (diseases, genes, drugs, pathways, etc.) with 2 or more corpus mentions -- high-signal nodes with full metadata -- in a single in-band response. The breast cancer 1-hop query on the graphwright corpus returns 73 nodes and 86 edges this way, compared to 1,347 nodes in the unfiltered 2-hop result. Use max_hops=1 as the default and expand to 2 only if the 1-hop result is too sparse.

Multi-Seed Queries

The multi-seed case deserves more attention than it typically receives, because it is the natural form for a large class of clinically and scientifically interesting questions.

bfs_query with multiple seeds returns the union of their neighborhoods, deduplicated. This is useful for many questions: "What connects this disease to this gene?" returns the combined neighborhood of both seeds, and the structural answer -- the nodes that appear in both halves of the union -- is present in the result for an LLM to inspect. For small result sets, this works well.

For larger graphs, union-and-inspect becomes unreliable. When each seed's 1-hop neighborhood contains hundreds of nodes, asking the LLM to identify which nodes appear in both is structured bookkeeping that language models do poorly -- they miss nodes, conflate similar IDs, and produce inconsistent results. This is the problem intersect_subgraphs solves: it returns only the nodes within k hops of every seed, without the LLM performing any manual set operations.

The medlit example illustrates the bfs_query case. A 1-hop multi-seed query from desmopressin (RxNorm:3251) and Cushing syndrome (MeSH:D003480) returns 35 nodes and 37 edges. Of those, exactly two nodes are in the direct neighborhood of both seeds: PMC11128938, the paper that co-describes both entities, and DBPedia:Cushing's_disease, the specific disease subtype that desmopressin treats. For a 36-paper graph at 1-hop depth, the LLM can inspect the union reliably. For a larger graph or deeper traversal, intersect_subgraphs is the right tool.
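The difference between the union that bfs_query returns and the intersection that intersect_subgraphs returns can be shown on a toy graph shaped like the example above: two seeds that share exactly two neighbors. The IDs, predicates, and helper functions are all hypothetical, and edges are assumed to be followed in both directions.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)

EDGES: List[Edge] = [
    ("ex:drug", "TREATS", "ex:subtype"),
    ("ex:paper", "MENTIONS", "ex:drug"),
    ("ex:paper", "MENTIONS", "ex:syndrome"),
    ("ex:subtype", "SUBTYPE_OF", "ex:syndrome"),
    ("ex:drug", "TESTED_IN", "ex:trial"),
    ("ex:gene", "ASSOCIATED_WITH", "ex:syndrome"),
]


def build_neighbors(edges: List[Edge]) -> Dict[str, Set[str]]:
    neighbors: Dict[str, Set[str]] = {}
    for subj, _, obj in edges:
        neighbors.setdefault(subj, set()).add(obj)
        neighbors.setdefault(obj, set()).add(subj)
    return neighbors


def within_hops(neighbors: Dict[str, Set[str]], seed: str, k: int) -> Set[str]:
    """All nodes at most k hops from seed, including the seed itself."""
    seen = {seed}
    frontier = {seed}
    for _ in range(k):
        frontier = {n for f in frontier for n in neighbors.get(f, ())} - seen
        seen |= frontier
    return seen


def union_and_intersection(edges: List[Edge], seeds: List[str], k: int):
    """Assumes at least one seed."""
    neighbors = build_neighbors(edges)
    hoods = [within_hops(neighbors, s, k) for s in seeds]
    return set().union(*hoods), set.intersection(*hoods)


union, common = union_and_intersection(EDGES, ["ex:drug", "ex:syndrome"], 1)
print(len(union))      # 6
print(sorted(common))  # ['ex:paper', 'ex:subtype']
```

On six nodes the intersection is easy to spot by eye; the server-side set operation matters when each neighborhood holds hundreds of IDs.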

What the Response Contains

A BFS-QL response contains:

  • seeds: The seed IDs used. Included for reference -- in a multi-turn session, the LLM may need to recall which seeds were used for a given result.
  • max_hops: The depth used.
  • node_count and edge_count: Total counts. These are useful for calibrating follow-up queries -- a result with 200 nodes warrants a more targeted re-query than a result with 15.
  • nodes: A list of node records. Each is either a full Node (with metadata) or a stub EntityStub (ID and type only), depending on whether its type matched node_types.
  • edges: A list of edge records. Each is either a full EdgeWithMetadata (with confidence, source documents, and provenance) or a bare Edge (subject, predicate, object only), depending on whether its predicate matched predicates.
  • schema_summary: The entity types and predicates actually present in this result subgraph, regardless of the filters applied. See the next section.

One design choice worth noting: under node_types filtering, stub nodes are always included. If a Disease node is present in the topology but node_types=["drug"], the Disease node appears as a stub -- ID and type, no metadata. It is not omitted. The topology stays complete unless exclude_node_types, min_mentions, or pagination is used to remove nodes explicitly. This is the separation of topology from presentation that Chapter 3 argued for: filtering controls detail level, not presence.

Schema Discovery in Results

Every BFS-QL query response includes a schema_summary field containing the entity types and predicates actually present in that result subgraph. This applies to both bfs_query and intersect_subgraphs. This is a first-class feature, not an implementation detail.

"schema_summary": {
  "entity_types_found": ["disease", "drug", "gene", "paper"],
  "predicates_found": ["associated_with", "targets", "treats"]
}

The value of schema_summary is especially clear in two situations.

Large or open-world graphs. describe_schema may return comprehensive=False when the graph is too large to enumerate entity types and predicates exhaustively -- a Wikidata endpoint, for instance, has thousands of predicates that cannot all be listed upfront. In this case, the LLM cannot know what filters are valid before issuing a query. schema_summary solves the problem by reporting the vocabulary actually present in the neighborhood. After a topology_only survey, the LLM can read schema_summary and use those values as node_types and predicates filters in a targeted follow-up query. No documentation needed, no guessing at predicate names.

Paginated results. When limit and offset are used to page through a large traversal, schema_summary always reflects the full traversal, not just the current page. The LLM sees the complete vocabulary of the neighborhood even if it is only reading a window of nodes. This matters because the decision about which types and predicates to filter on should be made with knowledge of the whole subgraph, not just the first page.

schema_summary closes the loop that describe_schema opens. Together they ensure an LLM always has valid filter values available, whether from the static schema at startup or from the live vocabulary of a result.
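A sketch of how such a summary might be derived, computed over the full traversal before any page is cut. The schema_summary function, the node and edge layout, and the sorted ordering are assumptions; the two field names match the example shown earlier.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)

NODES = [
    {"id": "ex:1", "type": "paper"},
    {"id": "ex:2", "type": "drug"},
    {"id": "ex:3", "type": "disease"},
    {"id": "ex:4", "type": "gene"},
]
EDGES: List[Edge] = [
    ("ex:2", "treats", "ex:3"),
    ("ex:2", "targets", "ex:4"),
    ("ex:4", "associated_with", "ex:3"),
]


def schema_summary(nodes: List[dict], edges: List[Edge]) -> Dict[str, List[str]]:
    """Collect the vocabulary actually present in a result subgraph."""
    return {
        "entity_types_found": sorted({n["type"] for n in nodes}),
        "predicates_found": sorted({e[1] for e in edges}),
    }


# Summarize the full traversal first, then hand out only a page of nodes.
summary = schema_summary(NODES, EDGES)
first_page = NODES[:2]

print(summary["entity_types_found"])  # ['disease', 'drug', 'gene', 'paper']
print(summary["predicates_found"])    # ['associated_with', 'targets', 'treats']
print([n["type"] for n in first_page])  # ['paper', 'drug']
```

The page contains only two types, yet the summary still lists all four, which is what lets the LLM choose filters with the whole neighborhood in view.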

Name Disambiguation in search_entities

search_entities accepts a node_types parameter that restricts results to entities of the specified types. This exists to address a common disambiguation problem.

Common scientific terms match multiple entity types. "Breast cancer" matches the disease concept (MeSH:D001943) and also dozens of papers whose titles contain the phrase. When an LLM calls search_entities to resolve a disease name, it typically wants the disease concept, not the papers. Without node_types, the results may be dominated by papers; the disease entity may not appear in the top results at all.

```python
search_entities("breast cancer", node_types=["disease"])
```