Chapter08
Chapter 8: The GraphDbInterface ABC¶
There is a tension in interface design between expressiveness and simplicity. An expressive interface gives the caller more power -- more ways to phrase a request, more control over execution, more options to tune. A simple interface gives the implementor less to build and the caller fewer things to get wrong. The usual engineering response is to find a balance, to offer enough expressiveness without crossing into complexity.
BFS-QL resolves this tension by separating it into two layers. The
GraphDbInterface abstract base class is the implementor-facing interface.
It is deliberately primitive -- eight methods, all basic graph navigation,
no traversal intelligence. The BFS-QL server layer is the caller-facing
interface, where all the expressiveness lives: multi-seed BFS, stub/full
filtering, topology mode, caching, the full six-tool surface. These are
not in tension because they are in different places. The ABC is simple so
that backends are easy to write. The server is expressive because it has to
be. Neither layer compromises the other.
Eight Methods¶
The complete interface:
class GraphDbInterface(ABC):
@abstractmethod
async def search_entities(
self, query: str, node_types: list[str] | None = None
) -> list[EntityStub]:
"""Resolve a name or alias to candidate stubs."""
@abstractmethod
async def edges_from(self, entity_id: str) -> list[Edge]:
"""Return all outgoing edges from the given entity."""
@abstractmethod
async def edges_to(self, entity_id: str) -> list[Edge]:
"""Return all incoming edges to the given entity."""
@abstractmethod
async def get_node(self, entity_id: str) -> Node:
"""Return the node record for the given entity ID."""
@abstractmethod
async def metadata_for_node(
self, entity_id: str
) -> dict[str, Any]:
"""Return all available metadata for the entity."""
@abstractmethod
async def metadata_for_edge(
self, edge: Edge
) -> dict[str, Any]:
"""Return full metadata, including provenance."""
@abstractmethod
async def entity_types(self) -> list[str]:
"""Return valid entity type names in this graph."""
@abstractmethod
async def predicates(self) -> list[str]:
"""Return valid predicate names in this graph."""
Three pairs and two singletons. edges_from / edges_to are the traversal
primitives -- directed graph navigation in both directions. get_node /
metadata_for_node separate identity from detail: the first returns just
the entity ID and type, the second returns everything else. metadata_for_edge
pairs with the traversal primitives to provide full provenance. Then the two
singletons: search_entities maps natural-language names to canonical IDs,
and entity_types / predicates expose the graph's own vocabulary.
The separation of get_node from metadata_for_node is deliberate. During
BFS expansion, the engine calls both concurrently for nodes that need full
records, but calls only get_node for nodes that will become stubs. A backend
that fetches metadata lazily -- or from a separate service -- can implement
both cheaply without conflating the two concerns.
What the Interface Does Not Contain¶
There is no bfs_query method. There is no count_neighbors method. There
is no find_shortest_path method. There is no filter parameter, no hop limit,
no traversal mode.
All of that is in the server layer. The bfs_query function in engine.py
implements multi-seed BFS entirely in terms of edges_from, edges_to,
get_node, and metadata_for_node. The stub/full decision is made at the
server layer. The topology-only mode is a serialization choice at the
server layer. The caching layer wraps the backend transparently.
The consequence is that a backend implementor answers only one question: how do I perform these eight operations against this particular graph store? Not: how do I implement BFS? Not: how do I cache edge lists efficiently? Not: how do I serialize results? Those questions are already answered, once, in the server layer, and the answers apply to every backend automatically.
The Caching Layer¶
CachedGraphDb wraps any backend in a dict-keyed cache at the primitive
level. Every call to edges_from(id) checks a dict before hitting the backend.
Every metadata_for_node(id) is cached after the first fetch. entity_types
and predicates are cached indefinitely -- they are stable for the lifetime
of a session.
class CachedGraphDb(GraphDbInterface):
def __init__(
self, backend: GraphDbInterface, maxsize: int = 1024
) -> None:
self._backend = backend
self._edges_from_cache: dict[str, list[Edge]] = {}
self._node_meta_cache: dict[str, dict[str, Any]] = {}
# ... per-method dicts for all eight methods
async def edges_from(self, entity_id: str) -> list[Edge]:
if entity_id not in self._edges_from_cache:
res = await self._backend.edges_from(entity_id)
self._edges_from_cache[entity_id] = res
return self._edges_from_cache[entity_id]
The critical property is that caching operates at the level where it pays. BFS traversal at depth 2 may visit the same node from multiple directions. Without caching, each visit triggers a backend round-trip. With primitive-level caching, the second visit is a dict lookup. The backend sees each distinct call at most once per session. A multi-hop traversal over a well-connected graph can reduce backend round-trips by an order of magnitude.
Because the cache is transparent -- CachedGraphDb implements
GraphDbInterface -- backends do not implement caching themselves. They
return fresh data on every call. The server layer decides caching policy.
Backends stay simple.
All Methods Are Async¶
Every method in GraphDbInterface is async. This is not a formality.
BFS expansion calls edges_from and edges_to for every node in the
current frontier concurrently, via asyncio.gather. A 2-hop BFS over
a frontier of 40 nodes issues 80 concurrent edge queries. Against a
Postgres backend, these resolve as concurrent connection pool requests.
Against a SPARQL endpoint, they resolve as concurrent HTTP requests.
Against a Neo4j backend, they resolve as concurrent Bolt protocol calls.
A synchronous interface would serialize this work unnecessarily. The async
design is a performance contract: backends that can serve concurrent queries
concurrently will. Backends that cannot (in-memory dicts, file-backed stores)
pay no penalty -- async def with no await inside is just a regular
function in async clothing.