Chapter 11: The Neo4j Backend
Neo4j is a property graph database, not an RDF store. The distinction matters for the implementation, though not for the BFS-QL interface. Where RDF graphs represent everything as triples of URIs and literals, property graphs attach key-value pairs directly to nodes and relationships. A node in Neo4j has a label (or multiple labels) and a set of properties. A relationship has a type and a set of properties. There are no blank nodes; everything is either a node or a named relationship.
For BFS-QL, the mapping from the property graph model to the
GraphDbInterface is direct:
- Node labels → entity_type
- Relationship types → predicates
- Node identity (Neo4j's internal ID or a canonical ID property) → entity ID
- Node properties → metadata_for_node
- Relationship properties → metadata_for_edge
The implementation requires one configuration decision: which node property
holds the canonical ID. In a kgraph-derived Neo4j graph, this would be
entity_id. In a general Neo4j graph, it might be id, uri, name,
or something domain-specific. The backend is initialized with the canonical
ID property name.
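Cypher parameters can stand in for values, but not for the property key in a pattern such as `{entity_id: $id}`, so a configurable ID property has to be spliced into the query text. The following is a minimal sketch of how that might be done safely. The helper names `validate_id_property` and `build_edges_from_query` are hypothetical, and the identifier rule is a conservative assumption of this sketch rather than a Neo4j requirement.

```python
import re

# Conservative pattern for a property name that is safe to splice into Cypher
# unquoted. This rule is an assumption of the sketch, not a Neo4j requirement.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_id_property(name: str) -> str:
    """Return name unchanged if it is a plain identifier, else raise ValueError."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe canonical ID property name: {name!r}")
    return name


def build_edges_from_query(id_property: str) -> str:
    """Build an edges_from query for the configured ID property (hypothetical helper)."""
    prop = validate_id_property(id_property)
    # The property key is interpolated; the entity ID itself stays a parameter.
    return (
        f"MATCH (n {{{prop}: $id}})-[r]->(m) "
        f"RETURN n.{prop} AS subject, type(r) AS predicate, m.{prop} AS object"
    )
```

Validating once at initialization keeps the per-query path free of string checks, and the entity ID continues to travel as the `$id` parameter rather than as query text.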
Cypher Traversal
edges_from and edges_to are natural Cypher traversals:
// edges_from(entity_id)
MATCH (n {entity_id: $id})-[r]->(m)
RETURN n.entity_id AS subject,
       type(r) AS predicate,
       m.entity_id AS object

// edges_to(entity_id)
MATCH (n)-[r]->(m {entity_id: $id})
RETURN n.entity_id AS subject,
       type(r) AS predicate,
       m.entity_id AS object
type(r) returns the relationship type as a string, which becomes the
predicate. Neo4j relationship types are uppercase by convention (TREATS,
INHIBITS, ASSOCIATED_WITH); BFS-QL predicates are lowercase by
convention. The backend normalizes to lowercase at query time.
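One way the normalization could look, as a sketch: the rows returned by the traversal queries are converted to triples and the predicate is lowercased on the way out. The names `normalize_predicate` and `rows_to_edges` are hypothetical, and rows are assumed to be dict-like with `subject`, `predicate`, and `object` keys. The same effect could be had inside Cypher with `toLower(type(r))`.

```python
from typing import Iterable, List, Mapping, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def normalize_predicate(rel_type: str) -> str:
    """Map a Neo4j relationship type such as 'ASSOCIATED_WITH' to 'associated_with'."""
    return rel_type.lower()


def rows_to_edges(rows: Iterable[Mapping[str, str]]) -> List[Edge]:
    """Convert result rows with subject/predicate/object keys into edge triples."""
    return [
        (row["subject"], normalize_predicate(row["predicate"]), row["object"])
        for row in rows
    ]
```

Going the other way, from a lowercase predicate back to a relationship type, only round-trips with a plain `upper()` if the graph actually follows the uppercase convention; a graph with mixed-case types would need a lookup table instead.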
Full-Text Search
search_entities in Neo4j requires a full-text index. Postgres can fall
back to ILIKE, and a SPARQL endpoint can use CONTAINS on labels. Cypher
does have a CONTAINS operator, but it is case-sensitive and returns no
relevance score, so it is not a substitute for ranked, tokenized search
over node properties. A full-text index must be created at graph
construction time.
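As a sketch of what that construction-time step might look like, the helper below builds a `CREATE FULLTEXT INDEX` statement in the syntax available from Neo4j 4.3. The index name `entity_names` matches the search query in this section. The helper `fulltext_index_statement` is hypothetical, and the label `Entity` and property `name` in the usage line are assumptions about the graph's schema.

```python
from typing import Sequence


def fulltext_index_statement(
    index_name: str, labels: Sequence[str], properties: Sequence[str]
) -> str:
    """Build a CREATE FULLTEXT INDEX statement (Neo4j 4.3+ syntax)."""
    if not labels or not properties:
        raise ValueError("at least one label and one property are required")
    label_part = "|".join(labels)  # (n:A|B) indexes nodes carrying either label
    prop_part = ", ".join(f"n.{p}" for p in properties)
    return (
        f"CREATE FULLTEXT INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label_part}) ON EACH [{prop_part}]"
    )


# Assumed schema: nodes labelled Entity with a searchable name property.
statement = fulltext_index_statement("entity_names", ["Entity"], ["name"])
```

`IF NOT EXISTS` makes the statement safe to rerun, which suits a graph-construction script that may be executed more than once.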
With the index in place, search_entities becomes:
CALL db.index.fulltext.queryNodes("entity_names", $query)
YIELD node, score
RETURN node.entity_id AS id, labels(node)[0] AS entity_type
ORDER BY score DESC
LIMIT 10
The index requirement is a constraint on graph construction, not on
BFS-QL. A Neo4j graph served through BFS-QL must have the index; graphs
without it cannot support search_entities. The backend checks for index
existence at initialization and raises a clear error if it is missing,
rather than failing silently at query time.
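The initialization check could be sketched as follows, assuming Neo4j 4.3 or later, where `SHOW FULLTEXT INDEXES` is available. The `run_query` callable, the `MissingFullTextIndexError` type, and the function name are all hypothetical; `run_query` stands in for whatever the backend uses to execute Cypher and is assumed to return rows that support lookup by column name.

```python
from typing import Callable, Iterable, Mapping

# Hypothetical query runner: takes Cypher text and returns rows as mappings.
RunQuery = Callable[[str], Iterable[Mapping[str, object]]]


class MissingFullTextIndexError(RuntimeError):
    """Raised when the configured full-text index is absent (hypothetical type)."""


def require_fulltext_index(run_query: RunQuery, index_name: str) -> None:
    """Fail fast unless a full-text index with the given name exists."""
    rows = run_query("SHOW FULLTEXT INDEXES YIELD name RETURN name")
    names = {row["name"] for row in rows}
    if index_name not in names:
        raise MissingFullTextIndexError(
            f"full-text index {index_name!r} not found; "
            "search_entities needs it, so create it before starting the backend"
        )
```

Filtering the index names in Python rather than in the `SHOW` clause keeps the Cypher text constant and the check easy to test with a stub runner.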
entity_types and predicates
// entity_types()
CALL db.labels() YIELD label RETURN label ORDER BY label

// predicates()
CALL db.relationshipTypes() YIELD relationshipType
RETURN relationshipType ORDER BY relationshipType
Neo4j's db.labels() and db.relationshipTypes() procedures return the
complete label and relationship type vocabularies without scanning the
graph. They are fast, stable, and the natural implementation of
entity_types and predicates. No SELECT DISTINCT required.