1.011 Graph Database Clients#


Explainer

Graph Database Clients: For Technical Decision Makers#

Purpose: Help CTOs, architects, and product managers understand Python graph database client libraries without deep graph theory expertise.

Audience: Technical leaders evaluating graph database adoption, teams planning migrations, developers choosing between query languages.


What This Solves#

Graph database client libraries solve the connection and interaction problem between your Python application and graph database systems.

The Core Problem: You have relationship-heavy data (social networks, fraud rings, supply chains, knowledge graphs) that traditional SQL databases handle poorly. Graph databases excel at traversing multi-hop relationships, but you need a way for your Python code to:

  • Send queries to the graph database
  • Receive and process results efficiently
  • Manage connections, transactions, and errors
  • Abstract away database-specific protocols

Who Encounters This:

  • Startups building social features, recommendation engines, or knowledge bases
  • Enterprise teams implementing fraud detection, network analysis, or supply chain optimization
  • Data scientists constructing knowledge graphs for LLM-powered applications (GraphRAG)
  • SaaS developers adding relationship-driven features to existing products

Why It Matters: Choosing the wrong client library locks you into a specific database vendor, query language, and ecosystem. Migration costs can reach hundreds of thousands of dollars in engineering time. The choice made today determines your flexibility, operational costs, and feature velocity for years.


Accessible Analogies#

The Translator Analogy#

Think of your graph database as a foreign city, and the client library as your tour guide/translator:

  • Neo4j driver (Cypher): A local expert who speaks the native language fluently, knows all the shortcuts, and has deep cultural knowledge. Fastest and most natural, but only works in this one city.

  • gremlinpython (Gremlin): A professional translator who works across multiple cities in the same region (TinkerPop family). Slightly less fluent in each individual dialect, but you can move between cities (Neptune, Cosmos DB, JanusGraph) without finding a new guide.

  • TigerGraph (GSQL): A hyper-specialized guide for a unique city with its own invented language. Incredibly effective for that specific place, but you can’t take this guide anywhere else—total lock-in.

The Filing System Analogy#

Imagine organizing a library:

  • Relational databases (SQL): Books organized in strict categories with catalog cards. Finding related books means walking to different sections, checking multiple cards (JOINs). Slow for “show me all mystery novels written by authors who also wrote sci-fi and were influenced by Author X.”

  • Graph databases: Every book has strings connecting it to related books, authors, and genres. Following those strings (traversals) is instant. The client library is the tool that lets you pull strings and read the labels.
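The "pull the strings" idea maps directly onto how a graph traversal works. A toy sketch in plain Python (an in-memory adjacency dict standing in for the database; names are illustrative) shows why following edges is cheap compared to repeated JOINs:

```python
from collections import deque

# Toy in-memory graph: each node maps to the nodes its "strings" reach
graph = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave", "Eve"],
    "Dave": ["Frank"],
    "Eve": [],
    "Frank": [],
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal: everyone reachable within max_hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    reachable = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                reachable.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return reachable

print(within_hops(graph, "Alice", 2))  # friends and friends-of-friends
```

Each hop is a constant-time pointer follow; a graph database does the same thing at storage level, which is why traversal depth barely affects cost.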

The Protocol Translator#

Your Python code speaks Python. Your graph database speaks a specialized protocol (Bolt, WebSocket, gRPC, HTTP). The client library is the interpreter that:

  • Translates your session.run("MATCH (n:User) RETURN n") into Bolt binary protocol
  • Manages the TCP connection pool
  • Deserializes results back into Python objects

Without it, you’d be manually crafting binary packets—impractical and error-prone.


When You Need This#

✅ You Need Graph Database Clients If:#

Relationship depth matters (3+ hops):

  • “Find friends-of-friends-of-friends who like this product” (social networks)
  • “Trace transaction chains 5 levels deep” (fraud detection)
  • “Show supply chain impact 4 tiers upstream” (risk analysis)

Pattern matching drives value:

  • Detecting rings, cycles, or suspicious subgraph structures
  • Knowledge graph reasoning (entity → relationship → entity chains)
  • Network influence propagation

Data model is naturally a graph:

  • Org charts with complex reporting structures
  • Infrastructure dependency maps
  • Recommendation engines with collaborative filtering

❌ You DON’T Need This If:#

Simple CRUD operations:

  • User profiles with basic lookups (use PostgreSQL)
  • Document storage without complex relationships (use MongoDB)
  • Time-series data (use InfluxDB, TimescaleDB)

All relationships are 1-2 hops:

  • Basic “user has many orders” (SQL foreign keys suffice)
  • Simple hierarchies (category → subcategory → product)

You’re already solving it well:

  • PostgreSQL with recursive CTEs handling your traversals adequately
  • Current solution meets performance SLAs and you’re not hitting scaling issues
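For reference, the "already solving it well" case often looks like a recursive CTE. A self-contained sketch using stdlib sqlite3 (table and column names are illustrative) finds friends within two hops:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendships (person TEXT, friend TEXT);
    INSERT INTO friendships VALUES
        ('Alice', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Dave');
""")

# Friends within 2 hops of Alice, via a recursive CTE
rows = conn.execute("""
    WITH RECURSIVE reachable(person, depth) AS (
        SELECT friend, 1 FROM friendships WHERE person = 'Alice'
        UNION
        SELECT f.friend, r.depth + 1
        FROM friendships f JOIN reachable r ON f.person = r.person
        WHERE r.depth < 2
    )
    SELECT DISTINCT person FROM reachable
""").fetchall()
print(sorted(row[0] for row in rows))  # ['Bob', 'Carol']
```

This scales fine for shallow, occasional traversals; it is at deeper hops and larger fan-outs that the self-join cost motivates a dedicated graph database.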

Decision Criteria: When to Upgrade to Graph#

| Your Situation | Recommendation |
| --- | --- |
| 10M+ nodes, 4+ hop traversals regularly | Dedicated graph DB (Neo4j, Neptune) |
| <1M nodes, occasional 3-hop queries | PostgreSQL with AGE extension (graph as feature) |
| Multi-model needs (graph + documents) | ArangoDB (BSL license limits apply) |
| AWS-committed, managed service preferred | Amazon Neptune (Gremlin/openCypher) |
| Need semantic reasoning (RDF triples) | rdflib (SPARQL ecosystem) |

Trade-offs#

Query Language Lock-in Spectrum#

Most Portable ←----------------------------------→ Most Locked-In
   Gremlin          Cypher/GQL          GSQL         Proprietary
   (multi-DB)       (converging)      (TigerGraph)    (single vendor)

Gremlin (gremlinpython) - Portability Choice:

  • Pros: Works across Neptune, Cosmos DB, JanusGraph, DataStax
  • Pros: Apache governance, multi-vendor PMC
  • Cons: Imperative style is harder to read than declarative Cypher
  • Cons: Doesn’t work with Neo4j (market leader)

Cypher (neo4j) - Developer Experience Choice:

  • Pros: Most readable query language, largest community (44% market share)
  • Pros: Converging to ISO GQL standard (future-proofing)
  • Pros: Best GraphRAG and LLM integration (2025+)
  • Cons: Neo4j dominance means de facto lock-in
  • Cons: GQL standardization timeline uncertain (2026-2028 expected)

GSQL (pyTigerGraph) - Performance Choice:

  • Pros: Best performance for deep analytics (10+ hop traversals)
  • Pros: Turing-complete query language (complex algorithms)
  • Cons: Complete vendor lock-in (portability score: 2/10)
  • Cons: Smaller ecosystem and hiring pool
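To make the declarative-vs-imperative contrast concrete, here is the same "friends of Alice" lookup as a Cypher string and as Gremlin-style traversal steps (shown as plain strings for comparison only; executing them requires the respective driver, and label/edge names are illustrative):

```python
# Declarative Cypher: describe the pattern, the engine plans the traversal
cypher = """
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend)
RETURN friend.name
"""

# Imperative Gremlin: spell out each traversal step yourself
gremlin = "g.V().has('Person', 'name', 'Alice').out('knows').values('name')"

print(cypher.strip())
print(gremlin)
```

The Cypher form reads like a pattern to match; the Gremlin form reads like a pipeline of steps. Teams tend to find the former easier to review, and the latter easier to port across TinkerPop databases.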

OGM (Object-Graph Mapper) vs Raw Driver#

OGM (neomodel) - Productivity:

  • ✅ Django-style models, faster CRUD development (30-50% time savings)
  • ❌ Hides query details, can obscure performance issues
  • ❌ Adds abstraction overhead

Raw Driver (neo4j, gremlinpython) - Control:

  • ✅ Full query optimization control
  • ✅ No abstraction overhead
  • ❌ More boilerplate code for simple CRUD

Recommendation: Start with raw driver, add OGM if CRUD patterns dominate.

Build vs Buy (Self-Hosted vs Managed)#

Self-Hosted (Community Edition):

  • ✅ Free for Neo4j Community, no usage limits
  • ✅ Full control over infrastructure
  • ❌ Operational burden (backups, scaling, monitoring)
  • ❌ No enterprise support or high availability (Community)

Managed Services (AuraDB, Neptune, Cosmos):

  • ✅ Zero operational overhead
  • ✅ Built-in backups, scaling, monitoring
  • ❌ $65-2,000+/month (can reach $10K+ at scale)
  • ❌ Cloud vendor lock-in (migration complexity)

Cost Considerations#

Direct Costs#

| Service | Entry Price | Scale Price | Notes |
| --- | --- | --- | --- |
| Neo4j AuraDB | $65/month | $2,000+/month | Managed, auto-scaling |
| Amazon Neptune | $0.10/hour + storage | Variable | Pay per instance + I/O |
| Self-hosted Neo4j Community | $0 | Infrastructure only | No clustering/HA |
| Self-hosted Neo4j Enterprise | $36K+/year | License + infra | HA, support, advanced features |

Hidden Costs#

Learning Curve:

  • Cypher/Gremlin training: 1-2 weeks per developer
  • Graph modeling: 2-4 weeks for team to shift thinking
  • Query optimization: Ongoing (graph queries need different tuning than SQL)

Migration Lock-in Cost:

  • Cypher to Gremlin: 3-6 months for medium codebase (all queries rewritten)
  • GSQL to anything: 6-12 months (proprietary language, no migration tools)
  • Data export/import: 1-4 weeks depending on volume and schema complexity

Opportunity Cost Examples:

  • Choosing TigerGraph for a small project: If you outgrow GSQL lock-in, migration cost = $200K-500K in engineering time
  • Skipping GraphRAG opportunity: Companies report 300-320% ROI on knowledge graph implementations (LinkedIn: 63% ticket resolution improvement)

ROI Break-Even Analysis#

When graph DB pays for itself:

  • Query performance gains: 10-100x faster for 3+ hop traversals
  • Developer productivity: 40% faster feature delivery for relationship-heavy features (anecdotal, industry reports)
  • Fraud detection improvements: 50% better detection rates (finance industry data)

Typical payback period: 6-18 months for teams with clear relationship-heavy use cases.


Implementation Reality#

First 90 Days: What to Expect#

Weeks 1-2: Learning & Proof of Concept

  • Install local Neo4j or use AuraDB free tier
  • Team learns Cypher/Gremlin basics (online tutorials: 5-10 hours per developer)
  • Model one core use case (e.g., user → friend → product recommendation)
  • Build simple prototype showing multi-hop traversal

Weeks 3-6: Production Architecture

  • Choose managed vs self-hosted (decision driven by team expertise)
  • Set up connection pooling, error handling, retry logic
  • Migrate subset of production data
  • Benchmark queries against existing SQL approach (expect 10-50x improvement for graph queries)
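The "retry logic" item above is usually a thin wrapper like the following stdlib-only sketch. The transient-error class and delays are placeholders; real drivers ship their own retryable error types and managed retries (Neo4j's `execute_read`/`execute_write`, for example), which you should prefer when available:

```python
import time
import random

class TransientError(Exception):
    """Placeholder for a driver's retryable error type."""

def run_with_retry(operation, retries=5, base_delay=0.05):
    """Retry a callable on transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return operation()
        except TransientError:
            if attempt == retries - 1:
                raise
            # Back off exponentially, with jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Demo: an operation that fails twice before succeeding
attempts = {"count": 0}
def flaky_query():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("connection reset")
    return ["Alice", "Bob"]

print(run_with_retry(flaky_query))
```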

Weeks 7-12: Integration & Optimization

  • Integrate with existing Python services
  • Tune indexes (graph indexes work differently than SQL—expect learning curve)
  • Handle schema evolution (graph schemas are more flexible, but migrations still needed)
  • Monitor performance under load

Team Skill Requirements#

Essential:

  • Comfortable with Python async/await patterns (if using async clients)
  • Willingness to learn declarative query language (Cypher is SQL-like, Gremlin is more programmatic)
  • Graph modeling mindset (thinking in nodes/edges vs tables/rows)

Nice to Have:

  • Prior experience with NoSQL databases (helps with schema flexibility concepts)
  • Understanding of graph algorithms (PageRank, community detection—libraries often provide these)

Hiring Impact:

  • Cypher developers: Moderate pool (Neo4j dominance means growing talent base)
  • Gremlin developers: Smaller pool (more specialized)
  • GSQL developers: Very small pool (lock-in includes talent availability)

Common Pitfalls#

1. “All graph databases are the same”

  • Reality: Neo4j (Cypher), Neptune (Gremlin), TigerGraph (GSQL) are fundamentally different query languages and data models.
  • Avoidance: Evaluate client library portability upfront.

2. “We can switch databases later”

  • Reality: Query language lock-in is real. Cypher → Gremlin migration = months of rewriting.
  • Avoidance: Choose based on 5-year horizon, not 6-month prototype needs.

3. “Graph databases are slow for everything”

  • Reality: 10-100x faster for multi-hop traversals, but slower for simple key-value lookups than Redis.
  • Avoidance: Use graph DBs for graph queries, not as a general-purpose database.

4. “OGMs always improve productivity”

  • Reality: OGMs excel for CRUD, but complex traversals often need raw Cypher/Gremlin for control.
  • Avoidance: Start with raw driver, add OGM selectively for CRUD-heavy patterns.

5. “Gremlin works everywhere”

  • Reality: Gremlin is TinkerPop-family only (Neptune, Cosmos, JanusGraph). Neo4j does not support Gremlin, and TigerGraph support is limited at best.
  • Avoidance: Verify database compatibility before committing to Gremlin.

Success Metrics (Realistic Expectations)#

Performance:

  • 3-hop queries: 50-100ms (vs 500-2000ms in SQL with JOINs)
  • 5-hop queries: 200-500ms (vs timeouts in SQL)

Development Velocity:

  • Feature delivery for relationship-heavy features: 30-50% faster (once team is trained)
  • Query writing: Initially slower (learning curve), then faster (more expressive language)

Operational:

  • Managed service uptime: 99.9%+ (AuraDB, Neptune SLAs)
  • Self-hosted: Depends on team expertise (expect 2-4 weeks to stabilize production setup)

Common Misconceptions#

“Graph databases replace all databases”#

Reality: Graph DBs are specialized tools. Use them for relationship-heavy queries, not for simple CRUD, time-series, or full-text search. Most production systems use graph DBs alongside PostgreSQL, Redis, and Elasticsearch.

“GQL will solve all portability issues”#

Reality: ISO GQL standard (2024) is a major step forward, but:

  • Full vendor adoption: 2026-2028 expected
  • Gremlin is a separate family (no GQL convergence planned)
  • Legacy codebases won’t auto-migrate

“Self-hosting is always cheaper”#

Reality: Managed services ($65-2K/month) often cheaper than:

  • DevOps salary ($120K+/year)
  • Downtime costs (hours of engineer time debugging crashes)
  • Backup/DR infrastructure

Calculate total cost of ownership, not just license fees.
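That TCO comparison reduces to back-of-the-envelope arithmetic. The sketch below uses the illustrative figures quoted in this section (not vendor pricing), with an assumed quarter-time DevOps allocation and infrastructure spend:

```python
# Annual cost sketch: managed service vs self-hosted (figures from the text above)
managed_monthly = 2_000          # upper-end managed tier
managed_annual = managed_monthly * 12

devops_fraction = 0.25           # quarter of a $120K/year DevOps engineer
infra_annual = 6_000             # assumed self-hosted infrastructure spend
self_hosted_annual = 120_000 * devops_fraction + infra_annual

print(f"Managed:     ${managed_annual:,}/year")
print(f"Self-hosted: ${self_hosted_annual:,.0f}/year")
# Under these assumptions, managed comes out cheaper despite the sticker price
```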


Decision Framework for Stakeholders#

Primary Recommendation: Neo4j + Cypher#

Choose neo4j (official driver) when:

  • Starting a new graph database project
  • Developer experience and time-to-market matter
  • GraphRAG or knowledge graph use case (best LLM integration)
  • Betting on GQL standardization (Cypher converging to ISO GQL)

Risk: Medium lock-in (acceptable for velocity gains, GQL migration path exists)

Alternative: TinkerPop + Gremlin#

Choose gremlinpython when:

  • Multi-cloud or multi-database strategy required (Neptune + Cosmos DB flexibility)
  • Vendor-neutral governance important (Apache Foundation)
  • Portability outweighs developer convenience

Risk: Low lock-in, higher learning curve

Niche: TigerGraph + GSQL#

Choose pyTigerGraph when:

  • Deep graph analytics (10+ hop traversals, complex algorithms) required
  • Existing TigerGraph investment
  • Performance justifies severe lock-in (portability: 2/10)

Risk: High lock-in, small hiring pool

Questions to Ask Before Committing#

  1. What’s our relationship depth? (1-2 hops → maybe Postgres, 3+ hops → graph DB)
  2. Do we need multi-database portability? (Yes → Gremlin, No → Cypher)
  3. What’s our scale projection? (<10M edges → any, >1B edges → TigerGraph/Neptune)
  4. Is semantic reasoning required? (Yes → RDF/rdflib, No → property graph)
  5. What’s our cloud commitment? (AWS → Neptune, Azure → Cosmos, Multi-cloud → self-hosted or Neo4j)
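The five questions above can be codified as a toy heuristic. This is a rough sketch of the decision logic, not a substitute for an actual evaluation:

```python
def recommend(hops, needs_portability, needs_rdf, cloud=None):
    """Toy codification of the decision questions above."""
    if needs_rdf:
        return "rdflib (RDF/SPARQL)"
    if hops <= 2:
        return "PostgreSQL (plain SQL or AGE extension)"
    if needs_portability:
        return "gremlinpython (TinkerPop/Gremlin)"
    if cloud == "aws":
        return "gremlinpython on Amazon Neptune"
    return "neo4j (official driver, Cypher)"

print(recommend(hops=4, needs_portability=False, needs_rdf=False))
```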

GQL Standardization (ISO/IEC 39075:2024)#

  • Published April 2024, Cypher converging to compliance
  • By 2028, expect broad vendor adoption (reduced lock-in)
  • Migration path: Cypher users have smooth transition, Gremlin separate family

GraphRAG (Graph + LLMs)#

  • Microsoft open-sourced GraphRAG (July 2024)
  • Knowledge graphs as LLM context (retrieval-augmented generation)
  • Neo4j leading integration (vector + graph hybrid)
  • Implication: Graph DB clients need vector embedding support (table stakes by 2026)

PostgreSQL AGE (Apache Graph Extension)#

  • Cypher queries in PostgreSQL (Apache incubator project)
  • “Good enough” graph for many use cases
  • Implication: Lowers barrier to entry, may slow pure graph DB adoption for simple use cases

Multi-Model Convergence#

  • ArangoDB (graph + document + key-value in one database)
  • DuckDB adding graph capabilities
  • Implication: Graph becomes a feature, not a separate database (reduces operational complexity)

Date compiled: February 5, 2026
Research ID: 1.011

S1: Rapid Discovery

S1 Rapid Discovery: Graph Database Python Clients#

Research Methodology#

Scope Definition#

This discovery focuses on client libraries for interacting with graph databases from Python, NOT the graph databases themselves. The goal is to evaluate developer experience, community health, and production readiness of each library.

Discovery Process#

  1. Initial Library Identification

    • Categorize by database: Neo4j, ArangoDB, TigerGraph, Amazon Neptune, Dgraph
    • Identify multi-database solutions: Apache TinkerPop (Gremlin), RDFLib
    • Note official vs community-maintained libraries
  2. Metrics Collection For each library, we gather:

    • GitHub metrics: stars, forks, contributors, open issues, last commit date
    • PyPI metrics: weekly downloads, latest version, Python version support
    • Maintenance signals: release frequency, issue response time
    • Documentation quality: quick scan of docs completeness
  3. First Impression Evaluation

    • Installation simplicity: pip install <package> should work cleanly
    • Quickstart availability: can a developer be productive in 10 minutes?
    • API design: does it feel Pythonic?

Libraries Evaluated#

| Database | Official Client | Alternative/OGM |
| --- | --- | --- |
| Neo4j | neo4j (driver) | neomodel (OGM), py2neo (EOL) |
| ArangoDB | python-arango | python-arango-async |
| TigerGraph | pyTigerGraph | - |
| Amazon Neptune | gremlinpython + neptune-python-utils | - |
| Dgraph | pydgraph | - |
| Multi-DB (Gremlin) | gremlinpython | - |
| RDF Graphs | rdflib | - |
| OrientDB | pyorient (stale) | - |

Evaluation Criteria#

Tier 1 - Production Ready

  • 500k+ weekly downloads
  • Active maintenance (commits within 30 days)
  • Official/vendor support
  • Comprehensive documentation

Tier 2 - Mature Community

  • 50k-500k weekly downloads
  • Regular releases (quarterly or better)
  • Good documentation
  • Active issue resolution

Tier 3 - Emerging/Niche

  • <50k weekly downloads
  • May have gaps in documentation
  • Smaller community
  • Specialized use cases

Tier 4 - Caution Advised

  • Stale/EOL projects
  • No recent releases
  • Deprecated in favor of alternatives
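The rubric above can be expressed as a small classifier. This simplification keys only on downloads and maintenance (the rubric's other signals — official support, documentation — are folded into the maintenance boolean for brevity):

```python
def classify_tier(weekly_downloads, actively_maintained, eol=False):
    """Map the tiering rubric onto a tier number."""
    if eol or not actively_maintained:
        return 4  # Caution advised
    if weekly_downloads >= 500_000:
        return 1  # Production ready
    if weekly_downloads >= 50_000:
        return 2  # Mature community
    return 3      # Emerging/niche

print(classify_tier(5_700_000, True))     # 1
print(classify_tier(100_000, True))       # 2
print(classify_tier(8_000, True))         # 3
print(classify_tier(0, False, eol=True))  # 4
```

Note that the tier assignments later in this report occasionally override the download thresholds when other signals (official backing, ecosystem role) justify it.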

Data Sources#

  • PyPI: https://pypi.org/ (version, dependencies, downloads)
  • PyPI Stats: https://pypistats.org/ (download statistics)
  • GitHub: Repository metrics and activity
  • Snyk Advisor: Package health analysis
  • Official documentation sites

Key Findings Summary#

See individual library files and recommendation.md for detailed analysis. The most widely adopted libraries are gremlinpython (5.7M weekly downloads), rdflib (1.4M), neo4j driver (520K), and python-arango (350K-1.2M).


gremlinpython - Apache TinkerPop Gremlin for Python#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | gremlinpython |
| Latest Version | 3.8.0 (Nov 17, 2025) |
| Python Support | 3.10+ |
| Weekly Downloads | ~5.7 million |
| GitHub Stars | 1,900+ (TinkerPop repo) |
| License | Apache-2.0 |
| Maintainer | Apache TinkerPop |

Installation#

pip install gremlinpython

First Impression#

Strengths:

  • Universal graph query language (Gremlin)
  • Works with multiple databases: Neptune, JanusGraph, CosmosDB, etc.
  • Most downloaded graph Python library
  • Apache Foundation backing
  • Stable, mature codebase

Considerations:

  • Gremlin syntax differs from Cypher
  • Generic API may lack database-specific optimizations
  • TinkerPop 4.0 brings breaking changes (HTTP replacing WebSockets)

Compatible Databases#

  • Amazon Neptune
  • JanusGraph
  • Azure Cosmos DB (Gremlin API)
  • DataStax Enterprise Graph
  • IBM Compose for JanusGraph
  • OrientDB
  • TigerGraph (limited)

Quick Example#

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to a Gremlin server
conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().with_remote(conn)  # with_remote() replaces the deprecated withRemote()

# Traverse the graph
people = g.V().hasLabel('person').values('name').toList()
print(people)

# Find friends of Alice
friends = g.V().has('person', 'name', 'Alice').out('knows').values('name').toList()
print(friends)

conn.close()

Amazon Neptune Usage#

# With neptune-python-utils for IAM auth
from neptune_python_utils.gremlin_utils import GremlinUtils

gremlin_utils = GremlinUtils()
conn = gremlin_utils.remote_connection()
g = gremlin_utils.traversal_source(connection=conn)

Assessment#

Tier: 1 - Production Ready

gremlinpython is the standard for Gremlin-based graph traversal. Its massive adoption (5.7M downloads/week) reflects use in AWS Neptune and other cloud graph services. Essential for multi-database portability or when using Gremlin-compatible databases.


neo4j - Official Neo4j Python Driver#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | neo4j |
| Latest Version | 6.0.3 (Nov 6, 2025) |
| Python Support | 3.10, 3.11, 3.12, 3.13 |
| Weekly Downloads | ~520,000 |
| GitHub Stars | 1,000 |
| Contributors | 58 |
| License | Apache-2.0 |
| Maintainer | Neo4j, Inc. (official) |

Installation#

pip install neo4j

# With optional Rust extension for 10x performance
pip install neo4j-rust-ext

# With pandas/numpy integration
pip install neo4j[numpy,pandas,pyarrow]

First Impression#

Strengths:

  • Official vendor support with dedicated team
  • Production-stable with semantic versioning
  • Async support built-in
  • Rust extensions available for performance-critical workloads
  • Excellent documentation and examples
  • Type hints throughout

Considerations:

  • Python 3.10+ required (no legacy support)
  • Deprecated neo4j-driver package still in PyPI (causes confusion)

Quick Example#

from neo4j import GraphDatabase

# One long-lived driver per application; sessions are cheap and short-lived
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    result = session.run("MATCH (n:Person) RETURN n.name LIMIT 10")
    for record in result:
        print(record["n.name"])

driver.close()

Assessment#

Tier: 1 - Production Ready

The official Neo4j driver is the clear choice for Neo4j integration. It has strong community adoption, official support, excellent documentation, and modern Python features. The Rust extension option makes it suitable for high-throughput workloads.


neomodel - Python OGM for Neo4j#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | neomodel |
| Latest Version | 6.0.0 (Nov 26, 2025) |
| Python Support | 3.10+ |
| Weekly Downloads | ~25,000 |
| GitHub Stars | 1,100 |
| Contributors | 97 |
| Open Issues | 49 |
| License | MIT |
| Maintainer | Neo4j Labs (community) |

Installation#

pip install neomodel

# With Rust driver extension for performance
pip install neomodel[rust-driver-ext]

# With optional dependencies (Shapely, pandas, numpy)
pip install neomodel[extras,rust-driver-ext]

First Impression#

Strengths:

  • Django-style model definitions (familiar pattern)
  • Schema enforcement with cardinality restrictions
  • Full transaction and async support
  • Neo4j Labs project (good maintenance quality)
  • Django integration via django-neomodel plugin
  • Vector and full-text search support (v6.0+)

Considerations:

  • Abstracts away Cypher (less control for complex queries)
  • Learning curve for graph-specific concepts
  • Performance overhead vs raw driver

Quick Example#

from neomodel import StructuredNode, StringProperty, RelationshipTo, config

# Configure the connection before any queries
config.DATABASE_URL = 'bolt://neo4j:password@localhost:7687'

class Person(StructuredNode):
    name = StringProperty(required=True)
    friends = RelationshipTo('Person', 'FRIEND')

# Create and relate nodes
alice = Person(name="Alice").save()
bob = Person(name="Bob").save()
alice.friends.connect(bob)

# Query
for friend in alice.friends.all():
    print(friend.name)

Assessment#

Tier: 2 - Mature Community

neomodel is the recommended OGM for Neo4j, especially for developers coming from Django/SQLAlchemy backgrounds. It provides a Pythonic abstraction over the graph while still allowing raw Cypher when needed. Good choice for rapid development.


py2neo - End of Life (EOL)#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | py2neo |
| Latest Version | 2021.2.4 |
| Status | END OF LIFE |
| Last Meaningful Release | 2021 |
| GitHub Stars | ~1,200 (archived) |
| License | Apache-2.0 |

Status Warning#

py2neo has been officially declared End of Life as of April 2025.

The project is no longer maintained and will receive no further updates. The GitHub repository has moved to neo4j-contrib/py2neo but is effectively archived.

Migration Path#

Neo4j recommends migrating to:

  1. neo4j - Official Python driver for direct Cypher queries
  2. neomodel - For ORM-style object-graph mapping

Historically, py2neo offered:

  • Higher-level API than the raw driver
  • Built-in OGM (Object-Graph Mapper)
  • HTTP and Bolt protocol support
  • Cypher lexer for Pygments
  • Command-line tools

Migration Example#

Old (py2neo):

from py2neo import Graph
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
graph.run("MATCH (n) RETURN n LIMIT 10")

New (neo4j driver):

from neo4j import GraphDatabase
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run("MATCH (n) RETURN n LIMIT 10")

Assessment#

Tier: 4 - Do Not Use

Do not start new projects with py2neo. For existing codebases, plan migration to the official neo4j driver or neomodel. Historical releases remain available on PyPI for legacy compatibility.


pydgraph - Official Dgraph Python Client#

Quick Facts#

MetricValue
Package Namepydgraph
Latest Version24.3.0 (Aug 5, 2025)
Python Support3.7 - 3.12
Weekly Downloads~8,300
GitHub Stars288
Forks90
Open Issues0
LicenseApache-2.0
MaintainerHypermode Inc. (steward of Dgraph)

Installation#

pip install pydgraph

First Impression#

Strengths:

  • Official client with gRPC protocol
  • Good Python version support (3.7+)
  • Clean, simple API
  • Connection string support for clusters
  • ACL (Access Control List) authentication

Considerations:

  • Smaller ecosystem compared to Neo4j
  • Dgraph uses GraphQL-like DQL, learning curve
  • gRPC dependency can cause build issues on older systems

Quick Example#

import pydgraph

# Create client
client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)

# Set schema
schema = """
name: string @index(exact) .
friends: [uid] .
type Person {
    name
    friends
}
"""
op = pydgraph.Operation(schema=schema)
client.alter(op)

# Create data
txn = client.txn()
try:
    txn.mutate(set_nquads='_:alice <name> "Alice" .')
    txn.commit()
finally:
    txn.discard()

# Query
query = """
{
  people(func: has(name)) {
    name
    friends { name }
  }
}
"""
res = client.txn(read_only=True).query(query)
print(res.json)

client_stub.close()

Connection String Format#

# Standard connection
client_stub = pydgraph.DgraphClientStub("localhost:9080")

# With a dgraph:// connection string (supported via pydgraph.open in recent releases)
client = pydgraph.open("dgraph://username:password@host:9080")

Assessment#

Tier: 3 - Emerging/Niche

pydgraph is the correct choice for Dgraph integration. While the community is smaller than Neo4j, the library is well-maintained with zero open issues. Suitable for applications needing Dgraph’s distributed graph capabilities and native GraphQL-like query language.


python-arango - Official ArangoDB Python Driver#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | python-arango |
| Latest Version | 8.2.5 (Dec 22, 2025) |
| Python Support | 3.9, 3.10, 3.11, 3.12 |
| Weekly Downloads | ~350,000 - 1.2M |
| GitHub Stars | 466 |
| Contributors | 32+ |
| Open Issues | 0 |
| License | MIT |
| Maintainer | ArangoDB (official) |

Installation#

pip install python-arango

# For async support
pip install python-arango-async

First Impression#

Strengths:

  • Official vendor support
  • Excellent maintenance (zero open issues)
  • Clean, Pythonic API
  • Comprehensive AQL query support
  • Graph traversal, document, and key-value operations
  • Async alternative available

Considerations:

  • ArangoDB-specific (multi-model but single vendor)
  • Smaller community than Neo4j ecosystem

Quick Example#

from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("mydb", username="root", password="password")

# Create a graph
graph = db.create_graph("social")
people = graph.create_vertex_collection("people")
friends = graph.create_edge_definition(
    edge_collection="friends",
    from_vertex_collections=["people"],
    to_vertex_collections=["people"]
)

# Insert vertices and edges
alice = people.insert({"_key": "alice", "name": "Alice"})
bob = people.insert({"_key": "bob", "name": "Bob"})
friends.insert({"_from": "people/alice", "_to": "people/bob"})

# AQL query
cursor = db.aql.execute("FOR p IN people RETURN p.name")
print([doc for doc in cursor])

Assessment#

Tier: 1 - Production Ready

python-arango is an excellent choice for ArangoDB integration. The library is well-maintained with official support, zero open issues, and comprehensive coverage of ArangoDB features. Particularly strong for applications needing multi-model (document + graph + key-value) capabilities.


pyTigerGraph - TigerGraph Python Client#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | pyTigerGraph |
| Latest Version | 1.9.1 (Nov 4, 2025) |
| Python Support | 3.8+ |
| Weekly Downloads | ~5,600 |
| GitHub Stars | 34 |
| Contributors | 22 |
| Open Issues | 7 |
| License | Apache-2.0 |
| Maintainer | TigerGraph (official) |

Installation#

# Core functionality
pip install pyTigerGraph

# With Graph Data Science / ML capabilities
pip install pyTigerGraph[gds]

First Impression#

Strengths:

  • Official vendor support
  • Graph machine learning integration
  • Async support (v1.8+)
  • DataFrame loading from Pandas
  • Good for analytics and ML workloads

Considerations:

  • Smaller community (niche database)
  • Less documentation compared to Neo4j/ArangoDB
  • TigerGraph-specific GSQL language

Quick Example#

import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",
    graphname="MyGraph",
    username="tigergraph",
    password="password"
)

# Get auth token
conn.getToken(conn.createSecret())

# Run a query
results = conn.runInstalledQuery("find_friends", params={"person": "Alice"})

# Upsert vertices: each vertex is an (id, attributes) tuple
conn.upsertVertices("Person", [
    ("alice", {"name": "Alice"}),
    ("bob", {"name": "Bob"})
])

Graph Data Science Features#

# Requires: pip install pyTigerGraph[gds]
feat = conn.gds.featurizer()

# Built-in algorithms use the tg_ prefix; required params vary by algorithm
feat.installAlgorithm("tg_pagerank")
feat.runAlgorithm("tg_pagerank", params={"v_type": "Person", "e_type": "Knows"})

Assessment#

Tier: 3 - Emerging/Niche

pyTigerGraph is the right choice when using TigerGraph, especially for graph analytics and machine learning use cases. The library has official support but a smaller community. Best suited for enterprise analytics workloads where TigerGraph’s performance advantages justify the ecosystem trade-offs.


rdflib - Python Library for RDF#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | rdflib |
| Latest Version | 7.5.0 (Nov 28, 2025) |
| Python Support | 3.8.1+ |
| Weekly Downloads | ~1.45 million |
| GitHub Stars | 2,400 |
| Contributors | 189 |
| Open Issues | 291 |
| License | BSD-3-Clause |
| Maintainer | RDFLib community |

Installation#

pip install rdflib

First Impression#

Strengths:

  • Dominant library for RDF/semantic web in Python
  • Comprehensive format support (RDF/XML, Turtle, JSON-LD, N-Quads, etc.)
  • Full SPARQL 1.1 implementation
  • Mature, well-documented
  • Large ecosystem of extensions

Considerations:

  • Focus on RDF graphs (different paradigm from property graphs)
  • Not designed for high-performance graph traversal
  • Steeper learning curve for developers new to semantic web

Quick Example#

from rdflib import Graph, Literal, RDF, URIRef, Namespace

# Create a graph
g = Graph()

# Define namespace
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
g.bind("foaf", FOAF)

# Add triples
alice = URIRef("http://example.org/alice")
g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice")))
g.add((alice, FOAF.knows, URIRef("http://example.org/bob")))

# SPARQL query
results = g.query("""
    SELECT ?name WHERE {
        ?person foaf:name ?name .
    }
""")
for row in results:
    print(row.name)

# Serialize
print(g.serialize(format="turtle"))

RDF vs Property Graphs#

| RDF (rdflib) | Property Graphs (Neo4j, etc.) |
| --- | --- |
| Triple-based (subject-predicate-object) | Nodes and relationships with properties |
| URIs for identifiers | Internal IDs |
| SPARQL query language | Cypher, Gremlin, etc. |
| Semantic web / linked data focus | Application data modeling |
| Standards-based (W3C) | Vendor-specific |

Assessment#

Tier: 1 - Production Ready

rdflib is the standard for RDF processing in Python. Essential for semantic web applications, knowledge graphs with linked data, and SPARQL-based querying. Different use case than property graph databases but equally mature.


Graph Database Python Clients: Recommendations#

Quick Assessment Summary#

Tier Rankings#

| Tier | Library | Downloads/Week | Use Case |
| --- | --- | --- | --- |
| 1 | gremlinpython | 5.7M | Multi-DB, Neptune, JanusGraph |
| 1 | rdflib | 1.45M | RDF/Semantic web |
| 1 | neo4j | 520K | Neo4j (official) |
| 1 | python-arango | 350K-1.2M | ArangoDB (official) |
| 2 | neomodel | 25K | Neo4j OGM |
| 3 | pydgraph | 8K | Dgraph |
| 3 | pyTigerGraph | 5.6K | TigerGraph |
| 4 | py2neo | EOL | Do not use |
| 4 | pyorient | Stale (2017) | Avoid |

Top Picks by Use Case#

For Neo4j Integration#

Primary: neo4j (official driver)

  • Best for: Direct Cypher queries, maximum control, performance
  • Install: pip install neo4j

Alternative: neomodel (OGM)

  • Best for: Django-style model definitions, rapid development
  • Install: pip install neomodel

For Multi-Database Portability#

Primary: gremlinpython

  • Best for: Neptune, JanusGraph, CosmosDB, any Gremlin-compatible DB
  • Install: pip install gremlinpython

For ArangoDB#

Primary: python-arango

  • Best for: Multi-model (document + graph + key-value) applications
  • Install: pip install python-arango

For Semantic Web / Knowledge Graphs#

Primary: rdflib

  • Best for: RDF processing, SPARQL queries, linked data
  • Install: pip install rdflib

For Graph Analytics / ML#

Primary: pyTigerGraph[gds]

  • Best for: Large-scale graph analytics, ML on graphs
  • Install: pip install pyTigerGraph[gds]

Decision Matrix#

| Requirement | Recommended Library |
| --- | --- |
| Need Cypher query language | neo4j, neomodel |
| Need Gremlin query language | gremlinpython |
| Need SPARQL / RDF | rdflib |
| Need vendor portability | gremlinpython |
| Need ORM-style abstraction | neomodel |
| AWS Neptune | gremlinpython + neptune-python-utils |
| Multi-model (doc + graph) | python-arango |
| Graph machine learning | pyTigerGraph[gds] |
| Distributed graph at scale | pydgraph, pyTigerGraph |

Libraries to Avoid#

  1. py2neo - Officially EOL (April 2025), migrate to neo4j/neomodel
  2. pyorient - Last release 2017, OrientDB has limited Python support
  3. neo4j-driver - Deprecated package name, use neo4j instead

Key Insights#

  1. gremlinpython dominates downloads due to AWS Neptune and cloud adoption
  2. Neo4j ecosystem is strongest with official driver + OGM options
  3. ArangoDB has excellent official support with zero open issues
  4. RDFLib serves a distinct use case (semantic web vs property graphs)
  5. TigerGraph and Dgraph are niche but officially supported

Next Steps for Deeper Evaluation#

  1. Test connection setup with actual database instances
  2. Benchmark query performance for representative workloads
  3. Evaluate async support for concurrent applications
  4. Review error handling and retry mechanisms
  5. Assess integration with web frameworks (FastAPI, Django)
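Step 1 above can begin as a library-agnostic harness before any benchmarking: run one cheap "liveness" probe per candidate client and collect pass/fail results. The probe callables below are placeholders — with real drivers, the neo4j probe would call `driver.verify_connectivity()` and the gremlinpython probe something like `g.V().limit(1).next()`.

```python
def smoke_test(probes):
    """probes: mapping of client name -> zero-argument callable that raises on failure."""
    results = {}
    for name, probe in probes.items():
        try:
            probe()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    return results
```

The structure is the point: each client's cheapest round trip is wrapped in a uniform interface, so the same harness runs against every library on the shortlist.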

Data Sources#

  • PyPI: Package metadata and downloads
  • PyPI Stats: https://pypistats.org/
  • GitHub: Repository metrics
  • Snyk Advisor: Package health analysis
  • Official documentation for each library

Research conducted: December 2025


S2 Comprehensive Discovery: Graph Database Python Client Libraries#

Overview#

This document outlines the methodology for evaluating Python client libraries for graph databases. The analysis covers official drivers, community libraries, and Object-Graph Mappers (OGMs) across multiple graph database platforms.

Scope#

Libraries Evaluated#

| Library | Database | Type | Maintenance |
| --- | --- | --- | --- |
| neo4j-driver | Neo4j | Official Driver | Active (Neo4j Inc.) |
| py2neo | Neo4j | Community Driver | EOL (Archived) |
| neomodel | Neo4j | OGM | Active (Neo4j Labs) |
| python-arango | ArangoDB | Official Driver | Active (ArangoDB) |
| pyTigerGraph | TigerGraph | Official Client | Active (TigerGraph) |
| gremlinpython | Multi-DB | Official (TinkerPop) | Active (Apache) |
| pydgraph | Dgraph | Official Driver | Active (Hypermode) |
| rdflib | RDF/SPARQL | Library | Active (Community) |

Evaluation Criteria#

1. API Design and Ergonomics#

  • Pythonic Design: Adherence to Python idioms (PEP 8, context managers, generators)
  • Type Hints: MyPy compatibility and IDE support
  • Documentation Quality: Official docs, examples, and community resources
  • Learning Curve: Time to productivity for developers

2. Performance Characteristics#

  • Connection Pooling: Configuration options and efficiency
  • Bulk Operations: Batch insert/update capabilities
  • Serialization: Data format handling (JSON, Binary, custom)
  • Rust Extensions: Native code acceleration options

3. Async Support#

  • Native asyncio: Built-in async/await support
  • Framework Integration: FastAPI, aiohttp, Starlette compatibility
  • Concurrent Transactions: Parallel query execution
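The concurrent-transactions criterion can be sketched in a library-agnostic way: fan out independent read queries with `asyncio.gather`. Here `run_query` is a placeholder for a driver coroutine such as an async session's read call.

```python
import asyncio

async def gather_queries(run_query, queries):
    """Run all queries concurrently; results come back in input order."""
    return await asyncio.gather(*(run_query(q) for q in queries))
```

Libraries with native asyncio (neo4j, neomodel v5+) slot directly into this pattern; synchronous-only clients need thread pools or the third-party async variants listed in the feature matrix.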

4. Transaction and Consistency#

  • ACID Support: Transaction isolation levels
  • Retry Logic: Automatic retry on transient failures
  • Causal Consistency: Bookmark/session management
  • Read/Write Splitting: Routing to appropriate cluster nodes
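The retry criterion above amounts to exponential backoff around an arbitrary unit of work. Drivers with managed transactions (neo4j's `execute_read`/`execute_write`) do this internally; for the others, a wrapper along these lines is needed. `TransientError` here is a stand-in for whichever exception class a given driver marks as retryable.

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a driver's retryable error (cf. neo4j.exceptions.TransientError)."""

def with_retry(work, retries=3, base_delay=0.1):
    """Run work(), retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(retries + 1):
        try:
            return work()
        except TransientError:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

The jitter term spreads out retries from concurrent workers so they do not re-collide on the same deadlock.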

5. Query Language Support#

| Library | Primary | Secondary |
| --- | --- | --- |
| neo4j-driver | Cypher | - |
| neomodel | Python OGM | Cypher (raw) |
| python-arango | AQL | - |
| pyTigerGraph | GSQL | REST API |
| gremlinpython | Gremlin | - |
| pydgraph | GraphQL+/DQL | - |
| rdflib | SPARQL | RDF/Turtle |

6. Schema and Migration#

  • Schema Definition: Programmatic vs. declarative
  • Constraint Management: Unique, existence, type constraints
  • Index Management: Creation, deletion, optimization
  • Migration Tooling: Version control for schema changes

7. Testing and Development#

  • Mocking Support: Test doubles and fixtures
  • Embedded Mode: In-process database for testing
  • CI/CD Integration: Docker, testcontainers compatibility

Data Sources#

Primary Sources#

  1. Official Documentation: Driver manuals and API references
  2. GitHub Repositories: Source code, issues, release notes
  3. PyPI: Package metadata, version history, dependencies

Secondary Sources#

  1. Community Forums: Stack Overflow, database-specific communities
  2. Performance Benchmarks: Published comparisons and metrics
  3. Migration Guides: Version upgrade documentation

Analysis Deliverables#

  1. Per-Library Deep Dives: 100-200 lines covering features, patterns, and limitations
  2. Feature Matrix: Side-by-side comparison across all criteria
  3. Recommendations: Use-case based guidance with justifications

Versioning Context#

All analysis conducted against library versions current as of December 2024:

  • neo4j-driver: 6.0.x
  • neomodel: 6.0.x
  • python-arango: 8.2.x
  • pyTigerGraph: 1.6.x
  • gremlinpython: 3.7.x
  • pydgraph: 24.x / 25.x
  • rdflib: 7.2.x

Graph Database Python Client Libraries: Feature Matrix#

Overview#

This matrix compares Python client libraries for graph databases across key functional and technical criteria. Libraries are evaluated as of December 2024.

Quick Reference#

| Library | Database | Query Language | Status |
| --- | --- | --- | --- |
| neo4j-driver | Neo4j | Cypher | Active |
| py2neo | Neo4j | Cypher | EOL |
| neomodel | Neo4j | OGM/Cypher | Active |
| python-arango | ArangoDB | AQL | Active |
| pyTigerGraph | TigerGraph | GSQL | Active |
| gremlinpython | Multi-DB | Gremlin | Active |
| pydgraph | Dgraph | DQL | Active |
| rdflib | RDF stores | SPARQL | Active |

Async Support#

| Library | Native asyncio | Async Variant | Framework Compat |
| --- | --- | --- | --- |
| neo4j-driver | Yes | Built-in | FastAPI, aiohttp |
| py2neo | No | - | - |
| neomodel | Yes | Built-in (v5+) | Django, FastAPI |
| python-arango | No | python-arango-async | FastAPI (separate pkg) |
| pyTigerGraph | Partial | AsyncTigerGraphConnection | Limited |
| gremlinpython | No | aiogremlin, goblin | Via third-party |
| pydgraph | No | gRPC futures only | Limited |
| rdflib | No | Manual wrapping | Via thread pool |

Connection Management#

| Library | Connection Pooling | Pool Size Config | Liveness Check |
| --- | --- | --- | --- |
| neo4j-driver | Yes | max_connection_pool_size | liveness_check_timeout |
| py2neo | Basic | Limited | No |
| neomodel | Yes (via driver) | Via driver_options | Via driver |
| python-arango | No | - | No |
| pyTigerGraph | No | - | No |
| gremlinpython | Yes | pool_size parameter | Known issues |
| pydgraph | Manual | Multiple stubs | Manual |
| rdflib | N/A | N/A | N/A |

Transaction Support#

| Library | ACID | Managed Txn | Auto-retry | Causal Consistency |
| --- | --- | --- | --- | --- |
| neo4j-driver | Yes | execute_read/write | Yes | Bookmarks |
| py2neo | Yes | Context manager | No | No |
| neomodel | Yes | Context manager | No | Via driver |
| python-arango | Yes | Stream/JS txn | No | No |
| pyTigerGraph | Limited | Via REST | No | No |
| gremlinpython | Yes | tx.begin/commit | No | No |
| pydgraph | Yes | txn() context | Manual | No |
| rdflib | No | N/A | N/A | N/A |

Query Language Features#

| Library | Parameterized | Prepared/Cached | Bulk Operations |
| --- | --- | --- | --- |
| neo4j-driver | Yes ($params) | No | UNWIND pattern |
| py2neo | Yes | No | Batch methods |
| neomodel | Yes | No | save() loop |
| python-arango | Yes (@params) | No | insert_many() |
| pyTigerGraph | Yes | Installed queries | upsertVertices() |
| gremlinpython | Limited | No | Batch traversals |
| pydgraph | Yes ($params) | No | JSON arrays |
| rdflib | Yes (initBindings) | prepareQuery() | addN() |

OGM/ORM Capabilities#

| Library | OGM Layer | Schema Definition | Hooks | Validation |
| --- | --- | --- | --- | --- |
| neo4j-driver | No | Manual | No | No |
| py2neo | Built-in | GraphObject | Limited | No |
| neomodel | Built-in | StructuredNode | Yes | Property-level |
| python-arango | No | Manual | No | No |
| pyTigerGraph | Schema API | Object-oriented | No | GSQL |
| gremlinpython | Via Goblin | Vertex/Edge classes | Limited | Via Goblin |
| pydgraph | No | DQL schema | No | No |
| rdflib | No | RDF/OWL | No | SHACL (ext) |

Type System#

| Library | Type Hints | MyPy Support | Spatial Types | Temporal Types |
| --- | --- | --- | --- | --- |
| neo4j-driver | Yes | Good | Point | Date/DateTime/Duration |
| py2neo | Partial | Limited | Via Cypher | Via Cypher |
| neomodel | Yes | Good | PointProperty | DateTime/Date |
| python-arango | Partial | Limited | GeoJSON | ISO strings |
| pyTigerGraph | Limited | Limited | GSQL types | DATETIME |
| gremlinpython | Limited | Limited | Via properties | Via properties |
| pydgraph | Limited | Limited | Geo (geo:) | dateTime |
| rdflib | Partial | Limited | GeoSPARQL | xsd:dateTime |

Performance Features#

| Library | Native Extensions | Binary Protocol | Compression |
| --- | --- | --- | --- |
| neo4j-driver | Rust (optional) | Bolt | No |
| py2neo | No | Bolt/HTTP | No |
| neomodel | Via driver | Bolt | No |
| python-arango | No | HTTP/REST | Optional |
| pyTigerGraph | No | HTTP/REST | No |
| gremlinpython | No | WebSocket | GraphBinary |
| pydgraph | No | gRPC | Protocol Buffers |
| rdflib | No | N/A | N/A |

Error Handling#

| Library | Typed Exceptions | Retry Categories | Error Codes |
| --- | --- | --- | --- |
| neo4j-driver | Yes | Transient/Client/DB | Yes |
| py2neo | Partial | No | Limited |
| neomodel | Via driver | Via driver | Via driver |
| python-arango | Yes | No | ArangoDB codes |
| pyTigerGraph | Basic | No | REST status |
| gremlinpython | GremlinServerError | No | Server codes |
| pydgraph | gRPC errors | Manual | gRPC codes |
| rdflib | Standard Python | No | No |

Testing Support#

| Library | Mocking | Embedded Mode | Testcontainers |
| --- | --- | --- | --- |
| neo4j-driver | Manual | No | Yes |
| py2neo | No | No | Possible |
| neomodel | Manual | No | Yes |
| python-arango | No | No | Yes |
| pyTigerGraph | No | No | Limited |
| gremlinpython | No | JVM only | Yes (Gremlin Server) |
| pydgraph | No | No | Yes |
| rdflib | In-memory graph | Yes | N/A |

Documentation and Community#

| Library | Official Docs | API Reference | Examples | Community |
| --- | --- | --- | --- | --- |
| neo4j-driver | Excellent | Complete | Extensive | Large |
| py2neo | Archived | Archived | Limited | Inactive |
| neomodel | Good | Complete | Moderate | Active |
| python-arango | Good | Complete | Good | Moderate |
| pyTigerGraph | Good | Complete | Good | Moderate |
| gremlinpython | Good | Reference | Book available | Large |
| pydgraph | Moderate | README | Basic | Small |
| rdflib | Excellent | Complete | Extensive | Large |

Python Version Support#

| Library | Min Version | Max Version | Notes |
| --- | --- | --- | --- |
| neo4j-driver | 3.10 | 3.14 | Drops 3.9 in v6 |
| py2neo | 3.x | - | EOL |
| neomodel | 3.8 | 3.12+ | |
| python-arango | 3.9 | Latest | |
| pyTigerGraph | 3.7 | Latest | |
| gremlinpython | 3.10 | Latest | |
| pydgraph | 3.7 | Latest | |
| rdflib | 3.8 | Latest | |

Database Version Support#

| Library | Supported Versions | LTS Support |
| --- | --- | --- |
| neo4j-driver | 4.4+, 5.x | 4.4 LTS, 5.26 LTS |
| py2neo | 4.x (frozen) | - |
| neomodel | 4.4+, 5.x | Via driver |
| python-arango | 3.11+ | Via ArangoDB |
| pyTigerGraph | 3.x+ | Via TigerGraph |
| gremlinpython | TinkerPop 3.x | Via database |
| pydgraph | Version-matched | Via Dgraph |
| rdflib | N/A | N/A |

Installation Size#

| Library | Core Size | Dependencies | Optional Extras |
| --- | --- | --- | --- |
| neo4j-driver | ~500KB | pytz | rust-ext (~2MB) |
| py2neo | ~1MB | Several | pygments |
| neomodel | ~200KB | neo4j-driver | shapely, extras |
| python-arango | ~300KB | requests | - |
| pyTigerGraph | ~500KB | requests | torch (GDS) |
| gremlinpython | ~200KB | aiohttp, nest-asyncio | - |
| pydgraph | ~100KB | grpcio, protobuf | - |
| rdflib | ~2MB | pyparsing, isodate | lxml, html5lib |

Summary Scores (1-5)#

| Library | API Design | Performance | Async | Ecosystem | Overall |
| --- | --- | --- | --- | --- | --- |
| neo4j-driver | 5 | 5 | 5 | 5 | 5.0 |
| py2neo | 4 | 3 | 1 | 1 | 2.3 |
| neomodel | 5 | 4 | 4 | 4 | 4.3 |
| python-arango | 4 | 4 | 3 | 4 | 3.8 |
| pyTigerGraph | 3 | 3 | 2 | 3 | 2.8 |
| gremlinpython | 3 | 3 | 2 | 4 | 3.0 |
| pydgraph | 3 | 4 | 2 | 2 | 2.8 |
| rdflib | 4 | 3 | 1 | 4 | 3.0 |

Scores based on: API ergonomics, Python idiom adherence, documentation quality, maintenance activity, and production readiness.
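The Overall column above appears to be the mean of the four sub-scores rounded half-up to one decimal. A sketch of that derivation (note Python's built-in `round()` uses banker's rounding and would give 2.2 for py2neo's 2.25 instead of the table's 2.3):

```python
import math

def overall(api, perf, async_, eco):
    """Mean of the four sub-scores, rounded half-up to one decimal place."""
    return math.floor((api + perf + async_ + eco) / 4 * 10 + 0.5) / 10
```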


gremlinpython - Apache TinkerPop Gremlin Client#

Overview#

gremlinpython is the official Python language variant (GLV) for Apache TinkerPop’s Gremlin graph traversal language. It provides a consistent API for interacting with any TinkerPop-enabled graph database, offering database portability through a standardized query language.

Key Information#

| Attribute | Value |
| --- | --- |
| Package | gremlinpython |
| Version | 3.7.x |
| Python Support | 3.10+ |
| Protocol | WebSocket (GraphBinary/GraphSON) |
| License | Apache 2.0 |
| Repository | github.com/apache/tinkerpop |

Supported Databases#

gremlinpython works with any TinkerPop-compliant database:

  • Amazon Neptune
  • Azure Cosmos DB (Gremlin API)
  • JanusGraph
  • OrientDB
  • Neo4j (with TinkerPop plugin)
  • TigerGraph (with TinkerPop connector)
  • DataStax Graph

Installation#

pip install gremlinpython

Connection Management#

Basic Connection#

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Create traversal source
g = traversal().with_remote(
    DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
)

# With authentication
g = traversal().with_remote(
    DriverRemoteConnection(
        'wss://your-endpoint:8182/gremlin',
        'g',
        username='user',
        password='password'
    )
)

Connection Options#

from gremlin_python.driver.client import Client

# Lower-level client access
client = Client(
    'ws://localhost:8182/gremlin',
    'g',
    pool_size=8,           # Connection pool size
    max_workers=4,         # Thread pool workers
    message_serializer=None # Custom serializer
)

# Submit raw Gremlin
result = client.submit("g.V().count()")
for r in result:
    print(r)

Traversal Basics#

Creating Vertices#

from gremlin_python.process.traversal import T

# Add vertex
g.addV('person').property('name', 'Alice').property('age', 30).next()

# Add with explicit ID (requires the T token enum imported above)
g.addV('person').property(T.id, 'alice').property('name', 'Alice').next()

Creating Edges#

# Add edge between vertices
g.V().has('person', 'name', 'Alice').as_('a') \
     .V().has('person', 'name', 'Bob').as_('b') \
     .addE('knows').from_('a').to('b').property('since', 2020).next()

Reading Data#

# Get all vertices of type
people = g.V().hasLabel('person').toList()

# Get specific vertex
alice = g.V().has('person', 'name', 'Alice').next()

# Get vertex properties
props = g.V().has('person', 'name', 'Alice').valueMap().next()

Updating Data#

# Update property
g.V().has('person', 'name', 'Alice') \
     .property('age', 31).next()

# Add multiple properties
g.V().has('person', 'name', 'Alice') \
     .property('email', '[email protected]') \
     .property('city', 'NYC').next()

Deleting Data#

# Delete vertex (and connected edges)
g.V().has('person', 'name', 'Alice').drop().iterate()

# Delete edge
g.E().hasLabel('knows').drop().iterate()

Traversal Patterns#

Filtering#

from gremlin_python.process.traversal import P
from gremlin_python.process.graph_traversal import __  # anonymous traversals

# Has filters
g.V().has('person', 'age', P.gt(25)).toList()
g.V().has('person', 'name', P.within('Alice', 'Bob')).toList()

# Multiple conditions
g.V().hasLabel('person') \
     .has('age', P.gte(18)) \
     .has('age', P.lt(65)).toList()

# Not filter
g.V().hasLabel('person').not_(__.has('retired', True)).toList()

Traversing Relationships#

# Outgoing edges
friends = g.V().has('person', 'name', 'Alice').out('knows').toList()

# Incoming edges
followers = g.V().has('person', 'name', 'Alice').in_('follows').toList()

# Both directions
connections = g.V().has('person', 'name', 'Alice').both('knows').toList()

# Multiple hops
friends_of_friends = g.V().has('person', 'name', 'Alice') \
                          .out('knows').out('knows') \
                          .dedup().toList()

Path Queries#

# Get paths
paths = g.V().has('person', 'name', 'Alice') \
             .repeat(__.out('knows')).times(2) \
             .path().by('name').toList()

# Shortest path
path = g.V().has('person', 'name', 'Alice') \
            .repeat(__.out().simplePath()) \
            .until(__.has('person', 'name', 'Charlie')) \
            .path().limit(1).next()

Aggregation#

# Count
count = g.V().hasLabel('person').count().next()

# Group by
by_age = g.V().hasLabel('person') \
              .group().by('age').by(__.count()).next()

# Statistics
stats = g.V().hasLabel('person') \
             .values('age').fold() \
             .project('min', 'max', 'avg', 'count') \
             .by(__.min()) \
             .by(__.max()) \
             .by(__.mean()) \
             .by(__.count()).next()

Serialization#

GraphBinary#

from gremlin_python.driver.serializer import GraphBinarySerializersV1

g = traversal().with_remote(
    DriverRemoteConnection(
        'ws://localhost:8182/gremlin',
        'g',
        message_serializer=GraphBinarySerializersV1()
    )
)

GraphSON#

from gremlin_python.driver.serializer import GraphSONSerializersV3d0

g = traversal().with_remote(
    DriverRemoteConnection(
        'ws://localhost:8182/gremlin',
        'g',
        message_serializer=GraphSONSerializersV3d0()
    )
)

Transaction Support#

# Begin transaction
tx = g.tx()

# Get transaction-bound traversal
gtx = tx.begin()

try:
    gtx.addV('person').property('name', 'Alice').next()
    gtx.addV('person').property('name', 'Bob').next()
    tx.commit()
except Exception:
    tx.rollback()
    raise

Async Alternatives#

gremlinpython itself is synchronous. For async support, consider:

aiogremlin#

from aiogremlin import Cluster, Graph

cluster = await Cluster.open(hosts=['localhost'])
client = await cluster.connect()
g = Graph().traversal().withRemote(client)

# Async operations
result = await g.V().toList()

Goblin OGM#

from goblin import Goblin, Vertex, String

class Person(Vertex):
    name = String()

app = await Goblin.open(hosts=['localhost'])
session = await app.session()

person = Person(name='Alice')
session.add(person)
await session.flush()

gremlinpy (FastAPI compatible)#

from gremlinpy import Graph

g = Graph().traversal()
# Compatible with existing event loops

Traversal Strategies#

from gremlin_python.process.strategies import *

# Read-only strategy
g = g.withStrategies(ReadOnlyStrategy())

# Subgraph strategy (filter)
g = g.withStrategies(SubgraphStrategy(
    vertices=__.hasLabel('person'),
    edges=__.hasLabel('knows')
))

# Partition strategy
g = g.withStrategies(PartitionStrategy(
    partitionKey='region',
    writePartition='us-west'
))

Error Handling#

from gremlin_python.driver.protocol import GremlinServerError

try:
    result = g.V().has('invalid').next()
except GremlinServerError as e:
    print(f"Server error: {e}")
except StopIteration:
    print("No results found")

Connection Pooling Issues#

Known limitation (TINKERPOP-3114):

“Once a connection error occurred, pooled connections are broken and will not be recovered.”

Workaround:

# Probe the pooled connection before reuse
def get_connection():
    try:
        g.V().limit(1).next()  # cheap liveness check
        return g
    except Exception:
        # Pool is broken: rebuild the remote connection.
        # create_new_connection() is an application-defined factory.
        return create_new_connection()

Limitations#

  • No native Python asyncio (use aiogremlin/goblin)
  • Connection pool recovery issues
  • WebSocket-only protocol
  • Remote execution only (no embedded mode)
  • Reference-only objects from server (no full properties)
  • Significant memory overhead for large result sets

When to Use#

Choose gremlinpython when:

  • Database portability is important
  • Working with TinkerPop-compatible databases
  • Standard graph query language preferred
  • Amazon Neptune or Azure Cosmos DB target

Consider alternatives when:

  • Native async required (use aiogremlin/goblin)
  • Database-specific features needed
  • Maximum performance critical
  • OGM patterns preferred (use goblin)

Resources#


Neo4j Python Driver (neo4j)#

Overview#

The official Neo4j Python driver provides low-level, high-performance access to Neo4j databases using the Bolt protocol. Maintained by Neo4j Inc., it serves as the foundation for higher-level libraries like neomodel.

Key Information#

| Attribute | Value |
| --- | --- |
| Package | neo4j (formerly neo4j-driver) |
| Version | 6.0.x |
| Python Support | 3.10, 3.11, 3.12, 3.13, 3.14 |
| Protocol | Bolt 4.4, 5.0-5.8, 6.0 |
| License | Apache 2.0 |
| Repository | github.com/neo4j/neo4j-python-driver |

Installation#

pip install neo4j

# With optional Rust extensions for performance
pip install neo4j-rust-ext

Core Features#

Connection Management#

from neo4j import GraphDatabase

# Basic connection
driver = GraphDatabase.driver(
    "neo4j://localhost:7687",
    auth=("neo4j", "password")
)

# With context manager (recommended)
with GraphDatabase.driver(uri, auth=auth) as driver:
    driver.verify_connectivity()
    # Use driver...

Query Execution#

# Simple query execution (driver v5.5+)
records, summary, keys = driver.execute_query(
    "MATCH (p:Person {name: $name}) RETURN p",
    name="Alice",
    database_="neo4j"
)

# With routing control
from neo4j import RoutingControl

records, summary, keys = driver.execute_query(
    "MATCH (p:Person) RETURN p",
    routing_=RoutingControl.READ
)

Session-Based Transactions#

with driver.session(database="neo4j") as session:
    # Managed transaction (recommended - auto-retry)
    result = session.execute_read(
        lambda tx: tx.run("MATCH (p:Person) RETURN p").data()
    )

    # Write transaction
    session.execute_write(
        lambda tx: tx.run(
            "CREATE (p:Person {name: $name})",
            name="Bob"
        )
    )

Async Support#

Full async/await support mirrors the synchronous API:

from neo4j import AsyncGraphDatabase

async def main():
    async with AsyncGraphDatabase.driver(uri, auth=auth) as driver:
        async with driver.session() as session:
            result = await session.execute_read(
                lambda tx: tx.run("MATCH (n) RETURN n").data()
            )

Async Features#

  • AsyncDriver, AsyncSession, AsyncTransaction
  • AsyncResult with async iteration
  • Compatible with asyncio, FastAPI, aiohttp
  • Shares non-I/O components with sync implementation

Connection Pooling#

Configuration Options#

driver = GraphDatabase.driver(
    uri, auth=auth,
    max_connection_pool_size=100,        # Max connections per host
    connection_acquisition_timeout=60,    # Seconds to wait for connection
    connection_timeout=30,                # TCP connection timeout
    max_connection_lifetime=3600,         # Max age of pooled connection
    liveness_check_timeout=60             # Idle check threshold
)

Best Practices#

  • Create one driver instance per application
  • Driver objects are expensive to create (connection pool setup)
  • Sessions are lightweight - create/close as needed
  • Use context managers for automatic resource cleanup

Transaction Support#

Transaction Types#

  1. Auto-commit: Single statement, no retry

    session.run("CREATE (n:Node)")
  2. Managed Transactions: Recommended - includes retry logic

    session.execute_read(work_function)
    session.execute_write(work_function)
  3. Explicit Transactions: Manual control

    tx = session.begin_transaction()
    try:
        tx.run(query)
        tx.commit()
    except Exception:
        tx.rollback()
        raise

Causal Consistency#

# Bookmark management for causal chains
with driver.session(bookmarks=[bookmark]) as session:
    session.execute_write(work)
    new_bookmark = session.last_bookmark()

Error Handling#

from neo4j.exceptions import (
    ServiceUnavailable,
    TransientError,
    ClientError,
    DatabaseError
)

try:
    driver.execute_query(query)
except ServiceUnavailable:
    ...  # Connection/cluster issues
except TransientError:
    ...  # Retryable errors (deadlock, etc.)
except ClientError:
    ...  # Query syntax, constraint violations

Type System#

Neo4j to Python Type Mapping#

| Neo4j Type | Python Type |
| --- | --- |
| Integer | int |
| Float | float |
| String | str |
| Boolean | bool |
| List | list |
| Map | dict |
| Node | neo4j.graph.Node |
| Relationship | neo4j.graph.Relationship |
| Path | neo4j.graph.Path |
| Date | datetime.date |
| DateTime | datetime.datetime |
| Duration | neo4j.time.Duration |
| Point | neo4j.spatial.Point |

Performance Optimization#

Rust Extensions#

pip install neo4j-rust-ext
  • Drop-in replacement for default transport
  • Significant performance improvement for I/O-heavy workloads
  • No code changes required

Bulk Operations#

# Batch create with UNWIND
session.execute_write(lambda tx: tx.run(
    "UNWIND $batch AS row CREATE (n:Node {id: row.id, name: row.name})",
    batch=[{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
))

Testing#

# Use testcontainers for integration tests
from testcontainers.neo4j import Neo4jContainer

with Neo4jContainer() as neo4j:
    driver = GraphDatabase.driver(
        neo4j.get_connection_url(),
        auth=("neo4j", neo4j.NEO4J_ADMIN_PASSWORD)
    )

Limitations#

  • No built-in ORM/OGM (use neomodel for that)
  • Cypher-only (no Gremlin support)
  • Manual schema management required
  • No migration tooling included

When to Use#

Choose neo4j-driver when:

  • Direct control over queries and transactions is needed
  • Performance is critical (with Rust extensions)
  • Building custom abstractions on top
  • Async support is required

Consider alternatives when:

  • OGM patterns would simplify development (use neomodel)
  • Multi-database portability is needed (use gremlinpython)

Resources#


neomodel - Neo4j Object Graph Mapper#

Overview#

neomodel is a Python Object Graph Mapper (OGM) for Neo4j that provides Django-style model definitions for graph data. It allows developers to work with graph data using Pythonic patterns without writing raw Cypher queries.

Key Information#

| Attribute | Value |
| --- | --- |
| Package | neomodel |
| Version | 6.0.x |
| Python Support | 3.8+ |
| Protocol | Bolt (via neo4j-driver) |
| License | MIT |
| Repository | github.com/neo4j-contrib/neomodel |
| Status | Neo4j Labs (actively maintained) |

Installation#

pip install neomodel

# With extras (includes Shapely for spatial data)
pip install neomodel[extras]

# With Rust driver extensions for performance
pip install neomodel[rust-driver-ext]

Configuration#

from neomodel import config

# Connection string
config.DATABASE_URL = 'bolt://neo4j:password@localhost:7687'

# Or using dataclass configuration (v6.0+)
from neomodel import NeomodelConfig

config = NeomodelConfig(
    driver_options={"max_connection_pool_size": 50},
    database="neo4j",
    auto_install_labels=True
)

Environment Variables#

NEO4J_BOLT_URL=bolt://neo4j:password@localhost:7687
NEO4J_DATABASE=neo4j

Model Definition#

Basic Node Definition#

from neomodel import (
    StructuredNode, StringProperty, IntegerProperty,
    UniqueIdProperty, RelationshipTo, RelationshipFrom
)

class Person(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(required=True)
    age = IntegerProperty(index=True)

    # Relationships
    friends = RelationshipTo('Person', 'FRIENDS_WITH')
    employer = RelationshipTo('Company', 'WORKS_AT')

Property Types#

from neomodel import (
    StringProperty,      # String values
    IntegerProperty,     # Integer values
    FloatProperty,       # Floating point
    BooleanProperty,     # Boolean
    DateProperty,        # datetime.date
    DateTimeProperty,    # datetime.datetime
    UniqueIdProperty,    # Auto-generated UUID
    ArrayProperty,       # Lists
    JSONProperty,        # JSON-serializable dicts
    PointProperty,       # Spatial data
)

Relationship Properties#

from neomodel import StructuredRel, DateTimeProperty

class WorkedAt(StructuredRel):
    start_date = DateTimeProperty()
    end_date = DateTimeProperty()
    role = StringProperty()

class Person(StructuredNode):
    name = StringProperty()
    employers = RelationshipTo('Company', 'WORKED_AT', model=WorkedAt)

CRUD Operations#

Create#

# Create single node
person = Person(name="Alice", age=30).save()

# Create with relationships
company = Company(name="Acme").save()
person.employer.connect(company)

Read#

# Get by property
alice = Person.nodes.get(name="Alice")

# Filter nodes
adults = Person.nodes.filter(age__gte=18)

# All nodes
all_people = Person.nodes.all()

# First match
first_person = Person.nodes.first()

Update#

person = Person.nodes.get(name="Alice")
person.age = 31
person.save()

Delete#

person = Person.nodes.get(name="Alice")
person.delete()

Query API#

Filtering#

# Comparison operators
Person.nodes.filter(age__gt=25)      # Greater than
Person.nodes.filter(age__gte=25)     # Greater or equal
Person.nodes.filter(age__lt=25)      # Less than
Person.nodes.filter(age__lte=25)     # Less or equal
Person.nodes.filter(name__ne="Bob")  # Not equal

# String operators
Person.nodes.filter(name__contains="ali")
Person.nodes.filter(name__startswith="A")
Person.nodes.filter(name__endswith="ce")
Person.nodes.filter(name__icontains="ALI")  # Case insensitive

# List operations
Person.nodes.filter(name__in=["Alice", "Bob"])

Traversal (v6.0+)#

# Advanced traversal with filtering and ordering
results = Person.nodes.filter(name="Alice").traverse(
    relation_type="FRIENDS_WITH",
    filter_expr={"age__gte": 18},
    order_by="age"
)

Raw Cypher#

from neomodel import db

results, meta = db.cypher_query(
    "MATCH (p:Person) WHERE p.age > $age RETURN p",
    {"age": 25}
)

Async Support#

from neomodel import adb, AsyncStructuredNode

class Person(AsyncStructuredNode):
    name = StringProperty()

async def main():
    # Async operations
    person = await Person(name="Alice").save()
    alice = await Person.nodes.get(name="Alice")
    await person.delete()

    # Async traversal
    friends = await alice.friends.all()

Async Configuration#

from neomodel import adb

await adb.set_connection("bolt://localhost:7687")

Schema Management#

Constraints and Indexes#

from neomodel import install_all_labels, install_labels

# Install all constraints and indexes
install_all_labels()

# Install for specific models
install_labels(Person)

Schema Definition#

class Person(StructuredNode):
    # Unique constraint
    email = StringProperty(unique_index=True)

    # Index only
    name = StringProperty(index=True)

    # Required (not null)
    created = DateTimeProperty(required=True)

Hooks#

class Person(StructuredNode):
    name = StringProperty()

    def pre_save(self):
        # Called before saving
        self.name = self.name.strip()

    def post_save(self):
        # Called after saving
        logging.getLogger(__name__).info(f"Saved {self.name}")  # requires: import logging

    def pre_delete(self):
        # Called before deletion
        pass

    def post_delete(self):
        # Called after deletion
        pass

Transaction Support#

from neomodel import db

# Context manager
with db.transaction:
    person = Person(name="Alice").save()
    company = Company(name="Acme").save()
    person.employer.connect(company)

# Explicit control
db.begin()
try:
    person = Person(name="Alice").save()
    db.commit()
except Exception:
    db.rollback()
    raise

Django Integration#

# settings.py
NEOMODEL_NEO4J_BOLT_URL = 'bolt://neo4j:password@localhost:7687'

# models.py
from django_neomodel import DjangoNode
from neomodel import StringProperty

class Person(DjangoNode):
    name = StringProperty()

    class Meta:
        app_label = 'myapp'

Vector and Full-Text Search (v6.0+)#

from neomodel import VectorIndex, FullTextIndex

class Document(StructuredNode):
    content = StringProperty()
    embedding = ArrayProperty()

    # Vector index for semantic search
    __vector_index__ = VectorIndex(
        property_name='embedding',
        dimensions=384
    )

    # Full-text index
    __fulltext_index__ = FullTextIndex(
        property_names=['content']
    )

Performance Considerations#

Batch Operations#

# Wrap bulk inserts in a single transaction to avoid per-node commits
from neomodel import db

with db.transaction:
    for data in large_dataset:
        Person(name=data['name']).save()

Connection Pooling#

Inherited from neo4j-driver configuration - set via driver_options in config.

Limitations#

  • Neo4j-specific (no multi-database portability)
  • No automatic migration tooling (schema drift possible)
  • OGM overhead vs. raw Cypher
  • Complex traversals may require raw Cypher

When to Use#

Choose neomodel when:

  • Django-like model patterns preferred
  • Type safety and validation important
  • Schema enforcement needed
  • Working primarily with Neo4j

Consider alternatives when:

  • Maximum performance required (use neo4j-driver)
  • Multi-database support needed (use gremlinpython)
  • Complex graph algorithms (use raw Cypher)

Resources#


py2neo (End of Life)#

Status: Archived#

IMPORTANT: py2neo is End of Life (EOL) as of 2023. No further updates will be released. Users should migrate to the official Neo4j Python driver.

The project has been transferred to Neo4j Inc. for archival purposes at neo4j-contrib/py2neo.

Overview#

py2neo was a comprehensive Neo4j client library and toolkit providing a high-level API, OGM capabilities, admin tools, and a Cypher lexer for Pygments. It supported both Bolt and HTTP protocols.

Key Information#

Attribute       Value
Package         py2neo
Final Version   2021.2
Python Support  3.x
Protocols       Bolt, HTTP
License         Apache 2.0
Repository      github.com/neo4j-contrib/py2neo (archived)

Historical Features#

Graph Object API#

from py2neo import Graph, Node, Relationship

# Connect to database
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

# Create nodes and relationships
alice = Node("Person", name="Alice")
bob = Node("Person", name="Bob")
knows = Relationship(alice, "KNOWS", bob)

# Merge to database
graph.merge(alice, "Person", "name")

OGM Capabilities#

from py2neo.ogm import GraphObject, Property, RelatedTo

class Person(GraphObject):
    __primarykey__ = "name"

    name = Property()
    born = Property()
    friends = RelatedTo("Person", "KNOWS")

# Usage
person = Person()
person.name = "Alice"
graph.push(person)

Cypher Execution#

# Direct Cypher queries
results = graph.run(
    "MATCH (p:Person {name: $name}) RETURN p",
    name="Alice"
)

for record in results:
    print(record["p"])

Batch Operations#

from py2neo import Graph

tx = graph.begin()
for i in range(1000):
    tx.create(Node("Item", id=i))
    if (i + 1) % 100 == 0:
        tx.commit()
        tx = graph.begin()
tx.commit()

Migration Path#

For low-level access, migrate to the official Neo4j Python driver:

# py2neo (old)
from py2neo import Graph
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
result = graph.run("MATCH (n) RETURN n")

# neo4j-driver (new)
from neo4j import GraphDatabase
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    result = session.run("MATCH (n) RETURN n")

For OGM functionality, migrate to neomodel:

# py2neo OGM (old)
from py2neo.ogm import GraphObject, Property

class Person(GraphObject):
    name = Property()

# neomodel (new)
from neomodel import StructuredNode, StringProperty

class Person(StructuredNode):
    name = StringProperty()

Why py2neo Was Deprecated#

  1. Maintenance Burden: Single maintainer model not sustainable
  2. Official Driver Improvements: Neo4j’s official driver matured significantly
  3. Community Fragmentation: Multiple overlapping libraries caused confusion
  4. Compatibility Challenges: Keeping up with Neo4j versions became difficult

Historical Strengths#

  • Clean, Pythonic API
  • Built-in OGM functionality
  • Cypher lexer for syntax highlighting
  • HTTP fallback when Bolt unavailable
  • Comprehensive documentation

Historical Limitations#

  • Single maintainer created bus factor risk
  • Calendar versioning led to breaking changes
  • No async support
  • Performance overhead vs. official driver
  • Infrequent updates in later years

Lessons for Library Selection#

The py2neo deprecation offers lessons for evaluating graph database libraries:

  1. Prefer Official Drivers: Better long-term support guarantees
  2. Check Maintainer Count: Multiple maintainers reduce abandonment risk
  3. Evaluate Release Frequency: Regular releases indicate active maintenance
  4. Consider Corporate Backing: Libraries backed by database vendors more stable

Current Alternatives#

Need               Recommended Library
Low-level access   neo4j-driver
OGM functionality  neomodel
Async support      neo4j-driver (async)
Bulk operations    neo4j-driver + UNWIND
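The UNWIND pattern referenced for bulk operations sends one parameterized query carrying a list of rows instead of one query per row. A sketch of the client-side batching half (the Cypher string is illustrative; driver and session setup are omitted):

```python
def chunked(rows, size):
    """Yield fixed-size slices of rows for batched UNWIND statements."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# One round trip creates a whole chunk of nodes server-side
UNWIND_QUERY = "UNWIND $rows AS row CREATE (:Person {name: row.name})"

rows = [{"name": f"User{i}"} for i in range(2500)]
batches = list(chunked(rows, 1000))
# With the official driver, each batch would be sent as:
#     session.run(UNWIND_QUERY, rows=batch)
```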

Resources (Archival)#


pydgraph - Dgraph Python Client#

Overview#

pydgraph is the official Python client for Dgraph, a distributed, horizontally scalable graph database. It uses gRPC for high-performance communication and supports Dgraph’s GraphQL-like query language (DQL, formerly GraphQL+-).

Key Information#

Attribute       Value
Package         pydgraph
Version         24.x / 25.x
Python Support  3.7+
Protocol        gRPC
License         Apache 2.0
Repository      github.com/hypermodeinc/pydgraph

Version Compatibility#

Dgraph Version  pydgraph Version
21.03.x         21.03.x
23.0.x          23.0.x
24.0.x          24.0.x
25.0.x          25.0.x

Installation#

pip install pydgraph

Connection Management#

Basic Connection#

import pydgraph

# Create client stub
stub = pydgraph.DgraphClientStub('localhost:9080')

# Create client
client = pydgraph.DgraphClient(stub)

# Close when done
stub.close()

Multiple Stubs (Cluster)#

# Connect to multiple cluster nodes
stub1 = pydgraph.DgraphClientStub('node1:9080')
stub2 = pydgraph.DgraphClientStub('node2:9080')
stub3 = pydgraph.DgraphClientStub('node3:9080')

client = pydgraph.DgraphClient(stub1, stub2, stub3)

Connection Strings#

# Using a connection string (pydgraph v24.2+)
client = pydgraph.open("dgraph://user:pass@host:9080")

# Dgraph Cloud
stub = pydgraph.DgraphClientStub.from_cloud(
    "https://your-instance.cloud.dgraph.io/graphql",
    "your-api-key"
)
client = pydgraph.DgraphClient(stub)

TLS Configuration#

import grpc

# Load credentials
with open('ca.crt', 'rb') as f:
    ca_cert = f.read()

credentials = grpc.ssl_channel_credentials(ca_cert)

stub = pydgraph.DgraphClientStub(
    'localhost:9080',
    credentials=credentials
)

Schema Management#

Alter Schema#

schema = """
    name: string @index(exact) .
    age: int @index(int) .
    email: string @index(hash) @upsert .
    friends: [uid] .

    type Person {
        name
        age
        email
        friends
    }
"""

client.alter(pydgraph.Operation(schema=schema))

Drop Operations#

# Drop all data and schema
client.alter(pydgraph.Operation(drop_all=True))

# Drop specific predicate
client.alter(pydgraph.Operation(drop_attr='name'))

# Drop specific type
client.alter(pydgraph.Operation(drop_op=pydgraph.Operation.TYPE, drop_value='Person'))

Transaction Types#

Read-Write Transaction#

txn = client.txn()
try:
    # Mutations and queries
    response = txn.mutate(set_nquads='_:alice <name> "Alice" .')
    txn.commit()
finally:
    txn.discard()

Read-Only Transaction#

txn = client.txn(read_only=True)
try:
    response = txn.query(query_string)
finally:
    txn.discard()

Best-Effort Transaction#

# For stale reads (better performance)
txn = client.txn(read_only=True, best_effort=True)

Mutations#

JSON Mutations#

import json

txn = client.txn()
try:
    data = {
        'uid': '_:alice',
        'dgraph.type': 'Person',
        'name': 'Alice',
        'age': 30,
        'friends': [
            {'uid': '_:bob', 'dgraph.type': 'Person', 'name': 'Bob'}
        ]
    }

    response = txn.mutate(set_obj=data)

    # Get assigned UIDs
    alice_uid = response.uids['alice']
    bob_uid = response.uids['bob']

    txn.commit()
finally:
    txn.discard()

N-Quads Mutations#

txn = client.txn()
try:
    nquads = """
        _:alice <dgraph.type> "Person" .
        _:alice <name> "Alice" .
        _:alice <age> "30"^^<xs:int> .
    """
    response = txn.mutate(set_nquads=nquads)
    txn.commit()
finally:
    txn.discard()

Delete Mutations#

txn = client.txn()
try:
    # Delete specific predicate
    txn.mutate(del_nquads=f'<{uid}> <name> * .')

    # Delete node completely
    txn.mutate(del_obj={'uid': uid})

    txn.commit()
finally:
    txn.discard()

Queries (DQL)#

Basic Query#

query = """
    {
        people(func: type(Person)) {
            uid
            name
            age
            friends {
                name
            }
        }
    }
"""

response = client.txn(read_only=True).query(query)
result = json.loads(response.json)
people = result['people']

Parameterized Query#

query = """
    query findPerson($name: string) {
        person(func: eq(name, $name)) {
            uid
            name
            age
        }
    }
"""

variables = {'$name': 'Alice'}
response = client.txn(read_only=True).query(query, variables=variables)

Aggregation Queries#

query = """
    {
        stats(func: type(Person)) {
            count: count(uid)
            avgAge: avg(age)
            minAge: min(age)
            maxAge: max(age)
        }
    }
"""

Upsert Operations#

Basic Upsert#

query = """
    query {
        user as var(func: eq(email, "[email protected]"))
    }
"""

mutation = """
    uid(user) <name> "Alice" .
    uid(user) <email> "[email protected]" .
"""

txn = client.txn()
try:
    request = txn.create_request(
        query=query,
        mutations=[pydgraph.Mutation(set_nquads=mutation)]
    )
    response = txn.do_request(request)
    txn.commit()
finally:
    txn.discard()

Conditional Upsert#

query = """
    query {
        user as var(func: eq(email, "[email protected]"))
    }
"""

# Only mutate if user doesn't exist
mutation = pydgraph.Mutation(
    set_nquads='uid(user) <name> "Alice" .',
    cond='@if(eq(len(user), 0))'
)

Async Operations#

pydgraph provides async variants using gRPC futures:

async_alter#

future = client.async_alter(pydgraph.Operation(schema=schema))

# Handle result
try:
    result = pydgraph.DgraphClient.handle_alter_future(future)
except Exception as e:
    if pydgraph.util.is_jwt_expired(e):
        # Refresh token and retry
        pass

async_query and async_mutation#

txn = client.txn()

# Async query
query_future = txn.async_query(query_string)
result = pydgraph.Txn.handle_query_future(query_future)

# Async mutation
mutation_future = txn.async_mutation(set_obj=data)
result = pydgraph.Txn.handle_mutate_future(mutation_future)

Note: Async methods use gRPC futures, not native Python asyncio. They cannot retry on JWT expiration.

ACL and Authentication#

Login#

# Login with credentials
client.login("groot", "password")

# Login with namespace (multi-tenancy)
client.login_into_namespace("user", "password", namespace=1)

Refresh Token#

# Tokens expire - refresh periodically
client.retry_login()

Error Handling#

import grpc

try:
    response = txn.mutate(set_obj=data)
    txn.commit()
except grpc.RpcError as e:
    if e.code() == grpc.StatusCode.ABORTED:
        # Transaction conflict - retry
        pass
    elif e.code() == grpc.StatusCode.UNAUTHENTICATED:
        # JWT expired
        client.retry_login()
except pydgraph.AbortedError:
    # Transaction was aborted due to a conflict - safe to retry
    pass
finally:
    txn.discard()
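Since ABORTED errors are expected under write contention, callers usually wrap the mutate-and-commit logic in a bounded retry loop. A generic sketch (run_with_retries and its arguments are illustrative helpers, not pydgraph APIs):

```python
import random
import time

def run_with_retries(operation, is_retryable, max_attempts=5, base_delay=0.05):
    """Call operation(), retrying with exponential backoff on retryable errors."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter spreads out competing writers
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
```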

Performance Considerations#

Batch Mutations#

# Commit in batches to bound transaction size and memory
BATCH_SIZE = 1000
batch = []

def flush(batch):
    txn = client.txn()
    try:
        txn.mutate(set_obj=batch)
        txn.commit()
    finally:
        txn.discard()

for item in large_dataset:
    batch.append(item)
    if len(batch) >= BATCH_SIZE:
        flush(batch)
        batch = []

if batch:
    flush(batch)

Connection Reuse#

# Reuse client and stubs across requests
# Create once at application startup

Limitations#

  • gRPC futures not native asyncio
  • No connection pooling (manage stubs manually)
  • No OGM layer included
  • DQL learning curve (different from Cypher/Gremlin)
  • Limited IDE support for DQL
  • No built-in migration tooling

When to Use#

Choose pydgraph when:

  • Distributed, horizontally scalable graph needed
  • GraphQL-native development preferred
  • Multi-tenancy (namespaces) required
  • Integration with Dgraph Cloud
  • High-write throughput scenarios

Consider alternatives when:

  • Native asyncio critical
  • OGM patterns preferred
  • Cypher or Gremlin expertise exists
  • Smaller scale deployments

Resources#


python-arango - ArangoDB Python Driver#

Overview#

python-arango is the official Python driver for ArangoDB, providing comprehensive access to ArangoDB’s multi-model capabilities including document, graph, and key-value operations. It offers a Pythonic interface to ArangoDB’s REST API.

Key Information#

Attribute         Value
Package           python-arango
Version           8.2.x
Python Support    3.9+
ArangoDB Support  3.11+
Protocol          HTTP REST
License           MIT
Repository        github.com/arangodb/python-arango

Installation#

pip install python-arango

# For async support (separate package)
pip install python-arango-async

Connection Management#

Basic Connection#

from arango import ArangoClient

# Initialize client
client = ArangoClient(hosts="http://localhost:8529")

# Connect to database
db = client.db("mydb", username="root", password="password")

# System database for admin operations
sys_db = client.db("_system", username="root", password="password")

Connection Options#

client = ArangoClient(
    hosts="http://localhost:8529",
    http_client=None,          # Custom HTTP client
    serializer=None,           # Custom JSON serializer
    deserializer=None          # Custom JSON deserializer
)

Document Operations#

Basic CRUD#

# Get collection
collection = db.collection("users")

# Insert
metadata = collection.insert({"name": "Alice", "age": 30})
# Returns: {'_id': 'users/12345', '_key': '12345', '_rev': '_abc123'}

# Get by key
doc = collection.get("12345")

# Update
collection.update({"_key": "12345", "age": 31})

# Replace
collection.replace({"_key": "12345", "name": "Alice", "age": 31})

# Delete
collection.delete("12345")

Batch Operations#

# Batch insert
docs = [{"name": f"User{i}"} for i in range(1000)]
results = collection.insert_many(docs)

# Batch update
updates = [{"_key": key, "status": "active"} for key in keys]
collection.update_many(updates)

# Batch delete
collection.delete_many([{"_key": k} for k in keys])

AQL Queries#

Query Execution#

# Simple query
cursor = db.aql.execute(
    "FOR doc IN users FILTER doc.age > @min_age RETURN doc",
    bind_vars={"min_age": 25}
)

# Iterate results
for doc in cursor:
    print(doc)

# Get all results as a list (batch() returns only the current batch)
results = list(cursor)

Query Options#

cursor = db.aql.execute(
    query,
    bind_vars={"param": value},
    count=True,           # Include count
    batch_size=100,       # Results per batch
    ttl=3600,             # Cursor TTL in seconds
    max_runtime=30.0,     # Max execution time
    profile=True          # Include query profile
)

# Access statistics
print(cursor.statistics())
print(cursor.profile())

Graph Operations#

Graph Management#

# Create graph
graph = db.create_graph(
    "social",
    edge_definitions=[{
        "edge_collection": "knows",
        "from_vertex_collections": ["users"],
        "to_vertex_collections": ["users"]
    }]
)

# Get existing graph
graph = db.graph("social")

Vertex Operations#

# Get vertex collection
users = graph.vertex_collection("users")

# Insert vertex
users.insert({"_key": "alice", "name": "Alice"})

# Get vertex
alice = users.get("alice")

# Update vertex
users.update({"_key": "alice", "age": 30})

Edge Operations#

# Get edge collection
knows = graph.edge_collection("knows")

# Insert edge
knows.insert({
    "_from": "users/alice",
    "_to": "users/bob",
    "since": 2020
})

# Traverse with AQL (the dedicated traversal endpoint was removed in ArangoDB 3.12)
cursor = db.aql.execute(
    """
    FOR v IN 1..2 OUTBOUND 'users/alice' GRAPH 'social'
        RETURN v
    """
)

Async Support#

python-arango-async (Separate Package)#

from arangoasync import ArangoClient
from arangoasync.auth import Auth

async with ArangoClient(hosts="http://localhost:8529") as client:
    auth = Auth(username="root", password="password")
    db = await client.db("mydb", auth=auth)

    # Async operations
    collection = db.collection("users")
    await collection.insert({"name": "Alice"})

    cursor = await db.aql.execute("FOR doc IN users RETURN doc")
    async for doc in cursor:
        print(doc)

Fire-and-Forget Async (python-arango)#

# Create async execution context
async_db = db.begin_async_execution(return_result=True)

# Queue operations
job1 = async_db.collection("users").insert({"name": "Alice"})
job2 = async_db.collection("users").insert({"name": "Bob"})

# Check job status
print(job1.status())  # 'pending', 'done', or 'error'

# Get results when ready
result1 = job1.result()
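Because these jobs are polled rather than awaited, callers typically loop on status() with a deadline before fetching the result. A generic sketch (wait_for_job is not part of python-arango; it works with any object exposing status() and result()):

```python
import time

def wait_for_job(job, timeout=30.0, interval=0.5):
    """Poll an async job exposing status()/result() until it finishes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if job.status() == "done":
            return job.result()
        time.sleep(interval)
    raise TimeoutError("async job did not finish in time")
```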

Transaction Support#

Stream Transactions#

# Stream transaction (multiple requests, one atomic commit)
txn_db = db.begin_transaction(
    read=["users"],
    write=["orders"]
)

try:
    txn_db.collection("users").insert({"name": "Alice"})
    txn_db.collection("orders").insert({"item": "Book"})
    txn_db.commit_transaction()
except Exception:
    txn_db.abort_transaction()
    raise

JavaScript Transactions#

# Server-side transaction with JavaScript
result = db.execute_transaction(
    read=["users"],
    write=["orders"],
    command="""
        function(params) {
            const db = require('@arangodb').db;
            const user = db.users.insert({name: params.name});
            return user;
        }
    """,
    params={"name": "Alice"}
)

Index Management#

# Create persistent index
collection.add_persistent_index(
    fields=["name", "email"],
    unique=True,
    sparse=False
)

# Create geo index
collection.add_geo_index(
    fields=["location"],
    geo_json=True
)

# Create fulltext index (deprecated, use ArangoSearch)
collection.add_fulltext_index(
    fields=["description"],
    min_length=3
)

# List indexes
for index in collection.indexes():
    print(index)

ArangoSearch Views#

# Create search view
db.create_arangosearch_view(
    name="users_view",
    properties={
        "links": {
            "users": {
                "analyzers": ["text_en"],
                "fields": {
                    "name": {},
                    "bio": {"analyzers": ["text_en"]}
                }
            }
        }
    }
)

# Search query
cursor = db.aql.execute("""
    FOR doc IN users_view
    SEARCH ANALYZER(doc.bio == "developer", "text_en")
    RETURN doc
""")

Error Handling#

from arango.exceptions import (
    ArangoError,
    DocumentInsertError,
    DocumentGetError,
    AQLQueryExecuteError,
    TransactionAbortError
)

try:
    collection.insert({"_key": "duplicate"})
except DocumentInsertError as e:
    print(f"Error code: {e.error_code}")
    print(f"HTTP status: {e.http_code}")
    print(f"Message: {e.error_message}")
except ArangoError as e:
    # Generic error handling
    pass

Foxx Microservices#

# Install Foxx service
db.foxx.install(
    mount="/myapp",
    source="https://github.com/user/foxx-service/archive/main.zip"
)

# Call Foxx endpoint
response = db.foxx.request(
    method="POST",
    mount="/myapp",
    path="/api/endpoint",
    data={"param": "value"}
)

Cluster Support#

# Cluster health
health = sys_db.cluster.server_health()

# Cluster statistics
stats = sys_db.cluster.statistics()

# Rebalance shards
sys_db.cluster.rebalance_shards()

Limitations#

  • No native Python asyncio in main package (use python-arango-async)
  • No OGM layer (document-centric design)
  • HTTP protocol only (no binary protocol)
  • Fire-and-forget async differs from true async

When to Use#

Choose python-arango when:

  • Multi-model database needed (document + graph + key-value)
  • AQL query language preferred
  • Microservice architecture (Foxx)
  • Horizontal scaling required

Consider alternatives when:

  • Pure graph database needed (use Neo4j)
  • Native asyncio critical (use python-arango-async)
  • Gremlin compatibility needed (use gremlinpython)

Resources#


pyTigerGraph - TigerGraph Python Client#

Overview#

pyTigerGraph is the official Python package for interacting with TigerGraph databases. It provides comprehensive access to TigerGraph’s graph analytics and machine learning capabilities, with special emphasis on Graph Data Science (GDS) workflows.

Key Information#

Attribute       Value
Package         pyTigerGraph
Version         1.6.x
Python Support  3.7+
Protocol        REST API
License         Apache 2.0
Repository      github.com/tigergraph/pyTigerGraph

Installation#

# Core package
pip install pyTigerGraph

# With Graph Data Science features
pip install 'pyTigerGraph[gds]'

Connection Management#

Basic Connection#

import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",
    graphname="MyGraph",
    username="tigergraph",
    password="password"
)

# Generate API token
conn.getToken(conn.createSecret())

TigerGraph Cloud#

conn = tg.TigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",
    graphname="MyGraph",
    apiToken="your-api-token"
)

Async Connection#

from pyTigerGraph import AsyncTigerGraphConnection

async_conn = AsyncTigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",
    graphname="MyGraph",
    username="tigergraph",
    password="password"
)

Schema Management#

Object-Oriented Schema (v1.5+)#

# Define vertex types
person = conn.gds.vertexType("Person", [
    ("name", "STRING"),
    ("age", "INT"),
    ("email", "STRING")
])

# Define edge types
knows = conn.gds.edgeType("KNOWS",
    from_vertex="Person",
    to_vertex="Person",
    attributes=[
        ("since", "DATETIME"),
        ("strength", "FLOAT")
    ]
)

# Create graph from schema
conn.gds.createGraph("SocialNetwork", [person], [knows])

GSQL Schema Definition#

# Execute GSQL directly
conn.gsql("""
    CREATE VERTEX Person (
        PRIMARY_ID id STRING,
        name STRING,
        age INT
    )

    CREATE DIRECTED EDGE KNOWS (
        FROM Person,
        TO Person,
        since DATETIME
    )
""")

Data Operations#

Vertex Operations#

# Upsert vertex (insert or update)
conn.upsertVertex(
    vertexType="Person",
    vertexId="alice",
    attributes={"name": "Alice", "age": 30}
)

# Get vertex by ID
vertex = conn.getVerticesById(
    vertexType="Person",
    vertexIds="alice"
)

# Delete vertex by ID
conn.delVerticesById(
    vertexType="Person",
    vertexIds="alice"
)

Edge Operations#

# Upsert edge
conn.upsertEdge(
    sourceVertexType="Person",
    sourceVertexId="alice",
    edgeType="KNOWS",
    targetVertexType="Person",
    targetVertexId="bob",
    attributes={"since": "2020-01-01"}
)

# Get edges
edges = conn.getEdges(
    sourceVertexType="Person",
    sourceVertexId="alice",
    edgeType="KNOWS"
)

Bulk Operations#

# Bulk upsert vertices: (id, attributes) tuples
vertices = [
    ("alice", {"name": "Alice", "age": 30}),
    ("bob", {"name": "Bob", "age": 25}),
    ("charlie", {"name": "Charlie", "age": 35})
]
conn.upsertVertices("Person", vertices)

# Bulk upsert edges: (source_id, target_id, attributes) tuples
edges = [
    ("alice", "bob", {"since": "2020-01-01"})
]
conn.upsertEdges("Person", "KNOWS", "Person", edges)

GSQL Queries#

Installed Queries#

# Install query
conn.gsql("""
    CREATE QUERY findFriends(VERTEX<Person> p) FOR GRAPH MyGraph {
        Start = {p};
        Friends = SELECT t
                  FROM Start:s -(KNOWS)-> Person:t;
        PRINT Friends;
    }
    INSTALL QUERY findFriends
""")

# Run installed query
result = conn.runInstalledQuery("findFriends", {"p": "alice"})

# Async query execution
job_id = conn.runInstalledQuery("longQuery", params={}, runAsync=True)
status = conn.checkQueryStatus([job_id])

Interpreted Queries#

# Run query without installing
result = conn.gsql("""
    INTERPRET QUERY () FOR GRAPH MyGraph {
        Persons = SELECT p FROM Person:p;
        PRINT Persons;
    }
""")

Query Metadata#

# Get query information
metadata = conn.getQueryMetadata("findFriends")

# List running queries
running = conn.getRunningQueries()

# Abort query
conn.abortQuery(["query_id_1", "query_id_2"])

Graph Data Science#

Feature Engineering (with GDS package)#

# Create a featurizer and install GDS algorithms
f = conn.gds.featurizer()
f.installAlgorithm("tg_pagerank")

# Run PageRank
result = f.runAlgorithm(
    "tg_pagerank",
    params={"v_type": "Person", "e_type": "KNOWS"}
)

# Community detection
result = f.runAlgorithm(
    "tg_louvain",
    params={"v_type": "Person", "e_type": "KNOWS"}
)

Graph Neural Networks#

# PyTorch Geometric data loader
from torch_geometric.loader import NeighborLoader

# Create graph data
data = conn.gds.getVertexDataFrame("Person")

# Vertex feature extraction
features = conn.gds.featurizer.extractVertexFeatures(
    v_type="Person",
    attributes=["age", "degree"]
)

# Edge feature extraction
edge_features = conn.gds.featurizer.extractEdgeFeatures(
    e_type="KNOWS",
    attributes=["strength"]
)

Train/Test Split#

# Split vertices for ML
conn.gds.vertexSplitter(
    v_types=["Person"],
    train_fraction=0.8,
    validate_fraction=0.1,
    test_fraction=0.1
)

Error Handling#

from pyTigerGraph.pyTigerGraphException import TigerGraphException

try:
    result = conn.runInstalledQuery("nonexistent")
except TigerGraphException as e:
    print(f"Error: {e}")

Authentication#

Token Management#

# Create secret
secret = conn.createSecret()

# Get token with lifetime
token = conn.getToken(secret, lifetime=86400)  # 24 hours

# Refresh token
new_token = conn.refreshToken(secret)

Role-Based Access#

# With RBAC
conn = tg.TigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",
    graphname="MyGraph",
    username="analyst",
    password="password"
)

Performance Considerations#

Caveats#

From official documentation:

“pyTigerGraph may perform slower than direct HTTP requests to the TigerGraph REST API due to its feature-rich abstraction layer adding URL setup, logging, authentication, and validation.”

Optimization Tips#

# Use bulk operations for large datasets
conn.upsertVertices("Person", large_list, atomic=False)

# Disable unnecessary logging
import logging
logging.getLogger("pyTigerGraph").setLevel(logging.WARNING)

# Use async for long-running queries
job_id = conn.runInstalledQuery("heavyQuery", runAsync=True)

Limitations#

  • REST-only protocol (higher latency than binary protocols)
  • Performance overhead from abstraction layer
  • GSQL learning curve for complex queries
  • Less mature ecosystem than Neo4j/ArangoDB
  • GDS features require additional package

When to Use#

Choose pyTigerGraph when:

  • Graph analytics and ML are primary use cases
  • Large-scale graph processing needed
  • GSQL expertise available
  • TigerGraph Cloud deployment
  • Integration with PyTorch Geometric or DGL needed

Consider alternatives when:

  • Simple CRUD operations primary use case
  • Low latency critical (consider direct REST)
  • Multi-database portability needed
  • Smaller graphs with simpler requirements

Resources#


rdflib - RDF Graph Library for Python#

Overview#

rdflib is a pure Python library for working with RDF (Resource Description Framework) data. It provides comprehensive support for parsing, serializing, and querying RDF graphs using SPARQL, making it the standard choice for semantic web and linked data applications in Python.

Key Information#

Attribute       Value
Package         rdflib
Version         7.2.x
Python Support  3.8+
Query Language  SPARQL 1.1
License         BSD-3-Clause
Repository      github.com/RDFLib/rdflib

Installation#

pip install rdflib

# With optional dependencies
pip install rdflib[html,lxml]

Core Concepts#

RDF Triples#

RDF data consists of triples: (subject, predicate, object)

from rdflib import Graph, Literal, URIRef, Namespace
from rdflib.namespace import RDF, FOAF, XSD

# Create graph
g = Graph()

# Define namespace
EX = Namespace("http://example.org/")

# Add triple
g.add((
    EX.alice,                           # Subject
    FOAF.name,                          # Predicate
    Literal("Alice", datatype=XSD.string)  # Object
))

Node Types#

from rdflib import URIRef, Literal, BNode

# URI Reference (resources)
person = URIRef("http://example.org/alice")

# Literal (values)
name = Literal("Alice")
age = Literal(30, datatype=XSD.integer)
name_en = Literal("Alice", lang="en")

# Blank Node (anonymous)
address = BNode()

Graph Operations#

Creating and Populating Graphs#

from rdflib import Graph, Literal, URIRef, Namespace
from rdflib.namespace import RDF, FOAF

EX = Namespace("http://example.org/")

g = Graph()

# Bind namespace prefixes
g.bind("foaf", FOAF)
g.bind("ex", EX)

# Add triples
alice = URIRef("http://example.org/alice")
bob = URIRef("http://example.org/bob")

g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice")))
g.add((alice, FOAF.knows, bob))

g.add((bob, RDF.type, FOAF.Person))
g.add((bob, FOAF.name, Literal("Bob")))

Querying Triples#

# All triples
for s, p, o in g:
    print(s, p, o)

# Specific patterns
for person in g.subjects(RDF.type, FOAF.Person):
    name = g.value(person, FOAF.name)
    print(f"{person}: {name}")

# Check existence
if (alice, FOAF.knows, bob) in g:
    print("Alice knows Bob")

Removing Triples#

# Remove specific triple
g.remove((alice, FOAF.knows, bob))

# Remove by pattern (None = wildcard)
g.remove((alice, None, None))  # Remove all triples about Alice

SPARQL Queries#

SELECT Queries#

query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?name ?friend
    WHERE {
        ?person a foaf:Person ;
                foaf:name ?name ;
                foaf:knows ?friendUri .
        ?friendUri foaf:name ?friend .
    }
"""

for row in g.query(query):
    print(f"{row.name} knows {row.friend}")

ASK Queries#

query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    ASK {
        ?person foaf:name "Alice" .
    }
"""

result = g.query(query)
print(bool(result))  # True or False

CONSTRUCT Queries#

query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX ex: <http://example.org/>

    CONSTRUCT {
        ?person ex:displayName ?name .
    }
    WHERE {
        ?person foaf:name ?name .
    }
"""

result_graph = g.query(query).graph

Parameterized Queries#

from rdflib.plugins.sparql import prepareQuery

query = prepareQuery("""
    SELECT ?name
    WHERE {
        ?person foaf:name ?name .
    }
""", initNs={"foaf": FOAF})

# With initial bindings
results = g.query(
    query,
    initBindings={'person': alice}
)

SPARQL Update#

update = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    INSERT DATA {
        <http://example.org/charlie> a foaf:Person ;
            foaf:name "Charlie" .
    }
"""

g.update(update)

Serialization#

Parsing RDF#

# Parse from file
g.parse("data.ttl", format="turtle")

# Parse from URL
g.parse("http://example.org/data.rdf")

# Parse from string
g.parse(data=rdf_string, format="turtle")

# Supported formats
formats = ["xml", "turtle", "n3", "nt", "nquads", "trig", "json-ld"]

Serializing RDF#

# Serialize to string
turtle = g.serialize(format="turtle")
jsonld = g.serialize(format="json-ld")

# Serialize to file
g.serialize("output.ttl", format="turtle")

# Available formats
# RDF/XML, N3, NTriples, N-Quads, Turtle, TriG, TriX, JSON-LD, HexTuples

Persistence#

In-Memory (Default)#

g = Graph()  # Default in-memory store

Berkeley DB#

from rdflib import Graph
from rdflib.plugins.stores import BerkeleyDB

store = BerkeleyDB()
g = Graph(store, identifier="mygraph")
g.open("/path/to/store", create=True)

# Use graph...

g.close()

SQLite (via rdflib-sqlalchemy)#

# pip install rdflib-sqlalchemy
from rdflib import Graph
from rdflib_sqlalchemy import registerplugins

registerplugins()

g = Graph(store="SQLAlchemy", identifier="mygraph")
g.open("sqlite:///graph.db", create=True)

Remote SPARQL Endpoints#

SPARQLWrapper#

# pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?label
    WHERE {
        <http://dbpedia.org/resource/Python_(programming_language)>
            rdfs:label ?label .
        FILTER (lang(?label) = 'en')
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()

Federated Queries#

query = """
    SELECT ?name ?abstract
    WHERE {
        ?person foaf:name ?name .
        SERVICE <http://dbpedia.org/sparql> {
            ?dbperson rdfs:label ?name ;
                      dbo:abstract ?abstract .
            FILTER (lang(?abstract) = 'en')
        }
    }
"""

Named Graphs and Datasets#

Conjunctive Graph#

from rdflib import ConjunctiveGraph, URIRef

# Dataset with multiple named graphs
ds = ConjunctiveGraph()

# Add to specific graph
graph1 = URIRef("http://example.org/graph1")
ds.add((alice, FOAF.name, Literal("Alice"), graph1))

# Query across graphs
for ctx in ds.contexts():
    print(f"Graph: {ctx.identifier}")

Dataset#

from rdflib import Dataset

ds = Dataset()
g1 = ds.graph(URIRef("http://example.org/graph1"))
g1.add((alice, FOAF.name, Literal("Alice")))

Custom SPARQL Functions#

from rdflib.plugins.sparql.operators import register_custom_function
from rdflib import Literal, URIRef

def custom_uppercase(value):
    return Literal(str(value).upper())

# Register function
register_custom_function(
    URIRef("http://example.org/uppercase"),
    custom_uppercase
)

# Use in query (prefixes must be declared)
query = """
    PREFIX ex: <http://example.org/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT (ex:uppercase(?name) AS ?upper)
    WHERE { ?person foaf:name ?name }
"""

Async Support#

rdflib is synchronous. For async operations:

import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()

async def async_query(graph, query):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        executor,
        lambda: list(graph.query(query))
    )

# Usage: results = asyncio.run(async_query(g, "SELECT * WHERE { ?s ?p ?o }"))

Inference and Reasoning#

RDFS Inference#

from rdflib import Graph, RDF, RDFS

g = Graph()
g.parse("ontology.ttl")

# Manual RDFS subclass inference: if ?x a ?sub and ?sub subClassOf ?super,
# then ?x a ?super (repeat until no new triples for the full closure)
for sub, _, super_ in g.triples((None, RDFS.subClassOf, None)):
    for instance in g.subjects(RDF.type, sub):
        g.add((instance, RDF.type, super_))

OWL-RL (via owlrl)#

# pip install owlrl
import owlrl
from rdflib import Graph

g = Graph()
g.parse("data.ttl")

# Apply OWL-RL reasoning (materializes inferred triples into g)
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

Limitations#

  • No native async/await support
  • Memory-intensive for large graphs
  • SPARQL performance varies by store
  • Limited OWL reasoning (requires extensions)
  • Not a graph database (in-memory or file-based)

When to Use#

Choose rdflib when:

  • Working with RDF/semantic web data
  • SPARQL queries required
  • Linked data integration needed
  • Ontology processing
  • Standards compliance important (W3C RDF)

Consider alternatives when:

  • Property graph model preferred (use Neo4j)
  • High-performance database needed
  • Native async required
  • Large-scale graph analytics


Graph Database Python Client Recommendations#

Executive Summary#

This document provides optimized recommendations for selecting Python client libraries for graph databases based on common use cases and technical requirements.

Primary Recommendations by Use Case#

1. General-Purpose Graph Applications#

Recommended: neo4j-driver + neomodel

| Component | Library | Rationale |
| --- | --- | --- |
| Low-level access | neo4j-driver | Best-in-class async, connection pooling, Rust extensions |
| OGM layer | neomodel | Django-style models, validation, hooks |

Why this combination:

  • Neo4j has the most mature Python ecosystem
  • neo4j-driver provides native asyncio for modern applications
  • neomodel adds productivity without sacrificing performance
  • Excellent documentation and community support

2. Multi-Database Portability#

Recommended: gremlinpython (with aiogremlin for async)

Compatible databases: Amazon Neptune, Azure Cosmos DB, JanusGraph, and more

Why Gremlin:

  • Standardized query language across vendors
  • Reduces vendor lock-in risk
  • Single codebase for multiple deployment targets

Caveats:

  • Native gremlinpython is synchronous; use aiogremlin or goblin for async
  • Database-specific features may not be accessible
  • Connection pooling has known recovery issues

3. Multi-Model Requirements (Document + Graph + Key-Value)#

Recommended: python-arango

Why ArangoDB:

  • Single database for multiple data models
  • AQL is powerful and SQL-like
  • Good Python driver quality

Async strategy: Use python-arango-async for true asyncio support

4. Semantic Web / RDF / Linked Data#

Recommended: rdflib + SPARQLWrapper

Why rdflib:

  • De facto standard for Python RDF processing
  • Full SPARQL 1.1 support
  • Extensive serialization format support

Limitations:

  • No native async (wrap with thread pools)
  • Not suitable for large-scale production graphs (use graph databases with SPARQL endpoints)

5. High-Scale Graph Analytics and ML#

Recommended: pyTigerGraph[gds]

Why TigerGraph:

  • Built-in graph data science algorithms
  • Direct integration with PyTorch Geometric and DGL
  • Distributed processing for large graphs

Caveats:

  • GSQL learning curve
  • Performance overhead in Python client
  • Smaller community than Neo4j

6. Distributed/Horizontally Scalable Graphs#

Recommended: pydgraph

Why Dgraph:

  • Native horizontal scaling
  • GraphQL-native design
  • gRPC for efficient communication

Caveats:

  • Async uses gRPC futures, not native asyncio
  • Smaller ecosystem than alternatives
  • DQL query language unique to Dgraph

Decision Matrix#

| Requirement | Best Choice | Runner-up |
| --- | --- | --- |
| Best overall Python experience | neo4j-driver | python-arango |
| OGM/Django-style models | neomodel | Goblin (Gremlin) |
| Native async/FastAPI | neo4j-driver | python-arango-async |
| Database portability | gremlinpython | - |
| Multi-model (doc+graph) | python-arango | - |
| Graph ML/Analytics | pyTigerGraph[gds] | - |
| Semantic web/RDF | rdflib | - |
| Horizontal scaling | pydgraph | TigerGraph |
| Cloud-native (AWS) | gremlinpython (Neptune) | - |
| Cloud-native (Azure) | gremlinpython (Cosmos) | - |

Libraries to Avoid#

py2neo (Deprecated)#

  • End of Life - no further updates
  • Migrate to neo4j-driver + neomodel

Framework Integration Recommendations#

FastAPI Applications#

# Recommended stack
neo4j-driver (AsyncDriver)
# OR
python-arango-async

Django Applications#

# Recommended stack
neomodel with django_neomodel

Data Science / Jupyter#

# Recommended stack
pyTigerGraph[gds]  # For graph ML
# OR
rdflib  # For RDF/semantic data

Performance Optimization Tips#

Neo4j Stack#

  1. Install neo4j-rust-ext for 20-40% performance improvement
  2. Use execute_query() for simple operations (avoids session overhead)
  3. Configure connection pool based on concurrency needs
  4. Use UNWIND for bulk operations
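
The UNWIND tip works by sending one parameterized query per batch instead of one query per row. A minimal sketch of the client-side batching in pure Python (the Cypher text, the `driver` handle, and the 1,000-row batch size are illustrative):

```python
from itertools import islice

# Illustrative Cypher: creates one node per map in the $rows list parameter
BULK_INSERT = """
UNWIND $rows AS row
CREATE (p:Person {name: row.name, age: row.age})
"""

def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

rows = [{"name": f"user{i}", "age": 20 + i % 50} for i in range(2500)]
batches = list(chunked(rows, 1000))
# Each batch would then go out in one round trip:
#   driver.execute_query(BULK_INSERT, rows=batch)
```

Batch size trades memory against round trips; around 1,000-10,000 rows per UNWIND is a common starting point.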

ArangoDB Stack#

  1. Use batch methods (insert_many, update_many) for bulk operations
  2. Consider async driver for I/O-bound workloads
  3. Use ArangoSearch for full-text queries instead of AQL filters

Gremlin Stack#

  1. Prefer GraphBinary serialization over GraphSON
  2. Use prepared traversals for repeated queries
  3. Consider Goblin OGM for complex object mapping

Migration Considerations#

From py2neo to neo4j-driver#

  • Replace Graph.run() with session.run() or driver.execute_query()
  • Update transaction patterns to managed transactions
  • Migrate OGM code to neomodel

From SQL to Graph#

  • Start with neomodel for familiar ORM patterns
  • Use Cypher for complex traversals
  • Consider ArangoDB if joining existing document data

Conclusion#

For most Python graph database applications, the Neo4j ecosystem (neo4j-driver + neomodel) offers the best balance of:

  • API quality and Pythonic design
  • Native async support
  • Documentation and community
  • Performance (with Rust extensions)
  • OGM productivity (neomodel)

For specialized requirements (multi-database portability, RDF/semantic web, graph ML, or horizontal scaling), select the specialized library that best matches the use case as outlined above.

S3: Need-Driven

S3 Need-Driven Discovery: Graph Database Client Libraries#

Methodology Overview#

This analysis evaluates Python graph database client libraries through a need-driven lens, matching library capabilities to real-world use case requirements rather than comparing features in isolation.

Analysis Framework#

1. Use Case Decomposition#

Each use case is analyzed across five dimensions:

| Dimension | Questions Addressed |
| --- | --- |
| Graph Model | Property graph vs RDF vs hypergraph? Schema flexibility needs? |
| Query Patterns | Traversal depth? Path finding? Aggregations? Pattern matching? |
| Scale Profile | Node/edge counts? Query concurrency? Growth trajectory? |
| Processing Mode | Real-time OLTP? Batch analytics? Hybrid? |
| Integration Context | REST APIs? Event streams? ETL pipelines? Existing stack? |

2. Library Capability Mapping#

For each use case, libraries are evaluated on:

  • Native support: Does the library directly support required patterns?
  • Performance characteristics: Latency, throughput, memory efficiency
  • Developer experience: API ergonomics, documentation, debugging
  • Operational maturity: Stability, community support, enterprise readiness

3. Gap Analysis#

Identifying where library capabilities fall short:

  • Missing features requiring workarounds
  • Performance limitations at scale
  • Integration friction points
  • Operational blind spots

Use Cases Analyzed#

| Use Case | Primary Pattern | Scale Profile | Processing Mode |
| --- | --- | --- | --- |
| Social Network | Traversal-heavy | High volume, real-time | OLTP |
| Knowledge Graph | Semantic queries | Medium volume, complex | Hybrid |
| Fraud Detection | Pattern matching | High throughput | Real-time + batch |
| Recommendation Engine | Collaborative filtering | Very high volume | Batch + real-time |
| Network Infrastructure | Topology analysis | Medium volume | OLTP + analytics |
| Supply Chain | Path optimization | Medium-high volume | Hybrid |

Evaluation Criteria#

Functional Fit (40%)#

  • Query language expressiveness for use case patterns
  • Data model alignment with domain requirements
  • Built-in algorithms vs custom implementation needs

Performance Fit (30%)#

  • Query latency for typical operations
  • Throughput under concurrent load
  • Memory efficiency for graph size

Operational Fit (20%)#

  • Connection pooling and failover
  • Monitoring and observability hooks
  • Transaction management capabilities

Integration Fit (10%)#

  • Async/await support
  • Framework compatibility (FastAPI, Django, etc.)
  • Data pipeline integration (Pandas, Apache Spark)
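
These weights can be folded into a single fit score per library. A minimal sketch using the percentages above (the per-criterion ratings in `example` are invented placeholders):

```python
# Weights from the evaluation rubric above
WEIGHTS = {"functional": 0.40, "performance": 0.30,
           "operational": 0.20, "integration": 0.10}

def fit_score(scores: dict) -> float:
    """Weighted sum of 0-10 per-criterion scores."""
    return round(sum(WEIGHTS[k] * scores[k] for k in WEIGHTS), 2)

# Hypothetical ratings for one library on one use case
example = {"functional": 9, "performance": 7, "operational": 8, "integration": 9}
print(fit_score(example))  # → 8.2
```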

Libraries Under Evaluation#

| Library | Database | Graph Model | Maturity |
| --- | --- | --- | --- |
| neo4j (official) | Neo4j | Property Graph | Production |
| py2neo | Neo4j | Property Graph | Deprecated (EOL) |
| python-arango | ArangoDB | Multi-model | Production |
| pyTigerGraph | TigerGraph | Property Graph | Production |
| gremlinpython | Various | Property Graph | Production |
| rdflib | Various | RDF/Triple Store | Production |
| NetworkX | In-memory | General | Production |

Deliverables#

  1. Per-use-case analysis: Detailed evaluation of library fit
  2. Recommendation matrix: Best-fit library by use case and constraint
  3. Gap documentation: Known limitations and workarounds

Recommendation Summary: Graph Database Client Libraries by Use Case#

Quick Reference Matrix#

| Use Case | Best Fit | Alternative | Scale Trigger |
| --- | --- | --- | --- |
| Social Network | neo4j | pyTigerGraph | > 100M users |
| Knowledge Graph | neo4j + neosemantics | rdflib (small) | > 1M triples |
| Fraud Detection | pyTigerGraph | neo4j | > 1B transactions |
| Recommendation Engine | neo4j | pyTigerGraph | > 100M users |
| Network Infrastructure | neo4j | python-arango | > 1M resources |
| Supply Chain | neo4j | pyTigerGraph | Global enterprise |

Library Recommendations by Priority#

1. neo4j (Official Driver) - Primary Choice#

Best for: Most graph use cases at moderate scale

Strengths across use cases:

  • Cypher query language is most expressive for graph patterns
  • GDS (Graph Data Science) library covers common algorithms
  • Mature Python driver with async support
  • Strong community and documentation
  • Visualization tools (Bloom) for non-technical users

When to choose neo4j:

  • Team has or can develop Cypher expertise
  • Use case fits property graph model
  • Scale < 1B edges
  • Need graph algorithms (centrality, community, paths)
  • Visualization is important

Installation:

uv pip install neo4j

2. pyTigerGraph - Scale-First Choice#

Best for: High-volume fraud detection, massive recommendation systems

Strengths across use cases:

  • Distributed architecture handles massive scale
  • GSQL optimized for deep traversals
  • Strong financial services and enterprise focus
  • ML workbench for graph embeddings

When to choose pyTigerGraph:

  • Scale exceeds 1B edges
  • Deep traversals (5+ hops) are common
  • Distributed processing required
  • Enterprise budget available
  • Financial/fraud detection primary use case

Installation:

uv pip install pyTigerGraph

3. python-arango - Multi-Model Choice#

Best for: Knowledge graphs with complex documents, cost-sensitive deployments

Strengths across use cases:

  • Combines document + graph in single database
  • Good horizontal scaling
  • Cost-effective (open source core)
  • Flexible schema for evolving models

When to choose python-arango:

  • Need document storage alongside graph
  • Budget constraints on database licensing
  • Schema flexibility is priority
  • Multi-model queries beneficial

Installation:

uv pip install python-arango

4. rdflib - Standards-First Choice#

Best for: Small-to-medium knowledge graphs requiring RDF/SPARQL compliance

Strengths:

  • Full RDF/SPARQL specification compliance
  • Inference engine support
  • Standards-based data exchange
  • Good for linked data applications

When to choose rdflib:

  • RDF/SPARQL compliance required
  • Ontology reasoning needed
  • Scale < 1M triples
  • Academic or research contexts

Installation:

uv pip install rdflib

5. gremlinpython - Portability Choice#

Best for: Multi-database environments, cloud-native deployments

Strengths:

  • Works with many backends (Neptune, JanusGraph, etc.)
  • Cloud-managed options available
  • Standard traversal language

When to choose gremlinpython:

  • Using AWS Neptune or similar managed service
  • Need database portability
  • Multi-cloud strategy

Installation:

uv pip install gremlinpython

6. NetworkX - Analysis Choice#

Best for: Prototyping, offline analysis, algorithm development

Strengths:

  • Rich algorithm library
  • Easy Python integration
  • Great for research and prototyping
  • Integrates with scientific Python stack

When to choose NetworkX:

  • Prototyping graph logic before production
  • Offline batch analysis
  • Algorithm research and development
  • In-memory data fits requirements

Installation:

uv pip install networkx
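
To illustrate the prototyping workflow, a minimal NetworkX session (assumes networkx is installed; the toy graph is invented):

```python
import networkx as nx

# Toy social graph
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("bob", "carol"),
    ("carol", "dave"), ("alice", "carol"),
])

# Built-in algorithms: centrality and shortest paths
centrality = nx.degree_centrality(G)
path = nx.shortest_path(G, "alice", "dave")

print(path)  # → ['alice', 'carol', 'dave']
```

Once the logic is validated here, the same traversal would be re-expressed in the production database's query language.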

Decision Framework#

START
  |
  v
Is scale > 1B edges?
  |-- YES --> pyTigerGraph
  |-- NO --> Continue
  |
  v
Is RDF/SPARQL compliance required?
  |-- YES --> Scale < 1M? --> rdflib
  |           Scale > 1M? --> neo4j + neosemantics
  |-- NO --> Continue
  |
  v
Is document + graph multi-model needed?
  |-- YES --> python-arango
  |-- NO --> Continue
  |
  v
Is database portability required?
  |-- YES --> gremlinpython
  |-- NO --> Continue
  |
  v
Production use case?
  |-- YES --> neo4j (official driver)
  |-- NO --> NetworkX for prototyping
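
The flowchart can be encoded directly as a function (a sketch; the thresholds and branch order mirror the questions above):

```python
def choose_library(edges: int, needs_sparql: bool = False, triples: int = 0,
                   multi_model: bool = False, portable: bool = False,
                   production: bool = True) -> str:
    """Walk the decision flowchart above, top to bottom."""
    if edges > 1_000_000_000:
        return "pyTigerGraph"
    if needs_sparql:
        return "rdflib" if triples < 1_000_000 else "neo4j + neosemantics"
    if multi_model:
        return "python-arango"
    if portable:
        return "gremlinpython"
    return "neo4j" if production else "NetworkX"

print(choose_library(edges=10_000_000))     # → neo4j
print(choose_library(edges=5_000_000_000))  # → pyTigerGraph
```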

Common Hybrid Patterns#

Pattern 1: Neo4j + NetworkX#

  • Neo4j for production serving
  • NetworkX for algorithm prototyping
  • Export graph subset for analysis

Pattern 2: Graph DB + Vector DB#

  • Graph database for relationship queries
  • Vector database (Pinecone, Milvus) for embeddings
  • Combine for hybrid recommendations

Pattern 3: Graph DB + Optimization Solver#

  • Graph database for topology storage
  • OR-Tools/Gurobi for constrained optimization
  • Write optimal solutions back to graph

Gaps Across All Libraries#

| Gap | Workaround |
| --- | --- |
| Real-time graph algorithms | Pre-compute, cache results |
| Temporal queries | Temporal properties, time-bucketed subgraphs |
| Streaming ingestion | External stream processor (Kafka Connect) |
| Multi-tenant isolation | Database-per-tenant or property-based filtering |
| Schema migration | Version properties, migration scripts |
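
The time-bucketing workaround for temporal queries boils down to grouping edges under a coarse time key the database can index. A stdlib sketch with invented events:

```python
from collections import defaultdict
from datetime import datetime

def bucket_key(ts: datetime) -> str:
    """Day-level bucket; store this as an edge property or subgraph label."""
    return ts.strftime("%Y-%m-%d")

events = [
    ("a", "b", datetime(2024, 5, 1, 9, 30)),
    ("b", "c", datetime(2024, 5, 1, 17, 0)),
    ("c", "a", datetime(2024, 5, 2, 8, 15)),
]

# Group edges by bucket; time-scoped queries then touch only relevant buckets
buckets = defaultdict(list)
for src, dst, ts in events:
    buckets[bucket_key(ts)].append((src, dst))

print(sorted(buckets))  # → ['2024-05-01', '2024-05-02']
```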

Final Recommendation#

For teams starting with graph databases in Python:

  1. Start with neo4j official driver - best documentation, most examples
  2. Add NetworkX for prototyping and analysis workflows
  3. Evaluate scale after initial deployment
  4. Consider pyTigerGraph if scaling beyond 1B edges
  5. Consider python-arango if multi-model becomes valuable

Use Case: Fraud Detection#

Domain Description#

Fraud detection leverages graph analysis to identify suspicious patterns in transactions, accounts, and entity relationships. Graphs excel at revealing hidden connections between seemingly unrelated entities, detecting ring structures, and identifying anomalous behavior patterns that traditional tabular analysis misses.

Requirements Analysis#

Graph Model Requirements#

| Aspect | Requirement | Rationale |
| --- | --- | --- |
| Model Type | Property Graph | Need rich properties on both nodes and edges |
| Schema | Flexible | Fraud patterns evolve; schema must adapt quickly |
| Temporality | Time-aware | Transaction timestamps critical for pattern detection |

Key Entity Types:

  • Accounts (bank, merchant, user)
  • Transactions (payments, transfers, purchases)
  • Devices (phones, IP addresses, browsers)
  • Identities (SSN, email, phone numbers)
  • Locations (addresses, GPS coordinates)

Query Pattern Complexity#

Primary Patterns:

  • Ring detection: Circular money flows (A -> B -> C -> A)
  • Shared identity: Multiple accounts sharing device/IP/email
  • Velocity analysis: Transaction frequency and amount patterns
  • Network expansion: Exploring N-hop neighborhood of suspicious entity
  • Similarity matching: Finding accounts with similar behavior patterns

Query Characteristics:

  • Depth: 3-6 hops for pattern detection
  • Time windows: Queries scoped to time ranges (last 24h, 7d, 30d)
  • Aggregation: Sum, count, standard deviation of transactions
  • Pattern matching: Complex subgraph patterns with constraints
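
Ring detection reduces to finding directed cycles in the transfer graph. A pure-Python sketch for length-3 rings over an adjacency map (the accounts are invented; in production this pattern would be expressed as a Cypher or GSQL query):

```python
def find_3_rings(adj):
    """Return each directed 3-cycle once, listed from its smallest node."""
    rings = []
    for a, outs in adj.items():
        for b in outs:
            for c in adj.get(b, ()):
                # a < b and a < c reports each rotation of a cycle only once
                if a in adj.get(c, ()) and a < b and a < c:
                    rings.append((a, b, c))
    return rings

# acct1 -> acct2 -> acct3 -> acct1 is a circular money flow
transfers = {
    "acct1": {"acct2"},
    "acct2": {"acct3"},
    "acct3": {"acct1", "acct4"},
}
print(find_3_rings(transfers))  # → [('acct1', 'acct2', 'acct3')]
```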

Scale Requirements#

| Metric | Typical Range | High Scale |
| --- | --- | --- |
| Accounts (nodes) | 10M - 100M | 1B+ |
| Transactions (edges) | 100M - 10B | 100B+ |
| Real-time queries | 100 - 1K QPS | 10K+ QPS |
| Pattern scans | 1M - 100M/hour | 1B+/hour |

Processing Mode#

  • Real-time: Transaction scoring at payment time (< 100ms)
  • Near-real-time: Alert generation (< 5 min lag)
  • Batch: Pattern discovery, model training (hourly/daily)

Integration Requirements#

  • Transaction streaming (Kafka, Kinesis) for real-time ingestion
  • ML pipeline for fraud scoring models
  • Case management systems for investigation workflows
  • Regulatory reporting and audit trails
  • Alert delivery (email, SMS, dashboards)

Library Evaluation#

neo4j (Official Driver)#

Strengths:

  • Excellent pattern matching with Cypher
  • GDS library has community detection, PageRank for risk scoring
  • Good transaction support for consistent writes
  • Bloom visualization for investigators

Limitations:

  • Real-time scoring at 10K+ TPS challenging
  • Temporal queries require careful indexing
  • Graph algorithms not available in Community edition

Fit Score: 8/10

pyTigerGraph#

Strengths:

  • Built for high-throughput transaction processing
  • GSQL optimized for deep link analysis
  • Native support for temporal patterns
  • Designed for financial services scale

Limitations:

  • Enterprise licensing costs
  • Steeper learning curve for GSQL
  • Smaller Python community

Fit Score: 9/10 (high scale); 7/10 (smaller deployments)

python-arango#

Strengths:

  • Good throughput for transaction ingestion
  • Multi-model allows storing raw transaction documents
  • Flexible schema for evolving fraud patterns
  • Cost-effective scaling

Limitations:

  • Graph algorithms less mature than Neo4j GDS
  • Pattern matching syntax less expressive
  • Smaller fraud detection community

Fit Score: 7/10

gremlinpython (with Neptune)#

Strengths:

  • Managed service reduces operational burden
  • Good for AWS-native architectures
  • Scales horizontally

Limitations:

  • Query latency can be variable
  • Limited graph algorithm support
  • Gremlin verbose for complex patterns

Fit Score: 6/10

NetworkX (with external storage)#

Strengths:

  • Rich algorithm library for analysis
  • Good for offline pattern discovery
  • Easy prototyping of detection logic

Limitations:

  • In-memory only (not for production scale)
  • No persistence or transactions
  • Cannot handle real-time requirements

Fit Score: 4/10 (analysis only)

Gaps and Workarounds#

| Gap | Impact | Workaround |
| --- | --- | --- |
| Real-time graph algorithms | Cannot run PageRank per transaction | Pre-compute risk scores, incremental updates |
| Temporal pattern matching | Limited native time-series support | Time-bucketed subgraphs, temporal indices |
| Streaming ingestion | Not all drivers handle high-volume streams | Kafka Connect, custom streaming layer |
| Explainability | Graph patterns hard to explain to regulators | Path export, visualization, rule extraction |
| Model integration | Limited native ML support | Feature extraction to external ML pipeline |

Architecture Pattern#

[Transaction Stream]
        |
        v
[Stream Processor] -- real-time features --> [ML Scoring Service]
        |
        v
[Graph Database] <-- enrichment queries
        |
        v
[Batch Analytics] -- pattern discovery --> [Rule Engine Update]

Hybrid Approach:

  1. Real-time: Feature extraction + ML scoring (sub-100ms)
  2. Near-real-time: Graph enrichment queries (100ms-1s)
  3. Batch: Deep pattern analysis, model retraining
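
The real-time tier can be sketched as a scorer that reads graph features pre-computed by the batch tier (the feature names, weights, and threshold here are illustrative, not a real model):

```python
# Pre-computed nightly by batch graph analytics, keyed by account id
risk_features = {
    "acct42": {"ring_member": True, "shared_device_count": 3},
    "acct7": {"ring_member": False, "shared_device_count": 0},
}

def score_transaction(account_id: str, amount: float) -> float:
    """Real-time tier: combine cached graph features with transaction data."""
    f = risk_features.get(account_id, {})
    score = 0.0
    if f.get("ring_member"):
        score += 0.5
    score += min(f.get("shared_device_count", 0) * 0.1, 0.3)
    if amount > 10_000:
        score += 0.2
    return round(score, 2)

print(score_transaction("acct42", 15_000))  # → 1.0
print(score_transaction("acct7", 50))       # → 0.0
```

The sub-100ms budget is met because scoring is a dictionary lookup; the expensive graph queries ran offline.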

Recommendation#

Best Fit: pyTigerGraph for enterprise fraud detection

At the scale typical for financial fraud detection (billions of transactions), TigerGraph’s distributed architecture and GSQL’s pattern-matching capabilities make it the strongest choice. Its financial-services focus means the platform is battle-tested at exactly this scale.

Alternative: neo4j official driver for smaller deployments or teams with existing Cypher expertise. The GDS library provides excellent algorithm support for pattern discovery.

Hybrid pattern: Use Neo4j/TigerGraph for graph storage and queries, with NetworkX for offline algorithm development and prototyping.


Use Case: Knowledge Graph#

Domain Description#

Knowledge graphs represent entities and their semantic relationships, enabling structured knowledge representation, reasoning, and discovery. Common applications include enterprise knowledge management, semantic search, question answering systems, and data integration across disparate sources.

Requirements Analysis#

Graph Model Requirements#

| Aspect | Requirement | Rationale |
| --- | --- | --- |
| Model Type | RDF/Triple Store OR Property Graph | RDF for standards compliance; Property Graph for flexibility |
| Schema | Ontology-driven | Need formal type hierarchies and relationship constraints |
| Semantics | Rich typing | Entities have types, properties have ranges, relationships have semantics |
Model Choice Considerations:

  • RDF/SPARQL: Best for standards compliance, linked data, inference
  • Property Graph: Better performance, easier development, less formal semantics

Query Pattern Complexity#

Primary Patterns:

  • Semantic traversal: Following typed relationships with constraints
  • Inference queries: Deriving implicit relationships from explicit ones
  • Faceted search: Filtering entities by multiple attribute combinations
  • Path queries: Finding connection paths with semantic constraints

Query Characteristics:

  • Depth: Variable (1-10+ hops depending on question complexity)
  • Filtering: Heavy use of type and property constraints
  • Aggregation: Counting, grouping by entity types
  • Reasoning: RDFS/OWL inference for RDF; manual for property graphs

Scale Requirements#

| Metric | Typical Range | High Scale |
| --- | --- | --- |
| Entities (nodes) | 100K - 10M | 100M+ |
| Facts (edges) | 1M - 100M | 1B+ |
| Concurrent queries | 10 - 100 QPS | 1K+ QPS |
| Ontology complexity | 100 - 1K classes | 10K+ classes |

Processing Mode#

  • Primary: Interactive queries for search and exploration
  • Secondary: Batch ingestion from source systems
  • Latency target: < 500ms for exploratory queries; < 100ms for autocomplete

Integration Requirements#

  • NLP pipelines for entity extraction and linking
  • Data integration from multiple source systems (databases, APIs, documents)
  • Search engines (Elasticsearch) for full-text capabilities
  • Visualization tools for graph exploration
  • LLM integration for natural language querying

Library Evaluation#

rdflib#

Strengths:

  • Native RDF/SPARQL support
  • Standards compliant (W3C specifications)
  • Good for small-to-medium knowledge graphs
  • Inference engine support (OWL-RL)

Limitations:

  • In-memory by default (persistence requires plugins)
  • Performance degrades above 1M triples
  • Limited concurrent query support
  • No built-in clustering

Fit Score: 7/10 (small-medium); 4/10 (large scale)

neo4j (Official Driver)#

Strengths:

  • Excellent query performance at scale
  • Flexible property graph for evolving ontologies
  • Full-text search integration
  • Strong Python ecosystem

Limitations:

  • No native RDF/SPARQL (requires neosemantics plugin)
  • No built-in inference engine
  • Ontology constraints require manual enforcement

Fit Score: 8/10

python-arango#

Strengths:

  • Multi-model allows combining document + graph
  • Good for knowledge graphs with rich entity attributes
  • Full-text search built-in
  • Scales well horizontally

Limitations:

  • No RDF/SPARQL support
  • Limited semantic reasoning capabilities
  • Smaller knowledge graph community

Fit Score: 7/10

gremlinpython (with Neptune/JanusGraph)#

Strengths:

  • Cloud-native options (AWS Neptune)
  • Supports both property graph and RDF modes
  • Good for large-scale deployments

Limitations:

  • Verbose query syntax for complex patterns
  • Variable performance across backends
  • Less intuitive for semantic queries

Fit Score: 6/10

pyTigerGraph#

Strengths:

  • Excellent scale for massive knowledge graphs
  • GSQL supports complex pattern matching
  • Built-in ML workbench for embeddings

Limitations:

  • Enterprise-focused (cost considerations)
  • Steeper learning curve
  • Limited RDF ecosystem integration

Fit Score: 7/10 (large scale)

Gaps and Workarounds#

| Gap | Impact | Workaround |
| --- | --- | --- |
| Inference across libraries | Most lack native reasoning | External reasoner (HermiT, Pellet) or pre-materialization |
| Schema evolution | Ontology changes disruptive | Versioned schemas, migration scripts |
| Multilingual support | Limited language handling | External NLP, language-tagged properties |
| Provenance tracking | Need to track fact sources | Custom edge properties for provenance |
| Temporal knowledge | Facts change over time | Temporal properties, versioned subgraphs |

Hybrid Architecture Pattern#

For production knowledge graphs, consider a hybrid approach:

[RDFLib for ontology management]
        |
        v
[Neo4j/ArangoDB for query execution]
        |
        v
[Elasticsearch for full-text search]

This combines:

  • RDFLib’s semantic capabilities for schema management
  • Property graph’s query performance for runtime
  • Search engine’s text capabilities for discovery

Recommendation#

Best Fit: neo4j official driver with neosemantics plugin

For knowledge graph applications requiring both semantic expressiveness and query performance, Neo4j with the neosemantics (n10s) plugin provides the best balance. It supports RDF import/export while leveraging Cypher’s performance for queries.

Alternative: rdflib for smaller knowledge graphs (< 1M triples) where standards compliance and inference are primary requirements.

Alternative: python-arango when knowledge entities have complex nested attributes and document-style storage is beneficial.


Use Case: Network Infrastructure#

Domain Description#

Network infrastructure graphs model the topology and dependencies of IT systems, including physical networks, cloud resources, microservices, and their interconnections. Use cases include impact analysis, root cause detection, capacity planning, and configuration management.

Requirements Analysis#

Graph Model Requirements#

| Aspect | Requirement | Rationale |
| --- | --- | --- |
| Model Type | Property Graph | Rich metadata on nodes (config, status); typed edges |
| Schema | Semi-structured | Core types stable; vendor-specific attributes vary |
| Hierarchy | Multi-level | Physical -> logical -> application layers |

Key Entity Types:

  • Physical: Servers, switches, routers, data centers
  • Virtual: VMs, containers, Kubernetes pods
  • Application: Services, databases, APIs, queues
  • Configuration: Ports, IPs, certificates, credentials
  • Connections: Network links, API calls, data flows

Query Pattern Complexity#

Primary Patterns:

  • Impact analysis: “What is affected if server X fails?”
  • Root cause: “What upstream dependencies could cause service Y to fail?”
  • Path analysis: Network paths between two endpoints
  • Configuration drift: Finding misconfigurations across related resources
  • Dependency depth: How deep is the dependency tree for a service?

Query Characteristics:

  • Depth: Variable (1-hop for direct deps, 5+ for full impact)
  • Direction: Both upstream and downstream traversals
  • Filtering: By resource type, status, environment
  • Aggregation: Counting affected resources, grouping by type
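
Downstream impact analysis is a transitive traversal over dependency edges. A stdlib breadth-first sketch with an invented topology; in a graph database this becomes a variable-length path query:

```python
from collections import deque

def impacted(dependents: dict, failed: str) -> set:
    """All resources transitively depending on `failed` (upstream BFS)."""
    seen, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# X -> the things that run on or depend on X
dependents = {
    "server1": {"vm1", "vm2"},
    "vm1": {"api-service"},
    "api-service": {"web-frontend"},
}
print(sorted(impacted(dependents, "server1")))
# → ['api-service', 'vm1', 'vm2', 'web-frontend']
```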

Scale Requirements#

| Metric | Typical Range | High Scale |
| --- | --- | --- |
| Resources (nodes) | 10K - 100K | 1M+ |
| Dependencies (edges) | 50K - 1M | 10M+ |
| Query frequency | 10 - 100 QPS | 1K+ QPS |
| Update frequency | 100 - 10K/min | 100K/min |

Processing Mode#

  • Real-time: Incident impact analysis (< 1s)
  • Near-real-time: Topology updates from discovery (< 1min lag)
  • Batch: Full topology reconciliation, analytics

Integration Requirements#

  • CMDB/asset management systems
  • Monitoring tools (Prometheus, Datadog, Nagios)
  • Cloud provider APIs (AWS, GCP, Azure)
  • Container orchestration (Kubernetes API)
  • Incident management (PagerDuty, ServiceNow)
  • IaC tools (Terraform, Ansible) for configuration

Library Evaluation#

neo4j (Official Driver)#

Strengths:

  • Excellent for dependency traversal queries
  • Cypher’s variable-length paths perfect for impact analysis
  • Good visualization integration for operations teams
  • APOC procedures for graph algorithms

Limitations:

  • Schema flexibility can lead to inconsistency
  • Need careful index strategy for large topologies
  • Single-node architecture limits write scale

Fit Score: 9/10

python-arango#

Strengths:

  • Multi-model stores rich configuration documents
  • Good for combining graph with document queries
  • Horizontal scaling for large infrastructures
  • Cost-effective for moderate scale

Limitations:

  • AQL traversal syntax less intuitive than Cypher
  • Fewer infrastructure-specific examples
  • Smaller operations/SRE community

Fit Score: 7/10

pyTigerGraph#

Strengths:

  • Scales well for very large infrastructures
  • Good for cross-region federated topologies
  • GSQL handles complex impact queries

Limitations:

  • Overkill for most infrastructure use cases
  • Enterprise licensing costs
  • Steeper learning curve

Fit Score: 6/10 (typical); 8/10 (very large scale)

gremlinpython (with Neptune/JanusGraph)#

Strengths:

  • Cloud-native options integrate with cloud monitoring
  • Standard traversal language
  • Good for multi-cloud environments

Limitations:

  • Verbose for operational queries
  • Debugging traversals challenging during incidents
  • Variable performance

Fit Score: 6/10

NetworkX#

Strengths:

  • Excellent for topology analysis algorithms
  • Easy integration with Python operations tools
  • Good for offline planning and analysis

Limitations:

  • In-memory only
  • Cannot handle real-time incident queries
  • No persistence

Fit Score: 5/10 (analysis only)

Gaps and Workarounds#

| Gap | Impact | Workaround |
| --- | --- | --- |
| Real-time topology updates | Discovery lag during changes | Event-driven updates, eventual consistency |
| Multi-layer correlation | Physical-logical-app mapping complex | Typed edges, layer property on nodes |
| Historical topology | Need point-in-time topology | Temporal properties, snapshot graphs |
| Dynamic environments | Kubernetes pods ephemeral | Aggregate by service, not pod |
| Cross-system correlation | Multiple source systems | Canonical ID mapping layer |

Architecture Pattern#

[Discovery Sources]
   |-- Cloud APIs
   |-- K8s API
   |-- CMDB
   |-- Network monitoring
        |
        v
[Topology Aggregator] -- canonical model --> [Graph Database]
        |                                           |
        v                                           v
[Change Event Stream]                        [Query API]
        |                                           |
        v                                           v
[Alert Enrichment]                           [Dashboard/CLI]

Operational Queries:

// Impact analysis: What services are affected if this server fails?
MATCH (server:Server {id: $serverId})<-[:RUNS_ON*1..3]-(service:Service)
RETURN service.name, service.criticality

// Root cause: What could cause this API to fail?
MATCH path = (api:API {name: $apiName})-[:DEPENDS_ON*1..5]->(dep)
WHERE dep.status = 'unhealthy'
RETURN path

// Dependency depth
MATCH path = (service:Service {name: $svc})-[:DEPENDS_ON*]->(dep)
RETURN max(length(path)) as maxDepth

Operational Considerations#

| Consideration | Approach |
| --- | --- |
| Incident response | Pre-computed impact sets for critical services |
| Discovery frequency | Balance freshness vs database load |
| Schema evolution | Version type hierarchies, migration scripts |
| Access control | Environment-based graph partitioning |
| Audit trail | Change log for topology modifications |

Recommendation#

Best Fit: neo4j official driver

For network infrastructure and dependency mapping, Neo4j’s Cypher language provides the most natural expression of dependency queries. The ability to write variable-length path queries (-[:DEPENDS_ON*1..5]->) makes impact analysis and root cause queries intuitive.

Key advantages for operations:

  • Fast time-to-value with intuitive query language
  • Bloom visualization for non-technical stakeholders
  • Active operations/SRE community with examples

Alternative: python-arango when infrastructure includes complex configuration documents that benefit from document storage alongside graph relationships.

Complement with NetworkX for offline topology analysis, capacity planning, and what-if simulations that don’t need real-time data.


Use Case: Recommendation Engine#

Domain Description#

Recommendation engines leverage graph structures to model user-item relationships, enabling collaborative filtering, content-based recommendations, and hybrid approaches. Graphs naturally represent the bipartite relationship between users and items, as well as item-item and user-user similarities.

Requirements Analysis#

Graph Model Requirements#

Aspect | Requirement | Rationale
Model Type | Property Graph | Edge weights for ratings/interactions; rich node properties
Structure | Bipartite core | User-Item relationships; User-User and Item-Item derived
Weights | Numeric edge properties | Ratings, interaction counts, recency scores

Key Entity Types:

  • Users (profiles, preferences, segments)
  • Items (products, content, services)
  • Interactions (views, purchases, ratings, saves)
  • Categories/Tags (content metadata)

Query Pattern Complexity#

Primary Patterns:

  • Collaborative filtering: “Users who liked X also liked Y”
  • User neighborhood: Similar users based on shared interactions
  • Item neighborhood: Similar items based on shared user base
  • Path-based recommendations: Multi-hop reasoning (A likes B, B similar to C)
  • Popularity queries: Top items by interaction count

Query Characteristics:

  • Depth: 2-3 hops typical (user -> item -> similar items)
  • Aggregation: Heavy (counting, averaging, ranking)
  • Filtering: By recency, category, availability
  • Personalization: User-specific traversal starting points
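The "users who liked X also liked Y" pattern reduces to co-occurrence counting over the bipartite graph. A minimal in-memory sketch (the data and the `also_liked` helper are hypothetical, for illustration only):

```python
from collections import Counter

def also_liked(interactions, item, top_n=3):
    """'Users who liked X also liked Y': rank other items by how
    many of X's users also interacted with them."""
    fans = {user for user, items in interactions.items() if item in items}
    counts = Counter()
    for user in fans:
        for other in interactions[user]:
            if other != item:
                counts[other] += 1
    return [i for i, _ in counts.most_common(top_n)]

# Hypothetical user -> item interaction sets
interactions = {
    "alice": {"book-a", "book-b", "book-c"},
    "bob":   {"book-a", "book-b"},
    "carol": {"book-a", "book-d"},
    "dave":  {"book-d"},
}
print(also_liked(interactions, "book-a"))
```

In a graph database this is a single 2-hop traversal with aggregation; the in-memory version is useful mainly for testing recommendation logic.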

Scale Requirements#

Metric | Typical Range | High Scale
Users (nodes) | 100K - 10M | 100M+
Items (nodes) | 10K - 1M | 10M+
Interactions (edges) | 10M - 1B | 100B+
Recommendation requests | 100 - 10K QPS | 100K+ QPS
Latency target | < 100ms | < 50ms

Processing Mode#

  • Real-time: Serving recommendations (< 100ms)
  • Batch: Computing similarity matrices, embeddings (hourly/daily)
  • Incremental: Updating recommendations as new interactions arrive

Integration Requirements#

  • API layer for client applications (REST, GraphQL)
  • Event streaming for real-time interaction capture
  • Feature store for ML model features
  • A/B testing infrastructure for recommendation experiments
  • Analytics for recommendation performance tracking

Library Evaluation#

neo4j (Official Driver)#

Strengths:

  • Cypher excellent for collaborative filtering queries
  • GDS library has similarity algorithms (cosine, Jaccard)
  • Node embedding algorithms for hybrid approaches
  • Good caching for repeated query patterns

Limitations:

  • Real-time computation at scale challenging
  • Need GDS for similarity algorithms (Enterprise)
  • No native matrix operations

Fit Score: 8/10

python-arango#

Strengths:

  • Good performance for bipartite graph queries
  • Multi-model allows storing item metadata as documents
  • Cost-effective scaling for high interaction volumes

Limitations:

  • Limited built-in similarity algorithms
  • Less mature recommendation-specific ecosystem
  • Need custom similarity implementations

Fit Score: 6/10

pyTigerGraph#

Strengths:

  • Excellent scale for high-volume interactions
  • GSQL supports complex aggregation patterns
  • Built-in ML workbench for embeddings
  • Graph feature extraction for ML models

Limitations:

  • Enterprise cost considerations
  • Overkill for smaller catalogs
  • Steeper learning curve

Fit Score: 8/10 (large scale); 6/10 (smaller deployments)

gremlinpython#

Strengths:

  • Works with multiple backends
  • Standard traversal patterns
  • Cloud options available

Limitations:

  • Verbose for aggregation-heavy queries
  • No built-in similarity algorithms
  • Performance varies by backend

Fit Score: 5/10

NetworkX#

Strengths:

  • Rich algorithm library (bipartite algorithms)
  • Easy prototyping of recommendation logic
  • Good for offline analysis and testing

Limitations:

  • In-memory only
  • Cannot serve real-time recommendations
  • No persistence

Fit Score: 3/10 (prototyping only)

Gaps and Workarounds#

Gap | Impact | Workaround
Real-time similarity | Cannot compute on-the-fly at scale | Pre-computed similarity cache
Cold start | New users/items have no connections | Content-based fallback, popularity-based
Implicit feedback | View != purchase signal strength | Weight tuning, decay functions
Diversity | Graph algorithms tend toward popular items | Re-ranking layer, exploration bonus
Explanation | Hard to explain graph-based recommendations | Path extraction, rule-based overlays
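The weight-tuning/decay workaround for implicit feedback can be as simple as an exponential half-life. A hedged sketch (the per-event base weights and the 30-day half-life are illustrative assumptions to be tuned per product):

```python
def interaction_weight(event_type, age_days, half_life_days=30.0):
    """Weight an implicit-feedback event: a base strength per event
    type, decayed exponentially by age with a configurable half-life."""
    base = {"view": 1.0, "save": 3.0, "purchase": 10.0}[event_type]
    return base * 0.5 ** (age_days / half_life_days)

# A month-old purchase still outweighs a fresh view
print(interaction_weight("purchase", age_days=30))  # 5.0
print(interaction_weight("view", age_days=0))       # 1.0
```

Stored as an edge property, this weight feeds directly into similarity and ranking queries.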

Architecture Pattern#

[Interaction Events]
        |
        v
[Stream Processor] --> [Real-time Features]
        |
        v
[Graph Database] <-- batch similarity updates
        |
        v
[Recommendation Service]
        |
        v
[Cache Layer] --> [API Response]

Hybrid Recommendation Pattern:

  1. Graph-based collaborative filtering for relationship signals
  2. Embedding-based similarity for scale and cold start
  3. Business rules layer for diversity, freshness, inventory
  4. Caching layer for latency requirements

Pre-computation Strategy#

For production recommendation systems, pre-compute:

Computation | Frequency | Storage
Item-item similarity top-K | Daily | Graph edges or Redis
User-item affinity scores | Hourly | Feature store
User segments | Daily | User properties
Popular items per category | Hourly | Cache layer
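The daily item-item top-K job is essentially a Jaccard computation over each item's user set. A small in-memory sketch of that batch step (hypothetical data and helper name; at production scale this would run via GDS or a stream processor rather than pairwise in Python):

```python
from itertools import combinations

def topk_item_similarity(interactions, k=2):
    """Batch job: for every item, pre-compute the top-k most similar
    items by Jaccard similarity of their user sets."""
    users_by_item = {}
    for user, items in interactions.items():
        for item in items:
            users_by_item.setdefault(item, set()).add(user)
    sims = {item: [] for item in users_by_item}
    for a, b in combinations(users_by_item, 2):
        ua, ub = users_by_item[a], users_by_item[b]
        jaccard = len(ua & ub) / len(ua | ub)
        if jaccard > 0:
            sims[a].append((b, jaccard))
            sims[b].append((a, jaccard))
    return {item: sorted(pairs, key=lambda p: -p[1])[:k]
            for item, pairs in sims.items()}

# Hypothetical user -> item interaction lists
interactions = {
    "u1": ["item-a", "item-b"],
    "u2": ["item-a", "item-b"],
    "u3": ["item-a", "item-c"],
}
print(topk_item_similarity(interactions)["item-a"])
```

The output pairs would then be written back as similarity edges or pushed to Redis, per the table above.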

Recommendation#

Best Fit: neo4j official driver for most recommendation use cases

Neo4j’s combination of expressive queries (Cypher) and graph algorithms (GDS) makes it well-suited for recommendation systems. The ability to compute Jaccard similarity, node embeddings, and community detection in the database enables sophisticated recommendations.

Alternative: pyTigerGraph for very high-scale systems (100M+ users, 1B+ interactions) where distributed processing is essential.

Hybrid pattern: Use the graph database for relationship storage and collaborative filtering queries, combined with vector similarity search (Pinecone, Milvus) for embedding-based recommendations and caching (Redis) for serving latency.


Use Case: Social Network Graph#

Domain Description#

Social networks model relationships between users including follows, friendships, group memberships, content sharing, and interactions. The graph structure captures the social fabric that enables features like friend suggestions, feed ranking, and influence analysis.

Requirements Analysis#

Graph Model Requirements#

Aspect | Requirement | Rationale
Model Type | Property Graph | Nodes need rich attributes (profiles); edges need properties (timestamp, strength)
Schema | Semi-flexible | Core user/relationship types stable; new interaction types added frequently
Directionality | Mixed | Follows are directional; friendships are bidirectional

Query Pattern Complexity#

Primary Patterns:

  • Friend-of-friend traversal: 2-3 hop neighborhood exploration
  • Mutual connections: Finding common neighbors between two users
  • Shortest path: Degrees of separation between users
  • Influence propagation: Multi-hop traversal with aggregation

Query Characteristics:

  • Depth: Typically 2-4 hops (beyond 4 hops performance degrades rapidly)
  • Breadth: Can explode (users with 10K+ connections)
  • Aggregation: Count, distinct, top-N patterns common
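Two of these patterns, mutual connections and friend-of-friend suggestions, fit in a few lines of set algebra. A minimal in-memory sketch (the data and function names are hypothetical, for illustration only):

```python
def mutual_connections(follows, a, b):
    """Common neighbours of two users."""
    return follows.get(a, set()) & follows.get(b, set())

def friend_of_friend(follows, user, limit=5):
    """2-hop suggestions: people my connections connect to, excluding
    existing connections and myself, ranked by mutual-connection count."""
    direct = follows.get(user, set())
    scores = {}
    for friend in direct:
        for candidate in follows.get(friend, set()):
            if candidate != user and candidate not in direct:
                scores[candidate] = scores.get(candidate, 0) + 1
    return sorted(scores, key=lambda c: -scores[c])[:limit]

# Hypothetical follow graph
follows = {
    "ana": {"ben", "cam"},
    "ben": {"ana", "dia"},
    "cam": {"ana", "dia", "eli"},
    "dia": {"ben"},
}
print(friend_of_friend(follows, "ana"))
```

The breadth-explosion risk noted above shows up here too: for a user with 10K+ connections, the inner loop is exactly what needs sampling or pre-computation.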

Scale Requirements#

Metric | Typical Range | High Scale
Users (nodes) | 100K - 10M | 100M+
Relationships (edges) | 1M - 500M | 10B+
Concurrent queries | 100 - 1K QPS | 10K+ QPS
Write throughput | 100 - 10K/sec | 100K+/sec

Processing Mode#

  • Primary: Real-time OLTP for user-facing features
  • Secondary: Batch analytics for recommendations and insights
  • Latency target: < 50ms for interactive queries

Integration Requirements#

  • REST/GraphQL API layer for mobile/web clients
  • Event streaming for activity feeds (Kafka, Redis Streams)
  • ML pipeline integration for recommendation models
  • Analytics warehouse sync for business intelligence

Library Evaluation#

neo4j (Official Driver)#

Strengths:

  • Excellent Cypher support for complex traversals
  • Native path-finding algorithms (shortest path, all paths)
  • Strong transaction support for consistent updates
  • Async driver available for high concurrency

Limitations:

  • Single-database focus limits multi-tenancy options
  • Graph algorithms require separate APOC/GDS plugins
  • Connection pooling configuration can be complex

Fit Score: 8/10

py2neo#

Strengths:

  • Pythonic OGM (Object-Graph Mapping) layer
  • Easier onboarding for developers new to graphs
  • Good integration with pandas for analytics

Limitations:

  • Performance overhead from OGM abstraction
  • Less control over query optimization
  • Maintenance concerns (community-driven)

Fit Score: 6/10

python-arango#

Strengths:

  • Multi-model allows document storage alongside graph
  • AQL provides flexible query patterns
  • Built-in support for graph traversal with configurable depth

Limitations:

  • Less mature graph algorithm ecosystem
  • Smaller community for social network patterns
  • Traversal syntax less intuitive than Cypher

Fit Score: 7/10

pyTigerGraph#

Strengths:

  • Designed for massive scale (10B+ edges)
  • GSQL optimized for deep traversals
  • Built-in distributed processing

Limitations:

  • Steeper learning curve
  • Enterprise licensing costs
  • Less flexible for rapid prototyping

Fit Score: 7/10 (9/10 at very high scale)

gremlinpython#

Strengths:

  • Database-agnostic (works with many backends)
  • Standard traversal language
  • Good for multi-database environments

Limitations:

  • Verbose syntax compared to Cypher
  • Performance varies by backend
  • Debugging traversals can be challenging

Fit Score: 6/10

Gaps and Workarounds#

Gap | Impact | Workaround
Real-time graph algorithms | Cannot compute PageRank on-the-fly | Pre-compute in batch, cache results
Supernodes (celebrities) | Traversal explosion | Bidirectional search, sampling strategies
Temporal queries | Limited time-series support | Add timestamp indices, partition by time
Multi-hop aggregations | Memory pressure | Streaming result processing, pagination
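The sampling workaround for supernodes caps traversal fan-out before it explodes. A minimal sketch (the `cap` default of 100 and the data are illustrative assumptions):

```python
import random

def sampled_neighbors(adjacency, node, cap=100, seed=None):
    """Supernode guard: traverse at most `cap` neighbours, sampled
    uniformly, instead of the full adjacency list."""
    neighbours = adjacency.get(node, [])
    if len(neighbours) <= cap:
        return list(neighbours)
    return random.Random(seed).sample(neighbours, cap)

# Hypothetical graph with one celebrity account
adjacency = {
    "celebrity": [f"fan-{i}" for i in range(10_000)],
    "regular": ["ana", "ben"],
}
print(len(sampled_neighbors(adjacency, "celebrity", cap=50, seed=1)))
```

Applied at each hop, this trades exactness for bounded latency, which is usually the right trade for feed and suggestion features.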

Recommendation#

Best Fit: neo4j official driver

For social network applications, the combination of expressive Cypher queries, mature ecosystem (GDS for algorithms), and strong async support makes the official Neo4j driver the best choice for most scale profiles.

Alternative: pyTigerGraph for platforms expecting 100M+ users where distributed processing becomes essential.


Use Case: Supply Chain#

Domain Description#

Supply chain graphs model the network of suppliers, manufacturers, distributors, and logistics providers that move products from raw materials to end customers. Graph analysis enables risk assessment, optimization of logistics, supplier diversification, and end-to-end traceability.

Requirements Analysis#

Graph Model Requirements#

Aspect | Requirement | Rationale
Model Type | Property Graph | Rich attributes on entities; weighted relationships
Multi-graph | Multiple edge types | Material flow, financial flow, information flow
Temporal | Time-aware edges | Lead times, seasonal variations, historical performance

Key Entity Types:

  • Organizations: Suppliers, manufacturers, distributors, retailers
  • Facilities: Factories, warehouses, ports, distribution centers
  • Products: SKUs, components, raw materials, finished goods
  • Logistics: Routes, carriers, shipments
  • Contracts: Agreements, terms, pricing

Query Pattern Complexity#

Primary Patterns:

  • Shortest path: Optimal route from supplier to customer
  • Risk propagation: “If supplier X fails, what products are affected?”
  • Alternative sourcing: Finding backup suppliers for a component
  • Bottleneck detection: Identifying single points of failure
  • Cost optimization: Weighted path finding for lowest total cost

Query Characteristics:

  • Depth: 3-10 hops (raw material to finished product)
  • Weighted: Edges have cost, time, capacity attributes
  • Aggregation: Sum costs, max lead times, min capacities
  • Constraints: Capacity limits, geographic restrictions
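The weighted path-finding described above corresponds to Dijkstra's algorithm over lead-time-weighted shipping edges. An in-memory sketch (supplier/port names and the `fastest_route` helper are hypothetical):

```python
import heapq

def fastest_route(edges, src, dst):
    """Dijkstra over SHIPS_TO-style edges weighted by lead time in
    days; returns (total_days, path), or (None, []) if unreachable."""
    graph = {}
    for a, b, days in edges:
        graph.setdefault(a, []).append((b, days))
    heap = [(0, src, [src])]
    settled = {}
    while heap:
        days, node, path = heapq.heappop(heap)
        if node == dst:
            return days, path
        if node in settled and settled[node] <= days:
            continue
        settled[node] = days
        for nxt, d in graph.get(node, ()):
            heapq.heappush(heap, (days + d, nxt, path + [nxt]))
    return None, []

# Hypothetical network: two routes from supplier to distribution center
edges = [
    ("supplier-a", "port-x", 7), ("port-x", "dc-1", 3),
    ("supplier-a", "port-y", 4), ("port-y", "dc-1", 8),
]
print(fastest_route(edges, "supplier-a", "dc-1"))
```

This is what a Cypher `shortestPath` with a `reduce` over `leadTimeDays` computes in-database; the offline version is handy for what-if simulations.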

Scale Requirements#

Metric | Typical Range | High Scale
Entities (nodes) | 10K - 100K | 1M+
Relationships (edges) | 100K - 1M | 10M+
Query frequency | 10 - 100 QPS | 1K QPS
Path computations | 100 - 10K/day | 100K/day

Processing Mode#

  • Real-time: Disruption impact assessment (< 5s)
  • Batch: Route optimization, network redesign (hourly/daily)
  • Simulation: What-if analysis for planning

Integration Requirements#

  • ERP systems (SAP, Oracle) for order and inventory data
  • TMS (Transportation Management Systems) for logistics
  • Supplier portals for performance data
  • IoT/tracking systems for shipment visibility
  • BI tools for reporting and visualization
  • Planning tools for demand forecasting

Library Evaluation#

neo4j (Official Driver)#

Strengths:

  • Excellent weighted shortest path algorithms
  • Cypher handles multi-hop supply chain queries well
  • GDS library for network analysis (centrality, community)
  • Good visualization for supply chain mapping

Limitations:

  • Complex optimization needs external solvers
  • Limited native geospatial support
  • Large-scale simulations may need export to specialized tools

Fit Score: 8/10

python-arango#

Strengths:

  • Multi-model stores complex product/contract documents
  • Good geospatial support for logistics
  • Scales well for medium-large supply chains
  • Cost-effective for exploration

Limitations:

  • Fewer built-in graph algorithms
  • Path optimization less mature than Neo4j
  • Smaller supply chain community

Fit Score: 7/10

pyTigerGraph#

Strengths:

  • Excellent for very large global supply chains
  • GSQL handles complex path computations
  • Built-in graph analytics
  • Enterprise supply chain focus

Limitations:

  • Enterprise licensing costs
  • Steeper learning curve
  • Overkill for regional supply chains

Fit Score: 8/10 (global enterprise); 6/10 (smaller chains)

gremlinpython#

Strengths:

  • Database-agnostic
  • Standard traversal patterns
  • Works with Neptune for AWS supply chains

Limitations:

  • Verbose for weighted path queries
  • Limited optimization algorithms
  • Less intuitive for supply chain queries

Fit Score: 5/10

NetworkX#

Strengths:

  • Rich library for network optimization
  • Excellent for simulations and what-if analysis
  • Easy integration with optimization libraries (PuLP, OR-Tools)
  • Good for research and prototyping

Limitations:

  • In-memory only
  • Cannot serve production queries
  • Export/import overhead for real data

Fit Score: 6/10 (analysis); 2/10 (production)

Gaps and Workarounds#

Gap | Impact | Workaround
Constrained optimization | Cannot express capacity constraints in query | Export to optimization solver
Multi-objective paths | Trade-off cost vs time vs risk complex | Pareto frontier computation offline
Temporal edges | Lead times vary by season/volume | Time-parameterized edge properties
Geospatial routing | Distance calculations limited | Integrate with mapping APIs
Simulation | What-if at scale challenging | Clone subgraphs, sandbox environments
Data freshness | Supply chain data from many sources | ETL pipeline, change data capture

Architecture Pattern#

[Source Systems]
   |-- ERP
   |-- TMS
   |-- Supplier portals
   |-- IoT/Tracking
        |
        v
[ETL Pipeline] -- transformation --> [Graph Database]
        |                                    |
        v                                    v
[Master Data Management]              [Query API]
                                            |
                                            v
                            [Planning/Visualization Tools]

Query Examples:

// Shortest path by lead time
MATCH path = shortestPath(
  (supplier:Supplier {id: $supplierId})-[:SHIPS_TO*]-(dc:DistributionCenter {id: $dcId})
)
RETURN path, reduce(time=0, r in relationships(path) | time + r.leadTimeDays) as totalLeadTime

// Risk propagation: What's affected if this supplier fails?
MATCH (supplier:Supplier {id: $supplierId})<-[:SOURCED_FROM*1..5]-(product:Product)
RETURN product.sku, product.name, product.criticality

// Alternative suppliers
MATCH (product:Product {sku: $sku})-[:SOURCED_FROM]->(current:Supplier)
MATCH (component)<-[:CONTAINS]-(product)
MATCH (alt:Supplier)-[:PROVIDES]->(component)
WHERE alt <> current
RETURN alt.name, count(component) as componentsAvailable

Optimization Patterns#

For complex supply chain optimization, combine graph database with optimization:

  1. Graph database: Topology storage, constraint queries
  2. Export to pandas/NumPy: Data preparation
  3. Optimization solver (OR-Tools, Gurobi): Route optimization
  4. Write back to graph: Optimal routes as relationships
# Example hybrid pattern (sketch: query text, index-manager arguments,
# and solver parameters are elided)
from neo4j import GraphDatabase
from ortools.constraint_solver import pywrapcp

# Assumes uri/user/password and search_params are configured elsewhere
driver = GraphDatabase.driver(uri, auth=(user, password))

# 1. Extract network from graph
with driver.session() as session:
    network = session.run("MATCH (a)-[r:ROUTE]->(b) RETURN ...").data()

# 2. Build optimization model
manager = pywrapcp.RoutingIndexManager(...)
routing = pywrapcp.RoutingModel(manager)

# 3. Solve
solution = routing.SolveWithParameters(search_params)

# 4. Write optimal routes back to graph (optimal_route: parameter dict)
with driver.session() as session:
    session.run("CREATE (r:OptimalRoute {...})", optimal_route)

Recommendation#

Best Fit: neo4j official driver

For supply chain applications, Neo4j provides the best balance of query expressiveness, graph algorithms, and ecosystem maturity. The combination of Cypher for querying and GDS for analytics covers most supply chain needs.

Key advantages for supply chain:

  • Weighted shortest path for logistics optimization
  • Centrality algorithms for identifying critical nodes
  • Community detection for supplier clustering
  • Good visualization for supply chain mapping

Alternative: pyTigerGraph for global enterprises with very large, distributed supply chains requiring massive scale.

Complement with NetworkX/OR-Tools for complex constrained optimization that goes beyond graph traversal (e.g., vehicle routing, facility location).

S4: Strategic

Strategic Analysis Methodology: Graph Database Client Libraries#

Analysis Framework#

This strategic assessment evaluates Python client libraries for graph databases through a 5-year viability lens, focusing on sustainability, portability, and ecosystem evolution.

Evaluation Dimensions#

1. Library Sustainability Assessment#

  • Maintenance Cadence: Release frequency, bug fix responsiveness, security patches
  • Corporate vs Community: Official vendor support vs community-driven development
  • Funding Model: Venture-backed, open-source foundation, or hybrid approaches
  • Bus Factor: Number of active maintainers, knowledge distribution
  • Breaking Change Philosophy: Semantic versioning adherence, deprecation cycles

2. Ecosystem Positioning#

  • Market Share Alignment: Does the library serve a growing or declining database?
  • Standards Compliance: GQL ISO standard readiness and migration path
  • AI/ML Integration: Support for GraphRAG, knowledge graphs, vector embeddings
  • Cloud Service Compatibility: Works with managed offerings (Neptune, Cosmos DB)

3. Portability Analysis#

  • Query Language Lock-in: Cypher vs Gremlin vs proprietary languages
  • Data Model Portability: Property graph standardization, export/import capabilities
  • Abstraction Layer Options: TinkerPop compatibility, ORM/OGM availability

4. Risk Assessment#

  • Vendor Viability: Financial health, acquisition risk, licensing changes
  • Technology Obsolescence: Language evolution (async support, typing)
  • Community Fragmentation: Fork risks, competing implementations

Data Sources#

  • PyPI release history and download statistics
  • GitHub commit activity and contributor metrics
  • Corporate financial disclosures and funding announcements
  • ISO GQL standardization progress (ISO/IEC 39075:2024)
  • Market research reports (Gartner, Forrester, independent analysts)
  • Vendor roadmaps and conference announcements

Scoring Methodology#

Each library receives ratings (1-5) across:

  • Maintenance Health
  • Corporate Backing Stability
  • Breaking Change Risk (inverted: lower = better)
  • Dependency Security
  • Long-term Viability Confidence

Selection Context#

The analysis considers different use case profiles:

  • Enterprise Production: Stability, support contracts, compliance
  • Startup/Growth: Flexibility, cost efficiency, rapid iteration
  • Research/Academic: Feature richness, community, documentation
  • Multi-Database: Portability, abstraction, standards compliance

Time Horizon Considerations#

  • Short-term (1-2 years): Current maintenance, Python version support
  • Medium-term (3-5 years): GQL adoption, cloud service evolution
  • Long-term (5+ years): Standards consolidation, market concentration

Graph Database Ecosystem Evolution (2025-2030)#

Market Growth Trajectory#

The graph database market is experiencing explosive growth, with projections ranging from $8.9B to $13.7B by 2030 (22-30% CAGR depending on source). Key growth drivers:

  • AI/ML Workloads: Knowledge graphs powering RAG and agentic systems
  • Cloud-Native Adoption: 72% of 2024 deployments are cloud-based
  • Fraud Detection: 28.4% of 2025 market revenue from fraud/risk analytics
  • SME Accessibility: Fastest-growing segment at 30%+ CAGR

GQL ISO Standard Impact (ISO/IEC 39075:2024)#

Timeline and Adoption#

  • April 2024: GQL standard officially published by ISO
  • 2024-2025: openCypher evolving toward GQL compliance
  • 2025: Neo4j Cypher 25 introduces GQL-conformant features
  • 2026-2028: Expected broad vendor adoption

What This Means for Developers#

  1. Cypher Users: Smooth transition path as Cypher converges to GQL
  2. Gremlin Users: No direct GQL migration; separate language families
  3. GSQL Users: Likely continued proprietary path; TigerGraph may add GQL layer
  4. New Projects: Consider GQL-ready implementations

Standard Features#

  • 600+ pages of formal definitions
  • Comparable in scope to SQL-92
  • Pattern matching, path finding, graph mutations
  • Expected to reduce vendor lock-in over time

Query Language Standardization Landscape#

Current State (2025)#

Language | Type | Vendors | GQL Path
Cypher | Declarative | Neo4j, Memgraph, AGE | Converging
Gremlin | Traversal | Neptune, Cosmos, JanusGraph | Separate family
GSQL | Proprietary | TigerGraph | Unknown
SPARQL | RDF | Various | Separate family
openCypher | Open Standard | Multiple | Evolving to GQL

Convergence Timeline#

  • Short-term (2025-2026): Cypher/openCypher implementations add GQL features
  • Medium-term (2027-2028): Majority of property graph databases GQL-compliant
  • Long-term (2029+): GQL becomes default query language for new databases

Multi-Model Database Convergence#

PostgreSQL Graph Capabilities#

  • Apache AGE: Graph extension bringing Cypher to PostgreSQL
  • Incubator Status: Apache Software Foundation project
  • Value Proposition: Add graph queries to existing PostgreSQL investments

MongoDB Evolution#

  • Current: Document-focused with limited graph features
  • Trend: Focus on Atlas Search, Vector Search, AI workloads
  • Graph Strategy: Not a primary focus

Market Implication#

Multi-model databases offer “good enough” graph capabilities for many use cases, potentially limiting growth of pure graph databases. However, deep graph analytics still favor specialized databases.

Cloud-Native Graph Services Growth#

Major Cloud Offerings#

Provider | Service | Query Languages | Status
AWS | Neptune | Gremlin, openCypher | Active
Azure | Cosmos DB (Gremlin) | Gremlin | Stable
Google | Spanner Graph | SQL + Graph | GA (2024)
Neo4j | AuraDB | Cypher | Growing

2024-2025 Developments#

  • Google Spanner Graph: Entered market with SQL-integrated graph
  • AWS Neptune + Bedrock: Graph RAG for knowledge bases
  • Neo4j Aura: New analytics and GenAI features

Trend: Managed Services Dominating#

Cloud-based deployments (72%+ share) reduce infrastructure concerns but increase vendor lock-in. Library selection should consider cloud provider compatibility.

AI/ML Integration with Graph Databases#

GraphRAG Revolution (2024+)#

Microsoft’s open-source GraphRAG (July 2024) established graph-augmented retrieval as a production pattern. Key developments:

  • Knowledge Graph Construction: LLMs extracting structured graphs from text
  • Graph + Vector Hybrid: Combining semantic search with relationship traversal
  • Agentic RAG: LLM agents using graph reasoning for multi-step workflows

Production Evidence#

  • 300-320% ROI reported for knowledge graph implementations
  • LinkedIn: 63% improvement in ticket resolution with graph-based systems
  • Finance: 50% improvement in fraud detection rates

Library Implications#

Graph database clients increasingly need:

  • Vector index support (embeddings)
  • Streaming/async for real-time processing
  • LLM framework integration (LangChain, LlamaIndex)
  • Batch import for knowledge graph construction

Predictions for 2030#

High Confidence#

  1. GQL becomes dominant property graph query language
  2. Cloud-managed graph services capture majority of new deployments
  3. GraphRAG/knowledge graph use cases drive enterprise adoption
  4. Vector + graph hybrid architectures become standard

Medium Confidence#

  1. Neo4j maintains market leadership but with reduced share
  2. TinkerPop/Gremlin remains relevant for multi-database scenarios
  3. PostgreSQL AGE captures significant “casual graph” use cases

Lower Confidence#

  1. Complete query language standardization across vendors
  2. Proprietary languages (GSQL) gaining significant share
  3. On-premise deployments returning to favor

Graph Database Python Client Library Viability Assessment#

Executive Summary#

This assessment evaluates the long-term viability of Python client libraries for major graph databases. Libraries are rated on maintenance health, corporate backing, and sustainability for production use over a 5-year horizon.


Neo4j Python Driver#

Package: neo4j (PyPI)
Type: Official vendor driver
Current Version: 6.0.3 (November 2025)

Maintenance Status: EXCELLENT#

  • Release Cadence: Monthly releases since 5.0
  • Recent Activity: 6.0.x series actively developed with breaking changes for modernization
  • Python Support: 3.10, 3.11, 3.12, 3.13 (dropped 3.7-3.9 in 6.0)
  • Migration Tools: Official migration assistant for codebase upgrades

Corporate Backing: STRONG#

  • Vendor: Neo4j Inc. (founded 2007)
  • Funding: $581M total raised, $2B valuation
  • Revenue: $200M+ ARR (2024), 44% market share in graph DBMS
  • Customers: 75%+ Fortune 100, including BMW, NASA, UBS
  • Business Model: Open-source core + AuraDB managed service

Breaking Change History: MODERATE RISK#

Recent 5.x to 6.x migration requires attention:

  • Error handling redesign (DriverError vs Neo4jError separation)
  • Resource management changes (explicit .close() required)
  • Package rename from neo4j-driver to neo4j
  • Element IDs changed from integers to strings (5.x)

Dependency Health: EXCELLENT#

  • Minimal dependencies, optional Rust extensions for performance
  • No security vulnerabilities detected
  • Clean dependency tree

Bus Factor Risk: LOW#

  • Large engineering team at Neo4j
  • Multiple maintainers across driver ecosystem
  • Comprehensive documentation and enterprise support

Viability Score: 9/10#

Recommendation: Primary choice for Neo4j deployments. Strong long-term investment.


Neomodel (Neo4j OGM)#

Package: neomodel (PyPI)
Type: Community OGM under Neo4j Labs
Current Version: 6.0.0 (2024)

Maintenance Status: GOOD (Improved)#

  • Release Cadence: Active development resumed 2023
  • 2024 Updates: Async support, mypy typing (95% coverage), vector index support
  • Python Support: 3.7+ with Neo4j 5.x and 4.4 LTS

Corporate Backing: COMMUNITY + LABS#

  • Moved to Neo4j Labs program (official recognition, community-driven)
  • Production use by Novo Nordisk (OpenStudyBuilder)
  • No dedicated corporate funding

Breaking Change History: MODERATE#

  • Major version bumps may require model adjustments
  • Configuration system overhaul in recent versions

Bus Factor Risk: MEDIUM#

  • Small maintainer team (Marius Conjeaud primary)
  • Active community but concentration of knowledge

Viability Score: 7/10#

Recommendation: Suitable for Neo4j projects needing OGM patterns. Monitor maintainer activity.


python-arango (ArangoDB)#

Package: python-arango (PyPI)
Type: Official vendor driver
Current Version: Latest 2024 release

Maintenance Status: GOOD#

  • Release Cadence: Healthy release activity
  • Weekly Downloads: 352,711 (popular package)
  • Python Support: 3.8+
  • Async Alternative: python-arango-async available

Corporate Backing: MODERATE (Changed)#

  • Vendor: ArangoDB GmbH (founded 2014)
  • Funding: $58.6M total raised
  • Licensing Change: Moved to BSL 1.1 for version 3.12+ (Q1 2024)
    • Still source-available for non-commercial use
    • Cannot be used for competing managed services
    • Community Edition Transition Fund available

Breaking Change History: LOW#

  • Stable API evolution
  • Good backward compatibility

Dependency Health: GOOD#

  • No security vulnerabilities detected
  • Reasonable dependency footprint

Bus Factor Risk: MEDIUM#

  • Smaller company than Neo4j
  • Dual headquarters (San Francisco/Cologne)

Viability Score: 7/10#

Recommendation: Viable for multi-model needs. Watch licensing implications for SaaS deployments.


pyTigerGraph#

Package: pyTigerGraph (PyPI)
Type: Official vendor SDK
Current Version: 1.8.1

Maintenance Status: ADEQUATE#

  • Release Cadence: Active but less frequent
  • Weekly Downloads: 5,614 (smaller user base)
  • Recent Features: Async support (1.8), REST endpoint refactoring (1.7)
  • Contributors: 30 open-source contributors

Corporate Backing: STRONG (Enterprise Focus)#

  • Vendor: TigerGraph (founded 2012)
  • Funding: $172-174M total raised
  • Investors: Tiger Global, AME Cloud Ventures, Baidu
  • Focus: Enterprise analytics, fraud detection, supply chain
  • Customers: Uber, VISA, Alipay, Zillow

Breaking Change History: LOW-MODERATE#

  • Version 1.7+ requires TigerGraph DB 4.1+ for new features
  • Generally stable API

Dependency Health: GOOD#

  • No security vulnerabilities detected

Bus Factor Risk: MEDIUM#

  • Enterprise focus may limit open-source investment
  • Proprietary GSQL creates ecosystem isolation

Viability Score: 6/10#

Recommendation: Best for enterprise-scale graph analytics with existing TigerGraph investment. Not recommended as a first graph database choice due to GSQL lock-in.


gremlinpython (Apache TinkerPop)#

Package: gremlinpython (PyPI)
Type: Apache Foundation project
Current Version: 3.8.0 (November 2025)

Maintenance Status: EXCELLENT#

  • Release Cadence: Regular releases, 4.0 beta in development
  • Governance: Apache Software Foundation PMC
  • Python Support: Modern Python versions

Corporate Backing: FOUNDATION + MULTI-VENDOR#

  • Apache Software Foundation governance since 2016
  • Supported by multiple vendors (AWS, Microsoft, DataStax)
  • PMC includes contributors from diverse organizations
  • Active community with Discord, Twitch, YouTube presence

Breaking Change History: MODERATE (4.0 Coming)#

TinkerPop 4.0 introduces significant changes:

  • Dropping WebSockets for HTTP 1.1
  • Removing Bytecode in favor of gremlin-lang scripts
  • Simplifying connection options

Dependency Health: GOOD#

  • Standard Apache project quality

Bus Factor Risk: LOW#

  • Multiple major vendors invested
  • PMC governance ensures continuity
  • Long-term Apache stewardship

Viability Score: 8/10#

Recommendation: Excellent choice for multi-database portability strategy. Works with JanusGraph, Neptune, Cosmos DB, DataStax. TinkerPop 4.0 migration planning needed.
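A minimal gremlinpython traversal, sketched against a hypothetical endpoint and vertex label. Note the WebSocket transport shown here is exactly what TinkerPop 4.0 plans to replace with HTTP, so this shape will change:

```python
def count_labeled_vertices(endpoint: str, label: str) -> int:
    # gremlinpython 3.x API (`pip install gremlinpython`); imports deferred so
    # the sketch stays importable without the package or a running server.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    conn = DriverRemoteConnection(endpoint, "g")  # e.g. "ws://localhost:8182/gremlin"
    try:
        g = traversal().withRemote(conn)
        # Imperative traversal style: start from all vertices, filter, aggregate.
        return g.V().hasLabel(label).count().next()
    finally:
        conn.close()
```

The same traversal code runs unchanged against JanusGraph, Neptune, or Cosmos DB's Gremlin API, which is the portability argument in practice.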


Cloud Provider SDKs#

AWS Neptune (boto3 + gremlinpython)#

  • Gremlin and openCypher support
  • Strong backing from AWS
  • Lock-in to AWS ecosystem
  • Viability tied to AWS platform (effectively permanent)

Azure Cosmos DB (azure-cosmos + gremlinpython)#

  • Gremlin API among multiple options
  • Microsoft backing but graph capabilities seen as stagnant
  • Multi-model flexibility
  • Viability tied to Azure platform

Summary Viability Matrix#

| Library | Maintenance | Backing | Breaking Risk | Bus Factor | Overall |
|---|---|---|---|---|---|
| neo4j | 9 | 9 | 7 | 9 | 9/10 |
| neomodel | 7 | 6 | 7 | 5 | 7/10 |
| python-arango | 8 | 6 | 8 | 6 | 7/10 |
| pyTigerGraph | 6 | 7 | 7 | 6 | 6/10 |
| gremlinpython | 9 | 8 | 6 | 9 | 8/10 |

Key Findings#

Safest Long-term Bets#

  1. neo4j: Dominant market position, strong funding, active development
  2. gremlinpython: Apache governance, multi-vendor support, portability value

Watch List#

  • python-arango: BSL licensing change may affect SaaS use cases
  • pyTigerGraph: GSQL proprietary language creates lock-in risk

Emerging Considerations#

  • All libraries adding async support (critical for modern Python)
  • Vector/embedding support becoming table stakes
  • GQL standard will reshape query language landscape

Vendor Lock-in Analysis: Graph Database Clients#

Query Language Portability Assessment#

Portability Spectrum#

Most Portable                                    Least Portable
     |                                                  |
  Gremlin -----> Cypher/GQL -----> GSQL -----> Proprietary

Gremlin (Apache TinkerPop)#

Portability Score: 9/10

  • Supported Databases: JanusGraph, Neptune, Cosmos DB, DataStax, OrientDB
  • Strengths: True multi-database abstraction, Apache governance
  • Weaknesses: Imperative style less intuitive than Cypher
  • Best For: Projects requiring database portability guarantees

Cypher / openCypher / GQL#

Portability Score: 7/10 (improving)

  • Current Support: Neo4j, Memgraph, AGE (PostgreSQL), RedisGraph (EOL)
  • GQL Future: Expected broad adoption 2026-2028
  • Strengths: Declarative, readable, standardizing via ISO GQL
  • Weaknesses: Neo4j dominance means de facto lock-in
  • Best For: Projects betting on GQL standardization

GSQL (TigerGraph)#

Portability Score: 2/10

  • Single Vendor: Only TigerGraph
  • Strengths: Turing-complete, optimized for deep analytics
  • Weaknesses: Complete vendor lock-in, no migration path
  • Best For: Enterprise analytics with long-term TigerGraph commitment

Data Model Portability#

Property Graph Model#

All major graph databases (Neo4j, TigerGraph, Neptune, JanusGraph) use property graphs, providing basic model compatibility:

  • Nodes/Vertices: Labeled entities with properties
  • Edges/Relationships: Typed connections with properties
  • Export Formats: CSV, JSON, GraphML widely supported
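The shared property graph model is simple enough to sketch as plain data structures, which is also roughly what a JSON export of nodes and edges looks like (field names here are illustrative, not any vendor's export schema):

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Node:
    label: str                          # e.g. "Person"
    key: str                            # stable identifier referenced by edges
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    type: str                           # e.g. "KNOWS"
    src: str                            # key of the source node
    dst: str                            # key of the destination node
    props: dict = field(default_factory=dict)


graph = {
    "nodes": [
        asdict(Node("Person", "alice", {"age": 34})),
        asdict(Node("Person", "bob")),
    ],
    "edges": [asdict(Edge("KNOWS", "alice", "bob", {"since": 2019}))],
}
print(json.dumps(graph, indent=2))
```

Because every major vendor can ingest something close to this nodes-plus-edges shape, the data itself is rarely the hard part of a migration; the queries and application code are.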

Migration Complexity Matrix#

| From | To Neo4j | To Neptune | To TigerGraph | To JanusGraph |
|---|---|---|---|---|
| Neo4j | - | Medium | High | Medium |
| Neptune | Medium | - | High | Low |
| TigerGraph | High | High | - | High |
| JanusGraph | Medium | Low | High | - |

Key Factors:

  • Query translation (Cypher <-> Gremlin <-> GSQL)
  • Schema and constraint differences
  • Indexing strategy variations
  • Application code rewrite requirements

Export/Import Tooling#

Available Tools:

  • Neo4j: LOAD CSV, neo4j-admin export, APOC procedures
  • Memgraph: Neo4j migration module (direct connection)
  • General: GraphML format interchange
  • Microsoft: MigrateToGraph (relational to graph)

Limitations:

  • No universal graph-to-graph migration standard
  • Query translation typically manual
  • Application logic must be rewritten
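To see why translation stays manual, compare the same one-hop query in the two portable languages, held here as Python strings (the Account/TRANSFER schema is hypothetical). The differences are stylistic, not mechanical, so no tool reliably converts one to the other:

```python
# Declarative pattern matching (Cypher / openCypher / future GQL):
# describe the shape, let the engine plan the traversal.
CYPHER = """
MATCH (a:Account {id: $id})-[:TRANSFER]->(b:Account)
RETURN b.id
"""

# Imperative traversal (Gremlin), here as a script string:
# spell out the steps the engine should take.
GREMLIN = "g.V().has('Account', 'id', account_id).out('TRANSFER').values('id')"
```

Multiply this by every query in the codebase, plus parameter handling and result mapping, and the "High" cells in the matrix above follow directly.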

Abstraction Layer Options#

TinkerPop as Universal Layer#

What It Provides:

  • Common Gremlin query language
  • Vendor-agnostic driver interfaces
  • Standard property graph model

Databases Supported:

  • JanusGraph (native)
  • Amazon Neptune
  • Azure Cosmos DB
  • DataStax Enterprise
  • OrientDB

Databases NOT Supported:

  • Neo4j (native Cypher only, no TinkerPop)
  • TigerGraph (GSQL only)

When TinkerPop Makes Sense:

  1. Multi-cloud strategy requiring database flexibility
  2. Existing investment in Gremlin queries
  3. Need to switch between Neptune/Cosmos/JanusGraph
  4. Avoiding single-vendor dependency

When TinkerPop Doesn’t Make Sense:

  1. Neo4j-specific features required
  2. GSQL analytics capabilities needed
  3. Cypher/GQL standardization bet
  4. Simple use case not needing portability

ORM/OGM Abstraction#

Available OGMs:

  • Neomodel: Neo4j only
  • Object-Graph Mappers: Database-specific implementations

Limitation: No cross-database Python OGM exists. OGMs provide code abstraction but not database portability.

Lock-in Risk Mitigation Strategies#

Strategy 1: TinkerPop-First#

Choose Gremlin-compatible database; use gremlinpython exclusively.

  • Pros: Maximum portability, multi-vendor competition
  • Cons: Excludes Neo4j, forgoes Cypher benefits
  • Risk Level: Low lock-in, medium feature limitation

Strategy 2: GQL-Ready Cypher#

Choose Neo4j or openCypher database; prepare for GQL migration.

  • Pros: Best tooling (Neo4j), GQL future-proofing
  • Cons: Near-term Cypher lock-in, GQL timeline uncertainty
  • Risk Level: Medium lock-in, low feature limitation

Strategy 3: Abstraction Layer#

Build internal abstraction over database clients.

  • Pros: Control over interfaces, potential future migration
  • Cons: Development overhead, incomplete feature coverage
  • Risk Level: Low lock-in, high development cost
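A minimal sketch of what such an internal layer can look like, assuming a deliberately small interface (all names are hypothetical; real adapters would wrap the neo4j or gremlinpython clients):

```python
from typing import Any, Protocol


class GraphStore(Protocol):
    """Internal contract; application code depends only on this interface."""

    def upsert_node(self, label: str, key: str, props: dict[str, Any]) -> None: ...
    def add_edge(self, rel: str, src: str, dst: str) -> None: ...
    def neighbors(self, key: str) -> list[str]: ...


class InMemoryGraphStore:
    """Toy adapter for tests; a Neo4j- or Gremlin-backed adapter would
    implement the same three methods against a real database."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict[str, Any]] = {}
        self._edges: dict[str, list[str]] = {}

    def upsert_node(self, label: str, key: str, props: dict[str, Any]) -> None:
        self._nodes[key] = {"label": label, **props}

    def add_edge(self, rel: str, src: str, dst: str) -> None:
        self._edges.setdefault(src, []).append(dst)

    def neighbors(self, key: str) -> list[str]:
        return self._edges.get(key, [])


store: GraphStore = InMemoryGraphStore()
store.upsert_node("Person", "alice", {"age": 34})
store.upsert_node("Person", "bob", {})
store.add_edge("KNOWS", "alice", "bob")
print(store.neighbors("alice"))  # -> ['bob']
```

The cost named above is visible even in this toy: every capability the application needs must be re-expressed in the interface, which is where the development overhead accumulates.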

Strategy 4: Cloud Provider Lock-in Accept#

Choose Neptune or Cosmos DB; accept cloud platform dependency.

  • Pros: Managed service benefits, cloud ecosystem integration
  • Cons: Full cloud vendor lock-in
  • Risk Level: High lock-in, low operational burden

Recommendations by Use Case#

Startup/MVP#

  • Choice: Neo4j + Cypher
  • Rationale: Best developer experience, largest community, GQL path
  • Lock-in Acceptance: Medium (acceptable for velocity)

Enterprise Multi-Database#

  • Choice: TinkerPop/Gremlin
  • Rationale: Proven portability, vendor-neutral governance
  • Lock-in Acceptance: Low (portability required)

Deep Analytics#

  • Choice: TigerGraph + GSQL
  • Rationale: Best performance for complex algorithms
  • Lock-in Acceptance: High (feature-driven decision)

Cloud-Native#

  • Choice: Neptune or Cosmos DB (matching cloud provider)
  • Rationale: Operational simplicity, ecosystem integration
  • Lock-in Acceptance: High (cloud strategy dependent)

Strategic Recommendations: Graph Database Client Libraries#

5-Year Horizon Summary#

For Python projects requiring graph database capabilities over the next 5 years, the strategic landscape centers on two viable paths: Neo4j/Cypher with GQL evolution, or TinkerPop/Gremlin for multi-database portability.

Primary Recommendation: Neo4j Python Driver#

  • Package: neo4j
  • When to Choose: Default choice for most new graph database projects

Rationale#

  1. Market Leadership: 44% market share, $200M+ ARR, Fortune 100 adoption
  2. Funding Stability: $581M raised, $2B valuation, path to IPO
  3. Active Development: Monthly releases, Python 3.13 support, Rust extensions
  4. GQL Alignment: Cypher converging to ISO GQL standard (smooth transition)
  5. AI/ML Integration: Best GraphRAG tooling, LangChain integration, vector support
  6. Community: Largest graph database community, extensive documentation
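A minimal usage sketch of the driver (hypothetical URI, credentials, and Person schema), using the `execute_query` convenience API from the 5.x line:

```python
def person_names(uri: str, user: str, password: str) -> list[str]:
    # Official driver (`pip install neo4j`); import deferred so the sketch
    # stays importable without the package or a running instance.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        # execute_query handles session, transaction, and retry management
        records, _summary, _keys = driver.execute_query(
            "MATCH (p:Person) RETURN p.name AS name",
            database_="neo4j",
        )
        return [record["name"] for record in records]
```

The context manager closes the driver on exit; in a long-lived service you would instead create one driver at startup and reuse it across requests.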

Risk Factors to Monitor#

  • Breaking changes in major versions (5.x to 6.x pattern)
  • Managed service (AuraDB) pricing evolution
  • GQL standardization timeline slippage

Secondary Recommendation: gremlinpython (TinkerPop)#

  • Package: gremlinpython
  • When to Choose: Multi-database portability required

Rationale#

  1. True Portability: Works across Neptune, Cosmos DB, JanusGraph, DataStax
  2. Apache Governance: Foundation backing, multi-vendor PMC
  3. Cloud Flexibility: Switch between AWS/Azure/on-premise
  4. Long-term Stability: Apache projects rarely abandoned

Risk Factors to Monitor#

  • TinkerPop 4.0 migration (significant API changes)
  • No GQL convergence (separate from Cypher/GQL ecosystem)
  • Learning curve for imperative traversal patterns

Conditional Recommendations#

For OGM Requirements: neomodel#

  • Package: neomodel
  • Condition: Need Python OGM patterns with Neo4j

  • Active maintenance under Neo4j Labs
  • Async support added 2024
  • Production use by major enterprises
  • Monitor maintainer activity (smaller team)
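A sketch of the OGM style neomodel provides (the model is hypothetical): class definitions map to labeled nodes, attributes to properties and typed relationships. The import is deferred so the sketch stays loadable without the package installed:

```python
def define_person_model():
    # neomodel OGM (`pip install neomodel`); defining a model does not
    # require a live database, only the package.
    from neomodel import RelationshipTo, StringProperty, StructuredNode

    class Person(StructuredNode):
        # A "Person" node label with an indexed property
        name = StringProperty(unique_index=True)
        # A typed, directed relationship to other Person nodes
        knows = RelationshipTo("Person", "KNOWS")

    return Person
```

Queries then read like ORM code (for example, filtering on `Person.nodes`), which is the pattern familiarity the OGM buys you over raw Cypher.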

For Multi-Model Needs: python-arango#

  • Package: python-arango
  • Condition: Document + graph + key-value in single database

  • BSL 1.1 licensing change (2024) limits SaaS use
  • Viable for internal applications
  • Async variant available
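A sketch of the multi-model surface (hypothetical database, collection, and graph names; import deferred so the sketch stays importable without the package or a server):

```python
def open_handles(db_name: str, user: str, password: str):
    # python-arango client (`pip install python-arango`)
    from arango import ArangoClient

    client = ArangoClient(hosts="http://localhost:8529")
    db = client.db(db_name, username=user, password=password)

    # The same database handle serves both models:
    accounts = db.collection("accounts")   # document / key-value access
    transfers = db.graph("transfers")      # graph traversal access
    return accounts, transfers
```

That single-handle design is the multi-model argument: one operational system instead of a document store plus a separate graph database.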

For Enterprise Analytics: pyTigerGraph#

  • Package: pyTigerGraph
  • Condition: Deep graph algorithms, existing TigerGraph investment

  • Strong enterprise backing
  • GSQL lock-in is significant risk
  • Not recommended for new projects without specific requirements

Library Avoidance List#

  1. Deprecated packages: neo4j-driver (use neo4j instead)
  2. Abandoned projects: py2neo (deleted), unmaintained forks
  3. Proprietary-only SDKs: Unless committed to that vendor long-term

Strategic Decision Framework#

Choose Neo4j (neo4j) When:#

  • Starting a new graph database project
  • Developer experience is a priority
  • GraphRAG or knowledge graph use case
  • Willing to bet on GQL standardization
  • Single-database architecture acceptable

Choose TinkerPop (gremlinpython) When:#

  • Multi-cloud or multi-database strategy required
  • Using Neptune, Cosmos DB, or JanusGraph
  • Vendor-neutral governance is important
  • Portability outweighs developer convenience

Choose Cloud-Specific When:#

  • Already committed to AWS (Neptune) or Azure (Cosmos)
  • Managed services preferred over self-hosted
  • Cloud ecosystem integration is primary concern

5-Year Outlook Summary#

| Library | 2025 Status | 2030 Projection |
|---|---|---|
| neo4j | Strong | Dominant (GQL leader) |
| gremlinpython | Strong | Stable (portability) |
| neomodel | Good | Dependent on community |
| python-arango | Good | Viable (watch license) |
| pyTigerGraph | Adequate | Niche (enterprise) |

  • Highest Confidence Bet: Neo4j Python driver with the Cypher/GQL path
  • Best Hedge Strategy: TinkerPop for projects needing future flexibility

Published: 2026-03-06 Updated: 2026-03-06