1.011 Graph Database Clients#
Explainer
Graph Database Clients: For Technical Decision Makers#
Purpose: Help CTOs, architects, and product managers understand Python graph database client libraries without deep graph theory expertise.
Audience: Technical leaders evaluating graph database adoption, teams planning migrations, developers choosing between query languages.
What This Solves#
Graph database client libraries solve the connection and interaction problem between your Python application and graph database systems.
The Core Problem: You have relationship-heavy data (social networks, fraud rings, supply chains, knowledge graphs) that traditional SQL databases handle poorly. Graph databases excel at traversing multi-hop relationships, but you need a way for your Python code to:
- Send queries to the graph database
- Receive and process results efficiently
- Manage connections, transactions, and errors
- Abstract away database-specific protocols
Who Encounters This:
- Startups building social features, recommendation engines, or knowledge bases
- Enterprise teams implementing fraud detection, network analysis, or supply chain optimization
- Data scientists constructing knowledge graphs for LLM-powered applications (GraphRAG)
- SaaS developers adding relationship-driven features to existing products
Why It Matters: Choosing the wrong client library locks you into a specific database vendor, query language, and ecosystem. Migration costs can reach hundreds of thousands of dollars in engineering time. The choice made today determines your flexibility, operational costs, and feature velocity for years.
Accessible Analogies#
The Translator Analogy#
Think of your graph database as a foreign city, and the client library as your tour guide/translator:
Neo4j driver (Cypher): A local expert who speaks the native language fluently, knows all the shortcuts, and has deep cultural knowledge. Fastest and most natural, but only works in this one city.
gremlinpython (Gremlin): A professional translator who works across multiple cities in the same region (TinkerPop family). Slightly less fluent in each individual dialect, but you can move between cities (Neptune, Cosmos DB, JanusGraph) without finding a new guide.
TigerGraph (GSQL): A hyper-specialized guide for a unique city with its own invented language. Incredibly effective for that specific place, but you can’t take this guide anywhere else—total lock-in.
The Filing System Analogy#
Imagine organizing a library:
Relational databases (SQL): Books organized in strict categories with catalog cards. Finding related books means walking to different sections, checking multiple cards (JOINs). Slow for “show me all mystery novels written by authors who also wrote sci-fi and were influenced by Author X.”
Graph databases: Every book has strings connecting it to related books, authors, and genres. Following those strings (traversals) is instant. The client library is the tool that lets you pull strings and read the labels.
The Protocol Translator#
Your Python code speaks Python. Your graph database speaks a specialized protocol (Bolt, WebSocket, gRPC, HTTP). The client library is the interpreter that:
- Translates your `session.run("MATCH (n:User) RETURN n")` into the Bolt binary protocol
- Manages the TCP connection pool
- Deserializes results back into Python objects
Without it, you’d be manually crafting binary packets—impractical and error-prone.
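The three jobs above can be made concrete with a toy sketch. Everything here is fake by design (there is no real Bolt framing); it only illustrates the translate / transport / deserialize layering that a real driver such as `neo4j` hides:

```python
# Schematic sketch of what a graph client library does for you.
# NOT the real Bolt protocol -- the transport is faked so the layers show.

class FakeBoltTransport:
    """Stands in for the pooled TCP connection a real driver manages."""

    def send(self, payload: bytes) -> list[dict]:
        # A real driver would write Bolt frames and parse binary responses;
        # here we pretend the server answered with two records.
        assert payload.startswith(b"RUN ")
        return [{"n.name": "Alice"}, {"n.name": "Bob"}]

class FakeSession:
    def __init__(self, transport: FakeBoltTransport) -> None:
        self._transport = transport

    def run(self, cypher: str) -> list[dict]:
        payload = b"RUN " + cypher.encode("utf-8")  # 1. translate the query
        raw = self._transport.send(payload)         # 2. send over the connection
        return raw                                  # 3. deserialize to Python objects

session = FakeSession(FakeBoltTransport())
records = session.run("MATCH (n:User) RETURN n")
print([r["n.name"] for r in records])  # → ['Alice', 'Bob']
```

A real driver layers connection pooling, routing, and rich type mapping on top of this same shape.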
When You Need This#
✅ You Need Graph Database Clients If:#
Relationship depth matters (3+ hops):
- “Find friends-of-friends-of-friends who like this product” (social networks)
- “Trace transaction chains 5 levels deep” (fraud detection)
- “Show supply chain impact 4 tiers upstream” (risk analysis)
Pattern matching drives value:
- Detecting rings, cycles, or suspicious subgraph structures
- Knowledge graph reasoning (entity → relationship → entity chains)
- Network influence propagation
Data model is naturally a graph:
- Org charts with complex reporting structures
- Infrastructure dependency maps
- Recommendation engines with collaborative filtering
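The hop-depth examples above map directly onto variable-length patterns in a graph query language. A hedged sketch in Cypher, held as Python strings the way a driver would submit them (the `:User`/`:Account` labels, relationship types, and parameters are invented for illustration):

```python
# Illustrative Cypher for the multi-hop patterns above; all labels,
# relationship types, and parameters here are made up.

# "Friends-of-friends-of-friends": an exact 3-hop traversal
friends_3_hops = """
MATCH (me:User {id: $user_id})-[:FRIEND*3]->(fof:User)
WHERE fof <> me
RETURN DISTINCT fof.name
"""

# "Trace transaction chains 5 levels deep": a bounded variable-length path
transfer_chains = """
MATCH path = (a:Account {id: $account_id})-[:TRANSFER*1..5]->(:Account)
RETURN path
"""

# With the official neo4j driver these would run as, e.g.:
#   session.run(friends_3_hops, user_id="alice-1")
print("[:FRIEND*3]" in friends_3_hops, "*1..5" in transfer_chains)  # → True True
```

The `*3` and `*1..5` hop bounds are the part SQL has no direct equivalent for; that is where recursive CTEs get painful.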
❌ You DON’T Need This If:#
Simple CRUD operations:
- User profiles with basic lookups (use PostgreSQL)
- Document storage without complex relationships (use MongoDB)
- Time-series data (use InfluxDB, TimescaleDB)
All relationships are 1-2 hops:
- Basic “user has many orders” (SQL foreign keys suffice)
- Simple hierarchies (category → subcategory → product)
You’re already solving it well:
- PostgreSQL with recursive CTEs handling your traversals adequately
- Current solution meets performance SLAs and you’re not hitting scaling issues
Decision Criteria: When to Upgrade to Graph#
| Your Situation | Recommendation |
|---|---|
| 10M+ nodes, 4+ hop traversals regularly | Dedicated graph DB (Neo4j, Neptune) |
| <1M nodes, occasional 3-hop queries | PostgreSQL with AGE extension (graph as feature) |
| Multi-model needs (graph + documents) | ArangoDB (BSL license limits apply) |
| AWS-committed, managed service preferred | Amazon Neptune (Gremlin/openCypher) |
| Need semantic reasoning (RDF triples) | rdflib (SPARQL ecosystem) |
Trade-offs#
Query Language Lock-in Spectrum#
```
Most Portable ←----------------------------------→ Most Locked-In
Gremlin          Cypher/GQL       GSQL           Proprietary
(multi-DB)       (converging)     (TigerGraph)   (single vendor)
```

Gremlin (gremlinpython) - Portability Choice:
- ✅ Pros: Works across Neptune, Cosmos DB, JanusGraph, DataStax
- ✅ Pros: Apache governance, multi-vendor PMC
- ❌ Cons: Imperative style is harder to read than declarative Cypher
- ❌ Cons: Doesn’t work with Neo4j (market leader)
Cypher (neo4j) - Developer Experience Choice:
- ✅ Pros: Most readable query language, largest community (44% market share)
- ✅ Pros: Converging to ISO GQL standard (future-proofing)
- ✅ Pros: Best GraphRAG and LLM integration (2025+)
- ❌ Cons: Neo4j dominance means de facto lock-in
- ❌ Cons: GQL standardization timeline uncertain (2026-2028 expected)
GSQL (pyTigerGraph) - Performance Choice:
- ✅ Pros: Best performance for deep analytics (10+ hop traversals)
- ✅ Pros: Turing-complete query language (complex algorithms)
- ❌ Cons: Complete vendor lock-in (portability score: 2/10)
- ❌ Cons: Smaller ecosystem and hiring pool
OGM (Object-Graph Mapper) vs Raw Driver#
OGM (neomodel) - Productivity:
- ✅ Django-style models, faster CRUD development (30-50% time savings)
- ❌ Hides query details, can obscure performance issues
- ❌ Adds abstraction overhead
Raw Driver (neo4j, gremlinpython) - Control:
- ✅ Full query optimization control
- ✅ No abstraction overhead
- ❌ More boilerplate code for simple CRUD
Recommendation: Start with raw driver, add OGM if CRUD patterns dominate.
Build vs Buy (Self-Hosted vs Managed)#
Self-Hosted (Community Edition):
- ✅ Free for Neo4j Community, no usage limits
- ✅ Full control over infrastructure
- ❌ Operational burden (backups, scaling, monitoring)
- ❌ No enterprise support or high availability (Community)
Managed Services (AuraDB, Neptune, Cosmos):
- ✅ Zero operational overhead
- ✅ Built-in backups, scaling, monitoring
- ❌ $65-2,000+/month (can reach $10K+ at scale)
- ❌ Cloud vendor lock-in (migration complexity)
Cost Considerations#
Direct Costs#
| Service | Entry Price | Scale Price | Notes |
|---|---|---|---|
| Neo4j AuraDB | $65/month | $2,000+/month | Managed, auto-scaling |
| Amazon Neptune | $0.10/hour + storage | Variable | Pay per instance + I/O |
| Self-hosted Neo4j Community | $0 | Infrastructure only | No clustering/HA |
| Self-hosted Neo4j Enterprise | $36K+/year | License + infra | HA, support, advanced features |
Hidden Costs#
Learning Curve:
- Cypher/Gremlin training: 1-2 weeks per developer
- Graph modeling: 2-4 weeks for team to shift thinking
- Query optimization: Ongoing (graph queries need different tuning than SQL)
Migration Lock-in Cost:
- Cypher to Gremlin: 3-6 months for medium codebase (all queries rewritten)
- GSQL to anything: 6-12 months (proprietary language, no migration tools)
- Data export/import: 1-4 weeks depending on volume and schema complexity
Opportunity Cost Examples:
- Choosing TigerGraph for a small project: If you outgrow GSQL lock-in, migration cost = $200K-500K in engineering time
- Skipping GraphRAG opportunity: Companies report 300-320% ROI on knowledge graph implementations (LinkedIn: 63% ticket resolution improvement)
ROI Break-Even Analysis#
When graph DB pays for itself:
- Query performance gains: 10-100x faster for 3+ hop traversals
- Developer productivity: 40% faster feature delivery for relationship-heavy features (anecdotal, industry reports)
- Fraud detection improvements: 50% better detection rates (finance industry data)
Typical payback period: 6-18 months for teams with clear relationship-heavy use cases.
Implementation Reality#
First 90 Days: What to Expect#
Weeks 1-2: Learning & Proof of Concept
- Install local Neo4j or use AuraDB free tier
- Team learns Cypher/Gremlin basics (online tutorials: 5-10 hours per developer)
- Model one core use case (e.g., user → friend → product recommendation)
- Build simple prototype showing multi-hop traversal
Weeks 3-6: Production Architecture
- Choose managed vs self-hosted (decision driven by team expertise)
- Set up connection pooling, error handling, retry logic
- Migrate subset of production data
- Benchmark queries against existing SQL approach (expect 10-50x improvement for graph queries)
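The "retry logic" bullet above can be sketched in a driver-agnostic way. Note that most official clients already ship managed retries (for example, the neo4j driver's managed transaction functions), so hand-rolled wrappers are only needed for clients without one; `TransientError` here is a stand-in, not a real driver exception:

```python
import time

class TransientError(Exception):
    """Stand-in for a driver's retryable error (e.g. a dropped connection)."""

def run_with_retry(work, retries=3, backoff=0.01):
    """Retry a unit of graph work with exponential backoff.

    Only needed for clients without managed retries; official drivers
    often provide this already.
    """
    for attempt in range(retries):
        try:
            return work()
        except TransientError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            time.sleep(backoff * (2 ** attempt))

# Demo against a stub "query" that fails twice before succeeding.
calls = {"n": 0}

def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("connection reset")
    return ["record-1"]

print(run_with_retry(flaky_query))  # → ['record-1']
```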
Weeks 7-12: Integration & Optimization
- Integrate with existing Python services
- Tune indexes (graph indexes work differently than SQL—expect learning curve)
- Handle schema evolution (graph schemas are more flexible, but migrations still needed)
- Monitor performance under load
Team Skill Requirements#
Essential:
- Comfortable with Python async/await patterns (if using async clients)
- Willingness to learn a graph query language (Cypher is declarative and SQL-like; Gremlin is more programmatic)
- Graph modeling mindset (thinking in nodes/edges vs tables/rows)
Nice to Have:
- Prior experience with NoSQL databases (helps with schema flexibility concepts)
- Understanding of graph algorithms (PageRank, community detection—libraries often provide these)
Hiring Impact:
- Cypher developers: Moderate pool (Neo4j dominance means growing talent base)
- Gremlin developers: Smaller pool (more specialized)
- GSQL developers: Very small pool (lock-in includes talent availability)
Common Pitfalls#
1. “All graph databases are the same”
- Reality: Neo4j (Cypher), Neptune (Gremlin), TigerGraph (GSQL) are fundamentally different query languages and data models.
- Avoidance: Evaluate client library portability upfront.
2. “We can switch databases later”
- Reality: Query language lock-in is real. Cypher → Gremlin migration = months of rewriting.
- Avoidance: Choose based on 5-year horizon, not 6-month prototype needs.
3. “Graph databases are slow for everything”
- Reality: 10-100x faster for multi-hop traversals, but slower for simple key-value lookups than Redis.
- Avoidance: Use graph DBs for graph queries, not as a general-purpose database.
4. “OGMs always improve productivity”
- Reality: OGMs excel for CRUD, but complex traversals often need raw Cypher/Gremlin for control.
- Avoidance: Start with raw driver, add OGM selectively for CRUD-heavy patterns.
5. “Gremlin works everywhere”
- Reality: Gremlin is TinkerPop-family only (Neptune, Cosmos, JanusGraph). Neo4j and TigerGraph do NOT support Gremlin.
- Avoidance: Verify database compatibility before committing to Gremlin.
Success Metrics (Realistic Expectations)#
Performance:
- 3-hop queries: 50-100ms (vs 500-2000ms in SQL with JOINs)
- 5-hop queries: 200-500ms (vs timeouts in SQL)
Development Velocity:
- Feature delivery for relationship-heavy features: 30-50% faster (once team is trained)
- Query writing: Initially slower (learning curve), then faster (more expressive language)
Operational:
- Managed service uptime: 99.9%+ (AuraDB, Neptune SLAs)
- Self-hosted: Depends on team expertise (expect 2-4 weeks to stabilize production setup)
Common Misconceptions#
“Graph databases replace all databases”#
Reality: Graph DBs are specialized tools. Use them for relationship-heavy queries, not for simple CRUD, time-series, or full-text search. Most production systems use graph DBs alongside PostgreSQL, Redis, and Elasticsearch.
“GQL will solve all portability issues”#
Reality: ISO GQL standard (2024) is a major step forward, but:
- Full vendor adoption: 2026-2028 expected
- Gremlin is a separate family (no GQL convergence planned)
- Legacy codebases won’t auto-migrate
“Self-hosting is always cheaper”#
Reality: Managed services ($65-2K/month) often cheaper than:
- DevOps salary ($120K+/year)
- Downtime costs (hours of engineer time debugging crashes)
- Backup/DR infrastructure
Calculate total cost of ownership, not just license fees.
Decision Framework for Stakeholders#
Primary Recommendation: Neo4j + Cypher#
Choose neo4j (official driver) when:
- Starting a new graph database project
- Developer experience and time-to-market matter
- GraphRAG or knowledge graph use case (best LLM integration)
- Betting on GQL standardization (Cypher converging to ISO GQL)
Risk: Medium lock-in (acceptable for velocity gains, GQL migration path exists)
Alternative: TinkerPop + Gremlin#
Choose gremlinpython when:
- Multi-cloud or multi-database strategy required (Neptune + Cosmos DB flexibility)
- Vendor-neutral governance important (Apache Foundation)
- Portability outweighs developer convenience
Risk: Low lock-in, higher learning curve
Niche: TigerGraph + GSQL#
Choose pyTigerGraph when:
- Deep graph analytics (10+ hop traversals, complex algorithms) required
- Existing TigerGraph investment
- Performance justifies severe lock-in (portability: 2/10)
Risk: High lock-in, small hiring pool
Questions to Ask Before Committing#
- What’s our relationship depth? (1-2 hops → maybe Postgres, 3+ hops → graph DB)
- Do we need multi-database portability? (Yes → Gremlin, No → Cypher)
- What’s our scale projection? (<10M edges → any, >1B edges → TigerGraph/Neptune)
- Is semantic reasoning required? (Yes → RDF/rdflib, No → property graph)
- What’s our cloud commitment? (AWS → Neptune, Azure → Cosmos, Multi-cloud → self-hosted or Neo4j)
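The checklist above can be condensed into a toy triage function. The thresholds are the rough ones from this document's tables, not hard rules, and real decisions should also weigh cost, cloud commitments, and team skills:

```python
def suggest_stack(hops: int, nodes: int, multi_db: bool, rdf: bool) -> str:
    """Toy triage over the questions above; thresholds are rough, not rules."""
    if rdf:
        return "rdflib (SPARQL / semantic reasoning)"
    if hops <= 2:
        return "PostgreSQL (foreign keys / recursive CTEs)"
    if nodes < 1_000_000:
        return "PostgreSQL + AGE extension"
    if multi_db:
        return "gremlinpython (TinkerPop family)"
    return "neo4j driver (Cypher)"

# A relationship-heavy workload on one large graph, single vendor acceptable:
print(suggest_stack(hops=4, nodes=50_000_000, multi_db=False, rdf=False))
# → neo4j driver (Cypher)
```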
Emerging Trends to Monitor (2025-2030)#
GQL Standardization (ISO/IEC 39075:2024)#
- Published April 2024, Cypher converging to compliance
- By 2028, expect broad vendor adoption (reduced lock-in)
- Migration path: Cypher users have smooth transition, Gremlin separate family
GraphRAG (Graph + LLMs)#
- Microsoft open-sourced GraphRAG (July 2024)
- Knowledge graphs as LLM context (retrieval-augmented generation)
- Neo4j leading integration (vector + graph hybrid)
- Implication: Graph DB clients need vector embedding support (table stakes by 2026)
PostgreSQL AGE (Apache Graph Extension)#
- Cypher queries in PostgreSQL (Apache incubator project)
- “Good enough” graph for many use cases
- Implication: Lowers barrier to entry, may slow pure graph DB adoption for simple use cases
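To make the "good enough graph" point concrete, here is a hedged sketch of how AGE queries look from Python: openCypher wrapped in AGE's `cypher()` table function. The graph name, labels, and connection details are placeholders, and the live-connection part is left as comments since it needs a running PostgreSQL with AGE loaded:

```python
# Hedged sketch: openCypher via Apache AGE's cypher() table function.
# Graph name "social" and the query are placeholders.

def age_cypher(graph_name: str, cypher: str, columns: str = "result agtype") -> str:
    """Build the SQL wrapper AGE expects around an openCypher query."""
    return f"SELECT * FROM cypher('{graph_name}', $$ {cypher} $$) AS ({columns});"

sql = age_cypher("social", "MATCH (p:Person)-[:FRIEND]->(f) RETURN f")
print(sql)

# Against a live PostgreSQL with AGE installed, roughly (psycopg assumed):
#   import psycopg
#   with psycopg.connect("dbname=mydb") as conn, conn.cursor() as cur:
#       cur.execute("LOAD 'age';")
#       cur.execute("SET search_path = ag_catalog, '$user', public;")
#       cur.execute(sql)
#       rows = cur.fetchall()
```

The appeal is operational: graph queries without a second database to run, at the cost of raw traversal performance on large graphs.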
Multi-Model Convergence#
- ArangoDB (graph + document + key-value in one database)
- DuckDB adding graph capabilities
- Implication: Graph becomes a feature, not a separate database (reduces operational complexity)
Date compiled: February 5, 2026
Research ID: 1.011
S1: Rapid Discovery
S1 Rapid Discovery: Graph Database Python Clients#
Research Methodology#
Scope Definition#
This discovery focuses on client libraries for interacting with graph databases from Python, NOT the graph databases themselves. The goal is to evaluate developer experience, community health, and production readiness of each library.
Discovery Process#
Initial Library Identification
- Categorize by database: Neo4j, ArangoDB, TigerGraph, Amazon Neptune, Dgraph
- Identify multi-database solutions: Apache TinkerPop (Gremlin), RDFLib
- Note official vs community-maintained libraries
Metrics Collection
For each library, we gather:
- GitHub metrics: stars, forks, contributors, open issues, last commit date
- PyPI metrics: weekly downloads, latest version, Python version support
- Maintenance signals: release frequency, issue response time
- Documentation quality: quick scan of docs completeness
First Impression Evaluation
- Installation simplicity: `pip install <package>` should work cleanly
- Quickstart availability: can a developer be productive in 10 minutes?
- API design: does it feel Pythonic?
Libraries Evaluated#
| Database | Official Client | Alternative/OGM |
|---|---|---|
| Neo4j | neo4j (driver) | neomodel (OGM), py2neo (EOL) |
| ArangoDB | python-arango | python-arango-async |
| TigerGraph | pyTigerGraph | - |
| Amazon Neptune | gremlinpython + neptune-python-utils | - |
| Dgraph | pydgraph | - |
| Multi-DB (Gremlin) | gremlinpython | - |
| RDF Graphs | rdflib | - |
| OrientDB | pyorient | - (stale) |
Evaluation Criteria#
Tier 1 - Production Ready
- 500k+ weekly downloads
- Active maintenance (commits within 30 days)
- Official/vendor support
- Comprehensive documentation
Tier 2 - Mature Community
- 50k-500k weekly downloads
- Regular releases (quarterly or better)
- Good documentation
- Active issue resolution
Tier 3 - Emerging/Niche
- <50k weekly downloads
- May have gaps in documentation
- Smaller community
- Specialized use cases
Tier 4 - Caution Advised
- Stale/EOL projects
- No recent releases
- Deprecated in favor of alternatives
Data Sources#
- PyPI: https://pypi.org/ (version, dependencies, downloads)
- PyPI Stats: https://pypistats.org/ (download statistics)
- GitHub: Repository metrics and activity
- Snyk Advisor: Package health analysis
- Official documentation sites
Key Findings Summary#
See individual library files and recommendation.md for detailed analysis.
The most widely adopted libraries are gremlinpython (5.7M weekly downloads),
rdflib (1.4M), neo4j driver (520K), and python-arango (350K-1.2M).
gremlinpython - Apache TinkerPop Gremlin for Python#
Quick Facts#
| Metric | Value |
|---|---|
| Package Name | gremlinpython |
| Latest Version | 3.8.0 (Nov 17, 2025) |
| Python Support | 3.10+ |
| Weekly Downloads | ~5.7 million |
| GitHub Stars | 1,900+ (TinkerPop repo) |
| License | Apache-2.0 |
| Maintainer | Apache TinkerPop |
Installation#
```bash
pip install gremlinpython
```

First Impression#
Strengths:
- Universal graph query language (Gremlin)
- Works with multiple databases: Neptune, JanusGraph, CosmosDB, etc.
- Most downloaded graph Python library
- Apache Foundation backing
- Stable, mature codebase
Considerations:
- Gremlin syntax differs from Cypher
- Generic API may lack database-specific optimizations
- TinkerPop 4.0 brings breaking changes (HTTP replacing WebSockets)
Compatible Databases#
- Amazon Neptune
- JanusGraph
- Azure Cosmos DB (Gremlin API)
- DataStax Enterprise Graph
- IBM Compose for JanusGraph
- OrientDB
Quick Example#
```python
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.structure.graph import Graph

# Connect to Gremlin server
graph = Graph()
conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = graph.traversal().withRemote(conn)

# Traverse the graph
people = g.V().hasLabel('person').values('name').toList()
print(people)

# Find friends of Alice
friends = g.V().has('person', 'name', 'Alice').out('knows').values('name').toList()
print(friends)

conn.close()
```

Amazon Neptune Usage#
```python
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# With neptune-python-utils for IAM auth
from neptune_python_utils.gremlin_utils import GremlinUtils

gremlin_utils = GremlinUtils()
conn = gremlin_utils.remote_connection()
g = gremlin_utils.traversal_source(connection=conn)
```

Assessment#
Tier: 1 - Production Ready
gremlinpython is the standard for Gremlin-based graph traversal. Its massive adoption (5.7M downloads/week) reflects use in AWS Neptune and other cloud graph services. Essential for multi-database portability or when using Gremlin-compatible databases.
Links#
- PyPI: https://pypi.org/project/gremlinpython/
- GitHub: https://github.com/apache/tinkerpop/tree/master/gremlin-python
- Docs: https://tinkerpop.apache.org/docs/current/reference/#gremlin-python
- Neptune Utils: https://github.com/awslabs/amazon-neptune-tools
neo4j - Official Neo4j Python Driver#
Quick Facts#
| Metric | Value |
|---|---|
| Package Name | neo4j |
| Latest Version | 6.0.3 (Nov 6, 2025) |
| Python Support | 3.10, 3.11, 3.12, 3.13 |
| Weekly Downloads | ~520,000 |
| GitHub Stars | 1,000 |
| Contributors | 58 |
| License | Apache-2.0 |
| Maintainer | Neo4j, Inc. (official) |
Installation#
```bash
pip install neo4j

# With optional Rust extension for 10x performance
pip install neo4j-rust-ext

# With pandas/numpy integration
pip install neo4j[numpy,pandas,pyarrow]
```

First Impression#
Strengths:
- Official vendor support with dedicated team
- Production-stable with semantic versioning
- Async support built-in
- Rust extensions available for performance-critical workloads
- Excellent documentation and examples
- Type hints throughout
Considerations:
- Python 3.10+ required (no legacy support)
- Deprecated `neo4j-driver` package still on PyPI (causes confusion)
Quick Example#
```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    result = session.run("MATCH (n:Person) RETURN n.name LIMIT 10")
    for record in result:
        print(record["n.name"])

driver.close()
```

Assessment#
Tier: 1 - Production Ready
The official Neo4j driver is the clear choice for Neo4j integration. It has strong community adoption, official support, excellent documentation, and modern Python features. The Rust extension option makes it suitable for high-throughput workloads.
Links#
- PyPI: https://pypi.org/project/neo4j/
- GitHub: https://github.com/neo4j/neo4j-python-driver
- Docs: https://neo4j.com/docs/python-manual/current/
neomodel - Python OGM for Neo4j#
Quick Facts#
| Metric | Value |
|---|---|
| Package Name | neomodel |
| Latest Version | 6.0.0 (Nov 26, 2025) |
| Python Support | 3.10+ |
| Weekly Downloads | ~25,000 |
| GitHub Stars | 1,100 |
| Contributors | 97 |
| Open Issues | 49 |
| License | MIT |
| Maintainer | Neo4j Labs (community) |
Installation#
```bash
pip install neomodel

# With Rust driver extension for performance
pip install neomodel[rust-driver-ext]

# With optional dependencies (Shapely, pandas, numpy)
pip install neomodel[extras,rust-driver-ext]
```

First Impression#
Strengths:
- Django-style model definitions (familiar pattern)
- Schema enforcement with cardinality restrictions
- Full transaction and async support
- Neo4j Labs project (good maintenance quality)
- Django integration via django-neomodel plugin
- Vector and full-text search support (v6.0+)
Considerations:
- Abstracts away Cypher (less control for complex queries)
- Learning curve for graph-specific concepts
- Performance overhead vs raw driver
Quick Example#
```python
from neomodel import StructuredNode, StringProperty, RelationshipTo

class Person(StructuredNode):
    name = StringProperty(required=True)
    friends = RelationshipTo('Person', 'FRIEND')

# Create and relate nodes
alice = Person(name="Alice").save()
bob = Person(name="Bob").save()
alice.friends.connect(bob)

# Query
for friend in alice.friends.all():
    print(friend.name)
```

Assessment#
Tier: 2 - Mature Community
neomodel is the recommended OGM for Neo4j, especially for developers coming from Django/SQLAlchemy backgrounds. It provides a Pythonic abstraction over the graph while still allowing raw Cypher when needed. Good choice for rapid development.
Links#
- PyPI: https://pypi.org/project/neomodel/
- GitHub: https://github.com/neo4j-contrib/neomodel
- Docs: https://neomodel.readthedocs.io/
- Neo4j Labs: https://neo4j.com/labs/neomodel/
py2neo - End of Life (EOL)#
Quick Facts#
| Metric | Value |
|---|---|
| Package Name | py2neo |
| Latest Version | 2021.2.4 |
| Status | END OF LIFE |
| Last Meaningful Release | 2021 |
| GitHub Stars | ~1,200 (archived) |
| License | Apache-2.0 |
Status Warning#
py2neo has been officially declared End of Life as of April 2025.
The project is no longer maintained and will receive no further updates. The GitHub repository has moved to neo4j-contrib/py2neo but is effectively archived.
Migration Path#
Neo4j recommends migrating to:
- neo4j - Official Python driver for direct Cypher queries
- neomodel - For ORM-style object-graph mapping
Why py2neo Was Popular#
Historically, py2neo offered:
- Higher-level API than the raw driver
- Built-in OGM (Object-Graph Mapper)
- HTTP and Bolt protocol support
- Cypher lexer for Pygments
- Command-line tools
Migration Example#
Old (py2neo):
```python
from py2neo import Graph

graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
graph.run("MATCH (n) RETURN n LIMIT 10")
```

New (neo4j driver):

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run("MATCH (n) RETURN n LIMIT 10")
```

Assessment#
Tier: 4 - Do Not Use
Do not start new projects with py2neo. For existing codebases, plan migration to the official neo4j driver or neomodel. Historical releases remain available on PyPI for legacy compatibility.
Links#
- PyPI (historical): https://pypi.org/project/py2neo/
- GitHub (archived): https://github.com/neo4j-contrib/py2neo
- Migration Guide: https://neo4j.com/blog/developer/py2neo-end-migration-guide/
pydgraph - Official Dgraph Python Client#
Quick Facts#
| Metric | Value |
|---|---|
| Package Name | pydgraph |
| Latest Version | 24.3.0 (Aug 5, 2025) |
| Python Support | 3.7 - 3.12 |
| Weekly Downloads | ~8,300 |
| GitHub Stars | 288 |
| Forks | 90 |
| Open Issues | 0 |
| License | Apache-2.0 |
| Maintainer | Hypermode Inc. (steward of Dgraph) |
Installation#
pip install pydgraphFirst Impression#
Strengths:
- Official client with gRPC protocol
- Good Python version support (3.7+)
- Clean, simple API
- Connection string support for clusters
- ACL (Access Control List) authentication
Considerations:
- Smaller ecosystem compared to Neo4j
- Dgraph uses its own GraphQL-like DQL, which adds a learning curve
- gRPC dependency can cause build issues on older systems
Quick Example#
```python
import pydgraph

# Create client
client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)

# Set schema
schema = """
name: string @index(exact) .
friends: [uid] .

type Person {
    name
    friends
}
"""
op = pydgraph.Operation(schema=schema)
client.alter(op)

# Create data
txn = client.txn()
try:
    mutation = pydgraph.Mutation(set_nquads='_:alice <name> "Alice" .')
    txn.mutate(mutation)
    txn.commit()
finally:
    txn.discard()

# Query
query = """
{
    people(func: has(name)) {
        name
        friends { name }
    }
}
"""
res = client.txn(read_only=True).query(query)
print(res.json)

client_stub.close()
```

Connection String Format#
```python
# Standard connection
client_stub = pydgraph.DgraphClientStub("localhost:9080")

# With authentication
client_stub = pydgraph.DgraphClientStub("dgraph://username:password@host:9080")
```

Assessment#
Tier: 3 - Emerging/Niche
pydgraph is the correct choice for Dgraph integration. While the community is smaller than Neo4j, the library is well-maintained with zero open issues. Suitable for applications needing Dgraph’s distributed graph capabilities and native GraphQL-like query language.
Links#
- PyPI: https://pypi.org/project/pydgraph/
- GitHub: https://github.com/dgraph-io/pydgraph
- Dgraph Docs: https://dgraph.io/docs/
python-arango - Official ArangoDB Python Driver#
Quick Facts#
| Metric | Value |
|---|---|
| Package Name | python-arango |
| Latest Version | 8.2.5 (Dec 22, 2025) |
| Python Support | 3.9, 3.10, 3.11, 3.12 |
| Weekly Downloads | ~350,000 - 1.2M |
| GitHub Stars | 466 |
| Contributors | 32+ |
| Open Issues | 0 |
| License | MIT |
| Maintainer | ArangoDB (official) |
Installation#
```bash
pip install python-arango

# For async support
pip install python-arango-async
```

First Impression#
Strengths:
- Official vendor support
- Excellent maintenance (zero open issues)
- Clean, Pythonic API
- Comprehensive AQL query support
- Graph traversal, document, and key-value operations
- Async alternative available
Considerations:
- ArangoDB-specific (multi-model but single vendor)
- Smaller community than Neo4j ecosystem
Quick Example#
```python
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("mydb", username="root", password="password")

# Create a graph
graph = db.create_graph("social")
people = graph.create_vertex_collection("people")
friends = graph.create_edge_definition(
    edge_collection="friends",
    from_vertex_collections=["people"],
    to_vertex_collections=["people"]
)

# Insert vertices and edges
alice = people.insert({"_key": "alice", "name": "Alice"})
bob = people.insert({"_key": "bob", "name": "Bob"})
friends.insert({"_from": "people/alice", "_to": "people/bob"})

# AQL query
cursor = db.aql.execute("FOR p IN people RETURN p.name")
print([doc for doc in cursor])
```

Assessment#
Tier: 1 - Production Ready
python-arango is an excellent choice for ArangoDB integration. The library is well-maintained with official support, zero open issues, and comprehensive coverage of ArangoDB features. Particularly strong for applications needing multi-model (document + graph + key-value) capabilities.
Links#
- PyPI: https://pypi.org/project/python-arango/
- GitHub: https://github.com/arangodb/python-arango
- Docs: https://docs.python-arango.com/
- Async: https://github.com/arangodb/python-arango-async
pyTigerGraph - TigerGraph Python Client#
Quick Facts#
| Metric | Value |
|---|---|
| Package Name | pyTigerGraph |
| Latest Version | 1.9.1 (Nov 4, 2025) |
| Python Support | 3.8+ |
| Weekly Downloads | ~5,600 |
| GitHub Stars | 34 |
| Contributors | 22 |
| Open Issues | 7 |
| License | Apache-2.0 |
| Maintainer | TigerGraph (official) |
Installation#
```bash
# Core functionality
pip install pyTigerGraph

# With Graph Data Science / ML capabilities
pip install pyTigerGraph[gds]
```

First Impression#
Strengths:
- Official vendor support
- Graph machine learning integration
- Async support (v1.8+)
- DataFrame loading from Pandas
- Good for analytics and ML workloads
Considerations:
- Smaller community (niche database)
- Less documentation compared to Neo4j/ArangoDB
- TigerGraph-specific GSQL language
Quick Example#
```python
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",
    graphname="MyGraph",
    username="tigergraph",
    password="password"
)

# Get auth token
conn.getToken(conn.createSecret())

# Run a query
results = conn.runInstalledQuery("find_friends", params={"person": "Alice"})

# Upsert vertices
conn.upsertVertices("Person", [
    {"id": "alice", "name": "Alice"},
    {"id": "bob", "name": "Bob"}
])
```

Graph Data Science Features#
```python
# With pyTigerGraph[gds]
from pyTigerGraph.gds import featurizer

# Create graph features for ML
feat = conn.gds.featurizer()
feat.installAlgorithm("pagerank")
feat.runAlgorithm("pagerank", params={"v_type": "Person"})
```

Assessment#
Tier: 3 - Emerging/Niche
pyTigerGraph is the right choice when using TigerGraph, especially for graph analytics and machine learning use cases. The library has official support but a smaller community. Best suited for enterprise analytics workloads where TigerGraph’s performance advantages justify the ecosystem trade-offs.
Links#
- PyPI: https://pypi.org/project/pyTigerGraph/
- GitHub: https://github.com/tigergraph/pyTigerGraph
- Docs: https://docs.tigergraph.com/pytigergraph/current/intro/
rdflib - Python Library for RDF#
Quick Facts#
| Metric | Value |
|---|---|
| Package Name | rdflib |
| Latest Version | 7.5.0 (Nov 28, 2025) |
| Python Support | 3.8.1+ |
| Weekly Downloads | ~1.45 million |
| GitHub Stars | 2,400 |
| Contributors | 189 |
| Open Issues | 291 |
| License | BSD-3-Clause |
| Maintainer | RDFLib community |
Installation#
```bash
pip install rdflib
```

First Impression#
Strengths:
- Dominant library for RDF/semantic web in Python
- Comprehensive format support (RDF/XML, Turtle, JSON-LD, N-Quads, etc.)
- Full SPARQL 1.1 implementation
- Mature, well-documented
- Large ecosystem of extensions
Considerations:
- Focus on RDF graphs (different paradigm from property graphs)
- Not designed for high-performance graph traversal
- Steeper learning curve for developers new to semantic web
Quick Example#
from rdflib import Graph, Literal, RDF, URIRef, Namespace
# Create a graph
g = Graph()
# Define namespace
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
g.bind("foaf", FOAF)
# Add triples
alice = URIRef("http://example.org/alice")
g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice")))
g.add((alice, FOAF.knows, URIRef("http://example.org/bob")))
# SPARQL query
results = g.query("""
SELECT ?name WHERE {
?person foaf:name ?name .
}
""")
for row in results:
print(row.name)
# Serialize
print(g.serialize(format="turtle"))
RDF vs Property Graphs#
| RDF (rdflib) | Property Graphs (Neo4j, etc.) |
|---|---|
| Triple-based (subject-predicate-object) | Nodes and relationships with properties |
| URIs for identifiers | Internal IDs |
| SPARQL query language | Cypher, Gremlin, etc. |
| Semantic web / linked data focus | Application data modeling |
| Standards-based (W3C) | Vendor-specific |
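The modeling difference in the table can be made concrete without any database. Below is a minimal sketch of the same two facts in both paradigms, using plain Python structures (the URIs, node IDs, and property names are illustrative only):

```python
# "Alice is a Person" and "Alice knows Bob" in both paradigms.

# RDF: a set of (subject, predicate, object) triples keyed by URIs.
triples = {
    ("http://example.org/alice", "rdf:type", "foaf:Person"),
    ("http://example.org/alice", "foaf:knows", "http://example.org/bob"),
}

# Property graph: nodes and relationships carry their own properties.
nodes = {
    "n1": {"labels": ["Person"], "props": {"name": "Alice"}},
    "n2": {"labels": ["Person"], "props": {"name": "Bob"}},
}
relationships = [
    {"type": "KNOWS", "from": "n1", "to": "n2", "props": {"since": 2020}},
]

# In RDF, "Alice's name" would itself be another triple; in a property
# graph it lives inside the node. Lookups differ accordingly:
alice_triples = [t for t in triples if t[0] == "http://example.org/alice"]
alice_node = next(n for n in nodes.values() if n["props"]["name"] == "Alice")
```

This is why migrating between the two paradigms is rarely mechanical: relationship properties (like `since` above) have no direct triple equivalent without reification.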
Assessment#
Tier: 1 - Production Ready
rdflib is the standard for RDF processing in Python. Essential for semantic web applications, knowledge graphs with linked data, and SPARQL-based querying. Different use case than property graph databases but equally mature.
Links#
- PyPI: https://pypi.org/project/rdflib/
- GitHub: https://github.com/RDFLib/rdflib
- Docs: https://rdflib.readthedocs.io/
- Website: https://rdflib.dev/
Graph Database Python Clients: Recommendations#
Quick Assessment Summary#
Tier Rankings#
| Tier | Library | Downloads/Week | Use Case |
|---|---|---|---|
| 1 | gremlinpython | 5.7M | Multi-DB, Neptune, JanusGraph |
| 1 | rdflib | 1.45M | RDF/Semantic web |
| 1 | neo4j | 520K | Neo4j (official) |
| 1 | python-arango | 350K-1.2M | ArangoDB (official) |
| 2 | neomodel | 25K | Neo4j OGM |
| 3 | pydgraph | 8K | Dgraph |
| 3 | pyTigerGraph | 5.6K | TigerGraph |
| 4 | py2neo | EOL | Do not use |
| 4 | pyorient | Stale (2017) | Avoid |
Top Picks by Use Case#
For Neo4j Integration#
Primary: neo4j (official driver)
- Best for: Direct Cypher queries, maximum control, performance
- Install:
pip install neo4j
Alternative: neomodel (OGM)
- Best for: Django-style model definitions, rapid development
- Install:
pip install neomodel
For Multi-Database Portability#
Primary: gremlinpython
- Best for: Neptune, JanusGraph, CosmosDB, any Gremlin-compatible DB
- Install:
pip install gremlinpython
For ArangoDB#
Primary: python-arango
- Best for: Multi-model (document + graph + key-value) applications
- Install:
pip install python-arango
For Semantic Web / Knowledge Graphs#
Primary: rdflib
- Best for: RDF processing, SPARQL queries, linked data
- Install:
pip install rdflib
For Graph Analytics / ML#
Primary: pyTigerGraph[gds]
- Best for: Large-scale graph analytics, ML on graphs
- Install:
pip install pyTigerGraph[gds]
Decision Matrix#
| Requirement | Recommended Library |
|---|---|
| Need Cypher query language | neo4j, neomodel |
| Need Gremlin query language | gremlinpython |
| Need SPARQL / RDF | rdflib |
| Need vendor portability | gremlinpython |
| Need ORM-style abstraction | neomodel |
| AWS Neptune | gremlinpython + neptune-python-utils |
| Multi-model (doc + graph) | python-arango |
| Graph machine learning | pyTigerGraph[gds] |
| Distributed graph at scale | pydgraph, pyTigerGraph |
Libraries to Avoid#
- py2neo - Officially EOL (April 2025), migrate to neo4j/neomodel
- pyorient - Last release 2017, OrientDB has limited Python support
- neo4j-driver - Deprecated package name, use neo4j instead
Key Insights#
- gremlinpython dominates downloads due to AWS Neptune and cloud adoption
- Neo4j ecosystem is strongest with official driver + OGM options
- ArangoDB has excellent official support with zero open issues
- RDFLib serves a distinct use case (semantic web vs property graphs)
- TigerGraph and Dgraph are niche but officially supported
Next Steps for Deeper Evaluation#
- Test connection setup with actual database instances
- Benchmark query performance for representative workloads
- Evaluate async support for concurrent applications
- Review error handling and retry mechanisms
- Assess integration with web frameworks (FastAPI, Django)
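For the benchmarking step, a throwaway harness is usually enough to compare drivers on a representative query. A minimal sketch (pure Python; the callable you pass in is a placeholder you supply per library, e.g. a lambda wrapping `driver.execute_query(...)`):

```python
import time
import statistics

def benchmark(run_query, warmup=3, iterations=20):
    """Time a zero-argument callable that runs one representative query.

    run_query is a placeholder -- substitute each client library's call,
    e.g. lambda: driver.execute_query("MATCH (n) RETURN count(n)").
    """
    for _ in range(warmup):              # discard cold-start effects
        run_query()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "p95_s": sorted(samples)[int(len(samples) * 0.95) - 1],
    }

# Stand-in workload so the harness is demonstrable without a database:
stats = benchmark(lambda: sum(range(1000)))
```

Medians and high percentiles matter more than means here, since connection warm-up and GC pauses skew averages.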
Data Sources#
- PyPI: Package metadata and downloads
- PyPI Stats: https://pypistats.org/
- GitHub: Repository metrics
- Snyk Advisor: Package health analysis
- Official documentation for each library
Research conducted: December 2025
S2 Comprehensive Discovery: Graph Database Python Client Libraries#
Overview#
This document outlines the methodology for evaluating Python client libraries for graph databases. The analysis covers official drivers, community libraries, and Object-Graph Mappers (OGMs) across multiple graph database platforms.
Scope#
Libraries Evaluated#
| Library | Database | Type | Maintenance |
|---|---|---|---|
| neo4j-driver | Neo4j | Official Driver | Active (Neo4j Inc.) |
| py2neo | Neo4j | Community Driver | EOL (Archived) |
| neomodel | Neo4j | OGM | Active (Neo4j Labs) |
| python-arango | ArangoDB | Official Driver | Active (ArangoDB) |
| pyTigerGraph | TigerGraph | Official Client | Active (TigerGraph) |
| gremlinpython | Multi-DB | Official (TinkerPop) | Active (Apache) |
| pydgraph | Dgraph | Official Driver | Active (Hypermode) |
| rdflib | RDF/SPARQL | Library | Active (Community) |
Evaluation Criteria#
1. API Design and Ergonomics#
- Pythonic Design: Adherence to Python idioms (PEP 8, context managers, generators)
- Type Hints: MyPy compatibility and IDE support
- Documentation Quality: Official docs, examples, and community resources
- Learning Curve: Time to productivity for developers
2. Performance Characteristics#
- Connection Pooling: Configuration options and efficiency
- Bulk Operations: Batch insert/update capabilities
- Serialization: Data format handling (JSON, Binary, custom)
- Rust Extensions: Native code acceleration options
3. Async Support#
- Native asyncio: Built-in async/await support
- Framework Integration: FastAPI, aiohttp, Starlette compatibility
- Concurrent Transactions: Parallel query execution
4. Transaction and Consistency#
- ACID Support: Transaction isolation levels
- Retry Logic: Automatic retry on transient failures
- Causal Consistency: Bookmark/session management
- Read/Write Splitting: Routing to appropriate cluster nodes
5. Query Language Support#
| Library | Primary | Secondary |
|---|---|---|
| neo4j-driver | Cypher | - |
| neomodel | Python OGM | Cypher (raw) |
| python-arango | AQL | - |
| pyTigerGraph | GSQL | REST API |
| gremlinpython | Gremlin | - |
| pydgraph | GraphQL+/DQL | - |
| rdflib | SPARQL | RDF/Turtle |
6. Schema and Migration#
- Schema Definition: Programmatic vs. declarative
- Constraint Management: Unique, existence, type constraints
- Index Management: Creation, deletion, optimization
- Migration Tooling: Version control for schema changes
7. Testing and Development#
- Mocking Support: Test doubles and fixtures
- Embedded Mode: In-process database for testing
- CI/CD Integration: Docker, testcontainers compatibility
Data Sources#
Primary Sources#
- Official Documentation: Driver manuals and API references
- GitHub Repositories: Source code, issues, release notes
- PyPI: Package metadata, version history, dependencies
Secondary Sources#
- Community Forums: Stack Overflow, database-specific communities
- Performance Benchmarks: Published comparisons and metrics
- Migration Guides: Version upgrade documentation
Analysis Deliverables#
- Per-Library Deep Dives: 100-200 lines covering features, patterns, and limitations
- Feature Matrix: Side-by-side comparison across all criteria
- Recommendations: Use-case based guidance with justifications
Versioning Context#
All analysis conducted against library versions current as of December 2024:
- neo4j-driver: 6.0.x
- neomodel: 6.0.x
- python-arango: 8.2.x
- pyTigerGraph: 1.6.x
- gremlinpython: 3.7.x
- pydgraph: 24.x / 25.x
- rdflib: 7.2.x
Graph Database Python Client Libraries: Feature Matrix#
Overview#
This matrix compares Python client libraries for graph databases across key functional and technical criteria. Libraries are evaluated as of December 2024.
Quick Reference#
| Library | Database | Query Language | Status |
|---|---|---|---|
| neo4j-driver | Neo4j | Cypher | Active |
| py2neo | Neo4j | Cypher | EOL |
| neomodel | Neo4j | OGM/Cypher | Active |
| python-arango | ArangoDB | AQL | Active |
| pyTigerGraph | TigerGraph | GSQL | Active |
| gremlinpython | Multi-DB | Gremlin | Active |
| pydgraph | Dgraph | DQL | Active |
| rdflib | RDF stores | SPARQL | Active |
Async Support#
| Library | Native asyncio | Async Variant | Framework Compat |
|---|---|---|---|
| neo4j-driver | Yes | Built-in | FastAPI, aiohttp |
| py2neo | No | - | - |
| neomodel | Yes | Built-in (v5+) | Django, FastAPI |
| python-arango | No | python-arango-async | FastAPI (separate pkg) |
| pyTigerGraph | Partial | AsyncTigerGraphConnection | Limited |
| gremlinpython | No | aiogremlin, goblin | Via third-party |
| pydgraph | No | gRPC futures only | Limited |
| rdflib | No | Manual wrapping | Via thread pool |
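For the libraries marked "Manual wrapping" or "Via thread pool", the standard pattern is to push the blocking call onto a worker thread with `asyncio.to_thread` (Python 3.9+) so the event loop stays responsive. A minimal sketch with a stand-in blocking function (the real call would be, e.g., rdflib's `graph.query(...)` or a pydgraph `txn.query(...)`):

```python
import asyncio

def blocking_query(q):
    # Stand-in for a synchronous client call such as rdflib's
    # graph.query(q); replace with the real library call.
    return f"results for {q}"

async def run_query_async(q):
    # Offload the blocking call to a worker thread; the event loop
    # is free to serve other tasks while it runs.
    return await asyncio.to_thread(blocking_query, q)

result = asyncio.run(run_query_async("SELECT ?s WHERE { ?s ?p ?o }"))
```

Note this gives concurrency, not parallelism for CPU-bound work, and each in-flight query occupies a thread from the default executor.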
Connection Management#
| Library | Connection Pooling | Pool Size Config | Liveness Check |
|---|---|---|---|
| neo4j-driver | Yes | max_connection_pool_size | liveness_check_timeout |
| py2neo | Basic | Limited | No |
| neomodel | Yes (via driver) | Via driver_options | Via driver |
| python-arango | No | - | No |
| pyTigerGraph | No | - | No |
| gremlinpython | Yes | pool_size parameter | Known issues |
| pydgraph | Manual | Multiple stubs | Manual |
| rdflib | N/A | N/A | N/A |
Transaction Support#
| Library | ACID | Managed Txn | Auto-retry | Causal Consistency |
|---|---|---|---|---|
| neo4j-driver | Yes | execute_read/write | Yes | Bookmarks |
| py2neo | Yes | Context manager | No | No |
| neomodel | Yes | Context manager | No | Via driver |
| python-arango | Yes | Stream/JS txn | No | No |
| pyTigerGraph | Limited | Via REST | No | No |
| gremlinpython | Yes | tx.begin/commit | No | No |
| pydgraph | Yes | txn() context | Manual | No |
| rdflib | No | N/A | N/A | N/A |
Query Language Features#
| Library | Parameterized | Prepared/Cached | Bulk Operations |
|---|---|---|---|
| neo4j-driver | Yes ($params) | No | UNWIND pattern |
| py2neo | Yes | No | Batch methods |
| neomodel | Yes | No | save() loop |
| python-arango | Yes (@params) | No | insert_many() |
| pyTigerGraph | Yes | Installed queries | upsertVertices() |
| gremlinpython | Limited | No | Batch traversals |
| pydgraph | Yes ($params) | No | JSON arrays |
| rdflib | Yes (initBindings) | prepareQuery() | addN() |
OGM/ORM Capabilities#
| Library | OGM Layer | Schema Definition | Hooks | Validation |
|---|---|---|---|---|
| neo4j-driver | No | Manual | No | No |
| py2neo | Built-in | GraphObject | Limited | No |
| neomodel | Built-in | StructuredNode | Yes | Property-level |
| python-arango | No | Manual | No | No |
| pyTigerGraph | Schema API | Object-oriented | No | GSQL |
| gremlinpython | Via Goblin | Vertex/Edge classes | Limited | Via Goblin |
| pydgraph | No | DQL schema | No | No |
| rdflib | No | RDF/OWL | No | SHACL (ext) |
Type System#
| Library | Type Hints | MyPy Support | Spatial Types | Temporal Types |
|---|---|---|---|---|
| neo4j-driver | Yes | Good | Point | Date/DateTime/Duration |
| py2neo | Partial | Limited | Via Cypher | Via Cypher |
| neomodel | Yes | Good | PointProperty | DateTime/Date |
| python-arango | Partial | Limited | GeoJSON | ISO strings |
| pyTigerGraph | Limited | Limited | GSQL types | DATETIME |
| gremlinpython | Limited | Limited | Via properties | Via properties |
| pydgraph | Limited | Limited | Geo (geo:) | dateTime |
| rdflib | Partial | Limited | GeoSPARQL | xsd:dateTime |
Performance Features#
| Library | Native Extensions | Binary Protocol | Compression |
|---|---|---|---|
| neo4j-driver | Rust (optional) | Bolt | No |
| py2neo | No | Bolt/HTTP | No |
| neomodel | Via driver | Bolt | No |
| python-arango | No | HTTP/REST | Optional |
| pyTigerGraph | No | HTTP/REST | No |
| gremlinpython | No | WebSocket | GraphBinary |
| pydgraph | No | gRPC | Protocol Buffers |
| rdflib | No | N/A | N/A |
Error Handling#
| Library | Typed Exceptions | Retry Categories | Error Codes |
|---|---|---|---|
| neo4j-driver | Yes | Transient/Client/DB | Yes |
| py2neo | Partial | No | Limited |
| neomodel | Via driver | Via driver | Via driver |
| python-arango | Yes | No | ArangoDB codes |
| pyTigerGraph | Basic | No | REST status |
| gremlinpython | GremlinServerError | No | Server codes |
| pydgraph | gRPC errors | Manual | gRPC codes |
| rdflib | Standard Python | No | No |
Testing Support#
| Library | Mocking | Embedded Mode | Testcontainers |
|---|---|---|---|
| neo4j-driver | Manual | No | Yes |
| py2neo | No | No | Possible |
| neomodel | Manual | No | Yes |
| python-arango | No | No | Yes |
| pyTigerGraph | No | No | Limited |
| gremlinpython | No | JVM only | Yes (Gremlin Server) |
| pydgraph | No | No | Yes |
| rdflib | In-memory graph | Yes | N/A |
Documentation and Community#
| Library | Official Docs | API Reference | Examples | Community |
|---|---|---|---|---|
| neo4j-driver | Excellent | Complete | Extensive | Large |
| py2neo | Archived | Archived | Limited | Inactive |
| neomodel | Good | Complete | Moderate | Active |
| python-arango | Good | Complete | Good | Moderate |
| pyTigerGraph | Good | Complete | Good | Moderate |
| gremlinpython | Good | Reference | Book available | Large |
| pydgraph | Moderate | README | Basic | Small |
| rdflib | Excellent | Complete | Extensive | Large |
Python Version Support#
| Library | Min Version | Max Version | Notes |
|---|---|---|---|
| neo4j-driver | 3.10 | 3.14 | Drops 3.9 in v6 |
| py2neo | 3.x | - | EOL |
| neomodel | 3.8 | 3.12+ | |
| python-arango | 3.9 | Latest | |
| pyTigerGraph | 3.7 | Latest | |
| gremlinpython | 3.10 | Latest | |
| pydgraph | 3.7 | Latest | |
| rdflib | 3.8 | Latest |
Database Version Support#
| Library | Supported Versions | LTS Support |
|---|---|---|
| neo4j-driver | 4.4+, 5.x | 4.4 LTS, 5.26 LTS |
| py2neo | 4.x (frozen) | - |
| neomodel | 4.4+, 5.x | Via driver |
| python-arango | 3.11+ | Via ArangoDB |
| pyTigerGraph | 3.x+ | Via TigerGraph |
| gremlinpython | TinkerPop 3.x | Via database |
| pydgraph | Version-matched | Via Dgraph |
| rdflib | N/A | N/A |
Installation Size#
| Library | Core Size | Dependencies | Optional Extras |
|---|---|---|---|
| neo4j-driver | ~500KB | pytz | rust-ext (~2MB) |
| py2neo | ~1MB | Several | pygments |
| neomodel | ~200KB | neo4j-driver | shapely, extras |
| python-arango | ~300KB | requests | - |
| pyTigerGraph | ~500KB | requests | torch (GDS) |
| gremlinpython | ~200KB | aiohttp, nest-asyncio | - |
| pydgraph | ~100KB | grpcio, protobuf | - |
| rdflib | ~2MB | pyparsing, isodate | lxml, html5lib |
Summary Scores (1-5)#
| Library | API Design | Performance | Async | Ecosystem | Overall |
|---|---|---|---|---|---|
| neo4j-driver | 5 | 5 | 5 | 5 | 5.0 |
| py2neo | 4 | 3 | 1 | 1 | 2.3 |
| neomodel | 5 | 4 | 4 | 4 | 4.3 |
| python-arango | 4 | 4 | 3 | 4 | 3.8 |
| pyTigerGraph | 3 | 3 | 2 | 3 | 2.8 |
| gremlinpython | 3 | 3 | 2 | 4 | 3.0 |
| pydgraph | 3 | 4 | 2 | 2 | 2.8 |
| rdflib | 4 | 3 | 1 | 4 | 3.0 |
Scores based on: API ergonomics, Python idiom adherence, documentation quality, maintenance activity, and production readiness.
gremlinpython - Apache TinkerPop Gremlin Client#
Overview#
gremlinpython is the official Python language variant (GLV) for Apache TinkerPop’s Gremlin graph traversal language. It provides a consistent API for interacting with any TinkerPop-enabled graph database, offering database portability through a standardized query language.
Key Information#
| Attribute | Value |
|---|---|
| Package | gremlinpython |
| Version | 3.7.x |
| Python Support | 3.10+ |
| Protocol | WebSocket (GraphBinary/GraphSON) |
| License | Apache 2.0 |
| Repository | github.com/apache/tinkerpop |
Supported Databases#
gremlinpython works with any TinkerPop-compliant database:
- Amazon Neptune
- Azure Cosmos DB (Gremlin API)
- JanusGraph
- OrientDB
- Neo4j (with TinkerPop plugin)
- TigerGraph (with TinkerPop connector)
- DataStax Graph
Installation#
pip install gremlinpython
Connection Management#
Basic Connection#
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
# Create traversal source
g = traversal().with_remote(
DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
)
# With authentication
g = traversal().with_remote(
DriverRemoteConnection(
'wss://your-endpoint:8182/gremlin',
'g',
username='user',
password='password'
)
)
Connection Options#
from gremlin_python.driver.client import Client
# Lower-level client access
client = Client(
'ws://localhost:8182/gremlin',
'g',
pool_size=8, # Connection pool size
max_workers=4, # Thread pool workers
message_serializer=None # Custom serializer
)
# Submit raw Gremlin; collect the streamed results
result_set = client.submit("g.V().count()")
print(result_set.all().result())
Traversal Basics#
Creating Vertices#
from gremlin_python.process.traversal import T

# Add vertex
g.addV('person').property('name', 'Alice').property('age', 30).next()
# Add with an explicit ID (uses the T enum imported above)
g.addV('person').property(T.id, 'alice').property('name', 'Alice').next()
Creating Edges#
# Add edge between vertices
g.V().has('person', 'name', 'Alice').as_('a') \
.V().has('person', 'name', 'Bob').as_('b') \
    .addE('knows').from_('a').to('b').property('since', 2020).next()
Reading Data#
# Get all vertices of type
people = g.V().hasLabel('person').toList()
# Get specific vertex
alice = g.V().has('person', 'name', 'Alice').next()
# Get vertex properties
props = g.V().has('person', 'name', 'Alice').valueMap().next()
Updating Data#
# Update property
g.V().has('person', 'name', 'Alice') \
.property('age', 31).next()
# Add multiple properties
g.V().has('person', 'name', 'Alice') \
.property('email', '[email protected]') \
    .property('city', 'NYC').next()
Deleting Data#
# Delete vertex (and connected edges)
g.V().has('person', 'name', 'Alice').drop().iterate()
# Delete edge
g.E().hasLabel('knows').drop().iterate()
Traversal Patterns#
Filtering#
from gremlin_python.process.traversal import P
from gremlin_python.process.graph_traversal import __

# Has filters
g.V().has('person', 'age', P.gt(25)).toList()
g.V().has('person', 'name', P.within('Alice', 'Bob')).toList()
# Multiple conditions
g.V().hasLabel('person') \
    .has('age', P.gte(18)) \
    .has('age', P.lt(65)).toList()
# Not filter
g.V().hasLabel('person').not_(__.has('retired', True)).toList()
Traversing Relationships#
# Outgoing edges
friends = g.V().has('person', 'name', 'Alice').out('knows').toList()
# Incoming edges
followers = g.V().has('person', 'name', 'Alice').in_('follows').toList()
# Both directions
connections = g.V().has('person', 'name', 'Alice').both('knows').toList()
# Multiple hops
friends_of_friends = g.V().has('person', 'name', 'Alice') \
.out('knows').out('knows') \
    .dedup().toList()
Path Queries#
# Get paths
paths = g.V().has('person', 'name', 'Alice') \
.repeat(__.out('knows')).times(2) \
.path().by('name').toList()
# Shortest path
path = g.V().has('person', 'name', 'Alice') \
.repeat(__.out().simplePath()) \
.until(__.has('person', 'name', 'Charlie')) \
    .path().limit(1).next()
Aggregation#
# Count
count = g.V().hasLabel('person').count().next()
# Group by
by_age = g.V().hasLabel('person') \
.group().by('age').by(__.count()).next()
# Statistics
stats = g.V().hasLabel('person') \
    .values('age').fold() \
    .project('min', 'max', 'avg', 'count') \
    .by(__.min_()) \
    .by(__.max_()) \
    .by(__.mean()) \
    .by(__.count()).next()
Serialization#
GraphBinary (Recommended)#
from gremlin_python.driver.serializer import GraphBinarySerializersV1
g = traversal().with_remote(
DriverRemoteConnection(
'ws://localhost:8182/gremlin',
'g',
message_serializer=GraphBinarySerializersV1()
)
)
GraphSON#
from gremlin_python.driver.serializer import GraphSONSerializersV3d0
g = traversal().with_remote(
DriverRemoteConnection(
'ws://localhost:8182/gremlin',
'g',
message_serializer=GraphSONSerializersV3d0()
)
)
Transaction Support#
# Begin transaction
tx = g.tx()
# Get transaction-bound traversal
gtx = tx.begin()
try:
gtx.addV('person').property('name', 'Alice').next()
gtx.addV('person').property('name', 'Bob').next()
tx.commit()
except Exception:
    tx.rollback()
    raise
Async Alternatives#
gremlinpython itself is synchronous. For async support, consider:
aiogremlin#
from aiogremlin import Cluster, Graph
cluster = await Cluster.open(hosts=['localhost'])
client = await cluster.connect()
g = Graph().traversal().withRemote(client)
# Async operations
result = await g.V().toList()
Goblin OGM#
from goblin import Goblin, Vertex, String
class Person(Vertex):
name = String()
app = await Goblin.open(hosts=['localhost'])
session = await app.session()
person = Person(name='Alice')
session.add(person)
await session.flush()
gremlinpy (FastAPI compatible)#
from gremlinpy import Graph
g = Graph().traversal()
# Compatible with existing event loops
Traversal Strategies#
from gremlin_python.process.strategies import *
# Read-only strategy
g = g.withStrategies(ReadOnlyStrategy())
# Subgraph strategy (filter)
g = g.withStrategies(SubgraphStrategy(
vertices=__.hasLabel('person'),
edges=__.hasLabel('knows')
))
# Partition strategy
g = g.withStrategies(PartitionStrategy(
partitionKey='region',
writePartition='us-west'
))
Error Handling#
from gremlin_python.driver.protocol import GremlinServerError
try:
result = g.V().has('invalid').next()
except GremlinServerError as e:
print(f"Server error: {e}")
except StopIteration:
    print("No results found")
Connection Pooling Issues#
Known limitation (TINKERPOP-3114):
“Once a connection error occurred, pooled connections are broken and will not be recovered.”
Workaround:
# Implement connection health checks
def get_connection():
    try:
        g.V().limit(1).next()
        return g
    except Exception:
        # Reconnect on any transport failure
        return create_new_connection()
Limitations#
- No native Python asyncio (use aiogremlin/goblin)
- Connection pool recovery issues
- WebSocket-only protocol
- Remote execution only (no embedded mode)
- Reference-only objects from server (no full properties)
- Significant memory overhead for large result sets
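The memory overhead on large result sets can be bounded by paging traversals client-side with Gremlin's `range()` step instead of materializing everything with `toList()`. A generic sketch (the traversal in the docstring is a hypothetical example; only the paging logic below is concrete):

```python
def paged(fetch_page, page_size=500):
    """Yield results one page at a time.

    fetch_page(lo, hi) returns a list; with gremlinpython it would be
    something like (hypothetical traversal):
        lambda lo, hi: g.V().hasLabel('person').range(lo, hi).toList()
    """
    lo = 0
    while True:
        page = fetch_page(lo, lo + page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:   # short page means we reached the end
            return
        lo += page_size
```

Note that without an explicit `order()` step, Gremlin makes no ordering guarantee across separate requests, so pages may overlap or skip on a changing graph.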
When to Use#
Choose gremlinpython when:
- Database portability is important
- Working with TinkerPop-compatible databases
- Standard graph query language preferred
- Amazon Neptune or Azure Cosmos DB target
Consider alternatives when:
- Native async required (use aiogremlin/goblin)
- Database-specific features needed
- Maximum performance critical
- OGM patterns preferred (use goblin)
Resources#
Neo4j Python Driver (neo4j)#
Overview#
The official Neo4j Python driver provides low-level, high-performance access to Neo4j databases using the Bolt protocol. Maintained by Neo4j Inc., it serves as the foundation for higher-level libraries like neomodel.
Key Information#
| Attribute | Value |
|---|---|
| Package | neo4j (formerly neo4j-driver) |
| Version | 6.0.x |
| Python Support | 3.10, 3.11, 3.12, 3.13, 3.14 |
| Protocol | Bolt 4.4, 5.0-5.8, 6.0 |
| License | Apache 2.0 |
| Repository | github.com/neo4j/neo4j-python-driver |
Installation#
pip install neo4j
# With optional Rust extensions for performance
pip install neo4j-rust-ext
Core Features#
Connection Management#
from neo4j import GraphDatabase
# Basic connection
driver = GraphDatabase.driver(
"neo4j://localhost:7687",
auth=("neo4j", "password")
)
# With context manager (recommended)
with GraphDatabase.driver(uri, auth=auth) as driver:
driver.verify_connectivity()
    # Use driver...
Query Execution#
# Simple query execution (v5.0+)
records, summary, keys = driver.execute_query(
"MATCH (p:Person {name: $name}) RETURN p",
name="Alice",
database_="neo4j"
)
# With routing control
from neo4j import RoutingControl
records, summary, keys = driver.execute_query(
"MATCH (p:Person) RETURN p",
routing_=RoutingControl.READ
)
Session-Based Transactions#
with driver.session(database="neo4j") as session:
# Managed transaction (recommended - auto-retry)
result = session.execute_read(
lambda tx: tx.run("MATCH (p:Person) RETURN p").data()
)
# Write transaction
session.execute_write(
lambda tx: tx.run(
"CREATE (p:Person {name: $name})",
name="Bob"
)
)
Async Support#
Full async/await support mirrors the synchronous API:
from neo4j import AsyncGraphDatabase

async def get_nodes(tx):
    # In the async driver both tx.run() and result.data() are awaitable
    result = await tx.run("MATCH (n) RETURN n")
    return await result.data()

async def main():
    async with AsyncGraphDatabase.driver(uri, auth=auth) as driver:
        async with driver.session() as session:
            records = await session.execute_read(get_nodes)
Async Features#
- AsyncDriver, AsyncSession, AsyncTransaction
- AsyncResult with async iteration
- Compatible with asyncio, FastAPI, aiohttp
- Shares non-I/O components with sync implementation
Connection Pooling#
Configuration Options#
driver = GraphDatabase.driver(
uri, auth=auth,
max_connection_pool_size=100, # Max connections per host
connection_acquisition_timeout=60, # Seconds to wait for connection
connection_timeout=30, # TCP connection timeout
max_connection_lifetime=3600, # Max age of pooled connection
liveness_check_timeout=60 # Idle check threshold
)
Best Practices#
- Create one driver instance per application
- Driver objects are expensive to create (connection pool setup)
- Sessions are lightweight - create/close as needed
- Use context managers for automatic resource cleanup
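The one-driver-per-application rule is commonly enforced with a lazily initialized, thread-safe singleton. A generic double-checked-locking sketch (the `GraphDatabase.driver(...)` call mentioned in the docstring is how you would wire in the real factory; the holder itself is pure Python):

```python
import threading

class ClientHolder:
    """Process-wide lazy singleton for an expensive client object.

    Intended usage (hypothetical wiring):
        driver = ClientHolder.get(lambda: GraphDatabase.driver(uri, auth=auth))
    """
    _lock = threading.Lock()
    _instance = None

    @classmethod
    def get(cls, factory):
        if cls._instance is None:            # fast path, no lock taken
            with cls._lock:
                if cls._instance is None:    # re-check under the lock
                    cls._instance = factory()
        return cls._instance
```

The factory runs at most once even under concurrent first access, so the driver's connection pool is created exactly one time per process.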
Transaction Support#
Transaction Types#
Auto-commit: Single statement, no retry
session.run("CREATE (n:Node)")
Managed Transactions: Recommended - includes retry logic
session.execute_read(work_function)
session.execute_write(work_function)
Explicit Transactions: Manual control
tx = session.begin_transaction()
try:
    tx.run(query)
    tx.commit()
except Exception:
    tx.rollback()
Causal Consistency#
# Bookmark management for causal chains
with driver.session(bookmarks=[bookmark]) as session:
session.execute_write(work)
    new_bookmark = session.last_bookmark()
Error Handling#
from neo4j.exceptions import (
ServiceUnavailable,
TransientError,
ClientError,
DatabaseError
)
try:
    driver.execute_query(query)
except ServiceUnavailable:
    pass  # Connection/cluster issues
except TransientError:
    pass  # Retryable errors (deadlock, etc.)
except ClientError:
    pass  # Query syntax, constraint violations
Type System#
Neo4j to Python Type Mapping#
| Neo4j Type | Python Type |
|---|---|
| Integer | int |
| Float | float |
| String | str |
| Boolean | bool |
| List | list |
| Map | dict |
| Node | neo4j.graph.Node |
| Relationship | neo4j.graph.Relationship |
| Path | neo4j.graph.Path |
| Date | datetime.date |
| DateTime | datetime.datetime |
| Duration | neo4j.time.Duration |
| Point | neo4j.spatial.Point |
Performance Optimization#
Rust Extensions#
pip install neo4j-rust-ext
- Drop-in replacement for default transport
- Significant performance improvement for I/O-heavy workloads
- No code changes required
Bulk Operations#
# Batch create with UNWIND
session.execute_write(lambda tx: tx.run(
"UNWIND $batch AS row CREATE (n:Node {id: row.id, name: row.name})",
batch=[{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
))
Testing#
# Use testcontainers for integration tests
from testcontainers.neo4j import Neo4jContainer
with Neo4jContainer() as neo4j:
driver = GraphDatabase.driver(
neo4j.get_connection_url(),
auth=("neo4j", neo4j.NEO4J_ADMIN_PASSWORD)
    )
Limitations#
- No built-in ORM/OGM (use neomodel for that)
- Cypher-only (no Gremlin support)
- Manual schema management required
- No migration tooling included
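In the absence of bundled migration tooling, teams typically keep an ordered list of idempotent DDL statements and replay it at deploy time. A minimal sketch (the `IF NOT EXISTS` constraint syntax is standard Cypher from Neo4j 4.4 onward; the `run` wiring shown in the docstring is a hypothetical example):

```python
MIGRATIONS = [
    # Idempotent DDL: safe to replay on every deploy
    "CREATE CONSTRAINT person_name IF NOT EXISTS "
    "FOR (p:Person) REQUIRE p.name IS UNIQUE",
]

def apply_migrations(run, migrations=MIGRATIONS):
    """Apply each statement in order via the supplied executor.

    run: a callable executing one Cypher statement, e.g. (hypothetical)
        lambda stmt: driver.execute_query(stmt)
    Returns the number of statements applied.
    """
    for stmt in migrations:
        run(stmt)
    return len(migrations)
```

Because each statement is idempotent, no version bookkeeping is needed for this simple scheme; non-idempotent changes (renames, data backfills) would still require tracked versions.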
When to Use#
Choose neo4j-driver when:
- Direct control over queries and transactions is needed
- Performance is critical (with Rust extensions)
- Building custom abstractions on top
- Async support is required
Consider alternatives when:
- OGM patterns would simplify development (use neomodel)
- Multi-database portability is needed (use gremlinpython)
Resources#
neomodel - Neo4j Object Graph Mapper#
Overview#
neomodel is a Python Object Graph Mapper (OGM) for Neo4j that provides Django-style model definitions for graph data. It allows developers to work with graph data using Pythonic patterns without writing raw Cypher queries.
Key Information#
| Attribute | Value |
|---|---|
| Package | neomodel |
| Version | 6.0.x |
| Python Support | 3.8+ |
| Protocol | Bolt (via neo4j-driver) |
| License | MIT |
| Repository | github.com/neo4j-contrib/neomodel |
| Status | Neo4j Labs (actively maintained) |
Installation#
pip install neomodel
# With extras (includes Shapely for spatial data)
pip install neomodel[extras]
# With Rust driver extensions for performance
pip install neomodel[rust-driver-ext]
Configuration#
from neomodel import config
# Connection string
config.DATABASE_URL = 'bolt://neo4j:password@localhost:7687'
# Or using dataclass configuration (v6.0+)
from neomodel import NeomodelConfig
config = NeomodelConfig(
driver_options={"max_connection_pool_size": 50},
database="neo4j",
auto_install_labels=True
)
Environment Variables#
NEO4J_BOLT_URL=bolt://neo4j:password@localhost:7687
NEO4J_DATABASE=neo4j
Model Definition#
Basic Node Definition#
from neomodel import (
StructuredNode, StringProperty, IntegerProperty,
UniqueIdProperty, RelationshipTo, RelationshipFrom
)
class Person(StructuredNode):
uid = UniqueIdProperty()
name = StringProperty(required=True)
age = IntegerProperty(index=True)
# Relationships
friends = RelationshipTo('Person', 'FRIENDS_WITH')
    employer = RelationshipTo('Company', 'WORKS_AT')
Property Types#
from neomodel import (
StringProperty, # String values
IntegerProperty, # Integer values
FloatProperty, # Floating point
BooleanProperty, # Boolean
DateProperty, # datetime.date
DateTimeProperty, # datetime.datetime
UniqueIdProperty, # Auto-generated UUID
ArrayProperty, # Lists
JSONProperty, # JSON-serializable dicts
PointProperty, # Spatial data
)
Relationship Properties#
from neomodel import StructuredRel, DateTimeProperty
class WorkedAt(StructuredRel):
start_date = DateTimeProperty()
end_date = DateTimeProperty()
role = StringProperty()
class Person(StructuredNode):
name = StringProperty()
    employers = RelationshipTo('Company', 'WORKED_AT', model=WorkedAt)
CRUD Operations#
Create#
# Create single node
person = Person(name="Alice", age=30).save()
# Create with relationships
company = Company(name="Acme").save()
person.employer.connect(company)
Read#
# Get by property
alice = Person.nodes.get(name="Alice")
# Filter nodes
adults = Person.nodes.filter(age__gte=18)
# All nodes
all_people = Person.nodes.all()
# First match
first_person = Person.nodes.first()
Update#
person = Person.nodes.get(name="Alice")
person.age = 31
person.save()
Delete#
person = Person.nodes.get(name="Alice")
person.delete()
Query API#
Filtering#
# Comparison operators
Person.nodes.filter(age__gt=25) # Greater than
Person.nodes.filter(age__gte=25) # Greater or equal
Person.nodes.filter(age__lt=25) # Less than
Person.nodes.filter(age__lte=25) # Less or equal
Person.nodes.filter(name__ne="Bob") # Not equal
# String operators
Person.nodes.filter(name__contains="ali")
Person.nodes.filter(name__startswith="A")
Person.nodes.filter(name__endswith="ce")
Person.nodes.filter(name__icontains="ALI") # Case insensitive
# List operations
Person.nodes.filter(name__in=["Alice", "Bob"])
Traversal (v6.0+)#
# Advanced traversal with filtering and ordering
results = Person.nodes.filter(name="Alice").traverse(
relation_type="FRIENDS_WITH",
filter_expr={"age__gte": 18},
order_by="age"
)
Raw Cypher#
from neomodel import db
results, meta = db.cypher_query(
"MATCH (p:Person) WHERE p.age > $age RETURN p",
{"age": 25}
)
Async Support#
from neomodel import adb, AsyncStructuredNode
class Person(AsyncStructuredNode):
name = StringProperty()
async def main():
# Async operations
person = await Person(name="Alice").save()
alice = await Person.nodes.get(name="Alice")
await person.delete()
# Async traversal
    friends = await alice.friends.all()
Async Configuration#
from neomodel import adb
await adb.set_connection("bolt://localhost:7687")
Schema Management#
Constraints and Indexes#
from neomodel import install_all_labels, install_labels
# Install all constraints and indexes
install_all_labels()
# Install for specific models
install_labels(Person)
Schema Definition#
class Person(StructuredNode):
# Unique constraint
email = StringProperty(unique_index=True)
# Index only
name = StringProperty(index=True)
# Required (not null)
created = DateTimeProperty(required=True)
Hooks#
class Person(StructuredNode):
name = StringProperty()
def pre_save(self):
# Called before saving
self.name = self.name.strip()
def post_save(self):
# Called after saving
print(f"Saved {self.name}")
def pre_delete(self):
# Called before deletion
pass
def post_delete(self):
# Called after deletion
pass
Transaction Support#
from neomodel import db
# Context manager
with db.transaction:
person = Person(name="Alice").save()
company = Company(name="Acme").save()
person.employer.connect(company)
# Explicit control
db.begin()
try:
person = Person(name="Alice").save()
db.commit()
except:
db.rollback()
raise
Django Integration#
# settings.py
NEOMODEL_NEO4J_BOLT_URL = 'bolt://neo4j:password@localhost:7687'
# models.py
from django_neomodel import DjangoNode
from neomodel import StringProperty
class Person(DjangoNode):
name = StringProperty()
class Meta:
app_label = 'myapp'
Vector and Full-Text Search (v6.0+)#
from neomodel import VectorIndex, FullTextIndex
class Document(StructuredNode):
content = StringProperty()
embedding = ArrayProperty()
# Vector index for semantic search
__vector_index__ = VectorIndex(
property_name='embedding',
dimensions=384
)
# Full-text index
__fulltext_index__ = FullTextIndex(
property_names=['content']
)
Performance Considerations#
Batch Operations#
# Wrap bulk inserts in a single transaction to avoid a round trip per save
from neomodel import db

with db.transaction:
    for data in large_dataset:
        Person(name=data['name']).save()
Connection Pooling#
Connection pooling is inherited from the underlying neo4j-driver; pool settings are exposed through neomodel's config module rather than as per-query options.
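A minimal sketch of tuning the pool through neomodel's config module. The setting names below are assumptions based on recent neomodel releases; verify them against your installed version:

```python
from neomodel import config

# Forwarded to neo4j-driver when the connection is created
config.MAX_CONNECTION_POOL_SIZE = 50          # cap concurrent pooled connections
config.CONNECTION_ACQUISITION_TIMEOUT = 60.0  # seconds to wait for a free connection
```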
Limitations#
- Neo4j-specific (no multi-database portability)
- No automatic migration tooling (schema drift possible)
- OGM overhead vs. raw Cypher
- Complex traversals may require raw Cypher
When to Use#
Choose neomodel when:
- Django-like model patterns preferred
- Type safety and validation important
- Schema enforcement needed
- Working primarily with Neo4j
Consider alternatives when:
- Maximum performance required (use neo4j-driver)
- Multi-database support needed (use gremlinpython)
- Complex graph algorithms (use raw Cypher)
Resources#
py2neo (End of Life)#
Status: Archived#
IMPORTANT: py2neo is End of Life (EOL) as of 2023. No further updates will be released. Users should migrate to the official Neo4j Python driver.
The project has been transferred to Neo4j Inc. for archival purposes at neo4j-contrib/py2neo.
Overview#
py2neo was a comprehensive Neo4j client library and toolkit providing a high-level API, OGM capabilities, admin tools, and a Cypher lexer for Pygments. It supported both Bolt and HTTP protocols.
Key Information#
| Attribute | Value |
|---|---|
| Package | py2neo |
| Final Version | 2021.2 |
| Python Support | 3.x |
| Protocols | Bolt, HTTP |
| License | Apache 2.0 |
| Repository | github.com/neo4j-contrib/py2neo (archived) |
Historical Features#
Graph Object API#
from py2neo import Graph, Node, Relationship
# Connect to database
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
# Create nodes and relationships
alice = Node("Person", name="Alice")
bob = Node("Person", name="Bob")
knows = Relationship(alice, "KNOWS", bob)
# Merge to database
graph.merge(alice, "Person", "name")
OGM Capabilities#
from py2neo.ogm import GraphObject, Property, RelatedTo
class Person(GraphObject):
__primarykey__ = "name"
name = Property()
born = Property()
friends = RelatedTo("Person", "KNOWS")
# Usage
person = Person()
person.name = "Alice"
graph.push(person)
Cypher Execution#
# Direct Cypher queries
results = graph.run(
"MATCH (p:Person {name: $name}) RETURN p",
name="Alice"
)
for record in results:
print(record["p"])
Batch Operations#
from py2neo import Graph
tx = graph.begin()
for i in range(1000):
tx.create(Node("Item", id=i))
if (i + 1) % 100 == 0:
tx.commit()
tx = graph.begin()
tx.commit()
Migration Path#
Recommended Migration: neo4j-driver#
For low-level access, migrate to the official Neo4j Python driver:
# py2neo (old)
from py2neo import Graph
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
result = graph.run("MATCH (n) RETURN n")
# neo4j-driver (new)
from neo4j import GraphDatabase
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
result = session.run("MATCH (n) RETURN n")
Recommended Migration: neomodel#
For OGM functionality, migrate to neomodel:
# py2neo OGM (old)
from py2neo.ogm import GraphObject, Property
class Person(GraphObject):
name = Property()
# neomodel (new)
from neomodel import StructuredNode, StringProperty
class Person(StructuredNode):
name = StringProperty()
Why py2neo Was Deprecated#
- Maintenance Burden: Single maintainer model not sustainable
- Official Driver Improvements: Neo4j’s official driver matured significantly
- Community Fragmentation: Multiple overlapping libraries caused confusion
- Compatibility Challenges: Keeping up with Neo4j versions became difficult
Historical Strengths#
- Clean, Pythonic API
- Built-in OGM functionality
- Cypher lexer for syntax highlighting
- HTTP fallback when Bolt unavailable
- Comprehensive documentation
Historical Limitations#
- Single maintainer created bus factor risk
- Calendar versioning led to breaking changes
- No async support
- Performance overhead vs. official driver
- Infrequent updates in later years
Lessons for Library Selection#
The py2neo deprecation offers lessons for evaluating graph database libraries:
- Prefer Official Drivers: Better long-term support guarantees
- Check Maintainer Count: Multiple maintainers reduce abandonment risk
- Evaluate Release Frequency: Regular releases indicate active maintenance
- Consider Corporate Backing: Libraries backed by database vendors more stable
Current Alternatives#
| Need | Recommended Library |
|---|---|
| Low-level access | neo4j-driver |
| OGM functionality | neomodel |
| Async support | neo4j-driver (async) |
| Bulk operations | neo4j-driver + UNWIND |
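The "neo4j-driver + UNWIND" row deserves a sketch: rather than one CREATE per row, rows are sent in chunks and expanded server-side by a single statement. The chunking helper below is plain Python; the commented-out `session.run` call assumes the official neo4j driver:

```python
def chunks(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# One Cypher statement inserts a whole batch of rows
CYPHER = "UNWIND $rows AS row CREATE (p:Person) SET p = row"

rows = [{"name": f"user{i}"} for i in range(2500)]
batch_sizes = []
for batch in chunks(rows, 1000):
    # With the official driver: session.run(CYPHER, rows=batch)
    batch_sizes.append(len(batch))
print(batch_sizes)  # → [1000, 1000, 500]
```

Keeping batches around 1,000-10,000 rows bounds transaction memory while still amortizing the round-trip cost.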
Resources (Archival)#
pydgraph - Dgraph Python Client#
Overview#
pydgraph is the official Python client for Dgraph, a distributed, horizontally scalable graph database. It uses gRPC for high-performance communication and supports Dgraph’s GraphQL-like query language (DQL, formerly GraphQL+-).
Key Information#
| Attribute | Value |
|---|---|
| Package | pydgraph |
| Version | 24.x / 25.x |
| Python Support | 3.7+ |
| Protocol | gRPC |
| License | Apache 2.0 |
| Repository | github.com/hypermodeinc/pydgraph |
Version Compatibility#
| Dgraph Version | pydgraph Version |
|---|---|
| 21.03.x | 21.03.x |
| 23.0.x | 23.0.x |
| 24.0.x | 24.0.x |
| 25.0.x | 25.0.x |
Installation#
pip install pydgraph
Connection Management#
Basic Connection#
import pydgraph
# Create client stub
stub = pydgraph.DgraphClientStub('localhost:9080')
# Create client
client = pydgraph.DgraphClient(stub)
# Close when done
stub.close()
Multiple Stubs (Cluster)#
# Connect to multiple cluster nodes
stub1 = pydgraph.DgraphClientStub('node1:9080')
stub2 = pydgraph.DgraphClientStub('node2:9080')
stub3 = pydgraph.DgraphClientStub('node3:9080')
client = pydgraph.DgraphClient(stub1, stub2, stub3)
Connection Strings#
# Using a connection string (pydgraph v24+)
client = pydgraph.open(
    "dgraph://user:pass@host:9080?sslmode=verify-ca"
)
# Dgraph Cloud
stub = pydgraph.DgraphClientStub.from_cloud(
    "https://your-instance.cloud.dgraph.io/graphql",
    "your-api-key"
)
client = pydgraph.DgraphClient(stub)
TLS Configuration#
import grpc
# Load credentials
with open('ca.crt', 'rb') as f:
ca_cert = f.read()
credentials = grpc.ssl_channel_credentials(ca_cert)
stub = pydgraph.DgraphClientStub(
'localhost:9080',
credentials=credentials
)
Schema Management#
Alter Schema#
schema = """
name: string @index(exact) .
age: int @index(int) .
email: string @index(hash) @upsert .
type Person {
name
age
email
friends
}
"""
client.alter(pydgraph.Operation(schema=schema))
Drop Operations#
# Drop all data and schema
client.alter(pydgraph.Operation(drop_all=True))
# Drop specific predicate
client.alter(pydgraph.Operation(drop_attr='name'))
# Drop specific type
client.alter(pydgraph.Operation(drop_op=pydgraph.Operation.TYPE, drop_value='Person'))
Transaction Types#
Read-Write Transaction#
txn = client.txn()
try:
# Mutations and queries
response = txn.mutate(set_nquads='_:alice <name> "Alice" .')
txn.commit()
finally:
txn.discard()
Read-Only Transaction#
txn = client.txn(read_only=True)
try:
response = txn.query(query_string)
finally:
txn.discard()
Best-Effort Transaction#
# For stale reads (better performance)
txn = client.txn(read_only=True, best_effort=True)
Mutations#
JSON Mutations#
import json
txn = client.txn()
try:
data = {
'uid': '_:alice',
'dgraph.type': 'Person',
'name': 'Alice',
'age': 30,
'friends': [
{'uid': '_:bob', 'dgraph.type': 'Person', 'name': 'Bob'}
]
}
response = txn.mutate(set_obj=data)
# Get assigned UIDs
alice_uid = response.uids['alice']
bob_uid = response.uids['bob']
txn.commit()
finally:
txn.discard()
N-Quads Mutations#
txn = client.txn()
try:
nquads = """
_:alice <dgraph.type> "Person" .
_:alice <name> "Alice" .
_:alice <age> "30"^^<xs:int> .
"""
response = txn.mutate(set_nquads=nquads)
txn.commit()
finally:
txn.discard()
Delete Mutations#
txn = client.txn()
try:
# Delete specific predicate
txn.mutate(del_nquads=f'<{uid}> <name> * .')
# Delete node completely
txn.mutate(del_obj={'uid': uid})
txn.commit()
finally:
txn.discard()
Queries (DQL)#
Basic Query#
query = """
{
people(func: type(Person)) {
uid
name
age
friends {
name
}
}
}
"""
response = client.txn(read_only=True).query(query)
result = json.loads(response.json)
people = result['people']
Parameterized Query#
query = """
query findPerson($name: string) {
person(func: eq(name, $name)) {
uid
name
age
}
}
"""
variables = {'$name': 'Alice'}
response = client.txn(read_only=True).query(query, variables=variables)
Aggregation Queries#
query = """
{
stats(func: type(Person)) {
count: count(uid)
avgAge: avg(age)
minAge: min(age)
maxAge: max(age)
}
}
"""
Upsert Operations#
Basic Upsert#
query = """
query {
user as var(func: eq(email, "[email protected]"))
}
"""
mutation = """
uid(user) <name> "Alice" .
uid(user) <email> "[email protected]" .
"""
txn = client.txn()
try:
request = txn.create_request(
query=query,
mutations=[pydgraph.Mutation(set_nquads=mutation)]
)
response = txn.do_request(request)
txn.commit()
finally:
txn.discard()
Conditional Upsert#
query = """
query {
user as var(func: eq(email, "[email protected]"))
}
"""
# Only mutate if user doesn't exist
mutation = pydgraph.Mutation(
set_nquads='uid(user) <name> "Alice" .',
cond='@if(eq(len(user), 0))'
)
Async Operations#
pydgraph provides async variants using gRPC futures:
async_alter#
future = client.async_alter(pydgraph.Operation(schema=schema))
# Handle result
try:
result = pydgraph.DgraphClient.handle_alter_future(future)
except Exception as e:
if pydgraph.util.is_jwt_expired(e):
# Refresh token and retry
pass
async_query and async_mutation#
txn = client.txn()
# Async query
query_future = txn.async_query(query_string)
result = pydgraph.Txn.handle_query_future(query_future)
# Async mutation
mutation_future = txn.async_mutation(set_obj=data)
result = pydgraph.Txn.handle_mutate_future(mutation_future)
Note: Async methods use gRPC futures, not native Python asyncio. They cannot retry on JWT expiration.
ACL and Authentication#
Login#
# Login with credentials
client.login("groot", "password")
# Login with namespace (multi-tenancy)
client.login_into_namespace("user", "password", namespace=1)
Refresh Token#
# Tokens expire - refresh periodically
client.retry_login()
Error Handling#
import grpc
try:
response = txn.mutate(set_obj=data)
txn.commit()
except grpc.RpcError as e:
if e.code() == grpc.StatusCode.ABORTED:
# Transaction conflict - retry
pass
elif e.code() == grpc.StatusCode.UNAUTHENTICATED:
# JWT expired
client.retry_login()
except pydgraph.errors.TransactionError as e:
# Transaction-specific error
pass
finally:
txn.discard()
Performance Considerations#
Batch Mutations#
# Commit in batches to keep each transaction small
batch = []
for item in large_dataset:
    batch.append(item)
    if len(batch) >= 1000:
        txn = client.txn()
        try:
            txn.mutate(set_obj=batch, commit_now=True)
        finally:
            txn.discard()
        batch = []
if batch:
    txn = client.txn()
    try:
        txn.mutate(set_obj=batch, commit_now=True)
    finally:
        txn.discard()
Connection Reuse#
# Reuse the client and stubs across requests: create one
# DgraphClientStub and DgraphClient at application startup and
# share them; opening a new gRPC channel per request is costly.
Limitations#
- gRPC futures not native asyncio
- No connection pooling (manage stubs manually)
- No OGM layer included
- DQL learning curve (different from Cypher/Gremlin)
- Limited IDE support for DQL
- No built-in migration tooling
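The gRPC-futures limitation above can be worked around when native asyncio is required: blocking client calls can be dispatched to a worker thread so they do not stall the event loop. A hedged sketch using only the standard library; `query_people` is a hypothetical placeholder for a blocking call such as `txn.query(...)`:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)

async def run_blocking(fn, *args):
    """Run a blocking client call in a worker thread."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, fn, *args)

# Hypothetical stand-in for a blocking pydgraph call
def query_people():
    return ["Alice", "Bob"]

result = asyncio.run(run_blocking(query_people))
print(result)  # → ['Alice', 'Bob']
```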
When to Use#
Choose pydgraph when:
- Distributed, horizontally scalable graph needed
- GraphQL-native development preferred
- Multi-tenancy (namespaces) required
- Integration with Dgraph Cloud
- High-write throughput scenarios
Consider alternatives when:
- Native asyncio critical
- OGM patterns preferred
- Cypher or Gremlin expertise exists
- Smaller scale deployments
Resources#
python-arango - ArangoDB Python Driver#
Overview#
python-arango is the official Python driver for ArangoDB, providing comprehensive access to ArangoDB’s multi-model capabilities including document, graph, and key-value operations. It offers a Pythonic interface to ArangoDB’s REST API.
Key Information#
| Attribute | Value |
|---|---|
| Package | python-arango |
| Version | 8.2.x |
| Python Support | 3.9+ |
| ArangoDB Support | 3.11+ |
| Protocol | HTTP REST |
| License | MIT |
| Repository | github.com/arangodb/python-arango |
Installation#
pip install python-arango
# For async support (separate package)
pip install python-arango-async
Connection Management#
Basic Connection#
from arango import ArangoClient
# Initialize client
client = ArangoClient(hosts="http://localhost:8529")
# Connect to database
db = client.db("mydb", username="root", password="password")
# System database for admin operations
sys_db = client.db("_system", username="root", password="password")
Connection Options#
client = ArangoClient(
hosts="http://localhost:8529",
http_client=None, # Custom HTTP client
serializer=None, # Custom JSON serializer
deserializer=None # Custom JSON deserializer
)
Document Operations#
Basic CRUD#
# Get collection
collection = db.collection("users")
# Insert
metadata = collection.insert({"name": "Alice", "age": 30})
# Returns: {'_id': 'users/12345', '_key': '12345', '_rev': '_abc123'}
# Get by key
doc = collection.get("12345")
# Update
collection.update({"_key": "12345", "age": 31})
# Replace
collection.replace({"_key": "12345", "name": "Alice", "age": 31})
# Delete
collection.delete("12345")
Batch Operations#
# Batch insert
docs = [{"name": f"User{i}"} for i in range(1000)]
results = collection.insert_many(docs)
# Batch update
updates = [{"_key": key, "status": "active"} for key in keys]
collection.update_many(updates)
# Batch delete
collection.delete_many([{"_key": k} for k in keys])
AQL Queries#
Query Execution#
# Simple query
cursor = db.aql.execute(
"FOR doc IN users FILTER doc.age > @min_age RETURN doc",
bind_vars={"min_age": 25}
)
# Iterate results
for doc in cursor:
print(doc)
# Get all results as a list (cursor.batch() returns only the current batch)
results = list(cursor)
Query Options#
cursor = db.aql.execute(
query,
bind_vars={"param": value},
count=True, # Include count
batch_size=100, # Results per batch
ttl=3600, # Cursor TTL in seconds
max_runtime=30.0, # Max execution time
profile=True # Include query profile
)
# Access statistics
print(cursor.statistics())
print(cursor.profile())
Graph Operations#
Graph Management#
# Create graph
graph = db.create_graph(
"social",
edge_definitions=[{
"edge_collection": "knows",
"from_vertex_collections": ["users"],
"to_vertex_collections": ["users"]
}]
)
# Get existing graph
graph = db.graph("social")
Vertex Operations#
# Get vertex collection
users = graph.vertex_collection("users")
# Insert vertex
users.insert({"_key": "alice", "name": "Alice"})
# Get vertex
alice = users.get("alice")
# Update vertex
users.update({"_key": "alice", "age": 30})
Edge Operations#
# Get edge collection
knows = graph.edge_collection("knows")
# Insert edge
knows.insert({
"_from": "users/alice",
"_to": "users/bob",
"since": 2020
})
# Traverse with AQL (the standalone traversal API is deprecated)
cursor = db.aql.execute(
    "FOR v, e IN 1..2 OUTBOUND 'users/alice' knows RETURN v"
)
Async Support#
python-arango-async (Separate Package)#
from arangoasync import ArangoClient
from arangoasync.auth import Auth
async with ArangoClient(hosts="http://localhost:8529") as client:
auth = Auth(username="root", password="password")
db = await client.db("mydb", auth=auth)
# Async operations
collection = db.collection("users")
await collection.insert({"name": "Alice"})
cursor = await db.aql.execute("FOR doc IN users RETURN doc")
async for doc in cursor:
print(doc)
Fire-and-Forget Async (python-arango)#
# Create async execution context
async_db = db.begin_async_execution(return_result=True)
# Queue operations
job1 = async_db.collection("users").insert({"name": "Alice"})
job2 = async_db.collection("users").insert({"name": "Bob"})
# Check job status
print(job1.status()) # 'pending', 'done', or 'error'
# Get results when ready
result1 = job1.result()
Transaction Support#
Stream Transactions#
# Stream transaction
txn = db.begin_transaction(
read=["users"],
write=["orders"]
)
try:
txn.collection("users").insert({"name": "Alice"})
txn.collection("orders").insert({"item": "Book"})
txn.commit()
except:
txn.abort()
raise
JavaScript Transactions#
# Server-side transaction with JavaScript
result = db.transaction(
read=["users"],
write=["orders"],
action="""
function(params) {
const db = require('@arangodb').db;
const user = db.users.insert({name: params.name});
return user;
}
""",
params={"name": "Alice"}
)
Index Management#
# Create persistent index
collection.add_persistent_index(
fields=["name", "email"],
unique=True,
sparse=False
)
# Create geo index
collection.add_geo_index(
fields=["location"],
geo_json=True
)
# Create fulltext index (deprecated, use ArangoSearch)
collection.add_fulltext_index(
fields=["description"],
min_length=3
)
# List indexes
for index in collection.indexes():
print(index)
ArangoSearch Views#
# Create search view
db.create_arangosearch_view(
name="users_view",
properties={
"links": {
"users": {
"analyzers": ["text_en"],
"fields": {
"name": {},
"bio": {"analyzers": ["text_en"]}
}
}
}
}
)
# Search query
cursor = db.aql.execute("""
FOR doc IN users_view
SEARCH ANALYZER(doc.bio == "developer", "text_en")
RETURN doc
""")
Error Handling#
from arango.exceptions import (
ArangoError,
DocumentInsertError,
DocumentGetError,
AQLQueryExecuteError,
TransactionAbortError
)
try:
collection.insert({"_key": "duplicate"})
except DocumentInsertError as e:
print(f"Error code: {e.error_code}")
print(f"HTTP status: {e.http_code}")
print(f"Message: {e.error_message}")
except ArangoError as e:
# Generic error handling
pass
Foxx Microservices#
# Install Foxx service
db.foxx.install(
mount="/myapp",
source="https://github.com/user/foxx-service/archive/main.zip"
)
# Call Foxx endpoint
response = db.foxx.request(
method="POST",
mount="/myapp",
path="/api/endpoint",
data={"param": "value"}
)
Cluster Support#
# Cluster health
health = sys_db.cluster.server_health()
# Cluster statistics
stats = sys_db.cluster.statistics()
# Rebalance shards
sys_db.cluster.rebalance_shards()
Limitations#
- No native Python asyncio in main package (use python-arango-async)
- No OGM layer (document-centric design)
- HTTP protocol only (no binary protocol)
- Fire-and-forget async differs from true async
When to Use#
Choose python-arango when:
- Multi-model database needed (document + graph + key-value)
- AQL query language preferred
- Microservice architecture (Foxx)
- Horizontal scaling required
Consider alternatives when:
- Pure graph database needed (use Neo4j)
- Native asyncio critical (use python-arango-async)
- Gremlin compatibility needed (use gremlinpython)
Resources#
pyTigerGraph - TigerGraph Python Client#
Overview#
pyTigerGraph is the official Python package for interacting with TigerGraph databases. It provides comprehensive access to TigerGraph’s graph analytics and machine learning capabilities, with special emphasis on Graph Data Science (GDS) workflows.
Key Information#
| Attribute | Value |
|---|---|
| Package | pyTigerGraph |
| Version | 1.6.x |
| Python Support | 3.7+ |
| Protocol | REST API |
| License | Apache 2.0 |
| Repository | github.com/tigergraph/pyTigerGraph |
Installation#
# Core package
pip install pyTigerGraph
# With Graph Data Science features
pip install 'pyTigerGraph[gds]'
Connection Management#
Basic Connection#
import pyTigerGraph as tg
conn = tg.TigerGraphConnection(
host="https://your-instance.i.tgcloud.io",
graphname="MyGraph",
username="tigergraph",
password="password"
)
# Generate API token
conn.getToken(conn.createSecret())
TigerGraph Cloud#
conn = tg.TigerGraphConnection(
host="https://your-instance.i.tgcloud.io",
graphname="MyGraph",
apiToken="your-api-token"
)
Async Connection#
from pyTigerGraph import AsyncTigerGraphConnection
async_conn = AsyncTigerGraphConnection(
host="https://your-instance.i.tgcloud.io",
graphname="MyGraph",
username="tigergraph",
password="password"
)
Schema Management#
Object-Oriented Schema (v1.5+)#
# Define vertex types
person = conn.gds.vertexType("Person", [
("name", "STRING"),
("age", "INT"),
("email", "STRING")
])
# Define edge types
knows = conn.gds.edgeType("KNOWS",
from_vertex="Person",
to_vertex="Person",
attributes=[
("since", "DATETIME"),
("strength", "FLOAT")
]
)
# Create graph from schema
conn.gds.createGraph("SocialNetwork", [person], [knows])
GSQL Schema Definition#
# Execute GSQL directly
conn.gsql("""
CREATE VERTEX Person (
PRIMARY_ID id STRING,
name STRING,
age INT
)
CREATE DIRECTED EDGE KNOWS (
FROM Person,
TO Person,
since DATETIME
)
""")
Data Operations#
Vertex Operations#
# Upsert vertex (insert or update)
conn.upsertVertex(
vertexType="Person",
vertexId="alice",
attributes={"name": "Alice", "age": 30}
)
# Get vertex by ID
vertex = conn.getVerticesById(
    vertexType="Person",
    vertexIds="alice"
)
# Delete vertex by ID
conn.delVerticesById(
    vertexType="Person",
    vertexIds="alice"
)
Edge Operations#
# Upsert edge
conn.upsertEdge(
sourceVertexType="Person",
sourceVertexId="alice",
edgeType="KNOWS",
targetVertexType="Person",
targetVertexId="bob",
attributes={"since": "2020-01-01"}
)
# Get edges
edges = conn.getEdges(
sourceVertexType="Person",
sourceVertexId="alice",
edgeType="KNOWS"
)
Bulk Operations#
# Bulk upsert vertices: (id, attributes) tuples
vertices = [
    ("alice", {"name": "Alice", "age": 30}),
    ("bob", {"name": "Bob", "age": 25}),
    ("charlie", {"name": "Charlie", "age": 35})
]
conn.upsertVertices("Person", vertices)
# Bulk upsert edges: (source_id, target_id, attributes) tuples
edges = [
    ("alice", "bob", {"since": "2020-01-01"})
]
conn.upsertEdges("Person", "KNOWS", "Person", edges)
GSQL Queries#
Installed Queries#
# Install query
conn.gsql("""
CREATE QUERY findFriends(VERTEX<Person> p) FOR GRAPH MyGraph {
Start = {p};
Friends = SELECT t
FROM Start:s -(KNOWS)-> Person:t;
PRINT Friends;
}
INSTALL QUERY findFriends
""")
# Run installed query
result = conn.runInstalledQuery("findFriends", {"p": "alice"})
# Async query execution
job_id = conn.runInstalledQuery("longQuery", params={}, runAsync=True)
status = conn.checkQueryStatus([job_id])
Interpreted Queries#
# Run query without installing
result = conn.gsql("""
INTERPRET QUERY () FOR GRAPH MyGraph {
Persons = SELECT p FROM Person:p;
PRINT Persons;
}
""")
Query Metadata#
# Get query information
metadata = conn.getQueryMetadata("findFriends")
# List running queries
running = conn.getRunningQueries()
# Abort query
conn.abortQuery(["query_id_1", "query_id_2"])
Graph Data Science#
Feature Engineering (with GDS package)#
# Install GDS algorithms
conn.gds.featurizer.installAlgorithm("pagerank")
# Run PageRank
result = conn.gds.featurizer.runAlgorithm(
"pagerank",
params={"v_type": "Person", "e_type": "KNOWS"}
)
# Community detection
result = conn.gds.featurizer.runAlgorithm(
"louvain",
params={"v_type": "Person", "e_type": "KNOWS"}
)
Graph Neural Networks#
# PyTorch Geometric data loader
from torch_geometric.loader import NeighborLoader
# Create graph data
data = conn.gds.getVertexDataFrame("Person")
# Vertex feature extraction
features = conn.gds.featurizer.extractVertexFeatures(
v_type="Person",
attributes=["age", "degree"]
)
# Edge feature extraction
edge_features = conn.gds.featurizer.extractEdgeFeatures(
e_type="KNOWS",
attributes=["strength"]
)
Train/Test Split#
# Split vertices for ML
conn.gds.vertexSplitter(
v_types=["Person"],
train_fraction=0.8,
validate_fraction=0.1,
test_fraction=0.1
)
Error Handling#
from pyTigerGraph.pyTigerGraphException import TigerGraphException
try:
result = conn.runInstalledQuery("nonexistent")
except TigerGraphException as e:
print(f"Error: {e}")
Authentication#
Token Management#
# Create secret
secret = conn.createSecret()
# Get token with lifetime
token = conn.getToken(secret, lifetime=86400) # 24 hours
# Refresh token
new_token = conn.refreshToken(secret)
Role-Based Access#
# With RBAC
conn = tg.TigerGraphConnection(
host="https://your-instance.i.tgcloud.io",
graphname="MyGraph",
username="analyst",
password="password"
)
Performance Considerations#
Caveats#
From official documentation:
“pyTigerGraph may perform slower than direct HTTP requests to the TigerGraph REST API due to its feature-rich abstraction layer adding URL setup, logging, authentication, and validation.”
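Before optimizing around this overhead, it is worth measuring it for your workload. A library-agnostic sketch using only the standard library; the lambda is a stand-in for any call you want to profile (for example, a pyTigerGraph query versus a raw HTTP request to the same REST endpoint):

```python
import time

def best_of(call, repeats=5):
    """Return the best-of-N wall-clock duration of call(), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload; swap in the client call you want to measure
duration = best_of(lambda: sum(range(10_000)))
print(f"{duration * 1e6:.1f} µs")
```

Best-of-N filters out scheduler noise, giving a stable lower bound on per-call cost.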
Optimization Tips#
# Use bulk operations for large datasets
conn.upsertVertices("Person", large_list, atomic=False)
# Disable unnecessary logging
import logging
logging.getLogger("pyTigerGraph").setLevel(logging.WARNING)
# Use async for long-running queries
job_id = conn.runInstalledQuery("heavyQuery", runAsync=True)
Limitations#
- REST-only protocol (higher latency than binary protocols)
- Performance overhead from abstraction layer
- GSQL learning curve for complex queries
- Less mature ecosystem than Neo4j/ArangoDB
- GDS features require additional package
When to Use#
Choose pyTigerGraph when:
- Graph analytics and ML are primary use cases
- Large-scale graph processing needed
- GSQL expertise available
- TigerGraph Cloud deployment
- Integration with PyTorch Geometric or DGL needed
Consider alternatives when:
- Simple CRUD operations primary use case
- Low latency critical (consider direct REST)
- Multi-database portability needed
- Smaller graphs with simpler requirements
Resources#
rdflib - RDF Graph Library for Python#
Overview#
rdflib is a pure Python library for working with RDF (Resource Description Framework) data. It provides comprehensive support for parsing, serializing, and querying RDF graphs using SPARQL, making it the standard choice for semantic web and linked data applications in Python.
Key Information#
| Attribute | Value |
|---|---|
| Package | rdflib |
| Version | 7.2.x |
| Python Support | 3.8+ |
| Query Language | SPARQL 1.1 |
| License | BSD-3-Clause |
| Repository | github.com/RDFLib/rdflib |
Installation#
pip install rdflib
# With optional dependencies
pip install rdflib[html,lxml]
Core Concepts#
RDF Triples#
RDF data consists of triples: (subject, predicate, object)
from rdflib import Graph, Literal, URIRef, Namespace
from rdflib.namespace import RDF, FOAF, XSD
# Create graph
g = Graph()
# Define namespace
EX = Namespace("http://example.org/")
# Add triple
g.add((
EX.alice, # Subject
FOAF.name, # Predicate
Literal("Alice", datatype=XSD.string) # Object
))
Node Types#
from rdflib import URIRef, Literal, BNode
from rdflib.namespace import XSD
# URI Reference (resources)
person = URIRef("http://example.org/alice")
# Literal (values)
name = Literal("Alice")
age = Literal(30, datatype=XSD.integer)
name_en = Literal("Alice", lang="en")
# Blank Node (anonymous)
address = BNode()
Graph Operations#
Creating and Populating Graphs#
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, FOAF
g = Graph()
# Bind namespace prefix
g.bind("foaf", FOAF)
g.bind("ex", EX)
# Add triples
alice = URIRef("http://example.org/alice")
bob = URIRef("http://example.org/bob")
g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice")))
g.add((alice, FOAF.knows, bob))
g.add((bob, RDF.type, FOAF.Person))
g.add((bob, FOAF.name, Literal("Bob")))
Querying Triples#
# All triples
for s, p, o in g:
print(s, p, o)
# Specific patterns
for person in g.subjects(RDF.type, FOAF.Person):
name = g.value(person, FOAF.name)
print(f"{person}: {name}")
# Check existence
if (alice, FOAF.knows, bob) in g:
print("Alice knows Bob")
Removing Triples#
# Remove specific triple
g.remove((alice, FOAF.knows, bob))
# Remove by pattern (None = wildcard)
g.remove((alice, None, None)) # Remove all triples about Alice
SPARQL Queries#
SELECT Queries#
query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?friend
WHERE {
?person a foaf:Person ;
foaf:name ?name ;
foaf:knows ?friendUri .
?friendUri foaf:name ?friend .
}
"""
for row in g.query(query):
print(f"{row.name} knows {row.friend}")
ASK Queries#
query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
ASK {
?person foaf:name "Alice" .
}
"""
result = g.query(query)
print(bool(result)) # True or False
CONSTRUCT Queries#
query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/>
CONSTRUCT {
?person ex:displayName ?name .
}
WHERE {
?person foaf:name ?name .
}
"""
result_graph = g.query(query).graph
Parameterized Queries#
from rdflib.plugins.sparql import prepareQuery
query = prepareQuery("""
SELECT ?name
WHERE {
?person foaf:name ?name .
}
""", initNs={"foaf": FOAF})
# With initial bindings
results = g.query(
query,
initBindings={'person': alice}
)
SPARQL Update#
update = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA {
<http://example.org/charlie> a foaf:Person ;
foaf:name "Charlie" .
}
"""
g.update(update)
Serialization#
Parsing RDF#
# Parse from file
g.parse("data.ttl", format="turtle")
# Parse from URL
g.parse("http://example.org/data.rdf")
# Parse from string
g.parse(data=rdf_string, format="turtle")
# Supported formats
formats = ["xml", "turtle", "n3", "nt", "nquads", "trig", "json-ld"]
Serializing RDF#
# Serialize to string
turtle = g.serialize(format="turtle")
jsonld = g.serialize(format="json-ld")
# Serialize to file
g.serialize("output.ttl", format="turtle")
# Available formats
# RDF/XML, N3, NTriples, N-Quads, Turtle, TriG, TriX, JSON-LD, HexTuples
Persistence#
In-Memory (Default)#
g = Graph() # Default in-memory store
Berkeley DB#
# Requires the berkeleydb package (pip install berkeleydb)
from rdflib import Graph

g = Graph("BerkeleyDB", identifier="mygraph")
g.open("/path/to/store", create=True)
# Use graph...
g.close()
SQLite (via rdflib-sqlalchemy)#
# pip install rdflib-sqlalchemy
from rdflib import Graph
from rdflib_sqlalchemy import registerplugins
registerplugins()
g = Graph(store="SQLAlchemy", identifier="mygraph")
g.open("sqlite:///graph.db", create=True)
Remote SPARQL Endpoints#
SPARQLWrapper#
# pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
SELECT ?label
WHERE {
<http://dbpedia.org/resource/Python_(programming_language)>
rdfs:label ?label .
FILTER (lang(?label) = 'en')
}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
Federated Queries#
query = """
SELECT ?name ?abstract
WHERE {
?person foaf:name ?name .
SERVICE <http://dbpedia.org/sparql> {
?dbperson rdfs:label ?name ;
dbo:abstract ?abstract .
FILTER (lang(?abstract) = 'en')
}
}
"""
Named Graphs and Datasets#
Conjunctive Graph#
from rdflib import ConjunctiveGraph, URIRef
# Dataset with multiple named graphs
ds = ConjunctiveGraph()
# Add to specific graph
graph1 = URIRef("http://example.org/graph1")
ds.add((alice, FOAF.name, Literal("Alice"), graph1))
# Query across graphs
for ctx in ds.contexts():
print(f"Graph: {ctx.identifier}")
Dataset#
from rdflib import Dataset
ds = Dataset()
g1 = ds.graph(URIRef("http://example.org/graph1"))
g1.add((alice, FOAF.name, Literal("Alice")))
Custom SPARQL Functions#
from rdflib.plugins.sparql.operators import register_custom_function
from rdflib import Literal, URIRef
def custom_uppercase(value):
return Literal(str(value).upper())
# Register function
register_custom_function(
URIRef("http://example.org/uppercase"),
custom_uppercase
)
# Use in query (prefixes must be declared)
query = """
PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT (ex:uppercase(?name) AS ?upper)
WHERE { ?person foaf:name ?name }
"""
Async Support#
rdflib is synchronous. For async operations:
import asyncio
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor()
async def async_query(graph, query):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        executor,
        lambda: list(graph.query(query))
    )
Inference and Reasoning#
RDFS Inference#
from rdflib import Graph, RDF, RDFS
g = Graph()
g.parse("ontology.ttl")
# Manual RDFS inference: an instance of a class is also an
# instance of every superclass
for cls, _, parent in g.triples((None, RDFS.subClassOf, None)):
    for instance, _, _ in g.triples((None, RDF.type, cls)):
        g.add((instance, RDF.type, parent))
OWL-RL (via owlrl)#
# pip install owlrl
import owlrl
g = Graph()
g.parse("data.ttl")
# Apply OWL-RL reasoning
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)
Limitations#
- No native async/await support
- Memory-intensive for large graphs
- SPARQL performance varies by store
- Limited OWL reasoning (requires extensions)
- Not a graph database (in-memory or file-based)
When to Use#
Choose rdflib when:
- Working with RDF/semantic web data
- SPARQL queries required
- Linked data integration needed
- Ontology processing
- Standards compliance important (W3C RDF)
Consider alternatives when:
- Property graph model preferred (use Neo4j)
- High-performance database needed
- Native async required
- Large-scale graph analytics
Resources#
Graph Database Python Client Recommendations#
Executive Summary#
This document provides optimized recommendations for selecting Python client libraries for graph databases based on common use cases and technical requirements.
Primary Recommendations by Use Case#
1. General-Purpose Graph Applications#
Recommended: neo4j-driver + neomodel
| Component | Library | Rationale |
|---|---|---|
| Low-level access | neo4j-driver | Best-in-class async, connection pooling, Rust extensions |
| OGM layer | neomodel | Django-style models, validation, hooks |
Why this combination:
- Neo4j has the most mature Python ecosystem
- neo4j-driver provides native asyncio for modern applications
- neomodel adds productivity without sacrificing performance
- Excellent documentation and community support
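To make the async story concrete, here is a minimal sketch of the official driver's `execute_query()` in an asyncio application. It assumes `pip install neo4j`, a reachable server, and an illustrative `User`/`FOLLOWS` schema that is not from the source document:

```python
import asyncio

async def follower_counts(uri: str, auth: tuple) -> list:
    # Third-party import kept inside the function so this sketch
    # loads even where the neo4j package is absent.
    from neo4j import AsyncGraphDatabase

    async with AsyncGraphDatabase.driver(uri, auth=auth) as driver:
        # execute_query manages sessions and retries for us
        records, summary, keys = await driver.execute_query(
            "MATCH (u:User)<-[:FOLLOWS]-(f) "
            "RETURN u.name AS name, count(f) AS followers"
        )
        return [record.data() for record in records]

# Usage (requires a running Neo4j instance):
# asyncio.run(follower_counts("neo4j://localhost:7687", ("neo4j", "password")))
```

The driver-level `execute_query()` is the recommended entry point for simple operations; explicit sessions remain available for transaction control.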
2. Multi-Database Portability#
Recommended: gremlinpython (with aiogremlin for async)
Compatible databases: Amazon Neptune, Azure Cosmos DB, JanusGraph, and more
Why Gremlin:
- Standardized query language across vendors
- Reduces vendor lock-in risk
- Single codebase for multiple deployment targets
Caveats:
- Native gremlinpython is synchronous; use aiogremlin or goblin for async
- Database-specific features may not be accessible
- Connection pooling has known recovery issues
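For orientation, a synchronous gremlinpython round trip looks roughly like the sketch below. It assumes `pip install gremlinpython` and a Gremlin Server (or Neptune) endpoint; the `person` label is illustrative:

```python
def person_names(endpoint: str = "ws://localhost:8182/gremlin") -> list:
    # Third-party imports kept local so this sketch loads without gremlinpython.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    conn = DriverRemoteConnection(endpoint, "g")
    try:
        g = traversal().withRemote(conn)
        # Traversals are lazy; toList() sends the bytecode to the server
        return g.V().hasLabel("person").values("name").toList()
    finally:
        conn.close()
```

The same traversal code runs against any TinkerPop-compatible backend, which is the portability argument in practice.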
3. Multi-Model Requirements (Document + Graph + Key-Value)#
Recommended: python-arango
Why ArangoDB:
- Single database for multiple data models
- AQL is powerful and SQL-like
- Good Python driver quality
Async strategy: Use python-arango-async for true asyncio support
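As a flavor of AQL from Python, here is a hedged sketch of a graph traversal via python-arango's `db.aql.execute`; the `persons` vertex collection and `knows` edge collection are invented for the example:

```python
def friends_of_friends(db, person_key: str) -> list:
    """Names reachable in 1-2 hops along the `knows` edge collection.

    `db` is a python-arango database handle, e.g.
    ArangoClient().db("mydb", username="root", password="...").
    """
    cursor = db.aql.execute(
        """
        FOR v IN 1..2 OUTBOUND CONCAT('persons/', @key) knows
            RETURN DISTINCT v.name
        """,
        bind_vars={"key": person_key},
    )
    return list(cursor)
```

Bind variables (`@key`) keep the query plan cacheable and guard against injection, mirroring parameterized queries in SQL.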
4. Semantic Web / RDF / Linked Data#
Recommended: rdflib + SPARQLWrapper
Why rdflib:
- De facto standard for Python RDF processing
- Full SPARQL 1.1 support
- Extensive serialization format support
Limitations:
- No native async (wrap with thread pools)
- Not suitable for large-scale production graphs (use graph databases with SPARQL endpoints)
5. High-Scale Graph Analytics and ML#
Recommended: pyTigerGraph[gds]
Why TigerGraph:
- Built-in graph data science algorithms
- Direct integration with PyTorch Geometric and DGL
- Distributed processing for large graphs
Caveats:
- GSQL learning curve
- Performance overhead in Python client
- Smaller community than Neo4j
6. Distributed/Horizontally Scalable Graphs#
Recommended: pydgraph
Why Dgraph:
- Native horizontal scaling
- GraphQL-native design
- gRPC for efficient communication
Caveats:
- Async uses gRPC futures, not native asyncio
- Smaller ecosystem than alternatives
- DQL query language unique to Dgraph
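A hedged pydgraph sketch, assuming `pip install pydgraph` and a local Alpha node; the `name` predicate is illustrative. Note that the gRPC stub must be closed explicitly:

```python
def query_names(addr: str = "localhost:9080") -> str:
    # Third-party import kept local so this sketch loads without pydgraph.
    import pydgraph

    stub = pydgraph.DgraphClientStub(addr)
    client = pydgraph.DgraphClient(stub)
    try:
        txn = client.txn(read_only=True)
        # DQL query; the response payload is JSON bytes
        resp = txn.query("{ everyone(func: has(name)) { name } }")
        return resp.json.decode("utf-8")
    finally:
        stub.close()
```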
Decision Matrix#
| Requirement | Best Choice | Runner-up |
|---|---|---|
| Best overall Python experience | neo4j-driver | python-arango |
| OGM/Django-style models | neomodel | Goblin (Gremlin) |
| Native async/FastAPI | neo4j-driver | python-arango-async |
| Database portability | gremlinpython | - |
| Multi-model (doc+graph) | python-arango | - |
| Graph ML/Analytics | pyTigerGraph[gds] | - |
| Semantic web/RDF | rdflib | - |
| Horizontal scaling | pydgraph | TigerGraph |
| Cloud-native (AWS) | gremlinpython (Neptune) | - |
| Cloud-native (Azure) | gremlinpython (Cosmos) | - |
Libraries to Avoid#
py2neo (Deprecated)#
- End of Life - no further updates
- Migrate to neo4j-driver + neomodel
Framework Integration Recommendations#
FastAPI Applications#
# Recommended stack
neo4j-driver (AsyncDriver)
# OR
python-arango-async
Django Applications#
# Recommended stack
neomodel with django_neomodel
Data Science / Jupyter#
# Recommended stack
pyTigerGraph[gds] # For graph ML
# OR
rdflib  # For RDF/semantic data
Performance Optimization Tips#
Neo4j Stack#
- Install neo4j-rust-ext for a 20-40% performance improvement
- Use execute_query() for simple operations (avoids session overhead)
- Configure the connection pool based on concurrency needs
- Use UNWIND for bulk operations
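The UNWIND tip in practice: send rows in bounded chunks so each statement stays small. The chunking helper is plain Python; the Cypher and driver call assume `pip install neo4j` and an illustrative `User` schema:

```python
BULK_MERGE = """
UNWIND $rows AS row
MERGE (u:User {id: row.id})
SET u.name = row.name
"""

def chunked(rows, size=1000):
    """Yield successive fixed-size slices of rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def bulk_load(driver, rows):
    # driver is a neo4j.Driver; one round trip per chunk instead of per row
    for chunk in chunked(rows):
        driver.execute_query(BULK_MERGE, rows=chunk)
```

One UNWIND statement per thousand rows typically beats per-row writes by an order of magnitude, since each statement is a single network round trip and transaction.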
ArangoDB Stack#
- Use batch methods (insert_many, update_many) for bulk operations
- Consider the async driver for I/O-bound workloads
- Use ArangoSearch for full-text queries instead of AQL filters
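The batch-method tip as a sketch (assumes `pip install python-arango`, a local server, and an illustrative `users` collection):

```python
def bulk_upsert_users(db, docs):
    """Insert documents in one request; db is a python-arango database handle.

    insert_many returns per-document results, so partial failures can be
    inspected instead of aborting the whole batch on the first error.
    """
    users = db.collection("users")
    return users.insert_many(docs, overwrite=True)

# Usage sketch:
# from arango import ArangoClient
# db = ArangoClient(hosts="http://localhost:8529").db(
#     "mydb", username="root", password="...")
# bulk_upsert_users(db, [{"_key": "1", "name": "Alice"},
#                        {"_key": "2", "name": "Bob"}])
```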
Gremlin Stack#
- Prefer GraphBinary serialization over GraphSON
- Use prepared traversals for repeated queries
- Consider Goblin OGM for complex object mapping
Migration Considerations#
From py2neo to neo4j-driver#
- Replace Graph.run() with session.run() or driver.execute_query()
- Update transaction patterns to managed transactions
- Migrate OGM code to neomodel
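A before/after sketch of the `Graph.run()` to `execute_query()` migration; the `User` query is illustrative, and the after version assumes `pip install neo4j`:

```python
# Before (py2neo, deprecated):
#   from py2neo import Graph
#   graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
#   names = [r["name"] for r in graph.run("MATCH (u:User) RETURN u.name AS name")]

def user_names(uri: str, auth: tuple) -> list:
    # After (official driver); import kept local so the sketch loads without neo4j
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=auth) as driver:
        records, _, _ = driver.execute_query("MATCH (u:User) RETURN u.name AS name")
        return [record["name"] for record in records]
```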
From SQL to Graph#
- Start with neomodel for familiar ORM patterns
- Use Cypher for complex traversals
- Consider ArangoDB if joining existing document data
Conclusion#
For most Python graph database applications, the Neo4j ecosystem (neo4j-driver + neomodel) offers the best balance of:
- API quality and Pythonic design
- Native async support
- Documentation and community
- Performance (with Rust extensions)
- OGM productivity (neomodel)
For specialized requirements (multi-database portability, RDF/semantic web, graph ML, or horizontal scaling), select the specialized library that best matches the use case as outlined above.
S3 Need-Driven Discovery: Graph Database Client Libraries#
Methodology Overview#
This analysis evaluates Python graph database client libraries through a need-driven lens, matching library capabilities to real-world use case requirements rather than comparing features in isolation.
Analysis Framework#
1. Use Case Decomposition#
Each use case is analyzed across five dimensions:
| Dimension | Questions Addressed |
|---|---|
| Graph Model | Property graph vs RDF vs hypergraph? Schema flexibility needs? |
| Query Patterns | Traversal depth? Path finding? Aggregations? Pattern matching? |
| Scale Profile | Node/edge counts? Query concurrency? Growth trajectory? |
| Processing Mode | Real-time OLTP? Batch analytics? Hybrid? |
| Integration Context | REST APIs? Event streams? ETL pipelines? Existing stack? |
2. Library Capability Mapping#
For each use case, libraries are evaluated on:
- Native support: Does the library directly support required patterns?
- Performance characteristics: Latency, throughput, memory efficiency
- Developer experience: API ergonomics, documentation, debugging
- Operational maturity: Stability, community support, enterprise readiness
3. Gap Analysis#
Identifying where library capabilities fall short:
- Missing features requiring workarounds
- Performance limitations at scale
- Integration friction points
- Operational blind spots
Use Cases Analyzed#
| Use Case | Primary Pattern | Scale Profile | Processing Mode |
|---|---|---|---|
| Social Network | Traversal-heavy | High volume, real-time | OLTP |
| Knowledge Graph | Semantic queries | Medium volume, complex | Hybrid |
| Fraud Detection | Pattern matching | High throughput | Real-time + batch |
| Recommendation Engine | Collaborative filtering | Very high volume | Batch + real-time |
| Network Infrastructure | Topology analysis | Medium volume | OLTP + analytics |
| Supply Chain | Path optimization | Medium-high volume | Hybrid |
Evaluation Criteria#
Functional Fit (40%)#
- Query language expressiveness for use case patterns
- Data model alignment with domain requirements
- Built-in algorithms vs custom implementation needs
Performance Fit (30%)#
- Query latency for typical operations
- Throughput under concurrent load
- Memory efficiency for graph size
Operational Fit (20%)#
- Connection pooling and failover
- Monitoring and observability hooks
- Transaction management capabilities
Integration Fit (10%)#
- Async/await support
- Framework compatibility (FastAPI, Django, etc.)
- Data pipeline integration (Pandas, Apache Spark)
Libraries Under Evaluation#
| Library | Database | Graph Model | Maturity |
|---|---|---|---|
| neo4j (official) | Neo4j | Property Graph | Production |
| py2neo | Neo4j | Property Graph | Deprecated (EOL) |
| python-arango | ArangoDB | Multi-model | Production |
| pyTigerGraph | TigerGraph | Property Graph | Production |
| gremlinpython | Various | Property Graph | Production |
| rdflib | Various | RDF/Triple Store | Production |
| NetworkX | In-memory | General | Production |
Deliverables#
- Per-use-case analysis: Detailed evaluation of library fit
- Recommendation matrix: Best-fit library by use case and constraint
- Gap documentation: Known limitations and workarounds
Recommendation Summary: Graph Database Client Libraries by Use Case#
Quick Reference Matrix#
| Use Case | Best Fit | Alternative | Scale Trigger |
|---|---|---|---|
| Social Network | neo4j | pyTigerGraph | > 100M users |
| Knowledge Graph | neo4j + neosemantics | rdflib (small) | > 1M triples |
| Fraud Detection | pyTigerGraph | neo4j | > 1B transactions |
| Recommendation Engine | neo4j | pyTigerGraph | > 100M users |
| Network Infrastructure | neo4j | python-arango | > 1M resources |
| Supply Chain | neo4j | pyTigerGraph | Global enterprise |
Library Recommendations by Priority#
1. neo4j (Official Driver) - Primary Choice#
Best for: Most graph use cases at moderate scale
Strengths across use cases:
- Cypher query language is most expressive for graph patterns
- GDS (Graph Data Science) library covers common algorithms
- Mature Python driver with async support
- Strong community and documentation
- Visualization tools (Bloom) for non-technical users
When to choose neo4j:
- Team has or can develop Cypher expertise
- Use case fits property graph model
- Scale < 1B edges
- Need graph algorithms (centrality, community, paths)
- Visualization is important
Installation:
uv pip install neo4j
2. pyTigerGraph - Scale-First Choice#
Best for: High-volume fraud detection, massive recommendation systems
Strengths across use cases:
- Distributed architecture handles massive scale
- GSQL optimized for deep traversals
- Strong financial services and enterprise focus
- ML workbench for graph embeddings
When to choose pyTigerGraph:
- Scale exceeds 1B edges
- Deep traversals (5+ hops) are common
- Distributed processing required
- Enterprise budget available
- Financial/fraud detection primary use case
Installation:
uv pip install pyTigerGraph
3. python-arango - Multi-Model Choice#
Best for: Knowledge graphs with complex documents, cost-sensitive deployments
Strengths across use cases:
- Combines document + graph in single database
- Good horizontal scaling
- Cost-effective (open source core)
- Flexible schema for evolving models
When to choose python-arango:
- Need document storage alongside graph
- Budget constraints on database licensing
- Schema flexibility is priority
- Multi-model queries beneficial
Installation:
uv pip install python-arango
4. rdflib - Standards-First Choice#
Best for: Small-to-medium knowledge graphs requiring RDF/SPARQL compliance
Strengths:
- Full RDF/SPARQL specification compliance
- Inference engine support
- Standards-based data exchange
- Good for linked data applications
When to choose rdflib:
- RDF/SPARQL compliance required
- Ontology reasoning needed
- Scale < 1M triples
- Academic or research contexts
Installation:
uv pip install rdflib
5. gremlinpython - Portability Choice#
Best for: Multi-database environments, cloud-native deployments
Strengths:
- Works with many backends (Neptune, JanusGraph, etc.)
- Cloud-managed options available
- Standard traversal language
When to choose gremlinpython:
- Using AWS Neptune or similar managed service
- Need database portability
- Multi-cloud strategy
Installation:
uv pip install gremlinpython
6. NetworkX - Analysis Choice#
Best for: Prototyping, offline analysis, algorithm development
Strengths:
- Rich algorithm library
- Easy Python integration
- Great for research and prototyping
- Integrates with scientific Python stack
When to choose NetworkX:
- Prototyping graph logic before production
- Offline batch analysis
- Algorithm research and development
- In-memory data fits requirements
Installation:
uv pip install networkx
Decision Framework#
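The decision flow in this section can also be encoded as a plain function, handy for internal tooling or documentation tests. This is a sketch; the thresholds mirror the quick-reference matrix above:

```python
def recommend(edges: int,
              needs_sparql: bool = False,
              triples: int = 0,
              multi_model: bool = False,
              needs_portability: bool = False,
              production: bool = True) -> str:
    """Walk the same branches as the decision flowchart, top to bottom."""
    if edges > 1_000_000_000:
        return "pyTigerGraph"
    if needs_sparql:
        return "rdflib" if triples < 1_000_000 else "neo4j + neosemantics"
    if multi_model:
        return "python-arango"
    if needs_portability:
        return "gremlinpython"
    return "neo4j" if production else "NetworkX"
```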
START
|
v
Is scale > 1B edges?
|-- YES --> pyTigerGraph
|-- NO --> Continue
|
v
Is RDF/SPARQL compliance required?
|-- YES --> Scale < 1M? --> rdflib
| Scale > 1M? --> neo4j + neosemantics
|-- NO --> Continue
|
v
Is document + graph multi-model needed?
|-- YES --> python-arango
|-- NO --> Continue
|
v
Is database portability required?
|-- YES --> gremlinpython
|-- NO --> Continue
|
v
Production use case?
|-- YES --> neo4j (official driver)
|-- NO --> NetworkX for prototyping
Common Hybrid Patterns#
Pattern 1: Neo4j + NetworkX#
- Neo4j for production serving
- NetworkX for algorithm prototyping
- Export graph subset for analysis
Pattern 2: Graph DB + Vector Search#
- Graph database for relationship queries
- Vector database (Pinecone, Milvus) for embeddings
- Combine for hybrid recommendations
Pattern 3: Graph DB + Optimization Solver#
- Graph database for topology storage
- OR-Tools/Gurobi for constrained optimization
- Write optimal solutions back to graph
Gaps Across All Libraries#
| Gap | Workaround |
|---|---|
| Real-time graph algorithms | Pre-compute, cache results |
| Temporal queries | Temporal properties, time-bucketed subgraphs |
| Streaming ingestion | External stream processor (Kafka Connect) |
| Multi-tenant isolation | Database-per-tenant or property-based filtering |
| Schema migration | Version properties, migration scripts |
Final Recommendation#
For teams starting with graph databases in Python:
- Start with neo4j official driver - best documentation, most examples
- Add NetworkX for prototyping and analysis workflows
- Evaluate scale after initial deployment
- Consider pyTigerGraph if scaling beyond 1B edges
- Consider python-arango if multi-model becomes valuable
Use Case: Fraud Detection#
Domain Description#
Fraud detection leverages graph analysis to identify suspicious patterns in transactions, accounts, and entity relationships. Graphs excel at revealing hidden connections between seemingly unrelated entities, detecting ring structures, and identifying anomalous behavior patterns that traditional tabular analysis misses.
Requirements Analysis#
Graph Model Requirements#
| Aspect | Requirement | Rationale |
|---|---|---|
| Model Type | Property Graph | Need rich properties on both nodes and edges |
| Schema | Flexible | Fraud patterns evolve; schema must adapt quickly |
| Temporality | Time-aware | Transaction timestamps critical for pattern detection |
Key Entity Types:
- Accounts (bank, merchant, user)
- Transactions (payments, transfers, purchases)
- Devices (phones, IP addresses, browsers)
- Identities (SSN, email, phone numbers)
- Locations (addresses, GPS coordinates)
Query Pattern Complexity#
Primary Patterns:
- Ring detection: Circular money flows (A -> B -> C -> A)
- Shared identity: Multiple accounts sharing device/IP/email
- Velocity analysis: Transaction frequency and amount patterns
- Network expansion: Exploring N-hop neighborhood of suspicious entity
- Similarity matching: Finding accounts with similar behavior patterns
Query Characteristics:
- Depth: 3-6 hops for pattern detection
- Time windows: Queries scoped to time ranges (last 24h, 7d, 30d)
- Aggregation: Sum, count, standard deviation of transactions
- Pattern matching: Complex subgraph patterns with constraints
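As an illustration of the ring-detection and time-window patterns together, here is a hedged Cypher sketch; the `Account` label, `SENT` relationship, and `ts` property are invented for the example:

```python
from datetime import datetime, timedelta, timezone

# Circular money flow: an account that reaches itself in 3-6 SENT hops,
# with every hop inside the query time window.
RING_QUERY = """
MATCH path = (a:Account)-[:SENT*3..6]->(a)
WHERE all(r IN relationships(path) WHERE r.ts >= $since)
RETURN path
LIMIT 100
"""

def window_start(days: int) -> float:
    """Unix timestamp for the start of a last-N-days window."""
    return (datetime.now(timezone.utc) - timedelta(days=days)).timestamp()

# Usage sketch with the official neo4j driver:
# driver.execute_query(RING_QUERY, since=window_start(7))
```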
Scale Requirements#
| Metric | Typical Range | High Scale |
|---|---|---|
| Accounts (nodes) | 10M - 100M | 1B+ |
| Transactions (edges) | 100M - 10B | 100B+ |
| Real-time queries | 100 - 1K QPS | 10K+ QPS |
| Pattern scans | 1M - 100M/hour | 1B+/hour |
Processing Mode#
- Real-time: Transaction scoring at payment time (< 100ms)
- Near-real-time: Alert generation (< 5 min lag)
- Batch: Pattern discovery, model training (hourly/daily)
Integration Requirements#
- Transaction streaming (Kafka, Kinesis) for real-time ingestion
- ML pipeline for fraud scoring models
- Case management systems for investigation workflows
- Regulatory reporting and audit trails
- Alert delivery (email, SMS, dashboards)
Library Evaluation#
neo4j (Official Driver)#
Strengths:
- Excellent pattern matching with Cypher
- GDS library has community detection, PageRank for risk scoring
- Good transaction support for consistent writes
- Bloom visualization for investigators
Limitations:
- Real-time scoring at 10K+ TPS challenging
- Temporal queries require careful indexing
- Graph algorithms not available in Community edition
Fit Score: 8/10
pyTigerGraph#
Strengths:
- Built for high-throughput transaction processing
- GSQL optimized for deep link analysis
- Native support for temporal patterns
- Designed for financial services scale
Limitations:
- Enterprise licensing costs
- Steeper learning curve for GSQL
- Smaller Python community
Fit Score: 9/10 (high scale); 7/10 (smaller deployments)
python-arango#
Strengths:
- Good throughput for transaction ingestion
- Multi-model allows storing raw transaction documents
- Flexible schema for evolving fraud patterns
- Cost-effective scaling
Limitations:
- Graph algorithms less mature than Neo4j GDS
- Pattern matching syntax less expressive
- Smaller fraud detection community
Fit Score: 7/10
gremlinpython (with Neptune)#
Strengths:
- Managed service reduces operational burden
- Good for AWS-native architectures
- Scales horizontally
Limitations:
- Query latency can be variable
- Limited graph algorithm support
- Gremlin verbose for complex patterns
Fit Score: 6/10
NetworkX (with external storage)#
Strengths:
- Rich algorithm library for analysis
- Good for offline pattern discovery
- Easy prototyping of detection logic
Limitations:
- In-memory only (not for production scale)
- No persistence or transactions
- Cannot handle real-time requirements
Fit Score: 4/10 (analysis only)
Gaps and Workarounds#
| Gap | Impact | Workaround |
|---|---|---|
| Real-time graph algorithms | Cannot run PageRank per transaction | Pre-compute risk scores, incremental updates |
| Temporal pattern matching | Limited native time-series support | Time-bucketed subgraphs, temporal indices |
| Streaming ingestion | Not all drivers handle high-volume streams | Kafka Connect, custom streaming layer |
| Explainability | Graph patterns hard to explain to regulators | Path export, visualization, rule extraction |
| Model integration | Limited native ML support | Feature extraction to external ML pipeline |
Architecture Pattern#
[Transaction Stream]
|
v
[Stream Processor] -- real-time features --> [ML Scoring Service]
|
v
[Graph Database] <-- enrichment queries
|
v
[Batch Analytics] -- pattern discovery --> [Rule Engine Update]
Hybrid Approach:
- Real-time: Feature extraction + ML scoring (sub-100ms)
- Near-real-time: Graph enrichment queries (100ms-1s)
- Batch: Deep pattern analysis, model retraining
Recommendation#
Best Fit: pyTigerGraph for enterprise fraud detection
At the scale typical for financial fraud detection (billions of transactions), TigerGraph’s distributed architecture and GSQL’s pattern matching capabilities make it the strongest choice. Its focus on financial services also means it is battle-tested at the relevant scale.
Alternative: neo4j official driver for smaller deployments or teams with existing Cypher expertise. The GDS library provides excellent algorithm support for pattern discovery.
Hybrid pattern: Use Neo4j/TigerGraph for graph storage and queries, with NetworkX for offline algorithm development and prototyping.
Use Case: Knowledge Graph#
Domain Description#
Knowledge graphs represent entities and their semantic relationships, enabling structured knowledge representation, reasoning, and discovery. Common applications include enterprise knowledge management, semantic search, question answering systems, and data integration across disparate sources.
Requirements Analysis#
Graph Model Requirements#
| Aspect | Requirement | Rationale |
|---|---|---|
| Model Type | RDF/Triple Store OR Property Graph | RDF for standards compliance; Property Graph for flexibility |
| Schema | Ontology-driven | Need formal type hierarchies and relationship constraints |
| Semantics | Rich typing | Entities have types, properties have ranges, relationships have semantics |
Model Choice Considerations:
- RDF/SPARQL: Best for standards compliance, linked data, inference
- Property Graph: Better performance, easier development, less formal semantics
Query Pattern Complexity#
Primary Patterns:
- Semantic traversal: Following typed relationships with constraints
- Inference queries: Deriving implicit relationships from explicit ones
- Faceted search: Filtering entities by multiple attribute combinations
- Path queries: Finding connection paths with semantic constraints
Query Characteristics:
- Depth: Variable (1-10+ hops depending on question complexity)
- Filtering: Heavy use of type and property constraints
- Aggregation: Counting, grouping by entity types
- Reasoning: RDFS/OWL inference for RDF; manual for property graphs
Scale Requirements#
| Metric | Typical Range | High Scale |
|---|---|---|
| Entities (nodes) | 100K - 10M | 100M+ |
| Facts (edges) | 1M - 100M | 1B+ |
| Concurrent queries | 10 - 100 QPS | 1K+ QPS |
| Ontology complexity | 100 - 1K classes | 10K+ classes |
Processing Mode#
- Primary: Interactive queries for search and exploration
- Secondary: Batch ingestion from source systems
- Latency target: < 500ms for exploratory queries; < 100ms for autocomplete
Integration Requirements#
- NLP pipelines for entity extraction and linking
- Data integration from multiple source systems (databases, APIs, documents)
- Search engines (Elasticsearch) for full-text capabilities
- Visualization tools for graph exploration
- LLM integration for natural language querying
Library Evaluation#
rdflib#
Strengths:
- Native RDF/SPARQL support
- Standards compliant (W3C specifications)
- Good for small-to-medium knowledge graphs
- Inference engine support (OWL-RL)
Limitations:
- In-memory by default (persistence requires plugins)
- Performance degrades above 1M triples
- Limited concurrent query support
- No built-in clustering
Fit Score: 7/10 (small-medium); 4/10 (large scale)
neo4j (Official Driver)#
Strengths:
- Excellent query performance at scale
- Flexible property graph for evolving ontologies
- Full-text search integration
- Strong Python ecosystem
Limitations:
- No native RDF/SPARQL (requires neosemantics plugin)
- No built-in inference engine
- Ontology constraints require manual enforcement
Fit Score: 8/10
python-arango#
Strengths:
- Multi-model allows combining document + graph
- Good for knowledge graphs with rich entity attributes
- Full-text search built-in
- Scales well horizontally
Limitations:
- No RDF/SPARQL support
- Limited semantic reasoning capabilities
- Smaller knowledge graph community
Fit Score: 7/10
gremlinpython (with Neptune/JanusGraph)#
Strengths:
- Cloud-native options (AWS Neptune)
- Supports both property graph and RDF modes
- Good for large-scale deployments
Limitations:
- Verbose query syntax for complex patterns
- Variable performance across backends
- Less intuitive for semantic queries
Fit Score: 6/10
pyTigerGraph#
Strengths:
- Excellent scale for massive knowledge graphs
- GSQL supports complex pattern matching
- Built-in ML workbench for embeddings
Limitations:
- Enterprise-focused (cost considerations)
- Steeper learning curve
- Limited RDF ecosystem integration
Fit Score: 7/10 (large scale)
Gaps and Workarounds#
| Gap | Impact | Workaround |
|---|---|---|
| Inference across libraries | Most lack native reasoning | External reasoner (HermiT, Pellet) or pre-materialization |
| Schema evolution | Ontology changes disruptive | Versioned schemas, migration scripts |
| Multilingual support | Limited language handling | External NLP, language-tagged properties |
| Provenance tracking | Need to track fact sources | Custom edge properties for provenance |
| Temporal knowledge | Facts change over time | Temporal properties, versioned subgraphs |
Hybrid Architecture Pattern#
For production knowledge graphs, consider a hybrid approach:
[RDFLib for ontology management]
|
v
[Neo4j/ArangoDB for query execution]
|
v
[Elasticsearch for full-text search]
This combines:
- RDFLib’s semantic capabilities for schema management
- Property graph’s query performance for runtime
- Search engine’s text capabilities for discovery
Recommendation#
Best Fit: neo4j official driver with neosemantics plugin
For knowledge graph applications requiring both semantic expressiveness and query performance, Neo4j with the neosemantics (n10s) plugin provides the best balance. It supports RDF import/export while leveraging Cypher’s performance for queries.
Alternative: rdflib for smaller knowledge graphs (< 1M triples) where standards compliance and inference are primary requirements.
Alternative: python-arango when knowledge entities have complex nested attributes and document-style storage is beneficial.
Use Case: Network Infrastructure#
Domain Description#
Network infrastructure graphs model the topology and dependencies of IT systems, including physical networks, cloud resources, microservices, and their interconnections. Use cases include impact analysis, root cause detection, capacity planning, and configuration management.
Requirements Analysis#
Graph Model Requirements#
| Aspect | Requirement | Rationale |
|---|---|---|
| Model Type | Property Graph | Rich metadata on nodes (config, status); typed edges |
| Schema | Semi-structured | Core types stable; vendor-specific attributes vary |
| Hierarchy | Multi-level | Physical -> logical -> application layers |
Key Entity Types:
- Physical: Servers, switches, routers, data centers
- Virtual: VMs, containers, Kubernetes pods
- Application: Services, databases, APIs, queues
- Configuration: Ports, IPs, certificates, credentials
- Connections: Network links, API calls, data flows
Query Pattern Complexity#
Primary Patterns:
- Impact analysis: “What is affected if server X fails?”
- Root cause: “What upstream dependencies could cause service Y to fail?”
- Path analysis: Network paths between two endpoints
- Configuration drift: Finding misconfigurations across related resources
- Dependency depth: How deep is the dependency tree for a service?
Query Characteristics:
- Depth: Variable (1-hop for direct deps, 5+ for full impact)
- Direction: Both upstream and downstream traversals
- Filtering: By resource type, status, environment
- Aggregation: Counting affected resources, grouping by type
Scale Requirements#
| Metric | Typical Range | High Scale |
|---|---|---|
| Resources (nodes) | 10K - 100K | 1M+ |
| Dependencies (edges) | 50K - 1M | 10M+ |
| Query frequency | 10 - 100 QPS | 1K+ QPS |
| Update frequency | 100 - 10K/min | 100K/min |
Processing Mode#
- Real-time: Incident impact analysis (< 1s)
- Near-real-time: Topology updates from discovery (< 1min lag)
- Batch: Full topology reconciliation, analytics
Integration Requirements#
- CMDB/asset management systems
- Monitoring tools (Prometheus, Datadog, Nagios)
- Cloud provider APIs (AWS, GCP, Azure)
- Container orchestration (Kubernetes API)
- Incident management (PagerDuty, ServiceNow)
- IaC tools (Terraform, Ansible) for configuration
Library Evaluation#
neo4j (Official Driver)#
Strengths:
- Excellent for dependency traversal queries
- Cypher’s variable-length paths perfect for impact analysis
- Good visualization integration for operations teams
- APOC procedures for graph algorithms
Limitations:
- Schema flexibility can lead to inconsistency
- Need careful index strategy for large topologies
- Single-node architecture limits write scale
Fit Score: 9/10
python-arango#
Strengths:
- Multi-model stores rich configuration documents
- Good for combining graph with document queries
- Horizontal scaling for large infrastructures
- Cost-effective for moderate scale
Limitations:
- AQL traversal syntax less intuitive than Cypher
- Fewer infrastructure-specific examples
- Smaller operations/SRE community
Fit Score: 7/10
pyTigerGraph#
Strengths:
- Scales well for very large infrastructures
- Good for cross-region federated topologies
- GSQL handles complex impact queries
Limitations:
- Overkill for most infrastructure use cases
- Enterprise licensing costs
- Steeper learning curve
Fit Score: 6/10 (typical); 8/10 (very large scale)
gremlinpython (with Neptune/JanusGraph)#
Strengths:
- Cloud-native options integrate with cloud monitoring
- Standard traversal language
- Good for multi-cloud environments
Limitations:
- Verbose for operational queries
- Debugging traversals challenging during incidents
- Variable performance
Fit Score: 6/10
NetworkX#
Strengths:
- Excellent for topology analysis algorithms
- Easy integration with Python operations tools
- Good for offline planning and analysis
Limitations:
- In-memory only
- Cannot handle real-time incident queries
- No persistence
Fit Score: 5/10 (analysis only)
Gaps and Workarounds#
| Gap | Impact | Workaround |
|---|---|---|
| Real-time topology updates | Discovery lag during changes | Event-driven updates, eventual consistency |
| Multi-layer correlation | Physical-logical-app mapping complex | Typed edges, layer property on nodes |
| Historical topology | Need point-in-time topology | Temporal properties, snapshot graphs |
| Dynamic environments | Kubernetes pods ephemeral | Aggregate by service, not pod |
| Cross-system correlation | Multiple source systems | Canonical ID mapping layer |
Architecture Pattern#
[Discovery Sources]
|-- Cloud APIs
|-- K8s API
|-- CMDB
|-- Network monitoring
|
v
[Topology Aggregator] -- canonical model --> [Graph Database]
| |
v v
[Change Event Stream] [Query API]
| |
v v
[Alert Enrichment] [Dashboard/CLI]
Operational Queries:
// Impact analysis: What services are affected if this server fails?
MATCH (server:Server {id: $serverId})<-[:RUNS_ON*1..3]-(service:Service)
RETURN service.name, service.criticality
// Root cause: What could cause this API to fail?
MATCH path = (api:API {name: $apiName})-[:DEPENDS_ON*1..5]->(dep)
WHERE dep.status = 'unhealthy'
RETURN path
// Dependency depth
MATCH path = (service:Service {name: $svc})-[:DEPENDS_ON*]->(dep)
RETURN max(length(path)) as maxDepth
Operational Considerations#
| Consideration | Approach |
|---|---|
| Incident response | Pre-computed impact sets for critical services |
| Discovery frequency | Balance freshness vs database load |
| Schema evolution | Version type hierarchies, migration scripts |
| Access control | Environment-based graph partitioning |
| Audit trail | Change log for topology modifications |
Recommendation#
Best Fit: neo4j official driver
For network infrastructure and dependency mapping, Neo4j’s Cypher language
provides the most natural expression of dependency queries. The ability to
write variable-length path queries (-[:DEPENDS_ON*1..5]->) makes impact
analysis and root cause queries intuitive.
Key advantages for operations:
- Fast time-to-value with intuitive query language
- Bloom visualization for non-technical stakeholders
- Active operations/SRE community with examples
Alternative: python-arango when infrastructure includes complex configuration documents that benefit from document storage alongside graph relationships.
Complement with NetworkX for offline topology analysis, capacity planning, and what-if simulations that don’t need real-time data.
Use Case: Recommendation Engine#
Domain Description#
Recommendation engines leverage graph structures to model user-item relationships, enabling collaborative filtering, content-based recommendations, and hybrid approaches. Graphs naturally represent the bipartite relationship between users and items, as well as item-item and user-user similarities.
Requirements Analysis#
Graph Model Requirements#
| Aspect | Requirement | Rationale |
|---|---|---|
| Model Type | Property Graph | Edge weights for ratings/interactions; rich node properties |
| Structure | Bipartite core | User-Item relationships; User-User and Item-Item derived |
| Weights | Numeric edge properties | Ratings, interaction counts, recency scores |
Key Entity Types:
- Users (profiles, preferences, segments)
- Items (products, content, services)
- Interactions (views, purchases, ratings, saves)
- Categories/Tags (content metadata)
Query Pattern Complexity#
Primary Patterns:
- Collaborative filtering: “Users who liked X also liked Y”
- User neighborhood: Similar users based on shared interactions
- Item neighborhood: Similar items based on shared user base
- Path-based recommendations: Multi-hop reasoning (A likes B, B similar to C)
- Popularity queries: Top items by interaction count
Query Characteristics:
- Depth: 2-3 hops typical (user -> item -> similar items)
- Aggregation: Heavy (counting, averaging, ranking)
- Filtering: By recency, category, availability
- Personalization: User-specific traversal starting points
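The first pattern ("users who liked X also liked Y") is a two-hop traversal with counting. To make its shape concrete, here it is sketched in plain Python over an in-memory interaction map; in production this would be a graph query or a pre-computed result:

```python
from collections import Counter


def also_liked(likes_by_user, item, limit=3):
    """'Users who liked X also liked Y': two-hop co-occurrence counting."""
    fans = {u for u, items in likes_by_user.items() if item in items}
    counts = Counter(
        other
        for u in fans
        for other in likes_by_user[u]
        if other != item
    )
    return [i for i, _ in counts.most_common(limit)]


likes = {
    "alice": {"X", "Y", "Z"},
    "bob": {"X", "Y"},
    "carol": {"Z"},
}
assert also_liked(likes, "X") == ["Y", "Z"]  # Y co-occurs twice, Z once
```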
Scale Requirements#
| Metric | Typical Range | High Scale |
|---|---|---|
| Users (nodes) | 100K - 10M | 100M+ |
| Items (nodes) | 10K - 1M | 10M+ |
| Interactions (edges) | 10M - 1B | 100B+ |
| Recommendation requests | 100 - 10K QPS | 100K+ QPS |
| Latency target | < 100ms | < 50ms |
Processing Mode#
- Real-time: Serving recommendations (< 100ms)
- Batch: Computing similarity matrices, embeddings (hourly/daily)
- Incremental: Updating recommendations as new interactions arrive
Integration Requirements#
- API layer for client applications (REST, GraphQL)
- Event streaming for real-time interaction capture
- Feature store for ML model features
- A/B testing infrastructure for recommendation experiments
- Analytics for recommendation performance tracking
Library Evaluation#
neo4j (Official Driver)#
Strengths:
- Cypher excellent for collaborative filtering queries
- GDS library has similarity algorithms (cosine, Jaccard)
- Node embedding algorithms for hybrid approaches
- Good caching for repeated query patterns
Limitations:
- Real-time computation at scale challenging
- Need GDS for similarity algorithms (Enterprise)
- No native matrix operations
Fit Score: 8/10
python-arango#
Strengths:
- Good performance for bipartite graph queries
- Multi-model allows storing item metadata as documents
- Cost-effective scaling for high interaction volumes
Limitations:
- Limited built-in similarity algorithms
- Less mature recommendation-specific ecosystem
- Need custom similarity implementations
Fit Score: 6/10
pyTigerGraph#
Strengths:
- Excellent scale for high-volume interactions
- GSQL supports complex aggregation patterns
- Built-in ML workbench for embeddings
- Graph feature extraction for ML models
Limitations:
- Enterprise cost considerations
- Overkill for smaller catalogs
- Steeper learning curve
Fit Score: 8/10 (large scale); 6/10 (smaller deployments)
gremlinpython#
Strengths:
- Works with multiple backends
- Standard traversal patterns
- Cloud options available
Limitations:
- Verbose for aggregation-heavy queries
- No built-in similarity algorithms
- Performance varies by backend
Fit Score: 5/10
NetworkX#
Strengths:
- Rich algorithm library (bipartite algorithms)
- Easy prototyping of recommendation logic
- Good for offline analysis and testing
Limitations:
- In-memory only
- Cannot serve real-time recommendations
- No persistence
Fit Score: 3/10 (prototyping only)
Gaps and Workarounds#
| Gap | Impact | Workaround |
|---|---|---|
| Real-time similarity | Cannot compute on-the-fly at scale | Pre-computed similarity cache |
| Cold start | New users/items have no connections | Content-based fallback, popularity-based |
| Implicit feedback | View != purchase signal strength | Weight tuning, decay functions |
| Diversity | Graph algorithms tend toward popular items | Re-ranking layer, exploration bonus |
| Explanation | Hard to explain graph-based recommendations | Path extraction, rule-based overlays |
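The weight-tuning-plus-decay workaround for implicit feedback usually reduces to a scoring function: each action type gets a base weight, and an event's contribution halves every fixed interval. A sketch with illustrative weights; real values are tuned offline against conversion data:

```python
# Illustrative signal strengths; in practice these are tuned offline.
ACTION_WEIGHTS = {"view": 1.0, "save": 3.0, "purchase": 10.0}


def interaction_score(action, age_days, half_life_days=30.0):
    """Weight an implicit-feedback event, halving its value each half-life."""
    decay = 0.5 ** (age_days / half_life_days)
    return ACTION_WEIGHTS[action] * decay


assert interaction_score("purchase", 0) == 10.0
assert interaction_score("view", 30) == 0.5       # one half-life
assert interaction_score("purchase", 60) == 2.5   # two half-lives
```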
Architecture Pattern#
[Interaction Events]
|
v
[Stream Processor] --> [Real-time Features]
|
v
[Graph Database] <-- batch similarity updates
|
v
[Recommendation Service]
|
v
[Cache Layer] --> [API Response]
Hybrid Recommendation Pattern:
- Graph-based collaborative filtering for relationship signals
- Embedding-based similarity for scale and cold start
- Business rules layer for diversity, freshness, inventory
- Caching layer for latency requirements
Pre-computation Strategy#
For production recommendation systems, pre-compute:
| Computation | Frequency | Storage |
|---|---|---|
| Item-item similarity top-K | Daily | Graph edges or Redis |
| User-item affinity scores | Hourly | Feature store |
| User segments | Daily | User properties |
| Popular items per category | Hourly | Cache layer |
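The daily item-item similarity job in the first row is conceptually simple; a pure-Python Jaccard top-K sketch of what a GDS call or batch pipeline computes, on illustrative data:

```python
from itertools import combinations


def item_similarity_topk(users_by_item, k=10):
    """Jaccard similarity between items' user sets, keeping top-k per item."""
    sims = {item: [] for item in users_by_item}
    for a, b in combinations(users_by_item, 2):
        ua, ub = users_by_item[a], users_by_item[b]
        jaccard = len(ua & ub) / len(ua | ub)
        if jaccard > 0:
            sims[a].append((jaccard, b))
            sims[b].append((jaccard, a))
    return {item: sorted(pairs, reverse=True)[:k] for item, pairs in sims.items()}


interactions = {
    "item-1": {"u1", "u2", "u3"},
    "item-2": {"u2", "u3"},
    "item-3": {"u9"},
}
top = item_similarity_topk(interactions)
assert top["item-1"][0] == (2 / 3, "item-2")  # 2 shared of 3 total users
assert top["item-3"] == []                    # no overlap with anything
```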
Recommendation#
Best Fit: neo4j official driver for most recommendation use cases
Neo4j’s combination of expressive queries (Cypher) and graph algorithms (GDS) makes it well-suited for recommendation systems. The ability to compute Jaccard similarity, node embeddings, and community detection in the database enables sophisticated recommendations.
Alternative: pyTigerGraph for very high-scale systems (100M+ users, 1B+ interactions) where distributed processing is essential.
Hybrid pattern: Use the graph database for relationship storage and collaborative filtering queries, combined with vector similarity search (Pinecone, Milvus) for embedding-based recommendations and caching (Redis) for serving latency.
Use Case: Social Network Graph#
Domain Description#
Social networks model relationships between users including follows, friendships, group memberships, content sharing, and interactions. The graph structure captures the social fabric that enables features like friend suggestions, feed ranking, and influence analysis.
Requirements Analysis#
Graph Model Requirements#
| Aspect | Requirement | Rationale |
|---|---|---|
| Model Type | Property Graph | Nodes need rich attributes (profiles); edges need properties (timestamp, strength) |
| Schema | Semi-flexible | Core user/relationship types stable; new interaction types added frequently |
| Directionality | Mixed | Follows are directional; friendships are bidirectional |
Query Pattern Complexity#
Primary Patterns:
- Friend-of-friend traversal: 2-3 hop neighborhood exploration
- Mutual connections: Finding common neighbors between two users
- Shortest path: Degrees of separation between users
- Influence propagation: Multi-hop traversal with aggregation
Query Characteristics:
- Depth: Typically 2-4 hops (beyond 4 hops performance degrades rapidly)
- Breadth: Can explode (users with 10K+ connections)
- Aggregation: Count, distinct, top-N patterns common
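The friend-of-friend and mutual-connection patterns reduce to set operations over adjacency lists; a plain-Python sketch on illustrative data, to make the traversal shape concrete:

```python
def mutual_connections(follows, a, b):
    """Common neighbors of two users (the 'mutual friends' pattern)."""
    return follows.get(a, set()) & follows.get(b, set())


def friends_of_friends(follows, user):
    """2-hop neighborhood, excluding the user and direct connections."""
    direct = follows.get(user, set())
    two_hop = {fof for f in direct for fof in follows.get(f, set())}
    return two_hop - direct - {user}


graph = {
    "ann": {"ben", "cat"},
    "ben": {"ann", "dee"},
    "cat": {"dee"},
}
assert mutual_connections(graph, "ben", "cat") == {"dee"}
assert friends_of_friends(graph, "ann") == {"dee"}
```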
Scale Requirements#
| Metric | Typical Range | High Scale |
|---|---|---|
| Users (nodes) | 100K - 10M | 100M+ |
| Relationships (edges) | 1M - 500M | 10B+ |
| Concurrent queries | 100 - 1K QPS | 10K+ QPS |
| Write throughput | 100 - 10K/sec | 100K+/sec |
Processing Mode#
- Primary: Real-time OLTP for user-facing features
- Secondary: Batch analytics for recommendations and insights
- Latency target: < 50ms for interactive queries
Integration Requirements#
- REST/GraphQL API layer for mobile/web clients
- Event streaming for activity feeds (Kafka, Redis Streams)
- ML pipeline integration for recommendation models
- Analytics warehouse sync for business intelligence
Library Evaluation#
neo4j (Official Driver)#
Strengths:
- Excellent Cypher support for complex traversals
- Native path-finding algorithms (shortest path, all paths)
- Strong transaction support for consistent updates
- Async driver available for high concurrency
Limitations:
- Single-database focus limits multi-tenancy options
- Graph algorithms require separate APOC/GDS plugins
- Connection pooling configuration can be complex
Fit Score: 8/10
py2neo#
Strengths:
- Pythonic OGM (Object-Graph Mapping) layer
- Easier onboarding for developers new to graphs
- Good integration with pandas for analytics
Limitations:
- Performance overhead from OGM abstraction
- Less control over query optimization
- Maintenance concerns (community-driven)
Fit Score: 6/10
python-arango#
Strengths:
- Multi-model allows document storage alongside graph
- AQL provides flexible query patterns
- Built-in support for graph traversal with configurable depth
Limitations:
- Less mature graph algorithm ecosystem
- Smaller community for social network patterns
- Traversal syntax less intuitive than Cypher
Fit Score: 7/10
pyTigerGraph#
Strengths:
- Designed for massive scale (10B+ edges)
- GSQL optimized for deep traversals
- Built-in distributed processing
Limitations:
- Steeper learning curve
- Enterprise licensing costs
- Less flexible for rapid prototyping
Fit Score: 7/10 (9/10 at very high scale)
gremlinpython#
Strengths:
- Database-agnostic (works with many backends)
- Standard traversal language
- Good for multi-database environments
Limitations:
- Verbose syntax compared to Cypher
- Performance varies by backend
- Debugging traversals can be challenging
Fit Score: 6/10
Gaps and Workarounds#
| Gap | Impact | Workaround |
|---|---|---|
| Real-time graph algorithms | Cannot compute PageRank on-the-fly | Pre-compute in batch, cache results |
| Supernodes (celebrities) | Traversal explosion | Bidirectional search, sampling strategies |
| Temporal queries | Limited time-series support | Add timestamp indices, partition by time |
| Multi-hop aggregations | Memory pressure | Streaming result processing, pagination |
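The supernode workaround in the table caps traversal fan-out: ordinary accounts expand fully, while celebrity accounts contribute only a uniform sample of their edges. A sketch; the cap value is illustrative:

```python
import random


def sampled_neighbors(neighbors, node, cap=100, rng=None):
    """Expand all edges for normal nodes, a uniform sample for supernodes."""
    rng = rng or random.Random(0)
    adj = neighbors.get(node, [])
    if len(adj) <= cap:
        return list(adj)
    return rng.sample(adj, cap)


celebrity_edges = {"celebrity": [f"fan-{i}" for i in range(10_000)]}
sample = sampled_neighbors(celebrity_edges, "celebrity", cap=100)
assert len(sample) == 100
assert sampled_neighbors({"a": ["b"]}, "a") == ["b"]  # small nodes untouched
```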
Recommendation#
Best Fit: neo4j official driver
For social network applications, the combination of expressive Cypher queries, mature ecosystem (GDS for algorithms), and strong async support makes the official Neo4j driver the best choice for most scale profiles.
Alternative: pyTigerGraph for platforms expecting 100M+ users where distributed processing becomes essential.
Use Case: Supply Chain#
Domain Description#
Supply chain graphs model the network of suppliers, manufacturers, distributors, and logistics providers that move products from raw materials to end customers. Graph analysis enables risk assessment, optimization of logistics, supplier diversification, and end-to-end traceability.
Requirements Analysis#
Graph Model Requirements#
| Aspect | Requirement | Rationale |
|---|---|---|
| Model Type | Property Graph | Rich attributes on entities; weighted relationships |
| Multi-graph | Multiple edge types | Material flow, financial flow, information flow |
| Temporal | Time-aware edges | Lead times, seasonal variations, historical performance |
Key Entity Types:
- Organizations: Suppliers, manufacturers, distributors, retailers
- Facilities: Factories, warehouses, ports, distribution centers
- Products: SKUs, components, raw materials, finished goods
- Logistics: Routes, carriers, shipments
- Contracts: Agreements, terms, pricing
Query Pattern Complexity#
Primary Patterns:
- Shortest path: Optimal route from supplier to customer
- Risk propagation: “If supplier X fails, what products are affected?”
- Alternative sourcing: Finding backup suppliers for a component
- Bottleneck detection: Identifying single points of failure
- Cost optimization: Weighted path finding for lowest total cost
Query Characteristics:
- Depth: 3-10 hops (raw material to finished product)
- Weighted: Edges have cost, time, capacity attributes
- Aggregation: Sum costs, max lead times, min capacities
- Constraints: Capacity limits, geographic restrictions
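Weighted path queries of this kind are classic Dijkstra; a self-contained sketch that finds the minimum total lead time between two supply-chain nodes (edge data illustrative):

```python
import heapq


def cheapest_path(edges, src, dst):
    """Dijkstra over weighted supply-chain edges (e.g. lead time in days)."""
    best = {src: 0}
    heap = [(0, src, [src])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        for nxt, weight in edges.get(node, []):
            new_cost = cost + weight
            if nxt not in best or new_cost < best[nxt]:
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
    return None


lead_times = {
    "supplier": [("factory", 7), ("port", 2)],
    "port": [("factory", 3)],
    "factory": [("dc", 1)],
}
# Routing via the port (2 + 3 + 1 = 6 days) beats shipping direct (7 + 1 = 8).
assert cheapest_path(lead_times, "supplier", "dc") == (6, ["supplier", "port", "factory", "dc"])
```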
Scale Requirements#
| Metric | Typical Range | High Scale |
|---|---|---|
| Entities (nodes) | 10K - 100K | 1M+ |
| Relationships (edges) | 100K - 1M | 10M+ |
| Query frequency | 10 - 100 QPS | 1K QPS |
| Path computations | 100 - 10K/day | 100K/day |
Processing Mode#
- Real-time: Disruption impact assessment (< 5s)
- Batch: Route optimization, network redesign (hourly/daily)
- Simulation: What-if analysis for planning
Integration Requirements#
- ERP systems (SAP, Oracle) for order and inventory data
- TMS (Transportation Management Systems) for logistics
- Supplier portals for performance data
- IoT/tracking systems for shipment visibility
- BI tools for reporting and visualization
- Planning tools for demand forecasting
Library Evaluation#
neo4j (Official Driver)#
Strengths:
- Excellent weighted shortest path algorithms
- Cypher handles multi-hop supply chain queries well
- GDS library for network analysis (centrality, community)
- Good visualization for supply chain mapping
Limitations:
- Complex optimization needs external solvers
- Limited native geospatial support
- Large-scale simulations may need export to specialized tools
Fit Score: 8/10
python-arango#
Strengths:
- Multi-model stores complex product/contract documents
- Good geospatial support for logistics
- Scales well for medium-large supply chains
- Cost-effective for exploration
Limitations:
- Fewer built-in graph algorithms
- Path optimization less mature than Neo4j
- Smaller supply chain community
Fit Score: 7/10
pyTigerGraph#
Strengths:
- Excellent for very large global supply chains
- GSQL handles complex path computations
- Built-in graph analytics
- Enterprise supply chain focus
Limitations:
- Enterprise licensing costs
- Steeper learning curve
- Overkill for regional supply chains
Fit Score: 8/10 (global enterprise); 6/10 (smaller chains)
gremlinpython#
Strengths:
- Database-agnostic
- Standard traversal patterns
- Works with Neptune for AWS supply chains
Limitations:
- Verbose for weighted path queries
- Limited optimization algorithms
- Less intuitive for supply chain queries
Fit Score: 5/10
NetworkX#
Strengths:
- Rich library for network optimization
- Excellent for simulations and what-if analysis
- Easy integration with optimization libraries (PuLP, OR-Tools)
- Good for research and prototyping
Limitations:
- In-memory only
- Cannot serve production queries
- Export/import overhead for real data
Fit Score: 6/10 (analysis); 2/10 (production)
Gaps and Workarounds#
| Gap | Impact | Workaround |
|---|---|---|
| Constrained optimization | Cannot express capacity constraints in query | Export to optimization solver |
| Multi-objective paths | Trade-off cost vs time vs risk complex | Pareto frontier computation offline |
| Temporal edges | Lead times vary by season/volume | Time-parameterized edge properties |
| Geospatial routing | Distance calculations limited | Integrate with mapping APIs |
| Simulation | What-if at scale challenging | Clone subgraphs, sandbox environments |
| Data freshness | Supply chain data from many sources | ETL pipeline, change data capture |
Architecture Pattern#
[Source Systems]
|-- ERP
|-- TMS
|-- Supplier portals
|-- IoT/Tracking
|
v
[ETL Pipeline] -- transformation --> [Graph Database]
| |
v v
[Master Data Management] [Query API]
|
v
[Planning/Visualization Tools]
Query Examples:
// Shortest path by lead time
MATCH path = shortestPath(
(supplier:Supplier {id: $supplierId})-[:SHIPS_TO*]-(dc:DistributionCenter {id: $dcId})
)
RETURN path, reduce(time=0, r in relationships(path) | time + r.leadTimeDays) as totalLeadTime
// Risk propagation: What's affected if this supplier fails?
MATCH (supplier:Supplier {id: $supplierId})<-[:SOURCED_FROM*1..5]-(product:Product)
RETURN product.sku, product.name, product.criticality
// Alternative suppliers
MATCH (product:Product {sku: $sku})-[:SOURCED_FROM]->(current:Supplier)
MATCH (component)<-[:CONTAINS]-(product)
MATCH (alt:Supplier)-[:PROVIDES]->(component)
WHERE alt <> current
RETURN alt.name, count(component) as componentsAvailable
Optimization Patterns#
For complex supply chain optimization, combine graph database with optimization:
- Graph database: Topology storage, constraint queries
- Export to pandas/NumPy: Data preparation
- Optimization solver (OR-Tools, Gurobi): Route optimization
- Write back to graph: Optimal routes as relationships
# Example hybrid pattern (sketch: cost callbacks and solution
# extraction are elided; connection details are illustrative)
from neo4j import GraphDatabase
from ortools.constraint_solver import pywrapcp

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# 1. Extract the route network from the graph
with driver.session() as session:
    network = session.run(
        "MATCH (a)-[r:ROUTE]->(b) RETURN a.id AS src, b.id AS dst, r.cost AS cost"
    ).data()

# 2. Build the optimization model
manager = pywrapcp.RoutingIndexManager(len(network), 1, 0)
routing = pywrapcp.RoutingModel(manager)
# ... register a transit-cost callback derived from `network` ...

# 3. Solve
search_params = pywrapcp.DefaultRoutingSearchParameters()
solution = routing.SolveWithParameters(search_params)

# 4. Write optimal routes back to the graph
# (`optimal_route` dicts are extracted from `solution`)
with driver.session() as session:
    session.run(
        "MATCH (a {id: $src}), (b {id: $dst}) "
        "CREATE (a)-[:OPTIMAL_ROUTE {cost: $cost}]->(b)",
        optimal_route,
    )
Recommendation#
Best Fit: neo4j official driver
For supply chain applications, Neo4j provides the best balance of query expressiveness, graph algorithms, and ecosystem maturity. The combination of Cypher for querying and GDS for analytics covers most supply chain needs.
Key advantages for supply chain:
- Weighted shortest path for logistics optimization
- Centrality algorithms for identifying critical nodes
- Community detection for supplier clustering
- Good visualization for supply chain mapping
Alternative: pyTigerGraph for global enterprises with very large, distributed supply chains requiring massive scale.
Complement with NetworkX/OR-Tools for complex constrained optimization that goes beyond graph traversal (e.g., vehicle routing, facility location).
S4: Strategic
Strategic Analysis Methodology: Graph Database Client Libraries#
Analysis Framework#
This strategic assessment evaluates Python client libraries for graph databases through a 5-year viability lens, focusing on sustainability, portability, and ecosystem evolution.
Evaluation Dimensions#
1. Library Sustainability Assessment#
- Maintenance Cadence: Release frequency, bug fix responsiveness, security patches
- Corporate vs Community: Official vendor support vs community-driven development
- Funding Model: Venture-backed, open-source foundation, or hybrid approaches
- Bus Factor: Number of active maintainers, knowledge distribution
- Breaking Change Philosophy: Semantic versioning adherence, deprecation cycles
2. Ecosystem Positioning#
- Market Share Alignment: Does the library serve a growing or declining database?
- Standards Compliance: GQL ISO standard readiness and migration path
- AI/ML Integration: Support for GraphRAG, knowledge graphs, vector embeddings
- Cloud Service Compatibility: Works with managed offerings (Neptune, Cosmos DB)
3. Portability Analysis#
- Query Language Lock-in: Cypher vs Gremlin vs proprietary languages
- Data Model Portability: Property graph standardization, export/import capabilities
- Abstraction Layer Options: TinkerPop compatibility, ORM/OGM availability
4. Risk Assessment#
- Vendor Viability: Financial health, acquisition risk, licensing changes
- Technology Obsolescence: Language evolution (async support, typing)
- Community Fragmentation: Fork risks, competing implementations
Data Sources#
- PyPI release history and download statistics
- GitHub commit activity and contributor metrics
- Corporate financial disclosures and funding announcements
- ISO GQL standardization progress (ISO/IEC 39075:2024)
- Market research reports (Gartner, Forrester, independent analysts)
- Vendor roadmaps and conference announcements
Scoring Methodology#
Each library receives ratings (1-5) across:
- Maintenance Health
- Corporate Backing Stability
- Breaking Change Risk (inverted: lower = better)
- Dependency Security
- Long-term Viability Confidence
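A composite score can be produced from the five ratings by inverting breaking-change risk and taking a weighted mean. The weights below are assumptions for illustration only, not the weighting actually used in this assessment:

```python
# Illustrative dimension weights; assumptions, not this report's formula.
WEIGHTS = {
    "maintenance": 0.25,
    "backing": 0.25,
    "breaking_risk": 0.15,         # rated 1-5, lower = better, so invert
    "dependency_security": 0.15,
    "viability_confidence": 0.20,
}


def composite(ratings):
    """Weighted 1-5 composite; breaking-change risk is inverted first."""
    adjusted = dict(ratings, breaking_risk=6 - ratings["breaking_risk"])
    return round(sum(WEIGHTS[k] * adjusted[k] for k in WEIGHTS), 2)


score = composite({
    "maintenance": 5, "backing": 5, "breaking_risk": 2,
    "dependency_security": 5, "viability_confidence": 5,
})
assert score == 4.85
```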
Selection Context#
The analysis considers different use case profiles:
- Enterprise Production: Stability, support contracts, compliance
- Startup/Growth: Flexibility, cost efficiency, rapid iteration
- Research/Academic: Feature richness, community, documentation
- Multi-Database: Portability, abstraction, standards compliance
Time Horizon Considerations#
- Short-term (1-2 years): Current maintenance, Python version support
- Medium-term (3-5 years): GQL adoption, cloud service evolution
- Long-term (5+ years): Standards consolidation, market concentration
Graph Database Ecosystem Evolution (2025-2030)#
Market Growth Trajectory#
The graph database market is experiencing explosive growth, with projections ranging from $8.9B to $13.7B by 2030 (22-30% CAGR depending on source). Key growth drivers:
- AI/ML Workloads: Knowledge graphs powering RAG and agentic systems
- Cloud-Native Adoption: 72% of 2024 deployments are cloud-based
- Fraud Detection: 28.4% of 2025 market revenue from fraud/risk analytics
- SME Accessibility: Fastest-growing segment at 30%+ CAGR
GQL ISO Standard Impact (ISO/IEC 39075:2024)#
Timeline and Adoption#
- April 2024: GQL standard officially published by ISO
- 2024-2025: openCypher evolving toward GQL compliance
- 2025: Neo4j Cypher 25 introduces GQL-conformant features
- 2026-2028: Expected broad vendor adoption
What This Means for Developers#
- Cypher Users: Smooth transition path as Cypher converges to GQL
- Gremlin Users: No direct GQL migration; separate language families
- GSQL Users: Likely continued proprietary path; TigerGraph may add GQL layer
- New Projects: Consider GQL-ready implementations
Standard Features#
- 600+ pages of formal definitions
- Comparable in scope to SQL-92
- Pattern matching, path finding, graph mutations
- Expected to reduce vendor lock-in over time
Query Language Standardization Landscape#
Current State (2025)#
| Language | Type | Vendors | GQL Path |
|---|---|---|---|
| Cypher | Declarative | Neo4j, Memgraph, AGE | Converging |
| Gremlin | Traversal | Neptune, Cosmos, JanusGraph | Separate family |
| GSQL | Proprietary | TigerGraph | Unknown |
| SPARQL | RDF | Various | Separate family |
| openCypher | Open Standard | Multiple | Evolving to GQL |
Convergence Timeline#
- Short-term (2025-2026): Cypher/openCypher implementations add GQL features
- Medium-term (2027-2028): Majority of property graph databases GQL-compliant
- Long-term (2029+): GQL becomes default query language for new databases
Multi-Model Database Convergence#
PostgreSQL Graph Capabilities#
- Apache AGE: Graph extension bringing Cypher to PostgreSQL
- Incubator Status: Apache Software Foundation project
- Value Proposition: Add graph queries to existing PostgreSQL investments
MongoDB Evolution#
- Current: Document-focused with limited graph features
- Trend: Focus on Atlas Search, Vector Search, AI workloads
- Graph Strategy: Not a primary focus
Market Implication#
Multi-model databases offer “good enough” graph capabilities for many use cases, potentially limiting growth of pure graph databases. However, deep graph analytics still favor specialized databases.
Cloud-Native Graph Services Growth#
Major Cloud Offerings#
| Provider | Service | Query Languages | Status |
|---|---|---|---|
| AWS | Neptune | Gremlin, openCypher | Active |
| Azure | Cosmos DB (Gremlin) | Gremlin | Stable |
| Google | Spanner Graph | SQL + Graph | GA (2024) |
| Neo4j | AuraDB | Cypher | Growing |
2024-2025 Developments#
- Google Spanner Graph: Entered market with SQL-integrated graph
- AWS Neptune + Bedrock: Graph RAG for knowledge bases
- Neo4j Aura: New analytics and GenAI features
Trend: Managed Services Dominating#
Cloud-based deployments (72%+ share) reduce infrastructure concerns but increase vendor lock-in. Library selection should consider cloud provider compatibility.
AI/ML Integration with Graph Databases#
GraphRAG Revolution (2024+)#
Microsoft’s open-source GraphRAG (July 2024) established graph-augmented retrieval as a production pattern. Key developments:
- Knowledge Graph Construction: LLMs extracting structured graphs from text
- Graph + Vector Hybrid: Combining semantic search with relationship traversal
- Agentic RAG: LLM agents using graph reasoning for multi-step workflows
Production Evidence#
- 300-320% ROI reported for knowledge graph implementations
- LinkedIn: 63% improvement in ticket resolution with graph-based systems
- Finance: 50% improvement in fraud detection rates
Library Implications#
Graph database clients increasingly need:
- Vector index support (embeddings)
- Streaming/async for real-time processing
- LLM framework integration (LangChain, LlamaIndex)
- Batch import for knowledge graph construction
Predictions for 2030#
High Confidence#
- GQL becomes dominant property graph query language
- Cloud-managed graph services capture majority of new deployments
- GraphRAG/knowledge graph use cases drive enterprise adoption
- Vector + graph hybrid architectures become standard
Medium Confidence#
- Neo4j maintains market leadership but with reduced share
- TinkerPop/Gremlin remains relevant for multi-database scenarios
- PostgreSQL AGE captures significant “casual graph” use cases
Lower Confidence#
- Complete query language standardization across vendors
- Proprietary languages (GSQL) gaining significant share
- On-premise deployments returning to favor
Graph Database Python Client Library Viability Assessment#
Executive Summary#
This assessment evaluates the long-term viability of Python client libraries for major graph databases. Libraries are rated on maintenance health, corporate backing, and sustainability for production use over a 5-year horizon.
Neo4j Python Driver#
Package: neo4j (PyPI)
Type: Official vendor driver
Current Version: 6.0.3 (November 2025)
Maintenance Status: EXCELLENT#
- Release Cadence: Monthly releases since 5.0
- Recent Activity: 6.0.x series actively developed with breaking changes for modernization
- Python Support: 3.10, 3.11, 3.12, 3.13 (dropped 3.7-3.9 in 6.0)
- Migration Tools: Official migration assistant for codebase upgrades
Corporate Backing: STRONG#
- Vendor: Neo4j Inc. (founded 2007)
- Funding: $581M total raised, $2B valuation
- Revenue: $200M+ ARR (2024), 44% market share in graph DBMS
- Customers: 75%+ Fortune 100, including BMW, NASA, UBS
- Business Model: Open-source core + AuraDB managed service
Breaking Change History: MODERATE RISK#
Recent 5.x to 6.x migration requires attention:
- Error handling redesign (DriverError vs Neo4jError separation)
- Resource management changes (explicit .close() required)
- Package rename from neo4j-driver to neo4j
- Element IDs changed from integers to strings (5.x)
Dependency Health: EXCELLENT#
- Minimal dependencies, optional Rust extensions for performance
- No security vulnerabilities detected
- Clean dependency tree
Bus Factor Risk: LOW#
- Large engineering team at Neo4j
- Multiple maintainers across driver ecosystem
- Comprehensive documentation and enterprise support
Viability Score: 9/10#
Recommendation: Primary choice for Neo4j deployments. Strong long-term investment.
Neomodel (Neo4j OGM)#
Package: neomodel (PyPI)
Type: Community OGM under Neo4j Labs
Current Version: 6.0.0 (2024)
Maintenance Status: GOOD (Improved)#
- Release Cadence: Active development resumed 2023
- 2024 Updates: Async support, mypy typing (95% coverage), vector index support
- Python Support: 3.7+ with Neo4j 5.x and 4.4 LTS
Corporate Backing: COMMUNITY + LABS#
- Moved to Neo4j Labs program (official recognition, community-driven)
- Production use by Novo Nordisk (OpenStudyBuilder)
- No dedicated corporate funding
Breaking Change History: MODERATE#
- Major version bumps may require model adjustments
- Configuration system overhaul in recent versions
Bus Factor Risk: MEDIUM#
- Small maintainer team (Marius Conjeaud primary)
- Active community but concentration of knowledge
Viability Score: 7/10#
Recommendation: Suitable for Neo4j projects needing OGM patterns. Monitor maintainer activity.
python-arango (ArangoDB)#
Package: python-arango (PyPI)
Type: Official vendor driver
Current Version: Latest 2024 release
Maintenance Status: GOOD#
- Release Cadence: Healthy release activity
- Weekly Downloads: 352,711 (popular package)
- Python Support: 3.8+
- Async Alternative: python-arango-async available
Corporate Backing: MODERATE (Changed)#
- Vendor: ArangoDB GmbH (founded 2014)
- Funding: $58.6M total raised
- Licensing Change: Moved to BSL 1.1 for version 3.12+ (Q1 2024)
- Still source-available for non-commercial use
- Cannot be used for competing managed services
- Community Edition Transition Fund available
Breaking Change History: LOW#
- Stable API evolution
- Good backward compatibility
Dependency Health: GOOD#
- No security vulnerabilities detected
- Reasonable dependency footprint
Bus Factor Risk: MEDIUM#
- Smaller company than Neo4j
- Dual headquarters (San Francisco/Cologne)
Viability Score: 7/10#
Recommendation: Viable for multi-model needs. Watch licensing implications for SaaS deployments.
pyTigerGraph#
Package: pyTigerGraph (PyPI)
Type: Official vendor SDK
Current Version: 1.8.1
Maintenance Status: ADEQUATE#
- Release Cadence: Active but less frequent
- Weekly Downloads: 5,614 (smaller user base)
- Recent Features: Async support (1.8), REST endpoint refactoring (1.7)
- Contributors: 30 open-source contributors
Corporate Backing: STRONG (Enterprise Focus)#
- Vendor: TigerGraph (founded 2012)
- Funding: $172-174M total raised
- Investors: Tiger Global, AME Cloud Ventures, Baidu
- Focus: Enterprise analytics, fraud detection, supply chain
- Customers: Uber, VISA, Alipay, Zillow
Breaking Change History: LOW-MODERATE#
- Version 1.7+ requires TigerGraph DB 4.1+ for new features
- Generally stable API
Dependency Health: GOOD#
- No security vulnerabilities detected
Bus Factor Risk: MEDIUM#
- Enterprise focus may limit open-source investment
- Proprietary GSQL creates ecosystem isolation
Viability Score: 6/10#
Recommendation: Best for enterprise-scale graph analytics with existing TigerGraph investment. Not recommended as a first graph database choice due to GSQL lock-in.
gremlinpython (Apache TinkerPop)#
Package: gremlinpython (PyPI)
Type: Apache Foundation project
Current Version: 3.8.0 (November 2025)
Maintenance Status: EXCELLENT#
- Release Cadence: Regular releases, 4.0 beta in development
- Governance: Apache Software Foundation PMC
- Python Support: Modern Python versions
Corporate Backing: FOUNDATION + MULTI-VENDOR#
- Apache Software Foundation governance since 2016
- Supported by multiple vendors (AWS, Microsoft, DataStax)
- PMC includes contributors from diverse organizations
- Active community with Discord, Twitch, YouTube presence
Breaking Change History: MODERATE (4.0 Coming)#
TinkerPop 4.0 introduces significant changes:
- Dropping WebSockets for HTTP 1.1
- Removing Bytecode in favor of gremlin-lang scripts
- Simplifying connection options
Dependency Health: GOOD#
- Standard Apache project quality
Bus Factor Risk: LOW#
- Multiple major vendors invested
- PMC governance ensures continuity
- Long-term Apache stewardship
Viability Score: 8/10#
Recommendation: Excellent choice for multi-database portability strategy. Works with JanusGraph, Neptune, Cosmos DB, DataStax. TinkerPop 4.0 migration planning needed.
Cloud Provider SDKs#
AWS Neptune (boto3 + gremlinpython)#
- Gremlin and openCypher support
- Strong backing from AWS
- Lock-in to AWS ecosystem
- Viability tied to AWS platform (effectively permanent)
Azure Cosmos DB (azure-cosmos + gremlinpython)#
- Gremlin API among multiple options
- Microsoft backing but graph capabilities seen as stagnant
- Multi-model flexibility
- Viability tied to Azure platform
Summary Viability Matrix#
| Library | Maintenance | Backing | Breaking Risk | Bus Factor | Overall |
|---|---|---|---|---|---|
| neo4j | 9 | 9 | 7 | 9 | 9/10 |
| neomodel | 7 | 6 | 7 | 5 | 7/10 |
| python-arango | 8 | 6 | 8 | 6 | 7/10 |
| pyTigerGraph | 6 | 7 | 7 | 6 | 6/10 |
| gremlinpython | 9 | 8 | 6 | 9 | 8/10 |
Key Findings#
Safest Long-term Bets#
- neo4j: Dominant market position, strong funding, active development
- gremlinpython: Apache governance, multi-vendor support, portability value
Watch List#
- python-arango: BSL licensing change may affect SaaS use cases
- pyTigerGraph: GSQL proprietary language creates lock-in risk
Emerging Considerations#
- All libraries adding async support (critical for modern Python)
- Vector/embedding support becoming table stakes
- GQL standard will reshape query language landscape
Vendor Lock-in Analysis: Graph Database Clients#
Query Language Portability Assessment#
Portability Spectrum#
Most Portable -----------------------------------> Least Portable
Gremlin -----> Cypher/GQL -----> GSQL -----> Proprietary
Gremlin (Apache TinkerPop)#
Portability Score: 9/10
- Supported Databases: JanusGraph, Neptune, Cosmos DB, DataStax, OrientDB
- Strengths: True multi-database abstraction, Apache governance
- Weaknesses: Imperative style less intuitive than Cypher
- Best For: Projects requiring database portability guarantees
Cypher / openCypher / GQL#
Portability Score: 7/10 (improving)
- Current Support: Neo4j, Memgraph, AGE (PostgreSQL), RedisGraph (EOL)
- GQL Future: Expected broad adoption 2026-2028
- Strengths: Declarative, readable, standardizing via ISO GQL
- Weaknesses: Neo4j dominance means de facto lock-in
- Best For: Projects betting on GQL standardization
GSQL (TigerGraph)#
Portability Score: 2/10
- Single Vendor: Only TigerGraph
- Strengths: Turing-complete, optimized for deep analytics
- Weaknesses: Complete vendor lock-in, no migration path
- Best For: Enterprise analytics with long-term TigerGraph commitment
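To make the portability spectrum tangible, here is the same two-hop "friends of friends" question expressed in each language, held as Python strings. The schema names (`Person`, `KNOWS`/`knows`) are illustrative assumptions, and the GSQL version in particular is a sketch of the language's SELECT-block style rather than a query tested against a live TigerGraph instance.

```python
# The same two-hop question in Cypher, Gremlin, and GSQL.
# Labels and edge types (Person, KNOWS/knows) are illustrative assumptions.

CYPHER = """
MATCH (p:Person {name: $name})-[:KNOWS]->()-[:KNOWS]->(fof)
RETURN DISTINCT fof.name
"""

GREMLIN = """
g.V().has('Person', 'name', name).
  out('knows').out('knows').dedup().values('name')
"""

GSQL = """
CREATE QUERY FriendsOfFriends(VERTEX<Person> p) FOR GRAPH Social {
  Start = {p};
  Hop1  = SELECT t FROM Start:s -(KNOWS:e)- Person:t;
  Hop2  = SELECT t FROM Hop1:s  -(KNOWS:e)- Person:t;
  PRINT Hop2;
}
"""
```

The contrast illustrates the spectrum: declarative pattern matching (Cypher), imperative traversal steps (Gremlin), and procedural accumulation blocks (GSQL) — three mental models, which is exactly what makes query translation during migration a manual effort.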
Data Model Portability#
Property Graph Model#
All major graph databases (Neo4j, TigerGraph, Neptune, JanusGraph) use property graphs, providing basic model compatibility:
- Nodes/Vertices: Labeled entities with properties
- Edges/Relationships: Typed connections with properties
- Export Formats: CSV, JSON, GraphML widely supported
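Because all major vendors share the property graph model, data (unlike queries) can be exported to neutral formats with plain code. A minimal sketch, assuming nothing beyond the standard library — the `props_json` column name and the node layout are arbitrary choices, though the flat id/label/properties shape is what CSV loaders like Neo4j's LOAD CSV expect:

```python
import csv
import io
import json

# A tiny property graph in plain Python structures.
nodes = [
    {"id": "p1", "label": "Person", "props": {"name": "Alice"}},
    {"id": "p2", "label": "Person", "props": {"name": "Bob"}},
]
edges = [
    {"src": "p1", "dst": "p2", "type": "KNOWS", "props": {"since": 2021}},
]

def nodes_to_csv(nodes):
    """Flatten nodes into a CSV string with properties packed as JSON."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "label", "props_json"])
    for n in nodes:
        writer.writerow([n["id"], n["label"], json.dumps(n["props"])])
    return buf.getvalue()
```

The easy part of migration ends here: the exported rows move between databases, but the queries and application code that consume them do not.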
Migration Complexity Matrix#
| From | To Neo4j | To Neptune | To TigerGraph | To JanusGraph |
|---|---|---|---|---|
| Neo4j | - | Medium | High | Medium |
| Neptune | Medium | - | High | Low |
| TigerGraph | High | High | - | High |
| JanusGraph | Medium | Low | High | - |
Key Factors:
- Query translation (Cypher <-> Gremlin <-> GSQL)
- Schema and constraint differences
- Indexing strategy variations
- Application code rewrite requirements
Export/Import Tooling#
Available Tools:
- Neo4j: LOAD CSV, neo4j-admin export, APOC procedures
- Memgraph: Neo4j migration module (direct connection)
- General: GraphML format interchange
- Microsoft: MigrateToGraph (relational to graph)
Limitations:
- No universal graph-to-graph migration standard
- Query translation typically manual
- Application logic must be rewritten
Abstraction Layer Options#
TinkerPop as Universal Layer#
What It Provides:
- Common Gremlin query language
- Vendor-agnostic driver interfaces
- Standard property graph model
Databases Supported:
- JanusGraph (native)
- Amazon Neptune
- Azure Cosmos DB
- DataStax Enterprise
- OrientDB
Databases NOT Supported:
- Neo4j (native Cypher only, no TinkerPop)
- TigerGraph (GSQL only)
When TinkerPop Makes Sense:
- Multi-cloud strategy requiring database flexibility
- Existing investment in Gremlin queries
- Need to switch between Neptune/Cosmos/JanusGraph
- Avoiding single-vendor dependency
When TinkerPop Doesn’t Make Sense:
- Neo4j-specific features required
- GSQL analytics capabilities needed
- Cypher/GQL standardization bet
- Simple use case not needing portability
ORM/OGM Abstraction#
Available OGMs:
- Neomodel: Neo4j only
- Object-Graph Mappers: Database-specific implementations
Limitation: No cross-database Python OGM exists. OGMs provide code abstraction but not database portability.
Lock-in Risk Mitigation Strategies#
Strategy 1: TinkerPop-First#
Choose Gremlin-compatible database; use gremlinpython exclusively.
Pros: Maximum portability, multi-vendor competition
Cons: Excludes Neo4j, forgoes Cypher benefits
Risk Level: Low lock-in, medium feature limitation
Strategy 2: GQL-Ready Cypher#
Choose Neo4j or openCypher database; prepare for GQL migration.
Pros: Best tooling (Neo4j), GQL future-proofing
Cons: Near-term Cypher lock-in, GQL timeline uncertainty
Risk Level: Medium lock-in, low feature limitation
Strategy 3: Abstraction Layer#
Build internal abstraction over database clients.
Pros: Control over interfaces, potential future migration
Cons: Development overhead, incomplete feature coverage
Risk Level: Low lock-in, high development cost
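Strategy 3 can be sketched as a thin interface that application code depends on, with one adapter per database client. Everything below is a hypothetical illustration — the `GraphClient` protocol, adapter names, and query are invented for this sketch, and the Neo4j adapter assumes the 5.x driver's `execute_query` API without exercising it:

```python
from typing import Any, Iterable, Protocol

class GraphClient(Protocol):
    """Internal interface: only the operations the application needs."""
    def run(self, query: str, **params: Any) -> Iterable[dict]: ...

class Neo4jClient:
    """Adapter over the official neo4j driver (driver injected elsewhere)."""
    def __init__(self, driver):
        self._driver = driver

    def run(self, query, **params):
        records, _, _ = self._driver.execute_query(query, **params)
        return [dict(r) for r in records]

class InMemoryClient:
    """Test double: canned responses keyed by query string."""
    def __init__(self, canned):
        self._canned = canned

    def run(self, query, **params):
        return self._canned.get(query, [])

def friend_names(client: GraphClient) -> list[str]:
    # Application code depends only on the GraphClient interface,
    # never on a concrete driver.
    rows = client.run("MATCH (:Person)-[:KNOWS]->(f) RETURN f.name AS name")
    return [r["name"] for r in rows]
```

The trade-off named above shows up immediately: the interface makes drivers swappable and testable, but the query string itself is still Cypher, so a real migration to a Gremlin backend would require translating every query the abstraction carries.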
Strategy 4: Cloud Provider Lock-in Accept#
Choose Neptune or Cosmos DB; accept cloud platform dependency.
Pros: Managed service benefits, cloud ecosystem integration
Cons: Full cloud vendor lock-in
Risk Level: High lock-in, low operational burden
Recommendations by Use Case#
Startup/MVP#
- Choice: Neo4j + Cypher
- Rationale: Best developer experience, largest community, GQL path
- Lock-in Acceptance: Medium (acceptable for velocity)
Enterprise Multi-Database#
- Choice: TinkerPop/Gremlin
- Rationale: Proven portability, vendor-neutral governance
- Lock-in Acceptance: Low (portability required)
Deep Analytics#
- Choice: TigerGraph + GSQL
- Rationale: Best performance for complex algorithms
- Lock-in Acceptance: High (feature-driven decision)
Cloud-Native#
- Choice: Neptune or Cosmos DB (matching cloud provider)
- Rationale: Operational simplicity, ecosystem integration
- Lock-in Acceptance: High (cloud strategy dependent)
Strategic Recommendations: Graph Database Client Libraries#
5-Year Horizon Summary#
For Python projects requiring graph database capabilities over the next 5 years, the strategic landscape centers on two viable paths: Neo4j/Cypher with GQL evolution, or TinkerPop/Gremlin for multi-database portability.
Primary Recommendation: Neo4j Python Driver#
Package: neo4j
When to Choose: Default choice for most new graph database projects
Rationale#
- Market Leadership: 44% market share, $200M+ ARR, Fortune 100 adoption
- Funding Stability: $581M raised, $2B valuation, path to IPO
- Active Development: Monthly releases, Python 3.13 support, Rust extensions
- GQL Alignment: Cypher converging to ISO GQL standard (smooth transition)
- AI/ML Integration: Best GraphRAG tooling, LangChain integration, vector support
- Community: Largest graph database community, extensive documentation
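For a sense of the developer experience behind this recommendation, a minimal sketch using the driver's `execute_query` API (Neo4j 5.x). The URI, credentials, and schema are placeholders, the import is deferred so the sketch reads without a running instance, and this is an illustration rather than the definitive usage pattern:

```python
# Cypher: all nodes exactly two KNOWS hops from the named person.
CYPHER_FOF = """
MATCH (p:Person {name: $name})-[:KNOWS*2]->(fof)
RETURN DISTINCT fof.name AS name
"""

def friends_of_friends(uri, user, password, name):
    """Run the two-hop query against a Neo4j 5.x instance.

    `uri`/`user`/`password` are placeholders; assumes a reachable server.
    """
    from neo4j import GraphDatabase  # deferred: no server needed to read this

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        records, _, _ = driver.execute_query(CYPHER_FOF, name=name)
        return [r["name"] for r in records]
```

The declarative pattern-match style here is the "developer experience" advantage the rationale refers to: the query states the shape of the answer rather than the steps to traverse.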
Risk Factors to Monitor#
- Breaking changes in major versions (5.x to 6.x pattern)
- Managed service (AuraDB) pricing evolution
- GQL standardization timeline slippage
Secondary Recommendation: gremlinpython (TinkerPop)#
Package: gremlinpython
When to Choose: Multi-database portability required
Rationale#
- True Portability: Works across Neptune, Cosmos DB, JanusGraph, DataStax
- Apache Governance: Foundation backing, multi-vendor PMC
- Cloud Flexibility: Switch between AWS/Azure/on-premise
- Long-term Stability: Apache projects rarely abandoned
Risk Factors to Monitor#
- TinkerPop 4.0 migration (significant API changes)
- No GQL convergence (separate from Cypher/GQL ecosystem)
- Learning curve for imperative traversal patterns
Conditional Recommendations#
For OGM Requirements: neomodel#
Package: neomodel
Condition: Need Python OGM patterns with Neo4j
- Active maintenance under Neo4j Labs
- Async support added 2024
- Production use by major enterprises
- Monitor maintainer activity (smaller team)
For Multi-Model Needs: python-arango#
Package: python-arango
Condition: Document + graph + key-value in single database
- BSL 1.1 licensing change (2024) limits SaaS use
- Viable for internal applications
- Async variant available
For Enterprise Analytics: pyTigerGraph#
Package: pyTigerGraph
Condition: Deep graph algorithms, existing TigerGraph investment
- Strong enterprise backing
- GSQL lock-in is significant risk
- Not recommended for new projects without specific requirements
Library Avoidance List#
- Deprecated packages: neo4j-driver (use neo4j instead)
- Abandoned projects: py2neo (deleted), unmaintained forks
- Proprietary-only SDKs: Unless committed to that vendor long-term
Strategic Decision Framework#
Choose Neo4j (neo4j) When:#
- Starting a new graph database project
- Developer experience is a priority
- GraphRAG or knowledge graph use case
- Willing to bet on GQL standardization
- Single-database architecture acceptable
Choose TinkerPop (gremlinpython) When:#
- Multi-cloud or multi-database strategy required
- Using Neptune, Cosmos DB, or JanusGraph
- Vendor-neutral governance is important
- Portability outweighs developer convenience
Choose Cloud-Specific When:#
- Already committed to AWS (Neptune) or Azure (Cosmos)
- Managed services preferred over self-hosted
- Cloud ecosystem integration is primary concern
5-Year Outlook Summary#
| Library | 2025 Status | 2030 Projection |
|---|---|---|
| neo4j | Strong | Dominant (GQL leader) |
| gremlinpython | Strong | Stable (portability) |
| neomodel | Good | Dependent on community |
| python-arango | Good | Viable (watch license) |
| pyTigerGraph | Adequate | Niche (enterprise) |
Highest Confidence Bet: Neo4j Python driver with Cypher/GQL path
Best Hedge Strategy: TinkerPop for projects needing future flexibility