1.186 Time-Series Databases#
Survey of time-series databases and extensions: TimescaleDB (PostgreSQL extension), InfluxDB, Prometheus, QuestDB, VictoriaMetrics, and TDengine. Covers storage engines, query languages, ingestion patterns, and use case fit for metrics, IoT, and financial data.
Explainer
Why Time-Series Databases Exist: A Domain Explainer#
The Problem With a Regular Database#
Imagine you are running a warehouse. You keep a ledger. Each morning you update the ledger to reflect the current state of inventory: 500 units of product A, 230 units of product B, and so on. Each row in the ledger represents the current truth about one product. When inventory changes, you update the existing row. The ledger has one row per product, and it always reflects the present.
Now imagine a different kind of record-keeping. Instead of tracking inventory, you are recording temperature readings from sensors throughout the warehouse: one sensor per shelf, measuring every 10 seconds. Each reading is a new fact — not an update to an existing fact. At the end of a day you have 8,640 readings per sensor. After a year, assuming 1,000 sensors, you have over 3 billion rows. You never update an old reading; you only ever append new ones. And when you query the data, you almost always ask “what happened between 2 PM and 4 PM on Tuesday?” — a time range — rather than “show me the current state of sensor 47.”
These two workloads are fundamentally different. A general-purpose relational database was designed for the first workload: a moderate number of rows, each representing current state, frequently updated. A time-series database was designed for the second: an ever-growing stream of immutable, time-stamped facts that are queried by time range.
The mismatch between the two patterns is what created the time-series database as a specialized category.
What Makes Time-Series Data Different#
It Only Ever Grows#
In a regular operational database, rows are created, updated, and deleted. An order record gets a status update when it ships. A user record changes when someone edits their profile. The total number of rows stays roughly proportional to the number of live entities in your system.
In a time-series dataset, historical data is immutable. A temperature reading from last Tuesday is never corrected. A server CPU measurement from three weeks ago will not change. You can delete old data (via retention policies), but you never update it. And because new readings arrive continuously, the table grows without bound for as long as the system operates.
This changes everything about how data should be stored. A general-purpose database uses storage formats and index structures optimized for random updates — changing any arbitrary row at any time. A time-series database can assume that writes are almost always appended to the end of the dataset (the latest timestamp), which enables dramatically simpler and faster storage structures.
Writes Are Relentless#
A busy e-commerce site might process hundreds of orders per hour. That is manageable for a general-purpose database — it is designed for that kind of transactional workload.
A time-series workload at the same company might collect tens of thousands of metrics per second: CPU and memory from every server, request counts from every API endpoint, error rates from every service, queue depths from every message broker, and sensor readings from every network switch. If the company runs 1,000 servers and collects 100 metrics per server at 10-second intervals, that is 10,000 metric data points per second arriving at the storage layer — every second, indefinitely.
Purpose-built time-series storage engines are designed around a simple optimization: because new data always has the newest timestamp, writes are always sequential. Sequential writes to disk are an order of magnitude faster than random writes. A general-purpose database, designed to update arbitrary rows, cannot make this assumption and pays a performance penalty accordingly.
Queries Are Almost Always About Time Ranges#
Consider what questions people ask of time-series data:
- “What was the average CPU utilization of our web servers between 2 AM and 6 AM this morning?”
- “Show me the temperature readings from sensor 47 for the past 24 hours.”
- “What is the 99th percentile API response time for the past week?”
- “How has our error rate trended over the past month?”
Every single one of these questions has a time range at its center. Nobody asks “show me all the records where the value was exactly 73.4” in a time-series context. Time is the primary axis.
This means a time-series database can organize its on-disk storage by time and dramatically accelerate the most common queries. When you ask for data from last Tuesday, the database knows exactly where on disk Tuesday’s data lives — it never scans records from any other day. A general-purpose database with a timestamp column has to use an index to find the right records, which is slower than direct range access.
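As a rough illustration of time-based partition pruning (a toy model, not any particular engine's implementation), here is a sketch where data is bucketed by calendar day and a range query touches only the buckets inside the requested window:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Toy time-partitioned store: one bucket per calendar day.
# A range query inspects only the buckets inside the range,
# never the rest of the dataset.
partitions = defaultdict(list)

def write(ts: datetime, value: float) -> None:
    partitions[ts.date()].append((ts, value))

def query_range(start: datetime, end: datetime):
    day = start.date()
    out = []
    while day <= end.date():
        out.extend((ts, v) for ts, v in partitions[day] if start <= ts <= end)
        day += timedelta(days=1)
    return out

write(datetime(2024, 1, 1, 12, 0), 21.5)
write(datetime(2024, 1, 2, 12, 0), 22.0)
write(datetime(2024, 1, 3, 12, 0), 22.4)

# Only the Jan 2 partition is inspected for this query.
result = query_range(datetime(2024, 1, 2), datetime(2024, 1, 2, 23, 59))
```

A real engine does the same thing with on-disk chunks and min/max timestamp metadata instead of an in-memory dict, but the pruning logic is the same shape.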
Retention: Old Data Should Expire#
In an operational database, old records typically stay forever because they represent business facts. A customer’s order history from 2018 is still a valid business record.
In a time-series context, high-resolution historical data often loses value quickly. You need CPU readings at 10-second resolution to debug a production incident that happened this morning. You probably do not need 10-second resolution data from three years ago. You might want the hourly averages from three years ago (useful for capacity planning), but the raw per-second readings are just wasting storage.
Time-series databases make retention management a first-class feature. You define policies like “keep raw data for 30 days, then automatically delete.” Or “keep raw data for 7 days, keep 1-minute averages for 90 days, keep hourly averages forever.” The database enforces these policies automatically in the background, without you having to write and schedule DELETE queries.
Enforcing retention via DELETE on a general-purpose database is painful at scale. Deleting millions of rows from a large table can take hours, cause massive I/O load, and leave fragmented storage that must be vacuumed. Time-series databases solve this by organizing data into time-bounded segments — dropping an old segment is a single filesystem operation, not millions of row deletions.
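The difference between row-level deletion and segment dropping can be sketched in a few lines (a toy model with one segment per day; real systems drop chunk files or partitions the same way):

```python
from datetime import date, timedelta

# Toy segment store: data grouped into one segment per day.
# Enforcing retention deletes whole segments, not individual rows.
segments = {
    date(2024, 1, 1): [("10:00", 20.1), ("10:10", 20.3)],
    date(2024, 1, 2): [("10:00", 21.0)],
    date(2024, 1, 3): [("10:00", 19.8)],
}

def enforce_retention(today: date, keep_days: int) -> None:
    cutoff = today - timedelta(days=keep_days)
    # One deletion per expired segment, analogous to dropping a chunk
    # file, rather than scanning and deleting millions of rows.
    for day in [d for d in segments if d < cutoff]:
        del segments[day]

enforce_retention(today=date(2024, 1, 4), keep_days=2)
# Only the Jan 2 and Jan 3 segments remain.
```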
Downsampling: Trading Resolution for History#
Related to retention is downsampling. High-resolution data (readings every second) is expensive to store and slow to query over long time periods. Low-resolution data (hourly averages) is cheap and fast, but loses the detail needed for recent analysis.
The solution is to automatically produce multiple resolutions of the same data and store all of them simultaneously. You keep the last 7 days at 10-second resolution, the last 90 days at 1-minute resolution, and the last 5 years at 1-hour resolution. Queries automatically route to the appropriate resolution based on the time range requested.
This is called downsampling or rollup, and it is standard practice in time-series systems. The original Graphite system (from 2006) had round-robin databases that handled this automatically. Modern systems like TimescaleDB call them continuous aggregates; InfluxDB calls them downsampling tasks; Prometheus calls them recording rules. The concept is the same: compute summaries of fine-grained data and store them for efficient long-range queries.
General-purpose databases can do this with materialized views, but the management is manual and the integration with retention policies is disconnected.
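The rollup computation itself is straightforward. A minimal sketch (averaging 10-second readings into 1-minute buckets; the bucket key is the timestamp truncated to the minute):

```python
from datetime import datetime

# Roll up raw readings into 1-minute averages -- the core of downsampling.
def downsample_1m(raw):
    buckets = {}
    for ts, value in raw:
        minute = ts.replace(second=0, microsecond=0)  # truncate to bucket
        buckets.setdefault(minute, []).append(value)
    return {minute: sum(vs) / len(vs) for minute, vs in buckets.items()}

raw = [
    (datetime(2024, 1, 1, 0, 0, 0), 10.0),
    (datetime(2024, 1, 1, 0, 0, 10), 12.0),
    (datetime(2024, 1, 1, 0, 0, 20), 14.0),
    (datetime(2024, 1, 1, 0, 1, 0), 20.0),
]

rollup = downsample_1m(raw)
# Minute 00:00 averages to 12.0; minute 00:01 holds the single value 20.0.
```

Continuous aggregates, downsampling tasks, and recording rules all run a variant of this computation incrementally in the background.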
The Logbook vs. the Spreadsheet#
A useful mental model: a general-purpose database is like a spreadsheet. Each row represents a thing in the world. You can update any row at any time. You look things up by what they are.
A time-series database is like a logbook. Each entry records what happened at a specific moment. Entries are never changed. You read back through the log to understand history. You look things up by when they happened.
A spreadsheet is great for tracking the current state of a project. A logbook is great for recording every event that happened during an experiment. They serve different purposes, and you would not use one where the other belongs.
When You Do NOT Need a Time-Series Database#
Not every dataset with timestamps needs a specialized TSDB. A general-purpose database with a timestamp column is often enough. Indicators that you do not need a TSDB:
- Low write rate: if you are storing a few hundred records per hour, a regular database handles it easily. The write-rate problem only becomes relevant at thousands of records per second.
- Irregular timestamps: if your data is occasional events rather than continuous measurements at a consistent cadence, a regular database may be a better fit. Much of a TSDB's storage and compression advantage assumes a steady cadence.
- Updates are needed: if your “time-series” data actually gets updated (corrected historical records, late-arriving corrections that overwrite earlier values), the immutable append model of TSDBs becomes a friction point.
- Complex relational queries dominate: if your most important queries join time-stamped events with multiple other tables (customer records, product catalogs, geographic data), a general-purpose database with a good timestamp index may serve you better than a TSDB, which typically has limited or no join capabilities.
- Small total data volume: if your entire time-series dataset fits comfortably in memory (under a few gigabytes), the performance benefits of a specialized storage engine are negligible.
The rough threshold: if you are collecting more than a few thousand data points per second, planning to store months or years of data, and your primary queries are always time-range scans, you should evaluate a TSDB. Below that threshold, a well-indexed timestamp column in PostgreSQL is likely sufficient.
The Cardinality Problem#
There is one concept that trips up many developers new to time-series databases: cardinality.
In a TSDB, every unique combination of metadata labels (tags) associated with a time-series creates a separate “series.” If you track CPU utilization with three labels — host, region, and datacenter — and you have 1,000 hosts in 10 regions in 5 datacenters, you have 50,000 possible combinations (assuming they are all occupied). That is the cardinality of your dataset: 50,000 series.
High cardinality becomes a problem when labels take on many unique values. If you tag a series with a request ID (unique per HTTP request) or a user ID (one per user), the number of series explodes to billions. Most TSDB index structures are designed to hold the index in memory for fast lookups — billions of series exceed available RAM.
This is why you should never use high-cardinality values (UUIDs, request IDs, session tokens, user IDs) as labels or tags in a TSDB. Those values belong in the event payload, not as index keys. This is a fundamental design constraint that trips up teams migrating from relational databases, where every column can be indexed without concern for this specific kind of cardinality explosion.
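The arithmetic behind the example above can be made concrete (the label counts are the hypothetical ones from this section):

```python
from itertools import product

# Cardinality = number of distinct label combinations ("series").
# Bounded labels keep the count manageable.
hosts = [f"host-{i}" for i in range(1000)]
regions = [f"region-{i}" for i in range(10)]
datacenters = [f"dc-{i}" for i in range(5)]

series = set(product(hosts, regions, datacenters))
bounded = len(series)  # 50,000 if every combination is occupied

# An unbounded label (e.g. a per-request UUID) creates one new series
# per event, so the index grows without limit:
requests_per_day = 10_000_000
new_series_per_day = requests_per_day
```

In practice a host usually belongs to exactly one region and datacenter, so the occupied series count is closer to the host count; the product is an upper bound.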
Modern TSDBs (InfluxDB v3, QuestDB, VictoriaMetrics) have largely addressed this constraint through alternative index structures. But understanding the historical cardinality problem is important for reading existing documentation, understanding legacy deployments, and making sense of design decisions in older InfluxDB v1 and Prometheus deployments.
Summary#
Time-series databases exist because the shape of time-series data — continuous, high-velocity, immutable, time-range-queried, retention-managed — is a poor fit for the general-purpose storage and indexing strategies of relational and document databases.
The core benefits a TSDB provides over a general-purpose database with a timestamp column are:
- Sequential write throughput that scales to millions of points per second
- Time-range queries that scan only the relevant time partition, not the entire table
- Built-in retention policies that drop old data instantly without expensive DELETE scans
- Built-in downsampling that automatically maintains multiple resolutions of historical data
- Storage formats optimized for time-ordered, append-heavy workloads — typically achieving 10–30x compression over raw storage
If your data fits the time-series pattern, a purpose-built TSDB is not premature optimization — it is the right tool from the start.
S1: Rapid Discovery
S1: Rapid Discovery — Time-Series Databases#
What This Space Is#
Time-series databases (TSDBs) are storage engines designed around a simple but demanding constraint: every record has a timestamp, data arrives continuously at high velocity, and reads are almost always range scans across time. That constraint makes a general-purpose relational or document database the wrong tool at scale — and makes TSDBs worth knowing well.
The space divides into three rough camps. First, purpose-built TSDBs (InfluxDB, QuestDB, TDengine) that were designed from scratch for sequential time-indexed writes. Second, TSDB extensions on existing databases (TimescaleDB on PostgreSQL). Third, metrics-specific systems that combine ingestion with alerting and federation (Prometheus, VictoriaMetrics). Graphite and its Whisper file format predate all of them and remain widely deployed in legacy infrastructure.
The Landscape at a Glance#
TimescaleDB (~18k GitHub stars, Apache 2 / Timescale License). A PostgreSQL extension that adds automatic time partitioning through a concept called hypertables. If your team knows SQL and already operates PostgreSQL, TimescaleDB is the lowest-friction path to time-series at scale. You get continuous aggregates (materialized rollups that stay fresh), compression, retention policies, and the entire PostgreSQL ecosystem (PostGIS, logical replication, pg_dump, every ORM ever written). The trade-off: it is still PostgreSQL under the hood, so extreme write throughput (millions of points per second) eventually hits limits that purpose-built engines avoid.
InfluxDB (~28k GitHub stars, MIT for OSS editions, commercial for Cloud/Enterprise). The most widely recognized purpose-built TSDB. Version 1.x used InfluxQL (SQL-like) and the TSM storage engine. Version 2.x introduced Flux, a functional scripting language for queries and transformations. Version 3.0 (2023-2024) is a ground-up rewrite using Apache Arrow and Parquet as the storage layer, with SQL as the primary query interface. The v3 rewrite resolves long-standing cardinality limitations and aligns InfluxDB with the broader Arrow/Parquet ecosystem. Community consensus: v1/v2 are mature and battle-tested; v3 is promising but still gaining adoption. Strong fit for IoT telemetry and application instrumentation.
Prometheus (~56k GitHub stars, Apache 2, CNCF Graduated). The de facto standard for infrastructure and application metrics in cloud-native environments. Prometheus works on a pull model — it scrapes HTTP endpoints at a configured interval, stores data in its own on-disk TSDB format (chunks + inverted index), and exposes PromQL for querying. It is not designed for long-term retention (default 15 days) or for general time-series use cases. It is designed to be the alerting backbone of a Kubernetes cluster. Community consensus: if you run Kubernetes, you run Prometheus. It is effectively mandatory infrastructure. The ecosystem around it — Grafana, Alertmanager, dozens of exporters — is enormous.
QuestDB (~14k GitHub stars, Apache 2). A newer entrant that has attracted significant attention for raw ingestion throughput benchmarks. QuestDB uses a columnar append-only storage format and SIMD-accelerated query execution. The query language is SQL with time-series extensions (SAMPLE BY, LATEST ON). Community consensus: genuinely fast, clean SQL interface, good fit for financial tick data or high-frequency sensor data. Smaller community than InfluxDB or Prometheus, but growing. Backed by venture capital.
VictoriaMetrics (~13k GitHub stars, Apache 2 for single-node, commercial for cluster). Positions itself as a drop-in Prometheus replacement with substantially better compression and higher ingestion throughput than the Prometheus TSDB. Exposes MetricsQL (a superset of PromQL) and Prometheus remote write/read endpoints, so migration is typically configuration-only. Available as a single binary (unusually operationally simple) or a cluster deployment. Community consensus: operationally excellent, dramatically outperforms Prometheus at scale, widely adopted in organizations that outgrew Prometheus but don’t want to switch ecosystems.
TDengine (~23k GitHub stars, AGPL 3.0 / commercial). An open-source TSDB with built-in stream processing, message queuing, and caching, designed with IoT deployments in mind. Unique “supertable” concept separates the schema (columns) from device identity (tags). Strong adoption in China; growing international presence. Community consensus: good for IoT pipelines where you want the entire data pipeline in one product; less adoption in Western cloud-native stacks compared to InfluxDB.
Graphite + Whisper (Graphite ~6k stars, very mature). The original open-source metrics system, created at Orbitz in 2006. Graphite uses Whisper, a fixed-size round-robin database format inspired by RRDtool. Whisper pre-allocates disk space for a fixed retention period and resolution, which means old data is automatically overwritten — no explicit retention policy needed. The query language is a functional pipeline of transformation functions applied to metric path globs. Community consensus: legacy but ubiquitous. Many organizations still run Graphite. New projects almost never choose it; migration away from it is common. Included here because developers will encounter it in existing systems.
Quick Answer#
TimescaleDB if your team knows PostgreSQL and wants SQL, foreign keys, joins with relational data, and PostGIS. The right choice when time-series is one workload among several and you want a single database tier.
Prometheus for infrastructure and application metrics monitoring in cloud-native / Kubernetes environments. Not a general TSDB; purpose-built for the scrape-alert-graph pattern.
InfluxDB for IoT telemetry, application instrumentation, or any use case where you want a purpose-built TSDB with a polished developer experience and hosted cloud option. v3 is the direction for new projects.
QuestDB for extremely high-throughput ingestion (financial tick data, high-frequency sensor streams) where SQL familiarity matters and you want raw performance.
VictoriaMetrics as a Prometheus-compatible backend that scales further and compresses better. The upgrade path when Prometheus alone is not enough.
TDengine for IoT pipelines where stream processing and time-series storage should come from the same product, particularly in environments comfortable with AGPL licensing.
Graphite only when maintaining existing infrastructure. Do not start new projects on Graphite.
Community Consensus and Trade-offs#
The most debated question in this space is whether to use an extension (TimescaleDB) or a purpose-built engine (InfluxDB, QuestDB). Extensions win on ecosystem compatibility and operational simplicity for teams already running PostgreSQL. Purpose-built engines win on raw throughput at the highest scale and on feature sets purpose-designed for time-series (e.g., line protocol ingestion, native downsampling).
The second major debate is around cardinality. High cardinality (many unique tag combinations) is a known scaling challenge in InfluxDB v1/v2 and in Prometheus. VictoriaMetrics and InfluxDB v3 have specifically addressed this. QuestDB uses a columnar design that handles cardinality differently. Understanding your cardinality profile before choosing a TSDB is essential.
The Prometheus ecosystem is not really a “database choice” — it is a monitoring stack choice. If you adopt Prometheus, you adopt the entire pull-model, PromQL, Alertmanager, Grafana ecosystem. It is a coherent whole, not just a storage layer.
Flux (InfluxDB v2’s scripting language) was controversial. It is powerful but has a steep learning curve compared to InfluxQL or SQL. InfluxDB v3’s shift back to SQL as the primary interface is widely seen as the right correction.
TimescaleDB’s compression (2.x+) is genuinely impressive — columnar compression within PostgreSQL achieves ratios competitive with purpose-built TSDBs. Continuous aggregates (materialized time-bucketed rollups that refresh automatically) are a significant productivity feature that no other TSDB matches in terms of integration depth.
S2: Comprehensive
S2: Comprehensive Discovery — Time-Series Databases#
TimescaleDB#
Architecture: Hypertables and Chunk-Based Partitioning#
TimescaleDB’s core abstraction is the hypertable — a PostgreSQL table that the extension automatically partitions into time-ordered chunks. Each chunk is a standard PostgreSQL table internally, which means all PostgreSQL tooling (EXPLAIN, pg_dump, logical replication, indexes, triggers) works without modification. The extension intercepts DML at the planner level to route writes and reads to the correct chunks.
Time partitioning is mandatory. By default, TimescaleDB creates a new chunk for each time interval you configure (e.g., 7 days per chunk). As data ages, older chunks can be compressed or moved to a different tablespace (tiered storage).
Space partitioning is optional. You can add a second partition dimension — typically a device ID or sensor ID — which hash-partitions each time interval into sub-chunks. Space partitioning helps when you have many parallel writers, each writing to a different device, because it eliminates write contention at the chunk level by ensuring each writer lands in its own chunk.
Chunk exclusion at query time is automatic. When you run a query with a WHERE clause on the time column, the planner excludes all chunks outside the requested range without scanning them. This is the fundamental performance benefit: large tables do not degrade query performance proportionally to total data volume, only to the volume within the queried time range.
Continuous Aggregates#
Continuous aggregates are TimescaleDB’s answer to pre-computation. They are materialized views backed by a refresh policy that tracks which time buckets have received new data and selectively re-materializes only those buckets. You define them with time_bucket() — a function that rounds timestamps to a fixed interval.
A 1-minute sensor reading dataset can have a continuous aggregate at 1-hour granularity and another at 1-day granularity. Queries against those aggregates are instant (they read pre-computed rows) rather than scanning millions of raw rows. Continuous aggregates can be layered — a daily aggregate can be built on top of an hourly aggregate rather than re-scanning raw data.
Real-time aggregation mode merges the materialized data with un-materialized recent data, so queries always return up-to-date results without waiting for a refresh cycle to complete.
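The real-time merge can be sketched as follows (a toy model, not TimescaleDB's internals: pre-computed hourly buckets serve history, and the not-yet-materialized tail is aggregated on the fly):

```python
from datetime import datetime

# Round a timestamp down to its hourly bucket, like time_bucket('1 hour', ts).
def time_bucket(ts: datetime) -> datetime:
    return ts.replace(minute=0, second=0, microsecond=0)

# Buckets already materialized by the refresh policy.
materialized = {datetime(2024, 1, 1, 0): 12.5}

# Raw rows newer than the last refresh.
raw_tail = [
    (datetime(2024, 1, 1, 1, 5), 20.0),
    (datetime(2024, 1, 1, 1, 25), 22.0),
]

def query_hourly():
    result = dict(materialized)          # pre-computed history
    fresh = {}
    for ts, v in raw_tail:               # aggregate the unmaterialized tail
        fresh.setdefault(time_bucket(ts), []).append(v)
    for bucket, vs in fresh.items():
        result[bucket] = sum(vs) / len(vs)
    return result

hourly = query_hourly()
# Historical bucket comes from the materialization; the 01:00 bucket
# is computed from raw rows at query time.
```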
Compression#
TimescaleDB’s columnar compression (2.x+) operates at the chunk level. When a chunk falls outside a configured compression policy window (e.g., compress chunks older than 7 days), the extension re-encodes the chunk in columnar format and applies delta-of-delta encoding for timestamps and integers, Gorilla-style XOR encoding for floats, dictionary encoding for repetitive values, and run-length encoding for booleans and low-variation columns.
Compression ratios of 90–95% are common on dense sensor data (10–20x reduction). Compressed chunks are still queryable via the same SQL interface; decompression happens transparently in the executor. Writes into compressed chunks require decompressing the affected data first, which is why compression is typically applied to read-mostly historical data.
Retention Policies#
A retention policy drops chunks older than a threshold. Because each chunk is a separate PostgreSQL table, the drop operation is a DDL statement (DROP TABLE on the chunk), not a row-level DELETE scan. This makes retention policy execution nearly instantaneous even on tables with billions of rows.
InfluxDB Line Protocol Compatibility#
TimescaleDB can accept InfluxDB line protocol through third-party bridges (for example, Telegraf’s PostgreSQL output plugin), which lets existing line-protocol writers send data to TimescaleDB without modifying application code.
InfluxDB#
TSM Storage Engine (v1/v2)#
The Time-Structured Merge Tree (TSM) is InfluxDB’s on-disk storage format, a variant of the LSM tree concept adapted for time-series. Incoming writes go to an in-memory cache (WAL-backed for durability). Periodically, the cache is flushed to immutable TSM files on disk. Background compaction merges multiple TSM files into larger ones, improving read performance.
TSM files contain time-ordered blocks of values, organized by series key (the combination of measurement name + tag set). Compression is applied per-column within each block: timestamps use delta encoding, floats use XOR (Gorilla), integers use run-length encoding, strings use Snappy compression.
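The intuition behind Gorilla-style XOR float compression is easy to demonstrate (a sketch of the idea only, not the full bit-packing scheme): when consecutive values change slowly, the XOR of their bit patterns is mostly zero bits.

```python
import struct

# Reinterpret a float64 as its 64-bit integer bit pattern.
def float_bits(x: float) -> int:
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_stream(values):
    prev = float_bits(values[0])
    out = [prev]                    # first value stored verbatim
    for v in values[1:]:
        bits = float_bits(v)
        out.append(bits ^ prev)     # small XOR -> few significant bits to store
        prev = bits
    return out

encoded = xor_stream([22.0, 22.0, 22.1])
# An unchanged value XORs to exactly 0, which encodes in a single bit.
```

The real Gorilla format then stores only the leading-zero count and the meaningful XOR bits, which is where the compression comes from.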
Cardinality constraint: In v1/v2, each unique combination of tag values (tag set) creates a new series. The series count is bounded by available memory — the inverted index (mapping tag values to series) is held in memory. High-cardinality tag sets (e.g., a unique UUID per request) can exhaust memory. This is InfluxDB v1/v2’s most significant architectural limitation.
Line Protocol Wire Format#
InfluxDB’s line protocol is a compact text format:
`measurement,tag1=val1,tag2=val2 field1=1.0,field2=2i timestamp`
The protocol is intentionally simple: measurement name, comma-separated tags (indexed), space, space-separated fields (values), optional nanosecond Unix timestamp. It is widely implemented — Telegraf, Vector, Fluent Bit, and many hardware SDKs write line protocol natively. This has made line protocol a de facto standard for TSDB ingestion: QuestDB accepts it natively, and TimescaleDB can accept it through bridges.
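A minimal builder for this shape (illustrative only; it omits the escaping of commas, spaces, and quotes that a real client library must perform):

```python
# Build one line of InfluxDB line protocol from a measurement, tag dict,
# and field dict. Integer fields get the 'i' suffix per the protocol.
def line(measurement, tags, fields, ts_ns=None):
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    def fmt(v):
        return f"{v}i" if isinstance(v, int) else repr(float(v))
    field_part = ",".join(f"{k}={fmt(v)}" for k, v in sorted(fields.items()))
    suffix = f" {ts_ns}" if ts_ns is not None else ""
    return f"{measurement}{tag_part} {field_part}{suffix}"

example = line("cpu", {"host": "web01"}, {"cores": 8, "usage": 87.5},
               ts_ns=1700000000000000000)
# -> cpu,host=web01 cores=8i,usage=87.5 1700000000000000000
```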
InfluxQL and Flux#
InfluxQL (v1): SQL-like dialect. SELECT mean("value") FROM "cpu" WHERE time > now() - 1h GROUP BY time(5m), "host". Familiar, limited — no arbitrary joins, no user-defined functions, no streaming transformations.
Flux (v2): A functional data scripting language. Data flows through a pipeline of functions: from() → range() → filter() → aggregateWindow() → yield(). Powerful for transformations, joins across measurements, and scripted alerting. Steep learning curve; controversial in the community. Being deprecated in v3.
SQL (v3): InfluxDB 3.0 returns to SQL as the primary query interface, built on Apache Arrow DataFusion.
InfluxDB v3 Architecture#
Version 3 is a complete storage engine rewrite. The primary changes:
- Storage layer uses Apache Parquet files on object storage (S3/GCS/Azure Blob). This eliminates local disk as a bottleneck and enables infinite retention at object storage pricing.
- Query engine is Apache Arrow DataFusion — a vectorized query engine that operates on Arrow columnar memory format. This enables SIMD-accelerated query execution and interoperability with the Arrow ecosystem (DuckDB, Polars, pandas).
- Cardinality constraints are largely eliminated because the inverted index design is replaced by Parquet file statistics and partition pruning.
- SQL is the primary query interface. InfluxQL is supported for backward compatibility.
- The write path uses the IOx write buffer (IOx was the development codename for v3), which persists incoming data to object storage via a WAL-like mechanism.
Prometheus#
Pull Model#
Prometheus operates on a pull model: rather than receiving pushed metrics, it scrapes HTTP endpoints that expose metrics in the Prometheus text format (or OpenMetrics). Each scrape target exposes a /metrics endpoint returning current metric values. Prometheus polls these endpoints at a configured interval (typically 15–60 seconds).
This architecture has significant operational implications. Service discovery (Kubernetes, Consul, EC2 auto-discovery) drives the scrape target list dynamically. The scraping server (Prometheus itself) is the single point that must reach all targets — in large deployments this creates network topology requirements. The pull model also means metric data has a resolution floor set by the scrape interval.
Pushgateway exists as an escape hatch for batch jobs and short-lived processes that cannot be scraped. Metrics are pushed to the Pushgateway, which Prometheus then scrapes. This is intentionally second-class — for persistent services, pull is preferred.
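What a scrape actually retrieves is plain text in the Prometheus exposition format. A sketch of rendering that payload (metric names and values here are illustrative, not from any real exporter):

```python
# Render metrics in the Prometheus text exposition format -- the payload a
# /metrics endpoint returns on each scrape.
def render_metrics(metrics):
    lines = []
    for name, (mtype, help_text, samples) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        for labels, value in samples:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}" if label_str
                         else f"{name} {value}")
    return "\n".join(lines) + "\n"

payload = render_metrics({
    "http_requests_total": ("counter", "Total HTTP requests.",
                            [({"method": "get"}, 1027), ({"method": "post"}, 3)]),
})
```

In a real exporter this string is served over HTTP; Prometheus parses each sample line into metric name + label set and appends it to the corresponding series.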
On-Disk TSDB Format#
Prometheus stores data in a custom on-disk TSDB format. The storage is organized into blocks, each covering a two-hour time window by default. Each block contains:
- Chunks: compressed series data. Each series is stored as a sequence of Gorilla-compressed chunks of up to 120 samples each (approximately 30 minutes at a 15-second scrape interval).
- Index: an inverted index mapping label names and values to series. This index is how Prometheus evaluates label selectors efficiently.
- Tombstones: records of delete operations (Prometheus supports deletes via the admin API; deletions are applied lazily).
- Meta: block metadata (min/max time, statistics).
Background compaction merges two-hour blocks into larger blocks (6h, 24h, then multi-day) to improve query efficiency for longer time ranges and to apply better compression.
PromQL#
Prometheus Query Language evaluates expressions over time-series identified by metric name and label selectors. The data model is: metric name + label set → float64 time-series. There are no strings as values; strings are labels.
Key concepts:
- Instant vectors: one sample per series at a point in time.
- Range vectors: a window of samples per series over a duration. http_requests_total[5m] returns the last 5 minutes of samples.
- Functions: rate(), irate(), increase(), histogram_quantile(), predict_linear().
- Aggregation operators: sum(), avg(), topk(), bottomk(), grouped by label sets.
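The semantics of rate() can be approximated in a few lines (a simplification: real PromQL also extrapolates to the window boundaries, which this sketch omits):

```python
# Per-second average increase of a counter over a window, treating any
# decrease as a counter reset (the process restarted and counted from 0).
def rate(samples):
    # samples: list of (unix_seconds, counter_value), time-ascending
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

samples = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 10.0)]  # reset at t=45
# increases: 30 + 30 + 10 = 70 over 45 seconds
```

Reset handling is why counters should only ever go up: a drop is interpreted as a restart, not a negative increment.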
Remote Write/Read#
Prometheus supports remote write — forwarding all ingested samples to an external system via a protobuf-encoded HTTP POST. This is how long-term storage backends integrate: Thanos, Cortex, Mimir, VictoriaMetrics, and InfluxDB all accept Prometheus remote write. Remote read allows Prometheus to query external systems as if they were local storage.
Recording Rules and Federation#
Recording rules pre-compute expensive PromQL expressions and store the result as a new time-series. They are the Prometheus equivalent of continuous aggregates — trade storage for query speed on commonly evaluated expressions.
Federation allows one Prometheus instance to scrape a subset of time-series from another Prometheus instance. This enables hierarchical deployments: regional Prometheus instances collect local metrics; a global Prometheus instance federates aggregated metrics from each region.
QuestDB#
Columnar Append-Only Storage#
QuestDB stores each column as a separate file on disk, sorted by time. The append-only constraint means writes are always sequential — no random writes, no LSM tree compaction overhead. This gives QuestDB theoretical write throughput limited only by disk sequential write speed.
Each column file is a typed array: timestamps as microsecond-resolution integers, doubles as IEEE 754, integers as fixed-width binary. Historically there was no WAL and durability relied on OS page cache flushing; newer QuestDB releases add WAL-backed tables.
Out-of-order ingestion (introduced in 2021) allows rows with timestamps earlier than the current write frontier. QuestDB buffers out-of-order rows in memory, sorts them, and merges them into the correct position in the column files. This is essential for real-world IoT and distributed systems where events arrive with network-induced delay.
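The merge step can be sketched with a sorted list (a toy model of the idea; QuestDB operates on memory-mapped column files rather than Python lists):

```python
import bisect

# Sketch of out-of-order ingestion: late rows are buffered, sorted, and
# merged into the time-ordered column at the correct position.
column = [(100, 1.0), (110, 1.1), (120, 1.2)]   # (timestamp, value), sorted
ooo_buffer = [(115, 9.9), (105, 8.8)]           # rows that arrived late

for row in sorted(ooo_buffer):
    idx = bisect.bisect_left(column, row)       # find insertion point
    column.insert(idx, row)

timestamps = [ts for ts, _ in column]
# The column is time-ordered again: 100, 105, 110, 115, 120
```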
SIMD Query Execution#
QuestDB’s query engine runs on the JVM, with performance-critical paths implemented in native code using SIMD vectorized instructions. Aggregation operations (COUNT, SUM, AVG, MIN, MAX) over dense numeric columns use AVX2 instructions. For large analytical queries over millions of rows, this can achieve throughput close to theoretical memory bandwidth limits.
SQL Extensions: SAMPLE BY and LATEST ON#
QuestDB extends SQL with two time-series-specific clauses:
SAMPLE BY 1h FILL(PREV) — groups rows into time buckets of the specified duration. FILL options handle missing buckets: NONE (omit), NULL, PREV (carry forward), LINEAR (interpolate).
LATEST ON ts PARTITION BY device_id — returns the most recent row per partition key. Efficient for “last known value” queries on sensor streams.
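Assuming a hypothetical readings table with ts, device_id, and temperature columns, the two clauses look like this:

```sql
-- Hourly averages; empty buckets carry the previous bucket's value forward
SELECT ts, avg(temperature)
FROM readings
SAMPLE BY 1h FILL(PREV);

-- Last known reading per device
SELECT *
FROM readings
LATEST ON ts PARTITION BY device_id;
```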
InfluxDB Line Protocol and PostgreSQL Wire Compatibility#
QuestDB natively accepts the InfluxDB line protocol over TCP and HTTP. It also supports the PostgreSQL wire protocol (port 8812) for SQL queries, making it compatible with any PostgreSQL client library.
VictoriaMetrics#
MetricsQL and PromQL Compatibility#
VictoriaMetrics exposes PromQL-compatible query and ingestion endpoints. Existing Grafana dashboards and Prometheus alerting rules work without modification. MetricsQL is a superset of PromQL with additional functions (such as histogram_share) and optimizations (implicit subquery resolution, smarter staleness handling).
Compression Architecture#
VictoriaMetrics achieves aggressive compression through:
- Delta-of-delta encoding for timestamps — differences of differences compress to near-zero for regular scrape intervals.
- XOR encoding for float64 values — the Gorilla technique, which stores XOR of consecutive values rather than raw floats.
- Variable-length integer encoding for integer deltas.
- Zstandard (zstd) as the final compression layer applied after the above encodings.
The combination yields 20x–30x compression compared to raw storage, and approximately 5–10x better than Prometheus TSDB in practice (Prometheus already applies Gorilla compression). On a large Prometheus deployment, migrating to VictoriaMetrics typically reduces storage costs by 70–80%.
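A toy sketch (not VictoriaMetrics’ actual implementation) showing why the first two encodings work so well on regular time-series:

```python
import struct

def delta_of_delta(timestamps):
    """Second-order differences; all zeros for a perfectly regular scrape interval."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

def xor_bits(a, b):
    """XOR of the IEEE 754 bit patterns of two floats (the Gorilla trick)."""
    ai = struct.unpack("<Q", struct.pack("<d", a))[0]
    bi = struct.unpack("<Q", struct.pack("<d", b))[0]
    return ai ^ bi

# A regular 15-second scrape interval: delta-of-delta is all zeros,
# which later stages compress to almost nothing.
ts = [1700000000 + 15 * i for i in range(6)]
print(delta_of_delta(ts))    # [0, 0, 0, 0]

# Identical consecutive gauge values XOR to zero; slowly changing values
# XOR to words with long runs of zero bits, so few meaningful bits remain.
print(xor_bits(42.0, 42.0))  # 0
```

The real encoders then bit-pack these near-zero streams before handing blocks to zstd, which is where the multiplicative gains come from.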
Single-Binary Operational Simplicity#
VictoriaMetrics ships as a single self-contained executable with no external dependencies. It accepts Prometheus remote write, InfluxDB line protocol, Graphite plaintext, OpenTelemetry OTLP, and DataDog agent formats simultaneously. The operational simplicity is a significant selling point for teams that find Thanos or Cortex operationally heavy.
Cluster Mode#
VictoriaMetrics cluster consists of three component types: vminsert (ingestion, stateless, horizontally scalable), vmselect (query, stateless, horizontally scalable), and vmstorage (storage, stateful). Data is sharded across vmstorage nodes by metric hash. This architecture scales to hundreds of millions of active time-series and petabytes of data.
TDengine#
Supertable Concept#
TDengine’s data model separates schema (column definitions) from device identity (tags). A supertable defines the column schema. Each physical device gets its own sub-table (a regular table with its own data files). Tags are stored separately as metadata and indexed for fast filtering. Queries against the supertable aggregate across all sub-tables matching a tag filter.
This design means each device’s data is stored in its own column files, eliminating write contention between devices and making single-device reads extremely efficient.
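A sketch of the model in TDengine SQL (table, column, and tag names are illustrative, following the meters example from TDengine’s documentation):

```sql
-- Supertable: shared column schema plus tag definitions
CREATE STABLE meters (ts TIMESTAMP, current FLOAT, voltage INT)
  TAGS (location BINARY(64), group_id INT);

-- One sub-table per physical device, bound to concrete tag values
CREATE TABLE d1001 USING meters TAGS ('San Francisco', 2);

-- Query the supertable; the tag filter prunes sub-tables before scanning
SELECT avg(current) FROM meters WHERE location = 'San Francisco';
```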
Integrated Stream Processing#
TDengine includes a built-in stream processing engine. Continuous queries can compute moving windows, sliding windows, and event-driven aggregations over incoming data, writing results to derived tables. This eliminates the need for a separate stream processing layer (e.g., Kafka + Flink) for common IoT aggregation patterns.
Graphite and Whisper#
Round-Robin Database Design#
Whisper pre-allocates fixed-size database files. Each file stores one metric at one or more retention archives. An archive is a fixed-length array of (timestamp, value) pairs with a fixed resolution (e.g., 10-second resolution for 24 hours = 8,640 slots). When the archive fills, old data is overwritten in a round-robin fashion — hence the name.
The key consequence: storage is bounded and predictable, but old data is permanently lost once the retention window expires. Aggregation between archives (downsampling) happens automatically according to configured aggregation functions (average, sum, max, min, last).
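A toy model of a fixed-size round-robin archive (illustrative only; the actual Whisper file format stores (timestamp, value) pairs and validates slot timestamps on read):

```python
class RoundRobinArchive:
    """Fixed slot count; a point lands in slot (timestamp // step) % slots."""
    def __init__(self, step_seconds, slots):
        self.step = step_seconds
        self.slots = slots
        self.values = [None] * slots  # storage is allocated up front and never grows

    def write(self, timestamp, value):
        self.values[(timestamp // self.step) % self.slots] = value

    def read(self, timestamp):
        # A point older than step * slots has been silently overwritten.
        return self.values[(timestamp // self.step) % self.slots]

# 10-second resolution for 24 hours = 8,640 slots, as in the text.
archive = RoundRobinArchive(step_seconds=10, slots=8640)
archive.write(1700000000, 0.42)
print(archive.read(1700000000))  # 0.42
```

Writing a point one full retention period later reuses the same slot, which is exactly the bounded-storage, lost-history trade-off described above.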
Graphite Query Language#
The Graphite query language is a functional pipeline applied to metric paths identified by a dot-separated hierarchy and wildcard globs:
```
summarize(groupByNode(stats.servers.*.cpu.load, 2, 'average'), '1h', 'avg')
```

Functions include mathematical operations, statistical functions, and graph rendering directives. The ecosystem around Graphite (Graphite-web, Grafana’s Graphite datasource) interprets these queries both for data retrieval and for rendering hints.
Current Status#
Graphite remains in maintenance mode. New features are rare. The round-robin design limits granularity flexibility — you cannot retroactively add higher-resolution archives. Migration paths to Prometheus (via the graphite_exporter bridge or Graphite-compatible remote-write receivers) and to InfluxDB are well-documented. For new deployments, Graphite is not recommended.
Comparison Matrix#
| System | Query Language | Write Throughput | Cardinality | Retention Control | PostgreSQL Compatible |
|---|---|---|---|---|---|
| TimescaleDB | SQL (full) | High | Unlimited | Chunk-drop | Yes (is PG) |
| InfluxDB v3 | SQL / InfluxQL | Very High | Unlimited | Object storage | No |
| Prometheus | PromQL | Medium | Limited | Drop blocks | No |
| QuestDB | SQL + extensions | Extremely High | High | Partition drop | Partial (wire) |
| VictoriaMetrics | MetricsQL / PromQL | Very High | Very High | Configurable | No |
| TDengine | SQL + extensions | Very High | High (per device) | Retention policies | No |
| Graphite | Functional pipeline | Medium | Unlimited (files) | Round-robin auto | No |
Client Library Integration#
All systems covered here connect via importable SDKs or standard protocols:
- TimescaleDB: any PostgreSQL client (`psycopg2`, `asyncpg`, `pg`, `SQLAlchemy`, `ActiveRecord`)
- InfluxDB: official clients for Python (`influxdb-client`), Go, Java, JavaScript, C#
- Prometheus: client libraries for Go (`prometheus/client_golang`), Python (`prometheus_client`), Java (`simpleclient`)
- QuestDB: PostgreSQL wire protocol (any PG client) plus ILP over TCP; official Python client
- VictoriaMetrics: Prometheus remote write protocol; any Prometheus client works
- TDengine: official clients for C, Go, Python, Java, Rust; JDBC/ODBC drivers
The Prometheus client library model is particularly important: application code instruments itself using the Prometheus client, which exposes a /metrics endpoint that any PromQL-compatible system (Prometheus, VictoriaMetrics, Grafana Mimir) can scrape.
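The text format itself is simple enough to render by hand. A minimal sketch (normally a client library such as `prometheus_client` does this; the metric name and labels are illustrative):

```python
def render_counter(name, help_text, value, labels=None):
    """Render one counter in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        # Labels are rendered as {key="value",...}; sorting keeps output stable.
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name}{label_str} {value}\n"
    )

# Any PromQL-compatible scraper can parse a /metrics body built this way.
print(render_counter("http_requests_total", "Total HTTP requests.",
                     1027, {"method": "get", "code": "200"}))
```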
S3: Need-Driven Discovery — Time-Series Databases#
Who Needs This and Why#
Time-series databases touch a wide range of engineering roles. The person building a Kubernetes monitoring stack has almost nothing in common with the person ingesting 10,000 IoT sensor readings per second, even though both are “doing time-series.” Understanding the persona first clarifies which tool is appropriate.
Persona 1: DevOps / SRE — Infrastructure and Application Metrics#
The Problem They Face#
A site reliability engineer managing a Kubernetes cluster needs to know, at any moment, the CPU and memory utilization of every pod, the request rate and error rate of every service, and the latency percentiles of every API endpoint. They need to be paged when things go wrong. They need to build dashboards that show the state of the system at a glance. They need to write alert conditions that fire when a metric crosses a threshold for a sustained period.
This SRE is not building a data pipeline. They are running an operations tool. The time-series storage is incidental to their actual goal: operational visibility and automated alerting.
Why Prometheus Is the Answer#
Prometheus was designed for exactly this persona. The pull model maps well to Kubernetes: pods declare their metrics endpoint in annotations, and Prometheus discovers them via the Kubernetes API. The Prometheus client libraries for Go, Python, Java, and Ruby are the standard way to instrument application code — adding a histogram metric to a service is a three-line change. The Prometheus text format for /metrics endpoints is standardized to the point that it is effectively an industry specification.
Grafana, which most SREs use for dashboards, has native and deeply integrated Prometheus support. Alertmanager handles routing alert notifications to PagerDuty, Slack, email, and OpsGenie. The entire stack — Prometheus + Alertmanager + Grafana — is well-documented, widely understood, and supported by every cloud provider.
When They Hit Limits#
The SRE running a medium Kubernetes cluster (50–200 nodes) is fine with vanilla Prometheus. At larger scale — thousands of nodes, millions of time-series, multi-region deployments — Prometheus alone struggles. Local disk storage fills. Retention is short. Prometheus instances do not federate seamlessly beyond simple hierarchies.
At this point, the SRE typically moves to either:
Thanos or Grafana Mimir: Long-term storage solutions that bolt onto Prometheus, storing historical data on object storage (S3) while keeping Prometheus as the local scraping engine. Thanos uses a sidecar next to each Prometheus that uploads blocks to object storage. Mimir is a fully distributed Prometheus replacement.
VictoriaMetrics: Drop-in Prometheus replacement that accepts remote write from existing Prometheus instances (or replaces them entirely), stores data more efficiently, and runs as a single binary or small cluster. SREs who find VictoriaMetrics typically do not go back to Prometheus for long-term storage.
Tools They Interact With#
- Prometheus client libraries in application code
- `prometheus.yml` scrape configuration and alerting rules
- `promtool` for rule validation
- Grafana for dashboards
- Alertmanager for routing
- `kubectl` alongside `promtool` for Kubernetes-native deployments
- Helm charts for the kube-prometheus-stack (the standard Kubernetes monitoring stack)
Persona 2: IoT / Embedded Systems Engineer — Sensor Data Pipelines#
The Problem They Face#
An IoT engineer is collecting data from thousands (or millions) of edge devices: temperature sensors, industrial machines, GPS trackers, smart meters. Each device emits readings at a regular interval — maybe every second, maybe every 10 milliseconds. The device may have limited connectivity, so readings may arrive out of order or in batches. The core requirement is storing these readings reliably, querying recent readings per device, and aggregating over time for trend analysis.
Unlike the SRE, the IoT engineer needs to store data for months or years. Readings at 1-second resolution for one year from 10,000 devices is approximately 315 billion rows. That is the scale problem. The schema is wide (many sensor columns) and regular (same columns for all devices of the same type). The query pattern is strongly time-ranged and frequently device-filtered.
Why InfluxDB or TDengine Fits Here#
InfluxDB has strong IoT adoption because of the Telegraf agent ecosystem. Telegraf is a plugin-based metrics collection agent with over 200 input plugins — MQTT (the dominant IoT messaging protocol), Modbus (industrial automation), OPC-UA, serial port sensors, and hardware-specific integrations. IoT devices or gateways write to an MQTT broker; Telegraf subscribes and forwards to InfluxDB. The entire pipeline is mature and well-documented.
InfluxDB’s line protocol is simple enough that firmware engineers can implement it from scratch in C or Rust to write directly from edge hardware without a full SDK.
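A minimal sketch of that simplicity, emitting one line protocol point from Python (measurement, tag, and field names are hypothetical; real implementations must also escape spaces and commas in values):

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one point: measurement,tag=v,... field=v,... timestamp_ns"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol("temperature",
                        {"sensor": "shelf-47", "site": "warehouse-1"},
                        {"celsius": 21.4},
                        1700000000000000000)
print(line)  # temperature,sensor=shelf-47,site=warehouse-1 celsius=21.4 1700000000000000000
```

The same few lines translate almost directly to C or Rust on a microcontroller, which is the point being made above.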
TDengine maps particularly well to IoT because its supertable/sub-table design was explicitly designed for this pattern. Each physical device gets its own sub-table. Queries across the supertable aggregate across all devices with tag-based filtering. The built-in stream processing handles common IoT aggregation patterns (moving averages, anomaly detection) without a separate streaming layer.
Where TimescaleDB Fits for IoT#
TimescaleDB is a strong choice for IoT teams whose devices connect to the cloud via REST APIs (rather than MQTT/line protocol), and whose analysis requires joining sensor data with relational metadata (device registry, location tables, customer assignments). The ability to do a SQL JOIN between sensor readings and a device registry table is genuinely useful and not possible in InfluxDB or TDengine without an additional data warehouse layer.
TimescaleDB with space partitioning (partitioned by device ID within each time chunk) eliminates write contention across devices and keeps per-device queries fast.
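Sketched in TimescaleDB SQL (table and column names hypothetical; the named-parameter signature may vary slightly by version):

```sql
-- Hypertable partitioned by time, with 4 space partitions hashed on device_id
SELECT create_hypertable('readings', 'ts',
                         partitioning_column => 'device_id',
                         number_partitions   => 4);
```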
What They Need From the Client Library#
IoT engineers need:
- A lightweight write client (line protocol over UDP or HTTP is common for constrained devices)
- Batch write support (sending accumulated readings in one HTTP call when connectivity is available)
- Out-of-order write handling (most TSDBs support this; it is essential for unreliable connectivity)
- Efficient range queries per device
- Downsampled historical queries (1-minute averages over 6 months, not raw seconds)
Persona 3: Financial Data Engineer — Market Tick Data and Quantitative Analysis#
The Problem They Face#
A quantitative developer or financial data engineer is working with market data: stock prices, order book snapshots, trade executions, options chains. The volume is extreme at peak — top exchanges publish millions of updates per second across all symbols. The query pattern is also extreme: backtesting a trading strategy requires scanning years of tick data for specific symbols at millisecond granularity. Latency matters at ingestion (low-latency capture) and at query time (interactive backtesting).
This persona cares deeply about:
- Nanosecond timestamp precision (exchanges timestamp events with nanosecond clocks)
- Query throughput for analytical scans (backtesting is a full-table scan over years of data)
- SQL familiarity (quantitative analysts know SQL and Python; they do not want to learn Flux or PromQL)
- Correctness (financial data cannot be silently lost or corrupted)
Why QuestDB Is Often the Best Fit#
QuestDB was explicitly designed with financial data use cases in mind. Its columnar append-only storage is optimized for sequential writes (tick capture) and sequential reads (backtesting scans). SIMD-accelerated aggregation over dense float64 columns — computing OHLCV bars (Open-High-Low-Close-Volume) over billions of ticks — is a core benchmark QuestDB optimizes against.
QuestDB’s SQL extensions are particularly relevant here:
- `SAMPLE BY 1s` produces OHLCV aggregations naturally
- `LATEST ON ts PARTITION BY symbol` retrieves the most recent tick per symbol instantly
- Nanosecond timestamp support is native
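For example, OHLCV bars over a hypothetical trades(ts, symbol, price, amount) table reduce to a single SAMPLE BY query:

```sql
SELECT ts,
       first(price) AS open,
       max(price)   AS high,
       min(price)   AS low,
       last(price)  AS close,
       sum(amount)  AS volume
FROM trades
WHERE symbol = 'BTC-USD'
SAMPLE BY 1s;
```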
QuestDB also supports the PostgreSQL wire protocol, so existing Python analytics workflows using psycopg2 or SQLAlchemy work without modification.
Where TimescaleDB Competes#
TimescaleDB is a credible alternative for financial data engineers who need to JOIN tick data with reference data (instrument metadata, corporate actions, exchange holidays) stored in the same PostgreSQL database. The continuous aggregate feature handles OHLCV bar generation efficiently, and the full PostgreSQL ecosystem (window functions, CTEs, PL/pgSQL, foreign data wrappers) is valuable for complex analytics.
The trade-off: QuestDB’s raw scan throughput on dense numeric data is higher than TimescaleDB’s. For applications that are purely analytical (ingestion speed and scan throughput matter most), QuestDB wins. For applications that need relational joins and the PostgreSQL ecosystem, TimescaleDB wins.
What InfluxDB v3 Brings#
InfluxDB v3’s Apache Arrow/Parquet foundation is interesting for financial data because Parquet is already the standard format for historical market data archival (used by exchanges, data vendors, and quant research platforms). An InfluxDB v3 instance stores data as Parquet on S3, making it directly queryable by DuckDB, Polars, or pandas Arrow readers without export steps. This interoperability could be significant for quant workflows.
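For example, DuckDB can scan such Parquet files in place (bucket path hypothetical; S3 paths require DuckDB’s httpfs extension):

```sql
-- DuckDB reads the Parquet files directly; no export step needed
SELECT symbol, avg(price)
FROM read_parquet('s3://market-data/ticks/*.parquet')
GROUP BY symbol;
```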
Persona 4: Analytics Engineer — Extending PostgreSQL for Time-Series#
The Problem They Face#
An analytics engineer at a SaaS company is responsible for user-facing analytics: usage dashboards showing API calls over time, feature adoption trends, billing metrics, data export pipelines. The data model includes time-stamped events (API calls, user actions, billing records) alongside relational data (users, plans, organizations). Queries frequently join across both dimensions.
This persona is already running PostgreSQL. They know SQL. Their BI tooling (Metabase, Redash, Superset, Looker) connects to PostgreSQL. Their ORM-based application already writes to PostgreSQL. They do not want to operate a second database tier.
Why TimescaleDB Is the Natural Choice#
For this persona, TimescaleDB is often the answer with the least disruption. The migration path is minimal:
- Install the TimescaleDB extension
- Convert the event table to a hypertable with one function call
- Create continuous aggregates for commonly queried metrics
- Configure a retention policy if old raw data should be discarded
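The four steps above map to roughly this SQL (table, column, and view names are hypothetical; exact signatures vary by TimescaleDB version):

```sql
-- 1. Install the extension
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- 2. Convert the existing events table into a hypertable
SELECT create_hypertable('events', 'created_at', migrate_data => true);

-- 3. Continuous aggregate for a commonly queried hourly metric
CREATE MATERIALIZED VIEW events_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', created_at) AS bucket,
       account_id,
       count(*) AS events
FROM events
GROUP BY bucket, account_id;

-- 4. Drop raw data older than 90 days
SELECT add_retention_policy('events', INTERVAL '90 days');
```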
The application code does not change — the hypertable exposes exactly the same SQL interface as a regular table. Existing BI tools work. Existing ORMs work. Existing pg_dump backups work. The operational model does not change.
The gains are concrete: queries over time ranges that previously caused sequential scans across hundreds of millions of rows now use chunk exclusion, scanning only the relevant time partition. Continuous aggregates eliminate expensive GROUP BY queries in dashboards. Compression reduces storage costs.
Where They Eventually Outgrow TimescaleDB#
Very high write throughput (tens of thousands of events per second, sustained) will eventually stress PostgreSQL’s write path. WAL amplification, autovacuum overhead on insert-heavy tables, and connection pool limits all become relevant at scale. At that point, the analytics engineer has three options:
- Scale vertically (TimescaleDB supports very large PostgreSQL instances)
- Separate the write path from the read path (use an event streaming system like Kafka as a buffer)
- Move raw event storage to a purpose-built TSDB and replicate aggregated data back to PostgreSQL for dashboards
The threshold for “outgrowing” TimescaleDB is workload-dependent. Many production deployments run comfortably at 100,000+ rows per second with appropriate hardware. For most SaaS analytics workloads, that threshold is never reached.
Decision Heuristics by Need#
Need: Monitor Kubernetes / cloud-native infrastructure
Answer: Prometheus + Alertmanager + Grafana. Non-negotiable in this context.

Need: Grow beyond Prometheus storage limits
Answer: VictoriaMetrics (simpler) or Thanos/Mimir (more ecosystem support)

Need: IoT sensor ingestion with MQTT integration
Answer: InfluxDB v2/v3 with Telegraf, or TDengine for stream processing

Need: Already on PostgreSQL, want time-series features
Answer: TimescaleDB. Minimal disruption, maximum compatibility.

Need: High-frequency financial tick data, backtesting
Answer: QuestDB for pure performance; TimescaleDB if joins with relational data matter

Need: Maintain existing Graphite infrastructure
Answer: Continue with Graphite; plan migration to Prometheus or VictoriaMetrics for new deployments

Need: Store metrics for months or years with high compression
Answer: VictoriaMetrics single-node or cluster; InfluxDB v3 on object storage
S4: Strategic Discovery — Time-Series Databases#
Ecosystem Health and Long-Term Viability#
TimescaleDB — Timescale Inc.#
Company: Timescale, Inc. Founded 2017 by former MIT CSAIL researchers. Headquartered in New York. Raised Series B ($40M) in 2021; total funding over $70M.
License: Apache 2.0 for the core TimescaleDB extension. The “Timescale License” (TSL) covers enterprise features (multi-node, tiered storage on object storage, some continuous aggregate features). The TSL is source-available but restricts use by competing database-as-a-service providers. A well-understood and fair license for most organizations.
Commercial offering: Timescale Cloud — a managed PostgreSQL + TimescaleDB service. Competes with AWS RDS, Google Cloud SQL, and Supabase in the managed PostgreSQL market.
Viability indicators:
- The PostgreSQL alignment is the strongest long-term signal. TimescaleDB’s core value (hypertables, chunk-based partitioning) is implemented as an extension — it benefits from every improvement to PostgreSQL’s executor, planner, and storage engine automatically. PostgreSQL itself has three decades of investment behind it.
- The extension model also means that if Timescale Inc. ceased to exist, the extension would remain usable indefinitely (it is fully open-source), and could be forked.
- The Timescale Cloud product provides a commercial revenue stream that funds ongoing development.
- Community: strong, active GitHub repository. TimescaleDB consistently appears in “best time-series database” surveys with high satisfaction scores.
Risk factors:
- The TSL on enterprise features means very large deployments or managed service providers may encounter licensing friction.
- As PostgreSQL itself adds features (notably declarative partitioning, which overlaps with hypertables conceptually), the differentiation of TimescaleDB narrows at the margins.
Verdict: High long-term viability. The PostgreSQL ecosystem dependency is a strength, not a risk.
InfluxDB — InfluxData Inc.#
Company: InfluxData, Inc. Founded 2012. Headquartered in San Francisco. Raised Series D ($81M) in 2020; total funding over $170M. One of the best-funded companies in the TSDB space.
License: MIT for open-source editions (OSS). Commercial for InfluxDB Cloud and Enterprise. InfluxDB v3 Core is Apache 2.0.
Commercial offering: InfluxDB Cloud (multi-tenant SaaS) and InfluxDB Cloud Dedicated (single-tenant managed). InfluxDB Enterprise for on-premises large-scale deployments.
Viability indicators:
- InfluxDB is the most widely recognized TSDB brand. It has the largest developer mindshare in the purpose-built TSDB category.
- The v3 rewrite (project IOx, now InfluxDB 3.0) is a major technical bet. The Apache Arrow/Parquet/DataFusion stack is a well-funded, rapidly improving open-source ecosystem. Aligning with it is strategically sound.
- Telegraf (the agent) has 200+ input plugins and massive community adoption. It is increasingly independent of InfluxDB the database and works with many backends.
- InfluxData’s commercial SaaS product provides recurring revenue.
Risk factors:
- The v1→v2→v3 migration path has been disruptive. Flux (v2’s query language) was a controversial investment that is now being deprecated in favor of SQL in v3. This history creates hesitation among developers who invested in Flux.
- v3 is architecturally very different from v1/v2. Existing v1/v2 deployments cannot upgrade in place — migration requires data movement.
- The company has been around for over a decade without achieving the scale of a public company, which raises questions about exit and long-term independence.
Verdict: High mindshare, technically strong v3 direction, but organizational history of disrupting developer investment (InfluxQL → Flux → SQL) creates justified caution. Evaluate v3 specifically; do not assume v1/v2 patterns carry forward.
Prometheus — CNCF Graduated Project#
Governance: Prometheus is a Cloud Native Computing Foundation (CNCF) Graduated project. Graduated status is the highest level in the CNCF maturity model, alongside Kubernetes, Envoy, and Fluentd. It means the project has demonstrated production adoption, a healthy governance structure, and long-term sustainability.
License: Apache 2.0.
Commercial backing: No single company owns Prometheus. Major contributors include Google, AWS, Grafana Labs, Red Hat, SoundCloud (where Prometheus originated), and many others. Grafana Labs maintains the largest commercial ecosystem around Prometheus (Grafana, Loki, Mimir, Tempo).
Viability indicators:
- CNCF Graduated status is effectively a guarantee of perpetual maintenance. Kubernetes depends on Prometheus for its standard monitoring stack (via kube-prometheus-stack). As long as Kubernetes is running in production — which will be measured in decades — Prometheus is maintained.
- The OpenMetrics standard (an IETF draft based on the Prometheus text format) is the direction for standardized metrics exposition. Prometheus alignment means alignment with this standard.
- The Prometheus client library ecosystem (Go, Python, Java, Ruby, Rust, .NET) is enormous and actively maintained by major companies.
Risk factors:
- Prometheus itself has known scale limitations (single-node storage, memory-intensive index). This is well-understood and addressed by the ecosystem (Mimir, Thanos, VictoriaMetrics) rather than by Prometheus core itself.
- The CNCF governance model means changes are slow and consensus-driven. This is a feature for stability but a limitation for rapid innovation.
Verdict: Effectively immortal in the cloud-native context. The safest possible long-term choice for infrastructure metrics. Not the right choice outside that specific use case.
QuestDB — QuestDB Ltd.#
Company: QuestDB Ltd. Founded 2019. Backed by venture capital (notable investors: OpenOcean, Y Combinator). Smaller funding base than InfluxData but growing.
License: Apache 2.0.
Commercial offering: QuestDB Cloud (managed SaaS). Active enterprise support offering.
Viability indicators:
- Strong technical differentiation. Benchmark results consistently show QuestDB at or near the top for ingestion throughput on dense numeric workloads.
- The SQL interface is a strategic advantage — no proprietary query language to learn. SQL familiarity broadens the potential user base.
- Y Combinator backing indicates early-stage validation. Apache 2.0 license is developer-friendly and maximizes adoption.
- Active development; the GitHub repository has consistent commit cadence and responsive maintainers.
- The financial services community has shown meaningful adoption for tick data use cases.
Risk factors:
- Smaller community and ecosystem than InfluxDB or Prometheus. Fewer StackOverflow answers, fewer community plugins, fewer pre-built integrations.
- As a younger, smaller company, long-term funding stability is less certain than for InfluxData or Timescale.
- If QuestDB were acquired or sunset, the Apache 2.0 license means the codebase survives, but community momentum might not.
Verdict: Strong technical choice, particularly for high-throughput analytical workloads. Appropriate risk profile for greenfield projects where SQL familiarity is valued. Monitor ecosystem growth over the next 2–3 years.
VictoriaMetrics — Victoria Metrics Inc.#
Company: Victoria Metrics Inc. Founded 2018 by Aliaksandr Valialkin (primary author). Small team; bootstrapped / minimally funded compared to competitors.
License: Apache 2.0 for single-node. Enterprise features under a commercial license.
Viability indicators:
- Exceptional operational reputation. Engineers who have deployed VictoriaMetrics in production consistently report fewer operational issues, better compression, and higher throughput than Prometheus TSDB.
- The single-binary design philosophy reduces operational complexity to a minimum — a deliberate and successful product decision.
- Strong PromQL compatibility means zero switching cost from Prometheus. Grafana dashboards, recording rules, and alerting rules work without modification.
- Active development from a small but focused team. The primary author is technically excellent and responsive.
- Growing adoption in the Prometheus ecosystem as the de facto “Prometheus at scale” solution.
Risk factors:
- Very small team. The project’s long-term maintenance depends heavily on a small number of contributors.
- Commercial licensing for enterprise features is less well-documented than competitors.
- Not a CNCF project (there is a CNCF sandbox proposal that has stalled). Less formal governance than Prometheus.
Verdict: Excellent technical choice for Prometheus-compatible long-term storage. Low switching cost. The small team is a risk to monitor but has so far been mitigated by the project’s quality and reputation.
TDengine — TDengine (TAOS Data)#
Company: TAOS Data. Founded 2017 in Beijing. Significant backing in Chinese venture capital ecosystem.
License: AGPL 3.0 for community edition. Commercial license available.
Viability indicators:
- Strong IoT market in China with significant domestic adoption.
- Technical depth — stream processing, message queuing, and caching integrated with TSDB storage is a coherent product vision.
- Growing international presence; English documentation has improved substantially.
Risk factors:
- AGPL 3.0 license is restrictive for SaaS deployments — any modification to TDengine that is served over a network must be open-sourced. Organizations building products on top of TDengine may need a commercial license.
- Primary adoption in China may limit Western ecosystem integrations (fewer Grafana plugins, fewer Telegraf/Fluent Bit integrations compared to InfluxDB).
- Geopolitical considerations may affect enterprise adoption decisions in some organizations.
Verdict: Sound technical product. Appropriate for IoT pipelines, especially in Asian markets. Western organizations should evaluate the AGPL license carefully and consider ecosystem maturity relative to InfluxDB for the same use case.
Graphite — Community Maintained#
Status: Maintenance mode. No active new development. The core Whisper format has not changed in years. Used in older deployments but not recommended for new projects.
Verdict: Understand it to maintain existing systems; migrate to Prometheus or InfluxDB for new work.
Decision Matrix#
| Factor | TimescaleDB | InfluxDB v3 | Prometheus | QuestDB | VictoriaMetrics |
|---|---|---|---|---|---|
| PostgreSQL compatibility | Native (is PG) | No | No | Wire protocol | No |
| SQL query language | Full SQL | SQL | PromQL | SQL + extensions | MetricsQL |
| Best use case | Analytics + relational | IoT / telemetry | Infrastructure metrics | High-freq ingestion | Prometheus at scale |
| Long-term storage | Yes (object storage in TSL) | Yes (Parquet on S3) | No (15d default) | Yes | Yes |
| Cloud managed option | Timescale Cloud | InfluxDB Cloud | Grafana Cloud | QuestDB Cloud | Enterprise only |
| CNCF status | No | No | Graduated | No | No |
| License (core) | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 |
| Operational complexity | Low (extends PG) | Medium | Medium | Low | Very Low |
| Ecosystem maturity | Very High (via PG) | High | Very High | Medium | Medium-High |
Strategic Recommendations by Organization Type#
Startup with PostgreSQL: TimescaleDB. One fewer system to operate, full SQL, familiar tooling, migrate when scale demands it.
Cloud-native company (Kubernetes): Prometheus for metrics. Add VictoriaMetrics as remote storage when Prometheus runs out of disk. Never remove Prometheus — it is the scraping and alerting layer.
IoT product company: InfluxDB v3 with Telegraf. The ecosystem depth (MQTT integration, line protocol SDKs, managed cloud) reduces time to production. Evaluate TDengine if stream processing integration is a core requirement.
Quantitative finance / fintech: QuestDB for tick storage and backtesting. TimescaleDB if relational joins are required. Both are actively maintained with appropriate SQL interfaces.
Legacy infrastructure maintainer: Maintain Graphite until a migration project is justified. When migrating, Prometheus or VictoriaMetrics are the natural targets.
Large enterprise, multi-cloud: VictoriaMetrics cluster for the operational simplicity of the single-binary model extended to cluster mode. Avoid vendor lock-in with the Apache 2.0 license and standard PromQL compatibility.
Technology Trajectory (2026 and Beyond)#
The most significant trend in this space is convergence on Apache Arrow and Parquet as the interoperability layer. InfluxDB v3 is built on it. DuckDB queries Parquet natively. Polars, pandas 2.x, and the broader Python data ecosystem use Arrow internally. A TSDB that stores data as Parquet on object storage can be queried by an expanding set of tools without any export step.
The second trend is OpenTelemetry adoption. OTel defines a standardized wire format for metrics, traces, and logs. VictoriaMetrics, InfluxDB, and TimescaleDB all support OTLP ingestion. As OTel adoption grows in application instrumentation, the distinction between “metrics database” and “observability backend” blurs. Systems that accept OTLP natively gain adoption from the OTel ecosystem automatically.
The third trend is SQL as the convergence query language. InfluxDB’s abandonment of Flux in favor of SQL (v3), QuestDB’s SQL-first design, and TimescaleDB’s native SQL are all evidence that the industry has settled on SQL as the lingua franca for time-series queries. Proprietary query languages (InfluxQL, Flux) are convergence risks; SQL alignment is a long-term strategic advantage.