Trusted by 30,000+ organizations, from high-growth startups to global enterprises.

## Components

Axiom consists of two purpose-built data stores supported by a unified console experience:

### EventDB

Robust, cost-effective, and scalable datastore specifically optimized for timestamped event data. Built from the ground up to handle the vast volumes and high velocity of event ingestion, EventDB ensures:

- **Scalable data loading:** Events are ingested seamlessly without complex middleware, scaling linearly with no single points of failure. (See the ingest sketch after this list.)
- **Extreme compression:** Tuned storage format compresses data 25-50x, significantly reducing storage costs and ensuring data remains queryable at any time.
- **Serverless querying:** Axiom spins up ephemeral, serverless runtimes on demand to execute queries efficiently, minimizing idle compute resources and costs.
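
To make the loading path concrete, here’s a minimal Python sketch that batches events into Axiom over HTTP. It assumes the REST ingest endpoint (`POST /v1/datasets/{dataset}/ingest`); the dataset name `my-dataset` and the `AXIOM_TOKEN` environment variable are placeholders for illustration.

```python
import json
import os
import urllib.request

# A small batch of events; Axiom ingests a JSON array of objects.
events = [
    {"_time": "2025-01-01T00:00:00Z", "service": "api", "status": 200},
    {"_time": "2025-01-01T00:00:01Z", "service": "api", "status": 500},
]

# Placeholder dataset name and token, for illustration only.
req = urllib.request.Request(
    "https://api.axiom.co/v1/datasets/my-dataset/ingest",
    data=json.dumps(events).encode(),
    headers={
        "Authorization": f"Bearer {os.environ['AXIOM_TOKEN']}",
        "Content-Type": "application/json",
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```

There’s no complex middleware in the path; batching more events per request is the main lever for throughput.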
### MetricsDB

Dedicated metrics database engineered specifically for high-cardinality time-series data. Unlike traditional metrics solutions that penalize you for dimensional complexity, MetricsDB embraces high-cardinality tags as a design principle:

- **High-cardinality native:** Store metrics with high-cardinality dimensional tags (for example, per-container or per-user identifiers) without performance degradation or cost penalties.
- **Optimized storage:** Purpose-built storage format designed for time-series workloads delivers efficient compression and fast aggregations across millions of unique tag combinations.
- **Thoughtful constraints:** Design choices prioritize the most common metrics use cases while maintaining exceptional performance.

For more information, see [Axiom’s architecture](/platform-overview/architecture).

### Console

Intuitive web app built for exploration, visualization, and monitoring of your data.

- **Real-time exploration:** Effortlessly query and visualize data streams in real time, providing instant clarity on operational and business conditions.
- **Dynamic visualizations:** Generate insightful visualizations, from straightforward counts to sophisticated aggregations, tailored specifically to your needs.
- **Robust monitoring:** Set up threshold-based and anomaly-driven alerts, ensuring proactive visibility into potential issues.

## Why choose Axiom?

- **Cost-efficiency:** Axiom dramatically lowers data ingestion and storage costs compared to traditional observability and logging solutions.
- **Flexible insights:** Real-time query capabilities and an increasingly intelligent UI help pinpoint issues and opportunities without sampling.
- **AI engineering:** Axiom provides specialized features designed explicitly for AI engineering workflows, allowing teams to confidently build, deploy, and optimize AI capabilities.

## Getting started

- [Learn more about Axiom’s features](/platform-overview/features).
- [Explore the interactive demo playground](https://play.axiom.co/).
- [Create your own organization](https://app.axiom.co/register).

## Axiom’s architecture

<Tip>
You don’t need to understand any of the following material to get massive value from Axiom. As a fully managed data platform, Axiom just works. This technical deep-dive is intended for curious minds wondering: Why is Axiom different?
</Tip>

Axiom routes ingestion requests through a distributed edge layer to a cluster of specialized services that process and store data in proprietary columnar formats optimized for different data types. EventDB handles high-volume event data, while MetricsDB is purpose-built for time-series metrics with high-cardinality dimensions. Query requests are executed by ephemeral, serverless workers that operate directly on compressed data stored in object storage.

## Ingestion architecture

Data flows through a multi-layered ingestion system designed for high throughput and reliability:

- **Regional edge layer:** HTTPS ingestion requests are received by regional edge proxies positioned to meet data jurisdiction requirements. These proxies handle protocol translation, authentication, and initial data validation. The edge layer supports multiple input formats (JSON, CSV, compressed streams) and can buffer data during downstream issues.
- **High-availability routing:** The system provides intelligent routing to healthy database nodes using real-time health monitoring. When primary ingestion paths fail, requests are automatically routed to available nodes or queued in a backlog system that processes data when systems recover.
- **Streaming pipeline:** Raw events are parsed, validated, and transformed in streaming fashion. Field limits and schema validation are enforced during this phase.
- **Write-ahead logging:** All ingested data is durably written to a distributed write-ahead log before being processed. This ensures zero data loss even during system failures and supports concurrent writes across multiple ingestion nodes. (A minimal sketch of the durability contract follows this list.)
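
The write-ahead log is the durability linchpin of this pipeline: an ingest request should only be acknowledged once its events are on durable storage. Below is a minimal single-node sketch of that contract in Python; Axiom’s actual WAL is distributed and far more sophisticated.

```python
import json
import os

class WriteAheadLog:
    """Minimal WAL sketch: append and fsync before acknowledging a write."""

    def __init__(self, path: str):
        # Append-only file; real systems segment and replicate the log.
        self.file = open(path, "ab")

    def append(self, event: dict) -> None:
        self.file.write(json.dumps(event).encode() + b"\n")
        self.file.flush()
        # Only after fsync returns is the event durable and safe to ack.
        os.fsync(self.file.fileno())

wal = WriteAheadLog("ingest.wal")
wal.append({"_time": "2025-01-01T00:00:00Z", "level": "info", "msg": "hello"})
```

On restart, replaying the log rebuilds any state that hadn’t yet been processed, which is what makes "zero data loss even during system failures" possible.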

## Storage architecture

Axiom’s storage layer uses specialized columnar formats optimized for different workload types:

### EventDB storage

EventDB’s storage is built around a custom columnar format that achieves extreme compression ratios:

- **Columnar organization:** Events are decomposed into columns and stored using specialized encodings optimized for each data type. String columns use dictionary encoding, numeric columns use various compression schemes, and boolean columns use bitmap compression. (A dictionary-encoding sketch follows this list.)
- **Block-based storage:** Data is organized into immutable blocks that are written once and read many times. Each block contains:
  - Column metadata and statistics
  - Compressed column data in a proprietary format
  - Separate time indexes for temporal queries
  - Field schemas and type information
- **Compression pipeline:** Data flows through multiple compression stages:
  1. **Ingestion compression:** Real-time compression during ingestion (25-50% reduction)
  2. **Block compression:** Columnar compression within storage blocks (10-20x additional compression)
  3. **Compaction compression:** Background compaction further optimizes storage (additional 2-5x compression)

  These stages compound: even the conservative end (a 25% ingestion reduction, 10x block compression, and 2x compaction) multiplies out to roughly 27x, consistent with the 25-50x figure above.
- **Object storage integration:** Blocks are stored in object storage (S3) with intelligent partitioning strategies that distribute load and avoid hot-spotting. The system supports multiple storage tiers and automatic lifecycle management.
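
Dictionary encoding is the intuition behind much of the string-column compression mentioned above. This sketch is conceptual, not Axiom’s on-disk format: each distinct value is stored once, and the column becomes a list of small integer codes.

```python
def dictionary_encode(column: list[str]) -> tuple[list[str], list[int]]:
    """Encode a string column as a value dictionary plus integer codes."""
    dictionary: list[str] = []
    index: dict[str, int] = {}
    codes: list[int] = []
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes

# Low-cardinality columns like HTTP method compress extremely well:
dictionary, codes = dictionary_encode(["GET", "GET", "POST", "GET", "PUT"])
print(dictionary)  # ['GET', 'POST', 'PUT']
print(codes)       # [0, 0, 1, 0, 2]
```

The integer codes are themselves highly compressible (run-length encoding, bit-packing), which is how repetitive event data shrinks by an order of magnitude inside a block.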

### MetricsDB storage

MetricsDB uses a specialized columnar format engineered for time-series metrics with high-cardinality tags:
51
49
52
-
**Serverless Workers**: Query execution occurs in ephemeral workers optimized through "Fusion queries"—a system that runs parallel queries inside a single worker to reduce costs and leave more resources for large queries. Workers download only the necessary column data from object storage, enabling efficient resource utilization. Multiple workers can process different blocks in parallel.
50
+
-**High-cardinality optimization:** Unlike traditional metrics databases that struggle with dimensional complexity, MetricsDB is designed from the ground up to handle high numbers of unique tag combinations efficiently.
51
+
-**Intentional design constraints:** MetricsDB makes deliberate trade-offs to optimize for the most common metrics use cases. These constraints are purposeful architectural choices that enable MetricsDB to deliver exceptional performance and cost-efficiency for real-world metrics workloads. Where other systems penalize you for high cardinality or force you to pre-aggregate data, MetricsDB lets you store and query metrics with full dimensional flexibility.
52
+
-**Unified observability:** Query metrics alongside logs and traces, enabling powerful correlations across all your telemetry data without switching tools or learning multiple query languages.
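
To make "high cardinality" concrete, consider how quickly dimensional tags multiply. The numbers below are invented for illustration:

```python
# Hypothetical per-tag cardinalities for one metric, e.g. request duration.
tag_values = {
    "region": 20,          # deployment regions
    "service": 150,        # microservices
    "endpoint": 300,       # API routes
    "container_id": 5000,  # individual containers
}

# Worst case, the number of distinct series is the product of cardinalities.
series = 1
for dimension, cardinality in tag_values.items():
    series *= cardinality
print(f"{series:,} potential unique tag combinations")  # 4,500,000,000
```

Real workloads materialize only a fraction of that worst case, but even the fraction is enough to overwhelm systems that index or pre-aggregate every combination, which is exactly the regime MetricsDB is built for.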

## Query architecture

Axiom executes queries using a serverless architecture that spins up compute resources on demand:

- **Query compilation:** The APL (Axiom Processing Language) query is parsed, optimized, and compiled into an execution plan. The compiler performs predicate pushdown and projection optimization, and identifies which blocks need to be read.
- **Serverless workers:** Query execution occurs in ephemeral workers optimized through "Fusion queries," a system that runs parallel queries inside a single worker to reduce costs and leave more resources for large queries. Workers download only the necessary column data from object storage, enabling efficient resource utilization. Multiple workers can process different blocks in parallel.
- **Block-level parallelism:** Each query spawns multiple workers that process different blocks concurrently. Workers read compressed column data directly from object storage, decompress it in memory, and execute the query. (A pruning-and-parallelism sketch follows this list.)
- **Result aggregation:** Worker results are streamed back and aggregated by a coordinator process. Large result sets are automatically spilled to object storage and streamed to clients via signed URLs.
- **Intelligent caching:** Query results are cached in object storage with cache keys that account for time ranges and query patterns. Cache hits dramatically reduce query latency for repeated queries.
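
The following sketch shows how compilation, pruning, and block-level parallelism fit together. It is illustrative rather than Axiom’s engine: per-block time statistics enable predicate pushdown, and the surviving blocks fan out to parallel workers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class BlockMeta:
    """Per-block statistics, like the column metadata stored in each block."""
    block_id: str
    min_time: int
    max_time: int

def prune(blocks: list[BlockMeta], start: int, end: int) -> list[BlockMeta]:
    # Predicate pushdown on the time filter: blocks whose time range cannot
    # overlap the query window are never downloaded at all.
    return [b for b in blocks if b.max_time >= start and b.min_time <= end]

def scan_block(block: BlockMeta) -> int:
    # Stand-in for: download needed columns, decompress in memory, evaluate.
    return 0

blocks = [
    BlockMeta("blk-1", 0, 999),
    BlockMeta("blk-2", 1000, 1999),
    BlockMeta("blk-3", 2000, 2999),
]
candidates = prune(blocks, start=1500, end=2500)  # keeps blk-2 and blk-3

# Block-level parallelism: each surviving block is scanned concurrently,
# and a coordinator would aggregate the partial results.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(scan_block, candidates))
print([b.block_id for b in candidates], sum(partials))
```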

## Compaction system

A background compaction system continuously optimizes storage efficiency:

- **Automatic compaction:** The compaction scheduler identifies blocks that can be merged based on size, age, and access patterns. Small blocks are combined into larger "superblocks" that provide better compression ratios and query performance. (A toy version of this scheduling decision follows this list.)
- **Multiple strategies:** The system supports several compaction algorithms:
  - **Default:** General-purpose compaction with optimal compression
  - **Clustered:** Groups data by common field values for better locality
  - **Fieldspace:** Optimizes for specific field access patterns
  - **Concat:** Simple concatenation for append-heavy workloads
- **Compression optimization:** During compaction, data is recompressed using more aggressive algorithms and column-specific optimizations that aren’t feasible during real-time ingestion.
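
A toy version of the compaction scheduling decision, with thresholds invented purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class Block:
    block_id: str
    size_bytes: int
    age_seconds: int

# Illustrative thresholds; a real scheduler also weighs access patterns.
SMALL_BLOCK = 64 * 1024 * 1024    # blocks under 64 MiB are merge candidates
MIN_AGE = 15 * 60                 # leave very recent blocks alone
TARGET_SUPERBLOCK = 1024 ** 3     # aim for ~1 GiB merged "superblocks"

def pick_compaction_batch(blocks: list[Block]) -> list[Block]:
    """Greedily gather small, cold blocks into one superblock-sized merge."""
    candidates = sorted(
        (b for b in blocks if b.size_bytes < SMALL_BLOCK and b.age_seconds > MIN_AGE),
        key=lambda b: b.size_bytes,
    )
    batch: list[Block] = []
    total = 0
    for block in candidates:
        if total + block.size_bytes > TARGET_SUPERBLOCK:
            break
        batch.append(block)
        total += block.size_bytes
    return batch
```

Merging many small blocks into one superblock amortizes per-block overhead and gives the recompression pass longer runs of similar data to work with.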

## System architecture

The overall system is composed of specialized microservices:

- **Core services:** Handle authentication, billing, dataset management, and API routing. These services are stateless and horizontally scalable.
- **Database layer:** The core database engine processes ingestion, manages storage, and coordinates query execution. It supports multiple deployment modes and automatic failover.
- **Orchestration layer:** Manages distributed operations, monitors system health, and coordinates background processes like compaction and maintenance.
- **Edge services:** Handle real-time data ingestion and protocol translation, and provide regional data collection points.

## Why this architecture wins

- **Cost efficiency:** Serverless query execution means you only pay for compute during active queries. Extreme compression (25-50x) dramatically reduces storage costs compared to traditional row-based systems.
- **Operational simplicity:** The system is designed to be self-managing. Automatic compaction, intelligent caching, and distributed coordination eliminate operational overhead.
- **Elastic scale:** Each component scales independently. Ingestion scales with edge capacity, storage scales with object storage, and query capacity scales with serverless workers.
- **Fault tolerance:** Write-ahead logging, distributed routing, and automatic failover ensure high availability. The system gracefully handles node failures and storage outages.
- **Real-time performance:** Despite the distributed architecture, the system maintains sub-second query performance through intelligent caching, predicate pushdown, and columnar storage optimizations.

This architecture enables Axiom to ingest millions of events per second while maintaining sub-second query latency at a fraction of the cost of traditional logging and observability solutions.