# Query Guide Overview
Micromegas provides a powerful SQL interface for querying observability data, including logs, metrics, spans, and traces. Micromegas SQL is an extension of Apache DataFusion SQL: you can use all standard DataFusion SQL features plus Micromegas-specific functions and views optimized for observability workloads.
## Key Concepts

### SQL Engine
Micromegas uses Apache DataFusion as its SQL engine, which means you get:
- Broad standard SQL support
- Advanced query optimization
- Vectorized execution engine
- Columnar data processing with Apache Arrow
### Data Architecture
- Raw data stored in object storage (S3/GCS) in Parquet format
- Metadata stored in PostgreSQL for fast lookups
- Views provide logical organization of telemetry data
- On-demand ETL processes data only when queried
## Available Interfaces

### Python API
The primary interface for querying Micromegas data programmatically. All queries return pandas DataFrames, making it easy to work with results using the pandas ecosystem:
```python
import micromegas

client = micromegas.connect()

# Queries return pandas DataFrames.
df = client.query("SELECT * FROM log_entries LIMIT 10;")
```
### Grafana Plugin
Use the same SQL capabilities in Grafana dashboards through the Micromegas Grafana plugin.
## Data Views
Micromegas organizes telemetry data into several queryable views:
| View | Description |
|------|-------------|
| `processes` | Process metadata and system information |
| `streams` | Data stream information within processes |
| `log_entries` | Application log messages with levels and context |
| `measures` | Numeric metrics and performance measurements |
| `thread_spans` | Synchronous execution spans and timing |
| `async_events` | Asynchronous event lifecycle tracking |
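To get oriented with any of these views, sample a few rows and inspect the columns that come back; a minimal sketch using the Python client from above (see the Schema Reference for the authoritative column list):

```python
import micromegas

client = micromegas.connect()

# Sample a handful of rows to see what a view contains.
df = client.query("SELECT * FROM log_entries LIMIT 5;")
print(df.columns.tolist())
print(df.head())
```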
## Query Capabilities

### Standard SQL Features
- SELECT, FROM, WHERE, ORDER BY, GROUP BY
- JOINs between views
- Aggregation functions (COUNT, SUM, AVG, etc.)
- Window functions and CTEs
- Complex filtering and sorting
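For instance, a CTE combined with aggregation works just as it would in any DataFusion SQL context. A sketch that counts the last hour of log entries per level (the `level` and `time` columns also appear in the examples later in this guide):

```python
import micromegas

client = micromegas.connect()

# CTE + aggregation: count recent log entries per level.
sql = """
WITH recent AS (
    SELECT level
    FROM log_entries
    WHERE time >= NOW() - INTERVAL '1 hour'
)
SELECT level, COUNT(*) AS entries
FROM recent
GROUP BY level
ORDER BY entries DESC;
"""
df = client.query(sql)
```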
### Observability Extensions
- Time-range filtering for performance
- Process-scoped view instances
- Histogram generation functions
- Log level filtering and analysis
- Span relationship queries
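Several of these extensions combine naturally, as in the sketch below: a process-scoped view instance plus time-range and log-level filtering. Here `'my_process'` is a placeholder process ID and `msg` is an assumed column name; check the Schema Reference for the exact fields.

```python
import micromegas

client = micromegas.connect()

# Process-scoped instance of log_entries, restricted to a recent window.
# 'my_process' is a placeholder process_id; `msg` is an assumed column.
sql = """
SELECT time, level, msg
FROM view_instance('log_entries', 'my_process')
WHERE level <= 3
  AND time >= NOW() - INTERVAL '15 minutes';
"""
df = client.query(sql)
```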
### Performance Features
- Query streaming for large datasets
- Predicate pushdown to storage layer
- Automatic view materialization
- Memory-efficient processing
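Streaming, for example, lets you process a large result batch by batch instead of collecting one DataFrame. A sketch assuming a `query_stream` method that yields Arrow record batches; see the Python API reference for the actual name and signature:

```python
import micromegas

client = micromegas.connect()

# Assumed streaming API: iterate over Arrow record batches so the
# full result set never needs to fit in memory at once.
total_rows = 0
for batch in client.query_stream("SELECT * FROM log_entries;"):
    total_rows += batch.num_rows
print(f"processed {total_rows} rows")
```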
## Getting Started
- Quick Start - Basic queries to get you started
- Python API - Complete API reference and examples
- Schema Reference - Detailed view and field documentation
- Functions Reference - Available SQL functions
- Query Patterns - Common observability query patterns
- Async Performance Analysis - Comprehensive async operation analysis with depth tracking
- Performance Guide - Optimize your queries for best performance
- Advanced Features - View materialization and custom views
## Best Practices

### Always Use Time Ranges
For performance and memory efficiency, always specify time ranges in your queries:
```python
# Good - uses a time range
df = client.query(sql, begin_time, end_time)

# Avoid - queries all data; can be slow and memory-intensive
df = client.query(sql)
```
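A complete version of the recommended form; this sketch assumes the range arguments accept timezone-aware Python datetimes:

```python
import datetime

import micromegas

client = micromegas.connect()

# Restrict the query to the last hour of data.
end_time = datetime.datetime.now(datetime.timezone.utc)
begin_time = end_time - datetime.timedelta(hours=1)

df = client.query("SELECT * FROM log_entries LIMIT 100;", begin_time, end_time)
```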
### Start Simple
Begin with basic queries and add complexity incrementally:
```sql
-- Start with this
SELECT * FROM log_entries LIMIT 10;

-- Then add filtering
SELECT * FROM log_entries WHERE level <= 3 LIMIT 10;

-- Then add a time range
SELECT * FROM log_entries
WHERE level <= 3 AND time >= NOW() - INTERVAL '1 hour'
LIMIT 10;
```
### Use Process-Scoped Views
For better performance when analyzing specific processes:
```sql
-- Instead of filtering the global view
SELECT * FROM log_entries WHERE process_id = 'my_process';

-- Use a process-scoped view instance
SELECT * FROM view_instance('log_entries', 'my_process');
```
Ready to start querying? Head to the Quick Start guide!