Schema Reference¶

This page provides a complete reference to all views, data types, and field definitions available in Micromegas SQL queries.

Views Overview¶

Micromegas organizes telemetry data into several views that can be queried using SQL:

View	Description	Use Cases
`processes`	Process metadata and system information	System overview, process tracking
`streams`	Data stream information within processes	Stream debugging, data flow analysis
`blocks`	Core telemetry block metadata	Low-level data inspection
`log_entries`	Application log messages with levels	Error tracking, debugging, monitoring
`log_stats`	Aggregated log statistics by process, level, and target	Log volume analysis, monitoring trends
`measures`	Numeric metrics and performance data	Performance monitoring, alerting
`thread_spans`	Synchronous execution spans and timing	Performance profiling, call tracing
`async_events`	Asynchronous event lifecycle tracking	Async operation monitoring

Core Views¶

`processes`¶

Contains metadata about processes that have sent telemetry data.

Field	Type	Description
`process_id`	`Dictionary(Int16, Utf8)`	Unique identifier for the process
`exe`	`Dictionary(Int16, Utf8)`	Executable name
`username`	`Dictionary(Int16, Utf8)`	User who ran the process
`realname`	`Dictionary(Int16, Utf8)`	Real name of the user
`computer`	`Dictionary(Int16, Utf8)`	Computer/hostname
`distro`	`Dictionary(Int16, Utf8)`	Operating system distribution
`cpu_brand`	`Dictionary(Int16, Utf8)`	CPU brand information
`tsc_frequency`	`UInt64`	Time stamp counter frequency
`start_time`	`Timestamp(Nanosecond)`	Process start time
`start_ticks`	`UInt64`	Process start time in ticks
`insert_time`	`Timestamp(Nanosecond)`	When the process data was first inserted
`parent_process_id`	`Dictionary(Int16, Utf8)`	Parent process identifier
`properties`	`Dictionary(Int32, Binary)`	Additional process metadata (JSONB format)
`last_update_time`	`Timestamp(Nanosecond)`	When the process data was last updated
`last_block_end_ticks`	`Int64`	Tick count when the last block ended
`last_block_end_time`	`Timestamp(Nanosecond)`	Timestamp when the last block ended

Example Queries:

-- Get all processes from the last day
SELECT process_id, exe, computer, start_time
FROM processes
WHERE start_time >= NOW() - INTERVAL '1 day'
ORDER BY start_time DESC;

-- Find processes by executable name
SELECT process_id, exe, username, computer
FROM processes
WHERE exe LIKE '%analytics%';

`streams`¶

Contains information about data streams within processes.

Field	Type	Description
`stream_id`	`Dictionary(Int16, Utf8)`	Unique identifier for the stream
`process_id`	`Dictionary(Int16, Utf8)`	Reference to the parent process
`dependencies_metadata`	`Binary`	Stream dependency metadata
`objects_metadata`	`Binary`	Stream object metadata
`tags`	`List<Utf8>`	Stream tags
`properties`	`Dictionary(Int32, Binary)`	Stream properties (JSONB format)
`insert_time`	`Timestamp(Nanosecond)`	When the stream data was first inserted
`last_update_time`	`Timestamp(Nanosecond)`	When the stream data was last updated

Example Queries:

-- Get streams for a specific process
SELECT stream_id, tags, properties
FROM streams
WHERE process_id = 'my_process_123';

-- Join streams with process information
SELECT s.stream_id, s.tags, p.exe, p.computer
FROM streams s
JOIN processes p ON s.process_id = p.process_id;

`blocks`¶

Core table containing telemetry block metadata with joined process and stream information.

Field	Type	Description
`block_id`	`Utf8`	Unique identifier for the block
`stream_id`	`Utf8`	Stream identifier
`process_id`	`Utf8`	Process identifier
`begin_time`	`Timestamp(Nanosecond)`	Block start time
`begin_ticks`	`Int64`	Block start time in ticks
`end_time`	`Timestamp(Nanosecond)`	Block end time
`end_ticks`	`Int64`	Block end time in ticks
`nb_objects`	`Int32`	Number of objects in block
`object_offset`	`Int64`	Offset to objects in storage
`payload_size`	`Int64`	Size of block payload
`insert_time`	`Timestamp(Nanosecond)`	When block was inserted

Joined Stream Fields:

Field	Type	Description
`streams.dependencies_metadata`	`Binary`	Stream dependency metadata
`streams.objects_metadata`	`Binary`	Stream object metadata
`streams.tags`	`List<Utf8>`	Stream tags
`streams.properties`	`Dictionary(Int32, Binary)`	Stream properties (JSONB format)
`streams.insert_time`	`Timestamp(Nanosecond)`	When stream was inserted

Joined Process Fields:

Field	Type	Description
`processes.start_time`	`Timestamp(Nanosecond)`	Process start time
`processes.start_ticks`	`Int64`	Process start ticks
`processes.tsc_frequency`	`Int64`	Time stamp counter frequency
`processes.exe`	`Utf8`	Executable name
`processes.username`	`Utf8`	User who ran the process
`processes.realname`	`Utf8`	Real name of the user
`processes.computer`	`Utf8`	Computer/hostname
`processes.distro`	`Utf8`	Operating system distribution
`processes.cpu_brand`	`Utf8`	CPU brand information
`processes.insert_time`	`Timestamp(Nanosecond)`	When process was inserted
`processes.parent_process_id`	`Utf8`	Parent process identifier
`processes.properties`	`Dictionary(Int32, Binary)`	Process properties (JSONB format)

Example Queries:

-- Analyze block sizes and object counts
SELECT
    process_id,
    AVG(payload_size) as avg_block_size,
    AVG(nb_objects) as avg_objects_per_block,
    COUNT(*) as total_blocks
FROM blocks
WHERE insert_time >= NOW() - INTERVAL '1 hour'
GROUP BY process_id;

Observability Data Views¶

`log_entries`¶

Text-based log entries with levels and structured data.

Field	Type	Description
`process_id`	`Dictionary(Int16, Utf8)`	Process identifier
`stream_id`	`Dictionary(Int16, Utf8)`	Stream identifier
`block_id`	`Dictionary(Int16, Utf8)`	Block identifier
`insert_time`	`Timestamp(Nanosecond)`	Block insertion time
`exe`	`Dictionary(Int16, Utf8)`	Executable name
`username`	`Dictionary(Int16, Utf8)`	User who ran the process
`computer`	`Dictionary(Int16, Utf8)`	Computer/hostname
`time`	`Timestamp(Nanosecond)`	Log entry timestamp
`target`	`Dictionary(Int16, Utf8)`	Module/target
`level`	`Int32`	Log level (see Log Levels)
`msg`	`Utf8`	Log message
`properties`	`Dictionary(Int32, Binary)`	Log-specific properties (JSONB format)
`process_properties`	`Dictionary(Int32, Binary)`	Process-specific properties (JSONB format)

Log Levels¶

Micromegas uses numeric log levels for efficient filtering:

Level	Name	Description
1	Fatal	Critical errors that cause application termination
2	Error	Errors that don't stop execution but need attention
3	Warn	Warning conditions that might cause problems
4	Info	Informational messages about normal operation
5	Debug	Detailed information for debugging
6	Trace	Very detailed tracing information

Example Queries:

-- Get recent error and warning logs
SELECT time, process_id, level, target, msg
FROM log_entries
WHERE level <= 3  -- Fatal, Error, Warn
  AND time >= NOW() - INTERVAL '1 hour'
ORDER BY time DESC;

-- Count logs by level for a specific process
SELECT level, COUNT(*) as count
FROM view_instance('log_entries', 'my_process_123')
WHERE time >= NOW() - INTERVAL '1 day'
GROUP BY level
ORDER BY level;

`log_stats`¶

Materialized view providing aggregated log statistics by process, minute, level, and target. This view is optimized for analyzing log volume trends and patterns over time.

Field	Type	Description
`time_bin`	`Timestamp(Nanosecond)`	1-minute time bucket for aggregation
`process_id`	`Dictionary(Int16, Utf8)`	Process identifier
`level`	`Int32`	Log level (see Log Levels)
`target`	`Dictionary(Int16, Utf8)`	Module/target that generated the logs
`count`	`Int64`	Number of log entries in this aggregation

Key Features: - Pre-aggregated by 1-minute intervals for efficient time-series queries - Materialized for fast query performance - Automatically updated as new log data arrives - Daily partitioning for efficient storage and querying

Example Queries:

-- Analyze log volume trends over the last hour
SELECT 
    time_bin,
    SUM(count) as total_logs,
    SUM(CASE WHEN level <= 2 THEN count ELSE 0 END) as error_count
FROM log_stats
WHERE time_bin >= NOW() - INTERVAL '1 hour'
GROUP BY time_bin
ORDER BY time_bin;

-- Find noisiest modules by log volume
SELECT 
    target,
    SUM(count) as total_logs,
    COUNT(DISTINCT time_bin) as active_minutes
FROM log_stats
WHERE time_bin >= NOW() - INTERVAL '1 day'
GROUP BY target
ORDER BY total_logs DESC
LIMIT 20;

-- Monitor error rate by process
SELECT 
    process_id,
    time_bin,
    SUM(CASE WHEN level <= 2 THEN count ELSE 0 END) * 100.0 / SUM(count) as error_percentage
FROM log_stats
WHERE time_bin >= NOW() - INTERVAL '6 hours'
GROUP BY process_id, time_bin
HAVING SUM(count) > 100  -- Filter out low-volume periods
ORDER BY time_bin, error_percentage DESC;

-- Compare log levels distribution
SELECT 
    level,
    SUM(count) as total_count,
    SUM(count) * 100.0 / (SELECT SUM(count) FROM log_stats WHERE time_bin >= NOW() - INTERVAL '1 day') as percentage
FROM log_stats
WHERE time_bin >= NOW() - INTERVAL '1 day'
GROUP BY level
ORDER BY level;

`measures`¶

Numerical measurements and counters.

Field	Type	Description
`process_id`	`Dictionary(Int16, Utf8)`	Process identifier
`stream_id`	`Dictionary(Int16, Utf8)`	Stream identifier
`block_id`	`Dictionary(Int16, Utf8)`	Block identifier
`insert_time`	`Timestamp(Nanosecond)`	Block insertion time
`exe`	`Dictionary(Int16, Utf8)`	Executable name
`username`	`Dictionary(Int16, Utf8)`	User who ran the process
`computer`	`Dictionary(Int16, Utf8)`	Computer/hostname
`time`	`Timestamp(Nanosecond)`	Measurement timestamp
`target`	`Dictionary(Int16, Utf8)`	Module/target
`name`	`Dictionary(Int16, Utf8)`	Metric name
`unit`	`Dictionary(Int16, Utf8)`	Measurement unit
`value`	`Float64`	Metric value
`properties`	`Dictionary(Int32, Binary)`	Metric-specific properties (JSONB format)
`process_properties`	`Dictionary(Int32, Binary)`	Process-specific properties (JSONB format)

Example Queries:

-- Get CPU metrics over time
SELECT time, value, unit
FROM measures
WHERE name = 'cpu_usage'
  AND time >= NOW() - INTERVAL '1 hour'
ORDER BY time;

-- Aggregate memory usage by process
SELECT
    process_id,
    AVG(value) as avg_memory,
    MAX(value) as peak_memory,
    unit
FROM measures
WHERE name LIKE '%memory%'
  AND time >= NOW() - INTERVAL '1 hour'
GROUP BY process_id, unit;

`thread_spans`¶

Derived view for analyzing span durations and hierarchies. Access via view_instance('thread_spans', stream_id).

Field	Type	Description
`id`	`Int64`	Span identifier
`parent`	`Int64`	Parent span identifier
`depth`	`UInt32`	Nesting depth in call tree
`hash`	`UInt32`	Span hash for deduplication
`begin`	`Timestamp(Nanosecond)`	Span start time
`end`	`Timestamp(Nanosecond)`	Span end time
`duration`	`Int64`	Span duration in nanoseconds
`name`	`Dictionary(Int16, Utf8)`	Span name (function)
`target`	`Dictionary(Int16, Utf8)`	Module/target
`filename`	`Dictionary(Int16, Utf8)`	Source file
`line`	`UInt32`	Line number

Example Queries:

-- Get slowest functions in a stream
SELECT name, AVG(duration) as avg_duration_ns, COUNT(*) as call_count
FROM view_instance('thread_spans', 'stream_123')
WHERE duration > 1000000  -- > 1ms
GROUP BY name
ORDER BY avg_duration_ns DESC
LIMIT 10;

-- Analyze call hierarchy
SELECT depth, name, duration
FROM view_instance('thread_spans', 'stream_123')
WHERE parent = 42  -- specific parent span
ORDER BY begin;

`async_events`¶

Asynchronous span events for tracking async operations with call hierarchy depth information.

Field	Type	Description
`stream_id`	`Dictionary(Int16, Utf8)`	Thread stream identifier
`block_id`	`Dictionary(Int16, Utf8)`	Block identifier
`time`	`Timestamp(Nanosecond)`	Event timestamp
`event_type`	`Dictionary(Int16, Utf8)`	"begin" or "end"
`span_id`	`Int64`	Async span identifier
`parent_span_id`	`Int64`	Parent span identifier
`depth`	`UInt32`	Nesting depth in async call hierarchy
`name`	`Dictionary(Int16, Utf8)`	Span name (function)
`filename`	`Dictionary(Int16, Utf8)`	Source file
`target`	`Dictionary(Int16, Utf8)`	Module/target
`line`	`UInt32`	Line number

Example Queries:

-- Find top-level async operations (depth = 0) with performance metrics
SELECT
    name,
    depth,
    AVG(duration_ms) as avg_duration,
    COUNT(*) as operation_count
FROM (
    SELECT
        begin_events.name,
        begin_events.depth,
        CAST((end_events.time - begin_events.time) AS BIGINT) / 1000000 as duration_ms
    FROM
        (SELECT * FROM view_instance('async_events', 'my_process_123') WHERE event_type = 'begin') begin_events
    LEFT JOIN
        (SELECT * FROM view_instance('async_events', 'my_process_123') WHERE event_type = 'end') end_events
        ON begin_events.span_id = end_events.span_id
    WHERE end_events.span_id IS NOT NULL AND begin_events.depth = 0
)
GROUP BY name, depth
ORDER BY avg_duration DESC;

-- Compare performance by call depth
SELECT
    depth,
    COUNT(*) as span_count,
    AVG(duration_ms) as avg_duration,
    MIN(duration_ms) as min_duration,
    MAX(duration_ms) as max_duration
FROM (
    SELECT
        begin_events.depth,
        CAST((end_events.time - begin_events.time) AS BIGINT) / 1000000 as duration_ms
    FROM
        (SELECT * FROM view_instance('async_events', 'my_process_123') WHERE event_type = 'begin') begin_events
    LEFT JOIN
        (SELECT * FROM view_instance('async_events', 'my_process_123') WHERE event_type = 'end') end_events
        ON begin_events.span_id = end_events.span_id
    WHERE end_events.span_id IS NOT NULL
)
GROUP BY depth
ORDER BY depth;

-- Find operations that spawn many nested async calls
SELECT
    name,
    depth,
    COUNT(*) as nested_count
FROM view_instance('async_events', 'my_process_123')
WHERE depth > 0 AND event_type = 'begin'
GROUP BY name, depth
HAVING COUNT(*) > 5  -- Functions that create multiple nested async operations
ORDER BY nested_count DESC, depth DESC;

-- Analyze async call hierarchy and parent-child relationships
SELECT
    parent.name as parent_operation,
    parent.depth as parent_depth,
    child.name as child_operation,
    child.depth as child_depth,
    COUNT(*) as relationship_count
FROM view_instance('async_events', 'my_process_123') parent
JOIN view_instance('async_events', 'my_process_123') child
     ON parent.span_id = child.parent_span_id
WHERE parent.event_type = 'begin' AND child.event_type = 'begin'
GROUP BY parent.name, parent.depth, child.name, child.depth
ORDER BY relationship_count DESC;

-- Filter async operations by depth level for focused analysis
-- Shallow operations only (depth <= 2)
SELECT name, event_type, time, depth, span_id
FROM view_instance('async_events', 'my_process_123')
WHERE depth <= 2
ORDER BY time;

-- Deep nested operations only (depth >= 3)
SELECT name, depth, COUNT(*) as deep_operation_count
FROM view_instance('async_events', 'my_process_123')
WHERE depth >= 3 AND event_type = 'begin'
GROUP BY name, depth
ORDER BY depth DESC, deep_operation_count DESC;

-- Track async operation lifecycle with depth context
SELECT time, event_type, name, span_id, parent_span_id, depth
FROM view_instance('async_events', 'my_process_123')
WHERE span_id = 12345
ORDER BY time;

Data Types¶

Properties¶

Key-value pairs stored as dictionary-encoded JSONB with the following structure:

-- Properties structure (optimized JSONB format)
Dictionary(Int32, Binary)

This format provides: - Dictionary compression - Repeated property sets stored once and referenced by index - JSONB efficiency - Native binary JSON format for fast property access - Storage optimization - Significant memory and storage savings over legacy formats

Common properties fields:

properties - Event-specific metadata (log properties, metric properties)
process_properties - Process-wide metadata shared across all events from a process

Querying properties:

-- Access property values using property_get function (works with all formats)
SELECT property_get(process_properties, 'thread-name') as thread_name
FROM log_entries
WHERE property_get(process_properties, 'thread-name') IS NOT NULL;

-- Count properties using properties_length
SELECT properties_length(properties) as prop_count
FROM log_entries
WHERE properties_length(properties) > 0;

Legacy Support: Micromegas maintains full backward compatibility. Existing queries using property_get() and properties_length() work unchanged with the new JSONB format.

Dictionary Compression¶

Most string fields use dictionary compression (Dictionary(Int16, Utf8)) for storage efficiency:

Reduces storage space for repeated values
Improves query performance
Transparent to SQL queries - use as normal strings

Timestamps¶

All time fields use Timestamp(Nanosecond) precision:

Nanosecond resolution for high-precision timing
UTC timezone assumed
Compatible with standard SQL time functions

View Relationships¶

Views can be joined to combine information:

-- Join log entries with process information
SELECT l.time, l.level, l.msg, p.exe, p.computer
FROM log_entries l
JOIN processes p ON l.process_id = p.process_id
WHERE l.level <= 2;  -- Fatal and Error only

-- Join measures with stream information
SELECT m.time, m.name, m.value, s.tags
FROM measures m
JOIN streams s ON m.stream_id = s.stream_id
WHERE m.name = 'cpu_usage';

Performance Considerations¶

Dictionary Fields¶

Dictionary-compressed fields are optimized for:

Equality comparisons (field = 'value')
IN clauses (field IN ('val1', 'val2'))
LIKE patterns on repeated values

Time-based Queries¶

Always use time ranges for optimal performance:

-- Good - uses time index
WHERE time >= NOW() - INTERVAL '1 hour'

-- Avoid - full table scan
WHERE level <= 3

View Instances¶

Use view_instance() for process-specific queries:

-- Better performance for single process
SELECT * FROM view_instance('log_entries', 'process_123')

-- Less efficient for single process
SELECT * FROM log_entries WHERE process_id = 'process_123'

Next Steps¶

Functions Reference - SQL functions available for queries
Query Patterns - Common observability query patterns
Performance Guide - Optimize your queries for best performance

Schema Reference¶

Views Overview¶

Core Views¶

processes¶

streams¶

blocks¶

Observability Data Views¶

log_entries¶

Log Levels¶

log_stats¶

measures¶

thread_spans¶

async_events¶

Data Types¶

Properties¶

Dictionary Compression¶

Timestamps¶

View Relationships¶

Performance Considerations¶

Dictionary Fields¶

Time-based Queries¶

View Instances¶

Next Steps¶

`processes`¶

`streams`¶

`blocks`¶

`log_entries`¶

`log_stats`¶

`measures`¶

`thread_spans`¶

`async_events`¶