Skip to content

Schema Reference

This page provides a complete reference to all views, data types, and field definitions available in Micromegas SQL queries.

Views Overview

Micromegas organizes telemetry data into several views that can be queried using SQL:

View Description Use Cases
processes Process metadata and system information System overview, process tracking
streams Data stream information within processes Stream debugging, data flow analysis
blocks Core telemetry block metadata Low-level data inspection
log_entries Application log messages with levels Error tracking, debugging, monitoring
log_stats Aggregated log statistics by process, level, and target Log volume analysis, monitoring trends
measures Numeric metrics and performance data Performance monitoring, alerting
thread_spans Synchronous execution spans and timing Performance profiling, call tracing
async_events Asynchronous event lifecycle tracking Async operation monitoring

Core Views

processes

Contains metadata about processes that have sent telemetry data.

Field Type Description
process_id Dictionary(Int16, Utf8) Unique identifier for the process
exe Dictionary(Int16, Utf8) Executable name
username Dictionary(Int16, Utf8) User who ran the process
realname Dictionary(Int16, Utf8) Real name of the user
computer Dictionary(Int16, Utf8) Computer/hostname
distro Dictionary(Int16, Utf8) Operating system distribution
cpu_brand Dictionary(Int16, Utf8) CPU brand information
tsc_frequency UInt64 Time stamp counter frequency
start_time Timestamp(Nanosecond) Process start time
start_ticks UInt64 Process start time in ticks
insert_time Timestamp(Nanosecond) When the process data was first inserted
parent_process_id Dictionary(Int16, Utf8) Parent process identifier
properties Dictionary(Int32, Binary) Additional process metadata (JSONB format)
last_update_time Timestamp(Nanosecond) When the process data was last updated
last_block_end_ticks Int64 Tick count when the last block ended
last_block_end_time Timestamp(Nanosecond) Timestamp when the last block ended

Example Queries:

-- Get all processes from the last day
SELECT process_id, exe, computer, start_time
FROM processes
WHERE start_time >= NOW() - INTERVAL '1 day'
ORDER BY start_time DESC;

-- Find processes by executable name
SELECT process_id, exe, username, computer
FROM processes
WHERE exe LIKE '%analytics%';

streams

Contains information about data streams within processes.

Field Type Description
stream_id Dictionary(Int16, Utf8) Unique identifier for the stream
process_id Dictionary(Int16, Utf8) Reference to the parent process
dependencies_metadata Binary Stream dependency metadata
objects_metadata Binary Stream object metadata
tags List<Utf8> Stream tags
properties Dictionary(Int32, Binary) Stream properties (JSONB format)
insert_time Timestamp(Nanosecond) When the stream data was first inserted
last_update_time Timestamp(Nanosecond) When the stream data was last updated

Example Queries:

-- Get streams for a specific process
SELECT stream_id, tags, properties
FROM streams
WHERE process_id = 'my_process_123';

-- Join streams with process information
SELECT s.stream_id, s.tags, p.exe, p.computer
FROM streams s
JOIN processes p ON s.process_id = p.process_id;

blocks

Core table containing telemetry block metadata with joined process and stream information.

Field Type Description
block_id Utf8 Unique identifier for the block
stream_id Utf8 Stream identifier
process_id Utf8 Process identifier
begin_time Timestamp(Nanosecond) Block start time
begin_ticks Int64 Block start time in ticks
end_time Timestamp(Nanosecond) Block end time
end_ticks Int64 Block end time in ticks
nb_objects Int32 Number of objects in block
object_offset Int64 Offset to objects in storage
payload_size Int64 Size of block payload
insert_time Timestamp(Nanosecond) When block was inserted

Joined Stream Fields:

Field Type Description
streams.dependencies_metadata Binary Stream dependency metadata
streams.objects_metadata Binary Stream object metadata
streams.tags List<Utf8> Stream tags
streams.properties Dictionary(Int32, Binary) Stream properties (JSONB format)
streams.insert_time Timestamp(Nanosecond) When stream was inserted

Joined Process Fields:

Field Type Description
processes.start_time Timestamp(Nanosecond) Process start time
processes.start_ticks Int64 Process start ticks
processes.tsc_frequency Int64 Time stamp counter frequency
processes.exe Utf8 Executable name
processes.username Utf8 User who ran the process
processes.realname Utf8 Real name of the user
processes.computer Utf8 Computer/hostname
processes.distro Utf8 Operating system distribution
processes.cpu_brand Utf8 CPU brand information
processes.insert_time Timestamp(Nanosecond) When process was inserted
processes.parent_process_id Utf8 Parent process identifier
processes.properties Dictionary(Int32, Binary) Process properties (JSONB format)

Example Queries:

-- Analyze block sizes and object counts
SELECT
    process_id,
    AVG(payload_size) as avg_block_size,
    AVG(nb_objects) as avg_objects_per_block,
    COUNT(*) as total_blocks
FROM blocks
WHERE insert_time >= NOW() - INTERVAL '1 hour'
GROUP BY process_id;

Observability Data Views

log_entries

Text-based log entries with levels and structured data.

Field Type Description
process_id Dictionary(Int16, Utf8) Process identifier
stream_id Dictionary(Int16, Utf8) Stream identifier
block_id Dictionary(Int16, Utf8) Block identifier
insert_time Timestamp(Nanosecond) Block insertion time
exe Dictionary(Int16, Utf8) Executable name
username Dictionary(Int16, Utf8) User who ran the process
computer Dictionary(Int16, Utf8) Computer/hostname
time Timestamp(Nanosecond) Log entry timestamp
target Dictionary(Int16, Utf8) Module/target
level Int32 Log level (see Log Levels)
msg Utf8 Log message
properties Dictionary(Int32, Binary) Log-specific properties (JSONB format)
process_properties Dictionary(Int32, Binary) Process-specific properties (JSONB format)

Log Levels

Micromegas uses numeric log levels for efficient filtering:

Level Name Description
1 Fatal Critical errors that cause application termination
2 Error Errors that don't stop execution but need attention
3 Warn Warning conditions that might cause problems
4 Info Informational messages about normal operation
5 Debug Detailed information for debugging
6 Trace Very detailed tracing information

Example Queries:

-- Get recent error and warning logs
SELECT time, process_id, level, target, msg
FROM log_entries
WHERE level <= 3  -- Fatal, Error, Warn
  AND time >= NOW() - INTERVAL '1 hour'
ORDER BY time DESC;

-- Count logs by level for a specific process
SELECT level, COUNT(*) as count
FROM view_instance('log_entries', 'my_process_123')
WHERE time >= NOW() - INTERVAL '1 day'
GROUP BY level
ORDER BY level;

log_stats

Materialized view providing aggregated log statistics by process, minute, level, and target. This view is optimized for analyzing log volume trends and patterns over time.

Field Type Description
time_bin Timestamp(Nanosecond) 1-minute time bucket for aggregation
process_id Dictionary(Int16, Utf8) Process identifier
level Int32 Log level (see Log Levels)
target Dictionary(Int16, Utf8) Module/target that generated the logs
count Int64 Number of log entries in this aggregation

Key Features: - Pre-aggregated by 1-minute intervals for efficient time-series queries - Materialized for fast query performance - Automatically updated as new log data arrives - Daily partitioning for efficient storage and querying

Example Queries:

-- Analyze log volume trends over the last hour
SELECT 
    time_bin,
    SUM(count) as total_logs,
    SUM(CASE WHEN level <= 2 THEN count ELSE 0 END) as error_count
FROM log_stats
WHERE time_bin >= NOW() - INTERVAL '1 hour'
GROUP BY time_bin
ORDER BY time_bin;

-- Find noisiest modules by log volume
SELECT 
    target,
    SUM(count) as total_logs,
    COUNT(DISTINCT time_bin) as active_minutes
FROM log_stats
WHERE time_bin >= NOW() - INTERVAL '1 day'
GROUP BY target
ORDER BY total_logs DESC
LIMIT 20;

-- Monitor error rate by process
SELECT 
    process_id,
    time_bin,
    SUM(CASE WHEN level <= 2 THEN count ELSE 0 END) * 100.0 / SUM(count) as error_percentage
FROM log_stats
WHERE time_bin >= NOW() - INTERVAL '6 hours'
GROUP BY process_id, time_bin
HAVING SUM(count) > 100  -- Filter out low-volume periods
ORDER BY time_bin, error_percentage DESC;

-- Compare log levels distribution
SELECT 
    level,
    SUM(count) as total_count,
    SUM(count) * 100.0 / (SELECT SUM(count) FROM log_stats WHERE time_bin >= NOW() - INTERVAL '1 day') as percentage
FROM log_stats
WHERE time_bin >= NOW() - INTERVAL '1 day'
GROUP BY level
ORDER BY level;

measures

Numerical measurements and counters.

Field Type Description
process_id Dictionary(Int16, Utf8) Process identifier
stream_id Dictionary(Int16, Utf8) Stream identifier
block_id Dictionary(Int16, Utf8) Block identifier
insert_time Timestamp(Nanosecond) Block insertion time
exe Dictionary(Int16, Utf8) Executable name
username Dictionary(Int16, Utf8) User who ran the process
computer Dictionary(Int16, Utf8) Computer/hostname
time Timestamp(Nanosecond) Measurement timestamp
target Dictionary(Int16, Utf8) Module/target
name Dictionary(Int16, Utf8) Metric name
unit Dictionary(Int16, Utf8) Measurement unit
value Float64 Metric value
properties Dictionary(Int32, Binary) Metric-specific properties (JSONB format)
process_properties Dictionary(Int32, Binary) Process-specific properties (JSONB format)

Example Queries:

-- Get CPU metrics over time
SELECT time, value, unit
FROM measures
WHERE name = 'cpu_usage'
  AND time >= NOW() - INTERVAL '1 hour'
ORDER BY time;

-- Aggregate memory usage by process
SELECT
    process_id,
    AVG(value) as avg_memory,
    MAX(value) as peak_memory,
    unit
FROM measures
WHERE name LIKE '%memory%'
  AND time >= NOW() - INTERVAL '1 hour'
GROUP BY process_id, unit;

thread_spans

Derived view for analyzing span durations and hierarchies. Access via view_instance('thread_spans', stream_id).

Field Type Description
id Int64 Span identifier
parent Int64 Parent span identifier
depth UInt32 Nesting depth in call tree
hash UInt32 Span hash for deduplication
begin Timestamp(Nanosecond) Span start time
end Timestamp(Nanosecond) Span end time
duration Int64 Span duration in nanoseconds
name Dictionary(Int16, Utf8) Span name (function)
target Dictionary(Int16, Utf8) Module/target
filename Dictionary(Int16, Utf8) Source file
line UInt32 Line number

Example Queries:

-- Get slowest functions in a stream
SELECT name, AVG(duration) as avg_duration_ns, COUNT(*) as call_count
FROM view_instance('thread_spans', 'stream_123')
WHERE duration > 1000000  -- > 1ms
GROUP BY name
ORDER BY avg_duration_ns DESC
LIMIT 10;

-- Analyze call hierarchy
SELECT depth, name, duration
FROM view_instance('thread_spans', 'stream_123')
WHERE parent = 42  -- specific parent span
ORDER BY begin;

async_events

Asynchronous span events for tracking async operations with call hierarchy depth information.

Field Type Description
stream_id Dictionary(Int16, Utf8) Thread stream identifier
block_id Dictionary(Int16, Utf8) Block identifier
time Timestamp(Nanosecond) Event timestamp
event_type Dictionary(Int16, Utf8) "begin" or "end"
span_id Int64 Async span identifier
parent_span_id Int64 Parent span identifier
depth UInt32 Nesting depth in async call hierarchy
name Dictionary(Int16, Utf8) Span name (function)
filename Dictionary(Int16, Utf8) Source file
target Dictionary(Int16, Utf8) Module/target
line UInt32 Line number

Example Queries:

-- Find top-level async operations (depth = 0) with performance metrics
SELECT
    name,
    depth,
    AVG(duration_ms) as avg_duration,
    COUNT(*) as operation_count
FROM (
    SELECT
        begin_events.name,
        begin_events.depth,
        CAST((end_events.time - begin_events.time) AS BIGINT) / 1000000 as duration_ms
    FROM
        (SELECT * FROM view_instance('async_events', 'my_process_123') WHERE event_type = 'begin') begin_events
    LEFT JOIN
        (SELECT * FROM view_instance('async_events', 'my_process_123') WHERE event_type = 'end') end_events
        ON begin_events.span_id = end_events.span_id
    WHERE end_events.span_id IS NOT NULL AND begin_events.depth = 0
)
GROUP BY name, depth
ORDER BY avg_duration DESC;

-- Compare performance by call depth
SELECT
    depth,
    COUNT(*) as span_count,
    AVG(duration_ms) as avg_duration,
    MIN(duration_ms) as min_duration,
    MAX(duration_ms) as max_duration
FROM (
    SELECT
        begin_events.depth,
        CAST((end_events.time - begin_events.time) AS BIGINT) / 1000000 as duration_ms
    FROM
        (SELECT * FROM view_instance('async_events', 'my_process_123') WHERE event_type = 'begin') begin_events
    LEFT JOIN
        (SELECT * FROM view_instance('async_events', 'my_process_123') WHERE event_type = 'end') end_events
        ON begin_events.span_id = end_events.span_id
    WHERE end_events.span_id IS NOT NULL
)
GROUP BY depth
ORDER BY depth;

-- Find operations that spawn many nested async calls
SELECT
    name,
    depth,
    COUNT(*) as nested_count
FROM view_instance('async_events', 'my_process_123')
WHERE depth > 0 AND event_type = 'begin'
GROUP BY name, depth
HAVING COUNT(*) > 5  -- Functions that create multiple nested async operations
ORDER BY nested_count DESC, depth DESC;

-- Analyze async call hierarchy and parent-child relationships
SELECT
    parent.name as parent_operation,
    parent.depth as parent_depth,
    child.name as child_operation,
    child.depth as child_depth,
    COUNT(*) as relationship_count
FROM view_instance('async_events', 'my_process_123') parent
JOIN view_instance('async_events', 'my_process_123') child
     ON parent.span_id = child.parent_span_id
WHERE parent.event_type = 'begin' AND child.event_type = 'begin'
GROUP BY parent.name, parent.depth, child.name, child.depth
ORDER BY relationship_count DESC;

-- Filter async operations by depth level for focused analysis
-- Shallow operations only (depth <= 2)
SELECT name, event_type, time, depth, span_id
FROM view_instance('async_events', 'my_process_123')
WHERE depth <= 2
ORDER BY time;

-- Deep nested operations only (depth >= 3)
SELECT name, depth, COUNT(*) as deep_operation_count
FROM view_instance('async_events', 'my_process_123')
WHERE depth >= 3 AND event_type = 'begin'
GROUP BY name, depth
ORDER BY depth DESC, deep_operation_count DESC;

-- Track async operation lifecycle with depth context
SELECT time, event_type, name, span_id, parent_span_id, depth
FROM view_instance('async_events', 'my_process_123')
WHERE span_id = 12345
ORDER BY time;

Data Types

Properties

Key-value pairs stored as dictionary-encoded JSONB with the following structure:

-- Properties structure (optimized JSONB format)
Dictionary(Int32, Binary)

This format provides: - Dictionary compression - Repeated property sets stored once and referenced by index - JSONB efficiency - Native binary JSON format for fast property access - Storage optimization - Significant memory and storage savings over legacy formats

Common properties fields:

  • properties - Event-specific metadata (log properties, metric properties)
  • process_properties - Process-wide metadata shared across all events from a process

Querying properties:

-- Access property values using property_get function (works with all formats)
SELECT property_get(process_properties, 'thread-name') as thread_name
FROM log_entries
WHERE property_get(process_properties, 'thread-name') IS NOT NULL;

-- Count properties using properties_length
SELECT properties_length(properties) as prop_count
FROM log_entries
WHERE properties_length(properties) > 0;

Legacy Support: Micromegas maintains full backward compatibility. Existing queries using property_get() and properties_length() work unchanged with the new JSONB format.

Dictionary Compression

Most string fields use dictionary compression (Dictionary(Int16, Utf8)) for storage efficiency:

  • Reduces storage space for repeated values
  • Improves query performance
  • Transparent to SQL queries - use as normal strings

Timestamps

All time fields use Timestamp(Nanosecond) precision:

  • Nanosecond resolution for high-precision timing
  • UTC timezone assumed
  • Compatible with standard SQL time functions

View Relationships

Views can be joined to combine information:

-- Join log entries with process information
SELECT l.time, l.level, l.msg, p.exe, p.computer
FROM log_entries l
JOIN processes p ON l.process_id = p.process_id
WHERE l.level <= 2;  -- Fatal and Error only

-- Join measures with stream information
SELECT m.time, m.name, m.value, s.tags
FROM measures m
JOIN streams s ON m.stream_id = s.stream_id
WHERE m.name = 'cpu_usage';

Performance Considerations

Dictionary Fields

Dictionary-compressed fields are optimized for:

  • Equality comparisons (field = 'value')
  • IN clauses (field IN ('val1', 'val2'))
  • LIKE patterns on repeated values

Time-based Queries

Always use time ranges for optimal performance:

-- Good - uses time index
WHERE time >= NOW() - INTERVAL '1 hour'

-- Avoid - full table scan
WHERE level <= 3

View Instances

Use view_instance() for process-specific queries:

-- Better performance for single process
SELECT * FROM view_instance('log_entries', 'process_123')

-- Less efficient for single process
SELECT * FROM log_entries WHERE process_id = 'process_123'

Next Steps