Functions Reference

This page provides a complete reference to all SQL functions available in Micromegas queries, including both standard DataFusion functions and Micromegas-specific extensions.

Micromegas Extensions

Table Functions

Table functions return tables that can be used in FROM clauses.

view_instance(view_name, identifier)

Creates a process-scoped or stream-scoped view instance, restricting queries to a single process or stream for better performance.

Syntax:

view_instance(view_name, identifier)

Parameters:

  • view_name (Utf8): Name of the view ('log_entries', 'measures', 'thread_spans', 'async_events')

  • identifier (Utf8): Process ID (for most views) or Stream ID (for thread_spans)

Returns: Schema depends on the view type (see Schema Reference)

Examples:

-- Get logs for a specific process
SELECT time, level, msg
FROM view_instance('log_entries', 'my_process_123')
WHERE level <= 3;

-- Get spans for a specific stream
SELECT name, duration
FROM view_instance('thread_spans', 'stream_456')
WHERE duration > 1000000;  -- > 1ms
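
The same pattern covers the other view names. For 'async_events', a minimal sketch (SELECT * is used because the exact column set is documented in the Schema Reference; the process ID is illustrative):

-- Get async events for a specific process
SELECT *
FROM view_instance('async_events', 'my_process_123')
LIMIT 100;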

list_partitions() 🔧

Administrative Function - Lists available data partitions in the lakehouse.

Syntax:

SELECT * FROM list_partitions()

Returns:

Column             Type                   Description
view_set_name      Utf8                   Name of the view set
view_instance_id   Utf8                   Instance identifier
begin_insert_time  Timestamp(Nanosecond)  Partition start time
end_insert_time    Timestamp(Nanosecond)  Partition end time
min_event_time     Timestamp(Nanosecond)  Earliest event time
max_event_time     Timestamp(Nanosecond)  Latest event time
updated            Timestamp(Nanosecond)  Last update time
file_path          Utf8                   Partition file path
file_size          Int64                  File size in bytes
file_schema_hash   Binary                 Hash of the file schema
source_data_hash   Binary                 Hash of the source data

Example:

-- View partition information
SELECT view_set_name, view_instance_id, file_size
FROM list_partitions()
ORDER BY updated DESC;

â„šī¸ Administrative Use: This function provides system-level partition metadata primarily useful for administrators monitoring lakehouse storage and partition management. Regular users querying data typically don't need this information.

retire_partitions(view_set_name, view_instance_id, begin_insert_time, end_insert_time) 🔧

Administrative Function - Retires (removes) data partitions from the lakehouse for a specified time range. Returns a log stream of the operation.

Syntax:

SELECT * FROM retire_partitions(view_set_name, view_instance_id, begin_insert_time, end_insert_time)

Parameters:

  • view_set_name (Utf8): Name of the view set

  • view_instance_id (Utf8): Instance identifier

  • begin_insert_time (Timestamp(Nanosecond)): Start time for partition retirement

  • end_insert_time (Timestamp(Nanosecond)): End time for partition retirement

Returns: Log stream table with operation progress and messages

Example:

-- Retire old partitions for a specific view
SELECT * FROM retire_partitions(
    'log_entries', 
    'global',
    NOW() - INTERVAL '30 days',
    NOW() - INTERVAL '7 days'
);

âš ī¸ DESTRUCTIVE OPERATION: This function permanently removes data partitions from the lakehouse, making the contained data inaccessible. Use only for data retention management and with extreme caution in production environments. Ensure proper backups exist before retiring partitions.

materialize_partitions(view_name, begin_insert_time, end_insert_time, partition_delta_seconds) 🔧

Administrative Function - Materializes data partitions for a view over a specified time range. Returns a log stream of the operation.

Syntax:

SELECT * FROM materialize_partitions(view_name, begin_insert_time, end_insert_time, partition_delta_seconds)

Parameters:

  • view_name (Utf8): Name of the view to materialize

  • begin_insert_time (Timestamp(Nanosecond)): Start time for materialization

  • end_insert_time (Timestamp(Nanosecond)): End time for materialization

  • partition_delta_seconds (Int64): Partition time delta in seconds

Returns: Log stream table with operation progress and messages

Example:

-- Materialize partitions for CPU usage view
SELECT * FROM materialize_partitions(
    'cpu_usage_per_process_per_minute',
    NOW() - INTERVAL '1 day',
    NOW(),
    3600  -- 1 hour partitions
);

âš ī¸ Administrative Use Only: This function is intended for system administrators and data engineers managing the lakehouse infrastructure. Regular users querying data should not need to call this function. It triggers background processing to create materialized partitions and can impact system performance.

list_view_sets() 🔧

Administrative Function - Lists all available view sets with their current schema information. Useful for schema discovery and management.

Syntax:

SELECT * FROM list_view_sets()

Returns:

Column                     Type     Description
view_set_name              Utf8     Name of the view set (e.g., 'log_entries', 'measures')
current_schema_hash        Binary   Current schema version identifier
schema                     Utf8     Full schema as formatted string
has_view_maker             Boolean  Whether view set supports process-specific instances
global_instance_available  Boolean  Whether a global instance exists

Example:

-- View all available view sets and their schemas
SELECT view_set_name, current_schema_hash, has_view_maker
FROM list_view_sets()
ORDER BY view_set_name;

-- Check schema for specific view set
SELECT schema
FROM list_view_sets()
WHERE view_set_name = 'log_entries';

â„šī¸ Administrative Use: This function provides schema discovery for administrators managing view compatibility and schema evolution. It shows the current schema versions and capabilities of each view set in the lakehouse.

retire_partition_by_file(file_path) 🔧

Administrative Function - Retires a single partition by its exact file path. Provides targeted partition removal for schema evolution and maintenance.

Syntax:

SELECT retire_partition_by_file(file_path) as result

Parameters:

  • file_path (Utf8): Exact file path of the partition to retire

Returns: Utf8 - Result message indicating success or failure

Example:

-- Retire a specific partition
SELECT retire_partition_by_file('/lakehouse/log_entries/process-123/2024/01/01/partition.parquet') as result;

-- Retire multiple partitions (use with list_partitions())
SELECT retire_partition_by_file(file_path) as result
FROM list_partitions()
WHERE view_set_name = 'log_entries' 
  AND file_schema_hash != '[4]'  -- Retire old schema versions
LIMIT 10;

âš ī¸ DESTRUCTIVE OPERATION: This function permanently removes a single data partition from the lakehouse, making the contained data inaccessible. Unlike retire_partitions() which operates on time ranges, this function targets exact file paths for precise partition management. Ensure proper backups exist before retiring partitions.

✅ Safety Note: This function only affects the specified partition file. It cannot accidentally retire other partitions, making it safer than time-range-based retirement for schema evolution tasks.

Scalar Functions

JSON/JSONB Functions

Micromegas provides functions for working with JSON data stored in binary JSONB format for efficient storage and querying.
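
The functions below compose naturally: parse a JSON string into JSONB, extract a field, then cast it to a concrete type. A minimal sketch of that pipeline, using only literal values:

-- Parse a JSON literal, extract one field, and cast it
SELECT
    jsonb_as_string(jsonb_get(jsonb_parse('{"service": "auth", "latency_ms": 12.5}'), 'service')) as service,
    jsonb_as_f64(jsonb_get(jsonb_parse('{"service": "auth", "latency_ms": 12.5}'), 'latency_ms')) as latency_ms;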

jsonb_parse(json_string)

Parses a JSON string into binary JSONB format.

Syntax:

jsonb_parse(json_string)

Parameters:

  • json_string (Utf8): JSON string to parse

Returns: Binary - Parsed JSONB data

Example:

-- Parse JSON string into JSONB
SELECT jsonb_parse('{"name": "web_server", "port": 8080}') as parsed_json
FROM processes;

jsonb_get(jsonb, key)

Extracts a value from a JSONB object by key name.

Syntax:

jsonb_get(jsonb, key)

Parameters:

  • jsonb (Binary): JSONB object

  • key (Utf8): Key name to extract

Returns: Binary - JSONB value or NULL if key not found

Example:

-- Extract name field from JSON data
SELECT jsonb_get(jsonb_parse('{"name": "web_server", "port": 8080}'), 'name') as name_value
FROM processes;

jsonb_format_json(jsonb)

Converts a JSONB value back to a human-readable JSON string.

Syntax:

jsonb_format_json(jsonb)

Parameters:

  • jsonb (Multiple formats supported): JSONB value in any of these formats:

    • Dictionary<Int32, Binary> - Dictionary-encoded JSONB (default)
    • Binary - Non-dictionary JSONB

Returns: Utf8 - JSON string representation

Examples:

-- Format JSONB back to JSON string
SELECT jsonb_format_json(jsonb_parse('{"name": "web_server"}')) as json_string
FROM processes;

-- Works directly with dictionary-encoded properties
SELECT jsonb_format_json(properties_to_jsonb(properties)) as json_props
FROM log_entries;

-- Format property values as JSON
SELECT jsonb_format_json(properties) as json_string
FROM processes
WHERE properties IS NOT NULL;

jsonb_as_string(jsonb)

Casts a JSONB value to a string.

Syntax:

jsonb_as_string(jsonb)

Parameters:

  • jsonb (Binary): JSONB value to convert

Returns: Utf8 - String value or NULL if not a string

Example:

-- Extract string value from JSONB
SELECT jsonb_as_string(jsonb_get(jsonb_parse('{"service": "web_server"}'), 'service')) as service_name
FROM processes;

jsonb_as_f64(jsonb)

Casts a JSONB value to a 64-bit float.

Syntax:

jsonb_as_f64(jsonb)

Parameters:

  • jsonb (Binary): JSONB value to convert

Returns: Float64 - Numeric value or NULL if not a number

Example:

-- Extract numeric value from JSONB
SELECT jsonb_as_f64(jsonb_get(jsonb_parse('{"cpu_usage": 75.5}'), 'cpu_usage')) as cpu_usage
FROM processes;

jsonb_as_i64(jsonb)

Casts a JSONB value to a 64-bit integer.

Syntax:

jsonb_as_i64(jsonb)

Parameters:

  • jsonb (Binary): JSONB value to convert

Returns: Int64 - Integer value or NULL if not an integer

Example:

-- Extract integer value from JSONB
SELECT jsonb_as_i64(jsonb_get(jsonb_parse('{"port": 8080}'), 'port')) as port_number
FROM processes;

Data Access Functions

get_payload(process_id, stream_id, block_id)

Retrieves the raw binary payload of a telemetry block from data lake storage.

Syntax:

get_payload(process_id, stream_id, block_id)

Parameters:

  • process_id (Utf8): Process identifier

  • stream_id (Utf8): Stream identifier

  • block_id (Utf8): Block identifier

Returns: Binary - Raw block payload data

Example:

-- Get raw payload data for specific blocks
SELECT process_id, stream_id, block_id, get_payload(process_id, stream_id, block_id) as payload
FROM blocks
WHERE insert_time >= NOW() - INTERVAL '1 hour'
LIMIT 10;

Note: This is an async function that fetches data from object storage. Use sparingly in queries as it can impact performance.

Property Functions

Micromegas provides specialized functions for working with property data, including efficient dictionary encoding for memory optimization.
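
As a quick orientation before the individual entries, a short sketch combining the two most common calls (the 'thread-name' key is illustrative; any property key works):

-- Count properties and read one key from each log entry
SELECT
    properties_length(properties) as prop_count,
    property_get(properties, 'thread-name') as thread
FROM log_entries
LIMIT 20;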

property_get(properties, key)

Extracts a value from a properties map with automatic format detection and optimized performance for JSONB data.

Syntax:

property_get(properties, key)

Parameters:

  • properties (Multiple formats supported): Properties data in any of these formats:

    • Dictionary<Int32, Binary> - JSONB format (default, optimized)
    • List<Struct<key, value>> - Legacy format (automatic conversion)
    • Dictionary<Int32, List<Struct>> - Dictionary-encoded legacy
    • Binary - Non-dictionary JSONB

  • key (Utf8): Property key to extract

Returns: Dictionary<Int32, Utf8> - Property value or NULL if not found

Performance: Optimized for the new JSONB format. Legacy formats are automatically converted for backward compatibility.

Examples:

-- Get thread name from process properties (works with all formats)
SELECT time, msg, property_get(process_properties, 'thread-name') as thread
FROM log_entries
WHERE property_get(process_properties, 'thread-name') IS NOT NULL;

-- Filter by custom property
SELECT time, name, value
FROM measures
WHERE property_get(properties, 'source') = 'system_monitor';

-- Direct JSONB property access (post-migration default)
SELECT time, msg, property_get(properties, 'service') as service
FROM log_entries
WHERE property_get(properties, 'env') = 'production';

properties_length(properties)

Returns the number of properties in a properties map with support for multiple storage formats.

Syntax:

properties_length(properties)

Parameters:

  • properties (Multiple formats supported): Properties data in any of these formats:

    • List<Struct<key, value>> - Legacy format
    • Dictionary<Int32, Binary> - JSONB format (optimized)
    • Dictionary<Int32, List<Struct>> - Dictionary-encoded legacy
    • Binary - Non-dictionary JSONB

Returns: Int32 - Number of properties

Examples:

-- Works with regular properties
SELECT properties_length(properties) as prop_count
FROM measures;

-- Works with dictionary-encoded properties
SELECT properties_length(properties_to_dict(properties)) as prop_count
FROM measures;

-- JSONB property counting
SELECT properties_length(properties_to_jsonb(properties)) as prop_count
FROM measures;

properties_to_dict(properties)

Converts a properties list to a dictionary-encoded array for memory efficiency.

Syntax:

properties_to_dict(properties)

Parameters:

  • properties (List<Struct<key: Utf8, value: Utf8>>): Properties list to encode

Returns: Dictionary<Int32, List<Struct<key: Utf8, value: Utf8>>> - Dictionary-encoded properties

Examples:

-- Convert properties to dictionary encoding for memory efficiency
SELECT properties_to_dict(properties) as dict_props
FROM measures;

-- Use with other functions via properties_to_array
SELECT array_length(properties_to_array(properties_to_dict(properties))) as prop_count
FROM measures;

Note: Dictionary encoding can reduce memory usage by 50-80% for datasets with repeated property patterns.

properties_to_jsonb(properties)

Converts a properties list to binary JSONB format with dictionary encoding for efficient storage and querying.

Syntax:

properties_to_jsonb(properties)

Parameters:

  • properties (Multiple formats supported): Properties in any of these formats:

    • List<Struct<key: Utf8, value: Utf8>> - Regular properties list
    • Dictionary<Int32, List<Struct>> - Dictionary-encoded properties
    • Binary - Non-dictionary JSONB
    • Dictionary<Int32, Binary> - JSONB format

Returns: Dictionary<Int32, Binary> - Dictionary-encoded JSONB object containing the properties as key-value pairs

Examples:

-- Convert properties to JSONB format
SELECT properties_to_jsonb(properties) as jsonb_props
FROM log_entries;

-- Use with other JSONB functions
SELECT jsonb_get(properties_to_jsonb(properties), 'hostname') as hostname
FROM log_entries;

-- Convert dictionary-encoded properties to JSONB
SELECT properties_to_jsonb(properties_to_dict(properties)) as jsonb_props
FROM measures;

Note: This function returns Dictionary<Int32, Binary> format for optimal memory usage with Arrow's built-in dictionary encoding.

properties_to_array(dict_properties)

Converts dictionary-encoded properties back to a regular array for compatibility with standard functions.

Syntax:

properties_to_array(dict_properties)

Parameters:

  • dict_properties (Dictionary<Int32, List<Struct>>): Dictionary-encoded properties

Returns: List<Struct<key: Utf8, value: Utf8>> - Regular properties array

Examples:

-- Convert dictionary-encoded properties back to array
SELECT properties_to_array(properties_to_dict(properties)) as props
FROM measures;

-- Use with array functions
SELECT array_length(properties_to_array(properties_to_dict(properties))) as count
FROM measures;

Histogram Functions

Micromegas provides a comprehensive set of functions for creating and analyzing histograms, enabling efficient statistical analysis of large datasets.
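
These functions compose: a histogram produced by make_histogram can be fed directly into the analysis functions in the same query. A minimal sketch (the stream ID and span name are illustrative):

-- Build a histogram and read its p95 in one statement
SELECT quantile_from_histogram(
           make_histogram(0.0, 100.0, 50, duration / 1000000.0),
           0.95
       ) as p95_ms
FROM view_instance('thread_spans', 'stream_456')
WHERE name = 'handle_request';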

make_histogram(start, end, bins, values)

Creates histogram data from numeric values with specified range and bin count.

Syntax:

make_histogram(start, end, bins, values)

Parameters:

  • start (Float64): Histogram minimum value

  • end (Float64): Histogram maximum value

  • bins (Int64): Number of histogram bins

  • values (Float64): Column of numeric values to histogram

Returns: Histogram structure with buckets and counts

Example:

-- Create histogram of response times (0-50ms, 20 bins)
SELECT make_histogram(0.0, 50.0, 20, CAST(duration AS FLOAT64) / 1000000.0) as duration_histogram
FROM view_instance('thread_spans', 'web_server_123')
WHERE name = 'handle_request';

sum_histograms(histogram_column)

Aggregates multiple histograms by summing their bins.

Syntax:

sum_histograms(histogram_column)

Parameters:

  • histogram_column (Histogram): Column containing histogram values

Returns: Combined histogram with summed bins

Example:

-- Combine histograms across processes
SELECT sum_histograms(duration_histogram) as combined_histogram
FROM cpu_usage_per_process_per_minute
WHERE time_bin >= NOW() - INTERVAL '1 hour';

quantile_from_histogram(histogram, quantile)

Estimates a quantile value from a histogram.

Syntax:

quantile_from_histogram(histogram, quantile)

Parameters:

  • histogram (Histogram): Histogram to analyze

  • quantile (Float64): Quantile to estimate (0.0 to 1.0)

Returns: Float64 - Estimated quantile value

Examples:

-- Get median (50th percentile) response time
SELECT quantile_from_histogram(duration_histogram, 0.5) as median_duration
FROM performance_histograms;

-- Get 95th percentile response time
SELECT quantile_from_histogram(duration_histogram, 0.95) as p95_duration
FROM performance_histograms;

variance_from_histogram(histogram)

Calculates variance from histogram data.

Syntax:

variance_from_histogram(histogram)

Parameters:

  • histogram (Histogram): Histogram to analyze

Returns: Float64 - Variance of the histogram data

Example:

-- Calculate response time variance
SELECT variance_from_histogram(duration_histogram) as duration_variance
FROM performance_histograms;

count_from_histogram(histogram)

Extracts the total count of values from a histogram.

Syntax:

count_from_histogram(histogram)

Parameters:

  • histogram (Histogram): Histogram to analyze

Returns: UInt64 - Total number of values in the histogram

Example:

-- Get total sample count from histogram
SELECT count_from_histogram(duration_histogram) as total_samples
FROM performance_histograms;

sum_from_histogram(histogram)

Extracts the sum of all values from a histogram.

Syntax:

sum_from_histogram(histogram)

Parameters:

  • histogram (Histogram): Histogram to analyze

Returns: Float64 - Sum of all values in the histogram

Example:

-- Get total duration from histogram
SELECT sum_from_histogram(duration_histogram) as total_duration
FROM performance_histograms;

Standard SQL Functions

Micromegas supports all standard DataFusion SQL functions including math, string, date/time, conditional, and array functions. For a complete list with examples, see the DataFusion Scalar Functions documentation.
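
For example, standard string, conditional, and date/time functions mix freely with Micromegas views (a small illustrative query):

-- Standard DataFusion functions applied to Micromegas data
SELECT
    date_trunc('hour', time) as hour,
    CASE WHEN level <= 2 THEN 'error' ELSE 'other' END as severity,
    COUNT(*) as entries,
    AVG(length(msg)) as avg_msg_length
FROM log_entries
WHERE time >= NOW() - INTERVAL '6 hours'
GROUP BY date_trunc('hour', time), severity
ORDER BY hour;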

Advanced Query Patterns

Histogram Analysis

-- Create performance histogram (0-100ms, 10 bins)
SELECT make_histogram(0.0, 100.0, 10, duration / 1000000.0) as response_time_ms_histogram
FROM view_instance('thread_spans', 'web_server')
WHERE name = 'handle_request'
  AND duration > 1000000;  -- > 1ms

-- Analyze histogram statistics
SELECT 
    quantile_from_histogram(response_time_histogram, 0.5) as median_ms,
    quantile_from_histogram(response_time_histogram, 0.95) as p95_ms,
    quantile_from_histogram(response_time_histogram, 0.99) as p99_ms,
    variance_from_histogram(response_time_histogram) as variance,
    count_from_histogram(response_time_histogram) as sample_count,
    sum_from_histogram(response_time_histogram) as total_time_ms
FROM performance_histograms
WHERE time_bin >= NOW() - INTERVAL '1 hour';

-- Aggregate histograms across multiple processes
SELECT 
    time_bin,
    sum_histograms(cpu_usage_histo) as combined_cpu_histogram,
    quantile_from_histogram(sum_histograms(cpu_usage_histo), 0.95) as p95_cpu
FROM cpu_usage_per_process_per_minute
WHERE time_bin >= NOW() - INTERVAL '1 day'
GROUP BY time_bin
ORDER BY time_bin;

Property Extraction and Filtering

-- Find logs with specific thread names
SELECT time, level, msg, property_get(process_properties, 'thread-name') as thread
FROM log_entries
WHERE property_get(process_properties, 'thread-name') LIKE '%worker%'
ORDER BY time DESC;

High-Performance JSONB Property Access

-- Convert properties to JSONB for better performance
SELECT
    time,
    msg,
    property_get(properties_to_jsonb(properties), 'service') as service,
    property_get(properties_to_jsonb(properties), 'version') as version
FROM log_entries
WHERE property_get(properties_to_jsonb(properties), 'env') = 'production'
  AND time >= NOW() - INTERVAL '1 hour'
ORDER BY time DESC;

-- Efficient property filtering with JSONB
WITH jsonb_logs AS (
    SELECT
        time,
        level,
        msg,
        properties_to_jsonb(properties) as jsonb_props
    FROM log_entries
    WHERE time >= NOW() - INTERVAL '1 day'
)
SELECT
    time,
    level,
    msg,
    property_get(jsonb_props, 'service') as service,
    property_get(jsonb_props, 'request_id') as request_id
FROM jsonb_logs
WHERE property_get(jsonb_props, 'error_code') IS NOT NULL
ORDER BY time DESC;

-- Property aggregation with optimal performance
SELECT
    property_get(properties_to_jsonb(properties), 'service') as service,
    property_get(properties_to_jsonb(properties), 'env') as environment,
    COUNT(*) as event_count,
    COUNT(CASE WHEN level <= 2 THEN 1 END) as error_count
FROM log_entries
WHERE time >= NOW() - INTERVAL '1 hour'
  AND property_get(properties_to_jsonb(properties), 'service') IS NOT NULL
GROUP BY service, environment
ORDER BY error_count DESC;

JSON Data Processing

-- Parse and extract configuration from JSON logs
SELECT 
    time,
    msg,
    jsonb_as_string(jsonb_get(jsonb_parse(msg), 'service')) as service_name,
    jsonb_as_i64(jsonb_get(jsonb_parse(msg), 'port')) as port,
    jsonb_as_f64(jsonb_get(jsonb_parse(msg), 'cpu_limit')) as cpu_limit
FROM log_entries
WHERE msg LIKE '%{%'  -- Contains JSON
  AND jsonb_parse(msg) IS NOT NULL
ORDER BY time DESC;

-- Aggregate metrics from JSON payloads
SELECT 
    jsonb_as_string(jsonb_get(jsonb_parse(msg), 'service')) as service,
    COUNT(*) as event_count,
    AVG(jsonb_as_f64(jsonb_get(jsonb_parse(msg), 'response_time'))) as avg_response_ms
FROM log_entries
WHERE msg LIKE '%response_time%'
  AND jsonb_parse(msg) IS NOT NULL
GROUP BY service
ORDER BY avg_response_ms DESC;

Time-based Aggregation

-- Hourly error counts
SELECT 
    date_trunc('hour', time) as hour,
    COUNT(*) as error_count
FROM log_entries
WHERE level <= 2  -- Fatal and Error
  AND time >= NOW() - INTERVAL '24 hours'
GROUP BY date_trunc('hour', time)
ORDER BY hour;

Performance Trace Analysis

-- Top 10 slowest functions with statistics
SELECT 
    name,
    COUNT(*) as call_count,
    AVG(duration) / 1000000.0 as avg_ms,
    MAX(duration) / 1000000.0 as max_ms,
    STDDEV(duration) / 1000000.0 as stddev_ms
FROM view_instance('thread_spans', 'my_process')
WHERE duration > 100000  -- > 0.1ms
GROUP BY name
ORDER BY avg_ms DESC
LIMIT 10;

DataFusion Reference

Micromegas supports all standard DataFusion SQL syntax, functions, and operators. For complete documentation including functions, operators, data types, and SQL syntax, see the Apache DataFusion SQL Reference.

Next Steps