Functions Reference¶
This page provides a complete reference to all SQL functions available in Micromegas queries, including both standard DataFusion functions and Micromegas-specific extensions.
Micromegas Extensions¶
Table Functions¶
Table functions return tables that can be used in FROM clauses.
view_instance(view_name, identifier)
¶
Creates a process or stream-scoped view instance for better performance.
Syntax:
Parameters:
- view_name
(Utf8
): Name of the view ('log_entries', 'measures', 'thread_spans', 'async_events')
- identifier
(Utf8
): Process ID (for most views) or Stream ID (for thread_spans)
Returns: Schema depends on the view type (see Schema Reference)
Examples:
-- Get logs for a specific process
SELECT time, level, msg
FROM view_instance('log_entries', 'my_process_123')
WHERE level <= 3;
-- Get spans for a specific stream
SELECT name, duration
FROM view_instance('thread_spans', 'stream_456')
WHERE duration > 1000000; -- > 1ms
list_partitions()
đ§¶
Administrative Function - Lists available data partitions in the lakehouse.
Syntax:
Returns:
Column | Type | Description |
---|---|---|
view_set_name | Utf8 | Name of the view set |
view_instance_id | Utf8 | Instance identifier |
begin_insert_time | Timestamp(Nanosecond) | Partition start time |
end_insert_time | Timestamp(Nanosecond) | Partition end time |
min_event_time | Timestamp(Nanosecond) | Earliest event time |
max_event_time | Timestamp(Nanosecond) | Latest event time |
updated | Timestamp(Nanosecond) | Last update time |
file_path | Utf8 | Partition file path |
file_size | Int64 | File size in bytes |
Example:
-- View partition information
SELECT view_set_name, view_instance_id, file_size
FROM list_partitions()
ORDER BY updated DESC;
âšī¸ Administrative Use: This function provides system-level partition metadata primarily useful for administrators monitoring lakehouse storage and partition management. Regular users querying data typically don't need this information.
retire_partitions(view_set_name, view_instance_id, begin_insert_time, end_insert_time)
đ§¶
Administrative Function - Retires (removes) data partitions from the lakehouse for a specified time range. Returns a log stream of the operation.
Syntax:
SELECT * FROM retire_partitions(view_set_name, view_instance_id, begin_insert_time, end_insert_time)
Parameters:
- view_set_name
(Utf8
): Name of the view set
- view_instance_id
(Utf8
): Instance identifier
- begin_insert_time
(Timestamp(Nanosecond)
): Start time for partition retirement
- end_insert_time
(Timestamp(Nanosecond)
): End time for partition retirement
Returns: Log stream table with operation progress and messages
Example:
-- Retire old partitions for a specific view
SELECT * FROM retire_partitions(
'log_entries',
'global',
NOW() - INTERVAL '30 days',
NOW() - INTERVAL '7 days'
);
â ī¸ DESTRUCTIVE OPERATION: This function permanently removes data partitions from the lakehouse, making the contained data inaccessible. Use only for data retention management and with extreme caution in production environments. Ensure proper backups exist before retiring partitions.
materialize_partitions(view_name, begin_insert_time, end_insert_time, partition_delta_seconds)
đ§¶
Administrative Function - Materializes data partitions for a view over a specified time range. Returns a log stream of the operation.
Syntax:
SELECT * FROM materialize_partitions(view_name, begin_insert_time, end_insert_time, partition_delta_seconds)
Parameters:
- view_name
(Utf8
): Name of the view to materialize
- begin_insert_time
(Timestamp(Nanosecond)
): Start time for materialization
- end_insert_time
(Timestamp(Nanosecond)
): End time for materialization
- partition_delta_seconds
(Int64
): Partition time delta in seconds
Returns: Log stream table with operation progress and messages
Example:
-- Materialize partitions for CPU usage view
SELECT * FROM materialize_partitions(
'cpu_usage_per_process_per_minute',
NOW() - INTERVAL '1 day',
NOW(),
3600 -- 1 hour partitions
);
â ī¸ Administrative Use Only: This function is intended for system administrators and data engineers managing the lakehouse infrastructure. Regular users querying data should not need to call this function. It triggers background processing to create materialized partitions and can impact system performance.
Scalar Functions¶
JSON/JSONB Functions¶
Micromegas provides functions for working with JSON data stored in binary JSONB format for efficient storage and querying.
jsonb_parse(json_string)
¶
Parses a JSON string into binary JSONB format.
Syntax:
Parameters:
- json_string
(Utf8
): JSON string to parse
Returns: Binary
- Parsed JSONB data
Example:
-- Parse JSON string into JSONB
SELECT jsonb_parse('{"name": "web_server", "port": 8080}') as parsed_json
FROM processes;
jsonb_get(jsonb, key)
¶
Extracts a value from a JSONB object by key name.
Syntax:
Parameters:
- jsonb
(Binary
): JSONB object
- key
(Utf8
): Key name to extract
Returns: Binary
- JSONB value or NULL if key not found
Example:
-- Extract name field from JSON data
SELECT jsonb_get(jsonb_parse('{"name": "web_server", "port": 8080}'), 'name') as name_value
FROM processes;
jsonb_format_json(jsonb)
¶
Converts a JSONB value back to a human-readable JSON string.
Syntax:
Parameters:
- jsonb
(Binary
): JSONB value to format
Returns: Utf8
- JSON string representation
Example:
-- Format JSONB back to JSON string
SELECT jsonb_format_json(jsonb_parse('{"name": "web_server"}')) as json_string
FROM processes;
jsonb_as_string(jsonb)
¶
Casts a JSONB value to a string.
Syntax:
Parameters:
- jsonb
(Binary
): JSONB value to convert
Returns: Utf8
- String value or NULL if not a string
Example:
-- Extract string value from JSONB
SELECT jsonb_as_string(jsonb_get(jsonb_parse('{"service": "web_server"}'), 'service')) as service_name
FROM processes;
jsonb_as_f64(jsonb)
¶
Casts a JSONB value to a 64-bit float.
Syntax:
Parameters:
- jsonb
(Binary
): JSONB value to convert
Returns: Float64
- Numeric value or NULL if not a number
Example:
-- Extract numeric value from JSONB
SELECT jsonb_as_f64(jsonb_get(jsonb_parse('{"cpu_usage": 75.5}'), 'cpu_usage')) as cpu_usage
FROM processes;
jsonb_as_i64(jsonb)
¶
Casts a JSONB value to a 64-bit integer.
Syntax:
Parameters:
- jsonb
(Binary
): JSONB value to convert
Returns: Int64
- Integer value or NULL if not an integer
Example:
-- Extract integer value from JSONB
SELECT jsonb_as_i64(jsonb_get(jsonb_parse('{"port": 8080}'), 'port')) as port_number
FROM processes;
Data Access Functions¶
get_payload(process_id, stream_id, block_id)
¶
Retrieves the raw binary payload of a telemetry block from data lake storage.
Syntax:
Parameters:
- process_id
(Utf8
): Process identifier
- stream_id
(Utf8
): Stream identifier
- block_id
(Utf8
): Block identifier
Returns: Binary
- Raw block payload data
Example:
-- Get raw payload data for specific blocks
SELECT process_id, stream_id, block_id, get_payload(process_id, stream_id, block_id) as payload
FROM blocks
WHERE insert_time >= NOW() - INTERVAL '1 hour'
LIMIT 10;
Note: This is an async function that fetches data from object storage. Use sparingly in queries as it can impact performance.
property_get(properties, key)
¶
Extracts a value from a properties map.
Syntax:
Parameters:
- properties
(List<Struct>
): Properties map field
- key
(Utf8
): Property key to extract
Returns: Utf8
- Property value or NULL if not found
Examples:
-- Get thread name from process properties
SELECT time, msg, property_get(process_properties, 'thread-name') as thread
FROM log_entries
WHERE property_get(process_properties, 'thread-name') IS NOT NULL;
-- Filter by custom property
SELECT time, name, value
FROM measures
WHERE property_get(properties, 'source') = 'system_monitor';
Histogram Functions¶
Micromegas provides a comprehensive set of functions for creating and analyzing histograms, enabling efficient statistical analysis of large datasets.
make_histogram(start, end, bins, values)
¶
Creates histogram data from numeric values with specified range and bin count.
Syntax:
Parameters:
- start
(Float64
): Histogram minimum value
- end
(Float64
): Histogram maximum value
- bins
(Int64
): Number of histogram bins
- values
(Float64
): Column of numeric values to histogram
Returns: Histogram structure with buckets and counts
Example:
-- Create histogram of response times (0-50ms, 20 bins)
SELECT make_histogram(0.0, 50.0, 20, CAST(duration AS FLOAT64) / 1000000.0) as duration_histogram
FROM view_instance('thread_spans', 'web_server_123')
WHERE name = 'handle_request';
sum_histograms(histogram_column)
¶
Aggregates multiple histograms by summing their bins.
Syntax:
Parameters:
- histogram_column
(Histogram): Column containing histogram values
Returns: Combined histogram with summed bins
Example:
-- Combine histograms across processes
SELECT sum_histograms(duration_histogram) as combined_histogram
FROM cpu_usage_per_process_per_minute
WHERE time_bin >= NOW() - INTERVAL '1 hour';
quantile_from_histogram(histogram, quantile)
¶
Estimates a quantile value from a histogram.
Syntax:
Parameters:
- histogram
(Histogram): Histogram to analyze
- quantile
(Float64
): Quantile to estimate (0.0 to 1.0)
Returns: Float64
- Estimated quantile value
Examples:
-- Get median (50th percentile) response time
SELECT quantile_from_histogram(duration_histogram, 0.5) as median_duration
FROM performance_histograms;
-- Get 95th percentile response time
SELECT quantile_from_histogram(duration_histogram, 0.95) as p95_duration
FROM performance_histograms;
variance_from_histogram(histogram)
¶
Calculates variance from histogram data.
Syntax:
Parameters:
- histogram
(Histogram): Histogram to analyze
Returns: Float64
- Variance of the histogram data
Example:
-- Calculate response time variance
SELECT variance_from_histogram(duration_histogram) as duration_variance
FROM performance_histograms;
count_from_histogram(histogram)
¶
Extracts the total count of values from a histogram.
Syntax:
Parameters:
- histogram
(Histogram): Histogram to analyze
Returns: UInt64
- Total number of values in the histogram
Example:
-- Get total sample count from histogram
SELECT count_from_histogram(duration_histogram) as total_samples
FROM performance_histograms;
sum_from_histogram(histogram)
¶
Extracts the sum of all values from a histogram.
Syntax:
Parameters:
- histogram
(Histogram): Histogram to analyze
Returns: Float64
- Sum of all values in the histogram
Example:
-- Get total duration from histogram
SELECT sum_from_histogram(duration_histogram) as total_duration
FROM performance_histograms;
Standard SQL Functions¶
Micromegas supports all standard DataFusion SQL functions including math, string, date/time, conditional, and array functions. For a complete list with examples, see the DataFusion Scalar Functions documentation.
Advanced Query Patterns¶
Histogram Analysis¶
-- Create performance histogram (0-100ms, 10 bins)
SELECT make_histogram(0.0, 100.0, 10, duration / 1000000.0) as response_time_ms_histogram
FROM view_instance('thread_spans', 'web_server')
WHERE name = 'handle_request'
AND duration > 1000000; -- > 1ms
-- Analyze histogram statistics
SELECT
quantile_from_histogram(response_time_histogram, 0.5) as median_ms,
quantile_from_histogram(response_time_histogram, 0.95) as p95_ms,
quantile_from_histogram(response_time_histogram, 0.99) as p99_ms,
variance_from_histogram(response_time_histogram) as variance,
count_from_histogram(response_time_histogram) as sample_count,
sum_from_histogram(response_time_histogram) as total_time_ms
FROM performance_histograms
WHERE time_bin >= NOW() - INTERVAL '1 hour';
-- Aggregate histograms across multiple processes
SELECT
time_bin,
sum_histograms(cpu_usage_histo) as combined_cpu_histogram,
quantile_from_histogram(sum_histograms(cpu_usage_histo), 0.95) as p95_cpu
FROM cpu_usage_per_process_per_minute
WHERE time_bin >= NOW() - INTERVAL '1 day'
GROUP BY time_bin
ORDER BY time_bin;
Property Extraction and Filtering¶
-- Find logs with specific thread names
SELECT time, level, msg, property_get(process_properties, 'thread-name') as thread
FROM log_entries
WHERE property_get(process_properties, 'thread-name') LIKE '%worker%'
ORDER BY time DESC;
JSON Data Processing¶
-- Parse and extract configuration from JSON logs
SELECT
time,
msg,
jsonb_as_string(jsonb_get(jsonb_parse(msg), 'service')) as service_name,
jsonb_as_i64(jsonb_get(jsonb_parse(msg), 'port')) as port,
jsonb_as_f64(jsonb_get(jsonb_parse(msg), 'cpu_limit')) as cpu_limit
FROM log_entries
WHERE msg LIKE '%{%' -- Contains JSON
AND jsonb_parse(msg) IS NOT NULL
ORDER BY time DESC;
-- Aggregate metrics from JSON payloads
SELECT
jsonb_as_string(jsonb_get(jsonb_parse(msg), 'service')) as service,
COUNT(*) as event_count,
AVG(jsonb_as_f64(jsonb_get(jsonb_parse(msg), 'response_time'))) as avg_response_ms
FROM log_entries
WHERE msg LIKE '%response_time%'
AND jsonb_parse(msg) IS NOT NULL
GROUP BY service
ORDER BY avg_response_ms DESC;
Time-based Aggregation¶
-- Hourly error counts
SELECT
date_trunc('hour', time) as hour,
COUNT(*) as error_count
FROM log_entries
WHERE level <= 2 -- Fatal and Error
AND time >= NOW() - INTERVAL '24 hours'
GROUP BY date_trunc('hour', time)
ORDER BY hour;
Performance Trace Analysis¶
-- Top 10 slowest functions with statistics
SELECT
name,
COUNT(*) as call_count,
AVG(duration) / 1000000.0 as avg_ms,
MAX(duration) / 1000000.0 as max_ms,
STDDEV(duration) / 1000000.0 as stddev_ms
FROM view_instance('thread_spans', 'my_process')
WHERE duration > 100000 -- > 0.1ms
GROUP BY name
ORDER BY avg_ms DESC
LIMIT 10;
DataFusion Reference¶
Micromegas supports all standard DataFusion SQL syntax, functions, and operators. For complete documentation including functions, operators, data types, and SQL syntax, see the Apache DataFusion SQL Reference.
Next Steps¶
- Query Patterns - Common observability query patterns
- Performance Guide - Optimize your queries for best performance
- Schema Reference - Complete view and field reference