Quick Start¶
Get up and running with Micromegas SQL queries in minutes. This guide shows you the essential patterns for querying your observability data.
Basic Connection¶
All Micromegas queries start by connecting to the analytics service:
import datetime
import micromegas
# Connect to Micromegas analytics service
client = micromegas.connect()
The connect() function connects to the analytics service at grpc://localhost:50051.
Your First Query¶
Let's query recent log entries to see what data is available:
# Set up time range for queries
now = datetime.datetime.now(datetime.timezone.utc)
begin = now - datetime.timedelta(hours=1)
end = now
# Query recent log entries
sql = """
SELECT time, process_id, level, target, msg
FROM log_entries
WHERE level <= 4
ORDER BY time DESC
LIMIT 10;
"""
# Execute the query
logs = client.query(sql, begin, end)
print(logs)
print(f"Result type: {type(logs)}") # pandas.DataFrame
Key points:
- ⚡ Important: Always specify the time range via the API parameters (begin, end) for best performance
- Results are returned as pandas DataFrames
- level <= 4 keeps fatal, error, warning, and info messages and excludes debug and trace (see log levels)
- Use API time parameters instead of SQL time filters so that partition elimination can apply
Understanding Return Types¶
All queries return pandas DataFrames:
# Query returns a pandas DataFrame
result = client.query("SELECT process_id, exe FROM processes LIMIT 5;")
# Access DataFrame properties
print(f"Shape: {result.shape}")
print(f"Columns: {result.columns.tolist()}")
print(f"Data types:\n{result.dtypes}")
# Use pandas operations
filtered = result[result['exe'].str.contains('analytics', na=False)]  # na=False treats missing exe values as non-matches
print(filtered.head())
This makes it easy to work with results using the entire pandas ecosystem for analysis, visualization, and data processing.
Essential Query Patterns¶
1. Process Information¶
Get an overview of processes sending telemetry:
processes = client.query("""
SELECT process_id, exe, computer, start_time
FROM processes
ORDER BY start_time DESC
LIMIT 10;
""")
print(processes)
2. Recent Log Entries¶
Query logs with error filtering:
error_logs = client.query("""
SELECT time, process_id, level, target, msg
FROM log_entries
WHERE level <= 3 -- Fatal, Error, Warn
ORDER BY time DESC
LIMIT 50;
""", begin, end)
print(error_logs)
3. Performance Metrics¶
Query numeric measurements:
metrics = client.query("""
SELECT time, process_id, name, value, unit
FROM measures
WHERE name LIKE '%cpu%'
ORDER BY time DESC
LIMIT 20;
""", begin, end)
print(metrics)
4. Process-Specific Data¶
Use view instances for better performance when focusing on specific processes:
process_id = "your_process_id_here" # Replace with actual process ID
process_logs = client.query(f"""
SELECT time, level, target, msg
FROM view_instance('log_entries', '{process_id}')
WHERE level <= 3
ORDER BY time DESC
LIMIT 20;
""", begin, end)
print(process_logs)
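The query above places process_id into the SQL text with an f-string. When the ID comes from user input or from another query, it may be worth validating it first. The sketch below assumes process IDs are UUID strings (an assumption; check the values in your processes view). build_process_log_query is a hypothetical helper, not part of the micromegas package.

```python
import uuid


def build_process_log_query(process_id: str, max_level: int = 3, limit: int = 20) -> str:
    """Return SQL for one process's logs, rejecting IDs that are not UUIDs.

    Hypothetical helper: assumes process IDs are UUID strings.
    """
    # uuid.UUID raises ValueError for anything that is not a well-formed UUID,
    # so quotes or SQL fragments never reach the query text.
    canonical_id = str(uuid.UUID(process_id))
    return f"""
SELECT time, level, target, msg
FROM view_instance('log_entries', '{canonical_id}')
WHERE level <= {int(max_level)}
ORDER BY time DESC
LIMIT {int(limit)};
"""
```

The returned string can be passed to the client in the same way as the other examples: client.query(build_process_log_query(process_id), begin, end).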
Log Levels¶
Micromegas uses numeric log levels for efficient filtering:
Level | Name | Description |
---|---|---|
1 | Fatal | Critical errors that cause application termination |
2 | Error | Errors that don't stop execution but need attention |
3 | Warn | Warning conditions that might cause problems |
4 | Info | Informational messages about normal operation |
5 | Debug | Detailed information for debugging |
6 | Trace | Very detailed tracing information |
Common filters:
- level <= 2 - Only fatal and error messages
- level <= 3 - Fatal, error, and warning messages
- level <= 4 - All messages except debug and trace
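To avoid bare numbers in query code, the levels from the table above can be mirrored in a small enum. This is a minimal sketch: LogLevel, level_filter, and included_levels are hypothetical names introduced here, not part of the micromegas package.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    """Numeric log levels from the table above (lower is more severe)."""
    FATAL = 1
    ERROR = 2
    WARN = 3
    INFO = 4
    DEBUG = 5
    TRACE = 6


def level_filter(max_level: LogLevel) -> str:
    """Build a SQL predicate keeping entries at max_level or more severe."""
    return f"level <= {int(max_level)}"


def included_levels(max_level: LogLevel) -> list:
    """List the level names that a `level <= max_level` filter lets through."""
    return [lvl.name for lvl in LogLevel if lvl <= max_level]


print(level_filter(LogLevel.WARN))
print(included_levels(LogLevel.INFO))
```

The predicate string can then be embedded in a WHERE clause, for example f"... WHERE {level_filter(LogLevel.ERROR)} ...".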
Time Range Best Practices¶
Always Use Time Ranges¶
# ✅ Good - efficient and memory-safe
df = client.query(sql, begin, end)
# ❌ Avoid - can be slow and memory-intensive
df = client.query(sql) # Queries ALL data
Common Time Ranges¶
now = datetime.datetime.now(datetime.timezone.utc)
# Last hour
begin = now - datetime.timedelta(hours=1)
# Last day
begin = now - datetime.timedelta(days=1)
# Last week
begin = now - datetime.timedelta(weeks=1)
# Custom range
begin = datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc)
end = datetime.datetime(2024, 1, 2, tzinfo=datetime.timezone.utc)
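The relative ranges above all follow the same pattern, so they can be wrapped in one helper. The sketch below uses only the standard library; last is a hypothetical name, and the optional now argument exists only to make the function easy to test.

```python
import datetime
from typing import Optional, Tuple


def last(
    duration: datetime.timedelta,
    now: Optional[datetime.datetime] = None,
) -> Tuple[datetime.datetime, datetime.datetime]:
    """Return (begin, end) covering the most recent `duration`, in UTC."""
    if duration <= datetime.timedelta(0):
        raise ValueError("duration must be positive")
    end = now if now is not None else datetime.datetime.now(datetime.timezone.utc)
    if end.tzinfo is None:
        # Naive datetimes are ambiguous, so require an explicit timezone.
        raise ValueError("now must be timezone-aware")
    end = end.astimezone(datetime.timezone.utc)
    return end - duration, end


begin, end = last(datetime.timedelta(hours=1))
```

The resulting pair can be passed straight to the query call, as in client.query(sql, begin, end).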
Safe Queries Without Time Ranges¶
Some queries are safe to run without time ranges because they operate on small metadata tables:
# Process information (typically small dataset)
processes = client.query("SELECT process_id, exe FROM processes LIMIT 10;")
# Stream metadata
streams = client.query("SELECT stream_id, process_id FROM streams LIMIT 10;")
# Count queries (use with caution: with no time range this scans all of log_entries)
count = client.query("SELECT COUNT(*) FROM log_entries;")
Performance Impact
Avoid querying log_entries, measures, thread_spans, or async_events without time ranges on production systems with large datasets.
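One way to enforce this rule in shared code is a thin wrapper that refuses unbounded queries on the large views. The sketch below is a hypothetical helper (query_with_guard is not part of the micromegas package) and assumes only that the client exposes query(sql) and query(sql, begin, end) as used in this guide. The check is a plain text match, so a view name inside a comment or string literal would also trigger it.

```python
import re

# Views this guide calls out as expensive to query without a time range.
LARGE_VIEWS = ("log_entries", "measures", "thread_spans", "async_events")


def query_with_guard(client, sql, begin=None, end=None):
    """Run client.query, refusing unbounded scans of the large views."""
    if (begin is None) != (end is None):
        raise ValueError("pass both begin and end, or neither")
    if begin is None:
        for view in LARGE_VIEWS:
            # Word-boundary match so a name like "measures_backup" is not flagged.
            if re.search(rf"\b{view}\b", sql, flags=re.IGNORECASE):
                raise ValueError(f"query on '{view}' needs begin and end")
        return client.query(sql)
    return client.query(sql, begin, end)
```

Metadata queries such as the processes and streams examples above pass through unchanged, while an unbounded SELECT on log_entries raises ValueError before reaching the service.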
Next Steps¶
Now that you can run basic queries:
- Explore the Python API - Learn about streaming and advanced features
- Review the Schema - Understand all available fields and data types
- Try Query Patterns - Common observability query patterns
- Optimize Performance - Learn to write efficient queries
Quick Reference¶
Essential Views¶
- processes - Process metadata
- log_entries - Application logs
- measures - Numeric metrics
- thread_spans - Execution timing
- async_events - Async operation tracking
Key Functions¶
- view_instance('view_name', 'process_id') - Process-scoped views
- property_get(properties, 'key') - Extract property values
- make_histogram(start, end, bins, values) - Create histograms with specified range
Time Functions¶
- NOW() - Current timestamp
- INTERVAL '1 hour' - Time duration
- date_trunc('hour', time) - Truncate to time boundary
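As an illustration of date_trunc, the hypothetical helper below builds a query that counts log entries per time bucket. It assumes the engine accepts a standard GROUP BY on the truncated timestamp; the query shape is a sketch and is not taken from the Micromegas documentation.

```python
# Bucket names are restricted to a short allow-list before being placed in SQL.
ALLOWED_BUCKETS = ("minute", "hour", "day")


def log_count_by_bucket_sql(bucket: str = "hour", max_level: int = 3) -> str:
    """Build SQL counting log entries per time bucket at or below max_level."""
    if bucket not in ALLOWED_BUCKETS:
        raise ValueError(f"bucket must be one of {ALLOWED_BUCKETS}")
    truncated = f"date_trunc('{bucket}', time)"
    return (
        f"SELECT {truncated} AS time_bucket, COUNT(*) AS entries\n"
        "FROM log_entries\n"
        f"WHERE level <= {int(max_level)}\n"
        f"GROUP BY {truncated}\n"
        "ORDER BY time_bucket;"
    )


print(log_count_by_bucket_sql("hour", 2))
```

Because this reads from log_entries, the resulting SQL should be run with a time range, as in client.query(log_count_by_bucket_sql(), begin, end).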