Performance Guide¶
Guidelines for writing efficient Micromegas SQL queries and avoiding common performance pitfalls.
Critical Performance Rules¶
1. Always Use Time Ranges (via Python API)¶
⚡ Performance Tip: Always specify time ranges through the Python API parameters, not in SQL WHERE clauses.
❌ Inefficient - SQL time filter:
# Analytics server scans ALL partitions, then filters in SQL
sql = """
SELECT COUNT(*) FROM log_entries
WHERE time >= NOW() - INTERVAL '1 hour';
"""
result = client.query(sql) # No time range parameters!
✅ Efficient - API time range:
import datetime
# Analytics server eliminates irrelevant partitions BEFORE query execution
now = datetime.datetime.now(datetime.timezone.utc)
begin = now - datetime.timedelta(hours=1)
end = now
sql = "SELECT COUNT(*) FROM log_entries;"
result = client.query(sql, begin, end) # ⭐ Time range in API
Why API time ranges are faster:
- Partition Elimination: Analytics server removes entire partitions from consideration before SQL execution
- Metadata Optimization: Uses partition metadata to skip irrelevant data files
- Memory Efficiency: Only loads relevant data into query engine memory
- Network Efficiency: Transfers only relevant data over FlightSQL
Performance Impact:
- API time range: Query considers only 1-2 partitions
- SQL time filter: Query scans all partitions, then filters millions of rows
2. Use Process-Scoped Views¶
❌ Less Efficient:
✅ More Efficient:
Query Optimization¶
Predicate Pushdown¶
Micromegas automatically pushes filters down to the storage layer when possible:
-- These filters are pushed to Parquet reader for efficiency
WHERE time >= NOW() - INTERVAL '1 day'
AND level <= 3
AND process_id = 'my_process'
Memory Considerations¶
Use LIMIT for exploration:
Use streaming for large results: