Skip to content

Administrative Functions Reference

This page provides detailed reference documentation for all administrative functions available in the Micromegas admin module.

Table Functions (UDTFs)

list_view_sets()

Description: Lists all available view sets with their current schema information.

Usage:

SELECT * FROM list_view_sets();

Returns: Table with columns:

Column Type Description
view_set_name String Name of the view set (e.g., "log_entries", "measures")
current_schema_hash Binary Version identifier for current schema (e.g., [4])
schema String Full schema as formatted string
has_view_maker Boolean Whether view set supports non-global instances
global_instance_available Boolean Whether a global instance exists

Example:

-- List all view sets with schema versions
SELECT view_set_name, current_schema_hash, has_view_maker 
FROM list_view_sets()
ORDER BY view_set_name;

-- Find view sets with specific schema version
SELECT * FROM list_view_sets() 
WHERE current_schema_hash = '[4]';

Performance: Fast operation, queries in-memory view registry.


list_partitions()

Description: Lists all partitions in the lakehouse with metadata.

Usage:

SELECT * FROM list_partitions();

Returns: Table with columns including:

Column Type Description
view_set_name String View set name
view_instance_id String Instance ID or 'global'
begin_insert_time Timestamp Partition start time
end_insert_time Timestamp Partition end time
min_event_time Timestamp Earliest event time in partition
max_event_time Timestamp Latest event time in partition
updated Timestamp Last update time
file_path String Object storage file path
file_size Integer File size in bytes
file_schema_hash Binary Schema version when partition was created
source_data_hash Binary Hash of the source data
num_rows Integer Number of rows in the partition

Example:

-- List partitions for specific view set
SELECT file_path, file_size, file_schema_hash 
FROM list_partitions() 
WHERE view_set_name = 'log_entries';

-- Find partitions by schema version
SELECT view_set_name, COUNT(*) as partition_count
FROM list_partitions()
WHERE file_schema_hash = '[3]'
GROUP BY view_set_name;

Performance: Queries database metadata table, indexed by view_set_name.


retire_partitions(view_set, view_instance, start_time, end_time)

Description: Retires partitions within a specified time range.

Parameters: - view_set (String): Target view set name - view_instance (String): Target view instance ID
- start_time (Timestamp): Start of time range (inclusive) - end_time (Timestamp): End of time range (inclusive)

Usage:

SELECT * FROM retire_partitions('log_entries', 'process-123', '2024-01-01T00:00:00Z', '2024-01-02T00:00:00Z');

Returns: Table with retirement operation results.

Safety: Uses database transactions for atomicity. All partitions in time range are retired.

Time-Based Retirement

This function retires ALL partitions in the specified time range, regardless of schema compatibility. Use with caution.


Scalar Functions (UDFs)

delete_duplicate_processes()

Description: Deletes duplicate processes within the query time range. A duplicate is defined as multiple rows with the same process_id. This function keeps the row with the earliest insert_time and deletes all others.

Parameters: None (time range is set via the Python client)

Returns: String message (e.g., "Deleted 42 duplicate processes")

Destructive Operation

This function permanently deletes data. The time range identifies duplicates, but ALL duplicate rows for matching IDs are deleted regardless of their insert_time.


delete_duplicate_streams()

Description: Deletes duplicate streams within the query time range. A duplicate is defined as multiple rows with the same stream_id. This function keeps the row with the earliest insert_time and deletes all others.

Parameters: None (time range is set via the Python client)

Returns: String message (e.g., "Deleted 15 duplicate streams")

Destructive Operation

This function permanently deletes data. The time range identifies duplicates, but ALL duplicate rows for matching IDs are deleted regardless of their insert_time.


delete_duplicate_blocks()

Description: Deletes duplicate blocks within the query time range. A duplicate is defined as multiple rows with the same block_id. This function keeps the row with the earliest insert_time and deletes all others.

Parameters: None (time range is set via the Python client)

Returns: String message (e.g., "Deleted 100 duplicate blocks")

Destructive Operation

This function permanently deletes data. The time range identifies duplicates, but ALL duplicate rows for matching IDs are deleted regardless of their insert_time.


retire_partition_by_metadata(view_set_name, view_instance_id, begin_insert_time, end_insert_time)

Description: Surgically retires a single partition by its metadata identifiers. This is the preferred method for retiring partitions as it works for both empty partitions (file_path=NULL) and non-empty partitions.

Parameters: - view_set_name (String): Name of the view set - view_instance_id (String): Instance ID (e.g., process_id or 'global') - begin_insert_time (Timestamp): Begin insert time of the partition - end_insert_time (Timestamp): End insert time of the partition

Usage:

SELECT retire_partition_by_metadata(
    'log_entries',
    'process-123',
    TIMESTAMP '2024-01-01 00:00:00',
    TIMESTAMP '2024-01-01 01:00:00'
) as result;

Returns: String message indicating success or failure: - Success: "SUCCESS: Retired partition <view_set>/<instance> [<begin>, <end>)" - Failure: "ERROR: Partition not found: <view_set>/<instance> [<begin>, <end>)"

Safety: - Surgical precision - only targets the exact specified partition by its natural identifiers - Works for both empty partitions (file_path=NULL) and non-empty partitions - Uses database transactions with automatic rollback on batch errors - Files are scheduled for cleanup rather than immediately deleted

Example:

-- Retire specific partition by metadata
SELECT retire_partition_by_metadata(
    'log_entries',
    'process-123',
    TIMESTAMP '2024-01-01 00:00:00',
    TIMESTAMP '2024-01-01 01:00:00'
);

-- Batch retire incompatible partitions
SELECT
    view_set_name,
    view_instance_id,
    retire_partition_by_metadata(
        view_set_name,
        view_instance_id,
        begin_insert_time,
        end_insert_time
    ) as result
FROM list_partitions() p
JOIN list_view_sets() vs ON p.view_set_name = vs.view_set_name
WHERE p.file_schema_hash != vs.current_schema_hash
LIMIT 10;

Batch Behavior: When called with multiple rows in a single query, all operations are executed within a single database transaction. If any retirement fails, all changes are rolled back and a ROLLED_BACK message is appended indicating the number of reverted changes.

Performance: Single partition operation, very fast with appropriate database indexes.


retire_partition_by_file(file_path)

Description: Retires a single partition by exact file path match.

Prefer metadata-based retirement

For new code, prefer retire_partition_by_metadata() which works for both empty and non-empty partitions.

Parameters: - file_path (String): Exact file path of partition to retire

Usage:

SELECT retire_partition_by_file('s3://bucket/data/log_entries/process-123/2024/01/01/file.parquet') as result;

Returns: String message indicating success or failure: - Success: "SUCCESS: Retired partition <file_path>" - Failure: "ERROR: Partition not found: <file_path>"

Limitation: Cannot retire empty partitions (where file_path is NULL). Use retire_partition_by_metadata() for empty partitions.

Performance: Single partition operation, very fast with appropriate database indexes.


Python API Functions

micromegas.admin.list_incompatible_partitions(client, view_set_name=None)

Description: Identifies partitions with schemas incompatible with current schema versions. Returns one row per incompatible partition for precise targeting.

Parameters: - client (FlightSQLClient): Connected Micromegas client - view_set_name (str, optional): Filter to specific view set

Returns: pandas DataFrame with columns:

Column Type Description
view_set_name str View set name
view_instance_id str Instance ID
begin_insert_time timestamp Begin insert time of the partition
end_insert_time timestamp End insert time of the partition
incompatible_schema_hash str Old schema version in partition
current_schema_hash str Current schema version
file_path str File path for the partition (NULL for empty partitions)
file_size int Size in bytes of the partition file (0 for empty partitions)

Example:

import micromegas
import micromegas.admin

client = micromegas.connect()

# List all incompatible partitions
incompatible = micromegas.admin.list_incompatible_partitions(client)
print(f"Found {len(incompatible)} incompatible partitions")

# List for specific view set
log_incompatible = micromegas.admin.list_incompatible_partitions(client, 'log_entries')
print(f"Log entries: {len(log_incompatible)} incompatible partitions")
print(f"Total size: {log_incompatible['file_size'].sum()} bytes")

# Check for empty partitions (file_path is NULL)
empty_partitions = incompatible[incompatible['file_path'].isna()]
print(f"Empty partitions: {len(empty_partitions)}")

Implementation: Uses SQL JOIN between list_partitions() and list_view_sets() with server-side filtering.

Performance: Efficient server-side processing, minimal network overhead.


micromegas.admin.retire_incompatible_partitions(client, view_set_name=None)

Description: Safely retires partitions with incompatible schemas using metadata-based retirement. This handles both empty partitions (file_path=NULL) and non-empty partitions.

Parameters: - client (FlightSQLClient): Connected Micromegas client - view_set_name (str, optional): Filter to specific view set

Returns: pandas DataFrame with columns:

Column Type Description
view_set_name str View set processed
view_instance_id str Instance ID processed
partitions_retired int Count of successfully retired partitions
partitions_failed int Count of partitions that failed to retire
storage_freed_bytes int Total bytes freed from storage
retirement_messages list Detailed messages for each retirement attempt

Example:

import micromegas
import micromegas.admin

client = micromegas.connect()

# Preview what would be retired
preview = micromegas.admin.list_incompatible_partitions(client, 'log_entries')
print(f"Will retire {len(preview)} partitions")
print(f"Will free {preview['file_size'].sum() / (1024**3):.2f} GB")

# Retire incompatible partitions
result = micromegas.admin.retire_incompatible_partitions(client, 'log_entries')
for _, row in result.iterrows():
    print(f"Retired {row['partitions_retired']} partitions from {row['view_set_name']}")
    print(f"Failed {row['partitions_failed']} partitions")
    print(f"Freed {row['storage_freed_bytes']} bytes")

Safety Features: - Uses retire_partition_by_metadata() for surgical precision - Works for both empty partitions (file_path=NULL) and non-empty partitions - Cannot accidentally retire compatible partitions - Comprehensive error handling with detailed messages - Continues processing even if individual partitions fail - Results grouped by view_set_name and view_instance_id for clarity

Implementation: 1. Calls list_incompatible_partitions() to identify targets (one row per partition) 2. Groups partitions by view_set_name and view_instance_id 3. For each partition, calls retire_partition_by_metadata() with the partition's natural identifiers 4. Aggregates results and provides summary statistics per group 5. Includes detailed operation logs for auditing

Performance: Processes partitions individually for safety, efficient for typical partition counts.


Complex Query Examples

Find Schema Migration Candidates

-- Identify view sets with the most incompatible partitions
SELECT 
    vs.view_set_name,
    vs.current_schema_hash,
    COUNT(DISTINCT p.file_schema_hash) as schema_versions_count,
    SUM(CASE WHEN p.file_schema_hash != vs.current_schema_hash THEN 1 ELSE 0 END) as incompatible_count,
    SUM(CASE WHEN p.file_schema_hash != vs.current_schema_hash THEN p.file_size ELSE 0 END) as incompatible_size_bytes
FROM list_view_sets() vs
LEFT JOIN list_partitions() p ON vs.view_set_name = p.view_set_name
GROUP BY vs.view_set_name, vs.current_schema_hash
HAVING incompatible_count > 0
ORDER BY incompatible_size_bytes DESC;

Analyze Partition Age Distribution

-- Find old incompatible partitions that are candidates for retirement
SELECT 
    p.view_set_name,
    p.file_schema_hash as old_schema,
    vs.current_schema_hash,
    COUNT(*) as partition_count,
    MIN(p.end_insert_time) as oldest_partition,
    MAX(p.end_insert_time) as newest_partition,
    SUM(p.file_size) as total_size_bytes
FROM list_partitions() p
JOIN list_view_sets() vs ON p.view_set_name = vs.view_set_name
WHERE p.file_schema_hash != vs.current_schema_hash
    AND p.end_insert_time < NOW() - INTERVAL '30 days'
GROUP BY p.view_set_name, p.file_schema_hash, vs.current_schema_hash
ORDER BY oldest_partition ASC;

Storage Impact Analysis

-- Calculate storage savings from retiring incompatible partitions
WITH incompatible_summary AS (
    SELECT 
        p.view_set_name,
        COUNT(*) as incompatible_partitions,
        SUM(p.file_size) as incompatible_size_bytes
    FROM list_partitions() p
    JOIN list_view_sets() vs ON p.view_set_name = vs.view_set_name
    WHERE p.file_schema_hash != vs.current_schema_hash
    GROUP BY p.view_set_name
),
total_summary AS (
    SELECT 
        view_set_name,
        COUNT(*) as total_partitions,
        SUM(file_size) as total_size_bytes
    FROM list_partitions()
    GROUP BY view_set_name
)
SELECT 
    t.view_set_name,
    COALESCE(i.incompatible_partitions, 0) as incompatible_partitions,
    t.total_partitions,
    ROUND(100.0 * COALESCE(i.incompatible_partitions, 0) / t.total_partitions, 2) as incompatible_percentage,
    COALESCE(i.incompatible_size_bytes, 0) as incompatible_size_bytes,
    t.total_size_bytes,
    ROUND(100.0 * COALESCE(i.incompatible_size_bytes, 0) / t.total_size_bytes, 2) as size_percentage
FROM total_summary t
LEFT JOIN incompatible_summary i ON t.view_set_name = i.view_set_name
ORDER BY size_percentage DESC;