find-hypertable-candidates by timescale
Analyze an existing PostgreSQL database to identify tables that would benefit from conversion to TimescaleDB hypertables
Data & Analytics
1.4K Stars
73 Forks
Updated Jan 9, 2026, 03:32 PM
Why Use This
This skill analyzes an existing PostgreSQL database and identifies tables that would benefit from conversion to TimescaleDB hypertables.
Use Cases
- Auditing an existing schema for time-series, event-log, and other insert-heavy tables
- Scoring tables against hypertable candidacy criteria before migration
- Handing strong candidates to the companion "migrate-postgres-tables-to-hypertables" skill
---
name: find-hypertable-candidates
description: Analyze an existing PostgreSQL database to identify tables that would benefit from conversion to TimescaleDB hypertables
---
# PostgreSQL Hypertable Candidate Analysis
Identify tables that would benefit from TimescaleDB hypertable conversion. After identification, use the companion "migrate-postgres-tables-to-hypertables" skill for configuration and migration.
## TimescaleDB Benefits
**Performance gains:** 90%+ compression, fast time-based queries, improved insert performance, efficient aggregations, continuous aggregates for materialization (dashboards, reports, analytics), automatic data management (retention, compression).
**Best for insert-heavy patterns:**
- Time-series data (sensors, metrics, monitoring)
- Event logs (user events, audit trails, application logs)
- Transaction records (orders, payments, financial)
- Sequential data (auto-incrementing IDs with timestamps)
- Append-only datasets (immutable records, historical)
**Requirements:** Large volumes (1M+ rows), time-based queries, infrequent updates
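The continuous aggregates mentioned above are often the headline win for dashboard workloads. As a hedged illustration only (the `metrics` table and its columns are hypothetical, not part of this skill), a daily rollup over an already-converted hypertable looks like this:
```sql
-- Hypothetical hypertable: metrics(device_id TEXT, timestamp TIMESTAMPTZ, value DOUBLE PRECISION)
CREATE MATERIALIZED VIEW metrics_daily
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', timestamp) AS bucket,
    device_id,
    avg(value) AS avg_value,
    max(value) AS max_value
FROM metrics
GROUP BY bucket, device_id;
```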
## Step 1: Database Schema Analysis
### Option A: From Database Connection
#### Table statistics and size
```sql
-- Get all tables with row counts and insert/update patterns
WITH table_stats AS (
    SELECT
        schemaname,
        relname AS tablename,  -- pg_stat_user_tables exposes relname, not tablename
        n_tup_ins AS total_inserts,
        n_tup_upd AS total_updates,
        n_tup_del AS total_deletes,
        n_live_tup AS live_rows,
        n_dead_tup AS dead_rows
    FROM pg_stat_user_tables
),
table_sizes AS (
    SELECT
        schemaname,
        tablename,
        -- format('%I.%I', ...) quotes identifiers safely (mixed-case or reserved names)
        pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename))) AS total_size,
        pg_total_relation_size(format('%I.%I', schemaname, tablename)) AS total_size_bytes
    FROM pg_tables
    WHERE schemaname NOT IN ('information_schema', 'pg_catalog')
)
SELECT
    ts.schemaname, ts.tablename, ts.live_rows,
    tsize.total_size, tsize.total_size_bytes,
    ts.total_inserts, ts.total_updates, ts.total_deletes,
    ROUND(CASE WHEN ts.live_rows > 0
        -- numeric, not float: round(double precision, int) does not exist in PostgreSQL
        THEN (ts.total_inserts::numeric / ts.live_rows) * 100
        ELSE 0 END, 2) AS insert_ratio_pct
FROM table_stats ts
JOIN table_sizes tsize ON ts.schemaname = tsize.schemaname AND ts.tablename = tsize.tablename
ORDER BY tsize.total_size_bytes DESC;
```
**Look for:**
- Insert-heavy patterns (few updates/deletes relative to inserts)
- Large tables (1M+ rows or 100MB+)
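To turn these two filters into a ready-made shortlist, a narrowed variant of the same catalog view works; the 1M-row and 10x thresholds are assumptions to tune, not fixed rules:
```sql
-- Shortlist: large tables where inserts dominate other writes (thresholds are assumptions)
SELECT schemaname, relname AS tablename,
       n_live_tup AS live_rows,
       n_tup_ins AS inserts, n_tup_upd AS updates, n_tup_del AS deletes
FROM pg_stat_user_tables
WHERE n_live_tup >= 1000000
  AND n_tup_ins > 10 * (n_tup_upd + n_tup_del)
ORDER BY n_live_tup DESC;
```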
#### Index patterns
```sql
-- Identify common query dimensions
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname NOT IN ('information_schema', 'pg_catalog')
ORDER BY tablename, indexname;
```
**Look for:**
- Multiple indexes with timestamp/created_at columns → time-based queries
- Composite (entity_id, timestamp) indexes → good candidates
- Time-only indexes → time range filtering common
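To narrow the index listing to likely time dimensions, a regex filter over `indexdef` can help; the column-name list below is an assumption based on common naming conventions, so extend it for your schema:
```sql
-- Indexes whose definition mentions a common time column name (heuristic name list)
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname NOT IN ('information_schema', 'pg_catalog')
  AND indexdef ~* '(created_at|event_time|occurred_at|timestamp|_time)'
ORDER BY tablename, indexname;
```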
#### Query patterns (if pg_stat_statements available)
```sql
-- Check availability
SELECT EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'pg_stat_statements');
-- Analyze expensive queries for candidate tables
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
WHERE query ILIKE '%your_table_name%'
ORDER BY total_exec_time DESC LIMIT 20;
```
**✅ Good patterns:** Time-based WHERE, entity filtering combined with time-based qualifiers, GROUP BY time_bucket, range queries over time
**❌ Poor patterns:** Non-time lookups with no time-based qualifiers in same query (WHERE email = ...)
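Even without pg_stat_statements, the catalog can confirm which tables have a time column to query on at all; this uses only standard `information_schema`:
```sql
-- Tables with a timestamp/timestamptz column (a candidate time dimension)
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE data_type IN ('timestamp without time zone', 'timestamp with time zone')
  AND table_schema NOT IN ('information_schema', 'pg_catalog')
ORDER BY table_schema, table_name;
```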
#### Constraints
```sql
-- Check migration compatibility
SELECT conname, contype, pg_get_constraintdef(oid) as definition
FROM pg_constraint
WHERE conrelid = 'your_table_name'::regclass;
```
**Compatibility:**
- Primary keys (p): Must include the partition column; otherwise ask the user whether it can be modified (see the sketch below)
- Foreign keys (f): Plain→Hypertable and Hypertable→Plain are OK; Hypertable→Hypertable is NOT supported
- Unique constraints (u): Must include the partition column; otherwise ask the user whether it can be modified
- Check constraints (c): Usually OK
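Where a primary key blocks conversion, the usual fix (with the user's approval) is to extend it to include the partition column. A hedged sketch; the `events` table and column names are hypothetical:
```sql
-- Rework a PK to include the time column before conversion (hypothetical names)
BEGIN;
ALTER TABLE events DROP CONSTRAINT events_pkey;
ALTER TABLE events ADD PRIMARY KEY (id, event_time);
COMMIT;
```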
### Option B: From Code Analysis
#### ✅ GOOD Patterns
```sql
-- Append-only logging
INSERT INTO events (user_id, event_time, data) VALUES (...);
-- Time-series collection
INSERT INTO metrics (device_id, timestamp, value) VALUES (...);
-- Time-based queries
SELECT * FROM metrics WHERE timestamp >= NOW() - INTERVAL '24 hours';
-- Time aggregations
SELECT DATE_TRUNC('day', timestamp), COUNT(*) FROM metrics GROUP BY 1;
```
#### ❌ POOR Patterns
```sql
-- Frequent updates to historical records
UPDATE users SET email = ..., updated_at = NOW() WHERE id = ...;
-- Non-time lookups
SELECT * FROM users WHERE email = ...;
-- Small reference tables
SELECT * FROM countries ORDER BY name;
```
#### Schema Indicators
**✅ GOOD:**
- Has timestamp/timestamptz column
- Multiple indexes with timestamp-based columns
- Composite (entity_id, timestamp) indexes
**❌ POOR:**
- Indexes mostly on non-time columns (email, name, status, etc.)
- Columns you expect to change over time (updated_at, updated_by, status, etc.)
- Unique constraints on non-time fields
- Frequent updated_at modifications
- Small, static tables
#### Special Case: ID-Based Tables
Sequential ID tables can be candidates if:
- The pattern is insert-mostly, with updates that are infrequent and touch only recent records (e.g., an order status moving ordered -> processing -> delivered; once delivered, an order is rarely updated again)
- IDs correlate with time (true for serial/auto-incrementing IDs and GENERATED ALWAYS AS IDENTITY)
- The ID is the primary query dimension
- Recent data is accessed more often (frequently the case in ecommerce, finance, etc.)
- Time-based reporting is common (e.g., daily or monthly summaries/analytics)
```sql
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY, -- Can partition by ID
user_id BIGINT,
created_at TIMESTAMPTZ DEFAULT NOW() -- For sparse indexes
);
```
Note: For ID-based tables where there is also a time column (created_at, ordered_at, etc.),
you can partition by ID and use sparse indexes on the time column.
See the `migrate-postgres-tables-to-hypertables` skill for details.
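For orientation only, ID-based conversion looks roughly like this; the companion skill is the authority on chunk sizing, and the 1,000,000-ID chunk interval here is purely an assumption:
```sql
-- Hedged sketch: partition the orders table above by its BIGSERIAL id
-- (a 1M-ID chunk interval is an assumed starting point, not a recommendation)
SELECT create_hypertable('orders', 'id', chunk_time_interval => 1000000);
```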
## Step 2: Candidacy Scoring (8+ points = good candidate)
### Time-Series Characteristics (5+ points needed)
- Has timestamp/timestamptz column: **3 points**
- Data inserted chronologically: **2 points**
- Queries filter by time: **2 points**
- Time aggregations common: **2 points**
### Scale & Performance (3+ points recommended)
- Large table (1M+ rows or 100MB+): **2 points**
- High insert volume: **1 point**
- Infrequent updates to historical: **1 point**
- Range queries common: **1 point**
- Aggregation queries: **2 points**
### Data Patterns (bonus)
- Contains entity ID for segmentation (device_id, user_id, product_id, symbol, etc.): **1 point**
- Numeric measurements: **1 point**
- Log/event structure: **1 point**
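Parts of this rubric can be pre-computed from the catalog. The sketch below scores only the machine-checkable criteria (time column, size, insert dominance) using the same thresholds as above, so treat its output as a shortlist to review manually, not a final score:
```sql
-- Partial candidacy score from catalog data alone (manual review still required
-- for chronological inserts, query patterns, and the bonus criteria)
SELECT
    s.schemaname,
    s.relname AS tablename,
      (CASE WHEN EXISTS (
          SELECT 1 FROM information_schema.columns c
          WHERE c.table_schema = s.schemaname
            AND c.table_name = s.relname
            AND c.data_type LIKE 'timestamp%'
        ) THEN 3 ELSE 0 END)                                    -- has time column
    + (CASE WHEN s.n_live_tup > 1000000
             OR pg_total_relation_size(format('%I.%I', s.schemaname, s.relname))
                > 100 * 1024 * 1024 THEN 2 ELSE 0 END)          -- large table
    + (CASE WHEN s.n_tup_ins > 10 * (s.n_tup_upd + s.n_tup_del)
        THEN 1 ELSE 0 END) AS partial_score                     -- insert-dominated
FROM pg_stat_user_tables s
ORDER BY partial_score DESC, s.n_live_tup DESC;
```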
## Common Patterns
### ✅ GOOD Candidates
**✅ Event/Log Tables** (user_events, audit_logs)
```sql
CREATE TABLE user_events (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT,
event_type TEXT,
event_time TIMESTAMPTZ DEFAULT NOW(),
metadata JSONB
);
-- Partition by id, segment by user_id, enable a minmax sparse index on event_time
```
**✅ Sensor/IoT Data** (sensor_readings, telemetry)
```sql
CREATE TABLE sensor_readings (
device_id TEXT,
timestamp TIMESTAMPTZ,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
-- Partition by timestamp, segment by device_id, minmax sparse indexes on temperature and humidity
```
**✅ Financial/Trading** (stock_prices, transactions)
```sql
CREATE TABLE stock_prices (
symbol VARCHAR(10),
price_time TIMESTAMPTZ,
open_price DECIMAL,
close_price DECIMAL,
volume BIGINT
);
-- Partition by price_time, segment by symbol, minmax sparse indexes on open_price, close_price, and volume
```
**✅ System Metrics** (monitoring_data)
```sql
CREATE TABLE system_metrics (
hostname TEXT,
metric_time TIMESTAMPTZ,
cpu_usage DOUBLE PRECISION,
memory_usage BIGINT
);
-- Partition by metric_time, segment by hostname, minmax sparse indexes on cpu_usage and memory_usage
```
### ❌ POOR Candidates
**❌ Reference Tables** (countries, categories)
```sql
CREATE TABLE countries (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
code CHAR(2)
);
-- Static data, no time component
```
**❌ User Profiles** (users, accounts)
```sql
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255),
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ
);
-- Accessed by id or email, frequently updated; a timestamp exists but is not the primary query dimension
```
**❌ Settings/Config** (user_settings)
```sql
CREATE TABLE user_settings (
user_id BIGINT PRIMARY KEY,
theme VARCHAR(20), -- Changes: light -> dark -> auto
language VARCHAR(10), -- Changes: en -> es -> fr
notifications JSONB, -- Frequent preference updates
updated_at TIMESTAMPTZ
);
-- Accessed by user_id, frequently updated; a timestamp exists but user_id is the primary query dimension
```
## Analysis Output Requirements
For each candidate table provide:
- **Score:** Based on criteria (8+ = strong candidate)
- **Pattern:** Insert vs update ratio
- **Access:** Time-based vs entity lookups
- **Size:** Current size and growth rate
- **Queries:** Time-range, aggregations, point lookups
Focus on insert-heavy patterns with time-based or sequential access. Tables scoring 8+ points are strong candidates for conversion.
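One possible report layout per table (the field sources are suggestions; adapt as needed):
```
<schema>.<table>  Score: <points> (8+ = strong candidate)
  Pattern: <insert vs update/delete ratio, from pg_stat_user_tables>
  Access:  <time-based vs entity lookups, from indexes and pg_stat_statements>
  Size:    <current size and growth rate>
  Queries: <time-range / aggregation / point-lookup mix>
```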