PostgreSQL declarative partitioning til IoT tidsseriedata
PostgreSQL declarative partitioning til IoT: range partitioning på timestamp, partition pruning, automatisk partition-oprettelse, vs TimescaleDB hypertables og query-performance benchmarks.
Af M-Bus Gateway
Declarative partitioning i PostgreSQL giver styr på tidsseriedata uden TimescaleDB-extension. Her er implementeringen og trade-offs.
Range partitioning på timestamp
-- Readings tabel med månedlig range partitioning:
CREATE TABLE readings (
id UUID DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
meter_installation_id UUID NOT NULL,
timestamp TIMESTAMPTZ NOT NULL,
value NUMERIC(12, 4) NOT NULL,
unit TEXT NOT NULL,
received_at TIMESTAMPTZ DEFAULT NOW(),
deleted_at TIMESTAMPTZ
)
PARTITION BY RANGE (timestamp);
-- Opret partitioner pr. måned:
CREATE TABLE readings_2025_06
PARTITION OF readings
FOR VALUES FROM ('2025-06-01') TO ('2025-07-01');
CREATE TABLE readings_2025_07
PARTITION OF readings
FOR VALUES FROM ('2025-07-01') TO ('2025-08-01');
-- Indeks arves IKKE automatisk — opret pr. partition:
CREATE INDEX ON readings_2025_06 (tenant_id, meter_installation_id, timestamp DESC);
CREATE INDEX ON readings_2025_07 (tenant_id, meter_installation_id, timestamp DESC);
-- Primary key skal inkludere partitionering-kolonnen:
ALTER TABLE readings ADD PRIMARY KEY (id, timestamp);
Automatisk partition-oprettelse
# server/src/db/partitions.py
# Celery-task der opretter næste måneds partition automatisk
from datetime import date, timedelta
from dateutil.relativedelta import relativedelta
import asyncpg
import calendar
async def ensure_next_month_partition(conn: asyncpg.Connection) -> bool:
"""
Opretter partition for næste måned hvis den ikke eksisterer.
Køres månedligt via Celery beat.
"""
today = date.today()
next_month = today + relativedelta(months=1)
partition_name = f"readings_{next_month.year}_{next_month.month:02d}"
from_date = next_month.replace(day=1)
to_date = from_date + relativedelta(months=1)
# Check om partition allerede eksisterer:
exists = await conn.fetchval(
"""
SELECT 1 FROM pg_tables
WHERE tablename = $1 AND schemaname = 'public'
""",
partition_name,
)
if exists:
return False
# Opret partition:
await conn.execute(f"""
CREATE TABLE {partition_name}
PARTITION OF readings
FOR VALUES FROM ('{from_date}') TO ('{to_date}')
""")
# Opret indekser:
await conn.execute(f"""
CREATE INDEX ON {partition_name}
(tenant_id, meter_installation_id, timestamp DESC)
""")
await conn.execute(f"""
CREATE INDEX ON {partition_name} (timestamp DESC)
WHERE deleted_at IS NULL
""")
log.info(
"partition.created",
partition=partition_name,
from_date=str(from_date),
to_date=str(to_date),
)
return True
@shared_task(name="db.ensure_partitions")
def ensure_partitions_task():
import asyncio
asyncio.run(_ensure_all_needed_partitions())
Partition pruning — query performance
-- PostgreSQL bruger partition pruning automatisk:
-- Hent aflæsninger for en måned (bruger KUN readings_2025_06):
EXPLAIN SELECT * FROM readings
WHERE timestamp >= '2025-06-01'
AND timestamp < '2025-07-01'
AND tenant_id = '...'
AND meter_installation_id = '...';
-- Output: Seq Scan on readings_2025_06 (Partitions: 1/24)
-- Resten af partitionerne er pruned
-- Kryds-partition query (lidt langsommere):
SELECT date_trunc('month', timestamp), sum(value)
FROM readings
WHERE meter_installation_id = '...'
AND timestamp >= '2025-01-01'
AND timestamp < '2026-01-01'
GROUP BY 1;
-- PostgreSQL scanner 12 partitioner, pruner resten
-- Partition pruning kræver literal eller parameter:
-- VIRKER: WHERE timestamp >= '2025-06-01'
-- VIRKER: WHERE timestamp >= $1 (parameteriseret)
-- VIRKER IKKE: WHERE timestamp >= NOW() - INTERVAL '1 year' (ved plan-tid)
-- Brug enable_partition_pruning = on (default)
Partition drop og retention
# server/src/db/partitions.py
async def drop_old_partitions(
conn: asyncpg.Connection,
retention_years: int = 7,
) -> list[str]:
"""
Drop partitioner ældre end retention_years.
Lovkrav: 5 år minimum (bogføringsloven).
Platform bruger 7 år for sikkerhedsmargin.
"""
cutoff = date.today() - relativedelta(years=retention_years)
# Find alle readings-partitioner:
partitions = await conn.fetch("""
SELECT tablename
FROM pg_tables
WHERE tablename LIKE 'readings_____\\___'
AND schemaname = 'public'
ORDER BY tablename
""")
dropped = []
for row in partitions:
name = row["tablename"]
# Parse readings_2018_05 → 2018-05
parts = name.split("_")
if len(parts) != 3:
continue
try:
year, month = int(parts[1]), int(parts[2])
partition_date = date(year, month, 1)
except ValueError:
continue
if partition_date < cutoff:
# Soft-check: Eksisterer data vi ikke burde slette?
count = await conn.fetchval(
f"SELECT COUNT(*) FROM {name} LIMIT 1"
)
if count:
log.warning(
"partition.skip_drop_has_data",
partition=name,
cutoff=str(cutoff),
)
continue
await conn.execute(f"DROP TABLE {name}")
dropped.append(name)
log.info("partition.dropped", partition=name)
return dropped
PostgreSQL partitioning vs. TimescaleDB
Feature PostgreSQL Native TimescaleDB
────────────────────────────────────────────────────────────
Partitioner Manuel/auto script Automatisk (chunk_interval)
Komprimering Ikke native 90%+ komprimering
Continuous aggregates Ikke native Built-in
Query optimizer Standard pruning Chunk-aware optimizer
Background jobs Celery/cron Policy-baseret intern
Extension krævet Nej Ja (TimescaleDB)
SQL-kompatibilitet 100% standard SQL Standard SQL (næsten)
Rækkevidde Ubegrænset Ubegrænset
Oprettelsestid Migrering-script CREATE TABLE ... USING
Hvornår native partitioning er bedre:
→ Managed PostgreSQL (ingen extension-support)
→ Overholdelse af standard-SQL krav
→ Data der ikke er tidsserier (fx audit_log pr. år)
→ Enkle retention-politikker
Hvornår TimescaleDB er bedre:
→ IoT tidsseriedata med millions of rows
→ Continuous aggregates (forudberegnede summaries)
→ Komprimering er kritisk (disk-pris)
→ Kompleks time-bucket analyse
SQLModel med partitionerede tabeller
# server/src/db/models.py
class Reading(SQLModel, table=True):
"""
Readings er partitioneret på timestamp.
SQLModel/SQLAlchemy understøtter dette transparent.
"""
__tablename__ = "readings"
# Primary key SKAL inkludere partition-kolonne ved PostgreSQL native:
id: uuid.UUID = Field(default_factory=uuid.uuid4)
timestamp: datetime = Field(
sa_column=Column(DateTime(timezone=True), primary_key=True)
)
tenant_id: uuid.UUID = Field(nullable=False)
meter_installation_id: uuid.UUID = Field(nullable=False)
value: Decimal = Field(sa_column=Column(Numeric(12, 4)))
unit: str
received_at: datetime = Field(default_factory=datetime.utcnow)
deleted_at: datetime | None = None
# SQLAlchemy query fungerer transparent på tværs af partitioner:
# session.execute(select(Reading).where(Reading.timestamp >= start))
# → PostgreSQL pruner automatisk irrelevante partitioner
Konklusion
PostgreSQL declarative range partitioning giver styr på tidsseriedata uden TimescaleDB-extension — idéelt til managed PostgreSQL-instanser eller strenge SQL-kompatibilitetskrav. Månedlige partitioner med automatisk oprettelse via Celery-task og retention-drop efter 7 år (bogføringsloven) giver fuld livscyklus-styring. Partition pruning reducer query-scope automatisk ved timestamp-filtrering. For IoT-platforme med heavy compression-behov og continuous aggregates er TimescaleDB stadig det stærkere valg.