IoT gateway monitoring with Grafana and Loki
Grafana monitoring of IoT gateways: RSSI trends, battery level, reading coverage, structured logging with Loki, alerting, and dashboard setup.
By M-Bus Gateway
Grafana with TimescaleDB and Loki gives full observability over a fleet of wM-Bus gateways, from signal strength to reading coverage and log analysis.
Architecture
[Raspberry Pi gateway]
↓ structlog JSON to /var/log/mbus/
↓ MQTT metrics payload
[Hetzner server]
↓ TimescaleDB (readings, gateway status)
↓ Loki (gateway logs via Promtail)
[Grafana]
← TimescaleDB datasource (PostgreSQL)
← Loki datasource
→ Alerting → Brevo email / PagerDuty
TimescaleDB datasource configuration
# grafana/provisioning/datasources/timescaledb.yml
apiVersion: 1
datasources:
  - name: TimescaleDB
    type: postgres
    url: timescaledb:5432
    database: mbus
    user: grafana_ro
    secureJsonData:
      password: "${GRAFANA_DB_PASSWORD}"
    jsonData:
      sslmode: require
      maxOpenConns: 5
      maxIdleConns: 2
      postgresVersion: 1600
      timescaledb: true
-- Create a read-only user for Grafana:
CREATE USER grafana_ro WITH PASSWORD 'strong-password';
GRANT CONNECT ON DATABASE mbus TO grafana_ro;
GRANT USAGE ON SCHEMA public TO grafana_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO grafana_ro;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO grafana_ro;
Dashboard: RSSI trends per gateway
-- Grafana query: RSSI over time per gateway
SELECT
    time_bucket('1 hour', r.timestamp) AS time,
    g.name AS gateway,
    AVG(r.rssi_dbm) AS avg_rssi
FROM reading r
JOIN meter_installation mi ON mi.id = r.meter_installation_id
JOIN meter m ON m.id = mi.meter_id
JOIN gateway g ON g.id = m.gateway_id
WHERE
    $__timeFilter(r.timestamp)
    AND g.tenant_id = '${tenant_id}'
GROUP BY 1, 2
ORDER BY 1
Grafana panel configuration:
  Type: Time series
  Y-axis: RSSI (dBm), inverted (lower is worse)
  Thresholds:
    > -70 dBm → green (good)
    -85 to -70 dBm → yellow (acceptable)
    < -85 dBm → red (critical)
  Alert: if avg_rssi < -85 dBm for 30 min → send alert
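The threshold bands and the 30-minute alert condition can be sketched in plain Python, which is handy for unit-testing the limits before wiring them into Grafana. The function names `rssi_band` and `should_alert` are hypothetical and not part of any Grafana API; the sketch assumes samples arrive as time-ordered `(timestamp, avg_rssi_dbm)` tuples.

```python
from datetime import datetime, timedelta

# Band limits taken from the panel thresholds.
GOOD_DBM = -70.0
CRITICAL_DBM = -85.0


def rssi_band(avg_rssi_dbm: float) -> str:
    """Map an average RSSI value to the panel's colour band."""
    if avg_rssi_dbm > GOOD_DBM:
        return "green"
    if avg_rssi_dbm >= CRITICAL_DBM:
        return "yellow"
    return "red"


def should_alert(
    samples: list[tuple[datetime, float]],
    hold: timedelta = timedelta(minutes=30),
) -> bool:
    """Return True if RSSI stayed below CRITICAL_DBM for at least `hold`."""
    breach_start = None
    for ts, rssi in samples:
        if rssi < CRITICAL_DBM:
            if breach_start is None:
                breach_start = ts  # first sample of a new breach
            if ts - breach_start >= hold:
                return True
        else:
            breach_start = None  # signal recovered, restart the timer
    return False
```

A single recovered sample resets the timer, which roughly mirrors how a pending alert falls back to normal when its condition stops being true.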
Dashboard: Battery level and expected lifetime
-- Battery trend per installation (last 90 days):
SELECT
    time_bucket('1 day', timestamp) AS day,
    meter_installation_id,
    AVG(battery_level_pct) AS battery_pct
FROM reading
WHERE
    $__timeFilter(timestamp)
    AND battery_level_pct IS NOT NULL
    AND meter_installation_id = '${installation_id}'
GROUP BY 1, 2
ORDER BY 1
-- Estimated number of days until the critical level (20%):
WITH daily AS (
    SELECT
        date_trunc('day', timestamp) AS day,
        AVG(battery_level_pct) AS pct
    FROM reading
    WHERE
        meter_installation_id = '${installation_id}'
        AND timestamp >= now() - INTERVAL '90 days'
        AND battery_level_pct IS NOT NULL
    GROUP BY 1
    ORDER BY 1
),
regression AS (
    SELECT
        -- slope is in percentage points per second (x is epoch seconds)
        regr_slope(pct, extract(epoch FROM day)) AS slope,
        regr_intercept(pct, extract(epoch FROM day)) AS intercept,
        -- most recent daily average; MAX(pct) would return the highest
        -- value, which for a draining battery is the oldest one
        (array_agg(pct ORDER BY day DESC))[1] AS latest_pct
    FROM daily
)
SELECT
    latest_pct AS current_battery_pct,
    CASE
        WHEN slope < 0 THEN
            -- seconds until 20% divided by 86400 gives days
            round(((20 - latest_pct) / slope / 86400)::numeric, 0)
        ELSE NULL
    END AS days_to_20pct
FROM regression
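To sanity-check the numbers the query returns, the same projection can be reproduced outside the database. The sketch below is a minimal least-squares fit over daily averages; `days_to_threshold` is a hypothetical helper name, and the input is assumed to be a mapping from calendar day to average battery percentage. It works in days rather than epoch seconds, so no division by 86400 is needed.

```python
from datetime import date
from typing import Optional


def days_to_threshold(
    daily_pct: dict[date, float], threshold: float = 20.0
) -> Optional[int]:
    """Project days until the battery reaches `threshold`.

    Returns None when there are too few points or the trend is not falling.
    """
    if len(daily_pct) < 2:
        return None
    days = sorted(daily_pct)
    xs = [(d - days[0]).days for d in days]  # days since the first sample
    ys = [daily_pct[d] for d in days]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        return None
    slope = sxy / sxx  # percentage points per day
    if slope >= 0:
        return None  # battery is not draining, no projection possible
    latest = ys[-1]  # most recent daily average
    return round((threshold - latest) / slope)
```

A linear fit is only a rough guide: battery discharge curves are often non-linear near the end of life, so the projection is best treated as an early warning rather than a precise date.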
Loki logging from the gateway
# gateway/src/logging/setup.py
import logging
import sys

import structlog


def setup_logging(log_level: str = "INFO") -> None:
    """Structured JSON logging to stdout.

    The process output must be redirected to /var/log/mbus/ for Promtail to read it.
    """
    # structlog hands the rendered JSON to the stdlib logger configured here,
    # so the log level set below actually filters events.
    logging.basicConfig(
        format="%(message)s",
        stream=sys.stdout,
        level=getattr(logging, log_level.upper()),
    )
    structlog.configure(
        processors=[
            structlog.stdlib.filter_by_level,
            structlog.stdlib.add_log_level,
            structlog.stdlib.add_logger_name,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            structlog.processors.JSONRenderer(),
        ],
        wrapper_class=structlog.stdlib.BoundLogger,
        context_class=dict,
        # add_logger_name needs a logger with a .name attribute,
        # which PrintLoggerFactory does not provide.
        logger_factory=structlog.stdlib.LoggerFactory(),
    )


# Usage in gateway code:
setup_logging()
log = structlog.get_logger()
log.info(
    "telegram_received",
    meter_id="12345678",
    rssi_dbm=-72,
    status="OK",
    gateway_id="GW-0001",
)
# Example output (logger name and timestamp will vary):
# {"meter_id": "12345678", "rssi_dbm": -72, "status": "OK", "gateway_id": "GW-0001", "event": "telegram_received", "level": "info", "logger": "__main__", "timestamp": "2026-05-24T06:00:00.123456Z"}
Promtail configuration
# /etc/promtail/config.yml (on the Hetzner server)
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: mbus-gateway
    static_configs:
      - targets: ["localhost"]
        labels:
          job: mbus-gateway
          env: production
          __path__: /var/log/mbus/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            gateway_id: gateway_id
            meter_id: meter_id
            event: event
      - labels:
          level:
          gateway_id:
          event:
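To make the effect of the two pipeline stages concrete, the sketch below approximates in plain Python which label set one log line ends up with. It is an illustration, not Promtail's implementation: `stream_labels` is a hypothetical helper, and the behaviour for non-JSON lines (keeping only the static labels) is an assumption about how an unparsable line passes through the pipeline.

```python
import json

# Fields promoted to Loki labels by the `labels` stage.
LABEL_KEYS = ("level", "gateway_id", "event")
# Labels set under static_configs.
STATIC_LABELS = {"job": "mbus-gateway", "env": "production"}


def stream_labels(log_line: str) -> dict[str, str]:
    """Approximate the label set attached to one JSON log line."""
    labels = dict(STATIC_LABELS)
    try:
        record = json.loads(log_line)
    except json.JSONDecodeError:
        return labels  # assumption: unparsable lines keep only static labels
    if not isinstance(record, dict):
        return labels
    for key in LABEL_KEYS:
        if key in record:
            labels[key] = str(record[key])
    return labels
```

Note that `meter_id` is extracted but never promoted to a label. Every distinct label combination creates a separate stream in Loki, so a per-meter label would multiply the stream count; the value stays searchable through `| json` at query time instead.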
Loki query: AES errors per gateway
# LogQL: count DEC_ERR per gateway over 24h:
sum by (gateway_id) (
  count_over_time(
    {job="mbus-gateway", event="telegram_received"} | json | status="DEC_ERR"
    [24h]
  )
)
# Log search: show all error-level entries:
{job="mbus-gateway", level="error"} | json
  | line_format "{{.timestamp}} [{{.gateway_id}}] {{.event}}: {{.error}}"
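The counting query can be cross-checked offline against a raw log file. The following sketch applies the same filter (event `telegram_received`, status `DEC_ERR`, last 24 hours) to JSON lines in Python. `dec_err_per_gateway` is a hypothetical helper; it assumes every matching entry carries an ISO 8601 `timestamp` field like the one structlog emits, and that `now` is timezone-aware.

```python
import json
from collections import Counter
from datetime import datetime, timedelta, timezone


def dec_err_per_gateway(lines, now, window=timedelta(hours=24)):
    """Count DEC_ERR telegrams per gateway_id inside the time window."""
    counts = Counter()
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(rec, dict):
            continue
        if rec.get("event") != "telegram_received":
            continue
        if rec.get("status") != "DEC_ERR":
            continue
        # Normalise the trailing "Z" so older Python versions can parse it.
        ts = datetime.fromisoformat(rec["timestamp"].replace("Z", "+00:00"))
        if now - window < ts <= now:
            counts[rec.get("gateway_id", "unknown")] += 1
    return dict(counts)
```

Called with the lines of a file under `/var/log/mbus/` and `datetime.now(timezone.utc)`, it returns a dictionary of counts per gateway that should be close to what the LogQL query reports for the same period.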
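Alerting: stale gateway
# grafana/provisioning/alerting/rules.yml
# Simplified rule sketch: Grafana's provisioning schema additionally requires
# uid, title, condition and data entries per rule. The SQL and the thresholds
# are what matter here.
# ${tenant_id} is a placeholder: dashboard variables are not available in
# alert rules, so substitute a concrete tenant id.
groups:
  - name: gateway-health
    rules:
      - alert: GatewayStale
        # Fires when the count is greater than 0.
        expr: |
          SELECT COUNT(*) FROM gateway
          WHERE last_seen_at < NOW() - INTERVAL '36 hours'
          AND tenant_id = '${tenant_id}'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Gateway has not sent data for 36+ hours"
          description: "At least one gateway is stale"
      - alert: LowCoverage
        # Fires when coverage stays below 0.8 for 30 minutes.
        expr: |
          SELECT
            COUNT(*) FILTER (WHERE last_reading_at > NOW() - INTERVAL '48 hours')::float /
            NULLIF(COUNT(*), 0) AS coverage
          FROM meter_installation
          WHERE tenant_id = '${tenant_id}' AND deleted_at IS NULL
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Reading coverage below 80%"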
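The two alert conditions reduce to simple comparisons, which the sketch below spells out so the cut-offs can be tested in isolation. `stale_gateways` and `coverage` are hypothetical helper names, and the inputs stand in for the `last_seen_at` and `last_reading_at` columns.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_gateways(
    last_seen: dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=36),
) -> list[str]:
    """Return ids of gateways whose last contact is older than `max_age`."""
    return sorted(gw for gw, seen in last_seen.items() if seen < now - max_age)


def coverage(
    last_reading_at: list[Optional[datetime]],
    now: datetime,
    window: timedelta = timedelta(hours=48),
) -> Optional[float]:
    """Share of installations with a reading inside `window`."""
    if not last_reading_at:
        return None  # mirrors NULLIF(COUNT(*), 0) in the SQL
    fresh = sum(
        1 for ts in last_reading_at if ts is not None and ts > now - window
    )
    return fresh / len(last_reading_at)
```

Installations that have never delivered a reading (`None`) count against coverage, just as rows with a NULL `last_reading_at` fail the `FILTER` clause but still count in the denominator.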
Docker Compose: Grafana + Loki + Promtail
services:
  grafana:
    image: grafana/grafana:10.4.2
    ports: ["3000:3000"]
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_ADMIN_PASSWORD}"
      GF_AUTH_ANONYMOUS_ENABLED: "false"
      GF_SMTP_ENABLED: "true"
      GF_SMTP_HOST: smtp.brevo.com:587
      # needed inside the container so the datasource provisioning file can expand it
      GRAFANA_DB_PASSWORD: "${GRAFANA_DB_PASSWORD}"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    networks: [internal]
  loki:
    image: grafana/loki:2.9.4
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - loki_data:/loki
    networks: [internal]
  promtail:
    image: grafana/promtail:2.9.4
    command: -config.file=/etc/promtail/config.yml
    volumes:
      - /var/log/mbus:/var/log/mbus:ro
      - /etc/promtail:/etc/promtail
    networks: [internal]
# named volumes and the network referenced above must be declared
volumes:
  grafana_data:
  loki_data:
networks:
  internal:
Conclusion
Grafana with a TimescaleDB datasource gives a real-time overview of RSSI, battery level, and reading coverage across the gateway fleet. Loki with Promtail collects structured JSON logs and enables AES error analysis per gateway. Alerting sends email when a gateway goes stale (more than 36 hours without data) or coverage drops below 80%.
See the TimescaleDB continuous aggregates guide or the gateway troubleshooting guide.