Skip to content
Go back

TimescaleDB Continuous Aggregates Without Pain

By SumGuy 12 min read
TimescaleDB Continuous Aggregates Without Pain

Your Grafana Dashboard Is a Ticking Time Bomb

You built a home lab metrics stack. Prometheus scrapes every 10 seconds, you pushed the data into TimescaleDB because you wanted SQL and long retention, and for a while everything was great. Fast dashboards, pretty graphs, smug self-satisfaction.

Then you added a 90-day “overview” panel. And the first time someone opened it — maybe that someone was you, at midnight, on a whim — the PostgreSQL process pinned a CPU core for 30 seconds and returned 77 million rows to compute a single average.

Congratulations. You’ve discovered why time-series databases exist as a category.

The solution isn’t “buy more RAM” or “aggregate in Grafana” (please, no). It’s continuous aggregates — TimescaleDB’s pre-computed, automatically refreshed materialized views that turn that 77-million-row scan into a 90-row lookup. Here’s how they work, where they bite you, and how to build a production-grade setup without reading the entire Timescale docs twice.

What Continuous Aggregates Actually Are

A regular PostgreSQL materialized view is a snapshot. You refresh it manually with REFRESH MATERIALIZED VIEW, it locks, it rebuilds the whole thing, done.

A TimescaleDB continuous aggregate is smarter. It:

It’s a materialized view with time-awareness baked in. The query planner knows to route your Grafana query to the aggregate when you’re asking about old data, and to the raw hypertable when you need fresh data. You don’t have to union anything yourself.

The Setup: Docker Compose + Hypertable

Let’s start with a real deployment. TimescaleDB ships as a PostgreSQL extension, so you can run it alongside a normal Postgres workflow.

docker-compose.yml
services:
timescaledb:
image: timescale/timescaledb:latest-pg17
container_name: timescaledb
environment:
POSTGRES_DB: metrics
POSTGRES_USER: metrics
POSTGRES_PASSWORD: changeme
ports:
- "5432:5432"
volumes:
- tsdb_data:/var/lib/postgresql/data
command: postgres -c shared_preload_libraries=timescaledb
volumes:
tsdb_data:

Connect, create your schema, and enable the extension:

CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
device_id TEXT NOT NULL,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
SELECT create_hypertable('sensor_data', 'time');

The create_hypertable call is the only TimescaleDB-specific thing here. Your table now auto-partitions into chunks by time. Insert data exactly like you would any Postgres table.

Check your hypertable stats at any point. Use hypertable_detailed_size for the per-component breakdown — hypertable_size on its own just returns a single total in bytes:

SELECT * FROM hypertable_detailed_size('sensor_data');
table_bytes | index_bytes | toast_bytes | total_bytes
-------------+-------------+-------------+-------------
143851520 | 20971520 | 8192 | 164831232

Your First Continuous Aggregate

Here’s a 1-hour rollup of temperature and humidity per device:

CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS bucket,
device_id,
avg(temperature) AS avg_temp,
max(temperature) AS max_temp,
min(temperature) AS min_temp,
avg(humidity) AS avg_humidity,
count(*) AS sample_count
FROM sensor_data
GROUP BY bucket, device_id
WITH NO DATA;

The WITH NO DATA means it doesn’t immediately backfill. You control when that happens. The timescaledb.continuous parameter is what makes this a continuous aggregate instead of a plain materialized view.

Now add a refresh policy so it actually stays current:

SELECT add_continuous_aggregate_policy('sensor_hourly',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour'
);

Read those parameters carefully because they’re not obvious:

Do a one-time backfill to populate historical data:

CALL refresh_continuous_aggregate('sensor_hourly',
NULL,
now() - INTERVAL '1 hour'
);

This runs synchronously (or use refresh_continuous_aggregate in a job if you want it async). For large datasets this can take a while — run it during off-hours.

Stacking Aggregates: 1m → 1h → 1d

TimescaleDB 2.x supports cagg-on-cagg. You can build a daily rollup from the hourly rollup instead of re-scanning raw data. This is where the efficiency really kicks in.

-- 1-minute base aggregate (useful for recent high-res dashboards)
CREATE MATERIALIZED VIEW sensor_1min
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 minute', time) AS bucket,
device_id,
avg(temperature) AS avg_temp,
max(temperature) AS max_temp,
min(temperature) AS min_temp
FROM sensor_data
GROUP BY bucket, device_id
WITH NO DATA;
-- 1-hour aggregate built FROM the 1-minute aggregate
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', bucket) AS bucket,
device_id,
avg(avg_temp) AS avg_temp,
max(max_temp) AS max_temp,
min(min_temp) AS min_temp
FROM sensor_1min
GROUP BY bucket, device_id
WITH NO DATA;
-- 1-day aggregate built FROM the 1-hour aggregate
CREATE MATERIALIZED VIEW sensor_daily
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 day', bucket) AS bucket,
device_id,
avg(avg_temp) AS avg_temp,
max(max_temp) AS max_temp,
min(min_temp) AS min_temp
FROM sensor_hourly
GROUP BY bucket, device_id
WITH NO DATA;

Add policies to each level — the 1-minute cagg refreshes every minute, the hourly every hour, the daily every day. They cascade naturally.

SELECT add_continuous_aggregate_policy('sensor_1min',
start_offset => INTERVAL '10 minutes',
end_offset => INTERVAL '1 minute',
schedule_interval => INTERVAL '1 minute'
);
SELECT add_continuous_aggregate_policy('sensor_hourly',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour'
);
SELECT add_continuous_aggregate_policy('sensor_daily',
start_offset => INTERVAL '3 days',
end_offset => INTERVAL '1 day',
schedule_interval => INTERVAL '1 day'
);

Check your policies:

SELECT * FROM timescaledb_information.jobs
WHERE proc_name = 'policy_refresh_continuous_aggregate';
job_id | application_name | schedule_interval | next_start
--------+--------------------------------------------+-------------------+----------------------------
1001 | Refresh Continuous Aggregate Policy [1001] | 00:01:00 | 2026-07-05 14:01:00+00
1002 | Refresh Continuous Aggregate Policy [1002] | 01:00:00 | 2026-07-05 15:00:00+00
1003 | Refresh Continuous Aggregate Policy [1003] | 1 day | 2026-07-06 00:00:00+00

Pitfalls That Will Catch You

Don’t refresh the hot window. The end_offset exists for a reason. If you set it to INTERVAL '0' and try to materialize right up to the present, you’re re-materializing data that’s still being written. That’s expensive, potentially locks chunks, and the performance gain is minimal because recent data queries are fast anyway on a small chunk. Leave a buffer. One hour for hourly aggregates is usually right.

Invalidation overhead. When you insert out-of-order data or UPDATE rows in a materialized range, TimescaleDB logs an invalidation. These get processed on the next refresh. Check your invalidation log:

SELECT * FROM _timescaledb_catalog.continuous_aggs_materialization_invalidation_log
ORDER BY lowest_modified_value DESC
LIMIT 10;

If this table is enormous, you have a problem — something is writing to already-materialized ranges constantly. Track it down and fix the write pattern, or widen your end_offset.

Cagg-on-cagg ordering matters. Refresh the base cagg before the derived cagg. If your 1-hour cagg refreshes before the 1-minute cagg, it’ll materialize slightly stale data and you’ll see tiny gaps that are annoying to debug. The default schedule intervals (1min, 1hr, 1day) handle this naturally, but if you manually call refresh_continuous_aggregate, do it bottom-up.

avg(avg_temp) is not the same as avg(temperature). When you aggregate an average of averages, you lose precision if bucket sizes are uneven. For exact aggregations on derived caggs, use sum(sum_temp) / sum(count) instead, which requires storing the sum and count in the base cagg. For monitoring data where 0.01°C accuracy doesn’t matter, avg-of-avg is fine. Know which one you need before you build the hierarchy.

Compression: Don’t Skip This

Continuous aggregates and compression work together, but you need to understand the order of operations. Compression happens on raw chunks. The cagg materialization reads uncompressed data. So your pipeline is:

  1. Data arrives → raw hypertable (uncompressed)
  2. Cagg refresh materializes the data into the aggregate
  3. After the refresh window passes, compress the raw chunks

Add a compression policy:

ALTER TABLE sensor_data SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'device_id',
timescaledb.compress_orderby = 'time DESC'
);
SELECT add_compression_policy('sensor_data',
compress_after => INTERVAL '7 days'
);

Set compress_after to be longer than your end_offset. If your hourly cagg has a 1-hour end offset, compressing after 7 days is safe — those chunks are well past the refresh window. Compressing too aggressively (e.g., 1 hour) means TimescaleDB has to decompress chunks to refresh the cagg, which defeats the purpose entirely.

Check compression status:

SELECT
chunk_name,
compression_status,
before_compression_total_bytes,
after_compression_total_bytes
FROM chunk_compression_stats('sensor_data')
ORDER BY chunk_name DESC
LIMIT 5;
chunk_name | compression_status | before_compression_total_bytes | after_compression_total_bytes
------------------------+--------------------+--------------------------------+-------------------------------
_hyper_1_42_chunk | Uncompressed | 8388608 |
_hyper_1_41_chunk | Uncompressed | 8388608 |
_hyper_1_40_chunk | Compressed | 52428800 | 4194304
_hyper_1_39_chunk | Compressed | 52428800 | 3932160
_hyper_1_38_chunk | Compressed | 52428800 | 4096000

Eight-to-one compression ratio on time-series sensor data is typical. The two recent chunks are uncompressed and still being written to.

Retention: Drop Raw, Keep Aggregates

Here’s the real win for long-term homelab metrics. Keep 30 days of raw 10-second data, but keep 5 years of hourly aggregates. Your storage stays manageable while you still have meaningful long-term trends.

-- Drop raw data older than 30 days
SELECT add_retention_policy('sensor_data',
drop_after => INTERVAL '30 days'
);
-- Continuous aggregates are NOT automatically dropped with raw data.
-- They persist until you explicitly drop them or add a separate retention policy.
-- For daily aggregates you want long-term, don't add a retention policy.
-- If you want to cap cagg retention too:
SELECT add_retention_policy('sensor_daily',
drop_after => INTERVAL '5 years'
);

Verify retention jobs:

SELECT * FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';

This is honestly one of the best features of the whole setup. Your Prometheus instance probably has a 15-day retention because your disk filled up. With TimescaleDB + caggs, you get 30 days of raw, 5 years of hourly, forever of daily, all in a manageable storage footprint.

Adding a Cagg to a Busy Hypertable (Without Locking)

You have a production hypertable with months of data and you want to add a continuous aggregate. The naive approach — CREATE MATERIALIZED VIEW ... WITH NO DATA then CALL refresh_continuous_aggregate(NULL, now()) — will work but will hammer your I/O for hours.

The better approach:

-- Step 1: Create the cagg with NO DATA (instant, no lock)
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS bucket,
device_id,
avg(temperature) AS avg_temp
FROM sensor_data
GROUP BY bucket, device_id
WITH NO DATA;
-- Step 2: Backfill in manageable time slices to avoid I/O spike
-- Run this in a loop or via psql script
CALL refresh_continuous_aggregate('sensor_hourly',
'2025-01-01'::TIMESTAMPTZ,
'2025-02-01'::TIMESTAMPTZ
);
CALL refresh_continuous_aggregate('sensor_hourly',
'2025-02-01'::TIMESTAMPTZ,
'2025-03-01'::TIMESTAMPTZ
);
-- ... continue month by month
-- Step 3: Once backfilled, add the ongoing policy
SELECT add_continuous_aggregate_policy('sensor_hourly',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour'
);

The month-by-month backfill keeps your I/O reasonable and doesn’t block reads. It’s tedious but your 2 AM self will appreciate not getting paged about a locked table.

Grafana Queries That Actually Use the Cagg

Point your Grafana panel directly at the aggregate view. No magic required:

-- Last 90 days, hourly resolution
SELECT
bucket AS "time",
device_id,
avg_temp
FROM sensor_hourly
WHERE
bucket >= $__timeFrom()::TIMESTAMPTZ
AND bucket < $__timeTo()::TIMESTAMPTZ
AND device_id = '$device'
ORDER BY bucket ASC;

For a mixed-resolution approach (high-res recent, lower-res historical):

-- Use daily for ranges > 30 days, hourly for ranges 1-30 days, raw for < 1 day
-- In Grafana, create separate panels or use variables to switch
-- 1-day resolution panel (90-day range)
SELECT
bucket AS "time",
avg_temp
FROM sensor_daily
WHERE bucket >= now() - INTERVAL '90 days'
ORDER BY bucket;

The response time difference between scanning 77 million raw rows vs 90 pre-aggregated daily rows is not subtle. First query: 28 seconds. Second query: 12 milliseconds. That’s the kind of improvement that makes you feel like you actually know what you’re doing.

Checking Materialization Status

A few queries worth bookmarking:

-- Overview of all continuous aggregates
SELECT
view_name,
materialization_hypertable_name,
view_definition
FROM timescaledb_information.continuous_aggregates;
-- Check when each cagg last refreshed
SELECT
view_name,
last_run_started_at,
last_run_status,
next_start
FROM timescaledb_information.jobs j
JOIN timescaledb_information.job_stats js USING (job_id)
WHERE proc_name = 'policy_refresh_continuous_aggregate';
view_name | last_run_started_at | last_run_status | next_start
----------------+----------------------------+-----------------+----------------------------
sensor_1min | 2026-07-05 14:00:01+00 | Success | 2026-07-05 14:01:00+00
sensor_hourly | 2026-07-05 14:00:02+00 | Success | 2026-07-05 15:00:00+00
sensor_daily | 2026-07-05 00:00:03+00 | Success | 2026-07-06 00:00:00+00

If you see Failed in last_run_status, check timescaledb_information.job_errors:

SELECT * FROM timescaledb_information.job_errors
ORDER BY finish_time DESC
LIMIT 5;

The Bottom Line

Continuous aggregates are the thing that makes TimescaleDB worth the extra setup complexity over plain Postgres for metrics workloads. They’re not complicated once you understand the mental model: a materialized view that knows about time, refreshes only what changed, and leaves a live window for fresh data.

The hierarchy — raw → 1min → 1hr → 1day — is the right pattern for any homelab observability stack. Pair it with compression on raw data and a retention policy that keeps years of daily aggregates while dropping raw data after 30 days, and you’ve got a setup that’ll handle millions of rows a day without ever making your Grafana dashboards feel sluggish.

Start with the hourly cagg. Get it working. Add daily when you start caring about 90-day trends. Add 1-minute only if you actually need sub-hour resolution for recent data. Don’t over-engineer it on day one — the schema migration path is clean enough that you can add layers later.

And seriously, set end_offset to at least 1 hour. Don’t be the person who finds out the hard way why that parameter exists.


Share this post on:

Send a Webmention

Written about this post on your own site? Send a webmention and it'll show up above once verified.


Previous Post
Navidrome vs Airsonic vs gonic
Next Post
Stop Feeding the AI Your Whole Repo

Discussion

Powered by Garrul . Sign in with GitHub or Google, or post anonymously.

Related Posts