Postgres Covering Indexes and the Visibility Map

Covering indexes in Postgres came up in our weekly tech call because of a query optimization review one of the teams has in progress. It seemed like some covering indexes weren't working. I've been reading a book called
Curious Moon, a work of fiction about Postgres (it's as delightful as it sounds). Since I had the enceladus dataset already loaded, I used it to answer some questions about this kind of index. Postgres handles covering indexes through the "index-only scans" feature. The requirements are similar to those of most RDBMSs. The index type matters (currently supported only in btree or gist indexes) and the index has to fully cover the columns requested by the query. However, there's an additional requirement related to a query optimization. The page block has to be fully "visible" to the query's mvcc snapshot. Data that hasn't changed recently has a better chance of remaining stable among older and newer transactions on the table. This information is stored in a visibility map adjacent to the table data. Postgres of course has a robust set of extensions, and pg_visibility helps give us insight into the stats around the visibility map. No surprises on installing the extension. This is also one that is available in
Cloud SQL for Postgres.
enceladus=# create extension pg_visibility;I'll import the master_plan data from the Cassini mission:
enceladus=# COPY master_plan FROM '.../data/master_plan.csv' WITH DELIMITER ',' HEADER CSV; COPY 61873 enceladus=# select * from master_plan limit 1; -[ RECORD 1 ]------+--------------------------- start_time_utc | 2004-135T18:40:00 duration | 000T09:22:00 date | 14-May-04 team | CAPS spass_type | Non-SPASS target | Saturn request_name | SURVEY library_definition | Magnetospheric survey title | MAPS Survey description | MAPS magnetospheric surveyAt this point, all the visibility flags are off. This makes sense, because we just performed a bunch of inserts. The vacuum process turns them all on:
enceladus=# select count(blkno), all_visible from pg_visibility_map('master_plan') group by all_visible; count | all_visible -------+------------- 1959 | f enceladus=# vacuum master_plan ; VACUUM enceladus=# select count(blkno), all_visible from pg_visibility_map('master_plan') group by all_visible; count | all_visible -------+------------- 1959 | tLet's perform a lot of inserts. I'm using the "watch" option here so I don't have to type as much. Afterwards, a lot of the visibility flags are marked as off:
enceladus=# COPY master_plan FROM '.../data/master_plan.csv' WITH DELIMITER ',' HEADER CSV; COPY 61873 enceladus=# \watch Fri Jul 24 16:26:52 2020 (every 2s) ... enceladus=# select count(*) from master_plan; -[ RECORD 1 ]-- count | 3093650 enceladus=# select count(blkno), all_visible from pg_visibility_map('master_plan') group by all_visible; count | all_visible -------+------------- 97868 | f 1941 | tBefore I add the covering index, I'll run my intended query. By the way, don't take the absolute times into account here. This is a test on a puny laptop. I'm mainly trying to demonstrate the absence or presence of "index only scan" in the explain plan (notice the "Parallel Seq Scan" below):
time psql -U postgres -d enceladus -c"explain analyze select team, spass_type, target from master_plan where team='CDA' or team='RSS' order by team, target" QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Gather Merge (cost=133280.23..161724.62 rows=243792 width=22) (actual time=5075.047..5244.599 rows=286500 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=132280.21..132584.95 rows=121896 width=22) (actual time=5069.007..5103.744 rows=95500 loops=3) Sort Key: team, target Sort Method: external merge Disk: 3352kB Worker 0: Sort Method: external merge Disk: 3448kB Worker 1: Sort Method: external merge Disk: 3432kB -> Parallel Seq Scan on master_plan (cost=0.00..119480.37 rows=121896 width=22) (actual time=0.996..4877.452 rows=95500 loops=3) Filter: ((team = 'CDA'::text) OR (team = 'RSS'::text)) Rows Removed by Filter: 935717 Planning Time: 0.419 ms Execution Time: 5272.250 ms (13 rows) real 0m6.450s user 0m0.012s sys 0m0.027sLet's add an index we hope to provide index-only queries:
CREATE INDEX idx_tpt ON public.master_plan USING btree (team, spass_type, target);It doesn't look like it helped. I still see "Parallel Seq Scan" below:
time psql -U postgres -d enceladus -c"explain analyze select team, spass_type, target from master_plan where team='CDA' or team='RSS' order by team, target" QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Gather Merge (cost=132711.41..160669.96 rows=239628 width=22) (actual time=4746.679..4915.888 rows=286500 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=131711.39..132010.92 rows=119814 width=22) (actual time=4722.466..4757.252 rows=95500 loops=3) Sort Key: team, target Sort Method: external merge Disk: 3424kB Worker 0: Sort Method: external merge Disk: 3416kB Worker 1: Sort Method: external merge Disk: 3392kB -> Parallel Seq Scan on master_plan (cost=0.00..119144.31 rows=119814 width=22) (actual time=3.395..4533.381 rows=95500 loops=3) Filter: ((team = 'CDA'::text) OR (team = 'RSS'::text)) Rows Removed by Filter: 935717 Planning Time: 21.649 ms Execution Time: 4943.530 ms (13 rows) real 0m7.130s user 0m0.013s sys 0m0.038sRemember that Postgres-specific requirement regarding the visibility flag for covering indexes?
enceladus=# select count(blkno), all_visible from pg_visibility_map('master_plan') group by all_visible; count | all_visible -------+------------- 97868 | f 1941 | tClean up the flags:
enceladus=# vacuum master_plan ; VACUUM enceladus=# select count(blkno), all_visible from pg_visibility_map('master_plan') group by all_visible; count | all_visible -------+------------- 97947 | tNow we see the index-only scan show up (notice "Index Only Scan" below):
time psql -U postgres -d enceladus -c"explain analyze select team, spass_type, target from master_plan where team='CDA' or team='RSS' order by team, target" QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Gather Merge (cost=114722.12..142667.60 rows=239516 width=22) (actual time=461.861..612.337 rows=286500 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=113722.09..114021.49 rows=119758 width=22) (actual time=455.925..482.057 rows=95500 loops=3) Sort Key: team, target Sort Method: external merge Disk: 4080kB Worker 0: Sort Method: external merge Disk: 3080kB Worker 1: Sort Method: external merge Disk: 3072kB -> Parallel Index Only Scan using idx_tpt on master_plan (cost=0.43..101163.65 rows=119758 width=22) (actual time=24.874..358.271 rows=95500 loops=3) Filter: ((team = 'CDA'::text) OR (team = 'RSS'::text)) Rows Removed by Filter: 935717 Heap Fetches: 0 Planning Time: 0.538 ms Execution Time: 640.655 ms (14 rows) real 0m1.065s user 0m0.009s sys 0m0.012sThe problem returns if you perform a lot of updates. Back to "Seq Scan" below:
enceladus=# update master_plan set target='Saturn2' where target='Saturn'; UPDATE 847900 enceladus=# select count(blkno), all_visible from pg_visibility_map('master_plan') group by all_visible; count | all_visible --------+------------- 119439 | f 4627 | t time psql -U postgres -d enceladus -c"explain analyze select team, spass_type, target from master_plan where team='CDA' or team='RSS' order by team, target" QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- Gather Merge (cost=165711.17..201108.67 rows=303386 width=22) (actual time=14423.062..14960.570 rows=286500 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=164711.14..165090.37 rows=151693 width=22) (actual time=14414.109..14449.936 rows=95500 loops=3) Sort Key: team, target Sort Method: external merge Disk: 3456kB Worker 0: Sort Method: external merge Disk: 3432kB Worker 1: Sort Method: external merge Disk: 3360kB -> Parallel Seq Scan on master_plan (cost=0.00..148545.86 rows=151693 width=22) (actual time=0.113..14210.193 rows=95500 loops=3) Filter: ((team = 'CDA'::text) OR (team = 'RSS'::text)) Rows Removed by Filter: 935717 Planning Time: 8.331 ms Execution Time: 15011.649 ms (13 rows) real 0m17.167s user 0m0.014s sys 0m0.044sWe can clean it up again and get our index-only scans (and performance) back:
enceladus=# vacuum master_plan ; VACUUM enceladus=# select count(blkno), all_visible from pg_visibility_map('master_plan') group by all_visible; count | all_visible --------+------------- 124066 | t time psql -U postgres -d enceladus -c"explain analyze select team, spass_type, target from master_plan where team='CDA' or team='RSS' order by team, target" QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Gather Merge (cost=135747.09..163705.64 rows=239628 width=22) (actual time=1822.618..1963.040 rows=286500 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=134747.07..135046.60 rows=119814 width=22) (actual time=1811.738..1835.136 rows=95500 loops=3) Sort Key: team, target Sort Method: external merge Disk: 3640kB Worker 0: Sort Method: external merge Disk: 3336kB Worker 1: Sort Method: external merge Disk: 3280kB -> Parallel Index Only Scan using idx_tpt on master_plan (cost=0.43..122179.99 rows=119814 width=22) (actual time=50.554..1716.339 rows=95500 loops=3) Filter: ((team = 'CDA'::text) OR (team = 'RSS'::text)) Rows Removed by Filter: 935717 Heap Fetches: 0 Planning Time: 4.707 ms Execution Time: 2005.764 ms (14 rows) real 0m2.710s user 0m0.011s sys 0m0.015sMoral: You knew this already, but make sure you're vacuuming frequently enough, especially if you're doing a lot of inserts or updates. The default vacuum ratio is generally too high, so you could lower that and autovacuum will likely help you out. If you are seeing that covering indexes don't seem to be working, a likely reason is that you aren't running vacuums frequently enough on quickly changing datasets. Here are some other queries to use with the extension. This one does an in-depth check to find out if the visibility map is corrupt (maybe due to a previous database crash). We want zero rows found:
enceladus=# select * from pg_check_visible('master_plan'); t_ctid -------- (0 rows)If you do discover that a visibility map is corrupt, rebuild it using the pg_truncate_visibility_map function, and then vacuum.
enceladus=# select pg_truncate_visibility_map('master_plan'); enceladus=# select count(blkno), all_visible from pg_visibility_map('master_plan') group by all_visible; count | all_visible --------+------------- 124066 | f enceladus=# vacuum master_plan ; VACUUM enceladus=# select count(blkno), all_visible from pg_visibility_map('master_plan') group by all_visible; count | all_visible --------+------------- 124066 | t