
One of the hidden mechanics of PostgreSQL may not seem important at first, but it can have a real impact on your disk space.

When you INSERT a row into some databases, such as Oracle, the row is only written to the datafile once it has been validated. In PostgreSQL, for regular INSERT statements, this is not the case: constraints, including primary keys, are checked AFTER the tuple has been written.

Why does this matter? Usually, if you see “duplicate key value violates unique constraint” in the log, you would not immediately worry about disk space, but in PostgreSQL it can be a real concern.

As an example, let’s run a test. The test was conducted on a Fedora 39 workstation running PostgreSQL community edition 16.5.

So, let's create a simple table with autovacuum disabled and the fillfactor set to 100, so that the pages are fully packed (no free space is reserved) and the table is not vacuumed behind our backs during the test.

CREATE TABLE IF NOT EXISTS dup_test (
        id      SERIAL PRIMARY KEY,
        name    text NOT NULL
) WITH (autovacuum_enabled = off, fillfactor = 100);

Once the table has been created, it looks like this:

                                                         Table "public.dup_test"
 Column |  Type   | Collation | Nullable |               Default                | Storage  | Compression | Stats target | Description 
--------+---------+-----------+----------+--------------------------------------+----------+-------------+--------------+-------------
 id     | integer |           | not null | nextval('dup_test_id_seq'::regclass) | plain    |             |              | 
 name   | text    |           | not null |                                      | extended |             |              | 
Indexes:
    "dup_test_pkey" PRIMARY KEY, btree (id)
Access method: heap
Options: autovacuum_enabled=off, fillfactor=100

Now, let’s insert some data into the table:

DO
$$
DECLARE
    x INTEGER;
BEGIN
    FOR x IN 1..10000
    LOOP
        INSERT INTO public.dup_test (id, name)
        VALUES (x, md5(random()::text));
    END LOOP;
END
$$;

Note: It is not good practice to do INSERTs row by row like this, but I’m using this method for the sake of convenience.
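For what it’s worth, the same 10000 rows could also be loaded with a single set-based statement (a small sketch, assuming the dup_test table created above):

-- Set-based load of the same 10000 rows in one statement
INSERT INTO public.dup_test (id, name)
SELECT x, md5(random()::text)
FROM   generate_series(1, 10000) AS x;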

The number of tuples we have now is 10000:

SELECT count(*) 
FROM   dup_test;
 count 
-------
 10000

And let’s check the table size and index sizes:

-- Check table size
\dt+ dup_test
-- Check index size
\di+ public.dup_test_pkey;
                                        List of relations
 Schema |   Name   | Type  |  Owner   | Persistence | Access method |  Size  | Description 
--------+----------+-------+----------+-------------+---------------+--------+-------------
 public | dup_test | table | postgres | permanent   | heap          | 704 kB | 
(1 row)

                                  List of relations
 Schema |     Name      | Type  |  Owner   |  Table   | Access method |  Size  
--------+---------------+-------+----------+----------+---------------+--------
 public | dup_test_pkey | index | postgres | dup_test | btree         | 240 kB 
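If you prefer a query to the psql meta-commands, the same numbers can be pulled with the size functions (a sketch, assuming both objects live in the public schema):

-- Heap and primary key index sizes
SELECT pg_size_pretty(pg_relation_size('public.dup_test'))      AS table_size,
       pg_size_pretty(pg_relation_size('public.dup_test_pkey')) AS index_size;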

So, we have 10000 rows in a table with a fillfactor of 100 and autovacuum switched off.

Now, let’s insert some duplicates via a bash script (each INSERT runs in its own transaction, so every failure is independent):

export DB=db02
for x in {1..10000}
do
    psql -d ${DB} -c "INSERT INTO public.dup_test (id, name) 
                         VALUES (${x}, md5(random()::text));" 2>/tmp/err.log
done

The errors look like this when we run the script above:

ERROR:  duplicate key value violates unique constraint "dup_test_pkey"
DETAIL:  Key (id)=(10000) already exists.

This error occurred 10000 times, once for each iteration of the loop. So, what do the numbers (number of tuples, table size and index size) look like after the duplicate insert failures?

Table count (after the duplicate inserts):

10000

Table Size (after the duplicate inserts):

                                       List of relations
 Schema |   Name   | Type  |  Owner   | Persistence | Access method |  Size   | Description 
--------+----------+-------+----------+-------------+---------------+---------+-------------
 public | dup_test | table | postgres | permanent   | heap          | 1368 kB | 

Index size (after the duplicate inserts):

                                  List of relations
 Schema |     Name      | Type  |  Table   | Access method |  Size  
--------+---------------+-------+----------+---------------+--------
 public | dup_test_pkey | index | dup_test | btree         | 240 kB 

We still have 10000 rows, but we can see that one statistic has changed and one has not:

  • the table size has increased from 704 kB to 1368 kB
  • the index size has stayed the same at 240 kB.

Why has the table size increased but the index size is the same?

The answer is that the primary key is validated after the insert (at least for this experiment, which uses PostgreSQL’s default settings), so each tuple was written into the heap first and only then rejected by the duplicate key check. The rejected tuples stay behind as dead tuples, while no entry is kept in the index because the insert never succeeded, which is why the index size has not changed.
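Before vacuuming, one way to see those rejected tuples directly is the pgstattuple extension, if it is available on your installation (a quick check on my part rather than part of the original test):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- Live vs dead tuple counts for the table
SELECT tuple_count, dead_tuple_count, dead_tuple_percent
FROM   pgstattuple('public.dup_test');

On a run like the one above, this should report the 10000 live tuples alongside roughly 10000 dead ones left behind by the failed inserts.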

To validate this, let's run a manual vacuum and check what gets removed:

db02=# VACUUM VERBOSE ANALYZE dup_test;
INFO:  vacuuming "db02.public.dup_test"
INFO:  table "dup_test": truncated 167 to 84 pages
INFO:  finished vacuuming "db02.public.dup_test": index scans: 1
pages: 83 removed, 84 remain, 167 scanned (100.00% of total)
tuples: 10000 removed, 10000 remain, 0 are dead but not yet removable
removable cutoff: 79861, which was 1 XIDs old when operation ended
new relfrozenxid: 69860, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan needed: 84 pages from table (50.30% of total) had 10000 dead item identifiers removed
index "dup_test_pkey": pages: 30 in total, 0 newly deleted, 0 currently deleted, 0 reusable
I/O timings: read: 0.000 ms, write: 0.060 ms
avg read rate: 0.000 MB/s, avg write rate: 13.700 MB/s
buffer usage: 580 hits, 0 misses, 4 dirtied
WAL usage: 344 records, 6 full page images, 108440 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  vacuuming "db02.pg_toast.pg_toast_98168"
INFO:  finished vacuuming "db02.pg_toast.pg_toast_98168": index scans: 0
pages: 0 removed, 0 remain, 0 scanned (100.00% of total)
tuples: 0 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 79862, which was 0 XIDs old when operation ended
new relfrozenxid: 79862, which is 10003 XIDs ahead of previous value
frozen: 0 pages from table (100.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (100.00% of total) had 0 dead item identifiers removed
I/O timings: read: 0.014 ms, write: 0.000 ms
avg read rate: 98.892 MB/s, avg write rate: 0.000 MB/s
buffer usage: 21 hits, 1 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 188 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  analyzing "public.dup_test"
INFO:  "dup_test": scanned 84 of 84 pages, containing 10000 live rows and 0 dead rows; 10000 rows in sample, 10000 estimated total rows

10000 tuples were removed, yet we still only have 10000 live tuples. The space used by the table has changed as well:

db02=# \dt+ dup_test
                                     List of relations
 Schema |   Name   | Type  |  Owner   | Persistence | Access method |  Size  | Description 
--------+----------+-------+----------+-------------+---------------+--------+-------------
 public | dup_test | table | postgres | permanent   | heap          | 712 kB | 

db02=# \di+ dup_test_pkey
                                             List of relations
 Schema |     Name      | Type  |  Owner   |  Table   | Persistence | Access method |  Size  | 
--------+---------------+-------+----------+----------+-------------+---------------+--------+
 public | dup_test_pkey | index | postgres | dup_test | permanent   | btree         | 240 kB | 

What can we do to stop this?

That is where the INSERT ... ON CONFLICT (UPSERT) syntax comes in. A standard INSERT that fails on a duplicate still generates a dead tuple and consumes a transaction ID, which means it can generate table bloat.

Avoiding bloat is not what UPSERT was originally intended for, but it is a nice side effect. It does come at a slight cost, i.e. an UPSERT is slightly more expensive than a straight INSERT. So, why would you bother, if normal vacuuming cleans up the dead tuples anyway? It mainly comes down to space management versus performance. On database systems with little free space, the extra bloat can cause issues, and running an UPSERT statement instead of a normal INSERT can save that space.
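As a minimal sketch against the dup_test table from earlier, the same conflicting insert can be written so that the duplicate is skipped up front rather than rejected after the tuple has been written:

-- Re-run one of the duplicate inserts, skipping the conflict instead of failing
INSERT INTO public.dup_test (id, name)
VALUES (1, md5(random()::text))
ON CONFLICT (id) DO NOTHING;

This reports INSERT 0 0 instead of raising an error, and no rejected tuple is left behind for vacuum to clean up.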

For those worried about the performance of UPSERTs versus normal INSERTs, here is a basic test comparing a plain INSERT statement with INSERT … ON CONFLICT DO NOTHING.

Let’s create another simple table, do a basic bulk INSERT and time it.

CREATE TABLE IF NOT EXISTS public.cust_name (
        id              SERIAL NOT NULL PRIMARY KEY,
        name            TEXT NOT NULL,
        gender          VARCHAR(1) NOT NULL
);
INSERT INTO public.cust_name (name, gender)
SELECT  substr(MD5(random()::text), 1,30) as name,
        (array['M', 'F'])[floor(random() * 2 + 1)::int] as gender
FROM    generate_series(1,8000000);

Results:

INSERT 0 8000000
Time: 34321.234 ms (00:34.321)

Now we’ll run the same experiment, but this time using the UPSERT syntax.

TRUNCATE cust_name;
INSERT INTO public.cust_name (name, gender)
SELECT  substr(MD5(random()::text), 1,30) as name,
        (array['M', 'F'])[floor(random() * 2 + 1)::int] as gender
FROM    generate_series(1,8000000)
ON CONFLICT DO NOTHING;

The SQL results are:

TRUNCATE TABLE
Time: 82.280 ms
INSERT 0 8000000
Time: 53530.023 ms (00:53.530)

So, this basic test indicates that there is a performance overhead for using the UPSERT syntax over a standard INSERT (roughly 54 seconds versus 34 seconds for 8 million rows).

Note: MERGE (available from PostgreSQL 15 onwards) may be an option if you are inserting via a join with an existing table. Additional testing would be needed to check whether it is a faster option than UPSERT.
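As a rough illustration only (dup_test_staging is a hypothetical staging table, not part of the test above), a MERGE equivalent of the ON CONFLICT DO NOTHING insert could look like this:

-- Insert only the staging rows whose id does not already exist in dup_test
MERGE INTO public.dup_test AS t
USING public.dup_test_staging AS s
ON t.id = s.id
WHEN MATCHED THEN
    DO NOTHING
WHEN NOT MATCHED THEN
    INSERT (id, name) VALUES (s.id, s.name);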

Conclusion

Normally, these duplicate inserts are not something to worry about. However, it does not seem to be widely understood that PostgreSQL is really just an INSERT database: an UPDATE is really an INSERT of a new tuple version, with the old tuple flagged for vacuuming. So, duplicate key errors are no different from the table bloat generated by normal database operations. However, because these dead tuples are not visible when we count rows, it may surprise some people how big PostgreSQL tables can get if they are not vacuumed regularly.

Therefore, just be aware that unless you are using the UPSERT syntax, every INSERT, successful or not, uses up storage space. Knowing how many inserts fail may therefore be important in deciding how often you need to vacuum your tables.
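One simple way to keep an eye on this (my suggestion, not something from the test above) is to watch the approximate dead-tuple counters in pg_stat_user_tables:

-- Tables carrying the most dead tuples relative to live ones (approximate counters)
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC
LIMIT  10;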
