Postrgres vacuum 101
learn what is vacuum in postrgres & dead tuples, how vacuum helps in freeing up disk space. learn different types of vacuum process with example.
VACUUM in postgres in a garbage collection utility which is used to reclaim storage space occupied by the records that are stale due to update or delete operations.
Whenever a row is deleted or updated, the old row becomes invisible to all other transactions starting after the transaction has been committed. This old row is not removed from the table or disk. These kind of deleted rows are called dead tuples. They are present in the table until a VACUUM operation is done. It necessary to do VACUUM periodically, especially on frequently-updated tables.
Plain Vacuum
Plain VACUUM (without FULL) simply reclaims space and makes it available for re-use. but not released to the OS. it's just kept available for re-use within the same table. No exclusive locks are obtained. So reads and writes on the table can happen in parallel.
To run a plain vacuum, use below syntax,
VACUUM (ANALYZE, VERBOSE) my_table;
ANALYZE - Tells postres to update statistics which will help the planner to determine the most efficient way to execute a query.
VERBOSE - Prints the vacuum report
Full Vacuum
VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This operation is much slower and requires an exclusive lock on each table while it is being processed. While it is being processed others cannot access data from the table.
To run a full vacuum, use below syntax,
VACUUM (FULL, ANALYZE, VERBOSE) my_table;
FULL - tells postgres to run full vacuum and reclaim disk space
The FULL option is not recommended for routine use, but might be useful in special cases. An example is when you have deleted or updated significant number of the rows in a table and would like the table to physically shrink to occupy less disk space and allow faster table scans. VACUUM FULL will usually shrink the table more than a plain VACUUM would.
Vacuum freeze
When you run auto vacuum job periodically, sometime you would want some table not to included in the vacuum process. Vacuum freeze marks a table's contents with a very special transaction timestamp that tells postgres that it does not need to be vacuumed. Any update to the table later will unfreeze it.
Example
Create a dummy table
CREATE TABLE public.picture_likes (
id bigserial NOT NULL,
picture_id int8 NOT NULL,
user_id int8 NOT NULL,
CONSTRAINT picture_likes_pkey PRIMARY KEY (id)
);
Load data
do $$
begin
for r in 1..3 loop
WITH users AS (
SELECT *
FROM generate_series(1, 1000000)
)
INSERT INTO public.picture_likes
(picture_id, user_id)
SELECT r, generate_series FROM users;
end loop;
end;
$$;
Analyse dead tuples count
SELECT pg_stat_get_live_tuples(c.oid) AS n_live_tup , pg_stat_get_dead_tuples(c.oid) AS n_dead_tup
FROM pg_class c where relname='picture_likes';
You would notice there would be no dead tuples here.
Delete most of the data
delete from public.picture_likes where user_id > 500000
Analyse dead tuples
SELECT pg_stat_get_live_tuples(c.oid) AS n_live_tup , pg_stat_get_dead_tuples(c.oid) AS n_dead_tup
FROM pg_class c where relname='picture_likes';
All the deleted rows in the previous query have been marked as dead tuples, so we get about 1500000 in dead tuples count.
Run plain vacuum
VACUUM (ANALYZE, VERBOSE) public.picture_likes;
Analyse dead tuples
SELECT pg_stat_get_live_tuples(c.oid) AS n_live_tup , pg_stat_get_dead_tuples(c.oid) AS n_dead_tup
FROM pg_class c where relname='picture_likes';
Now you can notice that dead tuples are cleaned up and live tuples count also updated.
Conclusion
- Periodic caccum is recommended for production databases, in order to remove dead tuples. This reduces space and increase table scan speed.
- After adding or deleting a large number of rows, it might be a good idea to issue a VACUUM ANALYZE command for the affected table. This will update the system catalogs with the results of all recent changes, and allow the PostgreSQL query planner to make better choices in planning queries.
- VACUUM causes a substantial increase in I/O traffic, which might cause poor performance for other active sessions. So it it is recommended to run it when low/no active database sessions.
- Plain vacuum is recommended for periodic cleanup.
- Full vacuum is recommended only when there is a significant updates or deletes happened on the table and should be done with extreme caution as it obtains exclusive table level lock.