Offset based pagination in postgres

Offset based pagination in postgres

Imagine you have one million records in a table and you have to display them in a web page. Would you fetch 1 million records in a one shot and send it through an API? Probably not. Offset based pagination is one of the most common technique for fetching small chunk of data for one page at a time.

Here is how it works, First calculate total pages based on total number of records and records per page that we would like to display, and we will fetch data when user navigate to a certain page. Offset is computed at the backend and database will use offset to skip records.

For example:

per_page = 100
total_records = 1000
total_pages = 1000/100 = 10
offset = (per_page*page_no) - per_page

Postgres offset based pagination example

-- to find total_records count
select count(*) from employee;

-- to fetch each page
select * from table_name LIMIT {{per_page}} OFFSET {{offset}};

-- fetch first page
-- offset = (100*1)-100 = 0
--  no need for an offset for first page
select * from employee LIMIT 100;

-- fetch second page
-- offset = (100*2)-100 = 100
select * from employee LIMIT 100 OFFSET 100;

-- fetch third page
-- offset = (100*3)-100 = 200
select * from employee LIMIT 100 OFFSET 200;

Analyze query plan

explain analyze select * from employee LIMIT 100 OFFSET 200;

-- seq scan is done on employee table to fetch the results
Limit  (cost=200.76..210.80 rows=100 width=930) (actual time=0.502..0.530 rows=100 loops=1)
  ->  Seq Scan on employee  (cost=0.00..1878.10 rows=18710 width=930) (actual time=0.010..0.445 rows=2100 loops=1)
Planning Time: 0.073 ms
Execution Time: 0.550 ms

Important Notes

  • Use offset based pagination with caution since it will produce inconsistent results. Make sure your application can tolerate inconsistent results. Otherwise don't use offset based pagination.
  • Two kinds of inconsistencies are possible, 1. Duplicate records 2. Missing records
  • Inconsistency due to insert - Consider a user navigates from page n to n+1 while simultaneously a new element is inserted into page n. The last row in the page n is pushed to page n+1. This results is duplicate. Also the new row that was inserted on page n will be missed.
  • Inconsistency due to update/delete - Consider a row removed from page n simultaneously as the user moves to page n+1. The first row of page n+1 will be shifted to page n and will be missed.
  • Table scan for large offsets - Large offsets are expensive. Even with index, the database must scan through storage upto the offset. Count query can have significant performance impact.
  • Dead tuples - Rows that are frequently updated or deleted are marked as dead tuples. Database will not do a hard delete until a vacuum operation is done. These dead tuples may cause a performance degradation as it needs to scan through the table. Checkout my previous article on what is dead tuples and how to analyze and improve performance when you have big number of dead rows.

Did you find this article valuable?

Support Suresh Kumar by becoming a sponsor. Any amount is appreciated!