How big is too big for a PostgreSQL table?

SqlRuby on-RailsPerformancePostgresqlDatabase Design

Sql Problem Overview


I'm working on the design for a RoR project for my company, and our development team has already run into a bit of a debate about the design, specifically the database.

We have a model called Message that needs to be persisted. It's a very, very small model with only three db columns other than the id, however there will likely be A LOT of these models when we go to production. We're looking at as much as 1,000,000 insertions per day. The models will only ever be searched by two foreign keys on them which can be indexed. As well, the models never have to be deleted, but we also don't have to keep them once they're about three months old.

So, what we're wondering is if implementing this table in Postgres will present a significant performance issue? Does anyone have experience with very large SQL databases to tell us whether or not this will be a problem? And if so, what alternative should we go with?

Sql Solutions


Solution 1 - Sql

Rows per a table won't be an issue on it's own.

So roughly speaking 1 million rows a day for 90 days is 90 million rows. I see no reason Postgres can't deal with that, without knowing all the details of what you are doing.

Depending on your data distribution you can use a mixture of indexes, filtered indexes, and table partitioning of some kind to speed thing up once you see what performance issues you may or may not have. Your problem will be the same on any other RDMS that I know of. If you only need 3 months worth of data design in a process to prune off the data you don't need any more. That way you will have a consistent volume of data on the table. Your lucky you know how much data will exist, test it for your volume and see what you get. Testing one table with 90 million rows may be as easy as:

select x,1 as c2,2 as c3
from generate_series(1,90000000) x;

https://wiki.postgresql.org/wiki/FAQ

Limit	Value
Maximum Database Size	    Unlimited
Maximum Table Size	        32 TB
Maximum Row Size	        1.6 TB
Maximum Field Size	        1 GB
Maximum Rows per Table	    Unlimited
Maximum Columns per Table	250 - 1600 depending on column types
Maximum Indexes per Table	Unlimited

Solution 2 - Sql

Another way to speed up your queries significantly on a table with > 100 million rows is to cluster the table on the index that is most often used in your queries. Do this in your database's "off" hours. We have a table with > 218 million rows and have found 30X improvements.

Also, for a very large table, it's a good idea to create an index on your foreign keys.

EXAMPLE:

  1. Assume we have a table named investment in a database named ccbank.
  2. Assume the index most used in our queries is (bankid,record_date)

Here are the steps to create and cluster an index:

  1. psql -c "drop index investment_bankid_rec_dt_idx;" ccbank
  2. psql -c "create index investment_bankid_rec_dt_idx on investment(bankid, record_date);"
  3. psql -c "cluster investment_bankid_rec_dt_idx on investment;"
  4. vacuumdb -d ccbank -z -v -t investment

In steps 1-2 we replace the old index with a new, optimized one. In step 3 we cluster the table: this basically puts the DB table in the physical order of the index, so that when PostgreSQL performs a query it caches the most likely next rows. In step 4 we vacuum the database to reset the statistics for the query planner.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDylan KarrView Question on Stackoverflow
Solution 1 - SqlKuberchaunView Answer on Stackoverflow
Solution 2 - SqlJames DohertyView Answer on Stackoverflow