Slow simple update query on PostgreSQL database with 3 million rows

Tags: Sql, Postgresql, Sql Update

Sql Problem Overview


I am trying a simple UPDATE table SET column1 = 0 on a table with about 3 million rows on Postgres 8.4, but it is taking forever to finish. It has been running for more than 10 minutes.

Before that, I tried running the VACUUM and ANALYZE commands on the table, and I also tried creating some indexes (although I doubt this will make any difference in this case), but none of it seems to help.

Any other ideas?

Update:

This is the table structure:

CREATE TABLE myTable
(
  id bigserial NOT NULL,
  title text,
  description text,
  link text,
  "type" character varying(255),
  generalFreq real,
  generalWeight real,
  author_id bigint,
  status_id bigint,
  CONSTRAINT resources_pkey PRIMARY KEY (id),
  CONSTRAINT author_pkey FOREIGN KEY (author_id)
      REFERENCES users (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT c_unique_status_id UNIQUE (status_id)
);

I am trying to run UPDATE myTable SET generalFreq = 0;

Sql Solutions


Solution 1 - Sql

I have to update tables of 1 or 2 billion rows, with various values for each row. Each run makes ~100 million changes (10%). My first try was to group them in transactions of 300K updates directly on a specific partition, as PostgreSQL does not always optimize prepared queries if you use partitions.

  1. Transactions of a bunch of "UPDATE myTable SET myField=value WHERE myId=id".
    Gives 1,500 updates/sec, which means each run would take at least 18 hours.
  2. The HOT updates solution as described here, with FILLFACTOR=50. Gives 1,600 updates/sec. I use SSDs, so it's a costly improvement as it doubles the storage size.
  3. Insert the updated values into a temporary table and merge them afterwards with UPDATE...FROM. Gives 18,000 updates/sec if I do a VACUUM for each partition; 100,000 updates/sec otherwise. Cooool.
    Here is the sequence of operations:

CREATE TEMP TABLE tempTable (
  id BIGINT NOT NULL,
  -- ...the field(s) to be updated...
  CONSTRAINT tempTable_pkey PRIMARY KEY (id)
);

Accumulate a bunch of updates in a buffer, sized according to the available RAM. When the buffer is full, or you need to switch to another table/partition, or the run is complete:

COPY tempTable FROM buffer;
UPDATE myTable a SET field(s)=value(s) FROM tempTable b WHERE a.id=b.id;
COMMIT;
TRUNCATE TABLE tempTable;
VACUUM FULL ANALYZE myTable;

That means a run now takes 1.5 hours instead of 18 hours for 100 million updates, vacuum included. To save time, it's not necessary to make a VACUUM FULL at the end, but even a fast regular VACUUM is useful to keep the database's transaction IDs under control and avoid unwanted autovacuum runs during rush hours.
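
For the question's table, a concrete instantiation of this pattern could look like the sketch below. The staging file /tmp/updates.csv and its two-column layout are assumptions for illustration; note that COPY ... FROM 'file' reads on the server, so from a client you would use psql's \copy instead.

CREATE TEMP TABLE tempTable (
  id          BIGINT NOT NULL,
  generalFreq REAL,
  CONSTRAINT tempTable_pkey PRIMARY KEY (id)
);

-- bulk-load one batch of (id, new value) pairs
COPY tempTable (id, generalFreq) FROM '/tmp/updates.csv' WITH CSV;

-- merge the staged values into the main table in one set-based statement
UPDATE myTable a SET generalFreq = b.generalFreq
FROM tempTable b
WHERE a.id = b.id;

TRUNCATE TABLE tempTable;   -- ready for the next batch
VACUUM ANALYZE myTable;     -- a plain vacuum between batches is enough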

Solution 2 - Sql

Take a look at this answer: https://stackoverflow.com/questions/3100072/postgresql-slow-on-a-large-table-with-arrays-and-lots-of-updates/3100232#3100232

First set a better FILLFACTOR, do a VACUUM FULL to force a table rewrite, and check the HOT updates after your UPDATE query (note that unquoted table names are folded to lower case in the catalog):

SELECT n_tup_hot_upd, * FROM pg_stat_user_tables WHERE relname = 'mytable';

HOT updates are much faster when you have a lot of records to update. More information about HOT can be found in this article.

PS: You need version 8.3 or newer.
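
Put together, the sequence might look like this sketch; FILLFACTOR=50 is the value from the linked answer, and whether the extra space pays off depends on your workload:

ALTER TABLE myTable SET (FILLFACTOR = 50);  -- leave free space on each page for HOT updates
VACUUM FULL myTable;                        -- rewrite the table so the new fillfactor takes effect
UPDATE myTable SET generalFreq = 0;
SELECT n_tup_upd, n_tup_hot_upd             -- how many of the updates were HOT?
FROM pg_stat_user_tables
WHERE relname = 'mytable';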

Solution 3 - Sql

After waiting 35 minutes for my UPDATE query to finish (and it still hadn't), I decided to try something different. So what I did was run this command:

CREATE TABLE table2 AS
SELECT id, title, description, link, "type",
       0::real AS generalFreq,   -- the field I wanted to update
       generalWeight, author_id, status_id
FROM myTable;

Then I added indexes, dropped the old table, and renamed the new one to take its place. That took only 1.7 minutes to process, plus some extra time to recreate the indexes and constraints. But it did help! :)

Of course, that only worked because nobody else was using the database. I would need to lock the table first if this were a production environment.
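
The swap itself could look like the following sketch. The constraint names come from the question's schema; it assumes nothing else references myTable, and any remaining indexes would be recreated the same way:

BEGIN;
DROP TABLE myTable;
ALTER TABLE table2 RENAME TO myTable;
ALTER TABLE myTable ADD CONSTRAINT resources_pkey PRIMARY KEY (id);
ALTER TABLE myTable ADD CONSTRAINT c_unique_status_id UNIQUE (status_id);
ALTER TABLE myTable ADD CONSTRAINT author_pkey FOREIGN KEY (author_id) REFERENCES users (id);
COMMIT;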

Solution 4 - Sql

Today I spent many hours on a similar issue. I found a solution: drop all the constraints/indexes before the update. No matter whether the column being updated is indexed or not, PostgreSQL seems to update all the indexes for every updated row. After the update is finished, add the constraints/indexes back.
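
For the question's table that could look like this sketch; c_unique_status_id is from the question's schema, while myTable_type_idx stands in for a hypothetical secondary index:

ALTER TABLE myTable DROP CONSTRAINT c_unique_status_id;
DROP INDEX IF EXISTS myTable_type_idx;   -- hypothetical secondary index

UPDATE myTable SET generalFreq = 0;      -- far less index maintenance per row

ALTER TABLE myTable ADD CONSTRAINT c_unique_status_id UNIQUE (status_id);
CREATE INDEX myTable_type_idx ON myTable ("type");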

Solution 5 - Sql

Try this (note that generalFreq starts as type REAL, and stays the same):

ALTER TABLE myTable ALTER COLUMN generalFreq TYPE REAL USING 0;

This will rewrite the table, similar to a DROP + CREATE, and rebuild all indexes, all in one command. It is much faster (about 2x), and you don't have to deal with dependencies and recreating indexes and other stuff, though it does lock the table (ACCESS EXCLUSIVE, i.e. a full lock) for the duration. Or maybe that's what you want, if you want everything else to queue up behind it. If you aren't updating "too many" rows, this way is slower than a plain UPDATE.

Solution 6 - Sql

The first thing I'd suggest (from https://dba.stackexchange.com/questions/118178/does-updating-a-row-with-the-same-value-actually-update-the-row) is to update only the rows that "need" it, e.g.:

 UPDATE myTable SET generalFreq = 0 where generalFreq != 0;

(you might also need an index on generalFreq). You'll then update fewer rows, though not if the values are all non-zero already. Updating fewer rows "can help" because otherwise PostgreSQL rewrites every row and updates all indexes regardless of whether the value actually changed. A partial index, as sketched below, keeps that WHERE clause cheap.
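
A minimal sketch of such a partial index, assuming most rows are already 0; the index name is illustrative:

CREATE INDEX mytable_generalfreq_nonzero
ON myTable (generalFreq)
WHERE generalFreq != 0;   -- only indexes the rows the UPDATE will touch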

Another option: if the stars align in terms of defaults and NOT NULL constraints, you can drop the old column and create another one by adjusting only metadata: instant.
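
A sketch of that metadata-only trick, assuming nothing else (views, indexes, constraints) depends on the column. On PostgreSQL 11+, ADD COLUMN with a constant default is also metadata-only; on older versions, only a nullable column with no default is instant:

ALTER TABLE myTable DROP COLUMN generalFreq;       -- metadata-only; old values are just marked dead
ALTER TABLE myTable ADD COLUMN generalFreq real;   -- instant, but values start as NULL
-- on PostgreSQL 11+, this variant is also instant and yields the desired 0:
-- ALTER TABLE myTable ADD COLUMN generalFreq real DEFAULT 0;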

Solution 7 - Sql

How are you running it? If you are looping over each row and issuing a separate UPDATE statement, you are potentially running millions of individual updates, which is why it performs so slowly.

If you update all records in a single UPDATE statement, it will run a lot faster; and if that process is still slow, then it's probably down to your hardware more than anything else. 3 million is a lot of records.
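
For contrast, a sketch of the two shapes; the loop is pseudocode for whatever the client application would do:

-- slow: one statement (and one round trip) per row
-- for each id in ids:
--     UPDATE myTable SET generalFreq = 0 WHERE id = :id;

-- fast: one set-based statement for all rows
UPDATE myTable SET generalFreq = 0;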

Solution 8 - Sql

In my tests I noticed that a big update of more than 200,000 rows is slower than two updates of 100,000 rows, even with a temporary table.

My solution is to loop: in each iteration, create a temporary table of 200,000 rows, compute the new values in that table, then update the main table with the new values, and so on.

Every 2,000,000 rows, I manually run VACUUM ANALYSE mytable; I noticed that autovacuum doesn't do its job for such updates.
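
A sketch of the batching idea using simple id ranges; the 200,000-row batch size and the vacuum cadence are the values from this answer, and the roughly contiguous bigserial ids are an assumption:

-- batch 1
UPDATE myTable SET generalFreq = 0 WHERE id > 0      AND id <= 200000;
-- batch 2
UPDATE myTable SET generalFreq = 0 WHERE id > 200000 AND id <= 400000;
-- ...and so on; after roughly every ten batches (~2,000,000 rows):
VACUUM ANALYSE myTable;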

Solution 9 - Sql

Try:

UPDATE myTable SET generalFreq = 0.0;

Maybe it is a casting issue.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Ricardo | View Question on Stackoverflow
Solution 1 - Sql | Le Droid | View Answer on Stackoverflow
Solution 2 - Sql | Frank Heikens | View Answer on Stackoverflow
Solution 3 - Sql | Ricardo | View Answer on Stackoverflow
Solution 4 - Sql | Tregoreg | View Answer on Stackoverflow
Solution 5 - Sql | Fabiano Bonin | View Answer on Stackoverflow
Solution 6 - Sql | rogerdpack | View Answer on Stackoverflow
Solution 7 - Sql | Tom Gullen | View Answer on Stackoverflow
Solution 8 - Sql | Rolintocour | View Answer on Stackoverflow
Solution 9 - Sql | Chocolim | View Answer on Stackoverflow