Hidden Features of PostgreSQL

Tags: Database, Postgresql, Rdbms, Postgresql 9.3

Database Problem Overview


I'm surprised this hasn't been posted yet. Any interesting tricks that you know about in Postgres? Obscure config options and scaling/perf tricks are particularly welcome.

I'm sure we can beat the 9 comments on the corresponding MySQL thread :)

Database Solutions


Solution 1 - Database

Since postgres is a lot more sane than MySQL, there are not that many "tricks" to report on ;-)

The manual has some nice performance tips.

A few other performance related things to keep in mind:

  • Make sure autovacuum is turned on
  • Make sure you've gone through your postgresql.conf (effective_cache_size, shared_buffers, work_mem ... lots of options there to tune).
  • Use pgpool or pgbouncer to keep your "real" database connections to a minimum
  • Learn how EXPLAIN and EXPLAIN ANALYZE work. Learn to read the output (see the sketch right after this list).
  • CLUSTER sorts data on disk according to an index. Can dramatically improve performance of large (mostly) read-only tables. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered.
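
For example (the table and index names below are made up purely for illustration):

EXPLAIN ANALYZE
SELECT * FROM customer WHERE name = 'test';

-- rewrite the table in the physical order of an existing index
CLUSTER customer USING customer_name_idx;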

Here are a few things I've found useful that aren't config or performance related per se.

To see what's currently happening:

select * from pg_stat_activity;

Search for functions by name (e.g. everything starting with pg_):

select * from pg_proc WHERE proname ~* '^pg_.*'

Find size of database:

select pg_database_size('postgres');
select pg_size_pretty(pg_database_size('postgres'));

Find size of all databases:

select datname, pg_size_pretty(pg_database_size(datname)) as size
  from pg_database;

Find size of tables and indexes:

select pg_size_pretty(pg_relation_size('public.customer'));

Or, to list all tables and indexes (probably easier to make a view of this):

select schemaname, relname, type,
    pg_size_pretty(pg_relation_size(schemaname || '.' || relname)) as size
  from (select schemaname, relname, 'table' as type
          from pg_stat_user_tables
        union all
        select schemaname, indexrelname as relname, 'index' as type
          from pg_stat_user_indexes) x;

Oh, and you can nest transactions (via savepoints), roll back partial transactions, and more:

test=# begin;
BEGIN
test=# select count(*) from customer where name='test';
 count 
-------
     0
(1 row)
test=# insert into customer (name) values ('test');
INSERT 0 1
test=# savepoint foo;
SAVEPOINT
test=# update customer set name='john';
UPDATE 3
test=# rollback to savepoint foo;
ROLLBACK
test=# commit;
COMMIT
test=# select count(*) from customer where name='test';
 count 
-------
     1
(1 row)

Solution 2 - Database

The easiest trick to make PostgreSQL perform a lot better (apart from setting and using proper indexes, of course) is just to give it more RAM to work with (if you have not done so already). On most default installations the value for shared_buffers is way too low (in my opinion). You can set

> shared_buffers

in postgresql.conf. When it is given as a plain number it is counted in 8 kB pages, so divide that number by 128 to get an approximation of the amount of memory (in MB) Postgres can claim. If you up it enough, this will make PostgreSQL fly. Don't forget to restart PostgreSQL.
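
For example, a rough sketch of what this looks like in postgresql.conf (the values here are purely illustrative; pick something appropriate for your machine's RAM):

# 32768 pages of 8 kB each = 32768 / 128 = 256 MB
shared_buffers = 32768        # counted in 8 kB pages when no unit is given
# shared_buffers = 256MB      # equivalent form using explicit units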

On Linux systems, if PostgreSQL then won't start again you have probably hit the kernel.shmmax limit. Set it higher with

sysctl -w kernel.shmmax=xxxx

To make this persist between boots, add a kernel.shmmax entry to /etc/sysctl.conf.

A whole bunch of Postgresql tricks can be found here:

Solution 3 - Database

Postgres has a very powerful datetime handling facility thanks to its INTERVAL support.

For example:

select NOW(), NOW() + '1 hour';
              now              |           ?column?            
-------------------------------+-------------------------------
 2009-04-18 01:37:49.116614+00 | 2009-04-18 02:37:49.116614+00
(1 row)



select current_date, (current_date + interval '1 year')::date;
    date    |    date    
------------+------------
 2014-10-17 | 2015-10-17
(1 row)

You can cast many strings to an INTERVAL type.
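
For example, a few interval casts and some simple arithmetic (all of these should work out of the box):

select interval '2 weeks',
       interval '1 day 3 hours',
       '90 minutes'::interval,
       now() - '3 days'::interval;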

Solution 4 - Database

COPY

I'll start. Whenever I switch to Postgres from SQLite, I usually have some really big datasets. The key is to load your tables with COPY FROM rather than doing INSERTS. See documentation:

http://www.postgresql.org/docs/8.1/static/sql-copy.html

The following example copies a table to the client using the vertical bar (|) as the field delimiter:

COPY country TO STDOUT WITH DELIMITER '|';

To copy data from a file into the country table:

COPY country FROM '/usr1/proj/bray/sql/country_data';

See also here: https://stackoverflow.com/questions/364017/faster-bulk-inserts-in-sqlite3/759866#759866

Solution 5 - Database

  • My favorite by far is generate_series: at last, a clean way to generate dummy rowsets (see the sketch after this list).

  • Ability to use a correlated value in a LIMIT clause of a subquery:

      SELECT  (
              SELECT  exp_word
              FROM    mytable
              OFFSET id
              LIMIT 1
              )
      FROM    othertable
    
  • Ability to use multiple parameters in custom aggregates (not covered by the documentation): see the article in my blog for an example.
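
As a small generate_series sketch (the column names are arbitrary), this produces ten dummy rows with an id and one date per day:

select i as id,
       date '2009-01-01' + i as day
from generate_series(1, 10) as s(i);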

Solution 6 - Database

One of the things I really like about Postgres is some of the data types supported in columns. For example, there are column types made for storing Network Addresses (http://www.postgresql.org/docs/8.3/interactive/datatype-net-types.html) and Arrays (http://www.postgresql.org/docs/8.3/interactive/arrays.html). The corresponding functions for these column types (Network Addresses: http://www.postgresql.org/docs/8.3/interactive/functions-net.html, Arrays: http://www.postgresql.org/docs/8.3/interactive/functions-array.html) let you do a lot of complex operations inside queries that you'd otherwise have to do by processing results through code in MySQL or other database engines.

Solution 7 - Database

Arrays are really cool once you get to know 'em. Let's say you would like to store some hyperlinks between pages. You might start by thinking about creating a Table kinda like this:

CREATE TABLE hyper.links (
     tail INT4,
     head INT4
);

If you needed to index the tail column, and you had, say 200,000,000 links-rows (like wikipedia would give you), you would find yourself with a huge Table and a huge Index.

However, with PostgreSQL, you could use this Table format instead:

CREATE TABLE hyper.links (
     tail INT4,
     head INT4[],
     PRIMARY KEY(tail)
);

To get all heads for a link you could send a command like this (unnest() is standard since 8.4):

SELECT unnest(head) FROM hyper.links WHERE tail = $1;

This query is surprisingly fast when compared with the first option (unnest() is fast and the Index is way, way smaller). Furthermore, your Table and Index will take up much less RAM and disk space, especially when your Arrays are so long that they are compressed into a TOAST table. Arrays are really powerful.

Note: while unnest() will generate rows out of an Array, array_agg() will aggregate rows into an Array.
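
For example, roughly (reusing the hyper.links table from above), you can expand the arrays into rows and then aggregate them back per tail:

select tail, array_agg(h) as head
from (select tail, unnest(head) as h from hyper.links) as t
group by tail;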

Solution 8 - Database

Materialized Views are pretty easy to set up:

CREATE VIEW my_view AS SELECT id, AVG(my_col) FROM my_table GROUP BY id;
CREATE TABLE my_matview AS SELECT * FROM my_view;

That creates a new table, my_matview, with the columns and values of my_view. Triggers or a cron script can then be set up to keep the data up to date, or if you're lazy:

TRUNCATE my_matview;
INSERT INTO my_matview SELECT * FROM my_view;
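
For what it's worth, PostgreSQL 9.3 and later also ship native materialized views, which replace the manual TRUNCATE/INSERT dance above (the name my_matview_native below is made up):

CREATE MATERIALIZED VIEW my_matview_native AS
  SELECT id, AVG(my_col) FROM my_table GROUP BY id;

REFRESH MATERIALIZED VIEW my_matview_native;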

Solution 9 - Database

  • Inheritance, in fact multiple inheritance (as in parent-child table "inheritance", not the 1-to-1 relation inheritance that many web frameworks implement when working with Postgres).

  • PostGIS (spatial extension), a wonderful add-on that offers a comprehensive set of geometry functions and coordinate storage out of the box. Widely used in many open-source geo libraries (e.g. OpenLayers, MapServer, Mapnik, etc.) and definitely way better than MySQL's spatial extensions.

  • Writing procedures in different languages, e.g. C, Python, Perl, etc. (makes your life easier if you're a developer and not a DB admin).

Also all procedures can be stored externally (as modules) and can be called or imported at runtime by specified arguments. That way you can source control the code and debug the code easily.

  • A huge and comprehensive catalogue of all objects implemented in your database (i.e. tables, constraints, indexes, etc.).

I always find it immensely helpful to run a few queries and get all the meta info, e.g. constraint names and the fields they are defined on, index names, etc.

For me it all becomes extremely handy when I have to load new data or do massive updates in big tables (I automatically disable triggers and drop indexes), and then recreate them easily after processing has finished. Someone did an excellent job of writing a handful of these queries (see the sketch after this list):

http://www.alberton.info/postgresql_meta_info.html

  • Multiple schemas under one database. You can use them if your database has a large number of tables; think of schemas as categories. All tables (regardless of their schema) have access to all other tables and functions present in the parent database.
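
As a sketch of the kind of catalog query mentioned above (the filters and aliases are just illustrative):

-- constraints and the tables they belong to
select conname, conrelid::regclass as table_name, contype
from pg_constraint
order by table_name;

-- index definitions per table in the public schema
select tablename, indexname, indexdef
from pg_indexes
where schemaname = 'public';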

Solution 10 - Database

You don't need to learn how to decipher "explain analyze" output, there is a tool: http://explain.depesz.com

Solution 11 - Database

pg_size_pretty() formats a raw byte count into a human-readable size:

select pg_size_pretty(200 * 1024);

Solution 12 - Database

pgcrypto: more cryptographic functions than many programming languages' crypto modules provide, all accessible direct from the database. It makes cryptographic stuff incredibly easy to Just Get Right.
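
A minimal sketch of what pgcrypto gives you, assuming the extension is installed (CREATE EXTENSION pgcrypto on 9.1+):

select crypt('my password', gen_salt('bf', 8));          -- bcrypt password hash
select encode(digest('some text', 'sha256'), 'hex');     -- SHA-256 digest as hex
select pgp_sym_encrypt('secret data', 'a passphrase');   -- symmetric PGP encryption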

Solution 13 - Database

A database can be copied with:

createdb -T old_db new_db

The documentation says:

> this is not (yet) intended as a general-purpose "COPY DATABASE" facility

but it works well for me and is much faster than

createdb new_db
pg_dump old_db | psql new_db

Solution 14 - Database

Memory storage for throw-away data/global variables

You can create a tablespace that lives in RAM, and tables (possibly unlogged, in 9.1+) in that tablespace to store throw-away data/global variables that you'd like to share across sessions.

http://magazine.redhat.com/2007/12/12/tip-from-an-rhce-memory-storage-on-postgresql/
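
Roughly, the idea looks like this (the mount point and names are hypothetical; see the linked article for the full setup, and remember that RAM-backed storage disappears on reboot):

-- assumes /mnt/ramdisk is a RAM-backed mount (e.g. tmpfs) owned by the postgres user
CREATE TABLESPACE ramspace LOCATION '/mnt/ramdisk';

-- unlogged (9.1+): skips WAL, so writes are cheap and contents are lost on crash
CREATE UNLOGGED TABLE session_vars (
    key   text PRIMARY KEY,
    value text
) TABLESPACE ramspace;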

Advisory locks

These are documented in an obscure area of the manual:

http://www.postgresql.org/docs/9.0/interactive/functions-admin.html

They're occasionally faster than acquiring multitudes of row-level locks, and they can be used to work around cases where FOR UPDATE isn't implemented (such as recursive CTE queries).
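
A minimal sketch (the key 42 is arbitrary; it is just an application-defined bigint):

-- take an exclusive session-level advisory lock on key 42
select pg_advisory_lock(42);
-- ... do work that must not run concurrently for this key ...
select pg_advisory_unlock(42);

-- non-blocking variant: returns true only if the lock was obtained
select pg_try_advisory_lock(42);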

Solution 15 - Database

1.) When you need to attach a note to a query, you can use an embedded comment:

SELECT /* my comments, that I would to see in PostgreSQL log */
       a, b, c
   FROM mytab;

2.) Remove trailing spaces from all text and varchar fields in a database.

do $$
declare
    selectrow record;
begin
for selectrow in
select 
       'UPDATE '||c.table_name||' SET '||c.COLUMN_NAME||'=TRIM('||c.COLUMN_NAME||')  WHERE '||c.COLUMN_NAME||' ILIKE ''% '' ' as script
from (
       select 
          table_name,COLUMN_NAME
       from 
          INFORMATION_SCHEMA.COLUMNS 
       where 
          table_name LIKE 'tbl%'  and (data_type='text' or data_type='character varying' )
     ) c
loop
execute selectrow.script;
end loop;
end;
$$;

3.) We can use a window function to remove duplicate rows very effectively:

DELETE FROM tab 
  WHERE id IN (SELECT id 
                  FROM (SELECT row_number() OVER (PARTITION BY column_with_duplicate_values), id 
                           FROM tab) x 
                 WHERE x.row_number > 1);

A PostgreSQL-optimized version (using ctid):

DELETE FROM tab 
  WHERE ctid = ANY(ARRAY(SELECT ctid 
                  FROM (SELECT row_number() OVER (PARTITION BY column_with_duplicate_values), ctid 
                           FROM tab) x 
                 WHERE x.row_number > 1));

4.) When we need to identify the server's state (for example, whether it is a standby still in recovery), we can use:

SELECT pg_is_in_recovery();

5.) Get a function's DDL:

select pg_get_functiondef((select oid from pg_proc where proname = 'f1'));

6.) Safely changing a column's data type in PostgreSQL

create table test(id varchar );
insert into test values('1');
insert into test values('11');
insert into test values('12');

select * from test;

-- Result --
 id (character varying)
-------------------------
 1
 11
 12

You can see from the above table that I used the data type 'character varying' for the 'id' column. But that was a mistake, because I am always storing integers as id, so using varchar here is bad practice. Let's try to change the column type to integer.

ALTER TABLE test ALTER COLUMN id TYPE integer;

But it returns:

> ERROR: column "id" cannot be cast automatically to type integer
> SQL state: 42804
> Hint: Specify a USING expression to perform the conversion

That means we can't simply change the data type, because data is already there in the column. Since the existing data is of type 'character varying', Postgres cannot assume it is valid integer data, even though we only ever entered integers. So, as Postgres suggests, we can use a USING expression to cast the data to integers.

ALTER TABLE test ALTER COLUMN id TYPE integer USING (id::integer);

It works.

7.) Know who is connected to the database
This is more or less a monitoring command. To know which user is connected to which database, including their IP and port, use the following SQL:

SELECT datname,usename,client_addr,client_port FROM pg_stat_activity ;

8.) Reloading PostgreSQL Configuration files without Restarting Server

PostgreSQL configuration parameters are located in special files like postgresql.conf and pg_hba.conf. Often you may need to change these parameters, and for the change to take effect the configuration files must be reloaded. Of course, restarting the server will do it, but in a production environment it is not desirable to restart a database that is being used by thousands of clients just to change a few parameters. In such situations, we can reload the configuration files without restarting the server by using the following function:

select pg_reload_conf();

> Remember, this won't work for all parameters; some parameter changes require a full server restart to take effect.

9.) Getting the data directory path of the current Database cluster

It is possible to have multiple instances (clusters) of PostgreSQL set up on one system, generally on different ports. In such cases, finding which data directory (physical storage directory) is used by which instance can be a hectic task. We can run the following command in any database of the cluster we're interested in to get its directory path:

SHOW data_directory;

The data directory can be changed, but only by editing data_directory in postgresql.conf (or by starting the server with a different -D option), and that requires a server restart; it cannot be changed at runtime with SET.

10.) Check whether a string is a valid DATE or not

create or replace function is_date(s varchar) returns boolean as $$
begin
  perform s::date;
  return true;
exception when others then
  return false;
end;
$$ language plpgsql;

Usage: the following will return True

select is_date('12-12-2014')
select is_date('12/12/2014')
select is_date('20141212')
select is_date('2014.12.12')
select is_date('2014,12,12')

11.) Change ownership in PostgreSQL (reassign everything owned by one role to another)

REASSIGN OWNED BY sa  TO postgres;

12.) PGADMIN PLPGSQL DEBUGGER

Well explained here

Solution 16 - Database

This is my list of favorite lesser-known features.

Transactional DDL

Nearly every SQL statement is transactional in Postgres. If you turn off autocommit the following is possible:

drop table customer_orders;
rollback;
select *
from customer_orders;

Range types and exclusion constraint

To my knowledge Postgres is the only RDBMS that lets you create a constraint that checks if two ranges overlap. An example is a table that contains product prices with a "valid from" and "valid until" date:

create table product_price
(
   price_id      serial        not null primary key,
   product_id    integer       not null references products,
   price         numeric(16,4) not null,
   valid_during  daterange not null
);

Populate it with a couple of prices and their validity ranges:

insert into product_price 
  (product_id, price, valid_during)
values 
  (1, 100.0, '[2013-01-01,2014-01-01)'),
  (1,  90.0, '[2014-01-01,)');

-- querying is simple and can use an index on the valid_during column
select price
from product_price
where product_id = 42
  and valid_during @> date '2014-10-17';

The execution plan for the above on a table with 700,000 rows:

Index Scan using check_price_range on public.product_price  (cost=0.29..3.29 rows=1 width=6) (actual time=0.605..0.728 rows=1 loops=1)
  Output: price
  Index Cond: ((product_price.valid_during @> '2014-10-17'::date) AND (product_price.product_id = 42))
  Buffers: shared hit=17
Total runtime: 0.772 ms

To avoid inserting rows with overlapping validity ranges, a simple (and efficient) unique constraint can be defined:

alter table product_price
  add constraint check_price_range 
  exclude using gist (product_id with =, valid_during with &&);

NoSQL features

The hstore extension offers a flexible and very fast key/value store that can be used when parts of the database need to be "schema-less". JSON is another option to store data in a schema-less fashion.
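
A tiny hstore sketch (the table and data are made up purely for illustration, and the extension must be installed with CREATE EXTENSION hstore):

create table product_attrs (
    product_id integer primary key,
    attrs      hstore
);

insert into product_attrs values (1, 'color => red, size => L');

select attrs -> 'color'        -- value lookup
from product_attrs
where attrs ? 'size';          -- key-existence test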

Infinity

Instead of requiring a "real" date far in the future, Postgres can compare dates to infinity. E.g., when not using a date range, you can do the following:

insert into product_price 
  (product_id, price, valid_from, valid_until)
values 
  (1,  90.0, date '2014-01-01', date 'infinity');

Writeable common table expressions

You can delete, insert and select in a single statement:

with old_orders as (
   delete from orders
   where order_date < current_date - interval '10' year
   returning *
), archived_rows as (
   insert into archived_orders 
   select * 
   from old_orders
   returning *
)
select *
from archived_rows;

The above will delete all orders older than 10 years, move them to the archived_orders table and then display the rows that were moved.

Solution 17 - Database

Renaming an existing database is convenient (something MySQL cannot do as easily). Just use the following command:

ALTER DATABASE name RENAME TO new_name;

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | ramanujan | View Question on Stackoverflow
Solution 1 - Database | tommym | View Answer on Stackoverflow
Solution 2 - Database | ChristopheD | View Answer on Stackoverflow
Solution 3 - Database | Yann Ramin | View Answer on Stackoverflow
Solution 4 - Database | ramanujan | View Answer on Stackoverflow
Solution 5 - Database | Quassnoi | View Answer on Stackoverflow
Solution 6 - Database | Chad Birch | View Answer on Stackoverflow
Solution 7 - Database | Nicholas Leonard | View Answer on Stackoverflow
Solution 8 - Database | Cameron | View Answer on Stackoverflow
Solution 9 - Database | Nakh | View Answer on Stackoverflow
Solution 10 - Database | AAS | View Answer on Stackoverflow
Solution 11 - Database | Michael Buen | View Answer on Stackoverflow
Solution 12 - Database | kquinn | View Answer on Stackoverflow
Solution 13 - Database | Kim Rutherford | View Answer on Stackoverflow
Solution 14 - Database | Denis de Bernardy | View Answer on Stackoverflow
Solution 15 - Database | Vivek S. | View Answer on Stackoverflow
Solution 16 - Database | a_horse_with_no_name | View Answer on Stackoverflow
Solution 17 - Database | Moon_of_father | View Answer on Stackoverflow