How to copy from CSV file to PostgreSQL table with headers in CSV file?

PostgresqlCsvPostgresql Copy

Postgresql Problem Overview


I want to copy a CSV file to a Postgres table. There are about 100 columns in this table, so I do not want to rewrite them if I don't have to.

I am using the \copy table from 'table.csv' delimiter ',' csv; command but without a table created I get ERROR: relation "table" does not exist. If I add a blank table I get no error, but nothing happens. I tried this command two or three times and there was no output or messages, but the table was not updated when I checked it through PGAdmin.

Is there a way to import a table with headers included like I am trying to do?

Postgresql Solutions


Solution 1 - Postgresql

This worked. The first row had column names in it.

COPY wheat FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADER

Solution 2 - Postgresql

With the Python library pandas, you can easily create column names and infer data types from a csv file.

from sqlalchemy import create_engine
import pandas as pd
    
engine = create_engine('postgresql://user:pass@localhost/db_name')
df = pd.read_csv('/path/to/csv_file')
df.to_sql('pandas_db', engine)

The if_exists parameter can be set to replace or append to an existing table, e.g. df.to_sql('pandas_db', engine, if_exists='replace'). This works for additional input file types as well, docs here and here.

Solution 3 - Postgresql

Alternative by terminal with no permission

The pg documentation at NOTES say

> The path will be interpreted relative to the working directory of the server process (normally the cluster's data directory), not the client's working directory.

So, gerally, using psql or any client, even in a local server, you have problems ... And, if you're expressing COPY command for other users, eg. at a Github README, the reader will have problems ...

The only way to express relative path with client permissions is using STDIN,

> When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.

as remembered here:

psql -h remotehost -d remote_mydb -U myuser -c \
   "copy mytable (column1, column2) from STDIN with delimiter as ','" \
   < ./relative_path/file.csv

Solution 4 - Postgresql

I have been using this function for a while with no problems. You just need to provide the number columns there are in the csv file, and it will take the header names from the first row and create the table for you:

create or replace function data.load_csv_file
	(
		target_table  text, -- name of the table that will be created
		csv_file_path text,
		col_count     integer
	)

	returns void

as $$

declare
	iter      integer; -- dummy integer to iterate columns with
	col       text; -- to keep column names in each iteration
	col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet

begin
	set schema 'data';

	create table temp_table ();

	-- add just enough number of columns
	for iter in 1..col_count
	loop
		execute format ('alter table temp_table add column col_%s text;', iter);
	end loop;

	-- copy the data from csv file
	execute format ('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_file_path);

	iter := 1;
	col_first := (select col_1
	              from temp_table
	              limit 1);

	-- update the column names based on the first row which has the column names
	for col in execute format ('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
	loop
		execute format ('alter table temp_table rename column col_%s to %s', iter, col);
		iter := iter + 1;
	end loop;

	-- delete the columns row // using quote_ident or %I does not work here!?
	execute format ('delete from temp_table where %s = %L', col_first, col_first);

	-- change the temp table name to the name given as parameter, if not blank
	if length (target_table) > 0 then
		execute format ('alter table temp_table rename to %I', target_table);
	end if;
end;

$$ language plpgsql;

Solution 5 - Postgresql

## csv with header
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
  -c "\COPY TB_NAME FROM 'data_sample.csv' WITH (FORMAT CSV, header);"

## csv without header
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
  -c "\COPY TB_NAME FROM 'data_sample.csv' WITH (FORMAT CSV);"

## csv without header, specify column
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
  -c "\COPY TB_NAME(COL1,COL2) FROM 'data_sample.csv' WITH (FORMAT CSV);"

all columns in csv should be same as table (or same as specified column)


about COPY
https://www.postgresql.org/docs/9.2/sql-copy.html

Solution 6 - Postgresql

You can use d6tstack which creates the table for you and is faster than pd.to_sql() because it uses native DB import commands. It supports Postgres as well as MYSQL and MS SQL.

import pandas as pd
df = pd.read_csv('table.csv')
uri_psql = 'postgresql+psycopg2://usr:pwd@localhost/db'
d6tstack.utils.pd_to_psql(df, uri_psql, 'table')

It is also useful for importing multiple CSVs, solving data schema changes and/or preprocess with pandas (eg for dates) before writing to db, see further down in examples notebook

d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'), 
	apply_after_read=apply_fun).to_psql_combine(uri_psql, 'table')

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionSoatlView Question on Stackoverflow
Solution 1 - PostgresqlG. CitoView Answer on Stackoverflow
Solution 2 - PostgresqljoelostblomView Answer on Stackoverflow
Solution 3 - PostgresqlPeter KraussView Answer on Stackoverflow
Solution 4 - PostgresqlmehmetView Answer on Stackoverflow
Solution 5 - PostgresqlyurenchenView Answer on Stackoverflow
Solution 6 - PostgresqlcitynormanView Answer on Stackoverflow