Can I automatically create a table in PostgreSQL from a csv file with headers?
Problem Overview
I'm running PostgreSQL 9.2.6 on OS X 10.6.8. I would like to import data from a CSV file with column headers into a database. I can do this with the COPY statement, but only if I first manually create a table with a column for each column in the CSV file. Is there any way to automatically create this table based on the headers in the CSV file?
Per this question, I have tried:
COPY test FROM '/path/to/test.csv' CSV HEADER;
But I just get this error:
ERROR: relation "test" does not exist
And if I first create a table with no columns:
CREATE TABLE test ();
I get:
ERROR: extra data after last expected column
I can't find anything in the PostgreSQL COPY documentation about automatically creating a table. Is there some other way to automatically create a table from a CSV file with headers?
Postgresql Solutions
Solution 1 - Postgresql
There is a very good tool that imports tables into Postgres from a CSV file. It is a command-line tool called pgfutter (with binaries for Windows, Linux, etc.). One of its big advantages is that it recognizes the attribute/column names as well.
The usage of the tool is simple. For example, to import myCSVfile.csv:
pgfutter --db "myDatabase" --port "5432" --user "postgres" --pw "mySecretPassword" csv myCSVfile.csv
This will create a table (called myCSVfile) with the column names taken from the CSV file's header. Additionally, the data types will be identified from the existing data.
A few notes: The pgfutter command varies depending on the binary you use, e.g. it could be pgfutter_windows_amd64.exe (rename it if you intend to use this command frequently). The command has to be executed in a command-line window (e.g. in Windows run cmd and ensure pgfutter is accessible). If you'd like a different table name, add --table "myTable"; to select a particular database schema, use --schema "mySchema". If you are accessing an external database, use --host "myHostDomain".
A more elaborate example of pgfutter, importing myFile into myTable, is this one:
pgfutter --host "localhost" --port "5432" --db "myDB" --schema "public" --table "myTable" --user "postgres" --pw "myPwd" csv myFile.csv
Most likely you will change a few data types (from text to numeric) after the import:
alter table myTable
alter column myColumn type numeric
using (trim(myColumn)::numeric);
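pgfutter's type detection boils down to sampling the data and guessing a column type. The idea can be sketched in a few lines (this is not pgfutter's actual code; `infer_pg_type` is a hypothetical helper, and the sample data is made up):

```python
import csv
from io import StringIO

def infer_pg_type(values):
    """Guess a PostgreSQL column type from sample string values; text is the fallback."""
    def all_cast(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False
    if all_cast(int):
        return "integer"
    if all_cast(float):
        return "numeric"
    return "text"

sample = "id,price,name\n1,9.99,apple\n2,12.50,pear\n"
rows = list(csv.reader(StringIO(sample)))
header, data = rows[0], rows[1:]
# Map each header name to the type inferred from its column of values.
types = {col: infer_pg_type([row[i] for row in data]) for i, col in enumerate(header)}
# types == {'id': 'integer', 'price': 'numeric', 'name': 'text'}
```

This is also why a post-import ALTER TABLE like the one above is sometimes needed: a column of digits stored with stray whitespace will fall through to text.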
Solution 2 - Postgresql
There is a second approach, which I found here (from mmatt). Basically you call a function within Postgres (the last argument specifies the number of columns).
select load_csv_file('myTable','C:/MyPath/MyFile.csv',24)
Here is mmatt's function code, which I had to modify slightly because I am working in the public schema. (Copy and paste it into the PgAdmin SQL Editor and run it to create the function.)
CREATE OR REPLACE FUNCTION load_csv_file(
    target_table text,
    csv_path text,
    col_count integer)
  RETURNS void AS
$BODY$
declare
    iter integer;      -- dummy integer to iterate columns with
    col text;          -- variable to keep the column name at each iteration
    col_first text;    -- first column name, e.g. top left corner on a csv file or spreadsheet
begin
    set schema 'public';

    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);

    iter := 1;
    col_first := (select col_1 from temp_table limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        -- %I quotes the new name as an identifier, so headers with spaces or
        -- mixed case survive the rename
        execute format('alter table temp_table rename column col_%s to %I', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the header row
    execute format('delete from temp_table where %I = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;
end;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION load_csv_file(text, text, integer)
    OWNER TO postgres;
Note: there is a common issue with importing text files related to encoding. The CSV file should be in UTF-8. However, sometimes the programs that attempt the conversion do not quite achieve it. I have overcome this issue by opening the file in Notepad++ and converting it to ANSI and back to UTF-8.
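The Notepad++ round-trip can also be done programmatically. A minimal sketch, assuming the file really is Latin-1 ("ANSI") to begin with; check what your export tool actually produced before relying on that:

```python
# One-pass re-encode from Latin-1 ("ANSI") to UTF-8.
# The source encoding is an assumption -- adjust src_encoding to your file.
def to_utf8(raw_bytes, src_encoding="latin-1"):
    return raw_bytes.decode(src_encoding).encode("utf-8")

ansi = "café;1\n".encode("latin-1")
utf8 = to_utf8(ansi)
# utf8.decode("utf-8") round-trips back to "café;1\n"
```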
Solution 3 - Postgresql
I am using csvsql to generate the table layout (it will automatically guess the format):
head -n 20 table.csv | csvsql --no-constraints --tables table_name
And then I use \COPY in psql. For me, that's the fastest way to import a CSV file.
You can also use sed with csvsql in order to get the desired data types:
head -n 20 table.csv | csvsql --no-constraints --tables table_name | sed 's/DECIMAL/NUMERIC/' | sed 's/VARCHAR/TEXT/' | sed 's/DATETIME/TIMESTAMP/'
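The sed chain is plain string substitution, so the same type remapping can be scripted in any language. For example, the equivalent in Python (the DDL string here is a made-up example of csvsql's output):

```python
# Remap SQL type names the way the sed pipeline above does.
ddl = "CREATE TABLE t (a DECIMAL, b VARCHAR, c DATETIME);"
for old, new in [("DECIMAL", "NUMERIC"), ("VARCHAR", "TEXT"), ("DATETIME", "TIMESTAMP")]:
    ddl = ddl.replace(old, new)
# ddl == "CREATE TABLE t (a NUMERIC, b TEXT, c TIMESTAMP);"
```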
Solution 4 - Postgresql
You can't find anything in the COPY documentation, because COPY cannot create a table for you. You need to create the table before you can COPY into it.
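In practice the workaround is to generate the CREATE TABLE statement yourself from the header row, then run COPY. A minimal sketch of the generation step (all columns typed as text; the table name and sample CSV are illustrative):

```python
import csv
from io import StringIO

def create_table_sql(table, csv_text):
    """Build a CREATE TABLE statement from the CSV header row, all columns text."""
    header = next(csv.reader(StringIO(csv_text)))
    # Quote names so headers with spaces or mixed case stay valid identifiers.
    cols = ", ".join('"{}" text'.format(name) for name in header)
    return 'CREATE TABLE "{}" ({});'.format(table, cols)

sql = create_table_sql("test", "id,name,score\n1,alice,10\n")
# 'CREATE TABLE "test" ("id" text, "name" text, "score" text);'
```

Run the generated statement first, then `COPY test FROM '/path/to/test.csv' CSV HEADER;` succeeds.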
Solution 5 - Postgresql
Use SQLite as an intermediate step.
Steps:
- In the command prompt type:
sqlite3
- In the sqlite3 CLI type:
.mode csv
.import my_csv.csv my_table
.output my_table_sql.sql
.dump my_table
- Finally, execute that SQL in your PostgreSQL database
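The same round-trip can be scripted with Python's built-in sqlite3 module instead of the CLI. A sketch with a placeholder table and in-memory data; the resulting dump usually needs minor edits (e.g. quoting, type names) before feeding it to psql:

```python
import csv
import sqlite3
from io import StringIO

# In-memory stand-in for my_csv.csv; replace with your real file.
csv_text = "id,name\n1,alice\n2,bob\n"
rows = list(csv.reader(StringIO(csv_text)))
header, data = rows[0], rows[1:]

con = sqlite3.connect(":memory:")
# SQLite accepts untyped columns, mirroring ".mode csv" / ".import".
con.execute("CREATE TABLE my_table ({})".format(", ".join(header)))
con.executemany(
    "INSERT INTO my_table VALUES ({})".format(", ".join("?" * len(header))),
    data,
)
# Equivalent of ".dump my_table": SQL you can replay elsewhere.
dump = "\n".join(con.iterdump())
```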
Solution 6 - Postgresql
I achieved it with these steps:
- Convert the CSV file to UTF-8:
iconv -f ISO-8859-1 -t UTF-8 file.txt -o file.csv
- Use this Python script to generate the SQL that creates the table and copies the data:
#!/usr/bin/env python3
# pip install python-slugify
import os
from slugify import slugify

origem = 'file.csv'    # input CSV
destino = 'file.sql'   # generated SQL script
arquivo = os.path.abspath(origem)

with open(origem, 'r') as f:
    header = f.readline().split(';')

# Slugify the header cells into valid column names, de-duplicating repeats.
head_cells = []
for cell in header:
    value = slugify(cell, separator="_")
    if value in head_cells:
        value = value + '_2'
    head_cells.append(value)

fields = []
for cell in head_cells:
    fields.append("    {} text".format(cell))

table = origem.split('.')[0]
sql = "create table {} (\n{}\n);".format(table, ",\n".join(fields))
sql += "\nCOPY {} FROM '{}' DELIMITER ';' CSV HEADER;".format(table, arquivo)
print(sql)

with open(destino, 'w') as d:
    d.write(sql)
- Run the script:
python3 importar.py
- Optional: edit the SQL script to adjust the field types (all are text by default)
- Run the SQL script. One-liner for the console:
sudo -H -u postgres bash -c "psql mydatabase < file.sql"
Solution 7 - Postgresql
I haven't used it, but pgLoader (https://pgloader.io/) is recommended by the pgfutter developers (see answer above) for more complicated problems. It looks very capable.
Solution 8 - Postgresql
Automatic creation seems to be pretty easy with Python + pandas.
Install the SQLAlchemy library in your Python environment: pip install SQLAlchemy==1.4.31
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
df = pd.read_csv('example.csv')
df.to_sql('table_name', engine)
Note that to_sql creates the table if it does not already exist, inferring the column types from the DataFrame.
Solution 9 - Postgresql
For a single table, I did it very simply, quickly, and online through one of the many good converters on the web. Just google "convert csv to sql online" and choose one.
Solution 10 - Postgresql
You can create a new table in DBeaver out of a CSV.