Copy a few of the columns of a csv file into a table

Postgresql

Postgresql Problem Overview


I have a CSV file with 10 columns. After creating a PostgreSQL table with 4 columns, I want to copy some of the 10 columns into the table.

The columns of my CSV file are like:

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10

The columns of my PostgreSQL table should be like:

x2 x5 x7 x10
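
For concreteness, such a target table might be created like this (the column types here are hypothetical, since the question doesn't specify them; the name my_table matches the one used in the solutions below):

```sql
-- Target table holding only the four wanted columns.
-- Types are placeholders: adjust them to the actual data.
create table my_table (
    x2  integer,
    x5  integer,
    x7  integer,
    x10 text
);
```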

Postgresql Solutions


Solution 1 - Postgresql

If it is an ad hoc task

Create a temporary table with all the columns in the input file

create temporary table t (x1 integer, ... , x10 text)

Copy from the file into it:

copy t (x1, ... , x10)
from '/path/to/my_file'
with (format csv)

Now insert into the definitive table from the temporary one:

insert into my_table (x2, x5, x7, x10)
select x2, x5, x7, x10
from t

And drop it:

drop table t

If it is a frequent task

Use the file_fdw extension. As superuser:

create extension file_fdw;

create server my_csv foreign data wrapper file_fdw;

create foreign table my_csv (
    x1 integer,
    x2 text,
    x3 text
) server my_csv
options (filename '/tmp/my_csv.csv', format 'csv' )
;

Grant select permission on the table to the user who will read it:

grant select on table my_csv to the_read_user;

Then whenever necessary read directly from the csv file as if it were a table:

insert into my_table (x2)
select x2
from my_csv
where x1 = 2

Solution 2 - Postgresql

You can provide the columns you want to fill with the COPY command, like so:

\copy your_table (x2,x5,x7,x10) FROM '/path/to/your-file.csv' DELIMITER ',' CSV;

Here's the doc for the COPY command.

Solution 3 - Postgresql

As other answers have pointed out, it has long been possible to specify the columns to copy into the PG table. However, without the option to reference column names in the CSV, this had little utility apart from loading into a table whose columns had a different order.

Fortunately, as of Postgres 9.3, it's possible to copy columns not only from a file or from standard input, but also from a shell command using PROGRAM:

> PROGRAM
>
> A command to execute. In COPY FROM, the input is read from the standard output of the command, and in COPY TO, the output is written to the standard input of the command.
>
> Note that the command is invoked by the shell, so if you need to pass any arguments to the shell command that come from an untrusted source, you must be careful to strip or escape any special characters that might have a special meaning for the shell. For security reasons, it is best to use a fixed command string, or at least avoid passing any user input in it.

This was the missing piece needed for this eagerly awaited functionality. For example, we can use this option in combination with cut (on a UNIX-based system) to select certain columns by position:

COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'cut -d "," -f 2,5,7,10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)

However, cut has several limitations when manipulating CSVs: it can't adequately handle strings with commas (or other delimiters) inside them, and it doesn't allow selecting columns by name.

There are several other open source command-line tools that are better at manipulating CSV files, such as csvkit or miller. Here's an example using miller to select columns by name:

COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'mlr --csv cut -f x2,x5,x7,x10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)

Solution 4 - Postgresql

I arrived here looking for a way to load only a subset of columns, but apparently COPY alone can't skip input columns. So, use awk (or cut) to extract the wanted columns into a new file, new_file:

$ awk '{print $2, $5, $7, $10}' file > new_file

and load the new_file. You could pipe the output straight to psql:

$ cut -d \  -f 2,5,7,10 file | 
  psql -h host -U user -c "COPY table(col1,col2,col3,col4) FROM STDIN DELIMITER ' '" database

Notice COPY, not \COPY.

Update:

As it was pointed out in the comments, neither of the above examples can handle quoted delimiters in the data. The same goes for newlines, too, as awk or cut are not CSV aware. Quoted delimiters can be handled with GNU awk, though.

This is a three-column file:

$ cat file
1,"2,3",4

Using GNU awk's FPAT variable we can change the order of the fields (or get a subset of them) even when the quoted fields have field separators in them:

$ gawk 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")";OFS=","}{print $2,$1,$3}' file
"2,3",1,4

Explained:

$ gawk '
BEGIN {                          # instead of field separator FS
    FPAT="([^,]*)|(\"[^\"]+\")"  # ...  we define field pattern FPAT
    OFS=","                      # output field separator OFS
} 
{
    print $2,$1,$3               # change field order
    # print $2                   # or get a subset of fields
}' file 

Notice that FPAT is GNU awk only. For other awks it's just a regular variable.

Solution 5 - Postgresql

You could take James Brown's suggestion further and do it all in one line:

$ awk -F ',' '{print $2","$5","$7","$10}' file | psql -d db -c "\copy MyTable from STDIN csv header"

Solution 6 - Postgresql

If the row count reported by the import is not important to you, you could also:

create two tables:

  • t1 (x1, x2, x3, x4, x5, x6, x7, x8, x9, x10): with all the columns of the CSV file
  • t2 (x2, x5, x7, x10): as you need it

then create:

  • a trigger function, where you insert the desired columns into t2 instead and return NULL to prevent the row from being inserted into t1

  • a trigger for t1 (BEFORE INSERT FOR EACH ROW) that calls this function.

Especially with larger CSV files, BEFORE INSERT triggers are also useful for filtering out rows with certain properties beforehand, and you can do type conversions as well.
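
A minimal sketch of this approach, assuming hypothetical integer column types for the sake of the example:

```sql
-- Staging table t1 matching the CSV layout, target table t2
create table t1 (x1 int, x2 int, x3 int, x4 int, x5 int,
                 x6 int, x7 int, x8 int, x9 int, x10 int);
create table t2 (x2 int, x5 int, x7 int, x10 int);

-- Trigger function: redirect the desired columns into t2 and
-- return NULL so the row is never actually stored in t1
create function t1_redirect() returns trigger
language plpgsql as $$
begin
    insert into t2 (x2, x5, x7, x10)
    values (new.x2, new.x5, new.x7, new.x10);
    return null;
end;
$$;

create trigger t1_before_insert
    before insert on t1
    for each row execute function t1_redirect();

-- A plain COPY into t1 now fills t2 instead
copy t1 from '/path/to/my_file.csv' with (format csv);
```

Note that `execute function` requires Postgres 11 or later; on earlier versions, write `execute procedure` instead. Because the trigger returns NULL, COPY reports 0 rows inserted, which is why this only works if that count doesn't matter to you.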

Solution 7 - Postgresql

To load data from a spreadsheet (Excel or OpenOffice Calc) into PostgreSQL:

Save the spreadsheet page as a CSV file. The preferred method is to open the spreadsheet in OpenOffice Calc and save from there. In the “Export to text file” window, choose Character Set: Unicode (UTF-8), Field Delimiter: “,” and Text Delimiter: '"'. A message will be displayed saying that only the active sheet is saved. Note: this file has to be saved in a folder, not on the desktop, and has to be saved in UTF-8 format (PostgreSQL is set up for UTF-8 encoding by default). If it is saved on the desktop, PostgreSQL will give an “access denied” message and won't load it.

In PostgreSQL, create an empty table with the same number of columns as the spreadsheet.

Note: each column name and data type has to match the spreadsheet. Also, keep the length of the data in mind: give character varying columns enough room.

Then, in the PostgreSQL SQL window, run:

copy "ABC"."def" from E'C:\\tmp\\blabla.csv' delimiter ',' CSV HEADER;

NOTE: Here C:\\tmp is the folder where the CSV file “blabla” is saved. “ABC”.“def” is the table created in PostgreSQL, where “ABC” is the schema and “def” is the actual table. Then do “execute query” by pressing the green button on top. “CSV HEADER” is needed when the CSV file has a heading at the start of every column.

If everything is OK, no error message will be displayed and the table data from the CSV file will be loaded into the PostgreSQL table. If there is an error message, do the following:

If the error message says that the data is too long for a specific column, then increase the column size. This happens mostly on character and character varying columns. Then run “execute query” again.

If the error message says that the data type doesn't match a particular column, then change the data type of the PostgreSQL table column to match the one in the CSV file.

In your case, after creating the CSV file, delete the unwanted columns and match the columns of the PostgreSQL table.

Solution 8 - Postgresql

One quick way to copy a table to your local directory is:

\copy (select * from table_name) to 'data.csv' CSV;
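
Since the question is about column subsets, note that the same parenthesized-query form also lets you export only some columns. A sketch, assuming the column names from the question:

```sql
\copy (select x2, x5, x7, x10 from table_name) to 'data.csv' CSV;
```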

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

  • Question: POTENZA (View Question on Stackoverflow)
  • Solution 1 - Postgresql: Clodoaldo Neto (View Answer on Stackoverflow)
  • Solution 2 - Postgresql: Julien (View Answer on Stackoverflow)
  • Solution 3 - Postgresql: arredond (View Answer on Stackoverflow)
  • Solution 4 - Postgresql: James Brown (View Answer on Stackoverflow)
  • Solution 5 - Postgresql: Chris Lawton (View Answer on Stackoverflow)
  • Solution 6 - Postgresql: Michael Kraxner (View Answer on Stackoverflow)
  • Solution 7 - Postgresql: Sagun (View Answer on Stackoverflow)
  • Solution 8 - Postgresql: Weevils (View Answer on Stackoverflow)