Bulk Insert Partially Quoted CSV File in SQL Server

Sql ServerCsvBulkinsert

Sql Server Problem Overview


I'm trying to import a correctly quoted CSV file, meaning data is only quoted if it contains a comma, e.g.:

41, Terminator, Black
42, "Monsters, Inc.", Blue

I observe that the first row imports correctly, but the second row errors in a manner that suggests the quoted comma was treated as a field separator.

I have seen suggestions such as this one

https://stackoverflow.com/questions/96448/sql-bulk-import-from-csv

to change the field terminator

> FIELDTERMINATOR='","'

However, my CSV file only quotes fields that need it, so I do not believe that suggestion would work.

Can SQL Server's BULK IMPORT statement import a correctly quoted CSV file? How?

Sql Server Solutions


Solution 1 - Sql Server

Unfortunately SQL Server interprets the quoted comma as a delimiter. This applies to both BCP and bulk insert .

From http://msdn.microsoft.com/en-us/library/ms191485%28v=sql.100%29.aspx

> If a terminator character occurs within the data, it is interpreted as > a terminator, not as data, and the data after that character is > interpreted as belonging to the next field or record. Therefore, > choose your terminators carefully to make sure that they never appear > in your data.

Solution 2 - Sql Server

I know this is an old topic but this feature has now been implemented since SQL Server 2017. The parameter you're looking for is FIELDQUOTE= which defaults to '"'. See more on https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017

Solution 3 - Sql Server

There is another solution for this.

Consider the quotes as part of the fields delimiter, by editing the fmt file.

You can check this out for more information:

http://blogs.msdn.com/b/sqlserverfaq/archive/2010/02/04/how-to-remove-unwanted-quotation-marks-while-importing-a-data-file.aspx

An extract of the link above:

The only way to remove the quotation marks would be to modify the column delimiters specified during the import operation. The only drawback here is that if you inspect the data to be inserted, you will very quickly realize that the column delimiters are different for each column (Delimiters highlighted above).

So to specify different column delimiters for each column, you would need to use a format file if you plan to use Bulk Insert or BCP. If you generate a format file for the above table structure, it would be as follows:

9.0
3
1       SQLCHAR       0       5       "\t"     1     FName              SQL_Latin1_General_CP1_CI_AS
2       SQLCHAR       0       5       "\t"     2     LName              SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR       0       50      "\r\n"   3     Company            SQL_Latin1_General_CP1_CI_AS

Modify the format file to represent the correct column delimiters for each column. The new format file to be used will look like this:

9.0
4
1       SQLCHAR       0       0     "\""      0     FIRST_QUOTE      SQL_Latin1_General_CP1_CI_AS
2       SQLCHAR       0       5     "\",\""   1     FNAME               SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR       0       5     "\",\""   2     LNAME            SQL_Latin1_General_CP1_CI_AS
4       SQLCHAR       0       50    "\"\r\n"  3     COMPANY          SQL_Latin1_General_CP1_CI_AS

Solution 4 - Sql Server

Make sure you have enabled TextQualified option and set it to be ".

Solution 5 - Sql Server

I had this same problem, and I didn't want to have to go the SSIS route, so I found a PowerShell script that is easy to run and handles the case of the quotes with the comma in that particular field:

Source Code and DLL for the PowerShell Script: https://github.com/billgraziano/CsvDataReader

Here's a blog that explains the usage: http://www.sqlteam.com/article/fast-csv-import-in-powershell-to-sql-server

Solution 6 - Sql Server

You could also look at using OpenRowSet with the CSV text file data provider.

This should be possible with any version of SQL Server >= 2005 although you need to enable the feature.

http://social.msdn.microsoft.com/forums/en-US/sqldataaccess/thread/5869d247-f0a0-4224-80b3-ff2e414be402

Solution 7 - Sql Server

I've spent half a day on this problem. It's best to import using SQL Server Import & Export data wizard. There is a setting in that wizard which solves this problem. Detailed screenshots here: https://www.mssqltips.com/sqlservertip/1316/strip-double-quotes-from-an-import-file-in-integration-services-ssis/ Thanks

Solution 8 - Sql Server

I had the same problem, with data that only occasionally double-quotes some text. My solution is to let the BULK LOAD import the double-quotes, then run a REPLACE on the imported data.

For example:

bulk insert CodePoint_tbl from "F:\Data\Map\CodePointOpen\Data\CSV\ab.csv" with (FIRSTROW = 1, FIELDTERMINATOR = ',', ROWTERMINATOR='\n');

update CodePoint_tbl set Postcode = replace(Postcode,'"','') where charindex('"',Postcode) > 0

To make it less painful to write the REPLACE script, just copy and paste what you need from the results of something like this:

select C.ColID, C.[name] as Columnname into #Columns
from syscolumns C
join sysobjects T on C.id = T.id
where T.[name] = 'User_tbl'
order by 1;

declare @QUOTE char(1);
set @QUOTE = Char(39);
select 'Update User_tbl set '+ColumnName+'=replace('+ColumnName+','
 + @QUOTE + '"' + @QUOTE + ',' + @QUOTE + @QUOTE + ');
GO'
from #Columns
where ColID > 2
order by ColID;

Solution 9 - Sql Server

I had the same problem, however, it worked for me with the following settings:

bulk insert schema.table
from '\\your\data\source.csv'
with (
datafiletype = 'char'
,format = 'CSV'
,firstrow = 2
,fieldterminator = '|'
,rowterminator = '\n'
,tablock
)

My CSV-File looks like this:

"col1"|"col2"
"val1"|"val2"
"val3"|"val4"

My problem was, I had rowterminator set to '0x0a' before, it did not work. Once I changed it to '\n', it started working...

Solution 10 - Sql Server

Per CSV format specification, I don't think it matters if data is correctly quoted or not, as long as it adheres to specification. Excessive quotes should be handled by the parser, if it's properly implemented. FIELDTERMINATOR should be comma and ROWTERMINATOR is line end - this denotes a standard CSV file. Did you try to import your data with these settings?

Solution 11 - Sql Server

Been stung by the same :)

I wrapped this logic into a function to clean up data that had been was already imported

DECLARE @str NVARCHAR(MAX);
DECLARE @quote_identifier NVARCHAR(MAX);

SET @quote_identifier = N'"';

SET @str = N'"quoted stuff"';

SELECT IIF(
           LEFT(@str, 1) = @quote_identifier
           AND RIGHT(@str, 1) = @quote_identifier,
           SUBSTRING(@str, DATALENGTH(@quote_identifier), LEN(@str) - DATALENGTH(@quote_identifier)),
           @str);

Solution 12 - Sql Server

It seems this now works on partially quoted csv files as long as you include FORMAT='CSV' in the options. Once I added this, it worked as expected with a partially quoted file.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionEric J.View Question on Stackoverflow
Solution 1 - Sql ServeriruvarView Answer on Stackoverflow
Solution 2 - Sql ServerDominixView Answer on Stackoverflow
Solution 3 - Sql ServerRafael SegoviaView Answer on Stackoverflow
Solution 4 - Sql ServerBangxinView Answer on Stackoverflow
Solution 5 - Sql ServersloppyjoeView Answer on Stackoverflow
Solution 6 - Sql ServerPaul-Andre PanonView Answer on Stackoverflow
Solution 7 - Sql Servermye.morrView Answer on Stackoverflow
Solution 8 - Sql ServerKeith MacDonaldView Answer on Stackoverflow
Solution 9 - Sql ServerOrsinusView Answer on Stackoverflow
Solution 10 - Sql ServerVictor ZakharovView Answer on Stackoverflow
Solution 11 - Sql ServerMartin PerezView Answer on Stackoverflow
Solution 12 - Sql ServerNickView Answer on Stackoverflow