OleDB & mixed Excel datatypes : missing data

ExcelDatatableDatasetOledb

Excel Problem Overview


I have an Excel worksheet I want to read into a datatable - all is well except for one particular column in my Excel sheet. The column, 'ProductID', is a mix of values like ########## and n#########.

I tried to let OleDB handle everything by itself automatically by reading it into a dataset/datatable, but any values in 'ProductID' like n###### are missing, ignored, and left blank. I tried manually creating my DataTable by looping through each row with a datareader, but with the exact same results.

Here's the code :

// add the column names manually to the datatable as column_1, column_2, ...
for (colnum = 0; colnum < num_columns; colnum ++){
  ds.Tables["products"].Columns.Add("column_" +colnum , System.Type.GetType("System.String")); 
}
while(myDataReader.Read()){
  // loop through each excel row adding a new respective datarow to my datatable 
  DataRow a_row = ds.Tables["products"].NewRow();
  for (col = 0; col < num_columns; col ++){
    try {  a_row[col] = rdr.GetString(col);  }
    catch {  a_row[col] = rdr.GetValue(col).ToString(); }
  }
  ds.Tables["products"].Rows.Add(a_row);
}

I don't understand why it won't let me read in values like n######. How can I do this?

Excel Solutions


Solution 1 - Excel

Using .Net 4.0 and reading Excel files, I had a similar issue with OleDbDataAdapter - i.e. reading in a mixed data type on a "PartID" column in MS Excel, where a PartID value can be numeric (e.g. 561) or text (e.g. HL4354), even though the excel column was formatted as "Text".

From what I can tell, ADO.NET chooses the data type based on the majority of the values in the column (with a tie going to numeric data type). i.e. if most of the PartID's in the sample set are numeric, ADO.NET will declare the column to be numeric. Therefore ADO.Net will attempt to cast each cell to a number, which will fail for the "text" PartID values and not import those "text" PartID's.

My solution was to set the OleDbConnection connectionstring to use Extended Properties=IMEX=1;HDR=NO to indicate this is an Import and that the table(s) will not include headers. The excel file has a header row, so in this case tell ado.net not to use it. Then later in the code, remove that header row from the dataset and voilà you have mixed data type for that column.

string sql = "SELECT F1, F2, F3, F4, F5 FROM [sheet1$] WHERE F1 IS NOT NULL";

OleDbConnection connection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + PrmPathExcelFile + @";Extended Properties=""Excel 8.0;IMEX=1;HDR=NO;TypeGuessRows=0;ImportMixedTypes=Text""");

OleDbCommand cmd = new OleDbCommand(sql, connection);
OleDbDataAdapter da = new OleDbDataAdapter(cmd);

DataSet ds = new DataSet();
ds.Tables.Add("xlsImport", "Excel");
da.Fill(ds, "xlsImport");

// Remove the first row (header row)
DataRow rowDel = ds.Tables["xlsImport"].Rows[0];
ds.Tables["xlsImport"].Rows.Remove(rowDel);

ds.Tables["xlsImport"].Columns[0].ColumnName = "LocationID";
ds.Tables["xlsImport"].Columns[1].ColumnName = "PartID";
ds.Tables["xlsImport"].Columns[2].ColumnName = "Qty";
ds.Tables["xlsImport"].Columns[3].ColumnName = "UserNotes";
ds.Tables["xlsImport"].Columns[4].ColumnName = "UserID";

connection.Close(); 

// now you can use LINQ to search the fields

    var data = ds.Tables["xlsImport"].AsEnumerable();
    var query = data.Where(x => x.Field<string>("LocationID") == "COOKCOUNTY").Select(x =>
                new Contact
                {
                    LocationID= x.Field<string>("LocationID"),
                    PartID = x.Field<string>("PartID"),
                    Quantity = x.Field<string>("Qty"),
                    Notes = x.Field<string>("UserNotes"),
                    UserID = x.Field<string>("UserID")
                });

Solution 2 - Excel

Several forums I found claim that by adding IMEX=1;TypeGuessRows=0;ImportMixedTypes=Text to the Extended Properties in the connection string would fix the problem, but this was not the case. I finally solved this problem by adding "HDR=NO" to the Extended Properties in the connection string (as Brian Wells shows above) so that I could import mixed types.

I then added some generic code to name the columns after the first row of data, then remove the first row.

    public static DataTable ImportMyDataTableFromExcel(string filePath)
    {
        DataTable dt = new DataTable();

        string fullPath = Path.GetFullPath(filePath);

        string connString =
           "Provider=Microsoft.Jet.OLEDB.4.0;" +
           "Data Source=\"" + fullPath + "\";" +
           "Extended Properties=\"Excel 8.0;HDR=No;IMEX=1;\"";

        string sql = @"SELECT * FROM [sheet1$]";

        using (OleDbDataAdapter dataAdapter = new OleDbDataAdapter(sql, connString))
        {
            dataAdapter.Fill(dt);
        }

        dt = BuildHeadersFromFirstRowThenRemoveFirstRow(dt);

        return dt;
    }

    private static DataTable BuildHeadersFromFirstRowThenRemoveFirstRow(DataTable dt)
    {
        DataRow firstRow = dt.Rows[0];

        for (int i = 0; i < dt.Columns.Count; i++)
        {
            if(!string.IsNullOrWhiteSpace(firstRow[i].ToString())) // handle empty cell
              dt.Columns[i].ColumnName = firstRow[i].ToString().Trim();
        }

        dt.Rows.RemoveAt(0);

        return dt;
    }

Solution 3 - Excel

No problem sh4, glad it helps w/ the mixed type issue.

The DateTime column is whole other animal that I recall caused me grief in the past... we have one excel file we process that the OleDbDataAdapter will sometimes convert dates to a double data type (apparently Excel stores dates as doubles, which encode the number of days elapsed since January 0, 1900 ).

The workaround was to use:

OleDbConnection mobjExcelConn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + txtExcelFile.Text + @";Extended Properties=""Excel 8.0;IMEX=1;HDR=Yes;""");
    
OleDbDataAdapter mobjExcelDataAdapter = new OleDbDataAdapter("Select * from [" + txtSheet.Text + "$] where [Supplier ID] <> '' ", mobjExcelConn);


DateTime dtShipStatus = DateTime.MinValue;
shipStatusOrig = excelRow["Est Ship Date"].ToString(); // excelRow is DataRow in the DataSet via the OleDbDataAdapter             
     
if (shipStatusOrig != string.Empty)
{
    // Date may be read in via oledb adapter as a double
    if (IsNumeric(shipStatusOrig))
    {
        double d = Convert.ToDouble(shipStatusOrig);
        dtShipStatus = DateTime.FromOADate(d);
                    
        if (DateTime.TryParse(dtShipStatus.ToString(), out dtShipStatus))
        {
            validDate = true;
            Debug.WriteLine("{0} converted: ", dtShipStatus.ToString("s"));
        }
    }
    else
    {
        if (ValidateShipDate(shipStatusOrig))
        {
            dtShipStatus = DateTime.Parse(shipStatusOrig);
            validDate = true;
            Debug.WriteLine("{0} converted: ", dtShipStatus.ToString("s"));
        }
        else
        {
            validDate = false;
            MessageBox.Show("Invalid date format in the Excel spreadsheet.\nLine # " + progressBar1.Value + ", the 'Ship Status' value '" + shipStatusOrig + "' is invalid.\nDate should be in a valid date time format.\ne.g. M/DD/YY, M.D.Y, YYYY-MM-DD, etc.", "Invaid Ship Status Date");
        }
    }
...
}
        public static Boolean IsNumeric (Object Expression)
        {
            if(Expression == null || Expression is DateTime)
                return false;

            if(Expression is Int16 || Expression is Int32 || Expression is Int64 || Expression is Decimal || Expression is Single || Expression is Double || Expression is Boolean)
                return true;
  
            try
            {
                if(Expression is string)
                    Double.Parse(Expression as string);
                else
                   Double.Parse(Expression.ToString());
                return true;
            } catch {} // just dismiss errors but return false

            return false;
        }

        public bool ValidateShipDate(string shipStatus)
        {
            DateTime startDate;
            try
            {
                startDate = DateTime.Parse(shipStatus);
                return true;
            }
            catch
            {
                return false;
            }
        }

Solution 4 - Excel

There are two ways to handle mixed datatypes & excel.

Method 1

  • Open up your excel spreadsheet and set the column format to the desired format manually. In this case, 'Text'.

Method 2

  • There is a "hack" that consists of appending "IMEX=1" to your connection string like so:

    Provider=Microsoft.Jet.OLEDB.4.0;Data Source=myfile.xls;Extended Properties=Excel 8.0;IMEX=1

  • This will attempt to handle mixed Excel formats according to how it is set in your registry. This can be set locally by you, but for a server, this is probably not an option.

Solution 5 - Excel

@Brian Wells Thank you, your suggestion did the trick, but not totally... Worked for the mixed field int-string, but the datetime columns went with strange characters after that, so i applied a "hack" over the "hack".

1.- Do a System.Io.File.Copy and create a copy of the excel file.

2.- Modify the Datetime column headers programatically at runtime to something in datetime format, i.e. "01/01/0001".

3.- Save the excel, and then apply your trick doing the query with HDR=NO to the modified file.

Tricky, yes, but worked, and reasonabily fast, if anyone has any alternative to this, i will be glad to hear.

Greetings.

P.D. Excuse my english, it isn't my native language.

Solution 6 - Excel

Shortcut --> if you have a mixed type column in Excel: Sort your column Z to A

I pretty much went through all of the answers here and some of them worked for me and some did not, however none was desirable for me because somehow ADO did not pick the data in a mixed type column that I had in my Excel file. I had to set HDR=NO to make ADO read my spreadsheet column that is a mix of text and numbers and that way I lose the ability of using column headers in my SQL statements which is not good. If the order of columns change in the Excel file, the SQL statement will result in error or wrong output.

In a mixed data type column the key is the first 8 rows. ADO determines the data type for the column based on the first 8 rows So if you still want to modify your connection string with the extended parameters, simply sort your column Z to A on your Excel file before reading the data by ADO so this way the rows on top are the text ones and then your column will be picked as text.

If your initial rows are numbers (regardless if your column is set to format TEXT in Excel) ADO will determine that columns as a numeric type, so once it read the text rows below, it cannot cast those into number. On the opposite case, if the column is determined text, if any row if number, it can be cast as text.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionrlb.usaView Question on Stackoverflow
Solution 1 - ExcelBrian WellsView Answer on Stackoverflow
Solution 2 - Exceluser1424725View Answer on Stackoverflow
Solution 3 - ExcelBrian WellsView Answer on Stackoverflow
Solution 4 - Excelrlb.usaView Answer on Stackoverflow
Solution 5 - Excelsh4View Answer on Stackoverflow
Solution 6 - ExcelIboView Answer on Stackoverflow