Invalid multibyte string in read.csv

R, read.csv

R Problem Overview


I am trying to import a csv that is in Japanese. This code:

url <- 'http://www.mof.go.jp/international_policy/reference/itn_transactions_in_securities/week.csv'
x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE)

returns the following error:

Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) : 
invalid multibyte string at '<91>ΊO<8b>y<82>ёΓ<e0><8f>،<94><94><84><94><83><8c>_<96>񓙂̏󋵁@(<8f>T<8e><9f><81>E<8e>w<92><e8><95>񍐋@<8a>փx<81>[<83>X<81>j'

I tried changing the encoding (Encoding(url) <- 'UTF-8' and also to latin1) and tried removing the read.csv parameters, but received the same "invalid multibyte string" message in each case. Is there a different encoding that should be used, or is there some other problem?

R Solutions


Solution 1 - R

Encoding sets the encoding of a character string. It doesn't set the encoding of the file represented by the character string, which is what you want.

This worked for me, after trying "UTF-8":

x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE, fileEncoding="latin1")

And you may want to skip the first 16 lines, and read in the headers separately. Either way, there's still quite a bit of cleaning up to do.

x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE,
  fileEncoding="latin1", skip=16)
# get started with the clean-up
x[,1] <- gsub("\u0081|`", "", x[,1])    # get rid of odd characters
x[,-1] <- as.data.frame(lapply(x[,-1],  # strip thousands separators, convert to numbers
  function(d) type.convert(gsub(",", "", d), as.is=TRUE)))
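To illustrate the first point of this answer, here is a minimal sketch using made-up strings (not the MOF file). It shows that Encoding() only changes the mark R attaches to a string: the bytes stay the same, and a pure-ASCII string such as a URL cannot be marked at all. The names s and u and the example.com URL are placeholders.

```r
# Hypothetical example strings; nothing here touches the real CSV.
# Build "facile" with a c-cedilla as latin1 bytes (0xE7 is c-cedilla in latin1).
s <- rawToChar(as.raw(c(0x66, 0x61, 0xe7, 0x69, 0x6c, 0x65)))
bytes_before <- charToRaw(s)

Encoding(s) <- "latin1"               # declare how the bytes should be interpreted
bytes_after <- charToRaw(s)

identical(bytes_before, bytes_after)  # the mark changed, the bytes did not
Encoding(s)                           # "latin1"

# An ASCII-only string (like the URL in the question) ignores the mark,
# so Encoding(url) <- "UTF-8" changes neither the string nor the remote file.
u <- "http://example.com/week.csv"    # placeholder URL
Encoding(u) <- "UTF-8"
Encoding(u)                           # "unknown"
```

This is why the conversion has to be requested at read time, through fileEncoding, rather than on the path or URL.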

Solution 2 - R

You may have encountered this issue because of an incompatible system locale. Try setting the system locale with this code: Sys.setlocale("LC_ALL", "C")
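A minimal sketch of how the locale change could be scoped so it does not leak into the rest of the session. It switches only LC_CTYPE, which is an assumption here (the answer uses LC_ALL), and restores the previous value afterwards. The variable names are illustrative.

```r
# Remember the current character-type locale so it can be restored later.
old_ctype <- Sys.getlocale("LC_CTYPE")

# Switch to the "C" locale, in which text is handled as plain bytes.
invisible(Sys.setlocale("LC_CTYPE", "C"))
during <- Sys.getlocale("LC_CTYPE")

# ... the read.csv() call that failed would go here ...

# Put the original locale back.
invisible(Sys.setlocale("LC_CTYPE", old_ctype))
after <- Sys.getlocale("LC_CTYPE")
```

Note that in the "C" locale non-ASCII text is not interpreted, so this may avoid the error without making the Japanese characters readable.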

Solution 3 - R

The readr package from the tidyverse universe might help.

You can set the encoding via the locale argument of the read_csv() function by using the locale() function and its encoding argument:

library(readr)
read_csv(file = "http://www.mof.go.jp/international_policy/reference/itn_transactions_in_securities/week.csv",
         skip = 14,
         locale = locale(encoding = "latin1"))

Solution 4 - R

I had the same error and tried all the above to no avail. The issue vanished when I upgraded from R 3.4.0 to 3.4.3, so if your R version is not up to date, update it!

Solution 5 - R

The simplest solution I found for this issue that does not lose any data or special characters (with fileEncoding="latin1", for example, characters like the Euro sign € will be lost) is to open the file first in a text editor like Sublime Text and choose "Save with encoding - UTF-8".

Then R can import the file with no issue and no character loss.
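The same re-encoding can also be done from R itself instead of a text editor. A minimal sketch, assuming the source encoding is known (latin1 here) and using a throwaway temp file with made-up contents rather than any real data:

```r
# Create a small latin1-encoded CSV to stand in for the problem file (hypothetical data).
src <- tempfile(fileext = ".csv")
dst <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("name,price\ncaf"), as.raw(0xe9), charToRaw(",3\n")), src)

# Read the lines as they are, convert latin1 -> UTF-8, and write the bytes out unchanged.
lines_in   <- readLines(src, warn = FALSE)
lines_utf8 <- iconv(lines_in, from = "latin1", to = "UTF-8")
writeLines(lines_utf8, dst, useBytes = TRUE)

# The converted file is valid UTF-8 and can be read with its strings marked as UTF-8.
is_utf8 <- all(validUTF8(readLines(dst, warn = FALSE)))
df <- read.csv(dst, stringsAsFactors = FALSE, encoding = "UTF-8")
```

If the source encoding is guessed wrongly, iconv() will still produce output, just with the wrong characters, so it is worth checking a few values by eye.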

Solution 6 - R

For those using Rattle with this issue, here is how I solved it:

  1. First make sure to quit Rattle so you're at the R command prompt
  2. > library(rattle) (if not already loaded)
  3. > crv$csv.encoding="latin1"
  4. > rattle()
  5. You should now be able to carry on, i.e. import your csv > Execute > Model > Execute etc.

That worked for me; hopefully it helps a weary traveller.

Solution 7 - R

I had a similar problem with scientific articles and found a good solution here: http://tm.r-forge.r-project.org/faq.html

By using the following line of code:

tm_map(yourCorpus, content_transformer(function(x) iconv(enc2utf8(x), sub = "byte")))

any bytes that cannot be converted are replaced by their hex codes. I hope this helps.
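To see what sub = "byte" does without a tm corpus, here is a small sketch on a plain character vector. The input string is made up, and from/to are given explicitly here, which is an assumption (the line above relies on the defaults):

```r
# Hypothetical input: "caf", then byte 0xE9 (not valid UTF-8 on its own), then "s".
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x73)))
validUTF8(x)        # FALSE: this is the kind of string that triggers the error

# Replace every byte that cannot be converted with its hex code in angle brackets.
cleaned <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "byte")
cleaned             # the bad byte now appears as "<e9>"
validUTF8(cleaned)  # TRUE
```

The original character is not recovered this way; the string just becomes safe to process.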

Solution 8 - R

If the file you are trying to import into R was originally an Excel file, make sure you open the original file and save it as a csv. That fixed this error for me when importing into R.

Solution 9 - R

I came across this error (invalid multibyte string) recently, but my problem was a bit different:

We had forgotten to save a csv.gz file with an extension, and tried to use read_csv() to read it. Adding the extension solved the problem.
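If renaming the file is not an option, one possibility in base R is to open it through gzfile(), which does not depend on the file name. A small sketch with a temporary, made-up file (readr is not used here, and the column names are illustrative):

```r
# Write a gzip-compressed CSV to a path that has no extension at all.
path <- tempfile()
con <- gzfile(path, "w")
writeLines(c("id,value", "1,10", "2,20"), con)
close(con)

# The file really is gzip data: it starts with the magic bytes 1f 8b.
magic <- readBin(path, "raw", n = 2)

# Reading through gzfile() decompresses regardless of the file name.
df <- read.csv(gzfile(path))
```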

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | jaredwoodard | View Question on Stackoverflow
Solution 1 - R | Joshua Ulrich | View Answer on Stackoverflow
Solution 2 - R | user3670684 | View Answer on Stackoverflow
Solution 3 - R | Je Hsers | View Answer on Stackoverflow
Solution 4 - R | stevec | View Answer on Stackoverflow
Solution 5 - R | Guiwald Doh | View Answer on Stackoverflow
Solution 6 - R | wired00 | View Answer on Stackoverflow
Solution 7 - R | Carlos | View Answer on Stackoverflow
Solution 8 - R | 822_BA | View Answer on Stackoverflow
Solution 9 - R | Mirabilis | View Answer on Stackoverflow