Adding BOM to UTF-8 files

UTF-8, Batch File, Scripting, Byte Order Mark

Utf 8 Problem Overview


I'm searching (without success) for a script that would work as a batch file and prepend a BOM to a UTF-8 text file if it doesn't already have one.

Neither the language it is written in (Perl, Python, C, bash) nor the OS it runs on matters to me. I have access to a wide range of computers.

I've found a lot of scripts that do the reverse (strip the BOM), which seems kind of silly to me, as many Windows programs have trouble reading UTF-8 text files that don't have a BOM.

Did I miss the obvious?

Thanks!

Utf 8 Solutions


Solution 1 - Utf 8

I wrote this addbom.sh using the 'file' command and ICU's 'uconv' command.

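#!/bin/sh
# addbom.sh: add a UTF-8 BOM to each file that lacks one.
# Requires the 'file' command and ICU's 'uconv'.

if [ $# -eq 0 ]
then
        echo "usage: $0 files ..." 1>&2
        exit 1
fi

for file in "$@"
do
        echo "# Processing: $file" 1>&2
        if [ ! -f "$file" ]
        then
                echo "Not a file: $file" 1>&2
                exit 1
        fi
        # 'file' reports "(with BOM)" in its description when a BOM is present.
        TYPE=$(file - < "$file" | cut -d: -f2)
        if echo "$TYPE" | grep -q '(with BOM)'
        then
                echo "# $file already has BOM, skipping." 1>&2
        else
                # Keep the original as "$file~", then write the BOM-prefixed copy.
                # The error branch uses braces, not a subshell, so 'exit 1' stops the script.
                ( mv "${file}" "${file}~" && uconv -f utf-8 -t utf-8 --add-signature < "${file}~" > "${file}" ) || { echo "Error processing $file" 1>&2 ; exit 1; }
        fi
done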

edit: Added quotes around the mv arguments. Thanks @DirkR and glad this script has been so helpful!

Solution 2 - Utf 8

The easiest way I found for this is:

#!/usr/bin/env bash

# Add BOM to the new file
printf '\xEF\xBB\xBF' > with_bom.txt

# Append the content of the source file to the new file
cat source_file.txt >> with_bom.txt

I know it uses an external program (cat), but it does the job easily in bash.

Tested on OS X; it should work on Linux as well.

NOTE: this assumes the file doesn't already have a BOM (!). If it does, the result will start with two BOMs.

Solution 3 - Utf 8

(Answer based on <https://stackoverflow.com/a/9815107/1260896> by yingted)

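To add BOMs to all the files whose names start with "foo-", you can use sed. The commands below rely on GNU sed (for the \xHH escapes and \?). sed's -i option can also keep a backup of each file if you give it a suffix, for example -i.bak.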

sed -i '1s/^\(\xef\xbb\xbf\)\?/\xef\xbb\xbf/' foo-*

If you know for sure there is no BOM already, you can simplify the command:

sed -i '1s/^/\xef\xbb\xbf/' foo-*

Make sure the files really are UTF-8, because other encodings (e.g. UTF-16) use a different BOM; otherwise check <https://stackoverflow.com/q/1044595/1260896>.
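Since the right signature depends on the encoding, it can help to look at the first bytes of a file before editing it. The sketch below is one way to do that in POSIX shell. bom_kind is a hypothetical helper name, and the sketch assumes head -c, od, and tr are available. It only recognises the UTF-8 and UTF-16 signatures.

```shell
#!/bin/sh
# bom_kind (hypothetical helper): print which BOM, if any, starts the file "$1".
bom_kind() {
    # Render the first three bytes as lowercase hex with no separators, e.g. "efbbbf".
    first3=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \t\n')
    case $first3 in
        efbbbf) echo utf-8 ;;
        fffe*)  echo utf-16le ;;   # UTF-32LE also starts with FF FE; not distinguished here
        feff*)  echo utf-16be ;;
        *)      echo none ;;
    esac
}
```

For example, bom_kind foo-1.txt prints utf-8 for a file that already starts with EF BB BF, and none for a file with no recognised signature.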

Solution 4 - Utf 8

As an improvement on Yaron U.'s solution, you can do it all on a single line:

printf '\xEF\xBB\xBF' | cat - source.txt > source-with-bom.txt

The "cat -" part tells cat to read standard input first, so the BOM piped in from printf is written ahead of the contents of source.txt. Tested on OS X and Ubuntu.

Solution 5 - Utf 8

I find it pretty simple. Assuming the file is always UTF-8 (you're not detecting the encoding, you know the encoding):

Read the first three bytes. Compare them to the UTF-8 BOM sequence (Wikipedia says it's 0xEF, 0xBB, 0xBF). If they match, write them to the new file and then copy everything else from the original file to the new file. If they differ, first write the BOM, then write the three bytes, and only then copy everything else from the original file to the new file.

In C, fopen/fclose/fread/fwrite should be enough.
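The steps above can also be sketched in POSIX shell rather than C. This is only an illustration under a few assumptions: add_bom is a hypothetical function name, the destination must be a different file from the source, and head -c, od, and tr are assumed to be available. Because the shell cannot easily hold raw bytes in a variable, the sketch compares a hex rendering of the first three bytes and then copies the whole source, which gives the same result as writing the three bytes followed by the rest.

```shell
#!/bin/sh
# add_bom (hypothetical name): copy SOURCE to DEST, adding a UTF-8 BOM if SOURCE lacks one.
# Usage: add_bom SOURCE DEST   (DEST must not be the same file as SOURCE)
add_bom() {
    src=$1
    dst=$2
    # Render the first three bytes as lowercase hex with no separators, e.g. "efbbbf".
    first3=$(head -c 3 "$src" | od -An -tx1 | tr -d ' \t\n')
    if [ "$first3" = "efbbbf" ]; then
        # BOM already present: copy the file unchanged.
        cat "$src" > "$dst"
    else
        # No BOM: write EF BB BF (octal 357 273 277), then the whole file.
        { printf '\357\273\277'; cat "$src"; } > "$dst"
    fi
}
```

To process a batch in place, one option is a loop such as: for f in *.txt; do add_bom "$f" "$f.tmp" && mv "$f.tmp" "$f"; done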

Solution 6 - Utf 8

Solution 7 - Utf 8

In VBA (Access):

    Dim name As String
    Dim tmpName As String

    tmpName = "tmp1.txt"
    name = "final.txt"

    Dim file As Object
    Dim finalFile As Object
    Set file = CreateObject("Scripting.FileSystemObject")

    Set finalFile = file.CreateTextFile(name)

    ' Add BOM
    finalFile.Write Chr(239)
    finalFile.Write Chr(187)
    finalFile.Write Chr(191)

    ' Transfer text from tmp to final file:
    Dim tmpFile As Object
    Set tmpFile = file.OpenTextFile(tmpName, 1)
    finalFile.Write tmpFile.ReadAll
    finalFile.Close
    tmpFile.Close
    file.DeleteFile tmpName

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | Stephane | View Question on Stackoverflow |
| Solution 1 - Utf 8 | Steven R. Loomis | View Answer on Stackoverflow |
| Solution 2 - Utf 8 | Yaron U. | View Answer on Stackoverflow |
| Solution 3 - Utf 8 | Franklin Piat | View Answer on Stackoverflow |
| Solution 4 - Utf 8 | Trenton | View Answer on Stackoverflow |
| Solution 5 - Utf 8 | luiscubal | View Answer on Stackoverflow |
| Solution 6 - Utf 8 | Vdragon | View Answer on Stackoverflow |
| Solution 7 - Utf 8 | dapi | View Answer on Stackoverflow |