How can I output UTF-8 from Perl?


Perl Problem Overview


I am trying to write a Perl script using the utf8 pragma, and I'm getting unexpected results. I'm using Mac OS X 10.5 (Leopard), and I'm editing with TextMate. My editor and operating system are both set to write files in UTF-8 by default.

However, when I enter the following into a text file, save it as a ".pl" file, and execute it, I get the friendly "diamond with a question mark" in place of the non-ASCII characters.

#!/usr/bin/env perl

use strict;
use warnings;
use utf8;

my $str = 'Çirçös';
print( "$str\n" );

Any idea what I'm doing wrong? I expect to get 'Çirçös' in the output, but I get '�ir��s' instead.

Perl Solutions


Solution 1 - Perl

use utf8; does not enable Unicode output. It only tells Perl that your source code is written in UTF-8, which lets you type Unicode characters in your program. Add this to the program, before your print() statement:

binmode(STDOUT, ":utf8");

See if that helps. That should make STDOUT output UTF-8 instead of raw Latin-1 bytes, which is what Perl writes for characters below 256 when a handle has no encoding layer.
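A minimal sketch of what the :utf8 layer changes, assuming the script itself is saved as UTF-8 (as in the question). It prints the question's string to two in-memory filehandles standing in for STDOUT, one with no layer and one after binmode(..., ':utf8'), and dumps the bytes each one received. The handle and variable names are illustrative only.

```perl
use strict;
use warnings;
use utf8;    # this source file is saved as UTF-8

my $str = 'Çirçös';    # 6 characters once `use utf8` has decoded the literal

# No layer: characters below 256 are written as single Latin-1 bytes.
open my $raw, '>', \my $raw_bytes or die "in-memory open failed: $!";
print {$raw} $str;
close $raw;

# With :utf8 the same characters are written as UTF-8.
open my $enc, '>', \my $utf8_bytes or die "in-memory open failed: $!";
binmode($enc, ':utf8');
print {$enc} $str;
close $enc;

# Show each captured byte in hex.
printf "no layer: %s\n", join ' ', map { sprintf '%02X', ord } split //, $raw_bytes;
printf ":utf8:    %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;
# no layer: C7 69 72 E7 F6 73
# :utf8:    C3 87 69 72 C3 A7 C3 B6 73
```

The first line explains the replacement characters in the question: a UTF-8 terminal cannot decode the lone bytes C7, E7 and F6.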

Solution 2 - Perl

You can use the open pragma.

For example, the following sets STDIN, STDOUT and STDERR to use UTF-8:

use open qw/:std :utf8/;
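One way to confirm what the pragma did is to ask PerlIO which layers the standard handles carry. This sketch relies on PerlIO::get_layers, which is built into Perl, and it only inspects STDOUT and STDERR because STDIN may be closed when a script runs non-interactively.

```perl
use strict;
use warnings;
use utf8;                    # the literal below is UTF-8
use open qw/:std :utf8/;     # :utf8 on STDIN/STDOUT/STDERR, and the default for open()

# PerlIO::get_layers lists a handle's layers; the :utf8 flag shows up as 'utf8'.
my @stdout_layers = PerlIO::get_layers(\*STDOUT, output => 1);
my @stderr_layers = PerlIO::get_layers(\*STDERR, output => 1);
print "STDOUT layers: @stdout_layers\n";
print "STDERR layers: @stderr_layers\n";

# Because STDOUT carries :utf8, this is written as UTF-8 and the
# character above 255 does not trigger a "Wide character" warning.
print "Çirçös \x{263A}\n";
```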

Solution 3 - Perl

TMTOWTDI (There's More Than One Way To Do It): choose the method that best fits how you work. I use the environment method so I don't have to think about it.

In the environment:

export PERL_UNICODE=SDL

on the command line:

perl -CSDL -le 'print "\x{1815}"';
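The letters are shorthand for bit flags listed under the -C switch in perlrun: S is I+O+E (1+2+4), D is i+o (8+16) and L is 64, so -CSDL should be equivalent to -C95. The sketch below just does that arithmetic. The table covers only the letters used on this page, and the helper name c_flag_value is made up for the example. It also prints ${^UNICODE}, the variable in which Perl exposes the value actually in effect for the current run.

```perl
use strict;
use warnings;

# Values of the -C / PERL_UNICODE letters, as listed under -C in perlrun.
# Only the letters used on this page are included; perlrun lists a few more.
my %bit = (
    I => 1,     # STDIN is assumed to be in UTF-8
    O => 2,     # STDOUT will be in UTF-8
    E => 4,     # STDERR will be in UTF-8
    i => 8,     # UTF-8 is the default PerlIO layer for input streams
    o => 16,    # UTF-8 is the default PerlIO layer for output streams
    A => 32,    # the @ARGV elements are expected to be UTF-8
    L => 64,    # make the above conditional on the locale variables
);
my %shorthand = (S => 'IOE', D => 'io');    # letters that stand for several flags

# Hypothetical helper: turn a letter string such as "SDL" into its -C number.
sub c_flag_value {
    my ($letters) = @_;
    my $value = 0;
    for my $letter (split //, $letters) {
        my $expanded = $shorthand{$letter} // $letter;
        for my $part (split //, $expanded) {
            die "unknown -C letter '$part'\n" unless exists $bit{$part};
            $value |= $bit{$part};
        }
    }
    return $value;
}

printf "SDL  = %d\n", c_flag_value('SDL');     # 95  (7 + 24 + 64)
printf "SDAL = %d\n", c_flag_value('SDAL');    # 127 (7 + 24 + 32 + 64)
printf "this run: \${^UNICODE} = %d\n", ${^UNICODE};
```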

or with binmode:

binmode(STDOUT, ":utf8");          # mark the handle as UTF-8; nothing is checked
binmode(STDIN, ":encoding(utf8)"); # decode the input, checking that it is UTF-8
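To see the difference on the input side, this sketch reads the same UTF-8 bytes twice from an in-memory filehandle standing in for STDIN, once with no layer and once after binmode(..., ':encoding(utf8)'), and compares the lengths of what comes back. The input string is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input: the UTF-8 bytes of "Çirçös" plus a newline.
my $input = "\xC3\x87ir\xC3\xA7\xC3\xB6s\n";

# Without a layer, every byte is read as one character.
open my $raw_in, '<', \$input or die "in-memory open failed: $!";
chomp(my $raw_line = <$raw_in>);
close $raw_in;

# With :encoding(utf8) the bytes are decoded into characters.
open my $dec_in, '<', \$input or die "in-memory open failed: $!";
binmode($dec_in, ':encoding(utf8)');
chomp(my $dec_line = <$dec_in>);
close $dec_in;

printf "without layer: %d characters\n", length $raw_line;    # 9
printf "with layer:    %d characters\n", length $dec_line;    # 6
```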

or with PerlIO:

open my $out, ">:utf8", $filename
    or die "could not open $filename: $!\n";

open my $in, "<:encoding(utf-8)", $filename
    or die "could not open $filename: $!\n";
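A round-trip sketch of the two open calls above, with a temporary file from the core File::Temp module standing in for $filename. It writes six characters through '>:utf8', checks that the file holds nine bytes, and reads them back through '<:encoding(utf-8)'.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $text = "\x{C7}ir\x{E7}\x{F6}s";    # "Çirçös", spelled with escapes

# A throwaway file stands in for $filename; it is deleted when the script exits.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

open my $out, '>:utf8', $filename
    or die "could not open $filename: $!\n";
print {$out} $text;
close $out or die "could not close $filename: $!\n";

printf "%s holds %d bytes\n", $filename, -s $filename;    # 9 bytes for 6 characters

open my $in, '<:encoding(utf-8)', $filename
    or die "could not open $filename: $!\n";
my $round_trip = do { local $/; <$in> };    # slurp the whole file
close $in;

print $round_trip eq $text ? "round trip OK\n" : "round trip changed the text\n";
```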

or with the open pragma:

use open ":encoding(utf8)";                        # one layer for input and output
use open IN => ":encoding(utf8)", OUT => ":utf8";  # or a separate layer for each direction
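The open pragma is lexically scoped, so it only affects open() calls compiled inside the block where it appears. A sketch showing that, assuming PERL_UNICODE is unset and perl is run without -C (either could change the default layers): the same six characters are written once outside and once inside a block containing use open OUT => ':encoding(utf8)', and the resulting file sizes are compared.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $text = "\x{C7}ir\x{E7}\x{F6}s";    # "Çirçös", spelled with escapes
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

my ($plain_size, $pragma_size);

{
    # No pragma here: open() gets the platform's default layers.
    open my $fh, '>', $path or die "could not open $path: $!\n";
    print {$fh} $text;
    close $fh;
    $plain_size = -s $path;
}

{
    # The pragma only applies to open() calls compiled inside this block.
    use open OUT => ':encoding(utf8)';
    open my $fh, '>', $path or die "could not open $path: $!\n";
    print {$fh} $text;
    close $fh;
    $pragma_size = -s $path;
}

print "without pragma: $plain_size bytes\n";     # 6 (Latin-1)
print "with pragma:    $pragma_size bytes\n";    # 9 (UTF-8)
```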

Solution 4 - Perl

You also need to tell Perl that the strings in your code are UTF-8. See https://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default. So set not only PERL_UNICODE=SDAL but also PERL5OPT=-Mutf8.
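To illustrate what "strings in your code are UTF-8" means: a UTF-8 editor saves 'Çirçös' as nine bytes, and without use utf8 Perl keeps the literal as those nine bytes. The sketch below spells the bytes out with escapes, so it does not depend on how the file itself is saved, and uses the built-in utf8::decode to perform the equivalent decoding at run time.

```perl
use strict;
use warnings;

# The bytes an editor saves for the literal 'Çirçös'. Without `use utf8`,
# Perl treats such a literal as these 9 separate bytes.
my $as_bytes = "\xC3\x87ir\xC3\xA7\xC3\xB6s";

# `use utf8` decodes the literal as UTF-8 at compile time;
# utf8::decode does the equivalent conversion in place at run time.
my $as_chars = $as_bytes;
utf8::decode($as_chars) or die "not valid UTF-8\n";

printf "without use utf8: length %d\n", length $as_bytes;    # 9
printf "with use utf8:    length %d\n", length $as_chars;    # 6
```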

Solution 5 - Perl

Thanks, I finally have a way to avoid putting utf8::encode all over my code. To summarize, and to cover other cases such as writing and reading files in UTF-8, this also works with LoadFile on a YAML file in UTF-8:

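use strict;
use warnings;
use utf8;                      # string literals in this file are UTF-8
use open ':encoding(utf8)';    # default layer for open() calls in this file
binmode(STDOUT, ":utf8");      # STDOUT is not covered by the line above
use YAML qw(LoadFile Dump);

open(my $fh, ">", "test.txt")
    or die "could not open test.txt: $!\n";
print $fh "something éá";
close($fh);

my $PUBS = LoadFile("cache.yaml");
my $f    = "2917";
my $ref  = $PUBS->{$f};
print "$f \"$ref->{name}\" $ref->{primary_uri} ";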

where cache.yaml is:

---
2917:
  id: 2917
  name: Semanário
  primary_uri: 2917.xml

Solution 6 - Perl

Run this in your shell:

$ env | grep LANG

It will probably show that your shell is not using a UTF-8 locale.
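A Perl-side version of the same check, using the core POSIX and I18N::Langinfo modules: it prints the locale variables and asks the C library which character set the resulting LC_CTYPE locale uses. This locale is also what the L in PERL_UNICODE=SDL depends on. It is a sketch; the codeset name reported for a non-UTF-8 locale varies between systems.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);

# The variables that select the locale, highest precedence first.
for my $var (qw(LC_ALL LC_CTYPE LANG)) {
    printf "%-8s = %s\n", $var, $ENV{$var} // '(unset)';
}

# Take LC_CTYPE from the environment, then ask which character set it uses.
setlocale(LC_CTYPE, '');
my $codeset = langinfo(CODESET);
my $is_utf8 = $codeset =~ /^utf-?8$/i;

print "locale codeset: $codeset\n";
print $is_utf8 ? "the locale is UTF-8\n" : "the locale is not UTF-8\n";
```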

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Peter Conrey | View Question on Stackoverflow
Solution 1 - Perl | Chris Lutz | View Answer on Stackoverflow
Solution 2 - Perl | draegtun | View Answer on Stackoverflow
Solution 3 - Perl | Chas. Owens | View Answer on Stackoverflow
Solution 4 - Perl | Hans Ginzel | View Answer on Stackoverflow
Solution 5 - Perl | Sérgio | View Answer on Stackoverflow
Solution 6 - Perl | nxadm | View Answer on Stackoverflow