Expired
分类:
2007-01-21 11:31:49
先看一个例子:
先建一个文件 test.txt,写上“中文”两个字。运行下面一个程序:
use strict;
use warnings;
use Encode;
my $str_code = "中文";
my $file = "test.txt";
open(my $fh, $file) || die "Can't open file $file: $!";
chomp(my $str_file = <$fh>);
close $fh;
open(my $fh_utf8, "<:utf8", $file) || die "Can't open file $file: $!";
chomp(my $str_fileu = <$fh_utf8>);
close $fh_utf8;
test_utf8($str_code, "code");
test_utf8($str_file, "file");
test_utf8($str_fileu, "file(utf8 layer)");
# print $str_code, "\n";
# print $str_file, "\n";
# print $str_fileu, "\n";
Encode::_utf8_on($str_code);
Encode::_utf8_on($str_file);
test_utf8($str_code, "code");
test_utf8($str_file, "file");
sub test_utf8 {
my ($str, $type) = @_;
if ( Encode::is_utf8($str) ) {
print "String from $type, utf8 flag is on."
} else {
print "String from $type, utf8 flag is off."
}
my @chars = split('', $str);
# print join("\t", @chars), "length: ", scalar(@chars);
if (scalar(@chars)==2) {
print " Can match chinese character.";
} else {
print " Can't match chinese character.";
}
print "\n";
}
结果如下:
String from code, utf8 flag is off. Can't match chinese character.
String from file, utf8 flag is off. Can't match chinese character.
String from file(utf8 layer), utf8 flag is on. Can match chinese character.
Turn on utf flag:
String from code, utf8 flag is on. Can match chinese character.
String from file, utf8 flag is on. Can match chinese character.
所以如果要想使用汉字作为一个字符的特性,就要在 open 里指明 io layer 为 utf8,同样,输出指明 utf8,然后在脚本里写 use utf8。 这样你就不用担心 utf8 下的乱码,又能享受汉字做为一个字符来写正则表达式的的畅快。
如果还不知道我上面说的意思,再给一个例子。还是新建一个文件 test.txt, 写上:
求
汉
汁
汗
汕
江
运行下面的脚本:
use strict;
use warnings;
use Encode;
use utf8;
my $file = "test.txt";
my $outfile = "test-out.txt";
# open(my $out, '>:utf8', $outfile) || die "Can't open file $outfile: $!";
# select $out;
# open(my $fh, $file) || die "Can't open file $file: $!";
open(my $fh, '<:utf8', $file) || die "Can't open file $file: $!";
binmode STDOUT, ":utf8";
while (<$fh>) {
if (/[汉]/) {
unless (Encode::is_utf8($_)) {
Encode::_utf8_on($_);
}
print $_;
}
}
这个例子中,有好几个组合:
有时候在你没有指明 utf8 作 io layer 时,可能会看到一个 warning:
Wide character in print at script.pl line 32,line 1.
这个 warning 确实是一个小小的 warning,不会影响结果。如果不想有这个 warning,最好的办法是用 binmode 或者在 open 中指定使用 utf8 layer。或者对于 warning 的字符串用 Encode 模块中 _utf8_on 函数,强制加上 utf8 flag。