http://www.cnblogs.com/zhangchaoyang/articles/2034036.html
First, install the Text::Scws module for Perl, pointing the build at the SCWS library and headers under /usr/local:
perl Makefile.PL LIBS='-L/usr/local/lib' INC='-I/usr/local/include/scws'
make
make test
sudo make install
SCWS example programs in Perl:
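#!/usr/bin/perl
# Segment the text given as the first command-line argument and print the
# words separated by spaces. Expects UTF-8 input and the UTF-8 dictionary
# and rule files installed under /usr/local/etc.
use strict;
use warnings;
use Text::Scws;

my $scws = Text::Scws->new();
$scws->set_charset('utf-8');
$scws->set_dict('/usr/local/etc/dict.utf8.xdb');
$scws->set_rule('/usr/local/etc/rules.utf8.ini');
$scws->set_ignore(1);    # drop punctuation from the results
$scws->set_multi(1);     # compound splitting: also return shorter words inside long ones

my $s = shift @ARGV;
die "usage: $0 TEXT\n" unless defined $s;

$scws->send_text($s);
# Each get_result() call returns a batch of words (an array ref of hash refs)
# and a false value once all of the text has been consumed.
while (my $r = $scws->get_result()) {
    foreach my $w (@$r) {
        print $w->{word}, " ";
    }
}
print "\n";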
##############################################################################################
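#!/usr/bin/perl
# Same as above, but with the GBK dictionary and a built-in sample text.
# This file must itself be saved in GBK so that the string literal matches
# set_charset('gbk'). Punctuation is kept in the output (set_ignore(0)).
use strict;
use warnings;
use Text::Scws;

my $scws = Text::Scws->new();
$scws->set_charset('gbk');
$scws->set_dict('dict.xdb');    # relative path, resolved against the current directory
$scws->set_rule('/usr/local/etc/rules.ini');
$scws->set_ignore(0);           # keep punctuation in the results
$scws->set_multi(1);

# Sample text (English gist): "As I understand it, the simplest segmenter first
# cuts Chinese text into its smallest units, single characters, then looks words
# up in the dictionary and merges the characters into words by the leftmost-longest
# rule (in the spirit of regular expressions). This should be the fastest approach:
# it only splits and merges by the given data, without weighing grammatical elements
# (parts of speech, sentence roles) or how often words occur in context."
my $s = ' 以我的理解,最简单的分词程序,应该是先将中文文本切成最小的单位--汉字--再从词典里找词,将这些字按照最左最长原则(与正则精神暗合),合并为以词为单位的集合。这样的应该是最快的,只按照给定的数据划分合并即可,不必考虑语法元素的权重(词性:名动形数量代等等,语法:主谓宾定状补),以及上下文的出现次数。';

$scws->send_text($s);
while (my $r = $scws->get_result()) {
    foreach my $w (@$r) {
        print $w->{word}, " ";
    }
}
print "\n";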
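The sample text above describes the simplest dictionary-based strategy: split the text into single characters, then merge them into dictionary words by the leftmost-longest rule, usually called forward maximum matching. Below is a minimal sketch of that idea in plain Perl, without SCWS. The toy dictionary `%dict` and the helper `fmm_segment` are hypothetical names for illustration; this is not SCWS's own algorithm and it does not read SCWS's dictionary or rule files.

```perl
#!/usr/bin/perl
# Forward maximum matching ("leftmost-longest") segmentation sketch.
# %dict is a tiny hypothetical sample dictionary, not SCWS's dict.xdb.
use strict;
use warnings;
use utf8;                              # string literals are UTF-8 characters
binmode STDOUT, ':encoding(UTF-8)';

my %dict = map { $_ => 1 } qw(中文 分词 分词程序 程序 最简单 简单 词典);

# Length (in characters) of the longest dictionary word; bounds the window.
my $max_len = 1;
for my $w (keys %dict) {
    $max_len = length($w) if length($w) > $max_len;
}

# Scan left to right; at each position take the longest dictionary word
# starting there, falling back to a single character when nothing matches.
sub fmm_segment {
    my ($text) = @_;
    my @words;
    my $pos = 0;
    while ($pos < length($text)) {
        my $remaining = length($text) - $pos;
        my $len = $max_len < $remaining ? $max_len : $remaining;
        # Shrink the window until it is a known word or a single character.
        while ($len > 1 && !exists $dict{ substr($text, $pos, $len) }) {
            $len--;
        }
        push @words, substr($text, $pos, $len);
        $pos += $len;
    }
    return @words;
}

print join(' ', fmm_segment('最简单的分词程序')), "\n";   # prints: 最简单 的 分词程序
```

Because the window starts at the longest possible candidate, 分词程序 wins over 分词 at the same position, while a character with no dictionary entry, such as 的, comes out on its own.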