Using the ibus and scim-python phrase dictionaries with fcitx - LaoLiulaoliu - ChinaUnix Blog

Using the ibus and scim-python phrase dictionaries with fcitx

Category: LINUX

2009-03-30 14:39:51

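Download the phrase dictionary from open-phrase.
Commands:
# sort -k3,3 -g -r phrase_pinyin_freq_sc.txt | awk '{print $2 " " $1 " " $3}' > try.txt
# uniq try.txt | awk '{print $1 " " $2}' > pyPhrase_op.org
The first command sorts the entries by the third column (frequency) in descending order and swaps the first two columns. The second drops adjacent identical lines and then the frequency column.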
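To try the pipeline on a few lines before running it on the full file, it can be wrapped in a small function. This is only a sketch: the function name `build_phrase_table` is made up for illustration, and the input layout (phrase, pinyin, frequency, separated by whitespace) is an assumption inferred from the file name and the awk column order. `sort -k3,3` selects the third field as the sort key (older versions of sort spelled this `+2 -3`).

```shell
# Hypothetical helper (not part of fcitx or open-phrase).
# Reads "phrase pinyin freq" lines on stdin (assumed layout) and
# writes "pinyin phrase" lines, highest frequency first.
build_phrase_table() {
    sort -k3,3 -g -r |                    # numeric sort on column 3, descending
        awk '{print $2 " " $1 " " $3}' |  # reorder to: pinyin phrase freq
        uniq |                            # drop adjacent identical lines
        awk '{print $1 " " $2}'           # keep only: pinyin phrase
}
```

For example, `build_phrase_table < phrase_pinyin_freq_sc.txt > pyPhrase_op.org` should be equivalent to the two commands above without the intermediate try.txt.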

To get the list of duplicated entries, with the most frequently repeated first:
# uniq -c -d try.txt | sort -k1,1 -g -r > duplicate.txt

A few errors that I found by hand and that also appear in SogouLabDic.dic (suggested correction in parentheses):
山陬海噬(山陬海噬?)
以狸致鼠以冰致绳(以狸致鼠、以冰致绳)
初生犊牛(初生犊?)

Rename pyPhrase_op.org to pyPhrase.org, use it to replace fcitx-3.6.0-rc/data/pyPhrase.org, then recompile fcitx, and the new dictionary is ready to use.
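One caveat before replacing the file: according to a reader's comment on this post, the pyPhrase.org shipped with fcitx is GB18030-encoded, while the file produced above is UTF-8, and entries went missing until it was re-saved as GB18030. The reader did this in kate; a command-line sketch of the same conversion, assuming the `iconv` on your system supports GB18030 (glibc's does), might look like this. The function name `to_gb18030` is made up for illustration.

```shell
# Hypothetical helper: convert a UTF-8 phrase file to GB18030.
# Usage: to_gb18030 INPUT OUTPUT
to_gb18030() {
    # iconv stops with an error if INPUT is not valid UTF-8
    iconv -f UTF-8 -t GB18030 "$1" > "$2"
}
```

With this, `to_gb18030 pyPhrase_op.org pyPhrase.org` would do the rename and the re-encoding in one step.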
Whether any duplicate entries remain still has to be checked with my program.
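The checking program itself is not shown in this post. As a rough stand-in, the sketch below looks for repeated "pinyin phrase" pairs with awk, assuming one pair per line as in pyPhrase_op.org. It is worth checking separately because `uniq` only collapses adjacent lines that are identical in full, and the lines in try.txt still carry the frequency, so the same pair listed with two different frequencies would survive. Both function names are made up for illustration.

```shell
# Hypothetical helpers; each reads "pinyin phrase" lines
# from the given files, or from stdin if none are given.

# Print "count pinyin phrase" for every pair that appears more than once.
list_duplicate_pairs() {
    awk '{ n[$1 " " $2]++ }
         END { for (k in n) if (n[k] > 1) print n[k], k }' "$@" |
        sort -k1,1nr
}

# Keep only the first occurrence of each pair, preserving line order.
drop_duplicate_pairs() {
    awk '!seen[$1 " " $2]++' "$@"
}
```

`list_duplicate_pairs pyPhrase_op.org` prints nothing when the file is clean; otherwise `drop_duplicate_pairs pyPhrase_op.org > pyPhrase.org` writes a copy without the repeats.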

Update: the file "phrase_pinyin_freq_sc_20090402.txt" in the open-phrase project has been run through the program and no longer contains any duplicate entries.

Comments

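ChinaUnix user, 2009-05-26 22:48:44

This does not seem to work as written: after importing, I found that entries were missing from the dictionary. Comparing against the original pyPhrase.org, I found that fcitx's original file is GB18030-encoded, while the converted one is UTF-8. I opened the file in kate (gedit seemed to show garbled text), saved it as GB18030, then compiled and installed again, and it worked! In my testing there are still duplicate phrases, but overall the larger dictionary makes fcitx nicer to use than before.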

ChinaUnix user, 2009-05-20 23:01:37

Nice, I'm already using it.
