A Summary of Common Natural Language Syntactic Parsers - jiangwen127 - ChinaUnix Blog

A Summary of Common Natural Language Syntactic Parsers

Category: Big Data

2015-06-24 09:47:51

http://baojie.org/blog/2014/06/16/nlp-parser/

Feature summary table


Features	Satisfied by	Note
Web-scale parsing: handles TB-scale or larger text volumes efficiently at both training and parsing time	Link, MiniPar, Malt, DeSR, MST, pfp, MBSP	Linear-time parsing is generally possible with dependency parsing; parallelism support is also important
Potentially support both statistical and knowledge-based parsing	Link, NLTK, Malt, DepParse, MBSP
High accuracy	Stanford, Collins and Bikel, Berkeley, Charniak-Johnson, RASP, Malt, Link, DeSR, MST, pfp, Senna
Active development	Stanford, Berkeley, Link, NLTK, Malt, DeSR, pfp, MBSP, OpenNLP, Senna
Production-friendly license	Link, NLTK, RASP, Malt, DepParse, OpenNLP	Some GPL-licensed parsers can still be used in production as a web service without open-sourcing the other components
Good documentation	Stanford, Link, NLTK, Malt, DeSR, MBSP, OpenNLP
Code Reusability: easy-to-use API or easy-to-understand code	Stanford, Link, NLTK, MiniPar, DeSR, DepParse, pfp, MBSP, Senna
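
The feature table above can be queried as plain data. The following is a minimal sketch, not part of the original post: it copies the "Satisfied by" column into Python sets and intersects them to shortlist parsers. The dictionary keys and the function names `parsers_with` and `feature_counts` are made up for illustration.

```python
from collections import Counter

# Rows of the feature summary table: feature -> parsers listed as satisfying it.
# The keys are illustrative short names, not terms from the original table.
FEATURES = {
    "web_scale": {"Link", "MiniPar", "Malt", "DeSR", "MST", "pfp", "MBSP"},
    "statistical_and_knowledge_based": {"Link", "NLTK", "Malt", "DepParse", "MBSP"},
    "high_accuracy": {"Stanford", "Collins and Bikel", "Berkeley",
                      "Charniak-Johnson", "RASP", "Malt", "Link", "DeSR",
                      "MST", "pfp", "Senna"},
    "active_development": {"Stanford", "Berkeley", "Link", "NLTK", "Malt",
                           "DeSR", "pfp", "MBSP", "OpenNLP", "Senna"},
    "production_friendly_license": {"Link", "NLTK", "RASP", "Malt",
                                    "DepParse", "OpenNLP"},
    "good_documentation": {"Stanford", "Link", "NLTK", "Malt", "DeSR",
                           "MBSP", "OpenNLP"},
    "code_reusability": {"Stanford", "Link", "NLTK", "MiniPar", "DeSR",
                         "DepParse", "pfp", "MBSP", "Senna"},
}


def parsers_with(required):
    """Return the parsers listed under every feature in `required` (at least one)."""
    return set.intersection(*(FEATURES[name] for name in required))


def feature_counts():
    """Count how many of the seven features each parser is listed under."""
    counts = Counter()
    for parsers in FEATURES.values():
        counts.update(parsers)
    return counts


if __name__ == "__main__":
    shortlist = parsers_with(["web_scale", "high_accuracy",
                              "active_development",
                              "production_friendly_license"])
    print(sorted(shortlist))  # ['Link', 'Malt']
    print(feature_counts().most_common(3))
```

According to the table, only Link is listed under all seven features, and Malt is listed under six (it is missing from the code reusability row).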

Detailed comparison

This table is fairly wide; click the print or pdf button at the top to see the full table.

Parser (type, implementation, license, authors)	Supported languages	Feature summary	Links	Active project
Stanford Parser; constituency and dependency; Java, with Python and Ruby interfaces; GPL license; by Chris Manning et al.	English, Chinese, German, Arabic, Italian, Bulgarian, and Portuguese	Part of ...; a package of three kinds of parsers: a PCFG (probabilistic context-free grammar) parser, a lexicalized dependency parser, and a lexicalized PCFG parser; parsing accuracy ranks consistently high in surveys; good documentation; the PCFG parser is based on the CKY algorithm; however, the dependency parser has O(n^4) complexity, which is much worse than the linear-time O(n) dependency parsers	Homepage; Download; Online test; Javadoc	Yes (frequent releases)
Collins and Bikel Parser; constituency parser; Java; free for research; by Dan Bikel (UPenn) and ... (Columbia)	English, Chinese, Arabic	An improvement of the Collins parser; based on the CYK algorithm; lexicalized PCFG; state-of-the-art performance for English	Homepage; Download; Javadoc	No (since 2008)
Berkeley parser; constituency parser; Java; GPL; by Slav Petrov and Dan Klein	English, Bulgarian, Arabic, Chinese, French, German	Based on hierarchical coarse-to-fine parsing, where a sequence of grammars is considered; no need for language-specific adaptations; automatically induced PCFG; state-of-the-art performance for English on the Penn Treebank	Project homepage; Online test	(infrequent changes)
Charniak-Johnson Parser; constituency parser; C; by Eugene Charniak (Brown Univ) and Mark Johnson	English	Based on discriminative reranking and dynamic programming; lexicalized n-best PCFG: for each sentence, a heuristic coarse-to-fine generative parser constructs a set of 50-best parses, and the reranker feature weights are estimated using MaxEnt, averaged perceptron, etc.; state-of-the-art performance on English	Current C-J parser (2011); Original (2005) Charniak parser	Yes (infrequent changes)
Link Grammar Parser; dependency parser; C, with bindings for Ruby, Python, Perl, Java and OCaml; BSD license; by Davy Temperley, John Lafferty and Daniel Sleator (CMU), Dom Lachowicz, Linas Vepstas (AbiWord)	Persian, Arabic, Chinese, German, Russian	Based on link grammar lexicons (similar to IBM Watson's English slot grammar parser); it has 70k+ words; produces both dependencies (labelled links connecting pairs of words) and constituents (Penn Treebank style phrase trees); performance is comparable to the Stanford PCFG parsing model, and it is 3+ times faster than the Stanford lexicalized model; 10+ extensions, including FrameNet-style framing, reference (anaphora) resolution and natural language generation; however, it is grammar-rigid and may fail when a sentence is grammatically incomplete or non-compliant; very good documentation	Original CMU page; Project page; part of ...; Online test; SVN; API; Documentation	Yes (frequent releases)
NLTK Parser; constituency and dependency; Python; Apache License; by Steven Bird	English, German, Chinese, Japanese	Very good documentation, with various books available; widely adopted in education and web application development; very easy to use, with a clean API; part of a whole set of NLP tools covering major NLP needs; constituency parser with PCFG; dependency parser using a shift-reduce algorithm, based on CFG; however, its parser implementation is less optimized	Project homepage; Source code; Book; Book	Yes (very active)
MiniPar; dependency parser; C and Lisp, with Java binding in GATE; free of charge for non-commercial use; by Dekang Lin	English	One of the early dependency parsers; after 15+ years, it is slightly worse than state-of-the-art parsers; code is small and easy to extend; its dependency may be useful in designing a new parser	Homepage and download	No (since 1994)
RASP; C and Common Lisp; constituency and dependency; LGPL; by John Carroll et al. (Sussex and Cambridge)	English	RASP = Robust Accurate Statistical Parsing; fully domain-independent automated training; integration of statistical techniques and incremental grammar rule induction; state-of-the-art performance	Homepage; Download	Yes (infrequent releases)
MaltParser; dependency parser; Java, with Python binding in NLTK; by Johan Hall, Jens Nilsson and Joakim Nivre	English, French, Swedish	Shift-reduce algorithm (automaton-based); inductive dependency parsing that learns from a treebank; very fast: linear-time parsing; state-of-the-art accuracy	Project home; Javadoc	Yes
DeSR; dependency parser; C++ with Python binding; GPL; by Giuseppe Attardi	Italian, English, French, and 10+ others	Part of the ...; shift-reduce dependency parser that can handle non-projective dependencies; deterministic parsing, very fast (linear time); fully labeled dependency trees; training with Multi Layer Perceptron, Averaged Perceptron, Maximum Entropy, SVM, or memory-based learning using TiMBL; ... on English labeled dependency parsing	Project homepage; Code SVN; API; Online test	Yes
MSTParser; dependency parser; Java; by Jason Baldridge and Ryan McDonald (UPenn)	English, Chinese and 10+ other languages	MST = Maximum Spanning Tree, based on a graph algorithm; supports online learning; state-of-the-art performance, comparable to MaltParser; outperforms MaltParser on longer dependencies, but is typically slower	Project homepage; SVN	No (since 2007)
DepParse; dependency parser; Python; MIT License; by Leif Johnson (UT Austin)	English	Includes a maximum spanning tree (MST) parser and a stack-based, shift-reduce parser; supports data parallelism on multicore machines; performance has not been evaluated; self-contained and easy to extend	Project homepage; Source	No
pfp; constituency parser; C++ and Python; GPL; by Erik Frey, Norman Casagrande et al. (Wavii Inc)	English	pfp = pretty fast statistical parser; uses a PCFG grammar and the CYK algorithm; 3-4x faster than the Stanford parser, and uses 5-8x less resident memory; thread-safe, with multi-core support	Homepage	Yes
MBSP; shallow (dependency) parsing; Python; GPL and commercial	English	Memory-Based Shallow Parser, based on the TiMBL and MBT memory-based learning applications; no need for manual pattern or grammar definition; client-server architecture; does shallow parsing; shares an API with Pattern; can be used together with DeSR and NLTK	Homepage	Yes
OpenNLP Parser; constituency parser; Java; Apache License (an Apache project)	English	A chunking parser (relatively simple); can be used with UIMA	Project homepage; Source SVN	Yes
Senna; constituency parser; C; a non-commercial license	English	Uses deep learning; very small code (3500 lines); syntactic parsing; state-of-the-art performance	Pro
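
Several rows above (MaltParser, DeSR, DepParse, NLTK) describe shift-reduce dependency parsing that runs in linear time. The following is a minimal sketch of an arc-standard shift-reduce parser, written for illustration and not taken from any of the listed parsers. It assumes the gold head of each word is already known and uses that as an oracle to pick each transition; a real parser would learn this decision from a treebank. The function name `arc_standard_parse` and the example sentence are hypothetical.

```python
from collections import deque

ROOT = 0  # artificial root node; real words are numbered 1..n


def arc_standard_parse(heads):
    """Rebuild a projective dependency tree with arc-standard transitions.

    heads[i] is the gold head of word i+1 (0 means ROOT). The gold heads
    act as an oracle here; a trained classifier would replace this lookup.
    Returns (arcs, transitions), where arcs maps dependent -> head.
    """
    gold = [None] + list(heads)
    n = len(heads)
    # pending[h] = number of dependents of h that are not yet attached
    pending = [0] * (n + 1)
    for h in heads:
        pending[h] += 1

    stack, buffer = [ROOT], deque(range(1, n + 1))
    arcs, transitions = {}, []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            below, top = stack[-2], stack[-1]
            # LEFT-ARC: the top of the stack becomes head of the item below it
            if below != ROOT and gold[below] == top:
                arcs[below] = top
                pending[top] -= 1
                del stack[-2]
                transitions.append("LEFT-ARC")
                continue
            # RIGHT-ARC: only once the top has collected all its dependents
            if gold[top] == below and pending[top] == 0:
                arcs[top] = below
                pending[below] -= 1
                stack.pop()
                transitions.append("RIGHT-ARC")
                continue
        if not buffer:
            raise ValueError("no transition applies: tree is not projective")
        stack.append(buffer.popleft())
        transitions.append("SHIFT")
    return arcs, transitions


if __name__ == "__main__":
    # "She reads long books": She<-reads, reads<-ROOT, long<-books, books<-reads
    arcs, transitions = arc_standard_parse([2, 0, 4, 2])
    print(arcs)
    print(transitions)
```

Each word is shifted exactly once and attached exactly once, so a sentence of n words takes 2n transitions, which is where the linear-time behaviour comes from. This plain arc-standard sketch handles only projective trees; the non-projective support the table attributes to DeSR requires additional machinery that is not shown here.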
