the gestalt approach pattern matching-seagven-ChinaUnix博客

live free,die well

首页　| 　博文目录　| 　关于我

seagven

博客访问： 8702
博文数量： 4
博客积分： 120
博客等级：入伍新兵
技术积分： 50
用户组：普通用户
注册时间： 2008-11-03 17:32

文章分类

全部博文（4）

算法学习（1）

其他（0）
未分配的博文（3）

文章存档

2011年（1）

2009年（3）

我的朋友

最近访客

浪花小雨

推荐博文

the gestalt approach pattern matching

分类：

2009-03-23 21:24:12

The best way to describe the Ratcliff/Obershelp pattern-matching algorithm, in using conventional computer terminology, is as a wild-card search that doesn't require wild cards. Instead, the algorithm creates its own wildcards, based on the closest matches found between the strings. Specifically, the algorithm works by examining two strings passed to it and locating the largest group of characters in common. The algorithm uses this group of characters as an anchor between the two strings. The algorithm then places any group of characters found to the left or the right of this anchor on a stack for further examination. This procedure is repeated for all substrings on the stack until there is nothing left to examine. The algorithm calculates the score returned as twice the number of characters found in common divided by the total number of characters in the two strings; the score is returned as an integer, reflecting a percentage match.

For example, suppose you want to compare the similarity between the word `Pennsylvania' and a mangled spelling as `Pencilvaneya.' The largest common group of characters that the algorithm would find is `lvan.' The two sub-groups remaining to the left are `Pennsy' and `Penci,' and to the right are `ia' and`eya.' The algorithm places both of these string sections on the stack to be examined, and advances the current score to eight, two times the number of characters found in common. The substrings `ia' and `eya' are next to come off of the stack and are then examined. The algorithm finds one character in common: a. The score is advanced to ten. The substrings to the left---'i' and `ey'---are placed on the stack, but then are immediately removed and determined to contain no character in common. Next, the algorithm pulls `Pennsy' and `Penci' off of the stack. The largest common substring found is `Pen.' The algorithm advances the score by 6 so that it is now 16. There is nothing to the left of `Pen,' but to the right are the substrings `nsy' and `ci,' which are pushed onto the stack. When the algorithm pulls off `nsy' and `ci' next, it finds no characters in common. The stack is now empty and the algorith ready to return the similarity value found. There was a score of 16 out of a total of 24. This result means that the two strings were 67 percent alike.

阅读(893) | 评论(0) | 转发(0) |

上一篇：error C2146: syntax error : missing ';' before ide

下一篇：博客已升级，请注意变更地址

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6