256路Trie树的Java实现-coffee777-ChinaUnix博客

Blackcoffeeblackcoffee.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

coffee777

博客访问： 1356382
博文数量： 436
博客积分： 7854
博客等级：少将
技术积分： 3225
用户组：普通用户
注册时间： 2007-12-18 16:30

文章分类

全部博文（436）

未分配的博文（436）

文章存档

2013年（2）

2012年（56）

2011年（70）

2010年（308）

我的朋友

最近访客

推荐博文

256路Trie树的Java实现

分类：

2011-01-12 13:15:50

   最近看到一个题目，更收缩相关的，发现有人用的Trie来解决的，突然发现数据结构这门课堂上都没有听说过Trie树，很是惭愧，于是专门找了几本算法榆树拘结构的书来看了下，很庆幸在《Java算法》中找到了对它的描述。
    256路Trie是每个结点均有一个大小为256的数组，每个数组存储一个结点。这里256是一个byte的大小。实际上也就是将字符转换为字节形式（无符号非负整数），然后从高字节向低字节逐个检测，按照字节所表示的数字放到对应数组的位置上，在这个位置建立一个结点node，再取出下一个字节，对应放入node的数组位置上，直到字符串的字节形式被完全遍历，那么就得到了这个字符串所在的结点。
    如果是建立Trie树，就应该在得到的这个结点上储存字符串关键字；如果是查询，再比较这个结点上储存的关键是是否与要检索的关键字相同，如果相同就表明检索成功。
    对于书本上的Java代码，我没有太过细心的看，不过原理基本是理解了。于是在百无聊赖的时候自己实现了一个这样的Trie树。这个Trie结构提取出英文文本中的关键字，将该关键字出现的行数以及该行内的位置保存起来。如果检索成功的话，就打印出关键字所在的行号和位置。不过由于使用的正则表达式，但是这个东西我又是刚结识，不很明了，于是中文字符没有辨认出来，所以只能处理英文的。以下提供源码，仅供参考。

    TrieNode.java是Trie结构的结点类，储存的数据主要是关键字（String）、TrieNode数组、关键字的行号和位置（positions->Map），实现很简单。不过代码中没有什么注释。

import java.util.*;

public class TrieNode

{

// Attributes

private final int CHILD = 256;

private String value;

private TrieNode [] child = new TrieNode[CHILD];

private Map<Integer, List<Integer>> positions;

// Constructor

public TrieNode() {}

public TrieNode (String v)

{

this.value = v;

}

// Method (set/get)

public void setValue(String v)

{

this.value = v;

}

public String getValue()

{

return this.value;

}

public void addChild(int index, TrieNode tn)

{

this.child[index] = tn;

}

public TrieNode getChildAt(int index)

{

if (index > CHILD - 1 || index<0)

{

return null;

}

return this.child[index];

}

public void addPosition(int lineno, int pos)

{

Integer lineInt = new Integer(lineno);

if (this.positions == null)

{

this.positions = new TreeMap<Integer, List<Integer>>();

}

List<Integer> posList = this.positions.get(lineInt);

if (posList == null)

{

posList = new LinkedList<Integer>();

}

posList.add(new Integer(pos));

this.positions.put(lineInt, posList);

}

public List<Integer> getPosition(int lineno)

{

return this.positions.get(Integer.valueOf(lineno));

}

public Map<Integer, List<Integer>> getPositions()

{

return this.positions;

}

public void dump()

{

for (Map.Entry<Integer, List<Integer>> entry : this.positions.entrySet())

{

System.out.print(entry.getKey().intValue() + " - ");

for (Integer pos : entry.getValue())

{

System.out.print(pos.intValue() + ":");

}

System.out.println();

}

接下来是Trie.java，首先是获得一个文件名参数来解析该文件，提取出关键字，建立Trie树，然后查询一个具体的关键字，这里给定了一个“String”作为关键字查询。建立trie树的时候也需要首先检索该关键字是否已经存在于Trie树中了。具体参看代码实现。代码中有几个dump方法，是显示用的，这里我注释掉全部显示的这个方法，只看看搜索“String”的结果。

import java.util.regex.*;

import java.io.*;

public class Trie

{

private String fileName;

private TrieNode root;

public Trie(String fileName)

{

this.fileName = fileName;

root = new TrieNode();

}

public void parse()

{

try

{

BufferedReader br = new BufferedReader(new FileReader(this.fileName));

String line = null;

int lineno = 0;

Pattern pattern = Pattern.compile("\w*", Pattern.UNICODE_CASE);

Matcher m;

while ((line=br.readLine())!=null)

{

lineno++;

m = pattern.matcher(line);

while (m.find())

{

int startPos = m.start();

String subStr = m.group();

if (subStr.length() == 0)

{

continue;

}

this.insert(subStr, lineno, startPos);

}

catch (Exception e)

{

e.printStackTrace();

this.root = null;

}

/**

private TrieNode search(byte [] strBytes, int index, TrieNode node)

{

if (index >= strBytes.length)

{

//System.out.println("查找次数：" + index);

return node;

}

int indexInNode = strBytes[index] & 0xFF;

TrieNode trieNode = node.getChildAt(indexInNode);

if (trieNode == null)

{

trieNode = new TrieNode();

node.addChild(indexInNode, trieNode);

}

return search(strBytes, ++index, trieNode);

}

public TrieNode search(String search)

{

return search(search.getBytes(), 0, this.root);

}

// 应该首先查找，直到找到该字符串中没有走到但是还需要走的路

private void insert(String str, int lineno, int pos)

{

byte [] strBytes = str.getBytes();

TrieNode trieNode = search(strBytes, 0, this.root);

trieNode.setValue(str);

trieNode.addPosition(lineno, pos);

}

public void dump()

{

if (this.root != null)

{

System.out.println("-----");

String searchStr = "String";

TrieNode trieNode = this.search(searchStr);

if (trieNode.getValue() != null && trieNode.getValue().equals(searchStr))

{

trieNode.dump();

}

// 深度优先

private void dumpAll(TrieNode trieNode)

{

String value = trieNode.getValue();

if (value != null)

{

System.out.println("##" + value + "##");

trieNode.dump();

}

TrieNode node;

for (int i=0 ;i<256 ;i++ )

{

node = trieNode.getChildAt(i);

if (node != null)

{

dumpAll(node);

}

public void dumpAll()

{

this.dumpAll(this.root);

}

public static void main(String[] args)

{

Trie trie = new Trie(args[0]);

trie.parse();

trie.dump();

//trie.dumpAll();

}

     虽然整个代码有120行，但其中核心的解析与搜索不到50行，所以是比较简单的了。而我使用Java来实现是因为我对Java比较熟悉，而对C中的那些什么指针有点不甚了了。不过最近在看C#，这个东西跟Java比较接近，可能过段时间再用C#实现一个256路Trie树。
    我使用这个Java实现对Trie.java文件进行检索，得到的结果如下：

-----

6 - 9:

8 - 13:

18 - 3:

31 - 5:

65 - 24:

70 - 21:

83 - 3:23:

94 - 2:

115 - 25:

这些就是“String”在Trie.java中出现的具体位置，“-”左边是行数，右边是行内偏移位置。如果各位兄台有更好的实现以及其他的方法，请不吝赐教！

阅读(1119) | 评论(0) | 转发(0) |

上一篇：深入double array trie

下一篇：Trie树|字典树的简介及实现

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6