Category: Project Management

2010-09-09 11:04:02

Finding duplicate code

Overview

Duplicate code can be hard to find, especially in a large project. But PMD's Copy/Paste Detector (CPD) can find it for you! CPD has been through three major incarnations:

  • First we wrote it using a variant of Michael Wise's Greedy String Tiling algorithm.
  • Then it was completely rewritten by Brian Ewins using the Burrows-Wheeler transform.
  • Finally, it was rewritten by Steve Hawkins to use the Karp-Rabin string matching algorithm.

Each rewrite made it much faster, and now it can process the JDK 1.4 java.* packages in about 4 seconds (on my workstation, at least).

Here's a screenshot of CPD after running on the JDK java.lang package.

Note that CPD works with Java, JSP, C, C++, Fortran and PHP code. Is your language missing? Support for additional languages can be added.

CPD is included with PMD, so downloading PMD gives you CPD as well.

Sample reports were published showing the duplicates CPD found in the JDK 1.4 source code and in the APACHE_2_0_BRANCH branch of Apache (just the httpd-2.0/server/ directory).

Ant task

Andy Glover wrote an Ant task for CPD. It is configured through the following attributes:










| Attribute | Description | Required |
| --- | --- | --- |
| encoding | The character set encoding (e.g., UTF-8) to use when reading the source code files; defaults to the locale setting. | No |
| format | The format of the report (e.g., csv, text, xml); defaults to text. | No |
| ignoreLiterals | If true, CPD ignores differences in literal values when evaluating a duplicate block, so foo=42; and foo=43; are seen as equivalent. You may want to run CPD with this option off at first and then switch it on to see what it turns up; defaults to false. | No |
| ignoreIdentifiers | Similar to ignoreLiterals, but for identifiers, i.e., variable names, method names, and so forth; defaults to false. | No |
| language | Selects the source language (e.g., cpp, java, php, ruby); defaults to java. | No |
| minimumtokencount | A positive integer giving the minimum duplicate size, in tokens. | Yes |
| outputfile | The destination file for the report. If not specified, the report is written to the console. | No |
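
As a rough illustration of how these attributes fit together, here is a minimal sketch of a build target. It assumes that the task class is net.sourceforge.pmd.cpd.CPDTask, that the PMD jar is already on Ant's classpath, and that the files to check are selected with a nested fileset. The target name, the src directory, and the output path are hypothetical placeholders.

```xml
<!-- Sketch only: the target name and all paths are placeholders. -->
<target name="cpd">
    <!-- Assumes the PMD jar is already on Ant's classpath. -->
    <taskdef name="cpd" classname="net.sourceforge.pmd.cpd.CPDTask"/>
    <!-- minimumtokencount is the only required attribute. -->
    <cpd minimumtokencount="100"
         language="java"
         ignoreLiterals="true"
         outputfile="build/cpd.txt">
        <!-- Source files to scan for duplicates. -->
        <fileset dir="src">
            <include name="**/*.java"/>
        </fileset>
    </cpd>
</target>
```

Leaving out outputfile sends the report to the console, which is handy for a first run. The build directory must exist before the target runs.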

Also, you can get verbose output from this task by running ant with the -v flag; i.e.:

 ant -v -f mybuildfile.xml cpd

You can also get an HTML report from CPD by using the XSLT stylesheet in pmd/etc/xslt/cpdhtml.xslt. Run the CPD task with the format set to xml, then apply the stylesheet to the resulting file with Ant's xslt task.
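
A sketch of that two-step sequence is shown here. It assumes the task class is net.sourceforge.pmd.cpd.CPDTask and that the PMD jar is on Ant's classpath. The target name, the src directory, and the build/cpd.xml and build/cpd.html paths are hypothetical; only the stylesheet path comes from the text above.

```xml
<!-- Sketch only: the target name and report paths are placeholders. -->
<target name="cpd-html">
    <taskdef name="cpd" classname="net.sourceforge.pmd.cpd.CPDTask"/>
    <!-- Step 1: write the duplicates as an XML report. -->
    <cpd minimumtokencount="100" format="xml" outputfile="build/cpd.xml">
        <fileset dir="src">
            <include name="**/*.java"/>
        </fileset>
    </cpd>
    <!-- Step 2: transform the XML report into HTML with the bundled stylesheet. -->
    <xslt in="build/cpd.xml"
          style="pmd/etc/xslt/cpdhtml.xslt"
          out="build/cpd.html"/>
</target>
```

Keeping both steps in one target means the HTML is always regenerated from a fresh XML report.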



Command line usage

To run CPD from the command line, just give it the minimum duplicate size and the source directory:

$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /usr/local/java/src/java

You can also specify the language:

$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /path/to/c/source --language cpp

You may wish to check sources that are stored in different directories:

$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /path/to/first/source --files /path/to/second/source --files /path/to/third/source --language fortran

There should be no limit to the number of --files options you can add. If you do run into one, please tell us!

If you're checking a C source tree that has duplicate files in different architecture directories, you can skip those using --skip-duplicate-files:

$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /path/to/c/source --language cpp --skip-duplicate-files

You can also specify the encoding to use when parsing files:

$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /usr/local/java/src/java --encoding utf-16le

You can also specify a report format - here we're using the XML report:

$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /usr/local/java/src/java --format net.sourceforge.pmd.cpd.XMLRenderer

The default format is a text report, and there's also a net.sourceforge.pmd.cpd.CSVRenderer report.

Note that CPD is pretty memory-hungry; you may need to give Java more memory to run it, like this:

$ java -Xmx512m net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /usr/local/java/src/java

Suggestions? Comments? Please post them. Thanks!
