Category: Project Management
2010-09-09 11:04:02
Duplicate code can be hard to find, especially in a large project. But PMD's Copy/Paste Detector (CPD) can find it for you! CPD has been through three major incarnations.
Each rewrite made it much faster, and now it can process the JDK 1.4 java.* packages in about 4 seconds (on my workstation, at least).
Here's a screenshot of CPD after running on the JDK java.lang package.
Note that CPD works with Java, JSP, C, C++, Fortran and PHP code. Is your own language missing? See how to add it.
CPD is included with PMD, which you can download . Or, if you have , you can .
are the duplicates CPD found in the JDK 1.4 source code.
are the duplicates CPD found in the APACHE_2_0_BRANCH branch of Apache (just the httpd-2.0/server/ directory).
Andy Glover wrote an Ant task for CPD; here's how to use it:
Attribute | Description | Required |
encoding | The character set encoding (e.g., UTF-8) to use when reading the source code files; defaults to the locale setting. | No |
format | The format of the report (e.g., csv, text, xml); defaults to text. | No |
ignoreLiterals | If true, CPD ignores literal value differences when evaluating a duplicate block. This means that foo=42; and foo=43; will be seen as equivalent. You may want to run CPD with this option off to start with and then switch it on to see what it turns up; defaults to false. | No |
ignoreIdentifiers | Similar to ignoreLiterals but for identifiers, i.e., variable names, method names, and so forth; defaults to false. | No |
language | Flag to select the appropriate language (e.g., cpp, java, php, ruby); defaults to java. | No |
minimumtokencount | A positive integer indicating the minimum duplicate size. | Yes |
outputfile | The destination file for the report. If not specified, the console will be used instead. | No |
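As a rough sketch of how these attributes might fit together in a build file: the fragment below assumes the PMD jar is already on Ant's classpath and that the sources live under a hypothetical src directory. The target name, paths, and token count are illustrative choices, not values from the original page.

```xml
<!-- Sketch only: assumes the PMD jar is already on Ant's classpath. -->
<target name="cpd">
    <!-- Register the CPD task; the task class ships with PMD. -->
    <taskdef name="cpd" classname="net.sourceforge.pmd.cpd.CPDTask"/>
    <!-- minimumtokencount is the only required attribute; the rest are optional. -->
    <cpd minimumtokencount="100" format="xml" outputfile="cpd.xml" ignoreLiterals="true">
        <!-- Hypothetical source directory; adjust to your project layout. -->
        <fileset dir="src">
            <include name="**/*.java"/>
        </fileset>
    </cpd>
</target>
```

Starting with ignoreLiterals left at its default and enabling it on a later run, as the table suggests, makes it easier to see which duplicates differ only in literal values.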
You can also get verbose output from this task by running ant with the -v flag, i.e.:
ant -v -f mybuildfile.xml cpd
You can also get an HTML report from CPD by using the XSLT script in pmd/etc/xslt/cpdhtml.xslt. Run the CPD task as usual with the xml report format, and right after it invoke Ant's xslt task on the resulting report.
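A minimal sketch of that transformation step, assuming the CPD task wrote its XML report to a hypothetical cpd.xml and that the stylesheet path is resolved relative to the build file (the target name and both report file names are illustrative):

```xml
<target name="cpd-html">
    <!-- Transform CPD's XML report into HTML using the stylesheet bundled with PMD. -->
    <!-- cpd.xml and cpd.html are hypothetical file names. -->
    <xslt in="cpd.xml" style="pmd/etc/xslt/cpdhtml.xslt" out="cpd.html"/>
</target>
```

The stylesheet expects XML input, so the CPD task that runs beforehand needs format set to xml rather than the default text.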
To run CPD from the command line, just give it the minimum duplicate size and the source directory:
$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /usr/local/java/src/java
You can also specify the language:
$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /path/to/c/source --language cpp
You may wish to check sources that are stored in different directories:
$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /path/to/other/source --files /path/to/other/source --files /path/to/other/source --language fortran
There should be no limit to the number of '--files' options you may add, but if you run into one, please tell us!
And if you're checking a C source tree with duplicate files in different architecture directories you can skip those using --skip-duplicate-files:
$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /path/to/c/source --language cpp --skip-duplicate-files
You can also specify the encoding to use when parsing files:
$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /usr/local/java/src/java --encoding utf-16le
You can also specify a report format - here we're using the XML report:
$ java net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /usr/local/java/src/java --format net.sourceforge.pmd.cpd.XMLRenderer
The default format is a text report, and there's also a net.sourceforge.pmd.cpd.CSVRenderer report.
Note that CPD is pretty memory-hungry; you may need to give Java more memory to run it, like this:
$ java -Xmx512m net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /usr/local/java/src/java
Suggestions? Comments? Post them . Thanks!