File description: the data is shown below. I want to extract specific fields from the lines whose third column is gene.
##gff-version 3
Chr1 phytozome9_0 gene 3631 5899 . + . ID=AT1G01010;Name=AT1G01010
Chr1 phytozome9_0 mRNA 3631 5899 . + . ID=PAC:19656964;Name=AT1G01010.1;pacid=19656964;longest=1;Parent=AT1G01010
Chr1 phytozome9_0 five_prime_UTR 3631 3759 . + . ID=PAC:19656964.five_prime_UTR.1;Parent=PAC:19656964;pacid=19656964
Chr1 phytozome9_0 CDS 3760 3913 . + 0 ID=PAC:19656964.CDS.1;Parent=PAC:19656964;pacid=19656964
Chr1 phytozome9_0 CDS 3996 4276 . + 2 ID=PAC:19656964.CDS.2;Parent=PAC:19656964;pacid=19656964
Chr1 phytozome9_0 CDS 4486 4605 . + 0 ID=PAC:19656964.CDS.3;Parent=PAC:19656964;pacid=19656964
Chr1 phytozome9_0 CDS 4706 5095 . + 0 ID=PAC:19656964.CDS.4;Parent=PAC:19656964;pacid=19656964
Chr1 phytozome9_0 CDS 5174 5326 . + 0 ID=PAC:19656964.CDS.5;Parent=PAC:19656964;pacid=19656964
Chr1 phytozome9_0 CDS 5439 5630 . + 0 ID=PAC:19656964.CDS.6;Parent=PAC:19656964;pacid=19656964
Chr1 phytozome9_0 three_prime_UTR 5631 5899 . + . ID=PAC:19656964.three_prime_UTR.1;
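The script below does this. For the example above, the result should be: Chr1 AT1G01010 3631 5899 (the script strips the leading ID= from the first attribute).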
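#!/usr/bin/perl
use strict;
use warnings;

print "Please input your gff3 file name, then press Enter!\n";
my $plant_gff = <STDIN>;    # read the file name from the keyboard
chomp $plant_gff;           # drop the trailing newline
open my $in,  '<', $plant_gff or die "Cannot open $plant_gff: $!\n";
open my $out, '>', 'result'   or die "Cannot write result: $!\n";
while (my $line = <$in>) {
    chomp $line;
    next if $line =~ /^#/ || $line !~ /\S/;    # skip comments and blank lines
    my ($first) = split /;/, $line;            # everything before the first ';'
    my @second  = split /\t/, $first;          # the nine tab-separated columns
    next unless @second >= 9 && $second[2] eq 'gene';
    my $second_8 = substr $second[8], 3;       # strip the leading "ID="
    print $out "$second[0]\t$second_8\t$second[3]\t$second[4]\n";
}
close $in;
close $out;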
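The script relies on ID= being the first attribute in column 9. As a minimal sketch, assuming that order may not always hold, the ID can instead be looked up by key. The helper name gene_summary and the reordered sample line are made up for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return "seqid<TAB>ID<TAB>start<TAB>end" for a
# gene line, or nothing for any other line.
sub gene_summary {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/;                  # skip directives and comments
    my @col = split /\t/, $line;
    return unless @col >= 9 && $col[2] eq 'gene';

    # Turn "Name=AT1G01010;ID=AT1G01010" into a key => value hash.
    my %attr;
    for my $pair (split /;/, $col[8]) {
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value if defined $value;
    }
    return unless defined $attr{ID};
    return join "\t", $col[0], $attr{ID}, @col[3, 4];
}

# Illustrative line: the gene record from the sample data, with the
# attribute order swapped so that ID= is no longer first (an assumption).
my $sample = join "\t", 'Chr1', 'phytozome9_0', 'gene', 3631, 5899,
    '.', '+', '.', 'Name=AT1G01010;ID=AT1G01010';
my $summary = gene_summary($sample);
print "$summary\n" if defined $summary;
```

In a real run, gene_summary would be called once per line read from the gff3 file, and each defined result printed to the output file.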