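A. Cut each sequence into equal-length, partially overlapping fragments and write all fragments to a single FASTA file.
B. Name each fragment after the original sequence plus its position (the start and end coordinates on the original sequence). For example, if the original sequence is named E.coli, the fragments are named E.coli_1-100, E.coli_81-180, ...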
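The example names imply a fragment length of 100 and an overlap of 20 (the values of `$len` and `$offset` in the script), so consecutive fragments start 80 bases apart. Written out, with $L$ the length of the original sequence and $i = 1, 2, \dots$ the fragment index (1-based, inclusive coordinates):

```latex
\begin{aligned}
s &= \mathit{len} - \mathit{offset} = 100 - 20 = 80,\\
\mathit{start}_i &= 1 + (i-1)\,s,\\
\mathit{end}_i &= \min(\mathit{start}_i + \mathit{len} - 1,\; L).
\end{aligned}
```

For $i = 2$ this gives $\mathit{start}_2 = 81$ and $\mathit{end}_2 = 180$, matching E.coli_81-180. Fragments are produced until $\mathit{end}_i$ reaches $L$, so the last fragment may be shorter than 100 bases.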
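```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

my $len    = 100;              # fragment length
my $offset = 20;               # overlap between consecutive fragments
my $step   = $len - $offset;   # each fragment starts 80 bases after the previous one

my $in  = Bio::SeqIO->new( -format => 'fasta', -file => 'example.fa' );
# '>' overwrites out.fa on every run; the original '>>' appended,
# so running the script twice duplicated every record.
my $out = Bio::SeqIO->new( -format => 'fasta', -file => '>out.fa' );

while ( my $seq = $in->next_seq ) {
    # Build a short name from the description, e.g. "Escherichia coli ..." -> "E.coli";
    # fall back to the sequence ID if the description has fewer than two words.
    my ( $genus, $species ) = split ' ', ( $seq->desc() // '' );
    my $name = defined $species
        ? substr( $genus, 0, 1 ) . '.' . $species
        : $seq->display_id();

    my $len_seq = $seq->length();
    my ( $end, $i ) = ( 0, 0 );
    while ( $end < $len_seq ) {
        $i++;
        # 1-based start: 1, 81, 161, ... (same as len*(i-1) + 1 - offset*(i-1))
        my $start   = 1 + ( $i - 1 ) * $step;
        my $end_tem = $start + $len - 1;
        # the last fragment is truncated at the end of the sequence
        $end = ( $end_tem < $len_seq ) ? $end_tem : $len_seq;
        my $frag = Bio::Seq->new(
            -display_id => $name . '_' . $start . '-' . $end,
            # -desc     => $seq->desc(),
            -seq        => $seq->subseq( $start, $end ),
        );
        $out->write_seq($frag);
    }
}
$out->close();
```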
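The window arithmetic and the naming rule can be checked without BioPerl. Below is a minimal, dependency-free sketch that works on plain sequence strings; the subroutines `short_name` and `split_windows` are hypothetical helpers written for illustration, not part of BioPerl or of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: derive "E.coli" from a description such as
# "Escherichia coli K-12"; fall back to $id if there are fewer than two words.
sub short_name {
    my ( $desc, $id ) = @_;
    my ( $genus, $species ) = split ' ', ( $desc // '' );
    return defined $species ? substr( $genus, 0, 1 ) . '.' . $species : $id;
}

# Hypothetical helper: cut $seq into fragments of $len bases that overlap by
# $offset bases. Returns [name, subsequence] pairs named with 1-based,
# inclusive coordinates, as in the BioPerl script.
sub split_windows {
    my ( $name, $seq, $len, $offset ) = @_;
    # with offset >= len the start would never advance and the loop would not end
    die "offset must be smaller than len\n" unless $offset < $len;
    my $total = length $seq;
    my @out;
    my ( $start, $end ) = ( 1, 0 );
    while ( $end < $total ) {
        $end = $start + $len - 1;
        $end = $total if $end > $total;    # truncate the last fragment
        # substr() is 0-based, so shift the 1-based start by one
        push @out, [ "${name}_$start-$end", substr( $seq, $start - 1, $end - $start + 1 ) ];
        $start += $len - $offset;
    }
    return @out;
}

my $name    = short_name( 'Escherichia coli test', 'seq1' );
my @windows = split_windows( $name, 'ACGT' x 50, 100, 20 );    # 200-base toy sequence
print "$_->[0]\n" for @windows;
```

For the 200-base toy sequence this prints E.coli_1-100, E.coli_81-180 and E.coli_161-200; the last fragment has only 40 bases, mirroring how the script truncates the final fragment at the end of the sequence. The `die` guard is an added safeguard: the original loop never terminates if `$offset` equals `$len`.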
The example input file example.fa is provided in the attachment example.rar.