Given a file like this:
CC -!- FUNCTION: Rapidly .
CC -!- CATALYTIC ACTIVITY: Acetylcholine.
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC anchor it to the basal
CC (By similarity).
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
I want to extract the short passages that begin with SUBCELLULAR LOCATION, without the CC prefixes and the -!- markers, like this:
SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
similarity). Cell membrane; Peripheral membrane protein (By
similarity).
SUBCELLULAR LOCATION: Isoform 2: Cell membrane; Lipid-anchor, GPI-
anchor; Extracellular side (By similarity).
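No1.
Below is a general solution for this kind of problem.
It is a lightweight, line-oriented approach.
It is far more elegant than approaches that pattern-match against the whole file at once.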
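$start is the pattern for the start marker and $end is the pattern for the end marker. The test

if ( (/$start/ .. /$end/) and !/$end/ ) { ... }

keeps every line from the start marker up to the end marker, but not the end line itself: the scalar .. (flip-flop) operator turns true on a line that matches $start and stays true through the first line that matches $end, and !/$end/ then drops that closing line.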
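To see what the flip-flop actually returns, the small sketch below records its value on every line. In scalar context Perl's range operator returns an empty string while false, then the sequence numbers 1, 2, 3, ... while true, and the number on the line where the end pattern matched gets "E0" appended. That "E0" line is exactly the one No1 drops with !/$end/. The sample lines and the one-letter topic names are made up for illustration; topic "S" plays the role of SUBCELLULAR LOCATION.

```perl
use strict;
use warnings;

# Made-up sample: topic "S" stands in for SUBCELLULAR LOCATION.
my $start = qr/-!- S:/;
my $end   = qr/-!- (?!S:)/;    # any other topic line closes the range
my @lines = ('CC -!- A: x', 'CC -!- S: y', 'CC z', 'CC -!- B: w', 'CC v');

my (@vals, @kept);
for (@lines) {
    # Scalar-context range ("flip-flop"): '' while false, 1, 2, ... while
    # true; the number on the line that matches $end gets "E0" appended.
    my $v = (/$start/ .. /$end/);
    push @vals, $v;
    # Same selection as No1's "and !/$end/": in range, but not the closing line.
    push @kept, $_ if $v && $v !~ /E0$/;
}

print join(' | ', map { sprintf("'%s'", $_) } @vals), "\n";   # '' | '1' | '2' | '3E0' | ''
print "$_\n" for @kept;                                        # CC -!- S: y, then CC z
```

Testing the "E0" suffix is an alternative to re-running the end pattern; both pick the same lines.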
#! /usr/bin/env perl
use strict;
use warnings;

# Start: a SUBCELLULAR LOCATION topic line.
my $start = qr/^CC\s+-!- SUBCELLULAR LOCATION/;
# End: any other topic line. The negative lookahead keeps a second
# SUBCELLULAR LOCATION topic from closing the range.
my $end = qr/^CC\s+-!- (?!SUBCELLULAR LOCATION)/;

while (<DATA>) {
    # Inside the range, but not the closing topic line.
    if ( (/$start/ .. /$end/) and !/$end/ ) {
        print "*** $_";
    }
    else {
        print "--- $_";
    }
}
__END__
CC -!- FUNCTION: Rapidly .
CC -!- CATALYTIC ACTIVITY: Acetylcholine.
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC anchor it to the basal
CC (By similarity).
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
Output:
flw@debian:~$ ./ttt.pl
--- CC -!- FUNCTION: Rapidly .
--- CC -!- CATALYTIC ACTIVITY: Acetylcholine.
--- CC -!- SUBUNIT: Homotetramer; composed .
--- CC Interacts with PRIMA1.
--- CC anchor it to the basal
--- CC (By similarity).
*** CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
*** CC similarity). Cell membrane; Peripheral membrane protein (By
*** CC similarity).
*** CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
*** CC anchor; Extracellular side (By similarity).
--- CC -!- ALTERNATIVE PRODUCTS:
--- CC Event=Alternative splicing; Named isoforms=2;
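The No1 script only marks the lines with *** and ---. To get the exact output asked for at the top (no CC prefix, no -!- marker), a minimal variation keeps the same range test and strips the prefixes with one substitution. This is a sketch, not the original author's code; the input is a verbatim subset of the sample lines, read from a heredoc instead of DATA.

```perl
use strict;
use warnings;

# Same start/end markers as No1.
my $start = qr/^CC\s+-!- SUBCELLULAR LOCATION/;
my $end   = qr/^CC\s+-!- (?!SUBCELLULAR LOCATION)/;

# A subset of the sample input above.
my $record = <<'END';
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
END

my @lines = split /^/, $record;    # one element per line, newline kept
my @out;
for (@lines) {
    # The flip-flop must be evaluated on every line to track its state.
    next unless (/$start/ .. /$end/) and !/$end/;
    s/^CC\s+(?:-!-\s+)?//;          # drop "CC" and, on topic lines, "-!-"
    push @out, $_;
}
print @out;
```

The printed lines start with "SUBCELLULAR LOCATION:" and "similarity)." just like the desired output.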
No2.
#!/usr/bin/perl
use strict;
use warnings;

# Slurp the whole input into $_.
my @data = <DATA>;
$_ = join '', @data;

# Lazily capture from "SUBCELLULAR" up to the next "CC -!-" topic line.
my @t = /(SUBCELLULAR.*?)CC\s+-!-/msg;

# Strip "CC" only at the start of each line (/m), so a "CC" inside the
# text is left alone, then print the blocks.
print map { s/^CC\s+//mg; $_ } @t;
__DATA__
CC -!- FUNCTION: Rapidly .
CC -!- CATALYTIC ACTIVITY: Acetylcholine.
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC anchor it to the basal
CC (By similarity).
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
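One limitation of No2: a SUBCELLULAR LOCATION block that is the last topic of the record has no following "CC -!-" line, so (SUBCELLULAR.*?)CC\s+-!- cannot match and that block is silently skipped. A possible fix, sketched below, ends the capture with a lookahead that accepts either the next topic line or the end of the string. The helper name subcellular_blocks is hypothetical, and the demo record is a subset of the sample lines chosen so that SUBCELLULAR LOCATION comes last.

```perl
use strict;
use warnings;

# Hypothetical helper: return every SUBCELLULAR LOCATION block of one record
# with the leading "CC" prefixes removed. The lookahead stops at the next
# "CC -!-" topic line OR at the end of the string, so a block that is the
# last topic in the record is not lost.
sub subcellular_blocks {
    my ($text) = @_;
    my @blocks = $text =~ /(SUBCELLULAR LOCATION.*?)(?=^CC\s+-!-|\z)/msg;
    s/^CC\s+//mg for @blocks;    # strip "CC" only at the start of each line
    return @blocks;
}

my $record = <<'END';
CC -!- FUNCTION: Rapidly .
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
END

print subcellular_blocks($record);
```

On the full sample input it should return the same two blocks as No2.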
No3.
#!/usr/bin/env perl
use warnings;
use strict;

my $key;    # true while we are inside a SUBCELLULAR LOCATION block
while (<DATA>) {
    # Any topic line ("-!-") ends the current block ...
    if (/-!-/) { $key = 0; }
    # ... and a SUBCELLULAR LOCATION line starts a new one.
    if (/SUBCELLULAR LOCATION/) {
        print;
        $key = 1;
        next;
    }
    # Continuation lines are printed while the flag is set.
    if ($key) { print; }
}
__END__
CC -!- FUNCTION: Rapidly .
CC -!- CATALYTIC ACTIVITY: Acetylcholine.
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC anchor it to the basal
CC (By similarity).
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
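The flag in No3 generalizes to any comment topic. A minimal sketch, assuming topics are introduced by "-!- NAME:" as in the sample: the hypothetical sub extract_topic takes the topic name as a parameter and returns the matching lines instead of printing them. Unlike No3, it only opens a block on a -!- line that names the topic, so a continuation line that merely mentions the name does not start a block.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of every "-!- $topic:" block.
# Same state machine as No3: any "-!-" line closes the current block,
# and a "-!-" line naming the wanted topic opens a new one.
sub extract_topic {
    my ($topic, @lines) = @_;
    my $inside = 0;
    my @found;
    for my $line (@lines) {
        $inside = 0 if $line =~ /-!-/;
        $inside = 1 if $line =~ /-!-\s+\Q$topic\E:/;
        push @found, $line if $inside;
    }
    return @found;
}

my @lines = (
    "CC -!- SUBUNIT: Homotetramer; composed .\n",
    "CC Interacts with PRIMA1.\n",
    "CC -!- ALTERNATIVE PRODUCTS:\n",
    "CC Event=Alternative splicing; Named isoforms=2;\n",
);
print extract_topic('ALTERNATIVE PRODUCTS', @lines);
```

Calling extract_topic('SUBCELLULAR LOCATION', <DATA>) in No3's script would print the same lines as No3 does on the sample input.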