2009-03-19 14:16:04

Given a file like this:
CC   -!- FUNCTION: Rapidly .
CC   -!- CATALYTIC ACTIVITY: Acetylcholine.
CC   -!- SUBUNIT: Homotetramer; composed .
CC       Interacts with PRIMA1.
CC       anchor it to the basal
CC       (By similarity).
CC   -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC       similarity). Cell membrane; Peripheral membrane protein (By
CC       similarity).
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC       anchor; Extracellular side (By similarity).
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=2;
I want to extract the blocks that begin with SUBCELLULAR LOCATION, i.e. this output:
SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
  similarity). Cell membrane; Peripheral membrane protein (By
  similarity).
SUBCELLULAR LOCATION: Isoform 2: Cell membrane; Lipid-anchor, GPI-
   anchor; Extracellular side (By similarity).
 
No1.
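Below is a general solution to this class of problem.

It is a lightweight, line-oriented approach: it looks at one line at a time and never needs the whole file in memory, which makes it far more elegant than methods that pattern-match against the entire file at once.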

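$start is the pattern for the start marker and $end is the pattern for the end marker. The condition
if ( (/$start/ .. /$end/) and !/$end/ ){
relies on the range (flip-flop) operator .. in scalar context: it becomes true on the first line matching $start and stays true up to and including the next line matching $end. The extra !/$end/ drops that end line, so what gets selected is everything from the start marker up to, but not including, the end marker. Because $end matches any -!- topic line other than SUBCELLULAR LOCATION, two consecutive SUBCELLULAR LOCATION topics stay inside a single range.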
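To make the behaviour of that condition concrete, here is a minimal sketch that applies it to an in-memory list of lines instead of DATA. The sub name select_between and the abridged sample lines are illustrative assumptions, not part of the original script. One caveat worth knowing: each .. operator keeps its own state across calls to the sub that contains it, so if the input ends while a range is still open, the next call starts inside the range.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the lines from a $start match up to, but not
# including, the next $end match. (/$start/ .. /$end/) is the scalar-context
# range operator: true from a $start line through the first $end line;
# the extra !/$end/ then drops that end line.
sub select_between {
    my ($start, $end, @lines) = @_;
    my @kept;
    for (@lines) {
        push @kept, $_ if (/$start/ .. /$end/) and !/$end/;
    }
    return @kept;
}

my $start = qr/^CC\s+-!- SUBCELLULAR LOCATION/;
my $end   = qr/^CC\s+-!- (?!SUBCELLULAR LOCATION)/;

# Sample lines, abridged from the data above.
my @lines = (
    "CC   -!- SUBUNIT: Homotetramer; composed .\n",
    "CC       Interacts with PRIMA1.\n",
    "CC   -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By\n",
    "CC       similarity). Cell membrane; Peripheral membrane protein (By\n",
    "CC       similarity).\n",
    "CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;\n",
    "CC       anchor; Extracellular side (By similarity).\n",
    "CC   -!- ALTERNATIVE PRODUCTS:\n",
);

print select_between($start, $end, @lines);
```

Here the sample ends with an ALTERNATIVE PRODUCTS line, which closes the range, so calling the sub again starts from a clean state.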

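#!/usr/bin/env perl

use strict;
use warnings;

# Start: a "CC -!- SUBCELLULAR LOCATION" topic line.
my $start = qr/^CC\s+-!- SUBCELLULAR LOCATION/;
# End: any other "CC -!-" topic line.
my $end = qr/^CC\s+-!- (?!SUBCELLULAR LOCATION)/;

while(<DATA>){
    # Lines inside the range are marked "***", all others "---".
    if ( (/$start/ .. /$end/) and !/$end/ ){
        print "*** $_";
    }
    else{
        print "--- $_";
    }
}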
__END__
CC -!- FUNCTION: Rapidly .
CC -!- CATALYTIC ACTIVITY: Acetylcholine.
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC anchor it to the basal
CC (By similarity).
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;

Output:

flw@debian:~$ ./ttt.pl
--- CC -!- FUNCTION: Rapidly .
--- CC -!- CATALYTIC ACTIVITY: Acetylcholine.
--- CC -!- SUBUNIT: Homotetramer; composed .
--- CC Interacts with PRIMA1.
--- CC anchor it to the basal
--- CC (By similarity).
*** CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
*** CC similarity). Cell membrane; Peripheral membrane protein (By
*** CC similarity).
*** CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
*** CC anchor; Extracellular side (By similarity).
--- CC -!- ALTERNATIVE PRODUCTS:
--- CC Event=Alternative splicing; Named isoforms=2;
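No1 prints every line with a marker, whereas the desired output at the top drops the CC tags. One possible extension, sketched under the assumption that you want that layout, wraps the same flip-flop filter in a sub and cleans each selected line. The helper names strip_cc_tag and extract_clean are hypothetical, and the two-space indent for continuation lines is an assumption modelled on the sample output.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $start = qr/^CC\s+-!- SUBCELLULAR LOCATION/;
my $end   = qr/^CC\s+-!- (?!SUBCELLULAR LOCATION)/;

# Hypothetical helper: topic lines lose "CC   -!- ", continuation lines
# have "CC" replaced by two spaces (assumed layout, adjust as needed).
sub strip_cc_tag {
    my ($line) = @_;
    return $line if $line =~ s/^CC\s+-!-\s+//;    # topic line
    $line =~ s/^CC\s+/  /;                        # continuation line
    return $line;
}

# Same flip-flop filter as No1, but returns cleaned lines instead of
# printing them with "***"/"---" markers.
sub extract_clean {
    my @out;
    for my $line (@_) {
        if ((($line =~ $start) .. ($line =~ $end)) and $line !~ $end) {
            push @out, strip_cc_tag($line);
        }
    }
    return @out;
}

# Demo on a few lines from the sample; with a real file you could call
# extract_clean(<STDIN>) or extract_clean(<>) instead.
print extract_clean(
    "CC   -!- SUBUNIT: Homotetramer; composed .\n",
    "CC   -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By\n",
    "CC       similarity).\n",
    "CC   -!- ALTERNATIVE PRODUCTS:\n",
);
```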

No2.

 

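#!/usr/bin/perl

use strict;
use warnings;

# Slurp the whole DATA section into one string.
my $text = do { local $/; <DATA> };

# Capture each "SUBCELLULAR ..." topic up to the next "CC -!-" line.
# The lookahead leaves that line unconsumed, and \z also catches a
# topic that is the last one in the input.
my @t = $text =~ /(SUBCELLULAR.*?)(?=^CC\s+-!-|\z)/msg;

# Strip the "CC" tag from the start of each line (s///r needs Perl 5.14+).
print map { s/^CC[ \t]+//mgr } @t;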

__DATA__
CC -!- FUNCTION: Rapidly .
CC -!- CATALYTIC ACTIVITY: Acetylcholine.
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC anchor it to the basal
CC (By similarity).
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
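Because the whole-string approach already has each topic as one piece of text, it is easy to go one step further and unwrap the continuation lines so that every SUBCELLULAR LOCATION topic comes out on a single line. This is a sketch assuming the "CC   -!- " layout of the sample; unwrap_topics is a hypothetical name. A word hyphenated across a line break (like "GPI-" / "anchor" in the desired output) would be joined with a space, which you may want to special-case.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return every SUBCELLULAR LOCATION topic in $text
# as one unwrapped line, with the "CC" tags and line breaks removed.
sub unwrap_topics {
    my ($text) = @_;
    my @topics;
    # /m: ^ matches at each line start; /s: . also matches newlines.
    # The lookahead stops before the next "CC   -!-" line (or the end of
    # the string) without consuming it, so the next match can start there.
    while ($text =~ /^CC\s+-!-\s+(SUBCELLULAR LOCATION:.*?)(?=^CC\s+-!-|\z)/msg) {
        my $topic = $1;
        $topic =~ s/\n(?:CC[ \t]+)?/ /g;    # join wrapped lines with a space
        $topic =~ s/\s+\z//;                # trim the trailing space
        push @topics, $topic;
    }
    return @topics;
}

my $cc_block = <<'END';
CC   -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC       similarity). Cell membrane; Peripheral membrane protein (By
CC       similarity).
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC       anchor; Extracellular side (By similarity).
CC   -!- ALTERNATIVE PRODUCTS:
END

print "$_\n" for unwrap_topics($cc_block);
```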

No3.

 

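#!/usr/bin/perl

use warnings;
use strict;

my $in_block = 0;    # true while inside a SUBCELLULAR LOCATION topic

while(<DATA>){
    # Every "-!-" line starts a new topic, so leave the block by default.
    if (/-!-/) {
        $in_block = 0;
    }
    # A SUBCELLULAR LOCATION topic line: print it and enter the block.
    if (/-!- SUBCELLULAR LOCATION/) {
        print;
        $in_block = 1;
        next;
    }
    # Continuation lines are printed only inside the block.
    if ($in_block) {
        print;
    }
}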

__END__
CC -!- FUNCTION: Rapidly .
CC -!- CATALYTIC ACTIVITY: Acetylcholine.
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC anchor it to the basal
CC (By similarity).
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
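The flag-based approach of No3 generalises naturally to any CC topic. The sketch below takes the topic name as a parameter; extract_topic is a hypothetical helper, and it assumes every topic line looks like "CC   -!- NAME:" as in the sample. The \Q...\E escape keeps names containing spaces or punctuation literal inside the regex.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return all lines belonging to the CC topic $topic.
# A "-!-" line starts a new topic and decides whether we are inside the
# wanted topic; continuation lines simply follow the current flag.
sub extract_topic {
    my ($topic, @lines) = @_;
    my $want = qr/^CC\s+-!-\s+\Q$topic\E:/;    # \Q..\E: treat name literally
    my $in_topic = 0;
    my @out;
    for my $line (@lines) {
        if ($line =~ /^CC\s+-!-/) {
            $in_topic = $line =~ $want ? 1 : 0;
        }
        push @out, $line if $in_topic;
    }
    return @out;
}

# Sample entry from above; split /^/ keeps each line with its newline.
my @entry = split /^/, <<'END';
CC   -!- FUNCTION: Rapidly .
CC   -!- CATALYTIC ACTIVITY: Acetylcholine.
CC   -!- SUBUNIT: Homotetramer; composed .
CC       Interacts with PRIMA1.
CC       anchor it to the basal
CC       (By similarity).
CC   -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC       similarity). Cell membrane; Peripheral membrane protein (By
CC       similarity).
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC       anchor; Extracellular side (By similarity).
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=2;
END

print extract_topic('SUBCELLULAR LOCATION', @entry);
```

Calling it with 'SUBUNIT' or 'FUNCTION' instead returns those topics, with no change to the loop.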

 
