Extracting Human Genome Fragments with Python - blackjimmy - ChinaUnix Blog

Extracting Human Genome Fragments with Python

Category: Python/Ruby

2009-04-04 14:49:59

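Reference forum discussion:

Following everyone's discussion, I revised the program, and the run time dropped dramatically. Extracting one sequence used to take about 1 second; now extracting 100,000 sequences takes less than a minute. Heh! It seems Python can be quite fast sometimes, as long as you use the right functions and have a good approach.

Here is my program; comments are welcome: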

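def get_oneseq(chrom, start, end):  # e.g. chr1, 23, 65
    """Return bases start..end (1-based, inclusive) from <chrom>.fa."""
    # Binary mode, so that seek() offsets are plain byte positions.
    with open(chrom + '.fa', 'rb') as f:
        head = f.readline()          # b'>chr1\n'
        firstseqline = f.readline()  # 50 bases followed by b'\n'
        offset = len(head)           # 6
        linelen = len(firstseqline)  # 51 bytes per line, newline included
        bases = len(firstseqline.rstrip(b'\r\n'))  # 50 bases per line
        # Byte position of 1-based base p: skip the header, then the full
        # lines before p (newlines included), then the rest within the line.
        # Using p - 1 keeps positions 50, 100, ... on the base itself
        # instead of on the newline that follows it.
        startpos = offset + (start - 1) // bases * linelen + (start - 1) % bases
        endpos = offset + (end - 1) // bases * linelen + (end - 1) % bases
        f.seek(startpos)
        seq = f.read(endpos - startpos + 1)
    # Drop the line breaks that fall inside the span.
    return seq.replace(b'\r', b'').replace(b'\n', b'').decode('ascii')


def parsebed(bedfile, header=0):
    """Write the sequence of every record in bedfile to <bedfile>_seq.txt."""
    with open(bedfile) as f3, open(bedfile + '_seq.txt', 'w') as f4:
        lines = f3.read().splitlines()
        if header == 1:
            lines = lines[1:]  # skip the header line
        records = [line for line in lines if line.strip()]
        for i, line in enumerate(records):
            chrom, start, end = line.split()[:3]
            sequence = get_oneseq(chrom, int(start), int(end))
            f4.write('>' + str(i + 1) + '\n')  # records are named 1, 2, 3, ...
            f4.write(sequence + '\n')


if __name__ == '__main__':
    parsebed('test.txt', 1)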
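The speed comes from jumping straight to a byte offset with seek() instead of reading the whole chromosome file. The offset arithmetic can be checked on a small synthetic file. What follows is only a minimal sketch, assuming Python 3, a single-record FASTA file with fixed-width lines, and 1-based inclusive coordinates. The names base_offset, fetch_region, make_demo_fasta and chrDemo are made up for this illustration and are not part of the program above. The sketch works on the zero-based index (pos - 1), so that positions at the end of a line (50, 100, ...) map to the base itself rather than to the newline after it.

```python
import os
import tempfile


def base_offset(pos, header_len, bases_per_line, line_len):
    """Byte offset of 1-based base `pos` in a single-record FASTA file."""
    index = pos - 1  # zero-based index of the base
    full_lines = index // bases_per_line   # complete lines before the base
    within_line = index % bases_per_line   # position inside its own line
    return header_len + full_lines * line_len + within_line


def fetch_region(path, start, end):
    """Return bases start..end (1-based, inclusive) using seek() only."""
    with open(path, 'rb') as handle:  # binary mode: offsets are bytes
        header = handle.readline()
        first = handle.readline()
        line_len = len(first)                        # newline included
        bases_per_line = len(first.rstrip(b'\r\n'))  # newline excluded
        begin = base_offset(start, len(header), bases_per_line, line_len)
        stop = base_offset(end, len(header), bases_per_line, line_len)
        handle.seek(begin)
        chunk = handle.read(stop - begin + 1)
    # Remove the line breaks that were read along with the bases.
    return chunk.replace(b'\r', b'').replace(b'\n', b'').decode('ascii')


def make_demo_fasta(directory, genome, width=50):
    """Write `genome` as a one-record FASTA file and return its path."""
    path = os.path.join(directory, 'chrDemo.fa')  # hypothetical file name
    with open(path, 'wb') as out:
        out.write(b'>chrDemo\n')
        for i in range(0, len(genome), width):
            out.write(genome[i:i + width].encode('ascii') + b'\n')
    return path


if __name__ == '__main__':
    genome = 'acgtn' * 40 + 'ttagg' * 25  # 325 made-up bases
    with tempfile.TemporaryDirectory() as tmp:
        fasta = make_demo_fasta(tmp, genome)
        # Compare each seek-based extraction with plain string slicing.
        for start, end in [(1, 10), (50, 51), (23, 165), (301, 325)]:
            assert fetch_region(fasta, start, end) == genome[start - 1:end]
    print('all regions match')
```

When many regions come from the same chromosome, a natural next step would be to open each file once and reuse the handle, since the header length and line width only need to be measured once per file.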
