fastq and fasta - jhh130910 - ChinaUnix Blog

fastq and fasta

Category: Big Data

2014-10-04 21:01:05

A FASTQ file normally uses four lines to describe each sequence:

Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description.
Line 2 is the raw sequence letters.
Line 3 begins with a '+' character and is optionally followed by the same sequence identifier.
Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence. The character '!' represents the lowest quality while '~' is the highest.
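
The four rules above can be checked mechanically. Here is a minimal sketch, assuming every record occupies exactly four lines (no wrapped sequences); the function name check_fastq and the two sample reads are made up for illustration:

```shell
# Build a tiny two-record FASTQ file (made-up reads, for illustration only).
sample=$(mktemp)
printf '%s\n' \
  '@read1 first example' 'ACGTAC' '+' 'IIIIII' \
  '@read2' 'GGCATT' '+read2' '!!~~II' > "$sample"

# Hypothetical helper: check each four-line record for an '@' header,
# a '+' separator, and a quality string as long as the sequence.
check_fastq() {
  awk '
    NR%4==1 && substr($0,1,1) != "@" { printf "line %d: header must start with @\n", NR; bad=1 }
    NR%4==2 { seqlen = length($0) }
    NR%4==3 && substr($0,1,1) != "+" { printf "line %d: separator must start with +\n", NR; bad=1 }
    NR%4==0 && length($0) != seqlen  { printf "line %d: quality length differs from sequence length\n", NR; bad=1 }
    END {
      if (NR%4 != 0) { print "file does not end on a record boundary"; bad=1 }
      if (!bad) printf "OK: %d records\n", NR/4
    }
  ' "$1"
}

result=$(check_fastq "$sample")
echo "$result"
rm -f "$sample"
```

For the sample above this prints "OK: 2 records". The check goes by line position rather than by the leading character alone, because '@' and '+' are also valid quality symbols and can appear at the start of Line 4.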
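To convert FASTQ to FASTA you can use fastx-tools, but a single awk command usually does the job, which is much nicer. For example:
awk 'NR%4==1{printf ">%s\n", substr($0,2)}NR%4==2{print}' input_file.fastq > output_file.fa
You can also do the processing upstream and pipe the result to the downstream step:
zcat input_file.fastq.gz | awk 'NR%4==1{printf ">%s\n", substr($0,2)}NR%4==2{print}' > output_file.fa
or, with gzip (the -c flag is needed so that the decompressed data goes to standard output instead of replacing the file on disk):
gzip -dc input_file.*.gz | awk 'NR%4==1{printf ">%s\n", substr($0,2)}NR%4==2{print}' > output_file.fa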
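
To see what the pipeline produces, here is a small end-to-end run on made-up data. It is only a sketch: the file name demo.fastq.gz and the two reads are invented, and it uses gzip -dc to write the decompressed data to standard output.

```shell
# Made-up sample data, compressed the way sequencing files usually arrive.
workdir=$(mktemp -d)
printf '%s\n' \
  '@read1 first example' 'ACGTAC' '+' 'IIIIII' \
  '@read2' 'GGCATT' '+read2' '!!~~II' | gzip -c > "$workdir/demo.fastq.gz"

# Decompress to stdout and convert: line 1 of each record becomes the
# ">" header (leading "@" dropped), line 2 is copied as the sequence.
fasta=$(gzip -dc "$workdir/demo.fastq.gz" |
  awk 'NR%4==1{printf ">%s\n", substr($0,2)} NR%4==2{print}')
echo "$fasta"
rm -rf "$workdir"
```

The output is the two FASTA records, ">read1 first example" followed by ACGTAC and ">read2" followed by GGCATT. The '+' lines and the quality strings are dropped, since FASTA has no place for them.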
