
Installing and Using Sprai, a PacBio Data Assembly Tool

Date: 2018-10-19

1. Introduction to Sprai

Sprai (single-pass read accuracy improver) is a tool to correct sequencing errors in single-pass reads for de novo assembly. It was originally designed for correcting sequencing errors in single-molecule DNA sequencing reads, especially in Continuous Long Reads (CLRs) generated by PacBio RS sequencers. The goal of Sprai is not to maximize the accuracy of error-corrected reads; instead, Sprai aims to maximize the continuity (i.e., N50 contig length) of assembled contigs after error correction.
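
Since continuity is measured here by N50 contig length, it may help to see how that number is obtained. The following is only a sketch using standard tools, not part of Sprai; the function name n50_from_fasta is made up for illustration, and it takes N50 to be the length of the contig at which the running total of lengths, sorted longest first, reaches half of the assembly size.

```shell
# Hypothetical helper (not part of Sprai): print the N50 of a FASTA file.
n50_from_fasta() {
    # First awk: emit one length per sequence, joining wrapped sequence lines.
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (len > 0) print len }
    ' "$1" | sort -rn | awk '
        # Second awk: walk lengths from longest to shortest until the
        # running sum covers at least half of the total.
        { lens[NR] = $1; total += $1 }
        END {
            run = 0
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}
```

For contigs of 8, 6, 4 and 2 bases (20 in total), the running sum passes 10 at the second contig, so the function prints 6.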

Official website: http://zombie.cb.k.u-tokyo.ac.jp/sprai/README.html#introduction

2. Installation

2.1 Software requirements

1. python 2.6 or newer

2. BLAST+ 2.2.27 or newer

3. Celera Assembler ver. 8.1 or newer (if you assemble reads after error-correction)
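
Before installing anything, it can save time to check which of these prerequisites are already on the PATH. The snippet below is a minimal sketch; check_tools is an illustrative name, and it only tests for presence, not for the minimum versions listed above.

```shell
# Hypothetical helper: report which of the given commands are on the PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be check_tools python blastn perl. Versions still need to be checked by hand, for example with python --version and blastn -version.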

2.2 Installation steps

2.2.1 Installing Celera Assembler (CA)

CA download: https://sourceforge.net/projects/wgs-assembler/

bzip2 -dc wgs-8.3rc2.tar.bz2 | tar -xf -

cd wgs-8.3rc2

cd kmer

make install

cd ../src

make

cd ../..

2.2.2 Installing List-MoreUtils-0.415.tar.gz

tar -xzvf List-MoreUtils-0.415.tar.gz

cd List-MoreUtils-0.415/

perl Makefile.PL

make

make install

2.2.3 Installing Exporter-Tiny-0.042.tar.gz (note: install this module first and only then the Statistics-Descriptive-3.0612.tar.gz module below, otherwise errors occur)

tar -xzvf Exporter-Tiny-0.042.tar.gz

cd Exporter-Tiny-0.042/

perl Makefile.PL

make

make install

2.2.4 Installing Statistics-Descriptive-3.0612.tar.gz

tar -xzvf Statistics-Descriptive-3.0612.tar.gz

cd Statistics-Descriptive-3.0612/

perl Build.PL

./Build

./Build test

./Build install

2.2.5 Installing Sprai

Sprai download: http://zombie.cb.k.u-tokyo.ac.jp/sprai/Download.html

tar -xzvf sprai-0.9.9.17.tar.gz

cd sprai-0.9.9.17/

./waf configure

./waf build

./waf install

3. Usage

3.1 The input must be subreads in FASTQ format. If the data are in .bas.h5 format, convert them with bash5tools.py, which is available from the PacBio GitHub (pbh5tools). Usage:

bash5tools.py --outFilePrefix example_output --readType subreads --outType fastq --minReadScore 0.75 example.bas.h5

If there are several subreads files, merge them all into a single FASTQ file and use that as the input. Note that the input FASTQ file must not be compressed.
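
Merging can be done with plain concatenation, since FASTQ records are independent of each other. The sketch below assumes that any compressed inputs are gzip files ending in .gz; merge_fastq and the file names in the usage note are illustrative.

```shell
# Hypothetical helper: merge_fastq OUT IN1 [IN2 ...]
# Concatenates the inputs into one uncompressed FASTQ file,
# decompressing any input whose name ends in .gz on the fly.
merge_fastq() {
    local out=$1 f
    shift
    : > "$out"                      # create or truncate the output file
    for f in "$@"; do
        case "$f" in
            *.gz) gzip -dc "$f" >> "$out" ;;
            *)    cat "$f" >> "$out" ;;
        esac
    done
}
```

For example, merge_fastq all.fq run1.subreads.fastq run2.subreads.fastq.gz would produce the all.fq file that ec.spec refers to in the next step.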

3.2 Create a working directory (mkdir tmp; cd tmp) and copy the pbasm.spec and ec.spec files from the Sprai directory into it.
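
The same step can be wrapped in a small function so that a missing template is noticed immediately. This is a sketch; prepare_workdir is an illustrative name, and it assumes both spec files sit directly in the directory passed as the first argument.

```shell
# Hypothetical helper: prepare_workdir SPRAI_DIR WORKDIR
# Creates WORKDIR and copies the two spec templates into it.
prepare_workdir() {
    local sprai_dir=$1 workdir=$2 spec
    mkdir -p "$workdir" || return 1
    for spec in ec.spec pbasm.spec; do
        if [ ! -f "$sprai_dir/$spec" ]; then
            echo "missing template: $sprai_dir/$spec" >&2
            return 1
        fi
        cp "$sprai_dir/$spec" "$workdir/" || return 1
    done
}
```

With the version installed above, this might be called as prepare_workdir /path/to/sprai-0.9.9.17 tmp, followed by cd tmp.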

3.3 Edit the configuration files

(1) ec.spec is Sprai's configuration file. Edit it to match your data.

#>- params -<#
input_fastq all.fq
estimated_genome_size 50000
estimated_depth 100
partition 12
evalue 1e-50
trim 42
ca_path /path/to/your/wgs/Linux-amd64/bin/
word_size 18

Parameter descriptions:

input_fastq is your input file name.

estimated_genome_size is the number of nucleotides of your target. If you do not know it, set a large number, for example 1e+12.

estimated_depth is the depth of coverage of input_fastq of your target. If you do not know it, set 0.

partition is the number of processors Sprai uses.

evalue is used by blastn.

trim is the number of nucleotides Sprai cuts from both sides of alignments.

ca_path is the path where your wgs-assembler (Celera Assembler) is installed.

word_size is used by blastn.
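
If the genome size is roughly known, estimated_depth can be derived from the input itself as total bases divided by genome size. The sketch below assumes a strict four-line FASTQ layout (no wrapped sequence lines) and truncates the result to an integer; estimate_depth is an illustrative name.

```shell
# Hypothetical helper: estimate_depth FASTQ GENOME_SIZE
# Prints total sequenced bases divided by the genome size, truncated.
estimate_depth() {
    local fastq=$1 genome_size=$2
    awk -v g="$genome_size" '
        NR % 4 == 2 { total += length($0) }   # line 2 of each record is the sequence
        END { if (g > 0) printf "%d\n", total / g }
    ' "$fastq"
}
```

For the example configuration above, a 50000-nucleotide target with 5,000,000 bases of input gives 100, which matches the estimated_depth value shown.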

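(2) pbasm.spec is the configuration file for the Celera Assembler. It is not needed if you only perform error correction. It sets the parameters used during assembly, including the number of CPUs to use.

3.4 Running Sprai

(1) Error correction and assembly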

ezez_vx1.pl ec.spec pbasm.spec > log.txt 2>&1 &

(2) Error correction only

ezez_vx1.pl ec.spec -ec_only > log 2>&1 &

or

ezez_vx1.pl ec.spec > log 2>&1 &

Either command works.

(3) Assembly only

ca_ikki_v5.pl pbasm.spec estimated_genome_size \
-d directory in which fin.idfq.gzs exist \
-ca_path /path/to/your/wgs/Linux-amd64/bin \
-sprai_path the path to get_top_20x_fa.pl installed

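3.5 Output files

(1) Step one, error correction, writes a directory named result_yyyymmdd_hhmmss. The corrected reads are in the file c01.fin.idfq.gz.

(2) Step two, assembly, writes the contig file ./CA/9-terminator/asm.ctg.fasta

(3) The assembly statistics are in the file CA/do_*_c01.fin.top20x.log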
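
After several runs the working directory collects more than one result_yyyymmdd_hhmmss directory. Because the timestamp sorts chronologically as text, the newest run can be picked with a glob. This is a sketch that assumes c01.fin.idfq.gz sits directly inside the result directory; both function names are illustrative.

```shell
# Hypothetical helper: print the newest result_* directory under BASE (default: .).
latest_result_dir() {
    local base=${1:-.} d latest=""
    # Globs expand in sorted order, so the last match has the latest timestamp.
    for d in "$base"/result_*/; do
        [ -d "$d" ] || continue
        latest=${d%/}
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Hypothetical helper: print the corrected-reads file of the newest run, if present.
corrected_reads() {
    local dir
    dir=$(latest_result_dir "${1:-.}") || return 1
    [ -f "$dir/c01.fin.idfq.gz" ] && printf '%s\n' "$dir/c01.fin.idfq.gz"
}
```

Both functions return a non-zero status when nothing is found, so they can be used in a condition such as if reads=$(corrected_reads); then ... fi.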

4. Problems encountered while installing the software

4.1 The /usr/bin/time command cannot be found

Solution:

a. Edit the software's code, replacing /usr/bin/time with time
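
The replacement can be scripted instead of edited by hand. The sketch below searches a directory for files that mention /usr/bin/time and rewrites them in place, keeping a .bak copy of each; patch_time_path is an illustrative name, and which Sprai files are affected is not stated here, so the directory to scan is left to the caller.

```shell
# Hypothetical helper: patch_time_path DIR
# Replaces every occurrence of /usr/bin/time with time in files under DIR,
# leaving the original of each changed file next to it as <file>.bak.
patch_time_path() {
    local dir=$1 f
    grep -rl --exclude='*.bak' '/usr/bin/time' "$dir" | while IFS= read -r f; do
        sed -i.bak 's#/usr/bin/time#time#g' "$f"
    done
}
```

Running grep -rn '/usr/bin/time' on the same directory beforehand shows which lines would change.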

4.2 While running, the software reports "set Illegal option -o pipefail"

Solution:

Check what sh points to. If it is not /bin/bash, apply the change in step (2).

(1) $ ls -al /bin/sh

(2) Re-point the /bin/sh symlink directly at /bin/bash:

$ sudo ln -fs /bin/bash /bin/sh
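
Whether a given shell accepts the option can also be tested directly, which confirms the diagnosis before the symlink is touched and verifies the fix afterwards. This is a minimal sketch; supports_pipefail is an illustrative name.

```shell
# Hypothetical helper: supports_pipefail SHELL
# Succeeds if the given shell accepts "set -o pipefail".
supports_pipefail() {
    "$1" -c 'set -o pipefail' >/dev/null 2>&1
}
```

For example, supports_pipefail /bin/sh && echo ok || echo "no pipefail" reports the current state. On systems where /bin/sh is dash (the default on Debian and Ubuntu), the check would be expected to fail until the link points to bash.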
