Run an assembly based on the OLC (overlap-layout-consensus) algorithm (program Newbler).
First, count the number of sequences in each library for the overview.
The first assembly should use only the unpaired and short paired-end libraries; the second assembly should also include the 3k mate-pairs.
Why? This will show you how small a fraction of the data the mate-pair libraries make up, and yet how much they improve the assembly.
Perform the assembly (without the 3k mate-pair .sff files) using the de Bruijn graph algorithm (program SOAPdenovo).
Why? This will help you understand the differences in results between the two types of algorithms.
Select a very long scaffold from the best assembly and map/align the contigs (scaffolds) from the other two assemblies to it using BLAST or FASTA. (BLAST is installed on the VM; check the blastn -outfmt switch.) Present the result as a clear image/slide/table.
Why? This visualizes the differences between the approaches and shows the significance of the datasets.
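The BLAST step above could be sketched roughly as follows. This is only one possible approach, not a prescribed solution: the function names and all file names are hypothetical placeholders, and the e-value cutoff is an arbitrary assumption. `-outfmt 6` asks blastn for the 12-column tab-separated table (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), which is easy to turn into a table or plot.

```shell
# Align contigs from another assembly (query) to the chosen scaffold (database).
# Usage: blast_contigs_to_scaffold contigs.fna scaffold.fna hits.tsv
# All three file names are placeholders, not files from the exercise.
blast_contigs_to_scaffold() {
    contigs=$1; scaffold=$2; out=$3
    makeblastdb -in "$scaffold" -dbtype nucl -out "$scaffold.db"
    # 1e-20 is an assumed cutoff; adjust it to your data.
    blastn -query "$contigs" -db "$scaffold.db" -outfmt 6 -evalue 1e-20 -out "$out"
}

# Keep the highest-bitscore hit per contig and list contigs in scaffold order.
# Output columns: contig, percent identity, alignment length,
#                 scaffold start, scaffold end, strand
best_hits_by_position() {
    sort -k1,1 -k12,12gr "$1" |
    awk -F'\t' 'BEGIN { OFS = "\t" }
        !seen[$1]++ {
            # sstart > send means the contig aligns to the reverse strand
            strand = ($9 <= $10) ? "+" : "-"
            lo = ($9 <= $10) ? $9 : $10
            hi = ($9 <= $10) ? $10 : $9
            print $1, $3, $4, lo, hi, strand
        }' |
    sort -k4,4n
}
```

Running `best_hits_by_position hits.tsv` once per assembly gives one table per approach; placing them side by side shows how many pieces each assembler needed to cover the same scaffold. Keeping only the best hit per contig is a simplification: a contig that is split or rearranged relative to the scaffold will have further informative hits.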
The data contains three libraries: one paired-end library from Illumina and two mate-pair libraries from 454. The Illumina dataset is preprocessed: during quality control some reads were removed, so reads whose mate was discarded now appear as single-end. To reduce computational complexity, we merged paired-end reads whose two ends were close enough to overlap (extendedFrags), turning them into single-end reads as well. The remaining reads stayed paired (notCombined).
files: (GAA2024/E8_data)
trim_database.fna
single end (the mate is missing), MiSeq:
Bcc7419-MiSeq-A895A-PE_1_U.fastq
Bcc7419-MiSeq-A895A-PE_2_U.fastq
single end (joined paired-end), MiSeq:
Bcc7419-MiSeq-A895A-PE_12_JOIN_P.extendedFrags.fastq
paired-end (paired-end without overlap), 600bp, MiSeq:
Bcc7419-MiSeq-A895A-PE_12_JOIN_P.notCombined_1.fastq
Bcc7419-MiSeq-A895A-PE_12_JOIN_P.notCombined_2.fastq
mate pairs, 3kbp, 454:
Bcc7419-454-HB0RHHA02-PE_3k-UNIQ.sff
Bcc7419-454-HAV0LKU05-PE_3k-UNIQ.sff
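For the overview of library sizes, the FASTQ files listed above can be counted with a small helper. This is a minimal sketch: the function name is hypothetical, and it assumes the standard four-lines-per-record FASTQ layout (no wrapped sequence or quality lines).

```shell
# Count reads in a FASTQ file: every record occupies exactly four lines.
count_fastq_reads() {
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Example usage over the exercise data:
# DATA=/mnt/shared/GAA2024/E8_data
# for f in "$DATA"/*.fastq; do
#     printf '%s\t%s\n' "$(basename "$f")" "$(count_fastq_reads "$f")"
# done
#
# .sff files are binary, so line counting does not work for them.
# Assumption: if sffinfo is installed next to the other 454 tools, listing
# the accessions gives one line per read:
# /mnt/shared/454/bin/sffinfo -a "$DATA"/Bcc7419-454-HB0RHHA02-PE_3k-UNIQ.sff | wc -l
```

Note that a 454 mate-pair read is stored as a single record in the .sff file and is split into its two halves only by the assembler, so the counts are not directly comparable with the paired FASTQ files, where each pair contributes one record to each of the two files.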
hints for OLC:
/mnt/shared/454/bin/runAssembly --version
/mnt/shared/454/bin/newAssembly $NAME
/mnt/shared/454/bin/addRun -lib SE_Illu -np $NAME /mnt/shared/GAA2024/E8_data/Bcc7419-MiSeq-A895A-PE_12_JOIN_P.extendedFrags.fastq
...
/mnt/shared/454/bin/addRun -lib PE_3k -p $NAME /mnt/shared/GAA2024/E8_data/Bcc7419-454-HAV0LKU05-PE_3k-UNIQ.sff
...
/mnt/shared/454/bin/runProject $NAME
less $NAME/assembly/454AssemblyProject.xml
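Once an assembly has finished, a quick way to compare runs is to compute the number of sequences, total length and N50 from the output FASTA. The helper below is a sketch with a hypothetical name. The Newbler output file names in the usage comment (454AllContigs.fna, 454Scaffolds.fna) are assumptions about what the run produces, so check the contents of $NAME/assembly first.

```shell
# Print sequence count, total length and N50 for a FASTA file.
# N50 = length of the sequence at which the running sum of lengths
# (sorted longest first) reaches half of the total assembly length.
fasta_stats() {
    # First pass: one length per sequence (handles wrapped sequence lines).
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    sort -rn |
    awk '{ lens[NR] = $1; total += $1 }
         END {
             run = 0; n50 = 0
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run * 2 >= total) { n50 = lens[i]; break }
             }
             printf "sequences=%d total=%d N50=%d\n", NR, total, n50
         }'
}

# Example usage (file names are assumptions, see above):
# fasta_stats $NAME/assembly/454AllContigs.fna
# fasta_stats $NAME/assembly/454Scaffolds.fna
```

For scaffolds the lengths include the N-filled gaps, so scaffold N50 and contig N50 should be reported separately.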
hints for de Bruijn: check soap.config in the folder
SOAPdenovo-63mer all -s soap.config -o $NAME
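For orientation, a SOAPdenovo config for this dataset might look roughly like the file written below. This is a hypothetical sketch, not the contents of the provided soap.config: the function name, the output file name, the max_rd_len value and the choice of asm_flags are assumptions, and only the 600 bp insert size and the file paths come from the file list above. Compare it with the real soap.config rather than using it as-is.

```shell
# Write an example SOAPdenovo config to the given path (hypothetical helper).
write_soap_example_config() {
    data=/mnt/shared/GAA2024/E8_data
    cat > "$1" <<EOF
#maximal read length (assumed value)
max_rd_len=500
[LIB]
#paired-end reads without overlap, 600 bp
avg_ins=600
#0 = forward-reverse orientation (short-insert paired-end)
reverse_seq=0
#3 = use reads for both contig and scaffold assembly
asm_flags=3
rank=1
q1=$data/Bcc7419-MiSeq-A895A-PE_12_JOIN_P.notCombined_1.fastq
q2=$data/Bcc7419-MiSeq-A895A-PE_12_JOIN_P.notCombined_2.fastq
[LIB]
#single-end reads: joined pairs and reads without a mate
#1 = use reads for contig assembly only
asm_flags=1
q=$data/Bcc7419-MiSeq-A895A-PE_12_JOIN_P.extendedFrags.fastq
q=$data/Bcc7419-MiSeq-A895A-PE_1_U.fastq
q=$data/Bcc7419-MiSeq-A895A-PE_2_U.fastq
EOF
}

# Example usage:
# write_soap_example_config soap.example.config
# diff soap.example.config soap.config
```

The 454 mate-pair .sff files are deliberately left out, in line with the task description for the de Bruijn assembly.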