Lecture "Introduction to Bioinformatics" - U. Scholz & M. Lange, Slide #3z-1: BLAST in Practice & BLAST Outputs

1 BLAST in Practice & BLAST Outputs

2 A practical example
Query: 1,506 mapped barley genes (ATGCTG TGGCAG CGTGCA GTCCAG TCTCGT ACTGCAT)
Database: 2,869,704 annotated proteins
Program: BlastX
Result: 905 annotations
Runtime: 17.5 h

3 Solution: distributing the analyses

4 IPK cluster BROCKEN
Result: 905 annotations
72 nodes -> runtime: 16 min
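The two runtimes on the slides (17.5 h before, 16 min on 72 nodes) can be turned into a speedup and a parallel efficiency. The sketch below assumes that the 17.5 h run used a single machine comparable to one cluster node, which the slides do not state; the function names are illustrative.

```python
# Figures taken from the slides.
serial_minutes = 17.5 * 60   # 17.5 h for the undistributed run
parallel_minutes = 16        # 16 min on the cluster
nodes = 72


def speedup(t_serial, t_parallel):
    """Ratio of the serial to the parallel wall-clock time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_nodes):
    """Speedup per node; 1.0 would be perfect linear scaling."""
    return speedup(t_serial, t_parallel) / n_nodes


s = speedup(serial_minutes, parallel_minutes)
e = efficiency(serial_minutes, parallel_minutes, nodes)
print(f"speedup: {s:.1f}x, efficiency: {e:.0%}")
```

Under that assumption the distributed run is roughly 66 times faster, which corresponds to a parallel efficiency of about 91% on 72 nodes.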

5 CEF: Cluster Execution Framework

    #!/bin/bash
    projdir=/data/pdw-16/agbi/projects/

    # split the query file into chunks
    python2.3 /data/pdw-20/python_scripts/splitFas2.py -i Clones.fasta -o $projdir -n 500

    blast_db=$projdir/wheat_consensus.txt
    mergescript=$projdir/domerge.sh

    # start the merge script: one "cat" whose arguments are appended in the loop
    echo "#!/bin/sh" > $mergescript
    echo "cat \\" >> $mergescript

    z=0
    for i in split/*
    do
        # per-chunk job script, result file and log prefix ($$ is the PID of this script)
        script_file=$projdir/script/blastjob_$$_$z.sh
        result_file=$projdir/result/blastresult_$$_$z.txt
        log_file=$projdir/log/joblog_$$_$z

        # write the job script: one blastn run for this chunk
        echo "#!/bin/sh" > $script_file
        #echo "cd $projdir" >> $script_file
        echo "/usr/bin/blastall -i $projdir/$i -p blastn -d $blast_db -m0 -e 1E-10 -v 10 -b 10 -o $result_file" >> $script_file

        # register the result file as an argument of the merge script's "cat"
        echo "$result_file \\" >> $mergescript

        # submit the job to the "long" queue and echo the command for the record
        qsub -o $log_file.out -e $log_file.err -q long $script_file
        echo "qsub -o $log_file.out -e $log_file.err -q long $script_file"

        z=$((z + 1))
    done

    # finish the merge script: redirect the concatenation, then clean up
    echo ">final_result.txt" >> $mergescript
    echo "rm log/* script/* " >> $mergescript

Architecture (diagram labels):
- file server /data/pdw-16/
- file server /data/pdw-20/
- Metadata about:
  - Tools (NCBI BLAST, Spidey, …)
  - Tool parameters (-i FASTA-query, …)
  - Files (FASTA, blastable, …)
  - Jobs/sub jobs (progress, finished, …)
  - …
- master/head node pdw-22
- 22 nodes
- CEF GUI
- CEF SOAP Web Services
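The first step of the script hands the query file to splitFas2.py, whose source is not shown on the slide. As a rough sketch of what such a splitter might do, the function below cuts FASTA text into chunks of a fixed number of records. It assumes that `-n 500` means "500 sequences per chunk", which is a guess; `split_fasta` and the demo records are hypothetical and not taken from the lecture.

```python
def split_fasta(text, per_chunk):
    """Split FASTA text into chunks of at most `per_chunk` records each."""
    records, current = [], []
    for line in text.splitlines():
        # A header line starts a new record; flush the previous one first.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    # Group consecutive records into chunks, each ending with a newline.
    return ["\n".join(records[i:i + per_chunk]) + "\n"
            for i in range(0, len(records), per_chunk)]


# Made-up three-record input, split two records per chunk.
demo = ">a\nATGC\nGGTA\n>b\nTTGA\n>c\nCCAT\n"
chunks = split_fasta(demo, 2)
for number, chunk in enumerate(chunks):
    print(f"chunk {number}: {chunk.count('>')} record(s)")
```

Each chunk would then be written to its own file and submitted as one job, and concatenating the per-chunk results in order gives the merged output, which is what the generated domerge.sh does with `cat`.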

6 CEF: APEX GUI

7 Input: EST sequence

>HY01A03T
GAATTCGGCACCAGAGTGAGCACGCAAGCCAGTGTTTGTAGCCAGCAGCCACAATGGCCGGGAACATGCT
AGCCAACTATGTCCAAGTCTACGTCATGCTCCCGCTGGATGTCGTGAGCGTCGACAACAAGTTCGAGAAG
GGCGACGAGATCAGGGCGCAGCTGAAGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATAGACGTCT
GGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTACAAGCAGGTCTTCGACCT
GGTGCACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTCCACCAGTGCGGTGGCAACGTCGGCGAC
GTAGTCAACATCCCCATCCCACAGTGGGTGCGGGATGTCGGCGCTACCGACCCCGACATTTTCTACACGA
ACCGCAGAGGGACGAGGAACATCGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAG
AACTGCCGTCCAGATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCC
GGTACCATCGTGGACATTGAAGTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTATCCTCAGA
GCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTACCTGGAAGCAGACTTCAA

8 BlastN result

>HY01A03T
        Length = 700

  Plus Strand HSPs:

 Score = 2595 (395.4 bits), Expect = 3.0e-112, P = 3.0e-112
 Identities = 573/618 (92%), Positives = 573/618 (92%), Strand = Plus / Plus

Query:  77 CTATGTCCAAGTCTACGTCATGCTCCCGCTGGATGTCGTGAGC--GT-CGACAACAAGTT 133
           ||| ||| | | || |  |  | |  || ||   ||||  | |  || ||| ||
Sbjct:  89 CTACGTC-ATG-CTCCCGCTGGATGTCG-TGAGCGTCGACAACAAGTTCGAGAAGGGCGA 145

Query: 134 CGAGA--AGGGCGACGAGATCAGGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATA 191
           |||||  |||||| | || | | |||||||||||||||||||||||||||||||||||||
Sbjct: 146 CGAGATCAGGGCG-C-AGCTGAAGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATA 203

Query: 192 GACGTCTGGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTAC 251
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 204 GACGTCTGGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTAC 263

Query: 252 AAGCAGGTCTTCGACCTGGTACACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTC 311
           |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
Sbjct: 264 AAGCAGGTCTTCGACCTGGTGCACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTC 323

Query: 312 CACCCCGTGCGGTGGCAACGTCGGCGACGTAGTCAACATCCCCATCCCACAGTGGGTGCG 371
           ||||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 324 CACCA-GTGCGGTGGCAACGTCGGCGACGTAGTCAACATCCCCATCCCACAGTGGGTGCG 382

Query: 372 GGATGTCGGCGCTACCGACCCCGACATTTTCCACACGAACCTCAGAGGGACGAGGAACAT 431
           ||||||||||||||||||||||||||||||| ||||||||| ||||||||||||||||||
Sbjct: 383 GGATGTCGGCGCTACCGACCCCGACATTTTCTACACGAACCGCAGAGGGACGAGGAACAT 442

Query: 432 CGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAGAACTGCCGTCCA 491
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 443 CGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAGAACTGCCGTCCA 502

Query: 492 GATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCCGG 551
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 503 GATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCCGG 562

Query: 552 TACCATCGTGGACA---A-GTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTA 607
           ||||||||||||||   | |||||||||||||||||||||||||||||||||||||||||
Sbjct: 563 TACCATCGTGGACATTGAAGTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTA 622

Query: 608 TCCTCAGAGCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTA 667
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 623 TCCTCAGAGCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTA 682

Query: 668 CCTGGAAGCAGACTTCAA 685
           ||||||||||||||||||
Sbjct: 683 CCTGGAAGCAGACTTCAA 700
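The "Identities" line of the BlastN result counts alignment columns in which query and subject carry the same base. A minimal sketch of that count for one row of the alignment follows; `match_line` is an illustrative helper, not part of BLAST, and the two strings are the Query 552..607 / Sbjct 563..622 row from the slide.

```python
def match_line(query, sbjct):
    """Return '|' for identical aligned columns and ' ' for mismatches or gaps."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return "".join("|" if q == s and q != "-" else " "
                   for q, s in zip(query, sbjct))


# One 60-column row of the alignment; the query side contains four gap columns.
query = "TACCATCGTGGACA---A-GTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTA"
sbjct = "TACCATCGTGGACATTGAAGTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTA"

mid = match_line(query, sbjct)
identities = mid.count("|")
print(mid)
print(f"{identities}/{len(mid)} identical columns in this row")

# Whole-HSP figure from the result header, as an integer percentage.
pct = 100 * 573 // 618
print(f"573/618 = {pct}%")
```

Summed over all rows, the identical columns give the 573 of 618 reported in the header. The ratio is about 92.7%, so the displayed 92% is consistent with the percentage being truncated rather than rounded.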

9 BlastX result

>dbj|BAC83773.1| putative beta-amylase [Oryza sativa (japonica cultivar-group)]
 gb|EAZ40178.1| hypothetical protein OsJ_023661 [Oryza sativa (japonica cultivar-group)]
Length=488

 Score = 403 bits (1036), Expect = 4e-111
 Identities = 191/215 (88%), Positives = 200/215 (93%), Gaps = 0/215 (0%)
 Frame = +3

Query  54   MAGNMLANYVQVYVMLPLDVVSVDNKFEKGDEIRAQLKKLTEAGVDGVMIDVWWGLVEGK  233
            MAGN+LANYVQV VMLPLDVV+VDNKFEK DE RAQLKKLTEAGVDGVM+DVWWGLVEGK
Sbjct  1    MAGNLLANYVQVNVMLPLDVVTVDNKFEKVDETRAQLKKLTEAGVDGVMVDVWWGLVEGK  60

Query  234  GPKAYDWSAYKQVFDLVHEARLKLQAIMSFHQCGGNVGDVVNIPIPQWVRDVGATDPDIF  413
            GP +YDW AYKQ+F LV EA LKLQAIMSFHQCGGNVGD+VNIPIPQWVRDVGA+DPDIF
Sbjct  61   GPGSYDWEAYKQLFRLVQEAGLKLQAIMSFHQCGGNVGDIVNIPIPQWVRDVGASDPDIF  120

Query  414  YTNRRGTRNIEYLTLGVDDQPLFHGRTAVQMYHDYMASFRENMKKFLDAGTIVDIEVGLG  593
            YTNR G RNIEYLTLGVDDQPLFHGRTA+QMY DYM SFRENM +FLD G IVDIEVGLG
Sbjct  121  YTNRGGARNIEYLTLGVDDQPLFHGRTAIQMYADYMKSFRENMAEFLDTGVIVDIEVGLG  180

Query  594  PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF  698
            PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF
Sbjct  181  PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF  215
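In the BlastX result the query coordinates are nucleotide positions, while the aligned residues are amino acids. The sketch below checks that the numbers on the slide fit together: the frame follows from the start position, the span 54..698 covers 215 codons, and translating the start of the EST in that frame should reproduce the first residues of the query line. It uses the standard genetic code; the helper names are illustrative and not part of BLAST.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def forward_frame(start):
    """Forward reading frame (1, 2 or 3) for a 1-based nucleotide start."""
    return (start - 1) % 3 + 1


def translate(nt):
    """Translate complete codons of an uppercase DNA string."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


# First 80 nucleotides of EST HY01A03T as shown on the input slide.
est = ("GAATTCGGCACCAGAGTGAGCACGCAAGCCAGTGTTTGTAGCCAGCAGCCACAATGGCCGGGAACATGCT"
       "AGCCAACTAT")

q_start, q_end = 54, 698                        # query coordinates from the slide
frame = forward_frame(q_start)
aligned_residues = (q_end - q_start + 1) // 3
peptide = translate(est[q_start - 1:80])        # nucleotides 54..80, nine codons

print(f"frame +{frame}, {aligned_residues} residues, starts with {peptide}")
```

The nine translated residues can be compared directly with the start of the first Query line (MAGNMLANY...), and the residue count with the alignment length of 215 in the header.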

