Submitting Sequenced BACs. Robert Buels, SGN.


1 Submitting Sequenced BACs Robert Buels SGN

2 Purpose and Scope
Provide a well-organized archive of sequencing data for the tomato genome sequencing project, including:
- finished BAC sequences
- chromatograms
- assembly files
- annotations

3 BAC Submission Guidelines A comprehensive BAC submission format exists (see handout, pp. 4-5). If we all follow this format, the data will be much easier to work with.

4 Problems
- Not all submissions follow the submission guidelines.
- There are irregularities in naming, directory structure, and submission file contents.
- The FTP site does not separate finished and unfinished sequences.
Result: the data is harder to work with than it should be!

5 Problems
To address these problems, we:
- improved the directory structure of the FTP site;
- corrected existing submissions to follow the submission guidelines;
- built a new submission system that checks submissions against the guidelines.

6-9 New FTP Structure
tomato_genome/bacs/
    chr01/
    chr02/
    ...
    chr12/
    bacs.seq                  (all seqs)
    finished_bacs.seq         (finished seqs)
    README
    validate_submission.pl    (validation script)

10-14 New FTP Structure
tomato_genome/bacs/chr01/
    unfinished/
    finished/                        (separate finished and unfinished)
        C01HBa0088L02.all.xml        (annotations)
        C01HBa0088L02.seq            (sequence)
        C01HBa0088L02.tar.gz         (submission)
        C01HBa0216G16.all.xml
        C01HBa0216G16.seq
        C01HBa0216G16.tar.gz
        ...
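
The layout above can be turned into a small path helper. This is only a sketch: the function name `ftp_paths` is hypothetical, and it assumes (from the example names on the slides alone) that a BAC name starts with "C" plus a two-digit chromosome number, and that the three files for a BAC sit together inside `finished/` or `unfinished/`.

```python
import re

# Assumed pattern, inferred only from the example names on the slides
# (C01HBa0088L02, C01HBa0216G16): "C", a two-digit chromosome number,
# then the rest of the clone name.
BAC_NAME = re.compile(r"^C(\d{2})\w+$")


def ftp_paths(bac_name, finished):
    """Return the expected FTP paths for one BAC (hypothetical helper)."""
    match = BAC_NAME.match(bac_name)
    if match is None:
        raise ValueError(f"unrecognized BAC name: {bac_name!r}")
    status = "finished" if finished else "unfinished"
    base = f"tomato_genome/bacs/chr{match.group(1)}/{status}"
    # Annotations, sequence, and the original submission tarball.
    return [f"{base}/{bac_name}{ext}" for ext in (".all.xml", ".seq", ".tar.gz")]


if __name__ == "__main__":
    for path in ftp_paths("C01HBa0088L02", finished=True):
        print(path)
```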

15 New Submission System
Developed to automatically check submission formatting, analyze submissions, and publish them on the FTP site. Because it is mostly automatic, it also improves turnaround time.

16-20 New Submission System
The system checks the submission guidelines:
    C01HBa0001A01.tar.gz              (submission is a tar.gz file, named for the BAC)
        C01HBa0001A01/                (self-named subdirectory)
            chromat_dir/
            edit_dir/
            phd_dir/
            seq_dir/
            C01HBa0001A01.seq         (self-named sequence file)
                >C01HBa0001A01        (self-named sequence)
                ATGCCTACGAT...
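
The four naming checks called out above can be illustrated with a short Python sketch. It is not the implementation of validate_submission.pl, which is not shown on the slides; the function name `check_submission` is hypothetical, it covers only the four "self-named" rules, and it assumes a description may follow the identifier on the sequence header line.

```python
import os
import tarfile


def _norm(name):
    """Strip a leading './' and trailing '/' from a tar member name."""
    name = name.rstrip("/")
    while name.startswith("./"):
        name = name[2:]
    return name


def check_submission(tarball_path):
    """Return a list of layout problems; an empty list means none were found."""
    filename = os.path.basename(tarball_path)
    if not filename.endswith(".tar.gz"):
        return [f"{filename}: not a .tar.gz file"]
    bac = filename[: -len(".tar.gz")]  # the tarball is named for the BAC
    problems = []

    with tarfile.open(tarball_path, "r:gz") as tar:
        members = {}
        for member in tar.getmembers():
            name = _norm(member.name)
            if name not in ("", "."):
                members[name] = member

        # Everything should live under one directory named for the BAC.
        top_levels = {name.split("/")[0] for name in members}
        if top_levels != {bac}:
            problems.append(
                f"expected one top-level directory {bac}/, found {sorted(top_levels)}"
            )

        # The sequence file is named for the BAC as well.
        seq_name = f"{bac}/{bac}.seq"
        seq_member = members.get(seq_name)
        if seq_member is None or not seq_member.isfile():
            problems.append(f"missing sequence file {seq_name}")
        else:
            first_line = tar.extractfile(seq_member).readline()
            tokens = first_line.decode("ascii", "replace").split()
            # Assumption: text after the identifier on the header line is allowed.
            if tokens[:1] != [">" + bac]:
                problems.append(f"sequence in {seq_name} is not named >{bac}")
    return problems


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        issues = check_submission(path)
        print(f"{path}: {'passed' if not issues else '; '.join(issues)}")
```

The sketch does not check for `chromat_dir/`, `edit_dir/`, `phd_dir/`, or `seq_dir/`, because the slides do not say which of them are required; the handout (pp. 4-5) is the reference for that.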

21 validate_submission.pl
Checks the formatting of your submission on your own machine, so you do not have to retry uploads to correct formatting.
    ~$ ./validate_submission.pl C01HBa0001A01.tar.gz
    C01HBa0088L02.tar.gz passed.
    ~$

22 Submission Process
1. Use validate_submission.pl to verify that your submission is properly formatted.
2. Upload it with scp to upload.sgn.cornell.edu, using your login and password.
3. When the upload has finished, the submission cron job will notice the new file and process it.

23 Submission Process
4. If the submission is properly formatted, you will see it on the FTP site within 3-4 hours.
5. Otherwise, you will receive an email asking you to correct the format and retry the submission.
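
Steps 1 and 2 of the submission process could be scripted roughly as follows. This is a sketch under stated assumptions, not an SGN-provided tool: the function names are hypothetical, the remote directory is not given on the slides (so the path after the colon is left empty), and the validator is assumed to exit with a non-zero status when a submission fails.

```python
import subprocess

UPLOAD_HOST = "upload.sgn.cornell.edu"  # host name as given on the slides


def submission_commands(tarball, user, validator="./validate_submission.pl"):
    """Build the validate and upload commands for one tarball (hypothetical helper)."""
    validate = [validator, tarball]
    # Assumption: the upload goes to the login's default directory, so the
    # remote path after the colon is left empty.
    upload = ["scp", tarball, f"{user}@{UPLOAD_HOST}:"]
    return validate, upload


def submit(tarball, user):
    """Validate locally, then upload only if validation succeeded."""
    validate, upload = submission_commands(tarball, user)
    # Assumption: the validator exits non-zero on a formatting failure, in
    # which case check=True raises CalledProcessError before any upload.
    subprocess.run(validate, check=True)
    subprocess.run(upload, check=True)
```

For example, `submit("C01HBa0001A01.tar.gz", "your_login")` would run the validator first and call scp only when it succeeds; scp itself prompts for the password.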

24 Questions? Suggestions for improvement of FTP site? Suggestions for submission process?

