VectorBase PopBio Introduction NIH/NIAID VectorBase site visit March 2015.

1 VectorBase PopBio Introduction NIH/NIAID VectorBase site visit March 2015

2 What is PopBio?
Flexible database for sample and assay metadata for field- or lab-derived population biology data.
● collection event & location (GeoData)
● basic sample information
● assays
  o species identification
  o phenotypes (host species [e.g. from blood meal], insecticide resistance, ...)
  o genotypes
  o manipulations (sampleA + sampleB -> sampleC)

3 What is it for?
Allows integration of individual studies (e.g. insecticide resistance studies conducted in individual countries).
Enables meta-analysis of community data.

4 Data sources
Legacy:
● IRbase
● UC Davis/UCLA (but updates planned)
Recent:
● Bulk imports (e.g. Malaria Atlas Project surveillance data)
● Publications (typically with extra data direct from authors)
● MalariaGen & 16 Anopheles
● Other unpublished/in progress

5 Future data sources
● ICEMRs
● National/international IR surveillance
● MalariaGen
● Partners (Vestergaard, Oxford University MAP)
● Smaller published and unpublished datasets

6 Data model
● GMOD Chado schema
● Heavy reliance on CVs/ontologies → flexibility → computability
Vastly oversimplified explanation of schema: projects have samples, which have assays, which have results.
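The "projects have samples have assays have results" hierarchy can be pictured with a minimal Python sketch. This is not the Chado schema or the PopBio API: the class names, field names, the `count_results` helper and the `VBP-EXAMPLE` project ID are all hypothetical and chosen only for illustration. The sample, assay and term identifiers are taken from the JSON record shown on slide 12.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Result:
    # A result is expressed as an ontology term (name + accession).
    term_name: str
    term_accession: str


@dataclass
class Assay:
    assay_id: str
    assay_type: str
    results: List[Result] = field(default_factory=list)


@dataclass
class Sample:
    sample_id: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Project:
    project_id: str
    samples: List[Sample] = field(default_factory=list)


def count_results(project: Project) -> int:
    """Walk project -> samples -> assays and count every result."""
    return sum(len(a.results) for s in project.samples for a in s.assays)


def build_example() -> Project:
    """Build a one-sample project; the project ID is a made-up placeholder."""
    result = Result("Anopheles arabiensis", "VBsp:0002224")
    assay = Assay("VBA0046035", "species identification assay", [result])
    sample = Sample("VBS0015615", [assay])
    return Project("VBP-EXAMPLE", [sample])
```

Calling `count_results(build_example())` walks the whole hierarchy from the project down to the single species result.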

7 Ontologies
VectorBase ontologies: insecticide resistance, malaria, dengue & anatomy
Third party ontologies: sample properties, genomic variation types, placenames, phenotypic qualities

8 Curation and data import
ISA-Tab spreadsheet format
● Investigation - Study - Assay
● Widely used for 'omics metadata
● Ontology-based annotation is well supported
● Ontology term suggestion tools available in Google Spreadsheets
Challenges
● consistent representation of data and choice of ontology terms by curator(s) through time
● too complex for casual submitters
ISA-Tab's Study and its associated list of samples map to PopBio's project and samples, while Assay maps to… assay!
A high-level "object relational mapper" Perl API handles storage into and retrieval from the Chado database for consistency and maintainability.
Example: a sample may have several species identification assays. Our API provides a method on the sample object which returns the best single species term to summarise those results.
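The idea of reducing several species identification results to one summary term can be sketched as follows. The real API is written in Perl and its selection rule is not described on the slide, so the rule used here (take the most frequent term, fall back when there is a tie or no data) is purely an assumption, as are the function name `best_species_term` and the `fallback` parameter.

```python
from collections import Counter
from typing import List, Optional, Tuple

# A species term as (name, accession), e.g. ("Anopheles arabiensis", "VBsp:0002224").
Term = Tuple[str, str]


def best_species_term(
    assay_results: List[Term], fallback: Optional[Term] = None
) -> Optional[Term]:
    """Return the single most frequent species term across a sample's assays.

    Assumed rule for illustration only: if there are no results, or the top
    two terms are tied, return `fallback` instead of guessing.
    """
    if not assay_results:
        return fallback
    ranked = Counter(assay_results).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return fallback  # ambiguous: no single best term
    return ranked[0][0]
```

A sample with two assays that both report `("Anopheles arabiensis", "VBsp:0002224")` yields that term, while two assays that disagree yield the fallback under this assumed rule.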

9 Updating existing data
1. Edit ISA-Tab, delete project and reload project from new ISA-Tab (stable IDs for project, samples and assays are retained)
2. Edit ISA-Tab but apply simple SQL updates or an API script to modify the database (as delete+reload can be slow)
No database → ISA-Tab route at present.

10 Scalability (storage + maintenance)
Current size: 121 projects, 57,637 samples, 172,636 assays (of which 4,387 are IR)
API overhead ⇒ some tasks take overnight
● loading for 1000+ sample datasets
● search index generation
No issues yet with maintenance (e.g. backup and transfer of databases)

11 Scalability (web-based retrieval)
"Dumb" API-based retrieval for "smart" web client (see next slide) is too slow on its own.
Currently using pre-filled RAM-based cache to speed up API requests for web users.
Not necessarily scalable. Still not very fast! See future plans...
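The pre-filled RAM cache idea can be illustrated with a small Python sketch. It is not the VectorBase implementation: the class name `PrefilledCache`, its constructor arguments and the `fetch` callable (a stand-in for the slow API-based retrieval) are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable


class PrefilledCache:
    """In-memory cache that is filled up front, before any web request arrives."""

    def __init__(self, fetch: Callable[[str], dict], ids: Iterable[str]) -> None:
        self._fetch = fetch
        # Pay the slow retrieval cost once, at start-up, for every known ID.
        self._store: Dict[str, dict] = {i: fetch(i) for i in ids}

    def get(self, record_id: str) -> dict:
        # Serve from RAM; fall back to the slow path only for unknown IDs.
        if record_id not in self._store:
            self._store[record_id] = self._fetch(record_id)
        return self._store[record_id]
```

The trade-off matches the caveat on the slide: memory use and start-up time grow with the number of records, so this approach does not necessarily scale.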

12 {"sample_manipulations":[], "name":"G05-2019", "species_identification_assays":[{"result_summary":" Anopheles arabiensis (PCR-based species identification)", "name":"G05-2019.species", "description":null, "props":[{"cvterms":[{"name":"species assay result", "accession":"VBcv:0000961"}, {"name":"Anopheles arabiensis", "accession":"VBsp:0002224"}]}], "protocols":[{"props":[], "name":"VBA0046035:PROTO2", "type":{"name":"PCR-based species identification", "accession":"MIRO:30000040"}, "description":"Mosquito DNA was extracted from the carcass and identified to species and molecular form using rDNA-based PCR assays.", "uri":""}], "performers":[], "id":"VBA0046035", "type":"species identification assay"}], "species":{"name":"Anopheles arabiensis", "accession":"VBsp:0002224"}, "description":null, "genotype_assays":[{"result_summary":"inversion: 2La/a; inversion: 2Rjb/b (cytological chromosome examination)", "genome_browser_path":null, "name":"G05-2019.karyotyping", "description":null, "genotypes":[{"uniquename":"VBA0046036:2La/a", "props":[{"value":"2La/a", "cvterms":[{"name":"inversion", "accession":"SO:1000036"}]}, {"value":"2L", "cvterms":[{"name":"chromosome_arm", "accession":"SO:0000105"}]}], "name":"2La/a", "type":{"name":"paracentric_inversion", "accession":"SO:1000047"}, "description":"inversion: 2La/a"}, {"uniquename":"VBA0046036:2Rjb/b", "props":[{"value":"2Rjb/b", "cvterms":[{"name":"inversion", "accession":"SO:1000036"}]}, {"value":"2R", "cvterms":[{"name":"chromosome_arm", "accession":"SO:0000105"}]}], "name":"2Rjb/b", "type":{"name":"paracentric_inversion", "accession":"SO:1000047"}, "description":"inversion: 2Rjb/b"}], "vcf_file":null, "props":[], "protocols":[{"props":[{"value":"microscope manufacturer: Olympus", "cvterms":[{"name":"protocol component", "accession":"VBcv:autocreated:protocol component"}]}, {"cvterms":[{"name":"protocol component", "accession":"VBcv:autocreated:protocol component"}, {"name":"Giemsa staining", 
"accession":"IDOMAL:0000552"}]}], "name":"VBA0046036:PROTO3", "type":{"name":"cytological chromosome examination", "accession":"MIRO:30000037"}, "description":"Ovaries were prepared for karyotype analysis according to standard procedures. The banding pattern was observed under a phase-contrast microscope (400×) and interpreted with reference to the chromosomal map and nomenclature of Coluzzi and colleagues. ", "uri":""}], "performers":[], "type":"genotype assay", "id":"VBA0046036"}], "props":[{"cvterms":[{"name":"sex", "accession":"EFO:0000695"}, {"name":"female", "accession":"PATO:0000383"}]}, {"cvterms":[{"name":"developmental stage", "accession":"EFO:0000399"}, {"name":"adult", "accession":"IDOMAL:0000655"}]}], "field_collections":[{"result_summary":"Burkina Faso (pyrethrum spray catch)", "name":"G05-2019.collect", "description":null, "geolocation":{"longitude":"-0.05727", "props":[{"cvterms":[{"name":"collection site", "accession":"VBcv:0000831"}, {"name":"Burkina Faso", "accession":"GAZ:00000905"}]}, {"value":"Bonsse", "cvterms":[{"name":"location", "accession":"VBcv:0000698"}]}, {"value":"Burkina Faso", "cvterms":[{"name":"country", "accession":"VBcv:0000701"}]}], "latitude":"12.1693", "geodetic_datum":"WGS 84", "name":"Burkina Faso", "altitude":null}, "props":[{"value":"2005-08-02", "cvterms":[{"name":"date", "accession":"VBcv:0000705"}]}], "protocols":[{"props":[], "name":"VBA0046034:PROTO1", "type":{"name":"pyrethrum spray catch", "accession":"MIRO:30000023"}, "description":"Freshly-fed female An. gambiae s.l. were collected in the morning while resting inside human dwellings by manual aspiration with the aid of electrical aspirators. Mosquitoes were kept in small cages wrapped in wet towels and stored inside cool boxes. 
Additionally, indoor insecticide space-sprays were carried out in the early afternoon.", "uri":"\n"}], "performers":[], "type":"field collection", "id":"VBA0046034"}], "species_qualifications":[{"name":"unambiguous", "accession":"VBcv:autocreated:unambiguous"}], "type":{"name":"individual", "accession":"EFO:0000542"}, "id":"VBS0015615", "phenotype_assays":[]}
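A web client consuming a record like the one above might pull out a few headline fields as sketched below. The embedded `RECORD` is a heavily trimmed copy of the slide's JSON (same keys and values, most fields dropped), and the helper names `prop_value` and `summarise_sample` are hypothetical, not part of any PopBio client.

```python
import json

# Trimmed copy of the sample record shown on the slide (most fields omitted).
RECORD = json.loads("""
{"name": "G05-2019", "id": "VBS0015615",
 "species": {"name": "Anopheles arabiensis", "accession": "VBsp:0002224"},
 "field_collections": [{
   "id": "VBA0046034",
   "geolocation": {"latitude": "12.1693", "longitude": "-0.05727",
                   "name": "Burkina Faso"},
   "props": [{"value": "2005-08-02",
              "cvterms": [{"name": "date", "accession": "VBcv:0000705"}]}]
 }]}
""")


def prop_value(props, term_name):
    """Return the value of the first prop annotated with the given term name."""
    for prop in props:
        if any(t["name"] == term_name for t in prop.get("cvterms", [])):
            return prop.get("value")
    return None


def summarise_sample(record):
    """Flatten a sample record into the fields a map or table view might need."""
    collection = record["field_collections"][0]
    geo = collection["geolocation"]
    return {
        "id": record["id"],
        "species": record["species"]["name"],
        # Coordinates arrive as strings in the record, so convert them.
        "latitude": float(geo["latitude"]),
        "longitude": float(geo["longitude"]),
        "date": prop_value(collection["props"], "date"),
    }
```

Note that values such as the collection date are stored as generic props tagged with ontology terms, so they are looked up by term name, not by a fixed key.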

13 Web interface
PopBio browser: https://www.vectorbase.org/popbio/
A good example project page: https://www.vectorbase.org/popbio/project/?id=VBP0000010
New entry page currently in development: http://funcgen.vectorbase.org/popbio-map-preview/vb_geohashes_mean.html

14 Web interface
Plan to develop or modify something similar to MalariaGen's Panoptes, with richer and more flexible metadata capabilities.

15 Plans
● Map interface: delivery for the June (VB-2015-06) release, and present/demo at the Kolymbari and ICEMR meetings.
● Spreadsheet submission wizard: development scheduled for Fall 2015.
● Year 2: sample x genotype browser development, including e! REST and variation Solr work.
● Year 2: refactor project pages with scalable (but still flexible) data transfer (probably also Solr-driven) and update graphics.

