Ensembl Funcgen Perl API
Nathan Johnson, EBI - Wellcome Trust Genome Campus, UK
Sept 2008


1 Ensembl Funcgen Perl API
Nathan Johnson (njohnson@ebi.ac.uk), EBI - Wellcome Trust Genome Campus, UK

2 What is Ensembl Funcgen (eFG)?
A local data storage and analysis platform, OR an Ensembl functional genomics database providing epigenomic and regulatory annotations, OR both.

3 eFG Dataflow
[Diagram: components shown are Experimental Data, Tab2MAGE, MAGE-ML, the Import API, the FuncGen DB, the Analysis Pipeline, Annotated Features, the Export API, GFF, DAS and the Web API.]

4 eFG data
- Processed:
  - Peak calls, e.g. Mpeak, TileMap, ChIPOTLE, Nessie
  - Combinatorial analysis, e.g. the Regulatory Build
  - Externally curated, e.g. cisRED, miRanda, VISTA
- Experimental: meta data; raw and normalised data
- Technology:
  - Arrays/chips/probes, e.g. tiling arrays
  - Short reads, e.g. Solexa, SOLiD

5 eFG data
Ensembl v50, July '08:
- >60 data sets (ChIP-chip, wiggle, BED, custom)
- 3 species
- 9 cell types
- 24 histone modifications, DHSS, CTCF, RNA Pol II …

Regulatory Build v3:

  Feature type                                   Count
  Gene Associated                                 1584
  Gene Associated - Cell type specific            5614
  Non-Gene Associated                              799
  Non-Gene Associated - Cell type specific         520
  Promoter Associated                            12022
  Promoter Associated - Cell type specific        1619
  Unclassified                                   24814
  Unclassified - Cell type specific             127633
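As a quick sanity check on the Regulatory Build table, the counts can be summed in Perl. This is plain arithmetic over the numbers on the slide, not a database query, and the hash layout is just one convenient choice.

```perl
use strict;
use warnings;

# Regulatory Build v3 feature counts, copied from the slide.
my %counts = (
    'Gene Associated'                          => 1584,
    'Gene Associated - Cell type specific'     => 5614,
    'Non-Gene Associated'                      => 799,
    'Non-Gene Associated - Cell type specific' => 520,
    'Promoter Associated'                      => 12022,
    'Promoter Associated - Cell type specific' => 1619,
    'Unclassified'                             => 24814,
    'Unclassified - Cell type specific'        => 127633,
);

my ($total, $cell_specific) = (0, 0);
for my $type (sort keys %counts) {
    $total += $counts{$type};
    # Types whose label ends in "Cell type specific" are tallied separately.
    $cell_specific += $counts{$type} if $type =~ /Cell type specific$/;
}

printf "Total RegulatoryFeatures: %d\n", $total;
printf "Cell type specific:       %d (%.1f%%)\n", $cell_specific, 100 * $cell_specific / $total;
```

The table adds up to 174,605 features, of which 135,386 (about 77.5%) fall into the cell type specific classes.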

6 eFG Display
[Screenshot: browser tracks for methylation data, CTCF data, Regulatory Features, cisRED, miRanda and VISTA.]

7 How eFG fits in
- ensembl-functgenomics API: object-oriented Perl, following the Object/ObjectAdaptor paradigm.
- Fully integrated with the wider Ensembl family of MySQL databases.
- Multi-assembly: eFG stores a registry of core coordinate information, which allows data to be stored against different core DBs and different genome assemblies.
- Minimal maintenance: designed to support incremental updates to local installations; patch and update rather than drop and recreate.
- Fully automated data import API and analysis pipeline.
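The Object/ObjectAdaptor paradigm separates data objects (e.g. an Array) from the adaptors that fetch them from the database (e.g. the ArrayAdaptor used in the examples that follow). The classes below, ToyArray and ToyArrayAdaptor, are hypothetical, self-contained stand-ins that mimic the shape of the pattern with an in-memory list instead of MySQL; they are not the eFG classes.

```perl
use strict;
use warnings;

# Hypothetical "Object" class: a plain data holder with read-only accessors.
package ToyArray;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{-name}, vendor => $args{-vendor} }, $class;
}
sub name   { return $_[0]{name} }
sub vendor { return $_[0]{vendor} }

# Hypothetical "ObjectAdaptor" class: the only code that knows how objects
# are stored. Here the "database" is an in-memory list of rows.
package ToyArrayAdaptor;

sub new {
    my ($class, @rows) = @_;
    return bless { rows => \@rows }, $class;
}

# Build one ToyArray per stored row; returns an array reference,
# like the Ensembl fetch_all* methods.
sub fetch_all {
    my ($self) = @_;
    return [ map { ToyArray->new(-name => $_->[0], -vendor => $_->[1]) } @{ $self->{rows} } ];
}

# Return the first array matching both name and vendor, or undef.
sub fetch_by_name_vendor {
    my ($self, $name, $vendor) = @_;
    my ($hit) = grep { $_->name eq $name && $_->vendor eq $vendor } @{ $self->fetch_all };
    return $hit;
}

package main;

my $adaptor = ToyArrayAdaptor->new(
    [ '2005-05-10_HG17Tiling_Set', 'NIMBLEGEN' ],
    [ 'ENCODE3.1.1',               'SANGER' ],
);

my $array = $adaptor->fetch_by_name_vendor('ENCODE3.1.1', 'SANGER');
print join("\t", $array->name, $array->vendor), "\n";
```

Because callers only ever talk to the adaptor, the storage behind it can change without touching code that uses the objects.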

8 eFG Schema
[Diagram: schema overview with table groups for Array, Experimental, Features and Sets.]

9 Features & Sets
- Features: Probe > Annotated; External > Regulatory.
- Sets: an abstract concept for manipulating collections of data:
  - logical association/combination
  - access and administration
  - supporting/product relationships
- Set classes:
  - ResultSet: chips/channels > replicates
  - ExperimentalSet: feature-only import
  - FeatureSet: e.g. peak calls > AnnotatedFeatures
  - DataSet: combines supporting Sets and a product FeatureSet
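The supporting/product relationship between Sets can be pictured with ordinary Perl data structures. The sketch below uses plain hashes, not eFG objects, and reuses the set and analysis names from the DataSet example later in this presentation; treating the supporting set as a ResultSet is an assumption based on its EXP_BR_TR style name, and describe_set is a hypothetical helper.

```perl
use strict;
use warnings;

# Plain-hash stand-ins for eFG Set objects (illustrative only).
my %result_set = (
    class    => 'ResultSet',
    name     => 'ctcf_ren_BR1_TR1',
    analysis => 'VSN_GLOG',
);
my %feature_set = (
    class    => 'FeatureSet',
    name     => 'Nessie_NG_STD_2_ctcf_ren_BR1',
    analysis => 'Nessie_NG_STD_2',
);

# A DataSet ties one or more supporting sets to a single product FeatureSet.
my %data_set = (
    name            => 'Nessie_NG_STD_2_ctcf_ren_BR1',
    supporting_sets => [ \%result_set ],
    product         => \%feature_set,
);

# Hypothetical helper: one-line summary of a set hash.
sub describe_set {
    my ($set) = @_;
    return sprintf '%s %s (analysis %s)', @{$set}{qw(class name analysis)};
}

print "DataSet $data_set{name}\n";
print '  supporting: ', describe_set($_), "\n" for @{ $data_set{supporting_sets} };
print '  product:    ', describe_set($data_set{product}), "\n";
```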

10 eFG data flow
[Diagram: three-stage data flow (1, 2, 3). Labels include raw, experimental and external data; ResultSet1 to ResultSet3; SupportingSet1 and SupportingSet2; DataSet1 to DataSet4; product and combined FeatureSets; HitList data; an external DB; the Export API; and GFF.]

11 Technology data
- Array: a definitive collection of chips.
  Methods: name(), format(), vendor(), description(), type().
  Adaptor methods: fetch_by_name_vendor(), fetch_all_by_type().
- ArrayChip: an individual chip in an array collection.
  Methods: name(), design_id().
  Adaptor methods: fetch_all_by_array_design_ids(), fetch_all_by_array_id(), fetch_all_by_ExperimentalChip().
- Probe: a unique probe sequence within a given array or set of arrays.
  Methods: name(), class(), length().
  Adaptor methods: fetch_all_by_Array(), fetch_all_by_ArrayChip(), fetch_by_array_probe_probeset_name().
- ProbeFeature: an alignment of a Probe against the genome.
  Methods: start(), end(), strand(), mismatches(), cigarline(), analysis().
  Adaptor methods: fetch_all_by_Probe(), fetch_all_by_Slice_ExperimentalChips().

12 DBAdaptor example code

    use strict;
    use warnings;
    use Bio::EnsEMBL::DBSQL::DBAdaptor;
    use Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor;

    # Core database supplying DNA sequence and coordinate systems
    my $dna_db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
        -user    => 'anonymous',
        -host    => 'ensembldb.ensembl.org',
        -species => 'Homo_sapiens',
        -dbname  => 'homo_sapiens_core_37_35j',
        -group   => 'core',
    );

    # Funcgen database; -dnadb attaches the core database above
    my $efg_db = Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor->new(
        -user    => 'anonymous',
        -host    => 'ensembldb.ensembl.org',
        -species => 'Homo_sapiens',
        -dbname  => 'homo_sapiens_funcgen_48_36j',
        -group   => 'funcgen',
        -dnadb   => $dna_db,
    );

13 Array example code

    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;

    my $reg = 'Bio::EnsEMBL::Registry';
    $reg->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );

    my $efg_db        = $reg->get_DBAdaptor('Human', 'funcgen');
    my $array_adaptor = $efg_db->get_ArrayAdaptor;
    my @arrays        = @{ $array_adaptor->fetch_all };

    foreach my $array (@arrays) {
        print "\nArray:\t" . $array->name   . "\n";
        print "Type:\t"    . $array->type   . "\n";
        print "Vendor:\t"  . $array->vendor . "\n";
    }

Output:

    Array:  2005-05-10_HG17Tiling_Set
    Type:   OLIGO
    Vendor: NIMBLEGEN

    Array:  ENCODE3.1.1
    Type:   PCR
    Vendor: SANGER

14 ArrayChip example code

    my $array = $array_adaptor->fetch_by_name_vendor('2005-05-10_HG17Tiling_Set', 'NIMBLEGEN');
    my @achips = @{ $array->get_ArrayChips };

    foreach my $ac (@achips) {
        print "ArrayChip:" . $ac->name . "\tDesignID:" . $ac->design_id . "\n";
    }

Output:

    ArrayChip:2005-05-10_HG17Tiling_Set31  DesignID:2061
    ArrayChip:2005-05-10_HG17Tiling_Set24  DesignID:2054
    ArrayChip:2005-05-10_HG17Tiling_Set12  DesignID:2042
    ArrayChip:2005-05-10_HG17Tiling_Set03  DesignID:2033
    ArrayChip:2005-05-10_HG17Tiling_Set04  DesignID:2034
    ArrayChip:2005-05-10_HG17Tiling_Set29  DesignID:2059
    ArrayChip:2005-05-10_HG17Tiling_Set13  DesignID:2043
    ArrayChip:2005-05-10_HG17Tiling_Set34  DesignID:2064
    ArrayChip:2005-05-10_HG17Tiling_Set07  DesignID:2037
    ArrayChip:2005-05-10_HG17Tiling_Set17  DesignID:2047
    ArrayChip:2005-05-10_HG17Tiling_Set23  DesignID:2053
    ArrayChip:2005-05-10_HG17Tiling_Set36  DesignID:2066
    ArrayChip:2005-05-10_HG17Tiling_Set08  DesignID:2038

15 Probe example code

    my $probe_adaptor    = $efg_db->get_ProbeAdaptor;
    my $pfeature_adaptor = $efg_db->get_ProbeFeatureAdaptor;

    my $probe = $probe_adaptor->fetch_by_array_probe_probeset_name('2005-05-10_HG17Tiling_Set', 'chr22P38797630');
    print "Got " . $probe->class . " probe " . $probe->get_probename . "\n";

    my @pfeatures = @{ $pfeature_adaptor->fetch_all_by_Probe($probe) };
    print "Found " . scalar(@pfeatures) . " ProbeFeatures\n";

    foreach my $pfeature (@pfeatures) {
        print "ProbeFeature found at:\t" . $pfeature->feature_Slice->name . "\n";
    }

Output:

    Got EXPERIMENTAL probe chr22P38797630
    Found 1 ProbeFeatures
    ProbeFeature found at:  chromosome:NCBI36:22:38803076:38803125:1
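Slice names such as chromosome:NCBI36:22:38803076:38803125:1 pack the coordinate system, assembly version, sequence region, start, end and strand into one colon-separated string. A small standalone parser, assuming exactly six fields; parse_slice_name is a hypothetical helper, not an API method (the Slice object itself already provides accessors such as start(), end() and strand()).

```perl
use strict;
use warnings;

# Hypothetical helper: split coord_system:version:seq_region:start:end:strand
# into named fields.
sub parse_slice_name {
    my ($name) = @_;
    my @fields = split /:/, $name;
    die "Unexpected slice name '$name'\n" unless @fields == 6;
    my %parsed;
    @parsed{qw(coord_system version seq_region start end strand)} = @fields;
    return \%parsed;
}

my $loc    = parse_slice_name('chromosome:NCBI36:22:38803076:38803125:1');
my $length = $loc->{end} - $loc->{start} + 1;    # Ensembl coordinates are 1-based and inclusive
printf "%s %s:%d-%d, %d bp, strand %s\n",
    $loc->{version}, $loc->{seq_region}, $loc->{start}, $loc->{end}, $length, $loc->{strand};
```

For the probe above this reports a 50 bp span on chromosome 22 of NCBI36.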

16 Experimental Data 1
- Experiment: a natural container for experimental meta data.
  Methods: name(), group(), mage_xml(), primary_design_type(), description(), get_ExperimentalChips().
  Adaptor methods: fetch_by_name(), fetch_all_by_group(), get_all_experiment_names().
- ExperimentalChip: a unique physical instance of an ArrayChip.
  Methods: unique_id(), cell_type(), feature_type(), biological_replicate(), technical_replicate().
  Adaptor methods: fetch_all_by_experiment(), fetch_by_unique_id_vendor().
- Channel: a control or experimental channel from an ExperimentalChip.
  Methods: dye(), type(), sample_id().
  Adaptor methods: fetch_all_by_ExperimentalChip(), fetch_all_type_experimental_chip_id().

17 Experimental Data 1 example code

    my $exp_adaptor = $efg_db->get_ExperimentAdaptor;
    my $exp         = $exp_adaptor->fetch_by_name('ctcf_ren');
    my $num_chips   = scalar(@{ $exp->get_ExperimentalChips });

    print $exp->name . ' ' . $exp->primary_design_type
        . " experiment contains $num_chips ExperimentalChips\n";

Output:

    ctcf_ren binding_site_identification experiment contains 36 ExperimentalChips

18 Experimental Data 2
- ResultSet: provides easy access to discrete sets of experimental data, e.g. replicates.
  Methods: name(), cell_type(), feature_type(), display_label(), get_ExperimentalChips(), get_ResultFeatures_by_Slice().
  Adaptor methods: fetch_all_by_name(), fetch_all_by_name_Analysis(), fetch_all_by_FeatureType(), fetch_all_by_Experiment().
- ResultFeature: a special lightweight Feature optimised for display and analysis.
  Methods: start(), end(), score().
  Obtained via ResultSet::get_ResultFeatures_by_Slice().

19 Experimental Data 2 example code

    my $resultset_adaptor = $efg_db->get_ResultSetAdaptor;
    my $slice_adaptor     = $efg_db->get_SliceAdaptor;

    my ($result_set) = @{ $resultset_adaptor->fetch_all_by_name('ctcf_ren_BR1') };
    my $slice = $slice_adaptor->fetch_by_region('chromosome', 'X');

    my @result_features = @{ $result_set->get_ResultFeatures_by_Slice($slice) };
    print "Chromosome X has " . scalar(@result_features) . " results\n";

    foreach my $rf (@result_features) {
        print "Locus:\t" . $rf->start . '-' . $rf->end . "\tScore:" . $rf->score . "\n";
    }

Output (first five results shown):

    Chromosome X has 582133 results
    Locus:  429-478  Score:-0.1095
    Locus:  529-578  Score:-0.1155
    Locus:  629-678  Score:0.0135
    Locus:  729-778  Score:-0.1735
    Locus:  829-878  Score:0.256
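ResultFeatures are plain (start, end, score) records, so downstream summaries need nothing beyond core Perl. As a sketch, the five loci printed above are copied into an array by hand; the threshold of 0 is an arbitrary choice for illustration. With the live API, the same triples could be built from $rf->start, $rf->end and $rf->score.

```perl
use strict;
use warnings;

# The five ResultFeatures printed on the slide, as [start, end, score].
my @results = (
    [ 429, 478, -0.1095 ],
    [ 529, 578, -0.1155 ],
    [ 629, 678,  0.0135 ],
    [ 729, 778, -0.1735 ],
    [ 829, 878,  0.256  ],
);

my $threshold = 0;    # arbitrary cut-off, for illustration only
my @above = grep { $_->[2] > $threshold } @results;

my $sum = 0;
$sum += $_->[2] for @results;
my $mean = $sum / @results;    # @results in numeric context gives its length

printf "%d of %d results score above %s\n", scalar @above, scalar @results, $threshold;
printf "Mean score: %.4f\n", $mean;
print 'Loci above threshold: ', join(', ', map { $_->[0] . '-' . $_->[1] } @above), "\n";
```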

20 More Sets
- ExperimentalSet and ExperimentalSubSet: special placeholder sets which facilitate feature import without any underlying data.
  Methods: name(), cell_type(), feature_type(), format(), get_subsets(), ExperimentalSubSet->name().
  Adaptor methods: fetch_by_name(), fetch_all_by_Experiment(), fetch_all_by_CellType(), fetch_all_by_FeatureType().
- FeatureSet: a generic set containing features of various types, e.g. AnnotatedFeatures, ExternalFeatures, RegulatoryFeatures.
  Methods: name(), cell_type(), feature_type(), analysis(), get_Features_by_Slice().
  Adaptor methods: fetch_by_name(), fetch_all_by_type(), fetch_all_by_CellType(), fetch_all_by_FeatureType().

21 More Sets
- DataSet: the top-level container, which associates the underlying data ('supporting sets') with a product FeatureSet, i.e. the results of an analysis based on the underlying data. Supporting sets can be any other type of Set.
  Methods: name(), cell_type(), feature_type(), product_FeatureSet(), get_supporting_sets().
  Adaptor methods: fetch_by_name(), fetch_all_by_supporting_set(), fetch_all_by_product_FeatureSet().

22 Set example code 1

    my $dataset_adaptor = $efg_db->get_DataSetAdaptor;
    my $data_set = $dataset_adaptor->fetch_by_name('Nessie_NG_STD_2_ctcf_ren_BR1');

    my @supporting_sets = @{ $data_set->get_supporting_sets };
    foreach my $sset (@supporting_sets) {
        print 'Supporting set: ' . $sset->name . "\n";
        print 'Produced by analysis ' . $sset->analysis->logic_name . "\n";
    }

    my $pfset = $data_set->product_FeatureSet;
    print "\nProduct FeatureSet is " . $pfset->name . "\n";
    print 'Produced by analysis ' . $pfset->analysis->logic_name . "\n";

Output:

    Supporting set: ctcf_ren_BR1_TR1
    Produced by analysis VSN_GLOG

    Product FeatureSet is Nessie_NG_STD_2_ctcf_ren_BR1
    Produced by analysis Nessie_NG_STD_2

23 Set example code 2

    my $featureset_adaptor = $efg_db->get_FeatureSetAdaptor;
    my @ext_fsets = @{ $featureset_adaptor->fetch_all_by_type('external') };

    foreach my $ext_fset (@ext_fsets) {
        print "External FeatureSet:\t" . $ext_fset->name . "\n";
    }

Output:

    External FeatureSet:  miRanda miRNA
    External FeatureSet:  cisRED group motifs
    External FeatureSet:  cisRED search regions
    External FeatureSet:  VISTA enhancer set

24 Features
- ProbeFeature: an individual alignment of a probe sequence.
  Methods: probe(), probeset(), probelength(), get_result_by_ResultSet().
  Adaptor methods: fetch_all_by_Probe(), fetch_all_by_Slice_ExperimentalChips().
- AnnotatedFeature: any feature based on experimental information, i.e. ResultSet or ExperimentalSet data.
  Methods: cell_type(), feature_type(), score(), display_label().
- ExternalFeature: an individual feature from an externally curated set.
  Methods: cell_type(), feature_type(), display_label().

25 Features
- RegulatoryFeature: a feature generated by the Regulatory Build, a combinatorial analysis based on DNase1 HSSs, CTCF and histone modifications.
  Methods: feature_type(), bound_start(), bound_end(), regulatory_attributes(), display_label(), stable_id().
  Adaptor methods: fetch_all_by_Slice(), fetch_by_stable_id().

26 Features example code 1

    # $slice is the chromosome X Slice fetched in the ResultSet example
    my $featureset_adaptor = $efg_db->get_FeatureSetAdaptor;
    my $feature_set = $featureset_adaptor->fetch_by_name('miRanda miRNA');
    my @features = @{ $feature_set->get_Features_by_Slice($slice) };

    foreach my $feat (@features) {
        print $feat->display_label . "\t" . $feat->feature_Slice->name . "\n";
    }

Output:

    ENST00000390665:mmu-miR-712     chromosome:NCBI36:X:214111:214131:-1
    ENST00000390665:mmu-miR-673-5p  chromosome:NCBI36:X:214115:214136:-1
    ENST00000390665:hsa-miR-22      chromosome:NCBI36:X:214125:214146:-1
    ENST00000390665:hsa-miR-887     chromosome:NCBI36:X:214138:214159:-1
    ENST00000390665:mmu-miR-696     chromosome:NCBI36:X:214149:214165:-1
    ENST00000390665:hsa-miR-328     chromosome:NCBI36:X:214178:214200:-1
    ENST00000390665:mmu-miR-669b    chromosome:NCBI36:X:214228:214250:-1
    ENST00000390665:hsa-miR-197     chromosome:NCBI36:X:214264:214285:-1
    ENST00000390665:hsa-miR-220b    chromosome:NCBI36:X:214265:214286:-1
    ENST00000390665:hsa-miR-636     chromosome:NCBI36:X:214341:214362:-1
    ENST00000390665:mmu-miR-689     chromosome:NCBI36:X:214424:214445:-1
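The miRanda display labels above appear to follow a "transcript stable ID:miRNA name" pattern; that format is inferred from this output only, not from documentation. Assuming it holds, the labels can be split and grouped by miRBase species prefix (hsa for human, mmu for mouse) with a few lines of core Perl, working offline on the labels copied from the slide.

```perl
use strict;
use warnings;

# display_label values copied from the output above.
my @labels = qw(
    ENST00000390665:mmu-miR-712   ENST00000390665:mmu-miR-673-5p
    ENST00000390665:hsa-miR-22    ENST00000390665:hsa-miR-887
    ENST00000390665:mmu-miR-696   ENST00000390665:hsa-miR-328
    ENST00000390665:mmu-miR-669b  ENST00000390665:hsa-miR-197
    ENST00000390665:hsa-miR-220b  ENST00000390665:hsa-miR-636
    ENST00000390665:mmu-miR-689
);

my (%by_species, %transcripts);
for my $label (@labels) {
    # Assumed format: "<transcript ID>:<miRNA name>"
    my ($transcript, $mirna) = split /:/, $label, 2;
    my ($species) = $mirna =~ /^([a-z]{3})-/;    # miRBase prefix, e.g. hsa, mmu
    $transcripts{$transcript}++;
    push @{ $by_species{$species} }, $mirna;
}

for my $species (sort keys %by_species) {
    printf "%s (%d): %s\n", $species, scalar @{ $by_species{$species} },
        join(', ', @{ $by_species{$species} });
}
printf "Distinct transcripts: %d\n", scalar keys %transcripts;
```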

27 Features example code 2

    my $regfeat_adaptor = $efg_db->get_RegulatoryFeatureAdaptor;
    my @reg_feats = @{ $regfeat_adaptor->fetch_all_by_Slice($slice) };

    foreach my $reg_feat (@reg_feats) {
        print $reg_feat->stable_id . ' ' . $reg_feat->feature_type->name . "\n";

        foreach my $attr_feat (@{ $reg_feat->regulatory_attributes }) {
            print 'AttributeFeature ' . $attr_feat->feature_type->name . "\n";
        }
    }

Output:

    ENSR00000175296 Promoter Associated - Cell type specific
    AttributeFeature H3K4me3
    AttributeFeature DNase1
    AttributeFeature H3K4me3
    ENSR00000092125 Unclassified - Cell type specific
    AttributeFeature DNase1
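Each RegulatoryFeature carries a list of attribute features, so simple summaries follow directly from regulatory_attributes(). The sketch below works offline on the two features printed above, copied into a hash by hand; with the live API, the type names would come from $attr_feat->feature_type->name. It counts each attribute type once per RegulatoryFeature.

```perl
use strict;
use warnings;

# AttributeFeature types from the output above, keyed by RegulatoryFeature stable ID.
my %attributes = (
    ENSR00000175296 => [qw(H3K4me3 DNase1 H3K4me3)],
    ENSR00000092125 => [qw(DNase1)],
);

my %tally;    # feature type => number of RegulatoryFeatures it supports
for my $stable_id (sort keys %attributes) {
    my %seen;
    $seen{$_}++ for @{ $attributes{$stable_id} };
    $tally{$_}++ for keys %seen;    # count each type once per RegulatoryFeature
    printf "%s: %s\n", $stable_id, join(', ', map { $_ . '=' . $seen{$_} } sort keys %seen);
}

for my $type (sort keys %tally) {
    printf "%-8s supports %d of %d RegulatoryFeatures\n",
        $type, $tally{$type}, scalar keys %attributes;
}
```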

28 eFG Environments
The eFG environments provide useful functions, configuration and administration utilities:
- efg
- efg_pipeline
Coming soon:
- Array mapping environment: Affy, Illumina, Codelink, Agilent, Nimblegen.
- Genomic and transcript mapping pipelines.

29 eFG Import (efg environment)
- Arrays: Nimblegen, Sanger ENCODE
- Simple: GFF, BED, Wiggle
- External: cisRED, miRanda, VISTA, redFLY

30 eFG Import
- ChIP-chip:
  - Normalisation: VSN; TukeyBiweight.
  - Bio::MAGE/Tab2MAGE
  - ResultSet nomenclature: EXP1, EXP1_BR1, EXP1_BR1_TR1, EXP1_BR1_TR2
- ChIP-Seq: pre/post analysis
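The ResultSet naming pattern encodes the experiment, the biological replicate (BR) and the technical replicate (TR). A minimal Perl sketch that splits such names, assuming the BR and TR suffixes always appear in that order at the end of the name; parse_result_set_name is a hypothetical helper, not part of the eFG API.

```perl
use strict;
use warnings;

# Hypothetical helper: split EXP[_BRn][_TRm] into its parts. The experiment
# name may itself contain underscores (e.g. ctcf_ren), so the non-greedy
# (.+?) stops only where the optional _BR/_TR suffixes can finish the match.
sub parse_result_set_name {
    my ($name) = @_;
    my ($exp, $br, $tr) = $name =~ /^(.+?)(?:_BR(\d+))?(?:_TR(\d+))?$/
        or die "Cannot parse ResultSet name '$name'\n";
    return { experiment => $exp, bio_rep => $br, tech_rep => $tr };
}

for my $name (qw(EXP1 EXP1_BR1 EXP1_BR1_TR1 EXP1_BR1_TR2 ctcf_ren_BR1_TR1)) {
    my $p = parse_result_set_name($name);
    printf "%-17s experiment=%-9s BR=%s TR=%s\n", $name, $p->{experiment},
        $p->{bio_rep} // '-', $p->{tech_rep} // '-';
}
```

Missing replicate levels come back as undef, so EXP1 and EXP1_BR1 parse as the experiment-level and biological-replicate-level sets respectively.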

31 eFG Analysis (efg_pipeline environment)
- Pipeline: Ensembl gene build pipeline technology.
- Analysis Runnables: ACME, Chipotle, Splitter, TileMap, Nessie (unpublished), SWEmbl (unpublished), Regulatory Build

32 eFG Analysis
Regulatory Build
- Feature construction:
  - Anchor/focus sets: DNase1; CTCF.
  - Attribute sets: histone modifications; transcription factors.
- Regulatory annotation, patterns associated with:
  - Promoter regions
  - Gene regions
  - Non-gene regions
[Figure: example signal patterns for DNase1, CTCF, H3K36me3, H3K4me3 and H3K27me3.]
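The slide names the anchor and attribute set types but not the construction rules. As a toy illustration of that vocabulary only (not the actual Regulatory Build algorithm), the sketch below treats each anchor feature as a core region and attaches every attribute feature that overlaps it. All coordinates are invented, and the overlaps helper is hypothetical.

```perl
use strict;
use warnings;

# Invented intervals: [feature type, start, end].
my @anchors    = ( [ 'DNase1', 1000, 1300 ], [ 'CTCF', 5000, 5100 ] );
my @attributes = (
    [ 'H3K4me3',   900, 1100 ],
    [ 'H3K27me3', 1250, 1600 ],
    [ 'H3K36me3', 3000, 3400 ],
    [ 'H3K4me3',  5050, 5200 ],
);

# True if two closed intervals share at least one base.
sub overlaps {
    my ($s1, $e1, $s2, $e2) = @_;
    return $s1 <= $e2 && $s2 <= $e1;
}

my @regions;
for my $anchor (@anchors) {
    my ($type, $start, $end) = @$anchor;
    my @hits = grep { overlaps($start, $end, $_->[1], $_->[2]) } @attributes;
    push @regions, {
        anchor     => $type,
        start      => $start,
        end        => $end,
        attributes => [ map { $_->[0] } @hits ],
    };
}

for my $r (@regions) {
    printf "%s %d-%d: %s\n", $r->{anchor}, $r->{start}, $r->{end},
        join(', ', @{ $r->{attributes} }) || 'none';
}
```

Here the H3K36me3 interval overlaps neither anchor and is dropped, while each anchor collects the marks that touch it.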

33 Getting More Information
- Workshop material: http://www.ebi.ac.uk/~njohnson/courses/15.09.2008-GI-Hinxton
- perldoc: viewer for the inline API documentation, e.g.
    shell> perldoc Bio::EnsEMBL::Funcgen::RegulatoryFeature
  Also online at: http://www.ensembl.org/info/software/Pdoc/
- eFG schema description, online at: http://www.ensembl.org/info/using/api/funcgen/funcgen_schema.html
- eFG installation document, online at: http://www.ensembl.org/info/using/api/funcgen/efg_introduction.html
- ensembl-dev mailing list: ensembl-dev@ebi.ac.uk
