The SLING project is funded by the European Commission within Research Infrastructures of the FP7 Capacities Specific Programme, grant agreement number.


1 The SLING project is funded by the European Commission within Research Infrastructures of the FP7 Capacities Specific Programme, grant agreement number 226073 (Integrating Activity) EBI services Jennifer McDowall EMBL-EBI

2 Overview: Introduction; EBI databases; Searching for sequences (NEW: simple EBI search, advanced SRS text search, sequence search tools); Accessing old entries (sequence archives); Chemoinformatics

3 Website: http://www.ebi.ac.uk/ Thematic index EBI Search: search all databases and literature in one go

4 Website: www.ebi.ac.uk Databases: patent resources, sequences, genomes, chemistry, structures, gene expression, reactions & pathways, literature. Tools: sequence searching, sequence analysis, structural analysis, functional analysis. Training: eLearning, workshops, 2Can education resource. Industry programme: industry support, SME support.

5 patent-related resources... EBI databases

6 Sequence data from patent literature (October 2010): patent nucleotides > 17.5m sequences; patent proteins > 4.9m sequences. Patent offices: EPO, USPTO, JPO + KIPO. INSDC databases: GenBank, ENA, DDBJ. EPO policy: data publicly released 18 months after patent application date (whether patent granted or not). INSDC agreement: free unrestricted access; permanently accessible; all data exchanged daily.

7 Patent resources at EBI www.ebi.ac.uk/patentdata

8 Patent sequence records at EBI. ENA (formerly EMBL-Bank): >124 million sequences; patent + non-patent nucleotides; redundant. UniParc (division of UniProt): >24 million sequences; patent + non-patent proteins; non-redundant. NR patent sequences: patent proteins and nucleotides; non-redundant; additional patent annotation. Uses: non-patent sequence prior art searches; patent sequence prior art searches.

9 Non-redundant patent databases www.ebi.ac.uk ENA (redundant) → remove sequence redundancy → Level-1 NR → group by patent families → Level-2 NR. Additional annotation, including priority dates for patent families.
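
As a rough illustration of the two non-redundancy levels on this slide, the sketch below first collapses identical sequences (Level-1) and then groups the sources of each distinct sequence by patent family (Level-2). The record layout, accessions, family identifiers and sequences are all invented for the example, and it is not a description of how the EBI NR databases are actually built.

```python
from collections import defaultdict

# Hypothetical records: (accession, patent number, patent family id, sequence).
RECORDS = [
    ("A1", "WO0000001", "FAM1", "ACGTACGT"),
    ("A2", "EP0000002", "FAM1", "ACGTACGT"),  # same sequence, same family
    ("A3", "US0000003", "FAM2", "ACGTACGT"),  # same sequence, other family
    ("A4", "WO0000001", "FAM1", "TTGGCCAA"),
]

def level1_nr(records):
    """Level-1: one entry per distinct sequence, keeping every source record."""
    by_sequence = defaultdict(list)
    for accession, patent, family, sequence in records:
        by_sequence[sequence].append((accession, patent, family))
    return dict(by_sequence)

def level2_nr(level1):
    """Level-2: for each distinct sequence, group its sources by patent family."""
    result = {}
    for sequence, sources in level1.items():
        families = defaultdict(list)
        for _accession, patent, family in sources:
            families[family].append(patent)
        result[sequence] = dict(families)
    return result

level1 = level1_nr(RECORDS)
level2 = level2_nr(level1)
print(len(RECORDS), "records ->", len(level1), "distinct sequences")
for sequence, families in level2.items():
    print(sequence, families)
```

A prior art search against the Level-2 view then returns one hit per sequence with its patent families attached, rather than one hit per redundant record.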

10 Sequence submissions Generate sequence Submit to journal Submit to ENA Submission guides at www.ebi.ac.uk Not accepted Submit to journal Step 2 Submit claim to EPO Step 1

11 Searching for sequences simple EBI search...

12 EBI-Search by patent number www.ebi.ac.uk Follow link to NEW EBI Search

13 Link to NEW EBI Search EBI-Search by patent number

14 Link to NEW EBI Search Getting started How it works Gene & protein summaries NEW EBI Search Training video

15 EBI-Search by patent number Link to NEW EBI Search Search for patent WO0146262

16 Link to NEW EBI Search EBI-Search by patent number Search for WO0146262

17 EBI-Search by patent number Link to NEW EBI Search Search for WO0146262 Literature for WO0146262 Sequence data for WO0146262

18 EBI-Search by patent number Link to NEW EBI Search Search for WO0146262 Link to full patent paper

19 EBI-Search by patent number Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases

20 Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases WO0146262 in CiteXplore EBI-Search by patent number

21 Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases WO0146262 in CiteXplore

22 EBI-Search by patent number Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases WO0146262 in CiteXplore WO0146262 in Esp@cenet

23 EBI-Search by patent number Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases WO0146262 in CiteXplore WO0146262 in Esp@cenet

24 EBI-Search by patent number WO0146262 in Esp@cenet Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases WO0146262 in CiteXplore WO0146262 in Patent Lens

25 EBI-Search by patent number WO0146262 in Esp@cenet Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases WO0146262 in CiteXplore WO0146262 in Patent Lens

26 EBI-Search by patent number WO0146262 in Esp@cenet Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases WO0146262 in CiteXplore WO0146262 in Patent Lens Lists nucleotide sequences from WO0146262 Additional annotation

27 EBI-Search by patent number WO0146262 in Esp@cenet Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases WO0146262 in CiteXplore WO0146262 in Patent Lens WO0146262 nucleotide sequence record in ENA

28 Patent sequence record in ENA www.ebi.ac.uk Graphical viewer; sequence; patent reference; DNA source; dates (first public and last updated); sequence version; download data; navigate to related data, e.g. version archive; navigate to external data sources, e.g. UniProt

29 WO0146262 in Esp@cenet Link to NEW EBI Search Search for WO0146262 WO0146262 in CiteXplore WO0146262 in Patent Lens WO0146262 literature and sequence databases ENA sequence record EBI-Search by patent number
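
The patent-number walkthrough above is done in the browser. A minimal sketch of how the same query might be expressed as a URL is shown below; the base URL, the parameter names and the domain identifier are assumptions that do not appear in the slides, so they should be checked against the current EBI Search documentation before use. The code only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Assumed base URL of a REST-style EBI Search interface; not taken from the slides.
BASE_URL = "https://www.ebi.ac.uk/ebisearch/ws/rest"

def build_search_url(domain, query, size=10):
    """Return a query URL for one search domain (no request is sent)."""
    params = urlencode({"query": query, "format": "json", "size": size})
    return f"{BASE_URL}/{domain}?{params}"

# "patent_domain" is a placeholder, not a confirmed domain identifier.
url = build_search_url("patent_domain", "WO0146262")
print(url)
```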

30 EBI-Search by gene name Link to NEW EBI Search Search for src gene

31 Link to NEW EBI Search EBI-Search by gene name Search for src

32 EBI-Search by gene name Link to NEW EBI Search Search for src Genome information Gene & protein summaries

33 EBI-Search by gene name Link to NEW EBI Search Search for src Let’s select src in humans

34 EBI-Search by gene name Link to NEW EBI Search src gene & protein summary Search for src

35 EBI-Search by gene name Link to NEW EBI Search src gene & protein summary Search for src Species selector

36 EBI-Search by gene name Link to NEW EBI Search Search for src src gene & protein summary Gene tab: gene structure (forward & reverse strand); gene sequence; location; sequence variations; orthologs; data source (Ensembl)

37 src gene & protein summary Link to NEW EBI Search Search for src Gene & protein summary gene tab EBI-Search by gene name

38 EBI-Search by gene name Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab Expression tab: expression studies; data source (Expression Atlas)

39 Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab See expression in cell type EBI-Search by gene name Gene Atlas

40 Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab EBI-Search by gene name Gene & protein summary expression tab

41 EBI-Search by gene name Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab Gene & protein summary expression tab Protein tab: function; isoforms; sequence; classification; interactions; data sources (UniProt, InterPro, IntAct)

42 Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab EBI-Search by gene name Gene & protein summary protein tab Gene & protein summary expression tab

43 EBI-Search by gene name Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab Gene & protein summary expression tab Gene & protein summary protein tab Structure tab: citation; structural domains; 47 additional structures; data source (PDBe)

44 Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab EBI-Search by gene name Gene & protein summary protein tab Gene & protein summary expression tab Gene & protein summary structure tab

45 EBI-Search by gene name Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab Gene & protein summary expression tab Gene & protein summary protein tab Gene & protein summary structure tab Literature tab: search results taken from PubMed, PubMedUK, Agricola, EPO; divided into categories; description of categories

46 Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab EBI-Search by gene name Gene & protein summary protein tab Gene & protein summary expression tab Gene & protein summary structure tab Literature tab Patents Curator-selected articles

47 Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab EBI-Search by gene name Gene & protein summary protein tab Gene & protein summary expression tab Gene & protein summary structure tab Gene & protein summary literature tab

48 EBI-Search by gene name Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab Gene & protein summary expression tab Gene & protein summary protein tab Gene & protein summary structure tab Reporting view → print full summary page

49 Link to NEW EBI Search Search for src src gene & protein summary Gene & protein summary gene tab EBI-Search by gene name Gene & protein summary protein tab Gene & protein summary expression tab Gene & protein summary structure tab Gene & protein summary literature tab Print report

50 Searching for sequences advanced SRS text search...

51 SRS – for more search options www.ebi.ac.uk/srs 1st: Select resources to search 2nd: Create query

52 SRS – for more search options Select library tab

53 SRS – for more search options Select library tab Patent literature Patent DNA Patent proteins Search >100 databases

54 SRS – for more search options Select library tab Here, selected NR-level 2 DNA database

55 SRS – for more search options Select library tab Select resources to search

56 SRS – for more search options Select library tab Select resources to search 1) Select field 2) Type in text

57 SRS – for more search options Select library tab Select resources to search Here, selected patent number

58 SRS – for more search options Select library tab Select resources to search Create query

59 SRS – for more search options Select library tab Select resources to search Create query Lists non-redundant nucleotide sequences from WO0146262

60 SRS – for more search options Select library tab Select resources to search Create query WO0146262 sequences

61 SRS – for more search options Select library tab Select resources to search Create query WO0146262 sequences WO0146262 nucleotide sequence record in NRNL2

62 Patent sequence record in NRNL2: patent equivalents; sequence record in ENA; sequence; patent literature; priority number and date; translation

63 SRS – for more search options Select library tab Select resources to search Create query WO0146262 sequences NRNL2 sequence record

64 SRS – for more search options Select library tab Select resources to search Create query WO0146262 sequences NRNL2 sequence record WO0146262 literature www.ebi.ac.uk/srs

65 Searching for sequences sequence search...

66 Sequence searching – specialised tools Navigate to ‘Sequence Similarity & Analysis’ www.ebi.ac.uk

67 Sequence searching – specialised tools Navigate to search tools

68 Sequence searching – specialised tools Navigate to search tools www.ebi.ac.uk/Tools/sss BLAST FASTA PSI search Choose Search tool

69 When to use which search? [Figure: time to search for FASTA, WU-BLAST, NCBI BLAST and PSI-SEARCH, in relation to query length and database size]

70 When to use which search? Choose the appropriate search engine for the job (one search engine won’t do everything)
– BLAST – initial fast search
– FASTA – better general search engine
– PSI-BLAST – find remote family members
– GLSEARCH – match peptide/domain to protein
– GGSEARCH – full length matches
– FASTM – match several peptides to protein

71 Sequence searching – specialised tools Navigate to search tools www.ebi.ac.uk/Tools/sss Here, try FASTA protein

72 Sequence searching – specialised tools Navigate to search tools Select search tool

73 Sequence searching – specialised tools Navigate to search tools Select search tool For patent proteins: Search individual patent offices or non-redundant patent datasets Step 1: Select database

74 Sequence searching – specialised tools Navigate to search tools Select search tool Here, selected UniProt Knowledgebase + NR patent proteins L2 Step 1: Select database

75 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database

76 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database Step 2: Copy/paste sequence or upload file Copy/pasted patent protein A00210 from patent EP0242329

77 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence

78 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence Step 3: Set parameters Can change search engine

79 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence Step 3: Set parameters Can change search parameters

80 How to optimise parameters? User manual provides help

81 How to optimise parameters? Choice of matrix depends on: 1. strictness of search 2. length of query sequence
QUERY LENGTH   MATRIX     open   ext
>300           BLOSUM50   -10    -2
85-300         BLOSUM62   -7     -1
50-85          BLOSUM80   -16    -4
>300           PAM250     -10    -2
85-300         PAM120     -16    -4
35-85          MDM40      -12    -2
<=35           MDM20      -22    -4
<=10           MDM10      -23    -4

82 How to optimise parameters? Choice of gap penalties depends on: 1. strictness of search 2. to match scoring matrix (larger penalty → fewer gaps)
QUERY LENGTH   MATRIX     open   ext
>300           BLOSUM50   -10    -2
85-300         BLOSUM62   -7     -1
50-85          BLOSUM80   -16    -4
>300           PAM250     -10    -2
85-300         PAM120     -16    -4
35-85          MDM40      -12    -2
<=35           MDM20      -22    -4
<=10           MDM10      -23    -4
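
The matrix and gap-penalty table on these two slides can be read as a lookup keyed on query length. The sketch below encodes the rows as given; the function name, the `family` argument and the handling of boundary lengths (for example, whether a query of exactly 85 residues counts as "85-300" or "50-85") are assumptions made for the example.

```python
# Rows copied from the slide table: (minimum query length, matrix, gap open, gap extend).
# Rows run from longest to shortest query; the first row whose minimum is met wins.
# Treating 85 as "85-300" and 35 as "<=35" is an assumption about the boundaries.
BLOSUM_ROWS = [
    (301, "BLOSUM50", -10, -2),
    (85, "BLOSUM62", -7, -1),
    (50, "BLOSUM80", -16, -4),
]
PAM_ROWS = [
    (301, "PAM250", -10, -2),
    (85, "PAM120", -16, -4),
    (36, "MDM40", -12, -2),
    (11, "MDM20", -22, -4),
    (1, "MDM10", -23, -4),
]

def suggest_parameters(query_length, family="BLOSUM"):
    """Return (matrix, gap_open, gap_extend) for a query length."""
    rows = {"BLOSUM": BLOSUM_ROWS, "PAM": PAM_ROWS}[family]
    for minimum, matrix, gap_open, gap_extend in rows:
        if query_length >= minimum:
            return matrix, gap_open, gap_extend
    raise ValueError(f"no {family} row in the table covers length {query_length}")

print(suggest_parameters(200))        # mid-length query, BLOSUM family
print(suggest_parameters(8, "PAM"))   # very short peptide, PAM/MDM family
```

The BLOSUM rows stop at 50 residues, so shorter queries raise an error under `family="BLOSUM"`; the PAM/MDM rows are the ones the table offers for short sequences.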

83 How to optimise parameters? Do I mask my sequence? Low complexity regions should be masked to avoid spurious results, e.g. CA repeats, poly-A tails, proline-rich regions. **Be careful you don’t mask what you are looking for
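
To make the idea of masking concrete, here is a toy sketch that replaces CA repeats and a trailing poly-A run in a nucleotide query with N. The regular expressions and the minimum run lengths are arbitrary assumptions; the search services apply dedicated low-complexity filters (such as SEG for proteins or DUST for nucleotides), not this code.

```python
import re

# Toy patterns for two of the low-complexity features named on the slide.
# The minimum run lengths (6 CA units, 10 A's) are arbitrary assumptions.
CA_REPEAT = re.compile(r"(?:CA){6,}")
POLY_A_TAIL = re.compile(r"A{10,}$")

def mask_low_complexity(sequence):
    """Replace matched regions with N so they cannot seed spurious hits."""
    def to_n(match):
        return "N" * len(match.group(0))
    masked = CA_REPEAT.sub(to_n, sequence.upper())
    return POLY_A_TAIL.sub(to_n, masked)

query = "GGTTGG" + "CA" * 8 + "GGCTTAGC" + "A" * 12
print(mask_low_complexity(query))
```

If the CA repeat itself were the feature of interest, masking it would hide exactly what the search is meant to find, which is the warning on the slide.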

84 How to optimise parameters? What do I use for short sequences?
– use strict matrices
– use high gap penalties
– avoid masking
– allow high e-values

85 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence Step 3: Set parameters Here, use default parameters

86 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters

87 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters Step 4: submit Can select to have results emailed

88 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters (4) Submit
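
The four form steps (select database, paste sequence, set parameters, submit) amount to assembling a small set of fields. The sketch below collects them into a URL-encoded form body without sending anything. The field names, the database identifier, the e-mail address and the query sequence are all placeholders assumed for the example, not values confirmed by the slides or by any service documentation.

```python
from urllib.parse import urlencode

def build_job_form(database, sequence, email, program="fasta", stype="protein"):
    """Collect the four steps of the web form into one URL-encoded body."""
    if not sequence.strip():
        raise ValueError("a query sequence is required")
    fields = {
        "database": database,  # step 1: select database
        "sequence": sequence,  # step 2: copy/paste sequence
        "program": program,    # step 3: set parameters (defaults kept here)
        "stype": stype,
        "email": email,        # step 4: submit, optionally with e-mailed results
    }
    return urlencode(fields)

# Placeholder database name and sequence; not the real A00210 record.
body = build_job_form("example_db", ">query\nMKTAYIAKQR", "user@example.org")
print(body)
```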

89 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters (4) Submit Results include patent proteins (from NRPL2) and non-patent proteins (from UniProtKB) View additional annotation (non-patent proteins)

90 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters (4) Submit Related EMBL nucleotide entries

91 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters (4) Submit Related genomic information

92 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters (4) Submit Gene ontology (GO) mapping for protein

93 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters (4) Submit InterPro family/domain classification

94 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters (4) Submit Literature

95 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters (4) Submit Functional predictions on ALL proteins

96 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters (4) Submit Result summary + annotation

97 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters (4) Submit Result summary + annotation Visual comparison → find mis- or partial matches Prioritize results Functional predictions: InterPro family/domain classifications Extract information

98 Sequence searching – specialised tools Navigate to search tools Select search tool (1) Select database (2) Copy/paste sequence (3) Set parameters (4) Submit Result summary + annotation Functional predictions

99 Accessing old entries sequence archives...

100 Sequence archives www.ebi.ac.uk ENA nucleotide sequence version archive (SVA): www.ebi.ac.uk/embl/sva UniSave – UniProt sequence/annotation version archive: www.ebi.ac.uk/uniprot/unisave Search by date → get specific record. Search by accession only → get all records.
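
The two search modes on this slide (by date for one specific record, by accession only for all records) can be pictured with a small in-memory version history. The dates and version numbers below are invented, and the functions are a sketch of the lookup logic only, not of the SVA or UniSave interfaces.

```python
from bisect import bisect_right
from datetime import date

# Invented version history for one accession: (first public date, version number).
VERSIONS = [
    (date(2001, 6, 28), 1),
    (date(2004, 3, 2), 2),
    (date(2009, 11, 17), 3),
]

def all_versions(history):
    """Search by accession only: return every version."""
    return [version for _, version in history]

def version_on(history, when):
    """Search by date: return the version that was current on that day, or None."""
    dates = [released for released, _ in history]
    index = bisect_right(dates, when)
    return history[index - 1][1] if index else None

print(all_versions(VERSIONS))
print(version_on(VERSIONS, date(2005, 1, 1)))
```

For a prior art question, the date lookup is the relevant one: it answers which version of a record was public on a given priority date.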

101 Sequence archives View old entries Compare different versions Provides complete version list

102 Sequence archives View old entries

103 Sequence archives Compare different versions
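
Comparing two versions of an entry is essentially a line-by-line diff. A minimal sketch using Python's standard `difflib` module follows; the two flat-file fragments are invented and only loosely resemble a real entry.

```python
import difflib

# Invented fragments standing in for two archived versions of one entry.
old_entry = ["ID   EXAMPLE1; SV 1", "DE   Example sequence", "SQ   ACGTACGT"]
new_entry = ["ID   EXAMPLE1; SV 2", "DE   Example sequence", "SQ   ACGTTCGT"]

# Lines prefixed with "-" exist only in the old version, "+" only in the new one.
diff = list(difflib.unified_diff(old_entry, new_entry,
                                 "version 1", "version 2", lineterm=""))
print("\n".join(diff))
```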

104 Chemoinformatics ChEBI & ChEMBL...

105 Chemoinformatics databases at EBI. ChEBI: Chemical Entities of Biological Interest; ‘small’ chemical entities (no protein/nucleic acids); illustrated dictionary of chemical nomenclature; http://www.ebi.ac.uk/chebi/ ChEMBL: database of bioactive drug-like small molecules; ‘small’ molecules and peptides; illustrated dictionary of chemical nomenclature; http://www.ebi.ac.uk/chembl/

106 ChEBI data overview. Visualisation. Nomenclature: caffeine; 1,3,7-trimethylxanthine; methyltheobromine. Chemical data: Formula: C8H10N4O2; Charge: 0; Mass: 194.19. Ontology: metabolite; CNS stimulant; trimethylxanthines. Database Xrefs: MSDchem: CFF; KEGG DRUG: D00528. Chemical Informatics: InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3; SMILES CN1C(=O)N(C)c2ncn(C)c2C1=O
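
As a quick consistency check on the chemical data shown for caffeine, the mass can be recomputed from the formula. The sketch below uses rounded standard atomic weights and a simple parser that handles only bracket-free formulas; both are simplifications chosen for the example.

```python
import re

# Standard atomic weights (rounded) for the elements in the formula.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def molar_mass(formula):
    """Sum atomic masses for a simple formula such as C8H10N4O2 (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * int(count or 1)
    return total

print(round(molar_mass("C8H10N4O2"), 2))  # agrees with the 194.19 on the slide
```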

107 ChEBI search for β-lactamase Chemical Entities of Biological Interest (ChEBI)

108 ChEBI search for β-lactamase Compounds interacting with BLA2_KLEPN

109 ChEBI search for β-lactamase Patent abstracts ChEMBL db bioactivity details

110 Summary
Broad patent sequence coverage
– Protein/nucleotides: EPO, USPTO, JPO, KIPO
Comprehensive sequence databases
– ENA & UniParc (PAT / PRT class data)
– Non-redundant patent sequences → enriched
Sequence archives
– ENA SVA & UniSave → track changes
Multiple search engines
– EB-eye text search → fetch patent literature and sequences
– SRS → advanced text searching >100 databases (including patents)
– Sequence searching → specialised tools; annotation-enhanced

111 User support
– 2Can bioinformatics user support – www.ebi.ac.uk/2Can
– Online help pages – www.ebi.ac.uk/help
– E-mail support – www.ebi.ac.uk/support

112 The SLING project is funded by the European Commission within Research Infrastructures of the FP7 Capacities Specific Programme, grant agreement number 226073 (Integrating Activity) Any questions? Contacts: www.ebi.ac.uk/support

