
1 ACNUC training course - 3 March 2010 - Pôle Informatique du LBBE - CNRS - Université de Lyon
Outline
1. Introduction
2. Querying sequence databases (60%)
3. Building your own sequence databases (30%)
4. Using the API (10%)
5. Going further

2 Introduction
1. History
2. A database system and a query tool
3. General principle of ACNUC
4. Access to the programs and databases
5. Workshop schedule

3 Introduction: History
ACNUC is a database management system dedicated to biological sequences, genomic sequences in particular. Its development began in 1980. It serves both as a query tool and as a low-level layer for software development. It remains the only software that lets users query, transparently, the sub-sequences of the sequences present in the databases. Recent developments with Stéphane Delmote make it possible to query the databases remotely through a socket server.

4 Introduction: Principle
The general principle of ACNUC is the indexing of annotated sequence files (EMBL, GenBank, SwissProt...). The various annotation fields are indexed in index files (NOMS, ESPECES, MOT-CLEFS, etc., that is names, species, keywords), which are linked to one another through pointers.
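To make the idea of index files linked by pointers more concrete, here is a toy model in Python. It is not ACNUC's actual on-disk layout: the records, the field names and the `build_index` helper are illustrative assumptions. Each index maps one annotation value to the record numbers that carry it, so a query becomes a set operation on record numbers rather than a scan of the flat file.

```python
from collections import defaultdict

# Toy annotated records standing in for flat-file entries (made-up data).
RECORDS = [
    {"name": "SEQ1", "species": "mus musculus", "keywords": ["btg1", "cds"]},
    {"name": "SEQ2", "species": "homo sapiens", "keywords": ["btg1"]},
    {"name": "SEQ3", "species": "mus musculus", "keywords": ["adenylate cyclase"]},
]


def build_index(records, field):
    """Map each value of `field` to the set of record numbers holding it."""
    index = defaultdict(set)
    for rank, record in enumerate(records):
        values = record[field]
        if isinstance(values, str):
            values = [values]
        for value in values:
            index[value].add(rank)  # the "pointer" is the record number
    return index


species_index = build_index(RECORDS, "species")
keyword_index = build_index(RECORDS, "keywords")

# "mouse sequences with keyword btg1" is the intersection of two index entries.
hits = species_index["mus musculus"] & keyword_index["btg1"]
print([RECORDS[rank]["name"] for rank in sorted(hits)])
```

The design point is that the flat file is read only to display or extract a sequence; selection itself works on the indexes.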

5 Introduction: Access to the programs and databases
The programs, the databases and the documentation are available on the PBIL website: http://pbil.univ-lyon1.fr/

6 Introduction: Workshop schedule
Several exercises and example applications will be discussed during the workshop. This presentation and several scripts are available at: ftp://pbil.univ-lyon1.fr/pub/in2p3/formation_acnuc/
General documentation: http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html
Query language documentation: http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html#QUERYLANGUAGE

7 Querying sequence databases
1. First steps with 'QueryWin'
2. The query language
   - simple queries
   - sequences and sub-sequences
   - complex queries
3. Data extraction
   - several formats
   - extracting specific parts of the sequences
4. Using 'query'
   - simple scripts
   - complex scripts
5. Using 'seqinR'
   - querying databases from R

8 First steps with QueryWin
« QueryWin » runs on all platforms: Unix/Linux, Mac, Windows.
Two versions are available:
- the « local version » works on local databases
- the « client version » works on remote databases
Available at PBIL: http://pbil.univ-lyon1.fr/software/query_win.html
Documentation available at PBIL: http://pbil.univ-lyon1.fr/software/doclogi/docacnuc/acnucwin/acnwian/aquerywin.html

9 First steps with QueryWin
Launch Query_Win - Mac version: click on the application.

10 First steps with QueryWin
Launch Query_Win - on the clusters (local version). To launch query_win on EMBL:
>query_win embl

11 First steps with QueryWin
Launch Query_Win - on the clusters (local version). To launch query_win on EMBL:
>query_win embl
[Screenshot of the Query_Win window, with labels: command window (query language), command buttons]

12 First steps with QueryWin
There are two (not mutually exclusive) ways of querying the database:
1. using buttons and menus
2. using the query language
Exercise 1: select mouse sequences in EMBL
Method 1: Click on the buttons « select » then « species » and type « mus » in the window that opens. Choose the option « build query ». Have a look at the command window. Execute. Try again with the option « make list ».

13 First steps with QueryWin
Exercise 1 (continued)
Method 2: type « sp=mus » in the command window.
IMPORTANT: Queries made with method 1 are displayed in the query language in the command window. This is an excellent way to learn the query language. From now on, try to answer each question with the buttons and menus and observe how it is translated into the query language. Little by little, you may try to use the query language directly.
One more thing: a « HELP » mode is available in Query_Win.

14 The query language: simple queries
All operations are possible with query_win (by clicking on buttons or by using the query language).
Some simple examples:
-query a sequence by its name
-query a sequence by its accession number
-query a sequence by its species or taxon
-query a sequence by a keyword
Other examples:
-Which species is associated with this sequence?
-Which keywords are associated with this sequence?

15 The query language: simple queries
The ACNUC query language is described here: http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html#QUERYLANGUAGE
Exercise 2: Query SwissProt
- Retrieve the cat (Felis catus) sequences using the buttons
- Retrieve the cat (Felis catus) sequences using the query language
- Compare the results
Exercise 2bis: Query SwissProt
- Retrieve the sequences with the taxonomic ID (TaxonID) of the genus Felis (tid=9682)

16 The query language: simple queries
Exercise 3: Query SwissProt
- Retrieve the sequences associated with the keyword « adenylate cyclase » using the buttons
- Retrieve the sequences associated with the keyword « adenylate cyclase » using the query language
- Check the different annotation fields. Where does « adenylate cyclase » appear?
- Do the same with GenBank
The ACNUC query language is described here: http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html#QUERYLANGUAGE

17 The query language: simple queries
Exercise 4: Query GenBank
- Retrieve the sequences associated with the BTG1 gene
- Check the different annotation fields. Where is the information on the gene?
- Do the same with SwissProt
Hint: the gene name is a keyword.
The ACNUC query language is described here: http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html#QUERYLANGUAGE

18 The query language: simple queries
Use of the « wild card »: @
To retrieve the keywords beginning with « toto », search for toto@.
Exercise 5: Retrieve the sequences associated with keywords beginning with BTG
Note: the wild card may also be used for species and sequence names.

19 The query language: sequences & sub-sequences
One of the main strengths of ACNUC is the definition and use of sequences and sub-sequences.
ID   ESCOL3_3; SV 2; circular; genomic DNA; GRV; PRO; 5498450 BP.
XX
AC   BA000007_GR;
XX
blah blah blah
XX
CC   This Genome Reviews entry was created from entry BA000007.2 in the
CC   EMBL/Genbank/DDBJ databases on 03 March 2009.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..5498450
FT                   /organism="GR Escherichia coli"
FT                   /strain="Sakai = O157:H7 = RIMD 0509952 = EHEC"
FT                   /mol_type="genomic DNA"
FT                   /chromosome="Chromosome"
FT                   /db_xref="taxon:386585"
FT.5F1 5'ncr         1..189
FT                   /cds_name="ESCOL3_3.PE1 "
FT.PE1 CDS           190..273
FT                   /codon_start=1
FT                   /gene_name="thrL"
FT                   /locus_tag="ECs0001"
FT                   /protein_id="BAB33424.1"
FT                   /transl_table=11
FT                   /translation="MKRISTTITTTITTTITITITTGNGAG"
FT.3F1 3'ncr         274..353
FT                   /cds_name="ESCOL3_3.PE1 "
FT   misc_structure  215..328
FT                   /gene_name="Thr_leader"
FT                   /db_xref="Rfam:RF00506"
FT.5F2 5'ncr         274..353
FT                   /cds_name="ESCOL3_3.PE2 "
FT.PE2 CDS           354..2816
FT                   /codon_start=1
FT                   /gene_name="thrA"
FT                   /locus_tag="ECs0002"
FT                   /product="Aspartokinase I, homoserine dehydrogenase I "
FT                   /function="NADP or NADPH binding"
FT                   /function="amino acid binding"
FT                   biosynthetic process"
FT                   /protein_id="BAB33425.1"
FT                   /db_xref="GO:0004072"
FT                   /db_xref="UniProtKB/TrEMBL:Q8XA84"
FT                   /transl_table=11
etc.
[Diagram: a sequence drawn from 5' to 3' carrying three genes numbered 1 to 3, each with its 5'ncr, CDS and 3'ncr sub-sequences]

20 The query language: sequences & sub-sequences
ACNUC defines sequences and sub-sequences. A sequence may contain many sub-sequences. For example, a chromosome is a sequence and its CDS are sub-sequences of it. A sub-sequence may be of several types.
Exercise 6: Query HOGENOMDNA (complete genomes)
- Retrieve the sequences of Escherichia coli o157:h7 str. sakai. Question: what are these sequences?
- Retrieve the sub-sequences of chromosome ESCOL3_3. Question: what types are these sequences?
- Retrieve the CDS of chromosome ESCOL3_3
- Back to the sequence ESCOL3_3: look for the CDS in the annotations

21 The query language: sequences & sub-sequences
A sequence is associated with one species, and all its sub-sequences are associated with that species. This is not the case for keywords: a keyword may be associated with a whole sequence or with only one of its sub-sequences.
Exercise 7: Query SwissProt
- Retrieve the sequences associated with the BTG1 gene
- Do the same in GenBank
- What are these sequences?
Hint: the gene name is a keyword.

22 The query language: complex queries
Combinations of criteria:
- Operators AND, OR, NOT, AND NOT
- Use of parentheses
- Crossing result lists
Exercise 8: Query SwissProt
- Retrieve the mammalian sequences
- Retrieve the sequences associated with BTG1
- Cross these 2 lists: list1 AND list2
- Retrieve the mammalian sequences associated with BTG1 in a single query
- Retrieve the mammalian sequences associated with BTG1, BTG2, BTG3 and BTG4 in a single query. How many sequences did you obtain?
Hint: be careful with OR and AND.
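The hint about OR and AND can be illustrated by composing the query text with explicit parentheses. The Python sketch below only builds a string to paste into the command window. `any_of` and `all_of` are hypothetical helper names, and it assumes that `k=` is the keyword criterion and `sp=` the species criterion of the query language.

```python
def any_of(criteria):
    """Join criteria with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(criteria) + ")"


def all_of(criteria):
    """Join criteria (or parenthesised groups) with AND."""
    return " AND ".join(criteria)


genes = ["BTG1", "BTG2", "BTG3", "BTG4"]
keyword_part = any_of(f"k={gene}" for gene in genes)
query = all_of(["sp=mammalia", keyword_part])
print(query)
# sp=mammalia AND (k=BTG1 OR k=BTG2 OR k=BTG3 OR k=BTG4)
```

Without the parentheses, the result depends on how the operators are grouped when the query is evaluated, which is the trap the hint points at.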

23 The query language: complex queries
Other criteria:
- year of publication, e.g. y<1986
- author of publication: au=marley
- journal of publication: same principle
- molecule: m=mRNA
- organelle: o=MITOCHONDRION
- type: t=CDS
- host: h=homo sapiens
- status (not for GenBank): st=EST

24 The query language: complex queries
Modify a list of sequences according to sequence dates or sequence lengths.
Exercise 10: Query SwissProt
- Retrieve the sequences from mus
- Select the sequences longer than 300 aa
- Select the sequences added after the year 2000

25 The query language: complex queries
Exercise 11: Query SwissProt
In which species is BTG1 found in the sequence annotations? (This does not mean that other species lack this gene.)
Solution: retrieve the sequences associated with the gene, then retrieve the species associated with these sequences.
Exercise 11bis: Do the same in one command line
Exercise 12: Retrieve the names of all the strains of E. coli found in EMBL
Exercise 12bis: Retrieve the list of eukaryotes in HOGENOMDNA. Retrieve the list of fungi.
Hint: projecting to species: ps

26 The query language: browsing taxonomy and keywords
Both the taxonomy and the keywords are organised in a hierarchy. These hierarchies can be browsed with the « browse » button of Query_Win.
A keyword may have a « parent ». For example, EC numbers are keywords that all descend from the keyword « EC_Number ». This is very useful for sorting and selecting keywords.
You may select a parent keyword in Query_Win by clicking the « by name » button, then entering the word and clicking « exec », then « done ».

27 The query language: browsing taxonomy and keywords
Exercise 13: Query SWISSPROT
- Retrieve all the keywords associated with human
- There are too many keywords! We only want EC numbers: retrieve the keywords descending from « EC_NUMBERS ». How many are there?
Exercise 13bis: Retrieve the EC_NUMBERS associated with human
Vocabulary: pk list, kd list (nk=)

28 The query language: complex queries
Use of files
You may use files containing:
- sequence names
- sequence accession numbers
- keywords
- species
Exercise 14: In Uniprot, retrieve the human EC numbers from the file created in exercise 13bis. Which mouse sequences are associated with these EC numbers?
Vocabulary: fk file, un list, ps list

29 The query language: scanning annotations
It is possible to scan the annotations. This is useful if the word to search for is not indexed and if the list of sequences to scan is not too large.

30 Data extraction: several formats
Exercise 15: Query HOGENOMDNA
- Select the yeast (saccharomyces) sequences
- Extract the chromosome sequences in FASTA format
- Extract the CDS sequences, translated into protein, in FASTA format

31 Data extraction: extracting parts of sequences
Exercise 16: In HOGENOMDNA
- Select the yeast (saccharomyces) sequences
- Extract the CDS sequences in FASTA format
- Extract the CDS sequences in EMBL format
- Extract the 5' non-coding sequences in FASTA format
- Extract the first 1000 residues of each chromosome in FASTA format
- Extract the 500 residues preceding each CDS in FASTA format

32 Using query
« query » is the command-line version of query_win. Its value lies in the possibility of using scripts. Scripting automates the processing, which is very useful in the following cases:
- long series of queries that are tedious to retype each time: fewer errors, time saved
- use in workflows
- generic scripts serving different purposes
- use on clusters and farms

33 Using query: launching
As with Query_Win, 2 versions are available:
- local version (installed on pbil, pbil-dev and the pbil-debX workers)
- client version (queries remote databases)
Both are available for Linux/Unix, MacOS and Windows.
Local version: query embl
>query embl
Client version: queryr embl
>raa_query
then choose the database, or directly:
>raa_query pbil.univ-lyon1.fr:5558/embl

34 Using query: instructions
« query » uses the same query language as query_win. However, there are small differences, especially in the management of lists. Do not hesitate to consult the help by typing HELP.
Exercise 17: Query HOGENOMDNA (complete genomes)
- Retrieve the sequences of Escherichia coli o157:h7 str. sakai
- Retrieve the sub-sequences of chromosome ESCOL3_3
- Retrieve the CDS of chromosome ESCOL3_3

35 Using query: instructions
Solution to exercise 17: Query HOGENOMDNA (complete genomes)
- Retrieve the sequences of Escherichia coli o157:h7 str. sakai
- Retrieve the sub-sequences of chromosome ESCOL3_3
- Retrieve the CDS of chromosome ESCOL3_3
- Save the CDS
Commands (left) and their meaning (right):
query hogenomdna
sel                                       select a list (default: list1)
sp=Escherichia coli O157:h7 str. sakai    selection criterion
mod                                       modify list
list1                                     list to be modified
5                                         type of modification
sel                                       select a new list (default: list3)
n=ESCOL3_3 et t=cds                       selection criterion
save                                      save list
list3                                     list to be saved
list_cds                                  file
stop                                      exit query

36 Using query: instructions

37 Using query: scripts
A script is used as follows, where banque is the name of the database:
query banque << EOF
instructions
EOF
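The here-document pattern can also be generated from a program, which helps when the same series of instructions must be replayed on several databases. The Python sketch below only builds the script text and does not run query. `build_script` is a hypothetical helper; the instructions are those of the solution to exercise 17, assuming each answer to a query prompt goes on its own line.

```python
def build_script(bank, instructions):
    """Return the text of a shell script that feeds `instructions` to query."""
    lines = [f"query {bank} << EOF"]
    lines.extend(instructions)
    lines.append("EOF")
    return "\n".join(lines) + "\n"


# One instruction per line, as they would be typed at the query prompt.
exercise_17 = [
    "sel",
    "sp=Escherichia coli O157:h7 str. sakai",
    "mod",
    "list1",
    "5",
    "sel",
    "n=ESCOL3_3 et t=cds",
    "save",
    "list3",
    "list_cds",
    "stop",
]

script = build_script("hogenomdna", exercise_17)
print(script, end="")
```

Writing `script` to a file (for instance `my_script.csh`, a made-up name) and running it with csh mirrors the way the course scripts are launched.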

38 Using query: scripts
Exercise 18
Run the preceding exercise with a script. In addition, extract the CDS in FASTA format.
source exemple_script_1.csh
or
csh exemple_script_1.csh
terminal no

39 Using query: scripts
Exercise 19
source exemple_script_2.csh
or
csh exemple_script_2.csh
sel/l=plant
Giving a name to the list makes the script easier to write and to understand.
This script selects the homologous gene families (HOGENOM families) shared by plants and cyanobacteria but not by animals. The Arabidopsis CDS present in these families are saved and extracted in FASTA format.

40 Using query: scripts
Using a script with arguments:
csh exemple_script_3_bis.csh viridiplantae cyanobacteria metazoa

41 Using seqinR
ACNUC databases can be queried from the R software, using the seqinR package.
Exercise 17ter with R: Query HOGENOMDNA (complete genomes)
- Retrieve the CDS of Escherichia coli o157:h7 str. sakai
- Plot the histogram of the CDS lengths

42 Using seqinR
Solution to exercise 17ter
install.packages("seqinr")
library("seqinr")
choosebank("hogenomdna")
cds <- query("cds", "sp=Escherichia coli o157:h7 str. sakai et t=cds")
lengths <- lapply(cds$req, getLength)
hist(unlist(lengths))

43 Build your own ACNUC database
Why?
1. To store and access sequences of interest
   - selection and modification of a subset of a general-purpose database
   - sequencing
2. To allow complex queries
3. To create your own keywords and the associated hierarchy
4. To automate queries
5. To share and distribute data

44 Build your own ACNUC database
How to select a local database:
- the index files are in /ma_banque/index
- the flat files are in /ma_banque/flat_files
Define the environment variables acnuc and gcgacnuc:
setenv mabase "/ma_banque/index /ma_banque/flat_files"
query mabase

45 Build your own ACNUC database
Build a database from annotated data
Script build_uniprot.csh:
- initf: creates the indexes
- acnucgener: indexes the sequences
Documentation: http://pbil.univ-lyon1.fr/databases/acnuc/acnuc_gestion.html
Exercise 20: build a database in SWISSPROT format

46 Build your own ACNUC database
Build a database from annotated data
Script build_embl.csh:
- initf: creates the indexes
- acnucgener: indexes the sequences
Documentation: http://pbil.univ-lyon1.fr/databases/acnuc/acnuc_gestion.html
Exercise 21: build a database in EMBL format

47 Build your own ACNUC database: defining new keywords (EMBL/GenBank only)
By default, many fields are used to define the keywords. However, supplementary fields can be specified to define additional keywords.
Example: search for the keyword HBG298754 in the previously created embl database. The keyword is not found, although the field /gene_family="HBG298754" exists (see sequence ECODH_1.PE2).
Exercise 22: Rebuild the database with build_embl_customized.csh, then query for the keyword again.

48 Build your own ACNUC database: defining new keywords (EMBL/GenBank only)
Use the file « custom_policy », which should be in the directory $acnuc (index).
File custom_policy:
Qualifier = GENE_FAMILY
Use_Value = True
Parent_Keyword = GENE_FAMILY
Qualifier = DB_XREF
Use_Value = True
Parent_Keyword = CROSS REFERENCES
Qualifier = PROTEIN_ID
Use_Value = True
Parent_Keyword = PROTEIN IDS
Qualifier = %(C+G)
Use_Value = True
Parent_Keyword = CG_CONTENTS
Qualifier = LOCUS_TAG
Use_Value = True
Parent_Keyword = LOCUS_TAG
ECODH_1.PE2 Location/Qualifiers (length=2463 bp)
FT   CDS             337..2799
FT                   /codon_start=1
FT                   /gene_family="HBG298754"
FT                   /evidence="4: Predicted"
FT                   /gene_id="IGI03726849"
FT                   /gene_name="thrA"
FT                   /locus_tag="ECDH10B_0002"
FT                   /product="Fused aspartokinase I and homoserine
FT                   dehydrogenase I"
FT                   /function="NADP or NADPH binding"
FT                   /function="amino acid binding"
FT                   /function="homoserine dehydrogenase activity"
FT                   /biological_process="aspartate family amino acid
FT                   biosynthetic process"
FT                   /protein_id="ACB01207.1"
FT                   /db_xref="GO:0004072"
FT                   /db_xref="InterPro:IPR001048"
FT                   /db_xref="UniProtKB/TrEMBL:B1XBC7"
FT                   /transl_table=11
FT                   /%(C+G)="CG<60%"
FT                   /note="C+G content in third codon positions = 57.6 % "
//

49 Build your own ACNUC database: enriching annotations and creating keywords
You may enrich the annotations with suitable keywords. For example, the following lines
FT /gene_family="HBG298754"
FT /%(C+G)="CG<60%"
FT /note="C+G content in third codon positions = 57.6 % "
have been added to allow querying the database by GC content or by gene family.
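As an illustration of how such lines could be produced, the Python sketch below computes the C+G content at third codon positions of a CDS and formats the three qualifiers. It is not the script used to prepare the course data. `gc3_percent` and `enrichment_lines` are hypothetical names, and the two-class binning is an assumption, since only the label "CG<60%" appears in the course material.

```python
FT_PREFIX = "FT" + " " * 19  # qualifiers start at column 22 of an EMBL feature table


def gc3_percent(cds):
    """Percentage of C or G among the third codon positions of a CDS."""
    thirds = cds.upper()[2::3]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    gc = sum(base in "CG" for base in thirds)
    return 100.0 * gc / len(thirds)


def enrichment_lines(cds, gene_family):
    """Build the extra feature-table lines for one CDS."""
    gc3 = gc3_percent(cds)
    # Assumed binning: below 60 % or not. Other classes would need other labels.
    label = "CG<60%" if gc3 < 60.0 else "CG>=60%"
    return [
        f'{FT_PREFIX}/gene_family="{gene_family}"',
        f'{FT_PREFIX}/%(C+G)="{label}"',
        f'{FT_PREFIX}/note="C+G content in third codon positions = {gc3:.1f} % "',
    ]


# Made-up 4-codon sequence: third positions are G, A, T, C.
for line in enrichment_lines("ATGAAAATTAGC", "HBG298754"):
    print(line)
```

Using a class label rather than the raw percentage as the qualifier value keeps the number of distinct keywords small, which is what makes the keyword usable in a query.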

50 Build your own ACNUC database: enriching annotations and creating keywords
Going further: modify the annotations and create an associated custom_qualifier_policy file.
Exercise 23: Modify custom_policy to generate different keywords
2 examples:
- custom_qualifier_policy.hogenom
- custom_qualifier_policy.tp

51 Build your own ACNUC database: building a database from raw sequence data (FASTA)
ACNUC databases are built from files in SwissProt, EMBL or GenBank format. A FASTA file must therefore be converted into one of these formats before the database can be built.
- Uniprot: BioPerl script
  gener_prot.pl Chlre4_best_proteins_small.fasta Chlre4_best_proteins.dat CHLRE
- EMBL/GenBank: readseq http://www.ebi.ac.uk/cgi-bin/readseq.cgi

52 Build your own ACNUC database: building a database from raw sequence data (FASTA)
Exercise 24: Convert the file ecoli_dna.fasta to EMBL format and build an ACNUC database.

53 Build your own ACNUC database: managing sequences from samples
You may want to run queries such as:
- Retrieve all the sequences sent for sequencing on 15/02/2010
- Retrieve all the sequences sent for sequencing on 15/02/2010 and associated with the « toto » experiment
- Retrieve all the sequences associated with the « toto » experiment and the « tata » species

54 Build your own ACNUC database: steps
Step 1: Clean and annotate the sequences
Step 2: Convert the FASTA file into an EMBL file (readseq)
Step 3: Add keywords such as:
- date obtained
- experiment name
- etc.
Step 4: Build the database
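Steps 2 and 3 can be sketched in a few lines of Python to show what the conversion involves. This is a minimal illustration, not a replacement for readseq. The ID line values (SV 1, linear, genomic DNA, STD, UNC), the reuse of the sequence name as accession number and the keyword spellings are placeholder assumptions, and a real entry normally carries more lines (DE, FT, etc.). Whether such a minimal entry is accepted when building the database has not been checked here.

```python
def read_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def embl_entry(name, seq, keywords):
    """Format one nucleotide sequence as a minimal EMBL-style entry with a KW line."""
    seq = seq.lower()
    counts = {base: seq.count(base) for base in "acgt"}
    other = len(seq) - sum(counts.values())
    lines = [
        # Placeholder choices for version, topology, molecule, class and division.
        f"ID   {name}; SV 1; linear; genomic DNA; STD; UNC; {len(seq)} BP.",
        "XX",
        f"AC   {name};",
        "XX",
        "KW   " + "; ".join(keywords) + ".",
        "XX",
        f"SQ   Sequence {len(seq)} BP; {counts['a']} A; {counts['c']} C; "
        f"{counts['g']} G; {counts['t']} T; {other} other;",
    ]
    # Sequence lines: 60 residues in blocks of 10, running count ending at column 80.
    for start in range(0, len(seq), 60):
        row = seq[start:start + 60]
        blocks = " ".join(row[i:i + 10] for i in range(0, len(row), 10))
        lines.append("     " + blocks.ljust(65) + str(start + len(row)).rjust(10))
    lines.append("//")
    return "\n".join(lines) + "\n"


fasta = ">sample1 first read\nACGTACGTAC\nGGTTAACC\n"
keywords = ["SEQUENCED_2010-02-15", "EXPERIMENT_TOTO"]  # hypothetical keywords
for name, seq in read_fasta(fasta):
    print(embl_entry(name, seq, keywords), end="")
```

Once the date and the experiment name are keywords, queries such as "all the sequences of the toto experiment" reduce to ordinary keyword criteria.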

55 Using the API: C/C++ API
Documentation:
- General structure: http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html
- C API (local version): http://pbil.univ-lyon1.fr/databases/acnuc/structure.html
- C API (client version, access via sockets): http://pbil.univ-lyon1.fr/databases/acnuc/raa_acnuc.html
- C++ API (client version, access via sockets, Bio++): http://pbil.univ-lyon1.fr/databases/acnuc/bpp-raa/bpp-raa.html
Examples for the C API, local version:
- http://pbil.univ-lyon1.fr/databases/acnuc/example.php
- http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html

56 Using the API: C API, local version
Exercise 25: test exemple1.c
/*
gcc -c exemple1.c -I /bge/banques/csrc
gcc -o exemple1 exemple1.o -L /bge/banques/csrc -lcacnucdeb
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "dir_acnuc.h"

int main(int argc, char *argv[])
{
    /* e.g. "Bovidae"; case is ignored */
    char my_taxon[500];
    int num, err, *list, numsp;
    int i = 2;

    if (argc == 1) {
        fprintf(stderr, "Usage: exemple1 taxon_name\n");
        exit(1);
    }
    /* rebuild the taxon name from all arguments, without overflowing the buffer */
    strncpy(my_taxon, argv[1], sizeof(my_taxon) - 1);
    my_taxon[sizeof(my_taxon) - 1] = '\0';
    while (argc > i) {
        strncat(my_taxon, " ", sizeof(my_taxon) - strlen(my_taxon) - 1);
        strncat(my_taxon, argv[i], sizeof(my_taxon) - strlen(my_taxon) - 1);
        i++;
    }
    acnucopen();
    list = (int *)calloc(lenw, sizeof(int));
    err = shkseq(my_taxon, list, 1);
    if (err == 2) {
        printf("Taxon %s does not exist in the current database.\n", my_taxon);
        exit(1);
    }
    num = 1;
    while ((num = irbit(list, num, nseq)) != 0) {
        /* here num is the rank of a seq attached to taxon my_taxon */
        readsub(num);
        printf("%s\t%s\n", my_taxon, psub->name);
    }
    free(list);
    return 0;
}
To select a local database: « choix embl » or « choixbanque »; otherwise:
setenv acnuc /acnucdb/embl/index
setenv gcgacnuc /acnucdb/embl/flat_files

57 Using the API: C API, local version
Exercise 26: http://pbil.univ-lyon1.fr/databases/acnuc/ex_requete.php

58 Using the API: C API, client version
Exercise 27: http://pbil.univ-lyon1.fr/databases/acnuc/raa_acnuc.html#example
C API, client version (access via sockets): http://pbil.univ-lyon1.fr/databases/acnuc/raa_acnuc.html

59 Using the API: Python API
Documentation: http://pbil.univ-lyon1.fr/cgi-bin/raapythonhelp.csh

60 Going further
Install an ACNUC server
Questions?
And don't forget: www.mangerbouger.fr

