Applying AI to Human Genome Part 1 : Collecting data Prof. M. Embrechts Robert Bress Bram Heyns.


1 Applying AI to Human Genome. Part 1: Collecting data. Prof. M. Embrechts, Robert Bress, Bram Heyns

2 Overview: Basics of DNA; Collecting the data; Collection: my application; Perl; Goal

3 Basics of DNA. DNA = polymer of 4 molecules (bases or nucleotides): A = Adenine, C = Cytosine, G = Guanine, T = Thymine. Replication (copying) and translation (reading). Double helix: A pairs with T, G pairs with C (copying). A 3-letter combination = codon. RNA: U = Uracil in place of T => transcribing. Protein = polymer composed of 20 amino acids (reading) => more complex structure than DNA.
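The pairing rule and the codon definition on this slide can be illustrated with a few lines of code. This is a minimal sketch in Python (not the Perl used later in the talk); the function names `complement` and `codons` and the sample strands are made up for illustration.

```python
# Base pairing from the slide: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())

def codons(strand):
    """Split a strand into consecutive 3-letter codons, dropping an incomplete tail."""
    usable = len(strand) - len(strand) % 3
    return [strand[i:i + 3] for i in range(0, usable, 3)]

print(complement("ATGC"))     # TACG
print(codons("ATGGCCTAAG"))   # ['ATG', 'GCC', 'TAA']
```

The dictionary lookup raises a `KeyError` on any character outside A, C, G, T, which is a cheap way to notice dirty input early.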

4 Transition: DNA -> RNA -> Protein
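The DNA to RNA to protein chain can be sketched as two small steps: transcription (T becomes U) and translation (each codon maps to an amino acid). The Python below is illustrative only; the codon table is deliberately partial (the full genetic code has 64 codons), and the names `transcribe`, `translate`, and `CODON_TABLE` are assumptions of this sketch.

```python
# Partial codon table in the RNA alphabet; only enough entries for the demo.
CODON_TABLE = {
    "AUG": "Met",   # also the usual start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "UAA": "STOP",
    "UAG": "STOP",
    "UGA": "STOP",
}

def transcribe(dna):
    """Coding-strand DNA to mRNA: every T is replaced by U."""
    return dna.upper().replace("T", "U")

def translate(rna):
    """Read codon by codon until a stop codon; unknown codons become '???'."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino = CODON_TABLE.get(rna[i:i + 3], "???")
        if amino == "STOP":
            break
        protein.append(amino)
    return protein

rna = transcribe("ATGTTTGGCTAA")
print(rna)             # AUGUUUGGCUAA
print(translate(rna))  # ['Met', 'Phe', 'Gly']
```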

5 Intron - Exon - Splice junction. Exon: 200 characters; intron: thousands. 30,000 genes identified out of a possible 100,000. Identification of a gene => patent.
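To make the exon, intron, and splice-junction vocabulary concrete, here is a toy sketch in Python. It assumes the exon coordinates are already known, which is exactly the part the talk wants to learn automatically; the sequence, the coordinates, and the helper names `splice` and `junctions` are invented for illustration.

```python
def splice(gene, exons):
    """Join the exon slices; everything between them is intron and is dropped.

    exons is a sorted list of half-open (start, end) index pairs.
    """
    return "".join(gene[start:end] for start, end in exons)

def junctions(exons):
    """Return (exon_end, next_exon_start) pairs, i.e. the borders of each intron."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]

# Toy gene: exons in upper case, one intron in lower case.
exon1, intron, exon2 = "ATGGCC", "gtaagtttcag", "TTTTAA"
gene = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(gene))]

print(splice(gene, exons))  # ATGGCCTTTTAA
print(junctions(exons))     # [(6, 17)]
```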

9 Summary. Human: 23 chromosome pairs. Chromosome: thousands of genes. Gene: info = exons, comments = introns. Exons and introns: made of codons. Codon: 3 bases.

10 Data collection. Human Genome Project. NCBI website: http://www.ncbi.nlm.nih.gov. Saved pages: Entrez-Nucleotide.htm, NCBI Sequence Viewer.htm

13 Data collection: my application. BioBrowser: Download HTML -> ExtractLinks() -> Download HTML data -> ExtractData() -> TranslateData()
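The slide only names the stages, so the following is a guess at their shape, written in Python rather than the Perl of the actual BioBrowser. Everything here is an assumption: the canned `PAGES` dictionary stands in for real downloads, the page layout is invented, and `translate_data` is assumed to mean "reduce the raw listing to a clean string of bases".

```python
import re

# Hypothetical stand-in for the download step: canned pages, no network access.
PAGES = {
    "index.htm": '<a href="seq1.htm">Gene 1</a> <A HREF="seq2.htm">Gene 2</A>',
    "seq1.htm": "<html><pre>\n 1 atggcc ttttaa\n 13 gg\n</pre></html>",
    "seq2.htm": "<html><PRE>\n 1 ccgg\n</PRE></html>",
}

def download(url):
    return PAGES[url]

def extract_links(html):
    """Collect the target of every href attribute on the page."""
    return re.findall(r'href="([^"]+)"', html, flags=re.IGNORECASE)

def extract_data(html):
    """Return the text of the first <pre> block, or '' if there is none."""
    match = re.search(r"<pre>(.*?)</pre>", html, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else ""

def translate_data(raw):
    """Drop position numbers and whitespace, keep only the bases, upper-cased."""
    return re.sub(r"[^acgt]", "", raw.lower()).upper()

def collect(start_url):
    """Run the whole chain: index page -> links -> data pages -> sequences."""
    return {link: translate_data(extract_data(download(link)))
            for link in extract_links(download(start_url))}

print(collect("index.htm"))
# {'seq1.htm': 'ATGGCCTTTTAAGG', 'seq2.htm': 'CCGG'}
```

Keeping `download` separate from the three parsing functions means the parsing can be tested offline on saved pages such as the .htm files listed earlier.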

16 Perl: Practical Extraction and Report Language. POD files -> web. Portability. Free, with CPAN modules. String manipulation. Extremely powerful regex engine. Glue language designed for short and simple tasks, which does not mean a lack of power or "serious" features. Tutorial: http://www.netcat.co.uk/rob/perl/win32perltut.html

17 Regular Expression - Pattern Matching. Practical Extraction and Report Language. Scan through data and extract useful information. Match: m/PATTERN/. Substitute: s/PATTERN/REPLACEMENT/. 1 line Perl = 100 lines C or Java. Complex, but easy.
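For readers without Perl at hand, the two operators on this slide have close counterparts in Python's `re` module: `re.search` plays the role of m/PATTERN/ and `re.sub` the role of s/PATTERN/REPLACEMENT/g. The sketch below is only an illustration; the record line is made up and does not come from a real database entry.

```python
import re

# m/PATTERN/ : scan a line and pull out one piece of information.
line = "LOCUS EXAMPLE 1200 bp DNA"        # invented record line
match = re.search(r"(\d+) bp", line)
length = int(match.group(1))               # Perl would expose this as $1
print(length)                              # 1200

# s/PATTERN/REPLACEMENT/g : rewrite every occurrence of a pattern.
cleaned = re.sub(r"\s+", " ", "atg  gcc\tttt")
print(cleaned)                             # atg gcc ttt
```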

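18 Regex examples: /[KCZ]arl^sa/. / /(.*?) /i with captures in $1, $2, … Modifiers: i, g, c, … Metacharacters: ., *, +, ? /([0-9a-zA-Z])+/ or /([\w])+/ (\w also matches the underscore). s/us[^a-z]/them/g or s/us\W/them/g. /((?:acc|act)(?:ttt|ttc|att))/ (alternatives need grouping parentheses; square brackets would define a character class). TIMTOWTDI (There Is More Than One Way To Do It).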
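Three of these ideas are easy to check by running them. The sketch below uses Python's `re` module, whose syntax matches Perl's for these cases; the test strings and the `<b>` tags are invented for illustration. It shows that square brackets define a character class rather than a list of alternatives, that `(.*?)` with a case-insensitive flag captures the shortest match, and that `us\W` also consumes the character after "us".

```python
import re

# Square brackets: one character from the set {a, c, |, t}, then one more from a similar set.
bracket = re.compile(r"([acc|act][ttt|ttc|att])")
# Grouped alternation: one of the codons acc/act followed by one of ttt/ttc/att.
grouped = re.compile(r"((?:acc|act)(?:ttt|ttc|att))")

print(bracket.search("ct").group(1))          # ct  (any two characters from the sets)
print(grouped.search("ct"))                   # None
print(grouped.search("ggaccttcaa").group(1))  # accttc

# Non-greedy capture with the i flag; the tags are an assumption of this example.
first = re.search(r"<b>(.*?)</b>", "<B>first</B> and <b>second</b>", re.IGNORECASE)
print(first.group(1))                         # first

# The non-word character after "us" is part of the match, so it is replaced too.
print(re.sub(r"us\W", "them", "Tell us, tell us."))  # Tell them tell them
```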

19 Part 2: Applying AI. Our choice: evolutionary computing. First part: identify exons. Second part: identify splice junctions. Third part: combine the previous parts. Hope to reach over 90% accuracy.

20 Questions?

