Heterogeneous Association Rules Mining. Badr Al-Daihani, School of Computer Science, Cardiff University. Edinburgh, UK, BNCOD21.


1 Heterogeneous Association Rules Mining. Badr Al-Daihani, School of Computer Science, Cardiff University. Edinburgh, UK, BNCOD21.

2 Overview: Motivation; Challenges of Bioinformatics Databases Management; Approaches to integration of bioinformatics databases; Association rule mining; Hypothesis; Basic concepts; Material and methods.

3 Motivation: Very large heterogeneous databases. Need to link them. Integration. Complex relations.

4 Challenges of Bioinformatics Databases Management. Bioinformatics database formats: flat files (GenBank, EMBL, DDBJ, PDB); relational databases (HGMD, MGMD); object-oriented databases (AceDB); XML databases (PIR, SwissProt, InterPro). Characteristics: the diversity of the data; representational heterogeneity; autonomous, web-based sources; varied interfaces and query capabilities.

5 Approaches to integration of bioinformatics databases. Multiple models of data integration: federation, warehousing, mediation.

6 Federation: provides access to distributed data while preserving database autonomy. Examples: K2/BioKleisli, Entrez.

7 Warehousing: imports data from remote sources and copies it to a local server. Examples: GUS (Genome Unified Schema), Sequence Retrieval System (SRS).

8 Mediation: stores no data of its own; rather, it provides a virtual view of the integrated sources. Examples: Transparent Access to Multiple Bioinformatics Information Sources (TAMBIS), Knowledge-based Integration of Neuroscience Data (KIND).
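The mediation idea (no stored data, only a virtual view over wrapped sources) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the classes `DictSourceWrapper` and `Mediator`, the source names, and the toy rows are assumptions for illustration and are not taken from TAMBIS, KIND, or any real system. Real mediators also rewrite queries against a shared schema or ontology, which this sketch omits.

```python
class DictSourceWrapper:
    """Hypothetical wrapper: exposes one source (an in-memory list of dicts)
    through a common query method."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = rows

    def query(self, **criteria):
        # Return rows matching every criterion, tagged with their source name.
        return [
            dict(row, source=self.name)
            for row in self._rows
            if all(row.get(key) == value for key, value in criteria.items())
        ]


class Mediator:
    """Hypothetical mediator: holds no data of its own and forwards each
    query to every wrapped source, merging the answers into one view."""

    def __init__(self, wrappers):
        self._wrappers = list(wrappers)

    def query(self, **criteria):
        results = []
        for wrapper in self._wrappers:
            results.extend(wrapper.query(**criteria))
        return results


# Toy, made-up rows; real sources have very different schemas.
source_one = DictSourceWrapper(
    "source_one", [{"gene": "GENE_A", "disease": "disease_1"}]
)
source_two = DictSourceWrapper(
    "source_two",
    [
        {"gene": "GENE_A", "mutagen": "mutagen_x"},
        {"gene": "GENE_B", "mutagen": "mutagen_y"},
    ],
)

mediator = Mediator([source_one, source_two])
gene_a_view = mediator.query(gene="GENE_A")
```

Here `gene_a_view` is assembled at query time from both sources, while the mediator itself keeps only references to the wrappers.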

9 Hypothesis: It is possible to mine diverse databases to recover datasets related to a disease, its associated gene mutations and mutagens, which aid scientists' understanding of their cause.

10 Association Rules. Association rules capture interesting association relationships among huge numbers of transactions. An association rule is an expression of the form X => Y, where X and Y are sets of items. Goal: to find all association rules that satisfy user-specified minimum support and minimum confidence thresholds. Examples: – Rule form: "Body => Head [support, confidence]". – buys(x, "diapers") => buys(x, "beers") [0.5%, 60%] – major(x, "CS") ^ takes(x, "DB") => grade(x, "A") [1%, 75%]
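As a minimal sketch of the two measures, the Python below uses the standard definitions: the support of X => Y is the fraction of transactions containing every item of X and Y, and the confidence is that support divided by the support of X alone. The function names and the four toy transactions are assumptions made up for illustration; they are not data from the presentation.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def confidence(transactions, body, head):
    """Support of (body and head together) divided by support of body."""
    body = frozenset(body)
    head = frozenset(head)
    body_support = support(transactions, body)
    if body_support == 0:
        return 0.0  # The rule never applies, so report zero confidence.
    return support(transactions, body | head) / body_support


# Toy transactions, invented for this example.
transactions = [
    frozenset({"diapers", "beer", "milk"}),
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "bread"}),
    frozenset({"milk", "bread"}),
]

# Rule: {diapers} => {beer}
rule_support = support(transactions, {"diapers", "beer"})
rule_confidence = confidence(transactions, {"diapers"}, {"beer"})
```

In this toy data the rule holds in 2 of 4 transactions (support 0.5), and 2 of the 3 transactions containing diapers also contain beer (confidence about 0.67). A miner would keep the rule only if both values meet the user-specified thresholds.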

11 Association Rules. Applications: basket data analysis; genomic data; cross-marketing; catalog design; sales campaign analysis; Web personalization; clustering, classification, etc.

12 Basic Concepts. The discovery of interesting association relationships among huge amounts of gene mutation data can help in determining the cause of mutations in tumours and diseases. A gene is a segment of a DNA molecule that contains all the information required for synthesis of a product. A gene mutation is any change in the DNA sequence of a gene. Types of mutations: Insertion, Deletion, Insertion/Deletion, Complex, and Multiple Substitution.

13 Material and Methods: HGMD database. The Human Gene Mutation Database (HGMD) is run by the University of Wales College of Medicine. It records known (published) gene lesions responsible for human inherited disease and provides information about practical diagnoses.

14 Material and Methods: MGMD database. The Mammalian Gene Mutation Database (MGMD) is run by the Centre of Molecular Genetics and Toxicology, University of Wales Swansea. It holds profiles of known (published) mutagen-induced gene mutations and stores mutation spectra information. It has 39134 records.

15 Material and Methods. Sets of items whose elements tend to occur in both databases will be retrieved to discover interesting association rules among genes, mutations, mutagens and diseases.
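One way to picture this step is the Python sketch below. It is not the presenter's method. It assumes, purely for illustration, that records from the two databases can be linked on a shared gene identifier, that each linked record becomes one transaction, and that only single-item rules are mined. The record layouts, gene names, mutagens and diseases are invented placeholders; the real HGMD and MGMD schemas and contents differ.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical toy records, not real HGMD/MGMD data.
disease_records = [  # (gene, disease)
    ("GENE_A", "disease_1"),
    ("GENE_B", "disease_2"),
]
mutagen_records = [  # (gene, mutagen, mutation_type)
    ("GENE_A", "mutagen_x", "deletion"),
    ("GENE_A", "mutagen_x", "deletion"),
    ("GENE_A", "mutagen_y", "insertion"),
    ("GENE_B", "mutagen_x", "deletion"),
    ("GENE_C", "mutagen_y", "insertion"),  # GENE_C has no disease record.
]


def build_transactions(disease_records, mutagen_records):
    """Keep only genes present in both sources; one transaction per link."""
    diseases_by_gene = defaultdict(set)
    for gene, disease in disease_records:
        diseases_by_gene[gene].add(disease)
    transactions = []
    for gene, mutagen, mutation_type in mutagen_records:
        for disease in diseases_by_gene.get(gene, ()):
            transactions.append(frozenset({
                f"gene={gene}",
                f"mutagen={mutagen}",
                f"mutation={mutation_type}",
                f"disease={disease}",
            }))
    return transactions


def mine_pair_rules(transactions, min_support, min_confidence):
    """Brute-force search for single-item rules body => head."""
    total = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for body, head in permutations(items, 2):
        body_count = sum(1 for t in transactions if body in t)
        both_count = sum(1 for t in transactions if body in t and head in t)
        rule_support = both_count / total
        rule_confidence = both_count / body_count
        if rule_support >= min_support and rule_confidence >= min_confidence:
            rules.append((body, head, rule_support, rule_confidence))
    return rules


transactions = build_transactions(disease_records, mutagen_records)
rules = mine_pair_rules(transactions, min_support=0.5, min_confidence=0.9)
```

The brute-force pair search is only practical for toy data. On databases of realistic size, an algorithm such as Apriori would be the usual choice for finding frequent itemsets.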

16 Material and Methods. [Diagram] Components: databases (HGMD, MGMD, ..., DBn); Wrapper; Query interpreter; Mining tools; Graphical User Interface (GUI).

17 References
[1] Hernandez T. and Kambhampati S. (2004) Integration of Biological Sources: Current Systems and Challenges Ahead. Proc. of the ACM SIGMOD Conference.
[2] Goble C. et al. (2001) Transparent access to multiple bioinformatics information sources. IBM Systems Journal, 40(2).
[3] Eckman B., Lacroix Z. and Raschid L. (2001) Optimized Seamless Integration of Biomolecular Data. IEEE International Conference on Bioinformatics and Biomedical Engineering, 23-32.
[4] Lacroix Z., Boucelma O. and Essid M. (2003) The Biological Integration System. Proc. of the 5th ACM Workshop on Web Information and Data Management, pp 45-49.
[5] Aldana J., Roldán M., Navas I., Pérez A. and Trelles O. (2004) Integrating Biological Data Sources and Data Analysis Tools through Mediators. Proceedings of the 2004 ACM Symposium on Applied Computing.
[6] Agrawal R., Imielinski T. and Swami A. (1993) Mining Association Rules Between Sets of Items in Large Databases. Proc. ACM SIGMOD: 207-216.
[7] Lewis P.D., Harvey J.S., Waters E.M. and Parry J.M. (2000) The Mammalian Gene Mutation Database. Mutagenesis, 15(5): 411-414.
[8] Krawczak M., Ball E.V., Fenton I., Stenson P.D., Abeysinghe S., Thomas N. and Cooper D.N. (2000) Human Gene Mutation Database - a biomedical information and research resource. Human Mutation, 15(1): 45-51.
