Antal Péter, Computational Biomedicine (Combine) workgroup, Department of Measurement and Information Systems, Budapest University of Technology and Economics

Presentation transcript:

Antal Péter, Computational Biomedicine (Combine) workgroup, Department of Measurement and Information Systems, Budapest University of Technology and Economics, Big Data Day 2013

• The "common" (everyday) "big data"
• Definition(s)
• The biomedical "big data"
• Biomedical "big data" trends and conclusions
• Data and knowledge fusion
  ◦ From correlation to causation -> lessons for DBD
  ◦ Kernel methods -> lessons for DBD
  ◦ Semantic X -> lessons for DBD
• Synergies of biomedical and common big data

• Financial transaction data, mobile phone data, user (click) data, data, internet search data, social network data, sensor networks, ambient assisted living, intelligent home, wearable electronics, ...
Factors: gadgets, Internet, Moore’s law
"The line between the virtual world of computing and our physical, organic world is blurring." (E. Dumbill: Making sense of big data, Big Data, vol. 1, no. 1, 2013)

M. Cox and D. Ellsworth, "Managing Big Data for Scientific Visualization," Proc. ACM Siggraph, ACM, 1997
The 3xV: volume, variety, and velocity (2001).
The 8xV: Vast, Volumes of Vigorously, Verified, Vexingly Variable Verbose yet Valuable Visualized high Velocity Data (2013)
Not "conventional" data: "Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it." (E. Dumbill: Making sense of big data, Big Data, vol. 1, no. 1, 2013)

Figure labels: technology trigger; "boom", "hype"; lack of results, problems; criticism, re-evaluation; application. Present?
"Data over algorithms": "correlations from more & messy data" vs. "causation from small, clean data"

... [data] is often big in relation to the phenomenon that we are trying to record and understand. So, if we are only looking at 64,000 data points, but that represents the totality or the universe of observations. That is what qualifies as big data. You do not have to have a hypothesis in advance before you collect your data. You have collected all there is, all the data there is about a phenomenon.
OMIC DATA

Sequencing --- Genotyping (figure axes: samples, variables)
• Partial genome screening (x100 SNP)
• Candidate gene assoc.
• Effect validation
• Genome-wide assoc., <2.5 million SNPs, x1000 samples
• Full genome sequencing
• Exome sequencing (~30 Mb, 1% data, 10% cost)

• Genome(s), epigenome, microbiome
• Phenome (disease, side effect)
• Transcriptome
• Proteome
• Metabolome
• Environment & life style
• Drugs
2010<: "Clinical phenotypic assay"/drugome: open clinical trials, adverse drug reaction DBs, adaptive licensing, large-scale cohort studies (~100,000 samples)
Moore’s law, Carlson’s law

M. Swan: The Quantified Self: Fundamental Disruption in Big Data Science and Biological Discovery, Big Data, vol. 1, no. 2

• Better outcome variable
  ◦ "Lost in diagnosis": phenome
• Better and more complete set of predictor variables
  ◦ "Right under everyone’s noses": rare variants (RVs)
  ◦ "The great beyond": epigenetics, environment
• Better statistical models
  ◦ "In the architecture": structural variations
  ◦ "Out of sight": many, small effects
  ◦ "In underground networks": epistatic interactions
• Causation (confounding)
• Statistical significance ("multiple testing problem")
• Complex models: interactions, epistasis
• Interpretation
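
The "multiple testing problem" named above can be illustrated with a small sketch. It uses the Bonferroni correction, which is chosen here only as the simplest textbook remedy and is not stated in the slides. The function names and the five p-values are hypothetical; the 2.5 million figure reuses the upper bound on SNPs quoted for genome-wide association studies.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Return the indices of the p-values that survive Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical p-values for five SNPs (made-up numbers, illustration only).
p_values = [0.001, 0.04, 0.2, 0.009, 0.5]
hits = significant_after_bonferroni(p_values)
print(hits)

# Threshold when every one of 2.5 million SNPs is tested at alpha = 0.05.
gwas_threshold = bonferroni_threshold(0.05, 2_500_000)
print(gwas_threshold)
```

With millions of variables the per-test threshold becomes very strict, which is one reason why "many, small effects" stay out of sight.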

• Missing the mark, Nature, 2007
• Missing the mark: Why is it so hard to find a test to predict cancer?, Nature, 2011

The hypothesis-free pitfall in omics
• Hypothesis-free measurement
• Hypothesis-free data analysis
• Interpretational/translational bottleneck
GEO, Ingenuity, HAPMAP, GO, STRING, ...

• Why:
  ◦ visualization,
  ◦ retrieval,
  ◦ extraction/reduction,
  ◦ inference (deductive, inductive, predictive/transductive)
• What: raw data vs. similarities vs. (free-text...logical) statements
• How:
  ◦ Raw data: * (probabilistic graphical models)
  ◦ Similarities: * (kernel methods)
  ◦ Statements: * (probabilistic logic)
• Technology: -

Systems-based, Bayesian biomarker discovery
Bayes’ rule: the a priori distribution (prior), p(Model),
• is a technical tool for inversion (to achieve a direct probabilistic statement),
• provides a principled solution to incorporate prior knowledge.
Open problem: derivation and formalization of informative (domain/data specific) priors.
Figure labels: heterogeneous entities (phenome, SNP, mRNA, miRNA, lipids, mABs, PCR, MS, CGH) vs. samples
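
For reference, the textbook form of Bayes’ rule behind the "inversion" remark is written out below. The symbol Data is an assumption introduced here for the observations; only p(Model) appears in the slide text.

```latex
p(\mathrm{Model} \mid \mathrm{Data})
  = \frac{p(\mathrm{Data} \mid \mathrm{Model})\, p(\mathrm{Model})}{p(\mathrm{Data})},
\qquad
p(\mathrm{Data}) = \sum_{\mathrm{Model}'} p(\mathrm{Data} \mid \mathrm{Model}')\, p(\mathrm{Model}')
```

The likelihood p(Data | Model) is a statement about the data given a model; multiplying by the prior and normalizing inverts it into a direct probabilistic statement about the model given the data.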

BayesEye: relevance map/tree/interactions

Methodological papers & applications:
• P. Antal, A. Millinghoffer, G. Hullam, C. Szalai, and A. Falus, "A Bayesian view of challenges in feature selection: Feature aggregation, multiple targets, redundancy and interaction", JMLR proceedings
• Hullam G, Juhasz G, Bagdy G, Antal P.: Beyond Structural Equation Modeling: model properties and effect size from a Bayesian viewpoint. An example of complex Neuropsychopharmacol Hung
• O. Lautner-Csorba, A. Gezsi, A. F. Semsei, P. Antal, D. J. Erdelyi, G. Schermann, et al., "Candidate gene association study in pediatric acute lymphoblastic leukemia evaluated…..…," BMC Med Genomics
• I. Ungvari, G. Hullam, P. Antal, P. Kiszel, A. Gezsi, E. Hadadi, et al., "Evaluation of a Partial Genome Screening of Two Asthma Susceptibility Regions Using Bayesian Network Based…..….", Plos One
• G. Varga, A. Szekely, P. Antal, P. Sarkozy, Z. Nemoda, Z. Demetrovics, et al., "Additive Effects of Serotonergic and Dopaminergic………," American Journal of Medical Genetics Part B, 2012
• O. Lautner-Csorba, A. Gézsi, D. Erdélyi, G. Hullám, P. Antal, Á. Semsei, et al., "Roles of genetic polymorphisms in the folate pathway in childhood acute……………….," Plos One
• A. Vereczkei, Z. Demetrovics, A. Szekely, P. Sarkozy, P. Antal, A. Szilagyi, M. Sasvári-Székely, "Multivariate Analysis of Dopaminergic Gene Variants as Risk Factors of Heroin ," Plos One, 2013
Chapters:
• G. Hullám, A. Gézsi, A. Millinghoffer, P. Sárközy, B. Bolgár, Z. Pál, E. Buzás, P. Antal: Bayesian, systems-based genetic association analysis with effect strength estimation and omic wide interpretation, in press
• P. Antal, A. Millinghoffer, G. Hullám, G. Hajós, Csaba Szalai, András Falus: Bayesian, systems-based, multilevel analysis of associations for complex phenotypes: from interpretation to decisions, in press

Biomarker discovery II.
Tools:
• The biomarker computational service is available at
• The biomarker visualization tool, BayesEye, is available at
• The gene prioritizer tool applied after BMLA can be accessed at

?

Similarity-based fusion in drug repositioning
Information sources: Chemical, Side-effects, Target prot., MMoA, Pathways
Similarity measures: Tanimoto, Preval., BLASTp, Ontology, Txtmining
Query: set of corresponding drugs
Query-based optimal fusion
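
Of the measures listed, Tanimoto similarity is simple enough to sketch in a few lines. The version below works on sets of "on" fingerprint bits; the two fingerprints are hypothetical, and the slides do not say which fingerprint representation was actually used.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets: |A & B| / |A | B|."""
    union = a | b
    if not union:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical fingerprints: each drug is the set of substructure bits switched on.
drug_a = {1, 4, 7, 9}
drug_b = {1, 4, 8, 9, 12}

# 3 shared bits out of 6 distinct bits.
similarity = tanimoto(drug_a, drug_b)
print(similarity)  # 0.5
```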

• The information content of
  ◦ the query,
  ◦ the information resources,
  ◦ and the unknown observations(!)
• allows a one-class analysis of the query (data description),
• and this induction is used in prioritization.
JOINTLY OPTIMIZED:
1. weighting the members in the query (e.g. detection of outliers in the query) -> GETTING THE RIGHT/IMPROVED QUERY
2. weighting the similarity measures (e.g. information resources) -> GETTING THE SCORING (SIMILARITY) FOR THE RIGHT/IMPROVED QUERY
3. scoring/ranking the aggregate similarity of the unknown data points to the query.
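
The scoring in step 3 can be sketched as a weighted average, assuming the weights from steps 1 and 2 are already known. This is only an illustration: in the method described above the weights are optimized jointly, which this sketch does not attempt. All names, weights and similarity values are hypothetical.

```python
def fused_ranking(similarities, source_weights, member_weights):
    """Rank candidates by weighted average similarity to the query.

    similarities[source][candidate][member] is a similarity in [0, 1].
    """
    total_source = sum(source_weights.values())
    total_member = sum(member_weights.values())
    scores = {}
    for source, per_candidate in similarities.items():
        for candidate, per_member in per_candidate.items():
            # Step 1 weights: weighted mean similarity to the query members.
            to_query = sum(
                member_weights[m] * per_member[m] for m in member_weights
            ) / total_member
            # Step 2 weights: contribution of this similarity measure.
            share = source_weights[source] / total_source
            scores[candidate] = scores.get(candidate, 0.0) + share * to_query
    # Step 3: rank candidates by aggregate similarity, best first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical similarities of two candidate drugs to a two-drug query.
similarities = {
    "chemical": {"cand1": {"q1": 0.8, "q2": 0.6}, "cand2": {"q1": 0.2, "q2": 0.4}},
    "side_effect": {"cand1": {"q1": 0.5, "q2": 0.5}, "cand2": {"q1": 0.9, "q2": 0.1}},
}
source_weights = {"chemical": 0.75, "side_effect": 0.25}
member_weights = {"q1": 1.0, "q2": 1.0}

ranking = fused_ranking(similarities, source_weights, member_weights)
print(ranking)
```

A member weight near zero plays the role of flagging that drug as an outlier in the query, and a source weight near zero switches that information resource off.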

• Ádám Arany, Bence Bolgár, Balázs Balogh, Péter Antal, Péter Mátyus: Multi-Aspect Candidates for Repositioning: Data Fusion Methods Using Heterogeneous Information, Current Medicinal Chemistry, 2013
• Bence Bolgár, Ádám Arany, Gergely Temesi, Balázs Balogh, Péter Antal, Péter Mátyus: Drug repositioning for treatment of movement disorders: from serendipity to rational discovery strategies, Current Topics in Medicinal Chemistry, in press
• Gergely Temesi, Bence Bolgár, Ádám Arany, Balázs Balogh, Csaba Szalai, Péter Antal, Péter Mátyus: Early repositioning through Compound Set Enrichment, Future Medicinal Chemistry, in prep.

• M. Gerstein, "E-publishing on the Web: Promises, pitfalls, and payoffs for bioinformatics," Bioinformatics, 1999
• M. Gerstein: Blurring the boundaries between scientific 'papers' and biological databases, Nature, 2001
• P. Bourne, "Will a biological database be different from a biological journal?," Plos Computational Biology, 2005
• M. Gerstein et al: "Structured digital abstract makes text mining easy," Nature
• M. Seringhaus et al: "Publishing perishing? Towards tomorrow's information architecture," Bmc Bioinformatics
• M. Seringhaus: "Manually structured digital abstracts: A scaffold for automatic text mining," Febs Letters
• D. Shotton: "Semantic publishing: the coming revolution in scientific journal publishing," Learned Publishing

Data analysis knowledge base
Wiki, probabilistic database, BayesEye
Features:
– Multilevel structure: free text, probabilistic database, evaluation interface
– Links between the layers
Advantages:
• Reusability -> acquired knowledge is not lost
• Support for post-processing -> demonstrating the power of the data
Bayesian networks, relevance levels, other aspects, …
Decision support

• Positivism (19th century-)
  ◦ experience-based knowledge
• Logical positivism (1920-)
  ◦ L. Wittgenstein: all knowledge should be codifiable in a single standard language of science + logic for inference
• Data/www-driven positivism
  ◦ Data are available in public repositories
  ◦ Scientific papers are available in public repositories
  ◦ The results of statistical data analyses are available in KBs in a formal, single probabilistic representation
  ◦ Models, hypotheses and conclusions linked to data are available in KBs in a formal, single probabilistic representation
  ◦ …

• Common big data provides
  ◦ detailed (quantitative, deep, complex) phenotype
  ◦ NEW ENDPOINTs

• Common big data provides "common sense"
  ◦ A. M. Turing: Computing Machinery and Intelligence, 1950
  ◦ D. Lenat: The CYC project, 1990<

Harvesting knowledge
• "Big" data = "Multilevel, omic" data
• Data and knowledge fusion
  ◦ From correlation to causation
  ◦ Kernel methods
  ◦ Semantic publication
• Synergies of biomedical and common big data
  ◦ Detailed (deep, quantitative, structured) phenotype
  ◦ Common sense(?)
*: Thanks to my colleagues at the Computational Biomedicine (Combine) workgroup: Ádám Arany, Bence Bolgár, András Gézsi, András Millinghoffer, Gergely Temesi, Péter Marx, Péter Sárközy, Gábor Hullám
