Solving ILP Problems in the EELA infrastructure. Inês Dutra, Departamento de Ciência de Computadores, Universidade do Porto, Portugal.

Presentation transcript:

1 Solving ILP Problems in the EELA infrastructure Inês Dutra Departamento de Ciência de Computadores Universidade do Porto, Portugal

2 Outline Introduction –ILP –Examples –Motivation Experiments Conclusions Future Work

3 Introduction EELA selected application Task 3.3: additional applications

4 Introduction What is ILP? –It is NOT Instruction Level Parallelism –It is NOT Integer Linear Programming So, what is it????

5 Introduction
It is Inductive Logic Programming
–Data mining
–Machine learning
–Knowledge/information extraction
Where:
–Given:
  Set of observations (positive and negative)
  Background knowledge (descriptions)
  Language bias
–Find:
  A hypothesis (in first-order language) that best explains all positive observations and none of the negatives.

6 Introduction Advantages: –Use of an understandable description language –Relational knowledge

7 Introduction: example [Figure: TRAINS GOING EAST | TRAINS GOING WEST]

8 Introduction: example
short(car_12). closed(car_12).
long(car_11). long(car_13). short(car_14).
open_car(car_11). open_car(car_13). open_car(car_14).
shape(car_11,rectangle). shape(car_12,rectangle). shape(car_13,rectangle). shape(car_14,rectangle).
load(car_11,rectangle,3). load(car_12,triangle,1). load(car_13,hexagon,1). load(car_14,circle,1).
wheels(car_11,2). wheels(car_12,2). wheels(car_13,3). wheels(car_14,2).
has_car(east1,car_11). has_car(east1,car_12). has_car(east1,car_13). has_car(east1,car_14).

9 Introduction: example [Figure: TRAINS GOING EAST | TRAINS GOING WEST]

10 Introduction: example
eastbound(T) IF has_car(T,C) AND short(C) AND closed(C)
[Figure: TRAINS GOING EAST | TRAINS GOING WEST]
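The rule on slide 10 can be checked against the facts from slide 8. The sketch below is only an illustration in Python, not how an ILP system such as Aleph evaluates clauses. The set and dictionary encoding and the function name `eastbound` are assumptions made for this example. Only the facts the rule needs are copied over.

```python
# Facts from slide 8, re-encoded as Python sets and a dict (illustrative encoding).
short = {"car_12", "car_14"}
closed = {"car_12"}
has_car = {"east1": ["car_11", "car_12", "car_13", "car_14"]}


def eastbound(train):
    """eastbound(T) IF has_car(T,C) AND short(C) AND closed(C)."""
    # The rule holds if at least one car C of the train is both short and closed.
    return any(car in short and car in closed for car in has_car.get(train, []))


print(eastbound("east1"))  # True: car_12 is both short and closed
```

A train with no matching car (or a train that is not in `has_car` at all) is not covered by the rule, which is how the hypothesis separates positive from negative observations.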

11 Another less “toyish” example: extracting knowledge from mammograms
is_malignant(A) if
  'BIRADS_category'(A,b5),
  'MassPAO'(A,present),
  'Age'(A,age6570),
  previous_finding(A,B,C),
  'MassesShape'(B,none),
  'Calc_Punctate'(B,notPresent),
  previous_finding(A,C),
  'BIRADS_category'(C,b3).
This rule states that finding (A) IS malignant IF it is:
–classified as BI-RADS 5
–AND had a mass present
–in a patient who:
  was between the ages of 65 and 70
  had two prior mammograms (B, C)
–and prior mammogram (B):
  had no mass shape described
  had no punctate calcifications
–and prior mammogram (C) was classified as BI-RADS 3

12 Introduction: Motivation
Applications:
–Link discovery
–Social Network Analysis
–Equivalent identities
–Drug design
–Protein unfolding
–Protein metabolism
–Why not? Classifying grid failures
–And... many others!

13 Introduction: Motivation
Why does ILP need a grid?
–Search space can become large very quickly
–Need many experiments to have statistically significant results
  Cross-validation
  Training, tuning, testing
–Can combine classifiers: ensembles

14 Introduction: Motivation
Assume we want to run a task for one domain: find a “good” hypothesis that describes the positive examples
Assume we run 5x4-fold cross-validation
Assume we have 100 classifiers per fold
# of experiments: 5 x 4 x 100 = 2,000

15 Introduction: Motivation Now assume each experiment takes 1 hour to run How long would it take to generate the 2,000 classifiers to be combined? ~ 83 days!!! If we consider varying learning parameters and learning algorithms, this number can be really big!!
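The arithmetic behind slides 14 and 15 can be reproduced in a few lines. The sketch below uses only the figures from the slides (5 repetitions, 4 folds, 100 classifiers per fold, 1 hour per experiment). The helper `ideal_days` and its assumption of perfectly even distribution over workers with no scheduling overhead are illustrative additions, not measurements from the talk.

```python
import math

repetitions = 5             # 5x4-fold cross-validation: 5 repetitions...
folds = 4                   # ...of 4 folds each
classifiers_per_fold = 100
hours_per_experiment = 1

experiments = repetitions * folds * classifiers_per_fold
sequential_days = experiments * hours_per_experiment / 24


def ideal_days(n_workers):
    """Idealized wall-clock days when experiments are spread evenly over n_workers."""
    rounds = math.ceil(experiments / n_workers)
    return rounds * hours_per_experiment / 24


print(experiments)                # 2000
print(round(sequential_days, 1))  # 83.3
print(ideal_days(100))            # under one day with 100 hypothetical workers
```

Under these assumptions the sequential run takes about 83 days, while any substantial number of grid workers brings the wall-clock time down to days or hours, which is the motivation for using the grid.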

16 Experiment
Predict carcinogenicity in rodents
–Difficult task
–Large search space!
–Important problem
Phase 1:
–Tuning using 5x4-fold cross-validation
–Generating ensembles of up to 100 classifiers
Aleph: well-known ILP system
YAP: Yet Another Prolog

17 Experiment: one of the classifiers
active(A) if atom(A,_,n,32,B), B ≤ , has_property(A,cytogen_sce,n), methyl(A,_).
Sister Chromatid Exchange (SCE): SCE is used to determine mutagenicity

18 Experiment 2 submissions: –From LA –From EU

19 Submitting jobs from LA....

20 Experiment: EELA resources utilised
Resource        # of jobs
CERN            1,160
CIEMAT          279
CETA-CIEMAT     173
UniCan          98
LIP             10
INFN            38
UNAM            16
BIOF.UFRJ       159
IF.UFRJ         8
UFCG            28
Total           1,969
~ 300 resources in LA
211 jobs in LA
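The totals on slide 20 can be cross-checked with a short script. This is a sketch for verification only. The grouping of UNAM, BIOF.UFRJ, IF.UFRJ and UFCG as the Latin American (LA) sites is an inference from the slide's "211 jobs in LA" figure, and the variable names are chosen for this example.

```python
# Jobs per resource, copied from slide 20.
jobs = {
    "CERN": 1160, "CIEMAT": 279, "CETA-CIEMAT": 173, "UniCan": 98,
    "LIP": 10, "INFN": 38, "UNAM": 16, "BIOF.UFRJ": 159,
    "IF.UFRJ": 8, "UFCG": 28,
}
# Assumed LA grouping, inferred from the "211 jobs in LA" figure on the slide.
la_sites = {"UNAM", "BIOF.UFRJ", "IF.UFRJ", "UFCG"}

submitted = 2000  # number of experiments planned on slide 14
completed = sum(jobs.values())
la_jobs = sum(count for site, count in jobs.items() if site in la_sites)
missing = submitted - completed

print(completed)  # 1969
print(la_jobs)    # 211
print(missing)    # 31
```

The per-site counts add up to the stated total of 1,969, and the four sites assumed to be in LA account for the 211 jobs reported there.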

21 Experiments Why 1,969 out of 2,000??? 2 reasons: –Proxy expiration: On submission (takes loooooong!!!) On execution –Use of dynamic libraries

22 Submitting jobs from EU... from a non-EELA site, BUT Using the EELA VO: –Jobs run only on EU resources... Reasons: –Misconfiguration? –Closer brokers with more machines?

23 Conclusions
Happiness: EELA is working!!!
We can run thousands of experiments!
Frida is happy!!! (See the Condor introductory tutorials if you are curious about Frida.)
The experiment showed good utilization of EELA resources in LA and EU
Low failure rate (about 1.5%: 31 of 2,000 jobs)
Failures caused by:
–Dynamic libraries not available on the remote machine
–Proxy expiration

24 Future work More detailed analysis of jobs and logs Full ILP experiment More domains Other kinds of experiments based on Statistical Relational Learning And, do not forget: ILP can help to model and diagnose errors in the grid environment!

25 Collaborators Fernando Silva (DCC-UPorto) Vítor Santos Costa (DCC-UPorto) Rui Camacho (FE-UPorto) Nuno Fonseca (IBMC/IBMEC, Porto) Beth Burnside (UW-Madison hospital) David Page (UW-Madison) Jesse Davis (UWashington)

26 Thanks!!! Questions??