Published by Lucas Lewis. Modified over 8 years ago.
1
Solving ILP Problems in the EELA Infrastructure
Inês Dutra
Departamento de Ciência de Computadores
Universidade do Porto, Portugal
2
Outline
Introduction
– ILP
– Examples
– Motivation
Experiments
Conclusions
Future Work
3
Introduction
EELA selected application
Task 3.3: additional applications
4
Introduction
What is ILP?
– It is NOT Instruction Level Parallelism
– It is NOT Integer Linear Programming
So, what is it?
5
Introduction
It is Inductive Logic Programming
– Data mining
– Machine learning
– Knowledge/information extraction
Where:
– Given:
    Set of observations (positive and negative)
    Background knowledge (descriptions)
    Language bias
– Find:
    A hypothesis (in a first-order language) that explains all positive observations and none of the negative ones.
6
Introduction
Advantages:
– Use of an understandable description language
– Relational knowledge
7
Introduction: example
[Figure: trains going east | trains going west]
8
Introduction: example
short(car_12). closed(car_12).
long(car_11). long(car_13). short(car_14).
open_car(car_11). open_car(car_13). open_car(car_14).
shape(car_11,rectangle). shape(car_12,rectangle). shape(car_13,rectangle). shape(car_14,rectangle).
load(car_11,rectangle,3). load(car_12,triangle,1). load(car_13,hexagon,1). load(car_14,circle,1).
wheels(car_11,2). wheels(car_12,2). wheels(car_13,3). wheels(car_14,2).
has_car(east1,car_11). has_car(east1,car_12). has_car(east1,car_13). has_car(east1,car_14).
9
Introduction: example
[Figure: trains going east | trains going west]
10
Introduction: example
eastbound(T) IF has_car(T,C) AND short(C) AND closed(C)
[Figure: trains going east | trains going west]
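The rule on this slide can be checked mechanically against the facts listed for train east1. The sketch below is an illustrative Python translation, not how an ILP system evaluates clauses. The set-based encoding and the function names `eastbound` and `witnesses` are assumptions made for this example, and the train `west1` with its car `car_21` is hypothetical data added only to show a train the rule rejects.

```python
# Facts for train east1, copied from the example slide.
short = {"car_12", "car_14"}
closed = {"car_12"}
has_car = {
    ("east1", "car_11"), ("east1", "car_12"),
    ("east1", "car_13"), ("east1", "car_14"),
    # Hypothetical westbound train, NOT from the slides: one long, open car.
    ("west1", "car_21"),
}


def eastbound(train):
    """eastbound(T) IF has_car(T,C) AND short(C) AND closed(C)."""
    return any(
        t == train and car in short and car in closed
        for t, car in has_car
    )


def witnesses(train):
    """Cars C that make the rule body true for the given train."""
    return sorted(
        car for t, car in has_car
        if t == train and car in short and car in closed
    )


print(eastbound("east1"), witnesses("east1"))  # True ['car_12']
print(eastbound("west1"), witnesses("west1"))  # False []
```

The rule holds for east1 because car_12 is both short and closed; no other car of that train satisfies both conditions.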
11
Another less “toyish” example: extracting knowledge from mammograms
is_malignant(A) if
    'BIRADS_category'(A,b5),
    'MassPAO'(A,present),
    'Age'(A,age6570),
    previous_finding(A,B,C),
    'MassesShape'(B,none),
    'Calc_Punctate'(B,notPresent),
    previous_finding(A,C),
    'BIRADS_category'(C,b3).
This rule states that finding (A) IS malignant IF it is:
– classified as BI-RADS 5
– AND had a mass present
– in a patient who:
    was between the ages of 65 and 70
    had two prior mammograms (B, C)
– and prior mammogram (B):
    had no mass shape described
    had no punctate calcifications
– and prior mammogram (C) was classified as BI-RADS 3
12
Introduction: Motivation
Applications:
– Link discovery
– Social network analysis
– Equivalent identities
– Drug design
– Protein unfolding
– Protein metabolism
– Why not? Classifying grid failures
– And... many others!
13
Introduction: Motivation
Why does ILP need a grid?
– The search space can become large very quickly
– Many experiments are needed to obtain statistically significant results
    Cross-validation
    Training, tuning, testing
– Classifiers can be combined: ensembles
14
Introduction: Motivation
Assume we want to run a task for one domain: find a “good” hypothesis that describes the positive examples
Assume we run 5x4-fold cross-validation (5 repetitions of 4 folds, i.e. 20 folds)
Assume we have 100 classifiers per fold
# of experiments: 5 x 4 x 100 = 2,000
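The count on this slide follows from multiplying repetitions, folds and classifiers. Below is a minimal Python sketch of that enumeration, assuming each (repetition, fold, classifier) triple is one independent experiment. The names `REPETITIONS`, `FOLDS`, `CLASSIFIERS_PER_FOLD` and `enumerate_experiments` are made up for illustration and do not come from the talk.

```python
from itertools import product

REPETITIONS = 5             # the "5" in 5x4-fold cross-validation
FOLDS = 4                   # folds per repetition
CLASSIFIERS_PER_FOLD = 100  # classifiers trained on each fold


def enumerate_experiments():
    """Return one (repetition, fold, classifier) triple per experiment."""
    return list(product(range(REPETITIONS), range(FOLDS),
                        range(CLASSIFIERS_PER_FOLD)))


experiments = enumerate_experiments()
print(len(experiments))  # 2000
```

Because the triples are independent of one another, each one could in principle be submitted as a separate grid job.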
15
Introduction: Motivation
Now assume each experiment takes 1 hour to run
How long would it take to generate the 2,000 classifiers to be combined if they run one after another?
~ 83 days (2,000 hours / 24 hours per day)
If we also vary learning parameters and learning algorithms, this number can become really big!
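The 83-day figure is the sequential running time. The sketch below reproduces it and shows how the wall-clock time would shrink if jobs ran concurrently. The function name `wall_clock_days` and the figure of 100 workers are hypothetical, and the model is idealised: it ignores queueing, submission and failure overheads.

```python
import math


def wall_clock_days(n_jobs, hours_per_job, n_workers=1):
    """Idealised wall-clock time in days when jobs run in equal-length waves."""
    waves = math.ceil(n_jobs / n_workers)  # batches of concurrent jobs
    return waves * hours_per_job / 24


print(round(wall_clock_days(2000, 1), 1))       # 83.3 (sequential)
print(round(wall_clock_days(2000, 1, 100), 2))  # 0.83 (100 hypothetical workers)
```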
16
Experiment
Predict carcinogenicity in rodents
– Difficult task
– Large search space!
– Important problem
Phase 1:
– Tuning using 5x4-fold cross-validation
– Generating ensembles of up to 100 classifiers
Aleph: well-known ILP system
YAP: Yet Another Prolog
17
Experiment: one of the classifiers
active(A) if
    atom(A,_,n,32,B), B ≤ -0.401,
    has_property(A,cytogen_sce,n),
    methyl(A,_).
Sister Chromatid Exchange (SCE)
SCE is used for the determination of mutagenicity
18
Experiment
2 submissions:
– From LA (Latin America)
– From EU (Europe)
19
Submitting jobs from LA...
20
Experiment
EELA resources utilised

Resource       # of jobs
CERN               1,160
CIEMAT               279
CETA-CIEMAT          173
UniCan                98
LIP                   10
INFN                  38
UNAM                  16
BIOF.UFRJ            159
IF.UFRJ                8
UFCG                  28
Total              1,969

~ 300 resources in LA
211 jobs in LA
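The totals in the table can be cross-checked with a few lines of Python. The per-site counts are copied from the slide. Which sites count as Latin American (UNAM, BIOF.UFRJ, IF.UFRJ, UFCG) is an assumption, chosen because those four add up to the 211 LA jobs reported. The figure of 2,000 planned jobs comes from the motivation slides.

```python
jobs_per_site = {
    "CERN": 1160, "CIEMAT": 279, "CETA-CIEMAT": 173, "UniCan": 98,
    "LIP": 10, "INFN": 38, "UNAM": 16, "BIOF.UFRJ": 159,
    "IF.UFRJ": 8, "UFCG": 28,
}
# Assumed set of Latin American sites (not stated explicitly on the slide).
la_sites = {"UNAM", "BIOF.UFRJ", "IF.UFRJ", "UFCG"}

total = sum(jobs_per_site.values())
la_jobs = sum(n for site, n in jobs_per_site.items() if site in la_sites)
missing = 2000 - total  # planned jobs that do not appear in the table

print(total, la_jobs, missing)  # 1969 211 31
```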
21
Experiments
Why 1,969 out of 2,000?
2 reasons:
– Proxy expiration:
    On submission (which takes a long time!)
    On execution
– Use of dynamic libraries
22
Submitting jobs from EU...
from a non-EELA site, BUT using the EELA VO:
– Jobs run only on EU resources...
Reasons:
– Misconfiguration?
– Closer brokers with more machines?
23
Conclusions
Happiness: EELA is working!
We can run thousands of experiments!
Frida is happy! (See the Condor introductory tutorials if you are curious about Frida.)
The experiment showed good utilization of EELA resources in LA and EU
Low failure rate (about 1.5%: 31 of 2,000 jobs)
Failures caused by:
– Dynamic libs not available on the remote machine
– Proxy expiration
24
Future work
More detailed analysis of jobs and logs
Full ILP experiment
More domains
Other kinds of experiments based on Statistical Relational Learning
And do not forget: ILP can help to model and diagnose errors in the grid environment!
25
Collaborators
Fernando Silva (DCC-UPorto)
Vítor Santos Costa (DCC-UPorto)
Rui Camacho (FE-UPorto)
Nuno Fonseca (IBMC/IBMEC, Porto)
Beth Burnside (UW-Madison hospital)
David Page (UW-Madison)
Jesse Davis (UWashington)
26
Thanks!
Questions?