
1 Searching for Credible Relations in Machine Learning
Doctoral Dissertation
Vedrana Vidulin
Supervisor: prof. dr. Matjaž Gams
Co-supervisor: prof. dr. Bogdan Filipič
Ljubljana, 3 February 2012

2 Introduction
Task: domain analysis of complex domains
Problem:
– When DM methods construct models on complex domains, the models often contain parts (relations) that are less credible from the perspective of a human analyst.
– Less-credible parts can:
    – lead to wrong conclusions about the most important relations in the domain
    – undermine users' trust in DM methods (Stumpf et al., 2009)
Proposed solution: a new method that algorithmically combines human understanding and raw computer power in order to extract credible relations, that is, relations supported by data and meaningful to the human.

3 An Example
A decision-tree model is constructed:
– with the J48 algorithm in Weka,
– from a data set that represents the impact of the R&D sector on the economic welfare of a country.

Country | GERD per capita (PPP$) | Researchers per million inhabitants (HC) | … | Sector investing the most in R&D | GNI per capita
Armenia | 7.6 | 1,660 | … | Government | low
Latvia | 37.1 | 2,455 | … | Government | middle
Japan | 813.7 | 6,227 | … | Business enterprise | high
… | … | … | … | … | …

37 attributes: R&D sector
167 examples: countries
Class: economic welfare

4 An Example (2)

5 Outline
– Definition of credible relation
– Human-Machine Data Mining (HMDM) method
– Experimental evaluation
– Conclusions and contributions

6 Credible Relation
Relation: a pattern that connects a set of attributes, which describe the properties of a concept underlying the data, and a class/target attribute that represents the concept.
Credible relation: a relation of great meaning and high quality:
– Meaning: a subjective criterion assigned by the human based on common sense, informal knowledge about the domain, and the observed frequency and stability of the relation.
– Quality: an objective criterion indicating that the relation is supported by the selected quality measures.
Credible model: a model composed only of credible relations.

7 How to Establish Credible Relations?
The relation is composed of attributes A1 and A2.
If the relation is supported by evidence, add it to the list of candidates for credible relations.

8 The HMDM Algorithm
Repeat
    Create several models (e.g., trees)
    Choose the most interesting models
    For each interesting model
        Examine the credibility of relations in the model by adding and removing attributes from the data set
        Merge candidate relations with the output list of credible relations
Until no new interesting relations are found

9 The HMDM Algorithm (2)
HMDM (data set)
REPEAT
    Select DM method
    Select parameters and their ranges, define constraints
    Perform INITIAL_DM creating a list of models LM
    FOR each interesting model M from LM, reexamine M:
        REPEAT
            Perform any of the following: {
                ADD_ATTRIBUTES
                REMOVE_ATTRIBUTES
                Expand credibility indicator }
            Evaluate the results with several quality measures and for meaning
        UNTIL no more interesting relations are found in the search space near the initial model
        Store credible relations and integrate conclusions
    END FOR
UNTIL no more new interesting relations are found anywhere in the data set
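The outer loop of the pseudocode can be sketched in a few lines. This is only an illustration under assumptions: the function name `hmdm_loop`, its three callbacks and the `max_rounds` safety cap are hypothetical and do not come from the dissertation. In the actual method the "interesting" and "meaning" decisions are made by a human analyst, so the callbacks merely stand in for those interactive steps.

```python
from typing import Callable, Iterable, List


def hmdm_loop(
    build_models: Callable[[], Iterable[object]],
    is_interesting: Callable[[object], bool],
    reexamine: Callable[[object], Iterable[str]],
    max_rounds: int = 10,
) -> List[str]:
    """Outer HMDM loop; the callbacks stand in for DM runs and human judgment."""
    credible: List[str] = []
    for _ in range(max_rounds):  # safety cap (an assumption, not on the slide)
        new_relations: List[str] = []
        for model in build_models():  # INITIAL_DM: create a list of models LM
            if not is_interesting(model):
                continue
            # Reexamination: add/remove attributes and evaluate the results.
            for relation in reexamine(model):
                if relation not in credible and relation not in new_relations:
                    new_relations.append(relation)
        if not new_relations:  # UNTIL no more new interesting relations
            break
        credible.extend(new_relations)  # store credible relations
    return credible


# Toy stand-ins (hypothetical): a "model" is just a tuple of attribute names.
toy_models = [("A1", "A2"), ("A3",)]
found = hmdm_loop(
    build_models=lambda: toy_models,
    is_interesting=lambda m: len(m) > 1,
    reexamine=lambda m: [" & ".join(m)],
)
print(found)  # ['A1 & A2']
```

With these stubs the first round yields one relation and the second round yields nothing new, so the loop stops after two rounds.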

10 HMDM: ADD_ATTRIBUTES
Attributes:
A1  A2  A3  C
1   1   0   1
1   1   0   1
1   0   1   0
0   1   1   0
1   1   0   1
0   0   1   0
1   0   0   0
Quality: accuracy (%)
Model: J48 trees
Candidates for credible relations: A1 & A2 (combination), …
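The effect of adding an attribute can be checked by hand on the toy table above. The sketch below is an assumption-laden stand-in: instead of J48 it uses a simple majority-vote lookup table over the selected attribute columns and reports accuracy on the same seven rows, and the helper name `training_accuracy` is hypothetical. It only illustrates why A1 and A2 together are flagged as a combination.

```python
from collections import Counter, defaultdict

# Toy table from the slide: columns A1, A2, A3 and class C.
ROWS = [
    (1, 1, 0, 1),
    (1, 1, 0, 1),
    (1, 0, 1, 0),
    (0, 1, 1, 0),
    (1, 1, 0, 1),
    (0, 0, 1, 0),
    (1, 0, 0, 0),
]


def training_accuracy(rows, attrs):
    """Accuracy of a majority-vote lookup table built on the chosen columns."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[i] for i in attrs)  # values of the selected attributes
        groups[key][row[-1]] += 1           # count class labels per value combination
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(rows)


base = training_accuracy(ROWS, [0])         # A1 only
extended = training_accuracy(ROWS, [0, 1])  # A1 plus the added attribute A2
print(f"A1: {base:.1%}, A1+A2: {extended:.1%}")  # A1: 71.4%, A1+A2: 100.0%
```

In this table C equals 1 exactly when both A1 and A2 are 1, so neither attribute alone separates the classes while the pair does.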

11 HMDM: REMOVE_ATTRIBUTES
Attributes:
A1  A2  A3  C
1   0   1   1
0   1   0   0
0   1   0   0
1   0   1   1
1   0   1   1
1   1   1   1
1   1   1   1
Quality: accuracy (%)
Model: J48 trees
Candidates for credible relations: A1 || A3 (redundancy), …
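Redundancy shows up when removing one attribute leaves quality unchanged but removing both lowers it. The sketch below checks this on the toy table above. As an assumption, it replaces J48 with a majority-vote lookup table and measures accuracy on the same rows; the helper name `training_accuracy` is hypothetical.

```python
from collections import Counter, defaultdict

# Toy table from the slide: columns A1, A2, A3 and class C.
ROWS = [
    (1, 0, 1, 1),
    (0, 1, 0, 0),
    (0, 1, 0, 0),
    (1, 0, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 1, 1),
    (1, 1, 1, 1),
]


def training_accuracy(rows, attrs):
    """Accuracy of a majority-vote lookup table built on the chosen columns."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[i] for i in attrs)
        groups[key][row[-1]] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(rows)


full = training_accuracy(ROWS, [0, 1, 2])      # A1, A2, A3
without_a1 = training_accuracy(ROWS, [1, 2])   # A1 removed
without_a3 = training_accuracy(ROWS, [0, 1])   # A3 removed
without_both = training_accuracy(ROWS, [1])    # A1 and A3 removed
print(full, without_a1, without_a3, round(without_both, 3))  # 1.0 1.0 1.0 0.714
```

Here A1 and A3 carry the same values in every row, so either one can replace the other; only when both are gone does accuracy fall.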

12 Type-Credibility Scheme
Three levels of credibility:
1. Frequent and stable relations
    – Often appear in models
    – When added, improve quality
    – When removed, reduce quality
2. Frequent and less-stable relations
    – Often appear in models
    – When added, sometimes improve quality and sometimes not
    – When removed, sometimes reduce quality and sometimes not
3. Not supported by evidence
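One possible reading of the three levels as a decision rule is sketched below. Everything specific in it is an assumption made for illustration: the function name `credibility_level`, the `min_share` cut-off for "often appear in models", and the use of signed quality changes as inputs. The slides do not give numeric thresholds, and in the method itself the assessment also involves human judgment of meaning.

```python
def credibility_level(model_share, add_deltas, remove_deltas, min_share=0.5):
    """Return 1, 2 or 3 following the three levels of the scheme.

    model_share   -- fraction of constructed models that contain the relation
    add_deltas    -- quality changes observed when the attributes were added
    remove_deltas -- quality changes observed when the attributes were removed
    min_share     -- hypothetical cut-off for "often appear in models"
    """
    frequent = model_share >= min_share
    improves = [d > 0 for d in add_deltas]     # did adding improve quality?
    reduces = [d < 0 for d in remove_deltas]   # did removing reduce quality?
    # Level 1: frequent, and every add improved and every removal reduced quality.
    if frequent and improves and reduces and all(improves) and all(reduces):
        return 1
    # Level 2: frequent, but the effect shows up only some of the time.
    if frequent and (any(improves) or any(reduces)):
        return 2
    # Level 3: not supported by evidence.
    return 3


print(credibility_level(0.8, [2.0, 1.5], [-3.0]))        # 1
print(credibility_level(0.8, [2.0, -0.5], [-1.0, 0.0]))  # 2
print(credibility_level(0.1, [2.0], [-1.0]))             # 3
```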

13 Quality Measures

14 Experimental Evaluation
Performed on three domains:
1. Research and development (R&D)
2. Higher education
3. Automatic web genre identification

15 R&D Domain: Remove Attributes Graph
[Graph; labels: GERD-PC || GERD-GDP, RES-HC || RES-FTE, APP-NON-RES]

16 Domains
Higher education
– Goal: an analysis of the impact of the higher education sector on the economic welfare of a country
– DM methods: J48 and M5P trees
– Data: 60 attributes; 167 examples: countries; class: GNI per capita
Automatic web genre identification
– Goal: improve predictive performance by eliminating less-credible relations from J48 decision-tree models
– Data: 500 attributes: words; 1,539 examples: web pages; class: 20 genres

17 R&D and Higher Education Domains: Credible Relations
R&D
First level: increase the level of investment in the R&D sector
Second level:
– Increase the number of patents
– Increase the number of researchers
– Develop the business enterprise sector as the key leader in R&D activities
Higher education
First level: stimulate participation in higher education and improve student exchange programs
Second level:
– Increase the level of investment in all levels of education (low)
– Increase the number of graduates in science programs (middle)
– Attract more foreign students (middle)

18 Evaluation
User study with 22 participants:
– 64% of participants did not recognize less-credible relations in the single model
– When presented with credible models, all participants accepted them as better

Accuracy (%), J48 vs. HMDM (only one value per data set survives in this transcript):
HI-EDU: 71.86
R&D: 63.47

Correlation coefficient, M5P vs. HMDM:
HI-EDU: 0.681 (second value missing from the transcript)
R&D: 0.722 (M5P), 0.787 (HMDM)

F-measure on the genres data:
          | J48   | HMDM
Micro-AVG | 0.280 | 0.370
Macro-AVG | 0.284 | 0.377

19 Conclusions
A novel method, Human-Machine Data Mining (HMDM), was designed that combines human understanding and raw computer power to extract credible relations from data.
The HMDM method was evaluated on three complex domains, showing that:
– the method is able to find important relations in data
– credible models are better in quality than the models constructed by automatic DM methods
– humans accept credible models

20 Contributions
The main contributions:
– A new method, Human-Machine Data Mining (HMDM), was designed for extracting credible relations from data
– The CCPE statistical measure, originally conceived for classification rules, was extended to decision trees
– Interactive explanation structures in the form of added-attributes and removed-attributes graphs were designed to facilitate the extraction of credible relations
Additional contributions:
– A computer program was developed to support the HMDM method
– Three real-life domains were analyzed

