IEEE CBMS06, DM Track Salt Lake City, Utah 22.06.06 Dynamic Integration of Classifiers for Handling Concept Drift by A. Tsymbal, M. Pechenizkiy, P. Cunningham.

1 Dynamic Integration of Classifiers for Handling Concept Drift
Alexey Tsymbal, Department of Computer Science, Trinity College Dublin, Ireland
Mykola Pechenizkiy, Dept. of Mathematical IT, University of Jyväskylä, Finland
Pádraig Cunningham, Department of Computer Science, Trinity College Dublin, Ireland
Seppo Puuronen, Dept. of CS and IS, University of Jyväskylä, Finland
IEEE CBMS06: DM Track, Salt Lake City, Utah, USA, June 21-23, 2006

2 Outline
Introduction
– Supervised learning
– The problem of concept drift (CD)
Approaches to handle CD
– Instance selection, instance weighting, and ensemble learning
Dynamic integration of classifiers for handling CD
– Dynamic selection, dynamic integration, and their mix
Domain of antibiotic resistance
– How resistance occurs; the concept drift context
Experiment design
– C4.5 ensembles with static and dynamic integration
Results and conclusions

3 The Task of Classification
[Figure: a training set with n training observations, p features and J classes is used to determine the class membership of a new instance to be classified.]
Given n training instances $(x_i, y_i)$, where $x_i$ are the values of the attributes and $y_i$ is the class.
Goal: given a new $x_0$, predict its class $y_0$.
Examples:
- diagnosis of thyroid diseases;
- heart attack prediction, etc.

4 The Task of Classification: Predicting Antibiotic Resistance
– Predict the sensitivity of a pathogen to an antibiotic based on data about the antibiotic, the isolated pathogen, and the demographic and clinical features of the patient.

5 The Problem of Concept Drift
Changes in the hidden context can induce more or less radical (gradual or abrupt) changes in the target concept.
– A typical example is antibiotic resistance: pathogen sensitivity may change over time as new pathogen strains develop resistance to antibiotics that were previously effective.
– Even in the most strictly controlled environments, unexpected changes may happen, e.g. due to the failure and replacement of medical equipment or to changes in personnel, making it necessary to change the model.
– The need to change the current model due to a change in the data distribution is called virtual concept drift.
An effective learner should be able to track such changes and quickly adapt to them.

6 Approaches to Handle Concept Drift
Instance selection:
– select instances relevant to the current concept;
– generalize from a moving window and use the learnt concepts for prediction only in the immediate future;
– case-base editing strategies in CBR that delete noisy, irrelevant and redundant cases.
Instance weighting:
– weight instances according to their age and their competence with respect to the current concept;
– weighting techniques handle CD worse than analogous instance selection techniques (due to overfitting the data).
Ensemble learning:
– maintain a set of concept descriptions whose predictions are combined using, e.g., a form of voting;
– divide the data into sequential blocks of fixed size and build an ensemble on them.
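The first two families above can be illustrated with a minimal sketch. The window size and the exponential decay used for age-based weighting are hypothetical illustration parameters, not settings from this study; the slide only says that instances are weighted "according to age".

```python
from collections import deque
from typing import Deque, List, Tuple

# one labelled instance: (feature vector, class label)
Instance = Tuple[Tuple[float, ...], int]


class MovingWindow:
    """Instance selection: keep only the `size` most recent instances."""

    def __init__(self, size: int) -> None:
        # a deque with maxlen silently discards the oldest item when full
        self.window: Deque[Instance] = deque(maxlen=size)

    def add(self, instance: Instance) -> None:
        self.window.append(instance)

    def training_set(self) -> List[Instance]:
        return list(self.window)


def age_weights(n: int, decay: float) -> List[float]:
    """Instance weighting by age: index 0 is the oldest instance, index n-1 the newest.

    Hypothetical exponential decay: an instance that is a steps old gets weight decay**a.
    """
    return [decay ** (n - 1 - i) for i in range(n)]
```

For example, after adding five instances to `MovingWindow(3)`, only the last three remain in the training set, and `age_weights(3, 0.5)` gives `[0.25, 0.5, 1.0]`, so the newest instance counts four times as much as the oldest.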

7 Handling Concept Drift with Ensembles
The ensemble is constructed as a set of concept descriptions corresponding to different time intervals.
[Figure: along the time axis, each new block of data becomes the training set for the next base classifier.]
Usually simple voting is used for model combination
– it does not work well in complex domains with local concept drift.
Our basic idea: use local accuracies for model combination in order to handle local concept drift
– this adapts to concept drift better (e.g. with antibiotic resistance data).
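The block-based construction and the simple voting baseline described above can be sketched as follows. This is an illustration under assumptions: the study used C4.5 trees as base classifiers, whereas `NearestCentroid` here is a hypothetical stand-in chosen only to keep the example self-contained, and the block size is arbitrary.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Instance = Tuple[Tuple[float, ...], int]


class NearestCentroid:
    """Hypothetical stand-in base learner (the study itself used C4.5 trees)."""

    def fit(self, data: Sequence[Instance]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Counter = Counter()
        for x, y in data:
            acc = sums.setdefault(y, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in total] for y, total in sums.items()}
        return self

    def predict(self, x: Sequence[float]) -> int:
        # class whose centroid is closest in squared Euclidean distance
        return min(self.centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, self.centroids[y])))


def blocks(stream: Sequence[Instance], block_size: int) -> List[List[Instance]]:
    """Divide the stream into sequential, non-overlapping blocks of fixed size."""
    return [list(stream[i:i + block_size]) for i in range(0, len(stream), block_size)]


def build_ensemble(stream: Sequence[Instance], block_size: int) -> List[NearestCentroid]:
    # one base classifier per time interval, in chronological order
    return [NearestCentroid().fit(b) for b in blocks(stream, block_size)]


def simple_vote(ensemble: List[NearestCentroid], x: Sequence[float]) -> int:
    """Static integration: every member gets one equal vote, regardless of where x lies."""
    votes = Counter(m.predict(x) for m in ensemble)
    return votes.most_common(1)[0][0]
```

Note that `simple_vote` ignores where the new instance lies in the instance space, which is exactly the limitation the slide attributes to simple voting under local concept drift.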

8 Local Concept Drift
In the real world, concept drift may often be local:
– changes in the concept or data distribution occur only in some regions of the instance space; e.g., only particular bacteria may develop resistance to certain antibiotics, while resistance to the others remains the same;
– the type and severity of changes may depend on the location in the instance space.
Local CD: changes in the concept and data distribution occurring at the instance level rather than the data set level.
– Local CD occurs between two consecutive time points if there is a sub-space of the whole instance space whose changes of concept and/or data distribution differ from those in the rest of the data.
– This is reflected by a different change in the (local) predictive performance of the currently used model in this sub-space.
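The last point of this definition suggests a simple check: compare how the current model's accuracy changes in each region between two consecutive blocks. The sketch below is hypothetical. It assumes regions are predefined groups (for example, one pathogen each) and flags a region when its accuracy change deviates from the median change across regions by more than an arbitrary tolerance of 0.2; neither choice comes from the study.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Hashable, List, Sequence, Tuple

# one evaluation record: (region id, whether the current model classified it correctly)
Record = Tuple[Hashable, bool]


def accuracy_by_region(records: Sequence[Record]) -> Dict[Hashable, float]:
    """Local accuracy of the current model within each region."""
    hits: Dict[Hashable, int] = defaultdict(int)
    totals: Dict[Hashable, int] = defaultdict(int)
    for region, correct in records:
        totals[region] += 1
        hits[region] += int(correct)
    return {r: hits[r] / totals[r] for r in totals}


def local_drift_regions(before: Sequence[Record], after: Sequence[Record],
                        tolerance: float = 0.2) -> List[Hashable]:
    """Regions whose accuracy change between two consecutive blocks deviates
    from the typical (median) change across regions by more than `tolerance`."""
    acc_before = accuracy_by_region(before)
    acc_after = accuracy_by_region(after)
    shared = [r for r in acc_before if r in acc_after]
    changes = {r: acc_after[r] - acc_before[r] for r in shared}
    typical = median(changes.values())
    return [r for r in shared if abs(changes[r] - typical) > tolerance]
```

If the model stays fully accurate on regions A and B but loses all accuracy on region C, only C is flagged; a global accuracy figure would only report a uniform drop of one third.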

9 Stability of Regions: Rotating Hyperplane
Base models of an ensemble should not be discarded if their global accuracy on the current block of data falls but they are still good experts in the stable parts of the data.
One solution to this problem is the use of DIC:
– the models are integrated at an instance level according to their local accuracies.
[Figure: stability of regions in the rotating hyperplane problem]
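The figure itself is not reproduced here, but the idea can be sketched with a hypothetical two-dimensional version of the rotating hyperplane problem. Under the assumption that instances are uniform in the unit square and labelled by $w_1 x_1 + w_2 x_2 \ge w_0$ with $w_0 = (w_1 + w_2)/2$, drift is simulated by rotating the weight vector. A small rotation flips the class only of instances close to the hyperplane, so most regions stay stable.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hyperplane_label(p: Point, angle: float) -> int:
    """Class of p for the weight vector w = (cos(angle), sin(angle)).

    w1*x1 + w2*x2 >= w0 with w0 = 0.5*(w1 + w2) is equivalent to the test below:
    the hyperplane always passes through the centre (0.5, 0.5) of the unit square.
    """
    w1, w2 = math.cos(angle), math.sin(angle)
    return int(w1 * (p[0] - 0.5) + w2 * (p[1] - 0.5) >= 0.0)


def sample_points(n: int, seed: int = 0) -> List[Point]:
    # uniform instances in the unit square; seeded for reproducibility
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def changed_fraction(points: Sequence[Point], angle_before: float, angle_after: float) -> float:
    """Share of instances whose class flips when the hyperplane rotates."""
    flips = sum(hyperplane_label(p, angle_before) != hyperplane_label(p, angle_after)
                for p in points)
    return flips / len(points)
```

With a rotation of 0.1 rad, only a few percent of the instances change class; an instance far from the hyperplane, such as (0.95, 0.5), keeps its label, so a model that was accurate there remains a good local expert.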

10 Local Concept Drift
Most gradual CDs may be considered local if:
– the velocity of changes is small relative to the rate of arriving instances in the data stream;
– most regions of the data remain stable.
Most abrupt CDs are:
– not local, unless substantial sub-areas remain stable between the two changing concepts;
– local, if they relate to a subgroup of the whole population.
CD may also be complex:
– the concept or data distribution may change differently in different clusters;
– changes in antibiotic resistance (AR) and in the data distribution are usually different for different bacteria in the AR problem.
Local CD occurs at an instance level
– so its treatment should be at that level as well!
Potential approaches to handle local CD:
– CBR: a case base is updated at an instance level;
– a hybrid of ensemble learning and instance selection: ensemble integration based on local accuracies.

11 How Antibiotic Resistance Happens

12 How Antibiotic Resistance Happens
In spontaneous DNA mutation, bacterial DNA may mutate spontaneously. Drug-resistant tuberculosis arises this way.
In a form of microbial sex called transformation, one bacterium may take up DNA from another bacterium. Penicillin-resistant gonorrhea results from transformation.
Resistance can also be acquired from a small circle of DNA called a plasmid, which can flit from one type of bacterium to another.
– A single plasmid can provide a slew of different resistances.
– In 1968, 12,500 people in Guatemala died in an epidemic of Shigella diarrhea. The microbe harbored a plasmid carrying resistances to four antibiotics!

13 Data Collection & Organization
N.N. Burdenko Institute of Neurosurgery; bacterial analyzer Vitek-60 (by bioMérieux); information systems "Microbiologist" and "Microbe".
Each instance is one sensitivity test:
– the pathogen that is isolated during the bacterial identification analysis;
– the antibiotic that is used in the sensitivity test;
– the result of the sensitivity test itself (sensitive, resistant or intermediate), obtained from Vitek according to the guidelines of the NCCLS (National Committee for Clinical Laboratory Standards).
– The above information is connected with the patient's demographic data (sex, age) and hospitalization in the Institute (main department, days spent in the ICU, days spent in the hospital before the test, etc.).
[Figure: sensitivity tests corresponding to a single specimen (liquor), including the meningitis cases of the year]

14 Classification over Sequential Data Blocks
[Figure: accuracy for C4.5 ensembles]

15 Weighted Average of Classification Accuracy
[Figure: C4.5 ensembles]

16 Summary and Conclusions
In the real world, concepts are often not stable but change with time; this is known as the problem of concept drift (CD).
Among the most popular and effective approaches to handling CD is ensemble learning:
– a set of concept descriptions built on data blocks corresponding to different time intervals is maintained, and
– the final prediction is the aggregated prediction of the ensemble members.
We suggested a dynamic integration approach for ensembles (DIC) for handling CD:
– it integrates the base classifiers at an instance level, assigning them weights proportional to their local accuracy on each instance considered.
We considered an example of CD from the area of antibiotic resistance.
We demonstrated that DIC often results in better accuracy on the considered data set than the more commonly used weighted voting:
– this supports our hypothesis favoring DIC for handling CD, especially in the presence of local CD.
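The core of DIC, weighting base classifiers by their local accuracy on each instance, can be sketched together with the dynamic selection, dynamic integration, and mixed variants named in the outline. This is a minimal sketch under several assumptions, not the authors' exact procedure. Base classifiers are plain Python callables. Local accuracy is estimated on the k nearest instances of a labelled validation set using inverse-distance weights. The mix (called DVS here) keeps only the models at or above the midpoint between the worst and best local accuracy. The default k = 31 mirrors the neighbourhood size favoured in the experiment design.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]
Example = Tuple[Tuple[float, ...], int]


def local_accuracies(models: List[Classifier], validation: Sequence[Example],
                     x0: Sequence[float], k: int = 31) -> List[float]:
    """Distance-weighted accuracy of each model on the k validation instances nearest to x0."""
    neighbours = sorted(validation, key=lambda ex: math.dist(ex[0], x0))[:k]
    # assumption: inverse-distance weights, so closer neighbours count more
    weights = [1.0 / (1e-9 + math.dist(x, x0)) for x, _ in neighbours]
    total = sum(weights)
    return [sum(w for w, (x, y) in zip(weights, neighbours) if m(x) == y) / total
            for m in models]


def _weighted_vote(models: List[Classifier], scores: List[float], x0: Sequence[float]) -> int:
    tally: Dict[int, float] = defaultdict(float)
    for m, s in zip(models, scores):
        tally[m(x0)] += s
    return max(tally, key=tally.get)


def dynamic_selection(models: List[Classifier], validation: Sequence[Example],
                      x0: Sequence[float], k: int = 31) -> int:
    """DS: predict with the single locally most accurate model."""
    acc = local_accuracies(models, validation, x0, k)
    best = max(range(len(models)), key=acc.__getitem__)
    return models[best](x0)


def dynamic_voting(models: List[Classifier], validation: Sequence[Example],
                   x0: Sequence[float], k: int = 31) -> int:
    """Dynamic integration: weighted vote with weights proportional to local accuracy."""
    return _weighted_vote(models, local_accuracies(models, validation, x0, k), x0)


def dynamic_voting_with_selection(models: List[Classifier], validation: Sequence[Example],
                                  x0: Sequence[float], k: int = 31) -> int:
    """Mix (assumed rule): drop models below the midpoint of the local accuracy range,
    then apply the weighted vote to the survivors (the best model always survives)."""
    acc = local_accuracies(models, validation, x0, k)
    cutoff = (min(acc) + max(acc)) / 2
    kept = [(m, a) for m, a in zip(models, acc) if a >= cutoff]
    return _weighted_vote([m for m, _ in kept], [a for _, a in kept], x0)
```

Because the weights are recomputed for every test instance, a model trained on an old block can still dominate in a region where the concept has not changed, while being ignored where it has drifted.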

17 Contact Info
Mykola Pechenizkiy, Department of Mathematical Information Technology, University of Jyväskylä, FINLAND
THANK YOU!
MS PowerPoint slides of this and other recent talks and full texts of selected publications are available online at:

18 Additional Slides …

19 Antibiotic Resistance in Nosocomial Infections
% of patients admitted to hospital acquire an infection during their stay, and that the risk of hospital-acquired, or nosocomial, infection has risen steadily in recent decades.
The frequency depends mostly on the type of operation performed, being higher for dirty operations (10-40%) and lower for clean operations (3-7%).
For example, a serious infectious complication such as postoperative meningitis is often the result of a nosocomial infection.
Antibiotics are the drugs commonly used to fight infections caused by bacteria. According to statistics from the Centers for Disease Control and Prevention (CDC), more than 70% of the bacteria that cause hospital-acquired infections are resistant to at least one of the antibiotics most commonly used to treat them.
Analysis of the microbiological data included in antibiograms collected in different institutions over different periods of time is considered one of the most important activities for restraining the spread of antibiotic resistance and avoiding its negative consequences.

20 Antibiotic sensitivity of different bacteria
[Figure: comparing the antibiotic sensitivity of different bacteria. © Jim Deacon, Institute of Cell and Molecular Biology, The University of Edinburgh]

21 The emergence of antibiotic resistance
Effects of different antibiotics on the growth of a Bacillus strain. The right-hand image shows a close-up of the novobiocin disk (marked by an arrow on the whole plate). In this case, some individual mutant cells in the bacterial population were resistant to the antibiotic and gave rise to small colonies in the zone of inhibition.
© Jim Deacon, Institute of Cell and Molecular Biology, The University of Edinburgh

22 How Antibiotic Resistance Happens: Horizontal Gene Transfer (© Grace Yim and Fan Sozzi)

23 Mechanisms of Antibiotic Resistance (© Grace Yim and Fan Sozzi)

24 Mechanisms of Antibiotic Resistance
Antibiotic | Method of resistance
Chloramphenicol | reduced uptake into cell
Tetracycline | active efflux from the cell
β-lactams, Erythromycin, Lincomycin | eliminates or reduces binding of antibiotic to target
β-lactams, Erythromycin | hydrolysis
Aminoglycosides, Chloramphenicol, Fosfomycin, Lincomycin | inactivation of antibiotic by enzymatic modification
β-lactams, Fusidic Acid | sequestering of the antibiotic by protein binding
Sulfonamides, Trimethoprim | metabolic bypass of inhibited reaction
Sulfonamides, Trimethoprim | overproduction of antibiotic target (titration)
Bleomycin | binding of specific immunity protein to antibiotic

25 Dataset Characteristics
Patient and hospitalization related:
Sex | {Male, Female}
Age | Integer
Recurring stay | {True, False}
Days of stay in NSI | Integer
Days of stay in ICU | Integer
Days of stay in NSI before specimen was received | Integer
Bacterium is isolated when patient is in ICU | {True, False}
Main department | {1, …, 10}
Department of stay (departments + ICU) | {1, …, 11}
Pathogen and pathogen groups:
Pathogen name | {Pat_name1, …, Pat_name17}
Gram (+/-) | {True, False}
Staphylococcus | {True, False}
Enterococcus | {True, False}
Enterobacteria | {True, False}
Nonfermenters | {True, False}
Antibiotic and antibiotic groups:
Antibiotic name | {Ant_name1, …, Ant_name39}
Group1 | {True, False}
……
Group15 | {True, False}
Class:
Sensitivity | {Sensitive, Intermediate, Resistant}

26 Experiment design
In Naïve Bayes, a normal distribution was assumed for numeric features, and the Laplace correction with a multiplicative factor of 1 was used in probability estimation for categorical features.
C4.5 decision trees were built using 0.25 as the confidence factor for pruning and 2 as the minimum number of instances per leaf.
With all ensembles considered here, we use the simple so-called "replace the loser" ensemble pruning strategy:
– if the ensemble size is greater than or equal to 25, the worst classifier, according to the current validation estimates, is replaced with a new one trained on the most recent data.
We experimented with 5 different neighbourhood sizes k: 7, 15, 31, 63, and 127.
– Naturally, accuracy usually decreases as the neighbourhood grows, becoming closer to that of static voting.
– Our experiments demonstrated that DIC was not very sensitive to the size of the neighbourhood.
– One reason for this is the locally weighted learning scheme used, in which the more distant an instance is from the current test instance, the less influence it has on the estimate of local performance.
– However, the smaller neighbourhoods (7 and 15) sometimes result in noisy performance estimates and inferior accuracies (especially with DS).
– We continue our analysis of the experimental results focusing on a neighbourhood size of 31, as it usually gives the best improvement due to DIC in the problems considered.
WEKA 3 environment: Data Mining Software in Java:
– http://www.cs.waikato.ac.nz/ml/weka/
– Default settings were used in the WEKA learning algorithms used in our experiments.
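The "replace the loser" pruning rule above can be sketched as follows. The limit of 25 comes from the slide; the rest is an assumption for illustration: models are plain callables, and the "current validation estimates" are taken to be plain accuracy on a labelled validation set supplied by the caller (for example, the most recent block).

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], int]
Example = Tuple[Tuple[float, ...], int]

MAX_SIZE = 25  # ensemble size limit from the experiment design


def accuracy(model: Model, validation: Sequence[Example]) -> float:
    # assumption: validation estimate = plain accuracy on the supplied validation set
    return sum(model(x) == y for x, y in validation) / len(validation)


def replace_the_loser(ensemble: List[Model], new_model: Model,
                      validation: Sequence[Example], max_size: int = MAX_SIZE) -> List[Model]:
    """Add a model trained on the newest data; once the ensemble has reached
    `max_size` members, the new model replaces the worst member instead."""
    if len(ensemble) < max_size:
        return ensemble + [new_model]
    scores = [accuracy(m, validation) for m in ensemble]
    loser = min(range(len(ensemble)), key=scores.__getitem__)
    updated = list(ensemble)
    updated[loser] = new_model
    return updated
```

The function returns a new list rather than mutating its argument, so the ensemble before and after each update can be compared when analysing results over sequential blocks.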

