Kansas State University Department of Computing and Information Sciences Laboratory for Knowledge Discovery in Databases (KDD) KDD Group Research Seminar.

1 KDD Group Research Seminar, Fall 2001 - Presentation 8
Incremental Learning
Friday, November 16
James Plummer (jwp1924@ksu.edu)
Reference: Mitchell, Tom M. Machine Learning. McGraw-Hill, 1997.

2 Outline
- Machine Learning
  - Extracting information from data
  - Forming concepts
- The Data
  - Arrangement of data: attributes, labels, and instances
  - Categorization of data
  - Results
- MLJ (Machine Learning in Java)
  - Collection of machine learning algorithms
  - Current inducers
- Incremental Learning
  - Description of technique
  - Nearest-neighbor algorithm
  - Distance-weighted algorithm
- Advantages and Disadvantages
  - Gains and losses

3 Machine Learning
Sometimes called data mining: the process of extracting useful information from data
- Example sources: marketing databases, medical databases, weather databases
  - e.g., finding consumer purchase patterns
Used to form concepts:
- Predictions
- Classifications
- Numeric answers

4 The Data
Arrangement of data
- A piece of data is a set of attributes a_i that make up an instance x_j
  - Attributes can be considered evidence
- Each instance has a label or category f(x_j) (outcome value):
  x_j = a_1, a_2, a_3, ..., a_i; f(x_j)
- A data set is a set of instances
Categorization
- A set of instances is used as the reference for new query instances x_q (training)
- Calculate f^(x_q) based on the training data
  - f^(x_q) is the predicted value of the actual f(x_q)
Results
- Accuracy: the number of correctly predicted values over the total number of query instances
  - f^(x_q) correct / f(x_q) total
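The accuracy measure described above (correct predictions over total query instances) can be sketched in Java; the class and method names here are illustrative, not part of MLJ:

```java
// Accuracy as defined on the slide: the number of query instances
// where the prediction f^(x_q) equals the actual label f(x_q),
// divided by the total number of query instances.
public class Accuracy {
    static double accuracy(String[] predicted, String[] actual) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++)
            if (predicted[i].equals(actual[i])) correct++;
        return (double) correct / actual.length;
    }

    public static void main(String[] args) {
        String[] fHat = {"Yes", "No", "Yes", "Yes"}; // predicted f^(x_q)
        String[] f    = {"Yes", "No", "No",  "Yes"}; // actual f(x_q)
        System.out.println(accuracy(fHat, f)); // prints 0.75 (3 of 4 correct)
    }
}
```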

5 Data Example
Predict the values of examples 6, 7, and 8 given data examples 1 through 5.

Example  Outlook   Air Temp  Humidity  Wind    Play Tennis?
1        Sunny     Hot       High      Weak    No
2        Sunny     Hot       High      Strong  No
3        Overcast  Hot       High      Weak    Yes
4        Rain      Mild      High      Weak    Yes
5        Rain      Cool      Normal    Weak    Yes
6        Rain      Cool      Normal    Strong  ?
7        Overcast  Cool      Normal    Strong  ?
8        Sunny     Mild      High      Weak    ?

(Answers revealed on the slide, in order: Yes, No, Yes.)
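One way to make a prediction for this table is nearest neighbor over the nominal attributes. The slides do not define a distance for nominal values, so as an assumption this sketch counts mismatched attribute values (a Hamming-style distance); the class name `PlayTennisKnn` is hypothetical:

```java
import java.util.*;

// k-nearest neighbor over the Play Tennis table using a simple
// mismatch count as the distance between nominal instances.
// (Assumption: the slides leave the nominal distance unspecified.)
public class PlayTennisKnn {
    // Number of attributes on which the two instances disagree.
    static int distance(String[] xi, String[] xj) {
        int d = 0;
        for (int r = 0; r < xi.length; r++)
            if (!xi[r].equals(xj[r])) d++;
        return d;
    }

    // f^(x_q): most frequent label among the k nearest training instances.
    static String classify(String[][] train, String[] labels, String[] xq, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingInt(i -> distance(train[i], xq)));
        Map<String, Integer> votes = new HashMap<>();
        for (int n = 0; n < k; n++)
            votes.merge(labels[idx[n]], 1, Integer::sum);
        return Collections.max(votes.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        String[][] train = {
            {"Sunny", "Hot", "High", "Weak"},     // 1: No
            {"Sunny", "Hot", "High", "Strong"},   // 2: No
            {"Overcast", "Hot", "High", "Weak"},  // 3: Yes
            {"Rain", "Mild", "High", "Weak"},     // 4: Yes
            {"Rain", "Cool", "Normal", "Weak"},   // 5: Yes
        };
        String[] labels = {"No", "No", "Yes", "Yes", "Yes"};
        // Query: example 6 (Rain, Cool, Normal, Strong) with k = 3.
        System.out.println(classify(train, labels,
                new String[]{"Rain", "Cool", "Normal", "Strong"}, 3)); // prints Yes
    }
}
```

For example 6, the nearest neighbors are example 5 (distance 1) and examples 2 and 4 (distance 3), so the vote is Yes 2, No 1.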

6 MLJ (Machine Learning in Java)
MLJ is a collection of learning algorithms
- Inducers: categorize data to learn concepts
Currently in development:
- ID3: uses decision trees
- Naive Bayes: uses probabilistic calculations
- C4.5: uses decision trees with pruning techniques
- Incremental learning: uses comparison techniques (soon to be added)

7 Incremental Learning
Instance-based learning: k-nearest neighbor
- All instances correspond to points in an n-dimensional space
- The distance between two instances is determined by*:
  d(x_i, x_j) = sqrt( sum over r of ( a_r(x_i) - a_r(x_j) )^2 )
  where a_r(x) is the r-th attribute of instance x
- Given a query instance x_q to be categorized, the k nearest neighbors are calculated
- f^(x_q) is assigned the most frequent value among the nearest k values of f(x_j)
- For k = 1, f^(x_q) is assigned f(x_i), where x_i is the closest instance in the space
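The procedure on this slide can be sketched as follows. This is a minimal illustration, not MLJ's actual inducer API; the class and method names are invented for the example:

```java
import java.util.*;

// Minimal k-nearest-neighbor sketch: Euclidean distance in the
// n-dimensional attribute space, majority vote among the k nearest.
public class KNearestNeighbor {
    // d(x_i, x_j) = sqrt( sum_r (a_r(x_i) - a_r(x_j))^2 )
    static double distance(double[] xi, double[] xj) {
        double sum = 0.0;
        for (int r = 0; r < xi.length; r++) {
            double diff = xi[r] - xj[r];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // f^(x_q): the most frequent f(x_j) among the k nearest neighbors.
    static String classify(double[][] train, String[] labels, double[] xq, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance(train[i], xq)));
        Map<String, Integer> votes = new HashMap<>();
        for (int n = 0; n < k; n++)
            votes.merge(labels[idx[n]], 1, Integer::sum);
        return Collections.max(votes.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        double[][] train = {{0, 0}, {0, 1}, {5, 5}, {5, 6}};
        String[] labels = {"No", "No", "Yes", "Yes"};
        // Query near the "Yes" cluster; its 3 nearest neighbors
        // are (5,5), (5,6), and (0,1), so the vote is Yes 2, No 1.
        System.out.println(classify(train, labels, new double[]{4.5, 5.0}, 3)); // prints Yes
    }
}
```

Note that training is essentially free (the instances are just stored), which matches the "very small training time" point later in the deck.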

8 Distance-Weighted Nearest Neighbor
Same as k-nearest neighbor, except:
- The effect of each f(x_j) on f^(x_q) is weighted by the distance d(x_q, x_j), so closer neighbors count more
- In the case x_q = x_i, f^(x_q) = f(x_i)
- Examine three cases for the two-dimensional space shown on the slide (figure omitted): k = 1; k = 5; weighted, k = 5
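A common choice of weight, used in the cited Mitchell text, is the inverse square of the distance. The sketch below assumes that weighting; names are again illustrative, not MLJ's API:

```java
import java.util.*;

// Distance-weighted k-NN: each of the k nearest neighbors votes with
// weight w = 1 / d(x_q, x_j)^2 (inverse-square weighting, per Mitchell),
// and an exact match x_q = x_i returns f(x_i) directly.
public class DistanceWeightedKnn {
    static double distance(double[] xi, double[] xj) {
        double sum = 0.0;
        for (int r = 0; r < xi.length; r++) {
            double diff = xi[r] - xj[r];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    static String classify(double[][] train, String[] labels, double[] xq, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance(train[i], xq)));
        Map<String, Double> votes = new HashMap<>();
        for (int n = 0; n < k && n < idx.length; n++) {
            double d = distance(train[idx[n]], xq);
            if (d == 0.0) return labels[idx[n]]; // x_q = x_i, so f^(x_q) = f(x_i)
            votes.merge(labels[idx[n]], 1.0 / (d * d), Double::sum);
        }
        return Collections.max(votes.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // One very close "A" against three distant "B"s: an unweighted
        // k = 4 majority vote would say "B", but weighting says "A".
        double[][] train = {{0, 0}, {10, 10}, {10, 11}, {11, 10}};
        String[] labels = {"A", "B", "B", "B"};
        System.out.println(classify(train, labels, new double[]{0.5, 0}, 4)); // prints A
    }
}
```

The example in main shows why the weighted k = 5 case on the slide can differ from the plain k = 5 case: a single very close neighbor can outvote several distant ones.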

9 Advantages and Disadvantages
Gains of using k-nearest neighbor:
- Individual attributes can be weighted differently
- d(x_i, x_q) can be changed to give the nearest x_i a stronger or weaker effect on f^(x_q)
- Robust to noise in the training data
- Very effective when provided a large set of training data
- Flexible: f^(x_q) can be calculated in many useful ways
- Very small training time
Losses:
- Not good when training data is insufficient
- Not very effective if similar x_i have dissimilar f(x_i)
- More computation time is needed to categorize new instances

10 References
Mitchell, Tom M. Machine Learning. McGraw-Hill, 1997.
Witten, Ian H., and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.

* Equation reduced for simplicity.

