Presentation on theme: "Machine learning Overview"— Presentation transcript:
1 Machine Learning Overview
PD Dr. Gabriella Kókai
Friedrich-Alexander-Universität, Lehrstuhl für Informatik 2
Room: Tel.:
2 Machine Learning: Content
Why Machine Learning?
How can a learning problem be defined?
Designing a learning system: learning to play checkers
Perspectives and questions in ML
Summary
3 Why Machine Learning? (1/10)
Webster's definition of "learn": "To gain knowledge, or understanding of, or skill in, by study, instruction, or experience."
Simon's definition (Machine Learning I, 1993, Chapter 2): "Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more effectively the next time."
Donald Michie's definition (Computer Journal, 1991): "A learning system uses sample data to generate an updated basis for improved (performance) on subsequent data from the same source and expresses the new basis in intelligible symbolic form."
4 Why Machine Learning? (2/10)
Machine learning is typically thought of as a subtopic of artificial intelligence.
It is inspired by several disciplines: computer science, cognitive science, pattern recognition, and statistics.
5 Why Machine Learning? (3/10)
Relevant topics:
  Artificial intelligence: learning symbolic representations of concepts; ML as a search problem; prior knowledge and training examples guide the learning process.
  Bayesian methods: calculating the probabilities of hypotheses; the Bayesian classifier.
  Computational complexity theory: theoretical bounds on the complexity of different learning tasks, measured in terms of computational effort, the number of training examples, and the number of mistakes required in order to learn.
  Information theory: measures of entropy; minimum description length; optimal codes and their relationship to optimal training sequences for encoding a hypothesis.
  Philosophy: Occam's razor, suggesting that the simplest hypothesis is the best.
  Psychology and neurobiology: motivation for neural networks (NN); the power law of practice.
  Statistics: characterisation of the errors (e.g. bias, variance) that occur when estimating the accuracy of a hypothesis based on a limited sample of data; confidence intervals; statistical tests.
Goal: description of the different learning paradigms, the algorithms, the theoretical results, and applications.
6 Why Machine Learning? (4/10)
Dimensions and constraints:
  Task/objective: learning task; performance task.
  Availability of background knowledge: encoded; interactive.
  Availability of data: incremental vs. batch; passive vs. active.
  Characteristics of the data: static vs. drifting; propositional or first-order.
9 Why Machine Learning? (7/10)
Evaluation methodologies:
  Mathematical:
    Previously: learning in the limit.
    Now: PAC (Probably Approximately Correct) learning, which is more tolerant and addresses efficiency constraints.
    Recent: best-case analysis (helpful teacher model); average-case analysis (constraining assumptions).
  Empirical:
    Used when a mathematical analysis isn't obvious.
    Popular.
    Data intensive.
  Psychological:
    Goal: model human learning behaviour.
    Method: comparison with subject data.
10 Why Machine Learning? (8/10)
Knowledge-poor supervised learning:
  Given: a training set of annotated instances.
  To induce: a hypothesis (concept description).
Knowledge-intensive supervised learning:
  Given: a set of training instances plus a hypothesis of the target concept (background knowledge).
  To induce: a modified hypothesis (concept description) that is consistent with the domain theory and the training instances.
Unsupervised learning (clustering):
  Given: a set of unclassified instances I, which have no special target attribute.
  To do: create a set of clusters for I according to their presumed classes. Clusters need not be disjoint, and clusters can be hierarchically related.
12 Why Machine Learning? (10/10)
Importance: how can computers be programmed so that they "learn"?
Machine learning and natural learning.
Application areas:
  Data mining: automatic detection of regularities in large amounts of data.
  Implementation of software that cannot easily be programmed by hand.
  Self-adaptive programs: programs for playing games.
  Theoretical results: the relationship between the number of training examples, the hypothesis, and the expected error.
  Biological studies.
13 How can the learning problem be defined?
Definition: a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Example: learning to play checkers
  Task T: playing checkers.
  Performance measure P: the percentage of games won.
  Experience E: playing against itself.
14 Content
Why Machine Learning?
How can the learning problem be defined?
  Choosing the training experience
  Choosing the target function
  Choosing the representation of the target function
  Choosing a function approximation algorithm
Designing a learning system: learning to play checkers
Perspectives and questions in ML
Summary
15 Choosing the Training Experience (1/2)
What experience is provided?
Direct or indirect feedback regarding the choices made by the system:
  Direct: individual checkers board states and the correct move for each.
  Indirect: move sequences and final outcomes.
  Problem: determining the degree to which each move in the sequence deserves credit or blame for the final outcome (credit assignment).
The degree to which the learner controls the sequence of training examples:
  The teacher selects informative board states and provides the correct move for each.
  The learner proposes board states that it finds particularly confusing and asks the teacher for the correct move.
  The learner has complete control over both the board states and the (indirect) training classifications, as it does when it learns by playing against itself with no teacher.
16 Choosing the Training Experience (2/2)
How well does the training experience represent the distribution of examples over which the final system performance P must be measured?
Problem: learning is most reliable when the distribution of the training examples is identical to the distribution of the test examples, but this often does not hold. In the problem below, self-play may never produce positions that tournament opponents will produce.
A checkers learning problem:
  Task T: playing checkers.
  Performance measure P: percentage of games won in the world tournament.
  Training experience E: games played against itself.
17 Choosing the Target Function (1/2)
What type of knowledge will be learned, and how will it be used by the performance program?
Example: the program needs to learn how to choose the best move from any board state.
  $ChooseMove: B \to M$, where B is the set of legal board states and M is the set of legal moves.
Problem: ChooseMove is difficult to learn if only indirect training experience is available to the system.
  => Alternative target function $V: B \to \mathbb{R}$, where B is the set of legal board states and $\mathbb{R}$ is the set of real values.
18 Choosing the Target Function (2/2)
Question: how should the target function $V$ be defined?
  If b is a final board state that is won, then $V(b) = 100$.
  If b is a final board state that is lost, then $V(b) = -100$.
  If b is a final board state that is drawn, then $V(b) = 0$.
  If b is not a final state in the game, then $V(b) = V(b')$, where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally as well).
Problem: although this recursive definition specifies a value of $V(b)$ for every board state b, it is not usable by our checkers player because it is not efficiently computable.
Solution: discover an operational description of the ideal target function $V$. This is difficult, so in practice the learner acquires only some approximation $\hat{V}$.
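The recursive definition of V can be sketched on a toy game tree to show why it is non-operational: every evaluation searches all the way to the end of the game. The `Node` class, the `ideal_value` function, and the final-state values 100, -100 and 0 are assumptions made for this illustration (the values follow the usual textbook convention); a real checkers tree is far too large to search this way.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed values for final states: won, lost, drawn.
FINAL_VALUES = {"won": 100, "lost": -100, "drawn": 0}


@dataclass
class Node:
    """Hypothetical toy game state, not a real checkers board."""
    outcome: Optional[str] = None            # set only for final states
    children: List["Node"] = field(default_factory=list)


def ideal_value(b: Node, our_turn: bool = True) -> int:
    """Return V(b) by exhaustive search, assuming both sides play optimally."""
    if b.outcome is not None:
        return FINAL_VALUES[b.outcome]
    values = [ideal_value(child, not our_turn) for child in b.children]
    # We pick the best successor on our turn; the opponent picks the worst for us.
    return max(values) if our_turn else min(values)


# Tiny example: we move first, then the opponent moves, then the game ends.
left = Node(children=[Node("won"), Node("lost")])     # opponent will force a loss
right = Node(children=[Node("drawn"), Node("won")])   # opponent will force a draw
root = Node(children=[left, right])
root_value = ideal_value(root)
```

The cost of this search grows exponentially with the depth of the game, which is the reason for learning an approximation instead.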
19 Choosing a Function Approximation Algorithm (1/2)
How can $\hat{V}$ be represented?
For any given board state b, the function $\hat{V}(b)$ is calculated as a weighted linear combination of the following board features:
  bp(b): the number of black pieces on the board
  rp(b): the number of red pieces on the board
  bk(b): the number of black kings on the board
  rk(b): the number of red kings on the board
  bt(b): the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
  rt(b): the number of red pieces threatened by black
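The linear representation over the six board features can be sketched as follows. The function name `v_hat`, the feature ordering, the constant term `w0`, and the example weights are assumptions for illustration; the slide itself does not give concrete weights.

```python
from typing import Sequence

# Assumed feature order: bp, rp, bk, rk, bt, rt.
FEATURE_NAMES = ("bp", "rp", "bk", "rk", "bt", "rt")


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Evaluate w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt."""
    if len(features) != len(FEATURE_NAMES):
        raise ValueError("expected one value per board feature")
    if len(weights) != len(features) + 1:
        raise ValueError("expected one weight per feature plus the constant w0")
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# Hypothetical weights: own pieces count positively, opposing pieces negatively.
example_weights = [1, 2, -2, 3, -3, -1, 1]
# A board with 3 black pieces, 1 black king and nothing else.
example_features = (3, 0, 1, 0, 0, 0)
example_score = v_hat(example_weights, example_features)
```

With this representation, learning the evaluation function reduces to learning seven numbers.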
20 Choosing a Function Approximation Algorithm (2/2)
Partial design of a checkers learning program:
  Task T: playing checkers.
  Performance measure P: percentage of games won in the world tournament.
  Training experience E: games played against itself.
  Target function: $V: B \to \mathbb{R}$
  Target function representation: $\hat{V}(b) = w_0 + w_1\,bp(b) + w_2\,rp(b) + w_3\,bk(b) + w_4\,rk(b) + w_5\,bt(b) + w_6\,rt(b)$
21 Choosing a Function Approximation Algorithm: Estimating Training Values
How should training values be assigned to the more numerous intermediate board states?
Approach: assign the training value $V_{train}(b)$ for any intermediate board state b to be $\hat{V}(Successor(b))$, where $\hat{V}$ is the learner's current approximation to $V$ and $Successor(b)$ denotes the next board state following b for which it is again the program's turn to move.
Rule for estimating the training values: $V_{train}(b) \leftarrow \hat{V}(Successor(b))$
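The estimation rule can be sketched for a single game trace. This is a minimal illustration under stated assumptions: `states` holds the successive board states at which it was the program's turn, the last entry is the final board state, and `final_value` is its known outcome value. The function name and the toy evaluator are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Board = TypeVar("Board")


def estimate_training_values(
    states: Sequence[Board],
    final_value: float,
    v_hat: Callable[[Board], float],
) -> List[Tuple[Board, float]]:
    """Pair every board state in a game trace with a training value."""
    examples = []
    for i, b in enumerate(states):
        if i == len(states) - 1:
            # The final state has a known value, so no estimate is needed.
            target = final_value
        else:
            # Intermediate state: use the current estimate for Successor(b).
            target = v_hat(states[i + 1])
        examples.append((b, target))
    return examples


# Toy usage: boards are plain integers and the evaluator multiplies by 10.
toy_examples = estimate_training_values([1, 2, 3], 100, lambda b: b * 10)
```

Because the targets come from the learner's own current estimates, they improve only as the estimates for states nearer the end of the game improve.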
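22 Choosing a Function Approximation Algorithm: Adjusting the Weights
LMS weight update rule (choosing the weights to best fit the set of training examples)
Best fit: minimise the squared error E between the training values and the values predicted by the hypothesis:
  $E \equiv \sum_{\langle b, V_{train}(b) \rangle \in \text{training examples}} (V_{train}(b) - \hat{V}(b))^2$
For each training example $\langle b, V_{train}(b) \rangle$:
  Use the current weights to calculate $\hat{V}(b)$.
  For each weight $w_i$, update $w_i \leftarrow w_i + c\,(V_{train}(b) - \hat{V}(b))\,x_i$, where $x_i$ is the value of the i-th board feature.
c is a small constant that moderates the size of the weight update.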
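A single LMS step for the linear evaluation function can be sketched as below. The function name `lms_update`, the default `c=0.1`, and the convention of prepending a constant input of 1 for the weight `w0` are assumptions made for this illustration.

```python
from typing import List, Sequence


def lms_update(
    weights: Sequence[float],
    features: Sequence[float],
    v_train: float,
    c: float = 0.1,
) -> List[float]:
    """Return new weights after one LMS step on a single training example."""
    x = [1.0] + list(features)               # x0 = 1 pairs with the constant w0
    prediction = sum(w * xi for w, xi in zip(weights, x))
    error = v_train - prediction             # V_train(b) - V_hat(b)
    # Each weight moves in proportion to the error and to its own feature value.
    return [w + c * error * xi for w, xi in zip(weights, x)]


# Toy usage: all weights start at zero, and the board has 3 black pieces and 1 black king.
start_weights = [0.0] * 7
board_features = (3, 0, 1, 0, 0, 0)
updated_weights = lms_update(start_weights, board_features, v_train=100.0)
```

Weights whose feature value is zero on this board are left unchanged, and a zero error leaves every weight unchanged, so only features that actually occur are credited or blamed.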
23 Some Issues in Machine Learning
What algorithms can approximate functions well, and when?
How does the number of training examples influence the accuracy?
How does the complexity of the hypothesis representation impact the accuracy?
How does noisy data influence the accuracy?
What are the theoretical limits of learnability?
How can prior knowledge of the learner help?
What clues can we get from biological learning systems?
How can systems alter their own representation?
24 Summary
Goal: building computer programs that improve their performance at some task through experience.
Application domains:
  Data mining: automatically discovering implicit regularities in large data sets.
  Poorly understood domains where humans might not have the knowledge needed to develop effective algorithms.
  Domains where the program must dynamically adapt to changing conditions.
ML draws on ideas from several disciplines, including artificial intelligence, probability and statistics, computational complexity, information theory, psychology and neurobiology, control theory, and philosophy.
Well-defined learning problem = well-specified task + performance metric + source of training examples.