CONTENTS
1. Introduction
2. The Basic Checker-playing Program
3. Rote Learning and Its Variants
4. Learning Procedure Involving Generalizations
5. Rote Learning vs. Generalization

INTRODUCTION
General Methods of Approach
Choice of Problem: ‘Checkers’
- Heuristic procedures
- A definite goal (final goal)
- At least one intermediate goal (criterion)
- Definite rules of activity
- The learning process can be tested
- Familiar and understandable

The Basic Checker-playing Program
General method from ‘Shannon, 1950’, as applied to chess:
1. Alternatives
   Which alternative moves are to be considered?
2. Analysis
   a. Which continuations are to be explored and to what depth?
   b. How are positions to be evaluated in terms of their patterns?
   c. How are the evaluations to be integrated into a single value for an alternative?
3. Final choice procedure
   What procedure is to be used to select the final preferred move?

The Basic Checker-playing Program (Cont’d)
[Figure: look-ahead tree with backed-up scores. Ply 1: proposed move by machine; ply 2: anticipated reply by opponent; ply 3: proposed move by machine.]
- Exploration to ply level 3
- Evaluation with scoring polynomial
- Selection of alternative by ‘minimax’ procedure
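The minimax selection over a 3-ply tree can be sketched as follows. This is a minimal illustration, not the program's actual code: the tree is a nested list whose leaves stand for scoring-polynomial values, and the names `minimax`, `best_move`, and the sample `tree` are hypothetical, with scores chosen for illustration rather than read off the slide.

```python
from typing import List, Union

# A node is either a leaf score (int) or a list of child nodes.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, machine_to_move: bool = True) -> int:
    """Back up leaf scores: the machine maximizes, the opponent minimizes."""
    if isinstance(node, int):  # leaf: value from the scoring polynomial
        return node
    values = [minimax(child, not machine_to_move) for child in node]
    return max(values) if machine_to_move else min(values)


def best_move(root: List[Tree]) -> int:
    """Index of the ply-1 alternative with the highest backed-up score."""
    backed_up = [minimax(child, machine_to_move=False) for child in root]
    return max(range(len(backed_up)), key=backed_up.__getitem__)


# Hypothetical 3-ply tree: machine move, opponent reply, machine move.
tree = [
    [[20, 5], [100, 50]],     # alternative 0
    [[3, -10], [4, -3]],      # alternative 1
    [[-70, -100], [-20, 0]],  # alternative 2
]
```

Here `minimax(tree)` backs the leaf scores up to the root, and `best_move(tree)` picks the ply-1 alternative whose backed-up score is largest.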

Ply Limitations
The look-ahead depth depends on the board conditions:
a. Set a minimum distance.
b. The program continues looking ahead when the next move is a jump, the last move was a jump, or an exchange offer is possible.
The limits are set to give the desired results.
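A rough sketch of such a termination rule is shown below. The `Position` flags, the function name, and the values of `min_ply` and `max_ply` are assumptions made for illustration; in particular, the upper cap `max_ply` is not mentioned on the slide and is added only so that the search always terminates.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Hypothetical summary of the board conditions the rule inspects."""
    next_move_is_jump: bool = False
    last_move_was_jump: bool = False
    exchange_offer_possible: bool = False


def keep_looking_ahead(pos: Position, ply: int,
                       min_ply: int = 3, max_ply: int = 20) -> bool:
    """Decide whether to search one ply deeper from this position."""
    if ply >= max_ply:   # assumed safety cap, not from the slide
        return False
    if ply < min_ply:    # always search at least the minimum distance
        return True
    # Beyond the minimum, continue only while the position is unsettled.
    return (pos.next_move_is_jump
            or pos.last_move_was_jump
            or pos.exchange_offer_possible)
```

A quiet position stops at the minimum distance, while a position in the middle of a jump sequence keeps being explored.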

Other Modes of Play
- Have the program play both sides of the game
- Follow book games: evaluate the book move against the move proposed by the machine (correlation coefficient)
- Have the program play several simultaneous games against different opponents

Scoring polynomial
a. Measure of intermediate goals
b. Linear polynomial: sum of terms multiplied by coefficients
   f(x,c) = c_1 g_1(x) + c_2 g_2(x) + … + c_j g_j(x)
   g_i(x): terms selected from a list of 38 parameters
   c_i: coefficients which multiply these parameters

Scoring polynomial (Cont’d)
c. Each term relates to the relative standings of the two sides with respect to the parameter in question: it is the difference between the ratings for the individual sides.
d. Dominant parameters: inability to move, relative piece advantage
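The linear scoring polynomial can be illustrated with a short sketch. The two terms below (`piece_advantage`, `mobility`), the dictionary board encoding, and the coefficient values are hypothetical stand-ins, not entries from the 38-parameter list; each term is written as a difference between the two sides, as item c describes.

```python
from typing import Callable, Dict, Sequence

Board = Dict[str, int]            # assumed board summary, for illustration only
Term = Callable[[Board], float]   # g_i(x)


def piece_advantage(b: Board) -> float:
    """Difference in piece counts between the machine and the opponent."""
    return b["machine_pieces"] - b["opponent_pieces"]


def mobility(b: Board) -> float:
    """Difference in the number of available moves."""
    return b["machine_moves"] - b["opponent_moves"]


def score(board: Board, terms: Sequence[Term], coeffs: Sequence[float]) -> float:
    """f(x,c) = c_1 g_1(x) + ... + c_j g_j(x)."""
    if len(terms) != len(coeffs):
        raise ValueError("need one coefficient per term")
    return sum(c * g(board) for c, g in zip(coeffs, terms))


board = {"machine_pieces": 8, "opponent_pieces": 7,
         "machine_moves": 5, "opponent_moves": 9}
terms = [piece_advantage, mobility]
coeffs = [16.0, 1.0]  # illustrative weights: piece advantage dominates
```

With these values, `score(board, terms, coeffs)` is 16·1 + 1·(−4) = 12.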

Selection of the best next move depends on the evaluation process.
Learning involves improving the evaluation as a result of ‘experiences’.

Rote Learning and Its Variants
Storage scheme
- Simply save all of the board positions encountered during play, together with their computed scores.
- Reference is made to this memory record.
Improvement
- Reduced computing time
- Looking much farther in advance
- Sense of direction

Rote Learning and Its Variants (Cont’d)
[Figure: table of stored board positions and their scores feeding the look-ahead tree; labels: ply level 6, learning, improvement.]
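The storage scheme amounts to a lookup table keyed by board position. The sketch below is a minimal version under stated assumptions: boards are any hashable value, `compute` stands for the look-ahead evaluation, and the class name `RoteMemory` and its `hits` counter are hypothetical.

```python
from typing import Callable, Dict, Hashable


class RoteMemory:
    """Remember the score computed for each board position seen so far."""

    def __init__(self) -> None:
        self.scores: Dict[Hashable, int] = {}
        self.hits = 0  # how often a stored score replaced a fresh computation

    def evaluate(self, board: Hashable,
                 compute: Callable[[Hashable], int]) -> int:
        """Return the stored score if present; otherwise compute and store it."""
        if board in self.scores:
            self.hits += 1
            return self.scores[board]
        value = compute(board)
        self.scores[board] = value
        return value
```

Because a stored score already summarizes an earlier look-ahead, finding it at a leaf of the current search lets the program see effectively farther ahead without extra computation.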

Cataloging & Culling Stored Information
Limits: the number of boards that can be saved, and long search time
a. Catalog boards that are saved: standardizing & grouping
b. Delete redundancies
c. Discard board positions
   - Method based on frequency of use: refreshing & forgetting
   - Method based on ply: cull lowest-ply board positions
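One way to picture refreshing and forgetting is an age counter on each stored record. The details below are assumptions for illustration only: ages grow by one on each pass, a referenced record has its age halved, and a record is dropped once its age exceeds `max_age`. The names `Record`, `refresh`, and `age_and_forget` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Record:
    score: int
    age: int = 0


def refresh(memory: Dict[str, Record], board: str) -> int:
    """Use a stored record: halve its age so frequently used boards survive."""
    rec = memory[board]
    rec.age //= 2
    return rec.score


def age_and_forget(memory: Dict[str, Record], max_age: int) -> None:
    """Age every record by one step and forget those that grew too old."""
    for board in list(memory):  # copy the keys so deletion is safe
        memory[board].age += 1
        if memory[board].age > max_age:
            del memory[board]
```

A board that keeps being referenced stays young and is retained; one that is never referenced ages out and frees its storage.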

Rote-learning Tests
Conclusions:
a. Effective rote learning needs a sense of direction and a refined system for cataloging and storing information.
b. Efficiency depends on the data-handling capacity of the computer.
c. More information must be stored to improve midgame play.
d. The game is a suitable vehicle for use during the development of learning techniques.

Learning Procedure Involving Generalizations
An obvious way to decrease the amount of storage needed to utilize the past experience is to generalize on the basis of experience and to save only the generalizations. Generalize on experience after each move by adjusting the coefficients in the evaluation polynomial and by replacing terms which appear to be unimportant by new parameters drawn from a reserve list.

Learning Procedure Involving Generalizations (Cont’d)
A Scoring System
Y = f(X)
X: current board position
Y: an estimate for the backed-up score
[Figure: look-ahead tree with backed-up scores; labels: ply level 6, evaluation, improvement, learning.]

[Figure: board positions paired with their backed-up scores from ply level 3; the scoring system f(x,c), a linear polynomial, is the function relating board position to backed-up score.]

Scoring Polynomial for generalization:
f(x,c) = c_1 g_1(x) + c_2 g_2(x) + … + c_j g_j(x)
g_i(x): terms selected from a list of 38 parameters
c_i: coefficients which multiply these parameters
The learning procedure involves, after each move,
- adjusting the coefficients
- replacing terms which appear to be unimportant by new parameters

Training
- Alpha (with learning) & Beta program (without learning): determine the relative ability of Alpha
- Manual intervention (arbitrary change in scoring polynomial)

Polynomial Modification Procedure
Initial scoring polynomial: f(x,c) = c_1 g_1(x) + c_2 g_2(x) + … + c_j g_j(x)
At a given board position x_k:
a. Compute the scoring polynomial f(x_k,c) and save it.
b. Compute the backed-up score y_k, using the look-ahead procedure.

Polynomial Modification Procedure (Cont’d)
Delta = y_k - f(x_k,c)
- An indicator of change
- Used to check the scoring polynomial and to adjust the weight (coefficient) of each term in the polynomial

Polynomial Modification Procedure (Cont’d)
Adjustment in the values of the coefficients
a. Correlation between the signs of the individual term contributions in the initial polynomial and the sign of delta
b. Adjustment takes into account the number of times that each term has been used and has had a nonzero value. If delta is positive, terms which contributed positively should have been given more weight, while those that contributed negatively should have been given less weight.
c. The coefficient for the term with the largest correlation coefficient is set at a prescribed maximum value, with proportionate values determined for all of the remaining coefficients.
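The adjustment steps can be sketched as a running sign-agreement tally per term. This is a simplified illustration rather than the program's actual procedure: it correlates the sign of each raw term value g_i(x_k) with the sign of delta, and the class name `CorrelationLearner` and the default `max_coeff` are assumptions.

```python
from typing import List, Sequence


class CorrelationLearner:
    """Track how often each term's sign agrees with the sign of delta."""

    def __init__(self, n_terms: int, max_coeff: float = 16.0) -> None:
        self.agree = [0] * n_terms  # agreements minus disagreements
        self.used = [0] * n_terms   # times the term was nonzero
        self.max_coeff = max_coeff  # assumed prescribed maximum

    def update(self, term_values: Sequence[float], delta: float) -> None:
        """Record one observation of delta = y_k - f(x_k, c)."""
        if delta == 0:
            return
        for i, g in enumerate(term_values):
            if g == 0:  # only nonzero uses are counted
                continue
            self.used[i] += 1
            self.agree[i] += 1 if (g > 0) == (delta > 0) else -1

    def coefficients(self) -> List[float]:
        """Scale so the best-correlated term gets the maximum coefficient."""
        corr = [a / u if u else 0.0 for a, u in zip(self.agree, self.used)]
        largest = max((abs(c) for c in corr), default=0.0)
        if largest == 0:
            return [0.0] * len(corr)
        return [self.max_coeff * c / largest for c in corr]
```

A term whose sign usually matches delta ends up with a large positive coefficient, while a term that usually disagrees is pushed negative, which is the direction described in item b.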

Instabilities
- Stabilizing against minor variations in the delta values: set an arbitrary minimum value of delta, fixed at the average value of the coefficients for the terms in the currently existing evaluation polynomial.
- Stabilizing violent fluctuations when a new term is introduced: replace the times-used number by an arbitrary number until the usage does, in fact, equal this number.

Term Replacement
- Low-term tally against the term with the lowest correlation coefficient
- Is it a satisfactory scheme to select terms for the evaluation polynomial?
Binary Connective Terms
- Combinational, nonlinear terms

Preliminary Learning-by-generalization Tests
- The learning procedure did work and the learning rate was high.
- Learning was quite erratic and none too stable.

Second Series of Tests
Four modifications for improving stability
Conclusions:
a. An effective learning device for problems amenable to tree-searching procedures
b. Modest memory requirements and reasonable operating time
c. Instability can be dealt with by straightforward procedures.
d. The machine can learn to play a better-than-average game of checkers.

Rote Learning vs. Generalization
Rote learning:
- Improvement is made by increasing data storage
- Good opening play and end-game play, poor middle game
Learning-by-generalization:
- Generalization on the experience by adjusting a scoring system
- Good middle game, weak opening play

[Figure: checkerboard square numbering, 1 to 35 with 9, 18, and 27 omitted.]
