
1 Supervised Learning I, Cont’d

2 Administrivia
- Machine learning reading group: not part of (or related to) this class. We read advanced (current-research) papers in the ML field. Might be of interest; all are welcome. Meets Fri, 3:00-4:30, FEC349 conf room. More info: http://www.cs.unm.edu/~terran/research/reading_group/
- Lecture notes online
- Pretest/solution set online

3 5 minutes of math... Solve the linear system V = R + cTV for the unknown vector V (V and R are vectors, T is a matrix, c is a scalar).

4 5 minutes of math... What if this were a scalar equation? v = r + ctv. Then just collect terms and divide: (1 - ct)v = r, so v = r/(1 - ct).

5 5 minutes of math... Not much different for linear systems; linear algebra was developed to make working with linear systems as easy as working with linear scalar equations: (I - cT)V = R, so V = (I - cT)^{-1}R. BUT matrix multiplication doesn't commute! NOTE: V = (I - cT)^{-1}R, not R(I - cT)^{-1}.
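
A side-by-side derivation, assuming the system really is V = R + cTV as reconstructed above (the original equations were images and did not survive the transcript):

\begin{align*}
  \text{scalar:}\quad v &= r + c\,t\,v  &  \text{matrix:}\quad V &= R + c\,T\,V \\
  (1 - c\,t)\,v &= r                    &  (I - c\,T)\,V &= R \\
  v &= (1 - c\,t)^{-1}\,r               &  V &= (I - c\,T)^{-1}\,R
\end{align*}
% The inverse multiplies from the LEFT: V = (I - cT)^{-1} R,
% not R (I - cT)^{-1}, because matrix multiplication doesn't commute.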

6 5 minutes of math... So when does this work? When does a solution for V exist, and when is it unique? Think back to the scalar version: v = r/(1 - ct) has a solution exactly when the divisor 1 - ct is nonzero. What's the moral equivalent for linear systems?

7 5 minutes of math... The moral equivalent of a scalar "0" is a "singular matrix". There are many ways to detect one; the simplest is the determinant. The system has a (unique) solution iff det(I - cT) ≠ 0 (checked numerically in the sketch below).
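
In practice you would hand this to a linear-algebra routine; below is a minimal, self-contained Java sketch (the 2x2 values of T, R, and the scalar c are invented for illustration) that forms A = I - cT and solves AV = R by Gaussian elimination, failing exactly when A is singular:

public class SolveLinearSystem {
    // Solve A v = r by Gaussian elimination with partial pivoting.
    // Throws when A is (numerically) singular, i.e. det(A) = 0.
    static double[] solve(double[][] a, double[] r) {
        int n = r.length;
        double[][] m = new double[n][n + 1];            // augmented matrix [A | r]
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n] = r[i];
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;                            // largest pivot, for stability
            for (int row = col + 1; row < n; row++)
                if (Math.abs(m[row][col]) > Math.abs(m[pivot][col])) pivot = row;
            double[] tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
            if (Math.abs(m[col][col]) < 1e-12)
                throw new ArithmeticException("singular matrix: no unique solution");
            for (int row = col + 1; row < n; row++) {   // eliminate below the pivot
                double f = m[row][col] / m[col][col];
                for (int k = col; k <= n; k++) m[row][k] -= f * m[col][k];
            }
        }
        double[] v = new double[n];                     // back-substitution
        for (int i = n - 1; i >= 0; i--) {
            double s = m[i][n];
            for (int k = i + 1; k < n; k++) s -= m[i][k] * v[k];
            v[i] = s / m[i][i];
        }
        return v;
    }

    public static void main(String[] args) {
        double c = 0.9;                                 // illustrative scalar
        double[][] t = {{0.5, 0.5}, {0.2, 0.8}};        // illustrative T
        double[] r = {1.0, 2.0};                        // illustrative R
        double[][] a = new double[2][2];                // A = I - cT
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                a[i][j] = (i == j ? 1.0 : 0.0) - c * t[i][j];
        double[] v = solve(a, r);
        System.out.println("V = [" + v[0] + ", " + v[1] + "]");
    }
}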

8 5 minutes of math... Finally, what "shapes" are all of the parts? The RHS and LHS must have the same shape... V is a column vector, so R must be a column vector too. What about the term cTV? Also a column vector.

9 5 minutes of math... Consider some cases. What if T is a vector? What about a rectangular matrix?

10 5 minutes of math... ⇒ For the term cTV to be a column vector matching V and R, T must be a square matrix:
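
The shape bookkeeping, written out (assuming V and R have n entries):

\[
  \underbrace{V}_{n \times 1}
  \;=\;
  \underbrace{R}_{n \times 1}
  \;+\;
  c\,\underbrace{T}_{n \times n}\,\underbrace{V}_{n \times 1}
\]

An n x n matrix times an n x 1 vector is n x 1, so every term matches; a vector or rectangular T would make the product (or the sum) ill-shaped.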

11 Review of notation
- Feature (attribute): x_j, one measured property of an example
- Instance (example): x = (x_1, ..., x_d), a single data point
- Label (class): y, the output we want to predict
- Feature space: X, the set of all possible instances
- Training data: D = {(x_i, y_i) : i = 1, ..., N}

12 Hypothesis spaces
- The "true" function f we want is usually called the target concept (also true model, target function, etc.)
- The set of all possible functions h we'll consider is called the hypothesis space, H
- NOTE! The target concept is not necessarily part of the hypothesis space!!!
- Example hypothesis spaces: all linear functions; quadratic & higher-order fns.
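
To make "hypothesis space" concrete, here is a minimal Java sketch (the class and field names are invented, not from the course): each space is a family of functions indexed by its parameters.

// Each hypothesis space is a parameterized family of predictors.
interface Hypothesis {
    double predict(double x);
}

class LinearHypothesis implements Hypothesis {        // h(x) = w1*x + w0
    final double w0, w1;
    LinearHypothesis(double w0, double w1) { this.w0 = w0; this.w1 = w1; }
    public double predict(double x) { return w1 * x + w0; }
}

class QuadraticHypothesis implements Hypothesis {     // h(x) = w2*x^2 + w1*x + w0
    final double w0, w1, w2;
    QuadraticHypothesis(double w0, double w1, double w2) {
        this.w0 = w0; this.w1 = w1; this.w2 = w2;
    }
    public double predict(double x) { return (w2 * x + w1) * x + w0; }
}

Every LinearHypothesis is a QuadraticHypothesis with w2 = 0, so the quadratic space contains the linear one; the target concept f may live in either, or in neither.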

13 Visually... [diagram: H drawn as a region inside the space of all functions on X; the target concept f might be here (inside H), or it might be here (outside H)...]

14 More hypothesis spaces: Rules

if (x.skin.equals("fur")) {
    if (x.liveBirth.equals("true")) {
        return "mammal";
    } else {
        return "marsupial";
    }
} else if (x.skin.equals("scales")) {
    switch (x.color) {
        case "yellow": return "coral snake";
        case "black":  return "mamba snake";
        case "green":  return "grass snake";
    }
} else {
    ...
}

15 More hypothesis spaces: Decision Trees [example decision tree figure]

16 More hypothesis spaces: Decision Trees [another decision tree figure]

17 Finding a good hypothesis
Our job is now: given an unknown target f in some function space and a hypothesis space H, find the best approximation h we can by searching H. [diagram again: H inside the space of all functions on X]

18 Measuring goodness
What does it mean for a hypothesis to be "as close as possible"? It could be a lot of things. For the moment, we'll think about accuracy. (Or, with a higher sigma-shock factor...)
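
The formula itself did not survive the transcript; the standard accuracy definition it presumably showed, in sigma form:

\[
  \mathrm{acc}(h) \;=\; \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\bigl[\, h(\mathbf{x}_i) = y_i \,\bigr]
\]

i.e., the fraction of the N training instances that h labels correctly, where the indicator 1[.] is 1 when the bracketed test is true and 0 otherwise.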

19 Constructing DT's, intro
Hypothesis space: the set of all trees, with all possible node labelings and all possible leaf labelings. How many are there?
Proposed search procedure:
1. Propose a candidate tree
2. Evaluate the accuracy of the candidate w.r.t. X and Y
3. Keep the max-accuracy tree seen so far
4. Go to 1
Will this work?
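
A back-of-the-envelope count suggests why not: with d boolean features, every one of the 2^(2^d) distinct boolean functions is computed by some tree, so

\[
  |H| \;\ge\; 2^{2^d}, \qquad \text{e.g. } d = 6 \;\Rightarrow\; 2^{64} \approx 1.8 \times 10^{19}
\]

far too many trees to propose and test one at a time.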

20 A more practical alg
Can't really search all possible trees. Instead, build the tree greedily and recursively:

// Input:  InstanceSet X, LabelSet Y
// Output: a decision tree
DecisionTree buildDecisionTree(X, Y) {
    if (pure(X, Y)) {
        return new Leaf(Y);                          // all labels agree: stop
    } else {
        Attribute a = getBestSplitAttribute(X, Y);   // choose the split
        DecisionNode n = new DecisionNode(a);
        [X1, ..., Xk, Y1, ..., Yk] = splitData(X, Y, a);  // one subset per value of a
        for (i = 1; i <= k; ++i) {
            n.addChild(buildDecisionTree(Xi, Yi));   // recurse on each subset
        }
        return n;
    }
}

21 A bit of geometric intuition [scatter plot: x1 = petal length vs. x2 = sepal width]

22 The geometry of DTs
- A decision tree splits the space with a series of axis-orthogonal (a.k.a. axis-parallel) decision surfaces
- Each test is equivalent to a half-space; intersecting the half-spaces along a root-to-leaf path yields a hyper-rectangle (the d-dimensional analogue of a rectangle)
- In each hyper-rectangle, the DT assigns a constant label
- So a DT is a piecewise-constant approximator over a set of hyper-rectangular regions
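
The same fact in code: a tree over numeric features is just nested axis-parallel threshold tests. A small sketch (the thresholds and class names are illustrative, loosely modeled on the iris plot from the previous slide):

class TreeGeometry {
    // A depth-2 decision tree: each test carves off a half-space, and each
    // leaf gets the hyper-rectangle formed by intersecting the tests on
    // its root-to-leaf path, labeled with a single constant class.
    static String classify(double petalLength, double sepalWidth) {
        if (petalLength < 2.5) {           // half-space: petalLength < 2.5
            return "setosa";
        } else if (sepalWidth < 3.0) {     // rectangle: petalLength >= 2.5 AND sepalWidth < 3.0
            return "versicolor";
        } else {
            return "virginica";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(1.4, 3.5));  // setosa
        System.out.println(classify(4.5, 2.8));  // versicolor
    }
}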

23 Filling out the algorithm
Still need to specify a couple of functions:
- pure(X,Y): determine whether we're done splitting the set X
- getBestSplitAttribute(X,Y): find the best attribute to split X on
pure() is the easy (easier, anyway) one...
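
A sketch of the easier one, under the assumption (not stated on the slide) that labels are plain strings:

import java.util.List;

class Purity {
    // pure(): we are done splitting when every instance that reached
    // this node carries the same label (or no instances remain).
    static boolean pure(List<String> labels) {
        return labels.stream().distinct().count() <= 1;
    }
}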

24 Splitting criteria
What properties do we want our getBestSplitAttribute() function to have?
- Increase the purity of the data: after the split, the new sets should be closer to uniform labeling than before the split
- Want the subsets to have roughly the same purity
- Want the subsets to be as balanced as possible
These choices are designed to produce small trees; one standard score with exactly this flavor is sketched below.
Definition: learning bias == the tendency to find one class of solution out of H in preference to another
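
The slide does not commit to a purity measure; information gain (the drop in label entropy) is the classic score with this flavor, sketched here in Java (names and types are assumptions, not from the course):

import java.util.*;

class SplitScore {
    // Entropy of a label multiset: H(Y) = -sum_c p_c log2 p_c.
    // 0 for a perfectly pure set; maximal for a uniform label mix.
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String l : labels) counts.merge(l, 1, Integer::sum);
        double h = 0.0, n = labels.size();
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Information gain of a candidate split: parent entropy minus the
    // size-weighted entropy of the children. It grows as the children
    // get purer, which is exactly the bias described above.
    static double infoGain(List<String> parent, List<List<String>> children) {
        double weighted = 0.0;
        for (List<String> child : children)
            weighted += (child.size() / (double) parent.size()) * entropy(child);
        return entropy(parent) - weighted;
    }
}

getBestSplitAttribute() would then loop over the candidate attributes and return the one whose split maximizes this gain.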

