4 Machine Learning. Suppose we are to design a robot. How is it to learn about the world? What is learning? How can it make decisions on what actions to take?
5 Types of question we want an AI construct to answer… Should the robot turn left or right down a road? Do we perform an operation or not? Is this person a terrorist or not? Does it think the world is flat or round (are scientific laws amenable to inference)?
6 Example (from Jaynes). Suppose some dark night a policeman walks down a street, apparently deserted; but suddenly he hears a burglar alarm, looks across the street, and sees a jewellery store with a broken window. Then a gentleman wearing a mask comes crawling out through the broken window, carrying a bag which turns out to be full of expensive jewellery. The policeman doesn't hesitate at all in deciding that this gentleman is dishonest. But by what reasoning process does he arrive at this conclusion?
7 Computer Vision. What inference should the robot draw from this image? How shall we operationalize this inference (e.g. write a computer program to carry it out)?
8 Discuss. How should the robot decide whether the person shown previously is a burglar?
9 Discuss. How should the robot decide upon the following action: whether to phone the police or not?
10 More Generally. We would like the robot to be able to decide rationally: whether smoking causes cancer; whether to believe in God. We would also like the robot to be able to decide how to act rationally in any situation, given the evidence, e.g. a robotically guided vehicle (autopilot).
11 What is Inference? Dictionary: in·fer·ence (ĭn'fər-əns) n. The act or process of deriving logical conclusions from premises known or assumed to be true. The act of reasoning from factual knowledge or evidence. Something inferred. Usage: To draw inferences has been said to be the great business of life.
12 Inference. To draw inferences has been said to be the great business of life. Every one has daily, hourly, and momentary need of ascertaining facts which he has not directly observed; not from any general purpose of adding to his stock of knowledge, but because the facts themselves are of importance to his interests or to his occupations. From the introduction to A System of Logic, John Stuart Mill (1843), Longman's 1884 edition.
13 Epistemology "is derived from the Greek words episteme, which means knowledge, and logos, which means theory. It is the branch of philosophy that addresses the philosophical problems surrounding the theory of knowledge. It answers many questions concerning what knowledge is, how it is obtained, and what makes it knowledge." Excerpt from "Rhetoric & Epistemology" by Nathan T. Floyd of the Georgia Institute of Technology.
14 Logic. Logic is the systematic study of valid inference. Logic can take many forms.
15 Types of Logical Argument. Deductive: reasoning from the general to the specific. Inductive: the process of deriving general principles from particular facts or instances. Abductive: reasoning based on the principle of inference to the best explanation (Charles Peirce). Doxastic: to do with belief.
16 Deductive Argument. A deductive argument offers two or more assertions that lead automatically to a conclusion. Though they are not always phrased in syllogistic form, deductive arguments can usually be phrased as "syllogisms," or as brief, mathematical statements in which the premises lead inexorably to the conclusion.
17 Syllogisms. Syllogism: a form of deductive reasoning consisting of a major premise, a minor premise, and a conclusion; for example: All humans are mortal (the major premise); I am a human (the minor premise); therefore, I am mortal (the conclusion). Reasoning from the general to the specific; deduction.
18 Deductive Argument. As long as the first two sentences in this argument are true, there can be no doubt that the final statement is correct; it is a matter of mathematical certainty. Deductive arguments are not spoken of as "true" or "false" but as "valid" or "invalid". A valid argument is one in which the premises guarantee the conclusion, and an invalid argument is one in which they do not. An argument can have a true conclusion and yet be invalid; it can also be valid and yet have a demonstrably untrue conclusion, if one of its premises is false. A valid argument whose premises are all true is called sound.
19 Deduction. The major premise is a statement of general truth dealing with categories (sets) rather than individual examples: "All humans are mortal". The subject section of the major premise (All humans) is known as the antecedent; the predicate section of the major premise (are mortal) is known as the consequent.
20 Deduction. The minor premise is a statement of particular truth dealing with a specific instance governed by the major premise (an element of the set): "I am human". The conclusion is the statement derived from the minor premise's relationship to the major premise: "I am mortal".
21 Deductive logic. In Western thought, systematic logic is considered to have begun with Aristotle's collection of treatises, the Organon [tool]. Aristotle introduced the use of variables: while his contemporaries illustrated principles by the use of examples, Aristotle generalized, as in: All x are y; all y are z; therefore, all x are z. Aristotle posited three laws as basic to all valid thought: the law of identity, A is A; the law of contradiction, A cannot be both A and not A; and the law of the excluded middle, A must be either A or not A.
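Aristotle's generalized form can be checked mechanically if "All x are y" is read as "x is a subset of y". The sketch below is only an illustration under that reading: the three-element universe and the function names are assumptions made for the example, and an exhaustive search over a small universe is a sanity check, not a proof.

```python
from itertools import combinations

def subsets(universe):
    """Return every subset of the universe as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def find_counterexamples(universe):
    """Search for sets with 'all x are y' and 'all y are z' but not 'all x are z'."""
    subs = subsets(universe)
    return [(x, y, z)
            for x in subs for y in subs for z in subs
            if x <= y and y <= z and not x <= z]

# Hypothetical three-element universe, chosen only to keep the search small.
counterexamples = find_counterexamples({"a", "b", "c"})
print(len(counterexamples))  # 0: the premises never hold while the conclusion fails
```

No counterexample exists because subset inclusion is transitive, which is exactly what the syllogistic form asserts.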
22 Consider the Syllogism. All people in masks are burglars. I see a man wearing a mask. The man must be a burglar. Must this always be true?
23 Post-Aristotelian Logic. One of Aristotle's tacit assumptions was that there is a correspondence linking the structures of reality, the mind, and language (and hence logic). This position came to be known in the Middle Ages as realism. The opposing school of thought, nominalism, is exemplified by William of Occam, a medieval logician, who maintained that the structure of language and logic corresponds only to the structure of the mind, not to that of reality. Since knowledge is a study of generalizations, while nature occurs in myriad single instances, the distinction between the world and our conception of it is stressed by the nominalists.
24 Occam's razor. Occam's razor is a logical principle attributed to the mediaeval philosopher William of Occam (or Ockham). The principle states that one should not make more assumptions than the minimum needed. This principle is often called the principle of parsimony. It underlies all scientific modeling and theory building.
26 Inductive Reasoning. If you were to measure 20 carrots, and found that they were all between six and eight inches long, you might conclude that all carrots were in that size range. The manner of logic you used to draw your conclusion is called inductive reasoning.
27 Inductive Reasoning. According to the philosopher John Stuart Mill, its chief proponent, we are using inductive reasoning when we conclude "that what is true of certain individuals of a class, is true of the whole class, or what is true at a certain time will be true in similar circumstances at all times."
28 Example. Observation: The man is wearing a mask. Observation: He is climbing in via the window. Prior experience: People normally do not wear masks or climb in via windows unless they are up to no good. Conclusion: The man is probably a burglar.
29 Question. How might we design an algorithm for the robot to perform inductive reasoning? Are there any rules we can use to help us? Perhaps we can derive the rules of inductive reasoning deductively, i.e. from the laws of mathematics.
30 Burglary. Although we cannot be certain, it seems probable that this man is a burglar. The word probability derives from the Latin probare (to prove, or to test).
31 Prior Knowledge in Vision. We resolve complex scenes on the basis of prior knowledge: what we have previously learned about the world.
32 Illusions. This is even true for low-level vision (we can’t help ourselves). We are so tuned to natural scenes that our prior information dominates the evidence of our eyes.
34 Questions: More on probability later. However, first a quick test…
35 Deductively valid? Premise: All cars have wheels. Premise: All wheels are round. Conclusion: All cars have round wheels. -- Premise: I have a diamond. Premise: Most diamonds are shiny. Conclusion: My diamond is shiny. -- Premise: John is 93. Conclusion: John will not do a double back flip today.
36 Inductive vs. Deductive Reasoning. Deductive reasoning: the conclusion follows logically from the premises: CERTAIN. Inductive reasoning: the conclusion is likely, based on the premises (evidence); it does not use syllogisms; it involves a degree of uncertainty. Most reasoning in the real world is based on induction. How do people reason with uncertainty? What is the right way to reason with uncertainty?
38 Problem: Karl Popper. Popper claims that there is no such thing as induction and that deduction is all that we need in science. Was he right?
39 Popperism. In place of induction, Popper offers the method of conjecture and refutation. Scientific hypotheses are offered as bold conjectures (guesses) about the nature of the world. In testing these conjectures through empirical experiment, we cannot give positive inductive reasons for thinking that they are true. But we can give reasons for thinking they are false.
40 Popper’s Scientific Process. If H then O. Not O. Therefore, not H. This pattern of reasoning is deductively valid. To see this, try to suppose that the premises are true and the conclusion is false. If the conclusion were false, then 'H' would be true. And, given this and the truth of the first premise, 'O' would follow. But 'O' contradicts 'not O', which is asserted by the second premise. So it is not possible for the premises to be true and the conclusion false.
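The validity argument above can also be checked by brute force: an argument form is valid when no assignment of truth values makes every premise true and the conclusion false. The following is a minimal sketch; the helper names `implies` and `is_valid` are made up for illustration. For contrast it also tests the pattern "If H then O; O; therefore H", which is not valid, and this is one way to see why a successful prediction does not deductively establish a hypothesis.

```python
from itertools import product

def implies(p, q):
    """Material conditional: 'if p then q'."""
    return (not p) or q

def is_valid(premises, conclusion, n_vars):
    """Valid iff no truth assignment makes all premises true and the conclusion false."""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True

# Popper's pattern: If H then O; not O; therefore not H.
refutation = is_valid(
    [lambda h, o: implies(h, o), lambda h, o: not o],
    lambda h, o: not h,
    n_vars=2,
)

# Contrast: If H then O; O; therefore H.
confirmation = is_valid(
    [lambda h, o: implies(h, o), lambda h, o: o],
    lambda h, o: h,
    n_vars=2,
)

print(refutation, confirmation)  # True False
```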
41 Falsifiability. Popper's method of conjecture and refutation suggests another criterion for distinguishing science from non-science: we can take a hypothesis, a proposed explanation, to be investigated scientifically if and only if it is falsifiable. For a hypothesis to be falsifiable does not mean that it will be proven false or that it can be shown to be false. Rather, to say that a claim is falsifiable is just to say that we can state some possible observable conditions under which we would judge the claim to be false.
42 Popper’s accident. Suppose a car comes speeding towards you; you have never been hit by a car… You have two hypotheses: the car will hurt you if it hits you; the car will bounce off you. Which hypothesis do you think more probable? How would you act on the basis of each hypothesis? Would you want to falsify one of the theories?
43 Teapots. There are many theories that are equally unfalsified by observations, e.g. Bertrand Russell’s teapot. There is error in measurement; can anything really be falsified with certainty?
44 Hume on Induction. The classic philosophical treatment of the justification for inductive reasoning was by the Scotsman David Hume.
45 Hume on Induction. Hume highlighted the fact that our everyday reasoning depends on patterns of repeated experience rather than deductively valid arguments. For example, we believe that bread will nourish us because it has in the past, but it is at least conceivable that bread in the future will poison us.
46 Hume on Induction. Someone who insisted on sound deductive justifications for everything would starve to death, said Hume. Instead of unproductive radical skepticism about everything, he advocated a practical skepticism based on common sense, where the inevitability of induction is accepted.
47 Hume. In other words, although we cannot prove induction (by logic), we might want our robot to behave as if induction were valid, i.e. it should use its past experience.
48 Bertrand Russell. What these arguments prove--and I do not think the proof can be controverted--is that induction is an independent logical principle, incapable of being inferred either from experience or from other logical principles, and that without this principle science is impossible.
49 The Irrationalists? Science as we know it has been built on induction. Stove refers to those who deny induction (Hume and Popper) as the irrationalists. Reference: David Charles Stove, Popper and After: Four Modern Irrationalists, Pergamon Press.
50 Plan. Having introduced the puzzle of induction, the rest of this lecture presents one of the most intriguing attempts to solve it, most of whose development has occurred only in the past 100 years.
51 Probability. As mentioned before, it seems to make sense to say "This man is probably a burglar". What does probability mean?
52 What has a valid probability? Consider the statements: The next roll of the dice will be a six. Blair will win the next election. The end of the universe is one billion years distant. The 1000th digit of pi is 9. A coin is in my left or right hand.
53 Probability. There are many interpretations of probability: some epistemological, some to do with randomness. It will turn out that not all of them are useful. We next examine what these interpretations are and their relative merits.
54 Interpretations of Probability. Kolmogorov's probability calculus; classical probability; logical probability; frequency interpretations; propensity interpretations; subjective probability.
60 Criteria of adequacy for the interpretations of probability. What criteria are appropriate for assessing the cogency of a proposed interpretation of probability? An interpretation should at least be precise, unambiguous, and use well-understood primitives.
61 Salmon (1966, 64). Admissibility. We say that an interpretation of a formal system is admissible if the meanings assigned to the primitive terms in the interpretation transform the formal axioms, and consequently all the theorems, into true statements. A fundamental requirement for probability concepts is to satisfy the mathematical relations specified by the calculus of probability… Ascertainability. This criterion requires that there be some method by which, in principle at least, we can ascertain values of probabilities. It merely expresses the fact that a concept of probability will be useless if it is impossible in principle to find out what the probabilities are… Applicability. The force of this criterion is best expressed in Bishop Butler's famous aphorism, “Probability is the very guide of life.”…
62 Applicability. Applicability to frequencies; applicability to rational belief; applicability to science; applicability to the design of an AI system.
63 Interpretations of Probability. Mental (epistemic) vs. physical (ontological). Mental: probabilities exist only within our minds; this position is adopted by the Bayesians. Physical: probabilities exist in nature, as an attribute of physical systems; this position is adopted by the frequentists. Objective vs. subjective. Subjective interpretations allow two agents with the same background knowledge to assign different probability values. Note that a physical interpretation entails an objective stance.
64 Shape of rest of lecture. Frequency theory (Venn, Von Mises); Popper; subjective Bayesian theory (Ramsey, De Finetti).
65 Frequency Theory. Developed by Venn and Von Mises. The “text book” definition for many schools. Probability is nothing but proportions! (Venn)
66 Finite Frequentism (Venn). A simple version of frequentism, which we will call finite frequentism, attaches probabilities to events or attributes in a finite reference class in a straightforward manner: the probability of an attribute A in a finite reference class B is the relative frequency of actual occurrences of A within B.
67 Problem. How can I answer questions about rare events? What is the probability of Tony Blair being re-elected (could it be 1)? Of a meteorite hit? If I have never tossed a given coin, how can I assess the probability of it landing heads?
68 Infinite sets. Some frequentists (notably Venn 1876, Reichenbach 1949, and von Mises 1957 among others), partly in response to some of the problems above, have gone on to consider infinite reference classes, identifying probabilities with limiting relative frequencies of events or attributes therein. Thus, we require an infinite sequence of trials in order to define such probabilities.
69 Infinite sets: Problem. Generally the world does not provide an infinite sequence of trials of a given experiment. We have to imagine hypothetical infinite extensions of an actual sequence of trials; probabilities are then what the limiting relative frequencies would be if the sequence were so extended.
70 Infinite sets: Problem. Limiting relative frequencies, we have seen, must be defined relative to a sequence of trials. Herein lies another difficulty. Consider an infinite sequence of the results of tossing a coin, as it might be H, T, H, H, H, T, H, T, T, … Suppose for definiteness that the corresponding relative frequency sequence for heads, which begins 1/1, 1/2, 2/3, 3/4, 4/5, 4/6, 5/7, 5/8, 5/9, …, converges to 1/2. By suitably reordering these results, we can make the sequence converge to any value in [0, 1] that we like.
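The running relative frequencies quoted above, and the reordering trick, can be reproduced with a short sketch. The function names and the greedy reordering rule are assumptions made for illustration: the rule is only one possible reordering, it shows a finite prefix of what is really a statement about infinite sequences, and it assumes the original sequence contains infinitely many heads and tails so that both can always be drawn.

```python
from fractions import Fraction

def running_relative_frequency(outcomes, target="H"):
    """Relative frequency of `target` after each trial."""
    freqs, count = [], 0
    for n, outcome in enumerate(outcomes, start=1):
        if outcome == target:
            count += 1
        freqs.append(Fraction(count, n))
    return freqs

def reordered_prefix(target_freq, n):
    """First n terms of a greedy reordering: draw a head whenever the running
    frequency of heads would otherwise fall below target_freq, else a tail."""
    out, heads = [], 0
    for i in range(1, n + 1):
        if heads < target_freq * i:
            out.append("H")
            heads += 1
        else:
            out.append("T")
    return out

freqs = running_relative_frequency("HTHHHTHTT")
print([str(f) for f in freqs])  # Fraction reduces 4/6 to 2/3

# Same supply of heads and tails, reordered towards a frequency of 9/10.
prefix = reordered_prefix(Fraction(9, 10), 1000)
print(running_relative_frequency(prefix)[-1])  # 9/10
```

The second result depends only on the order in which outcomes are listed, which is the difficulty the slide points to.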
71 Von Mises’s Kollektiv. We create a collective, a mathematical series, such that, informally: Axiom of Convergence: as the number of elements tends to infinity, the frequency tends to the probability. Axiom of Randomness: given the first k elements, there is no gambling system that would make money (related to Church’s thesis on recursive functions).
72 Axiom of Randomness. The study of randomness is still an active area of research today. No gambling system means that we cannot predict the (n+1)th value on the basis of the n previous values with a frequency higher or lower than the probability.
73 Reference Class Problem for Frequentism. Consider a probability concerning myself that I care about, say, my probability of living to age 80. I belong to the class of males, the class of non-smokers, the class of computing professors who have one vowel in their surname, … Presumably the relative frequency of those who live to age 80 varies across (most of) these reference classes. What, then, is my probability of living to age 80?
74 Reference Class Problem for Frequentism. It seems that there is no single frequentist answer. Instead, there is my probability-given-I-am-a-male, my probability-given-I-am-a-non-smoker, my probability-given-I-am-a-male-non-smoker, and so on.
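A small sketch makes the reference class problem concrete. The records below are invented purely for illustration and carry no empirical weight; `relative_frequency` is a hypothetical helper that applies the finite frequentist definition (occurrences of the attribute within the chosen class).

```python
from fractions import Fraction

# Hypothetical records: (sex, smoker, lived_to_80). Invented data, not real statistics.
people = [
    ("male", False, True), ("male", False, True), ("male", False, False),
    ("male", True, False), ("male", True, False), ("male", True, True),
    ("female", False, True), ("female", False, True),
    ("female", True, False), ("female", True, True),
]

def relative_frequency(records, in_class):
    """Relative frequency of living to 80 within the reference class `in_class`."""
    members = [r for r in records if in_class(r)]
    if not members:
        return None  # an empty reference class yields no frequency at all
    return Fraction(sum(1 for r in members if r[2]), len(members))

as_male = relative_frequency(people, lambda r: r[0] == "male")
as_non_smoker = relative_frequency(people, lambda r: not r[1])
as_male_non_smoker = relative_frequency(people, lambda r: r[0] == "male" and not r[1])

print(as_male, as_non_smoker, as_male_non_smoker)  # 1/2 4/5 2/3
```

The same individual receives three different values depending on which class is chosen, and the frequency definition itself does not say which class is the right one.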
75 Reference Class Problem for Frequentism. This problem becomes extreme when we talk of an individual. Von Mises embraces this consequence, insisting that the notion of probability only makes sense relative to a collective. In particular, he regards single-case probabilities as nonsense: “We can say nothing about the probability of death of an individual even if we know his condition of life and health in detail. The phrase ‘probability of death’, when it refers to a single person, has no meaning at all for us.”
76 Another problem. De Finetti points out that the frequentist interpretation cannot satisfy our axiom 4 of Kolmogorov’s probability calculus.
78 Simple Example (Giere 1976). Suppose there are an infinite number of events E. Then the probability of each event becomes 0. The sum of zeros is not 1!
79 Other Problems. Things outside the theory include: the probability of behaviour; unrepeatable events.
80 Rational Numbers. In a frequency theory, probabilities may only be represented by rational numbers (the ratio of two integers). Of course it might be that this is sufficient for probability theory, but it is certainly not clear that this is the case.
81 Popper’s Propensity Theory. Like the frequency interpretations, propensity interpretations locate probability ‘in the world’ rather than in our heads or in logical abstractions. Probability is thought of as a physical propensity, or disposition, or tendency of a given type of physical situation to yield an outcome of a certain kind, or to yield a long run relative frequency of such an outcome. This view was motivated by the desire to make sense of single-case probability attributions such as ‘the probability that this radium atom decays in 1600 years is 1/2’.
82 Popper’s Propensity Theory. For him, a probability p of an outcome of a certain type is a propensity of a repeatable experiment to produce outcomes of that type with limiting relative frequency p. For instance, when we say that a coin has probability 1/2 of landing heads when tossed, we mean that we have a repeatable experimental set-up, the tossing set-up, that has a propensity to produce a sequence of outcomes in which the limiting relative frequency of heads is 1/2.
83 Problem. However, there are many factors that affect the toss of a coin: the weight distribution of the coin; the technique of the person who tosses it; currents in the air; convection in the air; etc.
84 Problem with Objective Theories. Suppose I hold a coin in one of my hands and ask you the probability that it is in the left. Based on your state of information you would say ½. Based on mine I would say 0 or 1. Thus probability could be said to be subjective…
85 Probability Does Not Exist! “Probability Does Not Exist” (De Finetti). What does he mean? That probability cannot exist objectively and is only a product of the human mind.
86 Subjective Probability. De Finetti and Frank Ramsey developed a subjective theory of probability based on utility and rational choice.
87 Subjective Probability. We may characterize subjectivism (also known as personalism and subjective Bayesianism) with the slogan: ‘Probability is degree of belief’. We identify probabilities with degrees of confidence, or credences, or “partial” beliefs of suitable agents. Thus, we really have many interpretations of probability here, as many as there are doxastic states of suitable agents: we have Aaron's degrees of belief, Abel's degrees of belief, Abigail's degrees of belief, …, or better still, Aaron's degrees of belief-at-time-t1, Aaron's degrees of belief-at-time-t2, Abel's degrees of belief-at-time-t1, … Of course, we must ask what makes an agent ‘suitable’.
88 Subjective Probability. Provides a way to rationally update one's beliefs. The approach is normative, not descriptive, i.e. it provides rules for how one should think about things, not a description of how one actually does think about things; however, this might be just what we want for an AI.
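As a rough illustration of what "rationally updating a belief" could look like for the robot, the sketch below applies Bayes' rule to the burglar example. This is an assumption about the update rule, chosen because it is the standard one in subjective Bayesianism; every number is invented, and treating the two observations as independent given the hypothesis is a further simplifying assumption.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) from the prior P(H), P(E | H) and P(E | not H)."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

# Hypothetical degrees of belief; none of these numbers come from the lecture.
belief = 0.01  # prior belief that a given man near the shop is a burglar

# Evidence 1: he is wearing a mask (assumed likely for burglars, rare otherwise).
after_mask = bayes_update(belief, likelihood_if_true=0.5, likelihood_if_false=0.001)

# Evidence 2: he climbs through the broken window (assumed independent given H).
after_window = bayes_update(after_mask, likelihood_if_true=0.6, likelihood_if_false=0.0005)

print(round(after_mask, 3), round(after_window, 4))
```

The point is only qualitative: each observation that is far more likely under the hypothesis than under its negation moves the degree of belief sharply upwards, without ever reaching certainty.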
89 Rationality. Beginning with Ramsey (1926), various subjectivists have wanted to assimilate probability to logic by portraying probability as the logic of partial belief. A rational agent is required to be logically consistent, now taken in a broad sense. These subjectivists argue that this implies that the agent obeys the axioms of probability (although perhaps with only finite additivity), and that subjectivism is thus (to this extent) admissible. Before we can present this argument, we must say more about what degrees of belief are.
90 Risk Free Gamble. Event 1: My right hand contains a coin. Event 2: My left hand contains a coin. You are offered the following choice: £10 if event 1 occurs, nothing otherwise; or £10 if event 2 occurs, nothing otherwise.
91 Risk Free Gamble. This provides a way of calibrating a rational person’s preferences: since they prefer to win money, they will choose the gamble on the event they consider more likely.
94 Extraordinary Explanations Require Extraordinary Evidence. Implausible inductive leaps require more evidence than plausible ones do. It requires more evidence to support the notion that a strange light in the sky is an invasion force from the planet Xacron than the notion that it is a low-flying plane. The evidentiary requirements are greater for the first assumption simply because induction requires us to combine what we observe with what we already know, and most of us know more about low-flying planes than extra-terrestrial invaders.
95 Utility and Inference. Generalizations require less support when there are tremendous negative costs involved with rejecting them. Consider the following two arguments: 1) I drank milk last night and got a minor stomachache. I can probably conclude that the milk was a little bit sour. 2) I ate a mushroom out of my backyard last night and I went into violent fits of projectile vomiting and had to be rushed to the hospital to have my stomach pumped. I can probably conclude that the mushroom was poisonous. Technically, the amount of evidence for these two arguments is the same. However, most people would take the second argument much more seriously, simply because the consequences of not doing so are so disastrous.
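One way to make the milk and mushroom contrast precise is to compare expected losses. The sketch below is an illustration under assumed numbers: the probability of harm, the losses and the cost of avoidance are all hypothetical, and minimizing expected loss is one common decision rule rather than something the slides prescribe.

```python
def expected_losses(p_harmful, loss_if_harmful, cost_of_avoiding):
    """Expected loss of consuming again versus avoiding the food."""
    return {"consume": p_harmful * loss_if_harmful, "avoid": cost_of_avoiding}

def best_action(p_harmful, loss_if_harmful, cost_of_avoiding):
    """Pick the action with the smaller expected loss."""
    losses = expected_losses(p_harmful, loss_if_harmful, cost_of_avoiding)
    return min(losses, key=losses.get)

# Same assumed strength of evidence in both cases: one bad experience each.
p_harmful = 0.3

# Hypothetical loss units: a stomachache is mild, poisoning is severe.
milk = best_action(p_harmful, loss_if_harmful=1, cost_of_avoiding=0.5)
mushroom = best_action(p_harmful, loss_if_harmful=1000, cost_of_avoiding=0.5)

print(milk, mushroom)  # consume avoid
```

With identical evidence, only the size of the potential loss changes the recommended action, which matches the intuition that the second argument deserves to be taken more seriously.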
96 Conditioning is not the same as implication. The probability statement P(A | B) = p has totally different operational semantics from the logical statement "B implies A with certainty p". The logical statement means that whenever B is true then A is true with certainty p. This applies regardless of any other information we may have. In other words, it is modular. But the probability statement is not modular: it applies when the only thing we know is B. If anything else is known, e.g. C, then we must refer to P(A | B, C) instead. The only exception is when we can prove that C is conditionally independent of A given B, so that P(A | B, C) = P(A | B). To illustrate why this is important, let A = "It rained last night", B = "My grass is wet", C = "The sprinkler was on last night". Given only B, it is reasonable to conclude A. But if B is deduced from C, then it is not reasonable to conclude A. This point was made eloquently by Pearl (p57). He used it to show that logic based on "certainty factors" is not an adequate replacement for probability theory.
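The rain and sprinkler example can be worked through numerically by enumerating a small joint distribution. The sketch below is a toy model under stated assumptions: the prior probabilities, the table for wet grass, and the independence of rain and sprinkler are all invented for illustration and are not taken from Pearl.

```python
from itertools import product

# Hypothetical model parameters (assumptions, not data).
P_RAIN = 0.2        # P(A): it rained last night
P_SPRINKLER = 0.1   # P(C): the sprinkler was on, assumed independent of rain
P_WET = {           # P(B | A, C): the grass is wet
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.9, (False, False): 0.01,
}

def joint(rain, sprinkler, wet):
    """Joint probability of one complete state of the world."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    p_wet = P_WET[(rain, sprinkler)]
    return p * (p_wet if wet else 1 - p_wet)

def probability(query, given):
    """P(query | given), both predicates over (rain, sprinkler, wet)."""
    numerator = denominator = 0.0
    for world in product([True, False], repeat=3):
        if given(*world):
            denominator += joint(*world)
            if query(*world):
                numerator += joint(*world)
    return numerator / denominator

rained = lambda rain, sprinkler, wet: rain
p_a_given_b = probability(rained, lambda rain, sprinkler, wet: wet)
p_a_given_bc = probability(rained, lambda rain, sprinkler, wet: wet and sprinkler)

print(round(p_a_given_b, 3), round(p_a_given_bc, 3))
```

Under these assumed numbers, wet grass alone makes rain fairly probable, while learning that the sprinkler was on pulls the probability of rain back down towards its prior. A fixed rule "B implies A with certainty p" could not reproduce this.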