Machine Learning Suppose we are to design a robot –How is it to learn about the world? –What is learning? –How can it make decisions on what actions to take?
Types of Question we want an AI construct to answer… Should the robot turn left or right down a road? Do we perform an operation or not? Is this person a terrorist or not? Does it think the world is flat or round? (Are scientific laws amenable to inference?)
Example (from Jaynes) Suppose some dark night a policeman walks down a street, apparently deserted; but suddenly he hears a burglar alarm, looks across the street, and sees a jewellery store with a broken window. Then a gentleman wearing a mask comes crawling out through the broken window, carrying a bag which turns out to be full of expensive jewellery. The policeman doesn't hesitate at all in deciding that this gentleman is dishonest. But by what reasoning process does he arrive at this conclusion?
Computer Vision What inference should the robot draw from this image? How shall we operationalize this inference (e.g. make a computer program to carry out inference)?
Discuss How should the robot decide the following: –Whether the person shown previously is a burglar?
Discuss How should the robot decide upon the following actions: –Whether to phone the police or not?
More Generally We would like the robot to be able to rationally decide –Whether smoking causes cancer –Whether to believe in God We would like the robot to be able to also decide how to rationally act –In any situation given the evidence. –E.g. robotically guided vehicle (autopilot).
What is Inference? Dictionary in·fer·ence (ĭn'fər-əns) n. –The act or process of deriving logical conclusions from premises known or assumed to be true. –The act of reasoning from factual knowledge or evidence. –Something inferred. –Usage. To draw inferences has been said to be the great business of life.
Inference To draw inferences has been said to be the great business of life. Every one has daily, hourly, and momentary need of ascertaining facts which he has not directly observed; not from any general purpose of adding to his stock of knowledge, but because the facts themselves are of importance to his interests or to his occupations. Introduction from the Longman's 1884 edition of the System of Logic. John Stuart Mill (1843)
Epistemology "is derived from the Greek words episteme, which means knowledge, and logos, which means theory. It is the branch of philosophy that addresses the philosophical problems surrounding the theory of knowledge. It answers many questions concerning what knowledge is, how it is obtained, and what makes it knowledge." Excerpt from "Rhetoric & Epistemology" by Nathan T. Floyd of the Georgia Institute of Technology.
Logic logic, the systematic study of valid inference. Logic can take many forms.
Types of Logical Argument Deductive; Reasoning from the general to the specific. Inductive; The process of deriving general principles from particular facts or instances. Abductive; Reasoning based on the principle of inference to the best explanation (Charles Peirce). Doxastic; To do with belief.
Deductive Argument A deductive argument offers two or more assertions that lead automatically to a conclusion. Though they are not always phrased in syllogistic form, deductive arguments can usually be phrased as "syllogisms," or as brief, mathematical statements in which the premises lead inexorably to the conclusion.
Syllogisms Syllogism; A form of deductive reasoning consisting of a major premise, a minor premise, and a conclusion; for example, –All humans are mortal, the major premise, –I am a human, the minor premise, therefore, –I am mortal, the conclusion. Reasoning from the general to the specific; deduction.
Deductive Argument As long as the first two sentences in this argument are true, there can be no doubt that the final statement is correct--it is a matter of mathematical certainty. Deductive arguments are not spoken of as "true" or "false," but as "sound" or "unsound." A sound argument is one in which the premises guarantee the conclusions, and an unsound argument is one in which the premises do not guarantee the conclusions. A deduction can be completely true, yet unsound. It can also be sound, yet demonstrably untrue.
Deduction The major premise is a statement of general truth dealing with categories (sets) rather than individual examples: All humans are mortal The subject section of the major premise (All humans) is known as the antecedent; the predicate section of the major premise (are mortal) is known as the consequent.
Deduction The minor premise is a statement of particular truth dealing with a specific instance governed by the major premise (an element of the set): I am human. The conclusion is the statement derived from the minor premise's relationship to the major premise: I am mortal.
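The set reading of the syllogism can be sketched in a few lines of Python. This is only an illustration: the toy sets and the function name `deduce` are assumptions introduced here, not part of the lecture. The major premise becomes a subset test, the minor premise a membership test.

```python
# Hypothetical toy universe; the members are invented for illustration.
mortals = {"socrates", "me", "fido"}
humans = {"socrates", "me"}


def deduce(individual, subject_set, predicate_set):
    """Return True when both premises hold, so the conclusion is guaranteed.

    Returns None when a premise fails: the syllogism then tells us nothing
    about the individual, either way.
    """
    major_premise = subject_set <= predicate_set  # "All humans are mortal"
    minor_premise = individual in subject_set     # "I am human"
    if major_premise and minor_premise:
        return True                               # "I am mortal"
    return None


i_am_mortal = deduce("me", humans, mortals)
fido_verdict = deduce("fido", humans, mortals)
```

Note that `fido_verdict` is `None` rather than `False`: Fido is in fact mortal in this toy universe, but the syllogism cannot establish it because the minor premise does not hold.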
Deductive logic In Western thought, systematic logic is considered to have begun with Aristotle's collection of treatises, the Organon [tool]. Aristotle introduced the use of variables: While his contemporaries illustrated principles by the use of examples, Aristotle generalized, as in: All x are y; all y are z; therefore, all x are z. Aristotle posited three laws as basic to all valid thought: the law of identity, A is A; the law of contradiction, A cannot be both A and not A; and the law of the excluded middle, A must be either A or not A.
Consider Syllogism; –All people in masks are burglars –I see a man wearing a mask –The man must be a burglar Must this always be true?
Post-Aristotelian Logic One of Aristotle's tacit assumptions was that there is a correspondence linking the structures of reality, the mind, and language (and hence logic). This position came to be known in the Middle Ages as realism. The opposing school of thought, nominalism, is exemplified by William of Occam, a medieval logician, who maintained that the structure of language and logic corresponds only to the structure of the mind, not to that of reality. Since knowledge is a study of generalizations, while nature occurs in myriad single instances, the distinction between the world and our conception of it is stressed by the nominalists.
Occam's razor Occam's razor is a logical principle attributed to the mediaeval philosopher William of Occam (or Ockham). The principle states that one should not make more assumptions than the minimum needed. This principle is often called the principle of parsimony. It underlies all scientific modeling and theory building.
If you were to measure 20 carrots, and found that they were all between six and eight inches long, you might conclude that all carrots were in that size range. The manner of logic you used to draw your conclusion is called inductive reasoning.
Inductive Reasoning According to the philosopher John Stuart Mill, its chief proponent, we are using inductive reasoning when we conclude "that what is true of certain individuals of a class, is true of the whole class, or what is true at a certain time will be true in similar circumstances at all times."
Example Observations; –Observation: The man is wearing a mask –Observation: He is climbing in via the window –Prior Experience: People normally do not wear masks or climb in via windows unless they are up to no good. –Conclusion: The man is probably a burglar.
Question How might we design an algorithm for the robot to perform inductive reasoning? Are there any rules we can use to help us? (Perhaps we can deduce the rules of inductive reasoning deductively, i.e. by the laws of mathematics.)
Burglary Although we cannot be certain, it seems probable that this man is a burglar. The word probability derives from the Latin probare (to prove, or to test).
Prior Knowledge in Vision We resolve complex scenes on the basis of prior knowledge, what we have previously learned about the world.
Illusions This is even true for low level vision (we can't help ourselves). We are so tuned to natural scenes that our prior information dominates the evidence of our eyes.
Questions: More on probability later However first a quick test…
Deductively valid? Premise: All cars have wheels Premise: All wheels are round Conclusion: All cars have round wheels --- Premise: I have a diamond Premise: Most diamonds are shiny Conclusion: My diamond is shiny --- Premise: John is 93 Conclusion: John will not do a double back flip today
Inductive vs. Deductive Reasoning Deductive reasoning: –conclusion follows logically from premises: CERTAIN Inductive reasoning: –conclusion is likely based on premises (evidence). –Does not use syllogisms –involves a degree of uncertainty Most reasoning in the real world is based on induction –How do people reason with uncertainty? –What is the right way to reason with uncertainty?
Problem Not everyone accepts induction is valid:
Problem: Karl Popper Popper claims that there is no such thing as induction and that deduction is all that we need in science. Was he right?
Popperism In place of induction, Popper offers the method of conjecture and refutation. Scientific hypotheses are offered as bold conjectures (guesses) about the nature of the world. In testing these conjectures through empirical experiment, we cannot give positive inductive reasons for thinking that they are true. But we can give reasons for thinking they are false
Popper's Scientific Process If H then O. Not O. Therefore, not H. This pattern of reasoning is deductively valid (to see this, try to suppose that the premises are true and the conclusion is false. If the conclusion were false, then 'H' would be true. And, given this and the truth of the first premise, 'O' would follow. But 'O' contradicts 'not O', which is asserted by the second premise. So it is not possible for the premises to be true and the conclusion false. In other words, the pattern of reasoning here is deductively valid.)
Falsifiability Popper's method of conjecture and refutation suggests another criterion for distinguishing science from non-science. That is, that we can take a hypothesis, a proposed explanation, to be investigated scientifically if and only if it is falsifiable. For a hypothesis to be falsifiable does not mean that it will be proven false or that it can be shown to be false. Rather, to say that a claim is falsifiable is just to say that we can state some possible observable conditions under which we would judge the claim to be false.
Popper's accident Suppose a car comes speeding towards you, you have never been hit by a car… You have two hypotheses –The car will hurt you if it hits you –The car will bounce off you Which hypothesis do you think more probable? –How would you act on the basis of each hypothesis? –Would you want to falsify one of the theories?
Teapots There are many theories that are equally not falsified by observations –Bertrand Russell's teapot. There is error in measurement; can anything really be falsified with certainty?
Hume on Induction The classic philosophical treatment of the justification for inductive reasoning was by the Scotsman David Hume.
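The validity argument above can be checked mechanically by enumerating every truth assignment to H and O. The following is a minimal sketch; the helper names `implies` and `is_valid` are assumptions made for this example. For contrast it also tests the pattern "If H then O; O; therefore H" (affirming the consequent), which is not in the slide but shows why a successful prediction does not deductively confirm a hypothesis.

```python
from itertools import product


def implies(p, q):
    """Material implication: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


def is_valid(premises, conclusion):
    """An argument form is valid if no assignment makes every premise true
    and the conclusion false."""
    for h, o in product([True, False], repeat=2):
        if all(premise(h, o) for premise in premises) and not conclusion(h, o):
            return False  # found a counterexample
    return True


# If H then O; not O; therefore not H (the refutation pattern).
modus_tollens = is_valid(
    premises=[lambda h, o: implies(h, o), lambda h, o: not o],
    conclusion=lambda h, o: not h,
)

# If H then O; O; therefore H (added here for contrast).
affirming_consequent = is_valid(
    premises=[lambda h, o: implies(h, o), lambda h, o: o],
    conclusion=lambda h, o: h,
)
```

Here `modus_tollens` is `True` and `affirming_consequent` is `False`: the counterexample for the second form is H false and O true.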
Hume on Induction Hume highlighted the fact that our everyday reasoning depends on patterns of repeated experience rather than deductively valid arguments. For example we believe that bread will nourish us because it has in the past, but it is at least conceivable that bread in the future will poison us.
Hume on Induction Someone who insisted on sound deductive justifications for everything would starve to death, said Hume. Instead of unproductive radical skepticism about everything, he advocated a practical skepticism based on common-sense, where the inevitability of induction is accepted.
Hume In other words, although we cannot prove induction (by logic), we might want our robot to behave as if induction were valid, i.e. it should use its past experience.
Bertrand Russell. What these arguments prove-- and I do not think the proof can be controverted--is, that induction is an independent logical principle, incapable of being inferred either from experience or from other logical principles, and that without this principle science is impossible.
The Irrationalists? Science as we know it has been built on induction. Stove refers to those who deny induction (Hume and Popper) as the irrationalists. –Popper and After: Four Modern Irrationalists, Pergamon Press, David Charles Stove. –http://www.geocities.com/ResearchTriangle/Facility/4118/dcs/popper/popper.html
Plan Having introduced the puzzle of induction, the rest of this lecture presents one of the most intriguing attempts to solve it, most of whose development has occurred only in the past 100 years.
Probability As mentioned before it seems to us to make sense to say This man is probably a burglar. What does probability mean?
What has a valid Probability Consider the Statements: –The next roll of the dice will be a six. –Blair will win the next election. –The end of the universe is one billion years distant. –The 1000th digit of pi is 9. –A coin is in my left or right hand.
Probability There are many interpretations of probability; –Epistemological –To do with randomness It will turn out that not all of them will be useful; We next examine what these interpretations are and their relative merits.
Interpretations of Probability Kolmogorov's Probability Calculus Classical Probability Logical Probability Frequency Interpretations Propensity Interpretations Subjective Probability
Kolmogorov's Probability Calculus
This is a set theoretic notation and commonly adopted by all mathematicians. It provides basic axioms from which a deductive theory of probability can be derived.
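For reference, a common textbook statement of the axioms is sketched below. The symbols (a sample space \Omega, a collection \mathcal{F} of its subsets, and a probability function P) and the numbering are assumptions, not taken from the slides; in particular, some presentations treat the fourth item as a strengthening of the third rather than as a separate axiom.

```latex
\begin{align*}
&\text{1. Non-negativity:} && P(A) \ge 0 \quad \text{for all } A \in \mathcal{F} \\
&\text{2. Normalization:} && P(\Omega) = 1 \\
&\text{3. Finite additivity:} && P(A \cup B) = P(A) + P(B) \quad \text{if } A \cap B = \emptyset \\
&\text{4. Countable additivity:} && P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i) \quad \text{for pairwise disjoint } A_i
\end{align*}
```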
Kolmogorov's Probability Calculus It is based on measure theory. A mathematical theory involving the size of sets. The mathematical definitions do not address the meaning of probability.
Kolmogorov's Probability Calculus Problem: there is no relation between the purely mathematical theory and the real world. Many types of physical quantity could follow the laws of probability, e.g. area…
Probability and Area
Criteria of adequacy for the interpretations of probability What criteria are appropriate for assessing the cogency of a proposed interpretation of probability? An interpretation should at least be –precise, –unambiguous, and –use well-understood primitives.
Salmon (1966, 64), Admissibility. We say that an interpretation of a formal system is admissible if the meanings assigned to the primitive terms in the interpretation transform the formal axioms, and consequently all the theorems, into true statements. A fundamental requirement for probability concepts is to satisfy the mathematical relations specified by the calculus of probability… Ascertainability. This criterion requires that there be some method by which, in principle at least, we can ascertain values of probabilities. It merely expresses the fact that a concept of probability will be useless if it is impossible in principle to find out what the probabilities are… Applicability. The force of this criterion is best expressed in Bishop Butler's famous aphorism, Probability is the very guide of life.…
Applicability Applicability to frequencies Applicability to rational belief Applicable to Science Applicable to design of an AI system
Interpretations of Probability Mental (epistemic)/Physical (ontological) –Mental: probabilities just exist within our mind, this position is adopted by the Bayesians. –Physical: probabilities exist in nature and as an attribute of physical systems, this position is adopted by the frequentists. Objective/Subjective –Subjective interpretations allow for two agents with the same background knowledge to assign different probability values. –Note a physical interpretation entails an objective stance.
Shape of rest of lecture Frequency theory –Venn –Von Mises –Popper Subjective Bayesian Theory –Ramsey –De Finetti
Frequency Theory Developed by Venn, and Von Mises The text book definition for many schools. Probability is nothing but proportions! (Venn)
Finite Frequentism (Venn) A simple version of frequentism, which we will call finite frequentism, attaches probabilities to events or attributes in a finite reference class in a straightforward manner: the probability of an attribute A in a finite reference class B is the relative frequency of actual occurrences of A within B.
Problem How can I answer questions about rare events: –What is the probability of Tony Blair being re-elected (could it be 1)? –A meteorite hit? If I have never tossed a given coin how can I assess the probability of it being heads?
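The finite frequentist definition translates directly into a counting rule. The sketch below is illustrative only: the function name `relative_frequency` and the list of tosses are assumptions made for the example.

```python
from fractions import Fraction


def relative_frequency(reference_class, has_attribute):
    """Probability of attribute A in finite reference class B:
    the number of members of B with A, divided by the size of B."""
    members = list(reference_class)
    if not members:
        # With no actual trials the definition assigns no probability at all.
        raise ValueError("empty reference class")
    occurrences = sum(1 for member in members if has_attribute(member))
    return Fraction(occurrences, len(members))


# Hypothetical record of nine actual coin tosses.
tosses = ["H", "T", "H", "H", "H", "T", "H", "T", "T"]
p_heads = relative_frequency(tosses, lambda toss: toss == "H")
```

Here `p_heads` is `Fraction(5, 9)`. Two features of the definition show up directly in the code: an untossed coin has no probability (the empty class raises an error), and the result is always a ratio of two integers.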
Infinite sets Some frequentists (notably Venn 1876, Reichenbach 1949, and von Mises 1957 among others), partly in response to some of the problems above, have gone on to consider infinite reference classes, identifying probabilities with limiting relative frequencies of events or attributes therein. Thus, we require an infinite sequence of trials in order to define such probabilities.
Infinite sets: Problem Generally the world does not provide an infinite sequence of trials of a given experiment. We have to imagine hypothetical infinite extensions of an actual sequence of trials; probabilities are then what the limiting relative frequencies would be if the sequence were so extended
Infinite sets: Problem Limiting relative frequencies, we have seen, must be defined relative to a sequence of trials. Herein lies another difficulty. Consider an infinite sequence of the results of tossing a coin, as it might be H, T, H, H, H, T, H, T, T, … Suppose for definiteness that the corresponding relative frequency sequence for heads, which begins 1/1, 1/2, 2/3, 3/4, 4/5, 4/6, 5/7, 5/8, 5/9, …, converges to 1/2. By suitably reordering these results, we can make the sequence converge to any value in [0, 1] that we like.
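The running relative frequencies quoted above, and the effect of reordering, can be reproduced with a short sketch. The function and generator names are assumptions made for this example, and an infinite sequence can only be inspected through a finite prefix, so the last two values illustrate the limits rather than prove them.

```python
from fractions import Fraction
from itertools import islice


def running_frequencies(outcomes, target="H"):
    """Relative frequency of `target` after each trial."""
    frequencies, hits = [], 0
    for n, outcome in enumerate(outcomes, start=1):
        hits += outcome == target  # True counts as 1
        frequencies.append(Fraction(hits, n))
    return frequencies


# The first nine results from the text: H, T, H, H, H, T, H, T, T.
prefix_frequencies = running_frequencies("HTHHHTHTT")


def alternating():
    """H, T, H, T, ...: limiting relative frequency of heads is 1/2."""
    while True:
        yield "H"
        yield "T"


def reordered():
    """H, H, T, H, H, T, ...: still infinitely many heads and infinitely
    many tails, so it is a rearrangement of the sequence above, but the
    limiting relative frequency of heads is now 2/3."""
    while True:
        yield "H"
        yield "H"
        yield "T"


freq_alternating = running_frequencies(islice(alternating(), 3000))[-1]
freq_reordered = running_frequencies(islice(reordered(), 3000))[-1]
```

`prefix_frequencies` matches the values in the text (with 4/6 reduced to 2/3), while `freq_alternating` is 1/2 and `freq_reordered` is 2/3 after 3000 trials.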
Von Mises's Kollektiv We create a collective, a mathematical sequence, such that, informally: –Axiom of Convergence: As the number of elements tends to infinity the relative frequency tends to the probability. –Axiom of Randomness: Given the first k elements there is no gambling system that would make money (related to Church's thesis on recursive functions).
Axiom of Randomness The study of randomness is still an active area of research today. No gambling system means that we cannot easily predict the (n+1)th value from the previous n values with a success frequency higher or lower than the probability.
Reference Class Problem for Frequentism Consider a probability concerning myself that I care about -- say, my probability of living to age 80. I belong to the –class of males, –the class of non-smokers, –the class of computing professors who have one vowel in their surname, … Presumably the relative frequency of those who live to age 80 varies across (most of) these reference classes. What, then, is my probability of living to age 80?
Reference Class Problem for Frequentism It seems that there is no single frequentist answer. Instead, there is –my probability-given I am a-male, –my probability-given I am a-non-smoker, –my probability-given I am a-male-non-smoker, and so on.
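The reference class problem is easy to reproduce with a toy data set. Everything below is invented for illustration: the eight records, their fields, and the function name are assumptions, not data from the lecture.

```python
from fractions import Fraction

# Hypothetical records: (sex, smoker, lived_to_80). Invented for illustration.
cohort = [
    ("male", False, True),
    ("male", False, True),
    ("male", True, False),
    ("male", True, False),
    ("male", False, False),
    ("female", False, True),
    ("female", True, True),
    ("female", False, True),
]


def frequency_of_living_to_80(records, in_reference_class):
    """Relative frequency of reaching 80 within the chosen reference class."""
    selected = [record for record in records if in_reference_class(record)]
    survivors = sum(1 for record in selected if record[2])
    return Fraction(survivors, len(selected))


p_given_male = frequency_of_living_to_80(cohort, lambda r: r[0] == "male")
p_given_non_smoker = frequency_of_living_to_80(cohort, lambda r: not r[1])
p_given_male_non_smoker = frequency_of_living_to_80(
    cohort, lambda r: r[0] == "male" and not r[1]
)
```

The same male non-smoker gets 2/5, 4/5, or 2/3 depending on which class he is counted in, and nothing in the frequency definition says which class is the right one.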
Reference Class Problem for Frequentism This problem becomes extreme when we talk of an individual: Von Mises embraces this consequence, insisting that the notion of probability only makes sense relative to a collective. In particular, he regards single case probabilities as nonsense: –We can say nothing about the probability of death of an individual even if we know his condition of life and health in detail. The phrase probability of death, when it refers to a single person, has no meaning at all for us
Another problem De Finetti points out that the frequentist interpretation cannot satisfy our axiom 4 of Kolmogorov's probability calculus.
Kolmogorov's Probability Calculus
Simple Example (Giere 1976) Suppose there are an infinite number of events E. Then the probability of each event becomes 0. The sum of zeros is not 1!
Other Problems Things outside the theory are –Probability of behaviour –Unrepeatable events
Rational Numbers In a frequency theory probabilities may only be represented by rational numbers (the ratio of two integers). –Of course it might be that this is sufficient for probability theory but it is certainly not clear that this is the case.
Popper's Propensity Theory Like the frequency interpretations, propensity interpretations locate probability in the world rather than in our heads or in logical abstractions. Probability is thought of as a physical propensity, or disposition, or tendency of a given type of physical situation to yield an outcome of a certain kind, or to yield a long run relative frequency of such an outcome. This view was motivated by the desire to make sense of single-case probability attributions such as the probability that this radium atom decays in 1600 years is 1/2.
Popper's Propensity Theory For Popper, a probability p of an outcome of a certain type is a propensity of a repeatable experiment to produce outcomes of that type with limiting relative frequency p. For instance, when we say that a coin has probability 1/2 of landing heads when tossed, we mean that we have a repeatable experimental set-up -- the tossing set-up -- that has a propensity to produce a sequence of outcomes in which the limiting relative frequency of heads is 1/2.
Problem However there are many factors that affect the toss of a coin: –Weight distribution of coin –Technique of the person who tosses it –Currents in the air –Convections in the air etc.
Problem with Objective Theories Suppose I hold a coin in one of my hands, and ask you the probability it is in the left. Based on your state of information you would say ½. Based on mine I would say 0 or 1. Thus probability could be said to be subjective…
Probability Does Not Exist! "Probability does not exist" (De Finetti). What does he mean? That probability cannot exist objectively and is only a product of the human mind.
Subjective Probability. De Finetti and Frank Ramsey developed a subjective theory of probability based on utility and rational choice.
Subjective Probability. We may characterize subjectivism (also known as personalism and subjective Bayesianism) with the slogan: Probability is degree of belief. We identify probabilities with degrees of confidence, or credences, or partial beliefs of suitable agents. Thus, we really have many interpretations of probability here, as many as there are doxastic states of suitable agents: we have Aaron's degrees of belief, Abel's degrees of belief, Abigail's degrees of belief, …, or better still, Aaron's degrees of belief-at-time-t1, Aaron's degrees of belief-at-time-t2, Abel's degrees of belief-at-time-t1, …. Of course, we must ask what makes an agent suitable.
Subjective Probability Provides a way to rationally update one's beliefs. The approach is normative, not descriptive, i.e. it provides rules for how one should think about things, not a description of how one actually does think about things. However, this might be just what we want for an AI.
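As a concrete illustration of rational updating, the sketch below applies Bayes' theorem, the standard update rule of the subjective Bayesian approach, to the burglar example. The rule is assumed here rather than derived, and every number is a hypothetical degree of belief chosen for the example, not an empirical estimate.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior degree of belief in a hypothesis after seeing evidence.

    prior:               P(hypothesis) before the evidence
    likelihood_if_true:  P(evidence | hypothesis)
    likelihood_if_false: P(evidence | not hypothesis)
    """
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


# Hypothetical numbers for "this man is a burglar", given the evidence
# "he is wearing a mask and climbing through a broken window".
prior_burglar = 0.001        # few passers-by are burglars
p_evidence_if_burglar = 0.5  # burglars quite often look like this
p_evidence_if_honest = 0.0001  # honest people almost never do

posterior_burglar = bayes_update(
    prior_burglar, p_evidence_if_burglar, p_evidence_if_honest
)
```

With these assumed inputs the belief rises from 0.001 to roughly 0.83: the conclusion is probable, not certain, which matches the policeman's reasoning. A different agent with different priors would reach a different posterior from the same rule.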
Rationality Beginning with Ramsey (1926), various subjectivists have wanted to assimilate probability to logic by portraying probability as the logic of partial belief. A rational agent is required to be logically consistent, now taken in a broad sense. These subjectivists argue that this implies that the agent obeys the axioms of probability (although perhaps with only finite additivity), and that subjectivism is thus (to this extent) admissible. Before we can present this argument, we must say more about what degrees of belief are
Risk Free Gamble 1. My right hand contains a coin. 2. My left hand contains a coin. You are offered the following choice: £10 if event 1 occurs, nothing otherwise; £10 if event 2 occurs, nothing otherwise.
Risk Free Gamble This provides a way of calibrating a rational person's preferences: since they prefer to win money, they will choose the event they consider more likely.
Extraordinary Explanations Require Extraordinary Evidence Implausible inductive leaps require more evidence than plausible ones do. It requires more evidence to support the notion that a strange light in the sky is an invasion force from the planet Xacron than the notion that it is a low-flying plane. The evidentiary requirements are greater for the first assumption simply because induction requires us to combine what we observe with what we already know, and most of us know more about low-flying planes than extra-terrestrial invaders.
Utility and Inference Generalizations require less support when there are tremendous negative costs involved with rejecting them. Consider the following two arguments: 1) I drank milk last night and got a minor stomachache. I can probably conclude that the milk was a little bit sour. 2) I ate a mushroom out of my backyard last night and I went into violent fits of projectile vomiting and had to be rushed to the hospital to have my stomach pumped. I can probably conclude that the mushrooms were poisonous. Technically, the amount of evidence for these two arguments is the same. However, most people would take the second argument much more seriously, simply because the consequences of not doing so are so disastrous.
Conditioning is not the same as implication The probability statement P(A | B) = p has totally different operational semantics than the logical statement "B implies A with certainty p". The logical statement means that whenever B is true then A is true with certainty p. This applies regardless of any other information we may have. In other words, it is modular. But the probability statement is not modular: it applies when the only thing we know is B. If anything else is known, e.g. C, then we must refer to P(A | B, C) instead. The only exception is when we can prove that C is conditionally independent of A given B, so that P(A | B, C) = P(A | B). To illustrate why this is important, let A = "It rained last night" B = "My grass is wet" C = "The sprinkler was on last night" Given only B, it is reasonable to conclude A. But if B is deduced from C, then it is not reasonable to conclude A. This point was made eloquently by Pearl (p57). He used it to show that logic based on "certainty factors" is not an adequate replacement for probability theory.
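One way to make the role of consequences explicit is to compare expected costs. The sketch below is a simple expected-cost rule assumed for illustration; the function name, the shared probability, and the cost figures are all hypothetical.

```python
def should_avoid(p_harmful, cost_if_harmed, cost_of_avoiding):
    """Avoid the food when the expected cost of eating it exceeds
    the cost of going without it."""
    expected_cost_of_eating = p_harmful * cost_if_harmed
    return expected_cost_of_eating > cost_of_avoiding


# The same hypothetical degree of belief in both cases: one bad experience.
p_harmful = 0.3

# Arbitrary cost units: a stomachache is cheap, a hospital visit is not.
avoid_milk = should_avoid(p_harmful, cost_if_harmed=1, cost_of_avoiding=2)
avoid_mushrooms = should_avoid(p_harmful, cost_if_harmed=1000, cost_of_avoiding=2)
```

With identical evidence, `avoid_milk` is `False` and `avoid_mushrooms` is `True`: the belief is the same, but the action differs because the stakes differ.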
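The rain and sprinkler example can be worked through numerically by enumerating a small joint distribution. The model below is an assumption made for illustration: rain and sprinkler are independent with invented probabilities, and the grass is wet exactly when at least one of them occurred.

```python
from itertools import product

# Hypothetical prior probabilities, chosen only for illustration.
P_RAIN = 0.2
P_SPRINKLER = 0.1


def joint():
    """Yield (world, probability) for every combination of rain and sprinkler."""
    for rain, sprinkler in product([True, False], repeat=2):
        p = (P_RAIN if rain else 1 - P_RAIN) * (
            P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
        )
        world = {"rain": rain, "sprinkler": sprinkler, "wet": rain or sprinkler}
        yield world, p


def probability(event, given=lambda world: True):
    """P(event | given), computed by summing over the joint distribution."""
    numerator = sum(p for world, p in joint() if given(world) and event(world))
    denominator = sum(p for world, p in joint() if given(world))
    return numerator / denominator


# P(A | B): rain given wet grass.
p_rain_given_wet = probability(lambda w: w["rain"], lambda w: w["wet"])

# P(A | B, C): rain given wet grass and that the sprinkler was on.
p_rain_given_wet_and_sprinkler = probability(
    lambda w: w["rain"], lambda w: w["wet"] and w["sprinkler"]
)
```

Under these assumptions P(A | B) is about 0.71, but P(A | B, C) falls back to the prior of 0.2: once the sprinkler explains the wet grass, the grass gives no further support to rain. A modular rule "B implies A with certainty 0.71" would get this case wrong.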