Maximum Likelihood Estimates and the EM Algorithms I Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University


1 Maximum Likelihood Estimates and the EM Algorithms I. Henry Horng-Shing Lu, Institute of Statistics, National Chiao Tung University, hslu@stat.nctu.edu.tw

2 Part 1 Computation Tools

3 Computation Tools
- R (http://www.r-project.org/): good for statistical computing
- C/C++: good for fast computation and large data sets
- More: http://www.stat.nctu.edu.tw/subhtml/source/teachers/hslu/course/statcomp/links.htm

4 The R Project
- R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
- It is similar to the commercial software S-PLUS.
- C/C++, Fortran and other code can be linked and called at run time.
- More: http://www.r-project.org/

5 Download R from http://www.r-project.org/

6 Choose one Mirror Site of R

7 Choose the Operating System

8 Select the Base of R

9 Download the Setup Program

10 Install R: double-click the R icon to install R

11 Execute R: interactive command window

12 Download Add-on Packages

13 Choose a Mirror Site: choose a mirror site close to you

14 Select One Package to Download: choose one package to download, such as rgl

15 Load Packages
- There are two methods to load packages:
Method 1: Click from the menu bar
Method 2: Type "library(rgl)" in the command window

16 Help in R (1)
- What is in the loaded library? help(rgl)

17 Help in R (2)
- How to search for functions by keyword? help.search("key words")
It lists all functions whose documentation matches the keywords, e.g. help.search("3D plot"). Each result shows the function name, the package it belongs to, and a description.

18 Help in R (3)
- How to find the documentation of a function? ?function_name
It shows the usage, arguments, author, references, related functions, and examples, e.g. ?plot3d

19 R Operators (1)
- Mathematical operators: +, -, *, /, ^
Modulo: %%
Functions: sqrt, exp, log, log10, sin, cos, tan, …

20 R Operators (2)
- Other operators:
: sequence operator
%*% matrix multiplication
<, >, <=, >= inequality
==, != comparison
&, &&, |, || and, or
~ formulas
<-, = assignment

21 Algebra, Operators and Functions
> 1+2
[1] 3
> 1>2
[1] FALSE
> 1>2 | 2>1
[1] TRUE
> A=1:3
> A
[1] 1 2 3
> A*6
[1]  6 12 18
> A/10
[1] 0.1 0.2 0.3
> A%%2
[1] 1 0 1
> B=4:6
> A*B
[1]  4 10 18
> t(A)%*%B
     [,1]
[1,]   32
> A%*%t(B)
     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    8   10   12
[3,]   12   15   18
> sqrt(A)
[1] 1.0000 1.4142 1.7321
> log(A)
[1] 0.0000 0.6931 1.0986
> round(sqrt(A),2)
[1] 1.00 1.41 1.73
> ceiling(sqrt(A))
[1] 1 2 2
> floor(sqrt(A))
[1] 1 1 1
> eigen(A%*%t(B))
$values
[1] 3.20e+01 5.83e-16 2.48e-16
$vectors
       [,1]    [,2]    [,3]
[1,] 0.2672  0.3273 -0.8890
[2,] 0.5345 -0.5217  0.2530
[3,] 0.8017  0.4665  0.3810

22 Variable Types
Item: Description
Vectors: X=c(10.4,5.6,3.1,6.4) or Z=array(data_vector, dim_vector)
Matrices: X=matrix(1:8,2,4) or Z=matrix(rnorm(30),5,6)
Factors: Statef=factor(state)
Lists: pts=list(x=cars[,1], y=cars[,2])
Data Frames: data.frame(cbind(x=1, y=1:10), fac=sample(LETTERS[1:3], 10, replace=TRUE))
Functions: name=function(arg_1,arg_2,…) expression
Missing Values: NA or NaN
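
As a small illustration of the types listed above, the following sketch builds one object of each kind. The contents of `state` are made up for the example, and `cars` is the data set that ships with R.

```r
# Vector and matrix
x <- c(10.4, 5.6, 3.1, 6.4)
m <- matrix(1:8, nrow = 2, ncol = 4)

# Factor built from a hypothetical character vector of states
state <- c("tas", "sa", "qld", "sa", "tas")
statef <- factor(state)

# List with named components, taken from the built-in cars data
pts <- list(x = cars[, 1], y = cars[, 2])

# Data frame with a numeric column and a random factor-like column
df <- data.frame(x = 1, y = 1:10,
                 fac = sample(LETTERS[1:3], 10, replace = TRUE))

# Missing values: NA is "not available", NaN is "not a number"
v <- c(1, NA, 0 / 0)
is.na(v)   # TRUE for both NA and NaN
is.nan(v)  # TRUE only for NaN
```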

23 Define Your Own Function (1)
- Use "fix(myfunction)" # an editor window will show up
- function (parameter){ statements; return (object); # if you want to return some values }
- Save the document
- Use "myfunction(parameter)" in R

24 Define Your Own Function (2)
- Example: Find all the factors of an integer
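
The code for this example was shown only as a screenshot and is not part of the transcript. One possible version is sketched below; the function name `factors` and the input checks are assumptions made for illustration.

```r
# Return all positive divisors of a positive integer n
factors <- function(n) {
  stopifnot(length(n) == 1, n >= 1, n == round(n))
  candidates <- seq_len(n)
  # keep the candidates that divide n without remainder
  candidates[n %% candidates == 0]
}

factors(12)
```

Testing every candidate from 1 to n is the simplest approach; for large n one could stop at sqrt(n) and pair each divisor with its cofactor.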

25 Define Your Own Function (3)
- When you quit R, remember to save the workspace for the next session; otherwise the functions you defined will be lost when R closes.

26 Read and Write Files
- Write Data to a CSV File
- Write Data to a TXT File
- Read TXT and CSV Files
- Demo

27 Write Data to a TXT File
- Usage: write(x,file,…)
> X=matrix(1:6,2,3)
> X
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> write(t(X),file="d:/out2.txt",ncolumns=3)
> write(X,file="d:/out3.txt",ncolumns=3)
d:/out2.txt:
1 3 5
2 4 6
d:/out3.txt:
1 2 3
4 5 6

28 Write Data to a CSV File
- Usage: write.table(x,file="foo.csv",sep=",",…)
> X=matrix(1:6,2,3)
> X
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> write.table(t(X),file="d:/out4.txt",sep=",",col.names=FALSE,row.names=FALSE)
> write.table(X,file="d:/out5.txt",sep=",",col.names=FALSE,row.names=FALSE)
d:/out4.txt:
1,2
3,4
5,6
d:/out5.txt:
1,3,5
2,4,6

29 Read TXT and CSV Files
- Usage: read.table(file,...)
> X=read.table(file="d:/out2.txt")
> X
  V1 V2 V3
1  1  3  5
2  2  4  6
> Y=read.table(file="d:/out4.txt",sep=",",header=FALSE)
> Y
  V1 V2
1  1  2
2  3  4
3  5  6

30 Demo
> Data=read.table(file="d:/01.csv",header=TRUE,sep=",")
> Data
          Y        X1       X2
1  2.651680 13.808990 26.75896
2  1.875039 17.734520 37.89857
3  1.523964 19.891030 26.03624
4  2.984314 15.574260 30.21754
5 10.423090  9.293612 28.91459
6  0.840065  8.830160 30.38578
7  8.126936  9.615875 32.69579
> mean(Data$Y)
[1] 4.060727
> boxplot(Data$Y)
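
The write and read calls on the preceding slides can be combined into a round trip. The sketch below uses a temporary file instead of the d:/ paths (an assumption made so that it runs on any system) and checks that the matrix read back equals the one written.

```r
# Write a matrix to a temporary CSV file and read it back
X <- matrix(1:6, nrow = 2, ncol = 3)
path <- tempfile(fileext = ".csv")  # temporary path, not the slides' d:/ files

write.table(X, file = path, sep = ",", col.names = FALSE, row.names = FALSE)
Y <- read.table(file = path, sep = ",", header = FALSE)

# read.table returns a data frame; convert it before comparing with X
all(as.matrix(Y) == X)

unlink(path)  # remove the temporary file
```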

31 Part 2 Motivation Examples

32 Example 1 in Genetics (1)
- Two linked loci with alleles A and a, and B and b
A, B: dominant
a, b: recessive
- A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab
[Figure: chromosome diagrams for the female (F) and male (M) parent; r' is the female recombination fraction and r is the male recombination fraction, with complements 1-r' and 1-r]

33 Example 1 in Genetics (2)
- r and r' are the recombination rates for male and female.
- Suppose the parental origin of these heterozygotes is from the mating AABB × aabb. The problem is to estimate r and r' from the offspring of selfed heterozygotes.
- Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92.
- http://en.wikipedia.org/wiki/Genetics http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf

34 Example 1 in Genetics (3)
[Figure: the parental mating and the four gamete types produced by the double heterozygote]
Gamete: AB, ab, aB, Ab
Male: (1-r)/2, (1-r)/2, r/2, r/2
Female: (1-r')/2, (1-r')/2, r'/2, r'/2

35 Example 1 in Genetics (4)
Offspring genotypes and probabilities, by female gamete (rows) and male gamete (columns: AB (1-r)/2, ab (1-r)/2, aB r/2, Ab r/2):
Female AB, (1-r')/2: AABB (1-r)(1-r')/4; aABb (1-r)(1-r')/4; aABB r(1-r')/4; AABb r(1-r')/4
Female ab, (1-r')/2: AaBb (1-r)(1-r')/4; aabb (1-r)(1-r')/4; aaBb r(1-r')/4; Aabb r(1-r')/4
Female aB, r'/2: AaBB (1-r)r'/4; aabB (1-r)r'/4; aaBB rr'/4; AabB rr'/4
Female Ab, r'/2: AABb (1-r)r'/4; aAbb (1-r)r'/4; aABb rr'/4; AAbb rr'/4

36 Example 1 in Genetics (5)
- Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
- A*: the dominant phenotype from (Aa, AA, aA).
- a*: the recessive phenotype from aa.
- B*: the dominant phenotype from (Bb, BB, bB).
- b*: the recessive phenotype from bb.
- A*B*: 9 gametic combinations.
- A*b*: 3 gametic combinations.
- a*B*: 3 gametic combinations.
- a*b*: 1 gametic combination.
- Total: 16 combinations.

37 Example 1 in Genetics (6)
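
The equations on this slide did not survive in the transcript. As a reconstruction (not necessarily the author's notation), summing the genotype probabilities in the table of slide 35 within each phenotype, and writing φ for the product (1-r)(1-r'), gives the following phenotype probabilities. The symbol φ is introduced here for convenience.

```latex
\begin{aligned}
\phi &= (1-r)(1-r'), \\
P(A^*B^*) &= \frac{2+\phi}{4}, \\
P(A^*b^*) &= P(a^*B^*) = \frac{r(1-r') + (1-r)r' + rr'}{4} = \frac{1-\phi}{4}, \\
P(a^*b^*) &= \frac{(1-r)(1-r')}{4} = \frac{\phi}{4}.
\end{aligned}
```

The four probabilities sum to 1, and they depend on r and r' only through φ, so the phenotype counts alone can identify φ but not r and r' separately.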

38 Example 1 in Genetics (7)
Hence, a random sample of size n from the offspring of selfed heterozygotes will follow a multinomial distribution:

39 Example 1 in Genetics (8)
Suppose that we observe the data of y = (y1, y2, y3, y4) = (125, 18, 20, 24), which is a random sample from Then the probability mass function is

40 Estimation Methods
- Frequentist Approaches: http://en.wikipedia.org/wiki/Frequency_probability
Method of Moments Estimate (MME): http://en.wikipedia.org/wiki/Method_of_moments_%28statistics%29
Maximum Likelihood Estimate (MLE): http://en.wikipedia.org/wiki/Maximum_likelihood
- Bayesian Approaches: http://en.wikipedia.org/wiki/Bayesian_probability

41 Method of Moments Estimate (MME)
- Solve the equations that set the population moments equal to the sample moments: E(X^k) = (1/n) Σ_{i=1..n} X_i^k for k = 1, 2, …, t, where t is the number of parameters to be estimated.
- The MME is simple to compute.
- Under regularity conditions, the MME is consistent.
- More: http://en.wikipedia.org/wiki/Method_of_moments_%28statistics%29

42 MME for Example 1
Note: MME can't assure

43 MME by R
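
The R code for this slide was shown as a screenshot and is not in the transcript. The sketch below is one way to compute a moment estimate for Example 1, under two assumptions that are not confirmed by the slides: the phenotype probabilities are ((2+phi)/4, (1-phi)/4, (1-phi)/4, phi/4) with phi = (1-r)(1-r'), as obtained by summing the table on slide 35, and the moment equation matches the expected and observed count of the first cell. A second linear combination is shown for comparison; the author's own choice of moment equation is unknown.

```r
# Observed phenotype counts from slide 39: (A*B*, A*b*, a*B*, a*b*)
y <- c(125, 18, 20, 24)
n <- sum(y)

# Assumed cell probabilities as a function of phi = (1 - r) * (1 - r')
cell_prob <- function(phi) c(2 + phi, 1 - phi, 1 - phi, phi) / 4

# Moment estimate from the first cell: E[y1] / n = (2 + phi) / 4
phi_mme1 <- 4 * y[1] / n - 2

# Alternative: E[y1 - y2 - y3 + y4] = n * phi
phi_mme2 <- (y[1] - y[2] - y[3] + y[4]) / n

c(first_cell = phi_mme1, linear = phi_mme2)
```

Neither formula is constrained to the interval [0, 1]; for example, the first one is negative whenever y1 is less than n/2.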

44 MME by C/C++

45 Maximum Likelihood Estimate (MLE)
- Likelihood:
- Maximize the likelihood: solve the score equations, which set the first derivatives of the likelihood to zero.
- Under regularity conditions, the MLE is consistent, asymptotically efficient, and asymptotically normal.
- More: http://en.wikipedia.org/wiki/Maximum_likelihood

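46 Example 2 (1)
We toss an unfair coin 3 times and record the outcome (x1, x2, x3), where x_i = 1 for a head and 0 for a tail. If p is the probability of tossing a head, then each outcome has the following probability:
Number of heads: outcomes (x1, x2, x3): probability of each outcome
0: (0,0,0): (1-p)^3
1: (1,0,0), (0,1,0), (0,0,1): p(1-p)^2
2: (0,1,1), (1,0,1), (1,1,0): p^2(1-p)
3: (1,1,1): p^3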

47 Example 2 (2)
Suppose we observe 1 head and 2 tails. The likelihood function becomes L(p) ∝ p(1-p)^2. One way to maximize this likelihood function is to solve the score equation, which sets the first derivative to zero: (1-p)^2 - 2p(1-p) = (1-p)(1-3p) = 0.

48 Example 2 (3)
- The solutions of the score equation are p = 1/3 and p = 1.
- One can check that p = 1/3 is the maximum point. (How?)
- Hence, the MLE of p is 1/3 for this example.
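
One way to answer the "How?" is to compare the likelihood at the two roots and look at the sign of the second derivative. The sketch below assumes the likelihood for 1 head and 2 tails is proportional to p(1-p)^2, as in the table of Example 2; the function name `lik` is chosen for illustration.

```r
# Likelihood for 1 head and 2 tails, up to a constant factor
lik <- function(p) p * (1 - p)^2

# Numerical maximization over [0, 1]
fit <- optimize(lik, interval = c(0, 1), maximum = TRUE)
fit$maximum

# Likelihood at the two roots of the score equation: p = 1/3 and p = 1
lik(c(1 / 3, 1))

# Second derivative of p(1-p)^2 is 6p - 4, which is negative at p = 1/3
6 * (1 / 3) - 4
```

The likelihood is zero at p = 1 and positive at p = 1/3, and the negative second derivative at p = 1/3 confirms a local maximum there.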

49 MLE for Example 1 (1)
- Likelihood
- MLE: A B C

50 MLE for Example 1 (2)
- Checking: (1) (2) (3)

51 Use R to find MLE (1)

52 Use R to find MLE (2)
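
The R code on these slides was shown as screenshots and is not in the transcript. The following sketch computes the MLE for Example 1 under the assumption that the phenotype probabilities are ((2+phi)/4, (1-phi)/4, (1-phi)/4, phi/4) with phi = (1-r)(1-r'), obtained by summing the table on slide 35. Setting the derivative of the log-likelihood to zero and clearing denominators gives a quadratic in phi; the coefficient names A, B, C below are labels chosen here and may not match the slide's.

```r
# Observed phenotype counts from slide 39
y <- c(125, 18, 20, 24)
n <- sum(y)

# Log-likelihood up to an additive constant, under the assumed probabilities
loglik <- function(phi) {
  y[1] * log(2 + phi) + (y[2] + y[3]) * log(1 - phi) + y[4] * log(phi)
}

# Score equation y1/(2+phi) - (y2+y3)/(1-phi) + y4/phi = 0
# reduces to A * phi^2 + B * phi + C = 0
A <- n
B <- -(y[1] - 2 * y[2] - 2 * y[3] - y[4])
C <- -2 * y[4]
phi_closed <- (-B + sqrt(B^2 - 4 * A * C)) / (2 * A)  # the root in (0, 1)

# Numerical check with a one-dimensional optimizer
phi_num <- optimize(loglik, interval = c(1e-6, 1 - 1e-6),
                    maximum = TRUE)$maximum

c(closed_form = phi_closed, numerical = phi_num)
```

Because C is negative and A is positive, the quadratic has one negative and one positive root, so only the positive root is a candidate. Only phi is estimable from the four phenotype counts; r and r' cannot be separated without further assumptions such as r = r'.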

53 Use C/C++ to find MLE (1)

54 Use C/C++ to find MLE (2)

55 Exercises
- Write your own programs for the examples presented in this talk.
- Write programs for the examples mentioned at the following web page: http://en.wikipedia.org/wiki/Maximum_likelihood
- Write programs for other examples that you know.

56 More Exercises (1)
- Example 3 in genetics: The observed data are (nO, nA, nB, nAB) = (176, 182, 60, 17) ~ Multinomial(r^2, p^2+2pr, q^2+2qr, 2pq), where p, q, and r fall in [0,1] such that p+q+r = 1. Find the likelihood function and score equations for p, q, and r.

57 More Exercises (2)
- Example 4 in positron emission tomography (PET): The observed data are n*(d) ~ Poisson(λ*(d)), d = 1, 2, …, D, and λ*(d) = Σ_{b=1..B} p(b,d) λ(b).
- The values of p(b,d) are known and the unknown parameters are λ(b), b = 1, 2, …, B.
- Find the likelihood function and score equations for λ(b), b = 1, 2, …, B.

58 More Exercises (3)
- Example 5 in the normal mixture: The observed data x_i, i = 1, 2, …, n, are random samples from the following probability density function:
- Find the likelihood function and score equations for the following parameters:

