Advanced Artificial Intelligence Classification


Presentation on theme: "Advanced Artificial Intelligence Classification"— Presentation transcript:

1 Advanced Artificial Intelligence Classification
Chung-Ang University, Hae-Cheon Kim
References: #1 Pattern Classification (Richard O. Duda et al.); #2 Wikipedia
Hello, I am Hae-Cheon Kim from the Machine Intelligence Lab. In this presentation, I would like to talk about classification.

2 Contents
1. Introduction to Classification
2. List of classifiers: Rule-based, Decision Tree, K-Nearest Neighbor, Support Vector Machine, Naïve Bayes, Neural Network
3. Applications
4. Q&A
I will introduce classification in artificial intelligence and then talk about six representative classification methods. After that I will explain some simple applications of classification, take questions, and close the presentation.
Advanced Artificial Intelligence / Chung-Ang University / Hae-Cheon Kim

3 Introduction
[Table: five people (1 Alice, 2 Bob, 3 Eve, 4 Mallory, 5 Trent) with columns No, Name, Antigen-a, Antigen-b, Antigen-d, Blood Type; each antigen is marked present (Ο) or absent (×); listed blood types include Alice: A, Bob: B, Trent: O]
Here is a dataset showing five people's blood types and characteristics. As we already know, there are four blood types: A, B, O, and AB.

4 Introduction
[Table: the same five-person dataset, plus a new unlabeled row: Hae-Cheon, Antigen-a ×, Antigen-b Ο, Antigen-d Ο, Blood Type ? (candidates: A, B, AB, O)]
And here are the characteristics of my blood: it has antigen-b and antigen-d, and it does not have antigen-a. What is my blood type? The process of answering this question is called classification.

5 Classification
There is a set of labels $Y = \{ y_k \mid k \in [1, q] \}$.
Given: a test instance with feature vector $x = (x_1, x_2, \ldots, x_d)$.
Classification: the task of finding the most relevant label $y_x$ for instance $x$.
For example: test instance $x = (\times, \bigcirc, \bigcirc)$, labels $Y = \{A, B, AB, O\}$, $h(\times, \bigcirc, \bigcirc) = ?$
[Table: Hae-Cheon, Antigen-a ×, Antigen-b Ο, Antigen-d Ο, Blood Type ?]
By definition, given a set of possible labels $Y$ and a test instance with feature vector $x$, classification is the task of finding the most relevant label $y_x$ for instance $x$.

6 Classification
In the example above, classification means finding Hae-Cheon's blood type when his feature vector $x$ is $(\times, \bigcirc, \bigcirc)$ and the answer lies in $Y = \{A, B, AB, O\}$. So how do we find it?

7 Classifier
There is a set of classes $Y = \{ y_k \mid k \in [1, q] \}$.
Given: an instance with feature vector $x = (x_1, x_2, \ldots, x_d)$.
A classifier is a function $h(x)$ that assigns a class to instance $x$.
Assumption: the training dataset represents the real world.
[Diagram: a sample dataset from the real world is used to train the classifier $h(x)$; input: Hae-Cheon's record (Antigen-a ×, Antigen-b Ο, Antigen-d Ο); output: the predicted blood type]
To find the answer we need a function, and this function is called a 'classifier'. To predict correct answers with high accuracy, many classifiers are trained on a sample dataset taken from the real world.

8 List of Classifiers
Rule-based, Decision Tree, K-Nearest Neighbor, Support Vector Machine, Naïve Bayes, Neural Network
There are six representative classifiers, and each uses the given dataset in its own way. Let's see how each classifier tries to classify given instances.

9 Rule-based Classifier
Classify records by using a collection of "if…then…" or "switch" rules.
There is a person $P$.
If $P$ has Antigen-a and does not have Antigen-b: Blood Type is A
If $P$ does not have Antigen-a and has Antigen-b: Blood Type is B
If $P$ does not have Antigen-a and does not have Antigen-b: Blood Type is O
Otherwise: Blood Type is AB
No training!
First, there is the rule-based classifier. This classifier uses handmade rules, for example "if…then…" or "switch" statements. The rules are written directly by the programmer, not learned through training.

10 Rule-based Classifier
Example) Classification rules for the blood type.
[Figure: Hae-Cheon's record (Antigen-a ×, Antigen-b Ο) is passed through the Antigen-a / Antigen-b rules; result: Blood Type B]
For example, in the blood type problem, the rule-based classifier classifies using scientific facts. When the classifier receives a new instance, it runs the rule code and classifies that instance.

11 Rule-based Classifier
Classify records by using a collection of "if…then…" rules.
Example) Classification rules for the blood type. There is a person $P$.
If $P$ has Antigen-a and does not have Antigen-b: Blood Type is A
If $P$ does not have Antigen-a and has Antigen-b: Blood Type is B
If $P$ does not have Antigen-a and does not have Antigen-b: Blood Type is O
If $P$ has Antigen-a and has Antigen-b: Blood Type is AB
[Figure: Hae-Cheon's record (Antigen-a ×, Antigen-b Ο) is passed through the rules; result: Blood Type B]
The picture shows the process of classifying my blood type according to the defined rules. The person does not have Antigen-a and has Antigen-b, so the person's blood type is 'B'.
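The four rules on this slide translate almost directly into code. The following is a minimal sketch; the function name blood_type and its boolean parameters are illustrative choices, not something given in the slides.

```python
def blood_type(has_antigen_a: bool, has_antigen_b: bool) -> str:
    """Hand-written rules; nothing is learned from data."""
    if has_antigen_a and not has_antigen_b:
        return "A"
    if not has_antigen_a and has_antigen_b:
        return "B"
    if not has_antigen_a and not has_antigen_b:
        return "O"
    return "AB"  # both antigens are present


# Hae-Cheon's record from the slide: Antigen-a absent, Antigen-b present.
print(blood_type(has_antigen_a=False, has_antigen_b=True))  # B
```

Antigen-d does not appear in any rule, so the sketch ignores it as well.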

12 Decision Tree
The second classifier is the decision tree. The decision tree classifier uses a tree structure in which each internal node holds a dividing rule and each leaf node holds a class (or label).

13 Decision Tree
When a new instance comes in, it starts from the top node, follows the criterion at each node down to a leaf node, and returns the class that the leaf holds. Therefore, it is important to create efficient dividing rules.

14 Decision Tree
Dataset: list of Titanic passengers (SibSp: number of siblings/spouses aboard)
Sample rows (Name, Age, Gender, SibSp, Survived): DiCaprio, 24, Man, 1, ×; Kate, 17, Woman, 4, Ο; Edward, 62, …
Training aims for high accuracy while dividing the data as widely as possible.
To be efficient, the decision tree builds its rules from a dataset; in decision trees this process is called training. For example, to predict which Titanic passengers survived, the classifier needs to be trained on a list of Titanic passengers.

15 Decision Tree
CART algorithm (Classification And Regression Tree)
The goal: maximize the information gain (IG)
$IG(D, f) = I(D) - \sum_{j=1}^{m} \frac{N_j}{N} I(D_{\text{child}_j})$
Metrics:
Gini impurity: $I_G(t) = \sum_{i=1}^{c} p(i \mid t)\,(1 - p(i \mid t)) = 1 - \sum_{i=1}^{c} p(i \mid t)^2$
Entropy: $I_H(t) = -\sum_{i=1}^{c} p(i \mid t) \log_2 p(i \mid t)$
How does the classifier find the dividing rules in the dataset? A typical example of such an algorithm is the Classification And Regression Tree (CART) algorithm.

16 Decision Tree
CART algorithm (Classification And Regression Tree)
This algorithm creates the dividing rules of the decision tree so that the information gain is maximized at each branch point. Typically, the Gini impurity or the entropy is used as the impurity measure when computing the information gain.
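To make the metrics concrete, here is a small sketch of the Gini impurity, the entropy, and the information gain of a candidate split. The toy labels are hypothetical (they are not the Titanic data from the slides), and the function names are illustrative.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum_i p(i|t)^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy: -sum_i p(i|t) * log2 p(i|t)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children, impurity=gini):
    """IG = I(parent) - sum_j (N_j / N) * I(child_j)."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy labels: "O" = survived, "X" = did not survive.
parent = ["O", "O", "O", "X", "X", "X"]
perfect_split = [["O", "O", "O"], ["X", "X", "X"]]
poor_split = [["O", "X"], ["O", "O", "X", "X"]]

print(information_gain(parent, perfect_split))           # 0.5
print(information_gain(parent, perfect_split, entropy))  # 1.0
print(information_gain(parent, poor_split))              # close to 0
```

A tree builder in the spirit of CART would evaluate many candidate splits like these and keep the one with the largest gain; the search over candidate splits is omitted here.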

17 Decision Tree
[Figure: an example tree which estimates the probability of kyphosis after surgery, given the age of the patient and the vertebra at which surgery was started]
Using the CART algorithm, a fairly accurate decision tree can be built from a given dataset. This example estimates the probability of kyphosis after surgery, given the age of the patient and the vertebra at which surgery was started.

18 Decision Tree
[Figure: the same kyphosis example shown as a tree, a 3D plot, and a 2D plot]
The first figure shows the probability that kyphosis is present in the patient. The second is a 3D plot of that probability over the feature space, and the third is a 2D representation of it.

19 Nearest Neighbor
Find the one data point in the training data that is most similar (closest) to $x$: find $x_{\text{similar}}$.
Then there is a label $y_k$ that matches $x_{\text{similar}}$. Return $y_k$.
Third, there is the nearest neighbor classifier. Its principle is very simple: it finds the training data point closest to the input and outputs that point's label.

20 K-Nearest Neighbor
Find the $k$ data points in the training data that are most similar (closest) to $x$: $x_{s_1}, x_{s_2}, \ldots, x_{s_k}$, with labels $y_{s_1}, \ldots, y_{s_k}$.
Return the average of $y_{s_1}, \ldots, y_{s_k}$ (for class labels, the majority vote).
With the k-nearest neighbor classifier, we find the $k$ closest data points and return the average of their labels. In the figure above, if $k$ is 3 the green object is classified as a triangle; if $k$ is 5 it is classified as a square.
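The k = 3 versus k = 5 behavior described here can be reproduced with a few lines. This is a sketch under assumptions: the coordinates are hypothetical points arranged to mimic the figure, the distance is Euclidean, and "average of the labels" is implemented as a majority vote.

```python
from collections import Counter
from math import dist


def knn_predict(train, x, k):
    """Return the majority label among the k training points closest to x.

    train is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D points: triangles dominate the 3 nearest neighbors,
# squares dominate the 5 nearest neighbors.
train = [
    ((1.0, 0.0), "triangle"),
    ((0.0, 1.2), "triangle"),
    ((-1.1, 0.0), "square"),
    ((0.0, -2.0), "square"),
    ((2.1, 0.0), "square"),
    ((0.0, 3.0), "triangle"),
]
green = (0.0, 0.0)
print(knn_predict(train, green, k=3))  # triangle
print(knn_predict(train, green, k=5))  # square
```

Setting k = 1 gives the plain nearest neighbor classifier from the previous slide.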

21 K-Nearest Neighbor
Dividing the feature space using K-Nearest Neighbor.
These graphs show the resulting decision regions in feature space, and they show why we use k-nearest neighbor. The k-nearest neighbor classifier gives smoother regions than the nearest neighbor classifier because its result is averaged over $k$ neighbors.

22 K-Nearest Neighbor
How to calculate distance? Minkowski distance:
$D(x, z) = \left( \sum_{i=1}^{d} |x_i - z_i|^p \right)^{1/p}$
Manhattan distance ($p = 1$), Euclidean distance ($p = 2$)
Then, how do we calculate the distance between data points? In the k-NN classifier, the choice of distance measure is an important issue. Most implementations use the Minkowski distance.

23 K-Nearest Neighbor
How to calculate distance? Minkowski distance
The Minkowski distance becomes the Manhattan distance when $p = 1$ and the Euclidean distance when $p = 2$. The figure on the right shows the region within distance 1 of a point as $p$ varies.
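As a quick check of the two special cases, the Minkowski distance, the p-th root of the sum of |x_i - z_i|^p, can be written as a one-line function. The points below are hypothetical.

```python
def minkowski(x, z, p):
    """D(x, z) = (sum_i |x_i - z_i|^p)^(1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, z)) ** (1 / p)


x, z = (0, 0), (3, 4)
print(minkowski(x, z, p=1))  # 7.0  (Manhattan)
print(minkowski(x, z, p=2))  # 5.0  (Euclidean)
```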

24 Support Vector Machine
Fourth, there is the support vector machine. Its objective is to find a hyperplane in a $d$-dimensional space that distinctly separates the data points of the classes.

25 Support Vector Machine
For example, suppose two classes of data are distributed in feature space as shown in the first figure. You can then choose the position and angle of a hyperplane that divides the two classes, as in the second picture.

26 Support Vector Machine
The support vector machine (SVM) is based on finding the hyperplane that gives the maximum margin between the two classes, as shown in the third figure. After training, the classifier uses this hyperplane to classify input instances.

27 Support Vector Machine
Hyperplane equation: $w x + b = y$
Training: find the weights $w$ and bias $b$ that solve
$\arg\min_{(w, b)} \lVert w \rVert$ subject to $y_i (w x_i + b) \ge 1$ for every training instance $x_i$ with label $y_i \in \{-1, +1\}$
Mathematically, when the linear equation of the hyperplane is $w x + b = y$, we look for the $w$ and $b$ that minimize $\lVert w \rVert$ while every training instance lies on its correct side with $w x_i + b$ at least 1 in magnitude.
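The sketch below does not train an SVM; it only illustrates the objective and the constraint for the two-class case with labels -1 and +1. The data and both candidate hyperplanes are hypothetical. Both candidates satisfy the margin constraint, and the one with the smaller norm of w (the wider margin, since the margin width is 2 divided by that norm) is the one the SVM objective prefers.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign is the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def satisfies_margin(w, b, data):
    """Check y_i * (w.x_i + b) >= 1 for every (x_i, y_i), with y_i in {-1, +1}."""
    return all(y * decision_value(w, b, x) >= 1 for x, y in data)


def norm(w):
    """Euclidean norm of the weight vector."""
    return sum(wi * wi for wi in w) ** 0.5


# Hypothetical 2-D training set, not taken from the slides.
data = [((2.0, 2.0), +1), ((3.0, 3.0), +1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

# Two hypothetical candidate hyperplanes (w, b) that both separate the data.
wide = ((0.5, 0.5), -1.0)
narrow = ((1.0, 1.0), -2.0)

for name, (w, b) in [("wide", wide), ("narrow", narrow)]:
    print(name, satisfies_margin(w, b, data), round(norm(w), 3))
# wide True 0.707
# narrow True 1.414
```

A real solver would search over all feasible w and b, typically by quadratic programming, rather than comparing two hand-picked candidates.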

28 Support Vector Machine
What if the dataset cannot be divided by a linear function? Use a kernel mapping function.
If the dataset cannot be divided by a linear function, the classifier can use the kernel mapping technique. Applying a kernel mapping function to the dataset maps it into a higher-dimensional space, where a better hyperplane can be found. The figure shows the dataset mapped into a higher-dimensional space (shaped like a Gaussian mountain).
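One way to picture the "Gaussian mountain" is an explicit mapping that appends a Gaussian-shaped third coordinate to each 2-D point. This is a simplified sketch: it uses an explicit feature map rather than a kernel function, and the data points and the threshold 0.5 are hypothetical.

```python
from math import exp


def lift(point):
    """Append a Gaussian-shaped third coordinate exp(-(x1^2 + x2^2))."""
    x1, x2 = point
    return (x1, x2, exp(-(x1 * x1 + x2 * x2)))


# Hypothetical data: the inner class sits near the origin and the outer class
# surrounds it, so no straight line in 2-D separates them.
inner = [(0.0, 0.1), (0.2, -0.1), (-0.1, -0.2)]
outer = [(1.5, 0.0), (0.0, -1.6), (-1.4, 0.3), (1.0, 1.2)]

# After lifting, the flat plane z = 0.5 separates the two classes.
threshold = 0.5
print(all(lift(p)[2] > threshold for p in inner))  # True
print(all(lift(p)[2] < threshold for p in outer))  # True
```

In the lifted space the separating surface is an ordinary hyperplane, which is what allows a linear classifier to be reused.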

29 Naïve Bayes
Classification: the task of finding the label $y_x$ that most closely relates to the instance $x$.
Find $y_x$ such that
$y_x = \arg\max_{y_k \in Y} P(y_k \mid x) \approx \arg\max_{y_k \in Y} P(y_k) \prod_{i=1}^{d} P(x_i \mid y_k)$
Fifth, there is the Naïve Bayes classifier. This classifier finds the most probable label by estimating the posterior probability from the prior probability and the per-feature likelihoods obtained from the dataset.

30 Naïve Bayes
Using Bayes' theorem: $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$
Find $y_x$ such that
$y_x = \arg\max_{y_k \in Y} P(y_k \mid x) = \arg\max_{y_k \in Y} \frac{P(y_k)\, P(x \mid y_k)}{P(x)} = \arg\max_{y_k \in Y} P(y_k)\, P(x \mid y_k) \approx \arg\max_{y_k \in Y} P(y_k) \prod_{i=1}^{d} P(x_i \mid y_k)$
This is where we apply Bayes' rule to derive the approximate equation. Since $x$ is given, $P(x)$ is the same for every label and can be ignored. If we also assume that $x_1$ to $x_d$ are conditionally independent once $y_k$ is fixed, we obtain the right-most expression.

31 Neural Network
Perceptron (neuron): a perceptron is modeled on the human neuron.
Linear function: $a := \sum_{i=1}^{n} w_i x_i + b$
Non-linear function: $f(a)$
A neural network is built from perceptrons.
Finally, there is the neural network classifier, which is built from perceptrons modeled on human neurons. A perceptron reads the values from the previous neurons, combines them linearly, and then applies a non-linear function (called the activation function) to produce better performance.
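A single perceptron as described here, a linear combination followed by a non-linear activation, can be sketched as follows. The sigmoid activation, the weights, and the inputs are assumptions chosen for illustration; the slides do not name a specific activation function.

```python
from math import exp


def sigmoid(a):
    """A common activation function; assumed here, not specified by the slides."""
    return 1.0 / (1.0 + exp(-a))


def perceptron(x, w, b, f=sigmoid):
    """Compute f(a) with a = sum_i w_i * x_i + b."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(a)


# Hypothetical weights and inputs chosen so that a = 0.
print(perceptron(x=(1.0, 2.0), w=(0.5, -0.25), b=0.0))  # 0.5
```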

32 Neural Network
$f\left( \sum_{i=1}^{n} w_i x_i + b \right) = y$
Three layers: input layer ($d$ neurons), hidden layer, output layer ($q$ neurons)
A neural network has three or more layers: input, hidden, and output. The input layer must consist of $d$ neurons and the output layer must consist of $q$ neurons.

33 Neural Network
Three layers: input layer ($d$ neurons), hidden layer, output layer ($q$ neurons)
How to train: backpropagation (differentiation)
$w_{ik} \leftarrow w_{ik} - \sigma \frac{\partial \mathrm{Error}}{\partial w_{ik}}$
A typical training method is backpropagation. It uses derivatives to change each weight in the direction in which the error decreases.
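The update rule can be tried on the smallest possible case: one weight, one input, and a squared error. This sketch shows only the gradient step itself, not full backpropagation through several layers; the error function, the learning rate 0.1, and the data are hypothetical.

```python
def error(w, x, target):
    """Squared error E = 0.5 * (w*x - target)^2 for a single linear neuron."""
    return 0.5 * (w * x - target) ** 2


def train_step(w, x, target, lr):
    """One update w <- w - lr * dE/dw."""
    prediction = w * x
    gradient = (prediction - target) * x  # dE/dw by the chain rule
    return w - lr * gradient


w, x, target, lr = 0.0, 2.0, 4.0, 0.1
errors = []
for _ in range(5):
    errors.append(error(w, x, target))
    w = train_step(w, x, target, lr)

print(errors)  # each value is smaller than the previous one
```

Backpropagation applies the same kind of step to every weight, using the chain rule to pass the error derivative backward from the output layer through the hidden layers.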

34 Neural Network
$f\left( \sum_{i=1}^{n} w_i x_i + b \right) = y$
Here is an example of what happens when a new instance is given. When a new instance arrives, feed it into the neural network, then find the label with the highest probability and select it.
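Selecting the label with the highest probability can be sketched as below. Turning the raw outputs into probabilities with a softmax is an assumption (the slides only say "highest probability"), and the output values are hypothetical.

```python
from math import exp


def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["A", "B", "AB", "O"]
outputs = [0.3, 2.1, 0.4, -1.0]  # hypothetical raw outputs of the q = 4 output neurons
probs = softmax(outputs)
print(labels[probs.index(max(probs))])  # B
```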

35 Applications
Finally, classification is used in many problems, for example text categorization and image classification. Text categorization judges what kind of document a text is, as in a spam filter. Image classification determines what object appears in an image; in the figure, the classifier finds it using features of the image.

36 Advanced Artificial Intelligence Classification
Chung-Ang University, Hae-Cheon Kim
Q&A

37 The slides after this one are not used in the presentation; they are just extra slides.

38 Naïve Bayes
$y_x = \arg\max_{y_k \in Y} P(y_k) \prod_{i=1}^{d} P(x_i \mid y_k)$
$x = (F, F, T)$
If $y = T$: $P(class{=}T) \cdot P(W{=}F \mid class{=}T) \cdot P(X{=}F \mid class{=}T) \cdot P(Y{=}T \mid class{=}T) = \frac{2}{5} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{20}$
If $y = F$: $P(class{=}F) \cdot P(W{=}F \mid class{=}F) \cdot P(X{=}F \mid class{=}F) \cdot P(Y{=}T \mid class{=}F) = \frac{3}{5} \cdot \frac{1}{3} \cdot \frac{2}{3} \cdot \frac{2}{3} = \frac{4}{45}$
$\frac{1}{20} < \frac{4}{45}$, so the label is class F.
Let's look at an example. With the table as the dataset, what is the label of $x = (F, F, T)$? Using the approximate equation, we can estimate the posterior probability for each class.
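The arithmetic of this example can be checked with exact fractions. The probabilities below are copied from the slide; the dictionary layout and the names are illustrative.

```python
from fractions import Fraction

# For each class: prior, then P(W=F | class), P(X=F | class), P(Y=T | class).
factors = {
    "T": [Fraction(2, 5), Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)],
    "F": [Fraction(3, 5), Fraction(1, 3), Fraction(2, 3), Fraction(2, 3)],
}


def score(values):
    """Multiply the prior by the per-feature likelihoods."""
    product = Fraction(1)
    for v in values:
        product *= v
    return product


scores = {label: score(values) for label, values in factors.items()}
print(scores["T"], scores["F"])     # 1/20 4/45
print(max(scores, key=scores.get))  # F
```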

