CSE 4705 Artificial Intelligence

1 CSE 4705 Artificial Intelligence
Jinbo Bi Department of Computer Science & Engineering

2 Machine learning (1) Supervised learning algorithms

3 Topics in machine learning
Supervised learning, such as classification and regression; unsupervised learning, such as cluster analysis and outlier/novelty detection; dimension reduction; semi-supervised learning; active learning; online learning.

4 Common techniques
Supervised learning: regularized least squares; least absolute shrinkage and selection operator (LASSO); neural networks; logistic regression; decision trees; Fisher's discriminant analysis; support vector machines; graphical models.

5 Common techniques
Unsupervised learning: K-means; Gaussian mixture models; hierarchical clustering; graph-based clustering (e.g., spectral clustering).

6 Common techniques
Dimension reduction: principal component analysis; independent component analysis; canonical correlation analysis; feature selection; sparse modeling.

7 Machine learning / Data mining
Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information (ACM SIGKDD conference).
The ultimate goal of machine learning is the creation and understanding of machine intelligence (ICML conference). It is heavily related to statistical learning theory.
Artificial intelligence is the intelligence exhibited by machines or software. The field studies how to create computers and computer software that are capable of intelligent behavior (AAAI conference).

8 Supervised learning: definition
Given a collection of examples (the training set), each example contains a set of attributes (independent variables), and one of the attributes is the target (dependent variable). Find a model that predicts the target as a function of the values of the other attributes. Goal: previously unseen examples should be predicted as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets; the training set is used to build the model and the test set is used to validate it.
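
The training/test workflow described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data set, the helper names (`train_test_split`, `fit_threshold`, `accuracy`), and the one-attribute threshold model are all hypothetical and are not taken from the lecture.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(threshold, examples):
    """Fraction of examples whose prediction (x >= threshold) matches the target."""
    return sum((x >= threshold) == bool(y) for x, y in examples) / len(examples)

def fit_threshold(train):
    """Build the model: pick the training value of x that classifies the training set best."""
    return max((x for x, _ in train), key=lambda t: accuracy(t, train))

# Hypothetical data: one attribute x; the target is 1 when x >= 5.
data = [(x, int(x >= 5)) for x in range(20)]

train, test = train_test_split(data)
threshold = fit_threshold(train)            # model built on the training set only
train_accuracy = accuracy(threshold, train)
test_accuracy = accuracy(threshold, test)   # estimate on previously unseen examples
```

The test set plays no part in choosing the threshold, so `test_accuracy` estimates how well the model predicts unseen examples.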

10 Supervised learning: classification
When the dependent variable is categorical, the task is a classification problem.

11 Classification: example
Face recognition. Goal: predict the identity of a face image. Approach: align all images to derive the features, then model the class (identity) based on these features.

12 Supervised learning: regression
When the dependent variable is continuous, the task is a regression problem.

13 Regression: example
Risk prediction for patients. Goal: predict the likelihood that a patient will suffer a major complication after a surgical procedure. Approach: use patients' vital signs (heart rate, respiratory rate, etc.) before and after the operation. Expert medical professionals monitor the patients and rate the likelihood of each patient having a complication. Learn a model that maps patient vital signs to the risk ratings. Use this model to detect potential high-risk patients for a particular surgical procedure.

14 Unsupervised learning: clustering
Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that data points in the same cluster are more similar to one another and data points in separate clusters are less similar to one another. Similarity measures: Euclidean distance if the attributes are continuous; other problem-specific measures.

15 Clustering: example
High-risk patient detection. Goal: predict whether a patient will suffer a major complication after a surgical procedure. Approach: use patients' vital signs (heart rate, respiratory rate, etc.) before and after the operation. Find patients whose symptoms are dissimilar from those of most other patients.
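
One simple way to look for patients who are dissimilar from most others is to rank them by Euclidean distance from the centroid of all patients. The sketch below assumes that approach; the readings and the function names are hypothetical, and the slide does not say which method the lecture actually uses.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def most_dissimilar(points, k=1):
    """Indices of the k points farthest from the centroid, farthest first."""
    c = centroid(points)
    ranked = sorted(range(len(points)),
                    key=lambda i: euclidean(points[i], c),
                    reverse=True)
    return ranked[:k]

# Hypothetical (heart rate, respiratory rate) readings; the last one is unusual.
patients = [(72, 16), (75, 15), (70, 17), (74, 16), (120, 30)]
flagged = most_dissimilar(patients, k=1)
```

In practice the attributes would likely need to be put on a common scale first, since heart rate and respiratory rate have different ranges.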

16 Practice
Judge what kind of problem each of the following scenarios is.
A student collected a couple of online documents about movies and tries to identify which movie each document discusses.
In a cognitive test, a person is asked whether he or she can recognize the color "red" on a screen. The person presses a button when he or she sees red, and does nothing otherwise. An EEG recording is made during the test. A researcher wants to use the EEG recordings to predict whether the red color is recognized.
A researcher observed and recorded weather conditions (temperature, wind speed, snow, etc.) over the past month and wants to use the data to predict the temperature on the next day.

18 Review of probability and linear algebra

19 Basics of probability
An experiment (random variable) is a well-defined process with observable outcomes. The set of all outcomes of an experiment is called the sample space, S. An event E is any subset of outcomes from S. When all outcomes are equally likely, the probability of an event is P(E) = (number of outcomes in E) / (number of outcomes in S).

20 Probability theory

21 Probability theory
Joint probability; marginal probability; conditional probability.

22 Probability theory
Sum rule: the marginal probability of X equals the sum of the joint probability of X and Y over all values of Y, $p(X) = \sum_{Y} p(X, Y)$.
Product rule: the joint probability of X and Y equals the product of the conditional probability of Y given X and the probability of X, $p(X, Y) = p(Y \mid X)\, p(X)$.

23 Illustration
[Figure: panels showing p(X,Y) for Y=1 and Y=2, together with p(X), p(Y), and p(X|Y=1)]

24 The rules of probability
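Sum rule: $p(X) = \sum_{Y} p(X, Y)$
Product rule: $p(X, Y) = p(Y \mid X)\, p(X)$
Bayes' rule: $p(Y \mid X) = \dfrac{p(X \mid Y)\, p(Y)}{p(X)}$, where $p(X) = \sum_{Y} p(X \mid Y)\, p(Y)$
posterior $\propto$ likelihood × prior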

25 Application of probability rules
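Assume $p(Y=r) = 40\%$ and $p(Y=b) = 60\%$, with
$p(X=a \mid Y=r) = 2/8 = 25\%$, $p(X=o \mid Y=r) = 6/8 = 75\%$,
$p(X=a \mid Y=b) = 3/4 = 75\%$, $p(X=o \mid Y=b) = 1/4 = 25\%$.
Sum and product rules: $p(X=a) = p(X=a, Y=r) + p(X=a, Y=b) = p(X=a \mid Y=r)\, p(Y=r) + p(X=a \mid Y=b)\, p(Y=b) = 0.25 \times 0.4 + 0.75 \times 0.6 = 11/20$, so $p(X=o) = 1 - 11/20 = 9/20$.
Bayes' rule: $p(Y=r \mid X=o) = \dfrac{p(Y=r, X=o)}{p(X=o)} = \dfrac{p(X=o \mid Y=r)\, p(Y=r)}{p(X=o)} = \dfrac{0.75 \times 0.4}{9/20} = 2/3$.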
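
The numbers on this slide can be checked with exact fractions. The sketch below is only a verification aid; the dictionary layout and the function names `marginal` and `posterior` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

# Prior p(Y) and conditional p(X | Y), copied from the slide.
prior = {"r": Fraction(4, 10), "b": Fraction(6, 10)}
likelihood = {
    "r": {"a": Fraction(2, 8), "o": Fraction(6, 8)},
    "b": {"a": Fraction(3, 4), "o": Fraction(1, 4)},
}

def marginal(x):
    """Sum and product rules: p(X=x) = sum over y of p(X=x | Y=y) p(Y=y)."""
    return sum(likelihood[y][x] * prior[y] for y in prior)

def posterior(y, x):
    """Bayes' rule: p(Y=y | X=x) = p(X=x | Y=y) p(Y=y) / p(X=x)."""
    return likelihood[y][x] * prior[y] / marginal(x)

p_a = marginal("a")                # 11/20
p_o = marginal("o")                # 9/20
p_r_given_o = posterior("r", "o")  # 2/3
```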

27 Mean and variance
The mean of a random variable X is the average value X takes. The variance of X is a measure of how dispersed the values that X takes are. The standard deviation is simply the square root of the variance.

28 Simple example
X takes values in {1, 2} with P(X=1) = 0.8 and P(X=2) = 0.2.
Mean: $0.8 \times 1 + 0.2 \times 2 = 1.2$
Variance: $0.8 \times (1 - 1.2)^2 + 0.2 \times (2 - 1.2)^2 = 0.16$
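
The same calculation can be written as a short Python check. This is a minimal sketch for a discrete distribution stored as a value-to-probability dictionary; the function names are illustrative.

```python
import math

def mean(dist):
    """Mean of a discrete random variable given as {value: probability}."""
    return sum(p * x for x, p in dist.items())

def variance(dist):
    """Probability-weighted squared deviation from the mean."""
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

dist = {1: 0.8, 2: 0.2}          # the distribution from the slide
mu = mean(dist)                  # about 1.2
var = variance(dist)             # about 0.16
std = math.sqrt(var)             # standard deviation, about 0.4
```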

29 Gaussian distribution

30 Gaussian distribution
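
The slide body did not survive in this transcript. As a reminder, the standard univariate Gaussian density with mean mu and standard deviation sigma is exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). The sketch below evaluates that standard formula; it is not reconstructed from the slide, and the function name is an illustrative choice.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and standard deviation sigma > 0."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Peak of the standard normal density, equal to 1 / sqrt(2 * pi).
peak = gaussian_pdf(0.0, 0.0, 1.0)

# A crude Riemann sum over [-8, 8] should be close to 1, since a density integrates to 1.
step = 0.001
area = sum(gaussian_pdf(-8.0 + i * step, 0.0, 1.0) for i in range(16001)) * step
```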

31 Multivariate Gaussian
[Figure: plot with axes x and y]

32 Basics of linear algebra

33 Matrix multiplication
The product of two matrices: $C = AB$. Special cases: vector-vector product, matrix-vector product.
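
A plain-Python sketch of the matrix product, using the standard definition in which entry (i, j) of C is the sum over k of A[i][k] * B[k][j]. Matrices are assumed to be lists of rows, and the function name `matmul` is illustrative. The two special cases fall out by treating vectors as single-row or single-column matrices.

```python
def matmul(A, B):
    """Product C = AB of an m-by-n matrix A and an n-by-p matrix B (lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must agree")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)                                # matrix-matrix product

Av = matmul(A, [[1], [1]])                      # matrix-vector product (column vector)
inner = matmul([[1, 2, 3]], [[4], [5], [6]])    # vector-vector (inner) product, 1-by-1 result
```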

34 Matrix multiplication

35 Rules of matrix multiplication

36 Vector norms
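
The slide body is missing from this transcript, so which norms the lecture covers is not known. As a hedged illustration, the sketch below computes the commonly used p-norms (1-norm, Euclidean 2-norm, and infinity-norm) from their standard definitions; the function name `p_norm` is an illustrative choice.

```python
import math

def p_norm(v, p):
    """p-norm of a vector: (sum of |x|^p)^(1/p), or max |x| when p is infinity."""
    if p == float("inf"):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

v = [3, -4]
l1 = p_norm(v, 1)               # |3| + |-4|
l2 = p_norm(v, 2)               # sqrt(3^2 + 4^2)
linf = p_norm(v, float("inf"))  # max(|3|, |-4|)
```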

37 Matrix norms and trace

38 A bit more on matrix

39 Orthogonal matrix

40 Square matrix – eigenvalue, eigenvector
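An eigenvector $v$ of a square matrix $A$ is a nonzero vector satisfying $Av = \lambda v$, where the scalar $\lambda$ is the corresponding eigenvalue.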

41 Symmetric matrix eigen-decomposition of A
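
For a real symmetric matrix, the standard eigen-decomposition is A = Q Λ Qᵀ, with Q orthogonal (columns are unit eigenvectors) and Λ diagonal (the eigenvalues). The sketch below works this out in closed form for the 2-by-2 case only; the function names are illustrative, and a general-purpose routine from a numerical library would be the usual choice for larger matrices.

```python
import math

def eig_symmetric_2x2(a, b, d):
    """Eigenvalues and orthogonal Q for the symmetric matrix [[a, b], [b, d]]."""
    center = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    eigenvalues = [center + radius, center - radius]
    if b == 0:
        # Already diagonal: the standard basis vectors are eigenvectors.
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vectors = []
        for lam in eigenvalues:
            x, y = b, lam - a          # (b, lam - a) solves (A - lam I) v = 0
            norm = math.hypot(x, y)
            vectors.append((x / norm, y / norm))
    # Columns of Q are the unit eigenvectors.
    Q = [[vectors[0][0], vectors[1][0]],
         [vectors[0][1], vectors[1][1]]]
    return eigenvalues, Q

def reconstruct(eigenvalues, Q):
    """Rebuild A as Q diag(eigenvalues) Q^T."""
    return [[sum(Q[i][k] * eigenvalues[k] * Q[j][k] for k in range(2))
             for j in range(2)]
            for i in range(2)]

eigenvalues, Q = eig_symmetric_2x2(2.0, 1.0, 2.0)   # A = [[2, 1], [1, 2]]
A_rebuilt = reconstruct(eigenvalues, Q)
```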

42 Singular value decomposition
$A = U \Sigma V^{\top}$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal.

43 Supervised learning – practical issues
Underfitting; overfitting. Before introducing these important concepts, let us study a simple regression algorithm: linear regression.

44 Questions?

