Assignments CS 434-534 fall 2015. Assignment 1 due 9-18-15 Generate the in silico data set of 2sin(1.5x)+ N (0,1) with 100 random values of x between.

1 Assignments CS 434-534 fall 2015

2 Assignment 1 due 9-18-15
Generate the in silico data set of 2sin(1.5x) + N(0,1) with 100 random values of x between 0 and 5.
Use 25 samples for training and 75 for validation.
Fit polynomials of degree 1 to 5 to the training set.
Calculate E_val at each degree.
Plot your result as shown in the previous slide to find the “elbow” in E_val and the best complexity for data mining.
Use the full data set to find the optimum polynomial of best complexity.
Show this result as a plot of data and fit on the same set of axes.
Report the minimum sum of squared residuals and the coefficient of determination.
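The data generation and validation steps of Assignment 1 might be sketched as follows. This is only an illustration under stated assumptions: E_val is taken to be the mean squared error on the validation set (the slide does not define it), the fit uses plain normal equations, and the function names `fit_polynomial`, `mean_squared_error`, and `validation_curve` are hypothetical. Plotting is left out.

```python
import math
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) c = A^T y."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y], built from sums of powers of x.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / m[i][i]
    return coeffs  # coeffs[k] multiplies x**k

def mean_squared_error(coeffs, xs, ys):
    """Assumed definition of the error measure: mean squared residual."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(c * x ** k for k, c in enumerate(coeffs))
        total += (y - pred) ** 2
    return total / len(xs)

def validation_curve(seed=0):
    """Return {degree: E_val} for degrees 1 to 5 on one random draw of the data."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 5.0) for _ in range(100)]
    ys = [2.0 * math.sin(1.5 * x) + rng.gauss(0.0, 1.0) for x in xs]
    x_train, y_train = xs[:25], ys[:25]   # 25 training samples
    x_val, y_val = xs[25:], ys[25:]       # 75 validation samples
    return {d: mean_squared_error(fit_polynomial(x_train, y_train, d), x_val, y_val)
            for d in range(1, 6)}

for degree, e_val in validation_curve().items():
    print(degree, round(e_val, 3))
```

Because the x values are drawn at random, taking the first 25 as the training set is already a random split. For higher degrees or wider x ranges, a library least-squares routine would be numerically safer than the normal equations used here.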

3 Assignment 2: Due now
Suppose we want ε < 0.1 with 90% confidence (i.e. δ = 0.1).
We require [formula lost in extraction] using [formula lost in extraction].
We get [formula lost in extraction].
Use a non-linear root-finding code to solve this implicit relationship for N with d_VC = 3 and 6.
Hint:
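The formulas on this slide are not present in the transcript. The sketch below therefore rests on an assumption: that the implicit relationship is the standard VC sample-complexity bound N ≥ (8/ε²) ln(4((2N)^d_VC + 1)/δ), which follows from bounding the growth function by (2N)^d_VC + 1. If the slide used a different bound, only `vc_sample_bound` would need to change. The function names and the bisection bracket are hypothetical choices.

```python
import math

def vc_sample_bound(n, d_vc, epsilon, delta):
    """Right-hand side of the assumed bound: (8/eps^2) * ln(4 * ((2N)^d_vc + 1) / delta)."""
    growth = (2.0 * n) ** d_vc + 1.0  # polynomial bound on the growth function
    return 8.0 / epsilon ** 2 * math.log(4.0 * growth / delta)

def solve_sample_size(d_vc, epsilon=0.1, delta=0.1, lo=1.0, hi=1e7, tol=0.5):
    """Bisection on f(N) = N - bound(N); returns N where the bound holds with equality."""
    def f(n):
        return n - vc_sample_bound(n, d_vc, epsilon, delta)

    if f(lo) >= 0 or f(hi) <= 0:
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid   # bound not yet satisfied, root lies above mid
        else:
            hi = mid   # bound satisfied, root lies at or below mid
    return hi

for d_vc in (3, 6):
    print(d_vc, math.ceil(solve_sample_size(d_vc)))
```

Bisection is used here only because it needs no derivative and no external library; any bracketing root finder would serve the same purpose.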

4 Revised Assignment 3 due 10-2-15
Classify beer bottles for the 3 breweries with the most data.
A randomized, shortened dataset is on the website.
For better results, change label 6 to 3.
Fit and bin the results.
Calculate the confusion matrix and training accuracy.
Estimate CV-1 accuracy by the “leave one out” method.
Calculate [symbol lost in extraction] and confidence in [symbol lost in extraction].
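The confusion matrix, accuracy, and leave-one-out steps can be sketched generically. The code below is a minimal illustration, not the course's method: the fitting step is passed in as `train` and `predict` callables, and the nearest-class-mean classifier on one-dimensional values is a hypothetical stand-in for the fit-and-bin model the slide asks for. The demo values are made up.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction correctly classified: diagonal sum over total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def leave_one_out_accuracy(samples, labels, train, predict):
    """Hold out each example once, train on the rest, and score the held-out example."""
    hits = 0
    for i in range(len(samples)):
        model = train(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict(model, samples[i]) == labels[i]
    return hits / len(samples)

# Hypothetical stand-in model: assign each value to the class with the nearest mean.
def train_nearest_mean(samples, labels):
    sums = {}
    for x, c in zip(samples, labels):
        total, count = sums.get(c, (0.0, 0))
        sums[c] = (total + x, count + 1)
    return {c: total / count for c, (total, count) in sums.items()}

def predict_nearest_mean(model, x):
    return min(model, key=lambda c: abs(model[c] - x))

values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]   # made-up one-dimensional data
classes = [1, 1, 1, 2, 2, 2]
model = train_nearest_mean(values, classes)
predicted = [predict_nearest_mean(model, v) for v in values]
print(confusion_matrix(classes, predicted, [1, 2]))
print(leave_one_out_accuracy(values, classes, train_nearest_mean, predict_nearest_mean))
```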

5 Assignment 4: Due 10-16-15
Code the Logistic Regression Algorithm for a fixed learning rate.
Use the stopping criteria on slide 48.
For w(0), use random numbers uniformly distributed on [0,1].
Modify the csv file logit-data on the class web page as needed to obtain a training set of 298 samples.
For 10 different draws of w(0), find the optimum w and E_in.
For the best case (smallest E_in), use risk scores to construct a 2x2 confusion matrix.
Use the risk scores of the best case to calculate the probability of a heart attack for each example in the training set.
Plot these probabilities using different symbols for positive and negative examples.
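A fixed-learning-rate logistic regression loop might look like the sketch below. Several things are assumptions rather than facts from the slides: labels are coded as -1 and +1, E_in is the cross-entropy error (the mean of ln(1 + exp(-y w·x))), and the stopping rule (small gradient norm or an iteration cap) stands in for the criteria on slide 48, which are not part of this transcript. The tiny data set is made up and is not logit-data.

```python
import math
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def in_sample_error(w, X, y):
    """Assumed E_in: mean of ln(1 + exp(-y_n * w.x_n)) with labels in {-1, +1}."""
    total = 0.0
    for xn, yn in zip(X, y):
        z = -yn * dot(w, xn)
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # stable softplus
    return total / len(X)

def gradient(w, X, y):
    """Gradient of E_in: mean of -y_n * x_n / (1 + exp(y_n * w.x_n))."""
    g = [0.0] * len(w)
    for xn, yn in zip(X, y):
        coeff = -yn * sigmoid(-yn * dot(w, xn))
        for j, xj in enumerate(xn):
            g[j] += coeff * xj
    return [gj / len(X) for gj in g]

def train(X, y, eta=0.1, max_iters=2000, tol=1e-4, seed=0):
    rng = random.Random(seed)
    w = [rng.random() for _ in X[0]]        # w(0) uniform on [0, 1)
    for _ in range(max_iters):
        g = gradient(w, X, y)
        if math.sqrt(dot(g, g)) < tol:      # assumed stopping rule, not slide 48
            break
        w = [wj - eta * gj for wj, gj in zip(w, g)]
    return w

# Made-up data: first coordinate is the constant 1 for the bias weight.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [-1, -1, 1, 1]
w = train(X, y)
risk_scores = [dot(w, xn) for xn in X]
probabilities = [sigmoid(s) for s in risk_scores]  # P(y = +1 | x)
print(in_sample_error(w, X, y), probabilities)
```

Calling `train` with ten different `seed` values and keeping the weights with the smallest `in_sample_error` mirrors the "10 different draws of w(0)" step.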

6 Revised assignment 5 due 10-30-15
Find the eigenvalues and eigenvectors of the covariance matrix for the data set randomized shortened glassdata.csv.
Plot the PoV.
How many eigenvalues are required to capture more than 90% of the variance?
Transform the attribute data by the eigenvectors of the 3 largest eigenvalues.
Do a scatter plot of pc1 vs pc2 with data labels.
Use a validation set of 100 examples to find the best quadratic extension of a linear model by successively including z_1^2, z_2^2, z_3^2, z_1 z_2, z_1 z_3, and z_2 z_3.
Plot E_val vs number of added terms and identify the elbow.
Use all data to compare the best quadratic extension with the linear model in attribute space (confusion matrix and fraction correctly classified).
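Two small pieces of this assignment can be illustrated without the data file. The sketch assumes that PoV means the cumulative proportion of variance explained by the k largest eigenvalues, and that the eigenvalues have already been computed by some eigensolver. The eigenvalues in the demo are made up, and the helper names are hypothetical.

```python
def proportion_of_variance(eigenvalues):
    """Cumulative fraction of total variance captured by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    pov, running = [], 0.0
    for value in ordered:
        running += value
        pov.append(running / total)
    return pov

def components_needed(eigenvalues, threshold=0.90):
    """Smallest k whose cumulative proportion strictly exceeds the threshold."""
    for k, fraction in enumerate(proportion_of_variance(eigenvalues), start=1):
        if fraction > threshold:
            return k
    return len(eigenvalues)

def quadratic_terms(z):
    """Candidate quadratic features of (z1, z2, z3), in the order listed on the slide."""
    z1, z2, z3 = z
    return [z1 * z1, z2 * z2, z3 * z3, z1 * z2, z1 * z3, z2 * z3]

demo_eigenvalues = [5.0, 3.0, 1.5, 0.5]   # made-up values, not from glassdata.csv
print(proportion_of_variance(demo_eigenvalues))
print(components_needed(demo_eigenvalues))
```

Adding the first m entries of `quadratic_terms(z)` to the linear features, for m from 1 to 6, gives the successive model extensions whose E_val is plotted against the number of added terms.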

7 Assignment 6 due 11-13-15
Use the dataset randomized shortened glassdata.csv to develop a classifier for beer-bottle glass by ANN non-linear regression.
Keep the class labels as 1, 2, and 6.
With a validation set of 100 examples and a training set of 74 examples, select the best number of hidden nodes in a single hidden layer and the best number of epochs for weight refinement.
Use all the data to optimize weights at the selected structure and training time.
Calculate the confusion matrix and accuracy of prediction.
Use 10-fold cross validation to estimate the accuracy of a test set.
MATLAB code for calculating confusion matrices is on the class web page.
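The 10-fold cross-validation step can be sketched independently of the network itself. In this minimal illustration the model is supplied through `train` and `predict` callables, and a majority-class predictor is used as a hypothetical placeholder for the ANN. The count of 174 examples is inferred from the 100 validation plus 74 training examples on the slide, and the labels in the demo are made up.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices and deal them into k folds of nearly equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_val_accuracy(samples, labels, train, predict, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, and pool the results."""
    correct = 0
    for fold in k_fold_indices(len(samples), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(samples) if i not in held_out]
        train_y = [c for i, c in enumerate(labels) if i not in held_out]
        model = train(train_x, train_y)
        correct += sum(predict(model, samples[i]) == labels[i] for i in fold)
    return correct / len(samples)

# Hypothetical placeholder model: always predict the most common training label.
def train_majority(samples, labels):
    return max(set(labels), key=labels.count)

def predict_majority(model, sample):
    return model

demo_samples = list(range(174))            # made-up stand-in for attribute rows
demo_labels = [1] * 120 + [2] * 54         # made-up labels
print([len(fold) for fold in k_fold_indices(174)])
print(cross_val_accuracy(demo_samples, demo_labels, train_majority, predict_majority))
```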

