
COMP 4332 Tutorial 1 Feb 16 WANG YUE Tutorial Overview & Learning Python.


1 COMP 4332 Tutorial 1, Feb 16. WANG YUE (ywangby@connect.ust.hk). Tutorial Overview & Learning Python

2 Project-oriented tutorials. Projects and assignments count for 80% of your grade. You will write code in several languages and tools. More importantly, you will run experiments! This is very different from COMP 4331: light on concepts and math, heavy on hands-on work. COMP 4332 = COMP 4331 + COMP 4331

3 A data mining project requires... 1. Exploring and preprocessing the data. 2. Trying algorithms (SVM, logistic regression, decision trees, dimensionality reduction, etc.) and varying the parameters of each one. This is labor-intensive and sometimes frustrating. 3. Summarizing findings, designing new methods, and going back to step 2; you will also repeatedly return to step 1 to re-process the data for different tools. This is the creative part!
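
The three steps form a loop: (re)process the data, run a method over a range of parameters, compare, and repeat. Below is a toy sketch of that loop in plain Python, assuming a small hypothetical two-feature dataset and a hand-written k-nearest-neighbour classifier standing in for a real tool such as libsvm or Weka. Every function name here is made up for illustration.

```python
from collections import Counter


def min_max_scale(rows):
    """Step 1 variant: rescale every feature column to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in rows]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(x, query)), y)
                   for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(xs, ys, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        correct += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return correct / len(xs)


def run_experiments(xs, ys, preprocessors, ks):
    """Step 2: try every (preprocessing, k) pair and record the accuracy."""
    results = []
    for name in sorted(preprocessors):
        data = preprocessors[name](xs)        # step 1: (re)process the data
        for k in ks:                          # step 2: vary a parameter
            results.append((name, k, loo_accuracy(data, ys, k)))
    return results


# Hypothetical data: feature 0 is large-scale noise, feature 1 is informative.
xs = [[100, 0.1], [300, 0.2], [500, 0.15], [200, 0.9], [400, 0.8], [600, 0.85]]
ys = [0, 0, 0, 1, 1, 1]
preprocessors = {"raw": lambda rows: rows, "min-max": min_max_scale}

# Step 3: summarize the results, then decide what to try next.
for name, k, acc in run_experiments(xs, ys, preprocessors, ks=[1, 3]):
    print("{:<8} k={}  accuracy={:.2f}".format(name, k, acc))
```

On this made-up data, the unscaled features give 0% leave-one-out accuracy because the noisy large-scale feature dominates the distance. The min-max scaled features give 100%. That is the kind of finding step 3 would record before going back to step 1.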

4 1. Explore the data (look at it). Visualization and summaries: for 1D data, the mean, variance, median, and skewness, density estimation (pdf), the cdf, outliers, etc.; for 2D data, scatter plots, QQ-plots, correlation scores, etc.; for high-dimensional data, dimensionality reduction followed by a 2D or 3D plot. Store the data and extract the parts you need: either organized, with SQL-like queries, or quick and dirty, with a script for each operation.
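
Several of the 1D and 2D summaries listed above take only a few lines of plain Python. This is a minimal sketch using the standard library's `statistics` module (Python 3.4+). The skewness computed here is the population (biased) coefficient g1 = m3 / m2^1.5, and other tools may report a bias-corrected version instead. The z-score outlier rule, its threshold of 2 in the example, and the sample data are illustrative choices, not taken from the slides.

```python
import math
import statistics


def summarize_1d(xs):
    """Mean, population variance, median and population skewness of xs."""
    mean = statistics.mean(xs)
    var = statistics.pvariance(xs)
    m3 = sum((x - mean) ** 3 for x in xs) / len(xs)   # third central moment
    skew = m3 / var ** 1.5 if var > 0 else 0.0         # g1 = m3 / m2^1.5
    return {"mean": mean, "variance": var,
            "median": statistics.median(xs), "skewness": skew}


def empirical_cdf(xs, t):
    """Fraction of observations less than or equal to t."""
    return sum(1 for x in xs if x <= t) / len(xs)


def zscore_outliers(xs, threshold=3.0):
    """Values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(xs)
    std = statistics.pstdev(xs)
    if std == 0:
        return []
    return [x for x in xs if abs(x - mean) / std > threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


data = [2.0, 3.0, 3.0, 4.0, 5.0, 30.0]   # made-up sample with one large value
print(summarize_1d(data))
print(empirical_cdf(data, 4.0))              # 4 of the 6 values are <= 4.0
print(zscore_outliers(data, threshold=2.0))  # flags 30.0
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear, so r = 1
```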

5 2. Run experiments using tools. Most of the time, tools are available (Weka, libsvm, etc.). Sometimes you need to implement a variant of an existing algorithm, such as a different decision tree or a classifier that handles unbalanced data. Run the methods, vary their parameters, and plot the results and trends. Good news :) Numerical code is generally hard to write correctly (hard to DEBUG!), and you will do this in this course!
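
The warning about numerical code can be made concrete with an illustrative example that does not come from the slides. It compares two formulas for the population variance that are algebraically identical: the one-pass E[x^2] - E[x]^2 formula and Welford's running update. In floating point they give very different answers when the values share a large offset. `statistics.pvariance` from the standard library serves as the reference, and the data is made up.

```python
import statistics


def naive_variance(xs):
    """Population variance via E[x^2] - E[x]^2 (one pass, numerically fragile)."""
    n = len(xs)
    s = sum(xs)
    sq = sum(x * x for x in xs)
    return sq / n - (s / n) ** 2


def welford_variance(xs):
    """Population variance via Welford's one-pass update (numerically stable).

    Assumes xs is non-empty.
    """
    mean = 0.0
    m2 = 0.0                      # running sum of squared deviations
    for count, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / len(xs)


# Same spread as [4, 7, 13, 16] (variance 22.5), shifted by a large offset.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(statistics.pvariance(data))   # reference value
print(welford_variance(data))
print(naive_variance(data))         # wrong: catastrophic cancellation
```

Both functions look correct and agree on small, well-scaled inputs such as `[1.0, 2.0, 3.0, 4.0]`. This is exactly why such bugs are hard to catch without a test case that targets the numerically difficult regime.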

6 3. Summarize findings and design new methods. After each iteration of steps 1 and 2, you know more about the data and may have new ideas that send you back to steps 1 and 2. But before that, document your findings.

7 A cloud of tools... Data preprocessing: Python, Java/C++, SQL, Excel, text editors, etc. Visualization: Excel, Matlab, R, matplotlib. SVM: the libsvm, svmlight, and liblinear packages. Logistic regression: liblinear. Decision trees and tree ensembles: Weka, FEST. Matrix factorization: libfm, GraphLab.

8 Teaching all of them is impossible. You have to take the time to read the manuals of these tools, and sometimes their source code! Throughout this course, we will use Python to illustrate data preprocessing (mostly its string processing), algorithm implementation (numpy/scipy), automatically performing experiments, and simple plotting (matplotlib). Sometimes we use R’s plotting packages (core, ggplot2) when matplotlib does not meet the requirements.
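
Much of the string processing in preprocessing consists of turning raw text files into whatever format a tool expects. The sketch below converts comma-separated rows into libsvm's sparse text format, in which each line reads `<label> <index>:<value> ...` with feature indices that start at 1 and ascend. It assumes comma-separated input with a numeric label in one column. `csv_line_to_libsvm` is a hypothetical helper name.

```python
def csv_line_to_libsvm(line, label_col=-1, sep=","):
    """Convert one delimited text line into libsvm's sparse text format.

    Output: '<label> <index>:<value> ...' with 1-based, ascending feature
    indices. Zero-valued features are omitted (the format is sparse), and
    each value keeps its original text so no precision is lost.
    """
    fields = [f.strip() for f in line.strip().split(sep)]
    label = fields.pop(label_col)
    pairs = []
    for index, raw in enumerate(fields, start=1):
        if float(raw) != 0.0:          # also fails loudly on non-numeric text
            pairs.append("{}:{}".format(index, raw))
    return " ".join([label] + pairs)


# Hypothetical input: three numeric features, label in the last column.
rows = ["5.1, 0, 1.4, 1", "4.9, 3.0, 0, -1"]
for row in rows:
    print(csv_line_to_libsvm(row))
```

The two sample rows become `1 1:5.1 3:1.4` and `-1 1:4.9 2:3.0`. Reading a whole file is then just a loop over its lines.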

9 Why Python? It is easy to learn and easy to use, and a good tool for illustrating the three steps of a data mining project. It is a concise and powerful language. It is a glue language that easily integrates components written in other languages. It is widely used in the IT industry (see the list of organizations using Python). We will use the latest Python version in this course (Python 3.4).
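
Here is a small illustration of the glue-language point. The standard `subprocess` module can run an external program and capture its output, which is also how experiments with command-line tools can be scripted. So that it runs anywhere, this sketch uses a second Python interpreter as a stand-in tool. A real run would call something like libsvm's `svm-train` binary instead (assumed to be installed and on the PATH; its options are not shown here). `run_tool` is a hypothetical helper name.

```python
import subprocess
import sys


def run_tool(args):
    """Run an external command and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    return subprocess.check_output(args, universal_newlines=True)


# Stand-in "tool": a second Python interpreter that prints a number.
output = run_tool([sys.executable, "-c", "print(6 * 7)"])
print("tool said:", output.strip())
```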

10 Set up a Python scientific environment. Anaconda Scientific Python Distribution: it includes over 195 of the most popular Python packages for science, math, engineering, and data analysis (numpy, scipy, sklearn, matplotlib). It is cross-platform, and there is no need to install scientific packages one by one. Its default IDE is weak. Recommended IDEs: Sublime Text (recommended), PyCharm (recommended), Eclipse + PyDev (cross-platform), or simply the Notepad++ editor with syntax highlighting (Windows only).
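
After installing Anaconda, it is worth confirming that the packages named above are visible to the interpreter you are actually running. `importlib.util.find_spec` (Python 3.4+) can do this because it locates a module without importing it. This is a small sketch, and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import sys

# Packages named on the slide; scikit-learn is imported as "sklearn".
PACKAGES = ["numpy", "scipy", "sklearn", "matplotlib"]


def check_environment(packages=PACKAGES):
    """Map each top-level package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None
            for name in packages}


print("Python", sys.version.split()[0], "at", sys.executable)
for name, found in sorted(check_environment().items()):
    print("{:<12} {}".format(name, "found" if found else "MISSING"))
```

If a package shows as MISSING even though Anaconda is installed, the usual cause is that a different Python interpreter (not Anaconda's) is first on the PATH.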

11 Learn Python. The official Python tutorial, written for experienced programmers: read it twice and try every code snippet in it. Code Like a Pythonista: Idiomatic Python. Python HOWTOs: sorting, logging, functional programming, etc. MIT 6.00 course material. Liang Huang’s Python Short Course. numpy examples and the scipy tutorial. The best place to ask a Python-related question is http://stackoverflow.com/. It is better to send your Python questions to Stack Overflow than to our mailing list.

12 Learn Python (books): A Byte of Python; Learning Python; Python Cookbook; Moving from Python 2 to Python 3.

13 Play with Python data structures. Basic types: bool, integer, float, complex. Tuple: (x, y, ...). List: [x, y, ...]. String: 'hello', "world". Dictionary: {x: a, y: b, ...}. Set: set([a, b, c, d]). Iterables give a unified view of these data structures: tuples, lists, dictionaries, sets, and strings are all iterable (tuples, lists, and strings are also sequences; dictionaries and sets are not).
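
Here is a quick tour of the structures listed above. Because they are all iterable, one `for` loop and the same built-ins (`len`, `sorted`) work on every one of them. Note that iterating over a dictionary yields its keys. The variable names and values are arbitrary examples.

```python
# Basic types
flag, count, ratio, z = True, 42, 0.5, 3 + 4j
print(type(flag).__name__, type(count).__name__, type(ratio).__name__,
      type(z).__name__, abs(z))       # abs of a complex number is its modulus

point = (3, 4)                       # tuple: fixed-size, immutable
scores = [90, 75, 88]                # list: mutable, ordered
greeting = 'hello' + " " + "world"   # string: single or double quotes
ages = {"alice": 30, "bob": 25}      # dictionary: key -> value
tags = set(["a", "b", "a"])          # set: duplicates are dropped

# All of these are iterable, so one loop and the same built-ins handle each.
for container in (point, scores, greeting, ages, tags):
    print(type(container).__name__, len(container), sorted(container))
```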

14 Learning by doing: 1. Go through basic Python data structures and their operations. 2. Show Python’s functions and control structures (if/else, for, while).
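
The example below gathers several of those operations and control structures in one place: a function with `if`/`elif`/`else`, a `while` loop, `for` loops, and common list and dictionary operations. The functions `describe` and `collatz_steps` are made-up examples, and `collatz_steps` assumes a positive integer input.

```python
def describe(n):
    """Return a word for the sign of n (if / elif / else)."""
    if n > 0:
        return "positive"
    elif n < 0:
        return "negative"
    else:
        return "zero"


def collatz_steps(n):
    """Count steps for n >= 1 to reach 1 under the Collatz rule (while loop)."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


# List operations: append, slicing, sorting, list comprehension
nums = [5, -2, 0, 8]
nums.append(3)
print(nums[1:3], sorted(nums), [x * x for x in nums])

# Dictionary operations: count words with a for loop, look up with a default
word_counts = {}
for word in "the cat and the hat".split():
    word_counts[word] = word_counts.get(word, 0) + 1
print(word_counts["the"], word_counts.get("dog", 0))

for n in nums:
    print(n, describe(n))
print(collatz_steps(6))   # 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
```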

