Published by Stacey Darling. Modified over 2 years ago.

1
A Data Mining Course for Computer Science and Non-Computer Science Students
Jamil Saquer
Computer Science Department, Missouri State University, Springfield, MO

2
Outline
- Introduction
- Motivation
- Challenges
- Design of the Course
- Topics Covered
- Assignments
- Examination Format
- Conclusion

3
Introduction
- What is data mining (DM)?
  - The non-trivial process of identifying valid, novel, useful, and ultimately understandable patterns in large volumes of data.
- DM is an interdisciplinary topic
- Has many things in common with machine learning and pattern recognition

4
Motivation for the Course
- Introducing more electives
- Introducing graduate-level CS courses
- Informatics Program
- Of interest to faculty members and students from other departments
- Author's main area of research

5
Challenges in Designing the Course
- Diverse student population
  - CS vs. non-CS
  - undergrad vs. grad
- Solution
  - Informatics program in design stages
  - MNAS CS option is new
  - Therefore, emphasis on undergrad CS students

6
Accommodating other students
- Minimize prerequisites
  - CS 2 (or even CS 1)
  - Capable of using DM software
  - Scientific background/mentality
    - One from business, another from GGP
- For grad CS students:
  - Project requires more research
  - Tests could be a little different
- Emphasize understanding basic DM concepts and using software for mining data

7
Design of the Course
- Used the book by Dunham
- Book divided into 3 parts
- About 1 week spent on definitions, applications, motivations, challenges, …
- Core of the course spent on the core DM subjects: classification, clustering, mining association rules
- Last week for project presentations

8
Classification
- Assigning objects to classes
- Supervised learning
- Example: classify a military vehicle as a friendly or an enemy vehicle
- Methods covered include: decision trees, naïve Bayes, k-nearest neighbor, backpropagation
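
As an illustration of one of the methods listed, here is a minimal k-nearest neighbor sketch in plain Python. It is not taken from the course materials: the function name `knn_classify`, the two numeric features, and the toy vehicle readings are all hypothetical, and Euclidean distance with a simple majority vote is assumed.

```python
from collections import Counter
from math import dist


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # train is a list of (feature_tuple, label) pairs.
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two made-up numeric features per vehicle.
train = [
    ((1.0, 1.0), "friendly"),
    ((1.2, 0.8), "friendly"),
    ((0.9, 1.1), "friendly"),
    ((4.0, 4.2), "enemy"),
    ((4.1, 3.9), "enemy"),
    ((3.8, 4.0), "enemy"),
]

print(knn_classify(train, (1.1, 1.0)))  # friendly
print(knn_classify(train, (4.0, 4.0)))  # enemy
```

Because the class labels are known in advance and supplied with the training data, this is supervised learning in the sense of the slide.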

9
Clustering
- Grouping objects into different classes
- Unsupervised learning
- Example: cluster Weblog data to discover groups of similar access patterns
- Techniques covered include: link algorithms, nearest neighbor, k-means, PAM, BIRCH, DBSCAN, CURE, ROCK
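
To make one of the listed techniques concrete, the following is a minimal k-means sketch (Lloyd's iteration) in plain Python. It is an illustration under stated assumptions, not the course's implementation: the helper names `assign` and `kmeans`, the choice of initial centroids, and the toy points (imagined as two numeric features per Web session) are hypothetical.

```python
from math import dist


def assign(points, centroids):
    """Assignment step: attach each point to its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its old centroid.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical toy data: two well-separated groups of sessions.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [points[0], points[3]])
for center, members in zip(centroids, clusters):
    print(center, members)
```

No labels are supplied here; the groups emerge from the distances alone, which is what makes the task unsupervised. The result of k-means depends on the initial centroids, so real experiments usually try several starting points.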

10
Association Rules
- Finding patterns that occur together
- Example: diapers and beer are usually bought together
- Techniques covered: Apriori, sampling, partitioning, FP-growth
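
The Apriori idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the textbook's pseudocode: the function names `apriori` and `confidence`, the use of raw support counts instead of fractions, and the five shopping baskets are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for itemsets found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(itemset <= t for t in transactions)

    # Level 1 candidates: every single item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {c: count(c) for c in current}
        level = {c: n for c, n in level.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every k-item subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """confidence(A -> C) = count(A and C together) / count(A)."""
    a, c = frozenset(antecedent), frozenset(consequent)
    return frequent[a | c] / frequent[a]


# Hypothetical baskets, invented for illustration.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "bread"},
]
frequent = apriori(baskets, min_count=2)
print(frequent[frozenset({"diapers", "beer"})])        # 3
print(confidence(frequent, {"diapers"}, {"beer"}))     # 0.75
```

The prune step is what distinguishes Apriori from brute-force counting: the three-item candidate {diapers, beer, bread} is discarded without scanning the data, because its subset {beer, bread} is not frequent.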

11
Assignments
- Students need to learn how to mine data
- One assignment on each core DM topic
  - Apply two different algorithms to at least two data sets, one of which has to be relatively large
  - Can use any DM package (e.g., Weka)
- Students write a report
- Students learn how to run an experiment
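
The shape of such an experiment (two algorithms, two data sets, one accuracy figure per pairing) can be sketched in plain Python. The course itself used a DM package such as Weka, so everything here is a hypothetical stand-in: the two classifiers, the tiny data sets, and the choice of leave-one-out evaluation are assumptions made to keep the example self-contained.

```python
from collections import Counter
from math import dist


def majority_class(train, query):
    """Baseline: always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def one_nearest_neighbor(train, query):
    """Predict the label of the single closest training point."""
    return min(train, key=lambda item: dist(item[0], query))[1]


def leave_one_out_accuracy(classify, data):
    """Hold out each example in turn, train on the rest, and score the prediction."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += classify(rest, features) == label
    return correct / len(data)


# Hypothetical toy data sets; a real assignment would use much larger ones.
datasets = {
    "separated": [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
                  ((1.0, 1.0), "a"), ((9.0, 9.0), "b"), ((9.0, 8.0), "b")],
    "interleaved": [((0.0,), "a"), ((1.0,), "b"), ((2.0,), "a"),
                    ((3.0,), "b"), ((4.0,), "a"), ((4.5,), "a")],
}
algorithms = {"majority": majority_class, "1-NN": one_nearest_neighbor}

# One accuracy figure per (data set, algorithm) pairing.
results = {
    (data_name, algo_name): leave_one_out_accuracy(algo, data)
    for data_name, data in datasets.items()
    for algo_name, algo in algorithms.items()
}
for (data_name, algo_name), acc in results.items():
    print(f"{data_name:12s} {algo_name:9s} accuracy = {acc:.2f}")
```

On these made-up data, 1-NN beats the baseline on the separated set and loses to it on the interleaved one, which is the kind of observation a student report would be expected to explain.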

12
Term Project
- Group projects
- Either provide a non-trivial implementation of a DM algorithm
- Or learn about a DM topic not discussed in class
- Graduate students required to read at least three research papers and to write a report
- All students present their project in class

13
Examination Format
- Open book
- Two types of questions
- First type requires basic knowledge of the material
  - Definitions, T/F, short answers
- Second type requires applying certain algorithms to small data sets

14
Conclusion
- DM is an interesting course for CS and non-CS students
- DM can be taught to non-CS students
- A DM course can be taught to students with minimal CS background

15
Questions
