Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 16, 2012.

Slides:



Advertisements
Similar presentations
A small taste of inferential statistics
Advertisements

Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.
A number of MATLAB statements that allow us to control the order in which statements are executed in a program. There are two broad categories of control.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 12, 2012.
Biologically Inspired Computing: Operators for Evolutionary Algorithms
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 April 11, 2012.
 Aim in building a phylogenetic tree is to use a knowledge of the characters of organisms to build a tree that reflects the relationships between them.
Algebra Problems… Solutions Algebra Problems… Solutions © 2007 Herbert I. Gross Set 22 By Herbert I. Gross and Richard A. Medeiros next.
Planning under Uncertainty
Bioinformatics Finding signals and motifs in DNA and proteins Expectation Maximization Algorithm MEME The Gibbs sampler Lecture 10.
Chapter 2: Algorithm Discovery and Design
Chapter 2: Algorithm Discovery and Design
Chapter 2: Algorithm Discovery and Design
© 2006 Pearson Addison-Wesley. All rights reserved10 A-1 Chapter 10 Algorithm Efficiency and Sorting.
AP World History Multiple Choice Exam.
8/15/2015Slide 1 The only legitimate mathematical operation that we can use with a variable that we treat as categorical is to count the number of cases.
Test Taking Advice.
Measures of Central Tendency
Fundamentals of Python: From First Programs Through Data Structures
College Algebra Prerequisite Topics Review
Workforce Engagement Survey Accessing your survey results and focussing on key messages in the survey data.
Genetic Algorithm.
How to make a presentation (Oral and Poster) Dr. Bernard Chen Ph.D. University of Central Arkansas July 5 th Applied Research in Healthy Information.
Chapter An Introduction to Problem Solving 1 1 Copyright © 2013, 2010, and 2007, Pearson Education, Inc.
Slides are based on Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems.
Chapter 2: Algorithm Discovery and Design Invitation to Computer Science, C++ Version, Third Edition.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Randomized Turing Machines
بسم الله الرحمن الرحیم. Ehsan Khoddam Mohammadi M.J.Mahzoon Koosha K.Moogahi.
Copyright © Cengage Learning. All rights reserved. CHAPTER 5 Extending the Number System.
Multiplying Whole Numbers © Math As A Second Language All Rights Reserved next #5 Taking the Fear out of Math 9 × 9 81 Single Digit Multiplication.
1 Lesson 8: Basic Monte Carlo integration We begin the 2 nd phase of our course: Study of general mathematics of MC We begin the 2 nd phase of our course:
Unaddition (Subtraction)
Developing an Algorithm
HAWKES LEARNING SYSTEMS Students Matter. Success Counts. Copyright © 2013 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved. Section 2.1.
8 Strategies for the Multiple Choice Portion of the AP Literature and Composition Exam.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 February 22, 2012.
Verification & Validation. Batch processing In a batch processing system, documents such as sales orders are collected into batches of typically 50 documents.
UNIT 5.  The related activities of sorting, searching and merging are central to many computer applications.  Sorting and merging provide us with a.
Number Sense Disambiguation Stuart Moore Supervised by: Anna Korhonen (Computer Lab)‏ Sabine Buchholz (Toshiba CRL)‏
ALGORITHMS.
Artifact Inquiry Goal: Working in groups, students will identify the functions of previously unknown objects by their appearances and/or structures.
Robust Estimation With Sampling and Approximate Pre-Aggregation Author: Christopher Jermaine Presented by: Bill Eberle.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
ANOVA, Regression and Multiple Regression March
Learning Sequence Motifs Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 Mark Craven
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 6, 2013.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
A Brief Maximum Entropy Tutorial Presenter: Davidson Date: 2009/02/04 Original Author: Adam Berger, 1996/07/05
Special Topics in Educational Data Mining HUDK5199 Spring, 2013 April 24, 2012.
Warsaw Summer School 2015, OSU Study Abroad Program Normal Distribution.
VizTree Huyen Dao and Chris Ackermann. Introducing example
Scholastic Aptitude Test Developing Critical Reading Skills Doc Holley.
Banaras Hindu University. A Course on Software Reuse by Design Patterns and Frameworks.
An Introduction to Prime Factorization by Mrs. Gress
Slide 7.1 Saunders, Lewis and Thornhill, Research Methods for Business Students, 5 th Edition, © Mark Saunders, Philip Lewis and Adrian Thornhill 2009.
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 April 15, 2013.
Algebra 2 Complex Numbers Lesson 4-8 Part 1. Goals Goal To identify, graph, and perform operations with complex numbers. Rubric Level 1 – Know the goals.
 Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems n Introduction.
3.3 Dividing Polynomials.
Lesson 8: Basic Monte Carlo integration
Expanding and Factoring Algebraic Expressions
Simplifying Algebraic Expressions
Core Methods in Educational Data Mining
By Dr. Abdulrahman H. Altalhi
Learning Sequence Motif Models Using Expectation Maximization (EM)
STANDARD ALGORITHMS YEARS 4-7
Measuring Polygon Side Lengths
A10 Generating sequences
Biologically Inspired Computing: Operators for Evolutionary Algorithms
Presentation transcript:

Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 16, 2012

Today’s Class Motif Extraction

Today… We’re going to discuss a method that I’ve never used In fact, to the best of my knowledge it has only been used once in EDM It is a key method in bioinformatics, and I think it has a lot of potential for EDM

Today… You might ask: why are we discussing it? It is a key method in bioinformatics, and I think it has a lot of potential for EDM

Since… It‘s not a well-established method in EDM We’ll focus on a single paper more than usual And brainstorm together for how the method might be applicable more broadly in educational problems – As well as other relevant problems in the social sciences

Motif Short, recurring pattern in a sequence of categories occurring over time

Motif in Music Short, recurring pattern of notes in a musical composition

Motif in Music What’s the motif? KIY KIY How many times does the motif occur?

Motif in Music What’s the motif? KIY KIY How many times does the motif occur? – Depends on how you define it, right? – And that’s part of the challenge…

Motif in Language Short, recurring pattern of characters in a sequence of characters occurring over time

Motif in Genetics Short, recurring pattern of genes in a sequence of genes occurring over time Typically written as letters

Goal of Motif Extraction Discern a common pattern of characters in a large corpus of characters The characters may vary slightly from case to case

Can you find the motif?

UBSWWDFKLWPRHUC JBDPXBDVEJVMBKK VBDWNLROFVUBFFW OWIFTIENDOXJXIOB AUAAOOXZAABZSBT MUAWSNTVZXSFHMI LFQRKUTFRIENDOV LOMTPOQHJVYYMFJ LWGJMVPKYOZNMSA RUPMFOHPVSPPVPT BAZXVFTPQFQJVBM HLPMOKUOXGRIENDO INUSUNSGDAAICAV XRZZWCDXOVZZJKQ VOVCROMCJTOLXYU HUVRYFREENDOBBGC AQJBVXJCAJLEMAU ONJORIFCGAUGIRN PJGCHBDQIWJJTMQ IQYQHKKBNBVDFPV JJLHWPZAYZIGGEH IGJZRMAAWJBESSS JXZFRIEMDOVZRBJY IRPWYIRJISLFVFF

How would you describe the motif? UBSWWDFKLWPRHUC JBDPXBDVEJVMBKK VBDWNLROFVUBFFW OWIFTIENDOXJXIOB AUAAOOXZAABZSBT MUAWSNTVZXSFHMI LFQRKUTFRIENDOV LOMTPOQHJVYYMFJ LWGJMVPKYOZNMSA RUPMFOHPVSPPVPT BAZXVFTPQFQJVBM HLPMOKUOXGRIENDO INUSUNSGDAAICAV XRZZWCDXOVZZJKQ VOVCROMCJTOLXYU HUVRYFREENDOBBGC AQJBVXJCAJLEMAU ONJORIFCGAUGIRN PJGCHBDQIWJJTMQ IQYQHKKBNBVDFPV JJLHWPZAYZIGGEH IGJZRMAAWJBESSS JXZFRIEMDOVZRBJY IRPWYIRJISLFVFF

Finding motifs Several algorithms

Finding motifs Variant on PROJECTION algorithm (Tompa & Buhler, 2001) used in (Shanabrook et al., 2010) – Only example of motif extraction in educational data mining so far

Big idea For each character string C that could be a motif example (e.g. all character strings of desired length) – Create a set of projections, random variations of C that vary in one or more ways

Big idea For each pair of strings C1 and C2, see how many overlaps there are between their projection matrices Take the pair with the most matches and combine into a motif – Creating multi-example motif if 3+ get added together Repeat until goal number of motifs is found, or until new motif is below criterion goodness

Goodness Typically, likelihood is used

Motif in Education Short, recurring pattern of behaviors in a sequence of behaviors occurring over time Written as letters in Shanabrook et al. (2010)

Detail for education How do you segment student behavior? Could use student’s interaction on an entire problem, and compute letters across whole problem – Might make more sense in tutors with shorter problems (e.g. ASSISTments) Could use student’s interaction on an entire problem, and define letters differently for context within whole problem – Approach used by Shanabrook et al. (2010) Could use “sliding window” of N actions

Behaviors in Shanabrook et al. “hints (a, b, c) – Hints is a measure of the number of hints viewed for this problem. Although each problem has a maximum number of hints, the hint count does not have an upper bound because students can repeat hints and the count will increase at each repeated view. The three categories for hints are: (a) no hints, meaning that thestudent did not use the hint facility for that problem, (b) meaning the student used the hint facility, but was not given the solution, and (c) last hint solved, meaning that the student was given the solution to the problem by the last hint. As described above, this metric combines two values logged by the tutor: the count of hints seen, and an indicator that the final hint giving the answer was seen. The data could have been simply binned low, medium, high hints; however, this would have missed the significance of zero hints and using hints to reveal the problem solution.”

Behaviors in Shanabrook et al. “secFirst (d, e, f) – The seconds to first attempt is an important measure as it is during this time that the student is reading the problem and formulating their response. In previous research [6], five seconds was determined to be a threshold for this metric representing gaming: students who make a first attempt in less than five seconds are considered not working on-task. We divide secFirst into three bins: (d) less than 5 sec, (e) 5 to 30 sec, (f) greater than 30 sec. (d) represents students who are gaming the system, (e) represents a moderate time to the first attempt, (f) represents a long time to the first attempt. The cut at 30 seconds was chosen because it equalizes the distribution of bins (e and f), representing a division between a moderate and a long time to the first attempt.”

Behaviors in Shanabrook et al. “secOther (g, h, i, j, k) – This variable represents actions related to answering the problem after the first attempt was made. While the first attempt includes the problem reading and solution time, subsequent solution attempts could be much quicker and the student could still be making good effort. secOther is categorized in five bins: (g) skip, (h) solved on first, (i) 0 to 1.2 sec, (j) 1.2 to 2.9 sec, (k) greater than 2.9 sec. First, there are two categorical bins, skip and solve on first attempt. These are each determined from an indicator in the log data for that problem. Skipping a problem implies only that students never clicked on a correct answer; they could have worked on the problem and then given up, or immediately skipped to the next problem with only a quick look. Solved on first attempt indicates correctly solving the problem. If neither of the first two bins are indicated in the logs, then the secOther metric measures the mean time for all attempts after the first. The divisions of 1.2 sec and 2.9 sec for the latter three bins were obtained using the mean and one standard deviation above the mean for all tutor usage; (i) less than 1.2 seconds would indicate guessing, (j) would indicate normal attempts, and (k) would indicate a long time between attempts.”

Behaviors in Shanabrook et al. “numIncorrect – (o, p, q) - Each problem has four or five possible answer choices, that we divide into three groups: (o) zero incorrect attempts, indicates either solved on first attempt, skipped problem, or last hint solves problem (defined by the other metrics); (p) indicates choosing the correct answer in the second or third attempt, and (q) obtaining the answer by default in a four answer problem or possibly guessing when there is five answer problem.”

What other constructs could be used? What other kinds of constructs could be used for the atoms of motif analyses in educational analyses? – At this grain-size (e.g. specific actions)

What other constructs could be used? What other kinds of constructs could be used for the atoms of motif analyses in educational analyses? – At other grain-sizes?

Common Motifs {adgo, adip, adiq} {aeho, afho} {ceho} {adgo, aeho} {aeiq aeho aeho aekp aeho aeiq aeho aeip aeho aeip}

Interpretations (Shanabrook et al., 2010) {adgo, adip, adiq} – gaming the system {aeho, afho} – “This student is using the tutor appropriately, but not being challenged.” {ceho} – problem is too difficult {adgo, aeho} – student is skipping problems {aeiq aeho aeho aekp aeho aeiq aeho aeip aeho aeip} – working on-task

Do you agree with interpretations? {adgo, adip, adiq} – gaming the system {aeho, afho} – “This student is using the tutor appropriately, but not being challenged.” {ceho} – problem is too difficult {adgo, aeho} – student is skipping problems {aeiq aeho aeho aekp aeho aeiq aeho aeip aeho aeip} – working on-task

How can researchers form good interpretations?

What other applications? What other applications could motif extraction be used for in education?

Questions? Comments?

Asgn. 8 Questions? Comments?

Next Class Monday, March 19 3pm-5pm AK232 Association Rule Mining Readings Witten, I.H., Frank, E. (2005) Data Mining: Practical Machine Learning Tools and Techniques. Section 4.5. Merceron, A., Yacef, K. (2008) Interestingness Measures for Association Rules in Educational Data. Proceedings of the 1st International Conference on Educational Data Mining, Assignments Due: None

The End