Feature Engineering Studio March 30, 2015. Iterative Feature Refinement.

Who here:
– Used the Excel Equation Solver?
– Did not use the Excel Equation Solver?

Excel Equation Solver Users
Sort yourselves by the name of the town you were born in (spelled in Roman letters)

Excel Equation Solver Users
Pick one feature
– What feature did you improve?
– What parameter did you adjust?
– What were the original and final values?
– How big an improvement did you obtain?
– Did this process change the meaning of the feature?

Everyone Else
Sort yourselves by the name of the town you were born in (spelled in Roman letters)

Everyone Else
Pick one feature
– What feature did you improve?
– What parameter did you adjust?
– What values did you try?
– How big an improvement did you obtain?
– Did this process change the meaning of the feature?

Comments? Questions? Thoughts?

Question: Is the Excel Equation Solver more likely than hand processes to change the meaning of the feature?

Question: Is it a good thing or a bad thing when your feature changes meaning due to refinement?

Feature Parameter Space
I need a volunteer whose final best feature was quite different from their original feature

One interesting exercise
I need a volunteer whose final best feature was quite different from their original feature
Please bring up your laptop or a flash drive with your data set

Making… a line graph
– X axis: parameter value
– Y axis: model goodness
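A line graph like this comes from a simple parameter sweep: recompute the feature at each candidate parameter value and score it. The sketch below is illustrative only. The feature (how many actions in a clip took less than `threshold` seconds), the toy data, and the goodness measure (absolute Pearson correlation with a 0/1 label) are all assumptions, not the course's prescribed choices; swap in your own feature and metric.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def fast_action_count(durations, threshold):
    """Hypothetical feature: number of actions that took less than `threshold` seconds."""
    return sum(1 for d in durations if d < threshold)

def sweep(clips, labels, thresholds):
    """Return (parameter value, model goodness) pairs, one per candidate threshold."""
    points = []
    for t in thresholds:
        feature = [fast_action_count(clip, t) for clip in clips]
        points.append((t, abs(pearson(feature, labels))))
    return points

# Made-up data: each clip is a list of action durations in seconds.
clips = [[1, 2, 9], [2, 3, 1], [8, 12, 10], [7, 1, 15], [2, 2, 2], [11, 9, 14]]
labels = [1, 1, 0, 0, 1, 0]  # made-up labels: 1 = behavior present in the clip

# Crude text version of the line graph: x = parameter value, bar length = goodness.
for t, goodness in sweep(clips, labels, [1, 2, 3, 5, 8]):
    print(f"threshold={t:>2}  |r|={goodness:.2f}  " + "#" * round(goodness * 20))
```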

Another volunteer?
Would anyone else like to look at their feature this way? Multiple volunteers are welcome.

What does it mean?

Questions? Comments? Thoughts?

EDM Workbench
Tool to address the bottlenecks in labeling data and in simple feature distillation
Currently allows learning scientists to
– Label previously collected data
– Collaborate with others in labeling data
– Distill additional features from log files

Log import
– Allows importation of CSV and DataShop text files
– Allows importation of batches of files

Batch importation

Feature distillation
Automatically distills 26 features based on the work of Baker et al. (2008) and others
Adding to these features requires modifying the EDM Workbench config file
– 21 operations are defined in the program
– Any new feature has to be defined in terms of a subset of these 21 operations

EDM Workbench config file
[Example excerpt defining the features timeSD and timelastnSD, using the columns Step Name, Duration, Row, Anon Student Id, and Problem Name, with the value 3 as a parameter]
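New features are built by composing primitive operations. As an illustration in Python (not the Workbench's config syntax, which is not reproduced here), the sketch below composes two hypothetical primitives into features named after the ones on the config slide. It assumes timeSD is an action's duration in standard deviations from the mean for the same Step Name, and timelastnSD is the sum of timeSD over the last n = 3 actions by the same student on the same problem. Both definitions are assumptions inferred from the feature names, not the tool's documented semantics.

```python
import statistics
from collections import defaultdict, deque

# Hypothetical primitive operations, standing in for the Workbench's built-in ones.
def zscore_by_group(rows, group_key, value_key, out_key):
    """Express value_key in SDs from the mean of its group (population SD)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_key]].append(r[value_key])
    stats = {g: (statistics.mean(v), statistics.pstdev(v)) for g, v in groups.items()}
    for r in rows:
        mean, sd = stats[r[group_key]]
        r[out_key] = (r[value_key] - mean) / sd if sd > 0 else 0.0

def rolling_sum_by_group(rows, group_keys, value_key, n, out_key):
    """Sum value_key over the last n rows (current row included) within each group."""
    windows = defaultdict(lambda: deque(maxlen=n))
    for r in sorted(rows, key=lambda row: row["Row"]):
        key = tuple(r[k] for k in group_keys)
        windows[key].append(r[value_key])
        r[out_key] = sum(windows[key])

log = [
    {"Row": 1, "Anon Student Id": "s1", "Problem Name": "p1", "Step Name": "a", "Duration": 2.0},
    {"Row": 2, "Anon Student Id": "s1", "Problem Name": "p1", "Step Name": "b", "Duration": 10.0},
    {"Row": 3, "Anon Student Id": "s2", "Problem Name": "p1", "Step Name": "a", "Duration": 6.0},
    {"Row": 4, "Anon Student Id": "s1", "Problem Name": "p1", "Step Name": "b", "Duration": 20.0},
    {"Row": 5, "Anon Student Id": "s1", "Problem Name": "p1", "Step Name": "c", "Duration": 7.0},
]

# timeSD first, then timelastnSD (n = 3) built on top of it.
zscore_by_group(log, "Step Name", "Duration", "timeSD")
rolling_sum_by_group(log, ["Anon Student Id", "Problem Name"], "timeSD", 3, "timelastnSD")
for r in log:
    print(r["Row"], r["timeSD"], r["timelastnSD"])
```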

Clip generation
Clip: a subset of student-tutor interactions
Defined by the user based on time intervals (Baker & de Carvalho, 2008), number of actions (Lee et al., 2011), or “begin” and “end” events (Sao Pedro et al., 2013)
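Each of the three clip definitions is a way of cutting a time-ordered action log into segments. The sketch below is illustrative, not the EDM Workbench's implementation; it assumes each action is a `(timestamp_in_seconds, action_name)` tuple, and the event names `start_trial` and `end_trial` are hypothetical.

```python
def clips_by_time(actions, seconds):
    """Start a new clip when an action is `seconds` or more after the clip's first action."""
    clips, current, start = [], [], None
    for t, name in actions:
        if start is None or t - start >= seconds:
            if current:
                clips.append(current)
            current, start = [], t
        current.append((t, name))
    if current:
        clips.append(current)
    return clips

def clips_by_count(actions, n):
    """Consecutive, non-overlapping clips of n actions (the last one may be shorter)."""
    return [actions[i:i + n] for i in range(0, len(actions), n)]

def clips_by_events(actions, begin, end):
    """A clip runs from a `begin` event through the next `end` event, inclusive."""
    clips, current = [], None
    for t, name in actions:
        if name == begin:
            current = []
        if current is not None:
            current.append((t, name))
            if name == end:
                clips.append(current)
                current = None
    return clips

log = [(0, "start_trial"), (5, "attempt"), (12, "hint"), (21, "attempt"),
       (25, "end_trial"), (40, "start_trial"), (44, "attempt"), (50, "end_trial")]
print(clips_by_time(log, 20))
print(clips_by_count(log, 3))
print(clips_by_events(log, "start_trial", "end_trial"))
```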

Clip generation

Sampling
Supports both stratified and random sampling
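Both kinds of sampling can be sketched with Python's standard `random` module. Stratifying by student is an assumption made for illustration (it keeps one very active student from dominating the sample); the slide does not say what the Workbench stratifies on. A fixed seed makes the draw reproducible.

```python
import random
from collections import defaultdict

def random_sample(clips, k, seed=0):
    """Simple random sample of up to k clips, without replacement."""
    rng = random.Random(seed)
    return rng.sample(clips, min(k, len(clips)))

def stratified_sample(clips, stratum_of, k_per_stratum, seed=0):
    """Draw up to k_per_stratum clips at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for clip in clips:
        strata[stratum_of(clip)].append(clip)
    sample = []
    for key in sorted(strata):  # sorted keys keep the draw order reproducible
        group = strata[key]
        sample.extend(rng.sample(group, min(k_per_stratum, len(group))))
    return sample

# Toy clips: student "a" has 8 clips, "b" has 2, "c" has 1.
clips = [{"student": s, "clip_id": i} for i, s in enumerate("aaaaaaaabbc")]
print(random_sample(clips, 4))
print(stratified_sample(clips, lambda c: c["student"], 2))
```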

Sampling

Labeling
Allows the user to specify
– Which features will be displayed
– Which labels to use
Displays text replays (Baker & de Carvalho, 2008) of clips together with labeling options
Coder selects from the labels
Work can be saved and resumed

Labeling

Adding features at the clip level
Once labeling is complete, clip-level features can be generated
Limited set of functions, e.g., maximum, minimum, average, standard deviation
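Aggregating an action-level feature up to the clip level with the four functions named above might look like this. The feature (action duration) and the dictionary layout are assumptions; the sketch uses the sample standard deviation, since the slide does not say whether the tool uses the sample or the population formula.

```python
import statistics

def clip_level_features(clip_values):
    """Summarize one clip's action-level values with max, min, average, and SD."""
    return {
        "max": max(clip_values),
        "min": min(clip_values),
        "average": statistics.mean(clip_values),
        # Sample SD needs at least two values; report 0.0 for single-action clips.
        "sd": statistics.stdev(clip_values) if len(clip_values) > 1 else 0.0,
    }

durations_by_clip = {"clip1": [2.0, 4.0, 9.0], "clip2": [5.0]}
for clip_id, durations in durations_by_clip.items():
    print(clip_id, clip_level_features(durations))
```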

Adding features at the clip level

Data export
Labeled data can be exported in CSV format
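If you want to produce the same kind of export from your own scripts, the standard `csv` module is enough. The column names and labels below are hypothetical; the EDM Workbench's actual export columns are not listed on the slide.

```python
import csv
import io

def export_labeled_clips(rows, fieldnames, out):
    """Write labeled clip rows (dicts) to a CSV stream, header line first."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

labeled = [
    {"clip_id": 1, "student": "s1", "avg_duration": 5.0, "label": "on-task"},
    {"clip_id": 2, "student": "s2", "avg_duration": 12.5, "label": "off-task"},
]
buffer = io.StringIO()  # use open("labels.csv", "w", newline="") to write a real file
export_labeled_clips(labeled, ["clip_id", "student", "avg_duration", "label"], buffer)
print(buffer.getvalue())
```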

Questions? Comments?

Google Refine (now OpenRefine)

Mostly just an Excel clone, abandoned in favor of the fully online Google Sheets
But some nice additional functionality

Google Refine (now OpenRefine)
Functionality to make it easy to regroup and transform data
– Find similar names
– Connect names
– Bin numerical data
– Mathematical transforms showing resultant graphs
– Text transforms and column creation
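The “find similar names” idea can be approximated with key collision: reduce each value to a normalized key and group values that share a key. The sketch below is a simplified re-implementation modeled on the “fingerprint” keying method described in OpenRefine's clustering documentation (trim, lowercase, strip punctuation and accents, then sort and deduplicate the words); it is not OpenRefine's code. A fixed-width binning helper covers the “bin numerical data” item. The names in the example are made up.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Simplified fingerprint key: ignores case, punctuation, accents, and word order."""
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))

def cluster_similar(values):
    """Group values whose keys collide; keep only groups with 2+ distinct spellings."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

def bin_numbers(values, width):
    """Assign each number to a fixed-width bin, labeled by the bin's lower edge."""
    return [width * (v // width) for v in values]

names = ["Smith, Jane", "Jane Smith", "jane smith ", "Lee, Sam", "Sam Lee", "Garcia"]
print(cluster_similar(names))
print(bin_numbers([3, 17, 22, 48], 10))
```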

Google Refine (now OpenRefine)
Functionality for finding anomalies/outliers
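In Refine, anomalies are mostly found interactively, by inspecting facets. For a scripted comparison, here is one common generic check, Tukey's fences (flag values more than 1.5 times the interquartile range beyond the quartiles). This is a swapped-in standard technique, not what Refine does, and the durations are made up. `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], in their original order."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up action durations in seconds; 120.0 and 0.0 look suspicious.
durations = [4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 120.0, 4.2, 0.0]
print(iqr_outliers(durations))
```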

Google Refine (now OpenRefine)
Functionality for automatically repeating the same process on a new data set
*Really* nice for cases where you complete a complex process and want to repeat it
– Replicates a really good logbook, which most data analysts don’t keep
– Now seen in other tools like IPython Notebook
– Still not in Excel, but Excel has been stagnant for years
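In Refine, this works through the operation history: the steps applied to one project can be extracted and applied to another. The underlying “record once, replay on new data” idea is sketched below in plain Python; the operation names and the recipe structure are hypothetical and are not Refine's history format.

```python
def trim(rows, column):
    """Strip surrounding whitespace from one column."""
    return [{**r, column: r[column].strip()} for r in rows]

def drop_blank(rows, column):
    """Remove rows whose column is empty."""
    return [r for r in rows if r[column] != ""]

def to_number(rows, column):
    """Convert one column from text to float."""
    return [{**r, column: float(r[column])} for r in rows]

# The "logbook": an ordered list of (operation, column) steps, recorded once.
RECIPE = [(trim, "duration"), (drop_blank, "duration"), (to_number, "duration")]

def replay(rows, recipe):
    """Apply every recorded step, in order, to a (possibly new) data set."""
    for operation, column in recipe:
        rows = operation(rows, column)
    return rows

march = [{"student": "s1", "duration": " 4.5"}, {"student": "s2", "duration": " "}]
april = [{"student": "s3", "duration": "7 "}]
print(replay(march, RECIPE))
print(replay(april, RECIPE))
```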

Google Refine (now OpenRefine)
Functionality for connecting your data set to web services to get additional relevant info

Google Refine (now OpenRefine)
Can load and export common but hard-to-work-with data types
– JSON and XML
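Much of the work in importing JSON is flattening nested records into table rows. A minimal sketch with the standard `json` module follows; the record layout and the dotted column-naming convention are assumptions for illustration, not how Refine names its columns. Lists are left as-is here.

```python
import json

def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

text = ('[{"student": {"id": "s1", "school": "A"}, "score": 0.8},'
        ' {"student": {"id": "s2", "school": "B"}, "score": 0.6}]')
rows = [flatten(r) for r in json.loads(text)]
print(rows)
```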

Google Refine (now OpenRefine)
Some videos you should watch later

Questions? Comments?

Upcoming Classes
4/1 Lab Session: Building Predictive Models
– Come to this if you want to learn more about the theory behind building predictive models: how to do it effectively and appropriately (beyond just the how)
– You don’t need to come to this if you’ve taken Core Methods or Big Data and Education
4/6 Brainstorming
– Read Kelley (2001)
– Do Assignment 7

Next week
Kelley, T. (2001). The Art of Innovation: Lessons in Creativity from IDEO, America’s Leading Design Firm.
A lot of reading (more than the rest of the semester put together)
– You can focus on the parts about brainstorming if you want, although the whole book is fun and interesting
– Heck, you can just skim the parts about brainstorming if you want
I assume you’ve all gotten yourselves a copy of the book?
– If not, e-books are available immediately from Amazon…

Assignment 7: Brainstorming
Write a 1-page essay (longer is also fine)
– I know, an essay
– I’ll be grading based on your thoughts, not grammar, writing style, writing ability, etc.
– Just get your thoughts down on a page
– It doesn’t even have to look like an essay. Bulleted lists are fine (although in that case, make it longer than a page)

Assignment 7: Brainstorming
The essay should be about
– Your past experience with brainstorming (if you’ve never brainstormed, think about any time you’ve come up with ideas with a group of friends or colleagues for a project)
– What went wrong with brainstorming you’ve done in the past?
– Do you think the ideas in this book about how to brainstorm are good in general? What’s good, specifically? What’s bad, specifically?

Questions? Comments?