DATA SCIENCE MIS0855 | Spring 2016 Data Cleansing David Schuff

Discuss (5 minutes): Have you fallen victim to any of Taber’s “stupid data corruption tricks”? From the readings, what are the best tips for cleaning data?

Cleaning Data: Consider this Excel spreadsheet of sales in Pennsylvania, New Jersey, and Delaware for the years 2009 through […]. Identify two problems with this data set.

And the problems show up during analysis… How do you find the “errors” and fix them?
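The slides do not say which problems the spreadsheet contains. As one illustration only, a common problem that surfaces during analysis is the same category spelled several ways, so that a total by state splits "PA" and "Pennsylvania" into separate groups. The sketch below assumes that kind of problem; the sample labels, the `CANONICAL` alias table, and the `normalize_state` helper are all hypothetical and not taken from the course data.

```python
from collections import Counter

# Hypothetical alias table: cleaned-up key -> canonical state name.
CANONICAL = {
    "pennsylvania": "Pennsylvania",
    "pa": "Pennsylvania",
    "new jersey": "New Jersey",
    "nj": "New Jersey",
    "delaware": "Delaware",
    "de": "Delaware",
}


def normalize_state(label):
    """Return the canonical state name, or None if the label is unrecognized."""
    key = label.strip().casefold().replace(".", "")
    return CANONICAL.get(key)


# Hypothetical "State" column as it might appear in a messy spreadsheet.
raw = ["Pennsylvania", "PA", "New Jersey", " pennsylvania", "Delaware", "N.J.", "Delawear"]

# Finding the problem: counting distinct labels shows far more than three states.
labels_before = Counter(raw)

# Fixing what can be fixed automatically, and listing what needs a human decision.
cleaned = [normalize_state(label) for label in raw]
unrecognized = [label for label, fixed in zip(raw, cleaned) if fixed is None]

print(len(labels_before), "distinct labels before cleaning")
print("needs manual review:", unrecognized)
```

Labels that cannot be matched are reported rather than guessed at, which leaves the decision about the right value to a person.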

The problem of outliers: Do you correct this by removing the data point, using the average of the other data points, or guessing at the right value? And is this an error or just an anomaly?
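Two of the options above, removing the point and substituting the average of the other points, can be sketched in a few lines. This is a minimal illustration, not the method used in the course: the sales figures are made up, and the rule for flagging an outlier (values beyond 1.5 times the interquartile range from the quartiles) is an assumption chosen for the example.

```python
from statistics import mean, quantiles


def outlier_indices(values, k=1.5):
    """Return positions of values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def drop_outliers(values, flagged):
    """Option 1: remove the flagged data points."""
    return [v for i, v in enumerate(values) if i not in flagged]


def replace_with_mean_of_others(values, flagged):
    """Option 2: replace each flagged point with the average of the unflagged points."""
    fill = mean(drop_outliers(values, flagged))
    return [fill if i in flagged else v for i, v in enumerate(values)]


# Hypothetical monthly sales; 1310 may be a typo for 131 or a genuine spike.
sales = [120, 135, 128, 1310, 131, 126, 140]

flagged = outlier_indices(sales)
removed = drop_outliers(sales, flagged)
averaged = replace_with_mean_of_others(sales, flagged)

print("flagged positions:", flagged)
print("after removal:", removed)
print("after averaging:", averaged)
```

The code only flags the value. Whether it is an error or a real anomaly still has to be decided by checking it against the original source.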