AP CSP: Cleaning Data & Creating Summary Tables

1 AP CSP: Cleaning Data & Creating Summary Tables

2 Introduction: Last class we learned about the difficulties that come up when converting raw data into a visualization. We also learned how to create simple charts and graphs in Google Sheets. Choosing the appropriate chart is key to visualizing data properly. Last class we were given data that was “clean” (ready to work with). Today we will look at the data we’ve collected about ourselves, but first we have to clean it up.

3 Survey Questions Review:
Before we get started, what challenges do you think we’ll encounter as we begin to peek into the data we’ve been collecting? Think about the survey questions you answered. What could go wrong with the kinds of questions asked and the ways students could respond to them?

4 Data Survey Download: There are many challenges associated with analyzing data. Today we’re going to look at one that people often don’t think about. When we collect data, it’s usually “dirty,” which means that, for one reason or another, it’s not ready for analysis. We’re going to investigate what this looks like and learn to use some tools that help us look at and “clean” the data. Download the collected data by following the link provided (6QAmE/edit?usp=sharing). Make your own copy so that you can edit the data.

5 Familiarizing Yourself with the Data:
Now log into Code Studio, go to Unit 2 Stage 13, and start at the part called “Getting to Know Your Data.” Work through the section and learn the different ways you can manipulate your data. After you have figured out the different ways to filter and sort your data, move on to the next slide and read the slide labeled “Why is it Important to Clean Data?” Wait for further instructions after this part.

6 Cleaning the Data: Now that you’ve learned how to filter and sort data, we must clean up the data so that it makes sense for the calculations we may want to perform. Ignore “freeform text” responses for now, such as answers to “What did you do to relax?” Focus your attention on values that should be numbers or single words. Use sorting and filtering to find invalid values, and then choose either to fix them or to delete them. Once you are done cleaning the data, wait for further instructions.
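
The fix-or-delete decision can be sketched in code. This is a minimal Python illustration, not part of the Code Studio lesson: it assumes a hypothetical survey column of sleep hours exported as text, treats any value outside 0 to 24 as invalid, and uses a hypothetical WORD_FIXES table for answers that can be repaired rather than deleted.

```python
# Hypothetical raw answers to "How many hours did you sleep last night?",
# stored as text the way a form export would store them.
RAW_HOURS = ["7", "8.5", "seven", "", "25", "6 ", "-1", "9"]

# Hypothetical fixes for answers that are clearly meant as numbers.
WORD_FIXES = {"seven": "7", "eight": "8"}

def clean_hours(raw):
    """Return the hours as a float, or None if the answer is invalid."""
    text = raw.strip().lower()
    text = WORD_FIXES.get(text, text)  # fix: repair known spellings
    try:
        value = float(text)
    except ValueError:
        return None  # not a number at all (e.g. an empty answer)
    if 0 <= value <= 24:
        return value
    return None  # a number, but impossible for hours in a day

cleaned = [clean_hours(r) for r in RAW_HOURS]
invalid = [r for r, c in zip(RAW_HOURS, cleaned) if c is None]
valid = [c for c in cleaned if c is not None]  # delete: drop invalid answers

print("flagged:", invalid)
print("kept:", valid)
```

Printing the flagged answers plays the same role as sorting the column in the spreadsheet: it puts the odd values in front of you so you can decide what to do with each one.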

7 Categorizing Data: Now focus your attention on the “freeform text” columns.
You will need to manually create new columns that categorize the inputs. This step is necessary in order to perform computations with the data, but it won’t feel very “algorithmic.” Follow the prompts in the “Categorizing Data” part to learn how to categorize your freeform text data. In a freeform field a user can type literally anything, so the raw answers are often not useful for computation. To perform calculations on your data, you may have to reduce responses to the question “What did you do to relax last night?” to a more basic answer. For example, “Reading The Great Gatsby” can be converted to “Read” or “Reading.”
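
Keyword matching is one way to sketch this step in code. The categories and keywords below are hypothetical; a class would agree on its own. Plain substring matching can also misfire (“run” appears inside “brunch”), which is part of why categorizing still needs human judgment.

```python
# Hypothetical categories for "What did you do to relax last night?"
CATEGORIES = {
    "Reading": ["read", "book", "novel"],
    "Screen time": ["tv", "netflix", "video game", "youtube"],
    "Exercise": ["run", "gym", "basketball", "walk"],
    "Music": ["music", "guitar", "piano", "sing"],
}

def categorize(response):
    """Return the first category with a keyword found in the response."""
    text = response.lower()
    for category, keywords in CATEGORIES.items():
        if any(word in text for word in keywords):
            return category
    return "Other"  # no keyword matched; review these by hand

responses = ["Reading The Great Gatsby", "watched Netflix",
             "Played guitar", "Went for a walk", "slept"]
for r in responses:
    print(r, "->", categorize(r))
```

The result is the new category column: one short, consistent label per response that a spreadsheet can count or group.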

8 Is Data Analysis Objective?
In order to analyze data with a computer, we need to clean the data first. Based on your experience today, would you say that data analysis is a perfectly objective process? Why or why not? Before we can use the data to find trends or to make predictions, first the data must be “clean” so that it is easier to analyze and perform computations on.

9 Creating New Data: We cleaned up the data we’ve been collecting for the last week. Now the question is: what can we do with it? Look at this table. It was created from the more than 65,000 rows of data in the movie rating dataset we saw a few lessons ago. This is a summary table; it is actually new data that was computed from the raw data. Spreadsheets allow you to quickly group, categorize, count, and average things. Making a summary table is a computational technique for exploring data.

10 Making Summary Tables Part 1 – Basics:
Now we are going to create our own summary tables (pivot tables) from raw data. Go to Unit 2 Stage 14 and work through the entire section to learn how to create summary tables. Start by copying the data into Google Sheets, then follow the instructions for creating these tables. Experiment and see how many different kinds of new data you can come up with.

11 Summary Tables Explained:
The power of the pivot table is that it lets you compute things you could never get by just filtering and sorting. The pivot table is doing a lot of computing behind the scenes for you.
Rows - Group By: movie
Rows act like the major categories or groupings for which you want to calculate values.
The Computation: When you set the rows to “movie,” the software finds all of the unique movie titles in the raw dataset and puts one on each row. This is called aggregation: gathering many raw rows into groups so that each group can be summarized as a single row.
Values - Display: rating; Summarize by: AVERAGE
Values lets you specify the computation that should happen for each row.
The Computation: We’re interested in the average rating for each movie, so for Values we choose rating, Summarize by: AVERAGE. The software then averages all of the ratings that belong to each movie.
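
What the pivot table does with these settings can be sketched in a few lines of Python. The rows below are hypothetical stand-ins for the movie rating data, and pivot is an illustrative helper, not a spreadsheet function: it groups by movie (the Rows setting) and applies a summarize function to each group’s ratings (the Values setting).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (movie, rating) rows standing in for the raw dataset.
RAW_RATINGS = [
    ("Movie A", 4), ("Movie B", 3), ("Movie A", 5),
    ("Movie C", 4), ("Movie B", 2), ("Movie A", 3),
]

def pivot(rows, summarize):
    """Group rows by movie, then summarize each group's ratings."""
    groups = defaultdict(list)
    for movie, rating in rows:
        groups[movie].append(rating)  # Rows: one group per unique title
    # Values: apply the chosen computation to each group
    return {movie: summarize(ratings) for movie, ratings in sorted(groups.items())}

print(pivot(RAW_RATINGS, mean))  # like Summarize by: AVERAGE
print(pivot(RAW_RATINGS, len))   # like Summarize by: COUNT
```

Swapping mean for len is the code equivalent of changing Summarize by from AVERAGE to COUNT; the grouping step stays the same.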

12 Making Summary Tables Part 2 – Manipulation and Visualization
Now we are going to learn more features for creating summary tables: adding columns, filtering pivot tables, and manipulating pivot tables. After creating a pivot table, you can copy the data into a new spreadsheet (values only) and create visualizations based on the summary table. Summarize the data, then chart it. Just by looking at the table, it is hard to see any trend or pattern in the data, but if you plot the results on a graph, you can!
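
Adding a column to a pivot table splits each row into one cell per value of a second field. This sketch shows the idea with hypothetical data and a hypothetical helper name: rows are movies, columns are rating values, and each cell counts how many times that movie received that rating.

```python
from collections import Counter

# Hypothetical (movie, rating) rows standing in for the raw dataset.
RAW_RATINGS = [
    ("Movie A", 4), ("Movie B", 3), ("Movie A", 5),
    ("Movie C", 4), ("Movie B", 2), ("Movie A", 3),
]

def pivot_with_columns(rows):
    """Rows: movie. Columns: rating. Values: COUNT of matching rows."""
    counts = Counter(rows)  # how often each (movie, rating) pair occurs
    movies = sorted({movie for movie, _ in rows})
    ratings = sorted({rating for _, rating in rows})
    # Counter returns 0 for pairs that never occur, so empty cells get a 0.
    table = [[counts[(m, r)] for r in ratings] for m in movies]
    return movies, ratings, table

movies, ratings, table = pivot_with_columns(RAW_RATINGS)
print("movie".ljust(8), *ratings)
for movie, row in zip(movies, table):
    print(movie.ljust(8), *row)
```

A grid shaped like this, copied as values only, is the kind of summary that can then be turned into a chart.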

13 Free Play: Using our Data Set
Try creating pivot tables with the data we collected in our own survey. Use the techniques you discovered today to create summary tables that tell you interesting information about the data. Also create a visualization from this data.

14 Summary: Summary tables (pivot tables) provide a way to visualize data. Yes, it’s still a table, but by aggregating and summarizing information from a large dataset, summary tables let you see things in the data you might otherwise miss. Summary tables allow you to manipulate data and create new data. Even in our simple movies example, the raw data didn’t contain the average rating for each movie or a count of how many ratings each movie received. We had to compute those values, and the pivot table let us do that quickly and easily. A summary table helps you look at your data in new ways. Think: how could the data be grouped? What could be calculated? Once you know how to make a summary table, you can look at raw data and ask questions that you know might be possible to answer. A summary table can be a first step toward a good visualization. It is often difficult to make a meaningful chart or graphic out of raw data, so you usually want to summarize it first, then chart it!

