1
Data Science and Analytics
Introduction to Data Science and Analytics
Stephan Sorger
Unit 2. Excel Essentials
Disclaimer: All images such as logos, photos, etc. used in this presentation are the property of their respective copyright owners and are used here for educational purposes only.
Some material adapted from: Sorger, Stephan. “Marketing Analytics: Strategic Models and Metrics.” Admiral Press.
Welcome to Module 2 of Introduction to Data Science and Analytics. This module is important because it covers essential tools and techniques we can use in Microsoft Excel.
© Stephan Sorger 2016; Data Science: Excel Essentials; 1
2
Outline/ Learning Objectives
Topic: Description
Charts: Displaying and interpreting data in graphs
Copy/Paste: Pasting data without the underlying equations
Filter and Sort: Arranging data in logical sequences
Find/Replace: Replacing multiple instances of data
Formatting: Highlighting aspects of data through formatting
Vlookup: Accessing additional data in spreadsheets
In this module, we will cover several learning objectives:
-How to display and interpret data in graphs and charts
-How to copy and paste data
-How to filter and sort data
-How to find instances of data and replace them with other data
-How to highlight aspects of data through formatting
-How to apply the Microsoft Excel function known as Vlookup to access additional data
3
Charts
In this section, we examine how to display and interpret data in graphs.
4
Charts
To insert a chart in Excel: Insert tab, Charts section, then select the type of chart.
Inserting a chart in a Microsoft Excel spreadsheet is easy. In this case, we want to plot revenue for different products. To insert a chart, select the Insert tab and then select the type of chart you want from the Charts section. In our case we will go with a vertical column-style chart, one of the most popular types.
5
Chart Selection
To insert a chart: Insert tab, Charts section, then select the type of chart.
The result is a neat column chart. Column charts are a popular choice because they are clear and easy to read. In this case, for example, we can easily see that product A drives the most revenue, followed by B, C, D, and so forth.
6
Chart Selection
Many different types of charts are available.
Microsoft Excel does not limit us to column charts. As the screen shows, we can select from many different types of charts, such as line charts, horizontal bar charts, x-y scatter charts, and others.
7
Chart Selection
Avoid 3D charts.
When inserting charts, avoid three-dimensional (3D) charts. 3D charts can make the data difficult to interpret. In addition, 3D charts can look dated and busy, which does not help you create an air of credibility and competence.
8
Chart Selection
[Figure: the same data, "Sales Revenue by Product" for products A through E, shown as a vertical bar chart (revenue axis from $100K to $500K) and as a pie chart.]
Vertical Bar Chart, typical applications:
-Sales revenue comparisons
-Before-After comparisons
-Competitive comparisons
Pie Chart, typical applications:
-Market share breakdown
-Revenue breakdown
-Budget breakdown
We now discuss how to communicate effectively with data charts. This information is indispensable if you plan to communicate with organizational executives. We start with relatively simple charts. Pie charts are well suited to any situation where we need to break down totals into individual constituents. For example, we can use them for market share breakdowns, revenue breakdowns, and marketing budget breakdowns, such as showing how much was spent on social media campaigns, search engine marketing campaigns, and so forth. Alternatively, any time we want to compare data, we should think of vertical bar charts. Typical applications include sales revenue comparisons, before and after comparisons, and competitive comparisons. For example, the percentages for A, B, C, D, and E are the same in the two charts, but notice how different they look. On the pie chart, the slices look relatively equal. On the bar chart, the differences are much easier to see.
9
Chart Selection
Variations on Column Charts
[Figure: a typical data set (revenue by product for the Store and Internet channels) plotted three ways: "Sales Revenue by Product" as a clustered column chart and as a stacked column chart, each on an axis up to $1,000K, and "Sales Revenue Contribution" as a 100% stacked column chart on an axis up to 100%.]
-Clustered Column Chart: Compare two sets of data
-Stacked Column Chart: Show total contribution
-100% Stacked Column Chart: Hide actual values from competitors
Microsoft Excel offers several choices for vertical bar charts. Some useful ones are shown on this slide. On the left, we see the clustered column chart, which works well when we want to compare sets of data. In this case, we can compare the sales of product A with those of C, and sales of D with those of E. In the middle, we see the stacked column chart, which works well when we want to show the contribution from multiple sources of sales, as is shown in the chart. On the right, we see a variant of the stacked column chart, which expresses the constituents in percentage rather than absolute form. This view is useful if we do not want to disclose the actual values.
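The percentage view on the right can be sketched in a few lines of Python. This is only an illustration of the arithmetic behind a 100% stacked column chart; the revenue figures (in $K) are hypothetical and are not the values from the slide.

```python
# Hypothetical revenue by product, in $K, for each channel (not the slide's data).
revenue = {
    "Store": {"A": 400, "C": 300, "D": 200, "E": 100},
    "Internet": {"A": 100, "C": 200, "D": 100, "E": 100},
}

# Convert each channel's absolute values into percentage contributions,
# which is what a 100% stacked column chart displays.
contribution = {}
for channel, products in revenue.items():
    total = sum(products.values())
    contribution[channel] = {
        product: 100 * value / total for product, value in products.items()
    }

for channel, shares in contribution.items():
    print(channel, shares)
```

Note that the percentages alone do not reveal the channel totals, which is why this view can be shared without disclosing actual values.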
10
Chart Selection
Horizontal Bar Chart
[Figure: horizontal bar chart of revenue by product, with a revenue axis from $100K to $500K.]
Typical Applications:
-Long category names
-Tornado charts (see next slide)
This slide shows a horizontal bar chart. This type of chart is not as popular as the vertical bar chart, in part because it can be difficult to fit labels for increments on the horizontal axis, as is demonstrated on this slide. We generally limit horizontal bar charts to two applications. The first application is for the situation when we have long category names. For example, if our top bar had a long descriptor, such as Sales of Product A through the Internet Channel, the horizontal bar chart would allow us to write that long descriptor horizontally, instead of vertically, making it easier to read. The second application is for tornado charts, which we show on the next slide.
11
Pivot Tables
Tornado Chart
[Figure: tornado chart comparing Male Population (left, x 1000) and Female Population (right, x 1000) for the age groups 10-19, 20-29, 30-39, 40-49, and 50-up; each side of the horizontal axis runs from 10 to 50.]
This slide shows an example of a tornado chart. Tornado charts work well when we want to compare data from two groups. For example, the chart in this slide compares data between males and females, further subdivided by age. Microsoft Excel does not offer a tornado chart graphing function. To create a tornado chart, build a table with three columns. In the left column, show the categories to be compared. In our case, our categories are different age groups. In the middle column, show the data from one of the two groups to be compared. In our case, we show males on the left side. In that column, add a minus sign to every entry. That way, Excel will plot the data to the left of the zero line. In the right column, show the data of the other group to be compared. We then ask Excel to create a standard horizontal bar chart with the three columns of data. The result will be a tornado chart as shown on the slide.
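The three-column table described above can be sketched in Python as follows. The age groups come from the slide; the population figures are hypothetical placeholders, since the transcript does not give the actual numbers.

```python
# Age groups from the slide; population values (in thousands) are hypothetical.
age_groups = ["Age 10-19", "Age 20-29", "Age 30-39", "Age 40-49", "Age 50-up"]
male = [20, 35, 45, 30, 25]
female = [22, 33, 48, 32, 28]

# Left column: category. Middle column: first group, negated so that the
# bars plot to the left of the zero line. Right column: second group.
tornado_rows = [(group, -m, f) for group, m, f in zip(age_groups, male, female)]

for group, m, f in tornado_rows:
    print(f"{group:<10} {m:>6} {f:>6}")
```

The resulting rows are what we would paste into the spreadsheet before inserting the bar chart.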
12
Chart Selection
Line Charts
[Figure: "Product B Overtaking Product A": monthly sales of products A and B, January through November, shown as a clustered vertical bar chart (left) and as a line chart.]
Best for trend data; data over time
This slide shows a line chart, compared with a clustered vertical bar chart. Line charts work well when we want to show trends, especially comparing internal data with other internal data, or with external data. To emphasize the story-telling ability of line charts for trends, we have created a vertical bar chart, shown at left, with the same data. Note how easily one can detect the trend of product B sales increasing, and product A sales decreasing, in the line chart versus the clustered vertical bar chart.
13
Chart Selection
[Figure: examples of a scatter chart, a doughnut chart, and a radar/spider chart.]
CAUTION: Use sparingly in front of general audiences. Can be confusing, with a large chance of misinterpretation. AVOID.
We specifically recommend avoiding the three types of charts shown in this slide for general audiences, such as company executives. From left to right, those charts include a scatter chart, a doughnut chart, and a radar chart, sometimes also called a spider chart. In fact, if someone presents data to you using one of the types shown, be extremely wary. The presenter might be trying to confuse the issue by using such complex chart types.
14
Chart Selection
Summary of Wall Street’s sector calls 2007 - 2014
Scatter plots can be confusing to general audiences; EXAMPLE
[Figure: X-Y scatter plot. Horizontal axis: Actual End-Of-Year Sector Performance By Decile. Vertical axis: Beginning-of-Year Sector Picks Minus Pans (Net) Forecast.]
This slide shows one example of how certain charts serve to confuse, or even mislead, audiences. Here we see an actual X-Y scatter plot taken from a stock market investment journal. The horizontal x-axis shows actual end of year sector performance. The vertical y-axis shows certain forecasts made for sector performance. The chart ostensibly shows the relationship between the sector performance predictions and the actual performance per sector. As we can see from the chart, the relationship is murky at best and could be interpreted as self-serving, in that it alleges the forecasts were effective.
15
Chart Enhancements
Addition of Trend Arrow, Threshold, and Headline
[Figure: vertical bar chart of monthly sales, January through December, on an axis from $1M to $7M, with the headline "20% Sales Growth in 2013", a trend arrow, and a threshold line at $5M.]
To enhance communications with executives and other important audiences, we can add enhancements. In this slide, we added a headline, a trend arrow, and a threshold line to greatly increase the story-telling ability of this simple vertical bar chart.
16
Charting Ethics
“With great power comes great responsibility”
[Figure: results of the survey question “Would you recommend Acme products to others?” plotted twice. "Truncating Trickery" (left): Yes and No bars on an axis running from 400 to 450. "The Ethical Way" (right): the same bars on an axis running up to 500.]
With the great power of data science comes great responsibility. Some people will display data in unusual ways to confuse or even mislead audiences. We take this time to make a personal appeal to present data accurately and ethically. This slide shows a typical misleading example that we call Truncating Trickery. A company runs a market survey, asking "Would you recommend Acme products to others?" The results are nearly equal, with 440 people saying yes and 408 people saying no. But if we truncate the scale, we can distort the difference to make it seem as if the Yes scores dominate over the No scores, as shown in the chart on the left. The chart on the right shows the real picture: the scores are almost equal. The lesson here is clear. When you see truncated axes, be careful. Someone might be trying to mislead you.
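To see how strongly truncation distorts the picture, we can compare the ratio of the drawn bar heights under each axis. The sketch below uses the survey counts from the slide (440 and 408); the truncated baseline of 400 is an assumption read from the lowest tick on the left chart, and the function name is our own.

```python
YES_COUNT = 440  # from the slide
NO_COUNT = 408   # from the slide


def bar_height_ratio(yes, no, baseline):
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    return (yes - baseline) / (no - baseline)


# Axis starting at zero: the bars look almost equal.
honest_ratio = bar_height_ratio(YES_COUNT, NO_COUNT, baseline=0)

# Axis assumed to start at 400: the Yes bar is drawn five times taller.
truncated_ratio = bar_height_ratio(YES_COUNT, NO_COUNT, baseline=400)

print(f"Zero-based axis: Yes bar is {honest_ratio:.2f}x the No bar")
print(f"Truncated axis:  Yes bar is {truncated_ratio:.2f}x the No bar")
```

The underlying difference is under 8 percent, yet the truncated chart draws it as a fivefold gap.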
17
Copy/Paste
In this section, we examine how to copy and paste data in spreadsheets, especially when we want to paste static values.
18
Copy/Paste
Suppose we want to copy the total sales amount from column C to column L.
Column C contains formulas: Total Sales = Sales of A + B + C.
If we copy column C, we will copy the formulas.
A typical copy and paste scenario occurs when we want to copy the total sales amount from column C to column L on the right. In this screenshot, we see the common situation where column C contains formulas, in this case adding the sales of products A, B, and C from columns D, E, and F.
19
Copy/Paste
If we do a simple copy/paste of column C, we copy the formulas. But Excel has assigned new cells for the formulas, so it will not work.
If we do a simple copy and paste from column C to column L, we copy the formulas. But because the formulas use relative cell references, Microsoft Excel adjusts them to point to new cells relative to column L. Those cells hold no sales data, so we get all zeros, which is not what we wanted.
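The reference shift can be illustrated with a simplified Python sketch. It is not how Excel is implemented, and it only handles plain relative references such as D3 (no absolute references or sheet names). The formula =D3+E3+F3 is a hypothetical example based on the columns named in the narration.

```python
import re


def col_to_num(col):
    """Convert a column letter such as 'C' or 'AA' to a 1-based number."""
    num = 0
    for ch in col:
        num = num * 26 + (ord(ch) - ord("A") + 1)
    return num


def num_to_col(num):
    """Convert a 1-based column number back to column letters."""
    letters = ""
    while num > 0:
        num, rem = divmod(num - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def shift_formula(formula, col_offset):
    """Shift every relative reference in `formula` by `col_offset` columns."""
    def shift(match):
        col, row = match.group(1), match.group(2)
        return num_to_col(col_to_num(col) + col_offset) + row

    return re.sub(r"([A-Z]+)(\d+)", shift, formula)


original = "=D3+E3+F3"  # hypothetical formula in cell C3
offset = col_to_num("L") - col_to_num("C")  # pasting 9 columns to the right
pasted = shift_formula(original, offset)
print(pasted)  # =M3+N3+O3
```

The pasted formula now adds columns M, N, and O, which lie to the right of the sales data.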
20
Copy/Paste
So instead we do a “Paste Special” and select “Values”.
So instead we execute a Paste Special command. Right-click on the destination column, select Paste Special, then select the Values radio button and click OK.
21
Copy/Paste
Mission accomplished!
We now see that the values have been copied correctly. When applying the Paste Special command for values, note that the values will remain static. If we want the values to reflect changes in the sales of different products, then we could select the Formulas option in the Paste Special command instead of Values.
22
Filter/ Sort
In this section, we examine how to apply Microsoft Excel filter and sort features to arrange data in logical sequences.
23
Sort: One Column Initial Data Set Segment Sort by Sales, Total
This screenshot shows a typical customer data set. The customer data set shows the customer name, the total sales for each customer, the breakdown of total sales into sales of products and services for each customer, the order dates for each customer, the address for each customer, and the distribution channel through which each customer placed the order. Distribution channels here include a physical brick and mortar store, called Store, one of two Internet websites offering the products, here called Internet_1 and Internet_2, a kiosk placed inside a physical brick and mortar store, called Kiosk, and affiliate companies with which we have business development relationships, called Affiliates. In our case, we want to sort each customer by total sales to see which ones purchased the most.
24
Sort: One Column
Data tab, Sort, Column C, sort from Largest to Smallest.
To execute a simple sort, we select the Data tab and then click on the Sort icon. When we do so, Microsoft Excel pops up a Sort dialog box. We select the “Sales, Total” column from the pull-down menu and sort on the values. In terms of sort order, we want to see customers with the most sales first, so we select Largest to Smallest in the order pull-down menu. We then click OK.
25
Sort: One Column Sorted Data
Microsoft Excel returns a neatly sorted data set, with highest-sale customers on top.
26
Sort: Two Column Sort by channel, then by total sales
We can do much more with the sort function. For example, suppose a distribution channel manager wishes to compare the performance of different channels. In this case, we would sort by channel, then by total sales for each channel. To execute this type of multi-level sort, we first sort by Channel with values ranging from A to Z. We then select the Add Level button to add a second level to the sort. In this second level, we sort by the Sales, Total column, with values ranging from largest to smallest, just as we did in the previous example.
27
Sort: Two Column Sorted by Channel first, and then total sales
The screenshot shows the result. In this case, we see the results for the Affiliates channel first, because it is the channel beginning with the earliest letter of the alphabet, A. The channels then continue with Internet_1, Internet_2, Kiosk, and then Store. Within the Affiliates channel, we see that the Sales, Total amounts are neatly sorted from largest to smallest.
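For readers who also work outside Excel, the same two-level sort can be sketched in Python. The customer records below are hypothetical; only the channel names come from the slides.

```python
# Hypothetical customer records; channel names follow the slides.
customers = [
    {"name": "Customer 1", "channel": "Store", "sales_total": 250},
    {"name": "Customer 2", "channel": "Affiliates", "sales_total": 120},
    {"name": "Customer 3", "channel": "Internet_1", "sales_total": 310},
    {"name": "Customer 4", "channel": "Affiliates", "sales_total": 480},
    {"name": "Customer 5", "channel": "Store", "sales_total": 90},
]

# Level 1: channel from A to Z. Level 2: total sales from largest to
# smallest (negating the number reverses the order for that level only).
sorted_customers = sorted(
    customers, key=lambda c: (c["channel"], -c["sales_total"])
)

for c in sorted_customers:
    print(f'{c["channel"]:<12} {c["sales_total"]:>5}  {c["name"]}')
```

As in the screenshot, the Affiliates rows come first, with the largest total at the top of each channel.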
28
Filter Only show sales from our Internet_1 sales channel
If we are the channel manager for the Internet_1 distribution channel, we might care only about sales through that channel. To see only those sales, we filter the data to include only the rows from the Internet_1 channel. When executing the Filter command, we do not delete the remaining data. We simply suppress viewing it until we remove the filter. In our case, we select the Filter command and select Internet_1, as shown in the screenshot.
29
Filter Only show sales from our Internet_1 sales channel
Doing so shows only the sales through the Internet_1 channel.
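A filter of this kind can be sketched in Python as a view over the data that leaves the original rows untouched, mirroring the point that filtering hides rows rather than deleting them. The order records are hypothetical.

```python
# Hypothetical orders; channel names follow the slides.
orders = [
    {"customer": "Customer 1", "channel": "Store", "sales_total": 250},
    {"customer": "Customer 2", "channel": "Internet_1", "sales_total": 120},
    {"customer": "Customer 3", "channel": "Internet_1", "sales_total": 310},
    {"customer": "Customer 4", "channel": "Kiosk", "sales_total": 75},
]

# Build a filtered view; the original list is not modified.
internet_1_orders = [o for o in orders if o["channel"] == "Internet_1"]

print(len(orders), "orders in total")
print(len(internet_1_orders), "orders through Internet_1")
```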
30
Find/Replace
In this section, we examine how to find and replace desired instances of data.
31
Find/Replace What if we want to change the date from 2016 to 2015?
In our first example, suppose we made an error in the dates in our customer data set. We want to change all instances of 2016 to 2015.
32
Find/Replace What if we want to change the date from 2016 to 2015?
To change the dates, we invoke the Find and Select command and choose Replace, which brings up a dialog box. In the dialog box, we enter “2016” in the Find what box and “2015” in the Replace with box, and then select Replace All to replace all instances of 2016 with 2015.
33
Find/Replace What if we want to change the date from 2016 to 2015?
Microsoft Excel then searches for all instances of 2016 and replaces them with 2015. As shown in the pop-up box, Excel finds 41 such instances, and replaces all of them.
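The same replace-all idea can be sketched in Python on dates stored as text. The dates below are hypothetical, and the sketch treats them as plain strings, which is an assumption about how the data is stored.

```python
# Hypothetical order dates stored as text.
order_dates = ["1/15/2016", "2/3/2016", "12/20/2015"]

# Count the matches first, much as the Excel pop-up reports a count.
match_count = sum(date.count("2016") for date in order_dates)

# Replace every occurrence of the text "2016" with "2015".
updated_dates = [date.replace("2016", "2015") for date in order_dates]

print(match_count, "instances replaced")
print(updated_dates)
```

A text replace like this changes every occurrence of the characters 2016, so it is worth checking that the same digits do not also appear in values we did not intend to change.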
34
Formatting
In this section, we examine how to highlight specific aspects of data through formatting features available in Microsoft Excel.
35
Formatting
Highlight Cells Rules
Highlight total sales greater than $200 sold through our Internet_1 sales channel.
Home tab, Styles, Conditional Formatting, Highlight Cells Rules, Greater Than…
To increase the effectiveness of our data communications, we want to highlight important areas. For example, suppose we want to highlight when total sales exceed $200. To do so, we select Styles under the Home tab, then select Conditional Formatting. We select Highlight Cells Rules and the Greater Than option.
36
Formatting
Highlight Cells Rules
Highlight total sales greater than $200 sold through our Internet_1 sales channel.
Enter “200”, then click “OK”.
In the Greater Than dialog box, we tell Microsoft Excel to apply a special format to any cells greater than $200. To do so, we enter 200 in the box on the left, and select a color from the pull-down box on the right. In this case, we select a light red fill with dark red text, but we could have selected virtually any contrasting color choice.
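The rule itself is a simple comparison, which can be sketched in Python. The sales totals are hypothetical; the threshold of 200 comes from the slide.

```python
THRESHOLD = 200  # from the slide

# Hypothetical total sales values.
sales_totals = [150, 200, 275, 90, 420]

# True marks a cell that the Greater Than rule would highlight.
highlighted = [value > THRESHOLD for value in sales_totals]

for value, flag in zip(sales_totals, highlighted):
    print(f"{value:>5}  {'HIGHLIGHT' if flag else ''}")
```

Because the rule is strictly "greater than", a value of exactly 200 is not highlighted.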
37
Formatting
Highlight Cells Rules
Highlight total sales greater than $200 sold through our Internet_1 sales channel.
To clear conditional formatting: Conditional Formatting, Clear Rules, Clear Rules from…
If we want to clear our formatting, we select the area we formatted, click on Conditional Formatting, click on Clear Rules, and then select Clear Rules from Selection.
38
Formatting
Add “Heat Map”. Green: Strong sales. Red: Poor sales.
Home tab, Styles, Conditional Formatting, Color Scales, Green-Yellow-Red scale.
To further enhance our data, we can add a color scale, sometimes called a Heat Map. For example, we might want to indicate strong sales with the color green and poor sales with red. To do so, we select Styles, then Conditional Formatting, then Color Scales, then Green-Yellow-Red. The screenshot shows the result. High sales revenues are shown in green and low sales revenues in red. Sometimes we flip the scale to Red-Yellow-Green, such as when we want to show cost information.
39
Vlookup
In this section, we examine how to access additional data in spreadsheets by applying the Vlookup function available in Microsoft Excel.
40
Vlookup
V (vertical) lookup: Access (“look up”) data in different locations and merge it.
For the sales example, the “Markup” tab contains markup values for each sales channel. We want to access those markup values and display them on the “Sales” spreadsheet tab.
Microsoft Excel includes the Vlookup command to access, or look up, data in different locations and merge it into existing data sets. Vlookup is useful for situations when we want to be able to easily change amounts related to certain categories. In our customer data set example, each distribution channel features a different markup percentage. For example, products sold through physical brick and mortar stores have a markup of 30%. Products sold through Internet channel #1 have a markup of 15%, and so forth. The markups could change over time, so we want a solution that allows us to adopt changes easily. To do so, we invoke the Vlookup command. We start by building a Vlookup table in a Microsoft Excel tab that we label as “Markup.” Note that the lookup table occupies the range of cells between A2 and B7.
41
Vlookup
=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
=VLOOKUP(I3, Markup!$A$2:$B$7, 2, FALSE)
lookup_value = I3 (cell we are looking up)
table_array = Markup!$A$2:$B$7 (table on Markup tab)
col_index_num = 2 (2nd column over on Markup)
[range_lookup] = FALSE (FALSE = we want an exact match; TRUE = approximate match is acceptable)
The Microsoft Excel Vlookup command contains several parameters as part of its syntax. The parameters include the lookup value, which is the cell we are looking up; the location of the table array that we are accessing; the column index number, which is the location of the column in the table we are accessing; and the range lookup parameter, which tells Excel if we demand an exact match, or if we are satisfied with an approximate match.
In our case, we insert a new column that we will call Markup, into which we can place the Vlookup commands. In our Vlookup equation, the first parameter is the lookup value. Here, we set the lookup value for the first row of data to the location of the channel for the first row, or I3. Next, we need to tell Excel the location of the lookup table. Because the lookup table is in the Markup tab, we use an exclamation mark to indicate the tab. We then specify the range of the table. As we saw in the previous slide, the range of the lookup table is A2 to B7. Therefore, the parameter for table array is Markup!$A$2:$B$7. The dollar signs make the range absolute, so it does not shift when we copy the formula to other rows. The next parameter tells Excel the column index number with the data we need. Because the column is the second column in the Markup table, the column index number is 2. The final parameter is the range lookup parameter. Excel uses the range lookup parameter to indicate if we want an approximate match rather than an exact one. In our case, we do not want an approximate match, so we enter FALSE.
We then copy and paste the VLOOKUP command into all of the rows of the Markup column, one for each row of data. The result is shown in the Markup column at the right-hand side. We can see that each type of distribution channel has a different markup.
We can easily change those markups by simply changing the data in the lookup table in the Markup tab. Had we not used the VLOOKUP command, changes to the markup amounts would require us to manually go through each entry and change them to the new amount, a tedious and potentially error-prone process.
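To make the exact-match lookup concrete, here is a minimal Python sketch of the same idea. It is an analogy rather than Excel's implementation: the function name `vlookup_exact` is our own, and only the Store (30%) and Internet_1 (15%) markups come from the narration. The other markup values are hypothetical.

```python
# Lookup table mirroring the Markup tab: (channel, markup).
MARKUP_TABLE = [
    ("Store", 0.30),       # from the narration
    ("Internet_1", 0.15),  # from the narration
    ("Internet_2", 0.10),  # hypothetical
    ("Kiosk", 0.20),       # hypothetical
    ("Affiliates", 0.05),  # hypothetical
]


def vlookup_exact(lookup_value, table, col_index_num):
    """Return the value in column `col_index_num` (1-based) of the first
    row whose first column equals `lookup_value` exactly."""
    for row in table:
        # Note: this comparison is case-sensitive, whereas Excel's text
        # matching in VLOOKUP ignores case.
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return None  # Excel would display #N/A when no exact match exists


# Hypothetical channel column from the Sales tab.
channels = ["Store", "Internet_1", "Kiosk", "Store"]
markups = [vlookup_exact(channel, MARKUP_TABLE, 2) for channel in channels]
print(markups)
```

Changing a value in `MARKUP_TABLE` changes every looked-up markup on the next run, which is the same maintenance benefit the lookup table gives us in the spreadsheet.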
42
Outline/ Learning Objectives
Topic: Description
Charts: Displaying and interpreting data in graphs
Copy/Paste: Pasting data without the underlying equations
Filter and Sort: Arranging data in logical sequences
Find/Replace: Replacing multiple instances of data
Formatting: Highlighting aspects of data through formatting
Vlookup: Accessing additional data in spreadsheets
In this module, we covered:
-How to display and interpret data in graphs and charts
-Different options in copying and pasting data
-Intelligent ways to filter and sort data to get the information displayed how we want
-Finding and replacing multiple instances of data
-Applying conditional formatting to highlight certain aspects of the data
-Applying the Vlookup function to access additional data in spreadsheets