# Learning Excel for Data Analysis

Learning Excel for Data Analysis
Sessions 5 and 6
Dr. Chaitali Basu Mukherji

Data Analysis
Data analysis in Excel is performed in multiple ways, using the following groups of the Data tab:
- Get Data: connect to an external data set
- Sort and Filter
- Data Tools: Data Validation, Duplicate Removal, Consolidation, Data Tables and What-If Analysis
- Outline: Group and Ungroup, Subtotals
- Analysis: Data Analysis, Solver

Group and Ungroup
Group and Ungroup are in the Outline group of the Data tab.
- Group lets you collapse a set of rows or columns.
- Ungroup reverses the action.
- In both cases, an outline with a + or - sign appears.

Subtotals
The Subtotal command works on a sorted list. To insert subtotals:
1. Sort the list on the field for which you want subtotals inserted.
2. Click the Subtotal button in the Outline group on the Data tab. The Subtotal dialog box appears, where you specify the options for the subtotals.
3. Select the field for which the subtotals are to be calculated in the At Each Change In drop-down list.
4. Specify the type of totals you want to insert in the Use Function drop-down list.
5. Select the check boxes for the field(s) you want to total in the Add Subtotal To list box.
6. Click OK. Excel adds the subtotals to the worksheet.

When you use the Subtotal command, Excel outlines the data at the same time that it adds the subtotal rows (in the salary example, the departmental salary totals) and the grand total. You can therefore collapse the list down to just its departmental subtotal rows, or even just the grand total row, by collapsing the outline to the second or first level.

In a large list, you may want a page break every time the data changes in the field on which the list is being subtotaled. To do this, select the Page Break between Groups check box in the Subtotal dialog box before you click OK.

Excel does not allow you to subtotal a list formatted as a table; you must first convert the table into a normal range of cells. Click a cell in the table and then click the Table Tools Design tab. Click the Convert to Range button in the Tools group, and then click Yes. Excel removes the filter buttons from the columns at the top of the list while retaining the original table formatting.
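To make the logic behind the Subtotal command concrete, here is a minimal Python sketch of "sort, then total at each change in a field". It is only an analogy for what Excel does, not Excel's implementation; the department names and salaries are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (department, salary) rows; not taken from the slides.
rows = [
    ("Sales", 52000),
    ("Admin", 41000),
    ("Sales", 48000),
    ("Admin", 39000),
    ("IT", 60000),
]

def subtotals(records):
    """Return (department, subtotal) pairs plus the grand total."""
    # Sort on the grouping field first, as the Subtotal command expects.
    ordered = sorted(records, key=itemgetter(0))
    # At each change in department, sum the salary field.
    per_group = [
        (dept, sum(salary for _, salary in group))
        for dept, group in groupby(ordered, key=itemgetter(0))
    ]
    grand_total = sum(total for _, total in per_group)
    return per_group, grand_total

per_group, grand_total = subtotals(rows)
for dept, total in per_group:
    print(f"{dept} Total: {total}")
print(f"Grand Total: {grand_total}")
```

Without the sort, `groupby` would start a new group every time the department changes, so "Sales" would appear twice. The same reasoning explains why the list should be sorted before subtotals are inserted.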

Solver Solver can handle problems that involve many variable cells and

Let's Solve the Problem Using Solver

Solver using iteration
Let us solve a system of two nonlinear equations using Solver:
F(x, y) = x^2 + y + 3 = 0
G(x, y) = 2*x^2 + y^3 + 5 = 0
Solver starts from an initial estimate and refines it iteratively, here with up to 100 iterations, to arrive at a close result.
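The idea of iterating toward a close result can be sketched in Python. This is not Solver's algorithm; it is a simple gradient descent with step halving that minimizes F^2 + G^2, with an assumed starting guess of (1, -1) and the 100-iteration cap from the slide. Substituting y = -x^2 - 3 from F into G gives 2x^2 - (x^2 + 3)^3 + 5, which is negative for every real x, so the system as written appears to have no exact real solution. That would be consistent with the slide speaking of a close result rather than an exact one.

```python
def residuals(x, y):
    """F and G from the slide; both would be 0 at an exact solution."""
    return x**2 + y + 3, 2 * x**2 + y**3 + 5

def objective(x, y):
    """Sum of squared residuals, the quantity this sketch drives down."""
    f, g = residuals(x, y)
    return f * f + g * g

def gradient(x, y):
    """Analytic gradient of the objective with respect to x and y."""
    f, g = residuals(x, y)
    return 2 * f * 2 * x + 2 * g * 4 * x, 2 * f + 2 * g * 3 * y**2

def solve(x, y, iterations=100):
    """Gradient descent with step halving, capped at `iterations` steps."""
    for _ in range(iterations):
        gx, gy = gradient(x, y)
        current = objective(x, y)
        step = 1.0
        # Halve the step until the objective actually decreases.
        while step > 1e-12 and objective(x - step * gx, y - step * gy) >= current:
            step /= 2
        if step <= 1e-12:
            break  # no improving step found; stop early
        x, y = x - step * gx, y - step * gy
    return x, y

start = (1.0, -1.0)  # assumed initial guess, not from the slides
x, y = solve(*start)
print(f"x={x:.4f}, y={y:.4f}, objective={objective(x, y):.4f}")
```

The printed objective shows how far the best estimate still is from satisfying both equations at once.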

Analysis ToolPak
To develop complex statistical or engineering analyses, you can save steps and time by using the Analysis ToolPak. You provide the data and parameters for each analysis, and the tool uses the appropriate statistical or engineering macro functions to calculate and display the results in an output table. Some tools generate charts along with output tables.

Anova Analysis
The Anova analysis tools provide different types of variance analysis: Single Factor, Two-Factor with Replication and Two-Factor without Replication. The tool to use depends on the number of factors and the number of samples that you have from the populations you want to test.
- Single Factor: performs a simple analysis of variance on data for two or more samples. It tests the hypothesis that each sample is drawn from the same underlying probability distribution. If there are only two samples, the TTEST function (T.TEST in newer versions of Excel) can be used instead; with more than two samples, the Single Factor Anova model has to be used.
- Two-Factor with Replication: useful when data is classified along two different dimensions and there is more than one observation for each combination. Example: we measure the height of plants that are given 3 different brands of fertilizer and kept at 2 different temperatures. For each of the six possible {fertilizer, temperature} pairs, we have an equal number of observations of plant height.
- Two-Factor without Replication: useful when data is classified along two different dimensions but there is only a single observation for each pair.
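As a rough illustration of what the Single Factor tool computes, the sketch below derives the F statistic of a one-way ANOVA in plain Python. The plant heights are invented for the example, and the function name is hypothetical. Excel's output table also reports a p-value and a critical F value, which this sketch leaves out.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times the squared distance
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical plant heights for three fertilizer brands (not from the slides).
heights = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(heights)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F value means the group means differ by more than the spread within the groups would suggest.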

What is a Pivot Table?
A Pivot Table is used to produce meaningful information from a table of data. For example, from a table of data that has names, addresses, ages, occupations, phone numbers and PIN codes, a Pivot Table lets us easily and quickly find out:
- How many salesmen work in each region?
- What is the net car sale of each region?
- How do sales compare across products?
- What is the total sales for the company?
- How many customers do we sell to in each region?

Let's see how.

- Pivot Tables can generate and extract meaningful information from a large table of data within a matter of minutes.
- A Pivot Table uses a lot less memory than producing the same results with Excel's built-in functions.
- New views of the data are produced by simple drag-and-drop (pivoting).
- The information can be refreshed each time the workbook is opened or by clicking Refresh.

Example for Pivot
1. Select the data range from which to make the pivot table.
2. Go to the Insert tab and click the PivotTable icon to select the PivotTable option.
3. Excel displays a pivot table wizard where you specify the pivot table's target location.
4. Make your first pivot report by dragging and dropping fields into the pivot table grid area or by using the “Pivot table panel”.

The pivot report is divided into header and body sections, and you can drag and drop the fields you want into each area. The body itself contains three parts: Rows, Columns and Cells. You can use any fields in these areas too.
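What a pivot report does with Rows, Columns and Cells can be sketched outside Excel. The Python example below sums a value field for each row/column combination. The records, field names and the `pivot` helper are all hypothetical, chosen to echo the "sales per region" questions above.

```python
from collections import defaultdict

# Hypothetical sales records; field names and values are illustrative only.
records = [
    {"region": "North", "product": "Car", "sales": 120},
    {"region": "North", "product": "Van", "sales": 80},
    {"region": "South", "product": "Car", "sales": 200},
    {"region": "South", "product": "Car", "sales": 50},
    {"region": "South", "product": "Van", "sales": 30},
]

def pivot(rows, row_field, column_field, value_field):
    """Sum value_field for every (row_field, column_field) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_field]][row[column_field]] += row[value_field]
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {key: dict(cols) for key, cols in table.items()}

by_region = pivot(records, "region", "product", "sales")
for region, cols in by_region.items():
    print(region, cols, "total:", sum(cols.values()))
```

Swapping the `row_field` and `column_field` arguments gives the transposed report, which is roughly what dragging a field from Rows to Columns does.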

Tips on Pivot Tables
- Formatting is easy for pivot tables.
- You can easily change the pivot table summary formulas by right-clicking the pivot table and selecting “Summarize Data By”.
- You can apply conditional formatting to pivot tables, although you must be careful because pivot tables change in size depending on the data.
- If the original data from which a pivot table is constructed changes, right-click the pivot table and select the “Refresh Data” option.
- To drill down on a particular summary value, double-click it. Excel creates a new sheet with the data behind that pivot report value. (This is extremely useful.)

What are Pivot Charts?
- Charts created on pivot tables are called Pivot Charts.
- They allow us to create professional, interactive charts that would otherwise require complex VBA coding.
- Making a pivot chart from a pivot table is very simple: click the pivot chart icon on the toolbar or in the Options ribbon area and follow the wizard.

The basic information needed to use the Pivot Wizard is:
- Where is our data stored, e.g. a range in the same workbook, a database, or another workbook?
- How is our data set up, and do we also want a Pivot Chart?
- Which column of data goes into which field, i.e. the optional Page field, Row field and Column field, and the mandatory Data field?
- Where do we want to put the Pivot Table, e.g. a new worksheet or an existing one?

What is Charting in Excel?
- Charts display series of numeric data in a graphical format, making it easier to understand large quantities of data and the relationships between different series of data.
- To create a chart, you start by entering the numeric data and then plot it by selecting the chart type you want.
- Excel supports many types of charts (such as a column chart or a pie chart) and their subtypes (such as a stacked column chart or a 3-D pie chart) to help you display data in ways that are meaningful to your audience.
- You can create a combination chart by using more than one chart type.
- Some chart types (Column or Bar) can be created from data arranged in rows and columns, while others (Pie and Bubble) require a specific arrangement of data.
- Chart templates can be saved as .crtx files and used like any other template.

Elements of a Chart
- Chart Area
- Plot Area
- Data Points
- Axes
- Legend
- Titles
- Labels

Excel Chart Types
Excel can produce the following types of chart. Typical uses of the different chart types are given below.
- Column charts are useful for showing data changes over a period of time and for illustrating comparisons among items.
- Line charts are useful for displaying continuous data over time, set against a common scale, and for showing trends in data at equal intervals.
- Pie charts are useful for showing the size of individual items in proportion to the sum of the items.
- Bar charts are useful for comparisons among individual items.

Excel Chart Types (cont.)
- XY (scatter) charts are useful for displaying and comparing numeric values, such as scientific, statistical, and engineering data.
- Area charts are useful for emphasizing the magnitude of change over time and for drawing attention to the total value across a trend.
- Stock charts are useful for illustrating the fluctuation of stock prices, and can also be used for data such as daily or annual temperatures.
- Surface charts are useful for finding optimum combinations between two sets of data; they resemble a topographic map.
- Doughnut charts are useful for showing the relationship of parts to a whole, and can contain more than one data series.
- Bubble charts are useful for comparing the sizes of parts that make up the data set.
- Radar charts compare the aggregate values of several data series, whereas pie charts have only one data series.

Creating a Chart
In Microsoft Excel, you can represent numbers in a chart. On the Insert tab, you can choose from a variety of chart types, including column, line, pie, bar, area, and scatter. The basic procedure for creating a chart is the same no matter which type you choose, and the chart updates automatically as you change your data.

You select a chart type by choosing an option from the Insert tab's Charts group. After you choose a chart type, such as column, line, or bar, you choose a chart sub-type. For example, after you choose Column Chart, you can choose to have your chart represented as a two-dimensional chart, a three-dimensional chart, a cylinder chart, a cone chart, or a pyramid chart. There are further sub-types within each of these categories. As you move your mouse pointer over each option, Excel shows a brief description of that chart sub-type.

Let’s Create a Column Chart

Sub Types of Column Chart
- Clustered column in 3-D: compares values across categories. It displays 2-D data values using a 3-D perspective; a third value axis (depth axis) is not used.
- Stacked column in 3-D: shows the relationship of individual items to the whole, comparing the contribution of each value to a total across categories.
- 3-D column: uses three axes that you can modify (a horizontal axis, a vertical axis, and a depth axis) and compares data points along the horizontal and the depth axes.
- Cylinder, cone, and pyramid: available for all of the above types, with only the shape being a cylinder, cone or pyramid instead of a rectangle.

Applying a Chart Layout
Context tabs are tabs that appear only when you need them. There are three chart context tabs, collectively called Chart Tools: Design, Layout, and Format. They become available when you create a new chart or click an existing chart, and you can use them to customize your chart.

You determine what your chart displays by choosing a layout. The layout determines whether the chart displays a title and where, whether it has a legend and where, whether it has axis labels, and so on. Excel provides several layouts from which you can choose.

Switching Data

Changing Chart Style

Positioning the Chart

Changing the Chart Type

Communicating through Data
Communicating through data is most effective if we understand the basic rules. There are 7 common relationships in quantitative business data. Typical questions that arise when presenting numbers are:
- Compared to what?
- At what instant?
- In which sequence?
- Relative to what other?
- How much is the deviation?
- What kind of distribution does it follow?
- Is there any special correlation between them?

Time-Series Relationships
This is the most common relationship in quantitative business data. When quantitative values are expressed as a series of measures taken across equal intervals of time, the relationship is called a time series. Studies indicate that approximately 75% of all business graphs display time series. Time can be divided into intervals of varying duration, including years, quarters, months, weeks, days, and hours. Time series reveal trends and patterns that we must be aware of and understand to make informed decisions.

Ranking Relationships
It is often meaningful in business to see things ranked, such as the performance of salespeople or the expenses of departments. When quantitative values are sequenced by size, from large to small or vice versa, the relationship is called a ranking. This not only reveals their sequence, but also makes it much easier to compare values by placing those that are most similar near one another.

Part-to-Whole Relationships
It is often useful to see how something is divided into parts, and the percentage relationship of each part to the whole. When quantitative values are displayed to reveal the portion that each value represents of some whole, this is called a part-to-whole relationship. Typical examples are how a market is divided up between competitors, or how expenses are divided between regions.
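The percentage behind a part-to-whole display is simply each value divided by the total. A minimal Python sketch, using made-up company names and revenue figures:

```python
# Hypothetical revenue by competitor; names and figures are illustrative.
revenue = {"Company A": 50, "Company B": 30, "Company C": 20}

def part_to_whole(parts):
    """Return each part as a percentage of the total."""
    total = sum(parts.values())
    return {name: 100 * value / total for name, value in parts.items()}

shares = part_to_whole(revenue)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

These percentages are what a pie or stacked bar chart would encode as slice or segment sizes.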

Deviation Relationships
When quantitative values are displayed to feature how one or more sets of values differ from some reference set of values, this is called a deviation relationship. The most common example in business is one that shows how a set of actual values (such as expenses) deviates from a predefined target (such as a budget).
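A deviation display needs the difference between each actual value and its reference value, often also expressed as a percentage of the reference. The sketch below computes both for an invented set of monthly expenses and budgets:

```python
# Hypothetical monthly expenses against budget (not from the slides).
budget = {"Jan": 100, "Feb": 100, "Mar": 120}
actual = {"Jan": 110, "Feb": 95, "Mar": 120}

def deviations(actuals, targets):
    """Return (absolute, percent) deviation of each actual from its target."""
    result = {}
    for key, target in targets.items():
        diff = actuals[key] - target
        result[key] = (diff, 100 * diff / target)
    return result

variance = deviations(actual, budget)
for month, (diff, pct) in variance.items():
    print(f"{month}: {diff:+d} ({pct:+.1f}%)")
```

Plotting these differences around a zero line, rather than the raw actual and budget values, keeps the reader's attention on the deviation itself.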

Distribution Relationships
When we show how a set of quantitative values is spread across its entire range, the relationship is called a distribution. We can often learn a great deal by examining the distribution of a set of values, especially its shape, which reveals what is typical, whether it is skewed in one direction or the other, and whether there are gaps or concentrations. The slide's example shows a distribution of values that is fairly symmetrical, approaching what is called a normal or bell-shaped curve.
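To see the shape of a distribution, the values are first counted into equal-width bins, which is what a histogram plots. A small Python sketch with invented order sizes and an assumed bin width of 10:

```python
from collections import Counter

def frequency_distribution(values, bin_width):
    """Count how many values fall in each bin [k*width, (k+1)*width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical order sizes (not from the slides).
orders = [12, 18, 21, 25, 27, 29, 33, 35, 38, 44]
width = 10
bins = frequency_distribution(orders, width)
for start, count in bins.items():
    # One '#' per value gives a rough text histogram.
    print(f"{start}-{start + width - 1}: {'#' * count}")
```

The row lengths in the printed output already show where the values concentrate and whether the shape is roughly symmetrical.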

Correlation Relationships
When pairs of quantitative values, each measuring something different about an entity (for example a person, department, or product), are displayed to reveal whether there is a significant relationship between them (for instance, as one goes up the other goes up as well, or as one goes up the other goes down), this is called a correlation. Understanding correlations between quantitative variables can help us predict, take advantage of, or avoid particular behaviors. The slide's example plots employees' heights in inches (y axis) against their salaries in dollars (x axis).
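The strength of a linear correlation is commonly summarized by the Pearson coefficient r, which ranges from -1 to 1. The sketch below computes it from first principles in Python; the paired values are invented and are not the data behind the slide's chart.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the pairs, and the variation of each variable alone.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical paired measurements for five employees.
salary = [30, 40, 50, 60, 70]   # in thousands of dollars
height = [64, 66, 65, 70, 70]   # in inches
r = pearson_r(salary, height)
print(f"r = {r:.3f}")
```

A value of r near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or none. A scatter plot with a trend line shows the same information visually.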

Nominal Comparison Relationships
A chart in which there is no particular relationship between the values shows what is called a nominal comparison relationship. For example, four geographical regions do not relate to each other in any particular order. Such a chart provides a means to compare the regional values, but nothing more. Whenever you prepare a graph that displays nothing but a nominal comparison, it is useful to ask yourself whether another relationship could be featured that would make the graph more meaningful. In the regional example, simply arranging the regions in order of their quantitative values could produce a ranking relationship. Often, however, discrete items in a categorical variable, like these geographical regions, need to be arranged in a particular order because people expect to see them arranged in that way.
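Turning a nominal comparison into a ranking amounts to sorting the categories by value before charting them. A minimal Python sketch with invented regional sales figures:

```python
# Hypothetical regional sales (not from the slides).
sales = {"North": 340, "East": 510, "South": 290, "West": 420}

# Sort regions from largest to smallest value to obtain a ranking.
ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)

for position, (region, value) in enumerate(ranked, start=1):
    print(f"{position}. {region}: {value}")
```

In Excel the equivalent step would be sorting the source range by the value column before creating the bar chart.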

Tip for Selecting Right Chart Type
| What to represent | Chart type to use |
| --- | --- |
| Nominal Comparison | Bar, Point |
| Time Series | Line |
| Ranking | |
| Part to Whole | Pie, Stacked Bar, Bubble |
| Deviation | |
| Frequency Distribution | Histogram, Frequency Plot |
| Correlation | Scatter Plot with trend line |

Best Practices of Charting
- Determine your message and identify your data.
- Format graphs to focus on the message, removing unnecessary distractions.
- Check whether a table, a graph, or both are needed to communicate your message most effectively.
- Determine the best place in the chart area to display each variable.
- Take special care with legend placement.

Tips for Enhancing Chart Performance
- Use tables to hold the data.
- Use named ranges and named formulas.
- Use Pivot Tables.
- Sort your data.
- Use Manual Calculation mode.
- Use non-volatile formulas. Volatile formulas are recalculated whenever there is a change in the workbook; examples are RAND, NOW, TODAY and OFFSET.
- Keep formulas in a separate sheet.
- Write better formulas.

Tips for a Good Report
- Restrict the work area to relevant columns and rows only.
- Lock formula cells and protect the worksheet.
- Freeze panes so that your boss knows what she is reading.
- Hide unnecessary and calculation sheets.
- Hide rows and columns not used in the report.
- Include cell comments and help messages.
- Use consistent colors and schemes.
- Name and color worksheet tabs appropriately.
- Before closing the workbook, select cell A1 on the correct sheet.

Thank You