Learning Excel for Data Analysis
Dr. Chaitali Basu Mukherji
Data Analysis
Data analysis in Excel can be performed in multiple ways using the following groups on the Data tab:
- Get Data: connect to an external data set
- Sort and Filter
- Data Tools: Data Validation, Remove Duplicates, Consolidate, Data Tables and What-If Analysis
- Outline: Group, Ungroup and Subtotal
- Analysis: Data Analysis (Analysis ToolPak) and Solver (both are add-ins that must be enabled before they appear)
Group and Ungroup
Group and Ungroup are in the Outline group of the Data tab.
- Group lets you collapse a set of rows or columns so that they can be hidden and shown as a unit.
- Ungroup reverts the action.
Grouped rows or columns get an outline bar with a + (expand) or - (collapse) button.
Subtotals
Subtotals work on a sorted list:
1. Sort the list on the field for which you want subtotals inserted.
2. Click the Subtotal button in the Outline group on the Data tab. The Subtotal dialog box appears so that you can specify the options for the subtotals.
3. In the At Each Change In drop-down list, select the field on which the subtotals are to be based.
4. In the Use Function drop-down list, specify the type of totals you want to insert.
5. In the Add Subtotal To list box, select the check boxes for the field(s) you want to total.
6. Click OK. Excel adds the subtotals to the worksheet.
When you use the Subtotal command, Excel outlines the data at the same time that it adds the subtotal rows (for example, departmental salary totals) and the grand total. This means that you can collapse the list down to just its subtotal rows, or even just the grand total row, simply by collapsing the outline to the second or first level.
In a large list, you can insert a page break every time the value in the subtotaled field changes. To do this, select the Page Break Between Groups check box in the Subtotal dialog box before you click OK.
Excel does not allow you to subtotal a list formatted as a table. You must first convert the table into a normal range of cells: click a cell in the table, click the Table Tools Design tab, click the Convert to Range button in the Tools group, and then click Yes. Excel removes the filter buttons from the column headings at the top of the list while retaining the original table formatting.
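The At Each Change In logic explains why the list must be sorted first: a new subtotal row is emitted whenever the key value changes from one row to the next. The Python sketch below mimics that behavior for a list of records. The `add_subtotals` helper and the `dept`/`salary` field names are hypothetical, only one subtotaled field is supported, and the grand total is computed from the detail rows so the subtotal rows are not double-counted.

```python
from itertools import groupby


def add_subtotals(rows, key, value, func=sum):
    """Insert a subtotal row after each run of equal `key` values.

    `rows` must already be sorted on `key`: groupby only groups
    consecutive records, just like At Each Change In.
    `func` plays the role of the Use Function drop-down (sum by default).
    """
    result = []
    detail_values = []
    for group_key, group in groupby(rows, key=lambda r: r[key]):
        group = list(group)
        values = [r[value] for r in group]
        result.extend(group)
        result.append({key: f"{group_key} Total", value: func(values)})
        detail_values.extend(values)
    # The grand total is computed from the detail rows, not from the subtotals.
    result.append({key: "Grand Total", value: func(detail_values)})
    return result


# Hypothetical sample data, already sorted on "dept".
staff = [
    {"dept": "HR", "salary": 100},
    {"dept": "HR", "salary": 200},
    {"dept": "IT", "salary": 300},
]
for row in add_subtotals(staff, "dept", "salary"):
    print(row)
```

Passing `func=len` or `func=max` corresponds to choosing Count or Max in the Use Function list.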
Solver
Solver can handle problems that involve many variable cells and can help find the combination of variables that maximizes or minimizes a target cell, subject to multiple constraints (conditions that must be met for the solution to be valid).
Problem statement: You are planning an advertising campaign for a new product with a total print advertising budget of Rs 12,000,000 and want to expose your ads at least 800 million times to potential readers through six publications. Your job is to reach the readership target at the lowest possible cost, with the following additional constraints:
- At least six advertisements should run in each publication.
- No more than a third of the advertising money should be spent on any one publication.
- The total cost of placing ads in Pub3 and Pub4 must not exceed Rs 7,500,000.
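Before setting the problem up in Solver, it can help to write the model down explicitly. The Python sketch below only checks a candidate plan against the stated constraints and returns its cost; Solver's job is to search for the cheapest plan that passes every check. The slide does not give the per-ad cost or per-ad readership of each publication, so those are inputs here, and the `check_plan` helper is hypothetical. It also assumes that "a third of the advertising money" means a third of the total amount actually spent.

```python
BUDGET = 12_000_000            # Rs, total print budget
MIN_EXPOSURES = 800_000_000    # readership target
MIN_ADS = 6                    # minimum ads per publication
PUB3_PUB4_CAP = 7_500_000      # Rs, combined cap for Pub3 and Pub4


def check_plan(ads, cost_per_ad, readers_per_ad):
    """Evaluate a plan for six publications (Pub1..Pub6).

    `ads`, `cost_per_ad` and `readers_per_ad` are lists of six numbers;
    the per-ad figures are assumed inputs, not given on the slide.
    Returns (total_cost, exposures, list_of_violated_constraints).
    """
    spend = [n * c for n, c in zip(ads, cost_per_ad)]
    total_cost = sum(spend)
    exposures = sum(n * r for n, r in zip(ads, readers_per_ad))

    violations = []
    if total_cost > BUDGET:
        violations.append("budget")
    if exposures < MIN_EXPOSURES:
        violations.append("readership")
    if any(n < MIN_ADS for n in ads):
        violations.append("minimum ads")
    # Assumption: "a third" is measured against the total amount spent.
    if any(s > total_cost / 3 for s in spend):
        violations.append("one-third")
    # Pub3 and Pub4 are list positions 2 and 3.
    if spend[2] + spend[3] > PUB3_PUB4_CAP:
        violations.append("Pub3+Pub4")
    return total_cost, exposures, violations
```

In the Solver model, the six ad counts are the variable cells, total cost is the target cell to minimize, and each check above becomes one constraint (plus an integer constraint on the ad counts).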
Let's Solve the Problem Using Solver
Solver Using Iteration
Let us solve a set of two nonlinear equations using Solver:
F(x, y) = x^2 + y + 3 = 0
G(x, y) = 2x^2 + y^3 + 5 = 0
Solver will use its best-estimate method with up to 100 iterations to come up with a close result.
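No real pair (x, y) satisfies both equations exactly: substituting y = -3 - x^2 from F into G gives 2x^2 + 5 - (3 + x^2)^3, which is at most -22 for every real x. A "close result" therefore means making both residuals as small as possible. The Python sketch below illustrates that idea by minimizing F^2 + G^2 with gradient descent and a backtracking step, capped at 100 iterations. It is a stand-in for illustration, not Solver's own GRG Nonlinear algorithm, and the starting point (1, 1) is an arbitrary assumption.

```python
import math


def residuals(x, y):
    """Return (F, G) for the two equations on the slide."""
    F = x * x + y + 3
    G = 2 * x * x + y * y * y + 5
    return F, G


def objective(x, y):
    """Sum of squared residuals; zero only if both equations hold."""
    F, G = residuals(x, y)
    return F * F + G * G


def gradient(x, y):
    F, G = residuals(x, y)
    dx = 2 * F * (2 * x) + 2 * G * (4 * x)
    dy = 2 * F + 2 * G * (3 * y * y)
    return dx, dy


def least_squares(x, y, iterations=100):
    """Gradient descent with a backtracking (Armijo) step size."""
    for _ in range(iterations):
        gx, gy = gradient(x, y)
        g2 = gx * gx + gy * gy
        if g2 == 0:
            break
        current = objective(x, y)
        step = 1.0
        while step > 1e-12:
            nx, ny = x - step * gx, y - step * gy
            trial = objective(nx, ny)
            # Accept the step only if it lowers the objective enough.
            if math.isfinite(trial) and trial <= current - 0.5 * step * g2:
                x, y = nx, ny
                break
            step *= 0.5
        else:
            break  # no acceptable step found; stop iterating
    return x, y


x, y = least_squares(1.0, 1.0)
print(x, y, residuals(x, y))
```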
Analysis ToolPak
To develop complex statistical or engineering analyses, you can save steps and time by using the Analysis ToolPak. You provide the data and parameters for each analysis, and the tool uses the appropriate statistical or engineering macro functions to calculate and display the results in an output table. Some tools generate charts in addition to output tables.
ANOVA Analysis
The ANOVA analysis tools provide different types of variance analysis: Single Factor, Two-Factor With Replication and Two-Factor Without Replication. The tool to use depends on the number of factors and the number of samples that you have from the populations that you want to test.
- Single Factor ANOVA: This tool performs a simple analysis of variance on data for two or more samples. It tests the hypothesis that each sample is drawn from the same underlying probability distribution. If there are only two samples, the worksheet function TTEST can be used instead; with more than two samples, the Single Factor ANOVA tool has to be used.
- Two-Factor With Replication: This analysis is useful when data is classified along two different dimensions. Example: we measure the height of plants that are given 3 different brands of fertilizer and kept at 2 different temperatures. For each of the six possible (fertilizer, temperature) pairs, we have an equal number of observations of plant height.
- Two-Factor Without Replication: This analysis is useful when data is classified along two different dimensions but there is only a single observation for each pair.
http://www.statsoft.com/textbook/anova-anova/?button=1
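The Single Factor tool's result centers on the F statistic: the mean square between groups divided by the mean square within groups. A minimal Python sketch of that calculation follows. The `one_way_anova_f` name is hypothetical, and the sketch stops at F, whereas Excel's output table also reports the P-value and F crit taken from the F distribution.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA.

    `groups` is a list of samples, each a list of numbers.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```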
What Is a Pivot Table?
A Pivot Table is used to produce meaningful information from a table of data. For example, from a table of data that has names, addresses, ages, occupations, phone numbers and PIN codes, a Pivot Table lets us easily and quickly answer questions such as:
- How many salesmen work in each region?
- What are the net car sales of each region?
- How do sales compare across products?
- What are the total sales for the company?
- How many customers do we sell to in each region?
Let's see how.
Advantages of Pivot Tables
- Pivot Tables can generate and extract meaningful information from a large table of data within a matter of minutes.
- They use a lot less memory on your PC than producing the same results with Excel's built-in functions.
- They provide new views of the information by simple drag-and-drop (pivoting).
- The information can be updated by clicking Refresh, or automatically each time the workbook is opened if that option is enabled.
Example for Pivot
Step 1: Select the data range from which to make the pivot table.
Step 2: On the Insert tab, click the PivotTable icon and select the PivotTable option.
Step 3: Excel displays the Create PivotTable dialog, where you specify the pivot table's target location.
Step 4: Make your first pivot report by dragging and dropping fields into the pivot table grid area or by using the PivotTable field list panel.
The pivot report is divided into header and body sections, and you can drag and drop the fields you want into each area. The body itself contains three parts: Rows, Columns and Cells (Values). You can use any fields in these areas too.
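Conceptually, a pivot table groups the records by the Row field and the Column field and aggregates the field placed in the body cells. The Python sketch below shows that grouping with plain dictionaries. The `pivot` helper and the `region`/`product`/`sales` field names are hypothetical, and the `agg` parameter plays the role of the Summarize Data By choice (sum by default, `len` for a count). Row/column combinations with no records are simply absent, where Excel would show a blank cell.

```python
from collections import defaultdict


def pivot(records, rows, columns, values, agg=sum):
    """Group records by (rows, columns) and aggregate the `values` field."""
    cells = defaultdict(lambda: defaultdict(list))
    for rec in records:
        cells[rec[rows]][rec[columns]].append(rec[values])
    return {
        r: {c: agg(vals) for c, vals in cols.items()}
        for r, cols in cells.items()
    }


# Hypothetical sample data.
sales = [
    {"region": "North", "product": "Car", "sales": 10},
    {"region": "North", "product": "Car", "sales": 5},
    {"region": "North", "product": "Bike", "sales": 2},
    {"region": "South", "product": "Bike", "sales": 7},
]
print(pivot(sales, "region", "product", "sales"))
print(pivot(sales, "region", "product", "sales", agg=len))  # counts instead of sums
```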
Tips on Pivot Tables
- Formatting pivot tables is easy.
- You can change the pivot table summary function by right-clicking the pivot table and selecting Summarize Data By (Summarize Values By in newer versions).
- You can apply conditional formatting to pivot tables, but be careful, because pivot tables change size depending on the data.
- If the original data from which the pivot table is constructed changes, right-click the pivot table and select Refresh.
- To drill down on a particular summary value, double-click it. Excel creates a new sheet with the source data corresponding to that pivot report value. (This is extremely useful.)
What Are Pivot Charts?
Charts created from Pivot Tables are called Pivot Charts. They let us create professional interactive charts that would otherwise require complex VBA coding.
Before starting, decide how our data is set up and whether we also want a Pivot Chart. The basic information needed to use the Pivot Wizard is:
- Where is our data stored, e.g. a range in the same workbook, a database or another workbook?
- Which column of data goes into which field, i.e. the optional Page field, Row field and Column field, and the mandatory Data field?
- Where do you want to put your Pivot Table, e.g. a new worksheet or an existing one?
Making a pivot chart from a pivot table is very simple: click the PivotChart icon on the toolbar or in the Options ribbon area and follow the wizard.
What Is Charting in Excel?
Charts display series of numeric data in a graphical format, making it easier to understand large quantities of data and the relationships between different series of data.
- To create a chart, you start by entering numeric data, which you then plot by selecting the chart type that you want.
- Excel supports many types of charts (such as a column chart or a pie chart) and their subtypes (such as a stacked column chart or a 3-D pie chart) to help you display data in ways that are meaningful to your audience.
- You can create a combination chart by using more than one chart type.
- Some chart types (column or bar) can be created from data arranged in rows and columns, while others (pie and bubble) require a specific arrangement of data.
- Chart templates can be saved as .crtx files and used like any other template.
Elements of a Chart
1. Chart Area
2. Plot Area
3. Data Points
4. Axes
5. Legend
6. Titles
7. Labels
Excel Chart Types
Excel provides the following types of chart. Typical uses of the different chart types are:
- Column charts are useful for showing data changes over a period of time and for illustrating comparisons among items.
- Line charts are useful for displaying continuous data over time, set against a common scale, and for showing trends in data at equal intervals.
- Pie charts are useful for showing the size of individual items in proportion to the sum of the items.
- Bar charts are useful for comparisons among individual items.
Cont.
- XY (scatter) charts are useful for displaying and comparing numeric values, as in scientific, statistical and engineering data.
- Area charts are useful for emphasizing the magnitude of change over time and for drawing attention to the total value across a trend.
- Stock charts are useful for illustrating the fluctuation of stock prices, or of data such as daily or annual temperatures.
- Surface charts are useful for finding optimum combinations between two sets of data, similar to a topographic map.
- Doughnut charts are useful for showing the relationship of parts to a whole, and can contain more than one data series.
- Bubble charts are useful for comparing sets of three values: each bubble is positioned by two values, and its size represents the third.
- Radar charts compare the aggregate values of several data series, whereas pie charts have only one data series.
Creating a Chart
In Microsoft Excel, you can represent numbers in a chart. On the Insert tab, you can choose from a variety of chart types, including column, line, pie, bar, area, and scatter. The basic procedure for creating a chart is the same no matter what type of chart you choose, and as you change your data, your chart updates automatically.
You select a chart type by choosing an option from the Charts group on the Insert tab. After you choose a chart type, such as column, line, or bar, you choose a chart sub-type. For example, after you choose Column, you can have your chart represented as a two-dimensional chart, a three-dimensional chart, a cylinder chart, a cone chart, or a pyramid chart. There are further sub-types within each of these categories. As you roll your mouse pointer over each option, Excel supplies a brief description of each chart sub-type.
Let's Create a Column Chart
Sub Types of Column Chart
- Clustered column in 3-D: These compare values across categories. They display 2-D data values using a 3-D perspective; a third value axis (depth axis) is not used.
- Stacked column in 3-D: Stacked column charts show the relationship of individual items to the whole, comparing the contribution of each value to a total across categories.
- 3-D column: 3-D column charts use three axes that you can modify (a horizontal axis, a vertical axis, and a depth axis), and they compare data points along the horizontal and the depth axes.
- Cylinder, cone, and pyramid: Cylinder, cone, and pyramid charts are available for all the above types, with only the shape being a cylinder, cone, or pyramid instead of a rectangle.
Applying a Chart Layout
Context tabs are tabs that appear only when you need them. There are three chart context tabs, grouped under Chart Tools: Design, Layout, and Format. The tabs become available when you create a new chart or when you click a chart, and you can use them to customize your chart.
You can determine what your chart displays by choosing a layout. The layout you choose determines whether your chart displays a title and where the title displays, whether your chart has a legend and where the legend displays, whether the chart has axis labels, and so on. Excel provides several layouts from which you can choose.
Changing Chart Style
Positioning the Chart
Changing the Chart Type
Communicating through Data
Communicating through data is most effective when we understand the basic rules. There are 7 common relationships in quantitative business data. Typical questions that arise when presenting numbers are:
- Compared to what?
- At what instant?
- In which sequence?
- Relative to what other?
- How much is the deviation?
- What kind of distribution does it follow?
- Is there any special correlation between them?
Time-Series Relationships
This is the most common relationship in quantitative business data. When quantitative values are expressed as a series of measures taken across equal intervals of time, the relationship is called a time series. Studies indicate that approximately 75% of all business graphs display time series. Time can be divided into intervals of varying duration, including years, quarters, months, weeks, days, and hours. Time series reveal trends and patterns that we must be aware of and understand to make informed decisions.
Ranking Relationships
In business it is often most meaningful to see things ranked, such as the performance of salespeople or the expenses of departments. When quantitative values are sequenced by size, from large to small or vice versa, the relationship is called a ranking. This not only reveals their sequence but also makes it much easier to compare values, by placing those that are most similar near one another.
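Preparing data for a ranked chart amounts to sorting the values by size. A small Python sketch with hypothetical salesperson names and figures; the `ranking` helper is illustrative only, and items with equal values keep their original order because Python's sort is stable.

```python
def ranking(values, largest_first=True):
    """Return (rank, label, value) tuples ordered by value."""
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=largest_first)
    return [(i, label, v) for i, (label, v) in enumerate(ordered, start=1)]


# Hypothetical sales figures per salesperson.
sales_by_person = {"Asha": 120, "Ravi": 340, "Meena": 210}
for rank, name, amount in ranking(sales_by_person):
    print(rank, name, amount)
```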
Part-to-Whole Relationships
It is often useful to see how something is divided into parts, and the percentage relationship of each part to the whole. When quantitative values are displayed to reveal the portion of some whole that each value represents, this is called a part-to-whole relationship. Typical examples are how a market is divided among competitors, or how expenses are divided among regions, as shown below.
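The percentage behind each slice of a part-to-whole chart is the part divided by the sum of all parts. A minimal Python sketch, with a hypothetical `percent_of_total` helper and made-up market figures:

```python
def percent_of_total(parts):
    """Return each part's share of the whole, in percent."""
    total = sum(parts.values())
    return {name: 100 * value / total for name, value in parts.items()}


# Hypothetical market sizes for three competitors.
market = {"Company A": 50, "Company B": 30, "Company C": 20}
print(percent_of_total(market))
```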
Deviation Relationships
When quantitative values are displayed to feature how one or more sets of values differ from some reference set of values, this is called a deviation relationship. The most common example in business shows how a set of actual values (such as expenses) deviates from a predefined target (such as a budget).
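A deviation chart plots the difference between each actual value and its target, often also as a percentage of the target. A short Python sketch; the `deviation_from_target` helper and the quarterly figures are hypothetical, and it assumes every item in the target has a nonzero goal and a matching actual value.

```python
def deviation_from_target(actual, target):
    """Return {item: (difference, percent difference)} relative to the target."""
    return {
        item: (actual[item] - goal, 100 * (actual[item] - goal) / goal)
        for item, goal in target.items()
    }


# Hypothetical budget and actual expenses per quarter.
budget = {"Q1": 100, "Q2": 100, "Q3": 80}
expenses = {"Q1": 110, "Q2": 90, "Q3": 80}
print(deviation_from_target(expenses, budget))
```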
Distribution Relationships
When we show how a set of quantitative values is spread across its entire range, this relationship is called a distribution. We can often learn a great deal by examining the distribution of a set of values, especially its shape, which reveals what's typical, whether it is skewed in one direction or the other, and whether there are gaps or concentrations. This shows a distribution of values that is fairly symmetrical, approaching what is called a normal or bell-shaped curve.
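A distribution chart such as a histogram is built from counts of values per bin. The Python sketch below follows the same counting rule as Excel's FREQUENCY function: values up to the first bin, then values above one bin and up to the next, plus a final count for values above the last bin. The `frequency` helper is hypothetical and assumes the bins are sorted in ascending order.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per bin: x <= bins[0], bins[i-1] < x <= bins[i],
    and a final count for x > bins[-1]. `bins` must be sorted ascending."""
    counts = [0] * (len(bins) + 1)
    for x in data:
        # bisect_left gives the first bin upper bound that is >= x.
        counts[bisect_left(bins, x)] += 1
    return counts


print(frequency([1, 2, 2, 3, 5, 7], [2, 4, 6]))
```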
Correlation Relationships
When pairs of quantitative values, each measuring something different about an entity (for example a person, department, or product), are displayed to reveal whether there is a significant relationship between them (for instance, as one goes up the other goes up as well, or as one goes up the other goes down), this is called a correlation. Understanding correlations between quantitative variables can help us predict, take advantage of, or avoid particular behaviors. The correlation between employees' heights in inches (y axis) and their salaries in dollars (x axis) is shown below.
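The strength of a correlation is usually summarized by the Pearson correlation coefficient r, which is what Excel's CORREL function returns: +1 for a perfect positive linear relationship, -1 for a perfect negative one, and values near 0 when there is little linear relationship. A minimal Python sketch, with a hypothetical `pearson_r` helper:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # rises together
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # one rises, the other falls
```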
Nominal Comparison Relationships
This chart shows a nominal comparison relationship, where there is no particular relationship between the values: the four geographical regions do not relate to each other in any particular order. It provides a means to compare the regional values, but nothing more. Whenever you prepare a graph that displays nothing but a nominal comparison, it is useful to ask yourself whether another relationship could be featured that would make the graph more meaningful. In this case, simply arranging the regions in order of their quantitative values would produce a ranking relationship. Often, though, discrete items in a categorical variable, like these geographical regions, need to be arranged in a particular order because people expect to see them arranged that way.
Tip for Selecting Right Chart Type
Best Practices of Charting
- Determine your message and identify your data.
- Format graphs to focus on the message, removing unnecessary distractions.
- Check whether a table, a graph, or both are needed to communicate your message most effectively.
- Determine the best place in the charting area to display each variable.
- Take special care with legend placement.
Tips for Enhancing Chart Performance
- Use tables to hold the data.
- Use named ranges and named formulas.
- Use Pivot Tables.
- Sort your data.
- Use manual calculation mode.
- Use non-volatile formulas: volatile formulas are recalculated whenever there is any change in the workbook. Examples of volatile functions are RAND, NOW, TODAY and OFFSET.
- Keep formulas in a separate sheet.
- Write better formulas.
Tips for a Good Report
- Restrict the work area to relevant columns and rows only.
- Lock formula cells and protect the worksheet.
- Freeze panes so that your boss knows what she is reading.
- Hide unnecessary / calculation sheets.
- Hide rows / columns not used in the report.
- Include cell comments / help messages.
- Use consistent colors and schemes.
- Name and color worksheet tabs appropriately.
- Before closing the workbook, select cell A1 on the correct sheet.