Learning Excel for Data Analysis Dr. Chaitali Basu Mukherji
Data Analysis Data analysis in Excel is performed in multiple ways using the following sections of the Data tab:
– Get Data: to connect to an external data set
– Sort and Filter
– Data Tools: Data Validation, Duplicate Removal, Consolidation, Data Tables and What-If Analysis
– Outline: Group and Ungroup, Subtotals
– Analysis: Data Analysis, Solver
Importing Data in Excel Data can be imported into Excel from text files, HTML files and Access databases. The main benefits of connecting to external data are:
– you can periodically analyze this data without repeatedly copying it, which is both time-consuming and error-prone
– you can automatically refresh (or update) your Excel workbooks from the original data source whenever the data source is updated with new information
From Access Database Method 1 – you can copy data from a datasheet view and then paste the data into an Excel worksheet Start Access, and open the table, query, or form that contains the records you want to copy On the Home tab, click View, and then click Datasheet View Select the records that you want to copy along the row or if you want to select specific columns, drag across adjacent columns On the Home tab, in the Clipboard group, click Copy Start Excel, and then open the worksheet where you want to paste data Go to cell A1 and then On the Home tab, in the Clipboard group, click Paste Note: To ensure that the copied records do not replace existing records, make sure that the worksheet has no data below or to the right of the cell that you click
From Access Database Method 2 – Bring Access data that can be refreshed into Excel Create a connection, and store it in an Office Data Connection file (.odc), to the Access database and then retrieve all of the data from a table or query The main benefit of connecting to Access data instead of importing it is that you can periodically analyze this data in Excel without repeatedly copying or exporting the data from Access After you connect to the data, you can also automatically refresh (or update) your Excel workbooks from the original Access database whenever the database is updated with new information Click the cell where you want to put the data from the Access database On the Data tab, in the Get External Data group, click From Access. Locate and double-click the Access database that you want to import In the Select Table dialog box, click the table or query that you want to import, and then click OK In the Import Data dialog box, – Under Select how you want to view this data in your workbook, do select Table or select PivotTable report or select PivotChart and PivotTable report – Optionally, click Properties to set refresh, formatting, and layout options for the imported data, and then click OK. – Under Where do you want to put the data? do one of the following: – To return the data to the location that you selected, click Existing worksheet. – To return the data to the upper-left corner of the new worksheet, click New worksheet. Click OK Excel puts the external data range in the location that you specify.
Trust Centers Connection files reside in Trust Centers. Typical Microsoft Trust centers are: drive\Program Files\Microsoft Office\Templates drive\Program Files\Microsoft Office\Office12\Startup Creating Trust Centers Click the Microsoft Office Button, and then click Excel Options. Click Trust Center, click Trust Center Settings, and then click Trusted Locations. To create a trusted location that is not local to your computer, select the Allow trusted locations on my network (not recommended) check box. Click Add new location. In the Path box, type the name of the folder that you want to use as a trusted location, or click Browse to locate the folder. To include subfolders as trusted locations, select the Subfolders of this location are also trusted check box. In the Description box, type what you want to describe the purpose of the trusted location. Click OK.
From Text Step 1 Original data type - If items in the text file are separated by tabs, colons, semicolons, spaces, or other characters, select Delimited. If all of the items in each column are the same length, select Fixed width. Start import at row - Type or select a row number to specify the first row of the data that you want to import. File origin - Select the character set that is used in the text file. In most cases, you can leave this setting at its default. If you know that the text file was created by using a different character set than the one you are using on your computer, change this setting to match that character set. Preview of file - This box displays the text as it will appear when it is separated into columns on the worksheet.
Step 2 (Delimited) Delimiters - Select the character that separates values in your text file. If the character is not listed, select the Other check box, and then type the character. Treat consecutive delimiters as one - Select this check box if your data contains a delimiter of more than one character between data fields or if your data contains multiple custom delimiters. Text qualifier - Select the character that encloses values in your text file. When Excel encounters the text qualifier character, all of the text that follows that character and precedes the next occurrence of that character is imported as one value, even if the text contains a delimiter character.
– If the delimiter is a comma (,) and the text qualifier is a quotation mark ("), "Dallas, Texas" is imported into one cell as Dallas, Texas. If no character or the apostrophe (') is specified as the text qualifier, "Dallas, Texas" is imported into two adjacent cells as "Dallas and Texas".
– If the delimiter character occurs between text qualifiers, Excel omits the qualifiers in the imported value.
– If no delimiter character occurs between text qualifiers, Excel includes the qualifier character in the imported value. Hence, "Dallas Texas" (using the quotation mark text qualifier) is imported into one cell as "Dallas Texas".
Step 2 (Fixed width data) Data preview - Set field widths in this section. Click the preview window to set a column break, which is represented by a vertical line. Double-click a column break to remove it, or drag a column break to move it.
Step 3 - Click the Advanced button to do one or more of the following: specify the type of decimal and thousands separators that are used in the text file (when the data is imported into Excel, the separators will match those that are specified for your location in Regional and Language Options or Regional Settings in the Windows Control Panel), or specify that one or more numeric values may contain a trailing minus sign. Column data format - Click the data format of the column that is selected in the Data preview section. If you do not want to import the selected column, click Do not import column (skip). After you select a data format option for the selected column, the column heading under Data preview displays the format. If you select Date, select a date format in the Date box.
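The effect of the text qualifier can be tried outside Excel. The following is a minimal sketch using Python's standard csv module, which is not Excel's import wizard and is used here only to illustrate the delimiter and qualifier idea; the sample line is made up for illustration.

```python
import csv
import io

# Hypothetical one-line text file: a comma delimiter inside a quoted value.
raw = 'Office,"Dallas, Texas"\n'

# Quotation mark as text qualifier: the quoted comma is not treated as a delimiter.
with_qualifier = next(csv.reader(io.StringIO(raw), delimiter=",", quotechar='"'))

# No text qualifier: every comma splits, and the quote characters stay in the data.
without_qualifier = next(
    csv.reader(io.StringIO(raw), delimiter=",", quoting=csv.QUOTE_NONE)
)

print(with_qualifier)     # ['Office', 'Dallas, Texas']
print(without_qualifier)  # ['Office', '"Dallas', ' Texas"']
```

With the qualifier the city and state stay in one field; without it they are split into two fields and the quotation marks remain part of the values.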
Sort Data entered into your worksheet is often unorganized, making it difficult to examine. When analyzing information, you may need to rearrange the data in different ways to answer different questions. Excel's sorting feature can help you rearrange your data so you can use it more efficiently. If your spreadsheet contains formulas, be careful when using the Sort feature: formulas rely on cell references to perform their calculations, and moving the data with Sort may break these references. Data can be sorted by:
– selecting a single cell in the column containing the data to sort
– selecting an entire column
Filter Filtering is a way to quickly extract certain data from your spreadsheet. Unlike sorting, filtering doesn't just reorder the list but actually hides the rows that do not meet the filter criteria you define. Excel has AutoFilter, which makes it easy to extract data. Click on any cell in your spreadsheet. Under the Data tab, select the Filter button. Drop-down menus will appear next to each column heading.
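The difference between sorting and filtering can be sketched in a few lines of Python. The rows and field names below are hypothetical stand-ins for a worksheet range with a header row, not data from the presentation.

```python
# Hypothetical rows standing in for a worksheet range with a header row.
rows = [
    {"name": "Asha", "dept": "Sales", "salary": 52000},
    {"name": "Ravi", "dept": "IT", "salary": 61000},
    {"name": "Meera", "dept": "Sales", "salary": 48000},
]

# Sort: reorders every row; nothing is dropped.
by_salary = sorted(rows, key=lambda r: r["salary"], reverse=True)

# Filter: keeps only the rows that meet the criterion. The source list is
# untouched, much as filtered-out worksheet rows are hidden, not deleted.
sales_only = [r for r in rows if r["dept"] == "Sales"]

print([r["name"] for r in by_salary])   # ['Ravi', 'Asha', 'Meera']
print([r["name"] for r in sales_only])  # ['Asha', 'Meera']
print(len(rows))                        # 3
```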
Convert Text to Column If you have a cell that contains a lot of text, you may wish to separate it into several columns. This can only be done if there is a logical character that separates the text, such as a comma, semi-colon or full stop. For example, cells containing Last Name, First Name can be separated into two different columns
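As a rough illustration of the same idea outside Excel, the sketch below splits each cell at a delimiter and trims the spaces. The function name text_to_columns and the sample names are hypothetical.

```python
def text_to_columns(cell, delimiter=","):
    """Split one cell's text at each delimiter and trim surrounding spaces."""
    return [part.strip() for part in cell.split(delimiter)]

# Hypothetical cells in "Last Name, First Name" form.
cells = ["Rao, Anita", "Smith, John"]
columns = [text_to_columns(c) for c in cells]

print(columns)  # [['Rao', 'Anita'], ['Smith', 'John']]
```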
Remove Duplicates A duplicate value is one where all values in the row are an exact match of all the values in another row. They are determined by the value displayed in the cell and not necessarily the value stored in the cell. For example, if you have the same date value in different cells, one formatted as "3/8/2009" and the other as "Mar 8, 2009", the values are unique. It's a good idea to filter for or conditionally format unique values first to confirm that the results are what you want before removing duplicate values. Select the range of cells, or make sure that the active cell is in a table. On the Data tab, in the Data Tools group, click Remove Duplicates. Do one or more of the following: – Under Columns, select one or more columns. – To quickly select all columns, click Select All. – To quickly clear all columns, click Unselect All. If the range of cells or table contains many columns and you want to only select a few columns, you may find it easier to click Unselect All, and then under Columns, select those columns. Click OK. Excel displays a message indicating how many duplicate values were removed and how many unique values remain, or if no duplicate values were removed. Click OK. Note: You cannot remove duplicate values from data that is outlined or that has subtotals. To remove duplicates, you must remove both the outline and the subtotals.
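The logic of removing duplicates can be sketched as follows. This is an illustration under assumptions, not Excel's implementation: rows are compared as displayed text, the first occurrence is kept, and the columns argument (a hypothetical name) plays the role of the Columns check boxes.

```python
def remove_duplicates(rows, columns=None):
    """Keep the first occurrence of each row, comparing only the chosen column indexes."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row) if columns is None else tuple(row[i] for i in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique, len(rows) - len(unique)

# The same date shown in two formats counts as two different displayed values.
rows = [("3/8/2009", "A"), ("Mar 8, 2009", "A"), ("3/8/2009", "A")]

unique, removed = remove_duplicates(rows)
print(removed, len(unique))  # 1 2

# Comparing on the second column only treats all three rows as duplicates.
unique_by_code, removed_by_code = remove_duplicates(rows, columns=[1])
print(removed_by_code, len(unique_by_code))  # 2 1
```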
Filter for Unique Values Filtering for unique values and removing duplicate values are two closely related tasks because the displayed results are the same — a list of unique values. The difference, however, is important: When you filter for unique values, you temporarily hide duplicate values, but when you remove duplicate values, you permanently delete duplicate values. Select the range of cells, or make sure the active cell is in a table. On the Data tab, in the Sort & Filter group, click Advanced. In the Advanced Filter dialog box, do one of the following: – To filter the range of cells or table in place, click Filter the list, in-place. – To copy the results of the filter to another location, do the following: Click Copy to another location. In the Copy to box, enter a cell reference. Select the Unique records only check box, and click OK. The unique values from the selected range are copied to the new location. The original data is not affected.
Conditional Formatting Quick formatting Select a column in the table On the Home tab, in the Styles group, click the arrow next to Conditional Formatting, and then click Highlight Cells Rules. Select Duplicate Values. Enter the values that you want to use, and then select a format. Advanced formatting Select one or more cells in a range, table, or PivotTable report. On the Home tab, in the Styles group, click the arrow next to Conditional Formatting, and then click Manage Rules. The Conditional Formatting Rules Manager dialog box appears. Do one of the following: – To add a conditional format, click New Rule. The New Formatting Rule dialog box appears. To change a conditional format, do the following: – Make sure that the appropriate worksheet or table is selected in the Show formatting rules for list box. – Optionally, change the range of cells by clicking Collapse Dialog in the Applies to box to temporarily hide the dialog box, by selecting the new range of cells on the worksheet, and then by selecting Expand Dialog. – Select the rule, and then click Edit rule. The Edit Formatting Rule dialog box is displayed. Under Select a Rule Type, click Format only unique or duplicate values. Under Edit the Rule Description, in the Format all list box, select unique or duplicate. Click Format to display the Format Cells dialog box. Select the number, font, border, or fill format that you want to apply when the cell value meets the condition, and then click OK. You can choose more than one format. The formats that you select are displayed in the Preview box.
Data Validation – Entry Restriction To reduce junk in your data entry process, it is sometimes essential to restrict the choice of values in specific columns or cells by using a drop-down list. Here are the steps to follow: Select the cell to validate. On the Data tab, in the Data Tools group, click Data Validation. The Data Validation dialog box opens. In the Data Validation dialog box, click the Settings tab. Click the Allow box, then select List from the drop-down list. Click the Source box and then type the valid values, usually separated by a comma "," or semicolon ";". For example, if the cell is for the color of a car, you can limit the values by entering: Silver, Green, Blue. Instead of typing your list manually, you can create the list entries by referring to a range of cells in the same worksheet or another worksheet in the workbook. To specify the location of the list of valid entries, do one of the following:
– If the list is in the current worksheet, enter a reference to your list in the Source box, for example: =$A$1:$A$6
– If the list is on a different worksheet, define a name for your list and then enter that name in the Source box, for example: =ValidColors
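The idea behind a list restriction can be sketched in Python. The check below assumes an exact, case-sensitive match against the allowed values, which is a simplification for illustration and not a statement of how Excel compares entries; the names VALID_COLORS and is_allowed are hypothetical.

```python
# Allowed values, as in the car color example.
VALID_COLORS = ["Silver", "Green", "Blue"]

def is_allowed(value, allowed=VALID_COLORS):
    """Return True only when the entry is one of the listed values (exact match assumed)."""
    return value in allowed

print(is_allowed("Green"))  # True
print(is_allowed("Red"))    # False
```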
Other Data Restrictions You can restrict the entry in a cell using Excel's data types by choosing Whole Number, Decimal, Date or Time. For all of these, you need to specify the range within which entries are allowed. For Text Length, you need to specify the allowed range for the length of the text. You can also define your own restriction by using an Excel formula.
Data Validation - Tips The In-cell dropdown check box must be selected; otherwise, you won't be able to see the drop-down arrow next to the cell. Select or clear the Ignore blank check box depending on how you want to handle blank (null) values. If you use a defined name and there is a blank cell anywhere in that range, selecting the Ignore blank check box allows any value to be entered in the validated cell. The same applies to cells referenced by validation formulas: if any referenced cell is blank, selecting the Ignore blank check box allows any value to be entered in the validated cell. If you change the validation settings for a cell, you can automatically apply your changes to all other cells that have the same settings. To do so:
– Open the Data Validation dialog box
– Click the Settings tab
– Select the Apply these changes to all other cells with the same settings check box
Data Validation – Other Options Display an input message when the cell is clicked – Click the Input Message tab – Select Show input message when cell is selected check box – Fill in the Title and text for the Input message Display an error message when wrong data is entered – Click the Error Alert tab – Select Show error alert after invalid data is entered check box – Fill in the Title and text for the Error message – Select one of the following options for the Style box: Information: Display an information message. Does not prevent entry of invalid data Warning: Display a warning message. Does not prevent entry of invalid data Stop : Prevent entry of invalid data
Validate Cell Values Select the cell you want to validate. On the Data tab, in the Data Tools group, click Data Validation; the Data Validation dialog box is shown. Click the Settings tab. Specify the type of validation you want. Suppose you have a numeric value and you want to allow only values between 1 and 99, then:
– In the Allow box, select 'Decimal'.
– In the Data box, select 'Between'.
– In the Minimum box, enter '1'.
– In the Maximum box, enter '99'.
To show an error message when invalid data is entered:
– Click the Error Alert tab
– Fill the Title and Error Message boxes with appropriate text
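The Decimal / Between rule above amounts to a simple range check. A minimal sketch, with the hypothetical function name validate_decimal_between and the limits 1 and 99 taken from the example:

```python
def validate_decimal_between(value, minimum=1, maximum=99):
    """Accept only entries that parse as numbers within [minimum, maximum]."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False  # text or empty entries are rejected
    return minimum <= number <= maximum

for entry in (1, 45.5, 99, 0, 99.5, "abc"):
    print(entry, validate_decimal_between(entry))
```

The first three entries pass and the last three fail; a failing entry is where the Error Alert message would be shown.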
Validate Cell Value based on another Cell Value Suppose we have a list of documents, each with issue and expiry dates. We want to validate the expiry date so that it is always later than the issue date. Select the expiry date cell. On the Data tab, in the Data Tools group, click Data Validation and go to the Settings tab. In the Allow list box, select Date, and in the Data list box, select Greater Than. In the Start Date box, type = followed by the address of the issue date cell (for example, =B2 if the issue date is in cell B2).
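The same rule, written as a check over a small list of documents. The document names and dates are made up for illustration.

```python
from datetime import date

def expiry_is_valid(issue_date, expiry_date):
    """The expiry date must be strictly later than the issue date."""
    return expiry_date > issue_date

# Hypothetical documents: (name, issue date, expiry date).
documents = [
    ("Passport", date(2020, 1, 15), date(2030, 1, 14)),
    ("Permit", date(2023, 6, 1), date(2023, 6, 1)),
]

invalid = [name for name, issued, expires in documents
           if not expiry_is_valid(issued, expires)]
print(invalid)  # ['Permit']
```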
Consolidate This feature allows multiple lists to be combined and presented in one sheet. The following rules enable the consolidation of Lists using the Consolidate command:
– The structure of the Lists must be identical
– The row headings (leftmost column) and column headings (top row) in the Lists must cover the same topics
– The number of columns and the number of rows do not have to be identical; nor does the internal order of the text
– The Lists must have a single row for labels, and a single column for labels
– The cells in the Lists data range must contain only numeric data
– Excel consolidates data by identifying corresponding text crossed between the header row and the leftmost column
1. Give names to List1, List2 and List3.
2. Select a cell in a different sheet of the workbook.
3. Select Data -> Consolidate (in the Data Tools group).
4. In the Reference box, press F3.
5. In the Paste Name dialog box, select List1, click OK, and then click Add to add List1 to the All references box.
6. Repeat steps 4 and 5, and add List2 and List3 to the All references box.
7. In Use labels in, select the Top row and Left column check boxes, and then click OK.
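Consolidating by labels can be pictured as adding up values that share the same row label and column label, regardless of order. The sketch below assumes the Sum function is the one chosen in the Consolidate dialog; the three small lists and their labels are hypothetical.

```python
from collections import defaultdict

def consolidate(*lists):
    """Sum the values that share the same (row label, column label) across all lists."""
    totals = defaultdict(int)
    for table in lists:
        for row_label, row in table.items():
            for col_label, value in row.items():
                totals[(row_label, col_label)] += value
    return dict(totals)

# Hypothetical lists: same labels, different order and different sizes.
list1 = {"North": {"Q1": 10, "Q2": 20}, "South": {"Q1": 5, "Q2": 7}}
list2 = {"South": {"Q2": 3, "Q1": 1}, "North": {"Q1": 4, "Q2": 6}}
list3 = {"North": {"Q1": 1, "Q2": 1}}

result = consolidate(list1, list2, list3)
print(result[("North", "Q1")])  # 15
print(result[("South", "Q2")])  # 10
```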
What IF Analysis There are 3 kinds of what-if analysis tools in Excel: Scenarios, Data Tables, and Goal Seek. Scenarios and Data Tables take sets of input values and determine possible results. Data table works only with one or two variables, but it can accept many different values for those variables. A scenario can have multiple variables, but it can accommodate only up to 32 values. Goal Seek works differently from scenarios and data tables. It takes a result and determines possible input values that produce that result.
Scenario Manager Scenario Manager can create multiple scenarios on the same worksheet, and then switch between them. For each scenario, specify the cells that change and the values to use for that scenario. When switching between scenarios, the result cell changes to reflect the different changing cell values. [Figure: Worst Case Scenario and Best Case Scenario worksheets, with callouts 1. Changing cells and 2. Result cell] Scenario Reports are not automatically recalculated.
Scenario Example Problem Statement: Assume you own a book store and have 100 books in store. You sell a certain % for higher price of $50 and a certain % for lower price of $20 If you sell 60 % for the highest price, cell D10 calculates a total profit of 60 * $50 + 40 * $20 = $3800
Create Different Scenarios What if you sell 70% for the highest price? What if you sell 80% for the highest price? What if you sell 90% for the highest price? What if you sell 100% for the highest price? You can type a different percentage into cell C4 to see the corresponding result of a scenario in cell D10. However, What-If Analysis enables you to easily compare the results of different scenarios. On the Data tab, click What-If Analysis and select Scenario Manager from the list. The Scenario Manager dialog box appears. Add a scenario by clicking Add. Type a name (60% highest), select cell C4 (% sold for the highest price) for the Changing cells and click OK. Enter the corresponding value 0.6 and click OK again. Next, add 4 other scenarios (70%, 80%, 90% and 100%). Scenario Manager then lists all five scenarios. To see the result of a scenario, select the scenario and click the Show button. Excel will change the value of cell C4 accordingly for you to see the corresponding result on the sheet.
What If Analysis Result To easily compare the results of these scenarios, do the following:
– Click the Summary button in the Scenario Manager
– Next, select cell D10 (total profit) for the result cell and click OK
Conclusion: If you sell 70% for the highest price, you obtain a total profit of $4100; if you sell 80% for the highest price, you obtain a total profit of $4400, etc.
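The scenario summary can be reproduced with a short calculation. This sketch uses the figures from the book store example (100 books, $50 and $20); the function name total_profit is a hypothetical stand-in for the formula in cell D10.

```python
TOTAL_BOOKS = 100
HIGH_PRICE, LOW_PRICE = 50, 20

def total_profit(pct_high):
    """Total for a given whole-number percentage of books sold at the highest price."""
    high = TOTAL_BOOKS * pct_high // 100
    low = TOTAL_BOOKS - high
    return high * HIGH_PRICE + low * LOW_PRICE

# One entry per scenario, like the rows of a scenario summary.
summary = {f"{pct}% highest": total_profit(pct) for pct in (60, 70, 80, 90, 100)}
for name, profit in summary.items():
    print(name, profit)
# 60% highest 3800
# 70% highest 4100
# 80% highest 4400
# 90% highest 4700
# 100% highest 5000
```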
Goal Seek Goal Seek is a built in Excel tool that allows you to see how one data item in a formula impacts another. You might look at these as “cause and effect” scenarios. You need to borrow some money. You know how much money you want, how long a period you want in which to pay off the loan, and how much you can afford to pay each month. Use Goal Seek to determine what interest rate you must secure in order to meet your loan goal If you want to determine more than one input value, for example, the loan amount and the monthly payment amount for a loan, you should instead use the Solver add-in
Goal Seek Example Problem: Local election results are being studied, where 2/3 of the voters need to vote YES for cabinet formation.

        Votes   % of Votes
YES     4478    63.90 *
NO      2530    36.10
Total   7008    100

Observation: YES votes are a majority, but short of the required 2/3 approval to win the election. The YES group is close, but how close? What would have made a difference? Using Goal Seek we can change the value of various cells to see how the results change. This allows you to answer these types of questions: How many "NO" voters needed to be converted to YES to win the election? How many more votes were needed by the YES team to win the election? If 500 more people had voted, could the YES team have won? In each of these questions, the goal is to change a data value to see if the YES percentage goes over the two-thirds mark, or 66.67%. Rather than haphazardly changing cell values to see the results, Goal Seek can find the answers.
1. Create a spreadsheet from the table above in Excel.
2. Click the cell whose result you want to change (here D4). This is called the "Set cell".
3. Select Goal Seek…
4. In the Goal Seek dialog, enter the new "what if" amount in the To value text box. Here, we're asking Excel to replace the contents of cell D4, which is 63.90%, with 66.67%, the percentage needed to win the election.
5. Tell Excel which cell to change. Since we want to know the number of YES votes, we'll click C4.
Tips: The Set cell in Step 2 must contain a formula. The cell you change in Step 5 can't contain a formula. It must be a typed value.
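The search that Goal Seek performs can be imitated with a simple bisection. This is a sketch under assumptions, not Excel's own algorithm: it assumes the YES percentage is YES / (YES + NO), holds the NO votes fixed at 2530, and uses hypothetical function names yes_share and goal_seek.

```python
NO_VOTES = 2530  # held fixed while the YES votes are varied

def yes_share(yes_votes):
    """Share of YES votes, standing in for the formula in the Set cell."""
    return yes_votes / (yes_votes + NO_VOTES)

def goal_seek(func, target, low, high, tolerance=1e-9, max_iter=200):
    """Bisection search; assumes func is increasing between low and high."""
    for _ in range(max_iter):
        mid = (low + high) / 2
        if func(mid) < target:
            low = mid
        else:
            high = mid
        if high - low < tolerance:
            break
    return (low + high) / 2

needed = goal_seek(yes_share, 2 / 3, low=4478, high=10000)
print(round(needed))         # 5060
print(round(needed) - 4478)  # 582
```

Under these assumptions, the YES side would have needed 5060 votes, which is 582 more than it received.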
Data Tables Data tables are used when:
– a formula uses one or two variables, or multiple formulas use one common variable
– all the outcomes are to be seen in one place
– a range of possibilities is to be seen at a glance
– the focus is on only one or two variables, and the results should be easy to read and share in tabular form
If automatic recalculation is enabled for the workbook, data tables recalculate with fresh data. A data table can't accommodate more than 2 variables, but it can handle as many values as you want for those variables.
One-variable data tables This is used to see how different values of one variable in one or more formulas will change the results of those formulas. Example: you can use a one-variable data table to see how different interest rates affect a monthly mortgage payment by using the PMT function. You enter the variable values in one column or row, and the outcomes are displayed in an adjacent column or row. In this example, D2 contains the payment formula, =PMT(B3/12,B4,-B5), which refers to the input cell B3.
1. Type the list of values that you want to substitute in the input cell either down one column or across one row. Leave a few empty rows and columns on either side of the values.
2. Do one of the following:
– If the data table is column-oriented (your variable values are in a column), type the formula in the cell one row above and one cell to the right of the column of values.
– If the data table is row-oriented (your variable values are in a row), type the formula in the cell one column to the left of the first value and one cell below the row of values.
3. Select the range of cells that contains the formulas and values that you want to substitute. In this example, this range is C2:D5.
4. On the Data tab, in the Data Tools group, click What-If Analysis, and then click Data Table.
5. Enter the reference to the input cell (B3 in this example) in the Column input cell box if the data table is column-oriented, or in the Row input cell box if it is row-oriented, and then click OK.
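What a one-variable data table computes can be sketched in Python. The pmt function below follows the standard loan payment formula with no future value and payments at the end of each period; the loan amount, term and list of rates are hypothetical stand-ins for cells B5, B4 and the column of values, since the slides do not give them.

```python
def pmt(rate, nper, pv):
    """Payment per period for a loan with no future value (same sign convention as PMT)."""
    if rate == 0:
        return -pv / nper
    growth = (1 + rate) ** nper
    return -pv * rate * growth / (growth - 1)

LOAN = 100000   # hypothetical value for B5
MONTHS = 360    # hypothetical value for B4
rates = [0.05, 0.06, 0.07]  # hypothetical values substituted into B3

# One result per substituted rate, like the column next to the input values.
one_variable_table = {rate: round(pmt(rate / 12, MONTHS, -LOAN), 2) for rate in rates}
for rate, payment in one_variable_table.items():
    print(f"{rate:.0%}  {payment}")
```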
Two-variable data tables This is used to see how different values of two variables in one formula will change the results of that formula. Example: you can use a two-variable data table to see how different combinations of interest rates and loan terms will affect a monthly mortgage payment. C2 contains the payment formula, =PMT(B3/12,B4,-B5), which uses two input cells, B3 and B4. A two-variable data table uses a formula that contains two lists of input values. The formula must refer to two different input cells. In a cell on the worksheet, enter the formula that refers to the two input cells. In this example, in which the formula's starting values are entered in cells B3, B4, and B5, you type the formula =PMT(B3/12,B4,-B5) in cell C2. Type one list of input values in the same column, below the formula. In this case, type the different interest rates in cells C3, C4, and C5. Enter the second list in the same row as the formula, to its right. Type the loan terms (in months) in cells D2 and E2. Select the range of cells that contains the formula (C2), both the row and column of values (C3:C5 and D2:E2), and the cells in which you want the calculated values (D3:E5).In this case, select the range C2:E5. On the Data tab, in the Data Tools group, click What-If Analysis, and then click Data Table. In the Row input cell box, enter the reference to the input cell for the input values in the row.Type B4 in the Row input cell box. In the Column input cell box, enter the reference to the input cell for the input values in the column. Type B3 in the Column input cell box. Click OK.
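A two-variable data table is the same calculation repeated over a grid. In this sketch the rates play the role of the column values (C3:C5) and the terms play the role of the row values (D2:E2); the loan amount and the specific rates and terms are hypothetical, and pmt is a plain implementation of the standard payment formula.

```python
def pmt(rate, nper, pv):
    """Payment per period for a loan with no future value (same sign convention as PMT)."""
    if rate == 0:
        return -pv / nper
    growth = (1 + rate) ** nper
    return -pv * rate * growth / (growth - 1)

LOAN = 100000               # hypothetical value for B5
rates = [0.05, 0.06, 0.07]  # column input values, substituted into B3
terms = [180, 360]          # row input values in months, substituted into B4

# grid[rate][term] corresponds to one calculated cell in D3:E5.
grid = {
    rate: {term: round(pmt(rate / 12, term, -LOAN), 2) for term in terms}
    for rate in rates
}

for rate in rates:
    print(f"{rate:.0%}", [grid[rate][term] for term in terms])
```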
Group and Ungroup Group and Ungroup are in the Outline group of the Data tab. Group allows you to collapse a group of rows or columns. Ungroup reverts the action. For both functions, an outline with a + or – sign will appear.
Subtotals Subtotals is used in a sorted list Sort the list on the field for which you want subtotals inserted Click the Subtotal button in the Outline group on the Data tab Subtotal dialog box appears to specify the options for the subtotals Select the field for which the subtotals are to be calculated in the At Each Change In drop-down list Specify the type of totals you want to insert in the Use Function drop-down list Select the check boxes for the field(s) you want to total in the Add Subtotal To list box Click OK Excel adds the subtotals to the worksheet When you use the Subtotals command, Excel outlines the data at the same time that it adds the rows with the departmental salary totals and the grand total. This means that you can collapse the data list down to just its departmental subtotal rows or even just the grand total row simply by collapsing the outline down to the second or first level. In a large list, you may insert page breaks every time data changes in the field on which the list is being subtotaled. To do this, select the Page Break between Groups check box in the Subtotal dialog box before you click OK to subtotal the list. Excel does not allow you to subtotal a list formatted as a table. You must first convert your table into a normal range of cells. Click a cell in the table and then click the Table Tools Design tab. Click the Convert to Range button in the Tools group, and then click Yes. Excel removes the filter buttons from the columns at the top of the list while still retaining the original table formatting.
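Why the list must be sorted first can be seen in a small sketch. Python's itertools.groupby, like the "at each change in" logic, starts a new group every time the key changes, so unsorted data produces fragmented groups. The departments and salaries are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (department, salary) rows in entry order.
rows = [("IT", 61000), ("Sales", 52000), ("IT", 58000), ("Sales", 48000)]

# Without sorting, a new group starts at every change of department.
unsorted_groups = [dept for dept, _ in groupby(rows, key=itemgetter(0))]
print(unsorted_groups)  # ['IT', 'Sales', 'IT', 'Sales']

# Sort on the subtotal field first, then total each group.
rows_sorted = sorted(rows, key=itemgetter(0))
subtotals = {
    dept: sum(salary for _, salary in group)
    for dept, group in groupby(rows_sorted, key=itemgetter(0))
}
grand_total = sum(subtotals.values())

print(subtotals)    # {'IT': 119000, 'Sales': 100000}
print(grand_total)  # 219000
```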
Analysis ToolPak and Solver For the Analysis ToolPak, refer to the document on Data > Data Analysis. Let us solve a set of nonlinear equations using Solver: F(x,y) = x^2 + y + 3 = 0 and G(x,y) = 2*x^2 + y^3 + 5 = 0. Solver will use the best estimate method with 100 iterations to come up with a close result.
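A quick hand check, which is not part of the original slides, suggests why only a close result should be expected. Substituting x^2 = -y - 3 from F into G leaves y^3 - 2y - 1 = 0, and every real root of that cubic gives a negative x^2, so the two equations appear to have no exact real solution. The sketch below verifies this numerically.

```python
import math

# F = 0 gives x^2 = -y - 3. Substituting into G = 0 leaves y^3 - 2y - 1 = 0,
# which factors as (y + 1)(y^2 - y - 1).
y_roots = [-1.0, (1 + math.sqrt(5)) / 2, (1 - math.sqrt(5)) / 2]

x_squared_values = []
for y in y_roots:
    residual = y**3 - 2 * y - 1  # should be ~0 for a true root
    x_squared = -y - 3           # value x^2 would need to take
    x_squared_values.append(x_squared)
    print(f"y = {y:.4f}, cubic residual = {residual:.1e}, required x^2 = {x_squared:.4f}")

# Every required x^2 is negative, so no real x satisfies both equations exactly.
print(all(v < 0 for v in x_squared_values))  # True
```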