Data Virtualization Demoette… Flat-File Data Sources

1 Data Virtualization Demoette… Flat-File Data Sources
Hello, and welcome to the Demoette series for Cisco Information Server, or CIS. In this Demoette, we show the use of Flat-File Data Sources.

2 Agenda What are they and why do they matter? A basic demo Summary
Here is our agenda. We begin by defining flat-file data sources and outlining their importance for our customers. Next we walk through a very basic demo of flat-file data sources. Finally, we summarize the contents of this demoette.

3 Agenda What are they and why do they matter? A basic demo Summary
Let’s begin by discussing what flat-file data sources are and why they are important for our customers.

4 What are they? Flat-File Data Sources: Delimited files; Excel Spreadsheets (non-ODBC). XML files are out-of-scope for this demoette
Typical flat-file data sources used by CIS customers include delimited files and Excel Spreadsheets that are not accessed via Open Database Connectivity, or ODBC. CIS provides a separate adapter for ODBC access to Excel. CIS also supports the use of XML files as a data source. However, since XML files are hierarchical structures that typically require transformation, we do not consider them in this demoette.

5 Why do they matter? Flat-File Data Sources: Purchased data; Departmental data; Ad hoc data
Flat-file data sources are important to our customers for three reasons. First, delimited files and spreadsheets are common formats for data purchased from external sources. For example, external marketing and demographics data is often delivered to corporate customers in delimited file format. Second, while enterprise-wide data is typically held in databases, important departmental data frequently lives in spreadsheets, and users often need to integrate these spreadsheets with corporate databases. Third, data analysts often build ad hoc data in spreadsheets, and then need to integrate this data with corporate databases.

6 Agenda What are they and why do they matter? A basic demo Summary
Next, let’s walk through a very basic demo that shows the use of Flat-file data sources.

7 Demo: Here is the business problem… Data Analysts; CIS; Customer Data: Delimited File; Customer Data: Complex Spreadsheet
Here is the business problem we illustrate in this demo. Our data analysts need to join various types of customer data to get a more complete view of their customers. Some of this data resides in a delimited file. Other data resides in an Excel spreadsheet. This spreadsheet is somewhat complex; it includes multiple tabs, and the data on one of the tabs is in an irregular format.

8 Demo: Before you begin… Save the files to your hard drive; Import CAR file; Configure the data sources to point to your hard drive location
Before you begin this demo, you will need to get the files that are used as data sources and save them to your hard drive. You can then import the CAR file for this demoette, and configure the data sources to point to the locations where you have stored the files. The CAR file contains all of the views used in the demo. You may use these pre-built views, or build your own from scratch. All of these assets, as well as instructions for these steps, can be found in the Additional Resources folder that accompanies this demoette.

9 Demo: Create the Delimited Data Source
We are ready to begin our demo. We right-click the Studio namespace and select New Data Source. <CLICK> From there, we select the File-Delimited data source. <CLICK> We name the new data source… <CLICK> … and specify the directory or URL where it resides. <CLICK> Next we specify the formatting details of our flat file. Note that our file has a header row which will be used to define the column names, so we check the “Has Header Row” box. <CLICK> Now we’re ready to Create and Introspect.
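To make the “Has Header Row” choice concrete, here is a minimal sketch in Python of what that option means for a delimited file. It is not how CIS is implemented. The sample data, the function name read_delimited, and the generated column_N names are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample standing in for the customer delimited file.
SAMPLE = "customerid,name,stateorprovince\n1,Ann Lee,CA\n2,Bob Ray,NY\n"


def read_delimited(text, delimiter=",", has_header_row=True):
    """Return (column_names, records) for a block of delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if has_header_row:
        # The first row supplies the column names; the rest is data.
        columns, data = rows[0], rows[1:]
    else:
        # No header row: invent placeholder names (an assumption, not
        # necessarily the names a real tool would generate).
        columns = [f"column_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return columns, [dict(zip(columns, row)) for row in data]


columns, records = read_delimited(SAMPLE)
```

With the header option on, the first line becomes the column names and two data records remain. With it off, the same first line would be treated as an ordinary data row.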

10 Demo: Create the Delimited Data Source
We select the delimited file we want from the directory, and click Next. <CLICK> We click Finish… <CLICK> … and then OK.

11 Demo: Create the Delimited Data Source
Our delimited file now appears as a table in the CIS namespace. We can open it and view the data.

12 Demo: Create the Excel Data Source
Next, let’s create our Excel data source. We create a new Data Source and specify Microsoft Excel non-ODBC. <CLICK> Again, we navigate to the proper directory, and specify appropriate details, such as the fact that this spreadsheet uses a Header row for column names. <CLICK> When we introspect the spreadsheet, CIS tells us that it has two tabs: Customers Fancy Tab and Customers Plain Tab. We’ll choose the Plain Tab to start with.

13 Demo: Create the Excel Data Source
When we complete the introspection process, we see that our table is created in the namespace, and we can open the table and see its data.

14 Demo: Create the Excel Data Source
Our Plain Tab was easy to introspect, but as you can see here, the Fancy Tab is a bit more complex. It actually contains two different sets of data in different parts of the page. <CLICK> On the left, beginning in row 1, we have aggregate data showing the total number of customers in each US State. <CLICK> To the right, beginning in row 5, we have a subset of the base customer information, which focuses on customer names and phone numbers. We want to introspect both of these data collections. Let’s begin with the aggregate data.

15 Demo: Create the Excel Data Source
We right-click the Excel data source, and select Add/Remove Resources. <CLICK> We select the Fancy Tab, and specify the data range of the aggregated data. The precise range is A1:B12, but what if we expect the number of rows to change over time? To allow for that, we can specify a much larger row number for the end of the range, such as B100.

16 Demo: Create the Excel Data Source
Now our aggregate data appears as a table in the CIS namespace. We can open it and view the results. Our data appears, but why are there 23 rows of null values at the end? These null rows are present because of interaction with the second sub-table on this tab, the one with the customer phone number information. Since that sub-table has data in rows 13 through 35, CIS returns these rows as part of our aggregate data table. It is very simple to filter these nulls out, and we’ll show how to do it in a moment.
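The arithmetic behind those 23 null rows can be sketched in Python. This is a simplified model, not CIS internals: we assume the requested range is clipped to the last row of the tab that holds any data, and the cell values, the 11 state rows, and the E-to-G width of the phone sub-table are made up for illustration.

```python
def build_sheet():
    """Sparse model of the Fancy Tab: {(row, col): value}, 1-based, A = 1."""
    cells = {(1, 1): "state", (1, 2): "customers"}  # header in A1:B1
    for i in range(11):  # aggregate data in A2:B12
        cells[(2 + i, 1)] = f"S{i + 1}"
        cells[(2 + i, 2)] = 10 + i
    for row in range(5, 36):  # second sub-table, rows 5 through 35
        for col in (5, 6, 7):  # columns E..G (width is an assumption)
            cells[(row, col)] = f"r{row}c{col}"
    return cells


def read_range(cells, first_row, first_col, last_row, last_col):
    """Read a rectangular range, clipped to the last used row of the sheet."""
    last_used_row = max(row for row, _ in cells)
    last_row = min(last_row, last_used_row)
    return [
        [cells.get((row, col)) for col in range(first_col, last_col + 1)]
        for row in range(first_row, last_row + 1)
    ]


rows = read_range(build_sheet(), 1, 1, 100, 2)  # the oversized range A1:B100
header, data = rows[0], rows[1:]
null_rows = [r for r in data if all(v is None for v in r)]
real_rows = [r for r in data if any(v is not None for v in r)]
```

In this model the range stops at row 35 because of the second sub-table, so rows 13 through 35 come back with empty cells in columns A and B: 23 null rows after the 11 real ones.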

17 Demo: Create the Excel Data Source
But first, let’s go after the second collection of data on the Fancy Tab. We’ll make a new data source for this, because we want to introspect a different range of a tab that we have already introspected in our original data source. The process of creating this second data source is identical to what we have already seen, so we don’t repeat it all here. Note that we use E5 as the value for the data range, because that’s where this set of data begins.
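For readers less familiar with spreadsheet cell references, here is a small Python sketch of how a start cell such as E5 maps to a row and column, and how a reader could take everything from that cell onward. The helper names parse_cell and rows_from and the toy grid are assumptions for illustration; how CIS itself interprets an open-ended range is not shown here.

```python
import re

_CELL = re.compile(r"^([A-Za-z]+)([1-9][0-9]*)$")


def parse_cell(ref):
    """Convert a reference like 'E5' to 1-based (row, column) numbers."""
    match = _CELL.match(ref.strip())
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():  # base-26 with A=1 ... Z=26
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), col


def rows_from(grid, start_ref):
    """Return the part of a dense grid at and beyond the start cell."""
    row, col = parse_cell(start_ref)
    return [line[col - 1:] for line in grid[row - 1:]]


# Toy 6x6 grid whose values are their own cell names (A1 .. F6).
grid = [[f"{chr(65 + c)}{r + 1}" for c in range(6)] for r in range(6)]
sub_table = rows_from(grid, "E5")
```

Here E5 resolves to row 5, column 5, so the slice skips the four rows above and the four columns to the left of the second data collection.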

18 Demo: Create the Excel Data Source
Now our second data collection on the Fancy Tab appears as a table. We can open it and view the contents.

19 Demo: Create Physical Views
Following CIS development best practices, we create physical-level views for our Delimited File table, as well as for the two sub-tables on the Fancy Tab of our Excel spreadsheet. This process is straightforward for the Delimited and Phone tables, but let’s take a closer look at the Aggregate Data table. <CLICK> For this view, we add a Criterion on the Grid Panel that excludes those null rows we encountered earlier. <CLICK> Now when we execute the view, the null rows are not present.
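The effect of that criterion can be sketched with an in-memory SQLite database in Python. SQLite is only a stand-in here, not CIS syntax, and the table name, view name, customercount column, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the introspected aggregate table, including trailing null rows.
conn.execute(
    "CREATE TABLE fancy_aggregate (stateorprovince TEXT, customercount INTEGER)"
)
conn.executemany(
    "INSERT INTO fancy_aggregate VALUES (?, ?)",
    [("CA", 2), ("NY", 1), (None, None), (None, None)],
)

# Physical-level view with a criterion that excludes the null rows.
conn.execute(
    """
    CREATE VIEW v_aggregate AS
    SELECT stateorprovince, customercount
    FROM fancy_aggregate
    WHERE stateorprovince IS NOT NULL
    """
)

rows = conn.execute(
    "SELECT stateorprovince, customercount FROM v_aggregate "
    "ORDER BY stateorprovince"
).fetchall()
```

Anything built on top of the view sees only the real state rows, so the null handling lives in one place.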

20 Demo: Create a Federated View
Finally, let’s create a federated view that joins data from the delimited file with both subsets of data on the Fancy Tab of the Excel spreadsheet. We start with the delimited file, and join it to the spreadsheet phone information based on customer ID. We also join the delimited data with the aggregate data on the spreadsheet, based on the stateorprovince column. <CLICK> We use the Grid panel to define a subset of columns for the projection, and we provide an alias for the column that shows the number of customers in the state.
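The shape of that federated view can be sketched as a SQL query, run here against an in-memory SQLite database from Python. This is an analogy rather than the SQL that CIS Studio generates. Only the customer ID join and the stateorprovince column come from the demo; the table names, the other column names, the alias customers_in_state, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers_delimited (
        customerid INTEGER, companyname TEXT, stateorprovince TEXT);
    CREATE TABLE customers_phone (customerid INTEGER, phonenumber TEXT);
    CREATE TABLE customers_by_state (
        stateorprovince TEXT, customercount INTEGER);

    INSERT INTO customers_delimited VALUES
        (1, 'Acme', 'CA'), (2, 'Bolt', 'NY'), (3, 'Cord', 'CA');
    INSERT INTO customers_phone VALUES
        (1, '555-0101'), (2, '555-0102'), (3, '555-0103');
    INSERT INTO customers_by_state VALUES ('CA', 2), ('NY', 1);
    """
)

# Join on customer ID and on state, project a subset of columns,
# and alias the per-state count.
cursor = conn.execute(
    """
    SELECT d.customerid, d.companyname, p.phonenumber, d.stateorprovince,
           a.customercount AS customers_in_state
    FROM customers_delimited AS d
    JOIN customers_phone AS p ON p.customerid = d.customerid
    JOIN customers_by_state AS a ON a.stateorprovince = d.stateorprovince
    ORDER BY d.customerid
    """
)
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
```

Each customer row picks up its phone number from one sub-table and its state total from the other.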

21 Demo: Create a Federated View
We execute the View, and CIS joins the data from our flat-file sources. <CLICK> As our execution plan shows, CIS must perform three separate Fetch operations and join all the data itself. Because flat files have no Join capabilities, there is no work that CIS can push down. Even so, CIS is able to perform the Joins itself, and as developers we don’t have to worry about this complexity. If we did want to take advantage of push-down capabilities, we could consider caching these data sources. Then we could use the Join capabilities of the cache target database. Our demo is complete.
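To illustrate what it means for the engine to fetch from three sources and join the data itself, here is a minimal Python sketch that uses a hash join. A hash join is one common strategy, chosen here for illustration; the demo does not say which join algorithm CIS uses. The fetch functions, column names, and sample rows are all hypothetical.

```python
def fetch_delimited():
    """Stand-in for the fetch against the delimited file."""
    return [
        {"customerid": 1, "companyname": "Acme", "stateorprovince": "CA"},
        {"customerid": 2, "companyname": "Bolt", "stateorprovince": "NY"},
        {"customerid": 3, "companyname": "Cord", "stateorprovince": "CA"},
    ]


def fetch_phone():
    """Stand-in for the fetch against the phone sub-table."""
    return [
        {"customerid": 1, "phonenumber": "555-0101"},
        {"customerid": 2, "phonenumber": "555-0102"},
        {"customerid": 3, "phonenumber": "555-0103"},
    ]


def fetch_aggregate():
    """Stand-in for the fetch against the aggregate sub-table."""
    return [
        {"stateorprovince": "CA", "customercount": 2},
        {"stateorprovince": "NY", "customercount": 1},
    ]


def hash_join(left, right, key):
    """Inner-join two lists of dicts on one key column."""
    index = {}
    for row in right:  # build phase: index the right side by key
        index.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:  # probe phase: look up each left row
        for right_row in index.get(left_row[key], []):
            joined.append({**right_row, **left_row})
    return joined


# Three separate fetches, then two joins done entirely in the engine.
with_phone = hash_join(fetch_delimited(), fetch_phone(), "customerid")
result = hash_join(with_phone, fetch_aggregate(), "stateorprovince")
```

None of the join work is delegated to the sources, which is the situation the execution plan shows. If the sources were cached in a database, the same joins could instead be expressed as SQL and run there.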

22 Agenda What are they and why do they matter? A basic demo Summary
Let’s summarize what we have seen in this presentation.

23 Summary Flat-File Data Sources: Delimited files; Excel Spreadsheets (non-ODBC). Purchased data; Departmental data; Ad hoc data
Typical flat-file data sources used by CIS customers include delimited files and Excel Spreadsheets that are not accessed via Open Database Connectivity, or ODBC. CIS provides a separate adapter for ODBC access to Excel. CIS also supports the use of XML files as a data source. However, since XML files are hierarchical structures that typically require transformation, we did not consider them in this demoette.
Flat-file data sources are important to our customers for three reasons. First, delimited files and spreadsheets are common formats for data purchased from external sources. For example, external marketing and demographics data is often delivered to corporate customers in delimited file format. Second, while enterprise-wide data is typically held in databases, important departmental data frequently lives in spreadsheets, and users often need to integrate these spreadsheets with corporate databases. Third, data analysts often build ad hoc data in spreadsheets, and then need to integrate this data with corporate databases.
Thank you.

24 TOMORROW starts here.

