Presentation is loading. Please wait.

Presentation is loading. Please wait.

Our Secret Data Mart: This and other misadventures in my ten years as a business analyst Presentation to the NEODWSIG 3/23/2006 by Michael Mina.

Similar presentations

Presentation on theme: "Our Secret Data Mart: This and other misadventures in my ten years as a business analyst Presentation to the NEODWSIG 3/23/2006 by Michael Mina."— Presentation transcript:

1 Our Secret Data Mart: This and other misadventures in my ten years as a business analyst Presentation to the NEODWSIG 3/23/2006 by Michael Mina

2 © 2006 Michael Mina2 Agenda My two bios The goal of this presentation Why "misadventures"? Lessons learned Some tools I use The story of our secret data mart Parting advice

3 © 2006 Michael Mina3 My real world bio Worked as a business analyst since 1995 –Medical Mutual –Antares Management Solutions –Cleveland State University –National City Medical Mutual –Completed ad hoc data requests for many different departments. –Developed and programmed the disease management outcomes reporting process. Recognized by CEO for efforts resulting in NCQA accreditation. –Coordinated and prioritized development of reporting, system modifications and resolution of reporting problems for the Underwriting division. Recognized by VP for helping save $400K annually.

4 © 2006 Michael Mina4 My real world bio (cont'd) Antares Management Solutions –Working in IT, I developed reporting from multiple data warehouses for business units and partner corporations –Developed web-enabled GUI front-end metadata repository/data dictionary. –Developed and taught courses in business intelligence tools and the metadata repository for business units and partner corporations. –Contributed to data warehouse redesign project. Conceived and developed Quality Assurance methodology for maintenance of dimension tables. –Assisted in the development of data marts and reporting for several departments.

5 © 2006 Michael Mina5 My real world bio (cont'd) Cleveland State University –Managed staff and developed analyses, reporting processes and databases for various departments –Managed the Institutional Research component of the two- year PeopleSoft 8 upgrade project. Recognized by both President and CIO for outstanding contributions to the project. –Chaired Data Stewardship Committee (25 members) to manage University data and ensure that the University possesses the accurate data it requires. National City –Currently working on various data warehouse-related projects, including data warehouse metrics.

6 © 2006 Michael Mina6 My real world bio (cont'd) Founder of GCPCUG Data Warehousing SIG (met monthly 1/ /2002) Developed and currently teach "Business Intelligence & Data Warehousing" continuing education course at CSU

7 © 2006 Michael Mina7 My Bizarro world bio I'm a citizen of Afghanistan, with Hispanic and Japanese ancestry. I'm eligible for the Medicare Rx benefit. My wife is also Hispanic (but not from Afghanistan.) She is interested in dating others, and has a newborn that I do not know about. That's what some companies think--My family demographic data needs serious revision. (For more about Bizarro world, see "This am Bizarro page" at Encyclopaedia/bizarro.php)

8 © 2006 Michael Mina8 My Bizarro world bio (cont'd) My point? Enterprises vary greatly in the degree to which they pursue data quality. How important is data quality? And to whom does it matter? Some departments actually consider data quality a threat, regardless of what they say (more on that later.…)

9 © 2006 Michael Mina9 The goal of this presentation Share some of my experiences and the experiences of others (with their permission) for your edification Focus your attention on matters I believe are not addressed frequently enough Help you survive and thrive in this line of business

10 © 2006 Michael Mina10 Why "misadventures"? "Accentuate the positive" or learn from mistakes? In his classic "The Data Warehouse Lifecycle Toolkit", Kimball clearly expects almost everything to go well. He's a highly paid consultant, so it's no surprise he gets the cooperation he needs. I had NO experiences that ran as smoothly as his.

11 © 2006 Michael Mina11 Why "misadventures"? (cont'd) Since becoming a business analyst in 1995, I've reported to 8 different managers at 4 different companies. I've seen more missteps than successfully completed data warehouse-related projects. –Projects running past deadline –Chameleon-like project requirements –Broken promises, false accusations –Other unorthodox, unprofessional behaviors

12 © 2006 Michael Mina12 Why "misadventures"? (cont'd) As a part-time career consultant for eight years, people have sought my advice for dealing with a variety of unpleasant situations. Experience is the best teacher and bad experience can be the best of the best--especially if it's someone else's.

13 © 2006 Michael Mina13 Lessons I've learned as a Business Analyst I want to share lessons that are data analysis-related and career-related The latter are important: Do not ignore your own best interests. I've seen –People hired on Monday who had their department downsized on Friday –People told their department survived a downsizing, just before someone was terminated (not for cause) So with that in mind, here are some…

14 Lessons I've learned as a Business Analyst…

15 © 2006 Michael Mina15 People define "Data Warehouse" differently Some people refer to the location of physical servers as the DW Some people call questionably modeled data sets that are used for reporting purposes a DW Some even call a TNF copy of a TNF production database a DW Analyst impact: –Make sure you understand which definition is being used –This is especially important during your interviews with people unaware of these multiple definitions. –Don't bother "correcting" people, its counterproductive

16 © 2006 Michael Mina16 True dimensional models are a rarity With one possible exception, every DW data model I've encountered is TNF, or somewhere between TNF and dimensional, but never purely dimensional Examples: –TNF portions of DWs –Every date dimension table I've seen uses date as the key rather than a surrogate key Many decisionmakers cannot be sold on the value of dimensional modeling –One Senior VP claimed he did not want "his" data warehoused. He wanted reporting from the operational system - it was "closer to the source" –Fact: The data governance group was controlled by his rival. Was it ignorance, politics, or both?

17 © 2006 Michael Mina17 True dimensional models are a rarity (cont'd) Analyst impact: –Modeling is often easier without a dimension model requirement –The opportunities to fully develop dimensional modeling skills may be limited

18 © 2006 Michael Mina18 Companies love MSRE ("misery") Multi-Source Reporting Environments more common than DWs –One version of the truth often unworkable Too expensive and time consuming Deployment of revenue generating systems (and the need for information therefrom) will not wait for the data to be warehoused –More common than DW "one version of the truth": Source systems as system of record Several witnesses (data sources) testify, and their "testimony" is adjudicated. Hence, MSRE.

19 © 2006 Michael Mina19 Companies love MSRE (cont'd) Analyst impact: –More data sources to understand –Newer analysts need more mentoring, even when there is effective metadata management –Opportunity to become very valuable to employer by understanding these sources

20 © 2006 Michael Mina20 Data quality can be a threat Do NOT assume the desire for data quality is universal "Jim" found out that many dimension tables were missing values that were needed to allow analysts to join them to the fact tables. He took it upon himself to develop a process to systematically detect values in the fact tables that were missing from the dimension tables. It generated 60+ pages (single spaced) of missing codes and descriptions. Jim proudly showed his manager, who looked as if he had seen a ghost. He thanked Jim, but the system was not implemented.

21 © 2006 Michael Mina21 Data quality can be a threat (cont'd) The problem: –Jim later learned that his department created the problem, and that not too many users were aware of it. –There was no glory to be had in cleaning up one's own mess, especially when few people knew about the mistake in the first place. OOPS! Analyst impact: –Be more politically astute –Weigh the risks of acting independently. In this case, Jim had no regrets. Any guesses why?

22 © 2006 Michael Mina22 Cost-effectiveness can be career-ineffective "Dave" worked for a state government plagued by significant budget constraints. The directive from on high was to cut costs. Dave's boss wanted to have her annual department meeting at an expensive location off site. Dave reminded her of the cost cutting directive. It was not appreciated. When his department was tasked with developing an interactive PDF document containing standard reports from a data mart, Dave's staff put together a very low cost, scalable prototype using Access, Excel and pdf995 (freeware).

23 © 2006 Michael Mina23 Cost-effectiveness can be career-ineffective (cont'd) His boss wound up having the system developed using SQL server, running on a $30K server, and paying $10K to graphic designers for the interactive PDF. It was completed 18 months after the prototype. The prototype was about 80% as functional as the final version. The problem: –Dave's boss believed that one of her staff would gain more job satisfaction using SQL Server. –Also, she needed help from a peer who had purchased a $30K server that was underutilized. –And, her peer needed to justify his purchase. And so, a friendship was born…

24 © 2006 Michael Mina24 Cost-effectiveness can be career-ineffective (cont'd) Analyst impact: –Understand differences between the public sector and private sectors. Some people just like to spend taxpayer money, and you can't stop them. –The 80/20 rule works both ways: if you are strapped for cash, sometimes it is better to settle for 80% of the functionality at 20% of the cost. –What's cost-effective for your employer may not be career- effective for you. You must decide which is a higher priority for you. In most cases, those who try to work cost-effectively are appreciated.

25 © 2006 Michael Mina25 Requirements management = danger management Requirements management is the sine qua non of business analysis. It is challenging at best, dangerous at worst. Challenging: –Determining who has asthma when that information is not explicitly captured in the claims data. –Determining who is an employee Cleveland State University could not provide one single answer because they were required to report to multiple agencies that had multiple specifications. There was a Dept of Education definition of Employee There was an Ohio Civil Rights Commission definition as well.

26 © 2006 Michael Mina26 Requirements management = danger management (cont'd) Challenging (cont'd) –Suppose I wanted a list of all Cleveland customers. Do I want: Customers that live in Cleveland? Customers that live in the Cleveland sales region? Customers that live in the Cleveland administrative region? Customers that are assigned to the Cleveland sales region, regardless of where they actually live? You get the idea… –Large scale requirements management (e.g., using IBM Rational RequisitePro.)

27 © 2006 Michael Mina27 Requirements management = danger management (cont'd) Dangerous #1: Variable requirements - fixed deadline –Consider Rob's case: –Rob's manager kept reversing himself on his data requests "Paint this wagon red." "You know, this wagon would look better blue." "Why is this wagon blue? Everyone knows wagons should be red." –On one occasion, Rob's boss significantly changed project requirements the day before the deadline. At 5:00 pm on the day of the deadline (he usually left at 5:00), he wrote Rob an criticizing him for not having the project complete. –Rob went to talk to his boss, but he had gone for the day. –People on Rob's staff even provided proof that Rob's boss reversed himself on requirements.

28 © 2006 Michael Mina28 Requirements management = danger management (cont'd) The problem: –Rob reported to someone who had a reality-allergy. –Rob was eventually let go, officially because of downsizing. The solution: –Rob documented his case to his boss in writing, along with supporting documentation. His boss toned down the criticism, but the situation did not revert to normalcy. –Rob also saw an attorney, only to learn that as a result of special circumstances, and through no fault of his own, he didn't have a case. –Rob was eventually able to convince his boss to write him an outstanding letter of recommendation.

29 © 2006 Michael Mina29 Requirements management = danger management (cont'd) Analyst impact: –You must be able to clearly articulate your concerns about requirements, especially to those who don't understand the complexities of your work. –You must document your concerns in writing, preferably in nonrepudiable form (e.g., ), at minimum in a personal log. –You must keep excellent records of project requirements, changes in requirements, preferably in nonrepudiable form.

30 © 2006 Michael Mina30 Requirements management = danger management (cont'd) Analyst impact (cont'd): –Even better: publish all changes to the requirements, and all issues related to the requirements, and ensure that stakeholders receive this information. This prevents any pretense to ignorance. –These must be records in your personal possession. General access records are fine as a supplement to personal records. –You cannot be too attached to your job, or your employer. You too may suddenly find yourself reporting to reality- averse management. –Pay as much attention to managing your career as to doing your job. Rob did, and found a better job.

31 © 2006 Michael Mina31 Requirements management = danger management (cont'd) Dangerous #2: Determining that requirements have been met –Simple? NO! –Who has final say as to whether or not a requirement has been met? Does he/she play political games? –Often need to negotiate to agree that stated requirements have been met The problem: –Bonuses for some managers can depend upon completion of certain projects by certain dates. This may result in political pressure to declare that requirements were met. –Also, missing a deadline almost always impacts one negatively.

32 © 2006 Michael Mina32 Requirements management = danger management (cont'd) Analyst impact: –Be aware that compromises that often occur. These include Restating the requirements until the work that has been done meets the revised requirements (often accompanied by pretending that these were what the requirements should have been all along.) Loosely interpreting the requirements (like penumbras, emanations and such) Note: I am indicating neither approval nor disapproval for these. –Do not compromise on compliance reporting unless you are willing to go to prison to benefit other people's careers. And DONT think youre immune to such pressure, a friend of mine was not. He did the right thing by refusing to cooperate with senior management. (Note: he was eventually promoted into another department.)

33 © 2006 Michael Mina33 "One-time-only" means "several-times-at-least" I have yet to develop a one-time-only process one time only! Even if the results are due once, the requirements are often developed interatively, and that can mean rework. I have never regretted building a repeatable process. One example: –"Chris" was assigned a mentor that had started a particular analytical project manually rather than programmatically. –He was tasked by his manager to assume responsibility for that analysis and automate it. However, his mentor felt that continuing the analysis manually was the best course of action, even though Chris and his mentor had the same manager.

34 © 2006 Michael Mina34 "One-time-only" means "several-times-at-least" (cont'd) One example (cont'd) –When Chris met with his mentor to discuss his progress on the project, his mentor could not accept that he was to automate the process. She insisted that, had she continued working on the project, it would have been done by then. –Even though Chris followed his manager's instructions, his mentor's negative attitude toward his progress threatened to damage his career. The reality: –The requirements for the analysis changed significantly one day. Had the process been done manually, the rework costs would have been high. –Two days later, the requirements changed significantly, including twice in the same morning.

35 © 2006 Michael Mina35 "One-time-only" means "several-times-at-least" (cont'd) The problem: –Did Chris' mentor want the opportunity to complete the "one- time-only" analysis? –Did he/she want to micromanage the project? –Was he/she jealous that Chris was able to automate this process when the mentor could not? –Who knows?

36 © 2006 Michael Mina36 "One-time-only" means "several-times-at-least" (cont'd) Analyst impact: –Everyone likes to be right. Understand that some people like to be right, so they learn when they're wrong and adjust (I hope that's you.) Other people like to be right, and think they are always right. You need to document their incorrectness. –Carefully document: The requirements given to you Your decisions regarding implementation Any shortcomings of the alternatives to repeatibility.

37 © 2006 Michael Mina37 "One-time-only" means "several-times-at-least" (cont'd) Analyst impact (cont'd) –Business rules are often developed iteratively. Common sense suggests that a process to implement those rules also be developed and executed iteratively. –Build repeatability into your processes. If you expect some opposition to this, then do not advertise your plans. –Repeatable processes often make for better discussion in job interviews than "one-time-only" projects.

38 © 2006 Michael Mina38 Understanding ETL process(es) is important To become a more effective business analyst, it is not enough to be familiar with the data in the warehouse. You must also understand: –How the data is loaded –The systems from which data is extracted –Into which warehouse tables that data is loaded –Which data is not loaded and why This information is often not captured at a sufficient level of detail in metadata repositories ETL breaks or fixes can cause strange differences in reports from one month to the next. These may not be detected until after the warehouse is loaded.

39 © 2006 Michael Mina39 Understanding ETL process(es) is important (cont'd) Analyst impact: –More information to understand –Newer analysts need more mentoring, even when there is effective metadata management –Opportunity to become very valuable to employer by understanding these processes

40 © 2006 Michael Mina40 Beware of IT-focused decisionmaking We must not be IT-bashers, but: –People who understand technology better than business should not make business decisions, and –People who understand business better than technology should not make technology decisions. There is a subculture within IT, less dominant than before, that I call "business-agnostic" - they don't really understand their employer's business, and some of them don't want to. They tend to be technophilic. Some are actually businessphobic. When they have significant decisionmaking authority, they promote suboptimal outcomes.

41 © 2006 Michael Mina41 Beware of IT-focused decisionmaking (cont'd) Examples of IT-focus to the detriment of business- focus: Example 1 –A trainer who makes it known that she enjoyed working as a trainer, "except for the people" (i.e., the students.) Example 2 –IT tells its SQL trainer (me) not to teach business analysts about the LEFT JOIN and the RIGHT JOIN because their queries would adversely affect system performance.

42 © 2006 Michael Mina42 Beware of IT-focused decisionmaking (cont'd) The problem: –The dimension tables are incomplete, and IT is at fault. A friend of mine ran a report that was short $12 million as a result. –IT must either accept responsibility for the incomplete dimension tables, or accept that business analysts must work around it somehow.

43 © 2006 Michael Mina43 Beware of IT-focused decisionmaking (cont'd) Analyst impact: –It is NOT acceptable to hide from business analysts the fact that using the INNER JOIN could cause their reports to be millions of dollars out of balance. –Weigh the cost of additional CPU time vs. the cost of allowing senior management to receive incorrect information - this is a no-brainer. –I explained that I would not comply for the above reasons, and because compliance would damage my reputation. –Your reputation is a personal asset. You should not allow an employer to squander it any more than you would allow them to raid your savings account.

44 © 2006 Michael Mina44 Beware of IT-focused decisionmaking (cont'd) Example 3 –IT and business analysts are displeased with the level of effort required to perform complex analyses. Consequently, IT seeks a replacement for the company standard BI tool. The problem: –The warehouse's data model was antiquated and no longer supported the complexity of the evolving analytical environment. –Replacement of the BI tool was contraindicated. The solution must involve data model remediation. –The quick fix - changing BI tools - would have been a waste of time and money.

45 © 2006 Michael Mina45 Beware of IT-focused decisionmaking (cont'd) Example 4 –IT tells business analysts to stop using an uppercase conversion function in the WHERE clause in SQL because it makes queries perform poorly. The problem: –Many fields on which analysts must query are mixed case. –IT did not bother contacting analysts before issuing this "ruling". Analyst impact: –Learn to identify business-agnosticism and make plans to deal with it in a manner that allows IT to save face. –Gradually bring IT decisionmakers to the understanding that business-agnostic decisions undermine their reputation and authority.

46 © 2006 Michael Mina46 Network intelligently The problem: –The career advice often given regarding networking is not sophisticated enough. –"Network, network, network!" is not a strategy. Volume cannot always compensate for lack of a definite plan. Analyst impact: –Like any other undertaking, you should determine how much effort you are willing to give, what you want to take, and how to effectively accomplish both. –Why do you (audience) think I (presenter) am here?

47 © 2006 Michael Mina47 Network intelligently (cont'd) Analyst impact (cont'd): –Primary goals in networking should include Building skills you otherwise might not get to develop. Meeting people likely to help you with your career objectives. The NEODWSIG meets both of these criteria –Seek opportunities to work with people in other departments and/or companies to achieve primary goals. –Volunteer at places that will help you accomplish those same goals. –If your job does not afford you intelligent networking opportunities, consider finding another.

48 © 2006 Michael Mina48 Don't become "VP of the Weather" Avoid like the plague getting in situations where your responsibility greatly exceeds your authority to execute. I call this being "Vice President of the Weather" - you are held responsible for actions you cannot control. Often stems from decisionmakers not being held to account for their own unreasonable behavior.

49 © 2006 Michael Mina49 Don't become "VP of the Weather (contd) For example: "Tim" was accused by a "Sam," a Vice President of IT, of making a bad decision in product selection. In front of a large group, Sam demanded to know why Tim made the bad selection. Tim pulled out an from Sam proving that Sam made the product selection, and Tim answered "because you told me to." Note to users: Don't let this happen to you!

50 © 2006 Michael Mina50 Don't become "VP of the Weather" (cont'd) The problem: –Sam was my boss' boss' boss' boss, and he made me Vice President of the Weather on two occasions. Occasion 1 –My manager and I were made responsible for completely understanding the data of a client. The problem: –The client did not allow us to talk freely to their subject matter experts. –We were not permitted to study the client's data freely, nor were we allocated the time to do so. The solution: –You think there is one? Fortunately, matters never came to a head.

51 © 2006 Michael Mina51 Don't become "VP of the Weather" (cont'd) Occasion 2 –Sam tasks me with developing a large set of reports from a managed care system for a client. The problem: –I was forbidden to talk with ANYONE working for the client. –I could not ask questions about the deliverables, or the source data. The solution: –I used the Force. –Seriously, I had experience developing reports like this from a similarly structured database. With the judicious use of assumptions, I was able to deliver.

52 © 2006 Michael Mina52 Don't become "VP of the Weather" (cont'd) Analyst impact –Always gather information to document the unreasonableness of the request or assignment in the event that things go wrong. –The more times you pull a rabbit out of a hat, the more likely you will be looked upon as a wizard. This is usually bad because you will be relied upon to cover the incompetence of others. –Determine when failure IS an option. Properly managed, limited failure can deter unreasonable requests in the future. –Realistically, you should be responsible for Your own actions The actions of those who report to you The actions of those you've misled (e.g., by giving bad advice, false information, etc.)

53 Some tools I use as a Business Analyst…

54 © 2006 Michael Mina54 Tools Source-to-target data maps Data flow charts Ad hoc validation Informal metadata Creative cross joins –Date dimension table –Truth tables Using Excel for query construction Complex case catalog

55 © 2006 Michael Mina55 Source-to-target data maps A table (e.g., in Excel) showing columns needed for your analysis, and the tables from which you plan to source them. Best used when your analysis sources data from many tables.

56 © 2006 Michael Mina56 Source-to-target data maps (cont'd) This an example:

57 © 2006 Michael Mina57 Source-to-target data maps (cont'd) Advantages: –Complex data analyses could take much longer without this tool. –Helps analysts quickly identify columns and tables already being used by the process, as well as best sources for additional columns. –Excel allows filtering on the columns, allowing for more focused review of the data. –Can serve as an addendum to a requirements document Disadvantages: –Requires some additional effort to develop and maintain

58 © 2006 Michael Mina58 Data flow charts Not the same as data flow diagrams Similar to both flow charts and UML activity diagrams, but not exactly like either

59 © 2006 Michael Mina59 Data flow charts (cont'd) Data sources are listed by name. qnnn = query tnnn = intermediate table Queries are executed in numerical order. Data sources and intermediate tables point to queries that source data from them. Queries point to intermediate tables they create, update, etc.

60 © 2006 Michael Mina60 Data flow charts (cont'd) Advantages: –Facilitates making inevitable changes to your process –Gives a graphical overview of your process –Helps illustrate where changes in your process will have their impacts Disadvantages: –Symbology not well developed (e.g., is it a create table, update table, or append table?) –Some additional time is required to develop and update the data flow chart.

61 © 2006 Michael Mina61 Ad hoc validation Run an audit process parallel to your primary process. Goal: systematically perform reasonableness checks at each point in your process. At a minimum, create a table listing –The name of each table in your process, and –Its row count.

62 © 2006 Michael Mina62 Ad hoc validation (contd) Advantages: –Helps identify where improper logic was used. –Helps identify points in your process about which you are likely to be questioned by the stakeholders. –For example: Your row count drops after an inner join. Is that a desired or undesired result? Disadvantages: –Requires some additional effort –Requires additional CPU time

63 © 2006 Michael Mina63 Informal metadata Formal metadata –Often found in metadata repositories –Physical characteristics of tables, columns, etc. –Definitions of tables, columns, etc. Informal metadata –Often found in the memories of subject matter experts –May not be easily stored and retrieved using traditional metamodels

64 © 2006 Michael Mina64 Informal metadata (cont'd) Certain types of metadata are better attached to topics than to tables or columns. Consider this situation: –Suppose we want to determine if a customer account is open, and there is a closed_date field in table A, the table that is the main source of account information. –One would think that if closed_date were null, then the account would be open. But suppose that is the case only for accounts processed on system X. –Accounts processed on system Y, however, are always missing their closed date on table A for various reasons. –To determine if accounts processed on system Y are open, you must verify that, in another table B, the field open_status = 'Y' for the account in question.

65 © 2006 Michael Mina65 Informal metadata (cont'd) Question: To which columns or tables should this very valuable metadata (i.e., how to identify an open account) be attached? –To closed_date? –To open_status? –To table A? –To table B? Answer: To something else entirely –This is an example of what I call topical metadata, the topic being "open customer account." Note: the term "topical metadata" is used frequently in the context of the semantic web, and rarely, if ever, in the context of relational databases. Nevertheless, I believe it is appropriate here.

66 And Now, It's Time For…

67 The Informal Metadata Game Show!

68 © 2006 Michael Mina68 The Informal Metadata Game Show Which weighs more? A pound of feathers or a pound of gold?

69 © 2006 Michael Mina69 Answer: A pound of feathers weighs more Metadata is key to this riddle because "a pound" is not "a pound" Of course, a quantity of feathers that weighs the same as a quantity of gold will take up more volume than the gold, but this is not a matter of density.

70 © 2006 Michael Mina70 Answer: A pound of feathers weighs more (cont'd) 1 "pound" of feathers = 1 avoirdupois pound –This is equal to 7,000 grains –See 1 "pound" of gold = 1 Troy pound –One Troy pound is 5,760 of the same grains used in the definition of avoirdupois pound –See 1 pound of feathers weighs more than 1 pound of gold because "pound" has different meanings in each context.

71 © 2006 Michael Mina71 Informal metadata (cont'd) A column called "weight" may thus have a different meaning for a particular row if another column called item_desc has the value "gold in that row.

72 © 2006 Michael Mina72 Creative cross joins What is a cross join (aka Cartesian product)? –Suppose Table1 has r1 rows and c1 columns –Suppose Table2 has r2 rows and c2 columns –Table1 cross join Table2 is the combination of every row of Table1 with every row of Table2. –Table1 cross join Table2 has (r1 x r2) rows and (c1 + c2) columns The problem is not so much the c1 + c2 as the r1 x r2. I tell my students that cross joins are like snake venom: often deadly, but they can be used to make very powerful medicine in the right hands.

73 © 2006 Michael Mina73 Creative cross joins (cont'd) Cross join applications –Date dimension table –Truth tables –Any application where you need to exhaust all possible combinations of N quantities

74 © 2006 Michael Mina74 Date dimension table Cross joins can be used to help create a date dimension table for your data warehouse, or any other database. This is an example done using Microsoft Access Create (manually or otherwise) one table listing the digits 0 through 9.

75 © 2006 Michael Mina75 Date dimension table (cont'd) Write a cross join query to create a Cartesian product of the table with itself four times. Our SELECT statement will use the Cartesian product to create the integers 0 through 10,000. The DateAdd function will add these integers to a starting date (in this case, 1/1/1990). Different databases may use a different function to add integers to dates. Save these results in a table.

76 © 2006 Michael Mina76 Date dimension table (cont'd) SELECT 1000*A.Digit+100*B.Digit+10*C.Digit+D.Digit AS DateKey, DateAdd("d",1000*A.Digit+100*B.Digit+10*C.Digit+D.Digit,#01/01/1990#) AS DateValue INTO DateValues FROM Digits AS A, Digits AS B, Digits AS C, Digits AS D ORDER BY 1;

77 © 2006 Michael Mina77 Date dimension table (cont'd) Then, create a date dimension table using the previous table. Use the row functions provided by your database to add the information you need. Remember to create a primary key on the Time dimension table.

78 © 2006 Michael Mina78 Date dimension table (cont'd) SELECT DateKey, DateValue, Day(DateValue) AS Day, Month(DateValue) AS Month, Int((Month(DateValue)+2)/3) AS Quarter, Year(DateValue) AS Year INTO TimeDimension FROM DateValues

79 © 2006 Michael Mina79 Truth tables can be used to help determine and validate business rules. They are created by –Identifying a set of inputs –exhausting all combinations of True and False for those inputs –evaluating a business rule for each combination This is an example done using Microsoft Access. Truth tables

80 © 2006 Michael Mina80 Create a two-column, two-row table like so: Cross join the tables (SQL on next slide) Truth tables (cont'd)

81 © 2006 Michael Mina81 Truth tables (cont'd) SELECT tblTrueFalse.status_boolean, tblTrueFalse_1.status_boolean, tblTrueFalse_2.status_boolean, tblTrueFalse_3.status_boolean FROM tblTrueFalse, tblTrueFalse AS tblTrueFalse_1, tblTrueFalse AS tblTrueFalse_2, tblTrueFalse AS tblTrueFalse_3 ORDER BY tblTrueFalse.status_boolean, tblTrueFalse_1.status_boolean, tblTrueFalse_2.status_boolean, tblTrueFalse_3.status_boolean;

82 © 2006 Michael Mina82 Truth tables (cont'd) This is the output

83 © 2006 Michael Mina83 Truth tables (cont'd) Select the Analyze It with Microsoft Excel option

84 © 2006 Michael Mina84 Truth tables (cont'd) Add your business rules and evaluate them to make sure they correspond to reality.

85 © 2006 Michael Mina85 Truth tables (cont'd) Advantages: –Useful when you must persuade others of the validity of decision rules –Useful in identifying holes in decision logic Disadvantages: –Requires effort to develop

86 © 2006 Michael Mina86 Using Excel for query construction Text-related functions in Excel can be used to write SQL. Advantages: –Useful when you have complex and/or volatile decision rules –Useful when you have a set of queries you would like to have coded consistently Disadvantages: –Time consuming

87 © 2006 Michael Mina87 Using Excel for query construction (cont'd) Input some basic information in this section

88 © 2006 Michael Mina88 Using Excel for query construction (cont'd) This is the main workspace. Query parameters are outlined.

89 © 2006 Michael Mina89 Using Excel for query construction (cont'd) This is the first query, ready to be copied into Access.

90 © 2006 Michael Mina90 Using Excel for query construction (cont'd) This is the second query, ready to be copied into Access.

91 © 2006 Michael Mina91 Complex case catalog Create a catalog of complex cases you've worked on over the course of your career Business problem statements, specifications, personal notes, queries, code samples, etc. Advantages: –Provides a portable knowledgebase to which you can refer regardless of your employer. –Facilitates resume and interview preparation. –Can provide ideas for writing articles. Disadvantages: –Requires effort to maintain –Legal issues may be associated with keeping copies

92 The story of our secret data mart…

93 © 2006 Michael Mina93 The story of our secret data mart A coworker friend and I built a data mart without official authorization. Background: –I was developing the first disease management outcomes reporting process for my employer. The problem: –I was running up against deadlines and system resource issues. –My manager offered next to no support. In fact, she was getting angry with me for system problems I could not control (that dreaded Vice President of the Weather thing.)

94 © 2006 Michael Mina94 The story of our secret data mart (cont'd) The solution –I spoke with a good friend in IT about the issue, and we decided to build a data mart. –We knew that formal project authorization would not be forthcoming, so we worked on it behind the scenes. Another problem: –We had to account for our time. The solution –I was "working with IT to help improve system performance" for my project –My friend's time was spent on generic "performance improvement"

95 © 2006 Michael Mina95 The story of our secret data mart (cont'd) Another problem: –How do we suddenly reveal the existence of a data mart? The solution –Evolutionary terminology –From "IT thinks they need to create a summarized data set for us." –To "IT is summarizing data for us to try to improve performance." –To "I'm working with IT to decide which columns from which tables we need to summarize." –To "It's really like a data mart" –To "we used our data mart" –Our managers either didn't notice, or pretended not to notice.

96 © 2006 Michael Mina96 The story of our secret data mart (cont'd) We got the work done by deadline. After that, people seemed to pretend that building a data mart was the plan all along. It's amazing how they love a winner. We planned additional data marts for additional diseases, but it was not to be…. How did it end? –Shortly thereafter, my friend was promoted into another department. –Sometime after that, I was promoted into my friend's former department. –Within one year, my former manager was demoted for unclear reasons.

97 © 2006 Michael Mina97 The story of our secret data mart (cont'd) Analyst impact: –Consider very carefully how willing you are to bend the rules. –If your manager wants you to do a job with one hand tied behind your back, you need to ask why. We risked damaging our careers to help managers who lacked foresight accomplish their goals. –I would not advise this, nor do it again.

98 © 2006 Michael Mina98 Parting advice Deepen and expand your skill set. It increases –Your marketability –Your authority within your organization Seek opportunities to achieve, and to understand the impact of those achievements Develop a career portfolio Network effectively "As you do good work for your employer, whoever it may be, always, always ask 'what's in it for me?' " - You owe it to yourself, your spouse/ family/ causes you support Think twice before building data marts, or anything else, in secret. I lucked out once. I wouldnt do it again.

99 Thank you!

Download ppt "Our Secret Data Mart: This and other misadventures in my ten years as a business analyst Presentation to the NEODWSIG 3/23/2006 by Michael Mina."

Similar presentations

Ads by Google