IBM Information Server


1 IBM Information Server
IBM Information Server Cleanse - QualityStage. This presentation is STRICTLY INTERNAL. Do not distribute without senior management approval.

2 IBM Information Server: Delivering information you can trust
Support for Service-Oriented Architectures
- Understand: discover, model, and govern information structure and content
- Cleanse: standardize, merge, and correct information
- Transform: combine and restructure information for new uses
- Deliver: synchronize, virtualize, and move information for in-line delivery
Platform Services: Parallel Processing, Connectivity, Metadata, Administration, Deployment

Key Point: The culmination of these efforts has led us to our latest platform offering, the IBM Information Server. IBM Information Server is a revolutionary new software platform from IBM that helps organizations derive more value from the complex, heterogeneous information spread across their systems. It enables organizations to integrate disparate data and deliver trusted information wherever and whenever needed, in line and in context, to specific people, applications, and processes.

IBM Information Server helps business and IT personnel collaborate to understand the meaning, structure, and content of any type of information across any sources. It provides breakthrough productivity and performance for cleansing, transforming, and moving this information consistently and securely throughout the enterprise, so it can be accessed and used in new ways to drive innovation, increase operational efficiency, and lower risk.

IBM Information Server is designed to help companies leverage information across all of its sources, and it delivers all of the functions required to integrate, enrich, and deliver information you can trust for your key business initiatives. It allows you to:
- Understand all sources of information within the business, analyzing usage, quality, and relationships
- Cleanse it to assure its quality and consistency
- Transform it to provide enriched and tailored information
- Federate it to make it accessible to people, processes, and applications

All of these functions rest on a parallel processing infrastructure that provides leverage and automation across the platform. The Information Server also provides connectivity to nearly any data or content source, and the ability to deliver information through a variety of mechanisms. Underlying these functions is a unified metadata management foundation that provides seamless sharing of knowledge throughout a project lifecycle, along with a detailed understanding of what information means, where it came from, and how it relates to information in other systems.

Integration logic built within IBM Information Server can easily be deployed and managed as a shared service within an SOA. IBM Information Server provides:
- Access to the broadest range of information sources
- The broadest range of integration functionality, including federation, ETL, in-line transformation, replication, and event publishing
- The most flexibility in how these functions are used, including support for service-oriented architectures, event-driven processing, scheduled batch processing, and standard APIs such as SQL and Java

The breadth and flexibility of the platform enable it to address many types of business problems and meet the requirements of many types of projects. This optimizes the opportunities for reuse, leading to faster project cycles, better information consistency, and stronger information governance. Regarding service-oriented architectures, information integration enables information to be made available as a service, publishing consistent, reusable information services that make it easier for processes to get the information they need from across a heterogeneous landscape.

3 The IBM Solution: IBM Information Server Delivering information you can trust
Unified Deployment: Understand, Cleanse, Transform, Deliver
WebSphere QualityStage: data cleansing, standardization, matching, and survivorship for enhancing data quality and creating coherent business views
Unified Metadata Management | Parallel Processing | Rich Connectivity to Applications, Data, and Content

4 Need for Data Quality
Critical Problems
- Need to create and maintain 360-degree views of customers, suppliers, products, locations, and events
- Need to leverage data: make reliable decisions, comply with regulations, meet service agreements

Why?
- No common standards across the organization
- Unexpected values stored in fields
- Required information buried in free-form fields
- Fields evolve and are used for multiple purposes
- No reliable keys for consolidated views
- Operational data degrades 2% per month

Alternative Approaches
- Denial: the problem is misunderstood and ignored until too late; load and explode
- Hand-coding: clerical exception processing; very time consuming and resource intensive
- Simplistic cleansing apps: evolved from direct marketing and list hygiene; lack flexibility

Example data values for the same entities, as found across data sources:
- Kentucky Fried Chicken | KFC | Kent Fried Chick | Kentucky Fried
- Molly Talber DBA KFC | Mrs. M. Talber | John & Molly Talber | Talber, KFC, ATIMA
- 227G CB&NAT STICK P QUE/MOZZ WRAPP. | 227G CB&NATURAL STICK MOZZ WRAPPER

The classic case for data quality is really as simple as people, places, and things. People may be business organizations, individual customers, suppliers, or distributors who conduct business in one or more locations. Things are the products and services you sell and the pieces, parts, and bills of material you use to make them. Organizations today more than ever need to understand the complex relationships they have with their customers, suppliers, and distribution channels, and be able to make decisions on accurate counts of parts and products, in order to compete effectively, provide exceptional service, and meet increasing regulatory requirements.

As you can see in the example, data values for the same business entity often vary widely in disparate data sources. There is a lack of common standards for how to store data, there is inconsistency in how the data is input, and the business operation is often very creative with the data values it introduces into your application environments. All that noise at the value level across sources makes understanding relationships between critical business entities such as customers and products very difficult. In many cases there is no reliable and persistent key you can use across the enterprise to get all the information associated with a single customer or product. The ongoing degradation of data quality, by 2% or more per month, ensures quality is a persistent challenge.

Very often the problem is misunderstood or ignored until the new enterprise application is delivered to the business, where it quickly becomes obvious that poor data quality jeopardizes the usefulness of the entire project. Hand-coding exceptions and manually fixing data problems is very expensive and does not fundamentally solve the problem. Address hygiene applications lack the flexibility and performance to provide an effective enterprise solution.

5 Why Should I Care About Cleansing Information?
- Lack of information standards: different formats and structures across different systems
- Data surprises in individual fields: data misplaced in the database
- Information buried in free-form fields
- Data myopia: lack of consistent identifiers inhibits a single view
- The redundancy nightmare: duplicate records with a lack of standards

So it is clear that understanding is important in any integration project, but why is cleansing important? The unfortunate truth is that every enterprise has to deal with data quality issues. Gartner estimates that data degrades at 2% per month. This causes many problems, where data can no longer be trusted and often gives inconsistent results. There are five types of problems that we generally see within enterprise data stores.

The first is a lack of information standards. Names, addresses, part numbers, and other data are entered in inconsistent ways, particularly across different systems. These differences make records look different even when they are actually the same, as in the example of Kate Roberts represented in three different ways, with different address standards.

Another common issue involves data surprises in individual fields. Data in the database is often misplaced, or fields are used for multiple purposes; in the example, the name field contains company and address information, the tax ID field contains telephone numbers, and the telephone field has a variety of mistakes. This often leads to program and application errors, or it can result in misidentification of key products and customers.

A third common problem is information buried in free-form fields. In this case valuable information is hidden away in text fields. Since these fields are difficult to query using SQL, this information is often not leveraged, although it likely has value to the business. This type of problem is common in product information and help-desk case records.

The fourth problem is data myopia, our term for the lack of consistent identifiers across different systems. Without adequate foreign-key relationships, it is impossible to get a complete view of information across systems. This example shows three products that look very different but are actually the same.

The final problem is redundancy within individual tables. This is extremely common: data is re-entered into systems because the data entry mechanism is not aware that the original record is already there. It is a common side effect of a lack of standards, but it is one of the worst data quality problems, since it links directly to costs and customer dissatisfaction.

6 Importance of Data Quality
Low data quality impacts an organization in several ways:
- Poor data quality leads to misguided marketing promotions
- Cross-sell opportunities may be missed because the same customer appears several times in slightly different ways
- Valued customers may not be recognized during support calls or other important touchpoints
- Data mining is difficult because related items are not detected as related

What is good data quality? Two percent of "bad" data doesn't sound that bad, but two percent of 10M rows means 200K errors, and 200K errors add up to a big problem for analytics, operations, anything! Data quality is a major business issue!

7 Enterprise initiatives…
- Supply chain collaboration & item synchronization
- Inventory consolidation
- Single view of a customer or supplier
- ERP implementations
- ERP instance consolidation
- IT system renovation
- Consolidation resulting from M&A activity
- Enterprise data warehouse
- Compliance & regulatory projects (SOX, HIPAA, ACORD, etc.)

…need high quality data…

…to satisfy critical business requirements:
- Compliance
- Business-to-business standards
- Risk management
- Reduce costs & increase productivity
- Increase revenue / CRM payoff
- Business intelligence payoff

8 IBM WebSphere QualityStage
- Shared design environment with DataStage increases functionality and reduces development time
- Visual match rule interface simplifies match tuning
- Service orientation provides 'continuous' quality and delivers confidence in your data
- Parallel architecture shortens execution time

9 How will you get an accurate, consolidated view of your business?
The WebSphere QualityStage process: 1. Free-Form Investigation, 2. Data Standardization, 3. Data Matching, 4. Data Survivorship. Source data (customers, products / materials, transactions, vendors / suppliers) flows through this process into a target database with consolidated views.

The problem of transforming legacy data into an enterprise orientation is that:
- Legacy values have conflicting meanings and inconsistent representations, so they are not easily mapped to new target data fields
- Legacy records are organized around accounts, lines of business, functions, and geographic territories, yet the enterprise view needs to span and link related records across these operational sources, and no common keys exist to make the linkage of related records
- The process must handle tens or hundreds of thousands of records initially (often millions), and then must continue to handle thousands of new transactions on a daily basis; an effective in-house automated solution is therefore an absolute requirement
A skeleton of the four-step flow is sketched in code after this list.
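As a rough illustration of how the four steps compose, here is a minimal Python skeleton, assuming a simple in-memory record type. Every name in it is hypothetical; this sketches the shape of the process, not the QualityStage API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative record type; the field names are invented, not a QualityStage schema.
@dataclass
class Record:
    source: str                                # legacy system of origin
    raw: dict                                  # original field values
    std: dict = field(default_factory=dict)    # standardized field values
    group: Optional[int] = None                # match-group id, set by match()

def investigate(records: list[Record]) -> None:
    """1. Profile single-domain and free-form fields to learn what they actually hold."""

def standardize(records: list[Record]) -> None:
    """2. Parse and normalize each record into fixed, comparable fields."""

def match(records: list[Record]) -> None:
    """3. Give records judged to be the same entity a shared group id."""

def survive(records: list[Record]) -> list[Record]:
    """4. Keep one best-of-breed record per match group."""
    return records

def consolidate(records: list[Record]) -> list[Record]:
    investigate(records)       # understand the data first
    standardize(records)       # make values comparable
    match(records)             # link related records that share no common key
    return survive(records)    # one consolidated view per entity
```

Later slides fill in what each of these stubs does in practice.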

10 Why Investigate
- Discover trends and potential anomalies in the data
- Gain 100% visibility into single-domain and free-form fields
- Identify invalid and default values
- Reveal undocumented business rules and common terminology
- Verify the reliability of the data in the fields to be used as matching criteria
- Gain complete understanding of data within context
Why is analysis and assessment important? What does it provide?

11 Investigation - Free Form
Example input: 123 St. Virginia St.

Parsing: separating multi-valued fields into individual pieces
123 | St. | Virginia | St.

Lexical analysis: determining the business significance of individual pieces
123 (number) | St. (street type) | Virginia (state) | St. (street type)

Context sensitive: identifying the actual data structures and content
123 (house number) | St. Virginia (street name) | St. (street type)

"The instructions for handling the data are inherent within the data itself."
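A toy Python sketch of the three stages, assuming a tiny token dictionary; the tag names and the context rule are illustrative inventions, far simpler than real investigation rule sets.

```python
import re

# Toy street-type vocabulary; real rule sets are far richer.
STREET_TYPES = {"ST", "AVE", "RD", "CIR", "BLVD"}

def parse_address(line: str) -> list[tuple[str, str]]:
    # Parsing: separate the multi-valued field into individual tokens.
    tokens = [t for t in re.split(r"[\s,]+", line) if t]
    tagged = []
    for i, tok in enumerate(tokens):
        word = tok.rstrip(".").upper()
        if word.isdigit():
            # Lexical analysis: a purely numeric leading token reads as a house number.
            tag = "HouseNumber"
        elif word in STREET_TYPES:
            # Context sensitive: "St." in final position is a street type, but
            # "St." right after the house number begins the street name
            # ("St. Virginia" = Saint Virginia).
            tag = "StreetType" if i == len(tokens) - 1 else "StreetNamePrefix"
        else:
            tag = "StreetName"
        tagged.append((tok, tag))
    return tagged

print(parse_address("123 St. Virginia St."))
# [('123', 'HouseNumber'), ('St.', 'StreetNamePrefix'),
#  ('Virginia', 'StreetName'), ('St.', 'StreetType')]
```

The same token gets different tags depending on where it sits, which is the point of the quote above: the handling instructions live in the data itself.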

12 Rule Sets
Pre-defined rules for parsing and standardizing:
- Name
- Address
- Area (city, state, and ZIP)
- Multi-national address processing
Validate structure: Tax ID, US phone, date
Append ISO country codes
Pre-process or filter name, address, and area
Rule sets are stored in the common repository. (A simplified lookup in this spirit is sketched below.)
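To give a feel for what a rule set does, here is a minimal Python sketch of table-driven standardization. The lookup tables are tiny illustrative samples, not the shipped Name/Address/Area rule sets, which also handle parsing and validation.

```python
# Tiny illustrative substitution tables; assumptions, not product content.
STREET_TYPE_RULES = {"STREET": "ST", "STR": "ST", "AVENUE": "AVE", "AV": "AVE",
                     "ROAD": "RD", "SUITE": "STE"}
STATE_RULES = {"FLA": "FL", "GEORGIA": "GA", "OHIO": "OH"}

def apply_rules(tokens: list[str]) -> list[str]:
    # Standardize each token by table lookup; unknown tokens pass through
    # uppercased with stray punctuation stripped.
    out = []
    for tok in tokens:
        key = tok.upper().strip(".,")
        out.append(STREET_TYPE_RULES.get(key, STATE_RULES.get(key, key)))
    return out

print(apply_rules("639 N MILLS AVENUE ORLANDO, FLA".split()))
# ['639', 'N', 'MILLS', 'AVE', 'ORLANDO', 'FL']
```

This is exactly the kind of normalization visible in the standardization example on the next slide.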

13 Standardization - Example
Input file (Address Line 1 / Address Line 2):
639 N MILLS AVENUE ORLANDO, FLA
306 W MAIN STR, CUMMING, GA 30130
3142 WEST CENTRAL AV TOLEDO OH 43606
843 HEARD AVE AUGUSTA-GA-30904
1139 GREENE ST ACCT # AUGUSTA GEORGIA
4275 OWENS ROAD SUITE 536 EVANS GA 30809

Result file:
House # | Dir | Str. Name | Type | Unit No. | NYSIIS  | City    | SOUNDEX | State | Zip   | ACCT#
639     | N   | MILLS     | AVE  |          | MAL     | ORLANDO | O645    | FL    |       |
306     | W   | MAIN      | ST   |          | MAN     | CUMMING | C552    | GA    | 30130 |
3142    | W   | CENTRAL   | AVE  |          | CANTRAL | TOLEDO  | T430    | OH    | 43606 |
843     |     | HEARD     | AVE  |          | HAD     | AUGUSTA | A223    | GA    | 30904 |
1139    |     | GREENE    | ST   |          | GRAN    | AUGUSTA | A223    | GA    |       | ACCT #
4275    |     | OWENS     | RD   | STE 536  | ON      | EVANS   | E152    | GA    | 30809 |
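The SOUNDEX column above follows the standard American Soundex algorithm, which is compact enough to sketch. This stand-alone Python version reproduces the city codes in the result table; NYSIIS, used here for the street names, is a similar but more elaborate phonetic code and is not shown.

```python
def soundex(name: str) -> str:
    """Standard American Soundex: first letter plus three digit codes."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    result, last = name[0], codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":                 # H and W do not break a run of like codes
            continue
        code = codes.get(ch, "")       # vowels map to "" and reset the run
        if code and code != last:
            result += code
            if len(result) == 4:
                break
        last = code
    return result.ljust(4, "0")        # pad short codes with zeros

for city in ("ORLANDO", "CUMMING", "TOLEDO", "AUGUSTA", "EVANS"):
    print(city, soundex(city))         # O645, C552, T430, A223, E152
```

Because similar-sounding values collapse to the same code (AUGUSTA and AUGUSTA both A223), these columns make good blocking and matching keys in the next step.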

14 Why Match
- Identify duplicate entities within one or more files
- Perform householding
- Create a consolidated view of the customer
- Establish cross-reference linkage
- Enrich existing data with new attributes from external sources

15 Two Methods to Decide a Match
Are these two records a match?
WILLIAM J KAZANGIAN | 128 MAIN ST | /8/62
WILLAIM JOHN KAZANGIAN | 128 MAINE AVE | /8/62

Deterministic decision tables:
- Fields are compared and each comparison is assigned a letter grade
- The combined letter grades (e.g., B B A A B D B A = BBAABDBA) are compared to a vendor-delivered file
- Result: Match, Fail, or Suspect

Probabilistic record linkage:
- Fields are evaluated for degree of match
- A weight is assigned representing the "information content" of each value
- Weights are summed to derive a total score (e.g., +49)
- Result: a statistical probability of a match

Well, you can automate this in either of two ways. The first, which just about everybody is familiar with, is the decision-table approach. Each field is evaluated and given some score (or letter grade) that tells how well it matched. Then all the grades are lined up into a pattern that maintains visibility into the results of each individual field. That pattern is used as a key into a static table that tells the system whether that particular configuration of field scores should be matched, failed, or passed on for clerical review.

The second approach, known as probabilistic linkage, is familiar to computer science professionals who must perform highly precise matching when there is great liability and consequence from errors. This method also evaluates each field, but the score produced is a numerical representation of the amount of information carried by the single pair of values. The individual field scores are summed to produce a final score that precisely measures the information content of the matching fields. That final score, or match weight, is an accurate gauge of the probability of a match.

To summarize, there really are just two ways to automate the decision of whether records should be matched: a pattern or rule-based lookup in a table, or rigorous mathematical measurement of the available information. Both give satisfactory results when the data is fairly simple and the matching requirements reasonably lax, but the probabilistic approach is required when the data is noisy or incomplete, or the business requirements are very demanding. A sketch of the probabilistic scoring follows.
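A minimal Fellegi-Sunter-style scorer in Python, to make the weight arithmetic concrete. The m- and u-probabilities and the field list are invented for illustration; a real engine estimates them from the data and compares fields with fuzzy comparators after standardization, rather than the exact equality used here.

```python
from math import log2

FIELDS = {
    # field: (m = P(agree | same entity), u = P(agree | different entities))
    "first":  (0.90, 0.10),
    "last":   (0.95, 0.01),    # surnames are discriminating: high m, low u
    "street": (0.85, 0.05),
    "dob":    (0.98, 0.003),
}

def match_weight(a: dict, b: dict) -> float:
    total = 0.0
    for name, (m, u) in FIELDS.items():
        if a.get(name) == b.get(name):
            total += log2(m / u)               # agreement adds information
        else:
            total += log2((1 - m) / (1 - u))   # disagreement subtracts it
    return total

rec1 = {"first": "WILLIAM", "last": "KAZANGIAN", "street": "MAIN",  "dob": "8/62"}
rec2 = {"first": "WILLIAM", "last": "KAZANGIAN", "street": "MAINE", "dob": "8/62"}
print(match_weight(rec1, rec2))   # ~15.4: strongly positive, likely a match
```

Note how a rare, agreeing surname contributes far more weight than an agreeing first name, which is exactly the "information content" idea the slide describes.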

16 Why Survive
- Provide a consolidated view containing the "best-of-breed" data
- Resolve conflicting values and fill missing values
- Cross-populate the best available data
- Implement business and mapping rules
- Create cross-reference keys

17  Survivorship - Example
Survivorship input (match output):
Group | Legacy | First  | Middle | Last    | No.  | Dir. | Str. Name  | Type | Unit No.
1     | D150   | Bob    |        | Dixon   | 1500 | SE   | ROSS CLARK | CIR  |
1     | A1367  | Robert |        | Dickson |      |      | ROSS CLARK | CIR  |
23    | D689   | Ernest | A      | Obrian  |      | SW   | 74TH       | ST   | STE 202
23    | A436   | Ernie  | Alex   | O'Brian |      | SW   | 74TH       | ST   |
23    | D352   | Ernie  |        | Obrian  |      |      |            | ST   | # 202

Consolidated output:
Group | First  | Middle | Last    | No.  | Dir. | Str. Name  | Type | Unit No.
1     | Robert |        | Dickson | 1500 | SE   | ROSS CLARK | CIR  |
23    | Ernie  | Alex   | O'Brian |      | SW   | 74TH       | ST   | STE 202

Cross-reference keys (Group | Legacy): 1 | D150, 1 | A1367, 23 | D689, 23 | A436, 23 | D352

A toy version of this field-by-field selection is sketched below.
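A toy survivorship pass in Python: within each match group, each output field keeps the "best" input value. The rule used here (prefer the longest non-empty value) is a stand-in for real business rules such as most trusted source, most recent update, or most complete record.

```python
from itertools import groupby

def survive(records: list[dict], fields: list[str]) -> list[dict]:
    records = sorted(records, key=lambda r: r["group"])
    survivors = []
    for grp, members in groupby(records, key=lambda r: r["group"]):
        members = list(members)
        best = {"group": grp,
                "legacy_keys": [m["legacy"] for m in members]}  # cross-reference keys
        for f in fields:
            # Field-level survivorship: the longest non-empty value wins.
            best[f] = max((m.get(f, "") for m in members), key=len)
        survivors.append(best)
    return survivors

rows = [
    {"group": 1,  "legacy": "D150",  "first": "Bob",    "last": "Dixon"},
    {"group": 1,  "legacy": "A1367", "first": "Robert", "last": "Dickson"},
    {"group": 23, "legacy": "D689",  "first": "Ernest", "last": "Obrian"},
    {"group": 23, "legacy": "A436",  "first": "Ernie",  "last": "O'Brian"},
]
print(survive(rows, ["first", "last"]))
# group 1 survives as Robert Dickson; group 23 as Ernest O'Brian (a real
# rule set, like the slide's, may prefer different values per field)
```

The cross-reference keys kept alongside each survivor are what let downstream systems map every legacy record back to its consolidated view.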

18 How Does WebSphere QualityStage Integrate?
Seamlessly! Data extraction and load routines move data from source systems (DB2, Oracle, Sybase, Onyx, IDMS, etc.) through the QualityStage process of Investigation, Standardization, Integration, and Survivorship, and on to the target database (DB2, Oracle, Sybase, Onyx, IDMS, etc.).

19 WebSphere DataStage and WebSphere QualityStage: Fully Integrated!
Seamless! INTEGRITY applications can be accessed directly through the DataStage Designer, allowing users to modify INTEGRITY applications directly from the Designer canvas.

20 QualityStage: Data Quality Extensions
- IBM WebSphere QualityStage GeoLocator
- IBM WebSphere QualityStage Postal Verification Products: WAVES (Worldwide), the IBM WebSphere Worldwide Address Verification Solution
- IBM WebSphere QualityStage Postal Certification Products: CASS (United States), SERP (Canada), DPID (Australia)
- IBM Information Server Data Quality Module for SAP
- IBM WebSphere QualityStage for Siebel

21 Key Strengths of IBM QualityStage
- Intuitive, "design as you think" user interface
- Simple, intuitive rule design and fine tuning
- Seamless data flow integration
- Defining the technology standard with SOA
- Industry-leading probabilistic matching engine

22 Thank You

