Having it All is not Having it All at All! Problem Formulation in the Face of Overwhelming Quantities of Data
A journey of discovery… Where's the fire?
START FROM THE BEGINNING
"Before the beginning of great brilliance, there must be Chaos." (I Ching)
"At the beginning of the 21st century, the population of the Earth [was] 6,300,000,000, who annually experience a reported 7,000,000–8,000,000 fires with 70,000–80,000 fire deaths and 500,000–800,000 fire injuries." (Dr. Ing. Peter Wagner, 2006)
Gone are the days when there was a single source of truth…
Source: Baker Library, Harvard, R.G. Dun Credit Report Collection. Entries in a book on Australia business owners:
About a storekeeper in Halifax County, N.C., June 1873: "purchaser of stolen goods, a great scamp."
About one J. B. Alford, who sold groceries and liquors, June 1870: "This man is said to be in thriving circumstances. He has some Real & personal estate & I think it is safe to trust him."
About Hannah Griffith, a milliner in Springfield, Ill., in 1869: "about to marry a fellow [of] no account." An entry two years later noted, with some relief, that the plan had fallen through.
About J. D. Rockefeller: "is not much of a businessman, but had some capital, it is said, advanced by his father, who is reputed well off." Rockefeller turned out to be a good credit risk; 1863 was the year he set up a refinery that blossomed into Standard Oil.
Framing our case for change…
The Operating Environment: We all know that the world is changing. We are aware that change is accelerating at an unprecedented pace. We see new types of data, technologies, and behaviors every day. More and more, we are tasked with discerning the discoverable need from the articulated want.
The Case for Change: What has made us successful so far is insufficient. We now have the ability to succeed… or fail, much faster. The connectedness of information, and the ways in which it is changing, is affecting the risk and opportunity space in ways we are only beginning to understand.
Sometimes, a picture is worth a thousand words. Lately, a thousand pictures are taken in the time it takes to speak a single word!
What about the digital footprint of all of the smartphones? What about the social networks of the crowd? What about the metadata in the photos? What are the opportunity costs to other activities?
The largest corpus of data preceded the event. Most data created about the event had significant and asymmetric latency. The rate of data decay attributable to the participants in the event is significant.
Asking the right question
How deep would the ocean be if sponges didn't live there? What if the Hokey Pokey really is what it's all about? What if there were no hypothetical questions? How many more of these silly questions till the next slide?
Questions about risk and opportunity are at the heart of our focus.
Should I extend credit? What about fraud? What is the right credit limit? What do my best customers look like? Which customers should I call on next? Which prospects are most promising?
It is extremely important to frame the question in the right context.
Diagram: Macro, Regional, Local, Micro, Global, Association, Entity; PITCOB (People in the Context of Business), Connected Supply/Value Chains, Malfeasance, Disaster Remediation, Material Changes, CMA/LMA, Enhanced Identity Resolution, Stewardship Interventions, Real-Time Adjudication.
The right universe of data is often implied by the scope and context of the question.
Foundational firmographic data: Business Name, Telephone, Address, SIC, Employee Size, Sales Revenue, Year Started, Primary Contact, Linkage.
Data in hand, discoverable data, computable data, extant but unavailable data (opportunity cost), understanding of cause systems, and relevant theory.
D&B Proprietary information
Leveraging the Vs to get to the best answer
Veracity: How do I adjudicate the truth when the malfeasants are learning so much faster?
Volume: How much data is too much to see the answer?
Velocity: Can the rate of change of data itself be part of the answer?
Variety: How can heterogeneous and unstructured data inform new ways of inquiry?
A good example can be seen in tracking mergers, acquisitions, and divestitures.
A typical M&A takes 6–9 months from announcement to deal completion; some take longer, or may never close. Regulatory requirements sometimes drive pre- and post-close changes over years. Announced restructurings and reorganizations often take 6 months to 2 years.
Family trees are updated as the deal completes, on average within 10 days, and linkage updates frequently precede official registry changes. Updates include re-linking records, restructuring tree levels, moving entities to out-of-business status, and creating new entities.
Traditional analysis of this data can reveal interesting risks.
Family tree (from subsidiary up to ultimate parent): CITGO Petroleum Corporation, Texas, USA → PDV America, Inc., Oklahoma, USA → Propernyn B.V., Netherlands → 3 additional subsidiary levels → National Government: Republic of Venezuela.
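Tracing a record to its ultimate parent amounts to walking up the linkage one parent at a time. The following is a minimal sketch that assumes a simple, hypothetical child-to-parent dictionary (not D&B's actual linkage model). The three unnamed intermediate levels from the slide are collapsed into one placeholder node, and a guard rejects cyclic (corrupt) linkage.

```python
# Hypothetical child -> parent linkage map. The slide's three unnamed
# intermediate levels are collapsed into a single placeholder node.
PARENT = {
    "CITGO Petroleum Corporation": "PDV America, Inc.",
    "PDV America, Inc.": "Propernyn B.V.",
    "Propernyn B.V.": "(3 additional subsidiary levels)",
    "(3 additional subsidiary levels)": "National Government: Republic of Venezuela",
}


def ownership_path(entity: str, parent: dict[str, str]) -> list[str]:
    """Walk parent links from an entity up to its ultimate parent."""
    path = [entity]
    seen = {entity}
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in seen:  # corrupt linkage would otherwise loop forever
            raise ValueError(f"cycle detected at {nxt!r}")
        path.append(nxt)
        seen.add(nxt)
    return path


path = ownership_path("CITGO Petroleum Corporation", PARENT)
print(" -> ".join(path))
print("Ultimate parent:", path[-1])
```

The same walk run for every member of a tree shows how a Texas refiner can roll up to a national government. The cycle check matters because linkage data under active M&A restructuring is exactly where inconsistent parent pointers tend to appear.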
Combining the articulated want (family tree) with the discoverable need (what's really going on)…
Entities: Ceramics Inc (50 employees, glass manufacturer, Wichita, Kansas); Medi-Cell (125 employees, lab equipment manufacturer, Abayance, FL); AdvDesigns AG (30 employees, R&D, stem cell research, Frankfurt, Germany); Mediquip (1,000 employees); Monsanto (500-member family tree, largest genetically modified food producer).
Diagram percentages: 49%, 30%.
Pending decision: underwrite a directors and officers policy.
The story is true. The names have been changed to protect the innocent.
Language, identity, and intention can significantly increase the complexity of the situation.
Kabushikigaisha Kawasaki Mōtāsu Jyapan (aka Kawasaki Motors Japan)
Hanguggawasaki (aka Kawasaki Korea)
Chuanxi Zhonggong zuishin (aka Kawasaki Heavy Industries Consulting)
KAWASAKI KK (local electricians in a suburb of Kawasaki)
Chuanxi chuliao Youxian Gonxi (aka Kawasaki Paint Co, Dongguan)
Kawasaki Jūkōgyō Kabushiki-gaisha (aka Kawasaki Heavy Industries)
Ka-wa-sa-ki
Kawasaki (idiom): river beside mountainous terrain
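To see why these names are hard, consider the simplest matching approach: strip diacritics, casefold, and look for a shared token. The following is a minimal sketch using only Python's standard `unicodedata` module. The choice of records and the token-match rule are illustrative assumptions, not a description of any production matcher.

```python
import unicodedata


def normalize(name: str) -> str:
    """Casefold and strip diacritics (e.g. the macrons in romanized Japanese)."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


names = [
    "Kabushikigaisha Kawasaki Mōtāsu Jyapan",
    "KAWASAKI KK",
    "Kawasaki Jūkōgyō Kabushiki-gaisha",
    "Hanguggawasaki",
]

# Naive rule (an assumption for illustration): a record "matches" if one of
# its normalized, hyphen-split tokens is exactly "kawasaki".
hits = [n for n in names if "kawasaki" in normalize(n).replace("-", " ").split()]

for n in hits:
    print(normalize(n))
```

The rule lumps the local electricians in with the motor and heavy-industry companies. It also misses the Korean affiliate, whose romanization fuses the words and spells the name with a "g". Normalization makes strings comparable, but it cannot decide identity or intention on its own.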
People are strange…
Privacy and other statutory constraints; multiple names; digital natives vs. digital immigrants; overlapping identities.
As the boundary between people and small businesses becomes increasingly blurred, we continue to focus on the concept of People in the Context of Business.
The challenge:
#1: the John Smith problem (multiple people with the same name)
#2: the Ann Taylor problem (data about businesses named after people)
#3: the Sybil problem (one person with multiple personas or names)
Example records: Caroline M Smith, 302 N Liberty St., Albion, IA (address type: residential); Carrie Smith, Meredith Corporation, 1716 Locust St., Des Moines, IA (address type: commercial); Caroline Smith, University of Iowa, 21 E Market St., Iowa City, IA (address type: commercial); Carrie Smith, Tenderheart Daycare, 2635 Cleveland Dr., Adel, IA (address type: commercial).
The goal: cleanse, de-dupe, identity resolution, and enrichment services for your contact data; understand when people move from organization to organization; sharpen the line between the individual and the business when engaging small businesses.
The value: Malfeasance and fraud are perpetrated by people, not by businesses. This solution reveals relationships that will help all of us more effectively identify the potential for bad behavior: many people connected to one business, many businesses connected to one person, businesses connected through people, and people connected through associations with other people. A single view of customers and prospects, in the context of both entities and people, will drive key actionable outcomes for your business.
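A common first step in identity resolution is blocking. Records are grouped under a coarse key so that only plausible pairs are compared in detail. The following is a minimal sketch over the four example records. It assumes a hypothetical one-entry nickname table and a key of canonical first name, last name, and state; neither is taken from the presentation.

```python
from collections import defaultdict

# Hypothetical nickname table; real systems rely on much larger curated lists.
NICKNAMES = {"carrie": "caroline"}

records = [
    {"name": "Caroline M Smith", "org": None, "city": "Albion", "state": "IA"},
    {"name": "Carrie Smith", "org": "Meredith Corporation", "city": "Des Moines", "state": "IA"},
    {"name": "Caroline Smith", "org": "University of Iowa", "city": "Iowa City", "state": "IA"},
    {"name": "Carrie Smith", "org": "Tenderheart Daycare", "city": "Adel", "state": "IA"},
]


def blocking_key(rec: dict) -> tuple[str, str, str]:
    """Canonical first name + last name + state; middle initials are dropped."""
    parts = rec["name"].lower().split()
    first, last = parts[0], parts[-1]
    return (NICKNAMES.get(first, first), last, rec["state"])


blocks = defaultdict(list)
for rec in records:
    blocks[blocking_key(rec)].append(rec)

for key, members in blocks.items():
    print(key, [m["org"] or "(residential)" for m in members])
```

All four records land in one block, which only makes them candidates. Whether they are one person wearing several hats (the Sybil problem) or several different people (the John Smith problem) still has to be decided from further evidence such as addresses, dates, and linkage.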
Creating the foundation for People in the Context of Business.
Predictions, predictions…
I'll bet you knew this was coming. Learning from the way things move, even if you don't understand them fully… seriously? How do you predict something that has no precedent?
Commercial signals and proxies are now added to existing predictive attributes to provide deeper insights and more predictive analytics.
Chart: predictive content (low to high) for businesses with robust predictive data available, limited data available, and no data available, comparing traditional business data with non-traditional insight.
Signal and proxy sources add significant decisioning content on small businesses with a limited or no traditional predictive data footprint.
Signals, aggregated and analyzed over time and correlated with other data sources, expose hard-to-find patterns.
Flow: big disparate sources of data → signal extraction → advanced analytics → predictive model gains.
We're harnessing the massive flow of data through our systems and distilling the signals that describe a company's behavior. This is helping to increase levels of precision in predictive models.
Sources: customer cross-border inquiries; customer match inquiries; global trade experiences; transactional WorldBase updates; third-party exchange; customer portfolio monitoring; intelligence engine traffic; phone and email connectivity testing; call center activity; other proprietary sources.
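One simple way to let the rate of change itself become a signal (the "Velocity" question) is to compare a company's latest activity with its own recent history. The following is a minimal sketch that assumes hypothetical weekly inquiry counts and a z-score rule with an assumed threshold of 3. It illustrates the idea and is not the method used in the system described here.

```python
from statistics import mean, stdev

# Hypothetical weekly inquiry counts per company (oldest week first).
weekly_inquiries = {
    "company_a": [12, 14, 11, 13, 12, 15, 13, 41],
    "company_b": [30, 28, 33, 29, 31, 30, 32, 31],
}


def spike_score(series: list[int]) -> float:
    """z-score of the latest week against the preceding weeks."""
    history, latest = series[:-1], series[-1]
    sd = stdev(history)
    return 0.0 if sd == 0 else (latest - mean(history)) / sd


THRESHOLD = 3.0  # assumed cut-off; in practice it would be tuned against outcomes

for company, series in weekly_inquiries.items():
    z = spike_score(series)
    flag = "  <- flag" if z > THRESHOLD else ""
    print(f"{company}: z={z:.1f}{flag}")
```

Each company is scored against its own baseline rather than a population average, so a sudden burst of interest in a normally quiet small business stands out even when its absolute volume is lower than a steady large one.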
Extending the deployed capability to better understand malfeasance…
Data Collection & Input
Prevention (at data point of entry): identity verification of the business and authentication of the individual; rules-based alerts at point of data entry.
Detection (within data maintenance): manual analyst reviews; automated, rules-driven detection procedures to reveal suspicious patterns.
Investigation: investigation of high-risk cases by certified fraud examiners; situational analysis.
Recovery and Learning: collaborating with industry groups, forums, customers, and law enforcement to understand evolving needs and trends.
Continuous Improvement: applying learning and integrating new targeted severe-risk prevention and detection rules in data supply processes and platforms.
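Rules-based alerts at the point of data entry can be modeled as a list of named predicates evaluated against each incoming record. The following is a minimal sketch in which the record fields, the three rules, their thresholds, and the phone-usage lookup are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Submission:
    """Hypothetical fields captured when a business record is entered."""
    business_name: str
    phone: str
    address_type: str  # "residential" or "commercial"
    employee_count: int
    days_since_registration: int


# Hypothetical lookup: phone number -> distinct businesses already using it.
PHONE_USAGE = {"555-0100": 7}

# Each rule is (name, predicate); the thresholds are illustrative assumptions.
RULES: list[tuple[str, Callable[[Submission], bool]]] = [
    ("shared_phone", lambda s: PHONE_USAGE.get(s.phone, 0) >= 5),
    ("new_registration", lambda s: s.days_since_registration < 30),
    ("size_vs_address", lambda s: s.address_type == "residential" and s.employee_count > 100),
]


def alerts(sub: Submission) -> list[str]:
    """Return the names of all rules the submission triggers."""
    return [name for name, rule in RULES if rule(sub)]


sub = Submission("Acme Widgets", "555-0100", "residential", 250, 400)
print(alerts(sub))  # ['shared_phone', 'size_vs_address']
```

Keeping rules as data, rather than hard-coded branches, fits the continuous-improvement loop. New severe-risk rules learned during investigation can be appended to `RULES` without changing the entry pipeline.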
Combining people, linkage, and daily signals to quickly recognize and analyze patterns and take action…
In the use case shown, with millions of payment experiences a week, we were able to quickly identify and analyze a suspicious pattern and take action, not only on all related cases but also on the three ring leaders.
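The "people plus linkage" idea can be pictured as a graph of people and businesses. Starting from one flagged case, a breadth-first search collects everything connected to it, and the most-connected people become candidate ring leaders. The following is a minimal sketch on hypothetical data. Ranking by degree is a deliberately simple, swapped-in heuristic and not the analysis described in the presentation.

```python
from collections import defaultdict, deque

# Hypothetical person -> business links (e.g. derived from linkage and contacts).
links = [
    ("P1", "B1"), ("P1", "B2"), ("P1", "B3"),
    ("P2", "B3"), ("P2", "B4"), ("P2", "B5"),
    ("P3", "B5"), ("P3", "B6"),
    ("P4", "B6"),
    ("P9", "B9"),  # an unrelated pair that should stay outside the ring
]

graph = defaultdict(set)
for person, business in links:
    graph[person].add(business)
    graph[business].add(person)


def component(start: str) -> set[str]:
    """Breadth-first search for every node connected to a flagged case."""
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in graph[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


ring = component("B4")  # suppose B4 showed a suspicious payment pattern
people = sorted(n for n in ring if n.startswith("P"))
# Rank people by how many businesses they connect to (stable sort keeps ties ordered).
leaders = sorted(people, key=lambda p: len(graph[p]), reverse=True)[:3]
print(sorted(ring))
print(leaders)
```

A single flagged business pulls in six businesses and four people, while the unrelated pair stays out. Action can then be taken on the whole component rather than case by case.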
Data sensing: advanced analytics also play a significant role in acquiring new data sources.
Candidate sources: utility data, merchant data, government data, sentiment data, shipment data, labor market data, other data.
Scale, depth, value: Multinational footprint? Comprehensive coverage across all verticals and sizes of business? Positive correlation with trade or other predictors, so that it can serve as a proxy?
Some current efforts under way to utilize this hybrid capability…
New Techniques to Address Big Data; New Approaches to Discovery, Curation, and Synthesis; Data Sensing at the Event Horizon.
We are increasingly faced with information that is rich, varied, and replete with opportunity; our focus is shifting from hunting and gathering to new challenges.
And now we welcome the new year, full of things that have never been – Rainer Maria Rilke