Data Science for Business: Semantic Verses Dr. Brand Niemann Director and Senior Data Scientist Semantic Community

Presentation transcript:

Data Science for Business: Semantic Verses Dr. Brand Niemann Director and Senior Data Scientist Semantic Community February 14,

Data Science for Business Book Review Summary:
– If you are a data scientist, take this as our challenge: think deeply about exactly why your work is relevant to helping the business and be able to present it as such.
– Remember: "If you can't explain it simply, you don't understand it well enough." (Albert Einstein)
Semantic Verses Magnet:
– "Magnet is the only engine that treats topics as semantic objects, which gives it a competitive edge since the identification of 'key topics' is generally considered to be the main feature of any semantic engine."
– "Semantic is used here to refer to understanding what a piece of text is about. We do not claim we are doing NLP/NLU for question/answering purposes."
Source: Walid S. Saba, PhD, AI/NLP Scientist, February

Magnet Text Analysis Engine: Understands What the Text is About

Data Science for Business Knowledge Base
My Note: A Knowledge Base* with: Data Story Slides Data Sets Spotfire Dashboard Book Web Pages
*Structured mashup with everything treated as an object with a well-defined URL for the Glossary (taxonomy) and Table of Contents (thesaurus), integrated together in an Information Model!

MindTouch
MindTouch:
– Treats topics as semantic objects (they can be searched for links to content).
– MindTouch headings identify "key topics" (see the Table of Contents for the book in this page).
– Allows one to construct a natural language front-end for enterprise data (and big data) integration across multiple sources (Google Chrome and Spotfire can Find words and data in their mashup Knowledge Bases).
– Can be combined with Be Informed, YARCData, and big data analytics (Spotfire), and could pilot including Semantic Verses.
– An example of expert subject matter that serves to provide a metamodel of topics as an interface to the integration of content (text and data) that can be both personalized by the user and integrated with similar metamodels.
Semantic Community:
– I am doing Natural Language Processing (NLP)/Natural Language Understanding (NLU) by hand in MindTouch, and I see why it is so difficult to automate for massive information on the Internet without Subject Matter Expertise and Structure.

Specific Example: TFIDF - Term Frequency (TF) and Inverse Document Frequency (IDF)
Using Google Find for TFIDF (12 hits), where the first is "Combining Them: TFIDF", which says:
– See "Example: Attribute Selection with Information Gain" on page 56.
Which says:
– For a dataset with instances described by attributes and a target variable, we can determine which attribute is the most informative with respect to estimating the value of the target variable.
– We also can rank a set of attributes by their informativeness, in particular by their information gain.
– This can be used simply to understand the data better. It can be used to help predict the target. Or it can be used to reduce the size of the data to be analyzed, by selecting a subset of attributes in cases where we cannot or do not want to process the entire dataset.
See this UC Irvine Machine Learning Repository page for the data set used to illustrate information gain.
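Ranking attributes by information gain, as described in the quoted passage, can be sketched in a few lines of Python. This is a minimal illustration and not the book's own code: the function names, the column names ("color", "size", "label") and the four-row toy dataset are all hypothetical, and every attribute is assumed to be categorical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    parent = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


def rank_attributes(rows, attributes, target):
    """Return the attributes sorted from most to least informative."""
    return sorted(attributes,
                  key=lambda a: information_gain(rows, a, target),
                  reverse=True)


# Hypothetical toy data: "color" separates the label perfectly, "size" not at all.
toy_rows = [
    {"color": "red", "size": "small", "label": "yes"},
    {"color": "red", "size": "large", "label": "yes"},
    {"color": "blue", "size": "small", "label": "no"},
    {"color": "blue", "size": "large", "label": "no"},
]

ranking = rank_attributes(toy_rows, ["size", "color"], "label")
```

Keeping only the top-ranked attributes is one way to reduce the size of the data to be analyzed, which is the third use the passage mentions.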

Using Google Find for TFIDF 1

Using Google Find for TFIDF 10
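The TFIDF term being searched for combines term frequency (how often a term occurs in one document) with inverse document frequency (how rare the term is across all documents). The sketch below is one common variant, assuming length-normalized term frequency, a plain logarithm for IDF, and whitespace tokenization; other texts use slightly different formulas, and the function name and sample documents are hypothetical.

```python
from collections import Counter
from math import log


def tfidf(documents):
    """Return one {term: score} dict per document.

    TF  = count of the term in the document / number of tokens in the document
    IDF = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Count each term once per document to get its document frequency.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical sample documents.
sample_docs = [
    "data science for business",
    "data mining process",
    "semantic search engine",
]
sample_scores = tfidf(sample_docs)
```

A term that appears in every document gets an IDF of zero under this variant, so only terms that distinguish a document from the rest receive weight.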

The Data Mining Process 1
– Business Understanding
– Data Understanding
– Data Preparation
– Modeling
– Evaluation
– Deployment

The Data Mining Process 2
Business Understanding:
– Use real Subject Matter Expertise content instead of general Web content.
Data Understanding:
– Make all content data so that unstructured, semi-structured, and structured information are integrated data.
Data Preparation:
– Create an index of content topics and objects that is both a relational and graph database.
Modeling:
– A searchable Information Model with Analytics (Ontology) linked to the Thesaurus (Taxonomy) linked to the Glossary (Vocabulary).
Evaluation:
– Finding more needles in the needle haystack and discovering things of interest that you did not know how to look for.
Deployment:
– Publicly available on the Web using the Google Chrome Browser.

Data Preparation
Topics Knowledge Base URL Function Within Topic URLs Figure and Tables URLs Within Footnote URL Relational and Graph (Subject, Object, & Predicate) Databases
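An index that is both relational and graph-shaped can be sketched by storing the same subject, predicate, object triples in a table and in an adjacency map. This is only an illustration of the idea, not the tooling used in the presentation: the triples, predicate names, and URLs below are hypothetical placeholders, and SQLite stands in for whatever relational store is actually used.

```python
import sqlite3

# Hypothetical (subject, predicate, object) triples for a small topic index.
sample_triples = [
    ("Knowledge Base", "hasTopic", "TFIDF"),
    ("Knowledge Base", "hasTopic", "Information Gain"),
    ("TFIDF", "hasUrl", "https://example.org/kb#tfidf"),
    ("Information Gain", "hasUrl", "https://example.org/kb#information-gain"),
]


def build_relational_index(triples):
    """Load the triples into an in-memory SQLite table (the relational view)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn


def build_graph_index(triples):
    """Map each subject to its outgoing (predicate, object) edges (the graph view)."""
    graph = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, []).append((predicate, obj))
    return graph


conn = build_relational_index(sample_triples)
graph = build_graph_index(sample_triples)

# Relational query: which topics does the knowledge base contain?
topics = [row[0] for row in conn.execute(
    "SELECT object FROM triples WHERE subject = ? AND predicate = ? ORDER BY object",
    ("Knowledge Base", "hasTopic"),
)]
```

The table answers set-style questions with SQL, while the adjacency map makes it cheap to walk from a topic to its URLs and onward.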

Modeling
A searchable Information Model with Analytics (Ontology) linked to the Thesaurus (Taxonomy) linked to the Glossary (Vocabulary)

Evaluation
Find:
– The find tool is a fast way to find contents in your data, navigate in the analysis, and perform actions found in the menus of Spotfire. It consists of a text field where you enter a search string and a list of results for the search.
– To reach the Find dialog: press Ctrl+F, or select Tools > Find....
Searching in TIBCO Spotfire:
– There are many places in TIBCO Spotfire where you can search for different items. For example, you can search for filters, analyses in the library, or elements used to build information links in the Information Designer. All of the available search fields use the same basic search syntax, which is presented below. For more information regarding search of a specific item, see the links at the bottom of this page.
– Tip: If you cannot find what you are looking for, try adding more wildcards. For example, to locate a filter called "Sales ($)", enter the search expression "Sales ($*", to avoid interpreting the text within the parentheses as a Boolean expression.
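The effect of the trailing wildcard in the tip can be illustrated outside Spotfire. The sketch below does not use or reproduce Spotfire's search syntax; it assumes Python's standard fnmatch module as a stand-in for simple wildcard matching, and the filter names and helper function are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical filter names, including the one from the tip.
filter_names = ["Sales ($)", "Sales (units)", "Cost ($)"]


def find_filters(pattern, names):
    """Return the names matching a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# "*" matches any run of characters, so the closing parenthesis
# does not have to be typed.
matches = find_filters("Sales ($*", filter_names)
```

In fnmatch, the opening parenthesis and the dollar sign are ordinary characters, so the pattern matches only names that begin with "Sales ($".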

Deployment
Publicly available on the Web using the Google Chrome Browser.
Web Player