Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002). Amit Sheth, CTO, Semagix Inc.

1 Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002)
Amit Sheth, CTO, Semagix Inc.; Large Scale Distributed Information Systems (LSDIS) Lab, University of Georgia; http://lsdis.cs.uga.edu
October 24, 2002 © Amit Sheth
Based on Keynote CONTENT- AND SEMANTIC-BASED INFORMATION RETRIEVAL @ SCI 2002

2 I am not selling any product here. It is interesting to note that SW = Software has moved to SW = Semantic Web.

3 Fundamental Issue
Ontology creation and maintenance
– Human consensus + automatic KB (assertion) extraction
Automatic semantic annotation
Extremely fast computations exploiting semantic metadata
– Especially named relationships

4 Semantic Enterprise Content Management Challenges
1. More variety and complexity
- More formats (MPEG, PDF, MS Office, WM, Real, AVI, etc.)
- More types (Docs, Images -> Audio, Video, variety of text: structured, unstructured)
- More sources (internal, extranet, internet, feeds)
2. Scalability, Information Overload
- Too much data, precious little information (Relevance)
3. Creating Value from Content
- How to distribute the right content to the right people as needed? (Personalization -- book of business)
- Customized delivery for different consumption options (mobile/desktop, devices)
- Insight, Decision Making (Actionable)

5 New Enterprise Content Management Technical Challenges
1. Aggregation
- Feed handlers/Agents that understand content representation and media semantics
- Push-pull, Web-DB-Files, Structured-Semi-structured-Unstructured data of different types
2. Homogenization and Enhancement
- Enterprise-wide common view
- Domain model, taxonomy/classification, metadata standards
- Semantic Metadata, created automatically if possible
3. Semantic Applications
- Search, personalization, directory, alerts, etc. using metadata and semantics (semantic association and correlation), for improved relevance, intelligent personalization, customization

6 Semantics: The Next Step in the Web’s Evolution
The Semantic Web -- a vision with several views:
- “The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what data means to process it.” [B99]
- “The semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” [BHL01]
- “The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.” [W3C01]

7 Central Role of Metadata
Steps of the content value chain and the question metadata answers at each:
- Produce / Aggregate: Where is the content? Whose is it?
- Catalog / Index: What is this content about?
- Integrate / Syndicate: What other content is it related to?
- Personalize: What is the right content for this user?
- Interactive Marketing: What is the best way to monetize this interaction?
Broadcast, Wireline, Wireless, Interactive TV
[Diagram labels: Semantic Metadata; Applications; Back End]
"A Web content repository without metadata is like a library without an index." - Jack Jia, IWOV
“Metadata increases content value in each step of content value chain.” - Amit Sheth

8 A Metadata Classification
Data (Heterogeneous Types/Media)
Content Independent Metadata (creation-date, location, type-of-sensor...)
Content Dependent Metadata (size, max colors, rows, columns...)
Direct Content Based Metadata (inverted lists, document vectors, LSI)
Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
Domain Specific Metadata: area, population (Census); land-cover, relief (GIS); metadata concept descriptions from ontologies
Ontologies, Classifications, Domain Models
User
More semantics for relevance, to tackle information overload!!

9 Semantic Metadata Extraction, Semantic Annotation
Sources: WWW, Enterprise Repositories; Nexis, UPI, AP; Feeds/Documents; Digital Maps; Digital Audios; Digital Videos; Digital Images; Data Stores; ...
METADATA EXTRACTORS
Key challenge: create/extract as much (semantic) metadata automatically as possible

10

11 Semantic Content Organization and Retrieval Engine (SCORE) technology
- Automatically aggregates and extracts information from disparate sources and multiple formats
- Automatically tags/annotates and categorizes content
- Automatically creates relevant associations: maps content topics and their relationships
- Semantic query engine relates information and knowledge both internal and external to the organization into a single view

12 Semagix Freedom Product Components

13 Knowledge Sources Used
Market Guide (MG); ZDNet (ZD); Hoover’s (H); Data supplied from NASA (DPL); Federation of American Scientists (FAS); Central Intelligence Agency (CIA); The Interdisciplinary Center (ICT); Federal Bureau of Investigation (FBI); Capital Advantage (CA); Office of Foreign Assets Control (OFAC)
Entity Classes and Relationships populated by these knowledge sources:
PERSON (OFAC, FBI, DPL)
- politician (OFAC, FBI, CIA, CA)
    politician associated with politicalOrganization
    politician held politicalOffice
    politician associated with politicalOffice
- terrorist (OFAC, FBI, DPL)
    terrorist memberOf organization
    terrorist appears on watchList
- companyExecutive (MG)
    companyExecutive holdsOffice companyPosition
person has permanent address address (OFAC, FBI)
person has dob (date of birth) (OFAC, FBI)
person has pob (place of birth) (OFAC, FBI)
THING
- event (ICT)
    terroristOrganization participated in terroristSponsoredEvent (ICT)
- politicalOffice (CIA, CA)
    politicalOffice office(s) within govtOrganization
    politicalOffice associated with organization
- watchList (OFAC, FBI, DPL)
    terroristOrganization appears on watchList (OFAC, FBI, DPL)
- organization (OFAC, FBI, FAS, ICT, CA, CIA)
    organization appears on watchList
    organization memberOf suborganization
- company
    company manufactures product (ZD)
    company identifiedBy tickerSymbol (H)
    companyPosition position in company (MG)
    company memberOf industry (H)
- tickerSymbol (H)
    tickerSymbol memberOf exchange (H)
PLACE
- organization located in place (H, OFAC)
- religiousAffiliation practiced in place (CIA)
- company headquarters in city (H)
JIVA
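One way to picture the entity classes and named relationships on this slide is as a small set of typed triples, each tagged with the knowledge source that supplied it. The sketch below is illustrative only: the `Triple` type, the helper functions, and all instance data (`person_a`, `org_x`, `list_1`) are hypothetical and do not describe the Semagix implementation. Only the class and relationship names come from the slide.

```python
from dataclasses import dataclass

# Class hierarchy taken from the slide: subclass -> parent class.
SUBCLASS_OF = {
    "politician": "PERSON",
    "terrorist": "PERSON",
    "companyExecutive": "PERSON",
    "organization": "THING",
    "watchList": "THING",
    "company": "THING",
}

@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str  # a named relationship such as "memberOf"
    obj: str
    source: str    # knowledge-source abbreviation, e.g. "FBI"

# Hypothetical instances, invented purely for illustration.
INSTANCE_CLASS = {
    "person_a": "terrorist",
    "org_x": "organization",
    "list_1": "watchList",
}
TRIPLES = [
    Triple("person_a", "memberOf", "org_x", "FBI"),
    Triple("org_x", "appearsOn", "list_1", "OFAC"),
]

def is_a(instance: str, cls: str) -> bool:
    """True if the instance's class is cls or inherits from it."""
    current = INSTANCE_CLASS.get(instance)
    while current is not None:
        if current == cls:
            return True
        current = SUBCLASS_OF.get(current)
    return False

def related(subject: str, relation: str) -> list[str]:
    """Objects linked to subject by the given named relationship."""
    return [t.obj for t in TRIPLES
            if t.subject == subject and t.relation == relation]
```

Keeping the source abbreviation on every triple mirrors the way the slide annotates each class and relationship with the knowledge sources that populate it.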

14 SCORE Capabilities
- Semantics (understanding of content and user needs)
- Extreme relevance
- Semantic associations
- Near real-time
- Multiple applications/usage patterns (not just search)
- Automation
- Scalability in all aspects

15 Technologies Involved
- Ontology-driven architecture (definitional, assertional components)
- Automatic classification with classifier committee (multiple technologies, rather than one size fits all)
- Automatic semantic metadata extraction/annotation
- Semantic associations/knowledge inferences
- Scalability throughout with distributed architecture and implementation (number of content and knowledge sources, indexing, etc.)
- Main memory implementation, incremental checkpointing
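The "classifier committee" idea can be sketched as majority voting over several independent classifiers. This is an illustration under stated assumptions: the three toy classifiers, the category labels, and the voting rule are all hypothetical, and the slide does not say how SCORE actually combines its classifiers.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

Classifier = Callable[[str], str]

def keyword_classifier(text: str) -> str:
    # Toy rule: a finance keyword suggests business content.
    return "business" if "stock" in text.lower() else "other"

def ticker_classifier(text: str) -> str:
    # Toy rule: an all-caps token of 2-4 letters looks like a ticker symbol.
    def looks_like_ticker(token: str) -> bool:
        return token.isalpha() and token.isupper() and 2 <= len(token) <= 4
    return "business" if any(looks_like_ticker(t) for t in text.split()) else "other"

def suffix_classifier(text: str) -> str:
    # Toy rule: a company suffix suggests business content.
    return "business" if any(s in text for s in ("Inc.", "Corp.")) else "other"

def committee_classify(text: str, members: Sequence[Classifier]) -> Tuple[str, float]:
    """Return the majority label and the fraction of members that voted for it."""
    votes = Counter(member(text) for member in members)
    label, count = votes.most_common(1)[0]
    return label, count / len(members)

# Hypothetical committee made of the three toy classifiers above.
COMMITTEE = [keyword_classifier, ticker_classifier, suffix_classifier]
```

Returning the agreement fraction alongside the label lets a caller route low-agreement items to a human reviewer rather than accept them automatically.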

16 Automatic Categorization & Metadata Tagging (unstructured text)
[Figure labels: Video with Editorialized Text on the Web; Auto Categorization; Semantic Metadata]

17 Semantic Metadata Extraction/Annotation: Semi-structured source
[Figure labels: Web Page; Extraction Agent; Enhanced Metadata Asset]

18 Semantic Content Enhancement Workflow
[Figure labels: Semantic Metadata; Syntax Metadata]

19 Content Asset Index Evolution

20 Intelligent Content Empowers the User
Extractor Agents + Value-added Metadata + Semantic Associations = Intelligent Content for the End-User
- Extractor Agents: content which does contain the words the user asked for
- Value-added Metadata: content which does not contain the words the user asked for, but is about what he asked for
- Semantic Associations: content the user did not think to ask for, but which he needs to know
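The three kinds of content described on this slide can be sketched as three retrieval tiers: a literal word match, a match on extracted metadata, and a match reached through a named association. Everything in the sketch is hypothetical (the documents, the `about` metadata field, the company names Acme and Globex, and the single competitor association); it only illustrates the distinction the slide draws.

```python
# Hypothetical documents with extracted "about" metadata.
DOCS = {
    "d1": {"text": "Acme reports quarterly earnings", "about": {"Acme"}},
    "d2": {"text": "Widget maker opens new plant", "about": {"Acme"}},
    "d3": {"text": "Globex unveils rival widget", "about": {"Globex"}},
}

# Hypothetical named relationship from an ontology: entity -> competitors.
COMPETITOR_OF = {"Acme": {"Globex"}}

def retrieve(entity: str) -> dict:
    """Split documents into the three tiers described on the slide."""
    tiers = {"keyword": [], "metadata": [], "associated": []}
    related = COMPETITOR_OF.get(entity, set())
    for doc_id, doc in sorted(DOCS.items()):
        if entity.lower() in doc["text"].lower():
            # Contains the words the user asked for.
            tiers["keyword"].append(doc_id)
        elif entity in doc["about"]:
            # Lacks the words, but metadata says it is about the entity.
            tiers["metadata"].append(doc_id)
        elif doc["about"] & related:
            # Not asked for, but reachable through a semantic association.
            tiers["associated"].append(doc_id)
    return tiers
```

A plain keyword engine would return only the first tier; the second and third tiers depend on the metadata and the ontology, respectively.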

21 Semantic Application Example – Analyst Workbench
- Focused relevant content organized by topic (semantic categorization)
- Automatic content aggregation from multiple content providers and feeds
- Related relevant content not explicitly asked for (semantic associations)
- Competitive research inferred automatically
- Automatic 3rd-party content integration

22 Semantic Web – Intelligent Content
[Diagram labels, centered on COMPANY: Related Stock News; Industry News; Technology Products; SEC; EPA; Regulations; Competition; COMPANIES in Same or Related INDUSTRY; COMPANIES in INDUSTRY with Competing PRODUCTS; Impacting INDUSTRY or Filed By COMPANY; Important to INDUSTRY or COMPANY]
Intelligent Content = What you asked for + What you need to know!

23 Knowledge-based & Manual Associations
[Figure labels: Syntax Metadata; Semantic Metadata; led by; Same entity; Human-assisted inference]

24 Blended Semantic Browsing and Querying (Intelligence Analyst Workbench)

25 Innovations that affect User Experience
BSBQ: Blended Semantic Browsing and Querying
– Ability to query and browse relevant desired content in a highly contextual manner
Seamless access/processing of Content, Metadata and Knowledge
– Ability to retrieve relevant content, view related metadata, access relevant knowledge and switch between all the above, allowing the user to follow his train of thought
dACE: dynamic Automatic Content Enhancement
– Ability to provide enhanced annotation features, allowing the user to retrieve relevant knowledge about significant pieces of content during content consumption
Semantic Engine APIs with XML output
– Ability to create customized APIs for the Semantic Engine involving Semantic Associations with XML output to cater to any user application

26 [Diagram labels: Visionics; AcSys; Security Portal; Check-in; Interrogation; Boarding Gate; Airport; Airspace; Semagix Ontology; Metabase; Threat Scoring; Gov’t Watchlists; News Media; Web Info; LexisNexis; RiskWise; Passenger Records; Reservation Data; Airline Data; Airport Data; Airline and Airport Data; Future and Current Risks; Airport LEO; ARC; AvSec Manager; Data Management; Data Mining; IPG]

27 Sources Used
Knowledge Sources: FBI - Most Wanted Terrorists; Denied Persons Lists; Terrorism Files; ICT; Office of Foreign Asset Control (OFAC); Hamas terrorists; CNN Locations; FAA_Airport_Codes; About.com; Comtex_International; Hindustan Times; JerusalemPost; CNN; Newstrove_Hamas
Content Sources: Africa News Service; AFX News – Asia/UK/Europe; AP Worldstream; Asia Pulse; BusinessWire; ComputerWire (CTW); EFE News Services; FWN Select; Itar-TASS; Knight Ridder News (Open); Knight-Ridder Open; M2 - International; M2 Airline Industry Information; New World Publishing; PR Newswire; PRLine (PRL); Resource News International; RosBusiness; United Press International; UPI Spotlights

28 Interrogation Kiosk – Unique Advantages of Semagix
Semagix’s Semantic Technology enables flight authorities to:
- take a quick look at the passenger’s history
- check quickly if the passenger is on any official watchlist
- interpret and understand the passenger’s links to other organizations (possibly terrorist)
- verify if the passenger has boarded the flight from a “high risk” region
- verify if the passenger originally belongs to a “high risk” region
- check if the passenger’s name has been mentioned in any news article along with the name of a known bad guy
[Screenshot label: SmithJohn]
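The watchlist and organization-link checks on this slide can be read as a graph search over named relationships: start at the passenger and follow links until a watchlist entry is reached. The sketch below uses breadth-first search over invented edges. The data, the relation names `memberOf` and `appearsOn` as used here, and the function `path_to_watchlist` are assumptions for illustration, not the product's actual method.

```python
from collections import deque
from typing import List, Optional

# Hypothetical edges (subject, relation, object); all names are invented.
EDGES = [
    ("passenger_1", "memberOf", "org_a"),
    ("org_a", "memberOf", "org_b"),        # org_a is nested inside org_b
    ("org_b", "appearsOn", "watchlist_x"),
    ("passenger_2", "memberOf", "org_c"),  # org_c has no watchlist link
]

def path_to_watchlist(start: str) -> Optional[List[str]]:
    """Shortest chain of entities from start to a watchlist, or None."""
    graph = {}
    for subject, relation, obj in EDGES:
        graph.setdefault(subject, []).append((relation, obj))

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for relation, nxt in graph.get(path[-1], []):
            if nxt in seen:
                continue
            if relation == "appearsOn":
                # Reached a watchlist entry: return the full chain.
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None
```

Returning the whole chain rather than a yes/no answer gives the official the intermediate organizations, which is what lets a person interpret the link instead of only being alerted to it.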

29 Threat Score Components (passenger: SmithJohn)
LEXIS NEXIS ANNOTATION
Action: Information about or related to the passenger returned by Lexis Nexis is enhanced by linking important entities to Semagix’s rich ontology
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text and further automatically correlate it with other data in the ontology to present a clear picture about the passenger to the flight official
Flight Country Check: 45, 0.15
Person Country Check: 25, 0.15
Nested Organizations Check: 75, 0.8
Aggregate Link Analysis Score: 17.7
LINK ANALYSIS
Action: Semantic analysis of the various components (watchlist, Lexis Nexis, ontology search, metabase search, etc.) to come up with an aggregate threat score for the passenger
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text, automatically correlate it with other data in the ontology, and search for relevant content to present an overall idea of the threat level of the passenger, allowing the flight official to take quick action
appearsOn watchList : FBI
ONTOLOGY SEARCH
Action: Semagix’s rich ontology is searched for this name, and associated information such as position, aliases, and relationships (past or present) of this name to other organizations, watchlists, country, etc. is retrieved
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge about a passenger and automatically correlate it with other data in the ontology to present a visual association picture to the flight official
METABASE SEARCH
Action: Semagix’s rich metabase is searched for this name, and associated content stories mentioning the passenger’s name are retrieved
Ability Proven: Ability to automatically aggregate and retrieve relevant content stories, field reports, etc. about the passenger that can be used by flight officials to determine if the passenger has any connections with known bad people or organizations
WATCHLIST ANALYSIS
Action: Semagix’s rich ontology is automatically searched for the possible appearance of this name on any of the watchlists
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, automatically correlate it, and rank the threat factors to indicate the threat level of the passenger on the watchlist front

30 Query Comparison: Semagix vs. RDBMS

31 Performance
Population/update rate in an ontology with 1 million entities/relationships: > 10,000 entities/relationships per hr.
Incremental Index Update Frequency: 1 minute (near real-time)
Query Response Time (64 concurrent users): 65 ms
Query Response Time (light load): 1 - 10 ms
Queries per server per hour: > 1,980,000
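As a quick sanity check on these figures, the hourly throughput can be converted to a per-second rate and compared with what the loaded response time implies. The second step applies Little's law (throughput = concurrency / response time) and assumes each of the 64 users always has exactly one query in flight. That is a back-of-the-envelope assumption added here, not a figure from the slide.

```python
# Figures quoted on the slide.
QUERIES_PER_HOUR = 1_980_000
CONCURRENT_USERS = 64
RESPONSE_TIME_S = 0.065  # 65 ms under 64 concurrent users

def queries_per_second(per_hour: float) -> float:
    """Convert an hourly query count to a per-second rate."""
    return per_hour / 3600

def littles_law_throughput(concurrency: int, response_time_s: float) -> float:
    """Upper-bound throughput if every user always has one query in flight."""
    return concurrency / response_time_s

quoted_rate = queries_per_second(QUERIES_PER_HOUR)
implied_rate = littles_law_throughput(CONCURRENT_USERS, RESPONSE_TIME_S)
print(f"quoted:  {quoted_rate:.0f} queries/s")
print(f"implied: {implied_rate:.0f} queries/s (assumes no user think time)")
```

The quoted hourly figure works out to 550 queries per second. That sits below the saturated-user bound of roughly 985 per second, so the two numbers are at least consistent with each other.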

32 More at www.semagix.com and http://lsdis.cs.uga.edu/lib/presentations.html

