How AI Programs Collect Concepts
Ernie Davis
Concept and Categories Seminar, Sept 25, 2015

Outline
Part I: Case studies
- Systems: Probase, NELL, ConceptNet, WordNet, WikiNet, CYC
- Issues: What is a concept? How are concepts collected? How are taxonomic and other relations found? Polysemy & synonymy
- System features: size, evaluation, uses
Part II: Relation to cog. psych. concepts

What do AI people want?
- Cognitive plausibility? Not much. They are happy to note similarities if they come up.
- Philosophical coherence? No! You can quote Plato or Quine if you want to pretend to be erudite; otherwise, reading those guys will just leave you hopelessly confused.
- Human- or superhuman-level AI in [n] years? Less than you would suppose. Mostly entertainment for journalists.

What do AI people want? To build, today, a system that does something.

Probase
Wu et al. (Microsoft Research). 2.6 million concepts (categories); [n] million isA relations; 92.8% reported accuracy.
A concept is an English noun or a two- or three-word noun phrase, e.g. "country", "city", "renewable energy technology", "common sleep disorder", "meteorological phenomenon".

Probase: Taxonomy
The taxonomy comes from Hearst patterns (Hearst 1992): "countries such as England, France and Germany"; "bears, lions, and other animals".
Pitfalls (sentences that fit the patterns but yield wrong isA pairs):
- "animals other than dogs such as cats"
- "companies such as IBM, Nokia, Procter and Gamble"
- "Europe, Brazil, China, and other countries"
- "dead animals such as dogs and cats"
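
A minimal sketch of Hearst-pattern extraction (my own illustration, not Probase's pipeline; the SUCH_AS and AND_OTHER regexes and the extract_isa helper are invented for this example):

```python
import re

SEP = r", and |, or |, | and | or "  # list separators, longest alternatives first

# Two classic Hearst patterns (Hearst 1992), much simplified:
#   "X such as A, B and C"  =>  isA(A, X), isA(B, X), isA(C, X)
#   "A, B and other X"      =>  isA(A, X), isA(B, X)
SUCH_AS = re.compile(r"(\w+(?: \w+){0,2}) such as ((?:\w+(?:" + SEP + r")?)+)")
AND_OTHER = re.compile(r"((?:\w+(?:" + SEP + r")?)+),? and other (\w+)")

def split_list(items):
    """Split 'England, France and Germany' into individual terms."""
    return [t for t in re.split(SEP, items) if t]

def extract_isa(sentence):
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        pairs += [(h, m.group(1)) for h in split_list(m.group(2))]
    for m in AND_OTHER.finditer(sentence):
        pairs += [(h, m.group(2)) for h in split_list(m.group(1))]
    return pairs

print(extract_isa("countries such as England, France and Germany"))
# [('England', 'countries'), ('France', 'countries'), ('Germany', 'countries')]
print(extract_isa("bears, lions, and other animals"))
# [('bears', 'animals'), ('lions', 'animals')]
```

Even these few lines show why the pitfalls above arise: the patterns see only local word sequences, not the intended parse. Run the sketch on "dead animals such as dogs and cats" and it happily asserts isA(dogs, dead animals).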

Probase
Statistical evidence is used to rule out false concepts and taxonomic relations. E.g. "dog" and "cat" each appear in a great many extractions, while only a few texts suggest that "cat" is a subcategory of "dog", so that edge can be discarded as noise.
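
A toy version of that filtering, under my own assumptions about thresholds (Probase's actual scoring is probabilistic and more refined): keep an isA edge only if it has enough absolute and relative evidence.

```python
from collections import Counter

# isa_counts[(hyponym, hypernym)] = number of sentences asserting the pair (toy data)
isa_counts = Counter({
    ("cat", "animal"): 4200,
    ("cat", "pet"): 1100,
    ("cat", "dog"): 3,  # a handful of noisy extractions
})

def keep_edge(hypo, hyper, min_support=5, min_share=0.01):
    """Keep (hypo, hyper) only with enough absolute and relative evidence."""
    support = isa_counts[(hypo, hyper)]
    total = sum(c for (h, _), c in isa_counts.items() if h == hypo)
    return support >= min_support and support / total >= min_share

print(keep_edge("cat", "animal"))  # True
print(keep_edge("cat", "dog"))     # False: 3 sentences out of ~5300 is noise
```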

Probase: Polysemy
Example: "plants such as refineries and nuclear reactors" vs. "plants such as trees, grass, and cacti".
Rule 1: Within a single text, the hypernym has only one sense; you don't find "plants such as refineries and trees".
Rule 2: If the hyponym lists of two extractions overlap substantially, then the two uses of the hypernym have the same sense. E.g. "plants such as trees, grass, and cacti" and "plants such as grass, bushes, and trees".
These and similar rules are used to group word uses into senses (see the sketch below).
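
Rule 2 suggests a simple clustering procedure. Here is a minimal sketch (my own union-find formulation with an invented overlap threshold, not Probase's algorithm):

```python
def cluster_senses(extractions, min_overlap=2):
    """Group extractions for one hypernym ('plant') into senses.
    extractions: one set of hyponyms per 'plants such as ...' sentence."""
    parent = list(range(len(extractions)))

    def find(i):  # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(extractions)):
        for j in range(i + 1, len(extractions)):
            # Rule 2: substantial overlap means the same sense.
            if len(extractions[i] & extractions[j]) >= min_overlap:
                parent[find(i)] = find(j)

    senses = {}
    for i, hyponyms in enumerate(extractions):
        senses.setdefault(find(i), set()).update(hyponyms)
    return list(senses.values())

uses = [{"trees", "grass", "cacti"},
        {"grass", "bushes", "trees"},
        {"refineries", "nuclear reactors"}]
print(cluster_senses(uses))
# Two senses: {trees, grass, cacti, bushes} and {refineries, nuclear reactors}
```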

NELL (Never-Ending Language Learner)
Mitchell et al. (Carnegie Mellon). Using web mining, NELL constructs a taxonomy of concepts and collects facts corresponding to a fixed set of relations. 80 million beliefs (facts).

NELL: Concepts
A concept is a noun (I think); instances are proper nouns. There is no attempt to address polysemy. Features used for categorization are learned in a snowballing way:
- Name form: e.g. "…burgh" suggests a city.
- Context: e.g. "mayor of X" → X is a city.
- Lists and tables: if you see the list "London, Paris, Prague" and you know that London and Paris are cities, infer that Prague is a city.
A sketch of the snowballing loop follows.
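
A bare-bones sketch of that snowballing loop (my own illustration; NELL's coupled semi-supervised learner is far more elaborate, and the corpus, seeds, and bootstrap function here are invented):

```python
import re

def bootstrap(corpus, seeds, rounds=2):
    """Grow a category by alternating between learning context patterns
    from known instances and harvesting new instances with those patterns."""
    instances = set(seeds)
    for _ in range(rounds):
        # 1. Learn patterns: contexts in which known instances appear.
        patterns = {sent.replace(inst, "X")
                    for sent in corpus for inst in instances if inst in sent}
        # 2. Apply patterns: whatever fills the X slot becomes an instance.
        for sent in corpus:
            for pat in patterns:
                m = re.fullmatch(re.escape(pat).replace("X", r"(\w+)"), sent)
                if m:
                    instances.add(m.group(1))
    return instances

corpus = ["the mayor of London", "the mayor of Prague", "the capital of France"]
print(bootstrap(corpus, {"London"}))
# {'London', 'Prague'}: "France" is never harvested (no learned pattern fits)
```

The danger, which NELL mitigates by coupling the learning of many categories and relations, is semantic drift: one bad extraction breeds bad patterns, which breed more bad extractions.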

NELL: Taxonomy
Seems to be largely accurate, though lopsided (e.g. [n] instances of "amphibian" vs. 0 instances of "poem").

NELL: Facts
Hit and miss; mostly unexciting. I did an experiment, collecting 110 facts from NELL:
- 64 were true.
- 16 were nearly true, e.g. "Dublin Dublin is the capital city of the country Ireland".
- 14 were false, e.g. "dipodidae is an arthropod" (actually a rodent).
- 5 were hopelessly vague, e.g. "David is a person who died at age 10."
- 11 were meaningless, e.g. "states is a state or province located in the geopolitical location field".

ConceptNet Starts with the collection of facts amassed by Open Mind Common Sense, a crowd-sourcing platform. “Concepts … could be noun phrases, verb phrases, adjective phrases, or clauses. ConceptNet defines concepts as the equivalence class of phrases after normalization removing function words, pronouns, and inflections.”
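
A crude sketch of that normalization (the stop-word list and suffix stripping here are invented; ConceptNet's real pipeline uses a proper lemmatizer):

```python
STOP_WORDS = {"a", "an", "the", "of", "to", "is", "you", "your", "it"}

def stem(word):
    """Crude inflection stripping; a real system would lemmatize properly."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def normalize(phrase):
    """Map a phrase to its concept: drop function words, strip inflections."""
    return " ".join(stem(t) for t in phrase.lower().split()
                    if t not in STOP_WORDS)

# Different surface phrases collapse to the same concept:
print(normalize("playing the guitar"))  # 'play guitar'
print(normalize("plays a guitar"))      # 'play guitar'
```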

Relations and Patterns
IsA: NP is a kind of NP
UsedFor: NP is used for VP
CapableOf: NP can VP
Desires: NP wants to VP
CreatedBy: You make NP by VP
PartOf: NP is part of NP
HasProperty: NP is AP
Causes: The effect of NP|VP is NP|VP
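
To make the pairing concrete, the templates can be stored as relation-to-pattern mappings and matched against sentences (a hypothetical sketch, not ConceptNet's extraction code):

```python
import re

# Relation -> sentence template; slots <a> and <b> capture the two arguments.
PATTERNS = {
    "IsA":         r"(?P<a>.+) is a kind of (?P<b>.+)",
    "UsedFor":     r"(?P<a>.+) is used for (?P<b>.+)",
    "CapableOf":   r"(?P<a>.+) can (?P<b>.+)",
    "Desires":     r"(?P<a>.+) wants to (?P<b>.+)",
    "CreatedBy":   r"You make (?P<a>.+) by (?P<b>.+)",
    "PartOf":      r"(?P<a>.+) is part of (?P<b>.+)",
    "HasProperty": r"(?P<a>.+) is (?P<b>.+)",
    "Causes":      r"The effect of (?P<a>.+) is (?P<b>.+)",
}

def extract(sentence):
    """Return every (relation, arg1, arg2) triple whose template matches."""
    return [(rel, m.group("a"), m.group("b"))
            for rel, pat in PATTERNS.items()
            if (m := re.fullmatch(pat, sentence))]

print(extract("a knife is used for cutting"))
# [('UsedFor', 'a knife', 'cutting'), ('HasProperty', 'a knife', 'used for cutting')]
```

Note that the templates overlap ("NP is AP" matches almost any "is" sentence), which is one reason pattern-matched knowledge is noisy.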

ConceptNet

WordNet
A hand-constructed lexicon of English (Miller 1995). Word senses are disambiguated. Word senses are related by synonymy, antonymy, hypernymy, hyponymy, and meronymy (part/whole). A concept is a synset (a collection of synonymous word senses). 117,000 synsets. WordNets exist (to some extent) for almost 50 languages.
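
WordNet is easy to explore directly, e.g. through NLTK's interface (requires the nltk package and a one-time nltk.download('wordnet')):

```python
from nltk.corpus import wordnet as wn  # pip install nltk; then nltk.download('wordnet')

# A concept is a synset: a set of synonymous word senses.
for synset in wn.synsets("plant")[:3]:
    print(synset.name(), "-", synset.definition())

# Walk up the hypernym chain from one sense of "dog".
s = wn.synset("dog.n.01")
while s.hypernyms():
    s = s.hypernyms()[0]
    print(s.name())  # canine.n.02, carnivore.n.01, ... up to entity.n.01
```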

WikiNet (Nastase et al. 2010)
A concept "roughly corresponds to a Wikipedia article". Defers to the wisdom of crowds (as mediated by the abstruse, bureaucratic, contentious process that is Wikipedia editing). Relations are extracted from text and infoboxes. Built from the multilingual Wikipedia; entities and categories are aligned across languages. 90+% accuracy.

Sample categories
- Members of Queen (band)
- Movies directed by Woody Allen
- Villages in Brandenburg
- Mixed Martial Arts Television Programs
490,215 categories. 3.3M concepts. 36M relations, including: instance 10M, subcategory 400K, spatial 4M, nationality 570K, topic 340K, genre 330K.
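
Category names like these encode relations almost for free. A toy parser (my own illustration with invented patterns, not WikiNet's extraction machinery) can read relations straight off the names:

```python
import re

# Invented mapping from category-name shapes to relations.
CATEGORY_PATTERNS = [
    (re.compile(r"Members of (.+)"), "member_of"),
    (re.compile(r"Movies directed by (.+)"), "directed_by"),
    (re.compile(r"Villages in (.+)"), "located_in"),
]

def relations_from_category(category, member_articles):
    """Turn a Wikipedia category plus its member articles into relation triples."""
    for pattern, relation in CATEGORY_PATTERNS:
        m = pattern.fullmatch(category)
        if m:
            return [(article, relation, m.group(1)) for article in member_articles]
    # Fallback: treat the bare category name as a class.
    return [(article, "instance_of", category) for article in member_articles]

print(relations_from_category("Members of Queen (band)",
                              ["Freddie Mercury", "Brian May"]))
# [('Freddie Mercury', 'member_of', 'Queen (band)'),
#  ('Brian May', 'member_of', 'Queen (band)')]
```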

CYC
A long-term project (1985-present) to encode commonsense knowledge. Knowledge is hand-coded by knowledge engineers. Concepts are chosen to optimize the knowledge encoding; they are only secondarily related to natural-language words (at least in principle). Very precise conceptual distinctions.

CYC Concepts
"Just about anything can be reified: a particular proposition, a type of predicate, a problem solving context, an inference mechanism, etc."
Some concepts (from the 1990 book):
- PaperboyDeliveringNewspapersAsABusinessEvent
- PaperboyDeliveringNewspapersAsATravelEvent
- EgyptIn1986, PhysicalEgyptIn1986, PoliticalEgyptIn1986
- YoungerThanParentsConstraint

CYC
Partly open, partly proprietary; poorly described.
- OpenCyc (public): 200K concepts, 2M facts.
- ResearchCyc (can be licensed): 500K concepts, 5M facts.

Relation to Cog. Psych.
Where are: concept learning? definitional vs. prototype vs. exemplar theories?
- Classification learning (supervised vs. unsupervised) is a different kind of AI.
- Synthetic categories? Only of theoretical interest; simple algorithms have superhuman abilities.
- Basic level vs. sub/superordinate categories: ??

Hard Questions
- How many concepts? AI: 100,000s to millions. Cog. psych.: ??
- How to evaluate recall (coverage)?
- What is a concept?
- The relation of concepts to Mentalese primitives.
- English vs. other languages: does it make any difference?