Presentation transcript:

1 Push Singh & Tim Chklovski

2 AI systems need data – lots of it!
- Natural language processing: parsed & sense-tagged corpora, paraphrases, translations
- Commonsense reasoning: facts, descriptions, scripts, rules, exceptions
- Computer vision and speech recognition: segmented images, transcribed speech
- Robotics: motion capture data, body configurations

3 Traditional Sources
1. Knowledge engineering, programming
   pro: can be high quality
   cons:
   - often brittle because of lack of coverage
   - expensive!
2. Learning from raw data
   pro: there is sometimes a lot of raw data
   cons:
   - you have little control over the data
   - for many tasks the data is not available
   - hard to learn structured representations

4 Solution: turn to the general public!
There are 500,000,000 people on-line (Nielsen/NetRatings). People can participate by:
- Providing labeled training examples (e.g. for OCR)
- Tagging corpora (with part-of-speech tags, word senses)
- Verifying and cleaning data (validating assertions; see the sketch below)
- Supplying rules and examples (assertions, stories)
- Evaluating the performance of systems (e.g. of a face recognizer)
- Online supervised learning (e.g. Stork's Animals)
- Organizing and structuring information (e.g. the web)
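
As a rough illustration of the "verify and clean data" mode of participation, the sketch below shows how contributed assertions could be collected and then confirmed by several independent volunteers before being accepted. The class name, vote threshold, and example assertion are assumptions for illustration only, not the actual Open Mind implementation.

# Hypothetical sketch of a contribute-and-verify loop for plain-English assertions.
# Names and thresholds are illustrative assumptions, not the actual Open Mind code.

import random
from collections import defaultdict

class AssertionPool:
    """Stores user-contributed assertions and their verification votes."""

    def __init__(self, approve_threshold=3):
        self.votes = defaultdict(lambda: {"yes": 0, "no": 0})
        self.approve_threshold = approve_threshold  # assumed cutoff for acceptance

    def contribute(self, assertion):
        # A contributor supplies a plain-English assertion, e.g. "a fork is used for eating".
        self.votes[assertion]  # register it with zero votes

    def verify(self, assertion, is_true):
        # Another contributor confirms or rejects the assertion.
        self.votes[assertion]["yes" if is_true else "no"] += 1

    def next_to_verify(self):
        # Pick an assertion that still lacks enough confirmations.
        unverified = [a for a, v in self.votes.items()
                      if v["yes"] < self.approve_threshold]
        return random.choice(unverified) if unverified else None

    def accepted(self):
        # Assertions confirmed by enough independent contributors.
        return [a for a, v in self.votes.items()
                if v["yes"] >= self.approve_threshold and v["yes"] > v["no"]]

pool = AssertionPool()
pool.contribute("a fork is used for eating")
for _ in range(3):
    pool.verify("a fork is used for eating", True)
print(pool.accepted())  # ['a fork is used for eating']

In a deployed system the pool would sit behind a web form, with next_to_verify() deciding which assertion to show the next visitor.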

5 Successful Distributed Human Projects
The Open Directory Project (www.dmoz.org)
- indexes 3,248,314 sites
- 46,846 editors
FreeDB (www.freedb.org)
- 543,786 CDs catalogued
Others:
- The Internet Movie Database (www.imdb.com)
- American Psychological Society (psych.hanover.edu)
- Distributed Proofreaders (charlz.dns2go.com/gutenberg)
- NASA crater marking project (clickworkers.arc.nasa.gov/top)

6 Open Mind Common Sense
Second-largest commonsense database after Cyc:
- 410,000 assertions, stories, descriptions, rules, etc.
- Built by 8,600 users over 1½ years
- Can extract relations and rules via shallow parsing (see the sketch below)
Basis for several applications and experiments:
- ARIA photo annotation and retrieval agent
- GOOSE commonsense search engine
- MAKEBELIEVE story generator
- Intelligent camera, analogical reasoner
- Word sense disambiguator
(Henry Lieberman, Hugo Liu, Barbara Barry, Thomas Lin)
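
To make "extract relations via shallow parsing" concrete, here is a minimal pattern-matching sketch: a few English templates are matched against a contributed sentence and turned into (relation, argument, argument) triples. The templates and relation names are illustrative assumptions, not the actual Open Mind Common Sense extraction rules.

# Illustrative sketch of pattern-based relation extraction from contributed sentences.
# The patterns and relation names are assumptions, not the actual OMCS extraction rules.

import re

# Each pattern maps a simple English template to a relation name.
PATTERNS = [
    (re.compile(r"^(?:an?\s+)?(.+?) is used for (.+)$", re.I), "UsedFor"),
    (re.compile(r"^(?:an?\s+)?(.+?) is a kind of (.+)$", re.I), "IsA"),
    (re.compile(r"^you can find (?:an?\s+)?(.+?) in (?:an?\s+|the\s+)?(.+)$", re.I), "LocationOf"),
]

def extract_relation(sentence):
    """Return (relation, arg1, arg2) if the sentence matches a known template."""
    sentence = sentence.strip().rstrip(".")
    for pattern, relation in PATTERNS:
        match = pattern.match(sentence)
        if match:
            return relation, match.group(1).strip(), match.group(2).strip()
    return None  # no template matched; a real system would fall back to a chunker or parser

print(extract_relation("A fork is used for eating."))
# ('UsedFor', 'fork', 'eating')
print(extract_relation("You can find a fork in the kitchen."))
# ('LocationOf', 'fork', 'kitchen')

A real extractor would use many more templates plus a part-of-speech tagger or chunker, but the template-to-relation mapping is the core idea.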

7 The Snowball Effect
Two systems that leverage what they already know and give feedback to the contributors.
Word Sense Disambiguation:
- Lets users select which sense a word is used in within a given sentence
- Uses the collected information to decide where more learning is needed
- Provides feedback on how much an automatic tagger has improved because of your contribution
(learner.media.mit.edu/cgi/wsd-collect-tagging.cgi)
Learner:
- Gathers commonsense knowledge by asking questions that the system thinks may be true
- Questions are formed by making analogies based on existing knowledge (see the sketch below)
(forthcoming, see www.media.mit.edu/~timc/learner)
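
A minimal sketch of what "forming questions by analogy" could look like: objects that share known properties with the target lend their remaining properties as candidate questions, and stronger analogies are asked about first. The tiny knowledge base and scoring heuristic are assumptions for illustration; they are not the actual Learner algorithm.

# Illustrative sketch of analogy-driven question generation, in the spirit of Learner.
# The knowledge base and similarity heuristic are assumptions, not the real implementation.

from collections import defaultdict

# Known (object, property) assertions, e.g. collected from earlier contributions.
KNOWN = {
    ("cat", "is a pet"), ("cat", "has fur"), ("cat", "can be fed"),
    ("dog", "is a pet"), ("dog", "has fur"),
    ("hamster", "is a pet"),
}

def properties_of(obj):
    return {p for (o, p) in KNOWN if o == obj}

def generate_questions(target):
    """Propose unverified assertions about `target` by analogy with similar objects."""
    target_props = properties_of(target)
    scores = defaultdict(int)
    for other in {o for (o, _) in KNOWN if o != target}:
        shared = target_props & properties_of(other)
        if shared:  # the more properties shared, the stronger the analogy
            for prop in properties_of(other) - target_props:
                scores[prop] += len(shared)  # accumulate evidence from each analogue
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [f"Is it true that a {target} {prop}?" for prop in ranked]

print(generate_questions("hamster"))
# ['Is it true that a hamster has fur?', 'Is it true that a hamster can be fed?']

Each "yes" or "no" answer then becomes new knowledge, which in turn enables better analogies: the snowball effect.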

8 The Pyramid of Tasks
From the narrow tip to the broad base, with the prior experience required for each tier in parentheses:
- Core (some Lisp / Scheme experience, knowledge representation / AI background)
- Write plug-ins (knowledge representation / AI background or interest, some programming)
- Contribute inference rules (analytical skills, familiarity with reasoning; see the sketch below)
- Contribute and verify simple assertions (possess common sense)
This is because the prior experience required is an inverse pyramid: the broader the tier, the less specialized the background needed.
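
For a sense of what a contributed inference rule might look like at that tier, here is a hypothetical if-then rule applied over simple triples. The rule format, relation names, and facts are assumptions for illustration, not the actual Open Mind representation.

# Hypothetical illustration of a contributed inference rule over assertion triples.
# The rule, relation names, and facts are assumptions, not the actual Open Mind format.

FACTS = {("fork", "IsA", "utensil"), ("utensil", "AtLocation", "kitchen")}

# Rule: if X is a kind of Y, and Y is found at Z, then X is (probably) found at Z.
def apply_transitive_location(facts):
    inferred = set()
    for (x, r1, y) in facts:
        for (y2, r2, z) in facts:
            if r1 == "IsA" and r2 == "AtLocation" and y == y2:
                inferred.add((x, "AtLocation", z))
    return inferred

print(apply_transitive_location(FACTS))
# {('fork', 'AtLocation', 'kitchen')}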

9 Future Open Mind Projects
- Using webcams, thousands of people help teach their computers to recognize the appearance and behavior of various kinds of objects.
- A system that reads text on the web, but has people help it comprehend passages.
- A dialogue system that people teach how to have a conversation.
- Using future cell phones and wearable computers, we could all start to teach computers the patterns of our everyday lives by letting them see and hear us as we actually do things in the world.

10 Open Questions
- How do we get users hooked?
- How do we acquire more sophisticated knowledge?
- Can we acquire hard-to-articulate knowledge?
- How do we use knowledge that is easy to acquire?

11 Fear, Rejoice, ...
Fear:
- You have to learn how to build a community of contributors.
- You have to make your research accessible.
Rejoice:
- Unlimited number of UROPs!
Mom and Dad:
- Come to our web site! You can help too.
Discoveries:
- If you build it, they will come.
- Malevolent users are not a big problem.
Recent:
- Built the 2nd-largest commonsense database.
- Useful in prototype applications.

