Logistics
Location
– March 12: The Interchurch Center (TIC), Room C&D
– March 13: The Interschool Laboratory (IL), CEPSR 750
Lunch & Breaks
– Same room on both days
Dinner Monday March 12 (today)
– If you have not signed up, please do so by 10:30am Monday March 12
Restrooms
– Monday (TIC): lower level; take the escalators down one floor, then go to the left of the cafeteria
– Tuesday (IL): same floor (signs are posted)
Logistics
Wifi: general Wifi access
– SSID: guest@interchurch
– User name: guest
– Password: guest12345
Presentations
– Please send them to email@example.com or give them to him on a flash drive during a break ahead of the firstname.lastname@example.org
Today’s Agenda Highlights
9:00 - 9:30am Introductions and overarching goals of the workshop
9:30 - 10:30am Discussion: What is STS? [Item A]
10:30 - 11:00am Coffee break
11:00 - 11:30am SemEval 2012 STS Task
11:30am - 12:00pm Sample manual annotation by participants
12:00 - 1:00pm Discussion of participant annotations
1:00 - 2:00pm Lunch
2:00 - 2:30pm Evaluation of STS [Item B]
2:30 - 4:00pm NLP applications that would benefit from STS [Item C]
4:00 - 4:30pm Coffee break
4:30 - 5:30pm How to create an STS blackbox? [Item D]
Game plan for both days
This is a working workshop: all participants, both those physically present and those participating remotely, are encouraged (urged) to participate and contribute
Each session is led by either Mona or Eneko, but discussion is expected throughout
At the end of each session, we will go over a summary and action points from the session where relevant
Acknowledgments: Credit where due
Ido Dagan
Martha Palmer
Dan Cer
Alessandro Moschitti
SIGLEX Board members: Diana McCarthy, Katrin Erk, Sebastian Pado, Rada Mihalcea
Nancy Ide, James Pustejovsky, Sanda Harabagiu
NSF Program Directors (Tanya Korelsky, Terry Langendoen)
DARPA for funding this! $$ is important
CCLS for their logistical support
Special thanks to Weiwei Guo (who just got his STS paper accepted to ACL, YAY)!
Thanks, all, for accepting our invitation!
Discussions resulted in…
*SEM
– http://ixa2.si.ehu.es/starsem/
– Be sure to submit papers there (please!)
SemEval 2012 STS Task 6
– http://www.cs.york.ac.uk/semeval-2012/task6/
This STS Workshop
– http://www.cs.columbia.edu/~weiwei/workshop/index.html
Introductions
Please introduce yourself
– Name and affiliation
– Briefly: the relevance of STS to you/your work; name which of these applies:
  Semantic component (enabling technology)
  Resource for STS
  End NLP application
  Infrastructure/large systems
  Theoretical considerations
  All of the above
Goals of STS Workshop
Pool the community's views on the relevance of STS to NLP (thanks for the overwhelmingly positive response to our invitation)
Foster collaboration, with concrete buy-in from different participants, towards building a real STS framework
Pursue/seek funding to realize STS
STS Workshop Considerations
What is STS?
– How can STS be characterized quantitatively and qualitatively?
– What semantic components contribute to STS?
– How can we create a principled empirical STS framework with utility and interpretability?
– Could this lead to a better understanding of the semantics of natural language?
How to create an STS blackbox?
– How can different semantic components/features interact?
– What kinds of resources and tools are necessary for such an effort?
– Infrastructure desiderata
STS Workshop Considerations
Evaluation of STS
– Intrinsic
  Graded vs. binary similarity
  Metric considerations
– Extrinsic
  How to illustrate the utility of STS to end NLP applications such as MT, distillation, etc.
Future directions
– Monolingual vs. multilingual
– Shared *SEM task?
– Potential proposal submissions/funding avenues
– Collaboration across the pond!
STS Framework Research Goals
To create an interoperable STS pipeline that integrates different semantic components, ranging from simple word similarity to more nuanced components that can handle complex semantic and pragmatic phenomena such as modality and lambda logic
To perform intrinsic evaluation of STS
To show the utility of STS to large NLP applications through extrinsic evaluation
To advance our understanding of the semantics underpinning natural languages and of how we can exploit this knowledge empirically
To foster stronger collaboration within the semantics community and with other sub-communities within CL
STS Vision
[Diagram: Text A and Text B; STS Box (UIMA or some other platform?); NLP Applications; Linguistic Resources: corpora (raw and annotated), treebanks, ontologies, PropBanks, dictionaries, etc.; Fundamental NLP Tools: tokenizers, POS taggers, lemmatizers, chunkers, etc.]
STS Box
A single system that takes features from different, integrated semantic layers of representation (the focus of the current SemEval 2012 STS Task 6)
Multiple semantic components
– Performance of components (confidence in results)
– Type of component
– Relevance to task
– How to order the components in a sequential pipeline
– If multiple components perform the same task, how to control for redundancy and complementarity
– Layering annotations of different semantic knowledge on the same data
  Interaction/dependency between different semantic annotations
  Representation assumptions
  Formalism assumptions
– How to operationalize the interaction among components
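Two of the questions above, ordering components in a pipeline and layering annotations of different semantic knowledge on the same data, can be made concrete with standoff annotation: each component writes its own named layer of character-offset spans over the shared text and declares which layers it reads, so a valid run order falls out of a topological sort. The sketch below is a plain-Python illustration under those assumptions. `Document`, `Annotation`, `tokenize` and `tag_capitalized` are hypothetical names, not UIMA's API, although UIMA's Common Analysis Structure (CAS) plays a comparable role.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: a labeled character span over the shared text."""
    start: int
    end: int
    label: str


@dataclass
class Document:
    """The shared text plus one named annotation layer per component."""
    text: str
    layers: dict[str, list[Annotation]] = field(default_factory=dict)


def order_components(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a run order in which each component runs after the layers it reads.

    Raises graphlib.CycleError if the declared dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


def tokenize(doc: Document) -> None:
    """Hypothetical whitespace tokenizer; writes the "tokens" layer."""
    spans, pos = [], 0
    for tok in doc.text.split():
        start = doc.text.index(tok, pos)
        spans.append(Annotation(start, start + len(tok), "TOKEN"))
        pos = start + len(tok)
    doc.layers["tokens"] = spans


def tag_capitalized(doc: Document) -> None:
    """Hypothetical toy entity tagger that reads the "tokens" layer."""
    doc.layers["entities"] = [
        Annotation(a.start, a.end, "CAP")
        for a in doc.layers["tokens"]
        if doc.text[a.start].isupper()
    ]


# Each component declares the layers it depends on.
DEPENDENCIES = {"tokens": set(), "entities": {"tokens"}}
RUNNERS = {"tokens": tokenize, "entities": tag_capitalized}

doc = Document("Once there was a Czar who had three lovely daughters.")
for name in order_components(DEPENDENCIES):
    RUNNERS[name](doc)
print([doc.text[a.start:a.end] for a in doc.layers["entities"]])  # ['Once', 'Czar']
```

Because every layer is kept separately over the same text, adding a component such as SRL only means registering a function and the layers it reads, and two components doing the same task remain visible as two parallel layers, which is where redundancy and complementarity could be measured.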
What is Semantic Textual Similarity?
[Figure: text snippets in several languages fed into "Semantic Similarity" (placeholder filler text omitted), e.g.:]
– بس تعالى ومالكش دعوه، هتبنبسط اخر انبساط (Egyptian Arabic, roughly: "Just come along and don't worry about it, you'll have the time of your life")
– Welcome to my world, trust me you will never be disappointed
– Yo! Come over here, you will be pleasantly surprised
– Добро пожаловать в мой мир, поверьте мне вы никогда не будете разочарованы (Russian: "Welcome to my world, believe me, you will never be disappointed")
– 안녕하세요 제가 당신에게 전화했지만 아무 소용이있을려고... 당신이 시간을 즐기고 있었다 희망 (Korean, roughly: "Hello, I called you, but it was no use... I hope you were having a good time")
Quantitative
– Graded similarity score
– Confidence score
Principled
– Interpretability: which semantic components/features led to the results (hopefully leading to a better understanding of semantics)
Monolingual Semantic Similarity
Semantic Similarity
– بس تعالى ومالكش دعوه، هتبنبسط اخر انبساط (Egyptian Arabic, roughly: "Just come along and don't worry about it, you'll have the time of your life")
– Welcome to my world, trust me you will never be disappointed
– Yo! Come over here, you will be pleasantly surprised
Semantic Similarity score: 4.5, Grade: 4
Interpretation: lexical X Y, syntactic AB, CD, scoping xyz, etc.
Confidence: 0.8
Multilingual Semantic Similarity
Semantic Similarity
– بس تعالى ومالكش دعوه، هتبنبسط اخر انبساط (Egyptian Arabic, roughly: "Just come along and don't worry about it, you'll have the time of your life")
– Welcome to my world, trust me you will never be disappointed
– Yo! Come over here, you will be pleasantly surprised
Semantic Similarity score: 3, Grade: 5
Interpretation: lexical B C D, syntactic, pragmatic
Confidence: 0.9
Why STS?
Most NLP applications need some notion of semantic similarity to overcome brittleness and sparseness
– IR, IE, QA, MT, dialogue, pedagogical systems, …
– Also enabling tasks such as parsing, SRL, textual entailment, …
Provides evaluation beyond surface text processing
– “Understanding”, or interpretability of results
– Nuanced semantics with utility
A hub for semantic processing, usable as a black box in applications beyond NLP (open source release)
Lends itself to extrinsic evaluation of otherwise scattered semantic components
Why STS?
Monolingual space
– MT evaluation
– Summarization
– Paraphrase generation
Multilingual space
– Direct MT evaluation
– Cross-lingual summarization
– Cross-lingual generation
But, overall, a better understanding of semantic spaces
– How do different languages carve up the space?
– What impact does that have on our thinking?
Relates to code switching and speaker state as well?
What is STS?
The graded process of determining the degree to which two snippets of text (t1 and t2) are semantically equivalent, i.e., bear the same meaning
An STS system will quantify how similar t1 and t2 are, producing a similarity score
An STS system will also tell us why t1 and t2 are similar, giving a nuanced interpretation of the similarity based on the contributions of semantic components
What is STS?
Word similarity has been relatively well studied
– For example, according to WN (less similar at the top, more similar at the bottom):
  cord / smile 0.02
  rooster / voyage 0.04
  noon / string 0.04
  fruit / furnace 0.05
  ...
  hill / woodland 1.48
  car / journey 1.55
  cemetery / mound 1.69
  ...
  cemetery / graveyard 3.88
  automobile / car 3.92
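Many WordNet-based word-similarity measures count edges in the hypernym ("is-a") hierarchy. The sketch below shows the path-based variant that NLTK exposes as `Synset.path_similarity`, which scores 1 / (1 + shortest path length), run over a tiny hand-made taxonomy. The taxonomy, the word-to-concept mapping, and therefore the scores are hypothetical; they are on a different scale from the numbers listed above and are not meant to reproduce them.

```python
# Hypothetical mini-taxonomy (child concept -> parent concept); NOT real WordNet data.
PARENT = {
    "motorcar": "vehicle", "vehicle": "artifact", "artifact": "entity",
    "travel": "act", "act": "entity",
    "burial_ground": "site", "site": "location", "location": "entity",
    "hill": "natural_elevation", "natural_elevation": "location",
    "forest": "vegetation", "vegetation": "entity",
}
# Synonymous words map to the same concept, as words in one WordNet synset would.
CONCEPT = {
    "car": "motorcar", "automobile": "motorcar", "journey": "travel",
    "cemetery": "burial_ground", "graveyard": "burial_ground",
    "mound": "hill", "hill": "hill", "woodland": "forest",
}


def ancestors(concept: str) -> dict[str, int]:
    """Map each ancestor of `concept` (including itself) to its distance upward."""
    dist, steps, node = {}, 0, concept
    while node is not None:
        dist[node] = steps
        node = PARENT.get(node)  # None once we pass the root
        steps += 1
    return dist


def path_similarity(word1: str, word2: str) -> float:
    """1 / (1 + length of the shortest path through a common ancestor)."""
    up1, up2 = ancestors(CONCEPT[word1]), ancestors(CONCEPT[word2])
    shortest = min(up1[c] + up2[c] for c in up1 if c in up2)
    return 1.0 / (1 + shortest)


for pair in [("car", "journey"), ("hill", "woodland"),
             ("cemetery", "mound"), ("automobile", "car")]:
    print(pair, round(path_similarity(*pair), 3))
```

Synonyms collapse to the same concept and score 1.0, while unrelated pairs meet only near the root; real WordNet adds multiple senses per word, so a practical measure usually takes the maximum over sense pairs.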
What is STS?
Fewer datasets exist for similarity between sentences
A forest is a large area where trees grow close together.
vs.
The coast is an area of land that is next to the sea.
[0.25]
What is STS?
Fewer datasets exist for similarity between sentences
A forest is a large area where trees grow close together.
vs.
Woodland is land with a lot of trees.
[2.51]
What is STS?
Fewer datasets exist for similarity between sentences
Once there was a Czar who had three lovely daughters.
vs.
There were three beautiful girls, whose father was a Czar.
[4.3]
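A useful reference point for sentence pairs like these is a purely lexical baseline. The sketch below, our own illustrative choice rather than a method from these slides, takes the cosine between bag-of-words count vectors and maps it linearly onto a 0 to 5 scale; the tokenization rule and the linear mapping are assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased alphabetic tokens with their counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def overlap_sts(t1: str, t2: str) -> float:
    """Map lexical cosine (0..1) linearly onto a 0-5 similarity scale."""
    return 5.0 * cosine(bag_of_words(t1), bag_of_words(t2))


forest = "A forest is a large area where trees grow close together."
for other in ["The coast is an area of land that is next to the sea.",
              "Woodland is land with a lot of trees."]:
    print(round(overlap_sts(forest, other), 2))
```

Such a baseline gives no credit for "forest" vs. "woodland", since they share no characters, while function words such as "is" and "a" still add to the score; this gap is what word-similarity and deeper semantic components are meant to fill.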
Multilingual STS
To our knowledge, no one has directly quantified the cross-linguistic similarity between two texts
How is STS different from …
Recognizing Textual Entailment (RTE) to date
– RTE is binary vs. STS graded
– Directionality (text to hypothesis)
– The text is typically (much) longer than the hypothesis
Paraphrase (Pph) to date
– Pph is binary vs. STS graded
– Notion of (principled) interpretability
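The two contrasts with RTE, binary vs. graded and directional vs. symmetric, can be shown with one toy word-coverage measure. In the sketch below, `coverage`, `entails` and `sts` are hypothetical helpers and the 0.8 threshold is an arbitrary assumption: the RTE-style decision looks only at how much of the hypothesis the text covers and returns a yes/no answer, while the STS-style score combines both directions with a harmonic mean and returns a grade.

```python
import re


def tokens(text: str) -> set[str]:
    """Distinct lowercased alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def coverage(covering: str, covered: str) -> float:
    """Fraction of the distinct words of `covered` that also appear in `covering`."""
    target = tokens(covered)
    return len(target & tokens(covering)) / len(target) if target else 0.0


def entails(text: str, hypothesis: str, threshold: float = 0.8) -> bool:
    """Toy RTE-style decision: binary and directional (text -> hypothesis)."""
    return coverage(text, hypothesis) >= threshold


def sts(t1: str, t2: str) -> float:
    """Toy STS-style score: graded 0-5 and symmetric (harmonic mean of both directions)."""
    a, b = coverage(t1, t2), coverage(t2, t1)
    return 5.0 * (2 * (a * b) / (a + b)) if a + b else 0.0


text = "Once there was a Czar who had three lovely daughters."
hyp = "A Czar had daughters."
print(entails(text, hyp), entails(hyp, text))  # True False
print(sts(text, hyp) == sts(hyp, text))        # True
```

The longer text fully covers the short hypothesis but not vice versa, so the entailment decision flips with direction, whereas the graded score is the same in both orders.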
Pipelined STS
An interoperable pipeline of semantic components
– Input
  Two text snippets
– Output
  A numerical similarity score, graded on a scale of 0 to 5
  Which semantic components/features led to the score (principled interpretability)
  A confidence level in the response
Evaluation
– Intrinsic evaluation in the context of sentence similarity
– Extrinsic evaluation in the context of MT evaluation
– Intrinsic component evaluations
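The input/output contract above (two snippets in; a 0 to 5 score, per-component contributions, and a confidence out) can be sketched as a small weighted pipeline. Everything here is an assumption for illustration: components are taken to return similarities in [0, 1], the score is a weighted average rescaled to 0 to 5 (so each component's contribution is its share of that sum), confidence is defined as one minus the spread of the component scores, and `word_jaccard` and `length_ratio` are hypothetical toy components.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict

# A component maps two snippets to a similarity in [0, 1] (an assumed convention).
Component = Callable[[str, str], float]


@dataclass
class STSResult:
    score: float                     # graded similarity on the 0-5 scale
    contributions: Dict[str, float]  # per-component share of `score` (sums to it)
    confidence: float                # in [0, 1]; here, agreement among components


def run_pipeline(t1: str, t2: str, components: Dict[str, Component],
                 weights: Dict[str, float]) -> STSResult:
    raw = {name: f(t1, t2) for name, f in components.items()}
    total = sum(weights[name] for name in components)
    contributions = {name: 5.0 * weights[name] * raw[name] / total for name in components}
    # Hypothetical confidence: 1 minus the spread of the component scores.
    confidence = 1.0 - (max(raw.values()) - min(raw.values()))
    return STSResult(sum(contributions.values()), contributions, confidence)


def word_jaccard(t1: str, t2: str) -> float:
    """Toy lexical component: Jaccard overlap of lowercased word sets."""
    a, b = (set(re.findall(r"[a-z]+", t.lower())) for t in (t1, t2))
    return len(a & b) / len(a | b) if a | b else 1.0  # two empty texts count as identical


def length_ratio(t1: str, t2: str) -> float:
    """Toy shallow component: ratio of the shorter to the longer word count."""
    n1, n2 = len(t1.split()), len(t2.split())
    return min(n1, n2) / max(n1, n2) if max(n1, n2) else 1.0


result = run_pipeline(
    "Woodland is land with a lot of trees.",
    "A forest is a large area where trees grow close together.",
    components={"lexical": word_jaccard, "length": length_ratio},
    weights={"lexical": 2.0, "length": 1.0},
)
print(round(result.score, 2),
      {k: round(v, 2) for k, v in result.contributions.items()},
      round(result.confidence, 2))
```

Keeping the contributions additive makes the "which components led to the score" question answerable by inspection, at the cost of ruling out non-linear interactions between components.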
Main Objectives
Plug & play environment for semantic components
– WSD/WSI, lexical substitution, SRL, MWE, paraphrase, anaphora and coreference resolution, time and date resolution, named-entity handling, underspecification, hedging, semantic scoping, discourse analysis, etc.
Pipeline creation
– Components produce scores, which are then combined
– Features are combined directly in the MuSeS environment
Interpretability of contributing factors
– Explicitly characterize why two texts are considered similar, i.e., which semantic component(s) contributed to the similarity score
Quantifying STS, formalizing it as a probabilistic story
Associating confidence levels with scores
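A "probabilistic story" could take many forms; one possibility, assumed here and not taken from the workshop, is multinomial logistic regression over the six grades. Features are combined directly into one linear score per grade, a softmax turns those into P(grade = k | features), the similarity score is the expected grade, and the confidence is the probability of the most likely grade. The single feature and all weights below are hypothetical.

```python
import math

GRADES = list(range(6))  # the 0-5 similarity grades


def grade_distribution(features, weights, biases):
    """Softmax over grades: P(grade = k | features) from one linear score per grade.

    features: list of floats; weights[k]: per-feature weights for grade k; biases[k]: float.
    """
    logits = [b + sum(w * x for w, x in zip(ws, features)) for ws, b in zip(weights, biases)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_with_confidence(probs):
    """Expected grade as the score; probability of the most likely grade as confidence."""
    expected = sum(k * p for k, p in zip(GRADES, probs))
    return expected, max(probs)


# Hypothetical parameters for a single feature x in [0, 1] (e.g. a lexical overlap score):
# logit_k = k * (2x - 1), so high x favours high grades and x = 0.5 is uninformative.
WEIGHTS = [[2.0 * k] for k in GRADES]
BIASES = [-1.0 * k for k in GRADES]

for x in (0.0, 0.5, 1.0):
    score, confidence = score_with_confidence(grade_distribution([x], WEIGHTS, BIASES))
    print(x, round(score, 2), round(confidence, 2))
```

With an uninformative feature the distribution is uniform, so the score sits mid-scale and the confidence drops to 1/6, which is the behaviour one would want from a confidence attached to a graded score.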
Calling on people for contributions
Katrin Erk
Christian Chiarcos
Enrique Alfonseca
Intrinsic Evaluation Issues (Item B)
Binary similarity
– What is the cutoff threshold?
Graded similarity
– How should the results be binned (2-4)?
How do we assess and integrate confidence values from components?
Should we weight different components differently?
– Depending on their standalone performance?
– Weighting their contribution by their salience and relevance to STS?
– Theoretical considerations?
Degree/level of transparency/interpretability?
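The intrinsic questions above map onto a few small helpers. The sketch below assumes graded scores on a 0 to 5 scale and reads "(2-4)" as "between 2 and 4 bins", which is our interpretation. Pearson correlation against gold scores is the measure SemEval 2012 STS Task 6 used; the cutoff, the bin counts, and the rule of weighting components by their positive standalone correlation are illustrative choices, not recommendations from the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between system scores and gold scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def binarize(scores, cutoff):
    """Binary similarity: similar iff the graded score reaches the cutoff."""
    return [s >= cutoff for s in scores]


def bin_scores(scores, n_bins, max_score=5.0):
    """Map 0..max_score onto n_bins equal-width bins (the top edge goes in the last bin)."""
    width = max_score / n_bins
    return [min(int(s / width), n_bins - 1) for s in scores]


def weights_from_standalone(correlations):
    """Weight components by standalone correlation, ignoring non-positive ones."""
    clipped = {name: max(r, 0.0) for name, r in correlations.items()}
    total = sum(clipped.values())
    if total == 0:  # no component helps on its own: fall back to uniform weights
        return {name: 1.0 / len(clipped) for name in clipped}
    return {name: r / total for name, r in clipped.items()}


print(binarize([0.5, 3.2, 4.8], cutoff=3.0))                # [False, True, True]
print(bin_scores([0.0, 2.4, 5.0], n_bins=2),
      bin_scores([0.0, 2.4, 5.0], n_bins=4))                # [0, 0, 1] [0, 1, 3]
print(weights_from_standalone({"lexical": 0.6, "srl": 0.2, "noisy": -0.1}))
```

The same gold data supports all three views, so the choice of cutoff or number of bins can be studied by how much it changes the agreement with graded judgments.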
Extrinsic Evaluation Issues (Item B)
How to integrate the STS blackbox in an NLP application
– Is it simply ablation, or is there something more interesting?
Where to integrate STS in different applications
Do different applications require different types of STS (biased/weighted STS)?
– What implications would that have for the design of STS?
Can we come up with different STS formalisms (e.g., with a known set of components?), similar to different syntactic formalisms/perspectives?
Role of intrinsic STS confidence levels in integration and evaluation
Again, degree/level of transparency/interpretability of the underlying semantic components?
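The simplest form of the ablation option is component ablation: run the end application with all STS components switched on, then with each one removed in turn, and report the drop in the application's metric. In the sketch below, `ablation` is a generic helper, and `toy_application_metric` is a hypothetical stand-in for a real application score such as an MT-evaluation correlation; its numbers are invented for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def ablation(components: Iterable[str],
             evaluate: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Drop in the application metric when each component is removed in turn."""
    full_set = frozenset(components)
    full_score = evaluate(full_set)
    return {name: full_score - evaluate(full_set - {name}) for name in full_set}


# Hypothetical stand-in for an end application's metric as a function of
# which STS components are switched on.
def toy_application_metric(active: FrozenSet[str]) -> float:
    return 0.50 + 0.10 * ("lexical" in active) + 0.05 * ("srl" in active)


print(ablation(["lexical", "srl", "coreference"], toy_application_metric))
```

One-at-a-time ablation misses interactions (two redundant components can each look useless on their own), which is one reason the slide asks whether there is something more interesting than ablation.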
STS in NLP Applications (Item C)
Distillation and MT (Marjorie Freedman)
MT and MT evaluation (Alon Lavie, Dekai Wu, Lucia Specia, Kevin Knight, Scott Miller)
Machine reading (Ralph Weischedel)
Watson Jeopardy (Alfio Gliozzo)
Generation (Christian Chiarcos)
Summarization (Enrique Alfonseca)
Opinion mining and social media mining (Sanda Harabagiu)
Inference (Johan Bos, Ido Dagan) (tentative)
Semantic Web and ontologies (Michael Uschold)