Overview of the KBP 2013 Slot Filler Validation Track
Hoa Trang Dang
National Institute of Standards and Technology

Slot Filler Validation (SFV) Track
Goals:
▫ Allow teams without a full slot-filling system to participate; focus on answer validation rather than document retrieval
▫ Evaluate the contribution of RTE systems to KBP slot filling
▫ Allow teams to experiment with system voting and global
SFV input:
▫ Candidate slot filler
▫ Possibly additional information about candidate slot fillers
SFV output:
▫ Binary classification (Correct / Incorrect) of each candidate slot filler
Can only improve precision, not recall, of full slot-filling systems
Evaluation metrics depend on the SFV use case and on the availability of additional information about candidate fillers
▫ TAC RTE KBP Validation task (2011)
▫ TAC KBP Slot Filler Validation task (2012)

TAC RTE KBP Validation task (2011)
Each slot filler returned by SF systems yields 1 RTE evaluation pair, where:
▫ T is the entire document supporting the slot filler
▫ H is a set of synonymous sentences, representing different realizations of the slot filler

Use Case 1: SFV as Textual Entailment (2011)
SFV input:
▫ All regular English slot filling input (slot definitions, queries, source documents)
▫ Individual candidate slot fillers (filler, provenance)
Local Approach:
▫ Generic textual entailment: H is the relation implied by the candidate slot filler (e.g., “Barack Obama has lived in Chicago”), T is the provenance (entire document, or smaller regions defined by justification offsets)
▫ Tailored textual entailment: train on different slot types; could be a validation module for a full slot filling system
Evaluation:
▫ F score on the entire pool of candidate slot fillers (unique slot filler, provenance)
▫ Baseline: all T’s classified as entailing the corresponding H: P=R=percentage of entailing pairs in the pooled SF responses
▫ Weak baseline, easily beaten by all SFV systems; not a direct measure of the utility of SFV to SF
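
To make the generic textual entailment setup concrete, here is a minimal sketch of a word-overlap check between a hypothesis H and its provenance text T. It is an illustration only, not any participant's system: the tokenizer, the function names, and the 0.8 threshold are all assumptions chosen for the example.

```python
import re

def tokens(text):
    """Lowercased word tokens; a deliberately crude tokenizer (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(t, h):
    """Fraction of hypothesis tokens that also appear in the text T."""
    h_tokens = tokens(h)
    if not h_tokens:
        return 0.0
    return len(h_tokens & tokens(t)) / len(h_tokens)

def entails(t, h, threshold=0.8):
    """Label the pair as entailing when overlap reaches a hypothetical threshold."""
    return overlap_score(t, h) >= threshold

# H is the relation implied by the candidate filler; T stands in for its provenance.
H = "Barack Obama has lived in Chicago"
T_support = "Barack Obama moved to Chicago in 1985 and has lived in Chicago ever since."
T_unrelated = "Obama visited Paris."

print(entails(T_support, H), entails(T_unrelated, H))
```

Pure lexical overlap ignores negation and word order, which is one reason a tailored, slot-specific approach may be preferable.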

Use Case 2: SFV impact on single SF systems
SFV input:
▫ All regular English slot filling input (slot definitions, queries, source documents)
▫ Individual candidate slot fillers (filler, provenance, confidence)
  ▫ Broken out into individual slot filling runs
Global Approach:
▫ System voting, leveraging features across multiple SF runs
Evaluation:
▫ Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
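
The filter-then-rescore evaluation can be sketched as follows. This is a simplified illustration: the official English SF scorer is not reproduced here, and the exact-match gold set, the (query, slot, filler) tuples, and all names are assumptions made for the example.

```python
def prf(responses, gold):
    """Precision, recall and F1 of a response set against a gold set."""
    correct = len(responses & gold)
    p = correct / len(responses) if responses else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def apply_sfv(run, sfv_labels):
    """Drop every response the SFV system labeled 'Incorrect'."""
    return {resp for resp in run if sfv_labels.get(resp, "Correct") == "Correct"}

# Hypothetical data: responses are (query, slot, filler) tuples.
g1, g2, g3 = ("q1", "per:spouse", "A"), ("q1", "per:title", "B"), ("q2", "per:title", "C")
w1, w2 = ("q1", "per:spouse", "X"), ("q2", "per:title", "Y")
gold = {g1, g2, g3}
run = {g1, g2, w1, w2}
sfv_labels = {g1: "Correct", g2: "Correct", w1: "Incorrect", w2: "Incorrect"}

p_before, r_before, f_before = prf(run, gold)
p_after, r_after, f_after = prf(apply_sfv(run, sfv_labels), gold)
print(f"F1 before: {f_before:.4f}  F1 after: {f_after:.4f}")
```

Because filtering only removes responses, the number of correct fillers cannot grow, so recall can stay the same or fall but never rise; any gain in F1 has to come from precision.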

Slot Filler Validation (SFV) 2012
SFV input:
▫ All regular English slot filling input (slot definitions, queries, source documents)
▫ Individual candidate slot fillers (filler, provenance, confidence)
  ▫ Broken out into individual slot filling runs
▫ System profile for each SF run
▫ Preliminary assessment of 10% of KBP 2013 Slot Filling queries
SFV output:
▫ Binary classification (Correct / Incorrect) of each candidate slot filler
Evaluation:
▫ Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
Result: there was one SFV submission, and it decreased the F1 of almost all SF runs; the exceptions were the poorest-performing SF runs.

Slot Filler Validation (SFV) 2013
SFV input:
▫ All regular English slot filling input (slot definitions, queries, source documents)
▫ Individual candidate slot fillers (filler, provenance, confidence)
  ▫ Broken out into individual slot filling runs
SFV output:
▫ Binary classification (Correct / Incorrect) of each candidate slot filler
Evaluation:
▫ Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run

Slot Filler Validation (SFV) 2013
SFV input:
▫ All regular English slot filling input (slot definitions, queries, source documents)
▫ Individual candidate slot fillers (filler, provenance, confidence)
  ▫ Broken out into individual slot filling runs
▫ System profile for each SF run
▫ Preliminary assessment of 10% of KBP 2013 Slot Filling queries
SFV output:
▫ Binary classification (Correct / Incorrect) of each candidate slot filler
Evaluation:
▫ Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
▫ Score only on the 90% of KBP 2013 slot filling queries that didn’t have preliminary assessments released as part of SFV input
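
Restricting scoring to the queries without released preliminary assessments might look like the sketch below. The query IDs, the response tuples, and the helper names are hypothetical; only the idea of excluding the released 10% before scoring comes from the slide.

```python
def held_out_queries(all_queries, preliminary_queries):
    """Queries that remain for scoring once the released ones are removed."""
    released = set(preliminary_queries)
    return [q for q in all_queries if q not in released]

def restrict(responses, scored_queries):
    """Keep only responses whose query ID (first tuple field) is scored."""
    scored = set(scored_queries)
    return {r for r in responses if r[0] in scored}

all_queries = [f"q{i}" for i in range(1, 11)]  # 10 hypothetical queries
preliminary = ["q3"]                           # the 10% released with assessments
scored = held_out_queries(all_queries, preliminary)

responses = {("q3", "per:title", "A"), ("q4", "per:title", "B")}
scored_responses = restrict(responses, scored)
print(len(scored), scored_responses)
```

Applying the same restriction to both the responses and the answer key keeps an SFV system from being rewarded for labels it was given as input.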

SF System Profile
SF Team ranks in KBP 2009-2012
Did the system extract fillers from the KBP 2013 source corpus?
Do the Confidence Values have meaning?
Is the Confidence Value a probability?
Tools or methods for:
▫ Query expansion
▫ Document retrieval
▫ Sentence retrieval
▫ NER nominal tagging
▫ Coreference resolution
▫ Third-party relation/event extraction
▫ Dependency/Constituent parsing
▫ POS tagging
▫ Chunking
▫ Main slot filling algorithm
▫ Learning algorithm
▫ Ensemble model
▫ External resources

Slot Filler Validation Teams and Approaches
BIT: Beijing Institute of Technology [local]
▫ Generic RTE approach based on word overlap, cosine similarity, and token edit distance
Stanford: Stanford University [local]
▫ Based on Stanford’s full slot-filling system, especially the component for checking consistency and validity of candidate fillers
UI_CCG: University of Illinois at Urbana-Champaign [local]
▫ Tailored RTE approach; check candidate for slot-specific constraints
jhuapl: Johns Hopkins University Applied Physics Laboratory [weak global]
▫ Consider only the confidence value associated with each candidate filler, and aggregate confidence values across systems
RPI_BLENDER: Rensselaer Polytechnic Institute [strong global]
▫ Based on the RPI_BLENDER full slot-filling system (like Stanford), but also leveraged the full set of SFV input (including SF system profile and preliminary assessments) to rank systems and apply tier-specific filtering
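
As a rough illustration of a weak global approach, the sketch below sums the confidence values that different runs assign to the same candidate filler and thresholds the total. It is not the jhuapl system: summation as the aggregation rule, the threshold of 1.0, and the run and candidate names are all assumptions.

```python
from collections import defaultdict

def aggregate_confidence(runs):
    """Sum each candidate's confidence over every run that returned it."""
    totals = defaultdict(float)
    for run in runs.values():
        for candidate, confidence in run.items():
            totals[candidate] += confidence
    return dict(totals)

def classify(runs, threshold=1.0):
    """Label a candidate 'Correct' when its summed confidence reaches the threshold."""
    totals = aggregate_confidence(runs)
    return {c: ("Correct" if total >= threshold else "Incorrect")
            for c, total in totals.items()}

# Hypothetical runs mapping (query, slot, filler) candidates to confidences.
c1, c2, c3 = ("q1", "per:title", "A"), ("q1", "per:title", "B"), ("q2", "per:spouse", "C")
runs = {
    "runA": {c1: 0.9, c2: 0.3},
    "runB": {c1: 0.8},
    "runC": {c2: 0.2, c3: 0.6},
}
labels = classify(runs)
print(labels)
```

A sum rewards agreement between systems but treats every run's confidence scale as comparable, which the system profile questions ("Do the Confidence Values have meaning?") suggest is not guaranteed.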

Impact of RPI_BLENDER2 SFV on SF Runs
SF Run         F1 of original SF run   Change in F1 after applying SFV filter
lsv1           0.371212                 0.012212
lsv5           0.368462                 0.025411
lsv3           0.367438                 0.029463
ARPANI1        0.364683                -0.01695
lsv4           0.363441                 0.041238
RPI_BLENDER3   0.336694                 0.025749
RPI_BLENDER1   0.333909                 0.027718
lsv2           0.333333                 0.008259
RPI_BLENDER5   0.332866                 0.017108
PRIS20133      0.327384                 0.021544
NYU1           0.253842                -0.00105
UWashington1   0.184026                -0.011544
UWashington2   0.156271                -0.004999
UWashington3   0.140677                -0.013133
SAFT_KRes3     0.134615                -0.004458
CMUML3         0.098274                -0.002241
TALP_UPC3      0.036237                -0.007019
The first ten rows are the top 10 SF runs; rows with a negative change are the negatively impacted SF runs.

Conclusion
Leveraging global features boosts scores of individual SF runs... if done discriminately
▫ Don’t treat all slot filling systems the same
Even weak global features (e.g., raw confidence values) may help in some cases
Caveat: other evaluation metrics are also valid, depending on use case
▫ RTE KBP validation (2011) metric may be appropriate if the goal is to make assessment more efficient
