
1 Link Detection
David Eichmann
School of Library and Information Science, The University of Iowa

2 Why?
- We focused on link detection this year to vet a new similarity scheme.
- In building our extraction framework for question answering and bioinformatics, we were able to derive:
  - a reasonably clean scheme for mapping relationships between entities; and
  - a way of decorating those entities with extracted attributes/properties (e.g., person age, relative geographical position, etc.).

3 Our Working Hypothesis
- Assessing inter-document linkage using a concept graph derived from the extraction framework could prove more robust than term-vector methods.

4 Technique (in the ideal)
- Sentence-boundary detect the corpus.
- Part-of-speech tag the sentence terms.
- Extract named entities and residual noun phrases.
- Generate a parse for each sentence.
- Use the resulting dependencies to generate graph fragments.
- Merge the graph fragments into a single graph for the story.
- Use a graph similarity scheme to assess story linkage.
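
A minimal sketch of this ideal pipeline, assuming spaCy for sentence splitting, tagging, entity extraction, noun chunking, and dependency parsing, and networkx for the story graph. The tool choices, the "en_core_web_sm" model, and the story_graph helper are illustrative assumptions, not the components used in the actual system.

import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def story_graph(text):
    """Build one concept graph per story: nodes are entity/NP heads,
    edges are dependency links between those heads."""
    doc = nlp(text)                      # sentence boundaries, POS tags, NER, parse
    graph = nx.Graph()
    for sent in doc.sents:
        # Named entities plus residual noun phrases; overlaps are tolerated in this sketch.
        spans = list(sent.ents) + list(sent.noun_chunks)
        heads = {span.root for span in spans}
        for span in spans:
            graph.add_node(span.root.lemma_.lower(), text=span.text)
        # Dependency arcs between span heads become graph-fragment edges; merging the
        # fragments into the story graph is implicit because every sentence adds its
        # edges to the same graph object.
        for span in spans:
            head = span.root.head
            if head in heads and head is not span.root:
                graph.add_edge(span.root.lemma_.lower(),
                               head.lemma_.lower(),
                               dep=span.root.dep_)
    return graph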

5 The graph similarity measure
- Generate the Cook-Holder edit distance between the two graphs.
- Graph_sim(g1, g2) = 1 - norm(CHed(g1, g2) / max(|g1|, |g2|))
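
A sketch of that computation, with networkx's generic graph_edit_distance standing in for the Cook-Holder edit distance and graph size taken as nodes plus edges; the exact norm() and size definitions used in the submitted run are assumptions here.

import networkx as nx

def graph_sim(g1, g2):
    """Graph_sim(g1, g2) = 1 - norm(ed(g1, g2) / max(|g1|, |g2|))."""
    ed = nx.graph_edit_distance(g1, g2)     # exact GED; exponential in the worst case
    size = max(g1.number_of_nodes() + g1.number_of_edges(),
               g2.number_of_nodes() + g2.number_of_edges())
    if size == 0:
        return 1.0
    return 1.0 - min(1.0, ed / size)        # clamp so the score stays in [0, 1]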

6 Reality sets in
- MT text doesn’t parse worth a …
- ASR text rarely has clean sentence boundaries.
- Off-the-shelf parsers aren’t trained for speech grammars.
- Hence ASR text doesn’t parse worth a …

7 Regrouping
- Sentence-boundary detect the newswire sources.
- Approximate sentence boundaries in speech with pauses longer than a certain threshold.
- Skip the parse.
- Generate graph fragments using a window of neighboring NPs.
  - The submitted run uses the current NP and the two downstream NPs.
- This clearly misses syntactically close but lexically distant NP connections…
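
A sketch of this fallback graph formation, assuming the NPs for a segment have already been extracted in document order and that the window simply links each NP to its two downstream neighbors; the pause threshold and the NP extraction step are not reproduced.

import networkx as nx

def window_graph(noun_phrases, window=2):
    """Link each NP to its `window` downstream neighbors (submitted run: window=2)."""
    graph = nx.Graph()
    nps = [np.lower() for np in noun_phrases]
    graph.add_nodes_from(nps)
    for i in range(len(nps)):
        for j in range(i + 1, min(i + 1 + window, len(nps))):
            graph.add_edge(nps[i], nps[j])
    return graph

# For ["the president", "a news conference", "the budget", "congress"] this yields edges
# president-conference, president-budget, conference-budget, conference-congress,
# and budget-congress.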

8 Contrastive Runs
- Cosine similarity of document term vectors.
- Cosine similarity of document phrase vectors.
- A strawman edit distance:
  - construct a single string per document consisting of that document's alphabetized NPs, concatenated.
- If the graph scheme doesn’t outperform this, it’s probably not worth pursuing…
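
Sketches of two of these baselines, assuming scikit-learn with tf-idf weighting for the cosine run over term vectors and difflib's SequenceMatcher as a stand-in for the strawman edit distance over concatenated, alphabetized NP strings; the original term weighting, tokenization, and edit-distance implementation are not specified, so these are illustrative substitutes.

from difflib import SequenceMatcher
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cosine_term_sim(doc_a, doc_b):
    """Cosine similarity of the two documents' (tf-idf weighted) term vectors."""
    vectors = TfidfVectorizer().fit_transform([doc_a, doc_b])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

def strawman_edit_sim(nps_a, nps_b):
    """Alphabetize and concatenate each document's NPs into one string, then compare
    the strings; SequenceMatcher.ratio() stands in for a normalized edit distance."""
    s_a = " ".join(sorted(np.lower() for np in nps_a))
    s_b = " ".join(sorted(np.lower() for np in nps_b))
    return SequenceMatcher(None, s_a, s_b).ratio()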

9 Official Results

Run     Scheme   P(Miss)  P(FA)   Norm Clink
UIowa1  Graph    0.7234   0.0018  0.7320
UIowa2  Edit     0.7308   0.0668  1.0582
UIowa3  Phrase   0.6971   0.0014  0.6984
UIowa4  Word     0.6851   0.0004  0.6871

10 Word Performance

11 Phrase Performance

12 Edit Distance Performance

13 Graph Similarity Performance

14 Word/Phrase Costs

15 Word/Edit Costs

16 Word/Graph Costs

17 Graph/Edit Costs

18 Conclusions
- Definitely signal present in the graph similarity scheme.
- More tuning needed:
  - Official run Clink: 0.0146 vs. actual minimum Clink: 0.0118
  - Official run P(Miss): 0.7234 vs. P(Miss) at the actual minimum Clink: 0.4951

19 Conclusions, con’t.
- Revisit the graph formation hack.
- Hybrid scheme:
  - use the ideal scheme for newswire;
  - use the hack for broadcasts.
- Alternatively:
  - aggressively segment ASR, resulting in smaller fragments;
  - parse everything.
- Note that we don’t need full sentence structure here, only good clausal structure.

