Learning Based Java for Rapid Development of NLP Systems
Nick Rizzolo and Dan Roth
This work was supported by NSF grant SoD-HCER-0613885.

Introduction

Today's natural language processing systems are growing more complex with the need to incorporate a wider range of language resources and more sophisticated statistical methods. In many cases, it is necessary to learn a component whose input includes the predictions of other learned components, or to simultaneously assign values to multiple components with an expressive, data-dependent structure among them. As a result, the design of systems with multiple learning components is inevitably quite technically complex, and implementations of conceptually simple NLP systems can be time consuming and prone to error.

Our new modeling language, Learning Based Java (LBJ), facilitates the rapid development of systems that learn and perform inference. LBJ has already been used to build state-of-the-art NLP systems. In this work, we first demonstrate that there exists a theoretical model that describes most NLP approaches adeptly. Second, we show how our LBJ language enables the programmer to describe the theoretical model succinctly. Finally, we introduce the concept of data-driven compilation, a translation process in which the efficiency of the generated code benefits from the data given as input to the learning algorithms.

Constrained Conditional Models (CCMs)

In a nutshell, to perform inference, a CCM finds the values of the output variables that maximize the following scoring function:

    y* = argmax_{y in Y}  Σ_i w_i f_i(x, y)  −  Σ_k ρ_k d_{C_k}(x, y)

Here x are the input variables, y the output variables ranging over the output space Y, the f_i are features with learned weights w_i, the d_{C_k} measure violations of the constraints C_k with constant weights ρ_k, and y* is the result of inference. CCMs subsume discriminative and probabilistic modeling formalisms. Thus, a wide variety of learning and inference algorithms can be applied to them.

For example, semantic role labeling. Goal: identify and classify the arguments of a given verb.

    The pearls [A1]  which [R-A1]  I [A0]  left [V]  to my daughter [A2]  were shiny.

Classifiers: the binary argument identification (ID) classifier determines whether each sequence of words is an argument, and the multi-class argument type classifier labels each argument with a type. Every application of either classifier contributes to the scoring function.

Constraints: R-A1 => A1 exists; A2 => no other A2.

Thus, we have a CCM.

LBJ: Specifying Models, Features, and Constraints

Models can be specified modularly. Features are written in first-order logic and will have learned weights associated with them.

    model ArgumentIdentifier :: discrete[] input -> boolean isArgument
      input[*] /\ ^isArgument;

    model ArgumentType :: discrete[] input -> discrete type
      input[*] /\ type;
      input[*] /\ input[*] /\ type;

Models can also be hard-coded, interfacing directly with data.

    static model pertinentData :: ArgumentCandidate candidate -> discrete[] data
      data.phraseType = candidate.phraseType();
      data.headWord = candidate.headWord();
      data.headTag = candidate.headTag();
      data.path = candidate.path();
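For readers unfamiliar with LBJ, the following plain-Java sketch shows roughly what the hard-coded pertinentData model above amounts to: it simply copies fields out of a candidate into named discrete features, with no learned weights. The ArgumentCandidate accessors are taken from the snippet above; the class and method names (PertinentData, extract) are illustrative assumptions, not LBJ's generated code.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical stand-in for the ArgumentCandidate type used in the LBJ snippets above.
    class ArgumentCandidate {
        private final String phraseType, headWord, headTag, path;
        ArgumentCandidate(String phraseType, String headWord, String headTag, String path) {
            this.phraseType = phraseType;
            this.headWord = headWord;
            this.headTag = headTag;
            this.path = path;
        }
        String phraseType() { return phraseType; }
        String headWord()   { return headWord; }
        String headTag()    { return headTag; }
        String path()       { return path; }
    }

    // Plain-Java analogue of the hard-coded pertinentData model: it interfaces
    // directly with the data, producing named discrete features.
    class PertinentData {
        static Map<String, String> extract(ArgumentCandidate candidate) {
            Map<String, String> data = new LinkedHashMap<>();
            data.put("phraseType", candidate.phraseType());
            data.put("headWord",   candidate.headWord());
            data.put("headTag",    candidate.headTag());
            data.put("path",       candidate.path());
            return data;
        }
    }

Calling PertinentData.extract on a candidate yields a map such as {phraseType=NP, headWord=pearls, ...}, which is the kind of discrete[] data output the LBJ model declares.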
Constraints look just like features, but their weights are explicit and constant; the # weight is treated as ∞. Extensions to first-order logic, like the "atmost" quantifier, make common constraints simple.

    static model noOverlaps :: ArgumentCandidate[] candidates -> discrete[] types
      for (i : (0 .. candidates.size() - 1))
        for (j : (i + 1 .. candidates.size() - 1))
          #: candidates[i].overlapsWith(candidates[j])
               => types[i] :: "null" || types[j] :: "null";

    static model noDuplicates :: -> discrete[] types
      #: forall (v : types[0].values)
           atmost 1 of (t : types) t :: v;

    static model referenceConsistency :: -> discrete[] types
      #: forall (value : types[0].values)
           (exists (var : types) var :: "R-" + value)
             => (exists (var : types) var :: value);

LBJ: Model Combination and Inference

The models defined above are now applied over a single group of candidate arguments corresponding to a single verb. Model application looks like assignment of another model to this model's output variables.

    model SRLProblem :: ArgumentIdentifier ai, ArgumentType at, ArgumentCandidate[] candidates
        -> boolean[] isArgument, discrete[] types
      for (i : (0 .. candidates.size() - 1))
        1e9: isArgument[i] <- ai (commonFeatures candidates[i]);
        1: types[i] <- at (commonFeatures candidates[i]);
        #: ~isArgument[i] => types[i] :: "null";
      types <- noOverlaps candidates;
      types <- noDuplicates ();
      types <- referenceConsistency ();

Inference can be as easy as selecting algorithms and applying them over specific variables.

    solver SRLInference :: SRLProblem problem
      Greedy.solve problem.isArgument[*];
      ILP.solve problem.types[*];

State-of-the-art LBJ Implementations

LBJ has already been used to develop several state-of-the-art resources. The LBJ POS tagger reports a competitive 96.6% accuracy on the standard Wall Street Journal corpus. In the named entity recognizer of (Ratinov and Roth, 2009), non-local features, gazetteers, and Wikipedia are all incorporated into a system that achieves 90.8 F1 on the CoNLL-2003 dataset, the highest score we are aware of. The co-reference resolution system of (Bengtson and Roth, 2008) achieves state-of-the-art performance on the ACE 2004 dataset while employing only a single learned classifier and a single constraint.

Data-Driven Compilation

Developing a machine learning framework as a stand-alone language, as opposed to a library, opens the door to opportunities for automatically improving the efficiency of the code based on high-level analyses. In particular, much of the information necessary to generate the final program code is only available in the training data. Thus, we say that an LBJ compiler performs data-driven compilation.

Example: Feature Extraction

Consider the following lexicon, a mapping from features to integers, that appears in many NLP systems.

    Feature       Value            Typical index   LBJ index
    word          "the"            0               0
    word          "table"          1               1
    ...
    POS           DT               50000           0
    POS           NN               50001           1
    ...
    word /\ POS   "the" /\ DT      50050           computed
    word /\ POS   "table" /\ NN    50051           computed

While a typical lexicon will store an integer index associated with every feature value, LBJ will only store indexes for the atomic features. Indexes of composite features can then be computed on the fly based on the indexes of the atomic features' values. The result is a much smaller lexicon and fewer hash table look-ups.
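As a rough illustration of this idea (not LBJ's actual implementation), the Java sketch below stores indexes only for atomic feature values and derives the index of a word /\ POS conjunction arithmetically from them. The MAX_POS bound and COMPOSITE_OFFSET are assumptions standing in for information that data-driven compilation would read off the training data.

    import java.util.HashMap;
    import java.util.Map;

    class CompactLexicon {
        // Assumed bounds; data-driven compilation would fix these from the training data.
        private static final int MAX_POS = 64;             // upper bound on distinct POS tags
        private static final int COMPOSITE_OFFSET = 50000; // keeps computed indexes out of the atomic range

        private final Map<String, Integer> words = new HashMap<>();
        private final Map<String, Integer> tags = new HashMap<>();

        private static int lookup(Map<String, Integer> m, String key) {
            Integer i = m.get(key);
            if (i == null) {
                i = m.size();   // next free index for this atomic feature
                m.put(key, i);
            }
            return i;
        }

        int word(String w) { return lookup(words, w); }  // atomic feature: index is stored
        int pos(String p)  { return lookup(tags, p); }   // atomic feature: index is stored

        // Composite feature word /\ POS: the index is computed on the fly from the
        // two stored atomic indexes, so the conjunction never occupies a lexicon slot.
        int wordAndPos(String w, String p) {
            return COMPOSITE_OFFSET + word(w) * MAX_POS + pos(p);
        }

        public static void main(String[] args) {
            CompactLexicon lex = new CompactLexicon();
            System.out.println(lex.wordAndPos("the", "DT"));    // computed, not stored
            System.out.println(lex.wordAndPos("table", "NN"));  // computed, not stored
        }
    }

With the atomic indexes stored once, "the" /\ DT and "table" /\ NN each receive distinct computed indexes without ever being added to a hash table, which is where the smaller lexicon and fewer look-ups come from.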

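Finally, a toy, purely illustrative sketch of the CCM objective from the first panel: given made-up classifier scores for a handful of argument candidates, brute-force search picks the type assignment that maximizes the summed scores minus a large constant penalty for violating the noDuplicates constraint. The scores and class names here are invented for illustration; a real system would obtain scores from the trained classifiers and solve the problem with ILP or search.

    import java.util.HashSet;
    import java.util.Set;

    class ToyCcmInference {
        static final String[] TYPES = {"null", "A0", "A1", "A2"};
        // Constant "#" weight from the poster's notation, treated as effectively infinite.
        static final double PENALTY = 1e9;

        // Made-up classifier scores: SCORE[i][t] is the score for giving
        // candidate i the type TYPES[t].
        static final double[][] SCORE = {
            {0.1, 0.3, 2.0, 0.2},
            {0.2, 1.5, 0.9, 0.1},
            {0.4, 0.2, 0.3, 1.8},
        };

        // noDuplicates: at most one candidate may receive each non-null type.
        static boolean violatesNoDuplicates(int[] y) {
            Set<Integer> seen = new HashSet<>();
            for (int t : y)
                if (t != 0 && !seen.add(t)) return true;
            return false;
        }

        public static void main(String[] args) {
            int n = SCORE.length;
            int total = (int) Math.pow(TYPES.length, n);
            double bestScore = Double.NEGATIVE_INFINITY;
            int[] bestAssign = new int[n];
            // Brute force over the (tiny) output space.
            for (int code = 0; code < total; code++) {
                int[] y = new int[n];
                int c = code;
                double s = 0;
                for (int i = 0; i < n; i++) {
                    y[i] = c % TYPES.length;
                    c /= TYPES.length;
                    s += SCORE[i][y[i]];              // learned-weight terms of the objective
                }
                if (violatesNoDuplicates(y)) s -= PENALTY;  // constant-weight constraint term
                if (s > bestScore) { bestScore = s; bestAssign = y; }
            }
            for (int i = 0; i < n; i++)
                System.out.println("candidate " + i + " -> " + TYPES[bestAssign[i]]);
        }
    }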
