
1 Data Frames Version 3 Proposal

2 Data Frames Version 2

Year matches [2] constant
    { extract "\d{2}"; context "([^\$\d]|^)\d{2}[^,\dkK]"; } 0.5,
    { extract "\d{2}"; context "([^\$\d]|^)\d{2},[^\d]"; } 0.6,
    { extract "\d{2}"; context "\b'\d{2}\b"; } 0.8;
end;

Mileage matches [8] constant
    { extract "\b[1-9]\d{1,2}k"; } 0.6,
    { extract "[1-9]\d?,\d{3}"; } 0.3;
    keyword "\bmiles\b", "\bmi\.", "\bmi\b";
end;

Also: except, substitute, filter phrases; lexicons
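To make the Version 2 notation concrete, here is a minimal Python sketch of how the Year rules might be evaluated. The semantics are an assumption, not something the slide states: the context pattern is matched against the text first, the extract pattern is then applied inside that match, and the trailing number is treated as a confidence value. The names `YEAR_RULES` and `extract_candidates` are hypothetical.

```python
import re

# (extract pattern, context pattern, confidence), copied from the Year frame.
YEAR_RULES = [
    (r"\d{2}", r"([^\$\d]|^)\d{2}[^,\dkK]", 0.5),
    (r"\d{2}", r"([^\$\d]|^)\d{2},[^\d]", 0.6),
    (r"\d{2}", r"\b'\d{2}\b", 0.8),
]


def extract_candidates(text, rules):
    """Return (value, confidence) pairs for every rule whose context matches.

    Assumed semantics: find the context in the text, then apply the
    extract pattern inside the matched context.
    """
    results = []
    for extract, context, confidence in rules:
        for ctx in re.finditer(context, text):
            inner = re.search(extract, ctx.group(0))
            if inner:
                results.append((inner.group(0), confidence))
    return results


print(extract_candidates("97 Civic, $12 obo", YEAR_RULES))  # [('97', 0.5)]
print(extract_candidates("ford 98, clean", YEAR_RULES))     # [('98', 0.6)]
```

In the first input the "12" is rejected because it follows a dollar sign, which is the kind of false positive the context expressions are there to filter out.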

3 Kimball’s Ontology Editor
- Strong separation of value and keyword phrases
- Each phrase may be labeled
- Still allow negation
- Introduce idea of “required context”
- Allow keyword to be specific to a subset of the value phrases for this data frame
- Expressions are richer than regular expressions: they support Boolean and proximity operators, as well as lexicons and macros

4 Internal Representation
- Replace SQL field length with arbitrary type field
  - This is the “internal representation”
  - Type is either lexical or nonlexical
  - Type could be the name of an object set in the ontology
  - Or it could be the name of a type in whatever language will be used to implement methods (more on this later), together with a units name (e.g. “miles”, “meters”, “grams”, “pounds”)
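One way to picture the proposed type field is as a small record. The sketch below is illustrative only; the class name `InternalRepresentation` and its field names are hypothetical, and the example values are assumptions chosen to match the units listed on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InternalRepresentation:
    """Hypothetical record for a data frame's internal representation."""
    type_name: str                 # object-set name, or a type in the implementation language
    lexical: bool                  # True for lexical types, False for nonlexical
    units: Optional[str] = None    # e.g. "miles", "meters", "grams", "pounds"


# A lexical value stored as an implementation-language type with units.
mileage_repr = InternalRepresentation("int", lexical=True, units="miles")

# A nonlexical value typed by the name of an object set in the ontology.
person_repr = InternalRepresentation("Person", lexical=False)
```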

5 Methods
- Add a method phrase to data frames
  - Conceptually they are restricted derived object sets and relationship sets
- We only declare method signatures in data frames
- Another language (e.g. Java) is used to define the method body
- Our tool will generate a template in which the programmer can write method bodies
- The template will have OO structures that allow read-only access to the seamless model/data instance
  - Keyword phrases may also apply to methods
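The split between declared signatures and hand-written bodies could look roughly like the following. The slide names Java as an example implementation language; Python is used here only to keep the sketch short. Every name (`EventDistanceMethods`, `to_miles`, the `meters` key) is hypothetical, and the read-only view is an assumption about what the "OO structures" might provide.

```python
from abc import ABC, abstractmethod
from types import MappingProxyType


class EventDistanceMethods(ABC):
    """Hypothetical generated template: signatures only, no bodies."""

    def __init__(self, instance):
        # Expose the data instance through a read-only mapping view.
        self._instance = MappingProxyType(dict(instance))

    @abstractmethod
    def to_miles(self) -> float:
        """Signature declared in the data frame; body written by the programmer."""


class EventDistance(EventDistanceMethods):
    def to_miles(self) -> float:
        # Programmer-supplied body: reads the instance, never writes to it.
        return self._instance["meters"] / 1609.344


d = EventDistance({"meters": 5000})
print(round(d.to_miles(), 3))  # 3.107
```

Attempting to assign into `d._instance` raises `TypeError`, which stands in for the read-only guarantee described on the slide.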

6 Canonicalization Methods
- Each value phrase may have an associated canonicalization method
  - The purpose is to convert the extracted value string into a common form
- The data frame may have a default canonicalization method that applies if there is no individual method for a value phrase
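As an illustration, the two Mileage value phrases from Version 2 produce strings such as "85k" and "45,000", which only become comparable after conversion to a common form. The sketch below pairs each phrase with its own method and falls back to a frame-level default; all function names are hypothetical, and the choice of an integer mile count as the common form is an assumption.

```python
import re


def canonicalize_thousands_suffix(raw):
    """Convert a string such as '85k' to 85000."""
    return int(raw[:-1]) * 1000


def canonicalize_comma_number(raw):
    """Convert a string such as '45,000' to 45000."""
    return int(raw.replace(",", ""))


def default_canonicalize(raw):
    """Frame-level default, used when a phrase has no method of its own."""
    return raw.strip()


# (value phrase, canonicalization method or None for the frame default)
MILEAGE_PHRASES = [
    (re.compile(r"\b[1-9]\d{1,2}k"), canonicalize_thousands_suffix),
    (re.compile(r"[1-9]\d?,\d{3}"), canonicalize_comma_number),
]


def extract_mileage(text):
    """Return the canonical form of the first matching value phrase, or None."""
    for pattern, method in MILEAGE_PHRASES:
        match = pattern.search(text)
        if match:
            canonicalize = method or default_canonicalize
            return canonicalize(match.group(0))
    return None


print(extract_mileage("runs great, 85k miles"))  # 85000
print(extract_mileage("45,000 mi, one owner"))   # 45000
```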

7 Inheritance
- Inheritance is defined more cleanly
  - Generalization/specialization will indicate inheritance hierarchy
  - The internal representation cannot be overridden in specializations
  - Multiple parents must have the same internal representation
  - Individual inherited phrases can be deleted or overridden
  - New phrases can be added
  - In the case of name conflict, we require fully qualified names to be used (no automatic disambiguation)
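The inheritance rules above can be sketched as a small resolver. This is only one possible reading: phrases are keyed by fully qualified names of the form `Frame.label`, a specialization overrides or deletes an inherited phrase by naming it in that form, and a representation mismatch with any parent is rejected. The class `FrameDef` and the frame and phrase names are hypothetical.

```python
class FrameDef:
    """Hypothetical data frame definition with phrase inheritance."""

    def __init__(self, name, representation, phrases=None, parents=(), deleted=()):
        for parent in parents:
            # The internal representation cannot differ from any parent's.
            if parent.representation != representation:
                raise ValueError(
                    f"{name}: representation must match parent {parent.name}"
                )
        self.name = name
        self.representation = representation
        self.parents = tuple(parents)
        self.own = dict(phrases or {})
        self.deleted = set(deleted)

    def phrases(self):
        """Return {fully qualified name: pattern} after inheritance."""
        merged = {}
        for parent in self.parents:
            merged.update(parent.phrases())       # inherit
        for qualified_name in self.deleted:
            merged.pop(qualified_name, None)      # delete inherited phrases
        for label, pattern in self.own.items():
            # A dotted label overrides an inherited phrase; a bare label is new.
            key = label if "." in label else f"{self.name}.{label}"
            merged[key] = pattern
        return merged


date = FrameDef("Date", "str", {"two_digit": r"\d{2}", "apostrophe": r"'\d{2}"})
model_year = FrameDef(
    "ModelYear", "str",
    phrases={"Date.two_digit": r"[089]\d", "four_digit": r"(19|20)\d{2}"},
    parents=[date],
    deleted=["Date.apostrophe"],
)
print(sorted(model_year.phrases()))  # ['Date.two_digit', 'ModelYear.four_digit']
```

Because every key is fully qualified, two parents that both define a phrase with the same label never collide silently, which matches the "no automatic disambiguation" rule.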

8 General Constraints
- We may decide to implement a limited form of general constraint in the ontology
  - E.g. “Birth Date <= Death Date”
  - Or “Event Distance.toMiles() <= 26”
- If so, we may want to implement operator overloading (something like C++)
- The general constraint issue is not core to the current data frame discussion, but it has interesting ramifications
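The two example constraints can be written almost verbatim in a language with operator overloading. The sketch below uses Python rather than C++; the `Distance` class, its `to_miles` method, and the constraint function names are hypothetical, and storing distances in meters is an assumption.

```python
from dataclasses import dataclass
from datetime import date

METERS_PER_MILE = 1609.344


@dataclass(frozen=True)
class Distance:
    """Hypothetical value type with an overloaded <= operator."""
    meters: float

    def to_miles(self):
        return self.meters / METERS_PER_MILE

    def __le__(self, other):
        if isinstance(other, Distance):
            return self.meters <= other.meters
        return NotImplemented


def birth_before_death(birth, death):
    # "Birth Date <= Death Date": date objects already overload <=.
    return birth <= death


def event_distance_ok(distance):
    # "Event Distance.toMiles() <= 26"
    return distance.to_miles() <= 26


print(birth_before_death(date(1900, 1, 1), date(1980, 6, 30)))  # True
print(event_distance_ok(Distance(5000)))                        # True
print(event_distance_ok(Distance(50000)))                       # False
```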

9 Other Issues
- How to integrate methods and confidence values into record-assembly heuristics
- Ontos system will have to be rewritten
- Extract into model instance, not SQL tables
  - We can always generate database tables later if we’d like
- Ontologies created graphically and stored as XML

