
1 Comparability of language data and analysis: Using an ontology for linguistics
Scott Farrar, U Bremen
Terry Langendoen, U Arizona
Jan 9, 2004 Symposium on Best Practice LSA, Boston, MA

2 Multiple language resources
- Symposium focus so far has been on digital preservation of the work of individual projects.
- Imagine there are 100,000 or more Web accessible digital language archives covering most of the world’s languages.
  - annotated texts, lexicons, grammatical descriptions, research papers, typological comparisons, ...

3 Limits on access to content
- Metadata gets you only a little way in.
- String searching gets results, but it’s often not reliable (low “precision” and “recall”).
- Database searches typically can only be carried out one site at a time.
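The two measures named on this slide can be made concrete with a small calculation. The sketch below is only an illustration: the document identifiers and the search scenario are invented, not taken from any real archive.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved items judged against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of what the search returned that was actually wanted.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of what was wanted that the search actually returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical string search for the gloss "PERF" across several archives.
retrieved = {"doc1", "doc2", "doc3", "doc4"}          # documents the search returned
relevant = {"doc1", "doc2", "doc5", "doc6", "doc7"}   # documents that truly match

print(precision_recall(retrieved, relevant))
```

Here half of the returned documents are irrelevant and three of the five relevant ones are missed, which is the kind of result an uninterpreted string match tends to produce.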

4 Smart searches need smart data
- Use informational, not presentational, markup (cf. presentations by Simons and Lewis).
- XML can be used to represent linguistic analyses to any desired degree of refinement.
- Analyses in other formats (e.g. relational databases) can be migrated to XML for both archiving and smart web searching.
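As a rough illustration of informational markup, the sketch below encodes a two-word analysis in XML and queries it by gloss rather than by string position. The element and attribute names (`phrase`, `word`, `morpheme`, `form`, `gloss`, `type`) are assumptions made for this example, not a scheme proposed in the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical informational markup: each morpheme carries its analysis,
# not instructions about how it should be displayed.
IGT = """
<phrase lang="und">
  <word>
    <morpheme form="walk" gloss="walk" type="root"/>
    <morpheme form="ed" gloss="PST" type="suffix"/>
  </word>
  <word>
    <morpheme form="dog" gloss="dog" type="root"/>
    <morpheme form="s" gloss="PL" type="suffix"/>
  </word>
</phrase>
"""


def forms_with_gloss(xml_text, gloss):
    """Return the forms of all morphemes annotated with the given gloss."""
    root = ET.fromstring(xml_text)
    return [m.get("form") for m in root.iter("morpheme") if m.get("gloss") == gloss]


print(forms_with_gloss(IGT, "PST"))
```

Because the gloss is an attribute of the morpheme, the query does not depend on fonts, alignment, or line layout, which is the point of the informational/presentational distinction.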

5 Smart markup isn’t enough
- Meaning and use of structural markup varies from site to site.
  - Same term used with different meanings.
  - Different terms used with the same meaning.
  - Markup element and attribute names and values, and structural content may be in different natural languages.
- Sites are encoded at different levels of granularity.

6 How to say what you mean
- Markup is syntax; its meaning can only be inferred for individual sites, or groups of sites that use a common markup scheme (e.g. TEI).
- So if markup term T means “x” in archive A and “y” in archive B, then we need:
  - A resource (called an ontology) that provides the definitions “x” and “y” in a systematic and machine-interpretable format.
  - A mechanism to link T to “x” in A and T to “y” in B.
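The two requirements on this slide can be sketched as a pair of lookup tables: one holding the definitions, one linking each archive's term to a definition. Everything named below (the `onto:` identifiers, the archive labels, the glosses, and the helper functions) is hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical ontology: concept identifiers with machine-readable keys
# and human-readable definitions.
CONCEPTS = {
    "onto:Perfect": "event is viewed as relevant to a later reference time",
    "onto:PerfectiveAspect": "event is viewed as a completed whole",
}

# Hypothetical linking mechanism: (archive, markup term) -> concept identifier.
LINKS = {
    ("A", "PERF"): "onto:Perfect",
    ("B", "PERF"): "onto:PerfectiveAspect",
    ("A", "PFV"): "onto:PerfectiveAspect",
}


def concept_for(archive, term):
    """Return the concept a term denotes in a given archive, or None if unlinked."""
    return LINKS.get((archive, term))


def same_meaning(archive_a, term_a, archive_b, term_b):
    """True only if both terms are linked and point to the same concept."""
    ca = concept_for(archive_a, term_a)
    cb = concept_for(archive_b, term_b)
    return ca is not None and ca == cb


print(same_meaning("A", "PERF", "B", "PERF"))  # same term, different meanings
print(same_meaning("A", "PFV", "B", "PERF"))   # different terms, same meaning
```

A search that compares concept identifiers instead of markup strings avoids both failure cases described on the previous slide.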

7 What is an ontology?
- A computational artifact;
- A conceptualization of a domain;
- A theory of what is;
- The types in a knowledge base.
- There can be many ontologies for a given domain.

8 Why an ontology for linguistics?
- Language documentation
  - need to decipher markup
  - semantics and markup
  - Semantic Web implementation
- Natural language processing
  - conceptual basis for semantics (grounding)
  - as a common framework for linguistic and non-linguistic knowledge

9 GOLD
- General Ontology for Linguistic Description: http://emeld.org/gold
  - Incorporated in EMELD’s FIELD tool.
- Built using an upper ontology (SUMO) http://ontology.teknowledge.com
- Currently in a very early stage of development.

10 Partial SUMO taxonomy
Entity
  Physical
    Object
      Region
      SelfConnectedObject
      Agent
      Collection
    Perdurant
  Abstract
    Proposition
    Relation
    Attribute
    SetOrClass
    Quantity

11 What currently is in GOLD?
- Categories for:
  - linguistic form
  - morphosyntactic categories
  - features
  - values
  - semantics for morphosyntactic categories
  - using SUMO
  - documentation

12 Format of GOLD
- Semantic Web initiative
  - http://w3.org/2001/sw/
- Web Ontology Language (OWL)
  - An emerging Web standard and growing user base
  - Extensible
  - Lots of visualization tools and APIs are available for OWL.
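To give a sense of what an OWL encoding looks like, the sketch below writes three classes in OWL's RDF/XML syntax and reads the subclass links back with Python's standard XML parser. The class names (`TenseValue`, `PastTense`, `FutureTense`) are invented for illustration and are not claimed to match GOLD's actual contents; only the three namespace URIs are the standard RDF, RDFS, and OWL ones.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, RDF Schema, and OWL.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Hypothetical ontology fragment; the class names are assumptions.
OWL_DOC = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="TenseValue"/>
  <owl:Class rdf:ID="PastTense">
    <rdfs:subClassOf rdf:resource="#TenseValue"/>
  </owl:Class>
  <owl:Class rdf:ID="FutureTense">
    <rdfs:subClassOf rdf:resource="#TenseValue"/>
  </owl:Class>
</rdf:RDF>
"""


def subclass_pairs(owl_text):
    """Return (subclass, superclass) name pairs declared in the document."""
    root = ET.fromstring(owl_text)
    rdf = "{%s}" % NS["rdf"]
    pairs = []
    for cls in root.findall("owl:Class", NS):
        name = cls.get(rdf + "ID")
        for sup in cls.findall("rdfs:subClassOf", NS):
            # rdf:resource holds a local reference such as "#TenseValue".
            pairs.append((name, sup.get(rdf + "resource").lstrip("#")))
    return pairs


print(subclass_pairs(OWL_DOC))
```

A dedicated RDF or OWL library would normally be used for real ontologies; the plain XML parser is used here only to keep the sketch dependency-free.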

13 What’s still needed
- Buildout of GOLD (and/or development of companion ontologies) to cover the entire field.
- Mechanisms to link sites to ontologies.
  - Can be done in part using metadata.
- Development of additional ontology-aware tools for data creation and migration.
- A way of ensuring that ontologies endure just like the data they help interpret.

