Wido van Peursen, VU University Amsterdam, Faculty of Theology.

1 Wido van Peursen, VU University Amsterdam, Faculty of Theology

2
1. The corpus: Hebrew Bible
2. The WIVU Database
3. CLARIN project: SHEBANQ
4. NWO project: Syntactic Diversity in BH
5. Case study: Judges 4 and 5

3
• Ca. 400,000 words
• Probably composed over a period of ca. 1000 years (1200-200 BC)
• Complex transmission history
• Oldest complete MS: Codex Leningradensis, 1008/9 AD
• Various linguistic layers (e.g. vowel signs)
• No native speakers

4
• WIVU database of the Hebrew Bible [WIVU = Werkgroep Informatica Vrije Universiteit, i.e. the Computer Science Working Group of the Vrije Universiteit]
• Developed since the 1970s
• Linguistic levels:
  ◦ Morphology (encoding rather than tagging!)
  ◦ Words
  ◦ Phrases
  ◦ Clauses
  ◦ Sentences
  ◦ Text hierarchy

7
1. The corpus: Hebrew Bible
2. The WIVU Database
3. CLARIN project: SHEBANQ
4. NWO project: Syntactic Diversity in BH
5. Case study: Judges 4 and 5

8 SHEBANQ = System for HEBrew text: ANnotations for Queries and markup

9 Challenges:
1. No dedicated space on the web where an authorized version of this resource is guaranteed to exist.
2. No possibility to annotate it, link to it or build (open-source) tools around it.
3. Results of existing queries cannot be shown on the web.
4. EMDROS is maintained by a one-person private company.
5. Mainly used by specialists in Bible and computing.

10
• Mission: to build a bridge between the linguistically annotated Hebrew text corpus and biblical scholars.
• Three steps:
  (1) make the text and annotations available to scholars;
  (2) demonstrate how queries can address research questions, through a repository of saved queries;
  (3) give textual scholarship a more empirical basis by creating unique identifiers that refer to saved queries.

13 Example: “in-his-feet”: (a) “on foot” or (b) “in his footsteps”. Disambiguation: (1) intuitive/contextual, or (2) on the basis of pattern recognition (participants/agreement).

14 Example: [she-sang] [Deborah and Barak] (Judg. 5:1: a feminine singular verb with a compound subject)
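
A query of this kind can be sketched as a filter over clause records. It shows pattern recognition on agreement: a singular verb whose subject consists of coordinated participants. The `Clause` record, its feature names and the second sample clause are hypothetical illustrations. They are not the WIVU/Emdros data model and not an actual SHEBANQ query.

```python
from dataclasses import dataclass, field


@dataclass
class Clause:
    """Hypothetical, simplified clause record (not the WIVU schema)."""
    ref: str          # verse reference
    predicate: str    # gloss of the finite verb
    pred_number: str  # grammatical number of the verb: "sg" or "pl"
    pred_gender: str  # grammatical gender of the verb: "m" or "f"
    subject: list = field(default_factory=list)  # coordinated subject members


def singular_verb_compound_subject(clauses):
    """Yield clauses whose verb is singular although the subject is coordinated."""
    for clause in clauses:
        if clause.pred_number == "sg" and len(clause.subject) > 1:
            yield clause


# Sample data: the first record mirrors the slide; the second is invented.
sample = [
    Clause("Judg 5:1", "she-sang", "sg", "f", ["Deborah", "Barak"]),
    Clause("invented", "they-went", "pl", "m", ["Barak", "Zebulun"]),
]

for hit in singular_verb_compound_subject(sample):
    print(f"{hit.ref}: [{hit.predicate}] [{' and '.join(hit.subject)}]")
# Judg 5:1: [she-sang] [Deborah and Barak]
```

In a real corpus the number and gender features would come from the morphological encoding rather than being typed in by hand.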

15
1. The corpus: Hebrew Bible
2. The WIVU Database
3. CLARIN project: SHEBANQ
4. NWO project: Syntactic Diversity in BH
5. Case study: Judges 4 and 5

16 Does Syntactic Variation Reflect Language Change? Tracing Syntactic Diversity in Biblical Hebrew Texts

17 Explanations for linguistic diversity:
• Genre
• Chronology
• Language contact (Aramaic)
• Dialects
• Textual transmission
• Oral versus written layers

18 Limitations in current research:
• Focus on separate Bible books
• Methodological presuppositions
• Focus on lexical items or set phrases
• Failure to make use of methods for researching linguistic variation and change
• Failure to incorporate insights into syntactic differences between independent and dependent clauses and between narration and direct speech

19 Our approach:
• Focus on syntax in three project components:
  ◦ Phrase level
  ◦ Clause level
  ◦ Text level
• Synthesis: integration of congruous and contradictory tendencies
• Extra-biblical texts used as points of comparison

20
1. The corpus: Hebrew Bible
2. The WIVU Database
3. CLARIN project: SHEBANQ
4. NWO project: Syntactic Diversity in BH
5. Case study: Judges 4 and 5

21
• These chapters deal with the battle of Deborah, Barak and the Israelite tribes against the Canaanite king Jabin and his army captain Sisera.
• Differences, e.g.:
  ◦ Genre: chapter 4 is prose, chapter 5 is poetry.
  ◦ Main figures: Jabin is absent in chapter 5.
  ◦ Tribes involved: only two in chapter 4.

22
• 4 depends on 5: Wellhausen 1878; Halpern 1983; Houston 1997; Neef 2002 and many others.
• 5 depends on 4: Bechmann 1989; Waltisberg 1999.
• Common source/tradition: Richter 1963; Younger 1991.
• Synchronous/sequential: Guest 1998; Reis 2005.

23
1. Identification of ‘similar’ text segments on the basis of ‘distance’ (a synopsis is impossible).
2. Identification of the text features that cause high similarity scores.
3. Analysis of the distribution of these features in the larger context of Judges and the Old Testament.

24
• Is the intuition that chapters 4 and 5 belong together supported by textual features?
• If so, where in the text can these features be found?
• Similarity matrices: ‘distance’ measured between each verse of ch. 4 and each verse of ch. 5.
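
A verse-by-verse similarity matrix of this kind could be computed as below. This is a minimal sketch under two assumptions. First, each verse has already been reduced to a set of lexemes. Second, distance is taken as 1 minus the Jaccard index that the presentation introduces. The slide does not say exactly how its matrices turn similarity into distance. The verse labels and lexeme sets are invented toy data.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of lexemes (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def distance_matrix(rows, cols):
    """Jaccard distance (1 - similarity) for every row-verse x column-verse pair.

    `rows` and `cols` map verse labels to sets of lexemes; the result is a
    list of lists in the insertion order of the two dicts.
    """
    return [[1.0 - jaccard(rows[r], cols[c]) for c in cols] for r in rows]


# Toy lexeme sets with invented labels; not the real contents of Judges 4/5.
chapter_a = {"a1": {"Barak", "son", "Abinoam", "say", "go"},
             "a2": {"Jael", "tent", "nail"}}
chapter_b = {"b1": {"Barak", "son", "Abinoam", "sing", "day"},
             "b2": {"Jael", "tent", "bless"}}

matrix = distance_matrix(chapter_a, chapter_b)
for label, row in zip(chapter_a, matrix):
    print(label, [round(d, 2) for d in row])
# a1 [0.57, 1.0]
# a2 [1.0, 0.5]
```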

26
• Shared lexemes: the more shared lexemes, the smaller the distance.
• ‘Noise’: e.g. ‘and’. Remedies:
  ◦ Stoplist: exclude frequent particles etc.
  ◦ Selection of content words on the basis of part of speech: only words with inflection (nouns, verbs, adjectives).
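
The two noise-reduction options can be sketched as set filters over (lexeme, part-of-speech) tokens. This is a minimal illustration. The token format, the English glosses, the stoplist contents and the tag names are all assumptions, not the WIVU tag set. Proper nouns are kept under a separate tag because the shared lexemes reported in this study include names such as ‘Barak’ and ‘Israel’.

```python
# Hypothetical token format: (lexeme, part_of_speech). English glosses and the
# tag names below are illustrative assumptions, not the WIVU tag set.
STOPLIST = {"and", "to", "this"}
CONTENT_POS = {"noun", "proper_noun", "verb", "adjective"}


def by_stoplist(tokens, stoplist=STOPLIST):
    """Keep every lexeme that is not on the stoplist."""
    return {lexeme for lexeme, _pos in tokens if lexeme not in stoplist}


def by_part_of_speech(tokens, keep=CONTENT_POS):
    """Keep only lexemes whose part of speech is in `keep`."""
    return {lexeme for lexeme, pos in tokens if pos in keep}


# Toy tokens (English glosses), invented for illustration.
tokens = [("and", "conjunction"), ("say", "verb"), ("Deborah", "proper_noun"),
          ("to", "preposition"), ("Barak", "proper_noun"), ("up", "adverb"),
          ("this", "pronoun"), ("day", "noun")]

print(sorted(by_stoplist(tokens)))        # ['Barak', 'Deborah', 'day', 'say', 'up']
print(sorted(by_part_of_speech(tokens)))  # ['Barak', 'Deborah', 'day', 'say']
```

The stoplist must be maintained by hand. The part-of-speech filter instead relies on the corpus annotation, which is why it also removes words like ‘up’ that a short stoplist misses.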

27
• Basic unit for text comparison: the verse, although ‘verse’ is based on traditional unit delimitation.
• Differences in verse size may affect the results.
• Jaccard index: the number of lexemes in the intersection (the shared lexemes) divided by the number of lexemes in the union.
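
In set notation, let A and B be the sets of lexeme types of two verses. The index then reads as below. The distance form d_J = 1 - J is the usual Jaccard distance; treating it as the ‘distance’ meant here is an assumption.

```latex
J(A,B) = \frac{|A \cap B|}{|A \cup B|}, \qquad d_J(A,B) = 1 - J(A,B), \qquad 0 \le J(A,B) \le 1
```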

28
Example 1
A: I went home
B: I went home yesterday
Intersection (shared lexeme types): 3 (I, went, home)
Union (total number of lexeme types): 4 (I, went, home, yesterday)
Jaccard index = 3/4 = 0.75

Example 2
A: I went home
B: After the meeting I went home yesterday
Intersection: 3 (I, went, home)
Union: 7 (I, went, home, after, the, meeting, yesterday)
Jaccard index = 3/7 ≈ 0.43
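
The two worked examples can be reproduced with a few lines of Python. This sketch uses a naive stand-in for lemmatisation: lower-cased, whitespace-separated words. It is an assumption for illustration; real lexemes would come from the annotated database.

```python
def lexeme_types(sentence):
    """Naive stand-in for lemmatisation: lower-cased whitespace tokens as a set."""
    return set(sentence.lower().split())


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets of lexeme types (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


base = lexeme_types("I went home")
print(round(jaccard(base, lexeme_types("I went home yesterday")), 2))                   # 0.75
print(round(jaccard(base, lexeme_types("After the meeting I went home yesterday")), 2))  # 0.43
```

The second example shows the verse-size effect noted above: the same three shared lexemes score lower when one segment is longer.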

29
• Shared lexemes: a ‘feature-based’ method.
• There are also ‘blind’ methods, based on mathematical characteristics of the digital representation of the text, e.g. Normalized Compression Distance (NCD).
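
A minimal NCD sketch follows, using the standard formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The choice of zlib as compressor and UTF-8 as encoding are assumptions; the slide does not say which were used. Very short strings give unreliable values because of compressor overhead, so the toy inputs are repeated.

```python
import zlib


def compressed_size(data):
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized Compression Distance between two strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed length. Values near 0 mean "very similar";
    values near 1 mean "unrelated" (real compressors can slightly exceed 1).
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy strings, repeated so that compressor overhead matters less.
a = "Jael the wife of Heber the Kenite " * 20
b = "Sisera fled away on his feet to the tent " * 20
print(round(ncd(a, a), 2), round(ncd(a, b), 2))
```

Unlike the lexeme-based measure, this needs no stoplist or part-of-speech information. The trade-off is that it cannot tell which features caused a low distance.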

30 Example: verse pairs with the highest number of shared lexemes (4 or more)

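31 Shared lexemes per verse pair (ch. 4 verse with ch. 5 verse):
• 4:6 with 5:1: Abinoam, Barak, say, son
• 4:6 with 5:5: God, Israel, the LORD, mountain
• 4:14 with 5:1: Barak, day, Deborah, say
• 4:17 with 5:24: Heber, Jael, Kenite, tent, wife
• 4:21 with 5:24: Heber, Jael, tent, wife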
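
Selecting such pairs amounts to thresholding the raw count of shared lexemes across all verse pairs. This is a minimal sketch, assuming verses already reduced to lexeme sets. The function name and the toy data are hypothetical and do not reproduce the actual verses.

```python
def pairs_with_min_shared(chapter_x, chapter_y, minimum=4):
    """Return (verse_x, verse_y, shared lexemes) for every pair sharing at least
    `minimum` lexemes, ordered from most to fewest shared lexemes."""
    hits = []
    for vx, lex_x in chapter_x.items():
        for vy, lex_y in chapter_y.items():
            shared = lex_x & lex_y
            if len(shared) >= minimum:
                hits.append((vx, vy, sorted(shared)))
    hits.sort(key=lambda hit: len(hit[2]), reverse=True)
    return hits


# Toy lexeme sets with invented labels, not the actual verses.
chapter_x = {"x1": {"Jael", "wife", "Heber", "Kenite", "tent", "flee"},
             "x2": {"Barak", "go", "up"}}
chapter_y = {"y1": {"Jael", "wife", "Heber", "Kenite", "tent", "bless", "woman"},
             "y2": {"Barak", "sing"}}

for hit in pairs_with_min_shared(chapter_x, chapter_y):
    print(hit)
# ('x1', 'y1', ['Heber', 'Jael', 'Kenite', 'tent', 'wife'])
```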

32
• Proper nouns: ‘Barak’, ‘Israel’.
• Common nouns that are part of proper noun phrases: ‘wife’ in ‘Jael the wife of Heber’; ‘son’ in ‘Barak the son of Abinoam’.
• Other verbs and common nouns: ‘say’, ‘tent’, ‘day’.

36
• High similarity scores occur in places with a high concentration of proper nouns.
• Even within the category of proper nouns there are considerable differences.
• Shared common nouns and verbs are frequent words such as ‘day’ and ‘say’; they show no significant concentration.

37
• In the case of literary dependency we would expect at least some concentration of shared lexemes.
• A significant number of shared lexemes occurs only with proper nouns.
• But shared proper nouns suggest shared traditions rather than literary dependency.

