November 28th 2000

1
Contrasting Polish and English Derivational Groups
Karolina Tymowicz
based on Jadacka, H., Rzeczownik polski jako baza derywacyjna, WN-PWN 1995
independent contrastive study of 540 Polish-English pairs of derivatives

2
Outline
Defining terms:
–Derivational group
–Derivational base
–Affixes
–Similarity of and within derivational groups
Procedure of comparison
Conclusions

3
Derivational group
A well-ordered system constructed around an underived entry word, concentrating all the derivatives connected with it by a direct or indirect process of derivation
A hierarchical structure in which each element functions as a link between other derivatives and the BASE

4
Derivational base
The item to which an affix is added to derive a new word-form
The word-forms consisting of the derivational base and an affix are called DERIVATIVES
–e.g. STYLE - STYLIZE - STYLIZER
–e.g. CENTRE - CENTRIC - CENTRICALLY
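The hierarchical structure of a derivational group can be sketched as a map from each derivative to the item it is directly derived from. This is only an illustration built from the two example chains above; the names `DERIVED_FROM` and `chain_to_base` are hypothetical and not part of the study.

```python
# Each derivative points to the item it is directly derived from.
# Entries come only from the two example chains on this slide.
DERIVED_FROM = {
    "STYLIZE": "STYLE",
    "STYLIZER": "STYLIZE",
    "CENTRIC": "CENTRE",
    "CENTRICALLY": "CENTRIC",
}


def chain_to_base(word):
    """Follow the derivation links from a derivative back to its underived base."""
    chain = [word]
    while chain[-1] in DERIVED_FROM:
        chain.append(DERIVED_FROM[chain[-1]])
    return chain


print(chain_to_base("STYLIZER"))
print(chain_to_base("CENTRICALLY"))
```

A word that is not a key of the map (here STYLE and CENTRE) is treated as an underived base, so the chain stops there.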

5
Affix
A morpheme that is added to a word and changes the meaning or function of the word
Affixes are bound forms that can be added:
–to the beginning of a word = prefix, e.g. unkind
–to the end of a word = suffix, e.g. kindness

6
Similarity within derivational groups
Four kinds of similarity within derivational groups are considered.
Three types of translational similarity:
–translational similarity between morphemes
–translational similarity between derivatives
–translational similarity between derivational groups
and one type of grapho-etymological similarity:
–graphemic and etymological similarity between bases

7
Degrees of translational similarity between morphemes (incl. bases)
def.: the translational similarity between L1 and L2 morphemes is the degree to which an L1 morpheme can correctly be rendered as a corresponding L2 morpheme (i.e. morphemes occupying the same position with respect to the base).
–no similarity, e.g. ponad- vs. -less in P. ponad-czasowy, E. time-less
–1st degree of similarity, e.g. bez- vs. -less in P. bez-głośny, E. voice-less
–2nd degree of similarity, e.g. -ik vs. -er in P. głośn-ik, E. loudspeak-er; -czas- vs. time- in P. ponad-czas-owy, E. time-less

8
Degrees of translational similarity between derivatives
def.: the joint translational similarity between all the corresponding morphemes of the Polish and English derivatives, whereby two morphemes are corresponding iff they occupy the same position with respect to the base.
e.g.
Pol.     Eng.
za-    = a-
leś-   = forest
-ać

9
Degrees of translational similarity between derivational groups
Similarity between derivational groups is a function of
–the grapho-etymological similarity of their bases,
–and the translational similarity of all their derivatives.

10
Degrees of graphemic-etymological similarity between derivational bases
def.: similarity established between two bases with respect to their etymological and graphemic features, on the assumption of their translational equivalence
–no similarity, e.g. dom vs. house
–remote similarity, e.g. brat vs. brother
–close similarity, e.g. styl vs. style
irrespective of the translational equivalence of their derivatives

11
Scale of translational similarity between derivatives
The scale used here consists of 12 levels of similarity, numbered from 11 down to 0, where 0 stands for the lowest level of similarity and 11 denotes the highest.
0 1 2 3 4 5 6 7 8 9 10 11

12
Treatment of compound derivatives
If a single compound derivative of the form “A-B” or “AB” (but not “A B”) has an equivalent in the other language in the form of two separate words “C D”, then it is included in our classification as long as
–C is a direct translation of A and D is a direct translation of B, or
–C is a direct translation of B and D is a direct translation of A.
This convention has been adopted because Jadacka’s derivational groups contain only derivatives of the form “AB” or “A-B”, but no “A B” derivatives.
Jadacka’s work constituted the main and most reliable source of derivatives and derivational groups considered in the study.
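The inclusion convention above can be sketched as a small check. This is a minimal sketch under stated assumptions: `TOY_LEXICON`, `is_direct_translation` and `is_admissible` are hypothetical names, and the lexicon holds only word pairs taken from the compound examples in this presentation (voice-mail / poczta głosowa, free-style / styl wolny).

```python
# Hypothetical toy lexicon of direct translations (English part, Polish word),
# filled in only from the compound examples used in this presentation.
TOY_LEXICON = {
    ("voice", "głosowa"),
    ("mail", "poczta"),
    ("free", "wolny"),
    ("style", "styl"),
}


def is_direct_translation(part, word):
    """Assumed lookup: True if `word` is listed as a direct translation of `part`."""
    return (part, word) in TOY_LEXICON


def is_admissible(compound, phrase):
    """Check compound (A, B) against a two-word phrase (C, D) in either order."""
    a, b = compound
    c, d = phrase
    same_order = is_direct_translation(a, c) and is_direct_translation(b, d)
    swapped_order = is_direct_translation(b, c) and is_direct_translation(a, d)
    return same_order or swapped_order


print(is_admissible(("voice", "mail"), ("poczta", "głosowa")))
print(is_admissible(("free", "style"), ("styl", "wolny")))
print(is_admissible(("voice", "mail"), ("styl", "wolny")))
```

Both example pairs pass through the swapped-order branch, since the Polish phrase puts the equivalent of the second English element first.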

13
Compound derivatives 1
11. P. BASE1 + BASE2 + SUFFIX = E. BASE1 + BASE2 + SUFFIX
e.g.: słowo - word: słowo-twór-stwo, word form-ation
10. E. BASE1 + (BASE2 + SUFFIX) = P. (BASE2 + SUFFIX) + BASE1
e.g.: krew - blood: blood-stain-ed, poplamio-ny krwią
9. E. BASE1 + BASE2 = P. BASE2 + (BASE1 + SUFFIX)
e.g.: głos - voice: voice-mail, poczta głos-owa

14
Compound derivatives 2
8. P. BASE1 + BASE2 = E. BASE1 + BASE2
e.g.: słowo - word: pół-słowo, half-word
7. E. BASE1 + BASE2 = P. BASE2 + BASE1
e.g.: styl - style: free-style, styl wolny

15
Single derivatives 1
6. P. BASE + SUFFIX = E. BASE + SUFFIX
e.g.: las - forest: leś-nik, forest-er
P. BASE + SUFFIX + SUFFIX = E. BASE + SUFFIX + SUFFIX
e.g.: styl - style: styl-ist-yczny, styl-ist-ic
P. PREFIX + BASE + SUFFIX = E. PREFIX + BASE + SUFFIX
e.g.: las - forest: wy-leś-anie, de-forest-ation
P. PREFIX + BASE + SUFFIX + SUFFIX = E. PREFIX + BASE + SUFFIX + SUFFIX
e.g.: centrum - centre: de-centr-al-izować, de-centr-al-ize

16
Single derivatives 2
5. P. PREFIX + BASE + SUFFIX = E. BASE + SUFFIX + SUFFIX
e.g.: dziecko - child: bez-dziet-ność, child-less-ness
4. P. PREFIX + BASE + SUFFIX = E. BASE + SUFFIX
e.g.: pan - lord: wielko-pań-ski, lord-ly
3. P. PREFIX + BASE + SUFFIX = E. PREFIX + BASE
e.g.: las - forest: za-leś-ać, a-forest

17
Single derivatives 3
2. P. BASE + SUFFIX = E. BASE + ____
e.g.: słowo - word: słow-nik, word-book
P. BASE + SUFFIX = E. BASE
e.g.: dziecko - child: dziec-inka, child
1. P. BASE + SUFFIX + SUFFIX = E. _____ + _______ + SUFFIX
e.g.: słowo - word: słow-nik-arz, lexico-graph-er
P. BASE + SUFFIX = E. _____ + SUFFIX
e.g.: znak - sign: znacz-nik, mark-er

18
Single derivatives 4
0. E. BASE + BASE = P. _____
e.g.: time - czas: time-piece, zegarek
P. BASE + SUFFIX = E. _____
e.g.: kość - bone: kos-tka, ankle
E. PREFIX + BASE = P. _______
e.g.: child - dziecko: grand-child, wnuk
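The scale on the preceding slides can be read as a lookup from a pair of structural patterns to a similarity level. The sketch below is an assumption about how such a lookup might be encoded, not the procedure used in the study: it keeps one representative pattern per level (several levels list more than one on the slides), writes "_" where the slides show a blank, and uses the hypothetical names `SCALE_EXAMPLES` and `level_for`.

```python
# (level, Polish pattern, English pattern, Polish example, English example)
# One representative row per level, copied from the slides.
SCALE_EXAMPLES = [
    (11, "BASE1+BASE2+SUFFIX", "BASE1+BASE2+SUFFIX", "słowo-twór-stwo", "word form-ation"),
    (10, "(BASE2+SUFFIX)+BASE1", "BASE1+(BASE2+SUFFIX)", "poplamio-ny krwią", "blood-stain-ed"),
    (9, "BASE2+(BASE1+SUFFIX)", "BASE1+BASE2", "poczta głos-owa", "voice-mail"),
    (8, "BASE1+BASE2", "BASE1+BASE2", "pół-słowo", "half-word"),
    (7, "BASE2+BASE1", "BASE1+BASE2", "styl wolny", "free-style"),
    (6, "BASE+SUFFIX", "BASE+SUFFIX", "leś-nik", "forest-er"),
    (5, "PREFIX+BASE+SUFFIX", "BASE+SUFFIX+SUFFIX", "bez-dziet-ność", "child-less-ness"),
    (4, "PREFIX+BASE+SUFFIX", "BASE+SUFFIX", "wielko-pań-ski", "lord-ly"),
    (3, "PREFIX+BASE+SUFFIX", "PREFIX+BASE", "za-leś-ać", "a-forest"),
    (2, "BASE+SUFFIX", "BASE+_", "słow-nik", "word-book"),
    (1, "BASE+SUFFIX", "_+SUFFIX", "znacz-nik", "mark-er"),
    (0, "BASE+SUFFIX", "_", "kos-tka", "ankle"),
]

# Index the rows by their (Polish pattern, English pattern) pair.
LEVEL_BY_PATTERNS = {(pl, en): level for level, pl, en, _, _ in SCALE_EXAMPLES}


def level_for(polish_pattern, english_pattern):
    """Return the similarity level for a pattern pair, or None if it is not listed."""
    return LEVEL_BY_PATTERNS.get((polish_pattern, english_pattern))


print(level_for("BASE+SUFFIX", "BASE+SUFFIX"))
print(level_for("PREFIX+BASE+SUFFIX", "PREFIX+BASE"))
```

A pattern-only lookup cannot decide on its own whether an element really translates the base; that judgement is assumed to have been made before the pattern strings are written down.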

19
Experiment
540 Polish-English pairs of derivatives were judged as to their similarity according to the 12-point scale presented above
The translational similarity points for each pair of derivatives obtained from each of the Polish and English bases, together with the grapho-etymological similarity between these bases, were analysed statistically

20
Statistical tests applied in the study
In spite of the non-normality of the data, the following parametric tests were applied:
MANOVA for
–translational similarity between derivatives by
–grapho-etymological similarity between the bases these derivatives were obtained from, and
–direction of translation
»Polish-English: based on Jadacka ‘95 and Collins Polish-English Electronic Dictionary
»English-Polish: based on Harper-Collins Electronic Dictionary and Collins English-Polish Electronic Dictionary
Multiple Range Tests for
–translational similarity of the derivatives, irrespective of whether they were obtained through Polish-English or English-Polish translation
–by grapho-etymological similarity between the Polish and English bases they were derived from
Multiple Range Tests for
–translational similarity of the derivatives obtained through Polish-English translation
–by grapho-etymological similarity between the Polish and English bases they were derived from
Additionally, some non-parametric tests were applied:
Mann-Whitney W test to compare
–medians of the similarity points obtained for the derivatives in Polish-English translation
–with the medians of the similarity points obtained for the derivatives in English-Polish translation
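As a reminder of what a rank-based comparison of two samples does, the Mann-Whitney statistic can be sketched as a count of pairs. The scores below are invented purely for illustration and are not data from the study; the function name `mann_whitney_u` is hypothetical, and the W statistic reported by a statistics package may be a differently scaled variant of this count.

```python
def mann_whitney_u(sample_a, sample_b):
    """Count pairs (x, y) with x from sample_a greater than y from sample_b; ties count 0.5."""
    u = 0.0
    for x in sample_a:
        for y in sample_b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Invented similarity scores on the 0-11 scale, for illustration only.
polish_to_english = [6, 6, 4, 2, 6]
english_to_polish = [0, 2, 6, 1, 3]

u_pl_en = mann_whitney_u(polish_to_english, english_to_polish)
u_en_pl = mann_whitney_u(english_to_polish, polish_to_english)

# The two counts always add up to the number of pairs.
print(u_pl_en, u_en_pl, len(polish_to_english) * len(english_to_polish))
```

The further the two counts are from an even split of the pairs, the stronger the evidence that one direction of translation tends to give higher scores.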

21
Some results: MANOVA
Type III Sums of Squares were used. All F-ratios were based on the residual mean square error.

Source                          Sum of Squares   Df    Mean Square   F-Ratio   P-Value
A:graph_ethym_sim_betw_bases    590,704          2     295,352       53,53     0,0000
B:direction_of_translation      195,227          1     195,227       35,38     0,0000
RESIDUAL                        2957,27          536   5,5173
TOTAL (CORRECTED)               3903,44          539

The P-values test the statistical significance of each of the sources. Since both P-values are less than 0,05, the grapho-etymological similarity between bases and the direction of translation both have a statistically significant effect on the translational similarity between the derivatives obtained from these bases at the 95,0% confidence level.
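The mean squares and F-ratios in the table above follow from the sums of squares and degrees of freedom: each mean square is a sum of squares divided by its Df, and each F-ratio is a mean square divided by the residual mean square. The sketch below only re-derives the reported figures, transcribed with decimal points; the variable and function names are hypothetical.

```python
# Sums of squares and degrees of freedom transcribed from the table
# (decimal commas written as decimal points).
SOURCES = {
    "A:graph_ethym_sim_betw_bases": (590.704, 2),
    "B:direction_of_translation": (195.227, 1),
}
RESIDUAL_SS, RESIDUAL_DF = 2957.27, 536

# Residual mean square: the denominator of every F-ratio in the table.
residual_ms = RESIDUAL_SS / RESIDUAL_DF


def mean_square(sum_of_squares, df):
    return sum_of_squares / df


def f_ratio(sum_of_squares, df):
    return mean_square(sum_of_squares, df) / residual_ms


for name, (ss, df) in SOURCES.items():
    print(name, round(mean_square(ss, df), 3), round(f_ratio(ss, df), 2))
print("RESIDUAL", round(residual_ms, 4))
```

The recomputed values agree with the Mean Square and F-Ratio columns to the precision shown in the table.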

22
Some results: Multiple Range Tests
Contrast   Difference   +/- Limits
0 - 1      0,197742     1,25397
0 - 2      *-2,60124    0,488299
1 - 2      *-2,79898    1,30672
* denotes a statistically significant difference.
This means that the derivational groups of the Polish-English bases that were judged to bear no grapho-etymological similarity and the derivational groups of the bases that were judged to be remotely similar (contrast 0-1) do not differ significantly with respect to the translational similarity of the derivatives that constitute them.
On the other hand, groups derived from closely similar bases differ significantly from both of these (contrasts 0-2 and 1-2) as far as the translational similarity of their derivatives is concerned.
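The asterisks in the contrast table can be reproduced with the usual reading of such output: a contrast is flagged when the absolute difference exceeds its +/- limit. This reading is an assumption about how the statistics package marks significance; the figures are transcribed from the table with decimal points, and the names are hypothetical.

```python
# (contrast label, difference between group means, +/- limit)
CONTRASTS = [
    ("0 - 1", 0.197742, 1.25397),
    ("0 - 2", -2.60124, 0.488299),
    ("1 - 2", -2.79898, 1.30672),
]


def is_significant(difference, limit):
    """Assumed rule: the interval difference +/- limit does not contain zero."""
    return abs(difference) > limit


significant = [label for label, diff, limit in CONTRASTS if is_significant(diff, limit)]
print(significant)
```

Under this rule only the contrasts involving group 2 (closely similar bases) are flagged, which matches the asterisks in the table.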

23
[Chart: frequency and cumulative % of observations; 540 observations = 100%]

24
Applications of the study
The results of the study provide insights into the possibility of automatic translation of UNKNOWN L1 derivatives on the basis of
–the L2 equivalents of the component morphemes of the L1 derivative
–the degree of grapho-etymological similarity between the bases of these derivatives

25
For example:
Assume we do not know the equivalent of the derivative leśnik
We can interpret bases even if they are modified by other morphemes (las → leś-)
We know the equivalents of the component morphemes:
–leś- (= las) → forest
–-nik → -er
We know the grapho-etymological similarity between the bases (= 0)
Hence, we guess with relatively low certainty that the English equivalent of leśnik is forester
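The guess in this example amounts to replacing each morpheme with its known equivalent and joining the results. The sketch below assumes the morphemes keep the same order in both languages (as in level 6 of the scale) and that the derivative has already been segmented; `MORPHEME_EQUIVALENTS` and `guess_equivalent` are hypothetical names, and the dictionary holds only the two equivalents given in the example.

```python
# Equivalents of the component morphemes, as stated in the example.
MORPHEME_EQUIVALENTS = {
    "leś-": "forest",  # modified form of the base las
    "-nik": "-er",
}


def guess_equivalent(morphemes, equivalents):
    """Join the L2 equivalents of the morphemes, or return None if any is unknown."""
    parts = []
    for morpheme in morphemes:
        if morpheme not in equivalents:
            return None
        # Drop the hyphens that only mark where a morpheme attaches.
        parts.append(equivalents[morpheme].strip("-"))
    return "".join(parts)


print(guess_equivalent(["leś-", "-nik"], MORPHEME_EQUIVALENTS))
print(guess_equivalent(["leś-", "-ny"], MORPHEME_EQUIVALENTS))
```

The function says nothing about how reliable the guess is; in the example that depends on the grapho-etymological similarity between the bases.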

26
Pessimistic scenario for automatic translation of derivatives
[Figure: scale of similarity, 0 to 11]

27
Optimistic scenario for automatic translation of derivatives
[Figure: scale of similarity, 0 to 11]

28
Very optimistic scenario for automatic translation of derivatives
[Figure: scale of similarity, 0 to 11]

29
Conclusions
COMPOSITIONALITY: the meaning of the derivative is a direct function of the meaning of its morphemes in approx. 38-56% of cases
Assuming we know the equivalents of all the morphemes of an L1 derivative, we have an approx. 38-56% chance of producing a comprehensible L2 derivative
The grapho-etymological similarity of L1 and L2 bases influences the translational similarity of their derivational groups

