Romanian Online Dialect Atlas: An exploration into the management of high volumes of complex knowledge in the social sciences and humanities. © 2003 Embleton, Uritescu, Wheeler


1 Romanian Online Dialect Atlas: An exploration into the management of high volumes of complex knowledge in the social sciences and humanities.
Sheila M. Embleton, Dorin Uritescu, Eric S. Wheeler

2 Romanian Online Dialect Atlas
- Sheila M. Embleton: Department of Languages, Literatures and Linguistics, York University
- Dorin Uritescu: co-editor of the source atlas, Noul Atlas lingvistic român. Crisana; Department of French, Glendon College, York University
- Eric S. Wheeler: ITEC program, York University; Managing partner, Wheeler and Young Inc.

3 Romanian Online Dialect Atlas
Supported ( ) by a grant from the Social Sciences and Humanities Research Council (Canada)

4 Agenda
- The problem of high-volume, complex data in the social sciences and humanities
- Predecessor projects: English and Finnish dialect data
- Use of Multidimensional Scaling (MDS) to consolidate data
- Interactive, media-rich presentation

5 Problem
In the social sciences and humanities, data is often characterized by:
- high volume
- multiple variables or dimensions
- no a priori model
Dialectology provides a good exemplar.

6 Dialectology
- Explain the variations in linguistic usage across geography
- Simple example: church vs. kirk (< OE cirice)
- More realistic problem:
  - 169 features in 313 locations (SED)
  - 213 features in 400+ locations (Finnish)

7 Dialect atlases
- Record the details in maps
- Many maps are needed to make an atlas
- Recovery of individual facts is possible, but...
- Global understanding of the situation is lost in the volume of details

8 English
- Survey of English Dialects (SED)
  - 169 features at 313 locations
- Computer Developed Linguistic Atlas of England
- Applied MDS to already computerized data

9 English: results
- 2-D map of dialect locations
- No geographic information used
- Close correspondence to geography (as expected)
- Highlighted further problems of handling and understanding high volumes of data

10 English Dialect Map
- Northern counties at top
- Mid and southern counties below
- Somerset and Devon (south-west) are out of place (they appear in the east)
- Star-bursts, colours, and dotted lines all help interpret the map data

11 Finnish

12 Kettunen (1940), The Dialect Atlas of Finland
- 213 maps x 530 locations
- Up to 16 features per map
- Typically 1-3 features per location
- ~120,000 data items
Project: data computerization (largely done)
Stage II: application of MDS (not yet done)

13 Map 1 (parts)

14 Special software to facilitate accurate data entry

15 Ambiguity?

16 Resolution
- Make an editorial decision: X, not Y
- Mark as AMBIGUOUS: X or Y
- Get more input: X (says expert)
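
One way to carry these three outcomes through data entry is to store each uncertain reading together with its candidates and the way it was settled. The sketch below is illustrative only: the names `Resolution`, `Reading`, and `resolve`, and the map and location numbers, are hypothetical and do not describe the project's actual data-entry software.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple


class Resolution(Enum):
    EDITORIAL = "editorial"  # editor decides: X, not Y
    AMBIGUOUS = "ambiguous"  # both readings kept: X or Y
    EXPERT = "expert"        # more input obtained: X (says expert)


@dataclass(frozen=True)
class Reading:
    """One uncertain data item read off a printed map (hypothetical layout)."""
    map_id: int
    location: int
    candidates: Tuple[str, ...]
    resolution: Optional[Resolution] = None
    chosen: Optional[str] = None


def resolve(reading: Reading, how: Resolution,
            chosen: Optional[str] = None) -> Reading:
    """Return a copy of the reading with the resolution recorded."""
    if how is Resolution.AMBIGUOUS:
        # Keep every candidate; no single value is chosen.
        return replace(reading, resolution=how, chosen=None)
    if chosen not in reading.candidates:
        raise ValueError("chosen value must be one of the candidates")
    return replace(reading, resolution=how, chosen=chosen)


unclear = Reading(map_id=1, location=17, candidates=("X", "Y"))
by_editor = resolve(unclear, Resolution.EDITORIAL, "X")
kept_open = resolve(unclear, Resolution.AMBIGUOUS)
```

Keeping the candidates alongside the decision means a later analysis can still include or exclude ambiguous items explicitly.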

17 Lesson
In transforming data from one medium to another, even well-structured data will have unexpected pitfalls:
- Design the data transformation carefully
- Prototype your system; find the problems early
- Plan to work iteratively

18 Romanian Online Dialect Atlas: Crisana
- Apply innovative contemporary methods in dialect geography to an online set of Romanian dialect data.

19 Romanian language
- Key to understanding the evolution of all Romance languages
  - Early branch, distinct from the French-Spanish-Italian line
- Exemplar of non-hierarchical dialect variation and linguistic continua
  - Transition areas contain mixtures of dialect features and specific features

20 RODA: Part 1
Create an online version of The New Romanian Linguistic Atlas. Crisana (Stan & Uritescu 1996)
- Available on the internet and on CD
- Default interpretations
- Interactive interface to data
  - custom-select data for a map
- Add audio clips to illustrate data

21 RODA Prototype 1

22 RODA: Part 2
Allow plug-in applications and other analyses of data, e.g. apply Multidimensional Scaling to dialect data
- Statistical technique
- Consolidates large amounts of data
- Complement to traditional analyses of small amounts of data

23 Multidimensional Scaling

24 Multidimensional Scaling
- Statistical technique (Torgerson 1952)
- Used in sociology, psychology, marketing
- Reveals the scales along which data varies; gives a data-space
- Uses distances [(dis)similarities] among responses of subjects
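
To make the last point concrete, here is a minimal sketch of turning responses into a dissimilarity matrix. The slides do not say which measure the authors used; this example assumes the simplest one, the fraction of features on which two locations differ, and the three locations and their responses are invented for illustration.

```python
def dissimilarity(a, b):
    """Fraction of features on which two locations give different responses."""
    if len(a) != len(b):
        raise ValueError("locations must be compared on the same features")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(responses):
    """Return (names, D), where D[i][j] is the dissimilarity of names[i], names[j]."""
    names = sorted(responses)
    D = [[dissimilarity(responses[p], responses[q]) for q in names]
         for p in names]
    return names, D


# Hypothetical data: one tuple of feature values per location.
responses = {
    "A": ("church", "yes", "a"),
    "B": ("kirk", "yes", "a"),
    "C": ("kirk", "no", "b"),
}
names, D = distance_matrix(responses)
```

Here A and B differ on one feature of three, B and C on two, and A and C on all three, so A and C end up farthest apart.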

25 MDS: Axioms of a metric
- d(X,X) = 0
- d(X,Y) = d(Y,X)
- d(X,Y) > 0 if X ≠ Y
- d(X,Y) ≤ d(X,C) + d(C,Y) for all points C
The distance matrix reflects these rules.
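
The four axioms (zero self-distance, symmetry, positivity for distinct points, and the triangle inequality) can be checked mechanically on any candidate distance matrix. The following is a small sketch; the function name `is_metric` and the tolerance are choices made here, not part of the original slides.

```python
def is_metric(D, tol=1e-12):
    """Check the four metric axioms on a square matrix D (list of lists)."""
    n = len(D)
    for i in range(n):
        if abs(D[i][i]) > tol:                 # d(X,X) = 0
            return False
        for j in range(n):
            if abs(D[i][j] - D[j][i]) > tol:   # d(X,Y) = d(Y,X)
                return False
            if i != j and D[i][j] <= 0:        # d(X,Y) > 0 if X != Y
                return False
            for c in range(n):                 # d(X,Y) <= d(X,C) + d(C,Y)
                if D[i][j] > D[i][c] + D[c][j] + tol:
                    return False
    return True


good = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]  # 5 > 1 + 1 breaks the triangle rule
```

Note that two distinct locations with identical responses would have distance 0 under a feature-counting measure, so real dialect data may fail the positivity check unless such locations are merged.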

26 MDS
- n+1 points generate an n-dimensional space
- MDS can reduce that high-dimensional space to 2 (or 3) dimensions
- Result: complex data can be viewed as a map
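
As an illustration of the reduction step, the sketch below implements classical (Torgerson-style) scaling in plain Python: square the distances, double-center them, and take the leading eigenvectors by power iteration. It is a teaching sketch under stated assumptions, not the software used in the project: it assumes a symmetric, roughly Euclidean distance matrix, and the function names and iteration count are choices made here.

```python
import math


def classical_mds(D, dims=2, iters=500):
    """Return n points in `dims` dimensions whose distances approximate D."""
    n = len(D)
    sq = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into inner products.
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration: converges to the eigenvector of largest magnitude.
        v = [math.sin(i + k + 1) for i in range(n)]  # fixed, generic start
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:
                break
            v = [x / norm for x in w]
        Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * Bv[i] for i in range(n))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no further positive dimension; remaining coords stay 0
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                B[i][j] -= lam * v[i] * v[j]
    return coords


def dist(p, q):
    """Euclidean distance between two coordinate lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Three points forming a 3-4-5 right triangle, given only by their distances.
D = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
points = classical_mds(D)
```

The recovered `points` reproduce the input distances up to rotation and reflection, which is why an MDS map of dialect locations can resemble the geography without using any geographic information.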

27 MDS
- Can use MDS to consolidate data
  - English: 312 dimensions reduced to 2
  - All 169 features included (and taken in relevant subsets)
  - Finnish and Romanian provide large data sets that can be treated the same way

28 Interactive, media-rich presentation
Objectives
- Make data accessible and useful to a wide research audience
Methods
- Interactive selection of data
- Constructive presentation of data
- Addition of audio and other media
Online is much more than a book!

29 Framework and Applications
- Online atlas provides a framework for accessing and presenting data
- Other applications can work within the framework to transform or process the data, such as:
  - MDS data consolidation
  - Tools to analyze dialect variants of phonemes (proposed)
  - Others

30 Summary
- Humanities and Social Sciences deal with large, complex data sets
- Explore methods to access, process, and present this kind of data
- Solutions include:
  - MDS-type processing
  - Online, interactive, rich presentation
- Example: Romanian Online Dialect Atlas

31 References
- Embleton, Sheila M. and Eric S. Wheeler (2000). Computerized Dialect Atlas of Finnish: Dealing with Ambiguity. J. of Quantitative Linguistics.
- Embleton, Sheila M. and Eric S. Wheeler (1997a). Multidimensional Scaling and the SED Data. In Wolfgang Viereck and Heinrich Ramisch, The Computer Developed Linguistic Atlas of England 2. Tuebingen: Max Niemeyer Verlag.
- Embleton, Sheila M. and Eric S. Wheeler (1997b). Finnish Dialect Atlas for Quantitative Studies. J. of Quantitative Linguistics.
- Schiffman, Susan S., M. Lance Reynolds, Forrest W. Young (1981). Introduction to Multidimensional Scaling. Theory, Methods, and Applications. New York: Academic Press. 411pp.
- Torgerson, W. S. (1952). Multidimensional scaling: 1. Theory and method. Psychometrika.
- Stan, Ionel & Uritescu, Dorin (1996). Noul Atlas lingvistic român. Crisana. Vol. I. Bucharest: Romanian Academy Press. (2003. Vol. II. Bucharest: Romanian Academy Press.)
- Uritescu, Dorin. "Asupra repartiţiei dialectale a graiurilor dacoromâne. Graiul din Oaş" / "On the Dialect Structure of Daco-Romanian. The Dialect of Oaş". In Materiale si cercetari dialectale II, Cluj-Napoca: The University of Cluj-Napoca.
- Uritescu, Dorin (1984a). Subdialectul crisean. In V. Rusu (ed.), Tratat de dialectologie româneasca. Craiova: Scrisul românesc.
- Uritescu, Dorin (1984b). Graiul din Tara Oasului. In V. Rusu (ed.), Tratat de dialectologie româneasca. Craiova: Scrisul românesc.
- Wheeler, Eric S. (2002). Zipf's Law and Why It Works Everywhere. Glottometrica 4.
- Wheeler, Eric S. (2003). Multidimensional Scaling to Visualize Text Separation. Glottometrica 6, forthcoming.
- Wheeler, Eric S. (n.d.). Multidimensional scaling. Chapter in Reinhard Koehler (ed.), forthcoming Handbook in Quantitative Linguistics.

