LEarning and TEaching Corpora: data-sharing and repository for research on multimodal interactions
Ciara R. Wigham & Thierry Chanier, Clermont Université
LRL: Publications: 4th WorldCALL Conference, July 2013, Glasgow

2 Corpus (year): countries; languages or subject
Simuligne (2001): UK-FR; fre
Copéas (2005): UK-FR; eng
Tridem ( ): UK-FR-USA; eng, fre
Ecofralin (2008): CO-FR; fre, spa
VMT-teamC (2006): UK-USA-SG; math
INFRAL (2009): DE-FR; deu, fra
FAVI ( ): FR; fra
ARCHI21 (2011): FR; eng, fra
SLIC (2013): USA-FR; fra

3 Data validity & reliability in CALL research?
Problem in Social Sciences and CALL:
▫visibility, accessibility and validity of research data
▫data representative / anecdotal?
▫no access to data when reading a publication
▫links between data and publications

4 CALL data from online learning situations
CALL data is often:
▫not contextualised: pedagogical & technological situations (Kern et al., 2004)
▫tangled in specific software using proprietary formats
Replication for interaction analysis in online learning is near impossible:
▫variables that are difficult to control
▫replication does not imply that a phenomenon previously observed will reoccur (Reffay et al., 2012)

5 Mulce project & LETEC
Multimodal Corpora Exchange

6 Research data quality: Mulce project
Interoperability:
▫Structured and coherent data sets => analyses can be completed by researchers who did not participate in the course
Sustainability:
▫Independent from online platforms
▫Stored in independent formalisms
Open access to research data & appropriate licences
Accessibility:
▫Finding the research data through standard metadata: OLAC (Open Language Archives Community)

7 Learner Corpora / LETEC
Learner Corpora (see Granger, 2002; Meunier et al., 2011)
▫SLA research
▫learners' productions
▫test situations (Reffay et al., 2008)
▫learner-native speaker comparative studies (Boulton et al., 2012)
LEarning and TEaching Corpora
▫all participants considered (learners, tutors, etc.)
▫interaction data
▫context

8 LETEC Components
Figure labels: Instantiation; Pedagogical scenario; Research protocol; Public licence; Private licence; Analyses; ethics & rights
"A LETEC corpus collects in a systematic and structured way all the data from interactions which occur during a course which is partially or entirely online. These data are enriched by technical, pedagogical and scientific information as well as information about the participants and are organized to allow contextualized analyses to be performed." (Mulce-documentation, 2013)

10 Staged process
stages = Data analyses

11 Illustration of methodology: European project KA2 Languages
CLIL approach (Content and Language Integrated Learning)
▫Architecture + French / English L2
Hybrid course "Building Fragile Spaces": 5-day studio Feb
students, 2 architecture tutors, 1 EFL tutor, 1 FFL tutor
Working with external partners: exchanges

13 Elaboration of research areas
Interplay between verbal and nonverbal modes
Role of nonverbal in identity construction
Interplay between textchat & voicechat modalities
Support for L2 verbal participation and production
Wigham (2012), PhD Thesis
Stage 1: Design

14 Pedagogical Design
Macro-task: collaboratively elaborate a model in a synthetic world (Second Life) as a response to an architectural problem brief
Architectural studio, hybrid CLIL approach
4 workgroups
Learning design:
▫Online environments
▫Participants' roles
▫Learning & support activities
Stage 1: Design

15 Learning & support activities
Activity: Introduction to Second Life
▫Architecture objectives: Introduce students to multimodal nature of SL
▫L2 objectives: Establish a communication protocol
Activity: Collaborative building activity
▫Architecture objectives: Introduce students to building techniques to help them develop their model
▫L2 objectives: Develop L2 communication techniques concerning the referencing of objects
Activity: Group reflective session
▫Architecture objectives: Develop critical thinking by negotiation; Distinguish pertinent information for overall problem identification in their design brief
▫L2 objectives: Help students to skill-up their L2; Acquire domain-specific vocabulary; Develop a professional discourse
Stage 1: Design
Detailed in: Rodrigues et al., in press; Wigham & Chanier,

16 Research protocol
Research protocol design
▫Protocol for data collection
▫Researchers' roles
▫Timetable of research activities
Figure label: researcher
Wigham & Chanier, 2013 ReCALL
Stage 1: Design

18 Data collection & coverage
Pre-questionnaires (pre-course)
▫Environment: Kwiksurveys
▫Data type: spreadsheet file
▫Quantity & coverage: 17 student questionnaires
Session data (during course)
▫Environment: Second Life; data type: video screen captures; quantity & coverage: 20 group sessions & 2 presentation sessions, 19h40m
▫Environment: VoiceForum; data type: audio recordings; quantity & coverage: 64 forum messages
Post-questionnaires (post-course)
▫Environment: Kwiksurveys
▫Data type: spreadsheet file
▫Quantity & coverage: 16 student questionnaires
Semi-directive interviews (post-course)
▫Environment: Skype
▫Data type: audio recordings
▫Quantity & coverage: 5 student interviews, 2h30
Stage 2: Data collection

20 LETEC global corpus: content packaging
Primary data (anonymised)
▫Each resource is given an ID and a description
Manifest: structured data
▫Structured Interaction Data Model (Mce_sid, 2011), XML
▫Information about each component of the corpus
▫General metadata (OLAC standards)
▫Environments used
▫Information on participants: language biographies and group organisation
▫Description of the environment, course length, participants, tools
▫Activities described in the pedagogical scenario
Stage 3: Data organisation
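To make the idea of a manifest concrete, here is a minimal Python sketch of how a researcher might index resources and participants from such an XML file. The element and attribute names (`corpus`, `resource`, `participant`, `role`, `group`) and all values are hypothetical, chosen for illustration only; they are not the actual Mce_sid schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest fragment: names and values are invented for illustration.
MANIFEST = """\
<corpus id="example-corpus">
  <participants>
    <participant id="p1" role="tutor" group="g1"/>
    <participant id="p2" role="learner" group="g1"/>
  </participants>
  <resources>
    <resource id="r1" type="video">Screen capture, session 1</resource>
    <resource id="r2" type="audio">Interview, learner p2</resource>
  </resources>
</corpus>
"""


def index_resources(manifest_xml):
    """Map each resource ID to its type and description."""
    root = ET.fromstring(manifest_xml)
    return {
        res.get("id"): {
            "type": res.get("type"),
            "description": (res.text or "").strip(),
        }
        for res in root.iter("resource")
    }


def participants_by_role(manifest_xml):
    """Group participant IDs by role (tutors and learners alike)."""
    root = ET.fromstring(manifest_xml)
    roles = {}
    for person in root.iter("participant"):
        roles.setdefault(person.get("role"), []).append(person.get("id"))
    return roles


resources = index_resources(MANIFEST)
roles = participants_by_role(MANIFEST)
print(resources["r1"]["description"])
print(roles)
```

Because every resource carries an ID, an analysis can cite the exact video or recording it draws on rather than the corpus as a whole.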

21 Corpus deposit
Mulce corpus repository (Mulce-repository, 2013)
Stage 3: Data organisation

22 Corpus diffusion
Description of corpus; interface to browse structure; zip file to download
Stage 3: Data organisation

24 Multimodal data transcription
Figure labels: verbal mode; nonverbal mode; audio; textchat; proxemic transmission; radio transmission; public; private
Not detailed here, see Wigham & Chanier (2013), ReCALL 25(1)
Stage 4: Data transcription & diffusion

25 Elaboration of transcription methodology
Characterized by communication modes & modalities
▫Systematic approach to studying online environments
New environments = new modalities
▫Added to transcription methodology
Communication mode: verbal
▫audio modality, audio act (tpa): verbal turn in the public audio channel
▫audio modality, silence (sil): interval between two audio acts greater than three seconds
▫textchat modality, textchat act (tpc): message entered in the textchat window
Communication mode: nonverbal
▫proxemics modality, movement (mvt): avatar movement in the environment, e.g. avatar sits down, flies, walks backwards
▫proxemics modality, entrance into / exit from environment (es): avatar enters or exits the synthetic world
▫kinesics modality, kinesic (kin): avatar gestures and movements made by an avatar's body part, e.g. nod, point, clap
▫production modality, production (prod): production or display of an object in the SL environment
Stage 4: Data transcription & diffusion
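The silence (sil) rule lends itself to a small worked example. The Python sketch below derives silence acts from a list of audio acts using the three-second threshold given in the table. The `Act` record, the timings and the speaker names are assumptions made for illustration, as is the choice to measure gaps across all speakers in the public audio channel; the source does not describe how silences were actually computed.

```python
from dataclasses import dataclass

SILENCE_THRESHOLD_S = 3.0  # from the methodology: gaps greater than three seconds


@dataclass
class Act:
    code: str      # transcription code, e.g. "tpa" or "sil"
    start: float   # seconds from the start of the recording
    end: float
    speaker: str = ""


def derive_silences(audio_acts, threshold=SILENCE_THRESHOLD_S):
    """Return a 'sil' act for every gap between audio acts longer than threshold."""
    silences = []
    latest_end = None
    for act in sorted(audio_acts, key=lambda a: a.start):
        if latest_end is not None and act.start - latest_end > threshold:
            silences.append(Act("sil", latest_end, act.start))
        # Track the furthest end seen so far so overlapping turns are handled.
        latest_end = act.end if latest_end is None else max(latest_end, act.end)
    return silences


# Invented timings for illustration only.
acts = [
    Act("tpa", 0.0, 4.2, "tutor"),
    Act("tpa", 5.0, 9.5, "student1"),
    Act("tpa", 14.0, 16.0, "tutor"),
]
silences = derive_silences(acts)
for s in silences:
    print(s.code, s.start, s.end)  # prints: sil 9.5 14.0
```

Only the 4.5-second gap between the second and third turns exceeds the threshold; the 0.8-second gap between the first two does not count as a silence.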

26 Multimodal transcription using ELAN
Figure labels: video screen capture; multimodal transcription aligned using timeline; participants & modality; view of annotations for one participant in one modality
Max Planck Institute for Psycholinguistics (2001). ELAN [software]. The Netherlands: Max Planck Institute for Psycholinguistics.
Stage 4: Data Analyses
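As an illustration of how time-aligned annotations can be read back out of an ELAN transcription, the Python sketch below parses a heavily trimmed fragment in the shape of an ELAN .eaf file (time slots in milliseconds, one tier per annotation layer). The tier names, which pair a participant with a transcription code, and the annotation values are invented; how tiers were actually named in the ARCHI21 transcriptions is not stated in the source.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment in the shape of an .eaf file; content invented for illustration.
EAF_SAMPLE = """\
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="4200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="5000"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="6500"/>
  </TIME_ORDER>
  <TIER TIER_ID="tutor_tpa">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>example audio turn</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="student1_kin">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>nod</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_tiers(eaf_xml):
    """Return {tier_id: [(start_ms, end_ms, value), ...]} for time-aligned annotations."""
    root = ET.fromstring(eaf_xml)
    # Resolve time-slot IDs to millisecond offsets.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            value = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((slots[ann.get("TIME_SLOT_REF1")],
                         slots[ann.get("TIME_SLOT_REF2")],
                         value))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


tiers = read_tiers(EAF_SAMPLE)
for tier_id, rows in tiers.items():
    print(tier_id, rows)
```

Keeping one tier per participant and modality mirrors the view of annotations for one participant in one modality mentioned on the slide, and makes it straightforward to compare modalities on a shared timeline.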

27 Production & deposit of LETEC distinguished corpus
Particular analysis of a selected part of the global LETEC corpus
Chanier, T., Saddour, I. & Wigham, C.R. (2012). (dir.) Distinguished Corpus: Transcription of Verbal and Nonverbal Interactions of the Second Life Reflection archi21-slrefl-av-j2. : Clermont Université. [oai : slrefl-av-j2 ;
▫Only contains transformed data (= the transcriptions)
▫Refers to a selection of the original data in global corpus (= videos)
▫Software used for transcription cited (= ELAN)
Stage 4: Data transcription & diffusion

28 Why does structuring a corpus help analysis?
Common technical structures to hold interaction data
▫Data linked
▫Analyses at different levels, in context whilst maintaining a global view of the course
XML structure allows standard forms of annotation / coding & different analysis software to be used
▫Tatiana (2008)
▫Calico (2009)
Stage 4: Research Analyses

29 An analysis example
Interplay between textchat & voicechat
Textchat modality acts as an adjunct to the audio modality
▫e.g. when technical problems exist, in opening & closing sequences of sessions (Liddicoat, 2011; Palomeque, 2011)
Monomodal textchat environments: auto-correction, negotiation of meaning and corrective feedback
Learner overload (Deutschmann & Panichi, 2009)
-> Multimodal environments? (Hampel & Stickler, 2012)
-> Can the textchat serve for L2 feedback provision?
Wigham & Chanier (in print) CALL Journal
Stage 4: Research Analyses

30 An example of modality interplay

31 Characterisation of textchat functions
Wigham & Chanier (in print) CALL Journal
Stage 4: Research Analyses

32 Characterisation of textchat functions
Data coding facilitated by XML schemas
Stage 4: Research Analyses

33 Feedback in textchat
17% of acts contain feedback (49 acts)
Primarily concerns lexical and grammatical non-target-like forms (cf. Tudini, 2003)
Predominant use of recasts (32/49 instances)
Table (cell values missing): columns EFL Session, Technical, Socialisation, Conversation management, Task, Form; rows Es-j, Sc-j, Sc-j
Stage 4: Research Analyses
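The proportions above can be checked with a few lines of arithmetic. The Python sketch below takes the slide's figures at face value (49 feedback acts, reported as 17% of acts; 32 recasts among the 49) and derives the implied total number of acts and the share of recasts. The implied total is only an estimate, since the 17% figure is presumably rounded and the slide does not state the total directly.

```python
# Figures reported on the slide.
feedback_acts = 49
feedback_share = 0.17   # "17% of acts contain feedback"
recasts = 32

# Implied total number of acts (an estimate: the 17% is presumably rounded).
estimated_total_acts = round(feedback_acts / feedback_share)

# Share of feedback acts that are recasts.
recast_share = recasts / feedback_acts

print(estimated_total_acts)    # prints: 288
print(f"{recast_share:.1%}")   # prints: 65.3%
```

On these figures, roughly two thirds of the feedback acts are recasts, which is what "predominant use" summarises.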

34 Results of textchat feedback study
EFL tutor's strategic choice to use textchat: reduces cognitive load
▫Non-expertise in content matter
Language form vs communicative meaning
▫Recasts, as they remain in the textchat window
▫Recasts, so as not to interrupt content communication
Students' management of multiple modalities
Stage 4: Research Analyses

35 Publication of analyses & deposit of associated distinguished corpus
Production of distinguished corpus:
▫Wigham, C.R. (2013). (dir.) Distinguished Corpus: Interplay between textchat and audio modalities during the Second Life Reflective Sessions. : Clermont Université. [oai : ;
Analysed data presented in parallel with results
▫Wigham, C.R. & Chanier, T. (in print). Interactions between text chat and audio modalities for L2 communication and feedback in the synthetic world Second Life. CALL Journal
Distinguished corpora can be cited in articles
Explicit connections between data and publications enhance the quality of CALL research
Stage 4: Publication

36 Conclusion: Sustaining CALL research
Reuse of data for cumulative or contrastive analyses
▫Rodrigues & Wigham (in print): text chat & problematic vocabulary points
▫Natural language processing techniques
Facilitated by:
▫structured XML formalisms that render online interaction data autonomous from any platform, in a tool-agnostic form
▫interactions described by modes & modalities -> not specific to an online environment
Reuse of LETEC in corpus linguistics (TEI-CMC)

37 Perspectives
Documented and selected materials in their original context: basis for reflection in pedagogical corpora
Integration of pedagogical corpora into teacher-training classrooms

38 Thank you!
Contact:
Website:
Mulce-documentation:
Mulce-repository:

39 Corpus metadata
Inform researchers about:
▫conditions under which the corpus was built
▫how to use the corpus
▫the corpus' content
▫licences for re-using the corpus
Used for web harvesting
▫corpora become visible to the whole community (OLAC, Clarin)
▫corpora can be cited
Stage 3: Data organisation
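To show what harvestable metadata can look like, the Python sketch below builds a small record from Dublin Core elements, on which OLAC metadata is based. It is a simplified illustration, not a complete or validated OLAC record: the `record` wrapper element is hypothetical, and the title, creators and rights statement are placeholders rather than the metadata of any real Mulce corpus.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC)


def build_record(title, creators, languages, rights):
    """Build a minimal Dublin Core style record (hypothetical 'record' wrapper)."""
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC}}}title").text = title
    for creator in creators:
        ET.SubElement(record, f"{{{DC}}}creator").text = creator
    for code in languages:
        ET.SubElement(record, f"{{{DC}}}language").text = code
    ET.SubElement(record, f"{{{DC}}}rights").text = rights
    return record


# Placeholder values for illustration only.
record = build_record(
    title="Example LETEC corpus",
    creators=["Researcher A", "Researcher B"],
    languages=["eng", "fra"],
    rights="Placeholder licence statement",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A record of this kind tells a harvester what the corpus is, who produced it, which languages it covers and under which licence it can be reused, which is the information listed on the slide.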

Characterisation of textchat functions
Data coding facilitated by XML schemas
Wigham & Chanier (in print) CALL Journal

41 Data coverage
6 sessions (3 FFL, 3 EFL)
4h30m of screen recordings
Table (cell values garbled in source): columns Groups analysed, Audio acts, Textchat acts; rows EFL, FLE; surviving digits: 38664
Analyses

42 Perspectives
Documented and selected materials in their original context: basis for reflection
Inter-disciplinary project

