Przemysław Kaszubski Faculty of English From PICLE search to IFAConc Corpora in Poland PLM Session Poznań 8 Sept. 2012.

1 Przemysław Kaszubski Faculty of English From PICLE search to IFAConc Corpora in Poland PLM Session Poznań 8 Sept. 2012

2 Outline Research goal(s) and questions Disclaimers and challenges The story 2006-2012: – evolving solutions (incl. demos) PICLE search and Perl Concordancer bigram and trigram tools Error concordancer IFA Student concordancer IFAConc Some conclusions and plans

3 Overall research goal(s) Pedagogic “action-like” research: – probing the potential of corpus-based e-learning for (my) EAP writing instruction – Explore options for “seamless” integration with coursework Some questions: – Can students be (successful ?) (corpus) explorers of language (for their own sake) ? – (meta-)linguistic background knowledge (or remnants of it): facilitator or inhibitor for data-driven learning (DDL)? – can (controlled ?) corpus-exploration facilitate constructivist (practical) knowledge making (= knowing how to write (this) better / best ?) – bottom-up exploration or / and top-down instruction?

4 Some challenges and disclaimers small corpora (100 – 300 k words) non-indexed text search self-made online tools questions of time: – speed of search – time of satisfactory data analysis (any tool!)

5 Why like this? “experimental” assumption: – if tools with these limitations can work with learners, then... fun of creation flexibility and freedom of development availability of man-power: – student programmers – seminar students as corpora collectors – student writing groups as testers / testees EAP / EGAP / ESAP context – special(ist) corpora

6 The start: Briefly about PICLE Polish sub-corpus of the International Corpus of Learner English (ICLE) 330,000 words of running text (over 500 essays) Major part (c. 230,000 words) published on the ICLE CD-ROM in 2002 (2006, 2nd ed.), together with comparable English learner corpora collected in other EFL countries. A 50,000-word sampler has been error-tagged Can be (re-)searched online, unlike most other learner corpora

7 Some (lexical) research insights from PICLE (1) Misuse 'HAVE/GIVE (sb) possibility to ' = *'MIEĆ/DAĆ (komuś) możliwość / sposobność ' –... the adoptive parents have influenced their child, without giving him any choice or *possibility to "try out" other options. – For this reason we should reread a story because it gives us *possibility to look at the literary work from a perspective. BNC (chance, likelihood of): – [...]... led him to the perception that man has the possibility of changing his state of consciousness. – The sample was so arranged as to be fully representative over the country as a whole, and everyone had the same possibility of being included.

8 Some (lexical) research insights from PICLE (2) Overuse High frequency vocabulary adverbs of stance (boosters): definitely, certainly, undoubtedly, for sure “favourite” phraseology: BE full of: – Our television is full of programmes unsuitable for young viewers,... that/this BE why: – Since imagination belongs to one of the most important of our features we cannot deprive ourselves of it. That is why, many of us are (...). TAKE care (of): –... duties on the side of a woman, who is now expected not only to take care of her house and family but also to find time for professional work.

9 Some (lexical) research insights from PICLE (3) Underuse / avoidance E.g.: collocation breadth: attributive adjectives before attitude To be sure: Exclusive NS use: The motivations for both sexes, to be sure, are different. There is plenty of violence, to be sure, but it is a nice violence and no one gets killed.

10 Beginnings How (better ?) to share (and discover ?) such learner-corpus insights with learners? – items for (passive) study (usage alerts etc.) – items for study AND the corpus method..? Let’s try ! potential usefulness of DDL assumed

11 PICLE search / Perl Concordancer From one corpus to a range of (comparable) corpora Tool hub: – http://ifa.amu.edu.pl/~kprzemek/PICLE_search.php “Perl Concordancer(s)”: – http://ifa.amu.edu.pl/~kprzemek/concord2advr/search_adv_new.html

12 Bigram and trigram tools towards more “search-worthy” items bigrams: – http://ifa.amu.edu.pl/~kprzemek/concord3/bigram.html trigrams counter: – http://ifa.amu.edu.pl/~kprzemek/concord3/trigram.php problems: – “geek” tools
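To illustrate what a bigram or trigram counter does, here is a minimal Python sketch. It is not the code behind the tools linked above (those are described as Perl and PHP tools); the tokenizer, the function names and the sample sentence are assumptions made purely for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n adjacent tokens.

    Note: this naive version also counts n-grams that span sentence boundaries.
    """
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample text, loosely echoing the overused 'that is why' pattern.
sample = "That is why many of us agree. That is why we read."
tokens = tokenize(sample)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

# The most frequent items are candidates for "search-worthy" queries.
for gram, freq in trigrams.most_common(3):
    print(" ".join(gram), freq)
```

Sorting the counts by frequency, as in the last loop, is one simple way to surface recurrent phrases worth looking up in a concordancer.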

13 Error concordancing List-driven: – http://ifa.amu.edu.pl/~kprzemek/concord2adv/errors/errors.htm “direct error concordancer”: – http://ifa.amu.edu.pl/~kprzemek/concord2advr/error-builder.php problems: – direct interpretability? – away from the “error” corpus evidence towards exposure to, and noticing, NS usage...

14 IFA Student Concordancer IFAConc’s predecessor: – http://ifa.amu.edu.pl/~kprzemek/concord2-login/index.html Problems: – interface issues – search syntax – getting students to do it, e.g.: need for integration of prompted and spontaneous work integration with other (online) (writing) course tasks

15 IFAConc inspirations Tim Johns’s ‘Kibbitzer’ Tom Cobb’s URL-driven concordance feedback Aston’s corpora for ESP + Hyland’s emphasis on ESAP Linguistic theory: – Hoey’s lexical priming – also: Sinclair’s ‘extended unit of meaning’, Stubbs’ ‘phrasal schemas’, Goldberg’s CCxG, Halliday’s metafunctions SLA and CALL theory: ‘default path’, DDL, constructionism and Web 2.0 current DDL: Gavioli’s ‘samples’ vs ‘examples’, Widdowson’s authentication Coniam’s “concordancing oneself” online Cobuild Sampler (friendly search syntax) web search engines

16 IFAConc conceptions friendly (enough), but demanding (some) deep-level processing (noticing, interpretation, adoption/rejection): – patterns of use / meaning – variation bottom-up and top-down access recommended theoretical platform (cf. ‘default path’ in CALL) relevance – authentication – personalisation collaborative as well as individual

17 Pedagogical uses of corpora > Aston (2002): – data-driven learning (hard and soft) – corpus-based learning of functions and concepts realized in texts (‘data-driven cultural learning’) – bottom-up building of reading skills – reference use accompanying work with other texts (e.g. writing, translating) – serendipitous browsing DDL and EGAP/ESAP/ERP writing – pre-learning (learner as researcher/spy, with possible/likely prior clues) – re-drafting/revising (reference use / consultation)

18 IFAConc success target: A good human concordancer initiates searches adjusts searches interprets searches – specific linguistic insights – awareness applies interpretation (authenticates) – on a task (e.g. revision, vocab learning activities) – personal record (annotation) – discusses / shares findings – co-annotation, discussion personalizes the tool – annotations – personal corpora

19 IFAConc initial technologies (1) each search is a web link – concordances are interactive, not static each search is recorded (user-logging) – user interface – teacher-admin interface each search can be annotated – possible interaction with admin/teacher contrasting corpora along a cline of specialisation – EGP -> EGAP -> ESAP – EFL varieties – personal(ised) corpora
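The idea that “each search is a web link” can be sketched as follows: the whole search state is encoded in the URL query string, so a link pasted into feedback or a task prompt reproduces the same search when opened. This is an illustrative sketch only; the base URL and the parameter names (`q`, `corpora`, `sample`) are hypothetical and are not IFAConc’s actual URL scheme.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, used only for illustration.
BASE = "https://example.org/concordancer/search"


def search_link(query, corpora, sample=20):
    """Encode a search (query, selected corpora, sample size) as a shareable URL."""
    params = {"q": query, "corpora": ",".join(corpora), "sample": sample}
    return f"{BASE}?{urlencode(params)}"


def parse_link(url):
    """Recover the search settings from a link, e.g. to re-run or log the search."""
    qs = parse_qs(urlsplit(url).query)
    return {
        "q": qs["q"][0],
        "corpora": qs["corpora"][0].split(","),
        "sample": int(qs["sample"][0]),
    }


link = search_link("possibility to", ["PCF", "SLA"])
print(link)
print(parse_link(link))
```

Because the link carries everything needed to repeat the search, the same string can serve as a log entry, an annotation target, or a feedback link embedded in a student’s text.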

20 IFAConc initial technologies (2) corpora easily switchable on and off random sampling (e.g. 20 lines – Sinclair, after Hunston 2002) wider context view (cf. ‘shunting’, Halliday) Not only corpus search interface: – History – past work, possibly annotated – Resources - repository with recommended useful content and / or tasks enhancing web integration and publicity: RSS, blog, Moodle, traditional CALL
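Random sampling of concordance lines can be sketched in a few lines of Python. The sketch below builds simple keyword-in-context (KWIC) lines with a regular expression and then draws at most 20 of them at random. It is an illustration under stated assumptions, not IFAConc’s implementation: the function names, the context width and the bracketed display format are all invented here.

```python
import random
import re


def kwic(text, pattern, width=30):
    """Return keyword-in-context lines: left context, [match], right context."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so that the matches line up vertically.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


def sample_lines(lines, k=20, seed=None):
    """Return all lines if there are at most k, otherwise a random sample of k."""
    if len(lines) <= k:
        return list(lines)
    return random.Random(seed).sample(lines, k)


# Hypothetical mini-corpus: the same phrase repeated 50 times.
corpus = "She has to take care of her house and family. " * 50
hits = kwic(corpus, r"\bcare\b")
for line in sample_lines(hits, k=5, seed=1):
    print(line)
```

Passing a fixed `seed` makes a sample reproducible, which matters if a sampled concordance is to be shared or annotated later; leaving it as `None` gives a fresh sample on each call.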

21 IFAConc default corpora
1_PopLore (POP): editorials and popular introductions
1_UK-Press (UKP): news stories
2_IntrTBks (INT): textbook introductions
2_ExposPar (PAR): expository paragraphs
3_PICLE (PCF): Polish students’ argumentative and expository essays
4_Pragmat (PRG): research articles in linguistic pragmatics (82 samples)
4_Lex-Lex (LEX): research articles in lexicology and lexicography (75 samples)
4_SLA (SLA): research articles in Second Language Acquisition and TEFL (101 samples)
4_L1-Acq (L1A): research articles in First Language Acquisition (67 samples)
5_PragmEFL (PRL): excerpts from 30 IFA MA theses in linguistic pragmatics

22 IFAConc – brief phase 1 to phase 2 change log Tests (tasks, monitoring, questionnaires) performed up to 2010 showed: – Anybody can conduct a reasonably successful analysis (cf. > 200 extramurals) – Returning users (gradually) search and interpret better – procedure too teacher-intensive – need to increase breadth of searches / analyses 2010 Improvements in UX (user experience), e.g.: – system more interactive – clearer highlighting of error-prone learner data – optimization of training and teacher-student interactions => towards an e-learning environment – Encouraging boost in annotation quality after the changes Latest enhancements (after 2011): – context reading mode – more sharing options (History entries / corpora) CAVEAT: General dev problem: – new features vs. (pedagogical / research) focus

23 IFAConc – some HIGHLIGHTS...

24 IFAConc highlights: Corpora Search

25

26 IFAConc highlights: Feedback link in student text (by comparison)

27 IFAConc highlights: Shared entry task prompt

28 IFAConc highlights: Resources training page

29 IFAConc highlights: User monitoring (1)

30 IFAConc highlights: User monitoring (2) – S-T collaborative annotation

31 IFAConc highlights: User monitoring (3) – email notification H-98307 'devoted' - annotation update by 'jagodawasik'

32 IFAConc highlights: Output: potential lexical primings: literary criticism vs. linguistics

33 IFAConc highlights: Output: Likely overuse / underuse cases

34 IFAConc – demo of today’s look http://ifa.amu.edu.pl/~ifaconc student interface teacher / admin interface Pardon the imperfections (server changes):

35 Some IFAConc lessons learned The system CAN work and could be self-sustainable, but: Enforced mode => free-use – research goal: fine-tuning of automatic student-tool interaction – problematic peer-feedback-based constructivism (“collectivist” culture? But: Is student’s web experience not changing that?) Prolonging the “novelty effect” – at least within assumed one user (learner) cycle Steady user experience enhancements: – increasing the ease of use improvement of ways-in without sacrificing learner effort (interpretation, authentication) facilitation of annotations (integration of Corpora Search and History) – enhancing interpretation options new corpora, new / improved training tasks etc.

36 What now? > Moving on: – integrating the various types of searches and corpora – search faster! – further interface improvements System being transformed / moved onto different servers Towards a blueprint, a proposal, a prototype solution for successful (= smooth / seamless / pleasant / meaningful / rewarding etc etc.) integration, yet in the making... though one which has produced a few interesting insights...

37 Acknowledgements > My (student) programmers – Paweł Nowak, Dominique Stranz Some software (makers): – GATE, Stanford Tagger – Exchanger XML Lite – WordPress and plugins 2004-6 MA and BA students – ESAP corpora – compilation, pre-processing – early testing 2007-11 seminar groups and EGAP and ESAP writing students – piloting, testing, using

38 Selected bibliography > ???

39 My relevant earlier presentations PALC (Lodz) 2007 TALC (Lisbon) 2008 PALC (Lodz) 2009 CL (Liverpool) 2009 TALC (Brno) 2010 ALL (Tuebingen) 2011

