Published by Wesley Jordan Dixon. Modified over 9 years ago.
1
From PICLE search to IFAConc
Przemysław Kaszubski, Faculty of English
Corpora in Poland, PLM Session, Poznań, 8 Sept. 2012
2
Outline
Research goal(s) and questions
Disclaimers and challenges
The story 2006-2012:
– evolving solutions (incl. demos)
  PICLE search and Perl Concordancer
  bigram and trigram tools
  Error concordancer
  IFA Student concordancer
  IFAConc
Some conclusions and plans
3
Overall research goal(s)
Pedagogic “action-like” research:
– probing the potential of corpus-based e-learning for (my) EAP writing instruction
– exploring options for “seamless” integration with coursework
Some questions:
– Can students be (successful?) (corpus) explorers of language (for their own sake)?
– Is (meta-)linguistic background knowledge (or remnants of it) a facilitator or an inhibitor for data-driven learning (DDL)?
– Can (controlled?) corpus exploration facilitate constructivist (practical) knowledge making (= knowing how to write (this) better / best)?
– Bottom-up exploration and / or top-down instruction?
4
Some challenges and disclaimers
small corpora (100–300 k words)
non-indexed text search
self-made online tools
questions of time:
– speed of search
– time of satisfactory data analysis (any tool!)
5
Why like this?
“experimental” assumption:
– if tools with these limitations can work with learners, then...
fun of creation
flexibility and freedom of development
availability of man-power:
– student programmers
– seminar students as corpora collectors
– student writing groups as testers / testees
EAP / EGAP / ESAP context
– special(ist) corpora
6
The start: Briefly about PICLE
Polish sub-corpus of the International Corpus of Learner English (ICLE)
330,000 words of running text (over 500 essays)
Major part (c. 230,000 words) published on the ICLE CD-ROM in 2002 (2006, 2nd ed.), together with comparable English learner corpora collected in other EFL countries
A 50-thousand-word sampler has been error-tagged
Can be (re-)searched online, unlike most other learner corpora
7
Some (lexical) research insights from PICLE (1)
Misuse
'HAVE/GIVE (sb) possibility to' = *'MIEĆ/DAĆ (komuś) możliwość / sposobność'
– ... the adoptive parents have influenced their child, without giving him any choice or *possibility to "try out" other options.
– For this reason we should reread a story because it gives us *possibility to look at the literary work from a perspective.
BNC (chance, likelihood of):
– [...] ... led him to the perception that man has the possibility of changing his state of consciousness.
– The sample was so arranged as to be fully representative over the country as a whole, and everyone had the same possibility of being included.
8
Some (lexical) research insights from PICLE (2)
Overuse
High frequency vocabulary
adverbs of stance (boosters): definitely, certainly, undoubtedly, for sure
“favourite” phraseology:
BE full of:
– Our television is full of programmes unsuitable for young viewers,...
that/this BE why:
– Since imagination belongs to one of the most important of our features we cannot deprive ourselves of it. That is why, many of us are (...).
TAKE care (of):
– ... duties on the side of a woman, who is now expected not only to take care of her house and family but also to find time for professional work.
9
Some (lexical) research insights from PICLE (3)
Underuse / avoidance
E.g.: collocation breadth: attributive adjectives before attitude
To be sure: exclusive NS use:
– The motivations for both sexes, to be sure, are different.
– There is plenty of violence, to be sure, but it is a nice violence and no one gets killed.
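Overuse and underuse claims of this kind usually rest on comparing relative frequencies across corpora of different sizes. The Python sketch below is only an illustration of that arithmetic: the function names are hypothetical, the counts are made up for the example, and only the 330,000-word PICLE size comes from the slides. It is a crude ratio, not a significance test, and not necessarily the procedure used in the research reported here.

```python
def per_100k(count, corpus_size):
    """Normalise a raw count to occurrences per 100,000 words."""
    return count / corpus_size * 100_000


def usage_ratio(learner_count, learner_size, ref_count, ref_size):
    """Ratio of learner to reference relative frequency.

    Values well above 1 suggest overuse, values well below 1 suggest
    underuse. Raises ValueError when the reference count is zero,
    because the ratio is then undefined.
    """
    if ref_count == 0:
        raise ValueError("item does not occur in the reference corpus")
    return per_100k(learner_count, learner_size) / per_100k(ref_count, ref_size)


# Made-up counts, for illustration only: 66 hits in a 330,000-word
# learner corpus versus 50 hits in a 1,000,000-word reference corpus.
ratio = usage_ratio(66, 330_000, 50, 1_000_000)
print(f"learner/reference ratio: {ratio:.1f}")
```

Normalising before comparing matters here because the learner corpus is much smaller than a typical reference corpus, so raw counts alone would understate learner usage.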
10
Beginnings
How (better?) to share (and discover?) such learner-corpus insights with learners?
– items for (passive) study (usage alerts etc.)
– items for study AND the corpus method..?
Let’s try! Potential usefulness of DDL assumed
11
PICLE search / Perl Concordancer
From one corpus to a range of (comparable) corpora
Tool hub:
– http://ifa.amu.edu.pl/~kprzemek/PICLE_search.php
“Perl Concordancer(s)”:
– http://ifa.amu.edu.pl/~kprzemek/concord2advr/search_adv_new.html
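To make the idea of a concordancer concrete, here is a minimal keyword-in-context (KWIC) sketch. It is written in Python rather than Perl, it is not the author's code, and the function name `kwic` and its `width` parameter are assumptions made for the example. It scans the raw text linearly with a regular expression, in the spirit of the non-indexed text search listed among the challenges. The sample sentences are shortened adaptations of the "possibility to / of" examples from the PICLE and BNC slides.

```python
import re


def kwic(text, pattern, width=30):
    """Return one keyword-in-context line per regex match in text."""
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    lines = []
    for m in re.finditer(pattern, flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


sample = (
    "It gives us possibility to look at the work again. "
    "Man has the possibility of changing his state."
)
for line in kwic(sample, r"\bpossibility (to|of)\b"):
    print(line)
```

A linear scan like this is simple to build and maintain, but its cost grows with corpus size, which is one reason search speed can become an issue without an index.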
12
Bigram and trigram tools
towards more “search-worthy” items
bigrams:
– http://ifa.amu.edu.pl/~kprzemek/concord3/bigram.html
trigrams counter:
– http://ifa.amu.edu.pl/~kprzemek/concord3/trigram.php
problems:
– “geek” tools
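What a bigram or trigram counter computes can be sketched as follows. This is a Python illustration under simple assumptions (lower-casing, a naive letters-and-apostrophes tokenizer, n-grams allowed to run across sentence boundaries), with hypothetical function names; it does not reproduce the tools linked above. The sample text is invented from the phrases "that is why" and "take care" discussed on the overuse slide.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count contiguous n-token sequences (sentence boundaries ignored)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


text = "That is why we take care of it. That is why many of us take care."
tokens = tokenize(text)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

print(bigrams.most_common(3))
print(trigrams.most_common(2))
```

Ranked lists like these point to recurrent word sequences that may be worth a follow-up concordance search, which is the sense in which such tools lead towards more "search-worthy" items.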
13
Error concordancing
List-driven:
– http://ifa.amu.edu.pl/~kprzemek/concord2adv/errors/errors.htm
“direct error concordancer”:
– http://ifa.amu.edu.pl/~kprzemek/concord2advr/error-builder.php
problems:
– direct interpretability?
– away from the “error” corpus evidence towards exposure to, and noticing, NS usage...
14
IFA Student Concordancer
IFAConc’s predecessor:
– http://ifa.amu.edu.pl/~kprzemek/concord2-login/index.html
Problems:
– interface issues
– search syntax
– getting students to do it, e.g.:
  need for integration of prompted and spontaneous work
  integration with other (online) (writing) course tasks
15
IFAConc inspirations
Tim Johns’ ‘Kibbitzer’
Tom Cobb’s URL-driven concordance feedback
Aston’s corpora for ESP + Hyland’s emphasis on ESAP
Linguistic theory:
– Hoey’s lexical priming
– also: Sinclair’s ‘extended unit of meaning’, Stubbs’ ‘phrasal schemas’, Goldberg’s CCxG, Halliday’s metafunctions
SLA and CALL theory: ‘default path’, DDL, constructionism and Web 2.0
current DDL: Gavioli’s ‘samples’ vs ‘examples’, Widdowson’s authentication
Coniam’s “concordancing oneself”
online Cobuild Sampler (friendly search syntax)
web search engines
16
IFAConc conceptions
friendly (enough), but demanding (some) deep-level processing (noticing, interpretation, adoption/rejection):
– patterns of use / meaning
– variation
bottom-up and top-down access
recommended theoretical platform (cf. ‘default path’ in CALL)
relevance – authentication – personalisation
collaborative as well as individual
17
Pedagogical uses of corpora
Aston (2002):
– data-driven learning (hard and soft)
– corpus-based learning of functions and concepts realized in texts (‘data-driven cultural learning’)
– bottom-up building of reading skills
– reference use accompanying work with other texts (e.g. writing, translating)
– serendipitous browsing
DDL and EGAP/ESAP/ERP writing
– pre-learning (learner as researcher/spy, with possible/likely prior clues)
– re-drafting/revising (reference use / consultation)
18
IFAConc success target: A good human concordancer
initiates searches
adjusts searches
interprets searches
– specific linguistic insights
– awareness
applies interpretation (authenticates)
– on a task (e.g. revision, vocab learning activities)
– personal record (annotation)
– discusses / shares findings
– co-annotation, discussion
personalizes the tool
– annotations
– personal corpora
19
IFAConc initial technologies (1)
each search is a web link
– concordances are interactive, not static
each search is recorded (user-logging)
– user interface
– teacher-admin interface
each search can be annotated
– possible interaction with admin/teacher
contrasting corpora along a cline of specialisation
– EGP -> EGAP -> ESAP
– EFL varieties
– personal(ised) corpora
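The principle that each search is a web link can be illustrated by encoding a search as URL query parameters, so that the link itself can be shared, logged, or pasted into feedback on a student text. The sketch below is hypothetical throughout: the base URL is a placeholder, and the parameter names `q`, `corpora` and `sample` are assumptions for the example, not IFAConc's actual URL scheme. The corpus codes PCF and SLA are taken from the default corpora list.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder address; not the real IFAConc endpoint.
BASE = "https://example.org/concordance"


def search_link(query, corpora, sample=20):
    """Encode a search (query, corpus codes, sample size) as a URL."""
    params = {"q": query, "corpora": ",".join(corpora), "sample": sample}
    return f"{BASE}?{urlencode(params)}"


def read_link(url):
    """Recover the search settings from a link built by search_link."""
    fields = parse_qs(urlsplit(url).query)
    return {
        "q": fields["q"][0],
        "corpora": fields["corpora"][0].split(","),
        "sample": int(fields["sample"][0]),
    }


link = search_link("take care of", ["PCF", "SLA"])
print(link)
print(read_link(link))
```

Because the whole search state lives in the link, re-running a search means simply following the link again, which also makes it easy to record each search alongside its annotations.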
20
IFAConc initial technologies (2)
corpora easily switchable on and off
random sampling (e.g. 20 lines – Sinclair, after Hunston 2002)
wider context view (cf. ‘shunting’, Halliday)
Not only corpus search interface:
– History – past work, possibly annotated
– Resources – repository with recommended useful content and / or tasks
enhancing web integration and publicity: RSS, blog, Moodle, traditional CALL
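Random sampling of concordance lines, with the 20-line sample size mentioned above, can be sketched as follows. This is a Python illustration with a hypothetical function name and an optional seed parameter added for reproducibility; how IFAConc itself draws its samples is not stated in the slides.

```python
import random


def sample_lines(lines, k=20, seed=None):
    """Return up to k randomly chosen lines, kept in their original order."""
    if len(lines) <= k:
        return list(lines)
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    picked = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in picked]


# Stand-in for a long list of concordance hits.
hits = [f"concordance line {i}" for i in range(1, 501)]
for line in sample_lines(hits, seed=2012)[:3]:
    print(line)
```

Keeping the sampled lines in corpus order and allowing a fixed seed are design choices for this sketch: they let a teacher and a student look at the same manageable screenful of evidence.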
21
IFAConc default corpora
1_PopLore    POP   editorials and popular introductions
1_UK-Press   UKP   news stories
2_IntrTBks   INT   textbook introductions
2_ExposPar   PAR   expository paragraphs
3_PICLE      PCF   Polish students’ argumentative and expository essays
4_Pragmat    PRG   Research articles in linguistic pragmatics (82 samples)
4_Lex-Lex    LEX   Research articles in lexicology and lexicography (75 samples)
4_SLA        SLA   Research articles in Second Language Acquisition and TEFL (101 samples)
4_L1-Acq     L1A   Research articles in First Language Acquisition (67 samples)
5_PragmEFL   PRL   Excerpts from 30 IFA MA theses in linguistic pragmatics
22
IFAConc – brief phase 1 to phase 2 change log
Tests (tasks, monitoring, questionnaires) performed up to 2010 showed:
– Anybody can conduct a reasonably successful analysis (cf. > 200 extramurals)
– Returning users (gradually) search and interpret better
– procedure too teacher-intensive
– need to increase breadth of searches / analyses
2010 improvements in UX (user experience), e.g.:
– system more interactive
– clearer highlighting of error-prone learner data
– optimization of training and teacher-student interactions => towards an e-learning environment
– Encouraging boost in annotation quality after the changes
Latest enhancements (after 2011):
– context reading mode
– more sharing options (History entries / corpora)
CAVEAT: General dev problem:
– new features vs. (pedagogical / research) focus
23
IFAConc – some HIGHLIGHTS...
24
IFAConc highlights: Corpora Search
26
IFAConc highlights: Feedback link in student text (by comparison)
27
IFAConc highlights: Shared entry task prompt
28
IFAConc highlights: Resources training page
29
IFAConc highlights: User monitoring (1)
30
IFAConc highlights: User monitoring (2) – S-T collaborative annotation
31
IFAConc highlights: User monitoring (3) – email notification H-98307 'devoted' - annotation update by 'jagodawasik'
32
IFAConc highlights: Output: potential lexical primings: literary criticism vs. linguistics
33
IFAConc highlights: Output: Likely overuse / underuse cases
34
IFAConc – demo of today’s look
http://ifa.amu.edu.pl/~ifaconc
student interface
teacher / admin interface
Pardon the imperfections (server changes)
35
Some IFAConc lessons learned
The system CAN work and could be self-sustainable, but:
Enforced mode => free-use
– research goal: fine-tuning of automatic student-tool interaction
– problematic peer-feedback-based constructivism (“collectivist” culture? But: Is students’ web experience not changing that?)
Prolonging the “novelty effect”
– at least within one assumed user (learner) cycle
Steady user experience enhancements:
– increasing the ease of use
  improvement of ways-in without sacrificing learner effort (interpretation, authentication)
  facilitation of annotations (integration of Corpora Search and History)
– enhancing interpretation options
  new corpora, new / improved training tasks etc.
36
What now?
Moving on:
– integrating the various types of searches and corpora
– search faster!
– further interface improvements
System being transformed / moved onto different servers
Towards a blueprint, a proposal, a prototype solution for successful (= smooth / seamless / pleasant / meaningful / rewarding etc.) integration, yet in the making... though one which has produced a few interesting insights...
37
Acknowledgements
My (student) programmers
– Paweł Nowak, Dominique Stranz
Some software (makers):
– GATE, Stanford Tagger
– Exchanger XML Lite
– WordPress and plugins
2004-6 MA and BA students
– ESAP corpora – compilation, pre-processing
– early testing
2007-11 seminar groups and EGAP and ESAP writing students
– piloting, testing, using
38
Selected bibliography > ???
39
My relevant earlier presentations
PALC (Lodz) 2007
TALC (Lisbon) 2008
PALC (Lodz) 2009
CL (Liverpool) 2009
TALC (Brno) 2010
ALL (Tuebingen) 2011