Practical Text Mining With Perl. Database Laboratory, Kim Min-heum (김민흠).

3.7 Two Text Applications. This section discusses two applications, both easy to program in Perl thanks to hashes. The first illustrates an important property of most texts, one that has consequences later in this book. The second develops some tools that are useful for certain types of word games.

3.7.1 Zipf’s Law for A Christmas Carol. Program 3.2: A concordance program that finds matches for a regular expression. The file name, regex, and text extract radius are given as command line arguments.
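The slide describes Program 3.2 only by its caption, so the following is a minimal sketch of the concordance idea, not the book's program. The helper name `concordance` and the sample sentence are assumptions made for illustration; the real program reads the file name, regex, and radius from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper (not Program 3.2 itself): return each regex match in
# $text together with up to $radius characters of context on either side.
sub concordance {
    my ($text, $regex, $radius) = @_;
    my @extracts;
    while ($text =~ /$regex/g) {
        my $start = $-[0];                  # offset where the match begins
        my $end   = $+[0];                  # offset just past the match
        my $from  = $start - $radius;
        $from = 0 if $from < 0;             # clamp at the start of the text
        my $extract = substr($text, $from, ($end - $from) + $radius);
        $extract =~ s/\s+/ /g;              # flatten newlines for display
        push @extracts, $extract;
    }
    return @extracts;
}

my $sample = "Marley was dead: to begin with. There is no doubt whatever about that.";
print "$_\n" for concordance($sample, qr/\bdoubt\b/, 10);
```

Collecting the extracts in a list rather than printing inside the loop keeps the matching logic separate from the output format.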

3.7.1 Zipf’s Law for A Christmas Carol. As discussed earlier, including in section 2.4.3, hyphens and apostrophes cause problems. Using program 3.2, we can find all instances of potentially problematic punctuation. These cases enable us to decide how to handle the punctuation so that the words in the novel change as little as possible.

3.7.1 Zipf’s Law for A Christmas Carol. First, dashes. Using command line arguments: C:\>perl 78ex.pl A_Christmas_Carol.txt -- 30

3.7.1 Zipf’s Law for A Christmas Carol. Second, single hyphens. C:\>perl 78ex.pl A_Christmas_Carol.txt "\w-\w" 30

3.7.1 Zipf’s Law for A Christmas Carol. Third, apostrophes. Apostrophes are used for quotes within quotations as well as for possessive nouns. The latter produces one ambiguity: possessives of plural nouns ending in s, for example, seven years'. Another possible ambiguity is a contraction with an apostrophe at either the beginning or the end of a word.

3.7.1 Zipf’s Law for A Christmas Carol.
perl 78ex.pl A_Christmas_Carol.txt "\w'\W" 30
perl 78ex.pl A_Christmas_Carol.txt "\W'\w" 30

Program 3.3: This program counts the frequency of each word in A Christmas Carol. The output is sorted by decreasing frequencies and is printed as a CSV (comma-separated values) file.
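A hash keyed by word makes this kind of count short to write. The sketch below is not Program 3.3; the helper name `word_frequencies_csv`, the definition of a word (letters with optional internal apostrophes), and the sample sentence are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not Program 3.3 itself): count lowercased words in
# $text and return "word,count" CSV rows sorted by decreasing frequency.
sub word_frequencies_csv {
    my ($text) = @_;
    my %freq;
    # Assumption: a word is a run of letters, optionally with internal apostrophes.
    while ($text =~ /([A-Za-z]+(?:'[A-Za-z]+)*)/g) {
        $freq{ lc($1) }++;
    }
    # Sort by count (descending); break ties alphabetically for stable output.
    my @words = sort { $freq{$b} <=> $freq{$a} or $a cmp $b } keys %freq;
    return map { "$_,$freq{$_}" } @words;
}

print "$_\n" for word_frequencies_csv("The ghost saw the door; the door saw Scrooge.");
```

The alphabetical tie-break is a design choice of this sketch: hash key order is not fixed in Perl, so without it, words with equal counts could appear in a different order on each run.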

An Aid to Crossword Puzzles. Finds words that fit a crossword puzzle. The regex works because CROSSWD.TXT contains one word per line. C:\>perl 85ex.pl "^\w{2}j\w{2}n\w$" Using ^ and $ in the regex restricts matches to words of exactly 7 characters.
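To see why the anchors matter, here is a small sketch that applies the same pattern to an in-memory word list instead of CROSSWD.TXT. The helper name `crossword_matches` and the five sample words are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words from a list that match a crossword pattern.
sub crossword_matches {
    my ($pattern, @words) = @_;
    my $regex = qr/$pattern/;
    return grep { $_ =~ $regex } @words;
}

# Stand-in for the one-word-per-line contents of CROSSWD.TXT.
my @dictionary = qw(adjoint conjure rejoins subject jargon);

# Two letters, j, two letters, n, one letter: seven characters in total.
print "$_\n" for crossword_matches('^\w{2}j\w{2}n\w$', @dictionary);
```

With this list the pattern keeps only `adjoint` and `rejoins`; without ^ and $, longer words that merely contain such a seven-letter stretch would match as well.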

Word Anagrams. Build an anagram dictionary. Each word is listed under an index string made of its letters sorted in alphabetical order. Example: bdac has the string abcd as its index.
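The sorted-letter index can be sketched with a hash whose values are lists of words. The helper names `anagram_key` and `anagram_dictionary` and the sample words are assumptions for illustration, not the book's code.

```perl
use strict;
use warnings;

# Sort the letters of a word alphabetically to get its anagram key.
sub anagram_key {
    my ($word) = @_;
    my @letters = split //, lc $word;
    return join '', sort { $a cmp $b } @letters;
}

# Build a hash mapping each key to a reference to the list of words sharing it.
sub anagram_dictionary {
    my (@words) = @_;
    my %dict;
    push @{ $dict{ anagram_key($_) } }, $_ for @words;
    return %dict;
}

my %dict = anagram_dictionary(qw(listen silent enlist google));
print "$_: @{ $dict{$_} }\n" for sort { $a cmp $b } keys %dict;
```

Words that are anagrams of each other share one key, so looking up the sorted letters of any word returns all of its anagrams in a single hash access.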

Finding Words in a Set of Letters. Consider not only the whole group of letters but also its subgroups. Example: 8 letters yield 255 nonempty subsets (2^8 - 1). Program 3.6: This program finds all words formed from subsets of a group of letters.
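One way to enumerate the subsets is with bitmasks, combined with a sorted-letter lookup. This is a sketch under that assumption, not Program 3.6; the helper names `letter_key` and `words_from_letters` and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# Sorted-letter key, as used in an anagram dictionary.
sub letter_key {
    my ($word) = @_;
    my @letters = split //, lc $word;
    return join '', sort { $a cmp $b } @letters;
}

# Hypothetical helper (not Program 3.6 itself): return the words from @words
# that can be spelled with some nonempty subset of the letters in $letters.
sub words_from_letters {
    my ($letters, @words) = @_;
    my %by_key;
    push @{ $by_key{ letter_key($_) } }, $_ for @words;

    my @pool = split //, lc $letters;
    my $n    = scalar @pool;
    my %found;
    # Each bitmask from 1 to 2**n - 1 selects one nonempty subset of the pool.
    for my $mask (1 .. 2**$n - 1) {
        my @chosen  = grep { $mask & (1 << $_) } 0 .. $#pool;
        my $subset  = join '', map { $pool[$_] } @chosen;
        my $matches = $by_key{ letter_key($subset) } or next;
        $found{$_} = 1 for @$matches;
    }
    my @result = sort { $a cmp $b } keys %found;
    return @result;
}

print join(' ', words_from_letters('tca', qw(cat act at ta dog tact))), "\n";
```

Three letters give 7 masks and eight letters give 255, matching the count on the slide. The number of masks doubles with each extra letter, so this approach only suits small letter groups.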

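3.8.1 References and Pointers. Reference: the location where a variable or other value is stored. Example: $wordref stores the memory location of "the". Dereferencing: retrieving the value stored at that location, either by prefixing the reference with $ or by using ->.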
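The slide's example code did not survive in the transcript, so this is a minimal sketch of the idea using the variable name `$wordref` from the slide; `$word` and `$listref` are assumed names.

```perl
use strict;
use warnings;

my $word    = 'the';
my $wordref = \$word;          # $wordref now holds a reference to $word

print "$$wordref\n";           # prefixing with $ dereferences: prints "the"

$$wordref = 'a';               # assigning through the reference changes $word
print "$word\n";               # prints "a"

# The arrow form applies when the reference points to an array or hash.
my $listref = [ 'the', 'cat' ];
print "$listref->[1]\n";       # prints "cat"
```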

3.8.1 References and Pointers. How to create references: a backslash, or square brackets [ ] (anonymous array). Dereferencing an array or associative array: prefix the reference with @ or %.
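A short sketch of both ways to create an array reference and of the dereferencing prefixes; all variable names and values here are assumptions for illustration.

```perl
use strict;
use warnings;

my @words = qw(bah humbug);
my $named = \@words;               # backslash: reference to a named array
my $anon  = [ 'tiny', 'tim' ];     # square brackets: reference to an anonymous array

my @copy   = @$named;              # prefix with @ to get the whole array back
my $first  = $$named[0];           # prefix with $ to get one element
my $second = $named->[1];          # the arrow form reaches an element the same way

print "@copy\n";                   # prints "bah humbug"
print "$first $second\n";          # prints "bah humbug"
print scalar(@$anon), "\n";        # prints 2, the number of elements
```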

3.8.1 References and Pointers. Hashes (associative arrays): => is used in place of a comma. Anonymous hashes use curly braces { }.
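The same pattern for hashes, as a minimal sketch; the variable names and the word counts are made up for illustration.

```perl
use strict;
use warnings;

my %count = ( ghost => 3, door => 2 );        # => stands in for a comma and quotes the key
my $named = \%count;                          # backslash: reference to a named hash
my $anon  = { scrooge => 1, marley => 2 };    # curly braces: reference to an anonymous hash

my @names = sort { $a cmp $b } keys %$anon;   # prefix with % to get the whole hash back
print "@names\n";                             # prints "marley scrooge"
print "$named->{ghost}\n";                    # arrow form: prints 3
print "$$named{door}\n";                      # $ prefix form: prints 2
```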

3.8.2 Arrays of Arrays and Beyond. Arrays of arrays: a list of anonymous arrays. All three forms are equivalent expressions. Putting $data[0] inside @{ } dereferences it. Between adjacent [ ] [ ] subscripts, an arrow is implied.
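The three equivalent forms are missing from the transcript; in standard Perl they look like the sketch below. The array name `@data` follows the slide, while the values and the other variable names are assumptions.

```perl
use strict;
use warnings;

# An array of arrays is a list of references to anonymous arrays.
my @data = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );

my $explicit = ${ $data[1] }[2];   # dereference $data[1], then index into it
my $arrow    = $data[1]->[2];      # arrow written between the two subscripts
my $implied  = $data[1][2];        # arrow omitted between adjacent subscripts

print "$explicit $arrow $implied\n";   # prints "6 6 6"

my @first_row = @{ $data[0] };     # @{ } dereferences the whole first row
print "@first_row\n";              # prints "1 2 3"
```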

3.8.2 Arrays of Arrays and Beyond. Code 3.31: $#data gives the last index of @data.
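Code 3.31 itself is not in the transcript. As a sketch of how `$#data` is typically used with an array of arrays, the loops below visit every cell of a ragged table; the data and the `@cells` list are assumptions for illustration.

```perl
use strict;
use warnings;

my @data = ( [ 'a', 'b' ], [ 'c', 'd', 'e' ] );

# $#data is the last row index; $#{ $data[$i] } is the last index within row $i.
my @cells;
for my $i ( 0 .. $#data ) {
    for my $j ( 0 .. $#{ $data[$i] } ) {
        push @cells, "$i,$j:$data[$i][$j]";
    }
}
print "$_\n" for @cells;
```

Because each row's last index is read separately, rows of different lengths are handled without any padding.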

3.8.2 Arrays of Arrays and Beyond Code 3.32

3.8.2 Arrays of Arrays and Beyond Code 3.33

Thank you.