Korean script searching in Korean Library OPACs Junglim Chae Yonsei University.


1 Korean script searching in Korean Library OPACs Junglim Chae Yonsei University

2 Indexing Method
– N-Gram
– Morphological Analysis

3 N-Gram Indexing
– N-Gram: Unigram, Bigram, Trigram, ... N-Gram
– E.g.) 아버지가 방에 들어가신다 ("Father goes into the room")
  → 12 index terms by bigram segmentation (␣ marks a space)
  → 아버, 버지, 지가, 가␣, ␣방, 방에, 에␣, ␣들, 들어, 어가, 가신, 신다
– Many index terms: many results, but lots of noise
– High recall ratio but low precision ratio
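
The bigram segmentation on slide 3 can be sketched in a few lines of Python. This is only an illustration, not the code of any particular OPAC: the function name `char_ngrams` is made up here, and it assumes spaces are treated as ordinary characters, which is what yields the 12 terms shown on the slide.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return every run of n consecutive characters, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


sentence = "아버지가 방에 들어가신다"
bigrams = char_ngrams(sentence)

# 13 characters (11 syllables + 2 spaces) give 12 overlapping bigrams.
print(len(bigrams))
print(bigrams)
```

Passing `n=1` or `n=3` gives the unigram and trigram variants named on the slide.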

4 Morphological Analysis
– Requires a morphological analysis dictionary
– E.g.) 아버지가 방에 들어가신다
  → 3 index terms by morphological analysis
  → 아버지, 방, 들어가다
– Ability to match linguistically similar terms
– Faster performance with a smaller index
– Accurate matches that meet user expectations
– High precision ratio but low recall ratio
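
A real morphological analyzer is far more involved, but the dependence on a dictionary can be illustrated with a toy lookup. Everything below is hypothetical: `TOY_LEXICON` is a hand-made table covering only the slide's example sentence, and `toy_morph_index` is not the interface of any actual analyzer.

```python
# Hypothetical toy lexicon: inflected surface form -> dictionary form.
# A real system would derive these by analysis, not by a fixed table.
TOY_LEXICON = {
    "아버지가": "아버지",
    "방에": "방",
    "들어가신다": "들어가다",
}


def toy_morph_index(text: str) -> list[str]:
    """Index each space-separated word by its dictionary form, if known."""
    terms = []
    for word in text.split():
        lemma = TOY_LEXICON.get(word)
        if lemma is not None:  # words missing from the lexicon are not indexed
            terms.append(lemma)
    return terms


print(toy_morph_index("아버지가 방에 들어가신다"))
```

Words that are not in the lexicon produce no index term at all in this sketch, which is one way to picture the lower recall noted on the slide.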

5 N-Gram vs. Morphological Analysis
                   N-Gram      Morphological Analysis
  Recall Ratio     High        Low
  Precision Ratio  Low         High
  Size of Index    Big         Small
  Indexing Speed   Fast        Slow
  Search Speed     Slow        Fast
  Application      Libraries   Web Search Engines
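
The recall and precision ratios compared on slide 5 follow the usual retrieval definitions: precision is the share of retrieved records that are relevant, and recall is the share of relevant records that were retrieved. The sketch below uses made-up record IDs purely to show the arithmetic; the two result sets are hypothetical and are not measurements from any library system.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}

# Hypothetical broad result set: finds everything relevant, plus noise.
broad = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
# Hypothetical narrow result set: no noise, but misses half the relevant records.
narrow = {"d1", "d2"}

print(precision_recall(broad, relevant))   # precision 0.5, recall 1.0
print(precision_recall(narrow, relevant))  # precision 1.0, recall 0.5
```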

6 A Case Study: Yonsei University Library
– Library System: Maestro-Y
– Search Engine: K2 by Verity
– Indexing Method → N-Gram (bigram) + Morphological Analysis
– Indexing Rules
  Rule 1: Divide strings by space
  Rule 2: Extract index terms using the bigram indexing method
  Rule 3: Add the whole string, excluding the spaces between strings
  Rule 4: Add words from the Korean morphological analysis dictionary

7 A Case Study: Yonsei University Library
E.g.) ‘국어문법의 이해’
  → 국어문법의 / 이해 (Rule 1)
  → 국어, 어문, 문법, 법의, 이해 (Rule 2)
  → 국어문법의이해 (Rule 3)
  → 국어문법 (Rule 4)
  → Index: 국어, 어문, 문법, 법의, 이해, 국어문법, 국어문법의이해
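
The four indexing rules can be traced with a short sketch that reproduces the example on slide 7. It is an illustration only, not the Maestro-Y or K2 implementation. In particular, `TOY_DICTIONARY` is a hypothetical stand-in for the morphological analysis dictionary, and applying Rule 4 as a plain substring lookup is an assumption made here for simplicity.

```python
# Hypothetical stand-in for the Korean morphological analysis dictionary.
TOY_DICTIONARY = ["국어문법"]


def build_index_terms(title: str, dictionary: list[str] = TOY_DICTIONARY) -> list[str]:
    # Rule 1: divide the string by spaces.
    tokens = title.split()

    # Rule 2: extract bigrams from each token (single-character tokens yield none).
    terms = []
    for token in tokens:
        terms.extend(token[i:i + 2] for i in range(len(token) - 1))

    # Rule 3: add the whole string with the spaces removed.
    terms.append("".join(tokens))

    # Rule 4 (assumed form): add dictionary words that occur inside a token.
    for word in dictionary:
        if any(word in token for token in tokens):
            terms.append(word)

    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(terms))


print(build_index_terms("국어문법의 이해"))
```

For the slide's title this yields the same seven index terms as the slide, in rule order rather than the slide's order.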

8

9

10 Search Tips

11 Search Tips(1)
Keyword Search
– 키워드검색, 임의검색
– Default search option
– Use at most 3 keywords
– Use Boolean operators
– Omit stop-words

12 Search Tips(2)
Keyword Search
– Follow the Korean word division rules
  E.g.) 동해물과 백두산이 (O), 동해물과백두산이 (X)

13 Search Tips(3)
Keyword Search
– Compound nouns: do not put spaces between the nouns
  E.g.) 서울대학교 (O), 서울 대학교 (X)

14

15 Search Tips(4)
Browse Search
– "Begin with" or truncation search
– 전방일치검색, 우측절단검색
– Use when you already know the first word of the title, author, or publisher
  E.g.) 한글과
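
A browse ("begin with") search amounts to prefix matching against a sorted heading list. The sketch below shows one common way to do this with binary search. The titles are invented for illustration, and sorting by Python's default code-point order is an assumption that may differ from the collation a real OPAC uses.

```python
import bisect


def browse(sorted_headings: list[str], prefix: str) -> list[str]:
    """Return all headings that begin with prefix; the list must be sorted."""
    start = bisect.bisect_left(sorted_headings, prefix)
    matches = []
    for heading in sorted_headings[start:]:
        if not heading.startswith(prefix):
            break  # sorted order: no later heading can match
        matches.append(heading)
    return matches


# Hypothetical title headings.
titles = sorted(["한글과 세종", "한국 역사", "한글과 문화", "국어문법의 이해"])

print(browse(titles, "한글과"))
```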

16

17 Search Tips(5)
Browse Search
– Korean classics
  E.g.) 열여춘향슈절가라

18

19

20 Search Tips(6)
Exact Match
– Precise search
– 완전일치검색
– For known items
  E.g.) 난중일기

21

22 Search Tips(7)
Exact Match
– Single-character words
  E.g.) ‘산’, ‘흙’, ‘C’

23

24

25 Search Tips(8)
– Hangul/Hancha searching is supported
  E.g.) 中國歷史文選 / 중국역사문선

26

27 Search Tips(9)
– Japanese Kana, Archaic Korean, Russian, special characters: choose scripts from the Multi-language Input Table

28 E.g.) Multi-Script Input Table

29 Search Tips(10)
Japanese Kana
– 日本の歷史 / 일본の역사 / 일본노역사
– 日本デザイン論 / 일본デザイン론 / 일본데자인론

30

31 Search Tips(11)
Personal names
– 윤동주
– 이광수 ; 춘원
– Shakespeare ; 셰익스피어
– Murakami, Haruki ; 村上春樹 ; 촌상춘수, 무라카미 하루키

32

33 Search Tips(12)
Space
– Considered as AND
  E.g.) 한국 역사 = 한국 AND 역사
– In some OPACs, spaces in the character fields do make a difference in retrieval
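
Treating a space as AND can be pictured with a small inverted index: the query is split on spaces and the posting sets of the keywords are intersected. This is a simplified sketch with invented records. It indexes whole space-separated words only, which is an assumption here and is cruder than the bigram-based indexing described earlier.

```python
from collections import defaultdict


def build_inverted_index(records: dict[int, str]) -> dict[str, set[int]]:
    """Map each space-separated word to the IDs of the records containing it."""
    index = defaultdict(set)
    for record_id, title in records.items():
        for word in title.split():
            index[word].add(record_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Treat every space in the query as AND: intersect the posting sets."""
    keywords = query.split()
    if not keywords:
        return set()
    postings = [index.get(keyword, set()) for keyword in keywords]
    return set.intersection(*postings)


# Hypothetical records.
records = {1: "한국 역사", 2: "한국 문학", 3: "세계 역사", 4: "한국역사"}
index = build_inverted_index(records)

print(search(index, "한국 역사"))  # 한국 AND 역사
print(search(index, "한국역사"))   # single keyword, no space
```

With word-level indexing like this, the spaced and unspaced queries return different records, which is the kind of difference the second bullet on the slide refers to.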

34

35

36

37

38 Comparative search with and without space
Input keywords: 국어 문법
  Libraries                            Results
  Spaces do not matter
    National Assembly Library          102
    National Digital Library           2,047
    KERIS (monographs)                 3,246
    Seoul National University Library  171
    Korea University Library           224
  Spaces matter
    The National Library of Korea      614 / 400
    Yonsei University Library          332 / 179
    Sungkyunkwan University Library    141 / 70
    Ewha Womans University Library     221 / 133
    Hanyang University Library         344 / 181

39 謝謝 Thank You 감사합니다 ありがとうございます junglim.chae@yale.edu

