The Development of Sharing Publication Citation Information Website with Article Search System Using OKAPI BM25 Author Hartono (26405055) Supervisors Resmana.


1 The Development of Sharing Publication Citation Information Website with Article Search System Using OKAPI BM25
Author: Hartono (26405055)
Supervisors: Resmana Lim, M.Eng.; Adi Wibowo, M.T.

2 Background
The need to obtain relevant scientific journal articles.
Limited access to scientific journals.
The need to collect article information not only by harvesting but also by manual entry.
The need to obtain better search results.

3 Problem & Goal
Problem:
How can article information be harvested from external journal sites?
How can articles in BibTeX, XML, or PDF format be entered into the database?
How can articles be harvested automatically at set intervals?
How can the articles in the database be indexed?
How can the articles in the database be searched using OKAPI BM25?
Goal:
To develop an information-sharing site that offers more complete article information and helps users find the information they want.

4 Context Diagram

6 Harvesting Process
ListMetadataFormats verb example: verb=ListMetadataFormats
ListIdentifiers verb example: verb=ListIdentifiers&from= &until= &metadataPrefix=oai_dc
GetRecord verb example: verb=GetRecord&identifier=oai:CiteSeerXPSU: &metadataPrefix=oai_dc
ListRecords verb example: &until= &metadataPrefix=oai_dc
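The verb examples on this slide are OAI-PMH query strings appended to a repository's base URL. A minimal sketch of how such requests might be assembled is shown here; `BASE_URL`, the function name `build_oai_request`, and the sample dates and identifier are hypothetical placeholders, not values from the slides.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the base URL of the repository being harvested.
BASE_URL = "https://example.org/oai"


def build_oai_request(verb, **params):
    """Return an OAI-PMH request URL for the given verb, skipping unset arguments."""
    query = {"verb": verb}
    # `from` is a Python keyword, so callers pass `from_` and it is renamed here.
    for key, value in params.items():
        if value is not None:
            query[key.rstrip("_")] = value
    return BASE_URL + "?" + urlencode(query)


if __name__ == "__main__":
    # Sample dates and identifier are made up for illustration.
    print(build_oai_request("ListMetadataFormats"))
    print(build_oai_request("ListIdentifiers", from_="2010-01-01",
                            until="2010-01-31", metadataPrefix="oai_dc"))
    print(build_oai_request("GetRecord", identifier="oai:example.org:1",
                            metadataPrefix="oai_dc"))
```

Building the query with `urlencode` rather than string concatenation keeps identifiers that contain colons or other reserved characters correctly escaped.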

7 Article Management Process

8 Indexing Process

9 Title Process; Description Process

10 Content Process; Creator Process

11 Explode Process; Stop Word Process

12 Stemming Process; Compute f(qi,D) Process

13 Total Articles Process; Compute IDF Process
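The indexing steps named on these slides (explode, stop-word removal, stemming, and counting f(qi,D)) can be sketched roughly as follows. This is only an illustration under stated assumptions: the stop-word list is a tiny made-up sample, and `naive_stem` is a placeholder suffix stripper standing in for whatever stemmer the system actually uses, which the slides do not specify.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; the list used by the system is not given in the slides.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def explode(text):
    """Split text into lowercase word tokens (the "explode" step)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Placeholder suffix stripper (assumption); a real system would use a proper stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def term_frequencies(text):
    """Return f(t, D): how often each processed term occurs in the document."""
    tokens = [naive_stem(t) for t in remove_stop_words(explode(text))]
    return Counter(tokens)


if __name__ == "__main__":
    print(term_frequencies("The analysis of complex models and complex systems"))
```

The per-document counts produced this way are the f(qi,D) values that the BM25 score consumes, while the number of documents containing each term feeds the IDF computation.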

14 Avgdl Process; Search Process

15 OKAPI Process; User Management Process

16 Message Management

17 Entity Relationship Diagram (ERD)

18 OKAPI BM25
Okapi BM25 is a ranking function used by search engines to rank documents according to their relevance to a given query.
OKAPI BM25 Formula
Inverse Document Frequency
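For reference, the commonly cited form of the Okapi BM25 score and one common definition of the inverse document frequency are given here. The exact variant and parameter values used by the presented system are not stated in the slides, so this is the standard textbook form rather than necessarily the author's. Here f(q_i, D) is the frequency of query term q_i in document D, |D| is the document length in terms, avgdl is the average document length, N is the total number of articles, n(q_i) is the number of articles containing q_i, and k_1 and b are free parameters (k_1 between 1.2 and 2.0 and b = 0.75 are typical choices).

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(q_i) = \ln \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}
```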

19 Article Example

Article | Title | Description | Content
Oai1 | complex stockhast | Numer analysi | Model complex real detail analysi build
Oai2 | Managed abstrach build | Manner detail | Join creation numer make possibl
Oai3 | Structur detail possibl | Real abstrach world | Make detail usual manner
Oai4 | Build world explor | Analysi detail | Managed stockhast replicating complex explor

20 Manual & Program IDF Calculation
Manual:
Program:

21 Manual & Program OKAPI Calculation
Keyword example: complex
Manual:
Program:
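The calculation for the keyword "complex" can be approximated with a short sketch over the four example articles from slide 19. Several things here are assumptions rather than details from the slides: the title, description, and content terms are concatenated into one document, k1 = 1.2 and b = 0.75 are used, and a non-negative IDF variant is substituted for the classic one (the classic form evaluates to zero when a term occurs in exactly half of the documents, as "complex" does here). The numbers it produces are therefore not expected to match the manual and program results shown in the original slide.

```python
import math
from collections import Counter

# Stemmed terms from the article example, with title, description and
# content concatenated (an assumption about how the fields are combined).
ARTICLES = {
    "Oai1": "complex stockhast numer analysi model complex real detail analysi build",
    "Oai2": "managed abstrach build manner detail join creation numer make possibl",
    "Oai3": "structur detail possibl real abstrach world make detail usual manner",
    "Oai4": "build world explor analysi detail managed stockhast replicating complex explor",
}

K1, B = 1.2, 0.75  # assumed parameter values, not taken from the slides


def bm25_scores(query, articles, k1=K1, b=B):
    """Score every article against the query with a BM25 variant."""
    docs = {name: text.lower().split() for name, text in articles.items()}
    total = len(docs)                                   # N, total number of articles
    avgdl = sum(len(d) for d in docs.values()) / total  # average document length
    scores = {}
    for name, tokens in docs.items():
        freqs = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            n_t = sum(1 for d in docs.values() if term in d)
            # Non-negative IDF variant (assumption, see the note above).
            idf = math.log(1 + (total - n_t + 0.5) / (n_t + 0.5))
            f = freqs[term]
            norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * f * (k1 + 1) / (f + norm)
        scores[name] = score
    return scores


if __name__ == "__main__":
    ranked = sorted(bm25_scores("complex", ARTICLES).items(),
                    key=lambda item: item[1], reverse=True)
    for name, score in ranked:
        print(f"{name}: {score:.3f}")
```

Under these assumptions Oai1, which contains "complex" twice, ranks above Oai4, which contains it once, and the two articles without the term score zero.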

22 Recall and Precision
Articles: 500
Keyword: Network System
Search results: 198 articles
Possibly relevant results: 29 articles
Relevant articles found: 12
Recall = 12/12 × 100% = 100%
Precision = 12/198 × 100% = 6%

OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU: | no | 15
oai:CiteSeerXPSU: | no | 12
oai:CiteSeerXPSU: | yes | 8
oai:CiteSeerXPSU: | no | 6
oai:CiteSeerXPSU: | yes | 3

23 Recall and Precision (continued)

OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU: | no | 16
oai:CiteSeerXPSU: | no | 25
oai:CiteSeerXPSU: | no | 13
oai:CiteSeerXPSU: | yes | 5
oai:CiteSeerXPSU: | yes | 24
oai:CiteSeerXPSU: | no | 28
oai:CiteSeerXPSU: | yes | 9
oai:CiteSeerXPSU: | yes | 10
oai:CiteSeerXPSU: | no | 18
oai:CiteSeerXPSU: | no | 29
oai:CiteSeerXPSU: | yes | 4
oai:CiteSeerXPSU: | no | 21
oai:CiteSeerXPSU: | no | 23
oai:CiteSeerXPSU: | yes | 17

24 Recall and Precision (continued)

OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU: | no | 19
oai:CiteSeerXPSU: | yes | 20
oai:CiteSeerXPSU: | no | 26
oai:CiteSeerXPSU: | no | 27
oai:CiteSeerXPSU: | yes | 1
oai:CiteSeerXPSU: | yes | 2
oai:CiteSeerXPSU: | no | 22
oai:CiteSeerXPSU: | no | 14
oai:CiteSeerXPSU: | no | 11
oai:CiteSeerXPSU: | yes | 7

25 Recall and Precision (continued)
Keyword: music model
Search results: 150 articles
Possibly relevant results: 30 articles
Relevant articles found: 14
Recall = 14/14 × 100% = 100%
Precision = 14/150 × 100% = 9.3%

OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU: | yes | 19
oai:CiteSeerXPSU: | no | 29
oai:CiteSeerXPSU: | yes | 18
oai:CiteSeerXPSU: | yes | 21
oai:CiteSeerXPSU: | yes | 6
oai:CiteSeerXPSU: | no | 27
oai:CiteSeerXPSU: | no | 10

26 Recall and Precision (continued)

OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU: | yes | 25
oai:CiteSeerXPSU: | no | 12
oai:CiteSeerXPSU: | yes | 30
oai:CiteSeerXPSU: | yes | 11
oai:CiteSeerXPSU: | no | 16
oai:CiteSeerXPSU: | yes | 20
oai:CiteSeerXPSU: | no | 33
oai:CiteSeerXPSU: | no | 32
oai:CiteSeerXPSU: | yes | 1
oai:CiteSeerXPSU: | no | 13
oai:CiteSeerXPSU: | no | 31
oai:CiteSeerXPSU: | no | 8
oai:CiteSeerXPSU: | yes | 15
oai:CiteSeerXPSU: | yes | 7

27 Recall and Precision (continued)

OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU: | no | 24
oai:CiteSeerXPSU: | yes | 4
oai:CiteSeerXPSU: | yes | 5
oai:CiteSeerXPSU: | yes | 3
oai:CiteSeerXPSU: | no | 23
oai:CiteSeerXPSU: | no | 17
oai:CiteSeerXPSU: | no | 28
oai:CiteSeerXPSU: | no | 14
oai:CiteSeerXPSU: | no | 9

28 Recall and Precision (continued)
Keyword: music analysis
Search results: 116 articles
Possibly relevant results: 23 articles
Relevant articles found: 10
Recall = 10/10 × 100% = 100%
Precision = 10/116 × 100% = 8.6%

OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU: | yes | 22
oai:CiteSeerXPSU: | yes | 2
oai:CiteSeerXPSU: | no | 3
oai:CiteSeerXPSU: | no | 9
oai:CiteSeerXPSU: | yes | 5
oai:CiteSeerXPSU: | no | 23
oai:CiteSeerXPSU: | yes | 19

29 Recall and Precision (continued)

OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU: | yes | 17
oai:CiteSeerXPSU: | yes | 10
oai:CiteSeerXPSU: | yes | 20
oai:CiteSeerXPSU: | no | 13
oai:CiteSeerXPSU: | no | 21
oai:CiteSeerXPSU: | yes | 1
oai:CiteSeerXPSU: | no | 18
oai:CiteSeerXPSU: | no | 11
oai:CiteSeerXPSU: | yes | 7
oai:CiteSeerXPSU: | no | 4
oai:CiteSeerXPSU: | no | 16
oai:CiteSeerXPSU: | yes | 15
oai:CiteSeerXPSU: | yes | 17
oai:CiteSeerXPSU: | no | 12

30 Recall and Precision (continued)

OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU: | no | 6
oai:CiteSeerXPSU: | no | 14
oai:CiteSeerXPSU: | no | 8
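The recall and precision figures reported for the three keywords follow directly from the counts on the slides. A small sketch of the arithmetic is given here; the function names and the `RUNS` structure are illustrative choices, while the counts themselves are taken from slides 22, 25, and 28.

```python
def recall(relevant_retrieved, relevant_total):
    """Percentage of all relevant articles that the search returned."""
    return 100.0 * relevant_retrieved / relevant_total


def precision(relevant_retrieved, retrieved_total):
    """Percentage of returned articles that are relevant."""
    return 100.0 * relevant_retrieved / retrieved_total


# (keyword, articles returned, relevant articles returned, relevant articles overall)
RUNS = [
    ("network system", 198, 12, 12),
    ("music model", 150, 14, 14),
    ("music analysis", 116, 10, 10),
]

if __name__ == "__main__":
    for keyword, returned, hits, relevant in RUNS:
        print(f"{keyword}: recall {recall(hits, relevant):.1f}%, "
              f"precision {precision(hits, returned):.1f}%")
```

Precision is low because it is computed over every returned article rather than over a fixed cutoff; restricting the denominator to the top-ranked results would give a higher figure for the same ranking.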

31 Indexing Time
Articles: 500

Number of articles | Time required (s)
100 articles |
200 articles |
300 articles |
400 articles |
500 articles |

33 Search Time
Articles: 500
Keyword: computer analysis
Search results: 140 articles
Time: seconds

34 Search Time (continued)
Keyword: user applications
Search results: 92 articles
Time: seconds

35 Search Time (continued)
Keyword: work scheme
Search results: 92 articles
Time: seconds

36 Search Time (continued)
Keyword: high image transform
Search results: 101 articles
Time: seconds

37 Search Time (continued)
Keyword: network
Search results: 76 articles
Time: seconds

38 Conclusion
1. The system can only harvest metadata in the oai_dc metadata format.
2. The system can only update automatically from approved URLs.
3. The time the system needs to return the articles related to a keyword varies with the number of articles returned.
4. Recall is very good, averaging 100%, while precision is poor, averaging less than 10%. The results are still acceptable because the articles that may be relevant are ranked within about the top 30 results.

39 Suggestion
1. The system could be developed further to act as a data provider.
2. The system could be extended to harvest other metadata formats dynamically.

40 Thank You For Your Attention

