
1 The Development of Sharing Publication Citation Information Website with Article Search System Using OKAPI BM25
Author: Hartono (26405055)
Supervisors: Resmana Lim, M.Eng.; Adi Wibowo, M.T.

2 Background
The need to obtain the necessary scientific journals.
Limited access to scientific journals.
The need to get article information not only by harvesting, but also by manual entry.
The need to obtain better search results.

3 Problem & Goal
Problem:
How can article information be harvested from external journal sites?
How can articles in BibTeX, XML, or PDF format be entered into the database?
How can articles be harvested automatically at set intervals?
How can the articles in the database be indexed?
How can the articles in the database be searched using OKAPI BM25?
Goal:
To develop an information-sharing site that offers more complete article information and lets users find the information they want.

4 Context Diagram

5

6 Harvesting Process
ListMetadataFormats verb example: http://citeseerx.ist.psu.edu/oai2?verb=ListMetadataFormats
ListIdentifiers verb example: http://citeseerx.ist.psu.edu/oai2?verb=ListIdentifiers&from=2010-03-17&until=2010-03-18&metadataPrefix=oai_dc
GetRecord verb example: http://citeseerx.ist.psu.edu/oai2?verb=GetRecord&identifier=oai:CiteSeerXPSU:10.1.1.1.2918&metadataPrefix=oai_dc
ListRecords verb example: http://citeseerx.ist.psu.edu/oai2?verb=ListRecords&from=2010-03-17&until=2010-03-18&metadataPrefix=oai_dc
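The request URLs above all follow the same OAI-PMH pattern: a base URL, a verb, and a few arguments. A minimal sketch of how such URLs could be assembled is shown here; the helper name oai_request_url and the trailing-underscore convention for "from" are assumptions made for illustration, not part of the presented system.

```python
from urllib.parse import urlencode

# Endpoint taken from the slide examples.
BASE_URL = "http://citeseerx.ist.psu.edu/oai2"


def oai_request_url(verb, **params):
    """Build an OAI-PMH request URL for a verb and its arguments (hypothetical helper)."""
    query = {"verb": verb}
    for name, value in params.items():
        # "from" is a Python keyword, so callers pass it as from_;
        # the trailing underscore is stripped here.
        query[name.rstrip("_")] = value
    # urlencode percent-encodes reserved characters such as ":" in identifiers.
    return BASE_URL + "?" + urlencode(query)


list_formats = oai_request_url("ListMetadataFormats")
list_records = oai_request_url(
    "ListRecords", from_="2010-03-17", until="2010-03-18", metadataPrefix="oai_dc"
)
get_record = oai_request_url(
    "GetRecord",
    identifier="oai:CiteSeerXPSU:10.1.1.1.2918",
    metadataPrefix="oai_dc",
)
```

Building the query string with urlencode rather than by hand keeps the argument names intact and avoids stray spaces or broken dates in the URL.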

7 Article Management Process

8 Indexing Process

9 Title Process / Description Process

10 Content Process / Creator Process

11 Explode Process / Stop Word Process

12 Stemming Process / Compute f(qi,D) Process

13 Total Articles Process / Compute IDF Process

14 Avgdl Process / Search Process
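The indexing slides name a chain of steps: explode, stop-word removal, stemming, counting f(qi,D), counting the total number of articles, and computing avgdl. A rough sketch of how those steps might fit together follows. The stop-word list, the identity stem function, and the two sample articles are placeholders invented for illustration; the diagrams of the actual processes are not reproduced in this transcript.

```python
from collections import Counter

# Illustrative stop-word list (assumption; the system's real list is not given).
STOP_WORDS = {"the", "of", "a", "and", "in", "is", "for"}


def stem(term):
    # Placeholder: a real system would apply a stemming algorithm here.
    return term


def index_article(text):
    """Explode text into terms, drop stop words, stem, and count f(t, D)."""
    terms = text.lower().split()                       # explode
    terms = [t for t in terms if t not in STOP_WORDS]  # stop-word removal
    terms = [stem(t) for t in terms]                   # stemming
    return Counter(terms)                              # term frequencies


def build_index(articles):
    """Return per-article term counts plus the collection statistics BM25 needs."""
    term_counts = {aid: index_article(text) for aid, text in articles.items()}
    total_articles = len(term_counts)
    doc_freq = Counter()  # number of articles containing each term
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    total_terms = sum(sum(c.values()) for c in term_counts.values())
    avgdl = total_terms / total_articles  # average article length in terms
    return term_counts, total_articles, doc_freq, avgdl


# Made-up sample articles, for illustration only.
articles = {"a1": "the analysis of a complex model", "a2": "a model for music"}
term_counts, total_articles, doc_freq, avgdl = build_index(articles)
```

Precomputing these statistics at indexing time means a search only has to look up counts rather than rescan article text.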

15 OKAPI Process / User Management Process

16 Message Management

17 Entity Relationship Diagram (ERD)

18 OKAPI BM25
Okapi BM25 is a ranking function used by search engines to rank documents by their relevance to a given query.
OKAPI BM25 Formula
Inverse Document Frequency
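For reference, a commonly cited form of the Okapi BM25 score and its inverse document frequency term is sketched here. The symbols f(q_i, D), avgdl, and the total article count N correspond to the process names on the indexing slides. The free parameters k_1 and b (often chosen around 1.2 to 2.0 and 0.75) and the exact IDF variant are assumptions, since the slides do not state them.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
\qquad
\mathrm{IDF}(q_i) = \ln\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}
```

Here |D| is the length of article D in terms and n(q_i) is the number of articles containing q_i. Some implementations add 1 inside the logarithm so that the IDF stays non-negative for terms that occur in half or more of the articles.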

19 Article Example
Article example:
Article | Title | Description | Content
Oai1 | complex stockhast | Numer analysi | Model complex real detail analysi build
Oai2 | Managed abstrach build | Manner detail | Join creation numer make possibl
Oai3 | Structur detail possibl | Real abstrach world | Make detail usual manner
Oai4 | Build world explor | Analysi detail | Managed stockhast replicating complex explor

20 Manual & Program IDF Calculation
Manual :
Program :

21 Manual & Program OKAPI Calculation
Keyword example : complex
Manual :
Program :
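As an illustration of the kind of calculation these slides compare, the sketch below scores the four example articles from slide 19 for the keyword "complex". It rests on several assumptions: title, description, and content are merged into one list of terms, k1 = 1.2 and b = 0.75 are used, and the IDF has 1 added inside the logarithm (with N = 4 and two matching articles, the plain form would evaluate to zero). The resulting numbers are therefore not claimed to match the manual or program results on the slides.

```python
import math
from collections import Counter

# Stemmed terms of the example articles, with title, description and
# content merged into one string per article (assumption).
ARTICLES = {
    "Oai1": "complex stockhast numer analysi model complex real detail analysi build",
    "Oai2": "managed abstrach build manner detail join creation numer make possibl",
    "Oai3": "structur detail possibl real abstrach world make detail usual manner",
    "Oai4": "build world explor analysi detail managed stockhast replicating complex explor",
}
K1, B = 1.2, 0.75  # assumed parameter values; the slides do not give them


def bm25_scores(query, articles, k1=K1, b=B):
    """Return a BM25 score per article for a whitespace-separated query."""
    docs = {aid: text.lower().split() for aid, text in articles.items()}
    n_docs = len(docs)
    avgdl = sum(len(terms) for terms in docs.values()) / n_docs
    scores = {}
    for aid, terms in docs.items():
        counts = Counter(terms)
        score = 0.0
        for q in query.lower().split():
            n_q = sum(1 for t in docs.values() if q in t)  # articles containing q
            # Assumed IDF variant: +1 inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - n_q + 0.5) / (n_q + 0.5))
            f = counts[q]  # f(q, D)
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(terms) / avgdl))
        scores[aid] = score
    return scores


scores = bm25_scores("complex", ARTICLES)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions Oai1 ranks first because "complex" occurs twice in it, Oai4 follows with one occurrence, and Oai2 and Oai3 score zero.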

22 Recall Precision
Articles: 500
Keyword: Network System
Search results: 198 articles
Possibly relevant results: 29 articles
Relevant articles: 12
Recall = 12/12 * 100% = 100%
Precision = 12/198 * 100% = 6%
OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.1.3301 | no | 15
oai:CiteSeerXPSU:10.1.1.1.8714 | no | 12
oai:CiteSeerXPSU:10.1.1.11.3246 | yes | 8
oai:CiteSeerXPSU:10.1.1.131.2961 | no | 6
oai:CiteSeerXPSU:10.1.1.133.114 | yes | 3

23 Recall Precision (continued)
OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.133.5166 | no | 16
oai:CiteSeerXPSU:10.1.1.134.7415 | no | 25
oai:CiteSeerXPSU:10.1.1.135.7151 | no | 13
oai:CiteSeerXPSU:10.1.1.138.8592 | yes | 5
oai:CiteSeerXPSU:10.1.1.143.7835 | yes | 24
oai:CiteSeerXPSU:10.1.1.143.9199 | no | 28
oai:CiteSeerXPSU:10.1.1.147.3140 | yes | 9
oai:CiteSeerXPSU:10.1.1.148.6013 | yes | 10
oai:CiteSeerXPSU:10.1.1.149.7229 | no | 18
oai:CiteSeerXPSU:10.1.1.2.8672 | no | 29
oai:CiteSeerXPSU:10.1.1.2.876 | yes | 4
oai:CiteSeerXPSU:10.1.1.28.2069 | no | 21
oai:CiteSeerXPSU:10.1.1.28.3751 | no | 23
oai:CiteSeerXPSU:10.1.1.31.5233 | yes | 17

24 Recall Precision (continued)
OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.32.3394 | no | 19
oai:CiteSeerXPSU:10.1.1.34.422 | yes | 20
oai:CiteSeerXPSU:10.1.1.37.133 | no | 26
oai:CiteSeerXPSU:10.1.1.37.886 | no | 27
oai:CiteSeerXPSU:10.1.1.46.7941 | yes | 1
oai:CiteSeerXPSU:10.1.1.5.5436 | yes | 2
oai:CiteSeerXPSU:10.1.1.61.8860 | no | 22
oai:CiteSeerXPSU:10.1.1.62.5142 | no | 14
oai:CiteSeerXPSU:10.1.1.8.4971 | no | 11
oai:CiteSeerXPSU:10.1.1.94.3465 | yes | 7

25 Recall Precision (continued)
Keyword: music model
Search results: 150 articles
Possibly relevant results: 30 articles
Relevant articles: 14
Recall = 14/14 * 100% = 100%
Precision = 14/150 * 100% = 9.3%
OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.10.1860 | yes | 19
oai:CiteSeerXPSU:10.1.1.10.2860 | no | 29
oai:CiteSeerXPSU:10.1.1.111.3072 | yes | 18
oai:CiteSeerXPSU:10.1.1.127.8691 | yes | 21
oai:CiteSeerXPSU:10.1.1.130.1856 | yes | 6
oai:CiteSeerXPSU:10.1.1.133.7089 | no | 27
oai:CiteSeerXPSU:10.1.1.140.3374 | no | 10

26 Recall Precision (continued)
OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.140.8940 | yes | 25
oai:CiteSeerXPSU:10.1.1.142.7598 | no | 12
oai:CiteSeerXPSU:10.1.1.149.6567 | yes | 30
oai:CiteSeerXPSU:10.1.1.152.2688 | yes | 11
oai:CiteSeerXPSU:10.1.1.154.24 | no | 16
oai:CiteSeerXPSU:10.1.1.154.2529 | yes | 20
oai:CiteSeerXPSU:10.1.1.155.1750 | no | 33
oai:CiteSeerXPSU:10.1.1.16.7401 | no | 32
oai:CiteSeerXPSU:10.1.1.17.1013 | yes | 1
oai:CiteSeerXPSU:10.1.1.18.6229 | no | 13
oai:CiteSeerXPSU:10.1.1.2.6849 | no | 31
oai:CiteSeerXPSU:10.1.1.2.8672 | no | 8
oai:CiteSeerXPSU:10.1.1.20.3633 | yes | 15
oai:CiteSeerXPSU:10.1.1.31.5233 | yes | 7

27 Recall Precision (continued)
OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.32.5049 | no | 24
oai:CiteSeerXPSU:10.1.1.34.7828 | yes | 4
oai:CiteSeerXPSU:10.1.1.4.677 | yes | 5
oai:CiteSeerXPSU:10.1.1.4.7323 | yes | 3
oai:CiteSeerXPSU:10.1.1.5.1181 | no | 23
oai:CiteSeerXPSU:10.1.1.5.4681 | no | 17
oai:CiteSeerXPSU:10.1.1.52.4788 | no | 28
oai:CiteSeerXPSU:10.1.1.57.3576 | no | 14
oai:CiteSeerXPSU:10.1.1.59.9118 | no | 9

28 Recall Precision (continued)
Keyword: music analysis
Search results: 116 articles
Possibly relevant results: 23 articles
Relevant articles: 10
Recall = 10/10 * 100% = 100%
Precision = 10/116 * 100% = 8.6%
OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.10.2860 | yes | 22
oai:CiteSeerXPSU:10.1.1.10.3132 | yes | 2
oai:CiteSeerXPSU:10.1.1.140.3374 | no | 3
oai:CiteSeerXPSU:10.1.1.140.8940 | no | 9
oai:CiteSeerXPSU:10.1.1.145.8953 | yes | 5
oai:CiteSeerXPSU:10.1.1.149.6567 | no | 23
oai:CiteSeerXPSU:10.1.1.154.2529 | yes | 19

29 Recall Precision (continued)
OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.155.1750 | yes | 17
oai:CiteSeerXPSU:10.1.1.155.4454 | yes | 10
oai:CiteSeerXPSU:10.1.1.156.2520 | yes | 20
oai:CiteSeerXPSU:10.1.1.18.6229 | no | 13
oai:CiteSeerXPSU:10.1.1.2.6849 | no | 21
oai:CiteSeerXPSU:10.1.1.2.8672 | yes | 1
oai:CiteSeerXPSU:10.1.1.25.747 | no | 18
oai:CiteSeerXPSU:10.1.1.29.4192 | no | 11
oai:CiteSeerXPSU:10.1.1.34.7828 | yes | 7
oai:CiteSeerXPSU:10.1.1.4.7323 | no | 4
oai:CiteSeerXPSU:10.1.1.5.1181 | no | 16
oai:CiteSeerXPSU:10.1.1.5.4681 | yes | 15
oai:CiteSeerXPSU:10.1.1.52.4788 | no | 12

30 Recall Precision (continued)
OAI identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.59.9118 | no | 6
oai:CiteSeerXPSU:10.1.1.6.3984 | no | 14
oai:CiteSeerXPSU:10.1.1.6.757 | no | 8
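The recall and precision figures for the three keywords can be reproduced with a few lines of arithmetic. The sketch below uses only the counts reported on the slides; the function names and the QUERIES list are introduced here for illustration.

```python
def recall(relevant_retrieved, relevant_total):
    """Share of all relevant articles that the search returned, in percent."""
    return 100.0 * relevant_retrieved / relevant_total


def precision(relevant_retrieved, retrieved_total):
    """Share of returned articles that are relevant, in percent."""
    return 100.0 * relevant_retrieved / retrieved_total


# (keyword, search results, relevant articles) as reported on the slides.
QUERIES = [
    ("network system", 198, 12),
    ("music model", 150, 14),
    ("music analysis", 116, 10),
]

for keyword, retrieved, relevant in QUERIES:
    # The slides count every relevant article as retrieved, hence 100% recall.
    r = recall(relevant, relevant)
    p = precision(relevant, retrieved)
    print(f"{keyword}: recall = {r:.1f}%, precision = {p:.1f}%")

mean_precision = sum(precision(rel, ret) for _, ret, rel in QUERIES) / len(QUERIES)
```

Precision is low because it is measured against the full result list (116 to 198 articles), while the relevant articles number only 10 to 14 per query.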

31 Indexing Time
Articles: 500
Number of articles | Time required (seconds)
100 | 805.1392138
200 | 1646.911684
300 | 2509.824728
400 | 3514.183314
500 | 4744.517922
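Dividing each indexing time by its article count gives a quick view of how indexing cost scales. This is a small sketch using only the figures from the table; the variable names are illustrative.

```python
# Indexing times in seconds, keyed by number of articles (from the slide).
INDEXING_SECONDS = {
    100: 805.1392138,
    200: 1646.911684,
    300: 2509.824728,
    400: 3514.183314,
    500: 4744.517922,
}

# Average seconds spent per article at each collection size.
per_article = {n: seconds / n for n, seconds in INDEXING_SECONDS.items()}

for n, seconds in per_article.items():
    print(f"{n} articles: {seconds:.2f} s per article")
```

The per-article time rises from about 8.05 s at 100 articles to about 9.49 s at 500, so indexing time grows slightly faster than linearly over this range.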


33 Search Time
Articles: 500
Keyword: computer analysis
Search results: 140 articles
Time: 0.549877882004 seconds

34 Search Time (continued)
Keyword: user applications
Search results: 92 articles
Time: 0.547022104263 seconds

35 Search Time (continued)
Keyword: work scheme
Search results: 92 articles
Time: 0.491093873978 seconds

36 Search Time (continued)
Keyword: high image transform
Search results: 101 articles
Time: 0.498678922653 seconds

37 Search Time (continued)
Keyword: network
Search results: 76 articles
Time: 0.270733833313 seconds

38 Conclusion
1. The system can perform metadata harvesting only with the oai_dc metadata format.
2. The system can update automatically only from approved URLs.
3. The time the system needs to return keyword-related articles varies with the number of articles returned.
4. Recall on the search results is very good, averaging 100%, while precision is poor, averaging less than 10%. The result is still acceptable because all articles that may be relevant appear within the top 30 search ranks.

39 Suggestion
1. The system could be developed further to act as a data provider.
2. The system could be extended to harvest other metadata formats dynamically.

40 Thank You For Your Attention

