
2 The use of an intelligent forum crawler for data retrieval from e-learning portals
Miloš Pavković and Jelica Protić, University of Belgrade School of Electrical Engineering, Belgrade, Serbia
6th International Conference on Education and New Learning Technologies, Barcelona, 7th - 9th of July 2014

3 Introduction
- There is a large number of forums covering different topics
- Forums are often used by students during their studies
- A large amount of relevant information is scattered across different forums inside one university domain
- Forums are based on different technologies

4 Issues
- The same topic can appear across different forums inside one university domain
- Official school forums vs. independent department forums
- The same documents can be uploaded as post attachments to several different web forums
- Similar courses at different schools

5 Solution – Specialized crawler
- Specialized forum crawler
- Aggregation of crawled data from multiple forums of a single university domain
- Storing the data in a database
- Forum modules that use this database to help students

6 Forum structure
- Always defined by the presented implicit paths
Figure: example of a) forum, b) thread, c) attachments inside a post.

7 Crawler algorithm
- FCbRE – Forum Crawler based on Regular Expressions
- Automated system
- Identifying the DOM structure and basic forum elements with regular expressions
- Identifying forum implicit paths using regex
  Example: >>index\.php\?showforum\==\digit+!>+>\P=!<+
- Extraction of post content and storing it in the database
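The idea of finding forum links through a regular expression can be sketched in Python. This is only an illustration under assumptions: the pattern below is a hypothetical one for links of the form index.php?showforum=<number>, written in standard Python regex syntax, and is not the expression shown on the slide. The function name, the sample HTML and the example.edu host are likewise invented for the example.

```python
import re

# Hypothetical pattern (assumption): an anchor whose href contains
# "index.php?showforum=<digits>", followed by the visible forum name.
FORUM_LINK = re.compile(
    r'<a\s[^>]*href="(?P<link>[^"]*index\.php\?showforum=(?P<forum_id>\d+))"[^>]*>'
    r'(?P<name>[^<]+)</a>'
)


def extract_forums(html):
    """Return (forum_id, forum_name, forum_link) for every forum link in html."""
    return [
        (int(m.group("forum_id")), m.group("name").strip(), m.group("link"))
        for m in FORUM_LINK.finditer(html)
    ]


# Invented sample page: two forum links and one thread link that must be skipped.
sample = '''
<a href="http://forum.example.edu/index.php?showforum=12">Programming 1</a>
<a href="http://forum.example.edu/index.php?showtopic=345">Exam results</a>
<a class="f" href="index.php?showforum=7" title="x"> Databases </a>
'''

for forum in extract_forums(sample):
    print(forum)
```

A crawler built this way would presumably need one such pattern per forum technology and per element type (forum, thread, post, attachment), which matches the slide's point that the implicit paths define the forum structure.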

8 Crawler database
- Essential in the FCbRE model
- Forum threads and posts are stored separately
- Similarity tables contain unique pairs of identifiers of forums, threads and attachments

Table          Fields
Web Forum      - site id    - site name    - site link
Forums         + site id    - forum id     - forum name    - forum link
Threads        + forum id   - thread id    - thread name   - thread link
Posts          + thread id  - post id      - post info
Attach         + post id    - attach id    - attach name   - attach link
F – Simil.     + forum id (1)   + forum id (2)
T – Simil.     + thread id (1)  + thread id (2)
F/T – Simil.   + forum id       + thread id
A – Simil.     + attach id (1)  + attach id (2)
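One possible way to express these tables is shown below with Python's built-in sqlite3 module. The table and column names follow the slide; the column types, the key constraints, the choice of SQLite and the "smaller id first" rule for keeping pairs unique are all assumptions made for this sketch. Only the forum similarity table is written out; the thread, forum/thread and attachment similarity tables would have the same shape.

```python
import sqlite3

# Names follow the slide; types, keys and the pair-ordering rule are assumptions.
SCHEMA = """
CREATE TABLE web_forum (site_id INTEGER PRIMARY KEY, site_name TEXT, site_link TEXT);
CREATE TABLE forums  (forum_id INTEGER PRIMARY KEY,
                      site_id INTEGER NOT NULL REFERENCES web_forum(site_id),
                      forum_name TEXT, forum_link TEXT);
CREATE TABLE threads (thread_id INTEGER PRIMARY KEY,
                      forum_id INTEGER NOT NULL REFERENCES forums(forum_id),
                      thread_name TEXT, thread_link TEXT);
CREATE TABLE posts   (post_id INTEGER PRIMARY KEY,
                      thread_id INTEGER NOT NULL REFERENCES threads(thread_id),
                      post_info TEXT);
CREATE TABLE attach  (attach_id INTEGER PRIMARY KEY,
                      post_id INTEGER NOT NULL REFERENCES posts(post_id),
                      attach_name TEXT, attach_link TEXT);
CREATE TABLE forum_similarity (
    forum_id_1 INTEGER NOT NULL REFERENCES forums(forum_id),
    forum_id_2 INTEGER NOT NULL REFERENCES forums(forum_id),
    PRIMARY KEY (forum_id_1, forum_id_2),
    CHECK (forum_id_1 < forum_id_2)
);
"""


def create_database(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_forum_similarity(conn, forum_a, forum_b):
    """Store a pair once, regardless of the order in which it was found."""
    if forum_a == forum_b:
        return
    low, high = sorted((forum_a, forum_b))
    conn.execute("INSERT OR IGNORE INTO forum_similarity VALUES (?, ?)", (low, high))


conn = create_database()
conn.execute("INSERT INTO web_forum VALUES (1, 'Example forum', 'http://forum.example.edu')")
conn.executemany(
    "INSERT INTO forums VALUES (?, 1, ?, ?)",
    [(1, "Programming 1", "index.php?showforum=1"),
     (2, "Programming I", "index.php?showforum=2")],
)
add_forum_similarity(conn, 2, 1)
add_forum_similarity(conn, 1, 2)  # same pair in the other order, ignored
print(conn.execute("SELECT * FROM forum_similarity").fetchall())
```

Ordering each pair before insertion is one simple way to get the "unique pairs" property the slide mentions; the original system may enforce it differently.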

9 Finding similarities
- Determining similarities of forum, thread or document names
- It is not enough to just compare the words:
  - grammatical errors
  - singular/plural form
  - different form but the same semantic meaning
- Using existing search engines to distinguish semantics
- FCbRE uses low-level semantic difference
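The first two problems in the list above (spelling errors and singular/plural forms) can be illustrated with a small string-similarity check. This is a stand-in built on Python's difflib, not the measure FCbRE uses, and it does not address names that differ in form but share a meaning. The function names, the plural-folding rule and the 0.85 threshold are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and crudely fold plurals (assumed rule)."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    # Drop a trailing "s" from words longer than three letters.
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]


def similarity(a, b):
    """Character-level similarity of the normalized names, between 0 and 1."""
    return SequenceMatcher(None, " ".join(normalize(a)), " ".join(normalize(b))).ratio()


def is_similar(a, b, threshold=0.85):
    """Treat two names as similar when their score reaches the threshold."""
    return similarity(a, b) >= threshold


print(is_similar("Operating Systems", "operating system"))  # plural vs singular
print(is_similar("Databse exam", "Database exam"))          # spelling error
print(is_similar("Operating Systems", "Linear Algebra"))    # unrelated names
```

A plain word-by-word comparison would reject the first two pairs, which is the limitation the slide points out.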

10 Module plugins
- Two module plugins:
  - FCbRE-S (FCbRE Search plugin)
  - FCbRE-DP (FCbRE Duplicate Prevention plugin)
- Both used for experimental purposes
- Written for vBulletin technology
- Can be adapted for any other forum technology

11 FCbRE-S (FCbRE Search plugin)
- Designed for standard forum searches
- Forwards the requested query to the FCbRE database for similarity comparison
- All similarities are shown in addition to the standard search results

12 FCbRE-DP (Duplicate Prevention plugin)
- Implemented in the section where users can create a topic or forum
- Monitors the field for the name of the new thread or forum
- Notifies the user that a similarity exists
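The behaviour described for the duplicate prevention plugin can be sketched as a check that runs on the name typed into the new-thread field. The actual plugin is written for vBulletin; the Python below only illustrates the flow. The function names, the cutoff value and the warning text are assumptions, and difflib.get_close_matches stands in for the lookup against the FCbRE database.

```python
from difflib import get_close_matches


def find_possible_duplicates(new_title, existing_titles, limit=3, cutoff=0.8):
    """Return up to `limit` existing titles that closely match new_title."""
    # Compare case-insensitively, but report titles in their original form.
    by_key = {title.lower().strip(): title for title in existing_titles}
    keys = get_close_matches(new_title.lower().strip(), list(by_key),
                             n=limit, cutoff=cutoff)
    return [by_key[key] for key in keys]


def warning_for(new_title, existing_titles):
    """Return a notice for the user, or None when nothing similar exists."""
    matches = find_possible_duplicates(new_title, existing_titles)
    if not matches:
        return None
    return "Similar threads already exist: " + "; ".join(matches)


# Invented thread names standing in for rows from the crawler database.
existing = ["Operating Systems exam 2014", "Linear Algebra homework", "Databases lab"]
print(warning_for("operating systems exam 2014", existing))
print(warning_for("Compiler construction", existing))
```

Returning None for the no-match case lets the calling form stay silent unless there is something to show, which fits the slide's description of notifying the user only when a similarity exists.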

13 Results
- 9 web forums from the University of Belgrade, manually gathered
- This group is a mixture from different sources
- The percentage of similar forums is the smallest, while the percentage of similar documents is the highest
- The true percentage of "useful" duplicates should be taken with caution

14 Conclusion
- The proposed solution aggregates information from related forums
- It has the potential to reduce duplication of forums, topics and posts
- The use of plugins would result in higher forum content quality

15 Thank you!
Feel free to contact us and ask any question that you may find interesting.
milos_pavkovic@yahoo.com
jeca@etf.bg.ac.rs