Current work on CitEc. José Manuel Barrueco Cruz, Thomas Krichel


1 Current work on CitEc. José Manuel Barrueco Cruz, Thomas Krichel

2 Data
Papers from the RePEc dataset:
– 31139 working papers
– 15145 journal articles
All of them are available online, but not all are free.
More than 90% of them are in PDF or PostScript format.

3 Harvesting
A Perl script that:
– Reads the RePEc data
– Downloads the full text of each document
– Converts it to ASCII (using pstotext)
– Tries to find a reference section
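The last harvesting step, locating a reference section in the converted ASCII text, could be approximated as below. This is a minimal Python sketch rather than the Perl script the slide describes; the heading words and the function name `find_reference_section` are assumptions made for illustration, not the heuristics actually used.

```python
import re

# Heading words that commonly open a reference list.
# This set is an assumption for illustration and is not exhaustive.
HEADING = re.compile(
    r"^\s*(?:\d+\.?\s*)?(references|bibliography|literature cited)\s*:?\s*$",
    re.IGNORECASE,
)


def find_reference_section(text):
    """Return the text after the last reference-style heading, or None."""
    lines = text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if HEADING.match(line):
            # Keep the last match: the list usually sits near the end.
            start = i
    if start is None:
        return None
    return "\n".join(lines[start + 1:]).strip()
```

A document whose heading was mangled during conversion would return `None` here, which corresponds to the "reference section not found" outcome.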

4 Test on 1000 documents
– 13% are not found at the URL specified
– 3% are not in PDF or PS
– 15% give errors in the pstotext conversion
– 9% are converted, but a reference section cannot be found
– 60% were successfully converted
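As a quick check on these figures, the shares can be turned into document counts. The sketch below assumes the five outcomes are mutually exclusive, which the fact that they sum to 100% suggests; the dictionary labels are paraphrases of the slide text.

```python
# Shares reported for the 1000-document test, in percent.
outcomes = {
    "not found at URL": 13,
    "not PDF or PS": 3,
    "pstotext error": 15,
    "no reference section found": 9,
    "successfully converted": 60,
}
SAMPLE_SIZE = 1000

# Convert each percentage into a document count.
counts = {name: SAMPLE_SIZE * pct // 100 for name, pct in outcomes.items()}

# The outcomes account for the whole sample.
total_pct = sum(outcomes.values())
failed = SAMPLE_SIZE - counts["successfully converted"]
```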

5 Parsing problems of CiteSeer
– Publication date: when a reference contains more than one year, it is discarded.
– Source of publication: working paper series and journal titles are not parsed by CiteSeer. We will need to add code with a list of all journals and working paper series.
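Both issues can be illustrated with a small Python sketch. It is not CiteSeer's code: `publication_year` mimics the described rule by returning no date when more than one year appears, and `find_source` shows the kind of list lookup proposed above. The two entries in `KNOWN_SOURCES` are sample values only, and all names here are hypothetical.

```python
import re

# Four-digit years from 1900 to 2099.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

# Sample entries only; a real list would cover all journals and series.
KNOWN_SOURCES = ["Journal of Econometrics", "NBER Working Papers"]


def publication_year(reference):
    """Return the year if exactly one year appears in the reference, else None."""
    years = YEAR.findall(reference)
    return years[0] if len(years) == 1 else None


def find_source(reference):
    """Return the first known journal or series named in the reference, else None."""
    lowered = reference.lower()
    for name in KNOWN_SOURCES:
        if name.lower() in lowered:
            return name
    return None
```

A reference whose title mentions a second year, such as "Growth since 1950", loses its date under this rule even though the publication year is present, which is the problem the slide points out.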

6 To do
– Study of citation patterns
– Use of data in user services
– Use of data in logging and registration services

7 Thank you for your attention. Contact José Manuel Barrueco Cruz for more information

