Current work on CitEc
José Manuel Barrueco Cruz, Thomas Krichel
Data
Papers from the RePEc dataset
– 31139 Working Papers
– 15145 Journal Articles
All of them are available online, but not all are free.
More than 90% of them are in PDF or PostScript format.
Harvesting
Perl script that:
– Reads the RePEc data
– Downloads the full text of the documents
– Converts them to ASCII (using pstotext)
– Tries to find a reference section
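The last harvesting step, locating a reference section in the converted ASCII text, can be sketched as follows. This is a minimal illustration in Python rather than the Perl of the actual script, and the heading patterns and the name `find_reference_section` are assumptions, since the slides do not say which rules the script applies.

```python
import re

# Hypothetical heading patterns; the rules used by the real script are not given.
HEADING = re.compile(
    r"^\s*(?:\d+\.?\s+)?(references|bibliography|literature cited)\s*:?\s*$",
    re.IGNORECASE,
)


def find_reference_section(text):
    """Return the text after the last reference-style heading, or None."""
    lines = text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if HEADING.match(line):
            # Keep the last match: a table of contents may list the heading earlier.
            start = i
    if start is None:
        return None
    return "\n".join(lines[start + 1:]).strip()


# Made-up sample text standing in for pstotext output.
sample = (
    "1 Introduction\n"
    "Some text.\n"
    "\n"
    "References\n"
    "Smith, J. (1999). A paper. Journal of Things.\n"
)
print(find_reference_section(sample))
```

A document with no matching heading returns `None`, which corresponds to the case where the text is converted but no reference section can be found.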
Test on 1000 documents
– 13% are not found at the URL specified
– 3% are not in PDF or PS
– 15% give errors in the pstotext conversion
– 9% are converted, but a reference section cannot be found
– 60% were successfully converted
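As a quick check of these figures, the sketch below converts the percentages into document counts. It also derives one number that is not on the slide: the share of successes among documents that reached pstotext at all. It assumes the five outcomes are mutually exclusive, which their sum of 100% suggests.

```python
# Percentages as reported for the 1000-document test.
outcomes = {
    "not found at the specified URL": 13,
    "not in PDF or PS": 3,
    "pstotext conversion error": 15,
    "converted, no reference section found": 9,
    "successfully converted": 60,
}
total_docs = 1000

# The categories cover the whole sample.
assert sum(outcomes.values()) == 100

# Absolute number of documents per outcome.
counts = {name: total_docs * pct // 100 for name, pct in outcomes.items()}

# Derived figure (not on the slide): success rate among documents that were
# downloaded and were in PDF or PS, i.e. those handed to pstotext.
reached_pstotext = 100 - outcomes["not found at the specified URL"] - outcomes["not in PDF or PS"]
success_rate = outcomes["successfully converted"] / reached_pstotext

for name, n in counts.items():
    print(f"{n:4d}  {name}")
print(f"success among documents reaching pstotext: {success_rate:.1%}")
```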
Parsing problems of CiteSeer
– Publication date: when a reference contains more than one year, it is discarded.
– Source of publication, i.e. working paper series or journal titles, is not parsed by CiteSeer. We will need to add code with a list of all journals and working paper series.
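Both problems can be illustrated with a small sketch. It is not CiteSeer's code: the year pattern, the function names and the three-entry `KNOWN_SOURCES` list are assumptions for illustration, and the sample reference is made up. A real list would be built from the RePEc journal and working paper series data.

```python
import re

# Four-digit years from 1900 to 2099; an assumed pattern, not CiteSeer's own.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

# Hypothetical sample; the real list would cover all journals and series.
KNOWN_SOURCES = [
    "Journal of Economic Theory",
    "NBER Working Papers",
    "Econometrica",
]


def years_in(reference):
    """Return every year-like token in the reference string, in order."""
    return YEAR.findall(reference)


def find_source(reference, sources=KNOWN_SOURCES):
    """Return the longest known source title contained in the reference."""
    lowered = reference.lower()
    matches = [s for s in sources if s.lower() in lowered]
    return max(matches, key=len) if matches else None


# Made-up reference whose title contains a second year.
ref = "Smith, J. (1999). Growth since 1945. Journal of Economic Theory 85, 1-20."
print(years_in(ref))
print(find_source(ref))
```

A reference like the sample yields two candidate years, which is the case described above. Matching against a list of known titles is one simple way to recover the source of publication; preferring the longest match avoids picking a short title that happens to be part of a longer one.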
To do
– Study of citation patterns
– Use of data in user services
– Use of data in logging and registration services
Thank you for your attention. Contact José Manuel Barrueco Cruz for more information