DSpace Statistics Graham Triggs Head of Repository Systems, Symplectic.


1 DSpace Statistics Graham Triggs Head of Repository Systems, Symplectic

2 A Brief History

3 Statistics in DSpace 1.0

4 Statistics in DSpace 1.1 This slide is left intentionally blank

5 Statistics in DSpace 1.2 If I’m honest, this is just padding

6 Statistics in DSpace 1.3

7 Classic Statistics
- Shows items archived, views, searches
- Parses dspace.log
- Renders flat HTML files
- Uses two scripts which must be scheduled
- Reports can be public, or admin only

8 Classic Statistics – Config
- All configuration in [dspace]/config/dstat.cfg (overview and search exclusions)
- Displays:
  - Overview
  - Archive breakdown (item types)
  - Items viewed
  - Actions (Deletion, Update, Create, etc.)
  - Logins
  - Searches (keywords)
- Action names in [dspace]/config/dstat.map

9 Classic Statistics – Issues
- dspace.log is primarily for debugging
- May not log all information required
- May log lots of unnecessary information
- Size of log files
- 1 log line does not equal a single access
- No filtering of spiders, robots, etc.
- Log parsing may take some time
- Slow to update stats
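
To illustrate why one log line does not equal a single access, here is a minimal Python sketch that counts actions from log lines. The line layout, the class names and the sample lines are assumptions made up for illustration, not copied from a real dspace.log; the actual classic statistics scripts are driven by dstat.cfg and dstat.map.

```python
import re
from collections import Counter

# Assumed, simplified line layout (a real dspace.log may differ):
# <date> <time> <LEVEL> <class> @ <user>:session_id=<id>:ip_addr=<ip>:<action>:<detail>
LINE_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<level>\w+)\s+(?P<cls>\S+) @ "
    r"(?P<user>[^:]*):session_id=(?P<session>[^:]*):ip_addr=(?P<ip>[^:]*):"
    r"(?P<action>[^:]+):(?P<detail>.*)$"
)


def parse_events(lines):
    """Yield one dict per line that matches the assumed layout; skip the rest."""
    for line in lines:
        match = LINE_RE.match(line)
        if match is not None:
            yield match.groupdict()


def count_actions(lines):
    """Naive count: every matching log line counts as one action."""
    return Counter(event["action"] for event in parse_events(lines))


def count_distinct_views(lines):
    """Count item views once per (session, item) pair instead of once per line."""
    seen = {
        (event["session"], event["detail"])
        for event in parse_events(lines)
        if event["action"] == "view_item"
    }
    return len(seen)


# Hypothetical sample: the same session logs the same item view twice,
# and one line is debugging noise that carries no usage information.
SAMPLE = [
    "2011-05-01 10:00:00,001 INFO  org.example.ItemView @ anonymous:session_id=A1:ip_addr=192.0.2.1:view_item:handle=123/1",
    "2011-05-01 10:00:00,950 INFO  org.example.ItemView @ anonymous:session_id=A1:ip_addr=192.0.2.1:view_item:handle=123/1",
    "2011-05-01 10:00:02,100 DEBUG org.example.Cache - cache refreshed",
    "2011-05-01 10:00:05,000 INFO  org.example.Search @ anonymous:session_id=B2:ip_addr=192.0.2.2:search:query=solr",
]

if __name__ == "__main__":
    print(count_actions(SAMPLE))
    print(count_distinct_views(SAMPLE))
```

The naive count reports two item views for what is arguably one access, and nothing in the sketch distinguishes a spider from a person, which mirrors the issues listed on the slide.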

10 Fast Forward: DSpace 1.6

11 Solr Statistics
- Available for JSP and XML UIs
- Event logger writes to Apache Solr
- Filters spiders by IP address
- Reports are searches of usage data
- Reports can be public, or admin only
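
Filtering spiders by IP address can be sketched as a membership test against a list of known spider addresses. This is a minimal Python illustration, not DSpace's implementation: the list format (one address or CIDR range per line, `#` for comments) and the function names are assumptions, and the addresses are documentation-only ranges.

```python
import ipaddress


def load_spider_networks(entries):
    """Parse addresses or CIDR ranges, ignoring blank lines and '#' comments."""
    networks = []
    for entry in entries:
        entry = entry.strip()
        if not entry or entry.startswith("#"):
            continue
        # A single address becomes a one-host network (/32 or /128).
        networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_spider(ip, networks):
    """Return True if the client address falls inside any known spider range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


if __name__ == "__main__":
    spiders = load_spider_networks(["# example list", "192.0.2.0/24", "198.51.100.7"])
    for client in ("192.0.2.55", "203.0.113.9"):
        print(client, is_spider(client, spiders))
```

A check like this only catches spiders whose addresses are already on the list, which is why the index also carries a separate robot flag.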

12 Solr Stats - What is Indexed
- Time
- Type (item, bitstream, etc), Id
- Owning Community, Owning Collection, Owning Item
- IP, Continent, Country, City, Longitude / Latitude
- Eperson Id, User Agent
- Flag to indicate Robot / Spider
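
One way to picture a single indexed usage event is as a flat record with the fields listed above. The Python sketch below is illustrative only: apart from `epersonid` and `isBot`, which appear elsewhere in this presentation, the document key names, the numeric type code and the sample values are assumptions, so check them against the schema of your own statistics core.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class UsageEvent:
    """One usage event, mirroring the fields listed on the slide."""
    time: datetime
    type: int                      # numeric object type; the codes are assumed
    id: int
    ip: str
    user_agent: str
    is_bot: bool
    owning_community: Optional[int] = None
    owning_collection: Optional[int] = None
    owning_item: Optional[int] = None
    continent: Optional[str] = None
    country: Optional[str] = None
    city: Optional[str] = None
    longitude: Optional[float] = None
    latitude: Optional[float] = None
    eperson_id: Optional[int] = None

    def to_doc(self):
        """Build a flat dict for indexing, dropping fields that are unknown."""
        doc = {
            "time": self.time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "type": self.type,
            "id": self.id,
            "owningComm": self.owning_community,    # assumed key name
            "owningColl": self.owning_collection,   # assumed key name
            "owningItem": self.owning_item,         # assumed key name
            "ip": self.ip,
            "continent": self.continent,
            "countryCode": self.country,            # assumed key name
            "city": self.city,
            "longitude": self.longitude,
            "latitude": self.latitude,
            "epersonid": self.eperson_id,
            "userAgent": self.user_agent,           # assumed key name
            "isBot": self.is_bot,
        }
        return {key: value for key, value in doc.items() if value is not None}


if __name__ == "__main__":
    event = UsageEvent(
        time=datetime(2011, 5, 1, 10, 0, 0, tzinfo=timezone.utc),
        type=2, id=123, ip="192.0.2.55",
        user_agent="ExampleBrowser/1.0", is_bot=False,
        owning_collection=7, country="GB",
    )
    print(event.to_doc())
```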

13 Solr Stats – Home
- Top 10 items

14 Solr Stats – Community
- Total visits
- Visits last 7 months
- Top 10 Countries
- Top 10 Cities

15 Solr Stats – Collection
- Total visits
- Visits last 7 months
- Top 10 Countries
- Top 10 Cities

16 Solr Stats – Item
- Total visits
- Total file views
- Visits last 7 months
- Top 10 Countries
- Top 10 Cities

17 Solr Stats – Config (v1.6)
- [dspace]/config/dspace.cfg
  - solr.log.server: Location of Solr server / application
  - solr.dbfile: Location of Geo database
  - solr.spiderips.url: URLs to download IP addresses of search spiders
  - useProxies: Client identification when hosted behind proxy
  - solr.query.filter.spiderIp: Filter out spider IP addresses in query
  - solr.query.filter.isBot: Filter out ‘isBot’ field in query
  - statistics.item.authorization.admin: Set to ‘true’ to restrict to admins, ‘false’ for public access

18 Solr Stats – Config (v1.7)
- [dspace]/config/dspace.cfg
  - solr.log.server
  - solr.dbfile
  - solr.spiderips.url
  - useProxies
  - solr.query.filter.spiderIp
  - solr.query.filter.isBot
  - statistics.item.authorization.admin
  - solr.resolver.timeout: Timeout for the DNS resolver (lower for fewer connections)
  - solr.statistics.logBots: Disable logging of events by spider IP addresses

19 Solr Stats – Config (v1.8)
- [dspace]/config/modules/solr-statistics.cfg
  - server
  - spiderips.urls
  - dbfile
  - resolver.timeout
  - useProxies
  - logBots
  - query.filter.spiderIp
  - query.filter.isBot
  - authorization.admin
  - query.filter.bundles: Bundles for which to display file stats (requires 1.8 index)

20 Solr Stats – Improvements
- DSpace v1.8
  - Displayed file bundle
    - Configurable, defaults to ORIGINAL bundle
    - [dspace]/bin/dspace stats-util -b -r
- DSpace v1.7
  - Solr optimization
    - [dspace]/bin/dspace stats-util -o
  - Autocommit
    - Defaults to 15 minute intervals
    - Configurable in [dspace]/solr/statistics/solrconfig.xml
    - maxTime property

21 Solr Stats – Upgrade from Classic
- Scripts parse dspace.log files to Solr entries
  - [dspace]/bin/dspace stats-log-converter
  - [dspace]/bin/dspace stats-log-importer
    -i  Input file
    -m  Adds a wildcard to the input (i.e. dspace.log*)
    -s  Skip reverse DNS lookup (can be slow)
    -v  Verbose output

22 Solr Stats – Custom Queries
- You can expand the reports by querying the Solr index directly
- Example: Top downloads for a user – query on epersonid facet
- [Example output from the slide; surviving facet counts: 1167, 251, 42, 36, 20, 18, 9, 0]
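
The query shown on the original slide did not survive in this transcript, so the following Python sketch is a reconstruction under stated assumptions rather than the author's example. It builds a standard Solr facet request for one user's most-downloaded files and unpacks the facet counts. The base URL, the `type:0` filter for bitstreams, faceting on the `id` field and the sample response are all assumptions; only `epersonid` and `isBot` are named in the presentation.

```python
from urllib.parse import urlencode


def top_downloads_url(solr_base, eperson_id, limit=10):
    """Build a facet query URL: one user's events, counted per object id."""
    params = [
        ("q", f"epersonid:{eperson_id}"),
        ("fq", "type:0"),        # assumed: type 0 means bitstream (file download)
        ("fq", "-isBot:true"),   # leave out events flagged as robots
        ("rows", "0"),           # only the facet counts are needed, not the events
        ("facet", "true"),
        ("facet.field", "id"),   # assumed: facet on the id of the viewed object
        ("facet.limit", str(limit)),
        ("facet.mincount", "1"),
        ("wt", "json"),
    ]
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


def parse_facet_counts(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Hypothetical server location; use the value configured for your statistics core.
    print(top_downloads_url("http://localhost:8080/solr/statistics", 42))

    # Hypothetical response body, already decoded from JSON.
    sample = {"facet_counts": {"facet_fields": {"id": ["101", 1167, "57", 251, "9", 42]}}}
    for object_id, downloads in parse_facet_counts(sample, "id"):
        print(object_id, downloads)
```

Setting `rows` to 0 keeps the response small, since the report only needs the counts per file and not the individual usage events.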

23 Solr Stats - Maintenance
- [dspace]/bin/dspace stats-util -h

  usage: StatisticsClient
   -b,--reindex-bitstreams          Reindex the bitstreams to ensure we have the bundle name
   -r,--remove-deleted-bitstreams   While indexing the bundle names remove the statistics about deleted bitstreams
   -u,--update-spider-files         Update Spider IP Files from internet into /dspace/config/spiders
   -f,--delete-spiders-by-flag      Delete Spiders in Solr By isBot Flag
   -i,--delete-spiders-by-ip        Delete Spiders in Solr By IP Address
   -m,--mark-spiders                Update isBot Flag in Solr
   -h,--help                        help
   -o,--optimize                    Run maintenance on the SOLR index

24 Solr Stats - Issues
- Privacy laws – IP addresses not anonymized
- Performance issues / resource usage
- Maintenance of Solr
- Usage when Solr is unavailable
- Usage tracking during periods of high usage

25 Summary
- Classic Statistics
  - Possibly slow to analyse, fast to display
  - Delay in updating
  - Very imperfect
- Solr Statistics
  - Updates ‘real time’
  - Can be slow to render as dataset grows
  - Improved in each release
  - Less imperfect
