METRIDOC: A Framework for Managing and Exposing Library Event Data With the support of University of Pennsylvania Libraries
METRIDOC University of Pennsylvania Libraries
Metrics start with a basic abstraction: The Event
The Event as raw data
Viewing an Ejournal article.
xxx.xx.xxx.xxx|-|zucca|[26/Jul/2007:15:41: ]| GET tp:// t=psycinfo&adv=1 HTTP/1.1| 302|0| Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/ (KHTML, like Gecko) Safari/419.3| NGpmb6dT6JXswQH|__utmc= ;ezproxy=NGpmb6dT6JXswQH; hp=/; proxySessionID= ; __utmc= ; __utmz= utmccn=(direct)|utmcsr=(direct)|utmcmd=(none);UPennLibrary=AAAAAUaWP5oAACa4AwOOAg==; sfx_session_id=s6A37A3E0-3B8E-11DC-80E985076F88F67F
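The raw record above appears to be a pipe-delimited proxy log line. As a minimal sketch, assuming that layout, the following Python splits such a line into named fields. The field names and their order are assumptions inferred from the sample, not a documented schema, and `SAMPLE` is a made-up line used only for illustration.

```python
from datetime import datetime

# Assumed field order, inferred from the sample record (not a documented schema).
FIELDS = ["ip", "ident", "user", "timestamp", "request",
          "status", "bytes", "user_agent", "session", "cookies"]


def parse_event_line(line):
    """Split one pipe-delimited log line into a dict of named fields."""
    # The cookie field can itself contain '|', so split on the first 9 pipes only.
    parts = [p.strip() for p in line.split("|", len(FIELDS) - 1)]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    event = dict(zip(FIELDS, parts))
    # Convert "[26/Jul/2007:15:41:07]" to a datetime (no time zone in the sample).
    event["timestamp"] = datetime.strptime(event["timestamp"], "[%d/%b/%Y:%H:%M:%S]")
    event["status"] = int(event["status"])
    event["bytes"] = int(event["bytes"])
    # The HTTP method is the first token of the request string.
    event["method"] = event["request"].partition(" ")[0]
    return event


# Hypothetical line in the same layout as the slide's example.
SAMPLE = ("192.0.2.10|-|jdoe|[26/Jul/2007:15:41:07]|"
          "GET http://example.org/search?db=psycinfo HTTP/1.1|"
          "302|0|Mozilla/5.0|abc123|a=1|b=2")

event = parse_event_line(SAMPLE)
print(event["user"], event["status"], event["timestamp"].isoformat())
```

Limiting the split keeps the trailing cookie string intact even though it contains the delimiter.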
An Event Abstracted
An EVENT is described by four groups of parameters:
User & Program Parameters: College | Dept; Rank; Course; Host College; Host Dept; Instructor; Grant Sponsor
Library Parameters: Service; Genre; Cognizant Staff; Organizational Unit; Budget Center
Environmental Parameters: Date | Time; Location; IP; Domain; URL
Bibliographic Parameters: Title; URI; Format; Cost | Supplier
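One way to picture the abstracted Event is as a record with four nested parameter groups. The sketch below uses Python dataclasses. The class and field names are illustrative assumptions, as is the assignment of Date/Time, Location, IP, Domain, and URL to the environmental group and Title, URI, Format, Cost, and Supplier to the bibliographic group.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import Optional


@dataclass
class UserProgram:
    college: Optional[str] = None
    dept: Optional[str] = None
    rank: Optional[str] = None
    course: Optional[str] = None
    host_college: Optional[str] = None
    host_dept: Optional[str] = None
    instructor: Optional[str] = None
    grant_sponsor: Optional[str] = None


@dataclass
class Library:
    service: Optional[str] = None
    genre: Optional[str] = None
    cognizant_staff: Optional[str] = None
    org_unit: Optional[str] = None
    budget_center: Optional[str] = None


@dataclass
class Environment:
    timestamp: Optional[datetime] = None
    location: Optional[str] = None
    ip: Optional[str] = None
    domain: Optional[str] = None
    url: Optional[str] = None


@dataclass
class Bibliographic:
    title: Optional[str] = None
    uri: Optional[str] = None
    format: Optional[str] = None
    cost: Optional[float] = None
    supplier: Optional[str] = None


@dataclass
class Event:
    # Each group defaults to an empty instance so partial events are valid.
    user_program: UserProgram = field(default_factory=UserProgram)
    library: Library = field(default_factory=Library)
    environment: Environment = field(default_factory=Environment)
    bibliographic: Bibliographic = field(default_factory=Bibliographic)


# Hypothetical event: a faculty member viewing an e-journal article via a proxy.
event = Event(
    user_program=UserProgram(rank="faculty"),
    library=Library(service="proxy"),
    environment=Environment(timestamp=datetime(2007, 7, 26, 15, 41)),
    bibliographic=Bibliographic(format="ejournal article"),
)
print(asdict(event)["library"]["service"])
```

Every field is optional because a given source system will usually supply only some of the parameters.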
The “Event” is represented in machine-readable data, stored in a plethora of business systems.
Source / Target:
  - Link resolver
  - Proxy server
  - COUNTER
  - ILS (Voyager, I3, Kuali-OLE)
  - Resource sharing system
  - Web server
  - Social networking services
  - Spreadsheets, databases
  - Other targets…
Event Types:
  - E-Resource Use by service, demographic, package
  - Expenditures & Inventory planning / reader interest data
  - Supply chain data
  - Discovery systems & content use
  - Research & instructional data / learning management
  - Other events…
MetriDoc is a framework for:
  - Extracting event data from systems
  - Transforming those data into readable, normalized formats
  - Loading the transformed/normalized payload into a repository
  - Supporting analysis through local and collaborative dissemination channels
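The extract, transform, and load steps listed above can be illustrated with a small sketch using only the Python standard library. This is not MetriDoc's actual API. The function names, the CSV input, the column names, and the `events` table are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def extract(source):
    """Extract: read raw rows from a CSV-like file object."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: trim whitespace, lowercase codes, keep only the date part."""
    return [
        {
            "patron": r["patron"].strip().lower(),
            "service": r["service"].strip().lower(),
            "day": r["date"].strip()[:10],
        }
        for r in rows
    ]


def load(conn, rows):
    """Load: write the normalized rows into a repository table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (patron TEXT, service TEXT, day TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (:patron, :service, :day)", rows)
    conn.commit()


def use_by_service(conn):
    """Analysis: count events per service."""
    return conn.execute(
        "SELECT service, COUNT(*) FROM events GROUP BY service ORDER BY service"
    ).fetchall()


# Made-up input standing in for a source system export.
RAW = (
    "patron,service,date\n"
    " Asmith ,EZProxy,2007-07-26T15:41:00\n"
    "jdoe,ezproxy,2007-07-26T16:02:00\n"
    "jdoe,SFX,2007-07-27T09:00:00\n"
)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(io.StringIO(RAW))))
print(use_by_service(conn))
```

Keeping each step a separate function means a new source only needs a new `extract`, while the repository and analysis code stay unchanged.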
Improved Data Resolution Through Integration
  - Increased scope of sources
  - Synthesis of vectors, e.g.
      - Expenditure per use
      - Resource use by communities
  - Contextualized data with greater statistical dimension and descriptive power
  - Collaborative assessment
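"Expenditure per use" is an example of a metric that only exists once two sources are integrated: spending data from one system and usage counts from another. A minimal sketch, with hypothetical package names and figures:

```python
def expenditure_per_use(expenditures, uses):
    """Combine cost data and use counts, keyed by package, into cost per use.

    Packages with no recorded use map to None rather than dividing by zero.
    """
    result = {}
    for package, cost in expenditures.items():
        count = uses.get(package, 0)
        result[package] = round(cost / count, 2) if count else None
    return result


# Hypothetical figures: annual cost from an acquisitions system,
# use counts from proxy or COUNTER data.
expenditures = {"Package A": 1000.0, "Package B": 500.0}
uses = {"Package A": 400}

print(expenditure_per_use(expenditures, uses))
```

Returning `None` for unused packages keeps them visible in a report instead of silently dropping them.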
Our legacy system: Datafarm
[Diagram labels: Perl cron (x3); Voyager; Farmer; Quaker; App Logs]
Datafarm Shortcomings
Maintainability issues
  - Scripts that depend on each other are located in different places
  - Perl is very productive as long as you are maintaining your own code
  - Doing the same thing over again, no code reuse
  - Lack of notification for success and failure
Not shareable
  - No safe way to expose data for collaboration
  - Generating data for a report can be a job in itself
  - Schemas are not stored in a shareable format
Not reusable
  - Doing the same thing over and over again without building libraries for common tasks
  - No central code repository to share libraries within and outside of UPenn
What we need                               Who takes care of it
-----------------------------------------  -------------------------------------------------
A central scheduler                        Jenkins
Notifications of job success or failure    Jenkins
Batch job / ETL scripting framework        Metridoc
Exposing data                              Metridoc – Google data format
Reporting / Graphs                         Google Charts / R / Tableau / Other Stat Packages
Central Code Repository                    Maven Central via Sonatype Hosting
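The "Google data format" mentioned above is not specified further. Assuming it refers to the DataTable JSON literal that Google Charts accepts (a `cols` list describing columns and a `rows` list of cells), a sketch of serializing query results into that shape might look like this. The function name and the sample data are hypothetical.

```python
import json


def to_datatable(columns, rows):
    """Build a Google Charts DataTable-style literal.

    columns: list of (id, label, type) tuples, e.g. ("uses", "Uses", "number")
    rows:    list of row tuples, one value per column
    """
    return {
        "cols": [
            {"id": col_id, "label": label, "type": col_type}
            for col_id, label, col_type in columns
        ],
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    }


# Made-up result set: uses per service.
columns = [("service", "Service", "string"), ("uses", "Uses", "number")]
rows = [("ezproxy", 2), ("sfx", 1)]

table = to_datatable(columns, rows)
print(json.dumps(table, indent=2))
```

A charting page could then pass the resulting JSON to a Google Charts data table without further reshaping.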
Current System: Metridoc
[Diagram labels: Perl cron (x3); Voyager; Farmer; Quaker; App Logs]
Metridoc Philosophy
Scripting Framework
Scripting Example
Scripting Example
Exposing data
Metrics on the cheap (google charts)
Thoughts on complex statistics
The future
Abstracts 4 key functions, exposes interfaces for interoperability
1. Extract: Target Source (e.g. Relais, Illiad, ILS); Ingest Log; Parse; Format; Refined output
2. Transform: Resolution Sources (e.g. IdM, WorldCat); Resolve Codes & IDs; Normalize; Refined output
3. Load: Query Srvc; Data Repo
4. Query: User Interface; Local Data Stores; Query Document; Results Document
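The Transform function above names two sub-steps, "Resolve Codes & IDs" and "Normalize". The sketch below illustrates the idea with a plain dictionary standing in for a resolution source such as an identity management system. The record layout, field names, and lookup data are assumptions, not MetriDoc's actual interfaces.

```python
def resolve_and_normalize(record, directory, unknown="unknown"):
    """Resolve a user ID against a lookup table and normalize code fields.

    record:    raw extracted row with 'user_id' and 'request_type' keys
    directory: mapping of user ID -> attributes (stand-in for a resolution source)
    """
    user_id = record["user_id"].strip().lower()
    person = directory.get(user_id, {})
    return {
        "user_id": user_id,
        # Resolved attributes fall back to a placeholder when the ID is not found.
        "department": person.get("department", unknown),
        "rank": person.get("rank", unknown),
        # Normalize free-form codes to a single canonical spelling.
        "request_type": record["request_type"].strip().upper(),
    }


# Hypothetical resolution data and raw records.
directory = {"jdoe": {"department": "History", "rank": "faculty"}}
raw_records = [
    {"user_id": " JDoe ", "request_type": "loan"},
    {"user_id": "nobody", "request_type": " Copy"},
]

refined = [resolve_and_normalize(r, directory) for r in raw_records]
for row in refined:
    print(row)
```

Unresolved IDs are kept with placeholder values so that the event still counts toward totals in later queries.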
Partners are welcome
Sponsor
More at