BRANDON SHERMAN BETH SPINDLER Predicting Book Acquisitions for a Public Library: Will It Circulate?

Background
- Libraries receive limited public funding
- State funding depends on turnover
- Turnover = number of items circulated / total number of items
- Circulation rate = number of times a book is checked out per year
- The higher a library's turnover, the more public funding it receives
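
The two definitions above can be sketched as small functions. This is a minimal illustration, assuming "items circulated" means total checkouts over the reporting period; the function names and the sample numbers are hypothetical and are not the library's data.

```python
def turnover(total_circulations: int, total_items: int) -> float:
    """Collection turnover: circulations divided by items held."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return total_circulations / total_items


def circulation_rate(checkouts: int, years_in_collection: float) -> float:
    """Per-book circulation rate: checkouts per year in the collection."""
    if years_in_collection <= 0:
        raise ValueError("years_in_collection must be positive")
    return checkouts / years_in_collection


# Hypothetical numbers, for illustration only.
example_turnover = turnover(total_circulations=6520, total_items=2000)
example_rate = circulation_rate(checkouts=14, years_in_collection=5.0)
```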

Question: We have a new book that the library is considering adding to its collection. Will it circulate?

Obtaining the Data
- Examined adult books from the Cooper-Siegel Community Library
- Moderately sized public library
- Serves a population of ~28,000 (county average is ~27,000)
- Relatively high turnover: 3.26 (county average is 2.17, state average is 2.1)

Preparing the Data
We eliminated the following:
- Children's books
- Video
- Audio
- Lost, missing, withdrawn, and billed books
- Books in processing
- Reference books
- Bestsellers
- Books circulating less than a year and a half

Preparing the Data
- Removed 3,149 total books from a dataset of 28,110 books ( material in our original data set!)
- Final dataset had 24,961 books
- Turnover of ~1.45 in 2012
- Kept duplicate titles because different copies of the same book can be radically different
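
The exclusion rules can be written as a single filter. The sketch below is only an illustration: the `Item` fields, the category and status labels, and the sample records are all hypothetical, since the slides do not show how the catalog export was structured. The last line checks the counts reported on the slide.

```python
from dataclasses import dataclass

# Hypothetical labels; the real catalog export may encode these differently.
EXCLUDED_CATEGORIES = {"children", "video", "audio", "reference", "bestseller"}
EXCLUDED_STATUSES = {"lost", "missing", "withdrawn", "billed", "in_processing"}
MIN_YEARS_CIRCULATING = 1.5


@dataclass
class Item:
    title: str
    category: str
    status: str
    years_circulating: float


def keep(item: Item) -> bool:
    """Return True if the item survives every exclusion rule."""
    return (
        item.category not in EXCLUDED_CATEGORIES
        and item.status not in EXCLUDED_STATUSES
        and item.years_circulating >= MIN_YEARS_CIRCULATING
    )


# Made-up records, one per kind of outcome.
sample = [
    Item("Hypothetical Novel", "adult_fiction", "available", 4.0),
    Item("Hypothetical Picture Book", "children", "available", 6.0),
    Item("Hypothetical Lost Title", "adult_fiction", "lost", 3.0),
    Item("Hypothetical New Arrival", "adult_nonfiction", "available", 0.5),
]
kept = [item.title for item in sample if keep(item)]

# Counts reported on the slide: original size minus removals.
remaining = 28_110 - 3_149
```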

Building the Model - Data Considered
- We split the final data set into fiction and nonfiction
  - 10,751 fiction
  - 14,210 nonfiction
- Fiction and nonfiction have different circulation rates and different rationales for adding to the library
- Seems most logical to use different models for each
- Focused on fiction

Building the Model – Fiction vs. Nonfiction
[Figure panels: Fiction, Nonfiction]

Fiction vs. Nonfiction Circulation Rates
[Figure panels: Fiction, Nonfiction]

Building the Models – The Target Variable
- Prediction target was "Average circulators"
- Fiction: lifetime circulation rate ≥ 2.8 checkouts/year
  - Top ~35% (34.8%)
- Nonfiction: lifetime circulation rate ≥ 1.3 checkouts/year
  - Top ~35% (36%)
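
Labeling the target then comes down to a threshold check per collection. A minimal sketch, using the two thresholds from the slide; the function and dictionary names are hypothetical.

```python
# Thresholds from the slide, in checkouts per year over the book's lifetime.
THRESHOLDS = {"fiction": 2.8, "nonfiction": 1.3}


def is_average_circulator(collection: str, lifetime_rate: float) -> bool:
    """Return True if the book meets the threshold for its collection."""
    return lifetime_rate >= THRESHOLDS[collection]
```

For example, a fiction title with a lifetime rate of 2.0 would be labeled a non-circulator, while a nonfiction title with the same rate would be labeled a circulator.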

Building the Models - Variables
- Variables pulled from bibliographic records in library software
- Some variables required processing to calculate or extract
  - Particularly information from Library of Congress Subject Headings
- Allowed model to choose variables to include

Building the Model - Variables

Choosing the Models
- Decided to opt for "usefulness" over "accuracy"
- Were able to achieve accuracy over 85% for most models
- But they were accurate only because they rejected almost everything, so no real "decision" was made at each node
- We manually removed some variables that dominated the model but led to less useful results:
  - Number of other libraries that own the book
  - Years since published
  - Price
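
The gap between "accurate" and "useful" is easy to reproduce with a toy example. In the sketch below the class balance (15% circulators) is chosen only for illustration and is not taken from the library's data: a model that rejects every book still scores 85% accuracy while finding none of the books that circulate.

```python
def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: list, y_pred: list) -> float:
    """Fraction of true circulators that the model identifies."""
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    return hits / positives if positives else 0.0


# Hypothetical data: 150 circulators and 850 non-circulators.
y_true = [True] * 150 + [False] * 850
# A "model" that rejects every book.
y_pred = [False] * 1000

reject_all_accuracy = accuracy(y_true, y_pred)
reject_all_recall = recall(y_true, y_pred)
```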

Models Considered - Fiction

Model                 Training   Test
C5                    74.87%     71.68%
CHAID                 74.79%     71.45%
C&R                   72.45%     71.22%
QUEST                 72.45%     71.21%
Logistic regression   70.3%      68.9%
k-NN                  69.27%     61.97%
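
One way to read these results is to compare test accuracy alongside the gap between training and test accuracy, since a large gap can indicate overfitting. The sketch below uses the figures from the slide; the variable names are hypothetical.

```python
# (training accuracy %, test accuracy %) as reported on the slide.
results = {
    "C5": (74.87, 71.68),
    "CHAID": (74.79, 71.45),
    "C&R": (72.45, 71.22),
    "QUEST": (72.45, 71.21),
    "Logistic regression": (70.3, 68.9),
    "k-NN": (69.27, 61.97),
}

# Training-minus-test gap in percentage points.
gaps = {name: round(train - test, 2) for name, (train, test) in results.items()}

best_test = max(results, key=lambda name: results[name][1])
largest_gap = max(gaps, key=gaps.get)
```

Here `best_test` is C5 and `largest_gap` is k-NN, whose test accuracy falls about 7.3 points below its training accuracy.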

Chosen Model - CHAID
- C5 had the best predictive rate, but the model is proprietary and licenses are expensive for a library
- We settled on a CHAID decision tree model
  - Within the top 2 prediction rates
  - 71.45% on test set
  - Conservative
  - Algorithm publicly available
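
CHAID chooses splits using chi-square tests of independence between a candidate predictor and the target. The sketch below shows only that statistic, not the full algorithm (which also merges categories and adjusts p-values), and the two contingency tables are made-up counts, not results from the library's data.

```python
def chi_square(table: list) -> float:
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts. Rows: predictor yes/no. Columns: circulates yes/no.
mystery_split = [[60, 40], [30, 70]]
illustrated_split = [[50, 50], [48, 52]]

mystery_stat = chi_square(mystery_split)
illustrated_stat = chi_square(illustrated_split)
```

With these made-up counts the mystery split has a far larger statistic, so a chi-square-based procedure would prefer it as the split variable.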

Model Output – Variable Predictors
- Height of book
- Suspense genre
- Large print or regular
- Mystery genre
- Hardcover or paperback
- Women subjects
- Psychological genre
- Family relationship subjects
- Number of pages
- North America subjects
- Romance genre
- Humor subjects
- Western European subjects
- Political subjects
- Music subjects
- British Isles subjects
- Friendship subjects
- Middle Eastern subjects
- Children subjects
- Horror subjects
- Central Asia subjects
- Whether illustrated

Model Output - Rules
- Seems unwieldy to read, but is not actually difficult to use by hand
- Follow one branch at a time

Trying the Model on Sample Books
Tried using the model on some of our favorite books

Test 1: The White Deer
- Rule 1. Height > 21 cm and < 22 cm
- Rule 2. Women subjects: Yes
- Rule 3. Friendship subjects: No
- Rule 4. North American subjects: No
- Result: NO

Trying the Model on Sample Books
Test 2: Dreaming of Babylon
- Rule 1. Height > 19 cm and < 21 cm
- Rule 2. Family relationship subjects: No
- Rule 3. Central Asian subjects: No
- Rule 4. Mystery genre: Yes
- Rule 5. Psychological genre: No
- Result: YES

Trying the Model on Sample Books
Test 3: 12th of Never (Women's Murder Club) by James Patterson
- Rule 1. Height > 23 cm
- Rule 2. Suspense genre: Yes
- Rule 3. Hardcover: Yes
- Rule 4. Music subjects: No
- Rule 5. Middle Eastern subjects: No
- Result: YES
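
The three branches walked through in these tests can be encoded as rule paths and checked mechanically. This is only a partial sketch: it covers the three paths shown on the slides and returns `None` for anything else, because the full tree is not reproduced here. The feature names are hypothetical, the first rule of each path is assumed to be a height band, and the example heights are assumed values chosen to fall inside those bands.

```python
from typing import Optional

# The three rule paths from the sample tests; the rest of the tree is unknown.
PATHS = [
    {"min_cm": 21, "max_cm": 22, "circulates": False,
     "answers": {"women_subjects": True, "friendship_subjects": False,
                 "north_america_subjects": False}},
    {"min_cm": 19, "max_cm": 21, "circulates": True,
     "answers": {"family_relationship_subjects": False,
                 "central_asia_subjects": False,
                 "mystery_genre": True, "psychological_genre": False}},
    {"min_cm": 23, "max_cm": None, "circulates": True,
     "answers": {"suspense_genre": True, "hardcover": True,
                 "music_subjects": False, "middle_east_subjects": False}},
]


def predict(height_cm: float, features: dict) -> Optional[bool]:
    """Follow one branch at a time; return None if no known path matches."""
    for path in PATHS:
        if height_cm <= path["min_cm"]:
            continue
        if path["max_cm"] is not None and height_cm >= path["max_cm"]:
            continue
        # Features missing from the record are treated as "No".
        if all(features.get(k, False) == v for k, v in path["answers"].items()):
            return path["circulates"]
    return None


# Assumed heights inside each band; the slides give only the bands.
white_deer = predict(21.5, {"women_subjects": True})
dreaming_of_babylon = predict(20.0, {"mystery_genre": True})
twelfth_of_never = predict(24.0, {"suspense_genre": True, "hardcover": True})
unknown_branch = predict(15.0, {})
```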

References 2011 Pennsylvania public library statistics:

THANK YOU! Questions?