BRANDON SHERMAN BETH SPINDLER Predicting Book Acquisitions for a Public Library: Will It Circulate?


1 BRANDON SHERMAN BETH SPINDLER Predicting Book Acquisitions for a Public Library: Will It Circulate?

2 Background
- Libraries receive limited public funding
- State funding depends on turnover
- Turnover = number of items circulated / total number of items
- Circulation rate = number of times a book is checked out per year
- The higher a library's turnover, the more public funding it receives
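The two definitions on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the example figures are assumptions, not values from the library's data.

```python
def turnover(items_circulated: int, total_items: int) -> float:
    """Turnover = number of items circulated / total number of items."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return items_circulated / total_items


def circulation_rate(checkouts: int, years_in_collection: float) -> float:
    """Average number of checkouts per year for a single book."""
    if years_in_collection <= 0:
        raise ValueError("years_in_collection must be positive")
    return checkouts / years_in_collection


# Hypothetical figures, chosen only to illustrate the arithmetic.
example_turnover = turnover(items_circulated=50_000, total_items=20_000)
example_rate = circulation_rate(checkouts=14, years_in_collection=5)
```

Turnover is a collection-level number, while circulation rate describes one book, which is why the model later predicts a per-book rate.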

3 Question: We have a new book that the library is considering adding to its collection. Will it circulate?

4 Obtaining the Data
- Examined adult books from the Cooper-Siegel Community Library
- Moderately sized public library
- Serves a population of ~28,000 (county average is ~27,000)
- Relatively high turnover: 3.26 (county average is 2.17, state average is 2.1)

5 Preparing the Data
We eliminated the following:
- Children's books
- Video
- Audio
- Lost, missing, withdrawn, and billed books
- Books in processing
- Reference books
- Bestsellers
- Books that had been circulating for less than a year and a half

6 Preparing the Data
- Removed 3,149 books in total from a dataset of 28,110 books (our original data set held 75,000+ materials!)
- Final dataset had 24,961 books
- Turnover of ~1.45 in 2012
- Kept duplicate titles, because different copies of the same book can be radically different
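The exclusion rules from the two "Preparing the Data" slides could be expressed as a simple record filter. This is a minimal sketch: the field names (`collection`, `status`, `years_circulating`) and the sample records are hypothetical, since the slides do not show the export format of the library software.

```python
# Hypothetical category and status labels; the real export may use other codes.
EXCLUDED_COLLECTIONS = {"children", "video", "audio", "reference", "bestseller"}
EXCLUDED_STATUSES = {"lost", "missing", "withdrawn", "billed", "in processing"}
MIN_YEARS_CIRCULATING = 1.5  # "a year and a half"


def keep_record(record: dict) -> bool:
    """Return True if the record survives every exclusion rule."""
    return (
        record["collection"] not in EXCLUDED_COLLECTIONS
        and record["status"] not in EXCLUDED_STATUSES
        and record["years_circulating"] >= MIN_YEARS_CIRCULATING
    )


# Made-up records, one per kind of outcome.
sample = [
    {"title": "A", "collection": "adult fiction", "status": "available", "years_circulating": 4.0},
    {"title": "B", "collection": "reference", "status": "available", "years_circulating": 9.0},
    {"title": "C", "collection": "adult nonfiction", "status": "lost", "years_circulating": 6.0},
    {"title": "D", "collection": "adult fiction", "status": "available", "years_circulating": 0.5},
]
kept = [r for r in sample if keep_record(r)]

# Counts reported on the slide: 28,110 books before filtering, 3,149 removed.
remaining = 28_110 - 3_149
```

The last line simply confirms that the counts on the slide are consistent with the stated final dataset of 24,961 books.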

7 Building the Model - Data Considered
- We split the final data set into fiction and nonfiction
  - 10,751 fiction
  - 14,210 nonfiction
- Fiction and nonfiction have different circulation rates and different rationales for adding to the library
- Seems most logical to use different models for each
- Focused on fiction

8 Building the Model – Fiction vs. Nonfiction
[Figure with two panels: Fiction, Nonfiction]

9 Fiction vs. Nonfiction Circulation Rates
[Figure with two panels: Fiction, Nonfiction]

10 Building the Models – The Target Variable
- Prediction target was "Average circulators"
- Fiction: lifetime circulation rate ≥ 2.8 checkouts/year
  - Top ~35% (34.8%)
- Nonfiction: lifetime circulation rate ≥ 1.3 checkouts/year
  - Top ~35% (36%)
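Turning the thresholds above into a yes/no target could look like the following sketch. The thresholds come from the slide; the function name and the sample rates are illustrative assumptions.

```python
# Thresholds from the slide: lifetime circulation rate per year.
THRESHOLDS = {"fiction": 2.8, "nonfiction": 1.3}


def is_target(rate: float, category: str) -> bool:
    """True if a book meets the circulation threshold for its category."""
    return rate >= THRESHOLDS[category]


# Made-up (category, lifetime circulation rate) pairs.
sample_rates = [
    ("fiction", 3.5),
    ("fiction", 2.8),
    ("fiction", 1.0),
    ("nonfiction", 1.3),
    ("nonfiction", 0.4),
]
labels = [is_target(rate, category) for category, rate in sample_rates]
positive_share = sum(labels) / len(labels)
```

On the real data, the share of positive labels would be the roughly 35% reported for each category.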

11 Building the Models - Variables
- Variables pulled from bibliographic records in library software
- Some variables required processing to calculate or extract
  - Particularly information from Library of Congress Subject Headings
- Allowed model to choose variables to include

12 Building the Model - Variables

13 Choosing the Models
- Decided to opt for "usefulness" over "accuracy"
- We were able to achieve accuracy over 85% for most models
  - But they were accurate only because they rejected almost everything, so no real "decision" was made at each node
- We manually removed some variables that dominated the model but led to less useful results:
  - Number of other libraries that own the book
  - Years since published
  - Price
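Why a high accuracy can be useless is easy to show with a classifier that rejects every book. The sketch below uses made-up labels (15 circulating books out of 100), not the study's data: accuracy looks good, yet the classifier never recommends a single purchase.

```python
def accuracy(actual: list, predicted: list) -> float:
    """Share of predictions that match the actual labels."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


def recall(actual: list, predicted: list) -> float:
    """Share of actual positives that the classifier found."""
    positives = sum(actual)
    if positives == 0:
        return 0.0
    found = sum(a and p for a, p in zip(actual, predicted))
    return found / positives


# Illustrative labels only: 15 books that circulate well, 85 that do not.
actual = [True] * 15 + [False] * 85
reject_all = [False] * len(actual)  # a "model" that says no to every book

baseline_accuracy = accuracy(actual, reject_all)
baseline_recall = recall(actual, reject_all)
```

Here the reject-everything baseline scores 85% accuracy with zero recall, which is the kind of result the slide describes as accurate but not useful.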

14 Models Considered - Fiction
Model                 Training   Test
C5                    74.87%     71.68%
CHAID                 74.79%     71.45%
C&R                   72.45%     71.22%
QUEST                 72.45%     71.21%
Logistic regression   70.3%      68.9%
k-nn                  69.27%     61.97%
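One way to read these numbers is to rank the models by test accuracy and look at the gap between training and test accuracy, a rough indicator of overfitting. The figures below are copied from the slide; the comparison itself is an added illustration, not something the presenters describe doing.

```python
# (training accuracy %, test accuracy %) as reported on the slide.
RESULTS = {
    "C5": (74.87, 71.68),
    "CHAID": (74.79, 71.45),
    "C&R": (72.45, 71.22),
    "QUEST": (72.45, 71.21),
    "Logistic regression": (70.3, 68.9),
    "k-nn": (69.27, 61.97),
}

# Training-minus-test gap in percentage points for each model.
gaps = {name: round(train - test, 2) for name, (train, test) in RESULTS.items()}

# Models ordered from best to worst test accuracy.
ranked_by_test = sorted(RESULTS, key=lambda name: RESULTS[name][1], reverse=True)

# Model whose accuracy drops the most between training and test.
largest_gap = max(gaps, key=gaps.get)
```

C5 and CHAID lead on the test set, while k-nn loses the most accuracy between training and test.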

15 Chosen Model - CHAID
- C5 had the best prediction rate, but the model is proprietary and licenses are expensive for a library
- We settled on a CHAID decision tree model
  - Among the top two prediction rates: 71.45% on the test set
  - Conservative
  - Algorithm is publicly available

16 Model Output – Variable Predictors
- Height of book
- Suspense genre
- Large print or regular
- Mystery genre
- Hardcover or paperback
- Women subjects
- Psychological genre
- Family relationship subjects
- Number of pages
- North America subjects
- Romance genre
- Humor subjects
- Western European subjects
- Political subjects
- Music subjects
- British Isles subjects
- Friendship subjects
- Middle Eastern subjects
- Children subjects
- Horror subjects
- Central Asia subjects
- Whether illustrated

17 Model Output - Rules
- Seems unwieldy to read, but is not actually difficult to use by hand
- Follow one branch at a time

18 Trying the Model on Sample Books
Tried using the model on some of our favorite books
Test 1: The White Deer
- Rule 1. Height: > 21 cm, < 22 cm
- Rule 2. Women subjects: Yes
- Rule 3. Friendship subjects: No
- Rule 4. North American subjects: No
Result: NO

19 Trying the Model on Sample Books
Test 2: Dreaming of Babylon
- Rule 1. Height: > 19 cm, < 21 cm
- Rule 2. Family relationship subjects: No
- Rule 3. Central Asian subjects: No
- Rule 4. Mystery genre: Yes
- Rule 5. Psychological genre: No
Result: YES

20 Trying the Model on Sample Books
Test 3: 12th of Never (Women's Murder Club) by James Patterson
- Rule 1. Height: > 23 cm
- Rule 2. Suspense genre: Yes
- Rule 3. Hardcover: Yes
- Rule 4. Music subjects: No
- Rule 5. Middle Eastern subjects: No
Result: YES
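The three worked examples can be encoded as rule paths and checked mechanically. This sketch covers only the three branches shown on the slides, not the full CHAID tree, which the transcript does not include. The field names, the strict treatment of the height bounds, and the sample book are assumptions.

```python
# Each path: a height range in cm (exclusive bounds, as written on the slides),
# the yes/no conditions along the branch, and the leaf result.
PATHS = [
    {"height": (21.0, 22.0),
     "flags": {"women_subjects": True, "friendship_subjects": False,
               "north_america_subjects": False},
     "result": False},
    {"height": (19.0, 21.0),
     "flags": {"family_relationship_subjects": False,
               "central_asia_subjects": False, "mystery_genre": True,
               "psychological_genre": False},
     "result": True},
    {"height": (23.0, float("inf")),
     "flags": {"suspense_genre": True, "hardcover": True,
               "music_subjects": False, "middle_east_subjects": False},
     "result": True},
]


def predict(book: dict):
    """Follow one branch at a time; return None if no known branch matches."""
    for path in PATHS:
        low, high = path["height"]
        in_range = low < book["height_cm"] < high
        # Attributes missing from the book record are treated as "No".
        flags_match = all(book.get(k, False) == v for k, v in path["flags"].items())
        if in_range and flags_match:
            return path["result"]
    return None  # this branch of the tree is not shown in the slides


# Hypothetical record consistent with the Test 3 path (tall suspense hardcover).
suspense_hardcover = {"height_cm": 24.0, "suspense_genre": True, "hardcover": True}
prediction = predict(suspense_hardcover)
```

Returning `None` for unmatched books keeps the sketch honest about the parts of the tree that are not available here.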

21 References 2011 Pennsylvania public library statistics: http://www.portal.state.pa.us/portal/portal/server.pt/community/library_statistics/8696

22 THANK YOU! Questions?

