Identifying and addressing usage disparities with the New York Public Library

by Zhi Liu

The Background

New Yorkers love our public libraries. New York City has three main public library systems: the NYPL, serving Manhattan, the Bronx, and Staten Island; the Queens Public Library; and the Brooklyn Public Library. Together they welcome nearly 40 million people (patrons, in library jargon) each year, according to official statistics. An urban library system is not a monolith, but rather a complex distributed system of numerous branches, each serving a specific neighborhood.

Books in public libraries are an essential public good, and fairness and equity in their allocation is an inherent concern. Are there disparities in the level of usage and the level of access by patrons in different neighborhoods? If so, what is causing these disparities, and how can we address them? Working closely with practitioners at NYPL as a Siegel Family Endowment PiTech PhD Impact Fellow, my goal in the summer of 2023 was to explore these questions.

Image: NYC Public Library at Bryant Park

The Challenge

As we set out to uncover usage disparities, the main challenge we faced was that usage itself is a complex function of multiple factors. How, then, do we audit it objectively?

As an example of such complexity, let’s say book A is checked out fewer times than book B. Does this mean book A is less desirable than book B? Not necessarily. It could also be that the library has fewer copies of book A, so the number of checkouts we observe for A is lower than what we would see if copies were always available.
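The effect of limited copies can be made concrete with a toy calculation. The sketch below is not the model used in our analysis; it only compares raw checkout counts with checkouts per day that a copy was actually on the shelf. The class name, the field names, and all numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TitleStats:
    """Hypothetical per-title summary over some observation window."""
    checkouts: int              # checkouts observed in the window
    copy_days_on_shelf: float   # total days that copies sat available on a shelf


def checkouts_per_available_copy_day(stats: TitleStats) -> float:
    """Checkout rate normalised by how long copies were available to borrow."""
    if stats.copy_days_on_shelf <= 0:
        return float("nan")  # never available, so the rate is undefined
    return stats.checkouts / stats.copy_days_on_shelf


# Made-up numbers: book A has few copies that are rarely on the shelf,
# book B has many copies that mostly sit on the shelf.
book_a = TitleStats(checkouts=30, copy_days_on_shelf=60.0)
book_b = TitleStats(checkouts=90, copy_days_on_shelf=900.0)

rate_a = checkouts_per_available_copy_day(book_a)
rate_b = checkouts_per_available_copy_day(book_b)
```

In this made-up case book A has fewer checkouts than book B, yet it is borrowed far more often per day that a copy is available, which is why raw counts alone can mislead.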

As another example, in addition to the more traditional experience of browsing books at a physical branch, NYPL offers the option of using ‘holds’ to request books online and pick them up at a desired location. This option greatly benefits the patrons who use it: they gain access to the collections of all branches instead of only their nearest one, because when a requested book is not available at the pick-up branch, NYPL transports it there from a branch where it is available. Let’s say more books are checked out via holds at branch X than at branch Y. Does this mean that patrons living near branch X are more aware of the holds system and use it more frequently? Not necessarily. It could also be that branch X has a much more limited collection, so patrons there have to use the holds system whenever they want a book outside that collection.

To overcome the challenges that the complexity of defining usage brings, we need both finer technical apparatus and expertise from practitioners.

The analyses and findings

My supervisor at NYPL and I decided to use a Bayesian latent variable model that directly estimates the desirability of each book title, the latent demand size at each branch, and the hold usage fraction at each branch, while controlling for book availability. This estimation was only possible because we had access to fine-grained, anonymous data on book checkouts and check-ins across branches and checkout modes, which the NYPL logs meticulously. The analysis allowed us to audit usage patterns that are largely free from the complexity described above, and to pinpoint equity concerns.

One main finding of our analyses is that hold usage varies widely among branches. On average, a patron in midtown Manhattan is 3 times more likely to use the holds system for a book than a patron in the Southwest Bronx. But why does this matter? Let us take a look at what happens when a hold request is placed.

When a patron places a hold request for pick-up at branch X, the library takes the book from a branch where it’s available, say branch Y, and transfers it to branch X for checkout. This creates a book inflow at branch X and an outflow at branch Y. The disparity in hold usage between Manhattan and the Bronx mentioned above raises an equity concern: if patrons in more privileged neighborhoods systematically use the holds system more to check out more desirable books, does this deprive other patrons of access to a good collection?

Based on our parameter estimates and the checkout transactions, branches differed substantially in the inflow and outflow of books due to hold usage. The branch with the highest inflow pulled in books equivalent to an estimated 7.5% of its collection from other branches, just to fulfill hold requests there. This comes at a cost to patrons at outflow branches, who potentially face emptier shelves when they walk into their branch.
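To make the inflow and outflow bookkeeping concrete, here is a minimal sketch of how net flow per branch could be tallied from a list of hold transfers and expressed as a share of each branch's collection. It is an illustration only, not the estimation procedure we used; the branch names, transfer list, and collection sizes are invented.

```python
from collections import Counter

# Hypothetical hold transfers as (source_branch, pickup_branch) pairs.
transfers = [("Y", "X"), ("Y", "X"), ("Z", "X"), ("X", "Y")]

# Hypothetical number of books held by each branch.
collection_size = {"X": 40, "Y": 50, "Z": 20}


def net_inflow(transfers):
    """Net books gained (+) or lost (-) by each branch through hold transfers."""
    net = Counter()
    for source, pickup in transfers:
        if source == pickup:
            continue  # fulfilled locally, no book moves between branches
        net[pickup] += 1
        net[source] -= 1
    return dict(net)


def net_inflow_share(transfers, collection_size):
    """Net inflow of each branch as a fraction of its own collection size."""
    net = net_inflow(transfers)
    return {branch: net.get(branch, 0) / size
            for branch, size in collection_size.items()}


net = net_inflow(transfers)
share = net_inflow_share(transfers, collection_size)
```

Because every transfer adds one book somewhere and removes one elsewhere, the net flows always sum to zero across the system: one branch's gain is another branch's loss.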

Left: Historical net book inflow and outflow in branches. Branches where patrons use the holds system more (which are also more likely to be in more affluent neighborhoods) pull books from other branches to fulfill hold requests, and the collections of some other branches are depleted as a result. Right: Net inflow and outflow in branches under our simulated intervention. The inflows and outflows are more balanced across branches.

Impact and Path Forward

How do we address these equity concerns? Together with my academic advisor at Cornell Tech, I am continuing this collaboration with NYPL in the fall of 2023 to explore operational strategies that can alleviate these disparities, planned carefully so that they do not sacrifice the efficiency benefits of the holds system.

To demonstrate one hypothetical strategy, let’s say we keep the observed hold requests unchanged, but alter where the books are pulled from to fulfill them. Instead of pulling a book from a location that is largely random among the branches where the book is available, we pull it from the branch that has the most inflow due to the holds system. The right panel of the figure above shows the counterfactual book flows under this intervention. This time, almost all branches have balanced inflow and outflow, and the correlation with income is no longer statistically significant.
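The sourcing rule described above can be sketched as a small greedy simulation. This is a simplified illustration under my own assumptions, not the simulation behind the figure: it uses a running tally of net inflow, ignores returns and transit time, breaks ties by branch name, and all function names, branch names, and numbers are hypothetical.

```python
def fulfill_holds(requests, inventory, net_inflow):
    """Fulfill hold requests, sourcing from the available branch with most net inflow.

    requests:   list of (title, pickup_branch) pairs, processed in order
    inventory:  {branch: {title: copies_on_shelf}}, updated in place
    net_inflow: {branch: running net inflow from holds}, updated in place
    Returns a log of (title, source_branch, pickup_branch) tuples.
    """
    log = []
    for title, pickup in requests:
        candidates = [b for b, shelf in inventory.items() if shelf.get(title, 0) > 0]
        if not candidates:
            continue  # no copy anywhere; a real system would queue the request
        if pickup in candidates:
            source = pickup  # already on the local shelf, nothing to transfer
        else:
            # Assumed rule: highest running net inflow wins, ties broken by name.
            source = max(candidates, key=lambda b: (net_inflow.get(b, 0), b))
        inventory[source][title] -= 1
        if source != pickup:
            net_inflow[source] = net_inflow.get(source, 0) - 1
            net_inflow[pickup] = net_inflow.get(pickup, 0) + 1
        log.append((title, source, pickup))
    return log


# Made-up starting state: X has been a net importer of books, Y and Z net exporters.
inventory = {"X": {"novel": 1}, "Y": {"novel": 2}, "Z": {}}
net_inflow = {"X": 5, "Y": -3, "Z": -2}
requests = [("novel", "Z"), ("novel", "Z")]

log = fulfill_holds(requests, inventory, net_inflow)
```

In this toy run the first request is served from X, the branch that had accumulated the most inflow, and only once X has no copy left does the second request fall back to Y.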

There is still a long way to go from this naive strategy to truly implementable interventions. My academic advisor and I aim to keep working closely with practitioners at NYPL on potential system design changes, so that we can better reflect on and address these concerns.

