The long tail: a usage analysis of pre-1993 print biomedical journal literature

Abstract: Objective: The research analyzes usage of a major biomedical library's pre-1993 print journal collection. Methodology: In July 2003, in preparation for a renovation and expansion project, the Biomedical Library at the University of California, San Diego, moved all of its pre-1993 journal volumes off-site, with the exception of twenty-two heavily used titles. Patrons wishing to consult one of these stored volumes could request that it be delivered to the library for their use. In the spring of 2006, an analysis was made of these requests. Results: By July of 2006, 79,827 journal volumes published in 1992 or earlier had been requested from storage. The number of requests received declined with age of publication. The usage distribution exhibited a "long tail": 50% of the 79,827 requests were for journal volumes published before 1986. The availability of electronic access dramatically reduced the chance that corresponding print journal volumes would be requested. Conclusions: The older biomedical print journal literature appears to be of continued value to the biomedical research community. When electronic access was provided to the older literature, demand for older print volumes declined dramatically.

• Analysis of requests for stored biomedical journal volumes published prior to 1993 indicates that older biomedical journal literature receives substantial use: during this 3-year study, there were nearly 80,000 requests for journal volumes published in 1992 or earlier, with half the requests for volumes published from 1986-1992 and 40% of the requests for volumes published from 1970-1985. These results indicate that retaining older print volumes, or providing easy access to the older literature through electronic journals or other means, will likely be required to meet user information needs.
• Use of older journal volumes varies by title and by user population, and a small number of journal titles were responsible for most of the use.
• Requests for older print biomedical journal titles dropped significantly when electronic access became available.

Implications
• Biomedical libraries should carefully consider implications of eliminating on-site access to older journal literature for users and budgets.
• Removing access to older journal literature may result in higher demand for interlibrary loan and document delivery services.
• Biomedical libraries can safely substitute reliable electronic access to older literature to meet ongoing needs for this information, thereby creating space for other purposes.

INTRODUCTION
Bibliometricians have long studied the decline in the use of scientific literature with age, also known as obsolescence, in hopes that their studies would shed light on the reasons for any decay in the utility of information and, thus, on the nature of scholarly communication itself. These studies suggest that use of scientific literature in general, and medical literature in particular, declines rapidly with time [1,2]. Information scientists have hypothesized that this decline reflects two distinct stages in the use of the journal literature. In the first stage, which occurs in the first few years after publication, readers scan the recent literature to stay up to date on general trends and to maintain proficiency in their fields of interest. The need to frequently peruse current literature, sometimes referred to as the "immediacy effect" [2], is likely responsible for the extremely heavy use of newer material reported in the above-referenced studies. In later years, journals are consulted largely by readers looking up citations found in other articles or identified when searching indexing and abstracting services such as PubMed or search engines like Google Scholar. The high levels of usage generated in early years by current awareness needs decline to a much lower, but possibly steady, pattern of consultation. Because of the immediacy effect, only a relatively few years of any journal may account for most of its use.
One important consideration for librarians is that most bibliometric studies that find a rapid decline in the use of the literature a few years after publication do not study actual journal usage. Instead, they document declines in citation frequency over time and assume that this decline reflects an actual decline in use of the literature. This distinction is particularly important for libraries considering whether to retain older literature in their collections. Line notes, "It is not known how useful references and citations may be as indicators of use probability, nor how usage patterns differ between libraries whether of the same or different types" [3].
For medical librarians, an important exception to Line's statement can be found in the work of Tsay, who reported that the "mean use half-life," or the average time needed for a publication in her collection to experience half of its lifetime of use, was 3.43 years after publication [4]. In a subsequent study, Tsay [5] reported a statistically significant correlation between actual usage and impact factor. Sullivan et al. not only found a similar mean use half-life when tabulating usage over a ten-year period for Stanford's journal collection, but also verified that the rapid decline in usage of the literature with time is not just a function of the size of the journal literature [6]. Even accounting for the fact that less material was published in earlier years, Sullivan found that use of the medical literature declines rapidly with age. Recently, Kaplan et al. analyzed interlibrary loan and citation data and found support in both types of data for Tsay's and Sullivan's conclusions from decades earlier, concluding that "retention of retrospective journals of greater than fifteen years of age may not be necessary in most health sciences libraries" [7].
Unfortunately for those seeking clear direction on this issue, somewhat contradictory information comes from a very recent study of actual usage [8]. In 2002, the University of North Carolina (UNC) at Chapel Hill stored material published prior to 1992 off-site in preparation for a major construction project. While the researchers did not calculate a mean use half-life for stored material, they found unexpectedly high use of the older literature: 50% of all requests received were for materials published between 1980 and 1989 [8]. The UNC experience suggests that the older biomedical literature may be of more interest than commonly thought. Given the irreversible nature of the weeding process (once older volumes are removed, it is usually impractical to replace them), more information on actual use of the older journal literature would help libraries making decisions related to weeding and storing their journal collections.

SETTING
The University of California, San Diego (UCSD), Biomedical Library supports a user population of nearly 6,000 faculty and students involved in the teaching, research, and patient care programs of the UCSD School of Medicine, the Skaggs School of Pharmacy and Pharmaceutical Sciences, the Division of Biological Sciences, and UCSD's two teaching hospitals. At the time of this study, UCSD was ranked fifth in the nation and first in the University of California system in federal research and development funding, and, among medical schools, the UCSD School of Medicine ranked first in federal research funding per faculty member [9]. The library is also open to the public, and, as the only biomedical research library in San Diego and Imperial County, it is heavily used by area doctors and researchers. In addition, because biology is the single largest subject major on campus, undergraduates are also frequent users of the library and its collections.
The UCSD Biomedical Library's collections are heavily used. In 1999/2000, the library re-shelved close to 500,000 items and recorded a gate count of over 300,000. By 2003, when this study commenced, yearly on-site usage of the collection had declined to 230,000 re-shelvings and the gate count to 240,000. This decline was thought to be in response to the acquisition of large numbers of electronic journals. Electronic journals from large publishers, such as Elsevier, Wiley, Blackwell, and Springer, were now generally available online, beginning with articles published in 1996. Electronic journals from other publishers provided online access, in most cases, beginning with volumes published in 2000 or later. During the 3-year period of this study, however, the library licensed numerous additional backfiles. Extensive backfiles from Elsevier were added in the fall of 2004. In 2005, backfiles from Wiley and Springer also became available, and older content from multiple other publishers was added during 2006.

STORAGE PROJECT
In July 2003, in preparation for a major renovation and addition project that severely limited the amount of on-site available shelving space, the library moved 60,000 journal volumes from the on-site collection to the library's off-site storage facility, the Annex. These 60,000 volumes included all journal volumes published from 1965 to 1992, except for 22 heavily used titles (Appendix online), as well as recent (1993-) volumes from selected publishers whom the library believed would continue to provide stable electronic access. These 60,000 volumes joined 44,000 pre-1965 journal volumes already stored at the Annex. Altogether, these 104,000 journal volumes represented 2,282 journal titles.
Faculty, staff, students, and the general public could request any of these journal volumes using an online request form. To assist in identifying the requested material, the title, publication year, call number of the item, and affiliation of the user were automatically captured from the form and placed in a database. From July 2003 until December 2004, the volumes requested were delivered twice daily to the existing UCSD Biomedical Library. In mid-December 2004, the Biomedical Library moved its remaining on-site collection and all staff and services to temporary quarters in the main UCSD Library. From December 2004 until the library reopened in August 2006, items located at the Annex were delivered to that temporary facility.
In the spring of 2006, the information in the database was analyzed to assist the library in determining which of the stored volumes should be returned to the on-site collection as well as to provide insight into use of the older biomedical literature.

METHODOLOGY
The main source of information used for this analysis was the database of requests for stored material received between July 2003 and July 2006. Prior to analyzing the information in the database, significant normalization and cleanup of the raw data were performed. Because all the data were entered by users, the ways titles and call numbers were recorded varied widely. Some of the normalization was done automatically by running scripts over the data, but the vast majority of the cleanup was done by hand. Five variables were selected for analysis from the normalized data: the number of requests received, journal volume publication dates of those requests, requested journal titles, status of the requestor (affiliate or non-affiliate), and availability of electronic access.
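The scripts used for the automated part of the normalization are not described in this article. As a rough illustration only, the following Python sketch shows one way user-entered titles could be reduced to a common key before counting; the function name `normalize_title` and the specific rules (case folding, punctuation stripping, dropping a leading article) are assumptions, not the library's actual procedure.

```python
import re
import unicodedata
from collections import Counter


def normalize_title(raw: str) -> str:
    """Reduce a user-entered journal title to a comparable key (hypothetical rules)."""
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", raw)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Lowercase and spell out ampersands so "&" and "and" match.
    text = text.lower().replace("&", " and ")
    # Replace punctuation with spaces, then collapse runs of whitespace.
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Drop a leading article so "The Lancet" and "Lancet" share a key.
    return re.sub(r"^(the|a|an) ", "", text)


def count_requests(raw_titles):
    """Count requests per normalized title key."""
    return Counter(normalize_title(t) for t in raw_titles)
```

For example, `count_requests(["The Journal of Pediatrics", "journal of pediatrics."])` groups both entries under the single key `journal of pediatrics`. Variants that differ by abbreviation or misspelling would still need the kind of manual review the study describes.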
To support the database analysis, information was also taken from:
1. Re-shelving counts: The library counts the number of items re-shelved on-site each day. These data are cumulated annually and quarterly for internal and external reports.
2. Catalog records and electronic licensing agreements: The earliest date of online access available to library users for any particular title is recorded in the library's online catalog record. Spreadsheets accompanying publishers' licensing agreements also list journals with online access.
3. Shelf measurements: To assist in planning the move of the collection from storage back into the new facility, the stored collection was measured in five-year increments.
Descriptive statistics (counts, etc.) were tabulated for all data.

Number of requests
From July 2003 through July 2006, 90,839 requests were made to have volumes brought from the Annex to campus. Of the total number of requests over this 3-year period, 88,338 (97%) were for journal volumes. The 2,501 (3%) remaining requests were for books. In contrast, during this same time period, over 400,000 volumes, largely journal volumes published after 1992, were re-shelved in the library's on-site collection. Additional recent articles were consulted in the library's electronic journal collection, but unfortunately, the volume of these consultations could not be quantified.
Use of the stored collection was not constant throughout the life of the project. Figure 1 shows the change in number of requests over the 3-year period. The number of requests per quarter for items stored at the Annex dropped by more than half over these 3 years, from over 10,000 in the fall quarter of 2003 to fewer than 5,000 in the spring quarter of 2006. The decline in requests for material located at the Annex mirrored a reduction in use of physical journal volumes located on-site in the library, as measured by re-shelving data. There were 46,417 volumes re-shelved on-site in the summer of 2003, but only 21,485 volumes re-shelved in the spring of 2006.
Figure 2 shows the distribution, by publication date, of requests made from the stored collection.
Shelf measurement data were used to determine if the observed decline in usage with time reflected a real decline in interest in the literature or was simply an artifact of changes in the volume of publication over time. Other things being equal, the larger the collection in a given year, the more material from that year will be requested [5]. Indeed, the number of inches of stored material declined with age, as would be expected given the large growth in articles published in recent years. However, the growth in articles over time was not the only reason for the decline in requests with age. The number of requests per inch also declined with age, from a high of 1.19 requests per inch for the early 1990s to 0.27 requests per inch among the earliest years stored.
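The size adjustment described above amounts to dividing the requests for each publication period by the shelf space that period occupies. The Python sketch below illustrates the calculation; the function name `requests_per_inch` and the figures in the usage note are hypothetical and are not the study's data.

```python
def requests_per_inch(requests, shelf_inches):
    """Return requests per shelf inch for each publication period.

    `requests` maps a period label to its request count;
    `shelf_inches` maps the same labels to measured shelf inches.
    Periods with no (or zero) shelf measurement are skipped.
    """
    rates = {}
    for period, count in requests.items():
        inches = shelf_inches.get(period)
        if inches:  # skips None and 0 to avoid division by zero
            rates[period] = count / inches
    return rates
```

With made-up inputs such as `requests_per_inch({"1990-1992": 2400, "1965-1969": 300}, {"1990-1992": 2000, "1965-1969": 1200})`, the newer period yields a higher rate than the older one, which is the pattern the study reports. A rate that falls with age indicates that the decline in requests is not explained by collection size alone.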

Journal usage by title
In the fall of 2003, volumes from 1,827 unique journal titles were requested from storage. As noted above, the number of requests made in each quarter declined by half over the course of the project. By contrast, the number of unique journal titles requested declined by only about a third, to 1,254 in the spring of 2006. The mean number of requests per title dropped from 5.7 to 3.3 over this period, and the median number of requests dropped from 3 to 2 requests per title.
The titles were ranked according to the number of requests made, with 1 being the rank of the most requested title. The total number of requests submitted for the top 100 requested journals was calculated, then the total for the next 100 most requested journals, the next 100, and so on (Figure 3). The 200 most requested titles, in both the fall of 2003 and the spring of 2006, accounted for 50% of the Annex requests. The 600 most requested titles accounted for over 80% of the requests in both periods. In other words, only a small number of titles accounted for half the requests, and fewer than half the titles were responsible for the overwhelming majority of requests. Notably, both these distributions exhibit an initial peak of use followed by a long, flat decline. In the fall of 2003, the graph becomes flat at a rank of 1,000, and the remaining 827 titles had 2 or fewer requests each. In the spring of 2006, the drop was even earlier and more severe: the last 1,100 titles requested were each only requested a single time.
Table 1 lists the top 50 most requested titles over the course of the study for all users. The relative popularity of titles varied somewhat according to whether the requestor was or was not a UCSD affiliate. Thirty-five (70%) of the top 50 titles requested by UCSD affiliates were also among the top 50 most requested by non-affiliates. Yet these 2 groups of users had distinct differences. For example, Circulation Research, 35th in terms of overall requests, was very popular among affiliates, ranking 18th in total number of requests, but only 86th for non-affiliates. Conversely, Anesthesiology, ranked 36th overall, ranked 16th among non-affiliates but only 57th among affiliated users.
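The rank-based tabulation described above (requests for the top 100 titles, the next 100, and so on) can be sketched as a cumulative share calculation. This is an illustrative Python version under assumed names (`share_by_rank_bin`, `bin_size`), not the procedure actually used in the study.

```python
def share_by_rank_bin(requests_per_title, bin_size=100):
    """Cumulative share of all requests covered by each successive rank bin.

    Titles are ranked from most to least requested; element i of the
    result is the fraction of requests accounted for by the top
    (i + 1) * bin_size titles.
    """
    counts = sorted(requests_per_title.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return []
    shares = []
    running = 0
    for start in range(0, len(counts), bin_size):
        running += sum(counts[start:start + bin_size])
        shares.append(running / total)
    return shares
```

Applied to per-title request counts with `bin_size=100`, the second element of the result corresponds to the share held by the 200 most requested titles and the sixth to the share held by the top 600, the two cut-offs reported above.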

Effect of electronic access
To investigate the effect of the availability of electronic access on the rate of requests, a more detailed analysis of the usage patterns of the 20 most requested titles in the fall of 2003 was made (Table 2).

Use of the older journal literature
The data reported here clearly documented a continued interest in the older biomedical literature. Usage of journal literature more than 10 years old was far less than that of more recently published print volumes and would undoubtedly have been even further dwarfed had recent electronic journal usage numbers been available, but it was still substantial. Close to 80,000 requests for this material were received in a 3-year period. Moreover, while usage of this older literature continued to decline with age, the drop was not abrupt. The data showed that materials were used fairly frequently well into the second decade after publication, followed by a long and steady decline in use. It is possible to think of the study presented here as an analysis of literature usage after the period defined by the "immediacy effect." The journal volumes stored for this project were, in the vast majority of cases, at least 10 years old at the time they were placed into storage, and at least 13 years old when they were removed. Any use of this literature for current awareness or for maintaining proficiency in a field of research was probably minimal. It seems reasonable to assume that most of the requests occurred when individuals were following up on citations they had found in other articles or through search services like PubMed or Google Scholar.
When usage of these stored materials is plotted over time, it results in a graph with a "long tail." The phrase "The Long Tail" (as a proper noun with capitalized letters) was coined by Anderson in a 2004 article in Wired magazine to describe the business and economic models of companies such as Amazon.com or Netflix [10]. Anderson argues that products that are in low demand or have low sales volume can collectively make up a market share that rivals or exceeds the relatively few current bestsellers and blockbusters, if the store or distribution channel is large enough. The term is not new, having long been used in statistics to refer to a feature of "power-law" distributions, such as the frequency with which different words are used in English: a few common words are used a great deal, and a long tail of increasingly obscure words is used less often [11]. In these types of distributions, high incidence at the beginning of a graph is followed by a long tail of low frequency. In the case of the request data analyzed here, the tail is indeed long. The first 7 years, from 1993 back to 1986, account for 50% of the usage. It takes 14 more years, back to 1970, to account for the next 40% of usage. In other words, the total usage under the tail of this distribution, from 1986 back to 1970, is almost as great as the area under the initial higher-use portion, from 1993 back to 1986. This is both good and bad news for librarians seeking to make space in their facilities. On the one hand, medical libraries can probably retain a relatively small number of years of the older volumes of titles they now have in their stacks and meet half the requests they could expect for material from those titles. On the other hand, to satisfy the remaining 50% of anticipated requests for older articles from their on-site collection, libraries would have to retain twice as many additional years.
Table 1 (excerpt, final rows): Archives of Ophthalmology; 46 The American Journal of Pathology; 47 Archives of Internal Medicine; 48 Developmental Biology; 49 The Journal of Pediatrics; 50 International Journal of Cancer
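The coverage figures discussed above (how many years of backfile are needed to meet a given share of requests) follow from accumulating requests by publication year, newest first. The following Python sketch shows that calculation; the name `years_to_cover` and the counts in the usage note are hypothetical, not the study's request data.

```python
def years_to_cover(requests_by_year, target):
    """Find how many publication years, newest first, reach a target share.

    `requests_by_year` maps publication year to request count and
    `target` is a fraction between 0 and 1. Returns a tuple of
    (number of years retained, oldest year retained).
    """
    total = sum(requests_by_year.values())
    if total == 0:
        raise ValueError("no requests to analyze")
    running = 0
    for n, year in enumerate(sorted(requests_by_year, reverse=True), start=1):
        running += requests_by_year[year]
        if running >= target * total:
            return n, year
    raise ValueError("target share cannot be reached")
```

With invented counts such as `{1992: 40, 1991: 20, 1990: 15, 1989: 15, 1988: 10}`, a target of 0.5 is met after 2 years while a target of 0.85 needs 4. That gap between the two answers is the practical meaning of a long tail for retention decisions.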
The data reported here are in some respects very similar to those reported by Kaplan et al. when they looked at interlibrary loan requests in DOCLINE [7]. In both UCSD's storage project and Kaplan's study of interlibrary loans, somewhat more than two-thirds of the requests for material published in the thirty-year period 1960 to 1990 were for articles published between 1980 and 1989. Kaplan concluded that "retention of retrospective journals of greater than fifteen years of age may not be necessary in most health sciences libraries." However, the conclusion here is somewhat different: though usage may drop significantly with age, interest in the biomedical literature is still substantial after the "immediacy effect" has been satisfied. Looking at the long tail of the usage distribution for the older literature, it is apparent that meeting all requests for this material could require retention of many, many years.
Individual libraries must decide if this use, far lower than the use of recent material but still substantial, justifies retaining older material in the collection. Which course to pursue will likely vary with the mission of the library and the users it serves. For some libraries, it may make sense to rely on interlibrary loan and document delivery services to provide access to the older journal literature. Others, with a large research collection such as UCSD's Biomedical Library, may find that the cost of staff and fees associated with interlibrary loan services for a large number of requests outweighs the benefits of recovering shelf space by discarding older materials. Other factors to consider when determining whether or not to retain the older literature include the importance to users of having immediate access to older materials and the importance to the institution of assuring that users do not go without this information because of the inconvenience involved in obtaining it elsewhere.

Variation in use by title
Usage of individual titles in this study showed a similar distribution to that found for use by publication year. Relatively few titles accounted for the majority of use. After that, the distribution exhibited a long tail, with each remaining title at the end of the tail receiving only a single request over the study period. The distribution of use by title, however, differed in one important respect from the distribution of use by publication year. While it required close to 30 years' (back to the mid-1970s) worth of journals to accommodate 80% of the requests for material from storage, relatively few titles, 600 in the current study, accounted for close to 80% of the requests from storage. These data are encouraging to librarians who must, for whatever reason, reduce the size of their on-site print journal collection. Identifying the most heavily used titles among the older literature for retention will likely substantially reduce the effect of removing materials from the journal collection.
The data seem to indicate that the best way to identify these titles and years is probably an actual usage study. Earlier research reported that impact factor correlates with usage [5]. Although statistically significant, the correlation found was only in the neighborhood of 0.35. This modest correlation is far from a one-to-one match, accounting as it does for only 12% of the variance. Certainly in this study, there was evidence that other variables, in addition to impact factor, must be considered to identify titles to remove from the collection. Not only did the most requested titles vary over time at UCSD, but they varied depending on whether the requests came from affiliated or non-affiliated users. There was perhaps a core of materials of widespread interest, but the research and clinical interests of these two groups were not by any means identical. If space limitations had required a decision that favored one group over the other, the disadvantaged group would certainly have noticed. A decision based solely on impact factor would not have pleased either constituency.
Further evidence of the importance of local variables on the use of older literature comes from comparing UCSD's results with those from the study at UNC. Of the top twelve titles in the UNC 2002 storage project [8], only JAMA appeared in the list of the top fifty most requested titles over the course of the project at UCSD. When UNC's twelve most requested titles were compared to UCSD's most requested during only the fall of 2003, when relatively few electronic backfiles were available, there is slightly more overlap. At that point, four of UNC's top twelve were in UCSD's top fifty. However, none of the eight remaining UNC top twelve titles appeared in the list of the top fifty requested at UCSD. The degree of variation observed between UCSD affiliates and non-affiliates, and between UCSD and UNC, suggests that each library will need to study its own usage patterns before identifying titles to remove from the older portion of its collections.
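The 12% figure cited above for the impact-factor correlation follows from squaring the correlation coefficient, taking the reported value of roughly 0.35 as given:

```latex
r \approx 0.35 \quad\Longrightarrow\quad r^{2} \approx (0.35)^{2} = 0.1225 \approx 12\%
```

Read this way, about 88% of the variation in usage between titles is left unexplained by impact factor.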

Electronic access
The study shows substantial support for the proposition that reliable access to electronic journal backfiles provides an acceptable substitute for print access to older journal literature. Although this study does not definitively prove that the increase in electronic access was responsible for the decline in use of print noted during the three-year course of this project, the data strongly indicate that this was the case. The more electronic content that was added, the less print was used, whether that print was in the library's on-site collection or stored in the Annex. As electronic backfiles of particular titles became available, requests for print volumes of those same titles dramatically declined. Some requests were made for print volumes in storage that were also available online. However, it is likely that most of those requests were made by commercial document delivery suppliers who were not permitted, according to the library's electronic journal licensing agreements, to resell the online version of an article to their customers. Otherwise, electronic journals seem to be a promising alternative to retaining print volumes on-site, and the data from this study may provide a strong argument for funding the purchase of journal backfiles to recover space in a health sciences library.

CONCLUSION
Although usage of older biomedical journal literature declines with age as expected, significant usage of this older journal literature remains. Because usage tapers off gradually, removing all volumes more than fifteen or twenty years old to save space will likely deprive a significant number of library users of immediate access to information they would otherwise have used. Libraries should carefully consider how they will continue to provide access to this older journal literature if print access is eliminated.
Relatively few titles appear to account for the large majority of use of the older literature. Therefore, in situations in which libraries face pressure to reduce the size of their physical collections, one approach is to retain only those titles for which older volumes are heavily used. Another approach is to invest in backfiles of electronic journals for heavily used older titles. Unfortunately, the data from this study suggest that different institutions and different user groups may need different segments of the older literature. Selection of material for retention, or of electronic content to replace print volumes, will likely require usage studies and the knowledge of an experienced collection development librarian.