Comparing test searches in PubMed and Google Scholar


INTRODUCTION
Google Scholar has been met with both enthusiasm and criticism since its introduction in 2004. This search engine provides a simple way to access "peer-reviewed papers, theses, books, abstracts, and articles from academic publishers' sites, professional societies, preprint repositories, universities and other scholarly organizations" [1]. An obvious strength of Google Scholar is its intuitive interface: the main search page consists of a simple query box. In contrast, databases such as PubMed use search interfaces that offer a greater variety of advanced features. These additional features, while powerful, add a complexity that may require a substantial investment of time to master. It has been observed that Google Scholar may allow searchers to "find some resources they can use rather than be frustrated by a database's search screen" [2]. Some even feel that "Google Scholar's simplicity may eventually consume PubMed" [3].
Along with ease of use, Google Scholar carries the familiar "Google" brand name. As Kennedy and Price so aptly stated, "College students AND professors might not know that library databases exist, but they sure know Google" [4]. The familiarity of Google may allow librarians and educators to ease students into the scholarly searching process by starting with Google Scholar and eventually moving to more complex systems. Felter noted that "as researchers work with Google Scholar and reach limitations of searching capabilities and options, they may become more receptive to other products" [5].
Google Scholar is also thought to provide increased access to gray literature [2], as it retrieves more than journal articles and includes preprint archives, conference proceedings, and institutional repositories [6]. Google Scholar also includes links to the online collections of some academic libraries. Including these access points in Google Scholar retrieval sets may ultimately help more users reach more of their own institution's subscriptions [7].
While its advantages are substantial, Google Scholar is not without flaws. The shortcomings of the system and its search interface have been well documented in the literature and include lack of reliable advanced search functions, lack of controlled vocabulary, and issues regarding scope of coverage and currency. Table 1 summarizes some of the reported criticisms of Google Scholar.
Vine found that while Google Scholar pulls in data from PubMed, many PubMed records are missing [20], and that Google Scholar also lacks features available in MEDLINE [12]. Others have noted that Google Scholar should not be the first or sole choice when searching for patient care information, clinical trials, or literature reviews [23,24]. Thorough review and testing of Google Scholar, using an approach similar to that used to evaluate licensed resources, is necessary to better understand its strengths and limitations. As Jacso states, "professional searchers must do sample test searches and correctly interpret the results to corroborate claims and get factual information about databases" [18]. This paper compares and contrasts a variety of test searches in PubMed and Google Scholar to gain a better understanding of Google Scholar's searching capabilities.
Supplemental Table 5 and an appendix are available with the online version of this journal.

METHODOLOGY
Ten searches were performed in PubMed using a variety of available search features. The searches were repeated in Google Scholar to approximate a user's approach to those same topics in that search engine. The searches, performed between August and September 2006, used the topic, author, title, and journal name fields, alone or in combination (Appendix online). Topics included iron-deficiency anemia, bupropion for smoking cessation, and articles by specific authors in specific journals. The topics selected were loosely based on questions received during reference transactions or were previously developed for use during instruction.
For each search, the citations retrieved by Google Scholar and PubMed were examined to determine a variety of characteristics, including format, date, Medical Subject Headings (MeSH) where appropriate, uniqueness, duplication, and full-text availability through the author's institution.
Most searches were narrowed by date to produce sets of a reasonable size to allow comparison of unique items retrieved by each system. The search results were analyzed to determine possible reasons for the retrieval of unique items in each resource and to gather information on the general features of the Google Scholar results.
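The study does not say how its PubMed searches were run beyond the standard interface. As a hedged sketch only, a date-limited PubMed search of the kind described above could be reproduced through NCBI's E-utilities ESearch service, whose `mindate` and `maxdate` parameters (used together with `datetype`) restrict retrieval by date. The query string and year range below are hypothetical examples, not the study's actual strategies, which are given in the online appendix.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, start_year, end_year, retmax=100):
    """Build an ESearch URL for a PubMed query limited by publication date."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",          # limit on publication date
        "mindate": str(start_year),  # mindate and maxdate must be given together
        "maxdate": str(end_year),
        "retmax": retmax,            # maximum number of PMIDs to return
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


# Hypothetical query loosely modeled on one of the study's topics.
url = pubmed_search_url("bupropion AND smoking cessation", 2000, 2006)
print(url)
```

The sketch only builds the URL; fetching it requires network access, and the JSON response lists matching PMIDs that could then be compared against a Google Scholar result set.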

RESULTS
In eight of the ten searches, Google Scholar returned larger retrieval sets than PubMed (Table 2). Table 3 illustrates the characteristics of the items retrieved by Google Scholar, and Table 4 provides information on PubMed retrieval sets. Most items retrieved by Google Scholar were journal articles (Table 3). Items in other formats included 9 books, 11 book reviews, 2 Web pages, 1 subject index listing, 1 thesis, 1 newsletter item, 1 bibliography, 4 author replies, 1 annual meeting abstract, and 1 draft document. Overall, the searches yielded few gray literature items.
Table 1 Criticisms of Google Scholar

Criticism [references]
Advanced search functions may be unreliable [8-10]
No ability to search controlled vocabulary, or no authority control for journal names or author names [10-13]
Some materials retrieved may not be scholarly [14]
Secretive about how it defines "scholarly" [15]
Secretive about scope or coverage [5, 9, 10, 12, 13, 16-18]
May not be current [9-11, 14, 19]
Missing PubMed records [11, 20]
Lack of sorting options* [10, 11, 14]
Inclusion of duplicate citations in results [6, 14]
Only the first 1,000 results can be viewed [4, 19, 21]
Not as comprehensive or precise as searching native interfaces [9, 11, 22]
Lack of limiting features [11, 12]
* In the spring of 2006, Google Scholar introduced an option to re-sort with more current citations appearing first.

The main title link in Google Scholar citations was used to determine if full text was found. Full text was available in 46.96% (116/247) of the total citations retrieved. In most cases, it was assumed that full-text access was based on the institutional subscriptions available to the author of this study. Some items retrieved might have been freely available. In 22.67% (56/247) of the results, the Google Scholar citation was simply a link out to a PubMed record. As shown in Table 4, nearly half (48.98%; 72/147) of PubMed citations provided full-text access through the author's institution.
The unique items retrieved by each interface were examined to determine why they were missed by the other system. Across all searches, Google Scholar retrieved a total of 247 citations, 125 (50.61%) of which were unique to Google Scholar. Analysis revealed the following characteristics:
- Thirty-two items (12.96%) retrieved by Google Scholar were formats other than journal articles.
- Some unique Google Scholar items (10 items, 4.05%) appeared in journals not indexed by PubMed.
- Google Scholar covered a wider date range and returned 4 items (1.62%) older than 1950 that were not in PubMed.
- Google Scholar retrieved items based on its ability to search the full text of many articles rather than solely on citation data.
PubMed retrieved a total of 147 citations across all searches, and, of these, 46 (31.29%) were unique.

DISCUSSION
Assumptions about search engine performance based purely on retrieval quantities can be misleading without closer investigation of the results. For example, Table 2 shows that many of the searches returned similar numbers of citations in the two systems. In search #1 (dietary supplements as a treatment for iron deficiency anemia), PubMed returned twenty-five citations, while Google Scholar returned twenty-six. However, only four citations were common to both systems. In search #2 (Mobius syndrome), Google Scholar returned eleven citations and PubMed returned ten, but only two citations were retrieved by both systems.
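The gap between similar totals and small overlap can be made concrete with set operations. This is a minimal sketch, assuming each citation can be reduced to a shared identifier (for example, a PMID or a normalized title); the study does not describe how citations were matched, and the identifiers below are placeholders chosen only to reproduce the counts reported for search #1.

```python
def compare_retrieval(pubmed_ids, scholar_ids):
    """Summarize the overlap between two retrieval sets of citation identifiers."""
    pubmed, scholar = set(pubmed_ids), set(scholar_ids)
    return {
        "pubmed_total": len(pubmed),
        "scholar_total": len(scholar),
        "common": len(pubmed & scholar),
        "pubmed_only": len(pubmed - scholar),
        "scholar_only": len(scholar - pubmed),
    }


# Placeholder identifiers reproducing search #1:
# 25 PubMed citations, 26 Google Scholar citations, 4 shared.
shared = [f"shared-{i}" for i in range(4)]
pubmed_hits = shared + [f"pm-{i}" for i in range(21)]
scholar_hits = shared + [f"gs-{i}" for i in range(22)]

summary = compare_retrieval(pubmed_hits, scholar_hits)
print(summary)
```

With these counts, 21 of the 25 PubMed citations (84%) and 22 of the 26 Google Scholar citations (about 85%) were retrieved by only one of the two systems.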
Terminology was observed to be a major factor affecting retrieval and the ability of both systems to return unique items. Some unique items retrieved by Google Scholar were off topic. These "false hits" appear to be related to Google Scholar's full-text searching along with a lack of controlled vocabulary. For example, the purpose of search #7 was to find articles on the topic of "wine" that appeared in the New England Journal of Medicine. Google Scholar retrieved eight items where the word "wine" appeared in the full text but was not the main topic of the article; in one case, it retrieved an article in which the authors acknowledged a colleague with the surname Wine. Google Scholar also returned items that contained the search terminology but did not match the intention of the search. In the search for information about dietary supplements in the treatment of iron deficiency (search #1), Google Scholar returned some citations about high iron stores rather than deficiency (Table 5 online). Google Scholar searches for a word or sequence of letters rather than for a concept or meaning.
The complete citations for all unique items retrieved by PubMed were examined. One possible explanation for Google Scholar's failure to retrieve the same items is that many were indexed under the appropriate MeSH term, although the search phrase might not have appeared in the title or abstract. For example, search #9 was designed to retrieve articles by Visek about the topic of ammonia. While ammonia was not searched specifically as a MeSH term, PubMed automatically mapped it to MeSH. Of the unique citations retrieved by PubMed, some were indexed under ammonia although this term did not appear in the citation (Table 5 online). While Google Scholar offers the ability to use a tilde (~) to retrieve alternative terminology, this ability does not provide the control that subject headings do.
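The two failure modes described in the preceding paragraphs, false hits from word matching and misses that only subject indexing catches, can be illustrated with a toy model. This is a hypothetical sketch and not how either system is actually implemented: the records, the whitespace tokenizer, and the exact-heading lookup are simplifications, and PubMed's real MeSH mapping is far richer.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    title: str
    full_text: str
    mesh: list = field(default_factory=list)  # assigned subject headings


def text_word_match(record, word):
    """Full-text style match: the word appears anywhere, in any sense."""
    text = f"{record.title} {record.full_text}".lower()
    tokens = [t.strip('.,;:()"') for t in text.split()]
    return word.lower() in tokens


def heading_match(record, heading):
    """Controlled-vocabulary style match: the record is indexed under the heading."""
    return heading in record.mesh


# Hypothetical records illustrating both failure modes.
records = [
    Record("Red wine and coronary risk",
           "Moderate wine intake was studied.", ["Wine", "Coronary Disease"]),
    Record("Hepatic enzymes in rats",
           "We thank Dr. Wine for technical help.", ["Liver"]),
    Record("Fermented grape beverages and mortality",
           "Grape beverage intake was associated with mortality.", ["Wine", "Mortality"]),
]

text_hits = [r.title for r in records if text_word_match(r, "wine")]
heading_hits = [r.title for r in records if heading_match(r, "Wine")]
print("word match:", text_hits)        # includes the acknowledgment false hit
print("heading match:", heading_hits)  # includes the record lacking the word
```

The word search returns the acknowledgment of "Dr. Wine" and misses the indexed article that never uses the word, mirroring the wine and ammonia examples above.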

CONCLUSION
Performing a direct and exact comparison between searches in Google Scholar and PubMed is not possible because the systems function very differently. For example, PubMed searches a well-defined set of journals, while Google Scholar includes resources beyond journals and the exact scope of its coverage is not extensively described. Because the systems are not searching identical data, the results are often different. Although these two systems are difficult to compare, it is still important to explore the differences between them. Librarians should understand the strengths and weaknesses of Google Scholar and be prepared to explain them to their users [14]. It may also be wise to consider including Google Scholar in bibliographic instructional sessions and to convey how it compares to other search interfaces [11]. For example, Google Scholar does not offer the number and extent of special searching and limiting features available in PubMed. However, Google Scholar provides some advantages in that it is an easy place to begin a search to find an initial retrieval of possibly worthwhile articles. It also offers searchers the ability to find citations to older items that they would miss if they used only PubMed. Additionally, Google Scholar has the potential to provide access to the gray literature. This increased access to a part of the biomedical literature, which can be difficult to search, may have implications for the public health field [25].
One of the most advantageous features of searching PubMed is the ability to utilize the MeSH vocabulary, as Google Scholar does not currently implement controlled vocabulary searching mechanisms. MeSH provides a powerful method of narrowing results and homing in on what the searcher needs. PubMed also offers substantially more features that allow searchers to narrow their retrieval to citations from clearly identified sources, as detailed in NLM's List of Journals Indexed for MEDLINE and List of Serials Indexed for Online Users [26]. The problem faced today by searchers is not a lack of information but rather an overload of information. For a researcher conducting human studies, writing a dissertation, finding information pertinent to patient care, or conducting an in-depth literature review, Google Scholar does not appear to be a replacement for PubMed, though it may serve effectively as an adjunct resource to complement databases with more fully developed searching features. It is important to note that both PubMed and Google Scholar are often upgraded with new features or with intended improvement of existing functions. It may be worthwhile to repeat this study in one or two years to determine if further refinements have improved their performance.

ACKNOWLEDGMENTS
The author thanks the following individuals who offered invaluable advice and support: Pauline Cochrane, Robin Beck, Sandra De Groote, AHIP, Victoria Pifalo, and Ann Carol Weller.

INTRODUCTION
Blogs are a relatively new medium in computer-mediated health communication and are regarded as highly opinionated journals maintained by millions of users who read and write personal remarks on issues ranging from news stories to health care [1][2][3]. Of the 120 million US adults with Internet access, 7%, or about 8 million people, have created blogs [4], and the increasing use of blogs has been reported in several studies [1,4,5]. Rainie found that the typical blogger is a young male Internet veteran; has a broadband connection; and is financially secure [5]. The gender of the blogger has also been a topic for research. Herring et al. found that even though women participate in blogging activities (focusing on emotional support), men are more likely to create filter blogs and k-logs (knowledge blogs), which are considered information-focused [1].
Blogs have been described as a new medium, one that shifts mainstream control of information into the hands of the audience. The potential use of blogs for cancer patients, basic scientists, clinical researchers, and practicing oncologists to discuss findings and suggestions has been envisioned in several cancer journals [6]. In addition, the use of online communication tools to share emotional support in all aspects of cancer-related issues has been frequently described [2,6]. While blogs are becoming more frequently researched, empirical studies regarding blogs and their users, especially cancer patients and their companions (defined for this study as patient family and friends), are noticeably lacking. Most research has been in the area of news media [7]. Some research has been reported in lexical (textual) analyses from studies designed to provide technological frameworks to classify blog messages for improved accessibility [8,9]. However, questions regarding motivation to post or comment on blogs and the perceived outcomes of using blogs still remain challenging research tasks [10,11]. Understanding how blogs are used can allow information providers to better understand the impact blogs can have on cancer patients, their friends, and families. This study used cluster analysis techniques to classify cancer blog users' demographics, as well as their use and perceptions of blogs.

METHODS
The study, approved by the University of Kentucky's Institutional Review Board, used an online survey to target users of blogs with cancer-related content. Invitations to participate in the survey were posted on 153 individual personal blogs* that were identified through bi-weekly searches by the authors between March 30 and June 3, 2006, using Google Blog Search [12]. Searches were limited to include only blogs with the word "cancer" in their titles, written in the English language, and with posts created within the past month.* In addition, the search was limited to frequently used blogging services, such as Blogspot, LiveJournal, and Typepad. Before participating in the survey, individuals read a study information sheet and provided consent to participate in the study.
Survey questions sought demographic information, usage, motivation, behavioral changes, and limitations of using blogs among cancer blog users (Appendix online). The survey questions were designed and modified based on previous cancer research and research on motivations for using the Internet [13][14][15]. In particular, questions about motivations for blog use were modified to better encompass cancer patients, family, and friends, because typologies from prior research focused on entertainment, diversion, and habitual motivations and were not appropriate or relevant for this study. In completing the survey, participants were asked to identify themselves as a cancer patient, companion, or health care provider.
Cluster analysis can be useful in identifying natural, homogeneous groupings of people in a manner that both minimizes within-group variation and maximizes between-group variation [16][17][18][19][20][21][22]. Because little is known about cancer blog users, this study used cluster analysis to classify users and ascertain patterns of characteristics represented by those groups. The optimal number of clusters was determined by visual analysis of clustering results, such as a dendrogram and scree diagram. The significance of the clustering results was based on the Best-Cut suggestion of Mojena's rules 1 and 2.
* The invitation was only posted in personal blogs that do not restrict postings and comments by the public.
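As a hedged illustration of the procedure described above, the sketch runs a bottom-up (agglomerative) hierarchical clustering, records the fusion height of each merge (the values a dendrogram or scree diagram displays), and applies Mojena's rule 1: cut the tree before the first fusion whose height exceeds the mean fusion height plus k standard deviations. The study does not report its linkage method, distance measure, or constant k, so average linkage, Euclidean distance, and k = 1.25 are assumptions, and the toy points stand in for standardized survey scores; they are not study data. Rule 2 is not shown.

```python
import math
import statistics
from itertools import combinations


def average_linkage(points, stop_at=1):
    """Agglomerative clustering with average linkage.

    Returns the remaining clusters (lists of point indices) and the
    fusion height of each merge, in merge order.
    """
    clusters = [[i] for i in range(len(points))]
    heights = []

    def linkage(a, b):
        # Average Euclidean distance over all cross-cluster point pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > stop_at:
        # Merge the closest pair of clusters (x < y by construction).
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        heights.append(linkage(clusters[x], clusters[y]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters, heights


def mojena_rule_1(heights, k=1.25):
    """Suggest a cluster count: stop before the first fusion whose height
    exceeds mean(heights) + k * sd(heights). k = 1.25 is an assumption."""
    n_points = len(heights) + 1
    threshold = statistics.mean(heights) + k * statistics.stdev(heights)
    for i, h in enumerate(heights):
        if h > threshold:
            return n_points - i  # i merges were completed before the jump
    return 1


# Toy 2-D scores with three well-separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

_, heights = average_linkage(points)
n_clusters = mojena_rule_1(heights)
clusters, _ = average_linkage(points, stop_at=n_clusters)
print(n_clusters, sorted(sorted(c) for c in clusters))
```

On these points the fusion heights jump from about 1.2 to about 14, so the rule suggests three clusters, matching the three groups built into the data. Library implementations (for example, SciPy's hierarchical clustering module) would be the practical choice for real survey data.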

Overall sample demographics
The survey was completed by 113 respondents; 59.29% (n = 67) were cancer patients, 31.86% (n = 36) were friends or family of cancer patients, and 6.19% (n = 7) were health care providers. Three participants did not answer this question. About 77% of respondents were female (n = 87), and 22.12% were male (n = 25). About 94% (n = 99) of the sample were Caucasian. The most frequently reported salary range was $60,000 to $75,000. The average age of the respondents was 57, and 91 (71.68%) respondents held bachelor's degrees or higher.

Characteristics of cancer bloggers
Cluster analysis revealed three clusters among the data. Visual analyses of the dendrogram and scree diagram (Figures 1 and 2, online) confirmed the three-cluster solution as the optimal number for the given data set. Table 1 displays each cluster group's general demographics, and Table 2 displays the means of the variables blog usage, motivation, behavioral changes, and limitations of blog use.

Demographic characteristics
Cluster 1 included 38 (33.63%) bloggers whose average age was slightly higher than that of the sample (40.82, SD 11.03, vs. 40.26, SD 12.31). Among the 38 members in cluster 1, 25 were patients and 9 were friends or family members. Over 71% (n = 27) of cluster 1 members answered that they hosted their own blogs, the highest proportion among the 3 clusters. Cluster 2 had the most members (n = 48, 42.48%). As in the other 2 clusters, Caucasian women dominated this group (n = 33, 68.75%). Eighteen of 30 bloggers in this cluster were single. The number of friends and family members in this group was 20, more than half of the total friends and family members in this study. Cluster 3 had the fewest male bloggers (n = 4, 16%), and the average age of its members was 40.63 (SD 11.96). Cluster 3 also included the fewest individuals who hosted their own blogs (n = 15, 55.55%).

Blog use
Cluster 1 used blogs for an average of 16.76 months, slightly less than the other 2 clusters. Cluster 2 reported a higher mean score (4.11) than the other 2 clusters for seeking health care providers as an information source. Cluster 3 sought information in medical libraries and patient education centers more frequently (mean score 2.92) than the other 2 clusters.

Motivations for blog use
Cluster 1 had slightly higher mean scores for using cancer blogs to seek cancer knowledge. Cluster 2 more frequently used blogs to express their own opinions. Across the 3 clusters in this study, encouraging others and sharing personal cancer stories were the primary motivators for blog use (mean score = 4.33 and 4.24, respectively). In cluster 1, seeking a second opinion, looking for timely updated information, and looking for compiled cancer information were the least motivating factors (mean scores = 2.84, 3.11, and 3.13, respectively). In addition, cluster 1 had the lowest mean score for expanding cancer knowledge (3.92) and cluster 2 had the lowest mean score for validating information (3.04).

Behavioral changes
Cluster 1 members encountered fewer limitations for using cancer blogs than the other two clusters. Members in clusters 2 and 3 indicated that poor searching functions restricted their participation in cancer blogs, while members in cluster 1 indicated less agreement with that statement.

Summary of cluster characteristics
Based on interpretation of the survey results, Table 3 summarizes major characteristics of the 3 clusters found in this study. Cluster 1 (n = 38, 33.63%) was more likely to include new bloggers who were motivated to seek compiled information and were frequent online information seekers. Cluster 2 (n = 48, 42.48%) was more likely to include long-time cancer blog users who also used traditional sources for information seeking. Individuals in cluster 3 (n = 27, 23.89%) were highly motivated and sought medically related information. In addition, bloggers in cluster 3 made the most frequent behavioral changes while using cancer blogs. Further details are described in Table 3.

Table 3 Summary of cluster characteristics*

Cluster 1 (n = 38, 33.63%)
Type of bloggers: New bloggers, motivated for compiled information, least influenced, frequent online information seekers
Characteristics:
- Had the fewest months of blog use
- Frequently commented on blogs by others
- Had the highest self-efficacy reported
- Sought health care providers for cancer information
- Read blogs for compiled cancer information and to seek help for others
- Read blogs to communicate with others
- Were the least affected in changing care options
- Had the least influence on preventing cancer
- Had the smallest decrease in cost and length of hospital stay
- Were highly satisfied with blog information
- Rarely had limitations of blog use

Cluster 2 (n = 48, 42.48%)
Type of bloggers: Long-timers, traditional source seekers
Characteristics:
- Had the most months of blog use
- Sought medical libraries for cancer information
- Read blogs to express own opinions and to seek second opinion
- Were least interested in reading blogs to seek encouragement
- Received strongest empowerment through blogs
- Solidified existing relationships
- Had a moderate level of limitations of blog use

Cluster 3 (n = 27, 23.89%)
Type of bloggers: Highly motivated group, medically related information seekers, most frequent behavioral change group
Characteristics:
- Sought mass media and online discussion groups for cancer information
- Most frequently read blogs by others and least frequently posted comments on blogs by others
- Read blogs to get timely updated information, to expand cancer knowledge, and to prepare background information
- Read blogs to encourage others, to receive emotional support, and to share personal cancer stories
- Obtained greatest benefits to change current care options, to further discussion with physicians, to prevent cancer, and to find alternative care
- Found blogs helpful for expressing frustration and forming relationships
- Had the highest score on the limitations of blog use

* Characteristics grouped and assessed by author interpretation of mean scores in Table 2.

DISCUSSION/CONCLUSION
This study used an agglomerative, hierarchical clustering method to classify characteristics of unique groups among cancer bloggers. This technique was most appropriate given that there was limited prior knowledge of the underlying structure and nonhierarchical methods could not clearly and objectively determine the number of clusters in a data set. Moreover, this method is useful when a study is still in its exploratory phase.
Some demographic data found in this study differed from previously reported data. Results from the analysis illustrated a dominant demographic group across all clusters: highly educated Caucasian females. This profile differs from other research findings about online cancer information seekers [5]. In this study, the bloggers were also older (average age fifty-seven) than in other studies. As older patients and their friends and family seek information on health-related issues (especially cancer), there are potential roles for using this new technological medium to deliver cancer information.
The study findings suggest that blogs are used more frequently to share emotional support and personal stories than medical knowledge, thus agreeing with reports and research that indicate blogs have gained their popularity over the past few years by supporting personal narratives, political commentaries, or accounts of personal experiences. This study confirmed the findings of previous research that suggested the use of blogs can lead cancer patients and their companions to engage in meaningful conversation and that sharing personal experiences via blogs may help patients better cope with their cancer-related health conditions [6].
These results can inform the design of cancer blogs that provide customized (or personalized) assistance depending on the category of cancer blog users and their distinct characteristics. For instance, more attention by cancer information specialists (including medical librarians) might be given to people in cluster 1 because their motivation is to expand their knowledge about cancer as compared to people in cluster 2 who use blogs to seek emotional support.
Additionally, medical librarianship should not overlook bloggers and their uses, because blogs can be used as a health communication medium to disseminate cancer health information. In this sense, further in-depth analysis of cancer blog messages, including both posts and comments, may be beneficial in providing subject categorization for unorganized blog content. Medical librarians can thus play a key role in making information on blogs more easily accessible.
While this study provides useful findings, it has some limitations. The study sample is small and includes primarily Caucasian patients with bachelor's degrees. In addition, the use of convenience sampling and self-reported data could present bias in the reported results. Future studies should target a larger pool of participants in a longitudinal setting for more valid, reliable, and generalizable findings.

INTRODUCTION
Responding to recent changes in the scholarly publishing process, Coy C. Carpenter Library is expanding its scholarly communications program to better support the research publication efforts of the faculty at Wake Forest University Health Sciences (WFUHS).
Recent advances in open access publishing and archiving initiatives, adoption of the US National Institutes of Health (NIH) "Policy on Enhancing Public Access to Archived Publications Resulting from NIH-Funded Research" (Public Access Policy) in 2005, the rapidly increasing pool of published biomedical research, rising costs of subscription rates, and continued barriers to access have necessitated an internal redesign of the library's Faculty Publications (FP) database. Changes in the scholarly publishing environment have also spurred the creation of online resource lists specifically addressing common issues in scholarly communications, including copyright and intellectual property ownership, open access, and the importance of scholarly publishing [1][2][3].
These efforts, coupled with plans for educational sessions on open access and copyright retention for faculty, are intended to address common questions raised during the publishing process. In particular, the FP database will bridge faculty publication citations to individuals' personnel profiles in the university's human resources department's management software, PeopleSoft, and to the full text of faculty-authored journal articles, thus providing the institution with a more complete picture of WFUHS faculty research initiatives and outcomes. This paper illustrates key objectives in Carpenter Library's strategy for supporting scholarly communications through enhancing the knowledge management applications of the FP database.

BACKGROUND
Since 1977, Carpenter Library has been responsible for collecting and organizing publication citations of WFUHS faculty. The goal of this effort has been to support the research and publishing activities of the faculty and to provide information to the Office of the Dean of the Wake Forest University School of Medicine, a division of WFUHS, for faculty promotion and advancement. From the beginning, reports have been created to track faculty publication activity. The paper-based system was automated in 1988 as the FP database in Cuadra Star, and in 1991 the information was moved to Microsoft Access. Only published, scholarly materials (articles, abstracts, book reviews, chapters, books, editorials, and letters) are included in the database. Until recently, citations from the database were formatted and printed as an appendix to the Dean's Annual Report (DAR). Now the library maintains a searchable Web interface for departments as well as visitors. Statistics are compiled for the DAR as totals by document type (Table 1) and by department.

PROCESS
As mandated by the office of the dean, faculty must submit all publication information, including the bibliographic citation and a copy of the first page of the material, to the library. The library maintains a Faculty Publications Web page with submission forms, submission guidelines, a link to the previous year's DAR, and library contact information [4]. Approximately 60% of the publications are received directly from faculty; library staff gather the remaining 40% from bibliographic databases such as PubMed, BIOSIS, and Web of Science. Staff verify all submitted materials and have maintained very high standards of accuracy over the years. Bibliographic information is entered into the database, and a Web-based interface allows departments to track submissions and serves as a research tool for other users.
In 2005, a committee that included library representatives was formed to create a Faculty Information (FI) database for the school of medicine. The database would include biographical and professional information for faculty. The goal of the project is to connect faculty profiles with research interests, grants, internal protocols, and publications, all easily accessible to authorized users. WFUHS has used PeopleSoft software for many years to manage vital information, salary data, and personnel activity, and the committee decided to integrate professional information such as research interests, grants, and publications into this unified source. To this end, a team of library and information technology staff created a new interface for faculty publication data entry in PeopleSoft, which links records to information in the FI database through employee identification numbers. Figure 1 shows the FP database input screen, still under development, which includes fields linking users to full-text articles via the digital object identifier (DOI), uniform resource locator (URL), or the unique identifier used in PubMed Central (PMCID).
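The linking scheme described above, in which publication records connect to FI profiles through employee identification numbers and carry DOI, URL, or PMCID fields for full text, can be sketched as a small data model. This is a minimal Python sketch under stated assumptions: the class names, field names, sample identifiers, and the link preference order (free PMC copy first, then the DOI resolver, then a stored URL) are hypothetical illustrations, not the actual PeopleSoft or FP database schema. Each citation carries a list of employee IDs because one article may have several WFUHS authors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


# Hypothetical record layouts: field names are illustrative only and do
# not reflect the actual PeopleSoft or FP database schema.
@dataclass
class FacultyProfile:
    employee_id: str  # key assumed to be shared by the FI and FP databases
    name: str
    department: str


@dataclass
class PublicationRecord:
    citation: str
    # One article may have several WFUHS authors, so keep a list of IDs.
    employee_ids: List[str] = field(default_factory=list)
    doi: Optional[str] = None
    url: Optional[str] = None
    pmcid: Optional[str] = None  # e.g. "PMC1234567"


def full_text_link(rec: PublicationRecord) -> Optional[str]:
    """Pick one full-text link: free PMC copy first, then the DOI
    resolver, then any stored URL; None if nothing is recorded."""
    if rec.pmcid:
        return f"https://www.ncbi.nlm.nih.gov/pmc/articles/{rec.pmcid}/"
    if rec.doi:
        return f"https://doi.org/{rec.doi}"
    return rec.url


def publications_by_faculty(
    profiles: List[FacultyProfile], records: List[PublicationRecord]
) -> Dict[str, Tuple[FacultyProfile, List[PublicationRecord]]]:
    """Attach each publication to every known author via employee ID.
    IDs that match no profile are skipped."""
    grouped = {p.employee_id: (p, []) for p in profiles}
    for rec in records:
        for emp_id in rec.employee_ids:
            if emp_id in grouped:
                grouped[emp_id][1].append(rec)
    return grouped


if __name__ == "__main__":
    profiles = [FacultyProfile("E001", "A. Smith", "Cardiology")]
    records = [
        # Sample identifiers below are made up for illustration.
        PublicationRecord("Smith A. Example article. 2006.", ["E001"],
                          doi="10.1000/example", pmcid="PMC1234567"),
        PublicationRecord("Smith A. Example letter. 2007.", ["E001"],
                          url="https://example.org/letter"),
    ]
    for profile, recs in publications_by_faculty(profiles, records).values():
        for rec in recs:
            print(profile.name, "->", full_text_link(rec))
```

Preferring the PMCID in this sketch reflects the library's aim of pointing users to freely accessible full text; a DOI link may resolve to a publisher page that requires a subscription.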

NEW DEVELOPMENTS
In light of rapid increases in both the volume of scholarly research articles published annually and the subscription costs of scientific, technical, and medical journals [2,3,5], momentum is building in the scholarly communications community for broader, less restrictive access to vital scientific and biomedical research information, particularly taxpayer-funded research [5–8]. NIH, at the behest of the US Congress, adopted the Public Access Policy on May 2, 2005 [9]. This policy requests that copies of all peer-reviewed scholarly articles resulting from NIH-funded research projects be archived in PubMed Central (PMC) [10]. Created in February 2000, PMC is a freely accessible, full-text digital repository of peer-reviewed scholarly articles from biomedical and life sciences journals [9]. As of September 2006, PMC held approximately 700,000 articles [11], a figure that includes both materials submitted in response to the NIH policy and articles deposited by publishers.
Although some WFUHS faculty members have complied with the NIH Public Access Policy's archiving request, the library advocates increasing the percentage of archived publications. The institutional contribution rate (ICR) for WFUHS, a measure modeled on a search query developed at the Health Sciences Library and Informatics Center, University of New Mexico [12], is 4.31%.* This figure compares the number of author manuscripts archived in PMC that were published between the May 2, 2005, adoption of the NIH Public Access Policy and May 7, 2007 (n = 20 for WFUHS) with the number of eligible articles cited in PubMed that were published during the same period (n = 464 WFUHS-authored articles resulting from NIH-funded research). This compliance rate is disproportionately low given the number of publications likely to result from the 1,279 NIH awards granted to WFUHS faculty during the past five fiscal years (July 1, 2001–June 30, 2006), which accounted for 68.29% of all external funding received by WFUHS researchers during that time [13,14].†
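The ICR is a simple ratio of archived manuscripts to eligible articles. Worked through with the two counts given above (the underlying search query itself is not reproduced here), it is:

```latex
\[
\mathrm{ICR}
  = \frac{\text{author manuscripts archived in PMC}}
         {\text{eligible NIH-funded articles cited in PubMed}}
  = \frac{20}{464}
  \approx 0.0431 = 4.31\%
\]
```

Put another way, 444 of the 464 eligible WFUHS articles (about 95.7%) had no archived author manuscript in PMC for this period.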

FUTURE OBJECTIVES
Difficulty in accurately tracking the number of publications stemming from individual grants should diminish with the changes to and increased functionality of the FP database. Currently, records in the FP database include a DOI link to each article regardless of whether the full text is available through a WFUHS library subscription. The FP staff plans to add the PMCID to each relevant record, allowing users to link to the freely accessible full text of faculty research articles and, in turn, encouraging PMC submission. Additionally, users will be able to link from a citation to other information in the FI database, such as grant information, research protocols, and funding data. This information is useful, if not required, when applying for and renewing federal grants and when filing grant progress reports. The database should also play a valuable role in promoting the exchange of internal knowledge about research interests and findings and could potentially foster connections among researchers.

† Data exclude subcontract awards. Because of the length of time typically needed to produce publishable research results, and allowing for the varying time needed to complete prepublication peer review, data are given for the past five fiscal years on the reasonable assumption that most National Institutes of Health (NIH)-funded research projects supported during that period would yield articles eligible for author manuscript archiving under the NIH Public Access Policy.

PROMOTING OPEN ACCESS
The library recognizes that current low rates of compliance with PMC archiving might be due to confusion about, and general lack of awareness of, the NIH Public Access Policy and is tailoring its scholarly communications program to better support faculty research endeavors. Previously created "toolkits," lists of online and print resources appearing on the library's Website, have been updated and are now centrally located in the site's Scholarly Publishing Assistance section. Resource pages relevant to scholarly communications cover the NIH Public Access Policy, scholarly publishing and open access, copyright and intellectual property, and scientific writing [15]. The need for straightforward resource pages was identified in an interdepartmental meeting hosted by the library with representatives from the office of the dean and the office of research.
Beyond the redesigns of the FP database and the Scholarly Publishing Assistance section of the library's Website, direct educational outreach programs are under development. Plans include "lunch and learn" seminars delivered to faculty in individual departments, during which a librarian will give a ten- to fifteen-minute presentation on topics including open access archiving, copyright permission and retention, the NIH Public Access Policy and PMC, current compliance rates, and the ways these issues affect researchers. A question period will follow the presentation, for an estimated total program length of thirty minutes. By bringing these seminars directly to departments and limiting each program to half an hour, library staff hope to encourage broad faculty attendance.

CONCLUSION
Continued growth of the scholarly communications program at Carpenter Library will be shaped by the success of current and future initiatives undertaken by the library, as well as by changes in the broader scholarly communications community. Potential open access archiving requirements imposed by funding sources or government bodies will also guide library staff in prioritizing their efforts to support faculty researchers. Enhancement of the FP database and its anticipated interconnectivity with the FI database should increase functionality and accountability in the research tracking efforts of faculty and administrators. Libraries interested in highlighting the research achievements of faculty but lacking the resources necessary to build and maintain an institutional repository might consider creating a bibliographic database of faculty publication citations with links to full text available elsewhere. Librarians can use such databases to engage faculty and administrators in dialogue about the advantages of open access, compliance with the NIH Public Access Policy, and the importance of understanding copyright, and to provide quantifiable evidence of the knowledge concentrated in their parent institutions. These efforts further underscore libraries' valuable role in facilitating the expansion of scientific information.