Expanding roles in a library-based bioinformatics service program: a case study

Question: How can a library-based bioinformatics support program be implemented and expanded to continuously support the growing and changing needs of the research community?


INTRODUCTION
Rapid advances in molecular technologies during the past 2 decades have generated an unprecedented quantity of biomedical data. An arsenal of bioinformatics databases and software tools has been, and continues to be, created to assist researchers in analyzing, manipulating, and interpreting these data. As reported in Nucleic Acids Research, the number of online molecular biology databases has increased from 58 in 1996 to 1,512 in 2012, a 26-fold increase [1]. Data analysis tools have increased over the same period at an even higher rate [2]. Understandably, investigators are focused on their own research, and they and their teams find it exceptionally difficult to keep up with the latest research analysis tools in even their own specialized fields. Furthermore, the interdisciplinary nature of bioinformatics compounds this problem by increasing the complexity and expanding the knowledge base needed to take advantage of the capabilities of these resources. To complete their work, university researchers require access to bioinformatics tools and support in identifying and using these resources.
The health sciences library is a logical provider of bioinformatics resources and services in that it exists to meet the multidisciplinary information needs of all members of the university's research community. Instruction, consulting services, and licensing support are provided by libraries to faculty, students, and staff at no charge. No other academic unit has this same institutional mission.
The first library-based bioinformatics service program was launched in 1995 by the University of Washington Health Sciences Library. This program pioneered the use of a bioinformatics specialist with a doctoral degree in molecular and cellular biology to staff the program. Services provided included consultation, education, and training on bioinformatics tools; access to networked biological information systems; and development of a web portal to bioinformatics resources [3]. Over the next decade, similar programs were instituted at approximately twenty additional academic health sciences libraries [4-6]. In many cases, the library also assumed responsibility for the centralized licensing of commercial database suites and/or software analysis tools for researchers.
This paper offers a case study of the implementation and growth of the Bioinformatics Service Program at the University of Southern California's (USC's) Norris Medical Library (NML), and the library's actions to reassess its users' bioinformatics needs, secure funding, and expand the program to meet the growing and evolving needs of the university's research community.

ESTABLISHING THE PROGRAM
Although USC's NML had been providing individual training and workshops in the areas of molecular biology and genetics since 1990, the growing complexity of researcher needs and resources made it clear that the services provided by the NML librarian needed to be carried out by an individual with a strong background in the biosciences. In 2005, the library hired an individual with a doctoral degree (PhD) in life science to fill a vacant reference librarian position, and the Bioinformatics Service Program was established. Similar to other existing health sciences library bioinformatics programs, the primary focus of the new program at USC was on providing a consulting service, training workshops, and a web portal for bioinformatics resources [7].
Supplemental Table 5, Figure 2, and a supplemental appendix are available with the online version of this journal.
To promote its services and develop a client base for the library-based program, the bioinformatics specialist compiled an email list consisting of over 1,000 faculty members, postdocs, and staff researchers who are active in biomedical and life sciences research. The list, including updates, has been routinely used to disseminate announcements on new services and training workshops. Presentations were given at several departments or schools by invitation from researchers or faculty who had utilized the bioinformatics program. These presentations consisted of either overviews of the service and software collection or training workshops on specific research topics.
In 2008, in response to the growing demand for access to high-quality data analysis tools, the library purchased single seats for 3 commercial resources and provided free access to them for USC users. Commercial bioinformatics software programs are relatively easy to use and have reliable user support, and many contain high-quality, human-curated, and proprietary-knowledge content. The programs selected to address the major user needs were: Partek Genomics Suite for microarray data analysis, Ingenuity Pathway Analysis (IPA) for protein interaction network and pathway analysis, and Vector NTI Advance for molecular sequence analysis. In addition to their functions and scientific merits, these specific software tools were selected on the basis of feedback from USC users who attended vendor demonstrations set up by the bioinformatics specialist. During the first 2 years following the software purchase, more than 400 users registered for access to the software. Also in 2008, the bioinformatics specialist began to perform in-depth data analysis for researchers and, as a result, coauthored papers with more than a dozen faculty members. These collaborations served to further increase awareness of and the reputation of the program.
By 2009, the demand for bioinformatics services could no longer be met by one bioinformatics specialist. The volume of questions grew as requests became more diverse and complex, and many requests for consultations, training, and data analysis assistance had to be deferred or turned away due to time constraints. At the same time, access to the small number of USC-licensed commercial software programs became increasingly competitive due to the limited number of licensed "seats."

REASSESSING NEEDS FOR BIOINFORMATICS SUPPORT
The existing service program was no longer sufficient to meet the needs of the USC research community, and it was imperative that additional funding be found to sustain and grow the program. To better understand what users needed and assess the usefulness of existing service offerings, a survey was sent to over 1,000 biomedical researchers through the program's mailing list in May 2010 (Appendix A, online only). A total of 254 researchers completed the survey, with faculty members as the largest percentage of respondents (38%), followed by graduate students (34%) and postdoctoral scholars (19%). Survey results showed that workshop training and in-person consultations were considered the most useful services provided by the program, although all services were highly rated (Table 1). All 3 commercial tools were rated by the majority (>60%) as "extremely useful" or "potentially useful," with IPA as the top-rated tool (Table 2). At least 20%-40% of users had difficulty accessing the software tools due to limited availability (Table 3).
Strong interest was expressed in obtaining support for various types of data analysis. Most services were seen as desirable by users. All but one of the analysis needs listed were ranked as very useful or extremely useful by a majority of respondents (Figure 1).
The survey results confirmed an overwhelming need for bioinformatics support at USC and clearly showed that the greatest unmet need was for high-throughput data analysis support (related categories marked in Figure 1). Results also revealed that 35% of responding researchers had data they could not analyze because they lacked access to the appropriate software tools, and that 58% had data they could not analyze because they lacked sufficient training in using the appropriate tools. Respondents also remarked on the shortage of staff in NML's bioinformatics service and on the need to increase the size and scope of the licensed software collection.

RESOLVING FUNDING ISSUES
For the first four years, the bioinformatics program was funded wholly by NML. By reallocating personnel costs through staff reorganization, the personnel cost for one bioinformatics specialist was covered. The library budget also was used to make a modest investment in three bioinformatics software licenses for the university. This proved to be a critically important factor for later funding requests: statistics collected during this time were later used to demonstrate high usage and demand for the tools and the service as a whole. In addition, prominent faculty members and researchers who depended on the program's resources during this period provided strong letters of support and advocated for the program when broader funding from the university was sought. Two primary sources for additional support were targeted: (1) the USC Libraries, which has a separate budget from the health sciences libraries, and (2) USC's Office of Research. Because the bioinformatics program has an interdisciplinary focus and its user community encompasses disciplines throughout the university, it made sense for the USC Libraries to contribute funding. The Office of Research has at its core mission the charge to grow research and provide support to grant-seeking faculty. The essential role of bioinformatics in research is underscored by the fact that using the proper software tool can save months of unnecessary, or flawed, analytical work. Consulting with a bioinformatics specialist can easily make the difference between whether or not researchers receive funding for their projects, publish their findings in top-tier journals, or make discoveries that will have a significant impact on patients' lives. The Office of Research viewed the NML Bioinformatics Service Program as a useful partner in furthering its own mission. 
After four years of experience demonstrating how the program could significantly benefit the university community and persistent advocacy efforts by the library director and bioinformatics specialist, both of these institutional entities eventually decided to provide a significant level of financial support. A summary of program expenses and sources of funding is provided in Table 4.

ENHANCING PROGRAM SERVICES
In 2010, using funds provided by the university library, an additional bioinformatics specialist with a dual master's degree in bioinformatics and biochemistry was hired. Additional seats were purchased for the three software programs already licensed for the program, and licenses for five additional commercial resources were purchased using the funds made available from the university's Office of Research. The expanded software collection, currently consisting of eight resources (Table 5, online only), and the increased number of seats allowed the NML Bioinformatics Service Program to better address the constellation of data analysis needs of its users, as the total number of registered software users tripled in the following two years (Figure 2, available only online).

Figure 1
Categorized data analysis needs at the University of Southern California. * High-throughput data analysis-related bioinformatics support needs.

Prior to 2009, the NML Bioinformatics Service Program held in-house workshops featuring multiple resources suitable for a specific research topic in order to help users identify the most appropriate tools to use for their research. Although useful, these workshops often lacked detailed instructions for utilizing the presented resources. Realizing that more thorough instruction was vital for getting researchers started in using a new tool, the bioinformatics specialists sought a way to incorporate in-depth, resource-oriented training into service offerings. Starting in 2009, outside trainers with extensive expertise, such as software field application specialists or program developers, were invited to conduct on-site or webinar training sessions. The training costs were included in the annual fee negotiated for licensing the software. These sessions elaborated on the key functionalities and latest features of a software tool and provided detailed instructions on using the tool to perform specific analyses. To enhance the user experience, live demonstrations and hands-on practice sessions were included whenever possible. The invited trainer workshops have appealed to a large audience over the past two years (Figure 3). By leveraging outside expertise for user training, a considerable amount of the bioinformatics specialists' time and effort has been saved.
In addition to live training, the bioinformatics specialists developed online information portals for key software programs and in-house workshops to promote self-training. These portals included general information, training information (schedules, presentation slides, handouts, and recordings), links to online tutorials and frequently asked questions (FAQ) pages, and contact information for technical support. A total of 15 subject guides have been developed since 2010, including 12 guides on software tools and 2 on training workshops. One guide serves as a "multimedia classroom" to host all workshop sessions that have been previously recorded for later viewing [8]. These guides receive approximately 800-1,000 visits per month.

RESPONDING TO NEXT GENERATION SEQUENCING
Evolving biotechnologies markedly shifted the areas in which bioinformatics support was needed. One noticeable change was the adoption of next generation sequencing (NGS) in biomedical research, a technique that measures genome-wide biomolecular changes at single-nucleotide resolution [9-12]. Unlike previous generation techniques, NGS generates enormous and highly complex datasets that require greater bioinformatics expertise as well as substantial computational power [13]. While workshops and consultations help elucidate the principles of NGS data analysis, users remain unable to perform the analysis without access to appropriate software and adequate high-performance computers. A short survey sent to the program's mailing list in 2011 showed that 53% (n=38) of respondents reported a concern about "lack of expertise and software tools." Notably, 46% of respondents reported they expected to have NGS data in the next 6-12 months.
To address this emerging need, the NML Bioinformatics Service Program sought a combined software and hardware solution for NGS data analysis. After reviewing several commercially available NGS data analysis programs on the basis of their functionality, performance, and usability, two software suites were selected: the proprietary Partek Flow product and the open-access Galaxy suite. Both software suites integrate multiple tools for analyzing various NGS data and have implemented a user-friendly interface to facilitate the use of these command-line tools by biologists.
Computing resources suitable for performing NGS analysis were then investigated. Institutional bioinformatics cores have implemented powerful computer workstations [14,15], computer clusters [16-18], and cloud-based NGS solutions [19]. Computer clusters and cloud-based NGS solutions provide ample computational power, whereas computer workstations have only moderate computational power; however, workstations may be more flexible and accessible. When preinstalled with bioinformatics tools, workstations can serve as walk-in or remote-access stations for various bioinformatics applications in addition to performing NGS analysis [14,20]. After balancing the advantages and disadvantages and extensive discussion with USC's High-Performance Computing and Communication (HPCC) [21], the library purchased two computer workstations for its Bioinformatics Computation and Consulting Center and five computer nodes in HPCC (configured into a custom cluster named HPCC-NML) [22,23].
Once the software and hardware were in place, a student intern with shell scripting and high-performance computing experience was recruited to assist with software installation and configuration. The bioinformatics specialists then tested the implementation extensively using datasets provided by USC researchers. From their establishment in October 2012 through March 2013, the workstations and HPCC-NML averaged more than 400 hours of usage per week on a 24/7 basis; most usage was remote. Usage continues to grow quickly as users produce more data.

FACILITATING RESEARCH COLLABORATION
Bioinformatics has become an indispensable component of increasingly interdisciplinary biomedical research. Typically, bench-top researchers rely on personal communication to identify bioinformatics researchers for possible collaborations. As biomedical researchers generate more genomic data with increasing complexity, their bioinformatics needs become more diverse and specific, and the traditional method of identifying and establishing bioinformatics collaborations is no longer effective. This situation provides an opportunity for bioinformatics service programs to serve as a conduit for identifying campus collaborators.
Through years of daily consultations and collaborative data analysis projects, the library-based bioinformatics program has gathered considerable awareness of research projects and expertise at USC. Bioinformatics clients have been referred to other service providers on campus when their expertise was relevant. With firsthand information on both bioinformatics needs and offerings at USC, the two library-based bioinformatics specialists were in a natural position to promote on-campus collaborations. To exploit this role, the NML Bioinformatics Service Program sponsored half-day, campus-wide collaboration symposia, which were held to facilitate exploration of potential collaborations.
The first symposium, "Navigating an Ocean of 'Omics Data with Bioinformatics/Biostatistics Collaborations," was held in 2011 to promote collaboration between university labs. Fourteen high-profile speakers, including biomedical researchers, computational biologists, and biostatisticians, were invited to present 30-minute talks. The focus of the talks differed between the bioinformatics clients and providers: biomedical researchers introduced successful collaborative experiences, whereas computational biologists and biostatisticians showcased cutting-edge methods developed to analyze real-life data. The collaboration symposium was attended by 183 registrants, 73 (40%) of whom were faculty members.
The second symposium, "Resources for Next-Generation Sequencing," was held in 2012 and focused on raising awareness of various bioinformatics services and promoting lab-central service collaborations. Representatives of 5 major NGS service providers at USC, including sequencing core facilities, bioinformatics and biostatistics services, and the USC Clinical and Translational Science Institute, were invited to provide an overview of their services. Each presenter introduced the missions, types of services, and charging models (if any) for their services and discussed the concerns that affect NGS experimental design and data analysis. The 2012 NGS symposium was attended by more than 120 registrants, 43 (36%) of whom were faculty members.
Statistics plays an important role in experimental design and data analysis. In 2012, the NML Bioinformatics Service Program supplemented its range of services by providing access to free statistical consulting for users with general statistical questions. Although USC's Information and Technology Service had been offering this service on the University Park campus for some time, no site on the health sciences campus had been designated for holding these consultations. Once the PhD consultant began to offer biweekly sessions in the NML Bioinformatics Computation and Consulting Center, the service became heavily used, with an average of nine users and more than seven hours of consultation each week.

LESSONS LEARNED
Based on this experience, the authors believe that it is critically important to staff library-based bioinformatics service programs with individuals who possess a strong science background at the graduate degree level and, preferably, practical research experience. The library-based bioinformatics specialist must have the ability to communicate effectively in the language of the researcher.
Conducting a comprehensive needs assessment has proved to be an effective method for the bioinformatics program to gauge the needs of potential users. Results of the needs assessment revealed the problems faced by researchers and helped to define the services and resources that the program would use to address these problems, such as increasing the focus on high-throughput data (including NGS) analysis.
Commercially available bioinformatics software and analytic tools are expensive to license. In some library-based programs, fees are charged for access to licensed bioinformatics resources. To date, NML has elected not to pursue a fee-based model. In the view of the library, fees create barriers for many researchers and small labs that do not have the same ability as well-funded labs and grant-supported faculty to pay for access to resources. The library takes the position that these costs should be part of the institutional infrastructure [24]. As with other resources (journals, books, databases) provided by the library to support the educational, research, and clinical needs of its users, the bioinformatics tools should be freely available to all including graduate students, postdocs, and others who might be unable to access them otherwise.
As with traditional library resources, bioinformatics tools require personnel to ensure the resources are acquired through site licenses, promoted, and used effectively. For researchers to select the most appropriate tool and apply it to their data analysis, it is essential that they also receive appropriate educational support. The library is the only unit in the university setting with the mission to provide each of these critical service roles. However, to be a strong partner in furthering these research goals of the university, the library must also have the institution's commitment to provide additional financial support.
Establishing a bioinformatics service program does not ensure its usage. To be successful, library-based programs must put substantial effort into outreach activities [3,4,25]. Conducting regular consultations, attending in-house research presentations, and organizing campus-wide events around library-based bioinformatics resources have created excellent opportunities for the bioinformatics specialists to interact with users.
Although a library-based bioinformatics service is only one of many bioinformatics support providers in the institution, its unique service-oriented mission places it in an important position of having extensive knowledge of both the bioinformatics needs and the bioinformatics resources available at the university. This extensive knowledge allows the library-based bioinformatics staff to effectively promote intra-institutional collaboration by matching individuals with similar research interests and connecting researchers with services and tools dispersed throughout campus. The popularity of the two symposia organized by the NML Bioinformatics Service Program to encourage collaboration serves as confirmation of the value that the research community places on identifying on-campus collaborators.
Library-based bioinformatics service programs require substantial commitments on the part of the library and the institution. Despite the required efforts and the major commitment of needed resources, it is our belief that the benefits far outweigh the costs of such a program. Continual feedback from USC researchers suggests that the NML bioinformatics program is one of the most significant contributions the library has made to the work of the research community at USC. Researchers with access to appropriate bioinformatics resources and training on how to use them effectively are in a position to significantly shorten the data analysis cycle, work with a much wider range of data, and increase their competitiveness for grant applications.
Our ultimate goal is to shape the NML-based bioinformatics program into an indispensable component of the USC research community. To meet this challenge, the library's bioinformatics specialists will continue to support the university's researchers in infusing bioinformatics tools and solutions into their daily routine to promote research efficiency, sharpen research focus, and polish research hypotheses. Li et al.