Starting the data conversation: informing data services at an academic health sciences library

Objective: The research obtained information to plan data-related products and services. Methods: Biomedical researchers in an academic medical center were selected using purposive sampling and interviewed using open-ended questions based on a literature review. Interviews were conducted until saturation was achieved. Results: Interview responses informed library planners about researchers’ key data issues. Conclusions: This approach proved valuable for planning data management products and services and raising library visibility among clients in the research data realm.


INTRODUCTION
Over the past five to ten years, libraries have begun to provide data-related services to researchers. Examples include assisting researchers in complying with the data management and sharing requirements of federally funded grants (e.g., National Institutes of Health, National Science Foundation) [1-4]; providing guidance for developing workflows and standard data collection procedures [1-4]; training researchers on how to better organize, store, and preserve their data [1,3,4]; and building searchable interfaces to provide a level of discovery and access for research datasets [1,3-5]. Health sciences libraries, however, have been slow to develop these services [6,7]. New data sharing and data management initiatives from the National Institutes of Health (Big Data to Knowledge) and the publisher Public Library of Science (PLOS) [8,9] have created new opportunities for librarians, in particular for health sciences librarians, to expand their roles in data services.
This paper describes the methodology that the authors used to identify researchers to interview, reach out to those researchers, and conduct the interviews. It describes key findings from the interviews about the challenges that researchers face when collecting and managing data.

METHODS
The authors, located in an academic health sciences library, completed a series of interviews as a means to assess their research community and the challenges that the researchers face when collecting, managing, storing, and preserving their research data. These interviews were also designed to build connections with the researcher community. They were intended to inform the development of library products and services, including an institutional data catalog describing datasets created by researchers at the medical center, and they led to the development of a tool to help basic science labs better manage their research data.

Developing interview questions
The authors performed a literature review to identify studies that evaluated the data-related challenges and needs of an institution's researchers. The library then selected a number of interview questions from previous studies that were deemed most appropriate for understanding researchers' data management challenges [12,16,18,20,26]. Questions taken from previous studies were adapted to make the interviews more conversational and open-ended. Additional interview questions were developed by the library to create a conversational interaction (Appendix A, online only). The rationale was that if the interviews had a conversational tone, researchers would be more likely to elaborate on their answers, providing more in-depth information and bringing to light issues about which the librarians would not have thought to ask, due to the differences between their perspective and the researchers'.

Selecting study participants
Researchers with active grant funding were selected. Data from the institution's grants management tool were used to identify eligible participants. The tool provided data on institutional researchers, including their administrative department, grant funding agency, grant title, and contact information. Using these data, the authors identified and purposively selected researchers based on their expected data service needs, types of research (e.g., basic science, clinical research), levels of research experience, and involvement in big data research. Selected participants were sent an email outlining the librarians' intention to learn more about their data-related needs. Two attempts were made to reach each researcher, after which a lack of response resulted in the researcher's removal from the list of potential interviewees. The authors interviewed individual researchers until theoretical saturation was achieved, such that no new insights into key requirements for library data services were identified.
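One way to make the stopping rule concrete is sketched below in Python. This is an illustration only, not the procedure the authors report: the function name, the `window` parameter, and the idea of declaring saturation after a fixed run of interviews with no new themes are all assumptions introduced here.

```python
def interviews_until_saturation(interview_themes, window=3):
    """Return the number of interviews conducted when saturation is declared.

    interview_themes: iterable of sets, one set of coded themes per interview,
    in the order the interviews took place.
    window: hypothetical threshold; saturation is declared once this many
    consecutive interviews contribute no theme that was not already seen.
    Returns None if the threshold is never reached.
    """
    seen = set()
    run_without_new = 0
    for count, themes in enumerate(interview_themes, start=1):
        new_themes = set(themes) - seen
        seen |= new_themes
        # Reset the run whenever an interview surfaces something new.
        run_without_new = 0 if new_themes else run_without_new + 1
        if run_without_new >= window:
            return count
    return None
```

In practice, saturation in qualitative work is usually a judgment made by the research team, so a fixed threshold like this would at most support that judgment.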

Conducting the interviews
Prior to each interview, the librarians reviewed the stated research interests and publications of the researchers being interviewed to gain a better understanding of their research methods, including the types of data collected, the data collection methods used, and whether the researchers used newly created data or existing data from previous studies. This information provided librarians with the necessary background to feel confident discussing researchers' data during the interviews and provided context for the interviewer as the researchers responded to questions about their research data.
Two librarians were present for each interview: one who led the discussion and another who took notes on a laptop using word processing software. Using two librarians allowed the interview to remain conversational, so that one librarian would not be tasked with asking questions, listening intently, and taking notes at the same time.

Analyzing the results
Notes collected during the interviews were saved to a secure institutional server, and no personal identifying information was collected; only the distinction between basic science and clinical researchers was recorded, as well as the researchers' departments. Interview responses were coded in a word processing document using the grounded theory method and then transferred to a spreadsheet with an indication of being collected from either a basic science or clinical researcher (Appendix B online only). This spreadsheet served as a large, de-identified dataset, comprising frequencies of the major themes related to the data management of the interviewed researchers. The institutional review board gave this study an exemption, as the de-identified dataset categorized this study as non-human subject research.
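The step of turning coded responses into theme frequencies by researcher type could be sketched as follows. The rows shown are hypothetical placeholders, not data from this study, and the function name and data layout are assumptions made for illustration; the authors worked in a spreadsheet.

```python
from collections import Counter

# Hypothetical coded notes: (researcher_type, theme) pairs.
# These values are illustrative placeholders, NOT study data.
coded_responses = [
    ("basic science", "lack of collection standards"),
    ("basic science", "data scattered across locations"),
    ("clinical", "data quality"),
    ("clinical", "data quality"),
    ("clinical", "format conversion"),
]

def theme_frequencies(rows):
    """Count how often each theme was coded for each researcher type."""
    counts = {}
    for researcher_type, theme in rows:
        counts.setdefault(researcher_type, Counter())[theme] += 1
    return counts

frequencies = theme_frequencies(coded_responses)
for researcher_type, themes in sorted(frequencies.items()):
    for theme, n in themes.most_common():
        print(f"{researcher_type}\t{theme}\t{n}")
```

Keeping only the researcher type alongside each theme mirrors the de-identified layout described above, in which no personal identifiers accompany the coded responses.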

RESULTS
Researchers were invited to participate in the study until theoretical saturation was achieved, at which time the authors had conducted thirty interviews, comprising eleven interviews with basic scientists and nineteen with clinical researchers. A number of responses to the questions were unique to individual researchers and therefore did not provide the librarians with information they could use to implement widespread products and services. These results can be viewed in the online Appendix C. Themes that did emerge from the interviews are described in Table 1. The specific themes that provided the library with an opportunity to implement new products and services are discussed in more detail below.

Data organization challenges and needs
Basic science researchers. The basic science researchers interviewed identified several challenges in managing their data. The biggest obstacle for researchers was the perceived lack of standards and procedures available for them to uniformly collect their data. Without specific collection standards, researchers were left to develop custom data collection methods, constantly reinventing the wheel, sometimes with every new research project.
Another issue that researchers identified was a disconnect between the different types of data collected. For example, imaging data and raw numerical data that were collected as part of the same research project were often located in different places and, therefore, difficult to find. Postdoctoral researchers and graduate students, who work in a lab for a limited amount of time, exacerbate this problem: these researchers work on a specific project but then leave with either the physical data or the methodology they used to collect that data. This leaves the basic science researcher without the ability to understand who used their data, how they used their data, or where their data have gone once that researcher leaves.
Clinical researchers. The major challenges identified by clinical researchers related to the quality of their data, which many named as a major concern. This issue often stems from the involvement of multiple personnel in collecting data for a clinical study, coupled with inconsistent data collection methods. These inconsistencies can result in team members entering data elements using different interpretations of a given variable (e.g., weight measured in pounds versus kilograms), potentially rendering a data element or an entire dataset useless. Clinical researchers also identified difficulties in transferring data from one format to another. Clinical researchers use a number of different types of statistical software (e.g., SAS, SPSS, Stata, R) as part of their research process, and moving data between different types of software often results in poor data quality and even data loss.
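The pounds-versus-kilograms example suggests one simple safeguard: record the unit with every value and normalize at entry. The Python sketch below assumes a hypothetical helper, `weight_in_kg`, and a hypothetical list of accepted unit labels; it is not a tool described in this study.

```python
LB_TO_KG = 0.45359237  # kilograms per international avoirdupois pound (exact by definition)

def weight_in_kg(value, unit):
    """Convert a recorded weight to kilograms, rejecting unknown units.

    Raising an error for an unrecognized unit is a deliberate choice here:
    a silent guess is how mixed-unit columns arise in the first place.
    """
    unit = unit.strip().lower()
    if unit in ("kg", "kilogram", "kilograms"):
        return float(value)
    if unit in ("lb", "lbs", "pound", "pounds"):
        return float(value) * LB_TO_KG
    raise ValueError(f"unrecognized weight unit: {unit!r}")
```

For example, `weight_in_kg(150, "lb")` and `weight_in_kg(68.04, "kg")` land on the same scale, so values entered by different team members remain comparable.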

Researcher interest in data sharing
Identifying researchers most interested in sharing their data was essential to inform the implementation of a data catalog for internally generated research datasets. The interviews identified clinical researchers, particularly those in the Department of Population Health (11 researchers), as willing to share their data with the public as long as they were aware of who was using their data. Those same researchers expressed interest in finding shared datasets for their own research, either through direct access or collaboration. Responses to the interviews suggested that basic science researchers currently show little interest in sharing their research data, as the majority preferred to share with their direct collaborators or with no one at all. Basic science researchers cited a number of reasons for a reluctance to share data, including negative experiences with past sharing, concerns about privacy restrictions, the belief that their data are too specialized to be of value to others, insufficient storage options for sharing data publicly, and the hurdle of having to organize their data prior to sharing.

DISCUSSION
The biggest challenge that libraries face in building data management services is the researchers' perception that librarians do not understand research data and have no role to play in data management. While several other studies interviewed researchers about their data management challenges, many took an approach that seemed to call upon the researchers to be conversant in the language of the library, rather than speaking to the researchers in their own language. For example, the use of terminology such as "e-science," "metadata," and "Dublin Core" throughout the data interview process (terms that have little to no meaning for most researchers) may serve to widen, rather than narrow, the gap between librarians and researchers. Through the careful construction of "researcher-centric" questions and thorough preparation by the interviewers in educating themselves about the researchers' work, the interviewers were able to avoid this potential pitfall. Another strategy the librarians found to be very effective was making the interviews conversational and open-ended. Providing a relaxed environment for the researchers allowed the interview questions to flow more coherently, gave the librarians the opportunity to ask the researchers to elaborate on their answers in a more natural way, and allowed room for the researchers to expand their answers into areas that the librarians, with their different perspective, might have overlooked.
Through the data interviews, the authors gained valuable knowledge about the medical center research community's data issues including, but not limited to, the challenges they face when collecting, organizing, and sharing their research. Insights gained from the interviews provided new information that led to the improvement or development of library data products and services. The understanding that the Department of Population Health is most keen to share their data and find other research datasets that they can use for their research provided useful information that allowed the library to build out its data catalog to first address the needs of its most likely users. The data interview results regarding the extent of the difficulties that basic science researchers face in organizing the data in their labs led to the development of a low-barrier lab organization tool that is currently being piloted in two basic science labs.
Data interviews are an effective means of elucidating the challenges that researchers at an institution face when collecting, organizing, and sharing their data. The interviews also raise the visibility and, when conducted well, can enhance the credibility of the library in the realm of research data. Because of these benefits, and because responses varied widely across researchers and so presumably would across institutions, the value of this report may lie more in its methodology than in its specific results: these interviews can serve as an important first step for a health sciences library to insert itself into the data conversation and change the perceptions of the research communities that it supports.