The Field-Dependent Nature of PageRank Values in Citation Networks

The value of scientific research can be easier to assess at the collective level than at the level of individual contributions. Several journal-level and article-level metrics aim to measure the importance of journals or individual manuscripts. However, many are citation-based, and citation practices vary between fields. To account for these differences, scientists have devised normalization schemes to make metrics more comparable across fields. We use PageRank as an example metric and examine the extent to which field-specific citation norms drive estimated importance differences. In doing so, we recapitulate differences in journal and article PageRanks between fields. We also find that manuscripts shared between fields have different PageRanks depending on which field's citation network the metric is calculated in. We implement a degree-preserving graph shuffling algorithm to generate a null distribution of similar networks and find that the observed differences are more plausibly attributed to field-specific preferences than to citation norms. Our results suggest that, while differences exist between fields' metric distributions, applying metrics in a field-aware manner rather than using normalized global metrics avoids losing important information about article preferences. They also imply that assigning a single importance value to a manuscript may not be a useful construct, as the importance of each manuscript varies by the reader's field.


Introduction
There are more academic papers than any human can read in a lifetime. Attention has been given to ranking papers, journals, or researchers by their "importance," assessed via various metrics. Citation count assumes the number of citations determines a paper's importance. The h-index and Journal Impact Factor focus on secondary factors like author or journal track records. Graph-based methods like PageRank or the disruption index use the context of the citing papers to evaluate an article's relevance [1,2,3,4]. Each of these methods has limitations, and variants exist that attempt to shore up specific weaknesses [5,6,7,8].
One objection to such practices is that "importance" is subjective. The San Francisco Declaration on Research Assessment (DORA) argues against using the Journal Impact Factor, or any journal-based metric, to assess individual manuscripts or scientists [9]. DORA further argues in favor of evaluating the scientific content of articles and notes that any metrics used should be article-level (https://sfdora.org/read/). However, even article-level metrics often ignore that the importance of a specific scientific output will fundamentally differ across fields. Even Nobel prize-winning work may be unimportant to a cancer biologist if the prize-winning article is about astrophysics.
Because there are differences between fields' citation practices [10], scientists have developed strategies including normalizing the number of citations based on nearby papers in a citation network, rescaling fields' citation data to give more consistent PageRank results, and so on [5,11,12,13]. Such approaches normalize away field-specific effects, which might help to compare one researcher with another in a very different field. However, they do not address the difference in the relevance of a topic between fields. This phenomenon of field-specific importance has been observed at the level of journal metrics. Mason and Singh recently noted that, depending on the field, the journal Christian Higher Education is ranked as either a Q1 (top quartile) or a Q4 (bottom quartile) journal [14].
It is possible that, while global journal-level metrics fail to capture field-specific importance, article-level metrics are sufficiently granular that the importance of a manuscript remains constant across fields. We investigate the extent to which article-level metrics generalize between fields. We examine this using MeSH terms to define fields and use field-specific citation graphs to assess manuscripts' importance within each field. While it is trivially apparent that journals or articles that do not have cross-field citations will have variable importance, we ignore these cases. We include only those with citations in both fields, where we expect possible consistency. We first replicate previous findings that journal-level metrics can differ substantially among fields. We also find field-specific variability in importance at the article level. We make our results explorable through a web app that shows metrics for overlapping papers between pairs of fields.
Our results show that even article-level metrics can differ substantially among fields. We recommend that the metrics used for assessing research outputs include field-specific ones in addition to global ones. While qualitative assessment of the content of manuscripts remains time-consuming, our results suggest that within-field and across-field assessment remains key to assessing the importance of research outputs.

Journal rankings differ between fields
In an attempt to quantify the relative importance of journals, scientists have created rankings using metrics like the Journal Impact Factor, which essentially uses citations per article, and metrics that rely on more complex representations, like Eigenfactor [15]. Previous reports note that journal rankings differ substantially between fields when using metrics based on citation numbers [14]. We first sought to understand the extent to which PageRank replicated journal ranking differences across fields. We calculated a field-specific PageRank-based score for each journal as the median PageRank of manuscripts published in that journal for that field (Fig. 1 A).
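In outline, the field-specific journal score described above can be computed from per-article PageRanks as follows. This is a minimal sketch: the function and variable names are hypothetical, and only the median-of-PageRanks rule and the minimum-paper cutoff come from the text.

```python
from collections import defaultdict
from statistics import median


def journal_scores(pageranks, journal_of, min_papers=50):
    """Field-specific journal score: the median PageRank of a journal's
    articles within one field's citation network.

    pageranks  -- dict: article DOI -> PageRank in this field's network
    journal_of -- dict: article DOI -> journal name
    min_papers -- drop journals with fewer articles than this threshold
    """
    by_journal = defaultdict(list)
    for doi, pr in pageranks.items():
        if doi in journal_of:
            by_journal[journal_of[doi]].append(pr)
    return {journal: median(prs)
            for journal, prs in by_journal.items()
            if len(prs) >= min_papers}
```

Using the median rather than the mean keeps a journal's score from being dominated by a single highly ranked article.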
To begin, we compared the differences in ranking between the top fifty journals in nanotechnology and their corresponding ranks in microscopy. While the ranks were correlated (r = 0.75), there was a great deal of variance, especially for journals outside the top 20 in nanotechnology (Fig. 1 B). We then examined the top-ranked journal in each of our 45 fields to determine whether the top-ranking journal was consistent across fields (Fig. 1 C). We found that the most commonly top-ranked journal was Science. This was unsurprising, given that it tends to rank highly on global journal-level metrics such as Eigenfactor. However, while Science was the top-ranked journal in a plurality of fields, approximately 80% of fields had a different journal in that spot.
We also investigated the presence of single-topic journals in our dataset, as MeSH headings reflect a different type of aggregation than journals do [16]. Of the 5,178 journals with at least 50 articles in our dataset, the median number of fields publishing in a given journal is 15 (Fig. 1 D). In the context of MeSH, specialty journals are rare. Most journals publish manuscripts within one-third or more of the MeSH headings in our dataset.

[Fig. 1 caption, panels C and D: C) The frequency with which journals in the dataset are the top journal for a field. D) The distribution of fields published per journal. The x-axis corresponds to the number of fields for which a journal has at least one paper within the field. All plots restrict the set of journals to those with at least 50 papers in the dataset.]

Manuscript PageRanks differ between fields
We split the citation network into its component fields and calculated the PageRank for each article (Fig. 2 A). We examined the distribution of PageRanks across fields and found that they differed greatly (Fig. 2 B). We first examined whether the citation practices of fields contributed to importance differences. Investigating manuscripts that appeared in pairs of fields, we found that the distribution of importances matched the network more than that of the alternative topic area of the manuscript (Fig. 2 B, C, D).

Fields' differences are not solely driven by differences in citation practices
We devised a strategy to generate an empirical null for a field pair under the assumption that the field pair represented a single, homogeneous field (Fig. 3 A). For each field-pair intersection, we performed a degree-distribution-preserving permutation. We created 100 permuted networks for each field pair.
We then split the networks into their constituent fields and calculated a percentile using the number of permuted networks with a lower PageRank for a manuscript than the true PageRank. A manuscript with a PageRank higher than in all permuted networks has a percentile of 100, and one lower than in all permuted networks has a percentile of zero. We used the difference in the percentile in each field as the field-specific affinity for a given paper. This percentile score allowed us to control for the differing degree distributions between fields by comparing papers based on their expected PageRank in a random network with the same node degrees.
We selected field pairs with varying degrees of correlation between their PageRanks (Fig. 3 B). By examining the fields' PageRank percentiles, we found that many articles had large differences in their perception between fields (Fig. 3 C). In nanotechnology and microscopy, papers with high nanotechnology percentiles and low microscopy percentiles tended towards applications of nanotechnology, while their counterparts with high microscopy percentiles and low nanotechnology percentiles were often papers about technological developments in microscopy (Fig. 3 A, Table 1). Immunochemistry-favored papers are largely applications of immunochemical methods, while anatomy-favored articles tend to focus experiments on a single anatomical region (Fig. 3 B, Table 2). Proteomics and metabolomics tend to use similar methods, so the papers at either end are largely (though not entirely) field-specific applications of those methods (Fig. 3 C, Table 3). Manuscripts favored in computational biology were similarly applications-focused. However, those with more importance in human genetics tended towards policy papers, because the human genetics MeSH heading (H01.158.273.343.385) excludes fields like genomics, population genetics, and microbial genetics (Table 4). In addition to papers with large differences between fields, each field pair has papers with high PageRanks and similar percentiles. While some papers may be influential in multiple fields, others have more field-specific import.
It is impossible to describe all the field pairs and relevant differences between fields within the space of a journal article. Instead, we have developed a web server that displays the percentiles for all pairs of fields in our dataset with at least 1000 shared articles (Fig. 3 D), which can be accessed at https://www.indices.greenelab.com. We hope that the availability of the web server and the reproducibility of our code will assist other scientists in uncovering new insights from this dataset.

[Fig. 3 caption, continued: Points are colored based on the difference in percentile scores in the fields, e.g., "Nanotechnology-Microscopy" corresponds to the difference between the nanotechnology and microscopy percentile scores. The numbers next to points are the reference numbers for the articles in the bibliography. D) A screenshot of the web server showing the percentile score difference and journal median PageRank plot functionality.]
This preprint (this version posted January 6, 2023) was not certified by peer review and is made available under a CC-BY 4.0 International license; the author/funder, who is the copyright holder, has granted bioRxiv a license to display the preprint in perpetuity.

Selecting fields
To differentiate between scientific fields, we needed a way to map papers to fields. Fortunately, all the papers in PubMed Central (https://www.ncbi.nlm.nih.gov/pmc/) have corresponding Medical Subject Headings (MeSH) terms. While MeSH terms are varied and numerous, the subheadings of the Natural Science Disciplines (H01) category fit our needs. However, MeSH terms are hierarchical and vary greatly in their size and specificity. To extract a balanced set of terms, we recursively traversed the tree and selected headings having at least 10,000 DOIs without having multiple children that also meet the cutoff. Our resulting set comprised 45 terms, from "Acoustics" to "Water Microbiology."
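The selection rule above can be sketched as a short recursion. The data structures here (a per-heading DOI count and a child map keyed by MeSH tree numbers) are hypothetical stand-ins for the real MeSH tree data, and only the 10,000-DOI cutoff and "descend when multiple children also clear the cutoff" rule come from the text.

```python
def select_fields(node, doi_counts, children, threshold=10_000):
    """Recursively select MeSH headings with at least `threshold` DOIs,
    descending into a heading whenever more than one of its children
    also clears the cutoff (so the parent is too coarse to keep).

    node       -- a MeSH tree number, e.g. 'H01'
    doi_counts -- dict: tree number -> number of DOIs tagged with it
    children   -- dict: tree number -> list of child tree numbers
    """
    big_children = [c for c in children.get(node, [])
                    if doi_counts.get(c, 0) >= threshold]
    if len(big_children) <= 1:
        # At most one large child: keep this heading if it is large enough.
        return [node] if doi_counts.get(node, 0) >= threshold else []
    # Multiple children clear the cutoff: recurse instead of keeping parent.
    selected = []
    for child in big_children:
        selected.extend(select_fields(child, doi_counts, children, threshold))
    return selected
```

Because MeSH counts are hierarchical, a child below the cutoff cannot contain a subtree above it, so the recursion only needs to visit the large children.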

Building single-heading citation networks
The COCI dataset consists of pairs of Digital Object Identifiers (DOIs). To run calculations on these pairs, we needed to convert them into networks. To do so, we created 45 empty networks, one for each previously selected MeSH term. We then iterated over each pair of DOIs in COCI and added the pair to a network if the DOIs corresponded to two journal articles written in English, both of which were tagged with the corresponding MeSH heading.
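The per-heading filtering step can be sketched as follows. The function and input names are illustrative, and the language and article-type checks described above are assumed to have already been applied to the article metadata.

```python
def build_field_network(citations, mesh_terms, field):
    """Keep a citing->cited DOI pair only when both articles carry the
    MeSH heading that defines this field's network.

    citations  -- iterable of (citing_doi, cited_doi) pairs, as in COCI
    mesh_terms -- dict: DOI -> set of MeSH headings for that article
    field      -- the heading defining this network, e.g. 'Acoustics'
    Returns the edge list of the field's citation network.
    """
    edges = []
    for citing, cited in citations:
        if (field in mesh_terms.get(citing, ()) and
                field in mesh_terms.get(cited, ())):
            edges.append((citing, cited))
    return edges
```

The same pass over COCI can fill all 45 networks at once by testing each pair against every selected heading.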
Because we were interested in the differences between fields, we also needed to build networks from pairs of MeSH headings. These networks were built via the same process, except that instead of keeping only citations within a single heading, we added a citation to the network if both articles were in the pair of fields, even if the citation occurred across fields. Running this network-building process yielded 990 two-heading networks.
Sampling a random graph that preserves the network's degree distribution was challenging. Because citation graphs are directed, it is impossible to simply swap pairs of edges and end up with a graph uniformly sampled from the space. Instead, a more sophisticated three-edge swap method must be used [54]. Because this algorithm had not yet been implemented in NetworkX [55], we implemented the code to perform shuffles and submitted our change to the library (https://github.com/networkx/networkx/pull/5663). With the shuffling code implemented, we created 100 shuffled versions of each of our combined networks to act as a background distribution against which we could compare metrics.
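A simplified sketch of a single three-edge swap step, on a plain dict-of-sets adjacency rather than a NetworkX graph: a directed path a → b → c → d is rewired to a → c → b → d, which leaves every node's in- and out-degree unchanged. This is not the contributed NetworkX routine itself, just an illustration of the move it performs.

```python
import random


def directed_three_edge_swap(succ, rng=random):
    """Attempt one degree-preserving swap on a directed graph.

    succ -- dict mapping node -> set of successors; modified in place.
    Picks a path a -> b -> c -> d and rewires it to a -> c -> b -> d.
    Returns True if a swap was performed, False if the attempt failed
    (e.g. the sampled path would create a duplicate edge or self-loop).
    """
    nodes = [n for n in succ if succ[n]]
    a = rng.choice(nodes)
    b = rng.choice(sorted(succ[a]))
    if not succ.get(b):
        return False
    c = rng.choice(sorted(succ[b]))
    if not succ.get(c):
        return False
    d = rng.choice(sorted(succ[c]))
    if len({a, b, c, d}) < 4:
        return False  # would create a self-loop
    if c in succ[a] or d in succ[b] or b in succ[c]:
        return False  # would create a duplicate edge
    # Remove a->b, b->c, c->d; add a->c, c->b, b->d.
    succ[a].remove(b); succ[a].add(c)
    succ[b].remove(c); succ[b].add(d)
    succ[c].remove(d); succ[c].add(b)
    return True
```

In practice the analysis uses the routine contributed to NetworkX, which also handles uniform sampling of the path and the number of swap attempts.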
Once we had a collection of shuffled networks, we needed to split them into their constituent fields. To do so, we reduced each network to solely the nodes that were present in the single-heading citation network and kept only citations between these nodes.

Metrics
We used the NetworkX implementation of PageRank with default parameters to evaluate paper importance within fields. To determine the degree to which the papers' PageRank values were higher or lower than expected, we compared the PageRank values calculated for the true citation networks to the values in the shuffled networks for each paper. We then recorded the percent of shuffled networks in which the paper had a lower PageRank than in the true network, yielding a single number summarizing these values. For example, if a paper had a higher PageRank in the true network than in all the shuffled networks, it received a percentile of 100. Likewise, if it had a lower PageRank in the true network than in all the shuffled networks, it received a percentile of 0.
A convenient feature of the percentiles was that they were directly comparable between fields. For manuscripts represented in two fields, the difference in scores was used to estimate the variability in importance. For example, if a paper had a score of 100 in field A (indicating a higher PageRank in the field than expected given its number of citations and the network structure) and a score of 0 in field B (indicating a lower than expected PageRank), then the large difference in scores indicated the paper was more highly valued in field A than field B. If the paper had similar scores in both fields, it indicated that the paper was similarly valued in the two fields.
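The percentile and affinity calculations just described reduce to a few lines; the function names here are illustrative.

```python
def percentile_score(true_pr, shuffled_prs):
    """Percent of shuffled networks (0-100) in which the paper's
    PageRank falls below its PageRank in the true network."""
    below = sum(1 for pr in shuffled_prs if pr < true_pr)
    return 100 * below / len(shuffled_prs)


def affinity(true_a, shuffled_a, true_b, shuffled_b):
    """Field-specific affinity: difference of the percentile scores
    a shared manuscript receives in field A and field B."""
    return (percentile_score(true_a, shuffled_a)
            - percentile_score(true_b, shuffled_b))
```

An affinity near +100 or -100 marks a paper strongly favored by one field; an affinity near zero marks a paper valued similarly by both.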

Hardware/runtime
We ran the full analysis pipeline on the RMACC Summit cluster at the University of Colorado. The pipeline took about a week to run, from downloading the data to analyzing it to visualizing it. Performance in other contexts will depend heavily on details such as the number of CPU nodes available and the network speed.

Server details
Our web server is built by visualizing our data in Plotly (https://plotly.com/python/plotly-express/) on the Streamlit platform (https://streamlit.io/). The field pairs made available by the frontend are those with at least 1000 shared papers after filtering out papers whose PageRanks were missing in more than 5% of the shuffled networks. The journals available for visualization are those with at least 25 papers for the given field pair.

Discussion/Conclusion
We analyze hundreds of field-pair citation networks to examine the extent to which article-level importance metrics vary between fields. As previously reported, we find systematic differences in PageRanks between fields [7,56] that would warrant some form of normalization when making cross-field comparisons with global statistics. However, we also find that field-specific differences are not driven solely by differences in citation practices. Instead, the importance of individual papers appears to differ meaningfully between fields. Global rankings, or efforts to normalize out field-specific effects, obscure meaningful differences in manuscript importance between communities.
As with any study, this research has certain limitations. One example is our selection of MeSH terms to represent fields. We used MeSH because it is a widely annotated set of subjects in biomedicine, and we thresholded MeSH term sizes to balance having enough observations to calculate appropriate statistics against having sufficient granularity to capture fields. This selection process resulted in fields at the granularity of "biophysics" and "ecology." We also had to select a number of swaps to generate a background distribution of PageRanks for each field pair. We selected three times as many swaps as edges, where each swap modifies three edges, but certain network structures may require a different number.
We also note that there are inherent issues with the premise of ranking manuscripts' importance. We sought to understand the extent to which such rankings were stable between fields after correcting for field-specific citation practices. We found limited stability between fields, mostly between closely related fields, suggesting that the concept of a universal ranking of importances is difficult to justify. Just as reducing a citation distribution to a single Journal Impact Factor distorts assessment, attempting to use a single universal score to represent importance across fields poses similar challenges at the level of individual manuscripts. Furthermore, this work's natural progression would extend to estimating the importance of individual manuscripts to individual researchers. Thus, a holistic measure of importance would need to include a distribution of scores not only across fields but across researchers. It may ultimately be impossible to calculate a meaningful importance score. The lack of ground truth for importance is an inherent feature, not a bug, of science's stepwise progression.
Shifting from the perspective of evaluation to discovery can reveal more appropriate uses for these types of statistics. Field-pair calculations for such metrics may help with self-directed learning of new fields. An expert in one field, e.g., computational biology, who aims to learn more about genetics may find manuscripts with high importance in genetics and low importance in computational biology to be important reads. These represent manuscripts not currently widely cited in one's field but highly influential in a target field. Our application can reveal these manuscripts for MeSH field pairs, and our source code allows others to perform our analysis with different granularity.

Code and Data Availability
The code to reproduce this work can be found at https://github.com/greenelab/indices. The data used for this project is publicly available and can be downloaded with the code provided above. Our work meets the bronze standard of reproducibility [57] and fulfills aspects of the silver and gold standards, including deterministic operation.

39. Séquier JM, Richards JG, Malherbe P, Price GW, Mathews S, Möhler H. Mapping of brain areas containing RNA homologous to cDNAs encoding the alpha and beta subunits of the rat GABAA gamma-aminobutyrate receptor.