NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Trikalinos TA, Moorthy D, Chung M, et al. Comparison of Translational Patterns in Two Nutrient-Disease Associations: Nutritional Research Series, Vol. 5. Rockville (MD): Agency for Healthcare Research and Quality (US); 2011 Oct. (Technical Reviews, No. 17.5.)

Appendix B. Details on the Construction of Citation Graphs

For each of the N papers in each topic (in each corpus), we queried Thomson ISI to obtain a list of its references. We examined these lists to identify which papers are cited by other papers in the corpus, ignoring citations by papers that were not included in the corpus. We organized this information into an adjacency matrix M, i.e., an N × N matrix whose elements m_ij code the number of citations (0 or 1) from the j-th to the i-th paper. This matrix contains all the information that is necessary to create the citation graph of the N papers.
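The construction of M can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the report: the function name `build_adjacency_matrix`, the paper identifiers, and the `references` dictionary (mapping each citing paper to the papers in its reference list) are all hypothetical.

```python
def build_adjacency_matrix(paper_ids, references):
    """Return an N x N matrix M with M[i][j] = 1 if paper j cites paper i.

    paper_ids  -- list of identifiers for the N papers in the corpus
    references -- dict mapping a citing paper to the papers it references
                  (hypothetical input format, assumed for illustration)
    """
    index = {pid: k for k, pid in enumerate(paper_ids)}
    n = len(paper_ids)
    matrix = [[0] * n for _ in range(n)]
    for citing, cited_list in references.items():
        if citing not in index:
            continue  # ignore citations made by papers outside the corpus
        j = index[citing]
        for cited in cited_list:
            if cited in index:  # ignore references to papers outside the corpus
                matrix[index[cited]][j] = 1
    return matrix


# Toy corpus: B cites A (and an outside paper X); C cites A and B.
M = build_adjacency_matrix(["A", "B", "C"], {"B": ["A", "X"], "C": ["A", "B"]})
```

With the arc direction described in the text (from the cited paper to the citing paper), a 1 in row i and column j corresponds to an arc from paper i to paper j.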

In graph theory terminology, citation graphs are directed and acyclic. They are directed because each arc points from the paper that is being cited towards the paper that is making the citation. They are acyclic because there are no closed “loops” in a citation network: a paper cannot cite itself and, generally, two or more papers do not cite each other. There are, however, rare instances where two papers that are, e.g., published in the same issue cite each other, resulting in a graph that is not acyclic. We can transform this graph into a directed acyclic graph by assuming that such papers essentially cite each other’s preprints (Figure B1). This transformation does not affect important characteristics of the networks such as the distributions of in-degrees, authority scores, hub scores, and main path scores (see Appendix C).

The left panel shows two papers A and B that cite each other. This is a rare occurrence, but can be encountered, e.g., in invited papers published in the same issue. It introduces a directed loop in the citation graph: starting from A, there is an arc that points to B and another arc that returns to A. The right panel shows how we convert this graph into an acyclic one. Specifically, we assume that each of the two papers printed in the same issue cites a “preprint” of the other paper. This transforms the graph into a directed acyclic graph. Because this fix is rarely employed, it does not affect the distributions of indegrees, outdegrees, authority scores, hub scores, and main path scores over the original papers. This correction can be extended to three or more papers.

Figure B1. Resolving mutual citations to ensure that the citation graph is a simple directed acyclic graph.
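The preprint correction for a pair of mutually citing papers can be sketched as follows. This is an illustrative reading of the figure, not the authors' implementation: arcs are assumed to be stored as `(cited, citing)` tuples, and the function name and the " (preprint)" label are hypothetical. The sketch handles only pairs of papers, not the extension to three or more.

```python
def resolve_mutual_citations(arcs):
    """Replace each pair of mutual arcs with arcs from assumed preprint vertices.

    arcs -- iterable of (cited, citing) tuples; each arc points from the
            cited paper to the citing paper.
    """
    arcs = set(arcs)
    resolved = set()
    for cited, citing in arcs:
        if cited != citing and (citing, cited) in arcs:
            # Mutual citation: assume 'citing' cites a preprint of 'cited'.
            resolved.add((cited + " (preprint)", citing))
        else:
            resolved.add((cited, citing))
    return resolved


# A and B cite each other; C cites A.
fixed = resolve_mutual_citations([("A", "B"), ("B", "A"), ("A", "C")])
```

Each preprint vertex has no incoming arcs, so the loop between A and B disappears while the arcs among the remaining papers are unchanged.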

After correcting for papers that cite each other, we verified that the resulting networks were acyclic and temporally consistent, i.e., that no paper cited a paper published after it (a paper published in 1979 cannot cite a paper published in 2000).
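Both checks are straightforward to automate. The sketch below is one possible way to do so and is not taken from the report: it assumes arcs stored as `(cited, citing)` tuples and a hypothetical `year` dictionary of publication years, and it tests acyclicity with Kahn's topological sorting algorithm.

```python
from collections import defaultdict, deque


def is_acyclic(arcs):
    """Return True if the directed graph given by (cited, citing) arcs has no cycle."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    vertices = set()
    for cited, citing in arcs:
        successors[cited].append(citing)
        indegree[citing] += 1
        vertices.update((cited, citing))
    # Kahn's algorithm: repeatedly remove vertices that have no incoming arcs.
    queue = deque(v for v in vertices if indegree[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    # Vertices that can never be removed lie on, or downstream of, a cycle.
    return removed == len(vertices)


def temporal_violations(arcs, year):
    """Return the arcs whose citing paper was published before the cited paper."""
    return [(cited, citing) for cited, citing in arcs if year[citing] < year[cited]]
```

An empty list from `temporal_violations` together with `is_acyclic(...)` returning `True` corresponds to the two conditions described above.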

Practicalities and Coherency Assessment

We matched citations by exact string matching of titles, after basic preprocessing. Subsequently, we used fuzzy string-matching algorithms (algorithms that tolerate small discrepancies between two title strings) to identify titles that did not match exactly but pertained to the same paper. This can happen especially for older papers that were entered manually in the Thomson ISI database, because of typos or alternative spellings of title words (e.g., “Randomised trial of …” in MEDLINE may become “Randomized trial of …” in ISI). A reviewer manually examined all title pairs that had a Levenshtein edit distance of 5 or less. The results of the manual review were taken into account when forming the final citation graph.
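The selection of title pairs for manual review can be sketched as below. This is an illustration under stated assumptions: the report does not describe its preprocessing, so `normalize` (lowercasing and collapsing whitespace) is a guess, and the function names are hypothetical. The Levenshtein distance is computed with the standard dynamic-programming recurrence.

```python
def levenshtein(a, b):
    """Return the Levenshtein edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(title):
    """Assumed preprocessing: lowercase and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def pairs_for_review(titles_a, titles_b, max_distance=5):
    """Return non-identical title pairs within max_distance edits of each other."""
    pairs = []
    for ta in titles_a:
        for tb in titles_b:
            d = levenshtein(normalize(ta), normalize(tb))
            if 0 < d <= max_distance:  # exact matches need no manual review
                pairs.append((ta, tb, d))
    return pairs
```

For example, "Randomised trial of X" and "Randomized trial of X" differ by a single substitution, so the pair would be flagged for review with a distance of 1.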

Main Path Articles

Main paths go from a source vertex to a sink vertex in a citation network, and include vertices and arcs with the highest traversal weights. A source vertex is a vertex that has only outgoing arcs (indegree=0, outdegree>0) and a sink vertex is a vertex that has only incoming arcs (indegree>0, outdegree=0). The traversal weight of a vertex or an arc expresses the proportion of paths from all sources to all sinks in the entire network that include the particular vertex or arc. We calculated traversal weights using the Search Path Link Count (SPLC) method implemented in the Pajek software.60 The results of the SPLC algorithm were very similar to those of alternative methods (vertex pair projection count, VPPC, and search path vertex pair, SPVP). For details on these algorithms and a comparison of their relative performance see the technical report by Batagelj 2003.60 One may consider main path articles as central in a field because they integrate information from previous articles (vertices) and propagate information to other articles (vertices).60-64
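To make the notion of a traversal weight concrete, the sketch below computes, for each arc, the proportion of source-to-sink paths that pass through it, following the verbal definition given above. It is a simplified search path count and should not be read as a reproduction of Pajek's SPLC implementation, which may count paths differently. The function name is hypothetical, arcs are assumed to be `(cited, citing)` tuples, and the graph is assumed to be acyclic.

```python
from collections import defaultdict


def traversal_weights(arcs):
    """Return {arc: proportion of source-to-sink paths that include the arc}.

    Assumes a directed acyclic graph given as (cited, citing) tuples.
    """
    arcs = set(arcs)
    successors = defaultdict(list)
    predecessors = defaultdict(list)
    vertices = set()
    for cited, citing in arcs:
        successors[cited].append(citing)
        predecessors[citing].append(cited)
        vertices.update((cited, citing))

    def count_paths(v, neighbours, memo):
        # Number of paths from v to any end vertex, following 'neighbours'.
        if v not in memo:
            nxt = neighbours.get(v, [])
            memo[v] = 1 if not nxt else sum(count_paths(w, neighbours, memo) for w in nxt)
        return memo[v]

    from_sources, to_sinks = {}, {}
    sources = [v for v in vertices if not predecessors.get(v)]
    total = sum(count_paths(s, successors, to_sinks) for s in sources)
    # Paths through arc (u, v) = (paths from sources to u) * (paths from v to sinks).
    return {
        (u, v): count_paths(u, predecessors, from_sources)
        * count_paths(v, successors, to_sinks) / total
        for u, v in arcs
    }


# Toy network: A is cited by B and C, both are cited by D, which is cited by E.
weights = traversal_weights([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
```

In this toy network there are two source-to-sink paths (A-B-D-E and A-C-D-E). The arc from D to E lies on both, so its weight is 1.0, whereas the arc from A to B lies on one of them, so its weight is 0.5.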
