Analysis of inconsistencies in terminology of spinal and bulbar muscular atrophy and its effect on retrieval of research *

INTRODUCTION
Diseases are sometimes known by many names [1][2][3][4][5][6][7][8][9], which complicates the retrieval of publications. Rare and emerging diseases may be especially vulnerable to this problem. Health sciences librarians do not often encounter rare diseases and may be unaware of the search challenges these diseases present. To retrieve publications for research effectively, searchers need a strategy for identifying all the different names of a disease so that these names can be incorporated into a comprehensive search. The author was engaged in a research project to create a comprehensive bibliography of the rare disease spinal and bulbar muscular atrophy (SBMA). The project used a strategy to identify all of the disease's various names. This strategy should be helpful not only for SBMA researchers but also for anyone interested in a method for obtaining a comprehensive list of names for a disease.

Spinal and bulbar muscular atrophy (SBMA)
SBMA (ORPHA481, OMIM #313200, SNOMED CT Concept ID 230253001†) is a rare, progressive neuromuscular disorder of males marked by proximal muscle weakness, cramping, fasciculations (visible twitching of small groups of muscle fibers), and muscle atrophy. Symptoms are reported to begin between the third and sixth decades of life. The prevalence of SBMA has been reported variously as 1 in 40,000 [10], 1–2 in 100,000 [11], and less than 1 in 50,000 live male births [12], but the disorder is thought to be underdiagnosed [13][14][15]. Affected individuals show degeneration of the anterior horn cells (lower motor neurons) of the spinal cord. Additional symptoms may include gynecomastia (abnormal breast growth in males), testicular atrophy, dysarthria (difficulty speaking), and dysphagia (difficulty swallowing). At this time, there is no known cure for most such neuromuscular diseases [16].
SBMA is inherited in an X-linked pattern. It is caused by an expanded trinucleotide repeat in the first exon of the androgen receptor gene on the X chromosome, a defect first identified by La Spada in 1991 [17]. A trinucleotide repeat is a three-nucleotide sequence that occurs several times in tandem. It is present in the normal gene, but it is repeated an unusually large number of times in the defective gene. In SBMA the repeated sequence is cytosine-adenine-guanine (CAG), which codes for the amino acid glutamine, so the expanded repeat produces an abnormally long string of glutamine residues in the translated protein. The normal CAG repeat length in the androgen receptor gene is 11–34, and SBMA is diagnosed if the number of CAG repeats exceeds 38 [18,19].
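The repeat-length thresholds can be made concrete with a short Python sketch. It counts the longest uninterrupted run of CAG triplets in a DNA string and labels the count using only the figures quoted above: 11–34 is normal, and more than 38 is diagnostic. This is a hypothetical illustration of the counting idea and not a clinical or laboratory method. Counts that fall outside both quoted ranges are left unlabelled rather than guessed, and the function names and sample sequences are invented for the example.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Return the length, in repeats, of the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    # Every match is a whole number of CAG triplets, so divide its length by 3.
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat_length(repeats: int) -> str:
    """Label a CAG repeat count using only the thresholds quoted in the text.

    Assumption: counts below 11 or from 35 to 38 fall outside both quoted
    ranges, so they are reported as such instead of being assigned a category.
    """
    if 11 <= repeats <= 34:
        return "normal range"
    if repeats > 38:
        return "consistent with SBMA"
    return "outside quoted ranges"


if __name__ == "__main__":
    # Hypothetical sequences: 21 repeats (normal) and 45 repeats (expanded).
    for seq in ("ATG" + "CAG" * 21 + "CAA", "ATG" + "CAG" * 45 + "CAA"):
        n = longest_cag_run(seq)
        print(n, classify_repeat_length(n))
```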
Women do not develop SBMA, but heterozygous and homozygous women may exhibit mild symptoms, particularly muscle twitching and cramping [13,20,21].
William R. Kennedy identified SBMA as a distinct disorder in 1968, but the disorder existed before it was described [22]. As early as 1897, the Japanese neurologist Hiroshi Kawahara described what appears to have been SBMA in two brothers. They had muscle atrophy and fasciculation of the tongue and limbs, adult onset, and sex-linked recessive inheritance [23,24].
Several disorders of varying severity and outcome resemble SBMA, and SBMA is thought to be frequently misdiagnosed [25][26][27][28][29]. The best known of these disorders is amyotrophic lateral sclerosis. The most challenging to distinguish are the spinal muscular atrophies (SMA). Unlike SBMA, however, SMA is inherited as an autosomal recessive disease. Symptoms of most forms of SMA arise in childhood, but SMA3 may first appear in adolescence or young adulthood, and SMA4 symptoms may appear after age 30 [30].

Thesaurus and subject headings
Typically, problems resulting from inconsistent vocabulary can be overcome by using the thesaurus and subject heading searches available in a database. However, emerging diseases may not yet have an assigned subject heading, and rare diseases may not be common enough to warrant a unique one. Presumably, database developers assign subject headings using the name they consider most authoritative, so the headings may reflect any existing confusion in name usage. The Medical Subject Headings (MeSH) term "Bulbo-Spinal Atrophy, X-Linked" was assigned to SBMA only recently, in 2009. From 2000 to 2008, the MeSH term "Muscular Atrophy, Spinal" was used. Searching that term, however, retrieves the entire class of neuromuscular disorders, not just SBMA. CINAHL has assigned the subject heading "Bulbo-Spinal Atrophy, X-Linked" only since 2010, and before then no specific subject heading was assigned. Databases do not typically re-index older references to reflect new subject headings, so recent designations will not help retrieve older literature.

METHODS
Building the comprehensive bibliography involved creating both a list of name variants and a list of references. The list of name variants and their frequency of occurrence was compiled in Microsoft Excel. The bibliography was compiled in EndNote bibliographic management software. The Groups feature in EndNote made it possible to organize and count references by the name variant used. A basic Group is created by dragging and dropping records into it. A Smart Group automatically pulls in records that match a Boolean search created by the user. EndNote X5 added the ability to create From Groups, which use Boolean logic to combine existing Groups into a new Group. Groups made it easier to organize references by name variant, discover and mark new name variants, and count records.
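The three kinds of Groups described above behave much like set operations, and the Python sketch below models them that way for illustration. It does not use or imitate any EndNote API. The record IDs, titles, and the `smart_group` helper are all hypothetical.

```python
from typing import Callable, Dict, Set

# Hypothetical library: record ID -> title (a stand-in for an EndNote library).
library: Dict[int, str] = {
    1: "Kennedy disease: clinical features",
    2: "Spinal and bulbar muscular atrophy in a Japanese family",
    3: "X-linked bulbospinal neuronopathy (Kennedy disease)",
    4: "Spinal muscular atrophy type 3 in adults",  # a different disease
}

# Basic Group: records placed in it by hand (drag and drop).
basic_group: Set[int] = {2, 3}


def smart_group(predicate: Callable[[str], bool]) -> Set[int]:
    """Smart-Group analogue: every record whose title satisfies a search rule."""
    return {rid for rid, title in library.items() if predicate(title)}


kennedy = smart_group(lambda t: "kennedy" in t.lower())
sbma = smart_group(lambda t: "spinal and bulbar" in t.lower())

# From-Groups analogue: combine existing groups with Boolean logic.
# OR -> union, AND -> intersection, NOT -> difference.
kennedy_or_sbma = kennedy | sbma
kennedy_and_manual = kennedy & basic_group
manual_not_kennedy = basic_group - kennedy

print(sorted(kennedy_or_sbma), len(kennedy_or_sbma))  # [1, 2, 3] 3
```

Record 4 is not retrieved by either search rule, which mirrors how a precise search keeps look-alike disorders out of a Group.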
To develop a search strategy, the author first acquired a basic understanding of SBMA and similar diseases so that records about the latter could be excluded from the bibliography. A physician initially gave the author two synonyms for SBMA: Kennedy disease and X-linked bulbospinal neuronopathy. Because the disease was known to be genetic, OMIM #313200 was used to identify its key characteristics and to provide an initial set of references. References were entered into an EndNote library. The key characteristics of SBMA were then used to eliminate references about diseases that resemble SBMA.
Reference books on neurology, genetic diseases, and other relevant topics were checked for SBMA entries under the various names found in the initial set of references. Google Books previews sometimes allowed the SBMA entry in a book to be viewed. These entries were also examined for name variants and references, and the references were added to EndNote.
All of the prominent health sciences databases available to the author were searched. These included the EBSCOhost databases MEDLINE, CINAHL, Biomedical Reference Collection: Basic, Health Business FullTEXT, and Health Source: Nursing/Academic Edition. They also included the Thomson Reuters databases Web of Science and Biological Abstracts. Each retrieved record was first checked to confirm that it addressed the correct disease and was then examined for name variants. As additional names were revealed, the author incorporated them into the developing Boolean search. When full-text articles were obtained, their reference lists were examined for additional records and name variants. Retrieved records were imported into EndNote.
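One way to keep the developing Boolean search manageable is to store the name variants as a plain list and regenerate the query whenever a new variant turns up. The following is a minimal Python sketch of that approach. It assumes a generic syntax in which phrases are double-quoted and joined with OR. Phrase and field syntax differ between EBSCOhost and Web of Science, so the hypothetical `build_query` helper produces an illustrative string and not a query tested in either interface.

```python
from typing import Iterable, List


def build_query(variants: Iterable[str]) -> str:
    """Join name variants into one OR query, quoting each phrase.

    Repeats that differ only in case or surrounding whitespace are dropped,
    so the query stays readable as the variant list grows.
    """
    seen = set()
    phrases: List[str] = []
    for v in variants:
        key = v.strip().lower()
        if key and key not in seen:
            seen.add(key)
            phrases.append(f'"{v.strip()}"')
    return "(" + " OR ".join(phrases) + ")"


variants = ["Kennedy disease", "X-linked bulbospinal neuronopathy"]
print(build_query(variants))
# ("Kennedy disease" OR "X-linked bulbospinal neuronopathy")

# A newly discovered variant is appended and the query is rebuilt.
variants.append("spinal and bulbar muscular atrophy")
print(build_query(variants))
```

Hyphenated and unhyphenated spellings are deliberately kept as separate phrases, because databases may treat them differently.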
The author used a circular methodology to find name variants, in which the method and the results influenced each other. Known name variants were incorporated into the search to find more name variants, which were then also incorporated into the search. The author was concerned that this methodology might miss references in databases that had been searched before later variants were known. Therefore, once the author believed she had a reasonably comprehensive EndNote library, she set it aside and searched all databases and sources again using the Boolean search developed up to that point. In this way, she re-retrieved records and found records that might have been overlooked at the beginning of the project.
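The circular methodology amounts to repeating a search until it stops producing new name variants. The Python sketch below runs that loop against a tiny hypothetical corpus. In it, `search` stands in for running the current Boolean search in a database. The corpus entries stand in for the variants a reader would identify while examining titles and abstracts. All names and records are invented for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical corpus: record ID -> SBMA name variants a reader would find in it.
CORPUS: Dict[int, Set[str]] = {
    1: {"Kennedy disease"},
    2: {"Kennedy disease", "spinal and bulbar muscular atrophy"},
    3: {"spinal and bulbar muscular atrophy", "SBMA"},
    4: {"bulbospinal muscular atrophy", "SBMA"},
    5: {"X-linked spinobulbar muscular atrophy"},  # shares no variant with 1-4
}


def search(known: Set[str]) -> Set[int]:
    """Stand-in for an OR search: records that use any known variant."""
    return {rid for rid, names in CORPUS.items() if names & known}


def snowball(seed: Set[str]) -> Tuple[Set[str], Set[int]]:
    """Alternate searching and harvesting variants until nothing new appears."""
    known = set(seed)
    while True:
        retrieved = search(known)
        harvested = set().union(*(CORPUS[rid] for rid in retrieved))
        if harvested <= known:  # fixed point: the last pass found no new variants
            return known, retrieved
        known |= harvested


variants, records = snowball({"Kennedy disease"})
print(sorted(records))  # [1, 2, 3, 4]
```

Record 5 is never reached because it shares no variant with the others. This suggests why sources outside the database search, such as reference lists and books, remain useful alongside the loop.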
An EndNote Group was created for each name variant. Each EndNote reference was examined and added to the appropriate Groups, which then provided the reference counts. Summary data on the name variants and their frequency of occurrence were entered into an Excel spreadsheet.
Many decisions about data admissibility were necessary. Records were retained in the developing bibliography if the title or abstract included a noun phrase referring specifically to SBMA. Because SBMA had no name before 1968, pre-1968 records were excluded, even though some probable cases were cited in the literature. Duplicate records from different databases were retained if they used the same SBMA name variant but spelled it differently. Different spellings affect which records can be retrieved in databases and in EndNote. For example, some databases, Web of Science in particular, use hyphens in name variants, and others do not.
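The hyphenation issue matters because exact-phrase matching is literal. In the Python sketch below, a literal phrase match for one spelling misses the other. A normalized key (lowercased, with hyphens treated as spaces) shows that both spellings represent the same variant. The normalization rule, the helper names, and the sample titles are assumptions for illustration and do not describe how any particular database tokenizes text.

```python
import re


def normalize(phrase: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", phrase.lower().replace("-", " ")).strip()


def phrase_hit(title: str, phrase: str) -> bool:
    """Literal, case-insensitive phrase match, as a strict search might perform."""
    return phrase.lower() in title.lower()


# Hypothetical copies of one article as indexed by two databases.
title_a = "X-linked spinal and bulbar muscular atrophy: a case report"
title_b = "X linked spinal and bulbar muscular atrophy: a case report"

query = "X-linked spinal and bulbar muscular atrophy"
print(phrase_hit(title_a, query), phrase_hit(title_b, query))  # True False
print(normalize(title_a) == normalize(title_b))  # True
```

Under the admissibility rule described above, such a pair would be kept as two records, because each spelling is reached by a different search string.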

RESULTS
The comprehensive bibliography for 1968 to 2010 contained 788 records that used 206 different noun phrases as name variants of SBMA. This total included 32 records (4.1%) that were duplicates due to spelling discrepancies between databases. Categories were not mutually exclusive, so the same record might be counted in more than one category. Some records used multiple name variants or acronyms to identify the disorder, and some name variants added adjectives or compound words to the basic noun phrase. MEDLINE retrieved 100 records with bracketed titles, indicating that the titles were translations from other languages.

Individual name variant records
The most commonly used name variant, "spinal and bulbar muscular atrophy," appeared in only 38.8% of the records (Table 1). When the "spinal and bulbar muscular atrophy" and "spinal and bulbar muscularatrophy" records were combined, they totaled 40.4%. Together, the 4 most common name variants accounted for 69.7% of the total bibliography from 1968 to 2010, and the 8 most common accounted for 83.0%. Three of the top 13 were variations of shorter noun phrases already on the list. (Table 2, available online only, provides a complete list of all noun phrases.)
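Because one record can contain several name variants, the combined coverage of the top N variants has to be computed as the size of the union of their record sets. Adding their individual percentages would overstate it. The following is a minimal Python sketch with hypothetical data. The variant labels, record IDs, and total are invented and do not reproduce Table 1.

```python
from typing import Dict, List, Set

# Hypothetical index: variant -> IDs of records that use it.
records_by_variant: Dict[str, Set[int]] = {
    "variant A": {1, 2, 3, 4, 5},
    "variant B": {4, 5, 6, 7},
    "variant C": {7, 8},
    "variant D": {9},
}
TOTAL_RECORDS = 10  # size of the whole (hypothetical) bibliography


def cumulative_coverage(index: Dict[str, Set[int]], total: int) -> List[float]:
    """Percent of all records retrieved by the top 1, 2, ..., N variants combined."""
    ranked = sorted(index, key=lambda v: len(index[v]), reverse=True)
    covered: Set[int] = set()
    result = []
    for variant in ranked:
        covered |= index[variant]  # union, so shared records count once
        result.append(round(100 * len(covered) / total, 1))
    return result


print(cumulative_coverage(records_by_variant, TOTAL_RECORDS))
# [50.0, 70.0, 80.0, 90.0]
```

Adding the individual shares of variants A and B (50% and 40%) would suggest 90%, but because records 4 and 5 use both variants, the two variants together cover only 70%.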

Kennedy family of name variants
Diseases are often named after their discoverers. All name variants that incorporated the name "Kennedy" or another acknowledged discoverer accounted for 38.5% of the records in the bibliography. No individual Kennedy variant was used in more than 25.0% of the records.

Acronym variant records
Records using an acronym in the title or abstract to refer to the disorder made up 45.4% of the records in the bibliography. The most common acronym, SBMA, appeared in only 34.5% of all records in the bibliography and in 76.0% of the records using any acronym. Searches in science and health databases retrieved 26 false hits in which an acronym referred to something other than SBMA. These usually involved the acronyms SBMA and BSMA.
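As a consistency check, the share of acronym-using records that use SBMA follows from the two percentages reported above:

```latex
\frac{34.5\%}{45.4\%} \approx 0.760 = 76.0\%
```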

Name variant families
Almost every name variant of SBMA was built from one or more specific root terms, which were used to group the variants into families. The family groups were spinal/bulbar combination variants (81.1%), muscular atrophy variants (71.3%), Kennedy variants (38.8%), neuronopathy/neuropathy variants (7.7%), and amyotrophy variants (2.8%). Individual records may occur in more than one variant family.
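Assigning name variants to families can be expressed as one root-term pattern per family, where a single variant may match several patterns. The regular expressions below are assumptions chosen to match the root terms named above. They are not the author's coding rules, and the sample variants are illustrative.

```python
import re
from typing import Dict, List

# Hypothetical root-term patterns, one per variant family (case-insensitive).
FAMILIES: Dict[str, re.Pattern] = {
    # "spinal and bulbar", "spinal-bulbar", "spinobulbar", "bulbospinal", ...
    "spinal/bulbar combination": re.compile(
        r"\b(spinal|spino)[\s-]*(and\s+)?bulbar|\bbulbo[\s-]*spinal", re.I
    ),
    # Also catches the run-together spelling "muscularatrophy".
    "muscular atrophy": re.compile(r"\bmuscular\s*atrophy", re.I),
    "Kennedy": re.compile(r"\bkennedy", re.I),
    "neuronopathy/neuropathy": re.compile(r"\bneur(on)?opathy", re.I),
    "amyotrophy": re.compile(r"\bamyotrophy", re.I),
}


def families_of(variant: str) -> List[str]:
    """Return every family whose root-term pattern appears in the variant."""
    return [name for name, pattern in FAMILIES.items() if pattern.search(variant)]


for v in ["spinal and bulbar muscular atrophy", "Kennedy's disease",
          "X-linked bulbospinal neuronopathy", "bulbospinal amyotrophy"]:
    print(v, "->", families_of(v))
```

Because a variant may belong to several families, the family percentages overlap and need not sum to 100%.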

DISCUSSION
The rare disease SBMA displayed wide variation in naming, which would likely affect the success of any comprehensive search. Multiple name variants with inconsistent structures complicated the search strategy. Researchers conducting a literature review for a specific SBMA research question would likely have difficulty locating all 206 name variants. A search for any single name variant would retrieve less than half of the published literature at best. A search combining the 13 most common name variants would retrieve 88.5% of the known records (Table 1). All other name variants occurred rarely, each accounting for less than 2.0% of the comprehensive bibliography. Excluding them entirely, however, could miss publications of interest to thorough researchers. Translated titles complicate retrieval because they were very likely to use a unique or unusual name variant for SBMA. A search for Kennedy variants alone would retrieve less than half of the published literature. The low frequency of acronyms and the high likelihood of false hits together make searching by acronym of limited effectiveness.
The author investigated which of the terms "neuronopathy," "amyotrophy," and "muscular atrophy" was most frequently used in SBMA name variants. The author also examined how often name variants included the discoverer's name, "Kennedy," and how often they used the common combinations of variations of "spinal" and "bulbar." The "spinal-bulbar combination" family (81.1%) and the "muscular atrophy" family (71.3%) are both used more commonly than the other families of terms.

CONCLUSIONS
This study illustrates the extent of the terminology problem in literature searches on rare diseases. It is reasonable to assume that other rare and emerging disorders face similar challenges from inconsistent terminology. Awareness of the challenge may help researchers avoid missing important studies.
Further monitoring of publications on rare and emerging diseases is recommended to track how strongly inconsistent naming affects search success. Over time, trends in name usage may become apparent, and preferred name variants may change. Future studies might explore the influence of key research laboratories on the development of rare disease terminology. One reasonable hypothesis is that established researchers influence the terminology their graduate students and research partners use in their own studies.
Brief communications: Arvin