The structure of DNA, an abbreviation for deoxyribonucleic acid, illustrates a basic principle common to all biomolecules: the intimate relation between structure and function. The remarkable properties of this chemical substance allow it to function as a very efficient and robust vehicle for storing information. We begin with an examination of the covalent structure of DNA and its extension into three dimensions.
Each unit of the polymeric structure is composed of a sugar (deoxyribose), a phosphate, and a variable base that protrudes from the sugar-phosphate backbone.

All four bases are planar but differ significantly in other respects. Thus, the monomers of DNA consist of a sugar-phosphate unit, with one of four bases attached to the sugar. These bases can be arranged in any order along a strand of DNA. The order of these bases is what is displayed in the sequence that begins this chapter. For example, the first base in the sequence shown is G (guanine), the second is A (adenine), and so on. The sequence of bases along a DNA strand constitutes the genetic information—the instructions for assembling proteins, which themselves orchestrate the synthesis of a host of other biomolecules that form cells and ultimately organisms.
The double-helical structure of DNA proposed by Watson and Crick. The sugar-phosphate backbones of the two chains are shown in red and blue and the bases are shown in green, purple, orange, and yellow.
Adenine pairs with thymine (A-T), and guanine with cytosine (G-C). The dashed lines represent hydrogen bonds.
The base-pairs A-T (blue) and C-G (red) are shown overlaid. The Watson-Crick base-pairs have the same overall size and shape, allowing them to fit neatly within the double helix.
If a DNA molecule is separated into two strands, each strand can act as the template for the generation of its partner strand.
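The template idea can be illustrated with a minimal Python sketch, assuming only the pairing rules given above (A with T, G with C). The function name `partner_strand` and the example sequence are hypothetical, chosen for illustration. The partner is written position by position against the template, so the two strings read as stacked base pairs.

```python
# Watson-Crick pairing rules stated in the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(template: str) -> str:
    """Return the partner of `template`, aligned base by base with it."""
    return "".join(PAIRS[base] for base in template.upper())


template = "GATTACA"  # hypothetical sequence, for illustration only
partner = partner_strand(template)
print(template)
print(partner)

# Using the newly made strand as a template regenerates the original strand.
assert partner_strand(partner) == template
```

Because each base has exactly one partner, either strand alone carries enough information to rebuild the other.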
An important nucleic acid in addition to DNA is ribonucleic acid (RNA). Some viruses use RNA as the genetic material, and even those organisms that employ DNA must first convert the genetic information into RNA for the information to be accessible or functional. Structurally, RNA is quite similar to DNA. It is a linear polymer made up of a limited number of repeating monomers, each composed of a sugar, a phosphate, and a base. The sugar is ribose instead of deoxyribose (hence, RNA) and one of the bases is uracil (U) instead of thymine (T). Unlike DNA, an RNA molecule usually exists as a single strand, although significant segments within an RNA molecule may be double stranded, with G pairing primarily with C and A pairing with U. This intrastrand base-pairing generates RNA molecules with complex structures and activities, including catalysis.

RNA has three basic roles in the cell. First, it serves as the intermediate in the flow of information from DNA to proteins, the primary functional molecules of the cell. The DNA is copied, or transcribed, into messenger RNA (mRNA), and the mRNA is translated into protein. Second, RNA molecules serve as adaptors that translate the information in the nucleic acid sequence of mRNA into information designating the sequence of constituents that make up a protein. Finally, RNA molecules are important functional components of the molecular machinery, called ribosomes, that carries out the translation process. As will be discussed in Chapter 2, the unique position of RNA between the storage of genetic information in DNA and the functional expression of this information as protein, together with its potential to combine genetic and catalytic capabilities, indicates that RNA played an important role in the evolution of life.
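The copying of DNA into mRNA can be sketched in a few lines of Python. This sketch assumes the input is the DNA strand whose sequence matches the mRNA (often called the coding strand, a term not used above), so the only change is U in place of T. The function name `transcribe` and the example sequence are hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand: same bases, with U replacing T."""
    return coding_strand.upper().replace("T", "U")


dna = "ATGGCATTT"  # hypothetical sequence, for illustration only
mrna = transcribe(dna)
print(mrna)

# The RNA copy has the same length as the DNA and contains no thymine.
assert len(mrna) == len(dna)
assert "T" not in mrna
```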
A major role for many sequences of DNA is to encode the sequences of proteins, the workhorses of cells, which participate in essentially all cellular processes. Some proteins are key structural components, whereas others are specific catalysts (termed enzymes) that promote chemical reactions. Like DNA and RNA, proteins are linear polymers. However, proteins are more complicated in that they are formed from a selection of 20 building blocks, called amino acids, rather than 4.
The three-dimensional structure of a protein, a linear polymer of amino acids, is dictated by its amino acid sequence.
How is the sequence of bases along DNA translated into a sequence of amino acids along a protein chain? We will consider the details of this process in later chapters, but the important finding is that three bases along a DNA chain encode a single amino acid. The specific correspondence between a set of three bases and 1 of the 20 amino acids is called the genetic code. Like the use of DNA as the genetic material, the genetic code is essentially universal; the same sequences of three bases encode the same amino acids in all life forms from simple microorganisms to complex, multicellular organisms such as human beings.
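The three-bases-per-amino-acid rule can be made concrete with a short Python sketch. It first counts the possible three-base combinations, then reads a DNA sequence three bases at a time. The codon table is a small excerpt of the standard genetic code, not the full table, and the function name `translate` and the example sequence are hypothetical.

```python
from itertools import product

# Four bases taken three at a time give 4 * 4 * 4 = 64 combinations,
# more than enough to specify 20 amino acids.
BASES = "ACGT"
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(all_codons))

# Excerpt of the standard genetic code, written in the DNA alphabet.
CODON_TABLE = {
    "ATG": "Met", "AAA": "Lys", "TTT": "Phe",
    "GGT": "Gly", "GTT": "Val", "ATC": "Ile",
    "TAA": "Stop",
}


def translate(dna: str) -> list[str]:
    """Read `dna` in consecutive triplets and return the encoded amino acids.

    Reading stops at a stop codon. Trailing bases that do not fill a triplet
    are ignored. Codons outside the excerpt above raise KeyError.
    """
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


print(translate("ATGAAATTTGGTTAA"))  # hypothetical sequence
```

A dictionary lookup is enough here because the code is a fixed correspondence: the same triplet always yields the same amino acid.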
Knowledge of the functional and structural properties of proteins is absolutely essential to understanding the significance of the human genome sequence. For example, the sequence at the beginning of this chapter corresponds to a region of the genome that differs in people who have the genetic disorder cystic fibrosis. The most common mutation causing cystic fibrosis, the loss of three consecutive Ts from the gene sequence, leads to the loss of a single amino acid within a protein chain of 1480 amino acids. This seemingly slight difference—a loss of 1 amino acid of nearly 1500—creates a life-threatening condition. What is the normal function of the protein encoded by this gene? What properties of the encoded protein are compromised by this subtle defect? Can this knowledge be used to develop new treatments? These questions fall in the realm of biochemistry. Knowledge of the human genome sequence will greatly accelerate the pace at which connections are made between DNA sequences and disease as well as other human characteristics. However, these connections will be nearly meaningless without the knowledge of biochemistry necessary to interpret and exploit them.
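The effect of a three-base deletion can be sketched with a toy example in Python. The sequence below is hypothetical and is not the actual gene sequence involved in cystic fibrosis. It only shows why removing three consecutive bases removes one amino acid while leaving the rest of the chain unchanged. The codon table is a small excerpt of the standard genetic code, and the helper names are assumptions made for illustration.

```python
# Excerpt of the standard genetic code, written in the DNA alphabet.
CODON_TABLE = {
    "ATG": "Met", "AAA": "Lys", "TTT": "Phe",
    "GGT": "Gly", "GTT": "Val",
}


def translate(dna: str) -> list[str]:
    """Read `dna` in consecutive triplets and return the encoded amino acids."""
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3)]


def delete_bases(dna: str, start: int, length: int) -> str:
    """Return `dna` with `length` bases removed, beginning at index `start`."""
    return dna[:start] + dna[start + length:]


normal = "ATGAAATTTGGTGTT"  # toy sequence, NOT the real gene sequence
mutant = delete_bases(normal, normal.index("TTT"), 3)  # drop three consecutive Ts

print(translate(normal))
print(translate(mutant))

# Exactly one amino acid is lost; the residues on either side are unchanged.
assert len(translate(normal)) - len(translate(mutant)) == 1
```

Because the deletion spans exactly three bases, the triplets that follow it are still read in the same groups, so only one position in the protein changes.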
Cystic fibrosis: a disease that results from a decrease in fluid and salt secretion by a transport protein referred to as the cystic fibrosis transmembrane conductance regulator (CFTR). As a result of this defect, secretion from the pancreas is blocked, and heavy, dehydrated mucus accumulates in the lungs, leading to chronic lung infections.