Molecular biology and biochemistry of differentiation.

Recent work in the field of molecular biology and differentiation has been directed towards an assessment of the number of different genes involved in the development and differentiation process. By the technique of RNA-DNA hybridization to single copy DNA, it appears that some 60,000-200,000 different RNA sequences are expressed during embryonic development in the mouse. The differentiating brain shows the highest degree of RNA transcription diversity. The new technique of cDNA-poly(A)-containing RNA hybridization is described. The nuclear poly(A)-containing RNA appears to reflect the high complexity of sequences determined in the RNA-DNA experiments described above. The cytoplasmic poly(A)-containing RNA, or messenger RNA, appears to represent a subset of approximately 10-20% of the information in the nuclear poly(A)-RNA. Also, the major portion of the messenger RNA frequency classes in different organs such as kidney, spleen, and liver is common to these organs. However, brain, although containing many of the cytoplasmic messenger RNA sequences found in liver, kidney, and spleen, has a large class of messenger RNA sequences which are specific to the nervous system.

*Department of Cell Biology, Baylor College of Medicine, Houston, Texas 77030.

At the heart of understanding the mechanisms involved in cellular differentiation in eukaryotic organisms is an assessment of the genomic output, both qualitative and quantitative, during embryonic development and in the final differentiated organ systems. Since each somatic cell contains the same genome in terms of DNA content and sequence information, the mechanisms underlying cellular differentiation must be reflected in differential activation and repression of specific DNA sequences at various times during development. There must be control mechanisms capable of activating and selecting part of the total genetic potential while repressing other regions at different developmental stages or in different cell types. It is assumed that specific cellular products produced by one cell can and do influence the gene expression patterns of other cell types in the surrounding environment of the differentiating system. There must be many genes, both structural and regulatory, involved in the cellular processes of differentiating cells as well as in the final differentiated adult organ. We assume that many of these expressed genes will be common to different cell types, while others will be specific to and characteristic of a given cell or organ.
Recent work in the field of the molecular biology of differentiation in eukaryotes, and in particular in the mouse system, has concerned itself with analysis of the informational content of RNA transcripts during development. With this information one can obtain a feel for the number of potential genes that are involved in the stabilization of the final differentiated state. The techniques of nucleic acid hybridization have provided us with useful information about genomic output in normal and abnormal development, and this tool could in the future be very useful for assessing the effect of various toxic and teratogenic compounds on various organ systems during embryonic development.
Although much information has accumulated on biochemical and genetic aspects of the pre-implantation mouse and rabbit embryo (1), this brief review will be restricted to an analysis of genomic output at later stages of development and in final differentiated organs. The techniques of nucleic acid hybridization will be central to answering the following questions. First, what is the expressed genomic output during embryonic development in the mouse (that is, the total informational content in the RNA transcripts, or the number of potential genes expressed during development)? For convenience, we define an average gene as 1000 nucleotides of information. This is enough information to code for a protein with a molecular weight of 38,000. The second question concerns the total number of potential genes expressed as RNA in various differentiated organs of the mouse. How much of this genomic output is common to the organs examined, and how much is specific to each? Finally, we can ask how much of the potential genetic information is actually transported to the cytoplasm and translated into protein (structural genes), and how much is either destroyed or processed and remains in the nucleus of the cell. With this information we may obtain an idea of the level (nuclear vs. cytoplasmic) at which the greatest organ specificity of these RNA transcripts resides.
First, we must consider something about DNA sequence content and organization in the mouse. The DNA of most higher organisms, and of the mouse in particular, is said to contain basically three classes of DNA based on differences in their reannealing kinetics (Fig. 1). The highly repetitive DNA, the satellite DNA, contains over a million copies of a simple repeating sequence. The middle repetitive DNA, which in the mouse genome is present in from a few hundred to several thousand copies, represents about 25% of the total mass of DNA in a typical mouse cell. The major class of DNA (60%) reanneals with hybridization kinetics indicating only one copy per haploid genome. It is this unique DNA, or single copy DNA, with which we will mainly be concerned, since RNA hybridization data with this DNA fraction can yield quantitative information about the total informational output of a particular cell, tissue, or organ (1).

Expressed Genomic Output or Transcriptional Complexity
Once the unique DNA of the mouse has been isolated by hydroxyapatite chromatography, and it has been established that the repetitive DNA has been removed (Fig. 1) and that the analytical complexity, or total unique genome complexity, of the DNA is 2 × 10^9 nucleotide pairs (NTP), one may determine the expressed genomic output, or transcriptional complexity, of an RNA isolated from a particular cell, tissue, or organ. For example, as in Figure 2, a given RNA hybridized with 2% of the single copy DNA. This would then represent an RNA complexity of 4 × 10^7 nucleotides (NT), or 40,000 different genes, each 1000 nucleotides in length. The transcriptional complexity of RNA from various embryonic stages of mouse development has been determined by hybridization to radioactive unique DNA. The results are summarized in Figure 3. From the 8-16 cell stage, where there are some 10,000 potential genes of 1,000 NT each, to the blastocyst, there is approximately a 3-fold increase in transcriptional complexity (30,000 potential genes). From the blastocyst to term there is another 7-fold increase in transcriptional complexity (ca. 200,000 potential genes). This high RNA complexity presumably reflects the differentiation of the various organs and tissues. As we will see later, much of it is contributed by the nervous system of the mouse.
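The conversion from a saturation hybridization value to a number of potential genes can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text: the genome complexity (2 × 10^9) and the 1000-nucleotide average gene are the values quoted above, while the function and constant names are hypothetical.

```python
# Sketch: convert a saturation hybridization value into "potential genes".
# The two constants are the figures quoted in the text; all names are illustrative.

UNIQUE_GENOME_NT = 2e9   # complexity of mouse single copy DNA (value from the text)
AVERAGE_GENE_NT = 1000   # "average gene" as defined in the text


def potential_genes(fraction_hybridized,
                    genome_nt=UNIQUE_GENOME_NT,
                    gene_nt=AVERAGE_GENE_NT):
    """Return (RNA complexity in nucleotides, number of potential genes)."""
    if not 0.0 <= fraction_hybridized <= 1.0:
        raise ValueError("fraction_hybridized must be between 0 and 1")
    complexity_nt = fraction_hybridized * genome_nt
    return complexity_nt, complexity_nt / gene_nt


# The 2% example from Figure 2 as described in the text.
complexity, genes = potential_genes(0.02)
print(f"{complexity:.1e} NT, about {genes:,.0f} potential genes")
```

With a hybridization value of 2% the sketch reproduces the 4 × 10^7 NT and 40,000 potential genes given in the text; the same function applied to 3-4% gives the 60,000-80,000 range quoted for adult organs.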

Sequence Organization and Primary Transcript
I should point out that the reason one refers to the transcriptional complexity as potential genes is that the function of much of this expressed genetic information is not clear. We do know that some fraction of the potential gene transcripts will be transported to the cytoplasm and become associated with polysomes. This information must represent the activity of the structural genes (messenger RNA, mRNA), since presumably this RNA information will be translated into proteins. Suffice it to say that much of the transcriptional complexity of a given cell never reaches the cytoplasm and may play other vital roles in the cellular machinery.

Total Genomic Output in Different Adult Organs and in Different Organs during Development
Figure 4 shows an example of data obtained from similar RNA excess hybridization experiments with mouse unique DNA (6). Liver, kidney, and spleen all hybridize to about 3-4%, a genomic output which represents some 60,000 to 80,000 different potential genes. Brain, on the other hand, has a total genomic output 3 to 4 times that of liver, kidney, or spleen. Presumably, this reflects the wide variety of cell types found in brain tissue. Interestingly, as shown in Figure 5, the brain genomic output increases during late embryonic development and postnatally, while the liver actually shows a slight decrease in the number of potential genes expressed postnatally. The effect of various toxic elements such as mercury chloride on the development and differentiation of the brain, both in utero and after birth, might be an area to analyze in terms of changing genomic output.
An obvious question about the genomic output in different organs is how different or similar the total RNA transcripts from these organs are. This is a somewhat difficult question, but it can be approached by an additive RNA saturation experiment; that is, the RNAs from two organs, such as liver and spleen, are mixed in equal proportions and hybridized to the unique DNA (6). As shown in Figure 6, if all the sequences were different in liver and kidney, then the additive hybridization would be about 8%, or 160,000 potential expressed genes. Experimentally, a 7% hybridization value is obtained. Therefore, approximately 20,000 sequences are common to both tissues. Similar experiments, also shown in Figure 6, indicate that liver and spleen share some 60,000 different RNA sequences. It should be pointed out that this sort of additive hybridization experiment does not give information about the sequences specific to a given tissue, only about the sequences that are common to the two organs, and this is probably only a minimal estimate.
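The reasoning behind the additive saturation experiment can be written out as a short Python sketch. It assumes, as the 8% additive figure in the text implies, that liver and kidney RNA each saturate about 4% of the unique DNA; the function name and default parameters are hypothetical and serve only to make the arithmetic explicit.

```python
# Sketch of the additive saturation arithmetic described in the text.
# Assumption: each organ alone saturates 4% of the unique DNA (implied by the
# 8% additive value); genome complexity and gene length are the text's values.

def shared_sequences(frac_a, frac_b, frac_mixture,
                     genome_nt=2e9, gene_nt=1000):
    """Minimal estimate of the number of sequences common to two RNA populations.

    If the two populations had nothing in common, the mixture would saturate
    frac_a + frac_b of the DNA; any shortfall is attributed to shared sequences.
    """
    expected_if_distinct = frac_a + frac_b
    shared_fraction = expected_if_distinct - frac_mixture
    # Clamp at zero so small measurement noise cannot give a negative count.
    shared_fraction = max(0.0, shared_fraction)
    return shared_fraction * genome_nt / gene_nt


# Liver plus kidney: 8% expected if fully distinct, 7% observed.
print(round(shared_sequences(0.04, 0.04, 0.07)))
```

For the liver and kidney values the shortfall of one percentage point corresponds to roughly 20,000 shared sequences of 1000 nucleotides each, which is the figure given in the text.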

Polyadenylic Acid-Containing RNA: Cytoplasmic Messenger RNA and Nuclear RNA
The tissue and organ specificity and the overlap of RNA transcripts may also be examined by another recent technique. Since the discovery that a large fraction of cytoplasmic messenger RNA and a relatively large proportion of nuclear RNA carry polyadenylic acid (poly A) tails at the 3'-end, a new approach has opened up for determining the informational complexity of these poly A-RNAs in different organs (7). Figure 7 presents the scheme for the production and utilization of a complementary DNA probe to a given poly A-containing RNA population. With the use of a short oligodeoxythymidylic acid primer, the enzyme reverse transcriptase, and 3H-deoxynucleoside triphosphates as substrate, a highly specific and sensitive probe can be synthesized. By hybridizing the radioactive complementary DNA (cDNA) to the RNA from which it was synthesized, the complexity of that poly A-RNA population can be determined from the rate at which hybrids are formed with respect to time and RNA concentration (Crot). This principle is demonstrated in Figure 8. If a 3H-complementary DNA is made to a pure messenger RNA, such as ovalbumin messenger RNA, the rate at which a sample of pure mRNAov will react, or back hybridize, to the radioactive cDNAov will be extremely fast, since the rate is determined by the complexity, and one gene product obviously has a low complexity compared to total RNA with thousands of different sequences. On the other hand, if a poly A-containing RNA population contains 1000 different sequences, each 1000 nucleotides long, then a complementary DNA preparation from that RNA would hybridize back to the RNA at a rate a thousand times slower.
Naturally, if one were to mix a pure messenger RNA and a poly A-containing RNA having 1000 different sequences in a one-to-one proportion, make the cDNA to this mix, and then obtain the back hybrid, the kinetic analysis, or "Crot curve," would show two components with first-order rate constants differing by a factor of 1000 (Fig. 8). This, of course, is an ideal situation. In reality, when one makes cDNA to poly A-containing RNA from a cytoplasmic or nuclear fraction and back hybridizes the respective RNA to its cDNA, complex kinetic curves are generated which are best resolved by computer into various first-order kinetic classes, or frequency classes, of sequences. In the idealized example (Fig. 8), there are two frequency classes, one with 1 sequence and a second with 1000 sequences, each representing 50% of the mass.
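The idealized two-component Crot curve can be illustrated with a small Python sketch. It treats each frequency class as a pseudo-first-order reaction (RNA in excess over cDNA) and sums the classes weighted by their share of the mass. The absolute Crot1/2 assigned to the simple class is a hypothetical placeholder, not a measured value; only the 1000-fold ratio between the two classes and the 50:50 mass split come from the text.

```python
import math

# Sketch of an idealized back-hybridization ("Crot") curve with several
# first-order frequency classes. All names and the absolute Crot1/2 value
# of the simple class are illustrative assumptions.


def fraction_hybridized(crot, components):
    """Fraction of cDNA in hybrid at a given Crot.

    components: iterable of (mass_fraction, crot_half) pairs, one per
    frequency class; each class reacts with rate constant ln(2) / crot_half.
    """
    total = 0.0
    for mass_fraction, crot_half in components:
        rate = math.log(2) / crot_half
        total += mass_fraction * (1.0 - math.exp(-rate * crot))
    return total


CROT_HALF_SIMPLE = 1e-3  # hypothetical Crot1/2 of the single-sequence class
components = [
    (0.5, CROT_HALF_SIMPLE),          # 1 sequence, 50% of the mass
    (0.5, CROT_HALF_SIMPLE * 1000),   # 1000 sequences, 50% of the mass
]

# Tabulate the curve over several decades of Crot.
for exponent in range(-5, 4):
    crot = 10.0 ** exponent
    print(f"Crot = 1e{exponent:+d}  fraction hybridized = "
          f"{fraction_hybridized(crot, components):.3f}")
```

The tabulated values rise in two steps separated by about three decades of Crot, with a plateau near 50% between them. Resolving an experimental curve into frequency classes amounts to fitting the mass fractions and Crot1/2 values of such a sum to the data.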

Comparison of Nuclear RNA and Messenger RNA
Previous results in embryonic systems as well as in normal or transformed cells have indicated that much of the labeled nuclear RNA, or HnRNA, is broken down to smaller RNAs and to nucleotides, and that only a small portion (5-10%) is transferred to the cytoplasm with time (4,5). Recent experiments by Davidson and Britten's group working with sea urchin embryos have clearly shown that the complexity of the nuclear RNAs which hybridize to unique DNA is some ten times that of the messenger RNA isolated from the same group of embryonic cells (8). One may now ask how much of the nuclear poly A-containing RNA information is transferred to the cytoplasm; cDNA back-hybridization analysis can give an indication of the sequence complexities of these two RNA populations.
Paul's group, using mouse Friend cells, has recently analyzed the cytoplasmic and the nuclear poly A-containing RNA populations by cDNA analysis (9). Their results can be summarized as follows (Fig. 9). The hybridization curves for the poly A messenger RNA are complex and can be resolved into at least three frequency classes or components. The average frequency class would indicate that approximately 4,000 to 20,000 different sequences may actually be translated in the Friend cell. On the other hand, the nuclear cDNA-nuclear poly A-RNA hybridization curve, which can be divided into at least two frequency classes, has a major first-order component (Crot1/2 = 400) indicating 80,000 potential genes. As you will recall, this number is reasonably close to the value arrived at by the independent method of RNA saturation of unique DNA for various tissues like spleen, kidney, and liver (60,000-100,000 potential genes, each 1000 nucleotides long). Suffice it to say that the total poly A-RNA complexity, or nuclear poly A-RNA complexity, tends to reflect the high complexity found in total RNA hybridized to unique DNA, while the cytoplasmic or messenger poly A-containing RNA appears to be a subset of those nuclear sequences. How the information in nuclear RNA is processed and selected to yield the subset of sequences which are ultimately translated could be an important point of control in both normal and abnormal development and differentiation.

Overlapping Poly A-Messenger RNA Populations in Different Tissues and Organs
The complementary DNA-poly A-containing RNA hybridization method offers an opportunity to assess the overlapping and the tissue-specific poly A-containing RNA sequences. In fact, it should be possible to purify a cDNA probe specific to a given state of an organ by carrying out various heterologous cDNA-poly A-containing RNA hybridizations and isolating the cDNA that does not enter into hybrid. For example, Ryffel and McCarthy (10) have made a cDNA to total cytoplasmic messenger poly A-containing RNA from mouse brain, hybridized the cDNA with liver, spleen, and kidney messenger RNA (60% hybrid), and then isolated by hydroxyapatite chromatography the cDNA that did not form hybrids (approximately 40%). This cDNA is relatively specific for brain tissue.
In a similar manner, the overlap of messenger RNA populations from different tissues was studied by this method. A complementary DNA probe was made to L-cell cytoplasmic poly A messenger RNA. The kinetics of hybridization of the cDNA with the RNA from which it was made were analyzed by computer (Fig. 10). The curve can be resolved into three components, the third, or high complexity, component representing some 13,000 structural genes (330 amino acids each).
What is of interest here is the fact that liver, kidney, spleen, and brain cytoplasmic poly A messenger RNA can all hybridize to and protect most of the L-cell cDNA. This indicates that these tissues and cells all have a large fraction of their messenger RNA population in common. However, it should be pointed out that the curve obtained with brain RNA is shifted to higher Crot values owing to the large fraction of the RNA which is specific to that tissue. It should also be pointed out that a 1 or 2% difference in the hybridization values of the different tissues could represent 100 to 200 different organ- or tissue-specific messenger RNAs, and the technique as presently used is in no way sensitive enough to pick up differences of 1-2% (10).
Figure 11 summarizes the information briefly reviewed in this paper. The main points are that the genome puts out a great deal of information apparently not involved in synthesizing proteins, and that the major portion of the information that does eventually arrive in the cytoplasm is common to a variety of tissues and organs. This might indicate that the nuclear RNA will show the greatest tissue and organ diversity. This would in turn point to this high-complexity nuclear RNA as important in processing of the primary transcript, or possibly as having some role in the regulation of development and differentiation.