a, Example of a new protein-coding exon identified by RNA-Seq. LR, likelihood ratio. For each base in a window, we plot the average rate at which it is covered in our data. Light blue denotes bases annotated as exonic in Ensembl; black denotes bases that are not. In the gene model, blue boxes represent annotated exons from Ensembl and black lines represent annotated introns. The position of an inferred new protein-coding exon is shown in red. Lines represent the positions of splice junctions predicted from the RNA-Seq data and supported by more than five sequencing reads; those absent from current databases are in red. Below each junction is the number of sequencing reads supporting it. b, New exons are more tissue-specific than annotated exons. For each tissue profiled previously (ref. 12), as well as for chimpanzee LCLs (red), we estimated the fraction of new exons and the fraction of annotated exons observed in that tissue. The grey line represents what would be expected if both annotated and unannotated exons were observed at the same rate. AD, adipose; BR, brain; BS, breast; BT, BT cell line; CO, colon; HM, HME cell line; HR, heart; LN, lymph node; LV, liver; SK, skeletal muscle; TS, testes. Data are for exons expressed at a mean rate in human LCLs between 0.1 and 0.3 reads per million; for other expression rates see Supplementary Fig. 7. c, Example of a new polyadenylation site identified by RNA-Seq. Labelled as in a. The red line shows the position of reads identified as originating in the poly-A tail. The grey line represents the position of the predicted cleavage site. d, Binding sites for CPSF are enriched upstream of predicted polyadenylation sites. We divided predicted polyadenylation cleavage sites (supported by at least two sequencing reads) into classes based on their proximity to annotated cleavage sites. For each site, we extracted the upstream 50 bases and plot, for each position, the fraction of sequences matching the consensus AATAAA hexamer.
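The per-position calculation described for panel d can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that each upstream sequence is given 5' to 3' on the transcript strand and ends immediately before the predicted cleavage site, and that a match is assigned to the position of the first base of the hexamer; the caption does not state either convention. The function name `hexamer_fraction_by_position` and its parameters are hypothetical.

```python
from typing import List, Sequence

# Consensus hexamer named in the caption.
HEXAMER = "AATAAA"


def hexamer_fraction_by_position(
    upstream_seqs: Sequence[str],
    motif: str = HEXAMER,
    window: int = 50,
) -> List[float]:
    """Return, for each start position in the upstream window, the fraction
    of sequences that have `motif` beginning at that position.

    Assumption (not from the caption): index 0 is the base furthest from the
    cleavage site, and index `window - 1` is the base adjacent to it.
    Sequences whose length differs from `window` are skipped.
    """
    seqs = [s.upper() for s in upstream_seqs if len(s) == window]
    n_positions = window - len(motif) + 1
    if not seqs:
        return [0.0] * n_positions

    counts = [0] * n_positions
    for seq in seqs:
        for start in range(n_positions):
            if seq[start:start + len(motif)] == motif:
                counts[start] += 1

    return [count / len(seqs) for count in counts]


# Toy input: one sequence with the hexamer starting 26 bases upstream of the
# cleavage site (index 24 in a 50-base window), and one without it.
toy_seqs = [
    "C" * 24 + "AATAAA" + "C" * 20,
    "C" * 50,
]
fractions = hexamer_fraction_by_position(toy_seqs)
# Offsets relative to the cleavage site (negative values are upstream).
offsets = [start - 50 for start in range(len(fractions))]
```

Running this function separately on each class of predicted cleavage sites (grouped by proximity to annotated sites) would give one curve per class, which is the kind of comparison panel d shows.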