Decoding the Absolute Stoichiometric Composition and Structural Plasticity of α-Carboxysomes

ABSTRACT Carboxysomes are anabolic bacterial microcompartments that play an essential role in carbon fixation in cyanobacteria and some chemoautotrophs. This self-assembling organelle encapsulates the key CO2-fixing enzymes, Rubisco, and carbonic anhydrase using a polyhedral protein shell that is constructed by hundreds of shell protein paralogs. The α-carboxysome from the chemoautotroph Halothiobacillus neapolitanus serves as a model system in fundamental studies and synthetic engineering of carboxysomes. In this study, we adopted a QconCAT-based quantitative mass spectrometry approach to determine the stoichiometric composition of native α-carboxysomes from H. neapolitanus. We further performed an in-depth comparison of the protein stoichiometry of native α-carboxysomes and their recombinant counterparts heterologously generated in Escherichia coli to evaluate the structural variability and remodeling of α-carboxysomes. Our results provide insight into the molecular principles that mediate carboxysome assembly, which may aid in rational design and reprogramming of carboxysomes in new contexts for biotechnological applications.

According to the forms of encapsulated Rubisco and protein composition, carboxysomes can be categorized into two subclasses, α- and β-carboxysomes (11,20). The α-carboxysome of the chemoautotrophic bacterium Halothiobacillus neapolitanus has been chosen as a model carboxysome in fundamental studies and synthetic engineering. The genes encoding α-carboxysome-related proteins are clustered mainly in the cso operon in the H. neapolitanus genome (Fig. 1). The shell is constructed from six types of paralogous proteins: the hexameric proteins (BMC-H) CsoS1A, CsoS1B, and CsoS1C, which tile the major facets of the shell; the pentamers (BMC-P) CsoS4A and CsoS4B, which sit at the vertices; and the trimeric pseudohexamer (BMC-T) CsoS1D, which possesses a larger central pore than other shell proteins and was proposed to mediate the passage of large metabolite molecules, such as RuBP and 3-phosphoglycerate (3-PGA) (14,21–23). Among the BMC-H proteins, CsoS1A and CsoS1C have a high sequence similarity, differing in only 2 amino acids out of 98 (24,25), whereas CsoS1B contains a 12-residue C-terminal extension (24). The cargo enzymes include Rubisco and CA. Rubisco is assembled from the large and small subunits CbbL and CbbS, which form an L8S8 hexadecamer. CsoSCA acts as the functional CA in the α-carboxysome, existing as a dimer (26), and was suggested to associate with the shell inner surface (15,27). The linker protein CsoS2 in the H. neapolitanus α-carboxysome has two isoforms: a shorter polypeptide, CsoS2A (C-terminally truncated), and a full-length CsoS2B, translated via programmed ribosomal frameshifting (28). CsoS2A and CsoS2B share the middle region and the N-terminal domain, which binds with Rubisco and induces Rubisco condensation (29). The C terminus of CsoS2B, which is absent in CsoS2A, is presumed to bind with the shell and can serve as an encapsulation peptide to recruit nonnative cargos (27,30).
In addition, CbbO and CbbQ function as the Rubisco activases, forming a bipartite complex comprising one CbbQ hexamer and one CbbO monomer, to remove inhibitors from the Rubisco catalytic site and restore its carboxylation activity (31–35).
Given the significance of metabolic improvement and synthetic engineering potential, substantial efforts have been made to uncover the assembly and structural principles of carboxysomes. However, our knowledge about the accurate stoichiometric composition of carboxysomes, which plays an essential role in determining their size, shape, structural integrity, permeability, and catalytic performance (36), is still primitive. Label-free quantitative mass spectrometry has been used to determine the relative content of protein components within BMCs (37–40). Furthermore, our recent work has applied mass spectrometry-based absolute quantification and a QconCAT (quantification concatemer of standard peptides) strategy to examine the precise stoichiometric composition of 1,2-propanediol utilization (PDU) metabolosomes from Salmonella enterica serovar Typhimurium LT2 (41). In addition, fluorescence labeling and microscopic imaging have been utilized to characterize the protein stoichiometry of β-carboxysomes from the cyanobacterium Synechococcus elongatus PCC 7942 (Syn7942) (42). However, the precise stoichiometric composition of α-carboxysomes has not been well characterized, despite the crude estimates based on protein electrophoresis profiles reported in previous studies (22,43,44).
In this study, we performed absolute quantification of protein components within native α-carboxysomes from H. neapolitanus and recombinant α-carboxysomes produced in Escherichia coli, using QconCAT-assisted quantitative mass spectrometry (MS) in combination with biochemical analysis, electron microscopy (EM), and enzymatic assays. Our results shed light on the molecular principles underlying the assembly and structural plasticity of α-carboxysomes and provide essential information required for design and engineering of carboxysomes in synthetic biology.

RESULTS
FIG 1 Schematic overview of QconCAT strategy. (A) A QconCAT was designed to encode concatenations of tryptic peptides from the H. neapolitanus α-carboxysome proteins, together with intervening peptide sequences that recapitulate the primary sequence context of the analyte peptides in the native proteins. The QconCAT gene was expressed by cell-free synthesis and labeled with [13C6,15N4]arginine and [13C6,15N2]lysine. The purified and quantified QconCAT was added to four replicate samples of isolated native/recombinant α-carboxysomes from H. neapolitanus/E. coli. The absolute abundance and stoichiometry of the carboxysomal proteins were calculated by comparison of the area of the standard and analyte precursor ion chromatograms. A peptide for CbbQ, LLVKAGK, is shown here as an example. (B) SDS-PAGE of isolated native/recombinant α-carboxysomes showing the major bands of α-carboxysome proteins. The protein bands with sizes between 15 and 40 kDa observed in recombinant carboxysomes might be cytoskeletal and membrane(-associated) proteins from E. coli as indicated by mass spectrometry (Data Set S1). (C) EM images of isolated native/recombinant α-carboxysomes.
α-Carboxysome Stoichiometry and Structural Plasticity mBio
Quantifying the protein stoichiometry of native α-carboxysomes from H. neapolitanus. The QconCAT-assisted mass spectrometry approach permits precise quantification of the absolute abundance of proteins (45–47). This approach has recently been applied to quantify the stoichiometric composition of protein components within the PDU metabolosome (41). To determine the stoichiometry of α-carboxysome components, native α-carboxysomes were first isolated from H. neapolitanus using sucrose gradient ultracentrifugation (see Fig. S1A in the supplemental material). Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) indicated that CsoS2A/B, CbbL/S, and CsoS1A/B/C are the major α-carboxysome components (Fig. S1B).
NADH-coupled CO2 fixation activity assays confirmed the functionality of isolated α-carboxysomes, with a measured carbon fixation Vmax of 2.96 ± 0.09 µmol·mg−1·min−1 and Km for RuBP [Km(RuBP)] of 0.20 ± 0.02 mM (n = 4) (Fig. S1C). EM showed that the isolated α-carboxysomes have an intact and canonical polyhedral shape, with an average diameter of 124.6 ± 9.6 nm (n = 272) (Fig. S1D and E), consistent with previous results (31,48,49).
To establish the accurate stoichiometry of all proteins within the isolated α-carboxysomes, we created a QconCAT (47,50) to yield a series of stable isotope-labeled peptides as internal standards. The carboxysome preparations were mixed with a known amount of QconCAT and codigested, yielding analyte and standard peptides, differing only in the number of heavy atom centers. This codigest of analyte and standard was separated by nanoflow high-resolution liquid chromatography, coupled to high-resolution mass spectrometry (LC-MS) (Fig. 1 and Fig. S2). The QconCAT encoded three unique peptides for the CbbL, CbbS, CsoSCA, CbbO, CbbQ, CsoS1D, CsoS4A, and CsoS2AB shared region, two peptides for the CsoS2B and CsoS1ABC shared region, and a single peptide for the CsoS1B and CsoS1AC shared region (Fig. S2A and Table S1). Due to the high sequence similarity, CsoS1A and CsoS1C could not be distinguished in this QconCAT design. The QconCAT also encoded peptides to quantify the form II Rubisco CbbM. Since CbbM was presumed not to be a component of the H. neapolitanus α-carboxysome (51), we used CbbM as a reference to validate the quality of α-carboxysome isolation. The QconCAT gene encoding these peptide candidates was assembled from Qbricks using the ALACATs assembly strategy (47), to yield the QconCAT DNA sequence (Table S2). The QconCAT was then produced by cell-free synthesis (52) in the presence of stable isotope-labeled lysine and arginine, purified, and validated by SDS-PAGE and mass spectrometry (Fig. S2B).
From the LC-MS/MS traces, the peptide precursor ions for analyte and standards were isolated and their relative areas were quantified using Skyline (53). This was repeated with four independent preparations of carboxysomes (Fig. 1A and Fig. S3). All carboxysomal proteins were detected in the isolated carboxysomes, whereas CbbM was not detectable in the isolated samples; the carboxysomal proteins accounted for 99.5% ± 0.2% of the total proteins in the samples (Fig. S4A). These results confirm the high purity and the structural and functional integrity of isolated carboxysomes.
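The quantification step described here reduces to a ratio calculation, which can be sketched as follows. This is a minimal illustration rather than the Skyline workflow itself: the function names, the peak areas, and the spiked amount of labeled standard are all hypothetical placeholders.

```python
from statistics import mean, stdev

def analyte_amount(analyte_area, standard_area, standard_pmol):
    """Analyte amount (pmol) from the analyte/standard precursor-ion area ratio."""
    if standard_area <= 0:
        raise ValueError("standard peak area must be positive")
    return analyte_area / standard_area * standard_pmol

def summarize_replicates(amounts):
    """Mean and sample standard deviation across independent preparations."""
    return mean(amounts), stdev(amounts)

# Hypothetical (analyte area, standard area) pairs for one peptide
# measured in four independent carboxysome preparations.
replicate_areas = [(2.0e6, 1.0e6), (2.2e6, 1.0e6), (1.8e6, 1.0e6), (2.0e6, 1.0e6)]
STANDARD_PMOL = 5.0  # assumed amount of labeled standard added per sample

amounts = [analyte_amount(a, s, STANDARD_PMOL) for a, s in replicate_areas]
avg, sd = summarize_replicates(amounts)
print(f"{avg:.2f} +/- {sd:.2f} pmol")
```

Because analyte and standard peptides differ only in their heavy-atom content, the area ratio is taken here as a direct proxy for the molar ratio; that equivalence is the working assumption of the sketch.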
We quantified the abundance of protein components within one H. neapolitanus carboxysome structure, based on the shell surface area of a typical icosahedron (54) and the average carboxysome size (124.6 ± 9.6 nm; n = 272) measured in EM (Fig. 1C, Table 1, and Table S3; see details in Materials and Methods). The results revealed that the most abundant proteins in the H. neapolitanus α-carboxysome are CsoS1AC hexamers (863 copies), followed by Rubisco (447 copies, estimated by the CbbL content, as CbbL subunits contain all the catalytic sites per Rubisco), CsoS2A (248 copies), CsoS2B (192 copies), CsoS1B hexamers (112 copies), and CsoSCA dimers (58 copies). The H. neapolitanus α-carboxysome has a molecular weight (MW) of ~346 MDa, and the Rubisco enzymes account for ~66% of the total MW. The hexameric shell proteins CsoS1A/C and CsoS1B make up ~17.1% of the total MW. Additionally, about 11 copies of CsoS4A/B pentamers (CsoS4A, 9; CsoS4B, 2) are integrated within the α-carboxysome. CsoS1D pseudohexamers have a low abundance in the shell, with ~3 copies per carboxysome. Moreover, the linker proteins, CsoS2A and CsoS2B, account for 13.5% of the total MW.
Approximately 15 copies of CbbQO complexes, each composed of one CbbQ hexamer and one CbbO monomer, were identified in the carboxysome, indicating that the CbbQO complex is a structural component of native α-carboxysomes in H. neapolitanus. Consistently, CbbQ has been indicated to be tightly associated with the H. neapolitanus carboxysome shell (31), and CbbQO can be incorporated into recombinant α-carboxysomes (35). Likewise, our mass spectrometry results showed the presence of McdAB-like proteins in purified native α-carboxysomes (Fig. S4A and Data Set S1), implicating the association of McdAB-like proteins with α-carboxysomes, which was proposed to ensure proper distribution of α-carboxysomes in H. neapolitanus and carboxysome inheritance during cell division (55). Some chemoautotrophs, including H. neapolitanus, contain form II Rubisco (CbbM) and its activases CbbQ2 and CbbO2 (32). These proteins were not detected in the purified carboxysomes (Data Set S1), suggesting that they are not organizational components of, or associated with, the α-carboxysomes in H. neapolitanus.
Stoichiometric composition of recombinant α-carboxysomes. Previous studies have demonstrated that heterologous engineering of the H. neapolitanus α-carboxysomes could result in functional α-carboxysome structures (21,35,56,57). To verify the compositional similarity between native and recombinant α-carboxysomes, we reconstituted H. neapolitanus α-carboxysomes by expressing the cso operon with csoS1D using an arabinose-inducible pBAD33 vector in E. coli (Fig. S1A and G). SDS-PAGE revealed overall similar contents of protein components within the isolated native and recombinant α-carboxysomes, except for a reduction in the CsoSCA content in recombinant carboxysomes (Fig. S1B and S3).
Carbon fixation kinetics as a function of RuBP concentrations confirmed the function of recombinant α-carboxysomes, with a Vmax of 2.07 ± 0.12 µmol·mg−1·min−1 (n = 4) and a Km(RuBP) of 0.08 ± 0.02 mM (n = 4), though both were lower than those of native α-carboxysomes (Fig. S1C). EM indicated that recombinant α-carboxysomes possess a polyhedral shape and an average diameter of 131.8 ± 18.0 nm (n = 152), slightly larger than native α-carboxysomes (Fig. S1D and E). Analysis of EM images showed that both native and recombinant α-carboxysomes possess single-layer shells (5.3 ± 0.6 nm and 5.5 ± 0.8 nm, respectively; n = 100 [Fig. S1F]), consistent with previous observations (37).
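To make the kinetic comparison concrete, the sketch below evaluates a Michaelis-Menten rate law at the RuBP concentrations used in the activity assays, with the Vmax and Km(RuBP) values reported for native and recombinant carboxysomes. The use of a simple Michaelis-Menten model is an assumption made for illustration, and the function and variable names are hypothetical; rates are in the same units as the reported Vmax.

```python
def mm_rate(substrate_mM, vmax, km_mM):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    if substrate_mM < 0:
        raise ValueError("substrate concentration cannot be negative")
    return vmax * substrate_mM / (km_mM + substrate_mM)

# RuBP concentrations (mM) used to initiate the assays.
RUBP_MM = [0.0, 0.06, 0.13, 0.25, 0.5, 1.0, 2.0]

# Reported kinetic parameters: (Vmax, Km for RuBP in mM).
NATIVE = (2.96, 0.20)
RECOMBINANT = (2.07, 0.08)

for s in RUBP_MM:
    v_native = mm_rate(s, *NATIVE)
    v_recombinant = mm_rate(s, *RECOMBINANT)
    print(f"[RuBP] = {s:4.2f} mM  native = {v_native:.2f}  recombinant = {v_recombinant:.2f}")
```

By construction, the rate equals half of Vmax when the RuBP concentration equals Km, which is a quick sanity check on any fitted parameter pair.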
Individual proteins in isolated recombinant α-carboxysomes were then quantified by mass spectrometry to retrieve the stoichiometric content of the two types of α-carboxysomes (Fig. 1, Table 1, and Table S3). Within the recombinant α-carboxysome, the most abundant proteins are CsoS1AC hexamers (1,001 copies), followed by Rubisco (426 copies), CsoS2A (305 copies), CsoS2B (249 copies), and CsoS1B hexamers (79 copies). The recombinant α-carboxysome has a molecular mass of ~336 MDa and has reduced Rubisco copy numbers compared with the native α-carboxysome (P < 0.05 [Fig. 2]). The content of CsoSCA in the recombinant α-carboxysome is reduced by 29-fold compared with that in the native α-carboxysome, resulting in only ~2 CsoSCA dimers per recombinant α-carboxysome, consistent with SDS-PAGE analysis (Fig. S1B). The hexameric shell proteins, CsoS1AC and CsoS1B, account for 19.4% of the total MW in recombinant α-carboxysomes (Table 1). The CsoS1B content is reduced by ~30% (79 copies) compared with that in native α-carboxysomes (112 copies; P < 0.05 [Fig. 2]). There are, on average, 7 copies of pentameric proteins (CsoS4A, 6; CsoS4B, 1) in recombinant α-carboxysomes, fewer than the hypothetical 12 pentamers for a typical icosahedral structure. This suggests that some vertices are not capped by CsoS4 pentamers. Similar features have also been observed in β-carboxysomes and synthetic BMC shells (42,58,59), presumably providing a mechanism for regulating shell architecture and permeability. CsoS1D has ~1 copy per recombinant α-carboxysome, less than that in the native α-carboxysome (P < 0.001 [Table 1]). CsoS2A and CsoS2B have 305 and 249 copies, respectively, per recombinant α-carboxysome, collectively accounting for 17.6% of the total MW. CsoS2B has an increased content in the recombinant α-carboxysome compared to the native form (Fig. 2).

DISCUSSION
In this study, we performed absolute quantification using QconCAT-based mass spectrometry to determine the stoichiometric composition of the H. neapolitanus α-carboxysomes, which represents a step toward gaining a comprehensive understanding of the structure and function of the model carboxysome.
The building components in the carboxysome have a wide range of abundances, from a few to thousands of copies per carboxysome. The major proteins could be visualized in protein gels, whereas some minor proteins were hardly visible ( Fig. 1 and Fig. S2). In addition, shell protein paralogs often have similar molecular masses and therefore were not readily distinguishable in SDS-PAGE gels. These intrinsic characteristics made it difficult to obtain the accurate and complete stoichiometry of carboxysomes solely based on protein band profiling of protein electrophoresis and label-free MS quantification. The QconCAT approach is a very effective method for determination of subunit stoichiometry. Because all of the labeled standard peptides are released in equal amounts (completeness of digestion was confirmed), they form a valuable baseline against which the intensities of the analyte cognate peptides can be measured. In this study, we were able to quantify subunits over 3 to 4 orders of magnitude, with high accuracy. Because the approach relies on internal standards, it does not depend on specific properties of members of the protein complex, such as SDS-PAGE band intensity, or the intrinsic properties of individual peptides in label-free proteomics, although the correlation between QconCAT absolute quantification and label-free intensities in this instance was acceptable (Fig. S4).
Comparison of QconCAT and label-free quantification results illustrated notable deviations in the abundances of some carboxysomal proteins (Fig. S4B). The results demonstrate that label-free quantification could potentially lead to inaccurate estimates of the contents of CsoS1B, CsoSCA, CsoS4A, and CsoS4B and highlight the necessity of QconCAT-based quantification in studying the protein stoichiometric composition of BMCs. Moreover, the reliability of QconCAT quantification was evidenced by the close agreement between individual QconCAT peptides for the same carboxysome protein (Fig. S4C).
Stoichiometric variability and structural plasticity of α-carboxysomes. Characterization of the absolute stoichiometric compositions for native and recombinant carboxysomes provides insight into the organizational principles and plasticity of the H. neapolitanus α-carboxysome (Fig. 3). It becomes apparent that the BMC shells are amenable to integrating different copies or types of shell proteins, and the absence of specific components or the changes in the ratios of protein paralogs may not necessarily impede the overall shell assembly (41,60–62). The total copy numbers of shell pentamers (CsoS4A and CsoS4B) are 11.0 for native α-carboxysomes and 7.1 for recombinant α-carboxysomes, both less than the 12 pentamers that are postulated to occupy all the vertices of a regular icosahedron (3,7). These results indicate that capping all the vertices with pentamers is not a prerequisite for a functional carboxysome. In support of this, polyhedral carboxysomes and BMC shells deficient in pentamers could still be formed (59,60,63,64). Our previous study has also demonstrated that variable copies of CcmL pentamers are integrated into Syn7942 β-carboxysomes under different growth conditions (42). The lack of pentamers at some vertices might result in observable structural heterogeneity and reduced integrity of the entire α-carboxysomes (Fig. S1D).
Rubisco in carboxysomes was proposed to adopt a Kepler packing, filling maximally 74% of the internal carboxysome volume (54,65). Quantification based upon the CbbL content indicates that the native H. neapolitanus α-carboxysome can accommodate approximately 447 copies of Rubisco (the CbbL/CbbS ratio is 8:7.3), in agreement with the theoretical estimation based on the Kepler packing (411 copies of Rubisco, Table S3). In contrast, recombinant α-carboxysomes encapsulate 426 copies of Rubisco (the CbbL/CbbS ratio is 8:5.7), lower than the estimated copy number of 491 based on the measured recombinant carboxysome size (Table S3). The varying CbbL/CbbS ratios of Rubisco might affect accurate determination of Rubisco content and carboxylation activity within the α-carboxysome and merit further investigation. The recombinant α-carboxysomes have a greater diameter (131.8 nm) and shell/interior ratio (1:1) than those of native α-carboxysomes (124.6 nm and 0.8:1) (Table 2 and Fig. S1D and E). Our results suggest a lower level of Rubisco packing within recombinant α-carboxysomes (Fig. 3), presumably explaining the "darker centers" of recombinant carboxysomes observed in EM (Fig. 1C and Fig. S1D). Moreover, the perturbed formation of Rubisco (L8S8), as indicated by the changes in the CbbL/CbbS ratio, has also been determined in recombinant carboxysomes (Table 2). Our results also showed that the Rubisco/CA (CbbL/CsoSCA) ratios vary drastically between native and recombinant α-carboxysomes (Table 2). It has been postulated that too little or too much carboxysomal CA activity, which could cause limited CO2 supply or substantial leakage of CO2, may interfere with CO2 fixation of carboxysomes (11). What caused the decrease in the CsoSCA content within recombinant α-carboxysomes remains to be investigated. It is possible that csoSCA was not strongly expressed in the E. coli host with its native expression elements, such as the ribosome binding site (RBS) from H. neapolitanus, or that expressed CsoSCA proteins were not efficiently integrated into α-carboxysomes in the nonnative intracellular environment. Modification of CsoSCA expression, such as adjusting the promoter and RBS and optimizing the expression conditions, should be considered in future studies.
[Figure legend, partial] … Table 1. *, P < 0.05; **, P < 0.01; ***, P < 0.001 using two-sample t test, equal variance not assumed (Welch correction). Data are presented as means ± SD from four independent biological replicates. ns, not significant.
[Figure legend, partial] … Table 1); (C) schematic of native and recombinant α-carboxysome structures and shell organization. The numbers of proteins do not represent actual abundances and are only for illustration.
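The native-versus-recombinant comparisons in this study are reported with a two-sample t test in which equal variance is not assumed (Welch correction), on four biological replicates per group. The sketch below shows how the Welch statistic and its Welch-Satterthwaite degrees of freedom are computed; the replicate values are hypothetical, and the conversion of t to a P value (which needs a t-distribution function from a statistics library) is deliberately left out.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumes at least two values per sample and nonzero variance in at
    least one of them.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof

# Hypothetical copy numbers per carboxysome from four replicates each.
native_copies = [108.0, 115.0, 110.0, 114.0]
recombinant_copies = [75.0, 83.0, 80.0, 78.0]

t_stat, dof = welch_t(native_copies, recombinant_copies)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

Unlike the pooled-variance t test, the degrees of freedom here are generally non-integer and shrink when the two groups have unequal variances, which makes the test more conservative for small replicate numbers.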
Other changes that occurred in recombinant carboxysomes involve the increased content of CsoS1 shell proteins, the reduced CsoS1D abundance, as well as the absence of CbbQO (cbbQ and cbbO genes were not included in the expression construct) (Fig. 3). All these structural alterations may collectively result in the higher size variation of recombinant α-carboxysomes (Fig. S1E) and the discrepancy in the carbon fixation performance between native and recombinant α-carboxysomes (Fig. S1C). CsoS2 in α-carboxysomes serves as the scaffolding protein that interlinks Rubisco and shells (11,27–29). The CbbL/CsoS2 ratios in native and recombinant α-carboxysomes remain within a narrow range between 8:1 and 8:1.3 (Table 2), implicating the correlation between Rubisco and CsoS2, which is fundamental for Rubisco condensation and internal packing. Likewise, the CsoS2A/CsoS2B ratio remains relatively unaltered in native (ratio of 1.3:1) and recombinant (ratio of 1.2:1) α-carboxysomes.
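The ratios quoted here can be recomputed directly from the copy numbers given in the Results. The short sketch below does that arithmetic; the dictionary layout and function names are illustrative only, and the CbbL/CsoS2 calculation assumes that each counted Rubisco copy is an L8S8 complex contributing eight CbbL subunits.

```python
# Copy numbers per carboxysome as stated in the Results.
COPIES = {
    "native": {"Rubisco": 447, "CsoS2A": 248, "CsoS2B": 192, "CsoS1B": 112},
    "recombinant": {"Rubisco": 426, "CsoS2A": 305, "CsoS2B": 249, "CsoS1B": 79},
}

def csos2_isoform_ratio(copies):
    """CsoS2A/CsoS2B ratio, expressed as x in x:1."""
    return copies["CsoS2A"] / copies["CsoS2B"]

def csos2_per_8_cbbl(copies):
    """Total CsoS2 copies per 8 CbbL, expressed as x in 8:x.

    Assumes 8 CbbL per Rubisco copy, so the value equals CsoS2 per Rubisco.
    """
    return (copies["CsoS2A"] + copies["CsoS2B"]) / copies["Rubisco"]

def fractional_change(native_value, recombinant_value):
    """Relative change of the recombinant value versus the native value."""
    return (recombinant_value - native_value) / native_value

for name, copies in COPIES.items():
    print(
        f"{name}: CsoS2A/CsoS2B = {csos2_isoform_ratio(copies):.1f}:1, "
        f"CbbL/CsoS2 = 8:{csos2_per_8_cbbl(copies):.1f}"
    )

csos1b_change = fractional_change(
    COPIES["native"]["CsoS1B"], COPIES["recombinant"]["CsoS1B"]
)
print(f"CsoS1B change in recombinant carboxysomes: {csos1b_change:.0%}")
```

Rounded to one decimal place, these values reproduce the 1.3:1 and 1.2:1 isoform ratios and the 8:1 to 8:1.3 range for CbbL/CsoS2, and the CsoS1B change comes out close to the roughly 30% reduction reported.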
Organizational features of diverse carboxysomes. Peptide composition of the α-carboxysomes from the α-cyanobacterium Prochlorococcus marinus MED4 has been estimated based on standard protein gel profiles (22). The H. neapolitanus α-carboxysomes (~125 nm in diameter) are larger in diameter than the Prochlorococcus α-carboxysomes (~90 nm in diameter). Consistently, the H. neapolitanus α-carboxysome has a 1.8-fold-increased content of CsoS1 hexameric shell proteins (975 versus 539 hexamers) and encapsulates double the content of CsoSCA proteins (58 versus 29 dimers) and nearly 3-fold more Rubisco enzymes (447 versus 152 copies). The experimentally determined Rubisco content fits well with the theoretical estimate (411 copies for the H. neapolitanus carboxysome and 143 copies for the Prochlorococcus carboxysome), […] Table 2]), implying that Kepler packing of cargo enzymes is unlikely to be applicable to metabolosomes. The CsoSCA/CsoS1 ratio remains relatively constant in both native α-carboxysomes, presumably implicating their specific association within the carboxysomes. In contrast, CA in the Syn7942 β-carboxysomes, which is encoded by the ccaA gene located distant from the ccm operon, was demonstrated to have varying abundances per carboxysome under different environmental conditions (42). It remains to be investigated whether the CsoSCA content in α-carboxysomes is subject to environmental modulation. A noteworthy feature of the Prochlorococcus α-carboxysome is that it contains only full-length CsoS2, without the short isoform found in the H. neapolitanus counterpart, which might lead to formation of carboxysomes with reduced Rubisco loading capacity and overall size. However, the Rubisco/CsoS2 ratios in the α-carboxysomes from H. neapolitanus and Prochlorococcus remain relatively comparable (1:1 and 1:1.1, respectively), indicative of a general mechanism for Rubisco encapsulation of α-carboxysomes.
In the Syn7942 β-carboxysome, the ratios between Rubisco and the scaffolding protein CcmM varied in a range of 1:0.8 to 1:1.3, depending upon environmental conditions (42). Unlike the similar CsoS2A/CsoS2B ratios in native and recombinant α-carboxysomes, the CcmM35/CcmM58 ratios in the Syn7942 β-carboxysomes have a wide range, 1:1 to 11:1, and have been proved to be vital for carboxysome assembly (65,66).
Carboxysomes are highly modular structures with the capacity of incorporating foreign cargos, representing an ideal system in synthetic biology (30). Advanced knowledge about the precise protein stoichiometry of functional carboxysomes and the approach to determine the stoichiometry of natural and synthetic carboxysomes developed in this study open the door toward reprogramming and compositional refinement of carboxysomes for metabolic enhancement and diverse biotechnological applications in new contexts (36). The QconCAT-based protein quantification technique could also be broadly used in the studies of diverse BMC paralogs and engineering of a variety of protein organelles in their native origins and heterologous organisms.

MATERIALS AND METHODS
Bacterial strains, growth conditions, and carboxysome production. H. neapolitanus (Parker, Kelly and Wood ATCC 23641 C2) used in this work was acquired from the American Type Culture Collection (ATCC) as freeze-dried powder (77,78). Stock cells were maintained in liquid ATCC medium 290 (78) or on ATCC 290 1.5% agar plates. For scale-up cultivation and carboxysome purification, a 5-mL seeding culture was inoculated into 5 L of Vishniac and Santer growth medium (67). Cell growth was maintained in a 5-L fermentor (BioFlo 115; New Brunswick Scientific, USA) at 30°C. The pH of the growth medium was monitored by a pH probe and was maintained at 7.6 by constant supplementation with 3 M KOH. Air supply was set at 500 L·min−1 for initial growth and reduced to 200 L·min−1 24 h prior to harvesting. Agitation was kept at 250 to 300 rpm. The optical density at 600 nm (OD600) of the culture was checked daily, and the cells were harvested before the culture entered the stationary phase. For expression of recombinant carboxysomes, the entire cso operon, as designed on pHnCBS1D reported previously (21), was inserted into a pBAD33 arabinose-inducible expression vector (68) using the Gibson assembly strategy (69) with Gibson assembly master mix from New England BioLabs (NEB). Primer sets used for assembly are listed in Table S5. For recombinant carboxysome expression in E. coli, seeding cultures containing chloramphenicol at a final concentration of 50 µg mL−1 were grown at 37°C in LB broth until reaching an OD600 of 0.6 and then scaled up for induction with 1 mM arabinose at 20°C overnight.
Carboxysome purification from H. neapolitanus and E. coli. Purification of α-carboxysomes from H. neapolitanus was modified from the protocol described previously (70). The 5-L culture harvested from the bioreactor, which contained H. neapolitanus cells and elemental sulfur sediments, was first pelleted at 12,000 × g for 10 min. The pellet containing both cells and elemental sulfur sediments was resuspended in 60 mL of TEMB buffer (10 mM Tris-HCl, 10 mM MgCl2, 20 mM NaHCO3, 1 mM EDTA [pH 8.0]) and subsequently centrifuged at 300 × g for 15 min to sediment elemental sulfur. The supernatant was transferred to a new centrifugation tube, and cells were obtained by another round of centrifugation at 12,000 × g for 10 min. The resulting cell pellet was resuspended in 15 mL of TEMB buffer and incubated with egg lysozyme (at a final concentration of 0.5 mg mL−1) for 1 h at 30°C before cell breakage by prewashed glass beads (150- to 212-µm glass beads, acid washed; Sigma-Aldrich) for 10 min (30-s beating and 30-s interval on ice). The cell extracts were further treated with 33% (vol/vol) B-PER II (Thermo Fisher Scientific, UK) and 0.5% (vol/vol) IGEPAL CA-630 (Sigma-Aldrich) and placed on a rotary mixer for 2 h. The unbroken cells and large membrane debris were removed by centrifugation at 9,000 × g for 10 min. Crude carboxysome enrichment was pelleted at 48,000 × g for 30 min. The pellet was resuspended, briefly centrifuged at 9,000 × g, and then loaded onto a step sucrose gradient (10%, 20%, 30%, 35%, 50%, and 60%) and ultracentrifuged at 105,000 × g for 35 min. The 3 mL of enriched carboxysomes was harvested at the 35% to 50% sucrose gradient fractions. Sucrose was removed by an additional round of ultracentrifugation after dilution with 30 mL of TEMB buffer. The pure carboxysome pellet was resuspended in 800 µL of TEMB buffer. Unless indicated otherwise, all procedures were performed at 4°C.
The carboxysome purification from E. coli was performed according to the previous protocols (21,35,71), with modifications. E. coli cells were lysed with B-PER II bacterial protein extraction reagent (Thermo Fisher Scientific, UK) and treated with 0.5% (vol/vol) IGEPAL CA-630 detergent for 2 h. The following purification steps were the same as for the isolation of native carboxysomes from H. neapolitanus as described above.
SDS-PAGE analysis. SDS-PAGE analysis was performed following standard procedures. Purified carboxysomal samples (10 µg) or whole-cell fractions (100 µg) were loaded per well on 15% polyacrylamide gels and stained with Coomassie brilliant blue G-250 (Thermo Fisher Scientific, UK).
Electron microscopy and data analysis. Electron microscopy was carried out as described previously (37). The purified carboxysomes (~4 mg mL−1) were stained with 3% uranyl acetate on carbon grids and then inspected with an FEI 120 kV Tecnai G2 Spirit BioTWIN transmission electron microscope (TEM) equipped with a Gatan Rio 16 camera. The diameters of carboxysomes were measured with ImageJ as described previously (37) and were statistically analyzed using OriginPro 2020b (OriginLab, MA).
Rubisco activity assays. Carbon fixation assays were carried out to determine the carbon fixation capacities of purified native and recombinant carboxysomes as described previously, using a 3-phosphoglycerate-dependent NADH oxidation-coupled enzyme system (21). For both native and synthetic samples, four biological replicates that were isolated from different culture batches were assayed at 30°C, initiated with final concentrations of 0 mM, 0.06 mM, 0.13 mM, 0.25 mM, 0.5 mM, 1 mM, and 2 mM RuBP. The concentration of HCO3− was set to 24 mM for all assays in this work.
Design, cell-free expression, and purification of QconCAT standard. Absolute quantification of the carboxysomal proteins was carried out by mass spectrometry analysis with stable isotope tryptic peptides as standards. The standard peptides were added in the form of a QconCAT, an artificial protein that is a concatenation of tryptic peptides in the same primary sequence context as the cognate analyte peptides (41,50). For each analyte protein, up to three standard peptides were encoded in the QconCAT (Table S1). All candidate peptides were searched by BLAST against the H. neapolitanus and E. coli proteomes to ensure their uniqueness. Due to the high level of sequence similarity of CsoS1A/B/C, CsoS2A/B, and CsoS4A/B, peptides representing shared sequences and unique sequences were included (Fig. S2). The DNA fragment encoding the above-mentioned peptides, together with sacrificial termini, N-terminal GluFib and cMyc (peptides to quantify the standard), and a C-terminal His6 tag, was created by the ALACAT/Qbrick assembly strategy as reported previously (47). The final DNA sequence (Table S2) was assembled into a pEU-E01 vector for cell-free expression using wheat germ lysate (CellFree Sciences Co., Ltd., Japan).
Synthesis was completed with [ 13 C 6 , 15 N 4 ]arginine and [ 13 C 6 , 15 N 2 ]lysine (CK Isotopes Ltd., UK) using the WEPR8240H full-expression kit following default protocols (2BScientific Ltd., UK). The QconCAT peptide was purified with Ni-Sepharose suspension (GE Healthcare Ltd., UK) in centrifuge filters (Corning Costar Spin-X 0.45-mm-pore-size cellulose acetate membrane; Merck, UK) following standard protocols. Finally, the QconCAT peptide was precipitated and resuspended in 30 mL of 25 mM ammonium bicarbonate with 0.1% (wt/vol) RapiGest SF surfactant (Waters, UK) and protease inhibitors (Roche cOmplete mini-EDTA-free protease inhibitor cocktail; Merck, UK).
Proteomic analysis. For QconCAT quantification, native and synthetic carboxysome preparations (four replicates of each) were diluted to a final protein concentration of 2.5 µg per 40 µL of 25 mM NH₄HCO₃. QconCAT (approximately 5 pmol) and [Glu1]-fibrinopeptide B (5 pmol) were added, and samples were denatured using 2.5 µL of 1% (wt/vol) RapiGest (Waters, Manchester, UK) in 25 mM NH₄HCO₃ followed by incubation at 80°C for 10 min. Samples were reduced by the addition of 2.5 µL of 12 mM dithiothreitol in 25 mM NH₄HCO₃ and incubation at 60°C for 10 min. Alkylation was carried out by the addition of 2.5 µL of 36 mM iodoacetamide in 25 mM NH₄HCO₃ and incubation at room temperature for 30 min in the dark. Trypsin (2.5 µL; 200 ng in 25 mM NH₄HCO₃; enzyme:protein ratio, approximately 1:10) was added to each sample to give a final digest volume of 50 µL. Samples were incubated at 37°C overnight. To remove residual RapiGest, digests were acidified by the addition of 0.5 µL of trifluoroacetic acid (TFA) followed by incubation at 37°C for 45 min. Samples were centrifuged at 17,200 × g for 30 min and transferred to fresh low-bind tubes.
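The volume and ratio bookkeeping for the digest can be checked with a few lines of arithmetic. This is only a sketch: it assumes that the listed reagent volumes are in microlitres and the protein load is 2.5 µg, and that the QconCAT and [Glu1]-fibrinopeptide B spikes contribute negligible volume (the text does not state their volumes). The variable names are illustrative.

```python
# Digest bookkeeping (assumed units: microlitres and nanograms).
sample_ul = 40.0  # diluted carboxysome sample

# Reagent additions listed in the protocol, 2.5 uL each.
additions_ul = {
    "RapiGest": 2.5,
    "dithiothreitol": 2.5,
    "iodoacetamide": 2.5,
    "trypsin": 2.5,
}

# Total volume before acidification with TFA.
total_ul = sample_ul + sum(additions_ul.values())

protein_ng = 2.5 * 1000.0  # 2.5 ug of carboxysome protein
trypsin_ng = 200.0

# Mass of protein digested per unit mass of trypsin.
protein_per_enzyme = protein_ng / trypsin_ng

print(f"digest volume: {total_ul} uL")
print(f"enzyme:protein = 1:{protein_per_enzyme}")
```

Under these assumptions the four additions bring the 40 µL sample to the stated 50 µL digest volume, and the enzyme:protein ratio works out to 1:12.5, which is close to the nominal 1:10.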
LC-MS analyses were conducted on a QExactive HF quadrupole-Orbitrap mass spectrometer coupled to a Dionex Ultimate 3000 rapid-separation liquid chromatography (RSLC) nano-liquid chromatograph (Thermo Fisher Scientific, UK). Sample digest (2 µL) was loaded onto a trapping column (Acclaim PepMap 100 C₁₈, 75 µm by 2 cm, 3-µm packing material, 100 Å) using a loading buffer of 0.1% (vol/vol) TFA-2% (vol/vol) acetonitrile in water for 7 min at a flow rate of 12 µL min⁻¹. The trapping column was then set in-line with an analytical column (EASY-Spray PepMap RSLC C₁₈ […] min⁻¹, followed by washing at 1% A/99% B for 5 min and reequilibration of the column to starting conditions. The column was maintained at 40°C, and the effluent was introduced directly into the integrated nano-electrospray ionization source operating in positive ion mode. The mass spectrometer was operated in MS-only mode, with survey scans from m/z 350 to 2,000 acquired at a mass resolution of 240,000 full width at half maximum (FWHM) at m/z 200. The maximum injection time was 50 ms, and the automatic gain control was set to 3e6. The raw data files were imported into Skyline (53), and quantification was performed by determining the summed peak area of the first three isotopes of each peptide. The quantity of each protein in each biological replicate was determined as the average quantity of the corresponding QconCAT peptides, as shown in Table S1.
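The quantification step can be illustrated with a minimal sketch. It assumes the usual stable-isotope dilution calculation, in which the analyte amount equals the light-to-heavy peak-area ratio multiplied by the known amount of spiked standard; the text does not spell out this formula, and the function names and peak areas below are hypothetical.

```python
from statistics import mean


def summed_area(isotope_areas):
    """Sum the peak areas of the first three isotopes of one peptide."""
    return sum(isotope_areas[:3])


def peptide_amount_pmol(light_isotopes, heavy_isotopes, standard_pmol):
    """Analyte amount from the light/heavy area ratio (assumed calculation)."""
    heavy = summed_area(heavy_isotopes)
    if heavy <= 0:
        raise ValueError("heavy (standard) peak area must be positive")
    return summed_area(light_isotopes) / heavy * standard_pmol


def protein_amount_pmol(peptides, standard_pmol):
    """Average the per-peptide amounts for one protein in one replicate."""
    return mean(
        peptide_amount_pmol(light, heavy, standard_pmol)
        for light, heavy in peptides
    )


# Hypothetical isotope peak areas (light, heavy) for three peptides of one protein.
example_peptides = [
    ([6.0e7, 3.0e7, 1.0e7], [3.0e7, 1.5e7, 0.5e7]),
    ([9.0e7, 4.0e7, 2.0e7], [4.5e7, 2.0e7, 1.0e7]),
    ([4.0e7, 2.0e7, 1.0e7], [2.0e7, 1.0e7, 0.5e7]),
]

# 5 pmol is the approximate QconCAT spike used in the protocol.
amount = protein_amount_pmol(example_peptides, standard_pmol=5.0)
print(f"{amount:.2f} pmol")
```

Because every peptide in a QconCAT is released in equimolar amounts, a single standard amount applies to all peptides of a protein, which is why the per-peptide estimates can simply be averaged.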
Additionally, the four native and four synthetic carboxysome preparations were analyzed by label-free quantification. Carboxysomes were digested as described above but without [Glu1]-fibrinopeptide B and were analyzed by LC-MS/MS as described above but with a data-dependent acquisition method consisting of a full MS scan at 60,000 resolution with automatic gain control (AGC) set to 3e6 ions and a maximum fill time of 100 ms. The 16 most abundant peaks per full scan were selected for higher-energy collisional dissociation (HCD) MS/MS (30,000 resolution, AGC set to 1e5 ions with a maximum fill time of 45 ms) with an ion selection window of 2 m/z and a normalized collision energy of 30%. Ion selection excluded singly charged ions and ions with a charge state equal to or greater than +6. To avoid repeated selection of peptides for fragmentation, the program used a 60-s dynamic exclusion window. The raw data files were imported into Progenesis QI for Proteomics v4. The chromatograms were aligned and normalized prior to label-free quantification. Peptide identification was performed by Mascot (v2.7; Matrix Science, UK) against the UniProt H. neapolitanus database (UP000009102; 2,353 sequences) and E. coli database (UP000000625; 4,438 sequences). A precursor mass tolerance of 10 ppm and a fragment ion mass tolerance of 0.01 Da were applied, with dynamic modifications of [¹³C₆,¹⁵N₂]lysine, [¹³C₆,¹⁵N₄]arginine, and oxidation (M) and with the static modification of carbamidomethylation (C).
For single-carboxysome quantitative normalization, relative quantifications from QconCAT were normalized based on either the 12-pentamer coverage or the hexameric and pentameric protein coverage within a single-layer shell (Table S4). Twelve-pentamer normalization assumed a total of 60 copies of monomeric CsoS4A and CsoS4B per carboxysome (12 vertices × 5 subunits). For shell coverage normalization, the shell surface area was first calculated from the TEM-measured diameter, treating the shell as a regular icosahedron, with the following formulas:
$$A_f = 5\sqrt{3}\,a^{2}, \qquad R_c = \frac{a}{4}\sqrt{10 + 2\sqrt{5}}$$
where $A_f$ is the total surface area, $a$ is the edge length, and $R_c$ is the circumscribed radius (obtained from the measured diameter). The hexamer counts were then calculated using the total surface area and the diameter of CsoS1A hexamers in a layer as reported previously (24).
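Both normalization routes can be sketched in a few lines. The sketch below makes several assumptions that are not stated in the text: the shell is treated as a regular icosahedron with surface area 5√3·a² and circumscribed radius (a/4)·√(10 + 2√5), the circumscribed radius is taken as half the TEM-measured diameter, and the hexamer footprint area is passed in as a parameter rather than taken from reference 24. All names and example values are hypothetical.

```python
import math


def edge_from_diameter(diameter_nm):
    """Icosahedron edge length, assuming R_c is half the measured diameter."""
    r_c = diameter_nm / 2.0
    return 4.0 * r_c / math.sqrt(10.0 + 2.0 * math.sqrt(5.0))


def surface_area(edge_nm):
    """Total surface area of a regular icosahedron (20 equilateral triangles)."""
    return 5.0 * math.sqrt(3.0) * edge_nm ** 2


def hexamer_count(diameter_nm, hexamer_area_nm2):
    """Hexamers needed to tile the shell, ignoring the vertex pentamers."""
    return surface_area(edge_from_diameter(diameter_nm)) / hexamer_area_nm2


def normalize_to_pentamers(amounts, pentamer_keys=("CsoS4A", "CsoS4B"), copies=60):
    """Scale relative amounts so the pentamer subunits sum to 60 per carboxysome."""
    pentamer_total = sum(amounts[key] for key in pentamer_keys)
    if pentamer_total <= 0:
        raise ValueError("pentamer amount must be positive")
    scale = copies / pentamer_total
    return {protein: value * scale for protein, value in amounts.items()}


# Hypothetical relative amounts from one replicate (arbitrary units).
relative = {"CsoS4A": 1.0, "CsoS4B": 2.0, "CbbL": 30.0}
copies = normalize_to_pentamers(relative)
print(copies)

# Hypothetical diameter (nm) and hexamer footprint (nm^2), for illustration only.
print(round(hexamer_count(diameter_nm=120.0, hexamer_area_nm2=40.0)))
```

The two routes rely on different assumptions: the pentamer route presumes that all 12 vertices are occupied, whereas the surface-area route depends on the measured diameter and on the hexamer footprint, so comparing the two gives a useful consistency check.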
Data availability. The entire Skyline project and raw data for QconCAT quantification have been deposited at Panorama Public (72) with the access URL https://panoramaweb.org/Wb6olk.url and the ProteomeXchange identifier (ID) PXD031494. Raw LC-MS/MS data for label-free quantification have been deposited to the ProteomeXchange Consortium via the PRIDE (73) partner repository with the data set identifier PXD031420 (https://www.ebi.ac.uk/pride/archive/projects/PXD031420). All other data are available from the corresponding author upon request.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.
DATA SET S1, XLSX file, 0.1 MB.

We declare no conflict of interest.