The small subunit of Rubisco and its potential as an engineering target

Abstract
Rubisco catalyses the first rate-limiting step in CO2 fixation and is responsible for the vast majority of organic carbon present in the biosphere. The function and regulation of Rubisco remain an important research topic and a longstanding engineering target to enhance the efficiency of photosynthesis for agriculture and green biotechnology. The most abundant form of Rubisco (Form I) consists of eight large and eight small subunits, and is found in all plants, algae, cyanobacteria, and most phototrophic and chemolithoautotrophic proteobacteria. Although the active sites of Rubisco are located on the large subunits, expression of the small subunit regulates the size of the Rubisco pool in plants and can influence the overall catalytic efficiency of the Rubisco complex. The small subunit is now receiving increasing attention as a potential engineering target to improve the performance of Rubisco. Here we review our current understanding of the role of the small subunit and our growing capacity to explore its potential to modulate Rubisco catalysis using engineering biology approaches.


Introduction
Carboxylation by Rubisco has been the dominant biological driving force for inorganic carbon sequestration since early in the history of life on Earth. In almost all prokaryotic and eukaryotic autotrophs, Rubisco catalyses the addition of CO2 to ribulose 1,5-bisphosphate (RuBP) to form two molecules of 3-phosphoglyceric acid (3PGA) as the initial step of inorganic carbon fixation through the Calvin-Benson-Bassham (CBB) cycle (Raven, 2009). However, Rubisco can also catalyse the oxygenation of RuBP to generate 2-phosphoglycolate (2PG), a toxic metabolite that inhibits two enzymes in the CBB cycle (i.e. triose phosphate isomerase and sedoheptulose 1,7-bisphosphate phosphatase) and must be recycled back to 3PGA through the photorespiratory salvage pathway (Fernie and Bauwe, 2020). While photorespiration may play a regulatory role in carbon and nitrogen metabolism (Flügel et al., 2017; Busch et al., 2018; Busch, 2020; Timm, 2020; Shi and Bloom, 2021), it is generally considered an energetically wasteful process that releases previously fixed CO2, reducing the overall efficiency of photosynthesis and crop yield potential (Zhu et al., 2010). As such, Rubisco and the proteins associated with its assembly and regulation (the so-called 'Rubiscosome') have been key targets for crop improvement for several decades (Parry et al., 2013; Erb and Zarzycki, 2018; von Caemmerer, 2020; Taylor et al., 2022).
Form I Rubisco is the dominant form found today and is considered the most abundant enzyme in the living world (Ellis, 1979; Tabita et al., 2007; Bar-On and Milo, 2019; Hayer-Hartl and Hartl, 2020). It is characterized by the presence of eight small subunit (RbcS; ~12-18 kDa) protomers that hold together four large subunit (RbcL; ~50-55 kDa) dimers to form a hexadecameric L8S8 complex (~530-550 kDa) in the shape of a cylinder with a diameter and height of ~110 Å and ~100 Å, respectively (Fig. 1). As each RbcL dimer forms two active sites, RbcL has long been the primary focus of attempts to engineer improvements in Rubisco performance (for reviews, see Carmo-Silva et al., 2015; Sharwood, 2017). In comparison, no RbcS residues interact directly with the active sites, and the RbcS has consequently received less attention. Although the RbcS is considered to play a structural role in stabilizing the L8 complex, experimental evidence has shown that (i) the RbcS is also important for efficient assembly and maximal catalytic activity in all Form I Rubiscos (Morell et al., 1997); (ii) expression of nuclear-encoded rbcS gene(s) (i.e. in most eukaryotes) plays a key role in regulating overall Rubisco levels (Rodermel, 1999; Wostrikoff and Stern, 2007; Wietrzynski et al., 2021); and (iii) the composition of the RbcS peptide can have a significant impact on the catalytic parameters of the Rubisco complex (Box 1). Nevertheless, a comprehensive understanding of how the RbcS influences Rubisco activity remains elusive, and the structural role and functional importance of the RbcS in the evolution of Form I Rubisco are still far from fully understood (Banda et al., 2020).
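As a rough consistency check on the stoichiometry described above, the mass of an L8S8 holoenzyme can be estimated by summing its 16 protomers. The sketch below assumes mid-range subunit masses (52.5 kDa for RbcL and 14.5 kDa for RbcS) picked from the ranges quoted in the text; these are illustrative values, not measurements for any particular species, and the function names are hypothetical.

```python
# Sketch: estimate the mass of a hexadecameric L8S8 Rubisco complex.
# The subunit masses used below are assumed mid-range values taken from the
# ranges quoted in the text (RbcL ~50-55 kDa, RbcS ~12-18 kDa).

N_RBCL = 8  # large subunits (four L2 dimers)
N_RBCS = 8  # small subunits


def l8s8_mass_kda(rbcl_kda: float, rbcs_kda: float) -> float:
    """Return the summed mass (kDa) of eight RbcL and eight RbcS protomers."""
    return N_RBCL * rbcl_kda + N_RBCS * rbcs_kda


def rbcs_mass_fraction(rbcl_kda: float, rbcs_kda: float) -> float:
    """Return the fraction of total complex mass contributed by RbcS."""
    return N_RBCS * rbcs_kda / l8s8_mass_kda(rbcl_kda, rbcs_kda)


if __name__ == "__main__":
    total = l8s8_mass_kda(52.5, 14.5)
    share = rbcs_mass_fraction(52.5, 14.5)
    print(f"L8S8 ~ {total:.0f} kDa; RbcS share ~ {share:.0%}")
```

With these assumed values the estimate (536 kDa) falls within the ~530-550 kDa range given for the holoenzyme, and the eight small subunits account for roughly a fifth of its mass.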
Although Rubisco has been the subject of a considerable number of reviews over the past years, the last dedicated review of the RbcS was published almost 20 years ago (Spreitzer, 2003), before techniques to modify RbcS genes in land plants were well established. Several excellent reviews have since discussed newer aspects of RbcS-related research (e.g. Bracher et al., 2017; Sharwood, 2017), while advances in engineering biology [e.g. clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas)] have now made it possible to edit partial or entire RbcS gene families (Donovan et al., 2020; Martin-Avila et al., 2020; Matsumura et al., 2020). Furthermore, the capacity to express plant Rubiscos in Escherichia coli has opened up new opportunities to screen RbcS families, ancestral forms of RbcS, and synthetic RbcS variants (Aigner et al., 2017; Lin et al., 2020, 2022). Although progress is being made, chloroplast engineering is still only well established in a small number of plant species (Ruf et al., 2019; Yu et al., 2020). Because RbcS is nuclear-encoded in most plants, whereas RbcL is encoded in the chloroplast genome, strategies to engineer RbcS are arguably more advanced and may be easier to implement broadly in crop plants than those for RbcL. Developing our understanding of the extent to which the RbcS could enhance Rubisco performance should be a critical goal in future efforts to engineer improvements in photosynthetic capacity. This review is intended as an update to Spreitzer (2003) and discusses our current understanding of the evolutionary significance of the RbcS; the role of the RbcS in assembly, catalytic properties, and biophysical CO2-concentrating mechanisms; RbcS families in different plant species; and the RbcS as an engineering target to enhance Rubisco performance (Long et al., 2018; Atkinson et al., 2020; Oltrogge et al., 2020).

The origins of Form I Rubisco and the small subunit
There are three known groups of Rubisco found in nature, Forms I, II, and III, which differ in structure and sequence (Tabita et al., 2008), and a fourth group of Rubisco-like proteins (RLPs, or Form IV) that cannot catalyse carboxylation but instead function in various bacterial pathways, including sulfur metabolism and sugar degradation (Hanson and Tabita, 2001; Ashida et al., 2003; Erb et al., 2012; Zhang et al., 2016). The most ancient of the three functional forms of Rubisco (Form III) could have emerged up to 3.5 billion years ago (bya), when the atmosphere was anoxic (Andersson and Backlund, 2008; Iñiguez et al., 2020), and may have evolved from the enolase enzyme family in a non-CO2-fixing archaeal ancestor (Erb and Zarzycki, 2018). Form III Rubiscos are found mainly in anaerobic archaea and, with a few exceptions, are typically associated with nucleotide and nucleoside metabolism rather than CO2 fixation through the CBB cycle (Frolov et al., 2019). The discovery of Form II/III intermediates in the archaeal order Methanosarcinales has suggested that the original functional role of Rubisco may not have been to capture CO2, and that Rubisco-based autotrophy via the CBB cycle evolved either in an ancient archaeon or later, possibly during the transfer of Form III-type Rubisco from archaea to eubacteria and the subsequent evolution of Form II and Form I Rubiscos (Tabita et al., 2007; Wrighton et al., 2016). The CBB cycle is speculated to have arisen subsequently from a primitive carbon metabolic pathway utilizing Rubisco, such as the archaeal reductive hexulose-phosphate pathway (Kono et al., 2017; Erb and Zarzycki, 2018).
All functional Rubiscos share a common core structural component of two RbcL peptides that assemble head to tail into an L2 dimer with two surface-exposed active sites, each located at the interface between the C-terminal domain of one subunit and the N-terminal domain of the other. Form II Rubiscos can consist of one or more L2 dimers, while Form III Rubiscos are assembled in oligomeric arrays of 3-5 dimers (Kitano et al., 2001). As both Form II and Form III Rubiscos lack RbcS, oligomerization must be facilitated by dimer-dimer interactions. For example, a 29-residue C-terminal structural domain in the Form III-type RbcL of the archaeon Methanococcoides burtonii acts as a small subunit 'mimic' that assists in the transition between dimeric and decameric states when substrate (i.e. RuBP) is present (Gunn et al., 2017). Oligomerization probably acts to concentrate the Rubisco active sites and increase carboxylation efficiency, which is also considered one of the key contributions of the RbcS in Form I Rubisco. Based on the available catalytic data, extant Form II and Form III Rubiscos can be fast (i.e. have high kcat^c values), but both are highly sensitive to oxygen (O2), with Sc/o values ranging from 1 to 15 (Badger and Bek, 2008; Liu et al., 2017).
Form I Rubiscos bearing RbcS most probably originated after the transfer of Form III Rubisco to eubacteria but prior to the divergence of proteobacteria and cyanobacteria following the evolution of oxygenic photosynthesis ~2.9 bya in the late Archaean (Nisbet et al., 2007; Tabita et al., 2007). Thus, the atmosphere would still have been rich in CO2 (Zang et al., 2021). Phylogenetic analyses have indicated distinct periods of RbcL amino acid substitutions associated with adaptation to rising O2 stress during the transition from anaerobic Form III Rubisco to aerobic Form I Rubisco, which probably pre-date the acquisition of Form I Rubisco by fully derived cyanobacterial clades (Kacar et al., 2017). Nevertheless, the subsequent global transition from a reducing to an oxidizing atmosphere during the so-called 'Great Oxidation Event' (2.3-2.5 bya) has been attributed to the success of early cyanobacteria that had acquired Form I Rubisco (Flamholz and Shih, 2020).

Box 1. Common catalytic parameters of Rubiscos
Net CO2 fixation by all Rubisco forms is determined by the difference between the rates of carboxylation and oxygenation. These rates are ultimately determined by the maximum rates of carboxylation (Vc) and oxygenation (Vo), the carboxylation turnover rate per active site (kcat^c), the Michaelis-Menten constants for CO2 (Kc and Kc^air, in the absence and presence of O2, respectively) and for O2 (Ko), and the concentrations of CO2 and O2 at the Rubisco active site (Genkov and Spreitzer, 2009; Flamholz et al., 2019). Further parameters include the specificity of Rubisco for CO2 versus O2, often called the specificity factor (Sc/o or Ω), which is the catalytic efficiency of carboxylation relative to that of oxygenation, i.e. Sc/o = (Vc·Ko)/(Vo·Kc) (Parry et al., 1989); the Rubisco carboxylation efficiency (kcat^c/Kc^air) (Orr et al., 2016); and the initial response of the rate of carboxylation to the concentration of CO2 (Vc/Kc^air) (Shih et al., 2016).
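The parameters in Box 1 can be related through the standard Michaelis-Menten description of Rubisco, in which CO2 and O2 compete for the same enzyme-RuBP complex. The Python sketch below is a minimal illustration under that assumption: it computes Kc^air = Kc(1 + [O2]/Ko), the specificity factor Sc/o = (Vc·Ko)/(Vo·Kc), and the carboxylation and oxygenation rates. The class name and all numerical values are hypothetical and chosen only to keep the arithmetic easy to follow; maximal rates are treated as per-site turnover numbers, so that Vc/Kc^air also serves as the carboxylation efficiency kcat^c/Kc^air.

```python
from dataclasses import dataclass


@dataclass
class RubiscoKinetics:
    """Hypothetical kinetic parameters (concentrations in µM, rates per site per second)."""

    vc: float  # maximum carboxylation rate, Vc (treated here as kcat^c)
    vo: float  # maximum oxygenation rate, Vo
    kc: float  # Michaelis-Menten constant for CO2 in the absence of O2, Kc
    ko: float  # Michaelis-Menten constant for O2, Ko

    def kc_air(self, o2: float) -> float:
        """Apparent Km for CO2 in the presence of O2 (competitive inhibition)."""
        return self.kc * (1.0 + o2 / self.ko)

    def specificity(self) -> float:
        """Specificity factor Sc/o = (Vc * Ko) / (Vo * Kc)."""
        return (self.vc * self.ko) / (self.vo * self.kc)

    def carboxylation_efficiency(self, o2: float) -> float:
        """kcat^c / Kc^air at the given O2 concentration."""
        return self.vc / self.kc_air(o2)

    def carboxylation_rate(self, co2: float, o2: float) -> float:
        """Rate of RuBP carboxylation, with O2 as a competitive inhibitor."""
        return self.vc * co2 / (co2 + self.kc * (1.0 + o2 / self.ko))

    def oxygenation_rate(self, co2: float, o2: float) -> float:
        """Rate of RuBP oxygenation, with CO2 as a competitive inhibitor."""
        return self.vo * o2 / (o2 + self.ko * (1.0 + co2 / self.kc))


if __name__ == "__main__":
    rubisco = RubiscoKinetics(vc=3.0, vo=1.5, kc=10.0, ko=500.0)
    co2, o2 = 10.0, 250.0  # assumed active-site concentrations (µM)
    vc = rubisco.carboxylation_rate(co2, o2)
    vo = rubisco.oxygenation_rate(co2, o2)
    print(f"Sc/o = {rubisco.specificity():.0f}")
    print(f"vc/vo = {vc / vo:.2f}")  # equals Sc/o * [CO2]/[O2]
```

Because both rate expressions share the same denominator term, the ratio vc/vo reduces algebraically to Sc/o × [CO2]/[O2] (here 100 × 10/250 = 4). This is why the specificity factor, rather than either maximal rate alone, sets the balance between carboxylation and oxygenation at a given gas ratio.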

The discovery of Form Iʹ Rubisco
The emergence of the RbcS is still an area of ongoing research and debate (Shih et al., 2016). Recent work has uncovered a new clade of Form I-like, or Form Iʹ, Rubiscos in the non-phototrophic Anaerolineales order of the diverse Chloroflexi phylum. These Rubiscos oligomerize into an octameric L8 complex, akin to the hexadecameric (L8S8) complex, but lack RbcS peptides (Fig. 1) (Banda et al., 2020). This intriguing study highlights that RbcL can form octamers in the absence of RbcS via a fortified network of additional hydrogen bonds and salt bridges at the interface between L2 dimers. Although Form Iʹ may have evolved from a Form I Rubisco that subsequently lost its RbcS, the most parsimonious inference is that Form I and Form Iʹ both diverged from a common ancestor that lacked RbcS and pre-dated cyanobacteria. Extant Form Iʹ Rubisco from the mesophilic Chloroflexi species 'Candidatus Promineofilum breve' has a low Sc/o value (36) compared with its close cyanobacterial relative Synechococcus sp. PCC 6301 (56), which has a typical RbcS-bearing Form I Rubisco. This suggests that the L8 Form Iʹ, like Forms II and III, may have a limited capacity to evolve a higher specificity for CO2 to adapt to environments with elevated O2. However, the existence of a stable L8 core could have presented an opportunistic platform for the docking of ancestral RbcS-type peptides. Overall, the RbcS seems to have appeared as an evolutionary response to increasing O2 that afforded Form I Rubiscos a greater capacity to diversify their catalytic range (e.g. in terms of kcat^c and Sc/o), while providing a scaffolding function to concentrate eight active sites (Spreitzer, 2003; Erb and Zarzycki, 2018). Modelling studies have also indicated that residues on the RbcS may bind CO2 and potentially act as a 'reservoir' to increase local CO2 availability, but this has yet to be verified experimentally (Van Lun et al., 2014).
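To see what the difference between Sc/o values of 36 and 56 means in practice, recall that the ratio of carboxylation to oxygenation is Sc/o × [CO2]/[O2]. The sketch below applies this relationship to both enzymes at a single assumed pair of dissolved gas concentrations (10 µM CO2 and 250 µM O2); these concentrations are illustrative assumptions, not values reported for either organism, and the helper names are hypothetical.

```python
# Sketch: compare how often RuBP is oxygenated by enzymes with different Sc/o.
# vc/vo = Sc/o * [CO2] / [O2]; the gas concentrations are assumed values.

CO2_UM = 10.0   # assumed dissolved CO2 (µM)
O2_UM = 250.0   # assumed dissolved O2 (µM)

SPECIFICITY = {
    "Form I' (Ca. Promineofilum breve)": 36.0,
    "Form I (Synechococcus sp. PCC 6301)": 56.0,
}


def carboxylations_per_oxygenation(s_co: float, co2: float, o2: float) -> float:
    """Return vc/vo for a given specificity factor and gas concentrations."""
    return s_co * co2 / o2


def oxygenated_fraction(s_co: float, co2: float, o2: float) -> float:
    """Return the fraction of RuBP turnover events that are oxygenations."""
    ratio = carboxylations_per_oxygenation(s_co, co2, o2)
    return 1.0 / (1.0 + ratio)


if __name__ == "__main__":
    for name, s_co in SPECIFICITY.items():
        ratio = carboxylations_per_oxygenation(s_co, CO2_UM, O2_UM)
        frac = oxygenated_fraction(s_co, CO2_UM, O2_UM)
        print(f"{name}: vc/vo = {ratio:.2f}, oxygenation share = {frac:.0%}")
```

Under these assumptions, the Form Iʹ enzyme would oxygenate roughly 41% of the RuBP it turns over, compared with about 31% for the cyanobacterial Form I enzyme. The absolute numbers depend entirely on the assumed gas concentrations, but the ranking does not.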
It remains unclear what kind of ancestral protein RbcS might have evolved from. The apparent sequence similarity between RbcS and CcmM35, a linker protein involved in Rubisco compartmentalization in cyanobacterial β-carboxysomes, has previously led to speculation that RbcS emerged from an early carboxysome-like bacterial microcompartment (Spreitzer, 2003). However, carboxysomes emerged a considerable time after Form I Rubisco evolved (Iñiguez et al., 2020), and probably after the primary endosymbiotic event, as eukaryotic autotrophs do not appear to possess genes that code for carboxysome-like proteins (Shih et al., 2016; Price et al., 2019). More recent sequence analyses have indicated that the resemblance of CcmM35 to RbcS reflects small, coincidental local similarities rather than a close evolutionary relationship (Ryan et al., 2019). CcmM35 also lacks key motifs used by RbcS to bind with RbcL. Cryo-EM analysis has confirmed that the RbcS-like domains of CcmM35 bind to Rubisco in a groove between two RbcL subunits and the adjacent RbcS, and do not use the RbcS-binding region or displace RbcS (Wang et al., 2019; Zang et al., 2021). Thus, it is likely that the RbcS has a different and possibly more ancient origin.

RbcS diversity, structure, and interactions
Form I Rubiscos show considerable diversification and are divided into four subgroups: the 'green-type' Forms IA and IB and the 'red-type' Forms IC and ID (Fig. 2A) (Tabita et al., 2008). Form IA is found in proteobacteria and cyanobacteria, and is further subdivided into Forms IAc and IAq based on differences in RbcS sequence and gene arrangement (Badger and Bek, 2008). The RbcL and RbcS genes for Form IAc Rubisco are found near an α-carboxysome operon, while Form IAq Rubisco genes are associated with the putative Rubisco chaperone gene cbbQ and additional genes for Form II Rubisco, and lack an α-carboxysome operon. Form IB is the largest group and includes proteobacteria, cyanobacteria, green algae, and higher plants. It is subdivided into Forms IB and IBc, the latter denoting the Rubiscos of cyanobacteria associated with β-carboxysomes (Badger et al., 2002). Form IC occurs in proteobacteria and the Chloroflexi, while Form ID is found in proteobacteria and non-green algae, such as the Rhodophyta and Haptophyta. In prokaryotes, the RbcL and RbcS genes of Form I Rubisco are organized in an operon. In most eukaryotes, RbcL is encoded in the chloroplast genome, whereas RbcS has been transferred to the nuclear genome and has proliferated into an RbcS gene family; exceptions include the Rhodophyta and some Chromophyta, in which RbcS remains plastid-encoded (Valentin and Zetsche, 1989; Kostrzewa et al., 1990).
The amino acid sequences of RbcL in Form I Rubiscos are relatively well conserved. For example, the RbcL sequences of Form IB Rubiscos in plants are generally ~90% identical (Liu et al., 2017). In contrast, RbcS sequences are much more diverse, with similarity between species as low as ~30%. The core structure of each RbcL subunit comprises a short N-terminal domain consisting of a four-stranded β-sheet and two α-helices, and a longer C-terminal domain that forms an eight-stranded β/α-barrel (Whitney et al., 2011; Valegård et al., 2018). The conserved catalytic residues reside within the β/α-barrel and form an active site together with residues from the N-terminal domain of the adjacent RbcL in each L2 dimer.
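Percent identity figures such as those quoted above are computed from a pairwise sequence alignment. The sketch below shows one common convention, counting identical residues over alignment columns in which neither sequence has a gap; other conventions (e.g. dividing by the full alignment length) give different numbers. Identity is also stricter than similarity, which additionally credits conservative substitutions using a substitution matrix that is not modelled here. The function name and the toy sequences are hypothetical and do not represent real RbcL or RbcS sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences.

    Columns where either sequence carries a gap are skipped; the remaining
    columns form the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # ignore columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no gap-free columns")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy, hypothetical aligned fragments (not real Rubisco sequences).
    print(f"{percent_identity('MKV-LTE', 'MRVQLSE'):.1f}%")
```

In the toy example, four of the six gap-free columns match, giving 66.7% identity; the column containing a gap is excluded from both numerator and denominator.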
All RbcS also fold into a common core structure of a four-stranded antiparallel β-sheet, consisting of β-strands A-D, which is covered on one side by two α-helices (i.e. α-helices A and B) (Fig. 2B) (Knight et al., 1990). The differences between RbcS isoforms in Forms IA-D Rubiscos include the length of the loop between β-strands A and B (i.e. the βA-βB loop) and the length of the C-terminus, which in red-type Rubiscos contains an additional β-hairpin consisting of β-strands E and F (i.e. the βE-βF hairpin) (Spreitzer, 2003; Andersson and Backlund, 2008). Furthermore, red-type Rubiscos can have a slightly longer loop between β-strands C and D (i.e. the βC-βD loop).

The variability of the βA-βB loop
The βA-βB loop of each RbcS faces inwards towards the central solvent channel, or pore, formed by the (L2)4 assembly, while the two α-helices are solvent exposed. The length of the βA-βB loop is the most variable structural feature of the RbcS and is thought to regulate the width of the aperture of the solvent channel (Spreitzer, 2003). Prokaryotes and non-green algae are characterized by a short loop of only ~10 residues, plants have an average of 22 residues, and green algae tend to have longer βA-βB loops ranging from 20 to 31 residues. The length of the βA-βB loop has been used as a phylogenetic marker, for example to distinguish between Chlorophyte and Streptophyte algae (Goudet et al., 2020). In red-type Rubiscos, whose RbcS have shorter βA-βB loops, the C-terminal βE-βF hairpins of the four RbcS at each end of the Rubisco complex come together to form a central β-barrel around the entrance of the solvent channel (Andersson and Backlund, 2008). This β-barrel fills the space occupied by the βA-βB loops in green-type Rubiscos, although the βA-βB loops of neighbouring RbcS in green-type Rubiscos contact one another less extensively.

Interactions between RbcS and RbcL
Each RbcS sits in the groove between two L2 dimers at the top and bottom of the L8S8 complex and makes polar contacts with three RbcL and two neighbouring RbcS subunits (Fig. 3A, B; Table 1). However, the majority of polar interactions for each RbcS occur with the β/α-barrel domains of two RbcL subunits, with the RbcS N-terminal region forming extensive interactions with one RbcL subunit that account for at least two-thirds of the overall interaction energy (Knight et al., 1990; Ryan et al., 2019). The second RbcL subunit interacts primarily with the βA-βB loop region of the RbcS. There are relatively few contacts with other subunits. For example, in spinach (Spinacia oleracea) Rubisco, two residues in the RbcS βC-βD loop interact with one of the RbcL N-terminal domains in a neighbouring L2 dimer, while L7 in the N-terminal region and T46 in the βA-βB loop interact with an adjacent RbcS.

The role of RbcS in biophysical CO2-concentrating mechanisms
RbcS is also an important component in the assembly of Rubisco-containing micro-compartments associated with biophysical CO2-concentrating mechanisms (CCMs), namely the carboxysomes in cyanobacteria and the pyrenoids in algae and hornworts (see recent reviews by Hennacy and Jonikas, 2020; Barrett et al., 2021; Borden and Savage, 2021). In both α- and β-carboxysomes, condensation of Rubisco by liquid-liquid phase separation appears to play a key part in carboxysome biogenesis (Wang et al., 2019; Oltrogge et al., 2020; Zang et al., 2021). Condensation is mediated by low-affinity, multivalent interactions between Rubisco and the linker proteins CsoS2 and CcmM35 in α- and β-carboxysomes, respectively. Both proteins interact with residues on RbcL and RbcS, either through intrinsically disordered linear motifs (CsoS2) or through folded domains (CcmM35) (Fig. 3C; Table 1). Similarly, condensation of the Rubisco matrix within the pyrenoid of the green alga Chlamydomonas reinhardtii (hereafter Chlamydomonas) is mediated by the linker protein EPYC1 via five conserved motifs that interact exclusively with the α-helices of the RbcS (Mackinder et al., 2016; Atkinson et al., 2019; He et al., 2020). This Rubisco-binding motif is a common feature of many other proteins found within the Rubisco matrix, which indicates that the RbcS plays a key role in mediating pyrenoid assembly in Chlamydomonas. Nevertheless, the sequences of the RbcS α-helices differ significantly between species with pyrenoids (Goudet et al., 2020), and EPYC1 is not broadly conserved. This suggests that the peptide sequences required for interactions between Rubisco and linker proteins are highly variable, which may reflect the wide diversity of pyrenoid architectures and the apparent independent evolution of pyrenoids on several occasions (Villarreal and Renner, 2012; Barrett et al., 2021).

Putative RbcS chaperones and the phenomenon of RbcS homogeneity in Rubisco
Our present understanding of the ancillary chaperones required for RbcL folding and Form I Rubisco assembly in prokaryotes and eukaryotes has been reviewed in detail recently (see Bracher et al., 2017; Hayer-Hartl and Hartl, 2020). In prokaryotes, current models show that the (L2)4 assembly is held in place by the chaperones RbcX and RAF1, which are then displaced by RbcS to form the L8S8 complex (Xia et al., 2020). In eukaryotes, an additional step is involved, in which the stromal protein BSD2 displaces RbcX and RAF1 and is in turn displaced by RbcS (Aigner et al., 2017). The mechanisms regulating RbcS binding and chaperone displacement are not yet fully understood. Unlike RbcL, RbcS is able to fold spontaneously and remains monomeric when expressed heterologously in E. coli (Andrews and Ballment, 1983; Andrews, 1988), but RbcS folding in vivo might require chaperones. In eukaryotes, RbcS may require assistance with folding following import and maturation in the chloroplast, possibly from the stromal chaperone Hsp70 (Wilson et al., 2019). In some plant species, the N-terminal methionine of the mature RbcS peptide is methylated, although the functional significance of this modification remains unclear (Ying et al., 1999). Furthermore, evidence in maize (Zea mays) and Arabidopsis thaliana (hereafter Arabidopsis) suggests that RbcS might form transient complexes with RAF1, RAF2, and BSD2, which in turn facilitate docking of RbcS with the (L2)4 assembly (Feiz et al., 2014; Fristedt et al., 2018). Notably, the genomes of organisms with red-type Rubisco lack several assembly chaperones, such as RbcX and RAF1, and instead rely on the plastid-encoded RbcS for assembly, with the C-terminal β-hairpin extension of the red-type RbcS playing a critical role (Joshi et al., 2015). It is possible that the red-type RbcS more closely represents the ancestral form of RbcS prior to the evolution of additional assembly chaperones.

Fig. 3 (partial caption; sources cited: Wang et al., 2019; He et al., 2020; Oltrogge et al., 2020). Inset views highlight the interfaces of EPYC1, CsoS2, and CcmM35 with Rubisco subunits. Interacting residues are listed in Table 1.

Table 1 (partial caption). Interactions between the linker proteins and RbcS or RbcL include salt bridge, hydrogen bond, and cation-π interactions and, for EPYC1, residues involved in the formation of a hydrophobic pocket. Amino acids and structural locations shown in bold represent the focal RbcS (in red in Fig. 3A); those not in bold represent the adjacent subunit. See Fig. 3 for matching illustrations and additional information.
In eukaryotes with green-type Rubiscos, the composition of RbcS isoforms in each Rubisco complex in vivo is still unclear (Yamada et al., 2019). One intriguing phenomenon, based on current empirical evidence, is that eukaryotic Rubiscos may assemble with only one RbcS isoform per L8S8 complex, despite the presence of a family of different nuclear-encoded RbcS isoforms in most species. Multiple crystal structures of plant Rubiscos contain eight identical RbcS isoforms, albeit with a few potential exceptions (Shibata et al., 1996; Loewen et al., 2013). More recently, Valegård et al. (2018) described a crystal structure of Arabidopsis Rubisco that was homogeneous for the low-abundance isoform RbcS1B, which represents only 3-5% of the total RbcS pool (Khumsupan et al., 2020). Structural characterization of Rubisco using alternative approaches (e.g. cryo-EM) should help to establish whether such findings are artefacts of the crystallization process or biologically relevant. If the Rubisco assembly process does favour one RbcS isoform per L8S8 complex, this raises interesting questions about the mechanisms involved. One possibility is that binding of the first monomeric RbcS protomer modifies the structural conformation of the (L2)4 assembly to strongly favour the subsequent addition of the same isoform. Although hypothetical, such a scenario could offer an efficient means of regulating the catalytic properties of the Rubisco pool through the expression of different RbcS isoforms.
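A simple null model helps to show why crystals homogeneous for a minor isoform are striking. If each of the eight RbcS sites in an L8S8 complex were filled independently, in proportion to each isoform's share of the RbcS pool, the probability that all eight sites carry the same isoform would be that share raised to the eighth power. The sketch below computes this under the stated assumption of independent, random incorporation, which is exactly the assumption that the cooperative-binding hypothesis above would violate. The function names are hypothetical; only the 3-5% share for RbcS1B comes from the text.

```python
def p_homogeneous(isoform_share: float, n_sites: int = 8) -> float:
    """Probability that all n_sites RbcS positions carry one given isoform.

    Assumes each site is filled independently, with probability equal to the
    isoform's share of the RbcS pool (a null model of random assembly).
    """
    if not 0.0 <= isoform_share <= 1.0:
        raise ValueError("isoform_share must lie between 0 and 1")
    return isoform_share ** n_sites


def p_any_homogeneous(shares: list[float], n_sites: int = 8) -> float:
    """Probability that a complex is homogeneous for any isoform in the pool."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("isoform shares must sum to 1")
    return sum(p_homogeneous(s, n_sites) for s in shares)


if __name__ == "__main__":
    # RbcS1B share of the Arabidopsis RbcS pool, as quoted in the text (~3-5%).
    for share in (0.03, 0.05):
        print(f"share {share:.0%}: P(all-RbcS1B complex) = {p_homogeneous(share):.1e}")
```

Under random assembly, fewer than one complex in ten billion would be homogeneous for RbcS1B, so crystals of such complexes would point to strongly non-random assembly, selective crystallization, or both. Distinguishing between these is what the alternative structural approaches mentioned above are expected to do.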

Influence of the RbcS on the assembly and catalytic properties of Rubisco
The efficiency of CO2 assimilation by Rubisco appears to be constrained by several catalytic trade-offs. Mechanistic models have proposed an inverse relationship between kcat^c and Sc/o, as well as a coupling whereby any improvement in carboxylation efficiency also improves oxygenation efficiency (for a more detailed review, see Flamholz et al., 2019). Several authors have argued that Form I Rubiscos may now be optimized to their existing environments, having reached a 'Pareto optimality' of activity and specificity in which neither parameter can be improved further without a cost to the other, and hence to overall fitness (Tcherkez et al., 2006; Savir et al., 2010; Erb and Zarzycki, 2018). Nevertheless, new lines of evidence suggest that Rubisco may have more room to manoeuvre than previously thought. Firstly, earlier arguments were based on data from a relatively small pool of organisms. Expanded analyses have indicated that Rubisco has a greater degree of catalytic variability than expected and that the apparent catalytic trade-offs observed for Form I Rubiscos are weaker when larger groups are considered (Young et al., 2016; Flamholz et al., 2019). Bouvier et al. (2021) have also argued that the evolution of Rubisco in plants has been limited more by phylogenetic constraints than by catalytic trade-offs. Such constraints may include the necessity for Rubisco to exhibit high levels of expression and protein stability, and a requirement to maintain complementarity with chaperones involved in assembly and regulation (i.e. the native Rubiscosome). Secondly, engineering efforts have screened predicted ancestral Rubiscos and produced Rubisco variants that deviate from the 'canonical' catalytic trade-offs between kcat^c, Kc, and Sc/o (Wilson et al., 2018; Zhou and Whitney, 2019; Martin-Avila et al., 2020; Lin et al., 2022).
Together, these studies suggest a wider scope for engineering improvements in the catalytic parameters of Rubisco, and that furthering our understanding of the underlying structural basis that confers the catalytic properties of Rubisco is still critically important (Valegård et al., 2018).
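The notion of Pareto optimality invoked above can be made concrete: among a set of enzymes, a variant is Pareto-optimal if no other variant is at least as good in both kcat^c and Sc/o and strictly better in one. The sketch below filters a set of variants down to its Pareto front. The variant names and parameter values are hypothetical placeholders rather than measured Rubisco data, and treating both parameters as 'higher is better' is a simplifying assumption.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    kcat_c: float  # carboxylation turnover per active site (s^-1)
    s_co: float    # CO2/O2 specificity factor


def dominates(a: Variant, b: Variant) -> bool:
    """True if `a` is no worse than `b` in both traits and better in at least one."""
    no_worse = a.kcat_c >= b.kcat_c and a.s_co >= b.s_co
    better = a.kcat_c > b.kcat_c or a.s_co > b.s_co
    return no_worse and better


def pareto_front(variants: list[Variant]) -> list[Variant]:
    """Return the variants that are not dominated by any other variant."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


if __name__ == "__main__":
    # Hypothetical variants, for illustration only.
    variants = [
        Variant("A", 2.0, 100.0),
        Variant("B", 3.0, 90.0),
        Variant("C", 4.0, 70.0),
        Variant("D", 2.5, 80.0),  # dominated by B
    ]
    print([v.name for v in pareto_front(variants)])
```

In this framing, a variant that deviates from the canonical trade-off is one that lies beyond the previously observed front, dominating at least one of its members.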
In the context of the RbcS, it has been clear for decades that it influences assembly and the catalytic performance of the Form I Rubisco complex, and earlier work has been well reviewed (e.g. Spreitzer, 2003). For example, even in rare cases in cyanobacteria where RbcL can assemble in the absence of RbcS, the L8 complex exhibited only 0.15% of the carboxylase activity of the L8S8 complex (Lee and Tabita, 1990), while kcat^c was reduced to 0.6-1% of wild-type values (Andrews, 1988; Gutteridge, 1991). More recent studies can be divided into those that have examined the impact of native RbcS mutations on Rubisco performance, and those that have attempted to assemble heterologous RbcS with native RbcL to produce a hybrid Rubisco complex. Several of these modifications have resulted in significant changes to the catalytic parameters and/or content of Rubisco (Table 2). Chlamydomonas has been used extensively to examine the importance of different residues and features of the RbcS, in particular the variable βA-βB loop region. Single-residue substitutions in the βA-βB loop resulted in significant changes in the catalytic efficiency and specificity of Rubisco (Khrebtukova and Spreitzer, 1996; Spreitzer et al., 2001). Furthermore, replacing the native βA-βB loop with a shorter variant from spinach or the model cyanobacterium Synechococcus elongatus PCC 7942 showed that the length of the βA-βB loop is not critical for Form IB Rubisco assembly (Karkehabadi et al., 2005). Both heterologous βA-βB loops caused similar reductions in the carboxylation rate of Rubisco, while the cyanobacterial sequence also reduced Sc/o. These and further studies have demonstrated that changes to the βA-βB loop can affect the performance of the Rubisco active sites even though the loop is structurally remote (i.e. >16 Å away), possibly through long-range interactions with RbcL in the solvent pore (Spreitzer et al., 2005; Esquivel et al., 2013). Apart from the βA-βB loop, several highly conserved RbcS residues that contact RbcL near the β/α-barrel through van der Waals or salt bridge interactions have been shown to be important (Genkov and Spreitzer, 2009; Meyer et al., 2012). For example, the substitution L18A in the RbcS N-terminal region can influence Rubisco stability, while Y32A in αA and E43A in βA affect catalytic performance.
More recent vascular plant-based studies have focused on engineering entire heterologous small subunits into host organisms to generate hybrid Rubisco complexes (Table 2). Introduction of an RbcS from the C4 plant Sorghum bicolor into rice (Oryza sativa) resulted in a hybrid Rubisco with more C4-like characteristics, including an increased kcat^c and decreased Sc/o (Ishikawa et al., 2011; Matsumura et al., 2020). Furthermore, transformation of three potato (Solanum tuberosum) RbcS isoforms into the chloroplast genome of tobacco (Nicotiana tabacum) produced hybrid Rubiscos with the native NtRbcL that had significant differences in catalysis, including 13% and 8% increases in kcat^c and Sc/o, respectively, for StRbcS3 (Martin-Avila et al., 2020). Notably, StRbcS3 differs from the two other StRbcS isoforms tested by two amino acid residues at the apex of the βA-βB loop. These results highlight the potential of modified or heterologous RbcS isoforms to enhance Rubisco performance in different crop species.
Introducing different plant RbcS variants into Chlamydomonas not only increased Sc/o values towards those of higher plants, but also revealed the critical role of the algal RbcS α-helices in pyrenoid assembly (Genkov et al., 2010; Meyer et al., 2012). In comparison, expression of Chlamydomonas CrRbcS2 in the Arabidopsis RbcS-deficient double mutant 1a3b (i.e. lacking expression of AtRbcS1A and AtRbcS3B) produced a hybrid Rubisco pool with reduced Sc/o values. Given the evolutionary distance between plants and algae, this raises interesting questions about how broadly compatible different RbcS isoforms are for Rubisco assembly and stability in non-native species.

RbcS regulates the content and can influence the catalytic characteristics of Rubisco in eukaryotes
In eukaryotes that possess Form IB Rubisco, RbcS is encoded by a family of nuclear genes that varies in number between species. Some species have a small RbcS gene family, such as Chlamydomonas with two isoforms, while others have much larger gene families, such as tobacco and wheat (Triticum aestivum) with 13 and 25, respectively (Table 3) (Donovan et al., 2020; Caruana et al., 2022). As the RbcS gene is no longer located with RbcL in a chloroplast operon, further regulatory processes have evolved to coordinate the efficient production of both subunits in a 1:1 stoichiometric ratio. Several studies in algae, C3 plants, and C4 plants have shown that RbcL synthesis in the chloroplast depends on the presence of RbcS, and that knockdown of RbcS expression reduces the rate of RbcL synthesis (Khrebtukova and Spreitzer, 1996; Rodermel et al., 1996; Wostrikoff and Stern, 2007; Wostrikoff et al., 2012; Khumsupan et al., 2020). RbcL synthesis appears to be regulated by a process known as control by epistasis of synthesis, in which, in the absence of RbcS, the L8-RAF1 assembly intermediate (or possibly L8-BSD2 in species with BSD2) acts as a repressor of RbcL translation (Wietrzynski et al., 2021). Thus, RbcS expression plays a key role in regulating the size of the Rubisco pool.
Recent work has shown that increasing native RbcS expression can improve growth and yield. In rice, overexpression of OsRbcS2 increased dry weight and grain yield by 23% and 28%, respectively, compared with wild-type plants (Table 2) (Yoon et al., 2020). In maize, overexpression of ZmRbcS alone had no effect on Rubisco levels (Salesse-Smith et al., 2018). However, co-expression of ZmRbcS with RAF1 increased both Rubisco content and dry weight by 30%, suggesting that RAF1 may co-regulate Rubisco levels and activity, at least in C4 plants.

The role(s) of RbcS gene families
The expansion of gene families is driven by various gene duplication processes, at both the single-gene and whole-genome level. For example, gene duplication can occur through tandem duplication, which places the gene copies close together on the same chromosome in a tandem array (Freeling, 2009).
Tandem arrays of RbcS genes are present in many organisms, including CrRbcS1 and CrRbcS2 in Chlamydomonas (Goldschmidt-Clermont and Rahire, 1986), AtRbcS1B-AtRbcS3B in Arabidopsis (Krebbers et al., 1988), OsRbcS2-OsRbcS5 in rice (Morita et al., 2014), five of the six RbcS genes in the facultative Crassulacean acid metabolism (CAM) plant Mesembryanthemum crystallinum (Derocher et al., 1993), six of the eight RbcS genes in petunia (Petunia sp. hybrid Mitchell strain) (Dean et al., 1985), and the entire RbcS gene family in pea (Pisum sativum) (Polans et al., 1985). Transposon activity, which can move gene copies between chromosomes, is a further mechanism of gene duplication and may account for the chromosomally isolated RbcS genes in Arabidopsis (i.e. AtRbcS1A) and petunia. At the whole-genome level, duplication can occur through polyploidization (Van De Peer et al., 2017). In wheat, for example, two allopolyploidization events have brought together three distinct diploid subgenomes, named A, B, and D, which contribute RbcS gene families of nine, eight, and eight genes, respectively (Table 3) (Caruana et al., 2022).
Despite these duplication events, RbcS gene families generally show little sequence divergence. For example, four of the five rice RbcS genes, OsRbcS2-OsRbcS5, encode the same mature RbcS peptide (Suzuki et al., 2007). Similarly, tomato (Solanum lycopersicum) and Arabidopsis each have three nearly identical RbcS genes arranged in a tandem array (Wanner and Gruissem, 1991). Given this lack of sequence divergence, the evolutionary benefit of retaining multiple members within RbcS gene families is unclear and remains an active area of research. One potential explanation is that multiple RbcS gene copies ensure a sufficient supply of RbcS peptides to meet the demands of Rubisco production (Khrebtukova and Spreitzer, 1996). Retaining an RbcS gene family could also help plants to respond more dynamically to a wider range of environmental and developmental stimuli. In line with the latter argument, the promoter elements regulating the expression of individual RbcS genes within a family appear diverse, indicating a high degree of subfunctionalization of expression (Sugita et al., 1987; Dedonder et al., 1993; Cheng et al., 1998; Yoon et al., 2001; Sawchuk et al., 2008). Furthermore, differential expression of RbcS genes across organs and environmental conditions has been reported in several plant species, including Arabidopsis, tomato, tobacco, rice, and maize (Ewing et al., 1998; Narsai et al., 2009; Zhang et al., 2012; Laterre et al., 2017; Buti et al., 2018).
An alternative hypothesis is that RbcS isoforms have evolved specific functions that affect the catalytic properties of Rubisco. The apparent lack of sequence diversity within RbcS families argues against this hypothesis, as the small differences in peptide sequence observed in current data are unlikely to have a significant functional impact on the catalytic performance of Rubisco in photosynthetic tissues (Yamada et al., 2019). Nevertheless, a broader analysis of RbcS families may reveal further complexity. For example, Sharwood et al. (2016) noted that the C4 Paniceae species Megathyrsus maximus and Panicum monticola share identical RbcL peptide sequences but differ significantly in several catalytic parameters, suggesting that RbcS may influence Rubisco catalysis. Furthermore, E. coli-based characterizations of reconstructed ancestral Rubiscos within the Solanaceae (which include tobacco and potato) have shown that relatively small differences in RbcS sequence can modify catalysis (Lin et al., 2022). Notably, the relatively large number of substitutions found in the ancestral RbcS sequences suggested that the recent evolution of C3 Rubiscos may have been driven more by changes in RbcS than in RbcL.

RbcS isoforms in the T-type cluster produce Rubisco with different catalytic properties
A new cluster of 'T-type' RbcS isoforms, phylogenetically distinct from the RbcS isoforms expressed in leaves (i.e. M-type RbcS), was recently discovered (Laterre et al., 2017). Initially identified in tobacco trichomes (hence 'T'), T-type RbcS appear to have a tertiary structure similar to that of M-type RbcS (Fig. 2B), but differ significantly in sequence and expression pattern (Table 3). T-type RbcS are present in many plant species, including pteridophytes and bryophytes, but are absent from several dicots and monocots, indicating that T-type RbcS genes have been lost independently during evolution (Pottier et al., 2018). Current evidence suggests that T-type RbcS supports Rubisco expression for non-photosynthetic processes, such as CO2 recycling during secondary metabolism (e.g. oil or fatty acid biosynthesis). Notably, Rubisco complexes assembled with T-type RbcS can have significantly different catalytic properties from those assembled with M-type RbcS (Table 2). For example, Chlamydomonas Rubisco had higher Vc and Kc values when assembled with the tobacco T-type RbcS than with an M-type NtRbcS (Laterre et al., 2017). Furthermore, Rubisco carrying the T-type RbcS showed higher carboxylation rates at pH values below 8. Laterre et al. (2017) hypothesized that these changes may be a functional adaptation allowing Rubisco to operate in a more acidic, CO2-rich environment, as might be found in specialized secretory cells, compared with the more alkaline chloroplast stroma of mesophyll cells. More recently, assembly of the tobacco RbcL with different members of the RbcS family in E. coli has confirmed that the T-type RbcS increases the kcatc and Kc values of tobacco Rubisco.
Similar changes in Rubisco catalytic performance were observed in transgenic rice plants overexpressing the rice T-type RbcS, and in transgenic tobacco plants expressing the potato RbcL and potato T-type RbcS (Morita et al., 2014;Martin-Avila et al., 2020).
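To make the CO2-rich-environment argument above more concrete, the sketch below uses the standard Michaelis-Menten carboxylation rate with O2 as a competitive inhibitor (the rate law used in Farquhar-type photosynthesis models), not any method from the studies cited here. The class `RubiscoKinetics`, the function `carboxylation_rate`, and all parameter values are hypothetical and in arbitrary units; they were chosen only so that the "T-type-like" enzyme has higher Vc and Kc than the "M-type-like" one, and they are not measured values. pH effects are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubiscoKinetics:
    """Hypothetical Rubisco kinetic parameters (arbitrary, illustrative units)."""
    v_c: float  # maximum carboxylation rate, Vc
    k_c: float  # Michaelis constant for CO2, Kc
    k_o: float  # inhibition constant for O2, Ko


def carboxylation_rate(enz: RubiscoKinetics, co2: float, o2: float) -> float:
    """Carboxylation rate with O2 acting as a competitive inhibitor:

        vc = Vc * C / (C + Kc * (1 + O / Ko))
    """
    apparent_k_c = enz.k_c * (1.0 + o2 / enz.k_o)  # Kc raised by O2 competition
    return enz.v_c * co2 / (co2 + apparent_k_c)


# Hypothetical parameter sets: the "T-type-like" enzyme has higher Vc and Kc.
m_type_like = RubiscoKinetics(v_c=3.0, k_c=10.0, k_o=300.0)
t_type_like = RubiscoKinetics(v_c=4.5, k_c=25.0, k_o=300.0)

O2 = 250.0  # fixed O2 concentration, same arbitrary units as k_o

for co2 in (5.0, 50.0, 1000.0):
    vm = carboxylation_rate(m_type_like, co2, O2)
    vt = carboxylation_rate(t_type_like, co2, O2)
    print(f"CO2={co2:7.1f}  M-type-like={vm:.3f}  T-type-like={vt:.3f}")
```

With these made-up numbers, the two rate curves cross at a CO2 concentration of about 37 units: below it the M-type-like enzyme carboxylates faster, above it the T-type-like enzyme does. The exact crossover depends entirely on the assumed parameters; the sketch only shows why a higher Vc paired with a higher Kc pays off when CO2 is plentiful.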
In flowering plants, expression of T-type RbcS is limited to non-photosynthetic tissues (Voo et al., 2012; Laterre et al., 2017). In pteridophytes and bryophytes, however, T-type RbcS expression appears less restricted, and T-type isoforms are generally more abundant than M-type RbcS. Pottier et al. (2018) suggested that this may reflect the emergence of early land plant lineages before atmospheric O2 rose to modern levels (Lenton et al., 2016), when Rubisco would have been adapted to a higher CO2:O2 ratio (i.e. a higher kcatc and lower Sc/o). M-type RbcS may then have evolved to improve the specificity of Rubisco as atmospheric O2 levels increased. Further measurements of the catalytic properties of Rubiscos assembled with different T-type RbcS may help to uncover the past and ongoing ecological roles of the T-type cluster, and whether T-type RbcS isoforms could inform strategies to improve the catalytic properties of Rubiscos in crop plants.
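The trade-off invoked here can be made explicit with the standard definition of Rubisco specificity, in which Sc/o compares the catalytic efficiencies for carboxylation and oxygenation, and the ratio of carboxylation (vc) to oxygenation (vo) rates scales with Sc/o and the CO2:O2 ratio at the active site. The symbols follow the conventional notation (kcatc and kcato for the turnover numbers, Kc and Ko for the Michaelis constants for CO2 and O2):

```latex
S_{c/o} = \frac{k_{\mathrm{cat}}^{c} / K_{c}}{k_{\mathrm{cat}}^{o} / K_{o}},
\qquad
\frac{v_c}{v_o} = S_{c/o} \, \frac{[\mathrm{CO_2}]}{[\mathrm{O_2}]}
```

Under a higher CO2:O2 ratio, the same vc/vo can be maintained with a lower Sc/o, which is consistent with, though not proof of, the idea that a less specific but faster Rubisco would have been less costly before O2 reached modern levels.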

RbcS as an engineering target for enhancing plant photosynthesis
Although relatively few studies to date have focused on overexpressing native RbcS or engineering modified or heterologous RbcS isoforms in plants, the wide-ranging effects observed on the catalytic performance of Rubisco are encouraging (Table 2). Because RbcS is nuclear-encoded, it is accessible to nuclear transformation, and recent advances in tissue culture and protoplast regeneration highlight the growing feasibility and generally robust efficiency of nuclear genome engineering in a wide selection of crop and non-crop species (Woo et al., 2015; Lin et al., 2018). Future research should continue to establish the extent to which non-native RbcS are compatible, in terms of sequence and structure, for expression and assembly in plants (and in algae of biotechnological interest), and the range of catalytic changes achievable by modifying the RbcS alone.
RbcS-deficient mutants of model plant species, such as tobacco or Arabidopsis (Khumsupan et al., 2020; Martin-Avila et al., 2020), are useful proxies to investigate the capacity of heterologous RbcS to assemble with plant RbcL, affect the catalysis of Rubisco, and influence photosynthesis and growth. In planta screens could also be important to confirm that an appropriate transit peptide is used for efficient chloroplast targeting of heterologous RbcS, and that suitable promoter and terminator combinations drive sufficient expression. This could be achieved relatively quickly through transient expression in Nicotiana benthamiana or by protoplast transformation of the target plant or an amenable close relative (Xu et al., 2022). Higher-throughput screening approaches will probably rely on E. coli, which is rapidly developing into a powerful platform for expressing all Form I Rubiscos, screening RbcS and RbcL from different species, and directed evolution (Durão et al., 2015; Aigner et al., 2017; Wilson and Whitney, 2017; Lin et al., 2020, 2022). Nevertheless, producing plant Rubiscos in E. coli remains challenging because of inefficient RbcL processing and the accumulation of assembly intermediates (Ng et al., 2020), low yields of functional Rubisco, and limited knowledge of the chaperones required for Rubiscos from different plant species. Rubisco biogenesis in E. coli is also currently not a reliable predictor of assembly in chloroplasts (Wilson et al., 2018), and the catalytic parameters of plant Rubiscos assembled in E. coli are similar, but not identical, to those observed in planta (Lin et al., , 2022). However, significant improvements have already been made (Zhou and Whitney, 2019), and the increasing availability of genomic data could help to identify homologues or additional ancillary chaperones required for different plant species (Kress et al., 2022).
Further advances in in silico Rubisco models should also complement RbcS screening approaches (Van Lun et al., 2011). A key challenge to exploiting heterologous RbcS in planta appears to be achieving sufficiently high expression in transformed plants, as heterologous RbcS produced from single expression cassettes driven by high-strength promoters has not yet accumulated to levels commensurate with native RbcS (Matsumura et al., 2020). Generating a synthetic RbcS family using a multigene expression cassette approach may help to solve this issue (Marillonnet and Grützner, 2020). Removal or suppression of the native RbcS gene family may also be required. Both CRISPR/Cas and RNAi strategies have been used successfully to remove or substantially suppress native RbcS expression in Arabidopsis, rice, and tobacco (Donovan et al., 2020; Khumsupan et al., 2020; Martin-Avila et al., 2020; Matsumura et al., 2020).
If improvements in Rubisco performance are achieved, other factors could become limiting to photosynthetic efficiency. For example, leaf CO2 diffusion is enhanced in Limonium species with faster Rubiscos (i.e. a higher kcatc) (Galmés et al., 2017), suggesting that faster Rubiscos may need to be matched by a greater CO2 supply. Increases in CO2 diffusion could potentially be achieved by overexpressing CO2-permeable membrane channels, such as specific aquaporins (Kaldenhoff, 2012; Kromdijk et al., 2020; Clarke et al., 2022), or CO2/HCO3− transporters found in algal and cyanobacterial CO2-concentrating mechanisms (CCMs) (Rae et al., 2017; Rottet et al., 2021). Overexpression of chaperones or other Rubisco-associated proteins could also be required, for example RAF1 in maize or Rubisco activase in rice (Salesse-Smith et al., 2018; Suganami et al., 2021). Furthermore, any increase in CO2 assimilation by Rubisco may benefit from additional modifications to downstream fluxes, such as the transport of carbohydrates to sink tissues. Canopy-scale models modified to account for Rubisco parameters may further help to predict the physiological consequences of replacing native Rubiscos with heterologous enzymes in plants (Iqbal et al., 2021).
Despite these potential challenges, it is plausible that modifications to RbcS will yield the first generation of crop plants with improved Rubiscos. RbcS changes may ultimately require concurrent modifications to RbcL to maximize catalytic improvements or compatibility between the subunits, which will probably become achievable as chloroplast engineering technologies improve (Yu et al., 2020). Nevertheless, it is remarkable that over the past two decades RbcS has risen from relative obscurity to become a potential game changer for enhancing the photosynthetic performance of plants.