Gene-rich X chromosomes implicate intragenomic conflict in the evolution of bizarre genetic systems

Significance
Sex determination systems such as haplodiploidy, in which males' gene transmission is haploid, are surprisingly common; however, the evolutionary paths to these systems are poorly understood. X chromosomes may play a particularly important role, either by increasing survival of males with only maternal genomes, or due to conflicts between X-chromosomal and autosomal genes. We studied X-chromosome gene richness in three arthropod lineages in which males are diploid as adults but only transmit their maternally inherited haploid genome. We find that species with such atypical systems have far more X-chromosomal genes than related diploid species. These results suggest that conflict between genetic elements within the genome drives the evolution of unusual sex determination systems.


Introduction
Many animal lineages have evolved genetic systems in which females are diploid but males are haploid or effectively haploid, with each male creating genetically identical sperm carrying the single haploid genome originally inherited from his mother (1). Such systems range from haplodiploidy (HD), in which males are produced from unfertilized eggs; to embryonic paternal genome elimination (PGE), in which diploid males eliminate their paternal genome early in development; to forms of germline-specific PGE (gPGE), where the paternal genome is present in male diploid cells but excluded during male meiosis (Figure 1a). HD/PGE is widespread, seen in ~12% of arthropods and having evolved roughly two dozen times (1). This recurrent evolution perhaps reflects the various advantages of HD/PGE, particularly to mothers, who can increase the transmission of their genes over paternally inherited genes, control the sex ratio, ensure reproductive success without a mate (in HD), and, under monogamy, reduce conflict between gregarious offspring (2–6). Given these general benefits, why does HD/PGE evolve in some lineages and not in others? An important clue comes from the finding that HD/PGE evolves from ancestral male heterogamety (XX/XY or XX/X0) (7,8). The most influential hypothesis for this association is the Haploid Viability hypothesis. This hypothesis emphasizes that, starting from an ancestral standard diploid system, newly evolved haploid males are expected to have markedly lowered fitness due to uncovered recessive deleterious mutations. However, because hemizygosity of X-linked genes facilitates purging of recessive deleterious mutations, an ancestral increase in the proportion of genes on the X chromosome is expected to lead to a decrease in the total number of segregating recessive deleterious mutations, reducing the fitness burden of deleterious mutations for newly evolved haploid males (5,8–10).
However, other hypotheses are possible. In particular, the more general Intragenomic Conflict hypothesis sees conflicts between genes within individuals as forces that can destabilize genetic systems and thus promote the origins of novel systems including, but not limited to, HD/PGE (11–13). X chromosomes seem to be more often associated with intragenomic conflict than autosomes are (14–16). In particular, X-linked genes can evolve X chromosome drive (>50% transmission of the X in sperm), which can lead to female-biased population sex ratios. Under such sex biases, males have higher average fitness, thus driving selection for new means of producing males (17,18). This generally increased male fitness could select for production of haploid males. Moreover, silencing or foregoing the paternal genomic contribution (and in particular the paternal X) could be selectively advantageous insofar as selfish driving X alleles are expected to disproportionately act in males. Although more theoretical work is needed, according to the most developed model, HD/PGE in particular could evolve under the Intragenomic Conflict hypothesis through the exploitation of X chromosome drive by maternal autosomes that increase their transmission by becoming effectively X-linked (11). According to this model, the more genes are X-linked, the more genes will be selected to promote X chromosome drive (and the fewer will be selected to suppress drive), increasing the chance of the evolution of male haploidy.
These two hypotheses differ in whether they predict an association between X linkage and the origins of gPGE in those gPGE systems in which paternal chromosomes are expressed in the soma but are eliminated during male meiosis (Fig. 1a). Given diploid expression of autosomes in the male soma, gPGE, unlike other types of HD/PGE, does not uncover deleterious recessive alleles. Thus, the Haploid Viability hypothesis does not predict an association between X linkage and the evolution of gPGE. However, the notion that X-autosome conflict drives novel systems equally applies to gPGE and other HD/PGE systems; thus, the Intragenomic Conflict hypothesis predicts an association between X linkage and the evolution of gPGE. (Notably, in most characterized gPGE systems including those studied here, the paternal genome remains present and expressed through the diploid pre-meiotic stages of spermatogenesis and is only eliminated during meiosis.) To our knowledge, this differential prediction has not been noted or tested. gPGE systems that retain sex chromosomes and diploid expression of somatic autosomes are known from three lineages: flies in Sciaridae (19) and Cecidomyiidae (20) (fungus gnats and gall midges respectively, two families in the diverse dipteran superfamily Sciaroidea) and springtails in the order Symphypleona (21). Sciaridae and Cecidomyiidae represent a substantial fraction of worldwide biodiversity and are some of the most abundant species of flying insects found in tropical rainforests and in temperate ecosystems, with many new species in these groups continuing to be described (22–24). Sciaridae, Cecidomyiidae, and Symphypleona have independently evolved similar variants of gPGE, in which males are produced through somatic elimination of paternal X chromosomes, while the remainder of the paternal genome is retained until its elimination during meiosis (Fig. 1) (7,14,21,25–30).
These clades offer a powerful opportunity to disentangle whether the origin of HD/PGE is better explained by the Haploid Viability hypothesis or the Intragenomic Conflict hypothesis.
To test these two hypotheses for the origins of HD/PGE, we performed whole genome sequencing and comparative analysis of 17 genomes from species with gPGE and their non-gPGE relatives. We developed methods to estimate genome-wide X chromosomal linkage, using an additional 35 dipteran species for validation, and then used these methods to estimate X linkage across the 17 studied species. We find evidence for ancestral gene-rich X chromosomes coincident with three independent origins of gPGE. These results provide the first empirical evidence for a role for intragenomic conflict in the origins of atypical genetic systems.

Results and Discussion
Development and testing of an improved method to estimate genome-wide X chromosomal linkage
Illumina genome sequencing and assembly were performed for males of each studied Sciaroidea species, and average read coverage was calculated for each contig. For the dipteran species, putative orthologs of D. melanogaster genes were identified via TBLASTN searches of each genome. Each ortholog was then assigned to one of the so-called Muller elements, D.  Table S1) (33). Publicly available chromosome-level assemblies for B. coprophila, A. gambiae, T. dalmanni, and several Drosophilid species allowed for direct comparison to our assignment, and for each we found our estimation of X linkage to be within 1% of previous estimates, allowing us to be confident in our assignment of X-linked and autosomal genes (Fig. S5a).

Increased numbers of X-linked genes in gPGE species relative to related species
To test whether the evolution of gPGE is associated with gene-rich X chromosomes, we estimated the proportion of the genome that is X-linked for 17 species of Sciaroidea flies and two species of springtails. We sampled the flies across seven families spanning the root of Sciaroidea, including two families with gPGE and two outgroup species within Bibionomorpha.
We used the publicly available genome assembly and annotation supported by physical mapping for the Hessian fly, the Cecidomyiid Mayetiola destructor (34,35), and also used available female read data to estimate relative male to female coverage. For the springtails, we performed genomic sequencing of males and females from one species from the gPGE order Symphypleona, Allacma fusca (Fig. 1d), and of Orchesella cincta, from Entomobryidae, the closest relative springtail order with standard XX/X0 sex determination. In springtails, instead of orthologs, we used genome annotations to estimate the gene density and used both male and female read coverage data. Our assignment methods provided clear estimates of X linkage for nearly all our species, with the exception of one species, the Cecidomyiid Lestremia cinerea, which showed three distinct peaks in genome coverage rather than two, as well as a low number of complete BUSCO genes (Fig. S1, S2).
Among all non-gPGE fly species of Bibionomorpha, we found very few X-linked genes, with the X chromosome in all species composed mostly of genes from the diminutive F Muller element (<1% of all genes), consistent with the previous inference for the ancestral dipteran X chromosome (Fig. 2a) (33). Interestingly, no Muller elements exhibited clear X-linked peaks in coverage in Platyura marginata and Symmerus nobilis, the latter of which is sister to all other Sciaroidean species. This suggests either homomorphic sex chromosomes, a lack of an X chromosome, or neo-X chromosomes too recently evolved to be distinguished via coverage, as previously observed (33).
By contrast, for all six studied gPGE species in both the Sciaridae and Cecidomyiidae clades, we found large fractions of genes genome-wide to be X-linked, including genes from all six Muller elements (Fig. 2a, 3). Notably, our results agree with previous results for M. destructor, identifying Muller elements C, D, F, and E as partially X-linked (33), and our methods additionally detect a small minority of X-linked genes for elements A and B. We also found a clear contrast between the two studied springtail species: while only 16% of genes in the genome of non-gPGE Orchesella cincta are X-linked, for the gPGE springtail Allacma fusca, 38% of annotated genes are X-linked (Fig. 2b).

Statistical tests support the relationship between gPGE and X linkage
To test the association between percent X chromosome linkage and the evolution of gPGE, we used multiple statistical methods. While the number of transitions to gPGE with X chromosomes is small and all current phylogenetic methods with a binary response variable are prone to inflated Type 1 error rates with small sample sizes (36)

Correspondence between X-linked genes within families indicates ancestrally gene-rich X chromosomes
Although we found an association between gene-rich X chromosomes and gPGE in all three independent origins of this genetic system, the observed association could be explained by either X linkage facilitating the evolution of gPGE or vice versa. Consistent with the former, we see the same patterns of Muller group X linkage within families (E>A>B in Sciaridae species; C>D>E>A>B in Cecidomyiidae). In addition, we found an association between X-linked gene subsets within individual Muller elements, as expected from ancestral linkage. For instance, the subsets of Muller B genes that are X-linked in the Sciaridae species B. coprophila and T. splendens significantly overlap, and the same is true for all partially X-linked Muller elements in both Sciaridae species (Fig. 3). By contrast, X-linked genes between Sciaridae and Cecidomyiidae do not significantly overlap, supporting independent origins of the large X in these two families (Fig. S4).
Examination of Cecidomyiidae reveals an intriguing pattern. The deeply-diverged species C. subobsoleta and M. destructor show high correspondence between X-linked gene subsets, indicating substantial ancestral X linkage. However, P. nigripennis shows divergent X linkage, with no significant pattern seen in shared X linkage with other Cecidomyiids, and a relative increase in X linkage on Muller elements A and B. This pattern suggests turnover and increases in X linkage in this lineage since the divergence from M. destructor (or, less parsimoniously, parallel loss of A/B linkage in the other lineages) (Fig. 2a, 3).

X Chromosome lability and partial Muller linkage
Our data attest to substantial dynamism of the X chromosome and Muller linkage in both gPGE families within Diptera. This is in contrast to the dominant model of Dipteran sex chromosome evolution, in which sex-linked Muller elements are expected to remain stable over long evolutionary periods. Some notable cases indicate remarkable conservation, such as the X chromosome of the German cockroach, which has remained conserved with the ancestral dipteran X chromosome (Muller element F) despite 400 million years of divergence (38). On the other hand, even within Drosophila this pattern is disrupted: the ancestral drosophilid X-linked element A has fused with the typically autosomal element D to form the X chromosome in the obscura clade, as well as in D. willistoni (Fig. 4b) (39,40). Vicoso and Bachtrog demonstrated abundant sex chromosome turnover across Diptera, broadly challenging the established paradigm of sex-linked Muller element stability (33).
In addition to demonstrating cases of lost and replaced sex chromosomes, Vicoso and Bachtrog also showed cases of partial linkage, where parts of multiple Muller elements are incorporated into sex chromosomes (33). Specifically, they find partial linkage for the B element of Holcocephala fusca and for the E element of M. destructor, both of which our methods also identified as partially X-linked (35% and 40% of genes, respectively). We additionally find minor partial linkage of Muller elements A and B in M. destructor (Fig. 2a and Fig. S4b). In Anopheles gambiae, element A is typically discussed as if fully X-linked. However, the X chromosome has previously been shown to be only partially composed of element A and parts of other Muller elements, while the rest of the ancestrally Muller A genes are now found on autosomes (41), consistent with our results (Fig. S5, Table S1). Additionally, minor partial X linkage of A. gambiae elements E (11%) and F (33%) has been previously identified (42) and is consistent with our findings of 11% and 29% X linkage, respectively (Table S1). Our methods demonstrate the resolution to detect low levels of X linkage and suggest that partial linkage and general Muller element breakdown may be more common than is generally appreciated.

Concluding remarks
We find that species in the gPGE groups Cecidomyiidae and Sciaridae have, on average, X chromosomes 37 times more gene-rich than non-gPGE Sciaroidea species, and that the X chromosome gene content of the gPGE springtail species is more than double that of the diploid outgroup (Fig. 2a and b). Furthermore, we recovered a robust positive correlation between the percent X linkage in the genome and the evolution of gPGE (Fig. S3).
While having additional independent origins of X chromosome-containing gPGE would add strength to our conclusions, we are only aware of those studied here and our results are bolstered by multiple statistical methods.
Notably, while previous similar reports of an association between the extent of X linkage and atypical sex determination are consistent with either the Haploid Viability hypothesis or the Intragenomic Conflict hypothesis (8), these findings represent the first empirical evidence that suggests Intragenomic Conflict as a strong driver of the evolution of unconventional sex determining systems such as gPGE and haplodiploidy. Given the widespread and repeated evolution of male haploidy, and its association with many unique ecological and life history strategies, our findings point to an important role for intragenomic conflict in shaping biology at all levels from molecule to organism to community.

Specimens and sequencing:
In order to compare X chromosomes of gPGE species to their diplodiploid relatives, we were generated by Edinburgh Genomics (UK) and sequenced on the Illumina HiSeq X (for springtails) or NovaSeq S1 (for B. coprophila) generating short reads (150 bp paired-end). The genome for B. coprophila was assembled using Megahit 1.2.9 (43). The genome of the springtail A. fusca was assembled using SPAdes v3.13.1 (44). Both the B. coprophila and A. fusca assemblies were decontaminated with BlobTools (45). The assembly of A. fusca was annotated using BRAKER 2.1.5 (46). We assessed the quality of all genomes using BUSCO (47) to determine the proportion of single-copy orthologs expected to be present in either insects (insecta_odb10 for fungus gnat species) or arthropods (for springtails) in the genome assemblies (Fig. S1). Lestremia cinerea was excluded from downstream analysis due to irregular genome coverage patterns and a low number of complete BUSCO genes, indicating likely issues with the genome quality for this species (Fig. S1 and S2). We used publicly available genome assemblies for the Cecidomyiid Mayetiola destructor (GCA_000149195.1) and for the springtail Orchesella cincta (GCA_001718145.1). For M. destructor, we used publicly available male (SRR1738190) and female reads (SRR1738189), and for O. cincta, we additionally used available female reads (SRR2222657).

Assigning ancestral linkage groups:
The X chromosome in each fly species was identified using two strategies: Muller group ortholog assignment and DNA coverage. The X chromosomes in springtails were identified using the coverage approach only.

Identifying X linkage via coverage
Our second strategy used DNA coverage levels to distinguish autosomal from X-linked sequence, as we expect the single-copy X chromosome in males to cause X-linked sequence to be found at half the coverage level of autosomes. Male DNA reads were mapped to their respective genome assemblies, and repetitive sequence that could not be uniquely mapped was accounted for when calculating an adjusted coverage (see Supplemental Methods). For species in which female read data were available (M. destructor and the two springtails), the relative coverage of male to female was used. In the case of A. fusca, we used the median coverage of the two males and 11 females available (26). To classify genes by coverage as either autosomal or X-linked, we used a multi-step protocol relying on the full-genome and per-Muller male DNA coverage distributions (see details in Supplemental Information). We also assessed 35 other dipteran genomes outside Bibionomorpha using publicly available data and the same methods of analysis (Fig. S5, Table S1).
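The half-coverage expectation for the male X can be illustrated with a deliberately simplified sketch. This is not the multi-step protocol described in the Supplemental Information; it is a single-threshold illustration under stated assumptions. The function names, the use of the genome-wide median as the autosomal reference (which assumes that fewer than half of all genes are X-linked), and the log2 cutoff of -0.5 (roughly midway between the expected autosomal value of 0 and the expected X-linked value of -1) are all hypothetical choices made for the example.

```python
import math
from statistics import median


def classify_by_coverage(gene_cov, autosomal_ref=None, cutoff_log2=-0.5):
    """Label each gene 'X' or 'A' from male read coverage.

    gene_cov: dict mapping gene name -> mean male coverage.
    autosomal_ref: coverage taken to represent autosomes; if None, the
        median over all genes is used (ASSUMPTION: most genes are autosomal).
    cutoff_log2: genes with log2(cov / autosomal_ref) below this value are
        called X-linked (ASSUMPTION: illustrative threshold only).
    """
    if autosomal_ref is None:
        autosomal_ref = median(gene_cov.values())
    calls = {}
    for gene, cov in gene_cov.items():
        if cov <= 0:
            calls[gene] = "NA"  # no usable coverage, leave unclassified
            continue
        log2_ratio = math.log2(cov / autosomal_ref)
        calls[gene] = "X" if log2_ratio < cutoff_log2 else "A"
    return calls


def percent_x_linked(calls):
    """Percentage of classified genes that were called X-linked."""
    called = [c for c in calls.values() if c != "NA"]
    if not called:
        return float("nan")
    return 100.0 * called.count("X") / len(called)


# Toy data (invented for illustration): three genes near 40x, two near 20x.
toy = {"g1": 40, "g2": 42, "g3": 20, "g4": 38, "g5": 19}
toy_calls = classify_by_coverage(toy)
toy_percent = percent_x_linked(toy_calls)
```

Where female reads are available, the same idea applies to the male-to-female coverage ratio per gene rather than raw male coverage, which removes gene-specific mapping biases shared by both sexes.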

Statistical analysis and phylogenetic correction
To test the association between X linkage and the evolution of PGE, we estimated a Bayesian generalized linear mixed "threshold" model (49) and a likelihood-based phylogenetic logistic regression described in (50). Both methods attempt to control for the phylogenetic relatedness of the species. For full detail, see Supplemental Information.

Testing for ancestral Muller group linkage
To test for evidence of ancestral X linkage, we compared various pairs of species. We studied each Muller element for which both compared species had partial X linkage, in which the ancestral linkage groups have broken up and are now partially X-linked and partially autosomal.
Genomes of each species pair were reciprocally blasted to define putative pairwise orthologs using TBLASTX. Only best reciprocal hits and orthologs that blasted to the same D. melanogaster gene were included in further analysis. Each ortholog pair was then assigned based on its inferred X/autosomal linkage for both species (X-linked/X-linked, X-linked/autosomal, autosomal/X-linked, or autosomal/autosomal). Association between X linkage across between-species orthologs was tested by a chi-square test.
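The four ortholog categories form a 2x2 contingency table, so the test of association can be sketched as follows. This is a minimal illustration, not the authors' code: it computes the Pearson chi-square statistic without a continuity correction (whether a correction was applied in the original analysis is not stated), and the function name and the counts in the example are invented. For one degree of freedom, the p-value equals erfc(sqrt(chi2 / 2)), which lets the sketch rely on the standard library only.

```python
import math


def chi_square_2x2(xx, xa, ax, aa):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    xx: orthologs X-linked in both species
    xa: X-linked in species 1, autosomal in species 2
    ax: autosomal in species 1, X-linked in species 2
    aa: autosomal in both species
    Returns (chi2 statistic, p-value for 1 degree of freedom).
    """
    n = xx + xa + ax + aa
    row1, row2 = xx + xa, ax + aa
    col1, col2 = xx + ax, xa + aa
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero; test is undefined")
    # Shortcut formula for a 2x2 table, equivalent to sum((O - E)^2 / E).
    chi2 = n * (xx * aa - xa * ax) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented counts for one partially X-linked Muller element in a species pair.
chi2_stat, p_val = chi_square_2x2(xx=30, xa=10, ax=10, aa=50)
```

A small p-value here indicates that genes X-linked in one species tend to be X-linked in the other, as expected if the partial X linkage of that Muller element was inherited from a common ancestor.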