Edinburgh Research Explorer Diversity and Divergence: Evolution of secondary metabolism in the tropical tree genus Inga

Plants are widely recognized as chemical factories, with each species producing dozens to hundreds of unique secondary metabolites. These compounds shape the interactions between plants and their natural enemies. We explore the evolutionary patterns and processes by which plants generate chemical diversity, from evolving novel compounds to unique chemical profiles. We characterized the chemical profile of one-third of the species of tropical rainforest trees in the genus Inga (~100, Fabaceae) using UPLC-MS based metabolomics and applied phylogenetic comparative methods to understand the mode of chemical evolution. We show: 1) Each Inga species contains structurally unrelated compounds and high levels of phytochemical diversity. 2) Closely related species have divergent chemical profiles, with individual compounds, compound classes and chemical profiles showing little to no phylogenetic signal. 3) At the evolutionary time scale, a species' chemical profile shows a signature of divergent adaptation. At the ecological time scale, sympatric species were the most divergent, implying it is also advantageous to maintain a unique chemical profile from community members. 4) Finally, we integrate these patterns with a model for how chemical diversity evolves. Taken together, these results show that phytochemical diversity and divergence are fundamental to the ecology and evolution of plants.


Introduction
For sessile organisms such as plants, secondary metabolism plays a fundamental role in mediating biotic interactions ranging from mutualisms (e.g., pollination) to antagonisms (e.g., competition and defense). Plant secondary metabolites, sometimes referred to as specialized metabolites, which are classically considered nonessential for basic cellular function, are exceedingly diverse, with nearly 1,000,000 predicted to exist across the plant kingdom (Afendi et al. 2012). It has long been thought that this incredible diversity strongly influences the ecology and evolution of interactions between plants and their pests and pathogens (Ehrlich and Raven 1964; Endara et al. 2017; Endara et al. 2018). Plant secondary metabolites are also essential for plants' ability to survive in harsh abiotic environments by offering protection from UV damage and desiccation (Weng 2014). The evolution of novel compounds or unique combinations of compounds (hereafter, chemical profile) can be highly adaptive, increase plant fitness, and facilitate species coexistence (Salazar et al. 2016; Vleminckx et al. 2018; Forrister et al. 2019).
Thus, understanding the origin and maintenance of chemical diversity is central to both the evolution and ecology of plants.
Much of the theoretical and empirical literature supports the idea that selection has placed a premium on chemical diversity in plants (Jones 1991; Berenbaum and Zangerl 1996; Richards et al. 2016; Kessler and Kalske 2018; Salazar et al. 2018; Wetzel and Whitehead 2020).
A species' chemical profile is thought to arise from a diverse set of selective pressures ranging from abiotic factors, such as water loss and solar radiation, to selection exerted by a multitude of herbivores, pathogens, and mutualists (Weng 2014; Endara et al. 2017; Salazar et al. 2018). For example, increased phytochemical diversity in tropical forests is negatively correlated with both the number of herbivore species associated with a given host (Salazar et al. 2018; Endara et al. 2021) and herbivory (Richards et al. 2015). Recent studies have also highlighted that, in addition to producing a diverse set of compounds, it is important for a given species to maintain a unique chemical profile relative to other species in its community (Kursar et al. 2009; Forrister et al. 2019; Endara et al. 2021). While there is a clear consensus on the value of phytochemical diversity, the underlying evolutionary processes that generate chemical diversity in plant lineages remain widely debated (Wetzel and Whitehead 2020).
Here we ask how plants generate chemical diversity and what evolutionary processes lead to novel compounds and unique chemical profiles. To address this question, we build off the classic 'escape and radiate' theoretical framework, first suggested a half-century ago by the work of Dethier (1954), Fraenkel (1959), and Ehrlich and Raven (1964). In this model, random mutations in biosynthetic genes lead to the production of novel defense compounds, often through the gradual embellishment of core structures into more complex and derived compounds (Berenbaum 1983; Berenbaum and Feeny 1981; Coley et al. 2019). If these derived compounds have stronger deterrent properties or are effective against different enemies, selection acts to promote the novel genotype. In this study, we test the prediction put forth by the 'escape and radiate' model that chemical evolution proceeds in a gradual, stepwise manner through the modification of core structures (Ehrlich and Raven 1964; Berenbaum 1983). To test this, we combine untargeted metabolomic and comparative phylogenetic methods to characterize the chemical profiles for nearly 100 species of tropical trees in the genus Inga (Fabaceae). By focusing on a recently radiated monophyletic genus of trees, we attempt to understand how chemistry evolves at the tips of the phylogenetic tree over a relatively short period of evolutionary history. This offers a different perspective than studies of chemical evolution focused on deeper phylogenetic scales such as divergence among families (e.g., Wink 2003).
Inga is a useful case study for exploring how secondary metabolism evolves over short phylogenetic distances. Inga is a speciose genus with ~300 tree species in tropical moist forests throughout the New World. At any given site, it usually constitutes one of the most abundant and speciose genera, with up to 40 coexisting species (Valencia et al. 2004). Multiple lines of evidence have implicated the importance of chemistry in the ecological and evolutionary processes that have shaped the genus (Kursar et al. 2009; Endara et al. 2017; Coley et al. 2018).
Moreover, Inga and other speciose tropical genera such as Bursera, Psychotria, Piper and Protium are among the most phytochemically diverse plant lineages that have been documented, often having more compounds in a single genus than entire plant communities in temperate ecosystems (Sedio et al. 2018). Thus, Inga is an illustrative model for the generation of phytochemical diversity as a whole. The results presented in this study build off of previous work in Inga which focused on a few specific metabolites (Coley et al. 2019) or broad compound classes (Kursar et al. 2009). Here we increase the phylogenetic coverage and leverage metabolomics to greatly expand our exploration of the relationship between evolutionary history and chemical similarity.
We use untargeted metabolomics to quantify intraspecific phytochemical diversity, examine how chemical similarity between congeners changes over evolutionary time and geographic distance, and finally quantify the phylogenetic signal of individual compounds as well as larger chemical classes. In doing so we aim to address the following questions and hypotheses: 1) Do species invest in phytochemical diversity by producing structurally unrelated compounds?
Investment in structurally diverse defensive compounds is adaptive for protection against a broad suite of pests and pathogens (Salazar et al. 2018; Wetzel and Whitehead 2020; Endara et al. 2021), yet investment in chemical defense comes at a cost (known as the 'growth-defense tradeoff') (Strauss et al. 2002; Panda et al. 2021; Monson et al. 2022). Investment in chemical defense is expensive both in terms of the carbon and nitrogen used as inputs for the biosynthetic products, as well as in terms of transcribing and regulating enzymes involved in secondary metabolism (Gershenzon 1994). It is unclear whether biosynthetic constraints and pleiotropy of biosynthetic enzymes limit phytochemical diversity or lead to evolutionary trade-offs between chemical classes (Koricheva et al. 2004; Agrawal et al. 2009; Gershenzon et al. 2012). Because phytochemical diversity is potentially adaptive (Richards et al. 2015; Salazar et al. 2018; Endara et al. 2021), we hypothesize that selection will favor investment in a diverse suite of compounds rather than structurally related ones.
2) Does the entire chemical profile diverge between closely related species and does it evolve under divergent selection?
The 'escape and radiate' model predicts that closely related species would have similar defensive profiles (Ehrlich and Raven 1964; Berenbaum and Feeny 1981; Berenbaum 1983; Coley et al. 2019). However, it has also been posited that diffuse coevolution between plants and their natural enemies would result in divergent adaptation in defense traits (Endara et al. 2015; Maron et al. 2019). The latter argues that it is advantageous for a species to not only have a diversity of compound classes, but to be different from other species in their community in order to not share pests and pathogens (Kursar et al. 2009; Bagchi et al. 2014; Salazar et al. 2018; Forrister et al. 2019). Here we ask if species' chemical profiles show phylogenetic signal, or if they have diverged sufficiently to erase the effect of shared evolutionary history. We also incorporate biogeography, asking if sympatric species are more or less divergent in their chemical profile than species occurring in parapatry. Biogeography is an important factor because at the population (within species) level, selection pressures may differ at different sites. Additionally, because sympatric species should be divergent in ecologically relevant traits to coexist (Chesson 2000), we hypothesize that sympatric relatives will be more divergent in their chemical profile than parapatric ones. Finally, we use a novel modeling framework (Anderson and Weir 2020) to formally test the hypothesis that chemical profiles are evolving under divergent adaptation.

3) Are individual compounds phylogenetically conserved?
The evolution of novel chemistry is assumed to be the result of stepwise changes to chemical structures resulting in more derived chemical defenses over evolutionary time (Berenbaum and Feeny 1981; Coley et al. 2019). This process should lead to a pattern of phylogenetic conservatism of metabolites and biosynthetic pathways (Ehrlich and Raven 1964; Salazar et al. 2018). To test this prediction, we mapped all individual compounds present in Inga onto the phylogeny and estimated their phylogenetic signal. We then used ancestral state reconstruction to estimate the number of times each compound had transitioned on the phylogenetic tree (Courtois et al. 2016). In contrast to the 'escape and radiate' model, we hypothesize that in order for species to invest in structurally diverse compounds and diverge from close relatives, the mode of chemical evolution would not proceed in a stepwise manner. Rather, rapid changes based on transcriptional regulation would result in low phylogenetic signal of individual compounds.

4) Is there evidence of metabolic integration or apparent trade-offs between biosynthetic pathways?
Comparative phylogenetic analyses of defense traits have revealed both trade-offs (negative correlations) (Kursar and Coley 2003; Agrawal and Fishbein 2006; Agrawal et al. 2009; Coley et al. 2018; Monson et al. 2022) and positive correlations (Agrawal and Fishbein 2006), providing evidence for evolutionary integration and defense syndromes. For example, trade-offs between compound classes that share the same biosynthetic precursor are well supported in the literature (Keinänen et al. 1999; Nyman and Julkunen-Tiitto 2005; Agrawal et al. 2009). Nevertheless, other studies have found little evidence for these trade-offs based on meta-analysis (Koricheva et al. 2004). Here we ask whether biosynthetic constraints lead to trade-offs that persist over evolutionary timescales or if each branch of the biosynthetic pathway evolves independently.

Materials and Methods
Study sites and species sampling: We studied Inga between 2005 and 2014 at five lowland tropical rainforest sites across the Amazon basin and in Panama (Table S1), where we extensively surveyed understory saplings, a prolonged and key vulnerable stage in the life cycle of tropical forest trees (Coley et al. 2018).
We sampled Inga across the full distributional range of the genus. We spent approximately 16 people-months per site collecting data in the field. Specifically, we exhaustively searched each site for all Inga species, taking measurements on morphological and defense traits for a total of 97 species as well as one species from its sister genus, Zygia. Species delimitation was based on the combination of morphology and phylogenetic reconstruction (Nicholls et al. 2015); for individuals that were morphologically difficult to identify, we relied on chemocoding to confirm species identifications (Endara et al. 2018). Young leaves at approximately 50% full expansion were collected in the understory from 5 to 10 spatially separated individuals (with very few exceptions for rare species where we included 3 individuals). We focused on expanding leaves, as they receive more than 70% of the lifetime damage of a leaf (Coley and Aide 1991), and their chemical profiles are an important factor for host associations of insect herbivores (Endara et al. 2017; Endara et al. 2018). In general, we found the chemical profile of each species to be highly canalized, and previous work has shown that 5 individuals are sufficient to capture ~75% of compounds encountered in up to 15 individuals (Endara et al. 2021). Samples were dried in the field at ambient temperature in silica immediately following collection, and then stored at -20 °C.
Some Inga species invest in the overexpression of the essential amino acid L-tyrosine as an effective chemical defense (Coley et al. 2019). Tyrosine is insoluble in our extraction buffer, so a different protocol was used to determine its percentage of leaf dry weight. Extractable nitrogenous metabolites were extracted from a 5 mg subsample of each leaf using 1 mL of aqueous acetic acid (pH 3) for 1 h at 85 °C (Coley et al. 2019). Fifteen microliters of the supernatant were injected on a 4.6 x 250 mm amino-propyl HPLC column (Microsorb 5u, Varian). Metabolites were chromatographed using a linear gradient (17-23%) of aqueous acetic acid (pH 3.0) in acetonitrile over 25 min. The mass of solutes in each injection was measured using an evaporative light scattering detector (SEDERE S.A., Alfortville, France). ELSD temperature was 75 °C with 2.2 bar of compressed N2, and instrument gain was set to 6. Tyrosine concentrations were determined by reference to a four-point standard curve (0.2-3.0 mg tyrosine/mL, r² = 0.99) prepared from pure tyrosine.
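The standard-curve step can be illustrated with a minimal sketch. The peak areas below are invented for illustration, and a straight-line fit is an assumption (ELSD responses are often nonlinear, and the text does not state which curve form was used). The conversion to percent dry weight uses the 5 mg subsample and 1 mL extraction volume given above.

```python
import numpy as np

# Hypothetical calibration data: four tyrosine standards spanning the
# 0.2-3.0 mg/mL range from the text; the detector peak areas are invented.
std_conc = np.array([0.2, 1.0, 2.0, 3.0])      # mg tyrosine / mL
std_area = np.array([10.0, 50.0, 100.0, 150.0])  # arbitrary detector units

def fit_standard_curve(conc, area):
    """Least-squares line area = slope * conc + intercept, with r^2."""
    slope, intercept = np.polyfit(conc, area, 1)
    predicted = slope * conc + intercept
    ss_res = np.sum((area - predicted) ** 2)
    ss_tot = np.sum((area - area.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

def area_to_conc(area, slope, intercept):
    """Invert the fitted line to get concentration (mg/mL) from a peak area."""
    return (area - intercept) / slope

def percent_dry_weight(conc_mg_per_ml, extract_ml=1.0, sample_mg=5.0):
    """Tyrosine as a percentage of leaf dry weight for one extract."""
    return 100.0 * conc_mg_per_ml * extract_ml / sample_mg

slope, intercept, r2 = fit_standard_curve(std_conc, std_area)
sample_conc = area_to_conc(75.0, slope, intercept)  # hypothetical sample peak
sample_pct = percent_dry_weight(sample_conc)
```

Samples whose peak areas fall outside the range of the standards would need dilution or additional standards rather than extrapolation.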

Compound separation, annotation, and assignment to species:
Following HPLC and UPLC-MS data acquisition, metabolites were quantified and assigned available structural information in all samples using an untargeted metabolomics pipeline developed by our research group (see Endara et al. (2021) for details). In this pipeline, spectral features are extracted from raw MS data, and related features are grouped into compounds based on shared retention time and correlated abundance between scans using CAMERA (Kuhl et al. 2012). We employed a variety of techniques in order to assign individual compounds into classes including NMR structural characterization, MS/MS-based spectral library searches using GNPS (Wang et al. 2016), in silico compound annotation, and machine learning prediction. As a result, MS/MS data for each compound were uploaded to GNPS for annotation of putative structures and compound classes. These analyses generate three outputs: 1) a species by compound abundance (MS-1 peak intensity measured by total ion current) matrix; 2) a compound by compound MS/MS spectral cosine similarity matrix; and 3) a classification table with the assignment for all annotated compounds based on ClassyFire (Djoumbou Feunang et al. 2016). The first two are then combined into a pairwise species similarity matrix that accounts for both shared compounds between species and the MS/MS structural similarity of unshared compounds.
To test for phylogenetic signal of the entire chemical profile and quantify divergence between species, we developed a method for quantifying overall chemical similarity between two species (Endara et al. 2021). This poses a challenge because few compounds are shared between species, making classic distance metrics such as Bray-Curtis uninformative (Endara et al. 2021; Sedio et al. 2017). Our method, which is similar to the method developed by Sedio et al. (2017), accounts for the fact that two species may have different compounds that are structurally similar (Endara et al. 2018; Endara et al. 2021). Specifically, we leverage MS/MS spectra as a proxy for the structural similarity between compounds (Wang et al. 2016). In this method, total chemical similarity between species is a function of the normalized abundance of shared compounds plus the normalized abundance of unshared compounds weighted by their structural similarity in the molecular network (see (18) for details).
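One way to read the "shared plus structurally weighted unshared" idea is sketched here. This is not the published formula (see Endara et al. 2021 for that): the function name, the use of the per-compound minimum as the shared component, and the best-match weighting of the unshared remainder are all assumptions chosen for illustration.

```python
import numpy as np

def chemical_similarity(a, b, S):
    """Illustrative similarity between two species' chemical profiles.

    a, b : abundance vectors over the same ordered list of compounds.
    S    : compound-by-compound structural similarity in [0, 1]
           (e.g. MS/MS cosine scores), with 1 on the diagonal.
    Returns a value in [0, 1]. This is an assumed formulation, not the
    authors' exact method.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    S = np.asarray(S, dtype=float)
    a = a / a.sum()                      # normalize abundances
    b = b / b.sum()

    shared = np.minimum(a, b)            # abundance present in both species
    ra, rb = a - shared, b - shared      # unshared remainder in each species
    in_a, in_b = ra > 0, rb > 0
    if not in_a.any() or not in_b.any():
        return float(shared.sum())

    # Weight each unshared compound by its best structural match among the
    # other species' unshared compounds, then average the two directions.
    block = S[np.ix_(in_a, in_b)]
    unshared = 0.5 * (ra[in_a] @ block.max(axis=1) + rb[in_b] @ block.max(axis=0))
    return float(shared.sum() + unshared)
```

Under this formulation, two species with no compounds in common can still score as similar when their compounds are close structural analogues, which is the behavior that abundance-only metrics such as Bray-Curtis lack.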
We quantified investment in phytochemical diversity for each focal species using its chemical profile and the MS/MS molecular network to calculate the functional Hill number (Chao et al. 2014). This diversity measure accounts for both variation in compound abundance and structural similarity in the molecular network. In short, it calculates the effective number of equally abundant and structurally distinct compounds produced by a given species (Chao et al. 2014).
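A minimal sketch of the functional Hill number follows, written from the distance-based form in which Rao's quadratic entropy Q = Σ d_ij p_i p_j normalizes the pairwise distances. Taking the distance between two compounds as 1 minus their MS/MS cosine similarity is an assumption made here for illustration; the function name is hypothetical.

```python
import numpy as np

def functional_hill_number(p, d, q=0.0):
    """Functional Hill number of order q.

    p : relative abundances of compounds in one species.
    d : symmetric compound-by-compound distance matrix with zero diagonal
        (assumed here to be 1 - cosine similarity).
    For q != 1:  D = [ sum_ij (d_ij / Q) * (p_i p_j)^q ] ^ (1 / (2 (1 - q)))
    For q == 1:  D = exp( -1/2 * sum_ij (d_ij / Q) p_i p_j log(p_i p_j) )
    """
    p = np.asarray(p, dtype=float)
    d = np.asarray(d, dtype=float)
    keep = p > 0                          # drop compounds that are absent
    p = p[keep] / p[keep].sum()
    d = d[np.ix_(keep, keep)]

    pp = np.outer(p, p)                   # p_i * p_j for every pair
    Q = float((d * pp).sum())             # Rao's quadratic entropy
    if Q == 0.0:
        raise ValueError("Rao's Q is zero; the measure is undefined")

    if abs(q - 1.0) < 1e-12:
        return float(np.exp(-0.5 * ((d / Q) * pp * np.log(pp)).sum()))
    return float(((d / Q) * pp ** q).sum() ** (1.0 / (2.0 * (1.0 - q))))
```

With q = 0 the abundance term (p_i p_j)^q equals 1 for every pair of compounds present, so abundance enters only through Q.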
We compared this diversity index with a null model where we assembled compounds into chemical profiles through a bifurcating process from root to tip on the Inga phylogenetic tree. This null model is rooted in the null models often employed in community ecology, but is expanded to incorporate phylogenetic relatedness. The null model represents the chemical profiles randomly drawn from the entire pool of compounds found in our study samples, while controlling for evolutionary history, compound frequency and abundance (see Appendix 1 for detailed explanation of the null model). To make a representative null model we matched the number of compounds produced by a given species and the number of compounds shared between any two closely related species with the values observed in the actual data, while randomizing the structural relatedness of shared compounds. We normalized phytochemical diversity values of each species relative to our null model.

Phylogenetic reconstruction of Inga:
A phylogenetic tree containing 165 Inga accessions, including taxa sampled at multiple sites, was reconstructed using a newly generated targeted enrichment (HybSeq) dataset of 810 genes. These 810 loci include those presented in Nicholls et al. (2015), supplemented with a subset of the loci from work by Koenen et al. (2020). DNA library preparation, sequencing and the informatics leading to final sequence alignments follow protocols in Nicholls et al. (2015). For the phylogenetic inference, we accounted for the putative effect of incomplete lineage sorting by constraining the maximum likelihood phylogeny with the topology obtained from a coalescent-based method. First, we inferred gene trees for 810 loci using IQtree 2 (Minh et al. 2020). The best substitution model was estimated for each locus using the ModelFinder (Kalyaanamoorthy et al. 2017) module implemented in IQtree 2. For each gene tree, we performed 1,000 bootstrap replicates with the ultrafast bootstrap approximation (Hoang et al. 2017). The resulting gene trees were subsequently used as the input for ASTRAL-III to estimate a phylogeny in a summary coalescent framework (Chernomor et al. 2016), after contracting branches with bootstrap support <10. We then used the topology obtained with ASTRAL to perform a constrained maximum likelihood tree search in IQtree 2. We performed a partitioned analysis (Chernomor et al. 2016) after inferring the best-partition scheme for the 810 genes and the best substitution model for each partition using ModelFinder. Branch support was estimated with ultrafast bootstrap approximation (1,000 replicates). The phylogenetic tree was subsequently time-calibrated using penalized likelihood implemented in the program treePL (Smith and O'Meara 2012). We used cross-validation to estimate the best value of the smoothing parameter and implemented secondary calibration points on the crown and node ages of Inga with an interval of 9.2-11.9 My and 13.4-16.6 My, respectively. Finally, the complete phylogeny was pruned to include only the 98 species for which chemistry data were available.

Phylogenetic Comparative Methods and Ancestral State reconstruction:
For phylogenetic signal of continuous traits we calculated Blomberg's K (Blomberg et al. 2003) using the function phylosignal in the R package picante v.1.8.2 (Kembel et al. 2010). K is close to zero for traits lacking phylogenetic signal, and higher than 1 when close relatives are more similar than expected under the classic Brownian motion evolutionary model. For the presence and absence of individual compounds we calculated the D-statistic (Fritz and Purvis 2010) using the caper package (Orme 2012).
We took a stochastic character mapping approach for the ancestral state reconstruction of compound presence/absence on the Inga phylogeny. Specifically, we used the function make.simmap (Bollback 2006) from R package phytools v.0.7-47 (Revell 2012) to estimate the state of each internal node on the phylogeny using 100 simulated trees. Based on the ancestral state reconstruction of each compound, we created an index of evolutionary lability, calculated as the number of times a given compound transitioned between present and absent divided by the number of species where a compound is present. Low values for this index indicate strong phylogenetic conservatism, where a compound likely evolved few times and was retained within a given lineage. Values near or above 1 indicate that a compound is evolutionarily labile, having been gained or lost as many times as the compound was present.
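The lability index defined above reduces to a simple ratio once transition counts are available. The sketch below assumes that the transition counts from the simulated character maps are averaged before dividing (the text does not say how the 100 simulations were summarized), and the function name is hypothetical.

```python
from statistics import mean

def lability_index(transition_counts, tip_states):
    """Evolutionary lability of one compound.

    transition_counts : number of present<->absent transitions in each
                        simulated character map (averaging is an assumption).
    tip_states        : presence (truthy) or absence (falsy) per species.
    """
    n_present = sum(1 for state in tip_states if state)
    if n_present == 0:
        raise ValueError("compound is absent from every species")
    return mean(transition_counts) / n_present

# A compound found in 4 species that changed state about 4 times is labile
# (index near 1); one found in 5 species after a single gain is conserved.
labile = lability_index([4, 4, 4], [1, 1, 1, 1, 0, 0])
conserved = lability_index([1, 1], [1, 1, 1, 1, 1, 0])
```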
To model how the complete chemical profile changes over time, we used a modeling framework developed by Anderson and Weir (2020) which uses simulated trait values based on either Brownian motion or Ornstein-Uhlenbeck processes. This framework also tests for divergent adaptation by adding a term for the interactions between lineages during simulated trait evolution.
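To make the two baseline processes concrete, the sketch below simulates a single lineage's trait under Brownian motion or an Ornstein-Uhlenbeck process with a simple Euler step. It is only an illustration of those two processes: it does not implement the Anderson and Weir (2020) framework and omits the lineage-interaction term used to test divergent adaptation. All parameter names are assumptions.

```python
import numpy as np

def simulate_trait(x0, n_steps, dt, sigma, alpha=0.0, theta=0.0, seed=0):
    """Euler simulation of one lineage's trait value through time.

    alpha = 0 gives Brownian motion (pure random walk with rate sigma).
    alpha > 0 gives an Ornstein-Uhlenbeck process pulled toward theta.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        drift = alpha * (theta - x[t]) * dt           # pull toward the optimum
        noise = sigma * np.sqrt(dt) * rng.normal()    # random evolutionary change
        x[t + 1] = x[t] + drift + noise
    return x

bm_path = simulate_trait(x0=0.0, n_steps=1000, dt=0.01, sigma=1.0)
ou_path = simulate_trait(x0=0.0, n_steps=1000, dt=0.01, sigma=1.0,
                         alpha=2.0, theta=3.0)
```

Under Brownian motion the variance among lineages keeps growing with time, whereas the Ornstein-Uhlenbeck pull keeps trait values near the optimum theta.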

Results:
Our untargeted metabolomics pipeline (Endara et al. 2021) allowed us to characterize thousands of individual compounds and determine the similarity of chemical profiles across species. In total we observed 9,105 unique compounds across 808 samples. Inga species invest substantial resources in soluble secondary metabolites, averaging 194 ± 103 (mean ± s.d.) unique compounds per species, and comprising 37 ± 11% (mean ± s.d.) of the expanding leaf's dry weight (Fig. S1). We were able to classify 42.5% of compounds, a substantial improvement from the 2.9% achieved from library matches alone (Fig. 1). Although our extraction and detection methods did not explicitly exclude primary metabolites, the vast majority of annotated compounds were assigned to secondary metabolites, specifically chemical classes that have been classically implicated in plant defense against pathogens and herbivores, including flavonoids and saponins. Similarly, given the scale of this study, it should be noted that a small fraction of the chemical compounds analyzed in the study are not likely to be found in planta, as they could be adducts, chemical artifacts and decomposition products. The inclusion of said artifacts should not influence the general conclusions of this study because they are relatively rare.

Individual species invest in structurally diverse compounds.
We asked whether biosynthetic tradeoffs constrain a plant's ability to invest in structurally unrelated compounds (i.e., the cost of maintaining enzymes in multiple metabolic pathways), or whether selection promotes investment in chemical diversity. To answer this question, we quantified investment in phytochemical diversity using functional Hill numbers and compared these findings to a null model. For the majority (94%) of species, phytochemical diversity was within the range of values expected by our null model. The rest of the species exceeded that range (4%) or were underdispersed (2%) (Fig.). The absence of species with lower phytochemical diversity than the null model indicates that all species invest in structurally diverse compounds.

Chemical profiles evolve under divergent adaptation
To test for phylogenetic signal of the entire chemical profile and quantify divergence between species, we developed a method for quantifying overall chemical similarity between two species (Endara et al. 2021). We compared these calculations to estimates of chemical similarity expected from a null model (Appendix 1). We found that chemical similarity was highest for intraspecific comparisons, but quickly decreased to the point where two species were as dissimilar as expected under our null model based on all interspecific comparisons (Fig. 3; Fig. S3). Within a species, chemical similarity was highest between individuals at a single site but rapidly decreased between individuals of the same species at different sites (Fig. 3). We also found that interspecific chemical similarity was highly divergent even between sister species and that the majority (83%) of pairwise comparisons between species fell within the range of our null model (Fig. 3, Fig. S3). Sister species at different sites (parapatric) were divergent, and sympatric sister species were more divergent than parapatric sister species. Interspecific chemical similarity of the entire chemical profile showed no phylogenetic signal (Mantel test: r = -0.03, P = 0.68, Fig. S3).
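The Mantel test reported above compares two distance matrices, here phylogenetic distance and chemical dissimilarity between species. A minimal permutation version is sketched below; the one-sided alternative and the number of permutations are assumptions, since the text does not state the settings or software used.

```python
import numpy as np

def mantel_test(x, y, permutations=999, seed=0):
    """Permutation Mantel test between two square distance matrices.

    Returns the Pearson correlation of the upper triangles and a one-sided
    p-value (fraction of permutations with r at least as large as observed).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    upper = np.triu_indices(n, k=1)          # each species pair counted once
    r_obs = np.corrcoef(x[upper], y[upper])[0, 1]

    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        perm = rng.permutation(n)            # relabel species in one matrix
        x_perm = x[np.ix_(perm, perm)]
        r_perm = np.corrcoef(x_perm[upper], y[upper])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return float(r_obs), float(p_value)
```

Rows and columns are permuted together so that the dependence among pairwise distances involving the same species is preserved under the null.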
To formally test the hypothesis that a species' chemical profile is evolving under divergent selection, we used recently developed phylogenetic comparative methods to model different modes of trait evolution and select the best fitting model. We found strong support for the divergent adaptation model over models that assume all lineages evolve independently of others on a tree (i.e., Divergent vs Brownian motion and the Ornstein-Uhlenbeck process) (Table S2). Our results show that each species evolves to have a unique chemical profile compared to close relatives. Unlike a species' chemical profile, we found that traits related to the amount of chemical investment (number of compounds, gravimetric chemical investment, and phytochemical diversity; Fig. S1) were best explained by an Ornstein-Uhlenbeck process model, indicating that these traits are evolving towards an optimal trait value (Table S2) rather than diverging.

Many compounds showed no phylogenetic signal and were evolutionarily labile.
The majority of compounds are detected in only a few species (median = 4), and roughly half (53%) of compounds showed no phylogenetic signal (Fig. 4A). Although some compounds are clustered in specific clades, many compounds are found dispersed across the phylogeny (Fig. 4B). We found that the majority of compounds (58%; lability >= 1.0) were labile, having evolved as many or more times than they were present (Fig. 4C).

Evidence for phylogenetic signal at larger chemical scales
The chemical profiles of Inga species are dominated by two classes of compounds that can be broadly categorized as phenolics and saponins. Phenolic chemistry arises from the flavonoid pathway (Fig. S5 contains a summary of Inga phenolics). Inga phenolic chemistry is based on flavone and mono/polymeric flavan backbones that are extensively modified. Inga saponins are glycosylated triterpenoids that have their origin in the mevalonic acid pathway and as such are biosynthetically distinct from phenolic compounds. We mapped investment in each of these classes onto the phylogeny (Fig. 5) and then tested for phylogenetic signal of each subclass of these compounds. We found that quinic acid gallates (K = 0.68, p = 0.02), tyrosine and related depsides (K = 0.73, p = 0.03) as well as saponin glycosides (K = 1.02, p = 0.007) showed significant phylogenetic signal. In contrast, all flavonoid subclasses showed no phylogenetic signal (Fig. 5).
We used phylogenetic structural equation modeling (SEM) to determine if chemical classes were correlated with each other (Fig. S4). We applied this approach because it controls for the phylogenetic non-independence of species as well as the biosynthetic non-independence of predictor variables. Our SEM model revealed several trade-offs between compound classes, suggesting that there may be switch points between major branches of the biosynthetic pathway: 1) saponin glycosides were negatively correlated with the left and right branch of the flavonoid pathway, 2) quinic acid gallates were negatively correlated with the right side of the flavonoid pathway and 3) the right branch of the flavonoid pathway was negatively correlated with the left branch (Fig. S4).

Discussion:
In this manuscript we set out to thoroughly characterize the profile of plant secondary metabolites produced in nearly 100 species of Inga from across their geographic range. We combine untargeted metabolomics and phylogenetic comparative methods to answer questions about how chemical profiles evolve. Our analysis uncovered nearly 10,000 unique metabolites produced across the genus. Based on compound annotations, most of these compounds were flavonoids and saponin glycosides (Fig. 1), both prominent secondary metabolite classes in plants. These profiles largely exclude primary metabolites because they are generally observed in much lower concentrations than secondary metabolites and therefore are not readily detected in our UPLC-MS pipeline. Moreover, when these chemical extracts were incorporated at only 0.5-2% DW into artificial diets, they were highly detrimental to larval growth and survival, suggesting that they are toxic and contain defensive compounds (reviewed in Coley et al. 2018).
Although many of the compounds observed in this study may play a role in defense, determining the function of compounds is very challenging in metabolomics studies. To that end, in this study we characterize the chemical profile as a whole, which contains a diversity of compounds likely selected for a variety of functions.

Diversity and Divergence:
Based on our analytical models, we found that each Inga species produces compounds that are more phytochemically diverse than would be expected by chance. This result underscores the strong selective pressure to generate and maintain chemical diversity that plants and other sessile organisms face from both harsh abiotic conditions and from a multitude of herbivores, pathogens, and mutualists (Weng 2014; Salazar et al. 2018; Wetzel and Whitehead 2020). Our results rely on a null model framework and the use of functional Hill numbers, which are a unifying and flexible approach to diversity measures (Chao et al. 2014). They consider functional relatedness (cosine-based structural similarity between compounds) as well as compound abundance. We chose to exclude abundance in our diversity measure (Q=0), which results in a cosine-weighted structural similarity score.
We found strong evidence that a species' chemical profile evolved rapidly with little phylogenetic signal in chemical similarity (Fig. 3, Fig. S3). These results confirm previous findings that defense strategy has little phylogenetic signal in Inga and other plant lineages (Becerra 2007; Kursar et al. 2009; Endara et al. 2017; Salazar et al. 2018; Volf et al. 2018). We also found evidence for population-level divergence across sites in a species' chemical profile (Fig. 3A). This occurred despite the fact that there is essentially no limitation on the dispersal of Inga species across the Amazon, such that the metacommunity for any site is the entire Amazon basin (Dexter et al. 2017; Endara et al. 2021). Instead, site differences in abiotic and biotic conditions may drive intraspecific population-level differences in chemical profiles, including variation in soil types and precipitation patterns or the potentially complete turnover of herbivore communities (our unpublished data). The fact that we observed divergent chemical profiles between close relatives in parapatry (Fig. 3) is unsurprising given the many differences across sites in abiotic and biotic selection pressures (Thompson 2005). However, the fact that sister species in sympatry (where all individuals are exposed to a similar community of pests and abiotic conditions) displayed much higher niche divergence (Fig. 3) is consistent with natural selection to not share pests and pathogens (Bagchi et al. 2014; Forrister et al. 2019). These results also highlight chemistry as an important niche axis facilitating species' coexistence (Chesson 2000; Endara et al. 2021).
Our modeling framework selected divergent adaptation as the best model to explain how interspecific differences in chemical profiles are evolving (Table S2). This divergent adaptation model shows that ecological interactions among coexisting species shape the evolutionary trajectory of a trait. A pattern of divergent adaptation also requires a divergent selective force, such as one imposed by specialist pests and pathogens (Ehrlich and Raven 1964). In contrast, if a species' chemical profile were evolving in response to an abiotic stressor, such as solar radiation, we would expect chemistry to converge among coexisting species. We posit that defenses, including a species' chemical profile, are among the first traits to diverge during or after the speciation process, especially compared with non-defensive traits such as those used for resource acquisition (Endara et al. 2015).

What is the mode of chemical evolution in Inga?
Evidence increasingly supports the adaptive value of chemical diversity both within and among plant species (Richards et al. 2015; Salazar et al. 2018; Wetzel and Whitehead 2020; Whitehead et al. 2021). But how are novel structures generated, and what is the mode of chemical evolution? In the 'escape and radiate' model for defense evolution, novel structures evolve through the gradual embellishment of core structures into more complex and derived compounds (Berenbaum and Feeny 1981; Berenbaum 1983; Coley et al. 2019). However, the results presented in this study do not support a model of chemical evolution underpinned by stepwise gradual embellishments. Instead, we found that each Inga species maximizes phytochemical diversity and produces structurally unrelated compounds (Fig. 2); chemical similarity decreases rapidly over short phylogenetic distances (Fig. 3); and chemical profiles are evolving under divergent adaptation (Table S2). This high divergence between closely related species is supported by the fact that most compounds are highly labile (Fig. 4) and many compound classes show low phylogenetic signal (Fig. S2). Taken together, these patterns point towards regulation of gene expression as the more likely mechanism facilitating the rapid evolution of species' chemical profiles and generating unique combinations of compounds that are divergent from neighbors within a community and from close relatives.

Regulatory changes facilitate divergence:
We propose that changes in gene regulation are a parsimonious explanation for the pattern of phylogenetically dispersed expression of individual compounds. Although compounds spread throughout the phylogeny could have evolved independently by convergent evolution, the frequency with which they are apparently gained and lost is more consistent with the up- and downregulation of key enzymes via transcriptional regulation (Moore et al. 2014; Courtois et al. 2016).
The role of regulation also applies at the compound class level, where we find low phylogenetic signal and moderate trade-offs across biosynthetic pathways (Fig. 5, Fig. S4).
Consistent with our findings that Inga species invest in phytochemical diversity (Fig. 2), many species of Inga produce compounds from multiple biosynthetically distinct classes (Fig. S4). The ability of some species to produce compounds from up to five different compound classes, coupled with the fact that one class did not completely exclude the production of other classes, indicates that these trade-offs may not be driven by hard physiological constraints. For example, saponin production was negatively correlated with investment in flavan-3-ols, yet there were nine species that invested in both pathways simultaneously. The lack of strong physiological constraints likely facilitates the evolution of novel chemical profiles and divergence between closely related species.
Changes in gene expression would allow an evolutionary fluidity not possible via changes to genes coding for biosynthetic enzymes (structural genes). Regulatory changes of existing biosynthetic genes permit distantly related species to express the same compound and closely related species to express divergent compounds (Courtois et al. 2016). For example, one sister species could make saponins and its close relative could make phenolics, presenting very different detoxification challenges for pests and pathogens. Thus, the evolutionary fluidity of defensive chemistry may be a major factor allowing long-lived trees to persist in the arms race with insect herbivores and plant pathogens.
Regulation as a model for chemical evolution would imply that species maintain a complete set of biosynthetic enzymes within their genome that are up- or down-regulated in different species, and that "unused" genes would have to remain functional over evolutionary timescales. Preliminary results from two Inga genomes indicate that the core biosynthetic genes involved in flavonoid and saponin biosynthesis are in fact present in all species even when they do not produce these compound classes (pers. comm. C.A. Kidner, 2021). The maintenance of these supposedly unused enzymes may be required by deep homology and pleiotropy for core biosynthetic enzymes (Moore et al. 2014; Moghe and Last 2015). We offer several possibilities for how viable genes are maintained. First, many compounds, including pathway intermediates, do not accumulate to physiologically significant levels. However, because they are essential for the synthesis of downstream compounds, the enzymes responsible for them must be transcribed and maintained. This is the case for the phenylpropanoid compounds that link the shikimic acid pathway with the flavonoid pathway (Fig. S5). Second, it is possible that many compounds that are absent in leaves could be present in other tissues (van Dam et al. 2009; Schneider et al. 2021).

"Lego-chemistry" as a mechanism for novel structures:
While regulatory changes may explain novel combinations of metabolites, regulation alone cannot generate novel structures. The classic 'escape and radiate' model proposes gradual embellishments to a compound's core structure. Instead, in Inga, we more commonly see the addition of larger structures, such as phenolic acids and carbohydrates, which are precursors and intermediates in secondary metabolism pathways (Fig. 5, Fig. S4). The addition of these side groups in a combinatorial manner, referred to as "Lego-chemistry," has been shown to generate an impressively diverse array of larger structures from a small group of building blocks (Menzella et al. 2005; Sherman 2005).
Lego-chemistry could be particularly important for the generation of novel structures in the phenolic biosynthetic pathway, which produces the most diverse class of compounds in Inga (Fig. S5). Inga produces several subclasses of flavonoids that are further modified by the addition of divergent combinations of R-groups to key linkage sites on the basic scaffold molecule (flavonoid aglycones). For example, (epi)catechin (Fig. S5, comp 27), one of the most common compounds in Inga, is modified into at least four divergent structures (illustrated in Fig. S6), which upon polymerization lead to the generation of at least a dozen unique polymers (Fig. S5, comp 34).
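The combinatorial logic of Lego-chemistry can be sketched with a toy enumeration. The block below is only an illustration under stated assumptions: the scaffold and side-group labels are hypothetical placeholders rather than annotated Inga compounds, each scaffold is assumed to have two distinguishable linkage sites, and each site is assumed to carry either no group or one of three side groups.

```python
from itertools import product

# Hypothetical building blocks, chosen only to illustrate the counting.
SCAFFOLDS = ["scaffold_A", "scaffold_B"]
SIDE_GROUPS = ["none", "phenolic_acid_1", "phenolic_acid_2", "carbohydrate_1"]
N_SITES = 2  # assumed number of distinguishable linkage sites per scaffold

def enumerate_structures(scaffolds, side_groups, n_sites):
    """List every scaffold paired with every assignment of groups to sites."""
    structures = []
    for scaffold in scaffolds:
        # product(..., repeat=n_sites) yields one tuple per site assignment.
        for groups in product(side_groups, repeat=n_sites):
            structures.append((scaffold, groups))
    return structures

structures = enumerate_structures(SCAFFOLDS, SIDE_GROUPS, N_SITES)
# Count grows as len(scaffolds) * len(side_groups) ** n_sites.
n_structures = len(structures)
```

Under these assumptions, six building blocks (two scaffolds plus four site states) yield 2 × 4² = 32 distinct monomers before any polymerization, and the count grows exponentially with the number of linkage sites.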
The idea that combinatorial Lego-chemistry may generate structural diversity in plants is in line with the growing body of literature on the underlying genetic and biochemical mechanisms for the evolution of plant secondary metabolism (Schwab 2003; Gershenzon et al. 2012; Kreis and Munkert 2019; Monson et al. 2022). There is a wide consensus that secondary metabolites originate from a small group of precursor compounds derived from primary metabolism, with gene duplication and subsequent neofunctionalization driving novel metabolites (Moore et al. 2014; Weng 2014). Finally, because there are many more secondary metabolites than enzymes that produce them, it has been argued that a core set of enzymes with low substrate specificity is capable of producing a broad set of chemical structures (Schwab 2003; Gershenzon et al. 2012).
This concept has proven to be important for generating novel structures via Lego-chemistry (Schwab 2003; Gershenzon et al. 2012; Kreis and Munkert 2019).
Taken together, we hypothesize that the mode of chemical evolution for Inga is the combination of Lego-chemistry to generate novel structures along with changes in regulation of gene expression to generate unique chemical profiles in each species. We put forth this model of chemical evolution to integrate the patterns we observed in our study of Inga metabolomes with their underlying genetic, biochemical and regulatory mechanisms. Future studies using multiomic approaches (Monson et al. 2022) that integrate genomics, transcriptomics and metabolomics are needed to further test and refine this working model.

Conclusions
In this paper, we integrate untargeted metabolomics and phylogenetic comparative methods to characterize the chemical profile of nearly 100 species of tropical trees from the genus Inga. We set out to address the fundamental questions of how phytochemical diversity evolves and what the mode of chemical evolution is. We show that each species maximizes phytochemical diversity by investing in structurally unrelated compounds. We also show that chemistry evolves rapidly, under a model of divergent adaptation. We find that sympatric sister species are more divergent than parapatric sister species, implying an advantage to being distinct from other species in a community. Finally, we integrate these patterns into a hypothesized model of chemical evolution in which novel structures are generated through "Lego-chemistry" and divergent profiles arise through transcriptional regulation. Understanding the evolution of plant chemistry is of fundamental importance because chemistry underpins a plant's ability to survive stressful abiotic conditions, as well as its ecological interactions with pests, pathogens, and pollinators.

Figures and Tables
Marvin was used for drawing, displaying and characterizing chemical structures, substructures and reactions (Marvin 20.20.0, ChemAxon, https://www.chemaxon.com).

Figure 1: Compound-based molecular network. (A) Subset of the molecular network (see Fig. S2 for the full network) containing all compounds observed across 98 study species. Nodes represent individual compounds identified in the metabolomics pipeline, and connections between compounds (edges) are based on the MS/MS cosine similarity score from GNPS (https://gnps.ucsd.edu). (B) Percent of compounds that were annotated using different methods: in silico fragmentation, machine learning, MS/MS library exact matches and adducts, and comparison to authentic standards on our UPLC-MS system based on mass-to-charge ratio (m/z) and retention time (RT). (C) Percent of compounds with annotations represented by each compound class. For B and C, the total number of compounds is reported at the top of the bars.

Fig. S1 A) Defense investment traits mapped onto the Inga phylogeny. Number of unique compounds per species, percent of leaf dry weight invested in secondary metabolism per species, and the phytochemical diversity (measured as functional Hill numbers, q = 2) of each species' profile are represented by points. Horizontal bars indicate one standard deviation. Dotted red lines represent mean trait values across all species and the blue line represents the mean value for

Fig. S3 Correlation between chemical similarity and phylogenetic distance (My) for all interspecific comparisons. The solid red line represents the mean chemical similarity score observed in the null model, which simulates the expected chemical similarity between two randomly assembled chemical profiles. The dashed red lines represent 2 standard deviations above and below the null mean.

Fig. S6 Illustration of Lego-chemistry concept based on annotation of monomeric and polymeric

Table S1
Site and Sampling information for all 98 study species

Table S2
Maximum-likelihood estimates for different evolutionary models of trait evolution
