The genome of Corydalis reveals the evolution of benzylisoquinoline alkaloid biosynthesis in Ranunculales

Species belonging to the order Ranunculales have attracted much attention because of their phylogenetic position as a sister group to all other eudicot lineages and their ability to produce unique yet diverse benzylisoquinoline alkaloids (BIAs). The Papaveraceae family in Ranunculales is often used as a model system for studying BIA biosynthesis. Here, we report the chromosome-level genome assembly of Corydalis tomentella, a species of Fumarioideae—one of the two subfamilies of Papaveraceae. Based on the comparisons of sequenced Ranunculalean species, we present clear evidence of a shared whole-genome duplication (WGD) event that has occurred before the divergence of Ranunculales but after its divergence from other eudicot lineages. The C. tomentella genome enabled us to integrate isotopic labelling and comparative genomics to reconstruct the BIA biosynthetic pathway for both sanguinarine biosynthesis shared by papaveraceous species and the cavidine biosynthesis specific to Corydalis. Also, our comparative analysis revealed that gene duplications, especially tandem gene duplications, underlie the diversification of BIA biosynthetic pathways in Ranunculales. In particular, tandemly duplicated berberine bridge enzyme-like genes appear to be involved in cavidine biosynthesis. In conclusion, our study of the C. tomentella genome provides important insights into the occurrence of WGDs during the early evolution of eudicots as well as into the evolution of BIA biosynthesis in Ranunculales.


INTRODUCTION
Within eudicots, the order Ranunculales forms a sister group to all other extant eudicot lineages (Group, 2016), holding an important phylogenetic position for understanding the evolution and diversification of eudicots.In addition, Ranunculales is an order of significant pharmaceutical importance because of its unique biosynthesis of benzylisoquinoline alkaloid (BIA) compounds.To date, more than 2,500 BIAs have been identified in species of the Ranunculaceae, Papaveraceae, Berberidaceae, and Menispermaceae families (Liscombe et al., 2005;Hao, 2018).Many well-known BIAs have medicinal value, such as the antitumor activity of noscapine from Papaver somniferum; analgesic activity of morphine and codeine from P. somniferum; and antibacterial activity of sanguinarine, berberine, and palmatine from most Papaveraceae species (Shamma, 2012).
To study BIA biosynthesis, opium poppy (P.somniferum) has long served as a model system.The complete biosynthetic pathways of some well-known BIAs, including morphine, noscapine, and sanguinarine, have been decoded using the opium poppy system (Hagel and Facchini, 2010;Winzer et al., 2012;Liu et al., 2017).In addition, the P. somniferum genome has revealed the critical roles of gene duplications and gene clusters in the evolution of BIA genes and alkaloid production (Guo et al., 2018;Li et al., 2020a).However, the focus on opium poppy can only offer a limited view of the evolution of BIA biosynthetic pathways in Papaveraceae.In fact, the poppy family Papaveraceae comprises two subfamilies, namely Papaveroideae and Fumarioideae, which diverged from each other around 87.4 million years ago (Valtuena et al., 2012).Although noscapine, morphine, and codeine production is unique to certain species of the Papaver genus of Papaveroideae, species in both Papaveroideae and Fumarioideae can biosynthesize sanguinarine (Li et al., 2020b), and its pathway is still unclear.In addition, Fumarioideae species have evolved certain independent BIA biosynthetic pathways, which differ from the ones known in Papaveroideae (Xu et al., 2021), offering a system to understand the diversification of BIA biosynthetic pathways along with the divergence of lineages.Furthermore, the sequenced Ranunculalean genomes of P. somniferum (Guo et al., 2018), Eschscholzia californica (Hori et al., 2018), and Macleaya cordata (Liu et al., 2017) are all from the subfamily Papaveroideae.Hence, genome sequencing of species from the subfamily Fumarioideae of Papaveraceae is imperative to study the evolution of BIA biosynthesis in Ranunculales and Papaveraceae.
The insights gained by analyzing the diversification of BIA biosynthesis in Papaveraceae may not only shed light on the evolution of plant secondary metabolic pathways but also assist enzymatic and metabolic engineering for industrial BIA biosynthesis.Specifically, although the microbial production of several BIAs has been successfully implemented (Minami et al., 2008;Galanie et al., 2015;Li et al., 2018;Courdavault et al., 2020), the current yield cannot meet the requirements of industrial production; therefore, potential BIA biosynthetic genes in the species of Papaveraceae, specifically Fumarioideae, may be important to assist enzymatic and metabolic engineering for industrial BIA biosynthesis.
Here, we report the genome of Corydalis tomentella, which is the first genome of the subfamily Fumarioideae of the poppy family.The subfamily Fumarioideae of Papaveraceae comprises over 570 species in 20 genera (Perez-Gutierrez et al., 2012).Corydalis is the largest genus within Fumarioideae, with 465 species native to the Northern Hemisphere and South Africa.Approximately 80% Corydalis species are distributed in China, mainly in the Hengduan Mountains and the Qinghai-Tibet Plateau.Species of the genus occur at various altitudes and in diverse habitats, such as forest margins, wet meadows, wastelands, rock crevices, and even dry and rocky limestone cliffs.Corydalis has undergone reticulate evolution and intensive differentiation (Jiang et al., 2018;Ren et al., 2019), resulting in remarkable variability in morphology, even within the same species, which hinders species identification.The Corydalis genome will provide critical insights for the use of medicinal resources and study of herbgenomics (Xin et al., 2019;Xu et al., 2020).
Similar to other species in the poppy family, Corydalis produces sanguinarine, in addition to a class of antiinflammatory BIA compounds called cavidines (e.g., cavidine, apocavidine, dehydroapocavidine, and dehydrocavidine) (Bhakuni and Chaturvedi, 1983).As rich sources of these unique BIA compounds, some Corydalis species are widely used in traditional Chinese medicine because of their antibacterial, antiviral, and anticancer activities (Zhang et al., 2016;Liu et al., 2019;Tian et al., 2020).For instance, the tuber of Corydalis yanhusuo and the whole plant of Corydalis bungeana have been noted in the Chinese Pharmacopoeia for their medicinal usage in invigorating blood circulation and analgesic effects (Committee;, 2015).Furthermore, the dry herbs of C. tomentella and C. saxicola are used in traditional Chinese medicine to alleviate fever and hepatitis.
The sequenced C. tomentella genome is crucial for comprehensively investigating the biosynthetic pathways of sanguinarine and other BIAs through comparisons between the two subfamilies of Papaveraceae as well as between the Papaveraceae and other Ranunculalean species.In addition, the C. tomentella genome allows us to examine the origin of cavidine biosynthetic pathway in the Corydalis lineage.Here, using isotope tracking, metabolite profiling, and gene expression analyses, we dissected the biosynthetic pathways of active BIAs in C. tomentella.Furthermore, we identified the gene clusters and tandem duplication eventsinvolved in the divergent evolution of BIA biosynthesis through comparative genomics.

RESULTS
Genome assembly and annotation Genome of the medicinal plant C. tomentella exhibits low heterozygosity (about 0.3%).The genome size is 258.56Mb based on the frequency distribution of 21 k-mer (Figure S1) based on 33.66 Gb Illumina sequencing reads (Table S1-2).We generated 27.64 Gb (~107 × coverage) raw data using the third-generation Sequel sequencing platform (Table S3).The N50 length of filtered subreads (4,077,968) was 9.63 Kb.The error-corrected raw reads were pre-assembled into seed sequence using hierarchical genome assembly, and the seed reads from different hierarchical clusters with an average length of 14.37 Kb were further assembled and polished using FALCON.The polished assembly contained 1,321 contigs with the contig N50 length of 2.36 Mb.After removing heterozygous contigs using Redundans, the final assembly comprised 1,022 contigs; the genome size was 248.9 Mb, the N50 length was 2.52 Mb, covering 96.26% of the estimated nuclear genome size; the longest contig was 9.83Mb; the GC content was 36.79%.Then, a library of Chromosome conformation capture techniques (Hi-C) was constructed, and 39.67 Gb sequencing reads were produced for scaffolding.A total of 953 contigs, covering 248.6 Mb (99.9%) of the assembled genome, were anchored to eight pseudochromosomes (2n=16) (Figure 1A, Figure S2, Table S4-5).Moreover, 36 Gb RNA-Seq data from roots, stems, leaves, and flowers of C. tomentella were aligned to the assembled genome at an average mapping rate of 95.34% (Table S2).De novo assembled transcripts from the RNA-Seq reads were aligned to the C. tomentella genome, and 82.65% reads aligned with at least 90% coverage and 90% identity.In addition, 1,334 (97.67%) of the 1,375 embryophyta singlecopy orthologs from BUSCO were identified 'as complete'in the C. tomentella genome (Simao et al., 2015), suggesting that the genome assembly was of high quality (Table S4).
Approximately 44.24% (110,107,533 bp) of the C. tomentella genome was annotated as transposable elements (TEs) (see Materials and Methods and Table S6), similar to that in the Macleaya cordata genome (43.5%) also from Papaveraceae.Of the TEs, 19.37% were long terminal repeat (LTR) retrotransposons.A total of 81,630 LTR elements were identified, of which 22,370 (6.05%) belonged to the Gypsy superfamily and 13,905 (3.99%) to the Copia superfamily.A total of 136,330 simple sequence repeats (SSRs) were annotated (see Materials and Methods), providing valuable molecular markers for future genetic diversity studies of C. tomentella (Table S7).We confidently predicted 37,808 protein-coding genes by integrating ab initio gene predictions, homologous protein searches, and de novo assembled transcripts from the RNA-Seq reads (see Materials and Methods).Complete orthologs were identified for 94.8% of the embryophyta BUSCOs, indicating that the predicted protein-coding genes were largely complete (Table S4).

Phylogenomic analysis and dating
We used OrthoFinder to delineate orthologous groups of proteomes from angiosperms and obtained 21,299 orthologous groups covering 497,631 genes.Here, we selected 10 different gene sets, with 78 single-copy gene families, and 120,259,350,424,540,693,749,850, and 1,714 low-copy gene families, to infer a high-confidence species phylogeny using Amborella trichopoda as an outgroup (see Materials and Methods).The final phylogenetic relationships of the candidate species using the different gene sets were consistent with the Angiosperm Phylogeny Group IV botanical classification system.The phylogenetic analysis, as expected, revealed that C. tomentella from Fumarioideae is sister to all sequenced Papaveroideae species, including M. cordata, P. somniferum, and E. californica (Figure 1B).Molecular dating using nucleotide sequences of the 78 single-copy genes and four fossil age calibrations indicated that the two subfamilies of Papaveraceae, that is Fumarioideae and Papaveroideae, diverged approximately 96.00 million years ago (MYA), with a 95% confidence interval (CI) of 82.65 to 109.09MYA (see Materials and Methods).The split between Papaveraceae and Ranunculaceae was dated to approximately 114.65 MYA, with a 95% CI of 106.15 to 120.28 MYA.

Whole-genome duplications (WGDs) in the Ranunculales
Intragenomic colinearity analysis revealed remnants of one WGD event in the C. tomentella genome (Figure S3).Intergenomic co-linearity analyses between C. tomentella and Vitis vinifera, Aquilegia coerulea, and Nelumbo nucifera supported the identified WGD event.For instance, two paralogous segments in the C. tomentella genome correspond to three, two, and two orthologous regions in the V. vinifera, A. coerulea, and N. nucifera genomes, respectively (Figure S4).Moreover, the identified WGD event was supported by chromosome printing of the C. tomentella genome with the ancestral eudicot karyotype (AEK) pre-γ chromosomes (Murat et al., 2017) (Figure 2A).In addition, distributions of synonymous substitutions per synonymous site (KS) for all paralogous genes and for paralogous genes in the collinear regions of C. tomentella showed a clear peak at KS ≈ 1.04, indicative of a WGD event (Figure 2B and Figure S5).Similarly, previously sequenced genomes of Ranunculales species, including A. coerulea, M. cordata, and E. californica, showed a signature peak for a WGD event in their genomes, albeit with different KS peak values (Figure S5), while the P. somniferum genome showed two KS peaks, including a recent major peak (KS ≈ 0.1) and a more ancient minor peak (KS ≈ 1.5) (Guo et al., 2018).
The various KS peak values for the WGD signatures of different published Ranunculalean genomes sparked a debate on whether Ranunculalean species share an ancient Ranunculalean specific WGD (Guo et al., 2018), or whether the observed WGD is shared by all eudicots and contributes to the hexaploidization event (referred to as gamma) that all core eudicots share (Aköz and Nordberg, 2019).Analysis of the A. coerulea genome suggested a model of hybridization for the origin of core eudicots, including a pan-eudicot WGD (tetraploidization) followed by a hybridization forming the hexaploid common ancestor of the core eudicots (Aköz and Nordberg, 2019).However, the proposed scenario contradicts the results of the analysis of the Nelumbo genome and other Ranunculalean genomes, in which the identified WGDs were inferred to have occurred after the divergence of the core eudicots and Nelumbo (Ming et al. 2013;Shi and Chen, 2020) or even after the divergence of Ranunculales (Guo et al., 2018).Interestingly, comparing with the peak in the KS age distribution of one-to-one orthologs between C. tomentella and V. vitis revealed a smaller KS value of peak identified in the C. tomentella genome (i.e., younger age) but a larger value of peak identified in the A. coerulea genome (i.e., older age, Figure 2B), seemingly reflecting different WGD scenarios.
To resolve the various KS values for the WGD peaks in different Ranunculalean species, we argue that these differences are due largely to the variability in substitution rates among the Ranunculalean lineages.Indeed, comparison of one-to-one orthologous KS distributions between V. vinifera and sequenced Ranunculalean species (C.tomentella, M. cordata, P. somniferum, E. californica, and A. coerulea) revealed some differences in orthologous KS peaks, suggesting that the studied Ranunculaclean species have evolved at different substitution rates (Figure S6).Indeed, because all orthologous KS peaks reflect the divergence between Ranunculales and core eudicots (represented by Vitis), these lineages should have similar KS values if the substitution rates are similar.Therefore, crude comparisons of paralogous and orthologous KS distributions, regardless of the differences in substation rates, do not necessarily reflect the exact timing of WGD events.
To more carefully date the identified WGD events in the Ranunculalean genomes, considering the various substitution rates observed for different Ranunculales species, we used the 78 single-copy gene families to infer branch lengths in KS units on the species phylogeny (see Materials and Methods).The investigated Ranunculalean species presented different branch lengths following divergence, which further supports the various substitution rates of these lineages (Figure 2C).By mapping the different KS values for WGD peaks identified in the Ranunculalean and Protealean species on the phylogeny (see Materials and Methods), we inferred that the ancient WGD in P. somniferum and the WGDs in C. tomentella, M. cordata, and A. coerulea occurred on the stem branch of Ranunculales around the same time, suggesting a shared WGD event among all Ranunculalean species.Notably, based on our results, the California poppy (E.californica) genome has undergone two WGD events-an ancient event shared by all Ranunculalean species and a recent lineage-specific WGD event-similar to the poppy (P.somniferum) genome (Figure 2C).However, because of the more recent WGDs and the higher substitution rate of the sequenced Ranunculalean species (Figure S6), the ancient Ranunculalean WGD event only has a (very) vague remnant in the paralogous KS distributions of E. californica and P. somniferum.
In addition, both the WGD in Nelumbo and the hexaploidization in Vitis are assumed to have occurred after their divergence from Ranunculales(Figure 2C).Hence, our results support a scenario describing independent paleopolyploidizations after the divergence of Ranunculales, Proteales, and core eudicots.Nevertheless, the branch leading to the divergence between Proteales and the core eudicots is extremely short, with a KS of 0.0034, while the WGD events in Vitis and Ranunculales are close to the early speciation events of eudicots (Figure 2C) Therefore, we cannot completely rule out other scenarios involving more complicated hybridizations (Kellogg, 2016) or lineage-specific rediploidization (Robertson et al., 2017).

BIA accumulation in C. tomentella
BIAs are enriched in different C. tomentella tissues.The contents of different BIAs in roots, stems, leaves, and flowers were spectrophotometrically quantified.Obvious chromatographic peaks were observed at retention times of 25.74, 36.33, and 40.01 min, which corresponded to dehydrocheilanthifoline, coptisine, and dehydrocavidine, respectively (Figure S7 -9).The key intermediate compound, dehydrocheilanthifoline, shows relatively higher accumulation in flowers than in other organs.Specific accumulation of coptisine in flowers was detected, suggesting the involvement of flower-specific oxidoreductases in coptisine biosynthesis.The biomarker compound dehydrocavidine was mainly accumulated in the roots, stems, and flowers, exhibiting the highest content among all tested alkaloids of C. tomentella (Figure S7).
Cavidines, including dehydroapocavidine and dehydrocavidine, are exclusively present in Corydalis species.We employed an integrative strategy to investigate cavidine biosynthesis in C. tomentella.First, to identify metabolites involved in cavidine biosynthesis in C. tomentella, we employed the isotopic tracer method by feeding the 13 C6 (benzene-ring)-labeled tyrosine into the culture medium of C. tomentella.Using liquid chromatography-mass spectrometry (LC-MS), we identified 21 metabolites involved in BIA biosynthesis in C. tomentella .According to a known BIA biosynthetic process (Figure 3), we missed two metabolites, namely 4-hydroxyphenylpyruvate and tetrahydropalmatine.Given the detection of their derivates, we assume that they are present in C. tomentella but were probably missed due to their extremely low accumulation.
Next, to identify the BIA genes involved in cavidine biosynthesis, we used BIA genes in the poppy genome to identify the ones in the upstream of BIA biosynthesis in C. tomentella; most of the BIA biosynthetic genes, except those involved in Corydalis-specific cavidine biosynthesis, have been identified in the P. somniferum genome (see the brown and green flows in Figure 3).We then compared the BIA genes from A. coerulea, C. tomentella, M. cordata, and P. somniferum in Ranuncluales and N. nucifera in Proteales (Table S8), to understand the evolution of BIAs in Ranunculales and aid the identification of cavidine-specific BIA genes in C. tomentella.By integrating chemical compound structural analyses and comparative genomics, we propose a potential biosynthetic pathway of the cavidines in C. tomentella (see the blue flow Figure 3).

Comparative genomics of BIA genes upstream of cavidine biosynthesis
Starting from L-tyrosine, (S)-Norcoclaurine (NOR) is a central intermediate metabolite of BIA biosynthesis, which is present in all analyzed Ranunculalean species as well as in N. nucifera (Figure 4A).NOR is synthesized from two L-tyrosine products, dopamine and 4-hydroxyphenylacetadehyde, by the action of norcoclaurine synthase (NCS); the genes encoding NCS are conserved in all the Ranunculalean species and N. nucifera.Ranunculalean species harbor more NCS genes than N. nucifera (four in N. nucifera versus nine in C. tomentella to 58 in P. somniferum; Table S8).Interestingly, a gene cluster including seven NCS genes, which has originated through tandem duplication, was identified in the C. tomentella genome, and collinearity analysis revealed the corresponding syntenic blocks, including five NCS genes in M. cordata, one NCS gene in A. coerulea, and one NCS gene in N. nucifera, suggesting tandem duplications played a key role in the expansion of NCS genes following the divergence of Ranunculales and Proteales (Figure S12, S13).
Cheilanthifoline (CHE) biosynthesis involves the berberine bridge enzyme (BBE) for the conversion of RET to into (S)-scoulerine, which is further catalyzed by CYP719 members, such as CYP719A14 and CYP719A2, to (S)-cheilanthifoline and (S)-stylopine (Figure 3).Both (S)-cheilanthifoline and (S)-stylopine are the common intermediates of sanguinarine, and cheilanthifoline is present in all studied Ranunculalean species (Figure 4A).Four CYP719 subfamily genes were identified in the C. tomentella genome (Table S8).Furthermore, we detected collinear regions containing CYP719 genes encoding cheilanthifoline synthase (CYP719A14 or CHS) and stylopine synthase (CYP719A2 or STS) among the C. tomentella, M. cordata, and P. somniferum genomes, suggesting that the biosynthesis of the two common intermediates is conserved in Papaveraceae (Figure S17).Interestingly, the CHS and STS genes are adjacent in Papaveraceae genomes, indicating that they were formed through a tandem duplication event.These results, together with phylogenetic findings, indicated that tandem duplication must have occurred before the divergence of Ranunculales.Further, although A. coerulea possesses seven paralogs of CHS genes, it does not harbor the orthologs of STS related to sanguinarine biosynthesis.Similarly, N. nucifera lacks cheilanthifoline and stylopine because of the absence of CYP719A members (Table S8).
Sanguinarine is widely distributed in all the Papaveraceae species (Figrue 4A), and two CYP450 members from the CYP82N subfamily, encoding (S)-N-methylstylopine hydroxylase (MSH) and (S)-protopine-6hydroxylase (P6H), form a critical step catalyzing the production of sanguinarine, which have been reported in both P. somniferum and M. cordata (Liu et al., 2017).A gene cluster including one SDR gene, one BBElike (BBEL) gene, three OMT genes, 12 CYP450 genes, and one 2OGD gene was identified in the C. tomentella genome; among these, the CYP450 cluster contained the MSH and NMCH genes (Figure S12).Genome synteny analysis revealed that the gene cluster in C. tomentella well aligned with the gene cluster in M. cordata (Figure S18), suggesting that the upstream biosynthetic genes of sanguinarine are conserved in Papaveraceae.In addition, MSH and P6H were created by gene duplications before the divergence of Papaveraceae (Figure S17); therefore, N. nucifera and A. coerulea lack these genes (Table S8), consistent with the specific emergence of sanguinarine biosynthesis in Papaveraceae.
Finally, noscapine biosynthesis is unique to Papaver (Figure 4A).Specifically, in P. somniferum, CYP82Y1, CYP82X2, and CYP82X1 catalyze the formation of hydroxy products of (S)-N-methylcanadine.The opium poppy genome harbors 11 CYP82X/Y genes (Table S8), which are sister to the MSH (CYP82N) genes.In the phylogenetic tree, a duplication event before the divergence of Papaveraceae formed two clades of these genes (Figure S19).In the opium poppy genome, PsCYP82N4 (PsMSH), PsCYP82X1, PsCYP82X2, PsCYP82Y1, and PsCYP82Y2 are located in a 569 Kb gene cluster (Figure S19).Collinearity analysis of this CYP82 gene cluster revealed that the two collinearity regions in C. tomentella and M. cordata genomes only contain the orthologous of CYP82N4 and have lost the orthologs of PsCYP82X1, PsCYP82X2, PsCYP82Y1, and PsCYP82Y2 genes related to noscopine biosynthesis.In addition, these syntenic blocks in N. nucifera and A. coerulea genomes are completely lost (Table S8).These results indicate that only one of the duplicates was retained in the Papaver lineage, followed by a series of lineage-specific gene duplications (Figure S19).

Origin of cavidine biosynthesis in Corydalis
BBEL enzymes, a subgroup of the superfamily of FAD-linked oxidases, are conserved in bacteria, fungi, and plants (Daniel et al., 2017).The BBE and BBEL genes catalyze two-and four-electron oxidation steps in the BIA biosynthesis, such as the conversion of the central intermediate RET to (S)-scoulerine in all Ranunculaleans, and the conversion of (S)-tetrahydrocolumbamin to columbamine in Papaver (Figure 3).Therefore, we speculate that in cavidine biosynthesis, the four electron oxidation steps from (S)cheilanthifoline to dehydrocheilanthifoline, from apocavidine to dehydroapocavidine, from cavidine to dehydrocavidine, and from (S)-stylopine to coptisine are catalyzed by BBEL enzymes (Figure 3).Respectively 16, 17, 34, 42, and 43 BBEL genes were identified in the N. nucifera, A. coerulea, C. tomentella, M. cordata, and P. somniferum genomes (Table S8), suggesting the expansion of BBEL genes in Papaveraceae species.
Additionally, we found the largest cluster of 25 Corydalis BBEL genes originating from tandem duplications among all the compared genomes (Figure S12).Collinearity analysis of the orthologous regions of this cluster among the N. nucifera, A. coerulea, M. cordata, and P. somniferum genomes revealed that the N. nucifera and A. coerulea genomes lack BBEL genes, while the M. cordata and P. somniferum genomes harbor only four BBEL genes in collinear regions (Figure 4A).The expression profiles of the BBEL genes in the cluster were also in consistent with the accumulation of coptisine and cavidines in C. tomentella, with the specific expression of CtBBEL22 in the flower and of CtBBEL31 in all the investigated tissues (Figure 4C).In the phylogenetic tree, CtBBEL8 and CtBBEL31 clustered on the Corydalis-specific BBEL branch (Figure 4B), and the expression of these two genes in all tested tissues (FPKM per sample > 5) was relatively higher than that of other genes from the tandem gene cluster of Corydalis after its divergence with P. somniferum.
To further confirm the catalytic activities of BBEL enzymes in cavidine and coptisine biosynthesis, we successfully expressed three BBEL proteins (CtBBEL8, CtBBEL22, and CtBBEL31) from C. tomentella in Sf9 insect cells (see Materials and Methods).In vivo assays using tetrahydrocolumbamin, tetrahydropalmatine, cheilanthifoline, cavidine, and stylopine as substrates revealed that the candidate BBEL genes catalyzed the four-electron oxidation of all the tested BIAs (Figure 4D, Figure S20 -22).In addition, ultra-performance liquid chromatography (UPLC) and LC-MS analysis revealed that CtBBEL22 likely catalyzes the conversion of stylopine to coptisine, while CtBBEL8 and CtBBEL31 likely catalyze the conversion of cheilanthifoline to dehydrocheilanthifoline and of cavidine to dehydrocavidine (Figure 4E, Figure S21 -22).The distinct substrate specificities of these BBEL genes might depend on the methylenedioxy bridge of BIA substrates.As a negative control, we tested whether BBEL genes from other species, but closely related to the Corydalis BBEL genes in the cluster genome, could catalyze the oxidation of cheilanthifoline and cavidine.Briefly, we expressed BwSTOX from Berberis wilsoniae in Sf9 insect cells.Although BwSTOX can transform tetrahydropalmatine into palmatine (Gesell et al., 2011), it did not act on cheilanthifoline and cavidine in our analysis (Figure S21), suggesting that the specific biosynthesis of dehydrocheilanthifoline and dehydrocavidine is closely correlated with the unique tandem duplication of BBEL genes in the Corydalis genus after its divergence from P. somniferum and M. cordata.
Stylopine and coptisine are conserved in most Papaveraceae species, and the CtBBEL22 branch that originated before the divergence of C. tomentella, M. cordata and P. somniferum may be related to coptisine biosynthesis.Together, collinearity and phylogenetic analyses revealed that the 25 BBEL genes in C. tomentella were first formed by tandem duplications before the divergence of Papaveraceae, followed by additional tandem duplications in C. tomentella after its divergence from P. somniferum and M. cordata (Figure 4B).BBEL21, BBEL22, BBEL23, BBEL24, and BBEL25 in C. tomentella are the orthologs of conserved PS0410460 and PS0515260 in P. somniferum as well as of BVC80_1779g11 and BVC80_1779g18 in M. cordata, which are more likely to be involved in coptisine biosynthesis from stylopine.In addition, BBEL8 and BBEL31 have exclusively evolved in C. tomentella through tandem duplication.Hence, we conclude that the dramatic expansion of BBEL genes in the C. tomentella genome may be related to the appearance of cavidines and coptisine in Corydalis, which further obtained the function of catalyzing the four-electron oxidation steps in the biosynthesis of specific BIAs.

DISCUSSION
The four lineages of eudicots, Ranunculales, Proteales (Sabiaceae), Trochodendrales, as well as Buxales, all share deeper common ancestors with the core eudicots than do the species within the core eudicots (Group, 2016).The C. tomentella genome sequence reported herein provides a valuable genomic resource for Papaveraceae in Ranunculales.Many Papaveraceae species, such as P. somniferum, M. cordata, and C. tomentella, produce common or species-specific BIAs, which have important medical and economic value.The genome of C. tomentella, as the first sequenced species of Fumarioideae, offers a crucial reference for studying the early evolution of eudicots and elucidating the diversification of BIA biosynthesis in Ranunculales.In particular, we uncovered once common WGD event that was likely shared by all species of Ranunculaleans after its divergence from other eudicot lineages.Furthermore, the analyzed Ranunculalean species present variable synonymous substitution rates, leading to various KS values of peaks representing the shared WGD event.
Genome mining can effectively promote discovery of natural product biosynthetic pathways and facilitate their characterization.Our analysis of the BIA biosynthetic pathway revealed genes involved in the biosynthesis of common or lineage-specific BIAs in Ranunculales and highlighted the importance of gene duplications, especially tandem gene duplications, and loss in the diversification of plant secondary metabolic pathways.Cavidine biosynthesis in Corydalis is likely shaped by the unique expansion of BBEL genes via tandem gene duplication.Similarly, the sanguinarine and noscapine biosynthetic genes have probably originated through a duplication event that occurred before the divergence of Papaveraceae.Interestingly, only one duplicate was retained and the other was lost in most Papaveraceae lineages, except the genus Papaver.In most Papaveraceae species, the retained duplicate evolved and gave rise to the CYP82N subfamily, which contains MSH and P6H that are crucial for sanguinarine biosynthesis.Meanwhile, in Papaver, the retained duplicate underwent further tandem duplication and gave rise to the CYP82X/Y subfamily, which contains key enzymes involved in the Papaver-specific biosynthesis of noscapine.The lack of 4'OMT, CHS (CYP719A14), STS (CYP719A2), and CYP82N/X/Y members in Proteobacteria mainly contribute to the complete loss of BIA compounds.In contrast, tandem duplications and loss of crucial genes related to BIA biosynthesis are important drivers of the specification and diversity of BIA accumulation in Ranunculales.
Cavidines, including cavidine, apocavidine, dehydroapocavidine, and dehydrocavidine, have mainly been isolated from Corydalis species.Cavidines are compounds with a conserved methyl moiety at the C-14 position and methylenedioxy groups at the C-9 and C-10 positions in the skeleton structure of protoberberine alkaloids.According to the 13 C6 (benzene ring)-labeled BIAs and the known BIA biosynthetic pathway, we assume that cavidines are the downstream products of cheilanthifoline under the catalytic activity of CMT or OMT.In addition, sinactine is the only reported O-methylation product of cheilanthifoline, synthesized through the action of an hitherto uncharacterized enzyme (Beaudoin and Facchini, 2014).In a previous study, sinactine was isolated from the genus Corydalis (Bhakuni and Chaturvedi, 1983); however, we could not identify the fragment ions of sinactine in our 13 C6 (benzene ring)-labeled datasets of LC-MS in Corydalis tomentella, suggesting the trace level of sinactine.Moreover, 14-C-methylated cavidines and dehydrocheilanthifoline were abundant in Corydalis tomentella.Therefore, we believe that C-methylation likely precedes O-methylation during cavidine biosynthesis.The biosynthesis of dehydrocavidine and dehydroapocavidine in Corydalis is likely shaped by the unique expansion of BBEL genes via tandem gene duplication.In our study, the BBEL8/31 could both accept cheilanthifoline and cavidine as substrates, suggesting that both enzymes are promiscuous regarding the C-14 methylation pattern of the substrates.However, the substrate specificities or promiscuity of BBEL enzymes need more evidence, which will provide important insight in the findings and synthesis of diverse BIA compounds with two-and four-electron oxidation.

Sample collection and genomic survey
C. tomentella was identified using DNA barcoding (Ren et al., 2019).For genomic DNA extraction, a C. tomentella plant was collected from the Chongqing City in Nanchuan District (29°N, 107°E), China.Two libraries of 250 and 500 insert DNAs were constructed, and 2 × 125 paired-end sequencing was performed using the Illumina HiSeq 4000 platform.Illumina reads were used to estimate the genome size of C. tomentella based on the distribution of 21 K-mers.For transcriptome sequencing, different tissues of C. tomentella, including roots, stems, leaves, and flowers, were collected.

SMRT sequencing and genome assembly
The high-molecular-weight (HMG) gDNA of C. tomentella was extracted following the methods of megabase-size DNA preparation (Zhang et al., 2012).HMG DNA was used to prepare 20 kb templates with the BluePippin size selection system (PAC20KB/BLF7510).Libraries were constructed, and SMRT sequencing of seven cells was performed on the Sequel platform.Raw data were filtered to remove adapters and low-quality reads through SMRT Portal analysis.
To improve genome assembly, young leaves of C. tomentella were collected for Hi-C library construction and paired-end sequencing.Cross-linked and lysed cells were digested using the HindIII restriction enzyme.Paired-end reads were mapped to the draft assembly, and the misjoins of the chimeric contigs were split and corrected using the 3D de novo assembly pipeline (Dudchenko et al., 2017).The corrected contigs were anchored to pseudo-chromosomes using the ALLHiC pipeline (Zhang et al., 2019).

Genome annotation and gene expression analysis
Structural repeat annotation of the C. tomentella genome was performed using the RepeatModeler (v1.0.9) package (http://www.repeatmasker.org/RepeatModeler/). Two de novo repeat finding programs, namely RECON and RepeatScout, were employed to identify and classify the repeat elements in the C. tomentella genome.Moreover, Repbase (v.21.12) was used, and consensus classification of TEs was performed according to the default parameters.Finally, RepeatMasker (v.4.0.6) was used to calculate and mask TE sequences (http://www.repeatmasker.org/).Transcriptomic data from different tissues were assembled de novo using Trinity (v.2.2.0) (Grabherr et al., 2011).Putative protein-coding genes were ab initio predicted using the MAKER (v 2.31.9) annotation pipeline with the following parameters (Cantarel et al., 2008).A. thaliana was selected as the gene prediction species model in AUGUSTUS.Unigenes and protein sequences predicted based on the RNA-Seq data of C. tomentella and the annotated protein sequences from A. thaliana and M. cordata were subjected to combined EST and proteomic analysis with BLAST and Exonerate alignment.The completeness of genome annotation was estimated using BUSCO (v. 4) (Simao et al., 2015).

Identification of WGD events
KS-based age distributions for paralogous genes of the sister lineages to core eudicots (N.nucifera, A. coerulea, C. tomentella, M. cordata, P. somniferum, and E. californica) and V. vinifera were constructed using the "wgd" pipeline (Zwaenepoel and Van de Peer, 2019).Briefly, the paralogous genes for each species were identified using the Diamond (v.0.9.18.119) (Buchfink et al., 2015) sequence similarity search tool, with an e-value cutoff of 1e−10.The paralogous gene families were clustered using the Markov Cluster Algorithm (MCL) (Enright et al., 2002), with an inflation factor of 2.0.Each paralogous gene family was aligned using MAFFT (v7.453) (Katoh and Standley, 2013), with default parameters.The KS values of all gene pairs within a gene family were estimated using the CODEML program in the PAML package (v4.9)(Yang, 2007).KS estimates were subsequently node-weighted to correct for redundancy, and phylogenetic trees were constructed for all families using FastTree (v2.1.7)(Price et al., 2009).The KS age distributions of paralogous genes in all tested species are shown as gray bars in Figure S5.The intragenomic collinear segments and the corresponding anchor pairs for each genome were identified using i-ADHoRe (v.3.0) (Proost et al., 2012).The KS distribution of paralogous genes located in the collinear segments was estimated using the CODEML program of the PAML package (v.4.9) (Yang, 2007).The KS age distribution of anchor pairs for each genome is shown as black bars in Figure S7.
The KS-based age distributions of orthologs in tested species were constructed using the "wgd" pipeline, as described above.Orthologs between species were selected according to the best hits of Diamond (v.0.9.18.119) (Buchfink et al., 2015).The KS values of all orthologs or anchor pairs from i-ADHoRe (v.3.0) (Proost et al., 2012) were estimated using the CODEML program in the PAML package (v.4.9) (Yang, 2007).To trace the WGD events identified in Ranunculales, anchor-pair KS distributions between C. tomentella and A. coerulea, M. cordata, and V. vinifera were identified.To confirm the different substitution rates in the sister lineages of core eudicots, the KS distributions of orthologs between V. vinifera and N. nucifera, A. coerulea, C. tomentella, M. cordata, P.somniferum, and E. californica were compared one-toone.
The different substitution rates of sister lineages to core eudicots led to variability in the placement of (likely the same) WGD events.To further correctly place the WGD events, single copy genes from the OrthoFinder (v.2.2.7) (Emms and Kelly, 2019) analysis of all tested species were used to calculate the branch lengths of phylogenies in the KS unit using the CODEML program of the PAML package (v.4.9) (Yang, 2007) with the free-ratio model.To map the identified peaks of KS distributions to the phylogeny (in the KS unit), half of the KS peak values were placed from the tip toward the root of the phylogeny, with the assumption that duplicated genes in each genome evolved with similar substitution rates after a WGD event.

Identification of BIAs in different tissues of Corydalis tomentella
Different tissues of Corydalis tomentella, including roots, stems, leves and flowers, collected from the same cultivated plant in the Chongqing City in Nanchuan District, China, were dried at 45°C and ground to a fine powder (20 mesh).To isolate potential metabolites, 100 mg of dried powder, which was weighed accurately and dissolved in 25 mL of mobile phase, was extracted using ultrasound (100 kHz) at 25°C for 30 min.The same solvent was added to compensate for the weight lost during extraction.The extract filtered through a 0.22 μm membrane for HPLC analysis.
The Waters 2695 (Milford, USA) liquid chromatograph equipped with a quaternary pump 19 solvent management system, an online degasser, an autosampler, a DAD detector, and the Xbridge BEH C18 (4.6 mm × 250 mm, 5 μm) column was used.The mobile phase was composed of acetonitrile (A) and water (0.2% phosphoric acid, 20 mmol•L-1 potassium dihydrogen phosphate, and 10 mmol•L-1 triethylamine), using the gradient elution of 15-22% A at 0-10 min, 22% A at 10-40 min, and 22-15% A at 40-50 min.The flow rate of the mobile phase was set to 0.8 mL•min-1.The injection volume for each sample was 10 μL.

Isotopic labelling of BIA biosynthesis and liquid chromatography quadrupole time-of-flight mass spectrometry (LC-Q-TOF-MS) analysis
C. tomentella plants were collected and cultured in a greenhouse at 25℃ for 7 days.The plants were then cultured in water with or without isotopically labeled [ring-13 C6]-tyrosine (4 mg•mL-1) for a week.The unlabeled and labeled samples with three replicates were dried at 45°C and ground to a fine powder (20 mesh).The dried powder (100 mg) was accurately weighed, dissolved in 2 mL of methanol, and extracted using ultrasound (100 Hz) at room temperature for 60 min.After centrifugation at 4,500 rpm for 10 min, the supernatant was filtered through a 0.22 μm membrane and used for LC-MS analysis.

Heterologous expression of CtBBELs and activity assay
BBEL genes were expressed using the Bac-to-Bac system.BBEL genes (CtBBEL8, CtBBEL22, and CtBBEL31) from C. tomentella and BwSTOX from B. wilsoniae were cloned into the pFastBacDual vector, and the pFastBacDual-BBELs were transformed into the competent E. coli strain DH10Bac.Recombinant bacmid DNA was used for the transfection of Sf9 insect cells at a density of 2 × 10 6 cells•mL -1 .After 24 h at 27°C for infection, the alkaloid standards, including tetrahydrocolumbamin, tetrahydropalmatine, stylopin, cheilanthifoline, and cavidine, were respectively added to the medium for additional 36 h incubation.The cultures were further extracted using 1 volume butanol and dried in vacuum.The dried substances were resuspended in methyl alcohol for HPLC and LC-MS analyses.

FiguresFigure
Figures Figure 1.Genomic features and phylogenetic position of C. tomentella. A. Chromosome level assembly of the C. tomentella genome.(a) transposable elements in 100 kb windows, (b) gene density in 100 kb windows, (c) gene expression level estimated from read counts per million mapped reads in 100 kb windows, (d) GC content in 100 kb windows, and (e) synteny blocks of paralogous sequences of C. tomentella genome.B. A phylogenetic tree showing the topology and divergence times for 19 plant species.The different circles signify the bootstrap support (BS) values from 10 phylogenetic trees based on the different gene sets.Numbers at the branch leading to C. tomentella indicate the expansion and contraction of gene families.Blue bars at the internodes indicate 95% confidence intervals of estimated divergence time.

Figure 2 .
Figure 2. Synteny and KS analysis of the C. tomentella WGD. A. Synteny comparison between the C. tomentella genome and ancestral eudicot karyotype (AEK) pre-γ chromosomes.Two paralogous blocks in C. tomentella chromosomes are shown in shared colors but different filling.B. KS distributions of anchor pairs for the paralogous genes of C. tomentella, M. cordata, A. coerulea, and V. vinifera, and for the one-toone orthologs between C. tomentella and M. cordata, A. coerulea, and V. vinifera, respectively.C. Phylogeny of C. tomentella and other species with branch lengths in KS units.Different blocks represent the inferring of WGD or WGT events based on KS values of paralogs.

Figure 4 .
Figure 4. Tandem gene duplications of CtBBEL involved in the biosynthesis of cavidines and coptisine.A. The distributions of NOR (norcoclaurine), RET (reticuline), CHE (cheilanthifoline), SAN (sanguinarine), DHCH (dehydrocheilanthifoline), CAV (cavidine), DHCA (dehydrocavidine), STY (stylopine), COP (coptisine), MOR (morphine), and NOS (noscapine) in N. nucifera, A. coerulea, M. cordata, P. somniferum, and C. tomentella.The collinearity analysis identified syntenic blocks containing the tandem duplications of BBEL genes in C. tomentella.Two syntenic blocks in N. nucifera and P. somniferum, respectively, originated from their lineage-specific WGD event.The hollow blocks with different colors, which were linked by dotted lines, represent the syntenic blocks of non-BIA genes among candidate species.B. Phylogenetic tree constructed for BBEL genes from the syntenic blocks of M. cordata, P. somniferum, and C. tomentella with the CtBBE gene as outgroup.C. Gene expression profile of CtBBEL genes from the syntenic block and CtBBE gene in different tissues of C. tomentella, including root, flower, stem, and leaf.D. In vivo catalytic assays of CtBBEL8, CtBBEL31, and CtBBEL22 in Sf9 insect cells using cheilanthifoline, cavidine, and stylopine as substrates, respectively.E. Proposed biosynthetic pathway of cavidines and coptisine from cheilanthifoline in C. tomentella.The red full lines represent the identified steps in this study.