Comparative analysis of chloroplast genomes of Rhododendron catawbiense, R. maximum and R. mucronulatum (Ericaceae)
Article information
Abstract
This study investigated the chloroplast genome characteristics of and phylogenetic relationships among Rhododendron catawbiense, R. maximum, and R. mucronulatum. Among these species, R. catawbiense and R. maximum belong to the subgenus Hymenanthes, whereas R. mucronulatum is classified under the subgenus Rhododendron, allowing for a comparison across two major subgenera. Complete chloroplast genomes were sequenced using the Illumina platform, de novo assembled, and annotated for subsequent comparative and phylogenetic analyses. The genomes ranged in size from 199,538 to 206,382 bp and exhibited a conserved quadripartite structure, gene content, and GC composition. A comparative analysis of 19 Rhododendron species revealed four distinct inverted repeat/single-copy boundary types, with the three target species sharing the most common structure. A sliding-window analysis identified highly variable intergenic regions, in this case trnI–rpoB and rpoA–ycf1, which may serve as potential phylogenetic markers. Phylogenetic trees based on 77 protein-coding genes yielded strongly supported topologies, corroborating current taxonomic frameworks based on nuclear and morphological data. The findings here offer genomic insights into Rhododendron evolution and provide a basis for further phylogenetic and biogeographic research.
INTRODUCTION
Rhododendrons are mainly distributed in temperate and subtropical regions of the Northern Hemisphere, and are widely cultivated, especially in Asia, North America, and Europe. They are widely cultivated for their ornamental value, vibrant floral diversity, and adaptability to various climates. Species vary in growth form—from small shrubs to large trees—and may be evergreen or deciduous. Their showy, often round inflorescences typically feature clusters of flowers in a wide array of colors, including white, pink, red, and yellow. These striking floral displays, along with their attractive foliage, contribute to their popularity in horticulture and public gardens.
Chamberlain et al. (1996) divided the genus Rhododendron into eight subgenera based on morphological characteristics such as the position of the inflorescence buds, the presence of leaf scales, and whether they are evergreen or deciduous. Since then, molecular phylogenetic studies have been performed to elucidate the relationships among these subgenera (Kron, 1997; Gao et al., 2002; Goetsch et al., 2005), and the most recent study of 3,437 nuclear orthologous genes revealed that they are divided into five subgenera and 11 sections (Xia et al., 2021). A total of 11 taxa are known to be indigenous to the Korean Peninsula (Chang, 2007). Although Rhododendron species in this region exhibit low species richness per unit area, they represent five distinct subgenera and display high phylogenetic diversity, indicating an early divergence within the genus Rhododendron (Shrestha et al., 2018).
In this study, we obtained the complete chloroplast genome sequences of R. mucronulatum, widely distributed across the Korean Peninsula, northeastern China, and the Russian Far East, as well as R. catawbiense and R. maximum, both native to the eastern United States. Rhododendron mucronulatum is placed in subgenus Rhododendron, section Rhododendron, subsection Rhodorastra, whereas R. catawbiense and R. maximum are assigned to subgenus Hymenanthes, section Ponticum, based on morphological characteristics (Chamberlain, 1996). This classification is supported by phylogenetic inferences using nuclear orthologous genes (Goetsch et al., 2005; Khan et al., 2021; Xia et al., 2021). Among the five subgenera proposed by Xia et al. (2021), four are represented in this study. Therorhodion, the only subgenus not included, is a small lineage comprising just two species, positioned as the earliest-diverging clade in Rhododendron, and currently lacks a reported chloroplast genome sequence. In light of these taxonomic placements, the phylogenetic positions of these three species were reexamined using complete chloroplast genome sequences.
MATERIALS AND METHODS
Genomic DNA was extracted from fresh leaves of R. mucronulatum, collected from samples at Konggeomji, Konggeom-myeon, Sangju-si, Gyeongsangbuk-do, South Korea, in April 2019, with the corresponding voucher specimen deposited at KB (Won et al. 16734). In contrast, for R. catawbiense and R. maximum, DNA was obtained from dried leaves sourced from the Genetic Resources Repository of the Florida Museum of Natural History. According to voucher records, both species were collected in June 2016 from the Appalachian Blue Ridge in Macon County, NC, USA (FLAS265334 for R. maximum and FLAS265298 for R. catawbiense).
DNA extraction was performed using the cetyltrimethyl-ammonium bromide method (Doyle and Doyle, 1987), and DNA quality was assessed via 1% agarose gel electrophoresis. DNA libraries were constructed using a TruSeq DNA Nano Kit with a 350-bp insert size, following the manufacturer’s protocol (Illumina Inc., San Diego, CA, USA). Whole-genome sequencing was carried out using the Illumina NovaSeq 6000 platform (DNA Link Inc., Seoul, Korea) for R. catawbiense and R. maximum (2 × 151 bp PE [paired-end]), generating raw yields of approximately 3.22 Gb and 4.38 Gb with 21,348,650 and 29,003,014 raw reads, respectively. The DNA library of R. mucronulatum was sequenced on the Illumina MiSeq platform (Macrogen Ltd., Seoul, Korea) using a 2 × 301 bp PE approach, yielding approximately 7.34 Gb of data with 24,370,894 raw reads.
Quality assessment of the raw sequencing data was performed using FastQC, and high-quality reads were obtained through trimming with Trimmomatic (Bolger et al., 2014). De novo genome assembly was conducted using Velvet v1.2.10 (Zerbino and Birney, 2008) to reconstruct the complete genome. For genome annotation, reference data from previously characterized chloroplast genomes of Ericaceae were utilized. Gene prediction was first performed based on the identified regions, and the resulting annotations were further refined through comparative analysis with chloroplast genes from other Ericaceae species. The assembled genome was annotated using Geneious Prime 2020.0.5 (Biomatters Ltd., Auckland, New Zealand), and tRNA genes were predicted and verified using tRNAscan-SE 2.0 (Chan and Lowe, 2019). The structural features of the chloroplast genome were analyzed to define the large single-copy (LSC), small single-copy (SSC), and inverted repeat (IR) regions. The annotated gene map was visualized using OGDRAW 1.3.1 (Greiner et al., 2019). To calculate nucleotide variability (Pi) values among 19 Rhododendron species, MAFFT alignment was performed with the IRb region excluded. Pi values were then calculated using the aligned sequences in DnaSP v6.12.03, employing a sliding window of 600 bp and a step size of 200 bp (Rozas et al., 2017).
Phylogenetic analysis was conducted using the same dataset of 18 complete chloroplast genome sequences referenced in Kwak and Bussmann (2023), with the addition of three newly sequenced complete chloroplast genomes. Gaultheria longibracteolata R. C. Fang and Vaccinium myrtillus L. were used as outgroup taxa to root the phylogenetic tree. Phylogenetic analysis was conducted using 77 coding sequences from Rhododendron species. Sequence alignments were carried out with Clustal Omega v1.2.2 integrated within the Geneious Prime software, and the resulting alignments were concatenated for further analysis. Phylogenetic reconstruction was performed in PhyloSuite v1.2.3 (Zhang et al., 2020; Xiang et al., 2023). Optimal partitioning schemes and substitution models for the 77 protein-coding genes (PCGs) were selected using ModelFinder under the Bayesian information criterion, as summarized in Table 1 (Kalyaanamoorthy et al., 2017). Maximum likelihood phylogenetic analysis was conducted using IQ-TREE (Nguyen et al., 2015) with 10,000 ultrafast bootstrap replicates (Minh et al., 2013). For Bayesian inference, MrBayes v3.2.7a (Ronquist et al., 2012) was employed. The Markov Chain Monte Carlo algorithm was run for 10,000,000 generations, with trees sampled every 1,000 generations. To ensure convergence and chain stability, the initial 25% of the sampled trees were discarded as burn-in. The remaining trees were utilized to construct a strict consensus tree and to estimate posterior probabilities for each node.
RESULTS AND DISCUSSION
The chloroplast genomes of R. catawbiense, R. maximum, and R. mucronulatum (GenBank accession numbers OM575017, OM575018, and OM575019, respectively) range in length from 199,538 bp to 206,382 bp. Each genome comprises four subregions: a LSC region of 108,483–110,064 bp, a SSC region of 2,609–2,644 bp, and two IR regions of 43,646–46,837 bp. The overall GC content is similar across species, ranging from 35.6% to 35.7%. Detailed genomic features are summarized in Table 2.
The chloroplast genomes of R. catawbiense and R. mucronulatum contain 142 genes, including 92 PCGs, eight ribosomal RNAs (rRNAs), and 42 transfer RNAs (tRNAs). Among these, 24 genes—specifically 15 PCGs, four rRNAs, and nine tRNAs—are duplicated within the IR regions. In R. maximum, an additional trnI gene was found within the IR region compared to the other two species, resulting in a total of 144 predicted genes. Consistent with previous reports on Rhododendron chloroplast genomes (Ma et al., 2021; Wang et al., 2021; Kwak and Bussmann, 2023), the genes clpP, ycf2, and ycf68 were not detected. The chloroplast genome sizes of these species are within the typical range reported for Rhododendron chloroplast genomes. As noted in earlier studies, including R. caucasicum (Kwak and Bussmann, 2023), the relatively large chloroplast genome sizes are likely attributable to IR expansion, a common feature observed in Rhododendron chloroplast genomes.
Comparison of the IR-LSC and IR-SSC boundaries among the chloroplast genomes of 19 Rhododendron species revealed four distinct types. Notably, R. catawbiense, R. maximum, and R. mucronulatum shared the most common boundary structure observed in Rhododendron (Fig. 1). In these species, the trnH gene is located on the left side of the LSC/IRb junction (JLB), while the trnV gene is found on the right side. At the IR/SSC boundary, the rpl32 gene is positioned within the IR region, and the ndhF gene is the only gene located in the SSC region. At the IRa/LSC junction (JLA), trnV is located within the IR region and rps12 within the LSC. In R. calophytum, R. platypodum, and R. anthopogonoides, the orientation of the ndhF gene is reversed compared to the common type. Among them, R. anthopogonoides exhibits a unique boundary structure where both trnV and trnI a re located within the LSC at the LSC/ IR boundary. In R. molle and R. huadingense, the IR region is expanded to include ndhF, resulting in a significantly contracted SSC region containing no genes—26 bp in R. molle and 53 bp in R. huadingense. Except for R. calophytum, R. platypodum, R. anthopogonoides, R. molle, and R. huadingense, the remaining 11 species share the same IR/ SC boundary structure as R. catawbiense, R. maximum, and R. mucronulatum. Despite these structural differences in IR/SC boundaries, no clear correspondence was observed between boundary type and the phylogenetic relationships among the species.
Inverted repeat (IR)/single-copy (SC) boundaries of the chloroplast genomes of Rhododendron species. Dotted lines indicate the junctions between the IR and SC regions. The map illustrates the locations positions and orientations of genes adjacent to flanking each boundary. The light blue boxes represent the chloroplast genomes sequences reported in this study, which exhibit the most common IR/SC boundary type found observed in Rhododendron. LSC, large single-copy.
The nucleotide diversity (Pi) among all 19 Rhododendron species was 0.01265, and sliding window analysis revealed that Pi values ranged from 0 to 0.025861 (Fig. 2). Highly variable regions with Pi values exceeding 0.02 were identified in the intergenic spacers trnI–rpoB, trnT–accD, rpoA–ycf1, and rps16–psaI. The relatively high Pi values observed in the ndh coding region appear to result from the reverse orientation of the ndh gene in R. calophytum, R. platypodum, and R. anthopogonoides, compared to the direction found in the other 16 Rhododendron species analyzed. In R. maximum, an additional trnI identified within the IR region is located in the trnI–rps16 intergenic spacer, which also shows relatively high sequence variation within the IR. Nucleotide diversity analyses further identified variable intergenic regions that may serve as informative markers for future phylogenetic and population-level studies.
Sliding-window analysis of nucleotide diversity (Pi) across aligned chloroplast genomes of Rhododendron species. The analysis was performed using a window size of 600 bp and a step size of 200 bp. The X-axis represents the midpoint position of each window, while the Y-axis shows the nucleotide diversity (Pi) within that window. LSC, large single-copy.
The phylogenetic trees inferred using both maximum likelihood and Bayesian inference methods exhibited identical topologies, with strong support values across all branches (Fig. 3). The recovered subgeneric relationships are broadly congruent with those reported in earlier molecular phylogenetic studies (e.g., Shrestha et al., 2018; Xia et al., 2021). Rhododendron catawbiense and R. maximum were grouped within Subgenus Hymenanthes, Section Ponticum, while R. mucronulatum was placed in Subgenus Rhododendron, Section Rhododendron, consistent with previously established morphological and molecular phylogenetic frameworks (Chamberlain, 1996; Goetsch et al., 2005; Xia et al., 2021).
Phylogenetic tree of Rhododendron species inferred from 77 chloroplast protein-coding genes. The tree was constructed using the maximum likelihood (ML) method. Numbers above the branches indicate ML bootstrap support values and Bayesian posterior probabilities. Gaultheria longibracteolata and Vaccinium myrtillus were used as outgroups. GenBank accession numbers from the National Center for Biotechnology Information (NCBI) are shown in parentheses.
Overall, the complete chloroplast genome analysis of R. catawbiense, R. maximum, and R. mucronulatum revealed conserved genome structures, gene content, and IR/SC boundary types consistent with the typical features of Rhododendron chloroplast genomes. These results provide a comprehensive chloroplast genomic framework for understanding evolutionary relationships within the genus and serve as a valuable reference for resolving taxonomic ambiguities and guiding conservation efforts.
Notes
ACKOWLEDGMENTS
This research was supported by grants from the National Institute of Biological Resources, funded by the Ministry of Environment of the Republic of Korea (Grant No. NIBR201905102). The authors are grateful to the Genetic Resources Repository of the Florida Museum of Natural History at the University of Florida for sharing the valuable genetic materials and to Dr. Jongsun Park and Dr. Woochan Kwon at Infoboss for their assistance on assembly and annotation.
CONFLICTS OF INTEREST
The authors declare that there are no conflicts of interest.
