The complete chloroplast genome sequence of Cinnamomum balansae and its phylogenetic implications
Abstract
Cinnamomum balansae Lecomte, a member of the Lauraceae family, is an endemic genetic resource of Vietnam. This species is classified under Group IIA, which covers forest plants that are not yet endangered but are at risk if not strictly managed. Cinnamomum balansae also holds significant economic and medicinal value. In this study, we sequenced and annotated the complete chloroplast (cp) genome of C. balansae for the first time. The cp genome is 152,747 base pairs (bp) in length, with a GC content of 39.2%. It exhibits a typical quadripartite structure, comprising a large single-copy region of 93,701 bp, a small single-copy region of 18,898 bp, and two inverted repeat regions of 20,074 bp each. The genome encodes 129 genes, including 84 protein-coding genes, 37 transfer RNA genes, and eight ribosomal RNA genes. Phylogenetic analysis based on protein-coding sequences revealed a close evolutionary relationship between C. balansae and C. longipaniculatum. These findings provide essential genetic insights that enhance our understanding of the phylogenetic position of C. balansae within the Lauraceae and support future conservation and evolutionary studies.
INTRODUCTION
Cinnamomum balansae Lecomte (1913) is a prominent evergreen tree in the Lauraceae, naturally distributed in the tropical forests of central and northern Vietnam (Ministry of Science and Technology & Vietnamese Academy of Science and Technology, 2007; Hai et al., 2016). It grows up to 30 meters in height, with a diameter at breast height ranging from 85 to 90 cm. This species thrives in secondary forests at elevations above 200 meters, typically on ancient alluvial and granite formations with gentle slopes and low relief. The bark of C. balansae contains essential oils and camphor. Oil and fat extracted from its fruits have traditionally been used for soap production, while its wood is used in furniture manufacturing (Ministry of Science and Technology & Vietnamese Academy of Science and Technology, 2007). However, overexploitation by local communities and forestry enterprises has significantly threatened its survival (Ministry of Science and Technology & Vietnamese Academy of Science and Technology, 2007). As a result, C. balansae is currently classified as an endangered species in the Vietnam Red Data Book (Ministry of Science and Technology & Vietnamese Academy of Science and Technology, 2007).
The chloroplast (cp) genome is maternally inherited in most plant species and contains both highly conserved and variable regions (Huynh et al., 2024). Chloroplast genome sequencing provides crucial insights into genetic diversity, population structure, and evolutionary relationships (Thi Huynh et al., 2024). By analyzing the cp genome, researchers can assess population-level genetic variation and identify conservation priorities (Birky et al., 1983). Overexploitation and habitat destruction may have caused genetic bottlenecks in C. balansae populations, leading to reduced genetic diversity (Jiang et al., 2022). Additionally, cp genome data are instrumental in resolving phylogenetic relationships within Cinnamomum and the broader Lauraceae. Accurate taxonomic classification is essential for conservation planning, preventing misidentification, and ensuring effective resource allocation (Xue et al., 2024).
Despite its ecological and economic significance, no complete cp genome data are currently available for C. balansae. In this study, we sequenced and annotated the full cp genome of C. balansae. The results provide a valuable reference for the conservation of this endangered species and contribute to a deeper understanding of its phylogenetic placement within Lauraceae.
MATERIALS AND METHODS
Plant sampling
Leaves of C. balansae used in this study were collected from a two-year-old planted tree at Yenthe Forestry Two Member Company Limited, Bac Giang Province, Vietnam (21°29′17″N, 106°4′3″E, 1,100 m). The sample was collected and identified by Son Le (contact e-mail: leson@vafs.gov.vn), and a voucher specimen was deposited at the Institute of Forest Tree Improvement and Biotechnology under the voucher number VHBG.2024 (managed by Son Le).
Sequencing, assembly, and annotation of cp genome
Fresh leaves of C. balansae were ground in liquid nitrogen, and total genomic DNA was extracted using the DNeasy Plant Tissue Extraction Kit (Cat. 69104, Qiagen). A DNA library was subsequently prepared and sequenced on the BIGSEQ-50 platform (KTEST Co., Ltd., Ho Chi Minh City, Vietnam). After filtering out low-quality reads using Trimmomatic (Bolger et al., 2014), we obtained a total of 73,952,406 reads. De novo assembly of the C. balansae cp genome was performed using NOVOPlasty version 4.3.5 (Dierckxsens et al., 2017), and the assembled genome was annotated with the GeSeq online tool (Tillich et al., 2017). The annotated transfer RNA (tRNA) genes and protein-coding genes (PCGs) were then manually curated using tRNAscan-SE (Chan et al., 2021) and BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi), respectively. The genome map was visualized using the OGDRAW tool (Greiner et al., 2019). Finally, the complete annotated cp genome of C. balansae was deposited in the GenBank database under accession number PQ576718. The BioProject, BioSample, and SRA identifiers are PRJNA1181598, SAMN44564949, and SRR31249618, respectively.
Phylogenetic analysis
For the phylogenetic analysis, we downloaded 23 cp genomes from the GenBank database and designated Lindera floribunda (C. K. Allen) H. P. Tsui as the outgroup. Sequence information for all genomes is listed in Table 1. The 79 PCGs shared among these genomes were extracted and aligned individually using MAFFT version 7.490 (Katoh and Standley, 2013). Poorly aligned regions were removed using TrimAl version 2.0 (Capella-Gutierrez et al., 2009). We then concatenated the trimmed alignments of all genes into a single dataset of 68,922 columns. jModelTest version 3.2.2 was applied to determine the best-fit evolutionary model (Posada, 2008). The maximum-likelihood (ML) phylogenetic tree was reconstructed using the IQ-TREE webserver with 1,000 ultrafast bootstrap (BS) replicates under the TVM + I + G model (Nguyen et al., 2015). Finally, the resulting ML tree was visualized using the iTOL web tool (Letunic and Bork, 2021).
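The concatenation step can be illustrated with a minimal sketch. It is not the script used in this study: the function name `concatenate_alignments`, the toy gene and taxon labels, and the choice to pad missing genes with gaps are all assumptions made for illustration. The idea is that each per-gene alignment contributes its columns in a fixed gene order, so that every taxon ends up with a sequence of the same total width.

```python
from typing import Dict, List


def concatenate_alignments(
    gene_alignments: Dict[str, Dict[str, str]], taxa: List[str]
) -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix, gene by gene."""
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for gene in sorted(gene_alignments):  # fixed gene order for every taxon
        aligned = gene_alignments[gene]
        lengths = {len(seq) for seq in aligned.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: rows differ in length, not an alignment")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this gene is padded with gaps
            # so that the columns of later genes stay in register.
            parts[taxon].append(aligned.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy input: two hypothetical genes aligned across two hypothetical taxa.
toy = {
    "geneA": {"taxon1": "ATGAAA", "taxon2": "ATGAAG"},
    "geneB": {"taxon1": "ATGC--", "taxon2": "ATGCTT"},
}
matrix = concatenate_alignments(toy, ["taxon1", "taxon2"])
print(matrix["taxon1"])
print(len(matrix["taxon2"]))
```

In a real analysis the input would be the 79 trimmed gene alignments, and the width of the resulting matrix would be the sum of the individual alignment lengths.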
RESULTS
A total of 717,302 paired-end reads were used to assemble the cp genome of Cinnamomum balansae. The sequencing depth across the genome ranged from 592× to 2,009×, with an average depth of 1,408.42×. The cp genome of C. balansae was a circular DNA molecule spanning 152,747 bp, including a large single-copy (LSC) region of 93,701 bp, a small single-copy (SSC) region of 18,898 bp, and a pair of inverted repeat (IR) regions of 20,074 bp each. The genome had a GC content of 39.2%, with the LSC, SSC, and IR regions having GC contents of 38.0%, 33.9%, and 44.4%, respectively.
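The reported region sizes and GC contents can be cross-checked with simple arithmetic. The sketch below uses only the figures given in this paragraph; the variable names are illustrative, and the per-region GC percentages are rounded values, so the length-weighted mean is only expected to agree with the reported genome-wide figure to one decimal place.

```python
# Region name -> (length in bp, GC content in %), as reported in the text.
regions = {
    "LSC": (93_701, 38.0),
    "SSC": (18_898, 33.9),
    "IRa": (20_074, 44.4),
    "IRb": (20_074, 44.4),
}

# The four regions of a quadripartite genome should sum to the full length.
total_length = sum(length for length, _ in regions.values())

# Genome-wide GC content is the length-weighted mean of the regional values.
weighted_gc = sum(length * gc for length, gc in regions.values()) / total_length

print(total_length)
print(round(weighted_gc, 1))
```

The sum of the four regions reproduces the 152,747 bp genome length, and the weighted mean rounds to the reported 39.2%.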
This genome encoded 129 genes, including 84 PCGs, 37 tRNA genes, and eight ribosomal RNA (rRNA) genes (Fig. 1). Seventeen of these genes were duplicated within the IR regions, including four rRNA genes (4.5S rRNA, 5S rRNA, 16S rRNA, and 23S rRNA), seven tRNA genes (trnI-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA-UGC, trnR-ACG, and trnN-GUU), and six PCGs (ndhB, rpl2, rpl23, rps7, rps12, and ycf2). Among the 129 annotated genes, ten contained introns, including eight single-intron genes (rps16, atpF, rpoC1, petB, petD, rpl2, ndhB, and ndhA) and two genes with two introns (pafI and clpP). Additionally, rps12 was identified as a trans-spliced gene with three exons, extending from the LSC (exon I) into the IR regions (exons II and III).
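The gene tallies in this paragraph can be verified in the same way. The following sketch simply encodes the counts and gene lists given in the text; the dictionary names are hypothetical and serve only to organize the check.

```python
# Gene counts per class, as reported in the text.
gene_classes = {"protein_coding": 84, "tRNA": 37, "rRNA": 8}
total_genes = sum(gene_classes.values())

# Genes listed as duplicated in the IR regions, grouped by class.
ir_duplicated = {
    "rRNA": ["4.5S rRNA", "5S rRNA", "16S rRNA", "23S rRNA"],
    "tRNA": ["trnI-CAU", "trnL-CAA", "trnV-GAC", "trnI-GAU",
             "trnA-UGC", "trnR-ACG", "trnN-GUU"],
    "protein_coding": ["ndhB", "rpl2", "rpl23", "rps7", "rps12", "ycf2"],
}
ir_counts = {cls: len(genes) for cls, genes in ir_duplicated.items()}

# Intron-containing genes, keyed by number of introns.
intron_genes = {
    1: ["rps16", "atpF", "rpoC1", "petB", "petD", "rpl2", "ndhB", "ndhA"],
    2: ["pafI", "clpP"],
}
total_intron_genes = sum(len(genes) for genes in intron_genes.values())

print(total_genes)
print(ir_counts)
print(total_intron_genes)
```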
The resulting phylogenetic tree was well supported along most of its backbone, with bootstrap values ranging from 59 to 100 (Fig. 2). At the species level, C. balansae was most closely related to C. longipaniculatum, with maximal bootstrap support (BS = 100).
DISCUSSION
Cinnamomum is a diverse genus within the Lauraceae family, comprising several species with significant economic and ecological value (Zhang et al., 2020; Yang et al., 2022). The cp genome of C. balansae is similar in structure, gene orientation, and composition to those of other Cinnamomum species. Specifically, the C. balansae cp genome is 152,747 bp in length, aligning closely with the typical genome sizes observed in Cinnamomum, which range from approximately 152.6 to 152.7 kb (Table 1) (Chen et al., 2019; Xie et al., 2019; Zhao et al., 2019; Yuan et al., 2020). Additionally, the GC content of C. balansae is 39.2%, comparable to that of other Cinnamomum species (39.0–39.2%).
The phylogenetic analysis based on 79 PCGs supports a close relationship between C. balansae and C. longipaniculatum, clustering them into a well-supported clade. This result aligns with previous studies that used cp genome markers (Lv et al., 2021; Zheng et al., 2022). However, our study provides new insights by resolving deeper relationships within the genus that were previously unclear in studies using mitochondrial or nuclear DNA markers alone. This emphasizes the power of cp genome data in resolving complex phylogenetic relationships.
Our results show that C. balansae and C. longipaniculatum form a distinct clade, which contrasts with earlier studies suggesting that C. balansae might be more closely related to species such as C. camphora and C. mollifolium (Yang et al., 2023; Xu et al., 2025). This discrepancy may be attributed to differences in the genomic regions used in previous studies or the inclusion of fewer species. In particular, our use of a comprehensive set of PCGs from the cp genome offers a more robust dataset for resolving phylogenetic relationships at the species level.
Further comparative analysis with other Cinnamomum species, such as C. camphora and C. verum, suggests that the evolutionary divergence within the genus is more complex than previously understood (Huang et al., 2016; Yang et al., 2022). The close relationship observed between C. balansae and C. longipaniculatum is likely due to their shared evolutionary history in Southeast Asia’s tropical forests (Yang et al., 2022). However, while our results align with the general framework of Cinnamomum phylogeny, they also highlight the need for further studies incorporating additional species and genomic data to refine these classifications.
These findings underscore the potential of cp genome sequencing to resolve fine-scale phylogenetic relationships within plant genera. By comparing our results with existing intra-genus classifications, we provide a clearer understanding of the evolutionary trajectories within Cinnamomum, with important implications for conservation and taxonomy within the Lauraceae family. Moreover, this study contributes essential genetic information that enhances our understanding of Cinnamomum’s phylogenetic placement and supports future evolutionary studies within the angiosperms (Nam et al., 2023; Nguyen et al., 2024).
Notes
ACKNOWLEDGMENTS
We thank Mr. Hoang Van Chuc from Yenthe Forestry Two Member Company Limited for allowing us to collect the material. We also thank Dr Le Van Quang from the Silviculture Research Institute (VAFS) for helping with morphological confirmation.
CONFLICTS OF INTEREST
The authors declare that there are no conflicts of interest.
