The complete chloroplast genome sequence of Cinnamomum balansae and its phylogenetic implications

Article information

Korean J. Pl. Taxon. 2025;55(2):70-75
Publication date (electronic) : 2025 June 30
doi : https://doi.org/10.11110/kjpt.2025.55.2.70
1Institute of Forest Tree Improvement and Biotechnology – Vietnamese Academy of Forest Sciences, Hanoi, 100000, Vietnam
2School of Biological Sciences – University of Tasmania, 7001, TAS, Australia
3Department of Microbiology - Parasitology, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, 70000, Vietnam
Corresponding author: Minh Trong QUANG, E-mail: qtminh@ump.edu.vn. Son LE, E-mail: leson@vafs.gov.vn, son.le@utas.edu.au
Received 2025 February 14; Revised 2025 March 24; Accepted 2025 May 28.

Abstract

Cinnamomum balansae Lecomte, a member of the Lauraceae family, is an endemic genetic resource of Vietnam. This species is classified under Group IIA - forest plants not yet endangered but at risk if not strictly managed. Cinnamomum balansae also holds significant economic and medicinal value. In this study, we sequenced and annotated the complete chloroplast (cp) genome of C. balansae for the first time. The cp genome is 152,747 base pairs (bp) in length, with a GC content of 39.2%. It exhibits a typical quadripartite structure, comprising a large single-copy region of 93,701 bp, a small single-copy region of 18,898 bp, and two inverted repeat regions of 20,074 bp each. The genome encodes 129 genes, including 84 protein-coding genes, 37 transfer RNA genes, and eight ribosomal RNA genes. Phylogenetic analysis based on protein-coding sequences revealed a close evolutionary relationship between C. balansae and C. longipaniculatum. These findings provide essential genetic insights that enhance our understanding of the phylogenetic position of C. balansae within the Lauraceae and support future conservation and evolutionary studies.

INTRODUCTION

Cinnamomum balansae Lecomte (1913) is a prominent evergreen tree in the Lauraceae, naturally distributed in the tropical forests of central and northern Vietnam (Ministry of Science and Technology & Vietnamese Academy of Science and Technology, 2007; Hai et al., 2016). It grows up to 30 meters in height, with a diameter at breast height ranging from 85 to 90 cm. This species thrives in secondary forests at elevations above 200 meters, typically on ancient alluvial and granite formations with gentle slopes and low relief. The bark of C. balansae contains essential oils and camphor. Cinnamomum balansae has traditionally been used for extracting oil and fat from its fruits for soap production, while its wood is utilized in furniture manufacturing (Ministry of Science and Technology & Vietnamese Academy of Science and Technology, 2007). However, overexploitation by local communities and forestry enterprises has significantly threatened its survival (Ministry of Science and Technology & Vietnamese Academy of Science and Technology, 2007). As a result, C. balansae is currently classified as an endangered species in the Vietnam Red Data Book (Ministry of Science and Technology & Vietnamese Academy of Science and Technology, 2007).

The chloroplast (cp) genome is maternally inherited in most plant species and contains both highly conserved and variable regions (Huynh et al., 2024). Chloroplast genome sequencing provides crucial insights into genetic diversity, population structure, and evolutionary relationships (Thi Huynh et al., 2024). By analyzing the cp genome, researchers can assess population-level genetic variation and identify conservation priorities (Birky et al., 1983). Overexploitation and habitat destruction may have caused genetic bottlenecks in C. balansae populations, leading to reduced genetic diversity (Jiang et al., 2022). Additionally, cp genome data are instrumental in resolving phylogenetic relationships within Cinnamomum and the broader Lauraceae. Accurate taxonomic classification is essential for conservation planning, preventing misidentification, and ensuring effective resource allocation (Xue et al., 2024).

Despite its ecological and economic significance, no complete cp genome data are currently available for C. balansae. In this study, we sequenced and annotated the full cp genome of C. balansae. The results provide a valuable reference for the conservation of this endangered species and contribute to a deeper understanding of its phylogenetic placement within the Lauraceae.

MATERIALS AND METHODS

Plant sampling

Leaves of C. balansae used in this study were collected from a two-year-old planted tree at Yenthe Forestry Two Member Company Limited, Bac Giang Province, Vietnam (21°29′17″N, 106°4′3″E, 1,100m). The sample was collected and identified by Son Le (contact e-mail: leson@vafs.gov.vn), and a specimen was deposited at the Institute of Forest Tree Improvement and Biotechnology under the voucher number VHBG.2024 (managed by Son Le).

Sequencing, assembly, and annotation of cp genome

Fresh leaves of C. balansae were ground in liquid nitrogen, and total genomic DNA was extracted using the DNeasy Plant Tissue Extraction Kit (Cat. 69104, Qiagen). A DNA library was subsequently prepared and sequenced on the BGISEQ-50 platform (KTEST Co., Ltd., Ho Chi Minh City, Vietnam). After filtering out low-quality reads using Trimmomatic (Bolger et al., 2014), we obtained a total of 73,952,406 reads. De novo assembly of the C. balansae cp genome was performed using NOVOPlasty version 4.3.5 (Dierckxsens et al., 2017), and the assembled genome was annotated with the GeSeq online tool (Tillich et al., 2017). All annotated transfer RNA (tRNA) genes and protein-coding genes (PCGs) were curated using tRNAscan-SE (Chan et al., 2021) and BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi), respectively. The genome map was visualized using the OGDRAW tool (Greiner et al., 2019). Finally, the complete annotated cp genome of C. balansae was deposited in the GenBank database under accession number PQ576718. The BioProject, BioSample, and SRA identifiers are PRJNA1181598, SAMN44564949, and SRR31249618, respectively.

Phylogenetic analysis

For the phylogenetic analysis, we downloaded 23 cp genomes from the GenBank database and designated Lindera floribunda (C. K. Allen) H. P. Tsui as the outgroup. Information on all sequences is listed in Table 1. The 79 shared PCGs from these genomes were extracted and aligned individually using MAFFT version 7.490 (Katoh and Standley, 2013). Poorly aligned regions were removed using trimAl version 2.0 (Capella-Gutiérrez et al., 2009). We then concatenated the trimmed gene alignments into a single dataset of 68,922 columns. jModelTest version 3.2.2 was used to determine the best-fit evolutionary model (Posada, 2008). The maximum-likelihood (ML) phylogenetic tree was reconstructed using the IQ-TREE web server with 1,000,000 ultrafast bootstrap replicates (BS) under the TVM + I + G model (Nguyen et al., 2015). Finally, the resulting ML tree was visualized using the iTOL web tool (Letunic and Bork, 2021).
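
The concatenation of per-gene alignments described above can be sketched as follows. This is a minimal illustration rather than the script used in the study: it assumes each trimmed gene alignment is already available as a mapping from taxon name to aligned sequence, and the function name `concatenate_alignments` and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence (equal lengths)


def concatenate_alignments(
    genes: List[Tuple[str, Alignment]],
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Join per-gene alignments into one supermatrix.

    Returns the concatenated alignment and a list of
    (gene, start, end) partitions in 1-based inclusive coordinates.
    Taxa missing from a gene are padded with gap characters.
    """
    taxa = sorted({taxon for _, aln in genes for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    offset = 0
    for name, aln in genes:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences differ in length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
        partitions.append((name, offset + 1, offset + width))
        offset += width
    matrix = {taxon: "".join(chunks) for taxon, chunks in parts.items()}
    return matrix, partitions


# Toy example with two hypothetical gene alignments.
toy = [
    ("geneA", {"sp1": "ATGAAA", "sp2": "ATGAAG"}),
    ("geneB", {"sp1": "ATGC--", "sp2": "ATGCTT", "sp3": "ATGCTA"}),
]
matrix, partitions = concatenate_alignments(toy)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(partitions)
```

Keeping the partition coordinates alongside the supermatrix makes it possible to check the total column count against the reported alignment length and to reuse the gene boundaries if a partitioned model is tested later.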

RESULTS

A total of 717,302 paired-end reads were used to assemble the cp genome of Cinnamomum balansae. The sequencing depth across the genome ranged from 592× to 2,009×, with an average depth of 1,408.42×. The cp genome of C. balansae was a circular DNA molecule spanning 152,747 bp, including a large single-copy (LSC) region of 93,701 bp, a small single-copy (SSC) region of 18,898 bp, and a pair of inverted repeat (IR) regions of 20,074 bp each. The genome had a GC content of 39.2%, with the LSC, SSC, and IR regions having GC contents of 38.0%, 33.9%, and 44.4%, respectively.
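
As a quick consistency check on the figures above, the region lengths should sum to the total genome length, and the length-weighted mean of the regional GC contents should approximate the overall GC content. The sketch below uses only the values reported in the text; because the regional GC percentages are rounded to one decimal place, the weighted result is an approximation.

```python
# Region lengths (bp) and GC fractions as reported in the text.
regions = {
    "LSC": (93_701, 0.380),
    "SSC": (18_898, 0.339),
    "IRa": (20_074, 0.444),
    "IRb": (20_074, 0.444),
}

# Quadripartite structure: total = LSC + SSC + 2 x IR.
total_length = sum(length for length, _ in regions.values())

# Overall GC content as the length-weighted mean of the regional values.
weighted_gc = sum(length * gc for length, gc in regions.values()) / total_length

print(f"Total length: {total_length:,} bp")
print(f"Length-weighted GC content: {100 * weighted_gc:.1f}%")
```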

This genome encoded 129 genes, including 84 PCGs, 37 tRNA genes, and eight ribosomal RNA (rRNA) genes (Fig. 1). Seventeen of these genes were duplicated within the IR regions, including four rRNA genes (4.5S rRNA, 5S rRNA, 16S rRNA, and 23S rRNA), seven tRNA genes (trnI-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA-UGC, trnR-ACG, and trnN-GUU), and six PCGs (ndhB, rpl2, rpl23, rps7, rps12, and ycf2). Among the 129 annotated genes, ten contained introns, including eight single-intron genes (rps16, atpF, rpoC1, petB, petD, rpl2, ndhB, and ndhA) and two genes with two introns (pafI and clpP). Additionally, rps12 was identified as a trans-spliced gene with three exons, extending from the LSC (exon I) into the IR regions (exons II and III).

Fig. 1

The circular map of the Cinnamomum balansae chloroplast genome, generated by OGDRAW.

The resulting phylogenetic tree was generally well supported along the backbone, with bootstrap values ranging from 59 to 100 (Fig. 2). At the species level, C. balansae was most closely related to C. longipaniculatum, with maximal support (BS = 100).

Fig. 2

Maximum-likelihood (ML) tree based on the chloroplast gene sequences of Cinnamomum balansae and 23 related species. ML bootstrap support values between 59 and 99 are indicated at the nodes.

DISCUSSION

Cinnamomum is a diverse genus within the Lauraceae family, comprising several species with significant economic and ecological value (Zhang et al., 2020; Yang et al., 2022). The cp genome of C. balansae is similar in structure, gene orientation, and composition to those of other Cinnamomum species. Specifically, the C. balansae cp genome is 152,747 bp in length, aligning closely with the genome sizes reported for Cinnamomum, most of which fall between approximately 152.6 and 152.8 kb (Table 1) (Chen et al., 2019; Xie et al., 2019; Zhao et al., 2019; Yuan et al., 2020). Additionally, the GC content of C. balansae is 39.2%, comparable to that of most other Cinnamomum species (39.0–39.2%; Table 1).
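
The size and GC comparison against Table 1 can be reproduced with a short script. The sketch below copies the Cinnamomum rows of Table 1 (the outgroup is omitted) and reports the range and median of genome length, together with any entries that differ from the median by more than a threshold. The 1 kb threshold and the variable names are illustrative assumptions, not part of the original analysis.

```python
from statistics import median

# (species, GC %, length in bp) copied from Table 1; outgroup omitted.
table1 = [
    ("C. aromaticum", 39.2, 152_763),
    ("C. balansae", 39.2, 152_747),
    ("C. bodinieri", 39.1, 152_727),
    ("C. camphora", 39.1, 152_570),
    ("C. chago", 39.2, 152_753),
    ("C. glanduliferum", 39.1, 152_726),
    ("C. insularimontanum", 39.1, 152_750),
    ("C. jensenianum", 39.1, 152_713),
    ("C. kotoense", 39.2, 154_010),
    ("C. longipetiolatum", 39.0, 158_603),
    ("C. micranthum", 39.1, 152_675),
    ("C. migao", 39.1, 152_711),
    ("C. mollifolium", 39.2, 152_763),
    ("C. oliveri", 38.4, 132_619),
    ("C. parthenoxylon", 39.1, 152_760),
    ("C. pauciflorum", 39.1, 152_766),
    ("C. pittosporoides", 39.2, 152_730),
    ("C. septentrionale", 39.1, 152_725),
    ("C. subavenium", 39.1, 152_785),
    ("C. tenuipile", 39.2, 152_699),
    ("C. verum", 39.2, 152_766),
    ("C. wilsonii", 39.2, 152_685),
    ("C. yabunikkei", 39.2, 152_731),
]

lengths = [length for _, _, length in table1]
median_length = median(lengths)

# Illustrative threshold: flag entries more than 1 kb away from the median.
THRESHOLD_BP = 1_000
outliers = [
    species
    for species, _, length in table1
    if abs(length - median_length) > THRESHOLD_BP
]

print(f"Length range: {min(lengths):,} to {max(lengths):,} bp")
print(f"Median length: {median_length:,} bp")
print("Entries far from the median:", ", ".join(outliers) or "none")
```

Flagging entries that sit far from the median is a simple way to decide which records to re-check against GenBank before drawing conclusions about the typical genome size in the genus.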

The phylogenetic analysis based on 79 PCGs supports a close relationship between C. balansae and C. longipaniculatum, clustering them into a well-supported clade. This result aligns with previous studies that used cp genome markers (Lv et al., 2021; Zheng et al., 2022). However, our study provides new insights by resolving deeper relationships within the genus that were previously unclear in studies using mitochondrial or nuclear DNA markers alone. This emphasizes the power of cp genome data in resolving complex phylogenetic relationships.

Our results show that C. balansae and C. longipaniculatum form a distinct clade, which contrasts with earlier studies suggesting that C. balansae might be more closely related to species such as C. camphora and C. mollifolium (Yang et al., 2023; Xu et al., 2025). This discrepancy may be attributed to differences in the genomic regions used in previous studies or the inclusion of fewer species. In particular, our use of a comprehensive set of PCGs from the cp genome offers a more robust dataset for resolving phylogenetic relationships at the species level.

Further comparative analysis with other Cinnamomum species, such as C. camphora and C. verum, suggests that the evolutionary divergence within the genus is more complex than previously understood (Huang et al., 2016; Yang et al., 2022). The close relationship observed between C. balansae and C. longipaniculatum is likely due to their shared evolutionary history in Southeast Asia’s tropical forests (Yang et al., 2022). However, while our results align with the general framework of Cinnamomum phylogeny, they also highlight the need for further studies incorporating additional species and genomic data to refine these classifications.

These findings underscore the potential of cp genome sequencing to resolve fine-scale phylogenetic relationships within plant genera. By comparing our results with existing intra-genus classifications, we provide a clearer understanding of the evolutionary trajectories within Cinnamomum, with important implications for conservation and taxonomy within the Lauraceae family. Moreover, this study contributes essential genetic information that enhances our understanding of Cinnamomum’s phylogenetic placement and supports future evolutionary studies within the angiosperms (Nam et al., 2023; Nguyen et al., 2024).

Notes

ACKNOWLEDGMENTS

We thank Mr. Hoang Van Chuc of Yenthe Forestry Two Member Company Limited for allowing us to collect the material. We also thank Dr. Le Van Quang of the Silviculture Research Institute (VAFS) for helping with morphological confirmation.

CONFLICTS OF INTEREST

The authors declare that there are no conflicts of interest.

References

Bandaranayake P. C. G., Naranpanawa N., Chandrasekara CHWMRB, Samarakoon H., Lokuge S., Jayasundara S., Bandaranayake A. U., Pushpakumara NGDKn, Wijesundara D. S. A.. 2023;Chloroplast genome, nuclear ITS regions, mitogenome regions, and Skmer analysis resolved the genetic relationship among Cinnamomum species in Sri Lanka. PLoS ONE 18:e0291763.
Birky C. W. Jr, Maruyama T., Fuerst P.. 1983;An approach to population and evolutionary genetic theory for genes in mitochondria and chloroplasts, and some results. Genetics 103:513–527.
Bolger A. M., Lohse M., Usadel B.. 2014;Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.
Capella-Gutiérrez S., Silla-Martínez J. M., Gabaldón T.. 2009;trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973.
Chan P. P., Lin B. Y., Mak A. J., Lowe T. M.. 2021;tRNAscan-SE 2.0: Improved detection and functional classification of transfer RNA genes. Nucleic Acids Research 49:9077–9096.
Chen H., Liu C., Han L., Song Y., Tang L.. 2019;The plastid genome of an oil plants Cinnamomum chago (Lauraceae). Mitochondrial DNA Part B Resources 4:1733–1734.
Dierckxsens N., Mardulyn P., Smits G.. 2017;NOVOPlasty: De novo assembly of organelle genomes from whole genome data. Nucleic Acids Research 45:e18.
Greiner S., Lehwark P., Bock R.. 2019;OrganellarGenome-DRAW (OGDRAW) version 1.3.1: Expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Research 47:W59–W64.
Hai T. N., Nghi D. H., Phuong L. D., Hoang T. V.. 2016;Study on the sylviculture characters of Cinnamomum balansae Lecomte at Ben En National Park. Journal of Forestry Science and Technology 6:176–185.
Huang J.-F., Li L., van der Werff H., Li H.-W., Rohwer J. G., Crayn D. M., Meng H.-H., van der Merwe M., Conran J. G., Li J.. 2016;Origins and evolution of cinnamon and camphor: A phylogenetic and historical biogeographical analysis of the Cinnamomum group (Lauraceae). Molecular Phylogenetics and Evolution 96:33–44.
Huynh T-TT, Quang M. T., Nguyen H. D.. 2024;The complete chloroplast genome of Syzygium zeylanicum (Myrtaceae, Myrtales) and its phylogenetic analysis. Mitochondrial DNA Part B Resources 9:1642–1647.
Jiang Y., Miao Y., Qian J., Zheng Y., Xia C., Yang Q., Liu C., Huang L., Duan B.. 2022;Comparative analysis of complete chloroplast genome sequences of five endangered species and new insights into phylogenetic relationships of Paris. Gene 833:146572.
Katoh K., Standley D. M.. 2013;MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution 30:772–780.
Letunic I., Bork P.. 2021;Interactive Tree Of Life (iTOL) v5: An online tool for phylogenetic tree display and annotation. Nucleic Acids Research 49:W293–W296.
Lv Q., Liang B., Guo X.. 2021;The complete chloroplast genome sequence of Cinnamomum septentrionale (Lauraceae) from Sichuan Province, China, a medicinal plant and phylogenetic analysis. Mitochondrial DNA Part B Resources 6:2770–2771.
Ministry of Science and Technology & Vietnamese Academy of Science and Technology. 2007. Vietnam Red Data Book, Part II. Plants. Science and Technology Publishing, Hanoi. pp. 289–290.
Nam N. N., Danh N. H., Thiet V. M., Do H. D. K.. 2023;New insights into the evolution of chloroplast genomes in Ochna species (Ochnaceae, Malpighiales). Evolutionary Bioinformatics Online 19:11769343231210756.
Nguyen H. D., Do H. D. K., Vu M. T.. 2024;Comparative genomics revealed new insights into the plastome evolution of Ludwigia (Onagraceae, Myrtales). Science Progress 107:00368504241272741.
Nguyen L.-T., Schmidt H. A., von Haeseler A., Minh B. Q.. 2015;IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32:268–274.
Posada D.. 2008;jModelTest: Phylogenetic model averaging. Molecular Biology and Evolution 25:1253–1256.
Thi Huynh T. T., Quang M. T., Nguyen H. D.. 2024;Complete chloroplast genome sequence of the medicinal plant Oxyceros horridus (Rubiaceae) and phylogenetic analysis. Mitochondrial DNA Part B Resources 9:1658–1663.
Tillich M., Lehwark P., Pellizzer T., Ulbricht-Jones E. S., Fischer A., Bock R., Greiner S.. 2017;GeSeq: Versatile and accurate annotation of organelle genomes. Nucleic Acids Research 45:W6–W11.
Wu C.-C., Chu F.-H., Ho C.-K., Sung C.-H., Chang S.-H.. 2017;Comparative analysis of the complete chloroplast genomic sequence and chemical components of Cinnamomum micranthum and Cinnamomum kanehirae. Holzforschung 71:189–197.
Wu Y., Wei W., Liu L., Liu G., Guo Q.-Q., Qian Z.-Q.. 2019;The complete chloroplast genomes of the evergreen tree species Cinnamomum camphora and Cinnamomum parthenoxylon (Laurales: Lauraceae). Mitochondrial DNA Part B Resources 4:813–814.
Xie P., Lin S., Lai Q., Lian H., Chen J., Zhang Q., He B.. 2019;The complete plastid genome of Chinese cinnamon, Cinnamomum aromaticum Nees (Lauraceae). Mitochondrial DNA Part B Resources 4:3831–3833.
Xu J., Zhang H., Yang F., Zhu W., Li Q., Cao Z., Song Y., Xin P.. 2025;Phylogeny of Camphora and Cinnamomum (Lauraceae) based on plastome and nuclear ribosomal DNA data. International Journal of Molecular Sciences 26:1370.
Xue H., Xing Y., Bian C., Hou W., Men W., Zheng H., Yang Y., Ying X., Kang T., Xu L.. 2024;Comparative analysis of chloroplast genomes of Pulsatilla species reveals evolutionary and taxonomic status of newly discovered endangered species Pulsatilla saxatilis. BMC Plant Biology 24:293.
Yang Z., Ferguson D. K., Yang Y.. 2023;Plastome phylogeny and taxonomy of Cinnamomum guizhouense (Lauraceae). Forests 14:310.
Yang Z., Liu B., Yang Y., Ferguson D. K.. 2022;Phylogeny and taxonomy of Cinnamomum (Lauraceae). Ecology and Evolution 12:e9378.
Yuan X., Li Y., Wang Y.. 2020;The complete chloroplast genome sequence of Cinnamomum kotoense. Mitochondrial DNA Part B Resources 5:331–332.
Zhang X., Zhou X.-L., Liu Y.-H., Mo J.-Q., Zhang L.-Q., Wang Y.-H., Shen S.-K.. 2020;Investigating the status of Cinnamomum chago (Lauraceae), a plant species with an extremely small population endemic to Yunnan, China. Oryx 54:470–473.
Zhao G., Yang J., Wang X., Song Y., Zhu R.. 2019;The plastid genome of a spice plants Cinnamomum glanduliferum in Tibet (Lauraceae). Mitochondrial DNA Part B Resources 4:3284–3285.
Zheng Y., Chen Y., Wu Y., Liu X., Wang Y.. 2022;The chloroplast genome of aromatic plants Cinnamomum pauciflorum (Lauraceae). Mitochondrial DNA Part B Resources 7:585–586.
Zheng Y., Luo Y., Li Y., Wang Y.. 2020;The complete chloroplast genome sequence of Cinnamomum longipetiolatum. Mitochondrial DNA Part B Resources 5:198–199.
Zhou X.-L., Zhang L.-Q., Yang L., Huang F., Wang Y.-H., Huang X., Deng G., Shen S.-K.. 2019;The complete chloroplast genome of Cinnamomum pittosporoides reveals its phylogenetic relationship in Lauraceae. Mitochondrial DNA Part B Resources 4:3246–3247.

Table 1

List of Cinnamomum species used in phylogenetic analysis and their information.

Species GenBank accession no. % GC Length (bp)
C. aromaticum NC_046019 39.2 152,763
C. balansae This study 39.2 152,747
C. bodinieri NC_057605 39.1 152,727
C. camphora NC_035882 39.1 152,570
C. chago MN047449 39.2 152,753
C. glanduliferum NC_057217 39.1 152,726
C. insularimontanum MW801152 39.1 152,750
C. jensenianum MW801023 39.1 152,713
C. kotoense NC_050346 39.2 154,010
C. longipetiolatum NC_050347 39.0 158,603
C. micranthum NC_035802 39.1 152,675
C. migao NC_058709 39.1 152,711
C. mollifolium MW421302 39.2 152,763
C. oliveri KT716496 38.4 132,619
C. parthenoxylon MH050971 39.1 152,760
C. pauciflorum MW421303 39.1 152,766
C. pittosporoides NC_048978 39.2 152,730
C. septentrionale MZ128522 39.1 152,725
C. subavenium MW801140 39.1 152,785
C. tenuipile NC_057069 39.2 152,699
C. verum NC_035236 39.2 152,766
C. wilsonii MW800949 39.2 152,685
C. yabunikkei NC_044864 39.2 152,731
Lindera floribunda NC_045257 39.2 152,551