Korean J. Pl. Taxon > Volume 56(1); 2026 > Article
HU, WANG, and JIN: Characterization and phylogenetic analysis of the chloroplast genome of Camellia albosericea (Theaceae)

Abstract

To compare the chloroplast genome differences between Camellia albosericea and other congeneric species, and clarify the evolutionary position of C. albosericea within the genus Camellia, we sequenced and characterized its chloroplast genome using next-generation sequencing technology. This study aims to provide a scientific basis for species identification, genetic diversity analysis, and resource conservation of Camellia. Bioinformatics tools were integrated to perform sequence assembly, genome annotation, and characteristics analysis (including genome structure, codon bias, repeat sequences, simple sequence repeats [SSRs]) as well as analyses of functional region boundaries and phylogeny. The chloroplast genome of C. albosericea is 156,944 bp in length, exhibiting a typical quadripartite structure. A total of 134 genes were annotated, including 88 protein-coding genes, 37 tRNA genes, 8 rRNA genes, and 1 pseudogene. Codon usage analysis revealed a strong bias toward A/U-ending codons. Long repeat sequence analysis identified 43 repeats, and SSR detection yielded 244 SSR loci. Boundary comparison between C. albosericea and C. reticulata showed significant length differences in the ycf1 gene. Phylogenetic trees were constructed using maximum likelihood based on the complete chloroplast genome sequences of 28 Camellia species. The results indicated that C. albosericea clusters closely with C. borealiyunnanica, suggesting a close phylogenetic relationship between them. In conclusion, the basic characteristics of the C. albosericea chloroplast genome and its phylogenetic position reported herein provide critical support for developing molecular markers of sect. Camellia, clarifying interspecific relationships, and resolving the evolutionary process and taxonomic status of C. albosericea.

INTRODUCTION

The genus Camellia is the largest genus in Theaceae and is mainly distributed in eastern and southeastern Asia; over 80% of its species occur in southwestern and southern China (Ming and Zhang, 1996; Chang, 1998; Yang et al., 2013; Do et al., 2019). According to Chang’s classification system, section Camellia of the genus Camellia comprises 57 species worldwide, of which 55 are native to China (Chang, 1998). Most plants of sect. Camellia possess ornamental morphological traits, including a well-formed crown, red petals, an extended flowering period, and petal color that remains stable (without discoloration) after rainfall. As evergreen garden species, they are widely used for greening in garden cities and tourist attractions. In addition to their ornamental value, plants of sect. Camellia also have potential economic, medicinal, ecological, and even energy uses (Tong et al., 2022; Xiao et al., 2025).
Camellia albosericea Chang was first discovered in the 1980s, named in the early 1990s, and classified into sect. Camellia subsection Reticulatae Chang (Liu et al., 1991). It is a shrub that flowers in January and is mainly distributed in Panzhihua City, Sichuan Province, in the Jinsha River Basin of China (Chang, 1998).
There are considerable differences in classification systems and phylogenetic relationships of plants in the sect. Camellia (Chang, 1998). In China, the main references for botanical classification are the systems of Chang Hongda (Chang, 1998) and Min Tianlu (Ming, 1999), with obvious differences between the two. The reason is that the traditional species classification method based on morphology is easily affected by environmental factors (Huang et al., 2014). In “Flora of China” (published in 2007), Min Tianlu’s classification system was adopted, which recognized 11 sections and 97 species in China; among these, species such as C. albosericea were merged into C. reticulata (Ming and Bartholomew, 2007). Studying the phylogenetic relationships among species can provide a more reliable theoretical basis and reference for solving such classification problems.
Photosynthesis in plant cells occurs in chloroplasts (Pogson et al., 2015), which have their own genetic system. In recent years, chloroplast genomes have been widely used for species identification and phylogenetic analysis, providing new solutions for phylogenetic problems in some taxonomically difficult groups (Zhang et al., 2023; Ran et al., 2024a). At present, information on the chloroplast genome of C. albosericea is scarce, and the reliability of existing conclusions on the evolutionary classification of the genus Camellia is limited; this gap hinders the further development and utilization of C. albosericea resources. In this study, we used next-generation sequencing (NGS) to obtain whole chloroplast genome data for C. albosericea, assembled and annotated the genome, and drew its genomic map. We analyzed the structural characteristics of the chloroplast genome, interspersed repeats, simple sequence repeats (SSRs), and codon bias. We also compared the chloroplast genomes of C. albosericea and related species and constructed a phylogenetic tree to examine the genetic and evolutionary relationships among species of the genus Camellia. Our findings are expected to enrich the chloroplast genome data available for the genus Camellia and to provide a theoretical basis and reference for the development and utilization of C. albosericea resources, for resolving phylogenetic relationships among related species, and for evolution and classification studies.

MATERIALS AND METHODS

Experimental materials

The samples of Camellia albosericea were obtained from the Jinhua International Camellia Species Garden in Zhuma Township, Jinhua City, Zhejiang Province, known as the “Hometown of Camellia in China”. This garden currently holds the most complete collection of camellia species worldwide. The species in the garden were all collected from their native habitats, and the plants are arranged following the classification system proposed by Professor Chang Hongda (Chang, 1998), a famous Chinese expert in the classification of Camellia plants. Fresh, healthy leaves were placed in sampling bags and brought back to the laboratory, where they were rinsed several times with sterile water, dried, and stored at −20°C for later use.

Genomic DNA extraction and sequencing

Genomic DNA was extracted using a plant genomic DNA extraction kit (Tiangen, DP305). DNA purity was assessed by electrophoresis on a 1.0% agarose gel. Qualified genomic DNA was used to construct sequencing libraries, and the libraries that passed quality inspection were subjected to paired-end sequencing (PE150) on the Illumina NovaSeq 6000 platform (completed by Nanjing Genepioneer Biotechnology Co., Ltd.).

Assembly and annotation of the chloroplast genome

Raw data were filtered using fastp v0.20.0 (https://github.com/OpenGene/fastp) to obtain clean data. To reduce the complexity of subsequent assembly, the clean reads were aligned against the company’s self-built chloroplast genome database using the very-sensitive-local mode of bowtie2 v2.2.4 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml), and the aligned reads were regarded as the chloroplast genome sequencing reads (cpDNA sequences) of the project samples. The chloroplast genome was assembled using SPAdes v3.10.1 (http://cab.spbu.ru/software/spades/) as the core assembly module, with k-mer values of 55, 87, and 121; the assembly did not rely on a reference genome. After assembly, the reference sequence of Camellia mairei (NC_035688.1) was used for quality control.
Two methods were used to annotate the chloroplast genome to improve annotation accuracy. In the first, coding DNA sequences (CDSs) were annotated using prodigal v2.6.3 (https://www.github.com/hyattpd/Prodigal), rRNAs were predicted using hmmer v3.1b2 (http://www.hmmer.org/), and tRNAs were predicted using aragorn v1.2.38 (http://www.ansikte.se/ARAGORN/). In the second, gene sequences of related species published on the National Center for Biotechnology Information (NCBI) were extracted and compared against the assembled sequence using BLAST v2.6 (https://blast.ncbi.nlm.nih.gov/Blast.cgi). The two sets of annotations were then manually inspected; discrepant, incorrect, or redundant annotations were removed, and multi-exon boundaries were determined to obtain the final annotation. The chloroplast genome map was drawn using OGDRAW (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html). The annotated sequence was submitted online to NCBI and, after review, deposited under the accession number OR876391.1.

Codon bias

A self-written Perl script was used to calculate the relative synonymous codon usage (RSCU) of unique CDSs (one copy was selected for each multi-copy CDS) in the chloroplast genome of Camellia albosericea. RSCU was calculated as follows: (the number of occurrences of a given codon encoding a certain amino acid / the total number of all codons encoding that amino acid) / (1 / the number of codon types encoding that amino acid), that is, the actual usage frequency of the codon divided by its theoretical usage frequency.
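The RSCU definition above can be illustrated with a minimal Python sketch. This is not the authors' Perl script; the function name rscu and the dictionary layout are assumptions made for illustration. The leucine codon counts are those listed in Table 4.

```python
from collections import defaultdict

def rscu(codon_counts, codon_to_aa):
    """Return RSCU per codon: the observed share of a codon within its
    amino acid divided by the share expected under equal usage
    (1 / number of synonymous codons)."""
    totals = defaultdict(int)   # total codon count per amino acid
    n_syn = defaultdict(int)    # number of synonymous codons per amino acid
    for codon, count in codon_counts.items():
        aa = codon_to_aa[codon]
        totals[aa] += count
        n_syn[aa] += 1
    result = {}
    for codon, count in codon_counts.items():
        aa = codon_to_aa[codon]
        if totals[aa] == 0:
            result[codon] = 0.0
        else:
            result[codon] = (count / totals[aa]) / (1 / n_syn[aa])
    return result

# Leucine codon counts as listed in Table 4 (hypothetical variable names)
leu_counts = {"CUA": 371, "CUC": 198, "CUG": 179,
              "CUU": 578, "UUA": 877, "UUG": 557}
leu_map = {codon: "Leu" for codon in leu_counts}
leu_rscu = rscu(leu_counts, leu_map)
```

With these counts, UUA has the highest RSCU among the leucine codons, in agreement with Table 4 to within rounding, and the six values sum to the number of synonymous codons.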

Analysis of repetitive sequences

Interspersed repetitive sequences are a type of repetitive sequence that differs from tandem repetitive sequences and shows a dispersed distribution in the genome. The vmatch v2.3.0 (http://www.vmatch.de/) software combined with a Perl script was used to identify repetitive sequences. The parameter settings were minimum length = 30 bp and Hamming distance = 3, and four repeat types were considered: forward, palindromic, reverse, and complement.
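The four repeat types and the Hamming distance threshold can be made concrete with a small Python sketch. It is not vmatch and does not search a genome; it only classifies two equal-length DNA segments that are already given. The function names and the short demonstration sequences are assumptions, and the 30 bp minimum length used in the study is not enforced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def classify_repeat(first, second, max_mismatch=3):
    """Return the repeat types whose transformed copy of `second`
    matches `first` within `max_mismatch` substitutions.

    F: forward, R: reverse, C: complement, P: palindromic (reverse complement).
    """
    candidates = {
        "F": second,
        "R": second[::-1],
        "C": second.translate(COMPLEMENT),
        "P": second[::-1].translate(COMPLEMENT),
    }
    return [kind for kind, seq in candidates.items()
            if hamming(first, seq) <= max_mismatch]

# Hypothetical short segments, for illustration only
unit = "ACGGTCATTG"
reverse_complement = unit[::-1].translate(COMPLEMENT)
palindromic_hit = classify_repeat(unit, reverse_complement, max_mismatch=0)
forward_hit = classify_repeat(unit, "ACGGTCATTA")  # one substitution
```

A pair may satisfy more than one type at a permissive threshold, which is why the function returns a list rather than a single label.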

SSR locus analysis

SSR markers are tandem repeat sequences whose repeat units comprise a few nucleotides (generally 1–6) and whose total length can reach dozens of nucleotides. SSR markers on the chloroplast genome are called cpSSR markers. The MISA v1.0 (MIcroSAtellite identification tool, https://webblast.ipk-gatersleben.de/misa/) software was used for cpSSR analysis, with parameters given as unit size–minimum number of repeats: 1–8 (a single base repeated 8 times or more), 2–5, 3–3, 4–3, 5–3, and 6–3.
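To show what the unit size and minimum repeat thresholds mean in practice, the following Python sketch scans a sequence with regular expressions. It is a simplified stand-in and not MISA's algorithm: compound SSRs are not merged, and a run claimed by a shorter motif can mask part of a longer one. The function names and the test sequence are hypothetical.

```python
import re

# Unit size -> minimum number of repeats, as in the parameters above
MIN_REPEATS = {1: 8, 2: 5, 3: 3, 4: 3, 5: 3, 6: 3}

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit
    (for example, 'AA' is just the mononucleotide 'A' repeated)."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k)
                   for k in range(1, n))

def find_ssrs(seq):
    """Return sorted (1-based start, motif, repeat count) tuples."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                repeats = len(match.group(0)) // unit_len
                hits.append((match.start() + 1, motif, repeats))
    return sorted(hits)

# Hypothetical sequence with one mono-, one di-, and one trinucleotide SSR
demo = "GC" + "A" * 10 + "CGC" + "AT" * 5 + "GG" + "TTC" * 3 + "G"
demo_hits = find_ssrs(demo)
```

The primitive-motif check keeps a run such as ten adenines from being reported a second time as a dinucleotide or trinucleotide repeat.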

Boundary analysis

The chloroplast genome has a circular structure with four boundaries between the inverted repeat (IR) regions and the long single-copy (LSC) and short single-copy (SSC) regions, namely LSC-IRb, IRb-SSC, SSC-IRa, and IRa-LSC. During genome evolution, the IR boundaries have expanded and contracted, causing certain genes to extend into the IR region or the single-copy region. The cloud platform tool CPJSdraw (http://cloud.genepioneer.com:9929/#/tool/alltool/detail/296) was used to visualize and compare the boundaries in the whole chloroplast genome sequences of Camellia albosericea and five species of Theaceae obtained from NCBI: Camellia chekiangoleosa (MG431968.1), Camellia japonica (KU951523.1), Camellia jinshajiangica (OQ731945.1), Camellia reticulata (KJ806278.1), and Schima superba (OL449817.1).

Phylogenetic analysis

For phylogenetic analysis based on the whole chloroplast genome, we downloaded from NCBI the whole chloroplast genome sequences of 25 species of the Camellia subgenus of Theaceae, 2 species of the Camellia oleifera subgenus, and 1 outgroup species, Schima superba, and analyzed them together with the whole chloroplast genome sequence of C. albosericea. Subgenus refers to a taxonomic rank between genus and section, which groups closely related sections within a genus based on shared morphological and genetic characteristics. The sequences, set to the same starting point because the genomes are circular, were aligned using MAFFT v7.427 (--auto mode), and the aligned data were analyzed using RAxML v8.2.10 (https://cme.h-its.org/exelixis/software.html). A maximum likelihood (ML) tree was constructed under the GTRGAMMA model using rapid bootstrap analysis with 1,000 bootstrap replicates.
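Because a circular genome can be linearized at any position, the sequences need a common starting point before alignment. The sketch below shows one possible way to do this in Python by rotating each sequence to begin at a shared anchor subsequence. The text does not state which anchor or tool was used, so the function name, the anchor, and the toy sequences are all assumptions; strand orientation (reverse complement) is not handled.

```python
def rotate_to_anchor(seq, anchor):
    """Rotate a circular sequence so that it begins with `anchor`.

    The sequence is extended by len(anchor) - 1 bases from its own start
    so that an anchor spanning the linearization point is still found.
    """
    if not anchor:
        raise ValueError("anchor must be non-empty")
    extended = seq + seq[: len(anchor) - 1]
    index = extended.find(anchor)
    if index == -1:
        raise ValueError("anchor not found in sequence")
    return seq[index:] + seq[:index]

# Two linearizations of the same hypothetical circular molecule
linear_a = "GGCATTA"   # the anchor spans the end and the start here
linear_b = "CATTAGG"
rotated_a = rotate_to_anchor(linear_a, "TAGG")
rotated_b = rotate_to_anchor(linear_b, "TAGG")
```

After rotation, both linearizations yield the same string, so a subsequent multiple sequence alignment is not distorted by arbitrary start positions.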
Because the 29 chloroplast genome sequences did not reach saturation, a phylogenetic tree was reconstructed using the ML method in IQ-TREE v1.6.12 (Chernomor et al., 2016). The optimal model (GTR + I + G) was identified using MrModeltest v2.3 (Nylander, 2004), and a Bayesian inference phylogenetic tree was subsequently reconstructed using MrBayes v3.2.7 (Ronquist and Huelsenbeck, 2003). The genetic distances of the 13 sect. Tuberculata chloroplast genomes after alignment were calculated by MEGA11 with 1,000 bootstrap replicates.

RESULTS

Genome content and organization

The chloroplast genome of Camellia albosericea has a typical quadripartite organization, with a total length of 156,944 bp, comprising a pair of inverted repeats (IRA and IRB, 26,045 bp each), an SSC (18,194 bp), and an LSC (86,660 bp) (Fig. 1). The overall GC content of the chloroplast genome of C. albosericea is 37.30%; however, the GC contents of its three regions (SSC, LSC, and IR) differ markedly. GC content is highest in the IR region at 42.98%, followed by the LSC region at 35.29%, and lowest in the SSC region at 30.59% (Table 1).
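The region lengths and GC contents reported above can be cross-checked with a short Python sketch. It assumes the regions are ordered LSC, IRb, SSC, IRa with the LSC starting at position 1; this order is an assumption, although it agrees with the start positions of the 26,045 bp palindromic repeat listed in Table 5. The function names are hypothetical.

```python
# (name, length in bp, GC %) taken from the text and Table 1
REGIONS = [
    ("LSC", 86_660, 35.29),
    ("IRb", 26_045, 42.98),
    ("SSC", 18_194, 30.59),
    ("IRa", 26_045, 42.98),
]

def region_coordinates(regions):
    """Return 1-based inclusive (start, end) coordinates per region,
    assuming the regions are laid out consecutively in the given order."""
    coords = {}
    start = 1
    for name, length, _gc in regions:
        coords[name] = (start, start + length - 1)
        start += length
    return coords

def weighted_gc(regions):
    """Length-weighted mean GC content (%) over all regions."""
    total = sum(length for _name, length, _gc in regions)
    return sum(length * gc for _name, length, gc in regions) / total

coords = region_coordinates(REGIONS)
total_length = coords["IRa"][1]
overall_gc = round(weighted_gc(REGIONS), 2)
```

Under this layout the four regions sum to the reported genome length, the two IR copies start at the positions given for repeat 1 in Table 5, and the length-weighted GC content rounds to the reported overall value.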

Gene composition

The chloroplast genome of C. albosericea contains 134 genes, including 88 protein-coding genes, 37 tRNA genes, 8 rRNA genes, and 1 pseudogene. The 88 protein-coding genes can be categorized into four major groups. Category 1 includes 44 genes related to photosynthesis: 5 photosystem I genes, 15 photosystem II genes, 11 NADH dehydrogenase genes, 6 cytochrome b/f complex genes, 6 ATP synthase genes, and 1 rubisco large subunit gene. Category 2 has 25 genes related to self-replication, including 4 RNA polymerase subunit genes, 9 ribosomal protein large subunit genes, and 12 ribosomal protein small subunit genes, in addition to rRNA genes (4) and tRNA genes (30). Category 3 has 6 genes encoding other proteins. Category 4 has 7 genes with unknown functions (Table 2). Introns are crucial for the regulation of gene expression. In the chloroplast genome of C. albosericea, 12 protein-coding genes and 6 tRNA genes contained introns. Of these 18 genes, clpP, ycf3, and rps12 each contained 2 introns, and the remaining 15 genes contained 1 intron each, although intron sizes varied. Among them, trnK-UUU, within which the matK protein-coding gene is located, possessed the largest intron (2,493 bp) (Table 3).

1. Codon usage bias

A total of 26,674 codons were predicted in the protein-coding sequences of the C. albosericea chloroplast genome, of which the largest number (2,760, 10.34%) encoded leucine (Leu) and the smallest number (293, 1.10%) encoded cysteine (Cys). The chloroplast genome of C. albosericea encodes 20 amino acids, each with 2–7 synonymous codons except tryptophan (Trp), which uses a single codon, UGG. The RSCU values are shown in Table 4. There were 31 codons with RSCU values greater than 1.00, of which 29 ended in A or U and 2 ended in G, indicating that codons in the chloroplast genome of C. albosericea preferentially end in A or U (Table 4).

2. Repeat sequences

Forty-three interspersed repeat sequences were predicted in the chloroplast genome of C. albosericea (Table 5). Among them, 21 (48.84%) were forward repeats (F) and 22 (51.16%) were palindromic repeats (P); no reverse repeats (R) or complementary repeats (C) were found. Repeat sequences were distributed in spacer regions (petD, psbN), gene coding regions (psaA, psaB, ycf2), transfer RNAs (trnS-GCU, trnS-UGA, trnS-GGA), and intron regions of the ycf3 gene. Multiple nested sequence repeats were found, especially in ycf2.

3. SSR

SSR loci in the chloroplast genome of C. albosericea were examined, and 244 loci were found (Fig. 2, Table 6), comprising mononucleotide to tetranucleotide repeats; pentanucleotide repeats and complex nucleotide repeats were not found. There were 157 mononucleotide repeats, mainly of the A/T type, with fewer of the G/C type. There were 4 dinucleotide, 71 trinucleotide, and 12 tetranucleotide repeats. As shown in Fig. 2, mononucleotide repeats accounted for the largest share (64.34%) and dinucleotide repeats for the smallest (1.64%). Most SSRs were located in the LSC region (145), with fewer in the SSC region (51) and IR region (48). By position, 114 SSRs (46.72%) were located in intergenic regions, 93 (38.11%) in coding regions, and 37 (15.16%) in introns, confirming that the SSRs were predominantly distributed in the intergenic spacer regions (Table 6).

IR expansion and contraction

Although chloroplast genes evolve relatively slowly and show conserved sequence and structure, contraction and expansion of the IR boundaries are common. In this study, we downloaded five complete chloroplast genome sequences of Theaceae species from NCBI to assess contraction and expansion of the IR boundaries; a schematic comparison with C. albosericea is shown in Fig. 3. The rps19 gene of Camellia chekiangoleosa is located entirely in the LSC region, whereas the rps19 genes of the other five species are all located at the LSC-IRb boundary. In most of these species, rps19 spans the boundary with 233 bp in the LSC and 46 bp in IRb; the exception is Schima superba, with 273 bp in the LSC and 6 bp in IRb. Except for C. chekiangoleosa, the rpl2 genes of the other five species are all located in the IR region; the rpl2 gene of C. chekiangoleosa is 1,487 bp long, whereas those of the other species are all 1,495 bp. The rpl2 gene of Schima superba is located 65 bp from the LSC-IRb boundary, whereas those of the other four species are all 106 bp from it.
The ndhF gene is protein-coding and is located in the SSC region close to the IRb region. The ndhF genes of C. albosericea, C. jinshajiangica, and C. reticulata span the IRb and SSC regions. The ndhF genes of the remaining species are all located in the SSC region. The ndhF genes of C. chekiangoleosa, C. japonica, and Schima superba are 64 bp, 64 bp, and 0 bp away from the boundary, respectively.
In the IRb-SSC boundary region, the ycf1 genes of the tested species expand toward the IRa region. C. chekiangoleosa and C. japonica have the shortest length of 967 bp. The lengths of the ycf1 genes of C. albosericea, C. jinshajiangica, and C. reticulata are all 1,049 bp. The length of the ycf1 gene of Schima superba is the longest at 1,394 bp. Similarly, in the SSC-IRa boundary region, the ycf1 genes of all tested species expand toward the SSC region. The lengths of the ycf1 genes of C. albosericea, C. chekiangoleosa, C. japonica, C. jinshajiangica, C. reticulata, and Schima superba are 5,604, 5,625, 5,625, 5,610, 5,616, and 5,658 bp, respectively.
The trnH genes of all six species are located in the LSC region, close to the IRa region. The trnH gene is 160 bp from the LSC-IRa boundary in C. chekiangoleosa, 0 bp in C. japonica, and 14 bp in Schima superba; in the other three species, it is 1 bp from the boundary.

Phylogenetic analysis

Cluster analysis of the sequences of C. albosericea and the 28 species of Theaceae downloaded from NCBI showed that C. albosericea and C. borealiyunnanica clustered into one branch (Fig. 4).

DISCUSSION

The GC content of the chloroplast genome of Camellia albosericea is 37.30%, consistent with the total GC content of the chloroplast genome of plants in the Theaceae family, which is approximately 37% (Zheng et al., 2022; Xiao et al., 2025). However, the GC content in different regions (SSC, LSC, IR) showed obvious differences. Among them, the GC content in the IR region was the highest at 42.98%, which is similar to Camellia polyodonta (Tong et al., 2022) and Camellia trichosperma (Zheng et al., 2022) (the GC content in the IR region is 42.98% and 43%, respectively), followed by the LSC region; the GC content in the SSC region was the lowest.
The chloroplast genome of C. albosericea exhibited a typical quadripartite structure, consistent with the chloroplast genome structures of other reported plants. The whole chloroplast genome of C. albosericea is predicted to contain 134 genes (including 88 CDS genes, 37 tRNA genes, 8 rRNA genes, and 1 pseudogene). This is similar to the results for other Camellia species with slightly different gene types and quantities (Xiao et al., 2025).
Comparison of the boundary structures of the LSC, SSC, and IR regions showed that the ycf1, rpl2, and ndhF genes at the IR and SSC boundaries differ in sequence length among species. There was a difference in the length of the ycf1 gene between C. albosericea and C. reticulata. In the IRb-SSC boundary region, the length of the ycf1 gene of C. albosericea was 1,061 bp and expanded by 12 bp to the SSC region, while that of the ycf1 gene of C. reticulata was 1,059 bp, expanding by 10 bp to the SSC region. In the SSC-IRa boundary region, the lengths of the ycf1 genes of C. albosericea and C. reticulata were 5,604 bp and 5,616 bp, respectively. The ycf1 gene can be used as a DNA barcode for plants of sect. Camellia.
Among the 244 SSR loci detected in the chloroplast genome of C. albosericea, 59.43%, 20.90%, and 19.67% of the loci are located in the LSC, SSC, and IR regions, respectively. The proportion of SSR loci in different regions differs from the results for chloroplast genome repeat sequences of other Camellia plants (Yin et al., 2018; Tong et al., 2022; Zheng et al., 2022). This discrepancy may be attributed to interspecific differences in chloroplast genomes and variations in parameter settings for SSR locus retrieval. Primers can be designed for these SSR loci and, after verification, effective molecular marker loci can be identified and further used for phylogenetic analysis.
In this study, 31 codons had RSCU greater than 1.00 in C. albosericea. Among them, 29 ended with A or U, and 2 ended with G, consistent with findings in most angiosperms, which all prefer to use codons ending with A/U (Tong et al., 2022). This may be attributed to the major role of natural selection.
C. albosericea and C. borealiyunnanica were clustered into one branch. C. albosericea was not clustered with C. reticulata. According to Chang Hongda’s classification, plants in sect. Camellia are divided into two subsections: subsect. Reticulata Chang and subsect. Lucidissima Chang. As proposed by Chang (1998), subsect. Reticulata Chang comprises two series: ser. Villosae Chang and ser. Reticulatae Chang. Specifically, C. albosericea and C. borealiyunnanica belong to the latter (ser. Reticulatae Chang).
The chloroplast genomes of Camellia show high genetic diversity. Conducting species identification and phylogenetic studies is feasible to resolve discrepancies within the genus Camellia (Yang et al., 2013; Huang et al., 2014). However, classification studies on Camellia plants in terms of traditional morphology (Ran et al., 2024b), anatomy (Pi et al., 2009), cytology, and molecular systematics cannot be ignored (Gong et al., 2022; Pang et al., 2022; Wu et al., 2022; Zhao et al., 2023). Since higher plants have three sets of genomes (nuclear, chloroplast, and mitochondrial) (Liang et al., 2025), relying solely on the chloroplast genome for phylogenetic analysis has inherent limitations. Therefore, future studies should utilize classic morphological classification and combine nuclear genome and cytoplasmic genome data to establish a more comprehensive, systematic, and authoritative classification system for the genus Camellia.

NOTES

ACKNOWLEDGMENTS
This research was jointly funded by the Major Project of Flower Breeding in the New Agricultural Varieties of Zhejiang Province (2021C02071-5), the Public Welfare Technology Application Research Project of Jinhua City (2022-4-010), and the Major Science and Technology Project of Xiaoshan District (2021223). Xiao-cong Hu and Liang Jin conceived the study; Xiao-cong Hu, Ke Wang, and Liang Jin performed most of the experiments; Liang Jin conducted the data analysis; Xiao-cong Hu and Ke Wang assisted in experiments and discussed the results; Liang Jin drafted the manuscript; Xiao-cong Hu and Ke Wang revised the manuscript. All authors provided comments and final approval.
CONFLICTS OF INTEREST
The authors declare that they have no conflict of interest.

Fig. 1
Complete chloroplast genome map of Camellia albosericea. Note: Genes encoded in the forward direction are located on the outer side of the circle, and genes encoded in the reverse direction are located on the inner side of the circle. The inner gray circle represents the GC content.
Fig. 2
Chloroplast simple sequence repeats (cpSSRs) of Camellia albosericea.
Fig. 3
Comparison of the quadripartite borders of chloroplast genomes of six species from Theaceae.
Fig. 4
Maximum likelihood (ML) phylogenetic tree of 28 Camellia species based on complete chloroplast genomes. The tree was constructed using RAxML with the GTRGAMMA model and 1,000 bootstrap replicates. Values above or below each branch are bootstrap support values (only values >50% are shown), which indicate the reliability of the corresponding branch. C. albosericea is clustered with C. borealiyunnanica, suggesting a close evolutionary relationship between the two species. Schima superba was used as the outgroup.
Table 1
Base composition of the chloroplast genome of Camellia albosericea.
Region A (%) T (%) C (%) G (%) GC (%) Length (bp)
Chloroplast genome 31.09 31.62 19.01 18.28 37.30 156,944
Small single-copy region, SSC 34.64 34.77 16.11 14.48 30.59 18,194
Large single-copy region, LSC 31.89 32.82 18.13 17.16 35.29 86,660
Inverted repeated region A, IRA 28.49 28.54 22.22 20.75 42.98 26,045
Inverted repeated region B, IRB 28.54 28.49 20.75 22.22 42.98 26,045

SSC, short single-copy; LSC, long single-copy; IR, inverted repeat.

Table 2
Genes in the chloroplast genome of Camellia albosericea.
Gene function classification Group of genes No. of genes Gene name
Photosynthesis Subunits of photosystem I 5 psaA, psaB, psaC, psaI, psaJ
Subunits of photosystem II 15 psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Subunits of NADH dehydrogenase 11 ndhA*, ndhB*(2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Subunits of cytochrome b/f complex 6 petA, petB*, petD*, petG, petL, petN
Subunits of ATP synthase 6 atpA, atpB, atpE, atpF*, atpH, atpI
Large subunit of rubisco 1 rbcL
Self-replication Proteins of large ribosomal subunit 9 rpl14, rpl16*, rpl2*(2), rpl20, rpl22, rpl23(2), rpl32, rpl33, rpl36
Proteins of small ribosomal subunit 12 rps11, rps12**(2), rps14, rps15, rps16*, rps18, rps19, rps2, rps3, rps4, rps7(2), rps8
Subunits of RNA polymerase 4 rpoA, rpoB, rpoC1*, rpoC2
Ribosomal RNAs 4 rrn16(2), rrn23(2), rrn4.5(2), rrn5(2)
Transfer RNAs 30 trnA-UGC*(2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG*, trnG-UCC, trnH-GUG, trnI-CAU(2), trnI-GAU*(2), trnK-UUU*, trnL-CAA(2), trnL-UAA*, trnL-UAG, trnM-CAU, trnN-GUU(2), trnP-UGG, trnQ-UUG, trnR-ACG(2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC(2), trnV-UAC*, trnW-CCA, trnY-GUA, trnfM-CAU
Other genes Maturase 1 matK
Protease 1 clpP**
Envelope membrane protein 1 cemA
Acetyl-CoA carboxylase 1 accD
c-type cytochrome synthesis gene 1 ccsA
Translation initiation factor 1 infA
Unknown function Conserved hypothetical chloroplast ORF 7 #ycf1, lhbA, ycf1, ycf15(2), ycf2(2), ycf3**, ycf4

Gene*, gene with one intron; Gene**, gene with two introns.

# Gene, pseudogene; Gene(2), number of copies of multi-copy genes.

Table 3
Lengths and locations of introns in chloroplast genome of Camellia albosericea.
Gene Location Exon I (bp) Intron I (bp) Exon II (bp) Intron II (bp) Exon III (bp)
trnK-UUU LSC 37 2,487 35 - -
rps16 LSC 39 857 216 - -
trnG LSC 23 690 48 - -
atpF LSC 159 704 408 - -
rpoC1 LSC 435 732 1,626 - -
ycf3 LSC 126 738 228 722 153
trnL-UAA LSC 37 519 50 - -
trnV-UAC LSC 39 586 37 - -
rps12 IRa 114 - 232 538 26
clpP LSC 69 539 291 800 285
petB LSC 6 762 657 - -
petD LSC 9 696 525 - -
rpl16 LSC 9 1,018 402 - -
rpl2 IRb 393 667 435 - -
ndhB IRb 777 679 756 - -
rps12 IRb 232 - 26 538 114
trnI-GAU IRb 42 942 35 - -
trnA-UGC IRb 38 812 35 - -
ndhA SSC 552 1,086 540 - -
trnA-UGC IRa 38 812 35 - -
trnI-GAU IRa 42 942 35 - -
ndhB IRa 777 679 756 - -
rpl2 IRa 393 667 435 - -

LSC, long single-copy; IR, inverted repeat; SSC, short single-copy.

Table 4
Relative synonymous codon usage (RSCU) of amino acids of Camellia albosericea.
Amino acid Codon No. RSCU Amino acid Codon No. RSCU
Ter UAA 42 1.4319 Met AUU 1 0.0105
UAG 22 0.7500 CUG 0 0
UGA 24 0.8181 GUG 1 0.0105
Ala GCA 399 1.1440 UUG 0 0
GCC 224 0.6424 Asn AAC 300 0.4608
GCG 133 0.3812 AAU 1,002 1.5392
GCU 639 1.8324 Pro CCA 330 1.1808
Cys UGC 69 0.4710 CCC 198 0.7084
UGU 224 1.5290 CCG 139 0.4972
Asp GAC 202 0.3652 CCU 451 1.6136
GAU 904 1.6348 Gln CAA 715 1.5294
Glu GAA 1,032 1.5066 CAG 220 0.4706
GAG 338 0.4934 Arg AGA 498 1.8732
Phe UUC 544 0.7130 AGG 164 0.6168
UUU 982 1.2870 CGA 378 1.4220
Gly GGA 737 1.6416 CGC 87 0.3270
GGC 184 0.4096 CGG 114 0.4290
GGG 301 0.6704 CGU 354 1.3314
GGU 574 1.2784 Ser AGC 116 0.3324
His CAC 137 0.4254 AGU 428 1.2276
CAU 507 1.5746 UCA 415 1.1904
Ile AUA 737 0.9555 UCC 330 0.9462
AUC 464 0.6015 UCG 178 0.5106
AUU 1,113 1.4430 UCU 625 1.7928
Lys AAA 1,077 1.4886 Thr ACA 406 1.2200
AAG 370 0.5114 ACC 243 0.7304
Leu CUA 371 0.8064 ACG 137 0.4116
CUC 198 0.4302 ACU 545 1.6380
CUG 179 0.3894 Val GUA 540 1.5136
CUU 578 1.2564 GUC 161 0.4512
UUA 877 1.9068 GUG 195 0.5468
UUG 557 1.2108 GUU 531 1.4884
Met AUA 1 0.0105 Trp UGG 483 1.0000
AUC 0 0 Tyr UAC 195 0.3900
AUG 649 6.9678 UAU 805 1.6100
Table 5
Repetitive sequences of chloroplast genome of Camellia albosericea.
Repeat I start Repeat II start Type Size (bp) Distance Gene Region
1 86,661 130,900 P 26,045 0 - IR
2 93,919 93,937 F 82 −3 ycf2; ycf2 IRb; IRb
3 93,919 149,587 P 82 −3 ycf2; ycf2 IRb; IRa
4 93,937 149,605 P 82 −3 ycf2; ycf2 IRb; IRa
5 149,587 149,605 F 82 −3 ycf2; ycf2 IRa; IRa
6 93,931 93,949 F 70 −2 ycf2; ycf2 IRb; IRb
7 93,931 149,587 P 70 −2 ycf2; ycf2 IRb; IRa
8 93,949 149,605 P 70 −2 ycf2; ycf2 IRb; IRa
9 93,941 93,959 F 60 −1 ycf2; ycf2 IRb; IRb
10 93,941 149,587 P 60 −1 ycf2; ycf2 IRb; IRa
11 93,959 149,605 P 60 −1 ycf2; ycf2 IRb; IRa
12 93,919 93,955 F 60 −3 ycf2; ycf2 IRb; IRb
13 93,919 149,591 P 60 −3 ycf2; ycf2 IRb; IRa
14 93,955 149,627 P 60 −3 ycf2; ycf2 IRb; IRa
15 149,587 149,623 F 60 −3 ycf2; ycf2 IRa; IRa
16 61,207 61,207 P 57 −1 IGS LSC; LSC
17 93,931 93,967 F 52 −2 ycf2; ycf2 IRb; IRb
18 69,754 69,804 F 50 0 IGS LSC; LSC
19 76,774 76,774 P 50 −2 psbN; psbN LSC; LSC
20 79,223 79,223 P 46 0 petD; petD LSC; LSC
21 101,002 122,798 F 42 0 IGS; ndhA IRb; SSC
22 122,798 142,562 P 42 0 ndhA; IGS SSC; IRa
23 93,941 93,977 F 42 −1 ycf2; ycf2 IRb; IRb
24 45,560 122,797 F 42 −3 ycf3; ndhA LSC; SSC
25 93,919 93,973 F 42 −3 ycf2; ycf2 IRb; IRb
26 149,587 149,641 F 42 −3 ycf2; ycf2 IRa; IRa
27 45,563 101,004 F 39 −2 ycf3; IGS LSC; IRb
28 45,563 142,563 P 39 −2 ycf3; IGS LSC; IRa
29 40,523 42,747 F 35 −3 psaB; psaA LSC; LSC
30 93,931 93,985 F 34 −2 ycf2; ycf2 IRb; IRb
31 9,038 37,363 F 32 −3 trnS-GCU; trnS-UGA LSC; LSC
32 38,688 38,688 P 31 −3 IGS LSC; LSC
33 9,040 47,304 P 30 0 trnS-GCU; trnS-GGA LSC; LSC
34 14,178 14,178 P 30 −2 IGS LSC; LSC
35 37,365 47,304 P 30 −3 trnS-UGA; trnS-GGA LSC; LSC
36 45,575 101,016 F 30 −3 ycf3; IGS LSC; IRb
37 45,575 142,560 P 30 −3 ycf3; IGS LSC; IRa
38 83,095 101,354 F 30 −3 IGS LSC; IRb
39 83,095 142,222 P 30 −3 IGS LSC; IRa
40 91,483 91,525 F 30 −3 ycf2; ycf2 IRb; IRb
41 91,483 152,051 P 30 −3 ycf2; ycf2 IRb; IRa
42 91,525 152,093 P 30 −3 ycf2; ycf2 IRb; IRa
43 152,051 152,093 F 30 −3 ycf2; ycf2 IRa; IRa

IGS, intergenic spacer; P, palindromic repeat; IR, inverted repeat; F, forward repeat; LSC, long single-copy; SSC, short single-copy.

Table 6
Distribution of cpSSRs in the chloroplast genome of Camellia albosericea.
Region Exon Intron Intergenic All Proportion (%)
LSC 36 21 88 145 59.43
SSC 31 6 14 51 20.90
IR 26 10 12 48 19.67
Total 93 37 114 244 100.00
Proportion (%) 38.11 15.16 46.72 100.00

cpSSR, chloroplast simple sequence repeat; LSC, long single-copy; SSC, short single-copy; IR, inverted repeat.
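The proportions in Table 6 follow directly from the counts. As a minimal Python sketch (the names `counts` and `percent` are illustrative, not taken from the authors' workflow), the row and column percentages can be recomputed like this:

```python
# SSR counts per region and feature class, copied from Table 6.
counts = {
    "LSC": {"Exon": 36, "Intron": 21, "Intergenic": 88},
    "SSC": {"Exon": 31, "Intron": 6, "Intergenic": 14},
    "IR": {"Exon": 26, "Intron": 10, "Intergenic": 12},
}
categories = ("Exon", "Intron", "Intergenic")


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Row totals (per region), column totals (per feature class), grand total.
region_totals = {region: sum(row.values()) for region, row in counts.items()}
category_totals = {c: sum(row[c] for row in counts.values()) for c in categories}
grand_total = sum(region_totals.values())

region_pct = {r: percent(t, grand_total) for r, t in region_totals.items()}
category_pct = {c: percent(t, grand_total) for c, t in category_totals.items()}

print(grand_total)   # 244
print(region_pct)    # {'LSC': 59.43, 'SSC': 20.9, 'IR': 19.67}
print(category_pct)  # {'Exon': 38.11, 'Intron': 15.16, 'Intergenic': 46.72}
```

The recomputed values agree with the published table: intergenic regions and the LSC carry the largest share of the 244 SSR loci.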

LITERATURE CITED

Chang, H. 1998. Flora Reipublicae Popularis Sinicae. 49(Part 3) (in Chinese). Retrieved Aug. 1, 2015, available from: https://www.iplant.cn/info/Camellia?t=z.

Chernomor, O., von Haeseler, A. and Minh, B. Q. 2016. Terrace aware data structure for phylogenomic inference from supermatrices. Systematic Biology 65: 997-1008.
Do, D. N., Luong, D. V., Nguyen, S. T., Le, C. D., Hoang, H. T., Han, J. E. and Park, H.-S. 2019. A new yellow Camellia (Theaceae) from central Vietnam. Korean Journal of Plant Taxonomy 49: 90-95.
Gong, W., Xiao, S., Wang, L., Liao, Z., Chang, Y., Mo, W., Hu, G., Li, W., Zhao, G., Zhu, H., Hu, X., Ji, K., Xiang, X., Song, Q., Yuan, D., Jin, S. and Zhang, L. 2022. Chromosome-level genome of Camellia lanceoleosa provides a valuable resource for understanding genome evolution and self-incompatibility. The Plant Journal 110: 881-898.
Huang, H., Shi, C., Liu, Y., Mao, S.-Y. and Gao, L.-Z. 2014. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: Genome structure and phylogenetic relationships. BMC Evolutionary Biology 14: 151.
Liang, H., Qi, H., Wang, C., Wang, Y., Liu, M., Chen, J., Sun, X., Xia, T., Feng, S., Chen, C. and Zheng, D. 2025. Analysis of the complete mitogenomes of three high economic value tea plants (Tea-oil Camellia) provide insights into evolution and phylogeny relationship. Frontiers in Plant Science 16: 1549185.
Liu, H., Zhang, Y., Xiang, G. and Zhang, H. 1991. New species of red Camellia from Sichuan. Acta Scientiarum Naturalium Universitatis Sunyatseni 30: 77-80 (in Chinese).

Ming, T. 1999. Systematic synopsis of the genus Camellia. Acta Botanica Yunnanica 21: 149-159 (in Chinese with English abstract).

Ming, T. and Zhang, W. 1996. The evolution and distribution of genus Camellia. Plant Diversity (Acta Bot. Yunn.) 18: 110-114 (in Chinese).

Ming, T. L. and Bartholomew, B. 2007. Theaceae. Flora of China. Vol. 12. Hippocastanaceae through Theaceae. Science Press, Beijing and Missouri Botanical Garden Press, St. Louis, MO. Pp. 366-478.

Nylander, J. A. A. 2004. MrModeltest v2. Program distributed by the author. Evolutionary Biology Centre, Uppsala University.

Pang, Z., Wang, Y.-L., Mantri, N., Wang, Y., Hua, X.-J., Quan, Y.-P., Zhou, X., Jiang, Z.-D., Qi, Z.-C. and Lu, H.-F. 2022. Molecular phylogenetic relationships and taxonomy position of 161 Camellia species in China. Taiwania 67: 560-570.

Pi, E., Peng, Q., Lu, H., Shen, J., Du, Y., Huang, F. and Hu, H. 2009. Leaf morphology and anatomy of Camellia section Camellia (Theaceae). Botanical Journal of the Linnean Society 159: 456-476.
Pogson, B. J., Ganguly, D. and Albrecht-Borth, V. 2015. Insights into chloroplast biogenesis and development. Biochimica et Biophysica Acta (BBA) - Bioenergetics 1847: 1017-1024.
Ran, Z., Li, Z., Xiao, X., An, M. and Yan, C. 2024a. Complete chloroplast genomes of 13 species of sect. Tuberculata Chang (Camellia L.): Genomic features, comparative analysis, and phylogenetic relationships. BMC Genomics 25: 108.
Ran, Z., Li, Z., Xiao, X. and Tang, M. 2024b. Camellia neriifolia and Camellia ilicifolia (Theaceae) as separate species: Evidence from morphology, anatomy, palynology, molecular systematics. Botanical Studies 65: 23.
Ronquist, F. and Huelsenbeck, J. P. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572-1574.
Tong, Y. H., Zheng, Q., Du, X. M., Feng, S. L., Zhou, L., Ding, C. B. and Chen, T. 2022. Analysis on sequence characteristics of chloroplast genome of Camellia polyodonta. Journal of Plant Resources and Environment 31: 27-36 (in Chinese with English abstract).

Wu, Q., Tong, W., Zhao, H., Ge, R., Li, R., Huang, J., Li, F., Wang, Y., Mallano, A. I., Deng, W., Wang, W., Wan, X., Zhang, Z. and Xia, E. 2022. Comparative transcriptomic analysis unveils the deep phylogeny and secondary metabolite evolution of 116 Camellia plants. The Plant Journal 111: 406-421.
Xiao, X., Chen, J., Ran, Z., Huang, L. and Li, Z. 2025. Comparative analysis of complete chloroplast genomes and phylogenetic relationships of 21 Sect. Camellia (Camellia L.) plants. Genes 16: 49.
Yang, J.-B., Yang, S.-X., Li, H.-T., Yang, J. and Li, D.-Z. 2013. Comparative chloroplast genomes of Camellia species. PLoS ONE 8: e73053.
Yin, X., Wen, Q., Wang, J., Li, T., Ye, J. and Xu, L. 2018. Characterization of microsatellites in complete chloroplast genome of the genus Camellia and marker development. Molecular Plant Breeding 16: 6761-6769 (in Chinese with English abstract).

Zhang, X., Liu, R. and Liu, B. 2023. Complete chloroplast genome sequence of Camellia caudata (Theaceae). Journal of Shanxi University (Natural Science Edition) 47: 464-467.

Zhao, D.-W., Hodkinson, T. R. and Parnell, J. A. N. 2023. Phylogenetics of global Camellia (Theaceae) based on three nuclear regions and its implications for systematics and evolutionary history. Journal of Systematics and Evolution 61: 356-368.
Zheng, Q., Tong, Y., Kong, Q., Feng, S., Zhou, L., Ding, C. and Chen, T. 2022. Characterization of complete chloroplast genome and phylogenetic analysis of Camellia trichosperma Chang. Journal of Sichuan Agricultural University 40: 574-582 (in Chinese with English abstract).

Editorial Office
Korean Journal of Plant Taxonomy
Department of Biology, Daejeon University, Daejeon 34520, Korea
TEL: +82-42-280-2434   E-mail: kjpt1968@gmail.com
Copyright © Korean Society of Plant Taxonomists.