DNA Barcoding of Plants
Generally partitioned into four major groups (angiosperms, bryophytes, ferns, gymnosperms), plants comprise more than 300,000 species globally and over 5,600 species in Canada. Plants account for over 90% of above-ground biomass, form the foundation of most terrestrial food webs, and are the basis for our agricultural and forest industries. They are also the engines of terrestrial ecosystems, driving the cycling of carbon, nutrients, and water, and plant species diversity may additionally influence ecosystem functions such as net primary production (Hector et al. 1999), nitrogen cycling (Hooper and Vitousek 1997), and hydrological processes (Jackson et al. 2001). Hence, knowledge of the distribution of plant species diversity is essential for understanding interactions among organisms and developing sound environmental management strategies.
As it stands, identifications based on flower, fruit, leaf, and stem morphology are a straightforward task for just one third of plant species. Many plant groups are inherently difficult to identify (e.g., grasses, sedges, goldenrods, quillworts, bryophytes) because they are most readily distinguishable by reproductive structures that are available for a short period each year. The taxonomic impediment caused by our inability to identify many components and life stages of plants – including roots, pollen, spores, seeds, seedlings, and the gametophyte stages of cryptogams – acts as a further barrier to our understanding of plant species diversity. Currently, the identification of these tissues or life stages is only possible for a few plant groups using highly skilled personnel and time-consuming procedures (e.g., cellular characterization, chemical tests). This strongly suggests that the future of plant identification lies in the development of DNA-based diagnostic systems for which the life stage or source tissue is irrelevant. The development of this diagnostic ability will facilitate many aspects of basic biological research as well as areas of practical importance including forensics, invasive species control, identification by poison centres, agriculture, forestry, and many others. In the following sections, we highlight some of the implications of developing a DNA-based system of this type.
Roots: As the primary sites of nutrient and water uptake, roots are critical controllers of productivity, biogeochemistry, and hydrology. However, the study of subterranean ecological patterns and processes is impeded by a lack of methods to identify roots to species (Linder et al. 2000). This limitation is particularly acute for understanding key ecosystem responses to biodiversity loss and global climate change, given that 70% of the carbon retained within terrestrial ecosystems is in soils. Even basic information about how different species contribute to below-ground net primary production is unavailable because roots cannot be identified reliably in the soil profile. This gap has limited our understanding of how diversity in root physiological functions contributes to key ecosystem processes, such as nutrient cycling and evapotranspiration. Yet, this information is critical for managing Canada’s forest resources. For instance, decisions about how to manage the productivity and carbon sequestration capacity of much of the boreal forest are currently based solely on information derived from study of the above-ground components of the ecosystem.
Pollen: Pollen, the male gametophyte (gamete-producing) stage of seed plants, must be dispersed for seed production to occur. As a result, mature pollen is often distant from its parent plant, making it difficult to identify to a species level. However, the ability to identify pollen grains to the species level would have significant benefits in varied contexts. For example, health officials could use such information to manage human exposure to allergens, and plant researchers could more readily assess the kinds of pollen available to pollinators, information that is central to the study of seed and fruit production in crops and wild species. Also, because pollen grains persist in lake sediments and peat bogs long after their deposition, pollen fossils are valuable indicators of past landscapes and climates. Pollen grains do vary in shape, size, and surface texture, thereby enabling their partial identification, but there is rarely sufficient variation to allow a taxonomic placement below the genus or family level. A DNA-based identification system based on single pollen typing (e.g., Matsunaga et al. 1999) would transform the fields of plant reproductive biology and palynology by providing unambiguous and comprehensive inventories of pollen.
Seeds and spores: Although seeds and spores present a significant challenge to identification, there is a need for this capability in many contexts. For example, the ecology of seed and spore banks is fundamental to the control of invasive species in Canadian ecosystems and to the control of weeds in agriculture. Unfortunately, identifications by gross morphology are only possible for about five percent of Canada’s flora. Current protocols for identifying other species in seed or spore banks require their culture on sterile media in a greenhouse or growth chamber. The process takes several months and is subject to many problems such as diverse germination requirements, so that results must be interpreted with great caution. DNA-based identifications of seeds would be quick and accurate, revolutionizing studies on seed banks.
Cryptogams: As their name might suggest, cryptogamic plants (bryophytes, ferns) have attracted little taxonomic interest, despite their outstanding potential to serve as bio-indicators. The Canadian flora includes nearly 200 species of ferns and their allies, along with about 1,200 species of mosses and liverworts in roughly 300 genera. Many of these species remain difficult to identify because their diagnosis relies on the inspection of cellular and morphological characters of gametophytes that can only be evaluated by experts.
Source of specimens: We will initially focus our work on the identification of genes enabling discrimination of the dominant plant species at the Joker’s Hill Reserve, a research facility overseen by the University of Toronto that is being developed for environmental genomics. The first phase of our study will focus on surveying sequence diversity in a variety of candidate genes using leaves, stems, or thalli from the dominant plants at this site to identify those providing the best diagnoses for each major group (angiosperms, bryophytes, ferns, gymnosperms). In the second phase, we will sample varied tissues (seeds, roots, spores, etc.) and life stages (gametophyte/sporophyte) to verify the ability of our DNA-based systems to deliver identifications for any plant part or life stage. Having established the feasibility of DNA barcodes for species identification within the restricted geographical locale of southern Ontario, we will subsequently extend sampling to include broader geographical scales.
Taxonomic diversity: Our work will begin with the identification of protocols enabling the discrimination of taxa using DNA identifications from leaf and stem tissues. Sampling will include 300 species belonging to diverse taxa. We will stratify our sample to include the maximum number of divisions, orders, families, and genera. Where possible, we will include several species within selected genera. For taxa at Joker’s Hill with several species in a genus (e.g. Carex, Solidago) we will extend our sampling to include additional species from diverse geographical locations. Voucher specimens will be prepared for all samples and a professional taxonomist will identify all species with expert confirmations sought for particularly difficult taxa. Rarity, exotic status of individual plants, species nomenclature, and coding will follow the Ontario Plant List (Newmaster et al. 1998).
Tissue characterization: The second phase of our work will extend analysis to include other plant tissues and life cycle stages. We will sample a variety of tissues from each plant at different stages of maturity (i.e., seedling, vegetative, flowering). Seeds and spores will be collected from species within the plots, and roots will be collected from plants excavated outside the plots. These tissues will come from positively identified species, which will be collected as vouchers. Biomass soil samples will be collected within destructive sampling plots for roots, seeds, and spores with an unknown identity, providing test samples for DNA identifications. This data set will allow us to characterize the number and spatial distribution of a variety of species using previously unidentifiable plant tissues. All tissues will be stored in the Herbarium at the Biodiversity Institute of Ontario.
Genetic approaches: Mitochondrial genes have not been widely used by plant biologists for systematic purposes because rates of mitochondrial sequence divergence are typically extremely low, around 1-2% of those in animals. In several plant groups, the synonymous substitution rate is vastly higher (e.g., Cho et al. 2005), but these cases are likely limited in number. There are also recent reports of very long distance horizontal gene transfer involving parts of the mitochondrial genome, including cytochrome oxidase subunit genes (Bergthorsson et al. 2004) and COI introns (Palmer et al. 2000). The slow mutational tempo and horizontal gene transfer in plants mean that mitochondrial markers are likely to have limited utility for barcoding plants.
Several research groups (Kew, Smithsonian) are now carrying out exploratory studies looking at short genomic regions for the purposes of plant identification. We will coordinate our barcoding efforts with these research groups, although it bears noting that these initial studies are focussing on genes widely used in plant molecular systematics, such as the nuclear ITS regions and a few chloroplast coding and noncoding regions. The ITS regions (and associated rDNA genes) suffer from practical difficulties associated with the existence of multiple paralogous copies in many plant taxa (Álvarez and Wendel, 2003), which will probably limit their utility as barcoding markers. Also, while it is not as slowly evolving as the plant mitochondrial genome, the chloroplast genome generally has a slow tempo of evolution (e.g., Cho et al. 2005), so barcoding based on this organelle will likely require substantially longer reads than are required for barcoding animals.
We will therefore screen multiple well-characterized chloroplast genes with a variety of evolutionary rates for their utility in barcoding (candidate coding regions include ndhF, matK, and rbcL), in addition to chloroplast noncoding regions with an elevated rate of substitution (relatively rapidly evolving candidate regions include trnD-trnT and rpoB-trnC; Shaw et al. 2005). Protein-coding chloroplast regions have the advantage that very large cross-taxon databases of these genes exist in GenBank, but the noncoding regions may more readily yield sequence variation among closely related species. We anticipate that a minimum of several thousand nucleotides of plastid DNA will be required per taxon, because of the (roughly) order of magnitude difference in evolutionary rates between chloroplast and animal mitochondrial genomes (e.g., Cho et al. 2005).
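The arithmetic behind the "several thousand nucleotides" estimate can be sketched as follows. This is a back-of-envelope illustration only: the 650 bp figure is the commonly cited length of the animal COI barcode region and does not come from this text, the tenfold ratio is a literal reading of "roughly an order of magnitude", and the function name is hypothetical.

```python
def required_read_length(reference_length_bp, rate_ratio):
    """Length a slower marker needs in order to accumulate, on average,
    as many substitutions as a faster reference marker of the given length.

    rate_ratio is (reference rate) / (slower marker rate).
    """
    if reference_length_bp <= 0 or rate_ratio <= 0:
        raise ValueError("length and rate ratio must be positive")
    return round(reference_length_bp * rate_ratio)


# Assumption: animal COI barcode region of about 650 bp (not stated in the text).
ANIMAL_BARCODE_BP = 650
# Assumption: "roughly an order of magnitude" read literally as a factor of 10.
RATE_RATIO = 10.0

print(required_read_length(ANIMAL_BARCODE_BP, RATE_RATIO))  # prints 6500
```

Under these assumptions the estimate lands in the range of several thousand nucleotides, consistent with the expectation stated above. The real requirement will depend on the regions chosen and on how rates vary among plant lineages.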
Relatively long sequence analyses of even the most rapidly evolving chloroplast genomic regions may still not provide sufficient discriminatory power to distinguish among closely related species (Small et al. 2004). We will therefore screen several low-copy nuclear genes for their utility as supplementary barcoding markers. Plant nuclear genes are enormously variable and fluid, with substantial gene duplication across different groups (Kellogg and Bennetzen, 2004), and so the development and use of single copy nuclear genes in plant systematics is still in its infancy (Small et al. 2004). However, an increasing number of candidate low-copy genes have been identified with utility as systematic markers, and we will focus our screening efforts on a subset of these. Candidate regions include the PHY (phytochrome) and RPB (RNA polymerase beta subunit) loci (e.g., Denton et al. 1999; Mathews and Donoghue, 1999; Nickerson and Drouin, 2004). Strategies for screening and assessing the utility of low-copy nuclear genes are outlined in Small et al. (2004). In addition to screening for levels of variation appropriate for distinguishing closely related species, candidate loci must also be assessed for ease of experimental processing, including the ability to distinguish orthologous vs. paralogous loci where multiple copies exist within a species due to ancestral duplications.
For maximum efficiency, we will sequence species in a hierarchical fashion. That is, we will begin by comparing sequences among a small number of species taken from taxonomically divergent groups in the sample. If sufficient divergence is found, we will then proceed to progressively finer taxonomic scales, including multiple species in each of several genera. If more sequence information is required for species-level diagnoses, we will extend our analyses to include additional genes.
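The hierarchical strategy above might be sketched as follows. This is a minimal illustration rather than our actual pipeline: it assumes pre-aligned sequences, uses the simple proportion of differing sites (p-distance) as the divergence measure, and all marker names, species labels, sequences, and the threshold are hypothetical toy values.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring sites where either sequence has a gap or ambiguous base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def min_pairwise_distance(seqs):
    """Smallest p-distance over all pairs of species in a {species: seq} dict."""
    return min(p_distance(seqs[s], seqs[t]) for s, t in combinations(seqs, 2))


def select_markers(species, markers, threshold):
    """Add markers in order, concatenating their sequences, until every pair
    of species differs by at least `threshold`. Returns the marker names
    used, or None if the available markers are not sufficient."""
    used = []
    concatenated = {sp: "" for sp in species}
    for name, alignment in markers:
        used.append(name)
        for sp in species:
            concatenated[sp] += alignment[sp]
        if min_pairwise_distance(concatenated) >= threshold:
            return used
    return None


# Hypothetical toy data: two markers, three species.
MARKERS = [
    ("geneA", {"sp1": "ACGTACGTAC", "sp2": "ACGTACGTAC", "sp3": "ACGTTCGTAC"}),
    ("geneB", {"sp1": "GGGGAAAACC", "sp2": "GGGTAAAACC", "sp3": "GGGGAAAACC"}),
]

# Coarse tier: two divergent species are separated by the first marker alone.
print(select_markers(["sp1", "sp3"], MARKERS, 0.04))         # ['geneA']
# Finer tier: a close pair (sp1, sp2) requires a second marker.
print(select_markers(["sp1", "sp2", "sp3"], MARKERS, 0.04))  # ['geneA', 'geneB']
```

In practice the divergence measure, the threshold, and the order in which markers are tried would all need to be chosen empirically for each major plant group.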
Our work on land plants will involve four individuals. Three of them will lead the selection of genes for species identification, and their labs all bring expertise in plant molecular systematics: Spencer Barrett (Toronto), a CRC Chair in Evolutionary Genetics; Brian Husband (Guelph), who holds a CRC in Plant Evolutionary Ecology; and Sean Graham (UBC), whose group has extensive experience in high-throughput land-plant molecular phylogenetics. The fourth, Steve Newmaster (Guelph), an expert on the Ontario flora (Newmaster et al. 1998), will lead the sampling and identification of all plants and will lead our work on cryptogams. As Curator of the Herbarium within the Biodiversity Institute of Ontario, he will also manage the tissue collections, vouchers, and accessions for this project.