DNA Barcoding: Solving Old and New Problems in Biology

A History of DNA Barcoding

DNA Barcodes: The beginning

Paul Hebert of the University of Guelph coined the term DNA barcode to mean a unique DNA sequence that identifies each living thing in the same way that the unique pattern of bars in a Universal Product Code (UPC) identifies each consumer product. DNA barcoding relies on short, highly variable regions of the mitochondrial and chloroplast genomes.

For barcoding plants, a region of the chloroplast gene rbcL (RuBisCo large subunit) is used. The most abundant protein on Earth, RuBisCo (ribulose-1,5-bisphosphate carboxylase oxygenase) catalyzes the first step of carbon fixation. Carolina provides Plant rbcL Primer Sets and Fungal Primer Sets for worry-free analysis of plant and fungal DNA. A region of the mitochondrial gene COI (cytochrome c oxidase subunit I) is used for barcoding animals. Cytochrome c oxidase is involved in the electron transport phase of respiration. Carolina also provides a Fish COI Primer Set and Insect/Mammal Primer Set for barcoding animals. Thus, the genes used in DNA barcoding are involved in the key reactions of life: storing energy in carbohydrates and releasing it to form ATP.

Categorizing life on Earth

The International Barcode of Life (iBOL) was launched at meetings held at Cold Spring Harbor Laboratory in 2003 as a megaproject to identify and categorize all life on Earth. Today, iBOL organizes collaborators from more than 150 countries to participate in a variety of “campaigns.” These campaigns survey diversity among groups of organisms (including ants, bees, butterflies, fish, birds, mammals, fungi, and flowering plants) and within ecosystems (including the seas, poles, rain forests, kelp forests, and coral reefs).

We know very little about the diversity of plants and animals, let alone microbes, living in many unique ecosystems on Earth. Fewer than 2 million of the estimated 5 million to 50 million plant and animal species have been identified; more than half of known species are insects. Scientists agree that the yearly rate of extinction has increased from about 1 species per million to between 100 and 1,000 species per million. This means that thousands of plants and animals are potentially lost each year. Most of these have not yet been identified.
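The arithmetic behind "thousands lost each year" can be checked with a short Python sketch. It uses only the figures quoted above (5 million to 50 million species, 100 to 1,000 extinctions per million species per year); the function and variable names are illustrative assumptions, and the result is a rough range of about 500 to 50,000 species per year, not a measured value.

```python
# Figures quoted in the paragraph above; names are illustrative only.
SPECIES_ESTIMATES = (5_000_000, 50_000_000)  # estimated plant and animal species
RATES_PER_MILLION = (100, 1_000)             # current extinctions per million species per year
BACKGROUND_RATE_PER_MILLION = 1              # earlier rate cited in the text


def extinctions_per_year(species_count: int, rate_per_million: int) -> float:
    """Expected number of species lost per year at a given rate per million species."""
    return species_count * rate_per_million / 1_000_000


# Lowest estimate: fewest species at the lowest current rate.
low = extinctions_per_year(SPECIES_ESTIMATES[0], RATES_PER_MILLION[0])
# Highest estimate: most species at the highest current rate.
high = extinctions_per_year(SPECIES_ESTIMATES[1], RATES_PER_MILLION[1])
# For comparison: the earlier rate applied to the smallest species estimate.
background = extinctions_per_year(SPECIES_ESTIMATES[0], BACKGROUND_RATE_PER_MILLION)

print(f"Current range: {low:,.0f} to {high:,.0f} species per year")
print(f"At the earlier rate: about {background:,.0f} species per year")
```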

Taxonomy, the science of classifying living things, has grown in importance as we monitor the biological effects of global climate change and attempt to preserve species diversity in the face of accelerating habitat destruction. However, classical taxonomy, which categorizes organisms according to physical features, falls short in this race to catalog biological diversity before it disappears.

Barcodes aid in taxonomy and more

In classical taxonomy, discriminating subtle anatomical differences between closely related species requires the subjective judgment of a highly trained specialist, and few are being produced in colleges today. Specimens must also be carefully collected and handled to preserve their distinguishing attributes. In contrast, DNA barcodes allow nonexperts to objectively identify species—even from small, damaged, or industrially processed specimens. DNA barcodes can be quickly processed from thousands of specimens and unambiguously analyzed by computer programs.

The 10-year Census of Marine Life, completed in 2010, was a mission of 80 nations completing 540 marine expeditions, with a goal to catalog marine life diversity, distribution, and abundance. The census provided the first comprehensive list of more than 190,000 marine species, identified 1,200 new species, and identified 6,000 potentially new species.

There is also a surprising level of biological diversity on land, literally in front of our eyes. For example, Hebert’s group used DNA barcodes to show that a well-known skipper butterfly (Astraptes fulgerator), identified in 1775, is actually 10 distinct species. The DNA differences correlated with striking differences in the coloration of caterpillars and the food plants they exploit. This sort of cryptic, or hidden, diversity is likely common among insects and other organisms with complicated life cycles. The urban environment is also unexpectedly diverse; DNA barcodes were used to catalog 54 species of bees and 24 species of butterflies in New York City community gardens.

DNA barcodes are also used to detect food fraud and products taken from protected species. Working with researchers from Rockefeller University and the American Museum of Natural History, students from Trinity High School found that 25% of 60 seafood items purchased in New York City grocery stores and restaurants were mislabeled as more expensive species. One mislabeled fish was the Acadian redfish, an endangered species. Another group identified 3 protected whale species as the source of sushi sold in California and Korea. The Carolina kit, Using DNA Barcodes to Identify and Classify Living Things, developed by the DNA Learning Center (DNALC), makes it possible to develop DNA barcodes for a variety of plants or animals, or products made from them. Your students can also be food-fraud detectives.

Barcode projects

Begin a barcode project by having students brainstorm about a topic that interests them, such as:

  • Checking for pests or invasive species
  • Monitoring animal movements or migrations
  • Identifying exotic or endangered food products
  • Detecting food or product fraud

Use a smartphone or digital camera to photograph specimens in their natural environments or where they were purchased or obtained. GPS-enabled devices will tag photos with latitude and longitude, allowing collection sites to be readily placed on a map.

One common barcoding application is to inventory biodiversity in an ecosystem, park, or garden. This introduces the concept of a sampling unit, such as the quadrat, a 1 m × 1 m square area from which each different plant and animal is collected for barcoding. Collected specimens are sorted by visible types or keyed to the family, genus, or species level; then 1 or more representatives of each type or taxon are barcoded. Students may also consider allying with other classes or schools in a “campaign” to systematically inventory a river or ecosystem.
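The sorting step for a quadrat can be sketched in Python as follows. This is a minimal illustration, not part of the kit protocol: the specimen IDs, the type labels, and the function name pick_representatives are all hypothetical, and the sketch simply takes the first specimen recorded for each visible type.

```python
from collections import defaultdict

# Hypothetical field records: (specimen ID, visible type assigned while sorting).
specimens = [
    ("Q1-001", "clover"),
    ("Q1-002", "ant"),
    ("Q1-003", "clover"),
    ("Q1-004", "grass"),
    ("Q1-005", "ant"),
]


def pick_representatives(records, per_type=1):
    """Group specimens by visible type and keep the first `per_type` of each."""
    by_type = defaultdict(list)
    for specimen_id, visible_type in records:
        by_type[visible_type].append(specimen_id)
    return {t: ids[:per_type] for t, ids in by_type.items()}


to_barcode = pick_representatives(specimens)
for visible_type, ids in to_barcode.items():
    print(visible_type, ids)
```

Raising per_type barcodes more than one representative of each type, which helps when a single visible type may hide several species.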

Specimen preparation

Specimens can include a plant leaf, petal, or bud; an entire insect or insect part; several hair roots (follicles); or flesh from the base of a feather. Fresh, frozen, dried, and even processed food items are also good sources of DNA. DNA can be safely isolated from a small sample of specimen tissue in about 75 minutes. The sample is ground with nuclei lysis solution, and then proteins and other cellular debris are removed. DNA is precipitated with alcohol, pelleted by centrifugation, dried, and rehydrated.

The purified DNA is mixed with a primer set for either the COI or rbcL barcode region and amplified by polymerase chain reaction (PCR). The animal COI barcode is obtained from the mitochondrial genome, and the plant rbcL barcode from the chloroplast genome. Each cell has tens to hundreds of these organelles, each with multiple copies of its genome, so the COI and rbcL barcode sequences are readily amplified, even from very small or degraded specimens. A sample of the amplified DNA is electrophoresed on an agarose gel to confirm a product of about 700 nucleotides for COI and about 600 nucleotides for rbcL.
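As a small illustration of the gel check, the Python sketch below compares an estimated band size with the approximate product sizes given above. The 50-nucleotide tolerance and the function name band_matches are assumptions made for this example; in practice, band size is judged by eye against a DNA ladder.

```python
# Approximate product sizes from the text, in nucleotides.
EXPECTED_LENGTH_NT = {"COI": 700, "rbcL": 600}


def band_matches(marker: str, observed_nt: int, tolerance_nt: int = 50) -> bool:
    """Return True if the estimated band size is near the expected product size.

    tolerance_nt is an assumed margin for this sketch, not a protocol value.
    """
    expected = EXPECTED_LENGTH_NT[marker]
    return abs(observed_nt - expected) <= tolerance_nt


print(band_matches("COI", 690))   # a band near 700 nt fits COI
print(band_matches("rbcL", 690))  # the same band is too large for rbcL
```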

Sequencing and analysis

The remainder of the amplified DNA is submitted for low-cost sequencing of the barcode region by GENEWIZ® DNA services. This company has optimized reaction conditions for the Carolina kit and produces an excellent-quality sequence with rapid turnaround—usually within 48 hours of sample receipt.

GENEWIZ® sequences are automatically uploaded to the DNALC website DNA Subway, an intuitive bioinformatics workflow for analyzing DNA barcodes.

  • At the first stop, students view electropherograms of their barcode sequences, trim the ends, and make a consensus sequence (if both strands have been sequenced).
  • At the second stop, they submit their sequences to the BLAST® website to identify close matches in GenBank® and other major databases. Sequences from the BLAST® search and reference data are then added to the analysis.
  • At the final stop, the MUSCLE algorithm aligns the sequences and phylogenetic trees are generated using the neighbor-joining and maximum likelihood methods. Novel DNA barcodes can be submitted to the database at the Barcode of Life Data System (BOLD).
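The ideas behind the first two stops can be illustrated with a toy Python sketch. It is not how DNA Subway, BLAST, or MUSCLE work internally: it assumes the forward and reverse reads cover exactly the same span with no gaps, treats "N" as an uncalled base, and identifies the closest reference by simple percent identity over equal-length sequences. The reads, the reference names species_A and species_B, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus(forward: str, reverse: str) -> str:
    """Merge a forward read with a reverse read of the same span.

    Assumes both reads have equal length once the reverse read is
    reverse-complemented. Where one read has 'N', the other read's base
    is used; where the two reads disagree, the position becomes 'N'.
    """
    rev = reverse_complement(reverse)
    if len(forward) != len(rev):
        raise ValueError("reads must cover the same span in this sketch")
    merged = []
    for f, r in zip(forward.upper(), rev):
        if f == r:
            merged.append(f)
        elif f == "N":
            merged.append(r)
        elif r == "N":
            merged.append(f)
        else:
            merged.append("N")
    return "".join(merged)


def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length in this sketch")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_match(query: str, references: dict) -> str:
    """Name of the reference sequence with the highest percent identity."""
    return max(references, key=lambda name: percent_identity(query, references[name]))


# Hypothetical short reads; real barcodes are several hundred nucleotides long.
forward_read = "ATGGCNCTTAGC"
reverse_read = "GCTAAGTGCCNT"
barcode = consensus(forward_read, reverse_read)

references = {"species_A": "ATGGCACTTAGC", "species_B": "ATGGTACTAAGC"}
print(barcode, best_match(barcode, references))
```

Real searches differ in important ways: BLAST finds local alignments against very large databases, and MUSCLE inserts gaps to align sequences of different lengths before trees are built.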

The Urban Barcode Project

The Urban Barcode Project, funded by a grant from the Alfred P. Sloan Foundation, provides a scalable infrastructure to broadly disseminate DNA barcoding in education. Student research teams at New York City schools develop ideas and submit proposals for barcode projects. Sponsoring teachers receive training and are supported by DNALC staff and scientific mentors at research institutions in New York City. Each team is provided everything needed to develop up to 100 barcode sequences; they may borrow an equipment footlocker to use at their school or attend Open Lab Days at locations around New York City.

A microsite (www.urbanbarcodeproject.org) supports all phases of the project. The science of DNA barcoding and suggestions for student experiments are presented in video interviews with scientists and students, animations, an active news feed, and links to The Barcode Blog and iBOL. An online “lab notebook” includes interactive and print versions of the barcode protocol. A Google Maps™ mapping service utility tracks student teams and the specimens they collect.

Together, the Carolina kit Using DNA Barcodes to Identify and Classify Living Things and the DNA Subway and Urban Barcode Project websites provide the infrastructure for students across the United States to participate in the exciting science of DNA barcoding.


David A. Micklos
Founder and Executive Director
DNA Learning Center at Cold Spring Harbor Laboratory
Cold Spring Harbor, NY
