May 5, 2009

Random picks better than complicated process in gene identification

WEST LAFAYETTE, Ind. - Researchers at Purdue University have found a way to save time, money and a little frustration in searches for specific genes that shed light on the biological processes associated with all forms of life.

Andrew DeWoody, a professor of genetics, and postdoctoral associate Matthew C. Hale have provided evidence that a step called normalization is no longer necessary, given recent advances in DNA sequencing technology. They also applied rarefaction to gene discovery. Rarefaction is an analytical approach developed for ecological surveys to estimate how many species an ecosystem contains relative to the sampling effort spent. Their results were published in the early online release of the journal BMC Genomics.

A tissue sample may contain thousands of copies of transcripts from genes that perform basic housekeeping functions. Other genes, expressed at much lower levels, may carry out more specialized and important functions. The difficulty is sorting through all those abundant copies to find the rare genes with unique functions.

"These housekeeping genes are highly expressed, often hundreds or thousands of times more than other genes," DeWoody said.

During normalization, scientists heat DNA to the point at which its two component strands split, or denature. As the sample cools, matching strands randomly find each other and reattach. The sequences that reunite quickly are typically the most abundant. Enzymes are then used to reduce the number of these overabundant sequences, while the rarer ones that reunite slowly are amplified until the genes are represented in roughly equal numbers, making them easier to sort through.
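One way to see why reannealing evens out gene abundances is the textbook second-order reassociation model, in which the single-stranded concentration C of a sequence that starts at concentration C0 falls as C = C0 / (1 + k * C0 * t). The sketch below assumes that idealized model, a single shared rate constant k, and perfect removal of double-stranded material; the concentrations, k and t are hypothetical and this is not the authors' laboratory protocol. When k * C0 * t is large, the leftover single-stranded amount approaches 1 / (k * t) whatever C0 was, which is why abundant and rare sequences end up close to equal.

```python
"""Idealized sketch of cDNA normalization via second-order reassociation.

Assumptions (hypothetical, for illustration only): every transcript reanneals
with the same rate constant k, and all double-stranded material is removed,
leaving only the single-stranded fraction to be amplified.
"""
from __future__ import annotations


def single_stranded(c0: float, k: float, t: float) -> float:
    """Single-stranded concentration left after time t under second-order
    reassociation kinetics: C = C0 / (1 + k * C0 * t)."""
    return c0 / (1.0 + k * c0 * t)


def shares(abundances: dict[str, float]) -> dict[str, float]:
    """Each transcript's share of a pool of concentrations."""
    total = sum(abundances.values())
    return {gene: c / total for gene, c in abundances.items()}


def normalize(abundances: dict[str, float], k: float, t: float) -> dict[str, float]:
    """Each transcript's share of the single-stranded pool after reannealing."""
    remaining = {gene: single_stranded(c0, k, t) for gene, c0 in abundances.items()}
    return shares(remaining)


if __name__ == "__main__":
    # Hypothetical starting concentrations: one housekeeping transcript
    # 1,000 times more abundant than a rare one.
    library = {"housekeeping": 1000.0, "moderate": 10.0, "rare": 1.0}
    before = shares(library)
    after = normalize(library, k=1.0, t=10.0)
    for gene in library:
        print(f"{gene:>12}: {before[gene]:.4f} -> {after[gene]:.4f}")
```

With these made-up numbers the housekeeping transcript goes from roughly 1,000 times the rare one to only about 1.1 times. That also illustrates DeWoody's caveat: the information about relative expression levels is largely gone after normalization.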

DeWoody and Hale believe normalization is not necessary given the vast amount of data that can be obtained through modern DNA sequencers.

"Normalization used to be required because commonly expressed genes would swamp the signal of rare genes," DeWoody said. "But normalization also discards valuable information about the relative levels of gene expression in a tissue sample."

Another key aspect of the paper is the novel use of analytical rarefaction in gene discovery.

In an ecological survey, individuals are sampled at random from an ecosystem and their species are recorded. Each selection counts as one unit of effort, and the cumulative number of species found is charted against that effort. Once new selections yield only previously seen species, or in this case genes, the curve levels off, showing how much effort is needed to find essentially all the species or genes present. Scientists then know how much effort to spend when searching for other genes.

"When it plateaus, you can give up. You've put in as much effort as you need," DeWoody said.
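The rarefaction idea described above can be sketched by treating sequencing reads like surveyed individuals and genes like species. This minimal example assumes Hurlbert's analytical rarefaction formula for the expected number of distinct genes in a random subsample of n reads, E[S_n] = sum over genes i of [1 - C(N - N_i, n) / C(N, n)], where N is the total read count and N_i the reads from gene i. The study's exact implementation is not described here, the read counts are hypothetical, and the `plateau_effort` helper with its `min_gain` cut-off is an illustrative stand-in for deciding when the curve has "plateaued."

```python
"""Sketch of analytical rarefaction applied to gene discovery.

Assumption: reads play the role of surveyed individuals and genes play the
role of species. Expected genes in a subsample of n reads (Hurlbert's formula):
    E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)]
"""
from __future__ import annotations

import math


def log_comb(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b), computed via log-gamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def expected_genes(counts: list[int], n: int) -> float:
    """Expected number of distinct genes in a random subsample of n reads."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total read count")
    expected = 0.0
    for n_i in counts:
        if total - n_i < n:
            # Too few reads from other genes: gene i must appear in the subsample.
            expected += 1.0
        else:
            # Probability that none of gene i's reads are drawn.
            p_absent = math.exp(log_comb(total - n_i, n) - log_comb(total, n))
            expected += 1.0 - p_absent
    return expected


def rarefaction_curve(counts: list[int], step: int) -> list[tuple[int, float]]:
    """Expected genes discovered at increasing sampling effort (reads drawn)."""
    total = sum(counts)
    efforts = list(range(step, total + 1, step))
    if not efforts or efforts[-1] != total:
        efforts.append(total)
    return [(n, expected_genes(counts, n)) for n in efforts]


def plateau_effort(curve: list[tuple[int, float]], min_gain: float) -> int | None:
    """First effort after which the next step adds fewer than min_gain genes.

    min_gain is a hypothetical, user-chosen cut-off, not a published value.
    """
    for (n_prev, s_prev), (_, s_next) in zip(curve, curve[1:]):
        if s_next - s_prev < min_gain:
            return n_prev
    return None


if __name__ == "__main__":
    # Hypothetical reads per gene: a few highly expressed genes, many rare ones.
    reads_per_gene = [500, 120, 60, 30, 10, 5, 3, 2, 1, 1, 1, 1]
    curve = rarefaction_curve(reads_per_gene, step=50)
    for effort, genes in curve:
        print(f"{effort:4d} reads -> {genes:5.2f} genes expected")
    print("gain per step drops below 0.5 genes after",
          plateau_effort(curve, min_gain=0.5), "reads")
```

The analytical formula gives the average curve over all possible random orderings of the reads, so it avoids the noise of repeatedly resampling by hand. Working in log space with `math.lgamma` keeps the binomial coefficients manageable when read counts reach the tens of thousands.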

DeWoody and Hale tested the approach on samples from the reproductive organs of lake sturgeon while searching for the genes that determine sex in the fish. Hale said the work is important for conservation because it helps researchers understand the biology of the species they are trying to protect.

"Few studies have been done on threatened and endangered species. They're usually done on models such as mice and Arabidopsis," Hale said. "This species, the lake sturgeon, is a perfect example of a conservation concern."

The Great Lakes Fishery Trust and the Indiana Department of Natural Resources funded the research. DeWoody said the next step is to continue the process of finding the genes responsible for determining sex in sturgeon.

Writer: Brian Wallheimer, 765-496-2050, bwallhei@purdue.edu

Sources: Andrew DeWoody, 765-496-6109, dewoody@purdue.edu

Matthew C. Hale, 765-496-3427, mchale@purdue.edu

Ag Communications: (765) 494-8415;
Steve Leer, sleer@purdue.edu


ABSTRACT

Next-Generation Pyrosequencing of Gonad Transcriptomes in the Polyploid Lake Sturgeon (Acipenser fulvescens): The Relative Merits of Normalization and Rarefaction in Gene Discovery

Matthew C. Hale, Cory R. McCormick, James R. Jackson and J. Andrew DeWoody

Background: Next-generation sequencing technologies have been applied most often to model organisms or species closely related to a model. However, these methods have the potential to be valuable in many wild organisms, including those of conservation concern. We used Roche 454 pyrosequencing to characterize gene expression in polyploid lake sturgeon (Acipenser fulvescens) gonads.

Results: Titration runs on a Roche 454 GS-FLX produced more than 47,000 sequencing reads. These reads represented 20,741 unique sequences that passed quality control (mean length = 186 bp). These were assembled into 1,831 contigs (mean contig depth = 4.1 sequences). Over 4,000 sequencing reads (~19%) were assigned gene ontologies, mostly to protein, RNA, and ion binding. A total of 877 candidate SNPs were identified from >50 different genes. We employed an analytical approach from theoretical ecology (rarefaction) to evaluate depth of sequencing coverage relative to gene discovery. We also considered the relative merits of normalized versus native cDNA libraries when using next-generation sequencing platforms. Not surprisingly, fewer genes from the normalized libraries were rRNA subunits. Rarefaction suggests that normalization has little influence on the efficiency of gene discovery, at least when working with thousands of reads from a single tissue type.

Conclusion: Our data indicate that titration runs on 454 sequencers can characterize thousands of expressed sequence tags which can be used to identify SNPs, gene ontologies, and levels of gene expression in species of conservation concern. We anticipate that rarefaction will be useful in evaluations of gene discovery and that next-generation sequencing technologies hold great potential for the study of other non-model organisms.

