Genome sequencing speeds ability to improve soybeans

January 13, 2010 Brian Wallheimer

WEST LAFAYETTE, Ind. - Purdue University scientists led an effort to sequence the soybean genome, giving researchers a better understanding of the plant's genes and how to use them to improve its characteristics.

Agronomy professor Scott Jackson said the study, supported by the U.S. departments of Energy and Agriculture, found that the soybean has about 46,000 genes, but 70 percent to 80 percent of them are duplicates. This duplication may make it difficult to target the genes needed to improve soybean characteristics such as seed size, oil content or yield.

Adding to the difficulty, Jackson said many of the duplicated genes in the soybean genome have been shuffled, making it hard to predict where the duplicate copies of a gene might be. This complicates the genetics and breeding of soybeans.

"If I'm selecting for a gene, I may have difficulty in locating all the necessary duplicates of that gene," said Jackson, corresponding author on the soybean genome paper that was published Wednesday (Jan. 13) on the cover of the journal Nature. "It has a lot of backup copies."

Despite these difficulties, having a sequenced genome speeds up scientists' work to improve the plant's characteristics, because it greatly reduces the need for painstaking searches for particular genes.

"It really is going to change the way we ask questions about soybeans in research," said Randy Shoemaker, a research geneticist with the U.S. Department of Agriculture's Agricultural Research Service at Iowa State University and a co-author of the paper. "What used to take us literally years can take us weeks or months now. This is the entire genetic code in front of you."

Jianxin Ma, an assistant professor of agronomy at Purdue and a co-author of the paper, said the largest proportion of the soybean genome is made up of transposable elements, or TEs, which are often dismissed as 'genomic junk.' However, some are important, Ma said.

Bursts of TE activity, along with the elements' sheer abundance, inevitably affect how genes function. The team's annotation of a nearly complete set of TEs in the sequenced genome also led to a discovery about how TEs thrive in the host genome; that finding will be published in the journal The Plant Cell on Friday (Jan. 15).

"Although TEs are ubiquitous, what we discovered has not been seen in any other organisms yet," Ma said. "We found that some 'dead' TEs can actually be revivified by swapping with their active TE partners, and thus restore or even enhance their ability to proliferate using the amplification machinery encoded by their partners."

Having the genome in hand will allow scientists to compare different varieties of soybean plants and determine which genes are responsible for different characteristics, such as increased oil content or larger plants. One of the next steps in the research is to resequence the 20,000 soybean lines in the U.S. germplasm collection to find genes not common to all soybean cultivars.

"When soybeans were domesticated, they were selected for seed size and other traits, but there were a lot of potentially valuable genes left behind," Jackson said. "There may be valuable genes associated with protein content or disease resistance."

The United Soybean Board, National Science Foundation, USDA and Department of Energy funded the research.


Writer: Brian Wallheimer, 765-496-2050

Sources: Scott Jackson, 765-496-3621

Jianxin Ma, 765-496-3662



Genome Sequence of the Palaeopolyploid Soybean

Jeremy Schmutz, Steven B. Cannon, Jessica Schlueter, Jianxin Ma, Therese Mitros, William Nelson, David L. Hyten, Qijian Song, Jay J. Thelen, Jianlin Cheng, Dong Xu, Uffe Hellsten, Gregory D. May, Yeisoo Yu, Tetsuya Sakurai, Taishi Umezawa, Madan K. Bhattacharyya, Devinder Sandhu, Babu Valliyodan, Erika Lindquist, Myron Peto, David Grant, Shengqiang Shu, David Goodstein, Kerrie Barry, Montona Futrell-Griggs, Brian Abernathy, Jianchang Du, Zhixi Tian, Liucun Zhu, Navdeep Gill, Trupti Joshi, Marc Libault, Anand Sethuraman, Xue-Cheng Zhang, Kazuo Shinozaki, Henry T. Nguyen, Rod A. Wing, Perry Cregan, James Specht, Jane Grimwood, Dan Rokhsar, Gary Stacey, Randy C. Shoemaker and Scott A. Jackson

Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.