Survey Sequencing and Annotation of Chromosome 6B
Within the framework of the IWGSC, the chromosome 6B sequencing project was started in Japan in 2011, and the first survey sequences of 6B were released in 2013 (Tanaka et al. 2014). In this analysis, DNA libraries of the sorted 6B chromosome arms were constructed and sequenced independently using the 454 GS-FLX Titanium (Roche, CT, USA). The sequence reads (454 reads) from each arm were assembled with GS assembler 2.7 (Roche). From more than 12 million reads for each arm, 234 and 273 Mbp were assembled, comprising 262,375 and 173,655 contigs for 6BS and 6BL, respectively. These assemblies correspond to 56.6 % and 54.9 % of the estimated lengths of the two arms (415 Mbp for 6BS and 498 Mbp for 6BL).
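The relationship between the assembled length, the estimated arm length and the number of contigs can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the names `ARMS` and `arm_summary` are illustrative and not part of the original analysis. Because the Mbp values in the text are rounded, the computed percentages can differ from the reported ones in the first decimal place.

```python
# Figures quoted in the text. The Mbp values are rounded, so results are approximate.
ARMS = {
    "6BS": {"assembled_mbp": 234, "estimated_mbp": 415, "contigs": 262_375},
    "6BL": {"assembled_mbp": 273, "estimated_mbp": 498, "contigs": 173_655},
}


def arm_summary(stats):
    """Return (percent of the arm covered, mean contig length in bp)."""
    coverage_pct = 100.0 * stats["assembled_mbp"] / stats["estimated_mbp"]
    mean_contig_bp = stats["assembled_mbp"] * 1_000_000 / stats["contigs"]
    return coverage_pct, mean_contig_bp


for arm, stats in ARMS.items():
    coverage_pct, mean_contig_bp = arm_summary(stats)
    print(f"{arm}: {coverage_pct:.1f} % covered, mean contig about {mean_contig_bp:.0f} bp")
```

The mean contig length is only a rough indicator of fragmentation; a length-weighted statistic such as N50 would need the individual contig lengths, which are not given here.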
As described earlier, the wheat genome contains abundant repetitive elements. Known classes of repeat elements were detected using existing repeat libraries, such as the TREP and MIPS repeat libraries. In addition, to detect novel repeat elements, we constructed a new repeat library with RepeatModeler (repeatmasker.org/RepeatModeler.html). Using the repeat-masking program censor (girinst.org/censor/index.php; Jurka et al. 1996) with TREP and the new repeat library, 76.6 % and 85.5 % of the 6BS and 6BL assemblies were masked, respectively. Since 63.6 % and 72.2 % of the 6BS and 6BL assemblies were masked by the TREP library alone, around 13 % of each assembly may consist of novel repeat elements detected only by the new library.
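The masked percentages above are simple base counts, and the share attributed to the new library is the difference between two masking runs. The following minimal sketch illustrates both steps. It assumes that masked bases are written as `N` or `X` (or as lower-case letters when `soft_masked=True`); the actual output convention of the masking program should be checked before applying it. The function name `masked_fraction` and the toy contigs are hypothetical.

```python
def masked_fraction(sequences, masked_chars="nNxX", soft_masked=False):
    """Fraction of bases that are masked across an iterable of sequences.

    Assumption: masked bases appear as N/X, or as lower-case letters
    when soft_masked is True.
    """
    total = 0
    masked = 0
    for seq in sequences:
        total += len(seq)
        for base in seq:
            if base in masked_chars or (soft_masked and base.islower()):
                masked += 1
    return masked / total if total else 0.0


# Toy contigs, not real data.
toy_contigs = ["ACGTNNNNNN", "acgtACGTAC"]
print(masked_fraction(toy_contigs))
print(masked_fraction(toy_contigs, soft_masked=True))

# Share of each assembly masked only when the new library is added,
# using the percentages quoted in the text.
masked_pct = {"6BS": (76.6, 63.6), "6BL": (85.5, 72.2)}
for arm, (combined, trep_only) in masked_pct.items():
    print(arm, round(combined - trep_only, 1))
```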
After repeat detection, we identified transcribed regions by mapping the many transcripts available in the public domain. In addition to the mRNAs and millions of ESTs in DDBJ/EMBL/GenBank, wheat full-length cDNAs (FLcDNAs) were available from TriFLDB (trifldb.psc.riken.jp/index.pl) (Mochida et al. 2009). By combining transcriptome mapping with an ab initio gene prediction program, 4,798 transcribed regions were determined. We found several genes that were known to be located on chromosome 6B, such as the α-gliadin gene, the stripe rust resistance gene Yr36, the grain protein content gene Gpc-B1, the α-amylase gene, the genes for three low-temperature-responsive dehydrins, Wcs120, Wcs66 and Wcor410, the flowering time gene TaHd1-2 and the gene involved in vernalization TmVIL2.
Our assemblies also showed the conservation of syntenic genes among monocots. First, 2,399 of the 2,573 high-confidence barley genes on chromosome 6H could be mapped on our assemblies (E value < 10^-5). Second, 3,772 syntenic loci were detected by homology search with syntenic genes from chromosome 2 of O. sativa, chromosome 3 of B. distachyon and chromosome 4 of S. bicolor. Since 57.4 % of the syntenic regions had wheat transcriptome evidence, which was significantly higher than the proportion for non-syntenic regions (32.7 %), we concluded that wheat 6B has conserved synteny with the chromosomes of other grass species.
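Counting how many query genes map to the assembly below an E-value cutoff can be sketched as follows. The example assumes that the homology search results are available in the standard 12-column BLAST tabular layout (query ID in the first column, E-value in the eleventh); the text does not state which search tool or output format was used, so this is an illustration rather than the original procedure. The function name `mapped_queries` and the toy hits are hypothetical.

```python
def mapped_queries(tabular_lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit below max_evalue.

    Assumes 12 tab-separated columns per line:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    mapped = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < max_evalue:
            mapped.add(query_id)
    return mapped


# Toy hits, not real data: geneA passes the cutoff once, geneB never does.
toy_hits = [
    "geneA\tcontig1\t95.0\t300\t15\t0\t1\t300\t10\t309\t1e-80\t400",
    "geneA\tcontig7\t80.0\t100\t20\t0\t1\t100\t5\t104\t1e-3\t60",
    "geneB\tcontig2\t70.0\t50\t15\t0\t1\t50\t1\t50\t0.5\t30",
]
print(sorted(mapped_queries(toy_hits)))

# Proportion of 6H genes mapped, from the counts quoted in the text.
print(f"{100 * 2399 / 2573:.1f} % of high-confidence 6H genes mapped")
```

Collecting query IDs in a set means that a gene with several hits on different contigs is counted once.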
Our annotation pipeline included the detection of RNA genes: rRNAs, tRNAs and miRNAs. Chromosome 6B is known to carry a ribosomal DNA (rDNA) locus containing approximately 5,500 rRNA genes. Moreover, non-protein-coding RNAs, such as microRNAs (miRNAs), are now recognized as biologically important genetic components. We found that some RNA genes were associated with particular repetitive elements. For example, 83 of 131 tRNA-Lys genes were located in Gypsy LTR retrotransposons and de novo repeats. Almost all predicted miRNAs were also located in repeat-masked regions, especially in the DNA transposons Mariner and CACTA. For rRNA genes, the very small number of contigs carrying them can be explained by the high read depth of those contigs: because of their high sequence similarity, the rRNA regions were collapsed during assembly, so that only a few contigs with high read depth remained in our data. This result is very similar to that for repetitive regions. These results suggest that RNA genes were distributed through the wheat genome along with the spread of transposons and repetitive elements.
Application of Chromosome 6B Sequences to Wheat Genomics
Deciphering genome sequences enables us not only to obtain a representative gene set containing many novel genes, but also to prepare resources for genomics and breeding, such as marker information. In wheat, chromosome information is very useful for distinguishing homoeologous genes. For example, there are three homoeologs of the flowering time gene, TaHd1-1, TaHd1-2 and TaHd1-3. Our 6B assembly can distinguish TaHd1-2, transcribed from 6B, from the other two homoeologs on 6A and 6D at the level of sequence similarity. In addition, since exon-intron structures can now be determined on the wheat genome itself, constructing transcript-based markers, such as PLUG markers, is easier and more accurate than it was when rice genome data had to be used.
Insertion site-based polymorphism (ISBP) markers can be constructed using genome sequences (Paux et al. 2010). A genome-wide survey of simple sequence repeats (SSRs) can be used to construct SSR markers in non-genic regions, which transcript-based marker construction has not targeted. As in the genome zipper analysis (Mayer et al. 2009, 2011), the virtual order of the markers can be inferred from the sequence homology between the flanking regions of the markers and closely related species, such as barley and Brachypodium. In fact, we found 16,728 SSRs in non-repetitive regions of 6B, and at least 1,354 of them were positioned on barley chromosome 6H. Since more than 80 % of these SSRs were located in intergenic regions of 6H, the new SSR markers can be used efficiently to fill gaps between known markers.
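A genome-wide SSR survey of the kind described above can be illustrated with a simple regular-expression scan. This is a minimal sketch, not the method used in the study: the text does not specify the SSR detection tool or its thresholds, so the minimum repeat counts in `MIN_REPEATS`, the function names and the toy sequence are all hypothetical. Motifs that are themselves repeats of a shorter unit (for example `ATAT`) are skipped so that each SSR is reported under its shortest motif.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, copies) tuples for perfect SSRs.

    Coordinates are 0-based and end-exclusive. Simplified sketch: only
    perfect repeats are found, and compound SSRs are not merged.
    """
    sequence = sequence.upper()
    hits = []
    for unit_len, min_count in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_count - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                copies = (match.end() - match.start()) // unit_len
                hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


# Toy sequence, not real data: an (AT)7 and an (AAG)5 repeat with short flanks.
toy_sequence = "GGCC" + "AT" * 7 + "CGTACC" + "AAG" * 5 + "TTGA"
for start, end, motif, copies in find_ssrs(toy_sequence):
    print(start, end, motif, copies)
```

In practice the scan would be run on repeat-masked contigs, and the sequences flanking each hit would then be used for primer design and for the homology search against barley and Brachypodium.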
The survey sequences of wheat chromosome 6B provided various types of novel information, e.g. repeat information, genome annotation including genes and RNA genes, and marker information. However, the current genomic sequences of chromosome 6B are fragmented and do not cover the chromosome completely, so the genome assembly needs to be improved. Sequencing of chromosome 6B is ongoing, using BAC-by-BAC sequencing of a minimum tiling path (MTP) on the Roche 454 platform, and more accurate, physically positioned sequences will be available in the near future.
Acknowledgments This work was supported by grants from the Ministry of Agriculture, Forestry and Fisheries of Japan (KGS1001, KGS1003, KGS1004, and NGB1003) and funding from Nisshin Flour Milling Inc.