Results and Discussion

Various algorithms have been proposed for building dense genetic maps, including the stepwise increase of the map density (Jansen et al. 2001; Isidore et al. 2003; Mester et al. 2003, 2010; Wu et al. 2008). This problem has become especially challenging with the current widespread transition from a few hundred to tens or even hundreds of thousands of typed markers per genome. It is well recognized that under these conditions even 1 % of typing errors may lead to a dramatic reduction of map quality, i.e., “more” (markers) may imply “less” (confidence in map quality, at least on a microscale). The problem has several aspects: (i) computational complexity, related to the exponential growth of the number of potential marker orders to be tested, (ii) the impossibility of resolving the vast majority of markers by recombination under reasonable population sizes, and (iii) the high impact of typing errors on map quality. Our approach is based on the assumption that, given a large excess of irresolvable over resolvable markers and a low level of typing errors, members of “twin” groups with minimum missing scores can be considered more credible than singleton markers.

Fig. 14.3 The structure of clouds with markers with scoring errors
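The distinction between “twin” groups and singletons can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' implementation: markers are treated as twins when their score strings are exactly identical, missing scores are not handled, and the function name `twin_groups` and the marker names are hypothetical.

```python
from collections import defaultdict

def twin_groups(scores):
    """Split markers into twin groups and singletons.

    scores maps a marker name to its genotype string, one character per
    individual (for a DH population, e.g. 'A' or 'B'). Markers sharing
    exactly the same string cannot be separated by recombination in this
    population, so they are collected into one group.
    """
    by_pattern = defaultdict(list)
    for marker, pattern in scores.items():
        by_pattern[pattern].append(marker)
    # Groups with two or more members are twins; the rest are singletons.
    twins = sorted(sorted(g) for g in by_pattern.values() if len(g) > 1)
    singletons = sorted(g[0] for g in by_pattern.values() if len(g) == 1)
    return twins, singletons

# Toy data (hypothetical): six markers scored on five individuals.
scores = {
    "m1": "AABBA",
    "m2": "AABBA",
    "m3": "AABBA",
    "m4": "AABBB",
    "m5": "ABBBB",
    "m6": "ABBBB",
}
twins, singletons = twin_groups(scores)
print(twins)       # [['m1', 'm2', 'm3'], ['m5', 'm6']]
print(singletons)  # ['m4']
```

In this toy case, a single typing error in an otherwise shared pattern is enough to turn a marker such as `m4` into a singleton, which is why singletons are treated as less credible when typing errors are rare and twins are abundant.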

To illustrate the efficiency of our “twins” approach, two examples are provided here: simulated data for one chromosome with 10,000 markers for a DH population with N = 200 (two variants of the same marker set were considered, with and without marker typing errors), and real DH data on ~24,000 markers of wheat chromosome 3B (the whole genome set included ~420,000 markers). In the first example, the map length was 212 cM. For error-free data, the skeleton map included 197 markers. For data with 1 % typing errors, about 1/8 of the markers appear as AL groups, while 7/8 of the markers appear as clouds surrounding AL groups, as explained in Fig. 14.1. Figure 14.3 shows the distribution of markers with errors (grey dots) relative to the skeleton map (when it is known, as with simulated data).
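The 1/8 versus 7/8 split is roughly what a simple independence model predicts. The sketch below assumes that each of the N = 200 scores of a marker is mistyped independently with probability 0.01; the source does not state the error model used in the simulation, so this is an assumption, and the function name `error_free_fraction` is hypothetical.

```python
def error_free_fraction(error_rate, n_individuals):
    """Probability that a marker is scored correctly in every individual,
    assuming independent typing errors at the same rate for each score."""
    return (1.0 - error_rate) ** n_individuals

p = error_free_fraction(0.01, 200)
print(f"{p:.3f}")  # 0.134
```

Under this assumption about 13 % of markers carry no error at all, which is close to the reported 1/8, while the remaining markers carry at least one error and would fall into the surrounding clouds.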

The analysis of simulated data with 1 % errors (Table 14.1) demonstrates how a meaningful map can be obtained for such data when nothing is known about the order of markers, which is the standard situation with non-model species. Obviously, the result may depend on the threshold size of the AL groups to be represented in the skeleton map. Thus, with threshold = 4, AL groups with two and three markers are excluded from consideration together with singletons (moved to the heap), and the first variant of the skeleton map is constructed (stage 1 of the procedure). Stage 2 is cleaning the map: the MultiPoint package enables the detection and removal of markers violating the order stability and monotonic growth of distances in the skeleton map (Ronin et al. 2010). After cleaning, markers from the heap can be checked as candidates for filling in the gaps (if gaps are present in the obtained skeleton map). The results in Table 14.1 show a relatively weak dependence on the arbitrarily selected threshold of the AL group size and very good correspondence between the map characteristics (the number of skeletal markers and the length of the map) obtained under zero and 1 % marker typing errors. Clearly, each of the remaining >9,800 markers can be attached to the corresponding interval or marker on the skeleton map. Figure 14.4 shows the skeleton map for the second example, wheat chromosome 3B (DH population; the whole genome set included ~420,000 markers).

Table 14.1 Building dense multilocus maps based on selection of twin markers

Stage        Threshold size of AL groups
             2      3      4
1      M     318    122    98
       L     384    218    208
2–3    M     158    141    145
       L     218    219    218

M, number of markers in the skeleton map; L, skeleton map length (cM). The skeleton map built using error-free marker data included 197 markers (L = 212 cM).

Fig. 14.4 Map of wheat chromosome 3B, the largest in the wheat genome (the figure is split into two parts to fit the page size limits)
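The stage 1 selection described above, in which AL groups below the threshold size are moved to the heap together with singletons, can be sketched as follows. This is a minimal illustration, not the MultiPoint implementation: the function name `split_by_threshold`, the data layout, and the marker names are all hypothetical.

```python
def split_by_threshold(groups, singletons, threshold):
    """Separate skeleton-map candidates from the heap.

    groups: list of AL groups, each a list of marker names.
    singletons: list of marker names that belong to no group.
    threshold: minimum group size for a group to stay a candidate.
    """
    candidates = [g for g in groups if len(g) >= threshold]
    # Members of undersized groups join the singletons in the heap.
    heap = [m for g in groups if len(g) < threshold for m in g]
    heap.extend(singletons)
    return candidates, heap

# Toy data (hypothetical): groups of size 4, 2 and 3, plus one singleton.
groups = [["m1", "m2", "m3", "m4"], ["m5", "m6"], ["m7", "m8", "m9"]]
singletons = ["m10"]

candidates, heap = split_by_threshold(groups, singletons, threshold=4)
print(candidates)  # [['m1', 'm2', 'm3', 'm4']]
print(heap)        # ['m5', 'm6', 'm7', 'm8', 'm9', 'm10']
```

Markers in the heap are not discarded; as in the text, they remain available later as candidates for filling gaps in the cleaned skeleton map.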

Acknowledgments This work was supported by the Israel Science Foundation (ISF grant #800/10), the Binational Agricultural Research and Development Fund (BARD grant #3873/06), MultiQTL Ltd, and the Israeli Ministry of Absorption.

 