Building Ultra-Dense Genetic Maps in the Presence of Genotyping Errors and Missing Data
Abstract
Recent advances in genomic technologies have opened unprecedented possibilities for building high-quality ultra-dense genetic maps. However, when very large numbers of markers are available for a mapping population, most of the markers will remain inseparable by recombination. Real situations are further complicated by genotyping errors, which “diversify” a certain part of the markers that would be identical in error-free situations. The higher the error rate, the more difficult it is to build a reliable map. In our algorithm, we assume that error-free markers can be selected based on the presence of “twins” (markers with identical genotype scores). There is also a probability of the opposite effect, when non-identical markers become “twins” because of genotyping errors. Therefore, a threshold is introduced for selecting markers with a sufficient number of twins. The developed algorithm (implemented in MultiPoint software) enables mapping of large sets of markers (~10⁵–10⁶). Unlike some other algorithms used in building ultra-dense genetic maps, the proposed “twins” approach does not need any prior information (e.g., anchor markers), and hence can be applied to genetically poorly studied organisms.
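The twin-based selection rule can be illustrated with a minimal Python sketch. It assumes complete data (no missing scores) and treats two markers as twins only when their genotype vectors are exactly identical. The function name `select_twin_supported_markers` and the threshold parameter `min_twins` are hypothetical, and this is not the MultiPoint implementation. A marker whose pattern has been “diversified” by an error tends to lose its twins and is dropped. Each surviving group is represented by one marker.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def select_twin_supported_markers(
    genotypes: Dict[str, Sequence[int]], min_twins: int
) -> Dict[str, List[str]]:
    """Return {representative: group members} for every group of identical
    markers in which each member has at least `min_twins` twins.

    Hypothetical helper: assumes complete data and exact-identity twins.
    """
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for name, vector in genotypes.items():
        groups[tuple(vector)].append(name)  # identical vectors share one key

    selected: Dict[str, List[str]] = {}
    for members in groups.values():
        # In a group of k identical markers, each member has k - 1 twins.
        if len(members) - 1 >= min_twins:
            members = sorted(members)
            selected[members[0]] = members  # first name stands for the group
    return selected


# Toy DH scores (0/1 = parental allele) for six markers on five lines.
markers = {
    "m1": (0, 0, 1, 1, 0),
    "m2": (0, 0, 1, 1, 0),
    "m3": (0, 0, 1, 1, 0),
    "m4": (0, 1, 1, 1, 0),  # m1's pattern with one score flipped
    "m5": (1, 1, 0, 0, 1),
    "m6": (1, 1, 0, 0, 1),
}
print(select_twin_supported_markers(markers, min_twins=1))
# {'m1': ['m1', 'm2', 'm3'], 'm5': ['m5', 'm6']}
```

With `min_twins=1`, the marker `m4` is discarded because the flipped score leaves it without a twin. The other two patterns each keep one representative.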
Recent advances in genomic technologies have opened unprecedented possibilities for building high-quality ultra-dense genetic maps. However, when very large numbers of markers are available for a mapping population, most of the markers will remain inseparable by recombination. They will form groups of co-segregating, or absolutely linked, markers (AL markers). In such cases, only one marker from each group needs to be placed on the map, which can be referred to as a framework, skeleton, or bin map; the remaining markers can then be attached to the skeleton map (Mester et al. 2003; Korol et al. 2009; Ronin et al. 2010). The real situation is significantly complicated by genotyping errors, which “diversify” a certain part of the markers that would be identical in the ideal, error-free situation. The higher the error rate and the number of markers, the more difficult it is to build a reliable map (Buetow 1991). A further complication arises when some data points are missing. This is common in the genotyping-by-sequencing (GBS) approach and cannot always be compensated for by imputing the missing scores.
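The following example shows how a single genotyping error separates absolutely linked markers, and how missing scores reduce the available information. It uses the usual estimate of the recombination fraction between two markers in a doubled-haploid (DH) population. This estimate is the proportion of lines carrying different parental alleles at the two markers, counted only over lines scored at both. The Python sketch below assumes 0/1 allele coding with `None` for a missing score. The function name `dh_recombination_fraction` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

Score = Optional[int]  # 0 or 1 = parental allele, None = missing


def dh_recombination_fraction(
    a: Sequence[Score], b: Sequence[Score]
) -> Tuple[Optional[float], int]:
    """Estimate r between two markers scored on the same DH lines.

    Only lines scored at both markers are informative; a line is counted as
    recombinant when its alleles at the two markers differ.
    Returns (r, number of informative lines); r is None if none are informative.
    """
    if len(a) != len(b):
        raise ValueError("both markers must be scored on the same lines")
    informative = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not informative:
        return None, 0
    recombinants = sum(1 for x, y in informative if x != y)
    return recombinants / len(informative), len(informative)


m1 = [0, 0, 1, 1, 0, 1, 0, 1]
m2 = [0, 0, 1, 1, 0, 1, 0, 1]                 # absolutely linked with m1
m2_error = [0, 0, 1, 0, 0, 1, 0, 1]           # m2 with one genotyping error
m2_missing = [0, None, 1, 1, None, 1, 0, 1]   # m2 with two missing scores
print(dh_recombination_fraction(m1, m2))          # (0.0, 8)
print(dh_recombination_fraction(m1, m2_error))    # (0.125, 8)
print(dh_recombination_fraction(m1, m2_missing))  # (0.0, 6)
```

One erroneous score makes two co-segregating markers appear 12.5% apart on eight lines. The two missing scores leave the estimate unchanged but base it on fewer lines.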
Several approaches have been suggested for constructing high-density genetic maps that aim to overcome the aforementioned difficulties. The dominant strategy includes various ways of building hierarchical framework maps (Isidore et al. 2003). One example is to combine the irresolvable markers of a linkage group into bins (groups of “bound-together” markers) in the first phase, followed by joint ordering of the representatives of these groups and the singleton markers. Our approach to the ordering problem is based on reducing it to the traveling salesperson problem (TSP) and employing Guided Evolutionary Strategy heuristics to build the framework, or skeleton, map (Mester et al. 2003, 2010; Ronin et al. 2010). An interesting alternative way of phasing the mapping analysis is to construct a minimum spanning tree of a graph and then improve the initial solution with TSP-inspired heuristics (Wu et al. 2008). For ultra-dense mapping, there may be thousands to tens of thousands of markers per chromosome “contaminated” by genotyping errors. For this situation, we propose a simple “twins” approach for selecting reliable skeletal markers. Combined with our powerful discrete optimization heuristics, this approach enables mapping of very large sets of markers (e.g., 10⁵). This makes it suitable for wheat genotyping with the 90K iSelect chip as well as with the GBS approach. The corresponding algorithms, implemented in MultiPoint software, were extensively tested using simulated data and a set of 420,000 SNP and GBS markers from a wheat doubled-haploid (DH) population.
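The reduction of marker ordering to a TSP can be sketched as follows. Suppose we have pairwise distances between skeleton markers, such as recombination fractions. A map order is then an open path that visits every marker once, and a shorter path corresponds to a more parsimonious order. For illustration only, the sketch uses a simple nearest-neighbor construction followed by 2-opt segment reversals. This is a swapped-in stand-in, not the Guided Evolutionary Strategy used in MultiPoint. The function names and the toy marker positions are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def path_length(order: Sequence[int], dist: Matrix) -> float:
    """Length of an open path: a linkage map has two ends, so no return edge."""
    return sum(dist[order[i]][order[i + 1]] for i in range(len(order) - 1))


def nearest_neighbor_order(dist: Matrix, start: int = 0) -> List[int]:
    """Greedy initial order: repeatedly step to the closest unplaced marker."""
    order = [start]
    unplaced = set(range(len(dist))) - {start}
    while unplaced:
        last = order[-1]
        nxt = min(unplaced, key=lambda j: dist[last][j])
        order.append(nxt)
        unplaced.remove(nxt)
    return order


def two_opt(order: Sequence[int], dist: Matrix) -> List[int]:
    """Reverse path segments while a reversal strictly shortens the path."""
    best = list(order)
    best_len = path_length(best, dist)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 2, len(best) + 1):
                candidate = best[:i] + best[i:j][::-1] + best[j:]
                cand_len = path_length(candidate, dist)
                if cand_len < best_len - 1e-12:
                    best, best_len, improved = candidate, cand_len, True
    return best


# Hypothetical positions of five skeleton markers, listed in scrambled order;
# distances are taken as absolute position differences for this toy example.
positions = [0.25, 0.7, 0.0, 0.45, 0.1]
dist = [[abs(p - q) for q in positions] for p in positions]

start_order = nearest_neighbor_order(dist)  # [0, 4, 2, 3, 1]
final_order = two_opt(start_order, dist)    # [2, 4, 0, 3, 1]
print(start_order, round(path_length(start_order, dist), 2))
print(final_order, round(path_length(final_order, dist), 2))
```

Here the greedy start leaves one long jump. A single segment reversal then recovers the monotone order. 2-opt stops at the first local optimum it reaches, so for large, error-contaminated marker sets it serves only as a baseline.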