Browsing by Subject "Genomic prediction"
Publication: Evaluation of association mapping and genomic prediction in diverse barley and cauliflower breeding material (2018). Thorwarth, Patrick; Schmid, Karl J.
Due to the advent of new sequencing technologies and high-throughput phenotyping, an almost unlimited amount of data is available. In combination with statistical methods such as genome-wide association mapping (GWAM) and genomic prediction (GP), this information can provide valuable insight into the genetic potential of individuals and support selection and crossing decisions in a breeding program. In this thesis we evaluated these methods in diverse barley (Hordeum vulgare L.) and cauliflower (Brassica oleracea var. botrytis) populations consisting of elite material and genetic resources. We concentrated on dissecting how specific parameters affect the performance of GWAM and GP, namely marker type, statistical model, population structure and kinship. For parts of this thesis, we additionally used simulated data to support findings based on empirical data. First, we compared four different GWAM methods that use either single markers or haplotypes to detect quantitative trait loci (QTLs) in a barley population. To determine the population size and marker density required to detect QTLs of varying effect size, we performed a simulation study based on parameter estimates from the empirical population. We demonstrated that QTLs with a large effect can be detected even in small populations of about 100 individuals, and that at least 500 individuals are necessary to detect QTLs with an effect < 10%. Furthermore, we demonstrated an increased power of haplotype-based methods in the detection of very small QTLs. In a second study we used a barley population of 750 individuals as a training set to compare different GP models that are currently used by scientists and plant breeders.
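The single-marker approach compared in the first study can be illustrated with a naive per-marker regression scan. This is a minimal sketch under stated assumptions and is not one of the four GWAM methods of the thesis. It ignores population structure and kinship, which real GWAM models correct for, and it does not cover haplotype-based tests. The function name and toy data are hypothetical.

```python
import math
from statistics import mean


def single_marker_scan(genotypes, phenotypes):
    """Naive single-marker association scan (hypothetical helper).

    genotypes:  one list of marker codes per individual (e.g. 0/1 for inbred lines)
    phenotypes: one trait value per individual
    Returns one (slope, t_statistic) tuple per marker from a simple linear
    regression of the phenotype on that marker. No correction for population
    structure or kinship is applied.
    """
    n = len(phenotypes)
    y_bar = mean(phenotypes)
    results = []
    for j in range(len(genotypes[0])):
        x = [g[j] for g in genotypes]
        x_bar = mean(x)
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        if sxx == 0:
            # monomorphic marker: nothing to test
            results.append((0.0, 0.0))
            continue
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, phenotypes))
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, phenotypes))
        se = math.sqrt(rss / (n - 2) / sxx)
        # a perfect fit (se == 0) gives an infinite t unless the slope is zero
        t = slope / se if se > 0 else (math.copysign(math.inf, slope) if slope else 0.0)
        results.append((slope, t))
    return results


# Toy data: marker 0 is associated with the trait, marker 1 is not.
genotypes = [[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 0]]
phenotypes = [1.0, 1.2, 3.1, 2.9, 0.9, 3.0]
scan = single_marker_scan(genotypes, phenotypes)
```

Markers with a large |t| are candidate QTLs. In practice a mixed model with a kinship term and a multiple-testing threshold would replace this naive test.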
From the barley training set, 33 offspring families with a total of 750 individuals were derived. This enabled us to assess prediction ability by cross-validation and also in a large offspring population with varying degrees of relatedness to the training population. We investigated how linkage disequilibrium and linkage phase, population structure and the relatedness of individuals affect prediction ability. We demonstrated a strong effect of population structure on prediction ability and showed that about 11,203 evenly spaced SNP markers are necessary to predict even genetically distant populations. This implies that, at the current marker density, prediction ability is based on the relatedness of the individuals. In a third study we evaluated GWAM and GP in cauliflower. This study focused on genotyping-by-sequencing and on how imputation methods affect prediction ability and the number of significant associations. We obtained a total of 120,693 SNPs in a random collection of 174 cauliflower genebank accessions. Imputation did not increase prediction ability, and the number of detected QTLs differed only slightly between the imputed and the unimputed data set. GP performed well even in such a diverse genebank sample, but population structure again influenced prediction ability. Overall, we demonstrated the usefulness and limitations of genome-wide association mapping and genomic prediction in two species.
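Linkage disequilibrium and linkage phase can be quantified with simple correlations between marker codes. The sketch below makes three assumptions: the lines are homozygous and coded 0/1, and all markers are polymorphic. It uses the signed correlation r between adjacent markers as the LD coefficient; r² is the usual LD measure and the sign reflects the phase. It then takes the correlation of these r values between two groups as one common measure of linkage phase consistency. Function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacent_ld(genotypes):
    """Signed LD coefficient r between each pair of adjacent markers.

    genotypes: one list of 0/1 codes per homozygous line; all markers must be
    polymorphic. r**2 is the usual LD measure; the sign of r reflects the
    linkage phase (coupling vs. repulsion) of the coded alleles.
    """
    cols = [[g[j] for g in genotypes] for j in range(len(genotypes[0]))]
    return [pearson(cols[j], cols[j + 1]) for j in range(len(cols) - 1)]


def linkage_phase_consistency(geno_a, geno_b):
    """Correlation of the signed adjacent-marker r values of two groups."""
    return pearson(adjacent_ld(geno_a), adjacent_ld(geno_b))


# Hypothetical groups: same LD pattern (coupling, repulsion, no LD) in both.
group_a = [[0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 1]]
group_b = [[1, 1, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 0]]
r_a = adjacent_ld(group_a)                                 # -> [1.0, -1.0, 0.0]
consistency = linkage_phase_consistency(group_a, group_b)  # -> 1.0
```

With real data, many marker pairs within a fixed map distance would be used instead of only adjacent pairs.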
Although research in statistical genetics has provided valuable insight, genomic prediction should still be applied with care and only as a supporting tool for classical breeding methods.

Publication: Extensions of genomic prediction methods and approaches for plant breeding (2013). Technow, Frank; Melchinger, Albrecht E.
Marker-assisted selection (MAS) was a first attempt to exploit molecular marker information for selection purposes in plant breeding. The MAS approach rested on the identification of quantitative trait loci (QTL). Because of inherent shortcomings of this approach, MAS failed in most instances as a tool for improving polygenic traits. A novel approach called 'genomic selection', originally suggested for dairy cattle breeding, shifts the focus from QTL identification to the prediction of genetic values and thereby offers a solution to the shortcomings of MAS. In genomic selection, a training population of phenotyped and genotyped individuals is used to build the prediction model. This model uses all marker information simultaneously, without a preceding QTL identification step. The genetic values of selection candidates, which are only genotyped, are then predicted with that model. Finally, the candidates are selected according to their predicted genetic values. Because of its success, genomic selection completely revolutionized dairy cattle breeding, and it is now on the verge of revolutionizing plant breeding, too. However, several features set plant breeding programs apart from dairy cattle breeding, so the methodology has to be extended to cover typical plant breeding scenarios. Providing such extensions for important aspects of plant breeding is the main objective of this thesis. Single-cross hybrids are the predominant type of cultivar in maize and many other crops, and predicting hybrid performance is of tremendous importance for identifying superior hybrids.
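The genomic selection workflow described above has three steps: fit a model on a phenotyped and genotyped training population, predict genotyped-only candidates, and select on the predicted values. It can be sketched with ridge-regression BLUP (RR-BLUP), one common whole-genome model. This is a minimal pure-Python sketch that assumes a known shrinkage parameter `lam` and 0/1 marker codes. It is not the specific model used in the thesis, and all names and data are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_rr_blup(X, y, lam):
    """Ridge-regression marker effects for a training population.

    Uses the n x n form beta = Xc' (Xc Xc' + lam I)^-1 (y - mean(y)), which stays
    small when there are more markers than individuals. Xc is X centred by the
    training column means; lam is the shrinkage parameter (assumed known).
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    mu = sum(y) / n
    gram = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) + (lam if i == k else 0.0)
             for k in range(n)] for i in range(n)]
    alpha = solve(gram, [yi - mu for yi in y])
    beta = [sum(Xc[i][j] * alpha[i] for i in range(n)) for j in range(p)]
    return mu, col_means, beta


def predict(model, X_new):
    """Predicted genetic values of genotyped-only candidates."""
    mu, col_means, beta = model
    return [mu + sum((x - m) * b for x, m, b in zip(row, col_means, beta)) for row in X_new]


def select_top(ids, predicted, fraction):
    """Ids of the best `fraction` of candidates by predicted value (at least one)."""
    k = max(1, round(len(ids) * fraction))
    return [i for _, i in sorted(zip(predicted, ids), reverse=True)[:k]]


# Training population: phenotyped and genotyped (toy 0/1 marker codes).
X_train = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 1]]
y_train = [1.0, 3.0, 1.0, 3.0]
model = fit_rr_blup(X_train, y_train, lam=0.1)
# Selection candidates: genotyped only.
candidates = {"c1": [1, 0, 1], "c2": [0, 1, 1]}
preds = predict(model, list(candidates.values()))
selected = select_top(list(candidates), preds, fraction=0.5)  # -> ['c1']
```

The n × n form is used here because plant breeding data sets typically have far more markers than training individuals.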
Using genomic prediction approaches to predict hybrid performance is therefore of great interest to breeders. Conventional genomic prediction models estimate a single additive effect per marker, which was not appropriate for predicting hybrid performance for two reasons. (1) The parental inbred lines of single-cross hybrids are usually taken from genetically very distant germplasm groups. For example, in hybrid maize breeding in Central Europe, these are the Dent and Flint heterotic groups, which have been separated for more than 500 years. Because of the strong divergence between the heterotic groups, it seemed necessary to estimate heterotic-group-specific marker effects. (2) Dominance effects are an important component of hybrid performance. They had to be included in the prediction models to capture as much of the genetic variance among hybrids as possible. The use of different heterotic groups in hybrid breeding requires parallel breeding programs for inbred line development in each heterotic group. Increasing the training population size with lines from the opposite heterotic group had not been attempted previously. Thus, a further objective of this thesis was to investigate whether combined training sets can increase the accuracy of genomic prediction. Several important traits in plant breeding have binomially distributed phenotypes, for example germination rate, fertility rate, haploid induction rate and spontaneous chromosome doubling rate. No genomic prediction methods for such traits were available, so another objective was to provide methodological extensions for them. We found that incorporating dominance effects into genomic prediction of maize hybrid performance led to considerable gains in prediction accuracy when the variance attributable to dominance effects was substantial compared with the additive genetic variance.
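The two modelling issues for hybrids, heterotic-group-specific additive effects and dominance effects, both come down to how each hybrid's marker row is coded. This is a minimal sketch, assuming homozygous parents coded 0 (aa) or 1 (AA) per marker. The coding scheme and the function name are illustrative assumptions, not the thesis's exact parameterization. The resulting rows could be passed to any whole-genome regression, such as ridge regression.

```python
def hybrid_features(dent, flint, group_specific=False):
    """Marker design codes for a Dent x Flint single-cross hybrid (hypothetical helper).

    dent, flint: parental inbred genotypes, coded per marker as 0 (aa) or 1 (AA).
    Returns the additive columns followed by one dominance column per marker.
    """
    if group_specific:
        # separate additive columns for the allele inherited from each heterotic group
        additive = [float(d) for d in dent] + [float(f) for f in flint]
    else:
        # one additive effect per marker: A-allele count in the hybrid, centred to -1/0/1
        additive = [d + f - 1.0 for d, f in zip(dent, flint)]
    # dominance indicator: 1 where the two parents differ, i.e. the hybrid is heterozygous
    dominance = [1.0 if d != f else 0.0 for d, f in zip(dent, flint)]
    return additive + dominance


row = hybrid_features([1, 0, 1], [1, 1, 0])
# -> [1.0, 0.0, 0.0, 0.0, 1.0, 1.0]
row_gs = hybrid_features([1, 0, 1], [1, 1, 0], group_specific=True)
# -> [1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
```

The group-specific coding doubles the number of additive effects to estimate. This is one reason such models gain little when linkage phases are consistent across groups.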
Estimating marker effects specific to the Dent and Flint heterotic groups was less important, at least at the high marker densities available today. The main reason was the surprisingly high linkage phase consistency between the Dent and Flint heterotic groups. Furthermore, combining individuals from different heterotic groups (Flint and Dent) into a single training population can considerably increase prediction accuracy. Our extensions of the prediction methods to binomially distributed data yielded considerably higher prediction accuracies than approximate Gaussian methods. In conclusion, the developed extensions of prediction methods (to hybrid prediction and binomially distributed data) and approaches (training populations combining heterotic groups) can lead to considerable, cost-free gains in prediction accuracy. They are therefore valuable tools for exploiting the full potential of genomic selection in plant breeding.

Publication: Factors influencing the accuracy of genomic prediction in plant breeding (2017). Schopp, Pascal; Melchinger, Albrecht E.
Genomic prediction (GP) is a novel statistical tool to estimate the breeding values of selection candidates without the need to evaluate them phenotypically. The method calibrates a prediction model on data from phenotyped individuals that were also genotyped with genome-wide molecular markers. Because GP does not require the explicit identification of causal polymorphisms in the DNA sequence, it can explain substantially larger proportions of the genetic variance of complex traits than the earlier mapping-based approaches used for marker-assisted selection. For these reasons, GP rapidly revolutionized dairy cattle breeding, where the method was originally developed and first implemented. Plant breeding differs in two respects: populations are often intensively structured, and fewer resources are routinely available for model calibration.
This thesis addresses important issues related to these peculiarities in order to promote an efficient integration of GP into plant breeding.

Publication: Genetic dissection of phosphorus use efficiency and genotype-by-environment interaction in maize (2022). Li, Dongdong; Li, Guoliang; Wang, Haoying; Guo, Yuhang; Wang, Meng; Lu, Xiaohuan; Luo, Zhiheng; Zhu, Xintian; Weiß, Thea Mi; Roller, Sandra; Chen, Shaojiang; Yuan, Lixing; Würschum, Tobias; Liu, Wenxin
Genotype-by-environment interaction (G-by-E) is a common but potentially problematic phenomenon in plant breeding. In this study, we investigated genotypic performance and two measures of plasticity at the phenotypic and genetic levels. We assessed 234 maize doubled haploid lines from six populations for 15 traits in seven macro-environments, with a focus on varying soil phosphorus levels. Intergenic regions were found to contribute most to the variation in phenotypic linear plasticity. Across the 15 traits, 124 and 31 quantitative trait loci (QTL) were identified for genotypic performance and phenotypic plasticity, respectively. Further, several genes associated with phosphorus use efficiency were identified, such as Zm00001eb117170, Zm00001eb258520 and Zm00001eb265410, which encode small ubiquitin-like modifier (SUMO) E3 ligases. Testing the significance of main effects and G-by-E effects identified 38 main-effect QTL and 17 interaction QTL. Among them, MQTL38 contains the gene Zm00001eb374120, whose effect depended on the soil phosphorus concentration: the lower the concentration, the greater the effect. Differences in the size and sign of QTL effects across environments could account for G-by-E. Finally, accounting for G-by-E was found to be advantageous in genomic selection.
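A common way to quantify linear plasticity is Finlay-Wilkinson regression, which regresses each genotype's performance on an environmental index. Whether this is exactly the plasticity measure used in the study is an assumption. The sketch below uses the environment mean over all genotypes as the index, and the function name and table are hypothetical.

```python
def finlay_wilkinson(perf):
    """Finlay-Wilkinson regression on a complete genotype x environment table.

    perf[g][e] is the mean of genotype g in environment e. The environmental
    index is the mean over all genotypes in that environment. Returns one
    (intercept, slope) pair per genotype; the slope is the linear plasticity
    (the average slope is 1, slopes > 1 indicate above-average responsiveness).
    """
    n_gen, n_env = len(perf), len(perf[0])
    env_index = [sum(row[e] for row in perf) / n_gen for e in range(n_env)]
    m = sum(env_index) / n_env
    sxx = sum((x - m) ** 2 for x in env_index)
    fits = []
    for row in perf:
        row_mean = sum(row) / n_env
        slope = sum((x - m) * (y - row_mean) for x, y in zip(env_index, row)) / sxx
        fits.append((row_mean - slope * m, slope))
    return fits


table = [[1.0, 2.0, 3.0],   # average responder
         [2.0, 4.0, 6.0],   # highly plastic
         [3.0, 3.0, 3.0]]   # stable across environments
slopes = [s for _, s in finlay_wilkinson(table)]  # -> [1.0, 2.0, 0.0]
```

Genotypes with slopes far from 1 contribute most to G-by-E in this simple linear view.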
In summary, our findings provide theoretical guidance for breeding phosphorus-efficient and broadly adaptable varieties.

Publication: Genomic prediction in rye (2017). Bernal-Vasquez, Angela-Maria; Piepho, Hans-Peter
Technical progress in genomics is accelerating developments in plant and animal breeding programs. Access to high-dimensional molecular data has provided genome sequence information for many economically important species, and this information can be used routinely to predict genetic merit. Genomic prediction (GP) has emerged as an approach that predicts the genomic estimated breeding value (GEBV) of an unphenotyped individual from its marker profile. The approach can considerably increase the genetic gain per unit time, because not all individuals need to be phenotyped. The accuracy of the predictions is influenced by several factors. It also requires statistical models that can cope with having more predictor variables than observations. Plant breeding programs run for several years, and genotypes are evaluated in multi-environment trials. Selection decisions are based on the mean performance of genotypes across locations and, later, across years. Under these conditions, linear mixed models offer a suitable and flexible framework for the phenotypic and genomic prediction analyses. A stage-wise approach allows each stage to be refined separately. In this work, outlier detection methods, phenotypic analyses and GP models were evaluated and compared. Three questions were studied. First, does identifying and removing possible outlying observations at the plot level affect predictive ability? Second, does enhancing phenotypic models with spatial trends improve GP accuracy? Third, can using the kinship matrix improve the separation of GEBVs from genotype-by-year (GY) interaction effects?
Here, the methods related to these objectives are compared using experimental datasets from a rye hybrid breeding program. Outlier detection methods widely used in many German plant breeding companies were assessed in terms of control of the family-wise error rate, and their merits were evaluated in a GP framework (Chapter 2). The benefit of methods based on a robust scale estimate was that, in routine analysis, they reliably identified spurious data. This per-trial outlier detection at the plot level is conservative and ensures that adjusted genotype means are not severely biased by outlying observations. Whenever possible, breeders should also flag suspicious observations manually based on subject-matter knowledge. Further, removing the outliers flagged by the recommended methods did not reduce predictive abilities estimated by cross-validation (GP-CV) using data from a complete breeding cycle. A crucial step towards accurate calibration of the genomic prediction procedure is identifying phenotypic models that produce accurate adjusted genotype mean estimates across locations and years. Using a two-year dataset connected through a single check, a three-stage GP approach was implemented (Chapter 3). In the first stage, spatial and non-spatial models were fitted per location and year to obtain adjusted genotype-tester means. In the second stage, adjusted genotype means were obtained per year, and in the third stage, GP models were evaluated. The Akaike information criterion (AIC) and predictive abilities estimated from GP-CV were used as model selection criteria in the first and the third stage.
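An outlier test built on a robust scale estimate can be sketched as follows. Residuals are standardized with the re-scaled median absolute deviation (MAD) and then tested with the Bonferroni-Holm procedure, which controls the family-wise error rate. This pairing is an assumption for illustration, since the exact methods compared in Chapter 2 are not listed here. The function name and residuals are hypothetical.

```python
import math
from statistics import median


def holm_madr_outliers(residuals, alpha=0.05):
    """Indices of plot residuals flagged as outliers (hypothetical helper).

    Residuals are standardised with the re-scaled median absolute deviation
    (1.4826 * MAD, a robust scale estimate), turned into two-sided standard
    normal p-values and tested with Holm's step-down procedure, which controls
    the family-wise error rate at `alpha`. Assumes MAD > 0.
    """
    centre = median(residuals)
    scale = 1.4826 * median([abs(r - centre) for r in residuals])
    # two-sided normal p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    pvals = [math.erfc(abs(r - centre) / scale / math.sqrt(2)) for r in residuals]
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    m = len(pvals)
    flagged = []
    for rank, i in enumerate(order):
        if pvals[i] > alpha / (m - rank):
            break  # Holm: stop at the first hypothesis that cannot be rejected
        flagged.append(i)
    return sorted(flagged)


plot_residuals = [0.1, -0.2, 0.05, -0.1, 0.15, -0.05, 0.0, 5.0]
outliers = holm_madr_outliers(plot_residuals)  # -> [7]
```

Because the MAD is barely affected by a few extreme values, a single gross error does not inflate the scale and mask itself. A plain standard deviation would be inflated in that situation.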
AIC and GP-CV were applied in the first stage because a choice had to be made between spatial and non-spatial models. They were applied in the third stage because predictive abilities allow the results of the complete analysis to be compared across the alternative stage-wise approaches presented in this thesis. The second stage was a transitional stage in which no model selection was needed for a given method of stage-wise analysis. The predictive abilities ranked the models differently than the AIC, but both approaches pointed to the same best models. The highest predictive abilities obtained for GP-CV at the last stage did not coincide with the models that AIC and GP-CV predictive ability selected in the first stage. Nonetheless, GP-CV can be used to further support model selection decisions that are usually based only on AIC. Models accounting for row and column variation tended to have better accuracies than their counterparts without row and column effects. This suggests that row-column designs may be a potential option for setting up breeding trials. Bulking multi-year data increases the training set size and covers a wider genetic background. However, it remains a challenge to separate GEBVs from GY effects when there are no common genotypes across years, i.e., when years are poorly connected or totally disconnected. In a first approach, using the two-year dataset connected through a single check, adjusted genotype means were computed per year and submitted to the GP stage (Chapter 3). The year adjustment was done in the GP model by assuming that the mean across genotypes in a given year is a good estimate of the year effect. This assumption is valid because the genotypes evaluated in a year are a sample of the population. Results indicated that this approach is more realistic than relying on the adjustment of a single check.
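The year adjustment treats the mean over all genotypes tested in a year as that year's effect. In the thesis this adjustment is done inside the GP model. As a simplified two-step stand-in (an assumption, not the thesis's implementation), the year mean can be subtracted from each adjusted genotype mean before the years are pooled. Names and values are hypothetical.

```python
from collections import defaultdict


def center_by_year(records):
    """Subtract an estimated year effect from adjusted genotype means.

    records: (genotype, year, adjusted_mean) tuples. The year effect is taken to
    be the mean over all genotypes evaluated in that year.
    """
    by_year = defaultdict(list)
    for _, year, value in records:
        by_year[year].append(value)
    year_effect = {year: sum(v) / len(v) for year, v in by_year.items()}
    return [(g, year, value - year_effect[year]) for g, year, value in records]


adjusted = [("G1", 2014, 10.0), ("G2", 2014, 12.0),
            ("G3", 2015, 20.0), ("G4", 2015, 16.0)]
pooled = center_by_year(adjusted)
# -> [('G1', 2014, -1.0), ('G2', 2014, 1.0), ('G3', 2015, 2.0), ('G4', 2015, -2.0)]
```

This is only sensible if each year's genotypes are a comparable sample of the same population, which is the assumption stated in the text.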
A further approach used kinship to separate GY effects from GEBVs (Chapter 4). Because it was not obvious which method best models the GY effect, several approaches were compared in terms of predictive ability in forward validation (GP-FV) scenarios. For training sets formed from several disconnected years' data, using kinship to model GY effects was crucial. In training sets where two or three complete cycles were available (i.e., there were some common genotypes across years within a cycle), using kinship or not yielded similar predictive abilities. It was further shown that predictive abilities are higher when training and validation sets are closely related. In addition, when kinship was used to model GY effects, predicting a selection of top-yielding genotypes was more accurate than predicting the complete validation set. In conclusion, stage-wise analysis is recommended. Phenotypic and genomic prediction models should be chosen carefully, case by case, based on subject-matter knowledge and the specifics of the data. The analyses presented in this thesis provide general guidelines for breeders to develop phenotypic models integrated with GP. The methods and models described are flexible, can be extended, and are easily implemented in routine applications.

Publication: Integration of hyperspectral, genomic, and agronomic data for early prediction of biomass yield in hybrid rye (Secale cereale L.) (2021). Galán, Rodrigo José; Miedaner, Thomas
The bioenergy demand is growing, and the dominant cultivation of energy maize needs to be diversified. Together, these developments create a highly attractive scenario for alternative biomass crops. Rye (Secale cereale L.) stands out for its vigorous growth and increased tolerance to abiotic and biotic stresses. In Germany, less than a quarter of the total harvest is used for food production.
Consequently, rye is a source of renewable energy with a reduced bioenergy-food trade-off, which makes biomass a new breeding objective. However, rye breeding is mainly driven by grain yield. Biomass, in contrast, is evaluated destructively in later selection stages with expensive and time-consuming methods. The overall motivation of this research was to investigate the prospects of combining hyperspectral, genomic, and agronomic data to unlock the potential of hybrid rye as a dual-purpose crop that affordably meets the increasing demand for renewable energy. A specific aim was to predict biomass yield as precisely as possible at an early selection stage. For this, a panel of 404 elite rye lines was genotyped and evaluated as testcrosses for grain yield, and a subset of 274 genotypes was additionally evaluated for biomass. Field trials were conducted at four locations in Germany in two years (eight environments). Hyperspectral fingerprints consisted of 400 discrete narrow bands (from 410 to 993 nm). An uncrewed aerial vehicle collected them for all hybrids at each site at two time points after heading. In a first study, population parameters were estimated for different agronomic traits and a total of 23 vegetation indices. Dry matter yield showed significant genetic variation. It was more strongly correlated with plant height (r_g = 0.86) than with grain yield (r_g = 0.64) and individual vegetation indices (|r_g| ≤ 0.35). A multiple linear regression model based on plant height, grain yield, and a subset of vegetation indices predicted dry matter yield about 6% better than models based only on agronomic traits. In a second study, whole-spectrum data were used to estimate dry matter yield indirectly.
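The 23 vegetation indices are not named here. As an illustration only, the widely used normalized difference vegetation index, NDVI = (R_NIR - R_red) / (R_NIR + R_red), can be computed from a single hyperspectral fingerprint. The computation takes the bands closest to a red and a near-infrared wavelength. The 670 nm and 800 nm defaults, the evenly spaced bands and the function names are assumptions.

```python
def nearest_band(wavelengths, target_nm):
    """Index of the band whose centre wavelength is closest to target_nm."""
    return min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))


def ndvi(wavelengths, reflectance, red_nm=670.0, nir_nm=800.0):
    """Normalized difference vegetation index from one hyperspectral fingerprint."""
    red = reflectance[nearest_band(wavelengths, red_nm)]
    nir = reflectance[nearest_band(wavelengths, nir_nm)]
    return (nir - red) / (nir + red)


# Hypothetical fingerprint: 400 evenly spaced bands from 410 to 993 nm,
# low reflectance in the visible range and high reflectance in the near infrared.
bands = [410 + i * (993 - 410) / 399 for i in range(400)]
spectrum = [0.05 if wl < 700 else 0.45 for wl in bands]
value = ndvi(bands, spectrum)  # approximately 0.8
```

An index like this compresses 400 bands into one number. Whole-spectrum models such as HBLUP instead keep all bands.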
To estimate dry matter yield from the whole spectrum, four models were considered: a single-kernel model based on a hyperspectral reflectance-derived relationship matrix (HBLUP), a single-kernel model based on a genomic relationship matrix (GBLUP), a multi-kernel model combining both matrices, and a bivariate model additionally fitted with plant height as a secondary trait. HBLUP had higher predictive power than the previously tested models based on vegetation indices. The phenotypic correlations between individual wavelengths and dry matter yield were generally significant (p < 0.05) but low (|r_p| ≤ 0.29). Across environments and training set sizes, the bivariate model yielded the highest prediction abilities (0.56–0.75). All models profited from larger training populations. However, if larger training sets cannot be afforded, HBLUP emerged as a promising approach, given its higher prediction power with reduced calibration populations compared with the well-established GBLUP. Filtering the hyperspectral data with the least absolute shrinkage and selection operator (Lasso) before incorporating them into prediction models was worthwhile to deal with their high dimensionality. In a third study, the effects of trait heritability and of genetic and environmental relatedness on the prediction ability of GBLUP and HBLUP for biomass-related traits were compared. The prediction ability of GBLUP (0.14–0.28) was largely affected by genetic relatedness and trait heritability. HBLUP, in contrast, was significantly more accurate (0.41–0.61) across weakly connected datasets. In this context, dry matter yield could be predicted better (by up to 20%) with a bivariate model. Nevertheless, because of environmental variances, genomic and reflectance-enabled predictions were strongly dependent on a sufficient environmental relationship between the data used for model training and validation. In summary, early prediction of biomass across selection cycles is crucial for breeding rye affordably as a dual-purpose crop that meets the increasing bioenergy demand.
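GBLUP and HBLUP use the same prediction equations and differ only in the relationship matrix. The sketch below builds a relationship matrix from column-standardized features: marker codes for GBLUP, band reflectances for HBLUP (a common construction, assumed here). It then predicts unphenotyped individuals with BLUP for a known variance ratio `lam`. Finally, it forms a weighted kernel sum as a simple stand-in for the multi-kernel model. In practice, variance components would be estimated, for example by REML. All names and data are hypothetical.

```python
import math


def standardized_kernel(features):
    """Relationship matrix K = Z Z' / q from column-standardised features.

    features: one row per individual (marker codes for a genomic matrix, or band
    reflectances for a hyperspectral one). Constant columns are dropped; q is the
    number of remaining columns.
    """
    n = len(features)
    cols = []
    for j in range(len(features[0])):
        col = [row[j] for row in features]
        m = sum(col) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / n)
        if sd > 0:
            cols.append([(x - m) / sd for x in col])
    q = len(cols)
    return [[sum(c[i] * c[k] for c in cols) / q for k in range(n)] for i in range(n)]


def combine_kernels(k1, k2, w):
    """Weighted sum w*K1 + (1-w)*K2, a simple stand-in for a multi-kernel model."""
    return [[w * a + (1 - w) * b for a, b in zip(r1, r2)] for r1, r2 in zip(k1, k2)]


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def kernel_blup(K, train_idx, y_train, test_idx, lam):
    """BLUP of test individuals' values given a relationship matrix K over everyone.

    lam is the ratio of residual to genetic variance (assumed known); the overall
    mean is approximated by the training mean.
    """
    mu = sum(y_train) / len(y_train)
    A = [[K[i][k] + (lam if i == k else 0.0) for k in train_idx] for i in train_idx]
    alpha = solve(A, [y - mu for y in y_train])
    return [mu + sum(K[t][i] * a for i, a in zip(train_idx, alpha)) for t in test_idx]


# Individual 3 is genotypically identical to individual 0 (toy 0/1 markers).
markers = [[0, 1, 0], [1, 0, 1], [1, 0, 1], [0, 1, 0]]
G = standardized_kernel(markers)
pred = kernel_blup(G, [0, 1, 2], [1.0, 3.0, 3.0], [3], lam=1.0)
```

The prediction for individual 3 is pulled from the training mean towards its genotypic twin's phenotype. How far it is pulled depends on `lam`.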
Hyperspectral imaging has proven to be a suitable tool for selecting high-yielding biomass genotypes across weakly linked populations. Owing to the synergistic effect of combining hyperspectral, genomic, and agronomic data, integrating these data sources into bivariate models can yield higher prediction abilities.

Publication: Prospects of genomic selection for disease resistances in winter wheat (Triticum aestivum L.) (2019). Grote, Cathérine Pauline; Miedaner, Thomas
This work had four objectives:
(i) to evaluate, for the first time, the effect of the dwarfing gene Rht24 on Fusarium head blight (FHB) and Septoria tritici blotch (STB) resistance, plant height and heading date, in comparison with the widely used locus Rht-D1;
(ii) to investigate the potential of the non-adapted QTL Fhb1 and Fhb5 for developing short-straw wheat;
(iii) to analyze the prediction accuracy of genomic selection (GS) within and between families using two models, RR-BLUP (ridge-regression best linear unbiased prediction) and wRR-BLUP (weighted RR-BLUP);
(iv) to calculate the selection gain of GS for FHB and STB resistance and to determine how many of the top-10% genotypes GS selected correctly.
The results showed that the gibberellic acid-sensitive dwarfing gene Rht24 on chromosome 6 reduced plant height by 8.96 cm on average without adversely affecting FHB or STB resistance or heading date. In contrast, the widely used allele Rht-D1b reduced FHB resistance by 10.05 percentage points on average. This was observed in a winter wheat population of eight biparental families segregating for these resistance loci. The resistance alleles of Fhb1 and Fhb5 reduced FHB susceptibility by 6.54 and 11.33 percentage points, respectively. Consequently, the non-adapted allele Fhb5b alone can compensate for the negative effect of Rht-D1b on FHB resistance in the material studied.
This shows that the choice of dwarfing and resistance genes is crucial in breeding programs in which FHB resistance is a selection trait. This study further investigated the potential of GS within and between families. For all target traits, prediction accuracies within a family were higher than those between families, and they differed among individual families and prediction scenarios. The wRR-BLUP model gives greater weight to significant markers. This improved prediction accuracy over the widely used RR-BLUP model when single genes such as Rht-D1 or major QTL such as Fhb5 were present. In this study, genomic estimated breeding values (GEBVs) of 2,500 untested genotypes were determined based on a partially related training population of 1,120 genotypes. The 10% most FHB- and STB-resistant lines and a random sample were selected genomically, taking plant height into account. They were then evaluated phenotypically in a field trial at four locations. For FHB resistance, the selection gain was 10.62 percentage points relative to the random population sample. For STB resistance, however, GS achieved an increase of only 2.14 percentage points. Selecting new crossing parents on the basis of GS also does not appear sufficiently reliable, since only 19% of the top-10% individuals were correctly selected. In summary, GS is a valuable tool for supporting breeding progress for the complexly inherited FHB resistance through shorter cycles and larger populations. Combined with suitable dwarfing genes and the non-adapted QTL Fhb5, it can increase FHB resistance in winter wheat.
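The two GS outcome measures reported above can be computed as sketched below: the selection gain relative to a random sample, and the share of correctly selected top-10% lines. The sketch assumes disease is scored as severity in percent, so lower values mean more resistant. It also assumes that "correctly selected" means the overlap between the GS-selected set and the phenotypically best set. The thesis's exact definitions may differ, and names and data are hypothetical.

```python
from statistics import mean


def most_resistant(ids, severity, fraction):
    """Ids of the `fraction` of lines with the lowest disease severity (at least one)."""
    k = max(1, round(len(ids) * fraction))
    return [i for _, i in sorted(zip(severity, ids))[:k]]


def selection_gain(selected_obs, random_obs):
    """Reduction in mean observed severity (percentage points) of the selected
    lines relative to a random sample; positive values mean a gain in resistance."""
    return mean(random_obs) - mean(selected_obs)


def share_correctly_selected(ids, predicted, observed, fraction=0.10):
    """Share of the phenotypically best `fraction` that GS also placed in its best `fraction`."""
    by_gs = set(most_resistant(ids, predicted, fraction))
    by_pheno = set(most_resistant(ids, observed, fraction))
    return len(by_gs & by_pheno) / len(by_pheno)


ids = list("ABCDEFGHIJ")
predicted = [5, 1, 8, 3, 9, 2, 7, 4, 6, 10]   # GEBVs for severity
observed = [6, 2, 9, 1, 8, 3, 7, 5, 4, 10]    # field severity
share = share_correctly_selected(ids, predicted, observed, fraction=0.2)  # -> 0.5
gain = selection_gain([2.0, 3.0], [5.0, 7.0])  # -> 3.5
```

The two measures answer different questions. The selection gain concerns the mean of a selected group, whereas the overlap concerns identifying specific individuals. This may explain why GS can achieve a clear gain for FHB and still pick only a minority of the true top lines.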