Browsing by Subject "Prediction"
Publication Comparison of omics technologies for hybrid prediction (2019) Westhues, Matthias; Melchinger, Albrecht E.
One of the great challenges for plant breeders is dealing with the vast number of putative candidates, which cannot be tested exhaustively in multi-environment field trials. Using pedigree records helped breeders narrow down the number of candidates substantially. With pedigree information, only a subset of candidates needs to be subjected to exhaustive tests of their phenotype, whereas the phenotype of the majority of untested relatives is inferred from their common pedigree. A caveat of pedigree information is its inability to capture Mendelian sampling and to accurately reflect relationships among individuals. This shortcoming was mitigated with the advent of marker assays covering regions harboring causal quantitative trait loci. Today, the prediction of untested candidates using information from genomic markers, called genomic prediction, is a routine procedure in larger plant breeding companies. Genomic prediction has revolutionized the prediction of traits with complex genetic architecture but, just like pedigree information, cannot properly capture physiological epistasis, referring to complex interactions among genes and endophenotypes, such as RNA, proteins and metabolites. Given their intermediate position in the genotype-phenotype cascade, endophenotypes are expected to represent some of the information missing from the genome, thereby potentially improving predictive abilities. In a first study we explored the ability of several predictor types to forecast genetic values for complex agronomic traits recorded on maize hybrids. Pedigree and genomic information were included as the benchmark for evaluating the merit of metabolites and gene expression data in genetic value prediction. Metabolites, sampled from maize plants grown in field trials, were poor predictors for all traits.
Conversely, root metabolites, sampled from plants grown under controlled conditions, were moderate to competitive predictors for the traits fat and dry matter yield. Gene expression data outperformed other individual predictors for the prediction of genetic values for protein and the economically most relevant trait, dry matter yield. A genome-wide association study suggested that gene expression data integrated SNP interactions. This might explain the superior performance of this predictor type in the prediction of protein and dry matter yield. Small RNAs were probed for their potential as predictors, given their involvement in transcriptional, post-transcriptional and post-translational regulation. Regardless of the trait, small RNAs could not outperform other predictors. Combinations of predictors did not considerably improve the predictive ability of the best single predictor for any trait but improved the stability of their performance across traits. By assigning different weights to each predictor, we evaluated each predictor's optimal contribution for attaining maximum predictive ability. This approach revealed that pedigree, genomic information and gene expression data contribute equally when maximizing predictive ability for grain dry matter content. When attempting to maximize predictive ability for grain yield, pedigree information was superfluous. For genotypes having only genomic information, gene expression data were imputed by using genotypes having both genomic and gene expression data. Previously, this single-step prediction framework was only used for qualitative predictors. Our study revealed that this framework can be employed for improving the cost-effectiveness of quantitative endophenotypes in hybrid prediction.
We hope that these studies will further promote exploring endophenotypes as additional predictor types in breeding.

Publication Genome-wide association mapping of molecular and physiological component traits in maize (2013) Riedelsheimer, Christian; Melchinger, Albrecht E.
Genome-wide association (GWA) mapping emerged as a powerful tool to dissect complex traits in maize. Yet, most agronomic traits were found to be highly polygenic and the detected associations explained together only a small portion of the total genetic variance. Hence, the majority of genetic factors underlying many agronomically important traits are still unknown. New approaches are needed for unravelling the chain from the genes to the phenotype which is still largely unresolved for most quantitative traits in maize. Instead of further enlarging the mapping population to increase the power to detect even smaller QTL, this thesis research aims to present an alternative route by mapping not the polygenic trait of primary interest itself, but genetically correlated molecular and physiological component traits. As such components represent biological sub-processes underlying the trait of interest, they are supposed to be genetically less complex and thus, more suitable for genetic mapping. Using large diversity panels of maize inbred lines, this approach is demonstrated with (i) biomass yield by using metabolites and lipids as molecular component traits and with (ii) chilling sensitivity by using physiological component traits such as photosynthesis parameters derived from chlorophyll fluorescence measurements. In a first step, we developed a sampling and randomization scheme which allowed us to obtain metabolic and lipid profiles from large-scale field trials. Both profiles were found to be intensively structured reflecting their functional grouping.
They also showed repeatabilities higher than in comparable profiles obtained in previous studies with the model plant Arabidopsis under controlled conditions. By applying GWAS with 56,110 SNPs to metabolites and lipids, large-scale genetic associations explaining more than 30% of the genetic variance were detected. Confounding with structure was found to be a lesser problem for molecular components than for agronomic traits like flowering time. The lipidome was also found to show a multilevel control architecture similar to that employed in controlling complex mechanical systems. In several instances, direct links between candidate genes underlying the detected associations and agronomic traits could be established. An example is cinnamoyl-CoA reductase, a key enzyme in the lignin biosynthesis pathway. It was found to be a candidate gene underlying a major QTL found for several intermediates in the lignin biosynthesis pathways. These intermediates were in turn found to be correlated with plant height, lignin content, and dry matter yield at the end of the vegetation period. The different signs of these correlations indicated that the relationship between pathway intermediates and the final product is not simple. Directly modeling complex traits with individual component traits may therefore require consideration of feedback loops and other interdependencies. Such connections were, however, found to be difficult to establish with physiological components underlying chilling sensitivity. The main reasons for this were the weak correlations between physiological components under controlled conditions and chilling sensitivity in the field as well as high levels of genotype × environment interactions caused by the complex and environment-dependent responses of maize after perception of chilling temperatures.
The approach explored in this thesis research uses component traits to gain biological insights about the genetic control of biomass yield and chilling sensitivity evaluated in diverse populations of still manageable sizes. We showed that GWAS with 56k SNPs can identify large additive effects for component traits correlated with these traits. For mapping epistatic interactions and rare variants, classical linkage mapping with biparental populations will be a reasonable complementary approach. However, controlling and modeling genotype × environment interactions remains an important issue for understanding the genetic basis of especially chilling sensitivity. If the goal is merely to predict the phenotypic value in a given set of environments, black-box genomic selection methods with either SNPs, molecular profiles, or a combination of both, are very promising strategies to achieve this goal.

Publication Genomic selection in synthetic populations (2017) Müller, Dominik; Melchinger, Albrecht E.
The foundation of genomic selection has been laid at the beginning of this century. Since then, it has developed into a very active field of research. Although it has originally been developed in dairy cattle breeding, it rapidly attracted the attention of the plant breeding community and has, by now (2017), developed into an integral component of the breeding armamentarium of international companies. Despite its practical success, there are numerous open questions that are highly important to plant breeders. The recent development of large-scale and cost-efficient genotyping platforms was the prerequisite for the rise of genomic selection. Its functional principle is based on information shared between individuals. Genetic similarities between individuals are assessed by the use of genomic fingerprints. These similarities provide information beyond mere family relationships and allow for pooling information from phenotypic data.
In practice, first a training set of phenotyped individuals has to be established and is then used to calibrate a statistical model. The model is then used to derive predictions of the genomic values for individuals lacking phenotypic information. Using these predictions can save time by accelerating the breeding program and cost by reducing resources spent for phenotyping. A large body of literature has been devoted to investigating the accuracy of genomic selection for unphenotyped individuals. However, training individuals are themselves oftentimes selection candidates in plant breeding, and there is no conceptual obstacle to applying genomic selection to them, making use of information obtained via marker-based similarities. It is therefore also highly important to assess prediction accuracy and possibilities for its improvement in the training set. Our results demonstrated that it is possible to increase accuracy in the training set by shrinkage estimation of marker-based relationships to reduce the associated noise. The success of this approach depends on the marker density and the population structure. The potential is largest for broad-based populations and under a low marker density. Synthetic populations are produced by intermating a small number of parental components, and they have played an important role in the history of plant breeding for improving germplasm pools through recurrent selection as well as for actual varieties and research on quantitative genetics. The properties of genomic selection have so far not been assessed in synthetics. Moreover, synthetics are an ideal population type to assess the relative importance of three factors by which markers provide information about the state of alleles at QTL, namely (i) pedigree relationships, (ii) co-segregation and (iii) LD in the source germplasm. Our results show that the number of parents is a crucial factor for prediction accuracy.
For a very small number of parents, prediction accuracy in a single cycle is highest and mainly determined by co-segregation between markers and QTL, whereas prediction accuracy is reduced for a larger number of parents, where the main source of information is LD within the source germplasm of the parents. Across multiple selection cycles, information from pedigree relationships rapidly vanishes, while co-segregation and ancestral LD are a stable source of information. Long-term genetic gain of genomic selection in synthetics is relatively unaffected by the number of parents, because information from co-segregation and from ancestral LD compensate for each other. Altogether, our results provide an important contribution to a better understanding of the factors underlying genomic selection, and in which cases it works and what information contributes to prediction accuracy.

Publication Gut microbiota patterns predicting long-term weight loss success in individuals with obesity undergoing nonsurgical therapy (2022) Bischoff, Stephan C.; Nguyen, Nguyen K.; Seethaler, Benjamin; Beisner, Julia; Kügler, Philipp; Stefan, Thorsten
The long-term success of nonsurgical weight reduction programs is variable; thus, predictors of outcome are of major interest. We hypothesized that the intestinal microbiota known to be linked with diet and obesity contain such predictive elements. Methods: Metagenome analysis by shotgun sequencing of stool DNA was performed in a cohort of 15 adults with obesity (mean body mass index 43.1 kg/m²) who underwent a one-year multidisciplinary weight loss program and another year of follow-up. Eight individuals were persistently successful (mean relative weight loss 18.2%), and seven individuals were not successful (0.2%). The relationship between relative abundances of bacterial genera/species and changes in relative weight loss or body mass index was studied using three different statistical modeling methods.
Results: When combining the predictor variables selected by the applied statistical modeling, we identified seven bacterial genera and eight bacterial species as candidates for predicting success of weight loss. By classification of relative weight-loss predictions for each patient using 2–5 term models, 13 or 14 out of 15 individuals were predicted correctly. Conclusions: Our data strongly suggest that gut microbiota patterns allow individual prediction of long-term weight loss success. Prediction accuracy seems to be high but needs confirmation by larger prospective trials.

Publication Operational poverty targeting by proxy means tests : models and policy simulations for Malawi (2010) Houssou, Nazaire S. I.; Zeller, Manfred
There is a long standing belief that accurate targeting of public policy can play a major role in alleviating poverty and fostering pro-poor economic growth. Many development programs fail to reach the poor in that a sizeable amount of program benefits leak to higher-income groups and a substantial proportion of poor are excluded. This is also the case in Malawi, one of the poorest countries in Sub-Saharan Africa. In response to widespread poverty and endemic food insecurity, the country's decision makers enacted various programs, including free food, food-for-work, cash-for-work, subsidized agricultural inputs, etc. To target these programs at the poor and smallholder farmers in the country, policy makers rely mainly on community-based targeting systems in which local authorities, village development committees, and other community representatives identify program beneficiaries based on their assessment of the household living conditions. However, most of these programs have been characterized by poor targeting and significant leakage of benefits to the non-poor due to a number of factors, including various local perceptions, favoritism, abuse, lack of understanding of targeting criteria, political interests, etc.
Almost all interventions are poorly targeted in the country. Therefore, this research explores potential methods and models that might improve the targeting efficiency of agricultural and development policies in the country. Using the Malawi Second Integrated Household (IHS2) survey data and a variety of estimation methods along with stepwise selection of variables, we propose empirical models for improving the poverty outreach of agricultural and development policies in rural and urban Malawi. Moreover, the research analyzes the out-of-sample performances of different estimation methods in identifying the poor and smallholder farmers. In addition, the model robustness was assessed by estimating the prediction intervals out-of-sample using bootstrapped simulation methods. Furthermore, we estimate the cost-effectiveness and impacts of targeting the poor and smallholder farmers. It is often argued that targeting is cost-ineffective and once all targeting costs have been considered, a finely targeted program may not be any more cost-efficient and may not have any more impact on poverty than a universal program. We assess whether this is the case using household-level data from Malawi. More importantly, we evaluate whether administering development programs using the newly developed models is more target- and cost-efficient than past agricultural subsidy programs namely the 2000/2001 Starter Pack and the 2006/2007 Agricultural Input Support Program (AISP). Estimation results suggest that under the newly designed system, mis-targeting is considerably reduced and the targeting efficiency of development policies improves compared to the currently used mechanisms in the country. Findings indicate that the estimation methods applied achieve the same level of targeting performance. The rural model achieves an average poverty accuracy of about 72% and a leakage of 27% when calibrated to the national poverty line of 44.29 Malawi Kwacha (MK). 
On the other hand, the urban model yields on average a poverty accuracy of about 62% and a leakage of 39% when calibrated to the same poverty line. The results are also confirmed by the Receiver Operating Characteristic (ROC) curves of the models which show that there is no sizeable difference in aggregate predictive accuracy between the estimation methods. The ROC curve is a powerful tool that can be used by policy makers and project managers to decide on the number of poor a program or development policy should reach and ponder on the number of non-poor that would also be wrongly targeted. Calibrating the models to a higher poverty line improves their targeting performance, while calibrating the models to a lower line does the opposite. For example, under the international poverty line of US$1.25 (i.e. MK59.18 in Purchasing Power Parity), the rural model covers about 82% of the poor and wrongly targets only 16% of the non-poor, whereas the urban model covers about 74% of the poor and wrongly identifies 26% of the non-poor. On the other hand, using an extreme poverty line of MK29.81 disappointingly worsens the models' poverty accuracy and leakage: the rural model yields a poverty accuracy of 51% and a leakage of 39% while the urban model yields a poverty accuracy of about 48% and a leakage of 68%. Furthermore, a breakdown of targeting errors by poverty deciles indicates that the models perform well in terms of those who are mistargeted, covering most of the poorest deciles and excluding most of the richest ones. These results have obvious desirable welfare implications for the poor and smallholder farmers. It is important to mention that the models selected predict poverty but cannot explain it. A causal relationship should not be inferred from the results. There is compelling evidence in favor of targeting since considering all costs does not make targeting cost- and impact-ineffective.
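The targeting measures quoted in this abstract can be illustrated with a small sketch. It assumes, as is common in the proxy means test literature but not stated in the abstract itself, that poverty accuracy is the share of poor households correctly predicted as poor and that leakage is the number of non-poor households predicted as poor, expressed relative to the total number of poor. The household values, function names and cutoffs are hypothetical; only the national poverty line of MK44.29 is taken from the abstract.

```python
from typing import Dict, List, Sequence, Tuple


def targeting_metrics(actual_poor: Sequence[bool],
                      predicted_poor: Sequence[bool]) -> Dict[str, float]:
    """Return targeting measures in percent (definitions are assumptions)."""
    if len(actual_poor) != len(predicted_poor):
        raise ValueError("inputs must have the same length")
    n_poor = sum(1 for a in actual_poor if a)
    n_nonpoor = len(actual_poor) - n_poor
    if n_poor == 0 or n_nonpoor == 0:
        raise ValueError("need at least one poor and one non-poor household")
    # Poor households that the model also classifies as poor.
    hits = sum(1 for a, p in zip(actual_poor, predicted_poor) if a and p)
    # Non-poor households that the model wrongly classifies as poor.
    wrong = sum(1 for a, p in zip(actual_poor, predicted_poor) if not a and p)
    return {
        "poverty_accuracy": 100.0 * hits / n_poor,
        "leakage": 100.0 * wrong / n_poor,  # assumed: relative to the poor
        "nonpoor_inclusion_rate": 100.0 * wrong / n_nonpoor,
    }


def roc_points(actual_poor: Sequence[bool],
               predicted_expenditure: Sequence[float],
               cutoffs: Sequence[float]) -> List[Tuple[float, float]]:
    """One (non-poor inclusion rate, poverty accuracy) pair per cutoff."""
    points = []
    for cutoff in cutoffs:
        predicted = [x < cutoff for x in predicted_expenditure]
        m = targeting_metrics(actual_poor, predicted)
        points.append((m["nonpoor_inclusion_rate"], m["poverty_accuracy"]))
    return points


if __name__ == "__main__":
    poverty_line = 44.29  # MK per day, national line quoted in the abstract
    # Hypothetical observed and model-predicted daily expenditures.
    observed = [30.0, 40.0, 42.0, 50.0, 60.0, 80.0, 35.0, 90.0]
    predicted = [28.0, 47.0, 41.0, 43.0, 65.0, 70.0, 39.0, 85.0]
    is_poor = [x < poverty_line for x in observed]
    flagged = [x < poverty_line for x in predicted]
    print(targeting_metrics(is_poor, flagged))
    print(roc_points(is_poor, predicted, [30.0, poverty_line, 60.0]))
```

Sweeping the cutoff, as `roc_points` does, traces the trade-off that the abstract attributes to the ROC curve: a higher cutoff reaches more of the poor but also includes more of the non-poor.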
Findings suggest that the new system is considerably more accurate and more target-efficient than the currently used mechanisms for targeting agricultural inputs in the country. Likewise, simulation results indicate that targeting the poor and smallholder farmers is more cost- and impact-effective than universal coverage of the population. Better targeting not only reduces the Malawian Government's direct costs for providing benefits, but also reduces the total costs of a targeted program. Though administrative costs increase with finer targeting, the results indicate that the overall benefits outweigh the costs of targeting. Likewise, finer targeting reduces the costs of leakage by a sizable margin and produces the highest impacts on poverty compared to universal regimes. However, the finest redistribution does not consistently yield the best transfer efficiency, nor does it consistently improve post-transfer poverty. Furthermore, the newly designed system appears to be more cost-efficient than the 2000/2001 Starter Pack and the 2006/2007 Agricultural Input Support Program (AISP). While the Starter Pack and the AISP transferred about 50% of total transfer, under the new system about 73% of transfer is delivered to the poor and smallholder farmers. Likewise, under the new proxy system the costs of leakage are cut down by 55% and 57% for the Starter Pack and AISP, respectively. Thus, under the new system it is possible to reduce leakage and undercoverage rates and improve the cost and transfer efficiency of development programs in the country. The proxy indicators selected reflect the local communities' understandings of poverty and include variables from different dimensions, such as demography, education, housing, and asset ownership. These indicators are objective and most can be easily verified. However, the collection of information on those indicators might entail an effective verification process.
Likewise, the emphasis put on proxy means tests in this research does not imply that other potential targeting methods should be disregarded. Indeed, proxy means tests are not perfect at targeting; the system developed can be combined with other methods in a multi-stage targeting process. Furthermore, targeting can be a politically sensitive issue; the system developed does not take into account the reality that policy makers, program managers, or development practitioners may adjust eligibility criteria due to political, administrative, budgetary, or other reasons. The models developed can be used in a wide range of applications, such as identifying the poor and smallholder farmers, improving the existing targeting mechanisms of agricultural input subsidies, assessing household eligibility to welfare programs and safety net benefits, producing estimates of poverty rates and monitoring changes in poverty over time as the country and donors cannot afford the costs of frequent household expenditure surveys, estimating the impacts of development policies targeted to those living below the poverty line, and assessing the poverty outreach of microfinance institutions operating in the country. This broad range of applications makes the models potentially interesting policy tools for the country. However, the models developed are not sufficient. They must also be coupled with investments in education, rural infrastructure, economic growth related sectors, and strong political will to impact on the welfare of Malawian people. 
The research also provides a framework for developing and evaluating a simple and reasonably accurate system for reaching the poor and smallholder farmers in Malawi, but the methodology can be useful in other areas of applied research and replicated in other developing countries with similar targeting problems.

Publication Prediction of hybrid performance in maize using molecular markers (2008) Schrag, Tobias; Melchinger, Albrecht E.
Maize breeders develop a large number of inbred lines in each breeding cycle, but, owing to resource constraints, evaluate only a small proportion of all possible crosses among these lines in field trials. Therefore, predicting the performance of hybrids by utilising the data available from related crosses to identify untested but promising hybrids is extremely important. The objectives of this thesis research were to develop and evaluate methods for marker-based prediction of hybrid performance (HP) in unbalanced data as typically generated in commercial maize hybrid breeding programs. For HP prediction, a promising approach uses the sum of effects across quantitative trait loci (QTL) as predictor. However, comparison of this approach with established prediction methods based on general combining ability (GCA) was lacking. In addition, prediction of specific combining ability (SCA) is also possible with this approach, but was so far not used for HP prediction. The objectives of the first study in this thesis were to identify QTL for grain yield and grain dry matter content, combine GCA with marker-based SCA estimates for HP prediction, and compare marker-based prediction with established methods. Hybrids from four Dent × Flint factorial mating experiments were evaluated in field trials and their parental inbreds were genotyped with amplified fragment length polymorphism (AFLP) markers. Efficiency for prediction of hybrids, of which both parents were testcross evaluated (Type 2), was assessed by leave-one-out cross-validation.
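Leave-one-out cross-validation of a GCA-based prediction can be sketched as follows. This is a simplified illustration, not the thesis's procedure: it assumes a complete, balanced Dent × Flint factorial with one value per hybrid and fixed additive GCA effects, whereas the thesis works with unbalanced data. Under these assumptions, the least-squares prediction of a single held-out cell has the closed form known from missing-plot analysis in two-way layouts. The yield table, the function names and the use of the Pearson correlation as a measure of prediction efficiency are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def loo_gca_predictions(table: Sequence[Sequence[float]]) -> List[List[float]]:
    """Predict each hybrid from all others under an additive GCA model.

    table[i][j] is the performance of the cross dent line i x flint line j.
    For a held-out cell the least-squares prediction is
    (r * R + c * C - T) / ((r - 1) * (c - 1)), where R, C and T are the
    row, column and grand totals of the remaining hybrids.
    """
    r, c = len(table), len(table[0])
    if r < 2 or c < 2 or any(len(row) != c for row in table):
        raise ValueError("need a complete factorial with at least 2 x 2 lines")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    grand = sum(row_tot)
    preds = []
    for i in range(r):
        pred_row = []
        for j in range(c):
            y = table[i][j]  # held-out hybrid, removed from all totals
            rem_row = row_tot[i] - y
            rem_col = col_tot[j] - y
            rem_all = grand - y
            pred_row.append((r * rem_row + c * rem_col - rem_all)
                            / ((r - 1) * (c - 1)))
        preds.append(pred_row)
    return preds


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical grain yields for 3 dent x 3 flint lines.
    yields = [[9.1, 9.8, 10.4],
              [8.7, 9.9, 10.1],
              [9.6, 10.2, 11.3]]
    predicted = loo_gca_predictions(yields)
    observed_flat = [y for row in yields for y in row]
    predicted_flat = [p for row in predicted for p in row]
    print(pearson(observed_flat, predicted_flat))
```

When the table is purely additive (no SCA), each held-out hybrid is recovered exactly; deviations between observed and predicted values therefore reflect SCA and error, which is the part the abstract proposes to estimate from markers.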
The established GCA-based method predicted HP better than the approach exclusively based on markers. However, with greater relevance of SCA, combining GCA with marker-based SCA estimates was superior compared with HP prediction based on GCA only. Linkage disequilibrium between markers was expected to reduce the prediction efficiency due to inflated QTL effects and reduced power. Thus, in the second study, multiple linear regression (MLR) with forward selection was employed for HP prediction. In addition, adjacent markers in strong linkage disequilibrium were combined into haplotype blocks. An approach based on total effects of associated markers (TEAM) was developed for multi-allelic haplotype blocks. Genome scans to search for significant QTL involve multiple testing of many markers, which increases the rate of false-positive associations. Thus, the TEAM approach was enhanced by controlling the false discovery rate. Considerable loss of marker information can be caused by few missing observations, if the prediction method depends on complete marker data. Therefore, the TEAM approach was improved to cope with missing marker observations. Modification of the cross-validation procedure reflected, that often only a subset of parental lines is crossed with all lines from the opposite heterotic group in a factorial mating design. The prediction approaches were evaluated with the same field data as in the previous study. The results suggested that with haplotype blocks instead of original marker data, similar or higher efficiencies for HP prediction can be achieved. Marker-based HP prediction of inter-group crosses between lines, which were marker genotyped but not testcross evaluated, was not investigated hitherto. Heterosis, which considerably contributes to maize grain yield, was so far not incorporated into marker-based HP prediction. Combined analyses of field trials from multiple experiments of a breeding program provide valuable data for HP prediction. 
With a mixed linear model analysis of such unbalanced data from nine factorial mating experiments, best linear unbiased prediction (BLUP) values for HP, GCA, SCA, line per se performance, and heterosis of 400 hybrids were obtained in the third study. The prediction efficiency was assessed in cross-validation for prediction of hybrids, of which none (Type 0) or one (Type 1) parental inbred was testcross evaluated. An extension of the established HP prediction method based on BLUP of GCA and SCA, but not using marker data, resulted in prediction efficiency intermediate for Type 1 and very low for Type 0 hybrids. Combining line per se with marker-based heterosis estimates (TEAM-LM) mostly resulted in the highest prediction efficiencies of grain yield and grain dry matter content for both Type 0 and Type 1 hybrids. For the heterotic trait grain yield, the highest prediction efficiencies were generally obtained with marker-based TEAM approaches. In conclusion, this thesis research provided methods for the marker-based prediction of HP. The experimental results suggested that marker-based HP prediction is an efficient tool which supports the selection of superior hybrids and has great potential to accelerate commercial hybrid breeding programs in a very cost-effective manner. The significance of marker-based HP prediction is further enhanced by recent advances in production of doubled haploid lines and high-throughput technologies for rapid and inexpensive marker assays.

Publication Relevance of amino acid digestibility for the protein utilization efficiency in poultry (2022) Siegert, Wolfgang; Siegert, Wolfgang
One aim of poultry nutrition research that has been pursued for decades is to decrease the ingested protein relative to the protein accreted in animal body weight or eggs, which is described in the key figure ‘protein utilization efficiency’.
Increasing protein utilization efficiency aims to ensure global food and water security and to minimize the effects of excreted nitrogenous compounds on the environment and the health of animals and humans. Protein utilization efficiency can be increased by adjusting the supply of digestible amino acids to animals relative to the requirement for digestible amino acids. The predictability of amino acid digestibility of feed ingredients is a prerequisite to achieve this goal. This habilitation thesis puts knowledge gained from studies on methods of amino acid digestibility determination, influences on amino acid digestibility, and variation in amino acid digestibility within feed ingredients into the context of predictability of amino acid digestibility. Methodological, diet-related, and animal-related influences that considerably determine amino acid digestibility are presented and evaluated. This includes feed intake, feed provisioning, feed processing, chemical composition of feed ingredients, feed enzymes, and microbiota in the digestive tract. Cropping conditions influencing amino acid digestibility are also addressed. The gained insights may contribute to making amino acid digestibility more predictable in the future. Recent attempts to predict amino acid digestibility, however, have not been sufficiently accurate to fulfill the aim of being able to formulate diets according to the requirement for digestible amino acids in practice. Suggestions for future strategies to work toward a more accurate predictability of amino acid digestibility are included. Model calculations show that increasing amino acid digestibility can considerably raise protein utilization efficiency. When amino acid digestibility is increased by an influence not related to the feed ingredient providing amino acids (e.g., supplemented enzymes), increasing amino acid digestibility by 1 percentage point raises the protein utilization efficiency by ~0.43 percentage points.
An increase in protein utilization efficiency of up to 0.5 percentage points can be expected when amino acid digestibility is increased by selecting variants of a feed ingredient for higher amino acid digestibility. The thesis concludes with a critical examination of the general perception that higher amino acid digestibility and maximized protein utilization efficiency are advantageous. Situations in which lower amino acid digestibility and smaller protein utilization efficiency provide benefits are discussed.