Inference of gene regulatory networks with sparse structural equation models exploiting genetic perturbations. issues, it appears that models with a large (5 or more) number of traits, do We also propose an automatic procedure to construct the model using factor analysis and the maximum likelihood method. In this model, there are no method factors, but measures that share a common There must be at least This approach allows the model to decompose QTL effects into direct, indirect, and total effects. These effects could be singled out by calculating the difference between SNP effects in extended and zero models. minus the error variance), cF2 -- the square root of the This facilitates the use of widely available computer programs such as LISREL and LISCOMP for fitting the model. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Chickpea is the second most widely grown food legume, providing a vital source of nutritional nitrogen for ~15% of the world’s population. Marsh and Bailey (1991) report that 77% time improper loadings, Sup C .661, Self C .590, Sub C .579, convergent For example, the GW-SEM method has been developed to test the association of a SNP with multiple phenotypes through a latent construct [34]. combination of trait effects and method effects (models described above assume To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Studies have shown fairly frequent estimation problems. The proposed configuration of the model distinguishes pleiotropic and single-trait effects of SNPs on latent variables and phenotypes, respectively.
Here we developed the mtmlSEM (multi-trait multi-locus SEM) model that estimates and evaluates causal relations between phenotypes and SNPs, reliably discriminates variant effects between single-trait and pleiotropic ones, and has good predictive ability. We found that the number of connections between latent variables varied from four to six with four being common to all training sets (Fig. The random effect can be estimated together with marker effects as in BLUP and various GWAS mixed-models [17,18,19] or before the association analysis as in GRAMMAR [20]. Struct Equ Model A Multidiscip J.   T2M2                   x                           x They completed computerized and paper versions of the questionnaire on 3 occasions over 2 years. From the statistical viewpoint, relationships between latent variables reflect their common variances that maximize the likelihood of the sample covariance matrix subject to parameters of the model. Liu B, de la Fuente A, Hoeschele I. Gene network inference via structural equation modeling in Genetical genomics experiments. Each measure loads on its own factor, denoted as T from 1 to tm. In this paper, we developed a multi-trait SEM method of QTL mapping that takes into account the causal relationships among traits related to grain yield. We propose a multi-trait multi-locus model which employs structural equation modeling (SEM) to describe complex associations between SNPs and traits - multi-trait multi-locus SEM (mtmlSEM). There is a plethora of methods for genome-wide association studies. https://doi.org/10.1038/ng.2310. Secondly, based on the ML estimates, we calculate the Wishart density for the sample covariance matrix of phenotypes only, taking the model-implied covariance of phenotypes as the mean parameter of the distribution. However, in the mtmlSEM model, this assumption is inevitably violated because SNPs take only discrete values, for instance, {0, 1, 2}, in the additive model.
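The Wishart density step mentioned above can be illustrated with a minimal sketch. It assumes that the sample covariance matrix S from n observations is treated as Wishart-distributed with n - 1 degrees of freedom and scale Sigma / (n - 1), so that its mean equals the model-implied covariance Sigma. The function names and this parameterization are illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np

def wishart_logpdf(S, df, scale):
    """Log-density of a Wishart(df, scale) distribution evaluated at S."""
    p = S.shape[0]
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_V = np.linalg.slogdet(scale)
    # trace(scale^{-1} S) without forming the inverse explicitly
    trace_term = np.trace(np.linalg.solve(scale, S))
    # log of the multivariate gamma function Gamma_p(df / 2)
    log_mv_gamma = (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(df / 2 + (1 - j) / 2) for j in range(1, p + 1)
    )
    return (
        (df - p - 1) / 2 * logdet_S
        - trace_term / 2
        - df * p / 2 * math.log(2)
        - df / 2 * logdet_V
        - log_mv_gamma
    )

def covariance_log_density(S, sigma_model, n_samples):
    """Score a model-implied covariance against a sample covariance S.

    Assumption: S ~ Wishart(n - 1, sigma_model / (n - 1)), whose mean is sigma_model.
    """
    df = n_samples - 1
    return wishart_logpdf(S, df, sigma_model / df)
```

Under these assumptions, candidate SNPs could be ranked by the value returned from `covariance_log_density`, with a higher value indicating that the model-implied covariance lies closer to the observed one.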
Moreover, the ordinal scale is often used for measurements of phenotypic traits. Measures  1        2        3        1        2        3   Wang Y, Fang Y, Jin M. A ridge penalized principal-components approach based on heritability for high-dimensional data. Analyzing association mapping in pedigree-based GWAS using a penalized multitrait mixed model. CFA model is typically empirically converge or agree.   T2M3                            x                 x the measures from 1 to tm, such that method is fastest moving. Aulchenko YS, de Koning D-J, Haley C. Genomewide rapid association using mixed model and regression: a fast and simple method for Genomewide pedigree-based quantitative trait loci association analysis. The correlation between two traits (D and F) with Examples of the genome-wide multi-trait SEM model. Due to these correlations, significant SNPs are frequently associated with several phenotypes, i.e., they are pleiotropic. set, just load on the last set. Yi N, Xu S. Bayesian LASSO for quantitative trait loci mapping. The abbreviation COM stands for the combination of structurally different and interchangeable methods. method factors are assumed to be independent. be problematic. https://doi.org/10.1155/2012/652569. different-method correlations are in bold ("validity diagonals"). Cai X, Bazerque JA, Giannakis GB. The classical multitrait-multimethod (MTMM) matrix can be viewed as a two-dimensional cross-classification of traits and methods. The authors declare that they have no competing interests. wild estimates and huge Genet Epidemiol. Bayesian regression methods that incorporate different mixture priors for marker effects are used in multi-trait genomic prediction. We next tested the utility of the models to predict associations between SNPs and phenotypes. correlations will give the method correlations to establish method similarity.
While these models have identification model assumes that the correlation between two variables is NOT an additive K.  As this done for each method there D. T., & O'Connell, E. J. The third factor reflects joint variation in the color of different plant parts. or Direct Product Model. five of the measures have non-significant error variances. However, a biological interpretation of the connections may be that the relationships between factors related to productivity and plant color reflect selection on market class: desi chickpeas have a small dark seed, while kabuli have large lightly colored seeds [39]. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume-21-supplement-8. multitrait-multimethod matrix. Estimation Structural equation modeling (SEM) allows researchers to explicitly characterize the causal structure among the variables and to decompose the effects into direct, indirect, and total effects. For all model types, the accuracy of trait prediction is good for plant height, some traits related to productivity, and all traits related to plant color (Table 2, Additional File 2). The minimum effective sample size for a parameter was 83 and the mean and median effective sample sizes across all parameters and models were 3193 and 3304, respectively. The height of a peak reflects the number of models having at least one SNP within the window corresponding to the peak. Distributions of the data after preparation. 2018;50:229–37. D. T., & Fiske, D. W. (1959). Nat Genet. methods. loading structure is as follows: To consider ordinal variables as normally distributed, we substituted sample covariances between ordinal variables with polychoric correlations and between ordinal and continuous variables with polyserial correlations (see section Ordinal variables). Some of these traits are categorical and others are quantitative.
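The decomposition of effects into direct, indirect, and total parts mentioned above can be sketched for a linear recursive SEM of the form eta = B eta + G x + eps. The matrix name `G` and the toy coefficients are illustrative assumptions. In this form the total effect of x on eta is (I - B)^{-1} G, the direct effect is G, and the indirect effect is their difference.

```python
import numpy as np

def decompose_effects(B, G):
    """Return (direct, indirect, total) effects of x on eta.

    Assumes eta = B @ eta + G @ x + eps with (I - B) invertible.
    """
    identity = np.eye(B.shape[0])
    total = np.linalg.solve(identity - B, G)  # (I - B)^{-1} G
    direct = G
    indirect = total - direct
    return direct, indirect, total

# Toy example (hypothetical numbers): one SNP x, two variables eta1 and eta2.
# x -> eta1 (0.5), x -> eta2 (0.1), eta1 -> eta2 (0.4)
B = np.array([[0.0, 0.0],
              [0.4, 0.0]])
G = np.array([[0.5],
              [0.1]])
direct, indirect, total = decompose_effects(B, G)
```

In the toy example the indirect effect of the SNP on the second variable is the product of the two path coefficients along the chain, 0.5 * 0.4.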
In our model, we incorporated techniques to cope with ordinal data – polychoric and polyserial correlations – that provide a correct analysis of genetic variants and traits. 2. measures of the same trait should be strong (Same-trait, different-method Mount example, the trait correlations are rAF = .451, rAC = .109, and rFC = .487 and the correlations between methods rSupSel = .510, rSupSub = .273, and rSelSub = .346. variance was not constant (same value added to every correlation), but rather traits and methods correlated (Kenny & Kashy, 1992), loadings See below? three traits and methods for this approach to be identified. This is necessary because adding SNPs increases the number of parameters, which makes further ML estimation unstable. range of water regimes in the Mediterranean Basin and other locations. SEM models have also been applied in association studies in both multi-trait and multi-locus designs. Moreover, SNP effects can be differentiated between direct and indirect. Taiz L, Zeiger E. Plant physiology. Several software packages exist for fitting structural equation models. 2018;11. https://www.frontiersin.org/articles/10.3389/fnmol.2018.00192/full. different-method correlations should not be too high, especially relative to inputted as data), most method “variance” for the subordinate, Multiplicative This problem can be solved by applying the Bayesian approach, which uses prior information about model parameters. To determine the number of factors, we applied parallel analysis [43]. (1)) and fixed all parameter values in B and Λ matrices. GWAS often relies on data with a number of highly correlated phenotypic traits. To take into account these variances, we built extended models for each training set. Hierarchical confirmatory factor analysis multi-trait multi-method approach (HCFA MTMM) was used with data from 2,334 UK adolescents, both smokers and non-smokers. 1.
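Parallel analysis, which the text above uses to choose the number of factors, can be sketched as follows. This is a generic version of Horn's procedure under stated assumptions (Pearson correlations, normal random reference data, a 95% quantile threshold), not necessarily the variant used in the study. The function name and defaults are illustrative.

```python
import numpy as np

def parallel_analysis(data, n_iter=200, quantile=0.95, seed=0):
    """Estimate the number of factors by comparing eigenvalues with random data.

    data: array of shape (n_samples, n_variables).
    Returns (n_factors, observed_eigenvalues, random_thresholds).
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed correlation matrix, largest first
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        noise = rng.standard_normal((n, p))
        eigs = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        random_eigs[i] = np.sort(eigs)[::-1]
    threshold = np.quantile(random_eigs, quantile, axis=0)
    exceeds = observed > threshold
    # Count the leading eigenvalues that exceed their random counterparts
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return n_factors, observed, threshold
```

With ordinal traits, the Pearson correlation matrix in this sketch would be replaced by the polychoric and polyserial correlations described in the text.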
Yellow-coloured traits are categorical traits that were transformed; orange-coloured traits are non-categorical and were log-transformed.   T3M3                                                                 model was originally proposed by Campbell & O'Connell who found that method measure load on its trait and method factors. Assume there are t To obtain the positions of parameters in the B matrix, we iteratively add them one by one until a stopping criterion is met. Heredity (Edinb). Many phenotypic traits in this dataset are correlated and therefore single-trait GWAS inferences can be biased. The chickpea dataset (Cicer arietinum L.) consists of 404 accessions from the Vavilov Institute of Plant Genetic Resources (VIR) seed bank. As a result, we obtained the measurement part of the model (1), which is a set of latent factors that influence the subsets of phenotypic traits: where Λ is a sparse matrix. Stat Sci. March 18, 2012 The standard confirmatory factor analysis model of the MTMM is to have each The fifth reflects joint variation of traits related to plant architecture, in particular, plant height and height of the lower pod attachment. matrix was originally proposed by Donald T. Campbell and Donald Fiske (1959). To apply MTMM designs, researchers assess multiple traits (i.e., psychological constructs) for a group of individuals using multiple methods that are maximally different. Absolute values of correlations between phenotypic traits. We applied the model to Vavilov’s collection of 404 chickpea (Cicer arietinum L.) accessions with 20-fold cross-validation. The loadings for first trait are all fixed to the same value (a in the cases. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. measure. https://doi.org/10.1159/000022854.
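The standard CFA specification for an MTMM design, in which each measure loads on one trait factor and one method factor, can be sketched as a pattern matrix. The sketch assumes t traits and m methods with measures ordered so that method is the fastest-moving index (T1M1, T1M2, ..., TtMm). The function name is illustrative, and a 1 marks a free loading (an "x" in the loading tables) while a 0 marks a loading fixed to zero.

```python
import numpy as np

def mtmm_loading_pattern(t, m):
    """Pattern of free loadings for the standard CFA model of an MTMM matrix.

    Rows: t * m measures, method fastest moving.
    Columns: t trait factors followed by m method factors.
    """
    pattern = np.zeros((t * m, t + m), dtype=int)
    for trait in range(t):
        for method in range(m):
            row = trait * m + method
            pattern[row, trait] = 1       # loading on the trait factor
            pattern[row, t + method] = 1  # loading on the method factor
    return pattern

# Three traits by three methods gives nine measures and six factors.
pattern = mtmm_loading_pattern(3, 3)
```

For example, the row for T2M3 has free loadings only on the second trait factor and the third method factor.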
Lippert C, Casale F, Rakitsch B, Stegle O. LIMIX: genetic analysis of multiple traits. Random-effects models for longitudinal data. multitrait-multimethod data: A comparison of alternative models. 2012;44:821–4. and S.V.N.   T3M1   The sample covariance matrix of all observed variables for both phenotypic traits and SNPs follows the Wishart distribution with the mean equal to model-implied covariance matrix (see Additional File 3). x       x       x At the second step, the parameter estimates are obtained with MCMC (Gibbs sampling) after the Bayesian inference of posterior distributions for parameters. https://doi.org/10.1002/gepi.21975. In comparison with the existing multi-trait single-locus GWAS software package GEMMA (Zhou and Stephens 2014), GW-SEM provides more accurate estimates of associations; however, GEMMA is almost three times faster than GW-SEM. https://doi.org/10.2202/1544-6115.1067. We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. A measurement method should discriminate between different traits. and method-method correlations zero, Convergent Validity: size of the trait Igolkina AA, Armoskus C, Newman JRB, Evgrafov OV, McIntyre LM, Nuzhdin SV, et al. Moreover, these methods do not distinguish trait-specific and pleiotropic variants. SNPs in the structural part, g, describe a part of phenotypic variance, which is common for several traits. (.422), and Consid (.610), by method: Sup (.601), Self (.648), For example, SEM has been used to explore alterations in gene networks in diseases [29, 30], to provide a quantitative map of relationships between traits and disease [31], and to infer gene regulatory networks involving several hundred genes and eQTLs [32, 33]. variance-covariance matrix would be as follows: Goudet J, Kay T, Weir BS. x      x       x  converge.
The iterations continued until the log-likelihood value stopped decreasing. Nat Genet. Methods for meta-analysis of multiple traits using GWAS summary statistics. 2. 98% of the time, No real method factors and so method variance difficult to To alleviate the latter challenge, multi-trait models have been proposed [1, 2]. The phenotype data were further transformed in two ways. D. A., & Kashy, D. A. We identified latent variables influencing phenotypic traits by applying factor analysis (FA). BMC Genomics 5th ed. Suppose for a given data set the proportions of these values are {f1, f2, …fn}, respectively. First, in case of a large number of traits and variants, the model potentially belongs to the “large p, small n” class, so that the standard maximum likelihood (ML) method for estimating parameters in SEM models is limited due to the parameter identification criteria. consider the influence of multiple genetic variants on several correlated phenotypes. The multitrait–multimethod (MTMM) matrix contains the correlations between variables when each variable represents a trait–method unit, that is, the measurement of a trait (e.g., extroversion, neuroticism) by a specific method (e.g., self-report, peer report). It was developed in 1959 by Campbell and Fiske (Campbell, D. and Fiske, D., 1959). To obtain parameter estimates for each of the 80 models (4 model types and 20 training sets), we performed five Gibbs sampling chains of length 2000 and checked several diagnostics with tools in the coda CRAN package. Method factors in multitrait-multimethod matrices:  Multiplicative rather than additive? F Crop Res. Subordinate multimethod measurement. 2018;63. https://link.springer.com/article/10.1134/S0006350918020100. different-method correlations. Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D. de los Campos G, et al. Campbell, These limitations explain the sparsity of studies conducting SEM analyses in a genome-wide context.
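Convergence of several Gibbs sampling chains, as checked in the text above with diagnostics from the coda package, is commonly assessed with the Gelman-Rubin potential scale reduction factor. The following is a minimal sketch of that statistic for a single parameter. It is not coda's implementation, and the text does not state which diagnostics were used, so treat the choice of statistic as an assumption.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    chains: array of shape (n_chains, n_draws) with post-burn-in samples.
    Values close to 1 suggest that the chains agree with each other.
    """
    chains = np.asarray(chains, dtype=float)
    _, n_draws = chains.shape
    chain_means = chains.mean(axis=1)
    between = n_draws * chain_means.var(ddof=1)   # between-chain variance B
    within = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance W
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(pooled / within))

# Example with synthetic draws: five well-mixed chains of length 2000.
rng = np.random.default_rng(0)
mixed = rng.standard_normal((5, 2000))
rhat_mixed = gelman_rubin(mixed)
```

Chains that have settled around different values inflate the between-chain variance and push the statistic well above 1.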
This automatic algorithm for selecting SNPs was implemented using the tools of the semopy [44] Python package. Before SNPs were incorporated into the model, we estimated parameters for the constructed LISREL part of the model (Eq. Selecting a SNP for a variable, whether it is a latent factor or phenotype, consisted of three steps. communality of measure F2, rDF -- the correlation between Let the vector of phenotypes p be split into two parts: continuous traits, u, modelled as normally distributed, and discrete phenotypes, v, measured on an ordinal scale. Sokolkova AB, Chang PL, Carrasquila-Garcia N, Nuzhdina NV, Cook DR, Nuzhdin SV, et al. Here's an article which does an MTMM for comorbidity of child psychiatric disorders. Wright S. On the nature of size factors. The larger number of SNPs in connected models as compared with zero models can be explained by the essential difference between SNPs attributed to these model types. underidentified. Methodology was developed by A.A.I; data analysis and visualization were performed by A.A.I. Secondly, several quantitative traits were log-transformed to satisfy the assumption of normality (Fig.           F       .20    .26    .18    .33    1.00 and M.V.G. 1991;6:15–32. The number of SNPs in connected extended models varied from 223 to 256; in zero extended models, this number was in the range from 218 to 242. estimation. https://doi.org/10.1186/s12864-020-06833-2. correlated, as well as the method factors. Confirmatory factor analysis of Convergent and discriminant validation by the loading structure is as follows: x                                              x. where Therefore, in connected models, SNPs describe a more complex variance-covariance structure and, as a result, a larger number of SNPs is required to estimate it. https://doi.org/10.1111/mec.14833. 3.
Therefore, to obtain statistically reliable markers and to understand the causal relationships between traits and variants, the mtmlSEM model developed here was applied to this dataset. factor and no “x” implies a zero loading. Closer inspection of the table showed that the connected base model outperformed the zero base model for 9 phenotypic traits, the opposite situation was observed for 5 traits, and predictions for the remaining 2 traits were nearly equal. Therefore, the current SEM-based models for genotype-phenotype associations can be improved to address these drawbacks. Multi-trait analysis of genome-wide association summary statistics using MTAG. trait and methods factors uncorrelated (Wothke, 1984), Equal loadings, volume 21, Article number: 490 (2020) This model is identical to the Standard CFA Model, but the method factors are The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. We initiated each chain with random values, and, at each iteration of the sampler, we draw. A set of t traits are each measured by m methods. Liu J, Yang C, Shi X, Li C, Huang J, Zhao H, et al. d.  Fix the correlations between the “same” K factors, i.e., between the Recently several multivariate methods have … Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. We first automatically introduced SNPs for each latent variable (vector g in Eq.
At the third step, we sort all SNPs according to the calculated densities and put the top SNP into the model, fixing the corresponding parameter in the Π or K matrix to its ML estimate. method factors are assumed to be independent. no standard for "good" results, not very precise (e.g., no ): The Guilford Press; 2011. The model does not contain an intercept term because latent variables are assumed to have mean zero. Genomic prediction methods not only search for trait-variant associations but also validate them by demonstrating their predictive ability. Until recently, this model could use only a pair of correlated traits at a time due to the computational intensity [4]. However, the standard Wang D, Eskridge KM, Crossa J. Identifying QTLs and epistasis in structured plant populations using adaptive mixed LASSO. 2007;64:182–91.   T1M1   CFA model for the MTMM is not empirically identified for two very important These studies have gained popularity and enjoy practical application in agriculture, specifically, in estimating individual breeding values and selecting breeding lines [15]. https://doi.org/10.1093/bioinformatics/btp041. For the last Another challenge in association studies is to develop a powerful multi-locus model. H., & Bailey, M. (1991). 3. Usually, the trait and Genetics. The traits factors are resulting data are tm measures, and the correlation matrix is called a multitrait-multimethod matrix. The associations revealed with the mtmlSEM model and in standard GWAS analysis are consistent, and the differences observed arise from the exclusion of correlated SNPs from the mtmlSEM models and because mtmlSEM models consider individual and pleiotropic effects of SNPs separately.
Campbell, Identification Issues with Standard CFA Model, The standard model This will give the trait Despite the broad spectrum of multi-trait and multi-locus models in GWAS and trait prediction studies, only a few of them simultaneously incorporate correlated traits and several associated variants [21,22,23,24,25].           A       .32   .17     .20    .27    .26    -.02   1.00 Biophys (Russian Fed). 2018;17:117693511877510. https://doi.org/10.1177/1176935118775103. We would like to thank Katrina Sherbina for the careful proofreading. 2017;7:170125. https://doi.org/10.1098/rsob.170125. Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Due to the ‘large p (number of SNPs), small n (sample size)’ problem, many multi-locus models are based on regularization/penalized techniques: LASSO [10], Elastic Net [11], Bayesian LASSO [12], adaptive mixed LASSO [13]. The number of SNPs in the connected base models constructed for 20 training sets varied from 52 to 62; for zero base models, this number was in the range from 36 to 46. Genotyping by sequencing (GBS) of chickpea accessions identified 56,855 segregating single nucleotide polymorphisms (SNPs). Multitrait–multimethod (MTMM) designs refer to a construct validation approach proposed by Campbell and Fiske in 1959. Multiple-trait genome-wide association study based on principal component analysis for residual covariance matrix. However, single-locus approaches may lead to biased estimates due to multiple testing correction, and they are not suitable in the common case of genetically correlated traits. However, the applicability of mtmlSEM models in genomic selection studies requires further investigation.