Multi Trait Set Test (mtSet)¶
Set tests are a powerful approach for genomewide association testing between
groups of genetic variants and quantitative traits.
Here we implement MTSet
, a mixedmodel approach that enables joint
analysis across multiple correlated traits while accounting for population
structure and relatedness.
MTSet
effectively combines the benefits of set tests with multitrait
modeling and is computationally efficient, enabling genetic analysis of large
cohorts (up to 500,000 individuals) and multiple traits.
A detailed description of the method can be found at [CRS15].
References
[CRS15]  (1, 2) Casale FP, Rakitsch B, Lippert C, Stegle O (2015) Efficient set tests for the genetic analysis of correlated traits. Nature Methods, Vol. 12, No. 8. (15 June 2015), pp. 755758. 
Public interface¶

class
limix.mtset.
MTSet
(Y=None, R=None, S_R=None, U_R=None, traitID=None, F=None, rank=1)[source]¶ Multitrait set tests (mtSet).
Mixedmodel approach that enables joint analysis across multiple correlated traits while accounting for population structure and relatedness [CRS15].
Parameters:  Y ((N, P) ndarray) – Individuals by phenotypes.
 F ((N, K) ndarray, optional) – Individuals by covariates. By default, no fixed effect term is set.
 R ((N, N) ndarray, optional) – LMMcovariance/kinship coefficients.
U_R
andS_R
can be provided instead ofR
. If neitherR
norU_R
andS_R
are provided, the null models has iid normal residuals.  U_R ((N, N) ndarray, optional) – Eigenvectors of
R
.U_R
andS_R
can be provided instead ofR
. If neitherR
norU_R
andS_R
are provided, iid normal residuals are considered.  S_R ((N, ) ndarray, optional) – Eigenvalues of
R
.U_R
andS_R
can be provided instead ofR
. If neitherR
norU_R
andS_R
are provided, iid normal residuals are considered.  traitID (ndarray, optional) – Phenotype IDs.
 rank (int, optional) – Rank of the trait set covariance. The default value is 1.
Examples
This example shows how to fit mtSet when modelling population structure/relatedness using the full genetic relatedness matrix.
>>> from numpy.random import RandomState >>> from limix.mtset import MTSet >>> from numpy import dot, eye, ones, concatenate >>> from numpy import set_printoptions >>> set_printoptions(4) >>> >>> random = RandomState(1) >>> >>> N = 100 >>> P = 2 >>> >>> pheno = random.randn(N, P) >>> pheno= pheno.mean(0) >>> pheno/= pheno.std(0) >>> >>> W = random.randn(N, 10) >>> kinship = dot(W, W.T) / float(10) >>> kinship+= 1e4 * eye(N) >>> >>> mtset = MTSet(pheno, R=kinship) >>> res_null = mtset.fitNull() >>> >>> print(res_null['conv'][0]) True >>> print('%.2f'%res_null['NLL0']) 98.43 >>> >>> S = 4 >>> snp_set = (random.rand(N, S) < 0.2).astype(float) >>> res = mtset.optimize(snp_set) >>> >>> print("%.4f" % res['LLR'][0]) 0.0616 >>> print(res['Cr']) [[ 0.0006 0.0025] [ 0.0025 0.0113]]
This example shows how to fit mtSet when modelling population structure/relatedness by introducing the top principle components of the genetic relatedness matrix (
pc_rrm
) as fixed effects.>>> random = RandomState(0) >>> >>> mean = ones((N, 1)) >>> pc_rrm = random.randn(N, 4) >>> covs = concatenate([mean, pc_rrm], 1) >>> >>> mtset = MTSet(pheno, F=covs) >>> res_null = mtset.fitNull() >>> >>> print(res_null['conv'][0]) True >>> print('%.2f'%res_null['NLL0']) 118.36 >>> >>> res = mtset.optimize(snp_set) >>> >>> print("%.4f" % res['LLR'][0]) 0.1373 >>> print(res['Cr']) [[ 0.0002 0.0019] [ 0.0019 0.0207]]

fitNull
(cache=False, out_dir=’./cache’, fname=None, rewrite=False, seed=None, factr=1000.0, n_times=10, init_method=None, verbose=False)[source]¶ Fit null model
Parameters:  () (verbose) –
 cache (bool, optional) –
If False (default), the null model is fitted and the results are not cached. If True, the cache is activated. The cache file dir and name can be specified using
hcache
andfname
. Whencache=True
, we distinguish the following cases: if the specified file does not exist, the output of the null model fiting is cached in the file.
 if the specified file exists and
rewrite=True
, the cache file is overwritten.  if the specified file exists and
rewrite=False
, the results from the cache file are imported (the null model is not refitted).
 out_dir (str, optional) – output dir of the cache file. The default value is “./cache”.
 fname (str, optional) – Name of the cache hdf5 file.
It must be specified if
cache=True
.  rewrite (bool, optional) – It has effect only if cache cache=True`.
In this case, if
True
, the cache file is overwritten in case it exists. The default value isFalse
 factr (float, optional) – optimization paramenter that determines the accuracy of the solution. By default it is 1000. (see scipy.optimize.fmin_l_bfgs_b for more details).
 verbose (bool, optional) – verbose flag.
Returns:  dictionary containing:
 B (ndarray): estimated effect sizes (null);
 Cg (ndarray): estimated relatedness trait covariance (null);
 Cn (ndarray): estimated genetic noise covariance (null);
 conv (bool): convergence indicator;
 NLL0 (ndarray): negative loglikelihood (NLL) of the null model;
 LMLgrad (ndarray): norm of the gradient of the NLL.
 time (time): elapsed time (in seconds).
Return type: (dict)

fitNullTraitByTrait
(verbose=False, cache=False, out_dir=’./cache’, fname=None, rewrite=False)[source]¶ Fit null model trait by trait

optimize
(G, params0=None, n_times=10, vmax=5, factr=10000000.0, verbose=False)[source]¶ Fit alternative model
Parameters:  G (ndarray) – (N, S) genotype values for N samples and S variants (defines the set component)
 params0 (ndarray, optional) – initial parameter values for optimization. By default the parameters are initialized by randonmly pertumrning the maximum likelihood estimators (MLE) under the null.
 n_times (ndarray, optional) – maximum number of restarts in case the optimization does not
converge.
It has no effect if
params0
is provided.  vmax (float, option) – maximum value for variance parameters that is accepted.
If any of the variance MLE is
>vmax
then we say that the optimization has not converged. Default value is 5.  factr (float, optional) – optimization paramenter that determines the accuracy of the solution. By default it is 1e7. (see scipy.optimize.fmin_l_bfgs_b for more details).
 verbose (bool, optional) – verbose flag.
Returns:  dictionary containing:
 Cr (ndarray): estimated set trait covariance (alt);
 Cg (ndarray): estimated relatedness trait covariance (alt);
 Cn (ndarray): estimated genetic noise covariance (alt);
 conv (ndarray): convergence indicator;
 NLLAlt (ndarray): negative loglikelihood (NLL) of the alternative model;
 LMLgrad (ndarray): norm of the gradient of the NLL.
 LLR (ndarray*): log likelihood ratio statistics;
 var (ndarray): (P, K) ndarray of variance estimates
for the
P
traits and theK
 time (ndarray): elapsed time (in seconds).
 nit (ndarray): number of iterations LBFGS. (see scipy.optimize.fmin_l_bfgs_b for more details).
Return type: (dict)