Statistics

PCA

limix.stats.pca(X, ncomp)[source]

Principal component analysis.

Parameters:
  • X (array_like) – data.
  • ncomp (int) – number of components.
Returns:

dict containing:
  • components (array_like): first components ordered by explained variance.
  • explained_variance (array_like): explained variance.
  • explained_variance_ratio (array_like): percentage of variance explained.

Return type:

dict

Examples

>>> from numpy.random import RandomState
>>> from limix.stats import pca
>>> from numpy import set_printoptions
>>> set_printoptions(4)
>>>
>>> X = RandomState(1).randn(4, 5)
>>> result = pca(X, ncomp=2)
>>> print(result['components'])
[[-0.7502  0.5835 -0.0797  0.1957 -0.2285]
 [ 0.4884  0.7227  0.0197 -0.4616 -0.1603]]
>>> print(result['explained_variance'])
[ 4.8349  0.3859]
>>> print(result['explained_variance_ratio'])
[ 0.9205  0.0735]

Normalization

limix.stats.boxcox(X)[source]

Gaussianize X using the Box-Cox transformation.

Each phentoype is brought to a positive schale by first subtracting the minimum value and adding 1. Then each phenotype is transformed by the Box-Cox transformation.

Parameters:X (array_like) – samples by phenotypes.
Returns:Box-Cox power transformed array.
Return type:array_like

Examples

>>> from numpy.random import RandomState
>>> from limix.stats import boxcox
>>> from numpy import set_printoptions
>>> set_printoptions(4)
>>>
>>> random = RandomState(0)
>>> X = random.randn(5, 2)
>>>
>>> print(boxcox(X))
[[ 2.7136  0.9544]
 [ 1.3844  1.6946]
 [ 2.9066  0.    ]
 [ 1.3407  0.644 ]
 [ 0.      0.9597]]

P-values

limix.stats.qvalues(pv, m=None, return_pi0=False, lowmem=False, pi0=None, fix_lambda=None)[source]

FDR estimation using Benjamini-Hochberg and Stories methods

Parameters:
  • pv (array_like) – pvalues.
  • m (int) – total number of tests. The default value is len(pv).
  • return_pi0 (bool) – if True, also returns the estimated fraction of null p-values.
  • lowmem (bool) – employs low-memory implementation that only uses one pv and one qv matrix.
  • pi0 – fraction of true null.
Returns:

tuple containing:
  • qv (array_like): estimated qvalues
  • pi0 (float): estimated fraction of null p-values. Only provided if return_counts is True.

Return type:

(tuple)

Examples

>>> from numpy.random import RandomState
>>> from limix.stats import qvalues
>>> from numpy import set_printoptions
>>> set_printoptions(4)
>>>
>>> pv = RandomState(1).rand(1000)
>>> qv, pi0 = qvalues(pv, return_pi0 = True)
>>> print(qv[:4])
[ 0.9874  0.9874  0.1144  0.9874]
>>> print('%.2f' % pi0)
1.00
limix.stats.empirical_pvalues(xt, x0)[source]

Function to compute empirical P values.

Compute empirical P values from the test statistics observed on the data and the null test statistics (from permutations, parametric bootstraps, etc).

Parameters:
  • xt (array-like) – test statistcs observed on data.
  • x0 (array-like) – null test statistcs. The minimum P value that can be estimated is 1./float(len(x0)).
Returns:

estimated empirical P values.

Return type:

pv (ndarray)

Example

>>> from numpy.random import RandomState
>>> from limix.stats import empirical_pvalues
>>> from numpy import set_printoptions
>>> set_printoptions(4)
>>> random = RandomState(1)
>>>
>>> xt = random.chisquare(1, 1000)
>>> x0 = random.chisquare(1, 10000)
>>>
>>> pv = empirical_pvalues(xt, x0)
>>>
>>> print(pv.shape)
(1000,)
>>>
>>> print(pv[:4])
[ 0.5599  1.      0.8389  0.7975]

Kinship processing

limix.stats.linear_kinship(G, out=None, verbose=True)[source]

Estimate Kinship matrix via linear kernel.

Examples

>>> from numpy.random import RandomState
>>> from limix.stats import linear_kinship
>>>
>>> random = RandomState(1)
>>> X = random.randn(4, 100)
>>> K = linear_kinship(X, verbose=False)
>>> print(K)
[[ 0.9131 -0.1928 -0.3413 -0.379 ]
 [-0.1928  0.8989 -0.2356 -0.4704]
 [-0.3413 -0.2356  0.9578 -0.3808]
 [-0.379  -0.4704 -0.3808  1.2302]]
limix.stats.gower_norm(K, out=None)[source]

Perform Gower rescaling of covariance matrix K.

The rescaled covariance matrix has sample variance of 1.

Examples

>>> from numpy.random import RandomState
>>> from limix.stats import gower_norm
>>> import scipy as sp
>>>
>>> random = RandomState(1)
>>> X = random.randn(4, 4)
>>> K = sp.dot(X,X.T)
>>> Z = random.multivariate_normal(sp.zeros(4), K, 50)
>>> print("%.3f" % sp.mean(Z.var(1,ddof=1)))
2.335
>>> Kn = gower_norm(K)
>>> Zn = random.multivariate_normal(sp.zeros(4), Kn, 50)
>>> print("%.3f" % sp.mean(Zn.var(1, ddof=1)))
0.972
limix.stats.indep_pairwise(X, window_size, step_size, threshold, verbose=True)[source]

Determine pair-wise independent variants.

Independent variants are defined via squared Pearson correlations between pairs of variants inside a sliding window.

Parameters:
  • X (array_like) – Sample by variants matrix.
  • window_size (int) – Number of variants inside each window.
  • step_size (int) – Number of variants the sliding window skips.
  • threshold (float) – Squared Pearson correlation threshold for independence.
  • verbose (bool) – True for progress information; False otherwise.
Returns:

ok

Return type:

boolean array defining independent variants

Examples

>>> from numpy.random import RandomState
>>> from limix.stats import indep_pairwise
>>>
>>> random = RandomState(0)
>>> X = random.randn(10, 20)
>>>
>>> indep_pairwise(X, 4, 2, 0.5, verbose=False)
array([ True,  True, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True], dtype=bool)
limix.stats.maf(X)[source]

Compute minor allele frequencies.

It assumes that X encodes 0, 1, and 2 representing the number of alleles.

Parameters:X (array_like) – Genotype matrix.
Returns:minor allele frequencies.
Return type:array_like

Examples

>>> from numpy.random import RandomState
>>> from limix.stats import maf
>>>
>>> random = RandomState(0)
>>> X = random.randint(0, 3, size=(100, 10))
>>>
>>> print(maf(X))
[ 0.49   0.49   0.445  0.495  0.5    0.45   0.48   0.48   0.47   0.435]

Chi2

limix.stats.Chi2mixture(scale_min=0.1, scale_max=5.0, dof_min=0.1, dof_max=5.0, n_intervals=100, qmax=0.1, tol=0)[source]

A class for continuous random variable following a chi2 mixture

Class for evaluation of P values for a test statistic that follows a two-component mixture of chi2

\[(1-\pi)\chi^2(0) + \pi a \chi^2(d).\]

Here \(\pi\) is the probability being in the first component and \(a\) and \(d\) are the scale parameter and the number of degrees of freedom of the second component.

Parameters:
  • scale_min (float) – minimum value used for fitting the scale parameter
  • scale_max (float) – maximum value used for fitting the scale parameter
  • dofmin (float) – minimum value used for fitting the dof parameter
  • dofmax (float) – maximum value used for fitting the dof parameter
  • qmax (float) – only the top qmax quantile is used for the fit
  • n_interval (int) – number of intervals when performing gridsearch
  • tol (float) – tolerance of being zero

Examples

>>> from numpy.random import RandomState
>>> import scipy as sp
>>> from limix.stats import Chi2mixture
>>>
>>> scale = 0.3
>>> dof = 2
>>> mixture = 0.2
>>> n = 100
>>>
>>> random = RandomState(1)
>>> x =  random.chisquare(dof, n)
>>> n0 = int( (1-mixture) * n)
>>> idxs = random.choice(n, n0, replace=False)
>>> x[idxs] = 0
>>>
>>> chi2mix = Chi2mixture(scale_min=0.1, scale_max=5.0,
...                       dof_min=0.1, dof_max=5.0,
...                       qmax=0.1, tol=4e-3)
>>> chi2mix.estimate_chi2mixture(x)
>>> pv = chi2mix.sf(x)
>>> print(pv[:4])
[ 0.2  0.2  0.2  0.2]
>>>
>>> print('%.2f' % chi2mix.scale)
1.98
>>> print('%.2f' % chi2mix.dof)
0.89
>>> print('%.2f' % chi2mix.mixture)
0.20