Utility functions

Public interface

limix.util.sets_from_bim(bim, size=50000, step=None, chrom=None, minSnps=1, maxSnps=None)

Builds a dataframe of variant-sets from a bim dataframe using a sliding-window approach.

Parameters:
  • bim (pandas.DataFrame) – bim dataframe, as returned by limix.io.read_plink.
  • size (int, optional) – set size (in base pairs). The default value is 50000.
  • step (int, optional) – sliding-window step. The default value is size/2.
  • chrom (int, optional) – restricts the sets to a single chromosome. By default, genome-wide sets are built.
  • minSnps (int, optional) – minimum number of variants. Sets with fewer than minSnps variants are excluded. The default value is 1.
  • maxSnps (int, optional) – maximum number of variants. Default value is numpy.inf.
Returns:

dataframe of variant-sets. It has columns:

  • "setid": window id
  • "chrom": chromosome
  • "start": start position
  • "end": end position
  • "nsnps": number of variants in the region

Return type:

pandas.DataFrame
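
The sliding-window construction described above can be sketched in plain pandas. This is a minimal illustration, not limix's implementation: the helper name sliding_window_sets, the "chrom" and "pos" column names of the bim dataframe, the half-open window convention, and the setid naming scheme are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def sliding_window_sets(bim, size=50000, step=None, min_snps=1):
    """Hypothetical sliding-window set builder (not the limix code)."""
    if step is None:
        step = size // 2  # mirrors the documented default of size/2
    rows = []
    for chrom, group in bim.groupby("chrom"):
        pos = np.sort(group["pos"].to_numpy())
        # Slide the window start from the first to the last variant.
        for start in range(int(pos[0]), int(pos[-1]) + 1, step):
            end = start + size  # assumption: half-open window [start, end)
            nsnps = int(np.searchsorted(pos, end) - np.searchsorted(pos, start))
            if nsnps >= min_snps:
                rows.append((chrom, start, end, nsnps))
    sets = pd.DataFrame(rows, columns=["chrom", "start", "end", "nsnps"])
    # Assumption: set ids are simple running labels.
    sets.insert(0, "setid", ["set_%d" % i for i in range(len(sets))])
    return sets


# Toy bim-like dataframe with only the two columns the sketch needs.
bim = pd.DataFrame({
    "chrom": [1, 1, 1, 1, 2, 2],
    "pos": [100, 150, 400, 900, 50, 60],
})
sets = sliding_window_sets(bim, size=200, step=100)
print(sets)
```

Because step is half of size here, consecutive windows overlap, so a variant can be counted in more than one set; windows with no variants are dropped by the min_snps filter.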

limix.util.annotate_sets(sets, bim, minSnps=1, maxSnps=None)

Helper function to annotate and filter variant-sets.

Given the variant-sets to consider and the bim of the bed file to analyse, it computes the number of variants within each set, filters the sets accordingly, and returns a variant-set dataframe with all the information.

Parameters:
  • sets (pandas.DataFrame) –

    dataframe defining the variant-sets. It should contain the columns:

    • "setid": set id
    • "chrom": chromosome
    • "start": start position
    • "end": end position
  • bim (pandas.DataFrame) – bim dataframe, as returned by limix.io.read_plink.
  • minSnps (int, optional) – minimum number of variants. Sets with fewer than minSnps variants are excluded. The default value is 1.
  • maxSnps (int, optional) – maximum number of variants. Default value is numpy.inf.
Returns:

dataframe of variant-sets. It has columns:

  • "setid": set id
  • "chrom": chromosome
  • "start": start position
  • "end": end position
  • "nsnps": number of variants in the region

Return type:

pandas.DataFrame
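
The annotate-and-filter step can be illustrated with a small pandas sketch. It is not the limix code: the helper name count_and_filter_sets, the "chrom" and "pos" column names of the bim dataframe, and the choice of inclusive window boundaries are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def count_and_filter_sets(sets, bim, min_snps=1, max_snps=np.inf):
    """Hypothetical helper: count variants per set, then filter by count."""
    nsnps = []
    for row in sets.itertuples(index=False):
        on_chrom = bim["chrom"] == row.chrom
        # Assumption: both window boundaries are inclusive.
        in_window = bim["pos"].between(row.start, row.end)
        nsnps.append(int((on_chrom & in_window).sum()))
    out = sets.assign(nsnps=nsnps)
    keep = (out["nsnps"] >= min_snps) & (out["nsnps"] <= max_snps)
    return out[keep].reset_index(drop=True)


# Toy inputs: five variants and three user-defined sets.
bim = pd.DataFrame({
    "chrom": [1, 1, 1, 2, 2],
    "pos": [10, 20, 30, 15, 500],
})
sets = pd.DataFrame({
    "setid": ["geneA", "geneB", "geneC"],
    "chrom": [1, 1, 2],
    "start": [0, 100, 0],
    "end": [25, 200, 100],
})
annotated = count_and_filter_sets(sets, bim, min_snps=1)
print(annotated)
```

In this toy input, "geneB" contains no variants and is therefore removed by the min_snps=1 filter, while the remaining sets gain an "nsnps" column.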

limix.util.estCumPos(position, offset=0, chrom_len=None, return_chromstart=False)

Computes the cumulative position of variants from a position list or dataframe.

Parameters:
  • position (list or pandas.DataFrame) –

    positions given as chromosome and basepair position. It can be specified as

    • list [chrom, pos] where chrom and pos are ndarrays with chromosome values and basepair positions;
    • pandas DataFrame of chromosome values (key='chrom') and basepair positions (key='pos').
  • chrom_len (ndarray, optional) – vector of predefined chromosome lengths. By default, the length of each chromosome is taken to be the maximum basepair position (key='pos') in position on that chromosome.
  • offset (float, optional) – offset between chromosomes for cumulative position (default is 0 bp).
  • return_chromstart (bool, optional) – if True, starting cumulative position of each chromosome is also returned (default is False).
Returns:

tuple containing:
  • pos_cum (ndarray): cumulative positions.
  • chromstart (array_like): starting cumulative positions for each chromosome. Returned only if return_chromstart=True.

Return type:

tuple

Examples

This function can be applied to a list of chrom and pos arrays:

>>> import numpy as np
>>> import pandas as pd
>>> from limix.util import estCumPos
>>>
>>> pos = np.kron(np.ones(2), np.arange(1,5)).astype(int)
>>> chrom = np.kron(np.arange(1,3), np.ones(4)).astype(int)
>>>
>>> pos_cum, chromstart = estCumPos([chrom, pos],
...                                 return_chromstart=True)
>>>
>>> print(chrom)
[1 1 1 1 2 2 2 2]
>>>
>>> print(pos)
[1 2 3 4 1 2 3 4]
>>>
>>> print(pos_cum)
[1 2 3 4 5 6 7 8]
>>>
>>> print(chromstart)
[1 5]

or on a position dataframe:

>>> position = pd.DataFrame(np.array([chrom, pos]).T,
...                         columns=['chrom', 'pos'])
>>> pos_cum, chromstart = estCumPos(position,
...                                 return_chromstart=True)
>>> position['pos_cum'] = pos_cum
>>> print(position)
   chrom  pos  pos_cum
0      1    1        1
1      1    2        2
2      1    3        3
3      1    4        4
4      2    1        5
5      2    2        6
6      2    3        7
7      2    4        8

limix.util.unique_variants(snps, return_idxs=False)

Filters out variants with the same genetic profile.

Parameters:
  • snps (ndarray) – (N, S) ndarray of genotype values for N individuals and S variants.
Returns:

genotype array with unique variants.

Return type:

ndarray

Examples

>>> from numpy.random import RandomState
>>> from numpy import kron, ones
>>> from limix.util import unique_variants
>>> from numpy import set_printoptions
>>> set_printoptions(4)
>>> random = RandomState(1)
>>>
>>> N = 4
>>> snps = kron(random.randn(N,3)<0., ones((1,2)))
>>>
>>> print(snps)
[[ 0.  0.  1.  1.  1.  1.]
 [ 1.  1.  0.  0.  1.  1.]
 [ 0.  0.  1.  1.  0.  0.]
 [ 1.  1.  0.  0.  1.  1.]]
>>>
>>> snps_u = unique_variants(snps)
>>>
>>> print(snps_u)
[[ 0.  1.  1.]
 [ 1.  0.  1.]
 [ 0.  1.  0.]
 [ 1.  0.  1.]]
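
The same kind of column de-duplication can be sketched with NumPy alone. This is an illustration under assumptions, not the limix implementation: the helper name unique_columns is hypothetical, and the meaning given here to return_idxs (returning the indices of the retained columns) is a guess, since the parameter is not documented above.

```python
import numpy as np


def unique_columns(snps, return_idxs=False):
    """Hypothetical helper: drop duplicated columns, keeping first occurrences."""
    # np.unique sorts the unique columns; return_index gives, for each one,
    # the index of its first occurrence in the input.
    _, first = np.unique(snps, axis=1, return_index=True)
    idxs = np.sort(first)  # restore the original column order
    if return_idxs:
        return snps[:, idxs], idxs
    return snps[:, idxs]


# Same genotype matrix as in the example above: every column appears twice.
snps = np.array([[0, 0, 1, 1, 1, 1],
                 [1, 1, 0, 0, 1, 1],
                 [0, 0, 1, 1, 0, 0],
                 [1, 1, 0, 0, 1, 1]], dtype=float)
snps_u, idxs = unique_columns(snps, return_idxs=True)
print(snps_u)
print(idxs)
```

Sorting the first-occurrence indices keeps the surviving variants in their original order, which np.unique alone would not guarantee.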

class limix.util.TemporaryDirectory(suffix=None, prefix=None, dir=None)

Create and return a temporary directory. This has the same behavior as mkdtemp but can be used as a context manager. For example:

    with TemporaryDirectory() as tmpdir:
        ...

Upon exiting the context, the directory and everything contained in it are removed.
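
The context-manager behavior can be checked with a short script. To stay runnable without limix installed, this sketch uses the standard library's tempfile.TemporaryDirectory, on the assumption that limix.util.TemporaryDirectory behaves the same way; the prefix and file name are arbitrary choices for the example.

```python
import os
from tempfile import TemporaryDirectory

with TemporaryDirectory(prefix="limix_demo_") as tmpdir:
    # Inside the context the directory exists and can be written to.
    path = os.path.join(tmpdir, "scratch.txt")
    with open(path, "w") as handle:
        handle.write("temporary data\n")
    existed_inside = os.path.isdir(tmpdir) and os.path.isfile(path)

# After the context exits, the directory and its contents are gone.
exists_after = os.path.exists(tmpdir)
print(existed_inside, exists_after)
```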

limix.util.urlretrieve(url, filename=None, reporthook=None, data=None)

Retrieve a URL into a temporary location on disk.

Requires a URL argument. If a filename is passed, it is used as the temporary file location. The reporthook argument should be a callable that accepts a block number, a read size, and the total file size of the URL target. The data argument should be valid URL-encoded data.

If a filename is passed and the URL points to a local resource, the result is a copy from local file to new file.

Returns a tuple containing the path to the newly created data file as well as the resulting HTTPMessage object.
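
The local-copy case and the reporthook callback can be tried without network access. This sketch uses the standard library's urllib.request.urlretrieve, on the assumption that limix.util.urlretrieve mirrors it; the file names and the progress list are hypothetical names chosen for the example.

```python
import os
from pathlib import Path
from tempfile import TemporaryDirectory
from urllib.request import urlretrieve

progress = []


def reporthook(block_number, read_size, total_size):
    # Called once before the first block and once after each block read.
    progress.append((block_number, read_size, total_size))


with TemporaryDirectory() as tmpdir:
    source = Path(tmpdir) / "source.txt"
    source.write_bytes(b"chrom pos\n1 100\n")
    target = os.path.join(tmpdir, "copy.txt")

    # A file:// URL plus an explicit filename yields a local copy.
    returned_path, headers = urlretrieve(source.as_uri(), target, reporthook)
    copied = Path(target).read_bytes()

print(returned_path == target, len(progress) > 0)
```

The first element of the returned tuple is the path that was written, and the second is the message object carrying the headers of the response.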