lmpy.statistics.mcpa

Module for performing a MetaCommunity Phylogenetics Analysis.

See:
Leibold, M.A., E.P. Economo and P.R. Peres-Neto. 2010. Metacommunity phylogenetics: separating the roles of environmental filters and historical biogeography. Ecology Letters 13: 1290-1299.

Module Contents

Functions

_beta_helper(mtx1, mtx2, weights)

This helper function avoids creating large temporary matrices.

_calculate_beta(pred_std, weights, phylo_std, use_lock=True)

Calculates the regression model (beta) for the provided inputs.

_calculate_r_squared(y_hat, phylo_std)

Calculates the R-squared value for the inputs.

_calculate_y_hat(pred_std, beta)

Calculates the predicted value for a regression (Y hat).

_mcpa_for_node(incidence_mtx, env_mtx, bg_mtx, phylo_col, use_locks=False)

Runs MCPA computations for a single tree node.

_standardize_matrix(mtx, weights)

Standardizes a phylogenetic or predictor matrix.

_trace_mtx_by_transverse(mtx)

Prevents a memory bomb by computing trace(mtx . mtx.T) without building the full product matrix.

get_p_values(observed_value, test_values, num_permutations=None)

Gets an array of P-Values.

mcpa(incidence_matrix, phylo_mtx, env_mtx, bg_mtx)

Runs MCPA for a set of matrices.

mcpa_parallel(incidence_matrix, phylo_mtx, env_mtx, bg_mtx)

Run MCPA for a set of matrices using parallelism.

Attributes

CONCURRENCY_FACTOR

lock

lmpy.statistics.mcpa.CONCURRENCY_FACTOR = 5

lmpy.statistics.mcpa._beta_helper(mtx1, mtx2, weights)

This helper function avoids creating large temporary matrices.

Parameters
  • mtx1 (Matrix) – A (n [sites] by i [predictors]) standardized matrix.

  • mtx2 (Matrix) – A (n [sites] by i [predictors]) standardized matrix.

  • weights (Matrix) – A (n [sites]) array of site weights.

Returns

The beta matrix created by the helper.

Return type

Matrix
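
As an illustration, the NumPy sketch below computes mtx1_T . W . mtx2 by scaling the rows of mtx2 with a one-dimensional weight array, so no (n by n) diagonal weight matrix is ever built. It assumes that this product is what the helper computes and that weights holds one value per site; beta_helper_sketch is a hypothetical name, not part of lmpy.

```python
import numpy as np


def beta_helper_sketch(mtx1, mtx2, weights):
    """Return mtx1_T . W . mtx2, where W = diag(weights).

    Hypothetical stand-in for _beta_helper; `weights` is assumed to be a
    1-D array holding one weight per site (row).
    """
    # Scale each site (row) of mtx2 by its weight through broadcasting
    # instead of building an (n x n) diagonal weight matrix.
    weighted = mtx2 * weights[:, np.newaxis]
    return mtx1.T.dot(weighted)


rng = np.random.default_rng(0)
m1 = rng.normal(size=(6, 2))   # n=6 sites, i=2 predictors
w = rng.uniform(1, 5, size=6)  # one weight per site

gram = beta_helper_sketch(m1, m1, w)  # M_T.W.M
print(gram.shape)  # (2, 2)
```

Memory for the intermediate grows with n rather than n squared, which is the point of avoiding the explicit diagonal matrix.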

lmpy.statistics.mcpa._calculate_beta(pred_std, weights, phylo_std, use_lock=True)

Calculates the regression model (beta) for the provided inputs.

Parameters
  • pred_std (Matrix) – A standardized predictor matrix (n [sites] by i [predictors]).

  • weights (Matrix) – A matrix of site weights (n by n).

  • phylo_std (Matrix) – A standardized phylo matrix (n by k [nodes]).

  • use_lock (boolean) – If true, use a lock when performing the computation to save on overall memory usage. This is useful for large, or full, predictor matrices but can be skipped for single columns to improve performance.

Note

  • The computation is:

    beta = (M_T.W.M)^-1.M_T.W.P

  • M is the predictor matrix

  • M_T is the transpose of the predictor matrix

  • W is the weights column

  • P is the phylo matrix

  • “^-1” is the inverse of the matrix

  • Locking is available to prevent many threads / subprocesses from performing memory-intensive computations concurrently and overwhelming the system.

Returns

An (i by k) numpy ndarray, where i is the number of predictors in pred_std and k is the number of nodes in phylo_std.

Return type

Matrix
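
A minimal sketch of the beta computation in NumPy, assuming weights is a one-dimensional array standing in for the diagonal of W. calculate_beta_sketch is a hypothetical name, and the lock handling is omitted.

```python
import numpy as np


def calculate_beta_sketch(pred_std, weights, phylo_std):
    """Weighted least squares: beta = (M_T.W.M)^-1 . M_T.W.P.

    Hypothetical illustration; `weights` is assumed to be a 1-D array of
    site weights standing in for the diagonal matrix W.
    """
    w_col = weights[:, np.newaxis]
    lhs = pred_std.T.dot(pred_std * w_col)   # M_T.W.M  (i x i)
    rhs = pred_std.T.dot(phylo_std * w_col)  # M_T.W.P  (i x k)
    # Solve lhs . beta = rhs rather than forming the explicit inverse.
    return np.linalg.solve(lhs, rhs)


rng = np.random.default_rng(1)
M = rng.normal(size=(8, 2))      # n=8 sites, i=2 predictors
P = rng.normal(size=(8, 3))      # n=8 sites, k=3 nodes
W = rng.uniform(1, 4, size=8)    # site weights

beta = calculate_beta_sketch(M, W, P)
print(beta.shape)  # (2, 3)
```

Using np.linalg.solve instead of computing (M_T.W.M)^-1 explicitly is a choice made for this sketch; both give the same beta when M_T.W.M is nonsingular, and solving is usually more numerically stable.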

lmpy.statistics.mcpa._calculate_r_squared(y_hat, phylo_std)

Calculates the R-squared value for the inputs.

Parameters
  • y_hat (Matrix) – A predicted value for a regression (n [sites] by k [nodes]).

  • phylo_std (Matrix) – A standardized phylo matrix (n [sites] by k [nodes]).

Note

  • R^2 = trace(y_hat . y_hat_T) / trace(P . P_T)

  • y_hat_T is the transpose of y_hat

  • P_T is the transpose of phylo_std

Returns

R squared value for the inputs.

Return type

Matrix
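
Because trace(A . A_T) is the sum of the squared entries of A, the R-squared ratio can be sketched without forming either (n by n) product. The function name below is hypothetical and the example data are made up for illustration.

```python
import numpy as np


def r_squared_sketch(y_hat, phylo_std):
    """R^2 = trace(y_hat . y_hat_T) / trace(P . P_T).

    trace(A . A_T) equals the sum of the squared entries of A, so neither
    (n x n) product needs to be built.
    """
    return np.sum(y_hat ** 2) / np.sum(phylo_std ** 2)


P = np.array([[1.0, -1.0], [2.0, 0.0], [-3.0, 1.0]])
y_hat = 0.5 * P   # a prediction that explains part of P

print(r_squared_sketch(y_hat, P))  # 0.25
```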

lmpy.statistics.mcpa._calculate_y_hat(pred_std, beta)

Calculates the predicted value for a regression (Y hat).

Parameters
  • pred_std (Matrix) – A standardized predictor matrix (n [sites] by i [predictors]).

  • beta (Matrix) – The regression model associated with this prediction (i by k [nodes]).

Returns

An (n [sites] by k [nodes]) numpy ndarray

Return type

Matrix
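
Assuming the usual regression prediction y_hat = pred_std . beta, the shapes combine as (n by i) . (i by k) = (n by k). The arrays below are arbitrary illustration data, not lmpy output.

```python
import numpy as np

# Illustration shapes: n=4 sites, i=2 predictors, k=3 nodes.
pred_std = np.arange(8, dtype=float).reshape(4, 2)
beta = np.ones((2, 3))

# Y hat is the predictor matrix multiplied by the regression coefficients.
y_hat = pred_std.dot(beta)
print(y_hat.shape)  # (4, 3)
```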

lmpy.statistics.mcpa._mcpa_for_node(incidence_mtx, env_mtx, bg_mtx, phylo_col, use_locks=False)

Runs MCPA computations for a single tree node.

Parameters
  • incidence_mtx (Matrix) – An incidence matrix (PAM).

  • env_mtx (Matrix) – An environmental matrix (GRIM).

  • bg_mtx (Matrix) – A matrix of encoded Biogeographic hypotheses.

  • phylo_col (Matrix) – A column from the phylo matrix for a single node.

  • use_locks (boolean) – Whether locks are needed for larger computations. This is probably only true for parallel runs.

Returns

Tuple of (observed values Matrix, F-values Matrix).

Return type

tuple

lmpy.statistics.mcpa._standardize_matrix(mtx, weights)

Standardizes a phylogenetic or predictor matrix.

Parameters
  • mtx (Matrix) – The matrix to standardize

  • weights (Matrix) – A one-dimensional array of sums to use for standardization.

Note

  • Formula for standardization:

    Mstd = (M - 1c.1r.W.M(1/trace(W))) ./ (1c.((1r.W(M*M) - ((1r.W.M)*(1r.W.M))(1/trace(W)))(1/(trace(W)-1)))^0.5)

  • M - Matrix to be standardized

  • W - A k by k diagonal matrix of weights, where each non-zero value is the column or row sum (depending on M) for an incidence matrix.

  • 1r - A row of k ones

  • 1c - A column of k ones

  • trace - Returns the sum of the diagonal elements of the input matrix

  • “./” indicates Hadamard division

  • “*” indicates Hadamard multiplication

  • Code adapted from the MATLAB code in the supplemental material

See:

The supplemental materials of Leibold et al. (2010)

Returns

Standardized matrix.

Return type

Matrix
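
A minimal NumPy sketch of the standardization formula above, reading it as a weighted column mean and a weighted sample variance with denominator trace(W) - 1. It assumes weights is a one-dimensional array (the diagonal of W), so trace(W) is its sum; standardize_sketch is a hypothetical name, not the lmpy implementation.

```python
import numpy as np


def standardize_sketch(mtx, weights):
    """Weighted column standardization (hypothetical, not lmpy's code).

    `weights` is assumed to be a 1-D array holding the diagonal of W,
    so trace(W) is simply its sum.
    """
    total = weights.sum()                   # trace(W)
    col_sums = weights.dot(mtx)             # 1r.W.M
    col_sq_sums = weights.dot(mtx * mtx)    # 1r.W(M*M)
    mean = col_sums / total
    var = (col_sq_sums - col_sums * col_sums / total) / (total - 1)
    # Broadcasting a row vector over every row replaces the 1c column of ones.
    return (mtx - mean) / np.sqrt(var)


mtx = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([2.0, 1.0, 3.0, 2.0])
std = standardize_sketch(mtx, w)
```

Under this reading, every standardized column has a weighted mean of 0 and a weighted sample variance of 1.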

lmpy.statistics.mcpa._trace_mtx_by_transverse(mtx)

Prevents a memory bomb by computing trace(mtx . mtx.T) without building the full product matrix.

This method takes advantage of the fact that we are only interested in the diagonal of the matrix created by the dot product of a matrix and its transpose. Because of that, we can take the dot product of each row with itself and then sum the results.

Parameters

mtx (Matrix) – The matrix M for which to calculate trace(M . M_T).

Note

  • There was an error in the formula that called for a dot product of each matrix transpose with itself. This was a typo, as that product is only defined for square matrices.

Returns

The trace of the matrix dotted with its transpose.

Return type

Matrix
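
The row-by-row idea can be sketched in NumPy: the diagonal entry for row j of mtx . mtx_T is that row dotted with itself, so the trace is the sum of all squared entries. np.einsum('ij,ij->', ...) computes that sum directly; trace_mtx_by_transpose_sketch is a hypothetical name.

```python
import numpy as np


def trace_mtx_by_transpose_sketch(mtx):
    """trace(mtx . mtx.T) without building the (n x n) product.

    Diagonal entry j of mtx . mtx.T is row j dotted with itself, so the
    trace is the sum of every squared entry of mtx.
    """
    return float(np.einsum('ij,ij->', mtx, mtx))


mtx = np.array([[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]])
print(trace_mtx_by_transpose_sketch(mtx))  # 15.0
```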

lmpy.statistics.mcpa.get_p_values(observed_value, test_values, num_permutations=None)

Gets an array of P-Values.

Gets a (1- or 2-dimensional) array of P values. The P value for each array location is the number of test values at that location that are greater than or equal to the observed value, divided by the number of permutations when num_permutations is provided.

Parameters
  • observed_value (Matrix) – An array of observed values to use as a reference.

  • test_values (Matrix) – A list of arrays generated from randomizations that will be compared to the observed values.

  • num_permutations – (optional) The total number of randomizations performed. If provided, the counts are divided by this value to give P-values.

Returns

Calculated p-values.

Return type

Matrix
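
A sketch of the counting rule described above, assuming a plain element-wise >= comparison. The real function may differ in details such as tie handling; get_p_values_sketch is a hypothetical name.

```python
import numpy as np


def get_p_values_sketch(observed_value, test_values, num_permutations=None):
    """Count, per cell, how many randomized values are >= the observed value.

    Hypothetical re-implementation of the behaviour described in the
    docstring, not the lmpy code.
    """
    counts = np.zeros(np.shape(observed_value), dtype=float)
    for test_mtx in test_values:
        # Each comparison adds 1 where the randomized value is >= observed.
        counts += (np.asarray(test_mtx) >= observed_value)
    if num_permutations is not None:
        counts /= num_permutations
    return counts


observed = np.array([0.5, 0.2])
tests = [np.array([0.6, 0.1]), np.array([0.4, 0.3]), np.array([0.5, 0.0])]
p_vals = get_p_values_sketch(observed, tests, num_permutations=3)
```

Here the first cell is matched or exceeded by 2 of the 3 randomizations and the second cell by 1 of 3.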

lmpy.statistics.mcpa.lock

lmpy.statistics.mcpa.mcpa(incidence_matrix, phylo_mtx, env_mtx, bg_mtx)

Runs MCPA for a set of matrices.

Parameters
  • incidence_matrix (Matrix) – A binary Lifemapper Matrix object representing the incidence of each species for each site by coding them as ones. This is the same thing as a Lifemapper Presence Absence Matrix, or PAM (n [sites] by k+1 [species]).

  • phylo_mtx (Matrix) – A matrix encoding of a phylogenetic tree where each cell represents the relative contribution of each tip to each inner tree node (k+1 [species] by k [nodes]).

  • env_mtx (Matrix) – A matrix encoding of the environment for each site (n [sites] by ei [environmental predictors]).

  • bg_mtx (Matrix) – A matrix of Helmert contrasts (-1, 0, 1) for Biogeographic hypotheses (n [sites] by bi [biogeographic predictors]).

Returns

Tuple of Matrix of observed values and Matrix of F-pseudo values.

Return type

tuple

lmpy.statistics.mcpa.mcpa_parallel(incidence_matrix, phylo_mtx, env_mtx, bg_mtx)

Run MCPA for a set of matrices using parallelism.

Performs MCPA across each of the tree nodes in parallel.

Parameters
  • incidence_matrix (Matrix) – A binary Lifemapper Matrix object representing the incidence of each species for each site by coding them as ones. This is the same thing as a Lifemapper Presence Absence Matrix, or PAM (n [sites] by k+1 [species]).

  • phylo_mtx (Matrix) – A matrix encoding of a phylogenetic tree where each cell represents the relative contribution of each tip to each inner tree node (k+1 [species] by k [nodes]).

  • env_mtx (Matrix) – A matrix encoding of the environment for each site (n [sites] by ei [environmental predictors]).

  • bg_mtx (Matrix) – A matrix of Helmert contrasts (-1, 0, 1) for Biogeographic hypotheses (n [sites] by bi [biogeographic predictors]).

Returns

Tuple of Matrix of observed values and Matrix of F-pseudo values.

Return type

tuple
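
One way to run per-node work concurrently while serializing a memory-heavy step is a thread pool plus a shared lock. The sketch below assumes threads (the module may use subprocesses instead), uses a CONCURRENCY_FACTOR of 5 as the worker count (an assumption about what that attribute means), and substitutes a toy per-node statistic for the real MCPA computation; heavy_lock and node_statistic are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Hypothetical stand-ins that mirror the module's CONCURRENCY_FACTOR and
# lock attributes; this is not lmpy's implementation.
CONCURRENCY_FACTOR = 5
heavy_lock = threading.Lock()


def node_statistic(phylo_col, pred):
    """Toy per-node computation (not the real MCPA statistic)."""
    # Only one thread at a time runs the memory-heavy step.
    with heavy_lock:
        gram = pred.T.dot(pred)
    coef = np.linalg.solve(gram, pred.T.dot(phylo_col))
    return float(coef.sum())


rng = np.random.default_rng(2)
pred = rng.normal(size=(20, 3))   # 20 sites, 3 predictors
phylo = rng.normal(size=(20, 4))  # 4 tree nodes

with ThreadPoolExecutor(max_workers=CONCURRENCY_FACTOR) as pool:
    # pool.map returns results in node order, like a serial loop would.
    parallel = list(pool.map(lambda j: node_statistic(phylo[:, j], pred),
                             range(phylo.shape[1])))

serial = [node_statistic(phylo[:, j], pred) for j in range(phylo.shape[1])]
```

Because each node is independent, the parallel and serial runs produce the same values; the lock only limits how many threads hold large intermediates at once.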