`lmpy.statistics.mcpa`

Module for performing a MetaCommunity Phylogenetics Analysis.

See:

Leibold, M.A., E.P. Economo and P.R. Peres-Neto. 2010. Metacommunity phylogenetics: separating the roles of environmental filters and historical biogeography. Ecology Letters 13: 1290-1299.

Module Contents

Functions

`_beta_helper`(mtx1, mtx2, weights)	This helper function avoids creating large temporary matrices.
`_calculate_beta`(pred_std, weights, phylo_std, use_lock=True)	Calculates the regression model (beta) for the provided inputs.
`_calculate_r_squared`(y_hat, phylo_std)	Calculates the R-squared value for the inputs.
`_calculate_y_hat`(pred_std, beta)	Calculates the predicted value for a regression (Y hat).
`_mcpa_for_node`(incidence_mtx, env_mtx, bg_mtx, phylo_col, use_locks=False)	Runs MCPA computations for a single tree node.
`_standardize_matrix`(mtx, weights)	Standardizes a phylogenetic or predictor matrix.
`_trace_mtx_by_transverse`(mtx)	Prevent a memory bomb by performing trace(mtx . mtx.T) in a smarter way.
`get_p_values`(observed_value, test_values, num_permutations=None)	Gets an array of P-Values.
`mcpa`(incidence_matrix, phylo_mtx, env_mtx, bg_mtx)	Runs MCPA for a set of matrices.
`mcpa_parallel`(incidence_matrix, phylo_mtx, env_mtx, bg_mtx)	Run MCPA for a set of matrices using parallelism.

Attributes

`CONCURRENCY_FACTOR`
`lock`

lmpy.statistics.mcpa.CONCURRENCY_FACTOR = 5

lmpy.statistics.mcpa._beta_helper(mtx1, mtx2, weights)

This helper function avoids creating large temporary matrices.

Parameters

mtx1 (Matrix) – A (n [sites] by i [predictors]) standardized matrix.
mtx2 (Matrix) – A (n [sites] by i [predictors]) standardized matrix.
weights (Matrix) – A (n [sites]) array of site weights.

Returns

The beta matrix created by the helper.

Return type

Matrix

lmpy.statistics.mcpa._calculate_beta(pred_std, weights, phylo_std, use_lock=True)

Calculates the regression model (beta) for the provided inputs.

Parameters

pred_std (Matrix) – A standardized predictor matrix (n [sites] by i [predictors]).
weights (Matrix) – A matrix of site weights (n by n).
phylo_std (Matrix) – A standardized phylo matrix (n by k [nodes]).
use_lock (boolean) – If true, use a lock when performing the computation to save on overall memory usage. This is useful for large, or full, predictor matrices but can be skipped for single columns to improve performance.

Note

The computation is:

beta = (M_T.W.M)^-1.M_T.W.P

M is the predictor matrix
M_T is the transpose of the predictor matrix
W is the weights column
P is the phylo matrix
“^-1” is the inverse of the matrix

Locking is available to prevent many threads or subprocesses from performing memory-intensive computations concurrently and overwhelming the system.

Returns

An (i by k) numpy ndarray, where i is the number of predictors in pred_std and k is the number of nodes in phylo_std.

Return type

Matrix
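
The formula in the note above is a weighted least-squares solve. The following is a minimal NumPy sketch of it, not the library's implementation. It assumes plain ndarrays stand in for Matrix objects and that the weights are a one-dimensional vector holding the diagonal of W. The name `weighted_beta` is hypothetical and does not exist in lmpy.

```python
import numpy as np


def weighted_beta(pred_std, weights, phylo_std):
    """Return (M_T . W . M)^-1 . M_T . W . P for a 1-D weights vector."""
    # Scaling the rows of M by the weights equals W . M for a diagonal W,
    # without allocating the n-by-n matrix.
    weighted_pred = pred_std * weights[:, np.newaxis]
    mt_w_m = pred_std.T @ weighted_pred      # (i by i)
    mt_w_p = weighted_pred.T @ phylo_std     # (i by k)
    # Solving the linear system avoids forming an explicit inverse.
    return np.linalg.solve(mt_w_m, mt_w_p)


rng = np.random.default_rng(0)
n_sites, n_pred, n_nodes = 6, 2, 3
pred = rng.normal(size=(n_sites, n_pred))
phylo = rng.normal(size=(n_sites, n_nodes))
w = rng.uniform(1.0, 4.0, size=n_sites)

beta = weighted_beta(pred, w, phylo)

# Reference: the formula written out with an explicit diagonal W.
W = np.diag(w)
beta_ref = np.linalg.inv(pred.T @ W @ pred) @ pred.T @ W @ phylo
```

Both routes give the same (i by k) result; the row-scaling version only differs in that it never materializes the (n by n) weights matrix.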

lmpy.statistics.mcpa._calculate_r_squared(y_hat, phylo_std)

Calculates the R-squared value for the inputs.

Parameters

y_hat (Matrix) – A predicted value for a regression (n [sites] by k [nodes]).
phylo_std (Matrix) – A standardized phylo matrix (n [sites] by k [nodes]).

Note

R^2 = trace(y_hat . y_hat_T) / trace(P . P_T)
y_hat_T is the transpose of y_hat
P_T is the transpose of phylo_std

Returns: R squared value for the inputs.
Return type: Matrix
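
As an illustration of the ratio in the note, here is a small NumPy sketch under stated assumptions: plain ndarrays replace Matrix objects, `r_squared` is a hypothetical name, and the y_hat used in the example comes from an ordinary (unweighted) least-squares fit chosen only to have something to plug in.

```python
import numpy as np


def r_squared(y_hat, phylo_std):
    """Return trace(y_hat . y_hat_T) / trace(P . P_T)."""
    # trace(A . A_T) equals the sum of the squared entries of A, so the
    # n-by-n products never need to be built.
    return np.sum(y_hat * y_hat) / np.sum(phylo_std * phylo_std)


rng = np.random.default_rng(1)
pred = rng.normal(size=(8, 2))
phylo = rng.normal(size=(8, 3))

# Ordinary least-squares fit, used here only to obtain a y_hat.
beta, *_ = np.linalg.lstsq(pred, phylo, rcond=None)
y_hat = pred @ beta

value = r_squared(y_hat, phylo)
```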

lmpy.statistics.mcpa._calculate_y_hat(pred_std, beta)

Calculates the predicted value for a regression (Y hat).

Parameters

pred_std (Matrix) – A standardized predictor matrix (n [sites] by i [predictors]).
beta (Matrix) – The regression model associated with this prediction (i by k [nodes]).

Returns

An (n [sites] by k [nodes]) numpy ndarray

Return type

Matrix

lmpy.statistics.mcpa._mcpa_for_node(incidence_mtx, env_mtx, bg_mtx, phylo_col, use_locks=False)

Runs MCPA computations for a single tree node.

Parameters

incidence_mtx (Matrix) – An incidence matrix (PAM).
env_mtx (Matrix) – An environmental matrix (GRIM).
bg_mtx (Matrix) – A matrix of encoded Biogeographic hypotheses.
phylo_col (Matrix) – A column from the phylo matrix for a single node.
use_locks (boolean) – Indicator if locks are needed for larger computations. This is probably only true for parallel runs.

Returns

Tuple of observed Matrix, f-values Matrix

Return type

tuple

lmpy.statistics.mcpa._standardize_matrix(mtx, weights)

Standardizes a phylogenetic or predictor matrix.

Parameters

mtx (Matrix) – The matrix to standardize
weights (Matrix) – A one-dimensional array of sums to use for standardization.

Note

Formula for standardization:

Mstd = M - 1c.1r.W.M(1/trace(W)) ./ 1c(1r.W(M*M) - ((1r.W.M)*(1r.W.M)(1/trace(W))(1/trace(W)-1))^0.5

M - Matrix to be standardized
W - A k by k diagonal matrix of weights, where each non-zero value is the column or row sum (depending on the M) for an incidence matrix.
1r - A row of k ones
1c - A column of k ones
trace - Returns the sum of the diagonal of the input matrix (for the diagonal matrix W, the sum of the weights)
“./” indicates Hadamard division
“*” indicates Hadamard multiplication

Code adapted from supplemental material MATLAB code

See: Literature supplemental materials

Returns: Standardized matrix.
Return type: Matrix
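
The idea behind the standardization (subtract a weighted column mean, then divide by a weighted column standard deviation) can be sketched in NumPy as follows. This is an assumption-laden illustration rather than the library's code: `standardize` is a hypothetical name, the weights are taken as a one-dimensional vector (the diagonal of W), and the variance uses a (trace(W) - 1) denominator, which may not match the exact normalization used by lmpy.

```python
import numpy as np


def standardize(mtx, weights):
    """Center and scale each column of mtx using site weights."""
    total = weights.sum()              # trace(W) for a diagonal W
    s1 = weights @ mtx                 # 1r.W.M, one value per column
    s2 = weights @ (mtx * mtx)         # 1r.W.(M*M), one value per column
    mean = s1 / total
    # Weighted variance with a (trace(W) - 1) denominator (an assumption).
    var = (s2 - s1 * s1 / total) / (total - 1.0)
    # Broadcasting replaces the explicit 1c column of ones.
    return (mtx - mean) / np.sqrt(var)


rng = np.random.default_rng(2)
m = rng.normal(size=(6, 3))
w = rng.uniform(1.0, 4.0, size=6)

m_std = standardize(m, w)
```

Under these assumptions each standardized column has a weighted mean of zero, and its weighted sum of squares equals trace(W) - 1.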

lmpy.statistics.mcpa._trace_mtx_by_transverse(mtx)

Prevent a memory bomb by performing trace(mtx . mtx.T) in a smarter way.

This method takes advantage of the fact that we are really only interested in the diagonal of the matrix created by performing a dot product of a matrix and its transpose. Because of that, we can take the dot product of each row with itself and then sum the results.

Parameters: mtx (Matrix) – A matrix to use to calculate the trace(M . M_T).

Note

There was an error in the formula that called for a dot product of each matrix transpose with itself. This was a typo, as that would only work with square matrices.

Returns: Trace matrix dot matrix transverse.
Return type: Matrix
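
The row-by-row approach described above can be sketched as follows, assuming a plain NumPy ndarray in place of a Matrix object. The function name `trace_of_product_with_transpose` is hypothetical.

```python
import numpy as np


def trace_of_product_with_transpose(mtx):
    """Return trace(mtx . mtx.T) without forming the n-by-n product."""
    # Diagonal entry r of mtx . mtx.T is row r dotted with itself, so
    # summing those dot products gives the trace.
    return sum(float(row @ row) for row in mtx)


m = np.arange(12, dtype=float).reshape(4, 3)
fast = trace_of_product_with_transpose(m)
naive = float(np.trace(m @ m.T))  # builds the full 4-by-4 product
```

For an (n by k) input the naive version allocates an (n by n) array, while the row-wise version only ever holds one row at a time.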

lmpy.statistics.mcpa.get_p_values(observed_value, test_values, num_permutations=None)

Gets an array of P-Values.

Gets a (1- or 2-dimensional) array of P-values, where the P-value for each array location is determined by counting the test values at the corresponding location that are greater than or equal to the observed value and then dividing that count by the number of permutations.

Parameters

observed_value (Matrix) – An array of observed values to use as a reference.
test_values (Matrix) – A list of arrays generated from randomizations that will be compared to the observed values.
num_permutations – (optional) The total number of randomizations performed. Divide the P-values by this if provided.

Returns: Calculated p-values.
Return type: Matrix
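
A literal reading of the description above can be sketched like this. It is a simplified illustration with plain ndarrays, and `p_values` is a hypothetical name; the library function may preprocess the values differently before comparing them. Falling back to the number of test arrays when `num_permutations` is omitted is also an assumption made for this sketch.

```python
import numpy as np


def p_values(observed, test_values, num_permutations=None):
    """Fraction of randomized values >= the observed value, per cell."""
    if num_permutations is None:
        # Assumption: default to the number of randomized arrays supplied.
        num_permutations = len(test_values)
    counts = np.zeros(observed.shape, dtype=float)
    for test in test_values:
        # The boolean comparison adds 1 wherever the randomized value is
        # at least as large as the observed one.
        counts += test >= observed
    return counts / num_permutations


observed = np.array([1.0, 5.0])
tests = [
    np.array([0.5, 6.0]),
    np.array([2.0, 4.0]),
    np.array([1.0, 5.5]),
    np.array([3.0, 0.0]),
]
result = p_values(observed, tests)  # array([0.75, 0.5])
```

In the first position three of the four randomized values are at least 1.0, and in the second position two of the four are at least 5.0.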

lmpy.statistics.mcpa.lock

lmpy.statistics.mcpa.mcpa(incidence_matrix, phylo_mtx, env_mtx, bg_mtx)

Runs MCPA for a set of matrices.

Parameters

incidence_matrix (Matrix) – A binary Lifemapper Matrix object representing the incidence of each species for each site by coding them as ones. This is the same thing as a Lifemapper Presence Absence Matrix, or PAM (n [sites] by k+1 [species]).
phylo_mtx (Matrix) – A matrix encoding of a phylogenetic tree where each cell represents the relative contribution of each tip to each inner tree node (k+1 [species] by k [nodes]).
env_mtx (Matrix) – A matrix encoding of the environment for each site (n [sites] by ei [environmental predictors]).
bg_mtx (Matrix) – A matrix of Helmert contrasts (-1, 0, 1) for Biogeographic hypotheses (n [sites] by bi [biogeographic predictors]).

Returns

Tuple of Matrix of observed values and Matrix of F-pseudo values.

Return type

tuple

lmpy.statistics.mcpa.mcpa_parallel(incidence_matrix, phylo_mtx, env_mtx, bg_mtx)

Run MCPA for a set of matrices using parallelism.

Performs MCPA across each of the tree nodes in parallel.

Parameters

incidence_matrix (Matrix) – A binary Lifemapper Matrix object representing the incidence of each species for each site by coding them as ones. This is the same thing as a Lifemapper Presence Absence Matrix, or PAM (n [sites] by k+1 [species]).
phylo_mtx (Matrix) – A matrix encoding of a phylogenetic tree where each cell represents the relative contribution of each tip to each inner tree node (k+1 [species] by k [nodes]).
env_mtx (Matrix) – A matrix encoding of the environment for each site (n [sites] by ei [environmental predictors]).
bg_mtx (Matrix) – A matrix of Helmert contrasts (-1, 0, 1) for Biogeographic hypotheses (n [sites] by bi [biogeographic predictors]).

Returns

Tuple of Matrix of observed values and Matrix of F-pseudo values.

Return type

tuple
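
The general pattern of running one computation per tree node in parallel, with an optional lock around the memory-heavy step, might look like the sketch below. Everything here is hypothetical and for illustration only: `work_for_node`, `run_per_node`, `heavy_step_lock`, and `max_workers` are made-up names, the per-node work is a placeholder rather than the MCPA statistics, and the choice of a thread pool is an assumption, not a statement about how lmpy schedules its work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import numpy as np

heavy_step_lock = threading.Lock()  # hypothetical stand-in for a shared lock


def work_for_node(phylo_col, use_locks=False):
    """Placeholder per-node computation (not the MCPA statistics)."""
    if use_locks:
        # Only one worker at a time runs the memory-heavy step.
        with heavy_step_lock:
            return float(phylo_col @ phylo_col)
    return float(phylo_col @ phylo_col)


def run_per_node(phylo_mtx, max_workers=4):
    """Apply work_for_node to every column, preserving column order."""
    columns = [phylo_mtx[:, j] for j in range(phylo_mtx.shape[1])]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda col: work_for_node(col, True), columns))


phylo = np.arange(12, dtype=float).reshape(4, 3)
results = run_per_node(phylo)  # one value per node column
```

`pool.map` returns results in input order, so the output list lines up with the node columns even though the workers may finish in a different order.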
