lmpy.statistics.mcpa
Module for performing a MetaCommunity Phylogenetics Analysis.
- See:
- Leibold, M.A., E.P. Economo, and P.R. Peres-Neto. 2010. Metacommunity phylogenetics: separating the roles of environmental filters and historical biogeography. Ecology Letters 13: 1290-1299.
Module Contents
Functions
- _beta_helper: This helper function avoids creating large temporary matrices.
- _calculate_beta: Calculates the regression model (beta) for the provided inputs.
- _calculate_r_squared: Calculates the R-squared value for the inputs.
- _calculate_y_hat: Calculates the predicted value for a regression (Y hat).
- _mcpa_for_node: Runs MCPA computations for a single tree node.
- _standardize_matrix: Standardizes a phylogenetic or predictor matrix.
- _trace_mtx_by_transverse: Prevent a memory bomb by performing trace(mtx . mtx.T) in a smarter way.
- get_p_values: Gets an array of P-Values.
- mcpa: Runs MCPA for a set of matrices.
- mcpa_parallel: Run MCPA for a set of matrices using parallelism.
Attributes
- lmpy.statistics.mcpa._beta_helper(mtx1, mtx2, weights)[source]
This helper function avoids creating large temporary matrices.
- lmpy.statistics.mcpa._calculate_beta(pred_std, weights, phylo_std, use_lock=True)[source]
Calculates the regression model (beta) for the provided inputs.
- Parameters
pred_std (Matrix) – A standardized predictor matrix (n [sites] by i [predictors]).
weights (Matrix) – A matrix of site weights (n by n).
phylo_std (Matrix) – A standardized phylo matrix (n by k [nodes]).
use_lock (boolean) – If true, use a lock when performing the computation to save on overall memory usage. This is useful for large, or full, predictor matrices but can be skipped for single columns to improve performance.
Note
- The computation is::
beta = (M_T.W.M)^-1.M_T.W.P
M is the predictor matrix
M_T is the transpose of the predictor matrix
W is the site weights matrix
P is the phylo matrix
“^-1” is the inverse of the matrix
- Locking is available to prevent many threads or subprocesses from
performing memory-intensive computations concurrently and overwhelming the system.
- Returns
- An (i by k) numpy ndarray, where i is the number of predictors in
pred_std and k is the number of nodes in phylo_std.
- Return type
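The formula in the note above can be sketched with plain NumPy arrays. This is only an illustration of the algebra, not the library's implementation: the function name `weighted_beta` and the random test data are hypothetical, locking is omitted, and the sketch solves the linear system instead of forming the inverse explicitly.

```python
import numpy as np


def weighted_beta(pred_std, weights, phylo_std):
    """Illustrative beta = (M_T.W.M)^-1 . M_T.W.P (hypothetical helper)."""
    mt_w = pred_std.T @ weights  # M_T.W, shape (i by n)
    # Solve (M_T.W.M) . beta = M_T.W.P rather than inverting M_T.W.M.
    return np.linalg.solve(mt_w @ pred_std, mt_w @ phylo_std)


rng = np.random.default_rng(0)
n, i, k = 6, 2, 3  # sites, predictors, nodes
pred = rng.normal(size=(n, i))
phylo = rng.normal(size=(n, k))
w = np.diag(rng.uniform(1.0, 5.0, size=n))  # assumed diagonal site weights

beta = weighted_beta(pred, w, phylo)
print(beta.shape)  # (2, 3), i.e. (i by k)
```

Solving the system is generally more numerically stable than computing the inverse and multiplying, while giving the same (i by k) result.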
- lmpy.statistics.mcpa._calculate_r_squared(y_hat, phylo_std)[source]
Calculates the R-squared value for the inputs.
- Parameters
Note
R^2 = trace(y_hat . y_hat_T) / trace(P . P_T)
y_hat_T is the transpose of y_hat
P_T is the transpose of phylo_std
- Returns
R squared value for the inputs.
- Return type
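As a small worked example of the ratio in the note, the sketch below evaluates it directly with `np.trace` on tiny arrays. The function name `r_squared_sketch` is hypothetical, and this naive form builds the full product matrices, so it is suitable only for small inputs.

```python
import numpy as np


def r_squared_sketch(y_hat, phylo_std):
    """Illustrative R^2 = trace(y_hat . y_hat_T) / trace(P . P_T)."""
    return np.trace(y_hat @ y_hat.T) / np.trace(phylo_std @ phylo_std.T)


p = np.array([[1.0, 2.0], [3.0, 4.0]])
y = 0.5 * p  # predictions that are exactly half of the observed values

r2 = r_squared_sketch(y, p)
print(r2)  # 0.25
```

Halving every value scales each trace term by 0.25, which is why the ratio comes out to 0.25 here.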
- lmpy.statistics.mcpa._calculate_y_hat(pred_std, beta)[source]
Calculates the predicted value for a regression (Y hat).
- lmpy.statistics.mcpa._mcpa_for_node(incidence_mtx, env_mtx, bg_mtx, phylo_col, use_locks=False)[source]
Runs MCPA computations for a single tree node.
- Parameters
incidence_mtx (Matrix) – An incidence matrix (PAM).
env_mtx (Matrix) – An environmental matrix (GRIM).
bg_mtx (Matrix) – A matrix of encoded Biogeographic hypotheses.
phylo_col (Matrix) – A column from the phylo matrix for a single node.
use_locks (boolean) – Indicator if locks are needed for larger computations. This is probably only true for parallel runs.
- Returns
Tuple of observed Matrix, f-values Matrix
- Return type
tuple
- lmpy.statistics.mcpa._standardize_matrix(mtx, weights)[source]
Standardizes a phylogenetic or predictor matrix.
- Parameters
Note
- Formula for standardization ::
Mstd = M - 1c.1r.W.M(1/trace(W)) ./ 1c(1r.W(M*M) - ((1r.W.M)*(1r.W.M)(1/trace(W))(1/trace(W)-1)))^0.5
M - Matrix to be standardized
W - A k by k diagonal matrix of weights, where each non-zero value is the column or row sum (depending on M) for an incidence matrix.
1r - A row of k ones
1c - A column of k ones
trace - Returns the sum of the input matrix
“./” indicates Hadamard division
“*” indicates Hadamard multiplication
Code adapted from supplemental material MATLAB code
- See:
Literature supplemental materials
- Returns
Standardized matrix.
- Return type
- lmpy.statistics.mcpa._trace_mtx_by_transverse(mtx)[source]
Prevent a memory bomb by performing trace(mtx . mtx.T) in a smarter way.
- This method takes advantage of the fact that we are only interested
in the diagonal of the matrix created by the dot product of a matrix and its transpose. Because of that, we can take the dot product of each row with itself and then sum the results.
- Parameters
mtx (Matrix) – A matrix to use to calculate the trace(M . M_T).
Note
- There was an error in the formula that called for a dot product of
each matrix transpose with the matrix itself. This was a typo, as that would only work with square matrices.
- Returns
Trace of the matrix dotted with its transpose.
- Return type
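The row-by-row idea described above can be sketched as follows, using a plain NumPy array in place of a Matrix object. The function name `trace_by_rows` is hypothetical; the point is that no (n by n) temporary matrix is ever created.

```python
import numpy as np


def trace_by_rows(mtx):
    """Illustrative trace(M . M_T) without building M . M_T."""
    # Each diagonal entry of M . M_T is one row dotted with itself.
    return sum(float(row @ row) for row in mtx)


m = np.arange(6, dtype=float).reshape(2, 3)  # rows [0, 1, 2] and [3, 4, 5]

total = trace_by_rows(m)
print(total)                # 55.0
print(np.trace(m @ m.T))    # 55.0, the naive form for comparison
```

For a matrix with n rows, the naive form allocates n * n values only to discard all but the n on the diagonal, which is what makes it a memory risk for large site counts.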
- lmpy.statistics.mcpa.get_p_values(observed_value, test_values, num_permutations=None)[source]
Gets an array of P-Values.
- Gets a (1- or 2-dimensional) array of P-values, where the P-value for an array
location is determined by counting the test values at the corresponding location that are greater than or equal to the observed value and then dividing that count by the number of permutations.
- Parameters
observed_value (Matrix) – An array of observed values to use as a reference.
test_values (Matrix) – A list of arrays generated from randomizations that will be compared to the observed values.
num_permutations – (optional) The total number of randomizations performed. Divide the P-values by this if provided.
- Returns
Calculated p-values.
- Return type
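The counting procedure described above might look like the following minimal sketch with plain NumPy arrays. The name `p_values_sketch` is hypothetical, and the fallback of dividing by the number of test arrays when `num_permutations` is omitted is an assumption of this sketch, not documented behavior of `get_p_values`.

```python
import numpy as np


def p_values_sketch(observed, test_values, num_permutations=None):
    """Illustrative per-location P-values from randomized test arrays."""
    stacked = np.stack([np.asarray(t) for t in test_values])
    # Count, per location, the test values >= the observed value.
    counts = np.sum(stacked >= np.asarray(observed), axis=0)
    if num_permutations is None:
        # Assumption: fall back to the number of test arrays supplied.
        num_permutations = len(test_values)
    return counts / num_permutations


observed = np.array([1.0, 5.0])
tests = [
    np.array([0.5, 6.0]),
    np.array([2.0, 4.0]),
    np.array([1.0, 3.0]),
    np.array([3.0, 0.0]),
]

p_vals = p_values_sketch(observed, tests)
print(p_vals)  # [0.75 0.25]
```

At the first location three of the four test values are at least 1.0, and at the second only one is at least 5.0, giving 0.75 and 0.25.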
- lmpy.statistics.mcpa.mcpa(incidence_matrix, phylo_mtx, env_mtx, bg_mtx)[source]
Runs MCPA for a set of matrices.
- Parameters
incidence_matrix (Matrix) – A binary Lifemapper Matrix object representing the incidence of each species for each site by coding them as ones. This is the same thing as a Lifemapper Presence Absence Matrix, or PAM (n [sites] by k+1 [species]).
phylo_mtx (Matrix) – A matrix encoding of a phylogenetic tree where each cell represents the relative contribution of each tip to each inner tree node (k+1 [species] by k [nodes]).
env_mtx (Matrix) – A matrix encoding of the environment for each site (n [sites] by ei [environmental predictors]).
bg_mtx (Matrix) – A matrix of Helmert contrasts (-1, 0, 1) for Biogeographic hypotheses (n [sites] by bi [biogeographic predictors]).
- Returns
Tuple of Matrix of observed values and Matrix of F-pseudo values.
- Return type
tuple
- lmpy.statistics.mcpa.mcpa_parallel(incidence_matrix, phylo_mtx, env_mtx, bg_mtx)[source]
Run MCPA for a set of matrices using parallelism.
Performs MCPA across each of the tree nodes in parallel.
- Parameters
incidence_matrix (Matrix) – A binary Lifemapper Matrix object representing the incidence of each species for each site by coding them as ones. This is the same thing as a Lifemapper Presence Absence Matrix, or PAM (n [sites] by k+1 [species]).
phylo_mtx (Matrix) – A matrix encoding of a phylogenetic tree where each cell represents the relative contribution of each tip to each inner tree node (k+1 [species] by k [nodes]).
env_mtx (Matrix) – A matrix encoding of the environment for each site (n [sites] by ei [environmental predictors]).
bg_mtx (Matrix) – A matrix of Helmert contrasts (-1, 0, 1) for Biogeographic hypotheses (n [sites] by bi [biogeographic predictors]).
- Returns
Tuple of Matrix of observed values and Matrix of F-pseudo values.
- Return type
tuple