lmpy.data_preparation.layer_encoder
This module contains a class for encoding spatial layers into a Matrix.
The ‘LayerEncoder’ class uses a shapegrid to generate a base matrix structure, then encodes each layer as one or more new columns of the resulting matrix.
Note
The data array is oriented with its origin at the top left (min x, max y).
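Because of this orientation, row indices grow as y decreases while column indices grow as x increases. The sketch below shows the coordinate-to-index conversion this implies. `xy_to_row_col` and its `resolution` argument (the map-unit size of one array cell) are hypothetical and not part of lmpy.

```python
def xy_to_row_col(x, y, bbox, resolution):
    """Return the (row, col) array index containing map point (x, y).

    Assumes element [0][0] sits at the top-left corner (min x, max y), so
    rows grow downward (decreasing y) and columns grow rightward (increasing x).
    `resolution` is a hypothetical parameter: the size of one array cell.
    """
    min_x, _min_y, _max_x, max_y = bbox
    col = int((x - min_x) // resolution)
    row = int((max_y - y) // resolution)
    return row, col


# A 4 x 4 layer covering (0, 0) to (4, 4) with 1-unit cells.
bbox = (0.0, 0.0, 4.0, 4.0)
print(xy_to_row_col(0.5, 3.5, bbox, 1.0))  # (0, 0): top-left cell
print(xy_to_row_col(3.5, 0.5, bbox, 1.0))  # (3, 3): bottom-right cell
```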
Module Contents
Classes
LayerEncoder – Constructor for the layer encoder.
- class lmpy.data_preparation.layer_encoder.LayerEncoder(shapegrid_filename)
Constructor for the layer encoder.
- Parameters
shapegrid_filename (str) – A file path for the shapegrid.
- _encode_layer(self, window_func, encode_func, column_name, num_columns=1)
Encodes the layer using the provided encoding function.
- Parameters
window_func – A function that returns a window of array data for a provided x, y pair.
encode_func – A function that encodes a window of array data.
column_name – The header name to use for the column in the encoded matrix.
num_columns – The number of columns that ‘encode_func’ will encode. This can be greater than one, for example when a single vector layer contains multiple biogeographic hypotheses.
- Returns
A list of column headers for the newly encoded columns.
- Return type
list
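The encoding loop described above can be sketched in plain Python. This is an illustration under assumptions, not lmpy's implementation: `cell_centers` stands in for the shapegrid cell centers, the `'name - i'` header scheme for multi-column layers is invented for the example, and the columns are returned alongside the headers so the result can be inspected (the documented method returns only the headers).

```python
def encode_layer(cell_centers, window_func, encode_func, column_name,
                 num_columns=1):
    """Encode one layer as `num_columns` columns, one value per shapegrid cell.

    cell_centers: hypothetical list of (x, y) shapegrid cell centers.
    window_func: maps an (x, y) pair to a window of array data.
    encode_func: maps a window to a list of `num_columns` values.
    Returns (headers, columns), where columns[i] holds column i's values.
    """
    if num_columns == 1:
        headers = [column_name]
    else:
        # Assumed header scheme for multi-column layers; lmpy's may differ.
        headers = ['{} - {}'.format(column_name, i) for i in range(num_columns)]
    columns = [[] for _ in range(num_columns)]
    for x, y in cell_centers:
        values = encode_func(window_func(x, y))
        if len(values) != num_columns:
            raise ValueError('encode_func returned the wrong number of values')
        for column, value in zip(columns, values):
            column.append(value)
    return headers, columns


# Toy layer: the "window" is just the cell's x coordinate.
centers = [(0.5, 0.5), (1.5, 0.5)]
headers, columns = encode_layer(
    centers, lambda x, y: [[x]], lambda window: [window[0][0]], 'x')
print(headers, columns)  # ['x'] [[0.5, 1.5]]
```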
- static _get_window_function(data, layer_bbox, cell_size, num_cell_sides=4)
Gets a windowing function for the data.
This function generates a function that will return a “window” of array data for a given (x, y) pair.
- Parameters
data (np.ndarray) – A numpy array with data for a layer
layer_bbox (tuple) – The bounding box of the layer in the map units of the layer.
cell_size (float or tuple) – Either a single value or a tuple with two values. If it is a single value, it will be used for both x and y cell sizes. If a tuple is provided, the first value will be used for the size of each cell in the x dimension and the second will be used for the size of the cell in the y dimension.
num_cell_sides (int) – The number of sides each shapegrid cell has: 4 for square cells or 6 for hexagonal cells.
Note
The origin (0, 0) of the data array should represent (min x, max y) for the layer.
- Raises
NotImplementedError – Raised if num_cell_sides does not equal 4.
- Returns
A function for processing a window of data.
- Return type
callable
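A minimal sketch of a square-cell window function with the documented parameters, assuming `data` is a list of rows whose element `[0][0]` is at (min x, max y) and `cell_size` is the size of a shapegrid cell in map units. The array resolution is derived from `layer_bbox` and the array shape, and each cell edge is snapped to the nearest array boundary; lmpy's own index arithmetic may differ.

```python
def get_window_function(data, layer_bbox, cell_size, num_cell_sides=4):
    """Return a function mapping a cell center (x, y) to a window of `data`.

    data: list of rows; data[0][0] is the (min x, max y) corner.
    cell_size: shapegrid cell size, either one number or an (x, y) pair.
    """
    if num_cell_sides != 4:
        raise NotImplementedError('Only square cells are supported')
    if isinstance(cell_size, (int, float)):
        x_size = y_size = float(cell_size)
    else:
        x_size, y_size = cell_size
    min_x, min_y, max_x, max_y = layer_bbox
    n_rows, n_cols = len(data), len(data[0])
    # Resolution of the data array, derived from the bbox and array shape.
    x_res = (max_x - min_x) / n_cols
    y_res = (max_y - min_y) / n_rows

    def window(x, y):
        # Cell edges converted to array indices and clamped to the array.
        left = max(0, int(round((x - x_size / 2 - min_x) / x_res)))
        right = min(n_cols, int(round((x + x_size / 2 - min_x) / x_res)))
        top = max(0, int(round((max_y - (y + y_size / 2)) / y_res)))
        bottom = min(n_rows, int(round((max_y - (y - y_size / 2)) / y_res)))
        return [row[left:right] for row in data[top:bottom]]

    return window


data = [[r * 4 + c for c in range(4)] for r in range(4)]
window = get_window_function(data, (0.0, 0.0, 4.0, 4.0), 2.0)
print(window(1.0, 3.0))  # [[0, 1], [4, 5]]
```

For a 4 x 4 array covering (0, 0) to (4, 4) with 2-unit cells, the cell centered at (1, 3) maps to the top-left 2 x 2 block of the array, as the top-left orientation requires.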
- _read_layer(self, layer_filename, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_field=None)
Reads a layer for processing.
- Parameters
layer_filename (str) – The file path for the layer to read.
resolution (numeric) – An optional resolution to use for the input data if it is a vector layer.
bbox (tuple) – An optional bounding box in the form (min x, min y, max x, max y) to use if the layer is a vector layer.
nodata (numeric) – An optional nodata value to use if the layer is a vector layer.
attribute_field (str) – If provided, use this shapefile attribute as the data value for a vector layer.
- Returns
A tuple containing a window function for returning a portion of the numpy array generated by the layer and the NODATA value to use with this layer.
- Return type
tuple
- _read_raster_layer(self, raster_filename)
Reads a raster layer for processing.
- Parameters
raster_filename – The file path for the raster layer.
- Returns
A tuple containing a window function for returning a portion of the numpy array generated by the layer and the NODATA value to use with this layer.
- Return type
tuple
- _read_shapegrid(self, shapegrid_filename)
Reads the shapegrid.
- Parameters
shapegrid_filename – The file location of the shapegrid.
- _read_vector_layer(self, vector_filename, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_field=None)
Reads a vector layer for processing.
- Parameters
vector_filename – The vector file path for the layer to read.
resolution (numeric) – An optional resolution to use for the input data if it is a vector layer.
bbox (tuple) – An optional bounding box in the form (min x, min y, max x, max y) to use if the layer is a vector layer.
nodata (numeric) – An optional nodata value to use if the layer is a vector layer.
attribute_field (str) – If provided, use this shapefile attribute as the data value for a vector layer.
- Returns
A tuple containing a window function for returning a portion of the numpy array generated by the layer, the NODATA value to use with this layer, and a set of distinct attributes to be used for processing.
- Return type
tuple
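Reading a vector layer with `resolution`, `bbox`, and `nodata` amounts to rasterizing its features onto a grid. lmpy presumably relies on a GIS library for this; the pure-Python stand-in below burns polygons into a grid with the (min x, max y) origin using a cell-center, ray-casting test, and collects the distinct attribute values. The `(polygon, value)` feature format and both function names are assumptions made for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon` (list of (x, y))."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_polygons(features, bbox, resolution, nodata):
    """Burn (polygon, value) features into a grid whose [0][0] is (min x, max y).

    A cell takes the value of the last feature whose polygon contains the
    cell center; cells covered by no feature get `nodata`.
    Returns (grid, set of distinct feature values).
    """
    min_x, min_y, max_x, max_y = bbox
    n_cols = int(round((max_x - min_x) / resolution))
    n_rows = int(round((max_y - min_y) / resolution))
    grid = [[nodata] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        center_y = max_y - (row + 0.5) * resolution  # rows run top to bottom
        for col in range(n_cols):
            center_x = min_x + (col + 0.5) * resolution
            for polygon, value in features:
                if point_in_polygon(center_x, center_y, polygon):
                    grid[row][col] = value
    distinct = {value for _, value in features}
    return grid, distinct


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
grid, distinct = rasterize_polygons([(square, 1)], (0.0, 0.0, 4.0, 4.0), 1.0, -9999)
for row in grid:
    print(row)
```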
- encode_biogeographic_hypothesis(self, layer_filename, column_name, min_coverage, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_field=None)
Encodes a biogeographic hypothesis layer.
Encodes a biogeographic hypothesis layer by creating one or more Helmert contrast columns in the encoded matrix.
- Parameters
layer_filename – The file location of the layer to encode.
column_name – What to name this column in the encoded matrix.
min_coverage – The minimum percentage of each data window that must be covered.
resolution – If the layer is a vector, optionally use this as the resolution of the data grid.
bbox – If the layer is a vector, optionally use this bounding box for the data grid.
nodata – If the layer is a vector, optionally use this as the data grid nodata value.
attribute_field – If the layer is a vector and contains multiple hypotheses, use this field to separate the vector file.
- Returns
A list of column headers for the newly encoded columns.
- Return type
list of str
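Helmert contrasts code k regions as k - 1 columns, each of which sums to zero over the regions. The sketch below uses the convention of R's `contr.helmert` (column j compares region j + 1 with the mean of the regions before it), so a two-region split becomes a single column of -1 and +1. It assumes cells outside every region get a row of zeros. The exact coding lmpy uses is not documented here, and `helmert_contrasts` and `contrast_row` are hypothetical names.

```python
def helmert_contrasts(levels):
    """Map each of k levels to its row of a k x (k - 1) Helmert contrast matrix.

    Follows R's contr.helmert: column j assigns -1 to levels 0..j,
    j + 1 to level j + 1, and 0 to every later level.
    """
    k = len(levels)
    coding = {}
    for i, level in enumerate(levels):
        row = []
        for j in range(k - 1):
            if i <= j:
                row.append(-1)
            elif i == j + 1:
                row.append(j + 1)
            else:
                row.append(0)
        coding[level] = row
    return coding


def contrast_row(value, coding, num_columns):
    """Return the contrast row for `value`; assumed all zeros if unmapped."""
    return list(coding.get(value, [0] * num_columns))


print(helmert_contrasts(['A', 'B', 'C']))  # {'A': [-1, -1], 'B': [1, -1], 'C': [0, 2]}
```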
- encode_largest_class(self, layer_filename, column_name, min_coverage, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_name=None)
Encodes a layer based on the largest class in each data window.
- Parameters
layer_filename – The file location of the layer to encode.
column_name – What to name this column in the encoded matrix.
min_coverage – The minimum percentage of each data window that must be covered by the largest class.
resolution – If the layer is a vector, optionally use this as the resolution of the data grid.
bbox – If the layer is a vector, optionally use this bounding box for the data grid.
nodata – If the layer is a vector, optionally use this as the data grid nodata value.
attribute_name – If the layer is a vector, use this field to determine the largest class.
- Returns
A list of column headers for the newly encoded columns.
- Return type
list of str
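A sketch of the per-window largest-class rule, assuming `min_coverage` is a percentage (0 to 100) of all cells in the window, NODATA cells included, and that a window failing the threshold yields `None`. The function name and these conventions are assumptions rather than lmpy's documented behavior.

```python
from collections import Counter


def largest_class(window, min_coverage, nodata):
    """Return the most common non-NODATA value in `window`, or None.

    The class is kept only if it covers at least `min_coverage` percent
    of all window cells (NODATA cells count toward the total).
    """
    values = [v for row in window for v in row]
    counts = Counter(v for v in values if v != nodata)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    # Integer-safe comparison of count / len(values) * 100 >= min_coverage.
    if count * 100 >= min_coverage * len(values):
        return value
    return None


print(largest_class([[1, 1], [1, 2]], 50, -9999))  # 1
print(largest_class([[1, 1], [1, 2]], 80, -9999))  # None
```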
- encode_mean_value(self, layer_filename, column_name, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_name=None)
Encodes a layer based on the mean value for each data window.
- Parameters
layer_filename – The file location of the layer to encode.
column_name – What to name this column in the encoded matrix.
resolution – If the layer is a vector, optionally use this as the resolution of the data grid.
bbox – If the layer is a vector, optionally use this bounding box for the data grid.
nodata – If the layer is a vector, optionally use this as the data grid nodata value.
attribute_name – If the layer is a vector, use this field to determine the value.
- Returns
A list of column headers for the newly encoded columns.
- Return type
list of str
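A sketch of the mean-value rule, assuming NODATA cells are excluded from the mean and that a window with no valid cells yields `None` (a hypothetical stand-in for whatever lmpy writes in that case).

```python
def mean_value(window, nodata):
    """Return the mean of the non-NODATA values in `window`, or None."""
    values = [v for row in window for v in row if v != nodata]
    if not values:
        return None  # assumed placeholder for an all-NODATA window
    return sum(values) / len(values)


print(mean_value([[1, 2], [3, -9999]], -9999))  # 2.0
```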
- encode_presence_absence(self, layer_filename, column_name, min_presence, max_presence, min_coverage, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_name=None)
Encodes a distribution layer into a presence absence column.
- Parameters
layer_filename – The file location of the layer to encode.
column_name – What to name this column in the encoded matrix.
min_presence – The minimum value that should be treated as presence.
max_presence – The maximum value that should be treated as presence.
min_coverage – The minimum percentage of each data window that must fall within the presence range for the cell to be considered present.
resolution – If the layer is a vector, optionally use this as the resolution of the data grid.
bbox – If the layer is a vector, optionally use this bounding box for the data grid.
nodata – If the layer is a vector, optionally use this as the data grid nodata value.
attribute_name – If the layer is a vector, use this field to determine presence.
- Returns
A list of column headers for the newly encoded columns.
- Return type
list of str
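A sketch of the presence/absence rule, assuming `min_coverage` is a percentage (0 to 100) of all cells in the window and that the NODATA value lies outside the [`min_presence`, `max_presence`] range, so NODATA cells count as absent. `presence_absence` is a hypothetical helper, not part of lmpy.

```python
def presence_absence(window, min_presence, max_presence, min_coverage):
    """Return 1 if enough of `window` falls in the presence range, else 0."""
    values = [v for row in window for v in row]
    if not values:
        return 0
    present = sum(1 for v in values if min_presence <= v <= max_presence)
    # Integer-safe comparison of present / len(values) * 100 >= min_coverage.
    return 1 if present * 100 >= min_coverage * len(values) else 0


print(presence_absence([[1, 0], [1, 0]], 1, 1, 50))  # 1
print(presence_absence([[1, 0], [1, 0]], 1, 1, 75))  # 0
```

Half of the window lies in the presence range, so the cell is present at a 50 percent threshold but absent at 75 percent.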