lmpy.data_preparation.layer_encoder

This module contains a class for encoding spatial layers into a Matrix.

The ‘LayerEncoder’ class uses a shapegrid to generate a base matrix structure and then each layer is encoded as a new column (or columns) for the resulting encoded matrix.

Note

The data array is oriented so that its origin is at the top left (min x, max y).

Module Contents

Classes

LayerEncoder

Constructor for the layer encoder.

class lmpy.data_preparation.layer_encoder.LayerEncoder(shapegrid_filename)[source]

Constructor for the layer encoder.

Parameters

shapegrid_filename (str) – A file path for the shapegrid.

_encode_layer(self, window_func, encode_func, column_name, num_columns=1)[source]

Encodes the layer using the provided encoding function.

Parameters
  • window_func – A function that returns a window of array data for a provided x, y pair.

  • encode_func – A function that encodes a window of array data.

  • column_name – The header name to use for the column in the encoded matrix.

  • num_columns – The number of columns that will be encoded by ‘encode_func’. This can be greater than one if, for example, we are testing multiple biogeographic hypotheses in a single vector layer.

Returns

A list of column headers for the newly encoded columns.

Return type

list
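The way ‘window_func’ and ‘encode_func’ work together can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name ‘encode_cells’, the ‘centroids’ argument, and the ‘name-i’ header scheme for multi-column encodings are all assumptions made for the example.

```python
def encode_cells(centroids, window_func, encode_func, column_name, num_columns=1):
    """Apply encode_func to the data window at each cell centroid (hypothetical helper)."""
    if num_columns == 1:
        headers = [column_name]
    else:
        # Assumed naming scheme for multi-column encodings
        headers = ['{}-{}'.format(column_name, i) for i in range(num_columns)]
    rows = []
    for x, y in centroids:
        encoded = encode_func(window_func(x, y))
        # Normalize scalar results to a one-element row
        if not isinstance(encoded, (list, tuple)):
            encoded = [encoded]
        rows.append(list(encoded))
    return headers, rows


# Toy stand-ins: the "window" is just the coordinate pair, the encoding is its sum
headers, rows = encode_cells(
    [(0.5, 0.5), (1.5, 0.5)], lambda x, y: [x, y], sum, 'total')
```

Each entry in ‘rows’ corresponds to one shapegrid cell, and ‘headers’ plays the role of the returned list of column headers.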

static _get_window_function(data, layer_bbox, cell_size, num_cell_sides=4)[source]

Gets a windowing function for the data.

This function generates a function that will return a “window” of array data for a given (x, y) pair.

Parameters
  • data (np.ndarray) – A numpy array with data for a layer

  • layer_bbox (tuple) – The bounding box of the layer in the map units of the layer.

  • cell_size (float or tuple) – Either a single value or a tuple with two values. If it is a single value, it will be used for both x and y cell sizes. If a tuple is provided, the first value will be used for the size of each cell in the x dimension and the second will be used for the size of the cell in the y dimension.

  • num_cell_sides (int) – The number of sides each shapegrid cell has:

      • 4 – square

      • 6 – hexagon

Note

The origin (0, 0) of the data array should represent (min x, max y) for the layer.

Raises

NotImplementedError – Raised if the number of cell sides does not equal 4.

Returns

A function for processing a window of data.

Return type

Method
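A windowing function of this kind could look roughly like the sketch below. It uses plain nested lists instead of a numpy array so that it runs without dependencies. It assumes that ‘layer_bbox’ is ordered (min x, min y, max x, max y) and that each window is centered on the given (x, y) pair. Neither detail is confirmed by this page, and the name ‘get_window_function’ is only illustrative.

```python
import math


def get_window_function(data, layer_bbox, cell_size, num_cell_sides=4):
    """Build a function returning the data under a square cell centered at (x, y)."""
    if num_cell_sides != 4:
        raise NotImplementedError('Only square cells are sketched here')
    # Assumed bounding box order: (min x, min y, max x, max y)
    min_x, min_y, max_x, max_y = layer_bbox
    # A single cell size applies to both axes; a pair is (x size, y size)
    if isinstance(cell_size, (tuple, list)):
        x_size, y_size = cell_size
    else:
        x_size = y_size = cell_size
    num_rows, num_cols = len(data), len(data[0])
    # Map units covered by one array element along each axis
    x_res = (max_x - min_x) / num_cols
    y_res = (max_y - min_y) / num_rows

    def window_func(x, y):
        # Columns increase with x; rows increase as y decreases (row 0 is max y)
        col_start = max(0, math.floor((x - x_size / 2 - min_x) / x_res))
        col_end = min(num_cols, math.ceil((x + x_size / 2 - min_x) / x_res))
        row_start = max(0, math.floor((max_y - (y + y_size / 2)) / y_res))
        row_end = min(num_rows, math.ceil((max_y - (y - y_size / 2)) / y_res))
        return [row[col_start:col_end] for row in data[row_start:row_end]]

    return window_func


grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
window_func = get_window_function(grid, (0, 0, 4, 4), 2)
top_left = window_func(1, 3)      # cell covering x 0..2, y 2..4
bottom_right = window_func(3, 1)  # cell covering x 2..4, y 0..2
```

Because row 0 of the array corresponds to max y, the window for a point near the top of the bounding box comes from the first rows of the data.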

_read_layer(self, layer_filename, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_field=None)[source]

Reads a layer for processing.

Parameters
  • layer_filename (str) – The file path for the layer to read.

  • resolution (numeric) – An optional resolution to use for the input data if it is a vector layer.

  • bbox (tuple) – An optional bounding box in the form (min x, min y, max x, max y) to use if the layer is a vector layer.

  • nodata (numeric) – An optional nodata value to use if the layer is a vector layer.

  • attribute_field (str) – If provided, use this shapefile attribute as the data value for a vector layer.

Returns

A tuple containing a window function for returning a portion of the numpy array generated by the layer and the NODATA value to use with this layer.

Return type

tuple

_read_raster_layer(self, raster_filename)[source]

Reads a raster layer for processing.

Parameters

raster_filename – The file path for the raster layer.

Returns

A tuple containing a window function for returning a portion of the numpy array generated by the layer and the NODATA value to use with this layer.

Return type

tuple

_read_shapegrid(self, shapegrid_filename)[source]

Read the shapegrid.

Parameters

shapegrid_filename – The file location of the shapegrid.

_read_vector_layer(self, vector_filename, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_field=None)[source]

Reads a vector layer for processing.

Parameters
  • vector_filename – The vector file path for the layer to read.

  • resolution (numeric) – An optional resolution to use for the input data if it is a vector layer.

  • bbox (tuple) – An optional bounding box in the form (min x, min y, max x, max y) to use if the layer is a vector layer.

  • nodata (numeric) – An optional nodata value to use if the layer is a vector layer.

  • attribute_field (str) – If provided, use this shapefile attribute as the data value for a vector layer.

Returns

A tuple containing a window function for returning a portion of the numpy array generated by the layer, the NODATA value to use with this layer, and a set of distinct attributes to be used for processing.

Return type

tuple

encode_biogeographic_hypothesis(self, layer_filename, column_name, min_coverage, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_field=None)[source]

Encodes a biogeographic hypothesis layer.

Encodes a biogeographic hypothesis layer by creating a Helmert contrast column in the encoded matrix.

Parameters
  • layer_filename – The file location of the layer to encode.

  • column_name – What to name this column in the encoded matrix.

  • min_coverage – The minimum percentage of each data window that must be covered.

  • resolution – If the layer is a vector, optionally use this as the resolution of the data grid.

  • bbox – If the layer is a vector, optionally use this bounding box for the data grid.

  • nodata – If the layer is a vector, optionally use this as the data grid nodata value.

  • attribute_field – If the layer is a vector and contains multiple hypotheses, use this field to separate the vector file.

Returns

A list of column headers for the newly encoded columns.

Return type

list of str
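As a rough illustration of what a contrast column can look like, the sketch below codes each data window as -1, 1, or 0 depending on which side of a two-sided hypothesis dominates it. The -1/0/1 coding, the treatment of ‘min_coverage’ as a fraction between 0 and 1, the function name, and the nodata value of -9999 are all assumptions for this example and may differ from what the library does.

```python
from collections import Counter

NODATA = -9999  # hypothetical stand-in for DEFAULT_NODATA


def encode_hypothesis_window(window, side_values, min_coverage, nodata=NODATA):
    """Return -1, 1, or 0 for one two-sided hypothesis (illustrative coding only)."""
    cells = [value for row in window for value in row]
    counts = Counter(value for value in cells if value != nodata)
    side_a, side_b = side_values
    count_a, count_b = counts[side_a], counts[side_b]
    # Neither side covers enough of the window to count
    if max(count_a, count_b) < min_coverage * len(cells):
        return 0
    if count_a > count_b:
        return -1
    if count_b > count_a:
        return 1
    # Ties contribute nothing to the contrast
    return 0


mostly_a = encode_hypothesis_window([[1, 1], [1, 2]], (1, 2), 0.5)
mostly_b = encode_hypothesis_window([[2, 2], [NODATA, 2]], (1, 2), 0.5)
too_sparse = encode_hypothesis_window([[NODATA, NODATA], [NODATA, 1]], (1, 2), 0.5)
```

A vector layer holding several hypotheses would produce one such value per hypothesis for each cell, which is why this method can return more than one column header.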

encode_largest_class(self, layer_filename, column_name, min_coverage, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_name=None)[source]

Encodes a layer based on the largest class in each data window.

Parameters
  • layer_filename – The file location of the layer to encode.

  • column_name – What to name this column in the encoded matrix.

  • min_coverage – The minimum percentage of each data window that must be covered by the largest class.

  • resolution – If the layer is a vector, optionally use this as the resolution of the data grid.

  • bbox – If the layer is a vector, optionally use this bounding box for the data grid.

  • nodata – If the layer is a vector, optionally use this as the data grid nodata value.

  • attribute_name – If the layer is a vector, use this field to determine the largest class.

Returns

A list of column headers for the newly encoded columns.

Return type

list of str
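The per-window logic can be sketched as below: count the values in a window, take the most common one, and keep it only if it covers enough of the window. This is a sketch under stated assumptions rather than the library's code. ‘min_coverage’ is treated as a fraction between 0 and 1, and the function name and the nodata value of -9999 are hypothetical.

```python
from collections import Counter

NODATA = -9999  # hypothetical stand-in for DEFAULT_NODATA


def encode_largest_class_window(window, min_coverage, nodata=NODATA):
    """Return the most common value in the window, or nodata if coverage is too low."""
    cells = [value for row in window for value in row]
    counts = Counter(value for value in cells if value != nodata)
    if not counts:
        return nodata
    largest, count = counts.most_common(1)[0]
    # Coverage is measured against every cell in the window, nodata included
    if count / len(cells) >= min_coverage:
        return largest
    return nodata


dominant = encode_largest_class_window([[3, 3], [3, 7]], 0.5)
fragmented = encode_largest_class_window([[3, 7], [5, NODATA]], 0.5)
```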

encode_mean_value(self, layer_filename, column_name, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_name=None)[source]

Encodes a layer based on the mean value for each data window.

Parameters
  • layer_filename – The file location of the layer to encode.

  • column_name – What to name this column in the encoded matrix.

  • resolution – If the layer is a vector, optionally use this as the resolution of the data grid.

  • bbox – If the layer is a vector, optionally use this bounding box for the data grid.

  • nodata – If the layer is a vector, optionally use this as the data grid nodata value.

  • attribute_name – If the layer is a vector, use this field to determine value.

Returns

A list of column headers for the newly encoded columns.

Return type

list of str
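For a single data window, a mean-value encoding might be computed as in the sketch below. It assumes that nodata cells are excluded from the mean and that a window with no valid cells is encoded as nodata. Both choices, along with the function name and the nodata value of -9999, are assumptions for illustration.

```python
NODATA = -9999  # hypothetical stand-in for DEFAULT_NODATA


def encode_mean_window(window, nodata=NODATA):
    """Return the mean of the valid values in the window, or nodata if there are none."""
    values = [value for row in window for value in row if value != nodata]
    if not values:
        return nodata
    return sum(values) / len(values)


mean_value = encode_mean_window([[1.0, 3.0], [NODATA, 5.0]])
empty_value = encode_mean_window([[NODATA, NODATA]])
```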

encode_presence_absence(self, layer_filename, column_name, min_presence, max_presence, min_coverage, resolution=None, bbox=None, nodata=DEFAULT_NODATA, attribute_name=None)[source]

Encodes a distribution layer into a presence-absence column.

Parameters
  • layer_filename – The file location of the layer to encode.

  • column_name – What to name this column in the encoded matrix.

  • min_presence – The minimum value that should be treated as presence.

  • max_presence – The maximum value to be considered as present.

  • min_coverage – The minimum percentage of each data window that must be present to consider that cell present.

  • resolution – If the layer is a vector, optionally use this as the resolution of the data grid.

  • bbox – If the layer is a vector, optionally use this bounding box for the data grid.

  • nodata – If the layer is a vector, optionally use this as the data grid nodata value.

  • attribute_name – If the layer is a vector, use this field to determine presence.

Returns

A list of column headers for the newly encoded columns.

Return type

list of str
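The parameters above combine roughly as in the following sketch, which marks a cell as present when enough of its data window falls inside the presence range. It treats ‘min_coverage’ as a fraction between 0 and 1 and treats both presence bounds as inclusive. These are assumptions, as are the function name and the nodata value of -9999.

```python
NODATA = -9999  # hypothetical stand-in for DEFAULT_NODATA


def encode_presence_absence_window(
        window, min_presence, max_presence, min_coverage, nodata=NODATA):
    """Return 1 if enough of the window lies within the presence range, else 0."""
    cells = [value for row in window for value in row]
    if not cells:
        return 0
    # Count cells that hold data and fall inside the inclusive presence range
    num_present = sum(
        1 for value in cells
        if value != nodata and min_presence <= value <= max_presence)
    return 1 if num_present / len(cells) >= min_coverage else 0


half_covered = encode_presence_absence_window([[10, 60], [80, NODATA]], 50, 100, 0.5)
strict = encode_presence_absence_window([[10, 60], [80, NODATA]], 50, 100, 0.75)
```

Here two of the four cells (60 and 80) fall within the presence range, so the window counts as present at a 50% coverage threshold but not at 75%.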

get_encoded_matrix(self)[source]

Returns the encoded matrix.

Returns

The encoded matrix as a Matrix object.

Return type

Matrix

get_geojson(self)[source]

Formats the encoded matrix as GeoJSON.

Returns

A JSON dictionary for the encoded matrix.

Return type

dict