erlab.utils.hashing
Utilities for hashing xarray DataArrays.
Functions
fingerprint_dataarray | Fast, approximate hash including data, coordinates, and attributes.
- erlab.utils.hashing.fingerprint_dataarray(darr, *, sample_max=4096, blocks=3, coord_sample_max=1024)
Fast, approximate hash including data, coordinates, and attributes.
This function computes a hash string for an xarray DataArray that incorporates its data, coordinates, and attributes. The hash usually changes when any of these components changes. Because large arrays are hashed from a sample rather than in full, a change confined to unsampled elements can go undetected; sampling trades some accuracy for speed.
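The block-sampling idea can be illustrated with a small standard-library sketch. This is not erlab's implementation: the helper name `sampled_fingerprint`, the evenly spaced block placement, and the choice of BLAKE2b are all assumptions made for illustration. Only the parameter names `sample_max` and `blocks` are borrowed from the signature above.

```python
import hashlib
import struct


def sampled_fingerprint(values, sample_max=4096, blocks=3):
    """Hash a flat list of floats, sampling evenly spaced blocks when it is large.

    Hypothetical illustration only; not erlab's implementation.
    """
    n = len(values)
    if n <= sample_max:
        sampled = list(values)  # small input: hash every element
    else:
        block_len = max(1, sample_max // blocks)
        if blocks == 1:
            starts = [0]
        else:
            # Spread block start offsets from the beginning to the end of the data.
            starts = [i * (n - block_len) // (blocks - 1) for i in range(blocks)]
        sampled = [v for s in starts for v in values[s:s + block_len]]
    h = hashlib.blake2b(digest_size=16)
    h.update(struct.pack("<q", n))  # include the length so a size change is detected
    h.update(struct.pack(f"<{len(sampled)}d", *sampled))
    return h.hexdigest()


data = [float(i) for i in range(10_000)]
base = sampled_fingerprint(data)

in_block = list(data)
in_block[0] = -1.0  # index 0 lies inside the first sampled block
print(sampled_fingerprint(in_block) != base)

between_blocks = list(data)
between_blocks[2000] = -1.0  # index 2000 falls between sampled blocks
print(sampled_fingerprint(between_blocks) == base)
```

In this sketch an edit inside a sampled block changes the fingerprint, while an edit between blocks does not. That is the sense in which such a hash is approximate, and why sampling more blocks raises the chance of catching a change.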
- Parameters:
darr (DataArray) – The xarray.DataArray to calculate the fingerprint for.
sample_max (int, optional) – Maximum number of data elements to sample for hashing. If the total number of elements in the DataArray exceeds this value, a subset of the data will be sampled. Default is 4096.
blocks (int, optional) – Number of blocks to sample from the data when sampling is needed. More blocks increase the chance of detecting changes in the data but also increase computation time. Default is 3, which provides a good balance for most cases.
coord_sample_max (int, optional) – Maximum number of elements to sample from each coordinate array for hashing. Default is 1024.
- Returns:
str – A string representing the fingerprint of the DataArray.
- Return type:
str
Note
Different Python processes will produce different fingerprints for the same data due to the use of the built-in hash(). Use only for comparisons within a single process.
This function is not cryptographically secure and should not be used for security purposes.
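The process dependence comes from Python's hash randomization: the built-in `hash()` of a string is salted with a per-process seed. The sketch below shows this by computing `hash()` in fresh interpreters with different `PYTHONHASHSEED` values. The helper name `builtin_hash_in_new_process` is hypothetical, and the seeds are fixed explicitly only to make the demonstration repeatable; by default each process picks its own random seed.

```python
import os
import subprocess
import sys


def builtin_hash_in_new_process(text, seed):
    """Return hash(text) as computed by a fresh interpreter using the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same string, two interpreters with different seeds: the hash values differ.
first = builtin_hash_in_new_process("coords", seed=1)
second = builtin_hash_in_new_process("coords", seed=2)
print(first != second)
```

Any value derived from `hash()` on strings inherits this behavior, so fingerprints written to disk by one process should not be compared with those computed by another.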