Data IO (`erlab.io`)

Read & write ARPES data.

This module provides functions for loading various file types, such as HDF5 files, Igor Pro files, and ARPES data from different beamlines and laboratories.

Within a single session, it is common to use one loader for a single folder that contains all of the data. The module therefore provides a way to set a default loader for the session with the set_loader() function. The default data directory can be set in the same way with the set_data_dir() function.

For instructions on how to write a custom loader, see erlab.io.dataloader.

Examples

View all registered loaders:
```
>>> erlab.io.loaders
```

Load data by explicitly specifying the loader:

>>> dat = erlab.io.loaders["merlin"].load(...)

Set the default loader for the session:
```
>>> erlab.io.set_loader("merlin")
```

Learn more about loaders in the User Guide.

Modules

`plugins`	Data loading plugins.
`dataloader`	Base functionality for implementing data loaders.
`utils`	General-purpose I/O utilities.
`igor`	Backend for Igor Pro files.
`nexusutils`	Utilities for reading NeXus files into xarray objects.
`fitsutils`
`exampledata`	Generates simple simulated ARPES data for testing and demonstration.
`characterization`	Data import for characterization experiments.

Module Attributes

erlab.io.loaders: A global registry of all loaders registered in the session. The keys are the names of the loaders and the values are the loader objects.

See also

set_loader(), set_data_dir(), loader_context()

Functions

`load`(*[, single, combine, parallel, ...])	Load ARPES data.
`loader_context`(self[, loader, data_dir])	Context manager that temporarily sets the current loader and data directory.
`set_data_dir`(data_dir)	Set the default data directory for the current context.
`set_loader`(loader)	Set the current data loader for the current context.
`extend_loader`([average_attrs, ...])	Context manager that temporarily extends various loader attributes.
`summarize`(*[, cache, display, rc])	Summarize the data in the given directory.

erlab.io.extend_loader(coordinate_attrs=None, average_attrs=None, additional_attrs=None, overridden_attrs=None, additional_coords=None, overridden_coords=None)

Context manager that temporarily extends various loader attributes.

This context manager can be used to temporarily customize the behavior of the data loader. This is particularly useful when loading data across multiple files, where the coordinate_attrs can be extended so that the attributes in the data are promoted to coordinates and propagated when combining data across files.

For one-off loads, the same arguments can be passed to load or erlab.io.load() with the loader_extensions keyword. This keeps the extension settings attached to the load call, which is useful for generated loading code and ImageTool manager reload metadata.

Parameters:

name_map – Extends name_map.
coordinate_attrs (tuple[str, ...] | None, default: None) – Extends coordinate_attrs.
average_attrs (tuple[str, ...] | None, default: None) – Extends average_attrs.
additional_attrs (dict[str, str | float | Callable[[DataArray], str | float]] | None, default: None) – Extends additional_attrs.
overridden_attrs (tuple[str, ...] | None, default: None) – Extends overridden_attrs.
additional_coords (dict[str, str | float | Callable[[DataArray], str | float]] | None, default: None) – Extends additional_coords.
overridden_coords (tuple[str, ...] | None, default: None) – Extends overridden_coords.

Example

import erlab

erlab.io.set_loader("loader_name")

with erlab.io.extend_loader(coordinate_attrs=("scan_number",)):
    data = erlab.io.load("file_name")

data = erlab.io.load(
    "file_name",
    loader_extensions={"coordinate_attrs": ("scan_number",)},
)

See also

load: Load data with optional loader_extensions.
coordinate_attrs: The attribute that is temporarily extended.
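
To make "promoted to coordinates" concrete, here is a toy model in plain Python. It is not erlab's implementation and does not use xarray: each file is a hypothetical dict with `values`, `attrs`, and `coords`, and the names `promote_attrs` and `combine_files` are invented for this illustration only.

```python
from typing import Any

# Toy stand-in for one loaded file: NOT an xarray object and NOT erlab internals.
File = dict[str, Any]


def promote_attrs(file: File, coordinate_attrs: tuple[str, ...]) -> File:
    """Return a copy of `file` with the named attributes moved into coords."""
    attrs = dict(file["attrs"])
    coords = dict(file["coords"])
    for name in coordinate_attrs:
        if name in attrs:
            coords[name] = attrs.pop(name)
    return {"values": file["values"], "attrs": attrs, "coords": coords}


def combine_files(files: list[File], along: str) -> File:
    """Stack files along a promoted coordinate, keeping one entry per file."""
    ordered = sorted(files, key=lambda f: f["coords"][along])
    return {
        "values": [f["values"] for f in ordered],
        # Plain attributes are taken from the first file only in this toy model.
        "attrs": dict(ordered[0]["attrs"]),
        "coords": {along: [f["coords"][along] for f in ordered]},
    }


files = [
    {"values": [3.0, 4.0], "attrs": {"scan_number": 2, "sample": "A"}, "coords": {}},
    {"values": [1.0, 2.0], "attrs": {"scan_number": 1, "sample": "A"}, "coords": {}},
]
promoted = [promote_attrs(f, ("scan_number",)) for f in files]
combined = combine_files(promoted, along="scan_number")
print(combined["coords"])  # {'scan_number': [1, 2]}
```

In this model, an attribute that stays in `attrs` survives only from the first file, while a promoted one keeps a value for every file. That is the motivation for extending coordinate_attrs before a multi-file load.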

erlab.io.load(data_dir=None, *, single=False, combine=True, parallel=False, progress=True, load_kwargs=None, loader_extensions=None, **kwargs)

Load ARPES data.

This method is the main entry point for loading ARPES data.

Note

This method is not meant to be overridden in subclasses.

Parameters:

identifier –
Value that identifies a scan uniquely.
- If a string or path-like object is given, it is assumed to be the path to the data file relative to data_dir. If data_dir is not specified, identifier is assumed to be the full path to the data file.
- If an integer is given, it is assumed to be a number that specifies the scan number, and is used to automatically determine the path to the data file(s). In this case, the data_dir argument must be specified.
data_dir (str | PathLike | None, default: None) –
Where to look for the data. Must be a path to a valid directory. This argument is required when identifier is an integer.

When called as erlab.io.load(), this argument defaults to the value set by erlab.io.set_data_dir() or erlab.io.loader_context().
chunks – Chunking strategy for loading data with dask for supported loaders.
single (bool, default: False) –
This argument is only used when always_single is False and identifier is given as a string or path-like object.

If identifier points to a file that is part of a multiple-file scan, the default behavior (single is False) is to return data from all files in the same scan. How the data is combined is determined by the combine argument. If single is True, only the data from the given file is returned.
combine (bool, default: True) –
Whether to attempt to combine multiple files into a single data object. If False, a list of data is returned. If True, the loader tries to combine the data into a single data object and return it. Depending on the type of each data object, the returned object can be an xarray.DataArray, an xarray.Dataset, or an xarray.DataTree.

This argument is only used when single is False.
parallel (bool, default: False) –
Whether to load multiple files in parallel using dask. For possible values, see load_multiple_parallel.

This argument is only used when single is False.
progress (bool, default: True) –
Whether to show a progress bar when loading multiple files.

This argument is only used when single is False.
load_kwargs (dict[str, Any] | None, default: None) – Additional keyword arguments to be passed to load_single. You can also pass additional keyword arguments directly to load, and they will be dispatched to either identify or load_single based on their signatures. See the **kwargs argument for details.
loader_extensions (Mapping[str, Any] | None, default: None) – Temporary extensions to loader attributes, with the same keys accepted by extend_loader.
**kwargs – Additional keyword arguments are passed to identify and load_single based on their signatures. If a keyword argument is accepted by both methods, it is passed to identify. Use the load_kwargs argument to pass an ambiguous keyword argument to load_single.
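
The signature-based routing of `**kwargs` can be sketched with `inspect.signature`. This is a minimal illustration of the rule as described, not erlab's actual code: the `identify` and `load_single` functions below are hypothetical stand-ins with made-up parameters, `split_kwargs` is an invented helper, and rejecting unknown keywords is an assumption of the sketch.

```python
import inspect
from typing import Any, Callable


def identify(num: int, data_dir: str, prefix: str = "", strict: bool = True) -> str:
    """Hypothetical stand-in for a loader's `identify` method."""
    return f"{data_dir}/{prefix}{num:04d}.h5"


def load_single(path: str, strict: bool = True, without_values: bool = False) -> dict:
    """Hypothetical stand-in for a loader's `load_single` method."""
    return {"path": path, "strict": strict, "without_values": without_values}


def split_kwargs(
    kwargs: dict[str, Any], first: Callable[..., Any], second: Callable[..., Any]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Route each keyword to `first` if it accepts it, otherwise to `second`."""
    first_params = inspect.signature(first).parameters
    second_params = inspect.signature(second).parameters
    for_first: dict[str, Any] = {}
    for_second: dict[str, Any] = {}
    for key, value in kwargs.items():
        if key in first_params:
            for_first[key] = value  # keywords accepted by both also land here
        elif key in second_params:
            for_second[key] = value
        else:
            # Assumption of this sketch: unknown keywords are rejected.
            raise TypeError(f"unexpected keyword argument {key!r}")
    return for_first, for_second


extra = {"prefix": "f_", "strict": False, "without_values": True}
identify_kwargs, single_kwargs = split_kwargs(extra, identify, load_single)

# An ambiguous keyword reaches load_single only through an explicit mapping,
# which plays the role of the load_kwargs argument.
load_kwargs = {"strict": False}
single_kwargs = {**single_kwargs, **load_kwargs}

path = identify(7, "/data", **identify_kwargs)
result = load_single(path, **single_kwargs)
print(identify_kwargs)  # {'prefix': 'f_', 'strict': False}
print(result)
```

Here `strict` is accepted by both stand-ins, so the routed copy goes to `identify`, and only the explicit mapping delivers it to `load_single`.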

Returns:

xarray.DataArray or xarray.Dataset or xarray.DataTree – The loaded data.

Notes

The data_dir set by erlab.io.set_data_dir() or erlab.io.loader_context() is only used when called as erlab.io.load(). When called directly on a loader instance, the data_dir argument must be specified.
For convenience, the data_dir set by erlab.io.set_data_dir() or erlab.io.loader_context() is silently ignored when all of the following are satisfied:
- identifier is an absolute path to an existing file.
- data_dir is not explicitly provided.
- The path created by joining data_dir and identifier does not point to an existing file.
This way, absolute file paths can be passed directly to the loader without changing the default data directory. For instance, consider the following directory structure.
```
cwd/
├── data/
└── example.txt
```
The following code will load ./example.txt instead of raising an error that ./data/example.txt is missing:
```
import erlab

erlab.io.set_data_dir("data")

erlab.io.load("example.txt")
```
However, if ./data/example.txt also exists, the same code will load that file instead and warn about the ambiguity. This behavior may lead to unexpected results when the directory structure is disorganized. Keep this in mind and try to keep all data files at the same level.
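
The lookup order described in these notes can be sketched with `pathlib`. This is an illustration under stated assumptions, not the library's implementation: `resolve_identifier` is a hypothetical helper, it only models the case where data_dir is not passed explicitly, and it takes the working directory as a `cwd` argument so that the example does not depend on where it is run. It checks whether the identifier exists as given, which covers both absolute paths and paths relative to the working directory.

```python
import tempfile
import warnings
from pathlib import Path


def resolve_identifier(identifier: str, default_data_dir: str, cwd: Path) -> Path:
    """Choose between `<data_dir>/<identifier>` and `<identifier>` as given."""
    as_given = cwd / identifier
    in_data_dir = cwd / default_data_dir / identifier
    if in_data_dir.is_file():
        if as_given.is_file() and as_given != in_data_dir:
            # Both candidates exist: prefer the data directory, but say so.
            warnings.warn(f"{identifier!r} is ambiguous; using {in_data_dir}")
        return in_data_dir
    if as_given.is_file():
        # The default data directory is silently ignored in this case.
        return as_given
    raise FileNotFoundError(in_data_dir)


with tempfile.TemporaryDirectory() as tmp:
    cwd = Path(tmp)
    (cwd / "data").mkdir()
    (cwd / "example.txt").write_text("outside data/")

    # Only ./example.txt exists, so it is chosen despite the default directory.
    first = resolve_identifier("example.txt", "data", cwd)
    first_rel = first.relative_to(cwd).as_posix()

    # Once ./data/example.txt exists too, it wins and a warning is issued.
    (cwd / "data" / "example.txt").write_text("inside data/")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        second = resolve_identifier("example.txt", "data", cwd)
    second_rel = second.relative_to(cwd).as_posix()

print(first_rel)   # example.txt
print(second_rel)  # data/example.txt
```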

erlab.io.load_hdf5(filename, **kwargs)

Load data from an HDF5 file saved with save_as_hdf5.

This is a thin wrapper around xarray.load_dataarray and xarray.load_dataset.

Deprecated since version 3.14.0: Use xarray.load_dataarray or xarray.load_dataset directly.

Parameters:

filename (str | PathLike) – The path to the HDF5 file.
**kwargs – Extra arguments to xarray.load_dataarray or xarray.load_dataset.

Returns:

xarray.DataArray or xarray.Dataset – The loaded data.

Return type:

DataArray | Dataset

erlab.io.loader_context(self, loader=None, data_dir=None)

Context manager that temporarily sets the current loader and data directory.

Parameters:

loader (str, optional) – The name or alias of the loader to use in the context.
data_dir (str or os.PathLike, optional) – The data directory to use in the context.

Examples

Load data within a context manager:

>>> with erlab.io.loader_context("merlin"):
...     dat_merlin = erlab.io.load(...)

Load data with different loaders and directories:

>>> erlab.io.set_loader("ssrl52")
>>> erlab.io.set_data_dir("/path/to/dir1")
>>> dat_ssrl_1 = erlab.io.load(...)
>>> with erlab.io.loader_context("merlin", data_dir="/path/to/dir2"):
...     dat_merlin = erlab.io.load(...)
>>> dat_ssrl_2 = erlab.io.load(...)
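
The "temporarily sets" behavior, where the previous loader and data directory come back when the block exits, can be sketched with `contextlib.contextmanager`. This is a toy model only: `_state`, `toy_set_loader`, `toy_set_data_dir`, and `toy_loader_context` are hypothetical names, and how erlab stores its session state is not assumed here.

```python
from contextlib import contextmanager
from typing import Iterator, Optional

# Hypothetical session state for this sketch; erlab manages its own internally.
_state: dict[str, Optional[str]] = {"loader": None, "data_dir": None}


def toy_set_loader(loader: Optional[str]) -> None:
    _state["loader"] = loader


def toy_set_data_dir(data_dir: Optional[str]) -> None:
    _state["data_dir"] = data_dir


@contextmanager
def toy_loader_context(
    loader: Optional[str] = None, data_dir: Optional[str] = None
) -> Iterator[None]:
    """Override the loader and/or data directory, then restore the old values."""
    previous = dict(_state)
    if loader is not None:
        _state["loader"] = loader
    if data_dir is not None:
        _state["data_dir"] = data_dir
    try:
        yield
    finally:
        # Restore even if the body raised an exception.
        _state.update(previous)


toy_set_loader("ssrl52")
toy_set_data_dir("/path/to/dir1")

with toy_loader_context("merlin", data_dir="/path/to/dir2"):
    inside = dict(_state)

after = dict(_state)
print(inside)  # {'loader': 'merlin', 'data_dir': '/path/to/dir2'}
print(after)   # {'loader': 'ssrl52', 'data_dir': '/path/to/dir1'}
```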

erlab.io.open_hdf5(filename, **kwargs)

Open data from an HDF5 file saved with save_as_hdf5.

This is a thin wrapper around xarray.open_dataarray and xarray.open_dataset.

Deprecated since version 3.14.0: Use xarray.open_dataarray or xarray.open_dataset directly.

Parameters:

filename (str | PathLike) – The path to the HDF5 file.
**kwargs – Extra arguments to xarray.open_dataarray or xarray.open_dataset.

Returns:

xarray.DataArray or xarray.Dataset – The opened data.

Return type:

DataArray | Dataset

erlab.io.save_as_hdf5(data, filename, igor_compat=True, **kwargs)

Save data in HDF5 format.

Deprecated since version 3.14.0: Use xarray.DataArray.to_netcdf or xarray.Dataset.to_netcdf directly. To save data in a format compatible with Igor, use erlab.io.igor.save_wave().

Parameters:

data (DataArray | Dataset) – The xarray.DataArray or xarray.Dataset to save.
filename (str | PathLike) – Target file name.
igor_compat (bool, default: True) – (Experimental) Make the resulting file compatible with Igor’s HDF5OpenFile for DataArrays with up to 4 dimensions. A convenient Igor procedure is included in the repository.
**kwargs – Extra arguments to xarray.DataArray.to_netcdf: refer to the xarray documentation for a list of all possible arguments.

erlab.io.save_as_netcdf(data, filename, **kwargs)

Save data in netCDF4 format.

Deprecated since version 3.14.0: Use xarray.DataArray.to_netcdf or xarray.Dataset.to_netcdf directly.

Discards invalid netCDF4 attributes and produces a warning.

Parameters:

data (DataArray) – xarray.DataArray to save.
filename (str | PathLike) – Target file name.
**kwargs – Extra arguments to xarray.DataArray.to_netcdf: refer to the xarray documentation for a list of all possible arguments.

erlab.io.set_data_dir(data_dir)

Set the default data directory for the current context.

All subsequent calls to erlab.io.load() will use the provided data_dir unless a different data_dir is passed explicitly.

Parameters:

data_dir (str | PathLike | None) – The default data directory to use.

Note

This will only affect erlab.io.load(). If the loader’s load method is called directly, it will not use the default data directory.

erlab.io.set_loader(loader)

Set the current data loader for the current context.

All subsequent calls to load will use the provided loader.

Parameters:

loader (str | LoaderBase | None) – The loader to set. It can be either a string representing the name or alias of the loader, or a valid loader class.

Example

>>> erlab.io.set_loader("merlin")
>>> dat_merlin_1 = erlab.io.load(...)
>>> dat_merlin_2 = erlab.io.load(...)

erlab.io.summarize(exclude=None, *, cache=True, display=True, rc=None)

Summarize the data in the given directory.

Note

This method is not meant to be overridden in subclasses.

Takes a path to a directory and summarizes the data in the directory to a table, much like a log file. This is useful for quickly inspecting the contents of a directory.

The dataframe is formatted using the style from get_styler and displayed in the IPython shell. Results are cached in a pickle file in the directory.

Parameters:

data_dir – Directory to summarize.
exclude (default: None) – A string or sequence of strings specifying glob patterns for files to be excluded from the summary. If provided, caching will be disabled.
cache (default: True) – Whether to use caching for the summary.
display (default: True) – Whether to display the formatted dataframe using the IPython shell. If False, the dataframe will be returned without formatting. If True but the IPython shell is not detected, the dataframe styler will be returned.
rc (default: None) – Optional dictionary of matplotlib rcParams to override the default for the plot in the interactive summary. Plot options such as the figure size and colormap can be changed using this argument.

Returns:

pandas.DataFrame or pandas.io.formats.style.Styler or None – Summary of the data in the directory.

If display is False, the summary DataFrame is returned.
If display is True and the IPython shell is detected, the summary will be displayed, and None will be returned.
- If ipywidgets is installed, an interactive widget will be returned instead of None.
If display is True but the IPython shell is not detected, the styler for the summary DataFrame will be returned.
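
The return-value rules above amount to a small decision table. The sketch below only restates them; `summarize_result_kind` is a hypothetical helper that returns a label for each case and does not call erlab, pandas, or IPython.

```python
def summarize_result_kind(display: bool, in_ipython: bool, has_ipywidgets: bool) -> str:
    """Label the kind of object described for each combination of conditions."""
    if not display:
        return "DataFrame"  # unformatted summary table
    if not in_ipython:
        return "Styler"  # formatted, but nothing to display it in
    # Displayed in the IPython shell; a widget is returned only with ipywidgets.
    return "widget" if has_ipywidgets else "None"


cases = [
    (False, False, False),
    (True, False, False),
    (True, True, False),
    (True, True, True),
]
kinds = [summarize_result_kind(*case) for case in cases]
for case, kind in zip(cases, kinds):
    print(case, "->", kind)
```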

Return type:

pandas.DataFrame | pandas.io.formats.style.Styler | None
