Data IO (erlab.io)

Read & write ARPES data.

This module provides functions for loading various data files, including HDF5 files, Igor Pro files, and ARPES data from different beamlines and laboratories.

Modules

plugins

Data loading plugins.

dataloader

Base functionality for implementing data loaders.

utilities

Utility functions for data input/output.

igor

Functions for reading Igor Pro files.

exampledata

Generates simple simulated ARPES data for testing purposes.

characterization

Data import for characterization experiments.

Within a single session, it is common to use one type of loader for a single folder containing all your data. Hence, the module provides a way to set a default loader for the session using the set_loader() function. The same can be done for the data directory using the set_data_dir() function.

For instructions on how to write a custom loader, see erlab.io.dataloader.

Examples

  • View all registered loaders:

    >>> erlab.io.loaders
    
  • Load data by explicitly specifying the loader:

    >>> dat = erlab.io.loaders["merlin"].load(...)
    
erlab.io.load(identifier, data_dir=None, single=False, **kwargs)[source]

Load ARPES data.

Parameters:
  • identifier – Value that identifies a scan uniquely. If a string or path-like object is given, it is assumed to be the path to the data file. If an integer is given, it is assumed to be the scan number and is used to automatically determine the path to the data file(s).

  • data_dir (str | os.PathLike | None) – Where to look for the data. If None, the default data directory will be used.

  • single – For some setups, data from a single scan is saved over multiple files. This argument is only used for such setups. When identifier resolves to a single file within a multi-file scan and single is False (the default), a single concatenated array containing data from all files in the scan is returned. If single is True, only the data from the given file is returned. This argument is ignored when identifier is a number.

  • **kwargs – Additional keyword arguments are passed to identify.

Returns:

The loaded data.

Return type:

xarray.DataArray or xarray.Dataset or list of xarray.DataArray
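As a sketch of how an integer identifier might be resolved to file paths, consider the following stand-alone helper. The naming pattern and function name here are hypothetical, for illustration only, and do not reflect erlab's actual file-naming scheme:

```python
from pathlib import Path

def resolve_identifier(identifier, data_dir, pattern="scan_{num:04d}*"):
    """Resolve a scan identifier to a list of data file paths.

    A string or Path is treated as an explicit file path; an integer is
    treated as a scan number and matched against a (hypothetical) naming
    pattern inside ``data_dir``.
    """
    if isinstance(identifier, (str, Path)):
        return [Path(identifier)]
    # Integer scan number: glob for all files belonging to the scan,
    # sorted so multi-file scans concatenate in a stable order.
    matches = sorted(Path(data_dir).glob(pattern.format(num=identifier)))
    if not matches:
        raise FileNotFoundError(f"No files found for scan {identifier}")
    return matches
```

A multi-file scan would then resolve to every matching file, which is what makes the concatenated default behavior of single=False possible.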

erlab.io.load_experiment(filename, folder=None, *, prefix=None, ignore=None, recursive=False, **kwargs)[source]

Load waves from an Igor Pro experiment (.pxp) file.

Parameters:
  • filename (str | PathLike) – The experiment file.

  • folder (str | None) – Target folder within the experiment, given as a slash-separated string. If None, defaults to the root.

  • prefix (str | None) – If given, only include waves with names that start with the given string.

  • ignore (list[str] | None) – List of wave names to ignore.

  • recursive (bool) – If True, includes waves in child directories.

  • **kwargs – Extra arguments to load_wave().

Returns:

Dataset containing the waves.

Return type:

xarray.Dataset
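The interaction of the folder, prefix, ignore, and recursive options can be illustrated with a small stand-alone filter. The helper name and slash-separated path layout below are hypothetical, not erlab's internals:

```python
def select_waves(wave_paths, folder="", prefix=None, ignore=None, recursive=False):
    """Filter slash-separated wave paths the way the options above describe."""
    ignore = ignore or []
    selected = []
    for path in wave_paths:
        if folder:
            # Only consider waves inside the target folder.
            if not path.startswith(folder + "/"):
                continue
            rel = path[len(folder) + 1:]
        else:
            rel = path
        if not recursive and "/" in rel:
            continue  # wave lives in a child folder; skip unless recursive
        name = rel.rsplit("/", 1)[-1]
        if prefix is not None and not name.startswith(prefix):
            continue
        if name in ignore:
            continue
        selected.append(path)
    return selected
```

With recursive=False only waves directly inside the target folder survive; prefix and ignore then narrow the selection by name.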

erlab.io.load_hdf5(filename, **kwargs)[source]

Load data from an HDF5 file saved with save_as_hdf5.

This is a thin wrapper around xarray.load_dataarray and xarray.load_dataset.

Parameters:
  • filename (str | PathLike) – The path to the file to load.

  • **kwargs – Extra arguments passed to xarray.load_dataarray or xarray.load_dataset.

Returns:

The loaded data.

Return type:

xarray.DataArray or xarray.Dataset

erlab.io.load_wave(wave, data_dir=None)[source]

Load a wave from Igor binary format.

Parameters:
  • wave (dict | WaveRecord | str | PathLike) – The wave to load. It can be provided as a dictionary, an instance of igor2.record.WaveRecord, or a string representing the path to the wave file.

  • data_dir (str | PathLike | None) – The directory where the wave file is located. This parameter is only used if wave is a string or PathLike object. If None, wave must be a valid path.

Returns:

The loaded wave.

Return type:

xarray.DataArray

Raises:
  • ValueError – If the wave file cannot be found or loaded.

  • TypeError – If the wave argument is of an unsupported type.
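The accepted input types and the two error cases can be sketched with a small stand-alone helper. The function name is hypothetical and this is not the actual implementation:

```python
from pathlib import Path

def coerce_wave_source(wave):
    """Normalize the accepted input types for a wave, rejecting anything else."""
    if isinstance(wave, dict):
        return wave  # already-parsed wave record contents
    if isinstance(wave, (str, Path)):
        path = Path(wave)
        if not path.exists():
            # Mirrors the documented ValueError for a missing wave file.
            raise ValueError(f"Wave file not found: {path}")
        return path
    # Mirrors the documented TypeError for unsupported input types.
    raise TypeError(f"Unsupported wave type: {type(wave).__name__}")
```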

erlab.io.loader_context(loader=None, data_dir=None)[source]

Context manager for the current data loader and data directory.

Parameters:
  • loader (str, optional) – The name or alias of the loader to use in the context.

  • data_dir (str or os.PathLike, optional) – The data directory to use in the context.

Examples

  • Load data within a context manager:

    >>> with erlab.io.loader_context("merlin"):
    ...     dat_merlin = erlab.io.load(...)
    
  • Load data with different loaders and directories:

    >>> erlab.io.set_loader("ssrl52", data_dir="/path/to/dir1")
    >>> dat_ssrl_1 = erlab.io.load(...)
    >>> with erlab.io.loader_context("merlin", data_dir="/path/to/dir2"):
    ...     dat_merlin = erlab.io.load(...)
    >>> dat_ssrl_2 = erlab.io.load(...)
    
erlab.io.open_hdf5(filename, **kwargs)[source]

Open data from an HDF5 file saved with save_as_hdf5.

This is a thin wrapper around xarray.open_dataarray and xarray.open_dataset.

Parameters:
  • filename (str | PathLike) – The path to the file to open.

  • **kwargs – Extra arguments passed to xarray.open_dataarray or xarray.open_dataset.

Returns:

The opened data.

Return type:

xarray.DataArray or xarray.Dataset

erlab.io.save_as_hdf5(data, filename, igor_compat=True, **kwargs)[source]

Save data in HDF5 format.

Parameters:
  • data (DataArray) – The data to save.

  • filename (str | PathLike) – The target file name.

  • igor_compat (bool) – Whether to make the resulting file compatible with Igor Pro's HDF5 loader.

  • **kwargs – Extra arguments passed to the underlying xarray writer.

erlab.io.save_as_netcdf(data, filename, **kwargs)[source]

Save data in netCDF4 format.

Discards invalid netCDF4 attributes and produces a warning.

Parameters:
  • data (DataArray) – The data to save.

  • filename (str | PathLike) – The target file name.

  • **kwargs – Extra arguments passed to xarray.DataArray.to_netcdf.

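The attribute filtering described above (discarding invalid netCDF4 attributes with a warning) can be sketched as follows, assuming a deliberately simplified notion of which attribute types are serializable:

```python
import warnings

# Simplified assumption about which attribute types netCDF4 can serialize.
VALID_TYPES = (str, int, float, bytes, list, tuple)

def drop_invalid_attrs(attrs):
    """Return a copy of ``attrs`` with non-serializable values removed,
    warning once per dropped key."""
    kept = {}
    for key, value in attrs.items():
        if isinstance(value, VALID_TYPES):
            kept[key] = value
        else:
            warnings.warn(
                f"Discarding attribute {key!r}: "
                f"{type(value).__name__} is not netCDF4-compatible"
            )
    return kept
```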
erlab.io.set_data_dir(data_dir)[source]

Set the default data directory for the data loader.

All subsequent calls to load will use the data_dir set here unless specified.

Parameters:

data_dir (str | PathLike | None) – The path to a directory.

Note

This will only affect load. If the loader’s load method is called directly, it will not use the default data directory.

erlab.io.set_loader(loader)[source]

Set the current data loader.

All subsequent calls to load will use the loader set here.

Parameters:

loader (str | LoaderBase | None) – The loader to set. It can be either a string representing the name or alias of the loader, or a valid loader class.

Example

>>> erlab.io.set_loader("merlin")
>>> dat_merlin_1 = erlab.io.load(...)
>>> dat_merlin_2 = erlab.io.load(...)
erlab.io.summarize(data_dir, usecache=True, *, cache=True, display=True, **kwargs)[source]

Summarize the data in the given directory.

Takes a path to a directory and summarizes the data it contains in a table, much like a log file. This is useful for quickly inspecting the contents of a directory.

The dataframe is formatted using the style from get_styler and displayed in the IPython shell. Results are cached in a pickle file in the directory.

Parameters:
  • data_dir – Directory to summarize.

  • usecache (bool) – Whether to use the cached summary if available. If False, the summary will be regenerated. The cache will be updated if cache is True.

  • cache (bool) – Whether to cache the summary in a pickle file in the directory. If False, no cache will be created or updated. Note that existing cache files will not be deleted, and will be used if usecache is True.

  • display (bool) – Whether to display the formatted dataframe using the IPython shell. If False, the dataframe will be returned without formatting. If True but the IPython shell is not detected, the dataframe styler will be returned.

  • **kwargs – Additional keyword arguments to be passed to generate_summary.

Returns:

df – Summary of the data in the directory.

  • If display is False, the summary DataFrame is returned.

  • If display is True and the IPython shell is detected, the summary will be displayed, and None will be returned.

    • If ipywidgets is installed, an interactive widget will be returned instead of None.

  • If display is True but the IPython shell is not detected, the styler for the summary DataFrame will be returned.

Return type:

pandas.DataFrame or pandas.io.formats.style.Styler or None
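The interplay of usecache and cache can be sketched with a pickle-backed cache. The cache file name and the summary format below are hypothetical; here the "summary" is just a sorted file listing:

```python
import pickle
from pathlib import Path

def summarize(data_dir, usecache=True, cache=True):
    """Sketch of the caching behavior described above."""
    cache_file = Path(data_dir) / ".summary.pkl"
    if usecache and cache_file.exists():
        # Reuse the cached summary, even if the directory has changed since.
        return pickle.loads(cache_file.read_bytes())
    # Regenerate the summary from the directory contents.
    summary = sorted(
        p.name for p in Path(data_dir).iterdir() if p.name != ".summary.pkl"
    )
    if cache:
        cache_file.write_bytes(pickle.dumps(summary))
    return summary
```

Note that, as documented, passing cache=False never deletes an existing cache file; it only stops the cache from being created or updated, so a later call with usecache=True may still pick up a stale cache.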

erlab.io.loaders

Global instance of LoaderRegistry.