Reading and writing data

In ERLabPy, most data are represented as xarray.DataArray or xarray.Dataset objects. An xarray.DataArray is similar to a wave in Igor Pro, but much more flexible: unlike Igor's maximum of 4 dimensions, an xarray.DataArray can have as many dimensions as you want (up to 64). Another advantage is that the coordinates of the dimensions do not have to be evenly spaced. In fact, they are not limited to numbers, but can be any type of data, such as date and time representations.
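For example, the following creates a small DataArray with a datetime coordinate and an unevenly spaced coordinate (an illustrative snippet, not specific to ERLabPy):

import numpy as np
import pandas as pd
import xarray as xr

arr = xr.DataArray(
    np.zeros((3, 4)),
    dims=("time", "x"),
    coords={
        "time": pd.date_range("2024-01-01", periods=3),  # datetime coordinates
        "x": [0.0, 0.1, 0.3, 0.7],  # unevenly spaced coordinates
    },
)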

This guide will introduce you to reading and writing data from and to various file formats, and show how to implement a custom reader for an experimental setup.

If you are only interested in loading ARPES data, skip ahead to the Loading ARPES data section.

Reading data

Python has a wide range of libraries to read and write data. Here, we will focus on working with Igor Pro and xarray objects.

From Igor Pro

Warning

Loading waves from complex .pxp files may fail or produce unexpected results. It is recommended to export the waves as .ibw files before loading them in ERLabPy. If you encounter any problems, please let us know by opening an issue.

ERLabPy can read .ibw, .pxt, .pxp, and HDF5 files exported from Igor Pro using the functions in erlab.io.igor.

For easy access, the most commonly used of these functions are also available directly in the erlab.io namespace.
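For example, a single wave saved as an Igor binary wave file can be loaded like this (a sketch using the re-exported load_wave; the path is hypothetical):

import erlab.io

# Load a single wave from an .ibw file into an xarray.DataArray
data = erlab.io.load_wave("/path/to/data.ibw")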

Note

Internally, the igor2 package is used to read the data.

From arbitrary formats

Spreadsheet data can be read using pandas.read_csv() or pandas.read_excel(). The resulting DataFrame can be converted to an xarray object using pandas.DataFrame.to_xarray() or xarray.Dataset.from_dataframe().
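For example, a CSV file can be brought into xarray like this (a minimal sketch with a hypothetical file name):

import pandas as pd

# Read a CSV file, using the first column as the index, then convert
# the resulting DataFrame to an xarray.Dataset
df = pd.read_csv("data.csv", index_col=0)
ds = df.to_xarray()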

When reading HDF5 files with arbitrary groups and metadata, you must first explore the group structure using h5netcdf or h5py. Loading a specific HDF5 group into an xarray object can be done using xarray.open_dataset() or xarray.open_mfdataset() by supplying the group argument. For an example that handles complex HDF5 groups, see the implementation of erlab.io.plugins.ssrl52.SSRL52Loader.load_single().
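A minimal sketch of this workflow, with hypothetical file and group names:

import h5py
import xarray as xr

# Explore the group structure first
with h5py.File("measurement.h5", "r") as f:
    f.visit(print)  # prints the name of every group and dataset

# Then load a specific group into an xarray.Dataset. Depending on how the
# file was written, extra arguments (e.g., phony_dims="sort") may be needed.
ds = xr.open_dataset("measurement.h5", group="scan_001", engine="h5netcdf")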

FITS files can be read with astropy.
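For example, using astropy.io.fits (the file name is hypothetical):

from astropy.io import fits

# Open a FITS file and access the data and header of the primary HDU
with fits.open("observation.fits") as hdul:
    hdul.info()  # print a summary of the HDUs in the file
    data = hdul[0].data
    header = hdul[0].header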

Writing data

Since the state and variables of a Python interpreter are not preserved between sessions, it is important to save your data in a format that can be easily read and written.

While it is possible to save and load entire Python interpreter sessions using pickle or the more versatile dill, doing so is beyond the scope of this guide. Instead, we recommend saving your data in a format that is easy to read and write, such as HDF5 or NetCDF. These formats are supported by many programming languages and are optimized for fast read and write operations.

To save and load xarray objects, see the xarray documentation on I/O operations. ERLabPy offers convenience functions load_hdf5 and save_as_hdf5 for loading and saving xarray objects from and to HDF5 files, and save_as_netcdf for saving to NetCDF files.
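For example, assuming data is an existing xarray object (file names are hypothetical):

import erlab.io

# Save to HDF5 and read it back
erlab.io.save_as_hdf5(data, filename="data.h5")
data = erlab.io.load_hdf5("data.h5")

# Save to NetCDF instead
erlab.io.save_as_netcdf(data, filename="data.nc")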

To Igor Pro

As an experimental feature, save_as_hdf5 can save certain xarray.DataArrays in a format that is compatible with the Igor Pro HDF5 loader. An accompanying Igor procedure is available in the repository. If loading in Igor Pro fails, try saving again with all attributes removed.

Alternatively, igorwriter can be used to write numpy arrays to .ibw and .itx files directly.
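A sketch using igorwriter (the wave name and file names are hypothetical):

import numpy as np
from igorwriter import IgorWave

# Wrap a numpy array in an IgorWave and write it to disk
wave = IgorWave(np.linspace(0, 1, 100), name="mywave")
wave.save("mywave.ibw")  # Igor binary wave
wave.save_itx("mywave.itx")  # Igor text file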

Loading ARPES data

Warning

ERLabPy is still in development and the API may change. Some major changes regarding data loading and handling are planned:

  • The xarray datatree structure will enable much more intuitive and powerful data handling. Once the feature is incorporated into xarray, ERLabPy will be updated to use it.

  • A universal translation layer between true data header attributes and human-readable representations will be implemented. This will allow for more consistent and user-friendly data handling.

ERLabPy’s data loading framework consists of various plugins, or loaders, each designed to load data from a different beamline or laboratory. Each loader is a class that has a load method which takes a file path or sequence number and returns data.

Let’s see the list of loaders available by default:

[1]:
import erlab.io

erlab.io.loaders
[1]:
Name    Aliases                         Loader class
ssrl    ssrl52, bl5-2                   erlab.io.plugins.ssrl52.SSRL52Loader
merlin  ALS_BL4, als_bl4, BL403, bl403  erlab.io.plugins.merlin.BL403Loader
da30    DA30                            erlab.io.plugins.da30.DA30Loader
kriss   KRISS                           erlab.io.plugins.kriss.KRISSLoader

You can access each loader by its name or alias, either as an attribute or as an item. For example, to access the loader for the ALS beamline 4.0.3, you can use any of the following:

[3]:
erlab.io.loaders["merlin"]
erlab.io.loaders["bl403"]
erlab.io.loaders.merlin
erlab.io.loaders.bl403
[3]:
<erlab.io.plugins.merlin.BL403Loader at 0x7ff76837da10>

Data loading is done by calling the load method of the loader. It requires an identifier parameter, which can be a path to a file or a sequence number. It also accepts a data_dir parameter, which specifies the directory where the data is stored.

  • If identifier is a sequence number, data_dir must be provided.

  • If identifier is a string and data_dir is provided, the path is constructed by joining data_dir and identifier.

  • If identifier is a string and data_dir is not provided, identifier should be a valid path to a file.

Suppose we have data from the ALS beamline 4.0.3 stored as /path/to/data/f_001.pxt, /path/to/data/f_002.pxt, etc. To load f_001.pxt, all three of the following are valid:

loader = erlab.io.loaders["merlin"]

loader.load("/path/to/data/f_001.pxt")
loader.load("f_001.pxt", data_dir="/path/to/data")
loader.load(1, data_dir="/path/to/data")

Setting the default loader and data directory

In practice, a loader and a single directory will be used repeatedly in a session to load different data from the same experiment.

Instead of explicitly specifying the loader and directory each time, a default loader and data directory can be set with erlab.io.set_loader() and erlab.io.set_data_dir(). All subsequent calls to the shortcut function erlab.io.load() will use the specified loader and data directory.

erlab.io.set_loader("merlin")
erlab.io.set_data_dir("/path/to/data")
data_1 = erlab.io.load(1)
data_2 = erlab.io.load(2)

The loader and data directory can also be controlled with a context manager:

with erlab.io.loader_context("merlin", data_dir="/path/to/data"):
    data_1 = erlab.io.load(1)

Data across multiple files

For setups like the ALS beamline 4.0.3, some scans are stored over multiple files like f_003_S001.pxt, f_003_S002.pxt, and so on. In this case, the loader will automatically concatenate all files in the same scan. For example, all of the following will return the same concatenated data:

erlab.io.load(3)
erlab.io.load("f_003_S001.pxt")
erlab.io.load("f_003_S002.pxt")
...

If you want to cherry-pick a single file, you can pass single=True to load:

erlab.io.load("f_003_S001.pxt", single=True)

Handling multiple data directories

If you call erlab.io.set_loader() or erlab.io.set_data_dir() multiple times, the last call overrides the previous ones. While this is useful for switching the loader or data directory, it makes data loading dependent on execution order, which may lead to unexpected behavior.

If you plan to use multiple loaders or data directories in the same session, it is recommended to use the context manager. If you have to load data from multiple directories multiple times, it may be convenient to define functions that set the loader and data directory and call erlab.io.load() with the appropriate arguments. For example:

def load1(identifier):
    with erlab.io.loader_context("merlin", data_dir="/path/to/data1"):
        return erlab.io.load(identifier)

Summarizing data

Some loaders have generate_summary implemented, which generates a pandas.DataFrame containing an overview of the data in a given directory. The generated summary can be viewed as a table with the summarize method. If ipywidgets is installed, an interactive widget is also displayed. This is useful for quickly skimming through the data.

Just like load, summarize can also be accessed with the shortcut function erlab.io.summarize(). For example, to display a summary of the data available in the directory /path/to/data using the 'merlin' loader:

erlab.io.set_loader("merlin")
erlab.io.set_data_dir("/path/to/data")
erlab.io.summarize()

To see what the generated summary looks like, see the example below.

Implementing a data loader plugin

It is easy to add new loaders to the framework. Any subclass of LoaderBase is automatically registered as a loader! The class must have a valid name attribute which is used to access the loader.

If the name attribute is prefixed with an underscore, the registration is skipped. This is useful for base classes that are not meant to be used directly.
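For example, a shared base class can opt out of registration like this (a minimal sketch; the class name and attribute value are hypothetical):

from erlab.io.dataloader import LoaderBase


class _MyBaseLoader(LoaderBase):
    # The leading underscore keeps this base class out of erlab.io.loaders;
    # concrete subclasses override name with a proper loader name
    name = "_my_base"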

Data loading flow

ARPES data from a single experiment are usually stored in one folder, with files that look like file_0001.h5, file_0002.h5, etc. If the naming scheme does not deviate from this pattern, only two methods need to be implemented: identify and load_single. The following flowchart shows the process of loading data from a single scan, given either a file path or a sequence number:

[Flowchart: loading a single scan from a file path or sequence number]

Here, identify is given an integer sequence number (identifier) and the path to the data folder (data_dir), and returns the full path to the corresponding data file.

The method load_single is given a full path to a single data file and must return the data as an xarray.DataArray or an xarray.Dataset. If the data cannot be combined into a single object, the method can also return a list of xarray.DataArray objects.

If only all data formats were this simple! Unfortunately, in some setups the data for a single scan is saved across multiple files. In this case, the files will look like file_0001_0001.h5, file_0001_0002.h5, etc. For these kinds of setups, an additional method infer_index must be implemented. The following flowchart shows the process of loading data from multiple files:

[Flowchart: loading a scan that spans multiple files]

In this case, the method identify should resolve all files that belong to the given sequence number, and return a list of file paths along with a dictionary of corresponding coordinates.

The method infer_index is given a bare file name (without the extension and path) and must return the sequence number of the scan. For example, given the file name file_0003_0123, the method should return 3.

Conventions

There are some rules that loaded ARPES data must follow to ensure that analysis procedures such as momentum conversion and fitting work seamlessly:

  • The experimental geometry should be stored in the 'configuration' attribute as an integer. See Nomenclature and AxesConfiguration for more information.

  • All standard angle coordinates must follow the naming conventions in Nomenclature.

  • The sample temperature, if available, should be stored in the 'temp_sample' attribute.

  • The sample work function, if available, should be stored in the 'sample_workfunction' attribute.

  • Energies should be given in electronvolts.

  • Angles should be given in degrees.

  • Temperatures should be given in kelvins.

By default, all loaders run a basic check for a subset of these rules using validate and raise a warning if some attributes are missing. This behavior can be controlled with the loader class attributes skip_validate and strict_validation.

A minimal example

Consider a setup that saves data into a .csv file named data_0001.csv and so on. A bare minimum implementation of a loader for the setup will look something like this:

[4]:
import os

import pandas as pd
from erlab.io.dataloader import LoaderBase


class MyLoader(LoaderBase):
    name = "my_loader"
    aliases = None
    name_map = {}
    coordinate_attrs = {}
    additional_attrs = {"information": "any metadata you want to load with the data"}
    skip_validate = False
    always_single = True

    def identify(self, num, data_dir):
        file = os.path.join(data_dir, f"data_{str(num).zfill(4)}.csv")
        return [file], {}

    def load_single(self, file_path):
        return pd.read_csv(file_path).to_xarray()
[5]:
erlab.io.loaders
[5]:
Name       Aliases                         Loader class
ssrl       ssrl52, bl5-2                   erlab.io.plugins.ssrl52.SSRL52Loader
merlin     ALS_BL4, als_bl4, BL403, bl403  erlab.io.plugins.merlin.BL403Loader
da30       DA30                            erlab.io.plugins.da30.DA30Loader
kriss      KRISS                           erlab.io.plugins.kriss.KRISSLoader
my_loader                                  __main__.MyLoader
[6]:
erlab.io.loaders["my_loader"]
[6]:
<__main__.MyLoader at 0x7ff757ab3590>

We can see that the loader has been registered.

A complex example

Next, let's write a more realistic loader for a hypothetical setup that saves data as HDF5 files named data_001.h5, data_002.h5, and so on. Multi-file scans are named like data_001_S001.h5, data_001_S002.h5, etc., with the scan axis information stored in a separate file named data_001_axis.csv.

Let us first generate a data directory and place some synthetic data in it. Before saving, we rename and set some attributes that resemble real ARPES data.

[7]:
import csv
import datetime
import os
import tempfile

import erlab.io
import numpy as np
from erlab.io.exampledata import generate_data_angles


def make_data(beta=5.0, temp=20.0, hv=50.0, bandshift=0.0):
    data = generate_data_angles(
        shape=(250, 1, 300),
        angrange={"alpha": (-15, 15), "beta": (beta, beta)},
        hv=hv,
        configuration=1,
        temp=temp,
        bandshift=bandshift,
        assign_attributes=False,
        seed=1,
    ).T

    # Rename coordinates. The loader must rename them back to the original names.
    data = data.rename(
        {
            "alpha": "ThetaX",
            "beta": "Polar",
            "eV": "BindingEnergy",
            "hv": "PhotonEnergy",
            "xi": "Tilt",
            "delta": "Azimuth",
        }
    )

    # Assign some attributes that real data would have
    data = data.assign_attrs(
        {
            "LensMode": "Angular30",  # Lens mode of the analyzer
            "SpectrumType": "Fixed",  # Acquisition mode of the analyzer
            "PassEnergy": 10,  # Pass energy of the analyzer
            "UndPol": 0,  # Undulator polarization
            "DateTime": datetime.datetime.now().isoformat(),  # Acquisition time
            "TB": temp,
            "X": 0.0,
            "Y": 0.0,
            "Z": 0.0,
        }
    )
    return data


# Create a temporary directory
tmp_dir = tempfile.TemporaryDirectory()

# Define coordinates for the scan
beta_coords = np.linspace(2, 7, 10)

# Generate and save cuts with different beta values
for i, beta in enumerate(beta_coords):
    erlab.io.save_as_hdf5(
        make_data(beta=beta, temp=20.0, hv=50.0),
        filename=f"{tmp_dir.name}/data_001_S{str(i+1).zfill(3)}.h5",
        igor_compat=False,
    )

# Write scan coordinates to a csv file
with open(f"{tmp_dir.name}/data_001_axis.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Index", "Polar"])

    for i, beta in enumerate(beta_coords):
        writer.writerow([i + 1, beta])

# Generate some cuts with different band shifts
for i in range(4):
    erlab.io.save_as_hdf5(
        make_data(beta=5.0, temp=20.0, hv=50.0, bandshift=-i * 0.05),
        filename=f"{tmp_dir.name}/data_{str(i+2).zfill(3)}.h5",
        igor_compat=False,
    )

# List the generated files
sorted(os.listdir(tmp_dir.name))
[7]:
['data_001_S001.h5',
 'data_001_S002.h5',
 'data_001_S003.h5',
 'data_001_S004.h5',
 'data_001_S005.h5',
 'data_001_S006.h5',
 'data_001_S007.h5',
 'data_001_S008.h5',
 'data_001_S009.h5',
 'data_001_S010.h5',
 'data_001_axis.csv',
 'data_002.h5',
 'data_003.h5',
 'data_004.h5',
 'data_005.h5']

Now that we have the data, let’s implement the loader. The biggest difference from the previous example is that we need to handle multiple files for a single scan in identify. Also, we have to implement infer_index to extract the scan number from the file name.

[8]:
import glob
import re

import pandas as pd
from erlab.io.dataloader import LoaderBase


class ExampleLoader(LoaderBase):
    name = "example"

    aliases = ["Ex"]

    name_map = {
        "eV": "BindingEnergy",
        "alpha": "ThetaX",
        "beta": [
            "Polar",
            "Polar Compens",
        ],  # Can have multiple names assigned to the same name
        "delta": "Azimuth",
        "xi": "Tilt",
        "x": "X",
        "y": "Y",
        "z": "Z",
        "hv": "PhotonEnergy",
        "polarization": "UndPol",
        "temp_sample": "TB",
    }

    coordinate_attrs: tuple[str, ...] = (
        "beta",
        "delta",
        "xi",
        "hv",
        "x",
        "y",
        "z",
        "polarization",
        "photon_flux",
    )
    # Attributes to be used as coordinates. Place all attributes that we don't want to
    # lose when merging multiple file scans here.

    additional_attrs = {
        "configuration": 1,  # Experimental geometry. Required for momentum conversion
        "sample_workfunction": 4.3,
    }
    # Any additional metadata you want to add to the data. Note that attributes defined
    # here will not be transformed into coordinates. If you wish to promote some fixed
    # attributes to coordinates, add them to additional_coords.

    additional_coords = {}
    # Additional non-dimension coordinates to be added to the data, for instance the
    # photon energy for lab-based ARPES.

    skip_validate = False

    always_single = False

    def identify(self, num, data_dir):
        coord_dict = {}

        # Look for scans with data_###_S###.h5, and sort them
        files = glob.glob(f"data_{str(num).zfill(3)}_S*.h5", root_dir=data_dir)
        files.sort()

        if len(files) == 0:
            # If no files found, look for data_###.h5
            files = glob.glob(f"data_{str(num).zfill(3)}.h5", root_dir=data_dir)

        else:
            # If files found, extract coordinate values from the filenames
            axis_file = f"{data_dir}/data_{str(num).zfill(3)}_axis.csv"
            with open(axis_file) as f:
                header = f.readline().strip().split(",")

            # Load the coordinates from the csv file
            coord_arr = np.loadtxt(axis_file, delimiter=",", skiprows=1)

            # Each header entry will contain a dimension name
            for i, hdr in enumerate(header[1:]):
                key = self.name_map_reversed.get(hdr, hdr)
                coord_dict[key] = coord_arr[: len(files), i + 1].astype(np.float64)

        if len(files) == 0:
            # If no files found up to this point, raise an error
            raise FileNotFoundError(f"No files found for scan {num} in {data_dir}")

        # Files must be full paths
        files = [os.path.join(data_dir, f) for f in files]

        return files, coord_dict

    def load_single(self, file_path):
        data = erlab.io.load_hdf5(file_path)

        # To prevent conflicts when merging multiple scans, we rename the coordinates
        # prior to concatenation
        return self.process_keys(data)

    def infer_index(self, name):
        # Get the scan number from file name
        try:
            scan_num: str = re.match(r".*?(\d{3})(?:_S\d{3})?", name).group(1)
        except (AttributeError, IndexError):
            return None, None

        if scan_num.isdigit():
            # The second return value, a dictionary, is reserved for more complex
            # setups. See tips below for a brief explanation.
            return int(scan_num), {}
        else:
            return None, None
[9]:
erlab.io.loaders
[9]:
Name       Aliases                         Loader class
ssrl       ssrl52, bl5-2                   erlab.io.plugins.ssrl52.SSRL52Loader
merlin     ALS_BL4, als_bl4, BL403, bl403  erlab.io.plugins.merlin.BL403Loader
da30       DA30                            erlab.io.plugins.da30.DA30Loader
kriss      KRISS                           erlab.io.plugins.kriss.KRISSLoader
my_loader                                  __main__.MyLoader
example    Ex                              __main__.ExampleLoader

We can see that the example loader has been registered. Let’s test the loader by loading and plotting some data.

[10]:
erlab.io.set_loader("example")
erlab.io.set_data_dir(tmp_dir.name)
erlab.io.load(1)
[10]:
<xarray.DataArray (beta: 10, eV: 300, alpha: 250)> Size: 6MB
54.6 50.96 51.68 60.83 63.45 55.18 ... 3.076 0.9879 0.8741 2.515 1.359 1.262
Coordinates:
  * alpha         (alpha) float64 2kB -15.0 -14.88 -14.76 ... 14.76 14.88 15.0
  * beta          (beta) float64 80B 2.0 2.556 3.111 3.667 ... 5.889 6.444 7.0
  * eV            (eV) float64 2kB -0.45 -0.4481 -0.4462 ... 0.1162 0.1181 0.12
    xi            float64 8B 0.0
    delta         float64 8B 0.0
    hv            float64 8B 50.0
    x             float64 8B 0.0
    y             float64 8B 0.0
    z             float64 8B 0.0
    polarization  int64 8B 0
Attributes:
    LensMode:             Angular30
    SpectrumType:         Fixed
    PassEnergy:           10
    DateTime:             2024-05-16T02:19:18.456487
    TB:                   20.0
    temp_sample:          20.0
    configuration:        1
    sample_workfunction:  4.3
    data_loader_name:     example
[11]:
erlab.io.load(5).qplot()
[11]:
<matplotlib.image.AxesImage at 0x7ff757606ed0>
[Figure: the loaded cut plotted with qplot]

Brilliant! We now have a working loader for our hypothetical setup. However, we can’t use erlab.io.summarize() with our loader since we haven’t implemented generate_summary.

This method should return a pandas.DataFrame with the index containing file names. The only requirement for the DataFrame is that it should include a column named 'Path' that contains the paths to the data files. Other than that, the DataFrame can contain any metadata you wish to display in the summary. Let’s implement it in a subclass of the example loader:

[12]:
class ExampleLoaderComplete(ExampleLoader):
    name = "example_complete"
    aliases = ["ExC"]

    def generate_summary(self, data_dir):
        # Get all valid data files in directory
        files = {}
        for path in erlab.io.utilities.get_files(data_dir, extensions=[".h5"]):
            # Base name
            data_name = os.path.splitext(os.path.basename(path))[0]

            # If multiple scans, strip the _S### part
            name_match = re.match(r"(.*?_\d{3})_(?:_S\d{3})?", data_name)
            if name_match is not None:
                data_name = name_match.group(1)

            files[data_name] = path

        # Map dataframe column names to data attributes
        attrs_mapping = {
            "Lens Mode": "LensMode",
            "Scan Type": "SpectrumType",
            "T(K)": "temp_sample",
            "Pass E": "PassEnergy",
            "Polarization": "polarization",
            "hv": "hv",
            "x": "x",
            "y": "y",
            "z": "z",
            "polar": "beta",
            "tilt": "xi",
            "azi": "delta",
        }
        column_names = ["File Name", "Path", "Time", "Type", *attrs_mapping.keys()]

        data_info = []

        processed_indices = set()
        for name, path in files.items():
            # Skip already processed multi-file scans
            index, _ = self.infer_index(name)
            if index in processed_indices:
                continue
            elif index is not None:
                processed_indices.add(index)

            # Load data
            data = self.load(path)

            # Determine type of scan
            data_type = "core"
            if "alpha" in data.dims:
                data_type = "cut"
            if "beta" in data.dims:
                data_type = "map"
            if "hv" in data.dims:
                data_type = "hvdep"

            data_info.append(
                [
                    name,
                    path,
                    datetime.datetime.fromisoformat(data.attrs["DateTime"]),
                    data_type,
                ]
            )

            for k, v in attrs_mapping.items():
                # Try to get the attribute from the data, then from the coordinates
                try:
                    val = data.attrs[v]
                except KeyError:
                    try:
                        val = data.coords[v].values
                        if val.size == 1:
                            val = val.item()
                    except KeyError:
                        val = ""

                # Convert polarization values to human readable form
                if k == "Polarization":
                    if np.iterable(val):
                        val = np.asarray(val).astype(int)
                    else:
                        val = [round(val)]
                    val = [{0: "LH", 2: "LV", -1: "RC", 1: "LC"}.get(v, v) for v in val]
                    if len(val) == 1:
                        val = val[0]

                data_info[-1].append(val)

            del data

        # Sort by time and set index
        return (
            pd.DataFrame(data_info, columns=column_names)
            .sort_values("Time")
            .set_index("File Name")
        )

erlab.io.loaders
[12]:
Name              Aliases                         Loader class
ssrl              ssrl52, bl5-2                   erlab.io.plugins.ssrl52.SSRL52Loader
merlin            ALS_BL4, als_bl4, BL403, bl403  erlab.io.plugins.merlin.BL403Loader
da30              DA30                            erlab.io.plugins.da30.DA30Loader
kriss             KRISS                           erlab.io.plugins.kriss.KRISSLoader
my_loader                                         __main__.MyLoader
example           Ex                              __main__.ExampleLoader
example_complete  ExC                             __main__.ExampleLoaderComplete

The implementation looks complicated, but most of the code is boilerplate, and the actual logic is quite simple. You get a list of file names and paths to generate a summary for, define DataFrame columns and corresponding attributes, and then load the data one by one and extract the metadata. Let's see what the resulting summary looks like.

Note

  • If ipywidgets is not installed, only the DataFrame will be displayed.

  • If you are viewing this documentation online, the summary will not be interactive. Run the code locally to try it out.

[13]:
erlab.io.set_loader("example_complete")
erlab.io.summarize()
File Name  Type  Lens Mode  Scan Type  T(K)  Pass E  Polarization  hv  x  y  z  polar             tilt  azi
data_001   map   Angular30  Fixed      20    10      LH            50  0  0  0  2→7 (0.5556, 10)  0     0
data_002   cut   Angular30  Fixed      20    10      LH            50  0  0  0  5                 0     0
data_003   cut   Angular30  Fixed      20    10      LH            50  0  0  0  5                 0     0
data_004   cut   Angular30  Fixed      20    10      LH            50  0  0  0  5                 0     0
data_005   cut   Angular30  Fixed      20    10      LH            50  0  0  0  5                 0     0

Each cell in the summary table is formatted with formatter. If additional formatting that cannot be achieved within generate_summary is needed, formatter can be overridden in the subclass.
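For instance, a subclass could adjust how floating-point cells are displayed (a rough sketch, assuming formatter receives a single cell value and returns its display representation):

class ExampleLoaderFormatted(ExampleLoaderComplete):
    name = "example_formatted"
    aliases = ["ExF"]

    def formatter(self, val):
        # Hypothetical tweak: display floats with two decimal places
        if isinstance(val, float):
            return f"{val:.2f}"
        return super().formatter(val)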

Tips

  • The data loading framework is designed to be simple and flexible, but it may not cover all possible setups. If you encounter a setup that cannot be loaded with the existing loaders, please let us know by opening an issue!

  • Before implementing a loader, see erlab.io.dataloader for descriptions of each attribute and the values and types of the expected outputs. The implementation of existing loaders in the erlab.io.plugins module is a good starting point; see the source code on GitHub.

  • If you have implemented a new loader or have improved an existing one, consider contributing it to the ERLabPy project by opening a pull request. We are always looking for new loaders to support more experimental setups! See more about contributing in the Contributing Guide.

  • If you wish to add post-processing steps that apply to all data loaded by a loader, such as fixing the sign of the binding energy coordinates, you can override post_process, which by default handles coordinate and attribute renaming. This method is called after the data is loaded and can be used to modify the data before it is returned.

  • For complex data structures, constructing a full path from just the sequence number and the data directory can be difficult. In this case, identify can be implemented to take additional keyword arguments. All keyword arguments passed to load are passed on to identify!

    For instance, consider data with different prefixes like A_001.h5, A_002.h5, B_001.h5, etc. stored in the same directory. The sequence number alone is not enough to construct the full path. In this case, identify can be implemented to take an additional prefix argument, which eliminates the ambiguity. Then, A_001.h5 can be loaded with erlab.io.load(1, prefix="A"). A sketch of such a loader is given after this list.

    If this setup also has multi-file scans like A_001_S001.h5, A_001_S002.h5, etc., we would want the prefix parameter to be passed along even when loading from an identifier given as a file name. This is where the second return value of infer_index comes in handy: the dictionary it returns is passed on to load as keyword arguments.
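To make this concrete, here is a rough sketch of a loader for the hypothetical prefixed setup above, combining an identify that accepts a prefix keyword with an infer_index that returns it (error handling and coordinate bookkeeping are omitted):

import glob
import os
import re

import erlab.io
from erlab.io.dataloader import LoaderBase


class PrefixedLoader(LoaderBase):
    name = "prefixed"
    always_single = False

    def identify(self, num, data_dir, prefix="A"):
        # Keyword arguments passed to `load` arrive here, so
        # erlab.io.load(1, prefix="B") resolves to B_001.h5 (or B_001_S*.h5)
        files = sorted(
            glob.glob(f"{prefix}_{str(num).zfill(3)}*.h5", root_dir=data_dir)
        )
        # Coordinate handling for multi-file scans is omitted in this sketch
        return [os.path.join(data_dir, f) for f in files], {}

    def load_single(self, file_path):
        return erlab.io.load_hdf5(file_path)

    def infer_index(self, name):
        # Given a bare file name like "B_001_S002", return the scan number
        # and the keyword arguments needed to resolve the full scan
        match = re.match(r"([A-Z])_(\d{3})", name)
        if match is None:
            return None, None
        return int(match.group(2)), {"prefix": match.group(1)}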