Data IO (erlab.io)
Read and write ARPES data.
This module provides functions for loading various file types, including HDF5 files, Igor Pro files, and ARPES data from different beamlines and laboratories.
For a single session, it is very common to use only one type of loader for a single
folder with all your data. Hence, the module provides a way to set a default loader for
a session. This is done using the set_loader() function. The same can be done for
the data directory using the set_data_dir() function.
For instructions on how to write a custom loader, see erlab.io.dataloader.
Examples
View all registered loaders:
>>> erlab.io.loaders
Load data by explicitly specifying the loader:
>>> dat = erlab.io.loaders["merlin"].load(...)
Set the default loader for the session:
>>> erlab.io.set_loader("merlin")
Learn more about loaders in the User Guide.
Modules
Data loading plugins.
Base functionality for implementing data loaders.
General-purpose I/O utilities.
Backend for Igor Pro files.
Utilities for reading NeXus files into xarray objects.
Generates simple simulated ARPES data for testing and demonstration.
Data import for characterization experiments.
Module Attributes
- erlab.io.loaders
A global registry of all loaders registered in the session. The keys are the names of the loaders and the values are the loader objects.
Functions
load – Load ARPES data.
loader_context – Context manager that temporarily sets the current loader and data directory.
set_data_dir – Set the default data directory for the current context.
set_loader – Set the current data loader for the current context.
extend_loader – Context manager that temporarily extends various loader attributes.
summarize – Summarize the data in the given directory.
- erlab.io.extend_loader(coordinate_attrs=None, average_attrs=None, additional_attrs=None, overridden_attrs=None, additional_coords=None, overridden_coords=None)
Context manager that temporarily extends various loader attributes.
This context manager can be used to temporarily customize the behavior of the data loader. This is particularly useful when loading data across multiple files, where coordinate_attrs can be extended so that the attributes in the data are promoted to coordinates and propagated when combining data across files.
For one-off loads, the same arguments can be passed to load or erlab.io.load() with the loader_extensions keyword. This keeps the extension settings attached to the load call, which is useful for generated loading code and ImageTool manager reload metadata.
- Parameters:
    name_map – Extends name_map.
    coordinate_attrs (tuple[str, ...] | None, default: None) – Extends coordinate_attrs.
    average_attrs (tuple[str, ...] | None, default: None) – Extends average_attrs.
    additional_attrs (dict[str, str | float | Callable[[DataArray], str | float]] | None, default: None) – Extends additional_attrs.
    overridden_attrs (tuple[str, ...] | None, default: None) – Extends overridden_attrs.
    additional_coords (dict[str, str | float | Callable[[DataArray], str | float]] | None, default: None) – Extends additional_coords.
    overridden_coords (tuple[str, ...] | None, default: None) – Extends overridden_coords.
Example
import erlab

erlab.io.set_loader("loader_name")

with erlab.io.extend_loader(coordinate_attrs=("scan_number",)):
    data = erlab.io.load("file_name")

data = erlab.io.load(
    "file_name",
    loader_extensions={"coordinate_attrs": ("scan_number",)},
)
See also
load – Load data with optional loader_extensions.
coordinate_attrs – The attribute that is temporarily extended.
- erlab.io.load(data_dir=None, *, single=False, combine=True, parallel=False, progress=True, load_kwargs=None, loader_extensions=None, **kwargs)
Load ARPES data.
This method is the main entry point for loading ARPES data.
Note
This method is not meant to be overridden in subclasses.
- Parameters:
    identifier – Value that identifies a scan uniquely.
        If a string or path-like object is given, it is assumed to be the path to the data file relative to data_dir. If data_dir is not specified, identifier is assumed to be the full path to the data file.
        If an integer is given, it is assumed to be a number that specifies the scan number, and is used to automatically determine the path to the data file(s). In this case, the data_dir argument must be specified.
    data_dir (str | PathLike | None, default: None) – Where to look for the data. Must be a path to a valid directory. This argument is required when identifier is an integer.
        When called as erlab.io.load(), this argument defaults to the value set by erlab.io.set_data_dir() or erlab.io.loader_context().
    chunks – Chunking strategy for loading data with dask for supported loaders.
    single (bool, default: False) – This argument is only used when always_single is False, and identifier is given as a string or path-like object.
        If identifier points to a file that is included in a multiple file scan, the default behavior when single is False is to return data from all files in the same scan. How the data is combined is determined by the combine argument. If True, only the data from the file given is returned.
    combine (bool, default: True) – Whether to attempt to combine multiple files into a single data object. If False, a list of data is returned. If True, the loader tries to combine the data into a single data object and return it. Depending on the type of each data object, the returned object can be an xarray.DataArray, an xarray.Dataset, or an xarray.DataTree.
        This argument is only used when single is False.
    parallel (bool, default: False) – Whether to load multiple files in parallel using dask. For possible values, see load_multiple_parallel.
        This argument is only used when single is False.
    progress (bool, default: True) – Whether to show a progress bar when loading multiple files.
        This argument is only used when single is False.
    load_kwargs (dict[str, Any] | None, default: None) – Additional keyword arguments to be passed to load_single. You can also pass additional keyword arguments directly to load, and they will be dispatched to either identify or load_single based on their signatures. See the **kwargs argument for details.
    loader_extensions (Mapping[str, Any] | None, default: None) – Temporary extensions to loader attributes, with the same keys accepted by extend_loader.
    **kwargs – Additional keyword arguments are passed to identify and load_single based on their signatures. If a keyword argument is accepted by both methods, it is passed to identify. Use the load_kwargs argument to pass an ambiguous keyword argument to load_single.
- Returns:
    xarray.DataArray or xarray.Dataset or xarray.DataTree – The loaded data.
- Return type:
    DataArray | Dataset | DataTree | list[DataArray] | list[Dataset] | list[DataTree]
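The keyword dispatch rule described for **kwargs can be pictured with a small sketch. It is not the library's implementation: identify, load_single, and dispatch_kwargs below are hypothetical stand-ins written only for illustration, and the sketch assumes that dispatch is decided by inspecting each function's signature.

```python
import inspect


def identify(num, data_dir, prefix=None, zap=None):
    """Hypothetical stand-in for a loader's ``identify`` method."""


def load_single(file_path, without_values=False, zap=None):
    """Hypothetical stand-in for a loader's ``load_single`` method."""


def dispatch_kwargs(kwargs, load_kwargs=None):
    """Split keyword arguments between ``identify`` and ``load_single``.

    A keyword accepted by both goes to ``identify``; everything in
    ``load_kwargs`` is forwarded to ``load_single`` unchanged.
    """
    identify_params = inspect.signature(identify).parameters
    single_params = inspect.signature(load_single).parameters

    identify_kwargs = {}
    single_kwargs = dict(load_kwargs or {})
    for name, value in kwargs.items():
        if name in identify_params:
            identify_kwargs[name] = value
        elif name in single_params:
            single_kwargs[name] = value
        else:
            raise TypeError(f"unexpected keyword argument {name!r}")
    return identify_kwargs, single_kwargs


# "zap" is accepted by both stand-ins, so the direct keyword is routed to
# identify; load_kwargs is the way to send a value to load_single instead.
to_identify, to_single = dispatch_kwargs(
    {"prefix": "f", "without_values": True, "zap": 1},
    load_kwargs={"zap": 2},
)
```

Here to_identify receives prefix and the ambiguous zap, while to_single receives without_values together with the zap value supplied through load_kwargs.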
Notes
The data_dir set by erlab.io.set_data_dir() or erlab.io.loader_context() is only used when called as erlab.io.load(). When called directly on a loader instance, the data_dir argument must be specified.
For convenience, the data_dir set by erlab.io.set_data_dir() or erlab.io.loader_context() is silently ignored when all of the following are satisfied:
- identifier is an absolute path to an existing file.
- data_dir is not explicitly provided.
- The path created by joining data_dir and identifier does not point to an existing file.
This way, absolute file paths can be passed directly to the loader without changing the default data directory. For instance, consider the following directory structure.
cwd/
├── data/
└── example.txt
The following code will load ./example.txt instead of raising an error that ./data/example.txt is missing:
import erlab

erlab.io.set_data_dir("data")
erlab.io.load("example.txt")
However, if ./data/example.txt also exists, the same code will load that one instead while warning about the ambiguity. This behavior may lead to unexpected results when the directory structure is not well organized. Keep this in mind and try to keep all data files at the same level.
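The fallback rules above can be sketched in plain Python. This is an illustration only, not the code erlab runs: resolve_identifier and its cwd argument are hypothetical, and the sketch follows the worked example by resolving a relative identifier against a working directory.

```python
import tempfile
import warnings
from pathlib import Path


def resolve_identifier(identifier, default_data_dir, data_dir=None, cwd="."):
    """Return the path that would be loaded under the rules sketched above.

    ``cwd`` is a hypothetical stand-in for the process working directory.
    """
    cwd = Path(cwd)
    if data_dir is not None:
        # An explicitly provided data_dir is always used.
        return cwd / data_dir / identifier

    direct = cwd / identifier
    joined = cwd / default_data_dir / identifier
    if direct.is_file() and not joined.is_file():
        # Only the direct path exists: ignore the default directory.
        return direct
    if direct.is_file() and joined.is_file():
        # Both exist: prefer the default directory, but flag the ambiguity.
        warnings.warn(f"Both {direct} and {joined} exist; using {joined}", stacklevel=2)
    return joined


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "example.txt").write_text("top level")

    # Only <root>/example.txt exists, so the default directory is skipped.
    first = resolve_identifier("example.txt", "data", cwd=root).relative_to(root)

    # Once <root>/data/example.txt exists too, it wins and a warning is issued.
    (root / "data" / "example.txt").write_text("inside data")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        second = resolve_identifier("example.txt", "data", cwd=root).relative_to(root)
```

The first call resolves to example.txt and the second to data/example.txt, which mirrors why keeping all data files at one level avoids surprises.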
- erlab.io.load_hdf5(filename, **kwargs)
Load data from an HDF5 file saved with save_as_hdf5.
This is a thin wrapper around xarray.load_dataarray and xarray.load_dataset.
Deprecated since version 3.14.0: Use xarray.load_dataarray or xarray.load_dataset directly.
- Parameters:
    **kwargs – Extra arguments to xarray.load_dataarray or xarray.load_dataset.
- Returns:
    xarray.DataArray or xarray.Dataset – The loaded data.
- erlab.io.loader_context(loader=None, data_dir=None)
Context manager that temporarily sets the current loader and data directory.
- Parameters:
    loader (str, optional) – The name or alias of the loader to use in the context.
    data_dir (str or os.PathLike, optional) – The data directory to use in the context.
Examples
Load data within a context manager:
>>> with erlab.io.loader_context("merlin"):
...     dat_merlin = erlab.io.load(...)
Load data with different loaders and directories:
>>> erlab.io.set_loader("ssrl52")
>>> erlab.io.set_data_dir("/path/to/dir1")
>>> dat_ssrl_1 = erlab.io.load(...)
>>> with erlab.io.loader_context("merlin", data_dir="/path/to/dir2"):
...     dat_merlin = erlab.io.load(...)
>>> dat_ssrl_2 = erlab.io.load(...)
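The set-then-restore behavior shown in these examples follows a common context-manager pattern. The sketch below reproduces that pattern with the standard library only; the _state dictionary and the three functions are hypothetical stand-ins named after the erlab functions, not the library's implementation.

```python
from contextlib import contextmanager

# Hypothetical module-level state standing in for the session defaults.
_state = {"loader": None, "data_dir": None}


def set_loader(loader):
    """Set the default loader name (stand-in)."""
    _state["loader"] = loader


def set_data_dir(data_dir):
    """Set the default data directory (stand-in)."""
    _state["data_dir"] = data_dir


@contextmanager
def loader_context(loader=None, data_dir=None):
    """Temporarily override the defaults and restore them on exit."""
    previous = dict(_state)
    if loader is not None:
        _state["loader"] = loader
    if data_dir is not None:
        _state["data_dir"] = data_dir
    try:
        yield
    finally:
        # Restore the earlier defaults even if the body raised.
        _state.update(previous)


set_loader("ssrl52")
set_data_dir("/path/to/dir1")
before = dict(_state)
with loader_context("merlin", data_dir="/path/to/dir2"):
    inside = dict(_state)
after = dict(_state)
```

Inside the block the overrides are active; once it exits, the defaults set beforehand apply again, which is why the final load in the example above uses the original loader and directory.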
- erlab.io.open_hdf5(filename, **kwargs)
Open data from an HDF5 file saved with save_as_hdf5.
This is a thin wrapper around xarray.open_dataarray and xarray.open_dataset.
Deprecated since version 3.14.0: Use xarray.open_dataarray or xarray.open_dataset directly.
- Parameters:
    **kwargs – Extra arguments to xarray.open_dataarray or xarray.open_dataset.
- Returns:
    xarray.DataArray or xarray.Dataset – The opened data.
- erlab.io.save_as_hdf5(data, filename, igor_compat=True, **kwargs)
Save data in HDF5 format.
Deprecated since version 3.14.0: Use xarray.DataArray.to_netcdf or xarray.Dataset.to_netcdf directly. To save data in a format compatible with Igor, use erlab.io.igor.save_wave().
- Parameters:
    data (DataArray | Dataset) – xarray.DataArray to save.
    igor_compat (bool, default: True) – (Experimental) Make the resulting file compatible with Igor's HDF5OpenFile for DataArrays with up to 4 dimensions. A convenient Igor procedure is included in the repository. Default is True.
    **kwargs – Extra arguments to xarray.DataArray.to_netcdf: refer to the xarray documentation for a list of all possible arguments.
- erlab.io.save_as_netcdf(data, filename, **kwargs)
Save data in netCDF4 format.
Deprecated since version 3.14.0: Use xarray.DataArray.to_netcdf or xarray.Dataset.to_netcdf directly.
Discards invalid netCDF4 attributes and produces a warning.
- Parameters:
    data (DataArray) – xarray.DataArray to save.
    **kwargs – Extra arguments to xarray.DataArray.to_netcdf: refer to the xarray documentation for a list of all possible arguments.
- erlab.io.set_data_dir(data_dir)
Set the default data directory for the current context.
All subsequent calls to erlab.io.load() will use the provided data_dir unless one is specified explicitly.
Note
This will only affect erlab.io.load(). If the loader's load method is called directly, it will not use the default data directory.
- erlab.io.set_loader(loader)
Set the current data loader for the current context.
All subsequent calls to load will use the provided loader.
- Parameters:
    loader (str | LoaderBase | None) – The loader to set. It can be either a string representing the name or alias of the loader, or a valid loader class.
Example
>>> erlab.io.set_loader("merlin")
>>> dat_merlin_1 = erlab.io.load(...)
>>> dat_merlin_2 = erlab.io.load(...)
- erlab.io.summarize(exclude=None, *, cache=True, display=True, rc=None)
Summarize the data in the given directory.
Note
This method is not meant to be overridden in subclasses.
Takes a path to a directory and summarizes the data in the directory to a table, much like a log file. This is useful for quickly inspecting the contents of a directory.
The dataframe is formatted using the style from get_styler and displayed in the IPython shell. Results are cached in a pickle file in the directory.
- Parameters:
    data_dir – Directory to summarize.
    exclude (default: None) – A string or sequence of strings specifying glob patterns for files to be excluded from the summary. If provided, caching will be disabled.
    cache (default: True) – Whether to use caching for the summary.
    display (default: True) – Whether to display the formatted dataframe using the IPython shell. If False, the dataframe will be returned without formatting. If True but the IPython shell is not detected, the dataframe styler will be returned.
    rc (default: None) – Optional dictionary of matplotlib rcParams to override the default for the plot in the interactive summary. Plot options such as the figure size and colormap can be changed using this argument.
- Returns:
    pandas.DataFrame or pandas.io.formats.style.Styler or None – Summary of the data in the directory.
    - If display is False, the summary DataFrame is returned.
    - If display is True and the IPython shell is detected, the summary will be displayed, and None will be returned.
        - If ipywidgets is installed, an interactive widget will be returned instead of None.
    - If display is True but the IPython shell is not detected, the styler for the summary DataFrame will be returned.
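To make the exclude argument concrete, the sketch below shows one way glob patterns could filter the files in a directory. It is an assumption-laden illustration: files_to_summarize is a hypothetical helper, and matching the patterns with pathlib's glob relative to the data directory is assumed rather than taken from the library.

```python
import tempfile
from pathlib import Path


def files_to_summarize(data_dir, exclude=None):
    """List file names in ``data_dir`` that no ``exclude`` pattern matches."""
    data_dir = Path(data_dir)

    # Accept a single pattern or a sequence of patterns.
    if exclude is None:
        patterns = []
    elif isinstance(exclude, str):
        patterns = [exclude]
    else:
        patterns = list(exclude)

    # Collect every path matched by any of the glob patterns.
    excluded = {path for pattern in patterns for path in data_dir.glob(pattern)}
    return sorted(
        path.name
        for path in data_dir.iterdir()
        if path.is_file() and path not in excluded
    )


with tempfile.TemporaryDirectory() as tmp:
    for name in ("scan_001.h5", "scan_002.h5", "test_001.h5", "notes.txt"):
        (Path(tmp) / name).touch()

    everything = files_to_summarize(tmp)
    filtered = files_to_summarize(tmp, exclude=["test_*", "*.txt"])
```

With the two patterns applied, only the scan_* files remain in filtered, while everything lists all four files.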