Selecting and indexing data

In most cases, the powerful data manipulation and indexing methods provided by xarray are sufficient. This page summarizes some frequently used xarray features, along with some utilities provided by ERLabPy. Refer to the xarray user guide for more information.

First, let us import some example data: a simple tight-binding simulation of graphene.

[1]:
import xarray as xr

xr.set_options(display_expand_data=False)
[1]:
<xarray.core.options.set_options at 0x7fed44167f90>
[2]:
from erlab.io.exampledata import generate_data

dat = generate_data(seed=1).T
[3]:
dat
[3]:
<xarray.DataArray (eV: 300, ky: 250, kx: 250)> Size: 150MB
0.5243 1.033 0.6037 1.048 0.4388 ... 0.0003526 5.536e-06 2.813e-07 6.99e-08
Coordinates:
  * kx       (kx) float64 2kB -0.89 -0.8829 -0.8757 ... 0.8757 0.8829 0.89
  * ky       (ky) float64 2kB -0.89 -0.8829 -0.8757 ... 0.8757 0.8829 0.89
  * eV       (eV) float64 2kB -0.45 -0.4482 -0.4464 ... 0.08639 0.08819 0.09

We can see that the generated data is a three-dimensional xarray.DataArray. Now, let’s extract a cut along \(k_y = 0.3\).

[4]:
dat.sel(ky=0.3, method="nearest")
[4]:
<xarray.DataArray (eV: 300, kx: 250)> Size: 600kB
1.535 1.377 0.9181 0.4302 0.5897 ... 1.171e-06 8.757e-06 0.0002878 0.001415
Coordinates:
  * kx       (kx) float64 2kB -0.89 -0.8829 -0.8757 ... 0.8757 0.8829 0.89
    ky       float64 8B 0.2967
  * eV       (eV) float64 2kB -0.45 -0.4482 -0.4464 ... 0.08639 0.08819 0.09

How about the Fermi surface?

[5]:
dat.sel(eV=0.0, method="nearest")
[5]:
<xarray.DataArray (ky: 250, kx: 250)> Size: 500kB
0.3501 0.1119 0.1255 0.1379 0.05128 ... 0.5261 0.2332 0.1398 0.1466 0.1662
Coordinates:
  * kx       (kx) float64 2kB -0.89 -0.8829 -0.8757 ... 0.8757 0.8829 0.89
  * ky       (ky) float64 2kB -0.89 -0.8829 -0.8757 ... 0.8757 0.8829 0.89
    eV       float64 8B -0.000301

Often, we need to integrate the intensity over several adjacent slices, which can be done by slicing and then averaging. The following code averages (that is, integrates up to a normalization factor) the intensity over a 50 meV window centered at \(E_F\).

[6]:
dat.sel(eV=slice(-0.025, 0.025)).mean("eV")
[6]:
<xarray.DataArray (ky: 250, kx: 250)> Size: 500kB
0.2707 0.2155 0.2026 0.2084 0.1769 0.1773 ... 0.1942 0.2472 0.2516 0.2399 0.3594
Coordinates:
  * kx       (kx) float64 2kB -0.89 -0.8829 -0.8757 ... 0.8757 0.8829 0.89
  * ky       (ky) float64 2kB -0.89 -0.8829 -0.8757 ... 0.8757 0.8829 0.89
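For reference, here is the same window average on a plain NumPy array. A boolean mask selects every energy label within [-0.025, 0.025] (label-based slices in xarray include both endpoints), and the mean collapses that axis, so nothing records where the window was. The eV grid (300 evenly spaced points from -0.45 to 0.09) is an assumption inferred from the printed coordinates, and the intensities are random stand-in values.

```python
import numpy as np

eV = np.linspace(-0.45, 0.09, 300)          # assumed energy grid (inferred from output)
rng = np.random.default_rng(0)
cube = rng.random((300, 25, 25))            # stand-in data with dims (eV, ky, kx)

# Label-based slice: both endpoints are inclusive
window = (eV >= -0.025) & (eV <= 0.025)
n_slices = int(window.sum())                # number of energy slices that get averaged

# Average over the selected energies; the eV axis is removed entirely
fermi_map = cube[window].mean(axis=0)
```

On this assumed grid, 28 energy slices fall inside the window.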

However, doing this every time is cumbersome, and the eV coordinate is lost in the process. ERLabPy provides a callable accessor, qsel, to streamline this.

[7]:
dat.qsel(eV=0.0, eV_width=0.05)
[7]:
<xarray.DataArray (ky: 250, kx: 250)> Size: 500kB
0.2707 0.2155 0.2026 0.2084 0.1769 0.1773 ... 0.1942 0.2472 0.2516 0.2399 0.3594
Coordinates:
  * kx       (kx) float64 2kB -0.89 -0.8829 -0.8757 ... 0.8757 0.8829 0.89
  * ky       (ky) float64 2kB -0.89 -0.8829 -0.8757 ... 0.8757 0.8829 0.89
    eV       float64 8B 0.000602

Note that the mean of the eV values inside the window is automatically kept as a scalar coordinate, which is useful for plotting and further analysis.
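The value 0.000602 (rather than 0.0) is the mean of the energy labels that fell inside the window. Assuming that eV_width=0.05 is interpreted as the window [0.0 - 0.025, 0.0 + 0.025] and using the eV grid inferred from the printed coordinates, a short check reproduces it. This is a sketch of the observed behavior, not qsel's source.

```python
import numpy as np

eV = np.linspace(-0.45, 0.09, 300)   # assumed energy grid (inferred from output)
center, width = 0.0, 0.05

# Window of total width `width` centered on `center` (assumed meaning of eV_width)
lo, hi = center - width / 2, center + width / 2
inside = (eV >= lo) & (eV <= hi)

# The reported coordinate is the average of the labels that were integrated over
averaged_coord = eV[inside].mean()   # about 0.000602
```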

If the width is not specified, qsel behaves like sel with method="nearest". If a slice is given instead of a single value, the data is simply sliced and no averaging is performed. All of these can be combined in a single call:

[8]:
dat.qsel(kx=slice(-0.3, 0.3), ky=0.3, eV=0.0, eV_width=0.05)
[8]:
<xarray.DataArray (kx: 84)> Size: 672B
0.3407 0.3622 0.3589 0.3659 0.2786 0.3363 ... 0.3541 0.318 0.3214 0.305 0.2766
Coordinates:
  * kx       (kx) float64 672B -0.2967 -0.2895 -0.2824 ... 0.2824 0.2895 0.2967
    ky       float64 8B 0.2967
    eV       float64 8B 0.000602

Masking

In some cases, it is necessary to mask the data. Although xarray supports basic masking (for example, with DataArray.where), ERLabPy provides a way to mask data with arbitrary polygons.

Work in Progress

This part of the user guide is still under construction. For now, see erlab.analysis.mask. For the full list of packages and modules provided by ERLabPy, see the API Reference.
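The interface of erlab.analysis.mask is not reproduced here. As a library-independent sketch of the underlying idea, the even-odd (ray casting) rule below marks the grid points that lie inside an arbitrary polygon. polygon_mask is a hypothetical helper, and ERLabPy may use a different point-in-polygon algorithm.

```python
import numpy as np


def polygon_mask(x, y, vertices):
    """Hypothetical helper: boolean mask of shape (len(y), len(x)), True inside the polygon.

    Even-odd rule: a point is inside if a horizontal ray cast toward +x
    crosses the polygon boundary an odd number of times.
    """
    X, Y = np.meshgrid(x, y)                      # grid of query points, shape (ny, nx)
    verts = np.asarray(vertices, dtype=float)
    inside = np.zeros(X.shape, dtype=bool)
    # Pair each vertex with the next one, closing the polygon
    for (x0, y0), (x1, y1) in zip(verts, np.roll(verts, -1, axis=0)):
        # Only edges that straddle the ray's height can be crossed
        straddles = (y0 > Y) != (y1 > Y)
        with np.errstate(divide="ignore", invalid="ignore"):
            # x position where the edge reaches height Y (ignored for horizontal edges)
            x_cross = x0 + (Y - y0) * (x1 - x0) / (y1 - y0)
            inside ^= straddles & (X < x_cross)
    return inside


# Example: triangle with vertices (0, 0), (1, 0), (0, 1) on a 2 x 2 grid
mask = polygon_mask([0.1, 0.6], [0.1, 0.6], [(0, 0), (1, 0), (0, 1)])
```

For a (ky, kx) map, x and y would be the kx and ky coordinate values; the mask can then be wrapped as xr.DataArray(mask, dims=("ky", "kx")) and passed to DataArray.where, which replaces points outside the polygon with NaN.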

Interpolation

In addition to the powerful interpolation methods provided by xarray, ERLabPy provides a convenient way to interpolate data along an arbitrary path.
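Interpolating along a path ultimately means evaluating gridded data at (kx, ky) points that fall between grid nodes. Plain xarray can already do this pointwise: passing DataArray indexers that share a new dimension to DataArray.interp performs vectorized interpolation. To show what happens at each point, here is a minimal NumPy sketch of bilinear interpolation on an ascending rectilinear grid. bilinear is a hypothetical helper, and ERLabPy may use a different interpolation scheme.

```python
import numpy as np


def bilinear(xgrid, ygrid, values, xq, yq):
    """Hypothetical helper: bilinearly interpolate `values` at points (xq, yq).

    values has shape (..., len(ygrid), len(xgrid)); leading axes (e.g. eV) are kept.
    Grids must be ascending; points outside the grid are linearly extrapolated.
    """
    xq, yq = np.asarray(xq, dtype=float), np.asarray(yq, dtype=float)
    # Index of the lower grid node of the cell containing each query point
    ix = np.clip(np.searchsorted(xgrid, xq) - 1, 0, len(xgrid) - 2)
    iy = np.clip(np.searchsorted(ygrid, yq) - 1, 0, len(ygrid) - 2)
    # Fractional position inside the cell (0 at the lower node, 1 at the upper node)
    tx = (xq - xgrid[ix]) / (xgrid[ix + 1] - xgrid[ix])
    ty = (yq - ygrid[iy]) / (ygrid[iy + 1] - ygrid[iy])
    # Weighted sum of the four surrounding nodes
    return ((1 - tx) * (1 - ty) * values[..., iy, ix]
            + tx * (1 - ty) * values[..., iy, ix + 1]
            + (1 - tx) * ty * values[..., iy + 1, ix]
            + tx * ty * values[..., iy + 1, ix + 1])


# Usage: bilinear interpolation reproduces a plane exactly
xg = np.linspace(-1, 1, 21)
yg = np.linspace(-1, 1, 11)
X, Y = np.meshgrid(xg, yg)                  # shape (11, 21), i.e. (y, x)
plane = 2 * X + 3 * Y
xq = np.array([0.05, 0.33, -0.99])
yq = np.array([0.1, -0.47, 0.9])
vals = bilinear(xg, yg, plane, xq, yq)
# A leading axis (like eV) is carried along: result has shape (2, 3)
stack = bilinear(xg, yg, np.stack([plane, -plane]), xq, yq)
```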

Consider a Γ-M-K-Γ high-symmetry path given as lists of kx and ky coordinates:

[9]:
import erlab.plotting.erplot as eplt
import matplotlib.pyplot as plt
import numpy as np

a = 6.97
kx = [0, 2 * np.pi / (a * np.sqrt(3)), 2 * np.pi / (a * np.sqrt(3)), 0]
ky = [0, 0, 2 * np.pi / (a * 3), 0]


dat.qsel(eV=-0.2).qplot(aspect="equal", cmap="Greys")
plt.plot(kx, ky, "o-")
[9]:
[<matplotlib.lines.Line2D at 0x7fecfdda9a50>]
[Figure: constant-energy map at eV = -0.2 with the Γ-M-K-Γ path overlaid]
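The vertices in the cell above are the Γ, M, and K points of a hexagonal Brillouin zone with lattice constant \(a\), with K placed in the first quadrant. Writing them out gives the segment lengths, and their sum predicts the total length of the path (in Å\(^{-1}\), presumably taking \(a\) in Å):

```latex
\begin{aligned}
\Gamma &= (0,\ 0), \quad
M = \Big(\tfrac{2\pi}{\sqrt{3}\,a},\ 0\Big), \quad
K = \Big(\tfrac{2\pi}{\sqrt{3}\,a},\ \tfrac{2\pi}{3a}\Big) \\
|\Gamma M| &= \tfrac{2\pi}{\sqrt{3}\,a}, \quad
|MK| = \tfrac{2\pi}{3a}, \quad
|K\Gamma| = \sqrt{\tfrac{4\pi^2}{3a^2} + \tfrac{4\pi^2}{9a^2}} = \tfrac{4\pi}{3a} \\
L &= |\Gamma M| + |MK| + |K\Gamma| = \tfrac{2\pi}{\sqrt{3}\,a} + \tfrac{2\pi}{a}
\approx 0.520 + 0.901 \approx 1.422 \quad (a = 6.97)
\end{aligned}
```

This total agrees with the last value of the path coordinate returned by slice_along_path in the next cell.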

To interpolate the data along this path with a step of 0.01 Å\(^{-1}\), we can use the following code:

[10]:
import erlab.analysis as era

dat_sliced = era.interpolate.slice_along_path(
    dat, vertices={"kx": kx, "ky": ky}, step_size=0.01
)
dat_sliced
[10]:
<xarray.DataArray (eV: 300, path: 140)> Size: 336kB
0.07295 0.1004 0.4831 0.6724 0.1885 ... 1.159e-13 1.01e-07 0.00131 0.138 0.1486
Coordinates:
  * eV       (eV) float64 2kB -0.45 -0.4482 -0.4464 ... 0.08639 0.08819 0.09
    kx       (path) float64 1kB 0.0 0.01021 0.02041 ... 0.01764 0.008821 0.0
    ky       (path) float64 1kB 0.0 0.0 0.0 0.0 ... 0.01528 0.01019 0.005093 0.0
  * path     (path) float64 1kB 0.0 0.01021 0.02041 ... 1.402 1.412 1.422

We can see that the data has been interpolated along the path. The new dimension path holds the distance along the path, and kx and ky are no longer dimensions: they are now coordinates that vary along path.
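How the sampled points and the path coordinate might be generated can be sketched as follows. This is a guess at the sampling scheme, not ERLabPy's code: each segment is split into round(length / step) evenly spaced points including both ends, the vertex shared by consecutive segments is kept only once, and path is the cumulative Euclidean distance. Under these assumptions, the Γ-M-K-Γ path above yields 140 points with a total length of about 1.422, consistent with the printed output, but slice_along_path may differ in details. sample_path is a hypothetical helper.

```python
import numpy as np


def sample_path(vertices, step):
    """Hypothetical helper: sample a polyline with spacing close to `step`.

    Returns the sampled x, y and the cumulative distance along the path.
    """
    verts = np.asarray(vertices, dtype=float)
    pieces = []
    for start, end in zip(verts[:-1], verts[1:]):
        length = np.linalg.norm(end - start)
        n = max(int(round(length / step)), 2)         # points on this segment, ends included
        t = np.linspace(0.0, 1.0, n)[:, None]
        seg = start + t * (end - start)                # shape (n, 2)
        pieces.append(seg if not pieces else seg[1:])  # skip the repeated shared vertex
    pts = np.concatenate(pieces)
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    dist = np.concatenate(([0.0], np.cumsum(steps)))
    return pts[:, 0], pts[:, 1], dist


# Γ-M-K-Γ vertices, as defined earlier on this page
a = 6.97
kx = [0, 2 * np.pi / (a * np.sqrt(3)), 2 * np.pi / (a * np.sqrt(3)), 0]
ky = [0, 0, 2 * np.pi / (a * 3), 0]
px, py, path = sample_path(np.column_stack([kx, ky]), step=0.01)
```

In plain xarray, evaluating the data at these points should be possible with dat.interp(kx=xr.DataArray(px, dims="path"), ky=xr.DataArray(py, dims="path")), which likewise returns kx and ky as coordinates along a path dimension.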

To label the high-symmetry points, we need their positions along path: the position of each vertex is the cumulative sum of the distances between consecutive vertices.

[11]:
dat_sliced.plot(cmap="Greys")
eplt.fermiline()

# Distance between each pair of consecutive points
distances = np.linalg.norm(np.diff(np.vstack([kx, ky]), axis=-1), axis=0)
seg_coords = np.concatenate(([0], np.cumsum(distances)))

plt.xticks(seg_coords, labels=["Γ", "M", "K", "Γ"])
plt.xlim(0, seg_coords[-1])
for seg in seg_coords[1:-1]:
    plt.axvline(seg, ls="--", c="k", lw=1)
[Figure: intensity along the Γ-M-K-Γ path, with the Fermi level marked and dashed lines at M and K]

You will learn more about plotting in the next section.