This page was generated from docs/examples/DataSet/Working-With-Pandas-and-XArray.ipynb.

Working with Pandas and XArray

This notebook demonstrates how Pandas and XArray can be used to work with the QCoDeS DataSet. It is not meant as a general introduction to Pandas and XArray; for that, we refer to their official documentation. This notebook requires that both Pandas and XArray are installed.

Setup

First we borrow an example from the measurement notebook to have some data to work with. We split the measurement in two so we can try merging it with Pandas.

[1]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import qcodes as qc
from qcodes import load_or_create_experiment, initialise_database, Measurement
from qcodes.tests.instrument_mocks import DummyInstrument, DummyInstrumentWithMeasurement

qc.logger.start_all_logging()
Logging hadn't been started.
Activating auto-logging. Current session state plus future input saved.
Filename       : /home/runner/.qcodes/logs/command_history.log
Mode           : append
Output logging : True
Raw input log  : False
Timestamping   : True
State          : active
Qcodes Logfile : /home/runner/.qcodes/logs/220701-7767-qcodes.log
[2]:
# preparatory mocking of physical setup
dac = DummyInstrument('dac', gates=['ch1', 'ch2'])
dmm = DummyInstrumentWithMeasurement('dmm', setter_instr=dac)
station = qc.Station(dmm, dac)
[3]:
initialise_database()
exp = load_or_create_experiment(experiment_name='working_with_pandas',
                                sample_name="no sample")
[4]:
meas = Measurement(exp)
meas.register_parameter(dac.ch1)  # register the first independent parameter
meas.register_parameter(dac.ch2)  # register the second independent parameter
meas.register_parameter(dmm.v2, setpoints=(dac.ch1, dac.ch2))  # register the dependent one
[4]:
<qcodes.dataset.measurements.Measurement at 0x7fac59c456d0>

We then perform a very basic experiment. To be able to demonstrate merging of datasets in Pandas we will perform the measurement in two parts.

[5]:
# run the first half of the 2D sweep (dac.ch1 from -1 up to, but excluding, 0)

with meas.run() as datasaver:

    for v1 in np.linspace(-1, 0, 200, endpoint=False):
        for v2 in np.linspace(-1, 1, 201):
            dac.ch1(v1)
            dac.ch2(v2)
            val = dmm.v2.get()
            datasaver.add_result((dac.ch1, v1),
                                 (dac.ch2, v2),
                                 (dmm.v2, val))

dataset1 = datasaver.dataset
Starting experimental run with id: 53.
[6]:
# run the second half of the 2D sweep (dac.ch1 from 0 to 1, both included)

with meas.run() as datasaver:

    for v1 in np.linspace(0, 1, 201):
        for v2 in np.linspace(-1, 1, 201):
            dac.ch1(v1)
            dac.ch2(v2)
            val = dmm.v2.get()
            datasaver.add_result((dac.ch1, v1),
                                 (dac.ch2, v2),
                                 (dmm.v2, val))

dataset2 = datasaver.dataset
Starting experimental run with id: 54.

Two methods exist for extracting data to Pandas DataFrames. to_pandas_dataframe exports all the data from the dataset into a single DataFrame. to_pandas_dataframe_dict returns the data as a dict mapping each measured (dependent) parameter to a DataFrame.

Please note that to_pandas_dataframe is only intended to be used when all dependent parameters have the same setpoints. If this is not the case for the DataSet, to_pandas_dataframe_dict should be used instead.

[7]:
df1 = dataset1.to_pandas_dataframe()
df2 = dataset2.to_pandas_dataframe()
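
To illustrate why the single-DataFrame export presupposes shared setpoints, here is a minimal Pandas-only sketch. It does not call QCoDeS: the `frames` dict is a hypothetical stand-in with the same general shape as a per-parameter mapping (one DataFrame per dependent parameter), and the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical setpoints shared by both dependent parameters.
index = pd.MultiIndex.from_product(
    [[-1.0, 0.0], [-1.0, 0.0, 1.0]], names=["dac_ch1", "dac_ch2"]
)

# Stand-in for a per-parameter mapping: one single-column DataFrame
# per dependent parameter (values are invented for illustration).
frames = {
    "dmm_v1": pd.DataFrame({"dmm_v1": np.arange(6, dtype=float)}, index=index),
    "dmm_v2": pd.DataFrame({"dmm_v2": np.arange(6, dtype=float) * 0.1}, index=index),
}

# Because both frames share the same index, they can be joined
# column-wise into one DataFrame without introducing NaN padding.
combined = pd.concat(list(frames.values()), axis=1)
print(combined)
```

If the two parameters had different setpoints, the column-wise join would still run, but it would fill the non-matching rows with NaN, which is why keeping the frames separate is the safer choice in that case.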

Working with Pandas

Let's first inspect the Pandas DataFrame. Note how both independent variables (the setpoints dac_ch1 and dac_ch2) are used for the index. Pandas refers to this as a MultiIndex. For visual clarity, we just look at the first N points of the dataset.

[8]:
N = 10
[9]:
df1[:N]
[9]:
                   dmm_v2
dac_ch1 dac_ch2
-1.0    -1.00    0.000891
        -0.99   -0.000381
        -0.98   -0.000688
        -0.97    0.000250
        -0.96   -0.000409
        -0.95    0.000540
        -0.94    0.000677
        -0.93    0.000736
        -0.92   -0.000486
        -0.91   -0.000128
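
The structure of such a MultiIndex can be inspected programmatically. The following is a small sketch on a hand-built stand-in DataFrame (the values are invented, and only the index and column names mirror the ones above):

```python
import pandas as pd

# Stand-in for a DataFrame indexed by two setpoint parameters.
index = pd.MultiIndex.from_product(
    [[-1.0, -0.5], [-1.0, 0.0, 1.0]], names=["dac_ch1", "dac_ch2"]
)
df = pd.DataFrame({"dmm_v2": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}, index=index)

# Names of the index levels, i.e. the independent parameters.
level_names = list(df.index.names)

# Unique setpoint values along one level.
ch1_values = df.index.get_level_values("dac_ch1").unique().tolist()

# Look up a single measured value by its full (dac_ch1, dac_ch2) key.
value = df.loc[(-1.0, 0.0), "dmm_v2"]

print(level_names, ch1_values, value)
```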

We can also reset the index to return a simpler view where all data points are simply indexed by a running counter. As we shall see below this can be needed in some situations. Note that calling reset_index leaves the original dataframe untouched.

[10]:
df1.reset_index()[0:N]
[10]:
   dac_ch1  dac_ch2    dmm_v2
0     -1.0    -1.00  0.000891
1     -1.0    -0.99 -0.000381
2     -1.0    -0.98 -0.000688
3     -1.0    -0.97  0.000250
4     -1.0    -0.96 -0.000409
5     -1.0    -0.95  0.000540
6     -1.0    -0.94  0.000677
7     -1.0    -0.93  0.000736
8     -1.0    -0.92 -0.000486
9     -1.0    -0.91 -0.000128
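
As a quick check of the statement that reset_index leaves the original untouched, here is a sketch on a small stand-in DataFrame (invented values). It also shows that set_index can rebuild the MultiIndex from the flattened columns.

```python
import pandas as pd

# Stand-in DataFrame with a two-level MultiIndex.
index = pd.MultiIndex.from_product(
    [[-1.0, 0.0], [-1.0, 1.0]], names=["dac_ch1", "dac_ch2"]
)
df = pd.DataFrame({"dmm_v2": [0.1, 0.2, 0.3, 0.4]}, index=index)

# reset_index returns a new DataFrame; the setpoints become ordinary columns.
flat = df.reset_index()

# The original still carries its MultiIndex.
still_multiindexed = isinstance(df.index, pd.MultiIndex)

# set_index reverses the operation.
restored = flat.set_index(["dac_ch1", "dac_ch2"])

print(flat)
print(still_multiindexed)
```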

Pandas has built-in support for various forms of plotting. This does not, however, support MultiIndex at the moment so we use reset_index to make the data available for plotting.

[11]:
df1.reset_index().plot.scatter('dac_ch1', 'dac_ch2', c='dmm_v2')
[11]:
<AxesSubplot:xlabel='dac_ch1', ylabel='dac_ch2'>
../../_images/examples_DataSet_Working-With-Pandas-and-XArray_20_1.png

Similarly, for the other dataframe:

[12]:
df2.reset_index().plot.scatter('dac_ch1', 'dac_ch2', c='dmm_v2')
[12]:
<AxesSubplot:xlabel='dac_ch1', ylabel='dac_ch2'>
../../_images/examples_DataSet_Working-With-Pandas-and-XArray_22_1.png
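
An alternative to reset_index for gridded data is to pivot one index level into columns with unstack, which yields a 2D table of values. This is a sketch on invented stand-in data, not part of the original measurement; the resulting table could then be handed to an image-style plotting routine such as Matplotlib's pcolormesh.

```python
import pandas as pd

# Stand-in data on a small rectangular grid.
index = pd.MultiIndex.from_product(
    [[-1.0, 0.0], [-1.0, 0.0, 1.0]], names=["dac_ch1", "dac_ch2"]
)
df = pd.DataFrame({"dmm_v2": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}, index=index)

# Move the dac_ch2 level into the columns: rows are dac_ch1, columns are dac_ch2.
grid = df["dmm_v2"].unstack("dac_ch2")

print(grid)
```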

Merging two dataframes with the same labels is fairly simple.

[13]:
df = pd.concat([df1, df2], sort=True)
[14]:
df.reset_index().plot.scatter('dac_ch1', 'dac_ch2', c='dmm_v2')
[14]:
<AxesSubplot:xlabel='dac_ch1', ylabel='dac_ch2'>
../../_images/examples_DataSet_Working-With-Pandas-and-XArray_25_1.png
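
The merge is clean because the two sweeps do not overlap: the first sweep excludes dac_ch1 = 0 (endpoint=False) and the second one starts there. The sketch below reproduces this on a tiny stand-in grid; `make_frame` is a hypothetical helper, and verify_integrity=True asks pd.concat to raise a ValueError if any index entry appears in both parts.

```python
import numpy as np
import pandas as pd


def make_frame(ch1_values):
    """Hypothetical helper: build a zero-filled stand-in frame for the given dac_ch1 values."""
    index = pd.MultiIndex.from_product(
        [ch1_values, [-1.0, 0.0, 1.0]], names=["dac_ch1", "dac_ch2"]
    )
    return pd.DataFrame({"dmm_v2": np.zeros(len(index))}, index=index)


part1 = make_frame([-1.0, -0.5])      # first half, stops before 0
part2 = make_frame([0.0, 0.5, 1.0])   # second half, starts at 0

# Raises ValueError if the two parts share any (dac_ch1, dac_ch2) pair.
merged = pd.concat([part1, part2], verify_integrity=True)

print(len(merged), merged.index.is_unique)
```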

It is also possible to select a subset of data from the dataframe based on the x and y values.

[15]:
df.loc[(slice(-1, -0.95), slice(-1, -0.97)), :]
[15]:
                   dmm_v2
dac_ch1 dac_ch2
-1.000  -1.00    0.000891
        -0.99   -0.000381
        -0.98   -0.000688
        -0.97    0.000250
-0.995  -1.00   -0.000143
        -0.99    0.000107
        -0.98   -0.000485
        -0.97   -0.000506
-0.990  -1.00    0.000868
        -0.99   -0.000670
        -0.98   -0.001091
        -0.97   -0.000654
-0.985  -1.00   -0.000696
        -0.99    0.000466
        -0.98    0.000365
        -0.97   -0.000936
-0.980  -1.00    0.000477
        -0.99    0.000177
        -0.98   -0.000623
        -0.97    0.000006
-0.975  -1.00   -0.000679
        -0.99   -0.001189
        -0.98   -0.000606
        -0.97    0.000098
-0.970  -1.00    0.000763
        -0.99    0.000246
        -0.98    0.000369
        -0.97    0.000829
-0.965  -1.00   -0.000221
        -0.99    0.000213
        -0.98    0.000165
        -0.97    0.000128
-0.960  -1.00    0.000655
        -0.99   -0.000028
        -0.98    0.000741
        -0.97    0.000003
-0.955  -1.00   -0.000450
        -0.99    0.000122
        -0.98   -0.000197
        -0.97   -0.000634
-0.950  -1.00   -0.000536
        -0.99   -0.000051
        -0.98   -0.000249
        -0.97    0.000065
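
The same kind of selection can be written with pd.IndexSlice, which some readers find easier to read than nested slice objects. The sketch below uses a small stand-in grid with invented values. Sorting the index first is a precaution, since label-based slicing on a MultiIndex requires a sorted index; note also that label slices include both end points.

```python
import numpy as np
import pandas as pd

# Stand-in grid: 5 values of dac_ch1 times 3 values of dac_ch2.
index = pd.MultiIndex.from_product(
    [[-1.0, -0.5, 0.0, 0.5, 1.0], [-1.0, 0.0, 1.0]],
    names=["dac_ch1", "dac_ch2"],
)
df = pd.DataFrame({"dmm_v2": np.arange(len(index), dtype=float)}, index=index)

# Label slicing on a MultiIndex needs a sorted index.
df = df.sort_index()

# Select -1.0 <= dac_ch1 <= -0.5 and -1.0 <= dac_ch2 <= 0.0 (end points included).
idx = pd.IndexSlice
subset = df.loc[idx[-1.0:-0.5, -1.0:0.0], :]

print(subset)
```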

Working with XArray

In many cases, when working with data on rectangular grids, it may be more convenient to export the data to an XArray Dataset or DataArray. This is especially true when working in a multi-dimensional parameter space.

Let’s set up and rerun the above measurement with the added dependent parameter dmm.v1.

[16]:
meas.register_parameter(dmm.v1, setpoints=(dac.ch1, dac.ch2))  # register the 2nd dependent parameter
[16]:
<qcodes.dataset.measurements.Measurement at 0x7fac59c456d0>
[17]:
# run a 2D sweep

with meas.run() as datasaver:

    for v1 in np.linspace(-1, 1, 200):
        for v2 in np.linspace(-1, 1, 201):
            dac.ch1(v1)
            dac.ch2(v2)
            val1 = dmm.v1.get()
            val2 = dmm.v2.get()
            datasaver.add_result((dac.ch1, v1),
                                 (dac.ch2, v2),
                                 (dmm.v1, val1),
                                 (dmm.v2, val2))

dataset3 = datasaver.dataset
Starting experimental run with id: 55.

The QCoDeS DataSet can be converted directly to an XArray Dataset with the to_xarray_dataset method, which returns the data from the measured (dependent) parameters as a single XArray Dataset. If you are only interested in a single parameter, it is also possible to obtain a dictionary of XArray DataArrays using the to_xarray_dataarray_dict method. For convenience, we will access the DataArrays from the XArray Dataset directly.

Please note that to_xarray_dataset is only intended to be used when all dependent parameters have the same setpoints. If this is not the case for the DataSet, to_xarray_dataarray_dict should be used instead.

[18]:
xaDataSet = dataset3.to_xarray_dataset()
[19]:
xaDataSet
[19]:
<xarray.Dataset>
Dimensions:  (dac_ch1: 200, dac_ch2: 201)
Coordinates:
  * dac_ch1  (dac_ch1) float64 -1.0 -0.9899 -0.9799 ... 0.9799 0.9899 1.0
  * dac_ch2  (dac_ch2) float64 -1.0 -0.99 -0.98 -0.97 ... 0.97 0.98 0.99 1.0
Data variables:
    dmm_v1   (dac_ch1, dac_ch2) float64 6.151 6.216 6.224 ... 4.169 4.091 4.007
    dmm_v2   (dac_ch1, dac_ch2) float64 5.556e-05 0.000255 ... 0.0007645
Attributes: (12/14)
    ds_name:                  results
    sample_name:              no sample
    exp_name:                 working_with_pandas
    snapshot:                 {"station": {"instruments": {"dmm": {"functions...
    guid:                     aaaaaaaa-0000-0000-0000-0181ba44d599
    run_timestamp:            2022-07-01 14:58:02
    ...                       ...
    captured_counter:         3
    run_id:                   55
    run_description:          {"version": 3, "interdependencies": {"paramspec...
    parent_dataset_links:     []
    run_timestamp_raw:        1656687482.2728593
    completed_timestamp_raw:  1656687494.150549
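
To make the relationship between the MultiIndex DataFrame and the gridded XArray Dataset concrete, here is a sketch that converts a small stand-in DataFrame with xr.Dataset.from_dataframe. This is an illustration using invented values, not a description of how QCoDeS builds its export internally; in particular, the metadata attributes shown above are not reproduced.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in DataFrame: two dependent parameters on a shared 3 x 2 grid.
index = pd.MultiIndex.from_product(
    [[-1.0, 0.0, 1.0], [-1.0, 1.0]], names=["dac_ch1", "dac_ch2"]
)
df = pd.DataFrame(
    {"dmm_v1": np.arange(6.0), "dmm_v2": np.arange(6.0) * 0.1}, index=index
)

# Each index level becomes a dimension with its own coordinate;
# each column becomes a 2D data variable.
ds = xr.Dataset.from_dataframe(df)

# The conversion can be reversed to get a MultiIndex DataFrame back.
roundtrip = ds.to_dataframe()

print(ds)
```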

As mentioned above, it is also possible to work with an XArray DataArray directly. A DataArray contains a single dependent variable and can be obtained from the XArray Dataset by indexing with the parameter name.

[20]:
xaDataArray = xaDataSet['dmm_v2']  # or xaDataSet.dmm_v2
[21]:
xaDataArray
[21]:
<xarray.DataArray 'dmm_v2' (dac_ch1: 200, dac_ch2: 201)>
array([[ 5.55649711e-05,  2.54969578e-04, -4.81175044e-04, ...,
         3.62686263e-04,  6.91763795e-04, -3.28102121e-04],
       [ 6.22475527e-04,  1.33951939e-04, -7.37645618e-04, ...,
        -4.28379921e-04, -2.35687667e-04,  7.05352297e-04],
       [-1.33653225e-04, -3.28160221e-04,  5.63552090e-04, ...,
        -1.34629214e-03, -4.72394688e-04,  9.42514078e-05],
       ...,
       [ 8.46047874e-04, -7.05544230e-05, -2.69396298e-04, ...,
        -5.30144856e-04,  1.26457024e-04, -6.68770145e-05],
       [-8.20728094e-05, -4.61750086e-04, -6.39024144e-04, ...,
        -8.71034232e-04,  4.53266082e-05, -4.90992214e-05],
       [-1.27091062e-04,  1.18756684e-04, -8.85552188e-04, ...,
        -7.66723175e-05,  4.63397156e-04,  7.64541981e-04]])
Coordinates:
  * dac_ch1  (dac_ch1) float64 -1.0 -0.9899 -0.9799 ... 0.9799 0.9899 1.0
  * dac_ch2  (dac_ch2) float64 -1.0 -0.99 -0.98 -0.97 ... 0.97 0.98 0.99 1.0
Attributes:
    name:           dmm_v2
    paramtype:      numeric
    label:          Gate v2
    unit:           V
    inferred_from:  []
    depends_on:     ['dac_ch1', 'dac_ch2']
    units:          V
    long_name:      Gate v2
[22]:
fig, ax = plt.subplots(2,2)
xaDataSet.dmm_v2.plot(ax=ax[0,0])
xaDataSet.dmm_v1.plot(ax=ax[1,1])
xaDataSet.dmm_v2.mean(dim='dac_ch1').plot(ax=ax[1,0])
xaDataSet.dmm_v1.mean(dim='dac_ch2').plot(ax=ax[0,1])
fig.tight_layout()
../../_images/examples_DataSet_Working-With-Pandas-and-XArray_38_0.png

Above we demonstrated a few ways to work with the data in a DataArray. For instance, the DataArray can be plotted directly, and its mean along one dimension or a specific row/column can also be plotted.
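
Selecting a specific row, column, or window from a DataArray can be sketched as follows. The array is a hand-built stand-in with invented values; only the dimension names mirror the ones used above.

```python
import numpy as np
import xarray as xr

# Stand-in DataArray on a 5 x 3 grid.
ch1 = np.linspace(-1, 1, 5)   # -1.0, -0.5, 0.0, 0.5, 1.0
ch2 = np.linspace(-1, 1, 3)   # -1.0, 0.0, 1.0
da = xr.DataArray(
    np.arange(15.0).reshape(5, 3),
    coords={"dac_ch1": ch1, "dac_ch2": ch2},
    dims=("dac_ch1", "dac_ch2"),
    name="dmm_v2",
)

# Row at the dac_ch1 coordinate closest to 0.1 (which is 0.0 here).
row = da.sel(dac_ch1=0.1, method="nearest")

# First column, selected by integer position rather than by coordinate.
column = da.isel(dac_ch2=0)

# Mean over dac_ch1, leaving a 1D array along dac_ch2.
mean_over_ch1 = da.mean(dim="dac_ch1")

# Window selected by coordinate range (both end points included).
window = da.sel(dac_ch1=slice(-0.5, 0.5))

print(row.values, column.values, mean_over_ch1.values)
```

Each of these results is itself a DataArray, so calling .plot() on it works in the same way as in the cell above.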
