This page was generated from docs/examples/DataSet/Benchmarking.ipynb.

Dataset Benchmarking

This notebook is a behind-the-scenes benchmarking notebook, mainly for use by developers. The recommended way for users to interact with the dataset is via the Measurement object and its associated context manager. See the corresponding notebook for a comprehensive tutorial on how to use those.

[1]:
%matplotlib inline
import matplotlib.pyplot as plt
import qcodes as qc
from qcodes import ParamSpec, new_data_set, new_experiment, initialise_database, load_or_create_experiment
import numpy as np
[2]:
qc.config.core.db_location
[2]:
'~/experiments.db'
[3]:
initialise_database()
Upgrading database; v0 -> v1: : 0it [00:00, ?it/s]
Upgrading database; v1 -> v2: 100%|██████████| 1/1 [00:00<00:00, 409.48it/s]
Upgrading database; v2 -> v3: : 0it [00:00, ?it/s]
Upgrading database; v3 -> v4: : 0it [00:00, ?it/s]
Upgrading database; v4 -> v5: 100%|██████████| 1/1 [00:00<00:00, 480.72it/s]
Upgrading database; v5 -> v6: : 0it [00:00, ?it/s]
Upgrading database; v6 -> v7: 100%|██████████| 1/1 [00:00<00:00, 468.22it/s]
Upgrading database; v7 -> v8: 100%|██████████| 1/1 [00:00<00:00, 505.89it/s]
Upgrading database; v8 -> v9: 100%|██████████| 1/1 [00:00<00:00, 989.92it/s]

Setup

[4]:
exp = load_or_create_experiment("benchmarking", sample_name="the sample is a lie")
exp
[4]:
benchmarking#the sample is a lie#1@/home/runner/experiments.db
--------------------------------------------------------------

Now we can create a dataset. Note two things:

- if we don't specify an exp_id but there is an experiment in the experiment container, the dataset will go into that one.
- a dataset can also be created from the experiment object
[5]:
dataSet = new_data_set("benchmark_data", exp_id=exp.exp_id)
exp
[5]:
benchmarking#the sample is a lie#1@/home/runner/experiments.db
--------------------------------------------------------------
1-benchmark_data-1-None-0

In this benchmark we will assume that we are doing a 2D loop and investigate the performance implications of writing to the dataset.
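The strategies benchmarked below differ mainly in how many times `add_results` is called for the same amount of data. As a rough sketch (illustrative arithmetic only, not part of the original notebook), the call counts for a 100 x 100 grid are:

```python
# Number of database write calls per strategy, for a 100 x 100 grid.
x_shape, y_shape = 100, 100

calls_per_point = x_shape * y_shape  # add_results once per (x, y) point
calls_per_row = x_shape              # add_results once per inner loop
calls_once = 1                       # a single add_results after the loop

print(calls_per_point, calls_per_row, calls_once)
```

Fewer, larger writes amortize the per-call overhead, which is what the timings below illustrate.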

[6]:
x_shape = 100
y_shape = 100

Baseline: Generate data

[7]:
%%time
for x in range(x_shape):
    for y in range(y_shape):
        z = np.random.random_sample(1)
CPU times: user 8.69 ms, sys: 60 µs, total: 8.75 ms
Wall time: 8.66 ms
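As a point of comparison (not part of the original benchmark), the nested loop above draws one scalar sample per iteration; the whole grid can also be generated in a single vectorized NumPy call:

```python
import numpy as np

x_shape, y_shape = 100, 100

# Draw the entire 100 x 100 grid of samples in one vectorized call
# instead of one scalar draw per loop iteration.
z = np.random.random_sample((x_shape, y_shape))
print(z.shape)
```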

and store in memory

[8]:
x_data = np.zeros((x_shape, y_shape))
y_data = np.zeros((x_shape, y_shape))
z_data = np.zeros((x_shape, y_shape))
[9]:
%%time
for x in range(x_shape):
    for y in range(y_shape):
        x_data[x,y] = x
        y_data[x,y] = y
        z_data[x,y] = np.random.random_sample()
CPU times: user 9.77 ms, sys: 166 µs, total: 9.93 ms
Wall time: 9.85 ms

Add to dataset inside double loop

[10]:
double_dataset = new_data_set("doubledata", exp_id=exp.exp_id,
                              specs=[ParamSpec("x", "numeric"),
                                     ParamSpec("y", "numeric"),
                                     ParamSpec('z', "numeric")])
double_dataset.mark_started()

Note that this is so slow that we are only doing a tenth of the computation.

[11]:
%%time
for x in range(x_shape//10):
    for y in range(y_shape):
        double_dataset.add_results([{"x": x, 'y': y, 'z': np.random.random_sample()}])
CPU times: user 229 ms, sys: 36.9 ms, total: 266 ms
Wall time: 550 ms

Add the data in the outer loop and store as np arrays

[12]:
single_dataset = new_data_set("singledata", exp_id=exp.exp_id,
                              specs=[ParamSpec("x", "array"),
                                     ParamSpec("y", "array"),
                                     ParamSpec('z', "array")])
single_dataset.mark_started()
x_data = np.zeros((y_shape))
y_data = np.zeros((y_shape))
z_data = np.zeros((y_shape))
[13]:
%%time
for x in range(x_shape):
    for y in range(y_shape):
        x_data[y] = x
        y_data[y] = y
        z_data[y] = np.random.random_sample(1)
    single_dataset.add_results([{"x": x_data, 'y': y_data, 'z': z_data}])
CPU times: user 44.2 ms, sys: 400 µs, total: 44.6 ms
Wall time: 56.1 ms

Save once after loop

[14]:
zero_dataset = new_data_set("zerodata", exp_id=exp.exp_id,
                            specs=[ParamSpec("x", "array"),
                                   ParamSpec("y", "array"),
                                   ParamSpec('z', "array")])
zero_dataset.mark_started()
x_data = np.zeros((x_shape, y_shape))
y_data = np.zeros((x_shape, y_shape))
z_data = np.zeros((x_shape, y_shape))
[15]:
%%time
for x in range(x_shape):
    for y in range(y_shape):
        x_data[x,y] = x
        y_data[x,y] = y
        z_data[x,y] = np.random.random_sample(1)
zero_dataset.add_results([{'x':x_data, 'y':y_data, 'z':z_data}])
CPU times: user 16.8 ms, sys: 62 µs, total: 16.9 ms
Wall time: 17 ms

Array parameter

[16]:
array1D_dataset = new_data_set("array1Ddata", exp_id=exp.exp_id,
                               specs=[ParamSpec("x", "array"),
                                      ParamSpec("y", "array"),
                                      ParamSpec('z', "array")])
array1D_dataset.mark_started()
y_setpoints = np.arange(y_shape)
[17]:
%%timeit
for x in range(x_shape):
    x_data[x,:] = x
    array1D_dataset.add_results([{'x':x_data[x,:], 'y':y_setpoints, 'z':np.random.random_sample(y_shape)}])
40.6 ms ± 2.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
[18]:
x_data = np.zeros((x_shape, y_shape))
y_data = np.zeros((x_shape, y_shape))
z_data = np.zeros((x_shape, y_shape))
y_setpoints = np.arange(y_shape)
[19]:
array0D_dataset = new_data_set("array0Ddata", exp_id=exp.exp_id,
                               specs=[ParamSpec("x", "array"),
                                      ParamSpec("y", "array"),
                                      ParamSpec('z', "array")])
array0D_dataset.mark_started()
[20]:
%%timeit
for x in range(x_shape):
    x_data[x,:] = x
    y_data[x,:] = y_setpoints
    z_data[x,:] = np.random.random_sample(y_shape)
array0D_dataset.add_results([{'x':x_data, 'y':y_data, 'z':z_data}])
2.1 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Insert many

[21]:
data = []
for i in range(100):
    for j in range(100):
        data.append({'x': i, 'y':j, 'z':np.random.random_sample()})
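The nested Python loop above builds a flat list of result dicts. A sketch of an equivalent construction with `numpy.meshgrid` (an alternative, not what the original notebook uses):

```python
import numpy as np

# Build the same flat list of {'x': ..., 'y': ..., 'z': ...} result dicts
# with meshgrid instead of a nested Python loop. indexing='ij' makes x
# vary slowest, matching the loop order above.
xv, yv = np.meshgrid(np.arange(100), np.arange(100), indexing='ij')
zv = np.random.random_sample((100, 100))
data = [
    {'x': int(x), 'y': int(y), 'z': z}
    for x, y, z in zip(xv.ravel(), yv.ravel(), zv.ravel())
]
print(len(data))
```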
[22]:
many_Data = new_data_set("many_data", exp_id=exp.exp_id,
                         specs=[ParamSpec("x", "numeric"),
                                ParamSpec("y", "numeric"),
                                ParamSpec("z", "numeric")])
many_Data.mark_started()
[23]:
%%timeit
many_Data.add_results(data)
34.4 ms ± 545 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)