This page was generated from docs/examples/DataSet/Dataset_Performance.ipynb.

DataSet Performance

This notebook shows the trade-off between inserting data into a database row-by-row and as binary blobs. Inserting the data row-by-row means that we have direct access to all the data and may perform queries directly on the values of the data. On the other hand, as we shall see, this is much slower than inserting the data directly as binary blobs.
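To make the trade-off concrete, here is a minimal sketch using only Python's standard `sqlite3` and `array` modules. The table names `by_row` and `by_blob` are hypothetical and chosen for illustration; this is not the schema QCoDeS uses internally. It only shows why values stored as rows can be queried directly, while a blob must be fetched and decoded first.

```python
import sqlite3
from array import array

values = [0.1, 0.5, 0.9]

conn = sqlite3.connect(":memory:")
# Hypothetical tables for illustration only; not the QCoDeS schema.
conn.execute("CREATE TABLE by_row (x REAL)")
conn.execute("CREATE TABLE by_blob (x BLOB)")

# Row-by-row: one SQL row per value.
conn.executemany("INSERT INTO by_row VALUES (?)", [(v,) for v in values])

# Binary blob: the whole array packed into a single row.
conn.execute("INSERT INTO by_blob VALUES (?)",
             (array("d", values).tobytes(),))

# Values stored as rows can be filtered directly in SQL ...
large_from_rows = [
    row[0]
    for row in conn.execute("SELECT x FROM by_row WHERE x > 0.4 ORDER BY x")
]

# ... whereas a blob has to be fetched and decoded before filtering.
(blob,) = conn.execute("SELECT x FROM by_blob").fetchone()
decoded = array("d")
decoded.frombytes(blob)
large_from_blob = [v for v in decoded if v > 0.4]

n_rows = conn.execute("SELECT COUNT(*) FROM by_row").fetchone()[0]
n_blobs = conn.execute("SELECT COUNT(*) FROM by_blob").fetchone()[0]
conn.close()
```

Both approaches recover the same values; the difference is that the row-based table holds one row per value, while the blob table holds a single row for the whole array.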

First, we choose a new location for the database to ensure that we don’t add a bunch of benchmarking data to the default one.

[1]:
import os
cwd = os.getcwd()
import qcodes as qc
qc.config["core"]["db_location"] = os.path.join(cwd, 'testing.db')

[2]:
%matplotlib inline
import time
import matplotlib.pyplot as plt
import numpy as np

from qcodes import load_or_create_experiment
from qcodes.dataset.measurements import Measurement
from qcodes.dataset.sqlite.database import initialise_database
from qcodes.instrument.parameter import ManualParameter
[3]:
initialise_database()
exp = load_or_create_experiment(experiment_name='tutorial_exp', sample_name="no sample")
Upgrading database; v0 -> v1: : 0it [00:00, ?it/s]
Upgrading database; v1 -> v2: 100%|██████████| 1/1 [00:00<00:00, 1168.98it/s]
Upgrading database; v2 -> v3: : 0it [00:00, ?it/s]
Upgrading database; v3 -> v4: : 0it [00:00, ?it/s]
Upgrading database; v4 -> v5: 100%|██████████| 1/1 [00:00<00:00, 576.06it/s]
Upgrading database; v5 -> v6: : 0it [00:00, ?it/s]
Upgrading database; v6 -> v7: 100%|██████████| 1/1 [00:00<00:00, 604.11it/s]
Upgrading database; v7 -> v8: 100%|██████████| 1/1 [00:00<00:00, 1039.22it/s]
Upgrading database; v8 -> v9: 100%|██████████| 1/1 [00:00<00:00, 1001.74it/s]

Here, we define a simple function to benchmark the time it takes to insert n points with either numeric or array data type. We will compare both the time used to call add_result and the time used for the full measurement.

[4]:
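def insert_data(paramtype, npoints, nreps=1):
    """Insert nreps results of npoints random values per parameter.

    Returns the total run time, the time spent in add_result, and the run id.
    """
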
    meas = Measurement(exp=exp)

    x1 = ManualParameter('x1')
    x2 = ManualParameter('x2')
    x3 = ManualParameter('x3')
    y1 = ManualParameter('y1')
    y2 = ManualParameter('y2')

    meas.register_parameter(x1, paramtype=paramtype)
    meas.register_parameter(x2, paramtype=paramtype)
    meas.register_parameter(x3, paramtype=paramtype)
    meas.register_parameter(y1, setpoints=[x1, x2, x3],
                            paramtype=paramtype)
    meas.register_parameter(y2, setpoints=[x1, x2, x3],
                            paramtype=paramtype)
    start = time.perf_counter()
    with meas.run() as datasaver:
        start_adding = time.perf_counter()
        for i in range(nreps):
            datasaver.add_result((x1, np.random.rand(npoints)),
                                 (x2, np.random.rand(npoints)),
                                 (x3, np.random.rand(npoints)),
                                 (y1, np.random.rand(npoints)),
                                 (y2, np.random.rand(npoints)))
        stop_adding = time.perf_counter()
        run_id = datasaver.run_id
    stop = time.perf_counter()
    tot_time = stop - start
    add_time = stop_adding - start_adding
    return tot_time, add_time, run_id

Comparison between row-by-row (numeric) and binary blob (array) insertion

Case 1: Short experiment time

[5]:
sizes = [1,500,1000,2000,3000,4000,5000]
t_numeric = []
t_numeric_add = []
t_array = []
t_array_add = []
for size in sizes:
    tn, tna, run_id_n =  insert_data('numeric', size)
    t_numeric.append(tn)
    t_numeric_add.append(tna)

    ta, taa, run_id_a =  insert_data('array', size)
    t_array.append(ta)
    t_array_add.append(taa)
Starting experimental run with id: 1.
Starting experimental run with id: 2.
Starting experimental run with id: 3.
Starting experimental run with id: 4.
Starting experimental run with id: 5.
Starting experimental run with id: 6.
Starting experimental run with id: 7.
Starting experimental run with id: 8.
Starting experimental run with id: 9.
Starting experimental run with id: 10.
Starting experimental run with id: 11.
Starting experimental run with id: 12.
Starting experimental run with id: 13.
Starting experimental run with id: 14.
[6]:
fig, ax = plt.subplots(1,1)
ax.plot(sizes, t_numeric, 'o-', label='Inserting row-by-row')
ax.plot(sizes, t_numeric_add, 'o-', label='Inserting row-by-row: add_result only')
ax.plot(sizes, t_array, 'd-', label='Inserting as binary blob')
ax.plot(sizes, t_array_add, 'd-', label='Inserting as binary blob: add_result only')
ax.legend()
ax.set_xlabel('Array length')
ax.set_ylabel('Time (s)')
fig.tight_layout()
../../_images/examples_DataSet_Dataset_Performance_10_0.png

As the figure above shows, the time to set up and close the experiment is approximately 0.4 s. For small array sizes, the difference between inserting the data as arrays and inserting it row-by-row is relatively unimportant. At larger array sizes, i.e. above 10000 points, the cost of writing data as individual data points starts to become important.
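The setup-and-close time can be estimated from the two timings that `insert_data` returns: the total time minus the time spent inside the `add_result` loop. The helper below is a small sketch of that subtraction; the name `setup_overhead` is hypothetical and not part of QCoDeS. Note that the difference also includes any work done when the `meas.run()` context manager exits, so it is an upper bound on pure setup cost.

```python
def setup_overhead(total_times, add_times):
    """Return, per run, the time spent outside the add_result loop.

    total_times and add_times are equally long sequences of seconds,
    e.g. the t_numeric and t_numeric_add lists built in the cell above.
    """
    if len(total_times) != len(add_times):
        raise ValueError("both sequences must have the same length")
    return [total - add for total, add in zip(total_times, add_times)]
```

In the notebook this could be called as `setup_overhead(t_numeric, t_numeric_add)` or `setup_overhead(t_array, t_array_add)`.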

Case 2: Long experiment time

[7]:
sizes = [1,500,1000,2000,3000,4000,5000]
nreps = 100
t_numeric = []
t_numeric_add = []
t_numeric_run_ids = []
t_array = []
t_array_add = []
t_array_run_ids = []
for size in sizes:
    tn, tna, run_id_n =  insert_data('numeric', size, nreps=nreps)
    t_numeric.append(tn)
    t_numeric_add.append(tna)
    t_numeric_run_ids.append(run_id_n)

    ta, taa, run_id_a =  insert_data('array', size, nreps=nreps)
    t_array.append(ta)
    t_array_add.append(taa)
    t_array_run_ids.append(run_id_a)
Starting experimental run with id: 15.
Starting experimental run with id: 16.
Starting experimental run with id: 17.
Starting experimental run with id: 18.
Starting experimental run with id: 19.
Starting experimental run with id: 20.
Starting experimental run with id: 21.
Starting experimental run with id: 22.
Starting experimental run with id: 23.
Starting experimental run with id: 24.
Starting experimental run with id: 25.
Starting experimental run with id: 26.
Starting experimental run with id: 27.
Starting experimental run with id: 28.
[8]:
fig, ax = plt.subplots(1,1)
ax.plot(sizes, t_numeric, 'o-', label='Inserting row-by-row')
ax.plot(sizes, t_numeric_add, 'o-', label='Inserting row-by-row: add_result only')
ax.plot(sizes, t_array, 'd-', label='Inserting as binary blob')
ax.plot(sizes, t_array_add, 'd-', label='Inserting as binary blob: add_result only')
ax.legend()
ax.set_xlabel('Array length')
ax.set_ylabel('Time (s)')
fig.tight_layout()
../../_images/examples_DataSet_Dataset_Performance_14_0.png

However, as we increase the length of the experiment, simulated here by repeating the insertion 100 times, inserting the data row-by-row becomes much slower than inserting it as a binary blob.
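The same effect can be reproduced outside QCoDeS with a rough standard-library analogue. The sketch below times repeated inserts into an in-memory SQLite database, once with one row per value and once with one blob per repetition. The function name `time_inserts` and the table names are hypothetical, and the absolute timings are not comparable with the QCoDeS numbers above; only the trend is of interest.

```python
import random
import sqlite3
import time
from array import array


def time_inserts(npoints, nreps):
    """Return (row_seconds, blob_seconds, n_rows, n_blobs)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables for illustration only; not the QCoDeS schema.
    conn.execute("CREATE TABLE by_row (x REAL)")
    conn.execute("CREATE TABLE by_blob (x BLOB)")
    data = [random.random() for _ in range(npoints)]

    # Row-by-row: npoints rows are written for every repetition.
    start = time.perf_counter()
    for _ in range(nreps):
        conn.executemany("INSERT INTO by_row VALUES (?)",
                         [(v,) for v in data])
    conn.commit()
    row_seconds = time.perf_counter() - start

    # Binary blob: a single row is written for every repetition.
    start = time.perf_counter()
    for _ in range(nreps):
        conn.execute("INSERT INTO by_blob VALUES (?)",
                     (array("d", data).tobytes(),))
    conn.commit()
    blob_seconds = time.perf_counter() - start

    n_rows = conn.execute("SELECT COUNT(*) FROM by_row").fetchone()[0]
    n_blobs = conn.execute("SELECT COUNT(*) FROM by_blob").fetchone()[0]
    conn.close()
    return row_seconds, blob_seconds, n_rows, n_blobs
```

Calling, for example, `time_inserts(5000, 100)` writes 500000 rows in the first case and 100 rows in the second, which is where the difference in cost comes from.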

Loading the data

[9]:
from qcodes.dataset.data_set import load_by_id

As usual, you can load the data with the load_by_id function, but you will notice that the different storage methods are reflected in the shape of the data as it is retrieved.

[10]:
run_id_n = t_numeric_run_ids[0]
run_id_a = t_array_run_ids[0]
[11]:
ds = load_by_id(run_id_n)
ds.get_parameter_data('x1')
[11]:
{'x1': {'x1': array([0.01346573, 0.01346573, 0.8432185 , 0.8432185 , 0.08071499,
         0.08071499, 0.82926901, 0.82926901, 0.37114735, 0.37114735,
         0.85179036, 0.85179036, 0.72814874, 0.72814874, 0.5327098 ,
         0.5327098 , 0.4667212 , 0.4667212 , 0.81347027, 0.81347027,
         0.51841558, 0.51841558, 0.43342339, 0.43342339, 0.96265911,
         0.96265911, 0.83824878, 0.83824878, 0.94317987, 0.94317987,
         0.91746835, 0.91746835, 0.0491853 , 0.0491853 , 0.4221663 ,
         0.4221663 , 0.53719481, 0.53719481, 0.74621705, 0.74621705,
         0.33580721, 0.33580721, 0.9753307 , 0.9753307 , 0.77717915,
         0.77717915, 0.06106411, 0.06106411, 0.1281455 , 0.1281455 ,
         0.44580944, 0.44580944, 0.0824832 , 0.0824832 , 0.90812735,
         0.90812735, 0.90853343, 0.90853343, 0.99960267, 0.99960267,
         0.2975469 , 0.2975469 , 0.59558917, 0.59558917, 0.40758156,
         0.40758156, 0.14955556, 0.14955556, 0.01821124, 0.01821124,
         0.87748426, 0.87748426, 0.22188407, 0.22188407, 0.7862608 ,
         0.7862608 , 0.28996349, 0.28996349, 0.59894537, 0.59894537,
         0.31179838, 0.31179838, 0.89873094, 0.89873094, 0.91641634,
         0.91641634, 0.37623725, 0.37623725, 0.06564658, 0.06564658,
         0.63482662, 0.63482662, 0.12172576, 0.12172576, 0.74165676,
         0.74165676, 0.8634362 , 0.8634362 , 0.07471977, 0.07471977,
         0.37186142, 0.37186142, 0.61430332, 0.61430332, 0.67566345,
         0.67566345, 0.96140762, 0.96140762, 0.70376986, 0.70376986,
         0.54480584, 0.54480584, 0.94810448, 0.94810448, 0.39819477,
         0.39819477, 0.22487531, 0.22487531, 0.34197173, 0.34197173,
         0.1601389 , 0.1601389 , 0.19068002, 0.19068002, 0.84497352,
         0.84497352, 0.86952449, 0.86952449, 0.33056892, 0.33056892,
         0.61211073, 0.61211073, 0.66149723, 0.66149723, 0.07183546,
         0.07183546, 0.25839494, 0.25839494, 0.83463609, 0.83463609,
         0.39691771, 0.39691771, 0.07244563, 0.07244563, 0.48459465,
         0.48459465, 0.99709613, 0.99709613, 0.03866557, 0.03866557,
         0.26049027, 0.26049027, 0.58867165, 0.58867165, 0.19855127,
         0.19855127, 0.65397625, 0.65397625, 0.95550617, 0.95550617,
         0.53514686, 0.53514686, 0.43831907, 0.43831907, 0.06514858,
         0.06514858, 0.8712124 , 0.8712124 , 0.07168781, 0.07168781,
         0.62566845, 0.62566845, 0.15661193, 0.15661193, 0.57482874,
         0.57482874, 0.50089445, 0.50089445, 0.47171968, 0.47171968,
         0.9717618 , 0.9717618 , 0.54782214, 0.54782214, 0.72480658,
         0.72480658, 0.17813286, 0.17813286, 0.64681186, 0.64681186,
         0.13805777, 0.13805777, 0.33153965, 0.33153965, 0.83568035,
         0.83568035, 0.29816295, 0.29816295, 0.05082316, 0.05082316])}}

And for a dataset stored as binary arrays:

[12]:
ds = load_by_id(run_id_a)
ds.get_parameter_data('x1')
[12]:
{'x1': {'x1': array([[0.30913285],
         [0.30913285],
         [0.31824056],
         [0.31824056],
         [0.82218038],
         [0.82218038],
         [0.6753658 ],
         [0.6753658 ],
         [0.56556494],
         [0.56556494],
         [0.72976881],
         [0.72976881],
         [0.67597207],
         [0.67597207],
         [0.72848964],
         [0.72848964],
         [0.30639625],
         [0.30639625],
         [0.38798129],
         [0.38798129],
         [0.68009059],
         [0.68009059],
         [0.1879681 ],
         [0.1879681 ],
         [0.33886652],
         [0.33886652],
         [0.80202562],
         [0.80202562],
         [0.58778956],
         [0.58778956],
         [0.65183972],
         [0.65183972],
         [0.95386294],
         [0.95386294],
         [0.21441348],
         [0.21441348],
         [0.05193749],
         [0.05193749],
         [0.09731245],
         [0.09731245],
         [0.13418259],
         [0.13418259],
         [0.14194028],
         [0.14194028],
         [0.07980188],
         [0.07980188],
         [0.95327752],
         [0.95327752],
         [0.900506  ],
         [0.900506  ],
         [0.69603282],
         [0.69603282],
         [0.37128089],
         [0.37128089],
         [0.92003015],
         [0.92003015],
         [0.4481168 ],
         [0.4481168 ],
         [0.31179849],
         [0.31179849],
         [0.64601218],
         [0.64601218],
         [0.04829135],
         [0.04829135],
         [0.90663841],
         [0.90663841],
         [0.1878819 ],
         [0.1878819 ],
         [0.41785743],
         [0.41785743],
         [0.46853813],
         [0.46853813],
         [0.48795281],
         [0.48795281],
         [0.98346393],
         [0.98346393],
         [0.38163253],
         [0.38163253],
         [0.94890564],
         [0.94890564],
         [0.11643892],
         [0.11643892],
         [0.31667163],
         [0.31667163],
         [0.74981608],
         [0.74981608],
         [0.66474571],
         [0.66474571],
         [0.87071527],
         [0.87071527],
         [0.29439328],
         [0.29439328],
         [0.63717439],
         [0.63717439],
         [0.14130455],
         [0.14130455],
         [0.19906482],
         [0.19906482],
         [0.34356463],
         [0.34356463],
         [0.14277711],
         [0.14277711],
         [0.89398716],
         [0.89398716],
         [0.04054561],
         [0.04054561],
         [0.87474202],
         [0.87474202],
         [0.30260937],
         [0.30260937],
         [0.88512918],
         [0.88512918],
         [0.04696474],
         [0.04696474],
         [0.01594667],
         [0.01594667],
         [0.64793833],
         [0.64793833],
         [0.61376341],
         [0.61376341],
         [0.5280676 ],
         [0.5280676 ],
         [0.76860908],
         [0.76860908],
         [0.97932637],
         [0.97932637],
         [0.0772994 ],
         [0.0772994 ],
         [0.28047965],
         [0.28047965],
         [0.54350451],
         [0.54350451],
         [0.770824  ],
         [0.770824  ],
         [0.27147099],
         [0.27147099],
         [0.03751226],
         [0.03751226],
         [0.65895933],
         [0.65895933],
         [0.63358615],
         [0.63358615],
         [0.21091822],
         [0.21091822],
         [0.88667308],
         [0.88667308],
         [0.71423018],
         [0.71423018],
         [0.64167674],
         [0.64167674],
         [0.46542636],
         [0.46542636],
         [0.79486003],
         [0.79486003],
         [0.83735581],
         [0.83735581],
         [0.32039545],
         [0.32039545],
         [0.16688128],
         [0.16688128],
         [0.71683849],
         [0.71683849],
         [0.11780032],
         [0.11780032],
         [0.6924127 ],
         [0.6924127 ],
         [0.4168249 ],
         [0.4168249 ],
         [0.5715653 ],
         [0.5715653 ],
         [0.61755778],
         [0.61755778],
         [0.72665461],
         [0.72665461],
         [0.68600566],
         [0.68600566],
         [0.83585606],
         [0.83585606],
         [0.28176351],
         [0.28176351],
         [0.75446735],
         [0.75446735],
         [0.72062123],
         [0.72062123],
         [0.12449837],
         [0.12449837],
         [0.63130581],
         [0.63130581],
         [0.69580866],
         [0.69580866],
         [0.16088049],
         [0.16088049],
         [0.44421616],
         [0.44421616],
         [0.96281966],
         [0.96281966],
         [0.44780501],
         [0.44780501],
         [0.60527775],
         [0.60527775]])}}
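If downstream code expects the same layout regardless of storage type, the array-type result can be flattened after loading. The sketch below assumes the nested dictionary layout seen in the outputs above (outer key, inner parameter name, NumPy array as value) and uses small stand-in arrays rather than real measurement data. The helper name `flatten_parameter_data` is hypothetical, not a QCoDeS function.

```python
import numpy as np


def flatten_parameter_data(data):
    """Return a copy of the nested dict with every array flattened to 1-D."""
    return {
        outer: {name: np.ravel(values) for name, values in inner.items()}
        for outer, inner in data.items()
    }


# Stand-in data mimicking the two layouts shown above.
numeric_like = {"x1": {"x1": np.arange(4.0)}}               # shape (4,)
array_like = {"x1": {"x1": np.arange(4.0).reshape(4, 1)}}   # shape (4, 1)

flat = flatten_parameter_data(array_like)
```

Flattening discards the per-insert grouping, so it is only appropriate when that structure is not needed.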