Interactive hypothesis testing for anti-anxiety medicine

Python and atoti: experimentation in data analytics

Crafting an analytical model is a truly creative process. Often when you start, you don’t really know what model you want to build, how you want to name and visualize things, or what features will work best for your machine learning algorithm. Building and iterating on a complex model can be fun!

atoti is a great addition to your JupyterLab toolkit, as it is designed to fuel experimentation in this interactive environment. When you import the atoti Python module and start a session, it spins up an in-memory database – similar to Apache Spark – ready to slice and dice your big data set. In addition, it launches a Tableau-like dashboarding web app for you to visualize and explore the results:

In this post, I want to illustrate how to use atoti to build a model incrementally, cell by cell, using a hypothesis testing problem as the example.

We’ll do it in steps and I will show you how to:

  • inject data into the app as you go
  • define your own complex metrics in Python – as an example, I’ve implemented a measure computing “H0 rejected” and “Can’t reject H0” for a paired t-test
  • re-define metrics as you go
  • experiment by uploading alternative data sets
  • experiment by running simulations on measures

and keep visualizing the impact immediately using dynamic pivot tables. Your calculations will run on-the-fly and can be broken down and filtered by any attributes and their combinations.

You can download my notebook here.

Hypothesis testing on memory recall performance

The experiment is inspired by research showing that sad memories might improve memory recall abilities for evolutionary purposes. At the same time, Benzodiazepines are known to have adverse effects on memory performance. Would priming with good or bad memories compensate for the obstructive impact of these anti-anxiety drugs? 

In this notebook, we will use a dataset from a Kaggle user submission, the Islanders data, to study the impact of anti-anxiety medicine on memory recall ability.

  • The dataset contains observations on Alprazolam and Triazolam, as well as Placebo (Sugar) – see column “Drug” below – and different dosages – see column “Dosage”.
  • Participants were primed with happy and sad memories ten minutes before the memory test – see column “Happy_Sad_group” – as it is believed that a person’s mood may impact memory recall.
  • Memory scores represent response time, i.e. how long it takes to finish the memory test. Higher memory scores therefore mean that recall ability is reduced.

We will test the hypothesis that memory scores do not change as a result of treatment. 

We will inject a measure displaying the Test Result into our model and then use it in a dashboard to apply the test interactively to any scope of observations – by the drug, dosage, the happy and sad group – and their combinations to explore how these factors affect memory recall ability, without writing any additional code. 

Let’s have a quick look at the raw data.

import pandas as pd

df = pd.read_csv("")

Launching the atoti app

As a first step, I’m importing atoti and creating a session. 

import atoti as tt
from atoti.config import create_config

config = create_config(metadata_db="./metadata.db")
session = tt.create_session(config=config)

After this step, the BI tool is up and running, and I can access it using the URL provided by session.url:

The in-memory database has been spun up too, but it doesn’t have any data yet. It’s time to inject my sample data:

observations_datastore = session.read_pandas(
    df, keys=["index"], array_sep=";", store_name="Observations"
)

After running the create_cube command, I will be able to access a basic data summary:

cube = session.create_cube(observations_datastore)
m, l = cube.measures, cube.levels  # shorthand used by the measure definitions below

To visualize the data it is convenient to use atoti embedded widgets where you can slice your data right inside your notebook – cube.visualize() comes in handy here. See below an example where I’ve selected an average Memory Score – and expanded it by Drug name using the atoti JupyterLab extension.

The drug Alprazolam shows some effect on the memory recall response time; we’ll look at that in more detail later.

So far, we’ve launched an app, injected our data set and used atoti to visualize basic data summaries via atoti widgets and the app. Let’s load additional attributes and create metrics for hypothesis testing.

Extending the data model 

At any point I can enrich the data model by adding more attributes into datastores. For example, in the following cell I’m joining the observations with “age groups”:

age_groups_store = session.read_pandas(
    pd.DataFrame(
        data=[("0-25Y", i) for i in range(25)]
        + [("25Y - 40Y", i) for i in range(25, 40)]
        + [("40Y - 55Y", i) for i in range(40, 55)]
        + [("55Y+", i) for i in range(55, 100)],
        columns=["age group", "age"],
    ),
    store_name="Age Groups",
)


Voilà, now we can summarize the data along the new attribute, Age Group:

You can always view your current data model by calling cube.schema:

Injecting more data doesn’t require changing the measures and can safely be done at any point.
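
The lookup table built in the age-group cell earlier in this section has one row per integer age. As a quick check outside atoti, the sketch below rebuilds the same rows in plain Python and shows where the bucket boundaries fall. The name `age_to_group` is introduced here for illustration only.

```python
# Rebuild the age-to-group lookup rows exactly as in the age-group cell.
rows = (
    [("0-25Y", i) for i in range(25)]
    + [("25Y - 40Y", i) for i in range(25, 40)]
    + [("40Y - 55Y", i) for i in range(40, 55)]
    + [("55Y+", i) for i in range(55, 100)]
)

# Turn the rows into a plain dict: age -> age group.
age_to_group = {age: group for group, age in rows}

# Every integer age from 0 to 99 appears exactly once.
assert len(rows) == len(age_to_group) == 100

# range() excludes its upper bound, so age 25 falls into the second bucket.
print(age_to_group[24])       # 0-25Y
print(age_to_group[25])       # 25Y - 40Y
print(age_to_group.get(100))  # None: ages of 100 and above have no group
```

Patients aged 100 or older would therefore have no matching age group in this table.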

Defining reusable measures

So far, we have loaded the data and were able to explore it using the default aggregation functions. Now, let’s see how to create custom measures in atoti. Every time we define a new measure, it is published into the app and can be used by anyone on the team to analyze data there.

I want to start with a simple example to illustrate how custom calculations can be added into the app. Here I’m creating measures that will display the standard statistics:

m["Mean"] = tt.agg.mean(observations_datastore["MemoryScores"])
m["Std"] = tt.agg.std(observations_datastore["MemoryScores"])
m["Min"] = tt.agg.min(observations_datastore["MemoryScores"])
m["Max"] = tt.agg.max(observations_datastore["MemoryScores"])
m["25%"] = tt.agg.quantile(observations_datastore["MemoryScores"], 0.25)
m["50%"] = tt.agg.quantile(observations_datastore["MemoryScores"], 0.50)
m["75%"] = tt.agg.quantile(observations_datastore["MemoryScores"], 0.75)

The new measures are created and can be used by anyone who has access to the application. For example, in the screenshot below the new measures are visualized and broken down by “Happy_Sad_group”.
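
To sanity-check what such measures return, the same statistics can be computed outside atoti on a small sample. The sketch below uses Python’s standard `statistics` module on made-up scores. The conventions chosen here (sample standard deviation, linearly interpolated quartiles) are assumptions and may differ from atoti’s defaults.

```python
import statistics

# Made-up memory scores, for illustration only (not taken from the data set).
scores = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 70.0]

summary = {
    "Mean": statistics.mean(scores),
    "Std": statistics.stdev(scores),  # sample standard deviation (divides by n - 1)
    "Min": min(scores),
    "Max": max(scores),
}

# Quartiles with linear interpolation between the closest ranks.
q25, q50, q75 = statistics.quantiles(scores, n=4, method="inclusive")
summary.update({"25%": q25, "50%": q50, "75%": q75})

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```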

The atoti API provides numerous aggregation and mathematical functions, ‘case when’ and other types of expressions to allow complex aggregations. Paired with hierarchical data support (for example, a parent_value function and using siblings as an aggregation scope), multi-dimensional analysis (location shifts) and vector aggregations it allows the design of fairly complex on-the-fly aggregations. We will look into implementing a hypothesis testing metric below, and you are invited to explore atoti’s gallery for more examples.

More complex aggregations

The possibilities for creating analytical measures in atoti are practically endless. Here we want to define a measure that applies a paired t-test and simply displays “H0 rejected” or “Can’t reject H0” for any scope of data that we select.

While hypothesis testing itself is not that novel, using atoti gives us the ability to interactively apply it to the data. As illustrated by the following animation, we are displaying the “Test Result” measure and then breaking it down for different scopes of data – by the drug, by dosage and by age – and the measure is recomputed on-the-fly based on the selected scope – the mean, stdev, degrees of freedom and critical values are re-evaluated automatically.

Now let’s look at the code snippets creating the “Test Result” measure.

As a refresher, a paired t-test is a statistical routine that can help to test a medication effect, given the before and after measurements. We will check whether the data provide evidence that allows us to reject the null hypothesis:

  • H0: on average, there’s no difference in the memory scores before and after treatment,
  • H1: on average, the memory score after the treatment is larger (response time longer) than before.

Or, stated in terms of the mean difference mu between the after and before measurements:

  • H0: the mean difference of memory scores is equal to 0, mu = 0
  • H1: the mean difference of memory scores is above 0, mu > 0.

First, we need to compute the t-statistic for the differences between memory scores after and before treatment. The statistic is defined by the formula:
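
Written out with symbols chosen here for illustration ($\bar{d}$ for the mean of the paired differences, $s_d$ for their standard deviation, $n$ for the number of differences), and with the hypothesised mean difference equal to 0, the statistic has the same shape as the “t-statistic” measure defined further down:

```latex
t = \frac{\bar{d}}{s_d / \sqrt{n}}
```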

I’ll need mean, stdev and the number of observations (differences) to compute it.

Let’s create a measure for the difference. I’m using atoti’s filter function to create average memory scores for the “Before” and “After” measurements, and then taking their difference.

m["MemoryScoresAfter.Mean"] = tt.filter(
    m["MemoryScores.MEAN"], l["Before or After"] == "After"
)
m["MemoryScoresBefore.Mean"] = tt.filter(
    m["MemoryScores.MEAN"], l["Before or After"] == "Before"
)

m["Diff.Mean"] = m["MemoryScoresAfter.Mean"] - m["MemoryScoresBefore.Mean"]
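
Taking the difference of two filtered averages gives the same number as averaging the per-patient differences, provided every patient has exactly one “Before” and one “After” row. The plain-Python sketch below checks this on made-up rows. The helper `filtered_mean` and the data are illustrative assumptions, not part of the atoti API.

```python
from statistics import mean

# Made-up long-format observations: one row per patient and measurement.
observations = [
    {"Patient_Id": 1, "Before or After": "Before", "MemoryScores": 50.0},
    {"Patient_Id": 1, "Before or After": "After", "MemoryScores": 58.0},
    {"Patient_Id": 2, "Before or After": "Before", "MemoryScores": 62.0},
    {"Patient_Id": 2, "Before or After": "After", "MemoryScores": 61.0},
    {"Patient_Id": 3, "Before or After": "Before", "MemoryScores": 44.0},
    {"Patient_Id": 3, "Before or After": "After", "MemoryScores": 52.0},
]


def filtered_mean(rows, when):
    """Average MemoryScores over the rows whose 'Before or After' equals `when`."""
    return mean(r["MemoryScores"] for r in rows if r["Before or After"] == when)


# Difference of the two filtered means, mirroring the "Diff.Mean" measure.
diff_mean = filtered_mean(observations, "After") - filtered_mean(observations, "Before")

# Mean of the per-patient differences, computed pair by pair.
patients = sorted({r["Patient_Id"] for r in observations})
per_patient = []
for p in patients:
    own_rows = [r for r in observations if r["Patient_Id"] == p]
    per_patient.append(filtered_mean(own_rows, "After") - filtered_mean(own_rows, "Before"))

print(diff_mean, mean(per_patient))  # both are 5.0
```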

Every time we define a new measure, it is published into the app and can be used for data visualization, or it can be used as an input to another function, creating a chain of calculations. Having defined the differences, we can create a standard deviation aggregation across patients on top of them:

m["Diff.Std"] = tt.agg.std(m["Diff.Mean"], scope=tt.scope.origin(l["Patient_Id"]))

To compute the number of observations we will use atoti’s count_distinct:

m["Number of observations"] = tt.agg.count_distinct(

We are ready to create a measure for the t-statistic now:

m["t-statistic"] = m["Diff.Mean"] / (
    m["Diff.Std"] / tt.sqrt(m["Number of observations"])
)
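
As a plain-Python cross-check of the arithmetic, the sketch below computes the same ratio for a handful of made-up differences. It uses the sample standard deviation (dividing by n - 1), which is an assumption about how the “Diff.Std” measure is computed. The variable names are illustrative only.

```python
import math
import statistics

# Made-up per-patient differences (after minus before), for illustration only.
differences = [8.0, -1.0, 8.0, 3.0, 2.0]

n = len(differences)                      # number of paired observations
diff_mean = statistics.mean(differences)  # average difference
diff_std = statistics.stdev(differences)  # sample standard deviation (n - 1)

# t = mean / (std / sqrt(n)), the same arithmetic as the t-statistic measure.
t_statistic = diff_mean / (diff_std / math.sqrt(n))
print(round(t_statistic, 3))  # about 2.272
```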

We will compare the t-statistic to the right-tail critical value, and if it’s above the critical value, we will conclude that the data provide evidence to reject the null hypothesis. Let’s load the 95% critical values into the cube.

Now, depending on the number of observations for each cell, we will pick a critical value and visualize it as a measure:

from scipy.stats import t

# Loading a "table" of critical values: 100 values from the t-distribution
# plus the normal approximation, 101 values in total.
m["t-critical values list"] = [t.ppf(0.95, d) for d in range(1, 101)] + [1.645]

# Computing degrees of freedom as the number of observations minus 1:
df = m["contributors.COUNT"] - 1

# Shifting the df by -1 to use as an index and look up critical value from the list:
df_as_index = df - 1

# If the index runs past the end of the list, cap it at the last entry (1.645):
capped_df_as_index = tt.where(df_as_index > 100, 100, df_as_index)

# This measure will be looking up a critical value for the current scope:
m["t-critical"] = m["t-critical values list"][capped_df_as_index]
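
The index arithmetic in this lookup is easy to get off by one, so here is the same idea in plain Python. The table is deliberately shortened to 10 degrees of freedom to keep the sketch small (the cell above uses 100), and `T_CRITICAL` and `t_critical` are names invented for this illustration.

```python
# Right-tail 95% critical values of Student's t for 1..10 degrees of freedom,
# followed by the normal approximation used once the table runs out.
# The table is cut at 10 entries only to keep the sketch short.
T_CRITICAL = [6.314, 2.920, 2.353, 2.132, 2.015, 1.943, 1.895, 1.860, 1.833, 1.812]
T_CRITICAL.append(1.645)


def t_critical(n_observations):
    """Look up the critical value for a sample of `n_observations` differences."""
    if n_observations < 2:
        raise ValueError("at least two observations are needed")
    dof = n_observations - 1             # degrees of freedom
    index = dof - 1                      # 1 degree of freedom sits at position 0
    last = len(T_CRITICAL) - 1
    return T_CRITICAL[min(index, last)]  # cap the index at the fallback entry


print(t_critical(2))    # 6.314 (1 degree of freedom)
print(t_critical(6))    # 2.015 (5 degrees of freedom)
print(t_critical(500))  # 1.645 (beyond the table, normal approximation)
```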

Finally, the “Test Result” measure displays whether the observed t-statistic is to the right of the critical value, i.e. whether there’s evidence that H0 can be rejected. It will visualize the result of the t-test every time we expand and collapse data.

m["Test Result"] = tt.where(
    m["t-statistic"] > m["t-critical"], "H0 rejected", "Can't reject H0"
)

Let’s see a small illustration in the next section.

Applying the test interactively

Having defined the measures, we can apply them to any group of data and run the test interactively. The metrics are computed on-the-fly from the input data and can follow any sophisticated rules that we have defined.
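
What the pivot table does on every expand or collapse can be mimicked in plain Python: group the per-patient differences by an attribute and rerun the test on each group. The data, the group labels and the helper `paired_t_result` below are made up for illustration, and the critical-value table is shortened to 7 degrees of freedom.

```python
import math
import statistics
from collections import defaultdict

# Right-tail 95% critical values of Student's t, keyed by degrees of freedom.
T_CRITICAL = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015, 6: 1.943, 7: 1.895}

# Made-up per-patient differences (after minus before) with a group label.
differences = [
    ("drug X", 8.0), ("drug X", 6.0), ("drug X", 9.0), ("drug X", 7.0),
    ("placebo", 1.0), ("placebo", -2.0), ("placebo", 0.5), ("placebo", -0.5),
]


def paired_t_result(values):
    """Right-tail paired t-test at the 95% level on a list of differences."""
    n = len(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    t_statistic = statistics.mean(values) / std_error
    return "H0 rejected" if t_statistic > T_CRITICAL[n - 1] else "Can't reject H0"


# Total first, then one result per group, like expanding a pivot table row.
results = {"Total": paired_t_result([value for _, value in differences])}
by_group = defaultdict(list)
for group, value in differences:
    by_group[group].append(value)
for group, values in by_group.items():
    results[group] = paired_t_result(values)

for scope, outcome in results.items():
    print(f"{scope}: {outcome}")
```

In this toy data the total and one group reject H0 while the other group does not, which is the kind of difference that only shows up once the test is broken down by attribute.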

We expand by the name of the drug, then by dosage, then by the patients’ “Happy/Sad” group, and this is what we find:

  • there’s evidence that the drug Alprazolam had an impact on the memory scores (response time), while Sugar and Triazolam did not
  • when we break down the Alprazolam observations by dosage, we notice that only dosages 2 and 3 result in a statistically significant increase in the response time
  • neither Happy nor Sad memories can compensate for the obstructive impact of the drug on memory recall ability, which can be seen in the following view:

Alternative data sets

Let’s imagine that we obtained memory scores data using an alternative methodology, and want to compare the test results side-by-side with the original approach.

The good news is that we don’t need to recreate calculations and datastores. We just need to create a scenario using this one-liner:

observations_datastore.scenarios["Multicenter study data"].load_csv(

After adding the “Source Simulation” hierarchy onto the columns, we obtain a side-by-side comparison with the original data set:

You can upload as many versions of the data as you wish and use the automatically created “Source Simulation” dimension to calculate existing measures on top of them and visualize them side by side. To read more, please refer to this post: Two easy ways to perform simulations in atoti.
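
The idea behind scenarios, defining measures once and evaluating them on several versions of the data, can be pictured with a small plain-Python analogy. This is only an analogy with made-up numbers and hypothetical names (`scenarios`, `measures`, `table`); it says nothing about how atoti implements scenarios internally.

```python
import statistics

# Made-up per-patient differences for the original data and one alternative source.
scenarios = {
    "Original data": [8.0, -1.0, 8.0, 3.0, 2.0],
    "Multicenter study data": [5.0, 4.0, 6.0, 5.5, 4.5],
}

# Measures are defined once, as functions of whatever data is in scope.
measures = {
    "Diff.Mean": statistics.mean,
    "Diff.Std": statistics.stdev,
}

# Evaluate every measure on every scenario to get a side-by-side table.
table = {
    name: {scenario: round(measure(values), 3) for scenario, values in scenarios.items()}
    for name, measure in measures.items()
}

for name, row in table.items():
    print(name, row)
```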

Instead of a conclusion

I invite you to test-drive atoti and reach out to me. I’ll be happy to walk you through using atoti for your own project.