Getting Started with atoti

atoti is a Python library and a JupyterLab extension for creating data-viz widgets, such as pivot tables and charts, in the same notebook used to build the data model. In this article, we will go step by step to:

  • Explain how to install the library
  • Show how it changes the notebook workflow
  • Start analyzing data with a small use case.

Note: This article has been tested against atoti 0.3.1.

Installation

To get started, create a new Conda environment, activate it, and run this shell command to install the atoti Conda package and its companion packages:

conda install atoti jupyterlab jupyterlab-atoti nodejs openjdk python

Note: This adds the atoti extension to JupyterLab, which triggers a rebuild of JupyterLab’s front-end assets, so the command can take several minutes to return.

Once this is done, you can start JupyterLab:

jupyter lab

Building and exploring the model

For the purpose of this guide, we’ll work on this popular Kaggle data set of trending YouTube video statistics. The goal will be to define some key metrics and create practical visualizations.

We start with some data prep with Pandas:

import pandas as pd

videos_df = pd.read_csv(
    "USvideos.csv",
    usecols=[
        "category_id",
        "channel_title",
        "title",
        "trending_date",
        "video_id",
        "views",
    ],
)

# Parse trending date and split it into year/month/day columns.
trending_date = pd.to_datetime(
    videos_df["trending_date"],
    format="%y.%d.%m"
)
videos_df["trending_date"] = trending_date.dt.date
videos_df["trending_year"] = trending_date.dt.year
videos_df["trending_month"] = trending_date.dt.month
videos_df["trending_day"] = trending_date.dt.day

videos_df.sample(5)

5 random lines of our videos DataFrame

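The format string used in the data prep, "%y.%d.%m", reads each value as two-digit year, then day, then month. As a minimal sketch, here is the same parsing applied to two made-up sample values (the strings and the names sample, parsed, and parts are illustrative, not taken from the data set):

```python
import pandas as pd

# Illustrative values laid out as YY.DD.MM, the layout the format string expects.
sample = pd.Series(["17.14.11", "18.01.02"])
parsed = pd.to_datetime(sample, format="%y.%d.%m")

# Split the parsed dates into the same columns as in the data prep step.
parts = pd.DataFrame(
    {
        "trending_date": parsed.dt.date,
        "trending_year": parsed.dt.year,
        "trending_month": parsed.dt.month,
        "trending_day": parsed.dt.day,
    }
)
print(parts)
```

Note that the second field is the day, so "17.14.11" is 14 November 2017 rather than an invalid month 14.
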
import json
from pathlib import Path

# Parse JSON file holding mapping between a category ID and its title, and make a DataFrame out of it.
category_data = json.loads(Path("US_category_id.json").read_text())
data = [
    [int(item["id"]), item["snippet"]["title"]]
    for item in category_data["items"]
]
categories_df = pd.DataFrame(data, columns=["id", "category_title"])

categories_df.head()

First five categories with their ID and title
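To make the JSON parsing step easier to follow without the file at hand, here is a small sketch that runs the same logic on an inline string. The excerpt is hypothetical and only mimics the structure the code above relies on (an "items" list whose entries have an "id" and a "snippet" with a "title"):

```python
import json

import pandas as pd

# Hypothetical excerpt with the same shape as the category file parsed above.
raw = """
{"items": [
  {"id": "1", "snippet": {"title": "Film & Animation"}},
  {"id": "10", "snippet": {"title": "Music"}}
]}
"""
category_data = json.loads(raw)

# IDs are stored as strings in the JSON, so they are converted to integers
# to match the integer category_id column of the videos DataFrame.
rows = [
    [int(item["id"]), item["snippet"]["title"]]
    for item in category_data["items"]
]
sample_categories_df = pd.DataFrame(rows, columns=["id", "category_title"])
print(sample_categories_df)
```
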

Now that the DataFrames are prepped, we can create the atoti analytical cube:

import atoti as tt

# An atoti session is a bit like a PySpark context
session = tt.create_session()

# Load the Pandas DataFrames into the atoti session.
# The API also supports loading CSV files, Parquet files, Spark DataFrames, and soon Arrow Tables.
videos_store = session.read_pandas(
    videos_df,
    # These are the DataFrame's columns that make each row unique
    keys=["video_id", "trending_date"],
    store_name="videos"
)
categories_store = session.read_pandas(
    categories_df, keys=["id"], store_name="categories"
)

# Join the two stores together (this keeps the data normalized).
videos_store.join(categories_store, mapping={"category_id": "id"})

cube = session.create_cube(videos_store, mode="manual")

In this case, we choose the manual cube creation mode so that we can shape the cube ourselves afterwards. By default, however, the cube structure is inferred from the types of the stores’ columns.
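Before loading real data, it can help to check the two assumptions the stores rely on: that the chosen keys make each row unique, and that every category_id finds a match. The sketch below does this with plain pandas on made-up rows. It is only an analogy for what the keys and the join express, not how atoti implements them, and unlike the store join, a pandas merge materializes a denormalized table:

```python
import pandas as pd

# Made-up rows for illustration only.
videos = pd.DataFrame(
    {
        "video_id": ["a", "a", "b"],
        "trending_date": ["2017-11-14", "2017-11-15", "2017-11-14"],
        "category_id": [10, 10, 1],
    }
)
categories = pd.DataFrame(
    {"id": [1, 10], "category_title": ["Film & Animation", "Music"]}
)

# The key columns should identify each row uniquely.
has_duplicate_keys = videos.duplicated(subset=["video_id", "trending_date"]).any()

# pandas counterpart of mapping category_id to id; a left merge keeps every video row.
joined = videos.merge(categories, how="left", left_on="category_id", right_on="id")
unmatched = joined["category_title"].isna().sum()

print(has_duplicate_keys, unmatched)
print(joined)
```
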

We also create analytical hierarchies – extra available axes in pivot tables or charts:

# A channel has multiple videos and each video can be renamed so it can have multiple titles.
cube.hierarchies["video"] = [
    videos_store["channel_title"],
    videos_store["video_id"],
    videos_store["title"]
]

# The trending date can also be organized with multiple levels.
cube.hierarchies["trending_date"] = [
    videos_store["trending_year"],
    videos_store["trending_month"],
    videos_store["trending_day"],
    videos_store["trending_date"]
]

# The category hierarchy has a single level: the category title.
cube.hierarchies["category"] = [categories_store["category_title"]]
cube

The cube structure is returned as a JSON tree that JupyterLab displays nicely

From there, we can create visualizations to get a sense of the data set. The visualize method on Cube instances outputs an interactive widget that can be built with mouse & keyboard inputs – no code needed.

A widget showing that there are almost always 200 trending videos per day

A widget showing the 10 channels with the most accumulated trending days

Drilling down the trending_date hierarchy while showing the numbers of trending videos per category
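The first two widgets can be cross-checked with a few lines of pandas. This is a rough sketch on made-up rows, and it assumes that "accumulated trending days" means the number of (video, day) rows per channel; the widgets themselves are built interactively and need no such code:

```python
import pandas as pd

# Made-up rows: one row per video per trending day.
videos = pd.DataFrame(
    {
        "channel_title": ["A", "A", "A", "B"],
        "video_id": ["v1", "v1", "v2", "v3"],
        "trending_date": ["d1", "d2", "d1", "d2"],
    }
)

# Number of distinct trending videos for each day.
videos_per_day = videos.groupby("trending_date")["video_id"].nunique()

# Channels ranked by their total number of (video, day) rows, top 10 only.
top_channels = (
    videos.groupby("channel_title").size().sort_values(ascending=False).head(10)
)

print(videos_per_day)
print(top_channels)
```
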
 
We’ve created these widgets without defining any specific metrics, but one of atoti’s strengths is building a data model with aggregated indicators:

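# Take the highest view count recorded for each video...
views_max = tt.agg.max(videos_store["views"])
views_per_video = tt.agg.single_value(views_max, on=["video_id"])
# ...then sum these per-video values when aggregating above the video level.
cube.measures["views"] = tt.agg.sum(views_per_video)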

Adding the views metric to cube.measures makes it directly available in the atoti JupyterLab extension:

Drilling down on the video hierarchy to see the most viewed channels and their corresponding videos
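The views measure combines two aggregations: a maximum within each video, then a sum across videos. A video appears once per trending day, so summing the raw column would count its views several times. As a hedged sketch of the same idea in plain pandas (made-up numbers; this mirrors the intent of the measure, not atoti's internals):

```python
import pandas as pd

# Made-up rows: v1 trends on two days, with a view count recorded on each day.
videos = pd.DataFrame(
    {
        "channel_title": ["A", "A", "A", "B"],
        "video_id": ["v1", "v1", "v2", "v3"],
        "views": [100, 150, 40, 70],
    }
)

# Step 1: keep the highest view count seen for each video.
views_per_video = videos.groupby(["channel_title", "video_id"])["views"].max()

# Step 2: sum the per-video values at the channel level and overall.
views_per_channel = views_per_video.groupby(level="channel_title").sum()
total_views = views_per_video.sum()

# For comparison, a naive sum counts v1 twice.
naive_total = videos["views"].sum()

print(views_per_channel)
print(total_views, naive_total)
```
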
 

Let’s define another metric that will give us the aggregated distinct count of trending videos:

cube.measures["trending_videos"] = tt.agg.count_distinct(
    videos_store["video_id"],
    on=["video_id"]
)

Sorting categories by number of trending videos
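A distinct count differs from a row count because a video that trends on several days should be counted once. Here is a minimal pandas sketch of the same sorted-by-category view, using made-up rows and a category_title column assumed to be already joined in:

```python
import pandas as pd

# Made-up rows: v1 trends on two days but is a single video.
videos = pd.DataFrame(
    {
        "category_title": ["Music", "Music", "Music", "Film & Animation"],
        "video_id": ["v1", "v1", "v2", "v3"],
    }
)

# Distinct trending videos per category, largest first.
trending_videos = (
    videos.groupby("category_title")["video_id"]
    .nunique()
    .sort_values(ascending=False)
)

# A plain row count would report 3 for Music instead of 2.
row_counts = videos.groupby("category_title").size()

print(trending_videos)
```
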
 

Let’s make one more widget:

Plotting the number of views vs. the number of trending videos per channel

 

Sharing our insights

We can publish all the widgets we’ve built in JupyterLab to the atoti dashboarding app:

Publishing a widget to the app and opening it there
 

Widgets published in the app can be added to dashboards with additional features such as quick filters and multi-selection filtering:

Filtering a dashboard in the app by category and then by channel
 

The dashboarding application is a “safe” environment: all the queries are read-only so there is no risk of breaking the model or tampering with its data.

You can share a link to your atoti app to show it to other people.

If you would like to know more, head over to the documentation.