Atoti: Working with dates in Python

What is the most problematic data type you have ever dealt with when working with data? I would say dates! Depending on the locale, dates come in different formats such as YYYY-mm-dd, d/m/YYYY, d-mmm-yy etc. Not to mention, sometimes it comes with timestamps and time zones!

We can let programs infer the date format or explicitly cast the data to date with a specific format e.g. in Python with Pandas DataFrame:

from datetime import datetime

import pandas as pd

custom_date_parser = lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S")
df = pd.read_csv(
    "data/events1.csv", parse_dates=["startDate1"], date_parser=custom_date_parser
)

What if we open the CSV file in Microsoft Excel, update it, and try to read it again? The above code snippet will throw out exceptions such as this:

That’s because Microsoft Excel usually automatically changes the dates into the d/m/yyyy or m/d/yyyy format (based on your locale).

Excel automatically changes the date format from yyyy-MM-dd to d/m/yyyy

Discovering dates in Atoti

So, how do dates behave in Atoti? First of all, Atoti supports the following types for dates:

  • LocalDate
  • LocalDateTime
  • LocalTime
  • ZonedDateTime

During table creation, we can pre-define the data types accordingly for each column:

ticker_hist = session.create_table(
    name="ticker_history",
    keys=["Datetime", "Ticker"],
    types={
        "Datetime": tt.type.LOCAL_DATE,
        "Ticker": tt.type.STRING,
        "Adj Close": tt.type.FLOAT,
        "Close": tt.type.FLOAT,
        "High": tt.type.FLOAT,
        "Low": tt.type.FLOAT,
        "Open": tt.type.FLOAT,
        "Volume": tt.type.FLOAT,
    },
)

Alternatively, when we create a table by reading from a source, Atoti either inherits the data types from the source’s metadata, as in the case of Pandas DataFrame or Parquet; or it infers the data type based on the first 1,000 lines of data.

ticker_hist = session.read_csv(
    "historical_returns/test-ticker.csv",
    table_name="ticker_history",
    keys=["Datetime", "Ticker"],
)

The default accepted format for dates is yyyy-MM-dd as shown below: 

This means that if our data is already in the mentioned format, Atoti can infer that it’s a local date type and upload it directly to the table. 

For all other date formats, we have to declare the type of the data column and its date pattern. Otherwise, it will assume it is of String type.

String type vs date type

There are some issues that we might encounter when our date is treated like a string. 

Display in chronicle order

Suppose we need our query to be ordered in chronological order and our date is of the format d/m/yyyy. Apparently, string sorting doesn’t go down well for this date format as shown below:

Date Bucketing

Now that we have our date data as date type, let’s look at a common use case around dates – bucketing.

Using the Atoti function create_date_hierarchy, we can create date bucketing with only a date column.

cube.create_date_hierarchy(
    "Date parts",
    column=ticker_hist["Datetime"],
    levels={"Year": "yyyy", "Quarter": "QQQ", "Month": "MMM", "Day": "dd"},
)

With this, you don’t have to pre-process the data to get separate columns for the Year, Quarter, Month and Day.

Note: Since the Atoti server is built using Java, the date patterns follow the datetime formatter pattern defined for Java.

Formatter patterns as provided in Oracle docs.

Declaring date-type data and date patterns

So, we have our date in the format d/m/yyyy. How do we load it as a date type into the Atoti table?

Declare the following for the data column:

  1. Type mapping – types={"Datetime": tt.type.LOCAL_DATE}
  2. Date pattern – date_patterns={"Datetime": "d/M/y"}

Remember, the date formatter pattern follows the Java formatter as provided in the table in the previous section.

ticker_hist = session.read_csv(
    "historical_returns/test-ticker.csv",
    table_name="ticker_history",
    keys=["Datetime", "Ticker"],
    types={"Datetime": tt.type.LOCAL_DATE},
    date_patterns={"Datetime": "d/M/y"},
)

Regardless of the source file date patterns, all displays in Atoti are standardized to the format yyyy-MM-dd. Thus, we can have a neat and consistent display across all our queries.

ZonedDateTime

Just one more example before ending this article. Let’s take a quick look at the historical data downloaded from yFinance

What we get is actually a ZoneDateTime:

This time round, we’ll declare the data type as ZONED_DATE_TIME. We declare the date pattern accordingly, including a lowercase z for the time zone. 

ticker_hist = session.read_csv(
    "historical_returns/*.csv",
    table_name="ticker_history",
    keys=["Datetime", "Ticker"],
    types={"Datetime": tt.type.ZONED_DATE_TIME},
    date_patterns={"Datetime": "yyyy-MM-dd HH:mm:ssz"},
)

We hope this article helps you understand dates in Atoti Python better. To see more examples of Atoti, check out the Atoti CE notebook gallery. Feel free to give it a try with the free Atoti Community Edition.

Latest posts

Understanding Logs in Atoti
From the default log to how to configure additional logging Application logs are extremely important in any system! Most commonly, they are used to troubleshoot any issue that users may encounter while using an application. For instance, developers use them for debugging and the production support crew uses them to investigate outages. Not just that, in production, they are used to monitor an application’s performance and health. For instance, monitoring tools can pick up certain keywords to identify events such as “server down” or “system out of memory”. It can also serve as an audit trail to track user activity...
Understanding conditional statements in Atoti
When do we use filter, where and switch statements? We know that we can perform aggregations and multi-dimensional analysis with Atoti. Aggregation is not always as simple as 1 + 2. Sometimes we end up in an “if…else” situation, and that is where conditional statements come in. Let’s explore some examples with the XVA use case from the Atoti CE Notebook Gallery. Some definitions before moving on: Measures – we refer to metrics or quantifiable data that measure certain aspects of our goals. In the code snippets below, they are represented by measure[<name of metrics>]. Members of a level –...
What’s New in Atoti Python 0.7.3?
Python 3.11 Compatibility, an Updated Way to Define Joins, Telemetry Changes Ahead. Hey Atoti Community! We’re super excited to announce our Atoti Python 0.7.3 release. Wondering what we have in store for you? While breaking nothing this release, we have some great updates ahead. Read on to see some of the changes we’ve implemented, or check out our changelog for the full list of updates. Added Python Compatibility Excited by the promising performance improvements in Python 3.11, but bummed Atoti wasn’t compatible with it? Well, fret no more! With 0.7.3, Atoti is now compatible with Python 3.11. First and Last...

Join our Community


    Like this post ? Please share

    Documentation
    Information
    Follow Us

    atoti Free Community Edition is developed and brought to you by ActiveViam. Learn more about ActiveViam at activeviam.com.