What is the most problematic data type you have ever dealt with when working with data? I would say dates! Depending on the locale, dates come in different formats such as YYYY-mm-dd, d/m/YYYY, d-mmm-yy etc. Not to mention, sometimes it comes with timestamps and time zones!
We can let programs infer the date format or explicitly cast the data to date with a specific format e.g. in Python with Pandas DataFrame:
from datetime import datetime
import pandas as pd
custom_date_parser = lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S")
df = pd.read_csv(
"data/events1.csv", parse_dates=["startDate1"], date_parser=custom_date_parser
)
What if we open the CSV file in Microsoft Excel, update it, and try to read it again? The above code snippet will throw out exceptions such as this:
That’s because Microsoft Excel usually automatically changes the dates into the d/m/yyyy or m/d/yyyy format (based on your locale).
Discovering dates in Atoti
So, how do dates behave in Atoti? First of all, Atoti supports the following types for dates:
- LocalDate
- LocalDateTime
- LocalTime
- ZonedDateTime
During table creation, we can pre-define the data types accordingly for each column:
ticker_hist = session.create_table(
name="ticker_history",
keys=["Datetime", "Ticker"],
types={
"Datetime": tt.type.LOCAL_DATE,
"Ticker": tt.type.STRING,
"Adj Close": tt.type.FLOAT,
"Close": tt.type.FLOAT,
"High": tt.type.FLOAT,
"Low": tt.type.FLOAT,
"Open": tt.type.FLOAT,
"Volume": tt.type.FLOAT,
},
)
Alternatively, when we create a table by reading from a source, Atoti either inherits the data types from the source’s metadata, as in the case of Pandas DataFrame or Parquet; or it infers the data type based on the first 1,000 lines of data.
ticker_hist = session.read_csv(
"historical_returns/test-ticker.csv",
table_name="ticker_history",
keys=["Datetime", "Ticker"],
)
The default accepted format for dates is yyyy-MM-dd as shown below:
This means that if our data is already in the mentioned format, Atoti can infer that it’s a local date type and upload it directly to the table.
For all other date formats, we have to declare the type of the data column and its date pattern. Otherwise, it will assume it is of String type.
String type vs date type
There are some issues that we might encounter when our date is treated like a string.
Display in chronicle order
Suppose we need our query to be ordered in chronological order and our date is of the format d/m/yyyy. Apparently, string sorting doesn’t go down well for this date format as shown below:
Date Bucketing
Now that we have our date data as date type, let’s look at a common use case around dates – bucketing.
Using the Atoti function create_date_hierarchy
, we can create date bucketing with only a date column.
cube.create_date_hierarchy(
"Date parts",
column=ticker_hist["Datetime"],
levels={"Year": "yyyy", "Quarter": "QQQ", "Month": "MMM", "Day": "dd"},
)
With this, you don’t have to pre-process the data to get separate columns for the Year, Quarter, Month and Day.
Note: Since the Atoti server is built using Java, the date patterns follow the datetime formatter pattern defined for Java.
Declaring date-type data and date patterns
So, we have our date in the format d/m/yyyy. How do we load it as a date type into the Atoti table?
Declare the following for the data column:
- Type mapping –
types={"Datetime": tt.type.LOCAL_DATE}
- Date pattern –
date_patterns={"Datetime": "d/M/y"}
Remember, the date formatter pattern follows the Java formatter as provided in the table in the previous section.
ticker_hist = session.read_csv(
"historical_returns/test-ticker.csv",
table_name="ticker_history",
keys=["Datetime", "Ticker"],
types={"Datetime": tt.type.LOCAL_DATE},
date_patterns={"Datetime": "d/M/y"},
)
Regardless of the source file date patterns, all displays in Atoti are standardized to the format yyyy-MM-dd. Thus, we can have a neat and consistent display across all our queries.
ZonedDateTime
Just one more example before ending this article. Let’s take a quick look at the historical data downloaded from yFinance.
What we get is actually a ZoneDateTime:
This time round, we’ll declare the data type as ZONED_DATE_TIME
. We declare the date pattern accordingly, including a lowercase z
for the time zone.
ticker_hist = session.read_csv(
"historical_returns/*.csv",
table_name="ticker_history",
keys=["Datetime", "Ticker"],
types={"Datetime": tt.type.ZONED_DATE_TIME},
date_patterns={"Datetime": "yyyy-MM-dd HH:mm:ssz"},
)
We hope this article helps you understand dates in Atoti Python better. To see more examples of Atoti, check out the Atoti CE notebook gallery. Feel free to give it a try with the free Atoti Community Edition.