Automated data sampling for fast application modeling

Romain Colle
March 18, 2020

Having a snappy, interactive experience is very important when modeling an Atoti application. In order to provide such an experience, Atoti can automatically sample your data during the modeling phase, and seamlessly load the full data set when publishing the application to its users.

Photo by Guillaume Jaillet on Unsplash

One of Atoti's top features is that it can provide speed-of-thought analytics on very large volumes of data. Some projects load multiple terabytes of data in memory on large machines with hundreds of cores and enjoy sub-second query response times thanks to our high-performance, multi-core columnar database.

Nonetheless, loading that much data during the modeling phase of the application is rarely a good idea. It requires a large, expensive machine, and even though Atoti excels at loading data quickly into memory (a few minutes per terabyte), there is no reason to waste time doing so when modeling can be performed very efficiently on a subset of the data.

People therefore model their applications either on their personal computer or on a cheap machine in the cloud. To do so, they used to manually extract a sample of their production data and model their application in a Jupyter notebook using that sample. Once the model was ready, they had to change their code to point to the actual data, sometimes encountering unforeseen issues and verifying that the model still matched the full data set, before finally being able to deploy their application to their users.


To ease and speed up this process, we have incorporated an automated sampling mechanism into the latest version of Atoti. When modeling your application, you can write your code against the actual production data, and the library will automatically sample that data and load only a subset of it. The sampling behavior can be configured when creating the session: https://docs.atoti.io/0.3.1/lib/atoti.html#atoti.create_session
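For illustration, here is a minimal sketch of what that can look like. It assumes the sampling_mode parameter documented for create_session and a first_lines sampling policy from the atoti.sampling module; the file path, store name, and row threshold are made up for the example:

```python
import atoti as tt

# Create a session that samples data sources instead of loading them entirely.
# Here the assumed policy keeps only the first 10,000 lines of each file
# during the modeling phase (see the create_session documentation linked above).
session = tt.create_session(sampling_mode=tt.sampling.first_lines(10_000))

# Point directly at the production files: only the sampled subset is loaded for now,
# so the same code will work unchanged against the full data set later.
trades = session.read_csv("trades.csv", keys=["trade_id"], store_name="Trades")
cube = session.create_cube(trades, "Trades")
```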

Loading a subset of the production data keeps the modeling phase snappy and ensures that the resulting code will not need to change to handle the full data set. Once the application is ready to be consumed by its users, the full data set is loaded by calling session.load_all_data(): https://docs.atoti.io/0.3.1/lib/atoti.html#atoti.session.Session.load_all_data
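Continuing the hypothetical sketch above, the end of the notebook would then be a single call:

```python
# Replay every data load on the full production data set
# once the model is ready to be published to its users.
session.load_all_data()
```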
