“Open Source is a win-win for everyone” - Philip Vollet

‘One of the benefits of growing an open-source product is the community around it. They contribute to the code and grow their skills together.’

I have now interviewed quite a few inspiring members of the data science community, and I have noticed that they all have one thing in common: they are not afraid to take a break from their corporate jobs and spend their time exploring what they like.

Philip Vollet

Philip Vollet is one of them. We chatted about his career path, his passion for communication and NLP, and his involvement in the open-source community.

Huifang: What did you learn from your first experience at MCL IT Solutions — from the junior technical role to the managerial position?

Philip: My junior roles required a lot of technical knowledge, for example about the frameworks we were using in the Operations team. However, people skills were also central — to manage the team, to communicate with clients and to deal with the challenges we were facing. Communication was part of my major and it’s also one of my favourite subjects.

That is probably why, with time, I gravitated towards a role that allows me to use my communication skills more, while also integrating all of the technical knowledge I have acquired over the years.

Huifang: What do you think about the importance of communication, even in the data and technology community?

Philip: It’s essential. There are different steps to implementing new technologies and approaches, and there is always some degree of resistance from stakeholders. It almost always boils down to stakeholder management — it’s about understanding the hidden interests and communicating the bigger vision behind a project clearly.

“People function better with stories — and technology implementations are all about telling a good story”

Huifang: What drove you to start your own consultancy company, the moonwalk venture, and what have you learnt from that experience?

Philip: I needed a break from corporate games. moonwalk GmbH was a small freelancing business that gave me more time to study and learn new stuff. For five years, I got to choose the projects that I wanted to work on and where to focus my attention.

I was freelancing for a KPMG project when they offered me a role there. At that point, I was deciding whether to hire new people and grow my business or to go back to a bigger company.

All freelancers struggle with income stability from time to time — you never know how many projects you will work on next month, maybe many, maybe none.

So I decided to give KPMG a go.

Huifang: What are you working on at KPMG?

Philip: It’s a mix of data science and data engineering. We own the internal data, and we build data pipelines with machine learning to analyze the internal KPMG data. We act as a filtering layer for projects that need an API or access to our internal data. We also build KPIs for our management.

Huifang: Do you have any tips for someone new to this industry — transitioning from technical roles into the data science arena?

Philip: My role at KPMG is mostly about data and how to train models. However, looking at our machine learning projects, most of the work boils down to data engineering, which keeps growing in importance.

Overall, companies need two types of data specialists — the data science people and the engineering people. Data is the golden nugget for both, so people should really study how data pipelines work to understand where they fit and what they need to learn to progress in the field.

It’s also good to stay up to date with research on the state of the art and the latest implementations in machine learning. I do that regularly through my LinkedIn activity and my work at KPMG mentoring students writing their theses. I read a lot of academic papers and keep track of what’s new in the industry.

Another interesting point is that people approaching the data community often hide behind their math anxiety. However, you really don’t need to be a math expert to enter the world of machine learning. It’s enough to understand how the math works.

Math is about logic — you need to understand the concept rather than being a pro at linear algebra.

Huifang: Did you fall in love with NLP at KPMG?

Philip: NLP is an old love from 2016, when I was getting into machine learning. At the time, though, it was mainly statistical machine learning. It’s a growing field and there’s so much more in store.

As part of the job at KPMG, I supervise academic theses. At the moment, students are writing about supplement reviews and adverse drug effects, using NLP to analyze the data. There are also fun projects, such as training chatbots to tell jokes.

Huifang: What’s the relationship between NLP and open source/open science?

Philip: Big companies have big machine learning models in place and spend millions of dollars to train them, but they still depend on open source and open science to advance.

One of the benefits of growing an open-source product is the community around it. They contribute to the code and grow their skills together, so it’s a win-win for everyone involved.

Brainsources for #NLP enthusiasts, maintained by Philip

Open-source communities are super supportive; you get help when you need it. But we also need to watch out for some aspects of open source — for instance, how cloud providers adopt open-source software without adding value or supporting future development.

Huifang: How would you encourage people to contribute to open source?

Philip: Make them curious. If they are interested in the technology, they will dig into it. It’s a journey you have to start. Don’t hold back thinking your code isn’t good enough or doesn’t work. I also started as a script kid — and now we are deploying big machine learning pipelines!

Follow Philip on Twitter and LinkedIn for exciting updates on NLP and machine learning.



atoti Free Community Edition is developed and brought to you by ActiveViam. Learn more about ActiveViam at activeviam.com.