Top Twitter Topics by Data Scientists #34

Trending this week: boost your NLP projects using FLAN by Google AI; use Maze to solve real-world reinforcement learning problems; perform efficient similarity search with FAISS by Facebook AI Research.

Every week we analyze the most discussed topics on Twitter by Data Science & AI influencers.

The following topics, URLs, resources, and tweets have been automatically extracted using a topic modeling technique based on Sentence BERT, which we have enhanced to fit our use case.

Want to know more about the methodology used? Check out this article for more details, and find the code in this GitHub repository!


This week, Data Science and AI influencers on Twitter have talked about:

  • ML Updates
  • AI Discussions
  • Data Science How-Tos

ML Updates

This week, data science and AI influencers have shared some really interesting updates on machine learning.

KDnuggets has shared a post introducing the Fine-tuned LAnguage Net (FLAN), a more generalizable language model built with instruction fine-tuning. Instruction fine-tuning, or instruction tuning for short, is a simple technique introduced by Google Research. It involves fine-tuning a model not to solve a specific task, but to make it more amenable to solving NLP tasks in general. FLAN is not the first model to be trained on a set of instructions, but Google Research states that it is the first to apply this technique at scale and to show that it can improve the generalization ability of the model.
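
To make the idea concrete, here is a minimal sketch of how a labelled example might be rephrased as natural-language instructions before fine-tuning. The templates, the function name `to_instruction_examples`, and the sentiment example are illustrative assumptions, not the templates used by Google Research.

```python
# Hypothetical templates: each one phrases the same sentiment task as a
# different natural-language instruction. They are illustrative only.
TEMPLATES = [
    "Is the sentiment of the following review positive or negative?\n{text}",
    "Review: {text}\nDid the reviewer like the product? Answer positive or negative.",
]


def to_instruction_examples(text, label):
    """Turn one labelled example into several (prompt, target) pairs."""
    return [(template.format(text=text), label) for template in TEMPLATES]


pairs = to_instruction_examples("The battery lasts all day.", "positive")
for prompt, target in pairs:
    print(prompt)
    print("->", target)
```

Each labelled example yields one training pair per template, so the model sees the same task described in several ways instead of a single fixed input format.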

Mike Tamir has shared:

A tutorial that gets you started with Facebook AI Similarity Search (FAISS), a library developed by Facebook AI that enables efficient similarity search. Given a set of vectors, you can index them using FAISS; then, using another vector (the query vector), you can retrieve the most similar vectors within the index. The article explores some of the options FAISS provides, how they work, and, most importantly, how FAISS can make your search faster.

An article introducing Maze, a new framework for applied reinforcement learning (RL) aimed at real-world problems. Beyond introducing Maze, the blog post also outlines the motivation behind it, what distinguishes it from other RL frameworks, and how Maze can support you when applying RL and, hopefully, prevent some headaches along the way.
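
The index-then-query workflow described for FAISS above can be sketched in plain Python. This is a brute-force illustration of exact nearest-neighbour search using squared Euclidean distance, not the FAISS API; the class name `FlatIndex` and the sample vectors are made up for this sketch. In FAISS itself, the analogous steps are building an index such as `faiss.IndexFlatL2`, calling `add` with the vectors, and calling `search` with the query vectors.

```python
import heapq


class FlatIndex:
    """Toy exact index: stores vectors and scans all of them at query time."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []

    def add(self, vectors):
        """Store the given vectors, checking that each has the right size."""
        for vec in vectors:
            if len(vec) != self.dim:
                raise ValueError("vector has wrong dimension")
            self.vectors.append(list(vec))

    def search(self, query, k):
        """Return the k closest vectors as (squared distance, position) pairs."""
        scored = (
            (sum((a - b) ** 2 for a, b in zip(vec, query)), pos)
            for pos, vec in enumerate(self.vectors)
        )
        return heapq.nsmallest(k, scored)


index = FlatIndex(dim=2)
index.add([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]])

# Query with a vector and retrieve the two most similar stored vectors.
results = index.search([1.0, 1.0], k=2)
print(results)
```

A full scan like this grows linearly with the number of stored vectors, which is the cost that more specialised index structures aim to reduce.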

AI Discussions

Data science and AI influencers have also shared articles discussing artificial intelligence.

Andreas Staub has shared an article on general intelligence and the evolution of language. The article argues that a consequence of humans trading in nouns is that they spend an inordinate amount of effort attempting to understand the design of natural things without understanding why those things emerged.

He also shared an article on why artificial intelligence is not a technological revolution. The article describes AI as something able to use cognitive capabilities to perform certain tasks faster than we can. So it’s not a tool; it’s an artificial extension of our brain.

Ronald Van Loon has shared an interesting article on the best programming languages for artificial intelligence. In addition to Python, Java, and C++, the article also mentions the likes of Prolog and LISP.

Finally, Marcus Borba has shared a super interesting article titled “Artificial intelligence is smart, but does it play well with others?”. The article cites studies that mention that humans find AI to be a frustrating teammate when playing a cooperative game together, posing challenges for “teaming intelligence”.

Data Science How-Tos

Data science and AI influencers have also shared useful practical tips to apply in data science and machine learning projects.

Here is a collection published by KDnuggets:

An article presenting two better options than Pandas for big data processing. The post discusses Dask and Vaex, two Python libraries for processing larger-than-memory datasets, as faster and better alternatives to Pandas for big data analysis. It introduces the two libraries, presents an experiment based on large CSV files with 1 million rows and 1,000 columns (18 GB and 36 GB) that demonstrates their effectiveness, and finally compares the two solutions.

An article on how to transform ETL pipelines into ELT ones. This article, more related to data engineering, provides very insightful advice on data governance. It explains what ETL and ELT data pipelines are, describes how the author redesigned more than 100 ETL pipelines as ELT pipelines in his organization, and goes through the reasons for doing so.

A guide to dealing with data leakage. This post talks about target leakage and data leakage, two challenging problems in machine learning, and gives hints on how to recognize and avoid these potentially messy problems.
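
One common form of data leakage is computing preprocessing statistics on the full dataset before splitting it. The following is a minimal sketch with made-up numbers; the variable names and the choice of standardization as the preprocessing step are assumptions for illustration, not taken from the KDnuggets guide.

```python
from statistics import mean, pstdev

# Made-up feature values; the last two rows play the role of the test set.
values = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]
train, test = values[:4], values[4:]

# Leaky: the scaling statistics are computed on train and test rows together.
leaky_mean, leaky_std = mean(values), pstdev(values)

# Safer: fit the statistics on the training rows only, then reuse them.
train_mean, train_std = mean(train), pstdev(train)


def standardize(rows, mu, sigma):
    """Scale rows with the given mean and standard deviation."""
    return [(x - mu) / sigma for x in rows]


leaky_test = standardize(test, leaky_mean, leaky_std)
clean_test = standardize(test, train_mean, train_std)

print("train-only mean:", train_mean)
print("full-data mean:", round(leaky_mean, 2))
print("test rows, leaky scaling:", [round(x, 2) for x in leaky_test])
print("test rows, train-only scaling:", [round(x, 2) for x in clean_test])
```

With the leaky statistics, the test rows look far less unusual than they do under train-only scaling, because information about the test set has already shaped the preprocessing.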