Top Twitter Topics by Data Scientists #31

Trending this week: Read about TLDR for extreme summarization of scientific documents; Use Lightning-Flash for fast prototyping of your deep learning solutions; How to correctly select a sample from a huge dataset.

Every week we analyze the most discussed topics on Twitter by Data Science & AI influencers.

The following topics, URLs, resources, and tweets have been automatically extracted using a topic modeling technique based on Sentence BERT, which we have enhanced to fit our use case.

Want to know more about the methodology used? Check out this article for more details, and find the code in this GitHub repository!


This week, Data Science and AI influencers on Twitter have talked about:

  • ML Updates
  • ML How-Tos
  • Discussions on AI

ML Updates

This week, the influencers shared several notable machine learning updates.

Nige Willson shared an AI project that distills research papers into a single sentence. The project introduces language-generation software that uses deep neural networks trained on vast amounts of text to provide a new form of extreme summarization for scientific papers. The tool is an implementation of the paper “TLDR: Extreme Summarization of Scientific Documents”. It aims to help readers skim new scientific papers quickly, and it can also help non-specialists who are struggling to find the gist of a complicated paper. It is very easy to use: enter a paper’s abstract and the site generates a short summary.

Mike Tamir shared a post, Introducing Lightning Flash: From Deep Learning Baseline To Research in a Flash. Flash is a collection of tasks for fast prototyping, baselining, and fine-tuning scalable deep learning models, built on PyTorch Lightning. The post gives a quick description of Flash and how it works, and illustrates how to use it for inference, fine-tuning, and training from scratch.

Finally, KDnuggets shared an article with a few tips, tricks, and hacks for data scientists using Streamlit, a free Python tool that turns data scripts into shareable web apps in minutes. Applying these tips can help you improve how you deliver your data science or machine learning solutions.

ML How-Tos

The data science and AI influencers have also been posting articles providing guidance on how to use machine learning to solve specific use cases.

ipfconline shared an article on How to Determine if Your Machine Learning Model is Overtrained. The post tackles the trap of overfitting and explains how to spot it using WeightWatcher, an open-source diagnostic tool for analyzing Deep Neural Networks (DNNs) without needing access to training or even test data. WeightWatcher analyzes the weight matrices of a pre-trained DNN, layer by layer, to help you detect potential problems that cannot be seen by looking at the test accuracy or the training loss alone.

Fig. 2. Overtraining (Image from Unsplash)

KDnuggets shared:

An article on How (not) to use Machine Learning for time series forecasting. This article discusses issues that arise when applying machine learning to time series forecasting and how to avoid some of the common pitfalls. In particular, it works through an example that shows how to spot trouble in time series data, and it discusses the difference between correlation and causality. It then lists machine learning models for time series forecasting and explains how to train and evaluate them.

A post on How to correctly select a sample from a huge dataset in machine learning. This post explains how choosing a small, representative sample from a large population can improve model training reliability. It presents a statistical approach for selecting a sample that is large enough to be statistically significant and representative of the whole population. This helps in machine learning because a small dataset that carries the same information as a larger one lets us train models more quickly.
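To make the idea of a statistically sized sample concrete, here is a minimal Python sketch. It does not reproduce the procedure from the KDnuggets post. It assumes one standard textbook choice, Cochran's sample-size formula with a finite population correction, followed by a simple random draw. The function names `cochran_sample_size` and `draw_sample`, the 5% margin of error, and the 95% confidence level are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def cochran_sample_size(population_size, margin_of_error=0.05,
                        confidence=0.95, proportion=0.5):
    """Estimate a sample size with Cochran's formula (assumed technique).

    proportion=0.5 is the most conservative choice: it maximizes the
    variance term p * (1 - p) and therefore the required sample size.
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Finite population correction shrinks n0 for small populations.
    n = n0 / (1 + (n0 - 1) / population_size)
    return min(population_size, math.ceil(n))


def draw_sample(rows, sample_size, seed=0):
    """Draw a simple random sample without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(rows, sample_size)


# Hypothetical "huge dataset": one million row identifiers.
population = list(range(1_000_000))
n = cochran_sample_size(len(population))
sample = draw_sample(population, n)
print(f"Sampled {len(sample)} of {len(population)} rows")
```

A sample sized this way only bounds the error when estimating a single proportion. Before training on the sample, it is still worth checking that each feature's distribution resembles the full dataset's.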

Fig. 3. Samples selection from populations (Image from Unsplash)

Discussions on AI

This week, data science and AI influencers shared a wide range of articles on AI.

Tamara McCleary shared an article introducing Artificial Intelligence and Machine Learning, exploring implications for the future.

ipfconline shared an article on What is Explainable AI, and How Does it Apply to Data Ethics. The article covers the need for explainable AI, explainable AI in practice (using self-driving cars as an example), and finally its limitations and counterarguments.

They also shared a video discussing whether AI can threaten human jobs.

Nige Willson shared a very interesting and thought-provoking article on Why are so many scummy, scammy AI companies thriving?

He also shared an article on whether AI should be treated like humans. The article takes a deep dive into the ethics of artificial intelligence and why AI ethics are important to think about.

Finally, Marcus Borba shared an interesting article on Can Artificial Intelligence Give Thoughtful Gifts? The article is an exploration of the possibilities and limits of AI’s humanity.

Fig. 4. Robots handling gifts (Image from Indivstock)