Top Twitter Topics by Data Scientists #29

Trending this week: Use cuDF to implement Pandas on GPUs; 12 amazing feature selection methods; How GPT-3 and AI will destroy the internet.

Every week we analyze the most discussed topics on Twitter by Data Science & AI influencers.

The following topics, URLs, resources, and tweets have been automatically extracted using a topic modeling technique based on Sentence BERT, which we have enhanced to fit our use case.

Want to know more about the methodology used? Check out this article for more details, and find the code in this GitHub repository!


This week, Data Science and AI influencers on Twitter have talked about:

  • ML & Data Science How-Tos
  • AI Use Cases
  • AI Discussions

ML & Data Science How-Tos

This week, AI & data science influencers have shared amazing tips and tricks to be used in any data science or machine learning project.

Ronald Van Loon has shared a post presenting various feature selection approaches for machine learning. The post focuses on classification problems and covers 12 feature selection methods, grouped into 3 types according to how they interact with the classifier: filter, wrapper, and embedded methods. The methods range from very basic to statistically advanced, and include Chi-square, mutual information, ANOVA, recursive feature elimination, permutation importance, SHAP, embedded Random Forest, and embedded LightGBM, among others.
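To make the filter family more concrete, here is a minimal sketch of one filter method, mutual information, written in plain Python. It is not the code from the shared post: the function names, the toy columns, and the choice of k are all assumptions made for illustration, and it only handles discrete features.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)), written with raw counts
        ratio = p_xy * n * n / (feature_counts[x] * label_counts[y])
        mi += p_xy * math.log(ratio)
    return mi


def select_top_k(columns, labels, k):
    """Score each feature against the label and keep the k best names."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: one informative, one noisy, one useless feature.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
columns = {
    "copy_of_label": [0, 0, 1, 1, 0, 1, 0, 1],
    "noisy": [0, 0, 1, 1, 1, 0, 0, 1],
    "constant": [1, 1, 1, 1, 1, 1, 1, 1],
}

selected, scores = select_top_k(columns, labels, k=2)
print(selected)
```

A filter method like this scores every feature independently of any classifier, which is what separates it from wrapper methods (which retrain a model on candidate subsets) and embedded methods (which read importances from a fitted model).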

Dr. Georgina Cosma has shared an article detailing how to Build an Article Recommendation Engine With AI/ML. The post shows how to create an article recommendation engine using the Pinecone SDK, a similarity search service, to find the most relevant articles to suggest to users. The application is built with Python 3.9+, Flask 2.0+, and Pinecone. The post also includes a demo of the application with its interactive user interface, the Python and Flask code behind it, and an explanation of how the app works internally.
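The core idea behind a similarity search service can be sketched without any external dependency. The example below is not the Pinecone SDK and does not reproduce the article's code: it is a brute-force cosine similarity ranking over hand-made vectors, and every name and number in it is hypothetical. In a real system the vectors would come from an embedding model and the search would be delegated to the service.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_vector, article_vectors, top_k=2):
    """Return the titles of the top_k articles closest to the query vector."""
    scored = [(cosine_similarity(query_vector, vector), title)
              for title, vector in article_vectors.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


# Hypothetical 3-dimensional "embeddings" for three articles.
article_vectors = {
    "Intro to pandas": [0.9, 0.1, 0.0],
    "GPU dataframes": [0.8, 0.3, 0.1],
    "Road trip ideas": [0.0, 0.1, 0.9],
}

# Hypothetical reader profile, e.g. an average of articles already read.
reader_profile = [1.0, 0.2, 0.0]

recommendations = recommend(reader_profile, article_vectors)
print(recommendations)
```

Brute-force comparison against every article is fine for a handful of items; dedicated similarity search services exist because this linear scan becomes too slow as the catalogue grows.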

KDnuggets has published an article about a few good alternatives for processing larger data faster in Python. The post introduces alternatives to Pandas that improve performance on large data sets and work around some of its efficiency limitations. It revolves around handling tabular data in CSV format and processing it with Pandas and alternatives such as cuDF, Dask, Modin, and datatable. The use case is the exploration of a sales data set of 5 million rows and 14 columns. The post explains how the cuDF library aims to implement the Pandas API on GPUs, while Modin and the Dask DataFrame library provide parallel algorithms around the Pandas API.
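The general pattern behind parallel dataframe libraries is to split the data into partitions, aggregate each partition independently, and then combine the partial results. The sketch below illustrates that pattern in plain Python; it does not use Dask, Modin, or cuDF, the sales rows are made up, and the function names are hypothetical. Threads are used only to show the structure, since pure-Python threads will not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into at most n_parts contiguous chunks."""
    size = max(1, -(-len(rows) // n_parts))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_totals(rows):
    """Aggregate revenue per region within a single partition."""
    totals = {}
    for region, revenue in rows:
        totals[region] = totals.get(region, 0) + revenue
    return totals


def combine(partials):
    """Merge the per-partition totals into one result."""
    merged = {}
    for totals in partials:
        for region, revenue in totals.items():
            merged[region] = merged.get(region, 0) + revenue
    return merged


def revenue_by_region(rows, n_parts=2):
    """Partitioned group-by-sum: map over partitions, then combine."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_totals, partition(rows, n_parts)))
    return combine(partials)


# Hypothetical sales rows: (region, revenue).
sales = [
    ("Europe", 100),
    ("Asia", 250),
    ("Europe", 300),
    ("Asia", 100),
    ("Africa", 50),
]

result = revenue_by_region(sales)
print(result)
```

This works because a sum can be computed piecewise and merged afterwards; aggregations that cannot be decomposed this way, such as an exact median, need more coordination between partitions.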

AI Use Cases

This week, data science influencers have shared articles on the applications of artificial intelligence.

Nige Willson has shared an article showing how AI-designed road trips beat the ones planned by humans. Generative Pre-trained Transformer 3 (GPT-3) was used to plan road trips with eight different goals: best overall road trip, best road trip for couples, best cross-country road trip, best coastal road trip, best road trip for foodies, best summer road trip, best road trip for views, and best road trip for visiting national monuments.

Marcus Borba has shared an article on how airlines use AI to streamline operations and save fuel. The article describes how big airlines such as Air France and Air Alaska have used AI to streamline their operations, following the theme of AI in aviation presented at the Singapore Air Show.

Nando De Freitas has shared an article examining the success of DeepMind’s AlphaFold and its real significance for protein folding research and drug discovery. The article discusses the Critical Assessment of protein Structure Prediction (CASP) ‘competition’ and how AlphaFold’s results have been received by the scientific community. It also covers the benefits of AlphaFold’s predictive power for drug discovery.

AI Discussions

Once again, AI & data science influencers have shared posts discussing the benefits and risks of using artificial intelligence.

Nige Willson has shared:

An article about How AI-Powered Tech Landed A Chicago Man In Jail With Little Evidence. The post explains how Michael Williams, a 65-year-old American citizen, was falsely accused of murder last August and spent a whole year in jail. How did it happen? Prosecutors used dubious data from ShotSpotter, a gunshot detection system, to build a murder case against him. But ShotSpotter can miss live gunfire right under its microphones. It also often misclassifies fireworks or cars backfiring as gunshots, making it a faulty system that can easily lead to false accusations and, in this case, an arrest. Even worse, ShotSpotter employees have often changed the reported location of gunshot sounds after the fact, introducing human bias. The sensors are also disproportionately installed in Black and Latino communities. In the end, a judge dismissed the case against him last month at the request of prosecutors, who said they had insufficient evidence.

Also, a post about How GPT-3 and Artificial Intelligence Will Destroy the Internet. The article asks: if you could produce 10x the amount of content at 10x cost savings, what would you do? Even if the content were mediocre, would you still be tempted to throw content against the wall and see what sticks? What would that mean for websites, link farms, private blog networks, link builders, SEOs, and search engine algorithms? What would it mean for quality, believable, original content? These questions highlight the risk posed by GPT-3's ability to generate content at massive scale. The post concludes that when anyone can create content at scale at little to no cost, the only differentiator in the future will be quality. As OpenAI has suggested, strict controls should be placed on the quantity and purpose of the content produced by GPT-3. Otherwise, we would have much more of much less when it comes to written content on the web.

Finally, An optimist’s guide to the future: How quantum AI could make Earth a paradise. The post explains how AI combined with quantum computing could change our lives in various domains. It presents somewhat fictional scenarios in which AI running on quantum computers could help cure diseases, eradicate hunger, and limit our ability to harm each other, all of which would go a long way towards increasing our collective life-spans.