Top Twitter Topics by Data Scientists #32

Trending this week: NLP can take your business to the next level; 5 things you need to know about sentiment analysis; Choose the right clustering algorithm for your data.

Every week we analyze the most discussed topics on Twitter by Data Science & AI influencers.

The following topics, URLs, resources, and tweets have been automatically extracted using a topic modeling technique based on Sentence BERT, which we have enhanced to fit our use case.

Want to know more about the methodology used? Check out this article for more details, and find the code in this GitHub repository!

Overview

This week, Data Science and AI influencers on Twitter have talked about:

  • Insightful NLP Use Cases
  • ML How-Tos
  • Amazing AI Use Cases

Insightful NLP Use Cases

Natural language processing (NLP) was one of the most-posted topics on Twitter this week. AI & data science influencers shared some insightful examples of how NLP is used in various industries.

Ronald Van Loon has shared an article on NLP that explains How This Technique Can Take Your Business to the Next Level. The post argues that NLP stands out among ML techniques as a focus of recent adoption in the data center because of its ability to analyze unstructured text data. It focuses on the company Service Express, which uses NLP to improve the processes of its IT department and diagnose hardware failures in its data center. It also explains how Service Express’s Data Science team retrains its NLP model monthly to learn from the evolving service ticket database, improving the model’s real-time accuracy. Finally, the article provides 5 steps to follow to jumpstart your NLP model.

KDnuggets have shared a series of articles on NLP. Here are a few interesting ones:

A post providing the Top 14 Use Cases of Natural Language Processing in Healthcare. This article covers the most effective uses and the role of NLP in healthcare corporations, including benchmarking patient experience, review administration and sentiment analysis, dictation and the implications of EMR, and, lastly, predictive analytics. It explains how healthcare organizations can leverage NLP to make the best use of unstructured data. In particular, it shows how this technology helps providers automate managerial work, spend more time taking care of patients, and enrich the patient’s experience using real-time data.

Another article providing a practical guide to building a content-based movie recommender model based on NLP. The post first presents the two types of recommender systems: collaborative and content-based filters. Then it gives a detailed example of how to implement a content-based recommender system using an IMDB dataset of the top 250 English movies, downloaded from data.world. This is a simplification of what goes on under the hood of the most sophisticated recommender systems.
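To make the content-based idea concrete, here is a minimal sketch in plain Python. It is not the code from the KDnuggets article: the movie titles and descriptions are hypothetical, and a simple bag-of-words with cosine similarity stands in for whatever text features the article actually uses.

```python
import math
from collections import Counter

# Hypothetical toy catalog (title -> short description); not the IMDB data.
MOVIES = {
    "Space Quest": "astronaut crew explores distant planet space mission",
    "Star Voyage": "space crew mission to a distant star",
    "Love in Paris": "romantic couple falls in love in paris",
    "Paris Nights": "love story of a couple in paris at night",
}


def bag_of_words(text):
    """Count word occurrences in a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, movies=MOVIES, top_n=1):
    """Return the top_n titles whose descriptions are most similar to `title`."""
    target = bag_of_words(movies[title])
    scores = [
        (cosine_similarity(target, bag_of_words(desc)), other)
        for other, desc in movies.items()
        if other != title
    ]
    scores.sort(reverse=True)  # highest similarity first
    return [other for _, other in scores[:top_n]]


print(recommend("Space Quest"))
```

With this toy catalog, the only description that shares words with "Space Quest" is "Star Voyage", so that is the title recommended. A real system would typically weight words (for example with TF-IDF) and remove stop words before comparing.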

ML How-Tos

Several insightful posts have been shared on machine learning (ML). They provide tips and packages that are useful for dealing with various ML problems.

KDnuggets have posted the following series of articles:

A post presenting Four Techniques for Outlier Detection. This article focuses on four of the most frequently used techniques for outlier detection, both traditional and novel, and shows how to implement them using the KNIME Analytics Platform. It explains four methods (Numeric Outlier, Z-Score, DBSCAN, and Isolation Forest) and then provides their implementation in a KNIME workflow. The techniques are tested and compared on an airline dataset, where the objective of the analysis is to find outlier airports in the early arrival direction. The results are presented on a map at the end of the article.

Fig. 3. An unusual observation (photo from Istock Photo)
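The article implements these techniques in KNIME. As a rough illustration of the two simplest ones, the sketch below uses plain Python instead. The delay values are made up, and the cutoffs (1.5 times the interquartile range, 3 standard deviations) are common defaults assumed here rather than settings taken from the article. DBSCAN and Isolation Forest are not shown.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Numeric Outlier (IQR) rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Z-score rule: flag values more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical average arrival delays in minutes (negative = early arrival).
delays = [-2, -1, 0, 1, 2, -3, 3, 0, 1, -1, 2, -2, -45]

print(iqr_outliers(delays))
print(zscore_outliers(delays))
```

Both rules flag the -45 value in this toy data. Note that the z-score rule is itself sensitive to extreme values, since they inflate the mean and standard deviation it relies on, which is one reason the IQR rule is often preferred on small samples.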

A second article dealing with Choosing the Right Clustering Algorithm for your Dataset. This post covers four basic clustering algorithms: Hierarchical Clustering, k-Means, Gaussian Mixture Models (GMM), and DBSCAN. It lays out the pros and cons of each algorithm, which you need to weigh if you’re striving for a tidy cluster structure, and helps you choose the one that best fits your problem.

Fig. 4. Dendrogram
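Hierarchical clustering, the method that produces a dendrogram, can be sketched in a few lines of plain Python. This is an illustrative toy and not code from the article: it assumes one-dimensional points and single linkage (cluster distance is the distance between the closest pair of members), and the sample points are invented.

```python
def single_linkage(points, n_clusters):
    """Agglomerative clustering on 1-D points.

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until only `n_clusters` remain.
    """
    clusters = [[p] for p in points]
    merges = []  # merge history; a dendrogram is a drawing of this list
    while len(clusters) > max(n_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical sample: two tight groups and one isolated point.
points = [1.0, 1.5, 2.0, 10.0, 10.5, 25.0]
clusters, merges = single_linkage(points, n_clusters=3)
print(sorted(sorted(c) for c in clusters))
```

On this sample the three clusters are the points near 1 to 2, the points near 10, and the isolated 25. The `merges` list records each merge with its distance, which is the information a dendrogram plots. This brute-force version is only suitable for small inputs.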

Then, an article that talks about 5 Things You Need to Know about Sentiment Analysis and Classification. This post looks at the important things you need to know about sentiment analysis, including social media, classification, evaluation metrics, and how to visualize the results. It covers the main data sources to consider when training a sentiment analysis model, how to process the data, how to classify sentiment, and which evaluation metrics to use.
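As an illustration of the evaluation step, the sketch below computes accuracy, precision, recall, and F1 for a binary sentiment classifier in plain Python. The labels and predictions are hypothetical, and the choice of these four metrics is an assumption here; the article may discuss others.

```python
def evaluate_sentiment(y_true, y_pred, positive="positive"):
    """Compute accuracy, precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and model predictions for eight tweets.
y_true = ["positive", "positive", "negative", "negative",
          "positive", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive",
          "positive", "negative", "positive", "negative"]

report = evaluate_sentiment(y_true, y_pred)
print(report)
```

Here the model gets six of eight tweets right, with one false positive and one false negative, so all four metrics come out to 0.75. On imbalanced data, which is common for sentiment on social media, accuracy alone can be misleading, and precision, recall, and F1 usually give a clearer picture.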

Amazing AI Use Cases

This week, data science and AI influencers have shared some really interesting updates on machine learning.

Nige Willson shared an interesting article on how Valencia crushed Covid19 with AI. The article talks about an MIT researcher who has made Valencia a Covid-19 data pioneer by leveraging algorithms and unorthodox data sources. He also shared an article about AI techniques that keep data centers up and running.

He also shared an article that raises a question: isn’t it time for an artificial intelligence reality check? The article offers a realistic assessment of the problems that rank-and-file computer scientists wrestle with every day, namely the problem of intelligence.

The above article was also shared by Marcus Borba. In addition, he shared an article on how artificial intelligence in the film industry is sophisticating production.

Finally, Pascal Bornet shared a link to his LinkedIn post on the Intelligent Automation Newsletter. It covers the latest news in intelligent automation and some of the upcoming intelligent automation events.