您的位置 > 首页 > 商业智能 > Predicting Movie Genres using NLP – An Awesome Introduction to Multi-Label Clas ...

Predicting Movie Genres using NLP – An Awesome Introduction to Multi-Label Clas ...

来源:分析大师 | 2019-04-22 | 发布:经管之家

I was intrigued going through this amazing article on building a multi-label image classification model last week. The data scientist in me started exploring possibilities of transforming this idea into a Natural Language Processing (NLP) problem.That article showcases computer vision techniques to predict a movie’s genre. So I had to find a way to convert that problem statement into text-based data. Now, most NLP tutorials look at solving single-label classification challenges (when there’s only one label per observation).But movies are not one-dimensional. One movie can span several genres. Now THAT is a challenge I love to embrace as a data scientist. I extracted a bunch of movie plot summaries and got down to work using this concept of multi-label classification. And the results, even using a simple model, are truly impressive.In this article, we will take a very hands-on approach to understanding multi-label classification in NLP. I had a lot fun building the movie genre prediction model using NLP and I’m sure you will as well. Let’s dig in!I’m as excited as you are to jump into the code and start building our genre classification model. Before we do that, however, let me introduce you to the concept of multi-label classification in NLP. It’s important to first understand the technique before diving into the implementation.The underlying concept is apparent in the name – multi-label classification. Here, an instance/record can have multiple labels and the number of labels per instance is not fixed.Let me explain this using a simple example. Take a look at the below tables, where ‘X’ represents the input variables and ‘y’ represents the target variables (which we are predicting):We cannot apply traditional classification algorithms directly on this kind of dataset. Why? Because these algorithms expect a single label for every input, when instead we have multiple labels. It’s an intriguing challenge and one that we will solve in this article.You can get a more in-depth understanding of multi-label classification problems in the below article:There are several ways of building a recommendation engine. When it comes to movie genres, you can slice and dice the data based on multiple variables. But here’s a simple approach – build a model that can automatically predict genre tags! I can already imagine the possibilities of adding such an option to a recommender. A win-win for everyone.Our task is to build a model that can predict the genre of a movie using just the plot details (available in text form).Take a look at the below snapshot from IMDb and pick out the different things on display:There’s a LOT of information in such a tiny space:Genres tell us what to expect from the movie. And since these genres are clickable (at least on IMDb), they allow us to discover other similar movies of the same ilk. What seemed like a simple product feature suddenly has so many promising options. We will use the CMU Movie Summary Corpus open dataset for our project. You can download the dataset directly from this link.This dataset contains multiple files, but we’ll focus on only two of them for now:We know that we can’t use supervised classification algorithms directly on a multi-label dataset. Therefore, we’ll first have to transform our target variable. Let’s see how to do this using a dummy dataset:Here, X and y are the features and labels, respectively – it is a multi-label dataset. Now, we will use the Binary Relevance approach to transform our target variable, y. We will first take out the unique labels in our dataset:Unique labels = [ t1, t2, t3, t4, t5 ]There are 5 unique tags in the data. Next, we need to replace the current target variable with multiple target variables, each belonging to the unique labels of the dataset. Since there are 5 unique labels, there will be 5 new target variables with values 0 and 1 as shown below:We have now covered the necessary ground to finally start solving this problem. In the next section, we will finally make an Automatic Movie Genre Prediction System using Python!We have understood the problem statement and built a logical strategy to design our model. Let’s bring it all together and start coding!We will start by importing the libraries necessary to our project:Let’s load the movie metadata file first. Use ‘\t’ as the separator as it is a
本文已经过优化显示,查看原文请点击以下链接:
查看原文:https://www.analyticsvidhya.com/blog/2019/04/predicting-movie-genres-nlp-multi-label-classification/

院校点评more

京ICP备11001960号  京ICP证090565号 365bet外围体育投注+体育在线投注网址+网上在线足球开户注册+经管之家【信誉网投】 论坛法律顾问:王进律师知识产权保护声明免责及隐私声明   主办单位:人大经济论坛 版权所有
联系QQ:2881989700  邮箱:service@pinggu.org
合作咨询电话:(010)62719935 广告合作电话:13661292478(刘老师)

投诉电话:(010)68466864 不良信息处理电话:(010)68466864