NLP: Skeleton of a Supervised Training and Testing Mechanism Using Apple Stock News

Asad Malik · Published in Analytics Vidhya · 6 min read · Mar 31, 2020

Natural Language Processing (NLP) is a branch of Machine Learning concerned with building models that extract useful signals from text data. Most of the data in the world exists in text form, so from a data science point of view there is enormous value in being able to mine it. A range of outputs is possible with existing NLP technology, the big ones being sentiment analysis, topic recognition, and text generation. In this essay, I will produce sentiment analysis scores on a specific domain of text using a Naive Bayes model trained at a base level.

The starting point of any Machine Learning endeavor is data: you need features as well as labeled outputs to train whatever model you select. Actually, let me correct myself and say that the starting point is not just data, but labeled data. If you are lucky enough to find a labeled data set for the specific type of text you are looking to evaluate, your job is ~80% done. That is not true in the vast majority of cases, though; labeled data simply isn't available for most types of text.

To clearly separate the different parts that go into a project like this, I will structure the post around questions that come up in every such project and describe how I approached each one.

What is the domain of text data that we are going to look at?

Let’s look at financial news headlines, more specifically news headlines about Apple stock. Some of the main reasons for this:

  1. There is plenty of data; Apple stock is something that gets talked about a lot.
  2. The scope of the lingo is relatively narrow. There is a fixed set of terminology used to describe which way a stock is going to move, which will help when training the model. There are also recurring events, so if we later want to expand to other NLP outputs such as topic recognition, we can add that as well.
  3. Stocks come with numerical values, so we can measure how the sentiment we derive from the text compares to quantities such as stock price and volume.

Here is a sample of what the headlines look like:

[Image: headlines sample]

We have some additional data points about the headlines as well.

We scraped this data from Google News. That was a whole process in itself, and I will cover the scraping mechanism in a separate post.

How do we label the data?

I wish there were an automated mechanism for this, and to a certain degree the process can be scaled in an automated way, but unless a pre-existing labeled set exists, it is a manual process. Even the likes of Stanford, who have their own NLP infrastructure, have relied on community efforts where people manually labeled sets of data.

1 = positive, 0 = neutral, -1 = negative

Above is a subset of news headlines from our set, labeled with those values. We label the full headline, and we also look at each headline, try to extract the words that tilt the sentiment one way or the other, and label those too. This gives us more than one option when training and testing, and lets us compare two different training approaches, as sketched below.
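
To make the two labeling granularities concrete, here is an illustrative sketch of what each structure might look like; the headlines and keywords below are made up for illustration, not drawn from the actual labeled set:

# Headline-level labels (1 = positive, 0 = neutral, -1 = negative)
labeled_headlines = [
    ('Apple hits all-time high after earnings beat', 1),
    ('Apple supplier shares little changed in morning trading', 0),
    ('Apple dropped from top holdings at major funds', -1)]

# Sentiment-bearing keywords pulled out of those same headlines
labeled_keywords = [('high', 1), ('beat', 1), ('dropped', -1)]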

In terms of data set size, we have over 6,000 unlabeled headlines, and we have labeled and extracted keywords from 200 of them. That process produced the following set of positive and negative words.

We have a set of 253 words. It is not a comprehensive set in the slightest, but it is a starting point.

Now that we have our data, what next?

Well, we need to figure out how to train a model on the data and test it.

Training:

We will use TextBlob, the most beginner-friendly NLP library, for this. It is quite powerful, and it simplifies sentiment analysis training and testing. There are many alternatives, the most common being NLTK, which allows a higher level of control over parameters, but this post is about simplicity, and to that end TextBlob suits our needs best.

To format the data for training, TextBlob requires a list of tuples, where each tuple holds a feature and a label of either 'pos' or 'neg'.

from textblob.classifiers import NaiveBayesClassifier

# Each tuple pairs a feature (here, a single word) with a 'pos' or 'neg' label
train_set = [('cuts', 'neg'),
             ('cash', 'pos'),
             ('highs', 'pos'),
             ('dropped', 'neg'),
             ('correction', 'neg'),
             ('highs', 'pos'),
             ('launch', 'pos')]

# Train a Naive Bayes classifier on the word-level labels
cl = NaiveBayesClassifier(train_set)

The above is a small sample list I included to show the format of the input argument. In the real run, the 253-word set mentioned above goes in as the training data in this same format.
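
Once trained, the classifier can label any new piece of text. A minimal sketch of using it (the headlines here are made up, and the predictions depend on which training words appear in the text):

# Single-word features seen in training ('highs', 'cuts') drive the prediction
print(cl.classify('Apple stock hits new highs after earnings'))  # likely 'pos'

# prob_classify returns a probability distribution over the labels
probs = cl.prob_classify('Apple cuts production forecast')
print(probs.max())         # most likely label
print(probs.prob('neg'))   # confidence in the 'neg' label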

Testing:

In order to create a dataset to test our classifier against, I went back into our set of 6,000 headlines, picked out 100 at random, and labeled them. Here is a subset of what that looks like.

labelled_data_test = [
    ('Monster sues Apple’s Beats over headphone deal', 'neg'),
    ('Apple Pay to help drive multiyear expansion at VeriFone', 'pos'),
    ('Why millennials hate the stock market', 'neg'),
    ('Apple still on top as U.S. corporate cash holdings reach $1.73 trillion', 'pos'),
    ('Carl Icahn’s tweet worth over $8 billion for Apple investors', 'pos'),
    ('Apple’s stock to blame for more than half Dow’s drop', 'neg'),
    ('Apple confirms iPhone 6 Plus camera glitch', 'neg'),
    ('Dow closes down triple digits as stocks end one of worst first weeks ever', 'neg'),
    ('7 reasons Apple is a buy - commentary', 'pos'),
    ('Tech giants face child labor storm', 'neg'),
    ('Adoption of Apple Pay slows, survey suggests', 'neg'),
    ('Trump calls for Apple boycott over San Bernardino killer phone encryption', 'neg'),
    ('Apple unveils 9.7-inch iPad Pro, 4-inch iPhone SE', 'pos')]
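
With the test list in this shape, TextBlob can score the classifier in a single call. A minimal sketch of the evaluation step, reusing the cl and labelled_data_test names from above:

# Overall accuracy against the hand-labeled test headlines
print(cl.accuracy(labelled_data_test))

# Per-headline comparison of manual label vs. classifier label
for headline, manual_label in labelled_data_test:
    print(headline, manual_label, cl.classify(headline))

# Sanity check: which word features carry the most weight
cl.show_informative_features(5)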

Here is what the output looks like: from the list of 102 test headlines, our classifier predicted 87 correctly, roughly 85% accuracy. I don't think that is too bad given the minuscule amount of data we used for both training and testing. Here is a subset of our classifications:

[Image: classification results: headline, manual label, classifier label]

Conclusion:

I think I was able to highlight, to a certain degree, the base-level process of training an NLP classifier for sentiment analysis. How can we improve this experiment? The answer is simply more data.

Where can we take this?

Well, the possibilities are endless. I would love to explore some deep learning techniques, but those require vast amounts of data, and the thought of sitting there labeling data for hours terrifies me. I am also interested in how this relates to the stock's numerical values. It would be interesting to see whether these classifications have a relationship to the stock price: whether, at the publication of positive or negative news, there is a pattern in stock price or volume, and whether that can be of value to traders and other decision-makers. Once we get into analyzing that relationship, we can also look into topic detection to see if certain topics of news with a given sentiment have an impact. A sketch of that kind of analysis follows below.
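
As an illustrative sketch of that follow-up, one could join each headline's predicted sentiment to the stock's same-day return; the column names and toy numbers below are made up purely for illustration:

import pandas as pd

# Toy stand-ins for real classifier output and price history
headlines = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06']),
    'sentiment': ['pos', 'neg', 'pos']})
prices = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06']),
    'close': [75.1, 74.4, 74.9]})

# Daily return on each trading day, then align headlines to their dates
prices['return'] = prices['close'].pct_change()
merged = headlines.merge(prices, on='date')

# Average same-day return, grouped by predicted sentiment
print(merged.groupby('sentiment')['return'].mean())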

I will dig through my code and look to publish a GitHub repo for this shortly.
