
Domain Research - Stock Market Prediction

Hi, for my research into a domain where Big Data is applied, I chose Stock Market Prediction. Here I present what I learned while researching the domain.

Can the stock market be predicted?

Early research on stock market prediction revolved around whether the market could be predicted at all. One study suggested that “short term stock price movements were governed by the random walk hypothesis and thus were unpredictable”. Another stated that “the stock price reflected completed market information and the market behaved efficiently so that instantaneous price corrections to equilibrium would make stock prediction useless.” In simple terms, these studies inferred that, because the market is affected by many random factors, predicting it is almost impossible. However, later studies (Brown & Jennings 1998; Abarbanel & Bushee 1998) used a variety of methods to derive future price information. One method used financial ratios, earnings, and management effectiveness to derive stock price movements, whereas the other derived the trends of stock prices and trading volumes from historical prices and volumes.
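To make the random walk hypothesis quoted above a bit more concrete, here is a minimal Python sketch. It is my own illustration, not taken from any of the cited studies: the function names, the fixed step size and the seed are all assumptions. It simulates a price that moves up or down by the same amount with equal odds, then checks how often a naive "tomorrow moves like today" guess is right.

```python
import random

def simulate_random_walk(start_price, n_steps, step_size=1.0, seed=0):
    """Return prices where each move is +step_size or -step_size with equal odds."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    prices = [start_price]
    for _ in range(n_steps):
        prices.append(prices[-1] + rng.choice([-step_size, step_size]))
    return prices

def naive_hit_rate(prices):
    """Fraction of steps where 'the next move repeats the last move' was right."""
    moves = [b - a for a, b in zip(prices, prices[1:])]
    hits = sum(1 for prev, nxt in zip(moves, moves[1:]) if (prev > 0) == (nxt > 0))
    return hits / (len(moves) - 1)

prices = simulate_random_walk(100.0, 10_000)
print(round(naive_hit_rate(prices), 3))  # hovers around 0.5 for a pure random walk
```

If real prices behaved exactly like this, past moves would tell us nothing about the next one, which is the point the early studies were making.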
Compared with prediction from structured data (such as price and trading volume), it is more difficult to predict stock price movements from unstructured data. Unstructured data could be news articles (printed or online), posts on social media, or companies’ financial reports, which contain both textual and numerical data. Such data can be used to analyze how the market feels about a stock. This analysis of the market’s “sentiments” can then be used to predict stock price movements.
Here, I talk about both ways of stock market prediction – using unstructured data and structured data.

Using unstructured data – Sentiment Analysis

Introduction to Sentiment Analysis

In the simplest terms, sentiment analysis tries to extract the emotion or 'feeling' of a body of text. It attempts to derive useful information about how a person feels about a product or an issue from raw textual data (for example, from the internet).
The increased use of social networking sites like Facebook and Twitter has allowed people to express their opinions and views on many topics, ranging from news and movies to events and products. Business analysts have been mining these opinions for feedback by classifying them as positive, negative or neutral. Information obtained from social media in this way is valuable for businesses, and the tool used to obtain it is Sentiment Analysis.
As explained above, Sentiment Analysis tries to extract useful information about an object or a topic from a person’s opinions. It is all about understanding the gist of an opinion text. And since language can be very complex even for the human brain, sentiment analysis does have challenges. But we’ll get to those later. Let me now talk about the techniques used for Sentiment Analysis.

Sentiment Classification Methodologies - Bag of Words and NLP

There are many approaches to Sentiment Analysis. A classification of Sentiment Analysis methodologies is shown in the following figure.
[caption id="attachment_28" align="aligncenter" width="650"]sentiment-analysis-methods Fig: Sentiment Analysis Classification Methodologies [1][/caption]
While all of the above techniques are usable, they basically boil down to the following two models:
  1. "Bag of Words" Model:
In this model, we’re basically trying to create a “bag of words”. The model breaks the text into a collection of words, gives weights to them and finally uses those weights to determine whether the text has positive, negative or neutral sentiment. It thus puts more emphasis on the words than on the context and does not try to understand the language, which is a huge drawback.
  2. Using Natural Language Processing, and the attempt to truly "understand" the text:
In contrast to the previous model, a model using NLP tries to understand the language: the sentence structures and the context. The text is read as a string of words rather than as individual words. Identifying parts of speech, named entities, context and so on is implemented using NLP techniques.
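The "Bag of Words" idea described above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the word weights in WORD_WEIGHTS are made up, and a real lexicon would be far larger.

```python
import re
from collections import Counter

# Hypothetical word weights (assumed for illustration only).
WORD_WEIGHTS = {"good": 1, "gain": 1, "strong": 1, "bad": -1, "loss": -1, "weak": -1}

def bag_of_words(text):
    """Break the text into lowercase words and count them; word order is thrown away."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def sentiment(text):
    """Sum the weights of the words in the bag and map the total to a label."""
    bag = bag_of_words(text)
    score = sum(WORD_WEIGHTS.get(word, 0) * count for word, count in bag.items())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Strong quarter and a good gain"))  # positive
print(sentiment("Weak sales led to a loss"))        # negative
print(sentiment("The results were not good"))       # positive, although a reader would say negative
```

The last line shows the drawback mentioned above: because the bag ignores context, the word "not" has no effect and the sentence is scored as positive. An NLP-based model would need to handle negation like this.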
Now, let us look at a model of Sentiment Analysis used for Stock Market Prediction.

A Model of Sentiment Analysis in Stock Market Prediction

[caption id="attachment_26" align="aligncenter" width="366"]model-sentiment-analysis 
Fig: Model implementing Sentiment Analysis for Stock Price Prediction [2][/caption]
The above model implements stock market prediction using Sentiment Analysis. Proposed by Rajput and Bobde (2016), it collects data from different sources, including social networking sites and news articles, and processes the data to generalize it.
Data fetched from the various sources goes through several processing steps, which end in a score for each post.
The final score is used to classify a post’s sentiment as positive, negative or neutral. This information is then used to determine the sectors (industries) that the post can affect. Keywords from the post are compared with sector-specific dictionaries as a sort of cross-check. If some keywords are found in a sector’s dictionary, all stocks from that sector are checked further to see whether they show movement related to the post.
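The last two steps, turning a score into a label and matching keywords against sector dictionaries, could look roughly like the Python sketch below. The paper does not give these details here, so the threshold, the score range and the dictionary contents are all my assumptions.

```python
# Hypothetical sector dictionaries (assumed; the real ones would be much larger).
SECTOR_DICTIONARIES = {
    "banking": {"bank", "loan", "interest", "credit"},
    "energy": {"oil", "gas", "crude", "refinery"},
}

def classify_score(score, threshold=0.1):
    """Map a sentiment score to a label; the threshold of 0.1 is an assumption."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

def affected_sectors(post_text):
    """Return the sectors whose dictionary shares at least one keyword with the post."""
    words = set(post_text.lower().split())
    return sorted(
        sector for sector, keywords in SECTOR_DICTIONARIES.items() if words & keywords
    )

post = "Crude oil prices jump as bank cuts interest rates"
print(classify_score(0.6))     # positive
print(affected_sectors(post))  # ['banking', 'energy']
```

Stocks from the sectors returned by affected_sectors would then be the ones to check further for movement.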

How Accurate can Sentiment Analysis be?

[caption id="attachment_29" align="aligncenter" width="554"]sentiment-results 
Fig: Results of Stock Price Prediction by Sentdex [3][/caption]
The above figure shows stock price predictions performed by Sentdex. The graphs show predicted prices in green and actual prices in dark blue. The one on the right is close to accurate, but the one on the left is far off. Sentdex explains that sentiment analysis for stock market prediction is currently about 80% accurate.
Whether we will ever get close to 100% accuracy in sentiment analysis currently has a negative answer, as language is still very complex even for the human mind. Language differs from place to place and person to person, so achieving such accuracy seems almost impossible right now.

Using structured data - Clustering/Classification Algorithms

There is a lot of structured data available in the stock market domain. Historical price data, company-specific information and daily data can all be used. Below, I talk about a model that uses structured data and clustering algorithms for stock market prediction.

A Model of Stock Market Prediction using Clustering/Classification

The following model (Rajput and Bobde, 2016) implements stock market prediction using clustering techniques. The clustering is based on the technical parameters of every stock, which are used as the basis for creating the different clusters. The model gives three types of clusters: a Positive set, a Negative set and a Neutral set. Stocks that show similar behavior are clustered into one set.
[caption id="attachment_27" align="aligncenter" width="442"]model-structured-data 
Fig: Model implementing Clustering/Classification Techniques on Structured Data [2][/caption]
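To give a feel for how stocks with similar technical parameters end up in the same set, here is a small Python sketch using plain k-means with three clusters. The source does not say which clustering algorithm the model uses, so k-means is my assumption, and the tickers, the two parameters and the rule for naming the clusters are all hypothetical.

```python
import math

# Hypothetical tickers with two assumed technical parameters each:
# (weekly return in percent, RSI-style momentum value).
STOCKS = {
    "AAA": (4.0, 70.0),
    "BBB": (3.5, 65.0),
    "CCC": (0.2, 50.0),
    "DDD": (-0.1, 48.0),
    "EEE": (-3.8, 30.0),
    "FFF": (-4.2, 28.0),
}

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, n_iter=20):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids

points = list(STOCKS.values())
# Deterministic starting centroids: worst, middle and best weekly return.
by_return = sorted(points)
start = [by_return[0], by_return[len(by_return) // 2], by_return[-1]]
centroids = kmeans(points, start)

# Assumed naming rule: rank the clusters by their average weekly return.
ranked = sorted(range(len(centroids)), key=lambda i: centroids[i][0])
names = dict(zip(ranked, ["negative", "neutral", "positive"]))
sets = {ticker: names[nearest(features, centroids)] for ticker, features in STOCKS.items()}
print(sets)
```

In practice the parameters would first be scaled to comparable ranges, since a feature with larger numbers (here the RSI-style value) otherwise dominates the distance.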

Decision Trees

We could also use decision trees for classifying the movements of the stock. In the following table, the columns following the Volume column are technical indicators that have been calculated.
[caption id="attachment_24" align="aligncenter" width="632"]decision-tree-data Fig: Sample Data for Decision Tree implementation in Stock Price Prediction [4][/caption]
Using the above data, we use a decision tree to decide whether a stock is going to move up or down. The following diagram shows an example of how stock classification is realized with a decision tree.
[caption id="attachment_23" align="aligncenter" width="633"]decision-tree Fig: Realization of Decision tree [4][/caption]
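A decision tree is built out of individual splits, so the Python sketch below learns just one split (a "decision stump") by minimizing Gini impurity. It is my own simplified illustration, not the Quantinsti implementation, and the two indicator columns and their values are invented.

```python
def gini(labels):
    """Gini impurity of a list of 'up'/'down' labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p_up = labels.count("up") / len(labels)
    return 2 * p_up * (1 - p_up)

def majority(labels):
    """Most common label; ties go to 'up' (an arbitrary choice)."""
    return "up" if labels.count("up") >= labels.count("down") else "down"

def train_stump(rows, labels):
    """Find the (feature, threshold) split with the lowest weighted Gini impurity."""
    best, best_score = None, float("inf")
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best_score = score
                best = (feature, threshold, majority(left), majority(right))
    return best

def predict(stump, row):
    """Follow the single split to a leaf label."""
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

# Invented rows: (RSI-style value, price minus its moving average).
rows = [(72, 1.5), (65, 0.8), (58, 0.3), (45, -0.2), (38, -0.9), (30, -1.4)]
labels = ["up", "up", "up", "down", "down", "down"]

stump = train_stump(rows, labels)
print(stump)                       # (0, 45, 'down', 'up')
print(predict(stump, (60, 0.5)))   # up
```

A full decision tree repeats this search recursively on the rows that fall on each side of the split.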

A Hybrid Approach

[caption id="attachment_25" align="aligncenter" width="344"]hybrid-model 
Fig: The Hybrid Approach [2][/caption]
The above diagram shows a hybrid approach (Rajput and Bobde, 2016) that combines the outputs of both previously discussed models. Model A is the one that uses Sentiment Analysis and Model B is the one that uses structured data. Technical indicators are used to analyze the collective outputs of both models. The final sets of positive, negative and neutral stocks are obtained from this model.
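As a rough picture of what combining the two outputs could mean, here is a tiny Python sketch. The rule it uses (keep a label only when both models agree, otherwise fall back to neutral) is purely my assumption and not the paper's method, which analyzes the collective outputs with technical indicators. The tickers and labels are made up.

```python
def combine(sentiment_label, structured_label):
    """Assumed rule: keep the label when both models agree, else call it neutral."""
    if sentiment_label == structured_label:
        return sentiment_label
    return "neutral"

# Hypothetical outputs of Model A (sentiment) and Model B (structured data).
model_a = {"AAA": "positive", "BBB": "negative", "CCC": "positive"}
model_b = {"AAA": "positive", "BBB": "negative", "CCC": "negative"}

final_sets = {ticker: combine(model_a[ticker], model_b[ticker]) for ticker in model_a}
print(final_sets)  # {'AAA': 'positive', 'BBB': 'negative', 'CCC': 'neutral'}
```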

What are the other areas of research in stock market where Big Data analytics is being used?

As I was researching this domain, I realized that there are more areas being researched and implemented. I have put a short description of each below:

Algorithmic Trading:

Investopedia defines it as follows:
Algorithmic trading (automated trading, black-box trading or simply algo-trading) is the process of using computers programed to follow a defined set of instructions (an algorithm) for placing a trade in order to generate profits at a speed and frequency that is impossible for a human trader.
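To show what "a defined set of instructions" might look like, here is a minimal Python sketch of a moving-average rule: signal a buy when the short-term average price is above the long-term average, and a sell when it is below. The rule, the window sizes and the function names are my own illustrative choices, and a real system would also handle order placement, costs and risk.

```python
def moving_average(prices, window):
    """Average of the last `window` prices, or None if there are too few prices."""
    if len(prices) < window:
        return None
    return sum(prices[-window:]) / window

def signal(prices, short_window=3, long_window=5):
    """Return 'buy', 'sell' or 'hold' from a simple moving-average comparison."""
    short_avg = moving_average(prices, short_window)
    long_avg = moving_average(prices, long_window)
    if short_avg is None or long_avg is None:
        return "hold"  # not enough history yet
    if short_avg > long_avg:
        return "buy"
    if short_avg < long_avg:
        return "sell"
    return "hold"

print(signal([10, 10, 10, 11, 12]))  # buy
print(signal([12, 11, 10, 9, 8]))    # sell
print(signal([10, 10]))              # hold
```

A computer can evaluate a rule like this on every new price for thousands of stocks, which is the speed and frequency advantage the definition talks about.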

Manipulation Detection in Stock Market:

Here’s how Zhai et al. (2017) define this problem:
The term ‘‘price manipulation’’ is used to describe the actions of ‘‘rogue’’ traders who employ carefully designed trading tactics to incur equity prices up or down to make profit. Such activities damage the proper functioning, integrity, and stability of the financial markets. In response to that, the regulators proposed new regulatory guidance to prohibit such activities on the financial markets.

Summary

Can the stock market be predicted? – YES
Data Sources – Social Media, News Articles, Financial Reports, Historical Data, Company Specific Information, Daily Data
Analysis Methods – Sentiment Analysis, Clustering and Classification Techniques
Results – Sets of positive, negative and neutral stocks

References:

Aditya Bhardwaj, Yogendra Narayan, Vanraj Pawan, Maitreyee Dutta. (2015). Sentiment Analysis for Indian Stock Market Prediction Using Sensex and Nifty. / Procedia Computer Science 70, 85-91.
Bibek Rajput, Sarika Bobde (2016). Stock Market Prediction Using Hybrid Approach / International Conference on Computing, Communication and Automation (ICCCA2016), 82-86
Sentiment Analysis Accuracy. Sentdex. URL: http://sentdex.com/how-accurate-is-sentiment-analysis-for-stocks/
Use Decision Trees in Machine Learning to Predict Stock Movements. Quantinsti. URL: https://www.quantinsti.com/blog/use-decision-trees-machine-learning-predict-stock-movements/
Jia Zhai, Yi Cao, Xuemei Ding (2018). Data analytic approach for manipulation detection in stock market / Rev Quant Finan Acc (2018) 50:897–932
Investopedia. Basics of Algorithmic Trading. URL: https://www.investopedia.com/articles/active-trading/101014/basics-algorithmic-trading-concepts-and-examples.asp
Paul J. Darwen (2018). Questioning the Efficient Markets Hypothesis: Big Data Evidence of Non-Random Stock Prices / 2018 IEEE 3rd International Conference on Big Data Analysis, 201-205
Kavitha S, Raja Vadhana P, Nivi A N (2015). BIG DATA ANALYTICS IN FINANCIAL MARKET / IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308, 422 - 427
Meryem Ouahilal, Mohammed El Mohajir, Mohamed Chahhou, Badr Eddine El Mohajir. A novel hybrid model based on Hodrick–Prescott filter and support vector regression algorithm for optimizing stock market price prediction
Eric. W. K., Yang Yang. Market sentiment dispersion and its effects on stock return and volatility / Electron Markets (2017) 27:283–296
Bag of Words and TF-IDF Explained. URL: http://datameetsmedia.com/bag-of-words-tf-idf-explained/
Siraj Raval. Natural Language Processing and Sentiment Analysis. URL: https://medium.com/udacity/natural-language-processing-and-sentiment-analysis-43111c33c27e
An Introduction to Clustering and Different Methods of Clustering. URL: https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/

Image Sources:
[1] Aditya Bhardwaj, Yogendra Narayan, Vanraj Pawan, Maitreyee Dutta. (2015). Sentiment Analysis for Indian Stock Market Prediction Using Sensex and Nifty. / Procedia Computer Science 70, 85-91.
[2] Bibek Rajput, Sarika Bobde (2016). Stock Market Prediction Using Hybrid Approach / International Conference on Computing, Communication and Automation (ICCCA2016), 82-86
[3] Sentiment Analysis Accuracy. Sentdex. URL: http://sentdex.com/how-accurate-is-sentiment-analysis-for-stocks/
[4] Use Decision Trees in Machine Learning to Predict Stock Movements. Quantinsti. URL: https://www.quantinsti.com/blog/use-decision-trees-machine-learning-predict-stock-movements/
