Hi! As part of my research on domains where Big Data is being implemented, I chose Stock Market Prediction. Here I present what I have learned during my research in this domain.
Can the stock market be predicted?
Early research on stock market prediction revolved around whether the market could be predicted at all. One study suggested that “short term stock price movements were governed by the random walk hypothesis and thus were unpredictable”. Another stated that “the stock price reflected completed market information and the market behaved efficiently so that instantaneous price corrections to equilibrium would make stock prediction useless.” In simple terms, these studies inferred that because the market is affected by many random factors, predicting it is almost impossible. However, later research (Brown & Jennings 1998; Abarbanel & Bushee 1998) used a variety of methods to derive future price information. One method used financial ratios, earnings, and management effectiveness to derive stock price movements, whereas the other derived the trends of stock prices and trading volumes from historical prices and volumes.
Compared to predicting the stock market using structured data (such as price and trading volume), it is more difficult to predict stock price movements from unstructured data. Unstructured data could be news articles (printed or online), posts on social media, or companies' financial reports, which contain both textual and numerical data. Such unstructured data can be used to analyze how the market feels about a stock. This analysis of the market's “sentiments” can then be used to predict stock price movements.
Here, I talk about both ways of stock market prediction – using unstructured data and structured data.
Using unstructured data – Sentiment Analysis
Introduction to Sentiment Analysis
In the simplest terms, sentiment analysis tries to extract the emotion or 'feeling' of a body of text. It attempts to derive intelligent information about how a person feels about a product or an issue using raw textual data (from the internet).
The increased use of social networking sites like Facebook and Twitter has allowed people to express their opinions and views on many topics, ranging from news, movies and events to products. Business analysts have been mining these opinions for feedback by classifying them as positive, negative or neutral. Information obtained from social media in this way is beneficial for businesses. The tool used here is Sentiment Analysis.
As explained above, Sentiment Analysis tries to extract intelligent information about an object or a topic from a person’s opinions. Thus, it is all about trying to understand the gist of an opinion text. And since language can be very complex for even the human brain, sentiment analysis does have challenges. But we’ll get to it later. Let me now talk about the techniques used for Sentiment Analysis.
Sentiment Classification Methodologies - Bag of Words and NLP
There are many approaches to Sentiment Analysis. A classification of Sentiment Analysis methodologies is shown in the following figure.
[caption id="attachment_28" align="aligncenter" width="650"]
Fig: Sentiment Analysis Classification Methodologies [1][/caption]
While all of the above techniques are usable, they basically boil down to the following two models:
- "Bag of Words" Model:
- Using Natural Language Processing, and the attempt to truly "understand" the text:
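To make the first of these two models concrete, here is a minimal sketch of a "Bag of Words" sentiment classifier. The two word lists are tiny, made-up lexicons chosen only for illustration; a real system would use a much larger vocabulary, and the function names are my own.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons (assumptions for illustration only).
POSITIVE = {"gain", "profit", "strong", "up", "beat"}
NEGATIVE = {"loss", "weak", "down", "miss", "fall"}


def bag_of_words(text):
    """Lowercase the text and count its words; word order is thrown away."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def classify(text):
    """Label a text by comparing counts of positive and negative words."""
    bag = bag_of_words(text)
    score = sum(n for word, n in bag.items() if word in POSITIVE)
    score -= sum(n for word, n in bag.items() if word in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Strong quarter, profit is up"))
print(classify("Shares fall after weak earnings miss"))
```

Because the model only counts words, a sentence like "not strong" would still be counted as positive. That limitation is what the second, NLP-based model tries to address.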
Now, let us look at a model of Sentiment Analysis used for Stock Market Prediction.
A Model of Sentiment Analysis in Stock Market Prediction
[caption id="attachment_26" align="aligncenter" width="366"]

Fig: Model implementing Sentiment Analysis for Stock Price Prediction [2][/caption]
The above model implements stock market prediction using Sentiment Analysis. This model, proposed by Rajput and Bobde (2016), collects data from different sources, including social networking sites and news articles, and processes it into a generalized form.
Data collected from these sources goes through the following processing steps:
- Parsing
- Tokenization
- Filter
- Stemming
- TF (Term Frequency)
- TF-IDF
- Calculating score of post based on TF-IDF.
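The steps listed above can be sketched roughly as follows. This is not the implementation from Rajput and Bobde (2016): the stop-word list, the crude suffix-stripping stemmer, the smoothed IDF formula and the sentiment lexicon are all assumptions of mine, and the paper does not spell out how the final score is computed, so here a post's score is simply the sum of its TF-IDF weights signed by the lexicon.

```python
import math
import re
from collections import Counter

# Assumed stop words and a hypothetical lexicon keyed by word stem.
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in", "on", "for", "are"}
LEXICON = {"gain": 1, "strong": 1, "loss": -1, "fall": -1, "weak": -1}


def stem(word):
    """Crude suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        long_enough = len(word) - len(suffix) >= 3
        if word.endswith(suffix) and long_enough and not word.endswith("ss"):
            return word[: -len(suffix)]
    return word


def preprocess(post):
    tokens = re.findall(r"[a-z]+", post.lower())         # parsing + tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # filter
    return [stem(t) for t in tokens]                     # stemming


def term_frequency(tokens):
    """TF: share of the post's tokens that each term accounts for."""
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}


def inverse_document_frequency(docs):
    """IDF with +1 smoothing so terms found in every post keep some weight."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(len(docs) / df) + 1.0 for term, df in doc_freq.items()}


def score_posts(posts):
    """Score each post: sum of TF-IDF weights, signed by the lexicon."""
    docs = [preprocess(p) for p in posts]
    idf = inverse_document_frequency(docs)
    scores = []
    for doc in docs:
        tf = term_frequency(doc) if doc else {}
        scores.append(sum(tf[t] * idf[t] * LEXICON.get(t, 0) for t in tf))
    return scores


posts = [
    "Shares are rising on strong gains",
    "Heavy losses as the stock is falling",
    "The board meets on Monday",
]
scores = score_posts(posts)
print(scores)
```

A positive score suggests positive sentiment, a negative score negative sentiment, and a score of zero means no lexicon word was found in the post.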
How Accurate can Sentiment Analysis be?
[caption id="attachment_29" align="aligncenter" width="554"]

Fig: Results of Stock Price Prediction by Sentdex [3][/caption]
The figures above show a stock market prediction performed by Sentdex. The graphs show the predicted prices in green and the actual prices in dark blue. The graph on the right is close to accurate, but the one on the left is far from it. Sentdex explains that sentiment analysis for stock market prediction is currently about 80% accurate.
Will we ever get close to 100% accuracy in sentiment analysis? For now the answer is no, as language is still very complex even for the human mind. Language differs from place to place and person to person, so achieving such accuracy seems almost impossible right now.
Using structured data - Clustering/Classification Algorithms
There is a lot of structured data available in the stock market domain. Historical price data, company-specific information and daily data can all be used. Below, I talk about a model that uses structured data and clustering algorithms for stock market prediction.
A Model of Stock Market Prediction using Clustering/Classification
The following model (Rajput and Bobde, 2016) implements stock market prediction using clustering techniques. The clustering is based on the technical parameters of each stock, which are used as the basis for creating the clusters. The model gives three types of clusters: a Positive set, a Negative set and a Neutral set. Stocks that show similar behavior are clustered into the same set.
[caption id="attachment_27" align="aligncenter" width="442"]

Fig: Model implementing Clustering/Classification Techniques on Structured Data [2][/caption]
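As a rough illustration of grouping stocks into three sets, here is a small k-means sketch in plain Python. The paper does not say which clustering algorithm or which technical parameters it uses, so k-means, the two features (a recent return in percent and a momentum score), the ticker names and the numbers are all assumptions made up for this example.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


def cluster_stocks(features):
    """Split stocks into negative / neutral / positive sets with k = 3."""
    names = sorted(features, key=lambda name: features[name][0])
    points = [features[name] for name in names]
    # Deterministic start: lowest, middle and highest stock by return.
    start = [points[0], points[len(points) // 2], points[-1]]
    assignment, centroids = kmeans(points, start)
    # Name each cluster by where its centroid sits on the return axis.
    order = sorted(range(3), key=lambda c: centroids[c][0])
    label = {order[0]: "negative", order[1]: "neutral", order[2]: "positive"}
    return {name: label[a] for name, a in zip(names, assignment)}


# Hypothetical data: ticker -> (recent return in %, momentum score).
STOCKS = {
    "AAA": (4.0, 1.2), "BBB": (3.5, 0.9),
    "CCC": (0.2, 0.1), "DDD": (-0.3, -0.1),
    "EEE": (-3.8, -1.1), "FFF": (-4.4, -1.5),
}
groups = cluster_stocks(STOCKS)
print(groups)
```

Stocks with similar feature values end up in the same set, which is the behavior the model describes. In practice the features would be scaled first so that no single parameter dominates the distance.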
Decision Trees
We could also use decision trees for classifying the movements of the stock. In the following table, the columns following the Volume column are technical indicators that have been calculated.
[caption id="attachment_24" align="aligncenter" width="632"]
Fig: Sample Data for Decision Tree implementation in Stock Price Prediction [4][/caption]
Using the above data, we use a decision tree to decide whether a stock is going to move up or down. The following diagram shows an example realization of stock classification using a decision tree.
[caption id="attachment_23" align="aligncenter" width="633"]
Fig: Realization of Decision tree [4][/caption]
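To show how such a tree can be learned from indicator data, here is a minimal sketch of a decision tree that picks its splits by Gini impurity. It is not the Quantinsti implementation: the two indicators (an RSI value and a moving-average gap in percent), the training rows and the depth limit are hypothetical values I chose for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Find the (impurity, feature, threshold) split with the lowest impurity."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [l for row, l in zip(rows, labels) if row[f] <= t]
            right = [l for row, l in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or impurity < best[0]:
                best = (impurity, f, t)
    return best


def build(rows, labels, depth=0, max_depth=2):
    """Return a leaf label, or a (feature, threshold, left, right) node."""
    split = None
    if depth < max_depth and gini(labels) > 0:
        split = best_split(rows, labels)
    if split is None:
        return max(sorted(set(labels)), key=labels.count)  # majority class
    _, f, t = split
    left = [(row, l) for row, l in zip(rows, labels) if row[f] <= t]
    right = [(row, l) for row, l in zip(rows, labels) if row[f] > t]
    return (
        f, t,
        build([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        build([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical training data: (RSI, moving-average gap in %) -> next move.
ROWS = [(30, 1.5), (35, 2.0), (45, 0.8), (55, -0.5),
        (70, -1.2), (75, -2.0), (40, 1.0), (65, -0.7)]
MOVES = ["up", "up", "up", "down", "down", "down", "up", "down"]

tree = build(ROWS, MOVES)
print(tree)
print(predict(tree, (38, 1.1)))
```

On this toy data a single split on the first indicator separates the two classes, so the learned tree has just one decision node. Real indicator data is far noisier and needs a deeper tree plus a held-out test set.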

A Hybrid Approach
[caption id="attachment_25" align="aligncenter" width="344"]

Fig: The Hybrid Approach [2][/caption]
The above diagram shows a hybrid approach (Rajput and Bobde, 2016) that combines the outputs of both previously discussed models. Model A is the one that uses Sentiment Analysis and Model B is the one that uses structured data. Technical indicators are used to analyze the collective outputs of both models, and the final sets of positive, negative and neutral stocks are obtained from this model.
What are the other areas of research in stock market where Big Data analytics is being used?
As I was researching this domain, I realized that more areas are being researched and implemented. I have put a short description of each below:
Algorithmic Trading:
Investopedia defines it as follows:
Algorithmic trading (automated trading, black-box trading or simply algo-trading) is the process of using computers programed to follow a defined set of instructions (an algorithm) for placing a trade in order to generate profits at a speed and frequency that is impossible for a human trader.
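As a small illustration of what "a defined set of instructions" can look like, here is a sketch of a moving-average crossover rule. The rule itself, the window lengths and the price series are assumptions I picked for the example; they are not taken from the definition above, and this is in no way a complete trading system.

```python
def moving_average(prices, window):
    """Average of the last `window` prices."""
    return sum(prices[-window:]) / window


def signal(prices, short=3, long=5):
    """Return 'buy', 'sell' or 'hold' from a moving-average crossover rule."""
    if len(prices) < long + 1:
        return "hold"  # not enough history to compare two consecutive days
    prev_short = moving_average(prices[:-1], short)
    prev_long = moving_average(prices[:-1], long)
    cur_short = moving_average(prices, short)
    cur_long = moving_average(prices, long)
    if prev_short <= prev_long and cur_short > cur_long:
        return "buy"   # short average just crossed above the long one
    if prev_short >= prev_long and cur_short < cur_long:
        return "sell"  # short average just crossed below the long one
    return "hold"


print(signal([12, 11, 10, 9, 8, 15]))
print(signal([8, 9, 10, 11, 12, 5]))
```

A program like this can evaluate the rule on every new price far faster than a human trader could, which is the point of the definition.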
Manipulation Detection in Stock Market:
Here’s how Zhai et al. (2018) define this problem:
The term “price manipulation” is used to describe the actions of “rogue” traders who employ carefully designed trading tactics to incur equity prices up or down to make profit. Such activities damage the proper functioning, integrity, and stability of the financial markets. In response to that, the regulators proposed new regulatory guidance to prohibit such activities on the financial markets.
Summary
Can stock be predicted? – YES
Data Sources – Social Media, News Articles, Financial Reports, Historical Data, Company Specific Information, Daily Data
Analysis Methods – Sentiment Analysis, Clustering and Classification Techniques
Results – Sets of positive, negative and neutral stocks
References:
Aditya Bhardwaj, Yogendra Narayan, Vanraj Pawan, Maitreyee Dutta (2015). Sentiment Analysis for Indian Stock Market Prediction Using Sensex and Nifty / Procedia Computer Science 70, 85-91
Bibek Rajput, Sarika Bobde (2016). Stock Market Prediction Using Hybrid Approach / International Conference on Computing, Communication and Automation (ICCCA2016), 82-86
Sentiment Analysis Accuracy. Sentdex. URL: http://sentdex.com/how-accurate-is-sentiment-analysis-for-stocks/
Use Decision Trees in Machine Learning to Predict Stock Movements. Quantinsti. URL: https://www.quantinsti.com/blog/use-decision-trees-machine-learning-predict-stock-movements/
Jia Zhai, Yi Cao, Xuemei Ding (2018). Data analytic approach for manipulation detection in stock market / Rev Quant Finan Acc (2018) 50:897–932
Investopedia. Basics of Algorithmic Trading. URL: https://www.investopedia.com/articles/active-trading/101014/basics-algorithmic-trading-concepts-and-examples.asp
Paul J. Darwen (2018). Questioning the Efficient Markets Hypothesis: Big Data Evidence of Non-Random Stock Prices / 2018 IEEE 3rd International Conference on Big Data Analysis, 201-205
Kavitha S, Raja Vadhana P, Nivi A N (2015). BIG DATA ANALYTICS IN FINANCIAL MARKET / IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308, 422 - 427
Meryem Ouahilal, Mohammed El Mohajir, Mohamed Chahhou, Badr Eddine El Mohajir. A novel hybrid model based on Hodrick–Prescott filter and support vector regression algorithm for optimizing stock market price prediction
Eric. W. K., Yang Yang. Market sentiment dispersion and its effects on stock return and volatility / Electron Markets (2017) 27:283–296
Bag of Words and TF-IDF Explained. URL: http://datameetsmedia.com/bag-of-words-tf-idf-explained/
Siraj Raval. Natural Language Processing and Sentiment Analysis. URL: https://medium.com/udacity/natural-language-processing-and-sentiment-analysis-43111c33c27e
An Introduction to Clustering and Different Methods of Clustering. URL: https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/
Image Sources:
[1] Aditya Bhardwaj, Yogendra Narayan, Vanraj Pawan, Maitreyee Dutta. (2015). Sentiment Analysis for Indian Stock Market Prediction Using Sensex and Nifty. / Procedia Computer Science 70, 85-91.
[2] Bibek Rajput, Sarika Bobde (2016). Stock Market Prediction Using Hybrid Approach / International Conference on Computing, Communication and Automation (ICCCA2016), 82-86
[3] Sentiment Analysis Accuracy. Sentdex. URL: http://sentdex.com/how-accurate-is-sentiment-analysis-for-stocks/
[4] Use Decision Trees in Machine Learning to Predict Stock Movements. Quantinsti. URL: https://www.quantinsti.com/blog/use-decision-trees-machine-learning-predict-stock-movements/