This is ongoing work on predicting stock prices using machine learning and data science.
Capstone Document:
● First, we import the Python libraries this work requires: NumPy for numerical calculations and scientific computing, Pandas for handling data, and Matplotlib and Seaborn for visualization. We also use the nsepy library to get up-to-date quotes from the NSE (National Stock Exchange of India). Machine learning libraries will be added as the work progresses.
● We then fetch historical data for Indian stocks with nsepy's get_history function, which returns daily data for the requested stock ticker over the desired time frame as a pandas DataFrame.
● We create a function that fetches this data from nsepy.
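A minimal sketch of such a fetch function, assuming nsepy's `get_history(symbol=..., start=..., end=...)` call. The name `get_data` and the `fetcher` parameter are hypothetical: `fetcher` lets a stub replace the network call (for example in tests), and nsepy is only imported when no stub is passed.

```python
from datetime import date

import pandas as pd


def get_data(symbol: str, start: date, end: date, fetcher=None) -> pd.DataFrame:
    """Return daily history for an NSE ticker with a sorted DatetimeIndex.

    `fetcher` is a hypothetical hook; when it is None, nsepy's get_history
    is used, which requires nsepy to be installed and network access.
    """
    if fetcher is None:
        # Imported lazily so this module loads even where nsepy is absent
        from nsepy import get_history
        fetcher = get_history
    df = fetcher(symbol=symbol, start=start, end=end)
    # Convert the Date index to pandas datetimes so date-based filtering works
    df.index = pd.to_datetime(df.index)
    return df.sort_index()
```

With nsepy installed, a call such as `get_data("TCS", date(2007, 1, 1), date.today())` would pull the full TCS history used in the next step.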
● We create a Check_anomalies function to detect and correct anomalies. First, the Date index is converted to datetime so that date-based operations can be performed on it. TCS has had two stock splits since 2007, so a for loop neutralizes the impact of each split on the price history. Stock split 1 and stock split 2 are plotted for better understanding. Finally, after the splits have been adjusted, the anomalies in the 'Prev Close' feature are corrected.
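One way to neutralize splits, sketched under assumptions: each split is passed as an `(ex_date, ratio)` pair, where `ratio` is the number of shares after the split per share before it (2.0 for a 1:1 split). Prices before each ex-date are divided by the ratio and volumes multiplied by it; 'Prev Close' is then rebuilt from the previous row's adjusted 'Close', which assumes no trading days are missing. The function name `adjust_splits` is hypothetical, the column names are assumed to match nsepy's output, and the actual TCS ex-dates and ratios must be supplied by the caller.

```python
import pandas as pd

# Per-share price columns assumed to be present in the nsepy output
PRICE_COLS = ["Prev Close", "Open", "High", "Low", "Last", "Close", "VWAP"]


def adjust_splits(df: pd.DataFrame, splits) -> pd.DataFrame:
    """Back-adjust prices and volumes for splits given as (ex_date, ratio) pairs."""
    df = df.copy()
    df.index = pd.to_datetime(df.index)
    price_cols = [c for c in PRICE_COLS if c in df.columns]
    # Work in floats so fractional adjusted prices and volumes fit the columns
    df[price_cols] = df[price_cols].astype(float)
    if "Volume" in df.columns:
        df["Volume"] = df["Volume"].astype(float)
    for ex_date, ratio in splits:
        # Rows before the ex-date traded on the old (pre-split) share count
        before = df.index < pd.Timestamp(ex_date)
        df.loc[before, price_cols] = df.loc[before, price_cols] / ratio
        if "Volume" in df.columns:
            df.loc[before, "Volume"] = df.loc[before, "Volume"] * ratio
    # Prev Close on an ex-date may still carry the unadjusted close, so rebuild
    # it from the adjusted Close series (assumes no missing trading days)
    if "Prev Close" in df.columns and "Close" in df.columns:
        df["Prev Close"] = df["Close"].shift(1).fillna(df["Prev Close"])
    return df
```

Plotting 'Close' before and after the call is a quick check that the jumps at the ex-dates have disappeared.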
● Now let's prepare the indicators:
Simple Moving Average (SMA): It is the average of the price over a fixed time period, with every day weighted equally.
Exponential Moving Average (EMA): A weighted average of the price that gives more weight to recent prices.
Relative Strength Index (RSI): A momentum indicator that measures the strength of recent price moves by comparing average gains with average losses over a look-back period (commonly 14 days), scaled to a value between 0 and 100.
Bollinger Bands (BB): John Bollinger developed a technical indicator called Bollinger Bands, which is used to measure a market's volatility and identify “overbought” or “oversold” conditions. Bollinger Bands help us tell whether the market is quiet or loud: when the market is quiet, the bands shrink, and when the market is loud, the bands expand. Look at the chart above. Bollinger Bands are a chart overlay indicator, meaning they are displayed over the price. What are Bollinger Bands? They are typically plotted as three lines: an upper band, a middle line, and a lower band. The middle line of the indicator is a 20-day simple moving average (SMA). By default, the upper and lower bands lie two standard deviations above and below the middle line. The standard deviation (SD) is a measure of how spread out numbers are; here, how far prices spread around the 20-day average.
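The four indicators can be sketched with pandas' rolling and exponentially weighted windows. This is an illustrative version, not necessarily how the project computes them: the function names and defaults (20-day SMA, EMA and bands, 14-day RSI, two standard deviations) are assumptions, the RSI uses Wilder's smoothing (an EMA with alpha = 1/period), and the bands use the population standard deviation (`ddof=0`), whereas pandas' `std()` defaults to the sample standard deviation.

```python
import pandas as pd


def sma(close: pd.Series, window: int = 20) -> pd.Series:
    # Arithmetic mean of the last `window` closes; NaN until the window fills
    return close.rolling(window=window).mean()


def ema(close: pd.Series, span: int = 20) -> pd.Series:
    # Recursive EMA with alpha = 2 / (span + 1): recent prices weigh more
    return close.ewm(span=span, adjust=False).mean()


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    # Split day-to-day changes into gains and (positive) losses
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = (-delta).clip(lower=0)
    # Wilder's smoothing is an EMA with alpha = 1 / period
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss  # inf when there were no losses, giving RSI = 100
    return 100 - 100 / (1 + rs)


def bollinger_bands(close: pd.Series, window: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    # Middle line: SMA; bands: +/- num_std population standard deviations
    middle = sma(close, window)
    sd = close.rolling(window=window).std(ddof=0)
    return pd.DataFrame(
        {"BB_Middle": middle, "BB_Upper": middle + num_std * sd, "BB_Lower": middle - num_std * sd}
    )
```

For example, `df["RSI_14"] = rsi(df["Close"])` or `df = df.join(bollinger_bands(df["Close"]))` attaches the values as new columns aligned on the date index.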
● We have created a function called “prepare_indicators” that adds the indicator values to our dataset as features.
● We have then done feature engineering, adding multiple features derived from the indicators prepared above. These features are either binary or multi-class (categorical).
● Next, we remove rows with any null values and drop insignificant features from the data.
● Finally, all of these functions are called from the “Main function”.
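A sketch of the feature-engineering and cleaning steps, assuming the indicator columns already exist under the hypothetical names `SMA_20`, `RSI_14`, `BB_Upper` and `BB_Lower`. The RSI thresholds (below 30 oversold, above 70 overbought) are common conventions rather than values from this project, and treating single-valued columns as "insignificant" is an assumed criterion. The function names are hypothetical; a main function would run the fetch, split-adjustment and indicator steps before these two.

```python
import numpy as np
import pandas as pd


def add_signal_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive binary and multi-class features from indicator columns (hypothetical names)."""
    df = df.copy()
    # Binary: 1 if the close is above its 20-day SMA, else 0
    df["Close_above_SMA"] = (df["Close"] > df["SMA_20"]).astype(int)
    # Multi-class: 0 = oversold (RSI < 30), 1 = neutral, 2 = overbought (RSI > 70)
    df["RSI_zone"] = np.select([df["RSI_14"] < 30, df["RSI_14"] > 70], [0, 2], default=1)
    # Multi-class: -1 = below the lower band, 0 = inside the bands, 1 = above the upper band
    df["BB_position"] = np.select(
        [df["Close"] < df["BB_Lower"], df["Close"] > df["BB_Upper"]], [-1, 1], default=0
    )
    return df


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with nulls, then drop columns that hold a single value."""
    # Warm-up rows have NaN indicators; their derived features fall back to the
    # default classes above, but dropna() removes those rows entirely
    df = df.dropna()
    # A column with one distinct value (e.g. Symbol) carries no signal for a model
    return df.loc[:, df.nunique() > 1]
```

Calling `clean(add_signal_features(df))` on the indicator-enriched frame yields the model-ready table.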