I will create a basic linear regression model with machine learning. The model takes historical Apple (AAPL) prices as input and returns a prediction of the next day's AAPL price.
In the context of finance and predicting the next day’s price using a regression model in machine learning, here’s how it typically works:
Regression models in finance, such as linear regression, aim to establish a relationship between one or more independent variables (also known as features or predictors, X) and a dependent variable (the target or outcome, y). In this case, the independent variables could include financial indicators, technical indicators, historical price movements, economic data, and so on, while the dependent variable is the price of a financial asset such as a stock, ETF, or commodity.
In summary, a regression model in finance uses historical data and the relationship between independent variables and price to make predictions about future price movements. However, due to the complex and dynamic nature of financial markets, using such models for trading or investment decisions requires careful consideration and risk management strategies.
But in this session, I will keep things as simple as I can.
Libraries
# Data manipulation
import pandas as pd
import numpy as np
# Fetching historical data
import yfinance as yf
# matplotlib is used for plotting graphs
import matplotlib.pyplot as plt
%matplotlib inline
# Machine learning library for linear regression
from sklearn.linear_model import LinearRegression
pd.options.display.float_format = '{:,.3f}'.format
Using the Close Price as Data
We can add more than one asset to ‘stocks’
stocks = 'AAPL'
start = '2015-01-01'
end = '2023-05-31'
# end = dt.datetime.now().date()  # change to the current date if required (needs: import datetime as dt)
interval = '1d'
Getting the Historical Data of an Asset (AAPL)
Closing Prices as Data
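df = yf.download(stocks, start=start, end=end, interval=interval, auto_adjust=True).dropna()
# Keep only the closing price; copy() avoids a possible SettingWithCopyWarning when columns are added later
df = df[['Close']].copy()
df.head()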
Output:
Plotting the AAPL Closing Price
df.plot(figsize=(10, 7), color='r')
plt.ylabel("AAPL Prices")
plt.title("AAPL Price Series")
plt.show()
Output :
Defining Explanatory variables
An explanatory variable is an input that the model uses to predict the next day's AAPL price. Explanatory variables are the features we want to use to predict the Apple (AAPL) price.
The explanatory variables in this strategy are the moving averages of the closing price over the past 12 days and 21 days. The target, next_day_price, is the closing price moved up by one row with shift(-1), so each row is paired with the following day's close. We drop the NaN values using the dropna() function and store the feature variables in X.
However, you can add more variables to X that you think are useful for predicting AAPL prices. These variables can be technical indicators, the prices of other stocks, ETFs such as the technology-heavy QQQ, indices such as the S&P 500 (^GSPC), or US economic data.
# Define explanatory variables
df['ma12'] = df['Close'].rolling(window=12).mean()
df['ma21'] = df['Close'].rolling(window=21).mean()
df['next_day_price'] = df['Close'].shift(-1)
df = df.dropna()
X = df[['ma12', 'ma21']]
# Define dependent variable
y = df['next_day_price']
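To see what rolling(), shift(-1), and dropna() do to the rows, here is a minimal sketch on a toy series. The prices are made-up numbers and the 3-row window is an assumption chosen only to keep the table small; it is not the 12-day or 21-day window used above.

```python
import pandas as pd

# Toy closing prices (made-up numbers, not real AAPL data)
toy = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})

# 3-row moving average: the first 2 rows have no full window, so they are NaN
toy["ma3"] = toy["Close"].rolling(window=3).mean()

# shift(-1) moves the next row's close up by one row, so each row holds
# the following day's price; the last row has no next day and becomes NaN
toy["next_day_price"] = toy["Close"].shift(-1)

# dropna() removes the rows without a full window and the row without a target
clean = toy.dropna()

print(toy)
print(clean)
```

With a window of 12 and 21 days the same thing happens on a larger scale: the first 20 rows and the final row are dropped before the model is trained.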
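Split the Data into Train and Test Datasets
The first 80% of the data is used for training and the remaining 20% for testing. The split is chronological, so the model is tested only on dates that come after the training period.
X_train and y_train are the training dataset.
X_test and y_test are the test dataset.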
train_size_rate = 0.8
train_size = int(len(X) * train_size_rate)
test_size = len(X) - train_size
X_train = X.head(train_size)
y_train = y.head(train_size)
X_test = X.tail(test_size)
y_test = y.tail(test_size)
size_check = len(y_test) + len(y_train) == len(X)
print("Shape of X_train: ", X_train.shape)
print("Shape of y_train: ", y_train.shape)
print("Shape of X_test: ", X_test.shape)
print("Shape of y_test: ", y_test.shape)
print("Size Matches: ", size_check)
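As an alternative to slicing with head() and tail(), scikit-learn's train_test_split can produce the same kind of chronological split when shuffling is turned off. The sketch below uses synthetic stand-in data; the names X_demo, y_demo, X_tr, and X_te are illustrative and not part of the walkthrough above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the feature table and target, indexed by business day
idx = pd.date_range("2023-01-02", periods=10, freq="B")
X_demo = pd.DataFrame(
    {"ma12": np.arange(10.0), "ma21": np.arange(10.0) + 1.0}, index=idx
)
y_demo = pd.Series(np.arange(10.0) + 2.0, index=idx, name="next_day_price")

# shuffle=False keeps the rows in date order:
# the earliest 80% go to training, the latest 20% to testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

print("Last training date:", X_tr.index.max().date())
print("First test date:   ", X_te.index.min().date())
```

Leaving shuffle at its default of True would mix future dates into the training set, which is usually not what you want for time series.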
Applying Model Formula
y = m1 * x1 + m2 * x2 + c
Next-day AAPL price = m1 * 12-day moving average + m2 * 21-day moving average + c
Linear Regression
# Linear Regression model
linReg = LinearRegression().fit(X_train, y_train)
print("Linear Regression model")
print("AAPL Price (y) = %.2f * 12 Days Moving Average (x1) \
+ %.2f * 21 Days Moving Average (x2) \
+ %.2f (constant)" % (linReg.coef_[0], linReg.coef_[1], linReg.intercept_))
Output:
Linear Regression model
AAPL Price (y) = 1.74 * 12 Days Moving Average (x1) + -0.74 * 21 Days Moving Average (x2) + 0.11 (constant)
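To connect the printed equation with what predict() does, the sketch below fits a model on synthetic data and applies the formula by hand. The data and the coefficients used to generate it (1.5, -0.5, and 2.0) are assumptions for illustration only and are unrelated to the AAPL fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two moving-average columns
X_demo = rng.normal(loc=100.0, scale=5.0, size=(50, 2))
# Target generated from known coefficients plus a little noise
y_demo = (
    1.5 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + 2.0
    + rng.normal(scale=0.1, size=50)
)

model = LinearRegression().fit(X_demo, y_demo)

# Apply y = m1 * x1 + m2 * x2 + c by hand
m1, m2 = model.coef_
c = model.intercept_
manual = m1 * X_demo[:, 0] + m2 * X_demo[:, 1] + c

# The hand-computed values agree with predict() up to floating-point error
print(np.allclose(manual, model.predict(X_demo)))
```

In the AAPL model the two moving averages are highly correlated, so the individual coefficients are best read together rather than one at a time.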
Now we evaluate the model's performance on the test dataset. We forecast AAPL prices with the linear model fitted on the training dataset: the predict() method returns the predicted next-day AAPL price (y) for the given explanatory variables X.
Predicting the Stock Prices
# Predicting the Apple (AAPL) prices
predict_price = linReg.predict(X_test)
predict_price = pd.DataFrame(predict_price, index=y_test.index, columns=['price'])
predict_price.plot(figsize=(12, 8), color='red')
y_test.plot(color='black')
plt.ylabel("AAPL Price")
plt.xlabel("Date")
plt.title('The graph visually displays a comparison\nbetween the projected ( predicted ) and actual prices of the AAPL',
          style="italic", color='black', fontsize=12)
plt.legend(['predict_price', 'actual_price'])
plt.grid(linestyle="dashed", color="grey", linewidth=1, alpha=0.25)
plt.show()
Output:
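The test-set predictions cover dates whose next-day price is already known. To get a forecast for the session after the last available close, one option is to take the feature values from the most recent row before it is removed by dropna(). The sketch below shows the idea on a synthetic random walk standing in for the downloaded prices; the names prices, latest_features, and next_day_forecast are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic random-walk prices standing in for the downloaded closes
rng = np.random.default_rng(42)
idx = pd.date_range("2022-01-03", periods=300, freq="B")
prices = pd.DataFrame(
    {"Close": 100.0 + rng.normal(0.0, 1.0, 300).cumsum()}, index=idx
)

prices["ma12"] = prices["Close"].rolling(window=12).mean()
prices["ma21"] = prices["Close"].rolling(window=21).mean()
prices["next_day_price"] = prices["Close"].shift(-1)

# The most recent row has features but no known next-day price yet
latest_features = prices[["ma12", "ma21"]].iloc[[-1]]

# Fit only on rows where both the features and the target are known
train = prices.dropna()
model = LinearRegression().fit(train[["ma12", "ma21"]], train["next_day_price"])

# One-step-ahead forecast for the session after the last observation
next_day_forecast = float(model.predict(latest_features)[0])
print(f"Forecast after {prices.index[-1].date()}: {next_day_forecast:.2f}")
```

Here the model is fitted on all labelled rows for brevity; in the walkthrough above you would reuse the already fitted linReg instead.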
R2_Score
Let's quantify the quality of the fit using the model's score() method, which returns the coefficient of determination (R²).
r2_score = linReg.score(X_test, y_test) * 100
float("{0:.2f}".format(r2_score))
Output:
81.47
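On the training data, R-squared for a linear regression with an intercept ranges from 0% to 100%, with a value approaching 100% signifying that the model explains most of the variation in AAPL prices. On a test set such as this one, it can also be negative when the model fits worse than simply predicting the mean of the test prices.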
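For reference, the value returned by score() can be reproduced by hand as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. The sketch below checks this on synthetic data; the data and the 160/40 split are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)

# Synthetic features and target (not AAPL data)
X_demo = rng.normal(size=(200, 2))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

# Ordered split: first 160 rows for training, last 40 for testing
X_tr, X_te = X_demo[:160], X_demo[160:]
y_tr, y_te = y_demo[:160], y_demo[160:]

model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)

# R^2 = 1 - (sum of squared residuals) / (total sum of squares)
ss_res = np.sum((y_te - pred) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# All three numbers should agree up to floating-point error
print(r2_manual, model.score(X_te, y_te), r2_score(y_te, pred))
```

Because SS_res is not bounded by SS_tot on held-out data, the ratio can exceed 1, which is how the score can drop below zero.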