Finance & Algorithm

Predicting the next day’s price of any asset(s)

I will create a basic machine learning model using linear regression. The model takes historical Apple (AAPL) prices and returns a prediction of AAPL's price for the next day.

In the context of finance and predicting the next day’s price using a regression model in machine learning, here’s how it typically works:

Regression models in finance, such as linear regression, aim to establish a relationship between one or more independent variables (also called features or predictors, X) and a dependent variable (the target or outcome, y). Here, the independent variables could include financial indicators, technical indicators, historical price movements, economic data, and so on. The dependent variable is the price of a financial asset such as a stock, ETF, or commodity.
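For p explanatory variables, the model and the quantity that ordinary least squares minimizes can be written as below. Ordinary least squares is the method behind scikit-learn's LinearRegression. The symbols m_j and c follow the notation used later in this article, and t indexes the n training rows.

```latex
\hat{y}_t = m_1 x_{1,t} + m_2 x_{2,t} + \dots + m_p x_{p,t} + c

(m_1, \dots, m_p, c) = \arg\min_{m_1, \dots, m_p,\, c} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2
```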

In summary, a regression model in finance uses historical data and the relationship between independent variables and price to make predictions about future price movements. However, due to the complex and dynamic nature of financial markets, using such models for trading or investment decisions requires careful consideration and risk management strategies.

In this session, however, I will keep things as simple as possible.

Libraries

# Data manipulation
import pandas as pd
import numpy as np
import datetime as dt

# Fetching historical data
import yfinance as yf

# matplotlib is used for plotting graphs
import matplotlib.pyplot as plt
# Jupyter magic: remove the next line when running as a plain Python script
%matplotlib inline

# Machine learning library for linear regression
from sklearn.linear_model import LinearRegression

# Display floats with three decimals
pd.options.display.float_format = '{:,.3f}'.format
Let’s use the closing price as our data.

We can add more than one asset to ‘stocks’, although the rest of this walkthrough assumes a single ticker.

stocks = 'AAPL'
start = '2015-01-01'
end = '2023-05-31'   # yfinance treats the end date as exclusive
# end = dt.datetime.now().date()    # change to the current date if required
interval = '1d'

Getting the historical data of any asset (AAPL)

Closing Prices as Data

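df = yf.download(stocks, start=start, end=end, interval=interval, auto_adjust=True).dropna()
# Newer yfinance versions may return (Price, Ticker) column levels even for one ticker
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)
df = df[['Close']].copy()   # copy() avoids SettingWithCopyWarning when adding columns later
df.head()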

Plotting the AAPL Closing Price
df.plot(figsize=(10, 7),color='r')
plt.ylabel("AAPL Prices")
plt.title("AAPL Price Series")
plt.show()

Defining Explanatory variables

An explanatory variable is an input that the model uses to determine the next day's AAPL price. In other words, the explanatory variables are the features we use to predict Apple's (AAPL) price.

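The explanatory variables in this strategy are the 12-day and 21-day moving averages of the closing price. The rolling windows leave NaN values in the first 20 rows. Shifting the close back by one day to build the target leaves a NaN in the last row. We drop all of these with the dropna() function and store the feature variables in X.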

You can also add other variables to X that you think help predict AAPL's price. Examples include technical indicators, the prices of other stocks, ETFs such as the technology-heavy QQQ, indices such as the S&P 500 (^GSPC), and US economic data.

# Define explanatory variables
df['ma12'] = df['Close'].rolling(window=12).mean()
df['ma21'] = df['Close'].rolling(window=21).mean()
df['next_day_price'] = df['Close'].shift(-1)

df = df.dropna()
X = df[['ma12', 'ma21']]

# Define dependent variable
y = df['next_day_price']
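The toy example below shows why dropna() is needed and how the target lines up with the features. It uses six made-up closing prices and a hypothetical 3-day window in place of the 12- and 21-day ones, so the effect is visible. The names `close` and `toy` are illustrative only.

```python
import pandas as pd

# Six made-up closing prices (illustration only, not AAPL data)
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
                  index=pd.date_range("2023-01-02", periods=6, freq="D"),
                  name="Close")

toy = close.to_frame()
# Hypothetical 3-day window so the NaN rows are easy to see
toy["ma3"] = toy["Close"].rolling(window=3).mean()
# Target: tomorrow's close placed on today's row
toy["next_day_price"] = toy["Close"].shift(-1)
print(toy)

# The first rows lack a full window and the last row lacks a next-day close
toy = toy.dropna()
print(toy)
```

Each remaining row pairs a moving average built only from closes up to that day with the following day's close. The features never include the price they are predicting.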
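Split the data into train and test datasets

The first 80% of the data is used for training and the remaining 20% for testing. Because the rows are ordered in time, the split is chronological rather than random. This way the model is never trained on prices that come after the test period.

X_train and y_train form the training dataset.

X_test and y_test form the test dataset.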

train_size_rate = 0.8
train_size = int(len(X) * train_size_rate)
test_size = len(X) - train_size

X_train = X.head(train_size)
y_train = y.head(train_size)

X_test = X.tail(test_size)
y_test = y.tail(test_size)

size_check = len(y_test) + len(y_train) == len(X)
print("Shape of X_train: ", X_train.shape)
print("Shape of y_train: ", y_train.shape)
print("Shape of X_test: ", X_test.shape)
print("Shape of y_test: ", y_test.shape)
print("Size Matches: ", size_check)
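The head/tail split above is a chronological split. As a sketch, scikit-learn's train_test_split with shuffle=False produces the same 80/20 cut. TimeSeriesSplit, a swapped-in technique not used in this walkthrough, extends the idea to several walk-forward folds. The data is synthetic, and the names `X_demo` and `y_demo` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Synthetic stand-in for the feature matrix and target (100 time-ordered rows)
n = 100
X_demo = pd.DataFrame({"ma12": np.arange(n, dtype=float),
                       "ma21": np.arange(n, dtype=float) * 0.5})
y_demo = pd.Series(np.arange(n, dtype=float) + 1.0)

# shuffle=False keeps the time order: first 80% train, last 20% test
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, shuffle=False)
print(len(X_tr), len(X_te))

# Walk-forward validation: each fold trains on the past and tests on the next block
tscv = TimeSeriesSplit(n_splits=5)
folds = []
for train_idx, test_idx in tscv.split(X_demo):
    # Record sizes and whether every training index precedes every test index
    folds.append((len(train_idx), len(test_idx), bool(train_idx.max() < test_idx.min())))
print(folds)
```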
Applying Model Formula
y = m1 * x1 + m2 * x2 + c
Next-day AAPL price = m1 * (12-day moving average) + m2 * (21-day moving average) + c
The fit method then fits the model to the independent variables (X_train) and the dependent variable (y_train). It estimates the coefficients m1 and m2 and the constant c by ordinary least squares.
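To make "compute coefficients and constants" concrete, here is a sketch on synthetic data (not AAPL prices). It compares LinearRegression with a direct least-squares solve using NumPy. It then checks that a prediction is just m1 * x1 + m2 * x2 + c. The names `X_syn`, `y_syn` and `syn_model` are hypothetical, chosen so they do not overwrite the article's X and y.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic features and a target built from known coefficients plus small noise
X_syn = rng.normal(size=(200, 2))
y_syn = 1.5 * X_syn[:, 0] - 0.5 * X_syn[:, 1] + 2.0 + rng.normal(scale=0.1, size=200)

syn_model = LinearRegression().fit(X_syn, y_syn)

# Direct least-squares solve: a column of ones makes the last coefficient the constant c
A = np.column_stack([X_syn, np.ones(len(X_syn))])
coef, *_ = np.linalg.lstsq(A, y_syn, rcond=None)

print(syn_model.coef_, syn_model.intercept_)
print(coef)

# A prediction is m1 * x1 + m2 * x2 + c
x_new = np.array([[0.3, -1.2]])
manual = x_new @ syn_model.coef_ + syn_model.intercept_
print(np.allclose(syn_model.predict(x_new), manual))
```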
Linear Regression
# Linear Regression model
linReg = LinearRegression().fit(X_train, y_train)
print("Linear Regression model")
print("AAPL Price (y) = %.2f * 12 Days Moving Average (x1)  \
+ %.2f * 21 Days Moving Average (x2)  \
+ %.2f (constant)" % (linReg.coef_[0], linReg.coef_[1], linReg.intercept_))

Output:

Linear Regression model
AAPL Price (y) = 1.74 * 12 Days Moving Average (x1) + -0.74 * 21 Days Moving Average (x2) + 0.11 (constant)
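Here is a worked example that plugs the printed coefficients into hypothetical moving-average values, ma12 = 170 and ma21 = 168. These numbers are illustrative, not taken from the data. Because 1.74 + (-0.74) = 1.00, the model is roughly the 12-day average plus 0.74 times the gap between the two averages:

```latex
\hat{y} \approx 1.74 \times 170 - 0.74 \times 168 + 0.11 = 295.80 - 124.32 + 0.11 = 171.59

\hat{y} \approx 170 + 0.74 \times (170 - 168) + 0.11 = 170 + 1.48 + 0.11 = 171.59
```

When the 12-day average sits above the 21-day average, as in a recent uptrend, the model projects the next-day price slightly above the 12-day average.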

Now we evaluate the model's performance on the test dataset, forecasting AAPL prices with the linear model fitted on the training dataset. The predict method returns the predicted next-day AAPL price (y) for each row of explanatory variables in X_test.

Predicting the Stock Prices
# Predicting the Apple (AAPL) prices

predict_price = linReg.predict(X_test)
predict_price = pd.DataFrame(
    predict_price, index=y_test.index, columns=['price'])
predict_price.plot(figsize=(12, 8) , color= 'red')
y_test.plot(color='black')
plt.ylabel("AAPL Price")
plt.xlabel("Date")
plt.title('Comparison of predicted and actual AAPL prices\n(test dataset)',
          style="italic", color='black', fontsize=12)
plt.legend(['predict_price', 'actual_price'])

plt.grid(linestyle = "dashed" , color = "grey" ,linewidth = 1, alpha = 0.25)
plt.show()
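The test-set predictions above are all for dates that already have a known close. To forecast the day after the last available close, note one detail: df.dropna() discarded the final row because its next_day_price is NaN, even though its moving averages are known. The minimal sketch below uses synthetic prices and hypothetical names (`syn_close`, `feat`, `fc_model`). It keeps that row aside and feeds it to predict.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic closing prices standing in for AAPL (illustration only)
rng = np.random.default_rng(42)
dates = pd.bdate_range("2023-01-02", periods=120)
syn_close = pd.Series(150 + np.cumsum(rng.normal(0, 1, len(dates))), index=dates, name="Close")

feat = syn_close.to_frame()
feat["ma12"] = feat["Close"].rolling(window=12).mean()
feat["ma21"] = feat["Close"].rolling(window=21).mean()
feat["next_day_price"] = feat["Close"].shift(-1)

# Train only on rows where both the features and the target exist
train_rows = feat.dropna()
fc_model = LinearRegression().fit(train_rows[["ma12", "ma21"]], train_rows["next_day_price"])

# The last row has features but no target yet: its prediction is "tomorrow"
latest = feat[["ma12", "ma21"]].iloc[[-1]]
tomorrow = fc_model.predict(latest)
print(latest.index[-1].date(), tomorrow[0])
```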

R-squared Score

Let’s quantify the quality of the fit with the model’s score() method. For LinearRegression, it returns the coefficient of determination (R²) of the predictions on the test data.

# LinearRegression.score returns R² on the test set; multiply by 100 for a percentage
r2 = linReg.score(X_test, y_test) * 100
round(r2, 2)

Output:

81.47
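As observed, the model’s R-squared on the test data is 81.47%.
An R-squared closer to 100% means the model explains more of the variation in next-day AAPL prices. It is not bounded below by 0%: on held-out test data it can be negative when the model does worse than simply predicting the mean of the test values.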
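A high R-squared on price levels is easier to achieve than it looks, because tomorrow's price is usually close to today's. A useful sanity check is to score a naive forecast that simply repeats today's close on the same test period. The model only adds value if it beats this baseline. The sketch below uses a synthetic, deterministic trending series rather than AAPL data, and scores the baseline with r2_score and mean_absolute_error from sklearn.metrics.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

# Synthetic trending prices (deterministic, illustration only)
t = np.arange(500, dtype=float)
prices = pd.Series(100 + 0.5 * t + 2 * np.sin(t / 5))

# Naive "persistence" forecast: tomorrow's price equals today's
actual = prices.shift(-1).dropna()
naive = prices.loc[actual.index]

# Evaluate on the last 20% only, mirroring the chronological test split
split = int(len(actual) * 0.8)
actual_test, naive_test = actual.iloc[split:], naive.iloc[split:]

naive_r2 = r2_score(actual_test, naive_test) * 100
naive_mae = mean_absolute_error(actual_test, naive_test)
print(f"Naive R-squared: {naive_r2:.2f}%  MAE: {naive_mae:.3f}")
```

With the article's variables, the equivalent baseline would be df['Close'].loc[y_test.index], scored against y_test.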