Implementing a Multi-Layer Perceptron from Scratch
Author: Mobin Nesari
Prepared for: The Artificial Neural Network graduate course, Shahid Beheshti University, 2023
Introduction
A Multi-Layer Perceptron (MLP) is a type of artificial neural network commonly used for supervised learning tasks such as binary and multi-class classification. In this notebook, we'll implement an MLP from scratch using only NumPy and apply it to a binary classification problem.
Problem Definition
We will use the well-known Iris dataset for this demonstration. The Iris dataset contains 150 instances, each with 4 features and a label indicating one of three species of Iris flower. We reduce this to a binary task: our goal is to train an MLP that classifies whether a flower is Iris-Virginica or not.
Introduction to target dataset
The Iris dataset is a widely used dataset in machine learning that contains information about different species of iris flowers. It holds 150 samples of iris flowers with 4 features for each flower: sepal length, sepal width, petal length, and petal width. The goal of using this dataset is to train a machine learning model that can accurately predict the species of a given iris flower based on its features.
Features Used in the Jupyter Notebook
In this Jupyter notebook, we will use only the petal length and petal width features of the Iris dataset to train and evaluate our Multi-Layer Perceptron (MLP). These two features were selected because they capture the variation between the iris species well, and they are easy to visualize.
Using only 2 features lets us draw a two-dimensional scatter plot of the data (see the plotting sketch after the data-loading code below), which makes the relationships between the features and the target variable (species) easier to see. We then train the MLP on this data and evaluate its accuracy from its predictions.
Preparation
Let’s start by importing the required libraries and loading the Iris dataset.
import numpy as np
from sklearn import datasets
# load iris dataset
iris = datasets.load_iris()
X = iris["data"][:, (2, 3)] # petal length, petal width
y = (iris["target"] == 2).astype(int) # 1 if Iris-Virginica, else 0
y = y.reshape(-1, 1)  # column vector of shape (150, 1) to match the network's output
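The two-dimensional scatter plot mentioned earlier can be drawn with matplotlib. This is a minimal sketch, not part of the original notebook; it assumes the X and y defined in the cell above.

import matplotlib.pyplot as plt

# y has shape (150, 1); flatten it to build boolean masks
virginica = y.ravel() == 1
plt.scatter(X[~virginica, 0], X[~virginica, 1], label="not Iris-Virginica")
plt.scatter(X[virginica, 0], X[virginica, 1], label="Iris-Virginica")
plt.xlabel("petal length (cm)")
plt.ylabel("petal width (cm)")
plt.legend()
plt.show()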
Activation Function
Before we build our MLP, let’s define the activation functions that we’ll be using. For this demonstration, we’ll be using the sigmoid activation function.
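For reference, the sigmoid function and its derivative are

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr).$$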
def sigmoid(z):
return 1 / (1 + np.exp(-z))
def sigmoid_derivative(z):
s = sigmoid(z)
return s * (1 - s)
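One caveat: np.exp(-z) can overflow for large negative z. This does not matter for our small demo, but a numerically stable variant can be written as follows (a sketch, not used in the rest of the notebook; it expects a NumPy array input):

def stable_sigmoid(z):
    # for z >= 0 use 1 / (1 + exp(-z)); for z < 0 use exp(z) / (1 + exp(z)),
    # so np.exp is only ever called on non-positive values and cannot overflow
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out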
MLP Class Definition
Now, let's define the MLP class. The MLP class has 3 methods:
- __init__: initializes the weights and biases of the MLP
- fit: trains the MLP on the training data
- predict: uses the trained MLP to make predictions on new data
class MLP:
def __init__(self, input_size, hidden_size, output_size, learning_rate=0.01):
self.input_size = input_size
self.hidden_size = hidden_size
self.output_size = output_size
self.learning_rate = learning_rate
# initialize weights randomly
self.weights1 = np.random.randn(self.input_size, self.hidden_size)
self.weights2 = np.random.randn(self.hidden_size, self.output_size)
# initialize biases to 0
self.bias1 = np.zeros((1, self.hidden_size))
self.bias2 = np.zeros((1, self.output_size))
def fit(self, X, y, epochs=1000):
for epoch in range(epochs):
# feedforward
layer1 = X.dot(self.weights1) + self.bias1
activation1 = sigmoid(layer1)
layer2 = activation1.dot(self.weights2) + self.bias2
activation2 = sigmoid(layer2)
            # backpropagation
            error = activation2 - y
            # output-layer delta: gradient of the squared error w.r.t. layer2
            delta2 = error * sigmoid_derivative(layer2)
            d_weights2 = activation1.T.dot(delta2)
            d_bias2 = np.sum(delta2, axis=0, keepdims=True)
            # hidden-layer delta: propagate delta2 (not the raw error) through weights2
            delta1 = delta2.dot(self.weights2.T) * sigmoid_derivative(layer1)
            d_weights1 = X.T.dot(delta1)
            d_bias1 = np.sum(delta1, axis=0, keepdims=True)
# update weights and biases
self.weights2 -= self.learning_rate * d_weights2
self.bias2 -= self.learning_rate * d_bias2
self.weights1 -= self.learning_rate * d_weights1
self.bias1 -= self.learning_rate * d_bias1
def predict(self, X):
layer1 = X.dot(self.weights1) + self.bias1
activation1 = sigmoid(layer1)
layer2 = activation1.dot(self.weights2) + self.bias2
activation2 = sigmoid(layer2)
return (activation2 > 0.5).astype(int)
This code defines a class called MLP that implements a Multi-Layer Perceptron. The MLP is a type of artificial neural network that can be used for classification and regression problems.
The class takes 4 parameters in its constructor: input_size, hidden_size, output_size, and learning_rate. input_size is the number of input features, hidden_size is the number of neurons in the hidden layer, output_size is the number of output neurons (1 here, since a single sigmoid output encodes the binary class), and learning_rate is the step size used when updating the weights during training.
The class also has two methods: fit and predict. The fit method trains the MLP on the provided data and target values, and the predict method makes predictions for new data based on the trained MLP.
In the fit method, the MLP uses a feedforward and backpropagation algorithm to learn the weights and biases that minimize the prediction error. The feedforward pass computes the activations of the hidden and output layers, while the backpropagation pass updates the weights and biases using the gradient of the prediction error. The loop repeats for a fixed number of epochs (there is no separate convergence check).
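Concretely, for a squared-error loss $L = \tfrac{1}{2}\sum (a_2 - y)^2$, the updates in fit follow from the chain rule. Writing $z_1, a_1$ for the hidden layer's pre-activations and activations and $z_2, a_2$ for the output layer's (this notation is ours, matching layer1/activation1 and layer2/activation2 in the code):

$$
\begin{aligned}
\delta_2 &= (a_2 - y)\,\sigma'(z_2), & \frac{\partial L}{\partial W_2} &= a_1^{\top}\delta_2, & \frac{\partial L}{\partial b_2} &= \textstyle\sum_i \delta_{2,i},\\
\delta_1 &= \delta_2 W_2^{\top}\,\sigma'(z_1), & \frac{\partial L}{\partial W_1} &= X^{\top}\delta_1, & \frac{\partial L}{\partial b_1} &= \textstyle\sum_i \delta_{1,i}.
\end{aligned}
$$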
In the predict method, the MLP applies the trained weights and biases to new data and returns binary predictions, thresholding the output probability at 0.5.
Training and Evaluating the MLP
Let’s now train and evaluate the MLP on the Iris dataset.
# create an instance of the MLP class
mlp = MLP(input_size=2, hidden_size=4, output_size=1)
# train the MLP on the full dataset
mlp.fit(X, y)
# make predictions on the same data (this demo uses no train/test split)
y_pred = mlp.predict(X)
# evaluate the accuracy of the MLP
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy:.2f}")
Accuracy: 0.95
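Note that this accuracy is measured on the same data the model was trained on, which is optimistic. A more honest estimate would hold out a test set. Here is a minimal sketch using scikit-learn's train_test_split (the split and its parameters are an illustration, not part of the original notebook):

from sklearn.model_selection import train_test_split

# hold out 30% of the samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
mlp = MLP(input_size=2, hidden_size=4, output_size=1)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
print(f"Test accuracy: {np.mean(y_pred == y_test):.2f}")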
Conclusion
In this notebook, we implemented a Multi-Layer Perceptron (MLP) from scratch using only NumPy. We trained the MLP on the Iris dataset and achieved an accuracy of about 95% (measured on the training data). This demonstration serves as a good starting point for understanding the fundamentals of MLPs and building more complex neural networks.