2.14. MLP model from scratch in Python#
We will be building a Neural Network (Multi Layer Perceptron) model from scratch using NumPy in Python. Please check out the following list of ingredients
(if you have not already done so), so that you can cook (code) the MLP model from scratch, because this is going to be the most general MLP model that you can find anywhere on the net (without using any for loops, except for the epochs part :))!
Note: I have already explained (in detail) most of the code sections in my previous chapters (such as developing the Activation function class, developing a class for the Cost function, etc.). I will just put the list here for you to go and check them out (so that I can skip the tedious work of explaining them again and concentrate more on the fun part). I know it will be laborious for you to visit each and every page, but the fruits of hard work are always sweet.
Ingredients
Activation functions
Data Pre-processing
Scaling
Standardization
Normalization
Encoding
Label Encoding
One-hot encoding
Data Augmentation
Train Test Split
Performance Metrics
Perceptron model
Neurons
Weights, Biases
Terminologies - Part 1
Input, Output and Hidden layers
Notations
Parameter Initialize
Learning Algorithm
Cost function
Forward propagation
Back Propagation
Terminologies - Part 2
Epochs, Iterations, Batch size, and learning rate
Gradient Descent
Update law
Momentum
RMSProp
Adam
LR Decay
Gradient exploding and Vanishing
Variance/ Bias
Regularization
Drop-out
Early stopping
Batch normalization
Numerical example (with code) - Forward pass and Backpropagation (step by step vectorized form)
Shortcut to calculate forward pass and backpropagation across layers (Very Important)
Now that we have all the ingredients available, we are ready to code the most general Neural Network (Multi Layer Perceptron) model
from scratch using NumPy in Python.
The structure/design of the code (recipe) will be similar to that of TensorFlow's Keras Sequential layers,
just to get a taste of how MLP models are built in practice.
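To give a flavour of where we are heading, here is a minimal (hypothetical) preview of how the finished recipe will be used; the class names MLP and Dense match the classes we build below, and the placeholder data X_train and y_train (one-hot encoded for cross-entropy) are assumptions for illustration only.
model = MLP()
model.add(Dense(neurons=6, activation_type='relu', input_dim=4))
model.add(Dense(neurons=3, activation_type='softmax'))
model.compile(cost_type='cross-entropy', optimizer_type='adam')
# model.fit(X_train, y_train, epochs=10, batch_size=32, lr=0.01)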
Import essential libraries#
# numpy for linear algebra
import numpy as np
# matplotlib for plotting the loss functions and/or accuracy
import matplotlib.pyplot as plt
# loading iris dataset from sklearn
from sklearn.datasets import load_iris
# confusion matrix
from sklearn.metrics import confusion_matrix
# accuracy score
from sklearn.metrics import accuracy_score
# show progress bar
from tqdm import tqdm
Activation class#
This class will contain class methods to calculate the activation functions, and it will also perform the forward propagation and backpropagation as per the description in the chapter Shortcut to calculate forward pass and backpropagation across layers (link to previous chapter).
class Activation:
def __init__(self, activation_type=None):
'''
Parameters
activation_type: type of activation
available options are 'sigmoid', 'linear', 'tanh', 'softmax', 'prelu' and 'relu'
'''
if activation_type is None:
self.activation_type = 'linear'
else:
self.activation_type = activation_type
def linear(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
return x
def d_linear(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
return np.ones(x.shape)
def sigmoid(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
return 1/(1+np.exp(-x))
def d_sigmoid(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
return self.sigmoid(x) * (1-self.sigmoid(x))
def tanh(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
def d_tanh(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
return 1-(self.tanh(x))**2
def ReLU(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
return x * (x > 0)
def d_ReLU(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
return (x>0)*np.ones(x.shape)
def PReLU(self, x, alpha=0.2):
'''
Parameters
alpha: slope parameter (𝛼)
x: input matrix of shape (m, d)
where 'm' is the number of samples (or rows)
and 'd' is the number of features (or columns)
'''
return np.where(x > 0, x, alpha*x)
def d_PReLU(self, x, alpha=0.2):
'''
Parameters
alpha: slope parameter (𝛼)
x: input matrix of shape (m, d)
where 'm' is the number of samples (or rows)
and 'd' is the number of features (or columns)
'''
return np.where(x > 0, 1, alpha)
def softmax(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
z = x - np.max(x, axis=-1, keepdims=True)
numerator = np.exp(z)
denominator = np.sum(numerator, axis=-1, keepdims=True)
softmax = numerator / denominator
return softmax
def d_softmax(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
if len(x.shape)==1:
x = np.array(x).reshape(1,-1)
else:
x = np.array(x)
m, d = x.shape
a = self.softmax(x)
tensor1 = np.einsum('ij,ik->ijk', a, a)
tensor2 = np.einsum('ij,jk->ijk', a, np.eye(d, d))
return tensor2 - tensor1
def get_activation(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
if self.activation_type == 'sigmoid':
return self.sigmoid(x)
elif self.activation_type == 'tanh':
return self.tanh(x)
elif self.activation_type == 'relu':
return self.ReLU(x)
elif self.activation_type == 'linear':
return self.linear(x)
elif self.activation_type == 'prelu':
return self.PReLU(x)
elif self.activation_type == 'softmax':
return self.softmax(x)
else:
raise ValueError("Valid Activations are only 'sigmoid', 'linear', 'tanh' 'softmax', 'prelu' and 'relu'")
def get_d_activation(self, x):
'''
Parameters
x: input matrix of shape (m, d)
where 'm' is the number of samples (in case of batch gradient descent of size m)
and 'd' is the number of features
'''
if self.activation_type == 'sigmoid':
return self.d_sigmoid(x)
elif self.activation_type == 'tanh':
return self.d_tanh(x)
elif self.activation_type == 'relu':
return self.d_ReLU(x)
elif self.activation_type == 'linear':
return self.d_linear(x)
elif self.activation_type == 'prelu':
return self.d_PReLU(x)
elif self.activation_type == 'softmax':
return self.d_softmax(x)
else:
raise ValueError("Valid Activations are only 'sigmoid', 'linear', 'tanh', 'softmax', 'prelu' and 'relu'")
def forward(self, X):
self.X = X
z = self.get_activation(X)
return z
def backpropagation(self, dz):
f_prime = self.get_d_activation(self.X)
if self.activation_type=='softmax':
# because derivative of softmax is a tensor
dx = np.einsum('ijk,ik->ij', f_prime, dz)
else:
dx = dz * f_prime
return dx
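As a quick sanity check (not part of the original recipe, and assuming the Activation class above has been run), we can confirm that backpropagation through a sigmoid reproduces the derivative \(\sigma(x)(1-\sigma(x))\):
# sanity check for the Activation class (assumes the class above is defined)
act = Activation(activation_type='sigmoid')
x = np.array([[0.0, 2.0, -1.0]])            # one sample (m=1) with d=3 features
a = act.forward(x)                          # sigmoid(x)
dx = act.backpropagation(np.ones_like(x))   # upstream gradient of ones
print(np.allclose(dx, a * (1 - a)))         # True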
Cost function#
Follow the lecture to develop the cost function class
class Cost:
def __init__(self, cost_type='mse'):
'''
Parameters
cost_type: type of cost function
available options are 'mse', and 'cross-entropy'
'''
self.cost_type = cost_type
def mse(self, a, y):
'''
Parameters
a: Predicted output array of shape (m, d)
y: Actual output array of shape (m, d)
'''
return (1/2)*np.sum((np.linalg.norm(a-y, axis=1))**2)
def d_mse(self, a, y):
'''
represents dJ/da
Parameters
a: Predicted output array of shape (m, d)
y: Actual output array of shape (m, d)
'''
return a - y
def cross_entropy(self, a, y, epsilon=1e-12):
'''
Parameters
a: Predicted output array of shape (m, d)
y: Actual output array of shape (m, d)
'''
a = np.clip(a, epsilon, 1. - epsilon)
return -np.sum(y*np.log(a))
def d_cross_entropy(self, a, y, epsilon=1e-12):
'''
represents dJ/da
Parameters
a: Predicted output array of shape (m, d)
y: Actual output array of shape (m, d)
'''
a = np.clip(a, epsilon, 1. - epsilon)
return -y/a
def get_cost(self, a, y):
'''
Parameters
a: Predicted output array of shape (m, d)
y: Actual output array of shape (m, d)
'''
if self.cost_type == 'mse':
return self.mse(a, y)
elif self.cost_type == 'cross-entropy':
return self.cross_entropy(a, y)
else:
raise ValueError("Valid cost functions are only 'mse', and 'cross-entropy'")
def get_d_cost(self, a, y):
'''
Parameters
a: Predicted output array of shape (m, d)
y: Actual output array of shape (m, d)
'''
if self.cost_type == 'mse':
return self.d_mse(a, y)
elif self.cost_type == 'cross-entropy':
return self.d_cross_entropy(a, y)
else:
raise ValueError("Valid cost functions are only 'mse', and 'cross-entropy'")
Optimizers#
This class contains different optimizers (such as RMSProp, Adam, etc.) used for updating the parameters.
class Optimizer:
def __init__(self, optimizer_type=None, shape_W=None, shape_b=None,
momentum1=0.9, momentum2=0.999, epsilon=1e-8):
'''
Parameters
momentum1: float hyperparameter >= 0 that accelerates gradient descent in the relevant
direction and dampens oscillations (defaults to 0.9). Used in SGD with momentum and in Adam
momentum2: float hyperparameter >= 0, the decay rate for the moving average of squared gradients.
Used in RMSProp and Adam
optimizer_type: type of optimizer
available options are 'gd', 'sgd' (This also includes momentum), 'adam', and 'rmsprop'
shape_W: Shape of the weight matrix W
shape_b: Shape of the bias matrix b
epsilon: parameter used in RMSProp and Adam to avoid division by zero error
'''
if optimizer_type is None:
self.optimizer_type = 'adam'
else:
self.optimizer_type = optimizer_type
self.momentum1 = momentum1
self.momentum2 = momentum2
self.epsilon = epsilon
self.vdW = np.zeros(shape_W)
self.vdb = np.zeros(shape_b)
self.SdW = np.zeros(shape_W)
self.Sdb = np.zeros(shape_b)
def GD(self, dW, db, k):
'''
dW: gradient of Weight W for iteration k
db: gradient of bias b for iteration k
k: iteration number
'''
return dW, db
def SGD(self, dW, db, k):
'''
dW: gradient of Weight W for iteration k
db: gradient of bias b for iteration k
k: iteration number
'''
self.vdW = self.momentum1*self.vdW + (1-self.momentum1)*dW
self.vdb = self.momentum1*self.vdb + (1-self.momentum1)*db
return self.vdW, self.vdb
def RMSProp(self, dW, db, k):
'''
dW: gradient of Weight W for iteration k
db: gradient of bias b for iteration k
k: iteration number
'''
self.SdW = self.momentum2*self.SdW + (1-self.momentum2)*(dW**2)
self.Sdb = self.momentum2*self.Sdb + (1-self.momentum2)*(db**2)
den_W = np.sqrt(self.SdW) + self.epsilon
den_b = np.sqrt(self.Sdb) + self.epsilon
return dW/den_W, db/den_b
def Adam(self, dW, db, k):
'''
dW: gradient of Weight W for iteration k
db: gradient of bias b for iteration k
k: iteration number
'''
# momentum
self.vdW = self.momentum1*self.vdW + (1-self.momentum1)*dW
self.vdb = self.momentum1*self.vdb + (1-self.momentum1)*db
# rmsprop
self.SdW = self.momentum2*self.SdW + (1-self.momentum2)*(dW**2)
self.Sdb = self.momentum2*self.Sdb + (1-self.momentum2)*(db**2)
# correction
if k>1:
vdW_h = self.vdW / (1-(self.momentum1**k))
vdb_h = self.vdb / (1-(self.momentum1**k))
SdW_h = self.SdW / (1-(self.momentum2**k))
Sdb_h = self.Sdb / (1-(self.momentum2**k))
else:
vdW_h = self.vdW
vdb_h = self.vdb
SdW_h = self.SdW
Sdb_h = self.Sdb
den_W = np.sqrt(SdW_h) + self.epsilon
den_b = np.sqrt(Sdb_h) + self.epsilon
return vdW_h/den_W, vdb_h/den_b
def get_optimization(self, dW, db, k):
if self.optimizer_type == 'gd':
return self.GD(dW, db, k)
elif self.optimizer_type == 'sgd':
return self.SGD(dW, db, k)
elif self.optimizer_type == 'rmsprop':
return self.RMSProp(dW, db, k)
elif self.optimizer_type == 'adam':
return self.Adam(dW, db, k)
else:
raise ValueError("Valid optimizer options are only 'gd', 'sgd', 'rmsprop', and 'adam'.")
Learning Rate decay#
This class contains different methods to implement the learning rate decay scheduler.
class LearningRateDecay:
def __init__(self):
pass
def constant(self, t, lr_0):
'''
t: iteration
lr_0: initial learning rate
'''
return lr_0
def time_decay(self, t, lr_0, k):
'''
lr_0: initial learning rate
k: Decay rate
t: iteration number
'''
lr = lr_0 /(1+(k*t))
return lr
def step_decay(self, t, lr_0, F, D):
'''
lr_0: initial learning rate
F: factor value controlling the rate at which the learning rate drops
D: “Drop every” iteration
t: current iteration
'''
mult = F**np.floor((1+t)/D)
lr = lr_0 * mult
return lr
def exponential_decay(self, t, lr_0, k):
'''
lr_0: initial learning rate
k: Exponential Decay rate
t: iteration number
'''
lr = lr_0 * np.exp(-k*t)
return lr
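A small illustration (assuming the LearningRateDecay class above) of the different schedules, all starting from lr_0 = 0.1:
lr_decay = LearningRateDecay()
print(lr_decay.time_decay(10, lr_0=0.1, k=0.01))          # 0.1/(1 + 0.01*10), roughly 0.0909
print(lr_decay.step_decay(60, lr_0=0.1, F=0.5, D=50))     # halved once after 50 iterations -> 0.05
print(lr_decay.exponential_decay(10, lr_0=0.1, k=0.01))   # 0.1*exp(-0.1), roughly 0.0905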
Utility function#
This class contains several utility functions, such as one-hot encoding, label encoding, normalization, train-test split, etc.
class Utility:
def __init__(self):
pass
def label_encoding(self, Y):
'''
Parameters:
Y: (m,d) shape matrix with categorical data
Return
result: label encoded data of 𝑌
idx_list: list of the dictionaries containing the unique values
of the columns and their mapping to the integer.
'''
idx_list = []
result = []
for col in range(Y.shape[1]):
indexes = {val: idx for idx, val in enumerate(np.unique(Y[:, col]))}
result.append([indexes[s] for s in Y[:, col]])
idx_list.append(indexes)
return np.array(result).T, idx_list
def onehot(self, X):
'''
Parameters:
X: 1D array of labels of length "m"
Return
X_onehot: (m,d) one hot encoded matrix (one-hot of X)
(where d is the number of unique values in X)
indexes: dictionary containing the unique values of X and their mapping to the integer column
'''
indexes = {val: idx for idx, val in enumerate(np.unique(X))}
y = np.array([indexes[s] for s in X])
X_onehot = np.zeros((y.size, len(indexes)))
X_onehot[np.arange(y.size), y] = 1
return X_onehot, indexes
def minmax(self, X, min_X=None, max_X=None):
if min_X is None:
min_X = np.min(X, axis=0)
if max_X is None:
max_X = np.max(X, axis=0)
Z = (X - min_X) / (max_X - min_X)
return Z, min_X, max_X
def standardize(self, X, mu=None, std=None):
if mu is None:
mu = np.mean(X, axis=0)
if std is None:
std = np.std(X, axis=0)
Z = (X - mu) / std
return Z, mu, std
def inv_standardize(self, Z, mu, std):
X = Z*std + mu
return X
def train_test_split(self, X, y, test_ratio=0.2, seed=None):
if seed is not None:
np.random.seed(seed)
train_ratio = 1-test_ratio
indices = np.random.permutation(X.shape[0])
train_idx, test_idx = indices[:int(train_ratio*len(X))], indices[int(train_ratio*len(X)):]
X_train, X_test = X[train_idx,:], X[test_idx,:]
y_train, y_test = y[train_idx], y[test_idx]
return X_train, X_test, y_train, y_test
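A quick demonstration (assuming the Utility class above) of one-hot encoding and the train-test split on dummy data:
util = Utility()
labels = np.array(['cat', 'dog', 'cat', 'bird'])
onehot_labels, mapping = util.onehot(labels)
print(onehot_labels.shape, mapping)      # (4, 3) {'bird': 0, 'cat': 1, 'dog': 2}
X_dummy = np.arange(20).reshape(10, 2)
y_dummy = np.arange(10)
X_tr, X_te, y_tr, y_te = util.train_test_split(X_dummy, y_dummy, test_ratio=0.2, seed=0)
print(X_tr.shape, X_te.shape)            # (8, 2) (2, 2)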
Weights initializer class#
class Weights_initializer:
def __init__(self, shape, initializer_type=None, seed=None):
'''
Parameters
shape: Shape of the weight matrix
initializer_type: type of weight initializer
available options are 'zeros', 'ones', 'random_normal', 'random_uniform',
'he_normal', 'xavier_normal' and 'glorot_normal'
'''
self.shape = shape
if initializer_type is None:
self.initializer_type = "he_normal"
else:
self.initializer_type = initializer_type
self.seed = seed
def zeros_initializer(self):
if self.seed is not None:
np.random.seed(self.seed)
return np.zeros(self.shape)
def ones_initializer(self):
if self.seed is not None:
np.random.seed(self.seed)
return np.ones(self.shape)
def random_normal_initializer(self):
if self.seed is not None:
np.random.seed(self.seed)
return np.random.normal(size=self.shape)
def random_uniform_initializer(self):
if self.seed is not None:
np.random.seed(self.seed)
return np.random.uniform(size=self.shape)
def he_initializer(self):
if self.seed is not None:
np.random.seed(self.seed)
s0, s1 = self.shape
return np.random.randn(s0, s1) * np.sqrt(2/s0)
def xavier_initializer(self):
'''
shape: Shape of the weight matrix.
'''
if self.seed is not None:
np.random.seed(self.seed)
s0, s1 = self.shape
return np.random.randn(s0, s1) * np.sqrt(1/s0)
def glorot_initializer(self):
'''
shape: Shape of the weight matrix.
'''
if self.seed is not None:
np.random.seed(self.seed)
s0, s1 = self.shape
return np.random.randn(s0, s1) * np.sqrt(2/(s0+s1))
def get_initializer(self):
if self.initializer_type == 'zeros':
return self.zeros_initializer()
elif self.initializer_type == 'ones':
return self.ones_initializer()
elif self.initializer_type == 'random_normal':
return self.random_normal_initializer()
elif self.initializer_type == 'random_uniform':
return self.random_uniform_initializer()
elif self.initializer_type == 'he_normal':
return self.he_initializer()
elif self.initializer_type == 'xavier_normal':
return self.xavier_initializer()
elif self.initializer_type == 'glorot_normal':
return self.glorot_initializer()
else:
raise ValueError("Valid initializer options are 'zeros', 'ones', 'random_normal', 'random_uniform', 'he_normal', 'xavier_normal', and 'glorot_normal'")
Dense class#
Dense class implements the operation:
\(a = f(XW + b)\)
where the activation \(f(\cdot)\) is applied only if specified, else no activation is used. \(W\) is a weights matrix created by the Dense layer based on the type of initialization (link to previous chapter) provided, and \(b\) is a bias vector created by the layer (only applicable if use_bias is True). These are all attributes of Dense.
class Dense:
def __init__(self, neurons, activation_type=None, use_bias=True,
weight_initializer_type=None, weight_regularizer=None, seed=None, input_dim=None):
'''
Parameters:
neurons: Positive integer (number of neurons), dimensionality of the output
activation_type: type of activation
available options are 'sigmoid', 'linear', 'tanh', 'softmax', 'prelu' and 'relu'
If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).
use_bias: Boolean, whether the layer uses a bias vector.
weight_initializer_type: Initializer for the kernel weights matrix.
weight_regularizer: Tuple specifying the regularizer applied to the weights matrix, e.g. ('L2', 0.01) or ('L1', 2)
seed: To generate reproducible results
input_dim: integer showing number of neurons in input layer
'''
self.neurons = neurons
self.activation = Activation(activation_type=activation_type)
self.use_bias = use_bias
self.weight_initializer_type = weight_initializer_type # none is handled
if weight_regularizer is None:
self.weight_regularizer = ('L2', 0)
else:
self.weight_regularizer = weight_regularizer
self.seed = seed
self.input_dim = input_dim
def initialize_parameters(self, hl, optimizer_type):
'''
hl: Number of neurons in layer l-1
'''
shape_W = (hl, self.neurons)
shape_b = (self.neurons, 1)
initializer = Weights_initializer(shape=shape_W,
initializer_type=self.weight_initializer_type,
seed=self.seed)
self.W = initializer.get_initializer()
self.b = np.zeros(shape_b)
self.optimizer = Optimizer(optimizer_type=optimizer_type, shape_W=shape_W, shape_b=shape_b)
def forward(self, X):
self.X = X
r = X @ self.W
self.z = r + self.b.T
a = self.activation.forward(self.z)
return a
def backpropagation(self, da):
dz = self.activation.backpropagation(da)
dr = dz.copy()
self.db = np.sum(dz, axis=0).reshape(-1,1)
self.dW = (self.X.T) @ dr
dX = dr @ (self.W.T)
return dX
def update(self, lr, m, k):
'''
Parameters:
lr: learning rate
m: batch_size (number of samples in the batch)
k: iteration_number
'''
dW, db = self.optimizer.get_optimization(self.dW, self.db, k)
if self.weight_regularizer[0].lower()=='l2':
dW += self.weight_regularizer[1] * self.W
elif self.weight_regularizer[0].lower()=='l1':
dW += self.weight_regularizer[1] * np.sign(self.W)
self.W -= dW*(lr/m)
if self.use_bias:
self.b -= db*(lr/m)
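A shape check (assuming the Dense class above): a single layer mapping 4 input features to 3 neurons, with a dummy upstream gradient for the backward pass.
layer = Dense(neurons=3, activation_type='relu', seed=0)
layer.initialize_parameters(hl=4, optimizer_type='gd')    # 4 inputs -> 3 neurons
X_batch = np.random.randn(5, 4)                           # batch of m=5 samples
a = layer.forward(X_batch)
dX = layer.backpropagation(np.ones_like(a))               # dummy dJ/da from the next layer
print(a.shape, dX.shape, layer.dW.shape, layer.db.shape)  # (5, 3) (5, 4) (4, 3) (3, 1)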
Dropout class#
This class will perform forward and backpropagation for a Dropout layer
class Dropout:
def __init__(self, p):
'''
Parameters
p: Probability of keeping a neuron active (inverted dropout)
'''
self.p = p
if self.p == 0:
self.p += 1e-6
if self.p == 1:
self.p -= 1e-6
def forward(self, X):
self.mask = (np.random.rand(*X.shape) < self.p) / self.p
Z = X * self.mask
return Z
def backpropagation(self, dZ):
dX = dZ * self.mask
return dX
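A small illustration (assuming the Dropout class above) of the inverted-dropout scaling: because kept units are divided by p, the expected activation is unchanged.
drop = Dropout(p=0.8)                 # p is used here as the keep probability
X_demo = np.ones((1000, 10))
Z_demo = drop.forward(X_demo)
print(round(Z_demo.mean(), 2))        # close to 1.0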
Batch Normalization class#
This class will perform forward and backpropagation for the Batch Normalization layer.
Note: We will initialise \(\gamma\) as ones and \(\beta\) as zeros so that the output of the linear batch-norm transformation initially follows the standard zero-mean unit-variance normal distribution. This provides a normalised starting point, from which the model can update \(\gamma\) and \(\beta\) to scale and shift the distribution(s) of each input accordingly (for the current layer).
Forward pass
eps represents \(\epsilon\)
mu represents \(\mu\)
var represents \(\sigma^2\)
zmu represents \(\bar{z_l}\)
ivar represents \(\frac{1}{\sqrt{\sigma^2 + \epsilon}}\)
zhat represents \(\hat{z_l}\)
q represents \(q_l\)
Backpropagation
dq represents \(\frac{\partial J}{\partial q_l}\)
dgamma represents \(\frac{\partial J}{\partial \gamma}\)
dbeta represents \(\frac{\partial J}{\partial \beta}\)
dzhat represents \(\frac{\partial J}{\partial \hat{z_l}}\)
dvar represents \(\frac{\partial J}{\partial \sigma^2}\)
dmu represents \(\frac{\partial J}{\partial \mu}\)
dz represents \(\frac{\partial J}{\partial z_l}\)
class BatchNormalization:
def __init__(self, momentum=0.9, epsilon=1e-6):
'''
Parameters
momentum: Momentum for the moving average
epsilon: 𝜖, Small float added to variance to avoid dividing by zero
'''
self.epsilon = epsilon
self.momentum = momentum
def initialize_parameters(self, d):
'''
d: Shape of input to BN layer
'''
self.gamma = np.ones((d))
self.beta = np.zeros((d))
self.running_mean = np.zeros((d))
self.running_var = np.zeros((d))
def forward(self, z, mode='train'):
'''
z: Input to BN layer
mode: forward pass used for train or test
'''
if mode=='train':
self.m, self.d = z.shape
self.mu = np.mean(z, axis = 0) # 𝜇
self.var = np.var(z, axis=0) # 𝜎^2
self.zmu = z - self.mu # z - 𝜇
self.ivar = 1 / np.sqrt(self.var + self.epsilon) # 𝜎𝑖𝑛𝑣
self.zhat = self.zmu * self.ivar
q = self.gamma*self.zhat + self.beta # ql
self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * self.mu
self.running_var = self.momentum * self.running_var + (1 - self.momentum) * self.var
elif mode=='test':
q = (z - self.running_mean) / np.sqrt(self.running_var + self.epsilon)
q = self.gamma*q + self.beta
else:
raise ValueError('Invalid forward batchnorm mode "%s"' % mode)
return q
def backpropagation(self, dq):
self.dgamma = np.sum(dq * self.zhat, axis=0)
self.dbeta = np.sum(dq, axis=0)
dzhat = dq * self.gamma
dvar = np.sum(dzhat * self.zmu * (-.5) * (self.ivar**3), axis=0)
dmu = np.sum(dzhat * (-self.ivar), axis=0)
dz = dzhat * self.ivar + dvar * (2/self.m) * self.zmu + (1/self.m)*dmu
return dz
def update(self, lr, m, k):
'''
Parameters:
lr: learning rate
m: batch_size (number of samples in the batch)
k: iteration_number
'''
self.gamma -= self.dgamma*(lr/m)
self.beta -= self.dbeta*(lr/m)
MLP#
This class finally contains the compile, summary, fit, predict, etc. methods for building and executing our MLP model.
class MLP:
def __init__(self, layers=None):
'''
This is a sequential MLP model
'''
if layers is None:
self.layers = []
else:
self.layers = layers
self.network_architecture_called = False
def add(self, layer):
# adds a layer to MLP model
self.layers.append(layer)
def Input(self, input_dim):
'''
input_dim: integer showing number of neurons in input layer
'''
self.d = input_dim
self.architecture = [self.d]
self.layer_name = ["Input"]
def network_architecture(self):
for layer in self.layers:
if layer.__class__.__name__=='Dense':
if layer.input_dim is not None:
self.Input(layer.input_dim)
self.architecture.append(layer.neurons)
self.layer_name.append(layer.__class__.__name__)
else:
self.architecture.append(self.architecture[-1])
self.layer_name.append(layer.__class__.__name__)
def summary(self):
if self.network_architecture_called==False:
self.network_architecture()
self.network_architecture_called = True
len_assigned = [45, 26, 15]
count = {'Dense': 1, 'Activation': 1, 'Input': 1,
'BatchNormalization': 1, 'Dropout': 1}
col_names = ['Layer (type)', 'Output Shape', '# of Parameters']
print("Model: MLP")
print('-'*sum(len_assigned))
text = ''
for i in range(3):
text += col_names[i] + ' '*(len_assigned[i]-len(col_names[i]))
print(text)
print('='*sum(len_assigned))
total_params = 0
trainable_params = 0
non_trainable_params = 0
for i in range(len(self.layer_name)):
# layer name
layer_name = self.layer_name[i]
name = layer_name.lower() + '_' + str(count[layer_name]) + ' ' + '(' + layer_name + ')'
count[layer_name] += 1
# output shape
out = '(None, ' + str(self.architecture[i]) + ')'
# number of params
if layer_name=='Dense':
h0 = self.architecture[i-1]
h1 = self.architecture[i]
if self.layers[i-1].use_bias:
params = h0*h1 + h1
else:
params = h0*h1
total_params += params
trainable_params += params
elif layer_name=='BatchNormalization':
h = self.architecture[i]
params = 4*h
trainable_params += 2*h
non_trainable_params += 2*h
total_params += params
else:
params = 0
names = [name, out, str(params)]
# print this row
text = ''
for j in range(3):
text += names[j] + ' '*(len_assigned[j]-len(names[j]))
print(text)
if i!=(len(self.layer_name)-1):
print('-'*sum(len_assigned))
else:
print('='*sum(len_assigned))
print("Total params:", total_params)
print("Trainable params:", trainable_params)
print("Non-trainable params:", non_trainable_params)
print('-'*sum(len_assigned))
def compile(self, cost_type, optimizer_type):
self.cost = Cost(cost_type)
self.cost_type = cost_type
self.optimizer_type = optimizer_type
def initialize_parameters(self):
if self.network_architecture_called==False:
self.network_architecture()
self.network_architecture_called = True
# initialize parameters for different layers
for i, layer in enumerate(self.layers):
if layer.__class__.__name__=='Dense':
layer.initialize_parameters(self.architecture[i], self.optimizer_type)
elif layer.__class__.__name__=='BatchNormalization':
layer.initialize_parameters(self.architecture[i])
def fit(self, X, y, epochs=10, batch_size=5, lr=1, X_val=None, y_val=None, verbose=1, lr_decay=None, **kwargs):
self.history = {'Training Loss': [],
'Validation Loss': [],
'Training Accuracy': [],
'Validation Accuracy': []}
iterations = 0
self.m = batch_size
self.initialize_parameters()
self.layers = [layer for layer in self.layers if layer is not None]
for epoch in range(epochs):
cost_train = 0
num_batches = 0
y_pred_train = []
y_train = []
print('Epoch: ' + str(epoch+1) + '/' + str(epochs), end=' ')
for i in tqdm(range(0, len(X), batch_size)):
X_batch = X[i:i+batch_size]
y_batch = y[i:i+batch_size]
Z = X_batch.copy()
# feed-forward
for layer in self.layers:
Z = layer.forward(Z)
# calculating training accuracy
if self.cost_type=='cross-entropy':
y_pred_train += np.argmax(Z, axis=1).tolist()
y_train += np.argmax(y_batch, axis=1).tolist()
# calculating the loss
cost_train += self.cost.get_cost(Z, y_batch) / self.m
# calculating dL/daL (last layer backprop error)
dZ = self.cost.get_d_cost(Z, y_batch)
# backpropagation
for layer in self.layers[::-1]:
dZ = layer.backpropagation(dZ)
# Parameters update
for layer in self.layers:
if layer.__class__.__name__ in ('Dense', 'BatchNormalization'):
layer.update(lr, self.m, iterations)
# Learning rate decay
if lr_decay is not None:
lr = lr_decay(iterations, **kwargs)
num_batches += 1
iterations += 1
cost_train /= num_batches
# printing purpose only (Training Accuracy, Validation loss and accuracy)
text = 'Training Loss: ' + str(round(cost_train, 4)) + ' - '
self.history['Training Loss'].append(cost_train)
# training accuracy
if self.cost_type=='cross-entropy':
accuracy_train = np.sum(np.array(y_pred_train) == np.array(y_train)) / len(y_train)
text += 'Training Accuracy: ' + str(round(accuracy_train, 4))
self.history['Training Accuracy'].append(accuracy_train)
else:
text += 'Training Accuracy: ' + str(round(cost_train, 4))
self.history['Training Accuracy'].append(cost_train)
if X_val is not None:
cost_val, accuracy_val = self.evaluate(X_val, y_val, batch_size)
text += ' - Validation Loss: ' + str(round(cost_val, 4)) + ' - '
self.history['Validation Loss'].append(cost_val)
text += 'Validation Accuracy: ' + str(round(accuracy_val, 4))
self.history['Validation Accuracy'].append(accuracy_val)
if verbose:
print(text)
else:
print()
def evaluate(self, X, y, batch_size=None):
if batch_size is None:
batch_size = len(X)
cost = 0
correct = 0
num_batches = 0
utility = Utility()
Y_1hot, _ = utility.onehot(y)
for i in tqdm(range(0, len(X), batch_size)):
X_batch = X[i:i+batch_size]
y_batch = y[i:i+batch_size]
Y_1hot_batch = Y_1hot[i:i+batch_size]
Z = X_batch.copy()
for layer in self.layers:
if layer.__class__.__name__=='BatchNormalization':
Z = layer.forward(Z, mode='test')
else:
Z = layer.forward(Z)
if self.cost_type=='cross-entropy':
cost += self.cost.get_cost(Z, Y_1hot_batch) / len(y_batch)
y_pred = np.argmax(Z, axis=1).tolist()
correct += np.sum(y_pred == y_batch)
else:
cost += self.cost.get_cost(Z, y_batch) / len(y_batch)
num_batches += 1
if self.cost_type=='cross-entropy':
accuracy = correct / len(y)
cost /= num_batches
return cost, accuracy
else:
cost /= num_batches
return cost, cost
def loss_plot(self):
plt.plot(self.history['Training Loss'], 'k')
if len(self.history['Validation Loss'])>0:
plt.plot(self.history['Validation Loss'], 'r')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.title('Model Loss')
else:
plt.title('Training Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.show()
def accuracy_plot(self):
plt.plot(self.history['Training Accuracy'], 'k')
if len(self.history['Validation Accuracy'])>0:
plt.plot(self.history['Validation Accuracy'], 'r')
plt.legend(['Train', 'Validation'], loc='lower right')
plt.title('Model Accuracy')
else:
plt.title('Training Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.show()
def predict(self, X, batch_size=None):
if batch_size is None:
batch_size = len(X)
for i in range(0, len(X), batch_size):
X_batch = X[i:i+batch_size]
Z = X_batch.copy()
for layer in self.layers:
if layer.__class__.__name__=='BatchNormalization':
Z = layer.forward(Z, mode='test')
else:
Z = layer.forward(Z)
if i==0:
if self.cost_type=='cross-entropy':
y_pred = np.argmax(Z, axis=1).tolist()
else:
y_pred = Z
else:
if self.cost_type=='cross-entropy':
y_pred += np.argmax(Z, axis=1).tolist()
else:
y_pred = np.vstack((y_pred, Z))
return np.array(y_pred)
Validating model using Iris Dataset#
Check this page (link to an external website) to know more about the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
utility = Utility()
# train test split
X_train, X_test, y_train, y_test = utility.train_test_split(X, y, test_ratio=0.3, seed=42)
# standardize train data
X_train_std, mu_X_train, std_X_train = utility.standardize(X_train)
# use mean and std of train to standardize test data
X_test_std, _, _ = utility.standardize(X_test, mu_X_train, std_X_train)
# train validation split
X_train_new, X_val, y_train_new, y_val = utility.train_test_split(X_train_std, y_train, test_ratio=0.2, seed=42)
Y_1hot_train, _ = utility.onehot(y_train_new)
lr, epochs, batch_size = 0.1, 100, 2
input_dim = X_train_new.shape[1]
output_dim = Y_1hot_train.shape[1]
Building the MLP model
model = MLP()
model.add(Dense(neurons=6,
activation_type='relu',
input_dim=input_dim,
seed=42,
weight_initializer_type='random_normal'))
model.add(Dense(neurons=output_dim,
activation_type='softmax',
weight_initializer_type='random_normal',
seed=42))
Printing the model summary (description).
model.summary()
Model: MLP
--------------------------------------------------------------------------------------
Layer (type) Output Shape # of Parameters
======================================================================================
input_1 (Input) (None, 4) 0
--------------------------------------------------------------------------------------
dense_1 (Dense) (None, 6) 30
--------------------------------------------------------------------------------------
dense_2 (Dense) (None, 3) 21
======================================================================================
Total params: 51
Trainable params: 51
Non-trainable params: 0
--------------------------------------------------------------------------------------
Compiling the MLP model (that is, adding the cost and optimizer types)
model.compile(cost_type='cross-entropy', optimizer_type='gd')
# adding learning rate decay
LR_decay = LearningRateDecay()
# training the model
model.fit(X_train_new, Y_1hot_train, epochs=epochs, batch_size=batch_size, lr=lr, X_val=X_val, y_val=y_val, verbose=1,
lr_decay=LR_decay.time_decay, lr_0=lr, k=lr/epochs)
Epoch: 1/100
100%|██████████| 42/42 [00:00<00:00, 1696.02it/s]
100%|██████████| 11/11 [00:00<00:00, 3224.58it/s]
Training Loss: 0.7243 - Training Accuracy: 0.5476 - Validation Loss: 0.4471 - Validation Accuracy: 0.7143
Epoch: 2/100
100%|██████████| 42/42 [00:00<00:00, 1328.80it/s]
100%|██████████| 11/11 [00:00<00:00, 1949.44it/s]
Training Loss: 0.4963 - Training Accuracy: 0.7381 - Validation Loss: 0.4185 - Validation Accuracy: 0.7143
Epoch: 3/100
100%|██████████| 42/42 [00:00<00:00, 1539.25it/s]
100%|██████████| 11/11 [00:00<00:00, 3577.65it/s]
Training Loss: 0.4477 - Training Accuracy: 0.7619 - Validation Loss: 0.3932 - Validation Accuracy: 0.7143
Epoch: 4/100
100%|██████████| 42/42 [00:00<00:00, 1610.76it/s]
100%|██████████| 11/11 [00:00<00:00, 3512.55it/s]
Training Loss: 0.4178 - Training Accuracy: 0.7738 - Validation Loss: 0.3741 - Validation Accuracy: 0.7619
Epoch: 5/100
100%|██████████| 42/42 [00:00<00:00, 1818.17it/s]
100%|██████████| 11/11 [00:00<00:00, 4690.66it/s]
Training Loss: 0.3956 - Training Accuracy: 0.7857 - Validation Loss: 0.3524 - Validation Accuracy: 0.7619
Epoch: 6/100
100%|██████████| 42/42 [00:00<00:00, 2114.52it/s]
100%|██████████| 11/11 [00:00<00:00, 3576.82it/s]
Training Loss: 0.3789 - Training Accuracy: 0.7857 - Validation Loss: 0.3414 - Validation Accuracy: 0.7619
Epoch: 7/100
100%|██████████| 42/42 [00:00<00:00, 1132.53it/s]
100%|██████████| 11/11 [00:00<00:00, 1941.40it/s]
Training Loss: 0.3638 - Training Accuracy: 0.7857 - Validation Loss: 0.3285 - Validation Accuracy: 0.7619
Epoch: 8/100
100%|██████████| 42/42 [00:00<00:00, 1445.35it/s]
100%|██████████| 11/11 [00:00<00:00, 3676.58it/s]
Training Loss: 0.3509 - Training Accuracy: 0.7976 - Validation Loss: 0.3144 - Validation Accuracy: 0.7619
Epoch: 9/100
100%|██████████| 42/42 [00:00<00:00, 1416.41it/s]
100%|██████████| 11/11 [00:00<00:00, 6378.73it/s]
Training Loss: 0.341 - Training Accuracy: 0.7976 - Validation Loss: 0.2954 - Validation Accuracy: 0.7619
Epoch: 10/100
100%|██████████| 42/42 [00:00<00:00, 1842.34it/s]
100%|██████████| 11/11 [00:00<00:00, 3710.28it/s]
Training Loss: 0.3293 - Training Accuracy: 0.7976 - Validation Loss: 0.2765 - Validation Accuracy: 0.8095
Epoch: 11/100
100%|██████████| 42/42 [00:00<00:00, 1579.80it/s]
100%|██████████| 11/11 [00:00<00:00, 2944.69it/s]
Training Loss: 0.3166 - Training Accuracy: 0.8095 - Validation Loss: 0.2604 - Validation Accuracy: 0.8095
Epoch: 12/100
100%|██████████| 42/42 [00:00<00:00, 1642.69it/s]
100%|██████████| 11/11 [00:00<00:00, 2689.60it/s]
Training Loss: 0.3059 - Training Accuracy: 0.8214 - Validation Loss: 0.2412 - Validation Accuracy: 0.8095
Epoch: 13/100
100%|██████████| 42/42 [00:00<00:00, 1941.38it/s]
100%|██████████| 11/11 [00:00<00:00, 2120.28it/s]
Training Loss: 0.2911 - Training Accuracy: 0.8095 - Validation Loss: 0.2254 - Validation Accuracy: 0.8095
Epoch: 14/100
100%|██████████| 42/42 [00:00<00:00, 1719.21it/s]
100%|██████████| 11/11 [00:00<00:00, 3017.48it/s]
Training Loss: 0.2792 - Training Accuracy: 0.8452 - Validation Loss: 0.205 - Validation Accuracy: 0.9048
Epoch: 15/100
100%|██████████| 42/42 [00:00<00:00, 1226.87it/s]
100%|██████████| 11/11 [00:00<00:00, 2623.83it/s]
Training Loss: 0.2624 - Training Accuracy: 0.8452 - Validation Loss: 0.1882 - Validation Accuracy: 0.9048
Epoch: 16/100
100%|██████████| 42/42 [00:00<00:00, 1336.61it/s]
100%|██████████| 11/11 [00:00<00:00, 1518.47it/s]
Training Loss: 0.2467 - Training Accuracy: 0.8452 - Validation Loss: 0.1743 - Validation Accuracy: 0.9524
Epoch: 17/100
100%|██████████| 42/42 [00:00<00:00, 1574.97it/s]
100%|██████████| 11/11 [00:00<00:00, 3349.84it/s]
Training Loss: 0.2319 - Training Accuracy: 0.869 - Validation Loss: 0.1616 - Validation Accuracy: 1.0
Epoch: 18/100
100%|██████████| 42/42 [00:00<00:00, 1268.37it/s]
100%|██████████| 11/11 [00:00<00:00, 3029.57it/s]
Training Loss: 0.216 - Training Accuracy: 0.8929 - Validation Loss: 0.1504 - Validation Accuracy: 1.0
Epoch: 19/100
100%|██████████| 42/42 [00:00<00:00, 1463.66it/s]
100%|██████████| 11/11 [00:00<00:00, 2586.17it/s]
Training Loss: 0.2019 - Training Accuracy: 0.9167 - Validation Loss: 0.1415 - Validation Accuracy: 0.9524
Epoch: 20/100
100%|██████████| 42/42 [00:00<00:00, 1194.52it/s]
100%|██████████| 11/11 [00:00<00:00, 3067.44it/s]
Training Loss: 0.1874 - Training Accuracy: 0.9286 - Validation Loss: 0.1347 - Validation Accuracy: 0.9524
Epoch: 21/100
100%|██████████| 42/42 [00:00<00:00, 932.19it/s]
100%|██████████| 11/11 [00:00<00:00, 3359.84it/s]
Training Loss: 0.1738 - Training Accuracy: 0.9286 - Validation Loss: 0.1289 - Validation Accuracy: 0.9524
Epoch: 22/100
100%|██████████| 42/42 [00:00<00:00, 1300.88it/s]
100%|██████████| 11/11 [00:00<00:00, 6565.72it/s]
Training Loss: 0.1613 - Training Accuracy: 0.9405 - Validation Loss: 0.1245 - Validation Accuracy: 0.9524
Epoch: 23/100
100%|██████████| 42/42 [00:00<00:00, 1989.93it/s]
100%|██████████| 11/11 [00:00<00:00, 978.48it/s]
Training Loss: 0.1497 - Training Accuracy: 0.9524 - Validation Loss: 0.1209 - Validation Accuracy: 0.9524
Epoch: 24/100
100%|██████████| 42/42 [00:00<00:00, 695.94it/s]
100%|██████████| 11/11 [00:00<00:00, 3587.94it/s]
Training Loss: 0.1397 - Training Accuracy: 0.9524 - Validation Loss: 0.1182 - Validation Accuracy: 0.9524
Epoch: 25/100
100%|██████████| 42/42 [00:00<00:00, 936.68it/s]
100%|██████████| 11/11 [00:00<00:00, 1873.60it/s]
Training Loss: 0.1305 - Training Accuracy: 0.9524 - Validation Loss: 0.1159 - Validation Accuracy: 0.9524
Epoch: 26/100
100%|██████████| 42/42 [00:00<00:00, 1181.92it/s]
100%|██████████| 11/11 [00:00<00:00, 2746.43it/s]
Training Loss: 0.1227 - Training Accuracy: 0.9524 - Validation Loss: 0.1141 - Validation Accuracy: 0.9524
Epoch: 27/100
100%|██████████| 42/42 [00:00<00:00, 1224.18it/s]
100%|██████████| 11/11 [00:00<00:00, 2292.77it/s]
Training Loss: 0.1159 - Training Accuracy: 0.9524 - Validation Loss: 0.1124 - Validation Accuracy: 0.9524
Epoch: 28/100
100%|██████████| 42/42 [00:00<00:00, 1508.21it/s]
100%|██████████| 11/11 [00:00<00:00, 3057.48it/s]
Training Loss: 0.11 - Training Accuracy: 0.9524 - Validation Loss: 0.1114 - Validation Accuracy: 0.9524
Epoch: 29/100
100%|██████████| 42/42 [00:00<00:00, 980.94it/s]
100%|██████████| 11/11 [00:00<00:00, 3299.30it/s]
Training Loss: 0.105 - Training Accuracy: 0.9524 - Validation Loss: 0.1102 - Validation Accuracy: 0.9524
Epoch: 30/100
100%|██████████| 42/42 [00:00<00:00, 1768.63it/s]
100%|██████████| 11/11 [00:00<00:00, 2317.88it/s]
Training Loss: 0.1007 - Training Accuracy: 0.9524 - Validation Loss: 0.1092 - Validation Accuracy: 0.9524
Epoch: 31/100
100%|██████████| 42/42 [00:00<00:00, 1441.83it/s]
100%|██████████| 11/11 [00:00<00:00, 3642.61it/s]
Training Loss: 0.0968 - Training Accuracy: 0.9524 - Validation Loss: 0.1085 - Validation Accuracy: 0.9524
Epoch: 32/100
100%|██████████| 42/42 [00:00<00:00, 1949.05it/s]
100%|██████████| 11/11 [00:00<00:00, 3954.86it/s]
Training Loss: 0.0935 - Training Accuracy: 0.9524 - Validation Loss: 0.1077 - Validation Accuracy: 0.9524
Epoch: 33/100
100%|██████████| 42/42 [00:00<00:00, 1533.23it/s]
100%|██████████| 11/11 [00:00<00:00, 3134.97it/s]
Training Loss: 0.0907 - Training Accuracy: 0.9524 - Validation Loss: 0.1069 - Validation Accuracy: 0.9524
Epoch: 34/100
100%|██████████| 42/42 [00:00<00:00, 1922.56it/s]
100%|██████████| 11/11 [00:00<00:00, 3718.05it/s]
Training Loss: 0.088 - Training Accuracy: 0.9524 - Validation Loss: 0.1063 - Validation Accuracy: 0.9524
Epoch: 35/100
100%|██████████| 42/42 [00:00<00:00, 1512.85it/s]
100%|██████████| 11/11 [00:00<00:00, 6179.66it/s]
Training Loss: 0.0858 - Training Accuracy: 0.9524 - Validation Loss: 0.1057 - Validation Accuracy: 0.9524
Epoch: 36/100
100%|██████████| 42/42 [00:00<00:00, 1323.05it/s]
100%|██████████| 11/11 [00:00<00:00, 2013.94it/s]
Training Loss: 0.0838 - Training Accuracy: 0.9524 - Validation Loss: 0.1051 - Validation Accuracy: 0.9524
Epoch: 37/100
100%|██████████| 42/42 [00:00<00:00, 1647.56it/s]
100%|██████████| 11/11 [00:00<00:00, 4246.42it/s]
Training Loss: 0.0819 - Training Accuracy: 0.9524 - Validation Loss: 0.1047 - Validation Accuracy: 0.9524
Epoch: 38/100
100%|██████████| 42/42 [00:00<00:00, 1147.62it/s]
100%|██████████| 11/11 [00:00<00:00, 3053.63it/s]
Training Loss: 0.0803 - Training Accuracy: 0.9524 - Validation Loss: 0.1043 - Validation Accuracy: 0.9524
Epoch: 39/100
100%|██████████| 42/42 [00:00<00:00, 1230.71it/s]
100%|██████████| 11/11 [00:00<00:00, 3138.17it/s]
Training Loss: 0.0788 - Training Accuracy: 0.9524 - Validation Loss: 0.1038 - Validation Accuracy: 0.9524
Epoch: 40/100
100%|██████████| 42/42 [00:00<00:00, 1428.59it/s]
100%|██████████| 11/11 [00:00<00:00, 3405.72it/s]
Training Loss: 0.0775 - Training Accuracy: 0.9524 - Validation Loss: 0.1035 - Validation Accuracy: 0.9524
Epoch: 41/100
100%|██████████| 42/42 [00:00<00:00, 1109.47it/s]
100%|██████████| 11/11 [00:00<00:00, 2639.59it/s]
Training Loss: 0.0763 - Training Accuracy: 0.9524 - Validation Loss: 0.1033 - Validation Accuracy: 0.9524
Epoch: 42/100
100%|██████████| 42/42 [00:00<00:00, 1152.96it/s]
100%|██████████| 11/11 [00:00<00:00, 1992.72it/s]
Training Loss: 0.0751 - Training Accuracy: 0.9643 - Validation Loss: 0.103 - Validation Accuracy: 0.9524
Epoch: 43/100
100%|██████████| 42/42 [00:00<00:00, 1632.75it/s]
100%|██████████| 11/11 [00:00<00:00, 3376.56it/s]
Training Loss: 0.0741 - Training Accuracy: 0.9643 - Validation Loss: 0.1027 - Validation Accuracy: 0.9524
Epoch: 44/100
100%|██████████| 42/42 [00:00<00:00, 949.39it/s]
100%|██████████| 11/11 [00:00<00:00, 3607.58it/s]
Training Loss: 0.0729 - Training Accuracy: 0.9762 - Validation Loss: 0.1025 - Validation Accuracy: 0.9524
Epoch: 45/100
100%|██████████| 42/42 [00:00<00:00, 1338.93it/s]
100%|██████████| 11/11 [00:00<00:00, 3029.57it/s]
Training Loss: 0.0721 - Training Accuracy: 0.9643 - Validation Loss: 0.1022 - Validation Accuracy: 0.9524
Epoch: 46/100
100%|██████████| 42/42 [00:00<00:00, 1768.90it/s]
100%|██████████| 11/11 [00:00<00:00, 5516.18it/s]
Training Loss: 0.0713 - Training Accuracy: 0.9762 - Validation Loss: 0.102 - Validation Accuracy: 0.9524
Epoch: 47/100
100%|██████████| 42/42 [00:00<00:00, 1429.64it/s]
100%|██████████| 11/11 [00:00<00:00, 2990.30it/s]
Training Loss: 0.0704 - Training Accuracy: 0.9762 - Validation Loss: 0.1018 - Validation Accuracy: 0.9524
Epoch: 48/100
100%|██████████| 42/42 [00:00<00:00, 1246.51it/s]
100%|██████████| 11/11 [00:00<00:00, 3700.16it/s]
Training Loss: 0.0697 - Training Accuracy: 0.9762 - Validation Loss: 0.1017 - Validation Accuracy: 0.9524
Epoch: 49/100
100%|██████████| 42/42 [00:00<00:00, 1826.67it/s]
100%|██████████| 11/11 [00:00<00:00, 3545.75it/s]
Training Loss: 0.069 - Training Accuracy: 0.9762 - Validation Loss: 0.1015 - Validation Accuracy: 0.9524
Epoch: 50/100
100%|██████████| 42/42 [00:00<00:00, 1519.86it/s]
100%|██████████| 11/11 [00:00<00:00, 2398.12it/s]
Training Loss: 0.0683 - Training Accuracy: 0.9762 - Validation Loss: 0.1014 - Validation Accuracy: 0.9524
Epoch: 51/100
100%|██████████| 42/42 [00:00<00:00, 1524.74it/s]
100%|██████████| 11/11 [00:00<00:00, 2952.79it/s]
Training Loss: 0.0676 - Training Accuracy: 0.9762 - Validation Loss: 0.1013 - Validation Accuracy: 0.9524
Epoch: 52/100
100%|██████████| 42/42 [00:00<00:00, 1692.55it/s]
100%|██████████| 11/11 [00:00<00:00, 3449.78it/s]
Training Loss: 0.0671 - Training Accuracy: 0.9762 - Validation Loss: 0.1012 - Validation Accuracy: 0.9524
Epoch: 53/100
100%|██████████| 42/42 [00:00<00:00, 1318.23it/s]
100%|██████████| 11/11 [00:00<00:00, 2431.22it/s]
Training Loss: 0.0665 - Training Accuracy: 0.9762 - Validation Loss: 0.1011 - Validation Accuracy: 0.9524
Epoch: 54/100
100%|██████████| 42/42 [00:00<00:00, 1106.04it/s]
100%|██████████| 11/11 [00:00<00:00, 4389.43it/s]
Training Loss: 0.066 - Training Accuracy: 0.9762 - Validation Loss: 0.101 - Validation Accuracy: 0.9524
Epoch: 55/100
100%|██████████| 42/42 [00:00<00:00, 1246.68it/s]
100%|██████████| 11/11 [00:00<00:00, 1540.84it/s]
Training Loss: 0.0655 - Training Accuracy: 0.9762 - Validation Loss: 0.101 - Validation Accuracy: 0.9524
Epoch: 56/100
100%|██████████| 42/42 [00:00<00:00, 1999.69it/s]
100%|██████████| 11/11 [00:00<00:00, 3671.60it/s]
Training Loss: 0.065 - Training Accuracy: 0.9762 - Validation Loss: 0.101 - Validation Accuracy: 0.9524
Epoch: 57/100
100%|██████████| 42/42 [00:00<00:00, 1114.23it/s]
100%|██████████| 11/11 [00:00<00:00, 2954.87it/s]
Training Loss: 0.0645 - Training Accuracy: 0.9762 - Validation Loss: 0.1009 - Validation Accuracy: 0.9524
Epoch: 58/100
100%|██████████| 42/42 [00:00<00:00, 1547.73it/s]
100%|██████████| 11/11 [00:00<00:00, 3512.02it/s]
Training Loss: 0.0641 - Training Accuracy: 0.9762 - Validation Loss: 0.1009 - Validation Accuracy: 0.9524
Epoch: 59/100
100%|██████████| 42/42 [00:00<00:00, 1205.51it/s]
100%|██████████| 11/11 [00:00<00:00, 1440.98it/s]
Training Loss: 0.0636 - Training Accuracy: 0.9762 - Validation Loss: 0.1009 - Validation Accuracy: 0.9524
Epoch: 60/100
100%|██████████| 42/42 [00:00<00:00, 1362.30it/s]
100%|██████████| 11/11 [00:00<00:00, 3351.06it/s]
Training Loss: 0.0632 - Training Accuracy: 0.9762 - Validation Loss: 0.1008 - Validation Accuracy: 0.9524
Epoch: 61/100
100%|██████████| 42/42 [00:00<00:00, 1317.98it/s]
100%|██████████| 11/11 [00:00<00:00, 3086.32it/s]
Training Loss: 0.0628 - Training Accuracy: 0.9762 - Validation Loss: 0.1008 - Validation Accuracy: 0.9524
Epoch: 62/100
100%|██████████| 42/42 [00:00<00:00, 1411.48it/s]
100%|██████████| 11/11 [00:00<00:00, 2451.64it/s]
Training Loss: 0.0624 - Training Accuracy: 0.9762 - Validation Loss: 0.1008 - Validation Accuracy: 0.9524
Epoch: 63/100
100%|██████████| 42/42 [00:00<00:00, 1108.47it/s]
100%|██████████| 11/11 [00:00<00:00, 2813.94it/s]
Training Loss: 0.0621 - Training Accuracy: 0.9762 - Validation Loss: 0.1007 - Validation Accuracy: 0.9524
Epoch: 64/100
100%|██████████| 42/42 [00:00<00:00, 1571.66it/s]
100%|██████████| 11/11 [00:00<00:00, 2762.88it/s]
Training Loss: 0.0617 - Training Accuracy: 0.9762 - Validation Loss: 0.1008 - Validation Accuracy: 0.9524
Epoch: 65/100
100%|██████████| 42/42 [00:00<00:00, 1709.19it/s]
100%|██████████| 11/11 [00:00<00:00, 3021.44it/s]
Training Loss: 0.0613 - Training Accuracy: 0.9762 - Validation Loss: 0.1007 - Validation Accuracy: 0.9524
Epoch: 66/100
100%|██████████| 42/42 [00:00<00:00, 2204.27it/s]
100%|██████████| 11/11 [00:00<00:00, 2985.85it/s]
Training Loss: 0.061 - Training Accuracy: 0.9762 - Validation Loss: 0.1007 - Validation Accuracy: 0.9524
Epoch: 67/100
100%|██████████| 42/42 [00:00<00:00, 1494.52it/s]
100%|██████████| 11/11 [00:00<00:00, 3684.80it/s]
Training Loss: 0.0606 - Training Accuracy: 0.9762 - Validation Loss: 0.1008 - Validation Accuracy: 0.9524
Epoch: 68/100
100%|██████████| 42/42 [00:00<00:00, 1260.67it/s]
100%|██████████| 11/11 [00:00<00:00, 3555.32it/s]
Training Loss: 0.0603 - Training Accuracy: 0.9762 - Validation Loss: 0.1008 - Validation Accuracy: 0.9524
Epoch: 69/100
100%|██████████| 42/42 [00:00<00:00, 1493.12it/s]
100%|██████████| 11/11 [00:00<00:00, 3619.47it/s]
Training Loss: 0.06 - Training Accuracy: 0.9762 - Validation Loss: 0.1008 - Validation Accuracy: 0.9524
Epoch: 70/100
100%|██████████| 42/42 [00:00<00:00, 1004.93it/s]
100%|██████████| 11/11 [00:00<00:00, 3521.93it/s]
Training Loss: 0.0597 - Training Accuracy: 0.9762 - Validation Loss: 0.1008 - Validation Accuracy: 0.9524
Epoch: 71/100
100%|██████████| 42/42 [00:00<00:00, 1323.60it/s]
100%|██████████| 11/11 [00:00<00:00, 3709.98it/s]
Training Loss: 0.0594 - Training Accuracy: 0.9762 - Validation Loss: 0.1008 - Validation Accuracy: 0.9524
Epoch: 72/100
100%|██████████| 42/42 [00:00<00:00, 1239.87it/s]
100%|██████████| 11/11 [00:00<00:00, 3717.46it/s]
Training Loss: 0.0592 - Training Accuracy: 0.9762 - Validation Loss: 0.1008 - Validation Accuracy: 0.9524
Epoch: 73/100
100%|██████████| 42/42 [00:00<00:00, 1470.56it/s]
100%|██████████| 11/11 [00:00<00:00, 3610.69it/s]
Training Loss: 0.0589 - Training Accuracy: 0.9762 - Validation Loss: 0.1009 - Validation Accuracy: 0.9524
Epoch: 74/100
100%|██████████| 42/42 [00:00<00:00, 1068.01it/s]
100%|██████████| 11/11 [00:00<00:00, 3428.76it/s]
Training Loss: 0.0586 - Training Accuracy: 0.9762 - Validation Loss: 0.1009 - Validation Accuracy: 0.9524
Epoch: 75/100
100%|██████████| 42/42 [00:00<00:00, 1231.45it/s]
100%|██████████| 11/11 [00:00<00:00, 3745.52it/s]
Training Loss: 0.0584 - Training Accuracy: 0.9762 - Validation Loss: 0.1009 - Validation Accuracy: 0.9524
Epoch: 76/100
100%|██████████| 42/42 [00:00<00:00, 990.10it/s]
100%|██████████| 11/11 [00:00<00:00, 2682.40it/s]
Training Loss: 0.0581 - Training Accuracy: 0.9762 - Validation Loss: 0.101 - Validation Accuracy: 0.9524
Epoch: 77/100
100%|██████████| 42/42 [00:00<00:00, 1286.95it/s]
100%|██████████| 11/11 [00:00<00:00, 3700.76it/s]
Training Loss: 0.0579 - Training Accuracy: 0.9762 - Validation Loss: 0.101 - Validation Accuracy: 0.9524
Epoch: 78/100
100%|██████████| 42/42 [00:00<00:00, 1294.64it/s]
100%|██████████| 11/11 [00:00<00:00, 3162.47it/s]
Training Loss: 0.0577 - Training Accuracy: 0.9762 - Validation Loss: 0.1011 - Validation Accuracy: 0.9524
Epoch: 79/100
100%|██████████| 42/42 [00:00<00:00, 1354.41it/s]
100%|██████████| 11/11 [00:00<00:00, 2436.36it/s]
Training Loss: 0.0575 - Training Accuracy: 0.9762 - Validation Loss: 0.1012 - Validation Accuracy: 0.9524
Epoch: 80/100
100%|██████████| 42/42 [00:00<00:00, 1412.87it/s]
100%|██████████| 11/11 [00:00<00:00, 3492.87it/s]
Training Loss: 0.0573 - Training Accuracy: 0.9762 - Validation Loss: 0.1012 - Validation Accuracy: 0.9524
Epoch: 81/100
100%|██████████| 42/42 [00:00<00:00, 994.74it/s]
100%|██████████| 11/11 [00:00<00:00, 3316.61it/s]
Training Loss: 0.057 - Training Accuracy: 0.9762 - Validation Loss: 0.1013 - Validation Accuracy: 0.9524
Epoch: 82/100
100%|██████████| 42/42 [00:00<00:00, 1455.02it/s]
100%|██████████| 11/11 [00:00<00:00, 2280.75it/s]
Training Loss: 0.0568 - Training Accuracy: 0.9762 - Validation Loss: 0.1014 - Validation Accuracy: 0.9524
Epoch: 83/100
100%|██████████| 42/42 [00:00<00:00, 872.16it/s]
100%|██████████| 11/11 [00:00<00:00, 1753.60it/s]
Training Loss: 0.0566 - Training Accuracy: 0.9762 - Validation Loss: 0.1014 - Validation Accuracy: 0.9524
Epoch: 84/100
100%|██████████| 42/42 [00:00<00:00, 1087.51it/s]
100%|██████████| 11/11 [00:00<00:00, 3690.99it/s]
Training Loss: 0.0564 - Training Accuracy: 0.9762 - Validation Loss: 0.1015 - Validation Accuracy: 0.9524
Epoch: 85/100
100%|██████████| 42/42 [00:00<00:00, 1272.57it/s]
100%|██████████| 11/11 [00:00<00:00, 3216.49it/s]
Training Loss: 0.0562 - Training Accuracy: 0.9762 - Validation Loss: 0.1015 - Validation Accuracy: 0.9524
Epoch: 86/100
100%|██████████| 42/42 [00:00<00:00, 1357.24it/s]
100%|██████████| 11/11 [00:00<00:00, 3529.21it/s]
Training Loss: 0.056 - Training Accuracy: 0.9762 - Validation Loss: 0.1016 - Validation Accuracy: 0.9524
Epoch: 87/100
100%|██████████| 42/42 [00:00<00:00, 1297.89it/s]
100%|██████████| 11/11 [00:00<00:00, 3681.86it/s]
Training Loss: 0.0558 - Training Accuracy: 0.9762 - Validation Loss: 0.1017 - Validation Accuracy: 0.9524
Epoch: 88/100
100%|██████████| 42/42 [00:00<00:00, 1313.88it/s]
100%|██████████| 11/11 [00:00<00:00, 3311.61it/s]
Training Loss: 0.0556 - Training Accuracy: 0.9762 - Validation Loss: 0.1017 - Validation Accuracy: 0.9524
Epoch: 89/100
100%|██████████| 42/42 [00:00<00:00, 1188.59it/s]
100%|██████████| 11/11 [00:00<00:00, 3735.21it/s]
Training Loss: 0.0555 - Training Accuracy: 0.9762 - Validation Loss: 0.1018 - Validation Accuracy: 0.9524
Epoch: 90/100
100%|██████████| 42/42 [00:00<00:00, 1236.75it/s]
100%|██████████| 11/11 [00:00<00:00, 3515.76it/s]
Training Loss: 0.0553 - Training Accuracy: 0.9762 - Validation Loss: 0.1019 - Validation Accuracy: 0.9524
Epoch: 91/100
100%|██████████| 42/42 [00:00<00:00, 1326.26it/s]
100%|██████████| 11/11 [00:00<00:00, 2466.84it/s]
Training Loss: 0.0551 - Training Accuracy: 0.9762 - Validation Loss: 0.1019 - Validation Accuracy: 0.9524
Epoch: 92/100
100%|██████████| 42/42 [00:00<00:00, 1371.21it/s]
100%|██████████| 11/11 [00:00<00:00, 3546.57it/s]
Training Loss: 0.0549 - Training Accuracy: 0.9762 - Validation Loss: 0.102 - Validation Accuracy: 0.9524
Epoch: 93/100
100%|██████████| 42/42 [00:00<00:00, 977.76it/s]
100%|██████████| 11/11 [00:00<00:00, 2847.81it/s]
Training Loss: 0.0548 - Training Accuracy: 0.9762 - Validation Loss: 0.1021 - Validation Accuracy: 0.9524
Epoch: 94/100
100%|██████████| 42/42 [00:00<00:00, 1411.37it/s]
100%|██████████| 11/11 [00:00<00:00, 2951.09it/s]
Training Loss: 0.0546 - Training Accuracy: 0.9762 - Validation Loss: 0.1022 - Validation Accuracy: 0.9524
Epoch: 95/100
100%|██████████| 42/42 [00:00<00:00, 1272.11it/s]
100%|██████████| 11/11 [00:00<00:00, 3932.27it/s]
Training Loss: 0.0544 - Training Accuracy: 0.9881 - Validation Loss: 0.1023 - Validation Accuracy: 0.9524
Epoch: 96/100
100%|██████████| 42/42 [00:00<00:00, 1125.43it/s]
100%|██████████| 11/11 [00:00<00:00, 4090.55it/s]
Training Loss: 0.0543 - Training Accuracy: 0.9881 - Validation Loss: 0.1023 - Validation Accuracy: 0.9524
Epoch: 97/100
100%|██████████| 42/42 [00:00<00:00, 1150.63it/s]
100%|██████████| 11/11 [00:00<00:00, 3517.10it/s]
Training Loss: 0.0541 - Training Accuracy: 0.9881 - Validation Loss: 0.1024 - Validation Accuracy: 0.9524
Epoch: 98/100
100%|██████████| 42/42 [00:00<00:00, 1514.18it/s]
100%|██████████| 11/11 [00:00<00:00, 3678.92it/s]
Training Loss: 0.054 - Training Accuracy: 0.9881 - Validation Loss: 0.1025 - Validation Accuracy: 0.9524
Epoch: 99/100
100%|██████████| 42/42 [00:00<00:00, 1033.43it/s]
100%|██████████| 11/11 [00:00<00:00, 2971.81it/s]
Training Loss: 0.0538 - Training Accuracy: 0.9881 - Validation Loss: 0.1026 - Validation Accuracy: 0.9524
Epoch: 100/100
100%|██████████| 42/42 [00:00<00:00, 1335.24it/s]
100%|██████████| 11/11 [00:00<00:00, 3297.17it/s]
Training Loss: 0.0537 - Training Accuracy: 0.9881 - Validation Loss: 0.1026 - Validation Accuracy: 0.9524
model.loss_plot()
model.accuracy_plot()
y_pred = model.predict(X_test_std, batch_size)
confusion_matrix(y_test, y_pred)
array([[10, 0, 0],
[ 0, 17, 0],
[ 0, 0, 18]])
Testing model.evaluate()
cost, accuracy = model.evaluate(X_test_std, y_test, batch_size)
print('\nTest Accuracy =', round(accuracy*100, 2))
100%|██████████| 23/23 [00:00<00:00, 2890.20it/s]
Test Accuracy = 100.0
Validating model using MNIST Dataset#
Check this page (link to an external website) to know more about the MNIST dataset
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1]**2)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1]**2)
X_train = X_train/255
X_test = X_test/255
utility = Utility()
# train validation split
X_train_new, X_val, y_train_new, y_val = utility.train_test_split(X_train, y_train, test_ratio=0.2, seed=42)
Y_1hot_train, _ = utility.onehot(y_train_new)
lr, epochs, batch_size = 0.8, 60, 400
input_dim = X_train_new.shape[1]
output_dim = Y_1hot_train.shape[1]
model = MLP()
model.add(Dense(neurons=240,
activation_type='tanh',
input_dim=input_dim,
seed=42,
weight_initializer_type='xavier_normal'))
model.add(Dense(neurons=200,
activation_type='relu',
seed=42,
weight_initializer_type='he_normal'))
model.add(Dense(neurons=output_dim,
activation_type='softmax',
weight_initializer_type='random_normal',
seed=42))
model.summary()
Model: MLP
--------------------------------------------------------------------------------------
Layer (type)                   Output Shape                          # of Parameters
======================================================================================
input_1 (Input)                (None, 784)                                         0
--------------------------------------------------------------------------------------
dense_1 (Dense)                (None, 240)                                    188400
--------------------------------------------------------------------------------------
dense_2 (Dense)                (None, 200)                                     48200
--------------------------------------------------------------------------------------
dense_3 (Dense)                (None, 10)                                       2010
======================================================================================
Total params: 238610
Trainable params: 238610
Non-trainable params: 0
--------------------------------------------------------------------------------------
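The parameter counts in the summary follow directly from the layer shapes: a Dense layer with n_in inputs and n_out neurons has n_in * n_out weights plus n_out biases. Verifying the numbers above:
# (inputs, neurons) for dense_1, dense_2 and dense_3 as listed in the summary
layer_shapes = [(784, 240), (240, 200), (200, 10)]
params = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
print(params)       # [188400, 48200, 2010]
print(sum(params))  # 238610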
model.compile(cost_type='cross-entropy', optimizer_type='adam')
LR_decay = LearningRateDecay()
model.fit(X_train_new, Y_1hot_train, epochs=epochs, batch_size=batch_size, lr=lr, X_val=X_val, y_val=y_val, verbose=1,
lr_decay=LR_decay.constant, lr_0=lr)
Epoch: 1/60
100%|██████████| 120/120 [00:18<00:00, 6.53it/s]
Training Loss: 3.2244 - Training Accuracy: 0.7885 - Validation Loss: 2.6541 - Validation Accuracy: 0.8525
Epoch: 2/60
100%|██████████| 120/120 [00:13<00:00, 8.92it/s]
Training Loss: 2.6662 - Training Accuracy: 0.8632 - Validation Loss: 2.6166 - Validation Accuracy: 0.8615
Epoch: 3/60
100%|██████████| 120/120 [00:08<00:00, 14.10it/s]
Training Loss: 2.6223 - Training Accuracy: 0.8752 - Validation Loss: 2.6038 - Validation Accuracy: 0.8652
Epoch: 4/60
100%|██████████| 120/120 [00:08<00:00, 14.54it/s]
Training Loss: 2.5989 - Training Accuracy: 0.8809 - Validation Loss: 2.5878 - Validation Accuracy: 0.8704
Epoch: 5/60
100%|██████████| 120/120 [00:08<00:00, 14.34it/s]
Training Loss: 2.5792 - Training Accuracy: 0.8869 - Validation Loss: 2.5699 - Validation Accuracy: 0.8772
Epoch: 6/60
100%|██████████| 120/120 [00:10<00:00, 11.97it/s]
Training Loss: 2.5673 - Training Accuracy: 0.8903 - Validation Loss: 2.5693 - Validation Accuracy: 0.8774
Epoch: 7/60
100%|██████████| 120/120 [00:08<00:00, 14.39it/s]
Training Loss: 2.5645 - Training Accuracy: 0.8905 - Validation Loss: 2.565 - Validation Accuracy: 0.8791
Epoch: 8/60
100%|██████████| 120/120 [00:08<00:00, 14.46it/s]
Training Loss: 2.5584 - Training Accuracy: 0.892 - Validation Loss: 2.5678 - Validation Accuracy: 0.8802
Epoch: 9/60
100%|██████████| 120/120 [00:08<00:00, 14.07it/s]
Training Loss: 2.548 - Training Accuracy: 0.896 - Validation Loss: 2.5799 - Validation Accuracy: 0.8772
Epoch: 10/60
100%|██████████| 120/120 [00:08<00:00, 14.20it/s]
Training Loss: 2.5463 - Training Accuracy: 0.8961 - Validation Loss: 2.5883 - Validation Accuracy: 0.8777
Epoch: 11/60
100%|██████████| 120/120 [00:08<00:00, 14.53it/s]
Training Loss: 2.5483 - Training Accuracy: 0.8957 - Validation Loss: 2.6127 - Validation Accuracy: 0.8751
Epoch: 12/60
100%|██████████| 120/120 [00:08<00:00, 13.49it/s]
Training Loss: 2.5489 - Training Accuracy: 0.8956 - Validation Loss: 2.647 - Validation Accuracy: 0.8672
Epoch: 13/60
100%|██████████| 120/120 [00:09<00:00, 12.70it/s]
Training Loss: 2.5475 - Training Accuracy: 0.8953 - Validation Loss: 2.6164 - Validation Accuracy: 0.8731
Epoch: 14/60
100%|██████████| 120/120 [00:08<00:00, 14.43it/s]
Training Loss: 2.5484 - Training Accuracy: 0.8953 - Validation Loss: 2.6148 - Validation Accuracy: 0.8747
Epoch: 15/60
100%|██████████| 120/120 [00:08<00:00, 14.26it/s]
Training Loss: 2.5459 - Training Accuracy: 0.8971 - Validation Loss: 2.597 - Validation Accuracy: 0.8794
Epoch: 16/60
100%|██████████| 120/120 [00:08<00:00, 14.22it/s]
Training Loss: 2.5445 - Training Accuracy: 0.8973 - Validation Loss: 2.6307 - Validation Accuracy: 0.8767
Epoch: 17/60
100%|██████████| 120/120 [00:08<00:00, 14.39it/s]
Training Loss: 2.5425 - Training Accuracy: 0.8978 - Validation Loss: 2.6487 - Validation Accuracy: 0.8746
Epoch: 18/60
100%|██████████| 120/120 [00:08<00:00, 14.53it/s]
Training Loss: 2.5355 - Training Accuracy: 0.9004 - Validation Loss: 2.6066 - Validation Accuracy: 0.8814
Epoch: 19/60
100%|██████████| 120/120 [00:08<00:00, 14.42it/s]
Training Loss: 0.8253 - Training Accuracy: 0.9421 - Validation Loss: 0.1725 - Validation Accuracy: 0.9575
Epoch: 20/60
100%|██████████| 120/120 [00:08<00:00, 14.28it/s]
Training Loss: 0.0544 - Training Accuracy: 0.9829 - Validation Loss: 0.141 - Validation Accuracy: 0.9666
Epoch: 21/60
100%|██████████| 120/120 [00:09<00:00, 13.10it/s]
Training Loss: 0.0344 - Training Accuracy: 0.9881 - Validation Loss: 0.1514 - Validation Accuracy: 0.9652
Epoch: 22/60
100%|██████████| 120/120 [00:15<00:00, 7.81it/s]
Training Loss: 0.0245 - Training Accuracy: 0.9919 - Validation Loss: 0.1357 - Validation Accuracy: 0.97
Epoch: 23/60
100%|██████████| 120/120 [00:09<00:00, 12.37it/s]
Training Loss: 0.0193 - Training Accuracy: 0.993 - Validation Loss: 0.129 - Validation Accuracy: 0.9733
Epoch: 24/60
100%|██████████| 120/120 [00:08<00:00, 14.42it/s]
Training Loss: 0.0146 - Training Accuracy: 0.9942 - Validation Loss: 0.1589 - Validation Accuracy: 0.9694
Epoch: 25/60
100%|██████████| 120/120 [00:08<00:00, 14.35it/s]
Training Loss: 0.0166 - Training Accuracy: 0.9943 - Validation Loss: 0.1481 - Validation Accuracy: 0.9718
Epoch: 26/60
100%|██████████| 120/120 [00:08<00:00, 14.49it/s]
Training Loss: 0.0122 - Training Accuracy: 0.9957 - Validation Loss: 0.1388 - Validation Accuracy: 0.9726
Epoch: 27/60
100%|██████████| 120/120 [00:08<00:00, 14.49it/s]
Training Loss: 0.0109 - Training Accuracy: 0.9961 - Validation Loss: 0.1529 - Validation Accuracy: 0.9698
Epoch: 28/60
100%|██████████| 120/120 [00:08<00:00, 14.54it/s]
Training Loss: 0.0084 - Training Accuracy: 0.9971 - Validation Loss: 0.1318 - Validation Accuracy: 0.9743
Epoch: 29/60
100%|██████████| 120/120 [00:08<00:00, 14.64it/s]
Training Loss: 0.0079 - Training Accuracy: 0.9971 - Validation Loss: 0.1396 - Validation Accuracy: 0.9716
Epoch: 30/60
100%|██████████| 120/120 [00:08<00:00, 14.62it/s]
Training Loss: 0.0075 - Training Accuracy: 0.9973 - Validation Loss: 0.1491 - Validation Accuracy: 0.9729
Epoch: 31/60
100%|██████████| 120/120 [00:08<00:00, 14.50it/s]
Training Loss: 0.0083 - Training Accuracy: 0.9974 - Validation Loss: 0.1701 - Validation Accuracy: 0.9685
Epoch: 32/60
100%|██████████| 120/120 [00:08<00:00, 14.45it/s]
Training Loss: 0.0072 - Training Accuracy: 0.9976 - Validation Loss: 0.1486 - Validation Accuracy: 0.9718
Epoch: 33/60
100%|██████████| 120/120 [00:08<00:00, 14.35it/s]
Training Loss: 0.0041 - Training Accuracy: 0.9988 - Validation Loss: 0.1316 - Validation Accuracy: 0.9762
Epoch: 34/60
100%|██████████| 120/120 [00:08<00:00, 14.54it/s]
Training Loss: 0.0057 - Training Accuracy: 0.998 - Validation Loss: 0.1519 - Validation Accuracy: 0.9722
Epoch: 35/60
100%|██████████| 120/120 [00:08<00:00, 14.58it/s]
Training Loss: 0.0053 - Training Accuracy: 0.9984 - Validation Loss: 0.1398 - Validation Accuracy: 0.9751
Epoch: 36/60
100%|██████████| 120/120 [00:08<00:00, 14.35it/s]
Training Loss: 0.0113 - Training Accuracy: 0.9962 - Validation Loss: 0.186 - Validation Accuracy: 0.9691
Epoch: 37/60
100%|██████████| 120/120 [00:08<00:00, 14.48it/s]
Training Loss: 0.0184 - Training Accuracy: 0.994 - Validation Loss: 0.1747 - Validation Accuracy: 0.9704
Epoch: 38/60
100%|██████████| 120/120 [00:08<00:00, 14.38it/s]
Training Loss: 0.0278 - Training Accuracy: 0.9916 - Validation Loss: 0.1833 - Validation Accuracy: 0.9685
Epoch: 39/60
100%|██████████| 120/120 [00:08<00:00, 14.47it/s]
Training Loss: 0.0188 - Training Accuracy: 0.9944 - Validation Loss: 0.1983 - Validation Accuracy: 0.9665
Epoch: 40/60
100%|██████████| 120/120 [00:08<00:00, 14.48it/s]
Training Loss: 0.0245 - Training Accuracy: 0.9926 - Validation Loss: 0.2126 - Validation Accuracy: 0.9635
Epoch: 41/60
100%|██████████| 120/120 [00:08<00:00, 14.54it/s]
Training Loss: 0.0181 - Training Accuracy: 0.9943 - Validation Loss: 0.197 - Validation Accuracy: 0.967
Epoch: 42/60
100%|██████████| 120/120 [00:11<00:00, 10.90it/s]
Training Loss: 0.0111 - Training Accuracy: 0.9961 - Validation Loss: 0.1674 - Validation Accuracy: 0.9732
Epoch: 43/60
100%|██████████| 120/120 [00:18<00:00, 6.44it/s]
Training Loss: 0.0099 - Training Accuracy: 0.9967 - Validation Loss: 0.1901 - Validation Accuracy: 0.9728
Epoch: 44/60
100%|██████████| 120/120 [00:08<00:00, 14.48it/s]
Training Loss: 0.0129 - Training Accuracy: 0.9958 - Validation Loss: 0.236 - Validation Accuracy: 0.9662
Epoch: 45/60
100%|██████████| 120/120 [00:08<00:00, 14.43it/s]
Training Loss: 0.0126 - Training Accuracy: 0.9962 - Validation Loss: 0.2029 - Validation Accuracy: 0.9702
Epoch: 46/60
100%|██████████| 120/120 [00:08<00:00, 14.14it/s]
Training Loss: 0.0179 - Training Accuracy: 0.9946 - Validation Loss: 0.1968 - Validation Accuracy: 0.9719
Epoch: 47/60
100%|██████████| 120/120 [00:08<00:00, 14.33it/s]
Training Loss: 0.016 - Training Accuracy: 0.9956 - Validation Loss: 0.2332 - Validation Accuracy: 0.9677
Epoch: 48/60
100%|██████████| 120/120 [00:08<00:00, 14.20it/s]
Training Loss: 0.0221 - Training Accuracy: 0.9936 - Validation Loss: 0.2108 - Validation Accuracy: 0.9701
Epoch: 49/60
100%|██████████| 120/120 [00:08<00:00, 13.95it/s]
Training Loss: 0.0131 - Training Accuracy: 0.9962 - Validation Loss: 0.2086 - Validation Accuracy: 0.97
Epoch: 50/60
100%|██████████| 120/120 [00:08<00:00, 14.39it/s]
Training Loss: 0.0099 - Training Accuracy: 0.9971 - Validation Loss: 0.205 - Validation Accuracy: 0.9718
Epoch: 51/60
100%|██████████| 120/120 [00:08<00:00, 14.36it/s]
Training Loss: 0.0114 - Training Accuracy: 0.9967 - Validation Loss: 0.2421 - Validation Accuracy: 0.9691
Epoch: 52/60
100%|██████████| 120/120 [00:08<00:00, 14.33it/s]
Training Loss: 0.0079 - Training Accuracy: 0.9975 - Validation Loss: 0.1884 - Validation Accuracy: 0.9738
Epoch: 53/60
100%|██████████| 120/120 [00:08<00:00, 14.31it/s]
Training Loss: 0.0074 - Training Accuracy: 0.9977 - Validation Loss: 0.1989 - Validation Accuracy: 0.9738
Epoch: 54/60
100%|██████████| 120/120 [00:09<00:00, 13.22it/s]
Training Loss: 0.0072 - Training Accuracy: 0.9977 - Validation Loss: 0.1945 - Validation Accuracy: 0.9753
Epoch: 55/60
100%|██████████| 120/120 [00:08<00:00, 13.40it/s]
Training Loss: 0.0051 - Training Accuracy: 0.9983 - Validation Loss: 0.2097 - Validation Accuracy: 0.974
Epoch: 56/60
100%|██████████| 120/120 [00:08<00:00, 14.45it/s]
Training Loss: 0.009 - Training Accuracy: 0.9975 - Validation Loss: 0.2262 - Validation Accuracy: 0.9714
Epoch: 57/60
100%|██████████| 120/120 [00:08<00:00, 14.19it/s]
Training Loss: 0.012 - Training Accuracy: 0.9964 - Validation Loss: 0.2335 - Validation Accuracy: 0.9711
Epoch: 58/60
90%|█████████ | 108/120 [00:07<00:00, 14.24it/s]
model.loss_plot()
model.accuracy_plot()
y_pred = model.predict(X_test)
confusion_matrix(y_test, y_pred)
array([[ 953, 0, 8, 1, 3, 1, 7, 2, 3, 2],
[ 0, 1123, 4, 2, 0, 0, 2, 0, 4, 0],
[ 1, 0, 1009, 5, 3, 0, 2, 6, 6, 0],
[ 1, 0, 12, 978, 0, 10, 0, 4, 3, 2],
[ 1, 2, 2, 0, 955, 1, 4, 6, 0, 11],
[ 1, 1, 0, 13, 2, 859, 6, 2, 6, 2],
[ 3, 3, 1, 0, 7, 6, 936, 0, 2, 0],
[ 1, 7, 14, 3, 5, 0, 0, 991, 3, 4],
[ 5, 2, 7, 10, 5, 4, 7, 6, 922, 6],
[ 2, 5, 1, 6, 14, 6, 0, 8, 5, 962]])
acc = accuracy_score(y_test, y_pred)
print('Error Rate =',round((1-acc)*100, 2))
print('Accuracy =',round((acc)*100, 2))
Error Rate = 3.12
Accuracy = 96.88
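Besides the overall accuracy, the confusion matrix also gives a per-class picture: the recall for each digit is the diagonal entry of its row divided by the row sum. A minimal sketch (recomputing the matrix shown above):
import numpy as np
cm = confusion_matrix(y_test, y_pred)
# per-digit recall: correct predictions for a digit / number of test samples of that digit
per_class_recall = np.diag(cm) / cm.sum(axis=1)
print(np.round(per_class_recall, 4))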
Validating model using MNIST Dataset + Batch Normalization and Dropout#
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1]**2)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1]**2)
X_train = X_train/255
X_test = X_test/255
utility = Utility()
# train validation split
X_train_new, X_val, y_train_new, y_val = utility.train_test_split(X_train, y_train, test_ratio=0.2, seed=42)
Y_1hot_train, _ = utility.onehot(y_train_new)
lr, epochs, batch_size = 0.8, 40, 200
input_dim = X_train_new.shape[1]
output_dim = Y_1hot_train.shape[1]
model = MLP()
model.add(Dense(neurons=240,
activation_type='relu',
input_dim=input_dim,
seed=42,
weight_initializer_type='xavier_normal'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Dense(neurons=output_dim,
activation_type='softmax',
weight_initializer_type='he_normal',
seed=42))
model.compile(cost_type='cross-entropy', optimizer_type='adam')
model.summary()
Model: MLP
--------------------------------------------------------------------------------------
Layer (type)                                 Output Shape            # of Parameters
======================================================================================
input_1 (Input)                              (None, 784)                           0
--------------------------------------------------------------------------------------
dense_1 (Dense)                              (None, 240)                      188400
--------------------------------------------------------------------------------------
dropout_1 (Dropout)                          (None, 240)                           0
--------------------------------------------------------------------------------------
batchnormalization_1 (BatchNormalization)    (None, 240)                         960
--------------------------------------------------------------------------------------
dense_2 (Dense)                              (None, 10)                         2410
======================================================================================
Total params: 191770
Trainable params: 191290
Non-trainable params: 480
--------------------------------------------------------------------------------------
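Here Dropout contributes no parameters, while the BatchNormalization layer adds four vectors of length 240. Assuming it follows the usual convention (gamma and beta are trainable, the running mean and variance are not), that is 960 parameters of which 480 are non-trainable, which matches the summary above:
neurons = 240
dense_1 = 784 * neurons + neurons   # 188400
dropout_1 = 0                       # Dropout has no learnable parameters
batchnorm_1 = 4 * neurons           # 960: gamma, beta, running mean, running variance
dense_2 = neurons * 10 + 10         # 2410
print(dense_1 + dropout_1 + batchnorm_1 + dense_2)  # 191770 total
print(2 * neurons)                                   # 480 non-trainable (running statistics)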
model.fit(X_train_new, Y_1hot_train, epochs=epochs, batch_size=batch_size, lr=lr, X_val=X_val, y_val=y_val, verbose=1)
Epoch: 1/40
100%|██████████| 240/240 [00:06<00:00, 35.19it/s]
Training Loss: 0.2963 - Training Accuracy: 0.9108 - Validation Loss: 0.1905 - Validation Accuracy: 0.9443
Epoch: 2/40
100%|██████████| 240/240 [00:07<00:00, 33.34it/s]
Training Loss: 0.1622 - Training Accuracy: 0.9502 - Validation Loss: 0.1609 - Validation Accuracy: 0.9518
Epoch: 3/40
100%|██████████| 240/240 [00:06<00:00, 37.13it/s]
Training Loss: 0.1301 - Training Accuracy: 0.961 - Validation Loss: 0.151 - Validation Accuracy: 0.9538
Epoch: 4/40
100%|██████████| 240/240 [00:06<00:00, 38.86it/s]
Training Loss: 0.1099 - Training Accuracy: 0.9656 - Validation Loss: 0.1424 - Validation Accuracy: 0.9544
Epoch: 5/40
100%|██████████| 240/240 [00:06<00:00, 38.94it/s]
Training Loss: 0.0997 - Training Accuracy: 0.9688 - Validation Loss: 0.1361 - Validation Accuracy: 0.9593
Epoch: 6/40
100%|██████████| 240/240 [00:06<00:00, 39.08it/s]
Training Loss: 0.092 - Training Accuracy: 0.9702 - Validation Loss: 0.1306 - Validation Accuracy: 0.962
Epoch: 7/40
100%|██████████| 240/240 [00:06<00:00, 39.14it/s]
Training Loss: 0.0805 - Training Accuracy: 0.9739 - Validation Loss: 0.1296 - Validation Accuracy: 0.9612
Epoch: 8/40
100%|██████████| 240/240 [00:06<00:00, 38.87it/s]
Training Loss: 0.077 - Training Accuracy: 0.9751 - Validation Loss: 0.1239 - Validation Accuracy: 0.9637
Epoch: 9/40
100%|██████████| 240/240 [00:06<00:00, 38.91it/s]
Training Loss: 0.0714 - Training Accuracy: 0.9758 - Validation Loss: 0.1237 - Validation Accuracy: 0.9642
Epoch: 10/40
100%|██████████| 240/240 [00:06<00:00, 39.13it/s]
Training Loss: 0.0642 - Training Accuracy: 0.9793 - Validation Loss: 0.1126 - Validation Accuracy: 0.9686
Epoch: 11/40
100%|██████████| 240/240 [00:11<00:00, 20.89it/s]
Training Loss: 0.061 - Training Accuracy: 0.9797 - Validation Loss: 0.1277 - Validation Accuracy: 0.9646
Epoch: 12/40
100%|██████████| 240/240 [00:14<00:00, 16.65it/s]
Training Loss: 0.06 - Training Accuracy: 0.9799 - Validation Loss: 0.1252 - Validation Accuracy: 0.9652
Epoch: 13/40
100%|██████████| 240/240 [00:07<00:00, 32.92it/s]
Training Loss: 0.0572 - Training Accuracy: 0.981 - Validation Loss: 0.1318 - Validation Accuracy: 0.9641
Epoch: 14/40
100%|██████████| 240/240 [00:06<00:00, 36.05it/s]
Training Loss: 0.0534 - Training Accuracy: 0.9821 - Validation Loss: 0.1378 - Validation Accuracy: 0.9632
Epoch: 15/40
100%|██████████| 240/240 [00:06<00:00, 39.32it/s]
Training Loss: 0.0502 - Training Accuracy: 0.9828 - Validation Loss: 0.1265 - Validation Accuracy: 0.9679
Epoch: 16/40
100%|██████████| 240/240 [00:06<00:00, 39.37it/s]
Training Loss: 0.0535 - Training Accuracy: 0.982 - Validation Loss: 0.1231 - Validation Accuracy: 0.9654
Epoch: 17/40
100%|██████████| 240/240 [00:06<00:00, 39.31it/s]
Training Loss: 0.0494 - Training Accuracy: 0.9844 - Validation Loss: 0.1314 - Validation Accuracy: 0.966
Epoch: 18/40
100%|██████████| 240/240 [00:06<00:00, 39.67it/s]
Training Loss: 0.0437 - Training Accuracy: 0.9852 - Validation Loss: 0.1312 - Validation Accuracy: 0.9669
Epoch: 19/40
100%|██████████| 240/240 [00:06<00:00, 39.08it/s]
Training Loss: 0.044 - Training Accuracy: 0.9854 - Validation Loss: 0.1325 - Validation Accuracy: 0.9654
Epoch: 20/40
100%|██████████| 240/240 [00:06<00:00, 38.29it/s]
Training Loss: 0.0433 - Training Accuracy: 0.9848 - Validation Loss: 0.1363 - Validation Accuracy: 0.9644
Epoch: 21/40
100%|██████████| 240/240 [00:06<00:00, 39.02it/s]
Training Loss: 0.0416 - Training Accuracy: 0.9857 - Validation Loss: 0.1288 - Validation Accuracy: 0.9688
Epoch: 22/40
100%|██████████| 240/240 [00:06<00:00, 39.07it/s]
Training Loss: 0.04 - Training Accuracy: 0.9857 - Validation Loss: 0.1358 - Validation Accuracy: 0.9673
Epoch: 23/40
100%|██████████| 240/240 [00:06<00:00, 38.89it/s]
Training Loss: 0.0387 - Training Accuracy: 0.9871 - Validation Loss: 0.1295 - Validation Accuracy: 0.9668
Epoch: 24/40
100%|██████████| 240/240 [00:06<00:00, 38.62it/s]
Training Loss: 0.04 - Training Accuracy: 0.9861 - Validation Loss: 0.1374 - Validation Accuracy: 0.9696
Epoch: 25/40
100%|██████████| 240/240 [00:07<00:00, 30.90it/s]
Training Loss: 0.0401 - Training Accuracy: 0.9864 - Validation Loss: 0.133 - Validation Accuracy: 0.9658
Epoch: 26/40
100%|██████████| 240/240 [00:06<00:00, 39.35it/s]
Training Loss: 0.0363 - Training Accuracy: 0.9875 - Validation Loss: 0.1319 - Validation Accuracy: 0.9663
Epoch: 27/40
100%|██████████| 240/240 [00:06<00:00, 35.62it/s]
Training Loss: 0.0363 - Training Accuracy: 0.9877 - Validation Loss: 0.1425 - Validation Accuracy: 0.9652
Epoch: 28/40
100%|██████████| 240/240 [00:06<00:00, 37.87it/s]
Training Loss: 0.0345 - Training Accuracy: 0.9882 - Validation Loss: 0.1369 - Validation Accuracy: 0.9686
Epoch: 29/40
100%|██████████| 240/240 [00:06<00:00, 35.04it/s]
Training Loss: 0.0371 - Training Accuracy: 0.987 - Validation Loss: 0.1473 - Validation Accuracy: 0.966
Epoch: 30/40
100%|██████████| 240/240 [00:06<00:00, 35.36it/s]
Training Loss: 0.0324 - Training Accuracy: 0.989 - Validation Loss: 0.1336 - Validation Accuracy: 0.9688
Epoch: 31/40
100%|██████████| 240/240 [00:07<00:00, 30.24it/s]
Training Loss: 0.0313 - Training Accuracy: 0.9888 - Validation Loss: 0.1321 - Validation Accuracy: 0.9681
Epoch: 32/40
100%|██████████| 240/240 [00:06<00:00, 34.49it/s]
Training Loss: 0.0318 - Training Accuracy: 0.9894 - Validation Loss: 0.135 - Validation Accuracy: 0.9691
Epoch: 33/40
100%|██████████| 240/240 [00:07<00:00, 33.35it/s]
Training Loss: 0.0338 - Training Accuracy: 0.9889 - Validation Loss: 0.136 - Validation Accuracy: 0.97
Epoch: 34/40
100%|██████████| 240/240 [00:07<00:00, 32.97it/s]
Training Loss: 0.0299 - Training Accuracy: 0.99 - Validation Loss: 0.133 - Validation Accuracy: 0.9701
Epoch: 35/40
100%|██████████| 240/240 [00:07<00:00, 33.01it/s]
Training Loss: 0.031 - Training Accuracy: 0.9894 - Validation Loss: 0.1441 - Validation Accuracy: 0.9686
Epoch: 36/40
100%|██████████| 240/240 [00:07<00:00, 32.94it/s]
Training Loss: 0.0294 - Training Accuracy: 0.99 - Validation Loss: 0.1381 - Validation Accuracy: 0.9689
Epoch: 37/40
100%|██████████| 240/240 [00:07<00:00, 32.58it/s]
Training Loss: 0.0267 - Training Accuracy: 0.9907 - Validation Loss: 0.1326 - Validation Accuracy: 0.9683
Epoch: 38/40
100%|██████████| 240/240 [00:08<00:00, 29.64it/s]
Training Loss: 0.0308 - Training Accuracy: 0.9892 - Validation Loss: 0.1381 - Validation Accuracy: 0.967
Epoch: 39/40
100%|██████████| 240/240 [00:09<00:00, 26.12it/s]
Training Loss: 0.0299 - Training Accuracy: 0.9901 - Validation Loss: 0.1434 - Validation Accuracy: 0.9682
Epoch: 40/40
100%|██████████| 240/240 [00:07<00:00, 31.81it/s]
Training Loss: 0.0303 - Training Accuracy: 0.9899 - Validation Loss: 0.1342 - Validation Accuracy: 0.9684
model.loss_plot()
model.accuracy_plot()
y_pred = model.predict(X_test)
confusion_matrix(y_test, y_pred)
array([[ 962, 0, 3, 1, 0, 2, 7, 2, 2, 1],
[ 0, 1120, 4, 1, 0, 0, 4, 0, 6, 0],
[ 4, 0, 1010, 3, 2, 0, 2, 6, 5, 0],
[ 1, 2, 8, 977, 0, 9, 0, 7, 2, 4],
[ 1, 2, 3, 0, 952, 1, 3, 5, 1, 14],
[ 1, 1, 1, 8, 3, 867, 6, 1, 3, 1],
[ 5, 2, 5, 0, 6, 4, 931, 0, 5, 0],
[ 3, 6, 16, 4, 2, 0, 1, 987, 0, 9],
[ 3, 1, 10, 15, 5, 6, 4, 3, 920, 7],
[ 2, 3, 0, 8, 11, 7, 1, 8, 9, 960]])
acc = accuracy_score(y_test, y_pred)
print('Error Rate =',round((1-acc)*100, 2))
print('Accuracy =',round((acc)*100, 2))
Error Rate = 3.14
Accuracy = 96.86