Showing posts with label Deep Learning - Convolutional Neural Network.

Saturday, March 11, 2023

TensorFlow general methods

# Method to get the shape of a TensorFlow tensor

# Method to apply a function to every element of a tensor
# Defining variables and constants in TensorFlow for a matrix of elements

# Casting and activation functions
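A minimal sketch of these operations, assuming TensorFlow 2.x (the tensor values here are just illustrative):

```python
import tensorflow as tf

# get the shape of a tensor
t = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(t.shape)        # static shape: (2, 3)
print(tf.shape(t))    # dynamic shape, as a tensor

# apply a function to every element (map_fn maps over axis 0)
squared = tf.map_fn(lambda x: x * x, t)

# variables (trainable) vs constants (immutable) for a matrix of elements
w = tf.Variable(tf.ones((2, 3)))
c = tf.constant([[1, 2, 3], [4, 5, 6]])

# casting and activation functions
c_float = tf.cast(c, tf.float32)
activated = tf.nn.relu(t - 3.0)   # negative entries become 0
```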








Saturday, March 4, 2023

Dropout

DROPOUT is a widely used regularization technique that is specific to deep learning. 
It randomly shuts down some neurons in each iteration: 
at each iteration, you shut down (= set to zero) each neuron of a layer with probability 1 − keep_prob, or keep it with probability keep_prob. 
The dropped neurons contribute to neither the forward nor the backward propagation of that iteration.

When you shut some neurons down, you actually modify your model. The idea behind dropout is that at each iteration, you train a different model that uses only a subset of your neurons. With dropout, your neurons thus become less sensitive to the activation of any one other specific neuron, because that other neuron might be shut down at any time.

  • Dropout is a regularization technique.
  • You only use dropout during training. Don't use dropout (randomly eliminate nodes) during test time.
  • Apply dropout both during forward and backward propagation.
  • During training time, divide the output of each dropout layer by keep_prob to keep the same expected value for the activations. For example, if keep_prob is 0.5, then on average we shut down half the nodes, so the output is scaled by 0.5 since only the remaining half contribute to the solution. Dividing by 0.5 is equivalent to multiplying by 2, so the output now has the same expected value. You can check that this works even when keep_prob takes values other than 0.5.
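The recipe above is often called "inverted dropout". A sketch with NumPy (the array values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.5

# activations of a hidden layer (5 neurons, 4 examples) -- made-up values
A = rng.random((5, 4))

# forward pass: mask, shut down, then rescale by keep_prob
D = rng.random(A.shape) < keep_prob   # keep a neuron with probability keep_prob
A_drop = A * D                        # shut down the masked-out neurons
A_drop = A_drop / keep_prob           # scale up to preserve the expected value

# backward pass: apply the SAME mask to the gradient, and rescale the same way
dA = rng.random(A.shape)
dA_drop = (dA * D) / keep_prob
```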

Wednesday, March 1, 2023

Deep Learning methodology using gradient descent

Usual Deep Learning methodology to build the model:

  1. Initialize parameters / define hyperparameters
  2. Loop for num_iterations:
     a. Forward propagation
     b. Compute the cost function
     c. Backward propagation
     d. Update parameters (using the parameters and the grads from backprop)
  3. Use the trained parameters to predict labels
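As a sketch, the steps above applied to a minimal logistic-regression model in NumPy (the toy data and hyperparameters are made up for illustration, not taken from any particular model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: 2 features, 4 examples; the label is simply the first feature
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]).T  # shape (2, 4)
Y = np.array([[0, 0, 1, 1]])                                      # shape (1, 4)

# 1. initialize parameters / define hyperparameters
w = np.zeros((2, 1))
b = 0.0
learning_rate = 0.5
num_iterations = 1000
m = X.shape[1]

# 2. loop for num_iterations
for i in range(num_iterations):
    # a. forward propagation
    A = sigmoid(w.T @ X + b)
    # b. compute cost (binary cross-entropy)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    # c. backward propagation
    dZ = A - Y
    dw = (X @ dZ.T) / m
    db = np.sum(dZ) / m
    # d. update parameters
    w -= learning_rate * dw
    b -= learning_rate * db

# 3. use trained parameters to predict labels
predictions = (sigmoid(w.T @ X + b) > 0.5).astype(int)
```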

Sunday, December 18, 2022

Split a data set into train, cross validation and test sets

print(f"the shape of the original set (input) is: {x.shape}")

print(f"the shape of the original set (target) is: {y.shape}\n")


from sklearn.model_selection import train_test_split

# Get 60% of the dataset as the training set. Put the remaining 40% in temporary variables.
x_train, x_, y_train, y_ = train_test_split(x, y, test_size=0.40, random_state=1)

# Split the 40% subset above into two: one half for cross validation and the other for the test set
x_cv, x_test, y_cv, y_test = train_test_split(x_, y_, test_size=0.50, random_state=1)

# Delete temporary variables
del x_, y_

print(f"the shape of the training set (input) is: {x_train.shape}")
print(f"the shape of the training set (target) is: {y_train.shape}\n")
print(f"the shape of the cross validation set (input) is: {x_cv.shape}")
print(f"the shape of the cross validation set (target) is: {y_cv.shape}\n")
print(f"the shape of the test set (input) is: {x_test.shape}")
print(f"the shape of the test set (target) is: {y_test.shape}")





Tuesday, December 13, 2022

Epochs and batches

We provide the number of epochs while fitting/training the model, as below.

Example: model.fit(X, Y, epochs=100)


In the fit statement above, the number of epochs was set to 100. This specifies that the entire data set should be applied during training 100 times. During training, you see output describing the progress of training that looks like this:

Epoch 1/100
157/157 [==============================] - 0s 1ms/step - loss: 2.2770

The first line, Epoch 1/100, shows which epoch the model is currently running. For efficiency, the training data set is broken into 'batches'; the default batch size in TensorFlow is 32. A model with 5000 training examples (X_train) is therefore split into roughly 157 batches. The notation on the second line, 157/157 [====, shows which batch has been executed.
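The batch count on the progress line is simply the number of examples divided by the batch size, rounded up:

```python
import math

num_examples = 5000   # size of X_train
batch_size = 32       # TensorFlow's default batch size

num_batches = math.ceil(num_examples / batch_size)
print(num_batches)    # 157
```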

Loss (cost)

Ideally, the cost will decrease as the number of iterations of the algorithm increases. TensorFlow refers to the cost as loss.

Monday, December 12, 2022

Derivative using python

Libraries for derivative

 from sympy import symbols, diff

Let's try this out on a simple function, say J(w) = w²; its derivative is dJ/dw = 2w.
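A sketch with sympy, using the simple function above:

```python
from sympy import symbols, diff

# define a symbolic variable and a function of it
w = symbols('w')
J = w**2

# differentiate J with respect to w
dJ_dw = diff(J, w)
print(dJ_dw)             # 2*w

# evaluate the derivative at a point, e.g. w = 3
print(dJ_dw.subs(w, 3))  # 6
```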


Sunday, December 11, 2022

SparseCategoricalCrossentropy or CategoricalCrossentropy

 

Tensorflow has two potential formats for target values and the selection of the loss defines which is expected.

  • SparseCategoricalCrossentropy: expects the target to be an integer corresponding to the class index. For example, if there are 10 potential target values, y would be between 0 and 9.
  • CategoricalCrossentropy: expects the target value of an example to be one-hot encoded, where the value at the target index is 1 while the other N-1 entries are zero. For an example with 10 potential target values where the target is 2, this would be [0,0,1,0,0,0,0,0,0,0].
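The two losses compute the same cross-entropy; only the label format differs. A quick check in plain Python (the probability values are made up for illustration):

```python
import math

# predicted probabilities for 10 classes (made-up values that sum to 1)
p = [0.01, 0.02, 0.80, 0.02, 0.03, 0.02, 0.04, 0.02, 0.02, 0.02]

# sparse format: the target is the integer class index
y_sparse = 2
sparse_loss = -math.log(p[y_sparse])

# categorical format: the target is one-hot encoded
y_onehot = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
categorical_loss = -sum(t * math.log(q) for t, q in zip(y_onehot, p))

# identical loss either way
print(abs(sparse_loss - categorical_loss) < 1e-12)  # True
```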

Friday, December 9, 2022

Get the output of each layer in a neural network

Let's consider the following simple neural network:

import keras.backend as K

from keras.models import Model
from keras.layers import Input, Dense

input_layer = Input((10,))

layer_1 = Dense(10)(input_layer)
layer_2 = Dense(20)(layer_1)
layer_3 = Dense(5)(layer_2)

output_layer = Dense(1)(layer_3)

model = Model(inputs=input_layer, outputs=output_layer)
# some random input
import numpy as np
features = np.random.rand(100,10)
# assume this model has been trained

# With a Keras backend function, get the outputs of all the layers
get_all_layer_outputs = K.function([model.layers[0].input],
                                   [l.output for l in model.layers])

layer_output = get_all_layer_outputs([features])

# layer_output is a list of every layer's output
# If the model is trained, the outputs are computed with the trained weights;
# otherwise they are computed with the initial (random) weights.

Wednesday, December 7, 2022

Output layer of a neural network for regression and classification problems

Regression output layer:

When developing a neural network to solve a regression problem, the output layer should have exactly one node. Here we are not trying to map inputs to a variety of class labels, but rather trying to predict a single continuous target value for each sample. Therefore, our network should have one output node to return one – and exactly one – output prediction for each sample in our dataset.

The activation function for a regression problem will be linear. This can be defined by using activation = ‘linear’ or leaving it unspecified to employ the default parameter value activation = None

Linear activation function: The linear activation function, also known as "no activation" or "identity function" (multiplied by 1.0), is where the activation is proportional to the input. The function doesn't do anything to the weighted sum of the input; it simply spits out the value it was given.

Evaluation metrics for regression: Mostly use MSE loss function and other available options as below.

  • Root Mean Squared Error (RMSE) – a good option if you’d like the error to be in the same units as the target variable
  • Mean Absolute Error (MAE) – useful for when you need an error that scales linearly
  • Median Absolute Error (MdAE) – de-emphasizes outliers

Classification output layer:

If your data has a target that resides in a single vector of 0s and 1s, your neural network needs a single output node, and the activation function on the final layer should be sigmoid. On the other hand, if your target is a matrix of one-hot-encoded vectors, your output layer should have one node per class (2 for binary classification), and the activation function on the final layer should be softmax. For binary classification, the last layer is usually logistic regression (a single sigmoid node) deciding the class output.

Example: if Y has the category values (Yes, No), one-hot encoding gives 2 columns, encoded_yes and encoded_no. In that case we need 2 neurons in the output layer, one per output column.

Evaluation metrics for Classification:

The loss function used for binary classification problems is determined by the data format as well. When dealing with a single target vector of 0s and 1s, you should use BinaryCrossentropy as the loss function. When your target variable is stored as One-Hot-Encoded values, you should use the CategoricalCrossentropy loss function.
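The two binary setups are mathematically equivalent: a 2-node softmax over logits [z, 0] gives the same probability as a single sigmoid node with logit z. A quick check in plain Python (the logit value is arbitrary):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

z = 1.3  # an arbitrary logit
p_sigmoid = sigmoid(z)            # single-node output (BinaryCrossentropy setup)
p_softmax = softmax([z, 0.0])[0]  # 2-node one-hot output (CategoricalCrossentropy setup)

print(abs(p_sigmoid - p_softmax) < 1e-12)  # True
```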

Reference:

https://www.enthought.com/blog/neural-network-output-layer/

Friday, December 2, 2022

Training a sine wave with a feed-forward neural network

Let's create some sample sine-wave data and add some noise to it.

import numpy as np
import matplotlib.pyplot as plt
import math
from sklearn.utils import shuffle

# let's take 5000 points
n=5000

# let's hold out 20% of the data for testing
test_per=0.2

# let's hold out 20% of the data for validation
val_per=0.2

# generate 5000 points over 2*pi (one full cycle) of a sine wave
x=np.random.uniform(low=0,high=2*math.pi,size=n)
y=np.sin(x)+0.1*np.random.randn(n)

# shuffle the data set so the train, test and validation splits each get a variety of points
x,y=shuffle(x,y)
test_num=int(test_per*n)
val_num=test_num+int(val_per*n)
x_test,x_val,x_train=np.split(x,[test_num,val_num])
y_test,y_val,y_train=np.split(y,[test_num,val_num])

#lets plot train, test and validation data to understand the data size
plt.plot(x_train,y_train,"r.",label="train")
plt.plot(x_test,y_test,"b.",label="test")
plt.plot(x_val,y_val,"g.",label="val")

plt.show()



import numpy as np
from keras.models import Sequential
from keras.layers import Dense

#lets train the model with feed forward neural network

model = Sequential()

# single hidden layer with 40 neurons; the same could be built with
# multiple smaller hidden layers (e.g. 16 neurons each)
model.add(Dense(40, input_dim=1, activation='sigmoid'))
model.add(Dense(1))
model.compile(loss='mae',
              optimizer='adam',
              metrics=['mae'])
model.fit(x_train, y_train,batch_size=100, epochs=800)
scores = model.evaluate(x_val, y_val)

# let's print the validation MAE of the model
print("\n%s: %.4f" % (model.metrics_names[1], scores[1]))

#lets predict the model with test data
y_pred=model.predict(x_test)

#lets plot the actual vs predicted test values
plt.scatter(x_test,y_test,marker=".",c="r")
plt.scatter(x_test,y_pred,marker=".",c='b')
plt.show()


The neural network trained well, and the actual vs. predicted values almost overlap.

But wait a minute: let's add some future values and test again.

# let's add the next cycle's data points, with some noise
x_extra=np.random.uniform(low=2*math.pi,high=4*math.pi,size=int(n/8))
y_extra=np.sin(x_extra)+0.1*np.random.randn(int(n/8))
# let's append the next-cycle points to the existing test data
x_future=np.append(x_test,x_extra)
y_future=np.append(y_test,y_extra)
#lets predict over all values
y_pred=model.predict(x_future)
#plot actual vs predict
plt.scatter(x_future,y_future,marker=".",c="r")
plt.scatter(x_future,y_pred,marker=".",c='b')
plt.show()


Oops! The neural network learnt only the range of data it was trained on and failed to predict future data.
Don't worry: we can handle this with algorithms that learn the pattern of a sequence,
for example RNNs and LSTMs. We will cover that in the next blog.



Information: One can tune a neural network to any number of hidden layers and any number of
neurons per hidden layer. More layers/neurons give the network more capacity to fit complex
functions, but they also make each training step more expensive and increase the risk of overfitting.

Sunday, November 18, 2018

Training a Convolutional Neural Network to detect Digits


import joblib
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
import numpy as np
import pandas as pd
import os

#changes working directory
os.chdir("D:\\Kartheek\\DigitRecog")

(X_train,y_train),(X_test,y_test)=mnist.load_data()
x_train=X_train.reshape(X_train.shape[0],28,28,1).astype('float32')
x_test=X_test.reshape(X_test.shape[0],28,28,1).astype('float32')
print(X_train.shape)
print(X_test.shape)
#os.chdir("D:\\NeuralNetwork")
batch_size = 132
num_classes = 10
epochs = 16

# input image dimensions
img_rows, img_cols = 28, 28

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(64, kernel_size=(5, 5),
                 activation='relu',
                 input_shape=(28,28,1)))
model.add(Conv2D(128, (5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(num_classes, activation='softmax'))


model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])


Images = pd.read_csv("test.csv")
Images = Images.values
Images.shape
x_predict=Images.reshape(28000,28,28,1)
y_predict=model.predict(x_predict)
Submission = pd.read_csv("submission.csv")
z=y_predict.argmax(axis=1)

Submission["Label"]=z
Submission.to_csv("submission.csv",index=False)

joblib.dump(model, "digits_cls.pkl", compress=3)