print(f"the shape of the original set (input) is: {x.shape}")
print(f"the shape of the original set (target) is: {y.shape}\n")
print(f"the shape of the original set (input) is: {x.shape}")
print(f"the shape of the original set (target) is: {y.shape}\n")
We provide the number of epochs while fitting/training the model, as below.
Example: model.fit(X, Y, epochs=100)
In the fit statement above, the number of epochs was set to 100. This specifies that the entire data set should be passed through the model 100 times during training. During training, you see output describing the progress of training that looks like this:
Epoch 1/100
157/157 [==============================] - 0s 1ms/step - loss: 2.2770
The first line, Epoch 1/100, describes which epoch the model is currently running. For efficiency, the training data set is broken into 'batches'. The default batch size in TensorFlow is 32. If a model has 5000 training examples (X_train), this works out to roughly 157 batches (5000 / 32 = 156.25, rounded up to 157).
The notation on the second line, 157/157 [====, describes which batch has just been executed.
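As a quick check of where the 157 comes from (a sketch, assuming 5000 examples and the default batch size of 32):
import math
m = 5000          # number of training examples in X_train
batch_size = 32   # TensorFlow/Keras default batch size
num_batches = math.ceil(m / batch_size)
print(num_batches)  # 157, matching the 157/157 in the progress bar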
Library for derivatives:
from sympy import symbols, diff
Let's try this out and look at the derivative of a simple function.
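As an illustrative sketch (the function J(w) = w**2 here is a hypothetical example, not one taken from these notes):
from sympy import symbols, diff
w = symbols('w')
J = w**2                 # hypothetical example function
dJ_dw = diff(J, w)       # symbolic derivative
print(dJ_dw)             # 2*w
print(dJ_dw.subs(w, 3))  # evaluate the derivative at w = 3 -> 6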
Let's consider the following simple neural network:
import keras.backend as K
from keras.models import Model
from keras.layers import Input, Dense
input_layer = Input((10,))
layer_1 = Dense(10)(input_layer)
layer_2 = Dense(20)(layer_1)
layer_3 = Dense(5)(layer_2)
output_layer = Dense(1)(layer_3)
model = Model(inputs=input_layer, outputs=output_layer)
# some random input
import numpy as np
features = np.random.rand(100,10)
and assume this model has been trained.
# With a Keras function, get the outputs of all the layers
get_all_layer_outputs = K.function([model.layers[0].input],
                                   [l.output for l in model.layers])
layer_output = get_all_layer_outputs([features])
# layer_output is a list containing the output of every layer.
# If the model is trained, the outputs are computed with the trained weights;
# otherwise they are computed with the initial (random) weights.
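For instance, you can inspect the shape of each layer's output (a small usage sketch):
for i, out in enumerate(layer_output):
    print(f"layer {i} output shape: {out.shape}")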
Regression output layer:
When developing a neural network to solve a regression problem, the output layer should have exactly one node. Here we are not trying to map inputs to a variety of class labels, but rather trying to predict a single continuous target value for each sample. Therefore, our network should have one output node to return one – and exactly one – output prediction for each sample in our dataset.
The activation function for a regression problem will be linear. This can be defined by using activation='linear' or by leaving it unspecified to employ the default parameter value activation=None.
Linear activation function: the linear activation function, also known as "no activation" or the "identity function" (multiplied by 1.0), is where the activation is proportional to the input. The function doesn't do anything to the weighted sum of the input; it simply spits out the value it was given.
Evaluation metrics for regression: MSE is the most commonly used loss function; other options, such as MAE, are also available. See the sketch below.
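A minimal sketch of a regression network, assuming 10 input features (the layer sizes here are illustrative):
from keras.models import Sequential
from keras.layers import Dense
reg_model = Sequential([
    Dense(16, activation='relu', input_shape=(10,)),
    Dense(1, activation='linear')   # exactly one output node, linear activation
])
# MSE is the usual loss for regression; MAE is another common option
reg_model.compile(optimizer='adam', loss='mse', metrics=['mae'])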
Classification output layer:
If your target resides in a single vector of 0s and 1s, the number of output nodes in your neural network will be 1, and the activation function used on the final layer should be sigmoid. On the other hand, if your target is a matrix of one-hot-encoded vectors, your output layer should have one node per class (2 in the binary example below), and the activation function on the final layer should be softmax. For binary classification, the last layer is usually equivalent to logistic regression (a single node with sigmoid activation) deciding the class output.
Example: if Y has category values of (Yes, No), then one-hot encoding gives 2 columns, encoded_yes and encoded_no. In this case we need 2 neurons in the output layer, one for each output column.
Evaluation metrics for classification:
The loss function used for binary classification problems is determined by the data format as well. When dealing with a single target vector of 0s and 1s, you should use BinaryCrossentropy as the loss function. When your target variable is stored as One-Hot-Encoded values, you should use the CategoricalCrossentropy loss function.
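A sketch contrasting the two setups (the hidden layer sizes are illustrative assumptions):
from keras.models import Sequential
from keras.layers import Dense
# Target is a single vector of 0s and 1s
binary_model = Sequential([
    Dense(16, activation='relu', input_shape=(10,)),
    Dense(1, activation='sigmoid')   # single node, sigmoid
])
binary_model.compile(optimizer='adam', loss='binary_crossentropy')
# Target is one-hot encoded (e.g. encoded_yes, encoded_no)
onehot_model = Sequential([
    Dense(16, activation='relu', input_shape=(10,)),
    Dense(2, activation='softmax')   # one node per class, softmax
])
onehot_model.compile(optimizer='adam', loss='categorical_crossentropy')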
Reference:
https://www.enthought.com/blog/neural-network-output-layer/
Options:
1. Collect more data.
2. Select features (use only the important features that contribute to the prediction). This is called "feature selection".
3. Reduce the size of the parameters, so the model reduces the importance of individual features. This is called "regularization"; see the sketch after this list.
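A minimal sketch of option 3 in Keras, assuming L2 (ridge) regularization; the penalty strength of 0.01 is an arbitrary illustrative value:
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l2
# The L2 penalty shrinks the weights, reducing each feature's influence
regularized_model = Sequential([
    Dense(16, activation='relu', input_shape=(10,), kernel_regularizer=l2(0.01)),
    Dense(1, kernel_regularizer=l2(0.01))
])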
We can divide these options into two categories.
The first category is data files. For data files, Spark only adds the specified files into the containers; no further commands are executed. There are two options in this category:
--archives: with this option you can submit archives, and Spark will extract the files in them for you. Spark supports zip, tar, and similar formats.
--files: with this option you can submit files; Spark will put them in the container and do nothing else. sc.addFile is the programmatic API for this one.
The second category is code dependencies. In a Spark application, a code dependency can be a JVM dependency, or a Python dependency for a PySpark application.
--jars: this option is used to submit JVM dependencies as jar files. Spark adds these jars to the CLASSPATH automatically, so the JVM can load them.
--py-files: this option is used to submit Python dependencies; they can be .py, .egg, or .zip files. Spark adds these files to the PYTHONPATH, so the Python interpreter can find them. sc.addPyFile is the programmatic API for this one.
PS: a single .py file is added into a __pyfiles__ folder; the others are added into the CWD.
All four of these options can specify multiple files, separated by ",", and for each file you can specify an alias using the {URL}#{ALIAS} format. Don't specify an alias in the --py-files option, because Spark won't add the alias to the PYTHONPATH.
Example:
--archives abc.zip#new_abc,cde.zip#new_cde
Spark will extract abc.zip and cde.zip and create new_abc and new_cde folders in the container.
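The programmatic counterparts look like this (a sketch; the file names here are hypothetical):
from pyspark.sql import SparkSession
from pyspark import SparkFiles
spark = SparkSession.builder.appName("deps-example").getOrCreate()
sc = spark.sparkContext
sc.addPyFile("helpers.py")      # added to the PYTHONPATH on the executors
sc.addFile("lookup_table.csv")  # shipped to each container as-is
# Resolve the local path of a shipped file on an executor
path = SparkFiles.get("lookup_table.csv")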
Let's create some sample sine wave data and add some noise to it.
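A sketch of one way to do this (the sample count and noise scale are arbitrary choices):
import numpy as np
x = np.linspace(0, 4 * np.pi, 200)              # 200 evenly spaced points
y_clean = np.sin(x)                             # clean sine wave
noise = np.random.normal(0, 0.1, size=x.shape)  # Gaussian noise
y = y_clean + noise                             # noisy sine wave data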
Three different techniques for feature scaling:
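The three techniques are not spelled out in these notes; as a sketch, here are three commonly used ones (min-max scaling, z-score standardization, and mean normalization), which may or may not be the three intended:
import numpy as np
x = np.random.rand(100) * 50                       # hypothetical feature values
x_minmax = (x - x.min()) / (x.max() - x.min())     # min-max scaling to [0, 1]
x_zscore = (x - x.mean()) / x.std()                # z-score standardization
x_meannorm = (x - x.mean()) / (x.max() - x.min())  # mean normalization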