Python Artificial Neural Networks

Neural networks can be used for finding complex patterns in data by analyzing multiple levels of abstraction. For example, pixels in an image can be grouped together to find edges, edges can be grouped to find shapes, shapes can be grouped to find parts, and parts can be grouped to recognize objects (such as a car or a sign).

Whereas simpler prediction algorithms require data to be structured (organized in a table), neural networks can work with unstructured data, such as images, sound recordings, and human language. For the sake of simplicity, the example below will work with structured data.

A neural network consists of an input layer (defining the shape of the inputs), one or more hidden layers (each a collection of nodes), and an output layer that produces one or more output values.

Each node (neuron) of a network can be thought of as a linear regression model that takes its inputs from the outputs of the previous layer's nodes (or from the model inputs, if the node is in the first hidden layer). An activation function is then applied to each node's output. An activation function simply adjusts the output in a certain way, such as converting negative numbers to 0 (ReLU) or squashing values into the range between 0 and 1 (sigmoid). Each layer of a neural network (apart from the input layer) has weights and biases; these are the coefficients and y-intercepts of the neurons in the layer.
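As a minimal sketch of the idea above (the weights, inputs, and bias here are made-up illustrative numbers), a single neuron is just a weighted sum plus a bias, passed through an activation function:

```python
import numpy as np

def relu(z):
    # ReLU activation: negative values become 0
    return np.maximum(0, z)

def sigmoid(z):
    # Sigmoid activation: squashes any value into the range (0, 1)
    return 1 / (1 + np.exp(-z))

x = np.array([2.0, -1.0, 0.5])   # inputs from the previous layer
w = np.array([0.4, 0.3, -0.2])   # weights (coefficients)
b = 0.1                          # bias (y-intercept)

z = np.dot(w, x) + b             # linear part: w1*x1 + w2*x2 + w3*x3 + b
print(relu(z))                   # 0.5
print(sigmoid(z))                # ≈ 0.62
```

A full layer does this for every neuron at once, which is why its weights form a matrix rather than a single vector.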


(Optional) Generate demo data

import numpy as np
import keras

keras.utils.set_random_seed(6)
X = np.random.randint(0, 11, size=(200, 5))
y = np.where(
    np.where(np.sum(X[:, 0:2], axis=1) > 10, 1, 0) &
    np.where(np.sum(X[:, 2:5], axis=1) < 15, 1, 0),
    1, 0)
  • The matrix X has a row for each of 200 data entries, and each row has five feature values (x1 ... x5) that are random numbers from 0 to 10.
  • The target values (y) are 1 if the following condition is true: (x1 + x2 > 10) AND (x3 + x4 + x5 < 15).
  • By using set_random_seed, the random number generator will give consistent values each time you run the code.
 x1   x2   x3   x4   x5  |  y
  3    5    1    6    9  |  0
  9    3    8    2    6  |  0
  5    8   10    3    1  |  1
 ...  ...  ...  ...  ... | ...

Structure a simple neural network for binary classification

import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Input

myModel = Sequential()
myModel.add( Input(shape=(5,)) )               # Specify the shape of the input data (a 1D array with 5 values)
myModel.add( Dense(2, activation='relu') )     # Create hidden layer 1 with 2 neurons and relu activation
myModel.add( Dense(1, activation='sigmoid') )  # Output layer with 1 neuron; sigmoid is used for classification
myModel.compile( optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'] )
  • Keras is an API that simplifies access to more powerful deep learning libraries (TensorFlow, PyTorch, and JAX); you must install one of those alongside Keras.
  • This example is a binary classification model with 5 inputs, 1 hidden layer, and 1 output value (between 0 and 1).
  • Hidden layers often use the ReLU activation function, which just sets negative values to 0.
  • The output layer here uses the sigmoid activation function, which ensures that the output value is between 0 and 1, and usually close to either extreme. This is useful to classify something as True (closer to 1) or False (closer to 0).
  • The loss function here is binary_crossentropy, which is used for classification between 2 options (e.g. True or False). For numerical value prediction models (regression), you should use mean_squared_error.

Train and evaluate the neural network

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

X_tr, X_tst, y_tr, y_tst = train_test_split(X, y, test_size=0.2)  # Split available data (X and y) into training and testing sets
trainingHistory = myModel.fit(X_tr, y_tr, epochs=100)  # Train the model on the training dataset
myProbabilities = myModel.predict(X_tst)               # Make predictions on the testing data as probabilities
myBinaryPreds = np.where(myProbabilities > 0.5, 1, 0)  # Convert prediction probabilities to 1 or 0
myAccuracy = accuracy_score(y_tst, myBinaryPreds)      # Compare predictions to actual values
print("Accuracy:", myAccuracy)
  • To improve model accuracy, you may want to adjust the epochs parameter in the fit function; it specifies how many times all of the training data is run through the model.
  • In the fit function, you can pass validation data to use in hyperparameter tuning. Validation data is another subset of the data apart from the training and testing set.
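One way to sketch that three-way split (the split sizes here are illustrative, and the data is random filler): take 20% of the data for testing, then carve 25% of the remainder off as a validation set, giving roughly 60/20/20 proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randint(0, 11, size=(200, 5))
y = np.random.randint(0, 2, size=200)

# First split off the test set, then carve a validation set out of the rest
X_tr, X_tst, y_tr, y_tst = train_test_split(X, y, test_size=0.2)
X_tr, X_val, y_tr, y_val = train_test_split(X_tr, y_tr, test_size=0.25)

print(len(X_tr), len(X_val), len(X_tst))  # 120 40 40

# The validation set would then be passed to fit, e.g.:
#   myModel.fit(X_tr, y_tr, epochs=100, validation_data=(X_val, y_val))
# after which trainingHistory.history also contains 'val_loss' and 'val_accuracy'
```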

Graph the learning curve of training

import matplotlib.pyplot as plt

lossHistory = trainingHistory.history['loss']  # Get the loss value for each epoch of training
plt.plot(range(1, 101), lossHistory)           # Number the epochs (1-100) on the x-axis; show the loss values on the y-axis
plt.show()                                     # Display the plot
  • The learning curve should show the loss going down over time; usually it starts steeply downward and then levels off.
  • This example assumes that trainingHistory stores the output of the model's fit function.

Use the model for predicting on an individual data sample

y_hat = myModel.predict(np.array([[0,5,5,3,9]]))
print("Prediction:", y_hat[0][0] > 0.5)
  • A 'y' with a '^' symbol above it is called y-hat; it is the common notation for a model's predicted output, distinguishing it from the true target value(s) 'y'.
  • The result of predict is a 2D array of probabilities, so we get the first and only value with the index operators [0][0] and then see if the probability is greater than 0.5, indicating a classification of True.

All together

import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Input
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

keras.utils.set_random_seed(6)
X = np.random.randint(0, 11, size=(200, 5))
y = np.where(
    np.where(np.sum(X[:, 0:2], axis=1) > 10, 1, 0) &
    np.where(np.sum(X[:, 2:5], axis=1) < 15, 1, 0),
    1, 0)
X_tr, X_tst, y_tr, y_tst = train_test_split(X, y, test_size=0.2)
myModel = Sequential()
myModel.add( Input(shape=(5,)) )               # Specify the shape of the input data (a 1D array with 5 values)
myModel.add( Dense(2, activation='relu') )     # Create hidden layer 1 with 2 neurons and relu activation
myModel.add( Dense(1, activation='sigmoid') )  # Output layer with 1 neuron; sigmoid is used for classification
myModel.compile( optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'] )
trainingHistory = myModel.fit(X_tr, y_tr, epochs=100)  # Train the model on the training dataset
myProbabilities = myModel.predict(X_tst)               # Make predictions on the testing data as probabilities
myBinaryPreds = np.where(myProbabilities > 0.5, 1, 0)  # Convert prediction probabilities to 1 or 0
myAccuracy = accuracy_score(y_tst, myBinaryPreds)      # Compare predictions to actual values
print("Accuracy:", myAccuracy)
lossHistory = trainingHistory.history['loss']  # Get the loss value for each epoch of training
plt.plot(range(1, 101), lossHistory)           # Number the epochs (1-100) on the x-axis; show the loss values on the y-axis
plt.show()                                     # Display the plot
y_hat = myModel.predict(np.array([[0,5,5,3,9]]))
print("Prediction:", y_hat[0][0] > 0.5)
  • This generates the demo data, builds, trains, and evaluates a simple neural network, then uses the model to predict the classification of a data sample.

Reflection

In this simple model, there is only one hidden layer which has only 2 nodes. It works well for the demo data that we know relies on two conditions, because the model could train each node to focus on a separate condition. Also, each condition of our demo data deals with only a known selection of the inputs, so the neurons can learn to ignore the irrelevant inputs as they shrink the corresponding coefficients to values near zero.

If you used the provided code, including the demo input with the same random seed, you should see that the model improved as more epochs passed: steeply at first, then more gradually. Increasing the number of epochs can improve the model, but there are diminishing returns. Also, the accuracy on the test data tends to be lower than the accuracy on the training data. If you train the model for too many epochs, it can fit itself too closely to the training data and then perform worse on new data; this is called overfitting. To make a model that performs effectively on new data, you may have to adjust the number of epochs as well as other values known as hyperparameters.
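One simple way to spot overfitting, sketched here with made-up loss values: training loss keeps falling, but validation loss bottoms out and then rises again. The epoch where validation loss is lowest is a reasonable place to stop training.

```python
# Toy loss curves (illustrative numbers, not real training output)
train_loss = [0.90, 0.60, 0.40, 0.30, 0.25, 0.22, 0.20, 0.19]
val_loss   = [0.95, 0.70, 0.50, 0.42, 0.40, 0.43, 0.48, 0.55]

# Validation loss keeps improving until epoch 5, then climbs: overfitting
best_epoch = val_loss.index(min(val_loss)) + 1  # epochs are 1-indexed
print("Stop around epoch:", best_epoch)         # Stop around epoch: 5
```

Keras can automate this idea with callbacks that monitor validation loss and halt training when it stops improving.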


Challenge

Make the model described above, but modify it to have 10 inputs and 2 hidden layers. Test it with different numbers of epochs and see which variation of the model performs best.
