Python Artificial Neural Networks
Neural networks can be used for finding complex patterns in data by analyzing multiple levels of abstraction. For example, pixels in an image can be grouped together to find edges, edges can be grouped to find shapes, shapes can be grouped to find parts, and parts can be grouped to recognize objects (such as a car or a sign).
Whereas simpler prediction algorithms require data to be structured (organized in a table), neural networks can work with unstructured data, such as images, sound recordings, and human language. For the sake of simplicity, the example below will work with structured data.
A neural network consists of an input layer (defining the shape of the inputs), one or more hidden layers (each a collection of nodes), and an output layer producing one or more output values.
Each node (neuron) of a network can be thought of as a linear regression model that takes its inputs from the outputs of the previous layer's nodes (or from the model inputs, if the node is in the first hidden layer). An activation function is then applied to each node's output. An activation function adjusts the output in a certain way, such as converting negative numbers to 0 (ReLU) or squashing values into the range between 0 and 1 (sigmoid). Each layer of a neural network (apart from the input layer) has weights and biases; these are the coefficients and y-intercepts for each of the neurons in the layer.
(Optional) Generate demo data
- The matrix X has a row for each of 200 data entries, and each row has five feature values (x1 ... x5) that are random numbers from 0 to 10.
- The target values (y) are 1 if the following condition is true: (x1 + x2 > 10) AND (x3 + x4 + x5 < 15); otherwise they are 0.
- Calling set_random_seed makes the random number generator give the same values each time you run the code.
| x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|
| 3 | 5 | 1 | 6 | 9 |
| 9 | 3 | 8 | 2 | 6 |
| 5 | 8 | 10 | 3 | 1 |
| ... | ... | ... | ... | ... |
| y |
|---|
| 0 |
| 0 |
| 1 |
| ... |
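The data generation described above can be sketched as follows. The seed value, and the use of integer feature values (suggested by the sample tables), are assumptions:

```python
import numpy as np
import keras

# Seed Python, NumPy, and the backend so runs are reproducible
# (the seed value 0 is an assumption).
keras.utils.set_random_seed(0)

# 200 rows, each with five feature values from 0 to 10
# (integers, as the sample tables suggest).
X = np.random.randint(0, 11, size=(200, 5))

# y is 1 where (x1 + x2 > 10) AND (x3 + x4 + x5 < 15), else 0.
y = ((X[:, 0] + X[:, 1] > 10) &
     (X[:, 2] + X[:, 3] + X[:, 4] < 15)).astype(int)
```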
Structure a simple neural network for binary classification
- Keras is an API that simplifies access to more powerful deep learning libraries (TensorFlow, PyTorch, or JAX); you must install one of those backends along with Keras.
- This example is a binary classification model with 5 inputs, 1 hidden layer, and 1 output value (between 0 and 1).
- Hidden layers often use the ReLU activation function, which just sets negative values to 0.
- The output layer here uses the sigmoid activation function, which ensures that the output value is between 0 and 1, and usually close to either extreme. This is useful to classify something as True (closer to 1) or False (closer to 0).
- The loss function here is binary_crossentropy, which is used for classification between two options (e.g. True or False). For regression models that predict numerical values, use mean_squared_error instead.
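A sketch of the model described above. The 2-node hidden layer matches the Reflection section later in this article; the choice of the adam optimizer is an assumption:

```python
import keras
from keras import layers

model = keras.Sequential([
    keras.Input(shape=(5,)),                # 5 input features
    layers.Dense(2, activation="relu"),     # hidden layer: negatives become 0
    layers.Dense(1, activation="sigmoid"),  # output: a probability from 0 to 1
])

# binary_crossentropy suits two-option classification;
# mean_squared_error would be used for regression instead.
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```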
Train and evaluate the neural network
- To improve model accuracy, you may want to adjust the epochs parameter in the fit function; it specifies the number of times to run all of the training data through the model.
- In the fit function, you can pass validation data to use in hyperparameter tuning. Validation data is a separate subset of the data, apart from the training and test sets.
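A sketch of the training and evaluation step, repeating the data and model setup from the earlier sections so it runs on its own. The seed, the 80/20 split, and the epoch count are all assumptions:

```python
import numpy as np
import keras
from keras import layers

# --- setup repeated from the earlier sections ---
keras.utils.set_random_seed(0)
X = np.random.randint(0, 11, size=(200, 5))
y = ((X[:, 0] + X[:, 1] > 10) &
     (X[:, 2] + X[:, 3] + X[:, 4] < 15)).astype(int)

model = keras.Sequential([
    keras.Input(shape=(5,)),
    layers.Dense(2, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# --- train and evaluate ---
# Hold out the last 40 rows (20%) as the test set.
X_train, X_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]

trainingHistory = model.fit(
    X_train, y_train,
    epochs=50,               # passes over all of the training data
    validation_split=0.2,    # hold out part of the training set for tuning
    verbose=0,
)

loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"test loss: {loss:.3f}, test accuracy: {accuracy:.3f}")
```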
Graph the learning curve of training
- The learning curve should show the loss going down over time; usually it will start steeply downward and then level off.
- This example assumes that trainingHistory holds the object returned by the model's fit function.
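A sketch of the learning-curve plot. A short training run is repeated here so that trainingHistory exists and the snippet is self-contained; the headless (Agg) backend and the epoch count are assumptions:

```python
import numpy as np
import keras
from keras import layers
import matplotlib
matplotlib.use("Agg")  # draw without a display (assumes a headless run)
import matplotlib.pyplot as plt

# --- quick training run so trainingHistory exists (setup repeated) ---
keras.utils.set_random_seed(0)
X = np.random.randint(0, 11, size=(200, 5))
y = ((X[:, 0] + X[:, 1] > 10) &
     (X[:, 2] + X[:, 3] + X[:, 4] < 15)).astype(int)
model = keras.Sequential([
    keras.Input(shape=(5,)),
    layers.Dense(2, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
trainingHistory = model.fit(X, y, epochs=20, verbose=0)

# --- plot the loss recorded at each epoch ---
plt.plot(trainingHistory.history["loss"])
plt.xlabel("epoch")
plt.ylabel("loss")
plt.title("Learning curve")
plt.savefig("learning_curve.png")
```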
Use the model for predicting on an individual data sample
- A 'y' with a '^' symbol above it is called y-hat; it is the common notation for a model's predicted output, used to distinguish the prediction from the true target value(s) 'y'.
- The result of predict is a 2D array of probabilities, so we get the first (and only) value with the index operators [0][0], then check whether the probability is greater than 0.5, which indicates a classification of True.
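A sketch of predicting on one sample. A quick training run is repeated so the snippet is self-contained, and the sample's feature values are hypothetical:

```python
import numpy as np
import keras
from keras import layers

# --- quick training run (setup repeated from the earlier sections) ---
keras.utils.set_random_seed(0)
X = np.random.randint(0, 11, size=(200, 5))
y = ((X[:, 0] + X[:, 1] > 10) &
     (X[:, 2] + X[:, 3] + X[:, 4] < 15)).astype(int)
model = keras.Sequential([
    keras.Input(shape=(5,)),
    layers.Dense(2, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=20, verbose=0)

# --- predict on one (hypothetical) data sample ---
sample = np.array([[3, 5, 1, 6, 9]])       # one row with five features
y_hat = model.predict(sample, verbose=0)   # 2D array of probabilities
probability = y_hat[0][0]                  # first and only value
classification = probability > 0.5         # True if closer to 1
print(probability, classification)
```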
All together
- This generates the demo data, then builds, trains, and evaluates a simple neural network, and finally uses the model to predict the classification of a data sample.
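A possible end-to-end version of the steps above. The seed, the train/test split, the epoch count, and the optimizer are all assumptions:

```python
import numpy as np
import keras
from keras import layers

# Generate demo data (seed value assumed).
keras.utils.set_random_seed(0)
X = np.random.randint(0, 11, size=(200, 5))
y = ((X[:, 0] + X[:, 1] > 10) &
     (X[:, 2] + X[:, 3] + X[:, 4] < 15)).astype(int)

# Build the network: 5 inputs, one 2-node hidden layer, 1 output.
model = keras.Sequential([
    keras.Input(shape=(5,)),
    layers.Dense(2, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Train on the first 160 rows, evaluate on the held-out 40.
X_train, X_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]
trainingHistory = model.fit(X_train, y_train, epochs=50,
                            validation_split=0.2, verbose=0)
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"test accuracy: {accuracy:.3f}")

# Predict the class of one (hypothetical) data sample.
sample = np.array([[3, 5, 1, 6, 9]])
probability = model.predict(sample, verbose=0)[0][0]
print("classified True" if probability > 0.5 else "classified False")
```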
Reflection
In this simple model, there is only one hidden layer which has only 2 nodes.
It works well for the demo data, which we know depends on two conditions, because
the model can train each node to focus on a separate condition.
Also, each condition of our demo data involves only a known subset of the inputs,
so the neurons can learn to ignore the irrelevant inputs by shrinking the corresponding coefficients toward zero.
If you ran the provided code, including the demo data with the same random seed,
you should see the model improve as it trains over more epochs:
steeply at first, then more gradually.
Increasing the number of epochs can improve the model, but with diminishing returns.
Also, accuracy on the test data tends to be lower than accuracy on the training data.
If you train the model for too many epochs, it can fit itself too closely to the training data
and then perform worse on new data; this is called overfitting.
To make a model that performs well on new data, you may have to adjust the number of epochs
as well as other settings known as hyperparameters.
Challenge
Make the model described above, but modify it to have 10 inputs and 2 hidden layers. Test it with different numbers of epochs and see which variation of the model performs best.