## ANN Introduction

I prepared a basic introductory Python script for teaching. It shows how a perceptron can be built and how a basic feed-forward two-layer network can learn to approximate the XOR function. I also included code to visualize the decision boundary of the ANNs during training.

The first part shows how a perceptron can be implemented from scratch and how it learns to represent the boolean AND function (with 1 encoding True and -1 encoding False):

    from __future__ import print_function  # keeps print() working on Python 2

    import numpy as np
    from matplotlib import pyplot as pl
    from matplotlib import animation

    # Create a perceptron with two input nodes and one output node. It does not
    # have an offset node right now, since an offset is not necessary for
    # this simple demonstration.
    # The weights are stored in a 1x2 matrix.
    perceptron_weights = np.random.normal(loc=0.0, scale=0.05, size=(1, 2))
    print('Weights:', perceptron_weights)
    # Choose the nonlinear squashing function. For this example, the
    # hyperbolic tangent is used:
    activation = np.tanh
    # Get the result of the forward propagation of [-1, 1]
    print('Result of FP [[-1], [1]]:',
          activation(perceptron_weights * np.asmatrix([[-1], [1]])))
    # Define a function for this:
    def perceptron_result(inp):
        return activation(perceptron_weights * inp)

    # Visualize the current decision boundary of this perceptron.
    # Define a visualization function:
    def visualize_decider(dec_func,
                          out_index,
                          training_data=None,
                          annotations=None,
                          x_offset=[-1., 1.],
                          y_offset=[-1., 1],
                          transpose=False):
        """
        Visualizes a decision boundary. The out_index specifies the vector index
        of the decision function output component that should be visualized.
        If no training_data is specified, the offsets are interpreted as lists
        with absolute boundaries.
        """
        from matplotlib import pyplot as pl
        if training_data is None:
            x = np.linspace(x_offset[0], x_offset[1])
            y = np.linspace(y_offset[0], y_offset[1])
        else:
            x = np.linspace(min(training_data[:, 0]) + x_offset[0],
                            max(training_data[:, 0]) + x_offset[1])
            y = np.linspace(min(training_data[:, 1]) + y_offset[0],
                            max(training_data[:, 1]) + y_offset[1])
        # Create the meshgrid
        X, Y = np.meshgrid(x, y)
        Z = np.zeros((X.shape[0], X.shape[1]))
        # Get dec values
        for i in range(X.shape[0]):
            for j in range(X.shape[1]):
                if transpose:
                    # Row vector input (one example per row)
                    Z[i, j] = dec_func(np.asmatrix([[X[i, j], Y[i, j]]]))[0, out_index]
                else:
                    # Column vector input
                    Z[i, j] = dec_func(np.asmatrix([[X[i, j]], [Y[i, j]]]))[0, out_index]
        # Plot the contours
        contourplot = pl.contour(X, Y, Z)
        pl.clabel(contourplot, inline=1, fontsize=10)
        # Plot the data if available
        if training_data is not None and annotations is not None:
            pl.scatter(training_data[:, 0], training_data[:, 1], c=annotations)

    # and use it
    visualize_decider(perceptron_result, 0, x_offset=[-2, 2], y_offset=[-2, 2])

    # Can the perceptron learn the and-function for an encoding of -1 = False and
    # 1 = True? Let's find out:
    # Define training data:
    training_data = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
    annotations = np.array([-1, -1, -1, 1])
    # What's necessary for backpropagation?
    eta = 0.01
    def train(data, annotation, eta):
        global perceptron_weights
        # Get the deltas:
        # Initialize the gradient to zero
        gradient = np.zeros(perceptron_weights.shape)
        for index, example in enumerate(data):
            # Pass the example as a 2x1 column so that the weights are applied
            # as a matrix product; the scalar output is read from the 1x1 result.
            output = perceptron_result(np.asmatrix(example).T)[0, 0]
            # The derivative of tanh(x) is -tanh(x)**2 + 1.
            gradient += (- (annotation[index] - output)
                         * (1. - output * output) * example)
        # Update the weights:
        perceptron_weights -= eta * gradient

    # Let's see what training does:
    train(training_data, annotations, eta)
    visualize_decider(perceptron_result, 0, x_offset=[-2, 2], y_offset=[-2, 2])
    # and define this as a function
    def train_and_redraw(index):
        pl.clf()
        train(training_data, annotations, eta)
        visualize_decider(perceptron_result, 0, training_data, annotations)
        pl.title('Perceptron after ' + str(index+1) + ' steps of gradient descent')

    # Now we can use an animation to continuously visualize training.
    # Note that you might have to execute the animation more than once
    # until training saturates.
    anim = animation.FuncAnimation(pl.figure(),
                                   train_and_redraw,
                                   repeat=False,
                                   frames=50,
                                   interval=40)
    # You can reinitialize the network and play around with eta, the learning rate.
    # Note how training diverges if eta is chosen to be too large (this happens
    # for eta=1 already!)

    # Save the animation if wanted
    #anim.save('perceptron_learning.mp4')
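
The weight update in `train` rests on the tanh derivative named in the code comment, 1 - tanh(x)^2, and on the resulting gradient of the squared error 0.5 * (t - tanh(w·x))^2 with respect to the weights, which is -(t - o) * (1 - o^2) * x for output o. The following sketch checks that formula against a finite-difference estimate. The helper names and the factor 0.5 in the loss are assumptions made for this illustration; they are not part of the script above.

```python
import numpy as np


def tanh_derivative(x):
    """Analytic derivative of tanh: 1 - tanh(x)**2."""
    return 1.0 - np.tanh(x) ** 2


def loss(weights, example, target):
    """Squared error 0.5 * (target - tanh(w . x))**2 (assumed loss)."""
    return 0.5 * (target - np.tanh(np.dot(weights, example))) ** 2


def analytic_gradient(weights, example, target):
    """Gradient of the loss above with respect to the weights."""
    output = np.tanh(np.dot(weights, example))
    return -(target - output) * (1.0 - output ** 2) * example


def numerical_gradient(weights, example, target, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grad = np.zeros_like(weights)
    for i in range(weights.size):
        step = np.zeros_like(weights)
        step[i] = eps
        grad[i] = (loss(weights + step, example, target)
                   - loss(weights - step, example, target)) / (2.0 * eps)
    return grad


# Hypothetical weights and one AND training example ([-1, 1] -> -1).
w = np.array([0.3, -0.2])
x = np.array([-1.0, 1.0])
difference = analytic_gradient(w, x, -1.0) - numerical_gradient(w, x, -1.0)
print('Largest deviation:', np.abs(difference).max())
```

If the two gradients disagree noticeably, the sign or the derivative term in the update rule is the first place to look.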

The code is meant to be executed step by step. It requires only matplotlib and numpy to be installed (and ffmpeg if the animation should be saved, though this feature seems to be broken on Windows at the time of writing). The resulting animation looks like this:

The code continues with an experiment in which a two-layer network is trained to approximate the XOR function. To save the hassle of defining all necessary structures by hand, Theano and pynnet are used.

    # The previous code must have been executed before to make this work.
    # Let's use Theano and pynnet for simplicity:
    import theano
    from pynnet.nodes import SimpleNode, errors
    from pynnet.training import get_updates

    sx = theano.tensor.matrix('x')
    sy = theano.tensor.matrix('y')
    # We initialize an MLP with one hidden layer of two units.
    h = SimpleNode(sx, 2, 2)
    out = SimpleNode(h, 2, 1)
    # Use the MSE
    cost = errors.mse(out, sy)
    # We can build functions from expressions to use our network
    network_out = theano.function([sx], out.output)
    train = theano.function([sx, sy], cost.output,
                            updates=get_updates(cost.params, cost.output, 0.1))
    # Let's have a look at the decision boundary:
    visualize_decider(network_out, 0, transpose=True)
    # Define the training data:
    x = np.array([[-1, -1],
                  [1, 1],
                  [1, -1],
                  [-1, 1]])
    y = np.array([[-1],
                  [-1],
                  [1],
                  [1]])
    # It's interesting to try to not specify all points. This visualizes well the
    # problem of generalization: the error function (MSE) gives a strong preference
    # bias here, and hence the network can not even learn to generalize properly
    # to non-specified points of the XOR problem.
    # Redefine our functions.
    def train_and_redraw(index):
        pl.clf()
        train(x, y)
        visualize_decider(network_out, 0, x, y, transpose=True)
        pl.title('Network output after '+str(index+1)+' steps of gradient descent')

    # And watch the animation.
    anim = animation.FuncAnimation(pl.figure(),
                                   train_and_redraw,
                                   repeat=False,
                                   frames=150,
                                   interval=20)
    # Save it if wanted.
    anim.save('2layer_net_learning.mp4')
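
For readers who want to see what the libraries take care of, here is a minimal NumPy-only sketch of a comparable 2-2-1 network trained on the same XOR data with plain batch gradient descent on the MSE. It is not a reimplementation of pynnet: the tanh activations, the bias terms, the initialization scale and all function names are assumptions chosen for this illustration.

```python
import numpy as np


def init_params(rng, n_in=2, n_hidden=2, n_out=1, scale=0.5):
    """Small random weights and zero biases (assumed initialization)."""
    return {
        'W1': rng.normal(0.0, scale, size=(n_in, n_hidden)),
        'b1': np.zeros(n_hidden),
        'W2': rng.normal(0.0, scale, size=(n_hidden, n_out)),
        'b2': np.zeros(n_out),
    }


def forward(params, x):
    """Forward pass; x holds one example per row."""
    hidden = np.tanh(x.dot(params['W1']) + params['b1'])
    output = np.tanh(hidden.dot(params['W2']) + params['b2'])
    return hidden, output


def mse(params, x, y):
    """Mean squared error over all examples."""
    return np.mean((forward(params, x)[1] - y) ** 2)


def train_step(params, x, y, eta=0.1):
    """One step of batch gradient descent on the MSE (backpropagation)."""
    hidden, output = forward(params, x)
    # Error signal at the output layer, including the tanh derivative.
    delta_out = 2.0 * (output - y) / output.size * (1.0 - output ** 2)
    # Propagate the error signal back through the hidden layer.
    delta_hidden = delta_out.dot(params['W2'].T) * (1.0 - hidden ** 2)
    grads = {
        'W2': hidden.T.dot(delta_out),
        'b2': delta_out.sum(axis=0),
        'W1': x.T.dot(delta_hidden),
        'b1': delta_hidden.sum(axis=0),
    }
    for name in params:
        params[name] -= eta * grads[name]


# Same XOR encoding as above: -1 = False, 1 = True.
xor_x = np.array([[-1, -1], [1, 1], [1, -1], [-1, 1]], dtype=float)
xor_y = np.array([[-1], [-1], [1], [1]], dtype=float)

params = init_params(np.random.RandomState(0))
loss_before = mse(params, xor_x, xor_y)
for _ in range(2000):
    train_step(params, xor_x, xor_y, eta=0.1)
loss_after = mse(params, xor_x, xor_y)
print('MSE before: %.3f, after: %.3f' % (loss_before, loss_after))
```

With only two hidden units, how well the network fits XOR depends on the random initialization, so it is worth trying several seeds and learning rates.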

Now you can watch the two-layer network being trained:

By the way, it wasn't that simple to make the video compatible with most browsers. A quick search suggested using an .mp4 file as the container for an H.264-encoded video. Handbrake turned out to be extremely simple and helpful for recoding.
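
As an alternative to recoding afterwards, matplotlib can ask ffmpeg for H.264 directly when saving. The sketch below only builds the writer and checks whether ffmpeg is available. It assumes an ffmpeg build with an H.264 encoder; the frame rate and the function name `make_h264_writer` are illustrative choices, and this is not how the videos on this page were produced.

```python
from matplotlib import animation


def make_h264_writer(fps=25):
    """Build an ffmpeg writer that requests H.264 video.

    The yuv420p pixel format is passed explicitly because it is the
    format that is widely supported for H.264 playback.
    """
    return animation.FFMpegWriter(fps=fps,
                                  codec='h264',
                                  extra_args=['-pix_fmt', 'yuv420p'])


writer = make_h264_writer()
if animation.writers.is_available('ffmpeg'):
    # With an existing FuncAnimation object `anim`, saving would be:
    # anim.save('2layer_net_learning.mp4', writer=writer)
    print('ffmpeg found; the writer can be passed to anim.save().')
else:
    print('ffmpeg not found; install it before saving animations.')
```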