DANNs with Theano

While ANNs are a fascinating research subject, quickly implementing new algorithms built on their concepts is not always easy. Speed matters, and there are many paradigms for how ANN theory can be mapped to implementations. Currently extending my difference based learning theory, introduced in this recent blog post, to convolutional neural networks, I figured Theano and pynnet would be just the right frameworks to make the implementation a little easier.

Theano already offers many incredibly handy tools, including the possibility of 'automagically' executing ANN code on the GPU and automatic differentiation even for custom error functions! pynnet makes heavy use of Theano's routines and classes and wraps everything up in a very comfortable framework for constructing neural networks. Using both tools, I quickly found a way to take the difference based error function to this new 'universe' ;).
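
To give a flavour of the automatic differentiation (a minimal sketch with a made-up toy cost, not code from the actual project): theano.grad turns any symbolic cost into a symbolic gradient expression, which theano.function then compiles, on the CPU or GPU.

import theano
import theano.tensor as T

# A toy custom 'error': sum of squared deviations from 1.
x = T.vector('x')
toy_cost = T.sum((x - 1) ** 2)

# Theano derives the gradient expression symbolically ...
toy_grad = T.grad(toy_cost, x)

# ... and compiles it into a callable function.
grad_fn = theano.function([x], toy_grad)
print(grad_fn([0.0, 1.0, 2.0]))  # -> [-2.  0.  2.]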

All error functions of pynnet can be found in the module pynnet.nodes.errors. The comment in the module file also states that

All function[s] exported by this module (the names listed in __all__) must comply with this interface:

Parameters (the names don't have to be the same):
os -- the symbolic variable for the output
y -- the symbolic variable for the targets
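
For comparison, an error function that does fit this interface would look roughly like the following mean squared error (a sketch for illustration, not copied from pynnet):

def mse(os, y):
    """Mean squared error between the network output and the targets."""
    return T.mean((os - y) ** 2)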

This is unfortunate, since the provided interface does not allow for using the forward propagation results of two examples, which is in turn necessary to use the difference based error function. The error function can be defined as follows (with T referring to theano.tensor):

def dse(output_sample_x, output_sample_xs, desired_distance):
    r"""
    Difference squared error: penalizes the squared deviation between the
    (element-wise) squared difference of the two network outputs and the
    desired distance.
    """
    return T.mean(((output_sample_x - output_sample_xs) ** 2 - desired_distance) ** 2)
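
As a quick sanity check, the expression can be compiled and evaluated on its own before wiring it into a network (a sketch with toy inputs; ox, oxs and d are made-up placeholder variables):

import numpy as np

ox = T.matrix('ox')    # outputs for the first batch of samples
oxs = T.matrix('oxs')  # outputs for the second batch of samples
d = T.matrix('d')      # desired distances between the two outputs

dse_fn = theano.function([ox, oxs, d], dse(ox, oxs, d))

# Identical outputs and a desired distance of zero give zero error.
zeros = np.zeros((3, 1), dtype=theano.config.floatX)
print(dse_fn(zeros, zeros, zeros))  # -> 0.0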

To apply this error function, a network is necessary that can provide outputs for both samples output_sample_x and output_sample_xs at the same time. This can be achieved by creating two Theano neural networks with the same shared weights:

# Create network inputs and targets twice
# (assumes theano and pynnet's SimpleNode are already imported)
sx = theano.tensor.matrix('x')
sxs = theano.tensor.matrix('xs')
sy = theano.tensor.matrix('y')    # desired distances, used by the cost below
sys = theano.tensor.matrix('ys')  # not used by the cost below

# Initialize an MLP with one hidden layer of two units.
h = SimpleNode(sx, 2, 2)
hs = SimpleNode(sxs, 2, 2)
# Make the parameters of the second MLP be the ones of the first.
hs.W = h.W
hs.b = h.b
# ... same for the output nodes
out = SimpleNode(h, 2, 1)
outs = SimpleNode(hs, 2, 1)
outs.W = out.W
outs.b = out.b

# Now the cost function can be used:
cost = dse(out.output, outs.output, sy)

All functions can now be used as usual. The only difference is that the dse function defined here is not wrapped in the pynnet interface, so instead of going through cost.output or cost.params you work directly with the Theano expressions. Even the automatic gradient calculation can be performed:

# We can build functions from expressions to use our network
eval = theano.function([sx], out.output)
test = theano.function([sx, sxs, sy], cost)

# get_updates builds the parameter updates from the cost's gradients
train = theano.function([sx, sxs, sy], cost,
                        updates=get_updates([h.W[0], h.b, out.W[0], out.b], cost, 0.1))
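
For completeness, if you would rather not rely on pynnet's get_updates, roughly equivalent plain gradient descent updates can be built directly from theano.grad (a sketch under the assumption that the listed parameters are Theano shared variables; sgd_updates is a made-up helper, not pynnet's implementation):

def sgd_updates(params, cost, learning_rate):
    """Move each shared parameter one step against its gradient."""
    grads = T.grad(cost, params)
    return [(p, p - learning_rate * g) for p, g in zip(params, grads)]

train_sgd = theano.function([sx, sxs, sy], cost,
                            updates=sgd_updates([h.W[0], h.b, out.W[0], out.b], cost, 0.1))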

Whether this is the most computationally efficient way to bring difference based error calculation into the Theano universe remains to be seen. However, it makes important Theano features such as automatic gradient computation available without much overhead, allowing for quick experimentation with e.g. convolutional networks.
