This is an old mini-project where I attempted to implement a basic learning/training algorithm in Python + NumPy alone.
This Medium article convinced me it wouldn't be too hard to do something basic like implement a multilayer perceptron (though I used ReLU instead of a sigmoid activation).
There is an example in "interactive testing.ipynb" that suggests training an MLP (with ReLU activations) is working.
I don't intend to update this repo. I might try implementing the more flexible "computational graph" and backpropagation algorithm described in Chapter 6 of the Deep Learning Book at some point, probably in a separate repo.
Nothing here is guaranteed to be correct, extensible or efficient, though I did make reasonable, some, and no effort to ensure this, respectively.
I'm using:
- Python 3.6.6 and NumPy 1.15.0 for the implementation
- matplotlib 2.2.2 in the "interactive testing" notebook for a simple plot.
If something doesn't work, inside or outside of the example in "interactive testing", it's almost certainly my fault.
There are two source files and two test files.
- layers.py
- layers_tests.py
- ann.py
- ann_tests.py
The layers.py file defines an abstract layer class and three standard layers: `fc`, `relu` and `mse`. The layers have to define (a sketch of this interface follows the list):
- a `forward` method
- a `get_training_parameters` method that returns some container of learnable parameters for that layer
- an `initialize_training_parameters` method that chooses initial values for the learnable parameters.
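To make that concrete, here is a minimal sketch of such an abstract layer, assuming only the three methods listed above; the class name and docstrings are illustrative and the actual definitions live in layers.py.

```python
# Minimal sketch of the layer interface; illustrative only, the real abstract
# class in layers.py may differ in detail.
from abc import ABC, abstractmethod

class LayerSketch(ABC):
    @abstractmethod
    def forward(self, x):
        """Apply the layer to an input x and return the output."""

    @abstractmethod
    def initialize_training_parameters(self):
        """Choose initial values for the learnable parameters (if any)."""

    @abstractmethod
    def get_training_parameters(self, x=None):
        """Return a container of learnable parameters (and, given x, gradients)."""
```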
I didn't get round to making a good API for the learnable parameter container. It is simply a dictionary that maps a nice name like 'Weights' or 'Bias' to another dictionary containing the variable name used for the corresponding property; when layer.get_training_parameters(x) is called with a non-empty x, that inner dictionary also gets an entry 'Gradient' holding the value of the gradient of this layer with respect to the corresponding learnable parameter, evaluated at the layer input x. I didn't say it wasn't convoluted. To make it clearer:
Given a layer L with a learnable parameter W, typically called "Weights", suppose x is the single input to L. Then L.get_training_parameters(x)['Weights'].Name is W and L.get_training_parameters(x)['Weights'].Gradient is the derivative dL/dW evaluated at x.
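As a purely hypothetical illustration (this layer is not in the repo), here is a toy layer whose output is w * x for a single scalar weight w, so the gradient with respect to w at input x is just x. The attribute-style access mirrors the example above; whether the inner container is a plain dict or a small record is an implementation detail of layers.py that I'm glossing over here.

```python
# Hypothetical toy layer, used only to illustrate the shape of the
# training-parameter container described above.
from types import SimpleNamespace
import numpy as np

class ScaleLayerSketch:
    def initialize_training_parameters(self):
        self.w = 1.0  # single scalar learnable parameter

    def forward(self, x):
        return self.w * x

    def get_training_parameters(self, x=None):
        # Readable name -> record with the attribute name and, if an input was
        # supplied, the gradient of the layer output w.r.t. that parameter at x.
        entry = SimpleNamespace(Name='w')
        if x is not None:
            entry.Gradient = np.asarray(x)  # d(w*x)/dw = x
        return {'Weights': entry}

layer = ScaleLayerSketch()
layer.initialize_training_parameters()
params = layer.get_training_parameters(np.array([1.0, 2.0, 3.0]))
print(params['Weights'].Name)      # 'w'
print(params['Weights'].Gradient)  # [1. 2. 3.]
```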
The ann.py file defines an abstract ann class, implements ann_by_layers as a subclass of it, and mlp as a further subclass of that. I viewed ann as a container for a directed graph of layer objects; in particular, the ann.forward method was intended to follow the edges of this graph and apply the layer.forward method at each node. An ann implementation has to specify:
- `forward` - the network-level forward propagation
- `initialize_training_parameters` - to call the corresponding method on every layer
- `get_training_parameters` - a method to collect training parameters from each layer into some container.
I only tried implementing this in the simple "sequential" case called ann_by_layers. In this case each layer has at most one input and one output connection, and the graph is a single connected component. The container for get_training_parameters can then simply be a flat list/array since there is a natural ordering of the layers in the graph. Of course on top of the abstract ann interface I also had to add methods for the gradients.
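A rough sketch of that sequential idea, under the same assumptions as the sketches above (the real ann_by_layers in ann.py also handles gradients and the loss layer):

```python
# Sketch of a sequential network: layers applied in order, training parameters
# collected into a flat list. Illustrative only, not the actual ann_by_layers.
class SequentialSketch:
    def __init__(self, layers):
        self.layers = layers  # ordered list; each layer feeds the next

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def initialize_training_parameters(self):
        for layer in self.layers:
            layer.initialize_training_parameters()

    def get_training_parameters(self, x=None):
        # One container per layer, in layer order. When an input is given,
        # each layer sees the intermediate value that actually reaches it.
        containers = []
        for layer in self.layers:
            containers.append(layer.get_training_parameters(x))
            if x is not None:
                x = layer.forward(x)
        return containers
```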
The mlp class is a convenience class for constructing an ann_by_layers consisting of alternating `fc` and `relu` layers, repeated to some given "depth".
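Something along these lines, where the stand-in classes and the layer_sizes argument are my assumptions rather than the actual signatures in the repo:

```python
# Sketch of the mlp convenience idea: alternate fc-like and relu-like layers.
# The stand-in classes below are hypothetical, not the ones in layers.py.
import numpy as np

class FcSketch:
    def __init__(self, n_in, n_out):
        self.W = 0.01 * np.random.randn(n_out, n_in)
        self.b = np.zeros(n_out)
    def forward(self, x):
        return self.W @ x + self.b

class ReluSketch:
    def forward(self, x):
        return np.maximum(x, 0.0)

def build_mlp_sketch(layer_sizes):
    # One fc + relu pair per consecutive pair of sizes ("depth" = number of pairs).
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        layers.append(FcSketch(n_in, n_out))
        layers.append(ReluSketch())
    return layers

net = build_mlp_sketch([4, 8, 3])
x = np.random.randn(4)
for layer in net:
    x = layer.forward(x)
print(x.shape)  # (3,)
```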
The tests layer_tests.py and ann_tests.py were passing last time I checked. In this case CI stands for "Check Intermittently".