This is an old mini-project where I attempted to implement a basic learning/training algorithm in Python + NumPy alone.
This Medium article convinced me it wouldn't be too hard to do something basic like implement a multilayer perceptron (though I used ReLU instead of a sigmoid activation).
There is an example in "interactive testing.ipynb" that suggests training an MLP (with ReLU activations) is working.
I don't intend to update this repo. I might try implementing the more flexible "computational graph" and backpropagation algorithm described in Deep Learning Book, Chapter 6 at some point, probably in a separate repo.
Nothing here is guaranteed to be correct, extensible or efficient, though I did make reasonable, some, and no effort to ensure this, respectively.
I'm using:
- Python 3.6.6
- NumPy 1.15.0
for the implementation, and
- matplotlib 2.2.2 in the "interactive testing" notebook for a simple plot.
If something doesn't work, whether inside or outside of the example in "interactive testing", it's almost certainly my fault.
There are two source files and two test files:

- `layers.py`
- `layers_tests.py`
- `ann.py`
- `ann_tests.py`
The `layers.py` file defines an abstract `layer` class and three standard layers: `fc`, `relu` and `mse`. The layers have to define (see the sketch after this list):

- a `forward` method
- a `get_training_parameters` method that returns some container of learnable parameters for that layer
- an `initialize_training_parameters` method that chooses initial values for the learnable parameters.
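A minimal sketch of what that interface might look like, with assumed method signatures (the actual definitions in `layers.py` may differ in detail):

```python
from abc import ABC, abstractmethod

class layer(ABC):
    """Sketch of the abstract layer interface; signatures are assumptions."""

    @abstractmethod
    def forward(self, x):
        """Map the layer input x to the layer output."""

    @abstractmethod
    def initialize_training_parameters(self):
        """Choose initial values for the learnable parameters (if any)."""

    @abstractmethod
    def get_training_parameters(self, x=None):
        """Return a container of learnable parameters; given a non-empty x,
        also include each parameter's gradient evaluated at x."""
```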
I didn't get round to making a good API for the learnable parameter container. It is simply a dictionary that maps a friendly name like 'Weights' or 'Bias' to another dictionary containing the variable name used for the corresponding property; when `layer.get_training_parameters(x)` is called with a non-empty `x`, that inner dictionary also contains a 'Gradient' entry holding the gradient of the layer with respect to the corresponding learnable parameter, evaluated at the layer input `x`. I didn't say it wasn't convoluted. To make it clearer:
Given a layer `L` with a learnable parameter `W`, typically called "Weights", suppose `x` is the single input to `L`. Then `L.get_training_parameters(x)['Weights'].Name` is `W` and `L.get_training_parameters(x)['Weights'].Gradient` is the derivative `dL/dW` evaluated at `x`.
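To make the shape of that container concrete, here is a hypothetical toy layer (a dot product with a weight vector) that returns parameters in roughly the format described above. The name `ToyDotLayer`, the attribute name `'w'`, and the use of plain dicts for the inner entries are illustrative assumptions, not the actual code in `layers.py`:

```python
import numpy as np

class ToyDotLayer:
    """Toy layer computing y = w . x, used only to illustrate the container."""

    def __init__(self, n):
        self.w = np.zeros(n)  # learnable parameter, stored under attribute name 'w'

    def initialize_training_parameters(self):
        self.w = np.random.randn(self.w.size) * 0.1

    def forward(self, x):
        return float(self.w @ x)

    def get_training_parameters(self, x=None):
        entry = {"Name": "w"}  # which attribute holds the parameter
        if x is not None:
            entry["Gradient"] = np.asarray(x, dtype=float)  # d(w . x)/dw = x
        return {"Weights": entry}

layer = ToyDotLayer(3)
layer.initialize_training_parameters()
params = layer.get_training_parameters(np.array([1.0, 2.0, 3.0]))
print(params["Weights"]["Name"])      # -> 'w'
print(params["Weights"]["Gradient"])  # -> [1. 2. 3.]
```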
The `ann.py` file defines an abstract `ann` class, implements `ann_by_layers` as a subclass, and `mlp` as a further subclass of that. I viewed `ann` as a container for a directed graph of `layer` objects; in particular, the `ann.forward` method was intended to follow the edges of this graph and apply the `layer.forward` method at each node. An `ann` implementation has to specify:

- `forward` - the network-level forward propagation
- `initialize_training_parameters` - to call the corresponding method on every layer
- `get_training_parameters` - a method to collect training parameters from each layer into some container.
I only tried implementing this in the simple "sequential" case called `ann_by_layers` (a rough sketch follows below). In this case each layer has at most one input and one output connection, and the graph is a single connected component. The container for `get_training_parameters` can then simply be a flat list/array, since there is a natural ordering of the layers in the graph. Of course, on top of the abstract `ann` interface I also had to add methods for the gradients.
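As an assumption-laden sketch of that sequential case (the class and method bodies here are stand-ins inferred from the description, not copied from `ann.py`):

```python
class sequential_ann:
    """Illustrative stand-in for ann_by_layers: a chain of layers, each feeding the next."""

    def __init__(self, layers):
        self.layers = list(layers)  # the natural ordering of the layer graph

    def forward(self, x):
        for layer in self.layers:   # follow the single chain of edges
            x = layer.forward(x)
        return x

    def initialize_training_parameters(self):
        for layer in self.layers:
            layer.initialize_training_parameters()

    def get_training_parameters(self, x):
        # A flat list suffices because the layers are totally ordered; each layer
        # reports its parameters (and gradients) at its own input.
        params = []
        for layer in self.layers:
            params.append(layer.get_training_parameters(x))
            x = layer.forward(x)
        return params
```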
The `mlp` class is a convenience class for constructing an `ann_by_layers` consisting of `fc` then `relu` layers repeated for some given "depth".
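For illustration, here is a hypothetical helper that builds such an alternating stack. The `fc` and `relu` classes below are toy stand-ins with assumed constructor signatures, not the classes from `layers.py`:

```python
import numpy as np

class fc:
    """Toy fully connected layer with an assumed (n_in, n_out) constructor."""
    def __init__(self, n_in, n_out):
        self.w = np.random.randn(n_out, n_in) * 0.01
        self.b = np.zeros(n_out)
    def forward(self, x):
        return self.w @ x + self.b

class relu:
    """Toy elementwise ReLU layer."""
    def forward(self, x):
        return np.maximum(x, 0.0)

def build_mlp(layer_sizes):
    """Alternate fc and relu layers for the given sizes, e.g. [4, 8, 8, 1]."""
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        layers.append(fc(n_in, n_out))
        layers.append(relu())
    return layers

net = build_mlp([4, 8, 8, 1])
x = np.random.randn(4)
for layer in net:
    x = layer.forward(x)
print(x)  # single output of the toy network
```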
The tests `layers_tests.py` and `ann_tests.py` were passing last time I checked. In this case CI stands for "Check Intermittently".