My code for the Deep Learning class exercises. There should be nothing proprietary in here. For the earlier exercises, I tried to create parallel implementations in Octave and NumPy. The later exercises rely on class-supplied helper code, which requires Matlab (for now).
- The L-BFGS Matlab code is licensed by Stanford under a Creative Commons Attribution-NonCommercial license. Please read the details on the UFLDL wiki.
- The MNIST digit data comes from http://yann.lecun.com/exdb/mnist/.
“Since J(W,b) is a non-convex function, gradient descent is susceptible to local optima; however, in practice gradient descent usually works fairly well.” - UFLDL/Backpropagation
Why? Is it almost convex? Are the local optima all of a similar quality? Are any of the variations (squared error / squared error + weight decay / squared error + weight decay + sparsity constraints) convex?
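One way to probe the convexity question numerically is to look for a violation of the midpoint inequality J((a+b)/2) <= (J(a)+J(b))/2, which every convex function must satisfy. The sketch below is only an illustration under stated assumptions, not the class exercise code: it uses a toy network with one input, two sigmoid hidden units, a linear output, and no biases, and the names `cost`, `jensen_gap`, `a`, `b`, and `mid` are made up for this example. Swapping the two hidden units gives a second weight setting that computes the same function, and the midpoint of the two settings is the all-zero network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w1, w2, data, decay=0.0):
    """Mean squared-error cost plus optional weight decay.

    Toy network (an assumption for this sketch): one input, two sigmoid
    hidden units with input weights w1, linear output with weights w2,
    and no bias terms.
    """
    total = 0.0
    for x, y in data:
        h = sum(v * sigmoid(u * x) for u, v in zip(w1, w2))
        total += 0.5 * (h - y) ** 2
    penalty = 0.5 * decay * (sum(u * u for u in w1) + sum(v * v for v in w2))
    return total / len(data) + penalty


# Two weight settings that differ only by swapping the hidden units,
# so they compute exactly the same function.
a = ([4.0, -4.0], [1.0, -1.0])
b = ([-4.0, 4.0], [-1.0, 1.0])
# Their midpoint in weight space: every weight is zero.
mid = ([0.0, 0.0], [0.0, 0.0])

# Targets generated by the network at setting a, so the squared error
# at a and at b is zero.
xs = [-2.0, -1.0, 1.0, 2.0]
data = [(x, sigmoid(4.0 * x) - sigmoid(-4.0 * x)) for x in xs]


def jensen_gap(decay=0.0):
    """J(mid) - (J(a) + J(b)) / 2; a positive value rules out convexity."""
    ends = 0.5 * (cost(a[0], a[1], data, decay) + cost(b[0], b[1], data, decay))
    return cost(mid[0], mid[1], data, decay) - ends


if __name__ == "__main__":
    print("gap, squared error only:        ", jensen_gap(0.0))
    print("gap, with weight decay of 0.001:", jensen_gap(0.001))
```

A positive gap shows that this toy cost is not convex, both for plain squared error and with a small weight-decay term. It says nothing about the sparsity-constrained variant, about larger decay values, or about whether the local optima that gradient descent finds in practice are of similar quality.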