a tensorflow implementation of "federated learning: strategies for improving communication efficiency".
the goal is to learn over distributed devices (eg smartphones), where each device holds data that may be (a) non iid, (b) imbalanced, and (c) sparse.
svrg is core to federated learning. when compared to vanilla sgd, it allows for faster convergence by reducing variance, introduced through small, noisy minibatches.