The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks


Tags: paper ml
State: None
Source: https://arxiv.org/abs/1803.03635
Code: None
  • Pruning can reduce a trained network's parameter count by over 90% without compromising accuracy
  • Standard pruning naturally uncovers sub-networks whose initialization made them capable of training effectively
  • They find "winning tickets" consistently for MNSIT and CIFAR10

Formally:

  • $f(x, \theta)$ is some dense FF network with initial parameters $\theta_0 \sim D_\theta$ for some distribution of parameters $D_\theta$
  • $f$ reaches validation loss $l$ and test accuracy $a$ at some iteration $j$ when trained with e.g. SGD
  • Consider training $f(x, m \odot \theta)$ for some fixed mask $m \in \{0, 1\}^{|\theta|}$
    • This will reach some validation loss $l'$ at iteration $j'$ with some test accuracy $a'$
  • LTH states: $\exists m$ where $j' \leq j$, $a' \geq a$, and $\Vert m \Vert_0 \ll |\theta|$, i.e. the sparse sub-network trains no slower and matches the original's accuracy (masking is sketched in code below)
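
For concreteness, a minimal sketch of the masked sub-network $f(x, m \odot \theta)$, assuming a PyTorch model; the helpers `make_masks` and `apply_masks` are illustrative names, not from the paper:

```python
import torch
import torch.nn as nn

def make_masks(model: nn.Module) -> dict[str, torch.Tensor]:
    """One binary mask per weight tensor; all-ones means nothing is pruned yet."""
    return {name: torch.ones_like(p)
            for name, p in model.named_parameters() if "weight" in name}

def apply_masks(model: nn.Module, masks: dict[str, torch.Tensor]) -> None:
    """Compute m ⊙ θ in place by zeroing the pruned weights."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

# Usage: a small dense FF network and its (initially trivial) sub-network
net = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = make_masks(net)
apply_masks(net, masks)  # with all-ones masks this is a no-op
```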

To find such an $m$ they propose an algorithm:

  1. Randomly initialize the network $f(x, \theta_0)$ with $\theta_0 \sim D_\theta$
  2. Train the network for some fixed number of iterations $j$, arriving at parameters $\theta_j$
  3. Prune $p\%$ of the parameters in $\theta_j$ (by smallest weight magnitude) to construct the mask $m$
  4. Reset the remaining parameters to their values in $\theta_0$ and retrain the resulting sub-network $f(x, m \odot \theta_0)$

Repeat this procedure iteratively over $n$ rounds, pruning $p^{1/n}\%$ of the surviving weights each round and combining the masks across rounds. They show empirically that this iterative pruning outperforms one-shot pruning (running the procedure only once); a sketch of the iterative loop follows below.
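
A minimal sketch of one way to implement this loop, assuming a PyTorch model, magnitude-based pruning of the lowest-magnitude surviving weights, and a `train` callable that holds pruned weights at zero during training (e.g. by re-applying the masks after each optimizer step); `find_winning_ticket`, `rounds`, and `prune_rate` are illustrative names, not from the paper:

```python
import copy
import torch
import torch.nn as nn

def find_winning_ticket(model: nn.Module, train, rounds: int = 5,
                        prune_rate: float = 0.2) -> dict[str, torch.Tensor]:
    """Iterative magnitude pruning with reset to the original initialization θ0.

    `train` is assumed to train `model` in place for a fixed number of
    iterations while keeping already-pruned weights at zero.
    Returns the binary masks accumulated over all rounds.
    """
    theta0 = copy.deepcopy(model.state_dict())               # remember θ0
    masks = {n: torch.ones_like(p)
             for n, p in model.named_parameters() if "weight" in n}

    for _ in range(rounds):
        train(model)                                         # step 2: train to θ_j
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name not in masks:
                    continue
                alive = p[masks[name].bool()].abs()          # surviving weights
                k = int(prune_rate * alive.numel())          # prune p% of them
                if k == 0:
                    continue
                threshold = alive.kthvalue(k).values         # k-th smallest magnitude
                masks[name][p.abs() <= threshold] = 0.0      # step 3: extend the mask
        model.load_state_dict(theta0)                        # step 4: reset to θ0
        with torch.no_grad():                                # keep pruned weights at zero
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])
    return masks
```

The returned masks together with the stored $\theta_0$ define the winning ticket, which is then trained in isolation to compare its $j'$ and $a'$ against the dense baseline.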

They present experimental results for fully-connected networks on MNIST and convolutional networks on CIFAR10.