Tags: paper ml

State: None

Source: https://arxiv.org/abs/1803.03635

Code: None

- Pruning can reduce a trained network's parameter count by over 90% without compromising accuracy
- Standard pruning naturally uncovers sub-networks whose initialization made them capable of training effectively
- They find "winning tickets" consistently for MNIST and CIFAR10

Formally:

- $$f(x, \theta)$$ is some dense feed-forward network with initialization $$\theta_0 \sim D_\theta$$ for some distribution of parameters $$D_\theta$$
- $$f$$ reaches validation loss $$l$$ and test accuracy $$a$$ at some iteration $$j$$ when trained with e.g. SGD
- Consider training $$f(x, m \odot \theta)$$ for some fixed mask $$m \in \{0, 1\}^{|\theta|}$$
- This reaches some validation loss $$l'$$ and test accuracy $$a'$$ at iteration $$j'$$

- LTH states: $$\exists m$$ such that $$j' \leq j$$ (commensurate training time), $$a' \geq a$$ (commensurate accuracy), and $$\Vert m \Vert_0 \ll |\theta|$$ (fewer parameters)
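
A minimal sketch of what evaluating $$f(x, m \odot \theta)$$ means in practice, using a single hypothetical dense layer in NumPy (the shapes and the ~90% pruning rate are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(784, 300))                       # one layer's weights, theta_0 ~ D_theta
m = (rng.random(theta.shape) > 0.9).astype(theta.dtype)   # fixed mask m in {0,1}^|theta|, ~90% zeroed
x = rng.normal(size=(1, 784))                             # a single input

h = x @ (m * theta)                                       # forward pass through the masked weights m ⊙ theta
```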

To find such an $$m$$ they propose the following algorithm:

- Randomly initialize $f(x, \theta_0)$
- Train the network for $$j$$ iterations
- Prune $$p\%$$ of the weights (those with the smallest magnitudes), constructing the mask $$m$$
- Reset the surviving parameters to their values in $$\theta_0$$ and retrain

Repeat this procedure iteratively over $$n$$ rounds, composing the masks from each round (each round prunes $$p^{1/n}\%$$ of the weights that survived the previous round). They show empirically that this iterative pruning outperforms one-shot pruning.
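
A rough PyTorch sketch of this iterative magnitude-pruning loop, assuming a hypothetical `train(model, masks, steps)` helper that keeps masked weights at zero during optimization; the names, hyperparameters, and per-layer pruning are my own framing, not the paper's code:

```python
import copy
import torch

def find_winning_ticket(model, train, prune_frac=0.2, rounds=5, steps=50_000):
    theta0 = copy.deepcopy(model.state_dict())                 # remember the initialization θ0
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}

    for _ in range(rounds):
        train(model, masks, steps)                             # train f(x, m ⊙ θ) for j iterations
        for name, p in model.named_parameters():
            if "weight" not in name:                           # prune weights only, keep biases
                continue
            alive = p.data[masks[name].bool()].abs()
            k = int(prune_frac * alive.numel())                # drop the smallest p% of survivors
            if k == 0:
                continue
            threshold = alive.kthvalue(k).values
            masks[name] *= (p.data.abs() > threshold).float()  # compose with the previous mask
        model.load_state_dict(theta0)                          # reset surviving weights to θ0
    return masks
```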

They have experimental results on MNIST and CIFAR10