Interesting Loss Functions

Triplet Loss

Each training sample is a triplet:

  • anchor
  • positive example
    • has the same identity as the anchor
  • negative example
    • has a different identity from the anchor

High-level goal:

  • pull the anchor and the positive example closer together, and
  • push the anchor and the negative example further apart

Distance is measured with, e.g., L2 (Euclidean) distance or another distance metric.
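A minimal sketch of the usual margin-based form of triplet loss, L = max(d(a, p) - d(a, n) + margin, 0), in plain Python with L2 distance. The margin default of 1.0 and the function names are illustrative assumptions. Some formulations (e.g. FaceNet) use squared L2 distance instead.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def l2_distance(x: Vector, y: Vector) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def triplet_loss(
    anchor: Vector,
    positive: Vector,
    negative: Vector,
    margin: float = 1.0,  # assumed hyperparameter, not from the notes
    distance: Callable[[Vector, Vector], float] = l2_distance,
) -> float:
    """Hinge-style triplet loss: max(d(a, p) - d(a, n) + margin, 0)."""
    d_pos = distance(anchor, positive)  # should become small
    d_neg = distance(anchor, negative)  # should become large
    # Zero once the negative is at least `margin` further away than the positive.
    return max(d_pos - d_neg + margin, 0.0)


# Easy negative: d(a, p) = 1, d(a, n) = 5, so the margin is already satisfied.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
# Hard negative: d(a, n) = 1.5 is within the margin, so the loss is positive.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5]))
```

Because of the hinge, triplets whose negative is already far enough away contribute zero loss, so the gradient signal comes only from triplets that still violate the margin.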


  • Why not just sample 1 positive and 1 negative?


Masked language modeling (MLM), used to pretrain BERT/RoBERTa:


  • Instead of predicting the masked token, why not predict the embedding vector representing the missing token?

MLM Embedding Loss #idea
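A hypothetical sketch of the MLM embedding loss idea: score the model's output vector at each masked position against the embedding of the true token, here with cosine distance. The function name `mlm_embedding_loss`, the choice of cosine distance, and taking targets from a fixed embedding table are all assumptions, not an established method.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(x: Vector, y: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def mlm_embedding_loss(
    predicted: Sequence[Vector],       # model output vector at each masked position
    target_ids: Sequence[int],         # true token id at each masked position
    embedding_table: Sequence[Vector], # hypothetical: token id -> target embedding
) -> float:
    """Hypothetical loss: mean cosine distance (1 - cos) between each
    predicted vector and the embedding of the true masked token."""
    if not predicted or len(predicted) != len(target_ids):
        raise ValueError("need exactly one target id per predicted vector")
    distances = [
        1.0 - cosine_similarity(p, embedding_table[t])
        for p, t in zip(predicted, target_ids)
    ]
    return sum(distances) / len(distances)


table = [[1.0, 0.0], [0.0, 1.0]]
# First prediction points exactly at token 0; second is orthogonal to it.
print(mlm_embedding_loss([[2.0, 0.0], [0.0, 1.0]], [0, 0], table))
```

If the embedding table were trained by this same loss, there would be a trivial solution (every embedding and prediction pointing in one direction gives zero loss), so the target embeddings would probably need to be frozen or the loss combined with another objective.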

Contrastive Losses

InfoNCE #todo
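A minimal sketch of the InfoNCE loss for a single query: the positive's score is treated as the correct class in a softmax over the positive and all negatives, L = -log(exp(s(q, k+)/τ) / Σ_i exp(s(q, k_i)/τ)). Unlike triplet loss, which compares the anchor against one negative at a time, this contrasts the positive against many negatives at once. Using cosine similarity as the score s is an assumption (it is the choice in SimCLR-style setups), and τ = 0.1 is an arbitrary default.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(x: Vector, y: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def info_nce_loss(
    query: Vector,
    positive: Vector,
    negatives: Sequence[Vector],
    temperature: float = 0.1,  # assumed default
) -> float:
    """-log of the softmax probability assigned to the positive key."""
    # Index 0 holds the positive's logit; the remaining entries are negatives.
    logits = [cosine_similarity(query, positive) / temperature]
    logits += [cosine_similarity(query, k) / temperature for k in negatives]
    # Numerically stable log-sum-exp: subtract the max logit before exp().
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# An easy (orthogonal) negative gives a near-zero loss;
# a negative identical to the positive gives log(2).
print(info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
print(info_nce_loss([1.0, 0.0], [1.0, 0.0], [[1.0, 0.0]]))
```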

SimCLR #todo

MoCo #todo