Devlin et al. (2018): BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Vinyals et al. (2016): Matching Networks for One Shot Learning
Snell et al. (2017): Prototypical Networks for Few-Shot Learning