Prototypical networks for few-shot learning

Posted on 17/04/2019, in Paper.
  • Overview: This paper extends the [[Matching Networks for One Shot Learning]] paper by a) applying inference rules that fit the few-shot and zero-shot settings more naturally; b) theoretically explaining the advantage of Euclidean distance over cosine distance in the one-shot framework; c) empirically simplifying the system design (dropping the Full Context Embedding) while achieving a similar level of performance.
  • Prototype: Each prototype is the mean vector of the embedded support points belonging to its class. Given a distance function, the network produces a distribution over classes for a query point x via a softmax over its distances to the prototypes in the embedding space.
  • Regular Bregman divergences: This is a family of distance functions that includes squared Euclidean distance and Mahalanobis distance. The authors prove mathematically that with a regular Bregman divergence, the network is equivalent to performing mixture density estimation with an exponential family distribution; this may explain the advantage of Euclidean distance over cosine distance. Furthermore, for squared Euclidean distance the classifier is linear w.r.t. the embedding, so the embedding itself takes care of the non-linear part.
  • Experiment result: The authors found it better to meta-train with a larger N_C (more classes per episode) while keeping N_S matched to test time. I suspect this is simply because the model is given more data per episode from which to learn a general embedding.
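The prototype computation and softmax classification above, plus the linearity claim, can be sketched as follows (a minimal NumPy sketch; the random embeddings stand in for the learned embedding function, and all shapes/names are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)

# Embedded support points: class label -> (N_S, D) array of embeddings.
support = {k: rng.normal(size=(5, 4)) for k in range(3)}

# Each prototype c_k is the mean of its class's embedded support points.
prototypes = np.stack([v.mean(axis=0) for v in support.values()])  # (K, D)

# Embedded query point.
x = rng.normal(size=4)

# Distribution over classes: softmax over negative squared Euclidean distances.
d2 = ((prototypes - x) ** 2).sum(axis=1)  # (K,)
p = softmax(-d2)

# Linearity check: -||x - c_k||^2 = 2 c_k^T x - ||c_k||^2 - ||x||^2, and the
# -||x||^2 term is constant across classes so it cancels in the softmax.
# Hence the classifier is linear in the embedding x.
logits = 2 * prototypes @ x - (prototypes ** 2).sum(axis=1)
assert np.allclose(p, softmax(logits))
```

The final assertion is exactly the point made about squared Euclidean distance: once the embedding is fixed, classification reduces to a linear model over the embedded query.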

This paper points to two interesting meta-learning papers I want to check out later:

  • Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. International Conference on Learning Representations, 2017.
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. International Conference on Machine Learning, 2017.