Tile2Vec: Unsupervised representation learning for spatially distributed data

Posted on 24/10/2019, in Paper.
  • Overview: This paper proposes an unsupervised learning model that learns semantically meaningful representations of satellite image tiles.
  • Unsupervised Triplet Loss: The triplet loss is defined as: \begin{equation} L(t_a, t_n, t_d) = \left( \| f_\theta(t_a) - f_\theta(t_n) \|_2 - \| f_\theta(t_a) - f_\theta(t_d) \|_2 + m \right)_+ \end{equation} where $t_a$ is an anchor tile, $t_n$ a neighbouring tile, $t_d$ a distant tile, $m$ the margin, and $(x)_+ = \max(0, x)$. The idea is that a tile should be similar to its neighbours and (on average) far from a tile sampled randomly over the globe. In practice, the L2 norm of the embeddings also has to be penalized; otherwise the network can satisfy the margin simply by scaling all embeddings up.
  • Downstream tasks: Four downstream tasks are carried out: land cover classification; visual analogies of US cities; poverty prediction in Uganda; and predicting a country health index. The proposed embedding outperforms the PCA, k-means and autoencoder baselines.
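
To make the triplet loss above concrete, here is a minimal Python sketch that evaluates it on toy embeddings. It is not the authors' implementation: the vectors stand in for the outputs of $f_\theta$, and the function names, the penalty weight `lam`, and the exact form of the norm penalty (a weighted sum of the three embedding norms) are assumptions made for illustration.

```python
import math


def l2_norm(v):
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in v))


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return l2_norm([a - b for a, b in zip(u, v)])


def triplet_loss(z_a, z_n, z_d, margin=1.0):
    """Hinge on (anchor-neighbour distance) - (anchor-distant distance) + margin."""
    return max(0.0, l2_distance(z_a, z_n) - l2_distance(z_a, z_d) + margin)


def regularized_loss(z_a, z_n, z_d, margin=1.0, lam=0.01):
    """Triplet loss plus an assumed penalty: lam * (sum of the embedding L2 norms)."""
    penalty = l2_norm(z_a) + l2_norm(z_n) + l2_norm(z_d)
    return triplet_loss(z_a, z_n, z_d, margin) + lam * penalty


# Toy embeddings (hypothetical values, standing in for f_theta outputs).
z_a = [0.0, 0.0]  # anchor
z_n = [3.0, 4.0]  # neighbour, distance 5 from the anchor
z_d = [6.0, 8.0]  # distant tile, distance 10 from the anchor

print(triplet_loss(z_a, z_n, z_d, margin=1.0))   # 5 - 10 + 1 < 0, so the hinge clips to 0
print(triplet_loss(z_a, z_n, z_d, margin=50.0))  # 5 - 10 + 50 = 45, the hinge stays active
print(regularized_loss(z_a, z_n, z_d, margin=1.0, lam=0.1))
```

With a small margin this triplet is already satisfied and contributes zero loss, while a very large margin keeps the hinge active even though the neighbour is much closer than the distant tile.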

Review questions:

  • Q1: Are there any typos in the equations (there are not many equations in this paper)?
  • Q2: What would be the consequence if the margin $m$ in the triplet loss (1) were far too large?
  • Q3: Why do we want to penalize the L2 norm of the embeddings in equation (2)?


  • This is a research topic I would like to follow. There are some other priors of geo-tiles that haven't been explored:
    • Sparsity
    • Temporal sparsity