Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (NeurIPS 2020)

4 min read · Sep 2, 2021


Introduction

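Many SSL methods need a large batch size or a memory bank to supply negative examples. The method proposed in this paper does not rely on negative examples for contrastive learning, and the authors argue that this improves stability. Their experiments also show that using only positive examples does not lead to collapse, where the network outputs the same representation for every image. The contributions are as follows.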

Reaches SOTA without using negative pairs.
SOTA on semi-supervised and transfer benchmarks.
Compared with other CL methods (such as SimCLR), BYOL is less affected by changes in batch size and augmentation.

Method

θ and ξ denote the parameters of the online and target networks. ξ is obtained as a moving average of θ, where τ is a decay rate between 0 and 1. This is how the method guards against the collapse problem.
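To make the moving average concrete, here is a minimal sketch, assuming the parameters are flattened into plain Python lists. The function name `ema_update` and the toy values are mine, not from the paper.

```python
def ema_update(target, online, tau):
    """Return new target parameters: tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]

xi = [0.0, 0.0]      # target parameters (toy values)
theta = [1.0, -2.0]  # online parameters (toy values, held fixed here)
for _ in range(3):
    xi = ema_update(xi, theta, tau=0.9)
```

With τ close to 1 the target moves slowly toward the online network, and with τ = 0 it would simply copy θ at every step.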

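The pipeline is simple and uses only positive examples. One image yields two different augmentations, v and v', which pass through the networks to give two representations, and a projection head then maps these to z and z'. The projection is not strictly required by the architecture, but including it usually gives better performance. The online branch additionally trains a prediction head, and an l_2 loss pulls its normalized output as close as possible to the normalized z'.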

The loss is then computed a second time with v and v' swapped, and the two losses are added together.
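The loss and its swapped counterpart can be sketched as below. This is an illustration under assumptions, not the authors' code: vectors are plain lists of floats, they are assumed to be nonzero, and the function names are mine. In the paper the target outputs are treated as constants (no gradient flows through them), which plain floats do not model.

```python
import math

def l2_normalize(v):
    """Scale a nonzero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def byol_loss(prediction, target):
    """Squared L2 distance between the normalized vectors.

    For unit vectors this equals 2 - 2 * cosine similarity.
    """
    p = l2_normalize(prediction)
    z = l2_normalize(target)
    return sum((a - b) ** 2 for a, b in zip(p, z))

def symmetric_loss(q_v, z_v_prime, q_v_prime, z_v):
    """Loss for (v -> v') plus the loss with the two views swapped."""
    return byol_loss(q_v, z_v_prime) + byol_loss(q_v_prime, z_v)
```

Because both vectors are normalized first, only their direction matters: `byol_loss([1.0, 0.0], [2.0, 0.0])` is zero even though the lengths differ.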

The parameter update is therefore: train θ on this loss, then update ξ as a moving average of θ.
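Written as equations, one training step looks roughly like this. The symbol η (learning rate) and the generic optimizer notation are not defined in this post and are introduced here for illustration only. L_{θ,ξ} stands for the summed loss over both view orderings.

```latex
\theta \leftarrow \mathrm{optimizer}\big(\theta,\ \nabla_{\theta}\mathcal{L}_{\theta,\xi},\ \eta\big),
\qquad
\xi \leftarrow \tau\,\xi + (1-\tau)\,\theta
```

Only θ receives gradients. ξ never takes a gradient step and changes solely through the moving average.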

After training, only f_θ is kept as the encoder for later use, i.e. the ResNet in the architecture. g_θ and q_θ are in practice two MLPs with the same architecture.

Experiment

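Linear evaluation on ImageNet is the must-have comparison. Roughly, the parameters of the trained network are frozen and the same kind of linear classifier is trained on top of each method's features, so the methods can be compared.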
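The protocol can be illustrated with a toy sketch. Everything here is a made-up stand-in: `frozen_encoder` is a hypothetical fixed function in place of the trained f_θ, the four data points are invented for illustration, and a binary logistic regression replaces the 1000-way ImageNet classifier.

```python
import math

def frozen_encoder(x):
    # Hypothetical stand-in for the trained f_theta; it is never updated.
    return [x[0] + x[1], x[0] - x[1]]

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit logistic regression (weights w, bias b) on fixed features with SGD."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            logit = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-logit))
            grad = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b

def predict(w, b, f):
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

inputs = [[0.0, 1.0], [1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]  # toy data
labels = [0, 0, 1, 1]
features = [frozen_encoder(x) for x in inputs]  # computed once, encoder stays fixed
w, b = train_linear_probe(features, labels)
accuracy = sum(predict(w, b, f) == y for f, y in zip(features, labels)) / len(labels)
```

The point of the design is that only `w` and `b` are learned, so the resulting accuracy reflects how linearly separable the frozen features already are.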

Next comes semi-supervised learning, the most important evaluation for SSL.

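Results for transfer to other datasets are another important experiment, since they show whether the learned representation is general. Entries marked "repro" are ones the authors retrained themselves, because their experimental setup differs somewhat.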

This corresponds to the third contribution: methods that use negative examples lose performance more readily when the batch size is reduced or when fewer types of augmentation are used.

There are many more experiments. If you are interested, see the original paper.

Written by Balin

NTUST CSIE
