Gaze360: Physically Unconstrained Gaze Estimation in the Wild (ICCV 2019)

4 min read · Nov 11, 2021

Introduction

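The paper introduces the Gaze360 dataset: 238 subjects recorded in indoor and outdoor settings, annotated with 3D gaze. Its data distribution is wider than that of typical gaze datasets (wide range of head poses and distances), and the paper also reports cross-dataset evaluation and cross-dataset domain adaptation.
Temporal information is added to the model, which then predicts gaze directly.
A pinball regression loss is used for quantile regression, which addresses prediction uncertainty.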

Full stands for full face images, Eyes denotes crops of eye regions and N/A means that the dataset was not available for use. Asterisks indicate datasets containing partially occluded face images.

Method

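Training with the pinball loss gives the model an uncertainty value that acts as a confidence output, which addresses the drop in accuracy in cases such as occluded eyes. A regression problem, unlike a classification task, has no softmax to provide a confidence, so the model is instead trained to output one that affects the loss computation. As a result, when the eyes are occluded the model outputs a higher uncertainty value.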

The paper's main contribution is the dataset, so the model architecture is relatively simple: a bidirectional LSTM that takes 7 frames as input.
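To make the 7-frame input concrete, here is a minimal sketch of how frame windows could be assembled for a clip. It assumes the prediction target is the central frame of each window and that indices are clamped at clip boundaries; both are assumptions of this sketch, and `temporal_windows` is a hypothetical helper, not a function from the paper's code.

```python
def temporal_windows(num_frames, window=7):
    """For each target frame, return the frame indices fed to the model."""
    half = window // 2
    windows = []
    for t in range(num_frames):
        # Assumption: clamp to the first/last frame at clip boundaries.
        idx = [min(max(t + k, 0), num_frames - 1) for k in range(-half, half + 1)]
        windows.append(idx)
    return windows


wins = temporal_windows(10)
print(wins[5])  # [2, 3, 4, 5, 6, 7, 8]
print(wins[0])  # [0, 0, 0, 0, 1, 2, 3]
```

Each window would then be passed frame by frame through the backbone and the bidirectional LSTM.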

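The output is f(I) = (θ, Φ, σ). The first two are the yaw and pitch in spherical coordinates, and σ is an offset from the expected value that defines a quantile interval: θ + σ (or Φ + σ) should sit at the 90% quantile of the distribution and θ - σ (or Φ - σ) at the 10% quantile.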
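The sketch below shows how such an output might be interpreted: the two angles become a unit 3D gaze vector, and σ gives the 10% to 90% band around each angle. The axis convention (x right, y up, negative z straight ahead) is an assumption of this sketch, and `gaze_to_vector` and `quantile_band` are hypothetical names.

```python
import math


def gaze_to_vector(yaw, pitch):
    """Convert yaw/pitch in radians to a unit 3D direction.

    Assumed convention: yaw = pitch = 0 points along -z.
    """
    x = math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def quantile_band(angle, sigma):
    """Return the (10% quantile, 90% quantile) estimates for one angle."""
    return (angle - sigma, angle + sigma)


yaw, pitch, sigma = 0.3, -0.1, 0.05
print(gaze_to_vector(yaw, pitch))
print(quantile_band(yaw, sigma))
```

A large σ means a wide band, which is how the model signals low confidence.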

Both θ and Φ go through the pinball loss, evaluated at τ = 0.1 and τ = 0.9, and the resulting terms are summed.
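For reference, the pinball loss for θ can be written as follows. This is a reconstruction in the notation used above, not a copy of the paper's equation; θ_gt (ground-truth yaw) is a symbol introduced here, and the same form applies to Φ.

```latex
\hat{q}_{\tau} =
\begin{cases}
\theta_{gt} - (\theta - \sigma), & \tau = 0.1 \\
\theta_{gt} - (\theta + \sigma), & \tau = 0.9
\end{cases}
\qquad
L_{\tau}(\theta, \sigma, \theta_{gt}) = \max\bigl(\tau\,\hat{q}_{\tau},\; -(1 - \tau)\,\hat{q}_{\tau}\bigr)
```

With τ = 0.1 the loss penalizes a lower bound θ - σ that lies above the ground truth far more than one that lies below it, and τ = 0.9 does the mirror image for the upper bound θ + σ.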

In the loss code the inputs are vectors, so θ and Φ are both included in the computation.
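A minimal pure-Python sketch of such a loss is given here; it is not the authors' implementation. It assumes `pred` and `target` are `[yaw, pitch]` lists and that one σ is shared by both angles. The function name and the averaging over angles are choices made for this sketch.

```python
def pinball_loss(pred, target, sigma, q_low=0.1):
    """Pinball loss for the q_low and (1 - q_low) quantiles, averaged over angles."""
    q_high = 1.0 - q_low
    total_low = 0.0
    total_high = 0.0
    for p, t in zip(pred, target):
        d_low = t - (p - sigma)   # error of the lower-quantile estimate
        d_high = t - (p + sigma)  # error of the upper-quantile estimate
        total_low += max(q_low * d_low, (q_low - 1.0) * d_low)
        total_high += max(q_high * d_high, (q_high - 1.0) * d_high)
    n = len(pred)
    return total_low / n + total_high / n


# A confident, accurate prediction versus the same confidence with a yaw error.
print(pinball_loss([0.1, -0.2], [0.1, -0.2], sigma=0.05))
print(pinball_loss([0.5, -0.2], [0.1, -0.2], sigma=0.05))
```

Because the loop runs over every component of the vector, yaw and pitch are handled by the same code path.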

Because the model is also trained across domains, a discriminator is trained for binary classification, which contributes a loss L_D.

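In addition, a pinball loss L_S is computed between predictions before and after a horizontal flip, minimizing the angular difference between the first input and the flipped second input.
The overall loss combines these terms with weights α = 60 and β = 3.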
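The flip-consistency term L_S can be sketched as follows. The sketch assumes that a horizontal flip negates yaw and leaves pitch unchanged, and that the mirrored prediction for the flipped image serves as the target for the original image's prediction. All function names are hypothetical, and how the term is weighted in the overall loss is not shown.

```python
def pinball(pred, target, sigma, tau):
    """Pinball loss for one angle at quantile tau."""
    # Lower quantile uses pred - sigma, upper quantile uses pred + sigma.
    bound = pred - sigma if tau <= 0.5 else pred + sigma
    d = target - bound
    return max(tau * d, (tau - 1.0) * d)


def mirror(pred):
    """Map a (yaw, pitch, sigma) prediction on a flipped image back to the original frame."""
    yaw, pitch, sigma = pred
    return (-yaw, pitch, sigma)  # assumption: flipping negates yaw only


def symmetry_loss(pred_original, pred_flipped):
    """Pinball loss between the original prediction and the mirrored flipped one."""
    yaw, pitch, sigma = pred_original
    target_yaw, target_pitch, _ = mirror(pred_flipped)
    loss = 0.0
    for tau in (0.1, 0.9):
        loss += pinball(yaw, target_yaw, sigma, tau)
        loss += pinball(pitch, target_pitch, sigma, tau)
    return loss / 2.0  # average over the two angles


consistent = symmetry_loss((0.3, 0.1, 0.0), (-0.3, 0.1, 0.0))
inconsistent = symmetry_loss((0.3, 0.1, 0.0), (0.3, 0.1, 0.0))
print(consistent, inconsistent)
```

This term needs no gaze labels, since the two predictions supervise each other.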

Experiment

Model evaluation

This section compares other models and several experimental settings. Static means predicting directly with ResNet18, TRN is the Temporal Relation Network, and LSTM is the architecture described above.

Cross-dataset evaluation

DA indicates that L_D was used for cross-domain training.

Written by Balin

NTUST CSIE
