Believe It or Not, We Know What You Are Looking at! (ACCV2018)

4 min read · Sep 17, 2021

Introduction

The contributions are:

A two-stage framework.
Supervised training of multi-scale gaze direction attention prediction.
The Daily Life Gaze dataset (DL Gaze).

Method

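As shown in the figure below, the model is a two-stage network trained end to end. The first stage encodes the head image and the head coordinates to estimate a coarse gaze direction, which is turned into multi-scale gaze direction fields. These fields are then concatenated with the original image, and an encoder-decoder finally outputs the gaze heatmap.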

Gaze direction pathway

The gaze direction is learned with direct supervision.
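The post does not say which loss supervises the direction. As an assumption, one natural choice is a cosine loss: one minus the cosine similarity between the predicted direction and a ground-truth direction taken from the head position to the annotated gaze point. The sketch below is plain Python; the function name and the toy coordinates are hypothetical.

```python
import math


def cosine_direction_loss(pred, target):
    """Hypothetical direction loss: 1 - cos(angle between pred and target).

    pred, target: 2D vectors (x, y). Returns 0 for identical directions,
    1 for perpendicular ones and 2 for opposite ones.
    """
    px, py = pred
    tx, ty = target
    norm = math.hypot(px, py) * math.hypot(tx, ty)
    if norm == 0.0:
        raise ValueError("direction vectors must be non-zero")
    return 1.0 - (px * tx + py * ty) / norm


# Assumed ground truth: vector from the head position to the annotated gaze point (pixels).
head, gaze_point = (30, 40), (80, 40)
target = (gaze_point[0] - head[0], gaze_point[1] - head[1])  # (50, 0)
loss = cosine_direction_loss((1.0, 0.0), target)  # same direction, so zero loss
```

A cosine loss only penalizes the angle, not the vector length, which fits a pathway whose output is used purely as a direction.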

Multi-scale gaze direction field

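From the direction predicted by the gaze direction pathway, three maps at different scales are generated.
As shown in the figure below, the smaller the angle θ (between the predicted gaze direction and the line from the head to a given pixel), the higher the probability, and vice versa. The gaze direction field is therefore a probability map with the same size as the scene image.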

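The authors compute this with a simple geometric inner-product projection, where G denotes L_HP, the line from the head H to the point P.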
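The equation itself is not reproduced in the post. A plausible reconstruction, assuming H is the head position, P a pixel of the scene image, G = P - H (the L_HP vector) and \hat{d} the predicted gaze direction: the cosine of θ comes from the inner product, it is clipped at zero so pixels behind the head get zero probability, and the exponent γ (the aperture parameter described next) sharpens the cone.

```latex
\cos\theta = \frac{\langle G, \hat{d} \rangle}{\lVert G \rVert \, \lVert \hat{d} \rVert},
\qquad
\mathrm{Sim}(P) = \max\left(\cos\theta,\ 0\right),
\qquad
F_{\gamma}(P) = \mathrm{Sim}(P)^{\gamma}
```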

Here, "multi-scale" simply refers to different values of γ, i.e. different aperture sizes; the experiments use γ = 1, 3, 5.
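A minimal plain-Python sketch of the three fields, under the assumed form max(cos θ, 0)^γ. The function names, the toy 5 × 5 grid, and leaving the head pixel at 0 are illustrative choices, not the paper's code.

```python
import math


def gaze_direction_field(head, direction, width, height, gamma):
    """Probability map the size of the scene image (assumed form).

    Each pixel P gets max(cos(theta), 0) ** gamma, where theta is the angle
    between the predicted gaze direction and G = P - H (the L_HP vector).
    Returns a list of rows indexed as field[y][x].
    """
    hx, hy = head
    dx, dy = direction
    d_norm = math.hypot(dx, dy)
    field = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            gx, gy = x - hx, y - hy  # G = L_HP, from head H to pixel P
            g_norm = math.hypot(gx, gy)
            if g_norm == 0.0 or d_norm == 0.0:
                # Angle is undefined at the head pixel; left at 0 (an arbitrary choice).
                continue
            cos_theta = (gx * dx + gy * dy) / (g_norm * d_norm)
            field[y][x] = max(cos_theta, 0.0) ** gamma
    return field


def multi_scale_fields(head, direction, width, height, gammas=(1, 3, 5)):
    """One field per gamma; the post reports gamma = 1, 3, 5."""
    return [gaze_direction_field(head, direction, width, height, g) for g in gammas]


# Toy example: head at (2, 2) looking along +x on a 5 x 5 image.
fields = multi_scale_fields(head=(2, 2), direction=(1, 0), width=5, height=5)
```

Because cos θ ≤ 1, raising it to a larger γ shrinks every off-axis value, so γ = 5 gives the narrowest cone and γ = 1 the widest, while pixels exactly along the gaze direction stay at 1 at every scale.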

Heatmap pathway

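The three maps output by the multi-scale gaze direction field are concatenated with the original image and passed through the heatmap pathway. A sigmoid at the output produces probabilities between 0 and 1, on which a binary cross-entropy (BCE) loss is computed; N is the heatmap size, 56 × 56.

For the overall loss, λ is set to 0.5.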
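A sketch of the heatmap loss and the combined objective in plain Python. The sigmoid and the per-pixel BCE averaged over the N heatmap pixels follow the description above; writing the total as L = L_h + λ·L_d (heatmap loss plus λ times a direction loss) is an assumption, since the post gives only λ = 0.5. All function names are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def heatmap_bce_loss(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over all N heatmap pixels.

    logits, targets: flat lists of equal length (N = 56 * 56 in the post);
    targets are in [0, 1].
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(logits)


def total_loss(heatmap_loss, direction_loss, lam=0.5):
    """ASSUMED combination L = L_h + lam * L_d; the post only gives lam = 0.5."""
    return heatmap_loss + lam * direction_loss


N = 56 * 56
targets = [0.0] * N
targets[N // 2] = 1.0  # toy one-hot ground-truth heatmap
l_h = heatmap_bce_loss([0.0] * N, targets)  # every pixel predicts 0.5, so the mean is ln 2
l = total_loss(l_h, direction_loss=0.2)
```

Applying the sigmoid inside the loss (rather than storing probabilities) keeps the clipping in one place; deep learning frameworks usually offer a fused "BCE with logits" loss for the same reason.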

Experiment

GazeFollow dataset

An image-based dataset: each sample is a single image.

Daily Life Gaze following dataset (DL Gaze)

A video-based dataset.

Ablation study

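original image: the original image is fed directly into the heatmap pathway, without concatenating the gaze direction fields.
original image + ROI head: ROI head means the head features are extracted from the heatmap pathway and fed into the gaze direction pathway for training.
w/o mid-layer supervision: the supervision on the gaze direction is removed, and only a single scale (one γ) is used.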


Written by Balin

NTUST CSIE
