A Vector-based Representation to Enhance Head Pose Estimation (WACV2021)

Dec 13, 2021

Introduction

The three vectors are the left (red), down (green) and front (blue) vectors.

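Most head pose estimation datasets, as in the figure above, represent the pose with Euler angles or quaternions, which tends to make the values discontinuous at certain angles. The authors also argue that, for views like those in the figure above, the usual Mean Absolute Error (MAE) does not measure the error well: the ground-truth Euler angles of the first and second images differ greatly, yet the two images show almost the same pose. Based on these two points, the authors make the following two contributions: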

Predicting three vectors based on the rotation matrix and constraining them to be mutually orthogonal
Proposing a new evaluation metric, the Mean Absolute Error of Vectors (MAEV)
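To make the motivation concrete, here is a minimal Python sketch of a simple wrap-around case (not the images from the figure): two yaw angles that are numerically far apart but describe almost the same rotation. It assumes MAEV is the mean angle between corresponding predicted and ground-truth vectors, which is a reading of the metric's name rather than a formula quoted here. The helper names (yaw_matrix, angle_deg, maev) are hypothetical.

```python
import math

def yaw_matrix(deg):
    """Rotation by `deg` degrees about the z axis (assumed convention).
    The columns of the matrix are the three rotated axis vectors."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def column(m, j):
    """Return the j-th column of a 3x3 matrix as a list."""
    return [m[i][j] for i in range(3)]

def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))

def maev(r_pred, r_true):
    """Assumed MAEV: mean angle between the three corresponding vectors."""
    return sum(angle_deg(column(r_pred, j), column(r_true, j)) for j in range(3)) / 3

yaw_a, yaw_b = 179.0, -179.0
euler_abs_err = abs(yaw_a - yaw_b)                        # 358.0
vector_err = maev(yaw_matrix(yaw_a), yaw_matrix(yaw_b))   # about 1.33
print(euler_abs_err, vector_err)
```

The absolute error on the angle values is 358 degrees, while the vectors themselves differ by only about 1.33 degrees on average, which is the kind of mismatch the authors point at.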

Left vector (red), down vector (green) and front vector (blue)

Method

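A backbone and three heads predict three 3D vectors in a coarse-to-fine manner. Ideally, since the vectors represent the rotated directions of different axes, they should be mutually orthogonal. C uses pooling and a 1x1 conv so that the outputs have the same size, and the three heads share the same architecture.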
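The orthogonality requirement can be illustrated with a small Python sketch. This is only one plausible way to measure how far three predicted vectors are from being mutually orthogonal (the sum of squared pairwise dot products); the paper's exact constraint is not spelled out in this summary, and the function name orthogonality_penalty is hypothetical.

```python
def dot(u, v):
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonality_penalty(left, down, front):
    """Illustrative penalty (an assumption, not the paper's formula):
    zero when the three vectors are pairwise orthogonal, larger otherwise."""
    return dot(left, down) ** 2 + dot(left, front) ** 2 + dot(down, front) ** 2

# Perfectly orthogonal axes give zero penalty.
ok = orthogonality_penalty([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])

# Two identical vectors are as far from orthogonal as unit vectors can be.
bad = orthogonality_penalty([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
print(ok, bad)
```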

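A 3D rotation matrix is a 3x3 matrix that holds three vectors, one for each of the x, y and z axes. The authors define the loss for this prediction as follows: for each axis (r), compute the L2 over its three values, sum the results of the three axes, and take the average.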
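As a rough sketch of the loss described above, the Python snippet below computes a squared L2 distance per axis, sums it over the three axes, and averages over the samples. Two points are assumptions rather than details from the paper: that "L2" means the squared L2 distance, and that the average is taken over the samples in a batch. The name vector_loss is hypothetical.

```python
def vector_loss(pred, true):
    """pred, true: lists of samples; each sample is a list of three 3D vectors
    (one per axis r). Returns the per-axis squared L2 error, summed over the
    three axes and averaged over samples (both choices are assumptions)."""
    total = 0.0
    for pred_axes, true_axes in zip(pred, true):
        for p, t in zip(pred_axes, true_axes):
            total += sum((a - b) ** 2 for a, b in zip(p, t))
    return total / len(pred)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
off = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]

# One perfect sample and one sample whose third vector is off by 0.5.
loss = vector_loss([identity, off], [identity, identity])
print(loss)  # 0.125
```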

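The values of these vectors lie in [-1, 1]. The authors split this range into intervals, with deeper stages using finer intervals, and then average the values from every stage (S) to form the output.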

n(s) is the number of intervals at stage s, p_i(s) is the probability that the element is in the i-th interval and q_i(s) is the mean value of the i-th interval.
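The interval scheme can be sketched in Python as follows. Each stage turns its probabilities p_i(s) into an expected value using the interval means q_i(s), and the stage values are then averaged. The sketch assumes uniform intervals over [-1, 1] with q_i(s) taken as the interval midpoint; the function names are hypothetical.

```python
def interval_means(n):
    """Midpoints of n uniform intervals covering [-1, 1] (assumed q_i)."""
    width = 2.0 / n
    return [-1.0 + width * (i + 0.5) for i in range(n)]

def stage_value(probs):
    """Expected value of one stage: sum over i of p_i * q_i."""
    q = interval_means(len(probs))
    return sum(p * qi for p, qi in zip(probs, q))

def combine_stages(stage_probs):
    """Average the per-stage values, as described in the text."""
    values = [stage_value(p) for p in stage_probs]
    return sum(values) / len(values)

# Two stages for one vector element: a coarse stage with 2 intervals
# and a finer stage with 4 intervals.
stage_probs = [
    [0.0, 1.0],            # all mass on the interval [0, 1], mean 0.5
    [0.0, 0.0, 0.0, 1.0],  # all mass on the interval [0.5, 1], mean 0.75
]
output = combine_stages(stage_probs)
print(output)  # 0.625
```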