Fine-Tuning StyleGAN2 For Cartoon Face Generation

3 min readJul 20, 2021

Introduction

最近針對 StyleGAN2 做 transfer learning 還是很熱門，主要是因為他的架構是從 low-resolution 到 high-resolution 進行訓練，簡單的 fine-tuning 就可以讓 source domains 轉移到 target domains，不需要整個重新訓練。
下圖為 StyleGAN 的架構，有別於一般 GAN 直接從 random 的 z 開始訓練GAN，而是先透過 mapping network 訓練 distribution，再透過 PGGAN 的概念訓練 GAN，因此在其架構中，low-resolution 的 layer 會掌控圖像的整體輪廓，而 high-resolution 的部分則是掌控圖像的細節部分。
Layer Swapping (LS) 表示從兩個不同 domain 的 distribution 丟給 GAN model，所以如果基於 StyleGAN 做 LS 的話哪個 domain 放在 low-resoltuion 就會由那個 domain 掌握輪廓，high-resolution 同理。

Method

此篇提出兩個想法如下，FreezeSG 固定住 w(Style vector)和 g(generator) 的 initial blockse(4x4–8x8)，Structure Loss 在不同解析度的output 地方加上 source 對 target 的 MSE loss。

1. FreezeSG, which freezes the initial blocks of the style vectors and generator. This is very simple and allows the target image to follow the structure of the source image.
2. Structure Loss, which make to reduce the distance between the inital blocks of the source generator and the inital blocks of the target generator. The effectiveness of applying Layer Swapping to models trained by this loss is also remarkable.

下圖為 Structure Loss 的架構和公式，用在三個 low-resolution layers。

整體的 loss，L_adv 就一般的 GAN loss，λ_structure 設為 1。

結果如下圖1，作者認為 FreezeSG 比單獨的 FreezeG 還好，LS 的部分 low-resolution layer of the source generator (4x4–64x64) and the high-resolution layer (64x64–256x256) of the target generator。