Multi-Scale Spatio-Temporal Graph Convolutional Network for Facial Expression Spotting

Abstract

Facial expression spotting is a significant but challenging task in facial expression analysis. The accuracy of expression spotting is affected not only by irrelevant facial movements but also by the difficulty of perceiving subtle motions in micro-expressions. In this paper, we propose a Multi-Scale Spatio-Temporal Graph Convolutional Network (SpoT-GCN) for facial expression spotting. To extract more robust motion features, we track both short- and long-term motion of facial muscles in compact sliding windows whose window length adapts to the temporal receptive field of the network. This strategy, termed the receptive field adaptive sliding window strategy, effectively magnifies the motion features while alleviating the problem of severe head movement. The subtle motion features are then converted to a facial graph representation, whose spatio-temporal graph patterns are learned by a graph convolutional network. This network learns both local and global features from multiple scales of facial graph structures using our proposed facial local graph pooling (FLGP). Furthermore, we introduce supervised contrastive learning to enhance the discriminative capability of our model for difficult-to-classify frames. The experimental results on the SAMM-LV and CAS(ME)2 datasets demonstrate that our method achieves state-of-the-art performance, particularly in micro-expression spotting. Ablation studies further verify the effectiveness of our proposed modules.
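The abstract describes a pipeline that converts subtle facial motion into a landmark graph and learns spatio-temporal patterns with graph convolutions. Below is a minimal, illustrative sketch of one such spatio-temporal graph convolution block; it is not the authors' SpoT-GCN implementation. The landmark count, the (batch, channels, time, nodes) input layout, the toy adjacency matrix, and all module names are assumptions made for illustration, and the paper's multi-scale FLGP pooling and supervised contrastive loss are omitted.

```python
# Illustrative sketch only (assumed shapes and names), not the authors' code.
import torch
import torch.nn as nn


class SpatioTemporalGraphConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int,
                 adjacency: torch.Tensor, temporal_kernel: int = 3):
        super().__init__()
        # Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2
        a = adjacency + torch.eye(adjacency.size(0))
        d = a.sum(dim=1).pow(-0.5)
        self.register_buffer("a_norm", d.unsqueeze(1) * a * d.unsqueeze(0))
        # 1x1 convolution mixes channels per landmark (spatial graph conv weights).
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Temporal convolution along the frame axis, applied per landmark.
        pad = (temporal_kernel - 1) // 2
        self.temporal = nn.Conv2d(out_channels, out_channels,
                                  kernel_size=(temporal_kernel, 1),
                                  padding=(pad, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, nodes)
        x = self.spatial(x)
        # Aggregate features from neighboring landmarks via the normalized adjacency.
        x = torch.einsum("bctv,vw->bctw", x, self.a_norm)
        return self.relu(self.temporal(x))


if __name__ == "__main__":
    num_landmarks = 17                                        # assumed facial-graph size
    adj = torch.rand(num_landmarks, num_landmarks).round()    # toy symmetric adjacency
    adj = ((adj + adj.t()) > 0).float()
    block = SpatioTemporalGraphConv(2, 16, adj)
    frames = torch.randn(4, 2, 8, num_landmarks)              # (batch, xy-motion, time, nodes)
    print(block(frames).shape)                                # torch.Size([4, 16, 8, 17])
```

In the full model described in the abstract, blocks of this kind would be stacked over windows whose length matches the network's temporal receptive field, with graph pooling across facial regions providing the multi-scale features.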

Publication type
Conference paper
Proceedings of the International Conference on Automatic Face and Gesture Recognition (FG 2024)
Yicheng Deng
Doctoral student
早志英朗 (Hideaki Hayashi)
Associate Professor

Engaged in the development of machine learning algorithms based on deep learning and Bayesian inference, with applied research in biosignal analysis and medical image processing.

長原一 (Hajime Nagahara)
Professor

Specializes in computational photography and computer vision, researching real-world sensing, information processing, and image recognition technologies. Beyond image sensing, he also develops computational sensing methods that extend to a variety of sensors and pursues a shift toward sparse sensing, which extracts meaningful information from high-dimensional, redundant real-world big data.