Visually grounded paraphrase identification via gating and phrase localization

概要

Visually grounded paraphrases (VGPs) describe the same visual concept but in different wording. Previous studies have developed models to identify VGPs from language and visual features. In these existing methods, language and visual features are simply fused. However, our detailed analysis indicates that VGPs with different lexical similarities require different weights on language and visual features to maximize identification performance. This motivates us to propose a gated neural network model to adaptively control the weights. In addition, because VGP identification is closely related to phrase localization, we also propose a way to explicitly incorporate phrase-object correspondences. From our evaluation in detail, we confirmed our model outperforms the state-of-the-art model.

発表文献
Neurocomputing
Chenhui Chu
Chenhui Chu
招へい准教授
中島悠太
中島悠太
准教授

コンピュータビジョン・パターン認識などの研究。ディープニューラルネットワークなどを用いた画像・映像の認識・理解を主に、自然言語処理を援用した応用研究などに従事。