Revisiting Pixel-Level Contrastive Pre-Training on Scene Images

1月, 2024

概要

Contrastive image representation learning through instance discrimination has shown impressive transfer performance. Recent strategies have focused on pushing the limit of their transfer performance for dense prediction tasks, particularly when conducting pre-training on scene images with complex structures. Initial approaches employ pixel-level contrastive pre-training to optimize dense spatial features, while subsequent methods utilize region-mining algorithms to capture holistic regional semantics and address the issue of semantically inconsistent scene image crops. In this paper, we revisit pixel-level contrastive pre-training on scene images. Contrary to the assumption that pixel-level learning falls short in achieving these objectives, we demonstrate its under-explored potentials: (1) it can effectively learn holistic regional semantics more simply compared to region-level methods, and (2) it intrinsically provides tools to mitigate the impact of semantically inconsistent views involved with scene-level training images. We propose PixCon, a pixel-level contrastive learning framework, and explore two variants with different positive matching strategies to investigate the potential of pixel-level learning. Additionally, when PixCon incorporates a novel semantic reweighting approach tailored for scene image pre-training, it outperforms or matches the performance of previous region-level methods in object detection and semantic segmentation tasks across multiple benchmarks.

論文種別

Conference paper

発表文献

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision

Revisiting Pixel-Level Contrastive Pre-Training on Scene Images

概要

Zongshang Pang

博士後期課程学生

中島悠太

教授

長原一

教授