Mayu Otani
Latest
Revisiting Pixel-Level Contrastive Pre-Training on Scene Images
Toward verifiable and reproducible human evaluation for text-to-image generation
Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
The semantic typology of visually grounded paraphrases
Transferring domain-agnostic knowledge in video question answering
Attending self-attention: A case study of visually grounded supervision in vision-and-language transformers
A comparative study of language Transformers for video question answering
The laughing machine: Predicting humor in video
Cross-lingual visual grounding
Visually grounded paraphrase identification via gating and phrase localization
Knowledge VQA
KnowIT VQA: Answering knowledge-based questions about videos
BERT representations for video question answering
Rethinking the evaluation of video summaries
Video meets knowledge in visual question answering
Finding important people in a video using deep neural networks with conditional random fields
iParaphrasing: Extracting visually grounded paraphrases via an image
Visually grounded paraphrase extraction
Video summarization using textual descriptions for authoring video blogs
Fine-grained video retrieval for multi-clip video
Unsupervised Video Summarization using Deep Video Features
Video question answering to find a desired video segment