Mayu Otani
Latest
Revisiting Pixel-Level Contrastive Pre-Training on Scene Images
Toward verifiable and reproducible human evaluation for text-to-image generation
Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
The semantic typology of visually grounded paraphrases
Transferring domain-agnostic knowledge in video question answering
Attending self-attention: A case study of visually grounded supervision in vision-and-language transformers
A comparative study of language Transformers for video question answering
The laughing machine: Predicting humor in video
Cross-lingual visual grounding
Visually grounded paraphrase identification via gating and phrase localization
Knowledge VQA
KnowIT VQA: Answering knowledge-based questions about videos
BERT representations for video question answering
Rethinking the evaluation of video summaries
Video meets knowledge in visual question answering
Finding important people in a video using deep neural networks with conditional random fields
iParaphrasing: Extracting visually grounded paraphrases via an image
Visually grounded paraphrase extraction
Video summarization using textual descriptions for authoring video blogs
Fine-grained video retrieval for multi-clip video
Unsupervised Video Summarization using Deep Video Features
Video question answering to find a desired video segment