Research Topics
Yuta Nakashima
Institute for Datability Science
Osaka University
email: n-yuta@ids.
Please add
to complete email address
Computer Vision
Natural Language Processing
Pattern Recognition
Machine Learning
Instruct Me More! Random Prompting for Visual In-Context Learning
Revisiting Pixel-Level Contrastive Pre-Training on Scene Images
ICDAR’23: Intelligent Cross-Data Analysis and Retrieval
Learning bottleneck concepts in image classification
Model-agnostic gender debiased image captioning
Not only generative art: Stable diffusion for content-style disentanglement in art analysis
Toward verifiable and reproducible human evaluation for text-to-image generation
Uncurated image-text datasets: Shedding light on demographic bias
ACT2G: Attention-based Contrastive Learning for Text-to-Gesture Generation
Automated grading system of retinal arterio-venous crossing patterns: A deep learning approach replicating ophthalmologist’s diagnostic process of arteriolosclerosis
Automatic evaluation of atlantoaxial subluxation in rheumatoid arthritis by a deep learning model
Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization
Development of a vertex finding algorithm using Recurrent Neural Network
Enhancing Fake News Detection in Social Media via Label Propagation on Cross-modal Tweet Graph
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
Gender and Racial Bias in Visual Question Answering Datasets
Quantifying Societal Bias Amplification in Image Captioning
Information Extraction from Public Meeting Articles
Anonymous identity sampling and reusable synthesis for sensitive face camouflage
Corpus Construction for Historical Newspapers: A Case Study on Public Meeting Corpus Construction Using OCR Error Correction
Deep Gesture Generation for Social Robots Using Type-Specific Libraries
Depthwise spatio-temporal STFT convolutional neural networks for human action recognition
Emotional Intensity Estimation based on Writer’s Personality
ICDAR’22: Intelligent Cross-Data Analysis and Retrieval
Integration of gesture generation system using gesture library with DIY robot design kit
Match them up: visually explainable few-shot image classification
Multi-label disengagement and behavior prediction in online learning
Tone Classification for Political Advertising Video using Multimodal Cues
The semantic typology of visually grounded paraphrases
GCNBoost: Artwork Classification by Label Propagation Through a Knowledge Graph
Image Retrieval by Hierarchy-aware Deep Hashing Based on Multi-task Learning
SCOUTER: Slot attention-based classifier for explainable image recognition
Transferring domain-agnostic knowledge in video question answering
Built year prediction from Buddha face with heterogeneous labels
Explain me the painting: Multi-topic knowledgeable art description generation
Learners' efficiency prediction using facial behavior analysis
Museum Experience into a Souvenir: Generating Memorable Postcards from Guide Device Behavior Log
PoseRN: A 2D pose refinement network for bias-free multi-view 3D human pose estimation
Attending self-attention: A case study of visually grounded supervision in vision-and-language transformers
A comparative study of language Transformers for video question answering
MTUNet: Few-shot image classification with visual explanations
WRIME: A new dataset for emotional intensity estimation with subjective and objective annotations
Noisy-LSTM: Improving temporal awareness for video semantic segmentation
Generation and detection of media clones
Preventing fake information generation against media clone attacks
The laughing machine: Predicting humor in video
Cross-lingual visual grounding
IDSOU at WNUT-2020 Task 2: Identification of informative COVID-19 English tweets
Improving topic modeling through homophily for legal documents
Visually grounded paraphrase identification via gating and phrase localization
Demographic influences on contemporary art with unsupervised style embeddings
Knowledge-based video question answering with unsupervised scene descriptions
Knowledge VQA
Australian History in Newspaper and AI
MLPhys: Foundation of Machine Learning Physics
Society 5.0 Projects
AI Hospital
Law and AI
Buddha Face and AI
Yoga-82: a new dataset for fine-grained classification of human poses
KnowIT VQA: Answering knowledge-based questions about videos
3D Image Reconstruction from Multi-focus Microscopic Images
BERT representations for video question answering
ContextNet: representation and exploration for painting classification and retrieval in context
IterNet: retinal image segmentation utilizing structural redundancy in vessel networks
Joint learning of vessel segmentation and artery/vein classification with post-processing
Speech-driven face reenactment for a video sequence
Public meeting corpus construction and content delivery
Human shape reconstruction with loose clothes from partially observed data by pose specific deformation
Legal information as a complex network: Improving topic modeling through homophily
Multimodal learning analytics: Society 5.0 project in Japan
Buda.art: A multimodal content-based analysis and retrieval system for Buddha statues
Context-aware embeddings for automatic art analysis
Facial expression recognition with skip-connection to leverage low-level features
Rethinking the evaluation of video summaries
Video meets knowledge in visual question answering
Representing a partially observed non-rigid 3D human using eigen-texture and eigen-deformation
Finding important people in a video using deep neural networks with conditional random fields
Iterative applications of image completion with CNN-based failure detection
Summarization of user-generated sports video by using deep action recognition features
iParaphrasing: Extracting visually grounded paraphrases via an image
Visually grounded paraphrase extraction
Augmented reality marker hiding with texture deformation
Novel view synthesis with light-weight view-dependent texture mapping for a stereoscopic HMD
Video summarization using textual descriptions for authoring video blogs
Fine-grained video retrieval for multi-clip video
Increasing pose comprehension through augmented reality reenactment
Realtime novel view synthesis with eigen-texture regression
ReMagicMirror: Action learning using human reenactment with the mirror metaphor
Unsupervised Video Summarization using Deep Video Features
Video question answering to find a desired video segment