Publications

Information Extraction from Public Meeting Articles

Public meeting articles are the key to understanding the history of public opinion and the public sphere in Australia. Information …

Quantifying Societal Bias Amplification in Image Captioning

Vision-and-language tasks have increasingly drawn more attention as a means to evaluate human-like reasoning in machine learning …

Gender and Racial Bias in Visual Question Answering Datasets

AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval

Evaluation measures have a crucial impact on the direction of research. Therefore, it is of utmost importance to develop appropriate …

Anonymous identity sampling and reusable synthesis for sensitive face camouflage

An increasing number of face images are being captured, shared, or applied in various applications. These images usually contain lots …

Integration of gesture generation system using gesture library with DIY robot design kit

Conversational agents are expected to improve the quality of communication by adding gestures to the speech, and are considered to be a …

Depthwise spatio-temporal STFT convolutional neural networks for human action recognition

Conventional 3D convolutional neural networks (CNNs) are computationally expensive, memory intensive, prone to overfitting, and most …

The semantic typology of visually grounded paraphrases

Visually grounded paraphrases (VGPs) are different phrasal expressions describing the same visual concept in an image. Previous studies …

Transferring domain-agnostic knowledge in video question answering

Video question answering (VideoQA) is designed to answer a given question based on a relevant video clip. The current available …

SCOUTER: Slot attention-based classifier for explainable image recognition

Explainable artificial intelligence has been gaining attention in the past few years. However, most existing methods are based on …

Image Retrieval by Hierarchy-aware Deep Hashing Based on Multi-task Learning

Deep hashing has been widely used to approximate nearest-neighbor search for image retrieval tasks. Most of them are trained with …

GCNBoost: Artwork Classification by Label Propagation Through a Knowledge Graph

Explain me the painting: Multi-topic knowledgeable art description generation

Have you ever looked at a painting and wondered what is the story behind it? This work presents a framework to bring art closer to …

Built year prediction from Buddha face with heterogeneous labels

Buddha statues are a part of human culture, especially of the Asia area, and they have been alongside human civilisation for more than …

PoseRN: A 2D pose refinement network for bias-free multi-view 3D human pose estimation

We propose a new 2D pose refinement network that learns to predict the human bias in the estimated 2D pose. There are biases in 2D pose …

Museum Experience into a Souvenir: Generating Memorable Postcards from Guide Device Behavior Log

This paper proposes a method for automatically generating postcards that reflect each visitor’s museum experience by analyzing …

Learners' efficiency prediction using facial behavior analysis

In the e-learning context, how much the learner is concentrated and engaged, or the learners' efficiency, is essential for providing …

Attending self-attention: A case study of visually grounded supervision in vision-and-language transformers

The impressive performances of pre-trained visually grounded language models have motivated a growing body of research investigating …

A comparative study of language Transformers for video question answering

With the goal of correctly answering questions about images or videos, visual question answering (VQA) has quickly developed in recent …

WRIME: A new dataset for emotional intensity estimation with subjective and objective annotations

We annotate 17,000 SNS posts with both the writer’s subjective emotional intensity and the reader’s objective one to construct a …

MTUNet: Few-shot image classification with visual explanations

Few-shot learning (FSL) approaches, mostly neural network-based, are assuming that the pre-trained knowledge can be obtained from base …

Noisy-LSTM: Improving temporal awareness for video semantic segmentation

Semantic video segmentation is a key challenge for various applications. This paper presents a new model named Noisy-LSTM, which is …

The laughing machine: Predicting humor in video

Humor is a very important communication tool; yet, it is an open problem for machines to understand humor. In this paper, we build a …

Preventing fake information generation against media clone attacks

Fake media has been spreading due to remarkable advances in media processing and machine learning technologies, causing serious problems …

Generation and detection of media clones

With the spread of high-performance sensors and social network services (SNS) and the remarkable advances in machine learning …

CFA Handling and Quality Analysis for Compressive Light Field Camera

A light field can carry rich visual information of a real 3-D scene, leading to many attractive applications. However, the acquisition …

Cross-lingual visual grounding

Visual grounding is a vision and language understanding task aiming at locating a region in an image according to a specific query …

IDSOU at WNUT-2020 Task 2: Identification of informative COVID-19 English tweets

We introduce the IDSOU submission for the WNUT-2020 task 2: identification of informative COVID-19 English Tweets. Our system is an …

Improving topic modeling through homophily for legal documents

Topic modeling that can automatically assign topics to legal documents is very important in the domain of computational law. The …

Following Embryonic Stem Cells, Their Differentiated Progeny, and Cell-State Changes During iPS Reprogramming by Raman Spectroscopy

Monitoring cell-state transition in pluripotent cells is invaluable for application and basic research. In this study, we demonstrate …

Diagnostic performance for pulmonary adenocarcinoma on CT: comparison of radiologists with and without three-dimensional convolutional neural network

Objectives To compare diagnostic performance for pulmonary invasive adenocarcinoma among radiologists with and without …

Visually grounded paraphrase identification via gating and phrase localization

Visually grounded paraphrases (VGPs) describe the same visual concept but in different wording. Previous studies have developed models …

Red-Fluorescent Pt Nanoclusters for Detecting and Imaging HER2 in Breast Cancer Cells

Overexpression of human epidermal growth factor receptor 2 (HER2) is associated with more frequent cancer recurrence and metastasis. …

Improvement of nerve imaging speed with coherent anti-Stokes Raman scattering rigid endoscope using deep-learning noise reduction

A coherent anti-Stokes Raman scattering (CARS) rigid endoscope was developed to visualize peripheral nerves without labeling for …

YOLO in the Dark: Domain adaptation method for merging multiple models

Generating models to handle new visual tasks requires additional datasets, which take considerable effort to create. We propose a …

Knowledge-based video question answering with unsupervised scene descriptions

To understand movies, humans constantly reason over the dialogues and actions shown in specific scenes and relate them to the overall …

Demographic influences on contemporary art with unsupervised style embeddings

Computational art analysis has, through its reliance on classification tasks, prioritised historical datasets in which the artworks are …

Acquiring dynamic light fields through coded aperture camera

We investigate the problem of compressive acquisition of a dynamic light field. A promising solution for compressive light field …

Nerve segmentation with deep learning from label-free endoscopic images obtained using coherent anti-Stokes Raman scattering

Semantic segmentation with deep learning to extract nerves from label-free endoscopic images obtained using coherent anti-Stokes Raman …

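Information Extraction from Public Meeting Articles (in Japanese)

Corpus Construction from Historical Newspaper Data Using OCR Error Correction (in Japanese)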

Constructing a public meeting corpus

In this paper, we propose a method for constructing a large corpus about a century of public meetings in historical Australian …

Yoga-82: a new dataset for fine-grained classification of human poses

Human pose estimation is a well-known problem in computer vision to locate joint positions. Existing datasets for the learning of poses …

Convolutional Neural Network Can Recognize Drug Resistance of Single Cancer Cells

It is known that single or isolated tumor cells enter cancer patients' circulatory systems. These circulating tumor …

Detecting learner drowsiness based on facial expressions and head movements in online courses

Drowsiness is a major factor that hinders learning. To improve learning efficiency, it is important to understand students' physical …

KnowIT VQA: Answering knowledge-based questions about videos

We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a …

Warmer Environments Increase Implicit Mental Workload Even If Learning Efficiency Is Enhanced

© 2020 Kimura, Takemura, Nakashima, Kobori, Nagahara, Numao and Shinohara. Climate change is one of the most important …

Uncovering hidden challenges in query-based video moment retrieval

Query-based moment retrieval is the problem of localising a specific clip from an untrimmed video according to a query sentence. This is …

Speech-driven face reenactment for a video sequence

We present a system for reenacting a person’s face driven by speech. Given a video sequence with the corresponding audio track of …

Joint learning of vessel segmentation and artery/vein classification with post-processing

Retinal imaging serves as a valuable tool for diagnosis of various diseases. However, reading retinal images is a difficult and …

IterNet: retinal image segmentation utilizing structural redundancy in vessel networks

Retinal vessel segmentation is of great interest for diagnosis of retinal vascular diseases. To further improve the performance of …

ContextNet: representation and exploration for painting classification and retrieval in context

© 2019, The Author(s). In automatic art analysis, models that besides the visual elements of an artwork represent the relationships …

BERT representations for video question answering

Visual question answering (VQA) aims at answering questions about the visual content of an image or a video. Currently, most work on …

Action recognition from a single coded image

Cameras are prevalent in society at the present time, for example, surveillance cameras, and smartphones equipped with cameras and …

5D Light Field Synthesis from a Monocular Video

Commercially available light field cameras have difficulty in capturing 5D (4D + time) light field videos. They can only capture still …

3D Image Reconstruction from Multi-focus Microscopic Images

This paper presents a method for reconstructing a 3D image from multi-focus microscopic images captured at different focus settings. We model …

Use of Big Data in Historical Research: Focusing on Australia (in Japanese)

Reflectance and Shape Estimation with a Light Field Camera Under Natural Illumination

Reflectance and shape are two important components in visually perceiving the real world. Inferring the reflectance and shape of an …

Public meeting corpus construction and content delivery

Deep-UV excitation fluorescence microscopy for detection of lymph node metastasis using deep neural network

Contextualized multi-sense word embedding

Currently, distributed word representations are employed in many natural language processing tasks. However, when generating one …

Legal information as a complex network: Improving topic modeling through homophily

Topic modeling is a key component to computational legal science. Network analysis is also very important to further understand the …

Human shape reconstruction with loose clothes from partially observed data by pose specific deformation

Reconstructing the entire body of a moving human in a computer is important for various applications, such as tele-presence, virtual …

Deep compressive sensing for visual privacy protection in flatcam imaging

Detection followed by projection in conventional privacy cameras is vulnerable to software attacks that threaten to expose image sensor …

Metric for automatic machine translation evaluation based on pre-trained sentence embeddings

This study describes a segment-level metric for automatic machine translation evaluation (MTE). Although various MTE metrics have been …

A 3-D Display Pipeline from Coded-Aperture Camera to Tensor Light-Field Display Through CNN

We propose an efficient pipeline from input to output for a tensor light-field display. Conventionally, a dense light field (i.e., tens …

Application of deep learning (3-dimensional convolutional neural network) for the prediction of pathological invasiveness in lung adenocarcinoma

Corpus Construction from Historical Newspaper Data (in Japanese)

Multimodal learning analytics: Society 5.0 project in Japan

Fall detection using optical level anonymous image sensing system

Falls are one of the leading causes of injury among elderly individuals. Systems that automatically detect falls can significantly …

Video meets knowledge in visual question answering

In this work, we address knowledge-based visual question answering in videos. First, we introduce KnowIT VQA, a video dataset with …

Rethinking the evaluation of video summaries

Video summarization is a technique to create a short skim of the original video while preserving the main stories/content. There exists …

Negative lexically constrained decoding for paraphrase generation

Paraphrase generation can be regarded as monolingual translation. Unlike bilingual machine translation, paraphrase generation rewrites …

Historical and modern features for Buddha statue classification

© 2019 Copyright held by the owner/author(s). While Buddhism has spread along the Silk Roads, many pieces of art have been displaced. …

Facial expression recognition with skip-connection to leverage low-level features

Deep convolutional neural networks (CNNs) have established their feet in the ground of computer vision and machine learning, used in …

Efficacy of Novel Multispectral Imaging Device to Determine Anastomosis for Esophagogastrostomy

© 2019 The Authors. Background: Biomedical imaging devices that utilize the optical characteristics of hemoglobin (Hb) have become …

Controllable text simplification with lexical constraint loss

We propose a method to control the level of a sentence in a text simplification task. Text simplification is a monolingual translation …

Contextualized context2vec

Lexical substitution ranks substitution candidates from the viewpoint of paraphrasability for a target word in a given sentence. There …

Context-aware embeddings for automatic art analysis

© 2019 Association for Computing Machinery. Automatic art analysis aims to classify and retrieve artistic representations from a …

Buda.art: A multimodal content-based analysis and retrieval system for Buddha statues

© 2019 Copyright held by the owner/author(s). We introduce BUDA.ART, a system designed to assist researchers in Art History, to explore …

A Coded Aperture for Watermark Extraction from Defocused Images

© 2019, Springer Nature Switzerland AG. Barcodes and 2D codes are widely used for various purposes, such as electronic payments and …

Space-time-brightness sampling using an adaptive pixel-wise coded exposure

Most conventional digital video cameras face a fundamental trade-off between spatial resolution, temporal resolution and dynamic range …

Representing a partially observed non-rigid 3D human using eigen-texture and eigen-deformation

Reconstruction of the shape and motion of humans from RGB-D is a challenging problem, receiving much attention in recent years. Recent …

Finding important people in a video using deep neural networks with conditional random fields

Finding important regions is essential for applications, such as content-aware video compression and video retargeting to automatically …

Designing coded aperture camera based on PCA and NMF for light field acquisition

A light field, which is often understood as a set of dense multi-view images, has been utilized in various 2D/3D applications. …

Summarization of user-generated sports video by using deep action recognition features

Automatically generating a summary of a sports video poses the challenge of detecting interesting moments, or highlights, of a game. …

Iterative applications of image completion with CNN-based failure detection

Image completion is a technique to fill missing regions in a damaged or redacted image. A patch-based approach is one of the major …

iParaphrasing: Extracting visually grounded paraphrases via an image

A paraphrase is a restatement of the meaning of a text in other words. Paraphrases have been studied to enhance the performance of many …

PCA-coded aperture for light field photography

A light field, which is often understood as a set of dense multi-view images, has been utilized in various 2D/3D applications. …

Visually grounded paraphrase extraction

The dynamic photometric stereo method using a multi-tap CMOS image sensor

The photometric stereo method enables estimation of surface normals from images that have been captured using different but known …

RUSE: Regressor using sentence embeddings for automatic machine translation evaluation

We introduce the RUSE metric for the WMT18 metrics shared task. Sentence embeddings can capture global information that cannot be …

Metric for automatic machine translation evaluation based on universal sentence representations

Sentence representations can capture a wide range of information that cannot be captured by local features based on character or word …

Learning to capture light fields through a coded aperture camera

We propose a learning-based framework for acquiring a light field through a coded aperture camera. Acquiring a light field is a …

Joint optimization for compressive video sensing and reconstruction under hardware constraints

Compressive video sensing is the process of encoding multiple sub-frames into a single frame with controlled sensor exposures and …

Graphical classification of DNA sequences of HLA alleles by deep learning

© 2018 The Author(s). Alleles of human leukocyte antigen (HLA)-A DNAs are classified and expressed graphically by using artificial …

Complex word identification based on frequency in a learner corpus

We introduce the TMU systems for the Complex Word Identification (CWI) Shared Task 2018. TMU systems use random forest classifiers and …

Coherent anti-Stokes Raman scattering rigid endoscope toward robot-assisted surgery

© 2018 Optical Society of America. Label-free visualization of nerves and nervous plexuses will improve the preservation of …

Adapting local features for face detection in thermal image

A thermal camera captures the temperature distribution of a scene as a thermal image. In thermal images, facial appearances of …

Augmented reality marker hiding with texture deformation

Augmented reality (AR) marker hiding is a technique to visually remove AR markers in a real-time video stream. A conventional approach …

Adaptive background model registration for moving cameras

We propose a framework for adaptively registering background models with an image for background subtraction with moving cameras. …

Novel view synthesis with light-weight view-dependent texture mapping for a stereoscopic HMD

The proliferation of off-the-shelf head-mounted displays (HMDs) lets end-users enjoy virtual reality applications, some of which render …

Video summarization using textual descriptions for authoring video blogs

Authoring video blogs requires a video editing process, which is cumbersome for ordinary users. Video summarization can automate this …

Hyperspectral imaging using flickerless active LED illumination

© 2017 SPIE. Hyperspectral imaging is used in various fields because it can obtain much more information than imaging by conventional …

Video question answering to find a desired video segment

Unsupervised Video Summarization using Deep Video Features

ReMagicMirror: Action learning using human reenactment with the mirror metaphor

We propose ReMagicMirror, a system to help people learn actions (e.g., martial arts, dances). We first capture the motions of a teacher …

Realtime novel view synthesis with eigen-texture regression

Realtime novel view synthesis, which generates a novel view of a real object or scene in realtime, enjoys a wide range of applications …

Mixed features for face detection in thermal image

© 2017 SPIE. An infrared (IR) camera captures the temperature distribution of an object as an IR image. Because facial temperature is …

Incremental structural modeling on sparse visual SLAM

© 2017 MVA Organization All Rights Reserved. This paper presents an incremental structural modeling approach that improves the …

Increasing pose comprehension through augmented reality reenactment

Standard video does not capture the 3D aspect of human motion, which is important for comprehension of motion that may be ambiguous. In …

Fine-grained video retrieval for multi-clip video

Classification of C2C12 cells at differentiation by convolutional neural network of deep learning using phase contrast images

© 2017 The Author(s). In the field of regenerative medicine, tremendous numbers of cells are necessary for tissue/organ regeneration. …

High-speed imaging using CMOS image sensor with quasi pixel-wise exposure

Several recent studies in compressive video sensing have realized scene capture beyond the fundamental trade-off limit between spatial …

Dynamic photometric stereo method using multi-tap CMOS image sensor

Photometric stereo enables the estimation of surface normals from images that were captured using different known lighting directions. …