Constructing a public meeting corpus

May 2020

Abstract

In this paper, we propose a method for constructing a large corpus of texts on public meetings, drawn from about a century of historical Australian newspapers, and we analyze the constructed corpus. The construction method is based on image processing and Optical Character Recognition (OCR): we digitize and transcribe newspaper texts on the specific topic of public meetings. Experiments show that the proposed method achieves an F-score of 71.5%, with a high recall of 97.5%, for corpus construction. The resulting corpus can feed a content search tool for temporal and semantic content analysis.
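The abstract reports an F-score and a recall but not a precision. Assuming the F-score is the balanced F1 (the harmonic mean of precision P and recall R, F = 2PR / (P + R)), rearranging gives P = F·R / (2R − F). The short sketch below does that back-of-envelope arithmetic. It is an illustrative derivation under the F1 assumption, not a figure reported by the authors. The function names are hypothetical.

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_precision(f_score: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and R (assumes F1, not F-beta)."""
    return f_score * recall / (2 * recall - f_score)


# Figures from the abstract: F-score 71.5%, recall 97.5%.
p = implied_precision(0.715, 0.975)
print(f"implied precision ~ {p:.1%}")  # prints "implied precision ~ 56.4%"
```

Under this assumption, the high recall comes with a precision of roughly 56%. In other words, the method tends to include most relevant articles at the cost of also admitting some irrelevant ones.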

Paper type

Conference paper

Published in

Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020)

Intelligence and Sensing Lab.