Abstract—Although neural machine translation (NMT) has recently achieved state-of-the-art performance, it still struggles with word-sense disambiguation (WSD). This paper proposes a Korean word-sense annotation preprocessor based on a lexical semantic network that we built as a large-scale lexical knowledge base for the Korean language. We evaluated the effectiveness of the proposed preprocessor on NMT using bidirectional Korean-Japanese and Korean-English translation. The experiments show that the proposed preprocessor significantly improves the quality of NMT systems for language pairs with both similar (Korean-Japanese) and different (Korean-English) sentence structures, in terms of the BLEU and TER evaluation metrics.
Index Terms—Lexical semantic network, lexical knowledge base, neural machine translation, parallel corpus, word sense disambiguation.
The authors are with the Department of IT Convergence, University of Ulsan, Ulsan 44610, Rep. of Korea (e-mail: nqphuoc@mail.ulsan.ac.kr, ducksjc@nate.com, okcy@ulsan.ac.kr).
Cite: Quang-Phuoc Nguyen, Joon-Choul Shin, and Cheol-Young Ock, "Word-Sense Annotation Preprocessor for Improving Neural Machine Translation," International Journal of Knowledge Engineering, vol. 5, no. 2, pp. 57-63, 2019.