TEXT DATA MINING OF THE “Tourism English Proficiency Test”

  • Hiromi Ban, Graduate School of Engineering, Nagaoka University of Technology, Nagaoka, Niigata, Japan
  • Takashi Oyabu, NIHONKAI International Exchange Center, Kanazawa, Ishikawa, Japan
Keywords: data mining, metrical linguistics, statistical analysis, text mining, Tourism English Proficiency Test


According to the White Paper on Tourism for 2019, 18.95 million Japanese people travelled abroad and 31.19 million foreigners visited Japan for sightseeing in 2018. It can fairly be said that this is an age of tourism. Knowledge of tourism has therefore become increasingly important, and so has the need to use English, which serves as a common world language. The “Tourism English Proficiency Test” was started in 1989 as a measure of the English communication competence needed at tourism sites. In this study, English sentences from the “Tourism English Proficiency Test” were examined and compared with those of other proficiency tests and of English textbooks for junior high and high school students from the viewpoint of metrical linguistics. Specifically, the frequency characteristics of character and word appearance were investigated using a program written in C++, and these characteristics were approximated by an exponential function. Furthermore, the percentages of Japanese junior high school required vocabulary and American basic vocabulary were calculated to obtain the difficulty level, as well as the K-characteristic, of each material.
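
The kind of analysis summarized above can be illustrated with a minimal sketch. It is not the authors' C++ program: it is written in Python for brevity, and it rests on several assumptions. The tokenizer is a simple regular expression, the exponential approximation is assumed to take the form y = c * exp(-b * x) with x the frequency rank, and the "K-characteristic" is assumed to be Yule's characteristic K, K = 10^4 * (sum of f^2 over word types - N) / N^2, where f is the count of each word type and N is the total number of words. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Count word occurrences (assumed tokenizer: lowercase letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def fit_exponential(values):
    """Fit y = c * exp(-b * x) to values at ranks x = 1..n.

    Uses ordinary least squares on ln y = ln c - b * x, so every value
    must be positive and at least two values are required.
    """
    n = len(values)
    xs = list(range(1, n + 1))
    ys = [math.log(v) for v in values]
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return math.exp(intercept), -slope  # (c, b)


def yules_k(counts):
    """Yule's characteristic K (assumed here to be the K-characteristic)."""
    total = sum(counts.values())
    sum_sq = sum(f * f for f in counts.values())
    return 1e4 * (sum_sq - total) / (total * total)


# Illustrative usage on a made-up sentence, not on the test materials.
sample = "the cat and the dog and the bird"
counts = word_frequencies(sample)
ranked = [f for _, f in counts.most_common()]       # frequencies in rank order
shares = [100.0 * f / sum(ranked) for f in ranked]  # percentage of all words
c, b = fit_exponential(shares)
k = yules_k(counts)
```

The same functions could be applied to characters instead of words by counting over the characters of the text rather than over tokens. A lower K indicates a richer vocabulary under this definition.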