NINJAL and WAP Collaborate on Industry-Academia Research for Word Vector

Using One of the Largest Japanese Databases and “Sudachi,” a Japanese Morphological Analyzer Developed by WAP, to Promote High-Performance Word Vector Offered for Free

[Translation of press release in Japanese]
Works Applications Co., Ltd. (Headquarters: Minato-ku, Tokyo; Representative Director, Chief Executive Officer: Masayuki Makino; hereinafter “WAP”) today announced an agreement between “WAP Tokushima Laboratory of AI and NLP,” its artificial intelligence (AI) research institute, and the National Institute for Japanese Language and Linguistics (hereinafter “NINJAL”) to conduct collaborative research on “Word Vector,” one of the essential resources for natural language processing.


In this collaborative research, NINJAL and WAP use the “NINJAL Web Japanese Corpus,”[1] a Japanese database of 10 billion words owned by NINJAL, and “Sudachi,”[2] a new Japanese morphological analyzer developed by WAP Tokushima Laboratory of AI and NLP, to build a more practical word vector.

A word vector is a numeric representation of a word's characteristics. It allows a computer to capture the similarity and relatedness of words, which can lead to high-performance retrieval, translation, text mining, analysis, and automated conversation (e.g., chatbots) in the future.
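As a minimal illustration of how a computer can compare words once they are represented as numbers, the sketch below measures cosine similarity between small toy vectors. The vectors, the words, and the function names are assumptions made up for this example; they are not taken from the word vector built by NINJAL and WAP, and real word vectors are learned from a corpus and usually have hundreds of dimensions.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
toy_vectors = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors):
    """Return the other word whose vector is closest to the given word's vector."""
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


if __name__ == "__main__":
    for other in ("cat", "car"):
        score = cosine_similarity(toy_vectors["dog"], toy_vectors[other])
        print(f"dog vs {other}: {score:.3f}")
    print("closest to dog:", most_similar("dog", toy_vectors))
```

With these toy values, the vector for "dog" lies closer to "cat" than to "car", which is the kind of similarity that retrieval or text mining applications can build on.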

NINJAL and WAP are committed to contributing to accelerated research and development in natural language processing by offering the word vector free of charge as open-source software.


The goal of industry-academia collaborative research

WAP Tokushima Laboratory of AI and NLP, which works on research into natural language processing technology and its practical application, released “Sudachi,” a Japanese morphological analyzer that delivers high-quality analysis suited to commercial use, as open-source software (OSS) in August 2017.

As a hub for Japanese language research, NINJAL has an important mission: to conduct theoretical and empirical research in collaboration with a variety of organizations, and to promote the application of that research in areas including Japanese language education and natural language processing.

To that end, NINJAL and WAP Tokushima Laboratory of AI and NLP have agreed on an industry-academia research collaboration. By drawing on the technological assets of each organization, they aim to further develop natural language processing technology and to return the resulting knowledge to society.


Collaborative research on natural language processing

Creating a word vector requires a massive amount of text data.

Using the large-scale “NINJAL Web Japanese Corpus” owned by NINJAL and the easy-to-use “Sudachi,” NINJAL and WAP are working together to build a highly accurate word vector, which they aim to make available free of charge, including for commercial use, as open data in the future.


[1] “NINJAL Web Japanese Corpus” is a database of systematically collected words, both written and spoken, which can provide information to be used in the study of numerous natural languages, including Japanese.
[2] “Sudachi” is a new tokenizer and dictionary developed by WAP Tokushima Laboratory of AI and NLP, with features such as multi-granular output for different purposes and normalization of notation variations.

* Works Applications and Works Applications product names are trademarks or registered trademarks of our company in Japan.
* The contents of this release are based on information available at the time of announcement and are subject to change without notice; the accuracy of the content is not guaranteed. Furthermore, the predictions and forward-looking statements included in this release are uncertain, and actual results may therefore differ significantly from these forecasts for various reasons.

Media Contact:

Works Applications
Hiromi Kaneda, Hirokimi Yamagiwa
