Human-Computer Interaction

Adaptive Distributional Word Models for Robust Semantic Information Systems

This project is already completed.

Motivation & Goal

Machinery and plant engineering contributes substantially to the German economy: the sector is the largest provider of jobs, and the Federal Ministry for Economic Affairs and Energy describes it as the centerpiece of the German capital goods industry (Bundesministerium für Wirtschaft und Energie, 2018). However, maintenance and repair in particular call for innovative solutions so that these increasingly complex machines can operate efficiently and without interruption.

In this context, digital information systems can provide more effective access to machine documentation and thereby make the work of mechanics in Technical Service easier and faster. For example, semantic search and interactive diagnostic systems can support a technician during maintenance and repair tasks. Voice assistants are a promising form of human-computer interaction here: because they use an audio-verbal channel, they can deliver information while the technician is occupied with predominantly manual and visual tasks.

However, speech-enabled assistants come with problems. The interaction is usually not very robust: requests phrased in natural language often cannot be processed adequately because the system requires pre-defined phrase structures to parse the relevant pieces of information from a request. Consequently, users first have to learn how to formulate requests to their voice assistant before they get satisfactory results. Besides these rigid phrase structures, users may also have to use very specific terms for machine components or structures, because contemporary information systems know only a restricted set of terms. In many cases this is not feasible, since technicians differ in their level of expertise and terms are often ambiguous.

Therefore, this master’s thesis investigates and applies approaches that make a system more robust to natural language requests in the domain of Technical Service. An additional online learning approach should further increase robustness by letting the system continually learn from user input and update itself. Implemented in a speech-enabled information system, this should improve the user experience by reducing error rates, providing higher-quality output, and allowing more natural interaction.

Related & Preparatory Work

In natural language processing (NLP), word embeddings are a common language modeling technique that maps words onto a multi-dimensional vector space. The key idea is that the vector for a specific word is generated from the words that frequently appear close to it in a large text corpus, or, as the linguist John Rupert Firth put it: “You shall know a word by the company it keeps” (Firth, 1957). Popular word embedding approaches include Word2Vec (Mikolov, Chen, Corrado, & Dean, 2013), GloVe (Pennington, Socher, & Manning, 2014), and FastText (Bojanowski, Grave, Joulin, & Mikolov, 2016). These approaches generate the vectors with statistical methods such as neural networks, and the resulting word vectors can be characterized by their position in the vector space: vectors that lie close together represent similar (or synonymous) words. Sequences of words (phrases), so-called word-level n-grams, can be embedded in the same way. Researchers use word embeddings, for example, as input to deep neural networks that solve NLP tasks such as sentiment analysis, text summarization, and question answering (QA).
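The notion that nearby vectors represent similar words can be illustrated with cosine similarity, a common closeness measure for word vectors. The following is a minimal sketch: the three-dimensional vectors and the helper names cosine_similarity and most_similar are made up for illustration and are not taken from any trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equally long, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors (assumption); real embeddings typically have
# tens to hundreds of dimensions and are learned from a corpus.
toy_vectors = {
    "engine": [0.9, 0.1, 0.2],
    "motor": [0.8, 0.2, 0.1],
    "invoice": [0.1, 0.9, 0.3],
}

def most_similar(word, vectors):
    """Return the other word whose vector is closest to that of `word`."""
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine_similarity(vectors[word], vectors[w]))

print(most_similar("engine", toy_vectors))
```

With these toy values, "motor" lies much closer to "engine" than "invoice" does, which is the behavior one would hope for from embeddings of near-synonymous terms.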

In a preliminary project, a QA dataset for the domain of Technical Service was created that follows the structure of the Stanford Question Answering Dataset (SQuAD) (Rajpurkar, Zhang, Lopyrev, & Liang, 2016). The dataset is based on the documentation of a harvesting machine and is split into training, validation, and test sets. Machine comprehension models published in the course of the SQuAD challenge can therefore be trained and tested on it. Two examples of high-performing models on SQuAD are R-NET (Wang, Yang, Wei, Chang, & Zhou, 2017) by researchers from Microsoft Research Asia and QANet (Yu, et al., 2018) by researchers from Google and Carnegie Mellon University. Both models are deep neural network architectures: the former employs deep bidirectional recurrent neural networks (RNN), the latter deep convolutional neural networks (CNN) combined with self-attention, to learn and perform the QA task.
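For readers unfamiliar with the SQuAD layout, the sketch below shows the nesting used by SQuAD v1.1 (articles, paragraphs with a context, and question-answer pairs whose answers are character spans in that context). The title, context, question, and id are hypothetical placeholders, not entries from the actual Technical Service dataset.

```python
# Hypothetical example text (assumption); the real dataset is not reproduced here.
context = "The hydraulic pump is located behind the left side panel."
answer_text = "behind the left side panel"

record = {
    "version": "1.1",
    "data": [
        {
            "title": "Hydraulic system",
            "paragraphs": [
                {
                    "context": context,
                    "qas": [
                        {
                            "id": "example-0001",
                            "question": "Where is the hydraulic pump located?",
                            "answers": [
                                {
                                    "text": answer_text,
                                    # Character offset of the answer in the context.
                                    "answer_start": context.find(answer_text),
                                }
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}

def answer_spans(squad):
    """Yield (question, answer recovered from the context by its offset)."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    yield qa["question"], paragraph["context"][start:end]

for question, span in answer_spans(record):
    print(question, "->", span)
```

Because answers are stored as spans of the context, a model trained on such data extracts its answer from the documentation text rather than generating it freely.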

Methodology & Concepts

To gain robustness with respect to natural language input, this work evaluates how different forms of word embeddings can contribute to that objective. First, different kinds of word embeddings are created. Possible forms are, for example, (1) pre-trained embeddings, (2) embeddings learned solely on the text corpus of a machine documentation, or (3) transfer learning approaches that combine the former two. It is also conceivable to augment such word vectors with ontological knowledge (cf. Speer, Chin, & Havasi, 2017). Subsequently, the different embedding approaches are evaluated both extrinsically and intrinsically.

Extrinsic evaluation measures the quality of word vectors by the performance they enable on a downstream NLP task. Given different word embeddings and a classifier that uses them, the performance of the classifier on the task is evaluated. Because the classifier and its hyperparameters are kept constant, differences in performance can be attributed to the word embeddings. The NLP task at hand is QA on the dataset described in the previous section. Any model published in the course of the SQuAD competition can serve as the classifier, e.g., R-NET or QANet. However, simpler architectures should be considered as well, as they might suffice to expose a difference in performance while saving computational resources. The extrinsic evaluation metrics are the F1 and exact match (EM) scores of the models on the dataset.
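The two metrics can be sketched as follows for a single predicted answer and a single reference answer. This is a simplified illustration: the normalization here only lowercases, strips punctuation, and splits on whitespace, which is an assumption and may differ from the evaluation script actually used for the thesis. The function names are illustrative as well.

```python
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop ASCII punctuation, and split into tokens (simplified)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()

def exact_match(prediction, reference):
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # If either side is empty, count it as correct only if both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the hydraulic pump", "hydraulic pump"))
print(f1_score("the hydraulic pump", "hydraulic pump"))
```

In this example the prediction contains one extra token, so EM is 0 while F1 still credits the overlap. Dataset-level scores are the averages of these per-question values.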

Intrinsic evaluation captures how well the word embeddings do what they are supposed to do, namely represent the relative similarity (or dissimilarity) of words. This requires a gold standard to compare them against. Several human-annotated datasets that measure word similarity exist, e.g., WS-353 (Finkelstein, et al., 2002) or MEN (Bruni, Tran, & Baroni, 2012). However, these collections consist of common words and do not cover machine-specific terms. Therefore, a user study will be conducted to create a machine-specific word similarity dataset. Subject-matter experts will be asked to rate word similarity on a predefined discrete scale. The evaluation metric is the correlation between the human-annotated similarities and the similarities derived from the respective word embeddings.
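The text does not specify which correlation coefficient is meant. As an assumption, the sketch below uses Spearman's rank correlation, a common choice for word similarity benchmarks because it compares only the ordering of the word pairs. The ratings and similarity values are invented for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: expert ratings on a 1-5 scale and embedding-based
# cosine similarities for the same four word pairs.
human_ratings = [5, 4, 2, 1]
model_similarities = [0.9, 0.7, 0.3, 0.4]
print(spearman(human_ratings, model_similarities))
```

A value close to 1 would indicate that the embeddings order the word pairs much as the experts do. In the toy data the last two pairs are swapped, which lowers the score.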

Furthermore, human feedback is to be used to improve the word vectors continuously. To this end, a concept is developed for (1) how humans can give feedback to a system that uses word embeddings and (2) how this feedback can be used for online learning of these embeddings.
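To make the idea of feedback-driven online learning more concrete, the sketch below shows one conceivable update rule. It is purely an assumption for illustration and not the concept developed in the thesis: when a user confirms that two terms mean the same thing, their vectors are moved slightly toward each other; when the user rejects the match, they are moved apart. The function name, the learning rate, and the toy vectors are hypothetical.

```python
def apply_feedback(vectors, word_a, word_b, similar, learning_rate=0.1):
    """Nudge two word vectors together (similar=True) or apart (similar=False)."""
    a, b = vectors[word_a], vectors[word_b]
    sign = 1.0 if similar else -1.0
    # Both updates are computed from the old vectors before either is stored.
    new_a = [x + sign * learning_rate * (y - x) for x, y in zip(a, b)]
    new_b = [y + sign * learning_rate * (x - y) for x, y in zip(a, b)]
    vectors[word_a], vectors[word_b] = new_a, new_b

def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical two-dimensional vectors for two terms a technician
# might use interchangeably.
vectors = {"hose": [0.0, 0.0], "tube": [1.0, 0.0]}
before = squared_distance(vectors["hose"], vectors["tube"])
apply_feedback(vectors, "hose", "tube", similar=True)
after = squared_distance(vectors["hose"], vectors["tube"])
print(before, after)
```

Each piece of feedback changes the vectors only slightly, so a single erroneous rating has limited effect, while repeated consistent feedback accumulates.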

Based on the results of this work, we expect to gain an understanding of the following three research questions. (1) Which kinds of word vectors are most effective in the domain of Technical Service, i.e., perform best on intrinsic and extrinsic evaluation tasks? We hypothesize that transfer learning approaches work best, as they are based on more training data than models trained only on the task-specific corpus and are better fitted to the problem domain than pre-trained general-domain models. (2) How can we embed an online learning mechanism for continuously updating the word embeddings? Different approaches are to be researched and evaluated in terms of applicability. (3) How can we put the word embeddings into practical use in an existing information system? We will try to integrate our findings into a voice assistant of an information system for Technical Service tasks.

References

Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching Word Vectors with Subword Information. arXiv preprint arXiv:1607.04606.

Bruni, E., Tran, N., & Baroni, M. (2012). Multimodal Distributional Semantics. Journal of Artificial Intelligence Research 49, 1-47.

Bundesministerium für Wirtschaft und Energie. (2018). BMWi – Maschinen- und Anlagenbau. Retrieved 10 09, 2018, from https://www.bmwi.de/Redaktion/DE/Artikel/Branchenfokus/Industrie/branchenfokus-maschinen-und-anlagenbau.html

Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., & Ruppin, E. (2002). Placing Search in Context: The Concept Revisited. ACM Transactions on Information Systems, 20(1), 116-131.

Firth, J. R. (1957). A Synopsis of Linguistic Theory. Studies in Linguistic Analysis.

Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv preprint arXiv:1606.05250.

Speer, R., Chin, J., & Havasi, C. (2017). ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. AAAI Conference on Artificial Intelligence, (pp. 4444-4451).

Wang, W., Yang, N., Wei, F., Chang, B., & Zhou, M. (2017). Gated self-matching networks for reading comprehension and question answering. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

Yu, A., Dohan, D., Luong, M.-T., Zhao, R., Chen, K., Norouzi, M., & Le, Q. (2018). QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. arXiv preprint arXiv:1804.09541.

Contact Persons at the University of Würzburg

Chris Zimmerer (Primary Contact Person)
Mensch-Computer-Interaktion, Universität Würzburg
chris.zimmerer@uni-wuerzburg.de

Dr. Joachim Baumeister
Lehrstuhl für Informatik VI, Universität Würzburg
joachim.baumeister@denkbares.com
