
A generative oracle in a few lines of code using DeepPavlov

Estevão Uyrá Pardillos Vieira
Published in DeepPavlov
6 min read · Jul 23, 2019


Recently, the CISS 2 summer school on deep learning and dialog systems took place in Lowell, organized by the Text Machine Lab of UMass Lowell and the iPavlov project, Moscow. People traveled from around the world to meet top-notch researchers and learn about the state of the art.

Besides the lectures and tutorials, the school held a competition between the participants, based on team projects carried out during the school. Time to work on the projects was limited, since the lectures and tutorials took up most of our days. Efficiency was therefore crucial, and teams worried about having a working dialog system by the school's end. In this post, we tell a little about the experience.

Manuel, Beatriz, and Estevão, left-to-right

Our team was composed of Estevão Uyrá, Beatriz Albiero, and Manuel Ciosici. Estevão and Beatriz are from Brazil and work together at the Serasa Experian DataLab. Manuel, originally from Romania, was just finishing his Ph.D. at Aarhus University in Denmark. In the end, our team wrote a simple fortune-telling chatbot that placed second in the competition. In this post, we describe how we managed to create a complete chatbot powered by deep learning in just a few lines of code.

Because there were many DeepPavlov experts at the summer school, we decided to follow their track of the tutorials. This was a somewhat risky decision, since we knew nothing about the library, but it really paid off. We knew from the beginning that we wanted to build a question-answering bot, but beyond that we didn't know exactly what kind of conversation we wanted it to hold. Since time was short, we decided to start quickly with something simple and rethink our final product later on.

Iterating fast

We started by implementing a simple system to identify whether a given text span from Wikipedia contains the answer to a given question. For this, we used the SQuAD 1.1 data set, which we pre-processed into (question, sentence, flag) triples. The flag indicated whether the sentence contains the answer to the question, and was used as the label. Put simply, we reduced the original problem, predicting the precise span of the answer in the text, to a binary classification problem. This let us better understand how difficult the problem was, get a grasp of the dataset, and start coding.
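The original preprocessing code is not reproduced here, but the reduction can be sketched in a few lines. This is a hypothetical re-creation: it assumes SQuAD 1.1's layout (a paragraph `context` and `qas` entries whose `answers` carry an `answer_start` character offset), and it splits sentences naively on `'. '`, which a real pipeline would replace with a proper sentence tokenizer.

```python
def make_triples(context, qas):
    """Reduce SQuAD-style spans to (question, sentence, flag) triples,
    where flag=1 iff the answer span starts inside that sentence."""
    # Track each sentence together with its character offset in the context.
    sentences, offset = [], 0
    for sent in context.split('. '):
        sentences.append((sent, offset))
        offset += len(sent) + 2  # account for the removed '. ' separator

    triples = []
    for qa in qas:
        for answer in qa['answers']:
            start = answer['answer_start']
            for sent, sent_start in sentences:
                flag = int(sent_start <= start < sent_start + len(sent))
                triples.append((qa['question'], sent, flag))
    return triples
```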

We then used the DeepPavlov GloVe embedder to download pre-trained GloVe embeddings. In four lines, we could embed each sentence in our dataset as the mean vector of its word embeddings, first downloading the model and then using it. It is interesting to note that the GloVeEmbedder class can use any file in the simple GloVe format, meaning that user-created embeddings can also be used (see more in the docs).
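To make the idea concrete without depending on the DeepPavlov download, here is a toy re-implementation of the mean-of-word-vectors step. The three-dimensional vocabulary is made up for illustration; in the project this role is played by the GloVeEmbedder and real GloVe vectors.

```python
# Toy stand-in for a GloVe table: a tiny hand-made vocabulary of 3-d vectors.
TOY_VECTORS = {
    'paris':   [1.0, 0.0, 0.0],
    'capital': [0.0, 0.0, 1.0],
}

def sentence_embedding(tokens, vectors, dim=3):
    """Mean of the word vectors; out-of-vocabulary words contribute zeros."""
    total = [0.0] * dim
    for tok in tokens:
        vec = vectors.get(tok, [0.0] * dim)
        total = [t + v for t, v in zip(total, vec)]
    return [t / len(tokens) for t in total]
```

Averaging word vectors throws away word order, but it gives a fixed-size sentence representation that a simple classifier can consume, which was exactly what we needed for a quick first iteration.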

We concatenated each question and sentence and fed the result into a shallow feed-forward neural network, a layer of ReLU units followed by a sigmoid unit, which we built with Keras in TensorFlow. Thanks to the straightforwardness of DeepPavlov and Keras, we only needed to write a few lines of code, which left us more time to understand the problem we were working on. The precision of 0.4 we obtained told us that the task was indeed hard: training our own custom model to perform well enough to be the brain of a bot would be challenging, especially since our computing power (and time) was short.
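The architecture is small enough that its forward pass fits in a few lines. This sketch is a plain-Python illustration of the computation (concatenate the two embeddings, apply one ReLU layer, then a sigmoid output); the actual project used Keras, and all weights here are arbitrary placeholders.

```python
import math

def forward(question_emb, sentence_emb, w1, b1, w2, b2):
    """One forward pass of the shallow classifier described in the text."""
    x = question_emb + sentence_emb  # list concatenation = feature concatenation
    # Hidden layer: one ReLU unit per row of w1.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    # Single sigmoid output unit: probability the sentence contains the answer.
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))
```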

After gaining this appreciation for the question answering task, we decided to use pre-trained question answering models. Luckily, DeepPavlov comes with question answering systems pre-trained on SQuAD, so when we wanted a proper question answering system, we could get one in just two lines of code. Downloading, in this case, didn't even need a separate function: all dependencies (char embeddings, word embeddings, and model weights) are downloaded on the first run and stored for later ones.
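The original snippet did not survive extraction, but DeepPavlov's documented API for this is essentially a two-liner. The sketch below wraps it in a function; it assumes `pip install deeppavlov`, and the first call downloads the embeddings and weights.

```python
def load_squad_model():
    """The DeepPavlov 'two-liner': build a SQuAD-pretrained QA model.
    download=True fetches char embeddings, word embeddings, and weights
    on the first run and caches them for later ones."""
    from deeppavlov import build_model, configs
    model = build_model(configs.squad.squad, download=True)
    return model

# Usage sketch (after loading):
#   answer, start_char, score = model(['<background context>'], ['<question>'])
# The model returns the answer span, its starting character in the context,
# and a confidence score.
```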

Before trying the DeepPavlov model, we were prepared to spend most of a day making the pre-trained model work, and were already thinking about our next step, moving the model into Telegram. In reality, the model worked right out of the box, and we had a full extra day for coding. Calling the model with a background context and a question returns a span from the context that answers the question, together with the starting character of the span and a confidence score. Moreover, the response time was under a second.

Aiming high

We decided we had time to make a fortune-telling bot by adding to our Q&A capability an extra pre-trained language model that could generate text on its own. The language model should generate coherent predictions about the future. Our get_future_prediction function does exactly that, leveraging GPT-2's capacity to generate coherent text from some seed text.

The user's name is provided as a parameter to the function, which then creates a starting text containing some seed tokens and an opening sentence with the user's name. We use this as a starting point and generate a continuation with GPT-2. To prevent GPT-2 from drifting off-topic, we generate only 80 tokens at a time, after which we remove trailing tokens so that we are left with complete sentences. We then feed the starting text plus the generated continuation back into GPT-2 and generate 80 more tokens. We repeat this process until we have a text about 400 characters long that usually looks like the kind of vague future prediction an oracle would make.
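The loop above can be sketched as follows. This is a reconstruction, not the project's actual code: the seed sentence is a made-up placeholder, and `generate(text, n_tokens)` stands in for a call into the GPT-2 sampler, so any text-continuation function can be plugged in.

```python
def get_future_prediction(name, generate, target_len=400, chunk_tokens=80):
    """Grow a fortune-telling text in 80-token chunks until it is
    at least `target_len` characters long, keeping only complete sentences."""
    # Placeholder seed text; the real bot used its own fortune-telling seed.
    text = "The oracle gazes into the mist. {}, your future unfolds.".format(name)
    while len(text) < target_len:
        continuation = generate(text, chunk_tokens)
        # Trim trailing tokens so we keep only complete sentences.
        cut = continuation.rfind('.')
        if cut == -1:
            break  # nothing sentence-shaped came back; stop rather than loop
        text += continuation[:cut + 1]
    return text
```

Feeding the accumulated text back in as the next prompt is what keeps the generation on-topic: each chunk is conditioned on everything produced so far.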

We got the pre-trained GPT-2 model from the official OpenAI implementation and had to spend some time making it work. After some adjustment, the model generated futures in approximately two minutes, which we considered adequate for our purpose. We simply made our Telegram bot ask the user to wait, then send a message when the future was ready to be questioned.

The Oracle

Time to put it all together. An overview of our bot's architecture is shown in the image. When a user starts a conversation with our oracle, the chatbot begins by generating a 400-character future prediction based on the user's name and some standard fortune-telling sentences. Once the prediction is generated, the chatbot uses it as the background context for answering questions about the user's future. This is where DeepPavlov's pre-trained SQuAD question answering system comes in: it takes the user's question and the generated text, tries to find a span of the generated text that appears to answer the question, and sends the answer back to the user.
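The overall flow can be captured in a small orchestration class. This is an illustrative sketch only: `predict_future(name)` and `answer(context, question)` are hypothetical stand-ins for the GPT-2 generator and the DeepPavlov QA model, injected so the wiring is visible without either heavy dependency.

```python
class OracleBot:
    """Generate one future per user, then answer questions against it."""

    def __init__(self, predict_future, answer):
        self.predict_future = predict_future  # name -> fortune text
        self.answer = answer                  # (context, question) -> span
        self.futures = {}                     # one generated future per user

    def start(self, user_name):
        # Runs once per conversation; this is the slow GPT-2 step.
        self.futures[user_name] = self.predict_future(user_name)
        return "Your future is ready. Ask away!"

    def ask(self, user_name, question):
        # Fast step: extractive QA over the stored future text.
        context = self.futures[user_name]
        return self.answer(context, question)
```

Separating the slow one-off generation from the sub-second QA step is what let the Telegram bot stay responsive: the user waits once, then questions the same future as many times as they like.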

The entire source code for the bot is in the GitHub repo. Generating the oracle predictions (GPT-2 text generation) is quite resource-intensive, so instead of running it locally, we ran it on Colab. If you run the notebook on a CPU-only computer, expect this task to take several minutes. We recommend either running the notebook on a computer with a GPU or using Google Colab with a GPU runtime, which you can use for free. If you try the bot alone, two minutes is more than enough to generate a future. With many people, as we discovered in our live demo, it may take a lot longer.

That’s it. You now have your own personal fortune-telling oracle chatbot in a few lines of code.

Beware, because you may not want to see what the future holds.

We would like to thank the CISS organizers for enabling this huge exchange of experience and making all of this possible, and a big thanks to the other school participants for the conversations, open discussion, and feedback. Our project would certainly have fallen short without the vivid and creative environment around us.

From Organizers

We hope this was helpful and that you'll be eager to use the DeepPavlov library 😃. And don't forget that DeepPavlov has a forum: ask us anything about the framework and the models there, and we'll get back to you ASAP. Thank you for reading!


Estevão Uyrá Pardillos Vieira

Master in Neuroscience and Cognition, Data Scientist @ Wildlife Studios. AI and Complexity enthusiast.