Yes, it really is that easy to analyse your chatbot in five simple steps!
But seriously, we all know the importance of doing endless rounds of training and testing before we release our chatbots into the real world.
But the problem with NLP training is that when we look at our training data, we’re looking at it from a human perspective.
And we’re probably trying to make improvements, tweaks, and additions from a human perspective, too.
But the best way to understand the principles of chatbot performance is to think of it from an NLP point of view.
The job of finetuning our chatbot training data would be made so much easier if we knew the algorithms that the NLP providers use.
But the problem is, we don’t: these algorithms are a black box to us chatbot builders, and we’ll never get access to them.
However, QBox is the perfect tool to help give insight into our training data, enabling us to make informed decisions about how to develop the performance of our chatbots.
QBox does this by analysing and benchmarking the chatbot training data, visualising where it does and doesn’t perform well, and explaining why, for our specific NLP provider.
This is achieved in five simple steps:
Step 1 – Test
You will first need to download your model file from your chosen provider, and then run your first test in QBox.
This will start the process of analysing the performance of your training data for your chosen NLP provider.
But there’s more. A standard test will analyse your training data in your model, but there is also the option of doing a cross validation test, where QBox can analyse the performance of data the model has not been trained on.
Both types of tests are just as important as each other: standard testing for assessing the strength of the training data within your chatbot model, and cross validation testing for seeing if there are any gaps in the chatbot’s knowledge and getting an idea of how it will perform in the real world.
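The gap between these two test types can be sketched in code. The toy classifier, intent names, and utterances below are all invented for illustration (this is not QBox’s algorithm); it simply shows why a model can look perfect on its own training data while a cross-validation style test, which holds each utterance out before predicting it, exposes gaps.

```python
# Illustrative sketch only - a toy word-overlap "classifier" scored two
# ways: on its own training data ("standard" test) and with a
# leave-one-out style test ("cross validation"). Data is invented.

TRAINING_DATA = {
    "greeting": ["hello there", "hi how are you", "good morning"],
    "goodbye":  ["bye for now", "see you later", "good night"],
}

def predict(utterance, data):
    """Pick the intent whose utterances share the most words."""
    words = set(utterance.split())
    best_intent, best_overlap = None, -1
    for intent, examples in data.items():
        overlap = sum(len(words & set(e.split())) for e in examples)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    return best_intent

def standard_test(data):
    """Accuracy measured on the training utterances themselves."""
    total = correct = 0
    for intent, examples in data.items():
        for e in examples:
            total += 1
            correct += predict(e, data) == intent
    return correct / total

def cross_validation_test(data):
    """Hold each utterance out of the training data before predicting it."""
    total = correct = 0
    for intent, examples in data.items():
        for e in examples:
            held_out = {i: [x for x in ex if not (i == intent and x == e)]
                        for i, ex in data.items()}
            total += 1
            correct += predict(e, held_out) == intent
    return correct / total
```

On this tiny data set the standard test scores a perfect 1.0 while the held-out test scores far lower, which is exactly the kind of gap a real-world user would feel but a training-data-only test would hide.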
Step 2 – Identify
Once the results of the test are revealed, QBox will give three scores for the model as a whole. One each for correctness, confidence and clarity.
These three scores act as KPIs and will give a good idea of the strength of the model’s performance.
Each intent in the model also has the same three scores, and this enables us to easily identify the poorest-performing ones in the model, i.e. the ones with the lowest scores for correctness (first and foremost), confidence and clarity.
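To make the three per-intent scores concrete, here is one plausible interpretation of what they might capture; QBox’s actual formulas are not public, so these definitions (correctness as accuracy, confidence as the winning probability, clarity as the margin between the top two intents) and the example numbers are assumptions for illustration only.

```python
# Hedged sketch of per-intent "correctness", "confidence" and "clarity"
# scores - NOT QBox's actual formulas, just one plausible reading.
# Input: for each test utterance, the true intent and the model's
# probability per intent (all numbers are invented).

RESULTS = [
    # (true_intent, {intent: predicted_probability})
    ("greeting", {"greeting": 0.90, "goodbye": 0.10}),
    ("greeting", {"greeting": 0.55, "goodbye": 0.45}),
    ("goodbye",  {"greeting": 0.60, "goodbye": 0.40}),  # misclassified
]

def intent_scores(intent, results):
    rows = [(t, p) for t, p in results if t == intent]
    # Correctness: share of this intent's utterances predicted correctly.
    correct = [p for t, p in rows if max(p, key=p.get) == t]
    correctness = len(correct) / len(rows)
    # Confidence: average probability given to the winning intent.
    confidence = sum(max(p.values()) for _, p in rows) / len(rows)
    # Clarity: average gap between the top two intent probabilities.
    margins = []
    for _, p in rows:
        top_two = sorted(p.values(), reverse=True)[:2]
        margins.append(top_two[0] - top_two[1])
    clarity = sum(margins) / len(margins)
    return correctness, confidence, clarity
```

Under this reading, an intent can be fully correct yet have low clarity (the second utterance wins by only 0.10), which is why all three scores matter when ranking intents to fix.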
Step 3 – Analyse
Once the poor-performing intents have been identified, you can start the deep analysis by selecting the one with the lowest score for correctness.
This will be the best place to start making improvements, because once you start improving the correctness score of an intent, the confidence and clarity will generally follow suit.
There are a variety of features available in QBox to deeply analyse each utterance within the poorly performing intent:
- Diagrams for correctness, confidence and clarity that display the training data within the intent, in easy-to-understand formats so you can see which utterances are causing confusion and need to be fixed
- Word Density feature to see a quick snapshot of any words in the utterance that may be over-represented or under-represented, both at intent and global model level
- Training Data feature with colour coding to see at a glance any overuse of certain words and phrases, and to compare the training data of confused intents side by side
- Explain feature to give a deeper word-by-word analysis of how much influence each word has on the intent prediction.
All these features in QBox will provide the information necessary to understand why the utterance is poorly performing so that the user has the knowledge needed to fix the issue.
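The word-density idea above can be illustrated with a short sketch. QBox’s Word Density feature is more sophisticated than this; the function below (and the invented intent data) only shows the underlying idea of comparing how often a word appears within one intent against the model as a whole.

```python
# Toy word-density style check - illustrative only, not QBox's feature.
# Flags the share of utterances containing each word, per intent and
# across the whole (invented) model.

TRAINING_DATA = {
    "order_pizza": ["please order a pizza", "order pizza now",
                    "can you order pizza"],
    "order_taxi":  ["please book a taxi", "get me a cab"],
}

def word_density(data):
    """Share of utterances containing each word, per intent and globally."""
    all_utterances = [u for examples in data.values() for u in examples]

    def share(word, utterances):
        return sum(word in u.split() for u in utterances) / len(utterances)

    report = {}
    for intent, examples in data.items():
        for word in {w for u in examples for w in u.split()}:
            report[(intent, word)] = (share(word, examples),
                                      share(word, all_utterances))
    return report

densities = word_density(TRAINING_DATA)
# "order" appears in every order_pizza utterance but in only some of
# the model overall - a candidate over-represented word for that intent.
intent_share, global_share = densities[("order_pizza", "order")]
```

A large gap between the intent-level share and the global share is the kind of signal that tells you a single word, rather than the overall phrasing, may be driving the intent prediction.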
Step 4 – Fix
Once you understand where the issues are in your training data, you can make the appropriate changes.
This will be done within your NLP provider.
The changes might include adding new training data to reinforce a concept being expressed within an intent, removing training data that isn’t helping the model at all, moving training data to a better-placed intent, or a variety of other fixes.
Step 5 – Validate
After you’ve made the fixes to your model, the last step is to run another test in QBox.
Download the fixed model from the NLP provider and then simply run your next test in QBox.
The second test will enable you to:
- Validate if the fixes you made were successful
- Detect what effect your fixes have had on the rest of the model.
QBox will compare the second test to the first test so you can easily see if the three scores for the intent you’ve tried to fix have improved.
It will also alert you to any improvements or regressions that have affected other intents in the model.
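The before/after comparison described above can be sketched as a simple diff of per-intent scores. QBox presents this comparison in its UI; the function, threshold, and score values below are invented purely to illustrate flagging improvements and regressions across the model.

```python
# Illustrative before/after comparison of per-intent scores - the
# intents, scores, and 0.05 tolerance are invented for this sketch.

BEFORE = {"greeting": 0.62, "goodbye": 0.91, "order_pizza": 0.88}
AFTER  = {"greeting": 0.85, "goodbye": 0.90, "order_pizza": 0.79}

def compare_runs(before, after, tolerance=0.05):
    """Classify each intent as improved, regressed, or unchanged."""
    verdicts = {}
    for intent in before:
        delta = after[intent] - before[intent]
        if delta > tolerance:
            verdicts[intent] = "improved"
        elif delta < -tolerance:
            verdicts[intent] = "regressed"
        else:
            verdicts[intent] = "unchanged"
    return verdicts
```

In this example the intent you fixed (greeting) improves, but an untouched intent (order_pizza) regresses, which is exactly the side effect the validation step is there to catch.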
It’s as easy as that: just follow these five simple steps and you’ll soon be on your way to really understanding how your training data is viewed from the NLP provider’s point of view.
This will then give you the knowledge to put any necessary fixes in place to improve the performance of your chatbot!
To find out more about QBox, watch the video ‘QBox in 5 Simple Steps’.
Want to try QBox? Get 5 free tests here.
Read another blog in the series, How to Build and Scale a Chatbot