What’s the Optimum Confidence Threshold for my Chatbot and why do I need one?

What is a confidence threshold and why do we need it?

When a user’s question matches an intent that the chatbot model has been trained on, the NLP provider returns that intent with a confidence score, expressed as a percentage. (In fact, NLP providers will typically return up to ten possible intent predictions, in order of decreasing confidence score.)

This percentage represents how confident the NLP provider was in that intent prediction.  The higher the score, the greater the confidence in that prediction.   

But what if the top intent prediction has a confidence of only 25%? You wouldn’t want a user to be presented with an answer when the chatbot model is not very confident in its prediction. This is where the confidence threshold comes in.

If a confidence threshold is set in the model, its function is to only present the user with the top predicted intent if the confidence of that prediction is above the set threshold.  

If the confidence of the top predicted intent falls below this threshold, the user will instead be presented with the fallback answer, “I’m sorry, I don’t understand” (or something similar), or handed over to a human agent for assistance.
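To make this concrete, here is a minimal Python sketch of the decision. The function name, the answer lookup and the shape of the predictions list are all hypothetical, and it assumes confidence is expressed as a fraction between 0 and 1 and that a score exactly equal to the threshold counts as passing. Your NLP provider’s actual response format will differ.

```python
FALLBACK_ANSWER = "I'm sorry, I don't understand."


def choose_response(predictions, answers, threshold=0.5):
    """Return the answer for the top intent, or the fallback answer.

    predictions: list of (intent_name, confidence) pairs, confidence in [0, 1].
    answers: dict mapping intent_name to the answer text for that intent.
    threshold: minimum confidence needed to present the top intent's answer.
    """
    if not predictions:
        return FALLBACK_ANSWER

    # Pick the prediction with the highest confidence score.
    top_intent, top_confidence = max(predictions, key=lambda p: p[1])

    # Assumption: a score equal to the threshold is treated as passing.
    if top_confidence >= threshold:
        return answers[top_intent]
    return FALLBACK_ANSWER


# Illustrative data only.
answers = {
    "opening_hours": "We open at 9am.",
    "reset_password": "Click 'Forgot password' on the login page.",
}
predictions = [("opening_hours", 0.25), ("reset_password", 0.10)]
print(choose_response(predictions, answers, threshold=0.5))
```

With a 50% threshold, the 25% prediction in this example is not presented to the user and the fallback answer is returned instead.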



What level do you set your confidence threshold to?

A typical threshold may be set to a default level of 50%, and this threshold may or may not work for your chatbot. 

If you are risk averse, you may want to set the confidence threshold higher to minimize the risk of giving incorrect answers to your users.

However, the downside is that your chatbot may give too many fallback answers (or pass users to a human agent too frequently, defeating the purpose of having a chatbot!), even when the predicted intent was correct and had a reasonably high confidence.

Conversely, you might be comfortable with a little bit of risk, and so you’ll want to set your confidence threshold lower to maximize the number of correct answers given to your users.

But consequently, the number of incorrect responses could potentially be higher.  

So, finding out what works best for your chatbot could involve some trial and error, through constant testing and re-adjusting of the threshold level.
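The trade-off is easier to see with numbers. The sketch below is illustrative only: the test results are made up, and `outcome_rates` is a hypothetical helper. It assumes you have, for each test question, the confidence of the top prediction and whether that prediction was the correct intent.

```python
def outcome_rates(results, threshold):
    """Return (correct, incorrect, fallback) as fractions of all questions.

    results: list of (confidence, is_correct) pairs for the top prediction
             of each test question, with confidence in [0, 1].
    """
    total = len(results)
    # Answered and right: confidence passes the threshold, intent is correct.
    correct = sum(1 for conf, ok in results if conf >= threshold and ok)
    # Answered and wrong: confidence passes the threshold, intent is incorrect.
    incorrect = sum(1 for conf, ok in results if conf >= threshold and not ok)
    # Everything else falls below the threshold and gets the fallback answer.
    fallback = total - correct - incorrect
    return correct / total, incorrect / total, fallback / total


# Made-up test results: (confidence of top prediction, was it correct?)
results = [
    (0.92, True), (0.81, True), (0.67, True), (0.58, False), (0.55, True),
    (0.44, True), (0.41, False), (0.33, False), (0.28, True), (0.15, False),
]

for threshold in (0.3, 0.5, 0.7):
    correct, incorrect, fallback = outcome_rates(results, threshold)
    print(f"threshold={threshold:.0%}  correct={correct:.0%}  "
          f"incorrect={incorrect:.0%}  fallback={fallback:.0%}")
```

On this toy data, raising the threshold from 30% to 70% removes every incorrect answer, but it also pushes most questions to the fallback answer, including several that would have been answered correctly.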


But how can you accurately gauge what the optimum confidence threshold is for your chatbot model?

Traditionally, the best way to find the optimum confidence threshold would be to calculate a receiver operating characteristic (ROC) curve.

This is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.  (read more about this here).  
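As a rough illustration of what that calculation involves, the sketch below computes the points of a ROC curve in plain Python. It rests on some assumptions: each test question is reduced to a binary outcome (the top prediction was correct or not), the confidence score is used as the ranking score, and the data is made up. It also picks a threshold using Youden’s J statistic (TPR minus FPR), which is one common heuristic rather than the only way to read a ROC curve.

```python
def roc_points(results, thresholds):
    """Return a list of (threshold, tpr, fpr) tuples.

    results: list of (confidence, is_correct) pairs for the top prediction
             of each test question.
    TPR: share of correct predictions at or above the threshold.
    FPR: share of incorrect predictions at or above the threshold.
    """
    positives = sum(1 for _, ok in results if ok)
    negatives = len(results) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for conf, ok in results if ok and conf >= t)
        fp = sum(1 for conf, ok in results if not ok and conf >= t)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((t, tpr, fpr))
    return points


def best_threshold_by_youden(points):
    """Return the threshold that maximizes Youden's J = TPR - FPR."""
    return max(points, key=lambda p: p[1] - p[2])[0]


# Made-up test results: (confidence of top prediction, was it correct?)
roc_results = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
roc_curve = roc_points(roc_results, [0.3, 0.5, 0.7])

for threshold, tpr, fpr in roc_curve:
    print(f"threshold={threshold:.0%}  TPR={tpr:.2f}  FPR={fpr:.2f}")
print("Suggested threshold:", best_threshold_by_youden(roc_curve))
```

Plotting FPR on the x-axis against TPR on the y-axis for many thresholds gives the ROC curve itself.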

But this is hugely time-consuming, especially for larger chatbot models, and it can also be difficult to interpret the results for anyone unfamiliar with such statistics.

For those of us who haven’t got the time (or the knowledge or confidence) to calculate a ROC curve, there is thankfully an easier way to find the optimum confidence threshold for your chatbot: the QBox Confidence Threshold Analysis feature.

QBox Threshold Analysis Feature

In just a few minutes, QBox can give you the information needed to set the right confidence threshold for your chatbot.

All that’s needed is a good cross-validation dataset that spans all the intents in your chatbot model, and a cross-validation test run within the tool.

It’s as simple as that!  

You have three options available to enable you to meet business KPIs:

  1. You can adjust the confidence threshold to any level to find out how your chatbot would perform on the cross-validation dataset.  For example, you might be curious to see how it would perform if you had a 40% threshold set.  QBox would then inform you what percentage of cross-validation questions would be answered correctly, answered incorrectly, or left unanswered (i.e. below the confidence threshold).
  2. You can adjust the percentage of correctly answered questions.  For example, you may have a KPI stating the chatbot must perform at a correctness rate of no lower than 95%.  QBox would recommend the level to set your confidence threshold to achieve this level of correctness.  
  3. You can adjust the percentage of incorrectly answered questions.  So, if you have a KPI stating the chatbot must have no more than 2% of incorrectly answered questions, again QBox would recommend the level to set your confidence threshold to meet this KPI.
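To illustrate the idea behind the second and third options, here is a hedged Python sketch of how a threshold could be searched for from a KPI. It is not how QBox works internally; all function names are hypothetical and the data is made up. It assumes “correctness” means correct answers as a share of answered questions, that “incorrect” is measured as a share of all questions, and that you want the lowest threshold meeting the KPI so that as many questions as possible are still answered.

```python
def counts_at(results, threshold):
    """Return (correct, incorrect, answered) counts at a given threshold."""
    answered = [ok for conf, ok in results if conf >= threshold]
    correct = sum(1 for ok in answered if ok)
    return correct, len(answered) - correct, len(answered)


def lowest_threshold_meeting(results, kpi_met, steps=100):
    """Scan thresholds 0%, 1%, ..., 100%; return the lowest one that meets
    the KPI, or None if no threshold in the scan does."""
    total = len(results)
    for i in range(steps + 1):
        threshold = i / steps
        correct, incorrect, answered = counts_at(results, threshold)
        if kpi_met(correct, incorrect, answered, total):
            return threshold
    return None


def min_correctness(target):
    """KPI: at least `target` of the answered questions are correct."""
    return lambda correct, incorrect, answered, total: (
        answered > 0 and correct / answered >= target
    )


def max_incorrect(limit):
    """KPI: at most `limit` of all questions are answered incorrectly."""
    return lambda correct, incorrect, answered, total: incorrect / total <= limit


# Made-up cross-validation results: (confidence, was the top intent correct?)
cv_results = [
    (0.92, True), (0.81, True), (0.67, True), (0.58, False), (0.55, True),
    (0.44, True), (0.41, False), (0.33, False), (0.28, True), (0.15, False),
]

print(lowest_threshold_meeting(cv_results, min_correctness(0.90)))
print(lowest_threshold_meeting(cv_results, max_incorrect(0.10)))
```

Note that correctness among answered questions does not always rise steadily as the threshold increases, so on real data it is worth checking the rates at thresholds around any suggested value.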

All of these options will help you decide on the best confidence threshold to set for your use case: can you accept a certain level of risk, and therefore increase the level of automation, or are you in a scenario where you have to be risk averse?

With this Threshold Analysis feature in QBox, you will quickly and easily understand the trade-off for each scenario. 

To find out more, visit QBox.ai.


Alison Houston

Alison is our Data Model Analyst and builds and trains chatbot models for clients. She also provides advice and troubleshooting support for clients who are struggling with the performance of their own chatbots.
