When we talk about testing here at LevelFish, we really mean testing warm-up training data (usually a data set built before the chatbot itself), with folks working on a data set from ground zero.
Our testing service is part of a wider process: we generate additional alternative phrases by machine, then test the data ‘before and after’ adding them to see whether you have moved the needle positively, that is, whether the chatbot's NLU (how it understands dialog) is now more likely to understand your users.
Let's not get confused here: LevelFish’s solutions are designed to do two very specific things.
i) Amplify your seed data, and then
ii) test the ‘before and after’ data sets for accuracy using industry-standard K-Fold and F-Score testing.
For the experts out there: yes, of course you would expect to see a big improvement in accuracy % by doubling or tripling your training data set from 3 utterances per intent to 8 or 10. It is also useful to see intent confusion matrices early in the process to improve understanding, and more data helps reduce bias, so all good there, I think you will agree.
The main reason you might also run K-Fold tests early in a corporate setting is to provide more proof of the chatbot's milestone improvements to secure budget / funding, AND to ensure early testers stay interested and engaged rather than getting the red-screen-of-death equivalent of ‘I am sorry but I can’t understand you’ or ‘please can you ask that in a different way’.
What does LevelFish’s quickstart K-Fold testing actually do? In a nutshell, it automatically splits your warm-up training data (let's say 30 intents with 2-3 seed utterance examples each) into 3 sets or folds, so K = the number of folds (think folders) containing equal amounts of data.
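To make the split concrete, here is a minimal sketch of a 3-fold split in plain Python. It is not LevelFish's actual code: the function name `k_fold_splits`, the sample utterances and the intent names are all made up for illustration, and a real tool may balance intents across folds more carefully.

```python
import random

def k_fold_splits(examples, k=3, seed=0):
    """Yield (train, test) pairs; every example is used for testing exactly once."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled examples round-robin so fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test

# Hypothetical warm-up data: (utterance, intent) pairs, 3 seeds per intent.
data = [
    ("where is my order", "track_order"),
    ("has my parcel shipped yet", "track_order"),
    ("track my delivery", "track_order"),
    ("cancel my order please", "cancel_order"),
    ("I want to stop my purchase", "cancel_order"),
    ("please call off the order", "cancel_order"),
    ("can I get my money back", "refund"),
    ("I would like a refund", "refund"),
    ("refund my payment", "refund"),
]

splits = list(k_fold_splits(data, k=3))
for train, test in splits:
    print(len(train), "training examples,", len(test), "test examples")
```

Each round trains on two thirds of the data and tests on the remaining third, so every utterance gets a turn as unseen test data.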
As a comparison for more advanced folks who have already chosen a chatbot platform, for example Rasa: you may be offered five (5) folders or K-Folds during testing, as you are likely to have much bigger data sets by then and to be in the final production or pre-release phase, where other analysis tools offer value.
As K refers to the number of folders, note that some hardcore solutions (read: powerful and costly) will keep running as you grow to 100, 200, 300, 400, 500 or more intents, and might be powering heavyweight banking or insurance experiences which need a dedicated testing suite like Qbox or Botium. So LevelFish is a nice fit at the front end of the data generation process and powerful enough to take you to the point of chatbot selection in any setting. We do not aim to play in the big deployment testing game post production, and neither do we carry the big $ license fees those platforms do.
The challenge, of course, for non-technical folks, and especially for low or no code platforms, is that these test solutions are both expensive to license (euro 10,000 to 30,000 per chatbot) and overly complex to get started with in terms of installation, training and set up. Further, some platforms like Rasa require Linux and Python environments. So if, like me, you just want a steer that you are going in the right direction, to create some better data and to check your chatbot is going to understand most questions, then LevelFish is ideal: low effort and zero cost for ‘starter uppers’, meaning people at the start line who are still imagining their chatbot.
Our team and solution lean on IBM Watson technology for K-fold testing, as it is the market leader. Our K-fold code takes your data set and creates 3 folders, splitting the data in thirds to carry out a train and test process. It outputs the results in CSV format with three rather useful diagrams showing precision, recall and confusion, both before and after enhancing with new phrases.
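For anyone wondering what sits behind those precision, recall and confusion figures, here is a small sketch in plain Python. It is an illustration only, not the LevelFish or IBM Watson implementation: the `evaluate` function and the sample predictions are hypothetical, and it assumes you already have a list of (true intent, predicted intent) pairs from the test folds.

```python
from collections import Counter

def evaluate(pairs):
    """Return confusion counts and per-intent (precision, recall, F1)."""
    confusion = Counter(pairs)  # (true_intent, predicted_intent) -> count
    intents = sorted({t for t, _ in pairs} | {p for _, p in pairs})
    report = {}
    for intent in intents:
        tp = confusion[(intent, intent)]
        # False positives: predicted as this intent but actually another one.
        fp = sum(c for (t, p), c in confusion.items() if p == intent and t != intent)
        # False negatives: actually this intent but predicted as another one.
        fn = sum(c for (t, p), c in confusion.items() if t == intent and p != intent)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[intent] = (precision, recall, f1)
    return confusion, report

# Hypothetical test-fold results: (true intent, predicted intent).
pairs = [
    ("track_order", "track_order"),
    ("track_order", "track_order"),
    ("track_order", "cancel_order"),
    ("cancel_order", "cancel_order"),
    ("cancel_order", "cancel_order"),
    ("refund", "cancel_order"),
    ("refund", "refund"),
]

confusion, report = evaluate(pairs)
for intent, (precision, recall, f1) in report.items():
    print(f"{intent}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same calculation on the ‘before’ and ‘after’ data sets gives you a like-for-like comparison of how much the extra phrases helped.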
To pigeonhole LevelFish tools, you are looking at warm-up training data, say 5-100 intents. We suggest that subject matter experts, conversational designers and chatbot builders deliver up to three (3) diverse utterances per intent (it is generally accepted that one person becomes a little repetitive if they create more than 3). Hence the use of real customer data if you can get it (which may be unstructured and take time to prepare) or, on a slower and more costly paid-for basis, colleagues, consultants and crowdsourcing to get more variance.
Three varied utterances can then be amplified with anywhere from 3-15 suggested utterances from LevelFish’s paraphrasing solution (download it FREE by clicking > here). If you want to translate inbound text from another language into English and then paraphrase it, you’ll need our API or a dedicated Google Sheet / Excel version with translate enabled. Email email@example.com to know more on that; we are happy to help and give you a trial, and we promise not to sell it to you. We want you to come back asking for more from our free trials.
Once the builder has selected 3 or so plum, varied, synthetically created phrases, it's time for that K-Fold testing, to see if your data is going to make your chatbot smart enough to secure interest at your company and the necessary budget to build something that users can interact with.
Cross validation helps you find confusion in your training data and intents. Like humans, the more confused you are, the less likely you are to know what's going on or what somebody wants, so confusion is always a bad thing to have.
Please note that whilst this powerful service does NOT predict runtime performance on utterances, it can help determine if your initial intent structure is clear or confusing, as well as identify places where additional training is needed. And that's mainly what we do: provide additional training data.
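As a rough illustration of how confusion points you to the intents that need more training data, the sketch below ranks the most frequent mix-ups. The `most_confused` helper, the intent names and the results are hypothetical and stand in for whatever your own test run produces.

```python
from collections import Counter

def most_confused(pairs, top=3):
    """Rank (true_intent, predicted_intent) mistakes by how often they occur."""
    mistakes = Counter((t, p) for t, p in pairs if t != p)
    return mistakes.most_common(top)

# Hypothetical cross-validation results: (true intent, predicted intent).
results = [
    ("opening_hours", "store_location"),
    ("opening_hours", "store_location"),
    ("opening_hours", "opening_hours"),
    ("store_location", "store_location"),
    ("refund", "cancel_order"),
]

worst = most_confused(results)
for (true_intent, predicted), count in worst:
    print(f"{true_intent} mistaken for {predicted}: {count} time(s)")
```

The pairs at the top of the list are the intents worth giving more varied utterances, or worth merging if they really mean the same thing.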
If you need to train the service more, then run it through the paraphrase again to enhance the suggestions you chose (as we run a number of differently trained paraphrase engines so you may get somewhat different results each time).
When you consider the learning, time and money involved in creating a chatbot, LevelFish’s paraphraser delivers a FREE solution and testing for start-up projects, with instant results. You don’t need to learn anything to generate better data and, unless you need premium features, you can go a long way fast towards creating a better chatbot and proving it. Until you have user feedback and budget, nothing is going anywhere fast anyway; take our word for it from experience.
So download the add-on right now and try it out by clicking > here, then simply get in touch for us to test your data for free (24 hour process until Q2 2022). If you need to strike a mutual non-disclosure agreement for your data, that's fine as well. We totally respect that, and ours is digitally available.
To conclude, if you think your organisation can benefit from a chatbot, that is step 1. Getting the right training data is the critical step 2; otherwise your users will be talking to a toddler instead of a 12 year old in terms of cognitive understanding, and that's a bad thing.