Three Common Paraphrase Questions
When we speak to chatbot platform owners and product managers in this space, we hear a common set of questions, so we have picked the top three to answer here:
Why do I need to help my customers with training data?
Over the eight years that I have worked on conversational AI and chatbot capabilities, the constant challenge has been training data. Training data stands squarely as the primary obstacle between a bad service and a good one.
Consistently, chatbot platforms, classifier developers, and even consultants charged with building chatbots have made training data the responsibility of the subject matter experts.
The challenge is typically that subject matter experts are not chatbot experts, they are not computational linguists, and they usually have a day job.
So, there is a strong need to offer whatever support you can for creating training data: tools that ease the subject matter experts' struggle and, more importantly, help produce a working chatbot. This is especially true of low-code/no-code solutions.
So we have to ask ourselves whether generating training data works. Metrics are still emerging, but we can already see chatbot platforms adopting this feature. From IBM to Rasa and IPSoft, many platforms now support their customers with training data generation. Moreover, these product features are quickly making their way into the low-code/no-code sector.
What platforms does Levelfish work on?
We are sometimes asked what technology our platform is based on, and which technology sets and data sources we use.
Well, so far, we have declined to give details on the basis that we use a range of technologies: some are very simple, and some are very sophisticated. It is the blend of technologies, training data bundles, and various platforms that allows us to generate a range of unique paraphrases. Typically we generate between 6 and 12 paraphrases per request, and the responses are quality checked so that they stay at a consistently high standard.
We are, of course, continually improving and developing. I would estimate that we add a new paraphrase engine every month, and in doing so, we typically cycle out lower-performing paraphrase engines. This is not always straightforward, because a critical differentiator of our product is an originality coefficient that must be assessed for every one of our paraphrase engines.
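To make the idea of an originality check concrete, here is a minimal sketch. It is not our production method, which we are not disclosing; it assumes a purely hypothetical score defined as one minus the Jaccard word overlap between the request and a candidate paraphrase. The names `originality`, `filter_paraphrases`, and the `min_originality` threshold are all illustrative assumptions.

```python
import re


def tokens(text: str) -> set:
    """Return the lowercase word tokens of a sentence as a set."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def originality(source: str, paraphrase: str) -> float:
    """Hypothetical score: 1 minus Jaccard word overlap.

    0.0 means identical wording, 1.0 means no shared words.
    This is an illustrative stand-in, not a disclosed metric.
    """
    a, b = tokens(source), tokens(paraphrase)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def filter_paraphrases(source, candidates, min_originality=0.3):
    """Keep candidates that are original enough and not duplicates.

    min_originality is an assumed threshold chosen for illustration.
    """
    kept = []
    seen = set()
    for candidate in candidates:
        key = frozenset(tokens(candidate))
        if key in seen:
            continue  # same wording as an earlier kept candidate
        if originality(source, candidate) >= min_originality:
            kept.append(candidate)
            seen.add(key)
    return kept
```

A simple word-overlap score like this rewards different vocabulary but knows nothing about meaning, so a real check would also need to confirm that the paraphrase still expresses the same intent.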
In short, there is a range of technologies, and we are constantly on the move. We would love to be more transparent, but we have IP to protect and wish to be judged on our results for now.
What data do you store?
Apart from the data our accounts team holds, there is very little data that we collect and store long term in our system without express permission.
Of course, when we receive the paraphrase request, we use it, and we generate the paraphrases as a response. These are all stored for a maximum of 6 months and used only for fault resolution.
If you have signed up for the feedback API to assist us in the generation of more training data, then we will, of course, hold those requests and accepted responses for an indefinite period.
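The retention rules above can be sketched as a small check. This is an illustration under stated assumptions, not a description of our actual system: "6 months" is approximated as 183 days, and the function name `is_expired` and the `feedback_opt_in` flag are hypothetical. It also simplifies by treating the whole record as retained on opt-in, rather than only the request and its accepted responses.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "6 months" approximated as 183 days for illustration.
FAULT_RESOLUTION_RETENTION = timedelta(days=183)


def is_expired(stored_at: datetime, now: datetime, feedback_opt_in: bool) -> bool:
    """Return True if a stored request/response pair should be deleted.

    Records from customers who opted in to the feedback API are kept
    indefinitely; all others expire after the retention window.
    """
    if feedback_opt_in:
        return False
    return now - stored_at > FAULT_RESOLUTION_RETENTION


stored = datetime(2024, 1, 1, tzinfo=timezone.utc)
later = datetime(2024, 8, 1, tzinfo=timezone.utc)
print(is_expired(stored, later, feedback_opt_in=False))
print(is_expired(stored, later, feedback_opt_in=True))
```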
NOTE: Our product should never be plugged directly into a live chatbot, and therefore we would never receive personally identifiable data about the end users of your chatbot platform. We are solely here to help your customers build better chatbots by providing better training data.