Cross-Lingual Understanding and Synthetic Data Generation
The first question the LevelFish team is often asked is: can we paraphrase in other languages? The answer is yes… and no.
If you download our free chatbot alternative-phrase generator from the Google Workspace Marketplace, it paraphrases English only by default and is valid for a short trial. It does not support paraphrasing a source phrase written in another language. For that, you can upgrade and get in touch for a richer, even easier-to-use paraphrasing tool with language detection and inbound translation enabled.
You can fill out the contact form here to request a trial Google Sheet or Excel sheet.
More advanced users and chatbot platforms can also integrate with our API, or obtain a branded Google Sheet or Excel sheet with translation enabled for inbound phrases in other languages. For example:
Source Language | Source Phrase | Utterance in English | One of many generated paraphrases |
English Phrase | I would like to make a complaint. | I might want to submit a question. | I would like to make a complaint. |
Dutch Phrase | ik ben op zoek naar een antwoord op mijn vraag | I'm searching for a solution to my inquiry | I'm looking for a solution to my question. |
Dutch Phrase | waar kan ik informatie over producten vinden? | How can i find information about products? | Is there a way I can find information about products? |
French Phrase | Où puis-je trouver des informations sur les produits ? | How can i find product information? | is there a way i can find product information? |
English Phrase | are the beer bombs still available somewhere | Are the beer bombs still available? | are the lager bombs still accessible some place |
Going further than this is currently a big challenge for a start-up like LevelFish. The very large training datasets behind GPT-2, GPT-3 and Microsoft's Turing models are enormous and predominantly English, and the compute cost of training against data at that scale is high.
Add the comparative lack of non-English training data at GPT-2/GPT-3 scale. Then add the need for in-house linguists across many languages to fine-tune and quality-assure the outputs. Together these make for a substantial investment.
LevelFish would love nothing more than to offer cross-lingual understanding and natural language generation in our tools, but for now that remains our moonshot. Today, we can generate large amounts of training data in English, the most widely used business language. That data can start from English phrases or from phrases translated into English from another language.
Fortunately, English is widely used in business around the world, and we are able to build on and fine-tune our engines against training datasets from OpenAI, Microsoft (Turing) and Google.
As we work on generating warm-up training data faster and making it more useful and accurate, our next target is to generate linguistic variations that move progressively further from the source phrase.
For example, it would be great for our service to be able to take the phrase
"My car is covered in mud"
and rephrase it as "I need to wash my car because it is dirty."
We aren't there yet, but that’s our aim and moonshot.
So, if you want to amplify and vary English phrases for now, including phrases translated inbound from another language, get in touch. You can try it out first by downloading the Google add-on paraphraser for FREE here. Note that the free add-on does not support inbound translation.
Whether you call it rephrasing, paraphrasing, phrase variants, replacement phrases, varying a phrase, paraphrasing chatbot training data, utterance generation or anything else, it's all the same to us.
If you want to go further and translate phrases from another language into English before paraphrasing them, you'll need our API or a dedicated Google Sheet / Excel version with translation enabled. Email sales@levelfish.com to find out more. It's inexpensive, though not free (it isn't free for us either!).
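The inbound flow described above chains three steps: detect the source language, translate the phrase into English, then paraphrase the English utterance. Each run produces one row like those in the example table. Below is a minimal Python sketch of that flow under stated assumptions. `build_row`, `ParaphraseRow` and the `toy_*` helpers are hypothetical names for illustration, not the LevelFish API. The toy detector, dictionary translator and word-swap paraphraser only stand in for real language-detection, machine-translation and paraphrase models.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical signatures for the three pluggable steps (NOT LevelFish API calls).
Detector = Callable[[str], str]                # phrase -> language code, e.g. "nl"
Translator = Callable[[str, str], str]         # (phrase, source language) -> English text
Paraphraser = Callable[[str, int], List[str]]  # (English text, n) -> up to n variants


@dataclass
class ParaphraseRow:
    """One sheet row, mirroring the columns of the example table."""
    source_language: str
    source_phrase: str
    english_utterance: str
    paraphrases: List[str] = field(default_factory=list)


def build_row(phrase: str, detect: Detector, translate: Translator,
              paraphrase: Paraphraser, n: int = 3) -> ParaphraseRow:
    """Detect the language, translate inbound to English if needed, then paraphrase."""
    lang = detect(phrase)
    # English phrases skip translation; anything else is translated inbound first.
    english = phrase if lang == "en" else translate(phrase, lang)
    return ParaphraseRow(lang, phrase, english, paraphrase(english, n))


# --- Toy stand-ins so the sketch runs; a real system would call actual models ---
TOY_TRANSLATIONS = {
    "waar kan ik informatie over producten vinden?":
        "Where can I find information about products?",
}


def toy_detect(phrase: str) -> str:
    # Pretend anything in the toy dictionary is Dutch and everything else is English.
    return "nl" if phrase in TOY_TRANSLATIONS else "en"


def toy_translate(phrase: str, lang: str) -> str:
    # Dictionary lookup standing in for machine translation.
    return TOY_TRANSLATIONS[phrase]


def toy_paraphrase(text: str, n: int) -> List[str]:
    # Naive single-word substitutions: each variant changes exactly one word.
    swaps = [("find", "locate"), ("information", "details"), ("products", "items")]
    variants = [text.replace(old, new) for old, new in swaps if old in text]
    return variants[:n]


if __name__ == "__main__":
    row = build_row("waar kan ik informatie over producten vinden?",
                    toy_detect, toy_translate, toy_paraphrase)
    print(row.source_language, "|", row.english_utterance)
    for p in row.paraphrases:
        print("  ", p)
```

Passing each step in as a function keeps the pipeline independent of any particular translation or paraphrase provider. Swapping a toy helper for a real service call should leave `build_row` unchanged.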
Contact us via the web contact form at https://www.levelfish.com, email info@levelfish.com, or book a meeting to chat at https://calendly.com/levelfish.
Useful Reads:
XGLUE: Expanding cross-lingual understanding and generation with tasks from real-world scenarios
Cross-Lingual Natural Language Generation via Pre-Training
Multilingual Whispers: Generating Paraphrases with Translation (https://aclanthology.org/D19-5503.pdf)