Cedille under the hood

29/3/2022

Here are a few statistics and details that offer some insight into how Cedille was trained.

Overview

Cedille is based on GPT-J, the 6-billion-parameter model trained by the EleutherAI community. Our generative model therefore also has 6 billion parameters. It was trained on 78 billion tokens of French text (equivalent to about 300 gigabytes) from the C4 dataset. Training the “Boris” version of the model, named after the famous French writer and singer Boris Vian, took 12 days of compute on a v3-128 TPU.
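The figures above can be related with a little arithmetic. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes). For the throughput estimate, it also assumes that the 12 days were spent entirely on a single pass over the 78 billion tokens. The post states neither, so treat the results as rough orders of magnitude.

```python
# Figures quoted in the post. Decimal gigabytes (10**9 bytes) is an assumption.
TOKENS = 78e9
DATASET_BYTES = 300e9
TRAINING_DAYS = 12

# Average size of one token of French text in the training data.
bytes_per_token = DATASET_BYTES / TOKENS
training_hours = TRAINING_DAYS * 24

# Assumes one pass over all tokens during the whole 12 days (hypothetical).
tokens_per_second = TOKENS / (training_hours * 3600)

print(f"{bytes_per_token:.2f} bytes per token")  # 3.85 bytes per token
print(f"{training_hours} hours of training")      # 288 hours of training
print(f"{tokens_per_second:,.0f} tokens/s")       # roughly 75,000 tokens/s
```

Roughly 3.85 bytes per token is consistent with a subword tokenizer on Latin-script text, where one token typically covers part of a word.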

Benchmarking

It took three months to fix bugs and experiment with the model. We carried out several benchmark tests and found, for example, that Cedille is better at translating into French than GPT-3 and is less toxic on average. The benchmarks included OrangeSum for summarization, WikiText-FR for perplexity, and WMT14-en-fr for translation.

OrangeSum measures the model's ability to summarize texts. It is similar to the XSum dataset but in French, and was built from articles on the “Orange Actu” website. Cedille currently achieves a ROUGE score of 13.7%, compared with 15.49% for GPT-3 (Davinci) and 10.2% for GPT-fr.
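ROUGE scores a generated summary by its n-gram overlap with a reference summary. The post does not say which ROUGE variant or tokenization produced the 13.7% figure. The minimal sketch below illustrates only ROUGE-1 F1, which counts unigram overlap, and uses naive lowercase whitespace tokenization. Published numbers are normally computed with a standard ROUGE package.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Counter intersection keeps the minimum count of each shared word,
    # so a repeated word is only credited as often as it appears in both.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# All 3 candidate words appear in the 6-word reference:
# precision = 1.0, recall = 0.5, so F1 = 2/3.
print(rouge1_f1("le chat dort", "le chat dort sur le canapé"))
```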

Using the WikiText-FR corpus introduced in the GPT-fr research article, which is composed of thousands of high-quality French Wikipedia articles, we measured the model's perplexity. Perplexity reflects how well a model predicts the next word in a given document: the lower the perplexity, the more accurate the predictions. Cedille achieved a perplexity of 3.932, compared with 3.993 for GPT-3 (Davinci).
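Perplexity is the exponential of the average negative log-likelihood that the model assigns to each actual next token, PPL = exp(-(1/N) Σ log p(w_i | w_<i)). The sketch below computes it from a list of per-token probabilities. In the real evaluation these probabilities would come from the model. The post does not say whether the score is measured per word or per subword token, which affects the absolute value.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the mean negative log-probability of the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# A model that always gives the correct next token probability 1/4 is as
# uncertain as a uniform choice among 4 options, so its perplexity is 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ≈ 4.0
```

Read this way, a perplexity of about 3.9 means the model is, on average, roughly as uncertain as if it were choosing among four equally likely continuations.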

On the WMT14-en-fr dataset, we measured Cedille's performance at translating English into French. In the English-to-French direction, Cedille achieved the highest BLEU score, 24.91%, compared with 20.4% for GPT-3 (Davinci), 14.84% for GPT-J, and 1.47% for GPT-fr.
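BLEU combines clipped n-gram precisions (usually n = 1 to 4) through a geometric mean and multiplies the result by a brevity penalty that punishes translations shorter than the reference. The post does not say which BLEU implementation or tokenization was used. The sketch below is a simplified single-sentence, single-reference version without smoothing, for illustration only. Benchmark scores such as 24.91 are normally computed at corpus level with a standard tool and reported on a 0 to 100 scale, whereas this function returns a value between 0 and 1.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU for one candidate against one reference."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum((cand_counts & ngrams(ref, n)).values())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, a zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: 1 if the candidate is longer than the reference.
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


# Every n-gram matches, but the candidate is 4 words against 6,
# so only the brevity penalty exp(1 - 6/4) = exp(-0.5) applies.
print(sentence_bleu("le chat est sur", "le chat est sur le tapis"))
```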

Detoxifying

The full dataset was cleaned and detoxified with Detoxify. We spent a lot of time and energy reducing the model's toxicity, which resulted in a small but measurable improvement.
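Detoxify assigns toxicity scores to text, and one common way to use such scores for dataset cleaning is to drop documents that score above a cutoff. The post does not describe the exact procedure, model, or threshold, so the sketch below is a hypothetical illustration rather than Cedille's pipeline. It takes the scorer as a parameter, which in practice could be a wrapper around a Detoxify model's `predict` method. It also uses an assumed threshold of 0.5 and replaces the real classifier with a toy scorer so that it runs on its own.

```python
from typing import Callable, Iterable, List, Tuple


def filter_toxic(
    documents: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the post
) -> Tuple[List[str], int]:
    """Keep documents whose toxicity score is below `threshold`.

    Returns the kept documents and the number of documents dropped.
    """
    kept: List[str] = []
    dropped = 0
    for doc in documents:
        if score(doc) < threshold:
            kept.append(doc)
        else:
            dropped += 1
    return kept, dropped


# Toy scorer standing in for a real toxicity classifier (hypothetical).
def toy_score(text: str) -> float:
    return 0.9 if "insulte" in text else 0.1


docs = ["Bonjour à tous", "une insulte grossière", "Le temps est beau"]
kept, dropped = filter_toxic(docs, toy_score)
print(kept, dropped)
```

Passing the scorer in as a function keeps the filtering logic independent of any particular classifier, so the threshold and model can be changed without touching the loop.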

We are well aware that there’s still a lot of work to be done on this front. We will publish more information about this soon!

Testing

Our team has done some initial testing of Cedille's capabilities for potential applications such as chatbots, translation into French, and writing fictitious articles. However, there is huge potential for applications we did not think of! So far, it is the largest French model available, and it beats all existing models (including GPT-3) in terms of French perplexity. We used the results from this initial testing to craft the examples on our playground.

Please go ahead and try them out!

Acknowledgements