A few months ago, combining a market opportunity with our team's skills at Coteries, we launched cedille.ai, the largest (and best-performing!) French language model on the market. In other words, an AI that can generate and process texts in French for businesses and research projects. We figured it was time to tell you how it came about!

How it all started

The idea first came when our machine learning engineers specializing in natural language processing (NLP) noticed that large language models were available mostly in English or Chinese, but none in French. At the same time, we had held several discussions with media professionals and clients who wanted to generate texts or rewrite articles. After testing existing AI models, we found the results unsatisfactory, especially in French or German. The best-known and most relevant French language model at the time had only 1.5 billion parameters, rather light compared to the English GPT-3 with its 175 billion parameters.

We saw an opportunity and went for it, delivering a solution in record time thanks to the combined skills of our machine learning engineers, UX/UI designers, frontend developers, and digital marketing team. Our model, Cedille, was born, and it delivered better results than GPT-3 in French: a remarkable feat given that, being based on GPT-J, it has only about 6 billion parameters.

Positioning among other models

Toxicity

Another limitation of existing models is the toxicity of the generated text, that is, output that could be perceived as offensive or inappropriate. According to our benchmarks, Cedille is actually less toxic than GPT-3. Of course, preventing toxicity is ongoing work, and we keep improving our model's toxicity score, since no model is ever one hundred percent non-toxic.

Language

The biggest language models to date are mostly trained on English. As mentioned previously, GPT-3, the biggest multilingual model, has 175 billion parameters. Beyond French, other European languages also lack large monolingual models, which means most users rely on GPT-2 and GPT-3. The natural starting point for us was therefore French.

Traction

Our model was launched on November 9th, 2021. The market response exceeded even our wildest projections! Thousands of people tested the model and tweeted about it, and Cedille was featured in many articles and shows, including Swiss media such as Heidi News, 24heures, 20minutes, Bilan and Startupticker. Even Radio-Canada, a radio station from Quebec, mentioned the project. Cedille was also featured by several Twitch and YouTube influencers in the field with a combined audience of millions of viewers, such as Science Etonnante, MonsieurPhi, MiCode, and Yannic Kilcher.

Twitter user MrPhi talks about Cedille, inviting people to join a Twitch stream for a philosophical Turing test

This first model, launched by Coteries in November, has been a huge success: around 20,000 users have registered online and generated over 1 million pieces of content so far, either through the playground or through the API launched in December. For French text, Cedille has already established itself as a qualitatively superior alternative to OpenAI’s GPT-3 and EleutherAI’s GPT-J.

Application statistics on February 24th, 2022

The French model was just the first step, and we will soon release a German model. One of our goals is to provide the reference model for several European languages; the next ones may be Spanish, Italian, or Portuguese.

How to get access to Cedille

Open source

Research is central to machine learning, which is why we released Cedille as an open-source model. Any researcher, student, or individual can access our code directly on GitHub or Hugging Face.

Playground

Anyone can use our playground to test what Cedille can do. Simply go to Cedille’s web platform and start processing text. If you are short on ideas, try out our various examples.

License

Unlike GPT-3, our model can also be licensed and installed on premises. Why does this matter? Running Cedille on your own servers is a way to comply with strict privacy rules such as the GDPR. This is especially relevant for banks, insurance companies, and government agencies that cannot use cloud-based services.

API

The easiest way to leverage Cedille, however, is to request access to our API, which lets anyone integrate the model into their own applications. Connecting your application to our API has the following advantages:

  • easy integration into your application
  • continuous access to the latest version of our language model
  • longer text generation
  • more context

Request your API access and let us know your project!

Your future with Cedille

Running open-source models is no small feat: it takes a lot of expertise in the field, and training the models is expensive. To better serve our clients, we can implement our own models, trained specifically for European languages on a detoxified dataset.

Cedille is a perfect fit for any company that needs efficient text processing for copywriting, summarization, classification, SEO, smart journalism and much more. Note that our machine learning team can also adapt our model to your needs, provide custom AI development, or build a custom NLP model for your company’s projects.

As for the future, we have big plans for Cedille. Stay tuned and see the potential for yourself on cedille.ai!