This is the first post in a series dedicated to making large language models (LLMs) encrypted end-to-end with homomorphic encryption. We will publish more details on how to achieve it technically as we make progress towards this goal in the coming years.
.gif)
Large language models (LLMs) are probably the biggest breakthrough in AI over the last decade. While there were successes using other architectures, the recent advances displayed by the likes of ChatGPT and Bing have completely changed the game. It is now clear that AI will transform society as deeply as the internet did before.
While LLMs trained on public data can be used for anything from writing a blog post to writing code, the real power comes from the ability to contextualize models, either by fine tuning or bootstrapping the prompt with some additional information. In both cases, you will need to feed the LLM custom data, which could include highly sensitive things such as your messaging history, your company’s internal documents, your slack messages, etc. This lack of privacy protection is precisely what made Italy and other countries ban ChatGPT.
But what if we could have encrypted conversations with LLMs in the same ways that we have encrypted conversations with our friends on messaging apps? Being able to use LLMs without revealing our personal data would unleash the true power of AI, while making both users and regulators happy. And as it turns out, this is now possible thanks to a powerful encryption technique called Fully Homomorphic Encryption (FHE).
The idea behind FHE is that you can do processing on encrypted data without having to decrypt it. In the context of AI, this is how it would work:
Similarly, if you want to fine tune a model using some sensitive data, you would encrypt the entire dataset under your secret key, then send it to the service provider who would blindly calibrate the model.
The reason why FHE wasn’t used before is simply that it wasn’t ready: it was too slow to be practical, too limited in terms of what it could do, and too difficult to use. This is exactly what we are solving at Zama, by providing tools that enable data scientists and developers to use FHE without having to know cryptography. Our technology enables any computation to be carried out over encrypted data, regardless of how complex things are.
But while performance has improved by 20x in the last 3 years, we are still very far from being able to run LLMs in FHE in a cost effective way. A simple back of the envelope calculation shows that for an average-sized LLM, generating one encrypted token would require up to 1 billion large-precision PBS. The term PBS here stands for “programmable bootstrapping”, which is the most costly operation in FHE, and is used to compute functions on encrypted data (such as activation functions in neural networks).
On a modern CPU, we can compute around 200 8-bit PBS / second at a cost of $0.001. To generate one token per second, it would thus cost around ~$5,000 per token. To make this economically viable, tokens should cost at most $0.01, meaning a 500,000x improvement.
While getting 500,000x improvement may sound impossible, it is actually on the horizon. There are 3 major trends at play here:
Since most of the challenges in FHE are already solved (or will be in the near future), we can confidently expect to have end-to-end encrypted AI within 5 years. I strongly believe that when this happens, nobody will care about privacy anymore, not because it’s unimportant, but because it will be guaranteed by design.
.png)
News, research and product releases