Federating Large Language Models from Scratch @ Google Research
Date:
I was invited by Zachary Charles to present our work on federated learning of large language models to the Google Research team working on privacy and federated learning.
Talk abstract:
Large language models (LLMs) offer unprecedented ML capabilities and continue to improve at a rapid rate. As a result, a variety of organizations are locked in a race to scale LLMs and explore the limits of these capabilities. We believe federated learning (FL) offers untapped potential to dramatically increase the supply of data sources for these models. Early work in this direction has shown, for example, how LLM pre-training can tap into edge-device data. Others have shown that federated optimizers can be more communication-efficient than their centralized counterparts due to their relaxed synchronization requirements. What is clear is that we are very early in understanding the potential of FL to reshape LLM practices and opportunities. This talk contributes to this emerging field that blends the two worlds of FL and LLMs. In particular, I will present a fully federated approach to LLM pre-training that has already been shown to be viable at a scale of 3B parameters in a real working system. Although this is only the beginning, we hope this work, in combination with the work of others, provides a meaningful step towards democratizing LLMs through improved sharing of resources, and towards lifting their capabilities even further by opening new opportunities to scale data and compute.
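To give a feel for why relaxed synchronization reduces communication, the sketch below shows a generic federated-averaging loop: each client runs several local gradient steps on its own data shard and the server averages the resulting models once per round, instead of synchronizing after every step as in standard data-parallel training. This is a minimal illustration of the general technique, not the specific system presented in the talk; all names, data, and hyperparameters here are assumptions for demonstration only.

```python
import numpy as np

# Minimal federated-averaging sketch on a synthetic least-squares problem.
# Clients communicate only once per round (after LOCAL_STEPS local updates),
# which is the "relaxed synchronization" behind the communication savings.
# All constants and the problem setup are illustrative assumptions.

rng = np.random.default_rng(0)
DIM, CLIENTS, ROUNDS, LOCAL_STEPS, LR = 16, 4, 10, 20, 0.05

# Each client holds its own private data shard.
true_w = rng.normal(size=DIM)
client_data = []
for _ in range(CLIENTS):
    X = rng.normal(size=(100, DIM))
    y = X @ true_w + 0.1 * rng.normal(size=100)
    client_data.append((X, y))

global_w = np.zeros(DIM)
for rnd in range(ROUNDS):
    local_models = []
    for X, y in client_data:
        w = global_w.copy()               # start from the broadcast global model
        for _ in range(LOCAL_STEPS):      # several local SGD steps, no communication
            grad = X.T @ (X @ w - y) / len(y)
            w -= LR * grad
        local_models.append(w)
    # One communication round: average the client models into a new global model.
    global_w = np.mean(local_models, axis=0)
    loss = np.mean([np.mean((X @ global_w - y) ** 2) for X, y in client_data])
    print(f"round {rnd}: mean loss {loss:.4f}")
```

With LOCAL_STEPS local updates per round, the number of synchronization points drops by roughly that factor compared to step-wise synchronous training, at the cost of the local models drifting apart between averages.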