Varun Singh, Lucas Krauss, Sami Jaghouar, Matej Sirovatka, Charles Goddard, Fares Obied, Jack Min Ong, Jannik Straube, Fern, Aria Harley, Conner Stewart, Colin Kealty, Maziyar Panahi, Simon Kirsten, Anushka Deshpande, Anneketh Vij, Arthur Bresnu, Pranav Veldurthi, Raghav Ravishankar, Hardik Bishnoi, DatologyAI Team, Arcee AI Team, Prime Intellect Team, Mark McQuade, Johannes Hagemann, Lucas Atkins
Arcee Trinity Large is a 400B-parameter sparse Mixture-of-Experts (MoE) model trained on 17 trillion tokens, using a new load-balancing strategy.
Researchers have developed a new large-scale AI model called Arcee Trinity Large, part of a family of models designed to process language efficiently. The model uses a technique called Mixture-of-Experts (MoE), which activates only a fraction of its parameters for each input. The largest model in the family, Trinity Large, has 400 billion parameters but activates only 13 billion for each token it processes. The team also introduced a new method for balancing the workload across the model's experts, keeping training stable. The models were trained on vast amounts of text data and are now available for use.
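The sparse activation described above comes from top-k expert routing: a small router scores every expert for each token and only the best-scoring few are executed. The sketch below illustrates that mechanism in miniature; the function name, shapes, and expert count are illustrative, not taken from the Trinity implementation.

```python
import numpy as np

def topk_route(hidden, w_gate, k=2):
    """Pick the top-k experts for one token (illustrative sketch, not Trinity's code).

    hidden: (d,) token representation
    w_gate: (d, n_experts) router weight matrix
    Returns the chosen expert indices and their normalized mixing weights.
    """
    logits = hidden @ w_gate                       # score every expert for this token
    top = np.argsort(logits)[-k:]                  # keep only the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                       # softmax over the selected experts only
    return top, weights

rng = np.random.default_rng(0)
idx, w = topk_route(rng.normal(size=64), rng.normal(size=(64, 8)), k=2)
# only 2 of the 8 experts run for this token; their weights sum to 1
```

Because only the selected experts' feed-forward layers run, per-token compute scales with k rather than with the total expert count, which is how a 400B-parameter model can spend only ~13B parameters per token. The load-balancing method mentioned above exists to keep tokens spread across experts rather than collapsing onto a few favorites.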