Google AI introduces AltUp (Alternating Updates): an AI method that takes advantage of increased scale in transformer networks without increasing computational cost.

In deep learning, transformer neural networks have received significant attention for their effectiveness in various fields, especially in natural language processing and emerging applications such as computer vision, robotics, and autonomous driving. However, the ever-increasing scale of these models, while it improves performance, leads to a steep increase in computational cost and inference latency. The central challenge is to reap the benefits of larger models without incurring impractical computational burdens.

The current landscape of deep learning models, especially transformers, shows remarkable progress across diverse domains, but the scalability of these models is often limited by their growing computational requirements. Previous efforts, embodied in sparse mixture-of-experts models such as Switch Transformer, Expert Choice, and V-MoE, have focused mostly on efficiently scaling up the number of network parameters while keeping the computation per input roughly constant. A research gap remains, however, around scaling up the token representation itself. AltUp is a new method introduced to address this gap.

AltUp stands out by providing a way to widen the token representation without inflating computational overhead. The method divides the widened representation vector into equal-sized blocks and processes only one block at each layer. The core of AltUp’s effectiveness lies in its prediction-correction mechanism, which infers the outputs of the blocks that were not processed. By keeping the width of each layer unchanged and avoiding the quadratic increase in computation associated with direct widening, AltUp emerges as a promising solution to the computational challenges posed by larger transformer networks.
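To see why direct widening is so costly, consider a back-of-the-envelope calculation (a sketch with illustrative widths, not figures from the paper): the per-token compute of a standard transformer layer grows roughly quadratically with the model width, so directly doubling the width roughly quadruples the per-layer cost, which AltUp sidesteps by keeping the layer width fixed.

```python
def layer_flops(d_model: int, ff_ratio: int = 4) -> int:
    """Rough per-token FLOPs of one transformer layer:
    ~4*d^2 for the attention projections plus 2*d*(ff_ratio*d) for the MLP."""
    return 4 * d_model**2 + 2 * d_model * (ff_ratio * d_model)

base = layer_flops(1024)   # original width d
wide = layer_flops(2048)   # representation widened directly to 2d
print(wide / base)         # -> 4.0: the quadratic blow-up AltUp avoids

# AltUp with K=2 keeps a 2d-wide token state but still runs each layer
# at width d on a single block, so per-layer cost stays close to `base`.
```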

AltUp’s mechanics center on token embeddings and how they can be widened without increasing computational complexity. At each layer, the method:

  • invokes a 1x-width transformer layer on one of the blocks;
  • treats that block as the “activated” block;
  • simultaneously employs a lightweight predictor for the remaining blocks.

This predictor computes a weighted combination of all input blocks; the predicted values, along with the computed value of the activated block, are then refined by a lightweight corrector, which updates the inactivated blocks based on the activated one. Crucially, both the prediction and correction steps involve only a handful of vector additions and multiplications and are therefore much faster than a conventional transformer layer. A minimal sketch of one such step follows.
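The code below is a hypothetical PyTorch rendering of a single AltUp step, assuming scalar mixing weights for the predictor and corrector and a generic 1x-width layer; the names, shapes, and exact update rule are illustrative assumptions, not the authors’ implementation.

```python
import torch
import torch.nn as nn

class AltUpStep(nn.Module):
    """Sketch of one AltUp step: the widened token state is kept as K
    blocks of width d, and only one block passes through the real layer."""

    def __init__(self, layer: nn.Module, k: int = 2):
        super().__init__()
        self.layer = layer                             # a 1x-width transformer layer
        self.k = k
        # Lightweight learned scalars for prediction and correction.
        self.predict_w = nn.Parameter(torch.eye(k))    # (K, K) mixing weights
        self.correct_w = nn.Parameter(torch.ones(k))   # (K,) correction gains

    def forward(self, blocks, activated: int):
        # blocks: list of K tensors, each of shape (batch, seq, d)
        x = torch.stack(blocks, dim=0)                 # (K, B, S, d)
        # Prediction: each block becomes a weighted combination of all blocks.
        pred = torch.einsum('ij,jbsd->ibsd', self.predict_w, x)
        # Computation: run the expensive layer on the activated block only.
        computed = self.layer(pred[activated])         # (B, S, d)
        # Correction: nudge every block toward the freshly computed value.
        delta = computed - pred[activated]
        out = pred + self.correct_w.view(-1, 1, 1, 1) * delta
        return list(out.unbind(0))
```

Alternating the activated index across layers (e.g. `activated = layer_index % k`) ensures every block is periodically refreshed by a real layer, which is where the method’s name comes from.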

Evaluation of AltUp on T5 models across standard language tasks shows that it is consistently faster than dense models of the same accuracy. Notably, a T5 Large model augmented with AltUp achieves speedups of 27%, 39%, 87%, and 29% on the GLUE, SuperGLUE, SQuAD, and Trivia-QA benchmarks, respectively. AltUp’s relative improvements become more pronounced when it is applied to larger models, confirming its scalability and growing effectiveness as model size increases.

In conclusion, AltUp emerges as a noteworthy solution to the long-standing challenge of efficiently scaling transformer networks. Its ability to widen the token representation without a proportional increase in computational cost holds great promise for various applications. AltUp’s approach, built on partitioning and prediction-correction, provides a practical way to leverage the benefits of larger models without being subject to unwieldy computational requirements.

The researchers’ extension of AltUp, known as Recycled-AltUp, shows how adaptable the proposed method is. By replicating the initial token embeddings rather than widening them, Recycled-AltUp delivers strict improvements in pre-training performance without any perceptible slowdown. This dual approach, coupled with AltUp’s seamless integration with other techniques such as MoE, exemplifies its versatility and opens up avenues for future research into its training dynamics and downstream performance.
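As a rough illustration of the Recycled-AltUp idea, the snippet below (hypothetical names and shapes, not the authors’ code) looks up the original d-wide embedding once and replicates it K times to form the widened initial state, so no wider embedding table is needed.

```python
import torch
import torch.nn as nn

def recycled_init_state(embedding: nn.Embedding, token_ids: torch.Tensor, k: int = 2):
    """Build the widened K-block state by recycling one cheap embedding lookup."""
    e = embedding(token_ids)                 # (batch, seq, d): single lookup
    return [e.clone() for _ in range(k)]     # K identical blocks to start from

vocab, d = 32000, 1024
emb = nn.Embedding(vocab, d)
ids = torch.randint(0, vocab, (2, 16))
blocks = recycled_init_state(emb, ids, k=2)  # widened state, same-size table
```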

AltUp represents a breakthrough in the quest for efficient scaling of transformer networks, offering a compelling resolution of the trade-off between model size and computational efficiency. As described in the paper, the research team’s contributions represent an important step toward making large-scale transformer models more accessible and practical for a myriad of applications.


Check out the Paper and the Google blog post. All credit for this research goes to the researchers of this project.



Madhur Garg is a consulting intern at MarktechPost. He is currently pursuing a bachelor’s degree in Civil and Environmental Engineering at the Indian Institute of Technology (IIT), Patna. He has a strong passion for machine learning and enjoys exploring the latest developments in technology and their practical applications. With a keen interest in artificial intelligence and its diverse applications, Madhur is determined to contribute to the field of data science and leverage its potential impact across various industries.

