Talk @ SNU Computer Science & Engineering

Jun 18, 2025
Donghun Lee
Image credit: Alpine Mag
Abstract
Directly bypassing training slowdowns in neural networks is possible: temporarily expand the model with learnable activations, explore new descent directions, and then contract back. This strategy can accelerate gradient-based training of deep learning models.
Date
Jun 18, 2025 11:00 AM — 12:00 PM
Event

For more information on the research in the talk, see the publication from AIML@K.

Talk Details

Talk Title

Bypass Training Stagnation and Accelerate Your Deep Learning

Abstract

What would you do when your deep learning training slows down? Tired of ad-hoc remedies such as restarting, perturbing the weights, or adding momentum in the hope of a better outcome? This talk introduces Bypass, a principled, active method that directly rescues gradient-based optimizers from stagnation near stationary points such as saddle points and suboptimal local minima.

Bypass rests on a simple yet powerful idea: temporarily extend the model, explore new directions in this richer space, and then contract back to the original architecture while preserving the learned function. This extension-contraction principle is mathematically well-grounded, requires no explicit identification of stationary regions, and is easy to implement in standard training pipelines. I will present both the mathematical foundation and the algorithmic design of the Bypass pipeline, explain how algebraic constraints enforce a safe return to the original model, and show how the method naturally unlocks new descent paths for gradient-based optimizers. Empirical validation on regression and classification tasks demonstrates that Bypass consistently leads to better training and, surprisingly, improved generalization as well.
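
To make the extension-contraction cycle concrete, here is a minimal, hypothetical sketch in PyTorch. It is not the Bypass implementation from the talk: instead of the learnable-activation expansion described above, it uses a simpler stand-in that widens the hidden layer of a toy MLP with zero outgoing weights (so the function is preserved at the moment of expansion), explores in the wider space while a ramped penalty pushes the temporary units' outgoing weights back toward zero, and then drops those units to contract. All names, sizes, and schedules below are illustrative assumptions.

```python
# Conceptual sketch only: NOT the Bypass implementation from the talk.
# Illustrates one expand -> explore -> contract cycle on a toy MLP.
import torch
import torch.nn as nn

torch.manual_seed(0)
D_IN, HIDDEN, D_OUT, EXTRA = 8, 16, 1, 4   # EXTRA = temporary new hidden units (assumed)

def make_mlp(hidden):
    return nn.Sequential(nn.Linear(D_IN, hidden), nn.ReLU(), nn.Linear(hidden, D_OUT))

def expand_hidden(model, extra):
    """Widen the hidden layer; new units get zero outgoing weights,
    so the network function is unchanged at the moment of expansion."""
    lin1, _, lin2 = model
    wide = make_mlp(lin1.out_features + extra)
    with torch.no_grad():
        wide[0].weight[:lin1.out_features] = lin1.weight
        wide[0].bias[:lin1.out_features] = lin1.bias
        # new incoming weights keep their random init (fresh exploration directions)
        wide[2].weight[:, :lin1.out_features] = lin2.weight
        wide[2].weight[:, lin1.out_features:] = 0.0   # zero out-weights => same function
        wide[2].bias.copy_(lin2.bias)
    return wide

def contract_hidden(model, extra):
    """Drop the temporary units. If the penalty drove their outgoing weights
    (near) zero, this contraction (approximately) preserves the function."""
    lin1, _, lin2 = model
    keep = lin1.out_features - extra
    narrow = make_mlp(keep)
    with torch.no_grad():
        narrow[0].weight.copy_(lin1.weight[:keep])
        narrow[0].bias.copy_(lin1.bias[:keep])
        narrow[2].weight.copy_(lin2.weight[:, :keep])
        narrow[2].bias.copy_(lin2.bias)
    return narrow

# Toy data and one expand/explore/contract cycle (illustration only).
X, y = torch.randn(256, D_IN), torch.randn(256, D_OUT)
model = make_mlp(HIDDEN)
loss_fn = nn.MSELoss()

model = expand_hidden(model, EXTRA)                      # 1) extend
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for step in range(200):                                  # 2) explore in the wider space
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    # Ramped penalty pushing the temporary units' outgoing weights back toward
    # zero, so the later contraction is (near) function-preserving.
    ramp = step / 200
    penalty = ramp * model[2].weight[:, HIDDEN:].pow(2).sum()
    (loss + penalty).backward()
    opt.step()
model = contract_hidden(model, EXTRA)                    # 3) contract back
print("loss after cycle:", loss_fn(model(X), y).item())
```

The ramped penalty here only loosely mirrors the role of the algebraic constraints mentioned above: it softly encourages, rather than exactly enforces, a function-preserving return to the original architecture.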

Bypass also offers a new lens on neural network morphism and neural architecture search, suggesting ways to navigate model space dynamically during training. Whether your specialty is optimization theory, the dynamics of deep learning models, or practical improvements to deep learning applications, this talk should interest you: Bypass is a versatile method that integrates naturally with existing model architectures and optimizers.