Talk @ SNU Computer Science & Engineering


For more information on the research presented in this talk, see the publication from AIML@K.
Talk Title
Bypass Training Stagnation and Accelerate Your Deep Learning
Abstract
What do you do when your deep learning training slows down? Tired of ad-hoc fixes that hope for a better outcome by restarting, perturbing, or adding momentum? This talk introduces Bypass, a principled, active method that directly rescues gradient-based optimizers from stagnation near stationary points such as saddle points and suboptimal local minima.
Bypass uses a simple yet powerful idea: temporarily extend the model, explore new directions in this richer space, and then contract back to the original architecture while preserving the learned function. This extension-contraction principle is mathematically well-grounded, requires no explicit identification of stationary regions, and is easy to implement in standard training pipelines. I will present both the mathematical foundation and the algorithmic design of the Bypass pipeline, explain how algebraic constraints enforce a safe return to the original model, and show how this method naturally unlocks new descent paths for gradient-based optimizers. Empirical results on regression and classification tasks demonstrate that Bypass consistently leads to better training and, surprisingly, improved generalization as well.
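
To make the extension-contraction idea concrete, below is a minimal, hypothetical PyTorch sketch of a function-preserving widen-then-merge step on a single hidden layer (a Net2Net-style transform). It is not the Bypass algorithm presented in the talk; the layer sizes, the duplication rule, and the merge rule are illustrative assumptions only.

```python
# Illustrative sketch only: widen one hidden layer without changing the network's
# function, then contract back. NOT the Bypass algorithm; just the general kind of
# extend-then-contract step the abstract describes.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(5, 4)
y_before = net(x)

with torch.no_grad():
    fc1, fc2 = net[0], net[2]
    # Extend: duplicate hidden unit 0 and split its outgoing weight in half,
    # so the widened network computes exactly the same function.
    wide_fc1 = nn.Linear(4, 9)
    wide_fc1.weight.copy_(torch.cat([fc1.weight, fc1.weight[0:1]], dim=0))
    wide_fc1.bias.copy_(torch.cat([fc1.bias, fc1.bias[0:1]], dim=0))
    wide_fc2 = nn.Linear(9, 2)
    out_w = torch.cat([fc2.weight, fc2.weight[:, 0:1]], dim=1)
    out_w[:, 0] *= 0.5   # original copy of unit 0
    out_w[:, 8] *= 0.5   # duplicated copy of unit 0
    wide_fc2.weight.copy_(out_w)
    wide_fc2.bias.copy_(fc2.bias)

wide_net = nn.Sequential(wide_fc1, nn.ReLU(), wide_fc2)
print(torch.allclose(y_before, wide_net(x), atol=1e-6))  # True: function preserved

# ... train wide_net for a few steps to explore the richer space ...

with torch.no_grad():
    # Contract: merge the duplicate back by summing its outgoing weights.
    # This merge is exact only while the two copies keep identical incoming
    # weights; the talk's algebraic constraints are what guarantee a safe return.
    narrow_fc1 = nn.Linear(4, 8)
    narrow_fc1.weight.copy_(wide_fc1.weight[:8])
    narrow_fc1.bias.copy_(wide_fc1.bias[:8])
    narrow_fc2 = nn.Linear(8, 2)
    merged_w = wide_fc2.weight[:, :8].clone()
    merged_w[:, 0] += wide_fc2.weight[:, 8]
    narrow_fc2.weight.copy_(merged_w)
    narrow_fc2.bias.copy_(wide_fc2.bias)
```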
Bypass also offers a new lens on network morphisms and neural architecture search, suggesting ways to navigate model space dynamically during training. Whether you work on optimization theory, the dynamics of deep learning models, or practical deep learning applications, this talk should interest you: Bypass is a versatile method that integrates naturally with existing model architectures and optimizers.