
Tsinghua's latest 'continual learning' review: 32 pages detailing the theory, methods, and applications of continual learning

王林
Release: 2023-04-13 23:31:01

In the narrow sense, continual learning is limited by catastrophic forgetting: learning a new task typically causes a sharp decline in performance on old tasks.

Beyond this narrow sense, a growing body of recent work has substantially broadened the understanding and application of continual learning.

The growing and widespread interest in this direction reflects both its practical significance and its complexity.


Paper address: //m.sbmmt.com/link/82039d16dce0aab3913b6a7ac73deff7

This article conducts a comprehensive survey of continual learning, attempting to connect its basic settings, theoretical foundations, representative methods, and practical applications.

Based on existing theoretical and empirical results, the general objectives of continual learning are summarized as: ensuring an appropriate stability-plasticity trade-off and adequate intra/inter-task generalizability, under the constraint of resource efficiency.

It provides a state-of-the-art and detailed taxonomy, extensively analyzing how representative strategies address continual learning and how they are adapted to specific challenges in various applications.

Through an in-depth discussion of current trends in continual learning, promising cross-directional topics, and interdisciplinary connections with neuroscience, the authors believe this holistic perspective can greatly facilitate subsequent exploration in this field and beyond.

Introduction

Learning is the basis for intelligent systems to adapt to the environment. To cope with changes in the outside world, evolution has made humans and other organisms highly adaptable, able to continuously acquire, update, accumulate, and utilize knowledge [148], [227], [322]. Naturally, we expect artificial intelligence (AI) systems to adapt in a similar way. This has inspired research on continual learning, where a typical setting is to learn a sequence of contents one by one and behave as if they had been observed simultaneously (Figure 1, a). These contents can be new skills, new examples of old skills, different environments, different contexts, and so on, and they embody specific real-world challenges [322], [413]. Since content is provided gradually over a lifetime, continual learning is also called incremental learning or lifelong learning in much of the literature, without a strict distinction [70], [227].

Unlike traditional machine learning models built on a static data distribution, continual learning is characterized by learning from dynamic data distributions.

A major challenge is known as catastrophic forgetting [291], [292], where adaptation to a new distribution often severely reduces the ability to capture the old one. This dilemma is one facet of the trade-off between learning plasticity and memory stability: too much of the former interferes with the latter, and vice versa. Beyond simply balancing the "ratio" of these two aspects, an ideal continual learning solution should also achieve strong generalization to distributional differences within and between tasks (Figure 1, b). As a naive baseline, retraining on all old training samples (if allowed) easily addresses these challenges, but incurs huge computational and storage overhead (and potential privacy issues). In fact, a main purpose of continual learning is to ensure the resource efficiency of model updates, ideally close to learning only the new training samples.
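To make the resource-efficiency argument concrete, the following minimal PyTorch sketch contrasts a purely sequential update with the naive joint-retraining baseline; `model`, `old_loader`, and `new_loader` are hypothetical placeholders for a network and the data loaders of an old and a new task.

```python
# Toy sketch: sequential fine-tuning vs. the naive joint-retraining baseline.
import torch
import torch.nn.functional as F

def train(model, batches, epochs=1, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in batches:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()

# Continual update: only the new task is visited, so performance on the old
# task typically collapses (catastrophic forgetting).
train(model, new_loader)

# Naive baseline: retrain on all old samples together with the new ones.
# This avoids forgetting but stores and revisits the entire data history,
# which continual learning aims to avoid.
joint_batches = list(old_loader) + list(new_loader)
train(model, joint_batches)
```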


Many efforts have been devoted to addressing the above challenges, and they can be conceptually divided into five groups (Figure 1, c): adding regularization terms that reference the old model (regularization-based methods); approximating and recovering old data distributions (replay-based methods); explicitly designing and manipulating the optimization program (optimization-based methods); learning robust and well-generalized representations (representation-based methods); and constructing task-adaptive parameters with a properly designed architecture (architecture-based methods). This taxonomy extends commonly used taxonomies with recent advances and provides refined sub-directions for each category. The survey summarizes how these methods achieve the proposed general objectives and extensively analyzes their theoretical foundations and typical implementations. In particular, these methods are closely related, e.g., regularization and replay ultimately correct the gradient direction in optimization, and highly synergistic, e.g., the effect of replay can be improved by distilling knowledge from the old model.

Real-world applications pose special challenges for continual learning, which can be divided into scenario complexity and task specificity. For the former, for example, the task oracle (i.e., which task is to be performed) may be missing during training and testing, and training samples may be introduced in small batches or even one at a time. Due to the cost and scarcity of data labeling, continual learning also needs to be effective in few-shot, semi-supervised, and even unsupervised scenarios. For the latter, although current progress mainly concentrates on visual classification, other vision domains such as object detection, semantic segmentation, and image generation, as well as other related fields such as reinforcement learning (RL), natural language processing (NLP), and ethical considerations, are receiving increasing attention, with their own opportunities and challenges.

Given the significant growth of interest in continual learning, we believe this up-to-date and comprehensive survey can provide a holistic perspective for subsequent work. Although there are earlier surveys of continual learning with relatively broad coverage [70], [322], they do not include the important progress of recent years. In contrast, recent surveys generally cover only particular aspects of continual learning, such as its biological basis [148], [156], [186], [227], specialized settings for visual classification [85], [283], [289], [346], and extensions to NLP [37], [206] or RL [214]. To the best of our knowledge, this is the first survey to systematically summarize recent advances in continual learning. Building on these strengths, it provides an in-depth discussion of current trends in continual learning, promising cross-directional topics (such as diffusion models, large-scale pre-training, vision transformers, embodied AI, neural compression, etc.), and interdisciplinary connections with neuroscience.

Main contributions include:

(1) An up-to-date and comprehensive review of continual learning, connecting advances in theory, methods, and applications;

(2) A summary of the general objectives of continual learning based on existing theoretical and empirical results, together with a detailed classification of representative strategies;

(3) A division of the special challenges of real-world applications into scenario complexity and task specificity, with an extensive analysis of how continual learning strategies adapt to them.

This paper is organized as follows. In Section 2, we introduce the setting of continual learning, including its basic formulation, typical scenarios, and evaluation metrics. In Section 3, we summarize theoretical efforts on continual learning together with its general objectives. In Section 4, we provide an up-to-date and detailed classification of representative strategies, analyzing their motivations and typical implementations. In Sections 5 and 6, we describe how these strategies adapt to the real-world challenges of scenario complexity and task specificity. In Section 7, we discuss current trends, cross-directional prospects, and interdisciplinary connections with neuroscience.

In this section, we detail the classification of representative continual learning methods (see Figure 3 and Figure 1, c) and extensively analyze their main motivations, typical implementations, and empirical properties.

Regularization-based methods

This direction is characterized by adding explicit regularization terms to balance the old and new tasks, which usually requires storing a frozen copy of the old model for reference (see Figure 4). Depending on the target of regularization, such methods can be divided into two categories.
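As an illustration of one common flavor, the following is a minimal PyTorch sketch of weight regularization in the spirit of EWC (Kirkpatrick et al., 2017); the importance weights (e.g., a diagonal Fisher estimate) and the frozen copy of the old parameters are assumed to have been computed after training on the old task.

```python
# Weight-regularization sketch: penalize each parameter for drifting from its
# post-old-task value, weighted by an importance estimate.
import torch
import torch.nn.functional as F

def weight_reg_penalty(model, old_params, importance):
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return penalty

def train_step(model, optimizer, x, y, old_params, importance, lam=100.0):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + lam * weight_reg_penalty(model, old_params, importance)
    loss.backward()
    optimizer.step()
    return loss.item()
```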


Replay-based methods

Methods that approximate and recover old data distributions are grouped into this direction (see Figure 5). Depending on the content being replayed, these methods can be further divided into three sub-directions, each with its own challenges.
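The most common instantiation is experience replay of stored raw samples. A minimal PyTorch sketch, assuming a small reservoir-sampling memory that is interleaved with each new-task batch, might look as follows; the buffer capacity, sampling scheme, and what is stored (raw inputs, features, or a generative model) are exactly the design choices that define the sub-directions above.

```python
# Experience-replay sketch with a reservoir-sampling memory.
import random
import torch
import torch.nn.functional as F

class ReservoirBuffer:
    def __init__(self, capacity):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, x, y):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.data[idx] = (x, y)

    def sample(self, k):
        batch = random.sample(self.data, min(k, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

def replay_step(model, optimizer, buffer, x_new, y_new, replay_size=32):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_new), y_new)
    if buffer.data:  # interleave a batch of stored old samples
        x_old, y_old = buffer.sample(replay_size)
        loss = loss + F.cross_entropy(model(x_old), y_old)
    loss.backward()
    optimizer.step()
    for xi, yi in zip(x_new, y_new):  # update the memory with new samples
        buffer.add(xi.detach(), yi.detach())
```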


Optimization-based methods

Continual learning can be achieved not only by adding extra terms to the loss function (as in regularization and replay), but also by explicitly designing and manipulating the optimization program.
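As one representative example, gradient projection in the spirit of A-GEM (Chaudhry et al., 2019) manipulates the update direction rather than the loss itself. A minimal sketch, assuming a small memory of old samples from which a reference gradient is computed, is given below.

```python
# Gradient-projection sketch: if the new-task gradient conflicts with a
# reference gradient from old samples, project it so the old-task loss does
# not increase to first order.
import torch

def flat_grad(model):
    # Concatenate all parameter gradients into a single vector.
    return torch.cat([p.grad.flatten() for p in model.parameters() if p.grad is not None])

def agem_project(g, g_ref):
    dot = torch.dot(g, g_ref)
    if dot < 0:  # conflict with the old task
        g = g - (dot / torch.dot(g_ref, g_ref)) * g_ref
    return g
```

In a full training step, `g_ref` comes from back-propagating the loss on a memory batch, `g` from the new-task loss, and the projected vector is copied back into each parameter's `.grad` before `optimizer.step()`.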


Representation-based methods

Methods that create and exploit the strengths of representations for continual learning fall into this category. In addition to early work that obtains sparse representations through meta-training [185], recent work attempts to combine self-supervised learning (SSL) [125], [281], [335] and large-scale pre-training [295], [380], [456] to improve representations both at initialization and during continual learning. Note that these two strategies are closely related, since pre-training data are usually huge and not explicitly labeled, while the performance of SSL itself is mainly evaluated by fine-tuning on (a sequence of) downstream tasks. Below, we discuss representative sub-directions.
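A simple way to exploit such representations, sketched below under the assumption of a torchvision ResNet-18 standing in for whatever pre-trained or self-supervised encoder is actually used, is to keep the backbone frozen and only grow a lightweight classifier as new classes arrive.

```python
# Frozen pre-trained backbone with an incrementally extended classifier head.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()              # expose the 512-d features
for p in backbone.parameters():
    p.requires_grad = False              # freeze the representation

head = nn.Linear(512, 10)                # classifier for the first task's classes

def extend_head(head, num_new_classes):
    # Grow the classifier while preserving the weights of previously seen classes.
    new_head = nn.Linear(head.in_features, head.out_features + num_new_classes)
    with torch.no_grad():
        new_head.weight[: head.out_features] = head.weight
        new_head.bias[: head.out_features] = head.bias
    return new_head

head = extend_head(head, num_new_classes=5)   # e.g., a second task adds 5 classes
```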


Architecture-based approaches

The above strategies mainly focus on learning all incremental tasks with a shared set of parameters (i.e., a single model and a single parameter space), which is a major cause of inter-task interference. In contrast, constructing task-specific parameters can address this problem explicitly. Previous work usually divides this direction into parameter isolation and dynamic architecture, depending on whether the network architecture is fixed. This paper instead focuses on how task-specific parameters are implemented, extending the above concepts to parameter allocation, model decomposition, and modular networks (Figure 8).
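A minimal sketch of the modular idea, assuming a shared trunk with one output head per task, is shown below; parameter allocation via masks and model decomposition apply the same principle at a finer granularity.

```python
# Task-specific parameters: a shared trunk plus one classifier head per task,
# so learning task t only touches heads[t] (and, optionally, the trunk).
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    def __init__(self, in_dim=784, hidden=256):
        super().__init__()
        self.hidden = hidden
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleDict()                 # one head per task

    def add_task(self, task_id, num_classes):
        self.heads[str(task_id)] = nn.Linear(self.hidden, num_classes)

    def forward(self, x, task_id):
        return self.heads[str(task_id)](self.trunk(x))
```

Note that such designs require a task oracle at test time to route inputs to the right head, which connects directly to the scenario-complexity challenges discussed earlier.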
