publications
(*) denotes equal contribution
2024
- [TPAMI] Training Networks in Null Space of Feature Covariance With Self-Supervision for Incremental Learning. Shipeng Wang, Xiaorong Li, Jian Sun, and Zongben Xu. IEEE Transactions on Pattern Analysis and Machine Intelligence.
In the context of incremental learning, a network is sequentially trained on a stream of tasks, where data from previous tasks are assumed to be inaccessible. The major challenge is how to overcome the stability-plasticity dilemma, i.e., learning knowledge from new tasks without forgetting the knowledge of previous tasks. To this end, we propose two mathematical conditions for guaranteeing network stability and plasticity, with theoretical analysis. The conditions show that the stability-plasticity dilemma can be overcome by restricting the parameter update to the null space of the uncentered feature covariance at each linear layer, which is realized by layerwise projection of the gradient into that null space. Inspired by this, we develop two algorithms, dubbed Adam-NSCL and Adam-SFCL respectively, for incremental learning, which differ in how the projection matrix is computed. The projection matrix in Adam-NSCL is constructed from the singular vectors associated with the smallest singular values of the uncentered feature covariance matrix, while the projection matrix in Adam-SFCL is constructed from all singular vectors together with adaptive scaling factors. Additionally, we explore adopting self-supervised techniques, including self-supervised label augmentation and a newly proposed contrastive loss, to improve the performance of incremental learning. These self-supervised techniques are orthogonal to Adam-NSCL and Adam-SFCL and can be incorporated with them seamlessly, leading to Adam-NSCL-SSL and Adam-SFCL-SSL, respectively. The proposed algorithms are applied to task-incremental and class-incremental learning on various benchmark datasets with multiple backbones, and the results show that they outperform the compared incremental learning methods.
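The layerwise projection described above can be illustrated with a minimal sketch, assuming the uncentered input-feature covariance of each linear layer has already been accumulated over previous tasks; the function names, the threshold `eps`, and the choice to project `param.grad` directly (rather than the candidate update produced by Adam) are illustrative assumptions, not the released Adam-NSCL/Adam-SFCL implementation.

```python
import torch

def null_space_projector(cov: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    """Projector onto the approximate null space of an uncentered feature
    covariance matrix: keep the singular vectors whose singular values fall
    below `eps` times the largest one (an assumed threshold rule)."""
    U, S, _ = torch.linalg.svd(cov)      # cov is d x d, symmetric PSD
    U0 = U[:, S <= eps * S[0]]           # basis of the approximate null space
    return U0 @ U0.T                     # projection matrix P = U0 U0^T

def project_gradients(model: torch.nn.Module, projectors: dict) -> None:
    """Layerwise-project each weight gradient into the null space of previous
    tasks before the optimizer step."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in projectors and param.grad is not None:
                g = param.grad.view(param.grad.shape[0], -1)  # (out, in)
                param.grad.copy_((g @ projectors[name]).view_as(param.grad))
```

In this sketch, Adam-SFCL's variant would replace the hard selection of singular vectors with adaptive scaling factors applied to all of them.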
2023
- [TPAMI] Variational Data-Free Knowledge Distillation for Continual Learning. Xiaorong Li, Shipeng Wang, Jian Sun, and Zongben Xu. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Deep neural networks suffer from catastrophic forgetting when trained on sequential tasks in continual learning. Various methods rely on storing data of previous tasks to mitigate catastrophic forgetting, which is often prohibited in real-world applications due to privacy and security concerns. In this paper, we consider a realistic setting of continual learning, where training data of previous tasks are unavailable and memory resources are limited. We contribute a novel knowledge distillation-based method in an information-theoretic framework by maximizing the mutual information between the outputs of the previously learned and current networks. Because this mutual information is intractable to compute, we instead maximize its variational lower bound, where the covariance of the variational distribution is modeled by a graph convolutional network. The inaccessibility of data of previous tasks is tackled by Taylor expansion, yielding a novel regularizer in the network training loss for continual learning. The regularizer relies on compressed gradients of the network parameters and avoids storing previous task data or previously learned networks. Additionally, we employ a self-supervised learning technique to learn effective features, which improves the performance of continual learning. We conduct extensive experiments, including image classification and semantic segmentation, and the results show that our method achieves state-of-the-art performance on continual learning benchmarks.
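For orientation, the variational lower bound that this kind of approach maximizes can be written in its standard (Barber-Agakov) form; the notation below is generic and does not reproduce the paper's exact formulation, in which the variational distribution q is a Gaussian whose covariance is modeled by a graph convolutional network.

```latex
% Generic variational lower bound on the mutual information between the
% outputs of the previously learned network (y_old) and the current one (y_new):
% the intractable conditional p(y_old | y_new) is replaced by a variational
% distribution q, which turns the objective into a tractable bound.
I\bigl(y^{\mathrm{old}}; y^{\mathrm{new}}\bigr)
  = H\bigl(y^{\mathrm{old}}\bigr) - H\bigl(y^{\mathrm{old}} \mid y^{\mathrm{new}}\bigr)
  \;\ge\; H\bigl(y^{\mathrm{old}}\bigr)
  + \mathbb{E}_{p(y^{\mathrm{old}},\,y^{\mathrm{new}})}
      \bigl[\log q\bigl(y^{\mathrm{old}} \mid y^{\mathrm{new}}\bigr)\bigr]
```

Maximizing the right-hand side over both the current network and q tightens the bound, since the gap is the expected KL divergence between the true conditional and q.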
- [PR] Memory Efficient Data-Free Distillation for Continual Learning. Xiaorong Li*, Shipeng Wang*, Jian Sun, and Zongben Xu. Pattern Recognition.
Deep neural networks suffer from catastrophic forgetting when trained on sequential tasks in continual learning, especially when data from previous tasks are unavailable. To mitigate catastrophic forgetting, existing methods either store data from previous tasks, which may raise privacy concerns, or require large amounts of memory. In particular, distillation-based methods mitigate catastrophic forgetting by using proxy datasets, which may not match the distributions of the original datasets of previous tasks. To address these problems in a setting where the full training data of previous tasks are unavailable and memory resources are limited, we propose a novel data-free distillation method. Our method encodes the knowledge of previous tasks into network parameter gradients by Taylor expansion, deducing a regularizer relying on gradients in the network training loss. To improve memory efficiency, we design an approach to compressing the gradients in the regularizer. Moreover, we theoretically analyze the approximation error of our method. Experimental results on multiple datasets demonstrate that the proposed method outperforms existing approaches in continual learning.
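The two ingredients described above, compressing stored gradients and turning them into a training-time regularizer, can be sketched as follows; the top-k compression, the absolute-value penalty, and all names are hypothetical stand-ins rather than the compression scheme and regularizer proposed in the paper.

```python
import torch

def compress_topk(grad: torch.Tensor, ratio: float = 0.1):
    """Illustrative compression: keep only the largest-magnitude gradient
    entries and their positions (a stand-in for the paper's scheme)."""
    g = grad.detach().flatten()
    k = max(1, int(ratio * g.numel()))
    _, idx = torch.topk(g.abs(), k)
    return g[idx], idx

def taylor_regularizer(model: torch.nn.Module, old_params: dict, compressed: dict):
    """First-order Taylor-style penalty |g^T (theta - theta_old)| accumulated
    over parameters, using only the stored compressed gradient entries."""
    reg = 0.0
    for name, param in model.named_parameters():
        if name not in compressed:
            continue
        vals, idx = compressed[name]
        diff = (param - old_params[name]).flatten()[idx]
        # The absolute value is an illustrative choice of penalty form.
        reg = reg + torch.dot(vals, diff).abs()
    return reg
```

In use, `compressed` and `old_params` would be built once after finishing a task, and `taylor_regularizer` would be added with a weight to the training loss of the next task.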
2021
- [CVPR] Training Networks in Null Space of Feature Covariance for Continual Learning. Shipeng Wang, Xiaorong Li, Jian Sun, and Zongben Xu. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Oral Presentation [Top 4%].
In the setting of continual learning, a network is trained on a sequence of tasks and suffers from catastrophic forgetting. To balance the plasticity and stability of the network in continual learning, we propose in this paper a novel network training algorithm called Adam-NSCL, which sequentially optimizes network parameters in the null space of previous tasks. We first propose two mathematical conditions for achieving network stability and plasticity in continual learning, respectively. Based on them, training on sequential tasks can be achieved simply by projecting the candidate parameter update into the approximate null space of all previous tasks during network training, where the candidate parameter update can be generated by Adam. The approximate null space can be derived by applying singular value decomposition to the uncentered covariance matrix of all input features of previous tasks at each linear layer. For efficiency, the uncentered covariance matrix can be computed incrementally after learning each task. We also empirically verify the rationality of the approximate null space at each linear layer. We apply our approach to training networks for continual learning on the benchmark datasets CIFAR-100 and TinyImageNet, and the results suggest that the proposed approach outperforms or matches state-of-the-art continual learning approaches.
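The incremental bookkeeping described in the abstract can be sketched as below, assuming one accumulator per linear layer whose input features are collected during training; the class name, the normalization by the sample count, and the threshold rule in `null_space` are illustrative assumptions, not the published Adam-NSCL code.

```python
import torch

class FeatureCovariance:
    """Accumulate the uncentered covariance of a layer's input features across
    tasks, so the approximate null space can be refreshed after each task."""

    def __init__(self, dim: int):
        self.cov = torch.zeros(dim, dim)
        self.count = 0

    @torch.no_grad()
    def update(self, features: torch.Tensor) -> None:
        # features: (batch, dim) inputs observed at one linear layer
        self.cov += features.T @ features
        self.count += features.shape[0]

    def null_space(self, eps: float = 1e-2) -> torch.Tensor:
        """Singular vectors associated with the smallest singular values span
        the approximate null space used to constrain later parameter updates."""
        U, S, _ = torch.linalg.svd(self.cov / max(self.count, 1))
        return U[:, S <= eps * S[0]]
```

After finishing a task, `null_space` would be recomputed once per layer, and the candidate Adam update would be projected onto the subspace spanned by the returned basis during training on the next task.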