Research
In the last few years we have witnessed a renewed and steadily growing interest in the ability to learn continuously from high-dimensional data. On this page we keep track of recent Continual/Lifelong Learning developments in the research community.
Publications
In this section we maintain an updated list of publications related to Continual Learning. This reference list is automatically generated from a single BibTeX file maintained by the ContinualAI community through an open Mendeley group. Join our group here to add a reference to your paper, and please remember to follow the (very simple) contribution guidelines when adding new papers.
Applications
23 papers
In this section we maintain a list of application-oriented papers on continual learning and related topics.
Continual Learning of Predictive Models in Video Sequences via Variational Autoencoders by Damian Campo, Giulia Slavic, Mohamad Baydoun, Lucio Marcenaro and Carlo Regazzoni. arXiv, 2020. [vision]
@article{campo2020,
annotation = {_eprint: 2006.01945},
author = {Campo, Damian and Slavic, Giulia and Baydoun, Mohamad and Marcenaro, Lucio and Regazzoni, Carlo},
journal = {arXiv},
keywords = {[vision]},
title = {Continual Learning of Predictive Models in Video Sequences via Variational Autoencoders},
url = {http://arxiv.org/abs/2006.01945},
year = {2020}
}
This paper proposes a method for performing continual learning of predictive models that facilitate the inference of future frames in video sequences. For a first given experience, an initial Variational Autoencoder, together with a set of fully connected neural networks are utilized to respectively learn the appearance of video frames and their dynamics at the latent space level. By employing an adapted Markov Jump Particle Filter, the proposed method recognizes new situations and integrates them as predictive models avoiding catastrophic forgetting of previously learned tasks. For evaluating the proposed method, this article uses video sequences from a vehicle that performs different tasks in a controlled environment.
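As a rough illustration of the recipe sketched in the abstract (a VAE for frame appearance plus fully connected networks for latent-space dynamics), here is a minimal PyTorch sketch; the shapes and losses are assumptions and the Markov Jump Particle Filter component is omitted entirely, so this is not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a VAE models frame appearance and a
# small MLP predicts the next latent state, mirroring the "appearance +
# dynamics at the latent level" split described above. Shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameVAE(nn.Module):
    def __init__(self, frame_dim=64 * 64, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU())
        self.mu, self.logvar = nn.Linear(256, latent_dim), nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, frame_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar, z

class LatentDynamics(nn.Module):
    """Fully connected network predicting the next latent state z_{t+1} from z_t."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, latent_dim))

    def forward(self, z):
        return self.net(z)

def vae_loss(x, recon, mu, logvar):
    rec = F.mse_loss(recon, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# Toy usage on random "frames": encode frame t, predict the latent of frame t+1.
vae, dyn = FrameVAE(), LatentDynamics()
frames = torch.rand(8, 2, 64 * 64)                     # (batch, time, pixels)
recon, mu, logvar, z_t = vae(frames[:, 0])
loss = vae_loss(frames[:, 0], recon, mu, logvar)
loss = loss + F.mse_loss(dyn(z_t), vae(frames[:, 1])[3].detach())
loss.backward()
```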
Unsupervised Model Personalization While Preserving Privacy and Scalability: An Open Problem by Matthias De Lange, Xu Jia, Sarah Parisot, Ales Leonardis, Gregory Slabaugh and Tinne Tuytelaars. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14451–14460, 2020. [framework] [mnist] [vision]
@article{delange2020,
annotation = {_eprint: 2003.13296},
author = {De Lange, Matthias and Jia, Xu and Parisot, Sarah and Leonardis, Ales and Slabaugh, Gregory and Tuytelaars, Tinne},
doi = {10.1109/cvpr42600.2020.01447},
journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
keywords = {[framework],[mnist],[vision]},
pages = {14451--14460},
title = {Unsupervised Model Personalization While Preserving Privacy and Scalability: An Open Problem},
url = {http://arxiv.org/abs/2003.13296},
year = {2020}
}
This work investigates the task of unsupervised model personalization, adapted to continually evolving, unlabeled local user images. We consider the practical scenario where a high capacity server interacts with a myriad of resource-limited edge devices, imposing strong requirements on scalability and local data privacy. We aim to address this challenge within the continual learning paradigm and provide a novel Dual User-Adaptation framework (DUA) to explore the problem. This framework flexibly disentangles user-adaptation into model personalization on the server and local data regularization on the user device, with desirable properties regarding scalability and privacy constraints. First, on the server, we introduce incremental learning of task-specific expert models, subsequently aggregated using a concealed unsupervised user prior. Aggregation avoids retraining, whereas the user prior conceals sensitive raw user data, and grants unsupervised adaptation. Second, local user-adaptation incorporates a domain adaptation point of view, adapting regularizing batch normalization parameters to the user data. We explore various empirical user configurations with different priors in categories and a tenfold of transforms for MIT Indoor Scene recognition, and classify numbers in a combined MNIST and SVHN setup. Extensive experiments yield promising results for data-driven local adaptation and elicit user priors for server adaptation to depend on the model rather than user data. Hence, although user-adaptation remains a challenging open problem, the DUA framework formalizes a principled foundation for personalizing both on server and user device, while maintaining privacy and scalability.
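The local adaptation step described above amounts to re-estimating batch normalization statistics on unlabeled user images. A minimal sketch of that generic idea in PyTorch follows; the model, loader and momentum value are placeholder assumptions, not the DUA code.

```python
# Generic sketch of adapting batch-norm statistics to unlabeled user data
# (the "local data regularization" flavour); not the official DUA implementation.
import torch
import torch.nn as nn
import torchvision.models as models

def adapt_batchnorm(model: nn.Module, user_loader, momentum=0.1):
    """Re-estimate BN running statistics on the user's unlabeled images."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = momentum
    model.train()                      # BN updates running stats in train mode
    with torch.no_grad():              # no gradient step, statistics only
        for images in user_loader:
            model(images)
    model.eval()
    return model

# Toy usage with random "user images" standing in for a real local dataset.
model = models.resnet18(num_classes=10)
user_loader = [torch.rand(16, 3, 224, 224) for _ in range(4)]
adapt_batchnorm(model, user_loader)
```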
Incremental Learning for End-to-End Automatic Speech Recognition by Li Fu, Xiaoxiao Li and Libo Zi. arXiv, 2020. [audio]
@article{fu2020,
author = {Fu, Li and Li, Xiaoxiao and Zi, Libo},
journal = {arXiv},
keywords = {[audio],automatic speech recognition,end-to-end,incremental learning,knowledge distillation},
title = {Incremental Learning for End-to-End Automatic Speech Recognition},
url = {https://arxiv.org/abs/2005.04288},
year = {2020}
}
We propose an incremental learning approach for end-to-end Automatic Speech Recognition (ASR) to extend the model's capacity on a new task while retaining the performance on existing ones. The proposed method is effective without access to the old dataset, addressing the issues of high training cost and old dataset unavailability. To achieve this, knowledge distillation is applied as a guidance to retain the recognition ability from the previous model, which is then combined with the new ASR task for model optimization. With an ASR model pre-trained on 12,000h Mandarin speech, we test our proposed method on a 300h new scenario task and a 1h new named entities task. Experiments show that our method yields 3.25% and 0.88% absolute Character Error Rate (CER) reduction on the new scenario, when compared with the pre-trained model and the full-data retraining baseline, respectively. It even yields a surprising 0.37% absolute CER reduction on the new scenario compared with fine-tuning. For the new named entities task, our method significantly improves the accuracy compared with the pre-trained model, i.e. 16.95% absolute CER reduction. For both of the new task adaptations, the new models still maintain the same accuracy as the baseline on the old tasks.
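The core objective described in the abstract, a new-task loss combined with a distillation loss toward the frozen previous model, can be sketched as below; a plain classifier stands in for the end-to-end ASR model, and the temperature and mixing weight are assumptions.

```python
# Generic sketch of the "knowledge distillation + new task" objective for
# incremental learning; not the paper's CTC-based ASR setup.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def incremental_step(model, old_model, x, y, optimizer, T=2.0, alpha=0.5):
    """One update: new-task loss plus distillation toward the frozen old model."""
    with torch.no_grad():
        teacher_logits = old_model(x)
    student_logits = model(x)
    task_loss = F.cross_entropy(student_logits, y)
    kd_loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                       F.softmax(teacher_logits / T, dim=-1),
                       reduction="batchmean") * T * T
    loss = alpha * kd_loss + (1 - alpha) * task_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: the "old" model is a frozen copy made before the new data arrives.
model = nn.Linear(40, 50)                      # e.g. acoustic features -> tokens
old_model = copy.deepcopy(model).eval()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.rand(8, 40), torch.randint(0, 50, (8,))
incremental_step(model, old_model, x, y, opt)
```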
Neural Topic Modeling with Continual Lifelong Learning by Pankaj Gupta, Yatin Chaudhary, Thomas Runkler and Hinrich Schütze. ICML, 2020. [nlp]
@inproceedings{gupta2020,
annotation = {_eprint: 2006.10909},
author = {Gupta, Pankaj and Chaudhary, Yatin and Runkler, Thomas and Schütze, Hinrich},
booktitle = {ICML},
keywords = {[nlp]},
title = {Neural Topic Modeling with Continual Lifelong Learning},
url = {http://arxiv.org/abs/2006.10909},
year = {2020}
}
Lifelong learning has recently attracted attention in building machine learning systems that continually accumulate and transfer knowledge to help future learning. Unsupervised topic modeling has been popularly used to discover topics from document collections. However, the application of topic modeling is challenging due to data sparsity, e.g., in a small collection of (short) documents, and thus generates incoherent topics and sub-optimal document representations. To address the problem, we propose a lifelong learning framework for neural topic modeling that can continuously process streams of document collections, accumulate topics and guide future topic modeling tasks by knowledge transfer from several sources to better deal with the sparse data. In the lifelong process, we particularly investigate jointly: (1) sharing generative homologies (latent topics) over the lifetime to transfer prior knowledge, and (2) minimizing catastrophic forgetting to retain past learning via novel selective data augmentation, co-training and topic regularization approaches. Given a stream of document collections, we apply the proposed Lifelong Neural Topic Modeling (LNTM) framework in modeling three sparse document collections as future tasks and demonstrate improved performance quantified by perplexity, topic coherence and an information retrieval task.
CLOPS: Continual Learning of Physiological Signals by Dani Kiyasseh, Tingting Zhu and David A Clifton. arXiv, 2020.
@article{kiyasseh2020,
annotation = {_eprint: 2004.09578},
author = {Kiyasseh, Dani and Zhu, Tingting and Clifton, David A},
journal = {arXiv},
title = {CLOPS: Continual Learning of Physiological Signals},
url = {http://arxiv.org/abs/2004.09578},
year = {2020}
}
Deep learning algorithms are known to experience destructive interference when instances violate the assumption of being independent and identically distributed (i.i.d). This violation, however, is ubiquitous in clinical settings where data are streamed temporally and from a multitude of physiological sensors. To overcome this obstacle, we propose CLOPS, a healthcare-specific replay-based continual learning strategy. In three continual learning scenarios based on three publicly available datasets, we show that CLOPS can outperform its multi-task learning counterpart. Moreover, we propose end-to-end trainable parameters, which we term task-instance parameters, that can be used to quantify task difficulty and similarity. This quantification yields insights into both network interpretability and clinical applications, where task difficulty is poorly quantified.
Class-Agnostic Continual Learning of Alternating Languages and Domains by Germán Kruszewski, Ionut-Teodor Sorodoc and Tomas Mikolov. arXiv, 2020. [nlp] [rnn]
@article{kruszewski2020,
author = {Kruszewski, Germán and Sorodoc, Ionut-Teodor and Mikolov, Tomas},
journal = {arXiv},
keywords = {[nlp],[rnn],Computer Science - Artificial Intelligence,Computer Science - Computation and Language,Computer Science - Machine Learning,expert,mixture},
note = {arXiv: 2004.03340},
title = {Class-Agnostic Continual Learning of Alternating Languages and Domains},
url = {http://arxiv.org/abs/2004.03340},
year = {2020}
}
Continual Learning has often been framed as the problem of training a model in a sequence of tasks. In this regard, Neural Networks have been attested to forget the solutions to previous tasks as they learn new ones. Yet, modelling human life-long learning does not necessarily require any crisp notion of tasks. In this work, we propose a benchmark based on language modelling in a multilingual and multidomain setting that prescinds from any explicit delimitation of training examples into distinct tasks, and propose metrics to study continual learning and catastrophic forgetting in this setting. Then, we introduce a simple Product of Experts learning system that performs strongly on this problem while displaying interesting properties, and investigate its merits for avoiding forgetting.
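A toy illustration of the Product of Experts combination mentioned above, assuming fixed expert weights over a tiny vocabulary; the paper additionally learns how responsibility is assigned to experts online, which this sketch does not show.

```python
# Toy product-of-experts combination of next-token distributions. The expert
# weights here are made up for illustration only.
import numpy as np

def product_of_experts(expert_probs, weights):
    """Combine per-expert categorical distributions: p ∝ prod_k p_k^{w_k}."""
    log_p = np.sum(weights[:, None] * np.log(expert_probs + 1e-12), axis=0)
    log_p -= log_p.max()                     # numerical stability
    p = np.exp(log_p)
    return p / p.sum()

vocab = 5
rng = np.random.default_rng(0)
experts = rng.dirichlet(np.ones(vocab), size=3)   # 3 experts over a 5-word vocab
weights = np.array([0.7, 0.2, 0.1])               # assumed responsibilities
print(product_of_experts(experts, weights))
```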
Clinical Applications of Continual Learning Machine Learning by Cecilia S Lee and Aaron Y Lee. The Lancet Digital Health, e279–e281, 2020.
@article{lee2020,
author = {Lee, Cecilia S and Lee, Aaron Y},
doi = {10.1016/S2589-7500(20)30102-3},
issn = {25897500},
journal = {The Lancet Digital Health},
number = {6},
pages = {e279--e281},
title = {Clinical Applications of Continual Learning Machine Learning},
url = {https://linkinghub.elsevier.com/retrieve/pii/S2589750020301023},
volume = {2},
year = {2020}
}
Continual Learning for Domain Adaptation in Chest X-Ray Classification by Matthias Lenga, Heinrich Schulz and Axel Saalbach. arXiv, 1–11, 2020. [vision]
@article{lenga2020,
annotation = {_eprint: 2001.05922},
author = {Lenga, Matthias and Schulz, Heinrich and Saalbach, Axel},
journal = {arXiv},
keywords = {[vision],catastrophic forgetting,chest x-ray,chestx-ray14,continual learning,convolutional neural networks,elastic weight consolidation,joint training,learning without forgetting,mimic-cxr},
pages = {1--11},
title = {Continual Learning for Domain Adaptation in Chest X-Ray Classification},
url = {http://arxiv.org/abs/2001.05922},
year = {2020}
}
Over the last years, Deep Learning has been successfully applied to a broad range of medical applications. Especially in the context of chest X-ray classification, results have been reported which are on par, or even superior to experienced radiologists. Despite this success in controlled experimental environments, it has been noted that the ability of Deep Learning models to generalize to data from a new domain (with potentially different tasks) is often limited. In order to address this challenge, we investigate techniques from the field of Continual Learning (CL) including Joint Training (JT), Elastic Weight Consolidation (EWC) and Learning Without Forgetting (LWF). Using the ChestX-ray14 and the MIMIC-CXR datasets, we demonstrate empirically that these methods provide promising options to improve the performance of Deep Learning models on a target domain and to mitigate effectively catastrophic forgetting for the source domain. To this end, the best overall performance was obtained using JT, while for LWF competitive results could be achieved - even without accessing data from the source domain.
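Among the techniques compared above, Elastic Weight Consolidation adds a quadratic penalty that pulls parameters toward their source-domain values, weighted by an estimate of their diagonal Fisher information. A minimal, generic sketch (not the paper's training code; the Fisher values below are dummies):

```python
# Sketch of the EWC penalty: lam/2 * sum_i F_i * (theta_i - theta*_i)^2.
import torch
import torch.nn as nn

def ewc_penalty(model, old_params, fisher, lam=1000.0):
    """Quadratic pull toward source-domain parameters, weighted by Fisher."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# Toy usage: snapshot parameters and a (here fake) Fisher after the source task,
# then add the penalty to the target-domain loss.
model = nn.Linear(10, 2)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}
x, y = torch.rand(4, 10), torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), y) + ewc_penalty(model, old_params, fisher)
loss.backward()
```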
Sequential Domain Adaptation through Elastic Weight Consolidation for Sentiment Analysis by Avinash Madasu and Vijjini Anvesh Rao. arXiv, 2020. [nlp] [rnn]
@article{madasu2020,
archiveprefix = {arXiv},
author = {Madasu, Avinash and Rao, Vijjini Anvesh},
eprint = {2007.01189},
eprinttype = {arxiv},
journal = {arXiv},
keywords = {[nlp],[rnn],Computer Science - Computation and Language},
note = {Comment: Accepted at 25th International Conference on Pattern Recognition, January 2021, Milan, Italy},
title = {Sequential Domain Adaptation through Elastic Weight Consolidation for Sentiment Analysis},
url = {http://arxiv.org/abs/2007.01189},
urldate = {2021-01-08},
year = {2020}
}
Elastic Weight Consolidation (EWC) is a technique used in overcoming catastrophic forgetting between successive tasks trained on a neural network. We use this phenomenon of information sharing between tasks for domain adaptation. Training data for tasks such as sentiment analysis (SA) may not be fairly represented across multiple domains. Domain Adaptation (DA) aims to build algorithms that leverage information from source domains to facilitate performance on an unseen target domain. We propose a model-independent framework - Sequential Domain Adaptation (SDA). SDA draws on EWC for training on successive source domains to move towards a general domain solution, thereby solving the problem of domain adaptation. We test SDA on convolutional, recurrent, and attention-based architectures. Our experiments show that the proposed framework enables simple architectures such as CNNs to outperform complex state-of-the-art models in domain adaptation of SA. In addition, we observe that the effectiveness of a harder first Anti-Curriculum ordering of source domains leads to maximum performance.
Importance Driven Continual Learning for Segmentation Across Domains by Sinan Özgür Özgün, Anne-Marie Rickmann, Abhijit Guha Roy and Christian Wachinger. arXiv, 1–10, 2020. [vision]
@article{ozgun2020,
annotation = {_eprint: 2005.00079},
author = {Özgün, Sinan Özgür and Rickmann, Anne-Marie and Roy, Abhijit Guha and Wachinger, Christian},
journal = {arXiv},
keywords = {[vision]},
pages = {1--10},
title = {Importance Driven Continual Learning for Segmentation Across Domains},
url = {http://arxiv.org/abs/2005.00079},
year = {2020}
}
The ability of neural networks to continuously learn and adapt to new tasks while retaining prior knowledge is crucial for many applications. However, current neural networks tend to forget previously learned tasks when trained on new ones, i.e., they suffer from Catastrophic Forgetting (CF). The objective of Continual Learning (CL) is to alleviate this problem, which is particularly relevant for medical applications, where it may not be feasible to store and access previously used sensitive patient data. In this work, we propose a Continual Learning approach for brain segmentation, where a single network is consecutively trained on samples from different domains. We build upon an importance driven approach and adapt it for medical image segmentation. Particularly, we introduce learning rate regularization to prevent the loss of the network's knowledge. Our results demonstrate that directly restricting the adaptation of important network parameters clearly reduces Catastrophic Forgetting for segmentation across domains.
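A hedged sketch of the general "learning rate regularization" idea, scaling each parameter's update inversely to an importance score so that weights important for earlier domains barely move; the importance values below are placeholders rather than the measure used in the paper.

```python
# Rough sketch: per-parameter updates of the form theta -= lr / (1 + omega) * grad,
# so highly important weights receive only tiny updates on the new domain.
import torch
import torch.nn as nn

def importance_scaled_sgd_step(model, importance, lr=0.01):
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is not None:
                p -= lr / (1.0 + importance[name]) * p.grad

# Toy usage on a dummy classification-style loss (a stand-in for segmentation).
model = nn.Linear(16, 4)
importance = {n: torch.rand_like(p) * 10 for n, p in model.named_parameters()}
x, y = torch.rand(8, 16), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
importance_scaled_sgd_step(model, importance)
model.zero_grad()
```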
LAMOL: LAnguage MOdeling for Lifelong Language Learning by Fan-Keng Sun, Cheng-Hao Ho and Hung-Yi Lee. ICLR, 2020. [nlp]
@inproceedings{sun2020,
author = {Sun, Fan-Keng and Ho, Cheng-Hao and Lee, Hung-Yi},
booktitle = {ICLR},
keywords = {[nlp]},
shorttitle = {LAMOL},
title = {LAMOL: LAnguage MOdeling for Lifelong Language Learning},
url = {https://openreview.net/forum?id=Skgxcn4YDS},
year = {2020}
}
Most research on lifelong learning applies to images or games, but not language. We present LAMOL, a simple yet effective method for lifelong language learning (LLL) based on language...
Non-Parametric Adaptation for Neural Machine Translation by Ankur Bapna and Orhan Firat. arXiv, 2019. [nlp]
@article{bapna2019,
annotation = {_eprint: 1903.00058},
author = {Bapna, Ankur and Firat, Orhan},
journal = {arXiv},
keywords = {[nlp]},
title = {Non-Parametric Adaptation for Neural Machine Translation},
url = {http://arxiv.org/abs/1903.00058},
year = {2019}
}
Neural Networks trained with gradient descent are known to be susceptible to catastrophic forgetting caused by parameter shift during the training process. In the context of Neural Machine Translation (NMT) this results in poor performance on heterogeneous datasets and on sub-tasks like rare phrase translation. On the other hand, non-parametric approaches are immune to forgetting, perfectly complementing the generalization ability of NMT. However, attempts to combine non-parametric or retrieval based approaches with NMT have only been successful on narrow domains, possibly due to over-reliance on sentence level retrieval. We propose a novel n-gram level retrieval approach that relies on local phrase level similarities, allowing us to retrieve neighbors that are useful for translation even when overall sentence similarity is low. We complement this with an expressive neural network, allowing our model to extract information from the noisy retrieved context. We evaluate our semi-parametric NMT approach on a heterogeneous dataset composed of WMT, IWSLT, JRC-Acquis and OpenSubtitles, and demonstrate gains on all 4 evaluation sets. The semi-parametric nature of our approach opens the door for non-parametric domain adaptation, demonstrating strong inference-time adaptation performance on new domains without the need for any parameter updates.
Episodic Memory in Lifelong Language Learning by Cyprien de Masson D’Autume, Sebastian Ruder, Lingpeng Kong and Dani Yogatama. NeurIPS, 2019. [nlp]
@inproceedings{dautume2019,
annotation = {_eprint: 1906.01076},
author = {D'Autume, Cyprien de Masson and Ruder, Sebastian and Kong, Lingpeng and Yogatama, Dani},
booktitle = {NeurIPS},
keywords = {[nlp]},
title = {Episodic Memory in Lifelong Language Learning},
url = {http://arxiv.org/abs/1906.01076},
year = {2019}
}
We introduce a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier. We propose an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in this setup. Experiments on text classification and question answering demonstrate the complementary benefits of sparse experience replay and local adaptation to allow the model to continuously learn from new datasets. We also show that the space complexity of the episodic memory module can be reduced significantly (∼50-90%) by randomly choosing which examples to store in memory with a minimal decrease in performance. We consider an episodic memory component as a crucial building block of general linguistic intelligence and see our model as a first step in that direction.
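A minimal sketch of the sparse-experience-replay component only (local adaptation is omitted); memory capacity, write probability and replay interval are assumptions, not the paper's settings.

```python
# Sketch of sparse experience replay: randomly write incoming examples to a
# bounded memory and replay a small batch only every `replay_every` steps.
import random
import torch
import torch.nn as nn

class EpisodicMemory:
    def __init__(self, capacity=1000, write_prob=0.5):
        self.buffer, self.capacity, self.write_prob = [], capacity, write_prob

    def write(self, x, y):
        for xi, yi in zip(x, y):
            if len(self.buffer) < self.capacity and random.random() < self.write_prob:
                self.buffer.append((xi, yi))

    def sample(self, batch_size=8):
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

model = nn.Linear(32, 5)                              # stand-in for a text encoder
opt = torch.optim.SGD(model.parameters(), lr=0.1)
memory, replay_every = EpisodicMemory(), 10

for step in range(100):                               # stream without dataset ids
    x, y = torch.rand(4, 32), torch.randint(0, 5, (4,))
    loss = nn.functional.cross_entropy(model(x), y)
    if step % replay_every == 0 and memory.buffer:    # sparse replay
        rx, ry = memory.sample()
        loss = loss + nn.functional.cross_entropy(model(rx), ry)
    opt.zero_grad(); loss.backward(); opt.step()
    memory.write(x, y)
```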
Continual Adaptation for Efficient Machine Communication by Robert D Hawkins, Minae Kwon, Dorsa Sadigh and Noah D Goodman. Proceedings of the ICML Workshop on Adaptive & Multitask Learning: Algorithms & Systems, 2019.
@inproceedings{hawkins2019,
annotation = {_eprint: 1911.09896},
author = {Hawkins, Robert D and Kwon, Minae and Sadigh, Dorsa and Goodman, Noah D},
booktitle = {Proceedings of the ICML Workshop on Adaptive & Multitask Learning: Algorithms & Systems},
title = {Continual Adaptation for Efficient Machine Communication},
url = {http://arxiv.org/abs/1911.09896},
year = {2019}
}
To communicate with new partners in new contexts, humans rapidly form new linguistic conventions. Recent language models trained with deep neural networks are able to comprehend and produce the existing conventions present in their training data, but are not able to flexibly and interactively adapt those conventions on the fly as humans do. We introduce a repeated reference task as a benchmark for models of adaptation in communication and propose a regularized continual learning framework that allows an artificial agent initialized with a generic language model to more accurately and efficiently communicate with a partner over time. We evaluate this framework through simulations on COCO and in real-time reference game experiments with human partners.
Continual Learning for Sentence Representations Using Conceptors by Tianlin Liu, Lyle Ungar and João Sedoc. NAACL, 2019. [nlp]
@inproceedings{liu2019,
annotation = {_eprint: 1904.09187},
author = {Liu, Tianlin and Ungar, Lyle and Sedoc, João},
booktitle = {NAACL},
keywords = {[nlp]},
title = {Continual Learning for Sentence Representations Using Conceptors},
url = {http://arxiv.org/abs/1904.09187},
year = {2019}
}
Distributed representations of sentences have become ubiquitous in natural language processing tasks. In this paper, we consider a continual learning scenario for sentence representations: Given a sequence of corpora, we aim to optimize the sentence encoder with respect to the new corpus while maintaining its accuracy on the old corpora. To address this problem, we propose to initialize sentence encoders with the help of corpus-independent features, and then sequentially update sentence encoders using Boolean operations of conceptor matrices to learn corpus-dependent features. We evaluate our approach on semantic textual similarity tasks and show that our proposed sentence encoder can continually learn features from new corpora while retaining its competence on previously encountered corpora.
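A small NumPy sketch of the conceptor idea on toy data: build a conceptor from the correlation matrix of previously seen embeddings and project new embeddings through its complement to suppress already-captured directions. The aperture value is arbitrary, and the Boolean conceptor updates used in the paper are simplified away.

```python
# Conceptor sketch: C = R (R + alpha^-2 I)^-1 for correlation matrix R, then
# (I - C) keeps the directions that the old corpus does not already cover.
import numpy as np

def conceptor(X, alpha=1.0):
    d = X.shape[1]
    R = X.T @ X / X.shape[0]
    return R @ np.linalg.inv(R + alpha ** -2 * np.eye(d))

rng = np.random.default_rng(0)
corpus_a = rng.normal(size=(200, 50))           # embeddings from an old corpus
C = conceptor(corpus_a)
new_embedding = rng.normal(size=50)
filtered = (np.eye(50) - C) @ new_embedding     # suppress already-captured directions
print(np.linalg.norm(filtered), np.linalg.norm(new_embedding))
```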
Lifelong and Interactive Learning of Factual Knowledge in Dialogues by Sahisnu Mazumder, Bing Liu, Shuai Wang and Nianzu Ma. Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, 21–31, 2019. [nlp]
@inproceedings{mazumder2019,
address = {Stroudsburg, PA, USA},
annotation = {_eprint: 1907.13295},
author = {Mazumder, Sahisnu and Liu, Bing and Wang, Shuai and Ma, Nianzu},
booktitle = {Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue},
doi = {10.18653/v1/W19-5903},
keywords = {[nlp]},
pages = {21--31},
publisher = {Association for Computational Linguistics},
title = {Lifelong and Interactive Learning of Factual Knowledge in Dialogues},
url = {http://arxiv.org/abs/1907.13295 https://www.aclweb.org/anthology/W19-5903},
year = {2019}
}
Dialogue systems are increasingly using knowledge bases (KBs) storing real-world facts to help generate quality responses. However, as the KBs are inherently incomplete and remain fixed during conversation, it limits dialogue systems' ability to answer questions and to handle questions involving entities or relations that are not in the KB. In this paper, we make an attempt to propose an engine for Continuous and Interactive Learning of Knowledge (CILK) for dialogue systems to give them the ability to continuously and interactively learn and infer new knowledge during conversations. With more knowledge accumulated over time, they will be able to learn better and answer more questions. Our empirical evaluation shows that CILK is promising.
Making Good on LSTMs’ Unfulfilled Promise by Daniel Philps, Artur d’Avila Garcez and Tillman Weyde. arXiv, 2019. [rnn]
@article{philps2019,
archiveprefix = {arXiv},
author = {Philps, Daniel and d'Avila Garcez, Artur and Weyde, Tillman},
eprint = {1911.04489},
eprinttype = {arxiv},
journal = {arXiv},
keywords = {[rnn],Computer Science - Machine Learning,Quantitative Finance - Computational Finance,Quantitative Finance - Portfolio Management,Statistics - Machine Learning},
note = {Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. arXiv admin note: text overlap with arXiv:1812.02340},
title = {Making Good on LSTMs' Unfulfilled Promise},
url = {http://arxiv.org/abs/1911.04489},
urldate = {2021-01-08},
year = {2019}
}
LSTMs promise much to financial time-series analysis, temporal and cross-sectional inference, but we find that they do not deliver in a real-world financial management task. We examine an alternative called Continual Learning (CL), a memory-augmented approach, which can provide transparent explanations, i.e. which memory did what and when. This work has implications for many financial applications including credit, time-varying fairness in decision making and more. We make three important new observations. Firstly, as well as being more explainable, time-series CL approaches outperform LSTMs as well as a simple sliding window learner using feed-forward neural networks (FFNN). Secondly, we show that CL based on a sliding window learner (FFNN) is more effective than CL based on a sequential learner (LSTM). Thirdly, we examine how real-world, time-series noise impacts several similarity approaches used in CL memory addressing. We provide these insights using an approach called Continual Learning Augmentation (CLA) tested on a complex real-world problem, emerging market equities investment decision making. CLA provides a test-bed as it can be based on different types of time-series learners, allowing testing of LSTM and FFNN learners side by side. CLA is also used to test several distance approaches used in a memory recall-gate: Euclidean distance (ED), dynamic time warping (DTW), auto-encoders (AE) and a novel hybrid approach, warp-AE. We find that ED under-performs DTW and AE but warp-AE shows the best overall performance in a real-world financial task.
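One of the distance measures compared in the memory recall-gate, dynamic time warping, can be computed with the textbook quadratic-time recursion below; this is a generic implementation, not the CLA code.

```python
# Textbook DTW distance between two 1-D series, usable as a similarity measure
# for memory addressing; O(n*m) dynamic program.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
series_a = np.cumsum(rng.normal(size=60))       # toy "market" series
series_b = np.cumsum(rng.normal(size=75))       # unequal lengths are fine for DTW
print(dtw_distance(series_a, series_b))
```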
Overcoming Catastrophic Forgetting During Domain Adaptation of Neural Machine Translation by Brian Thompson, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh and Philipp Koehn. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2062–2068, 2019. [nlp] [rnn]
@inproceedings{thompson2019a,
address = {Minneapolis, Minnesota},
author = {Thompson, Brian and Gwinnup, Jeremy and Khayrallah, Huda and Duh, Kevin and Koehn, Philipp},
booktitle = {Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
doi = {10.18653/v1/N19-1209},
keywords = {[nlp],[rnn]},
pages = {2062--2068},
publisher = {Association for Computational Linguistics},
title = {Overcoming Catastrophic Forgetting During Domain Adaptation of Neural Machine Translation},
url = {https://www.aclweb.org/anthology/N19-1209},
urldate = {2021-01-08},
year = {2019}
}
Continued training is an effective method for domain adaptation in neural machine translation. However, in-domain gains from adaptation come at the expense of general-domain performance. In this work, we interpret the drop in general-domain performance as catastrophic forgetting of general-domain knowledge. To mitigate it, we adapt Elastic Weight Consolidation (EWC)— a machine learning method for learning a new task without forgetting previous tasks. Our method retains the majority of general-domain performance lost in continued training without degrading in-domain performance, outperforming the previous state-of-the-art. We also explore the full range of general-domain performance available when some in-domain degradation is acceptable.
A Multi-Task Learning Framework for Overcoming the Catastrophic Forgetting in Automatic Speech Recognition by Jiabin Xue, Jiqing Han, Tieran Zheng, Xiang Gao and Jiaxing Guo. arXiv, 2019. [audio] [rnn]
@article{xue2019,
annotation = {_eprint: 1904.08039},
author = {Xue, Jiabin and Han, Jiqing and Zheng, Tieran and Gao, Xiang and Guo, Jiaxing},
journal = {arXiv},
keywords = {[audio],[rnn]},
title = {A Multi-Task Learning Framework for Overcoming the Catastrophic Forgetting in Automatic Speech Recognition},
url = {http://arxiv.org/abs/1904.08039},
year = {2019}
}
Recently, data-driven Automatic Speech Recognition (ASR) systems have achieved state-of-the-art results, and transfer learning is often used when those existing systems are adapted to the target domain, e.g., fine-tuning, retraining. However, in these processes the system parameters may deviate too much from the previously learned parameters. Thus, it is difficult for the system training process to learn knowledge from target domains while not forgetting knowledge from the previous learning process, which is called catastrophic forgetting (CF). In this paper, we attempt to solve the CF problem with lifelong learning and propose a novel multi-task learning (MTL) training framework for ASR. It considers preserving original knowledge and learning new knowledge as two independent tasks, respectively. On the one hand, we constrain the new parameters not to deviate too far from the original parameters and penalize the new system when it forgets original knowledge. On the other hand, we force the new system to acquire new knowledge quickly. Then, an MTL mechanism is employed to strike a balance between the two tasks. We applied our method to an End2End ASR task and obtained the best performance on both target and original datasets.
Lifelong Learning for Scene Recognition in Remote Sensing Images by Min Zhai, Huaping Liu and Fuchun Sun. IEEE Geoscience and Remote Sensing Letters, 1472–1476, 2019. [vision]
@article{zhai2019,
author = {Zhai, Min and Liu, Huaping and Sun, Fuchun},
doi = {10.1109/LGRS.2019.2897652},
issn = {1545-598X},
journal = {IEEE Geoscience and Remote Sensing Letters},
keywords = {[vision]},
number = {9},
pages = {1472--1476},
publisher = {IEEE},
title = {Lifelong Learning for Scene Recognition in Remote Sensing Images},
url = {https://ieeexplore.ieee.org/document/8662768/},
volume = {16},
year = {2019}
}
The development of visual sensing technologies has made it possible to gather many high-resolution satellite images. To make the best use of these images, it is essential to be able to recognize and retrieve their intrinsic scene information. The problem of scene recognition in remote sensing images has recently aroused considerable interest, mainly due to the great success achieved by deep learning methods in generic image classification. Nevertheless, such methods usually require large amounts of labeled data. By contrast, remote sensing images are relatively scarce and expensive to obtain. Moreover, data sets from different aerospace research institutions exhibit large disparities. In order to address these problems, we propose a model based on a meta-learning method with the ability to learn a classifier from just a few samples. With the proposed model, the knowledge learned from one data set can be easily adapted to a new data set, which, in turn, serves lifelong few-shot learning. Scene-level image recognition experiments, on public high-resolution remote sensing image data sets, validate our proposed lifelong few-shot learning model.
Towards Continual Learning in Medical Imaging by Chaitanya Baweja, Ben Glocker and Konstantinos Kamnitsas. NeurIPS Workshop on Continual Learning, 1–4, 2018. [vision]
@article{baweja2018,
annotation = {_eprint: 1811.02496},
author = {Baweja, Chaitanya and Glocker, Ben and Kamnitsas, Konstantinos},
doi = {arXiv:1811.02496v1},
journal = {NeurIPS Workshop on Continual Learning},
keywords = {[vision]},
pages = {1--4},
title = {Towards Continual Learning in Medical Imaging},
url = {http://arxiv.org/abs/1811.02496},
year = {2018}
}
This work investigates continual learning of two segmentation tasks in brain MRI with neural networks. To explore in this context the capabilities of current methods for countering catastrophic forgetting of the first task when a new one is learned, we investigate elastic weight consolidation, a recently proposed method based on Fisher information, originally evaluated on reinforcement learning of Atari games. We use it to sequentially learn segmentation of normal brain structures and then segmentation of white matter lesions. Our findings show this recent method reduces catastrophic forgetting, while large room for improvement exists in these challenging settings for continual learning.
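Elastic weight consolidation relies on a diagonal Fisher information estimate for the first task; a common approximation averages squared gradients of the log-likelihood over that task's data (the so-called empirical Fisher). A toy classifier stands in for the segmentation network in this sketch, which is not the paper's code.

```python
# Sketch of a diagonal (empirical) Fisher information estimate: average the
# squared gradients of the log-likelihood over the first task's data.
import torch
import torch.nn as nn
import torch.nn.functional as F

def estimate_diag_fisher(model, data):
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in data:
        model.zero_grad()
        log_probs = F.log_softmax(model(x), dim=-1)
        loss = F.nll_loss(log_probs, y)       # negative log-likelihood
        loss.backward()
        for n, p in model.named_parameters():
            fisher[n] += p.grad.detach() ** 2 / len(data)
    return fisher

# Toy usage on random "first task" batches.
model = nn.Linear(20, 3)
data = [(torch.rand(8, 20), torch.randint(0, 3, (8,))) for _ in range(5)]
fisher = estimate_diag_fisher(model, data)
```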
Toward Continual Learning for Conversational Agents by Sungjin Lee. arXiv, 2018. [nlp]
@article{lee2018,
author = {Lee, Sungjin},
journal = {arXiv},
keywords = {[nlp],chatbot,Computer Science - Artificial Intelligence,Computer Science - Computation and Language,Computer Science - Human-Computer Interaction,conversation,conversational agent,ewc,lstm},
note = {arXiv: 1712.09943},
title = {Toward Continual Learning for Conversational Agents},
url = {http://arxiv.org/abs/1712.09943},
year = {2018}
}
While end-to-end neural conversation models have led to promising advances in reducing hand-crafted features and errors induced by the traditional complex system architecture, they typically require an enormous amount of data due to the lack of modularity. Previous studies adopted a hybrid approach with knowledge-based components either to abstract out domain-specific information or to augment data to cover more diverse patterns. On the contrary, we propose to directly address the problem using recent developments in the space of continual learning for neural models. Specifically, we adopt a domain-independent neural conversational model and introduce a novel neural continual learning algorithm that allows a conversational agent to accumulate skills across different tasks in a data-efficient way. To the best of our knowledge, this is the first work that applies continual learning to conversation systems. We verified the efficacy of our method through a conversational skill transfer from either synthetic dialogs or human-human dialogs to human-computer conversations in a customer support domain.
Principles of Lifelong Learning for Predictive User Modeling by Ashish Kapoor and Eric Horvitz. User Modeling 2007, 37–46, 2009.
@incollection{kapoor2009,
address = {Berlin, Heidelberg},
author = {Kapoor, Ashish and Horvitz, Eric},
booktitle = {User Modeling 2007},
doi = {10.1007/978-3-540-73078-1_7},
isbn = {978-3-540-73077-4},
issn = {03029743},
pages = {37--46},
pmid = {16717005},
publisher = {Springer Berlin Heidelberg},
title = {Principles of Lifelong Learning for Predictive User Modeling},
url = {http://link.springer.com/10.1007/978-3-540-73078-1_7},
year = {2009}
}
Architectural Methods
25 papers
In this section we collect all the papers introducing a continual learning strategy based on architectural methods.
Continual Learning with Adaptive Weights (CLAW) by Tameem Adel, Han Zhao and Richard E Turner. International Conference on Learning Representations, 2020. [cifar] [mnist] [omniglot]
@inproceedings{adel2020,
author = {Adel, Tameem and Zhao, Han and Turner, Richard E},
booktitle = {International Conference on Learning Representations},
keywords = {[cifar],[mnist],[omniglot]},
title = {Continual Learning with Adaptive Weights (CLAW)},
url = {https://openreview.net/forum?id=Hklso24Kwr},
year = {2020}
}
Approaches to continual learning aim to successfully learn a set of related tasks that arrive in an online manner. Recently, several frameworks have been developed which enable deep learning to be deployed in this learning scenario. A key modelling decision is to what extent the architecture should be shared across tasks. On the one hand, separately modelling each task avoids catastrophic forgetting but it does not support transfer learning and leads to large models. On the other hand, rigidly specifying a shared component and a task-specific part enables task transfer and limits the model size, but it is vulnerable to catastrophic forgetting and restricts the form of task-transfer that can occur. Ideally, the network should adaptively identify which parts of the network to share in a data driven way. Here we introduce such an approach called Continual Learning with Adaptive Weights (CLAW), which is based on probabilistic modelling and variational inference. Experiments show that CLAW achieves state-of-the-art performance on six benchmarks in terms of overall continual learning performance, as measured by classification accuracy, and in terms of addressing catastrophic forgetting.
Continual Learning with Gated Incremental Memories for Sequential Data Processing by Andrea Cossu, Antonio Carta and Davide Bacciu. Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN 2020), 2020. [mnist] [rnn]
@inproceedings{cossu2020,
author = {Cossu, Andrea and Carta, Antonio and Bacciu, Davide},
booktitle = {Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN 2020)},
keywords = {[mnist],[rnn],Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computi,Statistics - Machine Learning},
note = {An evaluation of RNNs (LSTM and LMN) inspired by Progressive networks, leading to the Gated Incremental Memory approach to overcome catastrophic forgetting.},
title = {Continual Learning with Gated Incremental Memories for Sequential Data Processing},
url = {http://arxiv.org/abs/2004.04077},
year = {2020}
}
The ability to learn in dynamic, nonstationary environments without forgetting previous knowledge, also known as Continual Learning (CL), is a key enabler for scalable and trustworthy deployments of adaptive solutions. While the importance of continual learning is largely acknowledged in machine vision and reinforcement learning problems, this is mostly under-documented for sequence processing tasks. This work proposes a Recurrent Neural Network (RNN) model for CL that is able to deal with concept drift in input distribution without forgetting previously acquired knowledge. We also implement and test a popular CL approach, Elastic Weight Consolidation (EWC), on top of two different types of RNNs. Finally, we compare the performances of our enhanced architecture against EWC and RNNs on a set of standard CL benchmarks, adapted to the sequential data processing scenario. Results show the superior performance of our architecture and highlight the need for special solutions designed to address CL in RNNs.
Bayesian Nonparametric Weight Factorization for Continual Learning by Nikhil Mehta, Kevin J Liang and Lawrence Carin. arXiv, 1–17, 2020. [bayes] [cifar] [mnist] [sparsity]
@article{mehta2020,
annotation = {_eprint: 2004.10098},
author = {Mehta, Nikhil and Liang, Kevin J and Carin, Lawrence},
journal = {arXiv},
keywords = {[bayes],[cifar],[mnist],[sparsity]},
pages = {1--17},
title = {Bayesian Nonparametric Weight Factorization for Continual Learning},
url = {http://arxiv.org/abs/2004.10098},
year = {2020}
}
Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings, where data from previous tasks are unavailable. A number of methods, using various model expansion strategies, have been proposed recently as possible solutions. However, determining how much to expand the model is left to the practitioner, and typically a constant schedule is chosen for simplicity, regardless of how complex the incoming task is. Instead, we propose a principled Bayesian nonparametric approach based on the Indian Buffet Process (IBP) prior, letting the data determine how much to expand the model complexity. We pair this with a factorization of the neural network's weight matrices. Such an approach allows us to scale the number of factors of each weight matrix to the complexity of the task, while the IBP prior imposes weight factor sparsity and encourages factor reuse, promoting positive knowledge transfer between tasks. We demonstrate the effectiveness of our method on a number of continual learning benchmarks and analyze how weight factors are allocated and reused throughout the training.
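For intuition, the Indian Buffet Process prior mentioned above can be sampled with the stick-breaking construction below; the truncation level and concentration parameter are arbitrary choices for illustration, not the paper's inference procedure.

```python
# Stick-breaking sample from an Indian Buffet Process prior, the mechanism used
# to let data decide how many weight factors each task switches on.
import numpy as np

def sample_ibp(num_tasks=5, truncation=20, alpha=3.0, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.beta(alpha, 1.0, size=truncation)     # stick-breaking proportions
    pi = np.cumprod(v)                            # decreasing activation probabilities
    Z = rng.random((num_tasks, truncation)) < pi  # binary factor-usage matrix
    return Z

Z = sample_ibp()
print(Z.astype(int))                    # rows: tasks, columns: weight factors in use
print("factors used per task:", Z.sum(axis=1))
```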
SpaceNet: Make Free Space For Continual Learning by Ghada Sokar, Decebal Constantin Mocanu and Mykola Pechenizkiy. arXiv, 2020. [cifar] [fashion] [mnist] [sparsity]
@article{sokar2020,
annotation = {_eprint: 2007.07617},
author = {Sokar, Ghada and Mocanu, Decebal Constantin and Pechenizkiy, Mykola},
journal = {arXiv},
keywords = {[cifar],[fashion],[mnist],[sparsity]},
title = {SpaceNet: Make Free Space For Continual Learning},
url = {http://arxiv.org/abs/2007.07617},
year = {2020}
}
The continual learning (CL) paradigm aims to enable neural networks to learn tasks continually in a sequential fashion. The fundamental challenge in this learning paradigm is catastrophic forgetting of previously learned tasks when the model is optimized for a new task, especially when their data is not accessible. Current architectural-based methods aim at alleviating the catastrophic forgetting problem but at the expense of expanding the capacity of the model. Regularization-based methods maintain a fixed model capacity; however, previous studies showed the huge performance degradation of these methods when the task identity is not available during inference (e.g. the class incremental learning scenario). In this work, we propose a novel architectural-based method referred to as SpaceNet for the class incremental learning scenario, where we utilize the available fixed capacity of the model intelligently. SpaceNet trains sparse deep neural networks from scratch in an adaptive way that compresses the sparse connections of each task in a compact number of neurons. The adaptive training of the sparse connections results in sparse representations that reduce the interference between the tasks. Experimental results show the robustness of our proposed method against catastrophic forgetting of old tasks and the efficiency of SpaceNet in utilizing the available capacity of the model, leaving space for more tasks to be learned. In particular, when SpaceNet is tested on the well-known benchmarks for CL: split MNIST, split Fashion-MNIST, and CIFAR-10/100, it outperforms regularization-based methods by a big performance gap. Moreover, it achieves better performance than architectural-based methods without model expansion and comparable results with rehearsal-based methods, while offering a huge memory reduction.
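A generic sketch of the fixed-capacity, per-task sparse-mask idea: each task claims a disjoint subset of a layer's connections. The random allocation below replaces SpaceNet's adaptive sparse training, so it only illustrates the mechanism; fractions and sizes are assumptions.

```python
# Generic per-task sparse masks over a fixed-capacity layer; each task uses only
# its own claimed connections, and claimed weights are withheld from later tasks.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, in_dim=64, out_dim=10, per_task_frac=0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.free = torch.ones(out_dim, in_dim, dtype=torch.bool)  # unclaimed weights
        self.task_masks, self.per_task_frac = {}, per_task_frac

    def allocate_task(self, task_id):
        free_idx = self.free.nonzero(as_tuple=False)
        k = int(self.per_task_frac * self.weight.numel())
        chosen = free_idx[torch.randperm(len(free_idx))[:k]]
        mask = torch.zeros_like(self.free)
        mask[chosen[:, 0], chosen[:, 1]] = True
        self.free &= ~mask                       # claimed weights leave the free pool
        self.task_masks[task_id] = mask

    def forward(self, x, task_id):
        return x @ (self.weight * self.task_masks[task_id]).t()

layer = MaskedLinear()
for t in range(3):
    layer.allocate_task(t)
out = layer(torch.rand(4, 64), task_id=1)        # uses only task 1's connections
```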
Efficient Continual Learning with Modular Networks and Task-Driven Priors by Tom Veniat, Ludovic Denoyer and Marc’Aurelio Ranzato. arXiv, 2020. [experimental]
@article{veniat2020,
archiveprefix = {arXiv},
author = {Veniat, Tom and Denoyer, Ludovic and Ranzato, Marc'Aurelio},
eprint = {2012.12631},
eprinttype = {arxiv},
journal = {arXiv},
keywords = {[experimental],Computer Science - Machine Learning},
note = {This paper introduces a new benchmark for CL and evaluate a newly proposed model on different types of task streams. The model is a modular network with a data-driven prior to choose among modules. It uses task labels both at training and test time.},
title = {Efficient Continual Learning with Modular Networks and Task-Driven Priors},
url = {http://arxiv.org/abs/2012.12631},
urldate = {2020-12-29},
year = {2020}
}
Existing literature in Continual Learning (CL) has focused on overcoming catastrophic forgetting, the inability of the learner to recall how to perform tasks observed in the past. There are however other desirable properties of a CL system, such as the ability to transfer knowledge from previous tasks and to scale memory and compute sub-linearly with the number of tasks. Since most current benchmarks focus only on forgetting using short streams of tasks, we first propose a new suite of benchmarks to probe CL algorithms across these new axes. Finally, we introduce a new modular architecture, whose modules represent atomic skills that can be composed to perform a certain task. Learning a task reduces to figuring out which past modules to re-use, and which new modules to instantiate to solve the current task. Our learning algorithm leverages a task-driven prior over the exponential search space of all possible ways to combine modules, enabling efficient learning on long streams of tasks. Our experiments show that this modular architecture and learning algorithm perform competitively on widely used CL benchmarks while yielding superior performance on the more challenging benchmarks we introduce in this work.
Progressive Memory Banks for Incremental Domain Adaptation by Nabiha Asghar, Lili Mou, Kira A Selby, Kevin D Pantasdo, Pascal Poupart and Xin Jiang. International Conference on Learning Representations, 2019. [nlp] [rnn]
@inproceedings{asghar2019,
author = {Asghar, Nabiha and Mou, Lili and Selby, Kira A and Pantasdo, Kevin D and Poupart, Pascal and Jiang, Xin},
booktitle = {International Conference on Learning Representations},
keywords = {[nlp],[rnn]},
note = {The authors leverage a Recurrent Neural Network with an explicit memory (memory banks) which grows when new computational capabilities are needed. Attention mechanisms are exploited in order to focus on specific component of previous memories.},
title = {Progressive Memory Banks for Incremental Domain Adaptation},
url = {https://openreview.net/forum?id=BkepbpNFwr},
year = {2019}
}
This paper addresses the problem of incremental domain adaptation (IDA) in natural language processing (NLP). We assume each domain comes one after another, and that we could only access data in...
Autonomous Deep Learning: Continual Learning Approach for Dynamic Environments by Andri Ashfahani and Mahardhika Pratama. Proceedings of the 2019 SIAM International Conference on Data Mining, 666–674, 2019. [mnist]
@inproceedings{ashfahani2019,
address = {Philadelphia, PA},
annotation = {_eprint: 1810.07348},
author = {Ashfahani, Andri and Pratama, Mahardhika},
booktitle = {Proceedings of the 2019 SIAM International Conference on Data Mining},
doi = {10.1137/1.9781611975673.75},
isbn = {978-1-61197-567-3},
keywords = {[mnist]},
pages = {666--674},
publisher = {Society for Industrial and Applied Mathematics},
title = {Autonomous Deep Learning: Continual Learning Approach for Dynamic Environments},
url = {https://epubs.siam.org/doi/10.1137/1.9781611975673.75},
year = {2019}
}
The feasibility of deep neural networks (DNNs) to address data stream problems still requires intensive study because of the static and offline nature of conventional deep learning approaches. A deep continual learning algorithm, namely autonomous deep learning (ADL), is proposed in this paper. Unlike traditional deep learning methods, ADL features a flexible structure where its network structure can be constructed from scratch with the absence of initial network structure via the self-constructing network structure. ADL specifically addresses catastrophic forgetting by having a different-depth structure which is capable of achieving a trade-off between plasticity and stability. Network significance (NS) formula is proposed to drive the hidden nodes growing and pruning mechanism. Drift detection scenario (DDS) is put forward to signal distributional changes in data streams which induce the creation of a new hidden layer. Maximum information compression index (MICI) method plays an important role as a complexity reduction module eliminating redundant layers. The efficacy of ADL is numerically validated under the prequential test-then-train procedure in lifelong environments using nine popular data stream problems. The numerical results demonstrate that ADL consistently outperforms recent continual learning methods while characterizing the automatic construction of network structures.
Compacting, Picking and Growing for Unforgetting Continual Learning by Steven C Y Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan and Chu-Song Chen. NeurIPS, 13669–13679, 2019. [cifar] [imagenet]
@inproceedings{hung2019,
author = {Hung, Steven C Y and Tu, Cheng-Hao and Wu, Cheng-En and Chen, Chien-Hung and Chan, Yi-Ming and Chen, Chu-Song},
booktitle = {NeurIPS},
keywords = {[cifar],[imagenet]},
pages = {13669--13679},
title = {Compacting, Picking and Growing for Unforgetting Continual Learning},
url = {http://papers.nips.cc/paper/9518-compacting-picking-and-growing-for-unforgetting-continual-learning.pdf},
year = {2019}
}
Continual lifelong learning is essential to many applications. In this paper, we propose a simple but effective approach to continual deep learning. Our approach leverages the principles of deep model compression, critical weights selection, and progressive networks expansion. By enforcing their integration in an iterative manner, we introduce an incremental learning method that is scalable to the number of sequential tasks in a continual learning process. Our approach is easy to implement and owns several favorable characteristics. First, it can avoid forgetting (i.e., learn new tasks while remembering all previous tasks). Second, it allows model expansion but can maintain the model compactness when handling sequential tasks. Besides, through our compaction and selection/expansion mechanism, we show that the knowledge accumulated through learning previous tasks is helpful to build a better model for the new tasks compared to training the models independently with tasks. Experimental results show that our approach can incrementally learn a deep model tackling multiple tasks without forgetting, while the model compactness is maintained with the performance more satisfiable than individual task training.
Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting by Xilai Li, Yingbo Zhou, Tianfu Wu, Richard Socher and Caiming Xiong. arXiv, 2019. [cifar] [mnist]
@article{li2019a,
annotation = {_eprint: 1904.00310},
author = {Li, Xilai and Zhou, Yingbo and Wu, Tianfu and Socher, Richard and Xiong, Caiming},
journal = {arXiv},
keywords = {[cifar],[mnist]},
title = {Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting},
url = {https://arxiv.org/pdf/1904.00310.pdf},
year = {2019}
}
Addressing catastrophic forgetting is one of the key challenges in continual learning, where machine learning systems are trained with sequential or streaming tasks. Despite recent remarkable progress in state-of-the-art deep learning, deep neural networks (DNNs) are still plagued with the catastrophic forgetting problem. This paper presents a conceptually simple yet general and effective framework for handling catastrophic forgetting in continual learning with DNNs. The proposed method consists of two components: a neural structure optimization component and a parameter learning and/or fine-tuning component. By separating the explicit neural structure learning and the parameter estimation, not only is the proposed method capable of evolving neural structures in an intuitively meaningful way, but it also shows strong capabilities of alleviating catastrophic forgetting in experiments. Furthermore, the proposed method outperforms all other baselines on the permuted MNIST dataset, the split CIFAR100 dataset and the Visual Domain Decathlon dataset in the continual learning setting.
Towards AutoML in the Presence of Drift: First Results by Jorge G. Madrid, Hugo Jair Escalante, Eduardo F. Morales, Wei-Wei Tu, Yang Yu, Lisheng Sun-Hosoya, Isabelle Guyon and Michele Sebag. arXiv, 2019.
@article{madrid2019,
annotation = {_eprint: 1907.10772},
author = {Madrid, Jorge G. and Escalante, Hugo Jair and Morales, Eduardo F. and Tu, Wei-Wei and Yu, Yang and Sun-Hosoya, Lisheng and Guyon, Isabelle and Sebag, Michele},
journal = {arXiv},
title = {Towards AutoML in the Presence of Drift: First Results},
url = {http://arxiv.org/abs/1907.10772},
year = {2019}
}
Research progress in AutoML has led to state-of-the-art solutions that can cope quite well with supervised learning tasks, e.g., classification with AutoSklearn. However, so far these systems do not take into account the changing nature of evolving data over time (i.e., they still assume i.i.d. data), even though this sort of domain is increasingly available in real applications (e.g., spam filtering, user preferences, etc.). We describe a first attempt to develop an AutoML solution for scenarios in which data distribution changes relatively slowly over time and in which the problem is approached in a lifelong learning setting. We extend Auto-Sklearn with sound and intuitive mechanisms that allow it to cope with this sort of problem. The extended Auto-Sklearn is combined with concept drift detection techniques that allow it to automatically determine when the initial models have to be adapted. We report experimental results on benchmark data from AutoML competitions that adhere to this scenario. Results demonstrate the effectiveness of the proposed methodology.
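A toy example of the kind of trigger such a system needs, a sliding-window accuracy-drop detector that signals when adaptation should be launched; it is a generic stand-in, not one of the concept drift detectors actually wired into the extended Auto-Sklearn.

```python
# Toy sliding-window drift detector: signal drift (and hence re-fitting) when
# recent accuracy drops well below the reference window.
from collections import deque

class AccuracyDriftDetector:
    def __init__(self, window=50, tolerance=0.15):
        self.reference, self.recent = deque(maxlen=window), deque(maxlen=window)
        self.tolerance = tolerance

    def update(self, correct: bool) -> bool:
        """Record one prediction outcome; return True if drift is signalled."""
        if len(self.reference) < self.reference.maxlen:
            self.reference.append(correct)
            return False
        self.recent.append(correct)
        if len(self.recent) < self.recent.maxlen:
            return False
        ref_acc = sum(self.reference) / len(self.reference)
        rec_acc = sum(self.recent) / len(self.recent)
        return ref_acc - rec_acc > self.tolerance

detector = AccuracyDriftDetector()
for outcome in [True] * 60 + [False] * 60:        # simulated stream of outcomes
    if detector.update(outcome):
        print("drift detected: trigger model adaptation")
        break
```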
Continual Unsupervised Representation Learning by Dushyant Rao, Francesco Visin, Andrei A Rusu, Yee Whye Teh, Razvan Pascanu and Raia Hadsell. NeurIPS, 2019. [mnist] [omniglot]
@inproceedings{rao2019,
author = {Rao, Dushyant and Visin, Francesco and Rusu, Andrei A and Teh, Yee Whye and Pascanu, Razvan and Hadsell, Raia},
booktitle = {NeurIPS},
keywords = {[mnist],[omniglot]},
title = {Continual Unsupervised Representation Learning},
url = {https://papers.nips.cc/paper/8981-continual-unsupervised-representation-learning.pdf},
year = {2019}
}
Continual learning aims to improve the ability of modern learning systems to deal with non-stationary distributions, typically by attempting to learn a series of tasks sequentially. Prior art in the field has largely considered supervised or reinforcement learning tasks, and often assumes full knowledge of task labels and boundaries. In this work, we propose an approach (CURL) to tackle a more general problem that we will refer to as unsupervised continual learning. The focus is on learning representations without any knowledge about task identity, and we explore scenarios when there are abrupt changes between tasks, smooth transitions from one task to another, or even when the data is shuffled. The proposed approach performs task inference directly within the model, is able to dynamically expand to capture new concepts over its lifetime, and incorporates additional rehearsal-based techniques to deal with catastrophic forgetting. We demonstrate the efficacy of CURL in an unsupervised learning setting with MNIST and Omniglot, where the lack of labels ensures no information is leaked about the task. Further, we demonstrate strong performance compared to prior art in an i.i.d setting, or when adapting the technique to supervised tasks such as incremental class learning.
A Progressive Model to Enable Continual Learning for Semantic Slot Filling by Yilin Shen, Xiangyu Zeng and Hongxia Jin. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 1279–1284, 2019. [nlp]
@inproceedings{shen2019,
author = {Shen, Yilin and Zeng, Xiangyu and Jin, Hongxia},
booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing},
keywords = {[nlp]},
pages = {1279--1284},
publisher = {Association for Computational Linguistics},
title = {A Progressive Model to Enable Continual Learning for Semantic Slot Filling},
url = {https://www.aclweb.org/anthology/D19-1126.pdf},
year = {2019}
}
Semantic slot filling is one of the major tasks in spoken language understanding (SLU). After a slot filling model is trained on pre-collected data, it is crucial to continually improve the model after deployment to learn users' new expressions. As the data amount grows, it becomes infeasible either to store such huge data and repeatedly retrain the model on all of it, or to fine-tune the model only on new data without forgetting old expressions. In this paper, we introduce a novel progressive slot filling model, ProgModel. ProgModel consists of a novel context gate that transfers previously learned knowledge to a small expanded component, while enabling this new component to be trained quickly on new data. As such, ProgModel learns new knowledge using only new data at each step while preserving the previously learned expressions. Our experiments show that ProgModel needs much less training time and a smaller model size to outperform various model fine-tuning competitors by up to 4.24% and 3.03% on two benchmark datasets.
N.A.
2019Adaptive Compression-Based Lifelong Learning by Shivangi Srivastava, Maxim Berman, Matthew B Blaschko and Devis Tuia. BMVC, 2019. [imagenet] [sparsity]
@inproceedings{srivastava2019,
annotation = {_eprint: 1907.09695},
author = {Srivastava, Shivangi and Berman, Maxim and Blaschko, Matthew B and Tuia, Devis},
booktitle = {BMVC},
keywords = {[imagenet],[sparsity]},
title = {Adaptive Compression-Based Lifelong Learning},
url = {http://arxiv.org/abs/1907.09695},
year = {2019}
}
The problem of a deep learning model losing performance on a previously learned task when fine-tuned to a new one is a phenomenon known as Catastrophic forgetting. There are two major ways to mitigate this problem: either preserving activations of the initial network during training with a new task; or restricting the new network activations to remain close to the initial ones. The latter approach falls under the denomination of lifelong learning, where the model is updated in a way that it performs well on both old and new tasks, without having access to the old task's training samples anymore. Recently, approaches like pruning networks for freeing network capacity during sequential learning of tasks have been gaining in popularity. Such approaches allow learning small networks while making redundant parameters available for the next tasks. The common problem encountered with these approaches is that the pruning percentage is hard-coded, irrespective of the number of samples, of the complexity of the learning task and of the number of classes in the dataset. We propose a method based on Bayesian optimization to perform adaptive compression/pruning of the network and show its effectiveness in lifelong learning. Our method learns to perform heavy pruning for small and/or simple datasets while using milder compression rates for large and/or complex data. Experiments on classification and semantic segmentation demonstrate the applicability of learning network compression, where we are able to effectively preserve performances along sequences of tasks of varying complexity.
N.A.
2019Frosting Weights for Better Continual Training by Xiaofeng Zhu, Feng Liu, Goce Trajcevski and Dingding Wang. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), 506–510, 2019. [cifar] [mnist]
@inproceedings{zhu2019,
annotation = {_eprint: 2001.01829},
author = {Zhu, Xiaofeng and Liu, Feng and Trajcevski, Goce and Wang, Dingding},
booktitle = {2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)},
doi = {10.1109/ICMLA.2019.00094},
isbn = {978-1-72814-550-1},
keywords = {[cifar],[mnist]},
number = {1},
pages = {506--510},
publisher = {IEEE},
title = {Frosting Weights for Better Continual Training},
url = {https://ieeexplore.ieee.org/document/8999083/},
year = {2019}
}
Training a neural network model can be a lifelong learning process and is a computationally intensive one. A severe adverse effect that may occur in deep neural network models is that they can suffer from catastrophic forgetting during retraining on new data. To avoid such disruptions in continual learning, one appealing property is the additive nature of ensemble models. In this paper, we propose two generic ensemble approaches, gradient boosting and meta-learning, to solve the catastrophic forgetting problem in tuning pre-trained neural network models.
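As a rough, hedged illustration of the additive-ensemble idea described in this abstract (not the authors' implementation), the sketch below keeps a frozen base network and trains a small residual "delta" model on new data, summing their logits at prediction time; module names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class AdditiveEnsemble(nn.Module):
    """Frozen base model plus a trainable residual model; logits are summed.

    A minimal sketch of the boosting-style ensemble idea: the base network is
    kept intact (so its behaviour is not forgotten), while a lightweight
    "delta" network absorbs the new data.
    """
    def __init__(self, base: nn.Module, num_features: int, num_classes: int):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # protect old knowledge
            p.requires_grad_(False)
        self.delta = nn.Linear(num_features, num_classes)  # hypothetical small corrector

    def forward(self, x):
        with torch.no_grad():
            base_logits = self.base(x)
        return base_logits + self.delta(x)

# Usage sketch: only self.delta is updated when retraining on new data.
base = nn.Linear(32, 10)                      # stand-in for a pre-trained model
model = AdditiveEnsemble(base, num_features=32, num_classes=10)
opt = torch.optim.SGD(model.delta.parameters(), lr=0.1)
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
opt.step()
```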
N.A.
2019Dynamic Few-Shot Visual Learning Without Forgetting by Spyros Gidaris and Nikos Komodakis. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 4367–4375, 2018. [imagenet] [vision]
@inproceedings{gidaris2018,
annotation = {_eprint: 1804.09458},
author = {Gidaris, Spyros and Komodakis, Nikos},
booktitle = {Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition},
doi = {10.1109/CVPR.2018.00459},
isbn = {978-1-5386-6420-9},
issn = {10636919},
keywords = {[imagenet],[vision]},
pages = {4367--4375},
title = {Dynamic Few-Shot Visual Learning Without Forgetting},
url = {http://openaccess.thecvf.com/content_cvpr_2018/html/Gidaris_Dynamic_Few-Shot_Visual_CVPR_2018_paper.html},
year = {2018}
}
The human visual system has the remarkable ability to effortlessly learn novel concepts from only a few examples. Mimicking the same behavior in machine learning vision systems is an interesting and very challenging research problem with many practical advantages for real-world vision applications. In this context, the goal of our work is to devise a few-shot visual learning system that, during test time, can efficiently learn novel categories from only a few training examples while not forgetting the initial categories on which it was trained (here called base categories). To achieve that goal we propose (a) to extend an object recognition system with an attention-based few-shot classification weight generator, and (b) to redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors. The latter, apart from unifying the recognition of both novel and base categories, also leads to feature representations that generalize better on 'unseen' categories. We extensively evaluate our approach on Mini-ImageNet, where we manage to improve the prior state-of-the-art on few-shot recognition (i.e., we achieve 56.20% and 73.00% on the 1-shot and 5-shot settings respectively) while at the same time not sacrificing any accuracy on the base categories, a characteristic that most prior approaches lack. Finally, we apply our approach to the recently introduced few-shot benchmark of Bharath and Girshick [4], where we also achieve state-of-the-art results.
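A minimal sketch of the cosine-similarity classifier described in the abstract (illustrative only; the attention-based weight generator is omitted and the learnable scale parameter is an assumption): novel-class weights are simply the averaged, normalized feature vectors of the few support examples.

```python
import torch
import torch.nn.functional as F

class CosineClassifier(torch.nn.Module):
    """Classify by scaled cosine similarity between features and class weight vectors."""
    def __init__(self, feat_dim: int, num_base_classes: int, scale: float = 10.0):
        super().__init__()
        self.weights = torch.nn.Parameter(torch.randn(num_base_classes, feat_dim))
        self.scale = torch.nn.Parameter(torch.tensor(scale))  # temperature (assumed learnable)

    def forward(self, feats):
        feats = F.normalize(feats, dim=1)
        w = F.normalize(self.weights, dim=1)
        return self.scale * feats @ w.t()          # logits for all classes seen so far

    @torch.no_grad()
    def add_novel_class(self, support_feats):
        """Append a weight vector for a novel class: the mean of its support features."""
        proto = F.normalize(support_feats.mean(dim=0, keepdim=True), dim=1)
        self.weights = torch.nn.Parameter(torch.cat([self.weights.data, proto], dim=0))

clf = CosineClassifier(feat_dim=64, num_base_classes=5)
clf.add_novel_class(torch.randn(3, 64))            # 3-shot novel class -> 6 classes total
print(clf(torch.randn(2, 64)).shape)               # torch.Size([2, 6])
```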
N.A.
2018HOUDINI: Lifelong Learning as Program Synthesis by Lazar Valkov, Dipak Chaudhari, Akash Srivastava, Charles Sutton and Swarat Chaudhuri. NeurIPS, 8687–8698, 2018.
@inproceedings{valkov2018,
author = {Valkov, Lazar and Chaudhari, Dipak and Srivastava, Akash and Sutton, Charles and Chaudhuri, Swarat},
booktitle = {NeurIPS},
pages = {8687--8698},
title = {HOUDINI: Lifelong Learning as Program Synthesis},
url = {http://papers.nips.cc/paper/8086-houdini-lifelong-learning-as-program-synthesis.pdf},
year = {2018}
}
We present a neurosymbolic framework for the lifelong learning of algorithmic tasks that mix perception and procedural reasoning. Reusing high-level concepts across domains and learning complex procedures are key challenges in lifelong learning. We show that a program synthesis approach that combines gradient descent with combinatorial search over programs can be a more effective response to these challenges than purely neural methods. Our framework, called HOUDINI, represents neural networks as strongly typed, differentiable functional programs that use symbolic higher-order combinators to compose a library of neural functions. Our learning algorithm consists of: (1) a symbolic program synthesizer that performs a type-directed search over parameterized programs, and decides on the library functions to reuse, and the architectures to combine them, while learning a sequence of tasks; and (2) a neural module that trains these programs using stochastic gradient descent. We evaluate HOUDINI on three benchmarks that combine perception with the algorithmic tasks of counting, summing, and shortest-path computation. Our experiments show that HOUDINI transfers high-level concepts more effectively than traditional transfer learning and progressive neural networks, and that the typed representation of networks significantly accelerates the search.
N.A.
2018Reinforced Continual Learning by Ju Xu and Zhanxing Zhu. Advances in Neural Information Processing Systems, 899–908, 2018. [cifar] [mnist]
@inproceedings{xu2018,
author = {Xu, Ju and Zhu, Zhanxing},
booktitle = {Advances in Neural Information Processing Systems},
keywords = {[cifar],[mnist]},
pages = {899--908},
title = {Reinforced Continual Learning},
url = {http://papers.nips.cc/paper/7369-reinforced-continual-learning},
year = {2018}
}
Most artificial intelligence models are limited in their ability to solve new tasks faster, without forgetting previously acquired knowledge. The recently emerging paradigm of continual learning aims to solve this issue, in which the model learns various tasks in a sequential fashion. In this work, a novel approach for continual learning is proposed, which searches for the best neural architecture for each coming task via sophisticatedly designed reinforcement learning strategies. We name it Reinforced Continual Learning. Our method not only performs well at preventing catastrophic forgetting but also fits new tasks well. The experiments on sequential classification tasks for variants of the MNIST and CIFAR-100 datasets demonstrate that the proposed approach outperforms existing continual learning alternatives for deep networks.
N.A.
2018Lifelong Learning With Dynamically Expandable Networks by Jaehong Yoon, Eunho Yang, Jeongtae Lee and Sung Ju Hwang. ICLR, 11, 2018. [cifar] [mnist] [sparsity]
@inproceedings{yoon2018,
author = {Yoon, Jaehong and Yang, Eunho and Lee, Jeongtae and Hwang, Sung Ju},
booktitle = {ICLR},
keywords = {[cifar],[mnist],[sparsity],disadvantages,lifelong learning,modular,progressive},
language = {en},
note = {The authors propose a method to evaluate the importance of each neuron in the network through the use of sparse connections. The network is then expanded based on the neuron importance for each task.},
pages = {11},
title = {Lifelong Learning With Dynamically Expandable Networks},
url = {https://arxiv.org/abs/1708.01547},
year = {2018}
}
We propose a novel deep network architecture for lifelong learning, which we refer to as Dynamically Expandable Network (DEN), that can dynamically decide its network capacity as it trains on a sequence of tasks, to learn a compact overlapping knowledge-sharing structure among tasks. DEN is efficiently trained in an online manner by performing selective retraining, dynamically expands network capacity upon arrival of each task with only the necessary number of units, and effectively prevents semantic drift by splitting/duplicating units and timestamping them. We validate DEN on multiple public datasets under lifelong learning scenarios, on which it not only significantly outperforms existing lifelong learning methods for deep networks, but also achieves the same level of performance as the batch counterparts with a substantially smaller number of parameters. Further, the network fine-tuned on all tasks achieved significantly better performance than the batch models, which shows that it can be used to estimate the optimal network structure even when all tasks are available from the start.
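The sketch below illustrates only the capacity-expansion ingredient discussed above, in a much-simplified form (it is not the full DEN procedure: selective retraining and unit splitting/timestamping are omitted, and all sizes are hypothetical): a hidden layer is widened for a new task while previously trained units stay frozen via a gradient mask.

```python
import torch
import torch.nn as nn

def expand_hidden_layer(layer: nn.Linear, extra_units: int) -> nn.Linear:
    """Return a wider copy of `layer`; old weights are copied, new rows start small."""
    new = nn.Linear(layer.in_features, layer.out_features + extra_units)
    with torch.no_grad():
        new.weight[: layer.out_features] = layer.weight
        new.bias[: layer.out_features] = layer.bias
        new.weight[layer.out_features:].normal_(0, 0.01)
        new.bias[layer.out_features:].zero_()
    # Freeze the old units by zeroing their gradients during later training.
    old = layer.out_features
    new.weight.register_hook(lambda g: torch.cat([torch.zeros_like(g[:old]), g[old:]]))
    new.bias.register_hook(lambda g: torch.cat([torch.zeros_like(g[:old]), g[old:]]))
    return new

hidden = nn.Linear(16, 8)
hidden = expand_hidden_layer(hidden, extra_units=4)   # 8 -> 12 units for the new task
print(hidden.weight.shape)                            # torch.Size([12, 16])
```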
N.A.
2018Expert Gate: Lifelong Learning with a Network of Experts by Rahaf Aljundi, Punarjay Chakravarty and Tinne Tuytelaars. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. [vision]
@inproceedings{aljundi2017,
annotation = {_eprint: 1611.06194},
author = {Aljundi, Rahaf and Chakravarty, Punarjay and Tuytelaars, Tinne},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
keywords = {[vision]},
title = {Expert Gate: Lifelong Learning with a Network of Experts},
url = {http://arxiv.org/abs/1611.06194},
year = {2017}
}
In this paper we introduce a model of lifelong learning, based on a Network of Experts. New tasks / experts are learned and added to the model sequentially, building on what was learned before. To ensure scalability of this process, data from previous tasks cannot be stored and hence is not available when learning a new task. A critical issue in such context, not addressed in the literature so far, relates to the decision of which expert to deploy at test time. We introduce a gating autoencoder that learns a representation for the task at hand, and is used at test time to automatically forward the test sample to the relevant expert. This has the added advantage of being memory efficient as only one expert network has to be loaded into memory at any given time. Further, the autoencoders inherently capture the relatedness of one task to another, based on which the most relevant prior model to be used for training a new expert with fine-tuning or learning-without-forgetting can be selected. We evaluate our system on image classification and video prediction problems.
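A toy sketch of the gating idea described above (assumptions: tiny fully connected autoencoders operating on pre-extracted features; the relatedness-based selection of the prior model for training is omitted): one undercomplete autoencoder is trained per task, and at test time the sample is routed to the expert whose autoencoder reconstructs it best.

```python
import torch
import torch.nn as nn

def make_gate_autoencoder(feat_dim: int, code_dim: int = 16) -> nn.Module:
    """One small undercomplete autoencoder per task (sizes are hypothetical)."""
    return nn.Sequential(nn.Linear(feat_dim, code_dim), nn.ReLU(),
                         nn.Linear(code_dim, feat_dim))

@torch.no_grad()
def select_expert(x: torch.Tensor, gates: list) -> int:
    """Route a feature vector to the expert with the lowest reconstruction error."""
    errors = [torch.mean((g(x) - x) ** 2).item() for g in gates]
    return min(range(len(errors)), key=errors.__getitem__)

feat_dim = 64
gates = [make_gate_autoencoder(feat_dim) for _ in range(3)]   # one per task/expert
x = torch.randn(feat_dim)
expert_id = select_expert(x, gates)   # index of the expert network to load and run
print(expert_id)
```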
N.A.
2017Neurogenesis Deep Learning by Timothy John Draelos, Nadine E Miner, Christopher Lamb, Jonathan A Cox, Craig Michael Vineyard, Kristofor David Carlson, William Mark Severa, Conrad D James and James Bradley Aimone. IJCNN, 2017. [mnist]
@inproceedings{draelos2017,
author = {Draelos, Timothy John and Miner, Nadine E and Lamb, Christopher and Cox, Jonathan A and Vineyard, Craig Michael and Carlson, Kristofor David and Severa, William Mark and James, Conrad D and Aimone, James Bradley},
booktitle = {IJCNN},
keywords = {[mnist],autoencoder,autoencoders,neurogenesis,reconstruction},
language = {English},
note = {The neurogenesis algorithm selectively expands the original multi-layer autoencoder at the neuron level depending on its reconstruction performance measured at each layer. The model is capable of maintaining plasticity while mitigating forgetting through replay of old samples.},
title = {Neurogenesis Deep Learning},
url = {https://www.osti.gov/biblio/1424868},
year = {2017}
}
Neural machine learning methods, such as deep neural networks (DNN), have achieved remarkable success in a number of complex data processing tasks. These methods have arguably had their strongest impact on tasks such as image and audio processing - data processing domains in which humans have long held clear advantages over conventional algorithms. In contrast to biological neural systems, which are capable of learning continuously, deep artificial networks have a limited ability for incorporating new information in an already trained network. As a result, methods for continuous learning are potentially highly impactful in enabling the application of deep networks to dynamic data sets. Here, inspired by the process of adult neurogenesis in the hippocampus, we explore the potential for adding new neurons to deep layers of artificial neural networks in order to facilitate their acquisition of novel information while preserving previously trained data representations. Our results on the MNIST handwritten digit dataset and the NIST SD 19 dataset, which includes lower and upper case letters and digits, demonstrate that neurogenesis is well suited for addressing the stability-plasticity dilemma that has long challenged adaptive machine learning algorithms.
N.A.
2017Net2Net: Accelerating Learning via Knowledge Transfer by Tianqi Chen, Ian Goodfellow and Jonathon Shlens. ICLR, 2016.
@inproceedings{chen2016,
archiveprefix = {arXiv},
author = {Chen, Tianqi and Goodfellow, Ian and Shlens, Jonathon},
booktitle = {ICLR},
eprint = {1511.05641},
eprinttype = {arxiv},
keywords = {Computer Science - Machine Learning},
note = {Comment: ICLR 2016 submission},
shorttitle = {Net2Net},
title = {Net2Net: Accelerating Learning via Knowledge Transfer},
url = {http://arxiv.org/abs/1511.05641},
urldate = {2021-01-07},
year = {2016}
}
We introduce techniques for rapidly transferring the information stored in one neural net into another neural net. The main purpose is to accelerate the training of a significantly larger neural net. During real-world workflows, one often trains very many different neural networks during the experimentation and design process. This is a wasteful process in which each new model is trained from scratch. Our Net2Net technique accelerates the experimentation process by instantaneously transferring the knowledge from a previous network to each new deeper or wider network. Our techniques are based on the concept of function-preserving transformations between neural network specifications. This differs from previous approaches to pre-training that altered the function represented by a neural net when adding layers to it. Using our knowledge transfer mechanism to add depth to Inception modules, we demonstrate a new state of the art accuracy rating on the ImageNet dataset.
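A compact sketch of a function-preserving "widen" transformation on a pair of fully connected layers, in the spirit of the abstract above (replicated units are chosen at random and outgoing weights of a replicated unit are split evenly; this illustrates the idea rather than reproducing the paper's exact procedure).

```python
import numpy as np

def net2wider(w1, b1, w2, new_width, rng=np.random.default_rng(0)):
    """Widen a hidden layer (h = f(x @ w1 + b1); y = h @ w2) without changing the function.

    Extra hidden units replicate randomly chosen existing ones, and the outgoing
    weights of each replicated unit are divided by its replication count.
    """
    old_width = w1.shape[1]
    assert new_width >= old_width
    # g maps every new unit index to the old unit it copies.
    g = np.concatenate([np.arange(old_width),
                        rng.integers(0, old_width, new_width - old_width)])
    counts = np.bincount(g, minlength=old_width)        # replication count per old unit
    w1_new = w1[:, g]                                   # copy incoming weights and biases
    b1_new = b1[g]
    w2_new = w2[g, :] / counts[g][:, None]              # rescale outgoing weights
    return w1_new, b1_new, w2_new

# Quick check that the transformation preserves the function (with ReLU hidden units).
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 5))
w1, b1, w2 = rng.normal(size=(5, 8)), rng.normal(size=8), rng.normal(size=(8, 3))
y_old = np.maximum(x @ w1 + b1, 0) @ w2
w1n, b1n, w2n = net2wider(w1, b1, w2, new_width=12)
y_new = np.maximum(x @ w1n + b1n, 0) @ w2n
print(np.allclose(y_old, y_new))                        # True
```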
N.A.
2016Continual Learning through Evolvable Neural Turing Machines by Benno Luders, Mikkel Schlager and Sebastian Risi. NIPS 2016 Workshop on Continual Learning and Deep Networks, 2016.
@inproceedings{luders2016,
author = {Luders, Benno and Schlager, Mikkel and Risi, Sebastian},
booktitle = {NIPS 2016 Workshop on Continual Learning and Deep Networks},
title = {Continual Learning through Evolvable Neural Turing Machines},
url = {https://core.ac.uk/reader/84859350},
year = {2016}
}
Continual learning, i.e. the ability to sequentially learn tasks without catastrophic forgetting of previously learned ones, is an important open challenge in machine learning. In this paper we take a step in this direction by showing that the recently proposed Evolving Neural Turing Machine (ENTM) approach is able to perform one-shot learning in a reinforcement learning task without catastrophic forgetting of previously stored associations.
N.A.
2016Progressive Neural Networks by Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu and Raia Hadsell. arXiv, 2016. [mnist]
@article{rusu2016,
author = {Rusu, Andrei A and Rabinowitz, Neil C and Desjardins, Guillaume and Soyer, Hubert and Kirkpatrick, James and Kavukcuoglu, Koray and Pascanu, Razvan and Hadsell, Raia},
journal = {arXiv},
keywords = {[mnist],Computer Science - Machine Learning,lifelong learning,modular,progressive},
language = {en},
note = {The authors rely on a separate feedforward network (column) for each task the model is trained on. Each column is connected through adaptive connections to all the previous ones. The weights of previous columns are frozen once trained. At inference time, given a known task label, the network chooses the appropriate column to produce the output, thus preventing forgetting by design.},
title = {Progressive Neural Networks},
url = {http://arxiv.org/abs/1606.04671},
year = {2016}
}
Learning to solve complex sequences of tasks, while both leveraging transfer and avoiding catastrophic forgetting, remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architecture extensively on a wide variety of reinforcement learning tasks (Atari and 3D maze games), and show that it outperforms common baselines based on pretraining and finetuning. Using a novel sensitivity measure, we demonstrate that transfer occurs at both low-level sensory and high-level control layers of the learned policy.
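A two-column sketch of the layout summarized above and in the BibTeX note (one hidden layer per column and a plain linear lateral adapter are simplifications; all sizes are hypothetical): column 1 is trained on task 1 and frozen, and column 2 additionally receives column 1's hidden activations through the adapter.

```python
import torch
import torch.nn as nn

class TwoColumnProgressiveNet(nn.Module):
    """Column 1 is trained on task 1 and frozen; column 2 adds lateral connections."""
    def __init__(self, in_dim=32, hid=64, out_dim=10):
        super().__init__()
        self.col1_hidden = nn.Linear(in_dim, hid)
        self.col1_out = nn.Linear(hid, out_dim)
        self.col2_hidden = nn.Linear(in_dim, hid)
        self.lateral = nn.Linear(hid, hid)      # adapter from column 1 to column 2
        self.col2_out = nn.Linear(hid, out_dim)

    def freeze_column1(self):
        for p in list(self.col1_hidden.parameters()) + list(self.col1_out.parameters()):
            p.requires_grad_(False)

    def forward(self, x, task_id: int):
        h1 = torch.relu(self.col1_hidden(x))
        if task_id == 1:
            return self.col1_out(h1)
        h2 = torch.relu(self.col2_hidden(x) + self.lateral(h1))  # lateral transfer
        return self.col2_out(h2)

net = TwoColumnProgressiveNet()
net.freeze_column1()                              # after finishing task 1
print(net(torch.randn(4, 32), task_id=2).shape)   # torch.Size([4, 10])
```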
N.A.
2016Knowledge Transfer in Deep Block-Modular Neural Networks by Alexander V. Terekhov, Guglielmo Montone and J. Kevin O’Regan. Conference on Biomimetic and Biohybrid Systems, 268–279, 2015. [vision]
@inproceedings{terekhov2015,
annotation = {_eprint: 1908.08017},
author = {Terekhov, Alexander V. and Montone, Guglielmo and O'Regan, J. Kevin},
booktitle = {Conference on Biomimetic and Biohybrid Systems},
doi = {10.1007/978-3-319-22979-9_27},
isbn = {978-3-319-22978-2},
issn = {16113349},
keywords = {[vision],Deep learning,Knowledge transfer,Modular,Neural networks},
pages = {268--279},
publisher = {Springer Verlag},
title = {Knowledge Transfer in Deep Block-Modular Neural Networks},
url = {http://lpp.psycho.univ-paris5.fr/feel},
volume = {9222},
year = {2015}
}
Although deep neural networks (DNNs) have demonstrated impressive results during the last decade, they remain highly specialized tools, which are trained – often from scratch – to solve each particular task. The human brain, in contrast, significantly re-uses existing capacities when learning to solve new tasks. In the current study we explore a block-modular architecture for DNNs, which allows parts of the existing network to be re-used to solve a new task without a decrease in performance when solving the original task. We show that networks with such architectures can outperform networks trained from scratch, or perform comparably, while having to learn nearly 10 times fewer weights than the networks trained from scratch.
N.A.
2015A Self-Organising Network That Grows When Required by Stephen Marsland, Jonathan Shapiro and Ulrich Nehmzow. Neural Networks, 1041–1058, 2002. [som]
@article{marsland2002,
author = {Marsland, Stephen and Shapiro, Jonathan and Nehmzow, Ulrich},
doi = {10.1016/S0893-6080(02)00078-3},
issn = {08936080},
journal = {Neural Networks},
keywords = {[som],Dimensionality reduction,Growing networks,Inspection,Mobile robotics,Novelty detection,Self-organisation,Topology preservation,Unsupervised learning},
number = {8-9},
pages = {1041--1058},
publisher = {Pergamon},
title = {A Self-Organising Network That Grows When Required},
url = {https://linkinghub.elsevier.com/retrieve/pii/S0893608002000783},
volume = {15},
year = {2002}
}
The ability to grow extra nodes is a potentially useful facility for a self-organising neural network. A network that can add nodes into its map space can approximate the input space more accurately, and often more parsimoniously, than a network with predefined structure and size, such as the Self-Organising Map. In addition, a growing network can deal with dynamic input distributions. Most of the growing networks that have been proposed in the literature add new nodes to support the node that has accumulated the highest error during previous iterations or to support topological structures. This usually means that new nodes are added only when the number of iterations is an integer multiple of some pre-defined constant, $\lambda$. This paper suggests a way in which the learning algorithm can add nodes whenever the network in its current state does not sufficiently match the input. In this way the network grows very quickly when new data is presented, but stops growing once the network has matched the data. This is particularly important when we consider dynamic data sets, where the distribution of inputs can change to a new regime after some time. We also demonstrate the preservation of neighbourhood relations in the data by the network. The new network is compared to an existing growing network, the Growing Neural Gas (GNG), on an artificial dataset, showing how the network deals with a change in input distribution after some time. Finally, the new network is applied to several novelty detection tasks and is compared with both the GNG and an unsupervised form of the Reduced Coulomb Energy network on a robotic inspection task, and with a Support Vector Machine on two benchmark novelty detection tasks.
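A bare-bones sketch of the "grow when required" criterion described above (topology/edge maintenance, habituation counters and the exact update constants are omitted or assumed): a new node is inserted whenever the best-matching node's activity falls below a threshold, otherwise the winner is nudged toward the input.

```python
import numpy as np

def gwr_step(nodes, x, activity_threshold=0.8, lr=0.1):
    """One simplified Grow-When-Required update on a list of node weight vectors."""
    best = min(range(len(nodes)), key=lambda i: np.linalg.norm(x - nodes[i]))
    activity = np.exp(-np.linalg.norm(x - nodes[best]))    # close match -> activity near 1
    if activity < activity_threshold:
        nodes.append((nodes[best] + x) / 2.0)              # network does not match: grow
    else:
        nodes[best] = nodes[best] + lr * (x - nodes[best])  # good match: adapt the winner
    return nodes

rng = np.random.default_rng(0)
nodes = [rng.normal(size=2) for _ in range(2)]             # start with two random nodes
for x in rng.normal(size=(200, 2)):
    nodes = gwr_step(nodes, x)
print(len(nodes))                                          # grows only where the data demands it
```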
N.A.
2002
Benchmarks¶
4 papers
In this section we list all the papers related to new benchmarks proposals for continual learning and related topics.
Defining Benchmarks for Continual Few-Shot Learning by Antreas Antoniou, Massimiliano Patacchiola, Mateusz Ochal and Amos Storkey. arXiv, 2020. [imagenet]
@article{antoniou2020,
annotation = {_eprint: 2004.11967},
author = {Antoniou, Antreas and Patacchiola, Massimiliano and Ochal, Mateusz and Storkey, Amos},
journal = {arXiv},
keywords = {[imagenet]},
title = {Defining Benchmarks for Continual Few-Shot Learning},
url = {http://arxiv.org/abs/2004.11967},
year = {2020}
}
Both few-shot and continual learning have seen substantial progress in the last years due to the introduction of proper benchmarks. That being said, the field has still to frame a suite of benchmarks for the highly desirable setting of continual few-shot learning, where the learner is presented a number of few-shot tasks, one after the other, and then asked to perform well on a validation set stemming from all previously seen tasks. Continual few-shot learning has a small computational footprint and is thus an excellent setting for efficient investigation and experimentation. In this paper we first define a theoretical framework for continual few-shot learning, taking into account recent literature, then we propose a range of flexible benchmarks that unify the evaluation criteria and allows exploring the problem from multiple perspectives. As part of the benchmark, we introduce a compact variant of ImageNet, called SlimageNet64, which retains all original 1000 classes but only contains 200 instances of each one (a total of 200K data-points) downscaled to 64 x 64 pixels. We provide baselines for the proposed benchmarks using a number of popular few-shot learning algorithms, as a result, exposing previously unknown strengths and weaknesses of those algorithms in continual and data-limited settings.
N.A.
2020Continual Reinforcement Learning in 3D Non-Stationary Environments by Vincenzo Lomonaco, Karan Desai, Eugenio Culurciello and Davide Maltoni. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 248–249, 2020.
@inproceedings{lomonaco2020,
author = {Lomonaco, Vincenzo and Desai, Karan and Culurciello, Eugenio and Maltoni, Davide},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
pages = {248--249},
title = {Continual Reinforcement Learning in 3D Non-Stationary Environments},
url = {https://openaccess.thecvf.com/content_CVPRW_2020/html/w15/Lomonaco_Continual_Reinforcement_Learning_in_3D_Non-Stationary_Environments_CVPRW_2020_paper.html},
year = {2020}
}
High-dimensional always-changing environments constitute a hard challenge for current reinforcement learning techniques. Artificial agents, nowadays, are often trained off-line in very static and controlled conditions in simulation, such that training observations can be thought of as sampled i.i.d. from the entire observation space. However, in real-world settings, the environment is often non-stationary and subject to unpredictable, frequent changes. In this paper we propose and openly release CRLMaze, a new benchmark for learning continually through reinforcement in a complex 3D non-stationary task based on ViZDoom and subject to several environmental changes. Then, we introduce an end-to-end model-free continual reinforcement learning strategy showing competitive results with respect to four different baselines and not requiring any access to additional supervised signals, previously encountered environmental conditions or observations.
N.A.
2020OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning by Qi She, Fan Feng, Xinyue Hao, Qihan Yang, Chuanlin Lan, Vincenzo Lomonaco, Xuesong Shi, Zhengwei Wang, Yao Guo, Yimin Zhang, Fei Qiao and Rosa H M Chan. arXiv, 1–8, 2019. [vision]
@article{she2019,
annotation = {_eprint: 1911.06487},
author = {She, Qi and Feng, Fan and Hao, Xinyue and Yang, Qihan and Lan, Chuanlin and Lomonaco, Vincenzo and Shi, Xuesong and Wang, Zhengwei and Guo, Yao and Zhang, Yimin and Qiao, Fei and Chan, Rosa H M},
journal = {arXiv},
keywords = {[vision]},
pages = {1--8},
title = {OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning},
url = {http://arxiv.org/abs/1911.06487},
year = {2019}
}
The recent breakthroughs in computer vision have benefited from the availability of large representative datasets (e.g. ImageNet and COCO) for training. Yet, robotic vision poses unique challenges for applying visual algorithms developed from these standard computer vision datasets due to their implicit assumption over non-varying distributions for a fixed set of tasks. Fully retraining models each time a new task becomes available is infeasible due to computational, storage and sometimes privacy issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. It is crucial for the robots to operate continuously under open-set and detrimental conditions with adaptive visual perceptual systems, where lifelong learning is a fundamental capability. However, very few datasets and benchmarks are available to evaluate and compare emerging techniques. To fill this gap, we provide a new lifelong robotic vision dataset ("OpenLORIS-Object") collected via RGB-D cameras. The dataset embeds the challenges faced by a robot in the real-life application and provides new benchmarks for validating lifelong object recognition algorithms. Moreover, we have provided a testbed of 9 state-of-the-art lifelong learning algorithms. Each of them involves 48 tasks with 4 evaluation metrics over the OpenLORIS-Object dataset. The results demonstrate that the object recognition task in the ever-changing difficulty environments is far from being solved and the bottlenecks are at the forward/backward transfer designs. Our dataset and benchmark are publicly available at https://lifelong-robotic-vision.github.io/dataset/object.
N.A.
2019CORe50: A New Dataset and Benchmark for Continuous Object Recognition by Vincenzo Lomonaco and Davide Maltoni. Proceedings of the 1st Annual Conference on Robot Learning, 17–26, 2017. [vision]
@inproceedings{lomonaco2017,
author = {Lomonaco, Vincenzo and Maltoni, Davide},
booktitle = {Proceedings of the 1st Annual Conference on Robot Learning},
editor = {Levine, Sergey and Vanhoucke, Vincent and Goldberg, Ken},
keywords = {[vision]},
pages = {17--26},
publisher = {PMLR},
series = {Proceedings of Machine Learning Research},
title = {CORe50: A New Dataset and Benchmark for Continuous Object Recognition},
url = {http://proceedings.mlr.press/v78/lomonaco17a.html},
volume = {78},
year = {2017}
}
Continuous/Lifelong learning of high-dimensional data streams is a challenging research problem. In fact, fully retraining models each time new data become available is infeasible, due to computational and storage issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. In the context of real-world object recognition applications (e.g., robotic vision), where continuous learning is crucial, very few datasets and benchmarks are available to evaluate and compare emerging techniques. In this work we propose a new dataset and benchmark CORe50, specifically designed for continuous object recognition, and introduce baseline approaches for different continuous learning scenarios.
N.A.
2017
Bioinspired Methods¶
22 papers
In this section we list all the papers related to bioinspired continual learning approaches.
Controlled Forgetting: Targeted Stimulation and Dopaminergic Plasticity Modulation for Unsupervised Lifelong Learning in Spiking Neural Networks by Jason M. Allred and Kaushik Roy. Frontiers in Neuroscience, 7, 2020. [spiking]
@article{allred2020,
author = {Allred, Jason M. and Roy, Kaushik},
doi = {10.3389/fnins.2020.00007},
issn = {1662-453X},
journal = {Frontiers in Neuroscience},
keywords = {[spiking],catastrophic forgetting,continual learning,controlled forgetting,dopaminergic learning,lifelong learning,Spike Timing Dependent Plasticity,Spiking Neural Networks,stability-plasticity dilemma},
pages = {7},
publisher = {Frontiers Media S.A.},
title = {Controlled Forgetting: Targeted Stimulation and Dopaminergic Plasticity Modulation for Unsupervised Lifelong Learning in Spiking Neural Networks},
url = {https://www.frontiersin.org/article/10.3389/fnins.2020.00007/full},
volume = {14},
year = {2020}
}
Stochastic gradient descent requires that training samples be drawn from a uniformly random distribution of the data. For a deployed system that must learn online from an uncontrolled and unknown environment, the ordering of input samples often fails to meet this criterion, making lifelong learning a difficult challenge. We exploit the locality of the unsupervised Spike Timing Dependent Plasticity (STDP) learning rule to target local representations in a Spiking Neural Network (SNN) to adapt to novel information while protecting essential information in the remainder of the SNN from catastrophic forgetting. In our Controlled Forgetting Networks (CFNs), novel information triggers stimulated firing and heterogeneously modulated plasticity, inspired by biological dopamine signals, to cause rapid and isolated adaptation in the synapses of neurons associated with outlier information. This targeting controls the forgetting process in a way that reduces the degradation of accuracy for older tasks while learning new tasks. Our experimental results on the MNIST dataset validate the capability of CFNs to learn successfully over time from an unknown, changing environment, achieving 95.24% accuracy, which we believe is the best unsupervised accuracy ever achieved by a fixed-size, single-layer SNN on a completely disjoint MNIST dataset.
N.A.
2020Cognitively-Inspired Model for Incremental Learning Using a Few Examples by A. Ayub and A. R. Wagner. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. [cifar] [cubs] [dual]
@inproceedings{ayub2020,
author = {Ayub, A. and Wagner, A. R.},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
keywords = {[cifar],[cubs],[dual],catastrophic forgetting,cognitively-inspired learning,continual learning},
title = {Cognitively-Inspired Model for Incremental Learning Using a Few Examples},
url = {https://openaccess.thecvf.com/content_CVPRW_2020/html/w15/Ayub_Cognitively-Inspired_Model_for_Incremental_Learning_Using_a_Few_Examples_CVPRW_2020_paper.html},
year = {2020}
}
Incremental learning attempts to develop a classifier which learns continuously from a stream of data segregated into different classes. Deep learning approaches suffer from catastrophic forgetting when learning classes incrementally, while most incremental learning approaches require a large amount of training data per class. We examine the problem of incremental learning using only a few training examples, referred to as Few-Shot Incremental Learning (FSIL). To solve this problem, we propose a novel approach inspired by the concept learning model of the hippocampus and the neocortex that represents each image class as centroids and does not suffer from catastrophic forgetting. We evaluate our approach on three class-incremental learning benchmarks: Caltech-101, CUBS-200-2011 and CIFAR-100 for incremental and few-shot incremental learning and show that our approach achieves state-of-the-art results in terms of classification accuracy over all learned classes.
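As a hedged illustration of the centroid idea in the abstract above (the hippocampus/neocortex-inspired details and centroid merging are not shown, and feature extraction is assumed to happen elsewhere): each class is summarized by the mean of its feature vectors, and prediction is nearest-centroid in feature space, so adding a class never disturbs the others.

```python
import numpy as np

class CentroidClassifier:
    """Incrementally store one centroid per class; classify by nearest centroid."""
    def __init__(self):
        self.centroids = {}                          # class label -> feature centroid

    def learn_class(self, label, feats):
        self.centroids[label] = feats.mean(axis=0)   # a handful of examples is enough

    def predict(self, feat):
        return min(self.centroids,
                   key=lambda c: np.linalg.norm(feat - self.centroids[c]))

rng = np.random.default_rng(0)
clf = CentroidClassifier()
clf.learn_class("cat", rng.normal(loc=0.0, size=(5, 16)))   # 5-shot class
clf.learn_class("dog", rng.normal(loc=3.0, size=(5, 16)))   # added later, nothing is forgotten
print(clf.predict(rng.normal(loc=3.0, size=16)))            # "dog" (with high probability)
```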
N.A.
2020Storing Encoded Episodes as Concepts for Continual Learning by Ali Ayub and Alan R. Wagner. arXiv, 2020. [generative] [imagenet] [mnist]
@article{ayub2020a,
annotation = {_eprint: 2007.06637},
author = {Ayub, Ali and Wagner, Alan R.},
journal = {arXiv},
keywords = {[generative],[imagenet],[mnist],catastrophic forgetting,continual learning},
title = {Storing Encoded Episodes as Concepts for Continual Learning},
url = {https://arxiv.org/abs/2007.06637 http://arxiv.org/abs/2007.06637},
year = {2020}
}
The two main challenges faced by continual learning approaches are catastrophic forgetting and memory limitations on the storage of data. To cope with these challenges, we propose a novel, cognitively-inspired approach which trains autoencoders with Neural Style Transfer to encode and store images. Reconstructed images from encoded episodes are replayed when training the classifier model on a new task to avoid catastrophic forgetting. The loss function for the reconstructed images is weighted to reduce its effect during classifier training to cope with image degradation. When the system runs out of memory the encoded episodes are converted into centroids and covariance matrices, which are used to generate pseudo-images during classifier training, keeping classifier performance stable with less memory. Our approach increases classification accuracy by 13-17% over state-of-the-art methods on benchmark datasets, while requiring 78% less storage space.
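A small sketch of the memory-reduction step mentioned at the end of the abstract (the Neural Style Transfer autoencoder and the classifier are omitted; operating directly on encoded feature vectors is an assumption): once raw episodes are dropped, each class keeps only a mean and covariance, from which pseudo-samples are drawn for rehearsal.

```python
import numpy as np

def summarize_class(encoded_feats):
    """Replace the stored episodes of one class by a Gaussian summary (mean, covariance)."""
    mean = encoded_feats.mean(axis=0)
    cov = np.cov(encoded_feats, rowvar=False)
    return mean, cov

def generate_pseudo_samples(mean, cov, n, rng=np.random.default_rng(0)):
    """Draw pseudo-feature-vectors to replay while training on a new task."""
    return rng.multivariate_normal(mean, cov, size=n)

rng = np.random.default_rng(1)
old_class_feats = rng.normal(size=(200, 8))       # encoded episodes of an old class
mean, cov = summarize_class(old_class_feats)      # keep 8 + 8x8 numbers instead of 200x8
replay_batch = generate_pseudo_samples(mean, cov, n=32)
print(replay_batch.shape)                          # (32, 8)
```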
N.A.
2020Spiking Neural Predictive Coding for Continual Learning from Data Streams by and Alexander Ororbia. arXiv, 2020. [spiking]
@article{ororbia2020,
author = {Ororbia, Alexander},
journal = {arXiv},
keywords = {[spiking],Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computing,Quantitative Biology - Neurons and Cognition},
note = {Comment: Revised version of manuscript – includes updated experimental results arXiv: 1908.08655},
title = {Spiking Neural Predictive Coding for Continual Learning from Data Streams},
url = {http://arxiv.org/abs/1908.08655},
year = {2020}
}
For energy-efficient computation in specialized neuromorphic hardware, we present the Spiking Neural Coding Network, an instantiation of a family of artificial neural models strongly motivated by the theory of predictive coding. The model, in essence, works by operating in a never-ending process of "guess-and-check", where neurons predict the activity values of one another and then immediately adjust their own activities to make better future predictions. The interactive, iterative nature of our neural system fits well into the continuous time formulation of data sensory stream prediction and, as we show, the model's structure yields a simple, local synaptic update rule, which could be used to complement or replace online spike-timing dependent plasticity. In this article, we experiment with an instantiation of our model that consists of leaky integrate-and-fire units. However, the general framework within which our model is situated can naturally incorporate more complex, formal neurons such as the Hodgkin-Huxley model. Our experimental results in pattern recognition demonstrate the potential of the proposed model when binary spike trains are the primary paradigm for inter-neuron communication. Notably, our model is competitive in terms of classification performance, can conduct online semi-supervised learning, naturally experiences less forgetting when learning from a sequence of tasks, and is more computationally economical and biologically-plausible than popular artificial neural networks.
N.A.
2020Brain-like Replay for Continual Learning with Artificial Neural Networks by Gido M. van de Ven, Hava T. Siegelmann and Andreas S. Tolias. International Conference on Learning Representations (Workshop on Bridging AI and Cognitive Science), 2020. [cifar]
@inproceedings{vandeven2020a,
author = {van de Ven, Gido M. and Siegelmann, Hava T. and Tolias, Andreas S.},
booktitle = {International Conference on Learning Representations (Workshop on Bridging AI and Cognitive Science)},
keywords = {[cifar]},
title = {Brain-like Replay for Continual Learning with Artificial Neural Networks},
url = {https://baicsworkshop.github.io/pdf/BAICS_8.pdf},
year = {2020}
}
Artificial neural networks suffer from catastrophic forgetting. Unlike humans, when these networks are trained on something new, they rapidly forget what was learned before. In the brain, a mechanism thought to be important for protecting memories is the replay of neuronal activity patterns representing those memories. In artificial neural networks, such memory replay has been implemented in the form of 'generative replay', which can successfully prevent catastrophic forgetting in a range of toy examples. Scaling up generative replay to problems with more complex inputs, however, turns out to be challenging. We propose a new, more brain-like variant of replay in which internal or hidden representations are replayed that are generated by the network's own, context-modulated feedback connections. In contrast to established continual learning methods, our method achieves acceptable performance on the challenging problem of class-incremental learning on natural images without relying on stored data.
N.A.
2020Selfless Sequential Learning by Rahaf Aljundi, Marcus Rohrbach and Tinne Tuytelaars. ICLR, 2019. [cifar] [mnist] [sparsity]
@inproceedings{aljundi2019c,
author = {Aljundi, Rahaf and Rohrbach, Marcus and Tuytelaars, Tinne},
booktitle = {ICLR},
keywords = {[cifar],[mnist],[sparsity]},
note = {The authors combine multiple penalizations to (1) induce sparse activations through lateral inhibitions between neurons and to (2) penalize changes in most important weights in order to prevent forgetting.},
title = {Selfless Sequential Learning},
url = {https://openreview.net/forum?id=Bkxbrn0cYX},
year = {2019}
}
Sequential learning, also called lifelong learning, studies the problem of learning tasks in a sequence with access restricted to only the data of the current task. In this paper we look at a...
N.A.
2019Backpropamine: Training Self-Modifying Neural Networks with Differentiable Neuromodulated Plasticity by Thomas Miconi, Aditya Rawal, Jeff Clune and Kenneth O Stanley. ICLR, 2019.
@inproceedings{miconi2019,
author = {Miconi, Thomas and Rawal, Aditya and Clune, Jeff and Stanley, Kenneth O},
booktitle = {ICLR},
keywords = {fashion,mnist,spiking},
title = {Backpropamine: Training Self-Modifying Neural Networks with Differentiable Neuromodulated Plasticity},
url = {https://openreview.net/pdf?id=r1lrAiA5Ym},
year = {2019}
}
The impressive lifelong learning in animal brains is primarily enabled by plastic changes in synaptic connectivity. Importantly, these changes are not passive, but are actively controlled by neuromodulation, which is itself under the control of the brain. The resulting self-modifying abilities of the brain play an important role in learning and adaptation, and are a major basis for biological reinforcement learning. Here we show for the first time that artificial neural networks with such neuromodulated plasticity can be trained with gradient descent. Extending previous work on differentiable Hebbian plasticity, we propose a differentiable formulation for the neuromodulation of plasticity. We show that neuromodulated plasticity improves the performance of neural networks on both reinforcement learning and supervised learning tasks. In one task, neuromodulated plastic LSTMs with millions of parameters outperform standard LSTMs on a benchmark language modeling task (controlling for the number of parameters). We conclude that differentiable neuromodulation of plasticity offers a powerful new framework for training neural networks.
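As a rough, hedged sketch of the neuromodulated-plasticity recurrence described above (not the paper's implementation: the modulatory signal is produced here by a fixed helper instead of being an output of the network, and all constants and initializations are arbitrary): each connection combines a fixed weight with a Hebbian trace whose update is gated by a modulatory scalar.

```python
import torch

def modulatory_signal(y):
    """Stand-in for the network-computed neuromodulation; here just a bounded statistic."""
    return torch.sigmoid(y.mean())

def neuromodulated_step(x, W, alpha, hebb):
    """One step of a plastic layer: effective weight = W + alpha * hebb.

    In the actual method, W and alpha are trained by backpropagation, and the
    modulatory signal gating the Hebbian update is itself produced by the network.
    """
    y = torch.tanh(x @ (W + alpha * hebb))
    m = modulatory_signal(y)                            # dopamine-like gating term
    hebb = hebb + m * torch.outer(x.squeeze(0), y.squeeze(0))
    return y, hebb

dim = 8
W = 0.1 * torch.randn(dim, dim)        # would be trained parameters in practice
alpha = 0.01 * torch.randn(dim, dim)   # per-connection plasticity coefficients
hebb = torch.zeros(dim, dim)
x = torch.randn(1, dim)
for _ in range(5):                     # the trace adapts within an episode ("lifetime")
    x, hebb = neuromodulated_step(x, W, alpha, hebb)
print(x.shape)                         # torch.Size([1, 8])
```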
N.A.
2019Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations by Alexander Ororbia, Ankur Mali, C Lee Giles and Daniel Kifer. arXiv, 2019. [mnist] [rnn] [spiking]
@article{ororbia2019,
author = {Ororbia, Alexander and Mali, Ankur and Giles, C Lee and Kifer, Daniel},
journal = {arXiv},
keywords = {[mnist],[rnn],[spiking],Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computing,credit assignment},
note = {Comment: Important revisions made throughout (additional items/results added, including a complexity analysis) arXiv: 1810.07411},
title = {Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations},
url = {http://arxiv.org/abs/1810.07411},
year = {2019}
}
Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications. However, training these models often relies on back-propagation through time, which entails unfolding the network over many time steps, making the process of conducting credit assignment considerably more challenging. Furthermore, the nature of back-propagation itself does not permit the use of non-differentiable activation functions and is inherently sequential, making parallelization of the underlying training process difficult. Here, we propose the Parallel Temporal Neural Coding Network (P-TNCN), a biologically inspired model trained by the learning algorithm we call Local Representation Alignment. It aims to resolve the difficulties and problems that plague recurrent networks trained by back-propagation through time. The architecture requires neither unrolling in time nor the derivatives of its internal activation functions. We compare our model and learning procedure to other back-propagation through time alternatives (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization. We show that it outperforms these on sequence modeling benchmarks such as Bouncing MNIST, a new benchmark we denote as Bouncing NotMNIST, and Penn Treebank. Notably, our approach can in some instances outperform full back-propagation through time as well as variants such as sparse attentive back-tracking. Significantly, the hidden unit correction phase of P-TNCN allows it to adapt to new datasets even if its synaptic weights are held fixed (zero-shot adaptation) and facilitates retention of prior generative knowledge when faced with a task sequence. We present results that show the P-TNCN's ability to conduct zero-shot adaptation and online continual sequence modeling.
N.A.
2019Lifelong Neural Predictive Coding: Sparsity Yields Less Forgetting When Learning Cumulatively by Alexander Ororbia, Ankur Mali, Daniel Kifer and C Lee Giles. arXiv, 1–11, 2019. [fashion] [mnist] [sparsity]
@article{ororbia2019a,
annotation = {_eprint: 1905.10696},
author = {Ororbia, Alexander and Mali, Ankur and Kifer, Daniel and Giles, C Lee},
journal = {arXiv},
keywords = {[fashion],[mnist],[sparsity]},
pages = {1--11},
title = {Lifelong Neural Predictive Coding: Sparsity Yields Less Forgetting When Learning Cumulatively},
url = {http://arxiv.org/abs/1905.10696},
year = {2019}
}
In lifelong learning systems, especially those based on artificial neural networks, one of the biggest obstacles is the severe inability to retain old knowledge as new information is encountered. This phenomenon is known as catastrophic forgetting. In this paper, we present a new connectionist model, the Sequential Neural Coding Network, and its learning procedure, grounded in the neurocognitive theory of predictive coding. The architecture experiences significantly less forgetting as compared to standard neural models and outperforms a variety of previously proposed remedies and methods when trained across multiple task datasets in a stream-like fashion. The promising performance demonstrated in our experiments offers motivation that directly incorporating mechanisms prominent in real neuronal systems, such as competition, sparse activation patterns, and iterative input processing, can create viable pathways for tackling the challenge of lifelong machine learning.
N.A.
2019FearNet: Brain-Inspired Model for Incremental Learning by Ronald Kemker and Christopher Kanan. ICLR, 2018. [audio] [cifar] [generative]
@inproceedings{kemker2018,
author = {Kemker, Ronald and Kanan, Christopher},
booktitle = {ICLR},
keywords = {[audio],[cifar],[generative]},
title = {FearNet: Brain-Inspired Model for Incremental Learning},
url = {https://openreview.net/pdf?id=SJ1Xmf-Rb},
year = {2018}
}
Incremental class learning involves sequentially learning classes in bursts of examples from the same class. This violates the assumptions that underlie methods for training standard deep neural networks, and will cause them to suffer from catastrophic forgetting. Arguably, the best method for incremental class learning is iCaRL, but it requires storing training examples for each class, making it challenging to scale. Here, we propose FearNet for incremental class learning. FearNet is a generative model that does not store previous examples, making it memory efficient. FearNet uses a brain-inspired dual-memory system in which new memories are consolidated from a network for recent memories inspired by the mammalian hippocampal complex to a network for long-term storage inspired by medial prefrontal cortex. Memory consolidation is inspired by mechanisms that occur during sleep. FearNet also uses a module inspired by the basolateral amygdala for determining which memory system to use for recall. FearNet achieves state-of-the-art performance at incremental class learning on image (CIFAR-100, CUB-200) and audio classification (AudioSet) benchmarks.
N.A.
2018Differentiable Plasticity: Training Plastic Neural Networks with Backpropagation by Thomas Miconi, Kenneth Stanley and Jeff Clune. International Conference on Machine Learning, 3559–3568, 2018.
@inproceedings{miconi2018,
author = {Miconi, Thomas and Stanley, Kenneth and Clune, Jeff},
booktitle = {International Conference on Machine Learning},
keywords = {plasticity,recurrent},
language = {en},
pages = {3559--3568},
shorttitle = {Differentiable Plasticity},
title = {Differentiable Plasticity: Training Plastic Neural Networks with Backpropagation},
url = {http://proceedings.mlr.press/v80/miconi18a.html},
year = {2018}
}
How can we build agents that keep learning from experience, quickly and efficiently, after their initial training? Here we take inspiration from the main mechanism of learning in biological brains:...
N.A.
2018Lifelong Learning of Spatiotemporal Representations With Dual-Memory Recurrent Self-Organization by German I Parisi, Jun Tani, Cornelius Weber and Stefan Wermter. Frontiers in Neurorobotics, 2018. [core50] [dual] [rnn] [som]
@article{parisi2018,
author = {Parisi, German I and Tani, Jun and Weber, Cornelius and Wermter, Stefan},
doi = {10.3389/fnbot.2018.00078},
issn = {1662-5218},
journal = {Frontiers in Neurorobotics},
keywords = {[core50],[dual],[rnn],[som],CLS,Incremental Learning,Lifelong learning,Memory,object recognition systems,Self-organizing Network},
language = {English},
title = {Lifelong Learning of Spatiotemporal Representations With Dual-Memory Recurrent Self-Organization},
url = {https://www.frontiersin.org/articles/10.3389/fnbot.2018.00078/full},
volume = {12},
year = {2018}
}
Artificial autonomous agents and robots interacting in complex environments are required to continually acquire and fine-tune knowledge over sustained periods of time. The ability to learn from continuous streams of information is referred to as lifelong learning and represents a long-standing challenge for neural network models due to catastrophic forgetting in which novel sensory experience interferes with existing representations and leads to abrupt decreases in the performance on previously acquired knowledge. Computational models of lifelong learning typically alleviate catastrophic forgetting in experimental scenarios with given datasets of static images and limited complexity, thereby differing significantly from the conditions artificial agents are exposed to. In more natural settings, sequential information may become progressively available over time and access to previous experience may be restricted. Therefore, specialized neural network mechanisms are required that adapt to novel sequential experience while preventing disruptive interference with existing representations. In this paper, we propose a dual-memory self-organizing architecture for lifelong learning scenarios. The architecture comprises two growing recurrent networks with the complementary tasks of learning object instances (episodic memory) and categories (semantic memory). Both growing networks can expand in response to novel sensory experience: the episodic memory learns fine-grained spatiotemporal representations of object instances in an unsupervised fashion while the semantic memory uses task-relevant signals to regulate structural plasticity levels and develop more compact representations from episodic experience. For the consolidation of knowledge in the absence of external sensory input, the episodic memory periodically replays trajectories of neural reactivations. We evaluate the proposed model on the CORe50 benchmark dataset for continuous object recognition, showing that we significantly outperform current methods of lifelong learning in three different incremental learning scenarios.
N.A.
2018SLAYER: Spike Layer Error Reassignment in Time by Sumit Bam Shrestha and Garrick Orchard. Advances in Neural Information Processing Systems 31, 1412–1421, 2018.
@incollection{shrestha2018,
author = {Shrestha, Sumit Bam and Orchard, Garrick},
booktitle = {Advances in Neural Information Processing Systems 31},
editor = {Bengio, S and Wallach, H and Larochelle, H and Grauman, K and Cesa-Bianchi, N and Garnett, R},
pages = {1412--1421},
publisher = {Curran Associates, Inc.},
shorttitle = {SLAYER},
title = {SLAYER: Spike Layer Error Reassignment in Time},
url = {http://papers.nips.cc/paper/7415-slayer-spike-layer-error-reassignment-in-time.pdf},
year = {2018}
}
N.A.
N.A.
2018Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World by Sahil Garg, Irina Rish, Guillermo Cecchi and Aurelie Lozano. IJCAI International Joint Conference on Artificial Intelligence, 1696–1702, 2017. [nlp] [vision]
@article{garg2017,
annotation = {_eprint: 1701.06106},
author = {Garg, Sahil and Rish, Irina and Cecchi, Guillermo and Lozano, Aurelie},
doi = {10.24963/ijcai.2017/235},
isbn = {9780999241103},
issn = {10450823},
journal = {IJCAI International Joint Conference on Artificial Intelligence},
keywords = {[nlp],[vision]},
pages = {1696--1702},
title = {Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World},
url = {https://arxiv.org/abs/1701.06106},
year = {2017}
}
We address the problem of online model adaptation when learning representations from non-stationary data streams. Specifically, we focus here on online dictionary learning (i.e. sparse linear autoencoder), and propose a simple but effective online model selection approach involving "birth" (addition) and "death" (removal) of hidden units representing dictionary elements, in response to changing inputs; we draw inspiration from the adult neurogenesis phenomenon in the dentate gyrus of the hippocampus, known to be associated with better adaptation to new environments. Empirical evaluation on real-life datasets (images and text), as well as on synthetic data, demonstrates that the proposed approach can considerably outperform the state-of-the-art non-adaptive online sparse coding of [Mairal et al., 2009] in the presence of non-stationary data. Moreover, we identify certain data and model properties associated with such improvements.
N.A.
2017Diffusion-Based Neuromodulation Can Eliminate Catastrophic Forgetting in Simple Neural Networks by Roby Velez and Jeff Clune. PLoS ONE, 1–31, 2017.
@article{velez2017,
annotation = {_eprint: 1705.07241},
author = {Velez, Roby and Clune, Jeff},
doi = {10.1371/journal.pone.0187736},
isbn = {1111111111},
issn = {19326203},
journal = {PLoS ONE},
number = {11},
pages = {1--31},
title = {Diffusion-Based Neuromodulation Can Eliminate Catastrophic Forgetting in Simple Neural Networks},
url = {http://arxiv.org/abs/1705.07241 http://dx.doi.org/10.1371/journal.pone.0187736},
volume = {12},
year = {2017}
}
A long-term goal of AI is to produce agents that can learn a diversity of skills throughout their lifetimes and continuously improve those skills via experience. A longstanding obstacle towards that goal is catastrophic forgetting, which is when learning new information erases previously learned information. Catastrophic forgetting occurs in artificial neural networks (ANNs), which have fueled most recent advances in AI. A recent paper proposed that catastrophic forgetting in ANNs can be reduced by promoting modularity, which can limit forgetting by isolating task information to specific clusters of nodes and connections (functional modules). While the prior work did show that modular ANNs suffered less from catastrophic forgetting, it was not able to produce ANNs that possessed task-specific functional modules, thereby leaving the main theory regarding modularity and forgetting untested. We introduce diffusion-based neuromodulation, which simulates the release of diffusing, neuromodulatory chemicals within an ANN that can modulate (i.e. up or down regulate) learning in a spatial region. On the simple diagnostic problem from the prior work, diffusion-based neuromodulation 1) induces task-specific learning in groups of nodes and connections (task-specific localized learning), which 2) produces functional modules for each subtask, and 3) yields higher performance by eliminating catastrophic forgetting. Overall, our results suggest that diffusion-based neuromodulation promotes task-specific localized learning and functional modularity, which can help solve the challenging, but important problem of catastrophic forgetting.
N.A.
2017How Do Neurons Operate on Sparse Distributed Representations? A Mathematical Theory of Sparsity, Neurons and Active Dendrites by Subutai Ahmad and Jeff Hawkins. arXiv, 1–23, 2016. [hebbian] [sparsity]
@article{ahmad2016,
annotation = {_eprint: 1601.00720},
author = {Ahmad, Subutai and Hawkins, Jeff},
journal = {arXiv},
keywords = {[hebbian],[sparsity],active dendrites,neocortex,neurons,nmda spike,sparse coding},
pages = {1--23},
title = {How Do Neurons Operate on Sparse Distributed Representations? A Mathematical Theory of Sparsity, Neurons and Active Dendrites},
url = {http://arxiv.org/abs/1601.00720},
year = {2016}
}
We propose a formal mathematical model for sparse representations and active dendrites in neocortex. Our model is inspired by recent experimental findings on active dendritic processing and NMDA spikes in pyramidal neurons. These experimental and modeling studies suggest that the basic unit of pattern memory in the neocortex is instantiated by small clusters of synapses operated on by localized non-linear dendritic processes. We derive a number of scaling laws that characterize the accuracy of such dendrites in detecting activation patterns in a neuronal population under adverse conditions. We introduce the union property which shows that synapses for multiple patterns can be randomly mixed together within a segment and still lead to highly accurate recognition. We describe simulation results that provide further insight into sparse representations as well as two primary results. First we show that pattern recognition by a neuron with active dendrites can be extremely accurate and robust with high dimensional sparse inputs even when using a tiny number of synapses to recognize large patterns. Second, equations representing recognition accuracy of a dendrite predict optimal NMDA spiking thresholds under a generous set of assumptions. The prediction tightly matches NMDA spiking thresholds measured in the literature. Our model matches many of the known properties of pyramidal neurons. As such the theory provides a mathematical framework for understanding the benefits and limits of sparse representations in cortical networks.
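The central quantity analysed here, the chance that a dendritic segment is activated by the wrong sparse pattern, can be estimated with a standard hypergeometric tail. The short sketch below uses illustrative numbers and is meant as a companion to, not a restatement of, the paper's derivations.

```python
from math import comb

def false_match_prob(n, a, s, theta):
    """Probability that a random pattern with `a` of `n` bits active overlaps a dendritic
    segment holding `s` synapses in at least `theta` positions (hypergeometric tail)."""
    return sum(comb(s, b) * comb(n - s, a - b)
               for b in range(theta, min(s, a) + 1)) / comb(n, a)

# Example: 2048-cell population, 40 active cells, 20 synapses on a segment, threshold 10.
print(false_match_prob(n=2048, a=40, s=20, theta=10))
```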
N.A.
2016Continuous Online Sequence Learning with an Unsupervised Neural Network Model by Yuwei Cui, Subutai Ahmad and Jeff Hawkins. Neural Computation, 2474–2504, 2016. [spiking]
@article{cui2016,
author = {Cui, Yuwei and Ahmad, Subutai and Hawkins, Jeff},
doi = {10.1162/NECO_a_00893},
issn = {0899-7667},
journal = {Neural Computation},
keywords = {[spiking],htm},
note = {Publisher: MIT Press},
number = {11},
pages = {2474--2504},
title = {Continuous Online Sequence Learning with an Unsupervised Neural Network Model},
url = {https://doi.org/10.1162/NECO_a_00893},
volume = {28},
year = {2016}
}
The ability to recognize and predict temporal sequences of sensory inputs is vital for survival in natural environments. Based on many known properties of cortical neurons, hierarchical temporal memory (HTM) sequence memory recently has been proposed as a theoretical framework for sequence learning in the cortex. In this letter, we analyze properties of HTM sequence memory and apply it to sequence learning and prediction problems with streaming data. We show the model is able to continuously learn a large number of variable order temporal sequences using an unsupervised Hebbian-like learning rule. The sparse temporal codes formed by the model can robustly handle branching temporal sequences by maintaining multiple predictions until there is sufficient disambiguating evidence. We compare the HTM sequence memory with other sequence learning algorithms, including statistical methods— autoregressive integrated moving average; feedforward neural networks— time delay neural network and online sequential extreme learning machine; and recurrent neural networks— long short-term memory and echo-state networks on sequence prediction problems with both artificial and real-world data. The HTM model achieves comparable accuracy to other state-of-the-art algorithms. The model also exhibits properties that are critical for sequence learning, including continuous online learning, the ability to handle multiple predictions and branching sequences with high-order statistics, robustness to sensor noise and fault tolerance, and good performance without task-specific hyperparameter tuning. Therefore, the HTM sequence memory not only advances our understanding of how the brain may solve the sequence learning problem but is also applicable to real-world sequence learning problems from continuous data streams.
N.A.
2016Backpropagation of Hebbian Plasticity for Continual Learning by and Thomas Miconi. NIPS Workshop - Continual Learning, 5, 2016.
@article{miconi2016,
author = {Miconi, Thomas},
journal = {NIPS Workshop - Continual Learning},
keywords = {hebbian,workshop},
language = {en},
pages = {5},
title = {Backpropagation of Hebbian Plasticity for Continual Learning},
url = {https://c38663e3-a-62cb3a1a-s-sites.googlegroups.com/site/cldlnips2016/CLDL-2016_paper_2.pdf?attachauth=ANoY7cpkpkdHxt2kA42TazATZVrBcNkcKZBbB_QkYQ2MQDe-Hz-inAnoBcb2Rl-6VCBWzWbjKjULT3tkSAtt1hdk66nh4Gy28ObAg7jKgLXNMzPTOYyB_roYB1nPaDNNkfQQhJJGXUdSexlxXDBUU0S},
year = {2016}
}
N.A.
N.A.
2016Mitigation of Catastrophic Forgetting in Recurrent Neural Networks Using a Fixed Expansion Layer by Robert Coop and Itamar Arel. The 2013 International Joint Conference on Neural Networks (IJCNN), 1–7, 2013. [mnist] [rnn] [sparsity]
@inproceedings{coop2013,
address = {Dallas, TX, USA},
author = {Coop, Robert and Arel, Itamar},
booktitle = {The 2013 International Joint Conference on Neural Networks (IJCNN)},
doi = {10.1109/IJCNN.2013.6707047},
isbn = {978-1-4673-6129-3 978-1-4673-6128-6},
keywords = {[mnist],[rnn],[sparsity],fel,recurrent fel},
language = {en},
pages = {1--7},
publisher = {IEEE},
title = {Mitigation of Catastrophic Forgetting in Recurrent Neural Networks Using a Fixed Expansion Layer},
url = {http://ieeexplore.ieee.org/document/6707047/},
year = {2013}
}
Catastrophic forgetting (or catastrophic interference) in supervised learning systems is the drastic loss of previously stored information caused by the learning of new information. While substantial work has been published on addressing catastrophic forgetting in memoryless supervised learning systems (e.g. feedforward neural networks), the problem has received limited attention in the context of dynamic systems, particularly recurrent neural networks. In this paper, we introduce a solution for mitigating catastrophic forgetting in RNNs based on enhancing the Fixed Expansion Layer (FEL) neural network which exploits sparse coding of hidden neuron activations. Simulation results on several non-stationary data sets clearly demonstrate the effectiveness of the proposed architecture.
N.A.
2013Compete to Compute by Rupesh Kumar Srivastava, Jonathan Masci, Sohrob Kazerounian, Faustino Gomez and Jürgen Schmidhuber. Advances in Neural Information Processing Systems 26, 2013. [mnist] [sparsity]
@inproceedings{srivastava2013,
author = {Srivastava, Rupesh Kumar and Masci, Jonathan and Kazerounian, Sohrob and Gomez, Faustino and Schmidhuber, Jürgen},
booktitle = {Advances in Neural Information Processing Systems 26},
keywords = {[mnist],[sparsity]},
title = {Compete to Compute},
url = {http://papers.nips.cc/paper/5059-compete-to-compute.pdf},
year = {2013}
}
Local competition among neighboring neurons is common in biological neural networks (NNs). In this paper, we apply the concept to gradient-based, backprop-trained artificial multilayer NNs. NNs with competing linear units tend to outperform those with non-competing nonlinear units, and avoid catastrophic forgetting when training sets change over time.
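The competition mechanism described here is local winner-take-all (LWTA) over small blocks of linear units; a minimal sketch follows, with block size and input values chosen purely for illustration.

```python
import numpy as np

def lwta(pre_activations, block_size=2):
    """Local winner-take-all: within each block of competing linear units, only the unit
    with the largest pre-activation passes its value through; the rest output zero."""
    h = pre_activations.reshape(-1, block_size)
    out = np.zeros_like(h)
    rows = np.arange(h.shape[0])
    winners = h.argmax(axis=1)
    out[rows, winners] = h[rows, winners]
    return out.reshape(-1)

# Example: 8 linear units grouped into 4 blocks of 2 competing units each.
print(lwta(np.array([0.3, -1.2, 2.0, 1.9, -0.5, -0.1, 0.0, 0.7])))
```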
N.A.
2013Mitigation of Catastrophic Interference in Neural Networks Using a Fixed Expansion Layer by Robert Coop and Itamar Arel. 2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS), 726–729, 2012. [sparsity]
@inproceedings{coop2012,
author = {Coop, Robert and Arel, Itamar},
booktitle = {2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS)},
doi = {10.1109/MWSCAS.2012.6292123},
isbn = {978-1-4673-2527-1},
keywords = {[sparsity],Accuracy,binary activations,Biological neural networks,catastrophic forgetting,catastrophic interference,Feedforward neural networks,fixed expansion layer feedforward neural network,Interference,multilayer perceptron,multilayer perceptrons,Neurons,non-stationary inputs,sparse neurons,Training},
note = {ISSN: 1548-3746},
pages = {726--729},
publisher = {IEEE},
title = {Mitigation of Catastrophic Interference in Neural Networks Using a Fixed Expansion Layer},
url = {http://ieeexplore.ieee.org/document/6292123/},
year = {2012}
}
In this paper we present the fixed expansion layer (FEL) feedforward neural network designed for balancing plasticity and stability in the presence of non-stationary inputs. Catastrophic interference (or catastrophic forgetting) refers to the drastic loss of previously learned information when a neural network is trained on new or different information. The goal of the FEL network is to reduce the effect of catastrophic interference by augmenting a multilayer perceptron with a layer of sparse neurons with binary activations. We compare the FEL network's performance to that of other algorithms designed to combat the effects of catastrophic interference and demonstrate that the FEL network is able to retain information for significantly longer periods of time with substantially lower computational requirements.
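A rough sketch of the general idea, a frozen random expansion with sparse binary activations feeding a trainable readout, is given below; the actual FEL encoding scheme in the paper differs in detail, and the layer sizes here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

class FixedExpansionLayer:
    """A frozen random projection followed by a sparse binary activation: only the top-k
    expansion units fire, so different inputs tend to activate mostly disjoint subsets of
    units, which limits interference between them."""
    def __init__(self, n_in, n_expand=500, k=25):
        self.W = rng.normal(size=(n_expand, n_in))   # fixed, never trained
        self.k = k

    def __call__(self, x):
        z = self.W @ x
        code = np.zeros_like(z)
        code[np.argsort(-z)[: self.k]] = 1.0         # sparse binary code
        return code

# Only a readout on top of the expansion codes would be trained online.
fel = FixedExpansionLayer(n_in=10)
code = fel(rng.normal(size=10))
```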
N.A.
2012Synaptic Plasticity: Taming the Beast by L F Abbott and Sacha B Nelson. Nature Neuroscience, 1178–1183, 2000. [hebbian]
@article{abbott2000,
author = {Abbott, L F and Nelson, Sacha B},
doi = {10.1038/81453},
issn = {1546-1726},
journal = {Nature Neuroscience},
keywords = {[hebbian]},
language = {en},
number = {11},
pages = {1178--1183},
shorttitle = {Synaptic Plasticity},
title = {Synaptic Plasticity: Taming the Beast},
url = {https://www.nature.com/articles/nn1100_1178},
volume = {3},
year = {2000}
}
Synaptic plasticity provides the basis for most models of learning, memory and development in neural circuits. To generate realistic results, synapse-specific Hebbian forms of plasticity, such as long-term potentiation and depression, must be augmented by global processes that regulate overall levels of neuronal and network activity. Regulatory processes are often as important as the more intensively studied Hebbian processes in determining the consequences of synaptic plasticity for network function. Recent experimental results suggest several novel mechanisms for regulating levels of activity in conjunction with Hebbian synaptic modification. We review three of them— synaptic scaling, spike-timing dependent plasticity and synaptic redistribution— and discuss their functional implications.
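One of the reviewed mechanisms, synaptic scaling, is simple enough to sketch: all of a neuron's incoming weights are scaled multiplicatively toward a target firing rate, preserving the relative differences set by Hebbian learning. This is only a toy rendering of the biology discussed in the review, with made-up numbers.

```python
import numpy as np

def synaptic_scaling(w, avg_rate, target_rate, eta=0.01):
    """Multiplicatively scale all of a neuron's incoming weights toward a target average
    firing rate, preserving the relative differences set by Hebbian learning."""
    return w * (1.0 + eta * (target_rate - avg_rate) / target_rate)

# A neuron firing above its target rate has every synapse scaled down slightly.
print(synaptic_scaling(np.array([0.5, 1.2, 0.1]), avg_rate=12.0, target_rate=5.0))
```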
N.A.
2000
Catastrophic Forgetting Studies¶
9 papers
In this section we list all the major contributions trying to understand catastrophic forgetting and its implication in machines that learn continually.
Sequential Mastery of Multiple Visual Tasks: Networks Naturally Learn to Learn and Forget to Forget by Guy Davidson and Michael C Mozer. CVPR, 9282–9293, 2020. [vision]
@inproceedings{davidson2020,
author = {Davidson, Guy and Mozer, Michael C},
booktitle = {CVPR},
keywords = {[vision]},
pages = {9282--9293},
title = {Sequential Mastery of Multiple Visual Tasks: Networks Naturally Learn to Learn and Forget to Forget},
url = {https://openaccess.thecvf.com/content_CVPR_2020/papers/Davidson_Sequential_Mastery_of_Multiple_Visual_Tasks_Networks_Naturally_Learn_to_CVPR_2020_paper.pdf},
year = {2020}
}
We explore the behavior of a standard convolutional neural net in a continual-learning setting that introduces visual classification tasks sequentially and requires the net to master new tasks while preserving mastery of previously learned tasks. This setting corresponds to that which human learners face as they acquire domain expertise serially, for example, as an individual studies a textbook. Through simulations involving sequences of ten related visual tasks, we find reason for optimism that nets will scale well as they advance from having a single skill to becoming multi-skill domain experts. We observe two key phenomena. First, forward facilitation (the accelerated learning of task n+1 having learned n previous tasks) grows with n. Second, backward interference (the forgetting of the n previous tasks when learning task n+1) diminishes with n. Amplifying forward facilitation is the goal of research on metalearning, and attenuating backward interference is the goal of research on catastrophic forgetting. We find that both of these goals are attained simply through broader exposure to a domain.
N.A.
2020Dissecting Catastrophic Forgetting in Continual Learning by Deep Visualization by Giang Nguyen, Shuan Chen, Thao Do, Tae Joon Jun, Ho-Jin Choi and Daeyoung Kim. arXiv, 2020. [vision]
@article{nguyen2020,
annotation = {_eprint: 2001.01578},
author = {Nguyen, Giang and Chen, Shuan and Do, Thao and Jun, Tae Joon and Choi, Ho-Jin and Kim, Daeyoung},
journal = {arXiv},
keywords = {[vision]},
title = {Dissecting Catastrophic Forgetting in Continual Learning by Deep Visualization},
url = {http://arxiv.org/abs/2001.01578},
year = {2020}
}
Interpreting the behaviors of Deep Neural Networks (usually considered as a black box) is critical especially when they are now being widely adopted over diverse aspects of human life. Taking the advancements from Explainable Artificial Intelligence, this paper proposes a novel technique called Auto DeepVis to dissect catastrophic forgetting in continual learning. A new method to deal with catastrophic forgetting named critical freezing is also introduced upon investigating the dilemma by Auto DeepVis. Experiments on a captioning model meticulously present how catastrophic forgetting happens, particularly showing which components are forgetting or changing. The effectiveness of our technique is then assessed; and more precisely, critical freezing claims the best performance on both previous and coming tasks over baselines, proving the capability of the investigation. Our techniques could not only be supplementary to existing solutions for completely eradicating catastrophic forgetting for life-long learning but also explainable.
N.A.
2020Toward Understanding Catastrophic Forgetting in Continual Learning by Cuong V Nguyen, Alessandro Achille, Michael Lam, Tal Hassner, Vijay Mahadevan and Stefano Soatto. arXiv, 2019. [cifar] [mnist]
@article{nguyen2019a,
annotation = {_eprint: 1908.01091},
author = {Nguyen, Cuong V and Achille, Alessandro and Lam, Michael and Hassner, Tal and Mahadevan, Vijay and Soatto, Stefano},
journal = {arXiv},
keywords = {[cifar],[mnist]},
title = {Toward Understanding Catastrophic Forgetting in Continual Learning},
url = {http://arxiv.org/abs/1908.01091},
year = {2019}
}
We study the relationship between catastrophic forgetting and properties of task sequences. In particular, given a sequence of tasks, we would like to understand which properties of this sequence influence the error rates of continual learning algorithms trained on the sequence. To this end, we propose a new procedure that makes use of recent developments in task space modeling as well as correlation analysis to specify and analyze the properties we are interested in. As an application, we apply our procedure to study two properties of a task sequence: (1) total complexity and (2) sequential heterogeneity. We show that error rates are strongly and positively correlated to a task sequence's total complexity for some state-of-the-art algorithms. We also show that, surprisingly, the error rates have no or even negative correlations in some cases to sequential heterogeneity. Our findings suggest directions for improving continual learning benchmarks and methods.
N.A.
2019An Empirical Study of Example Forgetting during Deep Neural Network Learning by Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio and Geoffrey J Gordon. International Conference on Learning Representations, 2019. [cifar] [mnist]
@inproceedings{toneva2019,
author = {Toneva, Mariya and Sordoni, Alessandro and des Combes, Remi Tachet and Trischler, Adam and Bengio, Yoshua and Gordon, Geoffrey J},
booktitle = {International Conference on Learning Representations},
keywords = {[cifar],[mnist]},
note = {An interesting aspect of this paper is related to the study of unforgettable patterns and how they influence performance in terms of forgetting.},
title = {An Empirical Study of Example Forgetting during Deep Neural Network Learning},
url = {https://openreview.net/forum?id=BJlxm30cKm},
year = {2019}
}
Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks. Our goal is to understand whether a...
N.A.
2019Localizing Catastrophic Forgetting in Neural Networks by Felix Wiewel and Bin Yang. arXiv, 2019. [mnist]
@article{wiewel2019,
annotation = {_eprint: 1906.02568},
author = {Wiewel, Felix and Yang, Bin},
journal = {arXiv},
keywords = {[mnist]},
title = {Localizing Catastrophic Forgetting in Neural Networks},
url = {http://arxiv.org/abs/1906.02568},
year = {2019}
}
Artificial neural networks (ANNs) suffer from catastrophic forgetting when trained on a sequence of tasks. While this phenomenon was studied in the past, there is only very limited recent research on it. We propose a method for determining the contribution of individual parameters in an ANN to catastrophic forgetting. The method is used to analyze an ANN's response to three different continual learning scenarios.
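The paper proposes its own attribution method; as a simple, related probe (not the authors' technique), one can roll individual parameter groups back to their post-task-A values and measure how much task-A loss each rollback recovers. Everything below (the parameter dictionaries, the grouping, the loss callable) is a hypothetical interface.

```python
def forgetting_attribution(params_after_B, params_after_A, loss_on_A, groups):
    """For each parameter group, roll only that group back to its post-task-A values and
    measure how much of the task-A loss increase the rollback undoes.
    `params_*` map parameter names to arrays; `loss_on_A(params)` evaluates task-A loss;
    `groups` maps a group name (e.g. a layer) to the parameter names it contains."""
    base = loss_on_A(params_after_B)                   # task-A loss after training on task B
    scores = {}
    for group_name, names in groups.items():
        probe = dict(params_after_B)
        for n in names:
            probe[n] = params_after_A[n]
        scores[group_name] = base - loss_on_A(probe)   # larger = more forgetting located here
    return scores
```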
N.A.
2019Don’t Forget, There Is More than Forgetting: New Metrics for Continual Learning by Natalia Díaz-Rodríguez, Vincenzo Lomonaco, David Filliat and Davide Maltoni. arXiv, 2018. [cifar] [framework]
@article{diaz-rodriguez2018,
author = {Díaz-Rodríguez, Natalia and Lomonaco, Vincenzo and Filliat, David and Maltoni, Davide},
journal = {arXiv},
keywords = {[cifar],[framework],68T05,Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Rec,Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computi,cs.AI,cs.CV,cs.LG,cs.NE,stat.ML},
note = {arXiv: 1810.13166},
shorttitle = {Don't Forget, There Is More than Forgetting},
title = {Don't Forget, There Is More than Forgetting: New Metrics for Continual Learning},
url = {http://arxiv.org/abs/1810.13166},
year = {2018}
}
Continual learning consists of algorithms that learn from a stream of data/tasks continuously and adaptively through time, enabling the incremental development of ever more complex knowledge and skills. The lack of consensus in evaluating continual learning algorithms and the almost exclusive focus on forgetting motivate us to propose a more comprehensive set of implementation-independent metrics accounting for several factors we believe have practical implications worth considering in the deployment of real AI systems that learn continually: accuracy or performance over time, backward and forward knowledge transfer, memory overhead, and computational efficiency. Drawing inspiration from the standard Multi-Attribute Value Theory (MAVT), we further propose to fuse these metrics into a single score for ranking purposes, and we evaluate our proposal with five continual learning strategies on the iCIFAR-100 continual learning benchmark.
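For context, the kind of computation involved can be sketched from a task-accuracy matrix. The definitions below follow the common ACC/BWT/FWT formulation (the paper's own metric set is broader, covering memory and compute overhead, and its normalization differs), and the weighted fusion is only schematic, assuming each metric has already been rescaled to [0, 1].

```python
import numpy as np

def cl_metrics(R):
    """R[i, j] = accuracy on task j after training on tasks 0..i (T x T matrix)."""
    T = R.shape[0]
    acc = R[-1].mean()                                         # final average accuracy
    bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])  # backward transfer
    fwt = np.mean([R[j - 1, j] for j in range(1, T)])          # accuracy on not-yet-seen tasks
    return {"ACC": acc, "BWT": bwt, "FWT": fwt}

def fused_score(metrics, weights):
    """MAVT-style weighted aggregation into a single ranking score, assuming each metric
    has already been mapped to [0, 1] with higher meaning better."""
    return sum(weights[k] * metrics[k] for k in weights)

R = np.array([[0.90, 0.10, 0.10],
              [0.70, 0.80, 0.20],
              [0.60, 0.70, 0.85]])
print(cl_metrics(R))
```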
N.A.
2018The Stability-Plasticity Dilemma: Investigating the Continuum from Catastrophic Forgetting to Age-Limited Learning Effects by Martial Mermillod, Aurélia Bugaiska and Patrick Bonin. Frontiers in Psychology, 504, 2013.
@article{mermillod2013,
author = {Mermillod, Martial and Bugaiska, Aurélia and Bonin, Patrick},
doi = {10.3389/fpsyg.2013.00504},
issn = {1664-1078},
journal = {Frontiers in Psychology},
keywords = {Mermillod2013a},
number = {August},
pages = {504},
pmid = {23935590},
title = {The Stability-Plasticity Dilemma: Investigating the Continuum from Catastrophic Forgetting to Age-Limited Learning Effects},
url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3732997%7B%5C&%7Dtool=pmcentrez%7B%5C&%7Drendertype=abstract http://journal.frontiersin.org/article/10.3389/fpsyg.2013.00504/abstract},
volume = {4},
year = {2013}
}
N.A.
N.A.
2013Catastrophic Forgetting in Connectionist Networks by Robert French. Trends in Cognitive Sciences, 128–135, 1999. [sparsity]
@article{french1999,
author = {French, Robert},
doi = {10.1016/S1364-6613(99)01294-2},
issn = {1364-6613, 1879-307X},
journal = {Trends in Cognitive Sciences},
keywords = {[sparsity],biology,Catastrophic forgetting,Connectionism,Connectionist networks,Interference,Learning,Memory,Neuroscience},
language = {English},
number = {4},
pages = {128--135},
pmid = {10322466},
title = {Catastrophic Forgetting in Connectionist Networks},
url = {https://www.cell.com/trends/cognitive-sciences/abstract/S1364-6613(99)01294-2},
volume = {3},
year = {1999}
}
N.A.
N.A.
1999How Does a Brain Build a Cognitive Code? by Stephen Grossberg. Psychological Review, 1–51, 1980.
@article{grossberg1980,
address = {US},
author = {Grossberg, Stephen},
doi = {10.1037/0033-295X.87.1.1},
issn = {1939-1471(Electronic),0033-295X(Print)},
journal = {Psychological Review},
keywords = {Attention,Cognitive Processes,Electrical Activity,Expectations,Human Information Storage,Neurophysiology},
note = {It introduces the stability-plasticity dilemma related to the catastrophic forgetting.},
number = {1},
pages = {1--51},
publisher = {American Psychological Association},
title = {How Does a Brain Build a Cognitive Code?},
url = {https://psycnet.apa.org/record/1980-06768-001},
volume = {87},
year = {1980}
}
Discusses how competition between afferent data and learned feedback expectancies can stabilize a developing code by buffering committed populations of detectors against continual erosion by new environmental demands. The gating phenomena that result lead to dynamically maintained critical periods and to attentional phenomena such as overshadowing in the adult. The functional unit of cognitive coding is suggested to be an adaptive resonance, or amplification and prolongation of neural activity, that occurs when afferent data and efferent expectancies reach consensus through a matching process. The resonant state embodies the perceptual event, and its amplified and sustained activities are capable of driving slow changes of long-term memory. These mechanisms help to explain and predict (a) positive and negative aftereffects, the McCollough effect, spatial frequency adaptation, monocular rivalry, binocular rivalry and hysteresis, pattern completion, and Gestalt switching; (b) analgesia, partial reinforcement acquisition effect, conditioned reinforcers, underaroused vs overaroused depression; (c) the contingent negative variation, P300, and pontogeniculo-occipital waves; and (d) olfactory coding, corticogeniculate feedback, matching of proprioceptive and terminal motor maps, and cerebral dominance. (125 ref) (PsycINFO Database Record (c) 2016 APA, all rights reserved)
N.A.
1980
Classics¶
8 papers
In this section you’ll find pioneering and classic continual learning papers. We recommend to read all the papers in this section for a good background on current continual deep learning developments.
The Organization of Behavior: A Neuropsychological Theory by D O Hebb. Lawrence Erlbaum, 2002. [hebbian]
@book{hebb2002,
author = {Hebb, D O},
isbn = {978-1-135-63191-8},
journal = {Lawrence Erlbaum},
keywords = {[hebbian],Psychology / Cognitive Psychology & Cognition,Psychology / General,Psychology / Neuropsychology,Psychology / Physiological Psychology},
language = {en},
publisher = {Psychology Press},
shorttitle = {The Organization of Behavior},
title = {The Organization of Behavior: A Neuropsychological Theory},
url = {https://www.amazon.com/Organization-Behavior-Neuropsychological-Theory/dp/0805843000 https://books.google.it/books/about/The_Organization_of_Behavior.html?id=ddB4AgAAQBAJ&printsec=frontcover&source=kp_read_button&redir_esc=y#v=onepage&q&f=false},
year = {2002}
}
Since its publication in 1949, D.O. Hebb's, The Organization of Behavior has been one of the most influential books in the fields of psychology and neuroscience. However, the original edition has been unavailable since 1966, ensuring that Hebb's comment that a classic normally means "cited but not read" is true in his case. This new edition rectifies a long-standing problem for behavioral neuroscientists– the inability to obtain one of the most cited publications in the field. The Organization of Behavior played a significant part in stimulating the investigation of the neural foundations of behavior and continues to be inspiring because it provides a general framework for relating behavior to synaptic organization through the dynamics of neural networks. D.O. Hebb was also the first to examine the mechanisms by which environment and experience can influence brain structure and function, and his ideas formed the basis for work on enriched environments as stimulants for behavioral development. References to Hebb, the Hebbian cell assembly, the Hebb synapse, and the Hebb rule increase each year. These forceful ideas of 1949 are now applied in engineering, robotics, and computer science, as well as neurophysiology, neuroscience, and psychology– a tribute to Hebb's foresight in developing a foundational neuropsychological theory of the organization of behavior.
N.A.
2002Pseudo-Recurrent Connectionist Networks: An Approach to the ‘Sensitivity-Stability’ Dilemma by Robert French. Connection Science, 353–380, 1997. [dual]
@article{french1997,
author = {French, Robert},
doi = {10.1080/095400997116595},
issn = {0954-0091, 1360-0494},
journal = {Connection Science},
keywords = {[dual],Catastrophic Interference,dilemma,Dual Memory,Keywords: Pseudopatterns,plasticity,Semi-distributed Representations,Sensitivity-stability Transfer,stability},
language = {en},
note = {In this seminal paper the author introduces many different forms of rehearsal in order to mitigate the catastrophic forgetting phenomenon},
number = {4},
pages = {353--380},
shorttitle = {Pseudo-Recurrent Connectionist Networks},
title = {Pseudo-Recurrent Connectionist Networks: An Approach to the 'Sensitivity-Stability' Dilemma},
url = {http://www.tandfonline.com/doi/abs/10.1080/095400997116595},
volume = {9},
year = {1997}
}
In order to solve the "sensitivity-stability" problem — and its immediate correlate, the problem of sequential learning — it is crucial to develop connectionist architectures that are simultaneously sensitive to, but not excessively disrupted by, new input. French (1992) suggested that to alleviate a particularly severe form of this disruption, catastrophic forgetting, it was necessary for networks to dynamically separate their internal representations during learning. McClelland, McNaughton, & O'Reilly (1995) went even further. They suggested that nature's way of implementing this obligatory separation was the evolution of two separate areas of the brain, the hippocampus and the neocortex. In keeping with this idea of radical separation, a "pseudo-recurrent" memory model is presented here that partitions a connectionist network into two functionally distinct, but continually interacting areas. One area serves as a final-storage area for representations; the other is an early-processing area where new representations are first learned by the system. The final-storage area continually supplies internally generated patterns (pseudopatterns, Robins (1995)), which are approximations of its content, to the early-processing area, where they are interleaved with the new patterns to be learned. Transfer of the new learning is done either by weight-copying from the early-processing area to the final-storage area or by pseudopattern transfer. A number of experiments are presented that demonstrate the effectiveness of this approach, allowing, in particular, effective sequential learning with gradual forgetting in the presence of new input. Finally, it is shown that the two interacting areas automatically produce representational compaction and it is suggested that similar representational streamlining may exist in the brain.
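The pseudopattern mechanism at the heart of the model is compact enough to sketch: random inputs are pushed through the final-storage network to obtain input-output pseudopatterns, which are interleaved with the new patterns. The sketch assumes `storage_net` is simply a callable on a batch of inputs; the dual-network training loop and weight-copying transfer are omitted.

```python
import numpy as np

def make_pseudopatterns(storage_net, n, input_dim, rng):
    """Approximate the storage network's knowledge by probing it with random inputs."""
    X = rng.uniform(-1.0, 1.0, size=(n, input_dim))
    return X, storage_net(X)                       # (pseudo-inputs, pseudo-targets)

def interleaved_batch(new_X, new_Y, storage_net, rng, ratio=1.0):
    """Mix new patterns with pseudopatterns so old knowledge is rehearsed while the
    early-processing network learns the new patterns."""
    pX, pY = make_pseudopatterns(storage_net, int(ratio * len(new_X)), new_X.shape[1], rng)
    X, Y = np.vstack([new_X, pX]), np.vstack([new_Y, pY])
    perm = rng.permutation(len(X))
    return X[perm], Y[perm]
```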
N.A.
1997CHILD: A First Step Towards Continual Learning by Mark B Ring. Machine Learning, 77–104, 1997.
@article{ring1997,
author = {Ring, Mark B},
doi = {10.1023/A:1007331723572},
issn = {1573-0565},
journal = {Machine Learning},
keywords = {cl,continual learner,Continual learning,definition,hierarchical neural networks,reinforcement learning,sequence learning,transfer},
language = {en},
number = {1},
pages = {77--104},
shorttitle = {CHILD},
title = {CHILD: A First Step Towards Continual Learning},
url = {https://doi.org/10.1023/A:1007331723572},
volume = {28},
year = {1997}
}
Continual learning is the constant development of increasingly complex behaviors; the process of building more complicated skills on top of those already developed. A continual-learning agent should therefore learn incrementally and hierarchically. This paper describes CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development. CHILD can quickly solve complicated non-Markovian reinforcement-learning tasks and can then transfer its skills to similar but even more complicated tasks, learning these faster still.
N.A.
1997Is Learning The N-Th Thing Any Easier Than Learning The First? by Sebastian Thrun. Advances in Neural Information Processing Systems 8, 640–646, 1996. [vision]
@inproceedings{thrun1996a,
author = {Thrun, Sebastian},
booktitle = {Advances in Neural Information Processing Systems 8},
editor = {Touretzky, D S and Mozer, M C and Hasselmo, M E},
keywords = {[vision],lifelong,lifelong learning},
pages = {640--646},
publisher = {MIT Press},
title = {Is Learning The N-Th Thing Any Easier Than Learning The First?},
url = {http://papers.nips.cc/paper/1034-is-learning-the-n-th-thing-any-easier-than-learning-the-first.pdf},
year = {1996}
}
N.A.
N.A.
1996Learning in the Presence of Concept Drift and Hidden Contexts by Gerhard Widmer and Miroslav Kubat. Machine Learning, 69–101, 1996.
@article{widmer1996,
author = {Widmer, Gerhard and Kubat, Miroslav},
doi = {10.1007/BF00116900},
issn = {0885-6125},
journal = {Machine Learning},
language = {en},
number = {1},
pages = {69--101},
title = {Learning in the Presence of Concept Drift and Hidden Contexts},
url = {https://doi.org/10.1007/BF00116900 http://link.springer.com/10.1007/BF00116900},
volume = {23},
year = {1996}
}
On-line learning in domains where the target concept depends on some hidden context poses serious problems. A changing context can induce changes in the target concepts, producing what is known as concept drift. We describe a family of learning algorithms that flexibly react to concept drift and can take advantage of situations where contexts reappear. The general approach underlying all these algorithms consists of (1) keeping only a window of currently trusted examples and hypotheses; (2) storing concept descriptions and reusing them when a previous context re-appears; and (3) controlling both of these functions by a heuristic that constantly monitors the system's behavior. The paper reports on experiments that test the systems' performance under various conditions such as different levels of noise and different extent and rate of concept drift.
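A schematic sketch of the three ingredients listed in the abstract, with a generic learner standing in for the symbolic concept descriptions used by the actual FLORA-family algorithms; the thresholds, the decay constant, and the `make_model` interface are all assumptions.

```python
from collections import deque

class DriftAwareLearner:
    """Schematic drift handling: a window of trusted examples, a store of previously
    induced models, and an accuracy-based monitoring heuristic. Assumes `make_model()`
    returns an object with fit(list_of_xy) and predict(x), usable from the start."""
    def __init__(self, make_model, window_size=200, drop_threshold=0.15):
        self.make_model = make_model
        self.window = deque(maxlen=window_size)
        self.stored = []                              # concept descriptions kept for reuse
        self.model = make_model()
        self.drop_threshold = drop_threshold
        self.recent_acc = self.best_acc = 1.0

    def observe(self, x, y):
        self.recent_acc = 0.95 * self.recent_acc + 0.05 * float(self.model.predict(x) == y)
        self.best_acc = max(self.best_acc, self.recent_acc)
        self.window.append((x, y))
        if self.best_acc - self.recent_acc > self.drop_threshold:   # suspected concept drift
            self.stored.append(self.model)                          # keep the old concept
            reusable = [m for m in self.stored if self._window_acc(m) > 0.8]
            if reusable:                                            # an old context reappeared
                self.model = max(reusable, key=self._window_acc)
            else:                                                   # a genuinely new context
                self.model = self.make_model()
                self.model.fit(list(self.window))
            self.recent_acc = self.best_acc = 1.0

    def _window_acc(self, m):
        return sum(m.predict(x) == y for x, y in self.window) / max(len(self.window), 1)
```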
N.A.
1996Using Semi-Distributed Representations to Overcome Catastrophic Forgetting in Connectionist Networks by Robert French. In Proceedings of the 13th Annual Cognitive Science Society Conference, 173–178, 1991. [sparsity]
@inproceedings{french1991,
author = {French, Robert},
booktitle = {In Proceedings of the 13th Annual Cognitive Science Society Conference},
keywords = {[sparsity],activation sharpening},
pages = {173--178},
publisher = {Erlbaum},
title = {Using Semi-Distributed Representations to Overcome Catastrophic Forgetting in Connectionist Networks},
url = {https://www.aaai.org/Papers/Symposia/Spring/1993/SS-93-06/SS93-06-007.pdf},
year = {1991}
}
In connectionist networks, newly-learned information destroys previously-learned information unless the network is continually retrained on the old information. This behavior, known as catastrophic forgetting, is unacceptable both for practical purposes and as a model of mind. This paper advances the claim that catastrophic forgetting is a direct consequence of the overlap of the system's distributed representations and can be reduced by reducing this overlap. A simple algorithm is presented that allows a standard feedforward backpropagation network to develop semi-distributed representations, thereby significantly reducing the problem of catastrophic forgetting. 1 Introduction Catastrophic forgetting is the inability of a neural network to retain old information in the presence of new. New information destroys old unless the old information is continually relearned by the net. McCloskey & Cohen [1990] and Ratcliff [1989] have demonstrated that this is a serious problem with c...
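The sharpening step itself is small enough to sketch; `k` and `alpha` below are illustrative, and how the sharpened activations enter the backward pass is described in the paper rather than here.

```python
import numpy as np

def sharpen(h, k=2, alpha=0.2):
    """Activation sharpening: nudge the k most active hidden units toward 1 and the rest
    toward 0, producing a semi-distributed hidden code. The sharpened vector can then serve
    as an extra target for the hidden layer during the backpropagation pass."""
    sharpened = h - alpha * h                           # decay every unit toward 0
    top = np.argsort(-h)[:k]
    sharpened[top] = h[top] + alpha * (1.0 - h[top])    # boost the winners toward 1
    return sharpened

print(sharpen(np.array([0.9, 0.2, 0.6, 0.4])))          # -> roughly [0.92, 0.16, 0.68, 0.32]
```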
N.A.
1991The ART of Adaptive Pattern Recognition by a Self-Organizing Neural Network by Gail A. Carpenter and Stephen Grossberg. Computer, 77–88, 1988.
@article{carpenter1988,
author = {Carpenter, Gail A. and Grossberg, Stephen},
doi = {10.1109/2.33},
issn = {00189162},
journal = {Computer},
note = {Seminal paper on the stability-plasticity dilemma.},
number = {3},
pages = {77--88},
title = {The ART of Adaptive Pattern Recognition by a Self-Organizing Neural Network},
url = {https://ieeexplore.ieee.org/document/33},
volume = {21},
year = {1988}
}
The adaptive resonance theory (ART) suggests a solution to the stability-plasticity dilemma facing designers of learning systems, namely how to design a learning system that will remain plastic, or adaptive, in response to significant events and yet remain stable in response to irrelevant events. ART architectures are discussed that are neural networks that self-organize stable recognition codes in real time in response to arbitrary sequences of input patterns. Within such an ART architecture, the process of adaptive pattern recognition is a special case of the more general cognitive process of hypothesis discovery, testing, search, classification, and learning. This property opens up the possibility of applying ART systems to more general problems of adaptively processing large abstract information sources and databases. The main computational properties of these ART architectures are outlined and contrasted with those of alternative learning and recognition systems.
N.A.
1988How Does a Brain Build a Cognitive Code? by Stephen Grossberg. Psychological Review, 1–51, 1980.
@article{grossberg1980,
address = {US},
author = {Grossberg, Stephen},
doi = {10.1037/0033-295X.87.1.1},
issn = {1939-1471(Electronic),0033-295X(Print)},
journal = {Psychological Review},
keywords = {Attention,Cognitive Processes,Electrical Activity,Expectations,Human Information Storage,Neurophysiology},
note = {It introduces the stability-plasticity dilemma related to the catastrophic forgetting.},
number = {1},
pages = {1--51},
publisher = {American Psychological Association},
title = {How Does a Brain Build a Cognitive Code?},
url = {https://psycnet.apa.org/record/1980-06768-001},
volume = {87},
year = {1980}
}
Discusses how competition between afferent data and learned feedback expectancies can stabilize a developing code by buffering committed populations of detectors against continual erosion by new environmental demands. The gating phenomena that result lead to dynamically maintained critical periods and to attentional phenomena such as overshadowing in the adult. The functional unit of cognitive coding is suggested to be an adaptive resonance, or amplification and prolongation of neural activity, that occurs when afferent data and efferent expectancies reach consensus through a matching process. The resonant state embodies the perceptual event, and its amplified and sustained activities are capable of driving slow changes of long-term memory. These mechanisms help to explain and predict (a) positive and negative aftereffects, the McCollough effect, spatial frequency adaptation, monocular rivalry, binocular rivalry and hysteresis, pattern completion, and Gestalt switching; (b) analgesia, partial reinforcement acquisition effect, conditioned reinforcers, underaroused vs overaroused depression; (c) the contingent negative variation, P300, and pontogeniculo-occipital waves; and (d) olfactory coding, corticogeniculate feedback, matching of proprioceptive and terminal motor maps, and cerebral dominance. (125 ref) (PsycINFO Database Record (c) 2016 APA, all rights reserved)
N.A.
1980
Continual Few Shot Learning¶
7 papers
Here we list the papers related to Few-Shot continual and incremental learning.
Defining Benchmarks for Continual Few-Shot Learning by Antreas Antoniou, Massimiliano Patacchiola, Mateusz Ochal and Amos Storkey. arXiv, 2020. [imagenet]
@article{antoniou2020,
annotation = {_eprint: 2004.11967},
author = {Antoniou, Antreas and Patacchiola, Massimiliano and Ochal, Mateusz and Storkey, Amos},
journal = {arXiv},
keywords = {[imagenet]},
title = {Defining Benchmarks for Continual Few-Shot Learning},
url = {http://arxiv.org/abs/2004.11967},
year = {2020}
}
Both few-shot and continual learning have seen substantial progress in the last years due to the introduction of proper benchmarks. That being said, the field has still to frame a suite of benchmarks for the highly desirable setting of continual few-shot learning, where the learner is presented with a number of few-shot tasks, one after the other, and then asked to perform well on a validation set stemming from all previously seen tasks. Continual few-shot learning has a small computational footprint and is thus an excellent setting for efficient investigation and experimentation. In this paper we first define a theoretical framework for continual few-shot learning, taking into account recent literature, then we propose a range of flexible benchmarks that unify the evaluation criteria and allow exploring the problem from multiple perspectives. As part of the benchmark, we introduce a compact variant of ImageNet, called SlimageNet64, which retains all original 1000 classes but only contains 200 instances of each one (a total of 200K data-points) downscaled to 64 x 64 pixels. We provide baselines for the proposed benchmarks using a number of popular few-shot learning algorithms, as a result, exposing previously unknown strengths and weaknesses of those algorithms in continual and data-limited settings.
N.A.
2020Tell Me What This Is: Few-Shot Incremental Object Learning by a Robot by Ali Ayub and Alan R. Wagner. arXiv, 2020.
@article{ayub2020b,
annotation = {_eprint: 2008.00819},
author = {Ayub, Ali and Wagner, Alan R.},
journal = {arXiv},
keywords = {catastrophic forgetting,continual learning,few-shot incremenatl learning,robotics},
title = {Tell Me What This Is: Few-Shot Incremental Object Learning by a Robot},
url = {http://arxiv.org/abs/2008.00819},
year = {2020}
}
For many applications, robots will need to be incrementally trained to recognize the specific objects needed for an application. This paper presents a practical system for incrementally training a robot to recognize different object categories using only a small set of visual examples provided by a human. The paper uses a recently developed state-of-the-art method for few-shot incremental learning of objects. After learning the object classes incrementally, the robot performs a table cleaning task organizing objects into categories specified by the human. We also demonstrate the system's ability to learn arrangements of objects and predict missing or incorrectly placed objects. Experimental evaluations demonstrate that our approach achieves nearly the same performance as a system trained with all examples at one time (batch training), which constitutes a theoretical upper bound.
N.A.
2020La-MAML: Look-Ahead Meta Learning for Continual Learning by Gunshi Gupta, Karmesh Yadav and Liam Paull. arXiv, 2020.
@article{gupta2020a,
annotation = {_eprint: 2007.13904},
author = {Gupta, Gunshi and Yadav, Karmesh and Paull, Liam},
journal = {arXiv},
title = {La-MAML: Look-Ahead Meta Learning for Continual Learning},
url = {https://arxiv.org/abs/2007.13904},
year = {2020}
}
The continual learning problem involves training models with limited capacity to perform well on a set of an unknown number of sequentially arriving tasks. While meta-learning shows great potential for reducing interference between old and new tasks, the current training procedures tend to be either slow or offline, and sensitive to many hyper-parameters. In this work, we propose Look-ahead MAML (La-MAML), a fast optimisation-based meta-learning algorithm for online-continual learning, aided by a small episodic memory. Our proposed modulation of per-parameter learning rates in our meta-learning update allows us to draw connections to prior work on hypergradients and meta-descent. This provides a more flexible and efficient way to mitigate catastrophic forgetting compared to conventional prior-based methods. La-MAML achieves performance superior to other replay-based, prior-based and meta-learning based approaches for continual learning on real-world visual classification benchmarks.
N.A.
2020iTAML: An Incremental Task-Agnostic Meta-Learning Approach by Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan and Mubarak Shah. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13588–13597, 2020. [cifar] [imagenet]
@inproceedings{rajasegaran2020,
author = {Rajasegaran, Jathushan and Khan, Salman and Hayat, Munawar and Khan, Fahad Shahbaz and Shah, Mubarak},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
keywords = {[cifar],[imagenet]},
pages = {13588--13597},
title = {iTAML: An Incremental Task-Agnostic Meta-Learning Approach},
url = {https://openaccess.thecvf.com/content_CVPR_2020/html/Rajasegaran_iTAML_An_Incremental_Task-Agnostic_Meta-learning_Approach_CVPR_2020_paper.html},
year = {2020}
}
Humans can continuously learn new knowledge as their experience grows. In contrast, previous learning in deep neural networks can quickly fade out when they are trained on a new task. In this paper, we hypothesize this problem can be avoided by learning a set of generalized parameters, that are neither specific to old nor new tasks. In this pursuit, we introduce a novel meta-learning approach that seeks to maintain an equilibrium between all the encountered tasks. This is ensured by a new meta-update rule which avoids catastrophic forgetting. In comparison to previous meta-learning techniques, our approach is task-agnostic. When presented with a continuum of data, our model automatically identifies the task and quickly adapts to it with just a single update. We perform extensive experiments on five datasets in a class-incremental setting, leading to significant improvements over the state of the art methods (e.g., a 21.3% boost on CIFAR100 with 10 incremental tasks). Specifically, on large-scale datasets that generally prove difficult cases for incremental learning, our approach delivers absolute gains as high as 19.1% and 7.4% on ImageNet and MS-Celeb datasets, respectively.
N.A.
2020Wandering within a World: Online Contextualized Few-Shot Learning by Mengye Ren, Michael L Iuzzolino, Michael C Mozer and Richard S Zemel. arXiv, 2020. [omniglot]
@article{ren2020,
annotation = {_eprint: 2007.04546},
author = {Ren, Mengye and Iuzzolino, Michael L and Mozer, Michael C and Zemel, Richard S},
journal = {arXiv},
keywords = {[omniglot]},
title = {Wandering within a World: Online Contextualized Few-Shot Learning},
url = {https://arxiv.org/abs/2007.04546},
year = {2020}
}
We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online, continual setting. In this setting, episodes do not have separate training and testing phases, and instead models are evaluated online while learning novel classes. As in real world, where the presence of spatiotemporal context helps us retrieve learned skills in the past, our online few-shot learning setting also features an underlying context that changes throughout time. Object classes are correlated within a context and inferring the correct context can lead to better performance. Building upon this setting, we propose a new few-shot learning dataset based on large scale indoor imagery that mimics the visual experience of an agent wandering within a world. Furthermore, we convert popular few-shot learning approaches into online versions and we also propose a new model named contextual prototypical memory that can make use of spatiotemporal contextual information from the recent past.
N.A.
2020Few-Shot Class-Incremental Learning by X. Tao, Hong X., X. Chang, S. Dong, X. Wei and Y. Gong. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. [cifar]
@inproceedings{tao2020,
author = {Tao, X. and X., Hong and Chang, X. and Dong, S. and Wei, X. and Gong, Y.},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
keywords = {[cifar]},
title = {Few-Shot Class-Incremental Learning},
url = {https://arxiv.org/abs/2004.10956},
year = {2020}
}
The ability to incrementally learn new classes is crucial to the development of real-world artificial intelligence systems. In this paper, we focus on a challenging but practical few-shot class-incremental learning (FSCIL) problem. FSCIL requires CNN models to incrementally learn new classes from very few labelled samples, without forgetting the previously learned ones. To address this problem, we represent the knowledge using a neural gas (NG) network, which can learn and preserve the topology of the feature manifold formed by different classes. On this basis, we propose the TOpology-Preserving knowledge InCrementer (TOPIC) framework. TOPIC mitigates the forgetting of the old classes by stabilizing NG's topology and improves the representation learning for few-shot new classes by growing and adapting NG to new training samples. Comprehensive experimental results demonstrate that our proposed method significantly outperforms other state-of-the-art class-incremental learning methods on CIFAR100, miniImageNet, and CUB200 datasets.
N.A.
2020Few-Shot Class-Incremental Learning via Feature Space Composition by H. Zhao, Y. Fu, X. Li, S. Li, B. Omar and X. Li. arXiv, 2020. [cifar] [cubs]
@article{zhao2020,
author = {Zhao, H. and Fu, Y. and Li, X. and Li, S. and Omar, B. and Li, X.},
journal = {arXiv},
keywords = {[cifar],[cubs]},
title = {Few-Shot Class-Incremental Learning via Feature Space Composition},
url = {https://arxiv.org/abs/2006.15524},
year = {2020}
}
As a challenging problem in machine learning, few-shot class-incremental learning asynchronously learns a sequence of tasks, acquiring the new knowledge from new tasks (with limited new samples) while keeping the learned knowledge from previous tasks (with old samples discarded). In general, existing approaches resort to one unified feature space for balancing old-knowledge preserving and new-knowledge adaptation. With a limited embedding capacity of feature representation, the unified feature space often makes the learner suffer from semantic drift or overfitting as the number of tasks increases. With this motivation, we propose a novel few-shot class-incremental learning pipeline based on a composite representation space, which makes old-knowledge preserving and new-knowledge adaptation mutually compatible by feature space composition (enlarging the embedding capacity). The composite representation space is generated by integrating two space components (i.e. stable base knowledge space and dynamic lifelong-learning knowledge space) in terms of distance metric construction. With the composite feature space, our method performs remarkably well on the CUB200 and CIFAR100 datasets, outperforming the state-of-the-art algorithms by 10.58% and 14.65% respectively.
N.A.
2020
Continual Meta Learning¶
4 papers
In this section we list all the papers related to the continual meta-learning.
Online Fast Adaptation and Knowledge Accumulation: A New Approach to Continual Learning by Massimo Caccia, Pau Rodriguez, Oleksiy Ostapenko, Fabrice Normandin, Min Lin, Lucas Caccia, Issam Laradji, Irina Rish, Alexande Lacoste, David Vazquez and Laurent Charlin. arXiv, 2020. [fashion] [framework] [mnist]
@article{caccia2020,
annotation = {_eprint: 2003.05856},
author = {Caccia, Massimo and Rodriguez, Pau and Ostapenko, Oleksiy and Normandin, Fabrice and Lin, Min and Caccia, Lucas and Laradji, Issam and Rish, Irina and Lacoste, Alexande and Vazquez, David and Charlin, Laurent},
journal = {arXiv},
keywords = {[fashion],[framework],[mnist],Computer Science - Artificial Intelligence,Computer Science - Machine Learning,continual meta learning,framework,MAML,meta continual learning,OSAKA},
note = {arXiv: 2003.05856},
title = {Online Fast Adaptation and Knowledge Accumulation: A New Approach to Continual Learning},
url = {http://arxiv.org/abs/2003.05856},
year = {2020}
}
Learning from non-stationary data remains a great challenge for machine learning. Continual learning addresses this problem in scenarios where the learning agent faces a stream of changing tasks. In these scenarios, the agent is expected to retain its highest performance on previous tasks without revisiting them while adapting well to the new tasks. Two new recent continual-learning scenarios have been proposed. In meta-continual learning, the model is pre-trained to minimize catastrophic forgetting when trained on a sequence of tasks. In continual-meta learning, the goal is faster remembering, i.e., focusing on how quickly the agent recovers performance rather than measuring the agent's performance without any adaptation. Both scenarios have the potential to propel the field forward. Yet in their original formulations, they each have limitations. As a remedy, we propose a more general scenario where an agent must quickly solve (new) out-of-distribution tasks, while also requiring fast remembering. We show that current continual learning, meta learning, meta-continual learning, and continual-meta learning techniques fail in this new scenario. Accordingly, we propose a strong baseline: Continual-MAML, an online extension of the popular MAML algorithm. In our empirical experiments, we show that our method is better suited to the new scenario than the methodologies mentioned above, as well as standard continual learning and meta learning approaches.
N.A.
2020Continuous Meta-Learning without Tasks by James Harrison, Apoorva Sharma, Chelsea Finn and Marco Pavone. arXiv, 2019. [imagenet] [mnist]
@article{harrison2019,
annotation = {_eprint: 1912.08866},
author = {Harrison, James and Sharma, Apoorva and Finn, Chelsea and Pavone, Marco},
journal = {arXiv},
keywords = {[imagenet],[mnist]},
title = {Continuous Meta-Learning without Tasks},
url = {https://arxiv.org/abs/1912.08866},
year = {2019}
}
Meta-learning is a promising strategy for learning to efficiently learn within new tasks, using data gathered from a distribution of tasks. However, the meta-learning literature thus far has focused on the task segmented setting, where at train-time, offline data is assumed to be split according to the underlying task, and at test-time, the algorithms are optimized to learn in a single task. In this work, we enable the application of generic meta-learning algorithms to settings where this task segmentation is unavailable, such as continual online learning with a time-varying task. We present meta-learning via online changepoint analysis (MOCA), an approach which augments a meta-learning algorithm with a differentiable Bayesian changepoint detection scheme. The framework allows both training and testing directly on time series data without segmenting it into discrete tasks. We demonstrate the utility of this approach on a nonlinear meta-regression benchmark as well as two meta-image-classification benchmarks.
N.A.
2019Task Agnostic Continual Learning via Meta Learning by Xu He, Jakub Sygnowski, Alexandre Galashov, Andrei A Rusu, Yee Whye Teh and Razvan Pascanu. arXiv:1906.05201 [cs, stat], 2019. [mnist]
@book{he2019,
archiveprefix = {arXiv},
author = {He, Xu and Sygnowski, Jakub and Galashov, Alexandre and Rusu, Andrei A and Teh, Yee Whye and Pascanu, Razvan},
eprint = {1906.05201},
eprinttype = {arxiv},
journal = {arXiv:1906.05201 [cs, stat]},
keywords = {[mnist],Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computi,Statistics - Machine Learning},
note = {arXiv: 1906.05201},
primaryclass = {cs, stat},
title = {Task Agnostic Continual Learning via Meta Learning},
url = {http://arxiv.org/abs/1906.05201},
year = {2019}
}
While neural networks are powerful function approximators, they suffer from catastrophic forgetting when the data distribution is not stationary. One particular formalism that studies learning under non-stationary distribution is provided by continual learning, where the non-stationarity is imposed by a sequence of distinct tasks. Most methods in this space assume, however, the knowledge of task boundaries, and focus on alleviating catastrophic forgetting. In this work, we depart from this view and move the focus towards faster remembering – i.e. measuring how quickly the network recovers performance rather than measuring the network's performance without any adaptation. We argue that in many settings this can be more effective and that it opens the door to combining meta-learning and continual learning techniques, leveraging their complementary advantages. We propose a framework specific for the scenario where no information about task boundaries or task identity is given. It relies on a separation of concerns into what task is being solved and how the task should be solved. This framework is implemented by differentiating task specific parameters from task agnostic parameters, where the latter are optimized in a continual meta learning fashion, without access to multiple tasks at the same time. We showcase this framework in a supervised learning scenario and discuss the implication of the proposed formalism.
N.A.
2019Reconciling Meta-Learning and Continual Learning with Online Mixtures of Tasks by Ghassen Jerfel, Erin Grant, Tom Griffiths and Katherine A Heller. Advances in Neural Information Processing Systems, 9122–9133, 2019. [bayes] [vision]
@inproceedings{jerfel2019,
author = {Jerfel, Ghassen and Grant, Erin and Griffiths, Tom and Heller, Katherine A},
booktitle = {Advances in Neural Information Processing Systems},
keywords = {[bayes],[vision]},
pages = {9122--9133},
title = {Reconciling Meta-Learning and Continual Learning with Online Mixtures of Tasks},
url = {http://papers.nips.cc/paper/9112-reconciling-meta-learning-and-continual-learning-with-online-mixtures-of-tasks},
year = {2019}
}
Learning-to-learn or meta-learning leverages data-driven inductive bias to increase the efficiency of learning on a novel task. This approach encounters difficulty when transfer is not advantageous, for instance, when tasks are considerably dissimilar or change over time. We use the connection between gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet process mixture of hierarchical Bayesian models over the parameters of an arbitrary parametric model such as a neural network. In contrast to consolidating inductive biases into a single set of hyperparameters, our approach of task-dependent hyperparameter selection better handles latent distribution shift, as demonstrated on a set of evolving, image-based, few-shot learning benchmarks.
N.A.
2019
Continual Reinforcement Learning¶
19 papers
In this section we list all the papers related to the continual Reinforcement Learning.
Reducing Catastrophic Forgetting When Evolving Neural Networks by Joseph Early. arXiv, 2019.
@article{early2019,
annotation = {_eprint: 1904.03178},
author = {Early, Joseph},
journal = {arXiv},
title = {Reducing Catastrophic Forgetting When Evolving Neural Networks},
url = {http://arxiv.org/abs/1904.03178},
year = {2019}
}
A key stepping stone in the development of an artificial general intelligence (a machine that can perform any task) is the production of agents that can perform multiple tasks at once instead of just one. Unfortunately, canonical methods are very prone to catastrophic forgetting (CF) - the act of overwriting previous knowledge about a task when learning a new task. Recent efforts have developed techniques for overcoming CF in learning systems, but no attempt has been made to apply these new techniques to evolutionary systems. This research presents a novel technique, weight protection, for reducing CF in evolutionary systems by adapting a method from learning systems. It is used in conjunction with other evolutionary approaches for overcoming CF and is shown to be effective at alleviating CF when applied to a suite of reinforcement learning tasks. It is speculated that this work could indicate the potential for a wider application of existing learning-based approaches to evolutionary systems and that evolutionary techniques may be competitive with or better than learning systems when it comes to reducing CF.
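One reading of the weight-protection idea (a sketch of the concept, not the paper's exact formulation) is a quadratic penalty folded into the fitness function of an evolutionary search, so genomes are penalised for moving weights that mattered for earlier tasks; the importance estimates and the evolutionary loop are left abstract.

```python
import numpy as np

def protected_fitness(task_fitness, genome, anchors, importances, lam=1.0):
    """Fitness of a genome (flat weight vector) on the current task, minus a penalty for
    moving weights that mattered for earlier tasks. `anchors` and `importances` hold, for
    each earlier task, the saved weight vector and a per-weight importance estimate."""
    penalty = sum(np.sum(imp * (genome - anchor) ** 2)
                  for anchor, imp in zip(anchors, importances))
    return task_fitness(genome) - lam * penalty
```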
N.A.
2019A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning by Francisco M Garcia and Philip S Thomas. NeurIPS, 5691–5700, 2019.
@inproceedings{garcia2019,
author = {Garcia, Francisco M and Thomas, Philip S},
booktitle = {NeurIPS},
pages = {5691--5700},
title = {A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning},
url = {https://papers.nips.cc/paper/8806-a-meta-mdp-approach-to-exploration-for-lifelong-reinforcement-learning.pdf},
year = {2019}
}
In this paper we consider the problem of how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (a sequence of Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems. We argue that previous experience with similar problems can provide an agent with information about how it should explore when facing a new but related problem. We show that the search for an optimal exploration strategy can be formulated as a reinforcement learning problem itself and demonstrate that such a strategy can leverage patterns found in the structure of related problems. We conclude with experiments that show the benefits of optimizing an exploration strategy using our proposed framework.
N.A.
2019Policy Consolidation for Continual Reinforcement Learning by Christos Kaplanis, Murray Shanahan and Claudia Clopath. ICML, 2019.
@article{kaplanis2019,
annotation = {_eprint: 1902.00255},
author = {Kaplanis, Christos and Shanahan, Murray and Clopath, Claudia},
journal = {ICML},
title = {Policy Consolidation for Continual Reinforcement Learning},
url = {http://arxiv.org/abs/1902.00255},
year = {2019}
}
We propose a method for tackling catastrophic forgetting in deep reinforcement learning that is agnostic to the timescale of changes in the distribution of experiences, does not require knowledge of task boundaries, and can adapt in continuously changing environments. In our policy consolidation model, the policy network interacts with a cascade of hidden networks that simultaneously remember the agent's policy at a range of timescales and regularise the current policy by its own history, thereby improving its ability to learn without forgetting. We find that the model improves continual learning relative to baselines on a number of continuous control tasks in single-task, alternating two-task, and multi-agent competitive self-play settings.
N.A.
2019Continual Learning Exploiting Structure of Fractal Reservoir Computing by Taisuke Kobayashi and Toshiki Sugino. Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions, 35–47, 2019. [rnn]
@inproceedings{kobayashi2019,
address = {Cham},
author = {Kobayashi, Taisuke and Sugino, Toshiki},
booktitle = {Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions},
doi = {10.1007/978-3-030-30493-5_4},
editor = {Tetko, Igor V and Kůrková, Věra and Karpov, Pavel and Theis, Fabian},
isbn = {978-3-030-30492-8 978-3-030-30493-5},
keywords = {[rnn],fractals,rc,reinforcement,reservoir computing},
language = {en},
note = {A reservoir computing approach with Echo State Networks is implemented in order to learn multiple tasks in reinforcement learning environments.},
pages = {35--47},
publisher = {Springer International Publishing},
title = {Continual Learning Exploiting Structure of Fractal Reservoir Computing},
url = {http://link.springer.com/10.1007/978-3-030-30493-5_4},
volume = {11731},
year = {2019}
}
Neural networks have a critical problem, called catastrophic forgetting, where memories for tasks already learned are easily overwritten with memories for a task additionally learned. This problem interferes with continual learning required for autonomous robots, which learn many tasks incrementally from daily activities. To mitigate catastrophic forgetting, it is especially important for reservoir computing to clarify which neurons should be fired corresponding to each task, since only readout weights are updated according to the degree of firing of neurons. We therefore propose a way to design reservoir computing such that the firing neurons are clearly distinguished from others according to the task to be performed. As a key design feature, we employ a fractal network, which has modularity and scalability, as the reservoir layer. In particular, its modularity is fully utilized by designing the input layer. As a result, simulations of control tasks using reinforcement learning show that our design mitigates catastrophic forgetting even when random actions from reinforcement learning prompt parameters to be overwritten. Furthermore, learning multiple tasks with a single network suggests that knowledge of the other tasks can facilitate learning a new task, unlike the case using completely different networks.
N.A.
2019Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL by Anusha Nagabandi, Chelsea Finn and Sergey Levine. 7th International Conference on Learning Representations, ICLR 2019, 2019.
@article{nagabandi2019,
annotation = {_eprint: 1812.07671},
author = {Nagabandi, Anusha and Finn, Chelsea and Levine, Sergey},
journal = {7th International Conference on Learning Representations, ICLR 2019},
title = {Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL},
url = {https://arxiv.org/abs/1812.07671},
year = {2019}
}
Humans and animals can learn complex predictive models that allow them to accurately and reliably reason about real-world phenomena, and they can adapt such models extremely quickly in the face of unexpected changes. Deep neural network models allow us to represent very complex functions, but lack this capacity for rapid online adaptation. The goal in this paper is to develop a method for continual online learning from an incoming stream of data, using deep neural network models. We formulate an online learning procedure that uses stochastic gradient descent to update model parameters, and an expectation maximization algorithm with a Chinese restaurant process prior to develop and maintain a mixture of models to handle non-stationary task distributions. This allows for all models to be adapted as necessary, with new models instantiated for task changes and old models recalled when previously seen tasks are encountered again. Furthermore, we observe that meta-learning can be used to meta-train a model such that this direct online adaptation with SGD is effective, which is otherwise not the case for large function approximators. In this work, we apply our meta-learning for online learning (MOLe) approach to model-based reinforcement learning, where adapting the predictive model is critical for control; we demonstrate that MOLe outperforms alternative prior methods, and enables effective continuous adaptation in non-stationary task distributions such as varying terrains, motor failures, and unexpected disturbances. Videos available at: https://sites.google.com/Berkeley.edu/onlineviameta.
N.A.
2019Leaky Tiling Activations: A Simple Approach to Learning Sparse Representations Online by Yangchen Pan, Kirby Banman and Martha White. arXiv, 2019. [sparsity]
@article{pan2019,
annotation = {_eprint: 1911.08068},
author = {Pan, Yangchen and Banman, Kirby and White, Martha},
journal = {arXiv},
keywords = {[sparsity]},
title = {Leaky Tiling Activations: A Simple Approach to Learning Sparse Representations Online},
url = {http://arxiv.org/abs/1911.08068},
year = {2019}
}
Interference is a known problem when learning in online settings, such as continual learning or reinforcement learning. Interference occurs when updates made to improve performance on some inputs degrade performance on others. Recent work has shown that sparse representations, where only a small percentage of units are active, can significantly reduce interference. Those works, however, relied on relatively complex regularization or meta-learning approaches that have only been used offline in a pre-training phase. In our approach, we design an activation function that naturally produces sparse representations, and so is much more amenable to online training. The idea relies on the simple approach of binning, but overcomes the two key limitations of binning: zero gradients for the flat regions almost everywhere, and lost precision (reduced discrimination) due to coarse aggregation. We introduce a Leaky Tiling Activation (LTA) that provides non-negligible gradients and produces overlap between bins that improves discrimination. We empirically investigate both value-based and policy gradient reinforcement learning algorithms that use neural networks with LTAs, in classic discrete-action control environments and Mujoco continuous-action environments. We show that, with LTAs, learning is faster, with more stable policies, without needing target networks.
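To make the binning idea concrete, here is a minimal Python sketch of a leaky tiling activation; it is an illustration only, not the authors' exact formulation, and the bin range, bin count, and leak parameter eta are arbitrary assumptions.

import numpy as np

def leaky_tiling_activation(z, low=-1.0, high=1.0, n_bins=20, eta=0.1):
    """Map each scalar input to a sparse vector of bin activations.

    A unit is fully active inside its bin and decays linearly ("leaks") for
    inputs up to eta * bin-width outside it, which keeps gradients non-zero
    and creates overlap between neighbouring bins.
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    width = (high - low) / n_bins
    left = np.linspace(low, high, n_bins, endpoint=False)    # left edge of each bin
    # distance of each input from each bin interval (zero inside the bin)
    d = np.maximum(left[None, :] - z[:, None], z[:, None] - (left[None, :] + width))
    d = np.maximum(d, 0.0)
    # 1 inside the bin, linear decay within the leak region, 0 beyond it
    return np.clip(1.0 - d / (eta * width), 0.0, 1.0)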
N.A.
2019Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference by Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu and Gerald Tesauro. ICLR, 2019. [mnist]
@inproceedings{riemer2019,
author = {Riemer, Matthew and Cases, Ignacio and Ajemian, Robert and Liu, Miao and Rish, Irina and Tu, Yuhai and Tesauro, Gerald},
booktitle = {ICLR},
keywords = {[mnist]},
title = {Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference},
url = {https://openreview.net/pdf?id=B1gTShAct7},
year = {2019}
}
Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.
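A heavily reduced Python sketch of the Reptile-style update at the core of this idea is given below; the loss, optimizer, and batch schedule (how replayed and current examples are interleaved) are assumptions, and the full algorithm applies the interpolation at both the batch and the example level.

import torch
import torch.nn.functional as F

def mer_reptile_step(model, batches, inner_lr=0.01, meta_lr=0.3):
    """Take several SGD steps on interleaved replay/current batches, then move
    the starting weights part of the way towards the adapted weights. This
    Reptile-style interpolation implicitly encourages gradient alignment
    (transfer) and discourages interference across examples."""
    start = [p.detach().clone() for p in model.parameters()]
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for x, y in batches:                      # mix of replayed and new examples
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    with torch.no_grad():
        for p, p0 in zip(model.parameters(), start):
            p.copy_(p0 + meta_lr * (p - p0))  # within-meta-batch Reptile update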
N.A.
2019Experience Replay for Continual Learning by David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy P Lillicrap and Greg Wayne. NeurIPS, 350–360, 2019.
@inproceedings{rolnick2019,
author = {Rolnick, David and Ahuja, Arun and Schwarz, Jonathan and Lillicrap, Timothy P and Wayne, Greg},
booktitle = {NeurIPS},
pages = {350--360},
title = {Experience Replay for Continual Learning},
url = {http://papers.nips.cc/paper/8327-experience-replay-for-continual-learning.pdf},
year = {2019}
}
Interacting with a complex world involves continual learning, in which tasks and data distributions change over time. A continual learning system should demonstrate both plasticity (acquisition of new knowledge) and stability (preservation of old knowledge). Catastrophic forgetting is the failure of stability, in which new experience overwrites previous experience. In the brain, replay of past experience is widely believed to reduce forgetting, yet it has been largely overlooked as a solution to forgetting in deep reinforcement learning. Here, we introduce CLEAR, a replay-based method that greatly reduces catastrophic forgetting in multi-task reinforcement learning. CLEAR leverages off-policy learning and behavioral cloning from replay to enhance stability, as well as on-policy learning to preserve plasticity. We show that CLEAR performs better than state-of-the-art deep learning techniques for mitigating forgetting, despite being significantly less complicated and not requiring any knowledge of the individual tasks being learned.
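The two replay-based cloning terms that CLEAR adds on top of an off-policy actor-critic objective can be sketched as follows; the off-policy (V-trace) terms are omitted, and the tensor shapes and loss weighting are assumptions rather than the paper's exact setup.

import torch
import torch.nn.functional as F

def clear_cloning_losses(policy_logits, values, replay_logits, replay_values):
    """Keep the current policy close to the behaviour stored in the replay
    buffer (KL term) and the current value estimates close to the stored ones
    (L2 term); both terms are computed on replayed trajectories only."""
    policy_cloning = F.kl_div(F.log_softmax(policy_logits, dim=-1),
                              F.softmax(replay_logits, dim=-1),
                              reduction='batchmean')
    value_cloning = F.mse_loss(values, replay_values)
    return policy_cloning, value_cloning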
N.A.
2019Selective Experience Replay for Lifelong Learning by David Isele and Akansel Cosgun. Thirty-Second AAAI Conference on Artificial Intelligence, 3302–3309, 2018.
@article{isele2018,
annotation = {_eprint: 1802.10269},
author = {Isele, David and Cosgun, Akansel},
journal = {Thirty-Second AAAI Conference on Artificial Intelligence},
keywords = {Natural Language Processing and Machine Learning T},
pages = {3302--3309},
title = {Selective Experience Replay for Lifelong Learning},
url = {http://arxiv.org/abs/1802.10269},
year = {2018}
}
Deep reinforcement learning has emerged as a powerful tool for a variety of learning tasks; however, deep nets typically exhibit forgetting when learning multiple tasks in sequence. To mitigate forgetting, we propose an experience replay process that augments the standard FIFO buffer and selectively stores experiences in a long-term memory. We explore four strategies for selecting which experiences will be stored: favoring surprise, favoring reward, matching the global training distribution, and maximizing coverage of the state space. We show that distribution matching successfully prevents catastrophic forgetting, and is consistently the best approach on all domains tested. While distribution matching has better and more consistent performance, we identify one case in which coverage maximization is beneficial - when tasks that receive less training are more important. Overall, our results show that selective experience replay, when suitable selection algorithms are employed, can prevent catastrophic forgetting.
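As one way to picture the distribution-matching strategy, reservoir sampling keeps the long-term memory an approximately uniform sample of everything seen so far, so its contents track the global training distribution; the sketch below illustrates that idea and is not the paper's exact selection mechanism.

import random

class ReservoirReplay:
    """Long-term replay memory filled by reservoir sampling: every experience
    ever seen has the same probability of being retained, independent of when
    it arrived."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.n_seen = 0

    def add(self, experience):
        self.n_seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(experience)
        else:
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.memory[j] = experience      # replace a random slot

    def sample(self, batch_size):
        return random.sample(self.memory, min(batch_size, len(self.memory)))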
N.A.
2018Continual Reinforcement Learning with Complex Synapses by Christos Kaplanis, Murray Shanahan and Claudia Clopath. ICML, 2018.
@inproceedings{kaplanis2018,
annotation = {_eprint: 1802.07239},
author = {Kaplanis, Christos and Shanahan, Murray and Clopath, Claudia},
booktitle = {ICML},
title = {Continual Reinforcement Learning with Complex Synapses},
url = {http://arxiv.org/abs/1802.07239},
year = {2018}
}
Unlike humans, who are capable of continual learning over their lifetimes, artificial neural networks have long been known to suffer from a phenomenon known as catastrophic forgetting, whereby new learning can lead to abrupt erasure of previously acquired knowledge. Whereas in a neural network the parameters are typically modelled as scalar values, an individual synapse in the brain comprises a complex network of interacting biochemical components that evolve at different timescales. In this paper, we show that by equipping tabular and deep reinforcement learning agents with a synaptic model that incorporates this biological complexity (Benna & Fusi, 2016), catastrophic forgetting can be mitigated at multiple timescales. In particular, we find that as well as enabling continual learning across sequential training of two simple tasks, it can also be used to overcome within-task forgetting by reducing the need for an experience replay database.
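A toy discrete-time version of the multi-timescale synapse model cited above (Benna & Fusi, 2016) can be sketched as a chain of coupled variables; the coupling and capacity constants below are simplified assumptions, and only the first variable is read out as the visible weight.

import numpy as np

def complex_synapse_step(u, dw=0.0, g1=0.5):
    """One update of a chain of hidden synaptic variables with geometrically
    increasing timescales. The learning signal dw enters the first variable;
    slow diffusion towards the deeper variables protects old memories."""
    m = len(u)
    g = g1 / (2.0 ** np.arange(m))     # couplings between neighbouring variables
    C = 2.0 ** np.arange(m)            # "capacities" setting each timescale
    new_u = u.copy()
    new_u[0] += dw / C[0]              # fast, plastic component of the weight
    for k in range(m):
        flow = 0.0
        if k > 0:
            flow += g[k - 1] * (u[k - 1] - u[k])
        if k < m - 1:
            flow += g[k] * (u[k + 1] - u[k])
        new_u[k] += flow / C[k]
    return new_u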
N.A.
2018Unicorn: Continual Learning with a Universal, Off-Policy Agent by Daniel J Mankowitz, Augustin Žídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver and Tom Schaul. arXiv, 1–17, 2018.
@article{mankowitz2018,
annotation = {_eprint: 1802.08294},
author = {Mankowitz, Daniel J and Žídek, Augustin and Barreto, André and Horgan, Dan and Hessel, Matteo and Quan, John and Oh, Junhyuk and van Hasselt, Hado and Silver, David and Schaul, Tom},
journal = {arXiv},
pages = {1--17},
title = {Unicorn: Continual Learning with a Universal, Off-Policy Agent},
url = {http://arxiv.org/abs/1802.08294},
year = {2018}
}
Some real-world domains are best characterized as a single task, but for others this perspective is limiting. Instead, some tasks continually grow in complexity, in tandem with the agent's competence. In continual learning, also referred to as lifelong learning, there are no explicit task boundaries or curricula. As learning agents have become more powerful, continual learning remains one of the frontiers that has resisted quick progress. To test continual learning capabilities we consider a challenging 3D domain with an implicit sequence of tasks and sparse rewards. We propose a novel agent architecture called Unicorn, which demonstrates strong continual learning and outperforms several baseline agents on the proposed domain. The agent achieves this by jointly representing and learning multiple policies efficiently, using a parallel off-policy learning setup.
N.A.
2018Lifelong Inverse Reinforcement Learning by Jorge A Mendez, Shashank Shivkumar and Eric Eaton. NeurIPS, 4502–4513, 2018.
@inproceedings{mendez2018,
author = {Mendez, Jorge A and Shivkumar, Shashank and Eaton, Eric},
booktitle = {NeurIPS},
pages = {4502--4513},
title = {Lifelong Inverse Reinforcement Learning},
url = {http://papers.nips.cc/paper/7702-lifelong-inverse-reinforcement-learning.pdf},
year = {2018}
}
Methods for learning from demonstration (LfD) have shown success in acquiring behavior policies by imitating a user. However, even for a single task, LfD may require numerous demonstrations. For versatile agents that must learn many tasks via demonstration, this process would substantially burden the user if each task were learned in isolation. To address this challenge, we introduce the novel problem of lifelong learning from demonstration, which allows the agent to continually build upon knowledge learned from previously demonstrated tasks to accelerate the learning of new tasks, reducing the amount of demonstrations required. As one solution to this problem, we propose the first lifelong learning approach to inverse reinforcement learning, which learns consecutive tasks via demonstration, continually transferring knowledge between tasks to improve performance.
N.A.
2018Progress & Compress: A Scalable Framework for Continual Learning by Jonathan Schwarz, Wojciech Czarnecki, Jelena Luketina, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu and Raia Hadsell. International Conference on Machine Learning, 4528–4537, 2018. [vision]
@inproceedings{schwarz2018,
author = {Schwarz, Jonathan and Czarnecki, Wojciech and Luketina, Jelena and Grabska-Barwinska, Agnieszka and Teh, Yee Whye and Pascanu, Razvan and Hadsell, Raia},
booktitle = {International Conference on Machine Learning},
keywords = {[vision],ewc,normalized ewc,online ewc},
language = {en},
pages = {4528--4537},
shorttitle = {Progress & Compress},
title = {Progress & Compress: A Scalable Framework for Continual Learning},
url = {http://proceedings.mlr.press/v80/schwarz18a.html},
year = {2018}
}
We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to ...
N.A.
2018Overcoming Catastrophic Forgetting in Neural Networks by James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran and Raia Hadsell. PNAS, 3521–3526, 2017. [mnist]
@article{kirkpatrick2017,
author = {Kirkpatrick, James and Pascanu, Razvan and Rabinowitz, Neil and Veness, Joel and Desjardins, Guillaume and Rusu, Andrei A and Milan, Kieran and Quan, John and Ramalho, Tiago and Grabska-Barwinska, Agnieszka and Hassabis, Demis and Clopath, Claudia and Kumaran, Dharshan and Hadsell, Raia},
journal = {PNAS},
keywords = {[mnist],annotated,Computer Science - Artificial Intelligence,Computer Science - Machine Learning,ewc,Statistics - Machine Learning},
note = {arXiv: 1612.00796},
number = {13},
pages = {3521--3526},
title = {Overcoming Catastrophic Forgetting in Neural Networks},
url = {http://arxiv.org/abs/1612.00796},
volume = {114},
year = {2017}
}
The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate our approach is scalable and effective by solving a set of classification tasks based on the MNIST hand written digit dataset and by learning several Atari 2600 games sequentially.
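The quadratic penalty at the heart of this approach (elastic weight consolidation) can be sketched in a few lines; how the Fisher estimates and the old parameter snapshot are stored (here, dictionaries keyed by parameter name) and the strength lam are assumptions.

import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """Penalise movement of each parameter away from its value after the
    previous task, weighted by an estimate of how important that parameter
    was for the old task (its Fisher information)."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]).pow(2)).sum()
    return 0.5 * lam * penalty

# training on a new task (sketch):
# loss = task_loss + ewc_penalty(model, fisher, old_params, lam=400.0)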
N.A.
2017Stable Predictive Representations with General Value Functions for Continual Learning by Matthew Schlegel, Adam White and Martha White. Continual Learning and Deep Networks Workshop at the Neural Information Processing System Conference, 2017.
@inproceedings{schlegel2017,
author = {Schlegel, Matthew and White, Adam and White, Martha},
booktitle = {Continual Learning and Deep Networks Workshop at the Neural Information Processing System Conference},
title = {Stable Predictive Representations with General Value Functions for Continual Learning},
url = {https://sites.ualberta.ca/ amw8/cldl.pdf},
year = {2017}
}
The objective of continual learning is to build agents that continually learn about their world, building on prior learning. In this paper, we explore an approach to continual learning based on making and updating many predictions formalized as general value functions (GVFs). The idea behind GVFs is simple: if we can cast the task of representing predictive knowledge as a prediction of future reward, then computationally efficient policy evaluation methods from reinforcement learning can be used to learn a large collection of predictions while the agent interacts with the world. We explore this idea further by analyzing how GVF predictions can be used as predictive features, and introduce two algorithmic techniques to ensure the stability of continual prediction learning. We illustrate these ideas with a small experiment in the cycle world domain.
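A single general value function with linear features can be learned with an ordinary TD(0) update in which a cumulant signal replaces the environment reward; the sketch below is a minimal illustration of that idea, with the step size chosen arbitrarily.

import numpy as np

def gvf_td_update(w, phi, phi_next, cumulant, gamma_next, alpha=0.1):
    """TD(0) update for a general value function: predict the discounted sum
    of an arbitrary cumulant signal rather than the environment reward."""
    delta = cumulant + gamma_next * np.dot(w, phi_next) - np.dot(w, phi)
    return w + alpha * delta * phi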
N.A.
2017Continual Learning through Evolvable Neural Turing Machines by Benno Luders, Mikkel Schlager and Sebastian Risi. NIPS 2016 Workshop on Continual Learning and Deep Networks, 2016.
@inproceedings{luders2016,
author = {Luders, Benno and Schlager, Mikkel and Risi, Sebastian},
booktitle = {NIPS 2016 Workshop on Continual Learning and Deep Networks},
title = {Continual Learning through Evolvable Neural Turing Machines},
url = {https://core.ac.uk/reader/84859350},
year = {2016}
}
Continual learning, i.e. the ability to sequentially learn tasks without catastrophic forgetting of previously learned ones, is an important open challenge in machine learning. In this paper we take a step in this direction by showing that the recently proposed Evolving Neural Turing Machine (ENTM) approach is able to perform one-shot learning in a reinforcement learning task without catastrophic forgetting of previously stored associations.
N.A.
2016Progressive Neural Networks by Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu and Raia Hadsell. arXiv, 2016. [mnist]
@article{rusu2016,
author = {Rusu, Andrei A and Rabinowitz, Neil C and Desjardins, Guillaume and Soyer, Hubert and Kirkpatrick, James and Kavukcuoglu, Koray and Pascanu, Razvan and Hadsell, Raia},
journal = {arXiv},
keywords = {[mnist],Computer Science - Machine Learning,lifelong learning,modular,progressive},
language = {en},
note = {The authors rely on a separate feedforward network (column) for each task the model is trained on. Each column is connected through adaptive connections to all the previous ones. The weights of previous columns are frozen once trained. At inference time, given a known task label, the network chooses the appropriate column to produce the output, thus preventing forgetting by design.},
title = {Progressive Neural Networks},
url = {http://arxiv.org/abs/1606.04671},
year = {2016}
}
Learning to solve complex sequences of tasks— while both leveraging transfer and avoiding catastrophic forgetting— remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architecture extensively on a wide variety of reinforcement learning tasks (Atari and 3D maze games), and show that it outperforms common baselines based on pretraining and finetuning. Using a novel sensitivity measure, we demonstrate that transfer occurs at both low-level sensory and high-level control layers of the learned policy.
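The column-and-lateral-connection structure can be illustrated with a toy two-column layer; the layer sizes and the single adapter below are assumptions, and the real architecture stacks several such layers per column.

import torch
import torch.nn as nn

class ProgressiveBlock(nn.Module):
    """One layer of a two-column progressive network: column 1 is frozen after
    task 1, and column 2 receives its activations through a lateral adapter."""
    def __init__(self, in_dim, hidden):
        super().__init__()
        self.col1 = nn.Linear(in_dim, hidden)      # trained on task 1, then frozen
        self.col2 = nn.Linear(in_dim, hidden)      # trained on task 2
        self.lateral = nn.Linear(hidden, hidden)   # adapter from column 1 to column 2
        for p in self.col1.parameters():
            p.requires_grad = False                # freezing prevents forgetting by design

    def forward(self, x):
        h1 = torch.relu(self.col1(x))
        h2 = torch.relu(self.col2(x) + self.lateral(h1))
        return h1, h2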
N.A.
2016Lifelong-RL: Lifelong Relaxation Labeling for Separating Entities and Aspects in Opinion Targets. by Lei Shu, Bing Liu, Hu Xu and Annice Kim. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing, 225–235, 2016. [nlp]
@inproceedings{shu2016,
annotation = {_eprint: 15334406},
author = {Shu, Lei and Liu, Bing and Xu, Hu and Kim, Annice},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing},
doi = {10.1038/nrg3575.Systems},
isbn = {978-1-4939-7371-2},
issn = {1527-5418},
keywords = {[nlp]},
pages = {225--235},
pmid = {29756130},
title = {Lifelong-RL: Lifelong Relaxation Labeling for Separating Entities and Aspects in Opinion Targets.},
url = {http://www.ncbi.nlm.nih.gov/pubmed/29756130 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC5947972},
volume = {2016},
year = {2016}
}
It is well-known that opinions have targets. Extracting such targets is an important problem of opinion mining because without knowing the target of an opinion, the opinion is of limited use. So far many algorithms have been proposed to extract opinion targets. However, an opinion target can be an entity or an aspect (part or attribute) of an entity. An opinion about an entity is an opinion about the entity as a whole, while an opinion about an aspect is just an opinion about that specific attribute or aspect of an entity. Thus, opinion targets should be separated into entities and aspects before use because they represent very different things about opinions. This paper proposes a novel algorithm, called Lifelong-RL, to solve the problem based on lifelong machine learning and relaxation labeling. Extensive experiments show that the proposed algorithm Lifelong-RL outperforms baseline methods markedly.
N.A.
2016CHILD: A First Step Towards Continual Learning by Mark B Ring. Machine Learning, 77–104, 1997. [rnn]
@article{ring1997,
author = {Ring, Mark B},
doi = {10.1023/A:1007331723572},
issn = {1573-0565},
journal = {Machine Learning},
keywords = {cl,continual learner,Continual learning,definition,hierarchical neural networks,reinforcement learning,sequence learning,transfer},
language = {en},
number = {1},
pages = {77--104},
shorttitle = {CHILD},
title = {CHILD: A First Step Towards Continual Learning},
url = {https://doi.org/10.1023/A:1007331723572},
volume = {28},
year = {1997}
}
Continual learning is the constant development of increasingly complex behaviors; the process of building more complicated skills on top of those already developed. A continual-learning agent should therefore learn incrementally and hierarchically. This paper describes CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development. CHILD can quickly solve complicated non-Markovian reinforcement-learning tasks and can then transfer its skills to similar but even more complicated tasks, learning these faster still.
N.A.
1997
Continual Sequential Learning¶
20 papers
Here we maintain a list of all the papers related to continual learning at the intersection with sequential learning.
Continual Learning with Gated Incremental Memories for Sequential Data Processing by Andrea Cossu, Antonio Carta and Davide Bacciu. Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN 2020), 2020. [mnist] [rnn]
@inproceedings{cossu2020,
author = {Cossu, Andrea and Carta, Antonio and Bacciu, Davide},
booktitle = {Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN 2020)},
keywords = {[mnist],[rnn],Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computi,Statistics - Machine Learning},
note = {An evaluation of RNNs (LSTM and LMN) inspired by Progressive networks, leading to the Gated Incremental Memory approach to overcome catastrophic forgetting.},
title = {Continual Learning with Gated Incremental Memories for Sequential Data Processing},
url = {http://arxiv.org/abs/2004.04077},
year = {2020}
}
The ability to learn in dynamic, nonstationary environments without forgetting previous knowledge, also known as Continual Learning (CL), is a key enabler for scalable and trustworthy deployments of adaptive solutions. While the importance of continual learning is largely acknowledged in machine vision and reinforcement learning problems, this is mostly under-documented for sequence processing tasks. This work proposes a Recurrent Neural Network (RNN) model for CL that is able to deal with concept drift in input distribution without forgetting previously acquired knowledge. We also implement and test a popular CL approach, Elastic Weight Consolidation (EWC), on top of two different types of RNNs. Finally, we compare the performances of our enhanced architecture against EWC and RNNs on a set of standard CL benchmarks, adapted to the sequential data processing scenario. Results show the superior performance of our architecture and highlight the need for special solutions designed to address CL in RNNs.
N.A.
2020Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams by Matthias De Lange and Tinne Tuytelaars. arXiv, 2020. [cifar] [framework] [mnist] [vision]
@article{delange2020a,
annotation = {_eprint: 2009.00919},
author = {De Lange, Matthias and Tuytelaars, Tinne},
journal = {arXiv},
keywords = {[cifar],[framework],[mnist],[vision]},
title = {Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams},
url = {https://arxiv.org/abs/2009.00919},
year = {2020}
}
Attaining prototypical features to represent class distributions is well established in representation learning. However, learning prototypes online from streams of data proves a challenging endeavor as they rapidly become outdated, caused by an ever-changing parameter space in the learning process. Additionally, continual learning does not assume the data stream to be stationary, typically resulting in catastrophic forgetting of previous knowledge. As a first, we introduce a system addressing both problems, where prototypes evolve continually in a shared latent space, enabling learning and prediction at any point in time. In contrast to the major body of work in continual learning, data streams are processed in an online fashion, without additional task-information, and an efficient memory scheme provides robustness to imbalanced data streams. Besides nearest neighbor based prediction, learning is facilitated by a novel objective function, encouraging cluster density about the class prototype and increased inter-class variance. Furthermore, the latent space quality is elevated by pseudo-prototypes in each batch, constituted by replay of exemplars from memory. We generalize the existing paradigms in continual learning to incorporate data incremental learning from data streams by formalizing a two-agent learner-evaluator framework, and obtain state-of-the-art performance by a significant margin on eight benchmarks, including three highly imbalanced data streams.
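The prototype mechanics can be pictured with a small sketch: prototypes are updated online from the latest embeddings of their class, and prediction is nearest-prototype by cosine similarity. The momentum value and the normalisation below are illustrative assumptions, not the paper's exact update rule.

import numpy as np

def update_prototype(prototype, class_embeddings, momentum=0.9):
    """Move a class prototype towards the mean embedding of the newest batch
    of that class, keeping it on the unit sphere."""
    new_proto = momentum * prototype + (1.0 - momentum) * class_embeddings.mean(axis=0)
    return new_proto / (np.linalg.norm(new_proto) + 1e-12)

def predict(embedding, prototypes):
    """Nearest-prototype prediction by cosine similarity (prototypes are
    assumed to be unit-norm row vectors)."""
    sims = prototypes @ embedding / (np.linalg.norm(embedding) + 1e-12)
    return int(np.argmax(sims))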
N.A.
2020Organizing Recurrent Network Dynamics by Task-Computation to Enable Continual Learning by Lea Duncker, Laura N Driscoll, Krishna V Shenoy, Maneesh Sahani and David Sussillo. Advances in Neural Information Processing Systems, 2020. [rnn]
@inproceedings{duncker2020,
author = {Duncker, Lea and Driscoll, Laura N and Shenoy, Krishna V and Sahani, Maneesh and Sussillo, David},
booktitle = {Advances in Neural Information Processing Systems},
keywords = {[rnn]},
note = {The hidden state pre-activation and the input are projected during learning onto a subspace orthogonal to all those of the previous tasks, if the new task is dissimilar. Projecting onto an orthogonal subspace avoids interference with previous knowledge.},
title = {Organizing Recurrent Network Dynamics by Task-Computation to Enable Continual Learning},
url = {https://proceedings.neurips.cc/paper/2020/file/a576eafbce762079f7d1f77fca1c5cc2-Paper.pdf},
volume = {33},
year = {2020}
}
Biological systems face dynamic environments that require continual learning. It is not well understood how these systems balance the tension between flexibility for learning and robustness for memory of previous behaviors. Continual learning without catastrophic interference also remains a challenging problem in machine learning. Here, we develop a novel learning rule designed to minimize interference between sequentially learned tasks in recurrent networks. Our learning rule preserves network dynamics within activity-defined subspaces used for previously learned tasks. It encourages dynamics associated with new tasks that might otherwise interfere to instead explore orthogonal subspaces, and it allows for reuse of previously established dynamical motifs where possible. Employing a set of tasks used in neuroscience, we demonstrate that our approach successfully eliminates catastrophic interference and offers a substantial improvement over previous continual learning algorithms. Using dynamical systems analysis, we show that networks trained using our approach can reuse similar dynamical structures across similar tasks. This possibility for shared computation allows for faster learning during sequential training. Finally, we identify organizational differences that emerge when training tasks sequentially versus simultaneously.
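The projection idea described above can be sketched as removing from a weight update any component that reads from activity directions already used by earlier tasks; how those directions are collected, and the symmetric projection on the output side, are not shown here.

import numpy as np

def project_update(dW, in_basis):
    """Project a weight update so that it ignores the input-activity subspace
    spanned by the columns of in_basis (directions used by previous tasks)."""
    q, _ = np.linalg.qr(in_basis)        # orthonormal basis of the protected subspace
    return dW - dW @ (q @ q.T)           # remove the protected input directions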
N.A.
2020Continual Learning in Recurrent Neural Networks by Benjamin Ehret, Christian Henning, Maria R Cervera, Alexander Meulemans, Johannes von Oswald and Benjamin F Grewe. arXiv, 2020. [audio] [rnn]
@article{ehret2020,
annotation = {_eprint: 2006.12109},
author = {Ehret, Benjamin and Henning, Christian and Cervera, Maria R and Meulemans, Alexander and von Oswald, Johannes and Grewe, Benjamin F},
journal = {arXiv},
keywords = {[audio],[rnn]},
title = {Continual Learning in Recurrent Neural Networks},
url = {http://arxiv.org/abs/2006.12109},
year = {2020}
}
While a diverse collection of continual learning (CL) methods has been proposed to prevent catastrophic forgetting, a thorough investigation of their effectiveness for processing sequential data with recurrent neural networks (RNNs) is lacking. Here, we provide the first comprehensive evaluation of established CL methods on a variety of sequential data benchmarks. Specifically, we shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs. In contrast to feedforward networks, RNNs iteratively reuse a shared set of weights and require working memory to process input samples. We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements, which lead to an increased need for stability at the cost of decreased plasticity for learning subsequent tasks. We additionally provide theoretical arguments supporting this interpretation by studying linear RNNs. Our study shows that established CL methods can be successfully ported to the recurrent case, and that a recent regularization approach based on hypernetworks outperforms weight-importance methods, thus emerging as a promising candidate for CL in RNNs. Overall, we provide insights on the differences between CL in feedforward networks and RNNs, while guiding towards effective solutions to tackle CL on sequential data.
N.A.
2020Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis by Tyler L Hayes and Christopher Kanan. CLVision Workshop at CVPR 2020, 1–15, 2020. [core50] [imagenet]
@inproceedings{hayes2020,
annotation = {_eprint: 1909.01520},
author = {Hayes, Tyler L and Kanan, Christopher},
booktitle = {CLVision Workshop at CVPR 2020},
keywords = {[core50],[imagenet],deep learning,streaming learning},
pages = {1--15},
title = {Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis},
url = {http://arxiv.org/abs/1909.01520},
year = {2020}
}
When a robot acquires new information, ideally it would immediately be capable of using that information to understand its environment. While deep neural networks are now widely used by robots for inferring semantic information, conventional neural networks suffer from catastrophic forgetting when they are incrementally updated, with new knowledge overwriting established representations. While a variety of approaches have been developed that attempt to mitigate catastrophic forgetting in the incremental batch learning scenario, in which an agent learns a large collection of labeled samples at once, streaming learning has been much less studied in the robotics and deep learning communities. In streaming learning, an agent learns instances one-by-one and can be tested at any time. Here, we revisit streaming linear discriminant analysis, which has been widely used in the data mining research community. By combining streaming linear discriminant analysis with deep learning, we are able to outperform both incremental batch learning and streaming learning algorithms on both ImageNet-1K and CORe50.
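Deep streaming LDA keeps per-class running means and a shared covariance over fixed deep features and classifies with a linear discriminant; the sketch below illustrates that recipe, with the shrinkage constant and the running covariance update being simplifying assumptions.

import numpy as np

class StreamingLDA:
    """Streaming linear discriminant analysis over fixed feature vectors."""
    def __init__(self, feat_dim, n_classes, shrinkage=1e-4):
        self.mu = np.zeros((n_classes, feat_dim))   # per-class running means
        self.count = np.zeros(n_classes)
        self.sigma = np.eye(feat_dim)               # shared covariance estimate
        self.n_seen = 0
        self.shrinkage = shrinkage

    def fit_one(self, x, y):
        self.count[y] += 1
        self.mu[y] += (x - self.mu[y]) / self.count[y]   # update class mean
        d = x - self.mu[y]
        self.n_seen += 1
        self.sigma += (np.outer(d, d) - self.sigma) / self.n_seen

    def predict(self, x):
        prec = np.linalg.inv(self.sigma + self.shrinkage * np.eye(len(x)))
        w = self.mu @ prec                          # class weight vectors
        b = -0.5 * np.sum(w * self.mu, axis=1)      # class biases
        return int(np.argmax(w @ x + b))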
N.A.
2020Meta-Consolidation for Continual Learning by K J Joseph and Vineeth N Balasubramanian. NeurIPS, 2020. [bayes] [cifar] [imagenet] [mnist]
@inproceedings{joseph2020,
annotation = {_eprint: 2010.00352},
author = {Joseph, K J and Balasubramanian, Vineeth N},
booktitle = {NeurIPS},
keywords = {[bayes],[cifar],[imagenet],[mnist]},
note = {The authors leverage a Bayesian framework in which the parameters of a model are sampled from a generating distribution. This distribution, parameterized by a task label, is used together with a VAE to consolidate previous and current knowledge online. Inference does not require task labels and exploits an ensemble of models sampled from the generating distribution.},
title = {Meta-Consolidation for Continual Learning},
url = {http://arxiv.org/abs/2010.00352},
year = {2020}
}
The ability to continuously learn and adapt itself to new tasks, without losing grasp of already acquired knowledge is a hallmark of biological learning systems, which current deep learning systems fall short of. In this work, we present a novel methodology for continual learning called MERLIN: Meta-Consolidation for Continual Learning. We assume that weights of a neural network $\boldsymbol{\psi}$, for solving task $\boldsymbol{t}$, come from a meta-distribution $p(\boldsymbol{\psi}|t)$. This meta-distribution is learned and consolidated incrementally. We operate in the challenging online continual learning setting, where a data point is seen by the model only once. Our experiments with continual learning benchmarks of MNIST, CIFAR-10, CIFAR-100 and Mini-ImageNet datasets show consistent improvement over five baselines, including a recent state-of-the-art, corroborating the promise of MERLIN.
N.A.
2020Continual Learning with Bayesian Neural Networks for Non-Stationary Data by Richard Kurle, Botond Cseke, Alexej Klushyn, Patrick van der Smagt and Stephan Günnemann. Eighth International Conference on Learning Representations, 2020. [bayes]
@inproceedings{kurle2020,
author = {Kurle, Richard and Cseke, Botond and Klushyn, Alexej and van der Smagt, Patrick and Günnemann, Stephan},
booktitle = {Eighth International Conference on Learning Representations},
keywords = {[bayes]},
language = {en},
title = {Continual Learning with Bayesian Neural Networks for Non-Stationary Data},
url = {https://iclr.cc/virtual_2020/poster_SJlsFpVtDB.html},
urldate = {2021-01-01},
year = {2020}
}
This work addresses continual learning for non-stationary data, using Bayesian neural networks and memory-based online variational Bayes. We represent the posterior approximation of the network weights by a diagonal Gaussian distribution and a complementary memory of raw data. This raw data corresponds to likelihood terms that cannot be well approximated by the Gaussian. We introduce a novel method for sequentially updating both components of the posterior approximation. Furthermore, we propose Bayesian forgetting and a Gaussian diffusion process for adapting to non-stationary data. The experimental results show that our update method improves on existing approaches for streaming data. Additionally, the adaptation methods lead to better predictive performance for non-stationary data.
N.A.
2020Compositional Language Continual Learning by Yuanpeng Li, Liang Zhao, Kenneth Church and Mohamed Elhoseiny. Eighth International Conference on Learning Representations, 2020. [nlp] [rnn]
@inproceedings{li2020b,
author = {Li, Yuanpeng and Zhao, Liang and Church, Kenneth and Elhoseiny, Mohamed},
booktitle = {Eighth International Conference on Learning Representations},
keywords = {[nlp],[rnn]},
language = {en},
title = {Compositional Language Continual Learning},
url = {https://iclr.cc/virtual_2020/poster_rklnDgHtDS.html},
urldate = {2021-01-01},
year = {2020}
}
Motivated by the human's ability to continually learn and gain knowledge over time, several research efforts have been pushing the limits of machines to constantly learn while alleviating catastrophic forgetting. Most of the existing methods have been focusing on continual learning of label prediction tasks, which have fixed input and output sizes. In this paper, we propose a new scenario of continual learning which handles sequence-to-sequence tasks common in language learning. We further propose an approach to use label prediction continual learning algorithm for sequence-to-sequence continual learning by leveraging compositionality. Experimental results show that the proposed method has significant improvement over state-of-the-art methods. It enables knowledge transfer and prevents catastrophic forgetting, resulting in more than 85% accuracy up to 100 stages, compared with less than 50% accuracy for baselines in instruction learning task. It also shows significant improvement in machine translation task. This is the first work to combine continual learning and compositionality for language learning, and we hope this work will make machines more helpful in various tasks.
N.A.
2020Online Continual Learning on Sequences by German I Parisi and Vincenzo Lomonaco. arXiv, 2020. [framework]
@article{parisi2020,
author = {Parisi, German I and Lomonaco, Vincenzo},
doi = {10.1007/978-3-030-43883-8_8},
journal = {arXiv},
keywords = {[framework],Computer Science - Computer Vision and Pattern Rec,Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computi},
note = {Comment: L. Oneto et al. (eds.), Recent Trends in Learning From Data, Studies in Computational Intelligence 896 arXiv: 2003.09114},
title = {Online Continual Learning on Sequences},
url = {http://arxiv.org/abs/2003.09114},
year = {2020}
}
Online continual learning (OCL) refers to the ability of a system to learn over time from a continuous stream of data without having to revisit previously encountered training samples. Learning continually in a single data pass is crucial for agents and robots operating in changing environments and required to acquire, fine-tune, and transfer increasingly complex representations from non-i.i.d. input distributions. Machine learning models that address OCL must alleviate catastrophic forgetting, in which hidden representations are disrupted or completely overwritten when learning from streams of novel input. In this chapter, we summarize and discuss recent deep learning models that address OCL on sequential input through the use (and combination) of synaptic regularization, structural plasticity, and experience replay. Different implementations of replay have been proposed that alleviate catastrophic forgetting in connectionist architectures via the re-occurrence of (latent representations of) input sequences and that functionally resemble mechanisms of hippocampal replay in the mammalian brain. Empirical evidence shows that architectures endowed with experience replay typically outperform architectures without it in (online) incremental learning tasks.
N.A.
2020Gradient Based Sample Selection for Online Continual Learning by Rahaf Aljundi, Min Lin, Baptiste Goujaud and Yoshua Bengio. Advances in Neural Information Processing Systems 32, 11816–11825, 2019. [cifar] [mnist]
@inproceedings{aljundi2019a,
author = {Aljundi, Rahaf and Lin, Min and Goujaud, Baptiste and Bengio, Yoshua},
booktitle = {Advances in Neural Information Processing Systems 32},
editor = {Wallach, H and Larochelle, H and Beygelzimer, A and d'Alché-Buc, F and Fox, E and Garnett, R},
keywords = {[cifar],[mnist]},
pages = {11816--11825},
publisher = {Curran Associates, Inc.},
title = {Gradient Based Sample Selection for Online Continual Learning},
url = {http://papers.nips.cc/paper/9354-gradient-based-sample-selection-for-online-continual-learning.pdf},
year = {2019}
}
N.A.
N.A.
2019Online Continual Learning with Maximal Interfered Retrieval by Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin and Lucas Page-Caccia. Advances in Neural Information Processing Systems 32, 11849–11860, 2019. [cifar] [mnist]
@inproceedings{aljundi2019b,
author = {Aljundi, Rahaf and Belilovsky, Eugene and Tuytelaars, Tinne and Charlin, Laurent and Caccia, Massimo and Lin, Min and Page-Caccia, Lucas},
booktitle = {Advances in Neural Information Processing Systems 32},
editor = {Wallach, H and Larochelle, H and Beygelzimer, A and d'Alché-Buc, F and Fox, E and Garnett, R},
keywords = {[cifar],[mnist]},
pages = {11849--11860},
publisher = {Curran Associates, Inc.},
title = {Online Continual Learning with Maximal Interfered Retrieval},
url = {http://papers.nips.cc/paper/9357-online-continual-learning-with-maximal-interfered-retrieval.pdf},
year = {2019}
}
Continual learning, the setting where a learning agent is faced with a never ending stream of data, continues to be a great challenge for modern machine learning systems. In particular the online or "single-pass through the data" setting has gained attention recently as a natural setting that is difficult to tackle. Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks. These approaches typically rely on randomly selecting samples from the replay memory or from a generative model, which is suboptimal. In this work we consider a controlled sampling of memories for replay. We retrieve the samples which are most interfered, i.e. whose prediction will be most negatively impacted by the foreseen parameters update. We show a formulation for this sampling criterion in both the generative replay and the experience replay setting, producing consistent gains in performance and greatly reduced forgetting. We release an implementation of our method at https://github.com/optimass/Maximally_Interfered_Retrieval
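The selection criterion can be sketched as follows (this is an illustration, not the implementation released at the link above): score each memory sample by how much its loss would increase after a virtual SGD step on the incoming batch, and replay the most interfered ones. The learning rate, the criterion (which must return per-sample losses), and the batch handling are assumptions.

import copy
import torch

def maximally_interfered_retrieval(model, criterion, mem_x, mem_y,
                                   new_x, new_y, lr=0.1, k=10):
    """Return the k memory samples whose loss increases most under a virtual
    one-step update of the model on the incoming batch."""
    with torch.no_grad():
        loss_before = criterion(model(mem_x), mem_y)          # per-sample losses
    virtual = copy.deepcopy(model)
    criterion(virtual(new_x), new_y).mean().backward()        # gradients of the foreseen update
    with torch.no_grad():
        for p in virtual.parameters():
            if p.grad is not None:
                p -= lr * p.grad                              # virtual SGD step
        loss_after = criterion(virtual(mem_x), mem_y)
    interference = loss_after - loss_before
    idx = torch.topk(interference, min(k, len(mem_x))).indices
    return mem_x[idx], mem_y[idx]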
N.A.
2019Task-Free Continual Learning by Rahaf Aljundi, Klaas Kelchtermans and Tinne Tuytelaars. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [vision]
@inproceedings{aljundi2019d,
author = {Aljundi, Rahaf and Kelchtermans, Klaas and Tuytelaars, Tinne},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
keywords = {[vision]},
title = {Task-Free Continual Learning},
url = {https://openaccess.thecvf.com/content_CVPR_2019/papers/Aljundi_Task-Free_Continual_Learning_CVPR_2019_paper.pdf},
year = {2019}
}
N.A.
N.A.
2019Efficient Lifelong Learning with A-GEM by Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach and Mohamed Elhoseiny. ICLR, 2019. [cifar] [mnist]
@inproceedings{chaudhry2019,
author = {Chaudhry, Arslan and Ranzato, Marc'Aurelio and Rohrbach, Marcus and Elhoseiny, Mohamed},
booktitle = {ICLR},
keywords = {[cifar],[mnist],Computer Science - Machine Learning,Statistics - Machine Learning},
language = {en},
note = {Comment: Published as a conference paper at ICLR 2019 arXiv: 1812.00420},
title = {Efficient Lifelong Learning with A-GEM},
url = {http://arxiv.org/abs/1812.00420},
year = {2019}
}
In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches, in terms of sample complexity, computational and memory cost. Towards this end, we first introduce a new and a more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which enjoys the same or even better performance as GEM, while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms including A-GEM can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency.
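The single constraint that A-GEM enforces has a closed-form projection: if the proposed gradient conflicts with the average gradient on replayed examples, it is projected onto the half-space where the replay loss does not increase. A minimal sketch, assuming flattened gradient vectors:

import numpy as np

def agem_project(g, g_ref):
    """Project the current-task gradient g so that it does not increase the
    loss on the episodic memory, whose (flattened) gradient is g_ref."""
    dot = np.dot(g, g_ref)
    if dot >= 0:
        return g                                     # no conflict, keep the gradient
    return g - (dot / np.dot(g_ref, g_ref)) * g_ref  # remove the conflicting component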
N.A.
2019Task Agnostic Continual Learning via Meta Learning by Xu He, Jakub Sygnowski, Alexandre Galashov, Andrei A Rusu, Yee Whye Teh and Razvan Pascanu. arXiv:1906.05201 [cs, stat], 2019. [mnist]
@book{he2019,
archiveprefix = {arXiv},
author = {He, Xu and Sygnowski, Jakub and Galashov, Alexandre and Rusu, Andrei A and Teh, Yee Whye and Pascanu, Razvan},
eprint = {1906.05201},
eprinttype = {arxiv},
journal = {arXiv:1906.05201 [cs, stat]},
keywords = {[mnist],Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computi,Statistics - Machine Learning},
note = {arXiv: 1906.05201},
primaryclass = {cs, stat},
title = {Task Agnostic Continual Learning via Meta Learning},
url = {http://arxiv.org/abs/1906.05201},
year = {2019}
}
While neural networks are powerful function approximators, they suffer from catastrophic forgetting when the data distribution is not stationary. One particular formalism that studies learning under non-stationary distribution is provided by continual learning, where the non-stationarity is imposed by a sequence of distinct tasks. Most methods in this space assume, however, the knowledge of task boundaries, and focus on alleviating catastrophic forgetting. In this work, we depart from this view and move the focus towards faster remembering – i.e. measuring how quickly the network recovers performance rather than measuring the network's performance without any adaptation. We argue that in many settings this can be more effective and that it opens the door to combining meta-learning and continual learning techniques, leveraging their complementary advantages. We propose a framework specific to the scenario where no information about task boundaries or task identity is given. It relies on a separation of concerns into what task is being solved and how the task should be solved. This framework is implemented by differentiating task-specific parameters from task-agnostic parameters, where the latter are optimized in a continual meta-learning fashion, without access to multiple tasks at the same time. We showcase this framework in a supervised learning scenario and discuss the implications of the proposed formalism.
N.A.
2019A Study on Catastrophic Forgetting in Deep LSTM Networks by Monika Schak and Alexander Gepperth. Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning, 714–728, 2019. [rnn]
@incollection{schak2019,
address = {Cham},
author = {Schak, Monika and Gepperth, Alexander},
booktitle = {Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning},
doi = {10.1007/978-3-030-30484-3_56},
editor = {Tetko, Igor V and Kůrková, Věra and Karpov, Pavel and Theis, Fabian},
isbn = {978-3-030-30484-3},
keywords = {[rnn],Catastrophic Forgetting,LSTM,sequential},
language = {en},
pages = {714--728},
publisher = {Springer International Publishing},
series = {Lecture Notes in Computer Science},
title = {A Study on Catastrophic Forgetting in Deep LSTM Networks},
url = {http://link.springer.com/10.1007/978-3-030-30484-3_56},
year = {2019}
}
We present a systematic study of Catastrophic Forgetting (CF), i.e., the abrupt loss of previously acquired knowledge, when retraining deep recurrent LSTM networks with new samples. CF has recently received renewed attention in the case of feed-forward DNNs, and this article is the first work that aims to rigorously establish whether deep LSTM networks are afflicted by CF as well, and to what degree. In order to test this fully, training is conducted using a wide variety of high-dimensional image-based sequence classification tasks derived from established visual classification benchmarks (MNIST, Devanagari, FashionMNIST and EMNIST). We find that the CF effect occurs universally, without exception, for deep LSTM-based sequence classifiers, regardless of the construction and provenance of sequences. This leads us to conclude that LSTMs, just like DNNs, are fully affected by CF, and that further research work needs to be conducted in order to determine how to avoid this effect (which is not a goal of this study).
N.A.
2019Unsupervised Progressive Learning and the STAM Architecture by James Smith, Seth Baer, Cameron Taylor and Constantine Dovrolis. arXiv, 2019. [mnist]
@article{smith2019,
annotation = {_eprint: 1904.02021},
author = {Smith, James and Baer, Seth and Taylor, Cameron and Dovrolis, Constantine},
journal = {arXiv},
keywords = {[mnist]},
title = {Unsupervised Progressive Learning and the STAM Architecture},
url = {http://arxiv.org/abs/1904.02021},
year = {2019}
}
We first pose the Unsupervised Progressive Learning (UPL) problem: an online representation learning problem in which the learner observes a non-stationary and unlabeled data stream, and identifies a growing number of features that persist over time even though the data is not stored or replayed. To solve the UPL problem we propose the Self-Taught Associative Memory (STAM) architecture. Layered hierarchies of STAM modules learn based on a combination of online clustering, novelty detection, forgetting outliers, and storing only prototypical features rather than specific examples. We evaluate STAM representations using classification and clustering tasks. Even though there are no prior approaches that are directly applicable to the UPL problem, we evaluate the STAM architecture in comparison to some unsupervised and self-supervised deep learning approaches adapted in the UPL context.
Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence by Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajanthan and Philip H. S. Torr. Proceedings of the European Conference on Computer Vision (ECCV), 532–547, 2018. [cifar] [mnist]
@inproceedings{chaudhry2018,
author = {Chaudhry, Arslan and Dokania, Puneet K. and Ajanthan, Thalaiyasingam and Torr, Philip H. S.},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
pages = {532--547},
shorttitle = {Riemannian Walk for Incremental Learning},
title = {Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence},
url = {https://openaccess.thecvf.com/content_ECCV_2018/html/Arslan_Chaudhry__Riemannian_Walk_ECCV_2018_paper.html},
urldate = {2021-01-05},
year = {2018}
}
Incremental learning (IL) has received a lot of attention recently; however, the literature lacks a precise problem definition, proper evaluation settings, and metrics tailored specifically for the IL problem. One of the main objectives of this work is to fill these gaps so as to provide a common ground for a better understanding of IL. The main challenge for an IL algorithm is to update the classifier whilst preserving existing knowledge. We observe that, in addition to forgetting, a known issue while preserving knowledge, IL also suffers from a problem we call intransigence, its inability to update knowledge. We introduce two metrics to quantify forgetting and intransigence that allow us to understand, analyse, and gain better insights into the behaviour of IL algorithms. Furthermore, we present RWalk, a generalization of EWC++ (our efficient version of EWC [6]) and Path Integral [25] with a theoretically grounded KL-divergence based perspective. We provide a thorough analysis of various IL algorithms on MNIST and CIFAR-100 datasets. In these experiments, RWalk obtains superior results in terms of accuracy, and also provides a better trade-off for forgetting and intransigence.
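The forgetting and intransigence metrics described above are easy to compute once an accuracy matrix is available. A minimal Python/NumPy sketch, assuming acc[i, j] holds the accuracy on task j after training on task i and that reference accuracies come from a jointly trained model; the averaging here only approximates the paper's exact normalization:

import numpy as np

def average_forgetting(acc):
    # acc[i, j]: accuracy on task j measured after training on task i (i >= j).
    # Forgetting of task j = best accuracy ever achieved on j minus the final one.
    acc = np.asarray(acc)
    T = acc.shape[0]
    drops = [acc[:T - 1, j].max() - acc[T - 1, j] for j in range(T - 1)]
    return float(np.mean(drops))

def average_intransigence(acc, reference_acc):
    # Intransigence of task k = reference (e.g. jointly trained) accuracy on k
    # minus the accuracy obtained right after learning task k.
    acc = np.asarray(acc)
    return float(np.mean([reference_acc[k] - acc[k, k] for k in range(acc.shape[0])]))

acc = np.array([[0.95, 0.00, 0.00],
                [0.80, 0.93, 0.00],
                [0.70, 0.85, 0.92]])
print(average_forgetting(acc))                          # 0.165
print(average_intransigence(acc, [0.97, 0.96, 0.95]))   # ~0.027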
Overcoming Catastrophic Interference Using Conceptor-Aided Backpropagation by Xu He and Herbert Jaeger. ICLR, 2018. [mnist]
@inproceedings{he2018,
author = {He, Xu and Jaeger, Herbert},
booktitle = {ICLR},
keywords = {[mnist]},
title = {Overcoming Catastrophic Interference Using Conceptor-Aided Backpropagation},
url = {https://openreview.net/pdf?id=B1al7jg0b},
year = {2018}
}
Catastrophic interference has been a major roadblock in the research of continual learning. Here we propose a variant of the back-propagation algorithm, "conceptor-aided backprop" (CAB), in which gradients are shielded by conceptors against degradation of previously learned tasks. Conceptors have their origin in reservoir computing, where they have been previously shown to overcome catastrophic forgetting. CAB extends these results to deep feedforward networks. On the disjoint and permuted MNIST tasks, CAB outperforms two other methods for coping with catastrophic interference that have recently been proposed.
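For readers unfamiliar with conceptors, the NumPy sketch below shows the basic objects involved: the conceptor matrix C = R (R + alpha^-2 I)^-1 computed from the correlation matrix R of stored activations, and the projection onto the remaining "free" directions. How CAB applies this shielding inside backpropagation (per layer, to the weight updates) is detailed in the paper; this is only an illustration of the idea.

import numpy as np

def conceptor(X, alpha=10.0):
    # X: activations with shape (features, samples) collected on previous tasks.
    # C softly captures the directions in activation space occupied by X.
    d, n = X.shape
    R = X @ X.T / n
    return R @ np.linalg.inv(R + alpha ** -2 * np.eye(d))

def shield(update, C):
    # Keep only the component of an update lying in directions not yet
    # claimed by previous tasks (the F = I - C "free space" projection).
    return (np.eye(C.shape[0]) - C) @ update

X_old = np.random.randn(5, 200)      # recorded activations from old tasks
C = conceptor(X_old)
g = np.random.randn(5)               # raw update direction for the new task
print(np.linalg.norm(shield(g, C)) <= np.linalg.norm(g))   # True: update shrunk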
Gradient Episodic Memory for Continual Learning by David Lopez-Paz and Marc’Aurelio Ranzato. NIPS, 2017. [cifar] [mnist]
@inproceedings{lopez-paz2017,
author = {Lopez-Paz, David and Ranzato, Marc'Aurelio},
booktitle = {NIPS},
keywords = {[cifar],[mnist],Computer Science - Artificial Intelligence,Computer Science - Machine Learning,gem},
note = {Comment: Published at NIPS 2017 arXiv: 1706.08840},
title = {Gradient Episodic Memory for Continual Learning},
url = {https://arxiv.org/abs/1706.08840},
year = {2017}
}
One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art.
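GEM itself solves a small quadratic program so that the projected gradient does not increase the loss on any task's episodic memory. As a hedged illustration of the underlying idea, the sketch below implements only the single-constraint special case (the projection later popularized by A-GEM), not the paper's full QP:

import numpy as np

def project_gradient(g, g_mem):
    # If the current-task gradient g conflicts with the memory gradient g_mem
    # (negative dot product), remove the conflicting component so the update
    # no longer increases the loss estimated on the episodic memory.
    dot = float(np.dot(g, g_mem))
    if dot >= 0.0:
        return g
    return g - (dot / float(np.dot(g_mem, g_mem))) * g_mem

g = np.array([1.0, -1.0])        # gradient on the current task
g_mem = np.array([0.0, 1.0])     # gradient on the episodic memory
print(project_gradient(g, g_mem))   # [1. 0.]: conflict removed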
iCaRL: Incremental Classifier and Representation Learning by Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl and Christoph H. Lampert. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. [cifar]
@inproceedings{rebuffi2017,
author = {Rebuffi, Sylvestre-Alvise and Kolesnikov, Alexander and Sperl, Georg and Lampert, Christoph H},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
keywords = {[cifar]},
title = {iCaRL: Incremental Classifier and Representation Learning},
url = {http://openaccess.thecvf.com/content_cvpr_2017/papers/Rebuffi_iCaRL_Incremental_Classifier_CVPR_2017_paper.pdf},
year = {2017}
}
Dissertations and Theses¶
6 papers
In this section we maintain a list of all the dissertations and theses produced on continual learning and related topics.
Continual Learning: Tackling Catastrophic Forgetting in Deep Neural Networks with Replay Processes by Timothée Lesort. arXiv, 2020. [cifar] [framework] [generative] [mnist] [vision]
@phdthesis{lesort2020a,
annotation = {_eprint: 2007.00487},
author = {Lesort, Timothée},
journal = {arXiv},
keywords = {[cifar],[framework],[generative],[mnist],[vision]},
note = {This dissertation constitutes a valid summary of the latest effort in the use of generative models for continual learning and vice-versa.},
school = {ENSTA ParisTech},
title = {Continual Learning: Tackling Catastrophic Forgetting in Deep Neural Networks with Replay Processes},
type = {PhD Thesis},
url = {http://arxiv.org/abs/2007.00487},
year = {2020}
}
Humans learn all their life long. They accumulate knowledge from a sequence of learning experiences and remember the essential concepts without forgetting what they have learned previously. Artificial neural networks struggle to learn similarly. They often rely on data rigorously preprocessed to learn solutions to specific problems such as classification or regression. In particular, they forget their past learning experiences if trained on new ones. Therefore, artificial neural networks are often inept to deal with real-life settings such as an autonomous-robot that has to learn on-line to adapt to new situations and overcome new problems without forgetting its past learning-experiences. Continual learning (CL) is a branch of machine learning addressing this type of problem. Continual algorithms are designed to accumulate and improve knowledge in a curriculum of learning-experiences without forgetting. In this thesis, we propose to explore continual algorithms with replay processes. Replay processes gather together rehearsal methods and generative replay methods. Generative Replay consists of regenerating past learning experiences with a generative model to remember them. Rehearsal consists of saving a core-set of samples from past learning experiences to rehearse them later. The replay processes make possible a compromise between optimizing the current learning objective and the past ones enabling learning without forgetting in sequences of tasks settings. We show that they are very promising methods for continual learning. Notably, they enable the re-evaluation of past data with new knowledge and the confrontation of data from different learning-experiences. We demonstrate their ability to learn continually through unsupervised learning, supervised learning and reinforcement learning tasks.
Continual Learning in Neural Networks by Rahaf Aljundi. arXiv, 2019. [cifar] [imagenet] [mnist] [vision]
@phdthesis{aljundi2019,
author = {Aljundi, Rahaf},
journal = {arXiv},
keywords = {[cifar],[imagenet],[mnist],[vision]},
number = {September},
school = {KU Leuven},
title = {Continual Learning in Neural Networks},
type = {PhD Thesis},
url = {https://arxiv.org/abs/1910.02718},
year = {2019}
}
Artificial neural networks have exceeded human-level performance in accomplishing several individual tasks (e.g. voice recognition, object recognition, and video games). However, such success remains modest compared to human intelligence that can learn and perform an unlimited number of tasks. Humans' ability of learning and accumulating knowledge over their lifetime is an essential aspect of their intelligence. Continual machine learning aims at a higher level of machine intelligence through providing the artificial agents with the ability to learn online from a non-stationary and never-ending stream of data. A key component of such a never-ending learning process is to overcome the catastrophic forgetting of previously seen data, a problem that neural networks are well known to suffer from. The work described in this thesis has been dedicated to the investigation of continual learning and solutions to mitigate the forgetting phenomena in neural networks. To approach the continual learning problem, we first assume a task incremental setting where tasks are received one at a time and data from previous tasks are not stored. Since the task incremental setting can't be assumed in all continual learning scenarios, we also study the more general online continual setting. We consider an infinite stream of data drawn from a non-stationary distribution with a supervisory or self-supervisory training signal. The proposed methods in this thesis have tackled important aspects of continual learning. They were evaluated on different benchmarks and over various learning sequences. Advances in the state of the art of continual learning have been shown and challenges for bringing continual learning into application were critically identified.
Continual Deep Learning via Progressive Learning by Haytham M. Fayek. RMIT University, 2019. [audio] [cifar] [imagenet] [sparsity]
@phdthesis{fayek2019,
author = {Fayek, Haytham M.},
journal = {RMIT University},
keywords = {[audio],[cifar],[imagenet],[sparsity]},
school = {RMIT University},
title = {Continual Deep Learning via Progressive Learning},
type = {PhD Thesis},
url = {http://researchbank.rmit.edu.au/eserv/rmit:162646/Fayek.pdf},
year = {2019}
}
Machine learning is one of several approaches to artificial intelligence. It allows us to build machines that can learn from experience as opposed to being explicitly programmed. Current machine learning formulations are mostly designed for learning and performing a particular task from a tabula rasa using data available for that task. For machine learning to converge to artificial intelligence, in addition to other desiderata, it must be in a state of continual learning, i.e., have the ability to be in a continuous learning process, such that when a new task is presented, the system can leverage prior knowledge from prior tasks, in learning and performing this new task, and augment the prior knowledge with the newly acquired knowledge without having a significant adverse effect on the prior knowledge. Continual learning is key to advancing machine learning and artificial intelligence. Deep learning is a powerful general-purpose approach to machine learning that is able to solve numerous and various tasks with minimal modification. Deep learning extends machine learning, and specially neural networks, to learn multiple levels of distributed representations together with the required mapping function into a single composite function. The emergence of deep learning and neural networks as a generic approach to machine learning, coupled with their ability to learn versatile hierarchical representations, has paved the way for continual learning. The main aim of this thesis is the study and development of a structured approach to continual learning, leveraging the success of deep learning and neural networks. This thesis studies the application of deep learning to a number of supervised learning tasks, and in particular, classification tasks in machine perception, e.g., image recognition, automatic speech recognition, and speech emotion recognition. The relation between the systems developed for these tasks is investigated to illuminate the layer-wise relevance of features in deep networks trained for these tasks via transfer learning, and these independent systems are unified into continual learning systems. The main contribution of this thesis is the construction and formulation of a deep learning framework, denoted progressive learning, that allows a holistic and systematic approach to continual learning. Progressive learning comprises a number of procedures that address the continual learning desiderata. It is shown that, when tasks are related, progressive learning leads to faster learning that converges to better generalization performance using less amounts of data and a smaller number of dedicated parameters, for the tasks studied in this thesis, by accumulating and leveraging knowledge learned across tasks in a continuous manner. It is envisioned that progressive learning is a step towards a fully general continual learning framework.
Continual Learning with Deep Architectures by Vincenzo Lomonaco. University of Bologna, 2019. [core50] [framework]
@phdthesis{lomonaco2019,
author = {Lomonaco, Vincenzo},
doi = {10.6092/unibo/amsdottorato/9073},
journal = {University of Bologna},
keywords = {[core50],[framework]},
language = {it},
school = {alma},
title = {Continual Learning with Deep Architectures},
type = {PhD Thesis},
url = {http://amsdottorato.unibo.it/9073/},
year = {2019}
}
Humans have the extraordinary ability to learn continually from experience. Not only can we apply previously learned knowledge and skills to new situations, we can also use these as the foundation for later learning. One of the grand goals of Artificial Intelligence (AI) is building an artificial "continual learning" agent that constructs a sophisticated understanding of the world from its own experience through the autonomous incremental development of ever more complex knowledge and skills. However, despite early speculations and a few pioneering works, very little research and effort has been devoted to addressing this vision. Current AI systems greatly suffer from exposure to new data or environments which differ even slightly from the ones for which they have been trained. Moreover, the learning process is usually constrained to fixed datasets within narrow and isolated tasks, which may hardly lead to the emergence of more complex and autonomous intelligent behaviors. In essence, continual learning and adaptation capabilities, while more often than not thought of as fundamental pillars of every intelligent agent, have been mostly left out of the main AI research focus. In this dissertation, we study the application of these ideas in light of the more recent advances in machine learning research and in the context of deep architectures for AI. We propose a comprehensive and unifying framework for continual learning, new metrics, benchmarks and algorithms, as well as providing substantial experimental evaluations in different supervised, unsupervised and reinforcement learning tasks.
Explanation-Based Neural Network Learning: A Lifelong Learning Approach by Sebastian Thrun. Springer, 1996. [framework]
@book{thrun1996,
author = {Thrun, Sebastian},
isbn = {978-1-4612-8597-7},
keywords = {[framework]},
publisher = {Springer},
title = {Explanation-Based Neural Network Learning: A Lifelong Learning Approach},
url = {https://www.springer.com/gp/book/9780792397168},
year = {1996}
}
Lifelong learning addresses situations in which a learner faces a series of different learning tasks providing the opportunity for synergy among them. Explanation-based neural network learning (EBNN) is a machine learning algorithm that transfers knowledge across multiple learning tasks. When faced with a new learning task, EBNN exploits domain knowledge accumulated in previous learning tasks to guide generalization in the new one. As a result, EBNN generalizes more accurately from less data than comparable methods. Explanation-Based Neural Network Learning: A Lifelong Learning Approach describes the basic EBNN paradigm and investigates it in the context of supervised learning, reinforcement learning, robotics, and chess. `The paradigm of lifelong learning - using earlier learned knowledge to improve subsequent learning - is a promising direction for a new generation of machine learning algorithms. Given the need for more accurate learning methods, it is difficult to imagine a future for machine learning that does not include this paradigm.' From the Foreword by Tom M. Mitchell.
Continual Learning in Reinforcement Environments by Mark Ring. University of Texas, 1994. [framework]
@phdthesis{ring1994,
author = {Ring, Mark},
journal = {University of Texas},
keywords = {[framework]},
school = {University of Texas},
title = {Continual Learning in Reinforcement Environments},
type = {PhD Thesis},
url = {https://www.cs.utexas.edu/~ring/Ring-dissertation.pdf},
volume = {1},
year = {1994}
}
Continual learning is the constant development of complex behaviors with no final end in mind. It is the process of learning ever more complicated skills by building on those skills already developed. In order for learning at one stage of development to serve as the foundation for later learning, a continual-learning agent should learn hierarchically. CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development is proposed, described, tested, and evaluated in this dissertation. CHILD accumulates useful behaviors in reinforcement environments by using the Temporal Transition Hierarchies learning algorithm, also derived in the dissertation. This constructive algorithm generates a hierarchical, higher-order neural network that can be used for predicting context-dependent temporal sequences and can learn sequential-task benchmarks more than two orders of magnitude faster than competing neural network systems. Consequently, CHILD can quickly solve complicated non-Markovian reinforcement-learning tasks and can then transfer its skills to similar but even more complicated tasks, learning these faster still. This continual-learning approach is made possible by the unique properties of Temporal Transition Hierarchies, which allow existing skills to be amended and augmented in precisely the same way that they were constructed in the first place.
Generative Replay Methods¶
5 papers
In this section we collect all the papers introducing a continual learning strategy employing some generative replay methods.
Brain-Inspired Replay for Continual Learning with Artificial Neural Networks by Gido M. van de Ven, Hava T. Siegelmann and Andreas S. Tolias. Nature Communications, 2020. [cifar] [framework] [generative] [mnist]
@article{vandeven2020,
author = {van de Ven, Gido M. and Siegelmann, Hava T. and Tolias, Andreas S.},
doi = {10.1038/s41467-020-17866-2},
journal = {Nature Communications},
keywords = {[cifar],[framework],[generative],[mnist]},
note = {The paper shows a generative form of replay in which a VAE, conditioned on the current task, is able to generate pseudo-samples and, when used as a classifier, to address new tasks. The idea is that the generative model is inspired by the hippocampus, which sits hierarchically on top of the cortex (often thought of as the classifier). In this way, replay is fed back by the same model used to predict the class. Forgetting is prevented both on the VAE and on the classification component through replay. It also shows that regularization approaches fail in the class-incremental setting.},
title = {Brain-Inspired Replay for Continual Learning with Artificial Neural Networks},
url = {https://www.nature.com/articles/s41467-020-17866-2},
volume = {11},
year = {2020}
}
Artificial neural networks suffer from catastrophic forgetting. Unlike humans, when these networks are trained on something new, they rapidly forget what was learned before. In the brain, a mechanism thought to be important for protecting memories is the reactivation of neuronal activity patterns representing those memories. In artificial neural networks, such memory replay can be implemented as `generative replay', which can successfully – and surprisingly efficiently – prevent catastrophic forgetting on toy examples even in a class-incremental learning scenario. However, scaling up generative replay to complicated problems with many tasks or complex inputs is challenging. We propose a new, brain-inspired variant of replay in which internal or hidden representations are replayed that are generated by the network's own, context-modulated feedback connections. Our method achieves state-of-the-art performance on challenging continual learning benchmarks (e.g., class-incremental learning on CIFAR-100) without storing data, and it provides a novel model for replay in the brain.
Complementary Learning for Overcoming Catastrophic Forgetting Using Experience Replay by Mohammad Rostami, Soheil Kolouri and Praveen K. Pilly. arXiv, 2019.
@article{rostami2019,
annotation = {_eprint: 1903.04566},
author = {Rostami, Mohammad and Kolouri, Soheil and Pilly, Praveen K},
journal = {arXiv},
title = {Complementary Learning for Overcoming Catastrophic Forgetting Using Experience Replay},
url = {http://arxiv.org/abs/1903.04566},
year = {2019}
}
Despite huge success, deep networks are unable to learn effectively in sequential multitask learning settings as they forget the past learned tasks after learning new tasks. Inspired by complementary learning systems theory, we address this challenge by learning a generative model that couples the current task to the past learned tasks through a discriminative embedding space. We learn an abstract level generative distribution in the embedding that allows the generation of data points to represent the experience. We sample from this distribution and utilize experience replay to avoid forgetting and simultaneously accumulate new knowledge to the abstract distribution in order to couple the current task with past experience. We demonstrate theoretically and empirically that our framework learns a distribution in the embedding that is shared across all tasks and as a result tackles catastrophic forgetting.
Continual Learning of New Sound Classes Using Generative Replay by Zhepei Wang, Cem Subakan, Efthymios Tzinis, Paris Smaragdis and Laurent Charlin. arXiv, 2019. [audio]
@article{wang2019,
author = {Wang, Zhepei and Subakan, Cem and Tzinis, Efthymios and Smaragdis, Paris and Charlin, Laurent},
journal = {arXiv},
keywords = {[audio],audio,Computer Science - Machine Learning,Computer Science - Sound,Electrical Engineering and Systems Science - Audio,sequence,sequences,Statistics - Machine Learning,time series},
note = {arXiv: 1906.00654},
title = {Continual Learning of New Sound Classes Using Generative Replay},
url = {http://arxiv.org/abs/1906.00654},
year = {2019}
}
Continual learning consists in incrementally training a model on a sequence of datasets and testing on the union of all datasets. In this paper, we examine continual learning for the problem of sound classification, in which we wish to refine already trained models to learn new sound classes. In practice one does not want to maintain all past training data and retrain from scratch, but naively updating a model with new data(sets) results in a degradation of already learned tasks, which is referred to as "catastrophic forgetting." We develop a generative replay procedure for generating training audio spectrogram data, in place of keeping older training datasets. We show that incrementally refining a classifier with generative replay, using a generator that is only 4% of the size of all previous training data, matches the performance of refining the classifier while keeping 20% of all previous training data. We thus conclude that we can extend a trained sound classifier to learn new classes without having to keep previously used datasets.
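The training loop behind generative replay is compact. Below is a rough PyTorch sketch of a single classifier update; old_generator (with an illustrative decode method and latent_dim attribute), old_classifier and the 1:1 mixing of real and generated data are assumptions made for the example, not the authors' exact procedure.

import torch
import torch.nn.functional as F

def generative_replay_step(classifier, old_classifier, old_generator,
                           x_new, y_new, optimizer):
    # Generate pseudo-samples of previously learned classes with the frozen
    # generator and label them with the previous classifier.
    with torch.no_grad():
        z = torch.randn(x_new.size(0), old_generator.latent_dim)
        x_replay = old_generator.decode(z)
        y_replay = old_classifier(x_replay).argmax(dim=1)
    # Train on the mixture of new real data and replayed pseudo-data.
    x = torch.cat([x_new, x_replay])
    y = torch.cat([y_new, y_replay])
    loss = F.cross_entropy(classifier(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()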
Generative Replay with Feedback Connections as a General Strategy for Continual Learning by Gido M. van de Ven and Andreas S. Tolias. arXiv, 2018. [framework] [generative] [mnist]
@article{vandeven2018,
annotation = {_eprint: 1809.10635},
author = {van de Ven, Gido M. and Tolias, Andreas S.},
journal = {arXiv},
keywords = {[framework],[generative],[mnist]},
title = {Generative Replay with Feedback Connections as a General Strategy for Continual Learning},
url = {https://arxiv.org/abs/1809.10635},
year = {2018}
}
A major obstacle to developing artificial intelligence applications capable of true lifelong learning is that artificial neural networks quickly or catastrophically forget previously learned tasks when trained on a new one. Numerous methods for alleviating catastrophic forgetting are currently being proposed, but differences in evaluation protocols make it difficult to directly compare their performance. To enable more meaningful comparisons, here we identified three distinct scenarios for continual learning based on whether task identity is known and, if it is not, whether it needs to be inferred. Performing the split and permuted MNIST task protocols according to each of these scenarios, we found that regularization-based approaches (e.g., elastic weight consolidation) failed when task identity needed to be inferred. In contrast, generative replay combined with distillation (i.e., using class probabilities as "soft targets") achieved superior performance in all three scenarios. Addressing the issue of efficiency, we reduced the computational cost of generative replay by integrating the generative model into the main model by equipping it with generative feedback or backward connections. This Replay-through-Feedback approach substantially shortened training time with no or negligible loss in performance. We believe this to be an important first step towards making the powerful technique of generative replay scalable to real-world continual learning applications.
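The "soft targets" mentioned above are standard knowledge-distillation targets. A small sketch of the distillation loss typically applied to replayed inputs, with the temperature chosen arbitrarily for illustration:

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between the current model's predictions and the previous
    # model's softened class probabilities on the same (replayed) inputs.
    log_p = F.log_softmax(student_logits / temperature, dim=1)
    q = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * temperature ** 2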
Continual Learning with Deep Generative Replay by Hanul Shin, Jung Kwon Lee, Jaehong Kim and Jiwon Kim. Advances in Neural Information Processing Systems 30, 2990–2999, 2017. [mnist]
@inproceedings{shin2017,
author = {Shin, Hanul and Lee, Jung Kwon and Kim, Jaehong and Kim, Jiwon},
booktitle = {Advances in Neural Information Processing Systems 30},
editor = {Guyon, I and Luxburg, U V and Bengio, S and Wallach, H and Fergus, R and Vishwanathan, S and Garnett, R},
keywords = {[mnist]},
pages = {2990--2999},
publisher = {Curran Associates, Inc.},
title = {Continual Learning with Deep Generative Replay},
url = {http://papers.nips.cc/paper/6892-continual-learning-with-deep-generative-replay.pdf},
year = {2017}
}
Hybrid Methods¶
8 papers
In this section we collect all the papers introducing a continual learning strategy employing some hybrid methods, mixing different strategies.
Rehearsal-Free Continual Learning over Small Non-I.I.D. Batches by Vincenzo Lomonaco, Davide Maltoni and Lorenzo Pellegrini. CVPR Workshop on Continual Learning for Computer Vision, 246–247, 2020. [core50]
@inproceedings{lomonaco2020a,
author = {Lomonaco, Vincenzo and Maltoni, Davide and Pellegrini, Lorenzo},
booktitle = {CVPR Workshop on Continual Learning for Computer Vision},
keywords = {[core50]},
pages = {246--247},
title = {Rehearsal-Free Continual Learning over Small Non-I.I.D. Batches},
url = {https://openaccess.thecvf.com/content_CVPRW_2020/html/w15/Lomonaco_Rehearsal-Free_Continual_Learning_Over_Small_Non-I.I.D._Batches_CVPRW_2020_paper.html},
year = {2020}
}
Robotic vision is a field where continual learning can play a significant role. An embodied agent operating in a complex environment subject to frequent and unpredictable changes is required to learn and adapt continuously. In the context of object recognition, for example, a robot should be able to learn (without forgetting) objects of never before seen classes as well as improving its recognition capabilities as new instances of already known classes are discovered. Ideally, continual learning should be triggered by the availability of short videos of single objects and performed on-line on on-board hardware with fine-grained updates. In this paper, we introduce a novel continual learning protocol based on the CORe50 benchmark and propose two rehearsal-free continual learning techniques, CWR* and AR1*, that can learn effectively even in the challenging case of nearly 400 small non-i.i.d. incremental batches. In particular, our experiments show that AR1* can outperform other state-of-the-art rehearsal-free techniques by more than 15% accuracy in some cases, with a very light and constant computational and memory overhead across training batches.
Linear Mode Connectivity in Multitask and Continual Learning by Seyed Iman Mirzadeh, Mehrdad Farajtabar, Dilan Gorur, Razvan Pascanu and Hassan Ghasemzadeh. arXiv, 2020. [cifar] [experimental] [mnist]
@article{mirzadeh2020,
annotation = {_eprint: 2010.04495},
author = {Mirzadeh, Seyed Iman and Farajtabar, Mehrdad and Gorur, Dilan and Pascanu, Razvan and Ghasemzadeh, Hassan},
journal = {arXiv},
keywords = {[cifar],[experimental],[mnist]},
note = {The authors observe how minima of CL and multitask algorithms lie in a linear subspace (when sharing initialization). They use this argument to build a CL strategy which forces minima to stay on the same subspace, using also small replay memories.},
title = {Linear Mode Connectivity in Multitask and Continual Learning},
url = {https://github.com/imirzadeh/MC-SGD http://arxiv.org/abs/2010.04495},
year = {2020}
}
Continual (sequential) training and multitask (simultaneous) training are often attempting to solve the same overall objective: to find a solution that performs well on all considered tasks. The main difference is in the training regimes, where continual learning can only have access to one task at a time, which for neural networks typically leads to catastrophic forgetting. That is, the solution found for a subsequent task does not perform well on the previous ones anymore. However, the relationship between the different minima that the two training regimes arrive at is not well understood. What sets them apart? Is there a local structure that could explain the difference in performance achieved by the two different schemes? Motivated by recent work showing that different minima of the same task are typically connected by very simple curves of low error, we investigate whether multitask and continual solutions are similarly connected. We empirically find that indeed such connectivity can be reliably achieved and, more interestingly, it can be done by a linear path, conditioned on having the same initialization for both. We thoroughly analyze this observation and discuss its significance for the continual learning process. Furthermore, we exploit this finding to propose an effective algorithm that constrains the sequentially learned minima to behave as the multitask solution. We show that our method outperforms several state of the art continual learning algorithms on various vision benchmarks.
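A simple way to probe the linear connectivity claim is to evaluate the loss along the straight line between two sets of parameters. A hedged PyTorch sketch, assuming model_a and model_b share the same architecture; non-floating-point buffers (e.g. batch-norm counters) are left untouched:

import copy
import torch

def loss_along_linear_path(model_a, model_b, loss_fn, batch, n_points=11):
    # Evaluate the loss at evenly spaced points on the segment between the
    # parameters of model_a and model_b.
    x, y = batch
    probe = copy.deepcopy(model_a)
    sa, sb = model_a.state_dict(), model_b.state_dict()
    losses = []
    for alpha in torch.linspace(0.0, 1.0, n_points):
        interp = {k: ((1 - alpha) * v + alpha * sb[k]) if v.is_floating_point() else v
                  for k, v in sa.items()}
        probe.load_state_dict(interp)
        with torch.no_grad():
            losses.append(loss_fn(probe(x), y).item())
    return losses   # low loss everywhere suggests linear mode connectivity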
Single-Net Continual Learning with Progressive Segmented Training (PST) by Xiaocong Du, Gouranga Charan, Frank Liu and Yu Cao. arXiv, 1629–1636, 2019. [cifar]
@article{du2019,
annotation = {_eprint: 1905.11550},
author = {Du, Xiaocong and Charan, Gouranga and Liu, Frank and Cao, Yu},
doi = {10.1109/ICMLA.2019.00267},
isbn = {9781728145501},
journal = {arXiv},
keywords = {[cifar]},
number = {2},
pages = {1629--1636},
title = {Single-Net Continual Learning with Progressive Segmented Training (PST)},
url = {http://arxiv.org/abs/1905.11550},
year = {2019}
}
There is an increasing need for continual learning in dynamic systems, such as the self-driving vehicle, the surveillance drone, and the robotic system. Such a system requires learning from the data stream, training the model to preserve previous information and adapt to a new task, and generating a single-headed vector for future inference. Different from previous approaches with dynamic structures, this work focuses on a single network and model segmentation to prevent catastrophic forgetting. Leveraging the redundant capacity of a single network, model parameters for each task are separated into two groups: one important group which is frozen to preserve current knowledge, and a secondary group to be saved (not pruned) for future learning. A fixed-size memory containing a small amount of previously seen data is further adopted to assist the training. Without additional regularization, the simple yet effective approach of PST successfully incorporates multiple tasks and achieves the state-of-the-art accuracy in the single-head evaluation on CIFAR-10 and CIFAR-100 datasets. Moreover, the segmented training significantly improves computation efficiency in continual learning.
Continuous Learning in Single-Incremental-Task Scenarios by Davide Maltoni and Vincenzo Lomonaco. Neural Networks, 56–73, 2019. [core50] [framework]
@article{maltoni2019,
author = {Maltoni, Davide and Lomonaco, Vincenzo},
journal = {Neural Networks},
keywords = {[core50],[framework],Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Rec,Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computi,Continuous learning,Deep learning,ewc,Incremental class learning,incremental task,Lifelong learning,Object recognition,review,Single-incremental-task,Statistics - Machine Learning},
language = {en},
note = {Comment: 26 pages, 13 figures; v3: major revision (e.g. added Sec. 4.4), several typos and minor mistakes corrected arXiv: 1806.08568},
pages = {56--73},
title = {Continuous Learning in Single-Incremental-Task Scenarios},
url = {http://arxiv.org/abs/1806.08568},
volume = {116},
year = {2019}
}
It was recently shown that architectural, regularization and rehearsal strategies can be used to train deep models sequentially on a number of disjoint tasks without forgetting previously acquired knowledge. However, these strategies are still unsatisfactory if the tasks are not disjoint but constitute a single incremental task (e.g., class-incremental learning). In this paper we point out the differences between multi-task and single-incremental-task scenarios and show that well-known approaches such as LWF, EWC and SI are not ideal for incremental task scenarios. A new approach, denoted as AR1, combining architectural and regularization strategies is then specifically proposed. AR1 overhead (in terms of memory and computation) is very small thus making it suitable for online learning. When tested on CORe50 and iCIFAR-100, AR1 outperformed existing regularization strategies by a good margin.
Toward Training Recurrent Neural Networks for Lifelong Learning by Shagun Sodhani, Sarath Chandar and Yoshua Bengio. Neural Computation, 1–35, 2019. [rnn]
@article{sodhani2019,
author = {Sodhani, Shagun and Chandar, Sarath and Bengio, Yoshua},
doi = {10.1162/neco_a_01246},
issn = {0899-7667},
journal = {Neural Computation},
keywords = {[rnn]},
number = {1},
pages = {1--35},
publisher = {MIT Press},
title = {Toward Training Recurrent Neural Networks for Lifelong Learning},
url = {https://doi.org/10.1162/neco_a_01246},
urldate = {2021-01-06},
volume = {32},
year = {2019}
}
Catastrophic forgetting and capacity saturation are the central challenges of any parametric lifelong learning system. In this work, we study these challenges in the context of sequential supervised learning with an emphasis on recurrent neural networks. To evaluate the models in the lifelong learning setting, we propose a curriculum-based, simple, and intuitive benchmark where the models are trained on tasks with increasing levels of difficulty. To measure the impact of catastrophic forgetting, the model is tested on all the previous tasks as it completes any task. As a step toward developing true lifelong learning systems, we unify gradient episodic memory (a catastrophic forgetting alleviation approach) and Net2Net (a capacity expansion approach). Both models are proposed in the context of feedforward networks, and we evaluate the feasibility of using them for recurrent networks. Evaluation on the proposed benchmark shows that the unified model is more suitable than the constituent models for lifelong learning setting.
Continual Learning of New Sound Classes Using Generative Replay by Zhepei Wang, Cem Subakan, Efthymios Tzinis, Paris Smaragdis and Laurent Charlin. arXiv, 2019. [audio]
@article{wang2019,
author = {Wang, Zhepei and Subakan, Cem and Tzinis, Efthymios and Smaragdis, Paris and Charlin, Laurent},
journal = {arXiv},
keywords = {[audio],audio,Computer Science - Machine Learning,Computer Science - Sound,Electrical Engineering and Systems Science - Audio,sequence,sequences,Statistics - Machine Learning,time series},
note = {arXiv: 1906.00654},
title = {Continual Learning of New Sound Classes Using Generative Replay},
url = {http://arxiv.org/abs/1906.00654},
year = {2019}
}
Continual learning consists in incrementally training a model on a sequence of datasets and testing on the union of all datasets. In this paper, we examine continual learning for the problem of sound classification, in which we wish to refine already trained models to learn new sound classes. In practice one does not want to maintain all past training data and retrain from scratch, but naively updating a model with new data(sets) results in a degradation of already learned tasks, which is referred to as "catastrophic forgetting." We develop a generative replay procedure for generating training audio spectrogram data, in place of keeping older training datasets. We show that incrementally refining a classifier with generative replay, using a generator that is only 4% of the size of all previous training data, matches the performance of refining the classifier while keeping 20% of all previous training data. We thus conclude that we can extend a trained sound classifier to learn new classes without having to keep previously used datasets.
Lifelong Learning via Progressive Distillation and Retrospection by Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang and Dahua Lin. ECCV, 2018. [imagenet] [vision]
@inproceedings{hou2018,
author = {Hou, Saihui and Pan, Xinyu and Loy, Chen Change and Wang, Zilei and Lin, Dahua},
booktitle = {ECCV},
doi = {10.1007/978-3-030-01219-9_27},
isbn = {978-3-030-01218-2},
issn = {16113349},
keywords = {[imagenet],[vision],Knowledge distillation,Lifelong learning,Retrospection},
title = {Lifelong Learning via Progressive Distillation and Retrospection},
url = {http://link.springer.com/10.1007/978-3-030-01219-9_27},
year = {2018}
}
Lifelong learning aims at adapting a learned model to new tasks while retaining the knowledge gained earlier. A key challenge for lifelong learning is how to strike a balance between the preservation on old tasks and the adaptation to a new one within a given model. Approaches that combine both objectives in training have been explored in previous works. Yet the performance still suffers from considerable degradation in a long sequence of tasks. In this work, we propose a novel approach to lifelong learning, which tries to seek a better balance between preservation and adaptation via two techniques: Distillation and Retrospection. Specifically, the target model adapts to the new task by knowledge distillation from an intermediate expert, while the previous knowledge is more effectively preserved by caching a small subset of data for old tasks. The combination of Distillation and Retrospection leads to a more gentle learning curve for the target model, and extensive experiments demonstrate that our approach can bring consistent improvements on both old and new tasks.
Progress & Compress: A Scalable Framework for Continual Learning by Jonathan Schwarz, Wojciech Czarnecki, Jelena Luketina, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu and Raia Hadsell. International Conference on Machine Learning, 4528–4537, 2018. [vision]
@inproceedings{schwarz2018,
author = {Schwarz, Jonathan and Czarnecki, Wojciech and Luketina, Jelena and Grabska-Barwinska, Agnieszka and Teh, Yee Whye and Pascanu, Razvan and Hadsell, Raia},
booktitle = {International Conference on Machine Learning},
keywords = {[vision],ewc,normalized ewc,online ewc},
language = {en},
pages = {4528--4537},
shorttitle = {Progress & Compress},
title = {Progress & Compress: A Scalable Framework for Continual Learning},
url = {http://proceedings.mlr.press/v80/schwarz18a.html},
year = {2018}
}
We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to ...
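Progress & Compress consolidates its knowledge base with an online variant of elastic weight consolidation (see the ewc keywords above). The following PyTorch sketch only illustrates that regularizer in general form, with the decay factor and weighting chosen for illustration rather than taken from the authors' implementation:

import torch

def online_ewc_penalty(model, fisher, anchor_params, lam=1.0):
    # Quadratic penalty keeping parameters near the previously consolidated
    # solution, weighted by a running Fisher information estimate.
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - anchor_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

def update_fisher(fisher, model, gamma=0.95):
    # Online Fisher update: decay the old estimate and add squared gradients
    # from the task just learned (call after a backward pass).
    for name, p in model.named_parameters():
        if p.grad is not None:
            fisher[name] = gamma * fisher.get(name, torch.zeros_like(p)) + p.grad.detach() ** 2
    return fisher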
Meta Continual Learning¶
8 papers
In this section we list all the papers related to meta-continual learning.
Learning to Continually Learn by Shawn Beaulieu, Lapo Frati, Thomas Miconi, Joel Lehman, Kenneth O. Stanley, Jeff Clune and Nick Cheney. ECAI, 2020. [vision]
@inproceedings{beaulieu2020,
annotation = {_eprint: 2002.09571},
author = {Beaulieu, Shawn and Frati, Lapo and Miconi, Thomas and Lehman, Joel and Stanley, Kenneth O. and Clune, Jeff and Cheney, Nick},
booktitle = {ECAI},
keywords = {[vision]},
title = {Learning to Continually Learn},
url = {http://arxiv.org/abs/2002.09571},
year = {2020}
}
Continual lifelong learning requires an agent or model to learn many sequentially ordered tasks, building on previous knowledge without catastrophically forgetting it. Much work has gone towards preventing the default tendency of machine learning models to catastrophically forget, yet virtually all such work involves manually-designed solutions to the problem. We instead advocate meta-learning a solution to catastrophic forgetting, allowing AI to learn to continually learn. Inspired by neuromodulatory processes in the brain, we propose A Neuromodulated Meta-Learning Algorithm (ANML). It differentiates through a sequential learning process to meta-learn an activation-gating function that enables context-dependent selective activation within a deep neural network. Specifically, a neuromodulatory (NM) neural network gates the forward pass of another (otherwise normal) neural network called the prediction learning network (PLN). The NM network also thus indirectly controls selective plasticity (i.e. the backward pass of) the PLN. ANML enables continual learning without catastrophic forgetting at scale: it produces state-of-the-art continual learning performance, sequentially learning as many as 600 classes (over 9,000 SGD updates).
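The gating mechanism described above is easy to picture with a toy model: a neuromodulatory network outputs a 0–1 mask that multiplies the prediction network's hidden features, so only a context-dependent subset of units fires (and is updated) for a given input. Layer sizes below are illustrative, not those of ANML:

import torch
from torch import nn

class GatedPredictionNet(nn.Module):
    def __init__(self, in_dim=64, hidden=128, n_classes=10):
        super().__init__()
        self.nm = nn.Sequential(nn.Linear(in_dim, hidden), nn.Sigmoid())     # neuromodulatory net
        self.pln_body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # prediction learning net
        self.pln_head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        gate = self.nm(x)                 # context-dependent selective activation
        h = self.pln_body(x) * gate       # gated forward pass of the PLN
        return self.pln_head(h)

model = GatedPredictionNet()
print(model(torch.randn(4, 64)).shape)    # torch.Size([4, 10])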
Continual Learning with Deep Artificial Neurons by Blake Camp, Jaya Krishna Mandivarapu and Rolando Estrada. arXiv, 2020. [experimental]
@article{camp2020,
annotation = {_eprint: 2011.07035},
author = {Camp, Blake and Mandivarapu, Jaya Krishna and Estrada, Rolando},
journal = {arXiv},
keywords = {[experimental]},
note = {The authors replace each neuron of a standard feedforward network with a small neural network with its own parameters, meta-learned and shared throughout the whole network. They experiment with regression on sine waves.},
title = {Continual Learning with Deep Artificial Neurons},
url = {http://arxiv.org/abs/2011.07035},
year = {2020}
}
Neurons in real brains are enormously complex computational units. Among other things, they're responsible for transforming inbound electro-chemical vectors into outbound action potentials, updating the strengths of intermediate synapses, regulating their own internal states, and modulating the behavior of other nearby neurons. One could argue that these cells are the only things exhibiting any semblance of real intelligence. It is odd, therefore, that the machine learning community has, for so long, relied upon the assumption that this complexity can be reduced to a simple sum and fire operation. We ask, might there be some benefit to substantially increasing the computational power of individual neurons in artificial systems? To answer this question, we introduce Deep Artificial Neurons (DANs), which are themselves realized as deep neural networks. Conceptually, we embed DANs inside each node of a traditional neural network, and we connect these neurons at multiple synaptic sites, thereby vectorizing the connections between pairs of cells. We demonstrate that it is possible to meta-learn a single parameter vector, which we dub a neuronal phenotype, shared by all DANs in the network, which facilitates a meta-objective during deployment. Here, we isolate continual learning as our meta-objective, and we show that a suitable neuronal phenotype can endow a single network with an innate ability to update its synapses with minimal forgetting, using standard backpropagation, without experience replay, nor separate wake/sleep phases. We demonstrate this ability on sequential non-linear regression tasks.
Meta-Consolidation for Continual Learning by K J Joseph and Vineeth N Balasubramanian. NeurIPS, 2020. [bayes] [cifar] [imagenet] [mnist]
@inproceedings{joseph2020,
annotation = {_eprint: 2010.00352},
author = {Joseph, K J and Balasubramanian, Vineeth N},
booktitle = {NeurIPS},
keywords = {[bayes],[cifar],[imagenet],[mnist]},
note = {The authors leverage a Bayesian framework in which the parameters of a model are sampled from a generating distribution. This distribution, parameterized by a task label, is used together with a VAE to consolidate previous and current knowledge online. Inference does not require task labels and exploits an ensemble of models sampled from the generating distribution.},
title = {Meta-Consolidation for Continual Learning},
url = {http://arxiv.org/abs/2010.00352},
year = {2020}
}
The ability to continuously learn and adapt itself to new tasks, without losing grasp of already acquired knowledge, is a hallmark of biological learning systems, which current deep learning systems fall short of. In this work, we present a novel methodology for continual learning called MERLIN: Meta-Consolidation for Continual Learning. We assume that weights of a neural network $\boldsymbol{\psi}$, for solving task $\boldsymbol{t}$, come from a meta-distribution $p(\boldsymbol{\psi}|t)$. This meta-distribution is learned and consolidated incrementally. We operate in the challenging online continual learning setting, where a data point is seen by the model only once. Our experiments with continual learning benchmarks of MNIST, CIFAR-10, CIFAR-100 and Mini-ImageNet datasets show consistent improvement over five baselines, including a recent state-of-the-art, corroborating the promise of MERLIN.
Meta Continual Learning via Dynamic Programming by R Krishnan and Prasanna Balaprakash. arXiv, 2020. [omniglot]
@article{krishnan2020,
annotation = {_eprint: 2008.02219},
author = {Krishnan, R and Balaprakash, Prasanna},
journal = {arXiv},
keywords = {[omniglot]},
title = {Meta Continual Learning via Dynamic Programming},
url = {https://arxiv.org/abs/2008.02219},
year = {2020}
}
Meta-continual learning algorithms seek to rapidly train a model when faced with similar tasks sampled sequentially from a task distribution. Although impressive strides have been made in this area, there is no theoretical framework that enables systematic analysis of key learning challenges, such as generalization and catastrophic forgetting. We introduce a new theoretical framework for meta-continual learning using dynamic programming, analyze generalization and catastrophic forgetting, and establish conditions of optimality. We show that existing meta-continual learning methods can be derived from the proposed dynamic programming framework. Moreover, we develop a new dynamic-programming-based meta-continual approach that adopts a stochastic-gradient-driven alternating optimization method. We show that, on meta-continual learning benchmark data sets, our theoretically grounded meta-continual learning approach is better than or comparable to the purely empirical strategies adopted by the existing state-of-the-art methods.
Online Meta-Learning by Chelsea Finn, Aravind Rajeswaran, Sham Kakade and Sergey Levine. ICML, 2019. [experimental] [mnist]
@inproceedings{finn2019,
author = {Finn, Chelsea and Rajeswaran, Aravind and Kakade, Sham and Levine, Sergey},
booktitle = {ICML},
keywords = {[experimental],[mnist]},
note = {This paper focuses on meta learning a stream of tasks. As for most of the online learning literature the focus is not on catastrophic forgetting, which is not taken into consideration, but on forward / backward transfer and few-shot learning. It stores a replay buffer for each task in order to meta-optimize in the outer loop.},
title = {Online Meta-Learning},
url = {http://proceedings.mlr.press/v97/finn19a/finn19a.pdf},
year = {2019}
}
A central capability of intelligent systems is the ability to continuously build upon previous experiences to speed up and enhance learning of new tasks. Two distinct research paradigms have studied this question. Meta-learning views this problem as learning a prior over model parameters that is amenable for fast adaptation on a new task, but typically assumes the tasks are available together as a batch. In contrast, online (regret based) learning considers a setting where tasks are revealed one after the other, but conventionally trains a single model without task-specific adaptation. This work introduces an online meta-learning setting, which merges ideas from both paradigms to better capture the spirit and practice of continual lifelong learning. We propose the follow the meta leader (FTML) algorithm which extends the MAML algorithm to this setting. Theoretically, this work provides an O(log T) regret guarantee with one additional higher order smoothness assumption (in comparison to the standard online setting). Our experimental evaluation on three different large-scale problems suggest that the proposed algorithm significantly outperforms alternatives based on traditional online learning approaches.
Meta-Learning Representations for Continual Learning by Khurram Javed and Martha White. NeurIPS, 2019. [omniglot]
@inproceedings{javed2019,
author = {Javed, Khurram and White, Martha},
booktitle = {NeurIPS},
keywords = {[omniglot]},
title = {Meta-Learning Representations for Continual Learning},
url = {http://papers.nips.cc/paper/8458-meta-learning-representations-for-continual-learning},
year = {2019}
}
A continual learning agent should be able to build on top of existing knowledge to learn on new data quickly while minimizing forgetting. Current intelligent systems based on neural network function approximators arguably do the opposite – they are highly prone to forgetting and rarely trained to facilitate future learning. One reason for this poor behavior is that they learn from a representation that is not explicitly trained for these two goals. In this paper, we propose OML, an objective that directly minimizes catastrophic interference by learning representations that accelerate future learning and are robust to forgetting under online updates in continual learning. We show that it is possible to learn naturally sparse representations that are more effective for online updating. Moreover, our algorithm is complementary to existing continual learning strategies, such as MER and GEM. Finally, we demonstrate that a basic online updating strategy on representations learned by OML is competitive with rehearsal based methods for continual learning.
Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference by Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu and Gerald Tesauro. ICLR, 2019. [mnist]
@inproceedings{riemer2019,
author = {Riemer, Matthew and Cases, Ignacio and Ajemian, Robert and Liu, Miao and Rish, Irina and Tu, Yuhai and Tesauro, Gerald},
booktitle = {ICLR},
keywords = {[mnist]},
title = {Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference},
url = {https://openreview.net/pdf?id=B1gTShAct7},
year = {2019}
}
Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.
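At its core, MER interleaves the incoming example with samples drawn from a small reservoir memory and then interpolates back toward the pre-update weights in a Reptile-style fashion, which encourages gradient alignment across examples. The sketch below is a compressed, first-order approximation of that procedure, not the paper's exact algorithm (reservoir sampling and the within-batch/across-batch schedule are omitted):

import copy
import random
import torch

def mer_step(model, loss_fn, optimizer, new_example, memory, beta=0.1, k=5):
    start = copy.deepcopy(model.state_dict())
    samples = random.sample(memory, min(k, len(memory))) + [new_example]
    for x, y in samples:                       # sequential SGD on memory + new example
        optimizer.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        optimizer.step()
    end = model.state_dict()
    # Reptile-style interpolation: move only a fraction beta of the way.
    mixed = {name: start[name] + beta * (end[name] - start[name])
             if start[name].is_floating_point() else end[name]
             for name in start}
    model.load_state_dict(mixed)
    memory.append(new_example)                 # reservoir sampling omitted for brevity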
Meta Continual Learning by Risto Vuorio, Dong-Yeon Cho, Daejoong Kim and Jiwon Kim. arXiv, 2018. [mnist]
@article{vuorio2018,
annotation = {_eprint: 1806.06928},
author = {Vuorio, Risto and Cho, Dong-Yeon and Kim, Daejoong and Kim, Jiwon},
journal = {arXiv},
keywords = {[mnist]},
title = {Meta Continual Learning},
url = {https://arxiv.org/abs/1806.06928},
year = {2018}
}
Using neural networks in practical settings would benefit from the ability of the networks to learn new tasks throughout their lifetimes without forgetting the previous tasks. This ability is limited in the current deep neural networks by a problem called catastrophic forgetting, where training on new tasks tends to severely degrade performance on previous tasks. One way to lessen the impact of the forgetting problem is to constrain parameters that are important to previous tasks to stay close to the optimal parameters. Recently, multiple competitive approaches for computing the importance of the parameters with respect to the previous tasks have been presented. In this paper, we propose a learning to optimize algorithm for mitigating catastrophic forgetting. Instead of trying to formulate a new constraint function ourselves, we propose to train another neural network to predict parameter update steps that respect the importance of parameters to the previous tasks. In the proposed meta-training scheme, the update predictor is trained to minimize loss on a combination of current and past tasks. We show experimentally that the proposed approach works in the continual learning setting.
Metrics and Evaluations¶
6 papers
In this section we list all the papers related to continual learning evaluation protocols and metrics.
Online Fast Adaptation and Knowledge Accumulation: A New Approach to Continual Learning by Massimo Caccia, Pau Rodriguez, Oleksiy Ostapenko, Fabrice Normandin, Min Lin, Lucas Caccia, Issam Laradji, Irina Rish, Alexande Lacoste, David Vazquez and Laurent Charlin. arXiv, 2020. [fashion] [framework] [mnist]
@article{caccia2020,
annotation = {_eprint: 2003.05856},
author = {Caccia, Massimo and Rodriguez, Pau and Ostapenko, Oleksiy and Normandin, Fabrice and Lin, Min and Caccia, Lucas and Laradji, Issam and Rish, Irina and Lacoste, Alexande and Vazquez, David and Charlin, Laurent},
journal = {arXiv},
keywords = {[fashion],[framework],[mnist],Computer Science - Artificial Intelligence,Computer Science - Machine Learning,continual meta learning,framework,MAML,meta continual learning,OSAKA},
note = {arXiv: 2003.05856},
title = {Online Fast Adaptation and Knowledge Accumulation: A New Approach to Continual Learning},
url = {http://arxiv.org/abs/2003.05856},
year = {2020}
}
Learning from non-stationary data remains a great challenge for machine learning. Continual learning addresses this problem in scenarios where the learning agent faces a stream of changing tasks. In these scenarios, the agent is expected to retain its highest performance on previous tasks without revisiting them while adapting well to the new tasks. Two continual-learning scenarios have recently been proposed. In meta-continual learning, the model is pre-trained to minimize catastrophic forgetting when trained on a sequence of tasks. In continual-meta learning, the goal is faster remembering, i.e., focusing on how quickly the agent recovers performance rather than measuring the agent's performance without any adaptation. Both scenarios have the potential to propel the field forward. Yet in their original formulations, they each have limitations. As a remedy, we propose a more general scenario where an agent must quickly solve (new) out-of-distribution tasks, while also requiring fast remembering. We show that current continual learning, meta learning, meta-continual learning, and continual-meta learning techniques fail in this new scenario. Accordingly, we propose a strong baseline: Continual-MAML, an online extension of the popular MAML algorithm. In our empirical experiments, we show that our method is better suited to the new scenario than the methodologies mentioned above, as well as standard continual learning and meta learning approaches.
N.A.
2020Optimal Continual Learning Has Perfect Memory and Is NP-HARD by Jeremias Knoblauch, Hisham Husain and Tom Diethe. ICML, 2020. [theoretical]
@inproceedings{knoblauch2020,
author = {Knoblauch, Jeremias and Husain, Hisham and Diethe, Tom},
booktitle = {ICML},
keywords = {[theoretical]},
note = {In this paper, optimal continual learning, which perfectly solves all tasks without forgetting, is proved to be NP-hard. Moreover, memorization of previous information is shown to be necessary, establishing the superiority of replay-based strategies over regularization-based strategies.},
title = {Optimal Continual Learning Has Perfect Memory and Is NP-HARD},
url = {https://proceedings.icml.cc/paper/2020/file/274ad4786c3abca69fa097b85867d9a4-Paper.pdf},
year = {2020}
}
Continual Learning (CL) algorithms incrementally learn a predictor or representation across multiple sequentially observed tasks. Designing CL algorithms that perform reliably and avoid so-called catastrophic forgetting has proven a persistent challenge. The current paper develops a theoretical approach that explains why. In particular, we derive the computational properties which CL algorithms would have to possess in order to avoid catastrophic forgetting. Our main finding is that such optimal CL algorithms generally solve an NP-HARD problem and will require perfect memory to do so. The findings are of theoretical interest, but also explain the excellent performance of CL algorithms using experience replay, episodic memory and core sets relative to regularization-based approaches.
N.A.
2020Regularization Shortcomings for Continual Learning by Timothée Lesort, Andrei Stoian and David Filliat. arXiv, 2020. [fashion] [mnist]
@article{lesort2020b,
author = {Lesort, Timothée and Stoian, Andrei and Filliat, David},
journal = {arXiv},
keywords = {[fashion],[mnist],class incremental,Computer Science - Machine Learning,regularization,Statistics - Machine Learning},
note = {arXiv: 1912.03049},
title = {Regularization Shortcomings for Continual Learning},
url = {http://arxiv.org/abs/1912.03049},
year = {2020}
}
In most machine learning algorithms, training data are assumed to be independent and identically distributed (iid). Otherwise, the algorithms' performance is challenged. A famous phenomenon arising with non-iid data distributions is known as catastrophic forgetting. Algorithms dealing with it are gathered in the Continual Learning research field. In this article, we study regularization-based approaches to continual learning. We show that those approaches cannot learn to discriminate classes from different tasks in an elemental continual benchmark: the class-incremental setting. We provide theoretical reasoning to prove this shortcoming and illustrate it with examples and experiments.
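For context, the regularization family analyzed here anchors the parameters to their previous-task values with an importance-weighted quadratic penalty (EWC being the canonical example); a minimal sketch of such a loss is shown below, with the diagonal importance estimate assumed to be supplied by the caller. The paper's point is that no penalty of this form, on its own, lets a single-head classifier separate classes that were never observed together in the class-incremental setting.
import torch.nn.functional as F

def regularized_loss(model, logits, targets, old_params, importance, lam=100.0):
    # old_params / importance: dicts of tensors saved after the previous tasks
    # (importance is, e.g., a diagonal Fisher estimate computed elsewhere).
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in old_params:
            penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return F.cross_entropy(logits, targets) + (lam / 2.0) * penalty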
N.A.
2020Strategies for Improving Single-Head Continual Learning Performance by Alaa El Khatib and Fakhri Karray. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 452–460, 2019. [cifar] [mnist]
@incollection{elkhatib2019,
author = {El Khatib, Alaa and Karray, Fakhri},
booktitle = {Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
doi = {10.1007/978-3-030-27202-9_41},
isbn = {978-3-030-27201-2},
issn = {16113349},
keywords = {[cifar],[mnist],Catastrophic forgetting,Continual learning},
pages = {452--460},
publisher = {Springer Verlag},
title = {Strategies for Improving Single-Head Continual Learning Performance},
url = {http://link.springer.com/10.1007/978-3-030-27202-9_41},
volume = {11662 LNCS},
year = {2019}
}
Catastrophic forgetting has long been seen as the main obstacle to building continual learning models. We argue in this paper that an equally challenging characteristic of the continual learning framework is that data are never completely available at the same time, making it difficult to learn joint conditional distributions over them. This is most evident in the usually large gap between single-head and multi-head performance of continual learning models. We propose in this paper two strategies to improve performance of continual learning models, particularly in the single-head framework and for image classification tasks. First, we argue that learning multiple binary classifiers, rather than a single multi-class classifier, for each presentation of data is more consistent with the single-head framework. Moreover, we argue that auxiliary, unlabelled data can be used in tandem with this approach to slow the decay in performance of these binary classifiers over time.
N.A.
2019Towards Robust Evaluations of Continual Learning by Sebastian Farquhar and Yarin Gal. Privacy in Machine Learning and Artificial Intelligence Workshop, ICML, 2019. [fashion] [framework]
@inproceedings{farquhar2019,
author = {Farquhar, Sebastian and Gal, Yarin},
booktitle = {Privacy in Machine Learning and Artificial Intelligence Workshop, ICML},
keywords = {[fashion],[framework],Computer Science - Machine Learning,critique,evaluation,metrics,Statistics - Machine Learning},
note = {arXiv: 1805.09733},
title = {Towards Robust Evaluations of Continual Learning},
url = {http://arxiv.org/abs/1805.09733},
year = {2019}
}
Experiments used in current continual learning research do not faithfully assess fundamental challenges of learning continually. Instead of assessing performance on challenging and representative experiment designs, recent research has focused on increased dataset difficulty, while still using flawed experiment set-ups. We examine standard evaluations and show why these evaluations make some continual learning approaches look better than they are. We introduce desiderata for continual learning evaluations and explain why their absence creates misleading comparisons. Based on our desiderata we then propose new experiment designs which we demonstrate with various continual learning approaches and datasets. Our analysis calls for a reprioritization of research effort by the community.
N.A.
2019Three Scenarios for Continual Learning by Gido M van de Ven and Andreas S Tolias. Continual Learning Workshop NeurIPS, 2018. [framework] [mnist]
@inproceedings{vandeven2018a,
author = {van de Ven, Gido M and Tolias, Andreas S},
booktitle = {Continual Learning Workshop NeurIPS},
keywords = {[framework],[mnist],Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Rec,Computer Science - Machine Learning,Statistics - Machine Learning},
note = {The authors clearly show how regularization approaches fail to perform well on Class-IL scenarios. They used an MLP for all experiments.},
title = {Three Scenarios for Continual Learning},
url = {http://arxiv.org/abs/1904.07734},
year = {2018}
}
Standard artificial neural networks suffer from the well-known issue of catastrophic forgetting, making continual or lifelong learning difficult for machine learning. In recent years, numerous methods have been proposed for continual learning, but due to differences in evaluation protocols it is difficult to directly compare their performance. To enable more structured comparisons, we describe three continual learning scenarios based on whether task identity is provided at test time and, if it is not, whether it must be inferred. Any sequence of well-defined tasks can be performed according to each scenario. Using the split and permuted MNIST task protocols, for each scenario we carry out an extensive comparison of recently proposed continual learning methods. We demonstrate substantial differences between the three scenarios in terms of difficulty and in terms of how efficient different methods are. In particular, when task identity must be inferred (i.e., class incremental learning), we find that regularization-based approaches (e.g., elastic weight consolidation) fail and that replaying representations of previous experiences seems required for solving this scenario.
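Concretely, the three scenarios differ only in what the learner is told and what it must predict at test time; the schematic helper below spells this out for split MNIST (five two-class tasks), one of the protocols used in the paper.
def split_mnist_target(task_id, digit, scenario):
    # Split MNIST: task t (t = 0..4) contains digits 2t and 2t+1.
    within_task_label = digit % 2              # label inside the task: 0 or 1
    if scenario == "task":      # Task-IL: task identity given, per-task 2-way head
        return {"task_id": task_id, "target": within_task_label, "output_units": 2}
    if scenario == "domain":    # Domain-IL: identity withheld, shared 2-way output
        return {"task_id": None, "target": within_task_label, "output_units": 2}
    if scenario == "class":     # Class-IL: identity must be inferred, 10-way output
        return {"task_id": None, "target": digit, "output_units": 10}
    raise ValueError("scenario must be 'task', 'domain' or 'class'")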
N.A.
2018
Neuroscience¶
5 papers
In this section we maintain a list of all the neuroscience papers that are related (and useful) to continual machine learning.
Can Sleep Protect Memories from Catastrophic Forgetting? by Oscar C Gonzalez, Yury Sokolov, Giri Krishnan and Maxim Bazhenov. bioRxiv, 569038, 2019.
@article{gonzalez2019,
author = {Gonzalez, Oscar C and Sokolov, Yury and Krishnan, Giri and Bazhenov, Maxim},
doi = {10.1101/569038},
journal = {bioRxiv},
keywords = {catastrophic,continual learning,memory consolidation,neural network,sleep},
pages = {569038},
title = {Can Sleep Protect Memories from Catastrophic Forgetting?},
url = {https://www.biorxiv.org/content/10.1101/569038v1},
year = {2019}
}
Previously encoded memories can be damaged by encoding of new memories, especially when they are relevant to the new data and hence can be disrupted by new training - a phenomenon called "catastrophic forgetting." Human and animal brains are capable of continual learning, allowing them to learn from past experience and to integrate newly acquired information with previously stored memories. A range of empirical data suggest an important role of sleep in the consolidation of recent memories and the protection of past knowledge from catastrophic forgetting. To explore potential mechanisms of how sleep can enable continual learning in neuronal networks, we developed a biophysically-realistic thalamocortical network model where we could train multiple memories with different degrees of interference. We found that in a wake-like state of the model, training of a "new" memory that overlaps with a previously stored "old" memory results in degradation of the old memory. Simulating an NREM sleep state immediately after new learning led to replay of both old and new memories - this protected the old memory from forgetting and ultimately enhanced both memories. The effect of sleep was similar to the interleaved training of the old and new memories. The study revealed that the network slow-wave oscillatory activity during simulated deep sleep leads to a complex reorganization of the synaptic connectivity matrix that maximizes separation between groups of synapses responsible for conflicting memories in the overlapping population of neurons. The study predicts that sleep may play a protective role against catastrophic forgetting and enables brain networks to undergo continual learning.
N.A.
2019Synaptic Consolidation: An Approach to Long-Term Learning by Claudia Clopath. Cognitive Neurodynamics, 251–257, 2012. [hebbian]
@article{clopath2012,
author = {Clopath, Claudia},
doi = {10.1007/s11571-011-9177-6},
isbn = {1871-4080 (Print)},
issn = {1871-4080},
journal = {Cognitive Neurodynamics},
keywords = {[hebbian],Behavior,Electrophysiology,Model,Review,Synaptic consolidation,Synaptic plasticity,Synaptic tagging},
number = {3},
pages = {251--257},
pmid = {23730356},
title = {Synaptic Consolidation: An Approach to Long-Term Learning},
url = {http://link.springer.com/10.1007/s11571-011-9177-6},
volume = {6},
year = {2012}
}
Synaptic plasticity is thought to be the basis of learning and memory, but it is mostly studied on the timescale of mere minutes. This review discusses synaptic consolidation, a process that enables synapses to retain their strength for a much longer time (days to years), instead of returning to their original value. The process involves specific plasticity-related proteins, and depends on the dopamine D1/D5 receptors. Here, we review the research on synaptic consolidation, describing electrophysiology experiments, recent modeling work, as well as behavioral correlates.
N.A.
2012The Organization of Behavior: A Neuropsychological Theory by D O Hebb. Lawrence Erlbaum, 2002. [hebbian]
@book{hebb2002,
author = {Hebb, D O},
isbn = {978-1-135-63191-8},
journal = {Lawrence Erlbaum},
keywords = {[hebbian],Psychology / Cognitive Psychology & Cognition,Psychology / General,Psychology / Neuropsychology,Psychology / Physiological Psychology},
language = {en},
publisher = {Psychology Press},
shorttitle = {The Organization of Behavior},
title = {The Organization of Behavior: A Neuropsychological Theory},
url = {https://www.amazon.com/Organization-Behavior-Neuropsychological-Theory/dp/0805843000 https://books.google.it/books/about/The_Organization_of_Behavior.html?id=ddB4AgAAQBAJ&printsec=frontcover&source=kp_read_button&redir_esc=y#v=onepage&q&f=false},
year = {2002}
}
Since its publication in 1949, D.O. Hebb's The Organization of Behavior has been one of the most influential books in the fields of psychology and neuroscience. However, the original edition has been unavailable since 1966, ensuring that Hebb's comment that a classic normally means "cited but not read" is true in his case. This new edition rectifies a long-standing problem for behavioral neuroscientists: the inability to obtain one of the most cited publications in the field. The Organization of Behavior played a significant part in stimulating the investigation of the neural foundations of behavior and continues to be inspiring because it provides a general framework for relating behavior to synaptic organization through the dynamics of neural networks. D.O. Hebb was also the first to examine the mechanisms by which environment and experience can influence brain structure and function, and his ideas formed the basis for work on enriched environments as stimulants for behavioral development. References to Hebb, the Hebbian cell assembly, the Hebb synapse, and the Hebb rule increase each year. These forceful ideas of 1949 are now applied in engineering, robotics, and computer science, as well as neurophysiology, neuroscience, and psychology, a tribute to Hebb's foresight in developing a foundational neuropsychological theory of the organization of behavior.
N.A.
2002Negative Transfer Errors in Sequential Cognitive Skills: Strong-but-Wrong Sequence Application. by Dan J. Woltz, Michael K. Gardner and Brian G. Bell. Journal of Experimental Psychology: Learning, Memory, and Cognition, 601–625, 2000.
@article{woltz2000,
author = {Woltz, Dan J. and Gardner, Michael K. and Bell, Brian G.},
doi = {10.1037/0278-7393.26.3.601},
issn = {1939-1285},
journal = {Journal of Experimental Psychology: Learning, Memory, and Cognition},
language = {eng},
number = {3},
pages = {601--625},
shorttitle = {Negative Transfer Errors in Sequential Cognitive s},
title = {Negative Transfer Errors in Sequential Cognitive Skills: Strong-but-Wrong Sequence Application.},
url = {http://doi.apa.org/getdoi.cfm?doi=10.1037/0278-7393.26.3.601},
volume = {26},
year = {2000}
}
Three experiments investigated the role of processing sequence knowledge in negative transfer within multistep cognitive skills. In Experiments 1 and 2, more training resulted in higher error rates when new processing sequences that resembled familiar ones were introduced in transfer. Transfer error responses were executed with the same speed as correct responses to familiar sequence trials, and the errors appeared to be undetected by the performers. Experiment 3 tested whether the effects of sequence learning were attributable to explicit or implicit knowledge of processing sequences. Evidence favored the implicit learning interpretation. Findings are discussed in relationship to earlier demonstrations of the einstellung effect and to current taxonomic theories of human error.
N.A.
2000Connectionist Models of Recognition Memory: Constraints Imposed by Learning and Forgetting Functions. by R Ratcliff. Psychological review, 285–308, 1990.
@article{ratcliff1990,
author = {Ratcliff, R},
issn = {0033-295X},
journal = {Psychological review},
number = {2},
pages = {285--308},
pmid = {2186426},
title = {Connectionist Models of Recognition Memory: Constraints Imposed by Learning and Forgetting Functions.},
url = {http://www.ncbi.nlm.nih.gov/pubmed/2186426},
volume = {97},
year = {1990}
}
Multilayer connectionist models of memory based on the encoder model using the backpropagation learning rule are evaluated. The models are applied to standard recognition memory procedures in which items are studied sequentially and then tested for retention. Sequential learning in these models leads to 2 major problems. First, well-learned information is forgotten rapidly as new information is learned. Second, discrimination between studied items and new items either decreases or is nonmonotonic as a function of learning. To address these problems, manipulations of the network within the multilayer model and several variants of the multilayer model were examined, including a model with prelearned memory and a context model, but none solved the problems. The problems discussed provide limitations on connectionist models applied to human memory and in tasks where information to be learned is not all available during learning.
N.A.
1990
Others¶
27 papers
In this section we list all the other papers that do not appear in any of the above sections.
Continual Learning Using Task Conditional Neural Networks by Honglin Li, Payam Barnaghi, Shirin Enshaeifar and Frieder Ganz. arXiv, 2020. [cifar] [mnist]
@article{li2020,
annotation = {_eprint: 2005.05080},
author = {Li, Honglin and Barnaghi, Payam and Enshaeifar, Shirin and Ganz, Frieder},
journal = {arXiv},
keywords = {[cifar],[mnist]},
title = {Continual Learning Using Task Conditional Neural Networks},
url = {http://arxiv.org/abs/2005.05080},
year = {2020}
}
Conventional deep learning models have limited capacity in learning multiple tasks sequentially. The issue of forgetting the previously learned tasks in continual learning is known as catastrophic forgetting or interference. When the input data or the goal of learning change, a continual model will learn and adapt to the new status. However, the model will not remember or recognise any revisits to the previous states. This causes performance reduction and re-training curves in dealing with periodic or irregularly reoccurring changes in the data or goals. The changes in goals or data are referred to as new tasks in a continual learning model. Most of the continual learning methods have a task-known setup in which the task identities are known in advance to the learning model. We propose Task Conditional Neural Networks (TCNN), which do not require the reoccurring tasks to be known in advance. We evaluate our model on standard datasets using MNIST and CIFAR10, and also on a real-world dataset that we have collected in a remote healthcare monitoring study (i.e. the TIHM dataset). The proposed model outperforms the state-of-the-art solutions in continual learning and adapting to new tasks that are not defined in advance.
N.A.
2020Energy-Based Models for Continual Learning by Shuang Li, Yilun Du, Gido M. van de Ven, Antonio Torralba and Igor Mordatch. arXiv, 2020. [cifar] [experimental] [mnist]
@article{li2020a,
annotation = {_eprint: 2011.12216},
author = {Li, Shuang and Du, Yilun and van de Ven, Gido M. and Torralba, Antonio and Mordatch, Igor},
journal = {arXiv},
keywords = {[cifar],[experimental],[mnist]},
note = {The paper introduces Energy-Based models for classification in single incremental task + new classes (i.e. class incremental) scenarios. The model does not require task labels at test time, nor task boundaries at training time. It does not make use of replay.},
title = {Energy-Based Models for Continual Learning},
url = {http://arxiv.org/abs/2011.12216},
year = {2020}
}
We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via the use of external memory, growing models, or regularization, EBMs have a natural way to support a dynamically-growing number of tasks or classes that causes less interference with previously learned information. We find that EBMs outperform the baseline methods by a large margin on several continual learning benchmarks. We also show that EBMs are adaptable to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. These observations point towards EBMs as a class of models naturally inclined towards the continual learning regime.
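The mechanism that makes EBMs attractive here, scoring each candidate class with an energy and normalizing only over the classes present in the current batch, can be sketched as follows; the encoder, the embedding sizes and the dot-product energy are placeholder choices, not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnergyClassifier(nn.Module):
    # E(x, y) = -f(x) . g(y): lower energy means a better (input, class) match.
    def __init__(self, in_dim=784, num_classes=10, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
        self.class_emb = nn.Embedding(num_classes, hidden)

    def energy(self, x, classes):
        feats = self.encoder(x)                 # (B, H)
        embs = self.class_emb(classes)          # (C, H)
        return -feats @ embs.t()                # (B, C) energies

def ebm_training_loss(model, x, y):
    # Normalize only over the classes that occur in this batch, so classes from
    # earlier tasks are never explicitly suppressed -- the property that reduces
    # interference. At test time, predict argmin energy over all classes seen so far.
    classes = torch.unique(y)
    logits = -model.energy(x, classes)          # negative energies as logits
    targets = (y.unsqueeze(1) == classes.unsqueeze(0)).float().argmax(dim=1)
    return F.cross_entropy(logits, targets)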
N.A.
2020Continual Universal Object Detection by Xialei Liu, Hao Yang, Avinash Ravichandran, Rahul Bhotika and Stefano Soatto. arXiv, 2020.
@article{liu2020,
annotation = {_eprint: 2002.05347},
author = {Liu, Xialei and Yang, Hao and Ravichandran, Avinash and Bhotika, Rahul and Soatto, Stefano},
journal = {arXiv},
title = {Continual Universal Object Detection},
url = {http://arxiv.org/abs/2002.05347},
year = {2020}
}
Object detection has improved significantly in recent years on multiple challenging benchmarks. However, most existing detectors are still domain-specific, where the models are trained and tested on a single domain. When adapting these detectors to new domains, they often suffer from catastrophic forgetting of previous knowledge. In this paper, we propose a continual object detector that can learn sequentially from different domains without forgetting. First, we explore learning the object detector continually in different scenarios across various domains and categories. Learning from the analysis, we propose attentive feature distillation leveraging both bottom-up and top-down attentions to mitigate forgetting. It takes advantage of attention to ignore the noisy background information and feature distillation to provide strong supervision. Finally, for the most challenging scenarios, we propose an adaptive exemplar sampling method to effectively leverage exemplars from previous tasks and further reduce forgetting. The experimental results show the excellent performance of our proposed method in three different scenarios across seven different object detection datasets.
N.A.
2020Mnemonics Training: Multi-Class Incremental Learning without Forgetting by Yaoyao Liu, An-An Liu, Yuting Su, Bernt Schiele and Qianru Sun. arXiv, 2020. [cifar] [imagenet]
@article{liu2020a,
annotation = {_eprint: 2002.10211},
author = {Liu, Yaoyao and Liu, An-An and Su, Yuting and Schiele, Bernt and Sun, Qianru},
journal = {arXiv},
keywords = {[cifar],[imagenet]},
title = {Mnemonics Training: Multi-Class Incremental Learning without Forgetting},
url = {http://arxiv.org/abs/2002.10211},
year = {2020}
}
Multi-Class Incremental Learning (MCIL) aims to learn new concepts by incrementally updating a model trained on previous concepts. However, there is an inherent trade-off to effectively learning new concepts without catastrophic forgetting of previous ones. To alleviate this issue, it has been proposed to keep around a few examples of the previous concepts but the effectiveness of this approach heavily depends on the representativeness of these examples. This paper proposes a novel and automatic framework we call mnemonics, where we parameterize exemplars and make them optimizable in an end-to-end manner. We train the framework through bilevel optimizations, i.e., model-level and exemplar-level. We conduct extensive experiments on three MCIL benchmarks, CIFAR-100, ImageNet-Subset and ImageNet, and show that using mnemonics exemplars can surpass the state-of-the-art by a large margin. Interestingly and quite intriguingly, the mnemonics exemplars tend to be on the boundaries between different classes.
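The distinctive ingredient is that the stored exemplars themselves are tensors with requires_grad=True, updated by gradient descent; the sketch below shows that mechanism with a simplified surrogate objective (pulling exemplar features toward the class mean under a frozen extractor), whereas the paper optimizes the exemplars through a bilevel, model-level and exemplar-level procedure.
import torch

def init_mnemonic_exemplars(images_per_class):
    # images_per_class: {class_id: tensor of shape (k, C, H, W)} sampled from old data.
    return {c: torch.nn.Parameter(imgs.clone()) for c, imgs in images_per_class.items()}

def exemplar_level_step(feature_extractor, exemplars, class_means, lr=0.01):
    # One gradient step on the exemplar pixels (treated as parameters), assuming the
    # feature extractor is frozen. The real outer objective backpropagates a
    # validation loss through a model fine-tuned on the exemplars; here we use a
    # simpler feature-matching surrogate to illustrate the mechanism.
    opt = torch.optim.SGD(list(exemplars.values()), lr=lr)
    opt.zero_grad()
    loss = 0.0
    for c, ex in exemplars.items():
        feats = feature_extractor(ex)                               # (k, D)
        loss = loss + ((feats.mean(dim=0) - class_means[c]) ** 2).sum()
    loss.backward()
    opt.step()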
N.A.
2020Structured Compression and Sharing of Representational Space for Continual Learning by Gobinda Saha, Isha Garg, Aayush Ankit and Kaushik Roy. arXiv, 2020. [cifar] [mnist]
@article{saha2020,
annotation = {_eprint: 2001.08650},
author = {Saha, Gobinda and Garg, Isha and Ankit, Aayush and Roy, Kaushik},
journal = {arXiv},
keywords = {[cifar],[mnist]},
title = {Structured Compression and Sharing of Representational Space for Continual Learning},
url = {http://arxiv.org/abs/2001.08650},
year = {2020}
}
Humans are skilled at learning adaptively and efficiently throughout their lives, but learning tasks incrementally causes artificial neural networks to overwrite relevant information learned about older tasks, resulting in 'Catastrophic Forgetting'. Efforts to overcome this phenomenon suffer from poor utilization of resources in many ways, such as through the need to save older data or parametric importance scores, or to grow the network architecture. We propose an algorithm that enables a network to learn continually and efficiently by partitioning the representational space into a Core space, that contains the condensed information from previously learned tasks, and a Residual space, which is akin to a scratch space for learning the current task. The information in the Residual space is then compressed using Principal Component Analysis and added to the Core space, freeing up parameters for the next task. We evaluate our algorithm on P-MNIST, CIFAR-10 and CIFAR-100 datasets. We achieve comparable accuracy to state-of-the-art methods while overcoming the problem of catastrophic forgetting completely. Additionally, we get up to 4.5x improvement in energy efficiency during inference due to the structured nature of the resulting architecture.
N.A.
2020Lifelong Graph Learning by Chen Wang, Yuheng Qiu and Sebastian Scherer. arXiv, 2020. [graph]
@article{wang2020,
annotation = {_eprint: 2009.00647},
author = {Wang, Chen and Qiu, Yuheng and Scherer, Sebastian},
journal = {arXiv},
keywords = {[graph]},
title = {Lifelong Graph Learning},
url = {http://arxiv.org/abs/2009.00647},
year = {2020}
}
Graph neural networks are powerful models for many graph-structured tasks. In this paper, we aim to solve the problem of lifelong learning for graph neural networks. One of the main challenges is the effect of "catastrophic forgetting" for continuously learning a sequence of tasks, as the nodes can only be presented to the model once. Moreover, the number of nodes changes dynamically in lifelong learning and this makes many graph models and sampling strategies inapplicable. To solve these problems, we construct a new graph topology, called the feature graph. It takes features as new nodes and turns nodes into independent graphs. This successfully converts the original problem of node classification to graph classification. In this way, the increasing nodes in lifelong learning can be regarded as increasing training samples, which makes lifelong learning easier. We demonstrate that the feature graph achieves much higher accuracy than the state-of-the-art methods in both data-incremental and class-incremental tasks. We expect that the feature graph will have broad potential applications for graph-structured tasks in lifelong learning.
N.A.
2020Superposition of Many Models into One by Brian Cheung, Alex Terekhov, Yubei Chen, Pulkit Agrawal and Bruno Olshausen. arXiv, 2019. [cifar] [mnist]
@article{cheung2019,
annotation = {_eprint: 1902.05522},
author = {Cheung, Brian and Terekhov, Alex and Chen, Yubei and Agrawal, Pulkit and Olshausen, Bruno},
journal = {arXiv},
keywords = {[cifar],[mnist]},
title = {Superposition of Many Models into One},
url = {http://arxiv.org/abs/1902.05522},
year = {2019}
}
We present a method for storing multiple models within a single set of parameters. Models can coexist in superposition and still be retrieved individually. In experiments with neural networks, we show that a surprisingly large number of models can be effectively stored within a single parameter instance. Furthermore, each of these models can undergo thousands of training steps without significantly interfering with other models within the superposition. This approach may be viewed as the online complement of compression: rather than reducing the size of a network after training, we make use of the unrealized capacity of a network during training.
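The storage trick can be illustrated with a single linear layer whose input is bound to a task-specific random key before hitting the shared weight matrix; with (pseudo-)orthogonal keys, the different tasks' updates land in nearly orthogonal subspaces of the same parameters. The +/-1 keys below are one simple choice of context, used here purely for illustration.
import torch
import torch.nn as nn

class SuperposedLinear(nn.Module):
    # One weight matrix shared by many tasks via fixed random +/-1 context keys.
    def __init__(self, in_dim, out_dim, num_tasks):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        keys = torch.randint(0, 2, (num_tasks, in_dim)).float() * 2 - 1
        self.register_buffer("contexts", keys)      # keys are fixed, not trained

    def forward(self, x, task_id):
        # Binding the input with the task's key retrieves (approximately) the
        # model stored in superposition for that task.
        return self.linear(x * self.contexts[task_id])
Only the shared linear weights are updated during training; switching task_id at test time selects which superposed model is read out.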
N.A.
2019Continual Learning in Practice by Tom Diethe, Tom Borchert, Eno Thereska, Borja Balle and Neil Lawrence. arXiv, 2019.
@article{diethe2019,
annotation = {_eprint: 1903.05202},
author = {Diethe, Tom and Borchert, Tom and Thereska, Eno and Balle, Borja and Lawrence, Neil},
journal = {arXiv},
title = {Continual Learning in Practice},
url = {http://arxiv.org/abs/1903.05202},
year = {2019}
}
This paper describes a reference architecture for self-maintaining systems that can learn continually, as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML or Automatically Adaptive Machine Learning. We describe the challenges and propose a reference architecture.
N.A.
2019Dynamically Constraining Connectionist Networks to Produce Distributed, Orthogonal Representations to Reduce Catastrophic Interference by Robert French. Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, 335–340, 2019.
@incollection{french2019,
author = {French, Robert},
booktitle = {Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society},
doi = {10.4324/9781315789354-58},
edition = {First},
editor = {Ram, Ashwin and Eiselt, Kurt},
isbn = {978-1-315-78935-4},
keywords = {sparsity},
language = {en},
pages = {335--340},
publisher = {Routledge},
title = {Dynamically Constraining Connectionist Networks to Produce Distributed, Orthogonal Representations to Reduce Catastrophic Interference},
url = {https://www.taylorfrancis.com/books/9781317729266/chapters/10.4324/9781315789354-58},
year = {2019}
}
It is well known that when a connectionist network is trained on one set of patterns and then attempts to add new patterns to its repertoire, catastrophic interference may result. The use of sparse, orthogonal hidden-layer representations has been shown to reduce catastrophic interference. The author demonstrates that the use of sparse representations may, in certain cases, actually result in worse performance on catastrophic interference. This paper argues for the necessity of maintaining hidden-layer representations that are both as highly distributed and as highly orthogonal as possible. The author presents a learning algorithm, called context-biasing, that dynamically solves the problem of constraining hidden-layer representations to simultaneously produce good orthogonality and distributedness. On the data tested for this study, context-biasing is shown to reduce catastrophic interference by more than 50% compared to standard backpropagation. In particular, this technique succeeds in reducing catastrophic interference on data where sparse, orthogonal distributions failed to produce any improvement.
N.A.
2019Continual Learning via Neural Pruning by Siavash Golkar, Michael Kagan and Kyunghyun Cho. arXiv, 2019. [cifar] [mnist] [sparsity]
@article{golkar2019,
author = {Golkar, Siavash and Kagan, Michael and Cho, Kyunghyun},
journal = {arXiv},
keywords = {[cifar],[mnist],[sparsity],Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computi,Quantitative Biology - Neurons and Cognition,Statistics - Machine Learning},
note = {Comment: 12 pages, 5 figures, 3 tables arXiv: 1903.04476},
title = {Continual Learning via Neural Pruning},
url = {http://arxiv.org/abs/1903.04476},
year = {2019}
}
We introduce Continual Learning via Neural Pruning (CLNP), a new method aimed at lifelong learning in fixed capacity models based on neuronal model sparsification. In this method, subsequent tasks are trained using the inactive neurons and filters of the sparsified network and cause zero deterioration to the performance of previous tasks. In order to deal with the possible compromise between model sparsity and performance, we formalize and incorporate the concept of graceful forgetting: the idea that it is preferable to suffer a small amount of forgetting in a controlled manner if it helps regain network capacity and prevents uncontrolled loss of performance during the training of future tasks. CLNP also provides simple continual learning diagnostic tools in terms of the number of free neurons left for the training of future tasks as well as the number of neurons that are being reused. In particular, we see in experiments that CLNP verifies and automatically takes advantage of the fact that the features of earlier layers are more transferable. We show empirically that CLNP leads to significantly improved results over current weight elasticity based methods.
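The freezing step can be implemented with a gradient hook that zeroes every weight feeding a neuron already claimed by earlier tasks, so new tasks train only the remaining free units while still being allowed to read from the frozen ones; the helper below is a schematic sketch for a fully connected layer, not the authors' code.
import torch

def freeze_used_neurons(layer, used_out):
    # used_out: boolean mask over the layer's output units that belong to earlier
    # tasks. Rows (incoming weights) of those units are frozen, so nothing a new
    # task does can change what they compute; rows of free units stay trainable
    # and may still read from the frozen units' activations.
    mask = (~used_out).float()
    layer.weight.register_hook(lambda g: g * mask.unsqueeze(1))
    if layer.bias is not None:
        layer.bias.register_hook(lambda g: g * mask)

# Example: after the first task, suppose units 0..19 of a 64-unit hidden layer
# survive sparsification and are marked as used.
# used = torch.zeros(64, dtype=torch.bool); used[:20] = True
# freeze_used_neurons(hidden_layer, used)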
N.A.
2019BooVAE: A Scalable Framework for Continual VAE Learning under Boosting Approach by Anna Kuzina, Evgenii Egorov and Evgeny Burnaev. arXiv, 2019. [bayes] [fashion] [mnist]
@article{kuzina2019,
annotation = {_eprint: 1908.11853},
author = {Kuzina, Anna and Egorov, Evgenii and Burnaev, Evgeny},
journal = {arXiv},
keywords = {[bayes],[fashion],[mnist]},
title = {BooVAE: A Scalable Framework for Continual VAE Learning under Boosting Approach},
url = {http://arxiv.org/abs/1908.11853},
year = {2019}
}
Variational Auto Encoders (VAE) are capable of generating realistic images, sounds and video sequences. From a practitioner's point of view, we are usually interested in solving problems where tasks are learned sequentially, in a way that avoids revisiting all previous data at each stage. We address this problem by introducing a conceptually simple and scalable end-to-end approach that incorporates past knowledge by learning the prior directly from the data. We consider a scalable boosting-like approximation of the intractable theoretically optimal prior. We provide empirical studies on two commonly used benchmarks, namely MNIST and Fashion MNIST, on disjoint sequential image generation tasks. For each dataset, the proposed method delivers results that are the best or comparable to SOTA, avoiding catastrophic forgetting in a fully automatic way.
N.A.
2019Overcoming Catastrophic Forgetting with Unlabeled Data in the Wild by Kibok Lee, Kimin Lee, Jinwoo Shin and Honglak Lee. Proceedings of the IEEE International Conference on Computer Vision, 312–321, 2019.
@inproceedings{lee2019,
annotation = {_eprint: 1903.12648},
author = {Lee, Kibok and Lee, Kimin and Shin, Jinwoo and Lee, Honglak},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
doi = {10.1109/ICCV.2019.00040},
isbn = {978-1-72814-803-8},
issn = {15505499},
pages = {312--321},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
title = {Overcoming Catastrophic Forgetting with Unlabeled Data in the Wild},
url = {http://arxiv.org/abs/1903.12648},
volume = {2019-Octob},
year = {2019}
}
Lifelong learning with deep neural networks is well-known to suffer from catastrophic forgetting: The performance on previous tasks drastically degrades when learning a new task. To alleviate this effect, we propose to leverage a large stream of unlabeled data easily obtainable in the wild. In particular, we design a novel class-incremental learning scheme with (a) a new distillation loss, termed global distillation, (b) a learning strategy to avoid overfitting to the most recent task, and (c) a confidence-based sampling method to effectively leverage unlabeled external data. Our experimental results on various datasets, including CIFAR and ImageNet, demonstrate the superiority of the proposed methods over prior methods, particularly when a stream of unlabeled data is accessible: Our method shows up to 15.8% higher accuracy and 46.5% less forgetting compared to the state-of-the-art method. The code is available at https://github.com/kibok90/iccv2019-inc.
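The core loss is a standard distillation term, except that it is evaluated on unlabeled "wild" data rather than stored exemplars; a minimal version is sketched below (the paper's global distillation additionally distills from a teacher ensemble and adds confidence-based sampling of the unlabeled stream).
import torch
import torch.nn.functional as F

def distillation_on_unlabeled(new_model, old_model, unlabeled_x, T=2.0):
    # The previous model's softened predictions on unlabeled data supervise the
    # new model, preserving old-task behaviour without storing old labeled data.
    with torch.no_grad():
        teacher = F.softmax(old_model(unlabeled_x) / T, dim=1)
    student_log = F.log_softmax(new_model(unlabeled_x) / T, dim=1)
    return -(teacher * student_log).sum(dim=1).mean() * (T * T)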
N.A.
2019Continual Learning Using Bayesian Neural Networks by HongLin Li, Payam Barnaghi, Shirin Enshaeifar and Frieder Ganz. arXiv, 2019. [bayes] [mnist]
@article{li2019,
annotation = {_eprint: 1910.04112},
author = {Li, HongLin and Barnaghi, Payam and Enshaeifar, Shirin and Ganz, Frieder},
journal = {arXiv},
keywords = {[bayes],[mnist],Bayesian neural networks,continual learning,in-cremental learning,Index Terms-Catastrophic forgetting,uncertainty},
title = {Continual Learning Using Bayesian Neural Networks},
url = {http://arxiv.org/abs/1910.04112},
year = {2019}
}
Continual learning models allow learning and adapting to new changes and tasks over time. However, in continual and sequential learning scenarios in which the models are trained using different data with various distributions, neural networks tend to forget the previously learned knowledge. This phenomenon is often referred to as catastrophic forgetting. Catastrophic forgetting is an inevitable problem in continual learning models for dynamic environments. To address this issue, we propose a method, called Continual Bayesian Learning Networks (CBLN), which enables the networks to allocate additional resources to adapt to new tasks without forgetting the previously learned tasks. Using a Bayesian Neural Network, CBLN maintains a mixture of Gaussian posterior distributions that are associated with different tasks. The proposed method tries to optimise the number of resources that are needed to learn each task and avoids an exponential increase in the number of resources that are involved in learning multiple tasks. The proposed method does not need to access the past training data and can choose suitable weights to classify the data points during the test time automatically based on an uncertainty criterion. We have evaluated our method on the MNIST and UCR time-series datasets. The evaluation results show that our method can address the catastrophic forgetting problem at a promising rate compared to the state-of-the-art models.
N.A.
2019Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition by Martin Mundt, Sagnik Majumder, Iuliia Pliushch, Yong Won Hong and Visvanathan Ramesh. arXiv, 2019. [audio] [bayes] [fashion] [framework] [generative] [mnist] [vision]
@article{mundt2019,
annotation = {_eprint: 1905.12019},
author = {Mundt, Martin and Majumder, Sagnik and Pliushch, Iuliia and Hong, Yong Won and Ramesh, Visvanathan},
journal = {arXiv},
keywords = {[audio],[bayes],[fashion],[framework],[generative],[mnist],[vision]},
title = {Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition},
url = {http://arxiv.org/abs/1905.12019},
year = {2019}
}
We introduce a probabilistic approach to unify open set recognition with the prevention of catastrophic forgetting in deep continual learning, based on variational Bayesian inference. Our single model combines a joint probabilistic encoder with a generative model and a linear classifier that get shared across sequentially arriving tasks. In order to successfully distinguish unseen unknown data from trained known tasks, we propose to bound the class specific approximate posterior by fitting regions of high density on the basis of correctly classified data points. These bounds are further used to significantly alleviate catastrophic forgetting by avoiding samples from low density areas in generative replay. Our approach requires neither storing of old, nor upfront knowledge of future data, and is empirically validated on visual and audio tasks in class incremental, as well as cross-dataset scenarios across modalities.
N.A.
2019Continual Rare-Class Recognition with Emerging Novel Subclasses by Hung Nguyen, Xuejian Wang and Leman Akoglu. ECML, 2019. [nlp]
@inproceedings{nguyen2019,
annotation = {_eprint: 1906.12218},
author = {Nguyen, Hung and Wang, Xuejian and Akoglu, Leman},
booktitle = {ECML},
keywords = {[nlp]},
title = {Continual Rare-Class Recognition with Emerging Novel Subclasses},
url = {http://arxiv.org/abs/1906.12218},
year = {2019}
}
Given a labeled dataset that contains a rare (or minority) class of of-interest instances, as well as a large class of instances that are not of interest, how can we learn to recognize future of-interest instances over a continuous stream? We introduce RaRecognize, which (i) estimates a general decision boundary between the rare and the majority class, (ii) learns to recognize individual rare subclasses that exist within the training data, as well as (iii) flags instances from previously unseen rare subclasses as newly emerging. The learner in (i) is general in the sense that by construction it is dissimilar to the specialized learners in (ii), thus distinguishes minority from the majority without overly tuning to what is seen in the training data. Thanks to this generality, RaRecognize ignores all future instances that it labels as majority and recognizes the recurrent as well as emerging rare subclasses only. This saves effort at test time as well as ensures that the model size grows moderately over time as it only maintains specialized minority learners. Through extensive experiments, we show that RaRecognize outperforms state-of-the art baselines on three real-world datasets that contain corporate-risk and disaster documents as rare classes.
N.A.
2019Random Path Selection for Incremental Learning by Jathushan Rajasegaran, Munawar Hayat, Salman Khan Fahad, Shahbaz Khan and Ling Shao. NeurIPS, 12669–12679, 2019. [cifar] [imagenet] [mnist]
@inproceedings{rajasegaran2019,
author = {Rajasegaran, Jathushan and Hayat, Munawar and Fahad, Salman Khan and Khan, Shahbaz and Shao, Ling},
booktitle = {NeurIPS},
keywords = {[cifar],[imagenet],[mnist]},
pages = {12669--12679},
title = {Random Path Selection for Incremental Learning},
url = {http://papers.nips.cc/paper/9429-random-path-selection-for-continual-learning.pdf},
year = {2019}
}
Incremental lifelong learning is a main challenge towards the long-standing goal of Artificial General Intelligence. In real-life settings, learning tasks arrive in a sequence and machine learning models must continually learn to increment already acquired knowledge. Existing incremental learning approaches fall well below the state-of-the-art cumulative models that use all training classes at once. In this paper, we propose a random path selection algorithm, called RPS-Net, that progressively chooses optimal paths for the new tasks while encouraging parameter sharing. Since the reuse of previous paths enables forward knowledge transfer, our approach requires a considerably lower computational overhead. As an added novelty, the proposed model integrates knowledge distillation and retrospection along with the path selection strategy to overcome catastrophic forgetting. In order to maintain an equilibrium between previous and newly acquired knowledge, we propose a simple controller to dynamically balance the model plasticity. Through extensive experiments, we demonstrate that the proposed method surpasses the state-of-the-art performance on incremental learning and by utilizing parallel computation this method can run in constant time with nearly the same efficiency as a conventional deep convolutional neural network.
N.A.
2019Improving and Understanding Variational Continual Learning by Siddharth Swaroop, Cuong V Nguyen, Thang D Bui and Richard E Turner. Continual Learning Workshop NeurIPS, 1–17, 2019. [bayes] [mnist]
@article{swaroop2019,
annotation = {_eprint: 1905.02099},
author = {Swaroop, Siddharth and Nguyen, Cuong V and Bui, Thang D and Turner, Richard E},
journal = {Continual Learning Workshop NeurIPS},
keywords = {[bayes],[mnist]},
pages = {1--17},
title = {Improving and Understanding Variational Continual Learning},
url = {http://arxiv.org/abs/1905.02099},
year = {2019}
}
In the continual learning setting, tasks are encountered sequentially. The goal is to learn whilst i) avoiding catastrophic forgetting, ii) efficiently using model capacity, and iii) employing forward and backward transfer learning. In this paper, we explore how the Variational Continual Learning (VCL) framework achieves these desiderata on two benchmarks in continual learning: split MNIST and permuted MNIST. We first report significantly improved results on what was already a competitive approach. The improvements are achieved by establishing a new best practice approach to mean-field variational Bayesian neural networks. We then look at the solutions in detail. This allows us to obtain an understanding of why VCL performs as it does, and we compare the solution to what an 'ideal' continual learning solution might be.
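At the heart of VCL is a per-task variational objective in which the posterior from the previous tasks plays the role of the prior; the sketch below writes it for a mean-field Gaussian over a flat weight vector, with the data log-likelihood supplied as a function (the paper's contribution is largely about how to train this objective well, which the sketch does not capture).
import torch
from torch.distributions import Normal, kl_divergence

def vcl_loss(mu, log_sigma, prev_mu, prev_log_sigma, log_lik_fn, num_samples=4):
    # mu, log_sigma: current variational parameters (require gradients).
    # prev_*: frozen posterior from the previous task, acting as the prior.
    # log_lik_fn: maps a sampled weight vector to the current task's log-likelihood.
    q = Normal(mu, log_sigma.exp())
    prior = Normal(prev_mu, prev_log_sigma.exp())
    # Reparameterized Monte Carlo estimate of the expected log-likelihood.
    exp_ll = torch.stack([log_lik_fn(q.rsample()) for _ in range(num_samples)]).mean()
    # Negative ELBO: fit the new task while staying close to the old posterior.
    return -(exp_ll - kl_divergence(q, prior).sum())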
N.A.
2019Continual Learning via Online Leverage Score Sampling by Dan Teng and Sakyasingha Dasgupta. arXiv, 2019. [cifar] [mnist]