The problem of catastrophic forgetting has a history of more than 30 years and has not been completely solved yet. Since the human brain has natural ability to perform continual lifelong learning, learning from the brain may provide solutions to this problem. In this paper, we propose a novel biologically plausible audio-visual integration model (AVIM) based on the assumption that the integration of audio and visual perceptual information in the medial temporal lobe during learning is crucial to form concepts and make continual learning possible. Specifically, we use multi-compartment Hodgkin-Huxley neurons to build the model and adopt the calcium-based synaptic tagging and capture as the model's learning rule. Furthermore, we define a new continual learning paradigm to simulate the possible continual learning process in the human brain. We then test our model under this new paradigm. Our experimental results show that the proposed AVIM can achieve state-of-the-art continual learning performance compared with other advanced methods such as OWM, iCaRL and GEM. Moreover, it can generate stable representations of objects during learning. These results support our assumption that concept formation is essential for continuous lifelong learning and suggest the proposed AVIM is a possible concept formation mechanism.