Recent research in image and video recognition indicates that many visual processes can be thought of as being generated by a time-varying generative model. A nearby descriptive model for visual processes is thus a statistical distribution that varies over time. Specifically, modeling visual processes as streams of histograms generated by a kernelized linear dynamic system turns out to be efficient. We refer to such a model as a System of Bags. In this work, we investigate Systems of Bags with special emphasis on dynamic scenes and dynamic textures. Parameters of linear dynamic systems suffer from ambiguities. In order to cope with these ambiguities in the kernelized setting, we develop a kernelized version of the alignment distance. For its computation, we use a Jacobi-type method and prove its convergence to a set of critical points. We employ it as a dissimilarity measure on Systems of Bags. As such, it outperforms other known dissimilarity measures for kernelized linear dynamic systems, in particular the Martin Distance and the Maximum Singular Value Distance, in every tested classification setting. A considerable margin can be observed in settings, where classification is performed with respect to an abstract mean of video sets. For this scenario, the presented approach can outperform state-of-the-art techniques, such as Dynamic Fractal Spectrum or Orthogonal Tensor Dictionary Learning.