While commercial mid-air gesture recognition systems have existed for at least a decade, they have not become a widespread method of interacting with machines. This is primarily due to the fact that these systems require rigid, dramatic gestures to be performed for accurate recognition that can be fatiguing and unnatural. The global pandemic has seen a resurgence of interest in touchless interfaces, so new methods that allow for natural mid-air gestural interactions are even more important. To address the limitations of recognition systems, we propose a neuromorphic gesture analysis system which naturally declutters the background and analyzes gestures at high temporal resolution. Our novel model consists of an event-based guided Variational Autoencoder (VAE) which encodes event-based data sensed by a Dynamic Vision Sensor (DVS) into a latent space representation suitable to analyze and compute the similarity of mid-air gesture data. Our results show that the features learned by the VAE provides a similarity measure capable of clustering and pseudo labeling of new gestures. Furthermore, we argue that the resulting event-based encoder and pseudo-labeling system are suitable for implementation in neuromorphic hardware for online adaptation and learning of natural mid-air gestures.