Existing generative Zero-Shot Learning (ZSL) methods only consider the unidirectional alignment from the class semantics to the visual features while ignoring the alignment from the visual features to the class semantics, which fails to construct the visual-semantic interactions well. In this paper, we propose to synthesize visual features based on an auto-encoder framework paired with bi-adversarial networks respectively for visual and semantic modalities to reinforce the visual-semantic interactions with a bi-directional alignment, which ensures the synthesized visual features to fit the real visual distribution and to be highly related to the semantics. The encoder aims at synthesizing real-like visual features while the decoder forces both the real and the synthesized visual features to be more related to the class semantics. To further capture the discriminative information of the synthesized visual features, both the real and synthesized visual features are forced to be classified into the correct classes via a classification network. Experimental results on four benchmark datasets show that the proposed approach is particularly competitive on both the traditional ZSL and the generalized ZSL tasks.