Modern dense Flash memory devices operate at very low error rates, which requires powerful error-correcting coding (ECC) techniques. An emerging class of graph-based ECC techniques with broad applications is the class of spatially-coupled (SC) codes, where a block code is partitioned into components that are then rewired multiple times to construct an SC code. Here, our focus is on SC codes with an underlying circulant-based structure. In this paper, we present a three-stage approach for the design of high-performance non-binary SC (NB-SC) codes optimized for practical Flash channels; we aim to minimize the number of detrimental general absorbing sets of type two (GASTs) in the graph of the designed NB-SC code. In the first stage, we deploy a novel partitioning mechanism, called the optimal overlap partitioning, which acts on the protograph of the SC code to produce the optimal partitioning, that is, the one corresponding to the smallest number of detrimental objects. In the second stage, we apply a new circulant power optimizer to further reduce the number of detrimental GASTs. In the third stage, we use the weight consistency matrix framework to manipulate edge weights and eliminate as many as possible of the GASTs that remain in the NB-SC code after the first two stages, both of which operate on the unlabeled graph of the code. Simulation results reveal that NB-SC codes designed using our approach outperform state-of-the-art NB-SC codes when used over Flash channels.
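
The partition-and-rewire construction of an SC code described above can be illustrated with a minimal sketch. It assumes a binary toy parity-check matrix, a hand-picked partition into two components, and a coupling length of 3; it is not the non-binary, circulant-based construction or the optimal overlap partitioning of this paper. The names `couple`, `H0`, `H1`, and `L` are hypothetical and chosen only for illustration.

```python
def couple(components, L):
    """Assemble an SC parity-check matrix from components H_0..H_m.

    Replica i (column block i) carries H_k in row block i + k, so the
    result has (L + m) row blocks and L column blocks.
    """
    m = len(components) - 1          # memory of the SC code
    r = len(components[0])           # rows per component
    c = len(components[0][0])        # columns per component
    H_sc = [[0] * (L * c) for _ in range((L + m) * r)]
    for i in range(L):               # replica index
        for k, Hk in enumerate(components):
            for a in range(r):
                for b in range(c):
                    H_sc[(i + k) * r + a][i * c + b] = Hk[a][b]
    return H_sc


# Toy partition (assumed for illustration): H0 + H1 is the all-ones 2 x 3 matrix.
H0 = [[1, 0, 1], [0, 1, 0]]
H1 = [[0, 1, 0], [1, 0, 1]]
H_sc = couple([H0, H1], L=3)

for row in H_sc:
    print("".join(str(x) for x in row))
```

In this sketch every column of `H_sc` keeps the column weight of the underlying block code, because each replica contains all of its components exactly once; only the way the rows are shared between neighboring replicas changes with the partition.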