While long short-term memory (LSTM) models have demonstrated stellar performance with streamflow predictions, there are major risks in applying these models in contiguous regions with no gauges, or predictions in ungauged regions (PUR) problems. However, softer data such as the flow duration curve (FDC) may be already available from nearby stations, or may become available. Here we demonstrate that sparse FDC data can be migrated and assimilated by an LSTM-based network, via an encoder. A stringent region-based holdout test showed a median Kling-Gupta efficiency (KGE) of 0.62 for a US dataset, substantially higher than previous state-of-the-art global-scale ungauged basin tests. The baseline model without FDC was already competitive (median KGE 0.56), but integrating FDCs had substantial value. Because of the inaccurate representation of inputs, the baseline models might sometimes produce catastrophic results. However, model generalizability was further meaningfully improved by compiling an ensemble based on models with different input selections.