Interaction plays a vital role during visual network exploration as users need to engage with both elements in the view (e.g., nodes, links) and interface controls (e.g., sliders, dropdown menus). Particularly as the size and complexity of a network grow, interactive displays supporting multimodal input (e.g., touch, speech, pen, gaze) exhibit the potential to facilitate fluid interaction during visual network exploration and analysis. While multimodal interaction with network visualization seems like a promising idea, many open questions remain. For instance, do users actually prefer multimodal input over unimodal input, and if so, why? Does it enable them to interact more naturally, or does having multiple modes of input confuse users? To answer such questions, we conducted a qualitative user study in the context of a network visualization tool, comparing speech- and touch-based unimodal interfaces to a multimodal interface combining the two. Our results confirm that participants strongly prefer multimodal input over unimodal input attributing their preference to: 1) the freedom of expression, 2) the complementary nature of speech and touch, and 3) integrated interactions afforded by the combination of the two modalities. We also describe the interaction patterns participants employed to perform common network visualization operations and highlight themes for future multimodal network visualization systems to consider.