Electricity load forecasting enables the grid operators to optimally implement the smart grid's most essential features such as demand response and energy efficiency. Electricity demand profiles can vary drastically from one region to another on diurnal, seasonal and yearly scale. Hence to devise a load forecasting technique that can yield the best estimates on diverse datasets, specially when the training data is limited, is a big challenge. This paper presents a deep learning architecture for short-term load forecasting based on bidirectional sequential models in conjunction with feature engineering that extracts the hand-crafted derived features in order to aid the model for better learning and predictions. In the proposed architecture, named as Deep Derived Feature Fusion (DeepDeFF), the raw input and hand-crafted features are trained at separate levels and then their respective outputs are combined to make the final prediction. The efficacy of the proposed methodology is evaluated on datasets from five countries with completely different patterns. The results demonstrate that the proposed technique is superior to the existing state of the art.