The performance of wake-up radios (WURs) must be clearly measured and understood, in terms of both benefits and shortcomings, when designing and developing robust, dependable, and affordable systems. State-of-the-art WURs display significant diversity in their architecture, processing capability, energy consumption, and receiver sensitivity. Standard benchmarking methodologies are crucial for quantitatively evaluating the performance of this emerging technology; however, no accepted standard for such quantitative measurement currently exists. Further, there is no consensus on which objective evaluation procedures and metrics should be used to understand the performance of whole systems exploiting this technology. This lack of standardization has prevented researchers from comparing results and from leveraging previous work, which would otherwise avoid duplication of effort and speed up the validation process. This paper takes a step toward an evaluation framework, a benchmark, that enables accurate and repeatable profiling of WUR-based systems, leading to more consistent and therefore comparable evaluations of current and future systems.