The experimental evaluation of algorithms produces large datasets that generally do not follow a normal distribution or are not homoscedastic. In addition, some entries may be missing because an algorithm failed to find a feasible solution within the time limit. These characteristics restrict the statistical evaluation of computational experiments. This work proposes a bi-objective lexicographical ranking scheme to evaluate datasets with such characteristics. The resulting ranking can be used as input to any desired statistical test. We used the proposed ranking scheme to assess the results obtained by the Iterative Rounding (IR) heuristic. A Friedman test and a subsequent post-hoc test carried out on the ranked data showed that IR performed significantly better than the Feasibility Pump heuristic on 152 nonconvex mixed-integer nonlinear benchmark problems. However, they also showed that the RECIPE heuristic performed significantly better than IR on the same benchmark problems.
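As an illustration only, a lexicographical ranking of this kind could be sketched as follows. The abstract does not state the two criteria, so the sketch assumes that each result is a hypothetical pair (objective value, running time), that both are minimized, that the objective value takes priority, and that a missing entry (no feasible solution) is ranked after every feasible one. The function name `lexicographic_ranks` and the average-rank treatment of ties are likewise assumptions, not the scheme defined in this work.

```python
from typing import List, Optional, Tuple

# Hypothetical result type: (objective value, running time),
# or None when the algorithm found no feasible solution in time.
Result = Optional[Tuple[float, float]]


def lexicographic_ranks(results: List[Result]) -> List[float]:
    """Rank the algorithms on one instance; a lower rank is better."""

    def key(r: Result) -> Tuple[int, float, float]:
        # Assumption: missing entries sort after all feasible results;
        # feasible results compare by objective first, then by time.
        return (1, 0.0, 0.0) if r is None else (0, r[0], r[1])

    order = sorted(range(len(results)), key=lambda i: key(results[i]))
    ranks = [0.0] * len(results)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next entry ties with the current one.
        while (end + 1 < len(order)
               and key(results[order[end + 1]]) == key(results[order[pos]])):
            end += 1
        # Tied entries share the average of their 1-based positions.
        avg = (pos + 1 + end + 1) / 2
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


# One instance, four algorithms; the third found no feasible solution.
instance = [(10.0, 5.0), (10.0, 3.0), None, (8.0, 9.0)]
print(lexicographic_ranks(instance))
```

Applying such a function to every benchmark instance would yield one row of ranks per instance, which is the form of input a rank-based test such as the Friedman test expects.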