A Novel Approach to Detect Redundant Activity Labels For More Representative Event Logs

Qifan Chen, Yang Lu, Charmaine Tam, Simon Poon

The insights revealed by process mining rely heavily on the quality of event logs. Activities extracted from different data sources, or recorded as free text within the same system, may be given inconsistent labels. Such inconsistency leads to redundant activity labels: labels that differ in syntax but share the same behaviours. Identifying these labels during data-driven process discovery is difficult and relies heavily on human intervention. In this paper, we propose an approach to detect redundant activity labels using control-flow relations and data values from event logs. We evaluated our approach on two publicly available logs and in a case study using the MIMIC-III data set. The results demonstrate that our approach can detect redundant activity labels even when they occur with low frequency. This approach can add value to the preprocessing step by generating more representative event logs for process mining tasks.
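The abstract names the signals the approach uses (control-flow relations and data values) but not the algorithm itself. As a hedged illustration of the control-flow signal only, the sketch below flags pairs of labels whose directly-preceding and directly-following activities are nearly identical. This is not the authors' method: the function names, the Jaccard similarity measure, the 0.8 threshold, and the assumption that two redundant labels never appear in the same trace are all hypothetical choices made for this example, and the data-value comparison is omitted entirely.

```python
from collections import defaultdict
from itertools import combinations


def directly_follows_context(traces):
    """Map each label to (set of direct predecessors, set of direct successors)."""
    preds = defaultdict(set)
    succs = defaultdict(set)
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            succs[a].add(b)
            preds[b].add(a)
    labels = {label for trace in traces for label in trace}
    return {label: (preds[label], succs[label]) for label in labels}


def jaccard(x, y):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def candidate_redundant_pairs(traces, threshold=0.8):
    """Flag label pairs with similar control-flow contexts (hypothetical heuristic).

    Assumption: redundant labels for one activity do not co-occur in a single
    trace, so pairs that appear together in any trace are skipped.
    """
    context = directly_follows_context(traces)
    cooccur = set()
    for trace in traces:
        cooccur.update(combinations(sorted(set(trace)), 2))
    pairs = []
    for a, b in combinations(sorted(context), 2):
        if (a, b) in cooccur:
            continue
        (pa, sa), (pb, sb) = context[a], context[b]
        # Both the predecessor and successor sets must be similar.
        score = min(jaccard(pa, pb), jaccard(sa, sb))
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


if __name__ == "__main__":
    log = [
        ["Register", "Triage", "Blood test", "Discharge"],
        ["Register", "Triage", "blood-test", "Discharge"],
        ["Register", "Triage", "X-ray", "Consult", "Discharge"],
    ]
    print(candidate_redundant_pairs(log))
    # → [('Blood test', 'blood-test', 1.0)]
```

In this toy log, "X-ray" shares a predecessor with "Blood test" but not a successor, so it is not flagged. Control-flow similarity alone can still confuse genuinely different activities that happen to occur in the same position, which is one reason a method might also compare data values.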
