Abstract
To address the performance bottleneck of traditional process discovery algorithms on large-scale event logs, a log sampling method based on incremental trace information is proposed. The method quantifies the directly-follows relations between events and the feature information of traces, uses whether a trace carries new process behavior as the sampling criterion, and determines the minimum number of consecutively traversed samples on the basis of statistical theory. To further speed up preprocessing, a binary exponential skip algorithm is proposed to avoid scanning duplicate traces. Experiments on four real-life event logs show that the proposed sampling method quickly and effectively reduces the size of event logs while retaining key control-flow and frequency information, and that it improves the running speed of process discovery algorithms.
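The sampling criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "new process behavior" means previously unseen directly-follows pairs only, and it takes the stopping threshold `k` as a caller-supplied parameter, whereas the paper derives the minimum count from statistical theory. The names `directly_follows`, `sample_log`, and `k` are hypothetical, and the binary exponential skip step is not modeled here.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Trace = Sequence[str]
Pair = Tuple[str, str]


def directly_follows(trace: Trace) -> Set[Pair]:
    """Return the set of (a, b) pairs where b occurs immediately after a."""
    return set(zip(trace, trace[1:]))


def sample_log(log: Iterable[Trace], k: int) -> List[Trace]:
    """Keep traces that contribute unseen directly-follows pairs.

    Scanning stops once k consecutive traces contribute nothing new.
    k is an assumed input here, not the statistically derived bound.
    """
    seen: Set[Pair] = set()
    sample: List[Trace] = []
    no_gain_run = 0  # consecutive traces without new pairs
    for trace in log:
        new_pairs = directly_follows(trace) - seen
        if new_pairs:
            seen |= new_pairs
            sample.append(trace)
            no_gain_run = 0
        else:
            no_gain_run += 1
            if no_gain_run >= k:
                break
    return sample
```

In this simplified form, a trace with a single event never enters the sample and trace frequencies are not tracked, so it covers only the control-flow part of the criterion.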
Authors
倪可
俞东进
孙笑笑
胡华
NI Ke; YU Dongjin; SUN Xiaoxiao; HU Hua (School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; Hangzhou Normal University, Hangzhou 311121, China)
Source
《计算机集成制造系统》
EI
CSCD
PKU Core (Peking University Core Journals)
2022, No. 10, pp. 3166-3174 (9 pages)
Computer Integrated Manufacturing Systems
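Funding
National Natural Science Foundation of China (61702144)
Industrial Internet Innovation and Development Project of the Ministry of Industry and Information Technology (TC200802G, TC2008033)
Key Research and Development Program of Zhejiang Province (2020C01165)
Natural Science Foundation of Zhejiang Province (LQ20F020017)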
Keywords
process discovery
log sampling
event log
incremental information
process model