Abstract: As malware techniques evolve, threat actors continuously refine their code with evasion and anti-analysis strategies, making sandbox-based cyber threat intelligence (CTI) data collection essential for analyzing malicious behaviors. However, no prior research has systematically examined the relationship between execution time and the completeness of the collected intelligence data, nor its impact on intelligence data fidelity. Existing sandbox configurations typically rely on predefined execution time thresholds without empirical justification, which can either terminate execution before critical behaviors occur or incur excessive computational overhead. To address this gap, we analyze malware execution dynamics through system calls, code execution, and data entry access patterns mapped within the MITRE ATT&CK framework. Leveraging Extreme Value Theory (EVT), we model the probability distribution of intelligence data extraction over time, which allows us to estimate the likelihood of acquiring additional intelligence data as execution progresses. Our analysis shows that the probability of obtaining new intelligence data decreases with time: at a 95% confidence level, the probability of acquiring additional intelligence data after three minutes is 0.092, and after five minutes it is 0.074. These results indicate that extending execution time beyond a certain threshold yields limited additional intelligence data, which highlights the importance of determining an optimal execution time. By introducing an empirical framework for optimizing sandbox execution time for intelligence data extraction, we provide a quantitative, principled execution model and a scientifically grounded methodology for malware analysis. Our findings lay a foundation for future research in adaptive threat intelligence data collection and enable data-driven selection of execution time in large-scale security operations.
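The abstract does not spell out the EVT model, so what follows is only a minimal sketch of one common EVT approach, peaks-over-threshold with a generalized Pareto distribution (GPD), and not necessarily the authors' method. It assumes the input is a list of times (in seconds) at which previously unseen intelligence items, such as newly observed ATT&CK techniques, first appeared during sandbox runs, and it estimates the probability that such an item appears after a given execution time t. The function names `fit_gpd_moments` and `tail_probability`, the 60-second threshold, and the synthetic data are hypothetical. The GPD shape and scale are fitted by the method of moments (valid only for shape < 1/2) to keep the sketch dependency-free; a real analysis would more likely use maximum likelihood and report confidence intervals.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def fit_gpd_moments(excesses: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments fit of a generalized Pareto distribution (GPD).

    Returns (shape xi, scale sigma). The estimator inverts the GPD mean
    sigma / (1 - xi) and variance sigma**2 / ((1 - xi)**2 * (1 - 2 * xi)),
    so it is only valid when xi < 1/2.
    """
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)  # sample variance; needs >= 2 points
    if v == 0.0:
        raise ValueError("excesses have zero variance; cannot fit a GPD")
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return xi, sigma


def tail_probability(times: Sequence[float], threshold: float, t: float) -> float:
    """Peaks-over-threshold estimate of P(T > t) for t >= threshold.

    times: hypothetical first-observation times (seconds) of new
           intelligence items, pooled across sandbox runs.
    threshold: the POT threshold u above which the GPD tail is fitted.
    """
    if t < threshold:
        raise ValueError("the tail model only applies for t >= threshold")
    excesses = [x - threshold for x in times if x > threshold]
    if len(excesses) < 2:
        raise ValueError("need at least two exceedances to fit the GPD")
    xi, sigma = fit_gpd_moments(excesses)
    rate = len(excesses) / len(times)  # empirical estimate of P(T > u)
    z = (t - threshold) / sigma
    if abs(xi) < 1e-9:
        return rate * math.exp(-z)  # exponential limit of the GPD as xi -> 0
    base = 1.0 + xi * z
    if base <= 0.0:
        return 0.0  # past the finite upper endpoint of the tail when xi < 0
    return rate * base ** (-1.0 / xi)


if __name__ == "__main__":
    # Synthetic, illustrative times only; these are NOT data from the study.
    rng = random.Random(42)
    times = [rng.expovariate(1 / 40.0) for _ in range(500)]
    for minutes in (3, 5):
        p = tail_probability(times, threshold=60.0, t=60.0 * minutes)
        print(f"P(new intelligence after {minutes} min) ~ {p:.4f}")
```

Under this model the estimated probability is non-increasing in t, which mirrors the diminishing returns described in the abstract; the values printed for the synthetic data bear no relation to the 0.092 and 0.074 figures reported in the study.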
Funding: The authors thank Professor Huang Hao of Nanjing University for the financial support of this project and for his valuable guidance during the writing of this paper.