Abstract
With the rapid development of deep learning, neural network models have achieved remarkable performance. However, their large scale and high computational demands still limit widespread deployment. Model compression techniques have therefore emerged, aiming to reduce computational complexity, memory usage, and energy overhead so that models meet practical deployment needs without sacrificing performance. This paper provides a systematic overview of recent advances in model compression, with particular emphasis on large-scale models such as vision-language models (VLMs) and large language models (LLMs). We first clarify the core concepts and objectives of model compression, then outline persistent challenges, including limited adaptability across different model variants and the trade-off between efficiency and accuracy. We examine five mainstream compression methods in detail: pruning, quantization, knowledge distillation, low-rank factorization, and parameter sharing. For each method, we analyze the guiding principles, strengths, weaknesses, and representative application scenarios. We also present a comparative analysis of the trade-offs among compression ratio, accuracy retention, and computational efficiency. Moreover, we review commonly used benchmarks such as ImageNet, WikiText, GLUE, and MMLU, along with metrics for evaluating effectiveness in both vision and language tasks. Finally, we outline promising future directions, including automated pipelines, hybrid strategies, hardware-aware optimization, and cross-domain adaptability. This survey is intended as a comprehensive reference and roadmap for advancing model compression research.