Fiber allocation in optical cable production is critical for optimizing production efficiency, product quality, and inventory management. However, factors such as fiber length and storage time complicate this process, making heuristic optimization algorithms inadequate. To tackle these challenges, this paper proposes a new framework: the dueling-double-deep Q-network with twin state-value and action-advantage functions (D3QNTF). First, dual action-advantage and state-value functions are used to prevent overestimation of action values. Second, a method for random initialization of feasible solutions improves sample quality early in the optimization. Finally, a strict penalty for errors is added to the reward mechanism, making the agent more sensitive to illegal actions and better at avoiding them, which reduces decision errors. Experimental results show that the proposed method outperforms state-of-the-art algorithms, including greedy algorithms, genetic algorithms, deep Q-networks, double deep Q-networks, and standard dueling-double-deep Q-networks. The findings highlight the potential of the D3QNTF framework for fiber allocation in optical cable production.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52205519 and 62273264).
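The three ideas named in the abstract (twin value/advantage estimates, random feasible initialization, and a strict penalty for illegal actions) can be illustrated with a minimal sketch. The abstract does not state how the twin estimates are combined, how feasible solutions are sampled, or how large the penalty is, so everything below is an assumption for illustration: the element-wise minimum in `twin_q` is borrowed from clipped double Q-learning, the mean-centred rule in `dueling_q` is the standard dueling aggregation, and `ILLEGAL_PENALTY`, `shaped_reward`, and `random_feasible_action` are hypothetical names and values, not the paper's.

```python
import random

# Hypothetical magnitude; the abstract only says the penalty is "strict".
ILLEGAL_PENALTY = -10.0


def dueling_q(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-centred dueling rule."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def twin_q(v1, adv1, v2, adv2):
    """Assumption: take the element-wise minimum of two dueling estimates
    so that an overestimate in one head is capped by the other."""
    q1 = dueling_q(v1, adv1)
    q2 = dueling_q(v2, adv2)
    return [min(a, b) for a, b in zip(q1, q2)]


def shaped_reward(base_reward, action, legal_actions):
    """Return the strict penalty whenever the chosen action is illegal."""
    return base_reward if action in legal_actions else ILLEGAL_PENALTY


def random_feasible_action(legal_actions, rng=random):
    """Sample uniformly from legal actions only, e.g. to seed early experience."""
    return rng.choice(sorted(legal_actions))


if __name__ == "__main__":
    q = twin_q(1.0, [0.0, 2.0, 4.0], 0.5, [1.0, 1.0, 4.0])
    print(q)                                # [-1.0, -0.5, 2.5]
    print(shaped_reward(1.0, 3, {0, 1}))    # -10.0 (action 3 is not legal)
    print(random_feasible_action({2, 5}))   # either 2 or 5
```

Taking the minimum is a pessimistic choice: it trades some underestimation for protection against the overestimation the abstract mentions. Averaging the two heads would be another plausible reading of "twin" functions.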