The vision-language-action (VLA) paradigm is gradually becoming the core approach to embodied intelligence. However, its training and validation rely on simulation environments and therefore face serious sim2real challenges, such as navigation deviations in drones caused by wind speed differences between simulated and real-world environments. Existing iterative methods based on digital twins can alleviate the virtual-real alignment problem to some extent, but their heavy dependence on twin consistency limits their adaptability and scalability in complex environments. To break through this bottleneck, this letter proposes the PiVLA framework, which reconstructs the VLA paradigm with parallel intelligence. Furthermore, we introduce the parallel deep foundation model (PDFM) and, based on it, propose model parallel control (MPC) and the parallel interaction protocol (PIP), establishing a unified interaction mechanism for disembodied agents and embodied agents. This provides a scalable and robust solution for complex tasks involving embodied intelligence.
Funding: Supported by the Science and Technology Development Fund, Macao Special Administrative Region (Nos. 0157/2024/RIA2, 0145/2023/RIA3, and 0093/2023/RIA2).