16 articles found
Optimizing Fine-Tuning in Quantized Language Models: An In-Depth Analysis of Key Variables
1
Authors: Ao Shen, Zhiquan Lai, Dongsheng Li, Xiaoyu Hu. Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 307-325, 19 pages.
Large-scale Language Models (LLMs) have achieved significant breakthroughs in Natural Language Processing (NLP), driven by the pre-training and fine-tuning paradigm. While this approach allows models to specialize in specific tasks with reduced training costs, the substantial memory requirements during fine-tuning present a barrier to broader deployment. Parameter-Efficient Fine-Tuning (PEFT) techniques, such as Low-Rank Adaptation (LoRA), and parameter quantization methods have emerged as solutions to address these challenges by optimizing memory usage and computational efficiency. Among these, QLoRA, which combines PEFT and quantization, has demonstrated notable success in reducing memory footprints during fine-tuning, prompting the development of various QLoRA variants. Despite these advancements, the quantitative impact of key variables on the fine-tuning performance of quantized LLMs remains underexplored. This study presents a comprehensive analysis of these key variables, focusing on their influence across different layer types and depths within LLM architectures. Our investigation uncovers several critical findings: (1) Larger layers, such as MLP layers, can maintain performance despite reductions in adapter rank, while smaller layers, like self-attention layers, are more sensitive to such changes; (2) The effectiveness of balancing factors depends more on specific values than on layer type or depth; (3) In quantization-aware fine-tuning, larger layers can effectively utilize smaller adapters, whereas smaller layers struggle to do so. These insights suggest that layer type is a more significant determinant of fine-tuning success than layer depth when optimizing quantized LLMs. Moreover, for the same reduction in trainable parameters, reducing the trainable parameters in a larger layer is more effective in preserving fine-tuning accuracy than in a smaller one. This study provides valuable guidance for more efficient fine-tuning strategies and opens avenues for further research into optimizing LLM fine-tuning in resource-constrained environments.
Keywords: Large-scale Language Model; Parameter-Efficient Fine-Tuning; parameter quantization; key variable; trainable parameters; experimental analysis
The interfacial reactions of Mg battery anodes
2
Authors: Xuehong Luo, Ao Shen, Bo Liu, Junjie Wu, Mengbiao Fan, Na Yang, Gaopeng Zhang, Xi Chen, Qingwei Dai. Journal of Magnesium and Alloys, 2025, Issue 5, pp. 1915-1938, 24 pages.
Mg batteries have high energy density, economic safety, and environmental friendliness. They show great potential as an ideal energy storage technology. This review summarizes the limitations of Mg batteries and explores the complex reactions at the Mg anode/electrolyte interface. It focuses on critical issues such as the dissolution of Mg anodes, the evolution of hydrogen gas, the formation of a passivation layer that hinders Mg²⁺ migration, and dendrite growth. To address these interface problems, the review discusses strategies to improve interface reactions. These include the structural design of Mg anodes, suitable substitute materials for the anode, and artificial solid electrolyte interphase films. Finally, it outlines future research directions for ideal Mg anode interfaces. The goal is to develop more efficient interface design schemes and optimization strategies to advance Mg battery technology further.
Keywords: MAGNESIUM BATTERY; Magnesium anode; Interface reaction
Microstructure design of advanced magnesium-air battery anodes (Cited by: 3)
3
Authors: Xu Huang, Qingwei Dai, Qing Xiang, Na Yang, Gaopeng Zhang, Ao Shen, Wanming Li. Journal of Magnesium and Alloys (SCIE, EI, CAS, CSCD), 2024, Issue 2, pp. 443-464, 22 pages.
The metal-air battery is an environmentally friendly energy storage system with a unique open structure. Magnesium (Mg) and its alloys have been extensively explored as anodes for air batteries due to their high theoretical energy density, low cost, and recyclability. However, the study of the Mg-air battery (MAB) is currently still at the laboratory level, mainly owing to the low anodic efficiency caused by poor corrosion resistance. In order to reduce corrosion losses and achieve optimal utilization efficiency of the Mg anode, design strategies are reviewed from a microstructure perspective. Firstly, the corrosion behaviors are discussed, especially the negative difference effect arising from hydrogen evolution. Special attention is given to the effect of anode microstructures on the MAB, including grain size, grain orientation, second phases, crystal structure, twins, and dislocations. For further improvement of the discharge performance, the long-period stacking ordered phase and its enhancing effect are considered. Meanwhile, given the current debates over Mg dendrites, the potential risk, the impact on discharge, and the elimination strategies are discussed. Microstructure control and single crystals would be promising routes for MAB anodes.
Keywords: MAGNESIUM; Air battery; ANODE; MICROSTRUCTURE; Anodic efficiency
SnS₂ nanosheets decorated SnO₂ hollow multishelled nanostructures for enhanced sensing of triethylamine gas
4
Authors: Wen-Di Liu, Ya Xiong, Ao Shen, Xin-Zhen Wang, Xiao Chang, Wen-Bo Lu, Jian Tian. Rare Metals (SCIE, EI, CAS, CSCD), 2024, Issue 5, pp. 2339-2348, 10 pages.
Highly sensitive, fast and low-temperature detection of triethylamine (TEA) gas based on SnO₂ is attractive yet remains challenging. Herein, SnS₂ nanosheets (NSs)/SnO₂ hollow multishelled structures (HoMSs)-based sensors are designed by the synthesis of SnO₂ HoMSs, followed by in situ sulfuration with thioacetamide. By varying the thioacetamide levels, SnS₂/SnO₂ heterostructures with different SnS₂ contents were obtained and their TEA gas sensing performances were investigated.
Keywords: SNS; ATTRACTIVE; SHEETS
A zwitterionic polymer-inspired material mediated efficient CRISPR-Cas9 gene editing (Cited by: 1)
5
Authors: Lingmin Zhang, Langyu Yang, Jionghua Huang, Sheng Chen, Chuangjia Huang, Yinshan Lin, Ao Shen, ZhouYikang Zheng, Wenfu Zheng, Shunqing Tang. Asian Journal of Pharmaceutical Sciences (SCIE, CAS), 2022, Issue 5, pp. 666-678, 13 pages.
The type II prokaryotic CRISPR (clustered regularly interspaced short palindromic repeats)/Cas9 (CRISPR/Cas9) adaptive immune system is a cutting-edge genome-editing toolbox. However, its applications are still limited by its inefficient transduction. Herein, we present a novel gene vector, the zwitterionic polymer-inspired material with branched structure (ZEBRA), for efficient CRISPR/Cas9 delivery. Polo-like kinase 1 (PLK1) acts as a master regulator of mitosis and is overexpressed in multiple tumor cells. A plasmid encoding Cas9 and single guide RNA (sgRNA) was transduced to knock out the Plk1 gene, which was expected to inhibit the expression of PLK1. Our studies demonstrated that ZEBRA efficiently transduced the large CRISPR/Cas9 system into cells. The transduction with ZEBRA was cell line dependent, being ~10-fold higher in CD44-positive cancer cell lines than in CD44-negative ones. Furthermore, ZEBRA induced high-level expression of Cas9 proteins through the delivery of CRISPR/Cas9 and efficient editing of the Plk1 gene, and significantly inhibited tumor cell growth. This zwitterionic polymer-inspired material is an effective and targeted gene delivery vector, and further studies are required to explore its potential in gene delivery applications.
Keywords: CRISPR/Cas9; Gene editing; Zwitterionic polymers; CD44; PLK1
Increased 5-hydroxymethylcytosine is a favorable prognostic factor of Helicobacter pylori-negative gastric cancer patients (Cited by: 1)
6
Authors: Ying-Li Fu, Yan-Hua Wu, Dong-Hui Cao, Zhi-Fang Jia, Ao Shen, Jing Jiang, Xue-Yuan Cao. World Journal of Gastrointestinal Oncology (SCIE), 2022, Issue 7, pp. 1295-1306, 12 pages.
BACKGROUND: Most gastric cancer (GC) patients are diagnosed at a middle or late stage because the symptoms in the early stage are obscure, which leads to higher GC mortality rates. Helicobacter pylori (H. pylori) was identified as a class I carcinogen and leads to aberrant DNA methylation/hydroxymethylation. 5-hydroxymethylcytosine (5-hmC) plays complex roles in gene regulation of tumorigenesis and can be considered an activating epigenetic mark of hydroxymethylation. AIM: To explore the association between 5-hmC levels and the progression and prognosis of GC patients with or without H. pylori infection. METHODS: A retrospective cohort study was conducted to estimate the predictive value of the 5-hmC level in the progression and prognosis of GC patients with different H. pylori infection status. A total of 144 GC patients were recruited. RESULTS: The levels of 5-hmC were significantly decreased in tumor tissues (0.076 ± 0.048) compared with the matched control tissues (0.110 ± 0.057, P = 0.001). A high level of 5-hmC was an independent significant favorable predictor of overall survival in GC patients (hazard ratio = 0.61, 95% confidence interval: 0.38-0.98, P = 0.040), the H. pylori-negative GC subgroup (hazard ratio = 0.30, 95% confidence interval: 0.13-0.68, P = 0.004) and the GC patients with TNM stage I or II (hazard ratio = 0.32, 95% confidence interval: 0.13-0.77, P = 0.011). CONCLUSION: Increased 5-hmC is a favorable prognostic factor in GC, especially for H. pylori-negative subgroups.
Keywords: 5-hydroxymethylation; 5-hydroxymethylcytosine; Helicobacter pylori; Gastric cancer; PROGNOSIS
Three Kinds of Discrete Formulae for the Caputo Fractional Derivative
7
Authors: Zhengnan Dong, Enyu Fan, Ao Shen, Yuhao Su. Communications on Applied Mathematics and Computation (EI), 2023, Issue 4, pp. 1446-1468, 23 pages.
In this paper, three kinds of discrete formulae for the Caputo fractional derivative are studied, including the modified L1 discretisation for α ∈ (0,1), and the L2 discretisation and L2C discretisation for α ∈ (1,2). The truncation error estimates and the properties of the coefficients of all these discretisations are analysed in more detail. Finally, the theoretical analyses are verified by numerical examples.
Keywords: Caputo fractional derivative; Modified L1 discretisation; L2 discretisation; L2C discretisation; Truncation error
Training large-scale language models with limited GPU memory: a survey
8
Authors: Yu TANG, Linbo QIAO, Lujia YIN, Peng LIANG, Ao SHEN, Zhilin YANG, Lizhi ZHANG, Dongsheng LI. Frontiers of Information Technology & Electronic Engineering, 2025, Issue 3, pp. 309-331, 23 pages.
Large-scale models have gained significant attention in a wide range of fields, such as computer vision and natural language processing, due to their effectiveness across various applications. However, a notable hurdle in training these large-scale models is the limited memory capacity of graphics processing units (GPUs). In this paper, we present a comprehensive survey focused on training large-scale models with limited GPU memory. The exploration commences by scrutinizing the factors that contribute to the consumption of GPU memory during the training process, namely model parameters, model states, and model activations. Following this analysis, we present an in-depth overview of the relevant research work that addresses these aspects individually. Finally, the paper concludes by presenting an outlook on the future of memory optimization in training large-scale language models, emphasizing the necessity for continued research and innovation in this area. This survey serves as a valuable resource for researchers and practitioners keen on comprehending the challenges and advancements in training large-scale language models with limited GPU memory.
Keywords: Training techniques; Memory optimization; Model parameters; Model states; Model activations
Development of biodegradable modified starch composite films with Pleurotus citrinopileatus polysaccharide and nano titanium dioxide for enhanced fresh-cut yam preservation
9
Authors: Ao Shen, Zijun He, Jinhong Zhang, Shuzhen Li, Weiwei Yang. Food Quality and Safety, 2025, Issue 2, pp. 275-286, 12 pages.
Modified starch films are gaining attention as biodegradable and sustainable materials in the food packaging industry. However, their inherent properties, including their brittleness and low antimicrobial and antioxidant capacities, limit their extensive application. To address these shortcomings, in this study, a composite film was developed using potato-modified starch (PMS) as the base material, enhanced with konjac glucomannan (KGM), Pleurotus citrinopileatus polysaccharide (PCP), and nano titanium dioxide (nano TiO₂). Additionally, PCP and nano TiO₂, which are bioactive components, were incorporated to improve the functional properties of the films, promoting their application in food preservation. The optimal composition of the composite films was determined through a fuzzy comprehensive evaluation, and the best performance was achieved with 10 g/L of PCP and 1.5 g/L of nano TiO₂. These composite films exhibited high mechanical strength, antimicrobial capacity, and antioxidant capacity while being noncytotoxic. The practical efficacy of the composite films was verified by applying them to preserve fresh-cut yams at room temperature, where they effectively delayed spoilage and maintained yam quality. This study demonstrates that PMS/KGM/PCP/nano TiO₂ composite films can significantly enhance the shelf life of fresh produce, providing a viable route for eco-friendly food preservation.
Keywords: Modified starch films; konjac glucomannan; Pleurotus citrinopileatus polysaccharide; nano titanium dioxide; yam preservation
Ailanthone ameliorates pulmonary fibrosis by suppressing JUN-dependent MEOX1 activation
10
Authors: Lixin Zhao, Yuguang Zhu, Hua Tao, Xiying Chen, Feng Yin, Yingyi Zhang, Jianfeng Qin, Yongyin Huang, Bikun Cai, Yonghao Lin, Jiaxiang Wu, Yu Zhang, Lu Liang, Ao Shen, Xi-Yong Yu. Acta Pharmaceutica Sinica B (SCIE, CAS, CSCD), 2024, Issue 8, pp. 3543-3560, 18 pages.
Pulmonary fibrosis poses a significant health threat with very limited therapeutic options available. In this study, we reported the enhanced expression of mesenchymal homeobox 1 (MEOX1) in pulmonary fibrosis patients, especially in their fibroblasts and endothelial cells, and confirmed MEOX1 as a central orchestrator in the activation of profibrotic genes. By high-throughput screening, we identified Ailanthone (AIL) from a natural compound library as the first small molecule capable of directly targeting and suppressing MEOX1. AIL demonstrated the ability to inhibit both the activation of fibroblasts and the endothelial-to-mesenchymal transition of endothelial cells when challenged by transforming growth factor-β1 (TGF-β1). In an animal model of bleomycin-induced pulmonary fibrosis, AIL effectively mitigated the fibrotic process and restored respiratory functions. Mechanistically, AIL acted as a suppressor of MEOX1 by disrupting the interaction between the transcription factor JUN and the promoter of MEOX1, thereby inhibiting MEOX1 expression and activity. In summary, our findings pinpointed MEOX1 as a cell-specific and clinically translatable target in fibrosis. Moreover, we demonstrated the potent anti-fibrotic effect of AIL in pulmonary fibrosis, specifically through the suppression of JUN-dependent MEOX1 activation.
Keywords: Ailanthone; MEOX1; Pulmonary fibrosis; JUN; TGF-β1; High-throughput screening; Natural product
SS-Pro: a simplified siamese contrastive learning approach for protein surface representation
11
Authors: Ao SHEN, Mingzhi YUAN, Yingfan MA, Manning WANG. Frontiers of Computer Science (SCIE, EI, CSCD), 2024, Issue 5, pp. 243-245, 3 pages.
The protein surface serves as an important representation of protein structure, revealing how a protein interacts with other biomolecules to perform its functions. This forms the basis for pharmaceutical and fundamental biological research [1]. Data-driven deep learning methods in protein surface representation face challenges of label scarcity, since labeled data are typically obtained through wet lab experiments.
Keywords: REVEALING; REPRESENTATION; simplified
Experimental quantum secret sharing based on phase encoding of coherent states (Cited by: 9)
12
Authors: Ao Shen, Xiao-Yu Cao, Yang Wang, Yao Fu, Jie Gu, Wen-Bo Liu, Chen-Xun Weng, Hua-Lei Yin, Zeng-Bing Chen. Science China (Physics, Mechanics & Astronomy) (SCIE, EI, CAS, CSCD), 2023, Issue 6, pp. 139-147, 9 pages.
Quantum secret sharing (QSS) is one of the basic communication primitives in future quantum networks, addressing part of the basic cryptographic tasks of multiparty communication and computation. Nevertheless, it is a challenge to provide a practical QSS protocol with security against general attacks. A QSS protocol that balances security and practicality is still lacking. Here, we propose a QSS protocol with simple phase encoding of coherent states among three parties. Removing the requirement of impractical entangled resources and the need for phase randomization, our protocol can be implemented with accessible technology. We provide the finite-key analysis against coherent attacks and implement a proof-of-principle experiment to demonstrate our scheme's feasibility. Our scheme achieves a key rate of 85.3 bps under a 35 dB channel loss. Combining security against general attacks with accessible technology, our protocol is a promising candidate for practical multiparty quantum communication networks.
Keywords: quantum secret sharing; coherent state; phase encoding; coherent attack; FINITE-SIZE
Supertough spontaneously self-healing polymer based on septuple dynamic bonds integrated in one chemical group (Cited by: 5)
13
Authors: Luzhi Zhang, Qingbao Guan, Ao Shen, Rasoul Esmaeely Neisiany, Zhengwei You, Meifang Zhu. Science China Chemistry (SCIE, EI, CSCD), 2022, Issue 2, pp. 363-372, 10 pages.
The development of spontaneously self-healing materials with excellent mechanical properties remains a formidable challenge despite the extensive interest in such materials. This is because the self-healing and mechanical properties of a material are usually optimized via contradictory routes. The present study demonstrated a supertough spontaneously self-healing polymer, Fe-(2,6-diacetylpyridine dioxime)-urethane-based polyurethane (Fe-PPOU), based on septuple dynamic bonds integrated in one chemical group. A synergistic effect was induced by the presence of multiple dynamic crosslinking points, which comprised the integrated dynamic interactions, and the hidden lengths of the folded polymeric chains in Fe-PPOU. Thus, the mechanical and self-healing properties of the polymer were simultaneously optimized. Fe-PPOU demonstrated the highest reported toughness (139.8 MJ m⁻³) among all the room-temperature spontaneously self-healing polymers with a nearly 100% healing rate. Fe-PPOU exhibited instant (30 s) self-healing to reach a strength of 1.6 MPa, which was higher than the original strength of numerous recently reported self-healing polymers.
Keywords: oxime-urethane bonds; hydrogen bonds; metal-coordination bonds; 2,6-diacetylpyridine dioxime; SELF-HEALING
4-Axis printing microfibrous tubular scaffold and tracheal cartilage application (Cited by: 3)
14
Authors: Dong Lei, Bin Luo, Yifan Guo, Di Wang, Hao Yang, Shaofei Wang, Huixia Xuan, Ao Shen, Yi Zhang, Zenghe Liu, Chuanglong He, Feng-Ling Qing, Yong Xu, Guangdong Zhou, Zhengwei You. Science China Materials (SCIE, EI, CSCD), 2019, Issue 12, pp. 1910-1920, 11 pages.
Long-segment defects remain a major problem in the clinical treatment of tubular tissue reconstruction. The design of tubular scaffolds with the desired structure and functional properties suitable for tubular tissue regeneration remains a great challenge in regenerative medicine. Here, we present a reliable method to rapidly fabricate tissue-engineered tubular scaffolds with hierarchical structure via a 4-axis printing system. The fabrication process can be adapted to various biomaterials including hydrogels, thermoplastic materials and thermosetting materials. Using polycaprolactone (PCL) as an example, we successfully fabricated scaffolds with tunable tubular architecture, controllable mesh structure, radial elasticity, good flexibility, and luminal patency. As a preliminary demonstration of the applications of this technology, we prepared a hybrid tubular scaffold via the combination of the 4-axis printed elastic poly(glycerol sebacate) (PGS) bio-spring and electrospun gelatin nanofibers. The scaffolds seeded with chondrocytes formed tubular mature cartilage-like tissue both via in vitro culture and subcutaneous implantation in the nude mouse, which showed great potential for tracheal cartilage reconstruction.
Keywords: 3D printing; tissue engineering; tubular scaffold; tracheal cartilage
Prediction of pandemic risk for animal-origin coronavirus using a deep learning method
15
Authors: Zheng Kou, Yi-Fan Huang, Ao Shen, Saeed Kosari, Xiang-Rong Liu, Xiao-Li Qiang. Infectious Diseases of Poverty (SCIE), 2021, Issue 5, pp. 62-70, 9 pages.
Background: Coronaviruses can be isolated from bats, civets, pangolins, birds and other wild animals. As an animal-origin pathogen, a coronavirus can cross the species barrier and cause a pandemic in humans. In this study, a deep learning model for early prediction of pandemic risk was proposed based on the sequences of viral genomes. Methods: A total of 3257 genomes were downloaded from the Coronavirus Genome Resource Library. We present a deep learning model of cross-species coronavirus infection that combines a bidirectional gated recurrent unit network with a one-dimensional convolution. The genome sequence of an animal-origin coronavirus was directly input to extract features and predict pandemic risk. The best performances were explored with the use of a pre-trained DNA vector and an attention mechanism. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR) were used to evaluate the predictive models. Results: The six specific models achieved good performances for the corresponding virus groups (1 for AUROC and 1 for AUPR). The general model with pre-training vector and attention mechanism provided excellent predictions for all virus groups (1 for AUROC and 1 for AUPR), while those without pre-training vector or attention mechanism showed an obvious reduction in performance (about 5-25%). Re-training experiments showed that the general model has good transfer learning capabilities (average for six groups: 0.968 for AUROC and 0.942 for AUPR) and should give reasonable predictions for potential pathogens of the next pandemic. The artificial negative data with the replacement of the coding region of the spike protein were also predicted correctly (100% accuracy). Using the Python programming language, an easy-to-use tool was created to implement our predictor. Conclusions: A robust deep learning model with pre-training vector and attention mechanism mastered the features from the whole genomes of animal-origin coronaviruses and could predict the risk of cross-species infection for early warning of the next pandemic.
Keywords: CORONAVIRUS; Pandemic risk; Viral genome; Deep learning
Efficient deep neural network training via decreasing precision with layer capacity
16
Authors: Ao SHEN, Zhiquan LAI, Tao SUN, Shengwei LI, Keshi GE, Weijie LIU, Dongsheng LI. Frontiers of Computer Science, 2025, Issue 10, pp. 39-55, 17 pages.
Low-precision training has emerged as a practical approach, saving the cost of time, memory, and energy during deep neural network (DNN) training. Typically, the use of lower precision introduces quantization errors that need to be minimized to maintain model performance, and the potential benefits of reducing training precision are often neglected. This paper rethinks low-precision training, highlighting the potential benefits of lowering precision: (1) low precision can serve as a form of regularization in DNN training by constraining excessive variance in the model; (2) layer-wise low precision can be seen as an alternative dimension of sparsity, orthogonal to pruning, contributing to improved generalization in DNNs. Based on these analyses, we propose a simple yet powerful technique, DPC (Decreasing Precision with layer Capacity), which directly assigns different bit-widths to model layers, without the need for an exhaustive analysis of the training process or any delicate low-precision criteria. Extensive experiments on five datasets and fourteen models across various applications consistently demonstrate the effectiveness of the proposed DPC technique in saving computational cost (-16.21% to -44.37%) while achieving comparable or even superior accuracy (up to +0.68%, +0.21% on average). Furthermore, we offer feature embedding visualizations and conduct further analysis with experiments to investigate the underlying mechanisms behind DPC's effectiveness, enhancing our understanding of low-precision training. Our source code will be released upon paper acceptance.
Keywords: low precision; efficient training; generalization; regularization; bit-width assignment