Abstract: Opening the silicon oxide mask of a capacitor in dynamic random access memory is a critical process on a capacitively coupled plasma (CCP) etch tool. The process comprises three steps: dielectric anti-reflective coating (DARC) etch back, silicon oxide etch, and strip. To achieve good performance in the subsequently fabricated capacitors, such as low leakage current and high capacitance, the DARC etch back should be optimized first. We designed experiments focusing on etch time and chemistry to evaluate the profile of the silicon oxide mask, the remaining DARC, and the critical dimension (CD). The results show that, with the current equipment and conditions, the etch back time should be kept between 50 and 60 s. This raises the B/T ratio above 70% while resolving the DARC remain issue. We also found that the CH₂F₂ flow should be ~15 sccm to avoid a reversed CD trend and maintain the in-line CD.
Funding: Supported by the US National Science Foundation (Nos. CCF-1717660 and CNS-1828363).
Abstract: DRAM-based memory suffers from increasing row buffer conflicts, which cause significant performance degradation and power consumption. As memory capacity increases, the overhead of a row buffer conflict grows because longer bitlines lead to higher row activation and precharge latencies. In this work, we propose a practical approach called Row Buffer Cache (RBC) to mitigate row buffer conflict overheads efficiently. At the core of the proposed RBC architecture, rows with good spatial locality are cached and protected, so that they are not interrupted by accesses to rows with poor locality. The RBC architecture significantly reduces the performance and energy overheads caused by row activation and precharge, and thus improves overall system performance and energy efficiency. We evaluate the RBC architecture using SPEC CPU2006 on a DDR4 memory and compare it with a commodity baseline memory system. Results show that RBC improves overall performance by up to 2.24× (16.1% on average) and reduces memory energy by up to 68.2% (23.6% on average) in single-core simulations. In multi-core simulations, RBC improves overall performance by up to 1.55× (17% on average) and reduces memory energy consumption by up to 35.4% (21.3% on average).
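For readers unfamiliar with the terminology, the following is a minimal sketch of how accesses to a single DRAM bank are usually classified under an open-page policy. It is a toy model for illustration only, not the RBC design or its simulator; the function name `classify_accesses` and the example row numbers are assumptions introduced here.

```python
from typing import Iterable, Optional, Tuple


def classify_accesses(rows: Iterable[int]) -> Tuple[int, int, int]:
    """Return (hits, misses, conflicts) for one bank with an open-page policy.

    hit      - the requested row is already in the row buffer
    miss     - the row buffer is empty, so only an activation is needed
    conflict - another row is open, so a precharge and an activation are needed
    """
    open_row: Optional[int] = None  # hypothetical single-bank state
    hits = misses = conflicts = 0
    for row in rows:
        if open_row is None:
            misses += 1
        elif row == open_row:
            hits += 1
        else:
            conflicts += 1
        open_row = row  # the accessed row stays open afterwards
    return hits, misses, conflicts


# A stream with good spatial locality versus two rows ping-ponging in one bank.
print(classify_accesses([7] * 8))     # (7, 1, 0)
print(classify_accesses([7, 3] * 4))  # (0, 1, 7)
```

In the second stream every access after the first evicts the other row, which is the kind of interference between high-locality and low-locality rows that the abstract describes.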
Abstract: In this work, an IGZO (In-Ga-Zn-O) 2T0C DRAM (dynamic random access memory) is demonstrated as a cryogenic memory at temperatures as low as 77 K. The effects of temperature on the electrical properties of the IGZO TFTs are investigated. We observe that the subthreshold swing (SS) improves from 161 to 99 mV/dec, with no reduction in on-state current (I_ON) at V_TH + 1 V, when the temperature decreases from 300 to 77 K. More importantly, the corresponding V_TH shifts positively from -1 to 0.5 V, indicating a transition of the IGZO TFTs from depletion mode to enhancement mode, which is crucial for low-power operation and for optimizing data retention time (DRT). By integrating this IGZO TFT into a 2T0C DRAM, the retention time of the DRAM cell is significantly extended to 8000 s at 77 K, more than 5 times longer than at 300 K. The improved data retention time also results from the lower leakage current (6×10⁻¹⁸ A/μm) at 77 K, due to the suppression of thermal excitation and tunneling of carriers in the IGZO channel at cryogenic temperature. Additionally, a large read current margin (I_data‘1’/I_data‘0’) of approximately 103 is achieved across a wide temperature range. This study demonstrates the potential of IGZO 2T0C DRAM cells for future cryogenic computing systems.
Funding: Supported jointly by the National Key Research and Development Program of China under Grant No. 2022YFB4500303 and the National Natural Science Foundation of China under Grant Nos. 62072198, 61825202, and 61929103.
Abstract: Die-stacked dynamic random access memory (DRAM) caches are increasingly advocated to bridge the performance gap between the on-chip cache and the main memory. To fully realize their potential, it is essential to improve the DRAM cache hit rate and lower its hit latency. To combine the high hit rate of set-associative mapping with the low hit latency of direct mapping, we propose a partial direct-mapped die-stacked DRAM cache called P3DC. This design is motivated by a key observation: applying a unified mapping policy to different types of blocks cannot achieve a high cache hit rate and low hit latency simultaneously. To address this problem, P3DC classifies data blocks into leading blocks and following blocks, and places them at static positions and dynamic positions, respectively, in a unified set-associative structure. We also propose a replacement policy to balance the miss penalty and the temporal locality of different blocks. In addition, P3DC provides a policy to mitigate cache thrashing due to block type variations. Experimental results demonstrate that P3DC reduces the cache hit latency by 20.5% while achieving a cache hit rate similar to that of typical set-associative caches. P3DC improves instructions per cycle (IPC) by up to 66% (12% on average) compared with BEAR, the state-of-the-art direct-mapped cache, and by up to 19% (6% on average) compared with DEC-A8, the tag-data decoupled set-associative cache.