Funding: Supported by ZTE Industry-Academia-Research Cooperation Funds.
Abstract: Data layout in a file system is the organization of data stored in external storage. The data layout has a huge impact on the performance of storage systems. We survey three main kinds of data layout in traditional file systems: in-place update, log-structured, and copy-on-write. Each has its own strengths and weaknesses under different circumstances. We also include a recent use of persistent layout in a file system that combines flash memory and byte-addressable non-volatile memory. From this survey, we conclude that persistent data layout in file systems may evolve dramatically in the era of emerging non-volatile memory.
Funding: Supported by the Major Research Plan of the National Natural Science Foundation of China under Grant No. 92270202 and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDB44030200.
Abstract: Persistent memory (PM) allows file systems to persist data directly on the memory bus. To increase the capacity of PM file systems, building a file system across sockets, each with its own attached PM, is attractive. However, accessing data across sockets is subject to the effects of the non-uniform memory access (NUMA) architecture, which leads to significant performance degradation. In this paper, we first use experiments to understand the impact of NUMA on building PM file systems. We then propose four design principles for building NapFS, a high-performance PM file system for the NUMA architecture. We architect NapFS with per-socket local PM file systems and per-socket dedicated IO thread pools. This not only allows applications to delegate data accesses to IO threads to avoid remote PM accesses, but also fully reuses existing single-socket PM file systems to reduce implementation complexity. Additionally, NapFS uses fast DRAM to accelerate performance by adding a global cache, and adopts a selective cache mechanism to eliminate the redundant double-copy overhead for synchronization operations. Lastly, we show that NapFS can adopt extended optimizations to improve scalability and the performance of critical requests. We evaluate NapFS against other multi-socket PM file systems. The evaluation results show that NapFS achieves 2.2x and 1.0x throughput improvement for Filebench and RocksDB, respectively.
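The delegation idea in this abstract can be illustrated with a minimal sketch. It is not NapFS code: the class `DelegatingFS`, the placement rule in `socket_of`, and the in-memory dictionaries standing in for per-socket PM file systems are all assumptions made for illustration, and Python threads are not actually pinned to NUMA nodes here. The sketch only shows the routing: each request is handed to the thread pool that belongs to the socket holding the data, so the caller never touches a remote store directly.

```python
from concurrent.futures import ThreadPoolExecutor


class DelegatingFS:
    """Toy model: one local store and one dedicated IO thread pool per socket."""

    def __init__(self, num_sockets, threads_per_socket=2):
        # Each dict stands in for a per-socket local PM file system (assumption).
        self.stores = [dict() for _ in range(num_sockets)]
        self.pools = [
            ThreadPoolExecutor(max_workers=threads_per_socket)
            for _ in range(num_sockets)
        ]

    def socket_of(self, path):
        # Hypothetical, deterministic placement rule: byte sum of the path.
        return sum(path.encode()) % len(self.stores)

    def write(self, path, data):
        # Delegate the write to the pool of the socket that owns the path.
        s = self.socket_of(path)
        return self.pools[s].submit(self.stores[s].__setitem__, path, bytes(data))

    def read(self, path):
        # Delegate the read the same way; the result arrives through a Future.
        s = self.socket_of(path)
        return self.pools[s].submit(self.stores[s].get, path)

    def close(self):
        for pool in self.pools:
            pool.shutdown(wait=True)
```

A caller would wait on the returned future, for example `fs.write("/a", b"x").result()` followed by `fs.read("/a").result()`. In a real system the pool threads would be bound to the cores of their socket, which this sketch does not attempt.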
Funding: Supported by ZTE Industry-University-Institute Cooperation Funds under Grant No. HC-CN-20181128026.
Abstract: Byte-addressable non-volatile memory (NVM), as a new participant in the storage hierarchy, offers extremely high storage performance, which forces changes to current filesystem designs. The page cache, once a significant mechanism for bridging the performance gap between Dynamic Random Access Memory (DRAM) and block devices, is now a liability that heavily hinders the write performance of NVM filesystems. State-of-the-art NVM filesystems therefore leverage direct access (DAX) technology to bypass the page cache entirely. However, DRAM still provides higher bandwidth than NVM, so bypassing the page cache prevents skewed read workloads from benefiting from the higher bandwidth of DRAM and leads to sub-optimal system performance. In this paper, we propose RCache, a read-intensive workload-aware page cache for NVM filesystems. Unlike traditional caching mechanisms where all reads go through DRAM, RCache uses a tiered page cache design that assigns hot data to DRAM and cold data to NVM and serves reads from both. To avoid copying data to DRAM on the critical path, RCache migrates data from NVM to DRAM in a background thread. Additionally, RCache manages data in DRAM in a lock-free manner for better latency and scalability. Evaluations on Intel Optane Data Center (DC) Persistent Memory Modules show that, compared with NOVA, RCache achieves 3 times higher bandwidth for read-intensive workloads and introduces little performance loss for write operations.
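The tiered read path described in this abstract can be sketched as follows. This is a single-threaded illustration under stated assumptions, not RCache itself: the class `TieredReadCache`, the `hot_threshold` parameter, and the hit-count policy for deciding what is hot are hypothetical, and the lock-free DRAM management is not modeled. What it shows is that a read never copies into DRAM on its own path; promotion happens only in `migrate_hot`, which stands in for the background thread.

```python
from collections import Counter


class TieredReadCache:
    """Toy tiered cache: hot pages served from DRAM, cold pages from NVM."""

    def __init__(self, nvm_pages, hot_threshold=2):
        self.nvm = nvm_pages    # page_no -> bytes; the complete, cold tier
        self.dram = {}          # page_no -> bytes; the hot tier
        self.hits = Counter()   # read count per page (assumed hotness metric)
        self.hot_threshold = hot_threshold

    def read(self, page_no):
        # Record the access, then serve from whichever tier holds the page.
        # No copy to DRAM happens here, so the critical path stays short.
        self.hits[page_no] += 1
        if page_no in self.dram:
            return self.dram[page_no], "dram"
        return self.nvm[page_no], "nvm"

    def migrate_hot(self):
        # Stand-in for the background thread: promote pages read often enough.
        promoted = [
            p for p, n in self.hits.items()
            if n >= self.hot_threshold and p not in self.dram
        ]
        for p in promoted:
            self.dram[p] = self.nvm[p]
        return sorted(promoted)
```

After two reads of the same page and one call to `migrate_hot`, later reads of that page are served from the DRAM tier, while pages read only once stay in NVM.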
Funding: Project supported by the National Natural Science Foundation of China (No. 62162011) and the Doctor Funds of Guizhou University, China (Nos. 2020(13) and 2022(44)).
Abstract: Persistent memory (PM) file systems have been developed to achieve high performance by exploiting the advanced features of PMs, including non-volatility, byte addressability, and dynamic random access memory (DRAM)-like performance. Unfortunately, these PMs suffer from limited write endurance. Existing space management strategies of PM file systems can induce a severely unbalanced wear problem, which can damage the underlying PMs quickly. In this paper, we propose a Wear-leveling-aware Multi-grained Allocator, called WMAlloc, to achieve wear leveling of PMs while improving the performance of file systems. WMAlloc adopts multiple min-heaps to manage the unused space of PMs. Each heap represents an allocation granularity. WMAlloc then allocates less-worn blocks from the corresponding min-heap for allocation requests. Moreover, to avoid recursive split and inefficient heap locations in WMAlloc, we further propose a bitmap-based multi-heap tree (BMT) to enhance WMAlloc, namely, WMAlloc-BMT. We implement WMAlloc and WMAlloc-BMT in the Linux kernel based on NOVA, a typical PM file system. Experimental results show that, compared with the original NOVA and dynamic wear-aware range management (DWARM), which is the state-of-the-art wear-leveling-aware allocator of PM file systems, WMAlloc can, respectively, achieve 4.11× and 1.81× maximum write number reduction and 1.02× and 1.64× performance with four workloads on average. Furthermore, WMAlloc-BMT outperforms WMAlloc with 1.08× performance and achieves 1.17× maximum write number reduction with four workloads on average.
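The core of the allocation policy in this abstract, one min-heap per granularity ordered by wear, can be sketched in a few lines. This is an illustration only and not the WMAlloc implementation: the class `WearAwareAllocator`, its method names, and the use of a per-block write count as the wear metric are assumptions, and neither block splitting nor the bitmap-based multi-heap tree is modeled.

```python
import heapq


class WearAwareAllocator:
    """Toy allocator: one min-heap of free blocks per allocation granularity."""

    def __init__(self, granularities):
        # granularity -> heap of (write_count, block_id); least-worn on top.
        self.heaps = {g: [] for g in granularities}

    def add_free(self, granularity, block_id, write_count):
        heapq.heappush(self.heaps[granularity], (write_count, block_id))

    def allocate(self, granularity):
        # Pop the least-worn free block of the requested granularity.
        heap = self.heaps[granularity]
        if not heap:
            raise MemoryError(f"no free block of granularity {granularity}")
        write_count, block_id = heapq.heappop(heap)
        return block_id, write_count

    def free(self, granularity, block_id, write_count, writes_since_alloc):
        # Return the block with its updated wear so it sinks in the heap.
        self.add_free(granularity, block_id, write_count + writes_since_alloc)
```

With three free blocks worn 10, 3, and 7 times, `allocate` hands out the block worn 3 times first. Once that block is freed with additional writes recorded, it is handed out again only after the less-worn blocks have been used.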
Funding: Project supported by the National Key R&D Program of China (No. 2021YFB0300500) and the National Natural Science Foundation of China (No. 62022051).
Abstract: The emergence of new hardware, including persistent memory and the smart network interface card (SmartNIC), has brought new opportunities to file system design. In this paper, we design and implement a new file system named NICFS based on persistent memory and SmartNIC. We divide the file system into two parts: the front end and the back end. In the front end, data writes are appended to the persistent memory in a log-structured way, leveraging the fast persistence advantage of persistent memory. In the back end, the data in logs are fetched, processed, and patched to files in the background, leveraging the processing capacity of the SmartNIC. Evaluation results show that NICFS outperforms Ext4 by about 21%/10% and about 19%/50% on large and small reads/writes, respectively.
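The front-end and back-end split described in this abstract can be sketched as an append-only log plus a digest step. This is a toy model under stated assumptions, not NICFS code: the class `LogStructuredStore`, the in-memory `deque` standing in for the persistent-memory log, and the `digest` method standing in for the SmartNIC back end are all hypothetical.

```python
from collections import deque


class LogStructuredStore:
    """Toy model: writes are appended to a log, then patched into files later."""

    def __init__(self):
        self.log = deque()   # stands in for the log region in persistent memory
        self.files = {}      # path -> bytearray; stands in for file data

    def write(self, path, offset, data):
        # Front end: a write is just an append, so it completes quickly.
        self.log.append((path, offset, bytes(data)))

    def digest(self):
        # Back end: apply log entries in arrival order, oldest first.
        applied = 0
        while self.log:
            path, offset, data = self.log.popleft()
            buf = self.files.setdefault(path, bytearray())
            end = offset + len(data)
            if len(buf) < end:
                buf.extend(b"\0" * (end - len(buf)))  # grow file with zeros
            buf[offset:end] = data
            applied += 1
        return applied
```

Because entries are applied in order, a later overlapping write wins: writing `b"hello"` at offset 0 and then `b"XY"` at offset 3 leaves the file as `b"helXY"` after `digest`.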
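Abstract: Using the object-reviewing paradigm, with the object-specific previewing benefit (OSPB) as the index, this study examines the role of surface feature cues in object persistence. Experiment 1 used a bidirectional tunnel to create a condition in which spatiotemporal cues were ambiguous, and examined the role of surface color cues. Experiment 2 used a unidirectional tunnel to make spatiotemporal cues unambiguous, and examined object persistence when surface color cues were consistent or in conflict with spatiotemporal cues. The OSPB effect appeared in both experiments, and in Experiment 2 it was smaller in the conflict condition than in the consistent condition. The results indicate that when spatiotemporal cues are ambiguous, surface color cues alone are sufficient for object persistence; when spatiotemporal cues are unambiguous, they are the primary cue for object persistence, while surface color cues also play some role.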
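Abstract: In April 2019, Intel officially released Optane DC persistent memory, based on 3D XPoint technology, which offers new opportunities for building efficient persistent memory storage systems. However, existing storage system software cannot make good use of its byte addressability, so the performance of persistent memory is hard to exploit fully. This paper proposes HDPM, a hybrid management mechanism for file system data pages, which selectively uses copy-on-write and a log structure to manage file data. This takes full advantage of the byte addressability of persistent memory and avoids the write amplification that traditional single-mode approaches incur on unaligned or small writes. To avoid hurting read performance, HDPM introduces a reverse scan mechanism that reconstructs data pages from the log structure without extra data copies. HDPM also proposes a multi-level garbage collection mechanism for log cleaning: when a single log structure grows too large, it is actively reclaimed along the read/write path; when persistent memory space is constrained, a background thread asynchronously frees log space using a lock-free mechanism. Experiments show that, compared with the NOVA file system, HDPM reduces single-thread write latency by up to 58% while leaving read latency unaffected; multi-threaded Filebench tests show that HDPM improves throughput by 33% over NOVA.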
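The write amplification that motivates a hybrid scheme can be made concrete with a short sketch. It is not the HDPM policy, which the abstract does not specify: the selection rule in `choose_path` (page-aligned whole-page writes go to copy-on-write, everything else to the log), the 4096-byte `PAGE_SIZE`, and the function names are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def choose_path(offset, length, page_size=PAGE_SIZE):
    """Hypothetical rule: whole aligned pages use COW, other writes use the log."""
    aligned = length > 0 and offset % page_size == 0 and length % page_size == 0
    return "cow" if aligned else "log"


def cow_bytes_written(offset, length, page_size=PAGE_SIZE):
    """Bytes written if every page touched by the write is rewritten in full."""
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return (last_page - first_page + 1) * page_size


def write_amplification(offset, length, page_size=PAGE_SIZE):
    """Ratio of bytes written under pure copy-on-write to bytes requested."""
    return cow_bytes_written(offset, length, page_size) / length
```

Under these assumptions, a 64-byte write at offset 100 would rewrite a full 4096-byte page under pure copy-on-write, an amplification of 64, and a 12-byte write straddling a page boundary would rewrite two pages. Appending such writes to a log avoids that cost, which is the trade-off the abstract describes.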