Current metadata modeling techniques cannot meet the needs of conceptual knowledge expression, knowledge organization, and metadata semantic consistency in the geological domain. This paper introduces ontology and integrates it into geological domain metadata modeling. It adopts the first-order logic equivalent algorithm and defines the extended metadata model as a quadruple consisting of a geological term set, a geological term definition set, an attribute definition set, and an instance set. It also provides a formal description of each set. Finally, five steps for building the geological domain extended metadata model are given. The results show that this model not only provides content standards for geological domain knowledge representation and knowledge organization, but also provides a basis for the semantically consistent integration and application of multi-source and historical geological data.
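As an illustration only, the four-set structure described in this abstract could be sketched as a small Python container. The class name, field names, and validation rules below are assumptions made for this sketch; the paper's own formal description of each set is not reproduced in the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GeoMetadataModel:
    """Hypothetical container for the four sets named in the abstract."""

    terms: set[str] = field(default_factory=set)  # geological term set
    term_definitions: dict[str, str] = field(default_factory=dict)  # term -> definition
    attribute_definitions: dict[str, set[str]] = field(default_factory=dict)  # term -> attribute names
    instances: list[tuple[str, dict]] = field(default_factory=list)  # (term, attribute values)

    def add_term(self, term: str, definition: str, attributes: set[str]) -> None:
        """Register a term together with its definition and allowed attributes."""
        self.terms.add(term)
        self.term_definitions[term] = definition
        self.attribute_definitions[term] = set(attributes)

    def add_instance(self, term: str, values: dict) -> None:
        """Record an instance, rejecting unknown terms and undeclared attributes."""
        if term not in self.terms:
            raise KeyError(f"unknown term: {term}")
        unknown = set(values) - self.attribute_definitions[term]
        if unknown:
            raise ValueError(f"undeclared attributes for {term}: {sorted(unknown)}")
        self.instances.append((term, dict(values)))


model = GeoMetadataModel()
model.add_term("granite", "a coarse-grained intrusive igneous rock", {"age_ma", "locality"})
model.add_instance("granite", {"age_ma": 120})
```

Checking every instance against the declared terms and attributes is one simple way to enforce the kind of semantic consistency the abstract refers to.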
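In recent years, large language model (LLM) technology, represented by ChatGPT, has developed rapidly. As model parameter counts continue to grow, building and applying large models places higher demands on data storage capacity and storage access efficiency, which poses severe challenges to traditional storage systems. This paper first analyzes the storage access characteristics of large models in the data preparation, model training, and inference stages, and examines in depth the main problems and bottlenecks that traditional storage systems face in large-model scenarios. To address these challenges, it proposes and implements ScaleFS, a high-performance, scalable distributed metadata design. Through an architecture that decouples directory-tree metadata from attribute metadata, combined with a hierarchical directory-tree partitioning strategy that balances depth and breadth, ScaleFS achieves efficient path resolution, load balancing, and system scalability, and can efficiently manage hundreds of billions of files. In addition, ScaleFS designs fine-grained metadata structures, optimizes metadata access patterns, and builds a metadata key-value storage base optimized for file semantics, significantly improving metadata access efficiency and reducing disk I/O operations. Experimental results show that the operations per second (OPS) of ScaleFS are 1.04 to 7.12 times those of HDFS, while its latency is only 12.67% to 99.55% of that of HDFS. At a scale of hundreds of billions of files, most ScaleFS operations outperform HDFS at a scale of billions of files, demonstrating higher scalability and access efficiency and better meeting the needs of large-model scenarios for storing and efficiently accessing hundreds of billions of files.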
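The idea of decoupling directory-tree metadata from attribute metadata can be illustrated with a toy, single-process sketch. This is not the system's actual design: the class, the key layout `(parent_id, name)`, and the in-memory dictionaries standing in for a key-value store are all assumptions, and partitioning and distribution are omitted entirely.

```python
class Namespace:
    """Toy namespace keeping directory entries and attributes in separate maps."""

    ROOT_ID = 0

    def __init__(self):
        # Directory-tree metadata: (parent inode id, child name) -> child inode id.
        self.dentries = {}
        # Attribute metadata: inode id -> attribute dictionary.
        self.attrs = {self.ROOT_ID: {"is_dir": True}}
        self._next_id = 1

    def resolve(self, path):
        """Walk the directory-tree map only; attributes are never touched here."""
        node = self.ROOT_ID
        for name in path.strip("/").split("/"):
            if not name:
                continue  # handles "/" and repeated slashes
            node = self.dentries[(node, name)]
        return node

    def create(self, path, is_dir=False, **attrs):
        """Create a file or directory whose parent already exists."""
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent_id = self.resolve(parent_path or "/")
        if not self.attrs[parent_id]["is_dir"]:
            raise NotADirectoryError(parent_path)
        if (parent_id, name) in self.dentries:
            raise FileExistsError(path)
        inode = self._next_id
        self._next_id += 1
        self.dentries[(parent_id, name)] = inode
        self.attrs[inode] = {"is_dir": is_dir, **attrs}
        return inode

    def stat(self, path):
        """Resolve the path first, then do a single attribute lookup."""
        return self.attrs[self.resolve(path)]


ns = Namespace()
ns.create("/data", is_dir=True)
ns.create("/data/train.bin", size=4096)
print(ns.stat("/data/train.bin"))
```

Because path resolution reads only the small directory-entry map, the two kinds of metadata could in principle be partitioned and scaled independently, which is the property the abstract emphasizes.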
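Research on data semantic governance remains insufficient; in particular, there is a lack of in-depth work that starts from underlying foundational theory, explores semantics systematically, and thereby achieves standardized registration and management of data semantics. This paper therefore starts from foundational theory to reveal the essence of semantic organization and representation, and proposes a concept system model of the conceptual world. Then, in combination with the ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) 11179 series of standards, it constructs two MDR (Metadata Registry)-based semantic registration metamodels, one for property-characteristic concepts and one for relationship-characteristic concepts, enabling the registration and management of semantically rich knowledge. Finally, against the background of data governance in the oil and gas exploration and evaluation domain, a metadata registration and governance system is designed and developed. From both the theoretical and the application perspectives, the rationality and correctness of the two semantic models proposed on the basis of the MDR standard are verified, demonstrating their effectiveness in practical applications.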
Remote sensing data acquisition is one of the most essential processes in the field of Earth observation. However, traditional data acquisition methods do not satisfy the requirements of current applications, which demand large-scale data processing. To address this issue, this paper proposes a data acquisition framework that carries out remote sensing metadata planning and then realizes the online acquisition of large amounts of data. First, the paper establishes a unified metadata cataloging model and catalogs the metadata in a local database. Second, a coverage calculation model is presented, which shows users the data coverage in a selected geographical region under the data requirements of a specific application. Finally, based on the data retrieval results and the coverage calculation, a machine-to-machine interface is provided to acquire the target remote sensing data. Experiments were conducted to verify the availability and practicality of the proposed framework, and the results show that it overcomes deficiencies in traditional methods. It also achieved the online automatic acquisition of large-scale heterogeneous remote sensing data, which can provide guidance for remote sensing data acquisition strategies.
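To make the notion of a coverage calculation concrete, here is a minimal sketch that estimates what fraction of a region is covered by a set of scene footprints. It is not the paper's coverage model: the function name, the use of axis-aligned bounding boxes `(min_x, min_y, max_x, max_y)`, and the grid-sampling approximation are all assumptions chosen for brevity.

```python
def coverage_ratio(region, footprints, n=100):
    """Approximate the covered fraction of `region` by sampling an n x n grid.

    `region` and each footprint are (min_x, min_y, max_x, max_y) boxes.
    Overlapping footprints are counted once, because each grid cell is
    tested for membership in any footprint.
    """
    min_x, min_y, max_x, max_y = region
    dx = (max_x - min_x) / n
    dy = (max_y - min_y) / n
    covered = 0
    for i in range(n):
        x = min_x + (i + 0.5) * dx  # cell-centre x coordinate
        for j in range(n):
            y = min_y + (j + 0.5) * dy  # cell-centre y coordinate
            if any(fx0 <= x <= fx1 and fy0 <= y <= fy1
                   for fx0, fy0, fx1, fy1 in footprints):
                covered += 1
    return covered / (n * n)


# Two overlapping scenes that together cover the western half of the region.
ratio = coverage_ratio((0, 0, 10, 10), [(0, 0, 5, 10), (3, 0, 5, 10)], n=10)
print(ratio)
```

A production system would more likely use true footprint polygons and a geometry library; the grid approach is used here only because it needs no dependencies.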
Rapidly changing requirements and business rules push software developers to make their applications more dynamic, configurable, and adaptable. An effective way to meet such requirements is to apply an adaptive object-model (AOM). The AOM architectural style is composed of a metamodel, a model engine, and tools. First, two small patterns for building the metamodel are analyzed in detail. Then the model engine for interpreting the metamodel and the tools that let end users define and configure object models are discussed. Finally, a novel platform, applicationware, is proposed.
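The abstract does not name its two metamodel patterns. As an assumption, the sketch below uses the TypeObject and Property patterns that are commonly described in the AOM literature; all class and method names are hypothetical. Entity types and their properties are defined as data at runtime rather than as compiled classes.

```python
class PropertyType:
    """Describes one property an entity type may carry (Property pattern)."""

    def __init__(self, name, value_type):
        self.name = name
        self.value_type = value_type


class EntityType:
    """Runtime description of a kind of entity (TypeObject pattern)."""

    def __init__(self, name, property_types):
        self.name = name
        self.property_types = {p.name: p for p in property_types}

    def new(self, **values):
        """Create an entity of this type, validating the initial values."""
        return Entity(self, values)


class Entity:
    """An instance whose structure is dictated by its EntityType."""

    def __init__(self, entity_type, values):
        self.type = entity_type
        self.values = {}
        for name, value in values.items():
            self.set(name, value)

    def set(self, name, value):
        ptype = self.type.property_types.get(name)
        if ptype is None:
            raise AttributeError(f"{self.type.name} has no property {name!r}")
        if not isinstance(value, ptype.value_type):
            raise TypeError(f"{name} expects {ptype.value_type.__name__}")
        self.values[name] = value

    def get(self, name):
        return self.values[name]


# The "model" is plain data, so it could be loaded from configuration.
customer_type = EntityType("Customer", [PropertyType("name", str),
                                        PropertyType("credit_limit", int)])
customer = customer_type.new(name="Ada", credit_limit=500)
print(customer.get("credit_limit"))
```

In this arrangement the small `Entity.set` method plays the role of a very simple model engine: it interprets the metamodel each time a value is assigned.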
The reliability and high performance of the metadata service are crucial to a storage architecture. A novel design of a two-level metadata server file system (TTMFS) is presented, which offers high reliability and performance. The design combines the merits of both centralized and distributed management. In this file system, the advanced-metadata server is responsible for managing directory metadata and the whole namespace, while the double-metadata server is responsible for maintaining file metadata. The paper uses the Markov return model to analyze the reliability of the two-level metadata server. The experimental data indicate that the design can provide high throughput.
This paper presents a cross-media semantic mining model (CSMM) based on object semantics. The model obtains object-level semantic information according to the maximum probability principle. Semantic templates are then trained and constructed with STTS (Semantic Template Training System) and serve as the bridge from various low-level media features to object semantics. Furthermore, a double-layer metadata structure is put forward to store and manage the mined low-level features and high-level semantics efficiently. The model has broad applications in domains such as intelligent retrieval engines, medical diagnosis, and multimedia design.
Funding (remote sensing data acquisition paper): Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences [grant number XDA19020201].
Funding (adaptive object-model paper): Supported by the Foundation of National 863 Program of China (No. 2001AA112030).
Funding (TTMFS paper): Supported by the Industrialized Foundation of Hebei Province (F020501).
Funding (CSMM paper): Supported by the National Basic Research Program of China (973 Program) (2007CB310801), the Specialized Research Fund for the Doctoral Program of Higher Education of China (20070486064), the Natural Science Foundation of Hubei Province (2007ABA038), and the Programme of Introducing Talents of Discipline to Universities (B07037).