The analysis of ancient genomics provides opportunities to explore human population history across both temporal and geographic dimensions (Haak et al., 2015; Wang et al., 2021, 2024). To enhance the accessibility and utility of these ancient genomic datasets, a range of databases and advanced statistical models have been developed, including the Allen Ancient DNA Resource (AADR) (Mallick et al., 2024) and AdmixTools (Patterson et al., 2012). While upstream processes such as sequencing and raw data processing have been streamlined by resources like the AADR, the downstream analysis of these datasets, encompassing population genetics inference and spatiotemporal interpretation, remains a significant challenge. The AADR provides a unified collection of published ancient DNA (aDNA) data, yet its file-based format and reliance on command-line tools, such as those in AdmixTools (Patterson et al., 2012), require advanced computational expertise for effective exploration and analysis. These requirements can be a significant barrier for researchers without a strong computational background, limiting the accessibility and broader application of these valuable genomic resources.
Funding: National Key Research and Development Program of China (2023YFC3303701-02 and 2024YFC3306701); National Natural Science Foundation of China (T2425014 and 32270667); Natural Science Foundation of Fujian Province of China (2023J06013); Major Project of the National Social Science Foundation of China granted to Chuan-Chao Wang (21&ZD285); Open Research Fund of the State Key Laboratory of Genetic Engineering at Fudan University (SKLGE-2310); Open Research Fund of the Forensic Genetics Key Laboratory of the Ministry of Public Security (2023FGKFKT07).
To solve the query processing correctness problem for semantic-based relational data integration, the semantics of SPARQL (Simple Protocol and RDF Query Language) queries is defined. In the course of query rewriting, all relevant tables are found and decomposed into minimal connectable units. The minimal connectable units are then joined according to the semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transforming are presented, and their computational complexity is discussed: in the worst case, the query decomposition algorithm finishes in O(n²) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments; the results show that when the query length is less than 8, the query processing algorithms provide satisfactory performance.
Funding: Weaponry Equipment Pre-Research Foundation of the PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of the PLA University of Science and Technology (No. 2009JSJ11).
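As a rough illustration of the rewriting step described above, the sketch below decomposes a set of relational tables into minimal connectable units (the columns each table contributes to a query plus its joinable keys) and chains them into a join plan. The table names, key-naming heuristic, and join logic are invented for illustration; this is not the paper's algorithm.

```python
# Hedged sketch: decompose relevant tables into minimal connectable units
# (the subset of columns each table contributes to the query, plus key
# columns that keep the unit joinable) and join the units on shared columns.
# Table names, columns, and the "_id" key heuristic are invented.

TABLES = {
    "student":    {"sid", "name", "dept_id"},
    "department": {"dept_id", "dept_name"},
    "enrollment": {"sid", "course_id", "grade"},
}

def minimal_connectable_units(query_attrs):
    """Return, per relevant table, the queried columns plus its key columns."""
    units = {}
    for table, cols in TABLES.items():
        wanted = cols & query_attrs
        if wanted:
            keys = {c for c in cols if c.endswith("_id") or c == "sid"}  # naive key heuristic
            units[table] = wanted | keys
    return units

def build_plan(query_attrs):
    """Join the minimal connectable units on shared columns (naive sketch)."""
    units = minimal_connectable_units(query_attrs)
    tables = list(units)
    plan, joined = [tables[0]], {tables[0]}
    for table in tables[1:]:
        shared = units[table] & set().union(*(TABLES[t] for t in joined))
        plan.append(f"JOIN {table} ON {', '.join(sorted(shared))}")
        joined.add(table)
    return " ".join(plan)

print(build_plan({"name", "dept_name"}))
# -> "student JOIN department ON dept_id"
```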
In network-supported collaborative design, data processing plays a vital role. Much effort has been spent in this area and many approaches have been proposed. Building on that work, this paper presents an extensible markup language (XML) based strategy for several important data-processing problems in network-supported collaborative design, such as representing the standard for the exchange of product model data (STEP) with XML for product information expression and managing XML documents in a relational database. The paper gives a detailed exposition of how to define the mapping between the XML structure and the relational database structure and how XML-QL queries can be translated into structured query language (SQL) queries. Finally, the structure of an XML-based data processing system is presented.
Funding: National High Technology Research and Development Program of China (863 Program) (No. AA420060).
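To make the XML-to-SQL translation step concrete, here is a minimal sketch that assumes a hypothetical mapping from XML element paths to relational tables and columns and rewrites a simple path query with one filter into SQL; it does not reproduce the paper's XML-QL translation rules.

```python
# Hedged sketch: once the XML structure is mapped to relational tables and
# columns, a simple path-style query can be rewritten as SQL.
# The mapping table, join convention, and query form are illustrative
# assumptions, not the paper's translation algorithm.

# Hypothetical mapping from XML element paths to (table, column).
XML_TO_SQL = {
    "/product/id":      ("product", "id"),
    "/product/name":    ("product", "name"),
    "/product/part/no": ("part", "part_no"),
}

def translate(select_path, where_path=None, value=None):
    """Translate a single-path query with an optional equality filter into SQL."""
    table, column = XML_TO_SQL[select_path]
    sql = f"SELECT {column} FROM {table}"
    if where_path is not None:
        w_table, w_column = XML_TO_SQL[where_path]
        if w_table != table:  # assumed foreign-key convention for the join
            sql += f" JOIN {w_table} ON {table}.id = {w_table}.{table}_id"
        sql += f" WHERE {w_table}.{w_column} = '{value}'"
    return sql

print(translate("/product/name", "/product/part/no", "P-42"))
# -> SELECT name FROM product JOIN part ON product.id = part.product_id
#    WHERE part.part_no = 'P-42'
```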
Schema incompatibility is a major challenge for a federated database system that shares data among heterogeneous, multiple and autonomous databases. This paper presents a mapping approach based on import schemas, export schemas and domain conversion functions, through which schema incompatibility problems such as naming conflicts, domain incompatibility and entity definition incompatibility can be resolved effectively. Implementation techniques are also discussed.
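The role of a domain conversion function can be sketched as follows, with invented attribute names, units and conversion rules: each source's export-schema attribute is renamed and its values are converted into the domain expected by the federated (import) schema.

```python
# Hedged sketch of the mapping idea: each component database publishes an
# export schema, the federation defines an import schema, and domain
# conversion functions reconcile naming and domain incompatibilities.
# All attribute names, units and rates below are invented for illustration.

# Naming conflict: the federated attribute "salary_usd" is called "wage" in
# one source (stored in EUR) and "pay" in another (stored in USD cents).
IMPORT_SCHEMA = {
    "db_europe": {"attribute": "wage", "convert": lambda v: round(v * 1.08, 2)},
    "db_us":     {"attribute": "pay",  "convert": lambda v: v / 100},
}

def to_federated(source, record):
    """Map one source record onto the federated attribute 'salary_usd'."""
    rule = IMPORT_SCHEMA[source]
    return {"salary_usd": rule["convert"](record[rule["attribute"]])}

print(to_federated("db_europe", {"wage": 50000}))   # {'salary_usd': 54000.0}
print(to_federated("db_us",     {"pay": 6500000}))  # {'salary_usd': 65000.0}
```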
We present a database of maser sources in the H2O, OH and SiO lines that can be used to identify and study variable stars at evolved stages. Detecting maser emission in the H2O, OH and SiO molecules toward infrared-excess objects is one method for identifying long-period variables (LPVs, including Miras and semiregulars), because these stars exhibit maser activity in their circumstellar shells. Our sample contains 1803 known LPV objects. Forty-six percent of these stars (832 objects) show maser emission in the line of at least one molecule: H2O, OH or SiO. We use the database of circumstellar masers to search for LPVs that are not included in the General Catalogue of Variable Stars (GCVS). Our database contains 4806 objects (3866 without associations in the GCVS) with a maser detection in at least one molecule. It is therefore possible to use the database to locate and study a large sample of LPV stars. The database can be accessed at http://maserdb.net.
Funding: Russian Foundation for Basic Research, research project 18-32-00605; Russian Science Foundation grant 18-12-00193; Act 211 of the Government of the Russian Federation, agreement No. 02.A03.21.0006.
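The headline statistic above (the fraction of LPVs showing maser emission in at least one molecule) is a simple cross-match, sketched below on an invented toy sample; the real catalogue is the one served at http://maserdb.net.

```python
# Hedged sketch of the cross-match statistic: the fraction of known LPVs with
# a maser detection in at least one of H2O, OH or SiO.
# The tiny data set below is invented for illustration only.

lpv_stars = ["U Her", "R Cas", "S CrB", "W Hya", "X Oph"]

maser_detections = {
    "U Her": {"H2O", "OH"},
    "R Cas": {"SiO"},
    "W Hya": {"H2O", "OH", "SiO"},
}

with_maser = [s for s in lpv_stars if maser_detections.get(s)]
fraction = len(with_maser) / len(lpv_stars)
print(f"{len(with_maser)}/{len(lpv_stars)} LPVs with masers ({fraction:.0%})")
# -> 3/5 LPVs with masers (60%)
```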
The performance and reliability of converting natural language into structured query language can be problematic when handling the nuances that are prevalent in natural language. Relational databases are not designed to understand language nuance, so the question of why we must handle nuance has to be asked. This paper looks at an alternative solution for converting a natural language query into a Structured Query Language (SQL) statement capable of being used to search a relational database. The process uses the natural language concept of part of speech to identify words that can identify database tables and table columns. OpenNLP-based grammar files, as well as additional configuration files, assist in the translation from natural language to query language. Having identified which tables and which columns contain the pertinent data, the next step is to create the SQL statement.
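A minimal sketch of the part-of-speech-driven idea is shown below; it uses a toy tag dictionary and an invented schema in place of the OpenNLP grammar and configuration files, and it only handles one simple question shape.

```python
# Hedged sketch of the part-of-speech-driven idea: nouns in the question are
# matched against table and column names to assemble a SELECT statement.
# A toy tag dictionary stands in for the OpenNLP grammar files; the schema
# and the question are invented for illustration.

SCHEMA = {"customer": ["name", "city", "balance"]}

POS = {  # toy part-of-speech tags
    "show": "VB", "the": "DT", "balance": "NN",
    "of": "IN", "customers": "NNS", "in": "IN", "london": "NNP",
}

def to_sql(question):
    tokens = question.lower().rstrip("?").split()
    nouns = [t for t in tokens if POS.get(t, "").startswith("NN")]
    table = next(t.rstrip("s") for t in nouns if t.rstrip("s") in SCHEMA)
    columns = [t for t in nouns if t in SCHEMA[table]]
    values = [t for t in nouns if t not in SCHEMA[table] and t.rstrip("s") != table]
    sql = f"SELECT {', '.join(columns) or '*'} FROM {table}"
    if values:
        sql += f" WHERE city = '{values[0].title()}'"  # naive: assumes a city filter
    return sql

print(to_sql("Show the balance of customers in London?"))
# -> SELECT balance FROM customer WHERE city = 'London'
```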
Our paper describes the organization of the database, remarks on the SNGO (Surlari National Geomagnetic Observatory), and the network infrastructure. Based on the geomagnetic data acquired and stored on the database server, we perform processing and analysis of geomagnetic parameters through different spectral, statistical and correlation methods. All these parameters are included in the geomagnetic database on the server. The web interface for the database meets the different needs of handling the collected data, raw or processed. The server-side programming language used for the design is PHP, which allows us to select different periods of stored data, apply different search filters, and compare different parameters or data from different time periods. For a more in-depth analysis of the stored data, graphs for different parameters can be drawn with JavaScript. Access to the web interface can be with or without authentication, depending on the need to secure certain collected, stored and processed data. The applications are scalable for the different devices that will access them: mobile phones, tablets, laptops or desktops.
Funding: Romanian Ministry of Education and Research, projects "The realization of 3D geological/geophysical models for the characterization of some areas of economic and scientific interest in Romania" (Contract No. 49N/2019) and "Institutional capacities and services for research, monitoring and forecasting of risks in extra-atmospheric space" (acronym SAFESPACE, Contract No. 16PCCDI/2018, within PNCDI III).
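The period-selection behaviour of the web interface can be sketched as a parameterised query; the sketch below uses Python and SQLite with invented table and column names, whereas the actual interface described above is built with PHP and JavaScript.

```python
# Hedged sketch of the period-selection filter the web interface applies to
# the stored geomagnetic parameters. Table and column names are invented
# placeholders, not the observatory's actual schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geomag (ts TEXT, component TEXT, value REAL)")
conn.executemany(
    "INSERT INTO geomag VALUES (?, ?, ?)",
    [("2023-01-01", "X", 24100.5), ("2023-01-02", "X", 24102.1),
     ("2023-02-01", "Z", 43210.7)],
)

def select_period(component, start, end):
    """Return readings of one component between two dates (inclusive)."""
    cur = conn.execute(
        "SELECT ts, value FROM geomag WHERE component = ? AND ts BETWEEN ? AND ?",
        (component, start, end),
    )
    return cur.fetchall()

print(select_period("X", "2023-01-01", "2023-01-31"))
# -> [('2023-01-01', 24100.5), ('2023-01-02', 24102.1)]
```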
Since web-based GIS processes large volumes of spatial geographic information over the Internet, the efficiency of spatial data query processing and transmission should be improved. This paper presents two efficient methods for this purpose: a division transmission method and a progressive transmission method. In the division transmission method, a map is divided into several parts, called "tiles", and only the tiles requested by a client are transmitted. In the progressive transmission method, a map is split into several phase views based on the significance of its vertices, and the server produces a target object and transmits it progressively when that spatial object is requested by a client. To realize these methods, the paper proposes the "tile division" and "priority order estimation" algorithms together with the corresponding data transmission strategies. Compared with traditional methods such as total map transmission and layer transmission, the proposed web-based GIS data transmission increases transmission efficiency by a large margin.
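The tile division idea can be sketched as follows: cut the map extent into a fixed grid and return only the tiles that intersect the client's requested window. The extents, tile size and request window are invented, and the priority-order estimation used for progressive transmission is not shown.

```python
# Hedged sketch of tile division: the map extent is cut into a grid of
# fixed-size tiles and only the tiles intersecting the client's requested
# window are selected for transmission. Extents and tile size are invented.

def make_tiles(xmin, ymin, xmax, ymax, size):
    """Divide a map extent into square tiles of the given size."""
    tiles = []
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            tiles.append((x, y, min(x + size, xmax), min(y + size, ymax)))
            x += size
        y += size
    return tiles

def tiles_for_request(tiles, req):
    """Keep only tiles whose bounding box intersects the requested window."""
    rx1, ry1, rx2, ry2 = req
    return [t for t in tiles
            if not (t[2] <= rx1 or t[0] >= rx2 or t[3] <= ry1 or t[1] >= ry2)]

tiles = make_tiles(0, 0, 100, 100, 25)                    # 4 x 4 grid of 25-unit tiles
print(len(tiles_for_request(tiles, (10, 10, 40, 40))))    # -> 4 tiles needed
```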
In order to rapidly and effectively meet the information demands of command decision-making, it is important to build, maintain and mine the intelligence database. The type, structure and maintenance of a military intelligence database are discussed. On this basis, a new data-mining algorithm over the relational intelligence database is presented, driven by the preference information and the time limit given by the commander. A simple computational example demonstrates that the algorithm is practical to operate. Lastly, the problem of how to process the intelligence data mined from the database is discussed.
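The abstract does not spell out the mining algorithm, but its two distinctive inputs, the commander's preference information and the time limit, can be sketched generically as a preference-weighted ranking that stops at a deadline; the scoring rule and the records below are invented.

```python
# Hedged sketch of the two inputs the abstract emphasises: the commander's
# preference weights over attributes and a time limit on the mining run.
# The scoring rule and the records are invented; the paper's actual
# algorithm is not reproduced here.
import time

records = [
    {"id": 1, "timeliness": 0.9, "reliability": 0.6, "relevance": 0.8},
    {"id": 2, "timeliness": 0.4, "reliability": 0.9, "relevance": 0.7},
    {"id": 3, "timeliness": 0.7, "reliability": 0.7, "relevance": 0.9},
]

preferences = {"timeliness": 0.5, "reliability": 0.2, "relevance": 0.3}

def mine(records, preferences, time_limit_s=1.0):
    """Score records by a preference-weighted sum, stopping at the time limit."""
    deadline = time.monotonic() + time_limit_s
    scored = []
    for rec in records:
        if time.monotonic() > deadline:
            break                                  # respect the commander's time limit
        score = round(sum(rec[a] * w for a, w in preferences.items()), 2)
        scored.append((score, rec["id"]))
    return sorted(scored, reverse=True)

print(mine(records, preferences))
# e.g. [(0.81, 1), (0.76, 3), (0.59, 2)]
```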
Sensor networks provide a means to link people with the real world by processing data collected from the real world in real time and routing the query results to the right people. Application examples include continuous monitoring of the environment, building infrastructure and human health. Many researchers view sensor networks as databases, with the monitoring tasks performed as subscriptions, queries, and alerts. However, this view is not precise. First, databases can only deal with well-formed data types with a well-defined schema for their interpretation, while the raw data collected by sensor networks in most cases do not fit this requirement. Second, sensor networks have to deal with very dynamic targets, environments and resources, while databases are more static. In order to fill this gap between sensor networks and databases, we propose a novel approach, referred to as 'spatiotemporal data stream segmentation', or 'stream segmentation' for short, to address the dynamic nature of sensor networks and deal with their 'raw' data. Stream segmentation is defined using Bayesian networks in the context of sensor networks, and two application examples demonstrate the usefulness of the approach.
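As a stand-in for the Bayesian-network definition of stream segmentation, the sketch below cuts a raw sensor stream into segments whenever a reading departs strongly from the running mean of the current segment; the data and threshold are invented, and the paper's actual model is not reproduced.

```python
# Hedged sketch of the segmentation idea: a raw sensor stream is cut into
# segments whenever a new reading is unlikely given the current segment.
# The paper defines segmentation with Bayesian networks; this stand-in uses
# a simple running-mean test with invented data and threshold.
from statistics import mean

def segment(stream, threshold=5.0):
    """Split the stream where a reading departs from the current segment mean."""
    segments, current = [], [stream[0]]
    for x in stream[1:]:
        if abs(x - mean(current)) > threshold:
            segments.append(current)               # close the current segment
            current = [x]                          # start a new one
        else:
            current.append(x)
    segments.append(current)
    return segments

readings = [20.1, 20.4, 19.8, 20.0, 31.2, 31.0, 30.7, 20.2, 20.3]
print(segment(readings))
# -> [[20.1, 20.4, 19.8, 20.0], [31.2, 31.0, 30.7], [20.2, 20.3]]
```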
Cleaning duplicate data is a major problem that persists even though much work has been done to solve it, owing to the exponential growth in the amount of data processed and the need for scalable and fast algorithms. The problem depends on the type and quality of the data and differs with the volume of the data set manipulated. In this paper we introduce a novel framework based on an extended fuzzy C-means algorithm that uses a topic ontology. The work aims to improve OLAP querying over heterogeneous data warehouses containing big data sets by improving the integration of query results, eliminating redundancies with the extended classification algorithm, and measuring the loss of information.
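The fuzzy C-means core that the framework extends can be sketched as follows, applied to invented similarity features of candidate duplicate record pairs; the topic-ontology extension and the OLAP integration described above are not reproduced.

```python
# Hedged sketch of the standard fuzzy C-means core: records (here, candidate
# duplicate pairs described by similarity features) receive graded memberships
# in clusters. The paper's ontology-based extension is not reproduced; the
# data and parameters below are invented.
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=50, seed=0):
    """Standard FCM: alternate membership and centre updates."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))      # fuzzy memberships, rows sum to 1
    for _ in range(iters):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
    return U, centres

# Each row: similarity features of a record pair (e.g. name and address similarity).
X = np.array([[0.95, 0.9], [0.92, 0.88], [0.1, 0.2], [0.15, 0.1]])
U, centres = fuzzy_c_means(X)
print(U.round(2))   # the first two pairs get high membership in one ("duplicate") cluster
```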
The database system is the infrastructure of the modern information system, and R&D on database systems and their technologies is one of the important research topics in the field. Database R&D in China took off late but has been advancing by giant steps. This report presents the achievements Renmin University of China (RUC) has made over the past 25 years and also addresses some of the research projects we at RUC are currently working on. The National Natural Science Foundation of China supports and initiates most of our research projects, and these successfully conducted projects have produced fruitful results.
Funding: Supported by the National Natural Science Foundation of China (NSFC). Acknowledgements: thanks to NSFC and all the members of the research groups at Renmin University of China.