Background: Medical informatics has accumulated vast amounts of data for clinical diagnosis and treatment. However, limited access to follow-up data and the difficulty of integrating data across diverse platforms continue to pose significant barriers to clinical research. In response, our research team has developed a specialized clinical research database for cardiology, establishing a comprehensive digital platform that supports both clinical decision-making and research. Methods: The database incorporates actual clinical data from patients treated at the Cardiovascular Medicine Department of Chinese PLA General Hospital from 2012 to 2021. It includes comprehensive data on patients' basic information, medical history, non-invasive imaging studies, and laboratory test results, as well as peri-procedural information related to interventional surgeries, extracted from the Hospital Information System. Additionally, an innovative artificial intelligence (AI)-powered interactive follow-up system has been developed, ensuring that nearly all myocardial infarction patients receive at least one post-discharge follow-up and thereby achieving comprehensive data management across the entire care continuum for high-risk patients. Results: The database integrates extensive cross-sectional and longitudinal patient data, with a focus on higher-risk acute coronary syndrome patients. It integrates structured and unstructured clinical data while innovatively incorporating AI and automatic speech recognition technologies to enhance data integration and workflow efficiency. It creates a comprehensive patient view, improving diagnostic and follow-up quality, and provides high-quality data to support clinical research. Despite limitations in unstructured data standardization and biological sample integrity, the database is undergoing ongoing optimization. Conclusion: The cardiovascular specialty clinical database is a comprehensive digital archive integrating clinical treatment and research, which facilitates the digital and intelligent transformation of clinical diagnosis and treatment processes. It supports clinical decision-making and offers data support and potential research directions for the specialized management of cardiovascular diseases. Funding: Noncommunicable Chronic Diseases-National Science and Technology Major Project (2023ZD0503906).
Objective: To establish an interactive management model for community-oriented high-risk osteoporosis in conjunction with a rural community health service center. Materials and Methods: To support multidimensional analysis of data, the system we developed combines basic principles of data warehouse technology oriented to the needs of community health services. This paper introduces the steps we took in constructing the data warehouse; the case presented here is that of a district community health management information system in Changshu, Jiangsu Province, China. For our data warehouse, we chose the MySQL 4.5 relational database, the Browser/Server (B/S) model, and the PHP hypertext preprocessor as the development tools. Results: The system supported online analytical processing and preparation for the next stage of work, and provided a platform for data management, data query, online analysis, and related tasks in community health service centers, specialist osteoporosis outpatient clinics, and health administration sectors. Conclusion: Users of the remote management system and data warehouse can include community health service centers, osteoporosis departments of hospitals, and health administration departments. The system provides a reference for the policymaking of health administrators, residents' health information and intervention suggestions for general practitioners in community health service centers, and patients' follow-up information for osteoporosis specialists in general hospitals.
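As a rough illustration of the dimensional design such a community health data warehouse might use, the sketch below builds a tiny star schema and runs one OLAP-style roll-up. The table and column names are invented for illustration, and sqlite3 stands in for the MySQL database used in the original system.

```python
import sqlite3

# Illustrative star schema: one resident dimension, one screening fact table.
DDL = """
CREATE TABLE dim_resident (
    resident_id INTEGER PRIMARY KEY,
    sex         TEXT,
    birth_year  INTEGER,
    community   TEXT
);
CREATE TABLE fact_screening (
    resident_id INTEGER REFERENCES dim_resident(resident_id),
    exam_year   INTEGER,
    t_score     REAL,     -- bone mineral density T-score
    high_risk   INTEGER   -- 1 if flagged for specialist follow-up
);
INSERT INTO dim_resident VALUES (1, 'F', 1948, 'Changshu-East');
INSERT INTO dim_resident VALUES (2, 'M', 1955, 'Changshu-West');
INSERT INTO fact_screening VALUES (1, 2023, -2.8, 1);
INSERT INTO fact_screening VALUES (2, 2023, -1.1, 0);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# A typical OLAP-style roll-up: high-risk counts per community per year.
query = """
SELECT r.community, f.exam_year, COUNT(*) AS high_risk_cases
FROM fact_screening f
JOIN dim_resident r ON f.resident_id = r.resident_id
WHERE f.high_risk = 1
GROUP BY r.community, f.exam_year;
"""
for row in conn.execute(query):
    print(row)   # ('Changshu-East', 2023, 1)
```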
Expenditure on wells constitutes a significant part of the operational costs of a petroleum enterprise, and most of that cost results from drilling. This has prompted drilling departments to continuously look for ways to reduce their drilling costs and be as efficient as possible. A system called the Drilling Comprehensive Information Management and Application System (DCIMAS) is developed and presented here, with the aim of collecting, storing, and making full use of the valuable well data and information relating to all drilling activities and operations. The DCIMAS comprises three main parts: a data collection and transmission system, a data warehouse (DW) management system, and an integrated platform of core applications. With the support of the application platform, the DW management system captures operation data at well sites and transmits them electronically to a data warehouse via transmission equipment and ETL (extract, transform, and load) tools. With the high quality of the data guaranteed, our central task is to make the best use of the operation data and information for drilling analysis and to provide further information to guide later production stages. Applications have been developed and integrated on a uniform platform to interface directly with different layers of the multi-tier DW. Now, engineers in every department spend less time on data handling and more time on applying technology in their real work with the system.
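A schematic example of the kind of extract-transform-load pass such a pipeline performs on well-site operation records is shown below. The field names and CSV layout are assumptions for illustration only, not the DCIMAS record format.

```python
import csv, io, sqlite3

# Stand-in for raw records transmitted from a well site.
raw = io.StringIO(
    "well_id,depth_m,rop_m_per_h,recorded_at\n"
    "W-101,1520.5,12.3,2023-04-01T08:00\n"
    "W-101,,9.8,2023-04-01T09:00\n"       # missing depth: rejected in transform
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_drilling (well_id TEXT, depth_m REAL, rop REAL, recorded_at TEXT)")

def transform(row):
    """Validate and coerce one raw record; return None to reject it."""
    if not row["depth_m"]:
        return None
    return (row["well_id"], float(row["depth_m"]), float(row["rop_m_per_h"]), row["recorded_at"])

loaded = 0
for rec in csv.DictReader(raw):          # extract
    clean = transform(rec)               # transform
    if clean is not None:
        conn.execute("INSERT INTO dw_drilling VALUES (?,?,?,?)", clean)  # load
        loaded += 1
print(f"loaded {loaded} record(s) into the warehouse")
```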
A uniform metadata representation is introduced for heterogeneous databases, multimedia information, and other information sources. Some features of metadata are analyzed, and the limitations of existing metadata models are compared with the new one. The metadata model is described in XML, which is well suited to metadata denotation and exchange. Well-structured data, semi-structured data, and exterior file data without structure are all described in the metadata model. The model provides feasibility and extensibility for constructing a uniform metadata model of a data warehouse.
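A toy version of such a uniform XML metadata record, covering one structured source (a relational table) and one unstructured source (an exterior file), might look like the following. The element names are hypothetical, not the paper's actual schema.

```python
import xml.etree.ElementTree as ET

root = ET.Element("metadata")
for src in [
    {"type": "relational", "name": "patients", "location": "db://hospital/patients"},
    {"type": "file", "name": "notes.txt", "location": "file:///archive/notes.txt"},
]:
    node = ET.SubElement(root, "source", type=src["type"])
    ET.SubElement(node, "name").text = src["name"]
    ET.SubElement(node, "location").text = src["location"]

# The record serializes uniformly regardless of how structured the source is,
# which is what makes XML convenient for metadata denotation and exchange.
print(ET.tostring(root, encoding="unicode"))
```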
Data warehouse (DW) technology, invented in the 1990s, is more useful for integrating and analyzing massive data than the traditional database. Its application in the geology field can be divided into three phases: from 1992 to 1996, commercial data warehouses (CDW) appeared; from 1996 to 1999, geological data warehouses (GDW) appeared, and geologists and geographers realized the importance of DW and began to study it, although practical DWs still followed the framework of the database; from 2000 to the present, the geological data warehouse has grown and the theory of the geo-spatial data warehouse (GSDW) has been developed, but research in the geological area remains deficient except in geography. Although some developments of GDW have been made, its core still follows the CDW practice of organizing data by time, which brings about three problems: the geological data are difficult to integrate, because they feature space more than time; massive data are hard to store at different levels for the same reason; and spatial analysis is hardly supported if the data are organized by time as in a CDW. The GDW should therefore be redesigned: organize data by scale, in order to store massive data at different levels and synthesize the data at different granularities, and choose spatial control points to replace the former temporal control points, so that different types of data can be integrated by storing each type as one layer and then superposing the layers. In addition, the data cube, a widely used technology in CDW, is of no use in GDW, because causality among geological data is not as obvious as in commercial data: the data are the mixed result of many complex rules, and their analysis always needs special geological methods and software. On the other hand, a data cube for massive and complex geo-data would devour too much storage space to be practical. On this point, the main purpose of GDW may be data integration, unlike the CDW's focus on data analysis.
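A minimal sketch of the scale-based, layer-superposition organization the paper argues for is given below. It is one interpretation of the idea, with invented layer names, not the paper's implementation.

```python
from collections import defaultdict

class GeoWarehouse:
    """Data organized by scale rather than time: each data type is one
    layer, and integration is superposition of layers at a given scale."""

    def __init__(self):
        # scale denominator (e.g. 50000 for 1:50,000) -> layer name -> features
        self.layers = defaultdict(dict)

    def add_layer(self, scale, name, features):
        self.layers[scale][name] = features

    def superpose(self, scale, names):
        """Integrate several layers at one scale by stacking them."""
        return {n: self.layers[scale][n] for n in names if n in self.layers[scale]}

gdw = GeoWarehouse()
gdw.add_layer(50000, "lithology", ["granite", "basalt"])
gdw.add_layer(50000, "faults", ["F1", "F2"])
print(gdw.superpose(50000, ["lithology", "faults"]))
```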
This paper describes the process of designing and constructing a data warehouse ("DW") for an online learning platform using three prominent technologies: Microsoft SQL Server, MongoDB, and Apache Hive. The three systems are evaluated for corpus construction and descriptive analytics. The case also demonstrates the value of evidence-centered design principles for data warehouse design that is sustainable enough to adapt to the demands of handling big data in a variety of contexts. Additionally, the paper addresses the maintainability-performance tradeoff, storage considerations, and accessibility of big data corpora. In this NSF-sponsored work, the data were processed, transformed, and stored in three versions of a data warehouse in search of a better-performing and more suitable platform. The data warehouse engines (a relational database, a NoSQL database, and a big data technology for parallel computation) were subjected to principled analysis. Design, construction, and evaluation of a data warehouse were scrutinized to find improved ways of storing, organizing, and extracting information. The work also examines building corpora, performing ad-hoc extractions, and ensuring confidentiality. Apache Hive demonstrated the best processing time, followed by SQL Server and MongoDB; for analytical queries, SQL Server was the top performer, followed by MongoDB and Hive. The paper also discusses a novel process for rendering students anonymous in compliance with Family Educational Rights and Privacy Act (FERPA) regulations. Five phases for DW design are recommended: 1) establishing goals at the outset based on evidence-centered design principles; 2) recognizing the unique demands of student data and use; 3) adopting a model that integrates cost with technical considerations; 4) designing a comparative database; and 5) planning for a DW design that is sustainable. Recommendations for future research include attempting DW design in contexts involving larger data sets and more refined operations, and ensuring attention is paid to the sustainability of operations.
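One common way to render student records anonymous while keeping them joinable across tables is keyed-hash pseudonymization, sketched below. This is a generic technique, not necessarily the paper's exact FERPA procedure; the key and record fields are assumptions.

```python
import hashlib, hmac

# Assumption: the key is managed outside the warehouse and rotated per policy.
SECRET_KEY = b"rotate-me-and-store-outside-the-warehouse"

def pseudonymize(student_id: str) -> str:
    """Deterministic, non-reversible pseudonym for a student identifier,
    so the same student links across tables without exposing identity."""
    return hmac.new(SECRET_KEY, student_id.encode(), hashlib.sha256).hexdigest()[:16]

record = {"student_id": "S-2023-0417", "score": 87}
record["student_id"] = pseudonymize(record["student_id"])
print(record)   # same input always maps to the same pseudonym
```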
In this paper we propose a service-oriented architecture for spatial data integration (SOA-SDI) in the context of a large number of available spatial data sources that physically sit at different places, and we develop web-based GIS systems based on SOA-SDI, allowing client applications to pull in, analyze, and present spatial data from those available sources. The proposed architecture logically includes four layers or components: a layer of multiple data provider services, a data integration layer, a layer of backend services, and a front-end graphical user interface (GUI) for spatial data presentation. On the basis of the four-layered SOA-SDI framework, WebGIS applications can be quickly deployed, which demonstrates that SOA-SDI has the potential to reduce software development effort and shorten the development period. Funding: Research Fund of Key GIS Lab of the Education Ministry (No. 200610).
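The four-layer flow can be rendered as plain functions, as in the sketch below. All service names and the in-memory "data sources" are illustrative assumptions, not the paper's components.

```python
def provider_roads(bbox):                 # layer 1: a data provider service
    return [{"layer": "roads", "bbox": bbox, "features": 120}]

def provider_rivers(bbox):                # layer 1: another provider service
    return [{"layer": "rivers", "bbox": bbox, "features": 45}]

def integrate(bbox, providers):           # layer 2: data integration
    merged = []
    for p in providers:
        merged.extend(p(bbox))
    return merged

def backend_query(bbox):                  # layer 3: backend service
    return integrate(bbox, [provider_roads, provider_rivers])

def render(results):                      # layer 4: front-end GUI (stub)
    for r in results:
        print(f"{r['layer']}: {r['features']} features in {r['bbox']}")

render(backend_query((116.0, 39.5, 116.5, 40.0)))
```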
Recently, the use of mobile communication devices such as smartphones and cellular phones for field data collection has been increasing, owing to embedded Global Positioning System (GPS) receivers and Wi-Fi Internet access. Accurate, timely, and convenient field data collection is required for disaster management and emergency quick response. In this article, we introduce a web-based GIS system that collects field data from personal mobile phones through a Post Office Protocol (POP3) mail server. The main objective of this work is to demonstrate to students a real-time field data collection method using their mobile phones, so that field data can be collected in a timely and convenient manner, in either individual or group surveys and at local or global scales.
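The collection step could be implemented with Python's standard poplib, as sketched below. The host, credentials, and the assumed "lat,lon,note" plain-text message body are hypothetical, not the article's actual configuration.

```python
import poplib
from email import message_from_bytes

def fetch_field_reports(host="pop.example.org", user="survey", password="secret"):
    """Poll a POP3 mailbox for field reports sent from surveyors' phones
    and parse out GPS coordinates (assumes single-part text bodies)."""
    box = poplib.POP3_SSL(host)
    box.user(user)
    box.pass_(password)
    reports = []
    for i in range(len(box.list()[1])):            # one entry per message
        raw = b"\n".join(box.retr(i + 1)[1])
        msg = message_from_bytes(raw)
        body = msg.get_payload(decode=True).decode(errors="replace")
        lat, lon, note = body.strip().split(",", 2)  # assumed body format
        reports.append({"lat": float(lat), "lon": float(lon), "note": note})
    box.quit()
    return reports

# Each returned report can then be plotted on the web GIS as it arrives.
```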
Since the 1990s, spatial data warehouse technology has been developing rapidly, but the complexity of multidimensional analysis has limited its widespread application. In light of the characteristics of flood control and disaster mitigation in the Yangtze river basin, a scheme is proposed for the subjects and data distribution of a spatial data warehouse for flood control and disaster mitigation in the basin, namely a distributed scheme. The creation and development of this spatial data warehouse are presented. The necessity and urgency of establishing the spatial data warehouse are expounded in view of the present shortage of available information for flood control and disaster mitigation in the Yangtze river basin.
Engineering data are separately organized, and their schemas are increasingly complex and variable. Engineering data management systems need to be able to manage unified data and to be both customizable and extensible. The design of such systems depends heavily on the flexibility and self-description of the data model. The characteristics of engineering data and the facts of their management are analyzed. Engineering data warehouse (EDW) architecture and multi-layer metamodels are then presented, and an approach to managing and using engineering data through meta-objects is proposed. Finally, an application, a flight test EDW system (FTEDWS), is described, in which meta-objects are used to manage engineering data in the data warehouse. The results show that adopting a meta-modeling approach supports interchangeability and provides a sufficiently flexible environment in which system evolution and reusability can be handled.
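A minimal reading of the meta-object idea is sketched below: the metamodel layer describes each dataset's schema as data, and generic code manages any dataset through its meta-object rather than through hard-coded classes. The field definitions are illustrative assumptions.

```python
class MetaObject:
    """Self-describing schema for one engineering dataset in the EDW."""

    def __init__(self, name, fields):
        self.name = name                 # dataset name
        self.fields = fields             # field name -> type: the metamodel

    def validate(self, record):
        """Generic management code: check a record against the metamodel."""
        return set(record) == set(self.fields) and all(
            isinstance(record[f], t) for f, t in self.fields.items()
        )

# New data types are added by registering meta-objects, not by changing
# warehouse code, which is the extensibility the paper targets.
flight_test = MetaObject("flight_test", {"tail_no": str, "altitude_m": float})
print(flight_test.validate({"tail_no": "FT-01", "altitude_m": 9500.0}))  # True
```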
On the basis of the realities of material supply management in a coal enterprise, this paper describes plans for material management systems based on specific information technologies, and points out their deficiencies and problems and the necessity of improving them. The structure, models, and data organization schema of a material management decision support system are investigated based on a new data management technology, data warehousing.
This paper proposes a useful web-based system for the management and sharing of electron probe micro-analysis (EPMA) data in geology. A new web-based architecture that integrates the management and sharing functions is developed and implemented. Earth scientists can use this system not only to manage their data but also to communicate and share it easily with other researchers. Data query methods provide the core functionality of the proposed management and sharing modules. The modules have been developed using cloud GIS technologies, which enable real-time spatial area retrieval on a map. The system has been tested by approximately 263 users at Jilin University and the Beijing SHRIMP Center. A survey was conducted among these users to estimate the usability of the primary functions of the system, and the assessment results are summarized and presented. Funding: National Major Scientific Instruments and Equipment Development Special Funds, China (No. 2016YFF0103303); National Science and Technology Support Program, China (No. 2014BAK02B03).
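A stand-in for the spatial area retrieval at the core of such a system is shown below: filter EPMA sample points by a map-selected bounding box. The coordinates and sample fields are invented for illustration.

```python
samples = [
    {"id": "EP-001", "lon": 125.3, "lat": 43.9, "mineral": "zircon"},
    {"id": "EP-002", "lon": 116.4, "lat": 39.9, "mineral": "apatite"},
]

def in_bbox(pt, west, south, east, north):
    """True if the sample point falls inside the user-drawn rectangle."""
    return west <= pt["lon"] <= east and south <= pt["lat"] <= north

# Everything inside a rectangle drawn around northeast China:
hits = [s for s in samples if in_bbox(s, 120.0, 40.0, 130.0, 48.0)]
print(hits)   # [{'id': 'EP-001', ...}]
```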
In modern workforce management, the demand for new ways to maximize worker satisfaction, productivity, and security is endless. Workforce movement data, such as those sourced from an access control system, can support this ongoing process through subsequent analysis. In this study, a solution is proposed based on the design and implementation of a data mart as part of a dimensional trajectory data warehouse (TDW) that acts as a repository for the management of movement data. A novel methodological approach is proposed for modeling multiple spatial and temporal dimensions in a logical model. The case study presented in this paper, modeling and analyzing workforce movement data to support human resource management decision-making, provides a representative example of the contribution of a TDW to information management and decision support systems. The entire process of exporting, cleaning, consolidating, and transforming data is implemented to achieve an appropriate format for final import. Structured Query Language (SQL) queries demonstrate the convenience of dimensional design for data analysis, and valuable information can be extracted from the movements of employees on company premises to manage the workforce efficiently and effectively. Visual analytics through data visualization supports the analysis and facilitates decision-making and business intelligence.
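A dimensional query of the kind the paper demonstrates is sketched below: how long each employee spent in each zone, reconstructed from access-control events. The schema and sample data are illustrative, not the study's actual data mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_movement (
    employee_id TEXT, zone TEXT, entered TEXT, exited TEXT
);
INSERT INTO fact_movement VALUES
 ('E01','lab',      '2024-05-01 09:00','2024-05-01 11:30'),
 ('E01','cafeteria','2024-05-01 11:30','2024-05-01 12:00'),
 ('E02','lab',      '2024-05-01 09:15','2024-05-01 10:00');
""")

# Dwell time per employee per zone, in hours.
query = """
SELECT employee_id, zone,
       SUM((julianday(exited) - julianday(entered)) * 24.0) AS hours
FROM fact_movement
GROUP BY employee_id, zone
ORDER BY employee_id, hours DESC;
"""
for row in conn.execute(query):
    print(row)
```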
To efficiently solve the materialized view selection problem, an optimized genetic algorithm for selecting a set of views to materialize is proposed, so as to achieve both good query performance and low view maintenance cost under a storage space constraint. First, a pre-processing algorithm based on the maximum benefit per unit space is used to generate initial solutions. Then, the initial solutions are improved by a genetic algorithm incorporating a mixture of optimization strategies. Furthermore, infeasible solutions generated during the evolution process are repaired using a loss function. The experimental results show that the proposed algorithm outperforms a heuristic algorithm and the canonical genetic algorithm in finding optimal solutions.
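A bare-bones version of this approach is sketched below: a bitstring marks which candidate views to materialize, fitness trades query benefit against maintenance cost, and solutions violating the space budget are repaired greedily by benefit per unit space. All numbers are made up, and the paper's specific pre-processing and loss function are simplified to this repair rule.

```python
import random

random.seed(0)
BENEFIT = [30, 22, 18, 40, 12]      # query-cost saving per candidate view
MAINT   = [5, 4, 2, 9, 1]           # maintenance cost per view
SPACE   = [10, 8, 6, 15, 4]         # storage needed per view
BUDGET  = 25                        # storage space constraint

def repair(bits):
    """Drop the worst benefit-per-space views until the budget holds."""
    while sum(s for b, s in zip(bits, SPACE) if b) > BUDGET:
        chosen = [i for i, b in enumerate(bits) if b]
        bits[min(chosen, key=lambda i: BENEFIT[i] / SPACE[i])] = 0
    return bits

def fitness(bits):
    return sum(BENEFIT[i] - MAINT[i] for i, b in enumerate(bits) if b)

pop = [repair([random.randint(0, 1) for _ in BENEFIT]) for _ in range(20)]
for _ in range(50):                                  # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                               # elitist selection
    children = []
    for _ in range(10):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(BENEFIT))      # one-point crossover
        child = a[:cut] + b[cut:]
        child[random.randrange(len(child))] ^= 1     # point mutation
        children.append(repair(child))
    pop = parents + children

best = max(pop, key=fitness)
print("materialize views:", [i for i, b in enumerate(best) if b],
      "fitness:", fitness(best))
```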
The challenge of achieving situational understanding is a limiting factor in effective, timely, and adaptive cyber-security analysis. Anomaly detection fills a critical role in network assessment and trend analysis, both of which underlie the establishment of comprehensive situational understanding. To that end, we propose a cyber-security data warehouse implemented as a hierarchical graph of aggregations that captures anomalies at multiple scales. Each node of the proposed graph is a summarization table of cyber event aggregations, and the edges are aggregation operators. The data warehouse enables domain experts to quickly and systematically traverse a multi-scale aggregation space. We describe the architecture of a test-bed system and summarize results on the IEEE VAST 2012 Cyber Forensics data.
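A toy rendering of the hierarchical aggregation graph is given below: nodes are summarization tables at different scales, and edges are aggregation operators (here, plain group-by rollups). The event fields are assumed for illustration.

```python
from collections import Counter

events = [
    {"src": "10.0.0.5", "subnet": "10.0.0", "hour": 9},
    {"src": "10.0.0.5", "subnet": "10.0.0", "hour": 9},
    {"src": "10.0.1.7", "subnet": "10.0.1", "hour": 10},
]

def aggregate(rows, keys):
    """One edge of the graph: roll events up to a coarser summary node."""
    return Counter(tuple(r[k] for k in keys) for r in rows)

# Fine-scale node: per-host per-hour; coarse-scale node: per-subnet.
per_host_hour = aggregate(events, ("src", "hour"))
per_subnet    = aggregate(events, ("subnet",))
print(per_host_hour)   # analysts drill down here when the coarse
print(per_subnet)      # per-subnet node shows an unusual spike
```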
The effectiveness of a Business Intelligence (BI) system depends mainly on the quality of the knowledge it produces. The decision-making process is hindered, and the user's trust is lost, if the knowledge offered is undesired or of poor quality. A Data Warehouse (DW) is a huge collection of data gathered from many sources and an important part of any BI solution that assists management in making better decisions. The Extract, Transform, and Load (ETL) process is the backbone of a DW system and is responsible for moving data from source systems into the DW. The more mature the ETL process, the more reliable the DW system. In this paper, we propose the ETL Maturity Model (EMM), which assists organizations in achieving a high-quality ETL system and thereby enhances the quality of the knowledge produced. The EMM is made up of five levels of maturity: Chaotic, Acceptable, Stable, Efficient, and Reliable. Each maturity level contains Key Process Areas (KPAs) that have been endorsed by industry experts and include all critical features of a good ETL system. Quality Objectives (QOs) are defined procedures that, when implemented, result in a high-quality ETL process. Each KPA has its own set of QOs, the execution of which meets the requirements of that KPA. Multiple brainstorming sessions with relevant industry experts helped to refine the model. The EMM was deployed in two key projects using multiple case studies to supplement the validation process and support our claims. This model can assist organizations in improving their current ETL process and transforming it into a more mature ETL system, and it can provide high-quality information that helps users make better decisions and gains their trust. Funding: Supported by King Saud University through Researchers Supporting Project Number (RSP-2021/387), King Saud University, Riyadh, Saudi Arabia.
Recently, owing to the rapid growth of data sensors, massive volumes of data are generated from different sources. Administering such data, in the sense of storing, managing, analyzing, and extracting insightful information from them, is a challenging task. Big data analytics is becoming a vital research area in domains such as climate data analysis, which demands fast access to data. Nowadays, MapReduce, an open-source distributed computing framework, is widely used in many domains of big data analysis. In our work, we have developed a conceptual data modeling framework for the implementation of a hybrid data warehouse model to store the features of National Climatic Data Center (NCDC) climate data. The hybrid data warehouse model for climate big data enables the identification of weather patterns applicable to agricultural and other climate-change-related studies, which will play a major role in recommending actions to domain experts and in making contingency plans for extreme cases of weather variability.
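The shape of a MapReduce pass over NCDC-style temperature records can be shown in plain Python: map emits (year, temperature) pairs, a shuffle groups them by key, and reduce takes the maximum per year. The record layout below is a simplification of the real NCDC format.

```python
from collections import defaultdict

records = [
    "1950 0078",   # year, temperature in tenths of a degree (assumed layout)
    "1950 0122",
    "1951 0011",
]

def mapper(line):
    year, temp = line.split()
    yield year, int(temp)

def reducer(year, temps):
    return year, max(temps)

groups = defaultdict(list)                 # the "shuffle" phase
for line in records:
    for key, value in mapper(line):
        groups[key].append(value)

for year in sorted(groups):
    print(reducer(year, groups[year]))     # e.g. ('1950', 122)
```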
This paper presents a simple complete K-level tree (CKT) architecture for text database organization and rapid data filtering. A database is constructed as a CKT forest, and each CKT contains data of the same length. The maximum and minimum depths of an individual CKT are equal and identical to the data's length. Insertion and deletion operations are defined, and a storage method and filtering algorithm are designed to balance efficiency and complexity. Applications to computer-aided teaching of Chinese and to protein selection show that a reduction of about 30% in storage consumption and over 60% in computation can easily be obtained.
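One plausible reading of the CKT is a per-length trie whose depth equals the data length, so that membership filtering is a single root-to-leaf walk; the sketch below is an interpretation of that structure, not the paper's exact algorithm.

```python
class CKTForest:
    """One tree per data length; filtering walks exactly len(word) levels."""

    def __init__(self):
        self.roots = {}                    # data length -> tree root

    def insert(self, word):
        node = self.roots.setdefault(len(word), {})
        for ch in word:
            node = node.setdefault(ch, {})

    def contains(self, word):
        node = self.roots.get(len(word))
        if node is None:
            return False
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return True

forest = CKTForest()
for w in ["MKV", "MKT", "ACDE"]:
    forest.insert(w)
print(forest.contains("MKT"), forest.contains("MKQ"))   # True False
```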
Along with the rapid development of the Internet, CRM has become one of the most important factors in keeping enterprises competitive. At the same time, analytical CRM based on a data warehouse is the kernel of a CRM system. This paper explains the idea of CRM and the data warehouse model of an analytical CRM system.