Hydrocarbon production from shale has attracted much attention in recent years. When applied to these prolific, hydrocarbon-rich resource plays, our understanding of the complexities of the flow mechanism (the sorption process and flow behavior in complex fracture systems, induced or natural) leaves much to be desired. In this paper, we present and discuss a novel approach to modeling and history matching of hydrocarbon production from a Marcellus shale asset in southwestern Pennsylvania using advanced data mining, pattern recognition and machine learning technologies. In this new approach, instead of imposing our understanding of the flow mechanism, the impact of multi-stage hydraulic fractures, and the production process on the reservoir model, we allow the production history, well log, completion and hydraulic fracturing data to guide the model and determine its behavior. The uniqueness of this technology is that it incorporates the so-called "hard data" directly into the reservoir model, so that the model can be used to optimize the hydraulic fracturing process. The "hard data" are field measurements recorded during hydraulic fracturing, such as fluid and proppant type and amount, injection pressure and rate, and proppant concentration. This approach contrasts with the current industry focus on the use of "soft data" (non-measured, interpretive quantities such as fracture length, width, height and conductivity) in reservoir models. The study focuses on a Marcellus shale asset that includes 135 wells with multiple pads, different landing targets, well lengths and reservoir properties. The full-field history matching process was successfully completed using this data-driven approach, capturing the production behavior with acceptable accuracy both for individual wells and for the entire asset.
Target detection is an important application in hyperspectral image processing. In this paper, a spectral-spatial target detection algorithm for hyperspectral data is proposed. The spatial and spectral features were unified based on data field theory and extracted by weighted manifold embedding. The novelty of the proposed method lies in two aspects: first, the way in which the spatial and spectral features are fused into a new feature based on data field theory; second, the introduction of local information to describe the decision boundary and explore discriminative features for target detection. The features extracted through data field modeling and manifold embedding were then used for the target detection task. Three standard hyperspectral datasets were considered in the analysis. The effectiveness of the proposed target detection algorithm based on data field theory was demonstrated by higher detection rates at lower False Alarm Rates (FARs) than those achieved by conventional hyperspectral target detectors.
Data modeling is the foundation of three-dimensional visualization technology. The paper first proposes a 3D integrated data model of stratum, laneway and drill on the basis of TIN and ARTP, designs the corresponding conceptual and logical models, and describes the data structures of the model's geometric elements using an object-oriented modeling approach. It then studies the key modeling technologies for stratum, laneway and drill, introduces the ARTP modeling process for each, and examines the 3D geometric modeling of laneways with different cross-sections. Finally, the paper realizes a coalmine-oriented three-dimensional visualization system, using SQL Server as the background database and Visual C++ 6.0 and OpenGL as the foreground development tools.
Based on a study of the basic characteristics of geological objects and the special requirements of computerized 3D geological modeling, this paper presents an object-oriented 3D topological data model. In this model, geological objects are divided into four object classes: point, line, area and volume. The volume class is further divided into four subclasses: the composite volume, the complex volume, the simple volume and the component. Twelve kinds of topological relations and the related data structures are designed for the geological objects.
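The class hierarchy described in this abstract can be sketched directly in an object-oriented language. The following is a minimal illustration, not the paper's implementation; all class and attribute names are hypothetical.

```python
class GeoObject:
    """Base class for geological objects; holds topological relations."""
    def __init__(self, oid):
        self.oid = oid
        self.relations = []  # topological relations to other objects


class Point(GeoObject):
    pass


class Line(GeoObject):
    pass


class Area(GeoObject):
    pass


class Volume(GeoObject):
    pass


class Component(Volume):        # the atomic building block
    pass


class SimpleVolume(Volume):     # a single closed solid
    pass


class ComplexVolume(Volume):    # a solid with internal structure
    pass


class CompositeVolume(Volume):  # an aggregate of volumes
    pass


v = SimpleVolume("stratum-01")
```

The four volume subclasses inherit all topological bookkeeping from the common base, which is the usual motivation for an object-oriented geological data model.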
This paper describes multi-view modeling and the data model transformation that supports it. We have previously proposed a reference model of CAD system generation, which can be applied to various domain-specific languages. However, the current CAD system generation cannot integrate data from multiple domains. Generally, each domain has its own view of products; for example, in the domain of architectural structure, designers extract the data they need from the architectural design data. Domain experts translate one view into another across domains in their heads. Multi-view modeling is a way to integrate product data from multiple domains and make it possible for computers to translate views among various domains.
Real traffic information was analyzed in terms of its statistical characteristics and approximated as a Gaussian time series. A data source model, called two-states constant bit rate (TSCBR), was proposed for dynamic traffic monitoring sensor networks. Autocorrelation analysis shows that the proposed TSCBR model closely matches the statistical characteristics of the real data source. To further verify the validity of the TSCBR data source model, the performance metrics of power consumption and network lifetime were studied in an evaluation of the sensor media access control (SMAC) algorithm. The simulation results show that, compared with traditional data source models, the TSCBR model significantly improves the accuracy of the algorithm evaluation.
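The abstract does not give the TSCBR parameters, but the idea of a two-state source where each state emits at a fixed rate can be sketched as a simple two-state random process. All parameter values below are illustrative assumptions, not taken from the paper.

```python
import random


def tscbr_source(steps, rate_on=5, rate_off=1, p_switch=0.2, seed=42):
    """Toy two-states constant-bit-rate (TSCBR) source sketch: within each
    state the source emits at a fixed rate; at every step it switches state
    with probability p_switch. Rates and probabilities are illustrative."""
    rng = random.Random(seed)
    state = 0  # start in the "off" (low-rate) state
    trace = []
    for _ in range(steps):
        trace.append(rate_on if state == 1 else rate_off)
        if rng.random() < p_switch:
            state = 1 - state
    return trace


trace = tscbr_source(1000)
```

A trace like this can feed a simulator in place of a Gaussian or Poisson source when evaluating MAC-layer energy metrics.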
Integrating marketing and distribution businesses is crucial for improving the coordination of equipment and the efficient management of multi-energy systems. New energy sources are continuously being connected to distribution grids; this, however, increases the complexity of the information structure of marketing and distribution businesses. The existing unified data model and the coordinated applications of marketing and distribution suffer from various drawbacks. As a solution, this paper presents a data model of "one graph of marketing and distribution" and a framework for graph computing, by analyzing current business and data trends in the marketing and distribution fields and applying graph data theory. Specifically, this work aims to determine the correlation between distribution transformers and marketing users, which is crucial for elucidating the connection between marketing and distribution. To this end, a novel identification algorithm is proposed based on the collected marketing and distribution data. A forecasting application is then developed on top of the proposed algorithm to realize the coordinated prediction and consumption of distributed photovoltaic power generation and distribution loads. Furthermore, an operation and maintenance (O&M) knowledge graph reasoning application is developed to improve the intelligent O&M capability of marketing and distribution equipment.
This paper presents a study on the design and implementation of spatial data modeling, applied to the spatial data organization and management of a coalfield geological environment database. Based on an analysis of a number of existing data models, and taking into account the unique data structures and characteristics involved, a methodology and key techniques for object-oriented spatial data modeling were proposed for the coalfield geological environment. The model building process was developed using object-oriented technology and the Unified Modeling Language (UML) on the platform of ESRI geodatabase data models. A case study of spatial data modeling in UML was presented, with successful implementation in the spatial database of the coalfield geological environment. The model building and implementation provide an effective way of representing the complexity and specificity of coalfield geological environment spatial data and an integrated management of spatial and property data.
Groundwater is the water located beneath the earth's surface in soil pore spaces and in the fractures of rock formations. As one of the most important natural resources, groundwater is tied to the environment, public health, welfare, and long-term economic growth, and it affects the daily activities of human beings. In modern urban areas, the primary contaminants of groundwater are artificial products such as gasoline and diesel. To protect this important water resource, a series of efforts have been made, including enforcement and remedial actions. Each year, the TGPC (Texas Groundwater Protection Committee) in the US publishes a "Joint Groundwater Monitoring and Contamination Report" describing historic and new contamination cases in each county, which is an important data source for the design of prevention strategies. In this paper, a DDM (data dependent modeling) approach is proposed to predict county-level NCC (new contamination cases). A case study with contamination information from Harris County, Texas, was conducted to illustrate the modeling and prediction process, with promising results: the one-step prediction error is 1.5%, while the two-step error is 12.1%. The established model can be used at the county, state, and even country level, and the prediction results can serve as a reference during decision-making.
This study demonstrates the complexity and importance of water quality as a measure of the health and sustainability of ecosystems that directly influence biodiversity, human health, and the world economy. The predictability of water quality thus plays a crucial role in managing our ecosystems, supporting informed decisions and, hence, proper environmental management. This study addresses these challenges by proposing an effective machine learning methodology applied to the "Water Quality" public dataset. The methodology models the dataset so as to provide prediction classification analysis with high values of the evaluation parameters, such as accuracy, sensitivity, and specificity. The proposed methodology is based on two approaches: (a) the SMOTE method to deal with unbalanced data, and (b) carefully applied classical machine learning models. This paper uses Random Forests, Decision Trees, XGBoost, and Support Vector Machines because they can handle large datasets, can be trained on skewed datasets, and provide high accuracy in water quality classification. A key contribution of this work is the use of custom sampling strategies within the SMOTE approach, which significantly enhanced performance metrics and improved class imbalance handling. The results demonstrate significant improvements in predictive performance, achieving the highest reported metrics: accuracy (98.92% vs. 96.06%), sensitivity (98.3% vs. 71.26%), and F1 score (98.37% vs. 79.74%) using the XGBoost model. These improvements underscore the effectiveness of our custom SMOTE sampling strategies in addressing class imbalance. The findings contribute to environmental management by enabling ecology specialists to develop more accurate strategies for monitoring, assessing, and managing drinking water quality, ensuring better ecosystem and public health outcomes.
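The core of SMOTE, which the abstract builds on, is generating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below implements that idea in plain Python for illustration; a real study would use a library implementation such as imbalanced-learn's SMOTE, and the data here are made up.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: each synthetic sample lies on the line segment
    between a random minority sample and one of its k nearest minority
    neighbours. Illustrative only, not the paper's custom strategy."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: dist2(x, m))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + lam * (ni - xi)
                               for xi, ni in zip(x, nb)))
    return synthetic


minority = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1)]
new_samples = smote_oversample(minority, 8)
```

Because every synthetic point is a convex combination of two minority samples, the oversampled class stays inside the original minority region rather than duplicating exact rows.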
Objective To study the causal relationship between R&D investment and enterprise performance of domestic pharmaceutical enterprises. Methods A panel data model was adopted for empirical analysis. Results and Conclusion Increasing the R&D investment intensity of pharmaceutical enterprises in the Yangtze River Delta and Zhejiang by 1% increases their profit margins by 0.79% and 0.46%, respectively; conversely, if the profit margin increases by 1%, the R&D investment intensity increases by 0.25% and 0.19%. If the profit margin of pharmaceutical enterprises in the Beijing-Tianjin-Hebei, Chengdu-Chongqing and other regions increases by 1%, the R&D investment intensity increases by 0.14%, 0.07% and 0.1%, respectively, which is lower than in the Yangtze River Delta and Zhejiang. The relationship between R&D investment and enterprise performance of pharmaceutical enterprises in the Yangtze River Delta and Zhejiang is one of Granger causality, showing a two-way positive effect. Profits and R&D investment of pharmaceutical enterprises in the Beijing-Tianjin-Hebei, Chengdu-Chongqing and other regions also exhibit Granger causality. In the Pearl River Delta, however, profits and R&D investment did not pass the stability test, so the causality between them cannot be determined.
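The Granger test used in such panel studies asks whether lagged values of one series reduce the error of predicting another beyond the target's own lags. The following sketch shows only that core idea on synthetic data (the paper's data and full F-test are not reproduced); in practice one would use a packaged test such as statsmodels' `grangercausalitytests`.

```python
import numpy as np


def granger_improvement(y, x, lag=1):
    """Compare residual sums of squares of a restricted model (y's own lags)
    and an unrestricted model (plus x's lags). A real Granger test would turn
    this difference into an F statistic; this is an illustrative sketch."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    Y = y[lag:]
    ones = np.ones(len(Y))
    X_r = np.column_stack([ones, y[:-lag]])            # y lags only
    X_u = np.column_stack([ones, y[:-lag], x[:-lag]])  # plus x lags

    def rss(X):
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        return float(np.sum((Y - X @ beta) ** 2))

    return rss(X_r), rss(X_u)


rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Synthetic example where y is driven by x's previous value
y = 0.8 * np.roll(x, 1) + rng.normal(scale=0.1, size=200)
rss_restricted, rss_unrestricted = granger_improvement(y, x)
```

When x genuinely helps predict y, the unrestricted residual sum of squares drops well below the restricted one, which is the evidence the F test formalizes.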
The study aimed to develop a customized Data Governance Maturity Model (DGMM) for the Ministry of Defence (MoD) in Kenya to address data governance challenges in military settings, since current frameworks lack the specific requirements of the defence industry. The model uses Key Performance Indicators (KPIs) to enhance data governance procedures. Design Science Research guided the study, using qualitative and quantitative methods to gather data from MoD personnel. Major deficiencies were found in data integration, quality control, and adherence to data security regulations. The DGMM helps the MoD improve the personnel, procedures, technology, and organizational elements related to data management. The model was tested against ISO/IEC 38500 and recommended for use in other government sectors with similar data governance issues. The DGMM has the potential to enhance data management efficiency, security, and compliance in the MoD and to guide further research in military data governance.
Atmospheric CO_(2) is one of the key parameters for estimating air-sea CO_(2) flux. The Orbiting Carbon Observatory-2 (OCO-2) satellite has observed the column-averaged dry-air mole fraction of global atmospheric carbon dioxide (XCO_(2)) since 2014. In this study, the OCO-2 XCO_(2) products were compared with in-situ data from the Total Carbon Column Observing Network (TCCON) and the Global Monitoring Division (GMD), and with modeling data from CarbonTracker2019, over the global ocean and land. The results showed that the OCO-2 XCO_(2) data are consistent with the TCCON and GMD in-situ XCO_(2) data, with mean absolute biases of 0.25×10^(-6) and 0.67×10^(-6), respectively. Moreover, the OCO-2 XCO_(2) data are also consistent with the CarbonTracker2019 modeled XCO_(2) data, with mean absolute biases of 0.78×10^(-6) over the ocean and 1.02×10^(-6) over land. These results indicate the high accuracy of the OCO-2 XCO_(2) product over the global ocean, where it could be applied to estimate air-sea CO_(2) flux.
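The comparison metric quoted above, mean absolute bias, is simply the average of the absolute differences between satellite retrievals and reference values. A minimal sketch, with made-up XCO_(2) values in ppm (the study's actual data are not reproduced here):

```python
def mean_absolute_bias(satellite, reference):
    """Mean absolute bias between paired satellite retrievals and
    reference (in-situ or model) values."""
    return sum(abs(s - r) for s, r in zip(satellite, reference)) / len(satellite)


# Hypothetical XCO2 values in ppm, for illustration only
sat = [410.1, 411.0, 409.8]
ref = [410.0, 410.6, 410.1]
bias = mean_absolute_bias(sat, ref)
```

Biases of a few tenths of a ppm, as reported against TCCON and GMD, are small relative to the XCO_(2) background of roughly 400 ppm, which is the sense in which the product is called accurate.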
To improve the performance of traditional map matching algorithms in freeway traffic state monitoring systems using low logging frequency GPS (global positioning system) probe data, a map matching algorithm based on the Oracle spatial data model is proposed. The algorithm uses the Oracle road network data model to analyze the spatial relationships between massive GPS positioning points and freeway networks, builds an N-shortest-path algorithm to efficiently find reasonable candidate routes between GPS positioning points, and uses a fuzzy logic inference system to determine the final matched traveling route. In an implementation with field data from Los Angeles, the computation speed of the algorithm is about 135 GPS positioning points per second and the accuracy is 98.9%. The results demonstrate the effectiveness and accuracy of the proposed algorithm for mapping massive GPS positioning data onto freeway networks with complex geometric characteristics.
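The geometric primitive underlying any map matching algorithm is projecting a GPS point onto candidate road segments and choosing the nearest. The sketch below shows that step only, on planar coordinates; the paper's Oracle network model, N-shortest-path search, and fuzzy inference are not reproduced, and the segment data are made up.

```python
def project_to_segment(p, a, b):
    """Project point p onto segment ab; return (distance, projected point)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # Clamp the projection parameter so the result stays on the segment
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0,
        ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    qx, qy = ax + t * dx, ay + t * dy
    return ((px - qx) ** 2 + (py - qy) ** 2) ** 0.5, (qx, qy)


def snap(point, segments):
    """Snap a GPS point to the nearest candidate road segment."""
    return min(segments, key=lambda s: project_to_segment(point, *s)[0])


segments = [((0, 0), (10, 0)), ((0, 5), (10, 5))]
best = snap((3, 1), segments)
```

With sparse, low-frequency probe data this point-level snapping is ambiguous, which is why the paper adds candidate route generation and fuzzy reasoning on top of it.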
This paper focuses on object-oriented data modelling in computer-aided design (CAD) databases. Starting with a discussion of the data modelling requirements of CAD applications, appropriate data modelling features are introduced. A feasible approach to selecting the "best" data model for an application is to analyze the data that have to be stored in the database: a data model is appropriate for a given task if the information of the application environment can be easily mapped onto it. Accordingly, the data involved are analyzed, and an object-oriented data model appropriate for CAD applications is derived. Based on a review of object-oriented techniques applied in CAD, object-oriented data modelling in CAD is then addressed in detail. Finally, 3D geometrical data models and the implementation of their data model using the object-oriented method are presented.
A uniform metadata representation is introduced for heterogeneous databases, multimedia information and other information sources, and some features of metadata are analyzed. The limitations of existing metadata models are compared with the new one. The metadata model is described in XML, which is well suited to metadata denotation and exchange. Well-structured data, semi-structured data and exterior file data without structure are all described in the metadata model. The model provides feasibility and extensibility for constructing a uniform metadata model for a data warehouse.
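Describing a heterogeneous source in XML metadata might look like the following sketch, built with Python's standard `xml.etree.ElementTree`. The element and attribute names are hypothetical, not the paper's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML metadata describing one relational source in a
# uniform model; names are illustrative only.
source = ET.Element("source", type="relational")
ET.SubElement(source, "name").text = "sales_db"
ET.SubElement(source, "attribute", name="order_id", datatype="int")
ET.SubElement(source, "attribute", name="created", datatype="date")

xml_text = ET.tostring(source, encoding="unicode")
```

A multimedia or unstructured file source would reuse the same `source` element with a different `type`, which is what makes a single XML model workable across heterogeneous sources.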
The concept of multilevel security (MLS) is commonly used in the study of data models for secure databases, but the basic MLS model has some limitations, such as inference channels. The availability and data integrity of the system are seriously constrained by the "No Read Up, No Write Down" property of the basic MLS model. In order to eliminate covert channels, polyinstantiation and cover stories are used in the new data model, and the read and write rules have been redefined to improve the agility and usability of a system based on the MLS model. Together, these methods make the improved data model more secure, agile and usable.
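The basic "No Read Up, No Write Down" rules (the Bell-LaPadula properties) that the paper relaxes can be stated in a few lines. The level names and ordering below are illustrative.

```python
# Toy sketch of the basic MLS access rules; levels are illustrative.
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top-secret": 3}


def can_read(subject_level, object_level):
    """Simple security property: no read up — a subject may read
    only objects at or below its own level."""
    return LEVELS[subject_level] >= LEVELS[object_level]


def can_write(subject_level, object_level):
    """Star property: no write down — a subject may write only to
    objects at or above its own level."""
    return LEVELS[subject_level] <= LEVELS[object_level]
```

The usability cost is visible immediately: a "secret" subject cannot annotate a "confidential" record at all, which is exactly the rigidity that polyinstantiation and redefined read/write rules aim to soften.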
To better manipulate the heterogeneous and distributed data in a data grid, a dataspace management framework for grid data is proposed based on in-depth research on grid technology. Combining dataspace management technologies, such as the data model iDM and the query language iTrails, with the grid data access middleware OGSA-DAI, a grid dataspace management prototype system is built, in which tasks such as data access, abstraction, indexing, service management and query answering are implemented by OGSA-DAI workflows. Experimental results show that it is feasible to apply a dataspace management mechanism to the grid environment. Dataspace meets grid data management needs in that it hides the heterogeneity and distribution of grid data and can adapt to the dynamic characteristics of the grid. The proposed grid dataspace management provides a new method for grid data management.
This is the first of a three-part series of papers which introduces a general background of building trajectory-oriented road network data models, including motivation, related works, and basic concepts. The purpose of the series is to develop a trajectory-oriented road network data model, namely the carriageway-based road network data model (CRNM). Part 1 deals with the modeling background; Part 2 proposes the principle and architecture of the CRNM; Part 3 investigates the implementation of the CRNM in a case study. In the present paper, the challenges of managing trajectory data are discussed. Then, developing trajectory-oriented road network data models is proposed as a solution, and existing road network data models are reviewed. Basic representation approaches of a road network are introduced, as well as its constitution.
Funding (Marcellus shale study): RPSEA and the U.S. Department of Energy partially funded this study.
Funding (TSCBR study): The National Natural Science Foundation of China (No. 60372076) and the Important Science and Technology Key Item of the Shanghai Science and Technology Bureau (No. 05dz15004).
Funding (marketing and distribution study): This work was supported by the National Key R&D Program of China (2020YFB0905900).
Funding (coalfield geological environment study): Supported by the Natural Science Foundation of Shanxi Province (2008011028-2).
文摘Presented a study on the design and implementation of spatial data modelingand application in the spatial data organization and management of a coalfield geologicalenvironment database.Based on analysis of a number of existing data models and takinginto account the unique data structure and characteristic, methodology and key techniquesin the object-oriented spatial data modeling were proposed for the coalfield geological environment.The model building process was developed using object-oriented technologyand the Unified Modeling Language (UML) on the platform of ESRI geodatabase datamodels.A case study of spatial data modeling in UML was presented with successful implementationin the spatial database of the coalfield geological environment.The modelbuilding and implementation provided an effective way of representing the complexity andspecificity of coalfield geological environment spatial data and an integrated managementof spatial and property data.
Abstract: Groundwater is the water located beneath the earth's surface in soil pore spaces and in the fractures of rock formations. As one of the most important natural resources, groundwater is associated with the environment, public health, welfare, and long-term economic growth, and it affects the daily activities of human beings. In modern urban areas, the primary contaminants of groundwater are artificial products such as gasoline and diesel. To protect this important water resource, a series of efforts have been made, including enforcement and remedial actions. Each year, the TGPC (Texas Groundwater Protection Committee) in the US publishes a "Joint Groundwater Monitoring and Contamination Report" describing historic and new contamination cases in each county, which is an important data source for the design of prevention strategies. In this paper, a DDM (data-dependent modeling) approach is proposed to predict county-level NCC (new contamination cases). A case study with contamination information from Harris County, Texas was conducted to illustrate the modeling and prediction process, with promising results: the one-step prediction error is 1.5%, while the two-step error is 12.1%. The established model can be used at the county, state, and even country level, and the prediction results could serve as a reference in decision-making processes.
Abstract: This study demonstrates the complexity and importance of water quality as a measure of the health and sustainability of ecosystems that directly influence biodiversity, human health, and the world economy. The predictability of water quality thus plays a crucial role in managing ecosystems, enabling informed decisions and proper environmental management. This study addresses these challenges by proposing an effective machine learning methodology applied to the "Water Quality" public dataset. The methodology models the dataset to provide prediction classification analysis with high values of the evaluation metrics, such as accuracy, sensitivity, and specificity. The proposed methodology is based on two approaches: (a) the SMOTE method to deal with unbalanced data and (b) the careful application of classical machine learning models. This paper uses Random Forests, Decision Trees, XGBoost, and Support Vector Machines because they can handle large datasets, train models on skewed data, and provide high accuracy in water quality classification. A key contribution of this work is the use of custom sampling strategies within the SMOTE approach, which significantly enhanced performance metrics and improved class-imbalance handling. The results demonstrate significant improvements in predictive performance, achieving the highest reported metrics: accuracy (98.92% vs. 96.06%), sensitivity (98.3% vs. 71.26%), and F1 score (98.37% vs. 79.74%) using the XGBoost model. These improvements underscore the effectiveness of the custom SMOTE sampling strategies in addressing class imbalance. The findings contribute to environmental management by enabling ecology specialists to develop more accurate strategies for monitoring, assessing, and managing drinking water quality, ensuring better ecosystem and public health outcomes.
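The core of the SMOTE technique named above is interpolation between minority-class samples and their nearest minority neighbours. The minimal sketch below shows that mechanism on toy data; it is a simplified stand-in, not the paper's custom sampling strategy, and real studies would use a tested library implementation.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Synthesize n_new minority samples by interpolating between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]  # k nearest per sample
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)
        j = neighbours[i, rng.integers(k)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Balance a toy 2-feature dataset: 50 majority vs. 10 minority samples.
rng = np.random.default_rng(1)
X_maj = rng.normal(0.0, 1.0, size=(50, 2))
X_min = rng.normal(3.0, 0.5, size=(10, 2))
X_syn = smote_oversample(X_min, n_new=40)
X_balanced = np.vstack([X_maj, X_min, X_syn])
```

Because each synthetic point lies on a segment between two minority samples, the new points stay inside the minority region rather than drifting into the majority class.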
Funding: Shenyang Pharmaceutical University Young and Middle-aged Teacher Career Development Support Plan; Public Welfare Research Fund for Scientific Undertakings of Liaoning Province in 2022 (Soft Science Research Plan) (No. 2022JH4/10100040).
Abstract: Objective To study the causal relationship between R&D investment and enterprise performance of domestic pharmaceutical enterprises. Methods A panel data model was adopted for empirical analysis. Results and Conclusion Increasing the R&D investment intensity of pharmaceutical enterprises in the Yangtze River Delta and Zhejiang by 1% increases their profit margins by 0.79% and 0.46%, respectively. Conversely, if the profit margin increases by 1%, the R&D investment intensity increases by 0.25% and 0.19%. If the profit margin of pharmaceutical enterprises in Beijing-Tianjin-Hebei, Chengdu-Chongqing, and other regions increases by 1%, the R&D investment intensity increases by 0.14%, 0.07%, and 0.1%, respectively, which is lower than in the Yangtze River Delta and Zhejiang. The relationship between R&D investment and enterprise performance of pharmaceutical enterprises in the Yangtze River Delta and Zhejiang Province is a Granger causality, showing a two-way positive effect. Profits and R&D investment of pharmaceutical enterprises in Beijing-Tianjin-Hebei, Chengdu-Chongqing, and other regions also show Granger causality. In the Pearl River Delta, however, profits and R&D investment did not pass the stability test, so the causality between them cannot be determined.
Abstract: The study aimed to develop a customized Data Governance Maturity Model (DGMM) for the Ministry of Defence (MoD) in Kenya to address data governance challenges in military settings, since current frameworks lack requirements specific to the defence sector. The model uses Key Performance Indicators (KPIs) to enhance data governance procedures. Design Science Research guided the study, using qualitative and quantitative methods to gather data from MoD personnel. Major deficiencies were found in data integration, quality control, and adherence to data security regulations. The DGMM helps the MoD improve personnel, procedures, technology, and organizational elements related to data management. The model was tested against ISO/IEC 38500 and recommended for use in other government sectors with similar data governance issues. The DGMM has the potential to enhance data management efficiency, security, and compliance in the MoD and to guide further research in military data governance.
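Maturity models of this kind typically score each governance dimension against its KPIs and map the aggregate to a discrete maturity level. The sketch below illustrates that general pattern; the dimension names, weights, and level bands are illustrative assumptions, not the DGMM's actual definitions.

```python
# Generic KPI-to-maturity-level mapping, in the spirit of a maturity
# model assessment. All names and thresholds here are hypothetical.
LEVELS = ["Initial", "Managed", "Defined", "Measured", "Optimized"]

def maturity_level(kpi_scores):
    """Average per-dimension KPI scores (0-5 scale) and floor the
    result into one of five maturity bands."""
    avg = sum(kpi_scores.values()) / len(kpi_scores)
    index = min(int(avg), len(LEVELS) - 1)
    return LEVELS[index], round(avg, 2)

scores = {
    "data_integration": 2.0,
    "quality_control": 1.5,
    "security_compliance": 3.0,
    "people_and_process": 2.5,
}
level, avg = maturity_level(scores)
```

A real assessment would weight dimensions differently and require minimum scores per dimension rather than a simple average, so that one strong area cannot mask a weak one.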
Funding: The National Key Research and Development Programme of China under contract No. 2017YFA0603004; the Fund of Southern Marine Science and Engineering Guangdong Laboratory (Zhanjiang) (Zhanjiang Bay Laboratory) under contract No. ZJW-2019-08; the National Natural Science Foundation of China under contract Nos 41825014, 41676172 and 41676170; the Global Change and Air-Sea Interaction Project of China under contract Nos GASI-02-SCS-YGST2-01, GASI-02-PAC-YGST2-01 and GASI-02-IND-YGST2-01.
Abstract: Atmospheric CO_2 is one of the key parameters for estimating air-sea CO_2 flux. The Orbiting Carbon Observatory-2 (OCO-2) satellite has observed the column-averaged dry-air mole fraction of global atmospheric carbon dioxide (XCO_2) since 2014. In this study, the OCO-2 XCO_2 products were compared against in-situ data from the Total Carbon Column Observing Network (TCCON) and the Global Monitoring Division (GMD), and against modeling data from CarbonTracker2019, over the global ocean and land. Results showed that the OCO-2 XCO_2 data are consistent with the TCCON and GMD in-situ XCO_2 data, with mean absolute biases of 0.25×10^-6 and 0.67×10^-6, respectively. Moreover, the OCO-2 XCO_2 data are also consistent with the CarbonTracker2019 modeling XCO_2 data, with mean absolute biases of 0.78×10^-6 over ocean and 1.02×10^-6 over land. These results indicate the high accuracy of the OCO-2 XCO_2 product over the global ocean, which could be applied to estimate the air-sea CO_2 flux.
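The comparison metric used above, mean absolute bias, is simply the mean of the absolute differences between satellite retrievals and matched reference values. A minimal sketch, with synthetic XCO_2 values rather than actual OCO-2 data:

```python
import numpy as np

def mean_absolute_bias(retrieved, reference):
    """Mean of |retrieved - reference| over matched soundings."""
    retrieved = np.asarray(retrieved, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.mean(np.abs(retrieved - reference))

# Illustrative XCO_2 values in ppm (i.e., 1e-6 mole fraction).
sat = [405.1, 406.0, 404.7, 405.9]
ref = [405.0, 405.6, 405.0, 406.0]
mab = mean_absolute_bias(sat, ref)
```

In a real validation, the pairing itself (collocating soundings with station measurements in space and time) is the hard part; the metric computation is trivial once matches exist.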
Abstract: To improve the performance of traditional map matching algorithms in freeway traffic state monitoring systems using low-logging-frequency GPS (global positioning system) probe data, a map matching algorithm based on the Oracle spatial data model is proposed. The algorithm uses the Oracle road network data model to analyze the spatial relationships between massive GPS positioning points and freeway networks, builds an N-shortest-path algorithm to efficiently find reasonable candidate routes between GPS positioning points, and uses a fuzzy logic inference system to determine the final matched traveling route. In an implementation with field data from Los Angeles, the computation speed of the algorithm is about 135 GPS positioning points per second and the accuracy is 98.9%. The results demonstrate the effectiveness and accuracy of the proposed algorithm for mapping massive GPS positioning data onto freeway networks with complex geometric characteristics.
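The candidate-route search above builds on shortest-path computation over the road network. The sketch below shows plain Dijkstra on a toy freeway graph; the paper's N-shortest-path algorithm would keep the N best routes (and its fuzzy inference would then pick among them), so this is a simplified stand-in with illustrative node names.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over a weighted adjacency dict: node -> [(nbr, cost)]."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Reconstruct the route from dst back to src.
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[dst]

# Toy freeway network: nodes are interchanges, weights are link lengths (km).
net = {
    "A": [("B", 2.0), ("C", 5.0)],
    "B": [("C", 1.5), ("D", 4.0)],
    "C": [("D", 2.0)],
}
route, length = shortest_path(net, "A", "D")
```

An N-shortest-path variant would, for example, rerun this search with each link of the best route removed in turn (Yen's approach) to generate alternative candidates between consecutive GPS points.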
Abstract: This paper focuses on object-oriented data modelling in computer-aided design (CAD) databases. Starting with a discussion of data modelling requirements for CAD applications, appropriate data modelling features are introduced. A feasible approach to selecting the "best" data model for an application is to analyze the data that have to be stored in the database: a data model is appropriate for a given task if the information of the application environment can be easily mapped onto it. Accordingly, the data involved are analyzed, and an object-oriented data model appropriate for CAD applications is derived. Based on a review of object-oriented techniques applied in CAD, object-oriented data modelling in CAD is then addressed in detail. Finally, 3D geometrical data models and their implementation using the object-oriented method are presented.
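The mapping the abstract describes, from CAD entities to an object-oriented model, typically represents geometric primitives as classes with identity, containment, and behavior. A minimal sketch of a 3D geometrical model in that style; the class names and the boundary representation chosen (solids holding edges holding points) are illustrative assumptions, not the paper's model.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Point3D:
    """A vertex in 3D space; immutable so it can be shared by edges."""
    x: float
    y: float
    z: float

@dataclass
class Edge:
    """A straight edge between two vertices, with derived geometry."""
    start: Point3D
    end: Point3D

    def length(self):
        dx = self.end.x - self.start.x
        dy = self.end.y - self.start.y
        dz = self.end.z - self.start.z
        return (dx * dx + dy * dy + dz * dz) ** 0.5

@dataclass
class Solid:
    """A named CAD object that aggregates its boundary edges."""
    name: str
    edges: list = field(default_factory=list)

    def total_edge_length(self):
        return sum(e.length() for e in self.edges)

# A 3-4-5 triangular face as a toy boundary representation.
p0, p1, p2 = Point3D(0, 0, 0), Point3D(3, 0, 0), Point3D(3, 4, 0)
part = Solid("bracket", [Edge(p0, p1), Edge(p1, p2), Edge(p2, p0)])
total = part.total_edge_length()
```

The object-oriented gain over a flat relational layout is that derived quantities (lengths, volumes) live with the entities themselves, and shared vertices are stored once and referenced.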
Abstract: A uniform metadata representation is introduced for heterogeneous databases, multimedia information, and other information sources. Some features of metadata are analyzed, and the limitations of existing metadata models are compared with the new one. The metadata model is described in XML, which is well suited to metadata denotation and exchange. Well-structured data, semi-structured data, and unstructured external file data are all described in the metadata model. The model provides feasibility and extensibility for constructing a uniform metadata model for a data warehouse.
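A uniform XML metadata record of the kind described above can cover structured and unstructured sources by making schema information optional. The sketch below shows one possible shape; the element and attribute names are assumptions for illustration, not the paper's actual model.

```python
import xml.etree.ElementTree as ET

def metadata_record(name, source_type, location, schema=None):
    """Build one uniform metadata record; structured sources carry a
    <schema> child, unstructured file sources omit it."""
    rec = ET.Element("metadata", {"name": name, "type": source_type})
    ET.SubElement(rec, "location").text = location
    if schema:
        s = ET.SubElement(rec, "schema")
        for col, dtype in schema.items():
            ET.SubElement(s, "field", {"name": col, "dtype": dtype})
    return rec

catalog = ET.Element("catalog")
catalog.append(metadata_record(
    "orders", "relational", "db://erp/orders",
    schema={"id": "int", "amount": "decimal"}))
catalog.append(metadata_record(
    "readme", "file", "file:///docs/readme.txt"))  # unstructured: no schema
xml_text = ET.tostring(catalog, encoding="unicode")
```

Because both records share one root vocabulary, a data warehouse loader can traverse the catalog uniformly and branch on the `type` attribute only where source-specific handling is unavoidable.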
Abstract: The concept of multilevel security (MLS) is commonly used in the study of data models for secure databases. However, the basic MLS model has limitations, such as inference channels, and the availability and data integrity of the system are seriously constrained by its "No Read Up, No Write Down" property. In order to eliminate covert channels, polyinstantiation and cover stories are used in the new data model. The read and write rules have been redefined to improve the agility and usability of a system based on the MLS model. Together, these methods make the improved data model more secure, agile, and usable.
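Polyinstantiation, as used above, lets the same primary key hold different tuples at different security levels, so a low-cleared write beside classified data creates a cover story instead of being rejected (a rejection would itself leak the existence of higher data). The sketch below illustrates the mechanism with a simplified Bell-LaPadula-style read rule; it is not the paper's redefined rule set.

```python
# Toy multilevel-secure relation with polyinstantiation.
# Levels and rules are a simplification for illustration only.
LEVELS = {"unclassified": 0, "secret": 1, "top_secret": 2}

class MLSTable:
    def __init__(self):
        self.rows = {}  # (key, level) -> value: polyinstantiated tuples

    def write(self, subject_level, key, value):
        # Subjects write at their own level; a low write beside an
        # existing high tuple silently creates a cover story.
        self.rows[(key, subject_level)] = value

    def read(self, subject_level, key):
        # "No Read Up": return the highest-level tuple at or below
        # the subject's clearance.
        clearance = LEVELS[subject_level]
        visible = [(LEVELS[lvl], val) for (k, lvl), val in self.rows.items()
                   if k == key and LEVELS[lvl] <= clearance]
        return max(visible)[1] if visible else None

t = MLSTable()
t.write("secret", "mission", "strike at dawn")
t.write("unclassified", "mission", "routine patrol")  # cover story
low_view = t.read("unclassified", "mission")
high_view = t.read("secret", "mission")
```

The low-cleared subject sees only the cover story and cannot infer that a secret tuple exists, which is exactly the covert-channel closure polyinstantiation provides.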
Abstract: To better manipulate heterogeneous and distributed data in the data grid, a dataspace management framework for grid data is proposed, based on in-depth research on grid technology. Combining dataspace management technologies, such as the data model iDM and the query language iTrails, with the grid data access middleware OGSA-DAI, a grid dataspace management prototype system is built, in which tasks such as data access, abstraction, indexing, service management, and query answering are implemented as OGSA-DAI workflows. Experimental results show that it is feasible to apply a dataspace management mechanism to the grid environment. Dataspace meets grid data management needs in that it hides the heterogeneity and distribution of grid data and can adapt to the dynamic characteristics of the grid. The proposed grid dataspace management provides a new method for grid data management.
Abstract: This is the first of a three-part series of papers introducing the general background of building trajectory-oriented road network data models, including motivation, related works, and basic concepts. The purpose of the series is to develop a trajectory-oriented road network data model, namely the carriageway-based road network data model (CRNM). Part 1 deals with the modeling background; Part 2 proposes the principle and architecture of the CRNM; Part 3 investigates the implementation of the CRNM in a case study. In the present paper, the challenges of managing trajectory data are discussed. Developing trajectory-oriented road network data models is then proposed as a solution, and existing road network data models are reviewed. Basic approaches to representing a road network are introduced, as well as its constitution.