Individual Tree Detection-and-Counting (ITDC) is an important task in urban areas, and numerous methods have been proposed for it. Despite their many advantages, existing methods still fail to provide robust results because they mostly rely on direct field investigations. This paper presents a novel approach that combines high-resolution imagery with Canopy-Height-Model (CHM) data to solve the ITDC problem. The approach is studied in six urban scenes: farmland, woodland, park, industrial land, road, and residential areas. First, it identifies tree canopy regions from high-resolution imagery using a deep learning network. It then uses the CHM data to detect treetops within the canopy regions with a local maximum algorithm, and delineates individual tree canopies with region growing. Finally, it calculates and reports the number of individual trees and tree canopies. The proposed approach is evaluated with data from Shanghai, China. Our results show that the individual tree detection method achieved an average overall accuracy of 0.953, with a precision of 0.987 for the woodland scene. Meanwhile, the R² values for canopy segmentation in the different urban scenes are greater than 0.780 for canopy area and 0.779 for canopy diameter. These results confirm that the proposed method is robust enough for urban tree planning and management.
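As a rough illustration of the local-maximum treetop step (not the authors' implementation), the following Python sketch finds local maxima in a CHM array; the window size and minimum tree height are assumed values.

```python
# A minimal sketch of treetop detection on a canopy height model, assuming
# the CHM is already loaded as a 2-D numpy array. Window size and height
# threshold are illustrative assumptions, not values from the paper.
import numpy as np
from scipy.ndimage import maximum_filter

def detect_treetops(chm, window=5, min_height=2.0):
    """Return (row, col) indices of local maxima in a canopy height model."""
    # A pixel is a treetop candidate if it equals the maximum of its window.
    local_max = maximum_filter(chm, size=window) == chm
    # Suppress ground/noise pixels below the minimum tree height.
    candidates = local_max & (chm >= min_height)
    return np.argwhere(candidates)

# Toy usage: a synthetic CHM with two "trees".
chm = np.zeros((20, 20))
chm[5, 5], chm[14, 12] = 8.0, 11.5
print(detect_treetops(chm))
```

The detected maxima would then seed the region-growing stage that delineates each canopy.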
The transmission of scientific data over long distances is required to enable interplanetary science expeditions. Current approaches include transmitting all collected data or transmitting low-resolution data to enable ground-controller review and selection of data for transmission. Model-based data transmission (MBDT) seeks to increase the amount of knowledge conveyed per unit of data transmitted by comparing high-resolution data collected in situ to a pre-existing (or potentially co-transmitted) model. This paper describes the application of MBDT to gravitational data and characterizes its utility and performance. This is done by applying the MBDT technique to a selection of gravitational data previously collected for the Earth and comparing the transmission requirements to the levels required for raw data transmission and non-application-aware compression. Transmission reductions of up to 31.8% (without maximum-error-thresholding) and up to 97.17% (with maximum-error-thresholding) were achieved. These levels significantly exceed what is possible with non-application-aware compression.
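A hedged sketch of the MBDT idea, under the assumption that sender and receiver share the same model: only samples whose deviation from the model exceeds an error threshold are transmitted. The threshold and the flat stand-in model are illustrative, not the paper's gravity model.

```python
# Sender transmits only residuals above max_error; the receiver reconstructs
# everything else from the shared model, within the error bound.
import numpy as np

def mbdt_encode(observed, model_prediction, max_error=0.05):
    """Return indices and residuals the receiver cannot infer from the model."""
    residual = observed - model_prediction
    keep = np.abs(residual) > max_error
    return np.flatnonzero(keep), residual[keep]

def mbdt_decode(model_prediction, idx, residuals):
    """Receiver side: start from the model, patch in transmitted residuals."""
    recon = model_prediction.copy()
    recon[idx] += residuals
    return recon

obs = np.array([9.80, 9.82, 9.78, 9.95, 9.81])
pred = np.full(5, 9.81)                     # shared a-priori model
idx, res = mbdt_encode(obs, pred)
recon = mbdt_decode(pred, idx, res)
print(idx, np.max(np.abs(recon - obs)) <= 0.05)   # error bounded everywhere
```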
The study aimed to develop a customized Data Governance Maturity Model (DGMM) for the Ministry of Defence (MoD) in Kenya to address data governance challenges in military settings. Current frameworks lack specific requirements for the defence industry. The model uses Key Performance Indicators (KPIs) to enhance data governance procedures. Design Science Research guided the study, using qualitative and quantitative methods to gather data from MoD personnel. Major deficiencies were found in data integration, quality control, and adherence to data security regulations. The DGMM helps the MoD improve personnel, procedures, technology, and organizational elements related to data management. The model was tested against ISO/IEC 38500 and recommended for use in other government sectors with similar data governance issues. The DGMM has the potential to enhance data management efficiency, security, and compliance in the MoD and to guide further research in military data governance.
DNA microarray technology is an extremely effective technique for studying gene expression patterns in cells, and the main challenge it currently faces is how to analyze the large amount of gene expression data generated. To address this, this paper employs a mixed-effects model to analyze gene expression data. For data selection, 1176 genes from a white mouse gene expression dataset under two experimental conditions were chosen, defining two conditions, pneumococcal infection and no infection, and a mixed-effects model was constructed. After preprocessing the gene chip information, the data were imported into the model, preliminary results were calculated, and permutation tests were performed, with GSEA used to biologically validate the preliminary results. The final dataset consists of 20 groups of gene expression data from pneumococcal infection, and the analysis categorizes functionally related genes based on the similarity of their expression profiles, facilitating the study of genes with unknown functions.
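For readers unfamiliar with the modeling step, here is a minimal sketch of fitting a mixed-effects model with statsmodels; the toy data frame, column names, and random-intercept structure are assumptions standing in for the paper's 1176-gene dataset.

```python
# Mixed-effects fit: infection condition as a fixed effect, gene as a
# random-intercept grouping factor. Synthetic data for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_genes, n_arrays = 30, 8
df = pd.DataFrame({
    "expr": rng.normal(size=n_genes * n_arrays),
    "infected": np.tile(np.repeat([0, 1], n_arrays // 2), n_genes),
    "gene": np.repeat([f"g{i}" for i in range(n_genes)], n_arrays),
})
model = smf.mixedlm("expr ~ infected", df, groups=df["gene"])
result = model.fit()
print(result.summary())
```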
Long-term navigation based on consumer-level wearable inertial sensors plays an essential role in various emerging fields, for instance smart healthcare, emergency rescue, and soldier positioning. The performance of existing long-term navigation algorithms is limited by the cumulative error of inertial sensors, disturbed local magnetic fields, and the complex motion modes of pedestrians. This paper develops a robust data- and physical-model dual-driven trajectory estimation (DPDD-TE) framework that can be applied to long-term navigation tasks. A Bi-directional Long Short-Term Memory (Bi-LSTM) based quasi-static magnetic field (QSMF) detection algorithm is developed to extract useful magnetic observations for heading calibration, and another Bi-LSTM is adopted for walking speed estimation by considering hybrid human motion information over a specific time period. In addition, a data- and physical-model dual-driven multi-source fusion model is proposed to integrate basic INS mechanization with multi-level constraints and observations to maintain accuracy in long-term navigation tasks, enhanced by a loop detection algorithm assisted by magnetic and trajectory features. Real-world experiments indicate that the proposed DPDD-TE outperforms existing algorithms, with final heading and positioning errors of 5° and less than 2 m, respectively, over a 30 min period.
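A hedged PyTorch sketch of the Bi-LSTM regressor idea (here, walking-speed estimation from an IMU window); the layer sizes, window length, and feature count are illustrative assumptions, not the DPDD-TE configuration.

```python
# Bi-directional LSTM over an IMU window; the concatenated forward/backward
# hidden states at the last step feed a linear head that predicts speed.
import torch
import torch.nn as nn

class BiLSTMSpeed(nn.Module):
    def __init__(self, n_features=6, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # 2x for both directions

    def forward(self, x):                      # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])        # speed from the last time step

model = BiLSTMSpeed()
imu_window = torch.randn(4, 100, 6)            # 4 windows, 100 samples, 6 axes
print(model(imu_window).shape)                 # -> torch.Size([4, 1])
```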
This paper was motivated by the existing problems of cloud data storage at Imo State University, Nigeria, such as outsourced data leading to data loss and the misuse of customer information by unauthorized users or hackers, leaving customer/client data visible and unprotected. This also exposed clients/customers to enormous risk from defective equipment, bugs, faulty servers, and malicious actions. The aim of this paper, therefore, is to analyze a secure model using Unicode Transformation Format (UTF) Base64 algorithms for storing data securely in the cloud. The Object-Oriented Hypermedia Analysis and Design Methodology (OOHADM) was adopted. Python was used to develop the security model; role-based access control (RBAC) and multi-factor authentication (MFA) were integrated to enhance security in the information system developed with HTML5, JavaScript, Cascading Style Sheets (CSS) version 3, and PHP 7. This paper also discusses related concepts: the development of cloud computing, characteristics of cloud computing, cloud deployment models, cloud service models, etc. The results showed that the proposed enhanced security model for corporate-platform information systems handles multiple authorization and authentication threats: a single login page directs all login requests from the different modules to one Single Sign-On Server (SSOS), which in turn redirects authenticated users to their requested resources/modules, leveraging geo-location integration for physical location validation. This newly developed system addresses the shortcomings of the existing systems and reduces the time and resources incurred while using them.
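A minimal sketch of the UTF-8 + Base64 transformation described above, showing the encoding step only (not the full RBAC/MFA/SSO stack); the sample record string is made up for illustration.

```python
# UTF-8 encode, then Base64 encode for safe cloud storage and transport.
# Note: Base64 is an encoding, not encryption; a real deployment would pair
# it with proper encryption and the access controls the paper describes.
import base64

def encode_record(plaintext: str) -> bytes:
    return base64.b64encode(plaintext.encode("utf-8"))

def decode_record(stored: bytes) -> str:
    return base64.b64decode(stored).decode("utf-8")

token = encode_record("student_id=IMSU/2024/001")
print(token, decode_record(token))
```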
Airline passenger volume is an important reference for the implementation of aviation capacity and route adjustment plans. This paper explores the determinants of airline passenger volume and proposes a comprehensive panel data model for predicting volume. First, potential factors influencing airline passenger volume are analyzed from geo-economic and service-related aspects. Second, principal component analysis (PCA) is applied to identify the key factors that affect the airline passenger volume of city pairs. The panel data model is then estimated using 120 sets of data, which are a collection of observations for multiple subjects at multiple instances. Finally, airline data from Chongqing to Shanghai, from 2003 to 2012, was used as a test case to verify the validity of the prediction model. Results show that railway and highway transportation absorb a certain proportion of passenger volume, and that total retail sales of consumer goods in the departure and arrival cities are significantly associated with airline passenger volume. According to the validity test results, the prediction accuracies of the model for 10 sets of data are all greater than 90%. The model performs better than a multivariate regression model, thus helping airport operators decide which routes to adjust and which new routes to introduce.
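A short sketch of the PCA screening step; the factor matrix here is synthetic, standing in for the paper's geo-economic and service-related variables, and the 90% variance cutoff is an assumed choice.

```python
# Standardize candidate factors, then keep the principal components that
# together explain 90% of the variance.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 8))              # 120 observations, 8 candidate factors
X_std = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

pca = PCA(n_components=0.9)                # keep 90% of explained variance
scores = pca.fit_transform(X_std)
print(pca.n_components_, pca.explained_variance_ratio_)
```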
This study proposes the use of the MERISE conceptual data model to create indicators for monitoring and evaluating the effectiveness of vocational training in the Republic of Congo. The importance of MERISE for structuring and analyzing data is underlined, as it enables measurement of how well training matches the needs of the labor market. The innovation of the study lies in the adaptation of the MERISE model to the local context, the development of innovative indicators, and the integration of a participatory approach including all relevant stakeholders.
Contextual adaptation and local innovation: The study suggests adapting MERISE to the specific context of the Republic of Congo, considering the local particularities of the labor market.
Development of innovative indicators and new measurement tools: It proposes creating indicators to assess skills matching and employer satisfaction, which are crucial for evaluating the effectiveness of vocational training.
Participatory approach and inclusion of stakeholders: The study emphasizes actively involving training centers, employers, and recruitment agencies in the evaluation process. This participatory approach ensures that the perspectives of all stakeholders are considered, leading to more relevant and practical outcomes.
Using the MERISE model allows for:
• Rigorous data structuring, organization, and standardization: Clearly defining entities and relationships facilitates data organization and standardization, which is crucial for effective data analysis.
• Facilitation of monitoring, analysis, and relevant indicators: Developing both quantitative and qualitative indicators helps measure the effectiveness of training in relation to the labor market, allowing for a comprehensive evaluation.
• Improved communication and a common language: By providing a common language for different stakeholders, MERISE enhances communication and collaboration, ensuring that all parties have a shared understanding.
The study's approach and contribution to existing research lie in:
• A structured theoretical and practical framework and holistic approach: The study offers a structured framework for data collection and analysis, covering both quantitative and qualitative aspects, thus providing a comprehensive view of the training system.
• A reproducible methodology and international comparison: The proposed methodology can be replicated in other contexts, facilitating international comparison and the adoption of best practices.
• Extension of knowledge and a new perspective: By integrating a participatory approach and developing indicators adapted to local needs, the study extends existing research and offers new perspectives on vocational training evaluation.
Objective: To study the causal relationship between R&D investment and enterprise performance in domestic pharmaceutical enterprises. Methods: A panel data model was adopted for empirical analysis. Results and Conclusion: Increasing the R&D investment intensity of pharmaceutical enterprises in the Yangtze River Delta and Zhejiang by 1% will increase their profit margins by 0.79% and 0.46%, respectively. Conversely, if the profit margin increases by 1%, the R&D investment intensity will increase by 0.25% and 0.19%. If the profit margins of pharmaceutical enterprises in Beijing, Tianjin, Hebei, Chengdu, Chongqing, and other regions increase by 1%, the R&D investment intensity will increase by 0.14%, 0.07%, and 0.1%, respectively, which is lower than in the Yangtze River Delta and Zhejiang. The relationship between R&D investment and enterprise performance of pharmaceutical enterprises in the Yangtze River Delta and Zhejiang Province is one of Granger causality, showing a two-way positive effect. Profits and R&D investment of pharmaceutical enterprises in Beijing, Tianjin, Hebei, Chengdu, Chongqing, and other regions also exhibit Granger causality. In the Pearl River Delta, however, profits and R&D investment did not pass the stationarity test, so the causality between them cannot be determined.
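A hedged sketch of a Granger-causality check between R&D intensity and profit margin; the two series below are synthetic stand-ins for the study's panel data, and the lag order is an assumed choice.

```python
# statsmodels tests whether the second column Granger-causes the first.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(2)
n = 60
rd = rng.normal(size=n).cumsum()                    # R&D investment intensity
profit = 0.5 * np.roll(rd, 1) + rng.normal(scale=0.5, size=n)  # lagged response

data = np.column_stack([profit, rd])                # does rd cause profit?
res = grangercausalitytests(data, maxlag=2)
```

Running the same test with the columns swapped would probe the reverse direction, which is how a two-way effect like the one reported above is established.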
This study demonstrates the complexity and importance of water quality as a measure of the health and sustainability of ecosystems that directly influence biodiversity, human health, and the world economy. The predictability of water quality thus plays a crucial role in managing our ecosystems, enabling informed decisions and, hence, proper environmental management. This study addresses these challenges by proposing an effective machine learning methodology applied to the "Water Quality" public dataset. The methodology models the dataset to provide prediction classification analysis with high values of evaluation parameters such as accuracy, sensitivity, and specificity. The proposed methodology is based on two novel approaches: (a) the SMOTE method to deal with unbalanced data, and (b) carefully applied classical machine learning models. This paper uses Random Forests, Decision Trees, XGBoost, and Support Vector Machines because they can handle large datasets, cope with skewed datasets, and provide high accuracy in water quality classification. A key contribution of this work is the use of custom sampling strategies within the SMOTE approach, which significantly enhanced performance metrics and improved class-imbalance handling. The results demonstrate significant improvements in predictive performance, achieving the highest reported metrics: accuracy (98.92% vs. 96.06%), sensitivity (98.3% vs. 71.26%), and F1 score (98.37% vs. 79.74%) using the XGBoost model. These improvements underscore the effectiveness of our custom SMOTE sampling strategies in addressing class imbalance. The findings contribute to environmental management by enabling ecology specialists to develop more accurate strategies for monitoring, assessing, and managing drinking water quality, ensuring better ecosystem and public health outcomes.
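A minimal sketch of a SMOTE + XGBoost pipeline of the kind described above; the synthetic dataset and the sampling ratio are illustrative assumptions, not the paper's tuned custom strategy.

```python
# Oversample only the training split to avoid leaking synthetic points
# into the test set, then fit an XGBoost classifier.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_res, y_res = SMOTE(sampling_strategy=0.8, random_state=0).fit_resample(X_tr, y_tr)

clf = XGBClassifier(n_estimators=300, eval_metric="logloss")
clf.fit(X_res, y_res)
print(classification_report(y_te, clf.predict(X_te)))
```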
The purpose of this study is to investigate the sleep habits, cervical health status, and pillow-product demands and preferences of different populations through data analysis. A total of 780 valid responses were gathered via an online questionnaire exploring the sleep habits, cervical health conditions, and pillow product preferences of modern individuals. The study found that going to bed late and staying up late are common, and that the use of electronic devices and caffeine consumption have a negative impact on sleep. Most respondents have cervical discomfort and varying levels of satisfaction with their pillows, which indicates a demand for personalized pillows. A machine learning model for predicting latex pillow demand was constructed and optimized to provide personalized pillow recommendations, aiming to improve sleep quality and provide market data for sleep product developers.
Enterprise applications rely on relational databases and structured business processes, requiring slow and expensive conversion of inputs and outputs, from business documents such as invoices, purchase orders, and receipts, into known templates and schemas before processing. We propose a new LLM agent-based intelligent data extraction, transformation, and load (IntelligentETL) pipeline that not only ingests PDFs and detects the inputs within them but also addresses the extraction of structured and unstructured data by developing tools that deal with each data type efficiently and securely. We study the efficiency of our proposed pipeline and compare it with enterprise solutions that also utilize LLMs. We establish the superiority of our approach in timely and accurate data extraction and transformation when analyzing data from varied sources under nested and/or interlinked input constraints.
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., data that sums to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a common statistical modeling technique for finding relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks like future prediction and partial-effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations with missing data. The EM algorithm iteratively finds the best estimates of parameters, in the maximum likelihood or maximum a posteriori (MAP) sense, in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function; finding the parameters that maximize this expected log-likelihood is the job of the maximization (M) step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
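To make the E/M alternation concrete, here is a simplified EM-style sketch for regression with a missing predictor under a joint-Gaussian assumption: the E-step imputes each missing x by its conditional expectation given y, the M-step refits the moments. It deliberately omits the conditional-variance correction to the second moments and the compositional (Aitchison) machinery, so it is an illustration, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(2.0, 1.0, n)
y = 1.5 * x + rng.normal(scale=0.5, size=n)
x_obs = x.copy()
x_obs[rng.random(n) < 0.2] = np.nan              # ~20% missing at random
miss = np.isnan(x_obs)

x_hat = np.where(miss, np.nanmean(x_obs), x_obs)  # initialize with the mean
for _ in range(50):
    # M-step: moments of the joint (x, y) from the current completed data.
    mu_x, mu_y = x_hat.mean(), y.mean()
    cov_xy, var_y = np.cov(x_hat, y)[0, 1], y.var()
    # E-step: Gaussian conditional mean E[x | y] for the missing entries.
    x_hat[miss] = mu_x + (cov_xy / var_y) * (y[miss] - mu_y)

beta = np.cov(x_hat, y)[0, 1] / x_hat.var()      # OLS slope on completed data
print(round(beta, 3))                             # should approach 1.5
```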
To improve the performance of traditional map matching algorithms in freeway traffic state monitoring systems that use low-logging-frequency GPS (global positioning system) probe data, a map matching algorithm based on the Oracle spatial data model is proposed. The algorithm uses the Oracle road network data model to analyze the spatial relationships between massive numbers of GPS positioning points and freeway networks, builds an N-shortest-path algorithm to efficiently find reasonable candidate routes between GPS positioning points, and uses a fuzzy logic inference system to determine the final matched traveling route. In an implementation with field data from Los Angeles, the computation speed of the algorithm is about 135 GPS positioning points per second and the accuracy is 98.9%. The results demonstrate the effectiveness and accuracy of the proposed algorithm for mapping massive GPS positioning data onto freeway networks with complex geometric characteristics.
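A toy sketch of the core geometric step in map matching: projecting a GPS point onto candidate road segments and picking the nearest one. The segment names and coordinates are hypothetical, and the paper's fuzzy route-level inference is omitted.

```python
import numpy as np

def point_segment_distance(p, a, b):
    """Distance from point p to segment ab (all 2-D numpy arrays)."""
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / np.dot(ab, ab), 0.0, 1.0)  # clamp to segment
    return np.linalg.norm(p - (a + t * ab))

segments = {                       # hypothetical freeway segments
    "I-10_E": (np.array([0.0, 0.0]), np.array([10.0, 0.0])),
    "I-405_N": (np.array([5.0, -5.0]), np.array([5.0, 5.0])),
}
gps = np.array([4.0, 0.3])
best = min(segments, key=lambda s: point_segment_distance(gps, *segments[s]))
print(best)                        # -> I-10_E
```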
The cooling process of iron ore pellets in a circular cooler has a great impact on pellet quality and systematic energy exploitation. However, the many variables and lack of visualization of this gray system are unfavorable to efficient production. Thus, the cooling process of iron ore pellets was optimized using a mathematical model and data mining techniques. A mathematical model was established and validated with steady-state production data, and the results show that the calculated values coincide very well with the measured values. Based on the proposed model, the effects of important process parameters on gas-pellet temperature profiles within the circular cooler were analyzed to better understand the entire cooling process. Two data mining techniques, association rule induction and clustering, were also applied to the steady-state production data to obtain expert operating rules and optimized targets. Finally, an optimized control strategy for the circular cooler was proposed and an operation guidance system was developed. The system realizes visualization of the thermal process at steady state and provides operation guidance to optimize the circular cooler.
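A brief sketch of the clustering side of the data mining step: grouping steady-state operating records to expose typical operating modes. The feature names and values are assumptions, not the paper's measured variables.

```python
# Standardize synthetic steady-state records, then cluster them with k-means
# to recover two distinct operating modes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
# Hypothetical records: [air flow, pellet bed depth, inlet temperature]
records = np.vstack([
    rng.normal([60, 0.8, 950], [3, 0.05, 20], size=(100, 3)),
    rng.normal([75, 1.0, 1010], [3, 0.05, 20], size=(100, 3)),
])
X = StandardScaler().fit_transform(records)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))        # two operating modes of ~100 records each
```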
This work focuses on object-oriented data modelling in computer-aided design (CAD) databases. Starting with a discussion of data modelling requirements for CAD applications, appropriate data modelling features are introduced. A feasible approach to selecting the "best" data model for an application is to analyze the data that has to be stored in the database: a data model is appropriate for a given task if the information of the application environment can be easily mapped to it. Thus, the involved data are analyzed, and an object-oriented data model appropriate for CAD applications is derived. Based on a review of object-oriented techniques applied in CAD, object-oriented data modelling in CAD is addressed in detail. Finally, 3D geometrical data models and their implementation using the object-oriented method are presented.
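A toy sketch of what object-oriented CAD data modelling looks like in practice: geometric entities as a class hierarchy, so application objects map directly onto the data model. The class set is a minimal assumption, not the paper's full 3D model.

```python
from dataclasses import dataclass, field

@dataclass
class Point3D:
    x: float
    y: float
    z: float

@dataclass
class Edge:
    start: Point3D
    end: Point3D

@dataclass
class Face:
    edges: list[Edge] = field(default_factory=list)

@dataclass
class Solid:
    name: str
    faces: list[Face] = field(default_factory=list)

    def face_count(self) -> int:
        return len(self.faces)

corner = Point3D(0, 0, 0)
e = Edge(corner, Point3D(1, 0, 0))
block = Solid("block", [Face([e])])
print(block.face_count())   # -> 1
```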
A uniform metadata representation is introduced for heterogeneous databases, multimedia information, and other information sources. Some features of metadata are analyzed, and the limitations of existing metadata models are compared with the new one. The metadata model is described in XML, which is well suited to metadata description and exchange. Well-structured data, semi-structured data, and unstructured external file data are all described in the metadata model. The model provides feasibility and extensibility for constructing a uniform metadata model for data warehouses.
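A small sketch of an XML metadata record of the kind described above, built with the Python standard library; the element and attribute names are assumptions, not the paper's schema.

```python
import xml.etree.ElementTree as ET

record = ET.Element("metadata")
ET.SubElement(record, "source", type="relational").text = "sales_db.orders"
schema = ET.SubElement(record, "structure")
ET.SubElement(schema, "field", name="order_id", datatype="integer")
ET.SubElement(schema, "field", name="amount", datatype="decimal")

print(ET.tostring(record, encoding="unicode"))
```

The same record layout could carry a "semi-structured" or "unstructured" source type, which is how one XML model can span the three data categories the abstract lists.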
The concept of multilevel security (MLS) is commonly used in the study of data models for secure databases. But there are some limitations in the basic MLS model, such as inference channels. The availability and data integrity of the system are seriously constrained by the "no read up, no write down" property of the basic MLS model. To eliminate covert channels, polyinstantiation and cover stories are used in the new data model. The read and write rules have been redefined to improve the agility and usability of a system based on the MLS model. Together, these methods make the improved data model more secure, agile, and usable.
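For reference, a compact sketch of the classic "no read up, no write down" checks whose rigidity the paper addresses; the level names are illustrative.

```python
# Bell-LaPadula-style access checks over a linear ordering of levels.
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top_secret": 3}

def can_read(subject_level: str, object_level: str) -> bool:
    # Simple security property: read only at or below one's own level.
    return LEVELS[subject_level] >= LEVELS[object_level]

def can_write(subject_level: str, object_level: str) -> bool:
    # *-property: write only at or above one's own level.
    return LEVELS[subject_level] <= LEVELS[object_level]

print(can_read("secret", "confidential"), can_write("secret", "confidential"))
# -> True False: readable but not writable, the constraint on availability
#    and integrity that the improved model relaxes.
```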
Hydrocarbon production from shale has attracted much attention in recent years. When applied to these prolific and hydrocarbon-rich resource plays, our understanding of the complexities of the flow mechanism (the sorption process and flow behavior in complex fracture systems, induced or natural) leaves much to be desired. In this paper, we present and discuss a novel approach to modeling and history matching of hydrocarbon production from a Marcellus shale asset in southwestern Pennsylvania using advanced data mining, pattern recognition, and machine learning technologies. In this new approach, instead of imposing our understanding of the flow mechanism, the impact of multi-stage hydraulic fractures, and the production process on the reservoir model, we allow the production history, well logs, completion, and hydraulic fracturing data to guide our model and determine its behavior. The uniqueness of this technology is that it incorporates the so-called "hard data" directly into the reservoir model, so that the model can be used to optimize the hydraulic fracture process. The "hard data" refers to field measurements taken during the hydraulic fracturing process, such as fluid and proppant type and amount, injection pressure and rate, and proppant concentration. This novel approach contrasts with the current industry focus on the use of "soft data" (non-measured, interpretive data such as fracture length, width, height, and conductivity) in reservoir models. The study focuses on a Marcellus shale asset that includes 135 wells with multiple pads, different landing targets, well lengths, and reservoir properties. The full-field history matching process was successfully completed using this data-driven approach, capturing the production behavior with acceptable accuracy for individual wells and for the entire asset.
Atmospheric CO₂ is one of the key parameters for estimating air-sea CO₂ flux. The Orbiting Carbon Observatory-2 (OCO-2) satellite has observed the column-averaged dry-air mole fraction of global atmospheric carbon dioxide (XCO₂) since 2014. In this study, the OCO-2 XCO₂ products were compared with in situ data from the Total Carbon Column Observing Network (TCCON) and the Global Monitoring Division (GMD), and with modeling data from CarbonTracker2019, over the global ocean and land. Results showed that the OCO-2 XCO₂ data are consistent with the TCCON and GMD in situ XCO₂ data, with mean absolute biases of 0.25×10⁻⁶ and 0.67×10⁻⁶, respectively. Moreover, the OCO-2 XCO₂ data are also consistent with the CarbonTracker2019 modeling XCO₂ data, with mean absolute biases of 0.78×10⁻⁶ over the ocean and 1.02×10⁻⁶ over land. These results indicate the high accuracy of the OCO-2 XCO₂ product over the global ocean, which could be applied to estimate air-sea CO₂ flux.
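A trivial sketch of the comparison metric quoted above, computing the mean absolute bias between collocated satellite and reference XCO₂ values; the numbers are made up (in mole-fraction units of 10⁻⁶, i.e., ppm), and real validation would first collocate retrievals in space and time.

```python
import numpy as np

xco2_oco2 = np.array([410.1, 411.3, 409.8, 412.0])   # hypothetical retrievals
xco2_tccon = np.array([410.4, 411.0, 410.2, 411.6])  # hypothetical references

mean_abs_bias = np.mean(np.abs(xco2_oco2 - xco2_tccon))
print(round(mean_abs_bias, 2))
```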