The fact that most engineering applications are developed by engineers themselves rather than by computer professionals calls for data modeling methods that are powerful enough to represent complex engineering phenomena, yet simple enough to use. A data modeling method that helps engineers write high-quality C++ code is introduced.
The study aimed to develop a customized Data Governance Maturity Model (DGMM) for the Ministry of Defence (MoD) in Kenya to address data governance challenges in military settings. Current frameworks lack specific requirements for the defence industry. The model uses Key Performance Indicators (KPIs) to enhance data governance procedures. Design Science Research guided the study, using qualitative and quantitative methods to gather data from MoD personnel. Major deficiencies were found in data integration, quality control, and adherence to data security regulations. The DGMM helps the MoD improve personnel, procedures, technology, and organizational elements related to data management. The model was tested against ISO/IEC 38500 and recommended for use in other government sectors with similar data governance issues. The DGMM has the potential to enhance data management efficiency, security, and compliance in the MoD and to guide further research in military data governance.
DNA microarray technology is an extremely effective technique for studying gene expression patterns in cells, and the main challenge it currently faces is how to analyze the large amount of gene expression data generated. To address this, this paper employs a mixed-effects model to analyze gene expression data. For data selection, 1176 genes from a mouse gene expression dataset were chosen under two experimental conditions, pneumococcal infection and no infection, and a mixed-effects model was constructed. After preprocessing the gene chip information, the data were imported into the model, preliminary results were calculated, and permutation tests were performed to biologically validate the preliminary results using GSEA. The final dataset consists of 20 groups of gene expression data from pneumococcal infection, which categorizes functionally related genes based on the similarity of their expression profiles, facilitating the study of genes with unknown functions.
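As a minimal sketch of how a mixed-effects analysis of this kind can be set up, the snippet below fits a random-intercept model with statsmodels on synthetic long-format data; the column names, group structure, and effect size are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic long-format stand-in: one row per (gene, sample) measurement.
rng = np.random.default_rng(0)
genes = np.repeat([f"g{i}" for i in range(50)], 8)
condition = np.tile(["infected"] * 4 + ["control"] * 4, 50)
expression = rng.normal(size=genes.size) + (condition == "infected") * 0.5

df = pd.DataFrame({"expression": expression, "condition": condition, "gene": genes})

# Fixed effect for infection condition; random intercept per gene.
result = smf.mixedlm("expression ~ condition", data=df, groups=df["gene"]).fit()
print(result.summary())
```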
Individual Tree Detection-and-Counting (ITDC) is among the important tasks in urban areas, and numerous methods have been proposed in this direction. Despite their many advantages, the proposed methods are still inadequate for providing robust results because they mostly rely on direct field investigations. This paper presents a novel approach involving high-resolution imagery and Canopy-Height-Model (CHM) data to solve the ITDC problem. The new approach is studied in six urban scenes: farmland, woodland, park, industrial land, road, and residential areas. First, it identifies tree canopy regions from high-resolution imagery using a deep learning network. It then deploys the CHM data to detect treetops within the canopy regions using a local maximum algorithm, and individual tree canopies using region growing. Finally, it calculates and describes the number of individual trees and tree canopies. The proposed approach is tested with data from Shanghai, China. Our results show that the individual tree detection method has an average overall accuracy of 0.953, with a precision of 0.987 for the woodland scene. Meanwhile, the R² values for canopy segmentation in different urban scenes are greater than 0.780 and 0.779 for canopy area and diameter size, respectively. These results confirm that the proposed method is robust enough for urban tree planning and management.
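The treetop step is the most self-contained part of such a pipeline. A minimal sketch of local-maximum treetop detection on a CHM raster follows; the window size and minimum-height threshold are illustrative assumptions, not the paper's parameters.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def detect_treetops(chm: np.ndarray, window: int = 5, min_height: float = 2.0):
    """Return (row, col) indices of local maxima taller than min_height."""
    local_max = maximum_filter(chm, size=window) == chm
    treetops = local_max & (chm > min_height)
    return np.argwhere(treetops)

chm = np.random.rand(100, 100) * 20.0  # stand-in for a real CHM raster
print(len(detect_treetops(chm)), "candidate treetops")
```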
Long-term navigation based on consumer-level wearable inertial sensors plays an essential role in various emerging fields, for instance smart healthcare, emergency rescue, and soldier positioning. The performance of existing long-term navigation algorithms is limited by the cumulative error of inertial sensors, disturbed local magnetic fields, and the complex motion modes of pedestrians. This paper develops a robust data and physical model dual-driven trajectory estimation (DPDD-TE) framework, which can be applied to long-term navigation tasks. A Bi-directional Long Short-Term Memory (Bi-LSTM) based quasi-static magnetic field (QSMF) detection algorithm is developed to extract useful magnetic observations for heading calibration, and another Bi-LSTM is adopted for walking speed estimation by considering hybrid human motion information over a specific time period. In addition, a data and physical model dual-driven multi-source fusion model is proposed to integrate basic INS mechanization with multi-level constraints and observations to maintain accuracy in long-term navigation tasks, enhanced by a loop detection algorithm assisted by magnetic and trajectory features. Real-world experiments indicate that the proposed DPDD-TE outperforms existing algorithms, with final heading and positioning errors within 5° and 2 m, respectively, over a 30 min period.
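As a sketch of the kind of Bi-LSTM classifier a QSMF detector could use, the PyTorch snippet below labels windows of 3-axis magnetometer samples as quasi-static or disturbed; the window length, hidden size, and layer layout are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class BiLSTMDetector(nn.Module):
    def __init__(self, in_dim=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # QSMF vs. disturbed field

    def forward(self, x):             # x: (batch, time, 3)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # classify from the last time step

model = BiLSTMDetector()
window = torch.randn(8, 100, 3)       # 8 windows of 100 magnetometer samples
logits = model(window)
print(logits.shape)                   # torch.Size([8, 2])
```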
This study proposes the use of the MERISE conceptual data model to create indicators for monitoring and evaluating the effectiveness of vocational training in the Republic of Congo. The importance of MERISE for structuring and analyzing data is underlined, as it enables the measurement of the adequacy between training and the needs of the labor market. The innovation of the study lies in the adaptation of the MERISE model to the local context, the development of innovative indicators, and the integration of a participatory approach including all relevant stakeholders. Contextual adaptation and local innovation: the study suggests adapting MERISE to the specific context of the Republic of Congo, considering the local particularities of the labor market. Development of innovative indicators and new measurement tools: it proposes creating indicators to assess skills matching and employer satisfaction, which are crucial for evaluating the effectiveness of vocational training. Participatory approach and inclusion of stakeholders: the study emphasizes actively involving training centers, employers, and recruitment agencies in the evaluation process; this ensures that the perspectives of all stakeholders are considered, leading to more relevant and practical outcomes.
Using the MERISE model allows for:
• Rigorous data structuring, organization, and standardization: clearly defining entities and relationships facilitates data organization and standardization, crucial for effective data analysis.
• Facilitation of monitoring, analysis, and relevant indicators: developing both quantitative and qualitative indicators helps measure the effectiveness of training in relation to the labor market, allowing for a comprehensive evaluation.
• Improved communication and a common language: by providing a common language for different stakeholders, MERISE enhances communication and collaboration, ensuring that all parties have a shared understanding.
The study's approach and contribution to existing research lie in:
• A structured theoretical and practical framework and holistic approach: the study offers a structured framework for data collection and analysis, covering both quantitative and qualitative aspects, thus providing a comprehensive view of the training system.
• A reproducible methodology and international comparison: the proposed methodology can be replicated in other contexts, facilitating international comparison and the adoption of best practices.
• An extension of knowledge and a new perspective: by integrating a participatory approach and developing indicators adapted to local needs, the study extends existing research and offers new perspectives on vocational training evaluation.
Airline passenger volume is an important reference for the implementation of aviation capacity and route adjustment plans. This paper explores the determinants of airline passenger volume and proposes a comprehensive panel data model for predicting volume. First, potential factors influencing airline passenger volume are analyzed from geo-economic and service-related aspects. Second, principal component analysis (PCA) is applied to identify key factors that impact the airline passenger volume of city pairs. Then the panel data model is estimated using 120 sets of data, which are a collection of observations for multiple subjects at multiple instances. Finally, airline data from Chongqing to Shanghai, from 2003 to 2012, is used as a test case to verify the validity of the prediction model. Results show that railway and highway transportation absorb a certain proportion of passenger volume, and that total retail sales of consumer goods in the departure and arrival cities are significantly associated with airline passenger volume. According to the validity test results, the prediction accuracies of the model for 10 sets of data are all greater than 90%. The model performs better than a multivariate regression model, thus helping airport operators decide which routes to adjust and which new routes to introduce.
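As a sketch of the PCA screening step, the snippet below standardizes a candidate-factor matrix and keeps the components explaining 90% of the variance; the synthetic matrix and the variance threshold are illustrative assumptions, not the paper's data or settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(120, 8)  # 120 observations x 8 candidate factors (stand-in)

X_std = StandardScaler().fit_transform(X)  # PCA is scale-sensitive
pca = PCA(n_components=0.90)               # keep 90% of explained variance
components = pca.fit_transform(X_std)
print(pca.n_components_, pca.explained_variance_ratio_)
```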
This paper was motivated by existing problems of cloud data storage at Imo State University, Nigeria, such as outsourced data causing data loss and the misuse of customer information by unauthorized users or hackers, leaving customer/client data visible and unprotected. This also exposed clients/customers to enormous risk from defective equipment, bugs, faulty servers, and malicious actions. The aim of this paper, therefore, is to analyze a secure model using Unicode Transformation Format (UTF) Base64 algorithms for storing data securely in the cloud. The Object-Oriented Hypermedia Analysis and Design Methodology (OOHADM) was adopted. Python was used to develop the security model; role-based access control (RBAC) and multi-factor authentication (MFA) algorithms were integrated to enhance security in the information system developed with HTML5, JavaScript, Cascading Style Sheets (CSS) version 3, and PHP 7. This paper also discusses concepts such as the development of cloud computing, its characteristics, cloud deployment models, and cloud service models. The results showed that the proposed enhanced security model for the information systems of a corporate platform handles multiple authorization and authentication threats: a single login page directs all login requests from the different modules to one Single Sign-On Server (SSOS), which in turn redirects users to their requested resources/modules once authenticated, leveraging geo-location integration for physical location validation. The newly developed system addresses the shortcomings of the existing systems and reduces the time and resources incurred in using them.
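A minimal sketch of the Base64 step follows; note that Base64 is an encoding rather than encryption, which is why a deployment like the one described pairs it with access control (RBAC, MFA). The sample record is purely illustrative.

```python
import base64

record = "client: Jane Doe, balance: 1200"           # illustrative client data
encoded = base64.b64encode(record.encode("utf-8"))   # bytes safe for storage
decoded = base64.b64decode(encoded).decode("utf-8")  # restore on retrieval

assert decoded == record
print(encoded.decode("ascii"))
```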
This study demonstrates the complexity and importance of water quality as a measure of the health and sustainability of ecosystems that directly influence biodiversity, human health, and the world economy. The predictability of water quality thus plays a crucial role in managing our ecosystems, enabling informed decisions and proper environmental management. This study addresses these challenges by proposing an effective machine learning methodology applied to the "Water Quality" public dataset. The methodology models the dataset to provide prediction classification analysis with high values of evaluation parameters such as accuracy, sensitivity, and specificity. The proposed methodology is based on two approaches: (a) the SMOTE method to deal with unbalanced data and (b) carefully applied classical machine learning models. This paper uses Random Forests, Decision Trees, XGBoost, and Support Vector Machines because they can handle large datasets, train models on skewed datasets, and provide high accuracy in water quality classification. A key contribution of this work is the use of custom sampling strategies within the SMOTE approach, which significantly enhanced performance metrics and improved class-imbalance handling. The results demonstrate significant improvements in predictive performance, achieving the highest reported metrics: accuracy (98.92% vs. 96.06%), sensitivity (98.3% vs. 71.26%), and F1 score (98.37% vs. 79.74%) using the XGBoost model. These improvements underscore the effectiveness of our custom SMOTE sampling strategies in addressing class imbalance. The findings contribute to environmental management by enabling ecology specialists to develop more accurate strategies for monitoring, assessing, and managing drinking water quality, ensuring better ecosystem and public health outcomes.
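A minimal sketch of the SMOTE-plus-XGBoost setup follows; the sampling ratio, synthetic feature matrix, and hyperparameters are illustrative assumptions, not the authors' custom sampling strategies.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from xgboost import XGBClassifier

X = np.random.rand(1000, 9)           # stand-in for water-quality features
y = np.random.binomial(1, 0.1, 1000)  # imbalanced potability labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
# Oversample the minority class on the training split only.
X_res, y_res = SMOTE(sampling_strategy=0.8, random_state=0).fit_resample(X_tr, y_tr)

clf = XGBClassifier(n_estimators=300, eval_metric="logloss")
clf.fit(X_res, y_res)
print(classification_report(y_te, clf.predict(X_te)))
```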
Objective To study the causal relationship between R&D investment and enterprise performance in domestic pharmaceutical enterprises. Methods A panel data model was adopted for empirical analysis. Results and Conclusion Increasing the R&D investment intensity of pharmaceutical enterprises in the Yangtze River Delta and Zhejiang by 1% increases their profit margins by 0.79% and 0.46%, respectively; conversely, if the profit margin increases by 1%, the R&D investment intensity increases by 0.25% and 0.19%. If the profit margins of pharmaceutical enterprises in Beijing-Tianjin-Hebei, Chengdu-Chongqing, and other regions increase by 1%, the R&D investment intensity increases by 0.14%, 0.07%, and 0.1%, respectively, which is lower than in the Yangtze River Delta and Zhejiang. The relationship between R&D investment and enterprise performance of pharmaceutical enterprises in the Yangtze River Delta and Zhejiang Province is a Granger causality, showing a two-way positive effect. Profits and R&D investment of pharmaceutical enterprises in Beijing-Tianjin-Hebei, Chengdu-Chongqing, and other regions also exhibit Granger causality. In the Pearl River Delta, however, profits and R&D investment did not pass the stationarity test, so the causality between them cannot be determined.
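As a sketch of a pairwise Granger causality test of the kind reported, the snippet below uses statsmodels on synthetic series; the paper's panel-data setting is more involved, and the lag order here is an assumption.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
rd = rng.normal(size=40).cumsum()                    # R&D intensity series
profit = 0.5 * np.roll(rd, 1) + rng.normal(size=40)  # profit lags R&D

df = pd.DataFrame({"profit": profit, "rd": rd})
# Tests whether 'rd' (2nd column) Granger-causes 'profit' (1st column).
grangercausalitytests(df[["profit", "rd"]], maxlag=2)
```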
Enterprise applications rely on relational databases and structured business processes, requiring slow and expensive conversion of inputs and outputs from business documents, such as invoices, purchase orders, and receipts, into known templates and schemas before processing. We propose a new LLM agent-based intelligent data extraction, transformation, and load (IntelligentETL) pipeline that not only ingests PDFs and detects the inputs within them but also addresses the extraction of structured and unstructured data by developing tools that deal with the respective data types efficiently and securely. We study the efficiency of our proposed pipeline and compare it with enterprise solutions that also utilize LLMs. We demonstrate the superiority of our approach in timely and accurate data extraction and transformation when analyzing data from varied sources under nested and/or interlinked input constraints.
The purpose of this study is to investigate the sleep habits, cervical health status, and the demand and preference for pillow products of different populations through data analysis. A total of 780 valid responses were gathered via an online questionnaire to explore the sleep habits, cervical health conditions, and pillow product preferences of modern individuals. The study found that going to bed late and staying up late are common, and that the use of electronic devices and caffeine consumption have a negative impact on sleep. Most respondents have cervical discomfort and varying levels of satisfaction with their pillows, which shows their demand for personalized pillows. A machine learning model for predicting latex pillow demand was constructed and optimized to provide personalized pillow recommendations, aiming to improve sleep quality and provide market data for sleep product developers.
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., it sums to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a commonly used statistical modeling technique applied to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial-effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds the best estimates of parameters in statistical models that depend on unobserved variables or data, under maximum likelihood or maximum a posteriori (MAP) estimation. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function; finding the parameters that maximize the expected log-likelihood determined in the E step is the job of the maximization (M) step. This study examined how well the EM algorithm worked on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
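A simplified sketch of an EM-style iteration for a regression with one partially missing predictor follows: the E-step imputes the missing entries with their conditional mean under the current Gaussian fit, and the M-step re-estimates the coefficients. This illustrates the idea on synthetic data, not the study's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=n)

x_obs = x.copy()
x_obs[rng.choice(n, 40, replace=False)] = np.nan   # 20% missing at random
miss = np.isnan(x_obs)
x_fill = np.where(miss, np.nanmean(x_obs), x_obs)  # initial mean imputation

for _ in range(50):
    # M-step: re-estimate the regression and the marginal Gaussian for x.
    X = np.column_stack([np.ones(n), x_fill])
    (b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
    s2_e = np.var(y - X @ np.array([b0, b1]))
    mu_x, s2_x = x_fill.mean(), x_fill.var()
    # E-step: conditional mean of missing x given y under the joint Gaussian.
    gain = b1 * s2_x / (b1 ** 2 * s2_x + s2_e)
    x_fill[miss] = mu_x + gain * (y[miss] - b0 - b1 * mu_x)

print("intercept, slope:", b0, b1)
```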
To improve the performance of traditional map matching algorithms in freeway traffic state monitoring systems that use low logging frequency GPS (global positioning system) probe data, a map matching algorithm based on the Oracle spatial data model is proposed. The algorithm uses the Oracle road network data model to analyze the spatial relationships between massive GPS positioning points and freeway networks, builds an N-shortest-path algorithm to find reasonable candidate routes between GPS positioning points efficiently, and uses a fuzzy logic inference system to determine the final matched traveling route. In an implementation with field data from Los Angeles, the computation speed of the algorithm is about 135 GPS positioning points per second and the accuracy is 98.9%. The results demonstrate the effectiveness and accuracy of the proposed algorithm for mapping massive GPS positioning data onto freeway networks with complex geometric characteristics.
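As a sketch of how a fuzzy inference step can rank candidate routes, the snippet below combines two triangular membership functions with a fuzzy AND (minimum); the membership shapes, inputs, and thresholds are illustrative assumptions, not the paper's rule base.

```python
def tri(x, lo, peak, hi):
    """Triangular membership function on [lo, hi] peaking at `peak`."""
    if x <= lo or x >= hi:
        return 0.0
    return (x - lo) / (peak - lo) if x < peak else (hi - x) / (hi - peak)

def route_score(dist_to_route_m, length_ratio):
    near = tri(dist_to_route_m, -1, 0, 50)     # "GPS fix close to the route"
    direct = tri(length_ratio, 0.9, 1.0, 1.5)  # "not much longer than direct"
    return min(near, direct)                   # fuzzy AND (minimum)

# Candidate routes: (distance from fix in meters, route/straight-line ratio).
candidates = {"route_a": (12.0, 1.05), "route_b": (40.0, 1.4)}
best = max(candidates, key=lambda k: route_score(*candidates[k]))
print(best)  # route_a
```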
This paper focuses on object-oriented data modelling in computer-aided design (CAD) databases. Starting from a discussion of the data modelling requirements of CAD applications, appropriate data modelling features are introduced. A feasible approach to selecting the "best" data model for an application is to analyze the data that has to be stored in the database: a data model is appropriate for a given task if the information of the application environment can be easily mapped to it. Accordingly, the data involved are analyzed, and an object-oriented data model appropriate for CAD applications is derived. Based on a review of object-oriented techniques applied in CAD, object-oriented data modelling in CAD is addressed in detail. Finally, 3D geometrical data models and the implementation of the data model using the object-oriented method are presented.
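As a minimal sketch of the object-oriented modelling style described, the snippet below composes geometric entities under a common base class; the class names and attributes are illustrative assumptions, not the paper's model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Entity:
    name: str

@dataclass
class Point(Entity):
    x: float
    y: float
    z: float

@dataclass
class Line(Entity):
    start: Point
    end: Point

@dataclass
class Assembly(Entity):
    parts: List[Entity] = field(default_factory=list)  # nested sub-entities

p1, p2 = Point("p1", 0.0, 0.0, 0.0), Point("p2", 1.0, 0.0, 0.0)
bracket = Assembly("bracket", parts=[Line("edge1", p1, p2)])
print(bracket)
```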
A uniform metadata representation is introduced for heterogeneous databases, multimedia information, and other information sources. Some features of metadata are analyzed, and the limitations of existing metadata models are compared with the new one. The metadata model is described in XML, which is well suited to metadata representation and exchange. Well-structured data, semi-structured data, and unstructured external file data are all described in the metadata model. The model provides feasibility and extensibility for constructing a uniform metadata model for a data warehouse.
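A minimal sketch of such an XML metadata record follows, built with Python's standard library; the element and attribute names are illustrative assumptions, not the paper's actual schema.

```python
import xml.etree.ElementTree as ET

# Describe one relational source with its schema in a uniform XML record.
meta = ET.Element("metadata", attrib={"source": "orders_db", "kind": "relational"})
schema = ET.SubElement(meta, "schema", attrib={"name": "orders"})
ET.SubElement(schema, "field", attrib={"name": "order_id", "type": "int"})
ET.SubElement(schema, "field", attrib={"name": "amount", "type": "decimal"})

print(ET.tostring(meta, encoding="unicode"))
```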
The concept of multilevel security (MLS) is commonly used in the study of data models for secure databases. But there are some limitations in the basic MLS model, such as inference channels. The availability and data integrity of the system are seriously constrained by the 'No Read Up, No Write Down' property of the basic MLS model. In order to eliminate covert channels, polyinstantiation and cover stories are used in the new data model. The read and write rules have been redefined to improve the agility and usability of a system based on the MLS model. Together, these methods make the improved data model more secure, agile, and usable.
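For reference, a minimal sketch of the basic 'No Read Up, No Write Down' rules that the passage relaxes (Bell-LaPadula style) is shown below; the level lattice is illustrative.

```python
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top_secret": 3}

def can_read(subject: str, obj: str) -> bool:
    """No Read Up: a subject may only read objects at or below its level."""
    return LEVELS[subject] >= LEVELS[obj]

def can_write(subject: str, obj: str) -> bool:
    """No Write Down: a subject may only write objects at or above its level."""
    return LEVELS[subject] <= LEVELS[obj]

print(can_read("secret", "confidential"))   # True
print(can_write("secret", "confidential"))  # False: would leak downward
```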
This is the first of a three-part series of papers which introduces a general background of building trajectory-oriented road network data models, including motivation, related works, and basic concepts. The purpose of the series is to develop a trajectory-oriented road network data model, namely the carriageway-based road network data model (CRNM). Part 1 deals with the modeling background. Part 2 proposes the principle and architecture of the CRNM. Part 3 investigates the implementation of the CRNM in a case study. In the present paper, the challenges of managing trajectory data are discussed. Then, developing trajectory-oriented road network data models is proposed as a solution, and existing road network data models are reviewed. Basic representation approaches of a road network are introduced, as well as its constitution.
This is the second of a three-part series of papers which presents the principle and architecture of the CRNM, a trajectory-oriented, carriageway-based road network data model. The first part of the series introduced a general background of building trajectory-oriented road network data models, including motivation, related works, and basic concepts. Building on it, this paper describes the CRNM in detail. First, the notion of a basic roadway entity is proposed and discussed. Second, the carriageway is selected as the basic roadway entity after comparison with other kinds of roadway, and approaches to representing other roadways with carriageways are introduced. Finally, an overall architecture of the CRNM is proposed.
Hydrocarbon production from shale has attracted much attention in recent years. When applied to these prolific and hydrocarbon-rich resource plays, our understanding of the complexities of the flow mechanism (the sorption process and flow behavior in complex fracture systems, induced or natural) leaves much to be desired. In this paper, we present and discuss a novel approach to modeling and history matching of hydrocarbon production from a Marcellus shale asset in southwestern Pennsylvania using advanced data mining, pattern recognition, and machine learning technologies. In this new approach, instead of imposing our understanding of the flow mechanism, the impact of multi-stage hydraulic fractures, and the production process on the reservoir model, we allow the production history, well logs, completion, and hydraulic fracturing data to guide our model and determine its behavior. The uniqueness of this technology is that it incorporates so-called "hard data" directly into the reservoir model, so that the model can be used to optimize the hydraulic fracture process. The "hard data" refers to field measurements taken during the hydraulic fracturing process, such as fluid and proppant type and amount, injection pressure and rate, and proppant concentration. This novel approach contrasts with the current industry focus on the use of "soft data" (non-measured, interpretive data such as frac length, width, height, and conductivity) in reservoir models. The study focuses on a Marcellus shale asset that includes 135 wells with multiple pads, different landing targets, well lengths, and reservoir properties. The full-field history matching process was successfully completed using this data-driven approach, capturing the production behavior with acceptable accuracy for individual wells and for the entire asset.