Funding: Supported by the project "Romanian Hub for Artificial Intelligence-HRIA", Smart Growth, Digitization and Financial Instruments Program, 2021–2027, MySMIS No. 334906.
Abstract: Objective expertise evaluation of individuals, as a prerequisite stage for team formation, has been a long-standing desideratum in large software development companies. With the rapid advancements in machine learning methods, and given the reliable data stored in project management tools' datasets, automating this evaluation process becomes a natural step forward. In this context, our approach focuses on quantifying software developer expertise using metadata from task-tracking systems. For this, we mathematically formalize two categories of expertise: technology-specific expertise, which denotes the skills required for a particular technology, and general expertise, which encapsulates overall knowledge of the software industry. Afterward, we use Bidirectional Encoder Representations from Transformers (BERT)-like models to automatically classify the zones of expertise associated with each task a developer has worked on, handling the unique characteristics of project tool datasets effectively. Finally, our method evaluates the proficiency of each software specialist across already completed projects from both technology-specific and general perspectives. The method was experimentally validated, yielding promising results.
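The two expertise categories formalized above can be illustrated with a small aggregation sketch. All field names (`developer`, `zone`, `score`) are hypothetical stand-ins for the paper's task metadata, and the mean-score aggregation is an illustrative simplification, not the authors' actual formalization:

```python
from collections import defaultdict

def expertise_scores(tasks):
    """Aggregate per-developer expertise from classified tasks.

    Each task is a dict with hypothetical fields: 'developer',
    'zone' (the technology label predicted by the classifier), and
    'score' (a normalized completion-quality weight in [0, 1]).
    Returns (technology_specific, general), where technology_specific
    maps (developer, zone) -> mean score and general maps
    developer -> mean score over all of that developer's tasks.
    """
    by_zone = defaultdict(list)
    by_dev = defaultdict(list)
    for t in tasks:
        by_zone[(t["developer"], t["zone"])].append(t["score"])
        by_dev[t["developer"]].append(t["score"])
    tech = {k: sum(v) / len(v) for k, v in by_zone.items()}
    general = {k: sum(v) / len(v) for k, v in by_dev.items()}
    return tech, general
```

In this reading, technology-specific expertise is scoped to one zone while general expertise pools every completed task, mirroring the two perspectives the abstract describes.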
Abstract: This paper applies gray system theory to error data processing for NC machine tools, according to the characteristics of such data, and presents a gray metabolism model for error data processing. The test method for the model requires a smaller data capacity. Practice has shown that the method is simple, the calculation is easy, and the results are accurate.
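The gray metabolism model builds on the standard GM(1,1) gray forecasting procedure, which works well on short series such as machine-tool error data. A minimal sketch of GM(1,1), not the paper's specific metabolism variant:

```python
import numpy as np

def gm11_forecast(x0, steps=1):
    """Fit a GM(1,1) gray model to series x0 and forecast `steps` ahead.

    Standard GM(1,1): accumulate the series, fit the whitened equation
    dx1/dt + a*x1 = b by least squares on the mean-generated background
    values, then difference the fitted accumulated series to recover
    forecasts of the original one.
    """
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                       # accumulated generating operation
    z1 = 0.5 * (x1[1:] + x1[:-1])            # mean-generated background values
    B = np.column_stack([-z1, np.ones(n - 1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    k = np.arange(n + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.diff(x1_hat, prepend=x1_hat[0])
    x0_hat[0] = x0[0]                        # convention: first value is kept
    return x0_hat[n:]                        # the forecast tail
```

The metabolism ("rolling") refinement the paper refers to refits the model after each new observation, dropping the oldest value and appending the newest so the model tracks drifting error behavior.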
Funding: This work was supported by the National Major Scientific and Technological Special Project "Development and comprehensive verification of complete products of open high-end CNC system, servo device and motor" (2012ZX04001012).
Abstract: Building cyber-physical system (CPS) models of machine tools is a key technology for intelligent manufacturing. The massive electronic data from a computer numerical control (CNC) system during the work processes of a CNC machine tool is the main source of the big data on which a CPS model is established. In this work-process model, a method based on the instruction domain is applied to analyze the electronic big data, and a quantitative description of the numerical control (NC) processes is built according to the G code of the processes. Utilizing the instruction domain, a work-process CPS model is established on the basis of the accurate, real-time mapping of the manufacturing tasks, resources, and status of the CNC machine tool. Using such models, case studies are conducted on intelligent-machining applications, such as the optimization of NC processing parameters and the health assurance of CNC machine tools.
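The core idea of instruction-domain analysis is to index controller data by the executing NC instruction rather than by wall-clock time. A minimal sketch, assuming the CNC controller reports hypothetical (G-code line, sensor value) pairs; real systems stream far richer state:

```python
def to_instruction_domain(samples):
    """Group time-series samples by the G-code line that produced them.

    Indexing sensor data (e.g., spindle load) by the executing NC
    instruction instead of by time lets runs of the same G-code line
    be compared across workpieces, which is the basis for parameter
    optimization and machine-health assessment.
    """
    domain = {}
    for gcode_line, value in samples:
        domain.setdefault(gcode_line, []).append(value)
    # summarize each instruction by its mean sensor value
    return {line: sum(v) / len(v) for line, v in domain.items()}
```

With this view, an abnormal spindle load on one specific `G01` move shows up as a deviation at that instruction across parts, independent of feed-rate-induced timing differences.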
Funding: This work was supported by the National Basic Research Program of China (Nos. 2016YFA0100703 and 2015CB964800) and the National Natural Science Foundation of China (No. 31271354).
Abstract: Background: In eukaryotic genomes, chromatin is not randomly distributed in cell nuclei but is instead organized into higher-order structures. Emerging evidence indicates that these higher-order chromatin structures play important roles in regulating genome functions such as transcription and DNA replication. With the advancement of 3C (chromosome conformation capture)-based technologies, Hi-C has been widely used to investigate genome-wide long-range chromatin interactions during cellular differentiation and oncogenesis. Since the first publication of the Hi-C assay in 2009, many bioinformatics tools have been implemented for processing Hi-C data, from mapping raw reads to normalizing contact matrices and downstream interpretation, either providing a whole workflow pipeline or focusing on a particular step. Results: This article reviews the general Hi-C data processing workflow and the currently popular Hi-C data processing tools. We highlight how these tools are used for a full interpretation of Hi-C results. Conclusions: The Hi-C assay is a powerful tool for investigating higher-order chromatin structure. Continued development of novel methods for Hi-C data analysis will be necessary for a better understanding of the regulatory function of genome organization.
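One step in the workflow above, contact-matrix normalization, is commonly done by matrix balancing. A compact sketch in the spirit of ICE (iterative correction), not any specific tool's implementation; real pipelines additionally filter low-coverage bins and operate on sparse matrices:

```python
import numpy as np

def iterative_correction(matrix, iterations=50):
    """Balance a Hi-C contact matrix so all bins have equal coverage.

    Repeatedly divides rows and columns by their (mean-normalized)
    coverage until the per-bin bias converges, removing multiplicative
    biases such as mappability and restriction-site density.
    """
    m = np.asarray(matrix, dtype=float).copy()
    for _ in range(iterations):
        coverage = m.sum(axis=1)
        coverage /= coverage[coverage > 0].mean()
        coverage[coverage == 0] = 1.0        # leave empty bins untouched
        m /= coverage[:, None]               # scale rows
        m /= coverage[None, :]               # scale columns symmetrically
    return m
```

After balancing, every row of the symmetric matrix sums to (approximately) the same value, so contact frequencies are comparable across genomic bins.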
Funding: Supported by the National Institutes of Health, United States (Grant Nos. U01CA200147 and DP1HD087990), awarded to SZ.
Abstract: Interactions between chromatin segments play a large role in functional genomic assays, and developments in genomic interaction detection methods have revealed interacting topological domains within the genome. Among these methods, Hi-C plays a key role. Here, we present the Genome Interaction Tools and Resources (GITAR), a software suite for comprehensive Hi-C data analysis, including data preprocessing, normalization, and visualization, as well as analysis of topologically associated domains (TADs). GITAR is composed of two main modules: (1) HiCtool, a Python library to process and visualize Hi-C data, including TAD analysis; and (2) a processed data library, a large collection of human and mouse datasets processed using HiCtool. HiCtool leads the user step by step through a pipeline that goes from raw Hi-C data to the computation, visualization, and optimized storage of intra-chromosomal contact matrices and TAD coordinates. A large collection of standardized processed data allows users to compare different datasets in a consistent way while saving time in obtaining data for visualization or additional analyses. More importantly, GITAR enables users without any programming or bioinformatics expertise to work with Hi-C data. GITAR is publicly available at http://genomegitar.org as open-source software.
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62276095 and 72204261) and the National Social Science Foundation of China (No. 20&ZD047).
Abstract: Large language models (LLMs) have demonstrated powerful decision-making and planning capabilities with external tools for solving real-world tasks. However, the data construction for tool learning faces limitations such as high degrees of data structuring, symbolization, and privacy concerns. This makes annotation costly and time-consuming, which is unsuitable for the vertical deployment of tool-augmented models in real-world scenarios. Therefore, in this paper, we propose AuToGen, an automated tool learning data generation approach based on domain-specific structured data. AuToGen leverages structured database table schemas for keyword extraction, then utilizes state-of-the-art LLMs to generate an initial seed set and expands these sets into enriched tool-assisted model training data. Our experiments demonstrate the high quality of the data generated by AuToGen. Compared with data generated from manually written seed sets, a model trained on AuToGen-generated data achieves higher performance, proving that our method can efficiently assist in the deployment of real-world models.
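The first stage described above, keyword extraction from structured table schemas, can be sketched as follows. Function names, the schema shape, and the prompt template are all hypothetical illustrations; the abstract does not specify AuToGen's actual interfaces:

```python
def extract_keywords(schema):
    """Pull candidate keywords from a structured table schema.

    `schema` maps table names to lists of column names, standing in
    for the domain-specific structured data the abstract mentions.
    Returns the harvested names, deduplicated but order-preserving,
    for seeding LLM prompts.
    """
    keywords = []
    for table, columns in schema.items():
        keywords.append(table)
        keywords.extend(columns)
    return list(dict.fromkeys(keywords))

def seed_prompt(keywords, n_examples=3):
    """Compose an illustrative prompt asking an LLM to draft seed examples."""
    return (
        f"Generate {n_examples} tool-invocation training examples "
        f"about: {', '.join(keywords)}."
    )
```

In a pipeline like the one described, prompts built this way would feed an LLM whose outputs form the initial seed set, which is then expanded into the full tool-learning training corpus.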