Abstract: Background A task assigned to space exploration satellites is detecting the physical environment within a certain region of space. However, space detection data are complex and abstract, and are not conducive to researchers' visual perception of the evolution and interaction of events in the space environment. Methods A time-series dynamic data sampling method for large-scale space was proposed to sample detection data in space and time, and correspondences between data location features and other attribute features were established. A tone-mapping method based on statistical histogram equalization was proposed and applied to the final attribute feature data. The visualization process was optimized for rendering by merging materials, reducing the number of patches, and performing other operations. Results Sampling, feature extraction, and uniform visualization were achieved for detection data of complex types, long time spans, and uneven spatial distributions. The real-time visualization of large-scale spatial structures on augmented reality devices, particularly low-performance devices, was also investigated. Conclusions The proposed visualization system can reconstruct the three-dimensional structure of a large-scale space, express the structure and changes of the spatial environment in augmented reality, and help users intuitively discover spatial environmental events and evolutionary patterns.
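A minimal sketch of histogram-equalization tone mapping in NumPy, assuming the attribute data arrive as a 1-D array; the function name, bin count, and example data are illustrative, not the authors' implementation:

```python
import numpy as np

def equalize_tone(values, n_bins=256):
    """Map scalar attribute values to [0, 1] so that their histogram is
    approximately uniform (statistical histogram equalization)."""
    hist, edges = np.histogram(values, bins=n_bins)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]                                   # normalized cumulative distribution
    # Look up each value's position on the CDF to get its equalized tone
    bin_idx = np.clip(np.digitize(values, edges[1:-1]), 0, n_bins - 1)
    return cdf[bin_idx]

# Example: heavily skewed detection values spread out evenly after mapping
data = np.random.lognormal(mean=0.0, sigma=1.0, size=10_000)
tones = equalize_tone(data)                          # feed tones to a color map
```

The equalized tones can then be passed through any color map, so rare extreme values no longer compress the bulk of the data into a narrow color range.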
Funding: Supported by a project of the National Natural Science Foundation of China (No. 41874134).
Abstract: Processing large-scale 3-D gravity data is an important topic in geophysics. Many existing inversion methods cannot handle massive data and are difficult to apply in practice. This study applies GPU parallel processing technology to the focusing inversion method, aiming to improve inversion accuracy while speeding up calculation and reducing memory consumption, thereby obtaining fast and reliable inversion results for large, complex models. In this paper, equivalent storage of the geometric trellis is used to calculate the sensitivity matrix, and the inversion is based on GPU parallel computing technology. The parallel computing program is optimized by reducing data transfers, access restrictions, and instruction restrictions, as well as by latency hiding; this greatly reduces memory usage, speeds up calculation, and makes fast inversion of large models possible. Comparing the computing speed of the traditional single-threaded CPU method with CUDA-based GPU parallel technology verifies the excellent acceleration performance of GPU parallel computing, suggesting a path to practical application for theoretical inversion methods restricted by computing speed and computer memory. The model test verifies that the focusing inversion method can overcome the severe skin effect and the ambiguity of geological body boundaries. Moreover, increasing the number of model cells and inversion data more clearly depicts the boundary position of the anomalous body and delineates its specific shape.
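The focusing idea can be illustrated with a small iteratively reweighted least-squares loop using a minimum-support stabilizer; this CPU toy in NumPy shows only the reweighting that sharpens body boundaries, not the paper's CUDA implementation or its equivalent-storage sensitivity matrix, and all sizes and parameters are invented:

```python
import numpy as np

def focusing_inversion(A, d, beta=1e-2, eps=1e-4, n_iter=10):
    """Iteratively reweighted least squares with a minimum-support
    (focusing) stabilizer: w_i = 1/sqrt(m_i^2 + eps^2) penalizes small
    nonzero cells, favoring compact bodies with sharp boundaries."""
    m = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = 1.0 / np.sqrt(m ** 2 + eps ** 2)         # focusing weights
        W = np.diag(w)
        # Solve (A^T A + beta W^T W) m = A^T d for the next model
        m = np.linalg.solve(A.T @ A + beta * (W.T @ W), A.T @ d)
    return m

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50))                        # toy sensitivity matrix
m_true = np.zeros(50)
m_true[20:25] = 1.0                                  # compact anomalous body
d = A @ m_true
print(focusing_inversion(A, d).round(2))             # recovers the compact block
```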
Abstract: Social media data have created a paradigm shift in assessing situational awareness during natural disasters and emergencies such as wildfires, hurricanes, and tropical storms. Twitter, as an emerging data source, is an effective and innovative digital platform for observing trends from the perspective of social media users who are direct or indirect witnesses of a calamitous event. This paper collects and analyzes Twitter data related to the recent wildfire in California to perform a trend analysis by classifying firsthand and credible information from Twitter users. The work investigates tweets on the wildfire and classifies them by witness type: 1) direct witnesses and 2) indirect witnesses. The collected and analyzed information can be useful to law enforcement agencies and humanitarian organizations for communicating and verifying situational awareness during wildfire hazards. Trend analysis is an aggregated approach that includes sentiment analysis and topic modeling performed through domain-expert manual annotation and machine learning. It ultimately builds a fine-grained analysis to assess evacuation routes and provide valuable information to firsthand emergency responders.
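The direct/indirect witness split is at heart a text classification task; a minimal sketch with scikit-learn follows. The example tweets and labels are invented for illustration, whereas the paper's annotations come from domain experts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical annotated tweets: 1 = direct witness, 0 = indirect witness
tweets = [
    "Smoke everywhere on my street, we are evacuating now",
    "Praying for everyone affected by the California wildfire",
    "I can see flames from my backyard",
    "News says the wildfire has burned 10,000 acres",
]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(tweets, labels)
print(clf.predict(["Ash is falling on my car right now"]))   # likely a direct witness
```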
Funding: This research has been partially supported by the National Natural Science Foundation of China (51175169) and the National Science and Technology Support Program (2012BAF02B01).
Abstract: Faced with a growing number of large-scale data sets, the affinity propagation (AP) clustering algorithm must construct a full similarity matrix, which incurs huge storage and computation costs. This paper therefore proposes an improved affinity propagation clustering algorithm. First, subtractive clustering is added, using the density values of the data points to obtain initial cluster centers. Then, the similarity distances between the initial cluster centers are calculated and, borrowing the idea of semi-supervised clustering, pairwise constraint information is added to construct a sparse similarity matrix. Finally, AP clustering is performed on the cluster representative points until a suitable cluster division is reached. Experimental results show that the algorithm greatly reduces computation and the storage required for the similarity matrix, and outperforms the original algorithm in clustering quality and processing speed.
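The first step, subtractive clustering, can be sketched as follows; this is the classic mountain-potential formulation under assumed radii, not the full improved-AP algorithm:

```python
import numpy as np

def subtractive_step(X, ra=1.0):
    """One step of subtractive clustering: each point's potential is a sum
    of Gaussian kernels over all points; the highest-potential point
    becomes a center, then its influence is subtracted before the next
    center is chosen (rb = 1.5 * ra is customary)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    potential = np.exp(-d2 / (ra / 2) ** 2).sum(axis=1)
    center = int(np.argmax(potential))
    potential = potential - potential[center] * np.exp(-d2[center] / (1.5 * ra / 2) ** 2)
    return center, potential

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
c1, residual = subtractive_step(X)
c2 = int(np.argmax(residual))        # second center falls in the other cluster
print(X[c1], X[c2])
```

The resulting centers then seed the sparse similarity matrix on which AP runs.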
Funding: Financial support from the SERB, DST, Government of India through project CRG/2019/001110; IUCAA, Pune for support through an associateship program; IISER Tirupati for support through a postdoctoral fellowship. Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England.
Abstract: Major interactions are known to trigger star formation in galaxies and alter their color. We study major interactions in filaments and sheets using SDSS data to understand the influence of large-scale environments on galaxy interactions. We identify the galaxies in filaments and sheets using the local dimension and also find the major pairs residing in these environments. The star formation rate (SFR) and color of the interacting galaxies as a function of pair separation are analyzed separately in filaments and sheets. The analysis is repeated for three volume-limited samples covering different magnitude ranges. The major pairs residing in filaments show a significantly higher SFR and bluer color than those residing in sheets up to a projected pair separation of ~50 kpc. We observe a complete reversal of this behavior, in both SFR and color, for galaxy pairs with a projected separation larger than 50 kpc. Some earlier studies report that galaxy pairs align with the filament axis. Such alignment inside filaments indicates anisotropic accretion that may cause these differences. We do not observe these trends in the brighter galaxy samples. The pairs in filaments and sheets from the brighter samples trace relatively denser regions in these environments. The absence of these trends in the brighter samples may be explained by the dominant effect of the local density over the effects of the large-scale environment.
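The core measurement, median SFR as a function of projected pair separation split by environment, can be sketched with pandas; the catalog below is synthetic and its column names are invented, not the SDSS sample:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
pairs = pd.DataFrame({
    "r_p_kpc": rng.uniform(10, 150, 500),            # projected separation (kpc)
    "sfr": rng.lognormal(0.0, 0.5, 500),             # star formation rate
    "env": rng.choice(["filament", "sheet"], 500),   # large-scale environment
})
pairs["r_bin"] = pd.cut(pairs["r_p_kpc"], np.arange(0, 175, 25))
# Median SFR per separation bin, split by environment
print(pairs.groupby(["env", "r_bin"], observed=True)["sfr"].median().unstack(0))
```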
Funding: Supported by the National Natural Science Foundation of China (32470657 and 32270673).
Abstract: Genetic dissection and breeding by design for polygenic traits remain substantial challenges. To address these challenges, it is important to identify as many genes as possible, including key regulatory genes. Here, we developed a genome-wide scanning plus machine learning framework, integrated with advanced computational techniques, to propose a novel algorithm named Fast3VmrMLM. This algorithm aims to enhance the identification of abundant and key genes for polygenic traits in the era of big data and artificial intelligence. The algorithm was extended to identify haplotype (Fast3VmrMLM-Hap) and molecular (Fast3VmrMLM-mQTL) variants. In simulation studies, Fast3VmrMLM outperformed existing methods in detecting dominant, small, and rare variants, requiring only 3.30 and 5.43 h (20 threads) to analyze the 18K rice and UK Biobank-scale datasets, respectively. Fast3VmrMLM identified more known (211) and candidate (384) genes for 14 traits in the 18K rice dataset than FarmCPU (100 known genes). Additionally, it identified 26 known and 24 candidate genes for seven yield-related traits in a maize NC II design; Fast3VmrMLM-mQTL identified two known soybean genes near structural variants. We demonstrated that this novel two-step framework outperformed genome-wide scanning alone. In breeding by design, a genetic network constructed via machine learning using all known and candidate genes identified in this study revealed 21 key genes associated with rice yield-related traits. All associated markers yielded high prediction accuracies in rice (0.7443) and maize (0.8492), enabling the development of superior hybrid combinations. A new breeding-by-design strategy based on the identified key genes was also proposed. This study provides an effective method for gene mining and breeding by design.
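For contrast, the naive baseline that multi-locus methods such as Fast3VmrMLM improve upon is a single-marker scan, sketched below with synthetic data; this is explicitly not the paper's algorithm, which adds random-effect multi-locus modeling and machine learning on top:

```python
import numpy as np
from scipy import stats

def single_marker_scan(genotypes, phenotype):
    """Regress the trait on each marker independently and return
    -log10 p-values (the classic, underpowered single-locus GWAS)."""
    pvals = np.array([
        stats.linregress(genotypes[:, j], phenotype).pvalue
        for j in range(genotypes.shape[1])
    ])
    return -np.log10(pvals)

rng = np.random.default_rng(3)
G = rng.integers(0, 3, size=(200, 100)).astype(float)   # 200 lines, 100 SNPs
y = 0.8 * G[:, 10] + rng.normal(size=200)               # SNP 10 is causal
print(single_marker_scan(G, y).argmax())                # should flag SNP 10
```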
Funding: Supported by the China National Social Science Foundation (BHA220144).
Abstract: System architecture: The Intelligent Teaching Team of the Shanghai Institute (Laboratory) of AI Education and the Institute of Curriculum and Instruction of East China Normal University collaborated to develop the High-Quality Classroom Intelligent Analysis Standard system. This system is measured along the dimensions of Class Efficiency, Equity, and Democracy, and is referred to as the CEED system.
Funding: Supported by the National Natural Science Foundation of China (NSFC) under grant No. 42275147.
Abstract: With the development of remote sensing technology and computing science, remote sensing data exhibit typical big data characteristics. The rapid growth of remote sensing big data has brought a large number of data processing tasks, posing huge computational challenges. Distributed computing is the primary means of processing remote sensing big data, and task scheduling plays a key role in this process. This study analyzes the characteristics of batch processing of remote sensing big data and, building on the Hungarian algorithm, proposes a novel task-assignment optimization strategy for remote sensing big data batch workflows, called the optimal sequence dynamic assignment algorithm, which is applicable to heterogeneous distributed computing environments. The strategy has two core components: an improved Hungarian algorithm model and a multi-level optimal assignment task queue mechanism. It solves the dependency, mismatch, and resource-idleness problems in the optimal scheduling of remote sensing batch processing tasks, and effectively improves data processing efficiency without adding hardware resources or optimizing the computational algorithms themselves. We tested the strategy on an aerosol optical depth retrieval workflow. Compared with processing before optimization, the makespan of the proposed method was shortened by at least 20%. Compared with popular scheduling algorithms, the proposed method is clearly competitive in acceleration and large-scale task scheduling.
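The base step of the strategy, one-to-one assignment of tasks to compute nodes by minimum total cost, is solvable with an off-the-shelf Hungarian routine; the cost matrix below is hypothetical, and the paper's contribution (the multi-level queue and dependency handling) sits on top of this step:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical estimated runtimes (s) of 4 remote sensing tasks on 4
# heterogeneous compute nodes
cost = np.array([
    [12.0,  9.5, 14.0, 11.0],
    [ 8.0, 10.5,  7.5,  9.0],
    [15.0, 13.0, 12.5, 16.0],
    [ 6.0,  7.0,  8.5,  5.5],
])
tasks, nodes = linear_sum_assignment(cost)           # Hungarian solution
print(list(zip(tasks, nodes)), cost[tasks, nodes].sum())
```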
Funding: Supported by the National Excellent Youth Fund (Grant No. 49825109), the CAS Key Innovation Direction Project (Grant No. KZCX2-208), and the LASG Project.
Abstract: Global Positioning System (GPS) meteorology data variational assimilation reduces to a large-scale unconstrained optimization problem. Because the dimension of this problem is very large, most optimization algorithms cannot be applied directly. For GPS/MET data assimilation to satisfy the demands of numerical weather prediction, an algorithm with a fast iteration convergence rate is essential. A new method is presented that dynamically combines the limited-memory BFGS (L-BFGS) method with the Hessian-free Newton (HFN) method and achieves a good convergence rate. Numerical tests indicate that the computational efficiency of the method is better than that of the L-BFGS and HFN methods alone.
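A simplified two-stage version of this combination can be run with SciPy: a few cheap L-BFGS iterations to warm-start, then a Hessian-free Newton method (Newton-CG driven by Hessian-vector products). The paper switches between the two dynamically; the fixed handoff below is only an illustration on a standard test function:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess_prod

x0 = np.full(100, -1.0)                              # large-ish test problem
stage1 = minimize(rosen, x0, jac=rosen_der,
                  method="L-BFGS-B", options={"maxiter": 20})
stage2 = minimize(rosen, stage1.x, jac=rosen_der,
                  hessp=rosen_hess_prod,             # Hessian-vector products only
                  method="Newton-CG")
print(stage1.fun, stage2.fun)                        # HFN refines the L-BFGS iterate
```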
Abstract: As the market share of new energy vehicles grows, mining user needs and analyzing sentiment from user reviews becomes increasingly important. Reviews in this domain are complex and diverse, however, and common analysis methods struggle to mine the data in depth and across multiple dimensions, so distilling genuine sentiment remains difficult and challenging. To address these problems, a sentiment analysis framework based on Bidirectional Encoder Representations from Transformers (BERT) and VADER rules is proposed. In this framework, a BERT model classifies and predicts the sentiment orientation of user reviews, the VADER sentiment lexicon assigns sentiment scores, and the final scores are compared and analyzed comprehensively. The framework performs well on a dataset of 50,520 user reviews crawled from 汽车之家 (Autohome) and 爱卡汽车网 (XCAR): it accurately identifies user sentiment and differentiates between new energy vehicle brands, providing suggestions and reference value for enterprises improving products and for users choosing them.
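A minimal sketch of the two scoring stages using off-the-shelf libraries is shown below; note the default models here are English-oriented stand-ins, whereas the paper works on Chinese reviews with its own trained BERT model:

```python
from transformers import pipeline
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

bert_clf = pipeline("sentiment-analysis")            # downloads a default BERT-style model
vader = SentimentIntensityAnalyzer()                 # rule-based lexicon scorer

review = "The range is great, but the infotainment system keeps freezing."
print(bert_clf(review)[0])                           # classifier label + confidence
print(vader.polarity_scores(review)["compound"])     # rule-based score in [-1, 1]
```

The framework then compares and combines the two outputs per review.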
Funding: Supported by the Distinguished Young Scholar Project (No. 71922007) of the National Natural Science Foundation of China, and in part by the Jiangsu Provincial Key Laboratory of Networked Collective Intelligence under Grant BM2017002; also part of a project that has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 101025896.
Abstract: Real-time traffic state (e.g., speed) prediction is an essential component of traffic control and management in an urban road network. Building an effective large-scale traffic state prediction system is a challenging but highly valuable problem. This study focuses on constructing an effective spatiotemporal-data solution for predicting the traffic state of large-scale traffic systems. We first summarize the three challenges faced by large-scale traffic state prediction: scale, granularity, and sparsity. Based on domain knowledge from traffic engineering, the propagation of traffic states along the road network is analyzed theoretically, covering the temporal and spatial propagation of traffic state, traffic state experience replay, and multi-source data fusion. A deep learning architecture, termed Deep Traffic State Prediction (DeepTSP), is then proposed to address these challenges. Experiments demonstrate that the proposed DeepTSP model can effectively predict large-scale traffic states.
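One named component, traffic state experience replay, can be sketched as a plain replay buffer that lets a predictor revisit rare congestion patterns during training; this illustrates only that component, not the DeepTSP architecture, and all names and data are invented:

```python
import random
from collections import deque

class TrafficStateReplay:
    """Stores (past_window, next_state) pairs sampled from historical
    speed series so rare patterns can be re-sampled during training."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, window, target):
        self.buffer.append((window, target))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

replay = TrafficStateReplay()
speeds = [52, 48, 45, 30, 22, 25, 40, 55]            # toy link speeds (km/h)
for t in range(len(speeds) - 3):
    replay.push(speeds[t:t + 3], speeds[t + 3])      # 3-step window -> next state
print(replay.sample(2))
```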
Funding: Supported by the National Key R&D Program of China (2022YFA1602901), the National Natural Science Foundation of China (NSFC, grant Nos. 11988101, 11873051, 12125302, and 11903043), the CAS Project for Young Scientists in Basic Research (grant No. YSBR-062), the China Manned Space Program (grant Nos. CMS-CSST-2025-A03 and CMS-CSST-2025-A10), and the K.C. Wong Education Foundation.
Abstract: Galaxy groups are essential for studying the large-scale distribution of matter in redshift surveys and for deciphering the link between galaxy traits and their associated halos. In this work, we propose a widely applicable method for identifying groups through machine learning techniques in real space, taking into account the impact of redshift distortion. Our methodology involves two neural networks: a classification model for identifying central galaxy groups and a regression model for predicting the mass of these groups. Both models take observable galaxy traits as input, allowing future application to real survey data. Testing on simulated datasets indicates that our method accurately identifies over 92% of groups with M_vir ≥ 10^11 h^-1 M_⊙, with 80% achieving a membership completeness of at least 80%. The predicted group masses vary by less than 0.3 dex across different mass scales, even in the absence of a priori data. Our network adapts seamlessly to sparse samples with a flux limit of m_r < 14, to high-redshift samples at z = 1.08, and to galaxy samples from the TNG300 hydrodynamical simulation without further training. Furthermore, the framework can easily adjust to real surveys by training on redshift-distorted samples without parameter changes. Careful consideration of different observational effects in redshift space makes it promising that this method will be applicable to real galaxy surveys.
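The two-network design can be mimicked with scikit-learn stand-ins; everything below (features, labels, architectures) is synthetic and far smaller than the paper's deep networks, but it shows the classifier-plus-regressor split:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 5))                       # observable galaxy traits (toy)
is_central = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
log_mvir = 12.0 + X[:, 2] + 0.1 * rng.normal(size=2000)

central_net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
central_net.fit(X, is_central)                       # network 1: find group centrals
mass_net = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500)
mass_net.fit(X[is_central == 1], log_mvir[is_central == 1])   # network 2: halo mass

X_new = rng.normal(size=(3, 5))
print(central_net.predict(X_new), mass_net.predict(X_new))
```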
Funding: Supported by the National Key R&D Program of China (2022YFA1602901), the National Natural Science Foundation of China (NSFC, grant Nos. 11988101, 11873051, 12125302, and 11903043), the CAS Project for Young Scientists in Basic Research (grant No. YSBR-062), the China Manned Space Program (grant Nos. CMS-CSST-2025-A03 and CMS-CSST-2025-A10), and the K.C. Wong Education Foundation.
Abstract: We present the application of a machine learning based galaxy group finder to real observational data from the Sloan Digital Sky Survey Data Release 13 (SDSS DR13). Originally designed and validated using simulated galaxy surveys in redshift space, our method utilizes deep neural networks to recognize galaxy groups and assess their respective halo masses. The model comprises three components: a central galaxy identifier, a group mass estimator, and an iterative group finder. Using mock catalogs from the Millennium Simulation, our model attains above 90% completeness and purity for groups covering a wide range of halo masses from ~10^11 to ~10^15 h^-1 M_⊙. When applied to SDSS DR13, it successfully identifies over 420,000 galaxy groups, displaying strong agreement in group abundance, redshift distribution, and halo mass distribution with conventional techniques. The precision in identifying member galaxies is also notably high, with more than 80% of lower-mass groups achieving perfect membership alignment. The model shows strong performance across different magnitude thresholds, making retraining unnecessary. These results confirm the efficiency and adaptability of our methodology, offering a scalable and accurate solution for upcoming large-scale galaxy surveys and studies of cosmological structure formation. Our SDSS group catalog and the essential observable properties of galaxies are available at https://github.com/JuntaoMa/SDSS-DR13-group-catalog.git.
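The completeness and purity figures quoted above are standard group-finder metrics; a minimal sketch of how they are computed for one matched group follows (the matching convention between true and detected groups is an assumption here):

```python
def completeness_purity(true_members, found_members):
    """Completeness: fraction of the true group recovered by the matched
    detected group. Purity: fraction of the detected group that truly
    belongs to it."""
    true_set, found_set = set(true_members), set(found_members)
    overlap = len(true_set & found_set)
    return overlap / len(true_set), overlap / len(found_set)

# Toy example: one true group vs. its best-matching detected group
print(completeness_purity([1, 2, 3, 4, 5], [2, 3, 4, 5, 9]))   # -> (0.8, 0.8)
```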
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 42172318 and 42377186) and the National Key R&D Program of China (Grant No. 2023YFC3007201).
Abstract: The rise in construction activities within mountainous regions has significantly increased the frequency of rockfalls. Statistical models for rockfall hazard assessment often struggle to achieve high precision at a large scale. This limitation arises primarily from the scarcity of historical rockfall data and the inadequacy of conventional assessment indicators in capturing the physical and structural characteristics of rockfalls. This study proposes a physically based deterministic model designed to accurately quantify rockfall hazards at a large scale. The model accounts for multiple rockfall failure modes and incorporates the key physical and structural parameters of the rock mass. Rockfall hazard is defined as the product of three factors: the rockfall failure probability, the probability of reaching a specific position, and the corresponding impact intensity. The failure probability includes the probabilities of formation and instability of rock blocks under different failure modes, modeled from the combination patterns of slope surfaces and rock discontinuities. The Monte Carlo method is employed to account for the randomness of mechanical and geometric parameters when quantifying instability probabilities. Additionally, rock trajectories and impact energies simulated using the Flow-R software are combined with the rockfall failure probability to enable regional rockfall hazard zoning. A case study was conducted in Tiefeng, Chongqing, China, considering four types of rockfall failure modes. Hazard zoning identified the steep and elevated terrain of the northern and southern anaclinal slopes as the areas of highest rockfall hazard. These findings align with observed conditions, providing detailed hazard zoning and validating the effectiveness and potential of the proposed model.
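The Monte Carlo treatment of instability probability can be sketched for the simplest failure mode, dry cohesionless planar sliding, where the factor of safety is tan(phi)/tan(dip); the distributions and the final product of the three hazard factors below are illustrative, not the case study values:

```python
import numpy as np

def sliding_failure_probability(n=100_000, seed=0):
    """Sample friction angle and discontinuity dip from assumed normal
    distributions and count cases with factor of safety < 1."""
    rng = np.random.default_rng(seed)
    phi = np.radians(rng.normal(35.0, 4.0, n))       # friction angle (degrees)
    dip = np.radians(rng.normal(35.0, 5.0, n))       # joint dip (degrees)
    fos = np.tan(phi) / np.tan(dip)                  # FoS for dry planar sliding
    return np.mean(fos < 1.0)

p_fail = sliding_failure_probability()
p_reach, intensity = 0.4, 0.7                        # from trajectory simulation (toy)
print(p_fail, p_fail * p_reach * intensity)          # hazard = product of three factors
```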
Funding: Supported by the Chinese Postdoctoral Science Foundation (2021M700016).
Abstract: In this paper, we propose a correlation-aware probabilistic data summarization technique to efficiently analyze and visualize large-scale multi-block volume data generated by massively parallel scientific simulations. The core of our technique is correlation modeling of the distribution representations of adjacent data blocks using copula functions, together with accurate data value estimation that combines numerical information, spatial location, and the correlation distribution using Bayes' rule. This effectively preserves statistical properties without merging data blocks across parallel computing nodes and repartitioning them, significantly reducing the computational cost, and it enables more accurate reconstruction of the original data than existing methods. We demonstrate the effectiveness of our technique on six datasets, the largest having one billion grid points. The experimental results show that our approach reduces data storage cost by approximately one order of magnitude compared to state-of-the-art methods while providing higher reconstruction accuracy at lower computational cost.
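The correlation-modeling step can be illustrated with a bivariate Gaussian copula fit between two adjacent blocks; the Gaussian family is one common choice assumed here for the sketch, and the blocks are synthetic:

```python
import numpy as np
from scipy import stats

def gaussian_copula_rho(x, y):
    """Fit a bivariate Gaussian copula: transform each sample to uniform
    pseudo-observations via ranks, map through the normal quantile
    function, and estimate the correlation of the result."""
    u = stats.rankdata(x) / (len(x) + 1)
    v = stats.rankdata(y) / (len(y) + 1)
    z1, z2 = stats.norm.ppf(u), stats.norm.ppf(v)
    return np.corrcoef(z1, z2)[0, 1]

rng = np.random.default_rng(5)
block_a = rng.lognormal(size=5000)                   # distribution of block A
block_b = 0.7 * block_a + rng.lognormal(size=5000)   # correlated neighbor block B
print(round(gaussian_copula_rho(block_a, block_b), 3))
```

The fitted copula parameter, together with each block's marginal distribution, is what lets Bayes' rule combine value, location, and correlation at reconstruction time.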