This paper exploits coding to speed up computation offloading in a multi-server mobile edge computing (MEC) network with straggling servers and channel fading. The specific task we consider is to compute the product of a user-generated input data matrix and a large-scale model matrix that is stored distributively across multiple edge nodes. The key idea of coding is to introduce computation redundancy to improve robustness against straggling servers and to create communication redundancy to improve reliability against channel fading. We use a hybrid design of maximum distance separable (MDS) coding and repetition coding. Based on this hybrid coding scheme, we analyze the average task uploading time, the average edge computing time, and the average output downloading time, and then obtain the end-to-end task execution time. Numerical results demonstrate that when the task uploading phase or the edge computing phase is the performance bottleneck, the hybrid coding reduces to MDS coding; when the downlink transmission is the bottleneck, it reduces to repetition coding. The hybrid coding also outperforms entangled polynomial coding, which incurs higher uplink and downlink communication loads.
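The MDS half of the hybrid scheme can be illustrated with a minimal sketch. It assumes a real-valued (n, k) Vandermonde code with evaluation points 1, ..., n, a model matrix split row-wise into k blocks, and a matrix-vector product standing in for the matrix-matrix product. Repetition coding and the channel model are omitted, and all function names are hypothetical.

```python
from fractions import Fraction


def matvec(M, x):
    """Plain matrix-vector product over lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def mds_encode(A, k, n):
    """Split A row-wise into k blocks and form n coded blocks.

    Coded block j is sum_i alpha_j**i * A_i with alpha_j = j + 1, so any k of
    the n blocks determine all of A. Assumes len(A) is divisible by k.
    """
    m = len(A) // k
    blocks = [A[i * m:(i + 1) * m] for i in range(k)]
    cols = len(A[0])
    coded = []
    for j in range(n):
        alpha = j + 1
        coded.append([[sum(alpha ** i * blocks[i][r][c] for i in range(k))
                       for c in range(cols)] for r in range(m)])
    return coded


def solve(V, Y):
    """Solve V Z = Y exactly by Gauss-Jordan elimination (V is k x k)."""
    k = len(V)
    aug = [[Fraction(v) for v in V[r]] + [Fraction(y) for y in Y[r]] for r in range(k)]
    for col in range(k):
        piv = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def mds_decode(results, k):
    """Recover A @ x from any k finished workers: {worker_id: coded_block @ x}."""
    ids = sorted(results)[:k]
    V = [[(j + 1) ** i for i in range(k)] for j in ids]
    Y = [results[j] for j in ids]
    Z = solve(V, Y)                      # row i of Z equals A_i @ x
    return [val for row in Z for val in row]


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
coded = mds_encode(A, k=2, n=3)
finished = {j: matvec(coded[j], x) for j in (0, 2)}   # worker 1 straggles
print(mds_decode(finished, k=2) == matvec(A, x))     # True
```

Any k of the n coded results suffice, so the slowest n − k servers can be ignored. Exact `Fraction` arithmetic sidesteps the ill-conditioning that real Vandermonde systems show as k grows.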
Matrix multiplication plays a pivotal role in symmetric cipher algorithms, but it is one of their most complex and time-consuming units, and its performance directly affects the efficiency of these algorithms. Building on the characteristics of VLIW processors and of matrix multiplication in symmetric ciphers, this paper extracts the reconfigurable elements, analyzes the principle of matrix multiplication, and then designs a reconfigurable matrix multiplication architecture for a VLIW processor. Finally, we propose single instructions for multiplying a 4×1 and a 4×4 matrix, or two 4×4 matrices, over GF(2^8); through instruction extension, these instructions can also support higher-dimensional operations. Experiments show that the designed instructions support matrix multiplication of different dimensions and greatly improve multiplication speed.
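The arithmetic that such instructions implement in hardware can be written as a small software reference model. The sketch below assumes the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), which the abstract does not specify. It uses the AES MixColumns matrix as a familiar example of a 4×4 by 4×1 product over GF(2^8). The function names are illustrative, not the paper's instruction names.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8), reducing modulo `poly` (default: AES polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1                  # multiply a by x
        if a & 0x100:
            a ^= poly            # reduce when the degree reaches 8
        b >>= 1
    return result


def gf_matmul(M, N):
    """Product of matrices over GF(2^8): sums are XORs, products use gf_mul."""
    rows, inner, cols = len(M), len(N), len(N[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for t in range(inner):
                acc ^= gf_mul(M[i][t], N[t][j])
            out[i][j] = acc
    return out


# AES MixColumns matrix: a 4x4 by 4x1 product over GF(2^8).
MIX_COLUMNS = [[2, 3, 1, 1],
               [1, 2, 3, 1],
               [1, 1, 2, 3],
               [3, 1, 1, 2]]

column = [[0xDB], [0x13], [0x53], [0x45]]
print([hex(r[0]) for r in gf_matmul(MIX_COLUMNS, column)])  # ['0x8e', '0x4d', '0xa1', '0xbc']
```

The same `gf_matmul` covers the 4×4 by 4×4 case; larger dimensions decompose into these 4×4 tiles, which mirrors the instruction-extension idea in the abstract.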
This paper focuses on how to optimize the cache performance of sparse matrix-matrix multiplication (SpGEMM). It classifies cache misses into two categories: those caused by the irregular distribution pattern of the multiplier matrix and those caused by the multiplicand. The paper proposes an optimization method for each. The first, hash-based method effectively removes cache misses of the first category and improves performance by a factor of 6 on an Intel 8-core CPU in the best cases. For cache misses of the second category, it proposes a new cache replacement algorithm that achieves a much higher cache hit rate than other replacement algorithms based on historical knowledge; the algorithm is applicable to CELL and GPU. To further verify the effectiveness of our methods, we implement the algorithm on a GPU, and its performance scales perfectly with the size of on-chip storage.
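Hash-based accumulation is a standard way to handle the scattered column indices that arise in Gustavson's row-by-row SpGEMM. The abstract does not describe the paper's hash scheme, so the sketch below shows only this general technique. Python dicts stand in for the hash table, and rows are stored as `{column: value}` maps, an assumed format.

```python
def spgemm_hash(A, B):
    """Row-wise (Gustavson) SpGEMM with a hash-map accumulator per output row.

    A and B are lists of rows; each row is a dict {column: value} of nonzeros.
    Row i of C accumulates a_ik * (row k of B) for every nonzero a_ik.
    """
    C = []
    for a_row in A:
        acc = {}                                  # hash accumulator: column -> partial sum
        for k, a_val in a_row.items():
            for j, b_val in B[k].items():         # rows of B are visited in A's sparsity order
                acc[j] = acc.get(j, 0) + a_val * b_val
        C.append({j: v for j, v in acc.items() if v != 0})   # drop cancelled entries
    return C


A = [{0: 1, 2: 2}, {1: 3}]
B = [{0: 4}, {1: 5, 2: 1}, {0: 6, 2: 7}]
print(spgemm_hash(A, B))  # [{0: 16, 2: 14}, {1: 15, 2: 3}]
```

The accumulator touches only the columns that actually occur in an output row, rather than a dense row of length n, which is what keeps its working set small.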
Emerging hardware such as non-volatile memory (NVM) and high-speed network interface cards promises to improve the performance of matrix multiplication. However, a critical challenge in achieving high performance is the tradeoff between horizontal communication (data movement between processors) and vertical communication (data movement across memory hierarchies). In this paper, we provide an analysis in the distributed-memory parallel model that additionally accounts for communication between main memory and cache. We measure joint communication as the sum of the horizontal and vertical bandwidth costs, and study the joint-communication cost of square matrix multiplication in the read-write symmetric setting (such as DRAM) and the asymmetric setting (such as NVM). Specifically, we show that in the symmetric setting, a joint-communication optimal algorithm can be obtained directly by combining the horizontally optimal and vertically optimal algorithms. In the asymmetric setting, by contrast, horizontal and vertical communication cannot be optimal at the same time, so there is a tradeoff between the two. For this case, we first present a joint-communication lower bound and then propose the Joint-Communication Optimal Matrix Multiplication Algorithm (JOMMA), a parallel matrix multiplication algorithm whose joint-communication complexity meets the lower bound. The key idea behind JOMMA is to derive the optimal dimensions of the local matrix multiplication performed by each processor, which in turn determine the processor grid and an optimal schedule.
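To make the idea of choosing local matrix dimensions concrete, here is a toy model. It is not JOMMA's joint-communication cost. In the model, p processors form a p1 × p2 × p3 grid, and each processor multiplies an (n/p1) × (n/p3) tile by an (n/p3) × (n/p2) tile. Its cost is the number of words in its three tiles, with the written output tile weighted by a hypothetical `write_weight` as a crude stand-in for read-write asymmetry.

```python
def tile_cost(n, grid, write_weight=1):
    """Toy per-processor cost: words in the A and B tiles read plus the C tile written.

    grid = (p1, p2, p3) splits the rows of A, the columns of B, and the inner
    dimension; write_weight > 1 crudely models write-expensive memory such as NVM.
    """
    p1, p2, p3 = grid
    m, k, l = n // p1, n // p3, n // p2      # local product: (m x k) @ (k x l)
    return m * k + k * l + write_weight * m * l


def best_grid(n, p, write_weight=1):
    """Enumerate all factorizations p = p1 * p2 * p3; return (cost, grid) of the cheapest."""
    best = None
    for p1 in range(1, p + 1):
        if p % p1:
            continue
        for p2 in range(1, p // p1 + 1):
            if (p // p1) % p2:
                continue
            grid = (p1, p2, p // (p1 * p2))
            cost = tile_cost(n, grid, write_weight)
            if best is None or cost < best[0]:
                best = (cost, grid)
    return best


print(best_grid(8, 8))                   # (48, (2, 2, 2)): balanced cube when costs are symmetric
print(best_grid(8, 8, write_weight=4))   # (80, (2, 4, 1)): costly writes push toward a 2D grid
```

Even in this toy model, the optimal grid shifts once writes become more expensive than reads. That shift is the kind of tradeoff the abstract describes for the asymmetric setting.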
Optical computing accelerators, with high parallelism, large bandwidth, and low transmission loss, have the potential to enhance electronic computing in both computational power and energy efficiency. Photonic acceleration plays a crucial role in supporting computationally intensive operations such as dynamic-static matrix multiplication, significantly improving overall efficiency. Existing photonic architectures for dynamic-static matrix multiplication depend on complex coherent optical systems or costly nano-optics fabrication, which limits their scalability. This study introduces a novel quantum dot fluorescence-based photonic acceleration architecture for dynamic-static matrix multiplication that eliminates the need for coherent light sources or intricate fabrication. Because it relies on simple, cost-effective quantum dot preparation and printing techniques, this architecture has significant potential for large-scale, high-performance, low-cost photonic accelerators. We detail the mathematical and physical mechanisms of the proposed architecture, experimentally validate the key physical processes, and demonstrate its application to template matching for image recognition, achieving 95% accuracy.
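The computation behind template matching with a dynamic-static product can be modeled numerically. The sketch below is an idealized intensity-domain model, not the paper's physical model. The static matrix holds non-negative templates (think printed emitter efficiencies), and the dynamic vector holds non-negative input intensities. Matching picks the template with the highest normalized correlation. The patterns and names are hypothetical.

```python
import math


def intensity_mvm(W, x):
    """Idealized incoherent (intensity-only) product y = W x.

    Intensities cannot be negative, so the static weights W and the dynamic
    input x must both be non-negative in this model.
    """
    if any(w < 0 for row in W for w in row) or any(v < 0 for v in x):
        raise ValueError("intensity-domain operands must be non-negative")
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def match_template(templates, image):
    """Return the index of the template with the highest normalized correlation."""
    scores = intensity_mvm(templates, image)
    image_norm = math.sqrt(sum(v * v for v in image))
    normalized = [s / (math.sqrt(sum(t * t for t in tpl)) * image_norm)
                  for s, tpl in zip(scores, templates)]
    return max(range(len(templates)), key=normalized.__getitem__)


# Flattened 3x3 patterns: vertical bar, horizontal bar, diagonal.
TEMPLATES = [[0, 1, 0, 0, 1, 0, 0, 1, 0],
             [0, 0, 0, 1, 1, 1, 0, 0, 0],
             [1, 0, 0, 0, 1, 0, 0, 0, 1]]
noisy_vertical = [0, 0.9, 0.1, 0, 1, 0, 0.1, 0.8, 0]
print(match_template(TEMPLATES, noisy_vertical))  # 0
```

The non-negativity check reflects why incoherent schemes avoid coherent sources: they give up signed (phase-encoded) values, and normalization of non-negative scores is enough for matching.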
This article introduces the approach of studying the computational complexity of matrix multiplication through the ranks of matrix multiplication tensors. Basic results and recent developments in this area are reviewed.
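The link between tensor rank and the exponent ω is easiest to see in the smallest nontrivial case. Strassen's algorithm multiplies 2×2 matrices with 7 multiplications instead of 8, so the rank of the 2×2 matrix multiplication tensor is at most 7. Applying the scheme recursively to blocks gives ω ≤ log₂ 7 ≈ 2.807. The sketch below writes out the standard Strassen formulas and checks them against the schoolbook product. It illustrates the general approach, not any specific result reviewed in the article.

```python
def strassen_2x2(A, B):
    """2x2 product of numbers using Strassen's 7 bilinear multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Each output entry is a fixed linear combination of the seven products m1..m7.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


def naive_2x2(A, B):
    """Schoolbook 2x2 product: 8 multiplications."""
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)] for i in range(2)]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each `m` is the product of one linear form in A and one in B, which is exactly a rank-one term of the tensor decomposition.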
Optics is a promising candidate for information, data, and image processing. In all-optical data and information processing, light is used as the information-carrying signal because of its inherent parallelism. Several optical methods have been proposed to support such processing. In many algebraic, arithmetic, and image processing schemes, fundamental logic and memory operations are performed with all-optical devices. In this communication, we report an all-optical matrix multiplication operation using a switching circuit based on nonlinear material.
We study asymptotically fast multiplication algorithms for matrix pairs of arbitrary dimensions and optimize the exponents of their arithmetic complexity bounds. For a large class of input matrix pairs, we improve the known exponents. We also show some applications of our results: (i) we decrease the known arithmetic complexity bound for univariate polynomial factorization of degree $n$ over a finite field with $q$ elements from $O(n^{2} + n^{1+o(1)}\log q)$ to $O(n^{1.9998} + n^{1+o(1)}\log q)$; (ii) we decrease from 2.837 to 2.7945 the known exponent of the work and arithmetic processor bounds for fast deterministic (NC) parallel evaluation of the determinant, the characteristic polynomial, and the inverse of an $n \times n$ matrix, as well as for the solution of a nonsingular linear system of $n$ equations; (iii) we decrease the known bound for computing basic solutions to a linear programming problem with $m$ constraints and $n$ variables from $O(m^{1.575} n)$ to $O(m^{1.5356} n)$.
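For context, the baseline that fast rectangular algorithms improve on comes from splitting a rectangular product into square ones. Write $\omega(1,k,1)$ for the exponent of multiplying an $n \times n^{k}$ matrix by an $n^{k} \times n$ matrix ($k \ge 1$). The following short derivation gives both bounds:

```latex
% Split A (n x n^k) into n^{k-1} square blocks A_t, and B (n^k x n) into matching blocks B_t:
\[
  AB \;=\; \sum_{t=1}^{n^{k-1}} A_t B_t ,
  \qquad A_t, B_t \in \mathbb{F}^{n \times n}.
\]
% n^{k-1} square products plus n^{k-1} additions of n x n matrices give the upper bound;
% reading the 2n^{k+1} input entries gives the lower bound:
\[
  \max(2,\; k+1) \;\le\; \omega(1,k,1) \;\le\; \omega + (k-1).
\]
```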
With massive data, matrix operations are very computationally intensive, and the memory limitation of standalone mode makes the system inefficient. At the same time, hardware implementations of matrix operations struggle to switch flexibly between different requirements. To address these problems, this paper proposes a matrix operation accelerator based on reconfigurable arrays for recommender systems (RS). Building on the reconfigurable array processor APR-16, a flexible parallelized design of matrix operations on the processing element (PE) array is realized. Experimental results show that, compared with a previously proposed matrix multiplication framework using a hybrid central processing unit (CPU) and graphics processing unit (GPU) implementation, the proposed accelerator improves the energy efficiency ratio by about 35×. Compared with blocked alternating least squares (BALS), its energy efficiency ratio is improved by about 1×, and it can switch between matrix factorization (MF) schemes suited to different sparsity levels.
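As an illustration of what parallelizing a matrix product across a PE array involves, here is a sketch of one simple mapping. It is a hypothetical output-stationary, cyclic distribution of output elements over a `pe_rows × pe_cols` grid, not APR-16's actual mapping, which the abstract does not describe. The function also reports the multiply-accumulate (MAC) count per PE so load balance can be checked.

```python
def pe_array_matmul(A, B, pe_rows=4, pe_cols=4):
    """Output-stationary, cyclic mapping of C = A @ B onto a pe_rows x pe_cols PE grid.

    PE (r, c) computes every output C[i][j] with i % pe_rows == r and
    j % pe_cols == c; the function returns C and the MAC count per PE.
    """
    n, inner, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    macs = {}
    for r in range(pe_rows):              # on hardware, all PEs run concurrently
        for c in range(pe_cols):
            count = 0
            for i in range(r, n, pe_rows):
                for j in range(c, m, pe_cols):
                    C[i][j] = sum(A[i][t] * B[t][j] for t in range(inner))
                    count += inner
            macs[(r, c)] = count
    return C, macs


A = [[i * 8 + j for j in range(8)] for i in range(8)]
identity = [[1 if i == j else 0 for j in range(8)] for i in range(8)]
C, macs = pe_array_matmul(A, identity)
print(C == A, set(macs.values()))  # True {32}: each PE owns 4 outputs x 8 MACs
```

A cyclic assignment keeps the work per PE equal whenever the matrix dimensions are multiples of the grid dimensions.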
The purpose of this note is to establish a general representation of Hankel matrices of Bell numbers and the convoluted Bell numbers. As a special case, the results of Aigner are extended.
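Aigner's classical result, which this note extends, states that the Hankel determinants $\det(B_{i+j})_{0 \le i,j \le n}$ of the Bell numbers equal the superfactorial $0!\,1!\cdots n!$. The sketch below checks this numerically for small $n$. It is a sanity check of the known special case, not the note's general representation, and the helper names are illustrative.

```python
from fractions import Fraction
from math import factorial, prod


def bell_numbers(count):
    """First `count` Bell numbers B_0, B_1, ... via the Bell (Aitken) triangle."""
    bells, row = [1], [1]
    while len(bells) < count:
        new_row = [row[-1]]              # each row starts with the previous row's last entry
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells[:count]


def determinant(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    n, det = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return det


def hankel_bell_det(n):
    """det(B_{i+j}) for 0 <= i, j <= n."""
    B = bell_numbers(2 * n + 1)
    return determinant([[B[i + j] for j in range(n + 1)] for i in range(n + 1)])


for n in range(6):
    assert hankel_bell_det(n) == prod(factorial(k) for k in range(n + 1))
print(bell_numbers(7))  # [1, 1, 2, 5, 15, 52, 203]
```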
In this paper, we consider the problem of tracking the irregular shapes of multiple extended targets by introducing the Gaussian surface matrix (GSM) into the framework of random finite set (RFS) theory. A Gaussian surface function is first constructed from the measurements and is then used to define the GSM via a mapping function. We integrate the GSM with the probability hypothesis density (PHD) filter, derive the Bayesian recursion formulas of the GSM-PHD filter, and employ a Gaussian mixture implementation to obtain closed-form solutions. Moreover, the estimated shapes are used to guide the sub-partitioning of the measurement set, which addresses the problem of tracking spatially close targets. Simulation results show that the proposed algorithm can effectively estimate irregular target shapes and exhibits good robustness when tracking crossing extended targets.
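The abstract states that a Gaussian mixture implementation yields closed-form recursions. As background only, the sketch below shows the standard scalar Gaussian-mixture PHD measurement update for point targets observed as z = x + noise. It does not include the GSM extension or extended-target measurement partitioning. The detection probability, noise variance, and clutter density are illustrative parameters.

```python
import math


def gaussian_pdf(x, mean, var):
    """Scalar normal density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def gm_phd_update(components, measurements, p_detect, meas_var, clutter_density):
    """Scalar GM-PHD measurement update for point targets with z = x + noise.

    components: predicted PHD as a list of (weight, mean, variance).
    Returns the updated list of (weight, mean, variance).
    """
    # Missed-detection terms keep their Gaussian, scaled by (1 - p_D).
    updated = [((1 - p_detect) * w, m, P) for w, m, P in components]
    for z in measurements:
        detected = []
        for w, m, P in components:
            S = P + meas_var                  # innovation variance
            K = P / S                         # Kalman gain
            q = gaussian_pdf(z, m, S)         # measurement likelihood
            detected.append((p_detect * w * q, m + K * (z - m), (1 - K) * P))
        norm = clutter_density + sum(w for w, _, _ in detected)
        updated.extend((w / norm, m, P) for w, m, P in detected)
    return updated


def expected_target_count(components):
    """The PHD integrates to the expected number of targets: the sum of weights."""
    return sum(w for w, _, _ in components)


post = gm_phd_update([(1.0, 0.0, 1.0)], [0.0], p_detect=0.9, meas_var=1.0, clutter_density=1e-3)
print(round(expected_target_count(post), 3))  # 1.096
```

Every term stays Gaussian because the prediction, likelihood, and update are linear-Gaussian. That property is what makes the mixture recursion closed-form.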
Over the past decade, graphics processing units (GPUs) have revolutionized high-performance computing, playing pivotal roles in advancing fields such as IoT, autonomous vehicles, and exascale computing. Despite these advances, programming GPUs efficiently remains a daunting challenge that often relies on trial-and-error optimization. This paper introduces an optimization technique for CUDA programs based on a novel data layout strategy, which restructures the arrangement of data in memory to significantly improve data access locality. The technique targets the dynamic programming algorithm for chained matrix multiplication, a critical operation across domains including artificial intelligence (AI), high-performance computing (HPC), and the Internet of Things (IoT), and makes its memory accesses more localized. We specifically illustrate the importance of efficient matrix multiplication in these areas, underscoring the technique's broader applicability and its potential to address some of the most pressing computational challenges in GPU-accelerated applications. Our findings show a marked reduction in memory consumption and a 50% decrease in execution time for CUDA programs using this technique, thereby setting a new benchmark for optimization in GPU computing.
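To show what a locality-oriented layout for this dynamic program can look like, here is a sketch of one common choice. It stores the upper-triangular cost table diagonal by diagonal in a packed array. The abstract does not say this is the paper's layout, so treat `diag_index` as a hypothetical illustration. Cells on the same diagonal are independent, so a GPU could compute one diagonal per step with one thread per cell.

```python
def diag_index(i, j, n):
    """Position of cell (i, j), i <= j, in a diagonal-major packed upper triangle."""
    d = j - i
    return d * n - d * (d - 1) // 2 + i


def matrix_chain_cost(dims):
    """Minimum scalar multiplications for A_0 ... A_{n-1}, where A_i is dims[i] x dims[i+1]."""
    n = len(dims) - 1
    cost = [0] * (n * (n + 1) // 2)       # diagonal 0 (single matrices) costs nothing
    for d in range(1, n):                  # cells on one diagonal are independent;
        for i in range(n - d):             # a GPU kernel could assign one thread per i
            j = i + d
            cost[diag_index(i, j, n)] = min(
                cost[diag_index(i, k, n)] + cost[diag_index(k + 1, j, n)]
                + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j))
    return cost[diag_index(0, n - 1, n)]


print(matrix_chain_cost([30, 35, 15, 5, 10, 20, 25]))  # 15125
```

With this packing, for a fixed split offset s = k − i, neighbouring values of i read neighbouring entries of diagonals s and d − s − 1 and write neighbouring entries of diagonal d. That contiguity is the property a GPU needs for coalesced memory access.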
Food is one of the biggest industries in both developed and underdeveloped countries. Supply chain sustainability is essential in established and emerging economies because of the rising acceptance of cost-based outsourcing and growing technological, social, and environmental concerns. The food business faces serious sustainability and growth challenges in developing countries. This study presents a comprehensive analysis of the critical success factors (CSFs) influencing performance outcomes and the sustainable supply chain management (SSCM) process. A theoretical framework is established to explain how these factors are used to examine the organizational aspect of food supply chain life cycle analysis. The study examined the CSFs and revealed the relationships between them using a methodology that combined a literature review, interpretive structural modeling (ISM), and cross-impact matrix multiplication applied to classification (MICMAC) analysis of soil liquefaction factors. The findings demonstrate that food quality and safety are important factors that directly affect the other factors; to make sustainable food supply chain management more effective, legislators, managers, and experts need to pay attention to this factor. This work also shows that companies aiming to create a sustainable business model must make sustainability a fundamental tenet of their organization. Practitioners and managers can use the recommended methodology to devise effective long-term plans for establishing a sustainable food supply chain.
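ISM and MICMAC reduce to a few Boolean matrix operations, which is where the "matrix multiplication" in MICMAC comes from. The minimal sketch below makes three assumptions: a binary initial reachability matrix, transitivity obtained by repeated Boolean squaring, and a quadrant boundary at n/2, although conventions for the boundary vary. The function names are hypothetical.

```python
def bool_matmul(X, Y):
    """Boolean matrix product: (X Y)[i][j] = OR_k (X[i][k] AND Y[k][j])."""
    n = len(X)
    return [[int(any(X[i][k] and Y[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


def final_reachability(M):
    """ISM final reachability matrix: add self-loops, then square until stable (transitivity)."""
    n = len(M)
    R = [[1 if i == j else M[i][j] for j in range(n)] for i in range(n)]
    while True:
        R_next = bool_matmul(R, R)
        if R_next == R:
            return R
        R = R_next


def micmac(R):
    """Driving power (row sums), dependence (column sums) and MICMAC quadrant per factor."""
    n = len(R)
    driving = [sum(row) for row in R]
    dependence = [sum(R[i][j] for i in range(n)) for j in range(n)]
    mid = n / 2                           # assumed quadrant boundary
    quadrant = []
    for d, p in zip(driving, dependence):
        if d > mid and p > mid:
            quadrant.append("linkage")
        elif d > mid:
            quadrant.append("independent")
        elif p > mid:
            quadrant.append("dependent")
        else:
            quadrant.append("autonomous")
    return driving, dependence, quadrant


# Hypothetical chain of influence: factor 0 -> 1 -> 2 -> 3.
M = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
print(micmac(final_reachability(M)))
# ([4, 3, 2, 1], [1, 2, 3, 4], ['independent', 'independent', 'dependent', 'dependent'])
```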
In this paper, we propose a practical and dynamic key management scheme based on the Rabin public key system and a set of matrices with canonical matrix multiplication to solve the access control problem in an arbitrary partially ordered user hierarchy. Its advantage is that a security class at a higher level can derive the secret key of any of its successors directly and efficiently, and the scheme remains dynamic when a new security class is added to or removed from the hierarchy. Even the ex-member problem can be solved efficiently. Moreover, any user can freely change its own key for security reasons.
In this article, we give the area formula of the closed projection curve of a closed space curve in the Lorentzian 3-space L3. For a one-parameter closed Lorentzian space motion in L3, we obtain a Holditch theorem that takes Lorentzian matrix multiplication into account, for closed space curves and their orthogonal projections onto the Euclidean plane in the fixed Lorentzian space. Moreover, we generalize this Holditch theorem to three noncollinear fixed points of the moving Lorentzian space and any other fixed point on the plane determined by these three points.
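For orientation, the classical Euclidean Holditch theorem, which results of this kind generalize, and the Lorentzian inner product on L3 (in one common sign convention) read:

```latex
% Classical Holditch theorem: a chord of fixed length a + b makes one full revolution
% with both ends on a closed convex curve C; the point X dividing the chord into
% lengths a and b traces a closed curve C_X. Then
\[
  \operatorname{Area}(C) - \operatorname{Area}(C_X) = \pi a b .
\]
% Lorentzian 3-space L^3 is R^3 equipped with the indefinite inner product
\[
  \langle x, y \rangle_L = x_1 y_1 + x_2 y_2 - x_3 y_3 .
\]
```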
The performance of existing diffusion-based recommendation algorithms is still limited by the processing ability of a single computer. To perform diffusion computation on large datasets, a parallel implementation of the classic diffusion method on the MapReduce framework is proposed. First, the diffusion computation is transformed from a summation format into a cascaded matrix multiplication format. Then, a parallel matrix multiplication algorithm based on dynamic vectors is proposed to reduce the CPU and I/O cost on the MapReduce framework; this algorithm can also be applied to other parallel matrix multiplication scenarios. Block partitioning is then used to further improve performance, and the order of the matrix multiplications is also taken into consideration. Experiments on different kinds of datasets verify the efficiency of the proposed method.
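The summation-to-cascade transformation can be made concrete under one assumption: that the classic method is mass diffusion (ProbS), in which each item splits its resource equally among its users and each user then splits its resource equally among its items. Each step is then one sparse matrix-vector product, written below in map/reduce style. The sketch does not implement the paper's dynamic-vector algorithm, and the function names are hypothetical.

```python
from collections import defaultdict


def mapreduce_spmv(entries, x):
    """One sparse matrix-vector product in map/reduce style.

    entries: iterable of (row, col, value) triples; x: dict col -> value.
    Map emits (row, value * x[col]); shuffle groups by row; reduce sums.
    """
    mapped = [(r, v * x[c]) for r, c, v in entries if c in x]    # map
    reduced = defaultdict(float)                                   # shuffle + reduce
    for r, partial in mapped:
        reduced[r] += partial
    return dict(reduced)


def mass_diffusion(collections, user):
    """Item scores for `user` after item -> user -> item diffusion (ProbS-style splitting).

    collections: dict user -> set of collected items.
    """
    item_degree = defaultdict(int)
    for items in collections.values():
        for item in items:
            item_degree[item] += 1
    # Initial resource: one unit on every item the target user has collected.
    f0 = {item: 1.0 for item in collections[user]}
    # Step 1 (items -> users): each item splits its resource among its users.
    to_users = [(u, i, 1.0 / item_degree[i]) for u, items in collections.items() for i in items]
    user_resource = mapreduce_spmv(to_users, f0)
    # Step 2 (users -> items): each user splits its resource among its items.
    to_items = [(i, u, 1.0 / len(items)) for u, items in collections.items() for i in items]
    return mapreduce_spmv(to_items, user_resource)


scores = mass_diffusion({"u1": {"A", "B"}, "u2": {"B", "C"}}, "u1")
print(sorted(scores.items()))  # [('A', 0.75), ('B', 1.0), ('C', 0.25)]
```

Evaluating the cascade right to left as two vector products avoids ever forming the item-by-item transfer matrix. This is one reason the multiplication order matters at scale.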
Haze in China is primarily caused by high concentrations of atmospheric fine particulate matter (PM2.5). However, the detailed source structure of PM2.5 light extinction has not been well established, especially the roles of the various organic aerosols, which leaves haze management without specific targets. This study obtained the mass concentrations of the chemical components and the light extinction coefficients of fine particles in winter in Dongguan, Guangdong Province, using high-time-resolution aerosol observation instruments. We combined positive matrix factorization (PMF) analysis of organic aerosols with multiple linear regression to establish a quantitative model relating the main chemical components, in particular the different sources of organic aerosols, to the extinction coefficients of fine particles, with a high goodness of fit (R² = 0.953). The results show that the contribution rates of ammonium sulphate, ammonium nitrate, biomass burning organic aerosol (BBOA), secondary organic aerosol (SOA), and black carbon (BC) were 48.1%, 20.7%, 15.0%, 10.6%, and 5.6%, respectively. Secondary aerosols therefore contribute much more than primary aerosols (79.4% versus 20.6%) and are a major factor in the decline in visibility. BBOA has a high potential to degrade visibility, owing to its high mass extinction coefficient, and was the largest contributor during some high-pollution periods. A more detailed analysis indicates that the enhanced absorption caused by the BC mixing state accounted for approximately 37.7% of the total particle absorption and should not be neglected.
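The regression-based apportionment can be sketched in a few lines under stated assumptions. Extinction is modeled as a linear combination of component concentrations with no intercept. Coefficients come from ordinary least squares via the normal equations, and each component's contribution rate is its coefficient times its mean concentration, divided by the sum over all components. The data below are made up purely to exercise the code, and the function names are hypothetical.

```python
from fractions import Fraction


def solve_linear(M, v):
    """Solve M b = v exactly (Gauss-Jordan elimination; M square and nonsingular)."""
    n = len(M)
    aug = [[Fraction(x) for x in M[i]] + [Fraction(v[i])] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[n] for row in aug]


def fit_extinction(X, y):
    """Least-squares coefficients (no intercept) via the normal equations X^T X b = X^T y."""
    p = len(X[0])
    XtX = [[sum(Fraction(row[i]) * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(Fraction(row[i]) * t for row, t in zip(X, y)) for i in range(p)]
    return solve_linear(XtX, Xty)


def contribution_shares(b, X):
    """Share of mean extinction per component: b_k * mean(x_k) / sum over all components."""
    means = [sum(Fraction(row[k]) for row in X) / len(X) for k in range(len(b))]
    parts = [bk * mk for bk, mk in zip(b, means)]
    total = sum(parts)
    return [part / total for part in parts]


# Made-up, noise-free data: two components with true coefficients 3 and 5.
X = [[1, 2], [2, 1], [3, 3], [4, 1]]
y = [13, 11, 24, 17]
b = fit_extinction(X, y)          # b == [3, 5]
shares = contribution_shares(b, X)  # shares == [6/13, 7/13]
```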
Let $S=\{x_1,x_2,\dots,x_n\}$ be a set of $n$ distinct positive integers and let $f$ be an arithmetic function. Writing $[x_i,x_j]$ for the least common multiple of $x_i$ and $x_j$, we denote by $(f[S])$ the $n\times n$ matrix whose $(i,j)$ entry is $\sum_{l\in S,\ [x_i,x_j]\mid l} f(l)$, and we also consider the $n\times n$ matrix whose $(i,j)$ entry is $\sum_{x\in S} f(x)-\sum_{l\in S,\ x_i\mid l} f(l)-\sum_{l\in S,\ x_j\mid l} f(l)+\sum_{l\in S,\ [x_i,x_j]\mid l} f(l)$. In this paper, we first investigate the structures of these two matrices and then give formulae for their determinants. These results extend those obtained by Bege in 2011. Finally, we give two examples to demonstrate the validity of our main results.
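By inclusion-exclusion, the $(i,j)$ entry of the second matrix is the sum of $f(l)$ over the $l \in S$ divisible by neither $x_i$ nor $x_j$. The sketch below builds both matrices for an example set and checks that reading. The choices of $S$ and $f$ and the function names are illustrative. It requires Python 3.9+ for `math.lcm`.

```python
from math import lcm


def lcm_sum_matrix(S, f):
    """Entry (i, j): sum of f(l) over l in S divisible by lcm(x_i, x_j)."""
    return [[sum(f(l) for l in S if l % lcm(xi, xj) == 0) for xj in S] for xi in S]


def complement_matrix(S, f):
    """Entry (i, j): total - multiples of x_i - multiples of x_j + multiples of lcm(x_i, x_j)."""
    total = sum(f(x) for x in S)

    def multiples(d):
        return sum(f(l) for l in S if l % d == 0)

    return [[total - multiples(xi) - multiples(xj) + multiples(lcm(xi, xj)) for xj in S]
            for xi in S]


S = [1, 2, 3, 4, 6]
f = lambda x: x                       # illustrative arithmetic function
M, C = lcm_sum_matrix(S, f), complement_matrix(S, f)
# Inclusion-exclusion check: C[i][j] sums f over l divisible by neither x_i nor x_j.
for i, xi in enumerate(S):
    for j, xj in enumerate(S):
        assert C[i][j] == sum(f(l) for l in S if l % xi and l % xj)
print(M[1][2], C[1][2])  # 6 1  (x_i = 2, x_j = 3)
```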
It is currently accepted that the intermolecular forces involved in gas-liquid chromatography (GLC) can be expressed as a product of parameters (or descriptors) of solutes and parameters of solvents. The present study is limited to solute parameters, and among them to the three involved in van der Waals forces; the two involved in hydrogen bonding are set aside at this stage. These three parameters, which we call δ, ω, and ε, respectively reflect the three types of van der Waals forces: dispersion, orientation (polarity in the strict sense), and induction-polarizability. In previous studies, these parameters were obtained experimentally for 121 volatile organic compounds (VOCs) through an original Multiplicative Matrix Analysis (MMA) applied to a superabundant and accurate GLC data set. Also in previous studies, attempts were made to predict these parameters with a Simplified Molecular Topology (SMT) procedure. Because those published results were somewhat disappointing, a promising new prediction strategy is developed and detailed in the present article.
The adjacency matrix method for identifying isomorphism of planar kinematic chains with multiple joints and higher pairs is presented. The topological invariants of a planar kinematic chain can be calculated and compared using its adjacency matrix. The amount of calculation can be reduced effectively by dividing the bars into several divisions and reconfiguring the adjacency matrix accordingly. Two structural characteristics of the adjacency matrix, the number of divisions and the division code, are presented. Whether two kinematic chains are isomorphic can be determined by comparing the structural characteristics of their adjacency matrices with a method called row-to-row matching. This method can also be applied to planar linkage chains. Isomorphism identification is thus unified for planar kinematic chains with or without higher pairs and with or without multiple joints. The method is also intuitive, simple, and convenient for computer processing.
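The general adjacency-matrix approach can be illustrated with a brute-force sketch. Links are grouped by a simple invariant (the row sum), and only permutations that respect those groups are tried. This is not the paper's division code or row-to-row matching. It assumes an encoding of 1 for a lower pair and 2 for a higher pair, and it does not model multiple joints. The function names are hypothetical.

```python
from itertools import permutations


def adjacency(n, joints):
    """Adjacency matrix of a kinematic chain: links are vertices, joints are edges.

    joints: iterable of (link_a, link_b, kind); kind 1 = lower pair, 2 = higher pair
    (an assumed encoding).
    """
    A = [[0] * n for _ in range(n)]
    for a, b, kind in joints:
        A[a][b] = A[b][a] = kind
    return A


def isomorphic(A, B):
    """Brute-force isomorphism test pruned by a row-sum (degree) invariant."""
    n = len(A)
    if n != len(B):
        return False
    deg_a = [sum(row) for row in A]
    deg_b = [sum(row) for row in B]
    if sorted(deg_a) != sorted(deg_b):
        return False                      # different invariants: certainly not isomorphic
    for perm in permutations(range(n)):
        if any(deg_a[i] != deg_b[perm[i]] for i in range(n)):
            continue                      # links may only map to links of the same class
        if all(A[i][j] == B[perm[i]][perm[j]] for i in range(n) for j in range(n)):
            return True
    return False


# Watt and Stephenson six-bar chains: same degree sequence, different topology.
watt = adjacency(6, [(0, 1, 1), (0, 2, 1), (2, 3, 1), (3, 1, 1), (0, 4, 1), (4, 5, 1), (5, 1, 1)])
stephenson = adjacency(6, [(0, 2, 1), (2, 1, 1), (0, 3, 1), (3, 4, 1), (4, 1, 1), (0, 5, 1), (5, 1, 1)])
print(isomorphic(watt, stephenson))  # False
```

Brute force grows factorially with the number of links. Invariant-based partitioning, like the divisions described in the abstract, exists precisely to prune this search.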
Funding for the paper on coded computation offloading in multi-server MEC networks: supported by the NSF of China under grant U1908210 and the National Key R&D Project of China under grant 2019YFB1802702.
Funding for the paper on reconfigurable matrix multiplication instructions for VLIW cipher processors: supported in part by the open project foundation of the State Key Laboratory of Cryptology, the National Natural Science Foundation of China (NSFC) under Grant Nos. 61272492, 61572521, and 61309008, and the Natural Science Foundation for Young of Shaanxi Province under Grant No. 2013JQ8013.
Funding for the paper on SpGEMM cache optimization: supported by the National High Technology Research and Development Programme of China (Nos. 2010AA012302 and 2009AA01A134) and the Tsinghua National Laboratory for Information Science and Technology (TNList) Cross-discipline Foundation.
Funding for the JOMMA paper: supported in part by the National Key Research and Development Program of China under Grant No. 2022ZD0115301 and the National Natural Science Foundation of China under Grant Nos. 61972447 and 61832006.
Funding for the article on ranks of matrix multiplication tensors: this work was partially supported by the National Natural Science Foundation of China (Nos. 11431002, 11771328, 11871051), the Young Elite Scientists Sponsorship Program by Tianjin, the Natural Science Foundation of Zhejiang Province (No. LD19A010002), and the Innovation Research Foundation of Tianjin University (No. 2017XRG-0015).
Funding for the paper on the reconfigurable-array matrix operation accelerator for recommender systems: supported by the National Key R&D Program of China (No. 2022ZD0119001), the National Natural Science Foundation of China (No. 61834005), the Shaanxi Province Key R&D Plan (No. 2022GY-027), the Key Scientific Research Project of Shaanxi Department of Education (No. 22JY060), the Education Research Project of Xi'an University of Posts and Telecommunications (No. JGA202108), and the Graduate Student Innovation Fund of Xi'an University of Posts and Telecommunications (No. CXJJYL2022035).
Abstract: With massive data, matrix operations are computationally intensive, and the memory limits of a standalone machine make the system inefficient. At the same time, hardware implementations of matrix operations struggle to switch flexibly between different requirements. To address this problem, this paper proposes a matrix-operation accelerator based on reconfigurable arrays for recommender systems (RS). On the reconfigurable array processor APR-16, a flexible, parallelized design of matrix operations on the processing element (PE) array is realized. The experimental results show that, compared with the CPU and GPU hybrid matrix multiplication framework, the proposed accelerator improves the energy efficiency ratio by about 35×. Here CPU stands for central processing unit and GPU for graphics processing unit. Compared with blocked alternating least squares (BALS), its energy efficiency ratio improves by about 1×. The accelerator can also switch between matrix factorization (MF) schemes suited to different sparsity levels.
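The abstract does not give the APR-16 mapping. The sketch below is minimal and assumes a hypothetical output-stationary scheme. Output elements are distributed round-robin over a `grid` x `grid` array of PEs, and each PE accumulates its own dot products. Real hardware runs the PEs concurrently. The sketch simply loops over them and counts multiply-accumulate (MAC) operations per PE to show the load balance.

```python
from collections import defaultdict


def pe_array_matmul(A, B, grid=4):
    """Simulate C = A @ B on a hypothetical grid x grid PE array.

    Returns the product and the number of MACs each PE performed.
    """
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("inner dimensions do not match")
    # Round-robin assignment: output (i, j) belongs to PE (i % grid, j % grid).
    assignment = defaultdict(list)
    for i in range(n):
        for j in range(p):
            assignment[(i % grid, j % grid)].append((i, j))
    C = [[0] * p for _ in range(n)]
    macs = {}
    # On hardware all PEs would run in parallel; here we visit them in turn.
    for pe, outputs in assignment.items():
        count = 0
        for i, j in outputs:
            acc = 0
            for k in range(m):
                acc += A[i][k] * B[k][j]
                count += 1
            C[i][j] = acc
        macs[pe] = count
    return C, macs
```

For an 8×3 by 3×8 product on a 4×4 array, every PE receives four outputs and performs 12 MACs, so the work is evenly balanced.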
Abstract: The purpose of this note is to establish a general representation of Hankel matrices of Bell numbers and of convoluted Bell numbers. As a special case, the results of Aigner are extended.
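For context, a classical result behind Aigner's work is that the Hankel determinants of the Bell numbers are superfactorials:

$\det(B_{i+j})_{0\le i,j\le n} = \det(B_{i+j+1})_{0\le i,j\le n} = \prod_{k=0}^{n} k!$

The sketch below only checks this identity numerically with exact rational arithmetic. It does not reproduce the paper's general representation or cover the convoluted Bell numbers.

```python
from fractions import Fraction
from math import factorial


def bell_numbers(count):
    """Return the first `count` Bell numbers B_0, B_1, ... via the Bell triangle."""
    bells, row = [1], [1]
    while len(bells) < count:
        # Each row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells[:count]


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


def hankel_bell_det(n, shift=0):
    """det(B_{i+j+shift}) for 0 <= i, j <= n."""
    b = bell_numbers(2 * n + shift + 1)
    return determinant([[b[i + j + shift] for j in range(n + 1)] for i in range(n + 1)])


def superfactorial(n):
    """Product 0! * 1! * ... * n!."""
    prod = 1
    for k in range(n + 1):
        prod *= factorial(k)
    return prod


for n in range(5):
    print(n, hankel_bell_det(n), hankel_bell_det(n, 1), superfactorial(n))
```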
Funding: supported by the National Natural Science Foundation of China (61305017, 61304264, 61402203) and the Natural Science Foundation of Jiangsu Province (BK20130154).
Abstract: In this paper, we consider the problem of tracking the irregular shapes of multiple extended targets by introducing the Gaussian surface matrix (GSM) into the framework of random finite set (RFS) theory. A Gaussian surface function is first constructed from the measurements and is then used to define the GSM via a mapping function. We integrate the GSM with the probability hypothesis density (PHD) filter and derive the Bayesian recursion formulas of the GSM-PHD filter. A Gaussian mixture implementation is then employed to obtain closed-form solutions. Moreover, the estimated shapes are used to guide the sub-partitioning of the measurement set, which addresses the problem of tracking spatially close targets. Simulation results show that the proposed algorithm can effectively estimate irregular target shapes and is robust when tracking crossing extended targets.
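The exact Gaussian surface function and mapping used for the GSM are not given in the abstract. As a hypothetical sketch only, the code below builds a sum of isotropic Gaussian kernels centred on 2-D measurements. It then maps the surface to a matrix by sampling it on a regular grid. The kernel width `sigma` and the grid coordinates are assumed parameters.

```python
import math


def gaussian_surface_matrix(measurements, xs, ys, sigma=1.0):
    """Sample a sum-of-Gaussians surface built from 2-D measurements.

    Hypothetical mapping: entry [r][c] is the surface value at (xs[c], ys[r]).
    """
    two_sigma_sq = 2.0 * sigma * sigma
    matrix = []
    for y in ys:
        row = []
        for x in xs:
            # Each measurement contributes one unit-height Gaussian bump.
            value = sum(math.exp(-((x - mx) ** 2 + (y - my) ** 2) / two_sigma_sq)
                        for mx, my in measurements)
            row.append(value)
        matrix.append(row)
    return matrix


surface = gaussian_surface_matrix([(2.0, 1.0), (0.5, 0.5)], xs=[0, 1, 2, 3], ys=[0, 1, 2])
```

Dense clusters of measurements produce high ridges in the matrix, so its level sets give a rough outline of the extended target's shape.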
Abstract: Over the past decade, Graphics Processing Units (GPUs) have revolutionized high-performance computing, playing pivotal roles in advancing fields such as IoT, autonomous vehicles, and exascale computing. Despite these advancements, programming GPUs efficiently remains a daunting challenge that often relies on trial-and-error optimization. This paper introduces an optimization technique for CUDA programs based on a novel data layout strategy. The strategy restructures how data is arranged in memory to significantly enhance data access locality. The technique focuses on the dynamic programming algorithm for chained matrix multiplication, a critical operation in domains including artificial intelligence (AI), high-performance computing (HPC), and the Internet of Things (IoT), and enables more localized access. We specifically illustrate the importance of efficient matrix multiplication in these areas, underscoring the technique's broader applicability and its potential to address some of the most pressing computational challenges in GPU-accelerated applications. Our findings reveal a remarkable reduction in memory consumption and a substantial 50% decrease in execution time for CUDA programs that use this technique, thereby setting a new benchmark for optimization in GPU computing.
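The dynamic program fills a cost table diagonal by diagonal. All cells on one diagonal are independent, so they map naturally to parallel GPU threads. The sketch below shows one plausible locality-friendly layout. This is an assumption, not necessarily the paper's layout. It stores the upper-triangular table in diagonal-major order, so the cells of each diagonal occupy consecutive addresses. A warp computing one diagonal would therefore write contiguous memory. The sketch is plain Python rather than CUDA.

```python
def diag_offset(d, n):
    """Start index of diagonal d (cells with j - i == d) in the flat array."""
    return d * n - d * (d - 1) // 2


def idx(i, j, n):
    """Flat index of cell (i, j), i <= j, in diagonal-major order."""
    return diag_offset(j - i, n) + i


def matrix_chain_cost(dims):
    """Minimum scalar multiplications for A_0 ... A_{n-1}, A_k of shape dims[k] x dims[k+1]."""
    n = len(dims) - 1
    if n < 1:
        raise ValueError("need at least one matrix")
    cost = [0] * (n * (n + 1) // 2)  # diagonal 0 (single matrices) costs 0
    for d in range(1, n):
        # Cells on diagonal d depend only on earlier diagonals: one thread each on a GPU.
        for i in range(n - d):
            j = i + d
            cost[idx(i, j, n)] = min(
                cost[idx(i, k, n)] + cost[idx(k + 1, j, n)]
                + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[idx(0, n - 1, n)]


print(matrix_chain_cost([10, 30, 5, 60]))  # 4500
```

In the usage line, $(A_0A_1)A_2$ costs $10\cdot30\cdot5 + 10\cdot5\cdot60 = 4500$ scalar multiplications.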
Abstract: Food is one of the largest industries in both developed and underdeveloped countries. Supply chain sustainability is essential in established and emerging economies because of the rising acceptance of cost-based outsourcing and growing technological, social, and environmental concerns. In developing countries, the food business faces serious sustainability and growth challenges. This study presents a comprehensive analysis of the critical success factors (CSFs) that influence performance outcomes and the sustainable supply chain management (SSCM) process. A theoretical framework is established to explain how these factors are used to examine the organizational aspect of the food supply chain life cycle analysis. This study examined the CSFs and revealed the relationships between them using a methodology that combined a literature review, interpretive structural modeling (ISM), and cross-impact matrix multiplication applied to classification (MICMAC) tool analysis of soil liquefaction factors. The findings demonstrate that food quality and safety is an important factor with a direct effect on the other factors. To make sustainable food supply chain management more adequate, legislators, managers, and experts need to pay attention to this factor. This work also shows that companies aiming to create a sustainable business model must make sustainability a fundamental tenet of their organization. Practitioners and managers can use the recommended methodology to devise effective long-term plans for establishing a sustainable food supply chain.
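ISM and MICMAC usually follow a standard pipeline with three steps:

1. Take the transitive closure of the direct-influence matrix to obtain the final reachability matrix.
2. Partition the factors into levels.
3. Classify each factor by its driving power (row sum) and dependence power (column sum).

The sketch below uses a hypothetical four-factor chain F0 → F1 → F2 → F3 and assumes a quadrant threshold of $n/2$. Threshold conventions vary between studies.

```python
def reachability(adj):
    """Final reachability matrix: Warshall transitive closure, with self-reachability."""
    n = len(adj)
    R = [[1 if i == j else int(bool(adj[i][j])) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if R[i][k]:
                for j in range(n):
                    if R[k][j]:
                        R[i][j] = 1
    return R


def level_partition(R):
    """ISM levels: a factor is at the current top level when its reachability set
    (within the remaining factors) is contained in its antecedent set."""
    remaining = set(range(len(R)))
    levels = []
    while remaining:
        level = []
        for i in remaining:
            reach = {j for j in remaining if R[i][j]}
            antecedents = {j for j in remaining if R[j][i]}
            if reach <= antecedents:
                level.append(i)
        levels.append(sorted(level))
        remaining -= set(level)
    return levels


def micmac(R):
    """Driving power, dependence power, and MICMAC quadrant (threshold n/2, assumed)."""
    n = len(R)
    driving = [sum(row) for row in R]
    dependence = [sum(R[i][j] for i in range(n)) for j in range(n)]
    mid = n / 2
    labels = []
    for drv, dep in zip(driving, dependence):
        if drv > mid and dep > mid:
            labels.append("linkage")
        elif drv > mid:
            labels.append("independent")
        elif dep > mid:
            labels.append("dependent")
        else:
            labels.append("autonomous")
    return driving, dependence, labels


# Hypothetical direct influences: F0 -> F1 -> F2 -> F3.
ADJ = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
R = reachability(ADJ)
print(level_partition(R), micmac(R))
```

Factors with high driving power and low dependence, like F0 here, would be the "root" factors that such a study recommends prioritizing.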
Abstract: In this paper, we propose a practical and dynamic key management scheme to solve the access control problem in an arbitrary partially ordered user hierarchy. The scheme is based on the Rabin public key system and a set of matrices with canonical matrix multiplication. Its advantage is that a security class at a higher level can derive the secret key of any of its successors directly and efficiently. We also show that the scheme remains dynamic when a new security class is added to or removed from the hierarchy. Even the ex-member problem can be solved efficiently. Moreover, any user can freely change its own key for security reasons.
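The abstract does not detail the matrix-based key derivation, but the Rabin primitive it builds on is standard:

- Encryption squares the message modulo $n = pq$.
- Decryption, with primes $p \equiv q \equiv 3 \pmod 4$, recovers four candidate square roots via the Chinese remainder theorem.

The toy sketch below uses tiny primes and is insecure. It illustrates only the primitive, not the paper's hierarchy scheme. It needs Python 3.8+ for `pow(x, -1, m)`.

```python
def rabin_encrypt(m, n):
    """Rabin encryption: c = m^2 mod n."""
    return pow(m, 2, n)


def rabin_decrypt(c, p, q):
    """Return the set of square roots of c modulo n = p*q (p, q = 3 mod 4)."""
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q must both be congruent to 3 mod 4")
    n = p * q
    # Square roots modulo each prime (valid because p, q = 3 mod 4).
    mp = pow(c, (p + 1) // 4, p)
    mq = pow(c, (q + 1) // 4, q)
    q_inv = pow(q, -1, p)  # q^{-1} mod p
    p_inv = pow(p, -1, q)  # p^{-1} mod q
    roots = set()
    for a in (mp, (p - mp) % p):
        for b in (mq, (q - mq) % q):
            # CRT: x = a (mod p), x = b (mod q).
            roots.add((a * q * q_inv + b * p * p_inv) % n)
    return roots


c = rabin_encrypt(20, 7 * 11)
print(c, sorted(rabin_decrypt(c, 7, 11)))  # 15 [13, 20, 57, 64]
```

In practice the sender adds redundancy to the plaintext so that the receiver can pick the intended root out of the four.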
Abstract: In this article, we give the area formula of the closed projection curve of a closed space curve in Lorentzian 3-space $L^3$. For a one-parameter closed Lorentzian space motion in $L^3$, we obtain a Holditch theorem for closed space curves that takes Lorentzian matrix multiplication into account. We do this by using the orthogonal projections of the curves onto the Euclidean plane in the fixed Lorentzian space. Moreover, we generalize this Holditch theorem to three noncollinear fixed points of the moving Lorentzian space and any other fixed point on the plane determined by these three points.
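For orientation, here is the classical Euclidean Holditch theorem, which the paper generalizes to Lorentzian space motions. Suppose a chord of fixed length $a + b$ slides with both ends on a closed convex curve. Then a point dividing the chord into lengths $a$ and $b$ traces a closed curve, and the area between the two curves is $\pi a b$, independent of the original curve. The circle case can be checked directly with the power of a point. This is the Euclidean statement, not the paper's Lorentzian formula.

```latex
\begin{align*}
  &\text{Circle of radius } R \text{ centred at } O;\ \text{chord } PQ \text{ with } |XP| = a,\ |XQ| = b.\\
  &\text{Power of the point } X:\quad R^2 - |OX|^2 = |XP|\,|XQ| = ab
   \;\Rightarrow\; |OX| = \sqrt{R^2 - ab}.\\
  &\text{Area between the two curves:}\quad \pi R^2 - \pi\,(R^2 - ab) = \pi a b.
\end{align*}
```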
Funding: sponsored by the National High Technology Research and Development Program of China (No. 2011AA01A102) and the Key Program of the Chinese Academy of Sciences (No. KGZD-EW-103-2).
Abstract: The performance of existing diffusion-based algorithms in recommender systems is still limited by the processing capacity of a single computer. To conduct the diffusion computation on large data sets, we propose a parallel implementation of the classic diffusion method on the MapReduce framework. First, the diffusion computation is transformed from a summation format into a cascaded matrix multiplication format. Then, we propose a parallel matrix multiplication algorithm based on dynamic vectors to reduce CPU and I/O costs on the MapReduce framework. This algorithm can also be applied to other parallel matrix multiplication scenarios. Block partitioning is used to further improve performance, and the order of the matrix multiplications is also taken into consideration. Experiments on different kinds of data sets verify the efficiency of the proposed method.
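As a hedged illustration, assume that the "classic diffusion method" is mass diffusion (ProbS) on a user-item bipartite graph. Resource on the target user's items spreads to users, divided by item degree, and then back to items, divided by user degree. Written as a cascade, the update is $f' = A^{\top} D_u^{-1} A D_i^{-1} f$. Applying it right to left keeps every step a sparse matrix-vector product instead of forming the dense item-item matrix. The sketch simulates each product with map, shuffle, and reduce phases in plain Python. It does not reproduce the paper's dynamic-vector algorithm or its block partitioning.

```python
from collections import defaultdict


def map_reduce_spmv(entries, vector):
    """y = M x for sparse M given as (row, col, value) triples, in map/shuffle/reduce style."""
    # Map: each nonzero emits (row, value * x[col]); missing vector keys count as 0.
    mapped = [(row, value * vector.get(col, 0.0)) for row, col, value in entries]
    # Shuffle: group the emitted values by output row.
    groups = defaultdict(list)
    for key, val in mapped:
        groups[key].append(val)
    # Reduce: sum each group.
    return {key: sum(vals) for key, vals in groups.items()}


def mass_diffusion(user_items, target):
    """Two-step mass diffusion scores for every item, seeded by `target`'s items."""
    item_deg = defaultdict(int)
    for items in user_items.values():
        for item in items:
            item_deg[item] += 1
    f = {item: 1.0 for item in user_items[target]}
    # Step 1 (items -> users): divide each item's resource by its degree.
    to_users = [(u, it, 1.0 / item_deg[it]) for u, items in user_items.items() for it in items]
    h = map_reduce_spmv(to_users, f)
    # Step 2 (users -> items): divide each user's resource by the user's degree.
    to_items = [(it, u, 1.0 / len(items)) for u, items in user_items.items() for it in items]
    return map_reduce_spmv(to_items, h)


DATA = {"u0": ["i0", "i1"], "u1": ["i1", "i2"]}
print(mass_diffusion(DATA, "u0"))  # {'i0': 0.75, 'i1': 1.0, 'i2': 0.25}
```

The total resource, 2.0 in the example, is conserved. Unseen items with high scores (here `i2`) become recommendations.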
Funding: supported by the National Natural Science Foundation of China (Nos. 41622304, U1301234), the Ministry of Science and Technology of China (Nos. 2014BAC21B03, 2016YFC0203600), and the Science and Technology Plan of Shenzhen Municipality.
Abstract: Haze in China is primarily caused by heavy pollution from atmospheric fine particulates (PM2.5). However, the detailed source structure of PM2.5 light extinction has not been well established, especially the roles of various organic aerosols, which leaves haze management without specific targets. This study measured the mass concentrations of the chemical components and the light extinction coefficients of fine particles in winter in Dongguan, Guangdong Province, using high-time-resolution aerosol observation instruments. We combined positive matrix factorization (PMF) analysis of organic aerosols with multiple linear regression. This produced a quantitative model relating the main chemical components to the extinction coefficients of fine particles, in particular organic aerosols from different sources. The model has a high goodness of fit ($R^2 = 0.953$). The results show that the contribution rates of ammonium sulphate, ammonium nitrate, biomass burning organic aerosol (BBOA), secondary organic aerosol (SOA), and black carbon (BC) were 48.1%, 20.7%, 15.0%, 10.6%, and 5.6%, respectively. The contribution of secondary aerosols is thus much higher than that of primary aerosols (79.4% versus 20.6%), and secondary aerosols are a major factor in the decline of visibility. BBOA has a high visibility-degrading potential, with a high mass extinction coefficient, and was the largest contributor during some heavy pollution periods. A more detailed analysis indicates that the enhanced absorption caused by the BC mixing state accounted for approximately 37.7% of the total particle absorption and should not be neglected.
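The regression step can be sketched under an assumed no-intercept linear model, $b_{\text{ext}} = \sum_k \alpha_k m_k$. Here $m_k$ is the mass concentration of component $k$ and $\alpha_k$ its fitted mass extinction coefficient. The sketch takes each component's contribution rate to be $\alpha_k \bar m_k / \sum_j \alpha_j \bar m_j$; this definition is also an assumption. The data below are synthetic, not the Dongguan observations. The abstract's own split follows from the same bookkeeping: secondary aerosols give 48.1 + 20.7 + 10.6 = 79.4%, and primary aerosols give 15.0 + 5.6 = 20.6%.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_extinction(concs, b_ext):
    """No-intercept least squares via the normal equations X^T X a = X^T y."""
    K = len(concs[0])
    xtx = [[sum(row[i] * row[j] for row in concs) for j in range(K)] for i in range(K)]
    xty = [sum(row[i] * y for row, y in zip(concs, b_ext)) for i in range(K)]
    return solve_linear(xtx, xty)


def contribution_rates(coeffs, concs):
    """Share of mean extinction attributed to each component."""
    means = [sum(row[k] for row in concs) / len(concs) for k in range(len(coeffs))]
    parts = [a * m for a, m in zip(coeffs, means)]
    total = sum(parts)
    return [p / total for p in parts]


# Synthetic two-component data generated with alpha = (3, 10).
CONCS = [[1, 0], [0, 1], [2, 1], [1, 2]]
B_EXT = [3, 10, 16, 23]
alpha = fit_extinction(CONCS, B_EXT)
print(alpha, contribution_rates(alpha, CONCS))
```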
Funding: supported in part by the National Natural Science Foundation of China (11501387), the Key Program of Universities of Henan Province of China (17A110010), the China Postdoctoral Science Foundation Funded Project (2016M602251), and the Natural Science Foundation of Henan Province (162300410076).
Abstract: Let $S=\{x_1,x_2,\ldots,x_n\}$ be a set of $n$ distinct positive integers and let $f$ be an arithmetic function. We denote by $(f[S])$ the $n\times n$ matrix whose $(i,j)$ entry is $\sum_{l\in S,\ [x_i,x_j]\mid l} f(l)$, where $[x_i,x_j]$ is the least common multiple of $x_i$ and $x_j$. We also consider the companion $n\times n$ matrix whose $(i,j)$ entry is $\sum_{x\in S} f(x)-\sum_{l\in S,\ x_i\mid l} f(l)-\sum_{l\in S,\ x_j\mid l} f(l)+\sum_{l\in S,\ [x_i,x_j]\mid l} f(l)$. In this paper, we first investigate the structures of these two matrices and then give formulae for their determinants. These results extend those obtained by Bege in 2011. Finally, we give two examples to demonstrate the validity of our main results.
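Because $[x_i,x_j] \mid l$ holds exactly when both $x_i \mid l$ and $x_j \mid l$, inclusion-exclusion shows that the $(i,j)$ entry of the companion matrix is the sum of $f(l)$ over the $l \in S$ divisible by neither $x_i$ nor $x_j$. The sketch below builds both matrices for a small example and checks that reading. The function names are illustrative, not the paper's notation.

```python
from math import gcd


def lcm(a, b):
    return a // gcd(a, b) * b


def lcm_sum_matrix(S, f):
    """(i, j) entry: sum of f(l) over l in S with [x_i, x_j] | l."""
    return [[sum(f(l) for l in S if l % lcm(xi, xj) == 0) for xj in S] for xi in S]


def companion_matrix(S, f):
    """(i, j) entry: total - (x_i | l sum) - (x_j | l sum) + ([x_i, x_j] | l sum)."""
    total = sum(f(x) for x in S)

    def div_sum(d):
        return sum(f(l) for l in S if l % d == 0)

    return [[total - div_sum(xi) - div_sum(xj) + div_sum(lcm(xi, xj)) for xj in S]
            for xi in S]


def neither_divides_matrix(S, f):
    """(i, j) entry: sum of f(l) over l in S divisible by neither x_i nor x_j."""
    return [[sum(f(l) for l in S if l % xi != 0 and l % xj != 0) for xj in S] for xi in S]


S = [1, 2, 3, 6]
identity = lambda x: x
print(lcm_sum_matrix(S, identity))
print(companion_matrix(S, identity))
```

For $S=\{1,2,3,6\}$ and $f(x)=x$, the $(2,3)$ entries are $f(6)=6$ for the first matrix and $12-8-9+6=1=f(1)$ for the companion.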
Abstract: It is currently accepted that the intermolecular forces involved in gas-liquid chromatography (GLC) can be expressed as a product of parameters (or descriptors) of solutes and parameters of solvents. The present study is limited to the solute parameters. Among them, three are involved in van der Waals forces, whereas the two involved in hydrogen bonding are set aside at this stage. These three parameters, which we call δ, ω, and ε, respectively reflect the three types of van der Waals forces: dispersion, orientation (polarity in the strict sense), and induction-polarizability. In previous studies, these parameters were obtained experimentally for 121 volatile organic compounds (VOCs) via an original Multiplicative Matrix Analysis (MMA) applied to a superabundant and accurate GLC data set. Also in previous studies, attempts were made to predict these parameters via a Simplified Molecular Topology (SMT) procedure. Because those published results were somewhat disappointing, a promising new prediction strategy is developed and detailed in the present article.
Abstract: An adjacency matrix method is presented for identifying isomorphism of planar kinematic chains with multiple joints and higher pairs. The topological invariants of a planar kinematic chain can be calculated and compared using its adjacency matrix. Using several divisions of the bars and reconfiguring the adjacency matrix effectively reduces the amount of computation. Two structural characteristics of the adjacency matrix are presented: the number of divisions and the division code. Whether two kinematic chains are isomorphic can be determined by comparing the structural characteristics of their adjacency matrices with a method called row-to-row matching. The method also applies to planar linkage chains. Isomorphism identification is thus unified for planar kinematic chains with or without higher pairs and with or without multiple joints. In addition, the method is intuitive, simple, and convenient to implement on a computer.
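The abstract does not specify the division codes or the row-to-row matching. The sketch below is therefore a simple baseline, not the authors' method. It first compares a cheap invariant, the sorted degree sequence, and then searches over degree-preserving relabelings. Matrix entries count the joints between two links, so multiple joints can appear as entries greater than 1. The Watt and Stephenson six-bar chains share the same degree sequence but are not isomorphic, which the invariant alone cannot detect. The exhaustive search is only practical for small chains.

```python
from itertools import permutations


def adjacency(n, joints):
    """Adjacency matrix of n links; each (a, b) pair adds one joint between links a and b."""
    A = [[0] * n for _ in range(n)]
    for a, b in joints:
        A[a][b] += 1
        A[b][a] += 1
    return A


def is_isomorphic(A, B):
    """Invariant pre-check, then search over degree-preserving relabelings."""
    n = len(A)
    if n != len(B):
        return False
    deg_a = [sum(row) for row in A]
    deg_b = [sum(row) for row in B]
    if sorted(deg_a) != sorted(deg_b):
        return False
    for perm in permutations(range(n)):
        # Skip relabelings that map a link onto one of different degree.
        if any(deg_a[i] != deg_b[perm[i]] for i in range(n)):
            continue
        if all(A[i][j] == B[perm[i]][perm[j]] for i in range(n) for j in range(n)):
            return True
    return False


# Six-bar chains: links 0 and 1 are ternary, links 2-5 are binary.
WATT = adjacency(6, [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 4), (3, 5)])
STEPHENSON = adjacency(6, [(0, 2), (2, 1), (0, 3), (3, 1), (0, 4), (4, 5), (5, 1)])
print(is_isomorphic(WATT, STEPHENSON))  # False
```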