Funding: Partially supported by the National Natural Science Foundation of China (Grant No. 12171079) and the National Key R&D Program of China (Grant No. 2020YFA0714102); partially supported by the National Natural Science Foundation of China (Grant No. 12101116) and the National Key Research and Development Program of China (Grant No. 2022YFA1003701).
Abstract: Gaussian graphical models (GGMs) are widely used as intuitive and efficient tools for data analysis in several application domains. To address the reproducibility of GGM structure learning, it is essential to control the false discovery rate (FDR) of the estimated edge set of the graph. Hence, the problem of GGM estimation with FDR control has received increasing attention in recent years. In this paper, we propose a new GGM estimation method based on multiple data splitting. Instead of using node-by-node regressions to estimate each row of the precision matrix, we suggest directly estimating the entire precision matrix with the graphical Lasso within the multiple data splitting procedure, which makes the computation p times faster than the previous approach. We show that the proposed method asymptotically controls the FDR and has significant advantages in computational efficiency. Finally, we demonstrate the usefulness of the proposed method through a real data analysis.
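The abstract does not spell out how the two halves of a data split are combined, so the following is only a minimal sketch under an assumption: it uses the mirror-statistic construction that is common in the data-splitting FDR literature, where the two independent estimates of an edge are rewarded for agreeing in sign. The edge estimates are hard-coded stand-ins for the off-diagonal precision-matrix entries that a graphical Lasso fit on each half would produce, and the function names and threshold rule are illustrative rather than the authors' implementation.

```python
def mirror_statistics(est1, est2):
    """Combine two independent estimates per edge into one mirror statistic.

    Large positive values mean both halves agree on a strong edge;
    values scattered around zero (either sign) look like noise.
    """
    stats = {}
    for edge, a in est1.items():
        b = est2[edge]
        product = a * b
        sign = 1 if product > 0 else (-1 if product < 0 else 0)
        stats[edge] = sign * (abs(a) + abs(b))
    return stats


def fdr_threshold(stats, q):
    """Smallest cutoff t whose estimated false discovery proportion is <= q.

    The count of statistics below -t serves as an estimate of the number
    of false positives above +t (assumed symmetry of null statistics).
    """
    candidates = sorted({abs(m) for m in stats.values() if m != 0})
    for t in candidates:
        negatives = sum(1 for m in stats.values() if m <= -t)
        positives = sum(1 for m in stats.values() if m >= t)
        if negatives / max(positives, 1) <= q:
            return t
    return float("inf")  # nothing can be selected at this level


def select_edges(est1, est2, q=0.1):
    """Return the edges whose mirror statistic clears the FDR cutoff."""
    stats = mirror_statistics(est1, est2)
    t = fdr_threshold(stats, q)
    return sorted(edge for edge, m in stats.items() if m >= t)


# Hypothetical edge estimates from the two halves of one split.
est_half1 = {(0, 1): 0.8, (0, 2): 0.5, (1, 2): 0.05,
             (1, 3): -0.9, (2, 3): 0.02, (0, 3): -0.03}
est_half2 = {(0, 1): 0.7, (0, 2): 0.6, (1, 2): -0.04,
             (1, 3): -0.4, (2, 3): 0.01, (0, 3): 0.02}

selected = select_edges(est_half1, est_half2, q=0.25)
print(selected)
```

A multiple-splitting procedure would repeat this over many random splits and aggregate how often each edge is selected; that aggregation step is omitted here.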
Funding: Supported by the National Natural Science Foundation of China (11171299 and 91130009).
Abstract: In this paper, we consider the data separation problem, in which the original signal is composed of two distinct subcomponents, via a dual-frame-based Split-analysis approach. We show that the two subcomponents, which are sparse in two different general frames respectively, can be exactly recovered with high probability when the measurement matrix is a Weibull random matrix (not Gaussian) and the two frames satisfy a mutual coherence property. Our result may be significant for analysing the Split-analysis model for data separation.
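The abstract does not define its mutual coherence property. As a rough illustration, the sketch below assumes the usual definition, the largest absolute inner product between a unit-normalized atom of one frame and a unit-normalized atom of the other, and evaluates it for two small example frames (spikes and Hadamard vectors) chosen here purely for illustration.

```python
import math


def normalize(vector):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def mutual_coherence(frame1, frame2):
    """Largest absolute inner product between atoms of two frames.

    Each frame is a list of atoms (vectors of equal length); atoms are
    normalized first so the result lies in [0, 1].
    """
    atoms1 = [normalize(a) for a in frame1]
    atoms2 = [normalize(b) for b in frame2]
    return max(
        abs(sum(x * y for x, y in zip(a, b)))
        for a in atoms1
        for b in atoms2
    )


# Example frames in R^4: the standard basis and a Hadamard basis.
spikes = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
hadamard = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]

mu = mutual_coherence(spikes, hadamard)
print(mu)
```

A small value indicates that no atom of one frame is well represented by an atom of the other, which is the intuition behind being able to tell the two subcomponents apart.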
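Abstract: With the rapid development of artificial intelligence, a growing body of research applies large language models such as BERT (Bidirectional Encoder Representations from Transformers) to network security detection, and email detection is one of the most popular application scenarios. However, given the resources and technical expertise required to deploy large models, more and more customers are turning to mainstream Models-as-a-Service (MaaS) providers. With their rich collections of pretrained models and powerful training services, MaaS providers offer users a convenient route to model fine-tuning. In the financial sector, however, email data often involves personal identities, trade secrets, and other internal company information, and disclosing it directly to a service provider creates a serious risk of privacy leakage. Sustainable model updating therefore faces problems such as data privacy leakage and limited computing resources. To address these problems, an email detection model sharing method based on split learning is proposed. The method splits the BERT model into a part that is executed locally on the client and a part that is trained on the server. The client encodes the training data, which protects data privacy while reducing the amount of data transmitted to the server. After receiving the encoded information from the client, the server uses the latter part of BERT to carry out efficient and secure model training. Finally, the trained model is pushed back to the client, achieving iterative optimization and timely updating of the model.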
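The split described above can be sketched with a toy model. This is only an illustration of where the cut sits and what crosses the network: the "layers" are simple stand-in functions rather than BERT layers, the cut index `CUT` is a hypothetical choice, and only the forward pass is shown (in training, gradients would also flow back across the cut).

```python
def make_layer(scale, shift):
    """Return a toy layer: an affine map followed by ReLU, applied elementwise."""
    return lambda values: [max(0.0, scale * v + shift) for v in values]


# Stand-in for a stack of model layers (not actual BERT layers).
LAYERS = [
    make_layer(1.0, 0.5),
    make_layer(2.0, -1.0),
    make_layer(0.5, 0.0),
    make_layer(1.0, 1.0),
]
CUT = 2  # hypothetical split point: layers [0, CUT) stay on the client


def client_forward(raw_features):
    """Run the client-side layers; only this encoded output leaves the client."""
    hidden = raw_features
    for layer in LAYERS[:CUT]:
        hidden = layer(hidden)
    return hidden


def server_forward(encoded):
    """Run the server-side layers on the activations received from the client."""
    hidden = encoded
    for layer in LAYERS[CUT:]:
        hidden = layer(hidden)
    return hidden


def full_forward(raw_features):
    """Reference: the unsplit model, for checking the split changes nothing."""
    hidden = raw_features
    for layer in LAYERS:
        hidden = layer(hidden)
    return hidden


raw_email_features = [1.0, -2.0, 0.5]  # placeholder for tokenized email input
encoded = client_forward(raw_email_features)
prediction = server_forward(encoded)
print(encoded, prediction)
```

The server sees only `encoded`, never `raw_email_features`, and the composed result matches the unsplit model.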
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 11474236, 81671674, and 11775184) and the Science and Technology Project of Fujian Province, China (Grant No. 2016Y0078).
Abstract: The ill-posed inverse problem in quantitative susceptibility mapping (QSM) is usually solved with a regularized optimization solver, which is time consuming for three-dimensional volume data. In clinical diagnosis, however, the susceptibility map must be reconstructed efficiently. Here, a modified QSM reconstruction method called weighted total variation using split Bregman (WTVSB) is proposed. It reconstructs the susceptibility map with fast computational speed and effective artifact suppression by combining noise-suppressed data weighting with split Bregman iteration. The noise-suppressed data weighting is determined using the Laplacian of the calculated local field, which prevents noise and errors in the field maps from spreading into the susceptibility inversion. The split Bregman iteration accelerates the solution of the L1-regularized reconstruction model by utilizing a preconditioned conjugate gradient solver. In the experiments, the proposed method is compared with truncated k-space division (TKD), morphology enabled dipole inversion (MEDI), and total variation using split Bregman (TVSB) on numerical simulation, phantom, and in vivo human brain data, evaluated by root mean square error and mean structure similarity. Experimental results demonstrate that the proposed method achieves a better balance between accuracy and efficiency of QSM reconstruction than conventional methods, thus facilitating clinical applications of QSM.
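Split Bregman schemes for L1-regularized problems typically alternate a linear solve (here reportedly handled by preconditioned conjugate gradient) with a closed-form soft-thresholding, or shrinkage, step on an auxiliary variable. The sketch below shows only that shrinkage step, applied to the finite differences of a toy 1D signal; the threshold `gamma` and the signal are made-up values, and this is not the WTVSB implementation.

```python
def shrink(value, gamma):
    """Soft-thresholding: pull value toward zero by gamma, clamping at zero."""
    if value > gamma:
        return value - gamma
    if value < -gamma:
        return value + gamma
    return 0.0


def forward_differences(signal):
    """Discrete gradient of a 1D signal (the quantity total variation penalizes)."""
    return [b - a for a, b in zip(signal, signal[1:])]


# Toy signal: small noise-like wiggles followed by one genuine jump.
signal = [0.0, 0.25, 0.0, 4.0]
gradient = forward_differences(signal)
gamma = 0.5  # hypothetical threshold, set by the regularization weights

shrunk = [shrink(g, gamma) for g in gradient]
print(gradient, shrunk)
```

Small gradients are zeroed while the large jump survives with reduced magnitude, which is how the L1 penalty suppresses noise and streaking while keeping edges.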
Funding: Supported by the National Basic Research Program of China (973 Program) under Grant Nos. 2014CB347800 and 2012CB315803; the National High-Tech R&D Program of China (863 Program) under Grant No. 2013AA013303; the Natural Science Foundation of China under Grant Nos. 61170291, 61133006, and 61161140454; and ZTE Industry-Academia-Research Cooperation Funds.
Abstract: Many "rich-connected" topologies with multiple parallel paths between servers have recently been proposed for data center networks to provide high bisection bandwidth, but it remains challenging to fully utilize the high network capacity with appropriate multi-path routing algorithms. Because flow-level path splitting may lead to traffic imbalance between paths due to differences in flow size, packet-level path splitting has attracted more attention lately; it spreads the packets of a flow over multiple available paths and significantly improves link utilization. However, it may cause packet reordering, which confuses the TCP congestion control algorithm and lowers the throughput of flows. In this paper, we design a novel packet-level multi-path routing scheme called SOPA, which leverages OpenFlow to perform packet-level path splitting in a round-robin fashion, and hence significantly mitigates the packet reordering problem and improves network throughput. Moreover, SOPA leverages the topological features of data center networks to encode a very small number of switches along the path into the packet header, resulting in very light overhead. Compared with random packet spraying (RPS), Hedera, and equal-cost multi-path routing (ECMP), our simulations demonstrate that SOPA achieves 29.87%, 50.41% and 77.74% higher network throughput respectively under permutation workload, and reduces average data transfer completion time by 53.65%, 343.31% and 348.25% respectively under production workload.
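The contrast between flow-level and round-robin packet-level splitting can be illustrated with a few lines of code. This is a sketch of the general idea only: the path names are hypothetical, the CRC-based hash merely stands in for whatever hash a flow-level scheme would use, and nothing here models OpenFlow rules or SOPA's header encoding.

```python
import zlib
from collections import Counter
from itertools import cycle

# Hypothetical equal-cost paths between two servers.
PATHS = ["via-core-1", "via-core-2", "via-core-3", "via-core-4"]


def flow_level_paths(flow_id, num_packets, paths):
    """Flow-level splitting: every packet of a flow is pinned to one hashed path."""
    chosen = paths[zlib.crc32(flow_id.encode()) % len(paths)]
    return [chosen for _ in range(num_packets)]


def round_robin_paths(num_packets, paths):
    """Packet-level splitting: consecutive packets rotate through the paths."""
    rotation = cycle(paths)
    return [next(rotation) for _ in range(num_packets)]


flow_load = Counter(flow_level_paths("10.0.0.1:5000->10.0.1.1:80", 8, PATHS))
packet_load = Counter(round_robin_paths(8, PATHS))
print(flow_load)
print(packet_load)
```

With flow-level splitting one large flow loads a single path, whereas the round-robin rotation places the same number of packets on every path, which is the balance property the abstract attributes to packet-level splitting.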