9 articles found
Parallelization of the PIC Method on NVIDIA Kepler (cited by: 1)
1
Authors: 文敏华, 林新华, Simon Chong Wee See. 《计算机工程与科学》, CSCD, PKU Core, 2013, No. 11, pp. 100-104 (5 pages)
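The particle-in-cell (PIC) method is widely used in computational plasma physics. Achieving high accuracy usually requires a large number of computational particles, which makes the computation very expensive, so accelerating the PIC method is valuable for reducing its time cost. This paper designs a PIC algorithm for the NVIDIA Kepler GPU and implements it on the GPU with CUDA. The two most time-consuming functions in the PIC method, collision and mover, are ported to the GPU. In the experiments, the newly released NVIDIA Kepler K20 GPU is used to measure the performance of these two functions; compared with an Intel Sandy Bridge E5-2650, a speedup of up to 30x is obtained.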
Keywords: PIC method; CUDA; NVIDIA Kepler
Real-Time Rendering and Material Editing of Diffuse Scattering Effects in Translucent Objects (cited by: 3)
2
Authors: 王锐, Ewen Cheslack-Postava, Rui Wang, David Luebke, 华炜, 彭群生, 鲍虎军. 《计算机辅助设计与图形学学报》, EI, CSCD, PKU Core, 2008, No. 8, pp. 993-1000 (8 pages)
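Photorealistic rendering of translucent objects has been an active research topic in recent years. This paper proposes a method for real-time photorealistic rendering and dynamic material editing of diffuse scattering effects in translucent objects, based on the dipole approximation of the bidirectional surface scattering reflectance distribution function (BSSRDF). Principal component analysis is used to factor the diffuse scattering material function in the dipole approximation into the product of a shape-dependent function and a function that depends on the translucent material. Using this factored representation within a precomputed radiance transfer framework for real-time photorealistic rendering, scattering transfer is precomputed so that the material of a translucent object can be edited in real time under a variety of lighting environments. In addition, a method is proposed that applies a second wavelet compression to the precomputed radiance transfer data in the spatial domain; by exploiting the correlation among the spatial positions of surface points, it greatly compresses the data and improves rendering efficiency while preserving rendering quality. Experimental results show that the method produces highly realistic translucency effects at real-time rendering speeds.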
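Keywords: translucent material editing; bidirectional surface scattering reflectance distribution function (BSSRDF); principal component analysis; dipole approximation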
Parallelization of the Dynamic-Grid DSMC Method on GPUs
3
Authors: 文敏华, 林新华, Simon Chong Wee See. 《计算机科学与探索》, CSCD, 2013, No. 5, pp. 472-479 (8 pages)
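The direct simulation Monte Carlo (DSMC) method is an important tool in rarefied gas dynamics. However, it has two main drawbacks: complex grid handling and a very large computational cost. A DSMC method with dynamic grids can generate adaptive collision grids on the fly from the flow-field information, which effectively addresses the first drawback. To address the second, a parallel program is written with the Compute Unified Device Architecture (CUDA), porting the dynamic-grid DSMC method to the graphics processing unit (GPU) to reduce computation time. In the parallel implementation, the GPU performs the vast majority of the computation, while the CPU handles only a small amount of work such as initialization and result output. A two-dimensional supersonic flow over a flat plate is used as the test case to verify the correctness of the parallel program. For test cases of different sizes, speedups of more than 10x are obtained on the NVIDIA Fermi C2050; for the same test case, the newly released NVIDIA Kepler K20 is about 1.3 to 1.6 times as fast as the Fermi C2050.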
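Keywords: Compute Unified Device Architecture (CUDA); graphics processing unit (GPU); direct simulation Monte Carlo (DSMC) method; dynamic-grid DSMC; parallel simulation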
Proposal for a cross layer scheme for real-time wireless video
4
Authors: JEYARAJ Arulsaravana, CHENG Liang, EL ZARKI Magda. 《Journal of Zhejiang University-Science A(Applied Physics & Engineering)》, SCIE, EI, CAS, CSCD, 2006, No. 10, pp. 1690-1694 (5 pages)
This paper focuses on the design of the cross layer between the video application layer and the MIMO physical layer. MIMO physical layer research has promised an enormous increase in the capacity of wireless communication systems. MIMO wireless systems also operate under fading conditions in which the channel faces arbitrary fluctuations. Since the wireless channel changes over each coherence period, the capacity of the wireless channel, given the power constraints, changes as well. Hence, to make efficient use of the available capacity, one needs to adapt the video bit rate. However, it is impossible to adapt at the application layer, as changing the parameters of the video takes more time than the coherence period of the channel. In this paper we address this problem through a novel solution and investigate its performance through a simulation study.
Keywords: MIMO; V-BLAST; Adaptive modulation; Diversity; Constant bit rate (CBR); Cross layer design; Power control; Fine granular scalability (FGS)
GPU Acceleration of the Locally Selfconsistent Multiple Scattering Code for First Principles Calculation of the Ground State and Statistical Physics of Materials
5
Authors: Markus Eisenbach, Jeff Larkin, Justin Lutjens, Steven Rennich, James H. Rogers. 《国际计算机前沿大会会议论文集》, 2015, No. B12, pp. 64-66 (3 pages)
The Locally Self-consistent Multiple Scattering (LSMS) code solves the first principles density functional theory Kohn-Sham equation for a wide range of materials, with a special focus on metals, alloys, and metallic nano-structures. It has traditionally exhibited near perfect scalability on massively parallel high performance computer architectures. We present our efforts to exploit GPUs to accelerate the LSMS code to enable first principles calculations of O(100,000) atoms and statistical physics sampling of finite temperature properties. Using the Cray XK7 system Titan at the Oak Ridge Leadership Computing Facility, we achieve a sustained performance of 14.5 PFlop/s and a speedup of 8.6 compared to the CPU-only code.
Keywords: Locally Self-consistent Multiple Scattering (LSMS)
Audio-guided implicit neural representation for local image stylization
6
Authors: Seung Hyun Lee, Sieun Kim, Wonmin Byeon, Gyeongrok Oh, Sumin In, Hyeongcheol Park, Sang Ho Yoon, Sung-Hee Hong, Jinkyu Kim, Sangpil Kim. 《Computational Visual Media》, CSCD, 2024, No. 6, pp. 1185-1204 (20 pages)
We present a novel framework for audio-guided localized image stylization. Sound often provides information about the specific context of a scene and is closely related to a certain part of the scene or object. However, existing image stylization works have focused on stylizing the entire image using an image or text input. Stylizing a particular part of the image based on audio input is natural but challenging. This work proposes a framework in which a user provides an audio input to localize the target in the input image and another to locally stylize the target object or scene. We first produce a fine localization map using an audio-visual localization network leveraging CLIP embedding space. We then utilize an implicit neural representation (INR) along with the predicted localization map to stylize the target based on sound information. The INR manipulates local pixel values to be semantically consistent with the provided audio input. Our experiments show that the proposed framework outperforms other audio-guided stylization methods. Moreover, we observe that our method constructs concise localization maps and naturally manipulates the target object or scene in accordance with the given audio input.
Keywords: audio guidance; image style transfer; implicit neural representations (INR)
Learning together: Towards foundation models for machine learning interatomic potentials with meta-learning
7
Authors: Alice E. A. Allen, Nicholas Lubbers, Sakib Matin, Justin Smith, Richard Messerly, Sergei Tretiak, Kipton Barros. 《npj Computational Materials》, CSCD, 2024, No. 1, pp. 1654-1662 (9 pages)
The development of machine learning models has led to an abundance of datasets containing quantum mechanical (QM) calculations for molecular and material systems. However, traditional training methods for machine learning models are unable to leverage the plethora of data available, as they require that each dataset be generated using the same QM method. Taking machine learning interatomic potentials (MLIPs) as an example, we show that meta-learning techniques, a recent advancement from the machine learning community, can be used to fit multiple levels of QM theory in the same training process. Meta-learning changes the training procedure to learn a representation that can be easily re-trained to new tasks with small amounts of data. We then demonstrate that meta-learning enables simultaneously training to multiple large organic molecule datasets. As a proof of concept, we examine the performance of an MLIP refit to a small drug-like molecule and show that pretraining potentials to multiple levels of theory with meta-learning improves performance. This difference in performance can be seen both in the reduced error and in the improved smoothness of the potential energy surface produced. We therefore show that meta-learning can utilize existing datasets with inconsistent QM levels of theory to produce models that are better at specializing to new datasets. This opens new routes for creating pre-trained foundation models for interatomic potentials.
Keywords: learning; utilize; smoothness
Design-for-Test Solutions for 3-D Integrated Circuits
8
Authors: SHAO-CHUN HUNG, PARTHO BHOUMIK, ARJUN CHAUDHURI, SANMITRA BANERJEE, KRISHNENDU CHAKRABARTY. 《Integrated Circuits and Systems》, 2024, No. 1, pp. 3-17 (15 pages)
As Moore’s Law approaches its limits, 3-D integrated circuits (ICs) have emerged as promising alternatives to conventional scaling methodologies. However, the benefits of 3-D integration in terms of lower power consumption, higher performance, and reduced area are accompanied by testing challenges. The unique vertical stacking of components in 3-D ICs introduces concerns related to the robustness of bonding surfaces. Moreover, immature manufacturing processes during 3-D fabrication can lead to high defect rates in different tiers. Therefore, there is a need for design-for-test solutions to ensure the reliability and performance of 3-D-integrated architectures. In this paper, we provide a comprehensive survey of existing testing strategies for 3-D ICs. We describe recent advances, including research efforts and industry practice, that address concerns related to bonding defects, elevated power supply noise, fault diagnosis, and fault localization specific to the unique characteristics of 3-D ICs.
Keywords: 3-D integrated circuits; design for test; through-silicon vias
BACH: A Bandwidth-Aware Hybrid Cache Hierarchy Design with Nonvolatile Memories
9
Authors: Jishen Zhao, Cong Xu, Tao Zhang, Yuan Xie. 《Journal of Computer Science & Technology》, SCIE, EI, CSCD, 2016, No. 1, pp. 20-35 (16 pages)
Limited main memory bandwidth is becoming a fundamental performance bottleneck in chip multiprocessor (CMP) design. Yet directly increasing the peak memory bandwidth can incur high cost and power consumption. In this paper, we address this problem by proposing BACH, a bandwidth-aware reconfigurable cache hierarchy with hybrid memory technologies. Components of our BACH design include a hybrid cache hierarchy, a reconfiguration mechanism, and a statistical prediction engine. Our hybrid cache hierarchy chooses different memory technologies with various bandwidth characteristics, such as spin-transfer torque memory (STT-MRAM), resistive memory (ReRAM), and embedded DRAM (eDRAM), to configure each level so that the peak bandwidth of the overall cache hierarchy is optimized. Our reconfiguration mechanism can dynamically adjust the cache capacity of each level based on the predicted bandwidth demands of running workloads. The bandwidth prediction is performed by our prediction engine. We evaluate the system performance gain obtained by the BACH design with a set of multithreaded and multiprogrammed workloads, with and without a system power budget limit. Compared with a traditional SRAM-based cache design, BACH improves the system throughput by 58% and 14% with multithreaded and multiprogrammed workloads, respectively.
Keywords: memory bandwidth; hybrid cache; reconfigurable cache; nonvolatile memory