Date: 2024-05-31 09:39

AMATH 483 / 583 (roche) - HW6

Due Friday May 31, 11:59pm PT

May 24, 2024

Homework 6 (80 points, 0 EC points)

1. (+20) Complex double linear system solver. Plot both the log of the residual and the log of the normalized error, $\frac{\|b - Az\|_2}{\|A\|_1 \, \|z\|_2 \, \epsilon_{\text{machine}}}$, versus the square matrix dimensions 16, 32, 64, ..., 8192 for the following LAPACK routine. It is supported in the OpenBLAS build on Hyak. Submit your plot, and label it accordingly.

    lapack_int LAPACKE_zgesv(int matrix_order,
                             lapack_int n,
                             lapack_int nrhs,
                             lapack_complex_double *a,
                             lapack_int lda,
                             lapack_int *ipiv,
                             lapack_complex_double *b,
                             lapack_int ldb);

Use the following code snippet to initialize your matrices and right-hand-side vectors, and note the headers I use:

    #include <iostream>
    #include <complex>
    #include <cstdlib>
    #include <cstring>
    #include <cmath>
    #include <vector>
    #include <chrono>
    #include <limits>
    #include <cblas.h>
    #include <lapacke.h>
    ...
    int main() {
        ...
        a = (std::complex<double>*) malloc(sizeof(std::complex<double>) * ma * na);
        b = (std::complex<double>*) malloc(sizeof(std::complex<double>) * ma);
        z = (std::complex<double>*) malloc(sizeof(std::complex<double>) * na);
        ...
        srand(0);
        int k = 0;
        for (int j = 0; j < na; j++) {
            for (int i = 0; i < ma; i++) {
                a[k] = 0.5 - (double)rand() / (double)RAND_MAX
                     + std::complex<double>(0, 1)
                     * (0.5 - (double)rand() / (double)RAND_MAX);
                if (i == j) a[k] *= static_cast<double>(ma);
                k++;
            }
        }
        srand(1);
        for (int i = 0; i < ma; i++) {
            b[i] = 0.5 - (double)rand() / (double)RAND_MAX
                 + std::complex<double>(0, 1)
                 * (0.5 - (double)rand() / (double)RAND_MAX);
        }
        ...
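Because LAPACKE_zgesv overwrites a with the LU factors and b with the solution, the residual has to be formed from copies of the original A and b saved before the call. The following is a minimal sketch of that post-processing step, assuming column-major storage as in the fill loop above. The helper names (residual_norm2, matrix_norm1, vector_norm2, normalized_error) are hypothetical and are not part of LAPACK or the assignment.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

using cplx = std::complex<double>;

// 2-norm of r = b - A*z, with A stored column-major (ma rows, na columns).
double residual_norm2(const std::vector<cplx>& a, const std::vector<cplx>& b,
                      const std::vector<cplx>& z, std::size_t ma, std::size_t na) {
    std::vector<cplx> r(b);                          // r starts as a copy of b
    for (std::size_t j = 0; j < na; ++j)
        for (std::size_t i = 0; i < ma; ++i)
            r[i] -= a[i + j * ma] * z[j];            // subtract column j times z_j
    double sum = 0.0;
    for (const cplx& ri : r) sum += std::norm(ri);   // std::norm returns |ri|^2
    return std::sqrt(sum);
}

// Matrix 1-norm: the largest absolute column sum.
double matrix_norm1(const std::vector<cplx>& a, std::size_t ma, std::size_t na) {
    double best = 0.0;
    for (std::size_t j = 0; j < na; ++j) {
        double colsum = 0.0;
        for (std::size_t i = 0; i < ma; ++i) colsum += std::abs(a[i + j * ma]);
        best = std::max(best, colsum);
    }
    return best;
}

// Vector 2-norm.
double vector_norm2(const std::vector<cplx>& v) {
    double sum = 0.0;
    for (const cplx& vi : v) sum += std::norm(vi);
    return std::sqrt(sum);
}

// Residual scaled by ||A||_1 * ||z||_2 * machine epsilon for double.
double normalized_error(const std::vector<cplx>& a, const std::vector<cplx>& b,
                        const std::vector<cplx>& z, std::size_t ma, std::size_t na) {
    const double eps = std::numeric_limits<double>::epsilon();
    return residual_norm2(a, b, z, ma, na) /
           (matrix_norm1(a, ma, na) * vector_norm2(z) * eps);
}
```

In a driver loop over n = 16, 32, ..., 8192, one would record std::log10 of both quantities for each n and plot them against n.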

2. (+20) CPU-GPU data copy speed on HYAK. Write a C++ code to measure the data copy performance between the host CPU and GPU (host to device), and between the GPU and the host CPU (device to host). Copy buffers from 8 bytes to 256 MB, doubling the size at each step. Plot the bandwidth for both directions, with bytes per second on the y-axis and the buffer size in bytes on the x-axis. Submit your plot and test code.
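The timing loop for this kind of measurement can be sketched independently of the GPU. The block below is a host-only illustration: it generates the buffer sizes and times a generic copy callable with std::chrono, using std::memcpy as a stand-in so that it compiles without CUDA. On HYAK the callable would presumably wrap cudaMemcpy with cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost instead. The function names and the choice of 1 MB = 2^20 bytes are assumptions, not part of the assignment.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Buffer sizes 8, 16, ..., 256 MB, taking 1 MB = 2^20 bytes (an assumption).
std::vector<std::size_t> buffer_sizes() {
    std::vector<std::size_t> sizes;
    const std::size_t max_bytes = std::size_t{256} << 20;
    for (std::size_t bytes = 8; bytes <= max_bytes; bytes *= 2)
        sizes.push_back(bytes);
    return sizes;
}

// Average bandwidth in bytes per second over ntrial calls of copy_fn(bytes).
double measure_bandwidth(std::size_t bytes, int ntrial,
                         const std::function<void(std::size_t)>& copy_fn) {
    copy_fn(bytes);                                   // untimed warm-up copy
    const auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < ntrial; ++t) copy_fn(bytes);
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    return static_cast<double>(bytes) * ntrial / seconds;
}

// Host-to-host stand-in for the copy; a GPU version would replace the
// std::memcpy call with a device copy followed by synchronization.
double host_copy_bandwidth(std::size_t bytes, int ntrial) {
    std::vector<char> src(bytes, 1), dst(bytes, 0);
    return measure_bandwidth(bytes, ntrial, [&](std::size_t n) {
        std::memcpy(dst.data(), src.data(), n);
    });
}
```

For the smallest buffers a single copy may be shorter than the clock resolution, so a larger ntrial is likely needed there to get a stable average.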

3. (+20) Compare FFTW to CUFFT on HYAK. Measure and plot the performance of calculating the gradient of a 3D double complex plane wave defined on cubic lattices of dimension $n^3$ from $16^3$ to $256^3$, stride n *= 2, for both the FFTW and CUDA FFT (CUFFT) implementations on HYAK. Let each n be measured ntrial times, with ntrial ≥ 3, and plot the average performance for each case versus n. Submit your performance plot, which should have 'FLOPs' on the y-axis (or some appropriate unit of FLOPs) and the dimension of the cubic lattices (n) on the x-axis. You will need to estimate the operation count of computing the derivative using FFT on a lattice.
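One possible way to estimate the operation count is sketched below. It assumes the commonly used nominal cost of 5 N log2(N) floating-point operations for a complex FFT of N = n^3 points, and a spectral gradient computed as one forward transform, a multiplication by i*k in each of the three directions (2 real multiplies per point per direction), and three inverse transforms. This model and the function names are assumptions; a different derivation, for example one that also counts the 1/N normalization, would give a slightly different total.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Lattice dimensions 16, 32, ..., 256 (stride n *= 2).
std::vector<std::size_t> lattice_dims() {
    std::vector<std::size_t> dims;
    for (std::size_t n = 16; n <= 256; n *= 2) dims.push_back(n);
    return dims;
}

// Assumed operation count for the FFT-based gradient on an n^3 lattice:
// 4 FFTs (1 forward + 3 inverse) at 5 N log2(N) flops each, plus
// 3 directions * 2 real multiplies per point for the i*k scaling.
double gradient_flop_estimate(std::size_t n) {
    const double N = static_cast<double>(n) * n * n;
    const double fft_flops = 5.0 * N * std::log2(N);
    return 4.0 * fft_flops + 3.0 * 2.0 * N;
}

// Achieved rate in flops per second for one gradient taking `seconds`.
double flop_rate(std::size_t n, double seconds) {
    return gradient_flop_estimate(n) / seconds;
}
```

With the measured average time per gradient for each n, flop_rate gives the y-axis value for both the FFTW and CUFFT curves, so the two implementations are compared against the same nominal count.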

4. (+20) Fourier transforms. Evaluate the Fourier transform of the following functions by hand. Use the definitions I provided (these include the factor $\frac{1}{\sqrt{2\pi}}$, which is common in physics and is now also the default used in WolframAlpha - a powerful math AI tool) as well as the definition for Dirac delta I used in lecture if needed.

(a) $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}$

(b) $f(t) = \sin(\omega_0 t)$, $\omega_0$ constant

(c) $f(x) = e^{-a|x|}$ and $a > 0$

(d) (distribution) $f(t) = \delta(t)$
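As an illustration of how the $\frac{1}{\sqrt{2\pi}}$ convention is applied, here is a short worked example for a shifted delta, which is not one of the functions above. The sign of the exponent in the transform is an assumption here, since the lecture definition is not reproduced in this handout. If the lecture uses $e^{+i\omega t}$ instead, the sign of the phase in the result flips.

```latex
% Assumed convention (the sign of the exponent may differ from lecture):
\hat{f}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt

% Sifting property of the Dirac delta:
\int_{-\infty}^{\infty} \delta(t - t_0)\, g(t)\, dt = g(t_0)

% Example: f(t) = \delta(t - t_0), with g(t) = e^{-i\omega t}:
\hat{f}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \delta(t - t_0)\, e^{-i\omega t}\, dt
                = \frac{1}{\sqrt{2\pi}}\, e^{-i\omega t_0}
```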
