
Date: 2019-07-16 11:30

COMP528

Assignment Resits (2018/19)

4 assignments, each worth 10% of total

your letter will indicate which (if any) assignments you are expected to resit

resit questions comparable to the originals, testing the same learning outcomes, etc.

you will get lots of hints and help by going back to the lab work and to previous assignments

all codes to be written in C, compiled and benchmarked on Chadwick

standards of academic integrity expected (as per the originals); reports may go through "TurnItIn" for automatic checking

you will be marked on the code & report, for correctness & understanding of the topics

Submission: each assignment as a single zip file (comprising report & code & any scripts, plus any supporting evidence you wish)

submission to SAM: 91 for resit#1, 92 for resit#2, 93 for resit#3, 94 for resit#4

DEADLINE for all submissions: 10am, Friday 9th August 2019

Assignment 1: MPI

Assignment #1 Resit

Testing knowledge of

parallel programming & MPI & timing via a batch system

TASK: least squares regression – parallelisation using MPI

https://www.mathsisfun.com/data/least-squares-regression.html

for a set of discrete points (x[i], y[i]), find the best linear fit y = mx + b

using the given equations (next slide) to determine m & b

write two C codes to determine m and b for a given input set of x,y

i. A serial test code

ii. One using MPI parallelism

use the Intel compiler and compile with no optimisation ‘-O0’

time the section of the code (running in batch) that finds m & b, and do this on various numbers of MPI processes; discuss your findings, e.g. in terms of speed-up and parallel efficiency (and Amdahl's Law)

Assignment #1 Resit

Remember:

can parallelise where there is lots of independent work

MPI is a single code with each process having its own "rank" (useful to split up work?)

MPI provides "Reduction" calls, e.g. for doing a summation over processes and storing the result on the "root" process (or on all processes)

MPI provides the MPI_Wtime timing function; the wall-clock time is the difference between two consecutive calls to MPI_Wtime

N may not be equally divisible by the number of MPI processes (the number of processes is available via the MPI_Comm_size function)

https://www.mathsisfun.com/data/least-squares-regression.html

Assignment #1 Resit

Data suggestion: use a small set of input data (x,y) to check you are getting the correct answer (serially and for any number of MPI processes); once all good, then use the data for the assignment (as below). Remember to use the batch system to undertake your timings for different numbers of MPI processes.
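As a rough illustration of running timings through the batch system, a hypothetical SGE-style submission script might look like the following (the parallel-environment name, module name, and slot count are assumptions; check the Chadwick documentation for the exact values):

```shell
#!/bin/bash
# Hypothetical SGE batch script -- flag names follow the qrsh examples
# later in this document; the PE name "mpi" and module are assumptions.
#$ -cwd -V
#$ -l h_rt=00:10:00
#$ -pe mpi 8            # request 8 slots for 8 MPI processes

module load intel       # assumed module name for the Intel compiler runtime
mpirun -np $NSLOTS ./a.out
```

Submitting the same script with different slot counts gives the timing series needed for the speed-up discussion.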

Assignment data:

N=100,000

x[i] = (float)i/1000.0 for i=1 to i=99,999 (note we start at i=1 and go to N-1)

y[i] = sin(x[i]/500.0) / cos(x[i]/499.0 + x[i]) (you will need to include <math.h>)

Assignment #1 Resit

Code

Submit both serial & MPI code

Submit any scripts used

Report: up to 3 pages

Discussion of your approach & of your results

Give the commands that you use to

Compile

Submit and run your parallel code

The equation of the best fit straight line

Marking

Correctness of codes: 50%

Explaining/understanding parallel principles & MPI: 25%

Discussion of results: 25%

Assignment 2: OpenMP

Assignment #2 Resit

Testing knowledge of

parallel programming & OpenMP & timing via a batch system

TASK: least squares regression – parallelisation using OpenMP

(see Assignment#1 for detailed description)

for a set of discrete points (x[i], y[i]), find the best linear fit y = mx + b

using given equations (next slide) to determine m & b

use the same assignment data as described for Assignment#1 Resit

write a C code to determine m and b for a given input set of x,y that uses OpenMP work-sharing constructs to parallelise the work

use the Intel compiler and compile with no optimisation '-O0'

time the section of the code (running in batch) that finds m & b, and do this on various numbers of OpenMP threads; discuss your findings, e.g. in terms of speed-up and parallel efficiency (and Amdahl's Law)

Assignment #2 Resit

Remember: can parallelise where there is lots of independent work

OpenMP is a single code with fork-join parallel regions in which each thread has its own thread number. Typically you parallelise at the 'for' loop level

OpenMP provides a "reduction" clause, e.g. for doing a summation over threads and storing the result on the "master" thread

OpenMP provides the omp_get_wtime timing function; the wall-clock time is the difference between two consecutive calls

OpenMP loop parallelisation can have different "schedules", which may be useful for irregular work distribution between threads

You can use compiler flags to make the compiler ignore all OpenMP directives (so the same source also builds as a serial code)

Assignment #2 Resit

Code

Submit OpenMP code

Submit any scripts used

Report: up to 3 pages

Discussion of your approach & of your results

Give the commands that you use to

Compile

Submit and run your parallel code

The equation of the best fit straight line

Marking

Correctness of code: 50%

Explaining/understanding parallel principles & OpenMP: 25%

Discussion of results: 25%

Assignment 3: GPU Programming

Assignment #3 Resit

Testing knowledge of

parallel programming of GPUs

TASK: discretization using GPU

Function f(x) = exp(x/3.1) - x*x*x*x*18.0

You need to discretize this between x=0.0 and x=60.0 and find the minimum, using 33M points

Write a C-based code with an accelerated kernel written in either CUDA or using OpenACC directives; the code should

time a serial run comprising setting values and then finding the minimum (i.e. all on the CPU)

time an accelerated run with values set on the GPU, passed back to the CPU, and the minimum found on the CPU

Assignment #3 Resit

Reminder for CUDA

write C + CUDA kernel in file e.g. myCode.cu (note the .cu suffix)

compile (on login node):

module load cuda-8.0

nvcc -Xcompiler -fopenmp myCode.cu

debug running in batch

qrsh -l gputype=tesla,h_rt=00:10:00 -pe smp 1-16 -V -cwd ./a.out

timing run in batch (hogging all GPU & CPU cores for yourself)

qrsh -l gputype=tesla,exclusive,h_rt=00:10:00 -pe smp 16 -V -cwd ./a.out

For openACC

please see lecture notes

Assignment #3 Resit

Code

Submit code and any scripts used

Report: up to 3 pages

Discussion of your approach & of your results

including the speed ratio of GPU to CPU

noting whether you include GPU memory & data costs (and what effect this would have)

Give command that you use to

Compile, submit and run your parallel code

Value of the minimum of f(x[i]) and for which value of x[i] this occurs

Marking

Correctness of code: 40%

Explaining/understanding parallel principles & GPUs: 30%

Discussion of results: 30%

Assignment 4: hybrid programming

Assignment #4 Resit

Testing knowledge of

parallel programming & hybrid MPI+OpenMP parallelism

TASK: hybrid MPI+OpenMP parallelisation of galaxy formation

using the C code "COMP528-assign4-resit.c" provided in Sub-Section "Resit Assignments" at https://cgi.csc.liv.ac.uk/~mkbane/COMP528/

add MPI and OpenMP to accelerate the simulation (including, if appropriate, the initialisation); as per the original assignment, use MPI to parallelise at a coarse-grained level (dividing the number of bodies (variable "BODIES") between the number of processes), with each MPI process then using OpenMP to parallelise its work

use the Intel compiler and compile with optimisation flag ‘-O2’

time the section of the code (running in batch) that simulates the movement of the galaxies, and do this on various numbers of MPI processes & OpenMP threads

Assignment #4 Resit

Code - submit MPI+OpenMP code & any scripts used

Report: up to 3 pages

Discussion of your approach & of your results

how you determined what to parallelise & explain why you chose the given parallelisation method

the results (accuracy, speed-up, parallel efficiency)

which combination of MPI/OpenMP you found to be the fastest

Include a paragraph on what you would need to scale the number of BODIES by 100 orders of magnitude (and keep run time about the same)

e.g. is Barkla big enough? is CPU the only option?

State commands that you use to

Compile, submit, run & time your code to get timing data presented

Marking

Code: 30%

Explaining/understanding parallel principles used: 25%

Discussion on scaling by 100 orders of magnitude: 20%

Discussion of results: 25%

Good luck!

Ask if any questions!

