联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Java编程Java编程

日期:2023-06-08 09:08

Parallel Computing: Homework 5

Introduction

In this assignment, you are expected to parallellise the summation over a tree structure. In tree

summation, we recursively sum the values of the sub-trees at a given node, and then add the value

at the current node. In this manner, you will visit every node in the tree, and accumulate the sum

in a recursive fashion.

Objective

When traversing a large tree, this sum procedure might take a long time. Parallelisation can help

speed up this process. It is your task to implement the parallelisation, and evaluate how big its

impact is in different scenarios.

On Brightspace, we have provided two different tree structures: a basic binary tree, and a red-black

tree implementation. The former should be familiar to everyone; the latter has been discussed

during the course Advanced Algorithms and Data Structures. You can read more about the

red-black tree here.

The tree structure and construction algorithm have been put in place, so you do not need to have

in-depth knowledge of this implementation. Instead, we ask you to understand the properties of

the tree and how these properties could explain the speed-up behaviour you will observe.

For your implementation, make sure to keep the following aspects in mind:

? The tree structure is recursive. How can OpenMP deal with such recursive structures? Which

primitives should you use?

? OpenMP allows you to launch many tasks at the same time, potentially many more than the

number of threads available. Think about whether, and if so, how you can limit the number

of tasks OpenMP starts.

Execution

On Brightspace, the implementations of both rbTree.c and binaryTree.c have been provided.

They both share the same header file, tree.h, that is used by execute.c to perform operations on

the tree. This way, the same code for execute.c can work with both the red-black and simple

binary trees, depending on your compilation command.

The code can be compiled using the provided Makefile, by simply running make in your shell.

It will build two binaries: rbTree and binaryTree, that should be called as follows: ./rbTree n ,

where n is the number of threads to use, just like the previous assignment. The input, provided

on stdin consists of two numbers: the number of nodes in the tree, and whether you want to

have these numbers randomly generated (0), or sequential (1). This option affects the final tree

structure, and thus, the overall performance.

1

Note that for large trees, the program can quickly run out of stack space. To combat this,

execute.c tries to raise the stack limit, but this might not always work (for example, WSL1 does

not allow raising it above 8MB). You can investigate the stack size limits on your machine using

ulimit -Ss for the current limit (adjustable by the program), and ulimit -Hs for the hard

limit (only adjustable by root). Hábrók (like most modern Linux systems) by default has no hard

limit, so the program should work fine.

Deliverables

We ask you to submit your code to Themis, just like the previous assignment, which will only

perform some basic sanity checks and will not extensively test the efficiency of your implementation.

Themis will check that the output is still correct, ensure that the efficiency of serial execution is

not hurt by the parallel implementation, and check whether you achieved at least some speed-up

when running on 4 threads.

Note that you can sumbit ONLY rbTree.c or binaryTree.c, since Themis uses its own version of

execute.c to evaluate your code. Consequently, your parallel implementation should be contained

in rbTree.c or binaryTree.c. Furthermore, Themis will use the same tree.h header file, so you

cannot make any changes to the signatures of the functions included in this file. Of course, you

can change the implementation of these functions, and you can remove/add any other functions

you need - as long as they are contained within rbTree.c or binaryTree.c.

In addition, we ask you to submit (on Themis) a short and concise report (PDF, no more than 2

pages) outlining your approach to the parallelisation. The template for this report can be found

on Brightspace. More details on what we expect from the report can be found in the template.

This report should contain the table seen in Table 1, completed with the performance results you

obtained on Hábrók. The LATEX code for this table can be found on Brightspace. Please note the

following:

? The execution time should be reported in seconds (with no more than 1 decimal place).

? The speed-up and efficiency should be calculated relative to the execution time for 1 thread.

? The results can be (significantly) impacted by random variations in memory locality. For more

stable results, you may want to run the tests a number of times and average the performance.

In your report, describe your approach and elaborate on why this might be the case.

? The tree construction part takes the most time; that’s why this part is excluded from the

performance metrics as reported by execute.c. Constructing a tree can easily take 20

seconds, and only 1 second to be summed.

? Note that there is a difference between the ‘Serial’ and ‘1’ column: for ‘Serial’ we ask you to

run the original, unmodified code as provided, and for the ‘1’ column we ask you to run your

modified, parallelised code with 1 thread. If there are significant differences between these

two, please elaborate on that in your report.

? In your report, briefly elaborate on any behavioural differences between the tests (sequential

vs random, binary vs red-black) and try to explain where they originate.

2

Tree N Mode Metric Serial 1 2 4 8 16

Binary 20 000 000 RANDOM = 0

Execution time X X X X X X

Speed-up Y Y Y Y Y Y

Efficiency Z Z Z Z Z Z

Binary 10 000 000 SEQUENTIAL = 1

Execution time X X X X X X

Speed-up Y Y Y Y Y Y

Efficiency Z Z Z Z Z Z

Red-black 20 000 000 RANDOM = 0

Execution time X X X X X X

Speed-up Y Y Y Y Y Y

Efficiency Z Z Z Z Z Z

Red-black 10 000 000 SEQUENTIAL = 1

Execution time X X X X X X

Speed-up Y Y Y Y Y Y

Efficiency Z Z Z Z Z Z

Table 1: Performance Results.

Grading

Similar to the previous assignment, we will grade on the criteria correctness, efficiency, and code

style:

Correctness corresponds to passing test cases on Themis. If you did not pass all test cases, you

might still get partial points. A manual code check might override the results from Themis,

e.g. if you hardcoded answers, have a race condition in your code, or circumvented explicit

assignment instructions.

Efficiency corresponds to the results you report in the performance table. We will perform a

check of these performance numbers on Hábrók ourselves, so do not be tempted to alter

them! Additionally, we will also look at your implementation and report to see how you

approached problems such as load balancing and reducing parallel overhead. Particularly

efficient or sophisticated implementations might be awarded some bonus points.

Code style includes, but is not limited to: sufficient documentation, clear naming of variables,

proper code structring, proper code formatting, etc. This is not an exhaustive list!

Our grading will always be based on the most recent submission on Themis. Even though you

might have submitted a ‘better’(-performing) version before, we will only evaluate the most recent

one.

For this assignment, there are no exercises on Brightspace.

3


相关文章

版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp