Numerical Simulation and Parallel Programming
Notebook
Version 1.02
Haide College, OUC
2023.08
0 Information
Authors: Fukaya Takeshi, Iwashita Takeshi (Hokkaido University), ZHANG Linjie, FU Kai (Ocean University of China)
E-mails: zhanglinjie@ouc.edu.cn, kfu@ouc.edu.cn
Outcomes
1. Learn numerical simulations for simple physical models described by differential equations.
2. Learn the basics of MPI (Message Passing Interface) parallel programming.
Keywords: heat conduction equation, finite difference method, MPI parallelization
1 Schedule
This part has 30 credit hours in total.
1-4: How to program on Linux server.
5-8: Learn numerical simulation for 1-D heat conduction.
9-12: Learn numerical simulation program for 2-D heat conduction.
13-16: Learn the basics of parallel programming with MPI.
17-24: Parallelize the 2-D heat conduction simulation program with MPI.
It is recommended that you make use of the class time effectively. Keep up with the programming even if
there is something you do not yet understand.
2 How to submit a report
Details will be explained at the beginning (or in the middle) of the class.
• Deadline: to be announced.
• How to submit: print on A4 paper. Single-sided or double-sided, black-and-white or color printing are all acceptable.
3 Academic Integrity
It is unacceptable to submit work that is not your own. Assignments will be checked for similarity and copying to ensure that students' work is their own. Information sources must be paraphrased (put in your OWN words, not copied directly) and the reference must be included. You are also responsible for complying with any additional rules related to assessments communicated to you by your instructors. Assignments found to breach this policy may be given a mark of zero, and further action may be taken.
4 Precautions about this part
• Represent numerical values (for example, temperature values) in double precision.
• You will not lose points if your program has to be recompiled every time a setting changes.
• In the sample programs, “␣” denotes a half-width space.
5 Precautions about report
• When visualizing the computed results, use the actual coordinates or time on the axes, instead of the step indices (spatial or temporal).
• Do not insert more pictures or graphs than necessary.
• All pictures and graphs should be referred to in the text of the report.
• If you cite a paper or a website, identify it as a reference.
• Formulas should be concise and explained in words.
1 How to program on Linux server
Important information of server
• Server IP address: to be announced
• User account: to be announced
• Password: to be announced
Basic Linux Commands  The server runs Linux (a type of operating system), which is considerably different from Windows and similar to macOS. In order to be more productive in this class, you need to learn some basics of how to get things done on a Linux system.
1. ls (short for list). To find out what is in your current directory, type
$ ls
To list all files in your directory including those whose names begin with a dot (hidden files), type
$ ls -a
2. mkdir (make directory). Make a subdirectory in your current working directory, for example test.
Type
$ mkdir test
3. cd (change directory)
To change to the directory you have just made, type
$ cd test
Exercise 1a: Make another directory, called backups, inside the test directory.
4. The directories . and ..
In Linux, “.” means the current directory and “..” means the parent of the current directory, so typing
$ cd .. (NOTE: there is a space between cd and the dot)
will take you one directory up the hierarchy.
5. pwd (print working directory). To find out the absolute pathname of your working directory, type
$ pwd
6. ∼ (your home directory). To list the contents of your home directory, type:
$ ls ~
7. cp (copy). Use echo and “>” to create a file NumSim.txt in your home directory, change to the directory test, and copy the file to the current directory:
$ echo "Numerical simulation" > NumSim.txt
$ cd ~/test
$ cp ~/NumSim.txt .
8. mv (move). mv file1 file2 moves (or renames) file1 to file2. mv file1 dir moves file1 to directory
dir. For example:
$ cp NumSim.txt NumSim.bak.txt
$ mkdir backups
$ mv NumSim.bak.txt backups
9. rm (remove), rmdir (remove directory): removing files and directories.
As an example, create a copy of the NumSim.txt file and then delete it.
$ cp NumSim.txt temp.txt
$ ls (to check if it has created the file)
$ rm temp.txt
$ ls (to check if it has deleted the file)
Summary
ls list files and directories
ls -a list all files and directories
mkdir make a directory
cd directory change to named directory
cd change to home-directory
cd ∼ change to home-directory
cd .. change to parent directory
pwd display the path of the current directory
cp file1 file2 copy file1 and call it file2
mv file1 file2 move or rename file1 to file2
rm file remove a file
rmdir directory remove a directory
Table 1.1: Summary of basic Linux commands
Review of C programming  As a review of C programming, we will write a program that computes
C ← AB + C (1.1)
as an example, where A, B, C are n × n matrices. There are 6 steps:
1. Read matrix dimension n from command-line argument.
2. Allocate memory dynamically for A, B, C.
3. Initialize elements of matrix A, B, C.
4. Compute each element of matrix C with Eq. (1.1).
5. Display execution time and FLOPS.
6. Free memory, etc.
In step 1, the matrix dimension n is given as a command-line argument:
$./a.out 1000
where 1000 is the given value of n. In the program, we can use
n = atoi (argv[1]);
to read the command-line argument 1000 and save it to variable n.
In step 2, it would be natural to treat a matrix as a 2-D array in C. However, it is a little complicated to create a 2-D array dynamically, so we will treat each matrix as a 1-D array of size n². Let a_{i,j} be the element in row i and column j of matrix A; the corresponding element of the 1-D array is A[j*n + i] (0 ≤ i, j ≤ n − 1), i.e., we use column-major order here. After declaring a double-type pointer *A, we allocate memory space for A with the following code.
A = (double *)malloc (sizeof (double)*n*n);
Note that if we do not know the size of an array until the program is executed, we have to allocate its memory dynamically.
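For readability, this index mapping can be wrapped in a small helper. A minimal sketch (the macro name IDX is an assumption, not part of the handout):

/* column-major index of element (i, j) of an n x n matrix stored as a 1-D array */
#define IDX(i, j, n) ((j) * (n) + (i))
/* usage: A[IDX(i, j, n)] corresponds to a_{i,j} */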
In step 3, in order to make it easier to verify the correctness of our program, the initial values of the matrices A, B, C are set as

a_{ij} = 1/√n,  b_{ij} = 1/√n,  c_{ij} = 0.  (1.2)

(It is not difficult to find the values of matrix C after Eq. (1.1).)
Now consider step 4, which is the main part of this program. The calculation of Eq. (1.1) is

c_{ij} ← c_{ij} + Σ_{k=0}^{n−1} a_{ik} b_{kj}   (for each i, j).  (1.3)
So, we can write the program like
for (i = 0; i < n; i++)
{
for (j = 0; j < n; j++)
{
/* Compute each element of matrix C */
}
}
We can confirm whether our program runs correctly by checking the values of matrix C after the calculation.
For measuring the execution time, we can write a function get_time as follows.
#include <sys/time.h>
double get_time ()
{
struct timeval tv;
gettimeofday (&tv, NULL);
return tv.tv_sec + (double)tv.tv_usec*1e-6;
}
Then we can get the elapsed time (seconds) like
t0 = get_time ();
/* process to be measured */
t1 = get_time ();
time = t1 - t0;
In addition to the execution time, we can calculate FLOPS (Floating-point Operations Per Second) too:

FLOPS = (number of floating-point operations) / (elapsed time in seconds).  (1.4)

Since the number of floating-point operations in Eq. (1.1) is 2n³, the FLOPS of our program can be computed as
flops = 2.0* (double)n* (double)n* (double)n/time;
Then we display the execution time and FLOPS.
printf ("n!=!%d,!!time!(sec)!=!%8.3e,!!FLOPS!=!%8.3e\n", n, time, flops);
Finally, free the memory allocated in step 2 (and similarly for B and C).
free (A);
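For reference, a minimal sketch that assembles the six steps is given below. It follows the snippets above (variable names, column-major storage, the get_time helper); error checking and the check of C's values are omitted, and it is only one possible way to organize the program.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>

double get_time ()
{
    struct timeval tv;
    gettimeofday (&tv, NULL);
    return tv.tv_sec + (double)tv.tv_usec*1e-6;
}

int main (int argc, char *argv[])
{
    double *A, *B, *C, t0, t1, time, flops;
    int n, i, j, k;

    n = atoi (argv[1]);                              /* step 1: matrix dimension      */

    A = (double *)malloc (sizeof (double)*n*n);      /* step 2: dynamic allocation    */
    B = (double *)malloc (sizeof (double)*n*n);
    C = (double *)malloc (sizeof (double)*n*n);

    for (j = 0; j < n; j++)                          /* step 3: initialization (1.2)  */
        for (i = 0; i < n; i++)
        {
            A[j*n + i] = 1.0/sqrt ((double)n);
            B[j*n + i] = 1.0/sqrt ((double)n);
            C[j*n + i] = 0.0;
        }

    t0 = get_time ();
    for (i = 0; i < n; i++)                          /* step 4: C <- AB + C, Eq. (1.3) */
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                C[j*n + i] += A[k*n + i] * B[j*n + k];
    t1 = get_time ();

    time  = t1 - t0;                                 /* step 5: time and FLOPS        */
    flops = 2.0*(double)n*(double)n*(double)n/time;
    printf ("n = %d, time (sec) = %8.3e, FLOPS = %8.3e\n", n, time, flops);

    free (A); free (B); free (C);                    /* step 6: free memory           */
    return 0;
}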
Task 1.1
Most compilers have optimization options. Depending on the option, the program is optimized at compile time. Please check how the program performance is influenced by the optimization option.
Write a program that performs the calculation in Eq. (1.1). Then compile it like
$gcc main.c -OX -lm
The letter after the hyphen is the uppercase letter O, and the X part is a number from 0 (zero) to 3. The higher the number, the stronger the compiler optimization (0 means no optimization). “-lm” may be necessary if you include math.h in your program.
Set n = 1000, 2000, 3000, change the optimization option from -O0 to -O3 and measure the
execution time and FLOPS for each case.
Task 1.2
The main part of this program (the part that calculates Eq.(1.1)) is a triple loop structure, and
the loop variables are i, j, k. The order of these loops can be changed. For example, j → k →
i does not change the final result, but the memory access pattern (the way of reading/writing
the elements of array) will be different. Set n = 1000, 2000, 3000, change the loop order (6 types
in total), and check the performance. Note that, in order to eliminate the effects of compiler
optimization, please compile your program with optimization option -O0. If the execution time
(performance) differs, please explain it as far as you can understand.
2 Numerical simulation for 1-D heat conduction
Computer simulations are widely used in the fields of engineering and natural sciences. Generally, when
simulating a certain physical phenomenon, four steps are included.
1. Modeling: derive one or more partial differential equations (PDEs) to describe the phenomenon.
2. Discretization: transform partial differential equations into discrete form.
3. Calculation: solve the discretized problem on a computer to obtain numerical solutions of the partial differential equations.
4. Visualization: display the calculation results in a visible form.
We will experience the above flow through a simple physical phenomenon as an example.
Problem setting: one-dimensional heat conduction  Consider the conduction of heat along a stick of length L as shown in Fig. 2.1. We do not care about the cross-sectional area since it is very small relative to the length. Let the left endpoint be the origin (x = 0), with the x-axis taken along the length direction, so heat conduction happens in the ±x direction. In addition, heat exchange between this stick and the outside world occurs only at the end points.
Fig 2.1: A stick
Derivation of the partial differential equation  Let u(x, t) denote the temperature at position x at time t, with 0 ≤ x ≤ L, 0 ≤ t ≤ T.¹ This 1-D heat conduction phenomenon can be described by a partial differential equation (see Task 2.3 for the derivation):

∂u(x, t)/∂t = α ∂²u(x, t)/∂x².  (2.1)

¹Consider heat conduction up to time t = T.
α is the thermal diffusivity. Eq. (2.1) is solvable when the initial condition u(x, 0) (the temperature distribution of the object at time t = 0) and the boundary conditions (conditions on the temperature at the left and right ends) are given.
Discretization  Most of the PDEs that appear in engineering and the natural sciences are difficult to solve analytically. Therefore, instead of finding an analytical solution, a computer is used to obtain a numerical solution.
In general, the domain on which a PDE is defined is continuous, and the solution is continuous too. In the case of Eq. (2.1), they are [0, L] × [0, T] and u(x, t), respectively. They cannot be handled directly on a computer, so “discretization” is required.
Now we will study how to discretize the 1-D heat conduction problem given in Eq. (2.1). As mentioned before, it is not possible to handle all points (x, t) with 0 ≤ x ≤ L, 0 ≤ t ≤ T on a computer. Therefore, we divide the spatial domain x with step size ∆x and the time domain t with step size ∆t (see Fig. 2.2 and Fig. 2.3):

x_i := i∆x  (i = 0, 1, ..., Mx, where Mx := L/∆x),  (2.2)
t_n := n∆t  (n = 0, 1, ..., N, where N := T/∆t).  (2.3)
Then we compute the solution only on the discretized points (x_i, t_n), that is,

u_i^n := u(x_i, t_n).  (2.4)
Fig 2.2: Spatial discretization Fig 2.3: Temporal discretization
Now, we will compute the solutions on the (Mx + 1) × (N + 1) discrete points. We approximate the derivative with a forward difference in time and a central difference in space. Then we get

(u_i^{n+1} − u_i^n) / ∆t = α (u_{i+1}^n − 2u_i^n + u_{i−1}^n) / (∆x)².  (2.5)
See Task 2.4 for details. Note that Eq. (2.5) is defined only for the points i = 1, 2, ..., Mx − 1; the end points are not included. For the end points (i = 0 and i = Mx), we have

u_0^n := u(0, t_n),  u_{Mx}^n := u(L, t_n)  (2.6)

by applying the given boundary conditions.
Calculation of the numerical solution  Discretization has made it possible to handle the PDE problem on a computer.
First, transform Eq. (2.5) into the following form:

u_i^{n+1} = ··· .  (2.7)
Now we can see that only values at time step n appear on the right-hand side. In other words, if the u_i^n are known, the values u_i^{n+1} can easily be obtained from Eq. (2.7).² The values at time step n = 0 are given by the initial condition:

u_i^0 := u(x_i, 0).  (2.8)

Therefore, we take the values at time step n = 0 as a starting point and then compute the values at the other time steps n = 1, 2, ... with Eq. (2.7) in order.
See Program 2.1 for a skeleton. Note that, in this program, instead of setting the step sizes ∆x and ∆t, we specify the number of discrete points in each dimension, Mx and N, through run-time arguments. print_data(...) is a function (described later) that outputs the calculation results to a file.
Program 2.1: Overview of 1-D heat conduction simulation program by finite difference method
1 /∗ Include required header files ∗/
2 #define L 1.0 //Length of bar
3 #define T 1.0 //Maximum simulation time
4 #define alpha 1.0 //Thermal diffusivity
5 #define T_SPAN 20 //Result output frequency (adjust appropriately)
6
7 int main(int argc, char ∗argv[])
8 {
9 /∗ Declaration of variables (add each necessary variable) ∗/
10 double ∗u, ∗uu, dx, dt;
11 int Mx, N;
12
13 /∗ Get the values of Mx and N from the arguments ∗/
14
15 /∗ Dynamic allocation of memory, variable setting, etc. ∗/
16
17 for(i = 0; i <= Mx; i++)
18 {
19 /∗ Set initial conditions for u ∗/
20 }
21 print_data(...); //Output the initial state to a file
22
23 for(n = 1; n <= N; n++)
24 {
25 for(i = 1; i < Mx; i++)
26 {
27 /∗ Calculate the value (uu) of the next time step from the current value (u) ∗/
28 }
29 for(i = 1; i < Mx; i++)
30 {
31 /∗ Copy the value of uu to u ∗/
32 }
33 if(n%T_SPAN == 0)
34 {
35 print_data(...); //Output to a file at a regular frequency
36 }
37 }
38
39 /∗ Free memory ∗/
40
41 return 0;
42 }
²At a given point i, the value at time step n + 1 can be (easily) obtained from the values at the previous step n. We call this an explicit method.
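As a concrete starting point for the skeleton in Program 2.1, the argument-reading and allocation part might look like the following sketch (the argument order and variable names are assumptions; the variables are those declared in the skeleton):

/* sketch of the setup part of Program 2.1 (argument order is an assumption) */
Mx = atoi(argv[1]);               /* number of spatial divisions             */
N  = atoi(argv[2]);               /* number of time steps                    */
dx = L / (double)Mx;              /* spatial step size                       */
dt = T / (double)N;               /* time step size                          */
u  = (double *)malloc(sizeof(double) * (Mx + 1));   /* values at step n      */
uu = (double *)malloc(sizeof(double) * (Mx + 1));   /* values at step n + 1  */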
Visualization  When analyzing the calculation results, it is important to show them in a visually easy-to-see form, that is, visualization. In this experiment, visualization will be performed using the free software gnuplot, which can graph data stored in files.
First, save the calculation results (the temperature, in this example) of a certain time step to a file (see Program 2.2). The function print_data creates a file named data_******.dat and outputs the x-coordinates and computed temperatures (separated by spaces) to this file.³
Program 2.2: Function that saves the calculation result to file
1 void print_data(int Mx, int n, double dx, double dt, double ∗u)
2 {
3 FILE ∗fp;
4 char sfile[256];
5 int i;
6
7 sprintf(sfile, "data_%06d.dat", n);
8 fp = fopen(sfile, "w");
9 fprintf(fp, "#time␣=␣%lf\n", (double)n ∗ dt);
10 fprintf(fp, "#x␣u\n");
11 for(i = 0; i <= Mx; i++)
12 {
13 fprintf(fp, "%lf␣%12lf\n", (double)i∗dx, u[i]);
14 }
15 fclose(fp);
16 return;
17 }
Now we can use gnuplot to display the data in a single file. First, create the script file shown in Program 2.3 in the same folder as your program. Then run the following command from the terminal.
$gnuplot sample0.gp
Program 2.3: Script for creating a gif still image (sample0.gp)
1 set terminal gif
2 set output "test_still.gif" #Adjust as needed
3 set xrange [0:1] #Adjust as needed
4 set yrange [0:100] #Adjust as needed
5 plot "***.dat" using 1:2 with line
6 unset output
If it works, a gif image file named test_still.gif will be found in the same folder.
Since the temperature changes over time, it is a good idea to create a gif animation to visualize the time evolution. Here is how it works. First, create the two script files shown in Programs 2.4 and 2.5 in the same folder. Then run the following command from the terminal.
$gnuplot sample1.gp
If it works, a gif animation image file named test.gif will be found in the same folder.
Program 2.4: Script of creating a gif animation (sample1.gp)
1 reset
2 set nokey
3 set xrange [0:1] #Adjust as needed
4 set yrange [0:100] #Adjust as needed
5 set size square
6
3Lines beginning with # are ignored as comments when plotting with gnuplot.
7 set terminal gif animate delay 50 #Display interval can be adjusted by changing 50
8 set output "test.gif" #Specifying the name of the output file
9
10 n0 = 0
11 nmax = 100 #Number of divisions in the time direction
12 dn = 20 #Step size (the program's T_SPAN value)
13
14 load "sample1.plt"
15
16 unset output
Program 2.5: Script of creating a gif animation (sample1.plt)
1 if(exist("n")==0 || n<0) n = n0
2
3 title_n = sprintf("t␣=␣%f", 1.0∗n/nmax)
4 unset label
5 set label 1 at graph 0.05,0.95 title_n #The position of the label can be adjusted by changing (0.05, 0.95)
6
7 filename = sprintf("data_%06d.dat", n)
8 plot filename using 1:2 with line
9
10 n = n + dn
11 if(n <= nmax) reread
12
13 undefine n
Task 2.1
Set L = 1.0, T = 1.0, α = 1.0.
• Initial conditions: u(0, 0) = 20.0, u(x, 0) = 0.0 (0 < x < L), u(L, 0) = 50.0
• Boundary conditions: u(0, t) = 20.0, u(L, t) = 50.0
Based on the explanation so far, create a simulation of 1-D heat conduction (including file output)
and complete the following tasks.
1. Run the program with Mx = 10, N = 1000 and print the results to file every 20 time steps.
And
• Check the temperature at t = 0, 0.1, 0.2, 0.5 by displaying it as a still image with
gnuplot.
• Create a gif animation with gnuplot.
2. For a more precise simulation, increase the number of divisions Mx. Note that if we just increase Mx, the simulation may fail (the program runs, but the calculated values are meaningless, such as NaN or Inf). Check whether it works with the setting Mx = 50, N = 1000.
3. To prevent the simulation from failing when the number of points in the spatial dimension (Mx) is increased, we should also increase the number of segments in time (N). Therefore, set Mx = 50 and then try N = 2000, 3000, 4000, 5000 separately. Check whether the simulation fails for each N. If the simulation is successful, create a gif animation of the result.
Task 2.2
In order to verify the validity of the simulation results, we compare the numerical solution and
the analytical solution in this task. Set L = 1.0, T = 1.0, α = 1.0.
• Initial condition: u(x, 0) = sin(πx)
• Boundary conditions: u(0, t) = u(L, t) = 0.0
The analytical solution of the partial differential equation (2.1) under these conditions is

u(x, t) = e^(−π²t) sin(πx).  (2.9)
1. Set Mx = 10, N = 1000 and run the program. Then, at t = 0.1, 0.5, compare the results calculated by the program with Eq. (2.9). Here is an example of the comparison. Add
replot exp (-pi*pi*0.1)*sin (pi*x)
to Program 2.3 (after line 5).
2. In general, the accuracy of the simulation improves as the number of divisions increases. Therefore, execute the program in the three cases (Mx, N) = (10, 200), (50, 5000), (100, 20000) and compute

max_i |u_i^n − ũ_i^n| / |ũ_i^n|,  (2.10)

where u is the value obtained by the simulation (numerical solution) and ũ is the value obtained from the analytical solution.
Task 2.3
We will derive the partial differential equation shown in equation (2.1) here. Please fill in the
blanks.
Let q(x, t) be the heat that flows through a unit cross-section per second, with flow in the positive x direction counted as positive. According to Fourier's law, if κ is the coefficient of thermal conductivity, then

q(x, t) = (a) .  (2.11)
Next, let h be a small length, and consider the region between x and x + h. The amount of heat stored in this small region in a second is

∆q = q(x, t) − q(x + h, t) ≈ (b) .  (2.12)

Note that the Taylor expansion is performed in x, and terms of order h² and higher are ignored.
If the density of the object is ρ and the specific heat is c_p, the amount of heat required to raise the temperature of this small region by 1 degree is (c) . Suppose therefore that all the heat stored in this small region in a second is immediately spent on the temperature change of the object. Denoting α := κ/(c_p ρ), the temperature change per second is

∂u(x, t)/∂t ≈ ∆q / (d) = (e) .  (2.13)
Now, we get equation (2.1).
Task 2.4
When deriving equation (2.5), we approximated the derivatives. For the first-order derivative du(x)/dx, please give the “forward difference”, “backward difference”, and “central difference” approximations separately. Then, for the second-order derivative d²u(x)/dx², show the approximation obtained by applying the central difference to the first-order derivative twice.
Task∗ 2.5
In equation (2.5), the derivative in the time dimension was approximated by a “forward difference”, and in the spatial dimension a “central difference” was used. Please give the formula obtained with a “backward difference” in the time dimension and a “central difference” in the spatial dimension, and describe how it differs from Eq. (2.5).
3 Numerical simulation for 2-D heat conduction
We will now simulate 2-D heat conduction. It is an extension of the 1-D case, and it is also the basis for the parallelization in a later chapter.
Problem setting: 2-D heat conduction  Consider heat conduction in the plate-shaped object shown in Fig. 3.1. It is assumed that its thickness is very small, so we can ignore it. We also assume that heat exchange only happens at its edges.
Fig 3.1: Plate-shaped object
Partial differential equation and discretization  Write the temperature at time t at position (x, y) as u(x, y, t). The 2-D heat conduction can be described by

∂u(x, y, t)/∂t = α [ ∂²u(x, y, t)/∂x² + ∂²u(x, y, t)/∂y² ].  (3.1)

In this experiment, the initial condition is

u(x, y, 0) = u_0(x, y),  (0 < x < Lx, 0 < y < Ly),  (3.2)

and the boundary condition is

u(x, y, t) = g(x, y, t),  ((x, y) ∈ ∂Ω).  (3.3)

Here, ∂Ω is the set of points on the boundary, and u_0(x, y) and g(x, y, t) are given functions. By finding the function u(x, y, t) we can know how the temperature of the plate changes.
Discretization  First, let the step sizes in the x and y directions be ∆x and ∆y, and the time step size be ∆t; then we have

x_i := i∆x  (i = 0, 1, ..., Mx, where Mx := Lx/∆x),  (3.4)
y_j := j∆y  (j = 0, 1, ..., My, where My := Ly/∆y),  (3.5)
t_n := n∆t  (n = 0, 1, ..., N, where N := T/∆t).  (3.6)
The discretization in the spatial dimensions is shown in Fig. 3.2.
Fig 3.2: Discretization in 2D
Our purpose is to calculate the numerical solution of Eq. (3.1), i.e.,

u_{i,j}^(n) := u(x_i, y_j, t_n).  (3.7)
Similar to the 1-D case, the derivatives in the x and y dimensions are approximated by central differences, and a forward difference is used for the time derivative. Then we get

(u_{i,j}^(n+1) − u_{i,j}^(n)) / ∆t = α [ (u_{i+1,j}^(n) − 2u_{i,j}^(n) + u_{i−1,j}^(n)) / (∆x)² + (u_{i,j+1}^(n) − 2u_{i,j}^(n) + u_{i,j−1}^(n)) / (∆y)² ].  (3.8)

By using Eq. (3.8) and the given initial and boundary conditions, the numerical solutions u_{i,j}^(n) can be calculated.
Calculation of the numerical solution  The calculation is basically the same as in 1-D. First, set the values at time step n = 0 from the initial condition; then compute the values of the other steps.
The main difference between the 1-D and 2-D calculations is how to store the values u_{i,j} in an array. From the “Review of C programming” section, we know that a 1-D array is more convenient when we allocate memory dynamically. So we keep using a 1-D array here.
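For example, one possible way to index the 2-D grid inside a 1-D array is the following sketch (the layout with the y-index varying fastest is an assumption; any consistent mapping works):

/* sketch: map grid point (i, j) to a 1-D index, with the y-index varying fastest */
static inline int idx2d(int i, int j, int My)
{
    return i * (My + 1) + j;     /* 0 <= i <= Mx, 0 <= j <= My */
}
/* usage: allocate (Mx+1)*(My+1) doubles for u; u[idx2d(i, j, My)] holds u_{i,j} */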
Visualization  Visualization for the 2-D problem is similar to the 1-D case learned before. First, output the computed result data to a file:
• Output x, y, u(x, y, t) separated by spaces.
• Output a blank line whenever x changes.
Then we get an output file like
#x y u (x, y, t) ※ Comment line (can be omitted)
0.0 0.0 ***
0.0 0.1 ***
0.0 0.2 ***
[omit]
0.0 0.9 ***
0.0 1.0 ***
(Important: a blank line)
0.1 0.0 ***
0.1 0.1 ***
[omit]
0.1 0.9 ***
0.1 1.0 ***
(Important: a blank line)
0.2 0.0 ***
0.2 0.1 ***
[omit]
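A file in this format can be produced by a routine analogous to Program 2.2. A minimal sketch is given below (the array layout u[i*(My+1)+j], the function name, and the file-name pattern are assumptions):

#include <stdio.h>

/* sketch: write x, y, u(x,y,t) in the format shown above, assuming u_{i,j} is   */
/* stored at u[i*(My+1)+j]; the file-name pattern mirrors Program 2.2            */
void print_data_2d(int Mx, int My, int n, double dx, double dy, double *u)
{
    FILE *fp;
    char sfile[256];
    int i, j;

    sprintf(sfile, "data_%06d.dat", n);
    fp = fopen(sfile, "w");
    fprintf(fp, "#x y u(x,y,t)\n");
    for (i = 0; i <= Mx; i++)
    {
        for (j = 0; j <= My; j++)
        {
            fprintf(fp, "%lf %lf %lf\n", (double)i*dx, (double)j*dy, u[i*(My+1)+j]);
        }
        fprintf(fp, "\n");   /* the blank line when x changes (important for gnuplot) */
    }
    fclose(fp);
}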
In the case of 2-D heat conduction, a bird's-eye view (as shown in Fig. 3.3) and a color distribution map (as shown in Fig. 3.4) are often used.
Fig 3.3: Bird's-eye view of "data_003400.dat" (using 1:2:3)
Fig 3.4: Contour plot of "data_003400.dat" (using 1:2:3)
Programs 3.1 and 3.2 are example scripts for outputting a still gif image as a bird's-eye view and as a contour plot, respectively. Note that the l in “w l” is short for “line”. After creating the script files, run the following command from the terminal.
$gnuplot ***.gp
If it works, a gif still image file will be found in the same folder.
Program 3.1: Script for creating a gif still image with bird's-eye view (plot_3d_bird_still.gp)
1 reset
2 set terminal gif
3 set output "test_heat_bird_still.gif"
4 set nokey
5 set view equal xy
6 set xlabel ’x’
7 set ylabel ’y’
8
9 set xrange [0:1]
10 set yrange [0:1]
11 set zrange [0:100]
12 set size square
13
14 splot ’data_000000.dat’ using 1:2:3 w l
15
16 unset output
Program 3.2: Script for creating a gif still image as a contour plot (plot_3d_map_still.gp)
1 reset
2 set terminal gif
3 set output "test_heat_map_still.gif"
4 set nokey
5 set view equal xy
6 set xlabel ’x’
7 set ylabel ’y’
8
9 set xrange [0:1]
10 set yrange [0:1]
11 set zrange [0:100]
12
13 set cbrange [0:100] #Adjust as needed
14 set palette defined (0 ’blue’, 50 ’red’, 100 ’yellow’)
15 unset surface
16 unset ztics
17 set pm3d at b
18 set view 0,0
19 set colorbox vertical user origin 0.8, 0.2
20
21
22 splot ’data_000000.dat’ using 1:2:3 w l
23
24 unset output
The simulation results can also be visualized as a gif animation with gnuplot. You can build it on the basis of the 1-D scripts. Hint: the following line makes the time information clearer in the gif animation.
set label 1 center at screen 0.5, 0.85 title_n
Task 3.1
Create a simulation program based on the explanations so far, and simulate the state of 2-D heat
conduction. Let Lx = Ly = 1.0, T = 1.0, α = 1.0. Initial condition and boundary conditions are
given as:
• u(x, y, 0) = 0.0 ((x, y) ∉ ∂Ω)
• u(x, 0, t) = 20x + 10 (0 ≤ x ≤ Lx, t ≥ 0)
• u(x, Ly, t) = 40x + 40 (0 ≤ x ≤ Lx, t ≥ 0)
• u(0, y, t) = 30y² + 10 (0 < y < Ly, t ≥ 0)
• u(Lx, y, t) = 50y² + 30 (0 < y < Ly, t ≥ 0)
Execute program with the following parameters and check the results.
1. Mx = My = 10, N = 1000
2. Mx = My = 20, N = 1000
3. Mx = My = 20, N = 4000
If it does not break down, create a gif animation with bird’s-eye view and color distribution.
(The frequency of outputting the calculation results should be set appropriately so that the gif
animation looks natural.)
Task 3.2
Set Lx = Ly = 1.0, T = 1.0, α = 1.0, and the initial and boundary conditions are the same as in
Task 3.1. It is known that the simulation does not fail if we compute N with
N = 2αT [ (Mx/Lx)² + (My/Ly)² ].
Therefore, we will determine N according to this formula and perform the following tasks. (Set
the compile option to -O2.)
1. Measure the execution time in the case Mx = My = 100. See “Review of C programming” for the time measurement method. Do not include the time for outputting to files, since we only care about the execution time of the calculation.
2. In order to simulate more accurately, make the division in the spatial dimension finer: make Mx and My 10 times those in case 1, that is, Mx = My = 1000. Assume that the execution time is proportional to the number of elements in the array, and predict the execution time for Mx = My = 1000 from the time measured in case 1. A rough prediction is enough.
3. In order to perform the simulation even more accurately, consider setting the spatial division to 100 times, 1000 times, and 10000 times that of case 1. For each case, predict the execution time in the same way as in 2. At the same time, estimate the memory requirement (in bytes). Note that we use two double-precision (double) arrays to store the temperature data.
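For reference, the N given by the formula in Task 3.2 can be computed as in the following sketch (the function name and the integer rounding are choices, not prescribed by the handout):

/* sketch: time-step count from the formula in Task 3.2 */
int steps_from_formula(double alpha, double T, int Mx, int My, double Lx, double Ly)
{
    double rx = (double)Mx / Lx;
    double ry = (double)My / Ly;
    return (int)(2.0 * alpha * T * (rx * rx + ry * ry));   /* round up if needed */
}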
Task∗ 3.3
Set Lx = Ly = 1.0, T = 1.0, α = 1.0, the initial condition is u(x, y, 0) = sin(πx) sin(πy), and the boundary condition is u(x, y, t) = 0.0 ((x, y) ∈ ∂Ω).
1. Try to find the analytical solution in this case. (Hint: Try to guess from the analytical
solution for one dimension (Task 2.2).)
2. Compare the analytical solution with the simulation results at t = 0.1 for Mx = My = 10, 50, 100 separately. Take N from the formula shown in Task 3.2. The method of comparison is the same as in Task 2.2.
4 Basics of parallel programming with MPI
In this part, we will learn basic techniques of MPI (Message Passing Interface), which is widely used
in parallelizing simulation programs.
Introduction  Let's start with an overview of MPI. Strictly speaking, MPI is an API standard for message communication between processes. Implementations of this standard (MPI libraries) are provided by various vendors, both free and commercial. MPI can be used from C (as well as Fortran, etc.).
The parallel model of MPI is SPMD (Single Program Multiple Data) (see Fig. 4.1). The unit of processing is called a process, and all processes run at the same time. Each MPI process is assigned a unique identification number (starting from 0) called its rank, and has its own global variables and environment. The program executed by each process is the same (a.out in the case of Fig. 4.1); that is the “Single Program” part. On the other hand, the data held by each process is generally different; that is the “Multiple Data” part.
Fig 4.1: Parallel processing of MPI
Let's illustrate the parallelism mechanism of MPI in more detail with Fig. 4.2. As already mentioned, since the program on each process is the same, all processes have the same variables. For example, in Fig. 4.2, both the rank 0 process (hereinafter, process 0) and the rank 1 process (process 1) have a double-type array named “data”. However, each process can hold different element values.
Each process can only use the data held by itself, since the memory spaces of the processes are independent of each other. If processes want to exchange data, they have to send/receive data (pass messages) using functions provided by MPI. See Fig. 4.2 as an example:
• process 0: receives data from process 1 with the function MPI_Recv
• process 1: sends data to process 0 with the function MPI_Send
MPI provides three kinds of functions. We will focus on one-to-one communication due to the limited credit hours.
• One-to-one communication: functions for sending and receiving data between two processes (e.g. MPI_Send, MPI_Recv)
Fig 4.2: Schematic diagram of memory space and data passing of MPI
• Collective communication: functions for aggregating data among many processes (e.g. MPI_Reduce, MPI_Gather)
• Others: functions for initialization, termination, information acquisition, time measurement, etc. (e.g. MPI_Init, MPI_Comm_size, MPI_Wtime)
The minimum knowledge you need for MPI parallelization is:
• Basic structure of MPI program (initialization, termination, acquisition of rank number, etc.)
• How to compile MPI program (environment dependent)
• How to run MPI program (environment dependent)
MPI programming example (1): MPI Hello world  In this program, each process sleeps for my_rank * 2 seconds, where my_rank is its rank number. Then a random number is generated. Along with “Hello world”, the rank number and the random number are displayed.
Program 4.1: MPI hello world
1 #include <mpi.h>
2 #include <stdio.h>
3 #include <stdlib.h>
4 #include <sys/time.h>
5 int main(int argc, char ∗argv[])
6 {
7 int nprocs, my_rank;
8 int rand_num;
9 struct timeval s;
10
11 MPI_Init(&argc, &argv);
12 MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
13 MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
14
15 sleep(my_rank∗2);
16
17 gettimeofday(&s, NULL);
18 srand(s.tv_sec);
19 rand_num = rand()%1000;
20
21 printf("Hello␣world␣(rank:␣%d,␣random_num:␣%3d)\n", my_rank, rand_num);
22
23 MPI_Finalize();
24
25 return 0;
26 }
In this program, lines related to MPI are:
• Line 1: Include header file for using MPI.
• Line 11: MPI initialization (must be done before calling any other MPI function).
• Line 12: Get the total number of processes.
• Line 13: Get rank number.
• Line 23: MPI termination.
Compile program 4.1 with
$mpicc mpi_parallel_hello.c
Then, run the executable MPI program with
$mpirun -np P ./a.out
where P is the number of processes that execute the MPI job.
Task 4.1
Set the number of processes P to 1, 2, and 4, run Program 4.1, and check the outputs.
MPI programming example (2): One-to-one communication (in the case of
two processes)  In Program 4.1, multiple processes run in parallel, but there is no data exchange. Now we will learn how to send/receive messages between two processes.
Program 4.2: Sending and receiving data between 2 processes
1 #include <mpi.h>
2 #include <stdio.h>
3 #include <stdlib.h>
4 #include <sys/time.h>
5 int main(int argc, char ∗argv[])
6 {
7 int nprocs;
8 int my_rank, your_rank;
9 int my_num, your_num;
10 MPI_Status stat;
11 struct timeval s;
12
13 MPI_Init(&argc, &argv);
14 MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
15 MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
16
17 sleep(my_rank∗2);
18
19 gettimeofday(&s, NULL);
20 srand(s.tv_sec);
21 my_num = rand()%1000;
22
23 if(my_rank == 0)
24 {
25 your_rank = 1;
26 MPI_Send(&my_num, 1, MPI_INT, your_rank, 0, MPI_COMM_WORLD);
27 MPI_Recv(&your_num, 1, MPI_INT, your_rank, 1, MPI_COMM_WORLD, &stat);
28 }
29 else
30 {
31 your_rank = 0;
32 MPI_Recv(&your_num, 1, MPI_INT, your_rank, 0, MPI_COMM_WORLD, &stat);
33 MPI_Send(&my_num, 1, MPI_INT, your_rank, 1, MPI_COMM_WORLD);
34 }
35
36 printf("My␣rank␣is␣%d.␣My␣num␣is␣%3d.␣Your␣num␣is␣%3d.\n", my_rank, my_num, your_num);
37
38 MPI_Finalize();
39
40 return 0;
41 }
In this program, a random number is generated in each process as in Program 4.1. After that,
• Line 26: process 0 sends its random number to process 1 with MPI_Send.
• Line 27: process 0 receives a number from process 1 with MPI_Recv.
• Line 32: process 1 receives a number from process 0 with MPI_Recv.
• Line 33: process 1 sends its random number to process 0 with MPI_Send.
MPI_Send and MPI_Recv used in this example are one-to-one communication functions provided by MPI.
• int MPI_Send (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
Function for sending data; it blocks until the sending is completed.
Fig 4.3: Deadlock Fig 4.4: Right way
– buf: a pointer to the buffer that contains the data to be sent.
– count: the number of elements in the buffer. If the data part of the message is empty, set the
count parameter to 0.
– datatype: the data type of the elements in the buffer array, for example MPI_INT, MPI_DOUBLE,
MPI_LONG, MPI_CHAR.
– dest: the rank of the destination process within the communicator that is specified by the
comm parameter.
– tag: the message tag that can be used to distinguish different types of messages. Must be the
same number on the sending side and receiving side.
– comm: the handle to the communicator, usually MPI_COMM_WORLD.
• int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm,
MPI_Status *status)
Function for receiving data; it blocks until the receiving is completed.
– buf: a pointer to the buffer for storing the received data.
– count: the number of elements to be received.
– datatype: the data type of the elements in the buffer array.
– source: the rank of the sending process within the communicator that is specified by the comm
parameter.
– tag: the message tag that can be used to distinguish different types of messages. Must be the
same number on the sending side and receiving side.
– comm: the handle to the communicator, usually MPI_COMM_WORLD.
– status: a pointer to an MPI_Status structure that, on return, contains information about the received message.
Note that some MPI message-passing functions, such as MPI_Send and MPI_Recv, do not move on to the next step until the sending/receiving is completed (this is library dependent). This is called “blocking communication”. Therefore, if both processes call MPI_Send first, as in Fig. 4.3, then neither of them can complete its send, since the other one is not ready to receive, so they wait forever. This state is called deadlock, and it should be treated very carefully in MPI programming. To avoid it, we can let one process send first and the other receive first, as shown in Fig. 4.4. There are other solutions, such as using MPI_Sendrecv or “non-blocking communication” (asynchronous communication) functions.
Task 4.2
Set the number of processes to 2, run program 4.2, and check the outputs.
MPI programming example (3): One-to-one communication (in the case of
multiple processes)  Next, consider sending/receiving messages when the number of processes is arbitrary (possibly greater than 2).
Program 4.3: Sending and receiving data between processes (when the number of processes is general)
1 #include <mpi.h>
2 #include <stdio.h>
3 #include <stdlib.h>
4 #include <sys/time.h>
5 int main(int argc, char ∗argv[])
6 {
7 int nprocs;
8 int my_rank, prev_rank, next_rank;
9 int my_num, prev_num, next_num;
10 MPI_Status stat;
11 struct timeval s;
12
13 MPI_Init(&argc, &argv);
14 MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
15 MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
16
17 sleep(my_rank∗2);
18
19 gettimeofday(&s, NULL);
20 srand(s.tv_sec);
21 my_num = rand()%1000;
22
23 prev_rank = (my_rank−1+nprocs)%nprocs;
24 next_rank = (my_rank+1)%nprocs;
25
26 MPI_Sendrecv(&my_num, 1, MPI_INT, next_rank, 0,
27 &prev_num, 1, MPI_INT, prev_rank, 0,
28 MPI_COMM_WORLD, &stat);
29 MPI_Sendrecv(&my_num, 1, MPI_INT, prev_rank, 1,
30 &next_num, 1, MPI_INT, next_rank, 1,
31 MPI_COMM_WORLD, &stat);
32
33 printf("My␣rank␣is␣%d.␣My␣num␣is␣%3d.␣Prev␣num␣is␣%3d.␣Next␣num␣is␣%3d.\n",
34 my_rank, my_num, prev_num, next_num);
35
36 MPI_Finalize();
37
38 return 0;
39 }
In Program 4.3, each process exchanges data with its neighbors as shown in Fig. 4.5. It is assumed that process 0 and process nprocs − 1 are adjacent, so the processes form a ring.
As mentioned previously, we should avoid deadlock when sending/receiving data. Therefore, for simplicity, we use the function MPI_Sendrecv here. MPI_Sendrecv is basically a combination of MPI_Send and MPI_Recv: the send and the receive are processed internally in an appropriate order so that deadlock does not occur inside the function. The destination and the source do not have to be the same process, nor do the send and receive buffers have to be the same.
• int MPI_Sendrecv (const void *sendbuf, int sendcount, MPI_Datatype sendtype, int dest, int sendtag,
void *recvbuf, int recvcount, MPI_Datatype recvtype, int source, int recvtag, MPI_Comm comm,
MPI_Status *status)
Function for sending and receiving data.
– sendbuf: initial address of send buffer.
– sendcount: number of elements in send buffer.
– sendtype: type of elements in send buffer.
– dest: rank of destination.
– sendtag: send tag.
– recvbuf: initial address of receive buffer.
– recvcount: number of elements in receive buffer.
– recvtype: type of elements in receive buffer.
– source: rank of source.
– recvtag: receive tag.
– comm: the handle to the communicator, usually MPI_COMM_WORLD.
– status: a pointer to an MPI_Status structure that, on return, contains information about the received message.
In Program 4.3,
• Lines 26–28: send data to the next process and receive data from the previous process.
• Lines 29–31: send data to the previous process and receive data from the next process.
A common mistake is to send data to the next process and also try to receive data from the next process in the same exchange, which generally causes a deadlock.
Fig 4.5: Sending and receiving in case of more than 2 processes
Task 4.3
Set the number of processes to 2, 3, 4, run program 4.3, and check the outputs.
MPI programming example (4): Compute the sum of array values in parallel
The serial version is given in Program 4.4. It computes the sum of the elements of the array val. The correct sum can easily be checked by hand, but please do not compute the sum with a closed-form formula directly in your program.
Program 4.4: Computing sum of array elements (serial version )
1 #include <stdio.h>
2 #define N 10000
3 int main(int argc, char ∗argv[])
4 {
5 double val[N];
6 int i;
7 double sum;
8
9 for(i = 0; i < N; i++)
10 {
11 val[i] = (double)i; //initialization
12 }
13
14 sum = 0.0;
15 for(i = 0; i < N; i++)
16 {
17 sum += val[i];
18 }
19
20 printf("sum!is!%8lf\n", sum);
21
22 return 0;
23 }
Now we consider parallelizing lines 14–18, which are the main calculation part of Program 4.4.
• Let P be the number of processes. Each process only computes the sum of about N/P elements; the last process is in charge of the remainder.
• Let process 0 gather the results calculated by each process. That is, process 0 receives data from the others with MPI_Recv, and the other processes send data to process 0 with MPI_Send.
• Let process 0 calculate the total sum and output it.
Task 4.4
Parallelize program 4.4, run it with 2, 3, 4 processes separately, and check the results.
Task∗ 4.5
Since each process only computes the sum of a portion of the array val, it is more reasonable for each process to store only the elements it actually uses. Please improve the program from Task 4.4 to reduce the memory required by each process. Then execute it with 2, 3, and 4 processes and check the results.
Supplementary explanation of Task 4.4 First, compile the serial version (program 4.4), then
run it.
$gcc main.c
$./a.out
The output is
$sum is ****.***
There is only one process now. Now try the following command without recompiling the program.
$mpirun -np 4 ./a.out
The output is
$sum is ****.***
$sum is ****.***
$sum is ****.***
$sum is ****.***
The command mpirun starts 4 processes this time. They all execute the same program (a.out), so the same computation is repeated 4 times, which is meaningless.
Consider the parallelization of program 4.4. First, insert
#include <mpi.h>
MPI_Init (&argc, &argv);
MPI_Finalize ();
to the proper positions. Then get the number of processes (n_procs) and the rank (my_rank).
MPI_Comm_size (MPI_COMM_WORLD, &n_procs);
MPI_Comm_rank (MPI_COMM_WORLD, &my_rank);
Note that the rank of a process (my_rank) starts from 0 and ends at n_procs-1. We can confirm whether
they are acquired correctly with the following code.
printf ("my_rank!is!%d,!!n_procs!is!%d,!!sum!is!%8lf\n", my_rank, n_procs, sum);
Now, divide the task among P processes. Our purpose is to calculate

S = Σ_{i=0}^{N−1} val[i].

Rewrite it as

S = Σ_{p=0}^{P−1} S_p,   S_p = Σ_{i=N_p}^{N_{p+1}−1} val[i]  (p = 0, 1, ..., P − 1),   N_0 = 0, N_P = N.
Note that the calculations of the S_p are independent of each other. In other words, it is possible to calculate all S_p at the same time. Therefore, let P equal the number of processes (n_procs), and let each process calculate S_p, where p is its rank.
If N is divisible by n_procs, we set l_n = N/n_procs as the number of elements that one process is in charge of (hereinafter referred to as l_n). If it is not divisible, let the last process (rank n_procs-1) be in charge of the remainder as well.
l_n = N / n_procs;
if (my_rank == n_procs-1)
{
l_n += N%n_procs;
}
Then the range of indices that each process is responsible for is determined as follows.
l_istart = (N/n_procs) * my_rank;
l_iend = l_istart + l_n - 1;
Hence the loop in Program 4.4 (line 15) should be modified as
for (i = l_istart; i <= l_iend; i++)
Note that each process only computes and holds its partial sum (S_p). In order to compute the sum of all array elements (S), we let a certain process (for example, process 0) gather the computed S_p from the other processes and add them together.
In detail, first let all processes except process 0 send their partial sum to process 0.
MPI_Send (&sum, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
Then, let process 0 receive the partial sum from each process i (i = 1, 2, ..., n_procs-1) and add it to sum.
MPI_Recv (&tmp_sum, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &stat);
sum += tmp_sum;
At last, let process 0 output the result, which is the sum of all elements in array val.
if (my_rank == 0)
{
printf ("sum!is!%8lf\n", sum);
}
Make rank 0 output the value of sum so that we can see the result.
After modifying the program so far, recompile it, and execute it with 4 processes.
$mpicc main.c
$mpirun -np 4 ./a.out
Change the number of processes, and check whether the results are consistent.
Supplementary explanation of Task 4.5  Looking back at the program written in Task 4.4, we can see that each process only uses part of the array val, from l_istart to l_iend. However, each process stores the whole array, so memory is wasted. This becomes a problem when we want to handle large problems. Therefore, we will improve the program from Task 4.4 so that only the required memory is allocated.
First, allocate memory dynamically
val = (double *)malloc (sizeof (double)*l_n);
where l_n is the number of elements referenced by each process.
Now, consider initializing elements of array val. It is
val[i]=(double)i;
in Task 4.4. However, the size of val is now l_n, not N. Therefore, the range of the index i on the left-hand side (val[i]) is from 0 to l_n-1, while the index i on the right-hand side ((double)i) runs from l_istart to l_iend. There is a gap between the local index and the global index (see Fig. 4.6), which needs to be dealt with carefully.
Fig 4.6: global index & local index
A conversion between the local index and the global index is needed. Hereinafter, the local index is referred to as l_i and the global index as g_i. The conversions between them are
g_i = (N/n_procs)*my_rank + l_i;
l_i = g_i - (N/n_procs)*my_rank;
By using this, the initialization part becomes
for (l_i = 0; l_i < l_n; l_i++)
{
val[l_i]=(double)((N/n_procs)*my_rank + l_i);
}
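In the same way, the partial-sum loop then runs over local indices only; a small sketch (variable names follow Task 4.4):

/* sketch: with the local array, the partial sum runs over local indices only */
sum = 0.0;
for (l_i = 0; l_i < l_n; l_i++)
{
    sum += val[l_i];
}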
5 Parallelization of 2-D heat conduction simulation with MPI
Parallelization policy: data distribution and communication  In the 2-D heat conduction simulation, the values u_{i,j} (1 ≤ i ≤ Mx − 1, 1 ≤ j ≤ My − 1) are calculated at each time step. For the calculation of u_{i,j}, only values from the previous time step are necessary; there is no dependency among unknowns at different locations within one step. In other words, it is possible to calculate u_{i,j} at different points at the same time. Therefore, we divide the whole area into several regions and put each process in charge of one region.
There are various ways of dividing the area. We consider the division shown in Fig. 5.1. Each process only computes the u_{i,j} in its own region, l_istart ≤ i ≤ l_iend, 1 ≤ j ≤ My − 1.
Fig 5.1: Dividing the area into 3 regions (in the case of 3 processes)
Fig 5.2: Overlapping region of process 1 (black dots)
Note that, in order to calculate u_{i,j} at point (i, j), we need the values at this point and at the four adjacent points (i ± 1, j) and (i, j ± 1) at the previous time level. Fig. 5.2 shows an example focusing on process 1. The values at the black dots are not held by process 1; they are held by process 0 and process 2. So, process 1 should exchange the values in the “overlapping region” with its neighbour processes.
Precautions for programming: sending and receiving multiple values  In this program, it is necessary to send/receive multiple values of the overlapping region. The functions MPI_Send / MPI_Recv / MPI_Sendrecv can send/receive multiple values at once if they are contiguous in memory: their initial address is specified with the argument buf, and the number of values with the argument count.
Precautions for programming: order of sending and receiving  In this program, each process should exchange data with its neighbor processes, and we should be very careful to avoid deadlock. Note that the first and the last process only have one neighbor each. Of course, we could use “if/else” to handle this, but there is an easier way that avoids branching.
Prepare a variable prev_rank to hold the rank of the previous process and next_rank to hold the rank of the next process. For the first process, prev_rank can be set to MPI_PROC_NULL, and likewise next_rank for the last process; communication with MPI_PROC_NULL is simply skipped.
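For example, the neighbor setup and one exchange of the overlapping columns per time step might look like the following sketch. It assumes that every process keeps the whole array as in Task 5.1, the storage layout u[i*(My+1)+j] (so that one column, fixed i and all j, is contiguous in memory), and variable names my_rank, nprocs, l_istart, l_iend, stat; adapt as needed.

/* sketch: neighbor ranks for the 1-D (in i) decomposition */
prev_rank = (my_rank == 0)          ? MPI_PROC_NULL : my_rank - 1;
next_rank = (my_rank == nprocs - 1) ? MPI_PROC_NULL : my_rank + 1;

/* send my last owned column to next; receive prev's column into column l_istart-1 */
MPI_Sendrecv(&u[l_iend*(My+1)],       My+1, MPI_DOUBLE, next_rank, 0,
             &u[(l_istart-1)*(My+1)], My+1, MPI_DOUBLE, prev_rank, 0,
             MPI_COMM_WORLD, &stat);
/* send my first owned column to prev; receive next's column into column l_iend+1 */
MPI_Sendrecv(&u[l_istart*(My+1)],     My+1, MPI_DOUBLE, prev_rank, 1,
             &u[(l_iend+1)*(My+1)],   My+1, MPI_DOUBLE, next_rank, 1,
             MPI_COMM_WORLD, &stat);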
Precautions for programming: outputting and visualization  We should avoid multiple processes accessing the same file at the same time. A relatively simple way is to let each process output to a different file. For example, process 0 outputs to the file “data_00_001000.dat”, process 1 to “data_01_001000.dat”, etc. We can put them together when visualizing with gnuplot.
splot for [i=0:2] sprintf("data_%02d_001000.dat", i) using 1:2:3 w l
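On the C side, the per-process file name can be built as in this small sketch (the name pattern follows the example above; the function name is an assumption):

#include <stdio.h>

/* sketch: build the per-process output file name used in the example above */
void make_filename(char *fname, size_t len, int my_rank, int n)
{
    snprintf(fname, len, "data_%02d_%06d.dat", my_rank, n);
}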
Precautions for programming: measurement of execution time  Since multiple processes are running in parallel, we should make sure that all processes start the measured section at the same time and wait for the slowest one.
MPI_Barrier (MPI_COMM_WORLD);
t_start = MPI_Wtime ();
/* processing that requires time measurement */
MPI_Barrier (MPI_COMM_WORLD);
t_end = MPI_Wtime ();
time = t_end - t_start;
MPI_Barrier is a function that synchronizes the processes, and MPI_Wtime is a time acquisition function. Both are provided by MPI.
Task 5.1
Parallelize the 2-D heat conduction simulation program created before. Don’t worry about wasted
memory in this task. Parameters are basically the same as Task 3.1: Lx = Ly = 1.0, T = 1.0, α =
1.0, Mx = My = 200, N = 160000. Initial conditions and boundary conditions are
• u(x, y, 0) = 0.0 ((x, y) ∉ ∂Ω)
• u(x, 0, t) = 20x + 10 (0 ≤ x ≤ Lx, t ≥ 0)
• u(x, Ly, t) = 40x + 40 (0 ≤ x ≤ Lx, t ≥ 0)
• u(0, y, t) = 30y² + 10 (0 < y < Ly, t ≥ 0)
• u(Lx, y, t) = 50y² + 30 (0 < y < Ly, t ≥ 0)
Set the number of processes from 1 to 8. Output the bird's-eye view of the solution at the final time (T), and measure the execution time for each case. Notes:
• When testing whether the program is correct, set Mx, My, N to small values.
• When measuring time, do not include input/output time.
Task 5.2
Modify the program from Task 5.1 to avoid wasting memory. Then change the number of processes from 1 to 8 and execute it to verify that it works. The settings and conditions are the same as in Task 5.1.
Task∗ 5.3
Consider a heat source inside the plate. The partial differential equation becomes

∂u(x, y, t)/∂t = α [ ∂²u(x, y, t)/∂x² + ∂²u(x, y, t)/∂y² ] + S(x, y),  (5.1)

where S(x, y) is the strength of the heat source at position (x, y) and is time-independent.
Set Lx = Ly = 1.0, T = 1.0, α = 1.0, Mx = My = 40, N = 10000. The initial condition is u(x, y, 0) = 0.0, and the boundary condition is u(x, y, t) = 0.0, (x, y) ∈ ∂Ω. There is a heat source S(0.5, 0.5) = 40000. Modify your program to simulate the heat conduction under these conditions.
Task∗ 5.4
Consider the heat conduction problem (5.1) with Lx = Ly = 1.0, α = 1.0 and the analytical solution

u(x, y, t) = −e^(−2π²αt) sin(πx) sin(πy) + 20xy(x − 1)(y − 1).  (5.2)

The boundary condition is u(x, y, t) = 0.0, (x, y) ∈ ∂Ω, and the initial condition is obtained by setting t = 0 in Eq. (5.2). The heat source is S(x, y) = 40α(x(1 − x) + y(1 − y)). Obtain simulation results at t = 0.1 using Mx = My = 10, 50, 100 separately, taking N from the formula shown in Task 3.2. Compare the results to the analytical solution with figures, compute the errors according to Eq. (2.10), and analyse them.
6* Computation of steady-state heat conduction
For the heat conduction problem, if the boundary conditions are time-independent, the system will arrive at a steady state after sufficient time. If our purpose is to find this steady state, we set the time change of u, ∂u/∂t, to 0. Therefore, in the 1-D case the problem becomes

d²u(x)/dx² = 0.  (6.1)
As before, discretizing it with the central difference (number of divisions: M), we get

(u_{i+1} − 2u_i + u_{i−1}) / (∆x)² = 0  ⇔  2u_i − u_{i+1} − u_{i−1} = 0,  (1 ≤ i ≤ M − 1),  (6.2)

where u_0 and u_M are determined by the boundary conditions. This is a linear system with M − 1 unknowns.
There are various methods for solving this linear system, for example LU decomposition (Gaussian elimination), the Jacobi method, and the CG method. In general, when the coefficient matrix is large and sparse (mostly zero components), LU decomposition is not practical. We will solve it with the conjugate gradient (CG) method.
The algorithm of the CG method is shown in Algorithm 1. The initial approximate solution is x_0 = 0, and ε = 10⁻¹² is the tolerance for the convergence check.
Algorithm 1 CG method (Conjugate Gradient)
1: x_0 = 0
2: r_0 = b, p_0 = r_0
3: for k = 0, 1, 2, ... do
4:   q_k = A p_k
5:   α_k = (r_k^T r_k) / (p_k^T q_k)
6:   x_{k+1} = x_k + α_k p_k
7:   r_{k+1} = r_k − α_k q_k
8:   if ||r_{k+1}||_2 / ||b||_2 ≤ ε then
9:     convergence
10:  end if
11:  β_k = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
12:  p_{k+1} = r_{k+1} + β_k p_k
13: end for
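For reference, Algorithm 1 translates almost line by line into C. The sketch below assumes a user-supplied matrix-vector product matvec(n, p, q) that computes q = Ap (its implementation is left to Tasks 6.1 and 6.2), so it only illustrates the loop structure.

#include <math.h>
#include <stdlib.h>

/* sketch of Algorithm 1; matvec(n, p, q) computing q = A p is assumed to be
   provided by the user (see Tasks 6.1 and 6.2) */
void cg(int n, const double *b, double *x,
        void (*matvec)(int, const double *, double *))
{
    double *r = malloc(sizeof(double)*n);
    double *p = malloc(sizeof(double)*n);
    double *q = malloc(sizeof(double)*n);
    double rr, rr_new, pq, alpha, beta, nb = 0.0, eps = 1e-12;
    int i, k;

    for (i = 0; i < n; i++) { x[i] = 0.0; r[i] = b[i]; p[i] = r[i]; nb += b[i]*b[i]; }
    rr = nb;                                    /* r_0 = b, so r_0^T r_0 = b^T b */

    for (k = 0; k < n; k++)                     /* CG converges in at most n steps */
    {
        matvec(n, p, q);                        /* q_k = A p_k                     */
        pq = 0.0;
        for (i = 0; i < n; i++) pq += p[i]*q[i];
        alpha = rr / pq;                        /* alpha_k                         */
        rr_new = 0.0;
        for (i = 0; i < n; i++)
        {
            x[i] += alpha*p[i];                 /* x_{k+1}                         */
            r[i] -= alpha*q[i];                 /* r_{k+1}                         */
            rr_new += r[i]*r[i];
        }
        if (sqrt(rr_new) / sqrt(nb) <= eps) break;   /* convergence check          */
        beta = rr_new / rr;                     /* beta_k                          */
        for (i = 0; i < n; i++) p[i] = r[i] + beta*p[i];
        rr = rr_new;
    }
    free(r); free(p); free(q);
}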
Task∗ 6.1
Write the unknowns of the simultaneous equations (6.2) as the vector

x = (u_1, u_2, ..., u_{M−2}, u_{M−1})^T.

Then Eq. (6.2) can be rewritten as

Ax = b.

Please give the elements of the coefficient matrix A and of the right-hand-side vector b.
Task∗ 6.2
Use the conjugate gradient method to find the steady state of 1-D heat conduction. The boundary conditions are u_0 = 0.2, u_M = 1.0. Compare the result with the time-dependent simulations done before.