MATLAB GPU Programming Tutorial for Monte Carlo Simulations
Table of Contents
- Basic Setup
- Key Functions
- Example Implementation
- Best Practices
- Common Pitfalls
1. Basic Setup
```matlab
% Initialize random number generators
seed = 1234;
rng(seed);    % CPU random number generator
gpurng(seed); % GPU random number generator

% Check GPU availability; gpuDevice() errors if no supported GPU is present
if gpuDeviceCount > 0
    gpu = gpuDevice(); % Returns the selected GPU device object
end
```
2. Key Functions
Creating GPU Arrays
```matlab
% Convert a CPU array to a GPU array
gpu_array = gpuArray(cpu_array);

% Create arrays directly on the GPU
zeros_gpu = zeros(100, 100, 'gpuArray');
randn_gpu = randn(100, 100, 'gpuArray'); % standard normal samples
```
Data Transfer
```matlab
% GPU to CPU transfer
cpu_data = gather(gpu_data);

% CPU to GPU transfer
gpu_data = gpuArray(cpu_data);
```
Parallel Execution
```matlab
% Apply a scalar function to each element of a GPU array;
% the function body must use element-wise operations only
result = arrayfun(@my_function, gpu_array);
```
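As a concrete sketch, `my_function` can be any element-wise scalar function. The `soft_clip` function below is a hypothetical example (the name and formula are illustrative, not from the source); `arrayfun` compiles it into a single fused GPU kernel:

```matlab
% Apply a custom element-wise function across one million GPU samples
x_gpu = randn(1e6, 1, 'gpuArray');
y_gpu = arrayfun(@soft_clip, x_gpu); % runs as one fused GPU kernel
y = gather(y_gpu);                   % bring the result back to the CPU

% Hypothetical element-wise function (scalar operations only, so
% arrayfun can compile it for the GPU)
function y = soft_clip(x)
    y = tanh(x) + 0.1*x;
end
```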
3. Example Implementation
```matlab
% GPU_vs_CPU_Example.m
function GPU_vs_CPU_Example()
    % Parameters
    n_paths = 10000; % Number of Monte Carlo paths
    n_steps = 1000;  % Time steps per path

    % Test both implementations
    disp('Testing CPU implementation...');
    tic;
    cpu_results = runCPU(n_paths, n_steps);
    cpu_elapsed = toc;

    disp('Testing GPU implementation...');
    tic;
    gpu_results = runGPU(n_paths, n_steps);
    gpu_elapsed = toc;

    % Compare timings
    fprintf('\nResults Summary:\n');
    fprintf('CPU total time: %.4f seconds\n', cpu_elapsed);
    fprintf('GPU total time: %.4f seconds\n', gpu_elapsed);
    fprintf('Speedup factor: %.2fx\n', cpu_elapsed/gpu_elapsed);

    % Verify the implementations agree. The CPU and GPU generators
    % produce different random sequences even with the same seed, so
    % compare summary statistics rather than individual path values.
    fprintf('CPU mean: %.4f, GPU mean: %.4f\n', ...
        mean(cpu_results), mean(gather(gpu_results)));
end

function results = runCPU(n_paths, n_steps)
    % CPU implementation: one path at a time, one step at a time
    dt = 0.01; mu = 0.1; sigma = 0.2;
    results = zeros(1, n_paths);
    for i = 1:n_paths
        x = 0; % Initialize path
        for j = 1:n_steps
            z = randn(); % 'z' avoids shadowing the built-in eps
            x = x + mu*x*dt + sigma*z*sqrt(dt);
        end
        results(i) = x; % Store result
    end
end

function results = runGPU(n_paths, n_steps)
    % GPU implementation: all paths advance in parallel at each step
    dt = 0.01; mu = 0.1; sigma = 0.2;
    paths = zeros(1, n_paths, 'gpuArray');
    for j = 1:n_steps
        z = randn(1, n_paths, 'gpuArray');
        paths = paths + mu*paths*dt + sigma*z*sqrt(dt);
    end
    wait(gpuDevice); % ensure GPU work has finished before timing stops
    results = paths;
end
```
4. Best Practices
Efficient Data Transfer
```matlab
% Good practice: transfer once, compute on the GPU, gather once
gpu_data = gpuArray(cpu_data);
for i = 1:n_steps
    % Process on GPU
end
final_result = gather(gpu_result);

% Bad practice: transferring inside the loop
for i = 1:n_steps
    gpu_data = gpuArray(cpu_data); % Avoid frequent transfers
    % Process
    cpu_result = gather(gpu_result);
end
```
Memory Management
```matlab
% Clear GPU memory (this also clears all existing gpuArray variables)
reset(gpuDevice);

% Check GPU memory usage
gpu = gpuDevice();
fprintf('Available GPU memory: %g GB\n', gpu.AvailableMemory/1e9);
```
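A defensive sketch building on the snippet above (the array size here is arbitrary): estimate an allocation before creating a large array, so an out-of-memory error can be avoided rather than caught:

```matlab
gpu = gpuDevice();
n = 20000;
bytes_needed = n * n * 8; % double precision uses 8 bytes per element
if bytes_needed < gpu.AvailableMemory
    A = zeros(n, n, 'gpuArray');
else
    warning('Not enough GPU memory for a %d-by-%d double array.', n, n);
end
```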
When to Use GPU
- Large-scale parallel computations
- Computationally intensive operations
- Minimal data dependencies
- Large datasets
When to Avoid GPU
- Small datasets
- Sequential operations
- Operations requiring frequent CPU-GPU transfers
- Memory-constrained situations
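When in doubt, measure rather than guess. A minimal sketch using `timeit` and `gputimeit` (the matrix size is arbitrary) shows whether the GPU pays off for a given workload; `gputimeit` synchronizes the device so the timing is fair:

```matlab
n = 4000;
A = rand(n);         % CPU array
A_gpu = gpuArray(A); % same data on the GPU

t_cpu = timeit(@() A * A);
t_gpu = gputimeit(@() A_gpu * A_gpu);
fprintf('CPU: %.4f s | GPU: %.4f s | speedup: %.1fx\n', ...
    t_cpu, t_gpu, t_cpu/t_gpu);
```

Repeating this for a small `n` (say 100) typically shows the CPU winning, which is why small datasets belong on the "avoid" list.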
5. Common Pitfalls
- Excessive Data Transfer
  - Minimize transfers between CPU and GPU
  - Keep computations on the GPU as long as possible
- Poor Memory Management
  - Monitor GPU memory usage
  - Clear GPU memory when needed
  - Consider memory limitations
- Inefficient Parallelization
  - Use vectorized operations instead of loops
  - Utilize built-in GPU-enabled functions
  - Ensure proper batch sizing
Error Handling
```matlab
% Always check for GPU availability before running GPU code
if gpuDeviceCount > 0
    % GPU code
else
    % CPU fallback
end
```
This tutorial is based on the implementation shown in the provided Monte Carlo simulation code. For specific applications, you may need to adjust the approaches based on your computational requirements and hardware capabilities.