Programming part: CUDA, matrix addition
Implement matrix addition in CUDA, C = A+B where the matrices are NxN and N is large. This is an extension of the program in the "CUDA by Example" book that adds two long vectors of length N. One could also refer to the [login to view URL] program, which uses 2-dimensional arrays.
In your main program assign (float) values to the elements of A and B: a[i][j] = 2*i + j + 1 and b[i][j] = i + 4*j + 2.
Call your kernel. Then check if all elements of C are correct; if they are correct, print "We did it!".
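One possible shape for the initialization, kernel call, and check described above is sketched here. It assumes the matrices are stored as flat row-major arrays rather than true 2-dimensional arrays, and the names matAdd, initMatrices, and allCorrect, the 16x16 block size, and the default N of 4096 are illustrative choices, not part of the assignment. CUDA error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Element-wise C = A + B on flat row-major n x n arrays (one thread per element).
__global__ void matAdd(const float *a, const float *b, float *c, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {               // guard threads past the matrix edge
        size_t idx = (size_t)row * n + col;
        c[idx] = a[idx] + b[idx];
    }
}

// Fill A and B with the values given in the assignment.
void initMatrices(float *a, float *b, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            a[(size_t)i * n + j] = 2.0f * i + j + 1.0f;
            b[(size_t)i * n + j] = i + 4.0f * j + 2.0f;
        }
}

// Expected sum is 3*i + 5*j + 3. These are whole numbers, so an exact float
// comparison is safe as long as the values stay below 2^24.
bool allCorrect(const float *c, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (c[(size_t)i * n + j] != 3.0f * i + 5.0f * j + 3.0f)
                return false;
    return true;
}

int main(int argc, char **argv) {
    int n = (argc > 1) ? atoi(argv[1]) : 4096;   // hypothetical default size
    size_t bytes = (size_t)n * n * sizeof(float);

    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes);
    initMatrices(a, b, n);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);                          // assumed block shape
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matAdd<<<grid, block>>>(dA, dB, dC, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost);

    if (allCorrect(c, n)) printf("We did it!\n");

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(a); free(b); free(c);
    return 0;
}
```

Assuming the file is saved as matadd.cu (a hypothetical name), it could be built with nvcc matadd.cu -o matadd and run as ./matadd 8192.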
Also execute the matrix addition sequentially as a nested loop on the CPU, and time it with gettimeofday(). Compare this to the execution time of the kernel plus the cudaMemcpy calls, measured with CUDA Event timing; do not include the malloc or cudaMalloc times. Calculate the speedup. Do this for 10 values of N, ranging from large to very large.
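The timing comparison could be set up roughly as follows. This is a sketch under several assumptions: flat row-major storage, speedup taken as CPU time divided by GPU time, and a short placeholder list of three sizes that would need to be extended to the 10 values the assignment asks for. The names seqAdd and nowMs are illustrative, and error checking is omitted.

```cuda
#include <cstdio>
#include <cstdlib>
#include <sys/time.h>
#include <cuda_runtime.h>

__global__ void matAdd(const float *a, const float *b, float *c, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {
        size_t idx = (size_t)row * n + col;
        c[idx] = a[idx] + b[idx];
    }
}

// Sequential nested-loop addition used as the CPU baseline.
void seqAdd(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            c[(size_t)i * n + j] = a[(size_t)i * n + j] + b[(size_t)i * n + j];
}

// Wall-clock time in milliseconds from gettimeofday().
double nowMs() {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

int main() {
    const int sizes[] = {1024, 2048, 4096};      // placeholder values of N
    for (int s = 0; s < 3; ++s) {
        int n = sizes[s];
        size_t bytes = (size_t)n * n * sizeof(float);

        // Allocation and initialization stay outside both timed regions.
        float *a = (float *)malloc(bytes);
        float *b = (float *)malloc(bytes);
        float *c = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                a[(size_t)i * n + j] = 2.0f * i + j + 1.0f;
                b[(size_t)i * n + j] = i + 4.0f * j + 2.0f;
            }
        float *dA, *dB, *dC;
        cudaMalloc(&dA, bytes);
        cudaMalloc(&dB, bytes);
        cudaMalloc(&dC, bytes);

        double t0 = nowMs();
        seqAdd(a, b, c, n);
        double cpuMs = nowMs() - t0;

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        // Timed GPU region: both uploads, the kernel, and the download.
        cudaEventRecord(start, 0);
        cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);
        dim3 block(16, 16);
        dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        matAdd<<<grid, block>>>(dA, dB, dC, n);
        cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float gpuMs = 0.0f;
        cudaEventElapsedTime(&gpuMs, start, stop);   // result is in milliseconds

        printf("N=%d  cpu=%.3f ms  gpu=%.3f ms  speedup=%.2f\n",
               n, cpuMs, gpuMs, cpuMs / gpuMs);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        free(a); free(b); free(c);
    }
    return 0;
}
```

Because the cudaMemcpy calls are inside the timed region, the measured speedup for a memory-bound operation like addition may well come out below 1; that is a legitimate finding to discuss in the report rather than a sign of a bug.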
Submit a typescript showing: a listing (with "cat") of your source code, your compilation, and executions with output. Discuss your findings in your report.
Hi there, I have done tasks in CUDA and I can do this one too.
Relevant Skills and Experience
I am doing my final-year project using CUDA.
Proposed Milestones
$250 USD - milestone