Posts Tagged ‘parallel programming’

Modern C++ concurrency - Returning values from Threads - std::future

Introduction

In this lesson we will talk about a way of returning values from threads, more precisely we will talk about std::future which is a mechanism that C++ offers in order to perform asynchronous tasks and query for the result in the future.
A future represents an asynchronous task, i.e. an operation that runs in parallel to the current thread and which the latter can wait (if it needs to) until the former is ready.
You can use a future all the time you need a thread to wait for a one-off event to happen. The thread can check the status of the asynchronous operation by periodically polling the future while still performing other tasks, or it can just wait for the future to become ready.

Read On…

Modern C++ Concurrency - Synchronizing threads - Condition Variables

Introduction

In the previous lesson we have seen how data can be protected using mutex. We now know how to make threads do their work concurrently without messing around with shared resources and data. But sometimes we need to synchronize their actions in time, meaning that we might want a thread t1 to wait until a certain condition is true before allowing it to continue its execution.

This lesson discusses the tools that we can use to achieve such behavior efficiently using condition variables.

Read On…

Concurrency in Modern C++ - Cumulative sum of a vector using N threads

tutorial2

Modern C++ Concurrency - Cumulative sum of a vector - Part 2

In this tutorial we will continue the exercice we started out in part-1 and we will:

  1. split the work among a number of threads that will be specified by the user via command line
  2. perform some benchmarking to see how our code scales as the number of threads increases. We will compare the execution time of the version of the program running on one thread versus the execution time when running on an increasing number of threads.

Read On…

CUDA - Julia Set example code - Fractals

This article present a CUDA parallel code for the generation of the famous Julia Set. Informally a point of the complex plane belongs to the set if given a function f(z) the serie z_{n+1} = f(z_n) does not tend to infinity. Different function and different initial condition give raise eventually to fractals. One of the most famous serie is z_{n+1} = z_n^2 +c  (the one that generated the video below).
Some nice picture may be obtained with the following initial conditions:

  1. c = 1j # dentrite fractal
  2. c = -0.123 + 0.745j # douady's rabbit fractal
  3. c = -0.750 + 0j # san marco fractal
  4. c = -0.391 - 0.587j # siegel disk fractal
  5. c = -0.7 - 0.3j # NEAT cauliflower thingy
  6. c = -0.75 - 0.2j # galaxies
  7. c = -0.75 + 0.15j # groovy
  8. c = -0.7 + 0.35j # frost

Here a video, showing a sequence of picture generated using different initial conditions. A nice example of how math and computer science can produce art!

CUDA Julia set code

The code is such that it is very easy to change the function and the initial condition (just edit the device function functor).

#define TYPEFLOAT

#ifdef TYPEFLOAT
#define TYPE float
#define cTYPE cuFloatComplex
#define cMakecuComplex(re,i) make_cuFloatComplex(re,i)
#endif
#ifdef TYPEDOUBLE
#define TYPE double
#define cMakecuComplex(re,i) make_cuDoubleComplex(re,i)
#endif

__global__ void computeJulia(int* data,cTYPE c,float zoom){
	int i =  blockIdx.x * blockDim.x + threadIdx.x;
	int j =  blockIdx.y * blockDim.y + threadIdx.y;

	if(i<DIMX && j<DIMY){
		cTYPE p = convertToComplex(i,j,zoom);
		data[i*DIMY+j]=evolveComplexPoint(p,c);
	}
}


cTYPE c0;
__device__ cTYPE juliaFunctor(cTYPE p,cTYPE c){
	return cuCaddf(cuCmulf(p,p),c);
}

__device__ int evolveComplexPoint(cTYPE p,cTYPE c){
	int it =1;
	while(it <= MAXITERATIONS && cuCabsf(p) <=4){
		p=juliaFunctor(p,c);
		it++;
	}
	return it;
}


__device__ cTYPE convertToComplex(int x, int y,float zoom){

	TYPE jx = 1.5 * (x - DIMX / 2) / (0.5 * zoom * DIMX) + moveX;
	TYPE jy = (y - DIMY / 2) / (0.5 * zoom * DIMY) + moveY;
	return cMakecuComplex(jx,jy);

}


The code above are the GPU instructions that are devoted to the computation of a single set. The kernel computeJulia() is executed on DIMX*DIMY (the dimension of the image) threads that indipendently compute the evolveComplexPoint() function on the converted to complex corrensponding pixel indices, the viewport (each threads compute a single pixel that has integer coordinates). The evolveComplexPoint() take care of checking how fast che point diverges.

The code is available here on github. It generates a number of png tha can be then merged in a video (as the one I present here) using a command line tool as ffmpeg.

In order to compile and run it need the CUDA SDK,a CUDA capable device and libpng installed on your system.
To compile and/or generate the video simply hit:

#compilation
nvcc -I"/usr/local/cuda-6.5/samples/common/inc"  -O3 -gencode arch=compute_35,code=sm_35  -odir "src" -M -o "src/juliaSet.d" "../src/juliaSet.cu"

nvcc  -I"/usr/local/cuda-6.5/samples/common/inc" -O3 --compile --relocatable-device-code=false -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35  -x cu -o  "src/juliaSet.o" "../src/juliaSet.cu"

/usr/local/cuda-6.5/bin/nvcc --cudart static --relocatable-device-code=false -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35 -link -o  "JuliaSetDynParallelism"  ./src/juliaSet.o ./src/main.o   -lpng

#video generation
ffmpeg -f image2 -i "julia_%05d.png" -r 40 output.mov

Here another animation