Archive for the ‘Linux’ Category

Dynamic Message of the Day - motd - Fedora Linux

HOW-TO: Dynamic Message of the day

This article is  about setting up a dynamic message of the day (possibly informative and fun) as header of each newly opened shell.

The final result will be something like the following:


It mixes a fun message using fortune and cowsay which you can install using

sudo dnf install fortune-mod cowsay

utility and some informative info about the status of the system as:

  • System load
  • Ram and Swap available and used
  • Disk space
  • Ip address


The script file can be easily configured and extended to suits the your needs. Colors can also be easily customized.

Read On…

Distributed Hadoop installation - Fedora Linux

Distributed  Hadoop and HBase installation - Fedora Linux

In this post I will describe how to get started with the latest versions of hadoop and hbase describing all the step to obtain a working hadoop installation. The steps described here can be easily used to perform a working installation on a large cluster (even tough it can requires additional steps as shared filesystem for instance).


 sudo dnf install openssh openssh-askpass openssh-clients openssh-server 

Don't forget to start the ssh service using the following command:

 sudo service sshd start 

Add a dedicated Hadoop/HBase User (optional but reccomended)

Read On…

CUDA - Julia Set example code - Fractals

This article present a CUDA parallel code for the generation of the famous Julia Set. Informally a point of the complex plane belongs to the set if given a function f(z) the serie z_{n+1} = f(z_n) does not tend to infinity. Different function and different initial condition give raise eventually to fractals. One of the most famous serie is z_{n+1} = z_n^2 +c  (the one that generated the video below).
Some nice picture may be obtained with the following initial conditions:

  1. c = 1j # dentrite fractal
  2. c = -0.123 + 0.745j # douady's rabbit fractal
  3. c = -0.750 + 0j # san marco fractal
  4. c = -0.391 - 0.587j # siegel disk fractal
  5. c = -0.7 - 0.3j # NEAT cauliflower thingy
  6. c = -0.75 - 0.2j # galaxies
  7. c = -0.75 + 0.15j # groovy
  8. c = -0.7 + 0.35j # frost

Here a video, showing a sequence of picture generated using different initial conditions. A nice example of how math and computer science can produce art!

CUDA Julia set code

The code is such that it is very easy to change the function and the initial condition (just edit the device function functor).


#define TYPE float
#define cTYPE cuFloatComplex
#define cMakecuComplex(re,i) make_cuFloatComplex(re,i)
#define TYPE double
#define cMakecuComplex(re,i) make_cuDoubleComplex(re,i)

__global__ void computeJulia(int* data,cTYPE c,float zoom){
	int i =  blockIdx.x * blockDim.x + threadIdx.x;
	int j =  blockIdx.y * blockDim.y + threadIdx.y;

	if(i<DIMX && j<DIMY){
		cTYPE p = convertToComplex(i,j,zoom);

cTYPE c0;
__device__ cTYPE juliaFunctor(cTYPE p,cTYPE c){
	return cuCaddf(cuCmulf(p,p),c);

__device__ int evolveComplexPoint(cTYPE p,cTYPE c){
	int it =1;
	while(it <= MAXITERATIONS && cuCabsf(p) <=4){
	return it;

__device__ cTYPE convertToComplex(int x, int y,float zoom){

	TYPE jx = 1.5 * (x - DIMX / 2) / (0.5 * zoom * DIMX) + moveX;
	TYPE jy = (y - DIMY / 2) / (0.5 * zoom * DIMY) + moveY;
	return cMakecuComplex(jx,jy);


The code above are the GPU instructions that are devoted to the computation of a single set. The kernel computeJulia() is executed on DIMX*DIMY (the dimension of the image) threads that indipendently compute the evolveComplexPoint() function on the converted to complex corrensponding pixel indices, the viewport (each threads compute a single pixel that has integer coordinates). The evolveComplexPoint() take care of checking how fast che point diverges.

The code is available here on github. It generates a number of png tha can be then merged in a video (as the one I present here) using a command line tool as ffmpeg.

In order to compile and run it need the CUDA SDK,a CUDA capable device and libpng installed on your system.
To compile and/or generate the video simply hit:

nvcc -I"/usr/local/cuda-6.5/samples/common/inc"  -O3 -gencode arch=compute_35,code=sm_35  -odir "src" -M -o "src/juliaSet.d" "../src/"

nvcc  -I"/usr/local/cuda-6.5/samples/common/inc" -O3 --compile --relocatable-device-code=false -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35  -x cu -o  "src/juliaSet.o" "../src/"

/usr/local/cuda-6.5/bin/nvcc --cudart static --relocatable-device-code=false -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35 -link -o  "JuliaSetDynParallelism"  ./src/juliaSet.o ./src/main.o   -lpng

#video generation
ffmpeg -f image2 -i "julia_%05d.png" -r 40

Here another animation


Kernel Linux List - An Efficient Generic Linked List

Linux Kernel Linked List

Linux Kernel especially in the past years was plenty of different and replicated implementation of generic data structures such as linked lists, stacks and queues.  Kernel's developers decided to provide a unique set of API for using these kind of software machineries. We will briefly look in this article  at the list.h file (in the linux source tree under /usr/src/linux-headers-`uname -r`/include/) that provides macros and function for utilizing a doubly linked circular list. It is a very smart written piece of code that is worth to take a look at as it is  an opportunity  to improve  C programming skills and at the same time to see an "unconventional" implementation for a common data structure.

From canonical list to Linux kernel linked list

The canonical implementation of a C's linked list consists of a structure and two (for a doubly linked) recursive pointer fields to the structure itself. The links to the previous and subsequent element of the list.

struct point2DNode{
 int x,y;
 struct point2DNode* prev, next;

Read On…

Apple Xserve xeon Linux/Debian armed

Finally at my Departement we succesfully converted an  unsed  Apple xserve  cluster to a full functional Linux/Debian machine.

The cluster is make up of eleven node connected by Myrinet 10G low latency network and each node is a composed by a dual 4x cores Intel Xeon 2.8 Ghz Harpertown with 10Gb of DDR2 for a total of 88 cores. It is a quite old and small machine and hence was unused. We thought that  it could have been useful if well re-configured as an ad-hoc machine for small sized scientific computation session, as  developing and testing platform of parallel application or for training of undergraduate students that can experience directly with a real parallel machine. So we decided to cast a new light on it by configuring it as a Debian cluster (according to google just few people did it). 10949741_766948060061020_241789111_n

Read On…