Archive for the ‘Programming’ Category

C/C++ - Byte to Number Conversion

Byte Number Conversion

For a project I'm working on these days I was forced to convert raw byte to several numerical values ( to and from binary format).

Bytes are represented in C/C++ as  unsigned char, and std::string are often used as byte buffer as well as old well known raw arrays. I wrote a simple struct that relieve the pain in performing such conversions. It exposes two functions: fromBytes and toBytes.fromBytes takes an additional boolean parameter that takes care of different endianness.

Author: Davide Spataro 2016

Converts byte buffer to type T
typedef unsigned char byte;

template <class T>
struct converter{

 static const size_t size = sizeof(T);

    union conv{
        T value;
        byte bytes[ sizeof( T ) ];
    } c ;

    T fromBytes( const  byte *bytes, bool endianness = false){

            #pragma unroll
            for(int i=0;i<size;i++)
                c.bytes[size-1-i] = bytes[i];
          #pragma unroll
            for(int i=0;i<size;i++)
                c.bytes[i] = bytes[i];

        return c.value;

     byte* toBytes(const T& value,bool endianness = false){
        c.value =value;
        return c.bytes;

    void reverse(){
        #pragma unroll
        for(int i=0;i<size/2;i++){
            byte tmp = c.bytes[i];
            c.bytes[size-1-i] = tmp;



template<class T>
void printHex(const T& key, size_t size){
  std::cout.setf(std::ios::hex, std::ios::basefield);
    for (int i = 0; i < size; i++){ if (i > 0) printf(":");
            printf("%02X", (unsigned char)key[i]);


Usage is very simple: let's say for instance you have the following 8 bytes into an unsigned long long


converter<int64_t> c;
//binary value is 00:00:00:00:00:00:00:5E (8 bytes)
std::string binary = readBinary(); //read binary unsigned long long from somewhere
int64_t res =  c.fromBytes(reinterpret_cast<const unsigned char*>(binary.c_str()),true);
std::cout <<res<< " \n";

this will output clearly output 94.

It works with almost all numerical types (float, all int flavors and floating points)

Programming Question - Integer Parity

Programming Question: Given an integer, compute its parity(easy)
The parity of an integer is true iff number of set bits (1) is odd, false otherwise. Example: has 5 set bits and hence its parity is true.



Programming Question - 2D array spiral Fashion

Programming Question: given a 2D array, print it in spiral fashion (easy)

Given a 2D array print it in spiral fashion as shown in the picture.


Haskell - Sudoku Solver - List Monad

Haskell Sudoku Solver - List Monad

In this article we will write a Sudoku solver in Haskell using the expressive and powerful List Monad to enumerate the solutions trasparently. I assume that the reader has a general understanding of monads (if not read

Su Doku means literally  number place is the name of a popular puzzle game that a special case of the old Latin Square (invented by Leonard Euler). Latin Square force each symbol to appear once in each row and column. In addition to these contraint Sudoku force each symbol to apper one in each sub-matrix. In the general case Sudoku is NP-Complete (easy reduction to graph coloring), but certain schemeas can be solved by deduction and logic only (in this case the puzzle is said well constructed).

At high level a Sudoku can be viewed as a list of cell

data Board= Board [Square]

and a Square must certanly carries information about its position and the digit it contains, or even better it could contain (according to the cells in same row, column and submatrix)

data Square= Square ColDigit RowDigit BoxDigit (Either [Digit] Digit)
    deriving (Show)

type RowDigit= Digit
type ColDigit= Digit
type BoxDigit= Digit
type Digit= Char

While solving the puzzle each square will contains either a list of possible digits that can be placed in the square OR (this is the Either logic) a fixed digit (i.e., the square was given by a clue or has been deduced).

The following Squares have respectively position and , both lie in submatrix and can contain the digit 1 OR 5 OR 7 OR 4 while is fixed to 6.

let s1=Square '4' '1' '4' (Left ['1','5','7','4'])
let s2=Square '4' '2' '4' (Right '6')

The solver works s.t. it return the list of possible solution for a schema (not well constructed Sudoku may have multiple solutions). In particular if no unresolved square are left (the one with the Left in the Ethier part) it means that the board is solved and does not need any further processing, otherwise we take the square with the minimum number of possible values and try solve the board trying, one at the time, all the values for it. This sort of backtracking is perfomed transparently by the List monad that is used to represent nondeterminism (i.e. multiple possibilities).

solveMany :: Board -> [Board]
solveMany brd =
	case getUnsolvedSquare brd of
		[] 	-> return brd --all squares correctly set up
		sqs -> do
		let selSquare@(Square c r b (Left ds)) : _ = sortBy (byLeftLenght) sqs
		sq <- [ Square c r b (Right d) | d <- ds ] -- Try all possible moves
		solveMany (setSquare sq brd)

Recall that do notation is just syntactic sugar for bind (>>=) operator and that for for list it is defined as following

(>>=) :: [a] -> (a->[b]) >[b]
xs >>= f = concat (map f xs)

It then applies f=solveMany (that takes a board) and produces [Board] for all the digits in the Left of selSquare.
Here the same function wrote with explicit bind operator application that makes much clear the monadiac operation that do the dirty job for us.

solveMany :: Board -> [Board]
solveMany brd =
	case getUnsolvedSquare brd of
		[] 	-> return brd --all squares correctly set up
		sqs ->
			let Square c r b (Left ds) : _ = sortBy (byLeftLenght) sqs
			in [ Square c r b (Right d) | d <- ds ] >>=(\b->solveMany (setSquare b brd))

getUnsolvedSquare :: Board -> [Square]
getUnsolvedSquare (Board sqs) = filter (isSolved) sqs
		isSolved (Square _ _ _ (Left _)) = True
		isSolved _						  = False

The setSquare function is the one devoted to the constraint propagation and is defined as follow:

setSquare :: Square-> Board ->Board
setSquare sq@(Square scol srow sbox (Right d))(Board sqs) = Board (map set sqs)
		set osq@(Square ocol orow obox ods)
			| scol==ocol && srow==orow  			 = sq
			| scol==ocol || srow==orow || sbox==obox = Square ocol orow obox (checkEither ods)
			| otherwise 							 = osq

		checkEither (Left  lds ) 			= Left (lds \\ [d])
		checkEither r@(Right d') | d==d'	= error "Already set this square"
		checkEither dd = dd
setSquare _ _ = error "Bad call to setSquare"

It places Square (with a Right Digit) on a Board and return the new Board taking care of removing the just inserted digit from the possible values for Squares of the same row, column and submatrix. This is how the constraint are propagated, cutting off the search space.

The complete source code is available here and on github.

Stream Compaction on GPU - Efficient implementation - CUDA

Stream Compaction - Introduction1)Markus Billeter et al. Efficient Stream Compaction on Wide SIMD Many-Core Architectures2)InK-Compact-: In kernel Stream Compaction and Its Application to Multi-kernel Data Visualization on GPGPU- D.M. Hughes

An efficient implementation of stream compaction on GPU is presented with code example. Full source code on github.

Stream compaction/reduction is commonly referred as the operation of removing unwanted elements in a collection. More formally, imagine we have a list of element of N elements and a predicate that bisects in wanted and unwanted elements (ones that satisfy the predicates and ones that don't). The stream compaction of under is an Read On…

References   [ + ]

1. Markus Billeter et al. Efficient Stream Compaction on Wide SIMD Many-Core Architectures
2. InK-Compact-: In kernel Stream Compaction and Its Application to Multi-kernel Data Visualization on GPGPU- D.M. Hughes

Haskell Seminar @ UNICAL

Haskell Seminar at University of Calabria

During the last year I got much interest in functional programming and in particular in Haskell.
Very few people write and study Haskell at University of Calabria (one of them is the manteiner of gnome for NixOs) and so I decided to do a introductive lesson (a big thanks go to the guys of the ciotoflow). It was a great experience, and people were enthusiast and curious.
All the material i produced is available for download and use, and hopefully will be gradually updated and improved. All the files are available on github.

Introductive slides here



CUDA - Julia Set example code - Fractals

This article present a CUDA parallel code for the generation of the famous Julia Set. Informally a point of the complex plane belongs to the set if given a function f(z) the serie does not tend to infinity. Different function and different initial condition give raise eventually to fractals. One of the most famous serie is   (the one that generated the video below).
Some nice picture may be obtained with the following initial conditions:

  1. # dentrite fractal
  2. # douady's rabbit fractal
  3. # san marco fractal
  4. # siegel disk fractal
  5. # NEAT cauliflower thingy
  6. # galaxies
  7. # groovy
  8. # frost

Here a video, showing a sequence of picture generated using different initial conditions. A nice example of how math and computer science can produce art!

CUDA Julia set code

The code is such that it is very easy to change the function and the initial condition (just edit the device function functor).


#define TYPE float
#define cTYPE cuFloatComplex
#define cMakecuComplex(re,i) make_cuFloatComplex(re,i)
#define TYPE double
#define cMakecuComplex(re,i) make_cuDoubleComplex(re,i)

__global__ void computeJulia(int* data,cTYPE c,float zoom){
	int i =  blockIdx.x * blockDim.x + threadIdx.x;
	int j =  blockIdx.y * blockDim.y + threadIdx.y;

	if(i<DIMX && j<DIMY){
		cTYPE p = convertToComplex(i,j,zoom);

cTYPE c0;
__device__ cTYPE juliaFunctor(cTYPE p,cTYPE c){
	return cuCaddf(cuCmulf(p,p),c);

__device__ int evolveComplexPoint(cTYPE p,cTYPE c){
	int it =1;
	while(it <= MAXITERATIONS && cuCabsf(p) <=4){
	return it;

__device__ cTYPE convertToComplex(int x, int y,float zoom){

	TYPE jx = 1.5 * (x - DIMX / 2) / (0.5 * zoom * DIMX) + moveX;
	TYPE jy = (y - DIMY / 2) / (0.5 * zoom * DIMY) + moveY;
	return cMakecuComplex(jx,jy);


The code above are the GPU instructions that are devoted to the computation of a single set. The kernel computeJulia() is executed on DIMX*DIMY (the dimension of the image) threads that indipendently compute the evolveComplexPoint() function on the converted to complex corrensponding pixel indices, the viewport (each threads compute a single pixel that has integer coordinates). The evolveComplexPoint() take care of checking how fast che point diverges.

The code is available here on github. It generates a number of png tha can be then merged in a video (as the one I present here) using a command line tool as ffmpeg.

In order to compile and run it need the CUDA SDK,a CUDA capable device and libpng installed on your system.
To compile and/or generate the video simply hit:

nvcc -I"/usr/local/cuda-6.5/samples/common/inc"  -O3 -gencode arch=compute_35,code=sm_35  -odir "src" -M -o "src/juliaSet.d" "../src/"

nvcc  -I"/usr/local/cuda-6.5/samples/common/inc" -O3 --compile --relocatable-device-code=false -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35  -x cu -o  "src/juliaSet.o" "../src/"

/usr/local/cuda-6.5/bin/nvcc --cudart static --relocatable-device-code=false -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35 -link -o  "JuliaSetDynParallelism"  ./src/juliaSet.o ./src/main.o   -lpng

#video generation
ffmpeg -f image2 -i "julia_%05d.png" -r 40

Here another animation


Fold operator - a brief detour

This article aims to be a brief practical introduction to new haskell programmer about the fold high order function (which roughly speaking is a function that takes as input and/or produces as output a function) and its usage. It will also outline a bit of the theory that lies behind this operator (the term operator comes from the field of recursion theory (Kleene, 1952), even if, striclty speaking fold is a function in haskell).

Haskell Fold - Introduction and formal definition

Many problem in function programming are solved by means of tail recursive functions (one that do not perform any other operations that need some values from the recursive call). For instance let's say we want to get the product of integers stored in a list.
What we would do is to write down a tail recursive function (more or less) like this

tailmultiply :: [Int] -> Int -> Int
tailmultiply [] acc     = acc
tailmultiply (x:xs) acc = tailmultiply xs (acc * x)

Note, to stress the tail recursion, a definition like the one that follows would not be tail recursive.

Read On…

Kernel Linux List - An Efficient Generic Linked List

Linux Kernel Linked List

Linux Kernel especially in the past years was plenty of different and replicated implementation of generic data structures such as linked lists, stacks and queues.  Kernel's developers decided to provide a unique set of API for using these kind of software machineries. We will briefly look in this article  at the list.h file (in the linux source tree under /usr/src/linux-headers-`uname -r`/include/) that provides macros and function for utilizing a doubly linked circular list. It is a very smart written piece of code that is worth to take a look at as it is  an opportunity  to improve  C programming skills and at the same time to see an "unconventional" implementation for a common data structure.

From canonical list to Linux kernel linked list

The canonical implementation of a C's linked list consists of a structure and two (for a doubly linked) recursive pointer fields to the structure itself. The links to the previous and subsequent element of the list.

struct point2DNode{
 int x,y;
 struct point2DNode* prev, next;

Read On…

Apple Xserve xeon Linux/Debian armed

Finally at my Departement we succesfully converted an  unsed  Apple xserve  cluster to a full functional Linux/Debian machine.

The cluster is make up of eleven node connected by Myrinet 10G low latency network and each node is a composed by a dual 4x cores Intel Xeon 2.8 Ghz Harpertown with 10Gb of DDR2 for a total of 88 cores. It is a quite old and small machine and hence was unused. We thought that  it could have been useful if well re-configured as an ad-hoc machine for small sized scientific computation session, as  developing and testing platform of parallel application or for training of undergraduate students that can experience directly with a real parallel machine. So we decided to cast a new light on it by configuring it as a Debian cluster (according to google just few people did it). 10949741_766948060061020_241789111_n

Read On…