# Bird Flocking Simulation

Bird flocking is an extremely interesting natural phenomenon that have been widely studied as witnessed by the number of papers in literature). I will present here a work on aggregate motion of large number of boids in a virtual environment with the presence of predators using CUDA as computational framework.

Beautiful Flocking motion

Collective motion or flocking appears at different fields and scales in nature and several mathematical tools have been developed for analyzing such motions

1. organism are threaten as particles in Brownian motion combined with attraction/repulsion forces
2. Differential equation models
3. Agent based models.

The model presented here is based on a work of Reynolds (1987) which is based on three key behavioral rules:

• Cohesion: to attempt to stay close to nearby flock-mates;
• Collision avoidance: to evade objects that are too close;
• Velocity/Heading Matching: to head in the same direction of nearby flock-mates

and extends it adding predator avoidance and multiple species and bird groups interaction

# Bird Flocking Model

The environment is parameterized using the following a set of parameters that describe the size of the virtual environment and the duration of a timestep. What is really interesting is the set of bird's parameter that describe how a bird behave and react to event in its surrounding. Some notable parameters  include the Fyeld Of View (FOV), peak velocity , thrust and others (see figure).

Bird Parameters

The environment is partially observable and the portion of space that is visible to each bird is defined by its FOV. FOV is defined as follows:

Let the position vector of the object in the 's frame of reference, then is 's neighbor if and only if the followings hold:

where is the maximum horizontal range of view, is the maximum vertical range of view and

## Cohesion

In formal terms , the bird 's centroid at time , is given by:

Cohesion

which is basically a weighted average of neighbors position.

## Separation

A bird try to keep certain distance between itself and
its neighbors. Bird 's separation vector at time is given
by:

where determines how strong is the repulsion against to the neighbor .

Alignment

Bird's alignment is computed as follows

It is a weighted average of the neighbors's heading direction.

Other species and predator avoidance
Other species avoidance is a behavior pretty much similar to the separation. The only difference is that only birds that belong to other species contribute to the result.

Predator avoidance is also a "flee or separation" behavior, but what happens here is that we do not take into account the current predator position, but instead, birds try to "separate" from predators's next position (the prediction is made from current position and velocity and acceleration of the predator).

The predator avoidance vector is defined as follows:

where:

•   is the 's set of predators
•
is the predator avoidance coefficient, where is the minimum distance bird avoid predator.

The model has been implemented in CUDA to speedup the simulation. The following is a short video which I used during my presentation at PDP16 conference.  The model and the implementation described in this article are much more greatly described in the following slides (download here).

Last summer I was invited at CodeWeek 2015 organized by Hacklab CS at UNICAL to talk about Haskell and Functional Programming.

Here the video of my (short) talk. In Italian

Slides Here

## Byte Number Conversion

For a project I'm working on these days I was forced to convert raw byte to several numerical values ( to and from binary format).

Bytes are represented in C/C++ as  unsigned char, and std::string are often used as byte buffer as well as old well known raw arrays. I wrote a simple struct that relieve the pain in performing such conversions. It exposes two functions: fromBytes and toBytes.fromBytes takes an additional boolean parameter that takes care of different endianness.

/*
Author: Davide Spataro 2016

Converts byte buffer to type T
*/
typedef unsigned char byte;

template <class T>
struct converter{

static const size_t size = sizeof(T);

union conv{
T value;
byte bytes[ sizeof( T ) ];
} c ;

T fromBytes( const  byte *bytes, bool endianness = false){

if(endianness)
#pragma unroll
for(int i=0;i<size;i++)
c.bytes[size-1-i] = bytes[i];
else
#pragma unroll
for(int i=0;i<size;i++)
c.bytes[i] = bytes[i];

return c.value;
}

byte* toBytes(const T& value,bool endianness = false){
c.value =value;
if(endianness)
reverse();
return c.bytes;
}

void reverse(){
#pragma unroll
for(int i=0;i<size/2;i++){
byte tmp = c.bytes[i];
c.bytes[i]=c.bytes[size-1-i];
c.bytes[size-1-i] = tmp;
}

}

};

template<class T>
void printHex(const T& key, size_t size){
std::cout.setf(std::ios::hex, std::ios::basefield);
for (int i = 0; i < size; i++){ if (i > 0) printf(":");
printf("%02X", (unsigned char)key[i]);

printf("\n");
}



Usage is very simple: let's say for instance you have the following 8 bytes into an unsigned long long

00:00:00:00:00:00:00:5E


converter<int64_t> c;
//binary value is 00:00:00:00:00:00:00:5E (8 bytes)
int64_t res =  c.fromBytes(reinterpret_cast<const unsigned char*>(binary.c_str()),true);
std::cout <<res<< " \n";



this will output clearly output 94.

It works with almost all numerical types (float, all int flavors and floating points)

A great experience, that I hope will became an joyful habit, took place last week when a colleague of mine at the Department of Engineering at University of Edinburgh (Kino) told me about a meeting among the researchers/musicians of the university.

I was invited to play some music, and I obviously accepted! It was great to meet new smart people and musician from all over the world and share with them delicious food and wine. There were several instrument involved in the performances such as viola, violin, cello and voice.

Here two (very) short videos from my performance:

The following was played to fulfill a special request from Alice (a P.hD collegue) from Milan.

# Largest Prime Number

Recently a team at the University of Central Missouri, headed by Curtis Cooper has announced, via press release from the Mersenne organization

to have discovered a new largest prime. The number  have more than 22M digits,  to be precise. It's so large that writing 4 digits per centimeter one would be able to cover the entire distance between Edinburgh and Glasgow! Here the full http://www.filedropper.com/largestprimenumber number.

The following is an Haskell micro-script for computing and writing the number to a file (using Haskell because it's lazy input/output allows to write big files to disk without any concern about memory).


import Data.Time
import System.Environment (getArgs)
main :: IO ()
main = do
[path]<-getArgs
let n = 2^74207281 -1
startTime <- getCurrentTime
writeFile path (show n)
stopTime <-getCurrentTime
putStrLn ("ElapsedTime: " ++ show (diffUTCTime stopTime startTime))



Compile using using : ghc --make

and execute passing the output filename as the only parameter.

The full number is ~22MB, not surprisingly as one char occupies one byte in memory.

Here the first digits of the number:

###### 300376418084606182052986098359166050056875863030301484843941693345547723219067 994296893655300772688320448214882399426727835290700904836432218015348199652241 372287684310213386284573666361506667532122772859359864057780256875647795865832 142051171109635844262936572650387240710147982631320437143129112198392188761288 503958771920355017186438665809954286344460536606761717933683749624756782578361 731044883934155387085250868537297205931251606849781532670414744928294883449429 443999003776831072496868250622866039978884541062234219154504645252386846303469 724807334155852889497374778705327594144808269546049745682886662634337786061551 354498294392788969717277814170247857840825173814169979529718831378258156460855 598404801012277963664118162318740241984446339571147500893873350471752282309276 960908368218257475857949333688648781647084935600389442816615101269892941620923 700583920438303155576675128697727353015966198570119971508975499769430113632520 704976596018662818527213338297501690033894692212329648575780270141964029454297 379598752963111110166054910922708870780155972725875622704085120422206985800208 953699779570148521239387340972873010415557408840313517334104245951181312377569 862268931591236073913864912702341514442871893227806578339072908082737776944438 541558625494782239705021522924186805591226430219483495972094802701924328600534 393128646703341368026587734561209964921713257134223641483136379023890310042525 635413014854847842999675719601547926712259803033804208054192341842074795499467 736417866657681142429045674308204219551025449960330608429729874249539051023991 353492744406378092116867003111452756638147874006136238963152211561563090034814 454337404268972669143336589608026262105540337915734652847488347593274189154190 268344381703937005859988258738844104703265786972872467031538046586054465054455 ....

Programming Question: Given an integer, compute its parity(easy)
The parity of an integer is true iff number of set bits (1) is odd, false otherwise. Example: has 5 set bits and hence its parity is true.

## Solutions:

Programming Question: Compute the Greatest Common Divisor of two integers (easy)

This question is divided in two parts: you will first asked to write a function that computes the  GCD withouth any space/time constraints. The second part will be more challenching, asking for not using multiplication or division or addition.

## Part1: write a function that takes two integers as input and returns their gcd using the famous Euclidean algorithm.

Part 2: compute the Greatest Common Divisor of two integers without using multiplication addition or division operators (easy)

# Programming Question: given a 2D array, print it in spiral fashion (easy)

Given a 2D array print it in spiral fashion as shown in the picture.

# Fibonacci serie - Knuth multiplication - Java

Leonardo Fibonacci,the Italian mathematician (from Pisa, 1170, one of the most important mathematician of all time) invented the famous serie defined recursively as follows:

We all know that as programmers once in life (at least) we will write some piece of code to compute the element of the serie. Recursive implementation is straightforward but inefficient, due to the exponential number of recursive calls, that make "hard" to compute numbers bigger that 40. It is very easy to come up with some iterative or memoised implementation, that lowers the complexity to and allows to efficently compute "any" number of the serie.
In this post we will derive an implementation (JAVA) that have complexity 1)using results from Knuth, Fibonacci multiplication,1988. Read On…

References   [ + ]

 1 ↑ using results from Knuth, Fibonacci multiplication,1988

Su Doku means literally  number place is the name of a popular puzzle game that a special case of the old Latin Square (invented by Leonard Euler). Latin Square force each symbol to appear once in each row and column. In addition to these contraint Sudoku force each symbol to apper one in each sub-matrix. In the general case Sudoku is NP-Complete (easy reduction to graph coloring), but certain schemeas can be solved by deduction and logic only (in this case the puzzle is said well constructed).

At high level a Sudoku can be viewed as a list of cell

data Board= Board [Square]
deriving(Show)


and a Square must certanly carries information about its position and the digit it contains, or even better it could contain (according to the cells in same row, column and submatrix)

data Square= Square ColDigit RowDigit BoxDigit (Either [Digit] Digit)
deriving (Show)

type RowDigit= Digit
type ColDigit= Digit
type BoxDigit= Digit
type Digit= Char


While solving the puzzle each square will contains either a list of possible digits that can be placed in the square OR (this is the Either logic) a fixed digit (i.e., the square was given by a clue or has been deduced).

The following Squares have respectively position and , both lie in submatrix and can contain the digit 1 OR 5 OR 7 OR 4 while is fixed to 6.

let s1=Square '4' '1' '4' (Left ['1','5','7','4'])
let s2=Square '4' '2' '4' (Right '6')


The solver works s.t. it return the list of possible solution for a schema (not well constructed Sudoku may have multiple solutions). In particular if no unresolved square are left (the one with the Left in the Ethier part) it means that the board is solved and does not need any further processing, otherwise we take the square with the minimum number of possible values and try solve the board trying, one at the time, all the values for it. This sort of backtracking is perfomed transparently by the List monad that is used to represent nondeterminism (i.e. multiple possibilities).

solveMany :: Board -> [Board]
solveMany brd =
case getUnsolvedSquare brd of
[] 	-> return brd --all squares correctly set up
sqs -> do
let selSquare@(Square c r b (Left ds)) : _ = sortBy (byLeftLenght) sqs
sq <- [ Square c r b (Right d) | d <- ds ] -- Try all possible moves
solveMany (setSquare sq brd)


Recall that do notation is just syntactic sugar for bind (>>=) operator and that for for list it is defined as following

(>>=) :: [a] -> (a->[b]) >[b]
xs >>= f = concat (map f xs)


It then applies f=solveMany (that takes a board) and produces [Board] for all the digits in the Left of selSquare.
Here the same function wrote with explicit bind operator application that makes much clear the monadiac operation that do the dirty job for us.

solveMany :: Board -> [Board]
solveMany brd =
case getUnsolvedSquare brd of
[] 	-> return brd --all squares correctly set up
sqs ->
let Square c r b (Left ds) : _ = sortBy (byLeftLenght) sqs
in [ Square c r b (Right d) | d <- ds ] >>=(\b->solveMany (setSquare b brd))

getUnsolvedSquare :: Board -> [Square]
getUnsolvedSquare (Board sqs) = filter (isSolved) sqs
where
isSolved (Square _ _ _ (Left _)) = True
isSolved _						  = False



The setSquare function is the one devoted to the constraint propagation and is defined as follow:

setSquare :: Square-> Board ->Board
setSquare sq@(Square scol srow sbox (Right d))(Board sqs) = Board (map set sqs)
where
set osq@(Square ocol orow obox ods)
| scol==ocol && srow==orow  			 = sq
| scol==ocol || srow==orow || sbox==obox = Square ocol orow obox (checkEither ods)
| otherwise 							 = osq

checkEither (Left  lds ) 			= Left (lds \\ [d])
checkEither r@(Right d') | d==d'	= error "Already set this square"
checkEither dd = dd
setSquare _ _ = error "Bad call to setSquare"



It places Square (with a Right Digit) on a Board and return the new Board taking care of removing the just inserted digit from the possible values for Squares of the same row, column and submatrix. This is how the constraint are propagated, cutting off the search space.

The complete source code is available here and on github.

## Stream Compaction - Introduction1)Markus Billeter et al. Efficient Stream Compaction on Wide SIMD Many-Core Architectures jQuery("#footnote_plugin_tooltip_4520_1").tooltip({ tip: "#footnote_plugin_tooltip_text_4520_1", tipClass: "footnote_tooltip", effect: "fade", fadeOutSpeed: 100, predelay: 400, position: "top right", relative: true, offset: [10, 10] });2)InK-Compact-: In kernel Stream Compaction and Its Application to Multi-kernel Data Visualization on GPGPU- D.M. Hughes jQuery("#footnote_plugin_tooltip_4520_2").tooltip({ tip: "#footnote_plugin_tooltip_text_4520_2", tipClass: "footnote_tooltip", effect: "fade", fadeOutSpeed: 100, predelay: 400, position: "top right", relative: true, offset: [10, 10] });

An efficient implementation of stream compaction on GPU is presented with code example. Full source code on github.

Stream compaction/reduction is commonly referred as the operation of removing unwanted elements in a collection. More formally, imagine we have a list of element of N elements and a predicate that bisects in wanted and unwanted elements (ones that satisfy the predicates and ones that don't). The stream compaction of under is an Read On…

References   [ + ]

 1 ↑ Markus Billeter et al. Efficient Stream Compaction on Wide SIMD Many-Core Architectures 2 ↑ InK-Compact-: In kernel Stream Compaction and Its Application to Multi-kernel Data Visualization on GPGPU- D.M. Hughes

# Haskell Seminar at University of Calabria

During the last year I got much interest in functional programming and in particular in Haskell.
Very few people write and study Haskell at University of Calabria (one of them is the manteiner of gnome for NixOs) and so I decided to do a introductive lesson (a big thanks go to the guys of the ciotoflow). It was a great experience, and people were enthusiast and curious.
All the material i produced is available for download and use, and hopefully will be gradually updated and improved. All the files are available on github.

Introductive slides here

In this article we want to briefly discuss a possible solution to the project euler problem 24 which asks:

A permutation is an ordered arrangement of objects. For example, 3124 is one possible permutation of the digits 1, 2, 3 and 4. If all of the permutations are listed numerically or alphabetically, we call it lexicographic order. The lexicographic permutations of 0, 1 and 2 are:

012 021 102 120 201 210

What is the millionth lexicographic permutation of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9?

One possible approach could obviously be to brute force all the lexicographical permutations and then take the millionth element out of the previously sorted collection of permutation (interpreted as number). The drawback of this solution is that unfortunately the number of permutations is more than exponential, is .

We want to show here a more mathematical/combinatorial approach. First of all start considering that how the permutations looks like.
First permutations are (thanks to Stefano Germano for typo correction) .
A deeper look at the permutations reveals that are all the sequences starting with 0 and concatenated with a permutation of the ramaining 9 elements . How many of them? .
It means that the permutation we are looking for  has to be greater than the th permutation: .
From this we can deduce that we can also skip all the sequences starting by 1 because they produces (again) permutations and

. Moreover our millioth permutation is in the sequences starting with 2 because (hence is not in the permutations starting with 3). Read On…

This article present a CUDA parallel code for the generation of the famous Julia Set. Informally a point of the complex plane belongs to the set if given a function f(z) the serie does not tend to infinity. Different function and different initial condition give raise eventually to fractals. One of the most famous serie is   (the one that generated the video below).
Some nice picture may be obtained with the following initial conditions:

1. # dentrite fractal
3. # san marco fractal
4. # siegel disk fractal
5. # NEAT cauliflower thingy
6. # galaxies
7. # groovy
8. # frost

Here a video, showing a sequence of picture generated using different initial conditions. A nice example of how math and computer science can produce art!

# CUDA Julia set code

The code is such that it is very easy to change the function and the initial condition (just edit the device function functor).

#define TYPEFLOAT

#ifdef TYPEFLOAT
#define TYPE float
#define cTYPE cuFloatComplex
#define cMakecuComplex(re,i) make_cuFloatComplex(re,i)
#endif
#ifdef TYPEDOUBLE
#define TYPE double
#define cMakecuComplex(re,i) make_cuDoubleComplex(re,i)
#endif

__global__ void computeJulia(int* data,cTYPE c,float zoom){
int i =  blockIdx.x * blockDim.x + threadIdx.x;
int j =  blockIdx.y * blockDim.y + threadIdx.y;

if(i<DIMX && j<DIMY){
cTYPE p = convertToComplex(i,j,zoom);
data[i*DIMY+j]=evolveComplexPoint(p,c);
}
}

cTYPE c0;
__device__ cTYPE juliaFunctor(cTYPE p,cTYPE c){
}

__device__ int evolveComplexPoint(cTYPE p,cTYPE c){
int it =1;
while(it <= MAXITERATIONS && cuCabsf(p) <=4){
p=juliaFunctor(p,c);
it++;
}
return it;
}

__device__ cTYPE convertToComplex(int x, int y,float zoom){

TYPE jx = 1.5 * (x - DIMX / 2) / (0.5 * zoom * DIMX) + moveX;
TYPE jy = (y - DIMY / 2) / (0.5 * zoom * DIMY) + moveY;
return cMakecuComplex(jx,jy);

}



The code above are the GPU instructions that are devoted to the computation of a single set. The kernel computeJulia() is executed on DIMX*DIMY (the dimension of the image) threads that indipendently compute the evolveComplexPoint() function on the converted to complex corrensponding pixel indices, the viewport (each threads compute a single pixel that has integer coordinates). The evolveComplexPoint() take care of checking how fast che point diverges.

The code is available here on github. It generates a number of png tha can be then merged in a video (as the one I present here) using a command line tool as ffmpeg.

In order to compile and run it need the CUDA SDK,a CUDA capable device and libpng installed on your system.
To compile and/or generate the video simply hit:

#compilation
nvcc -I"/usr/local/cuda-6.5/samples/common/inc"  -O3 -gencode arch=compute_35,code=sm_35  -odir "src" -M -o "src/juliaSet.d" "../src/juliaSet.cu"

nvcc  -I"/usr/local/cuda-6.5/samples/common/inc" -O3 --compile --relocatable-device-code=false -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35  -x cu -o  "src/juliaSet.o" "../src/juliaSet.cu"

/usr/local/cuda-6.5/bin/nvcc --cudart static --relocatable-device-code=false -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35 -link -o  "JuliaSetDynParallelism"  ./src/juliaSet.o ./src/main.o   -lpng

#video generation
ffmpeg -f image2 -i "julia_%05d.png" -r 40 output.mov



Here another animation

This article aims to be a brief practical introduction to new haskell programmer about the fold high order function (which roughly speaking is a function that takes as input and/or produces as output a function) and its usage. It will also outline a bit of the theory that lies behind this operator (the term operator comes from the field of recursion theory (Kleene, 1952), even if, striclty speaking fold is a function in haskell).

## Haskell Fold - Introduction and formal definition

Many problem in function programming are solved by means of tail recursive functions (one that do not perform any other operations that need some values from the recursive call). For instance let's say we want to get the product of integers stored in a list.
What we would do is to write down a tail recursive function (more or less) like this

tailmultiply :: [Int] -> Int -> Int
tailmultiply [] acc     = acc
tailmultiply (x:xs) acc = tailmultiply xs (acc * x)


Note, to stress the tail recursion, a definition like the one that follows would not be tail recursive.

Linux Kernel especially in the past years was plenty of different and replicated implementation of generic data structures such as linked lists, stacks and queues.  Kernel's developers decided to provide a unique set of API for using these kind of software machineries. We will briefly look in this article  at the list.h file (in the linux source tree under /usr/src/linux-headers-uname -r/include/) that provides macros and function for utilizing a doubly linked circular list. It is a very smart written piece of code that is worth to take a look at as it is  an opportunity  to improve  C programming skills and at the same time to see an "unconventional" implementation for a common data structure.

## From canonical list to Linux kernel linked list

The canonical implementation of a C's linked list consists of a structure and two (for a doubly linked) recursive pointer fields to the structure itself. The links to the previous and subsequent element of the list.


struct point2DNode{
int x,y;
struct point2DNode* prev, next;
};


To fall in the void as I fell: none of you knows what that means… I went down into the void, to the most absolute bottom conceivable, and once there I saw that the extreme limit must have been much, much farther below, very remote, and I went on falling, to reach it.” [Cosmicomics, Italo Calvino]

Finally at my Departement we succesfully converted an  unsed  Apple xserve  cluster to a full functional Linux/Debian machine.

The cluster is make up of eleven node connected by Myrinet 10G low latency network and each node is a composed by a dual 4x cores Intel Xeon 2.8 Ghz Harpertown with 10Gb of DDR2 for a total of 88 cores. It is a quite old and small machine and hence was unused. We thought that  it could have been useful if well re-configured as an ad-hoc machine for small sized scientific computation session, as  developing and testing platform of parallel application or for training of undergraduate students that can experience directly with a real parallel machine. So we decided to cast a new light on it by configuring it as a Debian cluster (according to google just few people did it).

A recently developed computational methodology for executing numerical calculations with infinities and infinitesimals. The approach developed has a pronounced applied character and is based on the principle “The part is less than the whole” introduced by the ancient Greeks. This principle is applied to all numbers (finite, infinite, and infinitesimal) and to all sets and processes (finite and infinite).(Ya. D. Sergeyev)

## A brief detour in infinity

The infinity is a concept that attracts and fascinate humans from its earlist time. The earliest mathematical account of infinity comes from Zeno of Elea (490-430 BCE) who was a pre-Socratic Greek philosopher of southern Italy (named by Aristotle the "invector of dialectic"). But in order to have a systematic and mathematically rigorouse use of the concept of infinity we have to wait the 17th century, when an English mathematician, Jonh Wallis for the first time used the symbol to denote an infinite quantity in its attempt to divide a region in infinitesimal parts . Than, thanks to his concepts of infinity and infinitesimal,  Leibniz was able to develop big part of his work on calculus, but they were still abstract objects,  different from appreciable quantities as finite numbers. But they shared some interesting properties with them (law of continuity,"whatever succeeds for the finite, also succeeds for the infinite").

Cantor set, we begin with a straight line, of length 1 and remove its the middle third and repest the process with the new two pieces over and over again.

# What is it?

A GPGPU CUDA powered cellular automata library , capable of hiding all the complexity behind the GPU programming of classica/complex and macroscopic cellular automata. The aim of this library is to make life easier to researchers that need to write and execute big cellular automata model.

Figure 1. Dewdney's Belousov-Zhabotinsky reaction cellular automaton.

The programmer has only to write the elementary processes that compose the transition function, register the substates and set up the automaton parameters (for example, the number of steps and cellular space size). The Memory management is completely behind the scene, letting the programmer concentrate only on the model he needs to write.

This library is open source. Help make it better!

Nikolai Kapustin is an **autodidact on composing**; he made his first attempt to compose a piano sonata at age of 13. During his conservatory time he composed and played his Op. 1; a Concertino for piano and orchestra. The Op.1 was a jazz piece and turned out to be his first work performed publicly (1957). He also had his own quintet and was a member of Yuri Saulsky’s Big Band.