Blog

Stream Compaction on GPU - Efficient implementation - CUDA

Stream Compaction - Introduction((Markus Billeter et al. Efficient Stream Compaction on Wide SIMD Many-Core Architectures))((InK-Compact-: In kernel Stream Compaction and Its Application to Multi-kernel Data Visualization on GPGPU- D.M. Hughes))


An efficient implementation of stream compaction on GPU is presented with code example. Full source code on github.

Stream compaction/reduction is commonly referred as the operation of removing unwanted elements in a collection. More formally, imagine we have a list of element A_{0...N} of N elements and a predicate p : A \to\left \{ True,False\right \} that bisects A in wanted and unwanted elements (ones that satisfy the predicates p and ones that don't). The stream compaction of A under p is an Read More »Stream Compaction on GPU - Efficient implementation - CUDA

Haskell Seminar @ UNICAL

Haskell Seminar at University of Calabria During the last year I got much interest in functional programming and in particular in Haskell. Very few people write and study Haskell at University of Calabria (one of them is the manteiner of gnome for NixOs) and so I decided to do a introductive lesson (a big thanks go to the guys of the ciotoflow). It was a great experience, and people were enthusiast and curious. All the material i produced is available for download and use, and hopefully will be gradually updated and improved. All the files are available on github. Introductive slides here

Project Euler - Problem 24

In this article we want to briefly discuss a possible solution to the project euler problem 24 which asks:

A permutation is an ordered arrangement of objects. For example, 3124 is one possible permutation of the digits 1, 2, 3 and 4. If all of the permutations are listed numerically or alphabetically, we call it lexicographic order. The lexicographic permutations of 0, 1 and 2 are:

012 021 102 120 201 210

What is the millionth lexicographic permutation of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9?

One possible approach could obviously be to brute force all the lexicographical permutations and then take the millionth element out of the previously sorted collection of permutation (interpreted as number). The drawback of this solution is that unfortunately the number of permutations is more than exponential, is O(n!).

We want to show here a more mathematical/combinatorial approach. First of all start considering that how the permutations looks like.
First permutations are (thanks to Stefano Germano for typo correction) \{0,1,2,3,4,5,6,7,8,9\}, \{0,1,2,3,4,5,6,7,9,8\},\{0,1,2,3,4,5,6,8,7,9\},...,\{0,9,8,7,6,5,4,3,2,1\}.
A deeper look at the permutations reveals that are all the sequences starting with 0 and concatenated with a permutation of the ramaining 9 elements [1,2,3,4,5,6,7,8,9]. How many of them? 9!.
It means that the permutation we are looking for  has to be greater than the 9!=362880th permutation: \{0,9,8,7,6,5,4,3,2,1\}.
From this we can deduce that we can also skip all the sequences starting by 1 because they produces (again) 9! permutations and

2 \times 9! < 10^6. Moreover our millioth permutation is in the sequences starting with 2 because 3 \times 9! > 10^6 (hence is not in the permutations starting with 3).Read More »Project Euler - Problem 24

CUDA - Julia Set example code - Fractals

This article present a CUDA parallel code for the generation of the famous Julia Set. Informally a point of the complex plane belongs to the set if given a function f(z) the serie does not tend to infinity. Different function and different initial condition give raise eventually to fractals. One of the most famous serie is   (the one that generated the video below). Some nice picture may be obtained with the following initial conditions: # dentrite fractal # douady's rabbit fractal # san marco fractal # siegel disk fractal # NEAT cauliflower thingy # galaxies # groovy # frost Here a video, showing a sequence of picture… Read More »CUDA - Julia Set example code - Fractals

Fold operator - a brief detour

This article aims to be a brief practical introduction to new haskell programmer about the fold high order function (which roughly speaking is a function that takes as input and/or produces as output a function) and its usage. It will also outline a bit of the theory that lies behind this operator (the term operator comes from the field of recursion theory (Kleene, 1952), even if, striclty speaking fold is a function in haskell).

Haskell Fold - Introduction and formal definition

Many problem in function programming are solved by means of tail recursive functions (one that do not perform any other operations that need some values from the recursive call). For instance let's say we want to get the product of integers stored in a list.
What we would do is to write down a tail recursive function (more or less) like this

tailmultiply :: [Int] -> Int -> Int
tailmultiply [] acc     = acc
tailmultiply (x:xs) acc = tailmultiply xs (acc * x)

Note, to stress the tail recursion, a definition like the one that follows would not be tail recursive.

Read More »Fold operator - a brief detour

Kernel Linux List - An Efficient Generic Linked List

Linux Kernel Linked List

Linux Kernel especially in the past years was plenty of different and replicated implementation of generic data structures such as linked lists, stacks and queues.  Kernel's developers decided to provide a unique set of API for using these kind of software machineries. We will briefly look in this article  at the list.h file (in the linux source tree under /usr/src/linux-headers-`uname -r`/include/) that provides macros and function for utilizing a doubly linked circular list. It is a very smart written piece of code that is worth to take a look at as it is  an opportunity  to improve  C programming skills and at the same time to see an "unconventional" implementation for a common data structure.

From canonical list to Linux kernel linked list

The canonical implementation of a C's linked list consists of a structure and two (for a doubly linked) recursive pointer fields to the structure itself. The links to the previous and subsequent element of the list.


struct point2DNode{
 int x,y;
 struct point2DNode* prev, next;
};

Read More »Kernel Linux List - An Efficient Generic Linked List

Francis Poulenc - Rapsodie Negre

To fall in the void as I fell: none of you knows what that means… I went down into the void, to the most absolute bottom conceivable, and once there I saw that the extreme limit must have been much, much farther below, very remote, and I went on falling, to reach it.” [Cosmicomics, Italo Calvino]

Apple Xserve xeon Linux/Debian armed

Finally at my Departement we succesfully converted an  unsed  Apple xserve  cluster to a full functional Linux/Debian machine.

The cluster is make up of eleven node connected by Myrinet 10G low latency network and each node is a composed by a dual 4x cores Intel Xeon 2.8 Ghz Harpertown with 10Gb of DDR2 for a total of 88 cores. It is a quite old and small machine and hence was unused. We thought that  it could have been useful if well re-configured as an ad-hoc machine for small sized scientific computation session, as  developing and testing platform of parallel application or for training of undergraduate students that can experience directly with a real parallel machine. So we decided to cast a new light on it by configuring it as a Debian cluster (according to google just few people did it). 10949741_766948060061020_241789111_n

Read More »Apple Xserve xeon Linux/Debian armed

The Grossone Numeral System

A recently developed computational methodology for executing numerical calculations with infinities and infinitesimals. The approach developed has a pronounced applied character and is based on the principle “The part is less than the whole” introduced by the ancient Greeks. This principle is applied to all numbers (finite, infinite, and infinitesimal) and to all sets and processes (finite and infinite).(Ya. D. Sergeyev)

 A brief detour in infinity

The infinity is a concept that attracts and fascinate humans from its earlist time. The earliest mathematical account of infinity comes from Zeno of Elea (490-430 BCE) who was a pre-Socratic Greek philosopher of southern Italy (named by Aristotle the "invector of dialectic"). But in order to have a systematic and mathematically rigorouse use of the concept of infinity we have to wait the 17th century, when an English mathematician, Jonh Wallis for the first time used the symbol to denote an infinite quantity in its attempt to divide a region in infinitesimal parts \frac{1}{\infty}. Than, thanks to his concepts of infinity and infinitesimal,  Leibniz was able to develop big part of his work on calculus, but they were still abstract objects,  different from appreciable quantities as finite numbers. But they shared some interesting properties with them (law of continuity,"whatever succeeds for the finite, also succeeds for the infinite").

Cantor set, we begin with a straight line, of length 1 and remove its the middle third and repest the process with the new two pieces over and over again.

 

Read More »The Grossone Numeral System

Cellular automata library - CuCCAl

What is it?

A GPGPU CUDA powered cellular automata library , capable of hiding all the complexity behind the GPU programming of classica/complex and macroscopic cellular automata. The aim of this library is to make life easier to researchers that need to write and execute big cellular automata model.

Belousov-Zhabotinsky reaction cellular automaton.

Figure 1. Dewdney's Belousov-Zhabotinsky reaction cellular automaton.

The programmer has only to write the elementary processes that compose the transition function, register the substates and set up the automaton parameters (for example, the number of steps and cellular space size). The Memory management is completely behind the scene, letting the programmer concentrate only on the model he needs to write.

This library is open source. Help make it better!

Read More »Cellular automata library - CuCCAl