# Archive for the ‘Programming’ Category

This article discusses (a problem that I recently solved on codility ).

The core of the problem is the following:
Given two non negative integers N and M, $1 \leq M \leq N \leq 2147483647$, the task is to check whether they have the same set of prime divisors.
A prime divisor of an integer P is a prime d s.t. $d \times k = P$ for some positive $k$. You are given up to $6 \times 10^3$ of such queries, and should return the total number of them that evaluates to true.

For instance given if $N = 156$ and $M = 78$ then our function should return *true* because the set of prime divisor of $N$ is equal the
the set of primal divisor of $M$ i.e. $\{2,13,3\}$ while for $N=45$ and $M=120$ the function should return *false*.

## Smart (not enough) Brute Force approach

An easy approach in solving this problem could be computing by bruteforce the set of prime divisors of both numbers and compare them, and return true if they contains the same elements.
In order to achieve that we should:

1. compute the set of primes up to $L=max(N,M)$. There are approximately $\frac{L}{log (L - 1)}$ (Read here for a more info on this topic https://primes.utm.edu/howmany.html)
2. Using the Sieve of Έρατοσθένης approach we are able to compute it in $L \times (log (log (L)))$ time and space (see the following code).
3. 
void getPrimesSieve(vector<long long>&primes, const ll n) {
using ll = long;
vector<bool> nums(n + 1, true);
primes.push_back(2);
for (ll i = 3; i * i < n + 1; i += 2) {
if (nums[i]) {
for (ll j = i * i; j < n + 1; j += 2 * i) {
nums[j] = false;
}
}
}

for (ll i = 3 ; i <= n ; i += 2)
if (nums[i])
primes.push_back(i);
}

4. With the set of prime numbers in place then we can start finding the primal divisors. The following function computes the set of unique prime divisors of $p$ and stores them in $F$.
5. bool primalDivisors( long p , vector<long>& F, vector<long>& primes) {
int i = 0;
int c = 0;
if (i >= primes[primes.size()-1])
return false;
while (p > 1) {
if (p % primes[i] == 0)
{
c++;
p /= primes[i];
}
else
{
if (c > 0) {
F.push_back(primes[i]);
c = 0;
}
i++;
}
}
if (c > 0)
F.push_back(primes[i]);

return true;
}

6. Finally we can iterate among all the queries and count the number of pairs returning *true*.
7. int solution(vector<int>& A,vector<int>& B){
int LIM = -1;
int ans = 0;
LIM = max( *max_element(A.begin(), A.end()) ,*max_element(B.begin(), B.end()) ) ;
vector<int> primes;
getPrimesSieve(primes, LIM);

for (unsigned i = 0; i < A.size(); ++i)
{
vector<int> divA;
vector<int> divB;
primalDivisors(A[i] , divA, primes);
primalDivisors(B[i] , divB, primes);

//same set of divisors
if( divA == divB)
ans++;

}
return ans;
}


Altough this approach is correct, it is too slow. There are $\approx \frac{L}{log (L - 1)} = \frac{2147483647}{log (2147483647 - 1)} = 99940774$ primes up to $2147483647$. Since the primalDivisors function gets called up to $K=6000$ times and it has linear complexity then the overall time complexity is: $K \frac{L}{log (L - 1)}$ which translates to $\approx 6 \times 10^{11}$!

## Fast Approach using GCD

For two number to share the same set $S$ of prime divisors it must be the case that $S(N) \subseteq S(M)$ and $S(M) \subseteq S(N)$. Checking if a number has a set of prime divisors that is subset of another is easy. In order for two number to have the same set of prime divisors the following must hold:
$N = p_{0}^{a_0} \times p_1^{a_1} \ldots p_n^{a_n}$ and $M = p_0^{b_0} \times p_1^{b_1} \ldots p_n^{b_n}$ i.e. their factorization only differ in the exponents.
Since clearly the $GCD(M,N) \ne 1$ we can divide $M\$ by it obtaining $M' = \frac{M}{GCD(M,N)}$. $M'$ cannot have a larget set of factors than $M$ and hence its set of prime divisor is still contained into $N$'s one. We can keep repeating this process until $M'$ becomes one.
Dividing by the $GCD$ eliminates the common part of among the two number. And if eventually the dividend becomes $1$, that means that the common part was the only part present.

As an example consider $M = 2^2 \times 3^3 \times 5^2$ and $N = 2^1 \times 3^1 \times 5^1$, $GCD(M,N) = N$ and $M' = 2^1 \times 3^2 \times 5^1$. Repeting the process leads to $GCD(M',N) = N$ and $M'' = 3^1$ and $GCD(M'',N) = 3$ and $M''' = 1$.
This shows that the set of prime divisor of M is contained in the set of prime divisor of N.
If we can show that the set of N is included in M's one, than the two sets must be equal. It is sufficient to do the same process with roles of N and M swapped.
If apply the same process to swapping the values of M and N from the previous example we obtain that $M = 2^1 \times 3^1 \times 5^1$ and $N =2^2 \times 3^3 \times 5^2$, $GCD(M,N) = N$ and $M' = 1$.

If there is a divisor that is not common to both number then the process will eventually isolate it.
Imagine that $M = 2^2 \times 3^3 \times 5^2 \times 11$ and $N = 2^1 \times 3^1 \times 5^1$. All common divisors will be cancelled out by repeting the division by the GCD leaving $M=11$ and $N=2^1 \times 3^1 \times 5^1$.

Finally, the following is a possible implementation of the overall previous idea, which is also much simpler and cleaner than the one presented in the previous section.

bool containedSet(int M, int N){
int g = __gcd(min(N,M), max(N,M));
while(g != 1){
M /= g;
g = __gcd(min(N,M), max(N,M));
}
return M==1;
}

int solution(vector<int>& A,vector<int>& B){
int ans = 0;
for (unsigned i = 0; i < A.size(); ++i)
{
//divisor A contained in divisor of B
//divisor A contained in divisor of A
//implies divisor A =  divisor of B
if(containedSet(A[i],B[i]) && containedSet(B[i],A[i]))
ans++;

}
return ans;
}


# A Few Words on Polymorphism

Polymorphism is the ability to assign a pointer to a base class to an instance of one its derived class. When a method is invoked on that pointer, the derived implementation, if provided, of the method is called otherwise, the inherited method is. The following is an example of such feature.

class Polygon {
protected:
double width, height;
public:
void Polygon(int a, int b) : width(a), height(b){};
double area() const =0;
int perimeter() const
{ return -1; }
};
};

class Rectangle: public Polygon {
public:
double area()
{ return width*height; }
int perimeter() const{ return width*2 + height*2;};
};

class Triangle: public Polygon {
public:
double area()
{ return width*height/2; }
};

int main () {
Rectangle rect;
Triangle trgl;
Polygon * ppoly1 = &rect;
Polygon * ppoly2 = &trgl;
ppoly1->set_values (4,5);
ppoly2->set_values (4,5);
cout << rect.area() << '\n';
cout << trgl.area() << '\n';
return 0;
}


It is implemented with a cost in term of memory and time. For each class a virtual method table is stored and a pointer to it is added to the definition (transparently, by the compiler) of each class containing a virtual method (or deriving from a class that contains a virtual method). The table in turn contains pointer to the actual implementation of the virtual methods for the derived class. So the compiler only knows the object through a pointer to its base class, but it can still generate the correct code since it can indirect the calls to overridden methods via the virtual method table, then lookup for the method in the table and finally call it. So polymorphism comes at a cost of storing a virtual method table for each class, a pointer to it in each instance of a polymorphic class and two level in indirection when calling a virtual method.
Another pitfall is that since the indirection is required, usually virtual methods cannot be in-lined.

## Curious Recurring Template Pattern

The key idea is: polymorphism without extra run-time cost. Static polymorphism.

Templates can mitigated performance problems of dynamic polymorphism via the so called static polymorphism, or simulated dynamic binding. Read On…

Complex number in OpenCL - cl_complex

Recently I've been involved in the developing of a library OpenCAL capable of  parallel execution of cellular automata  and finite differences models.

I though it could have been fun to render some huge fractal with it and so I ended up writing some OpenCL code for the generation of Julia sets. Unfortunately OpenCL does not provide support for complex number (CUDA does, check the following link out: CUDA complex number example)  so I had to write it myself.

The following might be useful to anyone with need of support for complex number operations as exponentiation, argument, modulus etc. in OpenCL.

Here is a link to a 324 Megapixels Julia set Rendered image (warning, size >150 MB)

# Weighted Tree Vertex Cover Problem

Vertex cover of graph $G=(V,E)$ is defined as $V'\subseteq V$ s.t. $\forall (u,v) \in E$ $u \in V' \vee v \in V'$. In other word a subset of the vertices such that all vertices are incident to a vertex in the vertex cover.
We will derive an algorithm for finding the weight of a minimal (yes is not unique) vertex cover for a subclass of graphs i.e. tree (which are acyclic graphs with the property that only one path between each vertex exists).

Remember that vertex cover for graphs is an NP-Complete (NP-hard and NP, hard at least as all NP problems and it is an NP problem itself) problem i.e. no deterministic polynomial tyme algorithm was discovered (if you discover one, contact me, we will be millionaire).

# Tree Vertex Cover - Problem Definition

Given a weighted tree $T = (V,E,f)$ with $f : \mathcal{N^+} \rightarrow \mathcal{N^+}$ write an algorithm for computing a vertex cover with minimum weight i.e. V' is a vertex cover and the sum of the weight of its element is minimal.

The following is the tree structure that we will use throughout the article.

template<typename T,int DEGREE>
struct node{
array<node*,DEGREE> children;

T data;
int weight;

node(const T& v , int _weight) : weight(_weight){
data=v;
mincover=0;
}
};



What if the weight is equal to the degree of the node?

The first observation we can make is that the root node can weather be or not be in the vertex cover. If we include it in the solution then we are sure that all the edges from it to its children have been covered and to solve the problem we need only to compute the cover of its children (which is a simpler problem). Read On…

Programming Interview Question - Merge Intervals

This post will explore the solutions to the question #4 of the famous website cakeinreview (here the link).

Programming Interview Question - Merge Intervals - Problem Statement

Given a list of pair of integers return a list of merged or condensed intervals.

Given for instance the following input list

Your function should take care of corner cases like merging two intervals like the following $(0,1),(1,2)$ in $(0,2)$. Give a $O(n^2)$ solution first. Then try to solve it in $O(nlog(n))$.

# List Cycle Detection

Linked list cycle detection problem is a very instructive and fun problem to reason about. This article will state the problem and explains how we can solve it efficiently while giving some insight on the underlying maths.

Well, list can gets corrupted and some node can be linked by one or more nodes (as in the following figure). This could lead to never ending traversal of the list. So it make sense to solve the following problem:

Cicular List

## List Cycle Detection - Problem Statement

1. Given  a linked list detect (if any) check if the list is circular i.e. contains a cycle
2. Find the starting node of the cycle (the node with two inward arrows in the figure)

The problem is easily solvable in $O(n^2)$ time and$O(n)$ space considering that we can visit the list from the head and store in a separate list the visited nodes. As the visit continues we check if the node we are examining was previously visited (is that node contained in the support list?). If yes the list is circular and that node is the start point of the cycle. If we reach the tail of the list then the list is not circular. We can lower the complexity of this approach down to$O(nlog(n))$ time using a more efficient support (set) data structure (like a tree set). But we can do much better, and the rest of the article will show how to obtain $O(n)$ time and $O(1)$.

## List Cycle Detection - Floyd’s algorithm

We use the fact that like clock's hands things iterating on a cicle will meet at some point in the future. Consider two runner $R_1,R_2$ with velocities$V_1,V_2=2V_1$ starting running from the same point in a circular stadium. They will meet again when the slower runner reach the starting point for the second time (Why? By the time the slower one has completed half circle the faster has completed a complete cycle and by the time the slower finishes his run, arriving at the starting point again, the faster has completed a second entire cycle).

Things are a bit more complicated in the List Cycle Detection problems because the iterators do not necessarily start from the same point. Consider two iterators $p,q$ with velocities $v_p=1,v_q=2v_p = 2$  respectively. Suppose the cycle has length $n$ and that it starts at node number $A < n$ When the slower iterator reaches $A$ the faster is at location $2A$. How many iteration $k$ it will take before they meet? And at which node?

The situation is described by the following congruence

• $A + kv_p \equiv 2A + k2v_p mod(n)$
• $\Rightarrow 2A + k2v_p \equiv A + kv_p \; mod(n)$
• $\Rightarrow A + k2v_p \equiv kv_p \; mod(n)$
• $\Rightarrow A +kv_p \equiv 0 \;mod(n)$
• $\Rightarrow A +k \equiv 0 \;mod(n)$

which has solution $k = n-A$. This means that they will meet after $k=n-A$ iteration of the slower iterator and since it starts from $A$ this means that they will meet at $A$ nodes before the beginning of the cycle. We can use that fact to count $A$ nodes from the beginning of the list. Once the iterators matche in the cycle, we can move the fast iterator back to the beginning of the list and iterate forward one node per stepo with both iterators until they match again (They are far away $A$ nodes from the beginning of the cycle, so they will meet exaclty in that point.

Let's consider now the case when $A \geq n$. This means that by the time the slower iterator reaches the beginning of the cycle the faster one has completed more that a cycle. What will be the starting point for the faster one? We argue that once p reaches A q is at node 2A but since A > n this means that some it will be at position A + (A mod(n)). We can now use similar argument to the previous case and write:

• $A + kv_p \equiv A + (A mod(n)) + k2v_p \;mod(n)$
• $A + (A mod(n)) + k2v_p \equiv A + kv_p\;mod(n)$
• $(A mod(n)) + kv_p mod(n) \equiv 0mod(n)$
• $(A mod(n)) + k mod(n) \equiv 0 mod(n)$ since $v_p =1$

which has solution $k = n-(A mod(n))$. This means that the meetpoint is $A mod(n)$ nodes before the beginning of the cycle. If we do the sape operation as the previous case we obtain the sme result. Itrators wil meet at the beginning of the cycle. Why? Well advancing $q$ makes $p$ cycles possibly several times ( remember that $A \geq n$  ) and it will clearly stops at $A+(n-A mod(n)) + A mod(n) = A +n \;(mod (n))= A$.

In other words the slower pointer is at first  at node number $A+(n-A mod(n))$. We can write$A = bn + r$ where $r = A \;mod(n)$. After $A$ advancing steps it will be at location  $A+(n-A \;mod(n)) +bn +r (mod n)$ Since $bn \; mod (n)=0$ the result follows.

As an example consider in which we have a list with a cycle of length $n=4$ and starts at node number $10$. The first part of the algorithm tells us that the nodes will meet at node $10 + 4 - 10 \: mod(4) = 12$. Moving the fast pointer back to te head of the list and iterating one node per time both iterators will lead the slower point to node:

• 12 again after advancing of 4 nodes
• 12 again after advancing of 4 nodes
• 10 advancing of the remaining 2 nodes.

# HOW-TO: Dynamic Message of the day

The final result will be something like the following:

It mixes a fun message using fortune and cowsay which you can install using


sudo dnf install fortune-mod cowsay



utility and some informative info about the status of the system as:

• Ram and Swap available and used
• Disk space

The script file can be easily configured and extended to suits the your needs. Colors can also be easily customized.

# Distributed  Hadoop and HBase installation - Fedora Linux

In this post I will describe how to get started with the latest versions of hadoop and hbase describing all the step to obtain a working hadoop installation. The steps described here can be easily used to perform a working installation on a large cluster (even tough it can requires additional steps as shared filesystem for instance).

## Prerequisites

 sudo dnf install openssh openssh-askpass openssh-clients openssh-server

Don't forget to start the ssh service using the following command:

 sudo service sshd start

# Programming Interview Question

## Question: given two string write a method to decide if one is a permutation of the other.

This is a common question that could be easily solved if we know in advance which is the size of the alphabeth. A straightforward approach could be to sort both strings and then compare them. This approach has complexity $O(nlog(n))$ due to sorting. We can do better and lower the complexity to $O(n)$ if we reason as follows: given a boolean sequence of the same size of the alphabet  initially false initialized.  Then what happen if  for each char we found in the  string $s$ we negate the value of the correnspoding bool in the sequence? We end up having the $True$ at position $i$ if the char $i$ appeared an odd number of times and $False$ otherwise. Now if the other string $v$ is a permutation of the first one, we all agree that for each char in $s$ it contains the same number of occurrences. This means that if we apply the same process as before on the boolean sequence using as input the string $v$ each bool is being negated an even number of times and this means that its value would be the same as the beginning of the process (all $False$). So if the sequence does not contains any true value, than it means that the strings contains the same elements i.e. they are permutation of each other.

Example:

1. $s:$"abbccd" , $v:$ "cbdabc" , $S=\{False,False,False,False,False, False, False ...\}$
2. Apply negation using s
• $S=\{True,False,False,True,False, False, False \ldots\}$
3. Apply negation using v
• $S=\{False,False,False,False,False, False, False ...\}$

A possible C++ implementation is shown here

# Programming Intervew Question: Unique Characters in a String

##### Question: implement an algorithm to determine if a string has all unique characters

The idea behing this (fairly easy) question is that whenever we found that any chars is repeated at least twice we should return false. In order to do that we have to look at each char we are loloking for a solution which complexity is at least $O(n)$. We are free to use any support data structure. $256$ is the size of the ASCII charset.