# Transform array such that each element contains the greatest element on its right side

## Problem statement

In this article we will be discussing how to solve an easy problem that has been asked at Amazon coding interviews. This is a must do problem for your coding interview preparation. The problem statement goes as follows:

You are given an array A of size N. You have to modify A in place s.t. A[i] = M where M is the maximum value among all elements A[j], j > i. If such element does not exists set A[i] = -1.

## Example

Input :  A = {15, 22, 12, 13, 12, 19, 0, 2}
Output : A = {22, 19, 19, 19, 19, 2, 2, -1}

Input :  A = {2, 3, 1, 9}
Output : A = {9, 9, 9, -1}


## Solution

Since this is a quite easy problem, we need to focus on making a good impression on the interviewer by showing a clean reasoning and elegant code implementation of the solution.

Let's start by outlining our solution:

1. The rightmost element of A will always be -1 There is no element at the right of the rightmost location.
2. When processing the element at index i all we need to do is substituting it with the maximum element found so far.
3. We need to make sure that at each iteration we keep track of what is the maximum element so far.

## C++ Code

The ideas above can be coded as follows:

void solve(std::vector<int>&amp; V)
{
if(V.size() <=0 )
return;
//max so far
int M = -1;
int i = V.size()-1;
do{
const int m = max(M,V[i]);
V[i] = M;
M = m;
i--;
}while(i >= 0);
}


Another alternative and cleaner implementation is given below:

void solve(std::vector<int>&amp; V)
{
if(V.size() > 0 )
{
V.back() = -1;
for(int i = V.size()-2, M = V.back(); i >=0; i--){
const int m = max(M,V[i]);
V[i] = M;
M = m;
}
}
}


The time and space complexity of the code above is linear in the length of A.

If you liked this article and you want to read more about coding interview preparation, take a look at the previous article of the series.

# Return the Maximum absolute difference of values and indices

In this article we will be discussing one of the most asked coding interview question asked at Amazon.
The problem statement goes as follows:

## Problem statement

Given an array A determine the maximum possible value of the following expression:
$f(i,j) =| A[i]-A[j] | +|i-j|$

## Example

Input : A = {1, 3, -1}
Output : 5
because the maximum value obtainable is:
=> f(1,2) = |3 - (-1)| + |1-2| = 5

Input : A={3, -2, 5, -4}
Output : 10
=> f(1, 4) = f(4, 1) = |3 - (-4)| + |1 - 4| = 10
Note that in this case we can also obtain 10 from
=> f(3, 4) = f(4, 3) = |5 - (-4)| + |3 - 4| = 10

Read On…

# Sort an array of size N made of numbers from 0 to K

## Problem statement

In this article we will be discussing another (see the previous article) very popular programming interview question. This time we are asked to sort an array of size N whose elements are in the range [0,K).

## Problem statement

Given an array $A=[a_1,a_2,…a_n],\: 1 \leq a_i \leq K$
modify A s.t. it is sorted.

As usual you should ask your interviewer some clarifying questions. For example a sensible question to ask could be:

• What is the maximum value of $K$ and of $N$? Let's assume that $N \leq 10^7$ and $K\leq 10^3$

## Example

Input : A={1,8,9,66,2,1,45,12}
Output : A={1,1,2,8,9,12,45,66}


## Solution

First start noticing that the question seems rather simple at first. After all all you are asked to do is to sort an array. This is an easy peasy task that can be accomplished in a one-liner in C++.

## Simple sort solution

bool sort_range_K(vector<int>&amp; A)
{
sort(begin(A), end(A));
}


But at this point you should also notice that you are not using the information about the range of the elements, and this should make your eyebrows raise and make you think a bit harder. This means that there is a better way of solving this problem.
The complexity of the above code is of $O(n \times log_2(n))$

## Linear time Solution

You might not be aware of this (and if you are not I highly encourage you to get familiar with the topic) that it is possible to achieve better asymptotic complexity for sorting than $O(n \times log_2(n))$.
If you know the range of the elements you are dealing with, sorting can be done in linear time using Counting sort.

The idea behind is really simple. For each number in the range [0,K) you count how many times it appears in the original array in an array F of size K.
W.r.t. to the example above for instance:

A={1,8,9,66,2,1,45,12}
F ={0,2,1,0,0,0,0,0,1,1,....}

• F[0] = 0 because 0 appears 0 times in A
• F[1] = 2 because 1 appears 2 times in A
• $…$
• F[9] = 2 because 9 appears 1 times in A

With that information we can easily produce the output array by putting all the zeros first, then the ones, and so on.

## C++ code

The idea above can be coded as follows:

template<class T>
void sort_range_K(vector<T>&amp; A, const int K)
{
vector<int> F(K,0);
for(const auto&amp;x : A)
F[x]++;

int p = 0;
for(int i = 0 ; i < K ; i++)
for(int j = 0 ; j < F[i] ; j++)
A[p++] = i;

}


The complexity of the code above is linear the length of $\Theta(A)$.
The space complexity is $\Theta(K)$
You can try the above code online on Wandbox.

## Conclusions

In this article we discussed a common coding interview question asked by many tech companies. The takeaways from this exercise are:

• If an information is provided in the problem statement, it needs to be used
• If you know the range of the element you want to sort, you can sort them in linear time. The space complexity is linear in the biggest possible value an element of the array can take.

If you like this article check the previous article of the series (determine if a number is an Armstrong number).

# Coding interview question: Determine if number is an Armstrong number

## Problem statement

In this article we will be discussing another (see the previous article counting triples with a given sum) of the most common coding interview questions. The problem statement is really simple and goes as follows:

Given an positive integer N determine if it is an Armstrong number?
A number $x_1x_2…x_n$ (1 \leq x_i \leq 9 $is a digit ) of length$n$is an Armstrong number if the following is true:$x_nx_{n-1}…x_2x_1 = pow(x_1,n) + pow(x_2,n) + … pow(x_n,n)$In other words if raising all digits to power$n$and sum all them up you obtain the original number, then$N\$ is an Armstrong number.

Read On…

## Problem statement

In this article we will be discussing one of the most common coding interview question asked in many google interviews, especially during one of the first stages. The problem statement goes as follows:

Given an array N of distinct integers and a integer K, print on the standard output the number of triples (N[i], N[j], N[l]) whose sum is equal to K. In other words how many triples(i,j,k), i < j < k s.t. N[i] + N[j] + N[l] = K are in N?

Read On…

In this article I will present what are in my opinion the most useful and worth learning BASH shortcuts. Learning this bash cheatsheet can substantially make you more productive and work faster with the terminal.
This is not intended to be a full list of all possible BASH shortcuts, and if you need it you better check the bash documentation.

You can also download a printer-friendly BASH cheat sheet PDF that you can print and keep on your desk. You will find a link to it at the end of the article.

## Command Editing Shortcuts

ShortcutDescription
Ctrl + ago to the start of the command line
Ctrl + ego to the end of the command line
Ctrl + kdelete from cursor to the end of the command line
Ctrl + udelete from cursor to the start of the command line
Ctrl + wdelete from cursor to start of word (i.e. delete backwards one word)
Ctrl + ypaste word or text that was cut using one of the deletion shortcuts (such as the one above) after the cursor
Ctrl + xxmove between start of command line and current cursor position (and back again)
Alt + bmove backward one word (or go to start of word the cursor is currently on)
Alt + fmove forward one word (or go to end of word the cursor is currently on)
Alt + ddelete to end of word starting at cursor (whole word if cursor is at the beginning of word)
Alt + ccapitalize to end of word starting at cursor (whole word if cursor is at the beginning of word)
Alt + umake uppercase from cursor to end of word
Alt + lmake lowercase from cursor to end of word
Alt + tswap current word with previous
Ctrl + fmove forward one character
Ctrl + bmove backward one character
Ctrl + ddelete character under the cursor
Ctrl + hdelete character before the cursor
Ctrl + tswap character under cursor with the previous one

## Command Control Shortcuts

ShortcutDescription
Ctrl + lclear the screen
Ctrl + sstops the output to the screen (for long running verbose command)
Ctrl + qallow output to the screen (if previously stopped using command above)
Ctrl + cterminate the command
Ctrl + zsuspend/stop the command

## Command Recall Shortcuts

ShortcutDescription
Ctrl + rsearch the history backwards
Ctrl + gescape from history searching mode
Ctrl + pprevious command in history (i.e. walk back through the command history)
Ctrl + nnext command in history (i.e. walk forward through the command history)
Alt + .use the last word of the previous command

----

In this article we are going to discuss an interview question that is asked quite frequently in companies like Facebook, Microsoft, or Google.

The problem statement goes as follows: Given an vector<T> A of elements of type T of length Nand a vector<size_t> P where Pis a permutation of numbers from [0-N-1]you need to apply the permutation Pto the vector A. You are not allowed to use more than O(1) additional space.

For example suppose A=['a','b','c','d'] and P=['2','0','4','1']. After applying P to A, the latter becomes: ['c','a','d','b']. P[i] basically tell you which element of A should go in position i in the final arrangement of A.

Solving this problem can be very easy if you are allowed to use additional O(n) memory. All you need to do is to construct a new array, taking the elements as specified in P. The following code shows how this approach can be implemented in C++

template<typename T>
void apply_permutation(vector<T>&amp; A, const vector<int>&amp; P)
{
vector<T> A1;
A1.reserve(A.size());
for(size_t i = 0 ; i < A.size() ; i++)
A1.push_back(std::move(A[P[i]]));

A = move(A1);
}


Note how we use move which will avoid the copy if possible) of potentially heavy objects. Once the additional array A1 has been constructed, we modify A by moving A1 into it. That is pretty easy stuff.

In order to solve the hardest version we need to work a bit more.

The approach that will lead to the correct solution goes as follows: for each element that it is not yet in the correct location, then move it to its correct location. Sounds easy, and in-fact it it. Let's use the following concrete problem instance example as a guide:

• A = [a,b,c,d,e,f]
• P = [3,1,0,2,5,4]

For i = 0 we notice that P[i] != 0, meaning that in A[i] we should moveA[3]. In order to achieve so we swap(A[i], A[3]) which results in A=[d, b, c, a, e, f]. Now the first position of the answer is correct. Well done.

But we moved element originally at positioni=0to position 3which is clearly not its correct position ( because according to P, a should be placed in position 2). What do we do? We swap the correct element ( A[P[3]]) with A[3]=a . In other words we substitute the awith the element that should really go in that position i.e. c. After swap(A[3], A[2]), A=[d,b,a,c,e,f]. a is now not at location 2 and P[2] says that the element that should go here is a. good, we found the correct spot for the element originally in position i=0. In the process we also placed many other elements in the correct position.

Ideally now we should process another element that is not in its correct location. But how do we know which one is in the correct location? Since we cannot use any additional memory, we need to be clever and use P instead. What if we set P[i] = iwhenever we do a swap? If we do that then we do not really do nothing for that element because when P[i]= ithere is no work to do, as the element is already in its final location.

If we apply this line of reasoning to all elements of A then the final result will be what we really expected. Basically for an element i not yet in its correct location we keep swapping it until it reaches its final location. If we do it for all elements, then by definition we have solved the problem.

I am sure the idea above will be much more clear once we look at the code: curr and next are respectively the indices of the elements we are considering for a swap.

template<typename T>
void apply_permutation_in_place(vector<T>&amp; A, vector<int>&amp; P)
{
for(size_t i = 0 ; i < A.size() ; i++){
size_t curr = i;
size_t next = P[curr];
while(next != i)
{
swap(A[curr], A[next]);
P[curr] = curr;
curr = next;
next = P[next];
}
P[curr] = curr;
}
}


You can try the above code online on wandbox ( https://wandbox.org/permlink/NevnhZFq2G0cgL7O ).

Command-line argument management is tricky to get right especially when the number of options that we support and the number of their combination is big. For this kind of applications, there are already a number of very effective libraries and tools supporting the programming in such a scenario. One such library is Boost::program_options which I highly encourage you to use as is it awesome.

But there are other cases where we need to quickly write a prototype or when the number of options and possible configuration is small where using Boost could be overkill. In such cases what we usually do is writing ad-hoc command-line options management code which will never be reuse and it is tailored for the specific application at hand. This approach costs time and it is boring once you did it once.

For this reason, I wrote a super simple and minimalistic C++17 command-line argument manager which allows you to:

• Check if a set of options has been specified
• Get the argument of each element of a set of options (as string).

The class is extremely easy to use. You only need to construct an instance of the class by passing the argc and argv variables and then you are ready to go. No more need for ugly and error prone command line argument parsing. For instance, as you can see from the code snippet below, I check for the existence of the -h or --help options and of all the other required options. If one of -X, -O, -P are not present or its required corresponding argument are not specified a help message is printed. Otherwise, the arguments are retrieved and returned as a vector of strings ready to be used.

The example code is listed below or can be downloaded from here (gist on GitHub). You can also try it live on wandbox.

#include <string>
#include <algorithm>
#include <vector>
#include <tuple>
#include <iostream>
#include <optional>

void usage(const std::string&amp; progName) {
std::cout << progName << " -X FILE -P FILE -O DIRECTORY " << std::endl
<< "Options:"                        << std::endl
<< "-h  | --help    Print this help" << std::endl
<< "-X              Path to X file"  << std::endl
<< "-P              Path to P file"  << std::endl
<< "-O              Path to output directory." << std::endl;
}

class cmdline_args_parser{
public:
cmdline_args_parser (const int argc, char **argv){
for (int i=0; i < argc; ++i)
tokens.push_back(std::string(argv[i]));
}

const std::optional<std::string> getCmdOption(const std::string &amp;option) const{
auto  itr =  std::find(tokens.begin(), tokens.end(), option);
if (itr != tokens.end() &amp;&amp; ++itr != tokens.end())
return std::optional(*(itr));
return std::nullopt;
}

template<typename... Options>
const auto get_all_options(const Options... ops) const{
std::vector<std::optional<std::string>> v;
(v.push_back(getCmdOption(ops)), ...);
return v;
}

bool cmdOptionExists(const std::string &amp;option) const{
return
std::find(tokens.begin(), tokens.end(), option) != tokens.end();
}

template<typename... Options>
bool all_options_exists(const Options... opts) const{
return (... &amp;&amp; cmdOptionExists(opts));
}
template<typename... Options>
bool any_options_exists(const Options... opts) const{
return (... || cmdOptionExists(opts));
}

const std::string&amp; get_program_name() const{ return tokens[0]; }

private:
std::vector<std::string> tokens;
};

auto process_args(const cmdline_args_parser&amp; p){
auto opts = p.get_all_options("-X","-O","-P");
if(p.any_options_exists("-h", "--help") ||
!all_of(begin(opts), end(opts), [](const  auto&amp; x ){return x;}) )
{
usage(p.get_program_name());
return;
}
for(const auto opt : opts){
std::cout<<opt.value()<<std::endl;
}
}

int main(int argc, char** argv){
cmdline_args_parser p (argc, argv);
process_args(p);

return 0;
}


In this short lesson we will discuss how to parallelize a simple and rather inefficient (because this is not an in-place version) implementation of quick-sort using asynchronous tasks and futures.

We will perform some benchmarking and performance analysis and we will try to understand how we can further improve our implementation.

## Quick sort

In this section, I will briefly refresh your memory on quick-sort. I will do so by showing you a simple and self-explicative Haskell version first. We will also write a C++ (serial) version of the same code implementation C++ that we will use as a basis for our parallelization.

Here it goes the Haskell version:

quicksort [] = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
where
lesser = filter (< p) xs
greater = filter (>= p) xs


It is beautifully simple: in order to sort a list with at least, p one element is only necessary to partition sort the rest of the elements xs into two sublists:

• lesser: containing all the elements in xs smaller than p
• greater: containing all the elements in xs greater than p
Once both sublists are sorted we can finally return the whole sorted list using by simply returning gluing lesser, p and greater together in this order.

If you still have trouble understanding the quick-sort algorithm please refer to Wikipedia.

## Quick-sort serial version

The following is the serial C++ implementation of the same idea described above. It should be pretty easy to map the following implementation to the Haskell one. Run it on Wandbox

template <typename T>
void quick_sort_serial(vector<T>&amp; v) {
if (v.size() <= 1) return;
auto start_it = v.begin();
auto end_it = v.end();

const T pivot = *start_it;

//partition the list
vector<T> lesser;
copy_if(start_it, end_it, std::back_inserter(lesser),
[&amp;](const T&amp; el) { return el < pivot; });

vector<T> greater;
copy_if(start_it + 1, end_it, std::back_inserter(greater),
[&amp;](const T&amp; el) { return el >= pivot; });

//solve subproblems
quick_sort_serial(lesser);
quick_sort_serial(greater);

//merge
std::copy(lesser.begin(), lesser.end(), v.begin());
v[lesser.size()] = pivot;
std::copy(greater.begin(), greater.end(),
v.begin() + lesser.size() + 1);
}


## Parallelizing Quick-sort using std::future

In order to speed-up things we are going to use the fact that quick-sort is a divide and conquer algorithm. Each subproblem can be solved independently:
creating and sorting lesser and greater are two independent tasks. We can easily perform both on different threads.

The following is the first parallel version of the quick_sort_serial() above.
Run it on Wandbox

template <typename T>
void filter_less_than(const vector<T>&amp; v, vector<T>&amp; lesser, const int pivot) {
for (const auto el : v) {
if (el < pivot) lesser.push_back(el);
}
}

template <typename T>
void quick_sort_parallel1(vector<T>&amp; v) {
if (v.size() <= 1) return;
auto start_it = v.begin();
auto end_it = v.end();

const T pivot = *start_it;
vector<T> lesser;
auto fut1 = std::async(
std::launch::async,
[&amp;]() {
filter_less_than<T>(std::ref(v), std::ref(lesser), pivot);
quick_sort_parallel1<T>(std::ref(lesser));
});

vector<T> greater;
copy_if(start_it + 1, end_it, std::back_inserter(greater),
[&amp;](const T&amp; el) { return el >= pivot; });

quick_sort_parallel1(greater);

fut1.wait();

std::copy(lesser.begin(), lesser.end(), v.begin());
v[lesser.size()] = pivot;
std::copy(greater.begin(), greater.end(),
v.begin() + lesser.size() + 1);
}


As you can notice, the creation and sorting of lesser and are performed in parallel. Each thread running an instance of quick_sort_parallel1() will create another thread running quick-sort on one of the two sub-problems while the other subproblem is solved by the current thread.

This is exactly what we are doing when we spawn the following async task:
we are creating a task that will populate lesser with all the elements from v less than pivot and, once ready, it will sort it.
Please note that everything we need to have modified by reference need to be wrapped in a std::ref as we discussed in the previous lessons.

The following picture shows how the execution unfolds for the unsorted list: [2,7,1,6,9,5,8,3,4,10]:

The following code shows hot to spawn an async thread solving the lesser subproblem:

  vector<T> lesser;
auto fut1 = std::async([&amp;]() {
filter<T>(std::ref(v), std::ref(lesser), pivot);
quick_sort<T>(std::ref(lesser));
});


While this task is running on the newly created thread, we can solve greater on the current thread.

The asynchronous task will recursively spawn other async tasks until a list of size <=1 is created, which is of course already sorted. There is nothing to do in this case.

Once the main thread is done with sorting the greater list, it waits for the asynchronous task to be ready using the std:.future::wait() function.
Once wait returns, both lists are sorted, and we can proceed with merging the result and finally, here it is, we have a sorted list.

## Performance analysis

Let's quickly analyze our implementation. We will compare execution time for the single-thread and async-parallel versions above.

Let's start our analysis by taking looking at this graph depicting the execution time (average of 10 runs) for both versions above:

It might be a surprising result to see that the Async parallel version is way slower than the single threaded version, ~55x slower!
Why is that? The reason is that, the parallel version creates a new thread for every single subproblem, even for the ones that are quite small.
Threads are costly to manage by the OS, they use resources and need to be scheduled. For smaller tasks the overhead caused by the additional thread is larger than the gain in performance that we might get by processing the sublist in parallel. This is exactly what is happening.

In order to solve this issue, we want to modify the async code above so that a new thread is spawned only when the input list v is larger than a certain threshold. The code below implements the aforementioned idea:

template <typename T>
void quick_sort_async_lim(vector<T>&amp; v) {
if (v.size() <= 1) return;
auto start_it = v.begin();
auto end_it = v.end();

const T pivot = *start_it;
vector<T> lesser;

vector<T> greater;
copy_if(start_it + 1, end_it, std::back_inserter(greater),
[&amp;](const T&amp; el) { return el >= pivot; });

if (v.size() >= THRESHOLD) {
auto fut1 = std::async([&amp;]() {
filter<T>(std::ref(v), std::ref(lesser), pivot);
quick_sort_async_lim<T>(std::ref(lesser));
});

quick_sort_async_lim(greater);
fut1.wait();

} else {
//problem is too small.
//Do not create new threads
copy_if(start_it, end_it, std::back_inserter(lesser),
[&amp;](const T&amp; el) { return el < pivot; });
quick_sort_async_lim(lesser);
quick_sort_async_lim(greater);
}

std::copy(lesser.begin(), lesser.end(), v.begin());
v[lesser.size()] = pivot;
std::copy(greater.begin(), greater.end(),
v.begin() + lesser.size() + 1);
}


As you can notice, the only addition that this optimized version has is that a new thread is spawned only when the size of the input list is larger than THRESHOLD If the list is too small, then we fall back on the classic single-thread version.
The following pictures show the result for the optimized version above with value of THRESHOLD=4000. As you can notice the execution time drops sensibly w.r.t single thread version. We have achieved ~4x speedup with a minimal programming effort.

We have introduced a new parameter in our code, and we need to figure out what is the best value of THRESHOLD. In order to do so, let's analyze the performance of the code above for various values of the threshold.
The following graph depict the execution time for various values of THRESHOLD. Note that the y-axis is in log scale. The execution time drops quite abruptly from 0 to 300.

## Conclusion

We have used std::future to parallelize the quick-sort algorithm. The async code differs slightly from the serial single thread implementation, but runs 4x faster. On the other hand, we have learned that running too many threads it is definitely not a good idea because each thread comes with an overhead: the OS needs to allocate resources and time to manage them.

## Introduction

In this lesson we will talk about a way of returning values from threads, more precisely we will talk about std::future which is a mechanism that C++ offers in order to perform asynchronous tasks and query for the result in the future.
A future represents an asynchronous task, i.e. an operation that runs in parallel to the current thread and which the latter can wait (if it needs to) until the former is ready.
You can use a future all the time you need a thread to wait for a one-off event to happen. The thread can check the status of the asynchronous operation by periodically polling the future while still performing other tasks, or it can just wait for the future to become ready.

Read On…

## Introduction

In the previous lesson we have seen how data can be protected using mutex. We now know how to make threads do their work concurrently without messing around with shared resources and data. But sometimes we need to synchronize their actions in time, meaning that we might want a thread t1 to wait until a certain condition is true before allowing it to continue its execution.

This lesson discusses the tools that we can use to achieve such behavior efficiently using condition variables.

Read On…

In this lesson, we will cover the topic of data sharing and resources between threads. Imagine a scenario where an integer o needs to be modified by two threads t1 and t2. If we are not careful in handling this scenario a data race might occur. But what is a data race exactly?

## Data Race

A data race occurs when two or more threads access some shared data and at least one of them is modifying such data. Because the threads are scheduled by OS, and scheduling is not under our control, you do not know upfront which thread is going to access the data first. The final result might depend on the order in which threads are scheduled by the OS.

Race conditions occur typically when an operation, in order to be completed, requires multiple steps or sub-operations, or the modification of multiple data. Since this sub-operations end up being executed by the CPU in different instructions, other threads can potentially mess up with the state of the data while the other's thread operation is still ongoing.

Read On…

tutorial3