Skip to content

mihael-tunik/coordered_tuples

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Coordered pairs in two arrays

Given arrays a, b of n elements each, count number of pairs $a_i < a_j$, $b_i < b_j$.

This is the general version of the following problem:

Coordered elements in array problem

  • Given array a of n elements, count number of pairs $a_i < a_j, i < j$.

Or (with different order operator) of the following:

Number of inversions problem

  • Given array a of n elements, count number of pairs $a_i > a_j, i < j$.

Both mentioned problems are twins to each other. For example, in 'distinct elements' case it's simply $\sum_{{a_i < a_j, i < j}} 1 = \frac{n(n-1)}{2} - \sum_{{a_i > a_j, i < j}} 1$

Any of them can be solved in quasilinear time with divide-and-conquer or with any range-sum-point-update container.

But what if we count array elements according to elements of another array instead of indexes? This bring us back to the starting problem. Long story short general version also can be solved in $O(n \cdot \log^2 n)$ and even in $O(n \cdot \log n)$.

See details here:

ull count_coordered(vector <int> a, vector <int> b); // O(n^2)
ull count_coordered_fenwick_2d(vector <int> a, vector <int> b); // O(n (log n)^2)
ull count_coordered_fenwick(vector <int> a, vector <int> b); // O(n log n)

Solution $O(n \cdot \log^2 n)$

Let's start from standard ordered pairs count algorithm:

for i in range(n):
   T.update(a[i], 1)
   cnt += T.prefix(a[i]-1)

This code, as I said before, counts pairs $a_i &lt; a_j, i &lt; j$. Here, T is 'counting' array initialized by zeros, where we store counts for every value in a. To perform both operations efficiently one should choose any $O(\log n)$ range query structure as T, let's say Fenwick tree. It works, because updates are made in order of indexes in a and so T.prefix(a[i]-1) counts only $a_k &lt; a_i$ with $k &lt; i$ by default. Every $a[k], k &lt; i$ processed earlier than $a[i]$.

Now let's modify our task in order to exploit this idea. Main observation that given arrays a, b we can take any permutation of both and that doesn't change the number of coordered pairs. Indeed any pair $a_i &lt; a_j, b_i &lt; b_j$ would be mapped to $a_{p_i} &lt; a_{p_j}, b_{p_i} &lt; b_{p_j}$. It means we can sort!

Now, one of arrays (lets say a) is completely ordered. Let's use the same idea, but instead of counting array we will modify entries in 'counting' matrix.

sort((a, b), coordinate=0) 
for i in range(n):
   T.update((a[i], b[i]), 1)
   cnt += T.prefix((a[i]-1, b[i]-1))

Every $a_k &lt; a_i$ appears before $a_i$. This works in $O(n \cdot \log^2 n)$ time for 2D Fenwick tree.

Note, since the memory consumption can be high for this algorithm, coordinate compression technique is recommended.

Solution $O(n \cdot \log n)$

There's another way how we can benefit from permutation properties. If b is array with distinct elements and a is arbitrary array, then the following algorithm would count coordered pairs:

b = sort((b, range(n)))
p = b[1, :] # extract indexes

for i in range(n):
   T.update(a[p[i]], 1)
   cnt += T.prefix(a[p[i]]-1)

Suppose, you solve the coordered count problem for array $c_k = a_{p_{k}}$.

This way you count $c_i &lt; c_j, i &lt; j$ that corresponds exactly to $a_{p_{i}} &lt; a_{p_{j}}, i &lt; j$. Then say $i = p^{-1}(u), j = p^{-1}(v)$, and pairs can be viewed as $a_{u} &lt; a_{v}, p^{-1}_u &lt; p^{-1}_v$.

Last thing we need to find $p: p^{-1} = b$. This can be done in $O(n \cdot \log n)$ by simultaneous sorting b and range from 1 to n.

However, when b contains equal elements (and therefore bijections doesn't work) this algorithm would count extra pairs.

To correct this we need to iterate groups of equal elements in b and decrease the answer by number of pairs ordered by a, but equal by b.

It is not necessary to use

sum += count_coordered_fenwick_p(a_group, range);

one can use any code which counts ordered pairs in single array, range variable is redundant.

Corrections are made in $\sum n_k \log(n_k) &lt; \sum n_k \log n = O(n \log n)$ time, where $n_{1..k}$ are sizes of groups with equal elements. This version of algorithm contains sorting, splitting sorted array in linear time and counting pairs with some corrections. Resulting complexity: $O(n \cdot \log n)$.

Build

This command builds stress-tests for three algorithms (naive and two with Fenwick tree):

g++ -O3 count_coordered_pairs.cpp -o count

About

Let's count coordered pairs in arrays

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published