Problem
Find the $k$th smallest element in an unsorted array $A$ of size $n$.
Sorting has $\mathcal{O}(n\log n)$ time complexity and is overkill for this specific problem (solves for all $k$).
QuickSelect
Algorithm
$\texttt{QuickSelect}(A, k)$
- Arbitrarily pick a pivot element $p$ from $A$
- Split $A$ into $L = \lbrace a | \thickspace a \in A \text{ and } a < p \rbrace$ and $G = \lbrace a | \thickspace a \in A \text{ and } a > p \rbrace$
- Recurse
- If $|L| = k$ then return $p$
- If $|L| > k - 1$ then $\texttt{QuickSelect}(L, k)$
- If $|L| < k - 1$ then $\texttt{QuickSelect}(G, k - (|L| + 1))$
The $A_k\in G$ case has to readjust the recursive $k$ value since we are essentially throwing away the first $|L| + 1$ elements ($L$ and $p$).
Example
Suppose we always pick the first element $A_0$ to be $p$ (this is arbitrary for arbitrary $A$).
$A$ can adversarially be monotonically ordered and the time complexity becomes $\mathcal{O}(n^2)$
Analysis
- Theorem
- The expected number of comparisons is upper bounded by $4n$
- Proof
- Let $C(A, n, k)$ be the cost (number of comparisons) to find the $k$th smallest element in an array $A$ of $n$ elements. Let $C(n) = \max_{A,k} C(A,n,k)$.
Convince yourself that $C(A,n,k)$ doesn’t depend on $A$
- It takes $n - 1$ comparisons to split $A$ into $L$ and $G$. The sizes of the pieces take the same distribution as $p$, which is uniformly random. By our definition of $C$, we will always consider the larger of the two pieces in cost calculation.
$$\mathbb{E}[C(n)] \leq (n-1) + \text{avg}[C(\frac{n}{2}),…,C(n-1)]$$
Inductively, assume $\mathbb{E}[C(i)] \leq 4i$ for $i < n$ ($0$ comparisons when $n=1$ so the base case holds).
- $$\begin{align*}\mathbb{E}[C(n)] &\leq (n-1) + \text{avg}[4(\frac{n}{2}),…,4(n-1)] \newline &\leq (n-1) + 4(\frac{3n}{4}) \newline &\leq 4n\end{align*}$$
Technically we assumed $n$ is even, but the proof is basically the same in the odd case
Thus, QuickSelect has $\mathcal{O}(n)$ time complexity.
DeterministicSelect
Algorithm
Pick a parameter constant $\alpha$. It matters what $\alpha$ is, but for now suppose $\alpha=5$
$\texttt{ApproxMedian}(A, \alpha=5)$
- If $|A| = 1$ then return $A_0$
- Partition $A$ into groups $G_1, G_2, … G_{\frac{n}{\alpha}}$ each of size $\alpha$
- Let $M = \lbrace \text{median}(G_i) \forall i \rbrace$
It doesn’t matter how median is calculated here since $\alpha$ is constant implies each calculation has $\mathcal{O}(1)$ time complexity ($\mathcal{O}(n)$ total)
- return $\texttt{DeterministicSelect}(M, \lceil\frac{|M|}{2}\rceil)$
In other words, find the true median of the group medians
$\texttt{DeterministicSelect}(A, k)$There are some integrality issues (that we will ignore) which complicate the analysis, though the time complexity should be unaffected
- If $|A| \leq \alpha$ then return $k$th smallest value in $A$ by brute force
- Let the pivot $p$ be $\texttt{ApproxMedian}(A)$
- Split $A$ into $L = \lbrace a | \thickspace a \in A \text{ and } a < p \rbrace$ and $G = \lbrace a | \thickspace a \in A \text{ and } a > p \rbrace$
- Recurse
- If $|L| = k$ then return $p$
- If $|L| > k - 1$ then $\texttt{DeterministicSelect}(L, k)$
- If $|L| < k - 1$ then $\texttt{DeterministicSelect}(G, k - (|L| + 1))$
Notice that the only difference between $\texttt{QuickSelect}$ and $\texttt{DeterministicSelect}$ is how the pivot $p$ is chosen
ApproxMedian Example
$\alpha=3$
Analysis
- ApproxMedian Bound Lemma
- Let $p = \texttt{ApproxMedian}(A)$. Let $n = \lceil\frac{n}{2\alpha}\rceil\lceil\frac{\alpha}{2}\rceil$. At least $n$ elements in $A$ are greater than $p$ and at least $n$ elements in $A$ are less than $p$.
- Proof
- WLOG look at the $\leq$ case. By definition, $p$ is the true median of the group medians, so $\lceil\frac{n}{2\alpha}\rceil$ groups (including the group $p$ is in) have medians $\leq p$.
For each of those groups, there are $\lceil\frac{\alpha}{2}\rceil$ elements (including that group’s median) $\leq$ that group’s median and therefore $p$.
Visual
$$\texttt{ApproxMedian}(A)=8$$ $$\begin{align}8 &\geq 5\geq 4 \tag*{medians} \newline 8&\geq 2 \tag*{group 3} \newline 5&\geq 1 \tag*{group 2} \newline 4&\geq 3 \tag*{group 1} \newline 8&\geq i \thickspace \forall i\in\lbrace 3, 4, 1, 5, 2, 8 \rbrace \tag*{$\therefore$} \end{align}$$
- Theorem
- If $\alpha$ is chosen intelligently, $\texttt{DeterministicSelect}$ makes $\mathcal{O}(n)$ comparisons
- Proof
- Let $C(A, n, k)$ be the worst-case cost to find the $k$th smallest element in an array $A$ of $n$ elements and let $C(n) = \max_{A,k} C(A,n,k)$.
- Finding the median of a single group has constant cost (since its size is bounded by $\alpha$). Since there are a linear number of groups, this step has linear time complexity.
- As in $\texttt{QuickSelect}$ it also take linear time to split $A$ into $L$ and $G$.
- The first recursive call to find the true median of medians has time complexity $C(\frac{n}{\alpha})$.
Since there are that many number of medians
- The second recursive call to search from a sub-piece has time complexity $C(n - \lceil\frac{n}{2\alpha}\rceil\lceil\frac{\alpha}{2}\rceil)$ by the ApproxMedian Bound Lemma.
Since in worst case we recurse on the larger sub-piece
$$C(n) \leq \gamma n + C(\frac{n}{\alpha}) + C(n - \lceil\frac{n}{2\alpha}\rceil\lceil\frac{\alpha}{2}\rceil)$$
$\gamma$ is a constant where its subexpression represents linear time complexity from the first two bullet points
- Using $\alpha=5$
- $$\begin{align*}C(n)&\leq \gamma n + C(\frac{n}{5}) + C(n - 3\lceil\frac{n}{10}\rceil)\newline&\leq \gamma n + C(\frac{n}{5}) + C(\frac{7n}{10})\newline&\leq \gamma n[1 + (\frac{1}{5} + \frac{7}{10}) + (\frac{1}{5}(\frac{1}{5} + \frac{7}{10}) + \frac{7}{10}(\frac{1}{5} + \frac{7}{10})) + … ] \newline &\leq \gamma n[1 + (\frac{9}{10}) + (\frac{1}{5}\cdot\frac{9}{10} + \frac{7}{10}\cdot\frac{9}{10}) + …] \newline &\leq \gamma n[1 + (\frac{9}{10}) + (\frac{9}{10})^2 + (\frac{9}{10})^3 + …] \newline &\leq 10\gamma n \newline &\in \mathcal{O}(n)\end{align*}$$
- Corollary
- $\texttt{DeterministicSelect}$ has linear time complexity if $$\begin{align*}\frac{1}{\alpha} + 1 - \frac{1}{2\alpha}\lceil\frac{\alpha}{2}\rceil &< 1\newline\frac{1}{\alpha} - \frac{1}{2\alpha}\lceil\frac{\alpha}{2}\rceil &< 0\end{align*}$$