Review of Princeton Algorithms I course on Coursera.

Algorithms II Review

My solution to the homework

Course Overview

topic data structures and algorithms
data types stack, queue, bag, union-find, priority queue
sorting quicksort, mergesort, heapsort
searching BST, red-black BST, hash table
graphs BFS, DFS, Prim, Kruskal, Dijkstra
strings radix sorts, tries, KMP, regexps, data compression
advanced B-tree, suffix array, maxflow

# Union-Find

Given a set of N objects, we want to implement the following two function.

• Union: connect two objects.
• Find: is there a path connecting the two objects?

initialize union connected
Quick-find $$N$$ $$N$$ $$1$$
Quick-union $$N$$ $$N$$ $$N$$
Weighted QU $$N$$ $$\lg N$$ $$\lg N$$

## Quick-Find

1. Initialization: integer array id[] of length N
2. Union(p, q): change all entries whose id equal id[p] to id[q]
3. Find: check if p and q have the same id.
• Union too expensive ($$N$$ array accesses).
• Trees are flat, but too expensive to keep them flat.

## Quick-Union

1. Initialization: integer array id[] of length N. id[i] is parent of i
2. Union(p, q): change the root id of p to the root id of q
3. Find: check if p and q have the same root.
• Find too expensive (could be $$N$$ array accesses).
• Trees can get tall.

## Weighted Quick-Union

1. Initialization: addition array sz[i] to count number of objects in the tree rooted at i
2. Union(p, q): change the root of smaller tree to root of larger tree and update sz[]
3. Find: same as quick-union
4. Path compression: we can add extra line of code to make the tree more flat when computing the root of a node
• Find takes time proportional to depth of $$p$$ and $$q$$.
• Union takes constant time, given roots.
• Depth of any node is at most $$\lg N$$.

## Applications

• Pixels in a digital photo.
• Computers in a network.
• Friends in a social network.
• Transistors in a computer chip.
• Elements in a mathematical set.
• Variable names in Fortran program.
• Metallic sites in a composite system.

## HW1 Percolation

Specification

• $$N$$-by-$$N$$ grid of sites.
• Each site is open with probability $$p$$.
• System percolates iff top and bottom are connected by open sites.

When $$N$$ is large, theory guarantees a sharp threshold $$p^*$$. What is the value of $$p^*$$?

Use Monte Carlo simulation:

• Initialize $$N$$-by-$$N$$ whole grid to be blocked.
• Declare random sites open until top connected bottom.
• Vacancy percentage estimates $$p^*$$.

How to check whether an $$N$$-by-$$N$$ system percolates:

• Create an object for each site and name them $$0$$ to $$N^2 - 1$$.
• Sites are in same component if connected by open sites.
• Percolates iff any site on bottom row is connected to site on top row (brute-force algorithm: $$N^2$$ calls to connected()).
• Clever trick: introduce $$2$$ virtual sites connected to top and bottom separately and check if virtual top site is connected to virtual bottom site. (only $$1$$ call to connected())

Virtual top/bottom backwash problem:

• When the system percolates (virtual top is connected to virtual bottom), every node that is connected to the bottom is also considered to be connected to the top (even though it is not). This is the backwash problem.
• A simple way to solve this problem is to use two WeightedQuickUnionUF object, one with the virtual bottom and the other not. Use the first to decide whether the system percolates and the second to see whether a node is full (connected to the top).
• A more memory-efficient way to solve this is to manually maintain two boolean arrays, contop[] and conbot[] to keep track whether the root of a node is connected to the top/bottom. When a node is connected to the top and the bottom at the same time, the system percolates.

Solution Percolation.java

# Bags, Queues, and Stacks

• Operations: insert, remove, iterate.
• Stack: LIFO (last in first out).
• Queue: FIFO (first in first out).
• Bag: Adding items to a collection and iterating (order doesn't matter)

## Stack

### Resizing-Array vs. Linked-List

Linked-list implementation:

• Every operation takes constant time in the worst case.
• Use extra time and space to deal with the links.

Resizing-array implementation:

• Every operation takes constant amortized time.
• Less wasted space.

## Generics

• Avoid casting in client.
• Discover type mismatch errors at compile-time instead of run-time.
• Client code can use generic stack for any type of data.

Take Stack as an example

## Iterators

• Support iteration over stack items by client, without revealing the internal representation of the stack.
• Implement the java.lang.Iterable interface supported by Java.

### Java Support

Java supports elegant client code if Iterable interface is implemented.

## Applications

• Parsing in a compiler.
• Java virtual machine.
• Undo in a word processor.
• Back button in a Web browser.
• PostScript language for printers.
• Implementing function calls in a compiler

### Two-Stack Algorithm

To evaluate infix expressions. $\displaylines{(\ 1\ +\ (\ (\ 2\ +\ 3\ )\ *\ (\ 4\ *\ 5\ )\ )\ ) }$

• Value: push onto the value stack.
• Operator: push onto the operator stack.
• Left parenthesis: ignore.
• Right parenthesis: pop operator and two values; push the result of applying that operator to those values onto the operand stack.
• Dijkstra's two-stack algorithm computes the same value if the operator occurs after the two values.

$\displaylines{(\ 1\ (\ (\ 2\ 3\ +\ )\ (\ 4\ 5\ *\ )\ *\ )\ +\ ) }$

• All of the parentheses are redundant.

$\displaylines{1\ 2\ 3\ +\ 4\ 5\ *\ *\ + }$

## HW2 Deques and Randomized Queues

Specification

• Dequeue: A double-ended queue or deque is a generalization of a stack and a queue that support adding and removing items from either the front or the back of the data structure. Deque.java
• Randomized queue: A randomized queue is similar to a stack or queue, except that the item removed is chosen uniformly at random from items in the data structure. RandomizedQueue.java
• Client: Write a client program that takes an integer $$k$$ as a command-line argument; reads in a sequence of strings from standard input using StdIn.readString(); and prints exactly $$k$$ of them, uniformly at random. Permutation.java

Use reservoir sampling to randomly choose $$k$$ samples from a list of $$n$$ elements in a single pass over the items:

• Keep the first $$k$$ items
• When the $$i$$-th item arrives
• with probability $$k/i$$, discard an old item at random and keep the new one
• with probability $$1 - k/i$$, discard the new item

# Elementary Sorts

inplace? stable? worst average best remarks
selection $$N^2/2$$ $$N^2/2$$ $$N^2/2$$ $$N$$ exchanges
insertion $$N^2/2$$ $$N^2/4$$ $$N$$ use for small $$N$$ or partially ordered
shell $$N$$ tight code, subquadratic
merge $$N\lg N$$ $$N\lg N$$ $$N\lg N$$ $$N\lg N$$ guarantee, stable
quick $$N^2/2$$ $$2N\ln N$$ $$N\lg N$$ $$N\lg N$$ probabilistic guarantee fastest in practice
3-way quick $$N^2/2$$ $$2N\ln N$$ $$N$$ improves quicksort in presence of duplicate keys
heap $$2N\lg N$$ $$2N\lg N$$ $$N\lg N$$ $$N\lg N$$ guarantee, in-place

## Comparable

Interface:

• Built-in comparable types: Integer, Double, String, Date, File, ...
• User-defined comparable types. Implement the Comparable interface supported by Java.

Two useful sorting helper functions:

• Less: is item $$v$$ less than $$w$$?
• Exchange: swap item in array at index $$i$$ with the one at index $$j$$.

## Selection Sort

• Scan from left to right $$[i, N]$$
• Find the minimum entry and exchange with $$i$$'s entry
• Continue next scan from $$[i+1, N]$$

## Insertion Sort

• Scan $$i$$ from left to right $$[0, N]$$
• Scan $$j$$ from right to left $$[i, 0]$$, exchange a[j] with each larger entry to its left
• Start next scan $$[1, N]$$

## Shellsort

• Insertion sort with stride length $$h$$.
• A $$g$$-sorted array remains $$g$$-sorted after $$h$$-sorting it.
• Use a sequence of increment steps to shellsort the array.

Which increment sequence to use?

• Power of two: $$1, 2, 4, 8, 16, 32, \dots$$ No
• Power of two minus one: $$1, 3, 7, 15, 31, 63, \dots$$ Maybe
• $$3x + 1$$: $$1, 4, 13, 40, 121, 364, \dots$$ OK. Easy to compute
• Sedgewick: $$1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905, \dots$$ Good. Tough to beat in empirical studies

## Shuffle

Rearrange array so that result is a uniform random permutation.

Shuffle sort

• Generate a random real number for each array entry.
• Sort the array.

Knuth shuffle

• In iteration $$i$$, pick integer $$r$$ between $$0$$ and $$i$$ uniformly at random.
• Swap $$a[i]$$ and $$a[r]$$.

## Convex Hull

The convex hull of a set of $$N$$ points is the smallest perimeter fence enclosing the points.

Convex hull applications

• Robot motion planning: Find shortest path in the plane from $$s$$ to $$t$$ to avoids a polygonal obstacle. Shortest path is either a straight line from $$s$$ to $$t$$ or it is one of two polygonal chains of convex hull.

• Farthest pair problem: Given $$N$$ points in the plane, find a pair of points with the largest Euclidean distance between them. Farthest pair of points are extreme points on convex hull.

Graham scan:

• Choose point $$p$$ with smallest $$y$$-coordinate.
• Sort points by polar angle with $$p$$.
• Consider points in order; discard unless it create a counterclockwise turn.

# Mergesort

• Divide array into two halves.
• Recursively sort each half.
• Merge two halves.

Java assert statement can be enable or disable at runtime

## Practical Improvements

• Use insertion sort for small subarrays since mergesort has too much overhead for tiny subarrays.
• Stop if the biggest item in first half $$\le$$ smallest item in second half.
• Eliminate the copy to the auxiliary array by switching the role of the input and auxiliary array in each recursive call.

## Bottom-Up Mergesort

• Pass through array, merging subarrays of size $$1$$.
• Repeat for subarrays of size $$2, 4, 8, 16, \dots$$

## Comparator

Interface:

• Sorting using natural order by implementing the Comparable interface.
• Sorting using an alternate order by implementing the Comparator interface.
• Implement the compare() method.

To use with Java system sort:

• Create Comparator object.
• Pass as second argument to Arrays.sort()

To support comparators in our sort implementations:

• Use Object instead of Comparable.
• Pass Comparator to sort() and less() and use it in less().

## Stability

A stable sort preserves the relative order of items with equal keys. The following is a counter example.

• Stable: Insertion sort, Mergesort
• Unstable: Selection sort, Shellsort

## HW3 Collinear Points

Specification

Given a set of $$n$$ distinct points in the plane, find every (maximal) line segment that connects a subset of $$4$$ or more of the points.

Point data type: Create an immutable data type Point that represents a point in the plane. Point.java

Brute force: Examines $$4$$ points at a time and checks whether they all lie on the same line segment (check the three slope between them are equal), returning all such line segments. BruteCollinearPoints.java

Fast, sorting-based solution: For each point $$p$$, calculate the slope it makes with every other point and sort. The 3 other collinear points must have equal slopes w.r.t. $$p$$. FastCollinearPoints.java

# Quicksort

• Shuffle the array
• Partition so that, for some $$j$$
• entry $$a[j]$$ is in place
• no larger entry to the left of $$j$$
• no smaller entry to the right of $$j$$
• Sort each piece recursively.

Quicksort is an in-place sorting algorithm but is not stable.

## Practical Improvements

• Use insertion sort for small subarrays.
• Choose the median item as the pivot

## Quick Select

Use partition to find a $$k^{th}$$ smallest item in a given array.

Quick select takes linear time on average with the random shuffle provides a probabilistic guarantee.

## 3-Way Partitioning

When all keys equal, the quicksort needs ~$$1/2 N^2$$ compares. We need to modify quicksort to deal with duplicate elements.

Partition array into $$3$$ parts so that:

• Entries between $$lt$$ and $$gt$$ equal to partition item $$v$$.
• No larger entries to left of $$lt$$.
• No smaller entries to right of $$gt$$.

# Priority Queues

Remove the largest (or smallest) item in deletion.

implementation insert del max max
unordered array $$1$$ $$N$$ $$N$$
ordered array $$N$$ $$1$$ $$1$$
binary heap $$\lg N$$ $$\lg N$$ $$1$$

## Binary Heap

• Parent's key no smaller than children's keys.
• If child's key becomes larger than its parent's key, then repeatedly exchange child's key with parent's key.
• If parent's key becomes smaller than one (or both) of its children's, then repeatedly exchange parent's key with larger child's key.
• When insert a key, add node at end, then swim it up.
• When delete the maximum, exchange root with node at end, then sink it down.
• Index starts from $$1$$.

## Heapsort

• Create max-heap with all $$N$$ keys and repeatedly remove the maximum key

## HW4 8 Puzzle

Specification

A $$3$$-by-$$3$$ grid with $$8$$ square blocks labeled $$1$$ through $$8$$ and a blank square. The goal is to rearrange the blocks so that they are in order, using as few moves as possible. You are permitted to slide blocks horizontally or vertically into the blank square. Here is an example: $\displaylines{\begin{matrix} & 1 & 3 \\ 4 & 2 & 5 \\ 7 & 8 & 6 \end{matrix} \quad \Rightarrow \quad \begin{matrix} 1 & & 3 \\ 4 & 2 & 5 \\ 7 & 8 & 6 \end{matrix} \quad \Rightarrow \quad\begin{matrix} 1 & 2 & 3 \\ 4 & & 5 \\ 7 & 8 & 6 \end{matrix} \quad \Rightarrow \quad\begin{matrix} 1 & 2 & 3 \\ 4 & 5 & \\ 7 & 8 & 6 \end{matrix} \quad \Rightarrow \quad\begin{matrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & \end{matrix} }$ We use the A* search algorithm to solve this problem. We require:

• A search node of the game to be a board.
• The number of moves made to reach the board.
• The predecessor search node.

The search process:

• Insert the initial search node (the initial board, $$0$$ moves, and a null predecessor search node) into a priority queue.
• Delete min priority search node from the priority queue, and insert all neighboring search nodes back.
• Repeat until the search node dequeued corresponds to a goal board.

Choices of priority function:

• Hamming priority function: the number of blocks in the wrong position, plus the number of moves made so far to get to the search node.
• Manhattan priority function: the sum of the Manhattan distances (sum of the vertical and horizontal distance) from the blocks to their goal position, plus the number of moves so far to get to the search node.

Optimization:

• To avoid search node corresponding to the same board enqueued on the priority queue many times, don't enqueue a neighbor if its board is the same as the board of the predecessor search node.
• Pre-compute the Manhattan priority when constructing the search node and save it in an instance variable.

Board: Creat an immutable data type Board that models an $$n$$-by-$$n$$ board with sliding tiles. Board.java

Solver: Solver.java

# Symbol Tables

Key-value pair abstraction:

• Insert a value with specified key.
• Search for the corresponding value with a given key.

# Binary Search Trees

• A BST is a binary tree in symmetric order.
• A binary tree is either empty or two disjoint binary trees (left and right).
• Each node has a key and every node's key is
• larger than all keys in its left subtree.
• smaller than all keys in its right subtree.

## Basic Operations

• Search: If less, go left; if greater, go right; if equal, search hit.
• Insert: If less, go left; if greater, go right; if null, insert.

### Node

A Node is comprised of four fields:

• A key and value
• A reference to the left and right subtree

### Get

Return value corresponding to given key, or null if no such key.

### Put

Search for key, then

• If key in tree, reset value.
• If not in tree, add new node.

## Ordered Operations

### Max, Min, Floor, Ceiling

• Minimum: smallest key in table. Keep search x.left() until it is null, then return x.
• Maximum: Largest key in table. Keep search x.right() until it is null, then return x.
• Floor: Largest key $$\le$$ a given key.
• Ceiling: Smallest key $$\ge$$ a given key.

### Subtree Counts

In each node, we store the number of nodes in the subtree rooted at that node.

### Rank

The number of keys $$<$$ k.

### Inorder Traversal

• Traverse left subtree.
• Enqueue key.
• Traverse right subtree.

## Deletion

### Deleting The Minimum

• Go left until finding a node with a null left link.
• Replace that node by its right link.
• Update subtree count.

### Hibbard Deletion

Search for node $$t$$ containing key $$k$$.

• If no children: Delete $$t$$ by setting parent link to null
• If one child: Delete $$t$$ by replacing parent link
• If two children:
• Find the minimum in $$t$$'s right subtree $$x$$.
• Delete $$x$$ in $$t$$'s right subtree.
• Put $$x$$ in $$t$$'s spot.

Update subtree counts.

# Balanced Search Trees

## 2-3 Tree

Allow 1 or 2 keys per node

• 2-node: one key, two children.
• 3-node: two key, three children.

Symmetric order. Inorder traversal yields keys in ascending order.

Perfect balance. Every path from root to null link has same length.

Search is similar to binary search trees. Recursively compare search key against keys in node. The only difference is for 3-node, when a search key needs to be compared with these two keys.

### Insertion

Insertion into a 2-node at bottom will change it to a 3-node.

Insertion into a 3-node at bottom:

• Add new key to 3-node to create temporary 4-node.
• Move middle key in 4-node into parent and split the other two keys.
• Repeat up the tree, as necessary.
• If you reach the root and it's a 4-node, split it into three 2-nodes.

This insertion maintains symmetric order and perfect balance.

### Implementation

Direct implementation is complicated, because:

• Maintaining multiple node types is cumbersome.
• Need multiple compares to move down tree.
• Need to move back up the tree to split 4-nodes.
• Large number of cases for splitting.

Instead, we choose a better way to do it, the red-black BST.

## Red-Black BST

Represent 2-3 tree as a BST by splitting 3-node to two 2-node with a red link connecting them.

• No node has two red links connected to it.
• Every path from root to null link has the same number of black links.
• Red links lean left.

### Representation

Each node is pointed to by precisely one link from its parent and can encode color of the link.

### Elementary Operations

Left rotation. Orient a right-leaning red link to lean left.

Right rotation. Orient a left-leaning red link to (temporarily) lean right (To solve two red links in a row).

Color flip. Recolor to split a (temporary) 4-node.

### Search

Search is the same as for elementary BST (ignore color) but runs faster because of better balance.

### Insertion

• Right child red, left child black: rotate left.
• Left child, left-left grandchild red: rotate right.
• Both children red: flip colors.

## B-Tree

B-tree and its variants are widely used for file systems and databases to reduce the time for probing.

Generalize 2-3 trees by allowing up to $$M-1$$ key-link pairs per node. Choose $$M$$ as large as possible so that $$M$$ links fit in a page.

• At least 2 key-link pairs at root.
• At least $$M/2$$ key-link pairs in other nodes.
• External nodes contain client keys.
• Internal nodes contain copies of keys to guide search (each key is a copy of min key in subtree).

A search or an insertion in a B-tree of order $$M$$ with $$N$$ keys requires between $$\log_{M-1} N$$ and $$\log_{M/2} N$$ probes. In practice, the number of probes is at most 4. e.g. $$M=1024; N=62 \text{ billion}$$, $$\log_{M/2} N \le 4$$.

# Geometric Applications of BSTs

Extension of ordered symbol table. Besides insertion, search and deletion, we require Range count and Range search.

Range count: number of keys between $$k_1$$ and $$k_2$$.

Range search: find all keys between $$k_1$$ and $$k_2$$.

• Recursively find all keys in left subtree (if any could fall in range).
• Check key in current node.
• Recursively find all keys in right subtree (if any could fall in range).

## Line Segment Intersection

Given $$N$$ horizontal and vertical line segments, find all intersections.

Sweep-line algorithm: reduces 2d line intersection to 1d range search.

• x-coordinates define events.
• h-segment (left endpoint): insert y-coordinate into BST.
• h-segment (right endpoint): remove y-coordinate from BST.
• v-segment: range search for interval of y-endpoints.

## Kd Trees

Start from 2d trees. It is the extension of ordered symbol-table to 2d keys.

### Space-Partitioning Trees

• Grid: Divide space uniformly into squares.
• 2d tree: Recursively divide space into two halfplanes.
• Quadtree: Recursively divide space into four quadrants.
• BSP tree: Recursively divide space into two regions.

### Grid Implementation

• Divide space into $$M\times M$$ grid of squares.
• Create list of points contained in each square.
• Use 2d array to directly index relevant square.
• Insert: add $$(x,y)$$ to list for corresponding square.
• Range search: examine only squares that intersect 2d range query.

Grid size is the key to performance. If too small, it will waste space; if too large, there will be too many points per square. A reasonable choice is $$\sqrt{N}$$-by-$$\sqrt{N}$$. Grid implementation is fast and simple solution for evenly-distributed points but not for clustering.

### 2d Tree

BST but alternate using x and y coordinates as key.

Range search: Find all points in a query axis-aligned rectangle.

• Check if point in node lies in given rectangle.
• Recursively search trees (if any could fall in rectangle).
• If query rectangle does not intersect the rectangle corresponding to a subtree, prune this subtree.

Nearest neighbor search: Find closest point to query point.

• Recursively search trees.
• If closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a subtree, prune this subtree.

### Applications

• Ray tracing
• 2d range search
• Flight simulators
• N-body simulation
• Collision detection
• Astronomical databases
• Nearest neighbor search
• Adaptive mesh generation
• Accelerate rendering in Doom
• Hidden surface removal and shadow casting

## Interval Search Trees

1d interval search:

• Insert an interval $$(lo, hi)$$.
• Search for an interval.
• Delete an interval.
• Given an interval, find all intervals in data structure that intersects it.

Non-degeneracy assumption: No two intervals have the same left endpoint.

Create BST, where each node stores an interval. Use left endpoint as BST key and store max endpoint in subtree rooted at node.

To insert an interval $$(lo, hi)$$, first insert into BST using $$lo$$ as the key, then update max in each node on search path.

To find intersections for query interval $$(lo, hi)$$:

• If interval in node intersects query interval, return it.
• Else if left subtree is null, go right.
• Else if max endpoint in left subtree is less than $$lo$$, go right.
• Else go left

## Rectangle Intersection

Find all intersections among a set of $$N$$ orthogonal rectangles.

Non-degeneracy assumption: All x and y coordinates are distinct.

Sweep-line algorithm: reduces 2d orthogonal rectangle intersection search to 1d interval search.

• x-coordinates of left and right end points define events.
• Maintain set of rectangles that intersect the sweep line in an interval search tree (using y-intervals of rectangle).
• Left endpoint: interval search for y-interval of rectangle; insert y-interval.
• Right endpoint: remove y-interval

## HW5 Kd-Trees

specification

Write a data type to represent a set of points in the unit square using a 2d-tree to support efficient range search and nearest-neighbor search.

Two immutable data types are provided, the Point2D represents points in the plane and RectHV represents axis-aligned rectangles.

Brute-force implementation: use red-black BST. PointSET.java

2d-tree implementation: KdTree.java

# Hash Table

Save items in a key-indexed table (index is a function of the key)

• Computing the hash function
• Equality test: Method for checking whether two keys are equal
• Collision resolution: Algorithm and data structure to handle two keys that hash to the same array index

## Hash Function

Idealist goal:

• Efficiently computable
• Each table index equally likely for each key

Java library implementations for primitive types: Integer, Boolean, Double, String

• Hash code: an int between $$-2^{31}$$ and $$2^{31} - 1$$
• Hash function output: an int between $$0$$ and $$M - 1$$

## Separate Chaining

Use an array of $$M < N$$ linked lists to deal with collisions:

• Hash: map key to integer $$i$$ between $$0$$ and $$M-1$$
• Insert: put at front of $$i$$-th chain (if not already there)
• Search: need to search only $$i$$-th chain

Typical choice: $$M \sim N / 5$$

## Linear Probing

When a new key collides, find next empty slog, and put it there ($$M > N$$)

• Hash: map key to integer $$i$$ between $$0$$ and $$M-1$$
• Insert: put at table index $$i$$ if free, if not try $$i+1$$, $$i+2$$, etc
• Search: search table index $$i$$, if occupied but no match, try $$i+1$$, $$i+2$$, etc

Typical choice: $$M \sim 2N$$, mean displacement will be $$\sim 3/2$$

Buy me a coffee
WeChat Pay
Alipay
Venmo