Review of Princeton Algorithms I course on Coursera.
My solution to the homework
Course Overview
topic | data structures and algorithms |
---|---|
data types | stack, queue, bag, union-find, priority queue |
sorting | quicksort, mergesort, heapsort |
searching | BST, red-black BST, hash table |
graphs | BFS, DFS, Prim, Kruskal, Dijkstra |
strings | radix sorts, tries, KMP, regexps, data compression |
advanced | B-tree, suffix array, maxflow |
Union-Find
Given a set of N objects, we want to implement the following two function.
Union
: connect two objects.Find
: is there a path connecting the two objects?
initialize | union | connected | |
---|---|---|---|
Quick-find | $N$ | $N$ | $1$ |
Quick-union | $N$ | $N$ | $N$ |
Weighted QU | $N$ | $\lg N$ | $\lg N$ |
Quick-Find
Initialization
: integer arrayid[]
of length NUnion(p, q)
: change all entries whose id equalid[p]
toid[q]
Find
: check ifp
andq
have the same id.
|
|
- Union too expensive ($N$ array accesses).
- Trees are flat, but too expensive to keep them flat.
Quick-Union
Initialization
: integer arrayid[]
of length N.id[i]
is parent ofi
Union(p, q)
: change the root id ofp
to the root id ofq
Find
: check ifp
andq
have the same root.
|
|
- Find too expensive (could be $N$ array accesses).
- Trees can get tall.
Weighted Quick-Union
Initialization
: addition arraysz[i]
to count number of objects in the tree rooted ati
Union(p, q)
: change the root of smaller tree to root of larger tree and updatesz[]
Find
: same as quick-union- Path compression: we can add extra line of code to make the tree more flat when computing the root of a node
|
|
- Find takes time proportional to depth of $p$ and $q$.
- Union takes constant time, given roots.
- Depth of any node is at most $\lg N$.
Applications
- Pixels in a digital photo.
- Computers in a network.
- Friends in a social network.
- Transistors in a computer chip.
- Elements in a mathematical set.
- Variable names in Fortran program.
- Metallic sites in a composite system.
HW1 Percolation
- $N$-by-$N$ grid of sites.
- Each site is open with probability $p$.
- System percolates iff top and bottom are connected by open sites.
When $N$ is large, theory guarantees a sharp threshold $p^*$. What is the value of $p^*$?
Use Monte Carlo simulation:
- Initialize $N$-by-$N$ whole grid to be blocked.
- Declare random sites open until top connected bottom.
- Vacancy percentage estimates $p^*$.
How to check whether an $N$-by-$N$ system percolates:
- Create an object for each site and name them $0$ to $N^2 - 1$.
- Sites are in same component if connected by open sites.
- Percolates iff any site on bottom row is connected to site on top row (brute-force algorithm: $N^2$ calls to
connected()
). - Clever trick: introduce $2$ virtual sites connected to top and bottom separately and check if virtual top site is connected to virtual bottom site. (only $1$ call to
connected()
)
Virtual top/bottom backwash problem:
- When the system percolates (virtual top is connected to virtual bottom), every node that is connected to the bottom is also considered to be connected to the top (even though it is not). This is the backwash problem.
- A simple way to solve this problem is to use two
WeightedQuickUnionUF
object, one with the virtual bottom and the other not. Use the first to decide whether the system percolates and the second to see whether a node is full (connected to the top). - A more memory-efficient way to solve this is to manually maintain two boolean arrays,
contop[]
andconbot[]
to keep track whether the root of a node is connected to the top/bottom. When a node is connected to the top and the bottom at the same time, the system percolates.
Solution Percolation.java
Bags, Queues, and Stacks
- Operations: insert, remove, iterate.
- Stack: LIFO (last in first out).
- Queue: FIFO (first in first out).
- Bag: Adding items to a collection and iterating (order doesn’t matter)
Stack
Linked-List Implementation
|
|
Resizing-Array Implementation
|
|
Resizing-Array vs. Linked-List
Linked-list implementation:
- Every operation takes constant time in the worst case.
- Use extra time and space to deal with the links.
Resizing-array implementation:
- Every operation takes constant amortized time.
- Less wasted space.
Queues
Linked-List Implementation
|
|
Resizing-Array Implementation
|
|
Generics
- Avoid casting in client.
- Discover type mismatch errors at compile-time instead of run-time.
- Client code can use generic stack for any type of data.
Take Stack as an example
|
|
Iterators
- Support iteration over stack items by client, without revealing the internal representation of the stack.
- Implement the
java.lang.Iterable
interface supported by Java.
Linked-List Implementation
|
|
Array Implementation
|
|
Java Support
Java supports elegant client code if Iterable
interface is implemented.
|
|
Applications
- Parsing in a compiler.
- Java virtual machine.
- Undo in a word processor.
- Back button in a Web browser.
- PostScript language for printers.
- Implementing function calls in a compiler
Two-Stack Algorithm
To evaluate infix expressions.
$$ (\ 1\ +\ (\ (\ 2\ +\ 3\ )\ *\ (\ 4\ *\ 5\ )\ )\ ) $$- Value: push onto the value stack.
- Operator: push onto the operator stack.
- Left parenthesis: ignore.
- Right parenthesis: pop operator and two values; push the result of applying that operator to those values onto the operand stack.
|
|
- Dijkstra’s two-stack algorithm computes the same value if the operator occurs after the two values.
- All of the parentheses are redundant.
HW2 Deques and Randomized Queues
- Dequeue: A double-ended queue or deque is a generalization of a stack and a queue that support adding and removing items from either the front or the back of the data structure. Deque.java
- Randomized queue: A randomized queue is similar to a stack or queue, except that the item removed is chosen uniformly at random from items in the data structure. RandomizedQueue.java
- Client: Write a client program that takes an integer $k$ as a command-line argument; reads in a sequence of strings from standard input using
StdIn.readString()
; and prints exactly $k$ of them, uniformly at random. Permutation.java
Use reservoir sampling to randomly choose $k$ samples from a list of $n$ elements in a single pass over the items:
- Keep the first $k$ items
- When the $i$-th item arrives
- with probability $k/i$, discard an old item at random and keep the new one
- with probability $1 - k/i$, discard the new item
Elementary Sorts
inplace? | stable? | worst | average | best | remarks | |
---|---|---|---|---|---|---|
selection | ✔ | $N^2/2$ | $N^2/2$ | $N^2/2$ | $N$ exchanges | |
insertion | ✔ | ✔ | $N^2/2$ | $N^2/4$ | $N$ | use for small $N$ or partially ordered |
shell | ✔ | $N$ | tight code, subquadratic | |||
merge | ✔ | $N\lg N$ | $N\lg N$ | $N\lg N$ | $N\lg N$ guarantee, stable | |
quick | ✔ | $N^2/2$ | $2N\ln N$ | $N\lg N$ | $N\lg N$ probabilistic guarantee fastest in practice | |
3-way quick | ✔ | $N^2/2$ | $2N\ln N$ | $N$ | improves quicksort in presence of duplicate keys | |
heap | ✔ | $2N\lg N$ | $2N\lg N$ | $N\lg N$ | $N\lg N$ guarantee, in-place |
Comparable
Interface:
- Built-in comparable types: Integer, Double, String, Date, File, …
- User-defined comparable types. Implement the
Comparable
interface supported by Java.
|
|
Two useful sorting helper functions:
- Less: is item $v$ less than $w$?
- Exchange: swap item in array at index $i$ with the one at index $j$.
|
|
Selection Sort
- Scan from left to right $[i, N]$
- Find the minimum entry and exchange with $i$’s entry
- Continue next scan from $[i+1, N]$
|
|
Insertion Sort
- Scan $i$ from left to right $[0, N]$
- Scan $j$ from right to left $[i, 0]$, exchange
a[j]
with each larger entry to its left - Start next scan $[1, N]$
|
|
Shellsort
- Insertion sort with stride length $h$.
- A $g$-sorted array remains $g$-sorted after $h$-sorting it.
- Use a sequence of increment steps to shellsort the array.
Which increment sequence to use?
- Power of two: $1, 2, 4, 8, 16, 32, \dots$ No
- Power of two minus one: $1, 3, 7, 15, 31, 63, \dots$ Maybe
- $3x + 1$: $1, 4, 13, 40, 121, 364, \dots$ OK. Easy to compute
- Sedgewick: $1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905, \dots$ Good. Tough to beat in empirical studies
|
|
Shuffle
Rearrange array so that result is a uniform random permutation.
Shuffle sort
- Generate a random real number for each array entry.
- Sort the array.
Knuth shuffle
- In iteration $i$, pick integer $r$ between $0$ and $i$ uniformly at random.
- Swap $a[i]$ and $a[r]$.
|
|
Convex Hull
The convex hull of a set of $N$ points is the smallest perimeter fence enclosing the points.
Convex hull applications
- Robot motion planning: Find shortest path in the plane from $s$ to $t$ to avoids a polygonal obstacle. Shortest path is either a straight line from $s$ to $t$ or it is one of two polygonal chains of convex hull.
- Farthest pair problem: Given $N$ points in the plane, find a pair of points with the largest Euclidean distance between them. Farthest pair of points are extreme points on convex hull.
Graham scan:
- Choose point $p$ with smallest $y$-coordinate.
- Sort points by polar angle with $p$.
- Consider points in order; discard unless it create a counterclockwise turn.
Mergesort
- Divide array into two halves.
- Recursively sort each half.
- Merge two halves.
|
|
Java assert statement can be enable or disable at runtime
|
|
Practical Improvements
- Use insertion sort for small subarrays since mergesort has too much overhead for tiny subarrays.
- Stop if the biggest item in first half $\le$ smallest item in second half.
- Eliminate the copy to the auxiliary array by switching the role of the input and auxiliary array in each recursive call.
|
|
Bottom-Up Mergesort
- Pass through array, merging subarrays of size $1$.
- Repeat for subarrays of size $2, 4, 8, 16, \dots$
|
|
Comparator
Interface:
- Sorting using natural order by implementing the
Comparable
interface. - Sorting using an alternate order by implementing the
Comparator
interface. - Implement the
compare()
method.
|
|
To use with Java system sort:
- Create Comparator object.
- Pass as second argument to
Arrays.sort()
|
|
To support comparators in our sort implementations:
- Use
Object
instead ofComparable
. - Pass
Comparator
tosort()
andless()
and use it inless()
.
|
|
Stability
A stable sort preserves the relative order of items with equal keys. The following is a counter example.
- Stable: Insertion sort, Mergesort
- Unstable: Selection sort, Shellsort
HW3 Collinear Points
Given a set of $n$ distinct points in the plane, find every (maximal) line segment that connects a subset of $4$ or more of the points.
Point data type: Create an immutable data type Point
that represents a point in the plane. Point.java
Brute force: Examines $4$ points at a time and checks whether they all lie on the same line segment (check the three slope between them are equal), returning all such line segments. BruteCollinearPoints.java
Fast, sorting-based solution: For each point $p$, calculate the slope it makes with every other point and sort. The 3 other collinear points must have equal slopes w.r.t. $p$. FastCollinearPoints.java
Quicksort
- Shuffle the array
- Partition so that, for some $j$
- entry $a[j]$ is in place
- no larger entry to the left of $j$
- no smaller entry to the right of $j$
- Sort each piece recursively.
|
|
Quicksort is an in-place sorting algorithm but is not stable.
Practical Improvements
- Use insertion sort for small subarrays.
- Choose the median item as the pivot
|
|
Quick Select
Use partition
to find a $k$-th smallest item in a given array.
|
|
Quick select takes linear time on average with the random shuffle provides a probabilistic guarantee.
3-Way Partitioning
When all keys equal, the quicksort needs $\sim 1/2 N^2$ compares. We need to modify quicksort to deal with duplicate elements.
Partition array into $3$ parts so that:
- Entries between $lt$ and $gt$ equal to partition item $v$.
- No larger entries to left of $lt$.
- No smaller entries to right of $gt$.
|
|
Priority Queues
Remove the largest (or smallest) item in deletion.
implementation | insert | del max | max |
---|---|---|---|
unordered array | $1$ | $N$ | $N$ |
ordered array | $N$ | $1$ | $1$ |
binary heap | $\lg N$ | $\lg N$ | $1$ |
Array Implementation
Unordered Array Implementation
|
|
Ordered Array Implementation
|
|
Binary Heap
- Parent’s key no smaller than children’s keys.
- If child’s key becomes larger than its parent’s key, then repeatedly exchange child’s key with parent’s key.
- If parent’s key becomes smaller than one (or both) of its children’s, then repeatedly exchange parent’s key with larger child’s key.
- When insert a key, add node at end, then swim it up.
- When delete the maximum, exchange root with node at end, then sink it down.
- Index starts from $1$.
|
|
Heapsort
- Create max-heap with all $N$ keys and repeatedly remove the maximum key
|
|
HW4 8 Puzzle
A $3$-by-$3$ grid with $8$ square blocks labeled $1$ through $8$ and a blank square. The goal is to rearrange the blocks so that they are in order, using as few moves as possible. You are permitted to slide blocks horizontally or vertically into the blank square. Here is an example:
$$ \begin{matrix} & 1 & 3 \\ 4 & 2 & 5 \\ 7 & 8 & 6 \end{matrix} \quad\to\quad \begin{matrix} 1 & & 3 \\ 4 & 2 & 5 \\ 7 & 8 & 6 \end{matrix} \quad\to\quad \begin{matrix} 1 & 2 & 3 \\ 4 & & 5 \\ 7 & 8 & 6 \end{matrix} \quad\to\quad \begin{matrix} 1 & 2 & 3 \\ 4 & 5 & \\ 7 & 8 & 6 \end{matrix} \quad\to\quad \begin{matrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & \end{matrix} $$We use the A search algorithm* to solve this problem. We require:
- A search node of the game to be a board.
- The number of moves made to reach the board.
- The predecessor search node.
The search process:
- Insert the initial search node (the initial board, $0$ moves, and a null predecessor search node) into a priority queue.
- Delete min priority search node from the priority queue, and insert all neighboring search nodes back.
- Repeat until the search node dequeued corresponds to a goal board.
Choices of priority function:
- Hamming priority function: the number of blocks in the wrong position, plus the number of moves made so far to get to the search node.
- Manhattan priority function: the sum of the Manhattan distances (sum of the vertical and horizontal distance) from the blocks to their goal position, plus the number of moves so far to get to the search node.
Optimization:
- To avoid search node corresponding to the same board enqueued on the priority queue many times, don’t enqueue a neighbor if its board is the same as the board of the predecessor search node.
- Pre-compute the Manhattan priority when constructing the search node and save it in an instance variable.
Board: Creat an immutable data type Board
that models an $n$-by-$n$ board with sliding tiles. Board.java
Solver: Solver.java
Symbol Tables
Key-value pair abstraction:
- Insert a value with specified key.
- Search for the corresponding value with a given key.
API
|
|
Implementing Equals for User-Defined Types
|
|
Binary Search Trees
- A BST is a binary tree in symmetric order.
- A binary tree is either empty or two disjoint binary trees (left and right).
- Each node has a key and every node’s key is
- larger than all keys in its left subtree.
- smaller than all keys in its right subtree.
Basic Operations
- Search: If less, go left; if greater, go right; if equal, search hit.
- Insert: If less, go left; if greater, go right; if null, insert.
Node
A Node is comprised of four fields:
- A key and value
- A reference to the left and right subtree
|
|
Get
Return value corresponding to given key, or null
if no such key.
|
|
Put
Search for key, then
- If key in tree, reset value.
- If not in tree, add new node.
|
|
Ordered Operations
Max, Min, Floor, Ceiling
- Minimum: smallest key in table. Keep search
x.left()
until it is null, then returnx
. - Maximum: Largest key in table. Keep search
x.right()
until it is null, then returnx
. - Floor: Largest key $\le$ a given key.
- Ceiling: Smallest key $\ge$ a given key.
|
|
Subtree Counts
In each node, we store the number of nodes in the subtree rooted at that node.
|
|
Rank
The number of keys $<$ k.
|
|
Inorder Traversal
- Traverse left subtree.
- Enqueue key.
- Traverse right subtree.
|
|
Deletion
Deleting The Minimum
- Go left until finding a node with a null left link.
- Replace that node by its right link.
- Update subtree count.
|
|
Hibbard Deletion
Search for node $t$ containing key $k$.
- If no children: Delete $t$ by setting parent link to null
- If one child: Delete $t$ by replacing parent link
- If two children:
- Find the minimum in $t$’s right subtree $x$.
- Delete $x$ in $t$’s right subtree.
- Put $x$ in $t$’s spot.
Update subtree counts.
|
|
Balanced Search Trees
2-3 Tree
Allow 1 or 2 keys per node
- 2-node: one key, two children.
- 3-node: two key, three children.
Symmetric order. Inorder traversal yields keys in ascending order.
Perfect balance. Every path from root to null link has same length.
Search
Search is similar to binary search trees. Recursively compare search key against keys in node. The only difference is for 3-node, when a search key needs to be compared with these two keys.
Insertion
Insertion into a 2-node at bottom will change it to a 3-node.
Insertion into a 3-node at bottom:
- Add new key to 3-node to create temporary 4-node.
- Move middle key in 4-node into parent and split the other two keys.
- Repeat up the tree, as necessary.
- If you reach the root and it’s a 4-node, split it into three 2-nodes.
This insertion maintains symmetric order and perfect balance.
Implementation
Direct implementation is complicated, because:
- Maintaining multiple node types is cumbersome.
- Need multiple compares to move down tree.
- Need to move back up the tree to split 4-nodes.
- Large number of cases for splitting.
Instead, we choose a better way to do it, the red-black BST.
Red-Black BST
Represent 2-3 tree as a BST by splitting 3-node to two 2-node with a red link connecting them.
- No node has two red links connected to it.
- Every path from root to null link has the same number of black links.
- Red links lean left.
Representation
Each node is pointed to by precisely one link from its parent and can encode color of the link.
|
|
Elementary Operations
Left rotation. Orient a right-leaning red link to lean left.
|
|
Right rotation. Orient a left-leaning red link to (temporarily) lean right (To solve two red links in a row).
|
|
Color flip. Recolor to split a (temporary) 4-node.
|
|
Search
Search is the same as for elementary BST (ignore color) but runs faster because of better balance.
Insertion
- Right child red, left child black: rotate left.
- Left child, left-left grandchild red: rotate right.
- Both children red: flip colors.
|
|
B-Tree
B-tree and its variants are widely used for file systems and databases to reduce the time for probing.
Generalize 2-3 trees by allowing up to $M-1$ key-link pairs per node. Choose $M$ as large as possible so that $M$ links fit in a page.
- At least 2 key-link pairs at root.
- At least $M/2$ key-link pairs in other nodes.
- External nodes contain client keys.
- Internal nodes contain copies of keys to guide search (each key is a copy of min key in subtree).
A search or an insertion in a B-tree of order $M$ with $N$ keys requires between $\log_{M-1} N$ and $\log_{M/2} N$ probes. In practice, the number of probes is at most 4. e.g. $M=1024; N=62 \text{ billion}$, $\log_{M/2} N \le 4$.
Geometric Applications of BSTs
1d Range Search
Extension of ordered symbol table. Besides insertion, search and deletion, we require Range count and Range search.
Range count: number of keys between $k_1$ and $k_2$.
|
|
Range search: find all keys between $k_1$ and $k_2$.
- Recursively find all keys in left subtree (if any could fall in range).
- Check key in current node.
- Recursively find all keys in right subtree (if any could fall in range).
Line Segment Intersection
Given $N$ horizontal and vertical line segments, find all intersections.
Sweep-line algorithm: reduces 2d line intersection to 1d range search.
- x-coordinates define events.
- h-segment (left endpoint): insert y-coordinate into BST.
- h-segment (right endpoint): remove y-coordinate from BST.
- v-segment: range search for interval of y-endpoints.
Kd Trees
Start from 2d trees. It is the extension of ordered symbol-table to 2d keys.
Space-Partitioning Trees
- Grid: Divide space uniformly into squares.
- 2d tree: Recursively divide space into two halfplanes.
- Quadtree: Recursively divide space into four quadrants.
- BSP tree: Recursively divide space into two regions.
Grid Implementation
- Divide space into $M\times M$ grid of squares.
- Create list of points contained in each square.
- Use 2d array to directly index relevant square.
- Insert: add $(x,y)$ to list for corresponding square.
- Range search: examine only squares that intersect 2d range query.
Grid size is the key to performance. If too small, it will waste space; if too large, there will be too many points per square. A reasonable choice is $\sqrt{N}$-by-$\sqrt{N}$. Grid implementation is fast and simple solution for evenly-distributed points but not for clustering.
2d Tree
BST but alternate using x and y coordinates as key.
Range search: Find all points in a query axis-aligned rectangle.
- Check if point in node lies in given rectangle.
- Recursively search trees (if any could fall in rectangle).
- If query rectangle does not intersect the rectangle corresponding to a subtree, prune this subtree.
Nearest neighbor search: Find closest point to query point.
- Recursively search trees.
- If closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a subtree, prune this subtree.
Applications
- Ray tracing
- 2d range search
- Flight simulators
- N-body simulation
- Collision detection
- Astronomical databases
- Nearest neighbor search
- Adaptive mesh generation
- Accelerate rendering in Doom
- Hidden surface removal and shadow casting
Interval Search Trees
1d interval search:
- Insert an interval $(lo, hi)$.
- Search for an interval.
- Delete an interval.
- Given an interval, find all intervals in data structure that intersects it.
Non-degeneracy assumption: No two intervals have the same left endpoint.
Create BST, where each node stores an interval. Use left endpoint as BST key and store max endpoint in subtree rooted at node.
To insert an interval $(lo, hi)$, first insert into BST using $lo$ as the key, then update max in each node on search path.
To find intersections for query interval $(lo, hi)$:
- If interval in node intersects query interval, return it.
- Else if left subtree is null, go right.
- Else if max endpoint in left subtree is less than $lo$, go right.
- Else go left
Rectangle Intersection
Find all intersections among a set of $N$ orthogonal rectangles.
Non-degeneracy assumption: All x and y coordinates are distinct.
Sweep-line algorithm: reduces 2d orthogonal rectangle intersection search to 1d interval search.
- x-coordinates of left and right end points define events.
- Maintain set of rectangles that intersect the sweep line in an interval search tree (using y-intervals of rectangle).
- Left endpoint: interval search for y-interval of rectangle; insert y-interval.
- Right endpoint: remove y-interval
HW5 Kd-Trees
Write a data type to represent a set of points in the unit square using a 2d-tree to support efficient range search and nearest-neighbor search.
Two immutable data types are provided, the Point2D
represents points in the plane and RectHV
represents axis-aligned rectangles.
Brute-force implementation: use red-black BST. PointSET.java
2d-tree implementation: KdTree.java
Hash Table
Save items in a key-indexed table (index is a function of the key)
- Computing the hash function
- Equality test: Method for checking whether two keys are equal
- Collision resolution: Algorithm and data structure to handle two keys that hash to the same array index
Hash Function
Idealist goal:
- Efficiently computable
- Each table index equally likely for each key
Java library implementations for primitive types: Integer
, Boolean
, Double
, String
|
|
- Hash code: an
int
between $-2^{31}$ and 2^{31} - 1$ - Hash function output: an
int
between $0$ and $M - 1$
|
|
Separate Chaining
Use an array of $M < N$ linked lists to deal with collisions:
- Hash: map key to integer $i$ between $0$ and $M-1$
- Insert: put at front of $i$-th chain (if not already there)
- Search: need to search only $i$-th chain
|
|
Typical choice: $M \sim N / 5$
Linear Probing
When a new key collides, find next empty slog, and put it there ($M > N$)
- Hash: map key to integer $i$ between $0$ and $M-1$
- Insert: put at table index $i$ if free, if not try $i+1$, $i+2$, etc
- Search: search table index $i$, if occupied but no match, try $i+1$, $i+2$, etc
|
|
Typical choice: $M \sim 2N$, mean displacement will be $\sim 3/2$