Introduction
Welcome. You've found my study guide. With the help of some friends, I've created a guide encapsulating most of the knowledge I've personally found necessary for interviews, along with material I've learned may be necessary from a variety of books on software engineering interviews.
Beyond interview prep, this can also be a good reference for anyone in school trying to study for their data structures, algorithms, or design patterns courses.
General Knowledge
Bitwise Operations
- & = AND
- | = OR
- ~ = NOT
- ^ = XOR
- << = left shift (new bits are 0)
- >> = right shift (logical shifts fill new bits with 0; arithmetic shifts fill with the sign bit; which one you get is language-dependent)
Bit Manipulation Tricks
- Check if an integer is even
return (x & 1) == 0; // parentheses matter: in C, == binds tighter than &
- Check if an integer is a power of 2
return x && !(x & (x - 1));
- Swap two numbers
a = 1; b = 2;
a = a ^ b;
b = a ^ b;
a = a ^ b;
// a = 2, b = 1
- Flip the bits of an integer (language-dependent)
int allOnes = ConvertToInt("111..."); // use as many 1s as there are bits in x
return allOnes - x;
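The tricks above can be sketched in Python (function names are mine, purely illustrative):

```python
def is_even(x):
    # In C-like languages, parentheses are required: (x & 1) == 0.
    return (x & 1) == 0

def is_power_of_two(x):
    # A power of two has exactly one set bit, so x & (x - 1) clears it to 0.
    return x > 0 and (x & (x - 1)) == 0

def xor_swap(a, b):
    a ^= b
    b ^= a  # b = (a ^ b) ^ b, which equals the original a
    a ^= b  # a = (a ^ b) ^ a, which equals the original b
    return a, b

def flip_bits(x, width=8):
    # Flip the low `width` bits: subtracting from all ones flips every bit.
    all_ones = (1 << width) - 1
    return all_ones - x
```

Note that Python integers are arbitrary-width, which is why `flip_bits` takes an explicit width; in a fixed-width language you would use as many ones as the type has bits.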
HTTP Request Methods
- GET = Retrieve data.
- HEAD = GET, but without the response body (just the metadata and header).
- POST = Submit data to the resource at the given URI for processing.
- PUT = Set the resource at the given URI.
- DELETE = Delete the specified resource.
There are others, but these are the most commonly used.
Git Commands
- init - creates a new local repository
- clone - downloads an existing project/repo
- diff - shows unstaged file differences
- commit - records the current state of files to version history
- reset - undoes all commits after the provided one
- branch - creates a new branch
- checkout - switches to a specified branch
- merge / mergetool - combines a target branch into the current branch (mergetool helps resolve conflicts)
- stash - temporarily store current file changes
- fetch - downloads history from a remote repository without merging it
- pull - fetches a remote branch and merges it into the current branch
- push - uploads local branch commits
There are others, but these are the most commonly used.
Memory
Memory can be allocated on the stack or the heap.
Stack allocation happens in a contiguous block of memory, and the compiler pre-determines the required space.
Heap allocation happens during the execution of the program as per the instructions of the programmer.
The key difference is that programmers are responsible for allocating and deallocating heap memory.
Asymptotic Analysis
Big-O Notation
Asymptotic analysis is usually expressed in Big-O notation. This notation is usually used to describe the upper bound on a growth rate.
Formally, it is said that \( f(x) = O(g(x)) \) if there exist a positive real number \( m \) and a real number \( x_0 \) such that \( \left\lvert f(x) \right\rvert \leq m \cdot g(x) \) for all \( x \geq x_0 \). Alternatively, \( f(x) = O(g(x)) \) if \( \limsup_{x \to \infty} \frac{\left\lvert f(x) \right\rvert}{g(x)} \lt \infty \).
For example: \( x^2 \) is \( O\left(x^3\right) \) and \( 4x^2 + 3x + 2 \) is \( O\left(x^2\right) \).
Other Notations
Big-\( \Omega \) notation can be interpreted as the "opposite" of Big-O notation in computer science; \( f(x) = \Omega(g(x)) \) if and only if \( g(x) = O(f(x)) \). Alternatively, \( f(x) = \Omega(g(x)) \) if \( \liminf_{x \to \infty} \frac{f(x)}{g(x)} \gt 0 \).
For example: \( x^3 \) is \( \Omega\left(8x^2\right) \). It is also possible for the second example above to hold: \( 4x^2 + 3x + 2 \) is \( \Omega\left(x^2\right) \).
Big-\( \Theta \) notation is defined simply as: \( f(x) = \Theta(g(x)) \) if and only if \( f(x) = O(g(x)) \) and \( f(x) = \Omega(g(x)) \).
Using the example above: \( 4x^2 + 3x + 2 \) is \( \Theta\left(x^2\right) \).
Algorithm Analysis
In computer science, Big-O notation is commonly used to describe the performance of algorithms, either in terms of run time or memory usage. Usually when run time or memory usage are analyzed, we want to know the worst-case, so we make the assumption that conditional statements always pass. Take the following function for example:
def myFunc(someArray)
{
    count = 0;
    foreach (element in someArray)
    {
        if (element != null)
            count = count + 1;
    }
}
Line-by-line:
- `count = 0;` is an assignment operation with a constant run time, so this is an \( O(1) \) operation.
- `foreach (element in someArray)` loops through each element of the array, so whatever happens inside this loop will happen \( n \) times, where \( n \) is the length of the array.
- `if (element != null)` is a conditional statement (constant time) using an equality comparer (also constant time), thus an \( O(1) \) operation.
- `count = count + 1;` is an assignment and increment (both constant time). Since we're analyzing the worst case, we assume this line is always hit, even though it's only conditionally hit in practice. Either way, it is an \( O(1) \) operation.
In total, we have a run time of \( O(1) + (O(n) \cdot (O(1) + O(1))) \). This can be condensed to \( O(1 + (n \cdot (1 + 1))) = O(1 + n) = O(n) \).
If we were to add another for loop inside the existing one then we would get a runtime of \( O\left(n^2\right) \), and so on.
Amortized Analysis
An alternative to worst-case performance analysis is amortized analysis, which averages an operation's cost over a worst-case sequence of operations. There are 3 primary methods:
- The aggregate method: run the algorithm \( n \) times with different inputs, calculating the operational cost each time; take the upper bound of the sum of all operational costs and divide by \( n \).
- The accounting method
- The potential method
Amortized analysis is usually how certain performance gains are found, especially when the possible input set of an algorithm is limited. For example, quick sort is the most commonly implemented sorting algorithm, but it's actually faster to sort a mostly sorted collection using insertion sort. Similarly, quick sort has a worst case of \( O\left(n^2\right) \), even though in practice it usually outperforms other sorting algorithms with better worst-case analyses.
The Master Theorem
Sometimes it is difficult to analyze algorithms, especially when they implement recursion, such as in Divide & Conquer algorithms. This is where the master theorem comes into play. Divide & Conquer algorithms can be expressed as \[ T(n) = a \cdot T\left(\frac{n}{b}\right) + f(n) \] where \( T(n) \) is an upper bound on the run time of the algorithm, \( n \) is the input size, \( a \) is the number of recursive calls that each individual call makes, \( \frac{n}{b} \) is the input size of the recursive calls, and \( f(n) \) is the cost of the work each call does outside of its recursive calls (dividing the input and combining the results). Additionally, let \( c_{crit} = \frac{\log(\text{number of subproblems})}{\log(\text{relative subproblem size})} = \log_b a \).
Given the above, the master theorem says that
- If there exists a \( c \lt c_{crit} \) such that \( f(n) = O\left(n^c\right) \), then \( T(n) = \Theta\left(n^{\log_b a}\right) \).
- If there exists a \( k \geq 0 \) such that \( f(n) = \Theta\left(n^{c_{crit}} \log^k n\right) \), then \( T(n) = \Theta\left(n^{c_{crit}} \log^{k+1} n \right) \).
- If there exists a \( c \gt c_{crit} \) such that \( f(n) = \Omega\left(n^c\right) \), and there exists a \( k \lt 1 \) and sufficiently large \( n \) such that \( a \cdot f\left(\frac{n}{b}\right) \leq k \cdot f(n) \), then \( T(n) = \Theta(f(n)) \).
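As a worked example, merge sort makes \( a = 2 \) recursive calls on halves of the input (\( b = 2 \)) and does \( \Theta(n) \) work merging, so \( c_{crit} = \log_2 2 = 1 \) and the second case (with \( k = 0 \)) gives \( T(n) = \Theta(n \log n) \). A sketch:

```python
def merge_sort(items):
    # T(n) = 2 * T(n/2) + Theta(n): a = 2, b = 2, f(n) = Theta(n).
    # c_crit = log_2(2) = 1 and f(n) = Theta(n^1), so case 2 (k = 0)
    # of the master theorem gives T(n) = Theta(n log n).
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # The merge step is the Theta(n) work done outside the recursive calls.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```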
Security and Cryptography
Most people in the industry aren't designing cryptography algorithms, but there's some value in knowing the uses of the various popular cryptographic algorithms and schemes, in the same way that there's value in knowing some basic malicious attacks against software systems.
SHA
An acronym for Secure Hash Algorithms, SHA is a family of cryptographic hash functions, which means that they turn data of arbitrary size into a fixed-size bit array. They are impractical to reverse, and the same data will always result in the same hash, with two values never (or infeasibly) resulting in the same hash. Also, hashes should be unrelated to each other (a small variation in one value should result in a drastically different hash). They are used to verify the integrity of messages and files, as well as for generating and verifying signatures, and verifying passwords. They are also seen used as unique identifiers, and commonly used in software distribution so that users can validate they have the correct file after download by applying the same hash function.
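These properties are easy to observe with Python's standard hashlib module, which exposes the SHA-2 family (input strings here are made up):

```python
import hashlib

# The same data always yields the identical fixed-size digest.
digest1 = hashlib.sha256(b"hello world").hexdigest()
digest2 = hashlib.sha256(b"hello world").hexdigest()
assert digest1 == digest2

# A one-character change produces a drastically different digest.
digest3 = hashlib.sha256(b"hello worle").hexdigest()
assert digest1 != digest3
```

This is also how software distributors let users validate downloads: publish the digest, and users recompute it over the file they received.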
RSA
RSA is a public-key (asymmetric) cryptosystem--a system in which participants have a public encryption key and a private decryption key. Because it is a slow algorithm, it is most commonly used to encrypt only small amounts of data, such as the shared keys used in symmetric key cryptography (where the keys used for encrypting plaintext and decrypting ciphertext are the same, or one is derived from the other, and both are known to all parties in the exchange). The premise of such systems is that a party that wants to send a sensitive message to another party can use the latter's public encryption key so that only the latter can decrypt it with their private key. Such systems can also be used for authentication: a sender uses their private key to create a digital signature on a message, and the receiver uses the sender's public key to verify that the signature is indeed valid.
Diffie-Hellman Key Exchange
This is a method of securely exchanging cryptographic keys over a public channel. It is itself non-authenticated, but provides the basis for many authenticated protocols.
An underlying assumption is that reversing the operation \( g^a \mod p \), where a is some integer, p is a prime number, and g is a primitive root modulo p, is not practical (this is the discrete logarithm problem).
The basic algorithm is as follows:
- Alice and Bob publicly agree to use some prime p and some primitive root modulo p, g.
- Alice secretly chooses an integer, a, and sends Bob the result of \( A = g^a \mod p \).
- Bob secretly chooses an integer, b, and sends Alice the result of \( B = g^b \mod p \).
- Alice computes the value of \( B^a \mod p \).
- Bob computes the value of \( A^b \mod p \).
- The values Alice and Bob computed should be the same, and that is their shared secret.
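The steps above can be sketched with toy numbers (real implementations use very large primes; these values are purely illustrative):

```python
# Public parameters: a prime p and a primitive root g modulo p.
p, g = 23, 5

a = 6   # Alice's secret integer
b = 15  # Bob's secret integer

A = pow(g, a, p)  # Alice computes g^a mod p and sends it publicly
B = pow(g, b, p)  # Bob computes g^b mod p and sends it publicly

alice_secret = pow(B, a, p)  # (g^b)^a mod p
bob_secret = pow(A, b, p)    # (g^a)^b mod p

# Both arrive at the same value: their shared secret.
assert alice_secret == bob_secret
```

An eavesdropper sees p, g, A, and B, but recovering a or b from them is the discrete logarithm problem, which is impractical at real key sizes.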
HMAC
An HMAC is a Hash-based Message Authentication Code, a specific type of code involving a cryptographic hash function and a secret cryptographic key. It is used to simultaneously verify the data integrity and authenticity of a message. Such codes can be generated with any cryptographic hash function (like SHA).
The exact definition/calculation of an HMAC is complex.
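Using one is not, though. Here is a sketch with Python's standard hmac module (the key and messages are made up):

```python
import hashlib
import hmac

key = b"shared secret key"
message = b"transfer $100 to alice"

# Sender computes a tag over the message using the shared key.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# Receiver recomputes the tag and compares in constant time.
expected = hmac.new(key, message, hashlib.sha256).hexdigest()
assert hmac.compare_digest(tag, expected)

# Tampering with the message invalidates the tag.
forged = hmac.new(key, b"transfer $999 to mallory", hashlib.sha256).hexdigest()
assert not hmac.compare_digest(tag, forged)
```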
AES
AES is a specification for the encryption of electronic data that describes a symmetric-key algorithm. In practice it is generally superior to RSA for bulk encryption, and the two are often combined: the data itself is encrypted with AES, while the secret AES key is encrypted with a recipient's published RSA public key and transmitted alongside the data. Only an authorized recipient, holding the matching RSA private key, can recover the AES key and decrypt the data.
Hamming Codes
Hamming codes are a family of linear error-correcting codes that can detect up to two-bit errors or correct one-bit errors without detection of uncorrected errors. Parity bits are a similar concept that cannot correct errors and can only detect an odd number of bits in error.
Cyclic Redundancy Check
Another error-detecting code, commonly used in digital networks and storage devices to detect accidental changes to raw data. The name comes from the fact that the check value is a redundancy (it expands the message without adding information) and the algorithm is based on cyclic codes.
Attack & Defense
There aren't many common attacks that developers have to actively defend against on a regular basis. From an application design standpoint you could encourage features such as 2-factor authentication, but at the end of the day it's not up to developers whether or not a feature is added.
One attack developers do have to actively defend against is the injection attack. Defending against an injection attack is as simple as never letting user input reach a database query unchecked. This means sanitizing and parameterizing user input.
Man-in-the-middle attacks also need to be addressed by developers. The defense is to never let sensitive information travel between services in a manner that makes it easy to decrypt. In other words: sensitive information should always be encrypted before crossing services, and encrypted information should not travel with or near any information that could aid in decrypting it.
While you might not personally have to deal with either of these, it's not uncommon to see a basic injection attack question in an interview (e.g. "What's wrong with this code?").
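A minimal sketch of the injection problem and the parameterized fix, using Python's built-in sqlite3 module (the table and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")
conn.execute("INSERT INTO users VALUES ('bob', 'secret')")

user_input = "' OR '1'='1"  # a classic injection payload

# BAD: string concatenation lets the payload rewrite the query.
unsafe = "SELECT * FROM users WHERE name = '" + user_input + "'"
assert len(conn.execute(unsafe).fetchall()) == 2  # matches every row

# GOOD: parameterization treats the payload as an inert literal value.
safe = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,))
assert safe.fetchall() == []
```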
Data Structures
These are fundamental structures in computer science which enable complex operations. This chapter only covers data structures that are significant/well-known in computer science, or that are broadly applicable to most computer science fields. You will not find data structures in this chapter that have very specific, niche use cases, or are only applicable to specific fields of computer science.
Cheat Sheet
For the lazy among you...
Abstract Data Types
- List: ordered, not necessarily unique, not finite
- Set: not necessarily ordered, unique, not finite
- Tuple: ordered, not necessarily unique, finite, a composite type in some (probably most) languages
- Stack: first-in-last-out; `push()`, `pop()`, `peek()`
- Queue: first-in-first-out; `enqueue()`, `dequeue()`; can be prioritized (priority queue)
- Bag: random or full reads only; `grab()`/`pick()`
- Multimap/Associative Array: key-value mappings (e.g. hash table); `getValue(key)`, `insert(key, value)`
Performance Tables
Where a cell lists multiple values, the order is Worst-Case / Amortized.
Arrays & Lists
| Structure | Indexing | Insert/Delete [0] | Insert/Delete [n-1] | Insert/Delete [i] | Wasted Space |
|---|---|---|---|---|---|
| Array | Θ(1) | N/A | N/A | N/A | 0 |
| Dynamic Array | Θ(1) | Θ(n) | Θ(n) / Θ(1) | Θ(n) | Θ(n) |
| Hashed Array Tree* | Θ(1) | Θ(n) | Θ(n) / Θ(1) | Θ(n) | Θ(√n) |
| Linked List | Θ(n) | Θ(1) | Θ(1), Θ(n)** | *** | Θ(n) |
| Balanced Tree**** | Θ(log n) | Θ(log n) | Θ(log n) | Θ(log n) | Θ(n) |
** Θ(1) when the last element is known; Θ(n) when the last element is unknown.
*** Insertions/Deletions from the middle of a list are dependent how that element is found (it is Θ(n) to simply go to the ith element, but searching without knowing the index has a different complexity). There is a Θ(1) tax on that algorithm.
**** Not technically a list structure, but included to compare indexing.
| Structure | Search | Insert | Delete | Space |
|---|---|---|---|---|
| Skip List | O(n) / O(log n) | O(n) / O(log n) | O(n) / O(log n) | O(n log n) / O(n) |
| Hash Table | O(n) / O(1) | O(n) / O(1) | O(n) / O(1) | O(n) |
Trees
| Structure | Search | Insert | Delete |
|---|---|---|---|
| Binary Tree | O(n) | O(n) | O(n) |
| Binary Search Tree | O(n) / O(log n) | O(n) / O(log n) | O(n) / O(log n) |
| Cartesian Tree | O(n) / O(log n) | O(n) / O(log n) | O(n) / O(log n) |
| K-D Tree | O(n) / O(log n) | O(n) / O(log n) | O(n) / O(log n) |
| AVL Tree | O(log n) | O(log n) | O(log n) |
| Red-Black Tree | O(log n) | O(log n) | O(log n) |
| Splay Tree | Θ(n) / O(log n) | Θ(n) / O(log n) | Θ(n) / O(log n) |
| B-Tree | O(log n) | O(log n) | O(log n) |
| Trie** | O(m) | O(m) | O(m) |
| Van Emde Boas Tree*** | O(log m) | O(log m) | O(log m) |
| Radix Tree** | O(m) | O(m) | O(m) |
** Tries are a type of associative array, and radix trees are a type of trie. Here, m represents the length of the key.
*** Van Emde Boas trees are also a type of associative array. Here, m is the bit length of the keys and the maximum number of elements that can be stored in the tree is \( 2^m \).
Heaps
| Structure | Find-Min* | Delete-Min* | Insert | Decrease-Key | Meld |
|---|---|---|---|---|---|
| Binary Heap | Θ(1) | Θ(log n) | O(log n) | O(log n) | Θ(n) |
| Fibonacci Heap | Θ(1) | O(log n) | Θ(1) | Θ(1) | Θ(1) |
Graphs
There aren't that many types of graphs, and the most common non-standard type is a flow network.
Graphs can be searched in \( O(\left\lvert V \right\rvert + \left\lvert E \right\rvert) \) time using \( O(\left\lvert V \right\rvert) \) space, where \( \left\lvert V \right\rvert \) and \( \left\lvert E \right\rvert \) are the number of vertices and the number of edges, respectively.
The performance of finding the shortest path between two vertices depends on the type of graph (directed, acyclic, does/does not contain negative edge weights, etc.) and the algorithm used. See more in the algorithms chapter.
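As a sketch of the search bound above, a breadth-first search over an adjacency list visits each vertex and edge once, giving \( O(\left\lvert V \right\rvert + \left\lvert E \right\rvert) \) time (the graph below is a made-up example):

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first search: O(|V| + |E|) time, O(|V|) extra space."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for neighbor in graph[vertex]:  # adjacency-list representation
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```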
Data Types
These are the building blocks of complex data structures.
Primitive Types
These are the possible representations of data at any given address.
Somewhat depends on the language, but this is the list Wikipedia uses.
Boolean
True or false.
Character
Represents a symbol, usually in ASCII.
Interchangeable with integer in some languages.
Floating-Point Numbers
E.G. float, double.
Fixed-Point Numbers
E.G. decimal.
Integer
Integral or fixed-precision values.
Reference
A.K.A. pointer or handle.
Enumerated Type
A small set of uniquely named values.
Composite Types
A.K.A. non-primitive types.
Somewhat depends on the language, but this is the list Wikipedia uses.
Array
Fixed set of values in adjacent space.
E.G. string (an array of characters).
Indexable in constant (\( \Theta(1) \)) time with no wasted space. An index \( i \) can be modified in constant time as well. Since the length is fixed, insertion and deletion operations don't apply.
Record
A.K.A. tuple or structure (struct).
Union
Can have any of several representations or formats within the same position in memory.
Abstract Data Types
These are models defined by their behavior. Their implementations are not prescribed, which is what separates them from data structures and composite data types.
List
A.K.A. sequence.
Countable, ordered values with no requirement for uniqueness.
Common operations include getElementAt(i), add(element), add(element, index), count(), first() and last().
Multimap
A series of mappings from unique keys to values.
Associative Array
A.K.A. map or dictionary.
A subset of multimaps wherein only one value can be associated with a key.
Set
Unique, unordered values.
Multiset
A.K.A. bag.
Similar to a set, but can allow repeated (equal) values, which can either be treated as identical (counted, but not stored), or equivalent (counted and stored). You could also think of it as similar to a list, but order doesn't matter.
Stack
Like a list, defined by the following behavior:
- `push(element)`: insert an element by adding it to the front of the list.
- `pop()`: remove an element by taking it from the front of the list.
- `peek()`: observe the front-most element without removing it.
The last element to enter the stack will be the first one to leave it.
Queue
Like a list, defined by the following behavior:
- `enqueue(element)`: insert an element by adding it to the back of the list.
- `dequeue()`: remove an element by taking it from the front of the list.
The first element to enter the queue will be the first one to leave it.
Priority Queue
A structure defined with the same behavior as a queue, except either the enqueue() operation or the dequeue() operation may re-order elements such that certain elements will leave the queue with a higher priority than other elements (first-in-first-out is not as strictly followed).
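A priority queue can be sketched with Python's standard heapq module, which maintains a min-heap, so the entry with the lowest priority value leaves first (the task names are made up):

```python
import heapq

pq = []
heapq.heappush(pq, (2, "write tests"))
heapq.heappush(pq, (1, "fix build"))
heapq.heappush(pq, (3, "refactor"))

# Elements leave by priority, not by insertion order.
assert heapq.heappop(pq) == (1, "fix build")
assert heapq.heappop(pq) == (2, "write tests")
```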
Linear Data Structures
Dynamic Array
This is an array that can shrink or expand in size. A basic implementation starts by creating an array of some initial length, for example 16. When the array becomes full, a new, larger array is created (different implementations grow by different amounts), for example of length 32, and all the values in the old array are copied into the new one. The same operation may happen in reverse if enough values are deleted from the array.
Appending at the end therefore costs amortized \( \Theta(1) \) (worst-case \( O(n) \) when a resize occurs), while inserting or deleting at the beginning or middle costs \( O(n) \), since subsequent elements must be shifted. Dynamic arrays usually waste \( \Theta(n) \) space when expanding.
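A minimal growth-only sketch of the doubling strategy (class and method names are mine; shrinking is omitted for brevity):

```python
class DynamicArray:
    """A minimal dynamic array that doubles its capacity when full."""

    def __init__(self):
        self._capacity = 4
        self._size = 0
        self._data = [None] * self._capacity

    def append(self, value):
        if self._size == self._capacity:
            # Full: allocate a larger array and copy everything over, O(n).
            self._capacity *= 2
            new_data = [None] * self._capacity
            for i in range(self._size):
                new_data[i] = self._data[i]
            self._data = new_data
        self._data[self._size] = value  # amortized O(1)
        self._size += 1

    def __getitem__(self, i):
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._data[i]

    def __len__(self):
        return self._size
```

Doubling is what makes appends amortized constant time: each O(n) copy pays for the n cheap appends that preceded it.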
Hashed Array Tree
Despite the name, this is not really a tree; it's a dynamic array in which elements are stored in multiple smaller arrays, referenced by a top-level array of pointers.
For example, your base array may start with a capacity of 16. Actually, it would be an array of length 4, where each element is a pointer to another array of length 4.
This approach puts some restrictions on when it can be expanded and shrunk, but it means that you only have to move 4 elements (the pointers) instead of 16 when you do expand it. The insertion and deletion performance of the hashed array tree is therefore the same as in a standard dynamic array, but it only wastes \( O\left(\sqrt n\right) \) space when expanding.
Singly Linked List
A list that consists of a sequence of nodes, which each contain a value and a reference to the next node in the list (or null if it is the last node).
- Searchable in \( O(n) \) time (unsorted). Variable when sorted (depends on algorithm).
- Reading, inserting or deleting at the end of the list may take \( O(1) \) or \( O(n) \) time, depending on whether the last element is known or not (it is known in most implementations).
- Reading, inserting or deleting at the beginning of the list should take \( O(1) \) time.
- Insert, delete, or read in the middle of the list in search time + \( \Theta(1) \) (always \( O(i) \) time if acting on index \( i \ne n-1 \)).
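A minimal sketch of such a list (names are mine, purely illustrative):

```python
class ListNode:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node  # reference to the next node, or None at the end

class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, value):
        # O(1): the new node simply becomes the head.
        self.head = ListNode(value, self.head)

    def find(self, value):
        # O(n): walk node to node until the value is found (or the list ends).
        node = self.head
        while node is not None and node.value != value:
            node = node.next
        return node
```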
Doubly Linked List
The main difference between a doubly linked list and a singly linked list is that nodes also carry a reference to the previous node. This may lend itself to sorting algorithms, and if the last element is known in the implementation, then it will always take at most \( O\left(\frac n 2 \right) \) time to read/insert/delete at a specific index (time is the same as a singly linked list if looking for an element by value instead of index).
Circularly Linked List
A circularly linked list is simply a linked list where the "next" pointer of the last node points to the first node. There is no substantial performance difference between this and a singly linked list. A circularly linked list can optionally be implemented with doubly linked nodes, thus gaining the benefits of a doubly linked list.
Skip List
A skip list is quite a unique list. It is a tiered structure, where each tier is itself a sorted list (commonly a linked list). At the top you have a list with few elements, and as you progress down, the lists gain more and more elements (including the elements from previous tiers). Usually there is some sort of algorithm to determine which elements go in the top tier (usually the most frequently accessed elements end up there). This kind of data structure is known as a probabilistic data structure.
A search through a skip list starts at the top and works its way down when it cannot find the desired element. It moves through elements until it passes from an element smaller than the desired one to an element larger than it, then drops down to the next tier and searches between those two elements in that tier, and so on. To go to a specific position it can simply jump to the bottom tier or, if it knows the highest tier that the index is on, jump straight to that tier.
This kind of list is good for storing data with values of varying importance, such as data following a normal curve.
To achieve the performance benefits of a skip list, you must accept wasted space. A skip list will use \( O(n \log n) \) space in the worst case, but \( O(n) \) on average in exchange for searching, inserting, and deleting in \( O (\log n) \) time on average and \( O(n) \) time in the worst case.
Hash Table
An implementation of an associative array that passes provided keys through a hash function to determine where in an array they go.
Key collisions can be handled either by implementing the array as an array of lists (multiple keys can map to the same place, but their order is tracked so as to know which value in the list belongs to which key) or by moving the value to the next available space.
Hash tables (in the sense of Java's Hashtable class, for example)...
- do not allow null keys or values.
- are thread-safe (synchronized), but not as performant as their unsynchronized counterparts.
Hash tables will use the standard \( O(n) \) space with constant time (\( O(1) \)) searches, insertions, and deletions on average (\( O(n) \) in the worst case).
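A minimal sketch of the chaining approach to collisions (names are mine; real implementations also resize as the table fills):

```python
class ChainedHashTable:
    """A hash table resolving collisions with per-bucket lists (chaining)."""

    def __init__(self, buckets=8):
        self._buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        # The hash function decides which bucket a key lands in.
        return self._buckets[hash(key) % len(self._buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # colliding keys share the bucket's list

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)
```

With a good hash function the buckets stay short, giving the O(1) average-case behavior; if every key collides, a bucket degenerates into an O(n) list scan.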
Hash Map
This is an unsynchronized variation of the hash table (Java's HashMap, for example), which...
- allows nulls for keys and values.
- is not thread-safe, but is performant.
Hash Set
A set in which values are passed through a hash function to determine their placement within an array. All other rules of sets are followed (unique, unordered values).
Graphs
Graphs are collections of vertices (a.k.a. nodes) and edges, where each vertex has a collection of edges it touches, and each edge is composed of a to vertex and a from vertex. They are primarily used for spatial operations or relationship analysis.
Directed Graph
Edges can only be traversed one way. Vertices have their collection of edges split in two: one collection for incoming edges and one collection for outgoing edges.
Directed, Acyclic Graph
Directed graphs are acyclic if there is no way to traverse from any one vertex back to itself.
Flow Networks
Edges in a flow network are assigned a flow, and each vertex must have the same incoming flow as it has outgoing flow, except for source vertices, which may not have any incoming flow, and sink vertices, which may not have any outgoing flow. Edges are assigned a capacity, and the flow through edges cannot exceed their capacity.
Adjacency List
This is a way of representing a graph as a list of node adjacencies.
Trees
Trees are basically rooted graphs that contain no cycles (interactions with a tree begin at the same vertex each time, which is the root).
Some terminology:
- An element is referred to as a node.
- The starting node, with no parent, is called the root.
- External nodes, or leaves are the nodes with no children. Internal nodes are the opposite.
- The height of a tree is the length of the longest path to a leaf from the root.
- A tree is balanced if the left and right subtrees of every node differ in height by no more than 1.
All trees listed below use \( O(n) \) space.
Trees can have very complex implementations, so the implementations will not all be described here. The important part is knowing which trees are important for which tasks; You can always google the implementation once you know what you need.
Binary Tree
In a binary tree, no node can have more than two child nodes. This leads to some handy properties:
- The maximum number of nodes at level \( l \) will be \( 2^{l-1} \). Note: The root node is considered to be on level 1.
- The maximum number of nodes in a binary tree of height \( h \) is \( 2^h - 1 \). Note: The height of a tree with one node (the root node) is considered to be 1.
- In a binary tree with \( n \) nodes, the minimum height is \( \lceil \log_2(n+1) \rceil \).
Many corollaries can be drawn from the aforementioned properties as well.
There are various ways of describing binary trees based on their structure:
- A full binary tree is one in which every node has either 0 or 2 children. Recursively: A full binary tree is either a single node, or a node with 2 children that are each full binary trees.
- A complete binary tree is one in which every level except the last must be completely filled, and all nodes on the last level must be as far left as possible.
- A perfect binary tree is one in which every internal node has 2 children and all leaves are on the same level.
Binary Search Tree
The basic process of inserting into a binary search tree is as follows:
insert(value)
{
    if root.value == null
        root.value = value
    else if value <= root.value
        root.left.insert(value)
    else
        root.right.insert(value)
}
A binary search tree built from a uniformly random insertion order has expected height \( O(\log n) \), but a degenerate insertion order (e.g. already-sorted input) produces a chain of height \( n \).
Searching, inserting, and deleting all take, on average, \( O(\log n) \) time and, at worst, \( O(n) \) time.
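The pseudocode above can be made runnable as a node-based sketch (names are mine; a real implementation would also handle deletion and rebalancing):

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def insert(self, value):
        # Walk left for smaller-or-equal values, right for larger ones.
        if value <= self.value:
            if self.left is None:
                self.left = BSTNode(value)
            else:
                self.left.insert(value)
        else:
            if self.right is None:
                self.right = BSTNode(value)
            else:
                self.right.insert(value)

    def contains(self, value):
        if value == self.value:
            return True
        child = self.left if value < self.value else self.right
        return child is not None and child.contains(value)
```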
AVL Tree
In an AVL tree, the nodes are ordered, from left to right, minimum to maximum, and the tree stays as balanced as possible.
The basic process of inserting is similar to that of a binary search tree, except that after an insertion, the ancestors of the new node are checked for balance and, if they are not balanced, they get re-balanced.
AVL trees are similar to red-black trees in terms of operational performance, but AVL trees are faster for lookup-intensive applications because they are more strictly balanced.
Searching, inserting, and deleting all take \( O(\log n) \) time.
Splay Tree
Splay trees can really use whatever ordering they want, but the important part is that there is an order, because splay trees have the unique behavior of "splaying" once a node is accessed. Splaying enacts a series of rotations/swaps with the accessed node and its parent or siblings, repeatedly, until the accessed node becomes the root node and the tree still adheres to its ordering principle. This means that splay trees are very performant when it is known that some nodes will need to be accessed much more frequently than others.
Searching, inserting, and deleting all take, on average, \( O(\log n) \) time and, at worst, \( O(n) \) time.
Trie
What makes a tree a trie is that every node has a key, which is some character, to accompany its value. A traversal through a trie thus follows a path whose keys spell out a unique word/string. This also makes the trie an associative array implementation.
Searching, inserting, and deleting all take \( O(m) \) time, where \( m \) is the length of the word/string being accessed or updated.
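A minimal sketch (names are mine, purely illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # maps a character to the next node on the path
        self.is_word = False

class Trie:
    def __init__(self):
        self._root = TrieNode()

    def insert(self, word):
        node = self._root
        for ch in word:  # O(m) for a word of length m
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True  # mark that a complete word ends here

    def contains(self, word):
        node = self._root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word
```

Note that the `is_word` flag is what distinguishes a stored word from a mere prefix of one.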
Cartesian Tree
Cartesian trees are heap-ordered (largest values on top, smallest at bottom, or vice-versa) trees generated from a sequence of numbers (each node represents one of the numbers in the sequence) with the unique property that a symmetric (in-order/left-to-right) traversal of the tree will yield the sequence it was generated from. It takes linear time to construct such a tree.
These trees are commonly used for range-searching operations and range minimum queries.
Searching, inserting, and deleting all take, on average, \( O(\log n) \) time and, at worst, \( O(n) \) time.
K-D Tree
Every leaf node in a k-d tree represents a k-dimensional point. Each level down the tree splits the points on another dimension.
For example, in a 2-d tree, the root node may split the plane on the x-coordinate of its point: every node in its left subtree represents a point with a smaller x-coordinate, and every node in its right subtree a larger one. The next level splits on the y-coordinate, then x again, and so on, creating divisions at each level until the leaves hold the actual coordinates of the points.
This type of tree lends itself to range-searching operations and nearest neighbor problems.
Searching, inserting, and deleting all take, on average, \( O(\log n) \) time and, at worst, \( O(n) \) time.
Red-Black Tree
In a red-black tree, each node is assigned a color (red or black), and the tree maintains a set of color rules: the root is black, a red node cannot have a red child, and every path from a node down to its descendant leaves contains the same number of black nodes. If an insertion or deletion breaks a rule, the tree is re-arranged (recolored and rotated) to restore it.
Searching, inserting, and deleting all take \( O(\log n) \) time.
B-Tree
A B-tree is a non-binary tree that stores nodes in groups and remains balanced. Instead of a node having children, groups of nodes have shared children. This is used for databases and file systems since it is well-suited for storage systems that read and write relatively large blocks of data.
Searching, inserting, and deleting all take \( O(\log n) \) time.
Van Emde Boas Tree
A unique tree that implements an associative array, whose operations get faster as the bit length of the stored keys shrinks.
Searching, inserting, and deleting all take \( O(\log m) \) time, where \( m \) is the bit length of the keys; the maximum number of elements that can be stored in the tree is \( 2^m \).
Radix Tree
A radix tree is a type of trie which is space-optimized. Each node that is an only child is merged with its parent, so instead of a path like g-l-a-d-i-a-t-o-r, you may get a path like g-l-adi-a-to-r.
These share the performance of tries, with some optimizations on searching and some taxes on inserting/deleting.
Searching, inserting, and deleting all take \( O(m) \) time, where \( m \) is the length of the word/string being accessed or updated.
Heap
Heaps are types of trees that are commonly used to implement priority queues. Heaps are almost-complete trees, ordered either smallest-on-top (a min-heap) or largest-on-top (a max-heap).
Some common operations include:
- Find-Min (find the highest priority node)
- Delete-Min (delete the highest priority node)
- Insert
- Increase-Key/Decrease-Key (update the priority of a key/node in the tree)
- Meld/Merge (join 2 heaps together)
- Heapify (turn a list of elements into a heap)
Binary Heap
These are simply heaps wherein nodes can have at most 2 children.
Finding takes \( \Theta(1) \) time, deleting \( \Theta(\log n) \), inserting and modifying the key \( O(\log n) \), and melding/merging \( \Theta(n) \).
Fibonacci Heap
A Fibonacci heap is actually a collection of heap-ordered trees, which maintains its status as a heap by keeping track of the minimum value across all the trees. The sizes of the trees are bounded in terms of the Fibonacci numbers, hence the name. A Fibonacci heap tends to outperform a binary heap in workloads where insertions and key modifications greatly outnumber deletions.
Finding, inserting, decreasing a key, and merging all take \( \Theta(1) \) amortized time, with deletions taking \( O(\log n) \) amortized time.
Algorithms
Algorithms are essentially a way of getting from point A to point B. Therefore, many of the algorithms in this section will be accompanied by some sort of problem or question that the algorithm solves or answers.
In practice, it is unlikely that you will face the exact problems that these algorithms solve, but it's worth knowing them because many problems can be transformed into problems that these algorithms solve. This is the idea behind reductions, and it underlies the definition of NP-Hard: a problem is NP-Hard if every problem in NP can be transformed into it in polynomial time. Consequently, if you could solve one NP-Hard problem efficiently, you could solve every problem in NP by transforming it first. The same philosophy can be extended indefinitely to other, real-world problems, and is not confined to complexity theory: even when a problem is not famous, you can often transform it into another problem you already know the answer to.
As with data structures, this chapter only covers algorithms that are significant/well-known in computer science, or that are broadly applicable to most computer science fields. This means, for example, you won't find an algorithm for analyzing pictures of people to determine if they are smiling or frowning, nor will you find algorithms specific to artificial intelligence, machine learning, etc. This chapter includes, primarily, algorithms that are applicable to most fields of computer science or have some value outside of niche use cases.
Ultimately, just knowing algorithms won't help that much. Here are some places where you can practice applying your knowledge of algorithms:
- https://projecteuler.net/
- https://leetcode.com/
- https://www.codewars.com/
- https://www.kaggle.com/
NOTE: Some algorithms simply have a link to other articles. Such is done for non-essential algorithms which are more valuable to simply know about than to know the implementation for.
Top-Level
Top-level algorithms encompass the general approaches to solving problems in computer science. These don't necessarily have an accompanying problem, and they're used as a foundational piece of most common algorithms.
Brute Force
The most straightforward approach to problem-solving: try everything. Many people think of password cracking when they think of brute force algorithms; if you are trying to figure out the passcode to a 4-digit combination lock with no guess limit, you can simply try every combination of 4 digits. Such is the essence of brute force.
Greedy Method
The greedy method is when you "prefer" something in an algorithmic approach to problem solving.
For instance, if you want to maximize the number of items you can fit in your suitcase without concern for their value, you might pack your items in order from smallest to largest. In that case, you are preferring small items, so you're using a greedy method. Such a problem also demonstrates the drawback of the greedy method: often, it is not perfect. In that scenario, at some point it may no longer be optimal to pack the smallest item, because its shape might prevent you from adding anything more, even though 2 slightly larger items would have fit.
Divide & Conquer
Like the name suggests, the premise of divide and conquer algorithms is to divide the problem into sub-problems, solve the sub-problems, and then merge the solutions together to form the solution to the initial problem.
Dynamic Programming
Dynamic programming can be difficult to conceptualize, but don't worry, there are examples in other sections of this guide.
Similar to the divide and conquer approach, dynamic programming seeks to define sub-problems and solve them before merging them to create a solution to the base problem. The difference between the two approaches is that dynamic programming requires that the sub-problems are overlapping.
So what, exactly, is an overlapping sub-problem? Consider the Fibonacci series: \( F_i = F_{i-1} + F_{i-2} \), with the base case \( F_1 = F_2 = 1 \). In order to solve for \( F_{10} \) we generate two sub-problems, \( F_{10} = F_9 + F_8 \). We thus need to solve another sub-problem, \( F_9 = F_8 + F_7 \), and so we see that the solutions for \( F_{10} \) and \( F_9 \) have an overlapping sub-problem of \( F_8 \) (and \( F_7 \), etc.). If we were to take a naive divide and conquer approach then we'd be solving \( F_8 \) twice, which dynamic programming seeks to avoid.
The dynamic programming approach is usually to create a k-dimensional table of sub-problems and their solutions, where k is the number of parameters needed to identify a sub-problem. There are 2 common ways to go about this:
- Top-down: Upon observing a sub-problem, check the table to see if it has already been solved. If it has already been solved, then use it, otherwise we solve it and add its solution to the table.
- Bottom-up: Create a table of sub-problems, then start at the most refined sub-problem(s) (ones which do not have any further sub-problems, like \( F_1 \) and \( F_2 \) in the Fibonacci series), adding solutions to the table until you arrive at the original problem.
The top-down approach may save more space, but you have to check the table for a sub-problem in each step. The bottom-up approach avoids the lookup tax, but might solve more sub-problems than it needs to. The top-down approach is usually done recursively, whereas the bottom-up approach is usually done iteratively. In programming this amounts to whether a function calls itself or whether it uses a for loop.
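With the Fibonacci example above, the two approaches look like this (a Python sketch; the memoization cache plays the role of the table):

```python
from functools import lru_cache

# Top-down: recurse, caching each sub-problem's solution so it is solved once.
@lru_cache(maxsize=None)
def fib_top_down(n):
    if n <= 2:
        return 1
    return fib_top_down(n - 1) + fib_top_down(n - 2)

# Bottom-up: fill the table from the base cases upward with a loop.
def fib_bottom_up(n):
    table = [0, 1, 1] + [0] * (n - 2)
    for i in range(3, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```

Both compute \( F_{10} = 55 \); the top-down version only fills cache entries it actually needs.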
Monte Carlo
This type of algorithm solves a problem by randomly sampling the solutions to related problems. For example: if you know that a certain problem with variables has a solution of 0 about 1% of the time and a solution of 1 about 99% of the time, then the Monte Carlo approach to figuring out where the next solution will lie could be to just randomly sample the set of known solutions and pick the result of the random sampling. This does not guarantee accuracy, but it is very fast (fixed time) and the probability you are correct is high. When repeated in massive quantities you can simulate very complex things like fluid dynamics.
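As a concrete sketch of the sampling idea, here is the classic Monte Carlo estimate of π (the sample count is an arbitrary illustrative choice):

```python
import random

def estimate_pi(samples=100_000):
    # Sample points in the unit square; the fraction that lands inside
    # the quarter circle of radius 1 approximates pi / 4.
    inside = sum(
        1 for _ in range(samples)
        if random.random() ** 2 + random.random() ** 2 <= 1.0
    )
    return 4 * inside / samples
```

The answer is only probably close to π, but more samples tighten the estimate, which is the Monte Carlo trade-off in miniature.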
Las Vegas
A Las Vegas algorithm is another randomized algorithm, but in contrast to the Monte Carlo approach, it will always produce the correct result (or inform you of failure) at the cost of an uncertain runtime. It is generally applicable to problems where "you'll know the solution when you see it." In essence, you randomly pick a possible solution, and if that solution is correct you return it, otherwise you repeat the process, failing after some threshold of attempts.
Hill Climbing
Hill climbing applies to problems that have many candidate solutions, and it is a strategy for finding an optimal one. For example, if you want to find the fastest path between two cities, you could find many paths and optimize their speed over time via hill climbing.
The general premise is to arbitrarily pick a solution to start at, then make an incremental change to it. If the incremental change produces a better solution, keep it and repeat the process. You can tune the algorithm in ways such as the number of incremental changes that are attempted before giving up, or how incremental the changes actually are. In general, this will lead to the optimal solution for problems with a convex (hill-shaped) solution set. However, for problems with multiple "hills", you might only find a local optimum.
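A minimal Python sketch of the climb, maximizing a one-dimensional function (the function, starting point, and step size are all illustrative choices):

```python
def hill_climb(f, x=0.0, step=0.1, max_iters=10_000):
    # Repeatedly try a small move in each direction; keep it if it improves f.
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=f)
        if best == x:      # no neighbor improves the solution: a (local) optimum
            return x
        x = best
    return x

# A convex "hill" with its peak at x = 3; the climb settles near it.
peak = hill_climb(lambda x: -(x - 3) ** 2)
```

On a function with multiple peaks, the same code would stop at whichever local peak is nearest the starting point.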
Combinatorics
Find a cycle in function value iterations (Brent's Algorithm)
Solve the stable matching problem (Gale-Shapley Algorithm)
The stable matching problem is as follows: given n students and n internships, where each participant has a ranking of which students/internships they would prefer, generate a stable matching for each participant. A matching is stable if there is no case where a student would prefer an internship they weren't matched with, and that internship would also prefer that student to the one they were matched with. Note that you can replace "students" and "internships" with any entities that need to be matched and have preferences.
The Gale-Shapley algorithm starts by trying to match each student with the internship they prefer most. Each internship with a match offer will then choose, among the students that matched to them, the one they most prefer, and reject the others. Then, each unmatched student will try to match with their 2nd-most preferred internship, and the internships will again keep only the student they most prefer (if an internship prefers a student among its new offers to the one it is currently matched with--if it has a match--then it will take the new student). This process repeats until there are no unmatched students.
The runtime complexity of the algorithm is \( O(n^2) \).
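A compact sketch of the algorithm (students propose; preferences are index lists, most preferred first; the list format is an illustrative assumption):

```python
def gale_shapley(student_prefs, internship_prefs):
    # student_prefs[s] lists internships in s's preference order;
    # internship_prefs[i] lists students likewise. Returns student -> internship.
    n = len(student_prefs)
    rank = [[0] * n for _ in range(n)]       # rank[i][s]: how internship i ranks student s
    for i, prefs in enumerate(internship_prefs):
        for r, s in enumerate(prefs):
            rank[i][s] = r
    next_choice = [0] * n                    # next internship each student will try
    match = [None] * n                       # match[i]: student currently held by internship i
    free = list(range(n))
    while free:
        s = free.pop()
        i = student_prefs[s][next_choice[s]]
        next_choice[s] += 1
        if match[i] is None:
            match[i] = s
        elif rank[i][s] < rank[i][match[i]]:  # internship i prefers the new proposer
            free.append(match[i])
            match[i] = s
        else:
            free.append(s)
    return {s: i for i, s in enumerate(match)}
```

Each student proposes to each internship at most once, giving the \( O(n^2) \) bound.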
Generate a pseudorandom number (Mersenne Twister, Linear Congruential Generator, Blum Blum Shub)
Graph & Tree
Find the lowest common ancestor in a tree or directed acyclic graph (Tarjan's Off-Line Lowest Common Ancestors Algorithm)
Recall that the lowest common ancestor of two nodes in a tree is the lowest node in the tree with those two nodes somewhere in its subtrees.
The basic premise is to follow a post-order traversal of the tree, using a union-find (disjoint-set) structure to track each subtree's current ancestor. The simplified process of finding the lowest common ancestor of nodes u and v in a tree rooted at root is as follows:
function getLca(node, u, v)
    makeSet(node)
    node.ancestor -> node
    for each child in node.children // left-to-right order
        getLca(child, u, v)
        union(node, child)
        find(node).ancestor -> node
    node.isVisited = true
    if node is u and v.isVisited
        answer -> find(v).ancestor
    else if node is v and u.isVisited
        answer -> find(u).ancestor
By pre-processing the tree into arrays, one where arr[i] is the visitation status of node i and one where arr[i] is the ancestor of node[i], you can reduce the time complexity of some operations.
A refined variation of this algorithm can perform in linear (\( O(n) \)) time.
Measure the importance of a website (PageRank)
The PageRank algorithm was the original heart of Google search. The underlying assumption is that more important websites are likely to receive links from other websites, thus the PageRank algorithm works by performing a web crawl and counting the number and quality of links to pages to determine how important they are.
More info here.
Compute the maximum flow in a flow network or directed graph (Ford-Fulkerson Algorithm, Edmonds-Karp Algorithm)
Recall that finding the maximum flow in a flow network entails finding the greatest total amount of flow that can be routed from the source to the sink without exceeding the capacity of any edge.
For this problem there are two algorithms: Ford-Fulkerson and Edmonds-Karp. The difference is that Ford-Fulkerson is more of a prescribed idea (a method), whereas Edmonds-Karp is a fully specified and optimized implementation of that idea.
The inputs to the problem consist of a flow network (a graph), a flow capacity, a source node, and a sink node.
To solve the problem you start by assigning all edges a flow of 0. Using a breadth-first search to traverse the graph, you generate possible paths from the source node to the sink node, stopping each search when you find a path that reaches the sink. At that point you determine the minimum remaining capacity among the edges in that path, and add that minimum value to the flow along each edge in the path. Your search then continues, but the key here is that, as you continue determining the minimum capacity of edges in a path, you use the remaining capacity of the edges, not the original capacity. So if you have already assigned an edge a flow of 3 and its capacity is 5, then its remaining capacity is 2 for all future calculations. When no augmenting path remains, the total assigned flow is the maximum flow.
The time complexity of the Edmonds-Karp algorithm is \( O(VE^2) \), where V is the number of nodes, and E is the number of edges.
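The steps above can be sketched over an adjacency matrix of capacities (an illustrative implementation, not a tuned one):

```python
from collections import deque

def max_flow(cap, source, sink):
    # cap[u][v] is the capacity of edge u -> v (0 if absent); mutated into residuals.
    n = len(cap)
    flow = 0
    while True:
        # BFS for the shortest augmenting path in the residual graph.
        parent = [None] * n
        parent[source] = source
        q = deque([source])
        while q and parent[sink] is None:
            u = q.popleft()
            for v in range(n):
                if parent[v] is None and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[sink] is None:
            return flow                       # no augmenting path remains
        # Find the bottleneck (minimum remaining capacity) along the path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Update residual capacities along the path.
        v = sink
        while v != source:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck
```

The reverse-edge updates are what let a later search "undo" flow sent down an earlier path.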
Find the minimum cut of a graph (Karger's Algorithm)
Recall that the minimum cut of a graph is a slice through the graph which crosses the fewest edges possible.
Karger's algorithm picks a neighboring pair of vertices at random and contracts the edge connecting them (the two nodes are merged, and their edges are moved to the new node), then repeats the process on the newly created graph until there are only two vertices left. The determined cut is then the one which goes through the edges connecting the final two vertices. The algorithm follows the Monte Carlo formula at its core, and thus does not guarantee the correct answer, but has a high probability of finding the correct answer, especially if repeated a sufficient number of times.
As a Monte Carlo algorithm, the probability of success is more important than the runtime, as you will want to run it enough times to increase the probability of success to within whatever your personal threshold is. More info here
Find the minimum spanning tree of an undirected, edge-weighted graph (Kruskal's Algorithm, Prim's Algorithm)
The main difference between Kruskal's algorithm and Prim's algorithm is that Kruskal's will work on a disconnected graph (it can find the minimum spanning forest). Also, on a sufficiently dense graph, Prim's algorithm can be made to run in linear time.
Recall that a minimum spanning tree is a subset of the edges of a weighted, undirected graph that connects all the vertices together without containing any cycles and with the minimum total weight possible.
Kruskal's algorithm starts by treating each node of the graph as its own single-node tree (so the graph begins as a forest). Then, the edges in the graph are sorted by weight. One-by-one, the lightest edge is removed from the list. If that edge connects two separate trees, it is added to the minimum spanning tree, and the two trees are joined into one. This process is repeated until a spanning tree is established. This is essentially a greedy algorithm.
Kruskal's algorithm runs in \( O(E \log E) \) time.
Prim's algorithm is more of a pure greedy algorithm. Starting from an arbitrary vertex in the graph, repeatedly choose the minimum-weight edge leaving the spanning tree in progress that does not connect to a vertex already covered by it.
Prim's algorithm [typically] runs in \( O(E + V \log V) \) time (though it depends on the data structure used to implement it).
Find the shortest path between two points in a weighted graph with only non-negative weights (Dijkstra's Algorithm)
Dijkstra's algorithm is a classic algorithm for finding the shortest path between two points in a weighted graph, and its applications are plentiful. It is the fastest known algorithm for finding shortest paths in an arbitrary directed graph with unbounded, non-negative edge weights. However, for more specialized cases (bounded edge weights, integer-only edge weights, directed acyclic graphs, etc.), there exist more specialized variants that are a little faster.
Dijkstra's algorithm is as follows: all nodes are marked as unvisited, with a distance from the source node of infinity. The source node is then given a distance value of 0 (it is 0 distance from itself). Starting at the source node, each neighboring node is analyzed to determine what its distance from the source would be if the path went through the current node: the neighbor is given a tentative distance equal to the current node's distance plus the weight of the connecting edge; if the tentative value is less than the neighbor's current value, its current value is replaced with the tentative one. Once all neighbors are analyzed, the current node is marked as visited. The process is then repeated with the unvisited node that currently has the lowest distance value. Once the target node is reached, the algorithm ends, and you simply walk backwards from the target, choosing the neighbors with the shortest assigned distance, until you are back at the source in order to determine the shortest path.
Dijkstra's algorithm runs in \( O(E + V \log V) \) time when implemented with a Fibonacci heap; with a binary heap it runs in \( O((E + V) \log V) \) time.
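The process above can be sketched using Python's heapq module as the priority queue (the adjacency-list format, lists of (neighbor, weight) pairs, is an assumption for illustration):

```python
import heapq

def dijkstra(graph, source):
    # graph[u] is a list of (v, weight) pairs; returns shortest distances from source.
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry; u was already improved
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

Instead of a decrease-key operation, this version pushes duplicate entries and skips the stale ones, a common simplification with binary heaps.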
Find the shortest path between two points in a weighted graph (Bellman-Ford Algorithm)
The Bellman-Ford algorithm solves the same problem as Dijkstra's algorithm, except that it can accommodate negative edge weights. The caveat is that if the graph contains a cycle whose edge weights total to a negative value, then no shortest path exists (the cost could be lowered indefinitely by following that loop); the algorithm can detect and report this case.
While Dijkstra's algorithm greedily selects the closest vertex that has not been visited after assigning distances, the Bellman-Ford algorithm simply repeats the process for every vertex, thus doing away with the need to denote vertices as "visited". At the end of the process, the same walk backwards from the target is performed. The algorithm thus takes \( O(E \cdot V) \) time.
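The repeated relaxation can be sketched with edges as (u, v, weight) triples (the edge-list format is an illustrative assumption):

```python
def bellman_ford(n, edges, source):
    # edges is a list of (u, v, weight) triples; returns distances from source,
    # or None if a negative cycle is reachable.
    dist = [float("inf")] * n
    dist[source] = 0
    for _ in range(n - 1):                # relax every edge V-1 times
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:                 # one extra pass detects negative cycles
        if dist[u] + w < dist[v]:
            return None
    return dist
```

If any edge can still be relaxed after V-1 passes, some negative cycle must be reachable, hence the final check.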
Solve the traveling salesman problem (Nearest Neighbor Algorithm, Brute Force Algorithm)
The goal of the traveling salesman problem is to determine the shortest route a salesman can take that goes through all the different cities they want to visit and then returns to the starting city.
Two common approaches to this problem are the brute force approach and the nearest neighbor approach.
The nearest neighbor algorithm uses the greedy method to select the nearest city to the one you are currently at that has not already been visited. This is only an approximate solution.
The brute force algorithm simply calculates the distance of every possible path (stopping if the path goes longer than the current minimum).
This problem is used as a benchmark for many optimization methods, as it is computationally difficult and applicable to many real-world problems.
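The nearest neighbor heuristic can be sketched over a distance matrix (illustrative; it yields an approximate tour, not necessarily the optimum):

```python
def nearest_neighbor_tour(dist, start=0):
    # dist[i][j] is the distance between cities i and j.
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        here = tour[-1]
        nearest = min(unvisited, key=lambda city: dist[here][city])
        tour.append(nearest)                  # greedily hop to the closest unvisited city
        unvisited.remove(nearest)
    tour.append(start)                        # return to the starting city
    return tour
```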
Solve the knight's tour problem (Warnsdorff's Rule)
The knight's tour problem has the goal of determining what sequence of moves a knight can make on a chess board that brings it to every square exactly once (a closed tour additionally requires ending a knight's move away from the starting square). It is an instance of the more general Hamiltonian path problem in graph theory (try to hit every vertex in a graph exactly once).
Warnsdorff's rule is a heuristic that dictates one should always move the knight to the square from which it will have the fewest onward moves (in essence a greedy method). In practice, not all Hamiltonian path problems can be solved this way, but there are many special cases where the heuristic does apply, and thus it's not a bad idea to try it if you encounter such a problem.
Traverse a graph in search of a specific vertex or value (Breadth-First Search, Depth-First Search, Best-First Search)
Breadth-First Search
Check the starting node, then check each adjacent node to the start, then check each adjacent node to the adjacent nodes, and so on. In a tree, this amounts to searching an entire level before moving on to the next.
\( O(V + E) \) time and \( O(V) \) space.
Depth-First Search
Check the starting node, then, for each adjacent node, repeat this process. In a tree this amounts to searching a lineage/branch before moving on to the next.
\( O(V + E) \) time and \( O(V) \) space.
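Both traversals share a skeleton and differ only in which node is expanded next, a queue for BFS versus a stack for DFS (an adjacency-list sketch):

```python
from collections import deque

def bfs(graph, start):
    # Visit nodes in order of distance from start (queue: first in, first out).
    visited, order, q = {start}, [], deque([start])
    while q:
        node = q.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                q.append(neighbor)
    return order

def dfs(graph, start):
    # Follow one branch as deep as possible before backtracking (stack: last in, first out).
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        stack.extend(reversed(graph[node]))   # reversed keeps left-to-right exploration
    return order
```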
Best-First Search
Choose which node to explore next by some pre-determined rule (usually a heuristic estimating which node is most promising). The runtime thus depends on the rule used.
Traverse a tree in search of a specific node or value (Breadth-First Search, Depth-First Search, Best-First Search, Pre-Order Traversal, In-Order Traversal, Post-Order Traversal)
Trees have the same standard traversal methods as graphs (because trees are, in effect, a specialized graph), but the starting node is always the root. Additionally, there are some common orderings when performing a depth-first search:
Pre-Order Traversal
function preOrder(root)
    nodes.add(root)
    preOrder(root.left)
    preOrder(root.right)
In-Order Traversal
function inOrder(root)
    inOrder(root.left)
    nodes.add(root)
    inOrder(root.right)
Post-Order Traversal
function postOrder(root)
    postOrder(root.left)
    postOrder(root.right)
    nodes.add(root)
Find the largest clique in a graph (MaxClique/MaxCliqueDyn Algorithm)
Recall that a clique is a subset of vertices in a graph which are all adjacent to each other (also called a complete subgraph). To find a max clique is to find a clique in a graph which contains the most nodes of any clique in the graph.
The MaxCliqueDyn algorithm is a variation of MaxClique which can handle dynamically varying bounds on the clique size. More info here
Sequence
Find the \( k^{th} \) smallest element in an unordered sequence (Quickselect)
Quickselect operates on a pivot and a pair of bounds, initially the first and last indices of the sequence. A pivot is chosen at random from within the bounds, and the sequence is then partitioned: elements smaller than the pivot are moved in front of it and elements larger are moved behind it, so the pivot ends up in the position it would occupy if the sequence were sorted. Now, if the pivot is in position k, it is returned. Otherwise, if the pivot is in a position greater than k, the process is repeated with the current pivot position as the new right bound; if the pivot is in a position less than k, the process is repeated with the current pivot position as the new left bound. In either non-k case, a new pivot is chosen within the new bounds.
Quickselect runs in \( O(n) \) time on average, but \( O(n^2) \) time in the worst case. It can be implemented in-place (i.e. without needing an auxiliary data structure), and is used frequently in real-world applications.
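A sketch of quickselect (this version trades the in-place partition for clarity; k is zero-indexed):

```python
import random

def quickselect(seq, k):
    # Returns the element that would be at index k if seq were sorted.
    pivot = random.choice(seq)
    smaller = [x for x in seq if x < pivot]
    equal = [x for x in seq if x == pivot]
    larger = [x for x in seq if x > pivot]
    if k < len(smaller):
        return quickselect(smaller, k)            # k-th element is left of the pivot
    if k < len(smaller) + len(equal):
        return pivot                              # the pivot landed at position k
    return quickselect(larger, k - len(smaller) - len(equal))
```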
Find an item in an unordered sequence (Linear/Sequential/Brute Force Search)
The most basic of algorithms: if your item is not the first item, then check the next item and repeat. \( O(n) \) time. Included to illustrate brute forcing.
Randomly shuffle a finite set (Fisher-Yates/Knuth Shuffle)
The Fisher-Yates shuffle (A.K.A. the Knuth shuffle) is an algorithm for generating a random permutation of a finite sequence. The basic idea is as follows:
function shuffle(arr[n])
for i = 0...n-2
pick a random number j between i and n-1 (inclusive)
swap arr[i] and arr[j]
Simple enough and \( O(n) \) time.
Find the sequence alignment between two strings that results in the shortest Levenshtein distance, using whitespace as a buffer (Hirschberg's Algorithm, Needleman-Wunsch Algorithm)
Recall that the Levenshtein distance between two strings is the number of single-character edits (insertions, deletions, or substitutions) required to transform one string into the other.
Hirschberg's algorithm is a divide & conquer twist on the classic dynamic programming model of the Needleman-Wunsch algorithm (which has a worse space requirement of \( O(nm) \)).
The Needleman-Wunsch algorithm is as follows:
function getLevenshteinDistance(str1, str2)
    initialize dist[n+1][m+1]
    for i -> 0...n
        for j -> 0...m
            if i is 0
                dist[i][j] = j // str1 is empty: j insertions
            else if j is 0
                dist[i][j] = i // str2 is empty: i deletions
            else
                cost = 0 if str1[i-1] == str2[j-1], else 1
                dist[i][j] = min(dist[i-1][j] + 1,        // deletion
                                 dist[i][j-1] + 1,        // insertion
                                 dist[i-1][j-1] + cost)   // substitution or match
    return dist[n][m]
To get the steps required to transform one string into another, you can follow the path of minimum values from the bottom-right cell of the table back to the top-left. A diagonal move where the characters match means no change is needed; a diagonal move where they differ is a substitution; a horizontal move is an insertion; and a vertical move is a deletion.
The Hirschberg algorithm applies divide and conquer to the Needleman-Wunsch algorithm (finding the distance between the first half of the first string and a split of the second, then combining it with the result for the remaining halves, recursively) in order to save space. It takes \( O(nm) \) time and uses \( O(\min(n,m)) \) space, where n is the length of the first string, and m is the length of the second.
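The table-filling step in runnable form, returning just the distance (a sketch of the classic formulation):

```python
def levenshtein(a, b):
    n, m = len(a), len(b)
    # dist[i][j]: edits to turn the first i chars of a into the first j chars of b.
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i                    # delete all i characters
    for j in range(m + 1):
        dist[0][j] = j                    # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + cost) # substitution or match
    return dist[n][m]
```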
Determine whether a string contains another given string (Boyer-Moore Algorithm)
The traditional method of finding a substring within a string is as follows:
function hasSubstring(str, substr)
    if substr.length > str.length
        return false
    for i = 0...substr.length-1
        if str[i] != substr[i]
            return hasSubstring(str[1->n], substr) // shift the window by one
    return true
This runs in \( O(nm) \) time, where n is the length of str and m is the length of substr.
The Boyer-Moore algorithm improves upon that by changing the recursive calls. The realization is that if you compare the substring to the string from back-to-front instead of from front to back then you can optimize the shift you make for the recursive call if the substring is not a match. The worst case runtime remains the same, but with the application of something known as the Galil rule, it can be reduced to linear time. More information about the Boyer-Moore algorithm is here.
BONUS: A similar question is: does this string contain the characters of this other string in order?
The algorithm is as follows:
function hasCharsInOrder(str, chars)
    if chars.length == 0
        return true
    else if chars.length > str.length
        return false
    else if str[0] == chars[0]
        return hasCharsInOrder(str[1->n], chars[1->m])
    else
        return hasCharsInOrder(str[1->n], chars)
It runs in \( O(n) \) time, where n is the length of str.
Sorting
All the algorithms in this chapter accomplish the same goal by different means: sort the elements of a list.
Bubble Sort
Bubble sort is a pretty rudimentary algorithm primarily used for educational purposes (which is what this guide is for, so it appears here). It's a very straightforward process. Go through the list; if the current element is greater than the next element then swap them, else continue. Repeat that process until the list is sorted. Even in the average case it still takes \( O(n^2) \) time.
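In runnable form (a sketch that also stops early once a pass makes no swaps):

```python
def bubble_sort(arr):
    n = len(arr)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]  # bubble the larger value up
                swapped = True
        if not swapped:                   # a full pass with no swaps: already sorted
            return arr
    return arr
```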
Quicksort
Quicksort is very common sorting algorithm in the real world. It is a divide and conquer algorithm that starts by selecting a pivot element from the list and then going through the list, categorizing all other elements as "less than", "equal to", or "greater than" the pivot element. The sub-arrays are then sorted recursively, and combined at the end:
function quicksort(arr)
pivot = random i in 0 -> n-1
left = new array
center = new array
right = new array
for each element in arr
if element < pivot:
left.add(element)
else if element > pivot:
right.add(element)
else:
center.add(element)
return quicksort(left) + center + quicksort(right)
Quicksort has an \( O(n^2) \) worst case and an \( O(n \log n) \) best and average case. Implemented in-place, it uses \( O(\log n) \) auxiliary space for the recursion; the simple version above uses \( O(n) \).
Insertion Sort
Insertion sort is compared to the way people usually sort a hand of playing cards.
The premise is to go through the list one-by-one and examine the element at that index. If it is less than the element to its left, then swap the two elements and repeat, otherwise (or if it is at position 0) stop and move on to repeat this process with the next element in the list.
function insertionsort(arr)
    for i = 1...n-1
        val = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > val
            arr[j+1] = arr[j] // shift larger elements right
            j = j - 1
        arr[j+1] = val // insert into the gap
It has an \( O(n^2) \) runtime, although it is perhaps the best sorting algorithm for a list that is already mostly sorted. It can also be performed in place.
Merge Sort
Merge sort is one of the common real-world sorting algorithms. It is a divide and conquer algorithm, which starts by dividing the list into n sublists of one element each. A single-element list is, of course, sorted by default. From there, pair up each sublist with another and merge them together with logic as follows:
function merge(arr1, arr2)
    i = 0
    j = 0
    result = new arr
    while i < arr1.length or j < arr2.length
        if j == arr2.length or (i < arr1.length and arr1[i] <= arr2[j])
            result.add(arr1[i])
            i += 1
        else
            result.add(arr2[j])
            j += 1
    return result
The process of pairing up and merging sublists is repeated until there is only one list (the final, sorted list).
This takes \( O(n \log n) \) time in the worst case, and uses \( O(n) \) auxiliary space.
Timsort
Timsort was originally developed for use in the Python programming language. It is a stable algorithm, meaning that if two equivalent values (equal by comparison, but distinct, perhaps by reference) are in the list, their relative ordering will be maintained in the final product.
It is heavily based on merge sort and insertion sort. More information here.
In the worst case it takes \( O(n \log n) \) time. It has an advantage over quicksort when sorting object references or pointers.
Bucket Sort
Bucket sort works by creating k buckets representing a value or range of values. The list is then iterated through, placing the values in their relevant buckets, and then at the end the buckets are combined in order to create the sorted (or semi-sorted in the case that buckets represent ranges) list.
In most cases this is an inefficient sorting algorithm, taking at worst \( O(n^2) \) time, and on average \( O(n + \frac {n^2} k + k) \) time. In the case that k = n the average runtime is \( O(n) \), but the higher k is, the more space is used: the buckets require \( O(n + k) \) space.
This algorithm is most useful when the length of the list greatly exceeds the number of possible values. For instance, sorting a large number of people by age.
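The age example as a sketch, with one bucket per possible age (the (name, age) records and the max_age bound are illustrative assumptions):

```python
def bucket_sort_by_age(people, max_age=130):
    # people is a list of (name, age) pairs; with one bucket per age value,
    # concatenating the buckets yields a fully sorted (and stable) result.
    buckets = [[] for _ in range(max_age + 1)]
    for person in people:
        buckets[person[1]].append(person)     # drop each record into its age bucket
    return [person for bucket in buckets for person in bucket]
```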
Radix Sort
Radix sort is a form of bucket sort that avoids element-to-element comparisons by bucketing elements digit-by-digit in some radix, starting either at the most significant digit or the least significant.
It runs in \( O(w \cdot n) \) time, and has equivalent space complexity, where w is the number of bits required to store the keys for the elements based on their radix. More info here.
Selection Sort
Selection sort is an in-place sorting algorithm.
It is inefficient on large lists, and generally performs worse than both insertion sort and heapsort, but has advantages in its simplicity.
The general process is: starting at index 0, find the smallest element in the unsorted remainder and swap it into that index, then move to index 1 and repeat with the remaining unsorted elements.
It runs in \( O(n^2) \) time.
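The process above is short enough to sketch directly in Python (in-place, as described):

```python
def selection_sort(arr):
    # For each position i, find the smallest remaining element
    # and swap it into place.
    for i in range(len(arr)):
        smallest = i
        for j in range(i + 1, len(arr)):
            if arr[j] < arr[smallest]:
                smallest = j
        arr[i], arr[smallest] = arr[smallest], arr[i]
    return arr
```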
Heapsort
Heapsort is an improvement upon selection sort. In the step where selection sort finds the smallest element, heapsort keeps the remaining (unsorted) elements in a heap so as to make the process of finding the minimum unsorted value much faster. The worst case scenario is improved to \( O(n \log n) \). Because it uses an array implementation of a heap on the remaining unsorted sequence, it only needs to use a constant amount of auxiliary space.
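The idea can be sketched in Python; note this version leans on the standard library's `heapq` and an auxiliary list for brevity, whereas a true in-place heapsort builds a max-heap inside the input array itself:

```python
import heapq

def heapsort(arr):
    # Heapify is O(n); each of the n pops is O(log n),
    # giving O(n log n) overall.
    heap = list(arr)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```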
Other
Find the product of two square matrices (Strassen Algorithm)
The Strassen Algorithm for matrix multiplication is not the fastest known algorithm, but most faster algorithms are very complex and derived from the Strassen one, so it's worth knowing nonetheless.
The standard way of multiplying two 2x2 matrices, A and B (\( A_{1,2} \) corresponds to matrix A, row 1, column 2), is as follows:
- \( C_{1,1} = A_{1,1}B_{1,1} + A_{1,2}B_{2,1} \)
- \( C_{1,2} = A_{1,1}B_{1,2} + A_{1,2}B_{2,2} \)
- \( C_{2,1} = A_{2,1}B_{1,1} + A_{2,2}B_{2,1} \)
- \( C_{2,2} = A_{2,1}B_{1,2} + A_{2,2}B_{2,2} \)
This method requires 8 multiplications. The Strassen algorithm takes advantage of the fact that there are shared multiplications among the 8, and it can be narrowed down to 7:
- \( M_1 = (A_{1,1} + A_{2,2})(B_{1,1} + B_{2,2}) \)
- \( M_2 = (A_{2,1} + A_{2,2})B_{1,1} \)
- \( M_3 = A_{1,1}(B_{1,2} - B_{2,2}) \)
- \( M_4 = A_{2,2}(B_{2,1} - B_{1,1}) \)
- \( M_5 = (A_{1,1} + A_{1,2})B_{2,2} \)
- \( M_6 = (A_{2,1} - A_{1,1})(B_{1,1} + B_{1,2}) \)
- \( M_7 = (A_{1,2} - A_{2,2})(B_{2,1} + B_{2,2}) \)
Which are used as follows:
- \( C_{1,1} = M_1 + M_4 - M_5 + M_7 \)
- \( C_{1,2} = M_3 + M_5 \)
- \( C_{2,1} = M_2 + M_4 \)
- \( C_{2,2} = M_1 - M_2 + M_3 + M_6 \)
To calculate the product of two matrices of size NxN where N > 1, the Strassen algorithm divides each matrix into a 2x2 matrix of submatrices. In the case of odd N, there may be some 1x1 submatrices. The submatrices are multiplied recursively and so on until the result of the base matrix is produced.
It excels in multiplying matrices where \( N = 2^i \) for some integer i. It runs in \( O(N^{2.8074...}) \) time, but that's probably not worth memorizing.
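One level of the recursion can be sketched for plain 2x2 matrices in Python (0-based indices, so \( A_{1,1} \) in the math corresponds to `A[0][0]` here):

```python
def strassen_2x2(A, B):
    # Seven multiplications instead of the naive eight.
    m1 = (A[0][0] + A[1][1]) * (B[0][0] + B[1][1])
    m2 = (A[1][0] + A[1][1]) * B[0][0]
    m3 = A[0][0] * (B[0][1] - B[1][1])
    m4 = A[1][1] * (B[1][0] - B[0][0])
    m5 = (A[0][0] + A[0][1]) * B[1][1]
    m6 = (A[1][0] - A[0][0]) * (B[0][0] + B[0][1])
    m7 = (A[0][1] - A[1][1]) * (B[1][0] + B[1][1])
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]
```

In the full algorithm the scalar operations above become recursive submatrix multiplications and additions.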
Apply a Fourier Transform (Cooley-Tukey Algorithm, Fast Fourier Transform)
A Fourier transform has many uses in the real world, but it is a complex thing. The Fast Fourier Transform is an efficient way to compute a discrete Fourier transform, and the Cooley-Tukey algorithm is the most common FFT algorithm. More info here.
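For reference, the radix-2 Cooley-Tukey recursion is short enough to sketch in Python (assuming the input length is a power of 2):

```python
import cmath

def fft(x):
    # Recursive radix-2 Cooley-Tukey FFT: split into even- and
    # odd-indexed halves, transform each, then combine using the
    # "twiddle" factors e^(-2*pi*i*k/n).
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    result = [0] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result
```

This brings the \( O(n^2) \) naive DFT down to \( O(n \log n) \).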
Given the feedback history of a function, a current value, and a desired value, determine what new value applied to the function will likely yield the desired value (PID Controller)
As the section title suggests, PID controllers estimate what input to a function will yield the desired output based on the history of the function's inputs and outputs, and which state the context is in.
Another complex operation involving math principles. More info here.
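A minimal discrete sketch in Python (class name and gains are illustrative; real controllers add concerns like integral windup protection):

```python
class PID:
    # The output is the sum of terms proportional to the current
    # error, the accumulated (integral) error, and the rate of
    # change (derivative) of the error.
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

Each call nudges the control output based on how far off the measurement is and how that error has been trending.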
Factor an integer (General Number Field Sieve)
The General Number Field Sieve is a classical way of factoring integers. Another complex operation involving math principles. More info here.
Data Stores
CAP Theorem
It is impossible for a distributed data store to guarantee more than 2 of the following:
- Consistency: Every read gets the most recent write (or an error).
- Availability: Every request receives a non-error response.
- Partition Tolerance: The system will function regardless of time delay or dropped messages.
It is implied that a data store with network partitioning will therefore have to choose between consistency and availability.
ACID Transactions
An ACID transaction is one which satisfies the following 4 characteristics:
- Atomicity: Each transaction, regardless of how complex, will either completely succeed or completely fail, so there are no partial updates to the data store.
- Consistency: Transactions can only bring the database from one valid state to another.
- Isolation: Transactions processed concurrently are guaranteed to have the same end result as if they were processed sequentially.
- Durability: Once a transaction has been committed it will remain committed, regardless of system availability (i.e. data is stored in non-volatile memory).
Note the difference between the "consistency" of an acid transaction and "consistency" in the CAP theorem.
BASE Models
A data store that follows the BASE model is one which satisfies the following 3 characteristics:
- Basic Availability: The data store stays available and able to process transactions, even if some nodes/clusters/partitions have failed.
- Soft State: The state of the store may change over time, even without new input, because consistency is not guaranteed at every moment; handling in-flight inconsistency is delegated to developers instead of the database.
- Eventual Consistency: At some point, data will converge to a consistent state.
The "eventual consistency" clause need only be proven theoretically.
Joins
Traditionally, a join is an operation that combines data from two tables in a relational database management system. In practice though, many NoSQL environments can accommodate joins in some form, even if it is not built-in functionality.
Inner Join
Common Syntax: SELECT Employee.Name, Department.Name FROM Employees AS Employee INNER JOIN Departments AS Department ON Employee.DepartmentId = Department.DepartmentId
Effect: Combines rows in the Employees table with rows in the Departments table that have the same DepartmentId, ignoring nulls and rows with no match.
Left Outer Join
Common Syntax: SELECT Employee.Name, Department.Name FROM Employees AS Employee LEFT OUTER JOIN Departments AS Department ON Employee.DepartmentId = Department.DepartmentId
Effect: Combines rows in the Employees table with rows in the Departments table that have the same DepartmentId. It will keep all rows in the Employees table, and if there is no matching row in the Departments table it will use null for the Department values (Department.Name in this case). It will still ignore nulls and rows with no match from the Departments table.
Right Outer Join
Common Syntax: SELECT Employee.Name, Department.Name FROM Employees AS Employee RIGHT OUTER JOIN Departments AS Department ON Employee.DepartmentId = Department.DepartmentId
Effect: Combines rows in the Departments table with rows in the Employees table that have the same DepartmentId. It will keep all rows in the Departments table, and if there is no matching row in the Employees table it will use null for the Employee values (Employee.Name in this case). It will still ignore nulls and rows with no match from the Employees table. This is not a common join since it is equivalent to a Left Outer Join where the FROM and JOIN tables are swapped.
Full Outer Join
Common Syntax: SELECT Employee.Name, Department.Name FROM Employees AS Employee FULL OUTER JOIN Departments AS Department ON Employee.DepartmentId = Department.DepartmentId
Effect: Produces the union of a Left Outer Join and a Right Outer Join (all rows from both tables will be accounted for, and nulls will be inserted where matches weren't found).
Cross Join (A.K.A. Cartesian Join)
Common Syntax: SELECT Employee.Name, Department.Name FROM Employees AS Employee CROSS JOIN Departments AS Department
Effect: Produces the cartesian product of the two tables. A new row will be created for every combination of rows between the two tables.
Deciding What To Use
So the lingering question is, "When do I use NoSQL, and when do I use SQL?" Well, in general, NoSQL is simply more flexible, and can be used in pretty much any situation. Most of the benefits of SQL are things that can be emulated for NoSQL within the calling code, but that does add complexity. Additionally, in the modern development environment, you can use different databases for different tasks, so keep that in mind.
Here are some things to look for that might indicate SQL being better for your use case:
- You need to accommodate a wide array of complex queries that combine results across different subdomains in your application (e.g. for generating reports).
- You need to ensure ACID compliance (e.g. for financial applications).
- You don't anticipate a lot of changes or growth.
You might instead look into using NoSQL in the following cases:
- You have budget restrictions (SQL is more expensive to scale).
- The data structures being managed are volatile/variable.
- You plan to analyze large amounts of data, but don't need to manipulate/write data in such quantities.
Other, more specific use cases for NoSQL include:
- Event capture and processing.
- Storing data for intelligence engines to use.
Relational Databases
Relational data stores are the ones most people are comfortable with. They can be categorized by the following:
- Data is stored as relationships, in tabular form, usually representing an entity type.
- Robust support for joins.
- Requirement of ACID transactions.
- Out-of-the-box support for relational algebra (including aforementioned joins).
A core aspect of designing relational databases is the normalization of data. That is: The process of structuring a relational database to reduce redundancies in the data and improve data integrity. Normalization is an incremental process, and the varying degrees of normalization are as follows:
- 1NF (First Normal Form): Each column of a table must be atomic (contain non-separable values, such as strings, integers, and dates, but not lists). To achieve this, you can usually split the non-atomic columns into their own tables and use a key to relate them to the original table.
- 2NF (Second Normal Form): Each non-key attribute (column) must be dependent on the whole table key (not just part of it). To achieve this, you can usually move a part of the key and its dependencies to a new table.
- 3NF (Third Normal Form): Non-key attributes in the table are exclusively dependent on the key. In other words: No non-key attribute should depend on another non-key attribute. This can usually be achieved by moving the non-key attribute in question to a new table where its dependency is a key.
- BCNF (Boyce-Codd Normal Form): Each attribute represents a fact about "the key, the whole key, and nothing but the key". This is often fixed by moving the attribute in question to a new table with only the parts of the original table's key that it cares about. This is an intermediary step between 3NF and 4NF but came about after 4NF, which is why it itself is not 4NF.
- 4NF (Fourth Normal Form): Every multi-value dependency (2+ values) is either trivial (every {x, y, ...} combination is unique in the table for that relationship) or the independent key in the dependency is a candidate key (only exists once in the table). This is best demonstrated with an example: Suppose you have a table with a key {Restaurant Name, Pizza Type, Delivery Area}. In this example pizza type and delivery area both depend on the restaurant, but do not depend on each other, so it can be split into two tables, one with a {Restaurant Name, Delivery Area} key, and one with a {Restaurant Name, Pizza Type} key.
- 5NF (Fifth Normal Form): Every non-trivial join relationship is implied by the keys in the table. You can verify 5NF by trying to split a table into new tables and then joining the split tables together. If there is no way to split the original table in a way that the new tables can be joined together to look identical to the original table, then that table satisfies 5NF.
There are actually other normal forms, but they are more nuanced and not as frequently followed.
NoSQL
NoSQL databases are exactly what their title claims they are: non-SQL databases. The hallmarks of a NoSQL database are as follows:
- Data storage is flexible by virtue of not being tied to relationships.
- Adherence to the BASE model.
- Efficient implementations of partitioning, clustering, etc. for distribution of data.
Unlike relational systems, which are commonly designed around the entities and normalization of data (relationships are handled later via joins), NoSQL databases are commonly designed with runtime queries in mind. Stores might be created specifically for one query, and there may be redundancies across stores for that reason. Often, redundancies can be avoided, but sometimes they can't be in the name of performance.
There are many NoSQL data stores to choose from, but the most common ones fall under one of these categories:
- Key-Value: These act like dictionaries; They tie a key to a value. These are good for straight-forward lookups.
- Document: Document stores are a subclass of Key-Value stores wherein all the data is contained in a document, which could have any structure desired (similar to objects in programming). Documents in a database usually do not need to have matching schema. The data in key-value stores is inherently opaque to the database, whereas a document store relies on the internal structure of the document to extract metadata that the database engine can use to optimize transactions. This difference is usually moot.
- Columnar: Columnar data stores store data in columns instead of rows. Traditional, row-based storage will serialize complete records together (e.g. for a "Person" with a "Name" and "Age", you might see John Smith, 30 and Daisy Hopkins, 29 as the rows), whereas a columnar store will serialize complete columns together (for the previous example: Record 1: John Smith, Record 2: Daisy Hopkins and Record 1: 30, Record 2: 29). The traditional, row-based storage lends itself to drawing relationships between tables, but for simple queries, the columnar store will be much more performant.
- Graph: Graphs store data...in a graph! This adds a layer of relationships to objects in the store, making it useful for storing information about networks and groups, as well as lending itself to pathfinding queries.
Design Patterns
Creational
Structural
Behavioral
Functional
Concurrency
Architectural
Cloud Distributed
Testing
Other
Anti-Patterns
Interview Prep
In addition to the questions covered in this chapter, which can generally be studied via flash cards and practice, here are some general interviewing tips:
- Update your personal website (don't have one? make one)
- Update any professional social media profiles (e.g. LinkedIn)
- Update your resume (create a copy specifically for the position/company you're applying to) and CV
- Research the company
- Write a cover letter and send it either with your resume or after you submit your resume (you can look up what to write, there are many examples online)
- If the interview is planned more than a week in advance, it may be a good idea to write a confirmation letter to the interviewer a day or two before the interview. For example: >I'm writing to confirm our meeting for Thursday at 9:30 AM. I look forward to discussing my application for the software engineering position at [Company Name]. Please contact me if you have any last-minute questions or comments.
- Bring a clean notepad and pen, copies of your resume, a list of references (if not included on resume), ID and Social Security card (in case you're asked to complete an application on the spot)
- Write down the name of your interviewer(s) and the position and company you are interviewing for beforehand if you think you might forget any of those
- Turn off your phone for the interview
- Write a followup letter after the interview (next day) (you can look up what to write, there are many examples online)
Personal/Anecdotal Questions
Your goal when answering any of these questions is basically to talk about how amazing you are without sounding braggadocious.
In addition to the common questions below, anything on your resume is fair game; if you talked about specific projects on your resume, you should be prepared to answer questions about them.
Why are you interested in this position?
Talk about you, and mention how passionate/enthusiastic you are about this position.
What makes you a good fit for this position?
If possible, connect your skills directly to a problem the company has, and describe how you can solve it.
What are your greatest strengths?
Brag about yourself, but make it relevant to the position.
What are your greatest weaknesses?
Your "weaknesses" should actually be strengths, but phrased like something you want to improve upon.
Sometimes I like to include a bit about how I've improved myself based on peer/supervisor feedback to demonstrate a drive to improve. Doing so requires a bit more honesty, since you probably won't get feedback about "weaknesses that aren't really weaknesses" from your coworkers, so I don't do this with managerial interviewers, but I might throw it in when I'm interviewing with people who would be peers if I got the job. Also, you might specifically get a question about what kind of criticism you've had from coworkers, in which case it's probably best to be honest but choose the least damaging piece of criticism you've received.
How would you handle a dispute with a co-worker?
Generic answer: compromise.
Think of a project you worked on...
- What about the project was challenging? / What challenges did you face on the project?
- What was an obstacle that you faced while working on the project, and how did you overcome it?
- What did you learn from the project?
Pick a story where you're the hero if possible. Even if you aren't specifically asked to talk about how the project helped you grow, you should still talk about it anyway.
What made you want to leave your previous/current job?
Whatever the real reason is, just say you're looking for "greener pastures" (i.e. a place to grow and expand your horizons). If you were to trash talk your old office it would make you look like a gossiper, and while there are other inoffensive and legitimate reasons to want to leave a job, none will likely sound as good as saying you want to grow as a professional and that you couldn't continue to do so at your old/current job.
What would you change/improve about your previous/current job?
This is a hard question to prepare for. You don't want to trash talk your old workplace for the reasons mentioned above, or lie about a weakness in your old company that doesn't exist, and you also don't want to propose a change that highlights a potential weakness in the company you're applying to. A good example would be upward movement: many small companies don't have programs to elevate your position, whereas larger companies do, so if you're leaving a small company and applying to a big one you could say that. Long story short: bring up an honest disadvantage of your previous job, and mention a way to improve it that is already implemented at the company you're applying to.
Do you think GPA is an accurate representation of work ethic?
Don't take a side, even if your GPA is awesome or garbage. Say that the two might be/probably are correlated, but a good GPA doesn't necessarily indicate a good work ethic.
What's your opinion of [political/religious topic]?
Regardless of your opinion, lie if necessary and take a middle-road stance. Questions about politics and religion have no place in an interview, and are even illegal in many places. You may be within your rights to sue the company if you don't get the job and it seems like it could have been a result of this question. Even if you get the job, the kind of place that asks this question probably isn't the kind you should be working for.
What are your hobbies outside of work?
Be honest, but pick your most attractive hobbies from a professional's perspective. Playing video games isn't really an attractive hobby. Any hobby in which you make something (whether it's coding or not) would be good to include on your list.
Occupational Questions
These are questions that will test your knowledge of the trade and your problem solving skills.
Short Answer
Note: In addition to what is provided here, you may be asked questions about anything you have put on your resume (e.g. How does X work in the Y programming language?).
Q: What is \( 2^x \)? Note: \( x \) will rarely be more than 12.
Answer
2, 4, 8, 16, 32, 64, etc.
Q: What is the difference between a struct and a class, and what might you use a struct for? Note: This is somewhat language-dependent (not all languages even have both).
Answer
Structs usually give their properties and methods public modifiers by default, and in some modern languages they represent value types as opposed to reference types. Usually structs are used for simple objects or types, which don't need complex methods. In general, a struct might be appropriate when the data type is going to be small, and either it is expected to be short-lived or it is expected to be frequently embedded in other objects.
Q: What is a static class/method/variable?
Answer
- Class: A static class is one which cannot be instantiated.
- Method: A static method is one which is not tied to an instance of an object.
- Variable: A static variable is one which is shared across instances of an object.
Q: What do the 'public', 'protected', 'private', and 'internal' keywords denote?
Answer
These keywords refer to the accessibility of their subjects:
- Public: Anyone can access this from anywhere.
- Protected: When present in languages, it usually means only child classes can access this, and they can do so from anywhere.
- Private: Only this object can access this.
- Internal: When present in languages, it usually means anyone can access this, but only from within this package/library.
Q: What does being immutable entail, and what keywords can modify mutability?
Answer
If something is immutable then it cannot be changed.
- The 'const' keyword can make its subject immutable at compile time (in addition to being static).
- The 'readonly' keyword, when present in a language, can make its subject immutable at run time. Its value can only be set once, and usually only in the constructor of a class.
Q: What does the 'final' keyword denote?
Answer
When present in a language, it prevents a class from being inherited or a method from being overridden.
Q: What is idempotence?
Answer
A function is said to be idempotent if repeated function calls with the same input will only change the state of the application once.
- For example: getting a value and setting a value are idempotent; the end result is always the same after the first call is made, regardless of repetition. However, adding to a list is not idempotent, because repeated calls to add will keep modifying the list further.
- In mathematical terms, a function \( f \) is idempotent if \( f \circ f = f \).
Q: What is lazy bind/loading?
Answer
Waiting until a resource is needed before you bind/load it.
Q: What is reflection?
Answer
Code's ability to inspect itself or other code in the same program.
Q: What are the 4 pillars of object-oriented design?
Answer
- Abstraction: The process of showing only the necessary features of an object to other code.
- Inheritance: The ability to make a hierarchy of classes wherein child classes acquire certain properties and behaviors of parent classes.
- Polymorphism: The ability of a child class to both share and extend the properties and behavior of a parent class.
- Encapsulation: The process of wrapping properties and behavior in an object so that access to each property and function can be controlled, and so that properties and behavior can be related to each other.
Q: What's the difference between inheriting and interfacing?
Answer
Inheritance defines what an object is, whereas interfacing defines what an object can do.
Q: What is a pure function?
Answer
A function with no side-effects that always returns the same output for a given input.
Q: What is NP-Completeness?
Answer
A problem is said to be NP if it has a non-deterministic polynomial-time solution, and a problem is said to be NP-Hard if every NP problem can be transformed into it in polynomial time. For a problem to be NP-Complete means that it is both NP and NP-Hard.
Q: What is a lock/mutex?
Answer
A mechanism for controlling access to a resource. Unlike semaphores, only one task can gain access to a lock/mutex at a time in order to access the resource.
Q: What is a semaphore?
Answer
A variable used to signal the availability of a resource. Unlike locks/mutexes, a semaphore can allow multiple tasks to access a resource; a semaphore value of 0 indicates that no other tasks are allowed access to the resource, and a value of more than 0 allows that many more tasks to access the resource. Consequently, a binary semaphore (a semaphore only allowed to be 0 or 1) can be used to implement a lock/mutex.
Q: Explain Agile software development.
Answer
Agile software development is a practice that promotes self-organization and cross-functionality within teams. It puts an emphasis on continual improvement and constant interactions with end users to be able to adjust the product as needed.
Whiteboard
Competitive tech companies are notorious for giving computer science applicants a barrage of whiteboard problems. The thing that distinguishes these problems from other common interview problems is their long form and focus on critical thinking, as opposed to rote memorization. These come in the form of "here's what you know, here's what you want to know, write an algorithm to find out what you want to know." You probably won't be given a computer, you probably won't be allowed to use your phone, and you probably won't be given a piece of paper; it's just you and a whiteboard.
Even if you're well-acquainted with the algorithms chapter you may still find these problems difficult, but there are some steps you can take to alleviate the challenge and impress the interviewer:
- Know the inputs, outputs, requirements, and constraints.
- Ask the interviewer about these if you are unsure.
- Usually you aren't given the method signature of the solution. Instead, the question might be simply, "Given a list of people, sort them by age". Here, you can ask questions like, "What should a person object look like?" (they might just have you create it to see what you come up with). Such questions can apply to outputs as well. Even better, you could ask, "Can I assume that the maximum age of a person is 150?" That question could be the difference between implementing quick sort and implementing bucket sort.
- Common requirements and constraints might revolve around runtime, complexity, memory usage, etc. It's likely that you won't receive such requirements/constraints if they are not mentioned by the interviewer. However, if you solve the problem quickly, they may try to expand upon the question by asking you to solve it under certain constraints or with certain requirements.
- Think out loud. Interviewers want to hear your thought process, and sometimes they might correct your course if they can tell you're going down the wrong path.
- Don't try to write syntactically correct programs (unless asked). Write pseudo-code. It's faster, and you reduce the risk of syntactical errors ruining your solution.
- If you can't think of a solution immediately, then start by trying to recall any algorithms that might solve the problem with a little transformation. A well-known example of this is the bipartite matching problem, which can be transformed into a maximum flow problem, which has a known algorithmic solution.
- If you can't recall an algorithm that matches this problem, then pick a top-level algorithm and design a new algorithm yourself (refresher: top-level algorithms include brute force, divide & conquer, greedy method, and dynamic programming).
- Don't forget about recursion. Interviewers seem to love recursion. If you can solve a problem in a simple way with recursion, do it!
- Address all possible inputs. Usually edge cases can be solved with simple if statements, so it can be beneficial to address them last if time is a concern.
- Remember: negatives, zeros, positives, minimums, maximums. Aside from those, it's situational (e.g. if using division with a variable denominator, you need to check that the denominator can never be 0).
- If you solve a problem quickly, and you notice that there are places in the solution that an exception could be thrown if it were converted to real code on a machine, then you might sound smarter if you ask if you should handle possible exceptions. This is rarely encountered, and rarely a requirement, just something to keep in mind if you breezed through the problem.
NOTE: Problems covered in the algorithms chapter and in the general knowledge chapter are not included (e.g. Traveling Salesman, Levenshtein Distance, Value Swap).
Activity Selection
Given n activities and their start/end times, determine the maximum number of activities (and which activities those are) that a single person can participate in, assuming they cannot do multiple activities at once.
A variation of this would be to not maximize the number of activities, but rather minimize the time in which you are not participating in an activity.
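The classic greedy solution to the first variant repeatedly takes the earliest-ending activity that is compatible with what has been chosen so far; a sketch in Python (activities as (start, end) tuples):

```python
def max_activities(activities):
    # Greedy: sort by end time, then take each activity that starts
    # at or after the end of the last one chosen.
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(activities, key=lambda a: a[1]):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen
```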
Huffman Encoding Tree
A Huffman code is a type of prefix code used for lossless compression.
Given a set of symbols and their weights, find a set of codewords with minimum expected length (a tree with minimum weighted path length from the root).
Basically, you want to create a tree where each leaf node is one of the symbols given. This is most easily made as a binary tree on paper, and generating the code for a symbol would thus involve treating left branches as adding a 0 and right branches as adding a 1. In general, you would like the heaviest-weighted symbols to have the smallest codes, since they will be encountered the most often when compressing a file.
It is also essential that codes cannot be confused. For example, if one symbol had the code 00 and another had the code 001 and another 1, then it would be impossible to tell in an encoding whether it means two symbols (00 and 1) or one (001). For this reason, the symbols will all need to be leaves in the tree.
For example: given a = 0.1, b = 0.15, c = 0.30, d = 0.16, and e = 0.29, the Huffman codes would be 010, 011, 11, 00, and 10, in that order.
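The tree construction can be sketched with a min-heap in Python (hypothetical helper name). This version computes only each symbol's code length, i.e. its depth in the tree, since the actual 0/1 labels depend on an arbitrary left/right choice at each merge:

```python
import heapq

def huffman_code_lengths(weights):
    # Build the tree bottom-up: repeatedly merge the two lightest
    # nodes. Each merge pushes the symbols in those subtrees one
    # level deeper, i.e. adds one bit to their codes.
    heap = [(w, [sym]) for sym, w in weights.items()]
    heapq.heapify(heap)
    depth = {sym: 0 for sym in weights}
    while len(heap) > 1:
        w1, syms1 = heapq.heappop(heap)
        w2, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            depth[sym] += 1
        heapq.heappush(heap, (w1 + w2, syms1 + syms2))
    return depth
```

For the weights in the example, the resulting code lengths (3, 3, 2, 2, 2) match the codes given.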
Closest Pair of Points
Given a set of points in a two-dimensional plane, determine which pair of points is the closest.
You can, of course, challenge yourself to do this in 3-dimensional space, or beyond.
Longest Increasing Subsequence
Given an array of numbers, determine the start and stop index of the longest sequence of increasing numbers in the array (the value at index i must be greater than the value at index i - 1 for each i in the sequence).
Longest Common Subsequence
Given two strings, determine the longest subsequence the two share (the characters must appear in the same order in both strings, but need not be contiguous).
Hamming Distance
Perhaps a bit trivial: Given two equal-length strings, determine how many substitutions it would take to convert one to the other.
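A sketch in Python:

```python
def hamming_distance(a, b):
    # Count the positions at which the two equal-length strings differ.
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))
```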
Bipartite Matching
Given n applicants and m jobs, as well as the knowledge of which jobs the applicants have applied to, determine the maximum number of applicants that can end up with jobs, and to which job each applicant would go in that case.
Graph Coloring
Given a graph and n colors, find a way to color each vertex of the graph such that no adjacent vertices share a color.
Topological Sorting
List the vertices of a directed acyclic graph in such a way that, for every edge uv, vertex u comes before vertex v. There can be more than one correct solution for some graphs.
Point Containment
Given a series of points on a 2-dimensional plane, determine the corners of the smallest box that encapsulates all points.
Can you extend this to 3-dimensional space?
Point Containment 2
Given a polygon in a 2-dimensional plane, and a point, determine if the point is within the polygon.
Can you extend this to 3-dimensional space?
Closest Point
Given a starting point in 2-dimensional space and a list of other points in that space, determine which of the "other" points is closest to the starting point.
Can you extend this to 3-dimensional space?
Knapsack
Given a weight limit and a list of objects that have a value and a weight, determine the maximum value of objects you can take without exceeding the weight limit.
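The 0/1 variant has a classic dynamic programming solution; a sketch in Python (items as (value, weight) tuples, integer weights):

```python
def knapsack(items, capacity):
    # best[w] is the maximum value achievable with total weight at
    # most w. Iterate weights high-to-low so each item is used at
    # most once.
    best = [0] * (capacity + 1)
    for value, weight in items:
        for w in range(capacity, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[capacity]
```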
FizzBuzz
Print the numbers from 1 to 100, except print "fizz" instead of multiples of 3, print "buzz" instead of multiples of 5, and print "fizzbuzz" instead of multiples of both 3 and 5.
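A direct Python solution (returning a list makes it easy to test; checking multiples of 15 first handles the "both" case):

```python
def fizzbuzz(n=100):
    """Return the fizzbuzz sequence from 1 to n as a list of strings."""
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:
            out.append("fizzbuzz")
        elif i % 3 == 0:
            out.append("fizz")
        elif i % 5 == 0:
            out.append("buzz")
        else:
            out.append(str(i))
    return out

fizzbuzz(5)  # → ['1', '2', 'fizz', '4', 'buzz']
```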
Parentheses Validation
Given a string containing only parentheses, determine if each parenthesis is matched correctly.
For example: (()())((())) is correct, but )(())()( is not. Essentially, each open parenthesis must have a matching close parenthesis after it, and each close parenthesis must have a matching open parenthesis before it (not necessarily immediately before or after).
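With only one kind of bracket, a counter suffices: the string is balanced iff the running open-paren count never goes negative and ends at zero. A sketch:

```python
def valid_parens(s):
    """Return True iff a string of '(' and ')' is correctly matched."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False  # a ')' with no matching '(' before it
    return depth == 0

valid_parens("(()())((()))")  # → True
valid_parens(")(())()(")      # → False
```

For multiple bracket types ((), [], {}), the counter generalizes to a stack.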
Generate Parentheses
Given an integer, n, generate a list of all possible strings with n correctly matched parentheses.
For example: When \( n = 3 \), the possible strings will be ((())), (()()), (())(), ()(()), and ()()().
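A standard backtracking sketch: extend the partial string with '(' while opens remain, and with ')' while there are unmatched '(' to close:

```python
def gen_parens(n):
    """Generate all strings of n correctly matched parenthesis pairs."""
    results = []

    def build(s, opens, closes):
        if len(s) == 2 * n:
            results.append(s)
            return
        if opens < n:
            build(s + "(", opens + 1, closes)
        if closes < opens:
            build(s + ")", opens, closes + 1)

    build("", 0, 0)
    return results

gen_parens(3)
# → ['((()))', '(()())', '(())()', '()(())', '()()()']
```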
Reverse Integer
Given an integer, n, reverse its digits.
For example: When n is 321, its reverse would be 123.
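A sketch using repeated divide/modulo by 10 (in a fixed-width language like C or Java, a real answer should also discuss overflow):

```python
def reverse_int(n):
    """Reverse the decimal digits of n, preserving sign (a sketch)."""
    sign = -1 if n < 0 else 1
    n = abs(n)
    result = 0
    while n:
        result = result * 10 + n % 10  # peel off the lowest digit
        n //= 10
    return sign * result

reverse_int(321)   # → 123
reverse_int(-120)  # → -21
```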
Factor Combinations
Given a number, n, find all combinations of factors of n, besides 1 and n, that can multiply to produce it.
For example: When n is 12, the factor combinations are (6, 2), (3, 4), and (2, 2, 3), because each of those lists contains only factors of 12, and the numbers in each list multiply together to produce 12. NOTE: (6, 2) and (2, 6) are considered equivalent here, so only one is included.
Fraction to Recurring Decimal
Given a fraction with a recurring decimal, \( \frac a b \), output the decimal value, marking the repeating portion with an underscore (_).
For example: When a is 1 and b is 3, output 0.3_
Anagrams
Given two strings, determine if they are anagrams (two words that contain exactly the same letters, and the same amount of each letter to boot).
As an additional challenge: Given an array of strings, determine if they are all anagrams.
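Comparing letter counts handles both versions; in Python, collections.Counter does the counting for you:

```python
from collections import Counter

def are_anagrams(s, t):
    """Two strings are anagrams iff their letter counts match."""
    return Counter(s) == Counter(t)

def all_anagrams(words):
    """A list of strings are mutual anagrams iff each matches the first."""
    return all(Counter(w) == Counter(words[0]) for w in words)

are_anagrams("listen", "silent")              # → True
all_anagrams(["listen", "silent", "enlist"])  # → True
```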
Longest Non-Redundant Substring
Given a string, determine the starting and ending index of the longest contiguous sequence of distinct characters in the string.
For example: In the string abacdbca, the longest runs of distinct characters are bacd (start index 1, end index 4), acdb (start index 2, end index 5), and dbca (start index 4, end index 7).
As an additional challenge: Find the longest substring with at most k distinct characters.
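The usual technique is a sliding window: track the last index of each character, and when a repeat falls inside the window, jump the window start past the previous occurrence. A sketch that returns the first longest run:

```python
def longest_distinct_run(s):
    """Return (start, end) inclusive indices of the first longest
    substring with all-distinct characters (a sketch)."""
    last_seen = {}
    start = 0
    best = (0, 0)
    for i, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1  # move past the previous occurrence
        last_seen[ch] = i
        if i - start > best[1] - best[0]:
            best = (start, i)
    return best

longest_distinct_run("abacdbca")  # → (1, 4)
```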
Minimum Path Sum
Given an m x n grid of non-negative numbers, find a path from top left to bottom right which minimizes the sum of all the numbers touched along the path.
For example, the minimum path through
[
[1,3,1],
[1,5,1],
[4,2,1]
]
is (0, 0) -> (0, 1) -> (0, 2) -> (1, 2) -> (2, 2).
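Assuming moves are restricted to right and down (the usual convention), this is a simple DP: the best cost to reach each cell is its own value plus the cheaper of the cells above and to the left. A sketch:

```python
def min_path_sum(grid):
    """Minimum right/down path sum from top-left to bottom-right (a sketch)."""
    rows, cols = len(grid), len(grid[0])
    dp = [row[:] for row in grid]  # copy so the input is untouched
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                dp[i - 1][j] if i > 0 else float("inf"),
                dp[i][j - 1] if j > 0 else float("inf"),
            )
            dp[i][j] += best_prev
    return dp[-1][-1]

min_path_sum([[1, 3, 1], [1, 5, 1], [4, 2, 1]])  # → 7
```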
Tower of Hanoi
Given m pegs and n discs (smallest to largest from top to bottom) which start on the first peg, determine what steps must be taken to move the discs to peg k and have them end up stacked in the same order in which they began.
There are rules for what steps you can perform, though:
- Only one disc can be moved at a time
- You may only move the top-most disc on a peg
- You may only place a disc on top of a larger disc
The Tower of Hanoi problem specifically dictates that m is 3, although you can challenge yourself to consider more (the same algorithm can be reused, but it could also be made more efficient).
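For 3 pegs the classic recursion is: move the top n - 1 discs out of the way onto the spare peg, move the largest disc, then move the n - 1 discs back on top. A sketch returning the list of moves:

```python
def hanoi(n, source, target, spare):
    """Return the list of (from_peg, to_peg) moves for n discs (3 pegs)."""
    if n == 0:
        return []
    return (hanoi(n - 1, source, spare, target)   # clear the way
            + [(source, target)]                  # move the largest disc
            + hanoi(n - 1, spare, target, source))  # restack on top

hanoi(2, "A", "C", "B")  # → [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

This uses \( 2^n - 1 \) moves, which is optimal for 3 pegs.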
Dining Philosophers
Five silent philosophers sit at a round table with bowls of spaghetti. Forks are placed between each pair of adjacent philosophers.
Each philosopher must alternately think and eat. However, a philosopher can only eat spaghetti when they have both left and right forks. Each fork can be held by only one philosopher and so a philosopher can use the fork only if it is not being used by another philosopher. After an individual philosopher finishes eating, they need to put down both forks so that the forks become available to others. A philosopher can only take the fork on their right or the one on their left as they become available and they cannot start eating before getting both forks.
Eating is not limited by the remaining amounts of spaghetti or stomach space; an infinite supply and an infinite demand are assumed.
The problem is how to design a discipline of behavior (a concurrent algorithm) such that no philosopher will starve; i.e., each can forever continue to alternate between eating and thinking, assuming that no philosopher can know when others may want to eat or think.
This is more of a conceptual problem, as most programming languages have the functionality to achieve this built in (you just have to know how to use it correctly). Concurrency was not discussed much in this guide, aside from a brief explanation in the design patterns chapter. For help, you can research semaphores, mutexes, and locks.
Eight Queens
Place n queens on an n x n chessboard such that no two queens threaten each other (no two queens share the same column, row, or diagonal).
If you figure out the solution by hand then it will be straightforward to convert to code, so don't try to solve the problem with guess and check as that would defeat the purpose; try to come up with an algorithmic way of arriving at the solution. Of course, reverse engineering is a valid skill so you could just ignore this advice.
Two Generals
The two generals' problem is another real-world problem, except this time it is merely food for thought, as the problem has no solution. It is therefore not really a whiteboard problem, but you might still be asked about this at an interview and tricked into trying to solve it just to see your thought process.
Suppose there are two generals and their armies on either side of a castle, and they would like to coordinate to plan their attack, as they cannot win if only one army attacks at a time. However, it is a risky venture to contact one another as they will have to send someone through the castle grounds to the other side, and there's no guarantee that the person makes it. One general would like to start the joint attack at 8:00 am, so they send a messenger to the other general and decide "if my messenger does not come back then I will send another, otherwise if my messenger comes back then I know the other general got the message". The other general receives the message and tells the messenger to return once it has responded to the first general, "if the messenger does not come back then I don't know if the other general got my response--I will have to send another messenger". And so you see the conundrum. Neither general can have complete certainty that the other general is certain about the plan.
Scheduling
Given n students and m classes with a capacity and a start/end time, and each student has a known ranking of classes they would like to take, attempt to maximize total happiness by assigning students to classes. The only rules are that the classes cannot exceed their capacity, and students cannot take two classes at the same time.
A variation of this problem says that students need k minutes between classes to travel from classroom to classroom. Another says that some classes are dependent on others...this is a real-world problem so it can be extended in a lot of ways.
Jump Game
Given an array of positive integers, determine whether the "jump game" will end given a starting point, i.
The jump game is where you move forward in the array a number of "spaces" equal to the value at your current position. The game ends if you can reach the last element of the array. The game cannot end if you are unable to move to a valid space, or if you land on a 0 that is not the last position of the array.
There are several variations of this problem that you can tackle:
- The array can contain negative integers, which will cause you to move backwards.
- You can move a number of spaces up to the amount shown on your current space (absolute value to the left if using negative numbers).
- You are allowed to move past the end of the array, and wrap around to the other side (in which case the game will only fail to end if you encounter a 0 or a loop).
- You get to choose whether you want to move that many spaces forward or backward.
- You are not only given a starting index, i, but also an ending index, j (the same rules apply, except the game only ends if you can land on j).
- The game only ends when you land on a 0.
The most complex ruleset for a challenge would be one in which you are given the choice of whether to move forward or backward, as well as the ability to loop around the array, as well as a target ending index.
Bit Reversal
Given an integer, n, reverse its bits.
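The bit width matters here, since leading zeros become trailing zeros after reversal. A sketch that shifts bits out of one end and into the other:

```python
def reverse_bits(n, width=32):
    """Reverse the low `width` bits of n (a sketch)."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (n & 1)  # append n's lowest bit
        n >>= 1
    return result

reverse_bits(0b1011, width=4)  # → 0b1101 (13)
```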
String Permutations Around Capitals
Given a string, output all permutations of the string wherein capitalized letters remain in the same position.
For example: The string abCd can be permuted as itself and as adCb, baCd, bdCa, daCb, and dbCa.
A variation of this problem may have you only permute characters within the confines of their surrounding capital letters. Other common variations simply change the pivot criteria (instead of keeping capital letters in the same place, you have to keep hyphens in the same place, and so on).
Find Duplicates
Given an array of integers, find all duplicates (the 2nd or greater instance) of numbers in the array.
Challenge yourself to do this in \( O(n) \) time and \( O(1) \) wasted/extra space.
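The \( O(n) \) time / \( O(1) \) extra space version relies on an extra assumption (common in interview phrasings): every value lies in the range 1..n, where n is the array length, so the array itself can serve as the bookkeeping. Visiting value v flips the sign at index v - 1; a second visit finds it already negative. A sketch:

```python
def find_duplicates(nums):
    """Report duplicated values, assuming all values are in 1..len(nums).
    Uses sign-marking for O(n) time and O(1) extra space (a sketch)."""
    dups = []
    for v in nums:
        idx = abs(v) - 1
        if nums[idx] < 0:
            dups.append(abs(v))   # second (or later) sighting
        else:
            nums[idx] = -nums[idx]  # mark first sighting
    for i in range(len(nums)):
        nums[i] = abs(nums[i])    # restore the input
    return dups

find_duplicates([4, 3, 2, 7, 8, 2, 3, 1])  # → [2, 3]
```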
Tic-Tac-Toe
Implement Tic-Tac-Toe with an N x N board. What this means is to produce the following outputs and take the following inputs:
Output: What is N?
Input [integer]
Output: Would you like to play as X (go first) or O (go second)?
Input: [X or O]
// if user selects O, the program goes first and marks an empty space--in the challenge variation of this problem, the program will always make an optimal move
Output: // after every move the program should print the current state of the NxN board
// after every move the program should analyze whether there are N of one value (X or O) in a row, and if there are, output the winner
Output: Where would you like to place your mark?
Input: [(i, j), where i is the row, and j is the column]
A variation of this problem is: Given an N x N Tic-Tac-Toe board state, determine whether there is a guaranteed way to win for the player whose turn it is (no matter how the opponent plays, the player can win from this position). Traditionally this variation is only given for 3x3 Tic-Tac-Toe, so you can start with the assumption that N is always 3, and then try to program for a generic N.
Indexed Binary String
Given an index, k, find the \( k^{th} \) digit in a binary string. The binary string will be defined as follows:
binString = "0";
binString = binString + ~binString; // recall ~ is the NOT operator in some languages
binString = binString + ~binString;
// etc.
Thus the binary string ends up being 0 -> 01 -> 0110 -> 01101001 -> 0110100110010110...
Part of the problem is to provide a more formal definition of this binary string.
Relationships
Consider the following code (this intentionally contains oddities that don't exist in most languages):
Class Person {
    Public Person mom;
    Public Person dad;
    Public Person children[];

    Public static bool areRelated(Person p1, Person p2, int gens) {
        /* gens = number of generations removed the two people must be within in order to be considered "related" */
    }
}
Your goal is to implement areRelated(...). It should return whether or not the two people are related, based on the generational threshold.
The simple solution to this may yield a lot of wasted space or extraneous recursive calls for high values of gens. Try to think of a way to minimize that cost (of course, you may have already done it in your first solution, so don't spend too much time on this if you can't figure it out!).
PHP Exploit
Consider the following code:
$pin = $_GET['PIN'];
$cmd = "sudo testcode.py $pin;";
exec($cmd);
How can someone remove all files from the host machine if a web app contains this code?
Followup question: How do you know the system will allow you to run this code without prompting you for an admin password?
Prime Sum
Given an even number greater than 2, validate that it is the sum of 2 primes by finding those 2 primes and printing them. (That such a pair always exists is Goldbach's conjecture: unproven in general, but verified far beyond any input you are likely to be given.) NOTE: 1 is not considered a prime number.
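A straightforward sketch: walk the candidates p from 2 up to n / 2 and check whether both p and n - p are prime by trial division:

```python
def prime_sum(n):
    """Find two primes summing to an even n > 2 (a sketch)."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None  # would contradict Goldbach's conjecture

prime_sum(28)  # → (5, 23)
```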
Odd One Out
Given a sorted list within which there are exactly two copies of each element, except for one, find which element is the one without a copy.
There is a simple solution to this problem that can be performed in \( O(n) \) time. Your goal is to solve this problem in \( O(\log n) \) time.
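The \( O(\log n) \) solution is a binary search on index parity: before the singleton, each pair starts at an even index; after it, pairs start at odd indices. A sketch:

```python
def odd_one_out(nums):
    """Find the single unpaired element in a sorted list where every
    other element appears exactly twice (a sketch)."""
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if mid % 2 == 1:
            mid -= 1          # align mid to the start of a pair
        if nums[mid] == nums[mid + 1]:
            lo = mid + 2      # pair intact; singleton is to the right
        else:
            hi = mid          # pair broken; singleton is at mid or left
    return nums[lo]

odd_one_out([1, 1, 2, 2, 3, 4, 4, 5, 5])  # → 3
```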
Greatest Common Divisor
Find the greatest common divisor of 2 numbers.
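Euclid's algorithm solves this in a few lines, using the identity gcd(a, b) = gcd(b, a mod b):

```python
def gcd(a, b):
    """Greatest common divisor via Euclid's algorithm."""
    while b:
        a, b = b, a % b
    return abs(a)

gcd(48, 36)  # → 12
```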
Questions To Ask
When it's your turn to ask questions, I've come up with some good (in my opinion) ones. While it's true that having questions might make you look better, it probably doesn't matter much what the questions are, so unlike the other pages, this one isn't meant to be a guide or wiki.
Note: Questions about salary, benefits, etc. should be saved for after you've been offered the job. This is not a resource for negotiation tactics, although such resources do exist.
Note: If the answer to any of these questions is available on the company's website or was included in the job description, then don't ask it; it might just make you look worse for not knowing what you're applying for.
Q: What software development workflows and frameworks do you use?
Workflows = tools, procedures, etc.
Frameworks = waterfall, agile, scrum, etc.
Q: What sorts of projects could I expect to be a part of if I were to work here?
Q: What are parts of the job that you enjoy? (Alternatively: What's your favorite part of working here?)
Not applicable to interviewers who are not in the position you're applying for...
Q: What does a typical work day look like for you? (How much of your day is spent coding? How many meetings do you have every week?)
Again: probably not relevant to ask people who aren't in the position you're applying for...
Q: What technology stacks do you use?
Q: What are some challenges that might be faced by someone just starting to work here?
Q: I know that you guys deal in X (e.g. scalability). Are there opportunities within the company to learn more about X? Should I prepare myself with knowledge of X if I were to work here?
Q: Could you tell me more about X that you use?
Other Resources
Books
Laakmann, Gayle. Cracking the Coding Interview
Martin, Robert C. Clean Code
Gamma, Helm, Johnson, and Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software
Cormen, Leiserson, Rivest, and Stein. Introduction to Algorithms
Abelson, Sussman, and Sussman. Structure and Interpretation of Computer Programs
Knuth, Donald. The Art of Computer Programming
Stroustrup, Bjarne. The C++ Programming Language
Shotts Jr., William E. The Linux Command Line
Hunt and Thomas. The Pragmatic Programmer: From Journeyman to Master
Brooks, Frederick. The Mythical Man-Month
"The Agile Manifesto", https://agilemanifesto.org/