Grokking Algorithms

An Illustrated Guide for Programmers and Other Curious People


  • On Amazon
  • ISBN: 978-1617292231
  • My Rating: 5/10

What is it about?

Grokking Algorithms provides an introduction to some commonly used algorithms and explains them with a lot of hand-drawn illustrations.

My impression

My impression of Grokking Algorithms is mixed. On the one hand, the author does a good job of explaining the concepts. On the other hand, I found the book too simplistic and wished there were more meat on the bone. Some parts also feel like filler material, especially the last chapter about ten algorithms not covered in the book.

My notes

Introduction to Algorithms

An algorithm is a set of instructions for accomplishing a task.

Big O notation tells you how fast an algorithm is. For example, suppose you have a list of size n. Simple search needs to check each element, so it will take n operations. The run time in Big O notation is O(n).

Big O notation lets you compare the number of operations. It tells you how fast the algorithm grows.

[...] Big O notation is about the worst-case scenario.
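The O(n) behavior of simple search is easy to see in code. A minimal sketch (the function name is mine, not the book's):

```python
def simple_search(items, target):
    """Check each element in turn. In the worst case every element
    is checked, so the run time is O(n)."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return None  # target not in the list: we still did n checks
```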

Selection Sort

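My notes on this chapter are sparse, so as a reminder, a sketch of selection sort along the lines of the book's Python version (names are mine): repeatedly find the smallest remaining element and move it to the result, which takes O(n²) time.

```python
def selection_sort(items):
    """Selection sort: repeatedly pull the smallest remaining element
    into a new list. n passes over up to n elements = O(n^2)."""
    remaining = list(items)  # copy so the caller's list isn't mutated
    result = []
    while remaining:
        smallest_index = remaining.index(min(remaining))
        result.append(remaining.pop(smallest_index))
    return result
```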


Recursion

Recursion is used when it makes the solution clearer. There's no performance benefit to using recursion; in fact, loops are sometimes better for performance.

When you write a recursive function, you have to tell it when to stop recursing. That's why every recursive function has two parts: the base case, and the recursive case. The recursive case is when the function calls itself. The base case is when the function doesn't call itself again ... so it doesn't go into an infinite loop.
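Both parts are visible in a recursive factorial, one of the book's own examples:

```python
def factorial(n):
    """Recursive factorial: n! = n * (n-1)!"""
    if n <= 1:
        return 1                     # base case: doesn't call itself again
    return n * factorial(n - 1)      # recursive case: calls itself
```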


Divide & Conquer

D&C algorithms are recursive algorithms. To solve a problem using D&C, there are two steps:

  1. Figure out the base case. This should be the simplest possible case.
  2. Divide or decrease your problem until it becomes the base case.

When you're writing a recursive function involving an array, the base case is often an empty array or an array with one element.
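Both D&C steps show up in the book's example of summing an array recursively: the base case is the empty array, and each call decreases the problem by one element.

```python
def recursive_sum(items):
    """Sum a list with divide & conquer."""
    if not items:
        return 0                              # base case: empty array sums to 0
    return items[0] + recursive_sum(items[1:])  # decrease toward the base case
```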

Hash Tables

A hash function is a function where you put in a string [String here means any kind of data - a sequence of bytes] and you get back a number.

[...] there are some requirements for a hash function:

  • It needs to be consistent. For example, suppose you put in "apple" and get back "4". Every time you put in "apple", you should get "4" back.
  • It should map different words to different numbers. For example, a hash function is no good if it always returns "1" for any word you put in. In the best case, every different word should map to a different number.

The hash function knows how big your array is and only returns valid indexes. So if your array has 5 items, the hash function doesn't return 100 ... that wouldn't be a valid index in the array.

Put a hash function and an array together, and you get a data structure called a hash table.
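In Python you never build this pair yourself: the built-in dict is a hash table (keys are hashed to array indexes internally). The grocery-price example here is mine:

```python
# Python's dict is a hash table: keys are hashed to array slots internally.
prices = {}
prices["apple"] = 0.67
prices["milk"] = 1.49

print(prices.get("apple"))    # → 0.67
print(prices.get("avocado"))  # → None: .get returns None for missing keys
```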

[...] hashes are good for

  • Modeling relationships from one thing to another thing
  • Filtering out duplicates
  • Caching/memoizing data instead of making your server do work

This is called a collision: two keys have been assigned the same slot. This is a problem. [...] Collisions are bad, and you need to work around them. There are many different ways to deal with collisions. The simplest one is this: if multiple keys map to the same slot, start a linked list at that slot.

If those linked lists get long, it slows down your hash table a lot. But they won't get long if you use a good hash function!
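A toy sketch of that chaining strategy (class name, slot count, and API are mine; a real implementation would also resize). Each array slot holds a list of key/value pairs, so colliding keys just extend the chain:

```python
class ChainedHashTable:
    """Toy hash table using separate chaining for collisions."""

    def __init__(self, slots=5):
        self.buckets = [[] for _ in range(slots)]

    def _bucket(self, key):
        # The hash function maps any key to a valid array index.
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:       # key already stored: overwrite it
                pair[1] = value
                return
        bucket.append([key, value])  # collision: the chain just grows

    def get(self, key):
        for stored_key, value in self._bucket(key):
            if stored_key == key:
                return value
        return None
```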

In the average case, hash tables take O(1) for everything. O(1) is called constant time.

In the worst case, a hash table takes O(n) – linear time – for everything, which is really slow. [...] So it's important that you don't hit worst-case performance with hash tables. And to do that, you need to avoid collisions. To avoid collisions, you need

  • A low load factor
  • A good hash function

The load factor of a hash table is easy to calculate. Hash tables use an array for storage, so you divide the number of occupied slots by the total number of slots in the array.

Once the load factor starts to grow, you need to add more slots to your hash table. This is called resizing.

A good rule of thumb is, resize when your load factor is greater than 0.7.

A good hash function distributes values in the array evenly. A bad hash function groups values together and produces a lot of collisions.

Breadth-First Search

The algorithm to solve a shortest-path problem is called breadth-first search.

A graph models a set of connections. [...] Each graph is made up of nodes and edges. [...] A node can be directly connected to many other nodes. Those nodes are called its neighbors.

[Breadth-first search] can help answer two types of questions:

  • Question type 1: Is there a path from node A to node B?
  • Question type 2: What is the shortest path from node A to node B?

A tree is a special type of graph, where no edges ever point back.
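A sketch of breadth-first search that answers both question types at once: it returns a shortest path (fewest edges) if one exists, and None otherwise. The graph is an adjacency dict, as in the book; node names in the test are mine.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Breadth-first search explores neighbors level by level, so the
    first time we reach `goal`, the path we took has the fewest edges."""
    queue = deque([[start]])  # queue of paths, not just nodes
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:   # never enqueue a node twice
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # question type 1 answered: no path exists
```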

Dijkstra's Algorithm

[...] Dijkstra's algorithm has four steps:

  1. Find the cheapest node. This is the node you can get to in the least amount of time.
  2. Check whether there's a cheaper path to the neighbors of this node. If so, update their costs.
  3. Repeat until you've done this for every node in the graph.
  4. Calculate the final path.

When you work with Dijkstra's algorithm, each edge in the graph has a number associated with it. These are called weights. A graph with weights is called a weighted graph. A graph without weights is called an unweighted graph.

To calculate the shortest path in an unweighted graph, use breadth-first search. To calculate the shortest path in a weighted graph, use Dijkstra's algorithm.
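The four steps above can be sketched as follows, over the same hash-of-hashes graph structure the book uses ({node: {neighbor: weight}}); the function returns the table of cheapest costs rather than the final path, and all names are mine:

```python
def dijkstra(graph, start):
    """graph: {node: {neighbor: weight}} with nonnegative weights.
    Returns the cheapest known cost from `start` to each reachable node."""
    costs = {start: 0}
    processed = set()
    while True:
        # Step 1: find the cheapest node not yet processed.
        candidates = [n for n in costs if n not in processed]
        if not candidates:
            return costs  # every reachable node has been processed
        node = min(candidates, key=costs.get)
        processed.add(node)
        # Step 2: check for cheaper paths to this node's neighbors.
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = costs[node] + weight
            if new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost  # found a cheaper path: update
```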

An undirected edge is effectively two nodes pointing at each other, so every undirected graph contains cycles. The book claims Dijkstra's algorithm only works with directed acyclic graphs (DAGs), but that is a known erratum: the real restriction is that all edge weights must be nonnegative. With nonnegative weights, Dijkstra's algorithm handles cycles and undirected graphs just fine.

You can't use negative-weight edges with Dijkstra's algorithm. If you want to find the shortest path in a graph that has negative-weight edges, there's an algorithm for that! It's called the Bellman-Ford algorithm.

Greedy Algorithms

A greedy algorithm is simple: at each step, pick the optimal move. [...] In technical terms: at each step you pick the locally optimal solution, hoping to end up with the globally optimal solution. (Greedy algorithms don't always reach the global optimum, which is why the book treats them as approximation algorithms.)

When calculating the exact solution will take too much time, an approximation algorithm will work. Approximation algorithms are judged by

  • How fast they are
  • How close they are to the optimal solution

Greedy algorithms are a good choice because not only are they simple to come up with, but that simplicity means they usually run fast, too.

NP-complete problems have no known fast solution. If you have an NP-complete problem, your best bet is to use an approximation algorithm.
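The book's running example is set covering: picking radio stations to cover a set of US states, an NP-complete problem. A sketch of the greedy approximation (function and variable names are mine): repeatedly pick whichever option covers the most still-uncovered items.

```python
def greedy_set_cover(needed, options):
    """Greedy approximation to set cover. Checking every subset of
    options is O(2^n); this greedy loop is roughly O(n^2), at the cost
    of a possibly non-optimal (but usually close) answer."""
    uncovered = set(needed)
    chosen = []
    while uncovered:
        # Pick the option covering the most still-uncovered items.
        best = max(options, key=lambda name: len(options[name] & uncovered))
        if not options[best] & uncovered:
            break  # nothing left can cover the remaining items
        chosen.append(best)
        uncovered -= options[best]
    return chosen
```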

Dynamic Programming

Dynamic programming starts by solving subproblems and builds up to solving the big problem. For the knapsack problem, you'll start by solving the problem for smaller knapsacks (or "sub-knapsacks") and then work up to solving the original problem.

Dynamic programming only works when each subproblem is discrete – when it doesn't depend on other subproblems.

Dynamic programming is useful when you're trying to optimize something given a constraint. In the knapsack problem, you had to maximize the value of the goods you stole, constrained by the size of the knapsack.

Every dynamic-programming solution involves a grid.

The values in the cells are usually what you're trying to optimize.

There's no single formula for calculating a dynamic-programming solution.
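Still, the knapsack grid the book builds can be sketched directly: cell [i][w] holds the best value achievable using the first i items with a sub-knapsack of capacity w, and each cell is either "skip this item" or "take it plus the best of the remaining space". The item list in the test is the book's example (guitar, stereo, laptop); the code itself is mine.

```python
def knapsack(items, capacity):
    """items: list of (name, weight, value) with whole-number weights.
    Builds the DP grid: grid[i][w] = best value using the first i items
    in a knapsack of size w."""
    grid = [[0] * (capacity + 1) for _ in range(len(items) + 1)]
    for i, (_, weight, value) in enumerate(items, start=1):
        for w in range(capacity + 1):
            grid[i][w] = grid[i - 1][w]  # best value without this item
            if weight <= w:
                # Take the item plus the best use of the leftover space.
                with_item = value + grid[i - 1][w - weight]
                grid[i][w] = max(grid[i][w], with_item)
    return grid[len(items)][capacity]
```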

k-nearest neighbors

These are the two basic things you'll do with KNN – classification and regression:

  • Classification = categorization into a group
  • Regression = predicting a response (like a number)

Feature extraction means converting an item (like a fruit or a user) into a list of numbers that can be compared.

Picking good features is an important part of a successful KNN algorithm.
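A minimal sketch of KNN classification over extracted feature vectors, using Euclidean distance and a majority vote among the k nearest examples (all names, and the fruit features in the test, are mine):

```python
from math import sqrt

def euclidean_distance(a, b):
    """Distance between two feature vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(examples, point, k=3):
    """examples: list of (features, label) pairs.
    Classify `point` by majority vote among its k nearest neighbors."""
    nearest = sorted(examples,
                     key=lambda ex: euclidean_distance(ex[0], point))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)
```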

Where to Go Next