# The Final Exam is comprehensive

This is a guide, not a guarantee.
You are responsible for all material we have done in this course, including your readings, your programming projects, PollEverywhere questions, and all classroom information.

## Material after the Midterm

```
Sets, Maps, implementing them (lists, trees, hashing, etc.)

Balanced BST
Tree balancing:
AVL: imbalance condition, rotations to re-balance
rotations alter tree structure but maintain the order property
worst case time complexity for insert, remove, contains
show how AVL tree is altered by insert, remove

Splay: another approach to balancing
splaying done any time a node is accessed
splaying applies order-preserving rotations until the accessed node
is moved to the root
amortized worst case time complexity
show how splaying works in a tree for insert, remove, find, etc.

Hashing, Hash Tables, Hash Maps
Pigeon hole principle
Hash Table, hash function
how to pick table size
how to make an effective hash function
what to do about collisions
hash into linked lists (or other collection data structures)
table size should be prime, and about the same size as the # elements expected
probing to find an open cell in the table
linear probing function
"expo" probing function
table size should be prime, and twice the # elements expected
Rehashing to extend the table size

Hash Maps... storing associated information/object along with a hashed key

Blockchain
SHA-256 Cryptographic hash function
basic blockchain construction and operation
nonce
"mining" what it is, how it's done
complexity of the hashing used in mining blocks (for Bitcoin application)

Basic graph theory
vertex (node), edge (arc), edge weight
undirected graph
directed graph
path in a graph
cycle in a graph
bipartite graph
a tree is a bipartite graph
complete graph
planar graph

depth-first search (traversal) in a graph (use a stack)
breadth-first search (traversal) in a graph (use a queue)

representation of a graph
adjacency matrix: useful when graph is dense ( |E| approx.= |V|^2 )
adjacency list: useful when graph is sparse ( |E| << |V|^2 )
although an undirected edge between nodes A and B is not the
same theoretically as two directed edges A --> B and B --> A,
we often represent them that way in adjacency lists for
an undirected graph

Topological sort

Single Source Shortest Path algorithm for shortest paths in unweighted digraphs
Dijkstra's algorithm for shortest paths in weighted digraphs

Spanning tree
Minimum spanning tree
Kruskal's algorithm for MST
Prim's algorithm for MST

Uses for graphs: course prerequisite structure, maps,
modeling digital networks, airline and auto routes,
flow of traffic, work flow in an organization,
finite state machines, document structure, hypermedia,
linguistic representations, all tree uses (trees are graphs)
tons and tons of stuff

Euler path
path through a graph that crosses each edge exactly once
Euler circuit
EP that begins and ends on the same node
We have an efficient algorithm for finding Euler path/circuit

Hamiltonian path
path through a graph that visits each node exactly once
Hamiltonian circuit
HP that begins and ends on the same node
the Traveling Salesman Problem is a famous example involving Hamiltonian circuits
no efficient (polynomial-time, like O(N^2)) algorithm is known for solving it

Sorting
we can prove that no sort that uses only pairwise comparisons
can do better than N log N in the worst case.  This means that all
comparison sorts are Big-Omega(N log N) in the worst case, bounded below.
They might be worse, however, and some are.
bubble sort
O(N^2) worst case
insertion sort
O(N^2) worst case
doesn't swap, it inserts
sometimes used as the "finisher" in quicksort
selection sort
O(N^2) worst case
similar in concept to insertion sort but operationally it
manipulates the array by swaps
heap sort
O(N log N) best case
O(N log N) worst case
O(N log N) average case
merge sort
O(N log N) best case
O(N log N) worst case
O(N log N) average case
cost: often uses lots of extra memory, needs a second array as large
as the one you are sorting
quicksort
O(N log N) best case
O(N^2) worst case
O(N log N) average case
does not need much extra space
get extra efficiency by cutting off recursion when the lists
become smaller than, say, length 15 and then doing some other
sort like insertion sort on the small lists to finish up
bogosort
random shuffling until you get the one that is in order
O(infinity) worst case
an extreme, just an illustration, not a serious sort

sorting into buckets
a way to sort in O(N) time
can be used when the full range of possible input values is known
allocate an array of int A and initialize all slots to 0
then when you see an input number n, do A[n]++
this seems to violate the Big-Omega(N log N) proof, but this
sort does not do only pairwise comparisons.  It effectively
does lots of comparisons at once by putting a number right
into an array slot.  Since we believe TANSTAAFL, the tradeoff
is that it uses lots of memory (not all array slots are filled)
and you have to know in advance the range of inputs.

Other O(N) sorts
we can also sort N items in O(N) time if we use parallel hardware
the Big-Oh complexity results for our sort algorithms have
all assumed we were running them on a single processor.

sorts are either stable or unstable
stable means it preserves the original order of items with
the same value when there are duplicate values in the input.
often a sort can be implemented to be stable or unstable, and
in many cases making it stable sacrifices some efficiency
bubble sort is stable
insertion sort is stable
selection sort as we did it in class is not stable
it can be made stable with no change in Big-Oh complexity
but at a cost in practical efficiency
merge sort is not necessarily stable but is easy to make so
(just make sure the merge takes from the left-hand array
first when the values are equal)
quick sort is not stable
heap sort is not stable (heap is the issue)

Skip Lists
like a linked list except items are kept in order
alternative to balanced binary search trees
each higher level skips over more of the items at the lower levels
probabilistic data structure...
meaning we roll dice to decide aspects of the structure as we build
also means if we build a skip list twice with the same input data sequence
we will almost certainly get two different linked structures
average time complexity: find O(log N), insert O(log N)
worst case time complexity: find O(N), insert O(N)
however the worst case is probabilistically very very unlikely
average case is probabilistically very likely

Trie
different form of tree used to do prefix-based storage and searching for text strings
used in doing "look ahead" for typing in apps like Google search, phone contact lists
what is longest path in trie?
what is worst case time complexity of finding a word in a trie?
emphasizes that the time complexity of an algorithm will depend on
-- cost of building the data structure
-- plus cost of using the data structure

```