Topics covered in class

The Final Exam is comprehensive

This is a guide, not a guarantee.
You are responsible for all material we have done in this course, including your readings, your programming projects, PollEverywhere questions, and all classroom information.

Material after the Midterm

Sets, Maps, implementing them (lists, trees, hashing, etc.)

Balanced BST Tree balancing: AVL: imbalance condition, rotations to re-balance rotations alter tree structure but maintain the order property worst case time complexity for insert, remove, contains show how AVL tree is altered by insert, remove Splay: another approach to balancing splaying done any time a node is accessed splaying is applying preserving rotations until accesses node is moved to the root amortized worst case time complexity show how splaying works in a tree for insert, remove, find, etc.
Hashing, Hash Tables, Hash Maps Pigeon hole principle Hash Table, hash function how to pick table size how to make an effective hash function what to do about collisions hash into linked lists (or other collection data structures) table size should be prime, and same the size as # elements expected probing to find an open cell in the table linear probing function quadratic probing function "expo" probing function table size should be prime, and twice the # elements expected Rehasing to extend the table size Hash Maps... storing associated information/object along with a hashed key Blockchain SHA-256 Cryptographic hash function basic block chain construction and operation nonce "mining" what it is, how it's done complexity of the hashing used in mining blocks (for Bitcoin application)
Basic graph theory vertex (node), edge (arc), edge weight undirected graph directed graph path in a graph cycle in a graph bi-partite graph a tree is a bi-partite graph complete graph planar graph depth-first search (traversal) in a graph (use a stack) breadth-first search (traversal) in a graph (use a queue) representation of a graph adjacency matrix useful when graph is dense ( |E| approx.= |V|^2 ) adjacency lists useful when graph is sparse ( |E| << |V|^2 ) although an undirected edge between nodes A and B is not the same theoretically as two directed edges A --> B and B --> A, we often represent them that way in adjacency lists for an undirected graph Topological sort Single Source Shortest Path algorithm for shortest paths in unweighted digraphs Dijkstra's algorithm for shortest paths in weighted digraphs Spanning tree Minimum spanning tree Kruskal's algorithm for MST Prim's algorithm for MST Uses for graphs: course prerequisite structure, maps, modeling digital networks, airline and auto routes, flow of traffic, work flow in an organization, finite state machines, document structure, hypermedia, linguistic representations, all tree uses (trees are graphs) tons and tons of stuff Euler path path through a graph that crosses each edge exactly once Euler circuit EP that begins and ends on the same node We have an efficient algorithm for finding Euler path/circuit Hamiltonian path path through a graph that visits each node exactly once Hamiltonian circuit HP that begins and ends on the same node Traveling salesman is example of Hamiltonian path/circuit No efficient (polynomial time, like O(N^2) ) algorithm is known for solving it
Sorting we can prove that no sort that uses only pairwise comparisons can do better than N log N best case. This means that all comparison sorts are Big-Omega(N log N), bounded below. They might be worse, however, and some are. bubble sort O(N^2) worst case insertion sort O(N^2) worst case doesnt swap, it inserts sometimes used as the "finisher" in quicksort selection sort O(N^2) worst case similar in concept to insertion sort but operationally it manipulates the array by swaps heap sort O(N log N) best case O(N log N) worst case O(N log N) average case merge sort O(N log N) best case O(N log N) wosrt case O(N log N) average case cost: often uses lots of extra memory, needs a second array as large as the one you are sorting quicksort O(N log N) best case O(N^2) worst case O(N log N) average case does not need much extra space get extra efficiency by cutting off recursion when the lists become smaller than, say, length 15 and then doing some other sort like insertion sort on the small lists to finish up bogosort random shuffling until you get the one that is in order O(infinity) worst case an extreme, just an illustration, not a serious shuffle sorting into buckets a way to sort in O(N) time can be used when the full range of possible input values is known allocate an array of int A and initialize all slots to 0 then when you see an input number n, do A[n]++ this seems to violate the Big-Omega(N log N) proof, but this sort does not do only pairwise comparisons. It effectively does lots of comparisons at once by putting a number right into an array slot. Since we believe TANSTAAFL, the tradeoff is that is uses lots of memory (not all array slots are filled) and you have to know in advance the range of inputs. Other O(N) sorts we can also sort N items in N time if we use parallel hardware the Big-Oh complexity results for our sort algorithms have all assumed we were running them on a single processor. sorts are either stable or unstable stable means it preserves the original order of items with the same value when there are duplicate values in the input. often a sort can be implemented to be stable or unstable, and in many cases making it stable sacrifices some efficiency bubble sort is stable insertion sort is stable selection sort as we did it in class is not stable it can be made stable with no change in Big-Oh complexity but at a cost in practical efficiency merge sort is not necessarily stable but is easy to make so (just make sure the merge takes from the left-most array if it can for duplicates) quick sort is not stable heap sort is not stable (heap is the issue)
Skip Lists like a linked list except items are kept in order alternative to balanced binary search trees multi-level linked cells each higher level skips over more of the items at the lower levels probabilistic data structure... meaning we roll dice to decide aspects of the structure as we build also means if we build a skip list twice with the same input data sequence we will almost certainly get two different linked structures average time complexity: find O(log N), insert O(log N) worst case time complexity: find O(N), insert O(N) however the worst case is probabilistically very very unlikely average case is probabilistically very likely
Trie different form of tree used to do prefix-based storage and searching for text strings used in doing "look ahead" for typing in apps like google searched, phone lists what is longest path in trie? what is worst case time complexity of finding a word in a trie? emphasizes that the time complexity of an algorithm will depend on -- cost of building the data structure -- plus cost of using the data structure