Basic graph theory
    vertex (node), edge (arc), edge weight
    undirected graph, directed graph
    path in a graph, cycle in a graph
    bipartite graph
        a tree is a bipartite graph
    complete graph
    planar graph
    depth-first search (traversal) in a graph (use a stack)
    breadth-first search (traversal) in a graph (use a queue)
    representation of a graph
        adjacency matrix
            useful when the graph is dense ( |E| approx.= |V|^2 )
        adjacency lists
            useful when the graph is sparse ( |E| << |V|^2 )
            although an undirected edge between nodes A and B is not
            theoretically the same as the two directed edges A --> B and
            B --> A, we often represent it that way in adjacency lists
            for an undirected graph
        (a BFS/DFS sketch over adjacency lists follows this section)
    Topological sort
    Single Source Shortest Path algorithm for shortest paths in unweighted digraphs
    Dijkstra's algorithm for shortest paths in weighted digraphs
    Spanning tree
    Minimum spanning tree
        Kruskal's algorithm for MST
        Prim's algorithm for MST
    Uses for graphs: course prerequisite structure, maps, modeling digital
        networks, airline and auto routes, flow of traffic, work flow in an
        organization, finite state machines, document structure, hypermedia,
        linguistic representations, all tree uses (trees are graphs)...
        tons and tons of stuff
    Euler path
        a path through a graph that crosses each edge exactly once
    Euler circuit
        an Euler path that begins and ends on the same node
        we have an efficient algorithm for finding an Euler path/circuit
    Hamiltonian path
        a path through a graph that visits each node exactly once
    Hamiltonian circuit
        a Hamiltonian path that begins and ends on the same node
        the traveling salesman problem is an example of a Hamiltonian
        path/circuit problem; no efficient (polynomial time, like O(N^2))
        algorithm is known for solving it
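Below is a minimal Java sketch of the adjacency-list representation and the
queue-based BFS described above; the class and method names are my own, not
from any course code. Swapping the queue for a stack (a Deque used LIFO)
turns the same loop into DFS.

    import java.util.*;

    public class GraphSketch {
        private final Map<String, List<String>> adj = new HashMap<>();

        // Store an undirected edge as two directed edges, as the notes describe.
        public void addUndirectedEdge(String a, String b) {
            adj.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
            adj.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
        }

        // BFS: visits nodes in order of edge-count distance from start.
        public List<String> bfs(String start) {
            List<String> order = new ArrayList<>();
            Set<String> visited = new HashSet<>();
            Queue<String> queue = new ArrayDeque<>();   // the queue is what makes it BFS
            visited.add(start);
            queue.add(start);
            while (!queue.isEmpty()) {
                String node = queue.remove();
                order.add(node);
                for (String nbr : adj.getOrDefault(node, List.of())) {
                    if (visited.add(nbr)) {             // add() is false if already seen
                        queue.add(nbr);
                    }
                }
            }
            return order;
        }

        public static void main(String[] args) {
            GraphSketch g = new GraphSketch();
            g.addUndirectedEdge("A", "B");
            g.addUndirectedEdge("A", "C");
            g.addUndirectedEdge("B", "D");
            System.out.println(g.bfs("A"));             // prints [A, B, C, D]
        }
    }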
//===========================================================================
Sorting

we can prove that no sort that uses only pairwise comparisons can do better
(faster) than N log N in the worst case. This means that all comparison
sorts are Big-Omega(N log N) in the worst case, bounded below. They might
be worse, however, and some are.

bubble sort
    O(N^2) worst case
insertion sort
    O(N^2) worst case
    works with pair-wise swaps (item with nearest neighbor)
    sometimes used as the "finisher" in quicksort
selection sort
    O(N^2) worst case
    similar in concept to insertion sort, but operationally it manipulates
    the array with possibly long-range swaps
heap sort
    O(N log N) best case
    O(N log N) worst case
    O(N log N) average case
merge sort
    O(N log N) best case
    O(N log N) worst case
    O(N log N) average case
    cost: often uses lots of extra memory; needs a second array as large
    as the one you are sorting
quicksort
    O(N log N) best case
    O(N^2) worst case
    O(N log N) average case
    does not need much extra space
    gets extra efficiency by cutting off recursion when the lists become
    smaller than, say, length 15, and then doing some other sort like
    insertion sort on the small lists to finish up
bogosort
    random shuffling until you get the one that is in order
    unbounded ("O(infinity)") worst case
    an extreme, just an illustration, not a serious sort
sorting into buckets
    a way to sort in roughly O(N) time (more precisely O(N + K), where K
    is the size of the range of possible input values)
    can be used when the full range of possible input values is known
    allocate an array of int A and initialize all slots to 0; then when
    you see an input number n, do A[n]++
    (a sketch of this sort follows the hashing notes below)
    this seems to violate the Big-Omega(N log N) proof, but this sort does
    not do only pairwise comparisons. It effectively does lots of
    comparisons at once by putting a number right into an array slot.
    Since we believe TANSTAAFL, the tradeoff is that it uses lots of
    memory (not all array slots are filled) and you have to know the range
    of inputs in advance.

sorts are either stable or unstable
    stable means the sort preserves the original order of items with the
    same value when there are duplicate values in the input
    often a sort can be implemented to be stable or unstable, and in many
    cases making it stable sacrifices some efficiency
    bubble sort is stable
    insertion sort is stable
    selection sort as we did it in class is not stable
        it can be made stable with no change in Big-Oh complexity, but at
        a cost in practical efficiency
    merge sort is not necessarily stable but is easy to make so (just make
        sure the merge takes from the left-most array when it can for
        duplicates)
    quicksort is not stable
    heap sort is not stable (the heap order is the issue)

//===========================================================================
Sets and Maps

ADT ops, behavior, and ways to implement them (lists, trees, hashing, etc.)

//===========================================================================
Hashing, Hash Tables, Hash Maps

Pigeonhole principle (chicken hole principle)
Hash table, hash function
    how to pick the table size
    how to make an effective hash function
    what to do about collisions
        hash into linked lists (or other collection data structures)
            table size should be prime, and the same size as the number
            of elements expected
            (a chaining sketch follows this section)
        probing to find an open cell in the table
            linear probing function
            quadratic probing function
            "expo" probing function
            table size should be prime, and twice the number of elements
            expected
        rehashing to extend the table size
Hash Maps... storing associated information/object along with a hashed key
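Here is a minimal sketch of the "hash into linked lists" (separate
chaining) collision strategy above, assuming String keys; the names are my
own. A fuller version would pick a prime table size near the expected
element count and rehash when the table gets too full.

    import java.util.LinkedList;

    public class ChainedHashSet {
        private final LinkedList<String>[] table;

        @SuppressWarnings("unchecked")              // generic array creation
        public ChainedHashSet(int tableSize) {      // ideally prime, near expected # elements
            table = new LinkedList[tableSize];
            for (int i = 0; i < tableSize; i++) {
                table[i] = new LinkedList<>();
            }
        }

        // Map a key to a table slot; mask the sign bit so the index is non-negative.
        private int indexFor(String key) {
            return (key.hashCode() & 0x7fffffff) % table.length;
        }

        public boolean contains(String key) {
            return table[indexFor(key)].contains(key);  // walk the chain in this slot
        }

        public void add(String key) {
            if (!contains(key)) {
                table[indexFor(key)].add(key);
                // A fuller version would count elements and rehash into a
                // larger prime-sized table when the load factor gets high.
            }
        }
    }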
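And here is the "sorting into buckets" sketch promised in the sorting notes
above, assuming the inputs are ints in a known range 0..maxValue (maxValue
is my stand-in for that known range):

    public class BucketSortSketch {
        // Sort ints known to lie in 0..maxValue without pairwise comparisons:
        // tally each value into its slot, then read the tallies back in order.
        public static int[] bucketSort(int[] input, int maxValue) {
            int[] counts = new int[maxValue + 1];   // one slot per possible value, all 0
            for (int n : input) {
                counts[n]++;                        // "when you see an input number n, do A[n]++"
            }
            int[] sorted = new int[input.length];
            int i = 0;
            for (int value = 0; value <= maxValue; value++) {
                for (int c = 0; c < counts[value]; c++) {
                    sorted[i++] = value;            // emit each value counts[value] times
                }
            }
            return sorted;
        }
    }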
//===========================================================================
binary heap

a binary heap is a completely different thing from the runtime heap...
    the runtime heap is not heap-structured, not related at all
    just unfortunate, and maybe a bit confusing, that they share the name "heap"

a binary tree with two properties:
    heap order property
    structure property
        always a complete binary tree, with the last level being filled
        from L to R but perhaps not full

representation of a binary heap as an array
    (see the min-heap sketch at the end of these notes)

heaps can be Min or Max

operations:
    getMin (Max)
        O(1) worst and average case, since it is at the root of the heap tree
    removeMin (Max)
        O(log N) worst case
        O(log N) average case
    add
        O(log N) worst case
        O(1) average case (75% of the heap elements are in the last 2 tree levels)
    incElt, decElt
        cost of searching for the Elt, plus the cost of rearranging the
        heap array when the priority is incremented or decremented
            searching the array linearly is O(N)
            using a HashMap (like Assn 4) is O(1)
            rearranging the heap array is O(log N)
    build
        simple way is O(N log N), by doing N adds
        clever way is O(N): if you have all the elements on hand, use the
        special "build"

//===========================================================================
priority queue

implementing a priority queue with a list or BST, and asymptotic analysis
implementing a priority queue using a binary heap, and asymptotic analysis
elements stored have a value and a priority
    the priority is used to put elements into the binary heap
operations:
    enq: add an item to the PrQUE
    deq: take an item out of the PrQUE; that item has the "highest" (or lowest) priority
    front: produce the element from the queue that has "highest" (or lowest) priority
    size, empty, etc...

//===========================================================================
Tree representations (linked cells, implicit as array)

We have represented a tree (binary tree) as
    -- linked cells (basic unbalanced BST, also used for balanced methods
       like AVL, Splay)
    -- an array (binary heap)
in the array representation, the links are implicit and are obtained via
array subscript arithmetic
-- know why the implicit, array representation works OK for a binary heap
   but not for the general case of a binary tree

//===========================================================================
Balanced BST

Tree balancing:
AVL:
    imbalance condition, rotations to re-balance
    rotations alter the tree structure but maintain the order property
    (see the rotation sketch at the end of these notes)
    worst case time complexity for insert, remove, contains
    show how an AVL tree is altered by an insert
Splay:
    no balance/imbalance condition is kept
    rather, we "splay" a node to the root on every operation that touches
    a node -- insert, delete, contains, findMin, findMax
    splaying moves a node up to the root via rotations (similar to AVL)
    that maintain the BST order properties
    asymptotic analysis:
        any ONE INDIVIDUAL operation has a possible worst case time
        complexity of O(N) for a splay tree of N nodes
        the same for insert, delete, contains, findMin, findMax...
        this means a splay tree can stretch out... get mis-shapen like an
        unbalanced BST
        BUT... if we do enough ops in a row that do splaying, the tree
        will collapse or "fold up" from ugly-shaped into something more
        balanced
        so we say that an op like insert, delete, contains, findMin,
        findMax, in a long enough sequence, will amortize out to O(log N)
        each... the long expensive ops are balanced in the sequence by
        several short, near constant cost ops
        So we say that insert, delete, contains, findMin, findMax all have
        worst case time complexity of "amortized O(log N)"
    splay trees give good performance for apps that exhibit locality of
    reference
        locality of reference... know it

Binary Tree terms
    full, perfect, complete
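The min-heap sketch referenced in the binary heap notes above: a minimal
array-backed heap with my own names, showing the implicit links via
subscript arithmetic and the percolate operations behind add and removeMin.
Real code (like Assn 4) would also keep a HashMap from element to array
index for O(1) lookup in incElt/decElt.

    import java.util.ArrayList;
    import java.util.List;

    public class MinHeapSketch {
        // Implicit array representation: the node at index i has children
        // at 2*i + 1 and 2*i + 2, and its parent at (i - 1) / 2 --
        // subscript arithmetic instead of stored links.
        private final List<Integer> a = new ArrayList<>();

        public int getMin() {                       // O(1): the min sits at the root
            return a.get(0);
        }

        public void add(int x) {                    // O(log N) worst case
            a.add(x);                               // new leaf in the next open slot
            int i = a.size() - 1;
            while (i > 0 && a.get((i - 1) / 2) > a.get(i)) {
                swap(i, (i - 1) / 2);               // percolate up past larger parents
                i = (i - 1) / 2;
            }
        }

        public int removeMin() {                    // O(log N) worst case
            int min = a.get(0);
            a.set(0, a.get(a.size() - 1));          // move the last leaf to the root
            a.remove(a.size() - 1);
            percolateDown(0);
            return min;
        }

        // The O(N) "build" would call this for every index from the last
        // internal node (a.size()/2 - 1) back down to the root at 0.
        private void percolateDown(int i) {
            while (2 * i + 1 < a.size()) {
                int child = 2 * i + 1;              // left child
                if (child + 1 < a.size() && a.get(child + 1) < a.get(child)) {
                    child++;                        // right child is smaller
                }
                if (a.get(i) <= a.get(child)) {
                    break;                          // heap order restored
                }
                swap(i, child);
                i = child;
            }
        }

        private void swap(int i, int j) {
            int t = a.get(i);
            a.set(i, a.get(j));
            a.set(j, t);
        }
    }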
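And the rotation sketch referenced in the AVL/Splay notes: a single right
rotation, the building block behind both AVL rebalancing and splaying. The
node shape and names are mine. The rotation changes the tree's shape but
preserves BST order, because every key in k1's right subtree lies between
k1 and k2.

    public class RotationSketch {
        static class BstNode {
            int key;
            BstNode left, right;
            BstNode(int key) { this.key = key; }
        }

        // "Rotate with left child": lift k1 up over k2 and return the new
        // root of this subtree. k1's old right subtree (keys between k1
        // and k2) becomes k2's left subtree, so BST order is preserved.
        static BstNode rotateRight(BstNode k2) {
            BstNode k1 = k2.left;
            k2.left = k1.right;
            k1.right = k2;
            return k1;                              // caller re-links k1 where k2 hung
        }
    }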