proof by induction proof by contradiction proof by contrapositive dis-proof by counter example
axiomatic definition of ADT behavior List ADT Stack ADT Queue ADT Bounded Stack ADT Bounded Queue ADT Map ADT Set ADT MSet ADT (multiset)
List data structure general ops: get kth item insert at slot k delete at slot k implementing lists: as array as linked cells Sorted list are especially nice order information can assist in list searching binary search can find an item in O(log N) time worst case special lists: Stack general ops: push, pop, top, empty, size keep pointer to most recent slot (t) push is insert at slot t pop is delete from slot t push, pop are O(1) since we access the top slot ddirectly via pointer no searching through the list, no rearrangement of locations this applies to implementations by links or array Queue general ops: add, remove, front, empty, size keep pointers to front slot, back slot add is insert at slot front remove is delete from slot back add, remove are O(1) since we access the slots directly by pointer no searching through the list, no rearrangement of locations this applies to implementations by links or array
Bubble sort O(N^2) time required to sort list of N items pro: easy to program con: very slow, useless for sorting lists of any serious size
Tree data structure in general (n-ary tree) Knuth transform, converts n-ary tree to binary tree Binary Tree data structure Binary Search Tree Tree balancing: AVL, Splay traversals: in-order, pre-order, post-order in-order gives sorted list from a BST pre-order, post-order give lists dependent on tree structure BST Sort Not a real sorting algorithm name, but a way to remember that one fairly efficient way to sort a list is to first build a binary search tree, then traverse it in the right way 1) build a BST from the list: O(N log N) one insert in a BST: O(log N) for tree with N elments doing N inserts: O(N log N) 2) in-order traversal to get the sorted order of elements O(N) since we have to visit each of the N nodes in the BST So total worst case time complexity is O(N log N) + O(N) which is O(N * (1+ log N)) which is O(N log N)
Pigeon hole principle Hash Table, hash function how to pick table size how to make an effective hash function what to do about collisions hash into linked lists (or other collection data structures) table size should be prime, and same the size as # elements expected probing to find an open cell in the table linear probing function quadratic probing function "expo" probing function table size should be prime, and twice the # elements expected rehasing to extend the table size
"Big Oh" notation for expressing growth rates of functions List of length N insert is O(N) remove is O(1) once you find the element find is O(N) Queue containing N items add is O(1) remove is O(1) front is O(1) can't get middle elements like general List Stack containing N items push is O(1) pop is O(1) top is O(1) can't get middle elements like general List Binary Search Tree with N nodes find is O(log N) insert is O(log N) remove is O(log N) because we have to find it to remove it traversals are O(N) (list every element in some order) we get ordering (sorting) information from the tree structure degenerate case of BST is a List Hash table of N elements insert is O(1) find is O(1) delete is O(1) get no ordering information from the table Set basic ops: add, delete, isIn (find), empty, delete sets can be implemented as BST to make operations take O(log N) worst case add is O(log N) like insert in BST delete is O(log N) like delete in BST find is O(log N) like find in BST isIn is a version of find, returns boolean rather than value record empty is O(1) Map basic ops: (assume implemented as BST) add (key, value) pair is O(log n) delete (key, value) pair is O(log n) get (key) is O(log N), returns the value stored with this key can also get order-dependent information like findMin, findMax in O(log N) sort items in O(N) basic ops: (assume implemented as Hashtable) add (key, value) pair is O(1) delete (key, value) pair is O(1) get (key) is O(1), returns the value stored with this key no order-dependent ops are possible
binary heap binary tree with two properties: heap order property structure property always complete binary tree, last level filled from L to R representation of binary heap as array heaps can be Min or Max priority queue implement priority queue as binary heap operations: getMin (Max) O(1) worst, avg since it is at the root of the heap tree removeMin (Max) O(log N) worst case O(log N) average case add O(log N) worst case O(1) average case build nieve is O(N log N) by doing N adds cleaver is O(N) axioms for alternate version of priority queue items are (value, priority) pairs heap sort build a binary heap do repeated removeMin() to create the sort basic graph theory vertex (node), edge (arc), edge weight undirected graph directed graph path in a graph cycle in a graph bi-partite graph complete graph planar graph depth-first search (traversal) in a graph (use a stack) breaadth-first search (traversal) in a graph (use a queue) representation of a graph adjacency matrix useful when graph is dense ( |E| approx.= |V|^2 ) adjacency lists useful when graph is sparse ( |E| << |V|^2 ) although an undirected edge between nodes A and B is the same theoretically as two directed edges A --> B and B --> A, we often represent them that way in adjacency lists for an undirected graph topological sort Dijkstra's algorithm for shortest paths spanning tree minimum spanning tree Kruskal's algorithm for MST Prim's algorithm for MST uses for graphs: course prerequisite structure, maps, modeling digital networks, airline and auto routes, flow of traffic, work flow in an organization, finite state machines, document structure, hypermedia, linguistic representations, all tree uses (trees are graphs) tons and tons of stuff Euler path path through a graph that crosses each edge exactly once Euler circuit EP that begins and ends on the same node Hamiltonian path path through a graph that visits each node exactly once Hamiltonian circuit HP that begins and ends on the same node Traveling salesman and P/NP/NP-complete Halting problem and uncomputability sorting we can prove that no sort that uses only pairwise comparisons can do better than N log N best case. This means that all comparison sorts are Big-Omega(N log N), bounded below. They might be worse, however, and some are. insertion sort O(N^2) worst case sometimes used as the "finisher" in quicksort selection sort O(N^2) worst case similar in concept to insertion sort but operationally it manipulates the array a bit differently heap sort O(N log N) best case O(N log N) worst case O(N log N) average case merge sort O(N log N) best case O(N log N) wosrt case O(N log N) average case cost: uses lots of extra memory, needs a second array as large as the one you are sorting quicksort O(N log N) best case O(N^2) worst case O(N log N) average case does not need much extra space get extra efficiency by cutting off recursion when the lists become smaller than, say, length 15 and then doing some other sort like insertion sort on the small lists to finish up bogosort random shuffling until you get the one that is in order O(infinity) worst case an extreme, just an illustration, not a serious shuffle sorting into buckets a way to sort in O(N) time can be used when the full range of possible input values is known allocate an array of int A and initialize all slots to 0 then when you see an input number n, do A[n]++ this seems to violate the Big-Omega(N log N) proof, but this sort does not do only pairwise comparisons. It effectively does lots of comparisons at once by putting a number right into an array slot. Since we believe TANSTAAFL, the tradeoff is that is uses lots of memory (not all array slots are filled) and you have to know in advance the range of inputs. Other O(N) sorts we can also sort N items in N time if we use parallel hardware the Big-Oh complexity results for our sort algorithms have all assumed we were running them on a single processor. sorts are either stable or unstable stable means it preserves the original order of items with the same value when there are duplicate values in the input. often a sort can be implemented to be stable or unstable, and in many cases making it stable sacrifices some efficiency bubble sort is stable insertion sort is stable selection sort as we did it in class is not stable it can be made stable with no change in Big-Oh complexity but at a cost in practical efficiency merge sort is not necessarily stable but is easy to make so (just make sure the merge takes from the left-most array if it can for duplicates) quick sort is not stable Analysis of merge sort solve the recurrance equation analyzing the behavior of mergesort T(N) = 2*T(N/2) + N we want to solve this so that we get something like T(N) = .... where the function T does not appear on the right hand side we find T(N) is O(N log N)