UNC-CH COMP 410

Topics covered in class


Midterm exam will cover this material

This is a guide, not a guarantee. The exam covers all material we have discussed in class up to and including skip lists, and all concepts covered in the code we have written. The following is a list of topics included in this material:

List data structure
  general ops: get kth item
               insert at slot k
               delete at slot k

implementing lists:
  as array
  as linked cells
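
A minimal sketch of the two implementations in Java (illustrative, not the
course's List class), showing why array access is direct while linked cells
must be walked:

  // array implementation: get(k) is one subscript operation
  class ArrayListSketch {
      int[] items = new int[100];
      int size = 0;
      int get(int k) { return items[k]; }   // O(1) direct access
  }

  // linked-cells implementation: get(k) walks k links from the head
  class LinkedListSketch {
      static class Node { int val; Node next; }
      Node head;
      int get(int k) {                      // assumes 0 <= k < size
          Node cur = head;
          for (int i = 0; i < k; i++) cur = cur.next;   // O(k) walk
          return cur.val;
      }
  }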

Ordered lists are especially nice
order information can assist in list searching
binary search can find an item in O(log N) time worst case
understand binary searching in an ordered sequence
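
A binary search sketch in Java (illustrative): each probe halves the
remaining range, giving the O(log N) worst case.

  // assumes a[] is sorted ascending; returns index of target or -1
  static int binarySearch(int[] a, int target) {
      int lo = 0, hi = a.length - 1;
      while (lo <= hi) {
          int mid = lo + (hi - lo) / 2;       // midpoint, overflow-safe
          if (a[mid] == target) return mid;
          if (a[mid] < target) lo = mid + 1;  // discard left half
          else                 hi = mid - 1;  // discard right half
      }
      return -1;                              // not found
  }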
         
special lists:
  Stack
    general ops: push, pop, top, empty, size
    keep pointer to most recent slot (t)
    push is insert at slot t
    pop is delete from slot t
    push, pop are O(1) since we access the top slot directly via pointer
      no searching through the list, no rearrangement of locations
      this applies to implementations by links or array
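
An array-backed stack sketch in Java (illustrative): t points at the top
slot, so push and pop touch only that slot.

  class IntStack {
      private int[] a = new int[16];
      private int t = -1;                    // pointer to most recent slot

      boolean empty()  { return t < 0; }
      int size()       { return t + 1; }
      int top()        { return a[t]; }      // assumes non-empty
      void push(int x) {                     // O(1): advance t, store there
          if (t + 1 == a.length) a = java.util.Arrays.copyOf(a, a.length * 2);
          a[++t] = x;                        // no shifting, no searching
      }
      int pop() { return a[t--]; }           // O(1): delete from slot t
  }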
    
  Queue
    general ops: add, remove, front, empty, size
    keep pointers to front slot, back slot
    add is insert at slot back
    remove is delete from slot front
    add, remove are O(1) since we access the slots directly by pointer
      no searching through the list, no rearrangement of locations
      this applies to implementations by links or array
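
A circular-array queue sketch in Java (illustrative; capacity fixed for
brevity): the front and back slots are tracked directly, so add and remove
never search or shift.

  class IntQueue {
      private int[] a = new int[16];
      private int front = 0, size = 0;

      boolean empty()  { return size == 0; }
      int size()       { return size; }
      int front()      { return a[front]; }  // assumes non-empty
      void add(int x) {                      // O(1): insert at the back slot
          a[(front + size) % a.length] = x;  // wrap around the array end
          size++;
      }
      int remove() {                         // O(1): delete from front slot
          int x = a[front];
          front = (front + 1) % a.length;
          size--;
          return x;
      }
  }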



Bubble sort
  O(N^2) time required to sort a list of N items
  pro: easy to program
  con: very slow, useless for sorting lists of any serious size
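
A bubble sort sketch in Java (illustrative, not the course's code):

  // repeatedly swap out-of-order neighbors; each pass bubbles the
  // largest remaining item to the end.  The two nested loops over
  // N items give the O(N^2) behavior noted above.
  static void bubbleSort(int[] a) {
      for (int pass = a.length - 1; pass > 0; pass--) {
          for (int i = 0; i < pass; i++) {
              if (a[i] > a[i + 1]) {         // out of order: swap neighbors
                  int tmp = a[i];
                  a[i] = a[i + 1];
                  a[i + 1] = tmp;
              }
          }
      }
  }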
Tree data structure in general (n-ary tree)
Binary Tree data structure
  any n-ary tree can be represented as a binary tree

Binary Search Tree
  order property: all nodes in the left child subtree have values smaller
    than the root, all nodes in the right child subtree have values greater
  traversals: in-order, pre-order, post-order, breadth-first
    in-order gives a sorted list from a BST
    pre-order, post-order give lists dependent on tree structure

BST Sort
  not a real sorting algorithm name, but a way to remember that a fairly
  efficient way to sort a list is to first build a binary search tree,
  then traverse it in the proper way
  average case analysis:
    1) build a BST from the list: O(N log N)
         one insert in a BST: O(log N) for a tree with N elements
         doing N inserts: O(N log N)
    2) in-order traversal to get the sorted order of elements: O(N),
       since we have to visit each of the N nodes in the BST
    so total avg case time complexity is O(N log N) + O(N),
    which is O(N * (1 + log N)), which is O(N log N)
  worst case analysis:
    1) build a BST from the list: O(N * N) is O(N^2)
         one insert in a BST: O(N) for a tree with N elements
         doing N inserts: O(N * N) is O(N^2)
    2) in-order traversal to get the sorted order of elements: O(N),
       since we have to visit each of the N nodes in the BST
    so total worst case time complexity is O(N^2) + O(N),
    which is dominated by O(N^2)
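
A BST sort sketch in Java (an illustration of the analysis above, not the
course's code):

  // BST sort: do N inserts into an (unbalanced) BST, then an in-order
  // traversal emits the items in ascending order.  Avg O(N log N);
  // worst case O(N^2) when the tree degenerates into a list.
  class BSTSort {
      static class Node { int val; Node left, right; Node(int v) { val = v; } }

      static Node insert(Node root, int v) {
          if (root == null) return new Node(v);          // found the spot
          if (v < root.val) root.left = insert(root.left, v);
          else              root.right = insert(root.right, v);
          return root;
      }

      // in-order: left subtree, node, right subtree
      static void inOrder(Node n, java.util.List<Integer> out) {
          if (n == null) return;
          inOrder(n.left, out);
          out.add(n.val);
          inOrder(n.right, out);
      }

      static java.util.List<Integer> sort(int[] items) {
          Node root = null;
          for (int v : items) root = insert(root, v);    // N inserts
          java.util.List<Integer> out = new java.util.ArrayList<>();
          inOrder(root, out);                            // O(N) traversal
          return out;
      }
  }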
"Big Oh" notation for expressing growth rates of functions List of length N insert is O(N) remove is O(1) once you find the element find is O(N) Queue containing N items add is O(1) remove is O(1) front is O(1) can't get middle elements like general List Stack containing N items push is O(1) pop is O(1) top is O(1) can't get middle elements like general List Binary Search Tree with N nodes find is O(log N) insert is O(log N) remove is O(log N) because we have to find it to remove it traversals are O(N) (list every element in some order) we get ordering (sorting) information from the tree structure degenerate case of BST is a List "Big Oh" is an upper bound on growth speed of an operation/algorithm but it is not necessarily a good (tight) upper bound if some operation F is O(N^2) in worst case time use then F is also O(N^3)... and then F is also O(2^N) etc. but O(N^2) is the lowest (tightest) bound of those given. we would say O(N^2) is the "best" worst case time complexity of those choices or the "least bad" worst case
Time-space trade-off
  -- sometimes we can gain some speed (save execution time) for an
     operation by using more space (memory)
  -- we can also sometimes design ways to use less space for an operation,
     but then lose speed when doing so
  -- examples? (one possibility is sketched below)
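
One classic example (an illustration of the trade-off, not something from
our class code): memoizing a recursive Fibonacci spends O(N) extra memory
on a table to cut the running time from exponential down to O(N).

  // Memoized Fibonacci: the memo table is the space we spend to avoid
  // recomputing the same subproblems (the time we save).
  static long fib(int n, long[] memo) {
      if (n <= 1) return n;                 // base cases
      if (memo[n] != 0) return memo[n];     // already computed: reuse it
      memo[n] = fib(n - 1, memo) + fib(n - 2, memo);
      return memo[n];
  }
  // usage: fib(50, new long[51])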
Recursion: it's turtles all the way down (until the armadillo)
  -- a programming pattern in which the code in the body of a function
     contains one or more calls to itself
  -- a well-formed recursive function must have
     -- a base case, which returns a simple result for a simple "small
        enough" instance of the problem defined by the parameter values;
        this result is returned without a recursive call
     -- a recursive case, in which a recursive call is made (a call to the
        very function being defined); the recursive call is made on a
        "smaller" instance of the problem being solved; the problem (its
        size) is defined by the parameters passed to the recursive call
  -- a recursive function may have more than one base case, and may have
     more than one recursive case
  -- the base case(s) should precede the recursive case(s)... testing for
     a base case is testing for stopping the recursion, and it is the
     first thing to do in the code
  -- proper termination (preventing infinite recursion) involves
     1) making sure each recursive call is passed a smaller problem
     2) the problem is decreasing towards a base case
     3) the base case checks are written so that the problem size
        eventually trips a base case (and does not bypass it)
  -- poorly formed recursions can blow out the call stack
  -- using recursion when iteration is better can blow out the call stack
  -- proven theory:
     -- any program that uses iteration (loops) and no recursion can be
        re-written to use recursion and no loops, while still computing
        the same function (results)
     -- any program that uses recursion and no loops can be re-written to
        use loops and no recursion, while still computing the same
        function (results)
     -- this means recursion gives us expressive convenience for some
        problems, but no extra computing power... there are no functions
        that we can compute with recursion but cannot compute without it
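
A tiny well-formed recursion in Java (illustrative): sum of the first n
non-negative integers.

  // base case first, recursive case on a strictly smaller problem
  static int sum(int n) {
      if (n <= 0) return 0;     // base case: stops the recursion
      return n + sum(n - 1);    // recursive call on a smaller instance
  }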
Runtime stack and runtime heap
  the runtime system is how a programming language does the computations
  that programs call for
  runtime stack allocates stack frames (call frames) when
  procedures/methods are called
    -- manages this dynamic memory in LIFO order, so the first call to end
       is the most recent one started
    -- a stack frame (call frame) has memory space for the parameter
       values sent in when a function is called, the return address of the
       code that called it, the local variables in the function, and some
       bookkeeping data
  runtime heap is dynamic memory from which objects are allocated on calls
  to "new"
    heap is not managed with any particular discipline like LIFO, etc.
    heap can get fragmented, stack cannot
    heap fragmentation is dealt with via garbage collection in Java
  how can the runtime stack blow out (run out of space)?
    -- too many concurrently active function calls
    -- this can happen with recursion... especially a recursion that is
       poorly written and does not hit a terminating base case
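
A sketch of how recursion can blow out the runtime stack (illustrative):

  // every call pushes a new stack frame, and no base case is ever
  // reached, so frames pile up until the runtime stack is exhausted
  static int noBase(int n) {
      return noBase(n + 1);     // problem grows instead of shrinking
  }
  // noBase(0) eventually throws java.lang.StackOverflowError in Java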
Binary heap
  a binary heap is a completely different thing from the runtime heap...
  the runtime heap is not heap-structured, not related at all; it is just
  unfortunate and maybe a bit confusing that they share the name "heap"
  binary tree with two properties:
    heap order property
    structure property
      always a complete binary tree, with the last level being filled from
      L to R but perhaps not full
  representation of a binary heap as an array
  heaps can be Min or Max
  operations:
    getMin (Max)    O(1) worst, avg, since it is at the root of the
                    heap tree
    removeMin (Max) O(log N) worst case
                    O(log N) average case
    add             O(log N) worst case
                    O(1) average case (75% of the heap elements are in the
                    last 2 tree levels)
    build           simple way is O(N log N) by doing N adds
                    clever way is O(N) if you have all the elements on
                    hand: use the special "build"

Priority queue
  implementing a priority queue with a list, BST
  implementing a priority queue using a binary heap
  elements stored have a value and a priority
    priority is used to put elements into the binary heap
  operations:
    enq: add item to the PrQUE
    deq: take item out of the PrQUE; that item is "highest" (or lowest)
         priority
    front: produce the element from the queue that has "highest" (or
           lowest) priority
    size, empty, etc...

Heap sort
  special "build" a binary heap
  do repeated removeMin() to create the ordered sequence
  worst case time complexity if we do repeated insert to construct the heap
  worst case time complexity if we do the special build to construct
  the heap
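
A minimal Min binary heap sketch in Java (illustrative, not the course's
code), stored in an array; repeated removeMin() on such a heap is the
extraction phase of heap sort:

  // Min-heap in an array; links are implicit:
  //   parent(i) = (i-1)/2, left(i) = 2i+1, right(i) = 2i+2
  class MinHeap {
      private int[] a = new int[16];
      private int size = 0;

      int getMin() { return a[0]; }          // min sits at the root: O(1)

      void add(int x) {                      // O(log N) worst case
          if (size == a.length) a = java.util.Arrays.copyOf(a, size * 2);
          int i = size++;
          a[i] = x;
          while (i > 0 && a[i] < a[(i - 1) / 2]) {   // percolate up
              int p = (i - 1) / 2;
              int tmp = a[i]; a[i] = a[p]; a[p] = tmp;
              i = p;
          }
      }

      int removeMin() {                      // O(log N) worst case
          int min = a[0];                    // assumes a non-empty heap
          a[0] = a[--size];                  // move last leaf to the root
          int i = 0;
          while (2 * i + 1 < size) {         // percolate down
              int c = 2 * i + 1;                         // left child
              if (c + 1 < size && a[c + 1] < a[c]) c++;  // smaller child
              if (a[i] <= a[c]) break;       // heap order restored
              int tmp = a[i]; a[i] = a[c]; a[c] = tmp;
              i = c;
          }
          return min;
      }
  }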
Sorting
  we have discussed and analyzed these sorting methods:
  -- Bubble Sort: rearranging the items in an array/list to put them in
     order; "gold standard of badness" since it's O(N^2) worst case as
     well as avg case
  -- keeping a basic List sorted (using the inSort operation)
  -- BST sort: build a binary search tree and do an in-order traversal
     (worst case analysis, and average case analysis)
  -- sorting with a binary heap
  -- sorting using a skip list

Tree representations
  we have represented a tree (binary tree) as
  -- linked cells (basic unbalanced BST, also used for balanced methods
     like AVL, Splay)
  -- array (binary heap)
     in the array representation, the links are implicit and are obtained
     via array subscript arithmetic (sketched below)
  -- know why the array representation works ok for a binary heap, but not
     for the general case of a binary tree
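
The implicit links, as subscript arithmetic on a 0-based array
(illustrative):

  // no pointers stored; neighbors are computed from the index
  static int parent(int i) { return (i - 1) / 2; }
  static int left(int i)   { return 2 * i + 1; }
  static int right(int i)  { return 2 * i + 2; }
  // this works for a binary heap because a complete tree packs the array
  // with no gaps; an arbitrarily shaped binary tree would leave many
  // slots empty, wasting space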
Balanced BST
  tree balancing:
  AVL: imbalance condition, rotations to re-balance
    rotations alter the tree structure but maintain the order property
    worst case time complexity for insert, remove, contains
    show how an AVL tree is altered by insert, remove
  Splay: another approach to balancing
    splaying is done any time a node is accessed
    splaying applies order-preserving rotations until the accessed node is
    moved to the root
    amortized worst case time complexity
    show how splaying works in a tree for insert, remove, find, etc.
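
A sketch of one AVL single rotation in Java (illustrative):

  static class Node { int val; Node left, right; }

  // single right rotation, used when the imbalance is in the left
  // child's left subtree; the shape changes but the BST order property
  // is preserved
  static Node rotateRight(Node k2) {
      Node k1 = k2.left;
      k2.left = k1.right;   // k1's right subtree: values between k1, k2
      k1.right = k2;        // old subtree root becomes the right child
      return k1;            // k1 is the new subtree root
  }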
Skip Lists
  like a linked list except items are kept in order
  alternative to balanced binary search trees
  multi-level linked cells
    each higher level skips over more of the items at the lower levels
  probabilistic data structure... meaning we roll dice to decide aspects
  of the structure as we build
    also means if we build a skip list twice with the same input data
    sequence we will almost certainly get two different linked structures
  average time complexity: find O(log N), insert O(log N)
  worst case time complexity: find O(N), insert O(N)
    however the worst case is probabilistically very, very unlikely
    the average case is probabilistically very likely
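
The "roll dice" part of a skip list, sketched in Java (illustrative): each
new item's height is chosen by coin flips, so about half the items reach
level 2, a quarter reach level 3, and so on.

  // flip a fair coin until tails (or the cap): heads means the new cell
  // also appears in the next level up
  static int randomLevel(java.util.Random rng, int maxLevel) {
      int level = 1;
      while (level < maxLevel && rng.nextBoolean()) level++;
      return level;
  }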