List data structure
general ops: get kth item
insert at slot k
delete at slot k
implementing lists:
as array
as linked cells
Ordered lists are especially nice
order information can assist in list searching
binary search can find an item in O(log N) time worst case (in an array, where the middle slot is reachable in O(1))
understand binary searching in an ordered sequence
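a minimal Java sketch of binary search over a sorted array (names are illustrative):

    // binary search over a sorted array: O(log N) comparisons worst case
    // returns the index of target, or -1 if it is not present
    static int binarySearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;             // midpoint without (lo+hi) overflow
            if (a[mid] == target) return mid;
            else if (a[mid] < target) lo = mid + 1;   // discard lower half
            else hi = mid - 1;                        // discard upper half
        }
        return -1;                                    // not found
    }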
special lists:
Stack
general ops: push, pop, top, empty, size
keep pointer to most recent slot (t)
push is insert at slot t
pop is delete from slot t
push, pop are O(1) since we access the top slot directly via pointer
no searching through the list, no rearrangement of locations
this applies to implementations by links or array
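a minimal array-backed stack in Java (a sketch, not the only way to do it):

    class IntStack {
        private int[] data = new int[16];
        private int t = -1;                    // pointer to most recent (top) slot; -1 = empty

        void push(int x) {                     // insert at slot t: O(1) (amortized, with growth)
            if (t + 1 == data.length)
                data = java.util.Arrays.copyOf(data, data.length * 2);
            data[++t] = x;
        }
        int pop()  { return data[t--]; }       // delete from slot t: O(1)
        int top()  { return data[t]; }         // O(1), direct access via t
        boolean empty() { return t < 0; }
        int size() { return t + 1; }
    }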
Queue
general ops: add, remove, front, empty, size
keep pointers to front slot, back slot
add is insert at the back slot
remove is delete from the front slot
add, remove are O(1) since we access the slots directly by pointer
no searching through the list, no rearrangement of locations
this applies to implementations by links or array
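a minimal circular-array queue in Java (a sketch; linked cells work equally well):

    class IntQueue {
        private int[] data = new int[16];
        private int head = 0;                  // index of the front slot
        private int count = 0;                 // number of items stored

        void add(int x) {                      // insert at the back: O(1) (amortized)
            if (count == data.length) resize();
            data[(head + count) % data.length] = x;
            count++;
        }
        int remove() {                         // delete from the front: O(1)
            int x = data[head];
            head = (head + 1) % data.length;
            count--;
            return x;
        }
        int front()     { return data[head]; }
        boolean empty() { return count == 0; }
        int size()      { return count; }

        private void resize() {                // copy into a bigger array, re-based at slot 0
            int[] bigger = new int[data.length * 2];
            for (int i = 0; i < count; i++)
                bigger[i] = data[(head + i) % data.length];
            data = bigger;
            head = 0;
        }
    }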
Bubble sort
O(N^2) time required to sort list of N items
pro: easy to program
con: very slow, useless for sorting lists of any serious size
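a Java sketch of the classic version (the nested loops over N items are where the O(N^2) comes from):

    static void bubbleSort(int[] a) {
        for (int pass = 0; pass < a.length - 1; pass++) {
            for (int i = 0; i < a.length - 1 - pass; i++) {
                if (a[i] > a[i + 1]) {         // adjacent pair out of order: swap
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                }
            }
        }
    }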
Tree data structure in general (n-ary tree)
Binary Tree data structure
Any n-ary tree can be represented as a binary tree (e.g., via the left-child/right-sibling encoding)
Binary Search Tree
order property: all nodes in left child subtree have smaller
value than root, all nodes in right child subtree
have values greater
traversals:
in-order, pre-order, post-order
in-order gives sorted list from a BST
pre-order, post-order give lists dependent on tree structure
breadth-first
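Java sketches of the traversals over a linked node (Node is an illustrative class):

    class Node {
        int value;
        Node left, right;
    }

    class Traversals {
        // in-order: left subtree, node, right subtree (sorted output on a BST)
        static void inOrder(Node n) {
            if (n == null) return;
            inOrder(n.left);
            System.out.println(n.value);
            inOrder(n.right);
        }
        // pre-order: node before its subtrees
        static void preOrder(Node n) {
            if (n == null) return;
            System.out.println(n.value);
            preOrder(n.left);
            preOrder(n.right);
        }
        // post-order: node after its subtrees
        static void postOrder(Node n) {
            if (n == null) return;
            postOrder(n.left);
            postOrder(n.right);
            System.out.println(n.value);
        }
        // breadth-first: visit level by level, using a queue of pending nodes
        static void breadthFirst(Node root) {
            java.util.ArrayDeque<Node> q = new java.util.ArrayDeque<>();
            if (root != null) q.add(root);
            while (!q.isEmpty()) {
                Node n = q.remove();
                System.out.println(n.value);
                if (n.left != null) q.add(n.left);
                if (n.right != null) q.add(n.right);
            }
        }
    }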
BST Sort
Not a real sorting algorithm name, but a way to remember that a fairly
efficient way to sort a list is sometimes to first build a binary search tree,
then traverse it in the proper way
Average case analysis:
1) build a BST from the list: O(N log N)
one insert in a BST: O(log N) for tree with N elements
doing N inserts: O(N log N)
2) in-order traversal to get the sorted order of elements
O(N) since we have to visit each of the N nodes in the BST
So total avg case time complexity is
O(N log N) + O(N) which is O(N * (1 + log N))
which is O(N log N)
Worst case analysis:
1) build a BST from the list: O(N * N) is O(N^2)
one insert in a BST: O(N) for tree with N elements
doing N inserts: O(N * N) is O(N^2)
2) in-order traversal to get the sorted order of elements
O(N) since we have to visit each of the N nodes in the BST
So total worst case time complexity is
O(N^2) + O(N) which is dominated by O(N^2)
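a Java sketch of BST sort (reuses the Node shape from the traversal sketch above):

    class BstSort {
        static class Node { int value; Node left, right; }

        // standard recursive BST insert: O(height) per call
        static Node insert(Node root, int v) {
            if (root == null) {
                Node n = new Node();
                n.value = v;
                return n;
            }
            if (v < root.value) root.left = insert(root.left, v);
            else                root.right = insert(root.right, v);   // ties go right
            return root;
        }

        // in-order traversal writing the values back into a[] in sorted order
        static int collect(Node n, int[] a, int i) {
            if (n == null) return i;
            i = collect(n.left, a, i);
            a[i++] = n.value;
            return collect(n.right, a, i);
        }

        static void bstSort(int[] a) {
            Node root = null;
            for (int x : a) root = insert(root, x);   // N inserts: O(N log N) avg, O(N^2) worst
            collect(root, a, 0);                      // in-order pass: O(N)
        }
    }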
"Big Oh" notation for expressing growth rates of functions
List of length N
insert is O(N)
remove is O(1) once you find the element (linked cells; an array must shift later elements, O(N))
find is O(N)
Queue containing N items
add is O(1)
remove is O(1)
front is O(1)
can't get middle elements like general List
Stack containing N items
push is O(1)
pop is O(1)
top is O(1)
can't get middle elements like general List
Binary Search Tree with N nodes
find is O(log N) average case
insert is O(log N) average case
remove is O(log N) average case because we have to find it to remove it
(all three degrade to O(N) when the tree is degenerate; see below)
traversals are O(N) (list every element in some order)
we get ordering (sorting) information from the tree structure
degenerate case of BST is a List
"Big Oh" is an upper bound on growth speed of an operation/algorithm
but it is not necessarily a good (tight) upper bound
if some operation F is O(N^2) in worst case time use
then F is also O(N^3)... and
then F is also O(2^N)
etc.
but O(N^2) is the lowest (tightest) bound of those given.
we would say O(N^2) is the "best" worst case time complexity of those choices
or the "least bad" worst case
Time-space trade-off
-- sometimes we can gain some speed (save execution time) for an operation
by using more space (memory)
-- we can also sometimes design ways to use less space for an operation, but
then lose speed when doing so
examples?
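one standard illustration (my example, not from the notes) is memoization:
spend memory on a table of already-computed results to avoid recomputing them

    class Memo {
        static long[] table = new long[64];        // the extra space; 0 = "not computed yet"

        // memoized Fibonacci: O(N) time with the table,
        // vs exponential time if every call recomputes both branches
        static long fib(int n) {
            if (n <= 1) return n;                  // base cases
            if (table[n] != 0) return table[n];    // reuse the stored answer: time saved
            table[n] = fib(n - 1) + fib(n - 2);    // compute once, remember
            return table[n];
        }
    }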
Recursion: it's turtles all the way down (until the armadillo)
-- a programming pattern in which the code in the body of a function contains one or
more calls to itself
-- well-formed recursive function must have
-- a base case, which returns a simple result for a simple "small enough"
instance of the problem defined by the parameter values, and this result
is returned without a recursive call
-- a recursive case, in which a recursive call is made (a call to the very function
being defined); the recursive call is made on a "smaller" instance of the
problem being solved; the problem (its size) is defined by the parameters passed to
the recursive call.
-- a recursive function may have more than one base case, and may have more than one
recursive case
-- the base case(s) should precede the recursive case(s)... testing for a base case is testing
for stopping the recursion, and it is the first thing to do in the code
-- proper termination (preventing infinite recursion) involves
1) making sure each recursive call is passed a smaller problem
2) the problem is decreasing towards a base case
3) the base case checks are written so that the problem size eventually trips
a base case (and does not bypass it)
-- poorly formed recursions can blow out the call stack
-- using recursion when iteration is better can blow out the call stack
-- proven theory:
-- any program that uses iteration (loops) and no recursion can be re-written
to use recursion and no loops, while still computing the same function (results)
-- any program that uses recursion and no loops can be re-written to use loops and no
recursion, while still computing the same function (results)
-- this means recursion gives us expressive convenience for some problems,
but no extra computing power... there are no functions that we can compute with
recursion but cannot compute without it (both forms are sketched below)
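a Java sketch showing both forms computing the same function (names are mine):

    class SumBothWays {
        // recursive form: base case tested first, call made on a strictly smaller problem
        static int sumRec(int n) {
            if (n == 0) return 0;          // base case: simple result, no recursive call
            return n + sumRec(n - 1);      // recursive case: n-1 is closer to the base case
        }

        // iterative form: a loop and no recursion, same results
        static int sumLoop(int n) {
            int total = 0;
            for (int i = 1; i <= n; i++) total += i;
            return total;
        }
    }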
Runtime stack and runtime heap
the runtime system is how a programming language does the computations that
programs call for
runtime stack allocates stack frames (call frames) when procedure/methods are called
-- manages this dynamic memory in LIFO order, so the first call to end is the
most recent one started
-- a stack frame (call frame) has memory space for the parameter values sent in
when a function is called, the return address of the code that called it,
the local variables in the function, and some bookkeeping data
runtime heap is dynamic memory from which objects are allocated on calls to "new"
heap is not managed with any particular discipline like LIFO etc.
heap can get fragmented, stack cannot
heap fragmentation is dealt with via garbage collection in Java
how can the runtime stack blow out (run out of space)?
-- too many concurrently active function calls
-- this can happen with recursion... especially a recursion that is poorly written
and does not hit a terminating base case
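a sketch of such a blow-out in Java (do not run this expecting a result):

    // no reachable base case: each call pushes another frame until the
    // runtime stack is exhausted and Java throws StackOverflowError
    static int broken(int n) {
        return broken(n + 1);              // the "problem" grows instead of shrinking
    }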
binary heap
a binary heap is a completely different thing from the runtime heap...
runtime heap is not heap-structured, not related at all
just unfortunate and maybe a bit confusing to share the name "heap"
binary tree with two properties:
heap order property
structure property
always a complete binary tree, with
last level being filled from L to R but perhaps not full
representation of binary heap as array (links implicit via index arithmetic; see the sketch below)
heaps can be Min or Max
operations:
getMin (Max)
O(1) worst, avg since it is at the root of the heap tree
removeMin (Max)
O(log N) worst case
O(log N) average case
add
O(log N) worst case
O(1) average case (75% of the heap elements in last 2 tree levels)
build
simple way is O(N log N) by doing N adds
clever way is O(N) if you have all the elements on hand, use special "build"
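a Java sketch of an array-backed min-heap (the complete-tree structure property is what
makes the array work: root in slot 0, children of slot i in slots 2i+1 and 2i+2,
parent in slot (i-1)/2):

    class MinHeap {
        private int[] a = new int[16];
        private int n = 0;                         // number of elements

        int getMin() { return a[0]; }              // O(1): smallest is at the root

        void add(int x) {                          // O(log N) worst case
            if (n == a.length) a = java.util.Arrays.copyOf(a, n * 2);
            a[n] = x;
            int i = n++;
            while (i > 0 && a[i] < a[(i - 1) / 2]) {   // percolate up while smaller than parent
                swap(i, (i - 1) / 2);
                i = (i - 1) / 2;
            }
        }

        int removeMin() {                          // O(log N): last leaf to root, sift down
            int min = a[0];
            a[0] = a[--n];
            int i = 0;
            while (true) {                         // percolate down toward the smaller child
                int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
                if (left < n && a[left] < a[smallest]) smallest = left;
                if (right < n && a[right] < a[smallest]) smallest = right;
                if (smallest == i) break;          // heap order restored
                swap(i, smallest);
                i = smallest;
            }
            return min;
        }

        private void swap(int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
    }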
priority queue
implement priority queue using binary heap
elements stored have value and priority
priority is used to put elements into binary heap
operations:
enq: add item to the priority queue
deq: take item out of the priority queue, and that item is "highest" (or lowest) priority
front: produce the element from the priority queue that has "highest" (or lowest) priority
size, empty, etc...
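Java's library version can stand in for all of this; java.util.PriorityQueue is
documented as a heap-based priority queue (lowest value = "highest" priority by default):

    import java.util.PriorityQueue;

    class Demo {
        public static void main(String[] args) {
            PriorityQueue<Integer> pq = new PriorityQueue<>();
            pq.add(30);                        // enq
            pq.add(10);
            pq.add(20);
            System.out.println(pq.peek());     // front -> 10, without removing it
            System.out.println(pq.poll());     // deq   -> 10, removed from the queue
            System.out.println(pq.size());     // 2
            System.out.println(pq.isEmpty());  // false
        }
    }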
heap sort
special "build" a binary heap
do repeated removeMin() to create the ordered sequence
worst case time complexity if we do repeated insert to construct heap:
O(N log N) to build (N inserts at O(log N) each) + O(N log N) for the N removeMins = O(N log N)
worst case time complexity if we do special build to construct heap:
O(N) to build + O(N log N) for the N removeMins = O(N log N) (same bound, cheaper build step)
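a Java sketch of in-place heap sort (this classic version uses a MAX-heap and repeated
remove-max so the ascending result lands in place; same idea as repeated removeMin):

    static void heapSort(int[] a) {
        int n = a.length;
        for (int i = n / 2 - 1; i >= 0; i--)       // special build: sift down interior nodes, O(N)
            siftDown(a, i, n);
        for (int end = n - 1; end > 0; end--) {
            int t = a[0]; a[0] = a[end]; a[end] = t;   // remove max: swap root to the back
            siftDown(a, 0, end);                       // restore heap order on the remainder
        }
    }

    static void siftDown(int[] a, int i, int n) {  // O(log N) per call
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, largest = i;
            if (left < n && a[left] > a[largest]) largest = left;
            if (right < n && a[right] > a[largest]) largest = right;
            if (largest == i) return;
            int t = a[i]; a[i] = a[largest]; a[largest] = t;
            i = largest;
        }
    }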
Sorting
we have discussed and analyzed these sorting methods:
-- Bubble Sort: rearranging the items in an array/list to put them in order
"gold standard of badness" since it's O(N^2) worst case as well as avg case
-- keeping a basic List sorted (using inSort operation)
-- BST sort: build a binary search tree and do in-order traversal
(worst case analysis, and average case analysis)
-- Sorting with a binary heap
Tree representations
we have represented a tree (binary tree) as
-- linked cells (basic unbalanced BST, also used for balanced methods like AVL, Splay)
-- array (binary heap)
in array representation, the links are implicit
and are obtained via array subscript arithmetic
-- know why array representation works ok for binary heap, not for general case of binary tree
   (a complete tree packs into the array with no gaps; a degenerate binary tree of N nodes
   could need roughly 2^N array slots, almost all of them empty)