Transcript for Monday 28 November (sums)

| categories: Transcripts

Today we looked at summing groups of items. We had a table with many rows and 2 columns. The first columns was the code and the second the value. The codes could reoccur and we wanted the sum of the values for each code.

For example, the table might look like this:

array([[ 1.  ,  0.47],
       [ 1.  ,  1.07],
       [ 1.  ,  2.22],
       [ 2.  ,  0.79],
       [ 2.  ,  1.61],
       [ 4.  ,  2.18],
       [ 4.  ,  0.98],
       [ 4.  ,  2.14],
       [ 4.  ,  2.21],
       [ 5.  ,  0.56]])

And we want to produce output like this:

1.0 3.76
2.0 2.4
4.0 7.51
5.0 0.56

We looked at two solutions. The first is easiest and uses a dictionary just as we did when counting values. This time we're summing them up rather than counting.

def printSums(table):
    sum = {}  # initialize an empty dictionary
    for row in table:  # visit each row
        code = row[0]  # unpack row
        amt = row[1]   # and amount
        if code not in sum:  # test if we've seen this code
            sum[code] = 0.0  # add it if not
        sum[code] = sum[code] + amt  # accumulate the amounts

    # we need to sort the keys to get them in order
    for code in sorted(sum.keys()):
        print code, sum[code]

Then we looked at a sequential approach that would use less memory but is trickier to get correct.

def printSums2(table):
    sum = 0  # initialize a single accumulator
    current_code = None  # keep track of the code we're summing
    for row in table:  # visit each row
        code = row[0]  # unpack
        amt = row[1]

        # is the code the same as previous?
        if code == current_code:
            sum = sum + amt  # yes, add this one in
        else:
            # no, then report if not the first one
            if current_code is not None:
                print current_code, sum
            sum = amt  # reiniitalize sum
            current_code = code  # reset the code

    # we must print the last one
    print current_code, sum