Transcript for Monday 28 November (sums)
| categories: Transcripts
Today we looked at summing groups of items. We had a table with many rows and 2 columns. The first columns was the code and the second the value. The codes could reoccur and we wanted the sum of the values for each code.
For example, the table might look like this:
array([[ 1. , 0.47],
[ 1. , 1.07],
[ 1. , 2.22],
[ 2. , 0.79],
[ 2. , 1.61],
[ 4. , 2.18],
[ 4. , 0.98],
[ 4. , 2.14],
[ 4. , 2.21],
[ 5. , 0.56]])
And we want to produce output like this:
1.0 3.76 2.0 2.4 4.0 7.51 5.0 0.56
We looked at two solutions. The first is easiest and uses a dictionary just as we did when counting values. This time we're summing them up rather than counting.
def printSums(table):
sum = {} # initialize an empty dictionary
for row in table: # visit each row
code = row[0] # unpack row
amt = row[1] # and amount
if code not in sum: # test if we've seen this code
sum[code] = 0.0 # add it if not
sum[code] = sum[code] + amt # accumulate the amounts
# we need to sort the keys to get them in order
for code in sorted(sum.keys()):
print code, sum[code]
Then we looked at a sequential approach that would use less memory but is trickier to get correct.
def printSums2(table):
sum = 0 # initialize a single accumulator
current_code = None # keep track of the code we're summing
for row in table: # visit each row
code = row[0] # unpack
amt = row[1]
# is the code the same as previous?
if code == current_code:
sum = sum + amt # yes, add this one in
else:
# no, then report if not the first one
if current_code is not None:
print current_code, sum
sum = amt # reiniitalize sum
current_code = code # reset the code
# we must print the last one
print current_code, sum