Transcript for Wednesday 14 November (fit data)
| categories: Transcripts
Today we talked about how to fit equations to data.
# IPython log file

cd Dropbox/116

# First I faked up some noisy data.
x = np.random.random_integers(0, 100, 100) / 10.0
y = 50 + 10 * x + x**2 + 5 * np.random.randn(100)

# Look at it plotted as points.
plot(x, y, '.')
#[Out]# [<matplotlib.lines.Line2D at 0xb0e57ec>]

# Just listing the values isn't very illuminating.
x[:5]
#[Out]# array([ 7.2,  1.6,  3.5,  6.7,  4.9])
y[:5]
#[Out]# array([ 173.19489273,   73.59377412,   99.23120207,  159.96533158,
#[Out]#         118.72251928])

# Let's fit a line to the data.
# Our line equation will be of the form y = m * x + b,
# where m is the slope and b is the intercept.
# We can set this up as a system of N equations:
#   y[0]   = b + m * x[0]
#   y[1]   = b + m * x[1]
#   ...
#   y[N-1] = b + m * x[N-1]
# We have tools to solve a matrix equation of the form M * V = Y,
# where M is an N by 2 matrix, V is a 2 by 1 matrix,
# and Y is an N by 1 matrix.
# To get our system of equations into this form we need a matrix that
# evaluates b + m * x.  The intercept b is included simply by
# multiplying it by a column of ones in the matrix.
# That matrix is easy to construct with numpy.
# I'll transpose it because I want N rows by 2 columns,
# not 2 rows by N columns.
M = np.array([np.ones_like(x), x]).T
M.shape
#[Out]# (100, 2)
M[:5]
#[Out]# array([[ 1. ,  7.2],
#[Out]#        [ 1. ,  1.6],
#[Out]#        [ 1. ,  3.5],
#[Out]#        [ 1. ,  6.7],
#[Out]#        [ 1. ,  4.9]])
# numpy includes a function to produce the least-squares solution
# to a system of linear equations.
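As an aside that wasn't part of the session: one way to see what a "least-squares solution" means is that it is the coefficient vector minimizing the squared error, which also solves the so-called normal equations. Here is a minimal sketch with my own toy data (not the class data), using three points that lie exactly on y = 2*x + 1, so the recovered intercept and slope should be 1 and 2.

```python
import numpy as np

# Three exact points on the line y = 2*x + 1.
x = np.array([0.0, 1.0, 2.0])
y = 2 * x + 1

# Design matrix: a column of ones (for the intercept) and a column of x.
M = np.column_stack([np.ones_like(x), x])

# The least-squares solution v minimizes ||M @ v - y||; equivalently it
# solves the normal equations (M.T @ M) @ v = M.T @ y.
v = np.linalg.solve(M.T @ M, M.T @ y)
# v should be approximately [1., 2.]: intercept 1, slope 2.
```

In practice np.linalg.lstsq is preferred over forming M.T @ M explicitly, because squaring the matrix can degrade numerical accuracy.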
# It is careful to maintain the numerical accuracy of the result.
R = np.linalg.lstsq(M, y)
# The result includes several values; the first part is what we want.
R
#[Out]# (array([ 33.36284698,  19.86799503]),
#[Out]#  array([ 6930.03699749]),
#[Out]#  2,
#[Out]#  array([ 60.43050388,   4.5764834 ]))

# Extract the first part of the solution and call it fit.
fit = R[0]
fit
#[Out]# array([ 33.36284698,  19.86799503])
# It is saying the intercept is about 33.4 and the slope is about 19.9.

# Now we can get the estimated y value for each x using this equation.
ye = np.dot(M, fit)
# We plotted it and it looks pretty good, though the data curves away
# from the line a bit at the ends.
plot(x, ye, 'r.')
#[Out]# [<matplotlib.lines.Line2D at 0xb0f78ac>]

# Let's look at the error,
residual = y - ye
# in a new figure.
figure(2)
#[Out]# <matplotlib.figure.Figure at 0xb0f7d2c>
plot(x, residual, '.')
#[Out]# [<matplotlib.lines.Line2D at 0xab5860c>]
# We could see that the residual was quite curved.  Perhaps this
# function is quadratic.

# Fitting a quadratic is just as easy: we just add a column of x squared.
M2 = np.array([np.ones_like(x), x, x**2]).T
M2.shape
#[Out]# (100, 3)

# Get the quadratic fit.
R2 = np.linalg.lstsq(M2, y)
R2
#[Out]# (array([ 50.66106284,  10.2079896 ,   0.94856269]),
#[Out]#  array([ 2199.3046329]),
#[Out]#  3,
#[Out]#  array([ 464.38897231,  15.38634848,   2.80652423]))
figure(1)
#[Out]# <matplotlib.figure.Figure at 0xb06dd2c>
fit2 = R2[0]
# Now we see it is estimating an intercept of 50.7,
# a slope of 10.2,
# and a quadratic coefficient of 0.95.
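A side note that wasn't part of the session: the second value returned by np.linalg.lstsq (the 6930.03... and 2199.30... numbers above) is the sum of squared residuals, so you can compare fit quality without computing the residual by hand. A small sketch with my own made-up points (not the class data) checking that it matches the by-hand computation:

```python
import numpy as np

# Made-up points, roughly on y = 2*x + 1 plus a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.9, 3.1, 4.9, 7.2])

# Linear design matrix and least-squares fit.
M = np.column_stack([np.ones_like(x), x])
fit, ssr, rank, sv = np.linalg.lstsq(M, y, rcond=None)

# The returned ssr should equal the sum of the squared residuals
# computed directly from y - M @ fit.
residual = y - M @ fit
```

(The rcond=None argument quiets a warning in recent numpy versions; the transcript's older numpy didn't need it.)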
fit2
#[Out]# array([ 50.66106284,  10.2079896 ,   0.94856269])

# We can get the new estimate
ye2 = np.dot(M2, fit2)
# and plot it to see that it fits better.
plot(x, ye2, 'g.')
#[Out]# [<matplotlib.lines.Line2D at 0xb0f13cc>]

# We can also look at the residual,
residual2 = y - ye2
figure(2)
#[Out]# <matplotlib.figure.Figure at 0xb0f7d2c>
# which now looks like noise instead of having obvious structure.
plot(x, residual2, 'g.')
#[Out]# [<matplotlib.lines.Line2D at 0xab58eec>]

# Note that the coefficients of our linear fit aren't simply a subset
# of the quadratic fit's.
fit
#[Out]# array([ 33.36284698,  19.86799503])
fit2
#[Out]# array([ 50.66106284,  10.2079896 ,   0.94856269])

# We could fit other types of equations by transforming the data using
# logarithms.
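To flesh out that last remark (this example wasn't part of the session): to fit an exponential y = A * exp(b * x), take logarithms so that log(y) = log(A) + b * x, which is a straight line in x, and then apply the same linear least-squares machinery. A sketch with my own synthetic data, using A = 3 and b = 0.5:

```python
import numpy as np

# Synthetic data lying exactly on y = 3 * exp(0.5 * x).
x = np.linspace(0.0, 2.0, 5)
y = 3.0 * np.exp(0.5 * x)

# Fit a line to (x, log(y)); the intercept is log(A), the slope is b.
M = np.column_stack([np.ones_like(x), x])
logA, b = np.linalg.lstsq(M, np.log(y), rcond=None)[0]
A = np.exp(logA)
# A ≈ 3.0 and b ≈ 0.5, up to floating-point rounding.
```

One caveat worth knowing: least squares on log(y) weights the errors differently than least squares on y itself, so on noisy data the two approaches give slightly different answers.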