Mathematics – DogDogFish

import matplotlib.pyplot as plt fig = plt.figure() axis = fig.add_subplot(1,1,1) circle = plt.Circle((0,0), 1) axis.add_patch(circle) axis.set_xlim([-1,1]) axis.set_ylim([-1,1]) axis.set_title('A Circle in a Square') plt.show()

#!/usr/bin/python import numpy as np import math import matplotlib.pyplot as plt """ Calculate pi using Monte-Carlo Simulation """ """ First - the maths: A circle has area Pi*r^2 A square wholly enclosing above circle has area 4r^2 If we randomly generate points in that square we'd expect the ratio of points in the square/points in the circle to equal the area of the square divided by the circle. By that logic n_in_sq/n_in_cir = 4/Pi and so Pi = (4 * n_in_cir)/n_in_sq """ class pi_calculator(object): def __init__(self, radius, iterations): self.radius = radius self.iterations = iterations self.square = square(radius) self.circle = circle(radius) def scatter_points(self): for _ in range(self.iterations): point_x, point_y = ((2*self.radius) * np.random.random_sample(2)) - self.radius self.square.increment_point_count(point_x, point_y) self.circle.increment_point_count(point_x, point_y) def return_pi(self): return (4.0*self.circle.return_point_count())/self.square.return_point_count() def calculate_accuracy(self, calc_pi): absolute_error = math.pi - calc_pi percent_error = 100*(math.pi - calc_pi)/math.pi return (absolute_error, percent_error) def return_iterations(self): return self.iterations class square(object): def __init__(self, radius): self.radius = radius self.lower_x = -radius self.lower_y = -radius self.upper_x = radius self.upper_y = radius self.point_count = 0 def in_square(self, point_x, point_y): return (self.upper_x > point_x > self.lower_x) and (self.upper_y > point_y > self.lower_y) def increment_point_count(self, point_x, point_y, increment = 1): if self.in_square(point_x, point_y): self.point_count += increment def return_point_count(self): return self.point_count class circle(object): def __init__(self, radius): self.radius = radius self.point_count = 0 def in_circle(self, point_x, point_y): return point_x**2 + point_y**2 < self.radius**2 def increment_point_count(self, point_x, point_y, increment=1): if self.in_circle(point_x, point_y): self.point_count += increment def return_point_count(self): return self.point_count if __name__ == '__main__': axis_values = [] pi_values = [] absolute_error_values = [] percent_error_values = [] for _ in range(1,3000,30): pi_calc = pi_calculator(1, _) pi_calc.scatter_points() print "Number of iterations: %d Accuracy: %.5f" % (pi_calc.return_iterations(), math.fabs(pi_calc.calculate_accuracy(pi_calc.return_pi())[0])) axis_values.append(_) pi_values.append(pi_calc.return_pi()) absolute_error_values.append(math.fabs(pi_calc.calculate_accuracy(pi_calc.return_pi())[0])) percent_error_values.append(math.fabs(pi_calc.calculate_accuracy(pi_calc.return_pi())[1])) improvement_per_iteration = [absolute_error_values[index] - absolute_error_values[index-1] for index, value in enumerate(absolute_error_values) if index > 0] fig = plt.figure() fig.suptitle('Calculating Pi - Monte Carlo Method') ax1 = fig.add_subplot(2,2,1) ax2 = fig.add_subplot(2,2,2) ax3 = fig.add_subplot(2,2,3) ax4 = fig.add_subplot(2,2,4) plt.subplots_adjust(wspace=0.3, hspace=0.3) ax1.set_xticklabels([str(entry) for entry in axis_values[::len(axis_values)/5]], rotation=30, fontsize='small') ax1.set_xlabel('Iterations') ax1.set_ylabel('Calculated value of Pi') ax1.plot(pi_values, 'k') ax1.plot([math.pi for entry in axis_values], 'r') ax2.set_ylabel('Absolute error') ax2.set_xticklabels([str(entry) for entry in axis_values[::len(axis_values)/5]], rotation=30, fontsize='small') ax2.set_xlabel('Iterations') ax2.plot(absolute_error_values, 'k', label="Total Error") ax3.set_ylabel('Absolute percentage error (%)') ax3.set_xticklabels([str(entry) for entry in axis_values[::len(axis_values)/5]], rotation=30, fontsize='small') ax3.set_xlabel('Iterations') ax3.plot(percent_error_values, 'k', label="Percent Error") ax4.set_ylabel('Absolute improvement per iteration') ax4.set_xticklabels([str(entry) for entry in axis_values[::len(axis_values)/5]], rotation=30, fontsize='small') ax4.set_xlabel('Iterations') ax4.plot(improvement_per_iteration, 'k', label="Absolute change") plt.savefig('pi_calculation.png') plt.show()

Hi all,

Bit of a different blog coming up – in a previous post I used Markov Clustering and said I’d write a follow-up post on what it was and why you might want to use it. Well, here I am. And here you are. So let’s begin:

In the simplest explanation, imagine an island. The island is connected to a whole bunch of other islands by bridges. The bridges are made out of bricks. Nothing nasty so far – apart from the leader of all the islands. They’re a ‘man versus superman’, ‘survival of the fittest’ sort and so one day the issue a proclamation. “Every day a brick will be taken from every bridge connected to your island and the bricks will be reapportioned on your island back to the bridges, in proportion to the remaining number of bricks in the bridge.”

At first, nobody is especially worried – each day, a brick disappears and then reappears on a different bridge on the island. Some of the islands notice some bridges getting three or four bricks back each day. Some hardly ever seem to see a brick added back to their bridge. Can you see where this will lead in 1000 years? In time, some of the bridges (the smallest ones to start off with) fall apart and end up with no bricks at all. If this is the only way between two islands, these islands become cut off entirely from each other.

This is basically Markov clustering.

For a more mathematical explanation:

Let’s start with a (transition) matrix:

import numpy as np
transition_matrix = np.matrix([[0,0.97,0.5],[0.2,0,0.5],[0.8,0.03,0]])

$latex Transition Matrix = begin{matrix}
0 & 0.97 & 0.5 \
0.2 & 0 & 0.5 \
0.8 & 0.03 & 0
end{matrix} $

In the above ‘islands’ picture those numbers represent the number of bricks in the bridges between islands A, B and C. In the random-walk interpretation, those are the probabilities that you’ll end up at each node as the number of random walks tends to infinity. In my previous post on house prices, I used a correlation matrix.

First things first – I’m going to stick a one in each of the identity areas. If you’re interested in why that is, have a read around self-loops and, even better, try this out both with and without self-loops. It sort of fits in nicely with the above islands picture but that’s more of a fluke than anything else – there’s always the strongest bridge possible between an island and itself. The land. Anyway…

np.fill_diagonal(transition_matrix, 1)

$latex Transition Matrix = begin{matrix}
1 & 0.97 & 0.5 \
0.2 & 1 & 0.5 \
0.8 & 0.03 & 1
end{matrix} $

Now let’s normalize – make sure each column sums to 1:

transition_matrix = transition_matrix/np.sum(transition_matrix, axis=0)

$latex Transition Matrix = begin{matrix}
0.5 & 0.485 & 0.25 \
0.1 & 0.5 & 0.25 \
0.4 & 0.015 & 0.5
end{matrix} $

Now we perform an expansion step – that is, we raise the matrix to a power (I’ll use two – you can change this parameter – in the ‘random-walk’ picture this can be thought of as varying how far a person can walk from their original island).

transition_matrix = np.linalg.matrix_power(transition_matrix, 2)

$latex Expanded Matrix = begin{matrix}
0.3985 & 0.48875 & 0.37125 \
0.2 & 0.30225 & 0.275 \
0.4015 & 0.209 & 0.35375
end{matrix} $

Then we perform the inflation step – This involves multiplying each element in the matrix by itself (to a power) and then normalizing on column again. Again, I’ll be using two as a power – increasing this leads to a greater number of smaller clusters:

for entry in np.nditer(transition_matrix, op_flags=['readwrite']):
    entry[...] = entry ** 2

$latex Inflated Matrix = begin{matrix}
0.15880225 & 0.23887556 & 0.13782656 \
0.04 & 0.09135506 & 0.075625 \
0.16120225 & 0.043681 & 0.12513906
end{matrix} $

Finally (for this iteration) – we’ll normalize by row.

transition_matrix = transition_matrix/np.sum(transition_matrix, axis=0)

$latex Normalized Matrix = begin{matrix}
0.44111185 & 0.63885664 & 0.40705959 \
0.11110972 & 0.24432195 & 0.22335232 \
0.44777843 & 0.11682141 & 0.36958809
end{matrix} $

And it’s basically that simple. Now all we need to do is rinse and repeat the expansion, inflation and normalization until we hit a stable(ish) solution i.e.

$latex
Normalized Matrix_{n+1} – Normalized Matrix_n < epsilon $
for some small epsilon.

Once we’ve done this (with this particular matrix) we should see something like this:

$latex Final Matrix = begin{matrix}
1 & 1 & 1 \
0 & 0 & 0 \
0 & 0 & 0
end{matrix} $

Doesn’t look like a brilliant result be we only started with a tiny matrix. In this case we have all three nodes belonging to one cluster. The first node (the first row) is the ‘attractor’ – as it has values in its row it is attracting itself and the second and third row (the columns). If we were to end up with the following result (from a given initial matrix):

$latex Final Matrix = begin{matrix}
1 & 0 & 1 & 0 & 0\
0 & 1 & 0 & 1 & 0\
0 & 0 & 0 & 0 & 0\
0 & 0 & 0 & 0 & 0\
0 & 0 & 0 & 0 & 1\
end{matrix} $

This basically says that we have three clusters {1,3} (with 1 as the attractor), {2,4} (with 2 as the attractor) and {5} on its lonesome.

Instead of letting you piece all that together here’s the code for Markov Clustering in Python:

import numpy as np
import math
## How far you'd like your random-walkers to go (bigger number -> more walking)
EXPANSION_POWER = 2
## How tightly clustered you'd like your final picture to be (bigger number -> more clusters)
INFLATION_POWER = 2
## If you can manage 100 iterations then do so - otherwise, check you've hit a stable end-point.
ITERATION_COUNT = 100
def normalize(matrix):
    return matrix/np.sum(matrix, axis=0)

def expand(matrix, power)
    return np.linalg.matrix_power(matrix, power)

def inflate(matrix, power):
    for entry in np.nditer(transition_matrix, op_flags=['readwrite']):
        entry[...] = math.pow(entry, power)
    return matrix

def run(matrix):
    np.fill_diagonal(matrix, 1)
    matrix = normalize(matrix)
    for _ in range(ITERATION_COUNT):
        matrix = normalize(inflate(expand(matrix, EXPANSION_POWER), INFLATION_POWER))
    return matrix

If you were in the mood to improve it you could write something that’d check for convergence for you and terminate once you’d achieved a stable solution. You could also write a function to perform the cluster interpretation for you.

As you were.

Category: Mathematics

Estimating Pi using the Monte Carlo Method in Python

Markov Clustering – What is it and why use it?

Bloggers

Recent Posts

Categories