Saturday, July 17, 2010

Bayes Election Prediction R Code, Gelman, Tokdar

Attached is the R code that calculates a prediction for the 1992 presidential election, based on methods from Andrew Gelman's book Bayesian Data Analysis. The R code itself was written by Dr. Surya Tokdar; however, the links at CMU that housed this and his other lecture-related materials were all gone. I salvaged what I could from the Google cache and removed all plotting and graphics-related portions of the code, leaving only the MCMC Gibbs calculations intact. The data was also missing at the CMU link, so I recreated both data files based on Andrew Gelman's original presidential.asc file. Still, the R chol() call (for Cholesky decomposition) gave errors, so I replaced all NA values with 0's. With this fix the code finally worked, and its results seem correct -- a Democratic win is predicted for most states in 1992, which was actually the case. The code and data files can be found below. Let me know of your questions and comments.
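As a minimal illustration of that NA-to-zero workaround -- in Python/NumPy rather than the original R, and with a made-up 2x2 covariance matrix, not the election data:

import numpy as np

# a covariance matrix whose off-diagonal entries were lost (NA/NaN)
cov = np.array([[2.0, np.nan],
                [np.nan, 3.0]])

# replace NaN with 0, mirroring the NA -> 0 fix applied to the data
cov = np.nan_to_num(cov)

# chol() no longer errors: the zero-filled matrix is positive definite
L = np.linalg.cholesky(cov)
print(np.allclose(L @ L.T, cov))  # True

Note that zero-filling is only a pragmatic patch; it silently treats the missing covariances as zero correlation, so the resulting matrix must still be positive definite for the decomposition to succeed.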

Code

Monday, July 5, 2010

Change Point Analysis using MCMC Gibbs Sampling on Coal Mining Data (in Python)

The code is here. This analysis is performed on British coal mining accident data, which is included in the zip file as well. The function coal() performs change point analysis using MCMC Gibbs sampling, modeling the data with two Poisson distributions. Change point analysis finds the point k at which the second Poisson distribution, instead of the first, comes into effect, which indicates a change in regulations, etc., at that point in time. I wrote the Python script by looking at Peter Neil's R code from Manchester University.
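The two-Poisson Gibbs sampler can be sketched roughly as follows. This is a generic standalone version, not the coal() function from the zip file: the name coal_gibbs, the Gamma(1, 1) priors, and the uniform prior on k are my own choices for illustration.

import random
import math

def coal_gibbs(y, iters=2000, a=1.0, b=1.0):
    """Gibbs sampler for a two-Poisson change-point model:
    y[0:k] ~ Poisson(lam1), y[k:] ~ Poisson(lam2),
    with Gamma(a, b) priors on both rates and a uniform prior on k."""
    n = len(y)
    k = n // 2  # arbitrary starting change point
    samples = []
    for _ in range(iters):
        # conjugate Gamma draws for the two Poisson rates given k
        lam1 = random.gammavariate(a + sum(y[:k]), 1.0 / (b + k))
        lam2 = random.gammavariate(a + sum(y[k:]), 1.0 / (b + n - k))
        # log conditional probability of each candidate change point j
        logp = [sum(y[:j]) * math.log(lam1) - j * lam1
                + sum(y[j:]) * math.log(lam2) - (n - j) * lam2
                for j in range(1, n)]
        # sample k from the normalized conditional (stabilized by max-shift)
        m = max(logp)
        w = [math.exp(lp - m) for lp in logp]
        u = random.uniform(0, sum(w))
        for j, wj in enumerate(w, start=1):
            if u < wj:
                k = j
                break
            u -= wj
        else:
            k = n - 1  # floating-point guard
        samples.append((k, lam1, lam2))
    return samples

On synthetic data with a clear break -- say twenty counts drawn around rate 5 followed by twenty around rate 1 -- the sampled values of k concentrate near the true change point.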

Also included is the same analysis performed using BUGS (JAGS). This code is adapted from the book Bayesian Computation with R.

Sampling Indices From Weight Vector

Here is the code:

import random

def w_choice(lst):
    # draw a uniform number in [0, 1) and walk through the weights,
    # subtracting each weight until the draw falls inside one
    n = random.uniform(0, 1)
    for item, weight in enumerate(lst):
        if n < weight:
            break
        n = n - weight
    return item

# testing
prob = [0.1, 0.2, 0.5, 0.2]
print(w_choice(prob))
This function returns an index into the sequence, chosen with probability equal to the weight at that index. We modified this code here to get the function above.

The code above assumes a normalized weight vector, that is, all probability values in the vector should add up to one. If the parameter passed into w_choice is not normalized, the normalization can be performed with a single line of Python code at the beginning of w_choice (note that lst / sum(lst) only works on NumPy arrays; for a plain Python list, use a comprehension):
lst = [w / float(sum(lst)) for w in lst]
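For example, pulling that line out into a small helper (the name normalize is mine, for illustration):

def normalize(lst):
    # divide each weight by the total so the vector sums to one
    total = float(sum(lst))
    return [w / total for w in lst]

weights = [1, 2, 5, 2]
print(normalize(weights))  # [0.1, 0.2, 0.5, 0.2]

Computing the total once outside the comprehension also avoids re-summing the list for every element.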