If shuffle-playing playlist ×100 resulted in [10 13 10 3 2 2] different songs being repeated [1 2 3 4 5 6] times, what is the estimate for the total number of songs? (assuming shuffle play was completely random)
Update: (R code)
k <- 50 # k number of songs on the disk indexed 1:k n <- 100 # n number of random song selections m <- 20 # m number of repeat experiments colnum <- 10 mat <- matrix(data=NA,nrow=m,ncol=colnum) df <- as.data.frame(mat) for(i in 1:m){ played <- 1+floor(k*runif(n)) # actual song indices (1:k) selected freq <- sapply(1:k,function(x){sum(played==x)}) # = number of times song with index x is being played histo <- sapply(1:colnum,function(x){sum(freq==x)}); for(j in 1:colnum){ df[i,j] <- histo[j] } } df
Resulting in: e.g. 20 distributions (V1=number of single plays, V2=number of double plays, etc):
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 1 15 13 11 1 2 2 0 0 0 0 2 15 12 10 1 4 0 1 0 0 0 3 12 14 7 6 3 0 0 0 0 0 4 17 16 6 4 2 0 1 0 0 0 5 17 10 12 5 0 0 1 0 0 0 6 13 15 11 6 0 0 0 0 0 0 7 10 14 9 3 2 1 1 0 0 0 8 12 17 5 6 3 0 0 0 0 0 9 9 19 8 3 1 2 0 0 0 0 10 13 9 11 6 1 0 1 0 0 0 11 16 9 12 5 2 0 0 0 0 0 12 15 9 11 6 2 0 0 0 0 0 13 19 9 7 4 4 1 0 0 0 0 14 17 11 4 7 3 1 0 0 0 0 15 11 20 8 1 3 1 0 0 0 0 16 14 12 10 5 0 2 0 0 0 0 17 9 12 8 7 3 0 0 0 0 0 18 10 15 9 4 2 0 1 0 0 0 19 14 11 12 7 0 0 0 0 0 0 20 16 14 11 3 1 1 0 0 0 0
Now I need to get from here to the Poisson modelling--my R is a bit rusty (?lmer
)...--Any help would be appreciated...
Attempted Poisson modelling: disappointing fit?!
plot(1:colnum,df[1,1:colnum],ylim=c(0,30), type="l",xlab="repeats",ylab="count") for(i in 1:m){ clr <- rainbow(m)[i] lines(1:colnum,df[i,1:colnum],type="l",col=clr) points(1:colnum,df[i,1:colnum],col=clr) } df.lambda=data.frame(lambda=seq(1,5,0.1),ssq=c(NA));df.lambda for(ii in 1:dim(df.lambda)[1]){ l <- df.lambda$lambda[ii] ssq <- 0 for(i in 1:20){ for(j in 1:10){ ssq <- ssq + (df[i,j] - n*dpois(j,l))^2 } } print(ssq) df.lambda$lambda[ii] <- l df.lambda$ssq[ii] <- ssq } df.lambda lambda.est <- df.lambda$lambda[which.min(df.lambda$ssq)] lambda.est # 2.4 points(x <- 1:10, n*dpois(1:10,lambda.est),type="l",lwd=2) 100*dpois(1:10,3) n/lambda.est
the estimated lambda stays around 2.3, with an n estimate of around 43; the fitted curve seems very discrepant, and seems to worsen with rising n !? Doesn't this have to do with the fact, that our repeats are different from the 'classical' Poisson distributions: it's not just ONE event that repeats itself x number of times, but the sum of repeats of different items (songs)?!