-3

While learning R, I ran into the following problem:

There are two places where you can place your ads. The first one (A) claims that it attracts visitors more quickly than the second one (B). The given data represent the hours needed to attract 1000 visitors:

A: 3,6,6,4,4,5...

B: 6,5,4,4,5,6...

This means that A needs 3 hours to attract 1000 visitors, then 6 hours to attract the next 1000, and so on. Can we accept A's claim as true?

I do not need the specific answer so much as an idea of how to solve this in a more general case and how to write the code in R.

  • 0
    Are you willing to assume data are normal? You want a 2-sample Welch t-test? (2017-01-20)
  • 0
    The whole data is not normally distributed. (2017-01-21)
  • 0
    If not _too_ far from normal, with large enough sample sizes Welch t may still be OK. I showed nonparametric Wilcoxon SR. Are data exponential waiting times? Question a bit vague as it stands. Got a plot of the data? Can't tell much from brief bits posted. Please give me a clue what you're looking for. Passably good at R; mind reading not so much. (2017-01-21)
  • 1
    You have two questions. The first is for an algorithm to test the hypothesis and the second is for an R implementation of that algorithm. For the first question, you haven't provided enough data to design an algorithm to confidently test the hypothesis, IMO. If you do develop a good algorithm, expressing it in just about any language should be easy. (2017-01-21)

1 Answer

1

If you want a 2-sample Welch t-test, here it is.

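# Simulated example data (no set.seed, so results vary from run to run)
a = rnorm(50, 6, .9)   # group A: n = 50, mean 6, SD 0.9
b = rnorm(55, 5, 1)    # group B: n = 55, mean 5, SD 1
t.test(a,b)            # two-sided Welch test by default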

        Welch Two Sample t-test

data:  a and b 
t = 5.046, df = 102.573, p-value = 1.959e-06
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 0.6096795 1.3993312 
sample estimates:
mean of x mean of y 
 5.867697  4.863191 

Two-sided and Welch are the defaults. Use var.equal = TRUE to pool the variances, and alternative = "less" or alternative = "greater" for a one-sided alternative.
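Since A's claim is directional (fewer hours per 1000 visitors), a one-sided test may be closer to what the question asks. Here is a minimal sketch, assuming the same kind of simulated data as above rather than the real data; the seed and the object names res and res_pooled are just for illustration.

```r
# Simulated stand-in data (assumption: NOT the asker's real data)
set.seed(2017)            # arbitrary seed, only so the run is repeatable
a <- rnorm(50, 6, 0.9)    # hypothetical hours per 1000 visitors, site A
b <- rnorm(55, 5, 1)      # hypothetical hours per 1000 visitors, site B

# H0: mean(A) >= mean(B)  vs  H1: mean(A) < mean(B), i.e. A is faster
res <- t.test(a, b, alternative = "less")   # Welch is still the default
res$p.value

# Pooled-variance version, only if equal variances seem reasonable
res_pooled <- t.test(a, b, alternative = "less", var.equal = TRUE)
res_pooled$p.value
```

With these simulated values A's mean is the larger one, so this one-sided test should not support A's claim; with real data the direction could of course differ.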

If data are in a 'stacked' format, then it's like this:

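all = c(a,b)                                 # all observations in one vector
gp = as.factor(rep(1:2, times = c(50,55)))   # group labels: 1 = A, 2 = B
t.test(all ~ gp)                             # formula interface: response ~ group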

        Welch Two Sample t-test

data:  all by gp 
t = 5.046, df = 102.573, p-value = 1.959e-06
...
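Whether the t-test is appropriate depends on how far the data are from normal. One rough way to look at that, sketched here on simulated data (an assumption, since the full data were not posted), is a normal Q-Q plot per group plus a Shapiro-Wilk test. The names sw_a and sw_b are just for illustration.

```r
# Simulated stand-in data (assumption: NOT the asker's real data)
set.seed(2017)            # arbitrary seed, only so the run is repeatable
a <- rnorm(50, 6, 0.9)
b <- rnorm(55, 5, 1)

# Normal Q-Q plots: points close to the line suggest approximate normality
op <- par(mfrow = c(1, 2))
qqnorm(a, main = "Normal Q-Q plot: A"); qqline(a)
qqnorm(b, main = "Normal Q-Q plot: B"); qqline(b)
par(op)

# Shapiro-Wilk test (H0: the sample comes from a normal distribution)
sw_a <- shapiro.test(a)
sw_b <- shapiro.test(b)
c(A = sw_a$p.value, B = sw_b$p.value)
```

A small p-value here is evidence against normality, but with moderately large samples the plots are usually the more informative part.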

If you question normality (and 50ish is still a 'small' sample to you), then you can use the Mann-Whitney-Wilcoxon rank sum test for a shift in location (often described as a difference in medians).

 wilcox.test(a, b)

        Wilcoxon rank sum test with continuity correction

data:  a and b 
W = 2091, p-value = 4.417e-06
alternative hypothesis: true location shift is not equal to 0 

wilcox.test(all ~ gp)

       Wilcoxon rank sum test with continuity correction

data:  all by gp 
W = 2091, p-value = 4.417e-06
alternative hypothesis: true location shift is not equal to 0 
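The rank sum test also accepts a one-sided alternative, and it can report an estimate of the location shift. A minimal sketch on simulated data (again an assumption, not the real data; wres is just an illustrative name):

```r
# Simulated stand-in data (assumption: NOT the asker's real data)
set.seed(2017)            # arbitrary seed, only so the run is repeatable
a <- rnorm(50, 6, 0.9)
b <- rnorm(55, 5, 1)

# H1: values in A tend to be smaller than values in B (A is faster)
wres <- wilcox.test(a, b, alternative = "less", conf.int = TRUE)
wres$p.value
wres$estimate   # estimated location shift, A minus B
wres$conf.int   # one-sided confidence interval for that shift
```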

Data display:

 stripchart(all ~ gp, pch="|", ylim=c(0,3), col=c("blue","maroon"))


plot(density(a), col="blue", lwd=2, xlim=c(0,10), ylim=c(0,.5), 
   xlab="A (blue), B (maroon)", main="Density Estimators for A and B")
lines(density(b), col="maroon", lwd=2)



If you had something like a permutation test in mind, then please give particulars; I'll look again in several hours.
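In the meantime, here is a minimal sketch of what a permutation test for the difference in means might look like. It assumes simulated data and a one-sided alternative (A needs fewer hours than B). The number of permutations and the names all_obs, n_perm, perm_diff and p_less are my own choices for illustration.

```r
# Simulated stand-in data (assumption: NOT the asker's real data)
set.seed(2017)            # arbitrary seed, only so the run is repeatable
a <- rnorm(50, 6, 0.9)
b <- rnorm(55, 5, 1)

all_obs  <- c(a, b)
n_a      <- length(a)
obs_diff <- mean(a) - mean(b)          # observed difference in means

n_perm <- 10000                        # number of random relabellings
perm_diff <- replicate(n_perm, {
  idx <- sample(length(all_obs), n_a)  # random set of indices labelled "A"
  mean(all_obs[idx]) - mean(all_obs[-idx])
})

# One-sided p-value for H1: mean(A) < mean(B);
# the +1 counts the observed labelling as one of the permutations
p_less <- (sum(perm_diff <= obs_diff) + 1) / (n_perm + 1)
p_less
```

The idea is that, if the two sites do not differ, the group labels are exchangeable, so the observed difference should look like a typical draw from perm_diff.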