$\begingroup$

The shortest version of the question:

How can I find which distribution the inter-arrival times follow, given only the arrival times?


Question in more depth

I did my best to explain the problem briefly but provide enough detail to make it easy to understand. So, please let me know if something needs more explanation.

I am working on modeling arrival times. I have some devices that collect information about passing objects and record the arrival times aggregated per minute. For example:

2 objects arrived at t1, t1=60 sec

3 objects arrived at t2, t2=120 sec

I am interested in which distribution the inter-arrival times (not the arrival times) of these objects follow, and I am having difficulty finding the best approach.


Proposed solution

I am considering these three approaches and would appreciate any help:

  1. Fit the collected arrival times and see which distribution they follow. This gives the best distribution for the arrival times, not for the inter-arrival times, so I ruled it out.

  2. Use a uniform distribution to spread the arrivals recorded at each time stamp randomly over the interval between the two consecutive recording times, sort them, repeat this for all recording times, and then take consecutive differences to obtain the inter-arrival times. Then left-censor any inter-arrival time below 60 sec to neutralize the effect of the manipulation (distributing the arrivals uniformly), and fit the censored inter-arrival times using fitdistrplus in R. For example,

randomnumbers <- numeric(0)  # simulated arrival times, in seconds

for t1 (2 arrivals in the interval from 0 to t1),

randomnumbers <- c(randomnumbers, sort(runif(2, min = 0, max = t1)))

for t2 (3 arrivals in the interval from t1 to t2),

randomnumbers <- c(randomnumbers, sort(runif(3, min = t1, max = t2)))

and then to obtain the inter-arrival times:

# differences between consecutive simulated arrival times
diffvect <- diff(randomnumbers)

Then left-censor anything less than 60 sec, fit the candidate distributions, and pick the one with the minimum AIC (Akaike Information Criterion).
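A minimal sketch of approach 2 for an arbitrary vector of per-minute counts, in base R. The names `counts`, `bin_width`, and `spread_bin` are made up for illustration, and the count values are hypothetical; the fitting step is left out here.

```r
# Hypothetical input: number of objects recorded in each one-minute bin.
counts <- c(2, 3, 0, 4)
bin_width <- 60  # aggregation level in seconds

set.seed(1)  # for a reproducible illustration only

# Spread the n arrivals of bin k uniformly over that bin and sort them.
spread_bin <- function(k, n) {
  sort(runif(n, min = (k - 1) * bin_width, max = k * bin_width))
}
arrival_times <- unlist(lapply(seq_along(counts),
                               function(k) spread_bin(k, counts[k])))

# Bins are processed in time order, so the vector is already sorted and
# consecutive differences are the simulated inter-arrival times.
inter_arrivals <- diff(arrival_times)
```

Because each bin is appended in time order, no `abs()` is needed on the differences, and empty bins simply contribute no arrivals.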

I am not a fan of this approach: if I have a large number of arrivals per minute (say 30, which means an object arrives about every 2 sec) and I do the left censoring, I will throw out most of my inter-arrival times (stored in diffvect). Why? Because the difference between two consecutive arrivals is about 2 sec for most objects in this case, which is less than 60. So this approach might only work when the differences between consecutive arrivals are greater than or equal to the left-censoring level, which is the aggregation level (60 sec in this case).
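The concern can be checked numerically. The sketch below assumes a hypothetical 20 minutes with 30 arrivals each, spreads them uniformly as in approach 2, and computes the share of inter-arrival times below the censoring level. It also builds a `left`/`right` data frame, which, as far as I know, is the layout `fitdistcens` in fitdistrplus expects for censored data; treat that as an assumption to check against the package documentation.

```r
set.seed(42)
bin_width <- 60
counts <- rep(30, 20)  # hypothetical: 30 objects in each of 20 minutes

arrival_times <- unlist(lapply(seq_along(counts), function(k) {
  sort(runif(counts[k], min = (k - 1) * bin_width, max = k * bin_width))
}))
inter_arrivals <- diff(arrival_times)

# Share of inter-arrival times that fall below the censoring level.
share_censored <- mean(inter_arrivals < bin_width)

# Left-censored observations: left = NA, right = censoring level.
# Uncensored observations: left = right = observed value.
cens <- inter_arrivals < bin_width
censdata <- data.frame(
  left  = ifelse(cens, NA_real_, inter_arrivals),
  right = ifelse(cens, bin_width, inter_arrivals)
)
```

With this many arrivals per minute, nearly every row ends up censored, so the fit would have almost nothing but the censoring level to work with.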

  3. This is another version of approach 2. If I am interested in whether the best distribution is gamma or lognormal, then instead of using the uniform distribution as in approach 2, I can generate random numbers from a gamma distribution, sort them, add them up cumulatively to form the arrival times, and then take differences and look at the output of the fitting (AIC), as in approach 2. Then I repeat the process using lognormal instead of gamma. Note: I can calculate the parameters of each distribution by setting the variance to the typical variance in my data and the mean to 60 / (number of recorded arrivals per minute), i.e. the mean inter-arrival time in seconds.
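One way to turn a target mean and variance into distribution parameters is moment matching. The sketch below assumes a hypothetical mean `m` and variance `v` for the inter-arrival times; `calc_sdlog` and `calc_meanlog` are names introduced here for the lognormal case, alongside the `calc_shape` and `calc_scale` used in the example.

```r
# Hypothetical targets for the inter-arrival times.
m <- 2    # assumed mean, in seconds
v <- 1.5  # assumed "typical" variance, in seconds squared

# Gamma: mean = shape * scale, variance = shape * scale^2
calc_shape <- m^2 / v
calc_scale <- v / m

# Lognormal: mean = exp(meanlog + sdlog^2 / 2),
# variance = (exp(sdlog^2) - 1) * mean^2
calc_sdlog   <- sqrt(log(1 + v / m^2))
calc_meanlog <- log(m) - calc_sdlog^2 / 2
```

These values can then be passed to `rgamma(n, shape = calc_shape, scale = calc_scale)` and `rlnorm(n, meanlog = calc_meanlog, sdlog = calc_sdlog)`.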

For example

randomnumbers <- numeric(0)  # simulated arrival times, in seconds

for t1,

# draw 2 inter-arrival times from the gamma candidate and sort them
randomnumbers_1 <- sort(rgamma(2, shape = calc_shape, scale = calc_scale))
# cumulative sums turn the inter-arrival times into arrival times
Arrival_randomnumbers <- cumsum(randomnumbers_1)

randomnumbers <- c(randomnumbers, Arrival_randomnumbers)

for t2,

repeat the above step, substituting the appropriate numbers and parameters.

then

# differences between consecutive simulated arrival times
diffvect <- abs(diff(randomnumbers))

and then check:

if the minimum AIC belongs to lognormal and the data were generated with lognormal, then lognormal is the correct distribution;

if the minimum AIC belongs to gamma and the data were generated with gamma, then gamma is the correct distribution.
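The AIC comparison itself can be sketched in base R without fitdistrplus. This is a stand-in under stated assumptions, not the fitdistrplus workflow: the data `x` are hypothetical (simulated from a gamma distribution), no censoring is applied, and both candidates are fitted by maximum likelihood with two parameters each.

```r
set.seed(7)
# Hypothetical inter-arrival times; in practice this would be diffvect.
x <- rgamma(2000, shape = 2, scale = 1)

# Lognormal: the maximum-likelihood estimates have a closed form on the log scale.
meanlog_hat  <- mean(log(x))
sdlog_hat    <- sqrt(mean((log(x) - meanlog_hat)^2))
loglik_lnorm <- sum(dlnorm(x, meanlog_hat, sdlog_hat, log = TRUE))

# Gamma: numerical maximum likelihood over log-parameters (keeps them positive).
nll_gamma <- function(p) {
  -sum(dgamma(x, shape = exp(p[1]), scale = exp(p[2]), log = TRUE))
}
start <- log(c(mean(x)^2 / var(x), var(x) / mean(x)))  # moment-matching start
fit_gamma <- optim(start, nll_gamma)
loglik_gamma <- -fit_gamma$value

# AIC = 2k - 2 * log-likelihood, with k = 2 parameters for both candidates.
aic <- c(gamma     = 2 * 2 - 2 * loglik_gamma,
         lognormal = 2 * 2 - 2 * loglik_lnorm)
best <- names(which.min(aic))
```

`best` holds the name of the candidate with the smaller AIC; the same comparison extends to other candidates by adding their log-likelihoods to `aic`.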

  • 0
    Should I move this question to the 'Cross Validated' forum, or is this the right forum? Thanks. 2017-02-17
