
I have chemistry data for rocks, each with an assigned age interval. I want to plot and analyse these samples together. My problem is how to deal with a single measured value that is specified over a time interval, when the intervals of different samples overlap.

e.g.:

Data1: Silica = $76$, age = $2200$ - $2400$ Mya

Data2: Silica = $64$, age = $2100$ - $2300$ Mya

If I want to analyse data within an age range of, e.g., $2100$ - $2250$ Mya, how would I go about doing this? Do I assign the value across the entire range, or do I spread it across the range with some kind of normal distribution? I believe the age ranges are confidence limits on whatever age analysis was done (assume $95\%$). If I assign the value across the entire interval, Data1 will be 'counted' twice in any analysis, e.g. if I look at averages for $2200$ - $2300$ Mya and then for $2300$ - $2400$ Mya. If I treat it as a distribution, though, the value I plot will be lower than the measured value. Is this the correct way to deal with it?

A diagram to maybe help explain what I mean: overlapping time ranges, with a single value for each range. Do I apply a weight/distribution to the value over the range like below, or do I apply the measured value across the entire range?

  Silica↑     _
            _   _
          _       _
        |-----------|
      2400         2200
                    _
                  _   _
                _       _
              |------------|
             2300         2100
  Time→
  • 0
    I'm having a bit of trouble understanding the problem. You are trying to make a plot of silica vs. time, and if you just had data points for time instead of intervals, you would presumably use some kind of binning/smoothing to make the plot? But instead you have intervals? It would be good to have an explanation of what you mean by 'analyze between 2100-2250'. 2017-01-05
  • 0
    Well, as the simplest example of what I'm worried about: say I want to know the average silica between 2400 and 2250 Mya. In my diagram, the second range only barely crosses into this region, so I would imagine I should 'weight' the value for this point less than the first (as the age ranges are 95% confidence intervals for its actual age). 2017-01-06
  • 0
    Yeah, that seems right. Sorry, but I'm unsure of the proper procedure. My wild guess would be to use a Gaussian kernel of the appropriate width (so that 95% of its area falls within the 95% CI) for each point, and then use the area of the Gaussian that falls inside the range as the weight in a weighted average. Might be better to ask over at Cross Validated. 2017-01-06
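
The Gaussian-weight idea from the last comment could be sketched roughly as below. This is only an illustration under stated assumptions, not an established procedure: each age interval is assumed to be a symmetric 95% interval of a normal distribution centred on its midpoint, and the names `window_weight` and `weighted_mean` are made up for this example. The two samples are Data1 and Data2 from the question, and the window is the 2250 - 2400 Mya example from the comments.

```python
from math import erf, sqrt

Z95 = 1.959964  # half-width of a two-sided 95% normal interval, in standard deviations


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of Normal(mu, sigma) at x."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def window_weight(age_lo, age_hi, win_lo, win_hi):
    """Probability that the true age falls inside [win_lo, win_hi].

    ASSUMPTION: [age_lo, age_hi] is a symmetric 95% interval of a normal
    distribution, so the mean is the midpoint and sigma = half-width / 1.96.
    """
    mu = 0.5 * (age_lo + age_hi)
    sigma = (age_hi - age_lo) / (2.0 * Z95)
    return normal_cdf(win_hi, mu, sigma) - normal_cdf(win_lo, mu, sigma)


def weighted_mean(samples, win_lo, win_hi):
    """Average the measured values, weighting each by its probability of
    lying in the window. `samples` holds (value, age_lo, age_hi) tuples."""
    weights = [window_weight(lo, hi, win_lo, win_hi) for _, lo, hi in samples]
    total = sum(weights)
    if total == 0.0:
        return float("nan"), weights  # no sample plausibly lies in the window
    mean = sum(w * v for w, (v, _, _) in zip(weights, samples)) / total
    return mean, weights


# Data1 and Data2 from the question: (silica, youngest age, oldest age) in Mya
samples = [(76.0, 2200.0, 2400.0), (64.0, 2100.0, 2300.0)]
mean, weights = weighted_mean(samples, 2250.0, 2400.0)
print(f"weights = {weights[0]:.3f}, {weights[1]:.3f}; weighted silica = {mean:.2f}")
```

Because the weights are normalised by their sum, the result always stays between the measured values rather than being scaled down by the distribution, which may address the worry about plotting a value lower than the one measured. A sample whose interval straddles two windows contributes partially to each instead of being counted in full twice.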
