
I'm tracking data on a backup job that runs nightly on our server and using the historical data to predict data and job time growth. I have the following three data points for most of the records: Data backed up (in Bytes), Total Job Time (hh:mm:ss), and Transfer Speed (in Bytes/Minute).

Total Job Time does not equal (Data Backed Up)/(Transfer Speed), because there is necessary overhead while the job starts, transitions, and completes. I have created a fourth data point, Work Time, which is the time spent actually transferring data, calculated with the above formula, but it does not appear to relate directly or consistently to the Total Job Time. Server use, network latency, and other resource-bound factors all affect the relationship.
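
To make the Work Time calculation concrete, here is a minimal Python sketch using the first row of the sample data further down. The variable and function names are illustrative only, not part of any existing tool.

```python
# Sketch: Work Time and implied overhead for one backup record.
# The three values are the first row of the sample data in the question.
data_bytes = 383542111073           # Data backed up (bytes)
total_time_min = 381.22             # Total Job Time (minutes)
speed_bytes_per_min = 1273000000    # Transfer Speed (bytes/minute)


def work_time_minutes(data_bytes, speed_bytes_per_min):
    """Time spent actually transferring data: Data / Speed."""
    return data_bytes / speed_bytes_per_min


work_time_min = work_time_minutes(data_bytes, speed_bytes_per_min)

# Whatever is left of the Total Job Time is overhead (start-up,
# transitions, completion, contention on the server, and so on).
overhead_min = total_time_min - work_time_min

print(f"Work Time: {work_time_min:.2f} min")
print(f"Overhead:  {overhead_min:.2f} min")
```

For this row the transfer itself accounts for about 301.29 minutes, leaving roughly 79.93 minutes of overhead, which is why Data/Speed alone does not reproduce the Total Job Time.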

For some of the older records I am missing the Transfer Speed data and I would like to know what formula I need to apply to the other two data points (Data and Job Time) to make a reasonable guess as to what the Transfer Speed might have been.

Below is a representative sample of the data. I've converted all the values to bytes and minutes for ease of calculation:

Data(bytes)     Time(min)  Speed(bytes/min)
383542111073    381.22     1273000000
383676323632    382.72     1267000000
383875888842    378.55     1283000000
384088122257    382.15     1268000000
384247013724    378.40     1282000000
384457413287    378.68     1285000000
384652849842    381.42     1272000000
384973213219    380.15     1278000000
385188544442    380.13     1280000000
385504302010    377.80     1291000000
385628091021    377.97     1289000000
386061561686    384.77     1264000000
386853481337    383.98     1270000000
387117610212    381.90     1278000000
387679368117    385.80     1262000000
388015187994    386.50     1261000000
388240874769    385.20     1265000000
391312996783    383.15     1282000000
392497055973    384.73     1280000000
392877252269    387.13     1269000000
392988498970    386.52     1274000000
393236837467    385.33     1279000000
392386489223    366.32     1363000000
392626640464    370.68     1341000000
392772670262    366.68     1363000000
391049505322    366.60     1360000000
391308127859    365.62     1362000000
391683916463    365.53     1367000000
391868818660    367.87     1355000000
392029291293    366.82     1356000000
392028073259    370.40     1341000000
392143518314    366.07     1365000000

For any given combination of Data and Time, I'd like to be able to guess Speed.

UPDATE for comment regarding graphing:

I have graphed the data, but, probably because I forgot most of my math in order to focus on technology as a career, I'm not exactly sure how the shape of the graph will point me towards a particular function. The graph of the entire data set is below. Note that the MB/Min and Work Time data are missing from mid-July and earlier.

Part of the problem with my (meager, unpracticed) thoughts on which formula is best is that, a month into the data collection, I changed the time at which the backup occurred. Moving it to a period when fewer things were running on the server lowered the resulting job time by what appears to be 12 minutes. You can see this in the data set above: the last 10 time values are clustered around the high 360s, while the points before them are closer to 380.

[Figure: graph of the entire data set]

  • Thanks @ShreevatsaR for the SO link. I'm looking through it right now. I had needlessly complicated the issue in my mind, assuming there may have been more than one "x" I'd have been solving for. (2011-09-20)

1 Answer


Having copied the data from your post, a 3D plot suggests that the data points lie on a hyperplane, which in turn suggests a fitting model. I am using Mathematica:

[image not reproduced]
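
Written out, the model fitted by the `FindFit` calls in this answer is linear in the logarithms of the three columns, which is equivalent to a power law. The identification of $x_1$ with Data, $x_2$ with Time and $x_3$ with Speed is an assumption based on the column order of the posted sample:

$$\log x_3 = a\,\log x_1 + b\,\log x_2 + c \quad\Longleftrightarrow\quad x_3 = e^{c}\,x_1^{a}\,x_2^{b}$$

The fit determines $a$, $b$ and $c$; a negative $b$ shows up as a division by a power of $x_2$ in the output.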

Added: Per the OP's additional question, the model does change a little with the 7th record removed:

In[37]:= x3 == Exp[c] x1^a x2^b /.
  FindFit[Log[data], a logx1 + b logx2 + c, {a, b, c}, {logx1, logx2}]

Out[37]= x3 == (1444.8 x1^0.800888)/x2^1.29128

In[38]:= x3 == Exp[c] x1^a x2^b /.
  FindFit[Delete[Log[data], 7],
   a logx1 + b logx2 + c, {a, b, c}, {logx1, logx2}]

Out[38]= x3 == (1581.69 x1^0.797488)/x2^1.29123
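
For use outside Mathematica (in a spreadsheet or a script), the first fitted formula can be evaluated directly. The following is a minimal Python sketch, not part of the original answer: the coefficients are copied from `Out[37]`, the function name `estimate_speed` is hypothetical, and it assumes `x1` is Data in bytes, `x2` is Total Job Time in minutes and `x3` is Speed in bytes/minute.

```python
# Sketch: evaluate the fitted power-law model from Out[37].
# Coefficients are copied from the Mathematica output above.
PREFACTOR = 1444.8
DATA_EXPONENT = 0.800888
TIME_EXPONENT = 1.29128


def estimate_speed(data_bytes, time_min):
    """Estimated Transfer Speed (bytes/min) from Data (bytes) and Time (min)."""
    return PREFACTOR * data_bytes ** DATA_EXPONENT / time_min ** TIME_EXPONENT


# Two rows from the question's sample: (Data, Time, recorded Speed).
rows = [
    (383542111073, 381.22, 1273000000),
    (392386489223, 366.32, 1363000000),
]

for data_bytes, time_min, recorded in rows:
    estimate = estimate_speed(data_bytes, time_min)
    error_pct = 100 * (estimate - recorded) / recorded
    print(f"estimated {estimate:.4g}, recorded {recorded:.4g}, "
          f"error {error_pct:+.2f}%")
```

For these two rows the estimate lands well within 1% of the recorded speed. The fit is purely empirical, so it should only be trusted for data sizes and job times close to the sampled range.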
  • Ok, I was worried because when I translated that into formulas in my spreadsheet, the resulting values were either 6 or 7 for the entire data set. But I must've fat-fingered something, because the second time I tried it, it worked, and the resulting values appear to be within acceptable ranges for the rest of the data. Thank you very much Sasha and @ShreevatsaR. ShreevatsaR, if you would post your comments as an answer, I'd be happy to give you an upvote for your assistance. (2011-09-20)