I'm tracking data on a backup job that runs nightly on our server and using the historical data to predict data and job time growth. I have the following three data points for most of the records: Data backed up (in Bytes), Total Job Time (hh:mm:ss), and Transfer Speed (in Bytes/Minute).
Total Job Time does not equal (Data Backed Up) / (Transfer Speed) because there is necessary overhead for the job starting, transitioning, and completing. I have created a fourth data point, Work Time, recording the time spent actually transferring data, calculated with that formula, but it does not appear to relate directly or consistently to the Total Job Time. Server use, network latency, and other resource-bound factors all affect the relationship.
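To make that concrete with the first row of the sample below: Work Time = 383542111073 / 1273000000 ≈ 301.3 minutes, against a Total Job Time of 381.22 minutes, so roughly 80 minutes of that job is not accounted for by the transfer itself.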
For some of the older records I am missing the Transfer Speed data and I would like to know what formula I need to apply to the other two data points (Data and Job Time) to make a reasonable guess as to what the Transfer Speed might have been.
Below is a representative sample of the data; I've converted all the values to bytes and minutes for ease of calculation:
    Data(bytes)      Time(min)   Speed(bytes/min)
    383542111073     381.22      1273000000
    383676323632     382.72      1267000000
    383875888842     378.55      1283000000
    384088122257     382.15      1268000000
    384247013724     378.40      1282000000
    384457413287     378.68      1285000000
    384652849842     381.42      1272000000
    384973213219     380.15      1278000000
    385188544442     380.13      1280000000
    385504302010     377.80      1291000000
    385628091021     377.97      1289000000
    386061561686     384.77      1264000000
    386853481337     383.98      1270000000
    387117610212     381.90      1278000000
    387679368117     385.80      1262000000
    388015187994     386.50      1261000000
    388240874769     385.20      1265000000
    391312996783     383.15      1282000000
    392497055973     384.73      1280000000
    392877252269     387.13      1269000000
    392988498970     386.52      1274000000
    393236837467     385.33      1279000000
    392386489223     366.32      1363000000
    392626640464     370.68      1341000000
    392772670262     366.68      1363000000
    391049505322     366.60      1360000000
    391308127859     365.62      1362000000
    391683916463     365.53      1367000000
    391868818660     367.87      1355000000
    392029291293     366.82      1356000000
    392028073259     370.40      1341000000
    392143518314     366.07      1365000000
For any given combination of Data and Time, I'd like to be able to guess Speed.
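To show the kind of thing I'm hoping for, here is a rough sketch in Python (the variable names are my own) of one guess I've considered: assume the overhead (Time − Data/Speed) is roughly constant, estimate it from the records that have all three values, and then take Speed ≈ Data / (Time − overhead) for the incomplete ones. I don't know whether that assumption is sound.

    # Rough sketch: estimate a fixed per-job overhead from records that have all
    # three values, then back out Speed for records that are missing it.
    # Assumes overhead = Time - Data/Speed is roughly constant, which may not hold.

    complete = [
        # (data_bytes, time_min, speed_bytes_per_min) -- a few rows from the sample above
        (383542111073, 381.22, 1273000000),
        (383676323632, 382.72, 1267000000),
        (392386489223, 366.32, 1363000000),
        (392143518314, 366.07, 1365000000),
    ]

    # Per-record overhead: total job time minus pure transfer (work) time.
    overheads = sorted(t - d / s for d, t, s in complete)
    overhead = overheads[len(overheads) // 2]  # median, to resist outliers

    def guess_speed(data_bytes, time_min):
        """Guess Speed (bytes/min) for a record that only has Data and Time."""
        work_time = time_min - overhead
        if work_time <= 0:
            raise ValueError("estimated work time is non-positive")
        return data_bytes / work_time

    # Example: an old record with only Data and Job Time recorded.
    print(round(guess_speed(385000000000, 380.0)))  # rough Speed estimate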
UPDATE in response to the comment about graphing:
I have graphed the data, but, probably because I forgot most of my math when I moved into technology as a career, I'm not exactly sure how the shape of the graph points me towards a particular function. The graph of the entire data set is below. Note that the MB/Min and Work Time data are missing from mid-July and before.
Part of the problem with my (meager, unpracticed) thoughts on which formula is best is that a month into the data collection I changed the time at which the backup occurs. Moving it to a period when fewer things were running on the server lowered the resulting job time by what appears to be about 12 minutes. You can see this in the data set above, where the last ten time values are clustered around the high 360s while the earlier points are closer to 380.
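In case the overhead also differs between the two periods, I assume the same kind of estimate could be done per period. A rough sketch (the after_change flag is hypothetical, since the sample above doesn't carry dates):

    # Same idea as above, but estimating the overhead separately for the periods
    # before and after the schedule change.

    def median(values):
        values = sorted(values)
        return values[len(values) // 2]

    def overhead_by_period(records):
        """records: iterable of (data_bytes, time_min, speed_bytes_per_min, after_change)."""
        groups = {False: [], True: []}
        for d, t, s, after in records:
            groups[after].append(t - d / s)  # per-record overhead
        return {period: median(ov) for period, ov in groups.items() if ov}

    # Example with two rows from each period of the sample above.
    records = [
        (383542111073, 381.22, 1273000000, False),
        (383676323632, 382.72, 1267000000, False),
        (392386489223, 366.32, 1363000000, True),
        (392143518314, 366.07, 1365000000, True),
    ]
    print(overhead_by_period(records))  # e.g. {False: ~79.9, True: ~78.8}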