At time $t_{0}$ I have $N$ server instances running. Each instance is at $100\%$ CPU utilization. Collectively, the instances are attempting to respond to $R_{0}$ web requests. Some unknown percent of $R_{0}$ requests are being ignored because all the instances are under too much load.
At time $t_{1}$, the load lessens, and all $N$ instances are now at only $50\%$ CPU utilization. Collectively, the instances are now successfully responding to $100\%$ of $R_{1}$ web requests. Note that $R_{1} < R_{0}$.
How many additional server instances ($N_{\text{additional}}$, so that $N_{\text{new}} = N + N_{\text{additional}}$) should I create so that, should $R_{0}$ traffic occur again, CPU utilization on each instance will be no more than $75\%$?
My approach is to use the performance observed at $50\%$ CPU utilization to estimate the maximum number of requests each instance can handle, then divide $R_{0}$ by that per-instance maximum to get the number of instances needed, and finally multiply by an over-provisioning factor so that CPU utilization stays at or below $75\%$.
e.g.
$\text{Max } R \text{ per instance} = 100 \times \frac{R_{1}/N}{50}$
$N_{\text{new}} = \frac{R_{0}}{\text{Max } R \text{ per instance}} \times 1.25$
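For concreteness, here is a minimal Python sketch of that calculation as I've described it. The values of $N$, $R_{0}$, and $R_{1}$ below are hypothetical, chosen only to illustrate the arithmetic:

```python
import math

# Hypothetical example values; substitute real measurements.
N = 10          # instances running at t0 and t1
R0 = 12_000     # requests arriving during the overloaded period (t0)
R1 = 8_000      # requests served successfully at 50% CPU (t1)

# Step 1: extrapolate per-instance capacity from the 50%-utilization sample.
max_r_per_instance = (R1 / N) / 0.50

# Step 2: instances needed to serve R0, scaled by the proposed 1.25x
# over-provisioning factor (intended to keep utilization near the 75% target).
n_new = math.ceil(R0 / max_r_per_instance * 1.25)
n_additional = max(0, n_new - N)

print(f"Max requests per instance: {max_r_per_instance:.0f}")
print(f"Instances needed (N_new): {n_new}, additional: {n_additional}")
```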
Does that seem like the right approach?