Can someone explain the math involved in analyzing the performance of a parallelized implementation of mergesort?
There's another question that talks about the performance of mergesort without parallelism here: Merge Sort time complexity analysis
I'm thinking that with each level of splitting, you can double performance by doubling the number of cores you throw at the problem. Then, with each level of merging, only half as many cores can do useful work, so the performance gain is halved at each merge level. (A sketch of the kind of implementation I mean is below.)
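Here's a minimal sketch of the parallel mergesort I have in mind: fork both halves so they can run on separate cores, then do a plain sequential merge on the way back up. The class name and structure are just illustrative, not a reference implementation:

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Sketch: each level forks two subtasks (doubling available parallelism
// on the way down), then performs a sequential merge on the way back up
// (so only one core per remaining subproblem is busy at each merge level).
class ParallelMergeSort extends RecursiveAction {
    private final int[] a;
    private final int lo, hi; // sorts a[lo..hi)

    ParallelMergeSort(int[] a, int lo, int hi) {
        this.a = a; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= 1) return;              // base case: 0 or 1 element
        int mid = (lo + hi) >>> 1;
        // Split phase: both halves run as independent tasks.
        invokeAll(new ParallelMergeSort(a, lo, mid),
                  new ParallelMergeSort(a, mid, hi));
        // Merge phase: sequential merge performed by this single task.
        merge(mid);
    }

    private void merge(int mid) {
        int[] tmp = Arrays.copyOfRange(a, lo, hi);
        int i = 0, j = mid - lo, k = lo;
        while (i < mid - lo && j < hi - lo)
            a[k++] = (tmp[i] <= tmp[j]) ? tmp[i++] : tmp[j++];
        while (i < mid - lo) a[k++] = tmp[i++];
        while (j < hi - lo)  a[k++] = tmp[j++];
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 7, 3};
        new ForkJoinPool().invoke(new ParallelMergeSort(a, 0, a.length));
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 5, 7, 9]
    }
}
```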
So, if N is the number of elements in the unsorted array, then for levels 1 through log₂(N) performance doubles at each level, and from level log₂(N) + 1 through 2·log₂(N) it halves at each level. I don't know where to go from here... thoughts?
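For concreteness, here is one way I could write that model down as a per-level cost. These are my assumptions, not an established analysis: N is a power of two, every merge is sequential, and there are as many cores as there are subproblems at each level:

```latex
% Assumed model, counting merge levels i = 1 .. log2(N) from the leaves:
% at level i there are N/2^i independent merges of 2^i elements each,
% all running concurrently, so the level costs time proportional to 2^i.
% The split levels are cheap by comparison (constant work per task).
T_{\text{merge}}(N) \;=\; \sum_{i=1}^{\log_2 N} O\!\left(2^{i}\right)
```

Is summing the per-level merge costs like this the right way to carry the analysis forward?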