Since your input points are given in single precision, and you can almost certainly do your computation with double-precision intermediates, your main source of error will be uncertainty in the exact position of the input points.
Ideally, I think you should want to compute the normal vector as the cross product of two diagonals: $(v_i-v_j)\times(v_k-v_l)$ with $i,j,k,l$ chosen to maximize the magnitude of the cross product. This at once selects against using very short diagonals (whose direction is more uncertain than long ones), and against using nearly parallel diagonals (where small errors in the directions of each diagonal could lead to large errors in the final normal). However, there are on the order of $n^4$ possible combinations of diagonals, so unless you have unlimited time or $n$ is always small, you don't want to try all combinations.
One possible heuristic would be to choose $i$ and $j$ as the endpoints of the longest diagonal (which can be found in $O(n^2)$ time) and then try cross products with all combinations of $k$ and $l$, also in $O(n^2)$ time.
Faster but more crudely, choose $i$ arbitrarily, then choose $j$ so the length of $v_i-v_j$ is maximal. Choose $k$ to maximize the distance from $v_k$ to the diagonal $v_iv_j$, and finally choose $l$ to maximize the cross product.
In any case, don't attempt to translate to polygon anywhere before your computations. That will just lead to intermediates of the form $(a+t)-(b+t)$ rather than $a-b$, and the former cannot possibly give better results than the latter. At best it is harmless.