The camera position is that of the eye (the same for all displays). The azimuth and elevation correspond to the direction from the eye perpendicular to the display. If that line does not hit the display at its center, this offset must be taken into account and the image shifted. Depending on how the azimuth/elevation transformation is performed and how your displays are positioned, you may also need to tilt the image accordingly. The distance between eye and display determines the projection offset (is that what is called focal length in spite of the absence of any lenses?), and together with the display resolution it determines the view angle.
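To make the relation between distance, resolution and view angle concrete, here is a minimal sketch. The function name `view_geometry` and its parameters (`pixel_pitch` as the physical size of one pixel, `foot_offset` as the horizontal distance between the display center and the point where the perpendicular from the eye hits the display plane) are assumptions for illustration, and all lengths are taken to be in the same unit.

```python
import math

def view_geometry(distance, width_px, pixel_pitch, foot_offset=0.0):
    """Return (focal length in pixels, horizontal view angle in degrees).

    distance    -- eye-to-display distance along the perpendicular
    width_px    -- horizontal display resolution in pixels
    pixel_pitch -- physical width of one pixel (assumed square pixels)
    foot_offset -- horizontal offset of the perpendicular foot from the
                   display center (0 means the display is centered)
    """
    # The "focal length" of the pinhole model, expressed in pixel units.
    focal_px = distance / pixel_pitch
    half_width = 0.5 * width_px * pixel_pitch
    # The two half-angles differ when the eye is not in front of the center.
    angle_a = math.atan((half_width + foot_offset) / distance)
    angle_b = math.atan((half_width - foot_offset) / distance)
    return focal_px, math.degrees(angle_a + angle_b)
```

With a nonzero `foot_offset` the view frustum becomes asymmetric, which is the "image shifted" case described above.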
Ultimately, you will need to find, for each display, a $3\times 3$ matrix $B_i$ such that the procedure is as follows:
Depending on the main camera position, there is an affine isometric transformation $v\mapsto A v + b$ with the property that the camera position maps to $0$, the looking direction maps to the positive $z$ axis, and the sagittal plane is transformed to the $yz$ plane. Then on display $i$, the 3d point $v$ corresponds to $(x/z, y/z)$, provided $z>0$, with $ \left(\begin{matrix}x\\y\\z\end{matrix}\right)=B_i(A v + b).$ Note that only $A$ and $b$ change while navigating through the 3d model, whereas the $B_i$ are fixed (determined by the display and user seat arrangement).
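The procedure can be sketched in plain Python as follows. This is only an illustration under stated assumptions: $B_i$ is factored as $K_i R_i$, where $R_i$ rotates the main camera frame so that its $z$ axis points along the perpendicular from the eye to display $i$ (given by azimuth and elevation), and $K_i$ is a pinhole-style matrix holding the eye-to-display distance in pixels and the pixel coordinates of the perpendicular's foot. The helper names (`basis`, `camera_transform`, `display_matrix`, `project`) are hypothetical, and the world "up" direction is assumed to be the $y$ axis.

```python
import math

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def normalize(a):
    n = math.sqrt(sum(c * c for c in a))
    return [c / n for c in a]

def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]

def matmul(M, N):
    return [[sum(M[r][k] * N[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def basis(forward, up):
    """Rotation whose rows are the right, up and forward axes."""
    z = normalize(forward)
    x = normalize(cross(up, z))
    y = cross(z, x)
    return [x, y, z]

def camera_transform(eye, look, up=(0.0, 1.0, 0.0)):
    """Return (A, b) so that v -> A v + b maps eye to 0 and look to +z."""
    A = basis(look, up)
    b = [-c for c in matvec(A, eye)]
    return A, b

def display_matrix(azimuth, elevation, focal_px, foot_x, foot_y):
    """Assumed form B_i = K_i R_i; angles in radians, the rest in pixels."""
    normal = [math.sin(azimuth) * math.cos(elevation),
              math.sin(elevation),
              math.cos(azimuth) * math.cos(elevation)]
    R = basis(normal, (0.0, 1.0, 0.0))
    K = [[focal_px, 0.0, foot_x],
         [0.0, focal_px, foot_y],
         [0.0, 0.0, 1.0]]
    return matmul(K, R)

def project(v, A, b, B_i):
    """Pixel position (x/z, y/z) of v on display i, or None if z <= 0."""
    cam = [p + q for p, q in zip(matvec(A, v), b)]
    x, y, z = matvec(B_i, cam)
    if z <= 0:
        return None
    return (x / z, y / z)
```

In use, each `display_matrix(...)` would be computed once per display at startup, while `camera_transform(...)` would be recomputed every time the viewer moves through the model; `project` then combines the two for every point.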