An answer you would probably receive in a game dev forum:
It sounds like you're using a Y-up system and pitching around the world X axis. For a game camera where you only want yaw and pitch, the yaw is usually desired about the world's vertical axis, but the pitch is desired about the camera's local horizontal axis (its local X axis). When the camera is in the identity orientation, its local horizontal axis happens to be aligned with the world horizontal axis (usually the X axis), but once yaw is applied the local horizontal axis rotates away from the world X axis. If you then apply pitch around the world X axis, the resulting rotation can have a Z axis component (the camera appears to roll). But if you apply the pitch rotation first (before the yaw), while the local and world X axes are still aligned, there will be no Z component to the rotation.
This pitch rotation also causes the local up axis to rotate away from the world up axis, but you don't really care, because you always yaw around the world up axis anyway.
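To make the difference concrete, here is a minimal sketch in plain Python. It assumes a right-handed, Y-up system with column vectors, so in `matmul(A, B)` the rotation `B` is applied first. The helper names (`rot_x`, `rot_y`, `matmul`, `camera_right`) and the sample angles are made up for illustration and are not from any particular engine. It checks the Y component of the camera's right vector: if that is nonzero, the horizon is tilted, i.e. the camera has picked up roll.

```python
import math

def rot_x(a):
    """Rotation by angle a (radians) about the world X axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    """Rotation by angle a (radians) about the world Y (up) axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def matmul(a, b):
    """3x3 matrix product a * b (b is applied first to a column vector)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def camera_right(m):
    """Camera's local X axis expressed in world space (first column of m)."""
    return [m[0][0], m[1][0], m[2][0]]

yaw, pitch = math.radians(40), math.radians(25)

# Yaw first, then pitch about the WORLD X axis: the problematic order.
yaw_then_world_pitch = matmul(rot_x(pitch), rot_y(yaw))

# Pitch first (while local X and world X coincide), then yaw about world Y.
pitch_then_yaw = matmul(rot_y(yaw), rot_x(pitch))

# Nonzero: the right vector is tilted out of the horizontal plane (roll).
print(camera_right(yaw_then_world_pitch)[1])
# Zero: the right vector stays level, so there is no roll.
print(camera_right(pitch_then_yaw)[1])
```

If your math library uses row vectors or a different handedness, the multiplication order flips, so treat the order of the arguments above as an assumption to check against your own conventions.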
So try applying pitch first, then yaw and see if it helps.
Or calculate the local horizontal axis after yawing, and pitch around that axis instead of the world X axis.
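The second option could look roughly like the sketch below, again in plain Python under the same assumptions (right-handed, Y-up, column vectors, hypothetical helper names). It yaws first, reads the camera's local horizontal axis off the yaw matrix, and pitches about that axis using Rodrigues' axis-angle formula. Under these assumptions it gives the same orientation as applying pitch first and then yaw.

```python
import math

def rot_x(a):
    """Rotation by angle a (radians) about the world X axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    """Rotation by angle a (radians) about the world Y (up) axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def axis_angle(axis, a):
    """Rotation by angle a about a unit-length axis (Rodrigues' formula)."""
    x, y, z = axis
    c, s = math.cos(a), math.sin(a)
    t = 1 - c
    return [
        [t * x * x + c,     t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c,     t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]

def matmul(a, b):
    """3x3 matrix product a * b (b is applied first to a column vector)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

yaw, pitch = math.radians(40), math.radians(25)

# Step 1: yaw about the world up axis.
yaw_m = rot_y(yaw)

# Step 2: the camera's local horizontal axis after yawing is the
# first column of the yaw matrix (where the world X axis ends up).
local_x = [yaw_m[0][0], yaw_m[1][0], yaw_m[2][0]]

# Step 3: pitch about that local axis instead of the world X axis.
orientation = matmul(axis_angle(local_x, pitch), yaw_m)

# For comparison: pitch first about world X, then yaw about world Y.
pitch_then_yaw = matmul(rot_y(yaw), rot_x(pitch))

# Largest element-wise difference between the two results (rounding error only).
max_diff = max(abs(orientation[i][j] - pitch_then_yaw[i][j])
               for i in range(3) for j in range(3))
print(max_diff)
```

Both routes end up in the same place, so pick whichever fits your camera code better: reordering the two rotations is usually less work, while the axis-angle route is handy if you update the orientation incrementally.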