I'm not an expert in this, so please take the following with a grain of salt.
Here is a brief survey of the state of things, to the best of my knowledge.
Currently it is unknown whether NF is consistent relative to ZFC, or even ZFC + large cardinals. Both Gabbay and Holmes announced proofs of its consistency, indeed from much less than ZFC; however, Gabbay withdrew his proof after finding a flaw, and my understanding is that Holmes' proof has not been vetted fully by the community (although I could be wrong).
Versions of NF, however, are known to be consistent:
The most natural variant is NFU, that is, NF with urelements. This is known to be consistent relative to the Theory of Simple Types; see the Wikipedia page or this paper by Boffa for details. My understanding is that this was first proved by Jensen.
Because it's neat, let me explain the basic idea of the consistency proof of NFU here. Consider a model $M$ of ZF, and a nontrivial automorphism $j$ of $M$ such that for some ordinal $\alpha$, $j(\alpha)<\alpha$ (note that this necessarily means that $M$ is ill-founded, since $\alpha>j(\alpha)>j(j(\alpha))>\cdots$ is then a descending sequence of ordinals of $M$). We may now consider the relation $\varepsilon$ on $M$ given by $$x\varepsilon y\iff j(x)\in y\wedge y\in V_{j(\alpha)+1}$$ where "$\in$" refers to the usual membership relation of $M$. We then restrict the relation $\varepsilon$ to an appropriate subclass of the model $M$: namely, we consider the structure $$N=(V_\alpha^M, \varepsilon).$$ It can be shown that $N$ is a model of $NFU$; the urelements are those $y\in V_\alpha^M$ which are not in $V_{j(\alpha)+1}$, since the second clause of the definition of $\varepsilon$ prevents these from having any $\varepsilon$-elements in $N$.
This establishes the consistency of $NFU$ relative to $ZF$; to bring the consistency strength down, one examines the proof in detail and notices that $ZF$ may be replaced with a much weaker theory.
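To see the definition of $\varepsilon$ at work, here is a quick sanity check (my own sketch, using only the definitions above) that $N$ has a universal set, which is the most visible way NFU differs from ZF. Take $u=V_{j(\alpha)}^M$. Since $j(\alpha)<\alpha$, we have $u\in V_{j(\alpha)+1}^M\subseteq V_\alpha^M$, so $u$ is an element of $N$ and satisfies the second clause of the definition. Since $j$ is an automorphism, $j(V_\alpha^M)=V_{j(\alpha)}^M$, so for every $x\in M$ we get $$x\varepsilon u\iff j(x)\in V_{j(\alpha)}^M\iff j(x)\in j(V_\alpha^M)\iff x\in V_\alpha^M.$$ So every element of $N$ is an $\varepsilon$-element of $u$, including $u$ itself. The self-membership can also be checked directly: $$j(u)=V_{j(j(\alpha))}^M\in V_{j(\alpha)}^M=u,$$ because $j(j(\alpha))<j(\alpha)$.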
What else can we do?
Various axioms can be added to strengthen $NFU$. For example, $NF$ is known to disprove Choice, whereas NFU+Choice is known to be consistent, again relative to TST. And we can go further: unlike NF, NFU does not prove the Axiom of Infinity (and versions of $NFU$ where every set is finite have been studied, I believe by Visser and others), but Infinity can be added without significantly increasing the consistency strength. (Interestingly, my understanding is that the proof of Infinity in NF goes through Specker's disproof of Choice! So it's highly nontrivial.)
We can also look at other restrictions of $NF$. For instance, we can restrict the number of variables which are allowed to occur in a formula to which stratified comprehension applies.
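In case "stratified" is unfamiliar: a formula is stratified if its variables can be given integer types so that every atomic formula $x\in y$ has $\mathrm{type}(y)=\mathrm{type}(x)+1$ and every $x=y$ has equal types. Below is a minimal Python sketch of that check. The tuple encoding of atomic formulas and the function name `stratify` are my own inventions for illustration, and the sketch assumes bound variables have already been renamed apart, so that each name stands for a single variable.

```python
from collections import defaultdict, deque


def stratify(atoms):
    """Try to assign integer types to the variables of a formula.

    atoms: iterable of (op, left, right), where op is "in" or "eq".
      ("in", x, y) demands type(y) == type(x) + 1
      ("eq", x, y) demands type(y) == type(x)
    Returns a dict mapping variable -> type, or None if no assignment exists.
    """
    # Build an undirected constraint graph with signed type offsets.
    graph = defaultdict(list)
    for op, left, right in atoms:
        if op not in ("in", "eq"):
            raise ValueError(f"unknown atomic formula kind: {op!r}")
        delta = 1 if op == "in" else 0
        graph[left].append((right, delta))
        graph[right].append((left, -delta))

    types = {}
    for start in graph:
        if start in types:
            continue
        # Each connected component gets an arbitrary base type 0.
        types[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w, delta in graph[v]:
                expected = types[v] + delta
                if w not in types:
                    types[w] = expected
                    queue.append(w)
                elif types[w] != expected:
                    return None  # conflicting demands: not stratified
    return types


# Russell's formula "x in x" (and hence its negation) is not stratified.
print(stratify([("in", "x", "x")]))
# "x = x", which defines the universal set, is stratified.
print(stratify([("eq", "x", "x")]))
```

Negation, connectives and quantifiers don't affect the check, which is why only the atomic subformulas are passed in. The restriction mentioned above then limits how many distinct variables may occur in such a formula.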
What about other ill-founded set theories, or set theories with urelements?
ZF, or ZFC, with urelements has been extensively studied. The proof that Choice is independent of ZFA (ZF with urelements, or atoms) is vastly simpler than that for ZF; it uses permutation models developed by Fraenkel and Mostowski. Permutation models and their connections with classical ZF were studied in depth by Hall. Along the same lines, the subtheory KP of ZF is often equipped with urelements, yielding the theory KPU. All these theories are consistent relative to their urelement-free versions, and conversely; that is, in the ZF-style setting, unlike the NF-style setting, adding urelements doesn't change the consistency strength.
We can also equip ZF(C) with antifoundation axioms, the most common being Aczel's and Boffa's. Aczel's axiom implies that there is exactly one Quine atom, while Boffa's implies that there are many. Interestingly, Boffa's axiom has been used to provide a context for nonstandard analysis.
As to sources:
Certainly Forster's book is a good resource. For NF and its variants in particular, Randall Holmes' website is invaluable.
My understanding is that Barwise and Etchemendy's book The Liar is a good introduction to nonwellfounded set theories, although I haven't actually read it.