In a sense $\epsilon-$regularity is one way of capturing what it means for a graph to be "random-like".  For comparison, let's suppose that $G$ is a random bipartite graph on two large subsets $A$ and $B$ having edge probability $p$ (meaning that every edge appears with probability $p$ independently; here $p$ is fixed).  Here's several ways in which you might try and say $G$ really does "look random".
-The edges of $G$ are spread out evenly -- there aren't any huge clusters or huge empty spaces in the graph. 
-Each small bipartite graph $H$ appears in $G$ about the number of times you expect it to (roughly $p^{E(H)}$ times the number of appearances in the complete bipartite graph).
-Almost all vertices in $A$ have degree roughly $p |B|$, and almost all pairs of vertices in $A$ have roughly $p^2 |B|$ common neighbors (being a neighbor of one vertex doesn't make you significantly more or less likely to be a neighbor of another vertex).  The same holds if you swap $A$ and $B$.  
As it turns out, the above three characterizations are actually equivalent, if quantified in the right way (this is a phenomenon known as "quasirandomness").  The idea of $\epsilon-$regularity is to quantify the first of these properties.  
For each $S \subseteq A$ and $T \subseteq B$, our random graph is expected to have $p|S| |T|$ edges connecting $S$ with $T$. Rewording this slightly, the density $d(S,T):=\frac{|E(S,T)|}{|S||T|}$ is on average equal to $p$.  So we might hope to make our "random-like" definition something like 
"For every $S$ and $T$, we have 
$|d(S,T)-p|<\epsilon$"
But there's a problem here...if $S$ and $T$ are too small, there's no hope for this to hold (imagine the extreme case where $|S|=|T|=1$).  So we'll add one additional condition: We only want the above bound to hold for subsets of size at least $\epsilon n$.  The idea is that if a small subset is off, it won't have too much impact on things like subset counts anyway, so we might as well worry only about the large subsets.  This modified definition is what is used for $\epsilon$-regularity in Szemerédi's Lemma.
What the Lemma itself says is that any dense graph can be approximated by a bounded number of these $\epsilon-$regular graphs (depending only on $\epsilon$, not on the size of the original graph).  "Approximated by"  here means that you may need to delete a small number ($\epsilon n^2$) edges, but once you delete them you'll be left with a graph on $k$ subsets $A_1, A_2, \dots, A_k$ of vertices, so that for every $i$ and $j$ the graph between $A_i$ and $A_j$ is random-like in all of the ways above [The value of $p$ may be different for each pair $(i,j)$].