The transition matrix for each individual switch is
$ \def\pon{p_{\text{on}}}\def\poff{p_{\text{off}}} \pmatrix{1-\pon&\pon\\\poff&1-\poff}\;, $
where the first component corresponds to state $0$ and the second component to state $1$. The equilibrium distribution (with eigenvalue $1$) is $(p_0,p_1)$ with $p_0=\poff/(\poff+\pon)$ and $p_1=\pon/(\poff+\pon)$. The second eigenvector is $(1,-1)$, with eigenvalue $\lambda:=1-\poff-\pon$. Thus the marginal distributions for the individual switches approach equilibrium exponentially with $\lambda^t$. If $k$ and $N$ aren't very small, the time required to hit the desired configuration will be much longer than the time scale set by $\lambda$, so we can approximate the distribution by the equilibrium distribution and avoid a full non-equilbrium treatment, which would be rather complicated.
You didn't specify how you want to deal with events where the desired configuration persists for more than $j$ time steps. If you count every individual overlapping stretch of $j$ time steps as a separate success, then the long-term optimum would be to take $\lambda\to1$, since the probability of the configuration at any given time step being the desired one is independent of $\lambda$ and the probability for it to persist grows monotonically with $\lambda$. Thus, in order to have an interesting problem we should count each stretch of at least $j$ time steps with the desired configuration just once.
The probability for any given time step to start such a stretch is
$ p_0^{N-k}p_1^k\left((1-\sigma p_1)^{N-k}(1-\sigma p_0)^k\right)^{j-1}\left(1-(1-\sigma p_1)^{N-k}(1-\sigma p_0)^k\right)\;, $
where I've expressed everything in terms of $p_0$, $p_1=1-p_0$ and $\sigma:=\poff+\pon=1-\lambda$. The first two factors give the probability for the desired configuration to occur at the time step, the third factor gives the probability for this configuration to persist for $j$ time steps (i.e. through $j-1$ transitions), and the last factor gives the probability for the previous configuration to have been different. If $\pon$ and $\poff$ can be varied separately, we can treat $p_1$ and $\sigma$ as the two independent variables. To maximize the above probability, we can first maximize with respect to $\sigma$ for given $p_1$. That amounts to maximizing $x^{j-1}(1-x)$ with respect to $x=(1-\sigma p_1)^{N-k}(1-\sigma p_0)^k$, which yields $x=1-1/j$. Substituting this into the probability gives
$ p_0^{N-k}p_1^k\left(1-\frac1j\right)^{j-1}\frac1j\;. $
This is independent of $\sigma$ and is just a constant times the probability for the desired configuration to occur, and this is maximized for $p_1=k/N$. You can obtain $\sigma$ from $x$ numerically (the equation of $N$-th degree linking it to $x$ can't be solved analytically), but we can calculate the expected time between two separate persistent occurrences of the desired configuration without knowing $\sigma$, since this is just the reciprocal of the above probability,
$ \frac{N^N}{(N-k)^{N-k}k^k}j\left(1-\frac1j\right)^{-(j-1)}\sim\frac{N^N}{(N-k)^{N-k}k^k}j\,\mathrm e\;. $
Thus, asymptotically for large $j$, the expected time between separate persistent occurrences is just $j$ times $\mathrm e$ times the expected time between (not necessarily separate, not necessarily persistent) occurrences of the configuration.
[Edit:] I see from your comments and reading the question again carefully that you may seem to have intended the set of $k$ swtiches to remain fixed only for a particular stretch of $j$ time steps, not overall. In that case, the probability has another factor $\binom Nk$. This leads to another nice asymptotic simplification: for large $N$ and $k$, by Stirling's approximation we have
$ \binom Nk\sim \frac{N^N}{(N-k)^{N-k}k^k}\sqrt{\frac N{2\pi(N-k)k}}=\frac{N^N}{(N-k)^{N-k}k^k}\frac1{\sqrt{2\pi p_0 p_1 N}}\;, $
so the expected time between two separate persistent occurrences asymptotically becomes
$ j\,\mathrm e\sqrt{2\pi p_0 p_1 N}\;. $
Your additional question how to choose $\poff$ if $\pon$ is fixed or vice versa doesn't have such a nice solution in closed form, since in that case all four factors in the probability contain the one free variable; however, you can easily find the solution numerically for given values of the parameters.
That the optimal value of $p_1$ to maximize the equilibrium probability $p_k$ of exactly $k$ particular switches being on in any given time step is, unsurprisingly, $p_1=k/N$, can be seen by setting the derivative of $p_k$ with respect to $p_1$ to zero:
$ p_k=p_1^k(1-p_1)^{N-k}\;, $
$ \frac{\partial p_k}{\partial p_1}=\frac kp_1 -\frac{N-k}{1-p_1}\mathop{\stackrel!=}0\;, $
$ p_1=\frac kN\;. $