5
$\begingroup$

I'm programming a learning software. It works with question-/answercards. I´m searching for a algorithm that gives me a higher probability for cards that the user has answered wrong.

My actual idea (edit: Inverse transform sampling) is that each card has an integer which indicates how often the user has answerd the question wrong. Count all integer-values, creating a random integer between 0 and the counted integer-values and use this integer to go through my cards and count their integers until I reached the random integer. Then I reach the integer I choose this card :-)

But there must be a better solution ;-)

Edit: Rejection Sampling

N = number of cards
M = score of the highest card
c = random (1 - N)
x = random (1 - M)

if (x <= (score of card-nr: c)) accept card!
else create new c & x and goto if-querry

That means that cards with a higher score will choosen more often.

  • 1
    By the way, what you are doing is essentially [inverse transform sampling](http://en.wikipedia.org/wiki/Inverse_transform_sampling).2011-03-23
  • 0
    @Rahul Narain: good to know, thx2011-03-24

4 Answers 4

4

You solution seems ok if you don't mind sorting the cards each time. Here is a different method: choose a card at random and a number at random from 0 to the max card score. Accept the chosen card if the number is at most the card score. Otherwise, repeat. This method is rejection sampling on the graph of card scores.

  • 0
    The algorithm in the question requires no sorting.2011-03-23
  • 0
    It does require a linear traverse through the list of cards; this method is also prone to this behavior if there are a few cards with much higher score than the others.2011-03-24
  • 0
    I have looked at the artical but don´t understand it :-( Can you try to explain the algorithm to me?2011-03-24
  • 0
    @myName: I did it in the answer. I'll try again: If there are $N$ cards, choose a random number $c$ between $1$ and $N$. Now choose a random number $x$ between $1$ and $M$, where $M$ is the max score of all cards. If $x \le \mbox{score}(c)$ then accept card $c$. Otherwise, start again. If you think about the score graph, you're choosing a point $(c,x)$ that lies below the graph.2011-03-24
  • 0
    I think i have anderstood you :-) Can you please have a look at the example above and confirm or correct it.2011-03-24
  • 0
    @myName: it's almost what I said, except that $M$ is the max score of all cards (that is, the score of the card with highest score), not the total score.2011-03-24
  • 0
    @lhf: thx, i have correct the example and i will give it a try :-)2011-03-24
2

Here's a more efficient algorithm, requiring some space. You keep a lookup table containing the card to pick for each value of your random integer. The table is init by letting cell $i$ point at card $i$. When you increase the prominence of card $j$, just add a new cell pointing to $j$.

If you are memory-savvy, then you can use the following algorithm. Put all your cards in a balanced binary tree. Each card maintains both its own prominence and the sum of prominence of it and all its descendants. To select a card, use binary search. When you increase the prominence of a card, you need to update only its ancestors. So both operations take logarithmic time.

0

No, that's pretty much the best way of producing that distribution. You might want to add something that slowly decreases the weight of correctly-answered flashcards or your total counts just keep inflating, which becomes a problem after a while.

  • 0
    i thought i multiply the score with 0,X if the answer was right and with 1,X if the answer was wrong ;-)2011-03-23
  • 0
    You're right that for that application, the slow running time doesn't matter. In some applications it might, and then you can be smarter and use a more efficient algorithm.2011-03-24
0

You probably don't want to make the initial numbers 0, or you'll never see a new card once you've had a wrong answer to something else.