
Last week we looked at three principles that combine to create an extraordinarily effective, scientific study method called spaced repetition. You certainly can start studying using spaced repetition without understanding what’s going on, but if you like understanding the tools you’re using, this article is for you! And once you get familiar with the basics and want to improve your efficiency, you’ll have a much easier time if you understand how spaced repetition systems (SRS) work. Happily, most popular algorithms involve nothing more than simple arithmetic, so let’s take a look. We’ll begin with simple algorithms and build up to more complicated ones.

As we proceed, remember the three principles from last week that we want to incorporate into our SRS: spacing reviews out over growing intervals, active recall, and separating easy things from hard ones.

Doubling review time

Here’s a straightforward spacing algorithm that works moderately well: once you have a general handle on whatever you’re trying to memorize, wait a couple of hours or a day and then review it again. If you judge that you remembered it satisfactorily, wait twice as long as you did before studying it this time. If you forget it, wait a day and start over from the beginning.
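As a rough sketch, the doubling rule above fits in a few lines of Python. The function name and the choice to count waits in whole days are my own assumptions for illustration.

```python
def next_wait_days(previous_wait_days: int, remembered: bool) -> int:
    """Return how many days to wait before the next review."""
    if not remembered:
        return 1  # forgot: wait a day and start over from the beginning
    return previous_wait_days * 2  # remembered: wait twice as long as last time


# Three successes, one lapse, then one more success.
wait = 1
for remembered in [True, True, True, False, True]:
    wait = next_wait_days(wait, remembered)
    print(wait)  # prints 2, 4, 8, 1, 2 on successive lines
```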

We can incorporate active recall as well: recite the poem, try to explain the concept, or translate each of a series of words in turn before looking at the answer.

Finally, it’s easy enough to adjust difficulty on an ad-hoc basis. If you find something particularly difficult or easy, you can choose to review it a couple days earlier or later than the doubling algorithm would suggest.

The Leitner system

You can probably see the glaring issue with the method laid out in the previous section already. Namely, once you’re trying to remember more than a few things, how the heck are you supposed to keep track of the amount of time between studying each item? And while small units of information are necessary for efficient scheduling (because if you forget one part of a complicated unit, you have to start the whole unit over at the beginning), the smaller you make your units, the worse this problem gets.

The Leitner system is a popular way of tracking next review times. Many variations on the Leitner system exist, but all of them share the basic feature of a series of boxes that you place your items into to help track how far along each item is. In a typical implementation, these are literal boxes (or partitions in a box) filled with three-by-five cards. Computer programs and smartphone apps use virtual boxes, but the idea is the same: cards start in a box whose contents are reviewed every day and progressively move into boxes whose cards are reviewed less and less frequently. Here’s a diagram:

[Figure: an illustration of the process of moving items between boxes based on study results, as described above.]

A special staging box or the surface of your desk can be used to accumulate cards you haven’t started learning yet. When you’re ready to start, you review each card in turn and place it in the first box when you answer correctly. (If you get it wrong, you put it at the back of the stack.)

The next time you have a chance to study (preferably every day), you review the cards in the first box. If you answer correctly, you move the card to the second box, as shown in the diagram above; otherwise, it gets set aside and studied again at the end of your session, whereupon it goes back into the first box.

The third study session is where things get interesting: this time you get to skip the items in the second box and only study the first box. In the fourth study session you study items in the second box and promote them to the third box if appropriate, then study items in the first box and promote them to the second box. (If you forget an item in any box, it gets set aside until the end of the session, then goes back to the first box.)

Systems vary on how often you study the third box; some systems have you study the third box every three sessions, while others stick closer to the doubling algorithm and suggest every four sessions. Systems also vary on how many boxes are used; when items pass the last box, they’re removed from the regular rotation. Ideally, you would never remove items since you never stop forgetting things, but keeping dozens of boxes is rather impractical!
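If it helps to see the promotion and demotion rules written down, here is a minimal sketch. The three-box limit and the function name are assumptions; as noted above, real systems vary.

```python
from typing import Optional

LAST_BOX = 3  # assumed number of boxes; pick whatever your system uses


def move_card(box: int, correct: bool) -> Optional[int]:
    """Return the box a card goes into after a review.

    None means the card has passed the last box and leaves the rotation.
    """
    if not correct:
        return 1  # a forgotten card always goes back to the first box
    if box == LAST_BOX:
        return None  # passed the last box: removed from regular rotation
    return box + 1  # otherwise promote by one box


print(move_card(1, True))   # 2
print(move_card(2, False))  # 1
print(move_card(3, True))   # None
```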

Tip: One problem with the Leitner system as laid out here is that certain study sessions are much longer than others. For instance, if you review the first box every session, the second box every two sessions, and the third box every three sessions, on the sixth session you will have to review all three boxes, while on the seventh session you will only have to review one. A system involving a series of “decks” of cards that aggregate cards at multiple difficulty levels can alleviate this issue, but this method also makes the system considerably more confusing, so I won’t attempt to explain it here. However, if you need a system that works with physical index cards, rather than a computerized system, this version of the Leitner system may be worth looking into. You can see one such system described on Wikipedia.
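To see the uneven workload concretely, here is a small sketch that lists which boxes come up in each session. It assumes the schedule from the tip (boxes reviewed every 1, 2, and 3 sessions) and counts sessions from 1; the names are made up for this example.

```python
# Assumed schedule: box number -> review every N sessions.
REVIEW_EVERY = {1: 1, 2: 2, 3: 3}


def boxes_due(session: int) -> list[int]:
    """Return the boxes that need reviewing in the given session."""
    return [box for box, every in REVIEW_EVERY.items() if session % every == 0]


for session in range(1, 8):
    print(session, boxes_due(session))
# Session 6 lists all three boxes; session 7 lists only box 1.
```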

Accommodating difficulty with SM2

The Leitner system is easily understood and reasonably efficient, so many people stop right there. However, it has a major flaw: it’s bad at dealing with questions of varying difficulty. Some things you want to remember are inherently harder to remember than others, and when you’re limited to a series of boxes reviewed at fixed times, it’s hard to differentiate between easy and hard items. Therefore, the Leitner system scores poorly on our third spaced-repetition principle, separating easy things from hard ones. To resolve this problem, we’ll look at a different kind of algorithm, based on a modified version of the SuperMemo algorithm SM2. It shares basic principles with the Leitner system, but it gets rid of the rigid boxes and makes room for more complex scheduling rules.

Intervals

Instead of placing each item in a box that we review after a fixed number of study sessions, we’ll now allow an item to wait an arbitrary number of days before we look at it again: 1 day, 5 days, 6 days, 273 days. We’ll call the number of days between reviews the card’s interval.

Practically, this means that we’re going to have to keep our flashcards on a computer and enlist some software to handle the math unless we want to spend more time punching numbers into a calculator and paging through index cards than studying, but for the moment let’s set aside the exact implementation. Pretend that we’re still using a box of index cards, but we have two superpowers: we can instantly calculate and record any figure we want, and we can make cards that match some criteria pop out of the box via telekinesis.

Let’s use our powers and the concept of intervals to make the interval-doubling system practical for use on an unlimited number of items. When we get a card correct for the first time, we’ll record the date we studied it and give it an interval of 1 day. When we get a card correct thereafter, we’ll update the study date to today and double the interval: 1 day becomes 2, 2 days becomes 4, and so on. When we get a card wrong, we set it aside to review again at the end, then when we do, we update the study date to today and reset its interval to 1 day. Every time we study, we use our telekinetic sorting powers to grab all cards where the last study date plus the interval is today’s date – in other words, the cards are due for review today. If we missed some days of study, some cards will also be overdue – their last study date plus the interval is earlier than today, meaning we theoretically should have studied them already – in which case we study those as well.
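On a computer, the "telekinesis" is just a filter on dates. Here is a minimal sketch under a few assumptions: the `Card` class and function names are hypothetical, and it only covers cards that have already received their first 1-day interval.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    front: str
    last_studied: date
    interval_days: int = 1

    @property
    def due(self) -> date:
        """The date this card should next be reviewed."""
        return self.last_studied + timedelta(days=self.interval_days)


def cards_to_study(cards: list[Card], today: date) -> list[Card]:
    """Pick every card that is due today or overdue."""
    return [card for card in cards if card.due <= today]


def record_answer(card: Card, correct: bool, today: date) -> None:
    """Update the study date, then double the interval or reset it to 1 day."""
    card.last_studied = today
    card.interval_days = card.interval_days * 2 if correct else 1
```

For example, a card last studied on January 8 with a 2-day interval shows up on January 10; answer it correctly and its interval becomes 4 days, so it is next due on January 14.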

Ratings

So we got ourselves some additional flexibility, but we still haven’t solved the problem that this system was supposed to solve. Aside from showing off our superpowers, we’ve so far added a lot of complexity for only minor benefits.

Here’s how arbitrary intervals really shine: we can describe how well we remembered a card and use that to adjust the interval and determine when we see it next. Let’s define four levels of difficulty:

  1. Again: I did not remember the answer to this card. I need to try again soon.
  2. Hard: I remembered the answer to this card, but only with great difficulty. The interval was too long and should increase less this time.
  3. Good: I remembered the answer to this card, and I had to think a little but not too hard. The interval was about right.
  4. Easy: I remembered the answer to this card easily. The interval was too short and should increase more this time.

We’ll call these levels of difficulty ratings.

Since the doubling algorithm works reasonably well and we said that Good means we got the timing about right, let’s say a Good rating will multiply the previous interval by 2 to determine the new interval. An Easy rating can multiply the interval by 2.6, and a Hard rating can multiply the interval by 1.2. An Again rating, as usual, will still set the card aside for later review and reset its interval to 1 day.

Note: I didn’t just pluck the numbers 1.2 and 2.6 out of thin air; we’ll see where they come from soon.
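Here is a sketch of this fixed-multiplier version. I have written the multipliers as whole percentages and rounded down to whole days so that the arithmetic stays exact; both choices, along with the names, are assumptions for illustration.

```python
# Fixed multipliers from the text, written as percentages (120 means x1.2).
MULTIPLIER_PERCENT = {"hard": 120, "good": 200, "easy": 260}


def next_interval(interval_days: int, rating: str) -> int:
    """Return the new interval in whole days for the given rating."""
    if rating == "again":
        return 1  # forgotten: reset to a 1-day interval
    return interval_days * MULTIPLIER_PERCENT[rating] // 100  # rounds down


for rating in ["again", "hard", "good", "easy"]:
    print(rating, next_interval(10, rating))
# A 10-day interval becomes 1, 12, 20, or 26 days respectively.
```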

Ease

Going from 2 choices (wrong/right) to 4 choices (wrong/too hard/exactly right/too easy) is revolutionary in terms of our third spaced-repetition principle, separating easy things from hard things. Now we can prevent our intervals from growing out of control on difficult cards so that we don’t forget them before we see them again, and we can also indicate that a card was too easy and we’d like to wait longer next time.

However, we haven’t really achieved full separation of easy things from hard things yet. We’re now able to rate a card to indicate that we remembered it particularly well or poorly, but that rating affects only this one study session. We don’t want to have to choose Hard every single time on every unusually difficult card; if we have to do that, we don’t have any flexibility left. Maybe the card is much harder than most of our cards, but this particular time it came a bit too early – but we still have to choose Hard lest the scheduler think it’s normal difficulty and push it out too far into the future. What we really want is for the scheduling system to remember our past performance on each card and use that information to adjust the definitions of “good”, “hard”, and “easy”, so that all three choices are always reasonable.

To accomplish that, let’s add a new statistic to each of our cards, describing how difficult we find this card. We’ll call this number the card’s ease, and we’ll express it as a percentage (so typical eases would be 130%, 200%, 240%, and so on). Here’s how we use ease: when we select the Good rating for a particular card, rather than multiplying the card’s interval by a fixed value of 2, we multiply by the card’s ease. We write the ease as a percentage since it represents how large the new interval will be compared to the previous interval. If the ease is 200%, then when we press the Good button, the next interval will be 200% of the current interval, or twice as long. If the ease is 100%, the next interval will be the same as the current interval.

Note: In practice, the ease value must always be larger than 120% or the Good rating will give the card a smaller interval than the Hard rating would, which is obviously undesirable! (We’ll see how Hard and Easy work with ease in the next paragraph.) Most if not all popular spaced-repetition systems won’t allow ease to drop below 130%, as experience has shown that lower values tend to greatly increase frustration and decrease efficiency while providing only minimal improvement in memory.

What about if we press Hard or Easy? For Hard, we’ll continue to just multiply the interval by 1.2 and ignore the ease: if the card was too hard, that suggests we almost forgot it, so we want to be conservative about further interval increases. For Easy, we’ll multiply by the ease and then by a small additional bonus; a typical value would be 130%. So if our ease was 200%, pressing Easy would multiply the interval by 200%, then by 130%, making the new interval 260% of the old one, or 2.6 times as long.

Note: Here are the numbers 1.2 and 2.6 I used in the simplified version of the algorithm reappearing. In this context, the numbers 1.2 and 130% are hard-coded parts of the spaced-repetition algorithm and were determined by experimentation. They’re reasonably effective under most circumstances.
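Putting the three multiplication rules side by side, a sketch of the interval calculation with ease might look like this. Ease is stored as a whole percentage, results are rounded down to whole days, and the names are hypothetical.

```python
HARD_PERCENT = 120        # Hard always multiplies by 1.2 and ignores the ease
EASY_BONUS_PERCENT = 130  # extra multiplier applied on top of the ease for Easy


def next_interval(interval_days: int, ease_percent: int, rating: str) -> int:
    """Return the new interval in whole days, rounded down."""
    if rating == "again":
        return 1
    if rating == "hard":
        return interval_days * HARD_PERCENT // 100
    if rating == "good":
        return interval_days * ease_percent // 100
    if rating == "easy":
        return interval_days * ease_percent * EASY_BONUS_PERCENT // 10_000
    raise ValueError(f"unknown rating: {rating}")


print(next_interval(10, 200, "easy"))  # 26
print(next_interval(10, 250, "good"))  # 25
```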

So far the ease isn’t helping us much, because it never changes! How do we want to change it? Well, we said that if we pressed Good, we thought the card came at about the right time, so we probably think the card’s difficulty is about right as well. Thus, we won’t change the ease for Good. For Hard or Easy, though, we want to adjust the ease down or up respectively, to indicate that the card was harder or easier than we had previously thought: we’ll subtract 15 percentage points for Hard and add 15 for Easy. And if we forgot the card entirely, we’ll lower the ease a little further, by 20 percentage points.

Math warning: Adjusting by 15 percentage points is very different from adjusting by 15 percent. Increasing or decreasing a percentage by some number of percentage points means adding or subtracting that number: reducing 200% by 15 percentage points results in 185%. In contrast, if we reduced 200% by 15 percent, we would multiply 2.00 by 0.15 to get 0.30, then subtract 0.30 from 2.00, leaving 170%.
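Because the adjustments are in percentage points, updating the ease is plain addition and subtraction. This sketch also applies the 130% floor mentioned earlier; the names are assumptions of mine.

```python
MIN_EASE_PERCENT = 130  # floor used by most popular systems, per the note above

# Change to the ease, in percentage points, for each rating.
EASE_CHANGE = {"again": -20, "hard": -15, "good": 0, "easy": 15}


def update_ease(ease_percent: int, rating: str) -> int:
    """Return the card's new ease after a review, never below the floor."""
    return max(MIN_EASE_PERCENT, ease_percent + EASE_CHANGE[rating])


print(update_ease(200, "hard"))   # 185, not 170
print(update_ease(135, "again"))  # 130, because of the floor
```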

We’ve now determined how we use ease and how we adjust it. One question remains: what ease should the card start out at? Our doubling algorithm would suggest 200%, but when we’re able to adjust the ease of each card individually, a higher value such as 250% turns out to be more efficient. Certainly we will forget more cards in the early stages of learning, but since the ease drops 15 or 20 percentage points every time the card is scheduled too aggressively, this will quickly rectify itself, while we’ll waste far less time pressing Easy over and over again on the easier cards.

Overdue cards

We briefly mentioned overdue cards earlier – that is, ones that we’re studying later than the time we theoretically should have studied them. The current version of our algorithm treats overdue cards exactly like due cards for scheduling purposes. However, this doesn’t actually make any sense. If you see a card three weeks later than the algorithm thought you should have but you still remember it fine, that suggests it was easier than its ease and interval would suggest and the card should get a bonus.

The straightforward way to adjust for overdueness is to calculate the next interval for overdue cards based on the actual time since the last review rather than the scheduled interval. For instance, suppose we have a card with an interval of 10 days, we’re studying it 5 days late, its ease is 200%, and we choose Easy. We calculate the next interval by multiplying 15 days by the ease and easy bonus rather than 10 days, since the card was shown 5 days after the 10-day interval had elapsed. We will see a new interval of 39 days (15 days × 200% ease × 130% easy bonus).

This might be a bit aggressive, though; perhaps we were able to recall the item only with great difficulty, and we’d rather split the difference and schedule based on a time period in between the scheduled review date and the actual review date. What we did above is great for cards we rated Easy, but for Good we’ll use only half the difference and for Hard a quarter of the difference. In our example above, if we had chosen Hard instead, the interval used for calculation would be 11 days (10 days plus 5 days ÷ 4, rounded down), and our new interval would be 13 days (11 days × 1.2, rounded down, since Hard ignores the ease and the easy bonus).

Note: To avoid having to deal with fractions of days, most spaced-repetition programs use integer arithmetic for intervals, which means that fractional results are always rounded down, even if they’re 0.9999.
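Here is a sketch of the overdue adjustment using integer arithmetic throughout. It assumes Hard keeps its fixed 1.2 multiplier, as defined in the Ease section, and the function and constant names are made up for this example.

```python
HARD_PERCENT = 120
EASY_BONUS_PERCENT = 130

# How much of the lateness counts: all of it for Easy, half for Good, a quarter for Hard.
LATENESS_DIVISOR = {"hard": 4, "good": 2, "easy": 1}


def next_interval_overdue(interval_days: int, days_late: int,
                          ease_percent: int, rating: str) -> int:
    """Return the new interval for a card reviewed days_late days after it was due."""
    if rating == "again":
        return 1
    # Credit part of the lateness to the interval, rounding down.
    base = interval_days + days_late // LATENESS_DIVISOR[rating]
    if rating == "hard":
        return base * HARD_PERCENT // 100
    if rating == "good":
        return base * ease_percent // 100
    return base * ease_percent * EASY_BONUS_PERCENT // 10_000  # easy


print(next_interval_overdue(10, 5, 200, "easy"))  # 39
print(next_interval_overdue(10, 5, 200, "good"))  # 24
print(next_interval_overdue(10, 5, 200, "hard"))  # 13
```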

The rest of the Anki algorithm

We’ve built our way almost up to the standard algorithm used by the popular spaced-repetition software Anki. We have only two adjustments left to deal with that Anki applies after completing the steps described above.

First, Anki requires that any new interval be at least one day longer than the current interval (except when choosing Again, of course). Otherwise, rounding of intervals could cause a card to get stuck forever; for instance, if your current interval was 2 days and you chose a Hard rating, 2 × 1.2 rounds to 2, so if you keep choosing Hard the card will never advance at all! If the new interval would be the same as or less than the current interval, Anki sets the new interval to one day longer than the current interval.
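In code, this first adjustment amounts to a single comparison. The function below is an illustrative sketch of the rule as described here, not Anki’s actual source, and its name is hypothetical.

```python
def enforce_minimum_growth(current_days: int, calculated_days: int) -> int:
    """Make sure a successful review lengthens the interval by at least one day."""
    return max(calculated_days, current_days + 1)


# Hard on a 2-day interval: 2 x 1.2 rounds down to 2, so the rule bumps it to 3.
calculated = 2 * 120 // 100
print(enforce_minimum_growth(2, calculated))  # 3
```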

Second, Anki applies a small amount of randomness to the new interval. If the calculated interval was 10 days, for example, the card might be scheduled for somewhere between 9 and 11 days. (The amount of allowable spread is defined as a small percentage of the calculated interval, so a card with an interval of years may have its new interval adjusted by a month, while a card with a calculated interval of 1 day will never be moved at all.) The purpose of this randomization is to ensure that cards introduced on the same day don’t get “stuck together” and repeatedly scheduled on the same day if they’re repeatedly rated the same. If cards frequently appear together, one may act as a memory cue for another, reducing the effectiveness of your study.
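The exact spread Anki uses isn’t spelled out here, so the sketch below simply assumes 5% of the interval with a minimum of one day, which reproduces the examples in the paragraph above. Treat the constant and the function name as placeholders rather than Anki’s real values.

```python
import random

FUZZ_PERCENT = 5  # assumed spread; not Anki's actual figure


def fuzz_interval(interval_days: int) -> int:
    """Return the interval nudged by a small random number of days."""
    if interval_days < 2:
        return interval_days  # a 1-day interval is never moved
    spread = max(1, interval_days * FUZZ_PERCENT // 100)
    return random.randint(interval_days - spread, interval_days + spread)


print(fuzz_interval(10))   # somewhere from 9 to 11
print(fuzz_interval(730))  # a two-year interval can shift by up to 36 days
```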

Besides the review algorithm described above, Anki also has a learning algorithm, which helps make sure you know the cards before putting them into the normal scheduling system. The model here is simple: cards advance through a number of steps (say, 1 minute and 10 minutes), going back to the beginning if you forget, then are scheduled for review as above upon passing the final step. In other words, learning mode uses the Leitner system with tiny intervals.
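A simplified sketch of learning mode, assuming the two example steps above: each card remembers which step it is on, a correct answer moves it to the next step, and a wrong answer sends it back to the first. The names and the return convention are my own, and Anki’s real behavior has more options than this.

```python
from typing import Optional, Tuple

LEARNING_STEPS_MINUTES = [1, 10]  # the example steps from the text


def answer_learning_card(step_index: int,
                         correct: bool) -> Tuple[Optional[int], Optional[int]]:
    """Return (new step index, minutes until the card is shown again).

    (None, None) means the card passed the final step and graduates
    to the normal review schedule.
    """
    if not correct:
        return 0, LEARNING_STEPS_MINUTES[0]  # back to the beginning
    next_step = step_index + 1
    if next_step >= len(LEARNING_STEPS_MINUTES):
        return None, None  # graduated
    return next_step, LEARNING_STEPS_MINUTES[next_step]


print(answer_learning_card(0, True))   # (1, 10)
print(answer_learning_card(1, False))  # (0, 1)
print(answer_learning_card(1, True))   # (None, None)
```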


Getting tired of theory? In the next post, we’ll move into practice and take a look at applying algorithms like those presented here, looking at some popular spaced-repetition software and exploring how you can get started using spaced repetition yourself.

You can find a convenient summary of the Anki algorithm, along with more technical details, in Anki’s manual.