recall at k (recall@k)

Recall at k (recall@k) is a metric that measures what fraction of all relevant items appear in the top k results returned by a model. It's widely used in information retrieval, recommendation systems, and machine learning to assess the coverage of relevant results in ranked outputs.

It is especially useful when a model produces ranked outputs or predictions, because it captures how successful the model is at retrieving all relevant items within its top k results. In simple terms, recall@k answers the question: “Of all the relevant items for a given query, how many did the model include in its top k results?”

To break it down, imagine you’re using a movie recommendation system. You ask for suggested movies, and the system returns a ranked list of its top 10 picks. Out of all the movies you would have actually liked (the relevant items), recall@10 tells you what fraction of those truly relevant movies appear in that top 10 list.

The mathematical formula for recall at k is:

recall@k = (Number of relevant items retrieved in top k) / (Total number of relevant items)

If there are 20 movies you would have liked, and 5 of them are recommended in the top 10, the recall@10 is 5/20, or 25%.
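The formula above translates directly into a few lines of code. Here is a minimal sketch in Python, using made-up movie IDs to reproduce the 5-out-of-20 example (the item names are illustrative, not from any real system):

```python
def recall_at_k(relevant, retrieved, k):
    """Fraction of all relevant items that appear in the top-k retrieved items."""
    top_k = set(retrieved[:k])
    hits = sum(1 for item in relevant if item in top_k)
    return hits / len(relevant)

# 20 movies the user would have liked; the top-10 list contains 5 of them.
relevant = [f"movie_{i}" for i in range(20)]
retrieved = [f"movie_{i}" for i in range(5)] + [f"other_{i}" for i in range(5)]

print(recall_at_k(relevant, retrieved, k=10))  # 5 / 20 = 0.25
```

Note that the denominator is the total number of relevant items, not k, which is exactly what distinguishes recall@k from precision@k.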

Why is recall@k important? In many applications, users only look at the first few results—think search engines or e-commerce product suggestions. High recall@k means users are less likely to miss relevant items because more of what matters appears early in the ranked list. This is especially critical when missing a relevant result could have a big impact, such as in medical diagnosis tools or fraud detection systems.

Recall@k is often used alongside precision at k (precision@k). While recall@k measures coverage of relevant items, precision@k focuses on the quality of the top k results—how many of the retrieved items are actually relevant. There’s often a trade-off between the two: increasing recall can sometimes decrease precision and vice versa. For a holistic view of model performance, it’s common to consider both metrics together.
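The two metrics differ only in their denominator, which a small side-by-side sketch makes concrete (the document IDs below are a toy example, not real data):

```python
def recall_at_k(relevant, retrieved, k):
    """Hits in the top k, divided by the total number of relevant items."""
    top_k = set(retrieved[:k])
    return sum(1 for item in relevant if item in top_k) / len(relevant)

def precision_at_k(relevant, retrieved, k):
    """Hits in the top k, divided by k."""
    relevant_set = set(relevant)
    return sum(1 for item in retrieved[:k] if item in relevant_set) / k

# Toy query: 4 relevant documents, a ranked list of 10 results.
relevant = ["d1", "d2", "d3", "d4"]
retrieved = ["d1", "x1", "d2", "x2", "x3", "d3", "x4", "x5", "x6", "x7"]

print(recall_at_k(relevant, retrieved, k=5))     # 2 / 4 = 0.5
print(precision_at_k(relevant, retrieved, k=5))  # 2 / 5 = 0.4
```

The same top-5 list scores differently on the two metrics: growing k can only raise (or hold) recall@k, while precision@k typically falls as more marginal results enter the window.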

Another scenario where recall@k is valuable is with imbalanced datasets, where there are many more irrelevant than relevant items. In such cases, accuracy alone can be misleading, and recall@k provides a more meaningful way to measure how well the model captures the important cases.

In multi-label classification tasks, recall@k can also be used to evaluate how well a model predicts all the correct labels for each sample, within its top k guesses. In recommendation systems, high recall@k means users are more likely to discover items they truly care about, improving overall satisfaction and engagement.
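In the multi-label setting, the per-sample recall@k scores are usually averaged across the dataset. A minimal sketch, assuming the model emits a score per label for each sample (the labels and scores below are invented for illustration):

```python
def multilabel_recall_at_k(true_labels, label_scores, k):
    """Average recall@k over samples: for each sample, keep the k
    highest-scoring labels and measure coverage of the true labels."""
    total = 0.0
    for labels, scores in zip(true_labels, label_scores):
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        total += len(set(labels) & set(top_k)) / len(labels)
    return total / len(true_labels)

# Two samples with hypothetical per-label scores.
true_labels = [{"cat", "dog"}, {"car"}]
label_scores = [
    {"cat": 0.9, "dog": 0.2, "car": 0.7},  # top-2: cat, car -> recall 1/2
    {"cat": 0.1, "dog": 0.3, "car": 0.8},  # top-2: car, dog -> recall 1/1
]

print(multilabel_recall_at_k(true_labels, label_scores, k=2))  # (0.5 + 1.0) / 2 = 0.75
```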

Overall, recall at k is a practical, widely used metric for evaluating the effectiveness of models that return ranked or top-k results, helping researchers and practitioners tune their systems for the best possible user experience.

Anda Usman is an AI engineer and product strategist, currently serving as Chief Editor & Product Lead at The Algorithm Daily, where he translates complex tech into clear insight.