Is there a function in sklearn or numpy for computing NDCG?

  Statistics/Machine Learning  Recommender Systems  Model Validation  Python    Views: 5311
2

Is there a function in sklearn or numpy for computing NDCG?

 

古力夬   2019-10-24 12:25



   1 Answer
6

sklearn does not have one, so you need to implement it yourself. Someone on Kaggle has written one; the link is here
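For reference, here is a sketch of the definition this answer's code appears to follow, assuming the common exponential-gain form (it matches the `2 ** y_true - 1` gain and the `log2` discount used in the code). The symbols are introduced here for illustration: `rel_i` is the relevance of the item ranked at position `i`, and `IDCG@k` is the DCG of the ideal ordering.

```latex
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)},
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
```

With one-hot relevance (a single correct class per sample), `IDCG@k` is 1, so NDCG reduces to `1 / log2(r + 1)` when the correct class is ranked at position `r <= k`, and 0 otherwise.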

Here is the code

import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import make_scorer


def dcg_score(y_true, y_score, k=5):
    """Discounted cumulative gain (DCG) at rank K.

    Parameters
    ----------
    y_true : array, shape = [n_classes]
        Ground truth relevance of each class for a single sample.
    y_score : array, shape = [n_classes]
        Predicted scores for the same sample.
    k : int
        Rank.

    Returns
    -------
    score : float
    """
    order = np.argsort(y_score)[::-1]
    y_true = np.take(y_true, order[:k])

    gain = 2 ** y_true - 1

    discounts = np.log2(np.arange(len(y_true)) + 2)
    return np.sum(gain / discounts)


def ndcg_score(ground_truth, predictions, k=5):
    """Normalized discounted cumulative gain (NDCG) at rank K.

    Normalized Discounted Cumulative Gain (NDCG) measures the performance of a
    recommendation system based on the graded relevance of the recommended
    entities. It varies from 0.0 to 1.0, with 1.0 representing the ideal
    ranking of the entities.

    Parameters
    ----------
    ground_truth : array, shape = [n_samples]
        Ground truth (true labels represented as integers).
    predictions : array, shape = [n_samples, n_classes]
        Predicted probabilities.
    k : int
        Rank.

    Returns
    -------
    score : float

    Example
    -------
    >>> ground_truth = [1, 0, 2]
    >>> predictions = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]
    >>> score = ndcg_score(ground_truth, predictions, k=2)
    >>> print(score)
    1.0
    >>> predictions = [[0.9, 0.5, 0.8], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]
    >>> score = ndcg_score(ground_truth, predictions, k=2)
    >>> print(round(score, 4))
    0.6667
    """
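    predictions = np.asarray(predictions)
    lb = LabelBinarizer()
    # Fit on the class indices (one per column of predictions), not on the
    # number of samples. LabelBinarizer emits a single column for two classes,
    # so this assumes n_classes >= 3.
    lb.fit(range(predictions.shape[1]))
    T = lb.transform(ground_truth)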

    scores = []

    # Iterate over each y_true and compute the DCG score
    for y_true, y_score in zip(T, predictions):
        actual = dcg_score(y_true, y_score, k)
        best = dcg_score(y_true, y_true, k)
        score = float(actual) / float(best)
        scores.append(score)

    return np.mean(scores)


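# NDCG Scorer function
# Note: scikit-learn 1.4+ deprecates needs_proba; there, pass
# response_method="predict_proba" instead.
ndcg_scorer = make_scorer(ndcg_score, needs_proba=True, k=5)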
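As a side note that is not part of the original answer: scikit-learn 0.22 and later ship `sklearn.metrics.ndcg_score`. The sketch below assumes such a version is installed and reuses the toy data from the docstring example. The names `relevance`, `perfect`, and `flawed` are made up for illustration, and the built-in is imported under the alias `sk_ndcg_score` to avoid clashing with the function defined above.

```python
import numpy as np
from sklearn.metrics import ndcg_score as sk_ndcg_score  # assumes scikit-learn >= 0.22

# Toy data from the docstring example: 3 samples, 3 classes.
ground_truth = np.array([1, 0, 2])
n_classes = 3

# One-hot relevance matrix: 1 for the true class of each sample, 0 elsewhere.
relevance = np.eye(n_classes, dtype=int)[ground_truth]

# Scores that rank every true class first.
perfect = np.array([[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]])
# Scores that push the true class of the first sample out of the top 2.
flawed = np.array([[0.9, 0.5, 0.8], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]])

score_perfect = sk_ndcg_score(relevance, perfect, k=2)
score_flawed = sk_ndcg_score(relevance, flawed, k=2)
print(score_perfect, score_flawed)
```

The built-in uses a linear gain (the relevance value itself) rather than `2 ** y_true - 1`, so the two implementations only coincide for 0/1 relevance such as the one-hot labels here. For graded relevance the numbers can differ.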

u_u   2019-10-26 16:37

This one is reliable! - chang   2022-03-27 16:29

