sklearn,numpy里有没有计算ndcg的函数?

  统计/机器学习 推荐系统 模型验证 Python    浏览次数:592        分享
0

sklearn,numpy里有没有计算ndcg的函数?

 

古力夬   2019-10-24 12:25



   1个回答 
2

sklearn里并没有,需要自己实现的,kaggle上有人写过,链接在这里

代码在这里

import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import make_scorer


def dcg_score(y_true, y_score, k=5):
    """Discounted cumulative gain (DCG) at rank K.

    Parameters
    ----------
    y_true : array, shape = [n_samples]
        Ground truth (true relevance labels).
    y_score : array, shape = [n_samples, n_classes]
        Predicted scores.
    k : int
        Rank.

    Returns
    -------
    score : float
    """
    order = np.argsort(y_score)[::-1]
    y_true = np.take(y_true, order[:k])

    gain = 2 ** y_true - 1

    discounts = np.log2(np.arange(len(y_true)) + 2)
    return np.sum(gain / discounts)


def ndcg_score(ground_truth, predictions, k=5):
    """Normalized discounted cumulative gain (NDCG) at rank K.

    Normalized Discounted Cumulative Gain (NDCG) measures the performance of a
    recommendation system based on the graded relevance of the recommended
    entities. It varies from 0.0 to 1.0, with 1.0 representing the ideal
    ranking of the entities.

    Parameters
    ----------
    ground_truth : array, shape = [n_samples]
        Ground truth (true labels represended as integers).
    predictions : array, shape = [n_samples, n_classes]
        Predicted probabilities.
    k : int
        Rank.

    Returns
    -------
    score : float

    Example
    -------
    >>> ground_truth = [1, 0, 2]
    >>> predictions = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]
    >>> score = ndcg_score(ground_truth, predictions, k=2)
    1.0
    >>> predictions = [[0.9, 0.5, 0.8], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]
    >>> score = ndcg_score(ground_truth, predictions, k=2)
    0.6666666666
    """
    lb = LabelBinarizer()
    lb.fit(range(len(predictions) + 1))
    T = lb.transform(ground_truth)

    scores = []

    # Iterate over each y_true and compute the DCG score
    for y_true, y_score in zip(T, predictions):
        actual = dcg_score(y_true, y_score, k)
        best = dcg_score(y_true, y_true, k)
        score = float(actual) / float(best)
        scores.append(score)

    return np.mean(scores)


# NDCG Scorer function
ndcg_scorer = make_scorer(ndcg_score, needs_proba=True, k=5)
SofaSofa数据科学社区DS面试题库 DS面经

u_u   2019-10-26 16:37



  相关主题

sklearn算法里输入的数据集是要求pandas.DataFrame还是numpy.array呢   2回答

怎么对2维的numpy array取整?   2回答

调用np.average()时报错TypeError: No loop matching   1回答

求一个nxn的numpy array的对角线的和?   1回答

删除numpy array中指定的一列   1回答

numpy array中元素的个数?   2回答

用np.round的时候报错AttributeError: no attribute 'rint'   1回答

对numpy array求每行的均值   1回答

numpy里的np.newaxis有什么用?   1回答

numpy中获得tensor的维数?   2回答

怎么把scipy.sparse.csr.csr_matrix转成numpy array   1回答

numpy里面resize和reshape的区别是什么?   1回答



回答问题时需要注意什么?

我们谢绝在回答前讲“生动”的故事。

我们谢绝“这么简单,你自己想”、“书上有的,你认真看”这类的回答;如果你认为对方的提问方式或者内容不妥,你可以直接忽略该问题,不用进行任何作答,甚至可以对该问题投反对票。

我们谢绝答非所问。

我们谢绝自己不会、硬要回答。

我们感激每一个用户在编写答案时的努力与付出!