
I have an NLP task I'm tackling with xgboost (R implementation). Before describing my doubt, I'll give you some background:

I have a corpus of documents for which I did topic discovery, using a term x term matrix clustering approach. For each document, I get a topic score computed using the terms in the document (with a TfIdf score). Then, for each document, I pick the topic with the highest score.

The next step is to create a model that, given the term x document score matrix and the best topic per document, predicts the best topic. I consider two approaches: a multiclass model, where a topic is associated with each document; and a one-versus-rest series of models, one per topic, where each document is labeled as belonging or not to a topic.

It is visible that the multiclass approach systematically outperforms the one-vs-rest one. Is there a clear theoretical reason for this?

Multiclass models in XGBoost consist of n_classes separate forests, one for each one-vs-rest binary problem. At each iteration, an extra tree is added to each forest. But it isn't actually a one-vs-rest approach (as I thought in the first version of this answer), because these trees are built to minimize a single loss function, the cross-entropy of the softmax probabilities.

In general, the one-vs-rest models are very good at identifying the single class, whereas the multiclass model has to balance performance on all of them. More specifically, I think that the softmax may be responsible for the phenomenon you're displaying. (I'm still thinking about it, but I thought I should post the above for now.)

Suppose one of your documents is reasonably likely to be in either of two topics: the probability scores given by the forests are 0.9, 0.85, then <0.1 in all the rest. In your topic-1 model, you make a fairly confident judgement that this document is of topic 1 (score of 0.9). But in the multiclass ensemble, things look much more uncertain: the model applies softmax, so that the probability of topic 1 is only ~0.5. More extreme, suppose the individual topic model scores are all 0.9. Now the multiclass ensemble applies softmax and produces equal 1/17 probabilities for each topic!
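The softmax argument from the answer can be checked numerically. The following is only a minimal sketch under stated assumptions, not XGBoost's actual code path: it assumes 17 topics (implied by the 1/17 figure), treats each per-topic score as the sigmoid of a raw margin, fixes the "<0.1" scores at exactly 0.1, and applies softmax to those same margins. The helper names `logit`, `softmax` and `multiclass_probs` are hypothetical and introduced for illustration.

```python
import math

N_TOPICS = 17  # assumption: implied by the 1/17 probabilities in the example


def logit(p):
    """Raw margin (log-odds) whose sigmoid equals the probability p."""
    return math.log(p / (1.0 - p))


def softmax(margins):
    """Numerically stable softmax over a list of raw margins."""
    m = max(margins)
    exps = [math.exp(x - m) for x in margins]
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_probs(binary_probs):
    """Reinterpret per-topic binary probabilities as margins, then normalize jointly."""
    return softmax([logit(p) for p in binary_probs])


# Case 1: two plausible topics (0.9 and 0.85); the rest are set to 0.1 (assumed value).
two_likely = [0.9, 0.85] + [0.1] * (N_TOPICS - 2)
# Case 2: every per-topic model is confident (0.9 everywhere).
all_confident = [0.9] * N_TOPICS

probs_two = multiclass_probs(two_likely)
probs_all = multiclass_probs(all_confident)

print("topic 1, two likely topics:", round(probs_two[0], 3))
print("topic 1, all scores 0.9:  ", round(probs_all[0], 4))
```

Under these assumptions, topic 1 ends up a little above 0.5 in the first case and at exactly 1/17 in the second, which matches the rough figures quoted in the answer. The point is only that a joint normalization spreads probability mass across competing topics, while independent binary models do not.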
