XGBoost

XGBoost is an ensemble method based on Boosting, combined with many optimization techniques that greatly improve prediction accuracy. These optimizations include a regularization term added to the loss function, a second-order (second-derivative) approximation of the loss function, shrinkage and subsampling to prevent overfitting, built-in cross-validation, support for early stopping, parallel processing, and so on. In the KDDCup 2015 competition, all of the top 10 teams used XGBoost in their analyses, which demonstrates its excellent performance. XGBoost is, however, best suited to analyses involving large amounts of data.
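The "regularization term" and "second derivative" optimizations mentioned above can be made concrete. At boosting round t, XGBoost approximates the loss with a second-order Taylor expansion around the previous prediction and adds a complexity penalty on the new tree (notation as in the XGBoost paper):

```latex
\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n}\Big[\, g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t^2(x_i) \Big] + \Omega(f_t),
\qquad
\Omega(f) = \gamma\, T + \tfrac{1}{2}\,\lambda \lVert w \rVert^2
```

Here g_i and h_i are the first and second derivatives of the loss with respect to the previous round's prediction for example i, T is the number of leaves in the new tree f_t, and w is the vector of leaf weights; gamma and lambda control the strength of the regularization.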

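To illustrate two of the ideas named above, shrinkage and subsampling, here is a minimal pure-Python sketch of gradient boosting for squared loss with decision stumps. This is a toy illustration of the boosting mechanics only, not XGBoost's actual implementation (it omits the second-order terms, the regularization penalty, and the parallel tree construction); all function names are invented for this example.

```python
import random

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (stump): pick the split threshold that
    minimises squared error, predicting the mean of ys on each side."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    if best is None:  # all xs identical: fall back to a constant prediction
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    _, t, ml, mr = best
    return (t, ml, mr)

def boost(xs, ys, rounds=200, shrinkage=0.1, subsample=0.8, seed=0):
    """Boosting for squared loss: each round fits a stump to the current
    residuals on a random row subsample, then adds it scaled by the
    shrinkage (learning-rate) factor -- both are overfitting controls."""
    rng = random.Random(seed)
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, preds)]
        idx = rng.sample(range(len(xs)), max(2, int(subsample * len(xs))))
        t, ml, mr = fit_stump([xs[i] for i in idx], [resid[i] for i in idx])
        stumps.append((t, ml, mr))
        preds = [p + shrinkage * (ml if x <= t else mr)
                 for x, p in zip(xs, preds)]
    return stumps

def predict(stumps, x, shrinkage=0.1):
    """Sum the shrunken contributions of all stumps."""
    return sum(shrinkage * (ml if x <= t else mr) for t, ml, mr in stumps)

# Toy step-function data: the ensemble should learn y = 0 for x <= 4, y = 5 after.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 0, 5, 5, 5, 5, 5]
model = boost(xs, ys)
print(predict(model, 1), predict(model, 8))
```

Because each round only corrects a shrunken fraction of the remaining residual, the ensemble converges gradually, which is exactly why shrinkage (together with per-round row subsampling) acts as a brake on overfitting.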