{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# XGBoost"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Team members:\n",
"* Technical writer - **Abylay Aitbanov**\n",
"* Author of executable content - **Alisher Aip**\n",
"* Project Manager - **Adilzhan Jumakanov**"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# Book about XGBoost\n",
" Read here \n",
"
\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"\n",
"\n",
"[XGBoost](https://github.com/dmlc/xgboost) is one of the most popular and efficient implementations of the Gradient Boosted Trees algorithm, a supervised learning method that is based on function approximation by optimizing specific loss functions as well as applying several regularization techniques. It is an ensemble learning method that combines the predictions of multiple weak models to produce a stronger prediction. \n",
"\n",
"XGBoost stands for “Extreme Gradient Boosting” and it has become one of the most popular and widely used machine learning algorithms due to its ability to handle large datasets and its ability to achieve state-of-the-art performance in many machine learning tasks such as classification and regression."
]
},
{
"cell_type": "markdown",
"source": [
"
| | Daily Time Spent on Site | Age | Area Income | Daily Internet Usage | Ad Topic Line | City | Gender | Country | Timestamp | Clicked on Ad |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 62.26 | 32.0 | 69481.85 | 172.83 | Decentralized real-time circuit | Lisafort | Male | Svalbard & Jan Mayen Islands | 2016-06-09 21:43:05 | 0 |
| 1 | 41.73 | 31.0 | 61840.26 | 207.17 | Optional full-range projection | West Angelabury | Male | Singapore | 2016-01-16 17:56:05 | 0 |
| 2 | 44.40 | 30.0 | 57877.15 | 172.83 | Total 5thgeneration standardization | Reyesfurt | Female | Guadeloupe | 2016-06-29 10:50:45 | 0 |
| 3 | 59.88 | 28.0 | 56180.93 | 207.17 | Balanced empowering success | New Michael | Female | Zambia | 2016-06-21 14:32:32 | 0 |
| 4 | 49.21 | 30.0 | 54324.73 | 201.58 | Total 5thgeneration standardization | West Richard | Female | Qatar | 2016-07-21 10:54:35 | 1 |

| | Daily Time Spent on Site | Age | Area Income | Daily Internet Usage | Ad Topic Line | City | Gender | Country | Timestamp | Clicked on Ad |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 62.26 | 32.0 | 69481.85 | 172.83 | Decentralized real-time circuit | Lisafort | 0 | 174 | 2016-06-09 21:43:05 | 0 |
| 1 | 41.73 | 31.0 | 61840.26 | 207.17 | Optional full-range projection | West Angelabury | 0 | 166 | 2016-01-16 17:56:05 | 0 |
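
In the second preview above, `Gender` and `Country` hold integer codes instead of text. The notebook's encoding step is not shown here, so the following is only a minimal sketch of how categorical values can be mapped to integers. The helper `label_encode` and its `sort` flag are hypothetical names introduced for illustration. The codes it produces from five rows will not match values such as 174 or 166, which presumably come from the full dataset.

```python
def label_encode(values, sort=True):
    """Map each distinct category to an integer code (hypothetical helper).

    sort=True assigns codes in sorted order of the categories;
    sort=False assigns codes in order of first appearance.
    """
    categories = sorted(set(values)) if sort else list(dict.fromkeys(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping


# Values copied from the five rows of the first preview table.
gender = ["Male", "Male", "Female", "Female", "Female"]
country = ["Svalbard & Jan Mayen Islands", "Singapore", "Guadeloupe", "Zambia", "Qatar"]

gender_codes, gender_map = label_encode(gender, sort=False)
country_codes, country_map = label_encode(country, sort=True)

print(gender_codes)   # [0, 0, 1, 1, 1]
print(country_codes)  # [3, 2, 0, 4, 1]
```

Integer codes impose an arbitrary order on the categories. Tree-based models split on thresholds, so they can often work with such codes, but whether that suits a given feature is a judgment call.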
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=None, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=None, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=None, n_jobs=None,
              num_parallel_tree=None, random_state=None, ...)
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=0.5, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=0.1, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=12, max_leaves=None,
              min_child_weight=5, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=100, n_jobs=None,
              num_parallel_tree=None, random_state=None, ...)
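
The configuration printed above sets `learning_rate=0.1` and `n_estimators=100`. As a rough illustration of what these two settings control, here is a toy gradient boosting loop in plain Python. It uses squared-error loss, a single feature, and depth-1 trees (stumps). This is a simplified sketch and not XGBoost's actual algorithm, which also uses second-order gradient information, regularization terms, and deeper trees. All function names are made up for this example.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: one threshold, one constant per side."""
    points = sorted(set(xs))
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), points[0], mean, mean)  # fallback: constant prediction
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_estimators=100, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_estimators):
        # For the loss 0.5 * (y - p)^2, the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Each new tree contributes only a fraction (learning_rate) of its fit.
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
no_rounds = boost(xs, ys, n_estimators=0)
one_round = boost(xs, ys, n_estimators=1)
hundred_rounds = boost(xs, ys, n_estimators=100)
print(mse(no_rounds, xs, ys), mse(one_round, xs, ys), mse(hundred_rounds, xs, ys))
```

On this toy data the training error falls as rounds are added. A smaller learning rate makes each round's correction smaller, so it generally needs more estimators to reach the same fit.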
RandomizedSearchCV(estimator=XGBClassifier(base_score=None, booster=None,
                                           callbacks=None,
                                           colsample_bylevel=None,
                                           colsample_bynode=None,
                                           colsample_bytree=None, device=None,
                                           early_stopping_rounds=None,
                                           enable_categorical=False,
                                           eval_metric=None, feature_types=None,
                                           gamma=None, grow_policy=None,
                                           importance_type=None,
                                           interaction_constraints=None,
                                           learning_rate=None...
                   n_iter=25, n_jobs=4,
                   param_distributions={'colsample_bylevel': array([0.5, 0.6, 0.7, 0.8, 0.9]),
                                        'colsample_bytree': array([0.5, 0.6, 0.7, 0.8, 0.9]),
                                        'learning_rate': [0.01, 0.1, 0.2, 0.3,
                                                          0.4],
                                        'max_depth': [3, 6, 10, 15],
                                        'n_estimators': [100, 250, 500, 750],
                                        'reg_alpha': [0.1, 0.001, 1e-05],
                                        'reg_lambda': [0.1, 0.001, 1e-05],
                                        'subsample': array([0.5, 0.6, 0.7, 0.8, 0.9])},
                   scoring='accuracy', verbose=1)
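
The search above is configured with `n_iter=25`, so only 25 parameter combinations are tried out of the full grid spanned by `param_distributions`. The sketch below uses plain Python to show how large that grid is and how a random subset might be drawn from it. The helper `sample_candidates` is a hypothetical stand-in written for this example, not scikit-learn's own sampler. The candidate values are copied from the output above.

```python
import random

# Candidate values copied from the param_distributions shown in the output above.
param_distributions = {
    "colsample_bylevel": [0.5, 0.6, 0.7, 0.8, 0.9],
    "colsample_bytree": [0.5, 0.6, 0.7, 0.8, 0.9],
    "learning_rate": [0.01, 0.1, 0.2, 0.3, 0.4],
    "max_depth": [3, 6, 10, 15],
    "n_estimators": [100, 250, 500, 750],
    "reg_alpha": [0.1, 0.001, 1e-05],
    "reg_lambda": [0.1, 0.001, 1e-05],
    "subsample": [0.5, 0.6, 0.7, 0.8, 0.9],
}


def grid_size(distributions):
    """Number of distinct combinations in the full grid."""
    size = 1
    for values in distributions.values():
        size *= len(values)
    return size


def sample_candidates(distributions, n_iter, seed=0):
    """Draw n_iter distinct combinations from the grid (hypothetical helper)."""
    rng = random.Random(seed)
    names = sorted(distributions)
    total = grid_size(distributions)
    candidates = []
    # Pick distinct grid positions, then decode each position into one value per parameter.
    for index in rng.sample(range(total), n_iter):
        combo = {}
        for name in names:
            values = distributions[name]
            index, position = divmod(index, len(values))
            combo[name] = values[position]
        candidates.append(combo)
    return candidates


candidates = sample_candidates(param_distributions, n_iter=25)
print(grid_size(param_distributions))  # 90000
print(len(candidates))                 # 25
print(candidates[0])
```

An exhaustive grid search over these values would mean 90,000 candidates, each fitted once per cross-validation fold. Sampling 25 trades completeness for a much shorter run.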
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=None, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=None, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=100, n_jobs=-1,
              num_parallel_tree=None, random_state=None, ...)