# Feature Selection Using Feature Importance

## DecisionTree

The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
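
The definition above can be sketched in plain Python. The toy tree, the node tuple layout, and the helper `gini_importance` are hypothetical and exist only to illustrate the arithmetic; this is not scikit-learn's implementation. Each split contributes its sample-weighted impurity decrease, the contributions are summed per feature, and the result is normalized to sum to 1.

```python
from collections import defaultdict

# Hypothetical toy tree. Each internal node is described as:
# (feature, n_samples, impurity, (n_left, impurity_left), (n_right, impurity_right))
NODES = [
    ("age", 100, 0.50, (60, 0.30), (40, 0.20)),
    ("income", 60, 0.30, (30, 0.10), (30, 0.10)),
]


def gini_importance(nodes, n_total):
    """Return the normalized total impurity reduction per feature."""
    raw = defaultdict(float)
    for feature, n, imp, (n_l, imp_l), (n_r, imp_r) in nodes:
        # Impurity decrease of this split, weighted by the share of
        # training samples that reach the node.
        raw[feature] += (n * imp - n_l * imp_l - n_r * imp_r) / n_total
    total = sum(raw.values())
    if total == 0:
        return dict(raw)
    # Normalize so that the importances sum to 1.
    return {feature: value / total for feature, value in raw.items()}


importances = gini_importance(NODES, n_total=100)
for feature, value in importances.items():
    print(f"{feature}: {value:.3f}")
```

In this toy tree, `age` accounts for two thirds of the total impurity reduction and `income` for the remaining third.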


## XGBoost

`get_score(fmap='', importance_type='weight')`

Get the importance of each feature. `importance_type` can be one of:

• `weight`: the number of times a feature is used to split the data across all trees.
• `gain`: the average gain across all splits the feature is used in.
• `cover`: the average coverage across all splits the feature is used in.
• `total_gain`: the total gain across all splits the feature is used in.
• `total_cover`: the total coverage across all splits the feature is used in.


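• weight: the number of times the feature is chosen as the split feature.
• gain: the average gain the feature brings across all trees, i.e. the sum of the gains of every split that uses the feature divided by the number of such splits: gain = total_gain / weight.
• cover: the average coverage of the feature, i.e. the average number of samples affected per split that uses it: cover = total_cover / weight.
• total_gain: the total gain the feature brings, summed over every split that uses it in all trees.
• total_cover: the total number of samples handled (covered) by the feature, summed over every split that uses it in all trees.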
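
To make the relationship between the five importance types concrete, here is a minimal sketch that derives all of them from a flat list of split records. The `SPLITS` data and the helper `importance_table` are hypothetical; XGBoost computes these values internally, and this only mirrors the definitions listed above.

```python
from collections import defaultdict

# Hypothetical split records gathered over all trees: (feature, gain, cover)
SPLITS = [
    ("f0", 4.0, 100.0),
    ("f1", 1.0, 60.0),
    ("f0", 2.0, 40.0),
]


def importance_table(splits):
    """Compute weight, total_gain, total_cover, gain and cover per feature."""
    weight = defaultdict(int)
    total_gain = defaultdict(float)
    total_cover = defaultdict(float)
    for feature, gain, cover in splits:
        weight[feature] += 1          # one more split on this feature
        total_gain[feature] += gain   # accumulate the gain of the split
        total_cover[feature] += cover  # accumulate the coverage of the split
    return {
        feature: {
            "weight": weight[feature],
            "total_gain": total_gain[feature],
            "total_cover": total_cover[feature],
            # The averaged types are the totals divided by the split count.
            "gain": total_gain[feature] / weight[feature],
            "cover": total_cover[feature] / weight[feature],
        }
        for feature in weight
    }


table = importance_table(SPLITS)
for feature, row in table.items():
    print(feature, row)
```

Because `gain` and `cover` are averages, a feature used in a few high-gain splits can rank above a feature used often in low-gain splits, while `weight` favors the latter.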

## LightGBM

`feature_importance(importance_type='split', iteration=None)`

Get feature importances.

• `importance_type` (string, optional (default="split")): how the importance is calculated. If "split", the result contains the number of times the feature is used in the model. If "gain", the result contains the total gain of the splits that use the feature.
• `iteration` (int or None, optional (default=None)): limits the number of iterations used in the feature importance calculation. If None, the best iteration is used if it exists; otherwise all trees are used. If <= 0, all trees are used (no limit).


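• split: the total number of times the feature is used to split across all decision trees.
• gain: the sum of the gains brought by all splits on the feature across all decision trees. This is a total, so it corresponds to XGBoost's `total_gain` rather than its averaged `gain`.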
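
The two importance types and the `iteration` limit can be illustrated with a small pure-Python sketch, followed by a simple top-k selection. Everything here is hypothetical: the `SPLITS` records, this local `feature_importance` function (which only imitates the documented behavior and is not LightGBM's code), and the `top_k` helper. The sketch assumes one tree per iteration and has no notion of a best iteration, so `iteration=None` simply uses all trees.

```python
from collections import defaultdict

# Hypothetical split records: (tree_index, feature, gain)
SPLITS = [
    (0, "f0", 5.0),
    (0, "f1", 2.0),
    (1, "f0", 1.0),
    (1, "f2", 0.5),
    (2, "f2", 4.0),
]


def feature_importance(splits, importance_type="split", iteration=None):
    """Aggregate split counts or total gains over the first `iteration` trees."""
    if importance_type not in ("split", "gain"):
        raise ValueError("importance_type must be 'split' or 'gain'")
    scores = defaultdict(float)
    for tree_index, feature, gain in splits:
        # None or a value <= 0 means "use all trees".
        if iteration is not None and iteration > 0 and tree_index >= iteration:
            continue
        scores[feature] += 1.0 if importance_type == "split" else gain
    return dict(scores)


def top_k(scores, k):
    """Return the k feature names with the highest importance."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


split_scores = feature_importance(SPLITS, "split")
gain_scores = feature_importance(SPLITS, "gain")
selected = top_k(gain_scores, 2)
print(split_scores, gain_scores, selected)
```

Here `f0` and `f2` tie on split count, but the total gain separates them, which is one reason to compare both importance types before dropping features.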