LightGBM Parameters
=====================
Control Parameters
=====================
max_depth:
It sets the maximum depth of a tree and is used to handle model overfitting. Any time you feel that your model is overfitted, my first advice is to lower max_depth.
min_data_in_leaf:
It is the minimum number of records a leaf may have. The default value is 20, which is a reasonable starting point. It is also used to deal with overfitting.
feature_fraction:
The fraction of features LightGBM randomly selects for each tree. A feature_fraction of 0.8 means LightGBM will select 80% of the features randomly before building each tree. It works with any boosting type (discussed later), including random forest.
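As a rough illustration of what feature_fraction = 0.8 means, the sketch below picks a random 80% subset of column indices for each tree. It is plain Python and not LightGBM's internal sampler; the helper name sample_features, the rounding rule and the fixed seed are assumptions made for the example.

```python
import random

def sample_features(num_features, feature_fraction, rng):
    """Return the sorted column indices one tree would be allowed to use."""
    # Keep at least one feature, even for very small fractions.
    k = max(1, int(num_features * feature_fraction))
    return sorted(rng.sample(range(num_features), k))

rng = random.Random(0)  # fixed seed, only so the run is repeatable
for tree in range(3):
    cols = sample_features(num_features=10, feature_fraction=0.8, rng=rng)
    print(f"tree {tree}: {len(cols)} of 10 features -> {cols}")
```

Each tree sees a different subset, which is what makes the trees less correlated and training a little faster.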
bagging_fraction:
Specifies the fraction of data to be used for each iteration and is generally used to speed up training and avoid overfitting. It only takes effect when bagging_freq is set to a non-zero value.
early_stopping_round:
This parameter can help you speed up training. The model will stop training if one metric on one validation set doesn’t improve in the last early_stopping_round rounds. This cuts out unnecessary iterations.
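The stopping rule can be sketched in a few lines of plain Python. This is only an illustration of the idea, assuming a lower-is-better metric; the function name and the validation losses are made up for the example and are not LightGBM output.

```python
def best_round_with_early_stopping(scores, early_stopping_round):
    """Return (best_round, rounds_run) for a lower-is-better metric."""
    best_score = float("inf")
    best_round = -1
    for i, score in enumerate(scores):
        if score < best_score:
            best_score, best_round = score, i
        elif i - best_round >= early_stopping_round:
            # No improvement in the last `early_stopping_round` rounds.
            return best_round, i + 1
    return best_round, len(scores)

# Hypothetical validation losses, one per boosting round.
val_loss = [0.70, 0.62, 0.58, 0.57, 0.59, 0.58, 0.60, 0.61, 0.55]
best, ran = best_round_with_early_stopping(val_loss, early_stopping_round=3)
print(best, ran)  # 3 7
```

Training stops after 7 rounds and keeps round 3 as the best one, so the later 0.55 is never reached. That is the trade-off of a small early_stopping_round.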
lambda:
lambda specifies regularization; LightGBM exposes it as lambda_l1 (L1) and lambda_l2 (L2). Typical values range from 0 to 1.
min_gain_to_split:
This parameter sets the minimum gain required to make a split. It can be used to control the number of useful splits in a tree.
max_cat_group:
When the number of categories is large, finding split points on them easily overfits. So LightGBM merges them into ‘max_cat_group’ groups and finds the split points on the group boundaries. Default: 64
=====================
Core Parameters
=====================
task:
It specifies the task you want to perform on the data, most commonly train or predict.
application:
This is the most important parameter and specifies the application of your model, whether it is a regression problem or a classification problem (application is an alias of objective). By default, LightGBM treats the model as a regression model.
regression: for regression
binary: for binary classification
multiclass: for multiclass classification problem
boosting:
Defines the type of algorithm you want to run, default=gbdt
gbdt: traditional Gradient Boosting Decision Tree
rf: random forest
dart: Dropouts meet Multiple Additive Regression Trees
goss: Gradient-based One-Side Sampling
num_boost_round:
Number of boosting iterations, typically 100+
learning_rate:
This determines the impact of each tree on the final outcome. GBM works by starting with an initial estimate, which is updated using the output of each tree. The learning rate controls the magnitude of this change in the estimates. Typical values: 0.1, 0.001, 0.003…
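To make the update rule concrete, here is a toy boosting loop in plain Python. It assumes an idealised "tree" that predicts the current residual exactly, which real trees do not do, so only the effect of the learning rate is visible; the function name boost and the target values are made up for the example.

```python
def boost(targets, learning_rate, num_boost_round):
    """Toy boosting loop: each 'tree' predicts the current residual exactly."""
    init = sum(targets) / len(targets)  # initial estimate: the mean target
    preds = [init] * len(targets)
    for _ in range(num_boost_round):
        residuals = [t - p for t, p in zip(targets, preds)]
        # Shrink every tree's output by the learning rate before adding it.
        preds = [p + learning_rate * r for p, r in zip(preds, residuals)]
    return preds

targets = [1.0, 2.0, 6.0]
for lr in (0.1, 0.5):
    preds = boost(targets, learning_rate=lr, num_boost_round=10)
    worst = max(abs(t - p) for t, p in zip(targets, preds))
    print(f"learning_rate={lr}: largest remaining error {worst:.4f}")
```

In this toy setting the remaining error shrinks by a factor of (1 - learning_rate) per round, which is why a smaller learning rate usually needs a larger num_boost_round.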
num_leaves:
Maximum number of leaves in one tree, default: 31
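num_leaves and max_depth constrain each other, because a binary tree of depth d has at most 2^d leaves. The small sketch below shows the arithmetic; the helper names are made up for the example, and it assumes LightGBM's convention that max_depth <= 0 means no depth limit.

```python
def max_leaves_for_depth(max_depth):
    """Upper bound on the leaves of a binary tree with the given depth."""
    return 2 ** max_depth

def effective_num_leaves(num_leaves, max_depth):
    """Leaves a tree can actually reach; max_depth <= 0 means no depth limit."""
    if max_depth <= 0:
        return num_leaves
    return min(num_leaves, max_leaves_for_depth(max_depth))

print(effective_num_leaves(num_leaves=31, max_depth=-1))  # 31
print(effective_num_leaves(num_leaves=31, max_depth=4))   # 16
print(effective_num_leaves(num_leaves=31, max_depth=7))   # 31
```

So with max_depth=4 the default num_leaves of 31 can never be reached, and lowering max_depth is effectively also lowering num_leaves.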
device:
default: cpu, can also pass gpu
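Putting the control and core parameters together, a training call with the Python package might look roughly like the sketch below. The parameter values are placeholders rather than recommendations, build_params and train_sketch are names made up for this example, and the data arguments are assumed to be arrays you already have. It also assumes a LightGBM version that provides the lgb.early_stopping callback.

```python
def build_params():
    """Collect parameters described in this document into one dict."""
    return {
        "objective": "binary",       # the document calls this `application`
        "boosting": "gbdt",
        "learning_rate": 0.1,
        "num_leaves": 31,
        "max_depth": -1,             # no depth limit
        "min_data_in_leaf": 20,
        "feature_fraction": 0.8,
        "bagging_fraction": 0.8,
        "bagging_freq": 1,           # bagging is only active when this is > 0
        "metric": "binary_logloss",  # evaluated on the validation set
        "device": "cpu",
    }

def train_sketch(X_train, y_train, X_valid, y_valid):
    """Train with early stopping; requires the lightgbm package."""
    import lightgbm as lgb  # imported here so build_params() works without it

    train_set = lgb.Dataset(X_train, label=y_train)
    valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)
    return lgb.train(
        build_params(),
        train_set,
        num_boost_round=100,
        valid_sets=[valid_set],
        callbacks=[lgb.early_stopping(stopping_rounds=10)],
    )

params = build_params()
print(sorted(params))
```

Keeping the parameters in one function makes it easy to change a single value, such as learning_rate, and retrain.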
=====================
Metric parameter
=====================
metric: again one of the important parameters, as it specifies the metric used to evaluate the model. Below are a few common metrics for regression and classification.
mae: mean absolute error
mse: mean squared error
binary_logloss: loss for binary classification
multi_logloss: loss for multiclass classification
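For reference, the four metrics above can be written out in plain Python as follows. These are the textbook definitions, offered as a sketch to show what each number measures and not LightGBM's own implementation; the eps clipping value is an assumption made to avoid log(0).

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_logloss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood; p_pred holds probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def multi_logloss(y_true, prob_rows, eps=1e-15):
    """Mean negative log of the probability given to the true class index."""
    return -sum(math.log(max(row[t], eps))
                for t, row in zip(y_true, prob_rows)) / len(y_true)

print(mae([3.0, 5.0], [2.0, 7.0]))  # (1 + 2) / 2 = 1.5
print(mse([3.0, 5.0], [2.0, 7.0]))  # (1 + 4) / 2 = 2.5
print(binary_logloss([1, 0], [0.9, 0.2]))
print(multi_logloss([0, 2], [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]))
```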
=====================
IO parameter
=====================
max_bin:
It denotes the maximum number of bins that feature values will be bucketed into.
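The idea of bucketing can be shown with a simplified equal-width binning in plain Python. LightGBM's own bin construction is more involved, so treat this only as a sketch of what "at most max_bin buckets per feature" means; the function name bin_feature and the sample values are made up for the example.

```python
def bin_feature(values, max_bin):
    """Map each value to a bin index in [0, max_bin - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / max_bin
    # The maximum value would land in bin `max_bin`, so clamp it to the last bin.
    return [min(int((v - lo) / width), max_bin - 1) for v in values]

values = [0.0, 0.5, 1.0, 2.5, 4.0, 7.5, 10.0]
print(bin_feature(values, max_bin=4))  # [0, 0, 0, 1, 1, 3, 3]
```

A smaller max_bin means coarser buckets, which trains faster and regularises a little, at the cost of less precise split points.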
categorical_feature:
It denotes the indices of the categorical features. If categorical_feature=0,1,2, then column 0, column 1 and column 2 are categorical variables.
ignore_column:
Same as categorical_feature, except that instead of treating the specified columns as categorical, it ignores them completely.
save_binary:
If the size of your data file is a real concern, set this parameter to ‘True’. This saves the dataset to a binary file, which speeds up data loading the next time.
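When these IO parameters are passed to the command-line tool, they usually go into a config file with one `key = value` pair per line. The sketch below renders such a file from a dict, assuming that layout; the helper to_config_text, the file name train.csv and the ignored column index are hypothetical.

```python
def to_config_text(params):
    """Render a dict as `key = value` lines, one parameter per line."""
    lines = []
    for key, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"

io_params = {
    "task": "train",
    "data": "train.csv",               # hypothetical file name
    "max_bin": 255,
    "categorical_feature": [0, 1, 2],  # columns 0, 1 and 2 are categorical
    "ignore_column": [5],              # hypothetical column to skip
    "save_binary": True,
}
print(to_config_text(io_params), end="")
```

The printed text can be saved to a file and passed to the tool with the config argument, for example `lightgbm config=train.conf`.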
Knowing and using the above parameters will definitely help you implement the model.