Medial Code Documentation
LightGBM::GBDT Class Reference

GBDT algorithm implementation, including training, prediction, and bagging.

#include <gbdt.h>

Inheritance diagram for LightGBM::GBDT:
LightGBM::GBDTBase LightGBM::Boosting LightGBM::DART LightGBM::GBDT_Accessor LightGBM::GOSS LightGBM::RF

Public Member Functions

 GBDT ()
 Constructor.
 
 ~GBDT ()
 Destructor.
 
void Init (const Config *gbdt_config, const Dataset *train_data, const ObjectiveFunction *objective_function, const std::vector< const Metric * > &training_metrics) override
 Initialization logic.
 
void MergeFrom (const Boosting *other) override
 Merge model from other boosting object. Will insert to the front of current boosting object.
 
void ShuffleModels (int start_iter, int end_iter) override
 Shuffle Existing Models.
 
void ResetTrainingData (const Dataset *train_data, const ObjectiveFunction *objective_function, const std::vector< const Metric * > &training_metrics) override
 Reset the training data.
 
void ResetConfig (const Config *gbdt_config) override
 Reset Boosting Config.
 
void AddValidDataset (const Dataset *valid_data, const std::vector< const Metric * > &valid_metrics) override
 Adding a validation dataset.
 
void Train (int snapshot_freq, const std::string &model_output_path) override
 Perform a full training procedure.
 
void RefitTree (const std::vector< std::vector< int > > &tree_leaf_prediction) override
 Update the tree output by new training data.
 
virtual bool TrainOneIter (const score_t *gradients, const score_t *hessians) override
 Training logic.
 
void RollbackOneIter () override
 Rollback one iteration.
 
int GetCurrentIteration () const override
 Get current iteration.
 
bool NeedAccuratePrediction () const override
 Can use early stopping for prediction or not.
 
std::vector< double > GetEvalAt (int data_idx) const override
 Get evaluation result at data_idx data.
 
virtual const double * GetTrainingScore (int64_t *out_len) override
 Get current training score.
 
virtual int64_t GetNumPredictAt (int data_idx) const override
 Get size of prediction at data_idx data.
 
void GetPredictAt (int data_idx, double *out_result, int64_t *out_len) override
 Get prediction result at data_idx data.
 
int NumPredictOneRow (int num_iteration, bool is_pred_leaf, bool is_pred_contrib) const override
 Get number of prediction for one data.
 
void PredictRaw (const double *features, double *output, const PredictionEarlyStopInstance *earlyStop) const override
 Prediction for one record, without the sigmoid transformation.
 
void PredictRawByMap (const std::unordered_map< int, double > &features, double *output, const PredictionEarlyStopInstance *early_stop) const override
 
void Predict (const double *features, double *output, const PredictionEarlyStopInstance *earlyStop) const override
 Prediction for one record, sigmoid transformation will be used if needed.
 
void PredictByMap (const std::unordered_map< int, double > &features, double *output, const PredictionEarlyStopInstance *early_stop) const override
 
void PredictLeafIndex (const double *features, double *output) const override
 Prediction for one record with leaf index.
 
void PredictLeafIndexByMap (const std::unordered_map< int, double > &features, double *output) const override
 
void PredictContrib (const double *features, double *output, const PredictionEarlyStopInstance *earlyStop) const override
 Feature contributions for the model's prediction of one record.
 
std::string DumpModel (int start_iteration, int num_iteration) const override
 Dump model to json format string.
 
std::string ModelToIfElse (int num_iteration) const override
 Translate model to if-else statement.
 
bool SaveModelToIfElse (int num_iteration, const char *filename) const override
 Translate model to if-else statement.
 
virtual bool SaveModelToFile (int start_iteration, int num_iterations, const char *filename) const override
 Save model to file.
 
std::string SaveModelToString (int num_iterations)
 Save model to string.
 
virtual std::string SaveModelToString (int start_iteration, int num_iterations) const override
 
bool LoadModelFromString (std::string str)
 Restore from a serialized buffer.
 
bool LoadModelFromString (const char *buffer, size_t len) override
 
std::vector< double > FeatureImportance (int num_iteration, int importance_type) const override
 Calculate feature importances.
 
int MaxFeatureIdx () const override
 Get max feature index of this model.
 
std::vector< std::string > FeatureNames () const override
 Get feature names of this model.
 
int LabelIdx () const override
 Get index of label column.
 
int NumberOfTotalModel () const override
 Get number of weak sub-models.
 
int NumModelPerIteration () const override
 Get number of trees per iteration.
 
int NumberOfClasses () const override
 Get number of classes.
 
void InitPredict (int num_iteration, bool is_pred_contrib) override
 Initial work for the prediction.
 
double GetLeafValue (int tree_idx, int leaf_idx) const override
 
void SetLeafValue (int tree_idx, int leaf_idx, double val) override
 
virtual const char * SubModelName () const override
 Get Type name of this boosting object.
 
Public Member Functions inherited from LightGBM::Boosting
virtual ~Boosting ()
 virtual destructor
 
std::string SaveModelToString (int num_iterations)
 Save model to string.
 
bool LoadModelFromString (std::string str)
 Restore from a serialized string.
 
Boosting & operator= (const Boosting &)=delete
 Disable copy.
 
 Boosting (const Boosting &)=delete
 Disable copy.
 

Protected Member Functions

virtual bool EvalAndCheckEarlyStopping ()
 Print eval result and check early stopping.
 
void ResetBaggingConfig (const Config *config, bool is_change_dataset)
 reset config for bagging
 
virtual void Bagging (int iter)
 Implement bagging logic.
 
data_size_t BaggingHelper (Random &cur_rand, data_size_t start, data_size_t cnt, data_size_t *buffer)
 Helper function for bagging, used for multi-threading optimization.
 
virtual void Boosting ()
 Calculate the objective function.
 
virtual void UpdateScore (const Tree *tree, const int cur_tree_id)
 updating score after tree was trained
 
virtual std::vector< double > EvalOneMetric (const Metric *metric, const double *score) const
 eval results for one metric
 
std::string OutputMetric (int iter)
 Print metric result of current iteration.
 
double BoostFromAverage (int class_id, bool update_scorer)
 

Protected Attributes

int iter_
 current iteration
 
const Dataset * train_data_
 Pointer to training data.
 
std::unique_ptr< Config > config_
 Config of gbdt.
 
std::unique_ptr< TreeLearner > tree_learner_
 Tree learner, will use this class to learn trees.
 
const ObjectiveFunction * objective_function_
 Objective function.
 
std::unique_ptr< ScoreUpdater > train_score_updater_
 Store and update training data's score.
 
std::vector< const Metric * > training_metrics_
 Metrics for training data.
 
std::vector< std::unique_ptr< ScoreUpdater > > valid_score_updater_
 Store and update validation data's scores.
 
std::vector< std::vector< const Metric * > > valid_metrics_
 Metric for validation data.
 
int early_stopping_round_
 Number of rounds for early stopping.
 
std::vector< std::vector< int > > best_iter_
 Best iteration(s) for early stopping.
 
std::vector< std::vector< double > > best_score_
 Best score(s) for early stopping.
 
std::vector< std::vector< std::string > > best_msg_
 output message of best iteration
 
std::vector< std::unique_ptr< Tree > > models_
 Trained models(trees)
 
int max_feature_idx_
 Max feature index of training data.
 
std::vector< score_t > gradients_
 First-order derivative of training data.
 
std::vector< score_t > hessians_
 Second-order derivative of training data.
 
std::vector< data_size_t > bag_data_indices_
 Store the indices of in-bag data.
 
data_size_t bag_data_cnt_
 Number of in-bag data.
 
std::vector< data_size_t > tmp_indices_
 Store the indices of in-bag data.
 
data_size_t num_data_
 Number of training data.
 
int num_tree_per_iteration_
 Number of trees per iteration.
 
int num_class_
 Number of classes.
 
data_size_t label_idx_
 Index of label column.
 
int num_iteration_for_pred_
 number of used model
 
double shrinkage_rate_
 Shrinkage rate for one iteration.
 
int num_init_iteration_
 Number of loaded initial models.
 
std::vector< std::string > feature_names_
 Feature names.
 
std::vector< std::string > feature_infos_
 
int num_threads_
 number of threads
 
std::vector< data_size_t > offsets_buf_
 Buffer for multi-threading bagging.
 
std::vector< data_size_t > left_cnts_buf_
 Buffer for multi-threading bagging.
 
std::vector< data_size_t > right_cnts_buf_
 Buffer for multi-threading bagging.
 
std::vector< data_size_t > left_write_pos_buf_
 Buffer for multi-threading bagging.
 
std::vector< data_size_t > right_write_pos_buf_
 Buffer for multi-threading bagging.
 
std::unique_ptr< Dataset > tmp_subset_
 
bool is_use_subset_
 
std::vector< bool > class_need_train_
 
bool is_constant_hessian_
 
std::unique_ptr< ObjectiveFunction > loaded_objective_
 
bool average_output_
 
bool need_re_bagging_
 
std::string loaded_parameter_
 
Json forced_splits_json_
 

Additional Inherited Members

Static Public Member Functions inherited from LightGBM::Boosting
static bool LoadFileToBoosting (Boosting *boosting, const char *filename)
 
static Boosting * CreateBoosting (const std::string &type, const char *filename)
 Create boosting object.
 

Detailed Description

GBDT algorithm implementation, including training, prediction, and bagging.

Member Function Documentation

◆ AddValidDataset()

void LightGBM::GBDT::AddValidDataset ( const Dataset *  valid_data,
const std::vector< const Metric * > &  valid_metrics 
)
override virtual

Add a validation dataset.

Parameters
valid_data: Validation dataset
valid_metrics: Metrics for validation dataset

Implements LightGBM::Boosting.

Reimplemented in LightGBM::RF.

◆ Bagging()

void LightGBM::GBDT::Bagging ( int  iter)
protected virtual

Implement bagging logic.

Parameters
iter: Current iteration

Reimplemented in LightGBM::GOSS.

◆ BaggingHelper()

data_size_t LightGBM::GBDT::BaggingHelper ( Random &  cur_rand,
data_size_t  start,
data_size_t  cnt,
data_size_t *  buffer 
)
protected

Helper function for bagging, used for multi-threading optimization.

Parameters
start: Start index of the bagging range
cnt: Number of records in the range
buffer: Output buffer
Returns
Size of the left part
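
As a rough illustration of what a helper with this shape could do, the sketch below partitions the indices in [start, start + cnt) into a left (in-bag) prefix and a right (out-of-bag) suffix of `buffer`, and returns the left count. It is not LightGBM's implementation: `std::mt19937` stands in for the `Random` class, and the `bagging_fraction` argument and the name `BaggingHelperSketch` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

using data_size_t = int32_t;  // assumption: a 32-bit index type

// Hypothetical stand-in for a bagging helper. Each index in
// [start, start + cnt) is kept with probability bagging_fraction.
// In-bag indices fill buffer from the front, out-of-bag ones from the back.
// Returns the number of in-bag indices (the "left" count).
data_size_t BaggingHelperSketch(std::mt19937& rng, double bagging_fraction,
                                data_size_t start, data_size_t cnt,
                                data_size_t* buffer) {
  assert(cnt >= 0 && bagging_fraction >= 0.0 && bagging_fraction <= 1.0);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  data_size_t left = 0;     // next free slot at the front
  data_size_t right = cnt;  // one past the next free slot at the back
  for (data_size_t i = 0; i < cnt; ++i) {
    if (uniform(rng) < bagging_fraction) {
      buffer[left++] = start + i;
    } else {
      buffer[--right] = start + i;
    }
  }
  return left;
}
```

Because every thread can run such a helper on its own slice with its own buffer, per-thread left and right counts are enough to merge the slices afterwards, which is presumably what the `left_cnts_buf_` and `right_cnts_buf_` attributes are for.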

◆ Boosting()

void LightGBM::GBDT::Boosting ( )
protected virtual

Calculate the objective function.

Reimplemented in LightGBM::RF.

◆ DumpModel()

std::string LightGBM::GBDT::DumpModel ( int  start_iteration,
int  num_iteration 
) const
override virtual

Dump model to a JSON-format string.

Parameters
start_iteration: Iteration from which the dump starts
num_iteration: Number of iterations to dump; -1 means dump all
Returns
JSON-format string of the model

Implements LightGBM::Boosting.

◆ EvalAndCheckEarlyStopping()

bool LightGBM::GBDT::EvalAndCheckEarlyStopping ( )
protected virtual

Print eval result and check early stopping.

Reimplemented in LightGBM::DART.
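
The early-stopping check can be pictured with the following minimal sketch. It tracks one metric on one dataset, whereas the class keeps `best_iter_`, `best_score_` and `best_msg_` per dataset and per metric. The struct `EarlyStopTrackerSketch` and its members are hypothetical names, and the stopping rule shown (no improvement for `early_stopping_round` consecutive iterations) is an assumption about the intended behaviour rather than a copy of the library code.

```cpp
#include <cassert>

// Hypothetical single-metric tracker; not a LightGBM type.
struct EarlyStopTrackerSketch {
  int early_stopping_round;  // patience, in iterations
  bool bigger_is_better;     // true for metrics such as AUC
  int best_iter;
  double best_score;
  bool has_best;

  EarlyStopTrackerSketch(int rounds, bool bigger)
      : early_stopping_round(rounds), bigger_is_better(bigger),
        best_iter(0), best_score(0.0), has_best(false) {}

  // Records the score of iteration iter (1-based). Returns true when the
  // score has not improved for early_stopping_round iterations.
  bool Update(int iter, double score) {
    assert(early_stopping_round > 0);
    const bool improved =
        !has_best ||
        (bigger_is_better ? score > best_score : score < best_score);
    if (improved) {
      best_score = score;
      best_iter = iter;
      has_best = true;
      return false;
    }
    return iter - best_iter >= early_stopping_round;
  }
};
```

When `Update` returns true, a trainer would typically roll the model back to `best_iter` and report the stored best message.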

◆ FeatureImportance()

std::vector< double > LightGBM::GBDT::FeatureImportance ( int  num_iteration,
int  importance_type 
) const
override virtual

Calculate feature importances.

Parameters
num_iteration: Number of models to use for feature importance; -1 means use all
importance_type: 0 for split, 1 for gain
Returns
Vector of feature importances

Implements LightGBM::Boosting.
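
The difference between the two `importance_type` values can be sketched as follows: type 0 counts how often each feature is used in a split, type 1 sums the gain of those splits. The flat `SplitRecord` list is a hypothetical simplification for the example. The real class walks its `models_` trees, whose layout is not shown on this page.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one internal split; not a LightGBM type.
struct SplitRecord {
  int feature;  // index of the feature used by the split
  double gain;  // gain achieved by the split
};

// importance_type 0 counts splits per feature, 1 sums split gains.
std::vector<double> FeatureImportanceSketch(
    const std::vector<SplitRecord>& splits, int num_features,
    int importance_type) {
  assert(importance_type == 0 || importance_type == 1);
  std::vector<double> importance(num_features, 0.0);
  for (const SplitRecord& s : splits) {
    assert(s.feature >= 0 && s.feature < num_features);
    importance[s.feature] += (importance_type == 0) ? 1.0 : s.gain;
  }
  return importance;
}
```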

◆ FeatureNames()

std::vector< std::string > LightGBM::GBDT::FeatureNames ( ) const
inline override virtual

Get feature names of this model.

Returns
Feature names of this model

Implements LightGBM::Boosting.

◆ GetCurrentIteration()

int LightGBM::GBDT::GetCurrentIteration ( ) const
inline override virtual

Get current iteration.

Implements LightGBM::Boosting.

◆ GetEvalAt()

std::vector< double > LightGBM::GBDT::GetEvalAt ( int  data_idx) const
override virtual

Get the evaluation result for the dataset at data_idx.

Parameters
data_idx: 0 for the training data, 1 for the 1st validation dataset
Returns
Evaluation result

Implements LightGBM::Boosting.

◆ GetLeafValue()

double LightGBM::GBDT::GetLeafValue ( int  tree_idx,
int  leaf_idx 
) const
inline override virtual

Implements LightGBM::GBDTBase.

◆ GetNumPredictAt()

virtual int64_t LightGBM::GBDT::GetNumPredictAt ( int  data_idx) const
inline override virtual

Get the size of the prediction for the dataset at data_idx.

Parameters
data_idx: 0 for the training data, 1 for the 1st validation dataset
Returns
The size of the prediction

Implements LightGBM::Boosting.

◆ GetPredictAt()

void LightGBM::GBDT::GetPredictAt ( int  data_idx,
double *  out_result,
int64_t *  out_len 
)
override virtual

Get the prediction result for the dataset at data_idx.

Parameters
data_idx: 0 for the training data, 1 for the 1st validation dataset
out_result: Buffer that receives the prediction result; the caller must allocate it before calling this function
out_len: Length of the returned scores

Implements LightGBM::Boosting.

◆ GetTrainingScore()

const double * LightGBM::GBDT::GetTrainingScore ( int64_t *  out_len)
override virtual

Get current training score.

Parameters
out_len: Length of the returned scores
Returns
Training score

Implements LightGBM::Boosting.

Reimplemented in LightGBM::DART.

◆ Init()

void LightGBM::GBDT::Init ( const Config *  gbdt_config,
const Dataset *  train_data,
const ObjectiveFunction *  objective_function,
const std::vector< const Metric * > &  training_metrics 
)
override virtual

Initialization logic.

Parameters
gbdt_config: Config for boosting
train_data: Training data
objective_function: Training objective function
training_metrics: Training metrics

Implements LightGBM::Boosting.

Reimplemented in LightGBM::GOSS, and LightGBM::RF.

◆ InitPredict()

void LightGBM::GBDT::InitPredict ( int  num_iteration,
bool  is_pred_contrib 
)
inline override virtual

Initial work for the prediction.

Parameters
num_iteration: Number of iterations to use
is_pred_contrib: True if predicting feature contributions

Implements LightGBM::Boosting.

◆ LabelIdx()

int LightGBM::GBDT::LabelIdx ( ) const
inline override virtual

Get index of label column.

Returns
index of label column

Implements LightGBM::Boosting.

◆ LoadModelFromString()

bool LightGBM::GBDT::LoadModelFromString ( const char *  buffer,
size_t  len 
)
override virtual

Implements LightGBM::Boosting.

◆ MaxFeatureIdx()

int LightGBM::GBDT::MaxFeatureIdx ( ) const
inline override virtual

Get max feature index of this model.

Returns
Max feature index of this model

Implements LightGBM::Boosting.

◆ MergeFrom()

void LightGBM::GBDT::MergeFrom ( const Boosting *  other)
inline override virtual

Merge the model from another boosting object. Its models are inserted at the front of the current boosting object.

Parameters
other: Boosting object to merge from

Implements LightGBM::Boosting.

◆ ModelToIfElse()

std::string LightGBM::GBDT::ModelToIfElse ( int  num_iteration) const
override virtual

Translate model to if-else statement.

Parameters
num_iteration: Number of iterations to translate; -1 means translate all
Returns
If-else format code of the model

Implements LightGBM::Boosting.
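
To give a feel for what "if-else format" means, the sketch below turns one small decision tree into a nested if-else string. The `NodeSketch` layout, the `arr[...]` input name and the `<=` split convention are assumptions for illustration. The code LightGBM actually emits is more elaborate and is not reproduced here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical flat tree node; not LightGBM's Tree class.
struct NodeSketch {
  bool is_leaf;
  int feature;       // split feature (internal nodes only)
  double threshold;  // go left when the feature value <= threshold
  int left, right;   // child indices into the node vector
  double value;      // output value (leaf nodes only)
};

// Recursively writes the subtree rooted at idx as nested if-else code.
void EmitIfElse(const std::vector<NodeSketch>& nodes, int idx,
                std::ostringstream* out) {
  assert(idx >= 0 && idx < static_cast<int>(nodes.size()));
  const NodeSketch& n = nodes[idx];
  if (n.is_leaf) {
    *out << "return " << n.value << ";";
    return;
  }
  *out << "if (arr[" << n.feature << "] <= " << n.threshold << ") { ";
  EmitIfElse(nodes, n.left, out);
  *out << " } else { ";
  EmitIfElse(nodes, n.right, out);
  *out << " }";
}

std::string TreeToIfElseSketch(const std::vector<NodeSketch>& nodes) {
  std::ostringstream out;
  EmitIfElse(nodes, 0, &out);  // node 0 is assumed to be the root
  return out.str();
}
```

For a root that splits feature 0 at 1.5 with leaves -0.25 and 0.75, this yields `if (arr[0] <= 1.5) { return -0.25; } else { return 0.75; }`.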

◆ NeedAccuratePrediction()

bool LightGBM::GBDT::NeedAccuratePrediction ( ) const
inline override virtual

Check whether early stopping can be used for prediction.

Returns
True if early stopping cannot be used for prediction

Implements LightGBM::Boosting.

Reimplemented in LightGBM::RF.

◆ NumberOfClasses()

int LightGBM::GBDT::NumberOfClasses ( ) const
inline override virtual

Get number of classes.

Returns
Number of classes

Implements LightGBM::Boosting.

◆ NumberOfTotalModel()

int LightGBM::GBDT::NumberOfTotalModel ( ) const
inline override virtual

Get number of weak sub-models.

Returns
Number of weak sub-models

Implements LightGBM::Boosting.

◆ NumModelPerIteration()

int LightGBM::GBDT::NumModelPerIteration ( ) const
inline override virtual

Get number of trees per iteration.

Returns
Number of trees per iteration

Implements LightGBM::Boosting.

◆ NumPredictOneRow()

int LightGBM::GBDT::NumPredictOneRow ( int  num_iteration,
bool  is_pred_leaf,
bool  is_pred_contrib 
) const
inline override virtual

Get the number of predictions for one record.

Parameters
num_iteration: Number of iterations to use
is_pred_leaf: True if predicting leaf index
is_pred_contrib: True if predicting feature contribution
Returns
Number of predictions

Implements LightGBM::Boosting.
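
The sizing rule is not spelled out on this page. The sketch below shows one plausible rule, written as a free function so that it is self-contained: one value per class for a normal prediction, one value per tree for leaf-index prediction, and one value per feature plus a bias term for contribution prediction. All of this is an assumption to be checked against `gbdt.h`, and `NumPredictOneRowSketch` and its extra arguments are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Assumed sizing rule for one record; verify against the actual header.
int NumPredictOneRowSketch(int num_class, int num_tree_per_iteration,
                           int max_feature_idx, int current_iteration,
                           int num_iteration, bool is_pred_leaf,
                           bool is_pred_contrib) {
  assert(num_class > 0 && num_tree_per_iteration > 0);
  if (is_pred_leaf) {
    // One leaf index per class and per used iteration.
    int used = current_iteration;
    if (num_iteration > 0) {
      used = std::min(current_iteration, num_iteration);
    }
    return num_class * used;
  }
  if (is_pred_contrib) {
    // max_feature_idx is 0-based: +1 gives the feature count,
    // and one more slot holds the bias (expected value) term.
    return num_tree_per_iteration * (max_feature_idx + 2);
  }
  return num_class;  // normal prediction: one score per class
}
```

A caller would use the returned count to size the `output` buffer passed to the prediction functions.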

◆ OutputMetric()

std::string LightGBM::GBDT::OutputMetric ( int  iter)
protected

Print metric result of current iteration.

Parameters
iter: Current iteration
Returns
best_msg if the early-stopping condition is met

◆ Predict()

void LightGBM::GBDT::Predict ( const double *  features,
double *  output,
const PredictionEarlyStopInstance *  early_stop 
) const
override virtual

Prediction for one record, sigmoid transformation will be used if needed.

Parameters
features: Feature values of this record
output: Prediction result for this record
early_stop: Early stopping instance. If nullptr, no early stopping is applied and all models are evaluated.

Implements LightGBM::Boosting.
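
The relationship between a raw score and its sigmoid-transformed value can be sketched as below for a binary objective. The `scale` parameter and the function name are assumptions for the example. Which transformation is applied, if any, depends on the objective function the model was trained with.

```cpp
#include <cassert>
#include <cmath>

// Maps a raw score (sum of tree outputs) to a value in (0, 1).
// scale is a hypothetical steepness parameter of the sigmoid.
double SigmoidTransformSketch(double raw_score, double scale = 1.0) {
  assert(scale > 0.0);
  return 1.0 / (1.0 + std::exp(-scale * raw_score));
}
```

Under this assumption, a raw score of 0 maps to exactly 0.5, and raw scores of opposite sign map to probabilities that sum to 1.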

◆ PredictByMap()

void LightGBM::GBDT::PredictByMap ( const std::unordered_map< int, double > &  features,
double *  output,
const PredictionEarlyStopInstance *  early_stop 
) const
override virtual

Implements LightGBM::Boosting.

◆ PredictContrib()

void LightGBM::GBDT::PredictContrib ( const double *  features,
double *  output,
const PredictionEarlyStopInstance *  early_stop 
) const
override virtual

Feature contributions for the model's prediction of one record.

Parameters
features: Feature values of this record
output: Prediction result for this record
early_stop: Early stopping instance. If nullptr, no early stopping is applied and all models are evaluated.

Implements LightGBM::Boosting.

◆ PredictLeafIndex()

void LightGBM::GBDT::PredictLeafIndex ( const double *  features,
double *  output 
) const
override virtual

Prediction for one record with leaf index.

Parameters
features: Feature values of this record
output: Prediction result for this record

Implements LightGBM::Boosting.

◆ PredictLeafIndexByMap()

void LightGBM::GBDT::PredictLeafIndexByMap ( const std::unordered_map< int, double > &  features,
double *  output 
) const
override virtual

Implements LightGBM::Boosting.

◆ PredictRaw()

void LightGBM::GBDT::PredictRaw ( const double *  features,
double *  output,
const PredictionEarlyStopInstance *  early_stop 
) const
override virtual

Prediction for one record, without the sigmoid transformation.

Parameters
features: Feature values of this record
output: Prediction result for this record
early_stop: Early stopping instance. If nullptr, no early stopping is applied and all models are evaluated.

Implements LightGBM::Boosting.

◆ PredictRawByMap()

void LightGBM::GBDT::PredictRawByMap ( const std::unordered_map< int, double > &  features,
double *  output,
const PredictionEarlyStopInstance *  early_stop 
) const
override virtual

Implements LightGBM::Boosting.

◆ RefitTree()

void LightGBM::GBDT::RefitTree ( const std::vector< std::vector< int > > &  tree_leaf_prediction)
override virtual

Update the tree output by new training data.

Implements LightGBM::Boosting.

◆ ResetConfig()

void LightGBM::GBDT::ResetConfig ( const Config *  gbdt_config)
override virtual

Reset Boosting Config.

Parameters
gbdt_config: Config for boosting

Implements LightGBM::Boosting.

Reimplemented in LightGBM::GOSS, and LightGBM::RF.

◆ ResetTrainingData()

void LightGBM::GBDT::ResetTrainingData ( const Dataset *  train_data,
const ObjectiveFunction *  objective_function,
const std::vector< const Metric * > &  training_metrics 
)
override virtual

Reset the training data.

Parameters
train_data: New training data
objective_function: Training objective function
training_metrics: Training metrics

Implements LightGBM::Boosting.

Reimplemented in LightGBM::GOSS, and LightGBM::RF.

◆ RollbackOneIter()

void LightGBM::GBDT::RollbackOneIter ( )
override virtual

Rollback one iteration.

Implements LightGBM::Boosting.

Reimplemented in LightGBM::RF.

◆ SaveModelToFile()

bool LightGBM::GBDT::SaveModelToFile ( int  start_iteration,
int  num_iterations,
const char *  filename 
) const
override virtual

Save model to file.

Parameters
start_iteration: Iteration from which saving starts
num_iterations: Number of models to save; -1 means save all
filename: Name of the file to save to
Returns
is_finish: Is training finished or not

File to write models

Implements LightGBM::Boosting.
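
How `start_iteration` and `num_iterations` could select a slice of the stored trees is sketched below. It assumes trees are stored iteration by iteration, with a fixed number of trees per iteration, and that a non-positive `num_iterations` (such as the documented -1) means "through the last iteration". The function name and the clamping behaviour are assumptions, not the library's code.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Returns the half-open tree index range [first, last) covered by the
// requested iterations, under the storage assumptions stated above.
std::pair<int, int> TreeRangeSketch(int total_trees, int num_tree_per_iteration,
                                    int start_iteration, int num_iterations) {
  assert(num_tree_per_iteration > 0 && total_trees >= 0);
  const int total_iterations = total_trees / num_tree_per_iteration;
  // Clamp the start into [0, total_iterations].
  const int first_iter =
      std::max(0, std::min(start_iteration, total_iterations));
  int last_iter = total_iterations;
  if (num_iterations > 0) {
    last_iter = std::min(first_iter + num_iterations, total_iterations);
  }
  return {first_iter * num_tree_per_iteration,
          last_iter * num_tree_per_iteration};
}
```

With 30 trees and 3 trees per iteration, starting at iteration 2 and saving 4 iterations would cover trees 6 through 17.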

◆ SaveModelToIfElse()

bool LightGBM::GBDT::SaveModelToIfElse ( int  num_iteration,
const char *  filename 
) const
override virtual

Translate model to if-else statement.

Parameters
num_iteration: Number of iterations to translate; -1 means translate all
filename: Name of the file to save to
Returns
is_finish: Is training finished or not

File to write models

Implements LightGBM::Boosting.

◆ SaveModelToString() [1/2]

std::string LightGBM::GBDT::SaveModelToString ( int  num_iterations)
inline

Save model to string.

Parameters
start_iteration: Iteration from which saving starts
num_iterations: Number of models to save; -1 means save all
Returns
Non-empty string if succeeded

◆ SaveModelToString() [2/2]

std::string LightGBM::GBDT::SaveModelToString ( int  start_iteration,
int  num_iterations 
) const
override virtual

Implements LightGBM::Boosting.

◆ SetLeafValue()

void LightGBM::GBDT::SetLeafValue ( int  tree_idx,
int  leaf_idx,
double  val 
)
inline override virtual

Implements LightGBM::GBDTBase.

◆ ShuffleModels()

void LightGBM::GBDT::ShuffleModels ( int  start_iter,
int  end_iter 
)
inline override virtual

Shuffle Existing Models.

Implements LightGBM::Boosting.

◆ SubModelName()

virtual const char * LightGBM::GBDT::SubModelName ( ) const
inline override virtual

Get Type name of this boosting object.

Implements LightGBM::Boosting.

◆ Train()

void LightGBM::GBDT::Train ( int  snapshot_freq,
const std::string &  model_output_path 
)
override virtual

Perform a full training procedure.

Parameters
snapshot_freq: Frequency of snapshots
model_output_path: Path of the model file

Implements LightGBM::Boosting.

◆ TrainOneIter()

bool LightGBM::GBDT::TrainOneIter ( const score_t *  gradients,
const score_t *  hessians 
)
override virtual

Training logic.

Parameters
gradients: nullptr to use the default objective; otherwise user-supplied gradients for self-defined boosting
hessians: nullptr to use the default objective; otherwise user-supplied hessians for self-defined boosting
Returns
True if training cannot continue

Implements LightGBM::Boosting.

Reimplemented in LightGBM::DART, and LightGBM::RF.
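
When gradients and hessians are supplied by the caller, they are the first and second derivatives of the loss with respect to the current score of each training record. The sketch below computes them for squared-error loss, L = 0.5 * (score - label)^2, as an example. The `score_t` alias, the function name and the use of `std::vector` are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using score_t = float;  // assumption: score_t is a floating-point type

// Fills gradients and hessians for squared-error loss:
//   gradient = score - label, hessian = 1.
void SquaredErrorDerivatives(const std::vector<double>& scores,
                             const std::vector<float>& labels,
                             std::vector<score_t>* gradients,
                             std::vector<score_t>* hessians) {
  assert(scores.size() == labels.size());
  gradients->resize(scores.size());
  hessians->resize(scores.size());
  for (std::size_t i = 0; i < scores.size(); ++i) {
    (*gradients)[i] = static_cast<score_t>(scores[i] - labels[i]);
    (*hessians)[i] = 1.0f;
  }
}
```

The `data()` pointers of the two vectors are what a caller would then pass as `gradients` and `hessians`, with the current scores typically obtained from GetTrainingScore().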

◆ UpdateScore()

void LightGBM::GBDT::UpdateScore ( const Tree *  tree,
const int  cur_tree_id 
)
protected virtual

Update the score after a tree has been trained.

Parameters
tree: Trained tree of this iteration
cur_tree_id: Current tree for multiclass training
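
Conceptually, the score update adds the new tree's output for every record to the running score of the class that tree belongs to. The sketch below assumes scores are stored class by class in one flat array and that shrinkage is applied at update time. Both are assumptions for illustration, and in the library the work is delegated to the `ScoreUpdater` objects listed under Protected Attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds one tree's per-record output to the score block of class cur_tree_id.
// Assumed layout: scores[cur_tree_id * num_data + i] for record i.
void UpdateScoreSketch(const std::vector<double>& tree_output,
                       int cur_tree_id, double shrinkage_rate,
                       std::vector<double>* scores) {
  assert(cur_tree_id >= 0);
  const std::size_t num_data = tree_output.size();
  const std::size_t offset = static_cast<std::size_t>(cur_tree_id) * num_data;
  assert(offset + num_data <= scores->size());
  for (std::size_t i = 0; i < num_data; ++i) {
    (*scores)[offset + i] += shrinkage_rate * tree_output[i];
  }
}
```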

The documentation for this class was generated from the following files: