Medial Code Documentation
LightGBM::Bin Class Reference (abstract)

Interface for bin data. This class stores the bin data for one feature. Unlike OrderedBin, it stores the data in its original order. This may cause cache misses when constructing a histogram, but it needs no re-ordering operation, so it is faster than OrderedBin for dense features.

#include <bin.h>

Inheritance diagram for LightGBM::Bin:
LightGBM::Dense4bitsBin LightGBM::DenseBin< VAL_T > LightGBM::SparseBin< VAL_T >

Public Member Functions

virtual ~Bin ()
 virtual destructor
 
virtual void Push (int tid, data_size_t idx, uint32_t value)=0
 Push one record; tid is the thread id.
 
virtual void CopySubset (const Bin *full_bin, const data_size_t *used_indices, data_size_t num_used_indices)=0
 
virtual BinIterator * GetIterator (uint32_t min_bin, uint32_t max_bin, uint32_t default_bin) const =0
 Get bin iterator of this bin for specific feature.
 
virtual void SaveBinaryToFile (const VirtualFileWriter *writer) const =0
 Save binary data to file.
 
virtual void LoadFromMemory (const void *memory, const std::vector< data_size_t > &local_used_indices)=0
 Load from memory.
 
virtual size_t SizesInByte () const =0
 Get sizes in byte of this object.
 
virtual data_size_t num_data () const =0
 Number of all data.
 
virtual void ReSize (data_size_t num_data)=0
 
virtual void ConstructHistogram (const data_size_t *data_indices, data_size_t num_data, const score_t *ordered_gradients, const score_t *ordered_hessians, HistogramBinEntry *out) const =0
 Construct the histogram of this feature. Note: ordered_gradients and ordered_hessians are used to improve the cache hit rate. The naive solution reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients and ordered_hessians are preprocessed so that they are re-ordered by data_indices: ordered_gradients[i] is the gradient of record data_indices[i] (and likewise for ordered_hessians).
 
virtual void ConstructHistogram (data_size_t num_data, const score_t *ordered_gradients, const score_t *ordered_hessians, HistogramBinEntry *out) const =0
 
virtual void ConstructHistogram (const data_size_t *data_indices, data_size_t num_data, const score_t *ordered_gradients, HistogramBinEntry *out) const =0
 Construct the histogram of this feature from gradients only (this overload takes no ordered_hessians). Note: ordered_gradients is used to improve the cache hit rate. The naive solution reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients is preprocessed so that it is re-ordered by data_indices: ordered_gradients[i] is the gradient of record data_indices[i].
 
virtual void ConstructHistogram (data_size_t num_data, const score_t *ordered_gradients, HistogramBinEntry *out) const =0
 
virtual data_size_t Split (uint32_t min_bin, uint32_t max_bin, uint32_t default_bin, MissingType missing_type, bool default_left, uint32_t threshold, data_size_t *data_indices, data_size_t num_data, data_size_t *lte_indices, data_size_t *gt_indices) const =0
 Split data according to threshold: a record whose bin <= threshold is put into the left side (lte_indices), otherwise into the right side (gt_indices).
 
virtual data_size_t SplitCategorical (uint32_t min_bin, uint32_t max_bin, uint32_t default_bin, const uint32_t *threshold, int num_threshold, data_size_t *data_indices, data_size_t num_data, data_size_t *lte_indices, data_size_t *gt_indices) const =0
 Split data according to threshold: a record whose bin <= threshold is put into the left side (lte_indices), otherwise into the right side (gt_indices).
 
virtual OrderedBin * CreateOrderedBin () const =0
 Create the ordered bin for this bin.
 
virtual void FinishLoad ()=0
 Call this after all feature data has been pushed; it gives the implementation a chance to reorganize its bin data.
 

Static Public Member Functions

static Bin * CreateBin (data_size_t num_data, int num_bin, double sparse_rate, bool is_enable_sparse, double sparse_threshold, bool *is_sparse)
 Create object for bin data of one feature, will call CreateDenseBin or CreateSparseBin according to "is_sparse".
 
static Bin * CreateDenseBin (data_size_t num_data, int num_bin)
 Create object for bin data of one feature, used for dense feature.
 
static Bin * CreateSparseBin (data_size_t num_data, int num_bin)
 Create object for bin data of one feature, used for sparse feature.
 

Detailed Description

Interface for bin data. This class stores the bin data for one feature. Unlike OrderedBin, it stores the data in its original order. This may cause cache misses when constructing a histogram, but it needs no re-ordering operation, so it is faster than OrderedBin for dense features.

Member Function Documentation

◆ ConstructHistogram() [1/2]

virtual void LightGBM::Bin::ConstructHistogram ( const data_size_t *  data_indices,
data_size_t  num_data,
const score_t *  ordered_gradients,
const score_t *  ordered_hessians,
HistogramBinEntry *  out 
) const
pure virtual

Construct the histogram of this feature. Note: ordered_gradients and ordered_hessians are used to improve the cache hit rate. The naive solution reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients and ordered_hessians are preprocessed so that they are re-ordered by data_indices: ordered_gradients[i] is the gradient of record data_indices[i] (and likewise for ordered_hessians).

Parameters
data_indices  Used data indices in the current leaf
num_data  Number of used data
ordered_gradients  Pointer to gradients; the gradient of record data_indices[i] is ordered_gradients[i]
ordered_hessians  Pointer to hessians; the hessian of record data_indices[i] is ordered_hessians[i]
out  Output result

Implemented in LightGBM::SparseBin< VAL_T >, LightGBM::DenseBin< VAL_T >, and LightGBM::Dense4bitsBin.
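
The access pattern described above can be illustrated with a minimal sketch. It is not the LightGBM implementation: HistEntry, Reorder and ConstructHistogramSketch are hypothetical names, the bin storage is assumed to be a plain byte vector in original record order, and the field layout of HistEntry is an assumption rather than the real HistogramBinEntry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the types named in this reference.
using data_size_t = int32_t;
using score_t = float;

struct HistEntry {  // assumed layout, not the real HistogramBinEntry
  double sum_gradients = 0.0;
  double sum_hessians = 0.0;
  data_size_t cnt = 0;
};

// Gather step done once per leaf: ordered[i] = full[data_indices[i]].
std::vector<score_t> Reorder(const std::vector<score_t>& full,
                             const std::vector<data_size_t>& data_indices) {
  std::vector<score_t> ordered(data_indices.size());
  for (std::size_t i = 0; i < data_indices.size(); ++i) {
    ordered[i] = full[data_indices[i]];
  }
  return ordered;
}

// Accumulate a histogram over the records listed in data_indices.
// The bins are stored in original order, so bins[data_indices[i]] is still
// an indexed read, but gradients and hessians are read sequentially.
void ConstructHistogramSketch(const std::vector<uint8_t>& bins,
                              const data_size_t* data_indices,
                              data_size_t num_data,
                              const score_t* ordered_gradients,
                              const score_t* ordered_hessians,
                              HistEntry* out) {
  assert(out != nullptr);
  for (data_size_t i = 0; i < num_data; ++i) {
    const uint8_t bin = bins[data_indices[i]];
    out[bin].sum_gradients += ordered_gradients[i];
    out[bin].sum_hessians += ordered_hessians[i];
    ++out[bin].cnt;
  }
}
```

In this sketch the caller runs Reorder once for the gradients and once for the hessians of a leaf, then reuses the two ordered arrays for every feature, so the gather cost is paid once rather than once per feature.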

◆ ConstructHistogram() [2/2]

virtual void LightGBM::Bin::ConstructHistogram ( const data_size_t *  data_indices,
data_size_t  num_data,
const score_t *  ordered_gradients,
HistogramBinEntry *  out 
) const
pure virtual

Construct the histogram of this feature from gradients only (this overload takes no ordered_hessians). Note: ordered_gradients is used to improve the cache hit rate. The naive solution reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients is preprocessed so that it is re-ordered by data_indices: ordered_gradients[i] is the gradient of record data_indices[i].

Parameters
data_indices  Used data indices in the current leaf
num_data  Number of used data
ordered_gradients  Pointer to gradients; the gradient of record data_indices[i] is ordered_gradients[i]
out  Output result

Implemented in LightGBM::SparseBin< VAL_T >, LightGBM::DenseBin< VAL_T >, and LightGBM::Dense4bitsBin.

◆ CreateBin()

Bin * LightGBM::Bin::CreateBin ( data_size_t  num_data,
int  num_bin,
double  sparse_rate,
bool  is_enable_sparse,
double  sparse_threshold,
bool *  is_sparse 
)
static

Create object for bin data of one feature, will call CreateDenseBin or CreateSparseBin according to "is_sparse".

Parameters
num_data  Total number of data
num_bin  Number of bins
sparse_rate  Sparse rate of this bin (num_bin0/num_data)
is_enable_sparse  True if sparse features are enabled
sparse_threshold  Threshold for treating a feature as a sparse feature
is_sparse  Will be set to true if this bin is sparse
default_bin  Default bin for zero values
Returns
The bin data object
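
A minimal sketch of how such a factory could choose between the two storage kinds. The decision rule used here (sparse only when sparse features are enabled and sparse_rate reaches sparse_threshold) is an assumption for illustration, not something this page states; BinKind and ChooseBinKind are hypothetical names.

```cpp
#include <cassert>

// Hypothetical tag standing in for "which Create*Bin function to call".
enum class BinKind { kDense, kSparse };

// Assumed rule: pick the sparse representation only when it is enabled
// and the feature is sparse enough. Reports the choice through is_sparse,
// mirroring the output parameter documented above.
BinKind ChooseBinKind(double sparse_rate, bool is_enable_sparse,
                      double sparse_threshold, bool* is_sparse) {
  assert(is_sparse != nullptr);
  *is_sparse = is_enable_sparse && sparse_rate >= sparse_threshold;
  return *is_sparse ? BinKind::kSparse : BinKind::kDense;
}
```

Under this assumption a feature with sparse_rate 0.9 and sparse_threshold 0.8 gets the sparse kind only if is_enable_sparse is true; otherwise the dense kind is chosen and is_sparse is set to false.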

◆ CreateDenseBin()

Bin * LightGBM::Bin::CreateDenseBin ( data_size_t  num_data,
int  num_bin 
)
static

Create object for bin data of one feature, used for dense feature.

Parameters
num_data  Total number of data
num_bin  Number of bins
Returns
The bin data object

◆ CreateOrderedBin()

virtual OrderedBin * LightGBM::Bin::CreateOrderedBin ( ) const
pure virtual

Create the ordered bin for this bin.

Returns
Pointer to ordered bin

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ CreateSparseBin()

Bin * LightGBM::Bin::CreateSparseBin ( data_size_t  num_data,
int  num_bin 
)
static

Create object for bin data of one feature, used for sparse feature.

Parameters
num_data  Total number of data
num_bin  Number of bins
Returns
The bin data object

◆ FinishLoad()

virtual void LightGBM::Bin::FinishLoad ( )
pure virtual

Call this after all feature data has been pushed; it gives the implementation a chance to reorganize its bin data.

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.
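
The Push and FinishLoad pair suggests a two-phase loading protocol. The toy class below is a sketch of one way such a protocol could work, not the behaviour of any LightGBM implementation: ToySparseStore is a hypothetical name, and the ideas that each thread id owns its own buffer and that zero bins are skipped are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using data_size_t = int32_t;  // stand-in for the LightGBM typedef

// Toy store: each thread appends to its own buffer in Push, and
// FinishLoad merges the buffers into one list sorted by record index.
class ToySparseStore {
 public:
  explicit ToySparseStore(int num_threads) : buffers_(num_threads) {}

  // Record (idx, value) in the buffer owned by thread tid.
  // Zero bins are skipped (assumed to be the implicit default).
  void Push(int tid, data_size_t idx, uint32_t value) {
    assert(tid >= 0 && tid < static_cast<int>(buffers_.size()));
    if (value != 0) buffers_[tid].emplace_back(idx, value);
  }

  // Merge all per-thread buffers and sort the result by record index.
  void FinishLoad() {
    for (auto& buf : buffers_) {
      merged_.insert(merged_.end(), buf.begin(), buf.end());
      buf.clear();
    }
    std::sort(merged_.begin(), merged_.end());
  }

  const std::vector<std::pair<data_size_t, uint32_t>>& entries() const {
    return merged_;
  }

 private:
  std::vector<std::vector<std::pair<data_size_t, uint32_t>>> buffers_;
  std::vector<std::pair<data_size_t, uint32_t>> merged_;
};
```

With separate buffers, concurrent Push calls from different threads never touch the same container, and the single-threaded FinishLoad is the only place where the data is combined.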

◆ GetIterator()

virtual BinIterator * LightGBM::Bin::GetIterator ( uint32_t  min_bin,
uint32_t  max_bin,
uint32_t  default_bin 
) const
pure virtual

Get bin iterator of this bin for specific feature.

Parameters
min_bin  min_bin of the currently used feature
max_bin  max_bin of the currently used feature
default_bin  Default bin if the bin is not in [min_bin, max_bin]
Returns
Iterator of this bin

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ LoadFromMemory()

virtual void LightGBM::Bin::LoadFromMemory ( const void *  memory,
const std::vector< data_size_t > &  local_used_indices 
)
pure virtual

Load from memory.

Parameters
memory
local_used_indices

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ num_data()

virtual data_size_t LightGBM::Bin::num_data ( ) const
pure virtual

Number of all data.

◆ Push()

virtual void LightGBM::Bin::Push ( int  tid,
data_size_t  idx,
uint32_t  value 
)
pure virtual

Push one record.

Parameters
tid  Thread id
idx  Index of the record
value  Bin value of the record

Implemented in LightGBM::SparseBin< VAL_T >, LightGBM::DenseBin< VAL_T >, and LightGBM::Dense4bitsBin.

◆ SaveBinaryToFile()

virtual void LightGBM::Bin::SaveBinaryToFile ( const VirtualFileWriter *  writer) const
pure virtual

Save binary data to file.

Parameters
writer  Writer for the file to write to

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ SizesInByte()

virtual size_t LightGBM::Bin::SizesInByte ( ) const
pure virtual

Get sizes in byte of this object.

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ Split()

virtual data_size_t LightGBM::Bin::Split ( uint32_t  min_bin,
uint32_t  max_bin,
uint32_t  default_bin,
MissingType  missing_type,
bool  default_left,
uint32_t  threshold,
data_size_t *  data_indices,
data_size_t  num_data,
data_size_t *  lte_indices,
data_size_t *  gt_indices 
) const
pure virtual

Split data according to threshold: a record whose bin <= threshold is put into the left side (lte_indices), otherwise into the right side (gt_indices).

Parameters
min_bin  min_bin of the currently used feature
max_bin  max_bin of the currently used feature
default_bin  Default bin if the bin is not in [min_bin, max_bin]
missing_type  Missing type
default_left  If true, the missing bin goes to the left child
threshold  The split threshold
data_indices  Used data indices. After this function is called, the less-than-or-equal data indices are stored in this array.
num_data  Number of used data
lte_indices  After this function is called, the less-than-or-equal data indices are stored in this array.
gt_indices  After this function is called, the greater-than data indices are stored in this array.
Returns
The number of less-than-or-equal data.

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.
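
The threshold rule above amounts to a stable partition of the used indices. The sketch below shows only that core rule on a plain vector of bins; SplitSketch is a hypothetical name, and the handling of min_bin, max_bin, default_bin, missing_type and default_left is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using data_size_t = int32_t;  // stand-in for the LightGBM typedef

// Partition data_indices by comparing each record's bin with threshold.
// Records with bin <= threshold go to lte_indices, the rest to gt_indices.
// Both output arrays must have room for num_data entries.
// Returns the number of entries written to lte_indices.
data_size_t SplitSketch(const std::vector<uint32_t>& bins, uint32_t threshold,
                        const data_size_t* data_indices, data_size_t num_data,
                        data_size_t* lte_indices, data_size_t* gt_indices) {
  assert(lte_indices != nullptr && gt_indices != nullptr);
  data_size_t lte_count = 0;
  data_size_t gt_count = 0;
  for (data_size_t i = 0; i < num_data; ++i) {
    const data_size_t idx = data_indices[i];
    if (bins[idx] <= threshold) {
      lte_indices[lte_count++] = idx;
    } else {
      gt_indices[gt_count++] = idx;
    }
  }
  return lte_count;
}
```

The number of entries written to gt_indices is num_data minus the return value, so a caller does not need a second output count.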

◆ SplitCategorical()

virtual data_size_t LightGBM::Bin::SplitCategorical ( uint32_t  min_bin,
uint32_t  max_bin,
uint32_t  default_bin,
const uint32_t *  threshold,
int  num_threshold,
data_size_t *  data_indices,
data_size_t  num_data,
data_size_t *  lte_indices,
data_size_t *  gt_indices 
) const
pure virtual

Split data according to threshold: a record whose bin <= threshold is put into the left side (lte_indices), otherwise into the right side (gt_indices).

Parameters
min_bin  min_bin of the currently used feature
max_bin  max_bin of the currently used feature
default_bin  Default bin if the bin is not in [min_bin, max_bin]
threshold  The split threshold
num_threshold  Number of thresholds
data_indices  Used data indices. After this function is called, the less-than-or-equal data indices are stored in this array.
num_data  Number of used data
lte_indices  After this function is called, the less-than-or-equal data indices are stored in this array.
gt_indices  After this function is called, the greater-than data indices are stored in this array.
Returns
The number of less-than-or-equal data.

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.


The documentation for this class was generated from the following files: