Medial Code Documentation
LightGBM::SparseBin< VAL_T > Class Template Reference
Inheritance diagram for LightGBM::SparseBin< VAL_T >:
LightGBM::Bin

Public Member Functions

 SparseBin (data_size_t num_data)
 
void ReSize (data_size_t num_data) override
 
void Push (int tid, data_size_t idx, uint32_t value) override
 Push one record; tid is the thread id.
 
BinIterator * GetIterator (uint32_t min_bin, uint32_t max_bin, uint32_t default_bin) const override
 Get bin iterator of this bin for specific feature.
 
void ConstructHistogram (const data_size_t *, data_size_t, const score_t *, const score_t *, HistogramBinEntry *) const override
 Construct the histogram of this feature. Note: ordered_gradients and ordered_hessians are used to improve the cache hit rate. The naive approach reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients and ordered_hessians are preprocessed so that they are reordered by data_indices: ordered_gradients[i] holds the gradient of record data_indices[i] (and likewise for ordered_hessians).
 
void ConstructHistogram (data_size_t, const score_t *, const score_t *, HistogramBinEntry *) const override
 
void ConstructHistogram (const data_size_t *, data_size_t, const score_t *, HistogramBinEntry *) const override
 Construct the histogram of this feature from gradients only. Note: ordered_gradients is used to improve the cache hit rate. The naive approach reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients is preprocessed so that it is reordered by data_indices: ordered_gradients[i] holds the gradient of record data_indices[i].
 
void ConstructHistogram (data_size_t, const score_t *, HistogramBinEntry *) const override
 
bool NextNonzero (data_size_t *i_delta, data_size_t *cur_pos) const
 
virtual data_size_t Split (uint32_t min_bin, uint32_t max_bin, uint32_t default_bin, MissingType missing_type, bool default_left, uint32_t threshold, data_size_t *data_indices, data_size_t num_data, data_size_t *lte_indices, data_size_t *gt_indices) const override
 Split data according to the threshold: records with bin <= threshold are put into the left side (lte_indices), all others into the right side (gt_indices).
 
virtual data_size_t SplitCategorical (uint32_t min_bin, uint32_t max_bin, uint32_t default_bin, const uint32_t *threshold, int num_threshold, data_size_t *data_indices, data_size_t num_data, data_size_t *lte_indices, data_size_t *gt_indices) const override
 Split data according to the threshold: records with bin <= threshold are put into the left side (lte_indices), all others into the right side (gt_indices).
 
data_size_t num_data () const override
 Number of all data.
 
OrderedBin * CreateOrderedBin () const override
 Create the ordered bin for this bin.
 
void FinishLoad () override
 Call this after all feature data has been pushed; it may reorganize the bin data into a better layout.
 
void LoadFromPair (const std::vector< std::pair< data_size_t, VAL_T > > &idx_val_pairs)
 
void GetFastIndex ()
 
void SaveBinaryToFile (const VirtualFileWriter *writer) const override
 Save binary data to file.
 
size_t SizesInByte () const override
 Get the size of this object in bytes.
 
void LoadFromMemory (const void *memory, const std::vector< data_size_t > &local_used_indices) override
 Load from memory.
 
void CopySubset (const Bin *full_bin, const data_size_t *used_indices, data_size_t num_used_indices) override
 
- Public Member Functions inherited from LightGBM::Bin
virtual ~Bin ()
 virtual destructor
 

Protected Attributes

data_size_t num_data_
 
std::vector< uint8_t > deltas_
 
std::vector< VAL_T > vals_
 
data_size_t num_vals_
 
std::vector< std::vector< std::pair< data_size_t, VAL_T > > > push_buffers_
 
std::vector< std::pair< data_size_t, data_size_t > > fast_index_
 
data_size_t fast_index_shift_
 

Friends

class SparseBinIterator< VAL_T >
 
class OrderedSparseBin< VAL_T >
 

Additional Inherited Members

- Static Public Member Functions inherited from LightGBM::Bin
static Bin * CreateBin (data_size_t num_data, int num_bin, double sparse_rate, bool is_enable_sparse, double sparse_threshold, bool *is_sparse)
 Create the object holding bin data for one feature; calls CreateDenseBin or CreateSparseBin according to "is_sparse".
 
static Bin * CreateDenseBin (data_size_t num_data, int num_bin)
 Create the object holding bin data for one feature; used for dense features.
 
static Bin * CreateSparseBin (data_size_t num_data, int num_bin)
 Create the object holding bin data for one feature; used for sparse features.
 

Member Function Documentation

◆ ConstructHistogram() [1/4]

template<typename VAL_T >
void LightGBM::SparseBin< VAL_T >::ConstructHistogram ( const data_size_t *  data_indices,
data_size_t  num_data,
const score_t *  ordered_gradients,
const score_t *  ordered_hessians,
HistogramBinEntry *  out 
) const
inline override virtual

Construct the histogram of this feature. Note: ordered_gradients and ordered_hessians are used to improve the cache hit rate. The naive approach reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients and ordered_hessians are preprocessed so that they are reordered by data_indices: ordered_gradients[i] holds the gradient of record data_indices[i] (and likewise for ordered_hessians).

Parameters
data_indices  Data indices used in the current leaf
num_data  Number of used data
ordered_gradients  Pointer to gradients; the gradient of record data_indices[i] is ordered_gradients[i]
ordered_hessians  Pointer to hessians; the hessian of record data_indices[i] is ordered_hessians[i]
out  Output result

Implements LightGBM::Bin.
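
The difference between the naive and the ordered access pattern described above can be illustrated with a small standalone sketch. It is not the LightGBM implementation: `HistEntry`, `HistogramNaive`, `HistogramOrdered` and the dense `bins` vector are hypothetical names introduced for illustration, and the typedefs for `data_size_t` and `score_t` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed stand-ins for the LightGBM typedefs; the real definitions may differ.
using data_size_t = std::int32_t;
using score_t = float;

// Hypothetical, simplified histogram entry: per-bin sums plus a record count.
struct HistEntry {
  double sum_gradients = 0.0;
  double sum_hessians = 0.0;
  data_size_t cnt = 0;
};

// Naive accumulation: gradients and hessians are indexed by record id,
// so consecutive reads may land far apart in memory.
void HistogramNaive(const std::vector<std::uint32_t>& bins,
                    const std::vector<data_size_t>& data_indices,
                    const std::vector<score_t>& gradients,
                    const std::vector<score_t>& hessians,
                    std::vector<HistEntry>* out) {
  for (data_size_t row : data_indices) {
    HistEntry& e = (*out)[bins[row]];
    e.sum_gradients += gradients[row];
    e.sum_hessians += hessians[row];
    ++e.cnt;
  }
}

// Ordered accumulation: ordered_gradients[i] belongs to record data_indices[i],
// so the gradient and hessian arrays are read sequentially.
void HistogramOrdered(const std::vector<std::uint32_t>& bins,
                      const std::vector<data_size_t>& data_indices,
                      const std::vector<score_t>& ordered_gradients,
                      const std::vector<score_t>& ordered_hessians,
                      std::vector<HistEntry>* out) {
  assert(ordered_gradients.size() == data_indices.size());
  assert(ordered_hessians.size() == data_indices.size());
  for (std::size_t i = 0; i < data_indices.size(); ++i) {
    HistEntry& e = (*out)[bins[data_indices[i]]];
    e.sum_gradients += ordered_gradients[i];
    e.sum_hessians += ordered_hessians[i];
    ++e.cnt;
  }
}
```

Both functions produce the same histogram; only the memory access pattern on the gradient and hessian arrays differs. The sketch uses a dense bin array for brevity, whereas SparseBin stores its values in a sparse layout.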

◆ ConstructHistogram() [2/4]

template<typename VAL_T >
void LightGBM::SparseBin< VAL_T >::ConstructHistogram ( const data_size_t *  data_indices,
data_size_t  num_data,
const score_t *  ordered_gradients,
HistogramBinEntry *  out 
) const
inline override virtual

Construct the histogram of this feature from gradients only. Note: ordered_gradients is used to improve the cache hit rate. The naive approach reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients is preprocessed so that it is reordered by data_indices: ordered_gradients[i] holds the gradient of record data_indices[i].

Parameters
data_indices  Data indices used in the current leaf
num_data  Number of used data
ordered_gradients  Pointer to gradients; the gradient of record data_indices[i] is ordered_gradients[i]
out  Output result

Implements LightGBM::Bin.

◆ ConstructHistogram() [3/4]

template<typename VAL_T >
void LightGBM::SparseBin< VAL_T >::ConstructHistogram ( data_size_t  ,
const score_t *  ,
const score_t *  ,
HistogramBinEntry *  
) const
inline override virtual

Implements LightGBM::Bin.

◆ ConstructHistogram() [4/4]

template<typename VAL_T >
void LightGBM::SparseBin< VAL_T >::ConstructHistogram ( data_size_t  ,
const score_t *  ,
HistogramBinEntry *  
) const
inline override virtual

Implements LightGBM::Bin.

◆ CopySubset()

template<typename VAL_T >
void LightGBM::SparseBin< VAL_T >::CopySubset ( const Bin *  full_bin,
const data_size_t *  used_indices,
data_size_t  num_used_indices 
)
inline override virtual

Implements LightGBM::Bin.

◆ CreateOrderedBin()

template<typename VAL_T >
OrderedBin * LightGBM::SparseBin< VAL_T >::CreateOrderedBin ( ) const
override virtual

Create the ordered bin for this bin.

Returns
Pointer to ordered bin

Implements LightGBM::Bin.

◆ FinishLoad()

template<typename VAL_T >
void LightGBM::SparseBin< VAL_T >::FinishLoad ( )
inline override virtual

Call this after all feature data has been pushed; it may reorganize the bin data into a better layout.

Implements LightGBM::Bin.

◆ GetIterator()

template<typename VAL_T >
BinIterator * LightGBM::SparseBin< VAL_T >::GetIterator ( uint32_t  min_bin,
uint32_t  max_bin,
uint32_t  default_bin 
) const
override virtual

Get bin iterator of this bin for specific feature.

Parameters
min_bin  min_bin of the feature currently in use
max_bin  max_bin of the feature currently in use
default_bin  Default bin, used if the bin is not in [min_bin, max_bin]
Returns
Iterator of this bin

Implements LightGBM::Bin.

◆ LoadFromMemory()

template<typename VAL_T >
void LightGBM::SparseBin< VAL_T >::LoadFromMemory ( const void *  memory,
const std::vector< data_size_t > &  local_used_indices 
)
inline override virtual

Load from memory.

Parameters
memory
local_used_indices

Implements LightGBM::Bin.

◆ num_data()

template<typename VAL_T >
data_size_t LightGBM::SparseBin< VAL_T >::num_data ( ) const
inline override virtual

Number of all data.

Implements LightGBM::Bin.

◆ Push()

template<typename VAL_T >
void LightGBM::SparseBin< VAL_T >::Push ( int  tid,
data_size_t  idx,
uint32_t  value 
)
inline override virtual

Push one record.

Parameters
tid  Thread id
idx  Index of record
value  Bin value of record

Implements LightGBM::Bin.
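
The `tid` argument of Push() together with the `push_buffers_` attribute (one vector of index/value pairs per thread) suggests a push-then-finish loading pattern. The following is a minimal sketch of that pattern under stated assumptions, not the LightGBM implementation: the class `BufferedSparseColumn`, its `pairs()` accessor, the choice to skip zero bins, and the merge-and-sort step in `FinishLoad()` are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed stand-in for the LightGBM typedef; the real definition may differ.
using data_size_t = std::int32_t;

// Hypothetical miniature of a per-thread push buffer; not the LightGBM class.
class BufferedSparseColumn {
 public:
  explicit BufferedSparseColumn(int num_threads) : push_buffers_(num_threads) {}

  // Each thread appends only to its own buffer, so no locking is needed.
  void Push(int tid, data_size_t idx, std::uint32_t value) {
    assert(tid >= 0 && tid < static_cast<int>(push_buffers_.size()));
    // Assumption: a zero bin is implicit in a sparse layout and is not stored.
    if (value != 0) {
      push_buffers_[tid].emplace_back(idx, value);
    }
  }

  // Merge all per-thread buffers, then sort by record index
  // (std::pair compares the index first, then the value).
  void FinishLoad() {
    for (auto& buf : push_buffers_) {
      pairs_.insert(pairs_.end(), buf.begin(), buf.end());
      buf.clear();
    }
    std::sort(pairs_.begin(), pairs_.end());
  }

  const std::vector<std::pair<data_size_t, std::uint32_t>>& pairs() const {
    return pairs_;
  }

 private:
  std::vector<std::vector<std::pair<data_size_t, std::uint32_t>>> push_buffers_;
  std::vector<std::pair<data_size_t, std::uint32_t>> pairs_;
};
```

Keeping one buffer per thread lets records be pushed in any order from several threads; the ordering needed by a sparse layout is established once, when loading finishes.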

◆ ReSize()

template<typename VAL_T >
void LightGBM::SparseBin< VAL_T >::ReSize ( data_size_t  num_data)
inline override virtual

Implements LightGBM::Bin.

◆ SaveBinaryToFile()

template<typename VAL_T >
void LightGBM::SparseBin< VAL_T >::SaveBinaryToFile ( const VirtualFileWriter *  writer) const
inline override virtual

Save binary data to file.

Parameters
writer  Writer for the file to write to

Implements LightGBM::Bin.

◆ SizesInByte()

template<typename VAL_T >
size_t LightGBM::SparseBin< VAL_T >::SizesInByte ( ) const
inline override virtual

Get the size of this object in bytes.

Implements LightGBM::Bin.

◆ Split()

template<typename VAL_T >
virtual data_size_t LightGBM::SparseBin< VAL_T >::Split ( uint32_t  min_bin,
uint32_t  max_bin,
uint32_t  default_bin,
MissingType  missing_type,
bool  default_left,
uint32_t  threshold,
data_size_t *  data_indices,
data_size_t  num_data,
data_size_t *  lte_indices,
data_size_t *  gt_indices 
) const
inline override virtual

Split data according to the threshold: records with bin <= threshold are put into the left side (lte_indices), all others into the right side (gt_indices).

Parameters
min_bin  min_bin of the feature currently in use
max_bin  max_bin of the feature currently in use
default_bin  Default bin, used if the bin is not in [min_bin, max_bin]
missing_type  Missing type
default_left  Whether the missing bin goes to the left child
threshold  The split threshold
data_indices  Used data indices. After this function is called, the less-than-or-equal data indices are stored in this object.
num_data  Number of used data
lte_indices  After this function is called, the less-than-or-equal data indices are stored in this object.
gt_indices  After this function is called, the greater data indices are stored in this object.
Returns
The number of less-than-or-equal data.

Implements LightGBM::Bin.
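
The partitioning rule described above (bin <= threshold goes to lte_indices, everything else to gt_indices, and the return value counts the left side) can be sketched as follows. This is a simplified illustration, not the SparseBin code: `SplitByThreshold` and the dense `bins` vector are hypothetical, and the handling of min_bin, max_bin, default_bin, missing_type and default_left is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed stand-in for the LightGBM typedef; the real definition may differ.
using data_size_t = std::int32_t;

// Hypothetical simplified split over a dense bin array. Missing-value handling
// and the min_bin/max_bin/default_bin remapping are intentionally omitted.
data_size_t SplitByThreshold(const std::vector<std::uint32_t>& bins,
                             std::uint32_t threshold,
                             const data_size_t* data_indices,
                             data_size_t num_data,
                             data_size_t* lte_indices,
                             data_size_t* gt_indices) {
  assert(num_data >= 0);
  data_size_t lte_count = 0;
  data_size_t gt_count = 0;
  for (data_size_t i = 0; i < num_data; ++i) {
    const data_size_t row = data_indices[i];
    if (bins[row] <= threshold) {
      lte_indices[lte_count++] = row;  // left side: bin <= threshold
    } else {
      gt_indices[gt_count++] = row;    // right side: bin > threshold
    }
  }
  return lte_count;  // number of records that went to the left side
}
```

The caller is expected to provide lte_indices and gt_indices buffers with room for num_data entries each, since either side may receive all records.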

◆ SplitCategorical()

template<typename VAL_T >
virtual data_size_t LightGBM::SparseBin< VAL_T >::SplitCategorical ( uint32_t  min_bin,
uint32_t  max_bin,
uint32_t  default_bin,
const uint32_t *  threshold,
int  num_threshold,
data_size_t *  data_indices,
data_size_t  num_data,
data_size_t *  lte_indices,
data_size_t *  gt_indices 
) const
inline override virtual

Split data according to the threshold: records with bin <= threshold are put into the left side (lte_indices), all others into the right side (gt_indices).

Parameters
min_bin  min_bin of the feature currently in use
max_bin  max_bin of the feature currently in use
default_bin  Default bin, used if the bin is not in [min_bin, max_bin]
threshold  The split threshold
num_threshold  Number of thresholds
data_indices  Used data indices. After this function is called, the less-than-or-equal data indices are stored in this object.
num_data  Number of used data
lte_indices  After this function is called, the less-than-or-equal data indices are stored in this object.
gt_indices  After this function is called, the greater data indices are stored in this object.
Returns
The number of less-than-or-equal data.

Implements LightGBM::Bin.


The documentation for this class was generated from the following files: