Interface for bin data. This class stores the bin data for one feature. Unlike OrderedBin, it stores the data in the original row order. This may cause cache misses when constructing histograms, but it avoids the re-ordering operation, so it is faster than OrderedBin for dense features.

#include <bin.h>


Public Member Functions
virtual	~Bin ()
	virtual destructor

virtual void	Push (int tid, data_size_t idx, uint32_t value)=0
	Push one record.

virtual void	CopySubset (const Bin *full_bin, const data_size_t *used_indices, data_size_t num_used_indices)=0

virtual BinIterator *	GetIterator (uint32_t min_bin, uint32_t max_bin, uint32_t default_bin) const =0
	Get bin iterator of this bin for specific feature.

virtual void	SaveBinaryToFile (const VirtualFileWriter *writer) const =0
	Save binary data to file.

virtual void	LoadFromMemory (const void *memory, const std::vector< data_size_t > &local_used_indices)=0
	Load from memory.

virtual size_t	SizesInByte () const =0
	Get sizes in byte of this object.

virtual data_size_t	num_data () const =0
	Number of all data.

virtual void	ReSize (data_size_t num_data)=0

virtual void	ConstructHistogram (const data_size_t *data_indices, data_size_t num_data, const score_t *ordered_gradients, const score_t *ordered_hessians, HistogramBinEntry *out) const =0
	Construct the histogram of this feature. Note: ordered_gradients and ordered_hessians are used to improve the cache hit rate. The naive approach reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients and ordered_hessians are preprocessed so that they are re-ordered by data_indices: ordered_gradients[i] holds the gradient of row data_indices[i] (likewise for ordered_hessians).

virtual void	ConstructHistogram (data_size_t num_data, const score_t *ordered_gradients, const score_t *ordered_hessians, HistogramBinEntry *out) const =0

virtual void	ConstructHistogram (const data_size_t *data_indices, data_size_t num_data, const score_t *ordered_gradients, HistogramBinEntry *out) const =0
	Construct the histogram of this feature; this overload takes no ordered_hessians. Note: ordered_gradients is used to improve the cache hit rate. The naive approach reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients is preprocessed so that it is re-ordered by data_indices: ordered_gradients[i] holds the gradient of row data_indices[i].

virtual void	ConstructHistogram (data_size_t num_data, const score_t *ordered_gradients, HistogramBinEntry *out) const =0

virtual data_size_t	Split (uint32_t min_bin, uint32_t max_bin, uint32_t default_bin, MissingType missing_type, bool default_left, uint32_t threshold, data_size_t *data_indices, data_size_t num_data, data_size_t *lte_indices, data_size_t *gt_indices) const =0
	Split data according to the threshold: if bin <= threshold, the index goes to the left (lte_indices); otherwise it goes to the right (gt_indices).

virtual data_size_t	SplitCategorical (uint32_t min_bin, uint32_t max_bin, uint32_t default_bin, const uint32_t *threshold, int num_threshold, data_size_t *data_indices, data_size_t num_data, data_size_t *lte_indices, data_size_t *gt_indices) const =0
	Split data according to the threshold: if bin <= threshold, the index goes to the left (lte_indices); otherwise it goes to the right (gt_indices).

virtual OrderedBin *	CreateOrderedBin () const =0
	Create the ordered bin for this bin.

virtual void	FinishLoad ()=0
	Call this after all feature data has been pushed; it lets the implementation reorganize the bin data into a better layout.

Static Public Member Functions
static Bin *	CreateBin (data_size_t num_data, int num_bin, double sparse_rate, bool is_enable_sparse, double sparse_threshold, bool *is_sparse)
	Create the bin data object for one feature; calls CreateDenseBin or CreateSparseBin according to "is_sparse".

static Bin *	CreateDenseBin (data_size_t num_data, int num_bin)
	Create object for bin data of one feature, used for dense feature.

static Bin *	CreateSparseBin (data_size_t num_data, int num_bin)
	Create object for bin data of one feature, used for sparse feature.

Detailed Description

Interface for bin data. This class stores the bin data for one feature. Unlike OrderedBin, it stores the data in the original row order. This may cause cache misses when constructing histograms, but it avoids the re-ordering operation, so it is faster than OrderedBin for dense features.

Member Function Documentation

◆ ConstructHistogram() [1/2]

virtual void LightGBM::Bin::ConstructHistogram	(	const data_size_t *	data_indices,
		data_size_t	num_data,
		const score_t *	ordered_gradients,
		const score_t *	ordered_hessians,
		HistogramBinEntry *	out
	)		const

pure virtual

Construct the histogram of this feature. Note: ordered_gradients and ordered_hessians are used to improve the cache hit rate. The naive approach reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients and ordered_hessians are preprocessed so that they are re-ordered by data_indices: ordered_gradients[i] holds the gradient of row data_indices[i] (likewise for ordered_hessians).

Parameters

data_indices	Used data indices in current leaf
num_data	Number of used data
ordered_gradients	Pointer to gradients, the data_indices[i]-th data's gradient is ordered_gradients[i]
ordered_hessians	Pointer to hessians, the data_indices[i]-th data's hessian is ordered_hessians[i]
out	Output Result

Implemented in LightGBM::SparseBin< VAL_T >, LightGBM::DenseBin< VAL_T >, and LightGBM::Dense4bitsBin.
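
The access pattern described above can be illustrated with a minimal sketch. It is not the library's implementation: HistEntry, GatherOrdered, ConstructHistogramSketch, the uint8_t bin storage, and the local typedefs are hypothetical stand-ins introduced only for this example. The gather step pays the scattered reads once, and the histogram loop then reads the gradients and hessians sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the library typedefs (assumed, not copied from bin.h).
using data_size_t = int32_t;
using score_t = float;

// Hypothetical simplified histogram entry; the real HistogramBinEntry may differ.
struct HistEntry {
  double sum_gradients = 0.0;
  double sum_hessians = 0.0;
  data_size_t cnt = 0;
};

// Gather step: ordered[i] = values[data_indices[i]].
// The scattered reads happen once here instead of once per feature.
std::vector<score_t> GatherOrdered(const std::vector<score_t>& values,
                                   const std::vector<data_size_t>& data_indices) {
  std::vector<score_t> ordered(data_indices.size());
  for (std::size_t i = 0; i < data_indices.size(); ++i) {
    ordered[i] = values[data_indices[i]];
  }
  return ordered;
}

// Histogram step: bin values are looked up by row index,
// while gradients and hessians are read contiguously by position i.
void ConstructHistogramSketch(const std::vector<uint8_t>& bins,
                              const data_size_t* data_indices,
                              data_size_t num_data,
                              const score_t* ordered_gradients,
                              const score_t* ordered_hessians,
                              std::vector<HistEntry>* out) {
  for (data_size_t i = 0; i < num_data; ++i) {
    const std::size_t bin = bins[data_indices[i]];
    assert(bin < out->size());
    HistEntry& entry = (*out)[bin];
    entry.sum_gradients += ordered_gradients[i];
    entry.sum_hessians += ordered_hessians[i];
    ++entry.cnt;
  }
}
```

In this sketch the caller would gather once per leaf and reuse the ordered arrays for every feature, which is where the saving over indexing gradients[data_indices[i]] inside each feature loop comes from.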

◆ ConstructHistogram() [2/2]

virtual void LightGBM::Bin::ConstructHistogram	(	const data_size_t *	data_indices,
		data_size_t	num_data,
		const score_t *	ordered_gradients,
		HistogramBinEntry *	out
	)		const

pure virtual

Construct the histogram of this feature; this overload takes no ordered_hessians. Note: ordered_gradients is used to improve the cache hit rate. The naive approach reads gradients[data_indices[i]] for each i, which is not cache friendly because the memory accesses are not contiguous. ordered_gradients is preprocessed so that it is re-ordered by data_indices: ordered_gradients[i] holds the gradient of row data_indices[i].

Parameters

data_indices	Used data indices in current leaf
num_data	Number of used data
ordered_gradients	Pointer to gradients, the data_indices[i]-th data's gradient is ordered_gradients[i]
out	Output Result

Implemented in LightGBM::SparseBin< VAL_T >, LightGBM::DenseBin< VAL_T >, and LightGBM::Dense4bitsBin.

◆ CreateBin()

Bin * LightGBM::Bin::CreateBin	(	data_size_t	num_data,
		int	num_bin,
		double	sparse_rate,
		bool	is_enable_sparse,
		double	sparse_threshold,
		bool *	is_sparse
	)

static

Create the bin data object for one feature; calls CreateDenseBin or CreateSparseBin according to "is_sparse".

Parameters

num_data	Total number of data
num_bin	Number of bin
sparse_rate	Sparse rate of this bin (num_bin0 / num_data)
is_enable_sparse	True if enable sparse feature
sparse_threshold	Threshold for treating a feature as a sparse feature
is_sparse	Will set to true if this bin is sparse
default_bin	Default bin for zero values (listed in the source comment, but not a parameter of the signature shown above)

Returns: The bin data object
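
The dense/sparse dispatch can be sketched as follows. This is an assumption-laden illustration, not the code in bin.cpp: the rule "sparse when is_enable_sparse is true and sparse_rate >= sparse_threshold" is a guess at how the parameters above combine, and BinSketch, DenseSketch, SparseSketch and CreateBinSketch are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical minimal stand-ins for the Bin hierarchy.
struct BinSketch {
  virtual ~BinSketch() = default;
  virtual std::string Kind() const = 0;
};
struct DenseSketch : BinSketch {
  std::string Kind() const override { return "dense"; }
};
struct SparseSketch : BinSketch {
  std::string Kind() const override { return "sparse"; }
};

// Assumed rule: pick the sparse representation only when sparse features are
// enabled and the sparse rate reaches the threshold. The decision is reported
// back to the caller through the is_sparse output parameter.
std::unique_ptr<BinSketch> CreateBinSketch(double sparse_rate,
                                           bool is_enable_sparse,
                                           double sparse_threshold,
                                           bool* is_sparse) {
  assert(is_sparse != nullptr);
  *is_sparse = is_enable_sparse && sparse_rate >= sparse_threshold;
  if (*is_sparse) {
    return std::unique_ptr<BinSketch>(new SparseSketch());
  }
  return std::unique_ptr<BinSketch>(new DenseSketch());
}
```

The sketch returns a std::unique_ptr for safety; the documented factory returns a raw Bin *, so a real caller is presumably responsible for the object's lifetime.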

◆ CreateDenseBin()

Bin * LightGBM::Bin::CreateDenseBin	(	data_size_t	num_data,
		int	num_bin
	)

static

Create object for bin data of one feature, used for dense feature.

Parameters

num_data	Total number of data
num_bin	Number of bin

Returns: The bin data object

◆ CreateOrderedBin()

virtual OrderedBin * LightGBM::Bin::CreateOrderedBin ( ) const

pure virtual

Create the ordered bin for this bin.

Returns: Pointer to ordered bin

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ CreateSparseBin()

Bin * LightGBM::Bin::CreateSparseBin	(	data_size_t	num_data,
		int	num_bin
	)

static

Create object for bin data of one feature, used for sparse feature.

Parameters

num_data	Total number of data
num_bin	Number of bin

Returns: The bin data object

◆ FinishLoad()

virtual void LightGBM::Bin::FinishLoad ( )

pure virtual

Call this after all feature data has been pushed; it lets the implementation reorganize the bin data into a better layout.

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ GetIterator()

virtual BinIterator * LightGBM::Bin::GetIterator	(	uint32_t	min_bin,
		uint32_t	max_bin,
		uint32_t	default_bin
	)		const

pure virtual

Get bin iterator of this bin for specific feature.

Parameters

min_bin	min_bin of current used feature
max_bin	max_bin of current used feature
default_bin	default bin if bin not in [min_bin, max_bin]

Returns: Iterator of this bin

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ LoadFromMemory()

virtual void LightGBM::Bin::LoadFromMemory	(	const void *	memory,
		const std::vector< data_size_t > &	local_used_indices
	)

pure virtual

Load from memory.

Parameters

memory
local_used_indices

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ num_data()

virtual data_size_t LightGBM::Bin::num_data ( ) const

pure virtual

Number of all data.

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ Push()

virtual void LightGBM::Bin::Push	(	int	tid,
		data_size_t	idx,
		uint32_t	value
	)

pure virtual

Push one record.

Parameters

tid	Thread id
idx	Index of record
value	Bin value of record

Implemented in LightGBM::SparseBin< VAL_T >, LightGBM::DenseBin< VAL_T >, and LightGBM::Dense4bitsBin.

◆ SaveBinaryToFile()

virtual void LightGBM::Bin::SaveBinaryToFile ( const VirtualFileWriter * writer ) const

pure virtual

Save binary data to file.

Parameters

writer	File writer to write the binary data to

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ SizesInByte()

virtual size_t LightGBM::Bin::SizesInByte ( ) const

pure virtual

Get sizes in byte of this object.

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

◆ Split()

virtual data_size_t LightGBM::Bin::Split	(	uint32_t	min_bin,
		uint32_t	max_bin,
		uint32_t	default_bin,
		MissingType	missing_type,
		bool	default_left,
		uint32_t	threshold,
		data_size_t *	data_indices,
		data_size_t	num_data,
		data_size_t *	lte_indices,
		data_size_t *	gt_indices
	)		const

pure virtual

Split data according to the threshold: if bin <= threshold, the index goes to the left (lte_indices); otherwise it goes to the right (gt_indices).

Parameters

min_bin	min_bin of current used feature
max_bin	max_bin of current used feature
default_bin	default bin if bin not in [min_bin, max_bin]
missing_type	missing type
default_left	if true, the missing bin goes to the left child
threshold	The split threshold.
data_indices	Used data indices. After this function is called, the indices of data less than or equal to the threshold are stored here.
num_data	Number of used data
lte_indices	After this function is called, the indices of data less than or equal to the threshold are stored here.
gt_indices	After this function is called, the indices of data greater than the threshold are stored here.

Returns: The number of data less than or equal to the threshold.

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.
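
The core partition rule can be shown with a reduced sketch. It deliberately ignores min_bin, max_bin, default_bin, missing_type and default_left, so it covers only the "bin <= threshold goes left" part of the contract; SplitSketch, the std::vector bin storage and the local typedef are hypothetical and not taken from the library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the library typedef.
using data_size_t = int32_t;

// Partition the used rows by comparing each row's bin with the threshold.
// Rows with bin <= threshold are appended to lte_indices, the rest to
// gt_indices. Both output buffers must have room for num_data entries.
// Returns the number of rows written to lte_indices.
data_size_t SplitSketch(const std::vector<uint32_t>& bins,
                        uint32_t threshold,
                        const data_size_t* data_indices,
                        data_size_t num_data,
                        data_size_t* lte_indices,
                        data_size_t* gt_indices) {
  assert(num_data >= 0);
  data_size_t lte_count = 0;
  data_size_t gt_count = 0;
  for (data_size_t i = 0; i < num_data; ++i) {
    const data_size_t idx = data_indices[i];
    if (bins[idx] <= threshold) {
      lte_indices[lte_count++] = idx;
    } else {
      gt_indices[gt_count++] = idx;
    }
  }
  return lte_count;
}
```

Because only the left count is returned, a caller of this sketch would derive the right count as num_data minus the return value.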

◆ SplitCategorical()

virtual data_size_t LightGBM::Bin::SplitCategorical	(	uint32_t	min_bin,
		uint32_t	max_bin,
		uint32_t	default_bin,
		const uint32_t *	threshold,
		int	num_threshold,
		data_size_t *	data_indices,
		data_size_t	num_data,
		data_size_t *	lte_indices,
		data_size_t *	gt_indices
	)		const

pure virtual

Split data according to the threshold: if bin <= threshold, the index goes to the left (lte_indices); otherwise it goes to the right (gt_indices).

Parameters

min_bin	min_bin of current used feature
max_bin	max_bin of current used feature
default_bin	default bin if bin not in [min_bin, max_bin]
threshold	The split threshold.
num_threshold	Number of thresholds
data_indices	Used data indices. After this function is called, the indices of data less than or equal to the threshold are stored here.
num_data	Number of used data
lte_indices	After this function is called, the indices of data less than or equal to the threshold are stored here.
gt_indices	After this function is called, the indices of data greater than the threshold are stored here.

Returns: The number of data less than or equal to the threshold.

Implemented in LightGBM::DenseBin< VAL_T >, LightGBM::Dense4bitsBin, and LightGBM::SparseBin< VAL_T >.

The documentation for this class was generated from the following files:

External/LightGBM_2.2.3/LightGBM-2.2.3/include/LightGBM/bin.h
External/LightGBM_2.2.3/LightGBM-2.2.3/src/io/bin.cpp
