DMatrix type for QuantileDMatrix; the name IterativeDMatrix reflects its construction process.

#include <iterative_dmatrix.h>

Inheritance diagram for xgboost::data::IterativeDMatrix:

Public Member Functions
	IterativeDMatrix (DataIterHandle iter_handle, DMatrixHandle proxy, std::shared_ptr< DMatrix > ref, DataIterResetCallback reset, XGDMatrixCallbackNext next, float missing, int nthread, bst_bin_t max_bin)

bool	EllpackExists () const override

bool	GHistIndexExists () const override

bool	SparsePageExists () const override

DMatrix *	Slice (common::Span< int32_t const >) override

DMatrix *	SliceCol (int, int) override

BatchSet< SparsePage >	GetRowBatches () override

BatchSet< CSCPage >	GetColumnBatches (Context const *) override

BatchSet< SortedCSCPage >	GetSortedColumnBatches (Context const *) override

BatchSet< GHistIndexMatrix >	GetGradientIndex (Context const *ctx, BatchParam const &param) override

BatchSet< EllpackPage >	GetEllpackBatches (Context const *ctx, const BatchParam &param) override

BatchSet< ExtSparsePage >	GetExtBatches (Context const *ctx, BatchParam const &param) override

bool	SingleColBlock () const override

MetaInfo &	Info () override

MetaInfo const &	Info () const override

Context const *	Ctx () const override

Public Member Functions inherited from xgboost.core.DMatrix
None	__init__ (self, DataType data, Optional[ArrayLike] label=None, *, Optional[ArrayLike] weight=None, Optional[ArrayLike] base_margin=None, Optional[float] missing=None, bool silent=False, Optional[FeatureNames] feature_names=None, Optional[FeatureTypes] feature_types=None, Optional[int] nthread=None, Optional[ArrayLike] group=None, Optional[ArrayLike] qid=None, Optional[ArrayLike] label_lower_bound=None, Optional[ArrayLike] label_upper_bound=None, Optional[ArrayLike] feature_weights=None, bool enable_categorical=False, DataSplitMode data_split_mode=DataSplitMode.ROW)

None	__del__ (self)

None	set_info (self, *, Optional[ArrayLike] label=None, Optional[ArrayLike] weight=None, Optional[ArrayLike] base_margin=None, Optional[ArrayLike] group=None, Optional[ArrayLike] qid=None, Optional[ArrayLike] label_lower_bound=None, Optional[ArrayLike] label_upper_bound=None, Optional[FeatureNames] feature_names=None, Optional[FeatureTypes] feature_types=None, Optional[ArrayLike] feature_weights=None)

np.ndarray	get_float_info (self, str field)

np.ndarray	get_uint_info (self, str field)

None	set_float_info (self, str field, ArrayLike data)

None	set_float_info_npy2d (self, str field, ArrayLike data)

None	set_uint_info (self, str field, ArrayLike data)

None	save_binary (self, Union[str, os.PathLike] fname, bool silent=True)

None	set_label (self, ArrayLike label)

None	set_weight (self, ArrayLike weight)

None	set_base_margin (self, ArrayLike margin)

None	set_group (self, ArrayLike group)

np.ndarray	get_label (self)

np.ndarray	get_weight (self)

np.ndarray	get_base_margin (self)

np.ndarray	get_group (self)

scipy.sparse.csr_matrix	get_data (self)

Tuple[np.ndarray, np.ndarray]	get_quantile_cut (self)

int	num_row (self)

int	num_col (self)

int	num_nonmissing (self)

"DMatrix"	slice (self, Union[List[int], np.ndarray] rindex, bool allow_groups=False)

Optional[FeatureNames]	feature_names (self)

None	feature_names (self, Optional[FeatureNames] feature_names)

Optional[FeatureTypes]	feature_types (self)

None	feature_types (self, Optional[FeatureTypes] feature_types)

Additional Inherited Members
Data Fields inherited from xgboost.core.DMatrix
	missing

	nthread

	silent

	handle

	feature_names

	feature_types

Protected Member Functions inherited from xgboost.core.DMatrix
None	_init_from_iter (self, DataIter iterator, bool enable_categorical)

Detailed Description

DMatrix type for QuantileDMatrix; the name IterativeDMatrix reflects its construction process.

QuantileDMatrix is an intermediate storage for quantilization results, including quantile cuts and the histogram index. Quantilization is designed to be performed on a stream of data (or batches of it), so QuantileDMatrix is also designed to work with batches of data. During initialization, it iterates over the data multiple times in order to perform quantilization. This design reduces memory usage significantly by avoiding data concatenation and by removing the CSR matrix SparsePage. However, it has limitations (which can be fixed if needed):

It's only supported by the hist tree method (both CPU and GPU), since approx requires re-calculating the quantiles at each iteration. We can fix this by retaining a reference to the callback if there are feature requests.
The CPU format and the GPU format are different: the former uses CSR + CSC for the histogram index, while the latter uses only Ellpack.
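
The multi-pass construction described above can be illustrated with a minimal, self-contained sketch. It is not the XGBoost implementation: BatchSource, ComputeCuts, and Quantize are hypothetical names, only a single feature is handled, and an exact sort stands in for the streaming quantile sketch, so the first pass still gathers one column in memory. What it does show is the reset/next iteration pattern, with one pass to derive the cuts and a second pass to build the bin index, without ever concatenating the batches into one matrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a resettable batch iterator (one feature only).
// It mimics the reset/next callback pair, not the actual XGBoost types.
class BatchSource {
 public:
  explicit BatchSource(std::vector<std::vector<float>> batches)
      : batches_(std::move(batches)) {}

  // Rewind to the first batch so the data can be walked again.
  void Reset() { cursor_ = 0; }

  // Copy the next batch into `out`; return false once the data is exhausted.
  bool Next(std::vector<float>* out) {
    if (cursor_ == batches_.size()) return false;
    *out = batches_[cursor_++];
    return true;
  }

 private:
  std::vector<std::vector<float>> batches_;
  std::size_t cursor_{0};
};

// Pass 1: derive at most `max_bin` cut points (inclusive upper bin bounds).
// Assumption: an exact sort replaces the streaming quantile sketch.
std::vector<float> ComputeCuts(BatchSource* src, int max_bin) {
  assert(max_bin > 0);
  std::vector<float> values, batch;
  src->Reset();
  while (src->Next(&batch)) {
    values.insert(values.end(), batch.begin(), batch.end());
  }
  std::sort(values.begin(), values.end());

  std::vector<float> cuts;
  if (values.empty()) return cuts;
  for (int i = 1; i <= max_bin; ++i) {
    // 1-based rank of the i-th quantile, clamped so it is never zero.
    std::size_t rank = std::max<std::size_t>(
        1, static_cast<std::size_t>(i) * values.size() / max_bin);
    float cut = values[rank - 1];
    // Skip duplicate cut points so bins stay strictly increasing.
    if (cuts.empty() || cut > cuts.back()) cuts.push_back(cut);
  }
  return cuts;
}

// Pass 2: walk the batches again and map each value to its bin index.
std::vector<std::size_t> Quantize(BatchSource* src,
                                  std::vector<float> const& cuts) {
  std::vector<std::size_t> index;
  std::vector<float> batch;
  src->Reset();
  while (src->Next(&batch)) {
    for (float v : batch) {
      // The first cut that is >= v identifies the bin.
      auto it = std::lower_bound(cuts.begin(), cuts.end(), v);
      index.push_back(static_cast<std::size_t>(it - cuts.begin()));
    }
  }
  return index;
}
```

To try it, construct a BatchSource from a few small batches, call ComputeCuts with the desired number of bins, and pass the resulting cuts to Quantize. Because the source can be reset, both passes read the same batches in the same order, which is why the constructor takes both a reset and a next callback.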

The documentation for this class was generated from the following files:

External/xgboost/src/data/iterative_dmatrix.h
External/xgboost/src/data/iterative_dmatrix.cc
