Medial Code Documentation
xgboost.core.DMatrix Class Reference
Inheritance diagram for xgboost.core.DMatrix (classes shown):
xgboost.core.QuantileDMatrix, xgboost.core._ProxyDMatrix, xgboost.data::DMatrixProxy, xgboost.data::IterativeDMatrix, xgboost.data::SimpleDMatrix, xgboost.data::SparsePageDMatrix, xgboost.core.DeviceQuantileDMatrix

Public Member Functions

None __init__ (self, DataType data, Optional[ArrayLike] label=None, *, Optional[ArrayLike] weight=None, Optional[ArrayLike] base_margin=None, Optional[float] missing=None, bool silent=False, Optional[FeatureNames] feature_names=None, Optional[FeatureTypes] feature_types=None, Optional[int] nthread=None, Optional[ArrayLike] group=None, Optional[ArrayLike] qid=None, Optional[ArrayLike] label_lower_bound=None, Optional[ArrayLike] label_upper_bound=None, Optional[ArrayLike] feature_weights=None, bool enable_categorical=False, DataSplitMode data_split_mode=DataSplitMode.ROW)
 
None __del__ (self)
 
None set_info (self, *, Optional[ArrayLike] label=None, Optional[ArrayLike] weight=None, Optional[ArrayLike] base_margin=None, Optional[ArrayLike] group=None, Optional[ArrayLike] qid=None, Optional[ArrayLike] label_lower_bound=None, Optional[ArrayLike] label_upper_bound=None, Optional[FeatureNames] feature_names=None, Optional[FeatureTypes] feature_types=None, Optional[ArrayLike] feature_weights=None)
 
np.ndarray get_float_info (self, str field)
 
np.ndarray get_uint_info (self, str field)
 
None set_float_info (self, str field, ArrayLike data)
 
None set_float_info_npy2d (self, str field, ArrayLike data)
 
None set_uint_info (self, str field, ArrayLike data)
 
None save_binary (self, Union[str, os.PathLike] fname, bool silent=True)
 
None set_label (self, ArrayLike label)
 
None set_weight (self, ArrayLike weight)
 
None set_base_margin (self, ArrayLike margin)
 
None set_group (self, ArrayLike group)
 
np.ndarray get_label (self)
 
np.ndarray get_weight (self)
 
np.ndarray get_base_margin (self)
 
np.ndarray get_group (self)
 
scipy.sparse.csr_matrix get_data (self)
 
Tuple[np.ndarray, np.ndarray] get_quantile_cut (self)
 
int num_row (self)
 
int num_col (self)
 
int num_nonmissing (self)
 
"DMatrix" slice (self, Union[List[int], np.ndarray] rindex, bool allow_groups=False)
 
Optional[FeatureNames] feature_names (self)
 
None feature_names (self, Optional[FeatureNames] feature_names)
 
Optional[FeatureTypes] feature_types (self)
 
None feature_types (self, Optional[FeatureTypes] feature_types)
 

Data Fields

 missing
 
 nthread
 
 silent
 
 handle
 
 feature_names
 
 feature_types
 

Protected Member Functions

None _init_from_iter (self, DataIter iterator, bool enable_categorical)
 

Detailed Description

Data Matrix used in XGBoost.

DMatrix is the internal data structure used by XGBoost. It is optimized for
both memory efficiency and training speed.  A DMatrix can be constructed from
many different sources of data.

Constructor & Destructor Documentation

◆ __init__()

None xgboost.core.DMatrix.__init__ (   self,
DataType  data,
Optional[ArrayLike]   label = None,
*,
Optional[ArrayLike]   weight = None,
Optional[ArrayLike]   base_margin = None,
Optional[float]   missing = None,
bool   silent = False,
Optional[FeatureNames]   feature_names = None,
Optional[FeatureTypes]   feature_types = None,
Optional[int]   nthread = None,
Optional[ArrayLike]   group = None,
Optional[ArrayLike]   qid = None,
Optional[ArrayLike]   label_lower_bound = None,
Optional[ArrayLike]   label_upper_bound = None,
Optional[ArrayLike]   feature_weights = None,
bool   enable_categorical = False,
DataSplitMode   data_split_mode = DataSplitMode.ROW 
)
Parameters
----------
data :
    Data source of DMatrix. See :ref:`py-data` for a list of supported input
    types.
label :
    Label of the training data.
weight :
    Weight for each instance.

     .. note::

         For ranking tasks, weights are per-group: one weight is assigned to
         each group, not to each data point. Only the relative ordering of
         data points within each group matters, so weighting individual data
         points is not meaningful.

base_margin :
    Base margin used for boosting from an existing model.
missing :
    Value in the input data that should be treated as a missing value. If
    None, defaults to np.nan.
silent :
    Whether to print messages during construction.
feature_names :
    Set names for features.
feature_types :

    Set types for features.  When `enable_categorical` is set to `True`, string
    "c" represents categorical data type while "q" represents numerical feature
    type. For categorical features, the input is assumed to be preprocessed and
    encoded by the users. The encoding can be done via
    :py:class:`sklearn.preprocessing.OrdinalEncoder` or pandas dataframe
    `.cat.codes` method. This is useful when users want to specify categorical
    features without having to construct a dataframe as input.

nthread :
    Number of threads to use for loading data when parallelization is
    applicable. If -1, uses maximum threads available on the system.
group :
    Group sizes for all ranking groups.
qid :
    Query ID for data samples, used for ranking.
label_lower_bound :
    Lower bound for survival training.
label_upper_bound :
    Upper bound for survival training.
feature_weights :
    Set feature weights for column sampling.
enable_categorical :

    .. versionadded:: 1.3.0

    .. note:: This parameter is experimental

    Experimental support for specialized handling of categorical features.  Do
    not set to True unless you are interested in development. The JSON/UBJSON
    serialization format is also required.

Reimplemented in xgboost.core._ProxyDMatrix, xgboost.core.DeviceQuantileDMatrix, and xgboost.core.QuantileDMatrix.

Member Function Documentation

◆ feature_names()

Optional[FeatureNames] xgboost.core.DMatrix.feature_names (   self)
Labels for features (column labels).

Setting it to ``None`` resets existing feature names.

◆ feature_types()

Optional[FeatureTypes] xgboost.core.DMatrix.feature_types (   self)
Type of features (column types).

This is for displaying the results and categorical data support. See
:py:class:`DMatrix` for details.

Setting it to ``None`` resets existing feature types.

◆ get_base_margin()

np.ndarray xgboost.core.DMatrix.get_base_margin (   self)
Get the base margin of the DMatrix.

Returns
-------
base_margin

◆ get_data()

scipy.sparse.csr_matrix xgboost.core.DMatrix.get_data (   self)
Get the predictors from DMatrix as a CSR matrix. This getter is mostly for
testing purposes. If this is a quantized DMatrix then quantized values are
returned instead of input values.

.. versionadded:: 1.7.0

◆ get_float_info()

np.ndarray xgboost.core.DMatrix.get_float_info (   self,
str  field 
)
Get float property from the DMatrix.

Parameters
----------
field: str
    The field name of the information

Returns
-------
info : array
    A NumPy array holding the float information of the data.

◆ get_group()

np.ndarray xgboost.core.DMatrix.get_group (   self)
Get the group of the DMatrix.

Returns
-------
group

◆ get_label()

np.ndarray xgboost.core.DMatrix.get_label (   self)
Get the label of the DMatrix.

Returns
-------
label : array

◆ get_quantile_cut()

Tuple[np.ndarray, np.ndarray] xgboost.core.DMatrix.get_quantile_cut (   self)
Get quantile cuts for quantization.

.. versionadded:: 2.0.0

◆ get_uint_info()

np.ndarray xgboost.core.DMatrix.get_uint_info (   self,
str  field 
)
Get unsigned integer property from the DMatrix.

Parameters
----------
field: str
    The field name of the information

Returns
-------
info : array
    A NumPy array holding the unsigned integer information of the data.

◆ get_weight()

np.ndarray xgboost.core.DMatrix.get_weight (   self)
Get the weight of the DMatrix.

Returns
-------
weight : array

◆ num_col()

int xgboost.core.DMatrix.num_col (   self)
Get the number of columns (features) in the DMatrix.

◆ num_nonmissing()

int xgboost.core.DMatrix.num_nonmissing (   self)
Get the number of non-missing values in the DMatrix.

.. versionadded:: 1.7.0

◆ num_row()

int xgboost.core.DMatrix.num_row (   self)
Get the number of rows in the DMatrix.

◆ save_binary()

None xgboost.core.DMatrix.save_binary (   self,
Union[str, os.PathLike]  fname,
bool   silent = True 
)
Save DMatrix to an XGBoost buffer.  The saved binary can later be loaded
by passing its path to :py:func:`xgboost.DMatrix` as input.

Parameters
----------
fname : string or os.PathLike
    Name of the output buffer file.
silent : bool (optional; default: True)
    If set, the output is suppressed.

◆ set_base_margin()

None xgboost.core.DMatrix.set_base_margin (   self,
ArrayLike  margin 
)
Set base margin of booster to start from.

This can be used to supply the predictions of an existing model as the
base margin. Note that the margin is required, not the transformed
prediction. For example, for logistic regression, pass the value before
the logistic transformation. See also example/demo.py.

Parameters
----------
margin: array like
    Prediction margin of each datapoint

◆ set_float_info()

None xgboost.core.DMatrix.set_float_info (   self,
str  field,
ArrayLike  data 
)
Set float type property into the DMatrix.

Parameters
----------
field: str
    The field name of the information

data: numpy array
    The array of data to be set

◆ set_float_info_npy2d()

None xgboost.core.DMatrix.set_float_info_npy2d (   self,
str  field,
ArrayLike  data 
)
Set float type property into the DMatrix for NumPy 2D array input.

Parameters
----------
field: str
    The field name of the information

data: numpy array
    The array of data to be set

◆ set_group()

None xgboost.core.DMatrix.set_group (   self,
ArrayLike  group 
)
Set group size of DMatrix (used for ranking).

Parameters
----------
group : array like
    Group size of each group

◆ set_info()

None xgboost.core.DMatrix.set_info (   self,
*,
Optional[ArrayLike]   label = None,
Optional[ArrayLike]   weight = None,
Optional[ArrayLike]   base_margin = None,
Optional[ArrayLike]   group = None,
Optional[ArrayLike]   qid = None,
Optional[ArrayLike]   label_lower_bound = None,
Optional[ArrayLike]   label_upper_bound = None,
Optional[FeatureNames]   feature_names = None,
Optional[FeatureTypes]   feature_types = None,
Optional[ArrayLike]   feature_weights = None 
)
Set meta info for DMatrix.  See the docstring of :py:obj:`xgboost.DMatrix`.

◆ set_label()

None xgboost.core.DMatrix.set_label (   self,
ArrayLike  label 
)
Set the label of the DMatrix.

Parameters
----------
label: array like
    The label information to be set into DMatrix

◆ set_uint_info()

None xgboost.core.DMatrix.set_uint_info (   self,
str  field,
ArrayLike  data 
)
Set uint type property into the DMatrix.

Parameters
----------
field: str
    The field name of the information

data: numpy array
    The array of data to be set

◆ set_weight()

None xgboost.core.DMatrix.set_weight (   self,
ArrayLike  weight 
)
Set weight of each instance.

Parameters
----------
weight : array like
    Weight for each data point

    .. note:: For ranking tasks, weights are per-group.

        One weight is assigned to each group (not to each data point).
        Only the relative ordering of data points within each group
        matters, so weighting individual data points is not meaningful.

◆ slice()

"DMatrix" xgboost.core.DMatrix.slice (   self,
Union[List[int], np.ndarray]  rindex,
bool   allow_groups = False 
)
Slice the DMatrix and return a new DMatrix that only contains `rindex`.

Parameters
----------
rindex
    List of indices to be selected.
allow_groups
    Allow slicing of a matrix with a groups attribute

Returns
-------
res
    A new DMatrix containing only selected indices.

The documentation for this class was generated from the following file: