Medial Code Documentation
Data Structures
class | CommunicatorContext |
class | DaskDeviceQuantileDMatrix |
class | DaskDMatrix |
class | DaskPartitionIter |
class | DaskQuantileDMatrix |
class | DaskScikitLearnBase |
class | DaskXGBClassifier |
class | DaskXGBRanker |
class | DaskXGBRegressor |
class | DaskXGBRFClassifier |
class | DaskXGBRFRegressor |
Functions
Dict[str, Union[int, str]] | _try_start_tracker (int n_workers, List[Union[Optional[str], Optional[Tuple[str, int]]]] addrs) |
Dict[str, Union[int, str]] | _start_tracker (int n_workers, Optional[str] addr_from_dask, Optional[Tuple[str, int]] addr_from_user) |
None | _assert_dask_support () |
_T | dconcat (Sequence[_T] value) |
"distributed.Client" | _xgb_get_client (Optional["distributed.Client"] client) |
List[_MapRetT] | map_worker_partitions (Optional["distributed.Client"] client, Callable[..., _MapRetT] func, *Any refs, Sequence[str] workers) |
Dict[str, List[Any]] | _get_worker_parts (_DataParts list_of_parts) |
QuantileDMatrix | _create_quantile_dmatrix (Optional[FeatureNames] feature_names, Optional[Union[Any, List[Any]]] feature_types, Optional[Any] feature_weights, float missing, int nthread, Optional[_DataParts] parts, int max_bin, bool enable_categorical, Optional[DMatrix] ref=None) |
DMatrix | _create_dmatrix (Optional[FeatureNames] feature_names, Optional[Union[Any, List[Any]]] feature_types, Optional[Any] feature_weights, float missing, int nthread, bool enable_categorical, Optional[_DataParts] parts) |
DMatrix | _dmatrix_from_list_of_parts (bool is_quantile, **Any kwargs) |
Dict[str, Union[str, int]] | _get_rabit_args (int n_workers, Optional[Dict[str, Any]] dconfig, "distributed.Client" client) |
Optional[Dict[str, Any]] | _get_dask_config () |
List[str] | _get_workers_from_data (DaskDMatrix dtrain, Optional[Sequence[Tuple[DaskDMatrix, str]]] evals) |
Optional[TrainReturnT] | _filter_empty (Booster booster, TrainingCallback.EvalsLog local_history, bool is_valid) |
None | _check_workers_are_alive (List[str] workers, "distributed.Client" client) |
Optional[TrainReturnT] | _train_async ("distributed.Client" client, Dict[str, Any] global_config, Optional[Dict[str, Any]] dconfig, Dict[str, Any] params, DaskDMatrix dtrain, int num_boost_round, Optional[Sequence[Tuple[DaskDMatrix, str]]] evals, Optional[Objective] obj, Optional[Metric] feval, Optional[int] early_stopping_rounds, Union[int, bool] verbose_eval, Optional[Booster] xgb_model, Optional[Sequence[TrainingCallback]] callbacks, Optional[Metric] custom_metric) |
Any | train ("distributed.Client" client, Dict[str, Any] params, DaskDMatrix dtrain, int num_boost_round=10, *Optional[Sequence[Tuple[DaskDMatrix, str]]] evals=None, Optional[Objective] obj=None, Optional[Metric] feval=None, Optional[int] early_stopping_rounds=None, Optional[Booster] xgb_model=None, Union[int, bool] verbose_eval=True, Optional[Sequence[TrainingCallback]] callbacks=None, Optional[Metric] custom_metric=None) |
bool | _can_output_df (bool is_df, Tuple output_shape) |
Any | _maybe_dataframe (Any data, Any prediction, List[int] columns, bool is_df) |
_DaskCollection | _direct_predict_impl (Callable mapped_predict, "distributed.Future" booster, _DataT data, Optional[_DaskCollection] base_margin, Tuple[int,...] output_shape, Dict[int, str] meta) |
Tuple[Tuple[int,...], Dict[int, str]] | _infer_predict_output (Booster booster, int features, bool is_df, bool inplace, **Any kwargs) |
"distributed.Future" | _get_model_future ("distributed.Client" client, Union[Booster, Dict, "distributed.Future"] model) |
_DaskCollection | _predict_async ("distributed.Client" client, Dict[str, Any] global_config, Union[Booster, Dict, "distributed.Future"] model, _DataT data, bool output_margin, float missing, bool pred_leaf, bool pred_contribs, bool approx_contribs, bool pred_interactions, bool validate_features, Tuple[int, int] iteration_range, bool strict_shape) |
Any | predict (Optional["distributed.Client"] client, Union[TrainReturnT, Booster, "distributed.Future"] model, Union[DaskDMatrix, _DataT] data, bool output_margin=False, float missing=numpy.nan, bool pred_leaf=False, bool pred_contribs=False, bool approx_contribs=False, bool pred_interactions=False, bool validate_features=True, Tuple[int, int] iteration_range=(0, 0), bool strict_shape=False) |
_DaskCollection | _inplace_predict_async ("distributed.Client" client, Dict[str, Any] global_config, Union[Booster, Dict, "distributed.Future"] model, _DataT data, Tuple[int, int] iteration_range, str predict_type, float missing, bool validate_features, Optional[_DaskCollection] base_margin, bool strict_shape) |
Any | inplace_predict (Optional["distributed.Client"] client, Union[TrainReturnT, Booster, "distributed.Future"] model, _DataT data, Tuple[int, int] iteration_range=(0, 0), str predict_type="value", float missing=numpy.nan, bool validate_features=True, Optional[_DaskCollection] base_margin=None, bool strict_shape=False) |
Tuple[DaskDMatrix, Optional[List[Tuple[DaskDMatrix, str]]]] | _async_wrap_evaluation_matrices (Optional["distributed.Client"] client, Optional[str] tree_method, Optional[int] max_bin, **Any kwargs) |
Generator | _set_worker_client ("DaskScikitLearnBase" model, "distributed.Client" client) |
Variables
dd = LazyLoader("dd", globals(), "dask.dataframe")
da = LazyLoader("da", globals(), "dask.array")
dask = LazyLoader("dask", globals(), "dask")
distributed = LazyLoader("distributed", globals(), "dask.distributed")
_DaskCollection = Union["da.Array", "dd.DataFrame", "dd.Series"] | |
_DataT = Union["da.Array", "dd.DataFrame"] | |
TrainReturnT | |
LOGGER = logging.getLogger("[xgboost.dask]") | |
_MapRetT = TypeVar("_MapRetT") | |
_DataParts = List[Dict[str, Any]] | |
Dask extensions for distributed training
----------------------------------------

See :doc:`Distributed XGBoost with Dask </tutorials/dask>` for a simple tutorial. Also
see :doc:`/python/dask-examples/index` for some examples.

There are two sets of APIs in this module: one is the functional API, including the
``train`` and ``predict`` methods; the other is the stateful Scikit-Learn wrapper
inherited from the single-node Scikit-Learn interface.

The implementation is heavily influenced by dask_xgboost:
https://github.com/dask/dask-xgboost

Optional dask configuration
===========================

- **xgboost.scheduler_address**: Specify the scheduler address, see :ref:`tracker-ip`.

  .. versionadded:: 1.6.0

  .. code-block:: python

      dask.config.set({"xgboost.scheduler_address": "192.0.0.100"})
      # We can also specify the port.
      dask.config.set({"xgboost.scheduler_address": "192.0.0.100:12345"})
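The Scikit-Learn wrapper mentioned above is used much like the single-node estimators.
The following is a minimal, illustrative sketch (not part of the generated reference);
it assumes a local cluster and synthetic data, and the variable names are arbitrary:

.. code-block:: python

    import dask.array as da
    import xgboost as xgb
    from dask.distributed import Client, LocalCluster

    # Start a local cluster; in practice you would usually connect to an
    # existing scheduler instead.
    cluster = LocalCluster(n_workers=2)
    client = Client(cluster)

    # Synthetic training data, partitioned into chunks across workers.
    X = da.random.random((1000, 10), chunks=(100, 10))
    y = da.random.random(1000, chunks=100)

    reg = xgb.dask.DaskXGBRegressor(n_estimators=10, tree_method="hist")
    reg.client = client  # attach the dask client used for training
    reg.fit(X, y)

    pred = reg.predict(X)  # lazily evaluated dask array
    print(pred.compute()[:5])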
protected
A switch function for the async environment.

protected
Get data that is local to the worker from a DaskDMatrix. Returns a DMatrix object.

protected
Get rabit context arguments from the data distribution in a DaskDMatrix.

protected
Create a dummy test sample to infer the output shape for prediction.

protected
Return a dataframe for prediction when applicable.

protected
Temporarily set the client for the sklearn model.

protected
Start the Rabit tracker, recursing to try different addresses.

protected
A simple wrapper around testing for None.
_T xgboost.dask.dconcat(Sequence[_T] value)
Concatenate sequence of partitions.
Any xgboost.dask.inplace_predict(Optional["distributed.Client"] client, Union[TrainReturnT, Booster, "distributed.Future"] model, _DataT data, Tuple[int, int] iteration_range=(0, 0), str predict_type="value", float missing=numpy.nan, bool validate_features=True, Optional[_DaskCollection] base_margin=None, bool strict_shape=False)
Inplace prediction. See doc in :py:meth:`xgboost.Booster.inplace_predict` for details.

.. versionadded:: 1.1.0

Parameters
----------
client:
    Specify the dask client used for training. Use the default client returned from
    dask if it's set to None.
model:
    See :py:func:`xgboost.dask.predict` for details.
data:
    dask collection.
iteration_range:
    See :py:meth:`xgboost.Booster.predict` for details.
predict_type:
    See :py:meth:`xgboost.Booster.inplace_predict` for details.
missing:
    Value in the input data which needs to be present as a missing value. If None,
    defaults to np.nan.
base_margin:
    See :py:obj:`xgboost.DMatrix` for details.

    .. versionadded:: 1.4.0

strict_shape:
    See :py:meth:`xgboost.Booster.predict` for details.

    .. versionadded:: 1.4.0

Returns
-------
prediction:
    When the input data is ``dask.array.Array``, the return value is an array; when the
    input data is ``dask.dataframe.DataFrame``, the return value can be a
    ``dask.dataframe.Series`` or ``dask.dataframe.DataFrame``, depending on the output
    shape.
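Below is a short, hedged sketch of calling ``inplace_predict`` with the dictionary
returned by :py:func:`xgboost.dask.train`; the data and parameter values are
illustrative only:

.. code-block:: python

    import dask.array as da
    import xgboost as xgb
    from dask.distributed import Client

    client = Client()  # or connect to an existing scheduler

    X = da.random.random((1000, 10), chunks=(100, 10))
    y = da.random.random(1000, chunks=100)
    dtrain = xgb.dask.DaskDMatrix(client, X, y)
    output = xgb.dask.train(client, {"tree_method": "hist"}, dtrain, num_boost_round=4)

    # Predict directly from the dask array, skipping DaskDMatrix construction.
    pred = xgb.dask.inplace_predict(client, output, X)
    print(pred.compute()[:5])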
List[_MapRetT] xgboost.dask.map_worker_partitions(Optional["distributed.Client"] client, Callable[..., _MapRetT] func, *Any refs, Sequence[str] workers)
Map a function onto partitions of each worker.
Any xgboost.dask.predict(Optional["distributed.Client"] client, Union[TrainReturnT, Booster, "distributed.Future"] model, Union[DaskDMatrix, _DataT] data, bool output_margin=False, float missing=numpy.nan, bool pred_leaf=False, bool pred_contribs=False, bool approx_contribs=False, bool pred_interactions=False, bool validate_features=True, Tuple[int, int] iteration_range=(0, 0), bool strict_shape=False)
Run prediction with a trained booster.

.. note::

    Using ``inplace_predict`` might be faster when some features are not needed. See
    :py:meth:`xgboost.Booster.predict` for details on the various parameters. When the
    output has more than 2 dimensions (shap value, leaf with strict_shape), the input
    should be ``da.Array`` or ``DaskDMatrix``.

.. versionadded:: 1.0.0

Parameters
----------
client:
    Specify the dask client used for training. Use the default client returned from
    dask if it's set to None.
model:
    The trained model. It can be a distributed.Future so the user can pre-scatter it
    onto all workers.
data:
    Input data used for prediction. When the input is a dataframe object, the
    prediction output is a series.
missing:
    Used when the input data is not DaskDMatrix. Specify the value considered as
    missing.

Returns
-------
prediction: dask.array.Array/dask.dataframe.Series
    When the input data is ``dask.array.Array`` or ``DaskDMatrix``, the return value is
    an array; when the input data is ``dask.dataframe.DataFrame``, the return value can
    be a ``dask.dataframe.Series`` or ``dask.dataframe.DataFrame``, depending on the
    output shape.
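A minimal sketch of the two supported input types (``DaskDMatrix`` and a dask
collection), assuming a running client; the names and data here are illustrative:

.. code-block:: python

    import dask.array as da
    import xgboost as xgb
    from dask.distributed import Client

    client = Client()  # or connect to an existing scheduler

    X = da.random.random((1000, 10), chunks=(100, 10))
    y = da.random.random(1000, chunks=100)
    dtrain = xgb.dask.DaskDMatrix(client, X, y)
    output = xgb.dask.train(client, {"tree_method": "hist"}, dtrain, num_boost_round=4)

    # `model` may be the returned dict, the booster itself, or a scattered future.
    pred_dmatrix = xgb.dask.predict(client, output, dtrain)       # from DaskDMatrix
    pred_array = xgb.dask.predict(client, output["booster"], X)   # from a dask array
    print(pred_array.compute()[:5])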
Any xgboost.dask.train("distributed.Client" client, Dict[str, Any] params, DaskDMatrix dtrain, int num_boost_round=10, *Optional[Sequence[Tuple[DaskDMatrix, str]]] evals=None, Optional[Objective] obj=None, Optional[Metric] feval=None, Optional[int] early_stopping_rounds=None, Optional[Booster] xgb_model=None, Union[int, bool] verbose_eval=True, Optional[Sequence[TrainingCallback]] callbacks=None, Optional[Metric] custom_metric=None)
Train XGBoost model.

.. versionadded:: 1.0.0

.. note::

    Other parameters are the same as :py:func:`xgboost.train` except for
    `evals_result`, which is returned as part of the function return value instead of
    as an argument.

Parameters
----------
client :
    Specify the dask client used for training. Use the default client returned from
    dask if it's set to None.

Returns
-------
results: dict
    A dictionary containing the trained booster and evaluation history. The `history`
    field is the same as `eval_result` from `xgboost.train`.

    .. code-block:: python

        {'booster': xgboost.Booster,
         'history': {'train': {'logloss': ['0.48253', '0.35953']},
                     'eval': {'logloss': ['0.480385', '0.357756']}}}
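The following hedged sketch shows a complete training call with an evaluation set and
how the returned dictionary is typically unpacked; the data and parameters are
illustrative only:

.. code-block:: python

    import dask.array as da
    import xgboost as xgb
    from dask.distributed import Client

    client = Client()  # or connect to an existing scheduler

    X = da.random.random((1000, 10), chunks=(100, 10))
    y = da.random.random(1000, chunks=100)
    dtrain = xgb.dask.DaskDMatrix(client, X, y)

    output = xgb.dask.train(
        client,
        {"tree_method": "hist", "objective": "reg:squarederror"},
        dtrain,
        num_boost_round=10,
        evals=[(dtrain, "train")],
    )

    booster = output["booster"]   # trained xgboost.Booster
    history = output["history"]   # e.g. history["train"]["rmse"] is a list of scores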
xgboost.dask.TrainReturnT