Medial Code Documentation
Data Structures

| Kind | Name |
|---|---|
| class | PartIter |

Functions

| Return type | Name and arguments |
|---|---|
| np.ndarray | stack_series(pd.Series series) |
| Optional[np.ndarray] | concat_or_none(Optional[Sequence[np.ndarray]] seq) |
| None | cache_partitions(Iterator[pd.DataFrame] iterator, Callable[[pd.DataFrame, str, bool], None] append) |
| csr_matrix | _read_csr_matrix_from_unwrapped_spark_vec(pd.DataFrame part) |
| DMatrix | make_qdm(Dict[str, List[np.ndarray]] data, Optional[int] dev_ordinal, Dict[str, Any] meta, Optional[DMatrix] ref, Dict[str, Any] params) |
| Tuple[DMatrix, Optional[DMatrix]] | create_dmatrix_from_partitions(Iterator[pd.DataFrame] iterator, Optional[Sequence[str]] feature_cols, Optional[int] dev_ordinal, bool use_qdm, Dict[str, Any] kwargs, bool enable_sparse_data_optim, bool has_validation_col) |
| np.ndarray | pred_contribs(XGBModel model, ArrayLike data, Optional[ArrayLike] base_margin=None, bool strict_shape=False) |

Variables

- `Alias = namedtuple("Alias", ("data", "label", "weight", "margin", "valid", "qid"))`
- `alias = Alias("values", "label", "weight", "baseMargin", "validationIndicator", "qid")`
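The two module-level variables above can be reproduced directly; the `alias` tuple maps each data role to the spark dataframe column name used by this module:

```python
from collections import namedtuple

# Named roles for the dataframe columns handled by xgboost.spark.data.
Alias = namedtuple("Alias", ("data", "label", "weight", "margin", "valid", "qid"))

# Maps each role to the actual spark dataframe column name.
alias = Alias("values", "label", "weight", "baseMargin", "validationIndicator", "qid")

print(alias.margin)  # the base-margin column is named "baseMargin"
```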
Utilities for processing spark partitions.
None xgboost.spark.data.cache_partitions(Iterator[pd.DataFrame] iterator, Callable[[pd.DataFrame, str, bool], None] append)

Extract partitions from a pyspark iterator. `append` is a user-defined function that accepts a new partition.
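A minimal sketch of how such a partition cache might behave, assuming the `validationIndicator` column name from the `alias` tuple above; this is an illustration of the callback contract, not the library's exact implementation:

```python
from typing import Callable, Iterator

import pandas as pd


def cache_partitions(
    iterator: Iterator[pd.DataFrame],
    append: Callable[[pd.DataFrame, str, bool], None],
) -> None:
    # Split each incoming partition into training and validation rows and
    # hand each chunk to the user-defined `append(part, name, is_valid)`.
    for part in iterator:
        if "validationIndicator" in part.columns:
            append(part.loc[~part["validationIndicator"]], "values", False)
            append(part.loc[part["validationIndicator"]], "values", True)
        else:
            append(part, "values", False)
```

The `append` callback decides how chunks are stored (in memory, on disk, etc.); `cache_partitions` only drives the iteration and the train/validation split.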
Optional[np.ndarray] xgboost.spark.data.concat_or_none(Optional[Sequence[np.ndarray]] seq)

Concatenate the data if it's not None.
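A plausible sketch of this helper; the `None` case covers optional meta info (such as weights or base margins) that may be absent entirely:

```python
from typing import Optional, Sequence

import numpy as np


def concat_or_none(seq: Optional[Sequence[np.ndarray]]) -> Optional[np.ndarray]:
    # Optional fields (weight, margin, ...) may not be collected at all.
    if seq is None:
        return None
    return np.concatenate(seq)
```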
Tuple[DMatrix, Optional[DMatrix]] xgboost.spark.data.create_dmatrix_from_partitions(Iterator[pd.DataFrame] iterator, Optional[Sequence[str]] feature_cols, Optional[int] dev_ordinal, bool use_qdm, Dict[str, Any] kwargs, bool enable_sparse_data_optim, bool has_validation_col)

Create DMatrix from spark data partitions.
Parameters
----------
iterator :
    Pyspark partition iterator.
feature_cols :
    A sequence of feature names, used only when the rapids plugin is enabled.
dev_ordinal :
    Device ordinal, used when GPU is enabled.
use_qdm :
    Whether QuantileDMatrix should be used instead of DMatrix.
kwargs :
    Metainfo for DMatrix.
enable_sparse_data_optim :
    Whether sparse data should be unwrapped.
has_validation_col :
    Whether there's validation data.

Returns
-------
Training DMatrix and an optional validation DMatrix.
DMatrix xgboost.spark.data.make_qdm(Dict[str, List[np.ndarray]] data, Optional[int] dev_ordinal, Dict[str, Any] meta, Optional[DMatrix] ref, Dict[str, Any] params)

Handle an empty partition when constructing a QuantileDMatrix.
np.ndarray xgboost.spark.data.pred_contribs(XGBModel model, ArrayLike data, Optional[ArrayLike] base_margin=None, bool strict_shape=False)

Predict feature contributions for `data` with the full model.
np.ndarray xgboost.spark.data.stack_series(pd.Series series)

Stack a series of arrays.
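Assuming each element of the series is itself a fixed-length array (e.g. one row of feature values), the helper can be sketched as:

```python
import numpy as np
import pandas as pd


def stack_series(series: pd.Series) -> np.ndarray:
    # The series holds one ndarray per row; stack them into a 2-D ndarray
    # so downstream code can treat the partition as a single matrix.
    return np.stack(series.to_numpy())
```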