Medial Code Documentation
Data Structures
- class PartIter

Functions
- np.ndarray stack_series(pd.Series series)
- Optional[np.ndarray] concat_or_none(Optional[Sequence[np.ndarray]] seq)
- None cache_partitions(Iterator[pd.DataFrame] iterator, Callable[[pd.DataFrame, str, bool], None] append)
- csr_matrix _read_csr_matrix_from_unwrapped_spark_vec(pd.DataFrame part)
- DMatrix make_qdm(Dict[str, List[np.ndarray]] data, Optional[int] dev_ordinal, Dict[str, Any] meta, Optional[DMatrix] ref, Dict[str, Any] params)
- Tuple[DMatrix, Optional[DMatrix]] create_dmatrix_from_partitions(Iterator[pd.DataFrame] iterator, Optional[Sequence[str]] feature_cols, Optional[int] dev_ordinal, bool use_qdm, Dict[str, Any] kwargs, bool enable_sparse_data_optim, bool has_validation_col)
- np.ndarray pred_contribs(XGBModel model, ArrayLike data, Optional[ArrayLike] base_margin=None, bool strict_shape=False)

Variables
- Alias = namedtuple("Alias", ("data", "label", "weight", "margin", "valid", "qid"))
- alias = Alias("values", "label", "weight", "baseMargin", "validationIndicator", "qid")
Utilities for processing spark partitions.
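The `Alias`/`alias` pair listed under Variables maps each logical dataset role to the Spark DataFrame column name it is stored under. A quick illustration, mirroring the definitions above:

```python
from collections import namedtuple

# Mirrors the module-level definitions listed above.
Alias = namedtuple("Alias", ("data", "label", "weight", "margin", "valid", "qid"))
alias = Alias("values", "label", "weight", "baseMargin", "validationIndicator", "qid")

# Each field resolves a logical role to a Spark column name.
print(alias.margin)  # baseMargin
print(alias.valid)   # validationIndicator
```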
None xgboost.spark.data.cache_partitions(Iterator[pd.DataFrame] iterator, Callable[[pd.DataFrame, str, bool], None] append)

Extract partitions from a pyspark iterator. `append` is a user-defined function that accepts each new partition.
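The extraction loop can be sketched as follows. This is a minimal sketch, not the verified implementation: the `(partition, column_name, is_valid)` contract of `append` and the use of the `validationIndicator` column to split training from validation rows are assumptions for illustration.

```python
from typing import Callable, Iterator, List
import pandas as pd

def cache_partitions_sketch(
    iterator: Iterator[pd.DataFrame],
    append: Callable[[pd.DataFrame, str, bool], None],
) -> None:
    """Drain a partition iterator, handing each chunk to `append`.

    The (partition, name, is_valid) contract of `append` is assumed here.
    """
    for part in iterator:
        if "validationIndicator" in part.columns:
            mask = part["validationIndicator"].astype(bool)
            append(part[~mask], "values", False)  # training rows
            append(part[mask], "values", True)    # validation rows
        else:
            append(part, "values", False)

# Collect partitions into plain lists for demonstration.
train: List[pd.DataFrame] = []
valid: List[pd.DataFrame] = []

def collect(part: pd.DataFrame, name: str, is_valid: bool) -> None:
    (valid if is_valid else train).append(part)

parts = iter([
    pd.DataFrame({"values": [1.0, 2.0], "validationIndicator": [False, True]}),
    pd.DataFrame({"values": [3.0], "validationIndicator": [False]}),
])
cache_partitions_sketch(parts, collect)
print(sum(len(p) for p in train), sum(len(p) for p in valid))  # 2 1
```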
Optional[np.ndarray] xgboost.spark.data.concat_or_none(Optional[Sequence[np.ndarray]] seq)
Concatenate the data if it's not None.
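The behavior implied by the signature and the one-line description above can be sketched in a few lines (a minimal sketch; the real helper may handle extra edge cases such as an empty sequence differently):

```python
from typing import Optional, Sequence
import numpy as np

def concat_or_none(seq: Optional[Sequence[np.ndarray]]) -> Optional[np.ndarray]:
    # Pass None through untouched; otherwise concatenate along axis 0.
    if seq is None:
        return None
    return np.concatenate(seq)

print(concat_or_none(None))                               # None
print(concat_or_none([np.array([1, 2]), np.array([3])]))  # [1 2 3]
```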
Tuple[DMatrix, Optional[DMatrix]] xgboost.spark.data.create_dmatrix_from_partitions(Iterator[pd.DataFrame] iterator, Optional[Sequence[str]] feature_cols, Optional[int] dev_ordinal, bool use_qdm, Dict[str, Any] kwargs, bool enable_sparse_data_optim, bool has_validation_col)
Create DMatrix from spark data partitions.

Parameters
----------
iterator :
    Pyspark partition iterator.
feature_cols :
    A sequence of feature names, used only when the rapids plugin is enabled.
dev_ordinal :
    Device ordinal, used when GPU is enabled.
use_qdm :
    Whether QuantileDMatrix should be used instead of DMatrix.
kwargs :
    Metainfo for DMatrix.
enable_sparse_data_optim :
    Whether sparse data should be unwrapped.
has_validation_col :
    Whether there's validation data.

Returns
-------
The training DMatrix and an optional validation DMatrix.
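The overall flow can be sketched with a stand-in class instead of the real xgboost.DMatrix. Everything below is illustrative: `FakeDMatrix`, the column names, and the split-then-concatenate strategy are assumptions, not the library's implementation.

```python
from typing import Iterator, List, Optional, Tuple
import numpy as np
import pandas as pd

class FakeDMatrix:
    """Stand-in for xgboost.DMatrix, for illustration only."""
    def __init__(self, data: np.ndarray, label: np.ndarray):
        self.data, self.label = data, label

def create_dmatrix_sketch(
    iterator: Iterator[pd.DataFrame], has_validation_col: bool
) -> Tuple[FakeDMatrix, Optional[FakeDMatrix]]:
    train: List[pd.DataFrame] = []
    valid: List[pd.DataFrame] = []
    for part in iterator:
        if has_validation_col:
            mask = part["validationIndicator"].astype(bool)
            valid.append(part[mask])
            part = part[~mask]
        train.append(part)

    def build(parts: List[pd.DataFrame]) -> FakeDMatrix:
        df = pd.concat(parts, ignore_index=True)
        return FakeDMatrix(df[["values"]].to_numpy(), df["label"].to_numpy())

    dtrain = build(train)
    dvalid = build(valid) if has_validation_col else None
    return dtrain, dvalid

parts = iter([
    pd.DataFrame({"values": [1.0, 2.0], "label": [0, 1],
                  "validationIndicator": [False, True]}),
    pd.DataFrame({"values": [3.0], "label": [1],
                  "validationIndicator": [False]}),
])
dtrain, dvalid = create_dmatrix_sketch(parts, has_validation_col=True)
print(dtrain.data.shape, dvalid.data.shape)  # (2, 1) (1, 1)
```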
DMatrix xgboost.spark.data.make_qdm(Dict[str, List[np.ndarray]] data, Optional[int] dev_ordinal, Dict[str, Any] meta, Optional[DMatrix] ref, Dict[str, Any] params)
Handle empty partition for QuantileDMatrix.
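The empty-partition case can be sketched as follows, again with a stand-in class rather than the real xgboost.QuantileDMatrix. The fallback to a zero-row matrix that shares quantile information via `ref` is an assumption about why the helper exists, not the verified behavior.

```python
from typing import Dict, List, Optional
import numpy as np

class FakeQDM:
    """Stand-in for xgboost.QuantileDMatrix, for illustration only."""
    def __init__(self, data: np.ndarray, ref: Optional["FakeQDM"] = None):
        self.data, self.ref = data, ref

def make_qdm_sketch(
    data: Dict[str, List[np.ndarray]], ref: Optional[FakeQDM]
) -> FakeQDM:
    # An empty partition yields no chunks; fall back to a zero-row
    # matrix (sharing quantile info via `ref`) instead of failing.
    chunks = data.get("values", [])
    if not chunks:
        return FakeQDM(np.empty((0, 0)), ref=ref)
    return FakeQDM(np.concatenate(chunks), ref=ref)

empty = make_qdm_sketch({}, ref=None)
full = make_qdm_sketch({"values": [np.ones((2, 3))]}, ref=empty)
print(empty.data.shape, full.data.shape)  # (0, 0) (2, 3)
```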
np.ndarray xgboost.spark.data.pred_contribs(XGBModel model, ArrayLike data, Optional[ArrayLike] base_margin=None, bool strict_shape=False)

Predict feature contributions for the given data with the full model.
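Contribution predictions follow a general contract: per-feature contributions plus a trailing bias column sum row-wise to the model's raw prediction. The linear model below is a stand-in (not an XGBModel), used only to illustrate that contract; the group-axis reshaping under `strict_shape` is likewise an assumption for illustration.

```python
import numpy as np

def linear_pred_contribs(weights, bias, data, strict_shape=False):
    # For a linear model, each feature's contribution is w_i * x_i,
    # with the intercept reported as a trailing bias column.
    contribs = data * weights                      # (n_rows, n_features)
    bias_col = np.full((data.shape[0], 1), bias)
    out = np.hstack([contribs, bias_col])          # (n_rows, n_features + 1)
    if strict_shape:
        out = out[:, np.newaxis, :]                # explicit group axis
    return out

X = np.array([[1.0, 2.0], [3.0, 0.5]])
w = np.array([0.5, -1.0])
c = linear_pred_contribs(w, bias=0.25, data=X)
# Contributions sum row-wise to the model's prediction.
print(np.allclose(c.sum(axis=1), X @ w + 0.25))  # True
```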
np.ndarray xgboost.spark.data.stack_series(pd.Series series)
Stack a series of arrays.
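The operation can be sketched in one line (a minimal sketch; the real helper may additionally reshape the result):

```python
import numpy as np
import pandas as pd

def stack_series(series: pd.Series) -> np.ndarray:
    # Turn a Series whose elements are arrays into one stacked ndarray.
    return np.stack(series.to_numpy())

s = pd.Series([np.array([1, 2]), np.array([3, 4])])
print(stack_series(s).shape)  # (2, 2)
```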