Medial Code Documentation
Data Structures | |
class | ClickFold |
class | PBM |
class | RelDataCV |
Functions | |
Generator[Tuple[np.ndarray, np.ndarray], None, None] | np_dtypes (int n_samples, int n_features) |
Generator | pd_dtypes () |
Generator | pd_arrow_dtypes () |
None | check_inf (RNG rng) |
Tuple[np.ndarray, np.ndarray] | get_california_housing () |
Tuple[np.ndarray, np.ndarray] | get_digits () |
Tuple[np.ndarray, np.ndarray] | get_cancer () |
Tuple[np.ndarray, np.ndarray] | get_sparse () |
Tuple[np.ndarray, np.ndarray] | get_ames_housing () |
Tuple[sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray] | get_mq2008 (str dpath) |
Tuple[npt.NDArray, npt.NDArray, npt.NDArray] | rlencode (npt.NDArray[np.int32] x) |
npt.NDArray[np.float32] | init_rank_score (sparse.csr_matrix X, npt.NDArray[np.int32] y, npt.NDArray[np.int32] qid, float sample_rate=0.1) |
ClickFold | simulate_one_fold (Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32]] fold, npt.NDArray[np.float32] scores_fold) |
Tuple[ClickFold, Optional[ClickFold]] | simulate_clicks (RelDataCV cv_data) |
Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32], npt.NDArray[np.int32]] | sort_ltr_samples (sparse.csr_matrix X, npt.NDArray[np.int32] y, npt.NDArray[np.int32] qid, npt.NDArray[np.int32] clicks, npt.NDArray[np.int64] pos) |
Variables | |
joblib = pytest.importorskip("joblib") | |
memory = joblib.Memory("./cachedir", verbose=0) | |
RelData = Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32]] | |
Utilities for data generation.
None xgboost.testing.data.check_inf (RNG rng)
Validate there's no inf in X.
Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_ames_housing ()
Number of samples: 1460
Number of features: 20
Number of categorical features: 10
Number of numerical features: 10
Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_california_housing ()
Fetch the California housing dataset from sklearn.
Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_cancer ()
Fetch the breast cancer dataset from sklearn.
Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_digits ()
Fetch the digits dataset from sklearn.
Tuple[sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray] xgboost.testing.data.get_mq2008 (str dpath)
Fetch the mq2008 dataset.
Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_sparse ()
Generate a sparse dataset.
npt.NDArray[np.float32] xgboost.testing.data.init_rank_score (sparse.csr_matrix X, npt.NDArray[np.int32] y, npt.NDArray[np.int32] qid, float sample_rate=0.1)
We use XGBoost to generate the initial score instead of SVMRank for simplicity. Sample rate is set to 0.1 by default so that we can test with small datasets.
Generator[Tuple[np.ndarray, np.ndarray], None, None] xgboost.testing.data.np_dtypes (int n_samples, int n_features)
Enumerate all supported dtypes from numpy.
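As a rough illustration of what enumerating dtypes can look like, the sketch below yields a reference array together with the same data cast to each dtype in a short list. The name `iter_dtype_pairs`, the seed, the value range, and the dtype list are all assumptions made for this example; the dtypes actually covered by `np_dtypes` may differ.

```python
from typing import Generator, Tuple

import numpy as np


def iter_dtype_pairs(
    n_samples: int, n_features: int
) -> Generator[Tuple[np.ndarray, np.ndarray], None, None]:
    """Yield (reference, cast) pairs, one per dtype in a hypothetical list."""
    rng = np.random.default_rng(0)
    # Small non-negative integers are exactly representable in every dtype below.
    orig = rng.integers(0, 127, size=(n_samples, n_features))
    # Hypothetical selection; the real helper may cover a different set.
    for dtype in (np.int32, np.int64, np.uint8, np.float32, np.float64):
        yield orig, orig.astype(dtype)


pairs = list(iter_dtype_pairs(4, 3))
```

Each pair lets a test feed the cast array to the code under test and compare the result against the one obtained from the reference array.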
Generator xgboost.testing.data.pd_arrow_dtypes ()
Pandas DataFrame with pyarrow-backed types.
Generator xgboost.testing.data.pd_dtypes ()
Enumerate all supported pandas extension types.
Tuple[npt.NDArray, npt.NDArray, npt.NDArray] xgboost.testing.data.rlencode (npt.NDArray[np.int32] x)
Run length encoding using numpy, modified from: https://gist.github.com/nvictus/66627b580c13068589957d6ab0919e66
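A minimal sketch of run-length encoding with numpy follows. It is not the library's implementation: the name `rle_sketch` and the order and meaning of the returned triple (run starts, run lengths, run values) are assumptions chosen for the example.

```python
from typing import Tuple

import numpy as np


def rle_sketch(x: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return (starts, lengths, values) for consecutive runs in a 1-D array."""
    x = np.asarray(x)
    if x.size == 0:
        empty = np.array([], dtype=np.int64)
        return empty, empty, x
    # A run starts at index 0 and wherever an element differs from its predecessor.
    starts = np.r_[0, np.flatnonzero(x[1:] != x[:-1]) + 1]
    # Appending the total size lets diff produce the length of the last run too.
    lengths = np.diff(np.r_[starts, x.size])
    values = x[starts]
    return starts, lengths, values


qid = np.array([1, 1, 1, 4, 4, 7], dtype=np.int32)
starts, lengths, values = rle_sketch(qid)
# starts  -> [0, 3, 5]
# lengths -> [3, 2, 1]
# values  -> [1, 4, 7]
```

Applied to a sorted query-id array, the run starts and lengths delimit the rows that belong to each query.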
Tuple[ClickFold, Optional[ClickFold]] xgboost.testing.data.simulate_clicks (RelDataCV cv_data)
Simulate click data using a position-biased model (PBM).
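In a position-biased model, a document is clicked when it is both examined (which depends on its display position) and found attractive (which depends on its relevance). The sketch below illustrates that factorization only. The function name `pbm_clicks_sketch`, the `eta` parameter, and both probability formulas are assumptions for illustration and are not taken from the `PBM` class documented here.

```python
import numpy as np


def pbm_clicks_sketch(labels, positions, rng, eta=1.0):
    """Sample clicks with P(click) = P(examined | position) * P(attractive | label)."""
    labels = np.asarray(labels)
    positions = np.asarray(positions)
    # Assumed examination model: probability decays with the 0-based position.
    p_exam = (1.0 / (positions + 1.0)) ** eta
    # Assumed attractiveness model: graded relevance mapped into [0, 1).
    max_label = max(int(labels.max()), 1)
    p_rel = (2.0 ** labels - 1.0) / (2.0 ** max_label)
    p_click = p_exam * p_rel
    # One Bernoulli draw per document.
    clicks = (rng.random(labels.shape) < p_click).astype(np.int32)
    return clicks, p_click


rng = np.random.default_rng(0)
labels = np.array([2, 0, 1, 2])
positions = np.array([0, 1, 2, 3])
clicks, p_click = pbm_clicks_sketch(labels, positions, rng)
```

With these assumed formulas, a document with label 0 is never clicked, and two equally relevant documents are clicked less often the lower they are displayed.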
ClickFold xgboost.testing.data.simulate_one_fold (Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32]] fold, npt.NDArray[np.float32] scores_fold)
Simulate clicks for one fold.
Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32], npt.NDArray[np.int32]] xgboost.testing.data.sort_ltr_samples (sparse.csr_matrix X, npt.NDArray[np.int32] y, npt.NDArray[np.int32] qid, npt.NDArray[np.int32] clicks, npt.NDArray[np.int64] pos)
Sort data based on query index and position.
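One way to sort by query index and then by position is a single `np.lexsort` call, sketched below. This is an illustration rather than the library's implementation: the name `sort_by_query_and_position` is hypothetical, a dense array stands in for the CSR matrix, and returning `pos` alongside the other arrays is an assumption.

```python
import numpy as np


def sort_by_query_and_position(X, y, qid, clicks, pos):
    """Reorder rows so queries are contiguous and ordered by position within each query."""
    # lexsort treats the LAST key as the primary one: qid first, then pos.
    order = np.lexsort((pos, qid))
    return X[order], y[order], qid[order], clicks[order], pos[order]


X = np.arange(10).reshape(5, 2)  # dense stand-in for the sparse feature matrix
y = np.array([0, 2, 1, 1, 0], dtype=np.int32)
qid = np.array([2, 1, 2, 1, 1], dtype=np.int32)
clicks = np.array([0, 1, 1, 0, 0], dtype=np.int32)
pos = np.array([1, 2, 0, 0, 1], dtype=np.int64)

X_s, y_s, qid_s, clicks_s, pos_s = sort_by_query_and_position(X, y, qid, clicks, pos)
# qid_s -> [1, 1, 1, 2, 2]
# pos_s -> [0, 1, 2, 0, 1]
```

Applying the same permutation to every array keeps features, labels, clicks, and positions aligned row by row.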