Data Structures
class	ClickFold

class	PBM

class	RelDataCV

Functions
Generator[Tuple[np.ndarray, np.ndarray], None, None]	np_dtypes (int n_samples, int n_features)

Generator	pd_dtypes ()

Generator	pd_arrow_dtypes ()

None	check_inf (RNG rng)

Tuple[np.ndarray, np.ndarray]	get_california_housing ()

Tuple[np.ndarray, np.ndarray]	get_digits ()

Tuple[np.ndarray, np.ndarray]	get_cancer ()

Tuple[np.ndarray, np.ndarray]	get_sparse ()

Tuple[np.ndarray, np.ndarray]	get_ames_housing ()

Tuple[ sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray,]	get_mq2008 (str dpath)

Tuple[npt.NDArray, npt.NDArray, npt.NDArray]	rlencode (npt.NDArray[np.int32] x)

npt.NDArray[np.float32]	init_rank_score (sparse.csr_matrix X, npt.NDArray[np.int32] y, npt.NDArray[np.int32] qid, float sample_rate=0.1)

ClickFold	simulate_one_fold (Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32]] fold, npt.NDArray[np.float32] scores_fold)

Tuple[ClickFold, Optional[ClickFold]]	simulate_clicks (RelDataCV cv_data)

Tuple[ sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32], npt.NDArray[np.int32],]	sort_ltr_samples (sparse.csr_matrix X, npt.NDArray[np.int32] y, npt.NDArray[np.int32] qid, npt.NDArray[np.int32] clicks, npt.NDArray[np.int64] pos)

Variables
	joblib = pytest.importorskip("joblib")

	memory = joblib.Memory("./cachedir", verbose=0)

	RelData = Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32]]

Detailed Description

Utilities for data generation.

Function Documentation

◆ check_inf()

None xgboost.testing.data.check_inf ( RNG rng )

Validate there's no inf in X.

◆ get_ames_housing()

Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_ames_housing ( )

Number of samples: 1460
Number of features: 20
Number of categorical features: 10
Number of numerical features: 10

◆ get_california_housing()

Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_california_housing ( )

Fetch the California housing dataset from sklearn.

◆ get_cancer()

Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_cancer ( )

Fetch the breast cancer dataset from sklearn.

◆ get_digits()

Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_digits ( )

Fetch the digits dataset from sklearn.

◆ get_mq2008()

Tuple[ sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray, ] xgboost.testing.data.get_mq2008 ( str dpath )

Fetch the mq2008 dataset.

◆ get_sparse()

Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_sparse ( )

Generate a sparse dataset.

◆ init_rank_score()

npt.NDArray[np.float32] xgboost.testing.data.init_rank_score	(	sparse.csr_matrix	X,
		npt.NDArray[np.int32]	y,
		npt.NDArray[np.int32]	qid,
		float	sample_rate = 0.1
	)

Generate the initial ranking scores with XGBoost rather than SVMRank, for
simplicity. The sample rate defaults to 0.1 so that tests can run on small
datasets.

◆ np_dtypes()

Generator[Tuple[np.ndarray, np.ndarray], None, None] xgboost.testing.data.np_dtypes	(	int	n_samples,
		int	n_features
	)

Enumerate all supported NumPy dtypes.
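
To illustrate the generator pattern, here is a minimal sketch that yields pairs of arrays for a list of dtypes. It is not the library's implementation: the name `dtype_pairs_sketch`, the dtype list, the seed, and the meaning of each pair (a float32 reference plus the same data cast to another dtype) are all assumptions made for this example.

```python
from typing import Generator, Tuple

import numpy as np


def dtype_pairs_sketch(
    n_samples: int, n_features: int
) -> Generator[Tuple[np.ndarray, np.ndarray], None, None]:
    """Yield (reference, converted) pairs, one per dtype (hypothetical helper)."""
    rng = np.random.default_rng(1994)  # arbitrary seed, chosen for the example
    # Small non-negative integers survive a cast to every dtype listed below.
    base = rng.integers(0, 100, size=(n_samples, n_features))
    # Illustrative list only; the real function may cover a different set.
    dtypes = [
        np.int8, np.int16, np.int32, np.int64,
        np.uint8, np.uint16, np.uint32, np.uint64,
        np.float16, np.float32, np.float64,
    ]
    reference = base.astype(np.float32)
    for dtype in dtypes:
        yield reference, base.astype(dtype)


pairs = list(dtype_pairs_sketch(4, 3))
```

A test can then loop over the pairs and check that a model trained on the converted array matches one trained on the reference.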

◆ pd_arrow_dtypes()

Generator xgboost.testing.data.pd_arrow_dtypes ( )

Generate pandas DataFrames with pyarrow-backed types.

◆ pd_dtypes()

Generator xgboost.testing.data.pd_dtypes ( )

Enumerate all supported pandas extension types.

◆ rlencode()

Tuple[npt.NDArray, npt.NDArray, npt.NDArray] xgboost.testing.data.rlencode ( npt.NDArray[np.int32] x )

Run-length encoding using NumPy, modified from:
https://gist.github.com/nvictus/66627b580c13068589957d6ab0919e66
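
As an illustration of run-length encoding with NumPy, the sketch below splits a 1-D array into runs of equal values. It is a stand-alone example, not the library's code: the name `rle_sketch` is hypothetical, and the return order (starts, lengths, values) is an assumption, since the signature above only shows that three arrays come back.

```python
import numpy as np


def rle_sketch(x):
    """Run-length encode a 1-D array; returns (starts, lengths, values)."""
    x = np.asarray(x)
    n = x.shape[0]
    if n == 0:
        empty = np.array([], dtype=np.int64)
        return empty, empty, np.array([], dtype=x.dtype)
    # A run starts at index 0 and wherever a value differs from its predecessor.
    is_start = np.concatenate(([True], x[1:] != x[:-1]))
    starts = np.flatnonzero(is_start)
    # Each run's length is the gap to the next start (or to the end of the array).
    lengths = np.diff(np.append(starts, n))
    values = x[starts]
    return starts, lengths, values


starts, lengths, values = rle_sketch(np.array([3, 3, 3, 7, 7, 9], dtype=np.int32))
```

Applied to a sorted query-id array, the run starts mark where each query group begins, and `np.repeat(values, lengths)` reconstructs the input.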

◆ simulate_clicks()

Tuple[ClickFold, Optional[ClickFold]] xgboost.testing.data.simulate_clicks ( RelDataCV cv_data )

Simulate click data using a position-biased model (PBM).
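
In a position-biased model, the click probability factors into the chance that a result is examined at its display position and the chance that it attracts a click given its relevance. The sketch below shows that factorisation only. It does not reproduce the `PBM` class documented here: the function name, the `eta` and `max_rel` parameters, and both probability formulas are assumptions chosen for illustration.

```python
import numpy as np


def pbm_clicks_sketch(relevance, position, rng, eta=1.0, max_rel=4):
    """Sample clicks under a simple position-biased model (hypothetical helper).

    P(click) = P(examined | position) * P(attractive | relevance)
    """
    relevance = np.asarray(relevance, dtype=np.float64)
    position = np.asarray(position, dtype=np.float64)
    # Assumed examination model: probability decays with the 0-based position.
    p_exam = (1.0 / (position + 1.0)) ** eta
    # Assumed attractiveness model: grows with the graded relevance label.
    p_attr = (2.0 ** relevance - 1.0) / (2.0 ** max_rel)
    p_click = p_exam * p_attr
    # Draw one Bernoulli sample per document.
    clicks = (rng.random(p_click.shape) < p_click).astype(np.int32)
    return clicks, p_click


rng = np.random.default_rng(0)
rel = np.array([4, 2, 0, 3])
pos = np.array([0, 1, 2, 3])
clicks, p_click = pbm_clicks_sketch(rel, pos, rng)
```

Under these assumptions a highly relevant document shown far down the list can receive fewer clicks than a mediocre one at the top, which is the bias that unbiased learning-to-rank tests need to exercise.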

◆ simulate_one_fold()

ClickFold xgboost.testing.data.simulate_one_fold	(	Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32]]	fold,
		npt.NDArray[np.float32]	scores_fold
	)

Simulate clicks for one fold.

◆ sort_ltr_samples()

Tuple[ sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32], npt.NDArray[np.int32], ] xgboost.testing.data.sort_ltr_samples	(	sparse.csr_matrix	X,
		npt.NDArray[np.int32]	y,
		npt.NDArray[np.int32]	qid,
		npt.NDArray[np.int32]	clicks,
		npt.NDArray[np.int64]	pos
	)

Sort data based on query index and position.
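
One way to sort by query index and then by position is a single `np.lexsort` call. The sketch below uses that technique, which may differ from what the library does internally. The function name is hypothetical, the return order simply mirrors the input order and is an assumption, and a dense array stands in for the sparse matrix (a `csr_matrix` accepts the same integer-array row indexing).

```python
import numpy as np


def sort_by_query_then_position(X, y, qid, clicks, pos):
    """Make queries contiguous and order rows by position within each query."""
    # np.lexsort treats the LAST key as primary: qid is the major key, pos breaks ties.
    order = np.lexsort((pos, qid))
    return X[order], y[order], qid[order], clicks[order]


X = np.arange(8, dtype=np.float32).reshape(4, 2)
y = np.array([10, 11, 12, 13], dtype=np.int32)
qid = np.array([2, 1, 2, 1], dtype=np.int32)
clicks = np.array([0, 1, 1, 0], dtype=np.int32)
pos = np.array([1, 0, 0, 1], dtype=np.int64)

X_s, y_s, qid_s, clicks_s = sort_by_query_then_position(X, y, qid, clicks, pos)
```

After sorting, rows of the same query are adjacent, so a run-length encoding of the sorted `qid` gives the group boundaries directly.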
