Medial Code Documentation
xgboost.testing.data Namespace Reference

Data Structures

class  ClickFold
 
class  PBM
 
class  RelDataCV
 

Functions

Generator[Tuple[np.ndarray, np.ndarray], None, None] np_dtypes (int n_samples, int n_features)
 
Generator pd_dtypes ()
 
Generator pd_arrow_dtypes ()
 
None check_inf (RNG rng)
 
Tuple[np.ndarray, np.ndarray] get_california_housing ()
 
Tuple[np.ndarray, np.ndarray] get_digits ()
 
Tuple[np.ndarray, np.ndarray] get_cancer ()
 
Tuple[np.ndarray, np.ndarray] get_sparse ()
 
Tuple[np.ndarray, np.ndarray] get_ames_housing ()
 
Tuple[sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray] get_mq2008 (str dpath)
 
Tuple[npt.NDArray, npt.NDArray, npt.NDArray] rlencode (npt.NDArray[np.int32] x)
 
npt.NDArray[np.float32] init_rank_score (sparse.csr_matrix X, npt.NDArray[np.int32] y, npt.NDArray[np.int32] qid, float sample_rate=0.1)
 
ClickFold simulate_one_fold (Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32]] fold, npt.NDArray[np.float32] scores_fold)
 
Tuple[ClickFold, Optional[ClickFold]] simulate_clicks (RelDataCV cv_data)
 
Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32], npt.NDArray[np.int32]] sort_ltr_samples (sparse.csr_matrix X, npt.NDArray[np.int32] y, npt.NDArray[np.int32] qid, npt.NDArray[np.int32] clicks, npt.NDArray[np.int64] pos)
 

Variables

 joblib = pytest.importorskip("joblib")
 
 memory = joblib.Memory("./cachedir", verbose=0)
 
 RelData = Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32]]
 

Detailed Description

Utilities for data generation.

Function Documentation

◆ check_inf()

None xgboost.testing.data.check_inf ( RNG  rng)
Validate there's no inf in X.

◆ get_ames_housing()

Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_ames_housing ( )
Number of samples: 1460
Number of features: 20
Number of categorical features: 10
Number of numerical features: 10

◆ get_california_housing()

Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_california_housing ( )
Fetch the California housing dataset from sklearn.

◆ get_cancer()

Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_cancer ( )
Fetch the breast cancer dataset from sklearn.

◆ get_digits()

Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_digits ( )
Fetch the digits dataset from sklearn.

◆ get_mq2008()

Tuple[sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray, sparse.csr_matrix, np.ndarray, np.ndarray] xgboost.testing.data.get_mq2008 ( str  dpath)
Fetch the mq2008 dataset.

◆ get_sparse()

Tuple[np.ndarray, np.ndarray] xgboost.testing.data.get_sparse ( )
Generate a sparse dataset.

◆ init_rank_score()

npt.NDArray[np.float32] xgboost.testing.data.init_rank_score ( sparse.csr_matrix  X,
npt.NDArray[np.int32]  y,
npt.NDArray[np.int32]  qid,
float   sample_rate = 0.1 
)
We use XGBoost to generate the initial score instead of SVMRank for
simplicity. Sample rate is set to 0.1 by default so that we can test with small
datasets.

◆ np_dtypes()

Generator[Tuple[np.ndarray, np.ndarray], None, None] xgboost.testing.data.np_dtypes ( int  n_samples,
int   n_features 
)
Enumerate all supported dtypes from numpy.
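
As a rough illustration of what a generator with this return type might look like, the sketch below yields pairs of a float32 reference array and the same data cast to another numpy dtype. The name `iter_np_dtypes_sketch`, the `seed` parameter, and the dtype list are assumptions made for this example; they are not taken from the xgboost source, and the set of dtypes xgboost actually enumerates may differ.

```python
from typing import Generator, Tuple

import numpy as np


def iter_np_dtypes_sketch(
    n_samples: int, n_features: int, seed: int = 0
) -> Generator[Tuple[np.ndarray, np.ndarray], None, None]:
    """Yield (reference, cast) array pairs, one pair per dtype (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Small non-negative integers, so every cast below is lossless.
    base = rng.integers(0, 100, size=(n_samples, n_features)).astype(np.float32)
    # Assumed dtype list, chosen for illustration only.
    dtypes = [
        np.int8, np.int16, np.int32, np.int64,
        np.uint8, np.uint16, np.uint32, np.uint64,
        np.float16, np.float32, np.float64,
    ]
    for dtype in dtypes:
        yield base, base.astype(dtype)


pairs = list(iter_np_dtypes_sketch(4, 3))
```

A test could iterate over such pairs and check that a model trained on the cast array matches one trained on the reference array.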

◆ pd_arrow_dtypes()

Generator xgboost.testing.data.pd_arrow_dtypes ( )
Enumerate pandas DataFrames with pyarrow-backed dtypes.

◆ pd_dtypes()

Generator xgboost.testing.data.pd_dtypes ( )
Enumerate all supported pandas extension types.

◆ rlencode()

Tuple[npt.NDArray, npt.NDArray, npt.NDArray] xgboost.testing.data.rlencode ( npt.NDArray[np.int32]  x)
Run length encoding using numpy, modified from:
https://gist.github.com/nvictus/66627b580c13068589957d6ab0919e66
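
The idea behind run-length encoding with numpy can be sketched as follows. This is a minimal illustration, not the xgboost implementation: the name `rle_sketch` is hypothetical, and it assumes the three returned arrays are run start indices, run lengths, and run values.

```python
import numpy as np


def rle_sketch(x):
    """Return (starts, lengths, values) for the runs of a 1-D array (hypothetical helper)."""
    x = np.asarray(x)
    n = x.shape[0]
    if n == 0:
        empty = np.array([], dtype=np.int64)
        return empty, empty, np.array([], dtype=x.dtype)
    # A run starts at index 0 and wherever an element differs from its predecessor.
    is_start = np.concatenate(([True], x[1:] != x[:-1]))
    starts = np.flatnonzero(is_start)
    # Each run ends where the next one starts; the last run ends at n.
    lengths = np.diff(np.append(starts, n))
    values = x[starts]
    return starts, lengths, values


# Query ids sorted by group, as is typical for learning-to-rank data.
qid = np.array([0, 0, 0, 1, 1, 2], dtype=np.int32)
starts, lengths, values = rle_sketch(qid)
```

Applied to a sorted query-id array, the start indices and lengths give the boundaries of each query group.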

◆ simulate_clicks()

Tuple[ClickFold, Optional[ClickFold]] xgboost.testing.data.simulate_clicks ( RelDataCV  cv_data)
Simulate click data using the position-biased model (PBM).
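
In a position-biased model, the probability of a click is the product of the probability that the user examines a given position and the probability that the document is attractive given its relevance. The sketch below illustrates that idea only. The function name `pbm_clicks_sketch`, the examination curve `1 / (position + 1)`, the attractiveness formula, and the parameters `eta`, `max_rel`, and `seed` are all assumptions for this example and are not the values used by the PBM class documented here.

```python
import numpy as np


def pbm_clicks_sketch(relevance, positions, eta=1.0, max_rel=4, seed=0):
    """Draw binary clicks under a simple position-biased model (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    relevance = np.asarray(relevance, dtype=np.float64)
    positions = np.asarray(positions, dtype=np.int64)
    # Assumed examination probability: decays with the 0-based display position.
    p_exam = (1.0 / (positions + 1.0)) ** eta
    # Assumed attractiveness: grows with the graded relevance label.
    p_attr = (2.0 ** relevance - 1.0) / (2.0 ** max_rel)
    # PBM: a click requires both examination and attraction.
    p_click = p_exam * p_attr
    clicks = (rng.random(p_click.shape) < p_click).astype(np.int32)
    return clicks, p_click


clicks, p_click = pbm_clicks_sketch(relevance=[4, 0, 2], positions=[0, 1, 2])
```

With these assumed formulas, a document with relevance 0 is never clicked, and a highly relevant document loses clicks as it moves down the list.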

◆ simulate_one_fold()

ClickFold xgboost.testing.data.simulate_one_fold ( Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32]]  fold,
npt.NDArray[np.float32]  scores_fold 
)
Simulate clicks for one fold.

◆ sort_ltr_samples()

Tuple[sparse.csr_matrix, npt.NDArray[np.int32], npt.NDArray[np.int32], npt.NDArray[np.int32]] xgboost.testing.data.sort_ltr_samples ( sparse.csr_matrix  X,
npt.NDArray[np.int32]  y,
npt.NDArray[np.int32]  qid,
npt.NDArray[np.int32]  clicks,
npt.NDArray[np.int64]  pos 
)
Sort data based on query index and position.
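
One way to sort samples by query index and then by position within each query is `np.lexsort`. The sketch below shows that approach; it is not necessarily how xgboost implements this function. The name `sort_by_query_and_position` is hypothetical, and a dense array stands in for the sparse matrix to keep the example numpy-only.

```python
import numpy as np


def sort_by_query_and_position(X, y, qid, clicks, pos):
    """Reorder rows by qid, then by pos within each query (hypothetical helper)."""
    # np.lexsort uses the LAST key as the primary key: qid first, then pos.
    order = np.lexsort((pos, qid))
    return X[order], y[order], qid[order], clicks[order]


X = np.arange(8).reshape(4, 2)
y = np.array([10, 20, 30, 40], dtype=np.int32)
qid = np.array([1, 0, 1, 0], dtype=np.int32)
clicks = np.array([0, 1, 0, 1], dtype=np.int32)
pos = np.array([1, 1, 0, 0], dtype=np.int64)

X_s, y_s, qid_s, clicks_s = sort_by_query_and_position(X, y, qid, clicks, pos)
```

Because `np.lexsort` is stable and returns a single permutation, the same index array can be applied to the features, labels, query ids, and clicks so that rows stay aligned.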