Medial Code Documentation
xgboost::MetaInfo Class Reference

Meta information about the dataset; it always resides in memory.

#include <data.h>

Public Member Functions

 MetaInfo ()=default
 default constructor
 
 MetaInfo (MetaInfo &&that)=default
 
MetaInfo & operator= (MetaInfo &&that)=default
 
MetaInfo & operator= (MetaInfo const &that)=delete
 
void Validate (int32_t device) const
 Validate all metainfo.
 
MetaInfo Slice (common::Span< int32_t const > ridxs) const
 
MetaInfo Copy () const
 
bst_float GetWeight (size_t i) const
 Get the weight of an instance.
 
const std::vector< size_t > & LabelAbsSort (Context const *ctx) const
 get sorted indexes (argsort) of labels by absolute value (used by cox loss)
 
void Clear ()
 clear all the information
 
void LoadBinary (dmlc::Stream *fi)
 Load the Meta info from binary stream.
 
void SaveBinary (dmlc::Stream *fo) const
 Save the Meta info to binary stream.
 
void SetInfo (Context const &ctx, const char *key, const void *dptr, DataType dtype, size_t num)
 Set information in the meta info.
 
void SetInfo (Context const &ctx, StringView key, StringView interface_str)
 Set information in the meta info with array interface.
 
void GetInfo (char const *key, bst_ulong *out_len, DataType dtype, const void **out_dptr) const
 
void SetFeatureInfo (const char *key, const char **info, const bst_ulong size)
 
void GetFeatureInfo (const char *field, std::vector< std::string > *out_str_vecs) const
 
void Extend (MetaInfo const &that, bool accumulate_rows, bool check_column)
 
void SynchronizeNumberOfColumns ()
 Synchronize the number of columns across all workers.
 
bool IsRowSplit () const
 Whether the data is split row-wise.
 
bool IsColumnSplit () const
 Whether the data is split column-wise.
 
bool IsRanking () const
 Whether this is learning-to-rank data.
 
bool IsVerticalFederated () const
 A convenient method to check if we are doing vertical federated learning, which requires some special processing.
 
bool ShouldHaveLabels () const
 A convenient method to check if the MetaInfo should contain labels.
 

Data Fields

uint64_t num_row_ {0}
 number of rows in the data
 
uint64_t num_col_ {0}
 number of columns in the data
 
uint64_t num_nonzero_ {0}
 number of nonzero entries in the data
 
linalg::Tensor< float, 2 > labels
 label of each instance
 
DataSplitMode data_split_mode {DataSplitMode::kRow}
 data split mode
 
std::vector< bst_group_t > group_ptr_
 Begin and end indices of each group; needed when the learning task is ranking.
 
HostDeviceVector< bst_float > weights_
 weights of each instance, optional
 
linalg::Tensor< float, 2 > base_margin_
 Initial margins. If specified, xgboost starts boosting from these margins; they can be used to supply an initial prediction to boost from.
 
HostDeviceVector< bst_float > labels_lower_bound_
 lower bound of the label, to be used for survival analysis (censored regression)
 
HostDeviceVector< bst_float > labels_upper_bound_
 upper bound of the label, to be used for survival analysis (censored regression)
 
std::vector< std::string > feature_type_names
 Type name of each feature, as provided by users, e.g. "int"/"float"/"i"/"q".
 
std::vector< std::string > feature_names
 Name for each feature.
 
HostDeviceVector< FeatureType > feature_types
 
HostDeviceVector< float > feature_weights
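
The group_ptr_ field listed above holds the begin and end index of each ranking group. A minimal sketch of one layout consistent with that description is a prefix sum over group sizes with a leading zero, so that group g covers rows [ptr[g], ptr[g + 1]). The helper name GroupPtrFromSizes and the use of std::uint32_t in place of bst_group_t are assumptions made for illustration; this is not code from xgboost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: turn per-group row counts into begin/end offsets.
// Group g then spans rows [ptr[g], ptr[g + 1]).
std::vector<std::uint32_t> GroupPtrFromSizes(
    const std::vector<std::uint32_t>& sizes) {
  std::vector<std::uint32_t> ptr(sizes.size() + 1, 0);
  for (std::size_t g = 0; g < sizes.size(); ++g) {
    // Each entry is the running total of all earlier group sizes.
    ptr[g + 1] = ptr[g] + sizes[g];
  }
  assert(ptr.size() == sizes.size() + 1);
  return ptr;
}
```

With group sizes {2, 3, 1}, the offsets are {0, 2, 5, 6}, and the last entry equals the total number of rows.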
 

Static Public Attributes

static constexpr uint64_t kNumField = 12
 number of data fields in MetaInfo
 

Detailed Description

Meta information about the dataset; it always resides in memory.

Member Function Documentation

◆ GetWeight()

bst_float xgboost::MetaInfo::GetWeight ( size_t  i) const
inline

Get the weight of an instance.

Parameters
i    Instance index.
Returns
The weight.
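
Because weights_ is documented as optional, a lookup needs a fallback when no weights were supplied. The sketch below assumes that fallback is a unit weight of 1.0; the WeightsSketch type is a hypothetical stand-in that uses std::vector<float> instead of HostDeviceVector< bst_float >, so it illustrates the idea rather than the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the optional per-instance weights in MetaInfo.
struct WeightsSketch {
  std::vector<float> weights;  // empty means no weights were supplied

  // Assumed behavior: every instance has weight 1.0 when none were set.
  float GetWeight(std::size_t i) const {
    assert(weights.empty() || i < weights.size());
    return weights.empty() ? 1.0f : weights[i];
  }
};
```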

◆ LoadBinary()

void xgboost::MetaInfo::LoadBinary ( dmlc::Stream *  fi)

Load the Meta info from binary stream.

Parameters
fi    The input stream.

◆ SaveBinary()

void xgboost::MetaInfo::SaveBinary ( dmlc::Stream *  fo) const

Save the Meta info to binary stream.

Parameters
fo    The output stream.

◆ SetInfo() [1/2]

void xgboost::MetaInfo::SetInfo ( Context const &  ctx,
const char *  key,
const void *  dptr,
DataType  dtype,
size_t  num 
)

Set information in the meta info.

Parameters
key      The key of the information.
dptr     The data pointer of the source array.
dtype    The type of the source data.
num      Number of elements in the source array.

◆ SetInfo() [2/2]

void xgboost::MetaInfo::SetInfo ( Context const &  ctx,
StringView  key,
StringView  interface_str 
)

Set information in the meta info with array interface.

Parameters
key              The key of the information.
interface_str    String representation of the JSON-format array interface.
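
To make the interface_str parameter more concrete, the sketch below builds a JSON string in the style of NumPy's __array_interface__ for a one-dimensional float32 buffer. It assumes a little-endian host (typestr "<f4") and assumes that this overload accepts the same field names as NumPy's protocol (data, shape, typestr, version); neither assumption is confirmed by this page. MakeArrayInterface is a hypothetical helper, not part of xgboost.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: describe a 1-D float32 buffer as a JSON array
// interface. "data" holds the buffer address and a read-only flag.
std::string MakeArrayInterface(const std::vector<float>& values) {
  assert(!values.empty());
  std::ostringstream os;
  os << "{\"data\": ["
     << reinterpret_cast<std::uintptr_t>(values.data())
     << ", true], \"shape\": [" << values.size()
     << "], \"typestr\": \"<f4\", \"version\": 3}";
  return os.str();
}
```

The resulting string would be passed as interface_str together with a key naming the field to set. The buffer must outlive the call, since the string carries only its address.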

◆ ShouldHaveLabels()

bool xgboost::MetaInfo::ShouldHaveLabels ( ) const

A convenient method to check if the MetaInfo should contain labels.

Normally we assume labels are available everywhere. The only exception is in vertical federated learning where labels are only available on worker 0.
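
The rule above reduces to a single boolean expression. The following sketch restates it as a free function; the name ShouldHaveLabelsSketch and the explicit worker_rank parameter are assumptions for illustration, since the real method takes no arguments and reads this state from elsewhere.

```cpp
#include <cassert>

// Hypothetical restatement of the rule: labels are expected everywhere,
// except in vertical federated learning, where only worker 0 has them.
bool ShouldHaveLabelsSketch(bool is_vertical_federated, int worker_rank) {
  assert(worker_rank >= 0);
  return !is_vertical_federated || worker_rank == 0;
}
```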

◆ SynchronizeNumberOfColumns()

void xgboost::MetaInfo::SynchronizeNumberOfColumns ( )

Synchronize the number of columns across all workers.

Normally we just need to find the maximum number of columns across all workers, but in vertical federated learning, since each worker loads its own list of columns, we need to sum them.
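
The two reduction strategies described above can be sketched on a plain vector of per-worker column counts. In the library the reduction would run through its collective communication layer; here SyncNumColsSketch is a hypothetical local function that only illustrates the choice between maximum and sum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical local model of the reduction: per_worker_cols[r] is the
// number of columns loaded by worker r.
std::uint64_t SyncNumColsSketch(
    const std::vector<std::uint64_t>& per_worker_cols,
    bool vertical_federated) {
  assert(!per_worker_cols.empty());
  if (vertical_federated) {
    // Each worker holds a disjoint list of columns, so the counts add up.
    return std::accumulate(per_worker_cols.begin(), per_worker_cols.end(),
                           std::uint64_t{0});
  }
  // Otherwise workers see the same feature space; take the widest view.
  return *std::max_element(per_worker_cols.begin(), per_worker_cols.end());
}
```

For workers reporting 3, 5, and 4 columns, the result is 5 in the normal case and 12 under vertical federated learning.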


The documentation for this class was generated from the following files: