Medial Code Documentation
LightGBM::Metadata Class Reference

This class stores metadata (non-feature data) for the training data, e.g. labels, weights, initial scores, and query-level information.

#include <dataset.h>

Public Member Functions

 Metadata ()
 Null constructor.
 
void Init (const char *data_filename, const char *initscore_file)
 Initialization loads the query-level information, since it is needed for sampling data.
 
void Init (const Metadata &metadata, const data_size_t *used_indices, data_size_t num_used_indices)
 Initialize as a subset of another Metadata object.
 
void LoadFromMemory (const void *memory)
 Initialize from binary memory.
 
 ~Metadata ()
 Destructor.
 
void Init (data_size_t num_data, int weight_idx, int query_idx)
 Initial setup; allocates space for labels, weights (if they exist) and queries (if they exist).
 
void PartitionLabel (const std::vector< data_size_t > &used_indices)
 Partition label by used indices.
 
void CheckOrPartition (data_size_t num_all_data, const std::vector< data_size_t > &used_data_indices)
 Partition metadata according to the locally used indices, if needed.
 
void SetLabel (const label_t *label, data_size_t len)
 
void SetWeights (const label_t *weights, data_size_t len)
 
void SetQuery (const data_size_t *query, data_size_t len)
 
void SetInitScore (const double *init_score, data_size_t len)
 Set initial scores.
 
void SaveBinaryToFile (const VirtualFileWriter *writer) const
 Save binary data to file.
 
size_t SizesInByte () const
 Get the size of this object in bytes.
 
const label_t * label () const
 Get pointer to the labels.
 
void SetLabelAt (data_size_t idx, label_t value)
 Set label for one record.
 
void SetWeightAt (data_size_t idx, label_t value)
 Set Weight for one record.
 
void SetQueryAt (data_size_t idx, data_size_t value)
 Set Query Id for one record.
 
const label_t * weights () const
 Get weights; returns nullptr if they do not exist.
 
const data_size_t * query_boundaries () const
 Get data boundaries on queries; returns nullptr if they do not exist. Data is assumed to be ordered by query, so the interval [query_boundaries[i], query_boundaries[i+1]) holds the data indices for query i.
 
data_size_t num_queries () const
 Get Number of queries.
 
const label_t * query_weights () const
 Get weights for queries; returns nullptr if they do not exist.
 
const double * init_score () const
 Get initial scores; returns nullptr if they do not exist.
 
int64_t num_init_score () const
 Get size of initial scores.
 
Metadata & operator= (const Metadata &)=delete
 Disable copy.
 
 Metadata (const Metadata &)=delete
 Disable copy.
 

Detailed Description

This class stores metadata (non-feature data) for the training data, e.g. labels, weights, initial scores, and query-level information.

Some details:

  1. Label, used for training.
  2. Weights, the weights of records, optional.
  3. Query Boundaries, necessary for lambdarank. The documents of the i-th query are in [ query_boundaries[i], query_boundaries[i+1] ).
  4. Query Weights, computed automatically from weights and query_boundaries (if both exist). The weight of the i-th query is sum(weights[query_boundaries[i]], ..., weights[query_boundaries[i+1] - 1]) / (query_boundaries[i+1] - query_boundaries[i]).
  5. Initial score, optional. If it exists, the model will boost from this score; otherwise it will start from 0.
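
Item 4 can be read as the mean record weight within each query. The following is a minimal sketch of that computation, not the library's actual code: the helper name ComputeQueryWeights and the two type aliases are assumptions made for illustration, and the inputs are plain vectors rather than a Metadata object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in aliases for this sketch; the real typedefs live in LightGBM's headers.
using data_size_t = std::int32_t;
using label_t = float;

// Hypothetical helper: average the record weights inside each query.
// boundaries has num_queries + 1 entries; query i covers the records
// [boundaries[i], boundaries[i + 1]).
std::vector<label_t> ComputeQueryWeights(const std::vector<label_t>& weights,
                                         const std::vector<data_size_t>& boundaries) {
  assert(!boundaries.empty());
  assert(static_cast<std::size_t>(boundaries.back()) == weights.size());
  std::vector<label_t> query_weights(boundaries.size() - 1, 0.0f);
  for (std::size_t i = 0; i + 1 < boundaries.size(); ++i) {
    const data_size_t begin = boundaries[i];
    const data_size_t end = boundaries[i + 1];
    assert(end > begin);  // every query must contain at least one record
    label_t sum = 0.0f;
    for (data_size_t j = begin; j < end; ++j) {
      sum += weights[static_cast<std::size_t>(j)];
    }
    query_weights[i] = sum / static_cast<label_t>(end - begin);
  }
  return query_weights;
}
```

For example, record weights {1, 2, 3, 5, 4} with boundaries {0, 2, 5} describe two queries whose weights are 1.5 and 4.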

Member Function Documentation

◆ CheckOrPartition()

void LightGBM::Metadata::CheckOrPartition ( data_size_t  num_all_data,
const std::vector< data_size_t > &  used_data_indices 
)

Partition metadata according to the locally used indices, if needed.

Parameters
num_all_data: Total number of training records, including data held by other machines in parallel learning
used_data_indices: Indices of the locally used training data

◆ Init() [1/3]

void LightGBM::Metadata::Init ( const char *  data_filename,
const char *  initscore_file 
)

Initialization loads the query-level information, since it is needed for sampling data.

Parameters
data_filename: Filename of the data
initscore_file: Filename of the initial scores

◆ Init() [2/3]

void LightGBM::Metadata::Init ( const Metadata &  metadata,
const data_size_t *  used_indices,
data_size_t  num_used_indices 
)

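Initialize as a subset of another Metadata object.

Parameters
metadata: Source metadata to take the subset from
used_indices: Indices of the records to keep
num_used_indices: Number of entries in used_indices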

◆ Init() [3/3]

void LightGBM::Metadata::Init ( data_size_t  num_data,
int  weight_idx,
int  query_idx 
)

Initial setup; allocates space for labels, weights (if they exist) and queries (if they exist).

Parameters
num_data: Number of training records
weight_idx: Index of the weight column; < 0 means it does not exist
query_idx: Index of the query id column; < 0 means it does not exist

◆ init_score()

const double * LightGBM::Metadata::init_score ( ) const
inline

Get initial scores; returns nullptr if they do not exist.

Returns
Pointer to the initial scores

◆ label()

const label_t * LightGBM::Metadata::label ( ) const
inline

Get pointer to the labels.

Returns
Pointer to the labels

◆ LoadFromMemory()

void LightGBM::Metadata::LoadFromMemory ( const void *  memory)

Initialize from binary memory.

Parameters
memory: Pointer to memory

◆ num_queries()

data_size_t LightGBM::Metadata::num_queries ( ) const
inline

Get Number of queries.

Returns
Number of queries

◆ PartitionLabel()

void LightGBM::Metadata::PartitionLabel ( const std::vector< data_size_t > &  used_indices)

Partition labels by the used indices.

Parameters
used_indices: Indices of the locally used data
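
As an illustration of what partitioning by used indices means, the sketch below keeps only the entries of an array whose positions appear in the index list. It is an assumption-laden stand-in, not the member function itself: the helper name KeepUsed and the type aliases are hypothetical, and it returns a new vector instead of modifying internal state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using data_size_t = std::int32_t;  // stand-in alias for this sketch
using label_t = float;             // stand-in alias for this sketch

// Hypothetical helper: copy the values at the given positions, in order.
std::vector<label_t> KeepUsed(const std::vector<label_t>& values,
                              const std::vector<data_size_t>& used_indices) {
  std::vector<label_t> kept;
  kept.reserve(used_indices.size());
  for (data_size_t idx : used_indices) {
    // Each index must refer to an existing record.
    assert(idx >= 0 && static_cast<std::size_t>(idx) < values.size());
    kept.push_back(values[static_cast<std::size_t>(idx)]);
  }
  return kept;
}
```

With labels {10, 20, 30, 40} and used indices {0, 2, 3}, the kept labels are {10, 30, 40}.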

◆ query_boundaries()

const data_size_t * LightGBM::Metadata::query_boundaries ( ) const
inline

Get data boundaries on queries; returns nullptr if they do not exist. Data is assumed to be ordered by query, so the interval [query_boundaries[i], query_boundaries[i+1]) holds the data indices for query i.

Returns
Pointer to the data boundaries on queries
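
The half-open interval convention can be walked with two nested loops. The sketch below takes raw pointers of the kind the accessors return and sums the labels of each query; the helper name SumLabelsPerQuery and the type aliases are assumptions for illustration, and the nullptr check mirrors the "does not exist" case described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using data_size_t = std::int32_t;  // stand-in alias for this sketch
using label_t = float;             // stand-in alias for this sketch

// Hypothetical helper: sum the labels of each query.
// Returns an empty vector when no query information is available.
std::vector<double> SumLabelsPerQuery(const label_t* label,
                                      const data_size_t* boundaries,
                                      data_size_t num_queries) {
  std::vector<double> sums;
  if (boundaries == nullptr || num_queries <= 0) return sums;
  assert(label != nullptr);
  sums.reserve(static_cast<std::size_t>(num_queries));
  for (data_size_t q = 0; q < num_queries; ++q) {
    double sum = 0.0;
    // Records of query q occupy [boundaries[q], boundaries[q + 1]).
    for (data_size_t i = boundaries[q]; i < boundaries[q + 1]; ++i) {
      sum += label[i];
    }
    sums.push_back(sum);
  }
  return sums;
}
```

Note that the boundaries array has one more entry than there are queries, so boundaries[num_queries] is a valid read.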

◆ query_weights()

const label_t * LightGBM::Metadata::query_weights ( ) const
inline

Get weights for queries; returns nullptr if they do not exist.

Returns
Pointer to the weights for queries

◆ SaveBinaryToFile()

void LightGBM::Metadata::SaveBinaryToFile ( const VirtualFileWriter *  writer) const

Save binary data to file.

Parameters
writer: File writer to write to

◆ SetInitScore()

void LightGBM::Metadata::SetInitScore ( const double *  init_score,
data_size_t  len 
)

Set initial scores.

Parameters
init_score: Initial scores; this class will manage the memory for init_score.

◆ SetLabelAt()

void LightGBM::Metadata::SetLabelAt ( data_size_t  idx,
label_t  value 
)
inline

Set label for one record.

Parameters
idx: Index of this record
value: Label value of this record

◆ SetQueryAt()

void LightGBM::Metadata::SetQueryAt ( data_size_t  idx,
data_size_t  value 
)
inline

Set Query Id for one record.

Parameters
idx: Index of this record
value: Query Id value of this record
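
This page does not say how per-record query ids relate to the query boundaries returned by query_boundaries(). One plausible conversion, assuming records that share a query id are stored contiguously, is sketched below; the helper name BoundariesFromQueryIds and the type alias are hypothetical and this is not presented as the library's own procedure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using data_size_t = std::int32_t;  // stand-in alias for this sketch

// Hypothetical helper: turn contiguous per-record query ids into boundaries.
// A new query starts wherever the id differs from the previous record's id.
std::vector<data_size_t> BoundariesFromQueryIds(const std::vector<data_size_t>& query_ids) {
  assert(query_ids.size() <=
         static_cast<std::size_t>(std::numeric_limits<data_size_t>::max()));
  std::vector<data_size_t> boundaries;
  if (query_ids.empty()) return boundaries;
  boundaries.push_back(0);
  for (std::size_t i = 1; i < query_ids.size(); ++i) {
    if (query_ids[i] != query_ids[i - 1]) {
      boundaries.push_back(static_cast<data_size_t>(i));
    }
  }
  // Final entry closes the last query.
  boundaries.push_back(static_cast<data_size_t>(query_ids.size()));
  return boundaries;
}
```

For the ids {7, 7, 9, 9, 9, 4} this yields the boundaries {0, 2, 5, 6}, i.e. three queries of sizes 2, 3 and 1.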

◆ SetWeightAt()

void LightGBM::Metadata::SetWeightAt ( data_size_t  idx,
label_t  value 
)
inline

Set Weight for one record.

Parameters
idx: Index of this record
value: Weight value of this record

◆ weights()

const label_t * LightGBM::Metadata::weights ( ) const
inline

Get weights; returns nullptr if they do not exist.

Returns
Pointer to the weights
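
Because the returned pointer may be nullptr, callers need a fallback before indexing it. The sketch below treats missing weights as 1.0 for every record, which is an assumption made for this example rather than documented behaviour; the helper name WeightedLabelSum and the type aliases are likewise hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using data_size_t = std::int32_t;  // stand-in alias for this sketch
using label_t = float;             // stand-in alias for this sketch

// Hypothetical helper: weighted sum of labels, where a nullptr weights
// pointer is treated as "every record has weight 1.0" (an assumption).
double WeightedLabelSum(const label_t* label, const label_t* weights,
                        data_size_t num_data) {
  assert(label != nullptr);
  double sum = 0.0;
  for (data_size_t i = 0; i < num_data; ++i) {
    const double w = (weights != nullptr) ? weights[i] : 1.0;
    sum += w * label[i];
  }
  return sum;
}
```

Checking the pointer once per call, as here, keeps the optional-weights case out of the rest of the calling code.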

The documentation for this class was generated from the following files: