This class stores metadata (non-feature data) for the training data, e.g. labels, weights, initial scores, and query-level information.
|
|
| Metadata () |
| | Null constructor.
|
| |
| void | Init (const char *data_filename, const char *initscore_file) |
| | Initialization loads query-level information, since it is needed for sampling data.
|
| |
| void | Init (const Metadata &metadata, const data_size_t *used_indices, data_size_t num_used_indices) |
| | Initialize as a subset of another Metadata object, using the given indices.
|
| |
| void | LoadFromMemory (const void *memory) |
| | Initialize from binary memory.
|
| |
|
| ~Metadata () |
| | Destructor.
|
| |
| void | Init (data_size_t num_data, int weight_idx, int query_idx) |
| | Initial work: allocates space for labels, weights (if present), and queries (if present).
|
| |
| void | PartitionLabel (const std::vector< data_size_t > &used_indices) |
| | Partition label by used indices.
|
| |
| void | CheckOrPartition (data_size_t num_all_data, const std::vector< data_size_t > &used_data_indices) |
| | Partition metadata according to the locally used indices, if needed.
|
| |
|
void | SetLabel (const label_t *label, data_size_t len) |
| |
|
void | SetWeights (const label_t *weights, data_size_t len) |
| |
|
void | SetQuery (const data_size_t *query, data_size_t len) |
| |
| void | SetInitScore (const double *init_score, data_size_t len) |
| | Set initial scores.
|
| |
| void | SaveBinaryToFile (const VirtualFileWriter *writer) const |
| | Save binary data to file.
|
| |
|
size_t | SizesInByte () const |
| | Get the size of this object in bytes.
|
| |
| const label_t * | label () const |
| | Get pointer to the labels.
|
| |
| void | SetLabelAt (data_size_t idx, label_t value) |
| | Set label for one record.
|
| |
| void | SetWeightAt (data_size_t idx, label_t value) |
| | Set weight for one record.
|
| |
| void | SetQueryAt (data_size_t idx, data_size_t value) |
| | Set query ID for one record.
|
| |
| const label_t * | weights () const |
| | Get weights; returns nullptr if they do not exist.
|
| |
| const data_size_t * | query_boundaries () const |
| | Get the data boundaries of the queries; returns nullptr if they do not exist. Data is assumed to be ordered by query: the interval [query_boundaries[i], query_boundaries[i+1]) holds the data indices for query i.
|
| |
| data_size_t | num_queries () const |
| | Get number of queries.
|
| |
| const label_t * | query_weights () const |
| | Get weights for queries; returns nullptr if they do not exist.
|
| |
| const double * | init_score () const |
| | Get initial scores; returns nullptr if they do not exist.
|
| |
|
int64_t | num_init_score () const |
| | Get size of initial scores.
|
| |
|
Metadata & | operator= (const Metadata &)=delete |
| | Disable copy.
|
| |
|
| Metadata (const Metadata &)=delete |
| | Disable copy.
|
| |
Some details:
- Label: used for training.
- Weights: per-record weights; optional.
- Query Boundaries: necessary for lambdarank. The documents of the i-th query are in [ query_boundaries[i], query_boundaries[i+1] ).
- Query Weights: calculated automatically from weights and query_boundaries (if both exist). The weight of the i-th query is sum(weights[query_boundaries[i]], .., weights[query_boundaries[i+1] - 1]) / (query_boundaries[i+1] - query_boundaries[i]).
- Initial score: optional. If present, the model boosts from this score; otherwise it starts from 0.
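
The relationship between record weights, query boundaries, and query weights can be illustrated with a small standalone sketch. It assumes the query weight is the mean record weight within the query, i.e. the sum of the weights at indices in [ query_boundaries[i], query_boundaries[i+1] ) divided by the number of records in that interval. The function name `ComputeQueryWeights`, the `std::vector` interface, and the two typedefs are hypothetical stand-ins for illustration and are not part of the Metadata class; the handling of an empty query is likewise an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in typedefs for this sketch only; the library defines its own.
using data_size_t = std::int32_t;
using label_t = float;

// Hypothetical helper: returns one weight per query, computed as the mean of
// the record weights whose indices fall in
// [query_boundaries[i], query_boundaries[i + 1]).
std::vector<label_t> ComputeQueryWeights(
    const std::vector<label_t>& weights,
    const std::vector<data_size_t>& query_boundaries) {
  std::vector<label_t> query_weights;
  if (query_boundaries.size() < 2) {
    return query_weights;  // no complete query interval
  }
  const std::size_t num_queries = query_boundaries.size() - 1;
  query_weights.reserve(num_queries);
  for (std::size_t i = 0; i < num_queries; ++i) {
    const data_size_t begin = query_boundaries[i];
    const data_size_t end = query_boundaries[i + 1];
    // Boundaries must be non-decreasing and stay inside the weight array.
    assert(begin >= 0 && begin <= end);
    assert(static_cast<std::size_t>(end) <= weights.size());
    label_t sum = 0.0f;
    for (data_size_t j = begin; j < end; ++j) {
      sum += weights[j];
    }
    // Assumption of this sketch: an empty query gets weight 0 rather than
    // triggering a division by zero.
    query_weights.push_back(
        end > begin ? sum / static_cast<label_t>(end - begin) : 0.0f);
  }
  return query_weights;
}
```

For example, with weights {1, 3, 2, 2, 2} and boundaries {0, 2, 5}, the first query covers records 0 and 1 and the second covers records 2 to 4, so the helper averages (1 + 3) over 2 records and (2 + 2 + 2) over 3 records.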