Medial Code Documentation
lightgbm.basic.Dataset Class Reference

Public Member Functions

 __init__ (self, data, label=None, reference=None, weight=None, group=None, init_score=None, silent=False, feature_name='auto', categorical_feature='auto', params=None, free_raw_data=True)
 
 __del__ (self)
 
 construct (self)
 
 create_valid (self, data, label=None, weight=None, group=None, init_score=None, silent=False, params=None)
 
 subset (self, used_indices, params=None)
 
 save_binary (self, filename)
 
 set_field (self, field_name, data)
 
 get_field (self, field_name)
 
 set_categorical_feature (self, categorical_feature)
 
 set_reference (self, reference)
 
 set_feature_name (self, feature_name)
 
 set_label (self, label)
 
 set_weight (self, weight)
 
 set_init_score (self, init_score)
 
 set_group (self, group)
 
 get_label (self)
 
 get_weight (self)
 
 get_init_score (self)
 
 get_data (self)
 
 get_group (self)
 
 num_data (self)
 
 num_feature (self)
 
 get_ref_chain (self, ref_limit=100)
 

Data Fields

 handle
 
 data
 
 label
 
 reference
 
 weight
 
 group
 
 init_score
 
 silent
 
 feature_name
 
 categorical_feature
 
 params
 
 free_raw_data
 
 used_indices
 
 need_slice
 
 pandas_categorical
 
 params_back_up
 
 data_has_header
 
 predictor
 

Protected Member Functions

 _free_handle (self)
 
 _lazy_init (self, data, label=None, reference=None, weight=None, group=None, init_score=None, predictor=None, silent=False, feature_name='auto', categorical_feature='auto', params=None)
 
 _update_params (self, params)
 
 _reverse_update_params (self)
 
 _set_predictor (self, predictor)
 

Protected Attributes

 _predictor
 

Detailed Description

Dataset in LightGBM.

Constructor & Destructor Documentation

◆ __init__()

lightgbm.basic.Dataset.__init__(self, data, label=None, reference=None, weight=None, group=None, init_score=None, silent=False, feature_name='auto', categorical_feature='auto', params=None, free_raw_data=True)
Initialize Dataset.

Parameters
----------
data : string, numpy array, pandas DataFrame, H2O DataTable, scipy.sparse or list of numpy arrays
    Data source of Dataset.
    If string, it represents the path to a txt file.
label : list, numpy 1-D array, pandas Series / one-column DataFrame or None, optional (default=None)
    Label of the data.
reference : Dataset or None, optional (default=None)
    If this is a Dataset for validation, the training Dataset should be used as reference.
weight : list, numpy 1-D array, pandas Series or None, optional (default=None)
    Weight for each instance.
group : list, numpy 1-D array, pandas Series or None, optional (default=None)
    Group/query sizes for the Dataset (used for ranking).
init_score : list, numpy 1-D array, pandas Series or None, optional (default=None)
    Initial score for the Dataset.
silent : bool, optional (default=False)
    Whether to suppress messages during construction.
feature_name : list of strings or 'auto', optional (default="auto")
    Feature names.
    If 'auto' and data is a pandas DataFrame, the DataFrame column names are used.
categorical_feature : list of strings or int, or 'auto', optional (default="auto")
    Categorical features.
    If list of int, interpreted as indices.
    If list of strings, interpreted as feature names (need to specify ``feature_name`` as well).
    If 'auto' and data is a pandas DataFrame, pandas categorical columns are used.
    All values in categorical features should be less than the int32 max value (2147483647).
    Large values can consume a lot of memory. Consider using consecutive integers starting from zero.
    All negative values in categorical features are treated as missing values.
params : dict or None, optional (default=None)
    Other parameters for Dataset.
free_raw_data : bool, optional (default=True)
    If True, raw data is freed after constructing the inner Dataset.
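The categorical-feature notes above recommend consecutive integers starting from zero and treat negative values as missing. The following is a minimal sketch of that preprocessing step in plain Python. `encode_categories` is a hypothetical helper, not part of LightGBM; its codes would then be used as a column listed in `categorical_feature`.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def encode_categories(
    values: Iterable[Optional[Hashable]],
) -> Tuple[List[int], Dict[Hashable, int]]:
    """Map category values to consecutive integers 0, 1, 2, ... in order of first appearance.

    None (a missing value) is encoded as -1, since negative values in
    categorical features are treated as missing.
    """
    mapping: Dict[Hashable, int] = {}
    codes: List[int] = []
    for value in values:
        if value is None:
            codes.append(-1)  # negative code -> treated as missing
            continue
        if value not in mapping:
            mapping[value] = len(mapping)  # next unused consecutive code
        codes.append(mapping[value])
    return codes, mapping


# Large, sparse IDs become compact codes suitable for a categorical column.
codes, mapping = encode_categories([90210, 10001, None, 90210, 60601])
print(codes)    # [0, 1, -1, 0, 2]
print(mapping)  # {90210: 0, 10001: 1, 60601: 2}
```

Keeping `mapping` around matters: the same mapping must be applied to validation and test data so that a given category gets the same code everywhere.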

Member Function Documentation

◆ _set_predictor()

lightgbm.basic.Dataset._set_predictor(self, predictor) [protected]
Set predictor for continued training.

Users are not advised to call this function directly.
Use the init_model argument of engine.train() or engine.cv() instead.

◆ construct()

lightgbm.basic.Dataset.construct(self)
Lazy init: build the inner Dataset from the stored data if it has not been constructed yet.

Returns
-------
self : Dataset
    Constructed Dataset object.
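Lazy init means that creating a Dataset only records its inputs. The expensive inner Dataset is built the first time construct() runs, whether the user calls it or training needs it, and later calls reuse the result. The toy class below is a sketch of that pattern. It assumes construction is keyed on the handle being unset; the `handle` name mirrors the Data Fields list above, but `LazyDataset` and `build_count` are hypothetical and do not reproduce LightGBM's internals.

```python
from typing import Any, Dict, List, Optional


class LazyDataset:
    """Hypothetical stand-in showing the lazy-construction pattern (not LightGBM code)."""

    def __init__(self, data: List[List[float]]) -> None:
        self.data = data
        self.handle: Optional[Dict[str, Any]] = None  # nothing is built yet
        self.build_count = 0  # counts how often the expensive step runs

    def construct(self) -> "LazyDataset":
        if self.handle is None:
            # Expensive work (e.g. binning features, allocating native memory)
            # runs only on the first call.
            self.build_count += 1
            self.handle = {
                "num_data": len(self.data),
                "num_feature": len(self.data[0]) if self.data else 0,
            }
        return self  # returning self allows chaining: ds.construct().handle


ds = LazyDataset([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
assert ds.handle is None          # creating the object builds nothing
ds.construct().construct()        # a second call is a no-op
print(ds.handle, ds.build_count)  # {'num_data': 3, 'num_feature': 2} 1
```

Returning self is what lets a call such as construct() sit at the front of a chain, matching the documented return value.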

◆ create_valid()

lightgbm.basic.Dataset.create_valid(self, data, label=None, weight=None, group=None, init_score=None, silent=False, params=None)
Create validation data aligned with the current Dataset.

Parameters
----------
data : string, numpy array, pandas DataFrame, H2O DataTable, scipy.sparse or list of numpy arrays
    Data source of Dataset.
    If string, it represents the path to a txt file.
label : list, numpy 1-D array, pandas Series / one-column DataFrame or None, optional (default=None)
    Label of the data.
weight : list, numpy 1-D array, pandas Series or None, optional (default=None)
    Weight for each instance.
group : list, numpy 1-D array, pandas Series or None, optional (default=None)
    Group/query sizes for the Dataset (used for ranking).
init_score : list, numpy 1-D array, pandas Series or None, optional (default=None)
    Initial score for the Dataset.
silent : bool, optional (default=False)
    Whether to suppress messages during construction.
params : dict or None, optional (default=None)
    Other parameters for validation Dataset.

Returns
-------
valid : Dataset
    Validation Dataset with reference to self.
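Here, aligned means the validation Dataset is discretized with the feature bins of the reference (training) Dataset instead of bins computed from the validation data itself. As a result, a bin index covers the same value range in both Datasets. The sketch below illustrates the idea with simple equal-frequency binning. That binning rule and the helper names are assumptions for illustration, not LightGBM's bin-finding algorithm.

```python
from bisect import bisect_right
from typing import List, Sequence


def fit_bin_edges(train_values: Sequence[float], max_bin: int) -> List[float]:
    """Equal-frequency bin edges computed from the training values only."""
    ordered = sorted(train_values)
    n = len(ordered)
    edges: List[float] = []
    for k in range(1, max_bin):
        edge = ordered[(k * n) // max_bin]
        if not edges or edge > edges[-1]:  # skip duplicate edges
            edges.append(edge)
    return edges


def apply_bins(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Bin index of each value: the number of edges <= value."""
    return [bisect_right(edges, v) for v in values]


train = [1, 2, 3, 4, 5, 6, 7, 8]
edges = fit_bin_edges(train, max_bin=4)   # [3, 5, 7]
print(apply_bins(train, edges))           # [0, 0, 1, 1, 2, 2, 3, 3]
# Validation values reuse the training edges, so bin 2 means the same range
# in both datasets; values outside the training range fall into the end bins.
print(apply_bins([0, 3.5, 100], edges))   # [0, 1, 3]
```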

◆ get_data()

lightgbm.basic.Dataset.get_data(self)
Get the raw data of the Dataset.

Returns
-------
data : string, numpy array, pandas DataFrame, H2O DataTable, scipy.sparse, list of numpy arrays or None
    Raw data used in the Dataset construction.

◆ get_field()

lightgbm.basic.Dataset.get_field(self, field_name)
Get a property from the Dataset.

Parameters
----------
field_name : string
    The name of the field to get.

Returns
-------
info : numpy array
    A numpy array with information from the Dataset.

◆ get_group()

lightgbm.basic.Dataset.get_group(self)
Get the group of the Dataset.

Returns
-------
group : numpy array or None
    Group size of each group.

◆ get_init_score()

lightgbm.basic.Dataset.get_init_score(self)
Get the initial score of the Dataset.

Returns
-------
init_score : numpy array or None
    Initial score of the Booster.

◆ get_label()

lightgbm.basic.Dataset.get_label(self)
Get the label of the Dataset.

Returns
-------
label : numpy array or None
    The label information from the Dataset.

◆ get_ref_chain()

lightgbm.basic.Dataset.get_ref_chain(self, ref_limit=100)
Get a chain of Dataset objects.

Starts with ``self``, then goes to ``self.reference`` (if it exists),
then to ``self.reference.reference``, etc.,
until ``ref_limit`` is reached or a reference loop is found.

Parameters
----------
ref_limit : int, optional (default=100)
    The maximum number of references to follow.

Returns
-------
ref_chain : set of Dataset
    Chain of references of the Datasets.
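The traversal described above can be sketched in plain Python. `Node` is a hypothetical stand-in for a Dataset that only carries a `reference` attribute, and this `get_ref_chain` is an illustrative re-implementation under that assumption, not LightGBM's code. Using a set serves two purposes: it detects loops and it matches the documented return type.

```python
from typing import Optional, Set


class Node:
    """Hypothetical object with a ``reference`` attribute, standing in for Dataset."""

    def __init__(self, name: str, reference: Optional["Node"] = None) -> None:
        self.name = name
        self.reference = reference


def get_ref_chain(start: Node, ref_limit: int = 100) -> Set[Node]:
    """Collect start, start.reference, ... until None, a loop, or ref_limit."""
    head: Optional[Node] = start
    chain: Set[Node] = set()
    while head is not None and len(chain) < ref_limit:
        if head in chain:  # reference loop: stop instead of cycling forever
            break
        chain.add(head)
        head = head.reference
    return chain


train = Node("train")
valid = Node("valid", reference=train)
print(sorted(n.name for n in get_ref_chain(valid)))               # ['train', 'valid']
print(sorted(n.name for n in get_ref_chain(valid, ref_limit=1)))  # ['valid']
```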

◆ get_weight()

lightgbm.basic.Dataset.get_weight(self)
Get the weight of the Dataset.

Returns
-------
weight : numpy array or None
    Weight for each data point from the Dataset.

◆ num_data()

lightgbm.basic.Dataset.num_data(self)
Get the number of rows in the Dataset.

Returns
-------
number_of_rows : int
    The number of rows in the Dataset.

◆ num_feature()

lightgbm.basic.Dataset.num_feature(self)
Get the number of columns (features) in the Dataset.

Returns
-------
number_of_columns : int
    The number of columns (features) in the Dataset.

◆ save_binary()

lightgbm.basic.Dataset.save_binary(self, filename)
Save Dataset to a binary file.

Parameters
----------
filename : string
    Name of the output file.

Returns
-------
self : Dataset
    Returns self.

◆ set_categorical_feature()

lightgbm.basic.Dataset.set_categorical_feature(self, categorical_feature)
Set categorical features.

Parameters
----------
categorical_feature : list of int or strings
    Names or indices of categorical features.

Returns
-------
self : Dataset
    Dataset with set categorical features.
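Feature names only make sense relative to `feature_name`, which is why the constructor docs require `feature_name` when names are used. Integers, by contrast, are column indices. The hypothetical helper below sketches one way to normalize a mixed list into indices. LightGBM's own resolution logic may differ, for example in how it reports unknown names.

```python
from typing import List, Sequence, Union


def resolve_categorical(
    categorical_feature: Sequence[Union[int, str]],
    feature_name: Sequence[str],
) -> List[int]:
    """Turn a mix of feature names and column indices into sorted column indices."""
    index_of = {name: i for i, name in enumerate(feature_name)}
    resolved = set()
    for item in categorical_feature:
        if isinstance(item, str):
            if item not in index_of:
                # Names can only be resolved when feature names are known.
                raise ValueError(f"unknown feature name: {item!r}")
            resolved.add(index_of[item])
        else:
            if not 0 <= item < len(feature_name):
                raise ValueError(f"feature index out of range: {item}")
            resolved.add(item)
    return sorted(resolved)


names = ["age", "color", "size", "city"]
print(resolve_categorical(["color", 3], names))  # [1, 3]
```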

◆ set_feature_name()

lightgbm.basic.Dataset.set_feature_name(self, feature_name)
Set feature names.

Parameters
----------
feature_name : list of strings
    Feature names.

Returns
-------
self : Dataset
    Dataset with set feature names.

◆ set_field()

lightgbm.basic.Dataset.set_field(self, field_name, data)
Set a property in the Dataset.

Parameters
----------
field_name : string
    The name of the field to set.
data : list, numpy 1-D array, pandas Series or None
    The array of data to be set.

Returns
-------
self : Dataset
    Dataset with set property.

◆ set_group()

lightgbm.basic.Dataset.set_group(self, group)
Set group size of Dataset (used for ranking).

Parameters
----------
group : list, numpy 1-D array, pandas Series or None
    Group size of each group.

Returns
-------
self : Dataset
    Dataset with set group.
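`group` holds the number of rows in each query, not a per-row query ID. Rows that belong to one query must be contiguous, and the sizes must add up to the number of rows. Ranking data often arrives with a query-ID column instead, so the sketch below shows the conversion and the row ranges the sizes imply. The helpers are hypothetical and written in plain Python.

```python
from itertools import accumulate, groupby
from typing import Hashable, List, Sequence


def group_sizes_from_query_ids(query_ids: Sequence[Hashable]) -> List[int]:
    """Convert per-row query IDs (rows already sorted by query) to group sizes."""
    sizes: List[int] = []
    seen = set()
    for qid, rows in groupby(query_ids):
        if qid in seen:
            raise ValueError(f"rows of query {qid!r} are not contiguous")
        seen.add(qid)
        sizes.append(sum(1 for _ in rows))
    return sizes


def group_boundaries(group: Sequence[int]) -> List[int]:
    """Row offsets: query i occupies rows [b[i], b[i + 1])."""
    return [0, *accumulate(group)]


query_ids = [7, 7, 7, 2, 2, 5, 5, 5, 5]
group = group_sizes_from_query_ids(query_ids)
print(group)                    # [3, 2, 4]
print(group_boundaries(group))  # [0, 3, 5, 9]
assert sum(group) == len(query_ids)  # sizes must cover every row exactly once
```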

◆ set_init_score()

lightgbm.basic.Dataset.set_init_score(self, init_score)
Set the initial score of the Booster to start from.

Parameters
----------
init_score : list, numpy 1-D array, pandas Series or None
    Initial score for the Booster.

Returns
-------
self : Dataset
    Dataset with set init score.
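The init score is a per-row starting value in the model's raw score space, and boosting then adds trees on top of it. The sketch below assumes a binary objective whose raw score is a log-odds. Under that assumption, one simple choice is the logit of the positive-class rate. The helper is hypothetical, and raw-score predictions from an earlier model are another typical source of init scores.

```python
import math
from typing import List, Sequence


def constant_logit_init_score(labels: Sequence[int]) -> List[float]:
    """Per-row init score equal to the log-odds of the positive-class rate."""
    p = sum(labels) / len(labels)
    if not 0.0 < p < 1.0:
        raise ValueError("labels must contain both classes")
    logit = math.log(p / (1.0 - p))
    return [logit] * len(labels)


labels = [1, 0, 0, 0]
scores = constant_logit_init_score(labels)
print(round(scores[0], 4))  # -1.0986  (log(0.25 / 0.75))
```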

◆ set_label()

lightgbm.basic.Dataset.set_label(self, label)
Set the label of the Dataset.

Parameters
----------
label : list, numpy 1-D array, pandas Series / one-column DataFrame or None
    The label information to be set into Dataset.

Returns
-------
self : Dataset
    Dataset with set label.

◆ set_reference()

lightgbm.basic.Dataset.set_reference(self, reference)
Set reference Dataset.

Parameters
----------
reference : Dataset
    Reference that is used as a template to construct the current Dataset.

Returns
-------
self : Dataset
    Dataset with set reference.

◆ set_weight()

lightgbm.basic.Dataset.set_weight(self, weight)
Set weight of each instance.

Parameters
----------
weight : list, numpy 1-D array, pandas Series or None
    Weight to be set for each data point.

Returns
-------
self : Dataset
    Dataset with set weight.
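Weights are per row, so any per-class weighting has to be expanded to one value per data point before it is passed as `weight`. The sketch below uses the "balanced" heuristic n_samples / (n_classes * class_count), which is the same formula scikit-learn uses for `class_weight='balanced'`. It is one illustrative choice, and the helper name is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def balanced_weights(labels: Sequence[Hashable]) -> List[float]:
    """Weight each row by n_samples / (n_classes * count_of_its_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


labels = [0, 0, 0, 1]
weights = balanced_weights(labels)
print([round(w, 3) for w in weights])  # [0.667, 0.667, 0.667, 2.0]
print(round(sum(weights), 6))          # 4.0 (total weight equals the row count)
```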

◆ subset()

lightgbm.basic.Dataset.subset(self, used_indices, params=None)
Get a subset of the current Dataset.

Parameters
----------
used_indices : list of int
    Indices used to create the subset.
params : dict or None, optional (default=None)
    These parameters will be passed to the Dataset constructor.

Returns
-------
subset : Dataset
    Subset of the current Dataset.
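In practice, `used_indices` usually comes from a train/validation or cross-validation split. With a real Dataset, each list of training indices would be passed to subset(). The sketch below generates k-fold index lists in plain Python and shows row-wise selection for a per-row field such as the label. `kfold_indices` and `take_rows` are hypothetical helpers, and ranking data would additionally need splits that keep each query's rows together.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def kfold_indices(num_data: int, nfold: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle row indices once and return (train_indices, valid_indices) per fold."""
    idx = list(range(num_data))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[k::nfold]) for k in range(nfold)]
    return [
        (sorted(i for j, fold in enumerate(folds) if j != k for i in fold), folds[k])
        for k in range(nfold)
    ]


def take_rows(values: Sequence[T], used_indices: Sequence[int]) -> List[T]:
    """Select the given rows of a per-row field, e.g. the label."""
    return [values[i] for i in used_indices]


label = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
for train_idx, valid_idx in kfold_indices(len(label), nfold=3):
    # Each fold's train and validation indices are disjoint and cover every row.
    assert sorted(train_idx + valid_idx) == list(range(len(label)))
    print(len(train_idx), len(valid_idx), take_rows(label, valid_idx))
```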

The documentation for this class was generated from the following file: