Functions
	log_loss (preds, labels)
	Set up a couple of utilities for our experiments.

	experiment (objective, label_type, data)

Variables
int	N = 1000
	Simulate some binary data with a single categorical and single continuous predictor.

	X

list	CATEGORICAL_EFFECTS = [-1, -1, -2, -2, 2]

	LINEAR_TERM

	TRUE_PROB = expit(LINEAR_TERM)

	Y = np.random.binomial(1, TRUE_PROB, size=N)

dict	DATA

int	K = 10

list	A

list	B

Detailed Description

Comparison of `binary` and `xentropy` objectives.

BLUF: The `xentropy` objective does logistic regression and generalizes
to the case where labels are probabilistic (i.e. numbers between 0 and 1).

Details: Both `binary` and `xentropy` minimize the log loss and use
`boost_from_average = TRUE` by default. Possibly the only difference
between them with default settings is that `binary` may achieve a slight
speed improvement by assuming that the labels are binary instead of
probabilistic.

Function Documentation

◆ experiment()

logistic_regression.experiment	(	objective,
		label_type,
		data
	)

Measure performance of an objective.

Parameters
----------
objective : string 'binary' or 'xentropy'
    Objective function.
label_type : string 'binary' or 'probability'
    Type of the label.
data : dict
    Data for training.

Returns
-------
result : dict
    Experiment summary stats.

◆ log_loss()

logistic_regression.log_loss	(	preds,
		labels
	)

Set up a couple of utilities for our experiments.

Logarithmic loss with not-necessarily-binary labels.
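
For readers who want to see the idea concretely, here is a minimal sketch of a log loss that accepts labels anywhere in [0, 1], using only the standard library. It is not the module's own implementation: the name `soft_log_loss` and the clipping constant `eps` are assumptions made for illustration. With 0/1 labels the formula reduces to the usual binary log loss.

```python
import math


def soft_log_loss(preds, labels, eps=1e-15):
    """Mean cross-entropy between predicted probabilities and labels in [0, 1].

    Hypothetical helper for illustration; `eps` clips predictions away
    from 0 and 1 so that the logarithms stay finite.
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(preds)


# Binary labels: the usual log loss.
binary_loss = soft_log_loss([0.9, 0.2], [1, 0])
# Probabilistic labels: same formula, the labels are now probabilities.
soft_loss = soft_log_loss([0.9, 0.2], [0.8, 0.3])
```

For a single observation with label y, this loss is smallest when the prediction equals y, which is why the same objective can be trained on probabilities as well as on 0/1 outcomes.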

Variable Documentation

◆ A

list logistic_regression.A

Initial value:

= [experiment('binary', label_type='binary', data=DATA)['time']
     for k in range(K)]

◆ B

list logistic_regression.B

Initial value:

= [experiment('xentropy', label_type='binary', data=DATA)['time']
     for k in range(K)]
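
`A` and `B` each hold `K` timings, one per repeated run. As a hedged illustration of how such lists might be collected and compared, the sketch below times an arbitrary callable from the outside and reports the best and median run. The helpers `time_call` and `summarize` are hypothetical and not part of this module, and the workload is a stand-in rather than a real training run.

```python
import statistics
import time


def time_call(fn, repeats=10):
    """Run fn() `repeats` times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Best and median time; the minimum is least affected by system noise."""
    return {"best": min(timings), "median": statistics.median(timings)}


# Stand-in workload; in practice this would be a training run.
a = time_call(lambda: sum(range(10_000)))
summary = summarize(a)
```

Comparing the minimum of each list is a common choice for micro-benchmarks, since slow runs usually reflect interference from other processes rather than the code under test.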

◆ DATA

dict logistic_regression.DATA

Initial value:

=  {
    'X': X,
    'probability_labels': TRUE_PROB,
    'binary_labels': Y,
    'lgb_with_binary_labels': lgb.Dataset(X, Y),
    'lgb_with_probability_labels': lgb.Dataset(X, TRUE_PROB),
}

◆ LINEAR_TERM

logistic_regression.LINEAR_TERM

Initial value:

=  np.array([
    -0.5 + 0.01 * X['continuous'][k]
    + CATEGORICAL_EFFECTS[X['categorical'][k]] for k in range(X.shape[0])
]) + np.random.normal(0, 1, X.shape[0])
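
The linear term feeds the rest of the simulation: `TRUE_PROB` is its logistic transform and `Y` is a Bernoulli draw from that probability. The sketch below walks through the same chain with the standard library only, so each step is visible. The function names `expit` (a local re-implementation) and `simulate`, and the fixed seed, are assumptions for illustration and do not reproduce the module's random draws.

```python
import math
import random


def expit(x):
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate(n=1000, effects=(-1, -1, -2, -2, 2), seed=0):
    """Return (probabilities, binary labels) for n simulated rows."""
    rng = random.Random(seed)  # seeded for reproducibility (an assumption)
    block = n // len(effects)  # rows per category, in contiguous blocks
    probs, labels = [], []
    for k in range(n):
        category = min(k // block, len(effects) - 1)
        # Intercept + continuous effect + categorical effect + Gaussian noise.
        linear = -0.5 + 0.01 * k + effects[category] + rng.gauss(0, 1)
        p = expit(linear)
        probs.append(p)
        labels.append(1 if rng.random() < p else 0)
    return probs, labels


probs, labels = simulate()
```

Here `probs` plays the role of the probabilistic labels and `labels` the role of the binary labels, so both label types describe the same underlying data.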

◆ X

logistic_regression.X

Initial value:

=  pd.DataFrame({
    'continuous': range(N),
    'categorical': np.repeat([0, 1, 2, 3, 4], N // 5)
})
