Querying posterior distributions
The third way to use the library is to construct and query posterior distributions.
We construct the posterior distribution by calling the corresponding test class. If j48 and nbc contain scores from cross validation on a single data set, we construct the posterior by
>>> posterior = CorrelatedTTest(nbc, j48)
We then compute the probabilities and plot the histogram:
>>> posterior.probs()
(0.4145119975061462, 0.5854880024938538)
>>> fig = posterior.plot(names=("nbc", "j48"))
For comparisons on multiple data sets we do the same, except that nbc and j48 must contain average classification accuracies (for the sign test and the signed rank test) or a matrix of accuracies (for the hierarchical test):
>>> posterior = SignedRankTest(nbc, j48, rope=1)
>>> posterior.probs()
(0.23014, 0.00674, 0.76312)
>>> fig = posterior.plot(names=("nbc", "j48"))
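As a rough illustration of the two input shapes, the sketch below reduces a matrix of fold-level accuracies to one average per data set. The values are made up, the helper name average_per_dataset is hypothetical, and the layout (one row per data set, one column per fold) is an assumption rather than something stated above.

```python
from statistics import mean


def average_per_dataset(fold_scores):
    """Reduce a matrix of accuracies to one average accuracy per data set.

    Assumption: each row holds the cross-validation scores of one data set.
    """
    return [mean(row) for row in fold_scores]


# Made-up accuracies of one classifier: 2 data sets x 3 folds.
nbc_matrix = [[0.81, 0.79, 0.83],
              [0.65, 0.70, 0.66]]

# One average per data set, the shape described for the sign and
# signed rank tests; the full matrix is the shape described for the
# hierarchical test.
nbc_averages = average_per_dataset(nbc_matrix)
```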
Single data set

class baycomp.single.Posterior(mean, var, df, rope=0, meanx=None, meany=None, *, names=None, nsamples=50000)
The posterior distribution of differences on a single data set.
Parameters:
 mean (float) – the mean difference
 var (float) – the variance
 df (float) – degrees of freedom
 rope (float) – the region of practical equivalence; differences whose absolute value does not exceed rope count as equivalent (default: 0)
 meanx (float) – mean score of the first classifier; shown in a plot
 meany (float) – mean score of the second classifier; shown in a plot
 names (tuple of str) – names of classifiers; shown in a plot
 nsamples (int) – the number of samples; used only by the sample property, not in the computation of probabilities or in plotting (default: 50000)
Unlike the posterior for comparisons on multiple data sets, this distribution is not sampled; probabilities are computed from the posterior Student distribution.
The class can provide a sample (as a 1-dimensional array), but the sample itself is not used by other methods.
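To make the closed-form computation concrete, here is a minimal sketch of how probabilities could be read off a Student distribution. It assumes that mean is the location and the square root of var is the scale, and that var is positive; the functions student_t_cdf and single_probs are hypothetical and not part of the library. The CDF is integrated numerically only to keep the sketch dependency-free; in practice a library routine such as scipy.stats.t.cdf would be the natural choice.

```python
import math


def student_t_cdf(x, df, loc=0.0, scale=1.0, steps=2000):
    """CDF of a location-scale Student t distribution (Simpson's rule)."""
    z = (x - loc) / scale
    # Logarithm of the normalising constant of the standard t density.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

    # The density is symmetric, so integrate from 0 to |z| and add or
    # subtract the area from 0.5. `steps` must be even.
    h = abs(z) / steps
    total = pdf(0.0) + pdf(abs(z))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + math.copysign(area, z)


def single_probs(mean, var, df, rope=0.0):
    """Probabilities of the difference lying left of, within, or right of the rope.

    Assumption: var > 0 and sqrt(var) acts as the scale of the distribution.
    """
    scale = math.sqrt(var)
    if rope == 0:
        p_left = student_t_cdf(0.0, df, mean, scale)
        return p_left, 1 - p_left
    p_left = student_t_cdf(-rope, df, mean, scale)
    p_right = 1 - student_t_cdf(rope, df, mean, scale)
    return p_left, 1 - p_left - p_right, p_right
```

With rope=0 the sketch returns two probabilities, and with a positive rope it returns three, which matches the shapes of the outputs shown in the examples at the top of this page.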

sample
A sample of differences as a 1-dimensional array.
Like posteriors for comparison on multiple data sets, an instance of this class will always return the same sample.
This sample is not used by other methods.
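One way to obtain the "always the same sample" behaviour is to draw the sample lazily on first access and cache it on the instance. The class below is a hypothetical illustration of that idea, not the library's implementation; the name CachedTSample and the seed argument are assumptions, and the sketch requires var > 0.

```python
import math
import random
from functools import cached_property


class CachedTSample:
    """Hypothetical holder of a lazily drawn, cached Student t sample."""

    def __init__(self, mean, var, df, nsamples=50000, seed=None):
        self.mean = mean
        self.var = var
        self.df = df
        self.nsamples = nsamples
        self._rng = random.Random(seed)

    @cached_property
    def sample(self):
        # Drawn once on first access; later accesses return the same list.
        scale = math.sqrt(self.var)
        draws = []
        for _ in range(self.nsamples):
            z = self._rng.gauss(0.0, 1.0)
            # A chi-squared variate with df degrees of freedom is a
            # gamma variate with shape df / 2 and scale 2.
            chi2 = self._rng.gammavariate(self.df / 2, 2.0)
            draws.append(self.mean + scale * z / math.sqrt(chi2 / self.df))
        return draws
```

Repeated reads of the sample attribute then return the very same object, so any statistics computed from it are reproducible within the lifetime of the instance.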
Multiple data sets

class baycomp.multiple.Posterior(sample, *, names=None)
Sampled posterior distribution.
Parameters:
 sample (np.array) – a 3 x nsamples array
 names (tuple of str or None) – names of learning algorithms (default: None)

probs(with_rope=True)
Compute and return probabilities.
Parameters: with_rope (bool) – tells whether the sample includes the probabilities for the rope region (default: True)
Returns: (p_left, p_rope, p_right) if with_rope=True; otherwise (p_left, p_right).
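The text does not say how the probabilities are derived from the sample. One plausible reading, sketched below, is to count the fraction of draws in which each region has the largest probability. This is an assumption, not a description of the library's code; the function region_probs is hypothetical, and each draw is represented as one (p_left, p_rope, p_right) triple regardless of how the real array is laid out.

```python
from collections import Counter


def region_probs(sample, with_rope=True):
    """Fraction of draws won by each region.

    Assumption: `sample` is an iterable of (p_left, p_rope, p_right)
    triples. Without rope, only the left and right entries compete.
    """
    rows = list(sample)
    candidates = (0, 1, 2) if with_rope else (0, 2)
    # For every draw, find the index of the region with the largest value.
    wins = Counter(max(candidates, key=row.__getitem__) for row in rows)
    return tuple(wins[i] / len(rows) for i in candidates)


# Made-up draws for illustration.
draws = [(0.7, 0.1, 0.2),
         (0.2, 0.5, 0.3),
         (0.1, 0.2, 0.7),
         (0.3, 0.1, 0.6)]
with_rope = region_probs(draws)
without_rope = region_probs(draws, with_rope=False)
```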

plot(names=None)
Plot the posterior distribution.
If there are samples in which the probability of rope is higher than 0.1, the distribution is shown in a simplex (see plot_simplex), otherwise as a histogram (plot_histogram).
Parameters: names (tuple of str or None) – names of classifiers
Returns: matplotlib figure
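The dispatch rule described above can be sketched as a small predicate. The function choose_plot is hypothetical, and the sketch again assumes that each draw is a (p_left, p_rope, p_right) triple; only the 0.1 threshold comes from the description.

```python
def choose_plot(sample, threshold=0.1):
    """Return which plot the rule above would select.

    Assumption: `sample` is an iterable of (p_left, p_rope, p_right) triples.
    """
    has_visible_rope = any(p_rope > threshold for _, p_rope, _ in sample)
    return "simplex" if has_visible_rope else "histogram"


# Made-up draws: the second one has a rope probability above 0.1.
kind_a = choose_plot([(0.6, 0.05, 0.35), (0.3, 0.4, 0.3)])
# Made-up draws in which the rope never exceeds the threshold.
kind_b = choose_plot([(0.6, 0.0, 0.4), (0.45, 0.1, 0.45)])
```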

plot_simplex(names=None)
Plot the posterior distribution in a simplex.
The distribution is shown as a triangle whose regions correspond to the first classifier scoring higher than the second by more than the rope, the second classifier scoring higher than the first by more than the rope, or the difference lying within the rope.
Parameters: names (tuple of str) – names of classifiers
Returns: matplotlib figure
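For readers unfamiliar with simplex plots, the following sketch shows one common way to place a probability triple inside an equilateral triangle using barycentric coordinates. The placement of the corners (left at the origin, right at (1, 0), rope at the top) and the function to_simplex_xy are assumptions made for illustration; the library's own plot may orient the triangle differently.

```python
import math


def to_simplex_xy(p_left, p_rope, p_right):
    """Map a probability triple to 2-D coordinates in an equilateral triangle.

    Assumed corners: left = (0, 0), right = (1, 0), rope = (0.5, sqrt(3)/2).
    """
    total = p_left + p_rope + p_right
    x = (p_right + 0.5 * p_rope) / total
    y = (math.sqrt(3) / 2) * p_rope / total
    return x, y


# A draw that puts all its mass on one region lands on that corner;
# a uniform triple lands at the centre of the triangle.
corner_left = to_simplex_xy(1.0, 0.0, 0.0)
corner_rope = to_simplex_xy(0.0, 1.0, 0.0)
centre = to_simplex_xy(1.0, 1.0, 1.0)
```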