Metadata, i.e., the number of instances and parameters as well as the configuration budget. Statistics apply to the best run if multiple configurator runs are compared.
Run with best incumbent | | Cutoff | 300 |
---|---|---|---|
# Train instances | 302 | Walltime budget | 172800 |
# Test instances | 302 | Runcount budget | inf |
# Parameters | 26 | CPU budget | inf |
# Features | 138 | Deterministic | False |
# Duplicate feature vectors | 14 | | |
# Evaluated configurations | 2922 | # Runs per config (min) | 0 |
# Default evaluations | 7 | # Runs per config (mean) | 23.8494 |
# Incumbent evaluations | 2000 | # Runs per config (max) | 2000 |
Budget spent evaluating configurations | NaN | Total number of configuration runs | 69688 |
# Changed parameters | 21 | | |
Comparison of the parameters of the default and incumbent configurations. Parameters whose values differ between default and incumbent are presented first.
Parameter | Default | Incumbent |
---|---|---|
-------------- Changed parameters: -------------- | ----- | ----- |
sp-clause-activity-inc | 1 | 1.45757 |
sp-clause-decay | 1.4 | 1.65565 |
sp-first-restart | 100 | 541 |
sp-learned-clause-sort-heur | 0 | 18 |
sp-learned-clauses-inc | 1.3 | 1.18221 |
sp-learned-size-factor | 0.4 | 1.05424 |
sp-orig-clause-sort-heur | 0 | 12 |
sp-rand-var-dec-freq | 0.001 | 0.005 |
sp-resolution | 1 | 0 |
sp-restart-inc | 1.5 | 1.64232 |
sp-use-pure-literal-rule | 1 | 0 |
sp-var-activity-inc | 1 | 0.828371 |
sp-var-dec-heur | 0 | 16 |
sp-variable-decay | 1.4 | 1.40302 |
sp-max-res-lit-inc | 1 | inactive |
sp-max-res-runs | 4 | inactive |
sp-rand-var-dec-scaling | 1 | 0.781189 |
sp-res-cutoff-cls | 8 | inactive |
sp-res-cutoff-lits | 400 | inactive |
sp-res-order-heur | 0 | inactive |
sp-rand-phase-scaling | 1 | 0.67862 |
-------------- Unchanged parameters: -------------- | ----- | ----- |
sp-clause-del-heur | 2 | 2 |
sp-phase-dec-heur | 5 | 5 |
sp-update-dec-queue | 1 | 1 |
sp-rand-phase-dec-freq | 0.001 | 0.001 |
This section contains different ways of analyzing the final incumbent and the performance of the algorithm's default parameter configuration.
Comparing performance on the same set of instances is the most common way to compare algorithms. The entries in the table depend on the cost metric of the configurator run; for scenarios optimizing running time, they include average runtime, penalized average runtime (PAR), and the number of timeouts. The p-value of the paired permutation test uses 10000 permutations and tests against the null hypothesis that the mean performance of default and incumbent is equal.
Metric | Default (Train) | Incumbent (Train) | Default (Test) | Incumbent (Test) | p-value |
---|---|---|---|---|---|
PAR10 | 659.968 | 11.295 | 608.726 | 3.04 | 0.00010 |
PAR1 | 69.902 | 2.355 | 63.362 | 3.04 | 0.00010 |
Timeouts | 62/302 | 1/302 | 55/302 | 0/302 | 0.00010 |
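The paired permutation test used for the p-values above can be sketched as follows. This is a minimal illustration, not the configurator's own implementation; the sign-flip scheme and the add-one smoothing are common choices, assumed here rather than taken from the report:

```python
import numpy as np

def paired_permutation_test(cost_a, cost_b, n_permutations=10000, seed=0):
    """Two-sided paired permutation test on the mean cost difference.

    Null hypothesis: the mean performance of the two configurations
    (e.g. default vs. incumbent) is equal on the paired instances.
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(cost_a, dtype=float) - np.asarray(cost_b, dtype=float)
    observed = abs(diffs.mean())
    count = 0
    for _ in range(n_permutations):
        # Randomly swap the pairing (sign flip) for each instance.
        signs = rng.choice([-1.0, 1.0], size=diffs.size)
        if abs((signs * diffs).mean()) >= observed:
            count += 1
    # Add-one smoothing so the reported p-value is never exactly zero.
    return (count + 1) / (n_permutations + 1)
```

With 10000 permutations the smallest reportable p-value is about 0.0001, which matches the values in the table.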
Depicts cost distributions over the set of instances. Since these are empirical distributions, the plots show step functions. These plots provide insights into how well configurations perform up to a certain threshold; for runtime scenarios, they show the probability of solving any instance from the set within a given timeframe. The training data is shown on the left, the test data on the right.
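The step functions described above are simply the empirical CDF of the per-instance costs; a minimal sketch:

```python
import numpy as np

def empirical_cdf(costs):
    """Empirical CDF of per-instance costs: for each threshold x, the
    fraction of instances with cost <= x. Plotting (xs, ys) with a step
    drawstyle reproduces the step functions shown in these plots."""
    xs = np.sort(np.asarray(costs, dtype=float))
    ys = np.arange(1, xs.size + 1) / xs.size
    return xs, ys
```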
Scatter plots show the costs of the default and optimized parameter configurations on each instance. Since aggregated cost values in tables lose detailed information about the individual cost on each instance, scatter plots provide a more detailed picture. They provide insights into whether overall performance improvements can be explained by a few outliers only, or whether they are due to improvements on the entire instance set. The training data is plotted on the left, the test data on the right.
The instance features are projected into a two- or three-dimensional space using principal component analysis (PCA), and the footprint of each algorithm is plotted, i.e., on which instances the default or the optimized configuration performs well. In contrast to the other analysis methods in this section, these plots give insights into which of the two configurations performs well on specific types or clusters of instances. Inspired by Smith-Miles.
Comparing footprints of default (left) and incumbent (right) in two-dimensional feature space.
Projection of feature space into three dimensions, different viewpoints for enhanced explanation.
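The 2D projection behind the footprint plots can be sketched with scikit-learn. The standard-scaling step is an assumption on my part; the report does not state its exact pre-processing:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def project_features_2d(feature_matrix):
    """Project instance features (one row per instance) into 2D via PCA,
    as used for the footprint plots. Features are standardized first so
    that no single feature dominates the principal components."""
    scaled = StandardScaler().fit_transform(np.asarray(feature_matrix, float))
    return PCA(n_components=2).fit_transform(scaled)
```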
Analysis of the trajectory and the runhistory returned by a configurator to gain insights into how the configurator tried to find a well-performing configuration.
Analysis of the configurations iteratively sampled during the optimization procedure. Multi-dimensional scaling (MDS) is used to reduce the dimensionality of the search space and plot the distribution of evaluated configurations. The larger the dot, the more often the configuration was evaluated on instances from the set. Configurations that were incumbents at least once during optimization are marked as red squares; configurations acquired through local search are marked with an 'x'. The downward triangle denotes the final incumbent, whereas the orange upward triangle denotes the default configuration. The heatmap and the colorbar correspond to the predicted performance in that part of the search space.
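A minimal sketch of the MDS reduction described above, assuming the configurations have already been numerically encoded (one row per configuration; the encoding itself is outside this sketch):

```python
import numpy as np
from sklearn.manifold import MDS

def embed_configurations(config_matrix, seed=0):
    """Reduce encoded configurations to two dimensions with
    multi-dimensional scaling so that the distribution of evaluated
    configurations can be scattered in a single plot."""
    mds = MDS(n_components=2, random_state=seed)
    return mds.fit_transform(np.asarray(config_matrix, dtype=float))
```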
Depicts the average cost of the best configuration found so far (using all trajectory data) over the time spent by the configurator (including target-algorithm runs and the overhead generated by the configurator). If the curve flattens out early, too much time was spent on the configurator run; a curve that is still improving at the end of the budget indicates that one should increase the configuration budget. The plotted standard deviation gives the uncertainty over multiple configurator runs.
Previously used by Golovin et al. to study the frequency of chosen parameter settings in black-box optimization. Each line corresponds to one configuration in the runhistory and shows the parameter settings and the corresponding (estimated) average cost. To handle large configuration spaces with hundreds of parameters, only the (at most) 10 most important parameters according to a fANOVA parameter-importance analysis are plotted. To emphasize better configurations, the performance is encoded in the color of each line, ranging from blue to red. These plots provide insights into whether the configurator focused on specific parameter values and how these values correlate with cost.
Parameter importance analysis to determine which parameters most influence the analyzed algorithm's performance.
Parameters are sorted by their average importance value. Note that the values are not directly comparable across columns, since the different techniques provide different metrics (see the respective tooltips for details on the differences).
Parameter | fANOVA | Ablation | LPI |
---|---|---|---|
sp-var-dec-heur | 65.06 | 73.90 | 91.36 |
sp-orig-clause-sort-heur | 1.31 | 21.94 | - |
sp-phase-dec-heur | 5.94 | - | - |
sp-restart-inc | - | 1.44 | 4.05 |
sp-first-restart | - | - | 1.59 |
sp-learned-clause-sort-heur | 1.12 | 2.02 | - |
sp-variable-decay | - | - | 1.50 |
fANOVA (functional analysis of variance) computes, for each parameter (or pair of parameters), the fraction of the variance in the cost space explained by changing that parameter, marginalizing over all other parameters. Parameters with high importance scores have a large impact on performance. To this end, a random forest is trained as an empirical performance model on the empirical data from the available runhistories.
Parameter | Single importance |
---|---|
sp-var-dec-heur | 65.0623 |
sp-phase-dec-heur | 5.9379 |
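The single-parameter idea can be sketched as below: marginalize the predictions of a random-forest performance model over all other parameters and measure the explained variance. Note that real fANOVA marginalizes exactly over the tree structure; the grid-based averaging here is a simplified approximation for illustration only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def single_marginal_importance(X, y, param_idx, n_grid=10, seed=0):
    """Simplified single-parameter fANOVA: fraction of the predicted-cost
    variance explained by one parameter, averaging over the observed
    values of all other parameters (columns of X are parameters)."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X, y)
    grid = np.linspace(X[:, param_idx].min(), X[:, param_idx].max(), n_grid)
    marginals = []
    for value in grid:
        X_fixed = X.copy()
        X_fixed[:, param_idx] = value  # fix this parameter everywhere
        marginals.append(model.predict(X_fixed).mean())
    return np.var(marginals) / model.predict(X).var()
```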
Ablation analysis is a method to determine parameter importance by comparing two parameter configurations, typically the default and the optimized configuration. It uses a greedy forward search to determine the order in which to flip parameter settings from the default configuration to the incumbent such that in each step the cost is maximally decreased.
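The greedy forward search can be sketched as follows; `cost` is a hypothetical stand-in for evaluating a full configuration (in practice a surrogate model or actual target-algorithm runs):

```python
def ablation_path(default, incumbent, cost):
    """Greedy forward search from default to incumbent: at each step, flip
    the single remaining parameter whose change reduces cost the most.
    Returns the flip order with the cost after each flip."""
    current = dict(default)
    remaining = [p for p in default if default[p] != incumbent[p]]
    path = []
    while remaining:
        # Try each remaining flip and keep the one with the lowest cost.
        best = min(remaining,
                   key=lambda p: cost({**current, p: incumbent[p]}))
        current[best] = incumbent[best]
        remaining.remove(best)
        path.append((best, cost(current)))
    return path
```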
Using an empirical performance model, performance changes along each parameter of a configuration are calculated. To quantify the importance of a parameter, the variance of the predicted costs obtained by varying only that parameter is computed and divided by the sum of these variances over all parameters. This analysis is inspired by the human tendency to look for improvements in the neighborhood of individual parameters of a configuration.
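This local analysis can be sketched as below; `predict` is a hypothetical stand-in for the empirical performance model, and the uniform grid over each parameter's range is an illustrative assumption:

```python
import numpy as np

def local_parameter_importance(incumbent, ranges, predict, n_grid=10):
    """For each parameter, predict the cost while varying only that
    parameter around the incumbent, take the variance of those
    predictions, and normalize by the sum over all parameters."""
    variances = {}
    for param, (lo, hi) in ranges.items():
        preds = []
        for value in np.linspace(lo, hi, n_grid):
            cfg = dict(incumbent)
            cfg[param] = value
            preds.append(predict(cfg))
        variances[param] = np.var(preds)
    total = sum(variances.values())
    return {p: v / total for p, v in variances.items()}
```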
Analysis of the instance features to gain insights into the instance set that was used during the optimization.
Reduction of the out-of-the-bag root mean squared error of the random forest empirical performance model by applying forward selection on the set of instance features. Using this method, we can identify a set of instance features that suffices to obtain prediction accuracy comparable to using the full set of features.
Feature | OOB RMSE |
---|---|
None | 0.989727 |
nclausesOrig | 0.225080 |
Pre_featuretime | 0.186257 |
vars_reduced_depth_64 | 0.181692 |
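The forward-selection procedure can be sketched as follows, using a random forest's out-of-bag predictions as the error estimate; the forest size and the fixed number of selected features are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forward_selection(X, y, feature_names, n_select=3, seed=0):
    """Greedy forward selection: repeatedly add the instance feature that
    most reduces the out-of-bag RMSE of a random-forest performance model.
    Returns (feature name, OOB RMSE) per step, mirroring the table above."""
    selected, remaining, history = [], list(range(len(feature_names))), []
    for _ in range(min(n_select, len(feature_names))):
        rmse = {}
        for j in remaining:
            rf = RandomForestRegressor(n_estimators=60, oob_score=True,
                                       random_state=seed)
            rf.fit(X[:, selected + [j]], y)
            # Out-of-bag predictions give a held-out error estimate.
            rmse[j] = float(np.sqrt(np.mean((rf.oob_prediction_ - y) ** 2)))
        best = min(rmse, key=rmse.get)
        selected.append(best)
        remaining.remove(best)
        history.append((feature_names[best], rmse[best]))
    return history
```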
Box and violin plots show the distribution of each feature value across the instances. Box plots show the quantiles of the distribution, and violin plots show the approximated probability density of the feature values. Such plots are useful for inspecting the instances and detecting their characteristics. For example, if a distribution has two or more modes, this could indicate that the instance set is heterogeneous, which can cause problems in combination with the racing strategies configurators typically use. NaN values are removed from the data.
Correlation of features based on the Pearson product-moment correlation. Since instance features are used to train an empirical performance model in model-based configurators, it can be important, depending on the machine-learning algorithm, to remove correlated features in a pre-processing step. Darker fields correspond to a stronger correlation between the features.
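The matrix behind this heatmap is the standard Pearson product-moment correlation over feature columns, available directly in NumPy:

```python
import numpy as np

def feature_correlation_matrix(feature_matrix):
    """Pearson correlation between instance features (columns); entries
    near +/-1 flag redundant features worth removing in pre-processing."""
    return np.corrcoef(np.asarray(feature_matrix, dtype=float), rowvar=False)
```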
Clustering of instances in 2D; the color encodes the cluster assigned to each instance. Similar to ISAC, we use k-means to cluster the instances in feature space. As pre-processing, we apply standard scaling and a PCA to two dimensions. To choose the number of clusters, we use the silhouette score over the range of 2 to 12 clusters.
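The pipeline just described (scaling, PCA to 2D, k-means with the silhouette score choosing k) can be sketched directly with scikit-learn:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def cluster_instances(feature_matrix, k_range=range(2, 13), seed=0):
    """Standard-scale features, project to 2D with PCA, then run k-means
    for each k in k_range and keep the clustering with the best
    silhouette score. Returns (best k, labels, 2D coordinates)."""
    X = PCA(n_components=2).fit_transform(
        StandardScaler().fit_transform(np.asarray(feature_matrix, float)))
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_range:
        labels = KMeans(n_clusters=k, random_state=seed,
                        n_init=10).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels, X
```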