Extract and analyze best configurations from FLAML AutoML optimization logs.
This tool processes FLAML optimization logs to:
- Extract best configurations for each learner (algorithm)
- Generate two types of warm start configurations:
  - Absolute: pure top-N by performance
  - Representative: diverse best-per-cluster selections
- Visualize the search space exploration
- Provide detailed optimization statistics
Example summary (excerpt from optimization_summary.txt):

================================================================================
FLAML OPTIMIZATION SUMMARY
================================================================================
OVERALL STATISTICS
--------------------------------------------------------------------------------
Total trials: 12777
Best validation loss: 0.118131
Worst validation loss: 0.668284
Mean validation loss: 0.338847
Std validation loss: 0.075695
Total wall clock time: 21599.62 seconds (359.99 minutes)
Mean trial time: 1.69 seconds
LEARNER STATISTICS
--------------------------------------------------------------------------------
Learner Trials Best Loss Mean Loss
--------------------------------------------------------------------------------
catboost 576 0.315344 0.387085
extra_tree 909 0.233602 0.425164
lgbm 477 0.248275 0.419497
rf 1740 0.180855 0.348641
xgb_limitdepth 321 0.245129 0.411951
xgboost 8754 0.118131 0.317687
...
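The overall statistics above are plain aggregates over the parsed trial records. A minimal sketch of how they can be computed, assuming each record carries `validation_loss` and `trial_time` fields (names illustrative, not the script's actual internals):

```python
import statistics

# Hypothetical parsed trial records; field names are illustrative.
trials = [
    {"learner": "xgboost", "validation_loss": 0.118131, "trial_time": 1.2},
    {"learner": "rf", "validation_loss": 0.180855, "trial_time": 2.1},
    {"learner": "catboost", "validation_loss": 0.315344, "trial_time": 3.4},
]

losses = [t["validation_loss"] for t in trials]

summary = {
    "total_trials": len(trials),
    "best_loss": min(losses),                  # lower loss is better
    "worst_loss": max(losses),
    "mean_loss": statistics.mean(losses),
    "std_loss": statistics.stdev(losses),
    "mean_trial_time": statistics.mean(t["trial_time"] for t in trials),
}
print(summary)
```

The per-learner table is the same computation grouped by the `learner` field.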
warm_start_configs = {
    # Top 5 configurations for catboost
    'catboost': [
        {"early_stopping_rounds": 10, "learning_rate": 0.14189952377559728, "n_estimators": 8192, "FLAML_sample_size": 49659},  # Rank 1: metric=0.315344
        {"early_stopping_rounds": 11, "learning_rate": 0.1530902242854414, "n_estimators": 8192, "FLAML_sample_size": 10000},  # Rank 2: metric=0.316097
        {"early_stopping_rounds": 12, "learning_rate": 0.09541333025917802, "n_estimators": 8192, "FLAML_sample_size": 49659},  # Rank 3: metric=0.320287
        {"early_stopping_rounds": 10, "learning_rate": 0.09544104526717777, "n_estimators": 8192, "FLAML_sample_size": 49659},  # Rank 4: metric=0.320287
        {"early_stopping_rounds": 10, "learning_rate": 0.09541180730499482, "n_estimators": 8192, "FLAML_sample_size": 49659},  # Rank 5: metric=0.320287
    ]
}

Requirements:

- Python 3.7+
- pip install numpy scikit-learn matplotlib

All dependencies are standard packages; no special installation is needed.
./flaml-analyze.py path/to/optimization.log

This will:
- Parse the FLAML log file
- Extract top 5 configs per learner (absolute + representative)
- Generate visualizations and summaries
- Save warm start configurations
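The first step, parsing the log, amounts to reading one JSON object per line and keeping the trial records. A sketch, assuming the record field names (`validation_loss`, `learner`, `config`) match your FLAML version's training-log format:

```python
import json

def parse_flaml_log(path):
    """Parse a JSON-lines FLAML training log into a list of trial dicts.

    Field names (validation_loss, learner, config) follow FLAML's
    training-log format but should be checked against your version.
    """
    trials = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                # Skip non-record lines (e.g. checkpoints or summaries).
                continue
            if "validation_loss" in rec and "learner" in rec:
                trials.append(rec)
    return trials
```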
# Extract top 10 configs per learner
./flaml-analyze.py optimization.log --warm-start-per-method 10
# Adjust performance filtering (keep top 30% before clustering)
./flaml-analyze.py optimization.log --performance-percentile 30
# Save to specific directory
./flaml-analyze.py optimization.log -o results/
# Extract top 3 for analysis summary
./flaml-analyze.py optimization.log -n 3

The script generates:
- warm_start_configs_absolute.py: pure top-N configurations ranked by performance. Use for maximum performance and fast convergence.
- warm_start_configs_representative.py: diverse configurations (K-Means + best per cluster). Use for exploration and robustness.
- search_space_2d.png: PCA 2D projection showing search space exploration; visualizes where FLAML searched and which configs were selected.
- optimization_analysis.png: temporal progress plots showing optimization convergence over time.
- optimization_summary.txt: detailed statistics and best configurations; a text report of the optimization run.
positional arguments:
log_file Path to FLAML log file (JSON lines format)
options:
-n N_BEST Number of best configs for analysis summary (default: 1)
-o OUTPUT_DIR Output directory (default: same as log file)
--warm-start-per-method N
Number of configs per method for warm start (default: 5)
--warm-start-overall N
Number of best overall configs (default: 5, currently unused)
--performance-percentile X
Keep top X% before clustering (default: 20.0)
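The options above map onto a plain argparse parser. A sketch mirroring the help text (defaults as listed; this is an illustration, not the script's exact source):

```python
import argparse

def build_parser():
    # Mirrors the CLI options listed above; defaults match the help text.
    p = argparse.ArgumentParser(description="Analyze a FLAML optimization log")
    p.add_argument("log_file", help="Path to FLAML log file (JSON lines format)")
    p.add_argument("-n", dest="n_best", type=int, default=1,
                   help="Number of best configs for the analysis summary")
    p.add_argument("-o", dest="output_dir", default=None,
                   help="Output directory (default: same as log file)")
    p.add_argument("--warm-start-per-method", type=int, default=5,
                   help="Number of configs per method for warm start")
    p.add_argument("--warm-start-overall", type=int, default=5,
                   help="Number of best overall configs (currently unused)")
    p.add_argument("--performance-percentile", type=float, default=20.0,
                   help="Keep top X%% before clustering")
    return p

args = build_parser().parse_args(
    ["optimization.log", "-n", "3", "--performance-percentile", "30"]
)
```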
Absolute selection:
- Method: select the N best configurations by validation loss
- Pros: guaranteed best performance, fast convergence
- Cons: configurations may be very similar (redundant)
- Use when: short optimization time, known good region
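Absolute selection is just a sort-and-slice. A minimal sketch, with illustrative trial records:

```python
def absolute_top_n(trials, n=5):
    """Return the n configs with the lowest validation loss, best first."""
    ranked = sorted(trials, key=lambda t: t["validation_loss"])
    return [t["config"] for t in ranked[:n]]

# Illustrative trial records (field names are assumptions, not the
# script's actual internals).
trials = [
    {"config": {"lr": 0.1}, "validation_loss": 0.32},
    {"config": {"lr": 0.05}, "validation_loss": 0.28},
    {"config": {"lr": 0.2}, "validation_loss": 0.41},
]
best = absolute_top_n(trials, n=2)
```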
Representative selection:
- Method:
  - Filter to the top 20% by performance
  - Cluster into K groups using K-Means
  - Select the best performer from each cluster
- Pros: diverse exploration, best-in-region configs, robust
- Cons: slightly lower initial performance than absolute
- Use when: medium/long optimization time, exploration needed
# Load the configurations
exec(open('warm_start_configs_representative.py').read())
# Extract configs for a specific learner
catboost_configs = warm_start_configs['catboost']
# Use in FLAML (format depends on FLAML version)
from flaml import AutoML

# Option 1: Direct warm start
automl = AutoML()
automl.fit(
    X_train, y_train,
    task='classification',
    starting_points={'catboost': catboost_configs},
    time_budget=3600
)
# Option 2: As initial points
points_to_evaluate = [
    (config, learner)
    for learner, configs in warm_start_configs.items()
    for config in configs
]

./flaml-analyze.py logs/dataset1__descriptors_flaml.log

Output: 5 absolute + 5 representative configs per learner
./flaml-analyze.py logs/optimization.log \
--warm-start-per-method 10 \
--performance-percentile 30

Output: 10 configs per learner, selected from top 30%
./flaml-analyze.py logs/optimization.log \
--warm-start-per-method 5 \
--performance-percentile 10

Output: 5 configs per learner, selected from top 10% (best quality)
./flaml-analyze.py logs/optimization.log \
-n 10 \
-o analysis_results/

Output: the analysis shows the top 10, warm start saves 5 (the default), and everything goes to analysis_results/
For short FLAML runs (<100 trials per learner):
--warm-start-per-method 3 --performance-percentile 30

Keeps more configs and filters less aggressively.
For long FLAML runs (>500 trials per learner):
--warm-start-per-method 5 --performance-percentile 10

You can be more selective; config quality is already high.
For exploration:
--warm-start-per-method 10 --performance-percentile 20

Yields more diverse starting points.
For exploitation:
--warm-start-per-method 3 --performance-percentile 5

Focuses on the best-known region.
./flaml-analyze.py log.txt \
--warm-start-overall 0 \
--warm-start-per-method 0

./flaml-analyze.py log.txt -n 10 \
--warm-start-per-method 1

- Performance filtering: keep the top X% of trials (default: 20%)
- K-Means clustering: Group into K clusters
- Best per cluster: Select champion from each cluster
- Result: K diverse, high-performing configurations
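The three steps above can be sketched with scikit-learn (a listed dependency). This is an illustration under assumptions, not the script's exact code: configs are assumed to be already encoded into a numeric matrix `X`, with a matching vector of validation losses:

```python
import numpy as np
from sklearn.cluster import KMeans

def representative_configs(X, losses, k=5, percentile=20.0, seed=0):
    """Pick k diverse, high-performing rows: filter to the best `percentile`
    percent by loss, cluster with K-Means, keep the lowest-loss point per
    cluster. X is an (n, d) numeric matrix of encoded configs."""
    losses = np.asarray(losses, dtype=float)
    cutoff = np.percentile(losses, percentile)   # lower loss is better
    keep = losses <= cutoff
    X_top, loss_top = np.asarray(X, dtype=float)[keep], losses[keep]

    k = min(k, len(X_top))  # cannot have more clusters than points
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_top)

    champions = []
    for c in range(k):
        idx = np.where(labels == c)[0]
        champions.append(int(idx[np.argmin(loss_top[idx])]))
    return X_top[champions], loss_top[champions]

# Synthetic demo: 100 random "configs" in 3-D with random losses.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
losses = rng.uniform(0.1, 0.7, size=100)
selected, selected_losses = representative_configs(X, losses, k=3, percentile=50)
```

Every selected point is guaranteed to lie inside the performance-filtered region, so diversity never comes at the cost of picking a poor trial.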
- Categorical variables are encoded with LabelEncoder before PCA and clustering. The issue: LabelEncoder assigns arbitrary integers (0, 1, 2, ...), which PCA and K-Means treat as continuous distances, so projections and clusters involving categorical hyperparameters may be slightly distorted.
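A small NumPy demonstration of the distortion (one-hot encoding, which the script does not currently use, is a possible mitigation because it makes every pair of distinct categories equidistant):

```python
import numpy as np

categories = ["gbtree", "dart", "gblinear"]  # example categorical values

# Integer labels (what LabelEncoder produces): distances are arbitrary.
labels = np.array([[0.0], [1.0], [2.0]])
d_label_01 = np.linalg.norm(labels[0] - labels[1])  # "gbtree" vs "dart"
d_label_02 = np.linalg.norm(labels[0] - labels[2])  # "gbtree" vs "gblinear"
# d_label_02 is twice d_label_01, purely an artifact of label order.

# One-hot encoding: every pair of distinct categories is equidistant.
onehot = np.eye(len(categories))
d_onehot_01 = np.linalg.norm(onehot[0] - onehot[1])
d_onehot_02 = np.linalg.norm(onehot[0] - onehot[2])
```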
- v0.1.2: K-Means + best per cluster approach
- v0.1.0: Added PCA 2D visualization
- v0.0.1: Initial release with absolute/representative selection

