Hello, WandB Community!
I run a YOLOv10 training script on a remote GPU cluster and want to tune hyperparameters before running the main model training.
I ran 20 sweeps, and all of them show < null > as the result (see image below).
However, in my yolov10_training/yolov10_finetune_LARD_13/02_19_52/ directory I can see a results.csv whose values are not null (see the table below):
epoch | time | train/box_loss | train/cls_loss | train/dfl_loss | metrics/precision(B) | metrics/recall(B) | metrics/mAP50(B) | metrics/mAP50-95(B) | val/box_loss | val/cls_loss | val/dfl_loss | lr/pg0 | lr/pg1 | lr/pg2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 165.006 | 2.91836 | 10.6008 | 1.95599 | 0.48503 | 0.276 | 0.28385 | 0.14981 | 3.57982 | 6.09934 | 2.40669 | 0.000663743 | 0.000663743 | 0.000663743 |
2 | 340.683 | 2.55896 | 4.95659 | 1.93792 | 0.59701 | 0.44838 | 0.43939 | 0.24511 | 3.36526 | 4.47328 | 2.29902 | 0.00106699 | 0.00106699 | 0.00106699 |
3 | 516.641 | 2.36457 | 2.97833 | 1.85849 | 0.5841 | 0.48533 | 0.4661 | 0.26107 | 3.38215 | 4.35041 | 2.25128 | 0.00120623 | 0.00120623 | 0.00120623 |
4 | 689.54 | 2.12222 | 2.12358 | 1.8063 | 0.58889 | 0.37333 | 0.40226 | 0.24237 | 3.13558 | 3.75161 | 2.21274 | 0.000812 | 0.000812 | 0.000812 |
5 | 865.638 | 1.82564 | 1.71369 | 1.77575 | 0.66677 | 0.45888 | 0.48731 | 0.2987 | 3.04279 | 3.14339 | 2.14769 | 0.000416 | 0.000416 | 0.000416 |
This is my train_model.py:
import datetime
import sys

import torch
import ultralytics
import wandb
import yaml
from ultralytics import YOLO

if __name__ == "__main__":
    sys.stdout.reconfigure(encoding="utf-8")
    ultralytics.checks()
    wandb.login(key="my_api_key_here")

    dataset = "LARD"
    time = datetime.datetime.now().strftime("%d/%m_%H_%M")

    # Random search over batch size, number of epochs, and initial learning rate.
    sweep_configuration = {
        "method": "random",
        "name": "yolov10-sweep",
        "metric": {"name": "loss", "goal": "minimize"},
        "parameters": {
            "batch_size": {"values": [16, 32, 64]},
            "epochs": {"values": [5, 10, 15]},
            "lr": {"max": 0.1, "min": 0.0001},
        },
    }
    sweep_id = wandb.sweep(sweep=sweep_configuration, project="RLD-training")

    def train_yolo():
        wandb.init(project="RLD-training", name=f"RLD_Train_{dataset}_{time}")

        # Write the dataset config that YOLO reads.
        config_file = "yolo/config/yolov10_config.yaml"
        config_data = {
            "train": "dataset/images/train",
            "val": "dataset/images/train",
            "test": "dataset/images/test",
            "nc": 1,
            "names": ["runway"],
        }
        with open(config_file, "w") as f:
            yaml.dump(config_data, f)

        model = YOLO("yolo/weights/yolov10n.pt").to(torch.device("cuda"))
        model_path = "best_yolov10.pt"

        # Hyperparameters come from the values the sweep agent picked for this run.
        model.train(
            data="yolo/config/yolov10_config.yaml",
            epochs=wandb.config.epochs,
            batch=wandb.config.batch_size,
            lr0=wandb.config.lr,
            project="yolov10_training",
            name=f"yolov10_finetune_{dataset}_{time}",
        )
        model.save(model_path)
        wandb.log_model(model_path)

    wandb.agent(sweep_id, function=train_yolo, count=10)
    wandb.finish()
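
For context, the sweep above optimizes a metric named `loss`, while the columns in results.csv are named `train/box_loss`, `val/box_loss`, and so on. A W&B sweep can only show a metric that the run actually logs under that name. The following is a minimal sketch of turning results.csv rows into a single `loss` value per epoch. The helper name `summarize_losses` and the choice to sum the three validation losses are hypothetical illustrations, not part of the Ultralytics or W&B APIs.

```python
import csv
import io


def summarize_losses(csv_text, prefix="val/"):
    """Return one dict per epoch with a combined 'loss' value.

    The combined value is the sum of the box, cls, and dfl losses that
    share the given column prefix. This aggregation is an assumption
    made for illustration only.
    """
    rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Strip whitespace in case the header or the values are padded.
        clean = {key.strip(): value.strip() for key, value in row.items()}
        loss = sum(
            float(clean[f"{prefix}{part}_loss"]) for part in ("box", "cls", "dfl")
        )
        rows.append({"epoch": int(clean["epoch"]), "loss": loss})
    return rows


# Two rows taken from the results table above (validation columns only).
sample = """epoch,time,val/box_loss,val/cls_loss,val/dfl_loss
1,165.006,3.57982,6.09934,2.40669
2,340.683,3.36526,4.47328,2.29902
"""

for entry in summarize_losses(sample):
    # Inside a sweep run, each entry could be passed to wandb.log(entry)
    # so that a key named "loss" exists for the sweep to read.
    print(entry)
```

In the real script, the CSV text would come from the results.csv file in the run directory rather than from an inline string.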
Please help me fix the visualization of the wandb sweeps.
Regards,
Yulian.