How to deal with "Exception: problem"?

I am running into the following error when trying to resume a run:

Traceback (most recent call last):
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1040, in init
    wi.setup(kwargs)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 151, in setup
    self._wl = wandb_setup.setup()
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 320, in setup
    ret = _setup(settings=settings)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 315, in _setup
    wl = _WandbSetup(settings=settings)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 301, in __init__
    _WandbSetup._instance = _WandbSetup__WandbSetup(settings=settings, pid=pid)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 108, in __init__
    self._settings = self._settings_setup(settings, self._early_logger)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 138, in _settings_setup
    s._infer_run_settings_from_environment(_logger=early_logger)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_settings.py", line 1404, in _infer_run_settings_from_environment
    program_relpath = self.program_relpath or _get_program_relpath_from_gitrepo(
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_settings.py", line 138, in _get_program_relpath_from_gitrepo
    root = repo.root
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/lib/git.py", line 46, in root
    return self.repo.git.rev_parse("--show-toplevel")
  File "/etc/anaconda3/lib/python3.9/site-packages/git/cmd.py", line 639, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "/etc/anaconda3/lib/python3.9/site-packages/git/cmd.py", line 1184, in _call_process
    return self.execute(call, **exec_kwargs)
  File "/etc/anaconda3/lib/python3.9/site-packages/git/cmd.py", line 873, in execute
    proc = Popen(command,
  File "/etc/anaconda3/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/etc/anaconda3/lib/python3.9/subprocess.py", line 1754, in _execute_child
    self.pid = _posixsubprocess.fork_exec(
OSError: [Errno 12] Cannot allocate memory
wandb: ERROR Abnormal program exit
Traceback (most recent call last):
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1040, in init
    wi.setup(kwargs)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 151, in setup
    self._wl = wandb_setup.setup()
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 320, in setup
    ret = _setup(settings=settings)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 315, in _setup
    wl = _WandbSetup(settings=settings)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 301, in __init__
    _WandbSetup._instance = _WandbSetup__WandbSetup(settings=settings, pid=pid)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 108, in __init__
    self._settings = self._settings_setup(settings, self._early_logger)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_setup.py", line 138, in _settings_setup
    s._infer_run_settings_from_environment(_logger=early_logger)
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_settings.py", line 1404, in _infer_run_settings_from_environment
    program_relpath = self.program_relpath or _get_program_relpath_from_gitrepo(
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_settings.py", line 138, in _get_program_relpath_from_gitrepo
    root = repo.root
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/lib/git.py", line 46, in root
    return self.repo.git.rev_parse("--show-toplevel")
  File "/etc/anaconda3/lib/python3.9/site-packages/git/cmd.py", line 639, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "/etc/anaconda3/lib/python3.9/site-packages/git/cmd.py", line 1184, in _call_process
    return self.execute(call, **exec_kwargs)
  File "/etc/anaconda3/lib/python3.9/site-packages/git/cmd.py", line 873, in execute
    proc = Popen(command,
  File "/etc/anaconda3/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/etc/anaconda3/lib/python3.9/subprocess.py", line 1754, in _execute_child
    self.pid = _posixsubprocess.fork_exec(
OSError: [Errno 12] Cannot allocate memory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/ml-experiments/unet/train.py", line 234, in <module>
    with wandb.init(project="UNet", id=run_id, resume="must" if checkpoint else "never", config=hyperparameters):
  File "/etc/anaconda3/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1081, in init
    raise Exception("problem") from error_seen
Exception: problem

The error is quite opaque, so I have no idea how to handle it. Does anyone know what I should do?
My code is here: ml-experiments/train.py at main · vedantroy/ml-experiments · GitHub

My guess is the issue has to do with the fact that when I press “Ctrl-C”, my program gives the following message: “Waiting for W&B process to finish… (failed 1). Press Control-C to abort syncing.”

(I intercept the Ctrl-C interrupt and save my model to disk, and rely on wandb to finish doing its thing; but maybe I’m doing that part incorrectly??)

Hi Vedant, OSError: [Errno 12] Cannot allocate memory means the operating system could not allocate memory for a new process. In your traceback it is raised from fork_exec, when wandb.init tries to launch a git subprocess to find the repository root. Can you check how much free memory you have on your system? You can get this by running free -h, or if that doesn't work, top should tell you as well. You can find the processes using the most memory with ps aux | sort -rn -k 6. It could also be that your script itself allocates nearly all available system memory after it starts but before it calls wandb.init.
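If it helps, here is a minimal sketch of doing the same check from inside the script, right before wandb.init. It assumes a Linux machine where /proc/meminfo has a MemAvailable line; the helper names parse_mem_available_kib and available_memory_gib are made up for this example and are not part of wandb.

```python
def parse_mem_available_kib(meminfo_text):
    """Return the MemAvailable value in KiB from /proc/meminfo text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line looks like: "MemAvailable:    2048000 kB"
            return int(line.split()[1])
    return None


def available_memory_gib(path="/proc/meminfo"):
    """Return available memory in GiB, or None if it cannot be determined."""
    try:
        with open(path) as f:
            kib = parse_mem_available_kib(f.read())
    except OSError:
        # Not Linux, or the file is unreadable.
        return None
    return None if kib is None else kib / (1024 ** 2)


if __name__ == "__main__":
    gib = available_memory_gib()
    if gib is None:
        print("Could not determine available memory on this system.")
    else:
        print(f"Available memory before wandb.init: {gib:.2f} GiB")
```

If the number printed just before wandb.init is very low, one thing to try is calling wandb.init earlier in the script, before the model and data are loaded, so the subprocess is started while the Python process is still small. I have not verified that this resolves your case.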

Hi Vedant, are you still running into this issue?

Hi again! Since we haven’t heard back from you, I’m going to close this issue, but please feel free to reopen it if you run into this again.