Wandb port unreachable from SLURM

Hello, this is a reply from our support bot, designed to assist you with your Weights & Biases queries. To reach a human, please reply to this message.

It seems like you’re encountering a network connectivity issue that prevents your SLURM cluster from communicating with the Weights & Biases server. This could be due to several reasons, including network configuration changes, firewall rules, or issues with the Weights & Biases server itself. Here are some steps to troubleshoot and potentially resolve the issue:

  1. Verify Network Configuration: Ensure that the network configuration on your SLURM cluster hasn’t changed. Sometimes, network policies or firewall settings are updated, which can block outgoing connections to specific ports or addresses.

  2. Check Firewall Rules: Verify that your firewall rules allow outgoing connections to api.wandb.ai on port 443. You might need to work with your network administrator to ensure that these connections are not being blocked.

  3. Test Connectivity: Use tools like ping and traceroute (or tracert on Windows) to test connectivity to api.wandb.ai. The host may not answer ping even when it is reachable, but traceroute can help identify where the connection is being blocked. Neither tool tests port 443 itself, so confirm the result with the HTTPS check in the next step.

  4. Use curl or wget: Try using curl or wget from your SLURM cluster to access https://api.wandb.ai. This can help confirm if the HTTPS connection can be established. For example:

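    curl -I https://api.wandb.ai

    This command should return HTTP response headers if the connection succeeds. Any HTTP status line, even an error status, shows that the network path works; a timeout or a "connection refused" error points to a blocked connection.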

  5. Review W&B Server Logs: If you’re using a self-hosted W&B server, check the server logs for any errors or warnings that might indicate why connections are being refused.

  6. Check W&B Status: Visit the Weights & Biases status page to ensure there are no ongoing issues with the W&B service that could be affecting connectivity.

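  7. SLURM Configuration: Ensure that your SLURM job scripts do not contain any settings that could restrict network access. Some clusters have specific partitions or configurations for jobs requiring external network access. Because compute nodes are often more restricted than login nodes, run the connectivity checks above from inside a job, not only from the login node.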

  8. Contact W&B Support: If you’ve gone through these steps and are still facing issues, it might be helpful to contact Weights & Biases support for further assistance. You can reach out to them at support@wandb.com.
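
The connectivity checks in steps 2 to 4 can also be run from Python, which may help when curl is not available inside the job environment. The following is a minimal sketch and not part of the W&B client: `probe` is a hypothetical helper name, and the default port (443) and timeout (5 seconds) are assumptions. It reports the first stage that fails, which can help tell a DNS problem from a blocked port or a TLS problem.

```python
import socket
import ssl


def probe(host: str, port: int = 443, timeout: float = 5.0, use_tls: bool = True) -> str:
    """Return the first failing stage ("dns", "tcp", "tls"), or "ok".

    Hypothetical helper for illustration; not a W&B API.
    A timeout during the TLS handshake is also reported as "tls".
    """
    # Stage 1: DNS resolution of the host name.
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"

    # Stage 2: plain TCP connection. A firewall rule that blocks the
    # port typically shows up here as a timeout or a refused connection.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp"

    # Stage 3: TLS handshake with certificate verification.
    with sock:
        if not use_tls:
            return "ok"
        try:
            context = ssl.create_default_context()
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
        except OSError:  # ssl.SSLError is a subclass of OSError
            return "tls"
```

Calling `probe("api.wandb.ai")` from a compute node and again from the login node shows whether the two behave differently. A result of `"tcp"` on the compute node only would suggest that outgoing traffic on port 443 is blocked there.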

Given the error trace you provided, it’s clear that the issue lies with establishing a connection to the W&B API server. This is typically a network-level issue rather than a problem with your code or the W&B client library itself. Therefore, focusing on network troubleshooting steps is likely to be the most effective approach to resolving this issue.
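
One way to check that the environment, rather than your code, is responsible is to record where the script is running and which proxy settings it sees. The sketch below is an illustration under assumptions: `describe_environment` is a hypothetical helper, and the variable lists are a guess at what is relevant on a typical cluster. `SLURM_JOB_ID`, `SLURM_JOB_PARTITION`, and `SLURMD_NODENAME` are set by SLURM inside a job, and many HTTP clients honor the proxy variables, but your cluster may use others.

```python
import os
import socket

# Assumed lists of relevant variables; adjust them for your cluster.
SLURM_VARS = ("SLURM_JOB_ID", "SLURM_JOB_PARTITION", "SLURMD_NODENAME")
PROXY_VARS = ("HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy",
              "NO_PROXY", "no_proxy")


def describe_environment(env=None) -> dict:
    """Collect the host name plus SLURM and proxy variables (None if unset).

    Hypothetical helper for illustration; not a W&B or SLURM API.
    """
    env = os.environ if env is None else env
    report = {"hostname": socket.gethostname()}
    for name in SLURM_VARS + PROXY_VARS:
        report[name] = env.get(name)
    # SLURM_JOB_ID is only present when running inside a SLURM job.
    report["inside_slurm_job"] = "SLURM_JOB_ID" in env
    return report
```

Printing `describe_environment()` on the login node and inside a job makes differences visible, for example a proxy variable that is set in your interactive shell but missing from the job environment.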

-WandBot :robot: