How do I create a launch job to trigger a SageMaker training job?

I am trying to create a job that I can launch from the W&B interface to trigger a SageMaker training job.
I have followed the documentation (Set up for SageMaker | Weights & Biases Documentation) and I have my agent running.
I also read the documentation on how to create a launch job (Create a launch job | Weights & Biases Documentation), which can be done from a Python script, a git repository, or a Docker image. However, nowhere does the documentation explain how to create a launch job that uses the Docker image in my private ECR to run a training job in AWS SageMaker. Could you provide an example of how to do that?

Hi @diogo-maximino, thank you for reaching out to W&B Technical Support!

To create a launch job in Weights & Biases that uses a Docker image from your private ECR and runs a training job in AWS SageMaker, you’ll need to follow a few steps. Here’s a general outline of the process:

  1. Build and Push Your Docker Image to ECR: Ensure that your Docker image is built with the necessary dependencies and pushed to your private ECR repository.
  2. Set Up Your W&B Environment: Configure your environment with the necessary W&B environment variables, including your API key and the Docker image tag from ECR.
  3. Create a Launch Job with W&B CLI: Use the wandb job create command to create a launch job, specifying your project, entity, and the Docker image tag from ECR.
  4. Add the Launch Job to a Queue: Once the launch job is created, add it to a launch queue from the W&B interface.
  5. Configure SageMaker Integration: Ensure that your SageMaker setup is integrated with W&B, following the SageMaker integration guide.
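Step 1 above can be sketched roughly as follows. This is a minimal sketch, not an official W&B or AWS recipe: the account ID, region, repository name, and tag are placeholder values, the helper function build_and_push_image is a hypothetical name, and it assumes the AWS CLI (v2) and Docker are installed, your AWS credentials are configured, and the ECR repository already exists.

```shell
# Placeholder values (assumptions): replace with your own account, region, repo, and tag.
AWS_ACCOUNT_ID="123456789012"
AWS_REGION="us-east-1"
ECR_REPO="my-image"
IMAGE_TAG="develop"

# Full registry host and image URI in the form ECR expects.
ECR_REGISTRY="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"
IMAGE_URI="${ECR_REGISTRY}/${ECR_REPO}:${IMAGE_TAG}"

# Hypothetical helper: defined here, but only runs when you call it.
build_and_push_image() {
  # Log the local Docker client in to the private ECR registry.
  aws ecr get-login-password --region "$AWS_REGION" \
    | docker login --username AWS --password-stdin "$ECR_REGISTRY" || return 1
  # Build from the Dockerfile in the current directory, then push the tagged image.
  docker build -t "$IMAGE_URI" . || return 1
  docker push "$IMAGE_URI"
}

echo "Image URI: $IMAGE_URI"
```

To use it, run build_and_push_image from the directory that contains your Dockerfile. The printed image URI is the value you would pass to wandb job create.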

Here’s an example of how you might set up the W&B environment and create a launch job using the W&B CLI:

# Set W&B environment variables
export WANDB_API_KEY="<your-wandb-api-key>"
export WANDB_ENTITY="<your-entity>"
export WANDB_PROJECT="<project-name>"

# Set the Docker image tag from ECR
export WANDB_DOCKER="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:develop"

# Create a launch job with the W&B CLI
wandb job create --project "$WANDB_PROJECT" --entity "$WANDB_ENTITY" \
--name "sagemaker-training-job" image "$WANDB_DOCKER"

Replace the placeholder values with your actual API key, entity name, project name, and the full image tag from your ECR repository.

After creating the launch job, you can add it to a queue from the W&B interface:

  1. Navigate to your W&B project.
  2. Select the Jobs tab on the left panel (thunderbolt icon).
  3. Hover your mouse next to the name of the job you created and select the Launch button.
  4. Configure the job version, overrides, and queue as needed.
  5. Click the Launch now button to enqueue your launch job.
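If you prefer the command line over the UI, the same enqueue step can likely be done with wandb launch. The sketch below is an assumption-laden example rather than a verified recipe: the entity, project, and queue names are placeholders, enqueue_job is a hypothetical helper name, and it assumes the job is addressable as entity/project/job-name:alias with a "latest" alias. Check wandb launch --help on your installed version before relying on it.

```shell
# Placeholder values (assumptions): replace with your own entity, project, and queue.
WANDB_ENTITY="my-entity"
WANDB_PROJECT="my-project"
JOB_NAME="sagemaker-training-job"   # name given to wandb job create
JOB_ALIAS="latest"                  # assumed alias of the newest job version
QUEUE_NAME="my-sagemaker-queue"     # hypothetical queue your agent polls

# Assumed job reference format: <entity>/<project>/<job-name>:<alias>
JOB_REF="${WANDB_ENTITY}/${WANDB_PROJECT}/${JOB_NAME}:${JOB_ALIAS}"

# Hypothetical helper: defined here, but only runs when you call it.
enqueue_job() {
  wandb launch --job "$JOB_REF" --queue "$QUEUE_NAME" \
    --entity "$WANDB_ENTITY" --project "$WANDB_PROJECT"
}

echo "Job reference: $JOB_REF"
```

Calling enqueue_job pushes the job onto the named queue, where your running agent should pick it up and start the SageMaker training job.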

Make sure that your SageMaker environment is set up to pull the Docker image from your private ECR and that the necessary permissions are in place for SageMaker to access ECR.
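One way to check this is sketched below. It is only an illustration under assumptions: the role name, region, repository, and tag are placeholders, the helper function names are hypothetical, and it uses the AWS managed policy AmazonEC2ContainerRegistryReadOnly for simplicity. Your organization may require a narrower, repository-scoped policy instead.

```shell
# Placeholder values (assumptions): replace with your own settings.
AWS_REGION="us-east-1"
ECR_REPO="my-image"
IMAGE_TAG="develop"
SAGEMAKER_ROLE_NAME="my-sagemaker-execution-role"   # hypothetical role used by the training job

# AWS managed policy that grants read-only (pull) access to ECR.
ECR_READ_POLICY_ARN="arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"

# Hypothetical helper: confirm the tagged image is present in the repository.
check_image_exists() {
  aws ecr describe-images --region "$AWS_REGION" \
    --repository-name "$ECR_REPO" --image-ids imageTag="$IMAGE_TAG"
}

# Hypothetical helper: attach the read-only ECR policy to the execution role.
grant_ecr_read() {
  aws iam attach-role-policy --role-name "$SAGEMAKER_ROLE_NAME" \
    --policy-arn "$ECR_READ_POLICY_ARN"
}

echo "Policy to attach: $ECR_READ_POLICY_ARN"
```

Run check_image_exists first to confirm the image and tag are visible to your credentials, then grant_ecr_read if the execution role cannot pull from ECR yet.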

For more detailed instructions and to ensure that you have the most up-to-date information, please refer to the Weights & Biases documentation on creating launch jobs and the SageMaker integration guide.

Please let me know if this resolves your issue or if you have any further questions.

Hi @diogo-maximino, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Hi @diogo-maximino, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!