@charlesfrye
Last active September 24, 2022 03:41
Getting set up with the LambdaLabs GPU Cloud

FSDL

Thanks to a generous donation from Lambda Labs, project teams in FSDL 2022 can get access to cloud machines with nice server-grade GPUs.

Up to four teams share each machine; each team is allocated two GPUs via the CUDA_VISIBLE_DEVICES environment variable.

Access

  1. Find your team ID/number here. Notice that team IDs are zero-padded to three digits!
  2. Join that team on Discord by following the instructions here.
  3. A new voice channel should appear in your Discord sidebar, named team99 if you're in Team #99. Your passphrase and the IP of your server are in that channel's text chat. You have full permissions, e.g. role mentions, in that channel, so feel free to use it to organize your group.
  4. Tunnel to the instance with ssh and log in with the three-digit version of your team ID, e.g. ssh team_099@ip.address.here.
  5. Enter your passphrase, e.g. correct horse battery staple.
  6. You're in! Run nvidia-smi to see all the GPUs.
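The login steps above can be sketched from your local terminal. The team number and server IP below are placeholders; use the values from your team's Discord channel:

```shell
# Zero-pad your team number to three digits (e.g. team 99 -> team_099)
TEAM_USER=$(printf 'team_%03d' 99)
echo "$TEAM_USER"   # -> team_099

# Tunnel to the instance (placeholder IP) and enter your passphrase when prompted
# ssh "$TEAM_USER"@203.0.113.10

# Once you're in, list the GPUs:
# nvidia-smi
```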

Rules

  1. Do NOT mine cryptocurrency. Instances mining cryptocurrency may be automatically disconnected.
  2. Do NOT power the machine down. You and the other teams sharing the machine will lose access and getting it back is not guaranteed.
  3. Do NOT try to access GPUs on your machine being used by other teams. Teams that are reported violating this rule will have their access revoked.
  4. Do NOT install system-level packages or make system-level configuration changes without running them by course staff. You can post in the Discord. See Features below.

Etiquette

These machines are shared with your fellow students. Caring is a part of sharing.

Don't reboot the machine without checking with the other teams on your machine. You can see which teams are on which machines here.

CPUs, network I/O, RAM, and file storage are shared across teams, so be polite when running jobs that use lots of these resources. For example, make sure to test expensive jobs so they don't need to be re-executed, set resource limits where possible, and be thoughtful about which jobs you really need to run. If you're using Docker, remember to prune regularly.
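For example, two small courtesies for shared CPU and disk (the `nice` priority and the Docker prune command here are illustrative suggestions, not requirements):

```shell
# Run a CPU-heavy job at the lowest scheduling priority so it yields
# to other teams' work; the echo stands in for your real command
nice -n 19 echo "long-running job goes here"

# Reclaim disk space from stopped containers, dangling images, and build cache.
# Commented out so this sketch is safe to paste; run it deliberately.
# docker system prune
```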

Aside from Lambda Labs' anti-mining protections, we will not be snooping on the work being done on the machines. However, please limit how much work you do that's not related to the project.

Take care with your credentials and server info and be careful when running services available on the public internet. Public clouds are common targets for attackers, especially public clouds with GPUs.

Features

You're working in an Ubuntu 20.04 virtual machine, configured and provisioned so that GPU-accelerated workloads run as close to bare-metal speed as possible.

They come with the Lambda Stack pre-installed, which includes the NVIDIA drivers, NVIDIA's Docker tooling (e.g. nvidia-container-toolkit), and core ML dependencies like PyTorch and TensorFlow.

You will likely still want to manage your own environment, for which we recommend containerization with Docker rather than Python virtual environments. Without any further configuration on your part, running docker run with the argument --gpus "device=${CUDA_VISIBLE_DEVICES}" will expose your team's GPUs to the container, where they appear as devices 0 and 1.
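Concretely, the flag's value comes from the environment variable already set for your team. The container image in the commented command is a placeholder:

```shell
# Suppose your team was allocated physical GPUs 2 and 3:
export CUDA_VISIBLE_DEVICES=2,3

# The argument Docker receives is then:
echo --gpus "device=${CUDA_VISIBLE_DEVICES}"   # -> --gpus device=2,3

# Full command (placeholder image). Inside the container, the two
# physical GPUs appear as devices 0 and 1:
# docker run --gpus "device=${CUDA_VISIBLE_DEVICES}" my-image nvidia-smi
```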

The local file storage will persist until the machines are terminated, which in the absence of issues means "until the class ends". You can see how much storage is available when you log in, in addition to using standard Linux utilities.
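To check remaining space with standard utilities, for instance:

```shell
# Human-readable usage and free space for the root filesystem
df -h /

# Total size of your home directory's contents
du -sh "$HOME"
```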

Execution

To avoid entering the passphrase each time you log in, you can set up key-based access with SSH. This should also enable easy access via VS Code.
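One way to set this up from your local machine. The team user and IP are placeholders, and the Host alias name is our own invention:

```shell
# Generate a key pair if you don't already have one
[ -f ~/.ssh/id_ed25519 ] || ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N "" -q

# Copy the public key to the server (enter your passphrase one last time)
# ssh-copy-id team_099@203.0.113.10

# Optional: a Host alias in ~/.ssh/config so `ssh fsdl` just works,
# including from VS Code's Remote-SSH extension
cat >> ~/.ssh/config <<'EOF'
Host fsdl
    HostName 203.0.113.10
    User team_099
    IdentityFile ~/.ssh/id_ed25519
EOF
```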

You can work directly from the terminal. In that case, we strongly suggest using tmux or screen so that your jobs survive SSH disconnections.

You can also use ngrok, as covered in the labs, to create a tunnel for a Jupyter server or another web-based IDE. Just make sure you protect access with a password or token!

Support

If you run into configuration issues or have trouble accessing your machine, post in the Discord first.

If you hit an issue with the machine itself, e.g. a crash, you can request support from Lambda Labs' engineers by emailing fsdl@lambdalabs.com.
