Deployment Overview of PyTorch on Server

Prerequisites and Basic Requirements

The deployment targets a server running Ubuntu with internet access for downloading packages and dependencies. The following conditions must be met:

  • Operating System: Ubuntu 22.04 (implied by the HWE kernel package linux-generic-hwe-22.04).
  • Privileges: Root access is required to install system packages, configure drivers, and create user accounts.
  • Hardware: The server may include an NVIDIA H100 GPU (PCI ID 10de:2330). If one is detected, the linux-generic-hwe-22.04 kernel package is installed automatically.
  • Ports: Outbound access for package downloads and inbound SSH access are required. The configuration defines no application-specific ports.
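
The hardware condition above can be checked before deployment. The following is a minimal sketch, not the check used by the deployment scripts: it assumes the lspci utility is available and scans the output of lspci -nn for the PCI ID 10de:2330. The function names has_pci_device and read_lspci are hypothetical.

```python
import re
import shutil
import subprocess

H100_PCI_ID = "10de:2330"  # vendor:device ID quoted in the prerequisites


def has_pci_device(lspci_output: str, pci_id: str = H100_PCI_ID) -> bool:
    """Return True if any line of `lspci -nn` output carries the [vendor:device] ID."""
    pattern = re.compile(r"\[" + re.escape(pci_id) + r"\]", re.IGNORECASE)
    return any(pattern.search(line) for line in lspci_output.splitlines())


def read_lspci() -> str:
    """Run `lspci -nn` if the tool exists; return an empty string otherwise."""
    if shutil.which("lspci") is None:
        return ""
    result = subprocess.run(
        ["lspci", "-nn"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    print("H100 detected" if has_pci_device(read_lspci()) else "No H100 detected")
```

Keeping the parsing separate from the lspci call lets the matching logic be exercised with sample text on machines that have no GPU.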

File and Directory Structure

The application and its supporting files are organized within the home directory of the user account. The following paths are utilized:

  • /home/user/: The home directory for the user account where the application resides.
  • /home/user/venv/: The directory containing the Python virtual environment.
  • /home/user/pytorch_install.sh: The executable script used to initialize the virtual environment and install PyTorch.
  • /home/user/pytorch.sh: The executable script used to activate the virtual environment.
  • /root/user_credentials: A file containing the generated password for the user account.
  • /usr/local/cuda/: The installation directory for the CUDA toolkit.

Application Installation Process

The installation is performed via a shell script that configures the system, installs drivers, and sets up the Python environment. The process includes the following steps:

  1. System Updates: All packages are updated to their latest versions, and unused packages are purged.
  2. Driver Installation:
    • If an NVIDIA H100 GPU is detected, the linux-generic-hwe-22.04 kernel package is installed.
    • The ubuntu-drivers-common package is installed to detect the recommended NVIDIA driver.
    • The recommended NVIDIA driver package is installed automatically.
    • The gcc compiler is installed to support CUDA.
  3. CUDA Toolkit Installation:
    • The CUDA keyring is downloaded and installed for the specific Ubuntu release version.
    • The cuda package is installed via the APT repository.
  4. User Account Creation:
    • A new user named user is created with the home directory /home/user.
    • A random 8-character password is generated and stored in /root/user_credentials.
    • The user account is added to the sudo group.
  5. Library Installation:
    • The libnvinfer5-dev package is installed.
    • Python 3.10, python3-pip, and python3-venv are installed.
  6. Environment Configuration:
    • Environment variables for CUDA (PATH and LD_LIBRARY_PATH) are appended to the ~/.bashrc file of the user account.
    • The pytorch_install.sh script is created to handle the virtual environment setup.
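
The environment configuration step can be illustrated as follows. This is a sketch under stated assumptions: the document says only that PATH and LD_LIBRARY_PATH are appended to ~/.bashrc, so the exact values (bin and lib64 under /usr/local/cuda) and the helper names cuda_env_lines and append_missing_lines are assumptions, not the contents of the actual installer.

```python
from pathlib import Path

CUDA_HOME = "/usr/local/cuda"  # CUDA installation directory named in this document


def cuda_env_lines(cuda_home: str = CUDA_HOME) -> list[str]:
    """Build the export lines; the bin/ and lib64/ subdirectories are assumed."""
    return [
        f'export PATH="{cuda_home}/bin:$PATH"',
        f'export LD_LIBRARY_PATH="{cuda_home}/lib64:$LD_LIBRARY_PATH"',
    ]


def append_missing_lines(rc_file: Path, lines: list[str]) -> int:
    """Append only the lines not already present; return how many were added."""
    text = rc_file.read_text() if rc_file.exists() else ""
    existing = text.splitlines()
    missing = [line for line in lines if line not in existing]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with rc_file.open("a") as handle:
            handle.write(prefix + "\n".join(missing) + "\n")
    return len(missing)
```

Calling append_missing_lines(Path("/home/user/.bashrc"), cuda_env_lines()) a second time adds nothing, which avoids duplicate export lines if the setup is repeated.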

Access Rights and Security

Security and access controls are configured as follows:

  • User Account: A dedicated user account is created with a randomly generated password.
  • Sudo Access: The user account is added to the sudo group, granting administrative privileges.
  • File Permissions:
    • The install_script.sh in /root is created with permissions u=rwx,g=r,o=r (octal 0744).
    • The pytorch_install.sh and pytorch.sh scripts are made executable using chmod +x.
  • Password Storage: The user password is stored in /root/user_credentials. Because the file sits under /root, it is accessible only to the root user by default.
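
The password handling described above can be sketched as follows. The character set, the username:password file layout, and the explicit 0600 file mode are assumptions made for illustration; the document states only that a random 8-character password is generated and stored in /root/user_credentials. The function names are hypothetical.

```python
import os
import secrets
import string
from pathlib import Path

ALPHABET = string.ascii_letters + string.digits  # assumed character set


def generate_password(length: int = 8) -> str:
    """Return a random password drawn from ALPHABET using a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def store_credentials(path: Path, username: str, password: str) -> None:
    """Write 'username:password' to a file that only its owner can read or write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{username}:{password}\n")
    os.chmod(path, 0o600)  # enforce the mode even if the file already existed
```

Setting the mode on the file itself means access does not depend solely on the permissions of the containing directory.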

Databases

No database configuration, connection settings, or storage locations are defined in the provided deployment scripts.

Docker Containers and Their Deployment

The deployment does not utilize Docker containers, docker run, docker compose, or container-related scripts. The application is installed directly on the host operating system.

Proxy Servers

No proxy server configuration (such as Nginx, Traefik, or Certbot) is included in the provided deployment data.

Permission Settings

File and directory permissions are set during the installation process:

  • The install_script.sh located in /root is set to u=rwx,g=r,o=r.
  • The pytorch_install.sh and pytorch.sh scripts in /home/user are set to be executable.
  • The user account owns the files within /home/user.
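
To audit these settings, the permission bits of a file can be rendered in the same symbolic form used above. The sketch below is illustrative only; symbolic_mode, file_mode, and is_executable_by_owner are hypothetical helper names, not part of the deployment scripts.

```python
import os
import stat


def symbolic_mode(mode: int) -> str:
    """Render permission bits in chmod's symbolic form, one clause per class."""
    parts = []
    for who, shift in (("u", 6), ("g", 3), ("o", 0)):
        bits = (mode >> shift) & 0o7
        perms = "".join(
            ch for ch, flag in (("r", 4), ("w", 2), ("x", 1)) if bits & flag
        )
        parts.append(f"{who}={perms}")
    return ",".join(parts)


def file_mode(path: str) -> int:
    """Return only the permission bits of a file on disk."""
    return stat.S_IMODE(os.stat(path).st_mode)


def is_executable_by_owner(mode: int) -> bool:
    """Return True if the owner execute bit is set."""
    return bool(mode & stat.S_IXUSR)
```

For example, symbolic_mode(0o744) yields u=rwx,g=r,o=r, the value listed for install_script.sh. Run as root, symbolic_mode(file_mode("/root/install_script.sh")) would show the mode actually set on disk.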

Starting, Stopping, and Updating

The application is managed through shell scripts rather than a system service manager.

  • Initial Setup: To install PyTorch and set up the virtual environment, run the following command as the user account:

    ./pytorch_install.sh
    
    This script creates a virtual environment, activates it, installs torch, torchvision, and torchaudio, and runs a test script.

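  • Activating the Environment: To activate the virtual environment for subsequent use, source the script from the home directory of the user account:

    source ./pytorch.sh
    
    This script sources the venv/bin/activate file. If it contains nothing else, it must be sourced rather than executed: running it as ./pytorch.sh activates the environment only in a child shell that exits when the script finishes.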

  • Verification: After activation, verify the installation and CUDA availability with:

    python3 -c "import torch; print('PyTorch installed with CUDA' if torch.cuda.is_available() else 'No CUDA device available')"
    

  • Updating: To update the application, the pytorch_install.sh script can be re-run to reinstall dependencies within the virtual environment. System-level updates are handled via standard apt commands.
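
The one-line verification command above can be expanded into a small script that also reports the PyTorch version and the device name, and that fails gracefully when torch is missing. This is a sketch rather than the test script run by pytorch_install.sh; the function name cuda_report is hypothetical, and it assumes the script is run inside the activated virtual environment.

```python
import importlib.util


def cuda_report() -> str:
    """Summarize whether PyTorch is importable and whether it can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch  # imported lazily so the check above can run without it

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: no CUDA device available"
    name = torch.cuda.get_device_name(0)
    return f"PyTorch {torch.__version__}: CUDA available on {name}"


if __name__ == "__main__":
    print(cuda_report())
```

Reporting the device name helps confirm that the expected GPU is the one PyTorch selects as device 0.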
