
Deployment Overview of TensorFlow on Server

Prerequisites and Basic Requirements

The deployment process requires a server running Ubuntu 22.04 with the following specifications:

  • Operating System: Ubuntu 22.04 (HWE kernel recommended for NVIDIA H100 support).
  • Privileges: Root access or sudo privileges are required to install system packages and configure users.
  • Hardware: An NVIDIA GPU is required. The installation script specifically checks for the NVIDIA H100 model by its PCI vendor:device ID (10de:2330).
  • Network: Internet access is required to download CUDA, TensorRT, and Python packages.
  • Ports: Standard network ports are used for downloading packages; no specific application ports are opened by the installation script.
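
The operating system and GPU requirements above can be checked before starting the installation. The following is a minimal sketch, not the logic of the installation script itself: the function names has_h100 and is_ubuntu_2204 are hypothetical, and the check assumes that the GPU is identified by matching the PCI ID 10de:2330 in the output of lspci -nn.

```shell
#!/bin/sh
# Preflight sketch. Helper names are hypothetical and are not taken
# from the installation script.

# Returns 0 when the given `lspci -nn` output contains the H100 PCI ID.
has_h100() {
    printf '%s\n' "$1" | grep -qi '10de:2330'
}

# Returns 0 when the given os-release text describes Ubuntu 22.04.
is_ubuntu_2204() {
    printf '%s\n' "$1" | grep -q '^ID=ubuntu$' &&
        printf '%s\n' "$1" | grep -q '^VERSION_ID="22.04"$'
}

# Example usage on a real server:
#   has_h100 "$(lspci -nn)" && echo "H100 detected"
#   is_ubuntu_2204 "$(cat /etc/os-release)" && echo "Ubuntu 22.04 detected"
```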

File and Directory Structure

The application and its dependencies are installed in the following locations:

  • User Home Directory: /home/user
  • Virtual Environment: /home/user/venv
  • TensorRT Installation: /home/user/TensorRT-8.6.1.6
  • Installation Scripts:
    • /home/user/tensorflow_install.sh: Script to install TensorFlow and dependencies.
    • /home/user/tensorflow.sh: Script to activate the environment and set variables.
  • System Credentials: /root/user_credentials (contains the generated password for the user account).
  • CUDA Installation: /usr/local/cuda
  • CUDA Libraries: /usr/local/cuda-12.2/lib64
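
After deployment, the layout above can be verified with a short check. This is an illustrative sketch only: the function name missing_paths and the optional root prefix argument are assumptions made for the example. The credentials file in /root is left out because it is readable only by root.

```shell
#!/bin/sh
# Prints each expected path that does not exist. An optional root
# prefix (hypothetical, useful for testing) is prepended to every path.
missing_paths() {
    root="${1:-}"
    for p in \
        /home/user/venv \
        /home/user/TensorRT-8.6.1.6 \
        /home/user/tensorflow_install.sh \
        /home/user/tensorflow.sh \
        /usr/local/cuda \
        /usr/local/cuda-12.2/lib64
    do
        [ -e "${root}${p}" ] || printf '%s\n' "$p"
    done
}

# Example usage: an empty result means every listed path is present.
#   missing_paths
```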

Application Installation Process

The installation is performed via a shell script executed by the user account. The process includes the following steps:

  1. System Preparation:

    • The system updates all packages and installs curl, wget, and sudo.
    • If an NVIDIA H100 GPU is detected, the linux-generic-hwe-22.04 kernel package is installed.
    • The recommended NVIDIA driver is installed using ubuntu-drivers-common.
    • The gcc compiler is installed to support CUDA.
  2. CUDA Installation:

    • The CUDA keyring is downloaded and installed for the specific Ubuntu release version.
    • The cuda package is installed via apt.
  3. User Account Creation:

    • A new user named user is created with a home directory at /home/user.
    • A random 8-character password is generated and stored in /root/user_credentials.
    • The user account is added to the sudo group.
  4. TensorFlow and Dependencies:

    • The libnvinfer5-dev package is installed.
    • Python 3.10, pip, and venv are installed.
    • A virtual environment is created and activated.
    • TensorFlow with CUDA support is installed using pip install tensorflow[and-cuda].
    • TensorRT version 8.6.1.6 is downloaded, extracted, and installed via pip.
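
Two of the steps above, selecting the CUDA keyring for the Ubuntu release and generating the 8-character password, can be sketched as follows. The function names are hypothetical, the keyring package version 1.1-1 in the URL is an assumption, and the password method shown here (random bytes filtered to alphanumeric characters) is one possible approach, not necessarily the one used by the installation script.

```shell
#!/bin/sh
# Maps an Ubuntu VERSION_ID such as "22.04" to the repository
# directory name "ubuntu2204" used by the NVIDIA CUDA repository.
cuda_repo_slug() {
    printf 'ubuntu%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Builds the keyring download URL for a release.
# Assumption: keyring package version 1.1-1 on x86_64.
cuda_keyring_url() {
    printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/x86_64/cuda-keyring_1.1-1_all.deb\n' \
        "$(cuda_repo_slug "$1")"
}

# Prints a random 8-character alphanumeric password.
# 48 random bytes give 64 base64 characters on a single line;
# non-alphanumeric characters are dropped before truncating to 8.
gen_password() {
    head -c 48 /dev/urandom | base64 | tr -dc 'A-Za-z0-9' | cut -c1-8
}

# Example usage (requires root; shown as comments only):
#   . /etc/os-release
#   wget "$(cuda_keyring_url "$VERSION_ID")"
#   gen_password > /root/user_credentials
```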

Access Rights and Security

  • User Account: The application runs under the user account.
  • Sudo Access: The user account is a member of the sudo group, allowing administrative commands.
  • Credentials: The password for the user account is stored in /root/user_credentials and should be secured or changed immediately after deployment.
  • File Permissions:
    • The installation scripts (tensorflow_install.sh and tensorflow.sh) are set to executable (chmod +x).
    • The install_script.sh is created with permissions u=rwx,g=r,o=r.
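
For reference, the symbolic mode u=rwx,g=r,o=r corresponds to the octal mode 744. The sketch below applies it to a file and prints the result; the function name apply_script_mode is hypothetical, and stat -c assumes GNU coreutils, as shipped with Ubuntu.

```shell
#!/bin/sh
# Applies the symbolic mode from the text and prints the resulting
# octal mode (owner: rwx, group: r, others: r).
apply_script_mode() {
    chmod u=rwx,g=r,o=r "$1" && stat -c '%a' "$1"
}

# Example usage:
#   apply_script_mode install_script.sh
# To change the generated password after deployment:
#   sudo passwd user
```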

Databases

No database configuration, connection settings, or storage locations are defined in the provided deployment data.

Docker Containers and Their Deployment

The deployment does not utilize Docker containers, docker run, or docker compose. The application is installed natively on the host operating system.

Proxy Servers

No proxy server configuration (Nginx, Traefik, SSL, or Certbot) is included in the provided deployment data.

Environment Variables

The following environment variables are configured in the ~/.bashrc file for the user account to ensure proper library loading:

  • PATH: Includes /usr/local/cuda/bin.
  • LD_LIBRARY_PATH: Includes /usr/local/cuda-12.2/lib64.

The tensorflow.sh script sets the following additional environment variables upon activation:

  • CUDNN_PATH: Dynamically determined using Python.
  • LD_LIBRARY_PATH: Appends the CUDNN library path, the CUDA library path, and the TensorRT library path (/home/user/TensorRT-8.6.1.6/lib).
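
The variable handling described above can be sketched as follows. This is not the content of tensorflow.sh: the function names path_append and setup_tf_env are hypothetical, and the sketch assumes that cuDNN was installed by pip as the nvidia.cudnn Python package and that its libraries are in a lib subdirectory of that package.

```shell
#!/bin/sh
# Appends a directory to a colon-separated list unless it is already
# present, and prints the resulting list.
path_append() {
    list="$1"
    dir="$2"
    case ":${list}:" in
        *":${dir}:"*) printf '%s\n' "$list" ;;
        "::")         printf '%s\n' "$dir" ;;
        *)            printf '%s\n' "${list}:${dir}" ;;
    esac
}

# Activates the virtual environment and extends LD_LIBRARY_PATH.
# Assumption: cuDNN is importable as nvidia.cudnn inside the venv.
setup_tf_env() {
    . /home/user/venv/bin/activate
    CUDNN_PATH=$(dirname "$(python3 -c 'import nvidia.cudnn; print(nvidia.cudnn.__file__)')")
    export CUDNN_PATH
    for d in "$CUDNN_PATH/lib" /usr/local/cuda-12.2/lib64 /home/user/TensorRT-8.6.1.6/lib; do
        LD_LIBRARY_PATH=$(path_append "${LD_LIBRARY_PATH:-}" "$d")
    done
    export LD_LIBRARY_PATH
}
```

If this sketch is saved to a file, it has to be loaded with the . (source) command and setup_tf_env called in the same shell, because variables exported in a child process do not persist in the calling shell.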

Starting, Stopping, and Updating

To start the TensorFlow environment and run the application:

  1. Log in as the user account.
  2. Execute the activation script:
    ./tensorflow.sh
    
    The script activates the virtual environment and sets the necessary environment variables.

To test the installation and verify GPU availability, run the following command within the activated environment:

python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000]))); print(tf.config.list_physical_devices())"

To update the system packages, use the standard apt commands with sudo. To update Python packages within the virtual environment, use pip after activating the environment.
