Deployment Overview of TensorFlow on Server

Prerequisites and Basic Requirements

The deployment targets Ubuntu or Debian-based operating systems and needs root privileges or sudo access to run the installation tasks. The following hardware and software prerequisites must be met:

Operating System: Ubuntu or Debian distribution.
Hardware: An NVIDIA GPU is required. The installation script checks for NVIDIA H100 GPUs (PCI ID 10de:2330) and, when one is detected, installs the linux-generic-hwe-22.04 kernel package.
Privileges: root access is required for system-wide package installation and user creation.
User Account: The application runs under a dedicated system user account named "user".
Ports: No specific network ports are defined for external access in the configuration files; the setup focuses on local GPU acceleration and Python environment.
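The H100 check mentioned in the hardware prerequisite can be sketched as follows. This is a minimal illustration and not the installer's actual code: it assumes the bracketed vendor:device format printed by lspci -nn (from pciutils), and the function names has_h100 and read_lspci are hypothetical.

```python
import re
import subprocess

# PCI vendor:device ID for the NVIDIA H100, as listed in the prerequisites.
H100_PCI_ID = "10de:2330"


def has_h100(lspci_output: str) -> bool:
    """Return True if any line of `lspci -nn` output carries the H100 PCI ID."""
    pattern = re.compile(r"\[" + re.escape(H100_PCI_ID) + r"\]", re.IGNORECASE)
    return any(pattern.search(line) for line in lspci_output.splitlines())


def read_lspci() -> str:
    """Run `lspci -nn` and return its text output (assumes pciutils is installed)."""
    result = subprocess.run(
        ["lspci", "-nn"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Typical use on the server:
#   print("H100 detected" if has_h100(read_lspci()) else "No H100 found")
```

Keeping the parsing separate from the subprocess call makes the check easy to try against saved lspci output from another machine.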

File and Directory Structure

The installation process creates specific directories and files to manage the TensorFlow environment, CUDA drivers, and user credentials.

/root/install_script.sh: The main installation script generated during the deployment.
/root/user_credentials: A file containing the randomly generated password for the user account.
/home/user: The home directory for the application user where the Python virtual environment is created.
/home/user/venv: The Python virtual environment directory.
/home/user/TensorRT-8.6.1.6: The extracted directory for the TensorRT package.
/home/user/tensorflow_install.sh: The script to install and configure TensorFlow dependencies.
/home/user/tensorflow.sh: The activation script to load environment variables and the virtual environment.
/usr/local/cuda: The installation directory for NVIDIA CUDA Toolkit.
/usr/local/cuda-12.2/lib64: The library path for CUDA version 12.2.
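After installation, the paths listed above can be checked in one pass. The sketch below is an illustrative helper, not part of the deployment: the name missing_paths and the root parameter are hypothetical, and it should be run as root because /root is not readable by other accounts.

```python
import os

# Paths taken from the directory listing in this section.
EXPECTED_PATHS = [
    "/root/install_script.sh",
    "/root/user_credentials",
    "/home/user",
    "/home/user/venv",
    "/home/user/TensorRT-8.6.1.6",
    "/home/user/tensorflow_install.sh",
    "/home/user/tensorflow.sh",
    "/usr/local/cuda",
    "/usr/local/cuda-12.2/lib64",
]


def missing_paths(paths, root="/"):
    """Return the entries of `paths` that do not exist under `root`.

    `root` exists only so the check can be tried against a scratch directory;
    on the server it stays at "/".
    """
    return [p for p in paths if not os.path.exists(os.path.join(root, p.lstrip("/")))]


# Typical use on the server (as root):
#   for path in missing_paths(EXPECTED_PATHS):
#       print("missing:", path)
```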

Application Installation Process

The deployment is executed via a shell script (install_script.sh) located in /root. This script performs the following actions:

System Updates: Updates all system packages and removes unused dependencies.
Driver Installation:
- Detects NVIDIA hardware (specifically checking for 10de:2330 for H100 GPUs).
- Installs the recommended NVIDIA driver using ubuntu-drivers-common.
- Installs the GCC compiler required for CUDA.
CUDA Toolkit Setup:
- Downloads the CUDA keyring and installs the CUDA toolkit based on the detected Ubuntu release version.
User Creation:
- Creates a system user named "user" with a home directory at /home/user.
- Generates a random 8-character password and stores it in /root/user_credentials.
- Adds the account to the "users" and "sudo" groups.
Environment Setup:
- Installs Python 3.10, pip, and venv.
- Installs libnvinfer5-dev for TensorRT support.
- Appends the necessary PATH and LD_LIBRARY_PATH entries to the user's ~/.bashrc (/home/user/.bashrc) so that they point to the CUDA binaries and libraries.
TensorFlow and TensorRT Installation:
- Executes tensorflow_install.sh as the user.
- Creates a virtual environment.
- Installs tensorflow[and-cuda] via pip.
- Downloads and extracts TensorRT 8.6.1.6.
- Installs the pre-release tensorrt wheel.
Validation:
- Runs a Python command to verify TensorFlow is functional and can detect physical GPU devices.
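The password step in the user-creation stage can be illustrated with a short sketch. The source only states that a random 8-character password is written to /root/user_credentials; the character set, the file layout (one line containing the password), and the 0600 file mode used here are assumptions, and both function names are hypothetical.

```python
import os
import secrets
import string

# Assumed character set: ASCII letters and digits.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 8) -> str:
    """Return a random password drawn from ALPHABET using a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def write_credentials(path: str, password: str) -> None:
    """Write the password to `path`, readable and writable by the owner only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.fchmod(fd, 0o600)  # enforce the mode even if the file already existed
    with os.fdopen(fd, "w") as handle:
        handle.write(password + "\n")


# Typical use (as root):
#   write_credentials("/root/user_credentials", generate_password())
```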

Access Rights and Security

Security and access control are managed through user accounts and file permissions established during the installation:

User Account: The application runs under the account named "user", which is a member of the sudo group and therefore has administrative privileges.
Passwords: A unique password is generated at runtime and saved to /root/user_credentials. This file is readable by the root user.
File Permissions:
- /root/install_script.sh: u=rwx,g=r,o=r (owner read/write/execute; group and others read-only).
- Scripts in /home/user/: chmod +x is applied to tensorflow_install.sh and tensorflow.sh.
Restrictions: No specific firewall rules or SSH key restrictions are defined in the provided configuration.
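Because sudo membership gives the application account administrative rights, it can be worth confirming the group assignments after installation. The following is a minimal sketch using Python's standard pwd and grp modules (Unix only); the helper name user_in_group is hypothetical.

```python
import grp
import pwd


def user_in_group(username: str, groupname: str) -> bool:
    """Return True if `username` has `groupname` as primary or supplementary group.

    Returns False when either the user or the group does not exist.
    """
    try:
        user = pwd.getpwnam(username)
        group = grp.getgrnam(groupname)
    except KeyError:
        return False
    return user.pw_gid == group.gr_gid or username in group.gr_mem


# Typical use on the server:
#   for name in ("users", "sudo"):
#       print(name, user_in_group("user", name))
```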

Databases

No database installation, configuration, or connection settings are included in the provided source files. The deployment focuses on the TensorFlow framework and GPU drivers.

Docker Containers and Their Deployment

No Docker containers, docker run commands, or docker-compose files are present in the configuration. The application is deployed directly on the host operating system using native package managers and Python virtual environments.

Proxy Servers

No proxy servers (Nginx, Traefik), SSL certificates, or custom domain configurations are defined. The setup does not include Certbot or reverse proxy management.

Permission Settings

The installation script enforces the following permission settings:

System Scripts: The main installer at /root/install_script.sh is set to mode 744 (u=rwx,g=r,o=r).
User Scripts: The scripts created in /home/user/ (tensorflow_install.sh, tensorflow.sh) are made executable using chmod +x.
Ownership: The "user" account owns the files and directories within /home/user/.
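The relationship between the symbolic mode u=rwx,g=r,o=r and the octal mode 744 can be verified with the standard stat module. This is an illustrative sketch; the helper names mode_string and has_mode are hypothetical.

```python
import os
import stat


def mode_string(path: str) -> str:
    """Return the file type and permission bits of `path` in `ls -l` style."""
    return stat.filemode(os.stat(path).st_mode)


def has_mode(path: str, expected: int = 0o744) -> bool:
    """Return True if the permission bits of `path` equal `expected`."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected


# Typical use (as root):
#   print(mode_string("/root/install_script.sh"), has_mode("/root/install_script.sh"))
```

For a regular file with mode 744, mode_string returns -rwxr--r--, which matches u=rwx,g=r,o=r.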

Location of Configuration Files and Data

Configuration and data files are located in the following paths:

Activation Script: /home/user/tensorflow.sh (Loads virtual environment and sets CUDNN_PATH and LD_LIBRARY_PATH).
Virtual Environment: /home/user/venv.
TensorRT Data: /home/user/TensorRT-8.6.1.6.
User Credentials: /root/user_credentials.
Environment Variables: Defined in /home/user/.bashrc and loaded via tensorflow.sh.

Available Ports for Connection

The provided configuration does not define any network ports for remote access. The TensorFlow installation is local to the server instance.

Starting, Stopping, and Updating

To manage the TensorFlow environment, run the following commands as the "user" account:

Starting the Environment: To activate the virtual environment and set the necessary library paths for CUDA and TensorRT, run:
```
source /home/user/tensorflow.sh
```
This command activates the venv and exports CUDNN_PATH and LD_LIBRARY_PATH.
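To confirm that the activation script took effect in the current shell, a small Python check can be run afterwards. This sketch assumes only what is stated above (a venv plus the CUDNN_PATH and LD_LIBRARY_PATH variables); the helper names are hypothetical.

```python
import os
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix


def missing_env_vars(environ=None, names=("CUDNN_PATH", "LD_LIBRARY_PATH")):
    """Return the variable names from `names` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Typical use after `source /home/user/tensorflow.sh`:
#   print("venv active:", in_virtualenv())
#   print("missing variables:", missing_env_vars())
```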

Testing the Installation: After activating the environment, verify the setup with:

```
python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000]))); print(tf.config.list_physical_devices())"
```
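The same test can be written as a short script that also reports how many GPUs TensorFlow sees. This is a sketch built on the calls used in the one-liner above; the function names count_gpus and check_tensorflow are hypothetical, and TensorFlow is imported inside the function so that the counting helper can be used without it.

```python
def count_gpus(devices) -> int:
    """Count entries whose device_type is "GPU" in a list of physical devices."""
    return sum(1 for device in devices if device.device_type == "GPU")


def check_tensorflow() -> int:
    """Run a small computation, list devices, and return the number of GPUs found."""
    import tensorflow as tf  # requires the activated virtual environment

    total = tf.reduce_sum(tf.random.normal([1000, 1000]))
    print("reduce_sum result:", total.numpy())

    devices = tf.config.list_physical_devices()
    for device in devices:
        print(device.device_type, device.name)
    return count_gpus(devices)


# Typical use inside the activated environment:
#   if check_tensorflow() == 0:
#       print("TensorFlow is working but no GPU was detected")
```

A result of zero with a working reduce_sum usually points to the driver or library paths rather than to the TensorFlow package itself.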

Reinstallation or Updates: To reinstall or update the dependencies, re-run the installation script:
```
/home/user/tensorflow_install.sh
```
Note that this script creates a new virtual environment and reinstalls packages, potentially overwriting the previous state.