Deployment Overview of TensorFlow on Server
Prerequisites and Basic Requirements
The deployment requires a server running Ubuntu 22.04 that meets the following requirements:
- Operating System: Ubuntu 22.04 (the HWE kernel is recommended for NVIDIA H100 support).
- Privileges: Root access or sudo privileges are required to install system packages and configure users.
- Hardware: An NVIDIA GPU is required. The installation script specifically checks for the NVIDIA H100 model (PCI ID 10de:2330).
- Network: Internet access is required to download CUDA, TensorRT, and Python packages.
- Ports: Standard network ports are used only for downloading packages; the installation script does not open any application ports.
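The installation script's detection logic is not reproduced in this document. As a minimal sketch, assuming the check inspects the numeric vendor:device IDs that lspci -nn prints (lspci comes from the pciutils package), an equivalent check in Python could look like this; has_pci_id and detect_h100 are hypothetical names.

```python
import re
import shutil
import subprocess

# PCI vendor:device ID that the installation script checks for (NVIDIA H100).
H100_PCI_ID = "10de:2330"


def has_pci_id(lspci_output: str, pci_id: str = H100_PCI_ID) -> bool:
    """Return True if `lspci -nn` output lists a device with the given vendor:device ID."""
    # `lspci -nn` prints numeric IDs in square brackets, e.g. "[10de:2330]".
    pattern = re.compile(r"\[" + re.escape(pci_id) + r"\]", re.IGNORECASE)
    return pattern.search(lspci_output) is not None


def detect_h100() -> bool:
    """Hypothetical helper: run `lspci -nn` (if installed) and look for the H100 ID."""
    if shutil.which("lspci") is None:
        return False  # pciutils not installed; treat as "not detected"
    result = subprocess.run(["lspci", "-nn"], capture_output=True, text=True, check=False)
    return has_pci_id(result.stdout)


if __name__ == "__main__":
    print("NVIDIA H100 (10de:2330) detected:", detect_h100())
```

Matching the bracketed ID rather than the marketing name avoids depending on how the local PCI ID database labels the card.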
File and Directory Structure
The application and its dependencies are installed in the following locations:
- User Home Directory: /home/user
- Virtual Environment: /home/user/venv
- TensorRT Installation: /home/user/TensorRT-8.6.1.6
- Installation Scripts:
  - /home/user/tensorflow_install.sh: Script to install TensorFlow and its dependencies.
  - /home/user/tensorflow.sh: Script to activate the environment and set environment variables.
- System Credentials: /root/user_credentials (contains the generated password for the user account).
- CUDA Installation: /usr/local/cuda
- CUDA Libraries: /usr/local/cuda-12.2/lib64
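After installation, the layout above can be verified with a short script. This is an illustrative sketch, not part of the installation scripts: it reports each location as present, missing, or not accessible (the last is expected for /root/user_credentials when not running as root).

```python
from __future__ import annotations

import os

# Locations from the file and directory structure listed above.
EXPECTED_PATHS = {
    "home": "/home/user",
    "venv": "/home/user/venv",
    "tensorrt": "/home/user/TensorRT-8.6.1.6",
    "install_script": "/home/user/tensorflow_install.sh",
    "activate_script": "/home/user/tensorflow.sh",
    "credentials": "/root/user_credentials",
    "cuda": "/usr/local/cuda",
    "cuda_libs": "/usr/local/cuda-12.2/lib64",
}


def check_paths(paths: dict[str, str]) -> dict[str, str]:
    """Map each name to "present", "missing", or "no access" for its path."""
    status = {}
    for name, path in paths.items():
        try:
            os.stat(path)
            status[name] = "present"
        except (FileNotFoundError, NotADirectoryError):
            status[name] = "missing"
        except PermissionError:
            # Expected for /root/user_credentials when not running as root.
            status[name] = "no access"
    return status


if __name__ == "__main__":
    for name, state in check_paths(EXPECTED_PATHS).items():
        print(f"{name:16} {EXPECTED_PATHS[name]:40} {state}")
```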
Application Installation Process
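The installation is performed by shell scripts and includes the following steps. Because the user account is created during the process, the system-level steps run with root privileges, while the TensorFlow installation runs under the user account.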
- System Preparation:
  - All system packages are updated, and curl, wget, and sudo are installed.
  - If an NVIDIA H100 GPU is detected, the linux-generic-hwe-22.04 kernel package is installed.
  - The recommended NVIDIA driver is installed with the ubuntu-drivers tool from the ubuntu-drivers-common package.
  - The gcc compiler is installed to support CUDA.
- CUDA Installation:
  - The CUDA keyring for the server's Ubuntu release is downloaded and installed.
  - The cuda package is installed via apt.
- User Account Creation:
  - A new user named user is created with the home directory /home/user.
  - A random 8-character password is generated and stored in /root/user_credentials.
  - The user account is added to the sudo group.
- TensorFlow and Dependencies:
  - The libnvinfer5-dev package is installed.
  - Python 3.10, pip, and venv are installed.
  - A virtual environment is created and activated.
  - TensorFlow with CUDA support is installed using pip install tensorflow[and-cuda].
  - TensorRT version 8.6.1.6 is downloaded, extracted, and installed via pip.
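For the System Preparation step, the actions can be expressed as an ordered command plan. This is a hypothetical dry-run sketch: the package names come from the steps above, while the -y flags and the use of ubuntu-drivers autoinstall to install the recommended driver are assumptions, since the installation script itself is not shown. The function only builds the commands; it does not run them.

```python
from __future__ import annotations

import shlex


def system_prep_commands(h100_detected: bool) -> list[list[str]]:
    """Return the system preparation commands in order (to be run as root)."""
    commands = [
        ["apt-get", "update"],
        ["apt-get", "upgrade", "-y"],
        ["apt-get", "install", "-y", "curl", "wget", "sudo"],
    ]
    if h100_detected:
        # HWE kernel, recommended for H100 support on Ubuntu 22.04.
        commands.append(["apt-get", "install", "-y", "linux-generic-hwe-22.04"])
    commands += [
        ["apt-get", "install", "-y", "ubuntu-drivers-common"],
        ["ubuntu-drivers", "autoinstall"],  # assumption: installs the recommended NVIDIA driver
        ["apt-get", "install", "-y", "gcc"],  # compiler needed by CUDA
    ]
    return commands


if __name__ == "__main__":
    for cmd in system_prep_commands(h100_detected=True):
        print("sudo", shlex.join(cmd))
```

Keeping the plan as data makes it easy to print and review before anything is executed.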
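For the CUDA Installation step, the keyring package URL depends on the Ubuntu release. The sketch below reads VERSION_ID from /etc/os-release and assumes NVIDIA's repository layout (compute/cuda/repos/ubuntu2204/x86_64/) and keyring version 1.1-1; both are assumptions to check against NVIDIA's current installation guide.

```python
def ubuntu_release(os_release_text: str) -> str:
    """Extract VERSION_ID (e.g. "22.04") from the contents of /etc/os-release."""
    for line in os_release_text.splitlines():
        if line.startswith("VERSION_ID="):
            return line.split("=", 1)[1].strip().strip('"')
    raise ValueError("VERSION_ID not found in os-release data")


def cuda_keyring_url(release: str, arch: str = "x86_64", keyring_version: str = "1.1-1") -> str:
    """Build the cuda-keyring .deb URL for an Ubuntu release such as "22.04"."""
    distro = "ubuntu" + release.replace(".", "")  # "22.04" -> "ubuntu2204"
    return (
        "https://developer.download.nvidia.com/compute/cuda/repos/"
        f"{distro}/{arch}/cuda-keyring_{keyring_version}_all.deb"
    )


if __name__ == "__main__":
    with open("/etc/os-release", encoding="utf-8") as f:
        url = cuda_keyring_url(ubuntu_release(f.read()))
    # Follow-up steps (as root): download the .deb, install it with dpkg -i,
    # run apt-get update, then apt-get install -y cuda.
    print(url)
```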
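The User Account Creation step can be sketched as follows. generate_password uses Python's secrets module (a cryptographically secure random source), and write_credentials creates the file with mode 0600 so that only its owner (root, when run as root) can read it. The username:password file format and the useradd, chpasswd, and usermod invocations are assumptions; the document does not show how the script stores the password.

```python
from __future__ import annotations

import os
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 8) -> str:
    """Random password drawn from letters and digits with a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def write_credentials(path: str, username: str, password: str) -> None:
    """Write "username:password" to path with owner-only permissions (0600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(f"{username}:{password}\n")
    os.chmod(path, 0o600)  # also tighten permissions if the file already existed


def user_setup_commands(username: str = "user") -> list[list[str]]:
    """Commands (run as root) to create the account, set its password, and grant sudo."""
    return [
        ["useradd", "-m", "-d", f"/home/{username}", "-s", "/bin/bash", username],
        # chpasswd reads "username:password" lines from standard input.
        ["chpasswd"],
        ["usermod", "-aG", "sudo", username],
    ]
```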
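For the TensorFlow and Dependencies step, the pip commands can target the virtual environment's own pip, which has the same effect as running pip after activation. This sketch assumes the extracted TensorRT 8.6.1.6 archive keeps its Python wheels in a python/ subdirectory named like tensorrt-<version>-cp310-...whl (cp310 matching Python 3.10); verify this against the extracted archive.

```python
from __future__ import annotations

import glob
import os
import sys


def tensorrt_wheels(trt_dir: str, py_tag: str | None = None) -> list[str]:
    """Return TensorRT wheels in <trt_dir>/python matching a CPython tag such as "cp310"."""
    if py_tag is None:
        py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    pattern = os.path.join(trt_dir, "python", f"tensorrt-*-{py_tag}-*.whl")
    return sorted(glob.glob(pattern))


def pip_install_commands(venv_dir: str = "/home/user/venv",
                         trt_dir: str = "/home/user/TensorRT-8.6.1.6") -> list[list[str]]:
    """pip commands for TensorFlow with CUDA support plus the bundled TensorRT wheel."""
    pip = os.path.join(venv_dir, "bin", "pip")  # the venv's pip, so no activation is needed
    commands = [[pip, "install", "tensorflow[and-cuda]"]]
    commands += [[pip, "install", wheel] for wheel in tensorrt_wheels(trt_dir, "cp310")]
    return commands
```

When the same install is typed in an interactive shell, quoting 'tensorflow[and-cuda]' prevents shells such as zsh from treating the brackets as a glob pattern.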
Access Rights and Security
- User Account: The application runs under the user account.
- Sudo Access: The user account is a member of the sudo group and can therefore run administrative commands.
- Credentials: The password for the user account is stored in /root/user_credentials; secure this file or change the password immediately after deployment.
- File Permissions:
  - The installation scripts (tensorflow_install.sh and tensorflow.sh) are made executable (chmod +x).
  - install_script.sh is created with permissions u=rwx,g=r,o=r (octal 744).
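The two permission settings above can be checked numerically. The sketch below confirms that u=rwx,g=r,o=r equals octal 744 and approximates chmod +x with chmod a+x semantics; a plain chmod +x (no who letters) additionally leaves alone any bits set in the umask, which this illustrative helper ignores.

```python
import os
import stat
import tempfile

# u=rwx,g=r,o=r (the mode given to install_script.sh) as stat flags.
INSTALL_SCRIPT_MODE = stat.S_IRWXU | stat.S_IRGRP | stat.S_IROTH


def chmod_a_plus_x(path: str) -> int:
    """Add execute permission for owner, group and others (like `chmod a+x`)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    os.chmod(path, new_mode)
    return new_mode


def demo() -> str:
    """Make a 0644 temp file executable and return its ls-style mode string."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
    try:
        os.chmod(path, 0o644)
        chmod_a_plus_x(path)
        return stat.filemode(os.stat(path).st_mode)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(oct(INSTALL_SCRIPT_MODE), stat.filemode(stat.S_IFREG | INSTALL_SCRIPT_MODE))
    print(demo())
```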
Databases
No database configuration, connection settings, or storage locations are defined in the provided deployment data.
Docker Containers and Their Deployment
The deployment does not use Docker containers (no docker run or docker compose). The application is installed natively on the host operating system.
Proxy Servers
No proxy server configuration (Nginx, Traefik, SSL, or Certbot) is included in the provided deployment data.
Permission Settings
The following environment variables are configured in the ~/.bashrc file for the user account to ensure proper library loading:
- PATH: includes /usr/local/cuda/bin.
- LD_LIBRARY_PATH: includes /usr/local/cuda-12.2/lib64.
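The ~/.bashrc entries above can be added idempotently, so that re-running the installation does not duplicate them. This sketch assumes the export lines use the ${VAR:+:${VAR}} idiom (the exact lines the script writes are not shown) and keeps this document's versioned cuda-12.2 path, which needs adjusting if a different CUDA version is installed.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# ${VAR:+:${VAR}} appends ":<old value>" only when VAR is already non-empty,
# so no trailing ":" (empty entry) is produced when the variable was unset.
BASHRC_LINES = [
    "export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}",
    "export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}",
]


def ensure_lines(rc_file: Path, lines: list[str]) -> int:
    """Append each line to rc_file unless already present; return how many were added."""
    text = rc_file.read_text() if rc_file.exists() else ""
    existing = set(text.splitlines())
    missing = [line for line in lines if line not in existing]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with rc_file.open("a") as f:
            f.write(prefix + "\n".join(missing) + "\n")
    return len(missing)


def demo() -> tuple[int, int, int]:
    """Apply the lines twice to a scratch .bashrc; return (added, added again, total lines)."""
    with tempfile.TemporaryDirectory() as d:
        rc = Path(d) / ".bashrc"
        rc.write_text("alias ll='ls -l'")  # deliberately no trailing newline
        first = ensure_lines(rc, BASHRC_LINES)
        second = ensure_lines(rc, BASHRC_LINES)  # second run adds nothing
        return first, second, len(rc.read_text().splitlines())
```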
The tensorflow.sh script sets the following additional environment variables upon activation:
- CUDNN_PATH: determined dynamically using Python.
- LD_LIBRARY_PATH: extended with the cuDNN library path, the CUDA library path, and the TensorRT library path (/home/user/TensorRT-8.6.1.6/lib).
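The environment that tensorflow.sh builds can be approximated from Python, for example to launch a process with subprocess.run(..., env=tensorflow_env()). This sketch assumes cuDNN comes from the pip-installed nvidia.cudnn package pulled in by tensorflow[and-cuda], with its shared libraries in a lib subdirectory; the script's exact Python expression is not shown. Because the dynamic loader reads LD_LIBRARY_PATH when a process starts, the function returns an environment for a child process instead of modifying the running interpreter's os.environ.

```python
from __future__ import annotations

import importlib.util
import os

# Paths from this document's CUDA and TensorRT layout.
CUDA_LIB = "/usr/local/cuda-12.2/lib64"
TENSORRT_LIB = "/home/user/TensorRT-8.6.1.6/lib"


def find_cudnn_path() -> str | None:
    """Directory of the pip-installed nvidia.cudnn package, or None if it is absent."""
    try:
        spec = importlib.util.find_spec("nvidia.cudnn")
    except ImportError:  # the parent "nvidia" package is not installed
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def append_library_paths(current: str | None, *extra: str | None) -> str:
    """Append directories to a colon-separated path list, skipping empties and duplicates."""
    parts = [p for p in (current or "").split(os.pathsep) if p]
    for path in extra:
        if path and path not in parts:
            parts.append(path)
    return os.pathsep.join(parts)


def tensorflow_env(base: dict[str, str] | None = None) -> dict[str, str]:
    """Return a copy of base (default: os.environ) with CUDNN_PATH and LD_LIBRARY_PATH set."""
    env = dict(os.environ if base is None else base)
    cudnn = find_cudnn_path()
    if cudnn:
        env["CUDNN_PATH"] = cudnn
    cudnn_lib = os.path.join(cudnn, "lib") if cudnn else None
    # In the order listed above: cuDNN, then CUDA, then TensorRT.
    env["LD_LIBRARY_PATH"] = append_library_paths(
        env.get("LD_LIBRARY_PATH"), cudnn_lib, CUDA_LIB, TENSORRT_LIB
    )
    return env


if __name__ == "__main__":
    env = tensorflow_env()
    print("CUDNN_PATH =", env.get("CUDNN_PATH"))
    print("LD_LIBRARY_PATH =", env["LD_LIBRARY_PATH"])
```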
Starting, Stopping, and Updating
To start the TensorFlow environment and run the application:
- Log in as the user account.
- Execute the activation script, /home/user/tensorflow.sh. This script activates the virtual environment and sets the necessary environment variables.
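To confirm that the activation took effect in the current shell, Python can report whether it runs inside a virtual environment and whether the TensorRT library directory is on LD_LIBRARY_PATH. Inside a venv, sys.prefix differs from sys.base_prefix, and the venv activate script sets VIRTUAL_ENV. The function name environment_status is hypothetical.

```python
import os
import sys

EXPECTED_VENV = "/home/user/venv"
TENSORRT_LIB = "/home/user/TensorRT-8.6.1.6/lib"


def environment_status() -> dict:
    """Summarize whether the venv and the tensorflow.sh settings are active."""
    ld_paths = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        # Inside a venv, sys.prefix points at the venv; sys.base_prefix at the base Python.
        "in_venv": sys.prefix != sys.base_prefix,
        "is_expected_venv": os.path.realpath(sys.prefix) == os.path.realpath(EXPECTED_VENV),
        # Set by the venv "activate" script (absent if the venv python is run directly).
        "virtual_env": os.environ.get("VIRTUAL_ENV"),
        "tensorrt_on_ld_path": TENSORRT_LIB in ld_paths,
    }


if __name__ == "__main__":
    for key, value in environment_status().items():
        print(f"{key}: {value}")
```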
To test the installation and verify GPU availability, run the following command within the activated environment:
python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000]))); print(tf.config.list_physical_devices())"
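The one-line test can be expanded into a small script that also exits with a non-zero status when no GPU is visible, which is convenient for automated checks. It uses the same TensorFlow calls as the one-liner (tf.random.normal, tf.reduce_sum, tf.config.list_physical_devices); the script structure and exit codes are illustrative.

```python
from __future__ import annotations

import sys


def smoke_test() -> dict | None:
    """Run a small TensorFlow computation and list visible GPUs; None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not importable from this interpreter
    gpus = tf.config.list_physical_devices("GPU")
    total = tf.reduce_sum(tf.random.normal([1000, 1000]))
    return {"gpus": [device.name for device in gpus], "sum": float(total.numpy())}


if __name__ == "__main__":
    report = smoke_test()
    if report is None:
        sys.exit("TensorFlow is not installed here; is the virtual environment active?")
    print("reduce_sum result:", report["sum"])
    print("GPUs:", report["gpus"] or "none")
    # Non-zero exit status when TensorFlow cannot see a GPU.
    sys.exit(0 if report["gpus"] else 1)
```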
To update the system packages, run sudo apt update followed by sudo apt upgrade. To update Python packages in the virtual environment, activate the environment first and then run pip install --upgrade with the package name.
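The update procedure can also be written out as explicit commands. This is an illustrative sketch: calling the virtual environment's pip by path has the same effect as running pip after activation. Because tensorflow[and-cuda] pulls in pinned NVIDIA CUDA and cuDNN wheels, upgrading TensorFlow may change those library versions, so rerunning the GPU test afterwards is advisable.

```python
from __future__ import annotations

import os
import shlex


def update_commands(venv_dir: str = "/home/user/venv",
                    packages: tuple[str, ...] = ("tensorflow[and-cuda]",)) -> list[list[str]]:
    """Commands to update system packages and selected Python packages in the venv."""
    pip = os.path.join(venv_dir, "bin", "pip")
    return [
        ["sudo", "apt-get", "update"],
        ["sudo", "apt-get", "upgrade", "-y"],
        [pip, "list", "--outdated"],  # show what would change before upgrading
        [pip, "install", "--upgrade", *packages],
    ]


if __name__ == "__main__":
    for cmd in update_commands():
        print(shlex.join(cmd))  # shlex.join quotes tensorflow[and-cuda] for the shell
```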