Deployment Overview of TensorFlow on Server
Prerequisites and Basic Requirements
The deployment targets Ubuntu or Debian-based operating systems. Installation tasks require root privileges or sudo access. The following hardware and software prerequisites must be met:
- Operating System: Ubuntu or Debian distribution.
- Hardware: An NVIDIA GPU is required. The installation script checks for NVIDIA H100 GPUs (device ID `10de:2330`) and, when one is found, installs the appropriate Linux kernel package (`linux-generic-hwe-22.04`).
- Privileges: `root` access is required for system-wide package installation and user creation.
- User Account: The application runs under a dedicated system user account named `user`.
- Ports: The configuration files define no network ports for external access; the setup focuses on local GPU acceleration and the Python environment.
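The prerequisites above can be checked before running the installer. The following is a minimal sketch, not the logic of `install_script.sh` itself: the helper names are hypothetical, and reading the device ID from `lspci -nn` output is an assumption about how such a check could be done.

```shell
#!/usr/bin/env bash
# Preflight helpers (hypothetical names, not taken from install_script.sh).

# Succeeds if the os-release text on stdin describes Ubuntu or Debian.
is_debian_family() {
    grep -Eq '^(ID|ID_LIKE)=.*(ubuntu|debian)'
}

# Succeeds if the `lspci -nn` text on stdin lists the H100 device ID.
has_h100() {
    grep -qi '10de:2330'
}

# Succeeds if the current effective user is root.
is_root() {
    [ "$(id -u)" -eq 0 ]
}
```

On a real host the helpers would be fed live data, for example `is_debian_family < /etc/os-release` and `lspci -nn | has_h100`.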
File and Directory Structure
The installation process creates specific directories and files to manage the TensorFlow environment, CUDA drivers, and user credentials.
- `/root/install_script.sh`: The main installation script generated during the deployment.
- `/root/user_credentials`: A file containing the randomly generated password for the `user` account.
- `/home/user`: The home directory for the application user where the Python virtual environment is created.
- `/home/user/venv`: The Python virtual environment directory.
- `/home/user/TensorRT-8.6.1.6`: The extracted directory for the TensorRT package.
- `/home/user/tensorflow_install.sh`: The script to install and configure TensorFlow dependencies.
- `/home/user/tensorflow.sh`: The activation script to load environment variables and the virtual environment.
- `/usr/local/cuda`: The installation directory for NVIDIA CUDA Toolkit.
- `/usr/local/cuda-12.2/lib64`: The library path for CUDA version 12.2.
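One way to confirm this layout after installation is to list any expected path that is missing. The sketch below assumes nothing beyond the paths listed above; the function name and the optional prefix argument are illustrative. The two files under `/root` are left out because checking them requires root access.

```shell
#!/usr/bin/env bash
# Print every expected path that does not exist.
# An optional prefix (default: empty, i.e. the real filesystem root) is
# prepended to each path, which makes the check usable against a staging tree.
missing_paths() {
    local prefix="${1:-}"
    local p
    for p in /home/user/venv /home/user/TensorRT-8.6.1.6 \
             /home/user/tensorflow_install.sh /home/user/tensorflow.sh \
             /usr/local/cuda /usr/local/cuda-12.2/lib64; do
        [ -e "$prefix$p" ] || echo "$p"
    done
}
```

Calling `missing_paths` with no argument on the server prints nothing when the layout is complete.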
Application Installation Process
The deployment is executed by a shell script (`install_script.sh`) located in `/root`. This script performs the following actions:
- System Updates: Updates all system packages and removes unused dependencies.
- Driver Installation:
  - Detects NVIDIA hardware (specifically checking for device ID `10de:2330`, which identifies H100 GPUs).
  - Installs the recommended NVIDIA driver using `ubuntu-drivers-common`.
  - Installs the GCC compiler required for CUDA.
- CUDA Toolkit Setup:
  - Downloads the CUDA keyring and installs the CUDA toolkit based on the detected Ubuntu release version.
- User Creation:
  - Creates a system user named `user` with a home directory at `/home/user`.
  - Generates a random 8-character password and stores it in `/root/user_credentials`.
  - Adds `user` to the `users` and `sudo` groups.
- Environment Setup:
  - Installs Python 3.10, `pip`, and `venv`.
  - Installs `libnvinfer5-dev` for TensorRT support.
  - Appends the necessary `PATH` and `LD_LIBRARY_PATH` variables to `~/.bashrc` so that they point to the CUDA binaries and libraries.
- TensorFlow and TensorRT Installation:
  - Executes `tensorflow_install.sh` as `user`.
  - Creates a virtual environment.
  - Installs `tensorflow[and-cuda]` via pip.
  - Downloads and extracts TensorRT 8.6.1.6.
  - Installs the pre-release version of the `tensorrt` wheel.
- Validation:
  - Runs a Python command to verify that TensorFlow is functional and can detect physical GPU devices.
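Two of the steps above, password generation and the `~/.bashrc` update, can be sketched as follows. This is an illustration under stated assumptions rather than the script's actual code: the function names are hypothetical, the password alphabet is assumed to be alphanumeric, and `/usr/local/cuda/bin` is assumed to be where the CUDA binaries live.

```shell
#!/usr/bin/env bash
# Generate an 8-character alphanumeric password from /dev/urandom.
# (Assumption: the real script may use a different generator or alphabet.)
gen_password() {
    head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c1-8
}

# Append a line to a file only if that exact line is not already present,
# so that re-running the setup does not duplicate entries.
append_once() {
    local line="$1" file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Add the CUDA locations to a bashrc-style file.
# The single quotes keep $PATH and $LD_LIBRARY_PATH unexpanded in the file.
add_cuda_env() {
    local rc="$1"
    append_once 'export PATH=/usr/local/cuda/bin:$PATH' "$rc"
    append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH' "$rc"
}
```

A caller could then run `add_cuda_env /home/user/.bashrc` and write the output of `gen_password` to the credentials file. The check in `append_once` is a design choice of this sketch; the source does not state whether the real script guards against duplicate lines.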
Access Rights and Security
Security and access control are managed through user accounts and file permissions established during the installation:
- User Account: The application runs under the `user` account, which is a member of the `sudo` group and therefore has administrative privileges.
- Passwords: A unique password is generated at runtime and saved to `/root/user_credentials`. This file is readable by the root user.
- File Permissions:
  - `/root/install_script.sh`: `u=rwx,g=r,o=r` (owner read/write/execute; group and others read-only).
  - Scripts in `/home/user/`: `chmod +x` is applied to `tensorflow_install.sh` and `tensorflow.sh`.
- Restrictions: No specific firewall rules or SSH key restrictions are defined in the provided configuration.
Databases
No database installation, configuration, or connection settings are included in the provided source files. The deployment focuses on the TensorFlow framework and GPU drivers.
Docker Containers and Their Deployment
No Docker containers, docker run commands, or docker-compose files are present in the configuration. The application is deployed directly on the host operating system using native package managers and Python virtual environments.
Proxy Servers
No proxy servers (Nginx, Traefik), SSL certificates, or custom domain configurations are defined. The setup does not include Certbot or reverse proxy management.
Permission Settings
The installation script enforces the following permission settings:
- System Scripts: The main installer at `/root/install_script.sh` is set to mode `744` (`u=rwx,g=r,o=r`).
- User Scripts: The scripts created in `/home/user/` (`tensorflow_install.sh`, `tensorflow.sh`) are made executable using `chmod +x`.
- Ownership: The `user` account owns the files and directories within `/home/user/`.
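The symbolic form `u=rwx,g=r,o=r` and the octal mode `744` describe the same permission bits. A short demonstration on a scratch file (the file is a throwaway created with `mktemp`, not one of the deployment's files):

```shell
#!/usr/bin/env bash
# Apply the symbolic and the octal form to a scratch file and compare the results.
demo=$(mktemp)

chmod u=rwx,g=r,o=r "$demo"
mode_symbolic=$(ls -l "$demo" | cut -c1-10)   # first 10 columns: file type + permission bits

chmod 600 "$demo"                             # reset so the next chmod visibly changes the mode
chmod 744 "$demo"
mode_octal=$(ls -l "$demo" | cut -c1-10)

echo "symbolic: $mode_symbolic  octal: $mode_octal"   # both show -rwxr--r--
rm -f "$demo"
```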
Location of Configuration Files and Data
Configuration and data files are located in the following paths:
- Activation Script: `/home/user/tensorflow.sh` (loads the virtual environment and sets `CUDNN_PATH` and `LD_LIBRARY_PATH`).
- Virtual Environment: `/home/user/venv`.
- TensorRT Data: `/home/user/TensorRT-8.6.1.6`.
- User Credentials: `/root/user_credentials`.
- Environment Variables: Defined in `/home/user/.bashrc` and loaded via `tensorflow.sh`.
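The contents of `tensorflow.sh` are not reproduced in this document. The sketch below shows what an activation script of this kind might contain, written by a hypothetical helper function. The way `CUDNN_PATH` is derived (from the `nvidia.cudnn` Python package installed with the pip wheels) and the exact `LD_LIBRARY_PATH` entries are assumptions, not the deployed file.

```shell
#!/usr/bin/env bash
# Write an example activation script to the given destination.
# Everything inside the heredoc is assumed content; the real tensorflow.sh may differ.
write_activation_script() {
    local dest="$1"
    cat > "$dest" <<'EOF'
#!/usr/bin/env bash
# Activate the Python virtual environment.
source /home/user/venv/bin/activate
# Locate the cuDNN libraries shipped inside the pip-installed nvidia.cudnn package.
CUDNN_PATH="$(dirname "$(python3 -c 'import nvidia.cudnn; print(nvidia.cudnn.__file__)')")"
export CUDNN_PATH
# Make cuDNN, TensorRT, and CUDA libraries visible to the dynamic linker.
export LD_LIBRARY_PATH="$CUDNN_PATH/lib:/home/user/TensorRT-8.6.1.6/lib:/usr/local/cuda-12.2/lib64:${LD_LIBRARY_PATH:-}"
EOF
    chmod +x "$dest"
}
```

Because the script changes the environment of the calling shell, it has to be sourced rather than executed for the variables to persist in the current session.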
Available Ports for Connection
The provided configuration does not define any network ports for remote access. The TensorFlow installation is local to the server instance.
Starting, Stopping, and Updating
The following operations manage the TensorFlow environment and must be run as `user`:
- Starting the Environment: Source the activation script `/home/user/tensorflow.sh`. It activates the `venv` and exports `CUDNN_PATH` and `LD_LIBRARY_PATH`, which set the library paths needed for CUDA and TensorRT.
- Testing the Installation: After activating the environment, run a Python command that imports TensorFlow and lists the physical GPU devices it detects.
- Reinstallation or Updates: To reinstall or update the dependencies, re-run the installation script `/home/user/tensorflow_install.sh`. Note that this script creates a new virtual environment and reinstalls packages, potentially overwriting the previous state.
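The three operations can be wrapped in small shell functions, sketched below. The function names and the `TF_HOME` variable are hypothetical conveniences; the paths come from the layout described earlier, and the Python one-liner is an assumed form of the validation command, using TensorFlow's `tf.config.list_physical_devices`.

```shell
#!/usr/bin/env bash
# Home directory of the application user (override TF_HOME for testing).
TF_HOME="${TF_HOME:-/home/user}"

# Start: source the activation script so the venv and the exported
# variables take effect in the current shell.
tf_start() {
    . "$TF_HOME/tensorflow.sh"
}

# Test: print the GPUs TensorFlow can see; an empty list ([]) means none were detected.
tf_test() {
    python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}

# Reinstall or update: re-run the dependency installer, which recreates the venv.
tf_reinstall() {
    bash "$TF_HOME/tensorflow_install.sh"
}
```

A typical session as `user` would be `tf_start && tf_test`. The equivalent direct commands are `source /home/user/tensorflow.sh` followed by the Python one-liner.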