
Installing AMD GPU Drivers, ROCm, and HIP on Ubuntu Linux

This guide walks through installing the AMD GPU drivers, the ROCm (Radeon Open Compute) stack, and HIP. ROCm enables machine learning and AI workloads on modern AMD GPUs, while HIP accelerates graphics processing, for example rendering in Blender.

Attention

AMD GPUs on HOSTKEY are guaranteed to work ONLY on Ubuntu 24.04 LTS!

System Preparation

Before starting the installation, make sure the system meets the requirements:

  1. Check the operating system: cat /etc/os-release. The output must contain VERSION_ID="24.04".

  2. Check the kernel: uname -r. A Linux kernel version ≥6.13 is required. If necessary, install the latest available mainline kernel:

    sudo add-apt-repository ppa:cappelikan/ppa -y
    sudo apt update && sudo apt install -y mainline
    sudo mainline install-latest
    sudo reboot
    
  3. Update the system:

    sudo apt update && sudo apt upgrade -y
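
The two checks above can also be scripted. The following is a minimal sketch, not part of the original guide: the helper name kernel_at_least is hypothetical, and it assumes GNU sort with the -V (version sort) option, which Ubuntu ships by default. It only reads system information and changes nothing.

```bash
#!/usr/bin/env bash
# Preflight sketch: compare a kernel release string against a required major.minor.

# kernel_at_least REQUIRED CURRENT   (hypothetical helper)
# Returns 0 when CURRENT (e.g. "6.14.0-27-generic") is >= REQUIRED (e.g. "6.13").
kernel_at_least() {
  local required="$1" current="$2" mm
  # Keep only the leading "major.minor" part of the release string.
  mm="$(printf '%s\n' "$current" | sed -nE 's/^([0-9]+\.[0-9]+).*/\1/p')"
  [ -n "$mm" ] || return 2
  # sort -V orders version strings numerically, so 6.8 sorts before 6.13.
  [ "$(printf '%s\n%s\n' "$required" "$mm" | sort -V | head -n 1)" = "$required" ]
}

# Report on the running system without changing anything.
if kernel_at_least 6.13 "$(uname -r)"; then
  echo "Kernel $(uname -r): OK"
else
  echo "Kernel $(uname -r): older than 6.13, see step 2"
fi

if [ -r /etc/os-release ]; then
  # Source in a subshell so the variables do not leak into the session.
  ( . /etc/os-release; echo "OS: ${ID:-unknown} ${VERSION_ID:-unknown}" )
fi
```

A plain string comparison would rank 6.8 above 6.13, which is why the sketch relies on version sort.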
    

Manual ROCm Installation

  1. Install the dependencies:

    sudo apt install -y wget gnupg2 build-essential dkms curl
    

  2. Clean up old packages (recommended):

    sudo dpkg --configure -a
    sudo apt remove --purge -y rocminfo
    sudo apt purge -y 'rocm*' 'amdgpu*' 'graphics*' 'hip*'
    sudo apt autoremove -y
    sudo apt clean
    sudo rm -rf /etc/apt/sources.list.d/amdgpu* /etc/apt/sources.list.d/rocm* /etc/apt/sources.list.d/graphics*
    sudo apt update
    
  3. Add the ROCm "latest" repository:

    sudo install -d -m 0755 /usr/share/keyrings
    wget -qO- https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor | sudo tee /usr/share/keyrings/rocm-archive-keyring.gpg >/dev/null
    echo "deb [arch=amd64 signed-by=/usr/share/keyrings/rocm-archive-keyring.gpg] https://repo.radeon.com/rocm/apt/latest/ noble main" | sudo tee /etc/apt/sources.list.d/rocm.list >/dev/null
    sudo apt update
    
  4. Install the ROCm stack:

    sudo apt install -y rocm-dev rocm-libs rocm-hip-sdk rocm-smi-lib amd-smi-lib rocminfo
    
  5. Create the /opt/rocm symbolic link:

    ROCM_DIR=$(ls -d /opt/rocm-[0-9]* 2>/dev/null | sort -V | tail -n 1)
    sudo ln -sfn "$ROCM_DIR" /opt/rocm
    echo "ROCm installed: $(basename "$ROCM_DIR")"
    
  6. Configure access permissions:

    sudo usermod -aG render,video $USER
    
  7. Configure the paths in ~/.bashrc (this reuses ROCM_DIR from step 5):

    ROCM_VER=$(basename "$ROCM_DIR" | sed 's/rocm-//')
    cat >> ~/.bashrc << EOF
    
    # AMD ROCm Paths
    if [ -d "/opt/rocm-${ROCM_VER}" ]; then
    export PATH="/opt/rocm-${ROCM_VER}/bin:\$PATH"
    export LD_LIBRARY_PATH="/opt/rocm-${ROCM_VER}/hip/lib:/opt/rocm-${ROCM_VER}/lib:\$LD_LIBRARY_PATH"
    export ROCM_PATH="/opt/rocm-${ROCM_VER}"
    export HIP_CLANG_PATH="/opt/rocm-${ROCM_VER}/llvm/bin"
    fi
    EOF
    source ~/.bashrc
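
Step 5 depends on sort -V to pick the newest versioned directory when several ROCm releases are installed side by side. The sketch below illustrates that selection in a scratch directory; the function name latest_rocm_dir and the version numbers are made up for the demonstration.

```bash
#!/usr/bin/env bash
# latest_rocm_dir [BASE]   (hypothetical helper)
# Prints the highest-versioned BASE/rocm-X.Y.Z directory (BASE defaults to /opt).
latest_rocm_dir() {
  local base="${1:-/opt}"
  ls -d "$base"/rocm-[0-9]* 2>/dev/null | sort -V | tail -n 1
}

# Demonstration in a scratch directory, so nothing under /opt is touched.
demo="$(mktemp -d)"
mkdir "$demo/rocm-6.2.4" "$demo/rocm-6.10.0" "$demo/rocm-6.9.1"
basename "$(latest_rocm_dir "$demo")"   # version sort picks rocm-6.10.0
rm -rf "$demo"
```

With an ordinary lexical sort, rocm-6.9.1 would come last and the symlink would point at an older release.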
    

Verifying the Installation

After the installation is complete and the system has rebooted, verify that the drivers work correctly. First, "wake up" the card with the following command:

echo on | sudo tee /sys/class/drm/card0/device/power/control
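
On machines with more than one graphics device, the AMD card is not necessarily card0. As a hedged sketch, the helper below (amd_cards is a hypothetical name) walks the DRM entries in sysfs and prints those whose PCI vendor ID is 0x1002, the same vendor ID the installer script looks for with lspci. It only reads files.

```bash
#!/usr/bin/env bash
# amd_cards [DRM_ROOT]   (hypothetical helper)
# Prints every DRM_ROOT/cardN whose PCI vendor ID is 0x1002 (AMD).
amd_cards() {
  local root="${1:-/sys/class/drm}" card name
  for card in "$root"/card[0-9]*; do
    name="${card##*/}"
    # Skip connector entries such as card0-DP-1; keep plain cardN only.
    case "${name#card}" in ''|*[!0-9]*) continue ;; esac
    [ -r "$card/device/vendor" ] || continue
    [ "$(cat "$card/device/vendor")" = "0x1002" ] && echo "$card"
  done
  return 0
}

# List the matching cards and their current power policy (read-only).
for card in $(amd_cards); do
  echo "$card: $(cat "$card/device/power/control" 2>/dev/null || echo unknown)"
done
```

For any path it prints, the wake-up command becomes echo on | sudo tee PATH/device/power/control.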

  1. rocminfo tool:

    rocminfo
    
    The command should list the available GPUs and their specifications (HSA agents).

    Sample rocminfo output after the drivers and ROCm have been installed successfully:
    ROCk module is loaded
    =====================
    HSA System Attributes
    =====================
    Runtime Version:         1.18
    Runtime Ext Version:     1.14
    System Timestamp Freq.:  1000.000000MHz
    Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
    Machine Model:           LARGE
    System Endianness:       LITTLE
    Mwaitx:                  DISABLED
    XNACK enabled:           NO
    DMAbuf Support:          YES
    VMM Support:             NO
    
    ==========
    HSA Agents
    ==========
    *******
    Agent 1
    *******
    Name:                    AMD Ryzen 9 7950X 16-Core Processor
    Uuid:                    CPU-XX
    Marketing Name:          AMD Ryzen 9 7950X 16-Core Processor
    Vendor Name:             CPU
    Feature:                 None specified
    Profile:                 FULL_PROFILE
    Float Round Mode:        NEAR
    Max Queue Number:        0(0x0)
    Queue Min Size:          0(0x0)
    Queue Max Size:          0(0x0)
    Queue Type:              MULTI
    Node:                    0
    Device Type:             CPU
    Cache Info:
        L1:                      32768(0x8000) KB
    Chip ID:                 0(0x0)
    ASIC Revision:           0(0x0)
    Cacheline Size:          64(0x40)
    Max Clock Freq. (MHz):   5881
    BDFID:                   0
    Internal Node ID:        0
    Compute Unit:            32
    SIMDs per CU:            0
    Shader Engines:          0
    Shader Arrs. per Eng.:   0
    WatchPts on Addr. Ranges:1
    Memory Properties:
    Features:                None
    Pool Info:
        Pool 1
        Segment:                 GLOBAL; FLAGS: FINE GRAINED
        Size:                    130980620(0x7ce9b0c) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:4KB
        Alloc Alignment:         4KB
        Accessible by all:       TRUE
        Pool 2
        Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
        Size:                    130980620(0x7ce9b0c) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:4KB
        Alloc Alignment:         4KB
        Accessible by all:       TRUE
        Pool 3
        Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
        Size:                    130980620(0x7ce9b0c) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:4KB
        Alloc Alignment:         4KB
        Accessible by all:       TRUE
        Pool 4
        Segment:                 GLOBAL; FLAGS: COARSE GRAINED
        Size:                    130980620(0x7ce9b0c) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:4KB
        Alloc Alignment:         4KB
        Accessible by all:       TRUE
    ISA Info:
    *******
    Agent 2
    *******
    Name:                    gfx1036
    Uuid:                    GPU-XX
    Marketing Name:          AMD Radeon Graphics
    Vendor Name:             AMD
    Feature:                 KERNEL_DISPATCH
    Profile:                 BASE_PROFILE
    Float Round Mode:        NEAR
    Max Queue Number:        128(0x80)
    Queue Min Size:          64(0x40)
    Queue Max Size:          131072(0x20000)
    Queue Type:              MULTI
    Node:                    1
    Device Type:             GPU
    Cache Info:
        L1:                      16(0x10) KB
        L2:                      256(0x100) KB
    Chip ID:                 5710(0x164e)
    ASIC Revision:           1(0x1)
    Cacheline Size:          64(0x40)
    Max Clock Freq. (MHz):   2200
    BDFID:                   2560
    Internal Node ID:        1
    Compute Unit:            2
    SIMDs per CU:            2
    Shader Engines:          1
    Shader Arrs. per Eng.:   1
    WatchPts on Addr. Ranges:4
    Coherent Host Access:    FALSE
    Memory Properties:       APU
    Features:                KERNEL_DISPATCH
    Fast F16 Operation:      TRUE
    Wavefront Size:          32(0x20)
    Workgroup Max Size:      1024(0x400)
    Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
    Max Waves Per CU:        32(0x20)
    Max Work-item Per CU:    1024(0x400)
    Grid Max Size:           4294967295(0xffffffff)
    Grid Max Size per Dimension:
        x                        2147483647(0x7fffffff)
        y                        65535(0xffff)
        z                        65535(0xffff)
    Max fbarriers/Workgrp:   32
    Packet Processor uCode:: 18
    SDMA engine uCode::      1
    IOMMU Support::          None
    Pool Info:
        Pool 1
        Segment:                 GLOBAL; FLAGS: COARSE GRAINED
        Size:                    65490308(0x3e74d84) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:2048KB
        Alloc Alignment:         4KB
        Accessible by all:       FALSE
        Pool 2
        Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
        Size:                    65490308(0x3e74d84) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:2048KB
        Alloc Alignment:         4KB
        Accessible by all:       FALSE
        Pool 3
        Segment:                 GROUP
        Size:                    64(0x40) KB
        Allocatable:             FALSE
        Alloc Granule:           0KB
        Alloc Recommended Granule:0KB
        Alloc Alignment:         0KB
        Accessible by all:       FALSE
    ISA Info:
        ISA 1
        Name:                    amdgcn-amd-amdhsa--gfx1036
        Machine Models:          HSA_MACHINE_MODEL_LARGE
        Profiles:                HSA_PROFILE_BASE
        Default Rounding Mode:   NEAR
        Default Rounding Mode:   NEAR
        Fast f16:                TRUE
        Workgroup Max Size:      1024(0x400)
        Workgroup Max Size per Dimension:
            x                        1024(0x400)
            y                        1024(0x400)
            z                        1024(0x400)
        Grid Max Size:           4294967295(0xffffffff)
        Grid Max Size per Dimension:
            x                        2147483647(0x7fffffff)
            y                        65535(0xffff)
            z                        65535(0xffff)
        FBarrier Max Size:       32
        ISA 2
        Name:                    amdgcn-amd-amdhsa--gfx10-3-generic
        Machine Models:          HSA_MACHINE_MODEL_LARGE
        Profiles:                HSA_PROFILE_BASE
        Default Rounding Mode:   NEAR
        Default Rounding Mode:   NEAR
        Fast f16:                TRUE
        Workgroup Max Size:      1024(0x400)
        Workgroup Max Size per Dimension:
            x                        1024(0x400)
            y                        1024(0x400)
            z                        1024(0x400)
        Grid Max Size:           4294967295(0xffffffff)
        Grid Max Size per Dimension:
            x                        2147483647(0x7fffffff)
            y                        65535(0xffff)
            z                        65535(0xffff)
        FBarrier Max Size:       32
    *** Done ***
    
  2. rocm-smi tool:

    rocm-smi
    

    Command output:

    WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status
    
    =========================================== ROCm System Management Interface ===========================================
    ===================================================== Concise Info =====================================================
    Device  Node  IDs              Temp    Power    Partitions          SCLK     MCLK     Fan    Perf  PwrCap  VRAM%  GPU%
          (DID,     GUID)  (Edge)  (Avg)    (Mem, Compute, ID)
    ========================================================================================================================
    0       1     0x7551,   64106  67.0°C  184.0W   N/A, N/A, 0         3259Mhz  96Mhz    54.9%  auto  300.0W  31%    100%
    1       2     0x164e,   36957  46.0°C  35.194W  N/A, N/A, 0         N/A      1800Mhz  0%     auto  N/A     3%     0%
    ========================================================================================================================
    ================================================= End of ROCm SMI Log ==================================================
    
  3. amd-smi tool:

    amd-smi
    

    Command output (note that a second GPU, the one integrated into the processor, is also listed):

    +------------------------------------------------------------------------------+
    | AMD-SMI 26.2.0+021c61fc    amdgpu version: 6.18.1-061801 ROCm version: 7.1.1 |
    | VBIOS version: 00158746                                                      |
    | Platform: Linux Baremetal                                                    |
    |-------------------------------------+----------------------------------------|
    | BDF                        GPU-Name | Mem-Uti   Temp   UEC       Power-Usage |
    | GPU  HIP-ID  OAM-ID  Partition-Mode | GFX-Uti    Fan               Mem-Usage |
    |=====================================+========================================|
    | 0000:03:00.0    AMD Radeon Graphics | 68 %     82 °C   0           285/300 W |
    |   0       0     N/A             N/A | 95 %    52.94           10406/32624 MB |
    |-------------------------------------+----------------------------------------|
    | 0000:0a:00.0 ...X 16-Core Processor | N/A        N/A   0             N/A/0 W |
    |   1       1     N/A             N/A | N/A        N/A               15/512 MB |
    +-------------------------------------+----------------------------------------+
    +------------------------------------------------------------------------------+
    | Processes:                                                                   |
    |  GPU        PID  Process Name          GTT_MEM  VRAM_MEM  MEM_USAGE     CU % |
    |==============================================================================|
    |    0      12335  ollama                 2.0 MB    9.7 GB    10.1 GB  N/A     |
    |    1      12335  ollama                 2.0 MB   35.2 KB      0.0 B  N/A     |
    +------------------------------------------------------------------------------+
    

Together, these tools show the current load, temperature, and power consumption of the GPUs.
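
For scripting, the GPU architecture names (such as gfx1036 in the sample rocminfo output) can be extracted from the text. This is a rough sketch that assumes the "Name:" layout shown in that sample; the function name gpu_targets is hypothetical, and other rocminfo versions may format their output differently.

```bash
#!/usr/bin/env bash
# gpu_targets   (hypothetical helper)
# Reads rocminfo output on stdin and prints each agent name that starts with "gfx".
gpu_targets() {
  awk '$1 == "Name:" && $2 ~ /^gfx/ { print $2 }'
}

# Run only when rocminfo is actually installed.
if command -v rocminfo >/dev/null 2>&1; then
  rocminfo | gpu_targets
fi
```

The ISA entries (amdgcn-amd-amdhsa--gfx1036) are skipped on purpose, because their name does not begin with gfx.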

Note

amd-smi is gradually replacing rocm-smi as the primary monitoring utility in newer ROCm releases.
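
Because either tool may be present depending on the ROCm release, a monitoring script can probe for both. A minimal sketch, with pick_smi_tool as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# pick_smi_tool   (hypothetical helper)
# Prints the first monitoring tool found on PATH, preferring amd-smi.
pick_smi_tool() {
  local tool
  for tool in amd-smi rocm-smi; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  return 1
}

if tool="$(pick_smi_tool)"; then
  echo "Monitoring with: $tool"
else
  echo "Neither amd-smi nor rocm-smi is on PATH"
fi
```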

Working with Docker

If you use Docker, install the tools for passing GPUs through to containers:

sudo apt install -y rocm-gdb rocm-container-toolkit
sudo systemctl restart docker
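
Independently of the toolkit, ROCm containers are commonly started by passing the /dev/kfd and /dev/dri device nodes to docker run and adding the video group. The sketch below only assembles those flags; rocm_docker_args is a hypothetical helper, IMAGE is a placeholder for whichever ROCm-enabled image you use, and this is an assumption about a typical setup rather than the method the toolkit itself applies.

```bash
#!/usr/bin/env bash
# rocm_docker_args [DEV_ROOT]   (hypothetical helper)
# Prints the `docker run` flags for the ROCm device nodes found under DEV_ROOT.
rocm_docker_args() {
  local dev="${1:-/dev}" args=""
  [ -e "$dev/kfd" ] && args="$args --device=$dev/kfd"   # compute interface
  [ -e "$dev/dri" ] && args="$args --device=$dev/dri"   # render nodes
  # The container user also needs the group that owns these nodes.
  [ -n "$args" ] && args="$args --group-add video"
  echo "${args# }"
}

# Print the command instead of running it; IMAGE is a placeholder.
echo "docker run --rm -it $(rocm_docker_args) IMAGE"
```

On a host without an AMD GPU the helper prints no flags, which makes a missing device node easy to spot before the container starts.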

One-Click Automated Installation

A Bash script that automates the entire process. It detects the latest ROCm version, installs the drivers and the rocminfo utility, and configures the paths. Copy it, paste it into your server's command line, and run it.

#!/bin/bash
set -euo pipefail

# Universal AMD GPU + ROCm ("latest") installer for Ubuntu 24.04+

# FLAGS (enable/disable steps here)

DO_APT_UPGRADE=1

DO_OS_POLICY_CHECK=1                 # Enforce policy: only Ubuntu 24.04 LTS
ALLOWED_UBUNTU_VERSIONS=("24.04")

DO_KERNEL_POLICY_CHECK=1          # Enforce policy "kernel >= REQUIRED_KERNEL_MM"
DO_INSTALL_MAINLINE_KERNEL=1      # If kernel is lower, try installing a mainline kernel
REQUIRED_KERNEL_MM="6.13"

DO_GRUB_PARAMS=0                  # Add GRUB params (conservatively disabled by default)
GRUB_PARAMS=("amdgpu.gpu_recovery=1" "amdgpu.runpm=0" "amdgpu.ppfeaturemask=0xffffffff")

DO_PURGE_OLD_PACKAGES=1           # Remove old rocm/amdgpu/hip packages (best effort)
DO_SETUP_ROCM_REPO=1              # Add ROCm repository
DO_INSTALL_ROCM=1                 # Install rocm-dev/rocm-libs/...
DO_LINK_OPT_ROCM=1                # Make /opt/rocm -> /opt/rocm-X.Y.Z (if found)

DO_USER_GROUPS=1                  # Add user to render,video
DO_BASHRC_PATH=1                  # Add /opt/rocm/bin to PATH via ~/.bashrc

DO_OLLAMA_AMDGPU_IDS_WORKAROUND=1 # Create amdgpu.ids link for some Ollama builds
DO_GPU_POWER_CONTROL_ON=1         # Best effort: set power/control=on (if available)


# Start
echo "Starting AMD ROCm installation..."

# Dependency checks
for cmd in lspci wget gpg curl lsb_release; do
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "Missing dependency: $cmd"
    exit 1
  fi
done


# Check this is Ubuntu (robust: don't grep raw file)

. /etc/os-release
if [[ "${ID:-}" != "ubuntu" ]]; then
  echo "This script is intended for Ubuntu. Exiting."
  exit 1
fi


# Restrict script to specific Ubuntu releases (only 24.04)
if [[ "${DO_OS_POLICY_CHECK}" -eq 1 ]]; then
  UBUNTU_VERSION="$(lsb_release -rs)"  # e.g. 24.04

  ok=0
  for v in "${ALLOWED_UBUNTU_VERSIONS[@]}"; do
    if [[ "${UBUNTU_VERSION}" == "${v}" ]]; then
      ok=1
      break
    fi
  done

  if [[ "${ok}" -ne 1 ]]; then
    echo "Unsupported Ubuntu version: ${UBUNTU_VERSION}"
    echo "Allowed versions: ${ALLOWED_UBUNTU_VERSIONS[*]}"
    exit 1
  fi
fi

# Detect AMD GPU (vendor 1002)
AMD_GPU_LINES="$(lspci -nn | grep -iE 'vga|3d' | grep -i '1002:' || true)"
if [[ -z "${AMD_GPU_LINES}" ]]; then
  echo "No AMD GPU detected (vendor 1002)."
  exit 1
fi
echo "AMD GPUs detected:"
echo "${AMD_GPU_LINES}"

# Update/upgrade
if [[ "${DO_APT_UPGRADE}" -eq 1 ]]; then
  sudo apt update
  sudo apt upgrade -y
fi

# Kernel check/upgrade (script policy)
KERNEL_INSTALLED=0
echo "Current kernel: $(uname -r)"

if [[ "${DO_KERNEL_POLICY_CHECK}" -eq 1 ]]; then
  KERNEL_VERSION="$(uname -r)"
  KERNEL_MM="$(echo "${KERNEL_VERSION}" | sed -nE 's/^([0-9]+)\.([0-9]+).*/\1.\2/p')"

  req_major="${REQUIRED_KERNEL_MM%.*}"
  req_minor="${REQUIRED_KERNEL_MM#*.}"
  cur_major="${KERNEL_MM%.*}"
  cur_minor="${KERNEL_MM#*.}"

  KERNEL_OK=0
  if [[ "${cur_major}" -gt "${req_major}" ]] || \
     [[ "${cur_major}" -eq "${req_major}" && "${cur_minor}" -ge "${req_minor}" ]]; then
    KERNEL_OK=1
  fi

  if [[ "${KERNEL_OK}" -ne 1 ]]; then
    echo "Kernel is older than required by this script policy (>= ${REQUIRED_KERNEL_MM})."
    if [[ "${DO_INSTALL_MAINLINE_KERNEL}" -eq 1 ]]; then
      echo "Installing latest mainline kernel..."
      sudo add-apt-repository ppa:cappelikan/ppa -y 2>/dev/null || true
      sudo apt update
      sudo apt install -y mainline pkexec
      sudo mainline install-latest
      echo "Mainline kernel installed. Reboot required to activate it."
      KERNEL_INSTALLED=1
    else
      echo "Mainline kernel install is disabled by flag DO_INSTALL_MAINLINE_KERNEL=0. Continuing."
    fi
  fi
fi

# Optional: GRUB parameters (append-only)
if [[ "${DO_GRUB_PARAMS}" -eq 1 ]]; then
  GRUB_FILE="/etc/default/grub"
  GRUB_CHANGED=0

  for param in "${GRUB_PARAMS[@]}"; do
    if ! sudo grep -qE "GRUB_CMDLINE_LINUX_DEFAULT=.*\b${param}\b" "${GRUB_FILE}"; then
      sudo cp -a "${GRUB_FILE}" "${GRUB_FILE}.backup.$(date +%F-%H%M%S)"
      sudo sed -i -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\")([^\"]*)\"/\1\2 ${param}\"/" "${GRUB_FILE}"
      echo "Added GRUB param: ${param}"
      GRUB_CHANGED=1
    else
      echo "GRUB param already present: ${param}"
    fi
  done

  if [[ "${GRUB_CHANGED}" -eq 1 ]]; then
    sudo update-grub
    echo "GRUB updated."
  fi
else
  echo "Skipping GRUB parameters (DO_GRUB_PARAMS=0)."
fi

# Best effort: purge old packages/repos
if [[ "${DO_PURGE_OLD_PACKAGES}" -eq 1 ]]; then
  echo "Removing previous ROCm/AMDGPU packages (best effort)..."
  sudo dpkg --configure -a || true
  sudo apt remove --purge -y rocminfo || true
  sudo apt purge -y 'rocm*' 'amdgpu*' 'graphics*' 'hip*' || true
  sudo apt autoremove -y || true
  sudo apt clean || true
  sudo rm -rf /etc/apt/sources.list.d/amdgpu* /etc/apt/sources.list.d/rocm* /etc/apt/sources.list.d/graphics* || true
  sudo apt update || true
else
  echo "Skipping purge old packages (DO_PURGE_OLD_PACKAGES=0)."
fi

# Add ROCm "latest" repository
if [[ "${DO_SETUP_ROCM_REPO}" -eq 1 ]]; then
  echo "Setting up ROCm 'latest' repository..."

  . /etc/os-release

  UBUNTU_CODENAME="${UBUNTU_CODENAME:-${VERSION_CODENAME:-}}"
  if [[ -z "${UBUNTU_CODENAME}" ]]; then
    echo "Cannot detect Ubuntu codename (UBUNTU_CODENAME/VERSION_CODENAME)."
    exit 1
  fi

  sudo install -d -m 0755 /usr/share/keyrings
  wget -qO- https://repo.radeon.com/rocm/rocm.gpg.key \
    | gpg --dearmor \
    | sudo tee /usr/share/keyrings/rocm-archive-keyring.gpg >/dev/null

  echo "deb [arch=amd64 signed-by=/usr/share/keyrings/rocm-archive-keyring.gpg] https://repo.radeon.com/rocm/apt/latest/ ${UBUNTU_CODENAME} main" \
    | sudo tee /etc/apt/sources.list.d/rocm.list >/dev/null

  # Pin repo.radeon.com above Ubuntu
  sudo tee /etc/apt/preferences.d/rocm-pin-600 >/dev/null <<'EOF'
Package: *
Pin: origin repo.radeon.com
Pin-Priority: 600
EOF

else
  echo "Skipping ROCm repo setup (DO_SETUP_ROCM_REPO=0)."
fi

# Install ROCm packages
if [[ "${DO_INSTALL_ROCM}" -eq 1 ]]; then
  echo "Installing ROCm stack..."
  sudo apt update
  sudo apt install -y -o Dpkg::Options::="--force-overwrite" \
    rocm-dev rocm-libs rocm-hip-sdk rocm-smi-lib rocminfo
else
  echo "Skipping ROCm install (DO_INSTALL_ROCM=0)."
fi

# /opt/rocm -> /opt/rocm-X.Y.Z
if [[ "${DO_LINK_OPT_ROCM}" -eq 1 ]]; then
  INSTALLED_ROCM_DIR="$(ls -d /opt/rocm-[0-9]* 2>/dev/null | sort -V | tail -n 1 || true)"
  if [[ -n "${INSTALLED_ROCM_DIR}" ]]; then
    REAL_VERSION="$(echo "${INSTALLED_ROCM_DIR}" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' || echo latest)"
    sudo ln -sfn "${INSTALLED_ROCM_DIR}" /opt/rocm
    echo "ROCm detected: ${REAL_VERSION} (${INSTALLED_ROCM_DIR}); linked /opt/rocm -> ${INSTALLED_ROCM_DIR}"
  else
    echo "No /opt/rocm-X.Y.Z directory found; leaving /opt/rocm as-is."
  fi
else
  echo "Skipping /opt/rocm symlink (DO_LINK_OPT_ROCM=0)."
fi

# User groups: render,video
if [[ "${DO_USER_GROUPS}" -eq 1 ]]; then
  TARGET_USER="${SUDO_USER:-$USER}"
  sudo usermod -aG render,video "${TARGET_USER}" || true
  echo "User added to groups: render, video (${TARGET_USER}). Re-login or reboot required."
else
  echo "Skipping user groups (DO_USER_GROUPS=0)."
fi

# PATH + LD_LIBRARY_PATH in ~/.bashrc
if [[ "${DO_BASHRC_PATH}" -eq 1 ]]; then
  TARGET_USER="${SUDO_USER:-$USER}"
  TARGET_HOME="$(getent passwd "${TARGET_USER}" | cut -d: -f6)"
  TARGET_BASHRC="${TARGET_HOME}/.bashrc"
  MARKER="AMD ROCm Paths"

  if [[ ! -f "${TARGET_BASHRC}" ]]; then
    sudo -u "${TARGET_USER}" touch "${TARGET_BASHRC}" || true
  fi

  # Determining the installed ROCm version
  ROCM_VERSION_DIR="$(ls -d /opt/rocm-[0-9]* 2>/dev/null | sort -V | tail -n 1 || true)"
  if [[ -n "${ROCM_VERSION_DIR}" ]]; then
    ROCM_VERSION="$(basename "${ROCM_VERSION_DIR}" | sed 's/rocm-//')"
    echo "Using ROCm version: ${ROCM_VERSION} (${ROCM_VERSION_DIR})"
  else
    ROCM_VERSION="unknown"
    echo "Warning: No /opt/rocm-X.Y.Z found; using generic paths"
  fi

  if ! grep -q "${MARKER}" "${TARGET_BASHRC}" 2>/dev/null; then
    cat >> "${TARGET_BASHRC}" <<EOF

# ${MARKER}
if [ -d "/opt/rocm-${ROCM_VERSION}" ]; then
  export PATH="/opt/rocm-${ROCM_VERSION}/bin:\$PATH"
  export LD_LIBRARY_PATH="/opt/rocm-${ROCM_VERSION}/hip/lib:/opt/rocm-${ROCM_VERSION}/lib:\$LD_LIBRARY_PATH"
  export ROCM_PATH="/opt/rocm-${ROCM_VERSION}"
  export HIP_CLANG_PATH="/opt/rocm-${ROCM_VERSION}/llvm/bin"
fi
EOF
    echo "Added full ROCm paths (PATH+LD_LIBRARY_PATH) to ${TARGET_BASHRC}"
  else
    echo "ROCm PATH block already present in ${TARGET_BASHRC}"
  fi

  # Apply to the current session
  if [[ -n "${ROCM_VERSION_DIR}" ]]; then
    export PATH="${ROCM_VERSION_DIR}/bin:${PATH}"
    export LD_LIBRARY_PATH="${ROCM_VERSION_DIR}/hip/lib:${ROCM_VERSION_DIR}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    export ROCM_PATH="${ROCM_VERSION_DIR}"
    export HIP_CLANG_PATH="${ROCM_VERSION_DIR}/llvm/bin"
  fi
else
  echo "Skipping .bashrc PATH (DO_BASHRC_PATH=0)."
fi

# Workaround for amdgpu.ids (some Ollama builds)
if [[ "${DO_OLLAMA_AMDGPU_IDS_WORKAROUND}" -eq 1 ]]; then
  if [[ -f /usr/share/libdrm/amdgpu.ids ]]; then
    sudo mkdir -p /opt/amdgpu/share/libdrm
    sudo ln -sf /usr/share/libdrm/amdgpu.ids /opt/amdgpu/share/libdrm/amdgpu.ids
    echo "Created compatibility link: /opt/amdgpu/share/libdrm/amdgpu.ids -> /usr/share/libdrm/amdgpu.ids"
  else
    echo "amdgpu.ids not found at /usr/share/libdrm/amdgpu.ids; skipping workaround."
  fi
else
  echo "Skipping Ollama amdgpu.ids workaround (DO_OLLAMA_AMDGPU_IDS_WORKAROUND=0)."
fi

# Best effort: power/control=on
if [[ "${DO_GPU_POWER_CONTROL_ON}" -eq 1 ]]; then
  # Test for existence, not writability: the write itself goes through sudo.
  if [[ -e /sys/class/drm/card0/device/power/control ]]; then
    echo on | sudo tee /sys/class/drm/card0/device/power/control >/dev/null
    echo "Set /sys/class/drm/card0/device/power/control = on"
  else
    echo "/sys/class/drm/card0/device/power/control not found; skipping."
  fi
else
  echo "Skipping GPU power control (DO_GPU_POWER_CONTROL_ON=0)."
fi

# Final
echo "Installation finished."
if [[ "${KERNEL_INSTALLED}" -eq 1 ]]; then
  echo "Reboot required to activate the new kernel."
else
  echo "Reboot recommended to apply group membership changes."
fi

echo "After reboot, verify:"
echo "  rocminfo"
echo "  amd-smi (if installed)"
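
The script guards its ~/.bashrc edit with a marker comment so that running it twice does not duplicate the block. The same pattern in isolation looks roughly like this; append_once is a hypothetical helper and the demonstration writes to a scratch file, not to a real ~/.bashrc.

```bash
#!/usr/bin/env bash
# append_once FILE MARKER TEXT   (hypothetical helper)
# Appends "# MARKER" plus TEXT to FILE unless MARKER is already present.
append_once() {
  local file="$1" marker="$2" text="$3"
  if grep -qF "$marker" "$file" 2>/dev/null; then
    return 0
  fi
  printf '\n# %s\n%s\n' "$marker" "$text" >> "$file"
}

# Demonstration on a scratch file rather than a real ~/.bashrc.
rc="$(mktemp)"
append_once "$rc" "AMD ROCm Paths" 'export PATH="/opt/rocm/bin:$PATH"'
append_once "$rc" "AMD ROCm Paths" 'export PATH="/opt/rocm/bin:$PATH"'
grep -c "AMD ROCm Paths" "$rc"   # prints 1: the second call was a no-op
rm -f "$rc"
```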

Attention

After the script finishes, you must reboot the server with sudo reboot. This is required to activate the new user groups and kernel modules.
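
After the reboot, one way to confirm that the group change took effect is to compare the groups of the current session with the two the script adds. A small sketch, with in_groups as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# in_groups "WANTED..." "ACTUAL..."   (hypothetical helper)
# Returns 0 when every group in WANTED appears in the ACTUAL list.
in_groups() {
  local wanted="$1" actual=" $2 " g
  for g in $wanted; do
    case "$actual" in
      *" $g "*) ;;                                   # present, keep checking
      *) echo "missing group: $g"; return 1 ;;
    esac
  done
}

# id -nG lists the group names of the current session.
if in_groups "render video" "$(id -nG)"; then
  echo "Group membership is active in this session"
fi
```

If a group is reported missing right after running usermod, logging out and back in (or rebooting) is usually what is still needed.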


Parts of this page were created or translated with the help of AI.