The most visited pages in the Backend.AI Web-UI would be the Sessions and Data & Storage pages. Here, you will learn how to query and create container-based compute sessions and utilize various web applications on the Sessions page.
Start a new session¶
After logging in with a user account, click Sessions on the left sidebar to visit the Sessions page. The Sessions page lets you start new sessions and use or manage existing running sessions.
Click the START button to start a new compute session. The following wizard-style dialog will appear.
First, choose the type of session: interactive or batch. Then choose the Environment and Version you want to use. The remaining items are optional. Each item is described in detail below.
Session type: Determines the type of the session. “Interactive” and “Batch” are the two session types currently available. The following are the primary distinctions between the two types:
- Interactive compute session
  - This type has been supported since the initial version of Backend.AI.
  - The user interacts with the session after creating it, without specifying a pre-defined execution script or command.
  - The session is not terminated automatically unless the user explicitly destroys it or the admin has configured session garbage collectors.
- Batch compute session
  - This type of session has been supported in the GUI since Backend.AI 22.03 (the CLI supported batch-type sessions before 22.03).
  - You pre-define the script that will be executed when the compute session is ready.
  - The script is executed as soon as the compute session is ready, and the session is terminated automatically as soon as execution finishes. This uses the server farm’s resources more efficiently and flexibly when a user can write the execution script in advance or is building a pipeline of workloads.
  - You can set the start time of a batch-type compute session. However, this does not guarantee that the session will be created at that time; it may remain PENDING, for example due to a lack of resources. Rather, it guarantees that the session WILL NOT run before the start time.
Environments: You can choose the base environment for compute sessions such as TensorFlow, PyTorch, C++, etc. When you select TensorFlow, your compute session will automatically include the TensorFlow library. If you choose another environment, the corresponding packages will be installed by default.
Version: Selects the version of the environment. For example, you can select different versions, such as 1.15, 2.3, etc., for the TensorFlow environment.
Session name: You can specify the name of the compute session to be created. If set, this name appears in Session Info, which makes it easy to distinguish among multiple compute sessions. If not specified, a random word is assigned automatically. Session names must be 4 to 64 alphanumeric characters long, without spaces.
Set Environment Variable: Provides an interface for users to set environment variables in a compute session. See the section How to add environment variables before session creation on how to use.
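The session name rule above can be checked before you submit the dialog. The following is a minimal sketch that assumes the rule exactly as stated in this guide (4 to 64 alphanumeric characters, no spaces); the function name `is_valid_session_name` is hypothetical, and the server may apply a different check.

```python
import re

# Assumed rule, taken from this guide: 4 to 64 alphanumeric characters, no spaces.
# The actual server-side validation may differ.
SESSION_NAME_PATTERN = re.compile(r"[A-Za-z0-9]{4,64}")


def is_valid_session_name(name: str) -> bool:
    """Return True if the name satisfies the rule described in this guide."""
    return SESSION_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_session_name("train01"))     # True
print(is_valid_session_name("my session"))  # False: contains a space
print(is_valid_session_name("abc"))         # False: shorter than 4 characters
```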
Click the right arrow button at the bottom to advance to the next page. You can also launch a compute session directly by clicking the CONFIRM AND LAUNCH button. In this case, the settings on the other pages will all use the default values.
Here, you can specify the data folders to mount in the compute session. When a compute session is destroyed, all of its data is deleted by default, but the data stored in mounted folders survives. Data in those folders can also be reused by mounting them when creating another compute session. For information on how to mount a folder and run a compute session, see Mounting Folders to a Compute Session. Here, we will skip this step without mounting any folder. Click the right arrow button.
This page allows you to set the resources to be allocated for the new compute session.
- Resource Group: Specifies the resource group in which to create a compute session. A resource group is a unit that groups host servers that each user can access. Usually, servers in a resource group would have the same type of GPU resources. Administrators can classify servers by any criteria, group them into one or more resource groups, and configure which resource groups a user can use. Users can launch a compute session only on servers in resource groups allowed by the administrator. If you are allowed multiple resource groups, you can select any group you want, but you cannot change it if you have only one.
- Resource allocation: These templates have pre-defined resource sets, such as CPU, memory, and GPU, to be allocated to a compute session. Administrators can define frequently used resource settings in advance.
If you want to allocate every resource by yourself, click Custom allocation. The following advanced resource panel opens, and you can set each resource as you wish within the allowed resource limits.
The meaning of each item is as follows, and you can check it by clicking the Information (I) button on the right as well.
- CPU: The number of CPU cores to allocate to the compute session. The maximum value depends on the resource policy applied to the user.
- RAM: The amount of memory (GB) to allocate to the compute session. The maximum value depends on the resource policy applied to the user.
- Shared Memory: The amount of shared memory in GB to allocate for the compute session. Shared memory will use some part of the memory set in RAM. Therefore, it cannot be greater than the amount specified in RAM.
- GPU: The unit of GPU to allocate to the compute session. The maximum value depends on the resource policy applied to the user.
- Sessions: The number of compute sessions to be created with the specified settings. You can specify this value when you need to create the same compute sessions at once.
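The constraints in the list above can be summarized as a small check. This is only an illustrative sketch: the `ResourceRequest` class, the `validate` function, and the limit parameters are hypothetical names, and the real limits come from the resource policy applied to the user.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceRequest:
    cpu: int          # number of CPU cores
    ram_gb: float     # memory in GB
    shmem_gb: float   # shared memory in GB, taken out of ram_gb
    gpu: float        # GPU units
    sessions: int = 1  # number of identical sessions to create


def validate(req: ResourceRequest, max_cpu: int, max_ram_gb: float, max_gpu: float) -> List[str]:
    """Return a list of problems; an empty list means the request looks acceptable."""
    problems = []
    if req.cpu > max_cpu:
        problems.append("CPU exceeds the resource policy limit")
    if req.ram_gb > max_ram_gb:
        problems.append("RAM exceeds the resource policy limit")
    if req.gpu > max_gpu:
        problems.append("GPU exceeds the resource policy limit")
    if req.shmem_gb > req.ram_gb:
        problems.append("Shared Memory cannot be greater than RAM")
    if req.sessions < 1:
        problems.append("Sessions must be at least 1")
    return problems


print(validate(ResourceRequest(cpu=4, ram_gb=16, shmem_gb=32, gpu=1), 8, 64, 2))
```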
Backend.AI also lets you configure values related to HPC optimizations. For more information, see the section Optimizing Accelerated Computing.
If you are done with the resource setting, click the right arrow button to proceed to the next page.
Now, we have reached the last page. You can view information of session(s) to create, such as environment itself, allocated resources, mount information, environment variables set on the previous pages, etc. After confirming the settings, click the LAUNCH button. If there is a setting you want to change, you can return to the previous page by clicking the left arrow button.
A warning dialog appears, stating that there are no mounted folders. Ignore the warning for now and click the LAUNCH button to proceed.
Now a new compute session is created in the RUNNING tab.
In the RUNNING tab, you can check the information on the currently running sessions, both interactive and batch. The BATCH and INTERACTIVE tabs show only sessions of the corresponding type that are not in a terminated status. The FINISHED tab lists terminated sessions, and the OTHERS tab shows compute sessions with errors. For each session, you can check information such as the session environment, the amount of allocated and used resources, the session start time, etc.
Superadmins can query all compute session information currently running (or terminated) in the cluster, and users can view only the sessions they have created.
The compute session list may not be displayed properly because of intermittent network connection problems or similar issues. Refreshing the browser page usually solves this.
Backend.AI provides detailed status information for CANCELLED sessions. In the case of PENDING sessions, in particular, you can check why the session is not scheduled and is stuck in the PENDING status. You can see the details by clicking the question mark icon right next to the status of each session.
The resource statistics are displayed at the top of the screen. You can check the amount of resources currently used and the total amount of resources that can be allocated. The display bars are divided into upper and lower parts. The upper part shows the resource allocation status in the current scaling group and the lower part shows the allocation status of total accessible resources.
- Upper: (Resources allocated by the user in the current scaling group) / (Total resources allocatable by the user in the current scaling group)
- Lower: (Resources allocated by the user) / (Resources allocated by the user + Total resources allocatable by the user in the current scaling group)
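As a worked example of the two ratios above, the following sketch computes both bars for a single resource type. The function name `usage_bars` and the sample numbers are made up for illustration.

```python
from typing import Tuple


def usage_bars(allocated_in_group: float,
               allocatable_in_group: float,
               allocated_total: float) -> Tuple[float, float]:
    """Return (upper, lower) fill ratios following the two formulas above."""
    upper = allocated_in_group / allocatable_in_group if allocatable_in_group else 0.0
    denominator = allocated_total + allocatable_in_group
    lower = allocated_total / denominator if denominator else 0.0
    return upper, lower


# Example: 2 cores allocated in the current group out of 8 allocatable there,
# and 4 cores allocated by the user in total.
upper, lower = usage_bars(allocated_in_group=2, allocatable_in_group=8, allocated_total=4)
print(f"upper bar: {upper:.0%}, lower bar: {lower:.0%}")
```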
If the GPU resource is marked as FGPU, the server is serving GPU resources in a virtualized form. Backend.AI supports GPU virtualization technology by which a single physical GPU can be divided and shared by multiple users for better utilization. Therefore, if you want to execute a task that does not require a large amount of GPU computation, you can create a compute session by allocating only a portion of a GPU. The amount of GPU resources that 1 FGPU actually allocates may vary from system to system depending on the administrator’s setting.
For example, if the administrator has configured one physical GPU to be split into five pieces, 5 FGPU corresponds to 1 physical GPU, and 1 FGPU corresponds to 0.2 physical GPU. In this configuration, if you create a compute session by allocating 1 FGPU, the session can use the SMs (streaming multiprocessors) and GPU memory corresponding to 0.2 physical GPU.
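The conversion in this example is a simple division. The sketch below assumes the split factor is known (it is set by the administrator and may differ on your system); `fgpu_to_physical` is a hypothetical helper name.

```python
def fgpu_to_physical(fgpu: float, fgpu_per_physical_gpu: float) -> float:
    """Convert an FGPU amount into the equivalent fraction of a physical GPU."""
    return fgpu / fgpu_per_physical_gpu


# With one physical GPU split into five pieces, as in the example above:
print(fgpu_to_physical(1, 5))  # 0.2
print(fgpu_to_physical(5, 5))  # 1.0
```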
Use Jupyter Notebook¶
Let’s look at how to use and manage compute sessions that are already running. If you look at the Control panel of the session list, there are several icons. When you click the first icon, the app launcher pops up and shows the available app services as below. The app launcher dialog also opens automatically just after the compute session is created.
There are two check options under the app icons. Opening the app with each item checked applies the following features, respectively:
- Open app to public: Open the app to the public. Basically, web services such as Terminal and Jupyter Notebook services are not accessible by other users, even if the user knows the service URL, since they are considered unauthenticated. However, checking this option makes it possible for anyone who knows the service URL (and port number) to access and use it. Of course, the user must have a network path to access the service.
- Try preferred port: Without this option checked, a port number for the web service is randomly assigned from the port pool prepared in advance by Backend.AI. If you check this item and enter a specific port number, the entered port number will be tried first. However, there is no guarantee that the desired port will always be assigned because the port may not exist at all in the port pool or another service may already be using the port. In this case, the port number is randomly assigned.
Depending on the system configuration, these options may not be shown.
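The behavior of the "Try preferred port" option can be pictured with the following sketch. It is not Backend.AI’s actual implementation; the function `pick_port` and its parameters are hypothetical and only mirror the description above: the preferred port is used when it is in the pool and free, otherwise a random free port is assigned.

```python
import random
from typing import Iterable, Optional


def pick_port(pool: Iterable[int],
              in_use: Iterable[int],
              preferred: Optional[int] = None,
              rng: Optional[random.Random] = None) -> int:
    """Choose a port for an app service, trying the preferred port first."""
    rng = rng or random.Random()
    free = sorted(set(pool) - set(in_use))
    if not free:
        raise RuntimeError("no free port left in the pool")
    if preferred is not None and preferred in free:
        return preferred
    # The preferred port is outside the pool or already taken: fall back to random.
    return rng.choice(free)


pool = range(10000, 10005)
print(pick_port(pool, in_use={10001}, preferred=10003))  # 10003
```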
Let’s click on Jupyter Notebook.
A new window pops up and you can see that Jupyter Notebook is running. This notebook was created inside a running compute session and can be used easily with the click of a button without any other settings. Also, there is no need for a separate package installation process because the language environment and library provided by the computation session can be used as it is. For detailed instructions on how to use Jupyter Notebook, please refer to the official documentation.
In the notebook’s file explorer, the id_container file contains a private SSH key. If necessary, you can download it and use it for SSH / SFTP access to the compute session.
Click the NEW button at the top right and select Notebook for Backend.AI. An ipynb window appears where you can enter your own code.
In this window, you can enter and execute any code you want by using the environment that session provides. The code is executed on one of the Backend.AI nodes where the compute session is actually created and there is no need to configure a separate environment on the local machine.
When you close the window, you can find that the
Untitled.ipynb file is
created in the notebook file explorer. Note that the files created here are
deleted when you terminate the session. The way to preserve those files even
after the session is terminated is described in the Data & Storage Folders section.
Use web terminal¶
Return to the Session list page. This time, let’s launch the terminal. Click the
terminal icon (the second button in the Control panel) to use the container’s
ttyd app. A terminal will appear in a new window and you can run shell commands
to access the computational session as shown in the following figure. If you are
familiar with using commands, you can easily run various Linux commands. You may
notice that the
Untitled.ipynb file automatically generated in Jupyter Notebook
is listed with the
ls command. This shows that both apps are running in the
same container environment.
If you create a file here, you can immediately see it in the Jupyter Notebook you opened earlier as well. Conversely, changes made to files in Jupyter Notebook can also be checked right from the terminal. This is because they are using the same files in the same compute session.
In addition to this, you can use web-based services such as TensorBoard, Jupyter Lab, etc., depending on the type of environments provided by the compute session.
Query compute session log¶
You can view the log of the compute session by clicking the last icon in the Control panel of the running compute session.
From 22.09, you can download the session log by clicking the download button at the upper right of the dialog. This feature is helpful for tracking artifacts.
Rename running session¶
You can change the name of an active session. Click the edit icon in the session information column, enter the new name, and click the confirm button. The new session name must follow the same naming rule as when creating a session.
Delete a compute session¶
To terminate a specific session, click the red power icon and then click the OKAY button in the dialog. Since the data inside the compute session is deleted as soon as the session ends, it is recommended to move the data to a mounted folder, or to store it in a mounted folder from the beginning, if you want to keep it.
Backend.AI supports three types of inactivity (idleness) criteria for automatic garbage collection of compute sessions: Max Session Lifetime, Network Idle Timeout, and Utilization Checker.
Idle checkers (inactivity criteria) are displayed in the idle checks column of the session list.
The meaning of idle checkers is as follows, and can also be viewed by clicking the info icon in the idle checks column.
- Max Session Lifetime: Force-terminate sessions after this time from creation. It prevents the session from running infinitely.
- Network Idle Timeout: Force-terminate sessions that do not exchange data with the user (browser or web app) after this time. Traffic between the user and the compute session occurs continuously when the user interacts with an app, like terminal or Jupyter, by keyboard input, Jupyter cell creation, etc. If there is no interaction for a certain period, the condition for garbage collection is met. Even if there is a process executing a job in the compute session, it is subject to termination if there is no user interaction.
- Utilization Checker: Force-terminate sessions based only on the utilization of the resources allocated to them.
  - Grace Period: The utilization idle checker is activated after this initial grace time. During this time, sessions are not terminated even if utilization is low.
  - Utilization Threshold: Threshold criteria for each compute resource. When one or more resources of a compute session do not exceed the configured threshold for a certain time, the session is garbage collected (terminated). For example, if you set a CUDA utilization threshold of 1%, compute sessions that show less than 1% CUDA GPU utilization for a certain duration will be destroyed. Resources with empty values are excluded from the garbage collection criteria.
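The three criteria above can be read as a decision procedure. The sketch below is a simplified illustration, not Backend.AI’s implementation: the function name, parameter names, and returned labels are hypothetical, and it assumes the utilization values passed in are already averaged over the checking window.

```python
from typing import Mapping, Optional


def idle_check(age_sec: float,
               network_idle_sec: float,
               avg_utilization: Mapping[str, float],
               *,
               max_lifetime_sec: Optional[float] = None,
               network_idle_timeout_sec: Optional[float] = None,
               grace_period_sec: float = 0.0,
               thresholds: Optional[Mapping[str, Optional[float]]] = None) -> Optional[str]:
    """Return the name of the criterion that would terminate the session, or None."""
    if max_lifetime_sec is not None and age_sec >= max_lifetime_sec:
        return "max_session_lifetime"
    if network_idle_timeout_sec is not None and network_idle_sec >= network_idle_timeout_sec:
        return "network_idle_timeout"
    # The utilization checker only applies after the grace period has passed.
    if thresholds and age_sec >= grace_period_sec:
        for resource, limit in thresholds.items():
            if limit is None:
                continue  # resources with empty values are excluded
            if avg_utilization.get(resource, 0.0) < limit:
                return "utilization"
    return None


# A session 2 hours old with 0.5% CUDA utilization and a 1% threshold:
print(idle_check(7200, 10, {"cuda_util": 0.5},
                 grace_period_sec=600, thresholds={"cuda_util": 1.0, "mem": None}))
```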
If you hover your mouse over the Utilization Checker, a tooltip showing the current utilization and the threshold appears. As the current utilization approaches the threshold (towards lower usage), the font color changes to yellow, and then to red.
Depending on the environment settings, idle checkers and resource types of utilization checker’s tooltip may be different.
How to add environment variable before creating a session¶
To provide a more convenient workspace, Backend.AI supports setting environment variables when launching a session. With this feature, you can add any environment variable, such as PATH, by filling in the variable name and value in the environment configuration dialog.
To add an environment variable, click the CONFIG button. The environment configuration dialog then appears.
In this dialog, you can add, update, and delete environment variables. For more information about how it works, click the ‘i’ button in the header of the dialog.
Enter the variable name and value in the input fields on the same row, then click the SAVE button. The variable will be applied to the session.
If you close the dialog without saving, or if you leave the variable name or value empty, those inputs are not applied to the session. Keep in mind that every non-empty variable and value is applied to the session when you click the SAVE button.
To add more environment variables, click the + button on the right side of the first row of input fields. You can also remove a variable by clicking the - button of the row you want to get rid of.
If you want to delete all variables and values, click the DELETE ALL button at the bottom of the dialog.
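The saving rule described above can be illustrated as follows. This sketch assumes that a row is applied only when both its name and value are filled in; `collect_env` is a hypothetical helper, not part of the Web-UI.

```python
from typing import Dict, Iterable, Tuple


def collect_env(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep only rows whose variable name and value are both non-empty."""
    env = {}
    for name, value in rows:
        name = name.strip()
        if name and value != "":
            env[name] = value
    return env


rows = [("PATH", "/opt/tools/bin"), ("", "orphan-value"), ("DEBUG", "")]
print(collect_env(rows))  # {'PATH': '/opt/tools/bin'}
```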
Save container commit¶
From 22.09, Backend.AI supports a container commit feature. Committing a RUNNING session saves the current state of the main container as a new image. Clicking the commit button in the Control panel of a RUNNING session displays a dialog showing the information of the session. After checking the information, you can click the confirmation button to convert the container to a new image.
After you click the commit button in the dialog, Backend.AI internally requests Docker to create a new image as a tar.gz file, which is stored in a specific host path. Please note that you cannot access this file directly from your local environment. Users need to contact the administrator to get the image file.
Currently, Backend.AI supports container commit only when the session is in INTERACTIVE mode. During the container commit process, you may not be able to terminate the session; this prevents unexpected errors. If you want to stop the ongoing process, please check the session and force-terminate it.
Optimizing Accelerated Computing¶
Backend.AI provides a configuration UI for an internal control variable that sets the number of threads used by OpenMP. Backend.AI sets this value equal to the number of the session’s CPU cores by default, which has the effect of accelerating typical high-performance computing workloads. Nevertheless, in some multi-threaded workloads, multiple processes using OpenMP run at the same time, resulting in an abnormally large number of threads and significant performance degradation. To resolve this issue, setting the number of threads to 1 or 2 usually works.
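The oversubscription described above is easy to quantify. The sketch below is illustrative only: `total_threads` is a hypothetical helper, and it assumes the setting is reflected through the standard OpenMP environment variable `OMP_NUM_THREADS`, which this guide does not confirm.

```python
import os


def total_threads(num_processes: int, threads_per_process: int) -> int:
    """Total OpenMP threads when every process spawns its own thread team."""
    return num_processes * threads_per_process


cores = 8
# Default: each of 8 worker processes uses as many threads as there are cores.
print(total_threads(8, cores))  # 64 threads competing for 8 cores
# With the thread count limited to 1 per process:
print(total_threads(8, 1))      # 8 threads, one per core

# OMP_NUM_THREADS is the standard OpenMP variable for the default thread count.
# Whether the Web-UI setting maps to it is an assumption made for this sketch.
child_env = dict(os.environ, OMP_NUM_THREADS="1")
# child_env could then be passed to subprocess.run(..., env=child_env).
```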
Advanced web terminal usage¶
The web-based terminal internally embeds a utility called tmux. tmux is a terminal multiplexer that lets you open multiple shell windows within a single shell, so that multiple programs can run in the foreground simultaneously. If you want to take advantage of more powerful tmux features, refer to the official tmux documentation and other usage examples on the Internet.
Here we are introducing some simple but useful features.
Copy terminal contents¶
tmux offers a number of useful features, but it can be confusing for first-time users. In particular, tmux has its own clipboard buffer, so by default the terminal contents you copy can be pasted only within tmux. Furthermore, it is difficult to expose the user system’s clipboard to tmux inside a web browser, so the terminal contents cannot be copied and pasted into other programs on the user’s computer. The usual Ctrl-V paste does not work with tmux.
If you need to copy and paste the terminal contents to your system’s clipboard, you can temporarily turn off tmux’s mouse support. First, press Ctrl-B to enter tmux control mode. Then type :set -g mouse off and press Enter (note that you have to type the first colon as well). You can check what you are typing in the status bar at the bottom of the screen. Then drag the desired text in the terminal with the mouse and press Ctrl-C (or Cmd-C on Mac) to copy it to the clipboard of your computer.
With mouse support turned off, you cannot scroll with the mouse wheel to see the contents of the previous page in the terminal. In this case, you can turn mouse support on again. Press Ctrl-B, and this time, type :set -g mouse on. Now you can scroll with the mouse wheel to see the contents of the previous page.
If you remember :set -g mouse off and :set -g mouse on after Ctrl-B, you can use the web terminal more conveniently.
Ctrl-B is tmux’s default control mode key. If you set another control key in .tmux.conf in the user home directory, you should press that key combination instead of Ctrl-B.
In the Windows environment, refer to the following shortcuts.
- Copy: Hold down Shift, right-click and drag
- Paste: Press
Check the terminal history using keyboard¶
There is also a way to keep copy and paste available while still checking the previous contents of the terminal: use the keyboard. Again, press Ctrl-B first, and then press the Page Up or Page Down key. You can navigate through the terminal’s history with just the keyboard. To exit this mode, press the q key. With this method, you can check the contents of the terminal history even when mouse support is turned off to allow copy and paste.
Spawn multiple shells¶
The main advantage of tmux is that you can launch and use multiple shells in one terminal window. Since seeing is believing, press Ctrl-B and then c. The contents of the existing window disappear and a new shell environment appears. The previous window is not terminated, however. Press Ctrl-B and then w. You can now see the list of shells currently open in tmux, as in the following image. Here, the shell 0: is the shell environment you first saw, and the shell 1: is the one you just created. You can move between shells using the up/down keys. Place the cursor on the shell 0: and press the Enter key to select it.
The first shell environment appears again. In this way, you can use multiple shell environments within a web terminal. To exit or terminate the current shell, enter the exit command, or press Ctrl-B x and then y.
Ctrl-B c: create a new tmux shell
Ctrl-B w: query current tmux shells and move around among them
Ctrl-B x: terminate the current shell
Combining the above commands allows you to perform various tasks simultaneously on multiple shells.