Federated Learning with NVFLARE

This tutorial shows how to use the platform's Federated Learning (FL) server to run FL training with NVFLARE.

Requirements

🔒 You need a platform account with full access level.

Deploying a Federated server

The workflow for deploying an FL server is similar to the one for deploying a module.

In this particular case, you will need to pay attention to:

Your credentials to access the NVFLARE Dashboard and associated Jupyter notebook.
Whether to make the project public or restrict the training to authorized trusted partners.
The Docker image that includes all the necessary dependencies and configurations provided by the admin of the project.
The start and end dates of the training.

Note

If you want other users to be able to sign in to the NVFLARE Dashboard and register sites in the project, set ‘Make project public’ to True when creating the deployment. Otherwise, clients can only be registered from the command line.

Preparing the training environment

Your newly created NVFLARE instance appears in the deployments list.

The NVFLARE endpoints

Click the Quick access button to see two endpoints:

DASHBOARD:

This allows you to access the NVFLARE Dashboard.

Enter your credentials from the configuration step and voilà, you’re in as the project admin!

This dashboard is used to generate the startup kits for the server, admins and clients. The startup kits include the configurations and certificates required to establish secure connections between the FL servers, FL clients, and admin clients. These files are essential for verifying identity and enforcing authorization policies between the server and clients.

SERVER-JUPYTER:

This provides access to a JupyterLab environment for the server, also protected by your admin credentials. The server’s startup kit is automatically downloaded to the workspace directory within JupyterLab, and the server is already running.

If the server is stopped for any reason during the project, you can restart it by executing the following script:
```
$ sh workspace/server_address_folder/startup/start.sh
```

Adding new clients to the training

A project can have multiple admins (among other roles). The Project Admin is the person who initially created the deployment within the Dashboard. Each organization participating in the federated training should also designate an Organization Admin (Org Admin). Org Admins are responsible for registering their own organization’s sites within the project. The Project Admin has the authority to approve Organization Admins as well as their associated sites.

To allow Organization Admins to register their sites, share the dashboard link with them. They can open the link and click Sign Up to register themselves and their sites (detailed instructions). To be able to register sites, they must set the Role to ‘Org Admin’ when signing up. On the next page, the Org Admin registers the sites, specifying the number of GPUs and the memory capacity of each GPU.
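The registration form asks for the number of GPUs and the memory of each one. As a rough sketch, assuming the site machine has NVIDIA drivers and the `nvidia-smi` tool installed, these values can be collected as follows. The helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and the sample string stands in for real `nvidia-smi` output.

```python
import subprocess
from typing import List


def parse_gpu_memory(smi_output: str) -> List[int]:
    """Parse one total-memory value (in MiB) per non-empty line."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_gpu_memory() -> List[int]:
    """Ask nvidia-smi for the total memory of every GPU on this machine."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)


# Sample text standing in for nvidia-smi output on a two-GPU machine (assumption).
sample_output = "40960\n40960\n"
memory_mib = parse_gpu_memory(sample_output)
print(f"{len(memory_mib)} GPU(s), memory per GPU in MiB: {memory_mib}")
```

On a real site, call `query_gpu_memory()` instead of parsing the sample string. `nvidia-smi` reports MiB here, so convert the values if the form expects a different unit.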

After completing registration, users must wait for the Project Admin to approve their roles and associated sites. Once approved, the Organization Admins can log in to the dashboard, download the startup kits for their sites, and obtain the Docker image that the Project Admin shared for the project code. Using these startup kits, they can then launch their sites.

After downloading and unzipping the startup kit, the Org Admin can run the following command to start a site from anywhere in the world and connect it to the server hosted on the platform.

```
$ sh ./site_name_folder/startup/start.sh
```
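Before launching a site, it can help to confirm that the kit was unzipped where you expect. The sketch below only checks for the `startup/start.sh` script used in the command above. The function name `find_start_script` is hypothetical, and the temporary directory merely imitates an unzipped kit for demonstration.

```python
import tempfile
from pathlib import Path


def find_start_script(kit_dir):
    """Return the path to startup/start.sh inside an unzipped kit, or None."""
    script = Path(kit_dir) / "startup" / "start.sh"
    return script if script.is_file() else None


# Build a throwaway directory that imitates an unzipped startup kit.
with tempfile.TemporaryDirectory() as tmp:
    kit = Path(tmp) / "site_name_folder"
    (kit / "startup").mkdir(parents=True)
    (kit / "startup" / "start.sh").write_text("#!/bin/sh\n")

    found = find_start_script(kit)
    missing = find_start_script(Path(tmp) / "not_unzipped_yet")

print(found is not None)  # True
print(missing)            # None
```

In practice, pass the real kit folder, for example `find_start_script("./site_name_folder")`, and run the start script only when the result is not `None`.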

The Admin can also start the FLARE Console, again from anywhere in the world, by running the following command from the downloaded FLARE Console startup kit.

```
$ sh ./admin_email/startup/fl_admin.sh
```

You will be prompted to enter a username. Use the email address provided by the admin during registration.

From the admin console, the admin can orchestrate the FL study: starting and stopping the server and clients, checking their status, deploying applications, and managing FL experiments (available commands).

Note

To maintain a consistent environment, the Project Admin should create a Docker image containing all the necessary dependencies and configurations, and provide it when deploying the server on the Dashboard. This approach ensures reproducibility and simplifies deployment across different sites.

By default, such an image is provided during the configuration step.

Start your Federated Learning training

Once a sufficient number of sites are connected to the server, any Admin can log in to the console and submit an FL job. Before doing so, they need to prepare the FL job by converting their existing ML/DL code into an FL-compatible version using NVFLARE.
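To give an idea of what such a conversion can look like, here is a minimal sketch built on the NVFLARE Client API. It assumes a recent NVFLARE release that ships the `nvflare.client` module. The names `local_train` and `run_client` are hypothetical, the "training" is a placeholder that adds 1 to every parameter, and the job configuration that a real FL job also needs is not shown.

```python
def local_train(params):
    """Placeholder for real training: shift every parameter by 1."""
    return {name: value + 1 for name, value in params.items()}


def run_client():
    """Client-side loop; call this from the script that the FL job launches."""
    # Imported here so this file can be loaded where NVFLARE is not installed.
    import nvflare.client as flare

    flare.init()  # attach to the NVFLARE client runtime
    while flare.is_running():  # loop for as long as the job is active
        input_model = flare.receive()  # global model sent by the server
        new_params = local_train(input_model.params)
        flare.send(flare.FLModel(params=new_params))  # return the local update


# The training step is plain Python and can be tried without NVFLARE.
print(local_train({"w": 1.0, "b": -0.5}))  # {'w': 2.0, 'b': 0.5}
```

The design idea is that the existing training code stays in an ordinary function, while the receive and send calls wrap it so that each round starts from the global model and ends by returning the local update.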

Please take a look at the following examples:

Check the getting_started examples in the NVFLARE repository.
Check the ml-to-fl examples demonstrating how to transition simple ML/DL projects to NVFLARE.
We provide a simple hello numpy example.
For an advanced example, you can check the phyto-plankton-classification module that has been adapted to NVFLARE.

For more information, please refer to the official NVFLARE documentation.