Federated Learning with Flower

Requirements

🔒 This tutorial requires full authentication.

In this tutorial, we will show you how to use the Federated Learning (FL) server in the AI4OS platform to perform an FL training using Flower.

Deploying a Federated server

The workflow for deploying a FL server is similar to the one for deploying a module.

In this particular case, you will need to pay attention to:

  • The service deployed: When configuring the deployment of the FL server, we recommend selecting JupyterLab or VS Code as the service to run if you want to monitor the process. If you select fedserver, the FL server will start automatically, but you will not be able to monitor the process (e.g. whether there is a failure, how the clients are connected, or whether any of them is disconnected).

  • The Docker tag: In the first configuration step you must select the Docker tag. Note that the tokens tag will deploy the federated server with authentication enabled between the server and the clients (more info in the next sections).

  • The Federated configuration: The last section (Federated configuration) lets you choose specific settings for the FL training server, as shown below:

    ../../_images/configure_flower.png

Federated learning training in AI4EOSC

Starting the Federated Learning server

Note

This step is not needed if you configured the deployment to run with the fedserver option.

If you deployed with JupyterLab/VS Code, open the IDE and start the fedserver process:

$ cd federated-server/fedserver
$ python3 server.py

If you want to change any parameters in the federated configuration, you can always modify fedserver/server.py.
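
For reference, the sketch below shows roughly what a minimal Flower server script looks like. This is an illustrative example assuming the standard Flower API and a FedAvg strategy, not the exact contents of fedserver/server.py, which is more complete and reflects the options chosen in the dashboard:

import flwr as fl

# Aggregation strategy: FedAvg with illustrative minimum-client settings
strategy = fl.server.strategy.FedAvg(
    min_fit_clients=2,        # clients required to train in each round
    min_available_clients=2,  # clients that must be connected before training starts
)

# Listen on all interfaces and run a fixed number of federated rounds
fl.server.start_server(
    server_address="0.0.0.0:5000",
    config=fl.server.ServerConfig(num_rounds=5),
    strategy=strategy,
)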

Retrieve the configuration

Now that your fedserver is running, you need to complete the following steps:

  1. Find the endpoint where your server is deployed:

    Once your FL server is running, go back to the Dashboard, find your deployment, click on Info and copy the URL of the fedserver endpoint.

  2. Find the secret token of your deployment:

    Note

    This step is only needed if you selected the tokens Docker tag during configuration.

    AI4OS provides users with a token-based system that can be used for authenticating the clients prior to their incorporation into the federated training.

    To access the secret token, find your deployment and click the icon. You can generate as many tokens as needed (e.g. 1 token per client), as well as revoke them:

    ../../_images/secrets.png
  3. Share them with the clients:

    Note

    This step is only needed if you selected the tokens Docker tag during configuration.

    You will need to share the endpoint and the appropriate token with the clients that will take part in the training. In the section below we will explain how the clients can use them to connect to the training.

Connecting the clients

In order to connect the clients to the FL server deployed within the platform, two approaches can be followed depending on where the clients are running:

  • Clients running locally on the user’s resources or on servers external to the platform. This is the most classic approach, since in an FL training the data should generally not leave the server where they are stored; in most cases, privacy restrictions prevent their centralization. To connect each client to the server, the clients must know the UUID of the deployment where the FL server is deployed, as well as the datacenter on which it is running (IFCA or IISAS). Additionally, you can add the call_credentials parameter if the server has been created using tokens, as explained in the following section.

    Each client can then connect to the server as follows:

    import flwr as fl
    import certifi
    from pathlib import Path

    # Start -> connecting with the server
    uuid = "*********************"  # UUID of the deployment with the FL server (dashboard)
    data_center = "****"  # The value for the data center can be ifca or iisas (lowercase)
    endpoint = f"ide-{uuid}.{data_center}-deployments.cloud.ai4eosc.eu"
    fl.client.start_client(
        server_address=f"{endpoint}:443",
        client=Client(),  # Client is your Flower client class (see the section below)
        root_certificates=Path(certifi.where()).read_bytes(),
    )
    
  • Clients running on different deployments of the platform. If you are running your clients from different deployments created in the platform, in order to connect to the server you first have to find the IP of the server from the server side. To do so, go to the deployment in which you have started the server, open a terminal and run:

    env | grep NOMAD_HOST_ADDR_fedserver
    

    This will provide the IP and the port on which the FL server is running.
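
    The output has the following form (fill in the actual IP and port you obtain; the placeholders below are illustrative):

    NOMAD_HOST_ADDR_fedserver=<server-ip>:<port>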

    Then, from the client side, you can start the client as follows (again, you can add the call_credentials parameter if needed), using the IP and port from the server side as server_address:

    import flwr as fl

    # Start -> connecting with the server
    server_address = "*********************"  # FILL IN WITH THE SERVER IP AND PORT FOR FL (server side)
    fl.client.start_client(
        server_address=server_address,
        client=Client(),
    )
    

Client-server authentication

In the AI4OS project, we use a custom fork of the Flower library to perform FL trainings.

In the code below, we provide an example of how to integrate the previously obtained token and endpoint into the client code. More examples are available here.

import flwr as fl
from pathlib import Path
import certifi
import ai4flwr.auth.bearer

# Read the data, create the model
# (...)

# Create the class Client(), example of Flower client:
class Client(fl.client.NumPyClient):
    def get_parameters(self, config):
        return model.get_weights()

    def fit(self, parameters, config):
        model.set_weights(parameters)
        model.fit(x_train, y_train, epochs=5, batch_size=16)
        return model.get_weights(), len(x_train), {}

    def evaluate(self, parameters, config):
        model.set_weights(parameters)
        loss, accuracy = model.evaluate(x_test, y_test)
        return loss, len(x_test), {"accuracy": accuracy}


token = "*********************" # INCLUDE THE TOKEN GENERATED IN THE DASHBOARD
auth_plugin = ai4flwr.auth.bearer.BearerTokenAuthPlugin(token)

# Start -> connecting with the server
endpoint = "*********************"  # FILL IN WITH THE ENDPOINT (dashboard) OR THE SERVER ADDRESS
fl.client.start_client(
    server_address=f"{endpoint}:443",
    client=Client(),
    root_certificates=Path(certifi.where()).read_bytes(),
    call_credentials=auth_plugin.call_credentials()
)

If you didn’t select token authentication, feel free to remove the call_credentials parameter from the start_client() function.

Server side differential privacy

DP states that an algorithm is differentially private if, by viewing its result, an adversary cannot tell whether a particular individual’s data was included in the database used to produce that result. This can be achieved by adding controlled noise using different mechanisms, such as the Laplace, Exponential or Gaussian mechanisms. The privacy budget controls the amount of noise, i.e. the trade-off between the level of privacy and the utility of the data.
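
As a toy illustration of the Gaussian mechanism (not the platform's implementation), the snippet below perturbs a single query answer with noise whose scale is controlled by a noise multiplier and the query sensitivity:

import numpy as np

sensitivity = 1.0       # max change in the query output when one record changes
noise_multiplier = 1.0  # larger value -> more noise -> more privacy, less utility

true_answer = 42.0  # e.g. a count computed over the database
noisy_answer = true_answer + np.random.normal(0.0, noise_multiplier * sensitivity)
print(noisy_answer)  # released instead of the true answer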

If you want to start an FL server with additional privacy restrictions when building the global aggregated model, you can add differential privacy (DP) on the server side. Specifically, you can enable this in the Federated configuration when creating the server. You will need to provide the noise multiplier for the Gaussian mechanism, the clipping norm and the number of sampled clients. This functionality is compatible with all the aggregation strategies available in the platform. It’s important to note that the noise multiplier is not the privacy budget: a greater noise multiplier implies stronger privacy restrictions (more noise) and lower utility. This ensures central DP on the server side when building the global model with fixed clipping, as sketched below.
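
For illustration, the sketch below shows how these three parameters map onto upstream Flower's server-side DP wrapper with fixed clipping. On the platform you only set them in the dashboard, so this is not code you need to write:

import flwr as fl
from flwr.server.strategy import DifferentialPrivacyServerSideFixedClipping, FedAvg

# Wrap any base aggregation strategy with central DP (fixed clipping)
dp_strategy = DifferentialPrivacyServerSideFixedClipping(
    FedAvg(),               # any of the available aggregation strategies
    noise_multiplier=1.0,   # not the privacy budget: higher -> more noise
    clipping_norm=5.0,      # L2 norm to which each client update is clipped
    num_sampled_clients=2,  # clients sampled per round
)

fl.server.start_server(
    server_address="0.0.0.0:5000",
    config=fl.server.ServerConfig(num_rounds=5),
    strategy=dp_strategy,
)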

Server side metric privacy

Metric privacy (also known as metric differential privacy or d-privacy) is a variant (relaxation) of differential privacy that can be used in domains where there is a notion of distance. Unlike standard DP, this concept takes into account the distance between the datasets involved, which makes it possible to adapt the privacy level and the noise added, offering better privacy when the distance is small.

Following the work done in this preprint, the distance metric considered depends on the distance between the model updates of the clients involved: the server computes the maximum distance over each pair of clients by analyzing the local weights received from each of them. With this approach, metric privacy can be guaranteed for each round of the training. According to the aforementioned work, users can choose metric privacy instead of standard DP to achieve a better balance between the added noise (calibrated using the distance) and protection against client inference attacks, in cases where the server is trusted but not all participants are, or when users simply want to prevent such attacks on the final model if it is published.

Note that the same parameters as for server side differential privacy are used (the noise multiplier for the Gaussian mechanism, the clipping norm and the number of sampled clients). More information on this approach can be found in this preprint. An illustrative sketch of the distance computation is shown below.
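
This is a simplified sketch, not the platform's actual code: the maximum pairwise L2 distance between the clients' flattened weight updates, which is then used to calibrate the noise, could be computed as follows:

import itertools
import numpy as np

def max_pairwise_distance(client_updates):
    """client_updates: one list of NumPy weight arrays per client."""
    flat = [np.concatenate([w.ravel() for w in u]) for u in client_updates]
    # Maximum L2 distance over every pair of clients
    return max(np.linalg.norm(a - b) for a, b in itertools.combinations(flat, 2))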

Monitoring of training CO2 emissions

To provide users with the ability to monitor carbon emissions derived from their training, the FL server includes an integrated emissions monitoring feature. This functionality is implemented using the codecarbon Python library.

Users can enable emissions tracking when deploying the FL server. To do so, they must activate this option when deploying the server from the dashboard, and then introduce some modifications on the client side to report local training emissions. The server then aggregates the emissions reported by the participating clients in each round, along with its own emissions, primarily from the aggregation process.

Note that clients must voluntarily enable emissions tracking; the platform cannot enforce this functionality on them. The goal is to allow each client to track its own local emissions and send this data to the server, which then aggregates it with its own emissions to determine the overall training footprint. To facilitate this, only a few changes are needed on the client side once emissions tracking is enabled on the server: using codecarbon and including it in the Client class (which inherits from the Flower fl.client.NumPyClient class). Note that the clients must install codecarbon locally:

import flwr as fl
from codecarbon import EmissionsTracker

# Read the data, create the model
# (...)

# Create the class Client(), example of Flower client including tracking the emissions:
class Client(fl.client.NumPyClient):
    def get_parameters(self, config):
        return model.get_weights()

    def fit(self, parameters, config):
        model.set_weights(parameters)
        tracker = EmissionsTracker()
        tracker.start()
        model.fit(x_train, y_train, epochs=3, batch_size=16)
        emissions = tracker.stop()
        print(f"Client Carbon Emissions: {emissions} kg CO2")
        return model.get_weights(), len(x_train), {"emissions": emissions}

    def evaluate(self, parameters, config):
        model.set_weights(parameters)
        loss, accuracy = model.evaluate(x_test, y_test)
        return loss, len(x_test), {"accuracy": accuracy}

The rest of the process, in which the server receives emissions from each client and aggregates them with its own, is handled on the server side once this option has been enabled. This allows the server to compute the total training emissions per round alongside the results for the aggregated metric(s) selected.
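
Although the server handles this automatically once the option is enabled, the sketch below illustrates the idea using Flower's fit_metrics_aggregation_fn hook (the function and metric names here are illustrative, not the platform's actual implementation):

import flwr as fl

def aggregate_emissions(metrics):
    # metrics: list of (num_examples, metrics_dict) tuples, one per client
    total = sum(m.get("emissions", 0.0) for _, m in metrics)
    return {"total_client_emissions_kg_co2": total}

strategy = fl.server.strategy.FedAvg(fit_metrics_aggregation_fn=aggregate_emissions)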