Brain Segmentation#
This tutorial will show how to use Fed-BioMed to perform image segmentation on 3D medical MRI images of brains, using the publicly available IXI dataset. It uses a 3D U-Net model for the segmentation, trained on data from 3 separate centers.
Here we display a very complex case, using advanced Fed-BioMed functionalities such as:
exploring the datasets in the federation
loading a MedicalFolderDataset
monitoring training loss with Tensorboard
Parts of this tutorial are based on TorchIO’s tutorial.
Table of Contents#
Task 1: Discovering datasets
Task 2: MedicalFolderDataset class
Task 3: Federated feature analytics
Task 4: Train a UNet model
Task 5: Validation on a local holdout set
%load_ext tensorboard
import os
import tabulate
from pprint import pprint
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import AdamW, SGD
from torch.utils.data import DataLoader
from monai.networks.nets import UNet
from monai.losses.dice import DiceLoss
from monai.transforms import (Compose, NormalizeIntensity, AddChannel, Resize, AsDiscrete)
from fedbiomed.common.training_plans import BaseTrainingPlan, TorchTrainingPlan
from fedbiomed.common.logger import logger
from fedbiomed.common.data import DataManager, MedicalFolderDataset
from fedbiomed.researcher.requests import Requests
from fedbiomed.researcher.aggregators import Aggregator, FedAverage
from fedbiomed.researcher.environ import environ
from fedbiomed.common.training_args import TrainingArgs
from fedbiomed.researcher.experiment import Experiment
2023-07-03 15:11:32,411 - Selected pickle protocol: '4'
%matplotlib inline
Task 1: Discovering datasets #
Let’s discover which datasets are available for federated training in the network.
Try it yourself!#
Read the documentation for the Requests class to figure out which function call can be used to list all the available datasets.
req = Requests()
datasets = req.list(verbose=False)
2023-07-03 15:11:35,240 fedbiomed INFO - Messaging researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7ffa9c238160>
2023-07-03 15:11:35,261 fedbiomed INFO - Listing available datasets in all nodes...
pprint(datasets)
{'node_10797f2f-2524-4595-a1c6-f3c67e03add1': [{'data_type': 'medical-folder',
'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503',
'dataset_parameters': {'index_col': 13},
'description': '',
'name': 'ixi site 2',
'shape': {'T1': [83, 44, 55],
'T2': [83, 44, 55],
'demographics': [69,
13],
'label': [83, 44, 55],
'num_modalities': 3},
'tags': ['ixi-jupyter-sharkovsky']},
{'data_type': 'mednist',
'dataset_id': 'dataset_d25841b6-022d-4fba-bfd0-a19de2fc19f8',
'dataset_parameters': None,
'description': 'mednist client '
'2',
'name': 'MedNIST client 2',
'shape': [18000, 3, 64, 64],
'tags': ['mednist-jupyter-sharkovsky']},
{'data_type': 'csv',
'dataset_id': 'dataset_3fffa907-7569-46f0-a2b1-fa7245e42499',
'dataset_parameters': None,
'description': 'heart client 2',
'name': 'Heart disease client '
'2',
'shape': [349, 19],
'tags': ['heart-jupyter-sharkovsky']}],
'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1': [{'data_type': 'medical-folder',
'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24',
'dataset_parameters': {'index_col': 13},
'description': '',
'name': 'ixi site 1',
'shape': {'T1': [83, 44, 55],
'T2': [83, 44, 55],
'demographics': [305,
13],
'label': [83, 44, 55],
'num_modalities': 3},
'tags': ['ixi-jupyter-sharkovsky']},
{'data_type': 'mednist',
'dataset_id': 'dataset_650c023c-0177-469a-b70e-9a4ed6e26c44',
'dataset_parameters': None,
'description': 'mednist client '
'1',
'name': 'MedNIST client 1',
'shape': [18000, 3, 64, 64],
'tags': ['mednist-jupyter-sharkovsky']},
{'data_type': 'csv',
'dataset_id': 'dataset_6dc88f60-68fb-4bc7-aa01-2f339ea0c08d',
'dataset_parameters': None,
'description': 'heart client 1',
'name': 'Heart disease client '
'1',
'shape': [391, 19],
'tags': ['heart-jupyter-sharkovsky']}]}
Filter results#
There are a lot of datasets available! However, most of them are from nodes that are going to collaborate with other users, not you. To identify the datasets that were intended for your use, look at the value of the tags.
Try it yourself!#
Fill in the body of the for loop below such that the datasets_for_me variable follows these rules:
it has the same structure as the datasets variable
it has all and only the datasets whose tags contain your username
my_username = 'sharkovsky'
datasets_for_me = dict()
for node, _data in datasets.items():
    # Keep a dataset if any of its tags contains the username
    datasets_for_me[node] = list(filter(lambda dataset: any(my_username in x for x in dataset['tags']), _data))
pprint(datasets_for_me)
{'node_10797f2f-2524-4595-a1c6-f3c67e03add1': [{'data_type': 'medical-folder',
'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503',
'dataset_parameters': {'index_col': 13},
'description': '',
'name': 'ixi site 2',
'shape': {'T1': [83, 44, 55],
'T2': [83, 44, 55],
'demographics': [69,
13],
'label': [83, 44, 55],
'num_modalities': 3},
'tags': ['ixi-jupyter-sharkovsky']},
{'data_type': 'mednist',
'dataset_id': 'dataset_d25841b6-022d-4fba-bfd0-a19de2fc19f8',
'dataset_parameters': None,
'description': 'mednist client '
'2',
'name': 'MedNIST client 2',
'shape': [18000, 3, 64, 64],
'tags': ['mednist-jupyter-sharkovsky']},
{'data_type': 'csv',
'dataset_id': 'dataset_3fffa907-7569-46f0-a2b1-fa7245e42499',
'dataset_parameters': None,
'description': 'heart client 2',
'name': 'Heart disease client '
'2',
'shape': [349, 19],
'tags': ['heart-jupyter-sharkovsky']}],
'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1': [{'data_type': 'medical-folder',
'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24',
'dataset_parameters': {'index_col': 13},
'description': '',
'name': 'ixi site 1',
'shape': {'T1': [83, 44, 55],
'T2': [83, 44, 55],
'demographics': [305,
13],
'label': [83, 44, 55],
'num_modalities': 3},
'tags': ['ixi-jupyter-sharkovsky']},
{'data_type': 'mednist',
'dataset_id': 'dataset_650c023c-0177-469a-b70e-9a4ed6e26c44',
'dataset_parameters': None,
'description': 'mednist client '
'1',
'name': 'MedNIST client 1',
'shape': [18000, 3, 64, 64],
'tags': ['mednist-jupyter-sharkovsky']},
{'data_type': 'csv',
'dataset_id': 'dataset_6dc88f60-68fb-4bc7-aa01-2f339ea0c08d',
'dataset_parameters': None,
'description': 'heart client 1',
'name': 'Heart disease client '
'1',
'shape': [391, 19],
'tags': ['heart-jupyter-sharkovsky']}]}
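The same filter can also be written as a dict comprehension. The sketch below is only an illustration: `toy_datasets`, its node names and its tags are made up, and `filter_by_username` is a hypothetical helper, not part of Fed-BioMed. Unlike the loop above, it also drops nodes that end up with no matching dataset.

```python
# Toy stand-in for the output of req.list(); node names and tags are made up.
toy_datasets = {
    "node_a": [
        {"name": "ixi site 1", "tags": ["ixi-jupyter-alice"]},
        {"name": "heart", "tags": ["heart-jupyter-bob"]},
    ],
    "node_b": [
        {"name": "mednist", "tags": ["mednist-jupyter-bob"]},
    ],
}


def filter_by_username(datasets, username):
    """Keep only datasets with at least one tag containing `username`."""
    filtered = {
        node: [d for d in node_datasets
               if any(username in tag for tag in d["tags"])]
        for node, node_datasets in datasets.items()
    }
    # Drop nodes that have no matching dataset left.
    return {node: ds for node, ds in filtered.items() if ds}


mine = filter_by_username(toy_datasets, "alice")
print(mine)
```

Whether empty nodes should be kept or dropped depends on what the rest of the notebook expects; the exercise above asks for the same structure as the original dictionary, so the loop keeps them.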
Tabulate results#
Try it yourself!#
Looking at the format of the datasets_for_me variable, produce a table with the following format using the tabulate package:
site | dataset name | sample size
---|---|---
node_3d7f08fa-ee13-4033-8a01-93b448b6c8be | ixi | 177
tabulate_sample_sizes = list()
for node, _data in datasets_for_me.items():
    for d in _data:
        if d['data_type'] == 'medical-folder':
            # medical-folder datasets report the sample size via the demographics table
            tabulate_sample_sizes.append([node, d['name'], d['shape']['demographics'][0]])
        else:
            tabulate_sample_sizes.append([node, d['name'], d['shape'][0]])
print(tabulate.tabulate(tabulate_sample_sizes, headers=('site', 'dataset name', 'sample size')))
site dataset name sample size
----------------------------------------- ---------------------- -------------
node_10797f2f-2524-4595-a1c6-f3c67e03add1 ixi site 2 69
node_10797f2f-2524-4595-a1c6-f3c67e03add1 MedNIST client 2 18000
node_10797f2f-2524-4595-a1c6-f3c67e03add1 Heart disease client 2 349
node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 ixi site 1 305
node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 MedNIST client 1 18000
node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 Heart disease client 1 391
Task 2: MedicalFolderDataset #
To help you get familiar with loading medical imaging data in Fed-BioMed, we will practice on a smaller dataset that we assume is available to you locally as a researcher. This is often the case in FL settings, where the researcher has a small holdout dataset for local validation.
We will use Fed-BioMed’s built-in class for image segmentation tasks: MedicalFolderDataset.
This class supports several medical imaging modalities (think of all the different types of MRI, CT, PET, etc.). It is optimized for segmentation tasks but can be applied to other tasks (e.g. classification, regression, …).
This class supports loading a set of patient demographics data in csv format, in addition to the imaging data.
The inputs to the __init__ function are:
root: the filesystem path where the root of the dataset is located
data_modalities: the names of the modalities of interest for the input data
target_modalities: the names of the modalities of interest for the data to be predicted
transform: optional transformations to be performed on the input images
target_transform: optional transformations to be performed on the target data
demographics_transform: optional transformations to be performed on the demographics (csv) data
Let’s create the dataset:
dataset = MedicalFolderDataset(
    root='/datasets/ixi/holdout',
    data_modalities=['T1', 'T2'],
    target_modalities='label',
    transform=None,
    target_transform=None,
    demographics_transform=None)
Try it yourself!#
You can find out the total number of images with the len function, and access images individually with the [idx] operator, where idx is an integer index.
Note: dataset[0] will return a tuple corresponding to index 0. The tuple will be in the form (inputs, targets).
What is the format of inputs and targets? How do you access a single image? What data type is it, what is its format and shape? How do you access the corresponding binary mask?
(inputs, targets) = dataset[0]
image_modalities, demographics = inputs
img_t1 = image_modalities['T1']
img_t2 = image_modalities['T2']
segm_mask = targets['label']
Try it yourself!#
Plot two images. On the left, the raw input image. On the right, the same raw input image as background, overlaid with the ground truth segmentation. Take some time to familiarize yourself with the approach for making the overlay plot and with all the plotting arguments that we use.
fig, ax = plt.subplots(1,2, figsize=(6,8))
slice_to_plot = 24
(image_modalities, demographics), target = dataset[42]
img = image_modalities['T1'][..., slice_to_plot]
ax[0].imshow(img, cmap='bone')
ax[0].invert_yaxis()
ax[1].imshow(img, cmap='bone')
label = target['label'][..., slice_to_plot]
# Mask out background voxels so that only the segmentation is drawn over the image
plot = ax[1].imshow(np.ma.masked_where(label < 0.5, label),
                    cmap='winter', alpha=0.5, interpolation='none', vmin=0., vmax=1.)
ax[1].invert_yaxis()
fig.suptitle('Input slice with ground truth overlay')
fig.colorbar(plot, ax=ax, location='bottom')
<matplotlib.colorbar.Colorbar at 0x7ff94c9513a0>
Task 3: Federated feature analytics #
We want to obtain some basic information about the distribution of values of our features (i.e. the pixels in the MRI images). Our ultimate goal is to obtain a histogram of the counts of pixel values, but this will require a few steps, showcasing how flexible and interactive Fed-BioMed can be.
The following code will slightly bend the current intended usage of some Fed-BioMed classes. You will be working on the alpha version of a functionality which we are actively working on right now.
Federated Min and Max: TrainingPlan#
We need to do some preliminary work in order to compute a federated histogram. One piece of information that we need is the minimum and maximum pixel values over the whole federation, in order to compute a stable set of histogram bins across all the nodes in the federation. (Think about what would happen if we let each node compute its own histogram bins independently.)
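To make the point about shared bins concrete, here is a minimal pure-Python sketch. The pixel values and the helper functions `bin_edges` and `histogram` are made up for illustration and are not part of Fed-BioMed. With independent bins the edges differ between nodes, so the counts cannot be added bin by bin; with bins derived from the federated minimum and maximum they can.

```python
def bin_edges(lo, hi, nbins):
    """Return nbins + 1 equally spaced edges between lo and hi."""
    width = (hi - lo) / nbins
    return [lo + i * width for i in range(nbins + 1)]


def histogram(values, lo, hi, nbins):
    """Count values per bin; the last bin includes its right edge."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), nbins - 1)
            counts[idx] += 1
    return counts


# Made-up pixel values held by two nodes.
node_a = [0.0, 1.0, 2.0, 3.0]
node_b = [2.0, 4.0, 6.0, 8.0]

# Independent bins: the edges differ, so counts are not comparable.
edges_a = bin_edges(min(node_a), max(node_a), 2)
edges_b = bin_edges(min(node_b), max(node_b), 2)

# Shared bins from the federated min and max: counts line up and can be summed.
fed_lo = min(min(node_a), min(node_b))
fed_hi = max(max(node_a), max(node_b))
hist_a = histogram(node_a, fed_lo, fed_hi, 2)
hist_b = histogram(node_b, fed_lo, fed_hi, 2)
total = [a + b for a, b in zip(hist_a, hist_b)]
print(edges_a, edges_b, total)
```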
Implementation details#
We are going to “hack” a TorchTrainingPlan in order to compute a minimum and a maximum instead of training a model.
To understand what is going on exactly, you need the information below on some details about Fed-BioMed. If you are not interested, you can skip directly to the description of what you need to do to complete the code below.
The TrainingPlan should implement at least four important functions:
init_model: create the model (nn.Module) to be trained
init_optimizer: create the optimizer
training_data: to instantiate the dataset on the node (with customizations made by the researcher)
training_routine or training_step: the actual training code. The latter is a simple shorthand if you only need to customize the training on a single batch, without worrying about managing the data loader, iterations, etc.
The diagram below simplifies the main steps that constitute a run of a federated experiment in Fed-BioMed:
Define data loading and transformations#
In what follows, we provide a suggestion for defining the data loading. The training_data function will use MedicalFolderDataset.
Additionally, we can define image transformations for the input images and the labels.
Finally, we define a transformation for the demographics data. Note that this is a required step to ensure that the demographics csv data is transformed into a torch Tensor usable by the training routine.
Our strategy: training_routine#
We will implement a training_routine that does not really perform any training. Instead, it does one full pass over the whole dataset to find the maximum and the minimum.
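The single pass can be sketched in plain Python as follows. This is an illustration only: `running_min_max` is a hypothetical helper and `toy_batches` is made-up data standing in for what a node's data loader would yield.

```python
import math


def running_min_max(batches, start_min=math.inf, start_max=-math.inf):
    """One pass over an iterable of batches, tracking the extreme values."""
    current_min, current_max = start_min, start_max
    n_batches = 0
    for batch in batches:
        n_batches += 1
        current_min = min(current_min, min(batch))
        current_max = max(current_max, max(batch))
    return current_min, current_max, n_batches


# Made-up "batches" of pixel values standing in for a node's data loader.
toy_batches = [[3.0, 7.5, 1.2], [0.4, 9.9], [5.0]]
lo, hi, n = running_min_max(toy_batches)
print(lo, hi, n)
```

Starting from plus and minus infinity means the first batch always replaces the initial values, which mirrors how the tracker in this task is initialized.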
Our strategy: the MinMaxTracker model#
We will create a model that inherits from nn.Module but is not a neural network. Instead, it will simply contain a dictionary to store the minimum and maximum pixel values.
The init_model function of the TrainingPlan simply needs to return an instance of MinMaxTracker.
The init_optimizer function will return a dummy torch optimizer, since there is no actual optimization happening during our training.
The implementation details of how a model is handled can be a bit confusing at first. Here is a simplified diagram detailing the interactions between the main classes during a federated training round.
Try it yourself!#
Fill in the code below in the MinMaxTracker and training_routine.
class FedMinMaxTrainingPlan(TorchTrainingPlan):

    class MinMaxTracker(nn.Module):
        def __init__(self):
            super().__init__()
            # Not a neural network: just a container for the running min and max
            self.min_max = {'min': torch.Tensor([np.inf]),
                            'max': torch.Tensor([-np.inf])}

        def state_dict(self):
            return self.min_max

        def load_state_dict(self, params, strict=False):
            self.min_max = {'min': params['min'], 'max': params['max']}

            # Mimic the object that nn.Module.load_state_dict normally returns
            class MockMissingKeys:
                def __init__(self):
                    self.missing_keys = []
                    self.unexpected_keys = []
            return MockMissingKeys()

        def named_parameters(self):
            return self.min_max.items()

    def init_model(self, model_arguments):
        return FedMinMaxTrainingPlan.MinMaxTracker()

    def training_step(self, *args, **kwargs):
        pass

    def init_optimizer(self, optimizer_arguments):
        # Dummy optimizer: nothing is actually optimized
        return SGD([torch.Tensor([0])], lr=0.)

    def init_dependencies(self):
        deps = [
            "from monai.transforms import (Compose, NormalizeIntensity, AddChannel, Resize, AsDiscrete)",
            "from fedbiomed.common.data import MedicalFolderDataset",
            'import numpy as np',
            'from torch.optim import SGD'
        ]
        return deps

    @staticmethod
    def demographics_transform(demographics: dict):
        return {}

    def training_data(self, batch_size = 4):
        # The training_data creates the Dataloader to be used for training in the general class Torchnn of fedbiomed
        common_shape = (44, 44, 56)
        training_transform = Compose([AddChannel(), Resize(common_shape)])
        target_transform = Compose([AddChannel(), Resize(common_shape), AsDiscrete(to_onehot=2)])

        dataset = MedicalFolderDataset(
            root=self.dataset_path,
            data_modalities='T1',
            target_modalities='label',
            transform=training_transform,
            target_transform=target_transform,
            demographics_transform=FedMinMaxTrainingPlan.demographics_transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': False}
        return DataManager(dataset, **train_kwargs)

    def training_routine(self,
                         history_monitor = None,
                         node_args = None):
        count = 0
        prev_min = self._model.get_weights()['min']
        prev_max = self._model.get_weights()['max']
        # One full pass over the local data, no gradient steps
        for data, target in self.training_data_loader:
            images, demographics = data
            image = images['T1']
            count += 1
            thismin = image.min()
            if thismin < prev_min:
                prev_min = thismin
            thismax = image.max()
            if thismax > prev_max:
                prev_max = thismax
        # Store the local extremes as the "model weights" sent back to the researcher
        self._model.set_weights({'min': prev_min,
                                 'max': prev_max})
        return count
Federated Min and Max: Aggregator#
The Aggregator must implement an aggregate function that returns a dictionary of model parameters.
Try it yourself!#
Fill in the code for the aggregate function.
The inputs are:
model_params: a dictionary {node_id: model_params_after_training}, where model_params_after_training is the state_dict of the MinMaxTracker model after local training on the node.
weights: a dictionary {node_id: weight} where the weight is a float between 0 and 1 computed as the proportion of the number of samples in the node to the total number of samples in the federation.
The output should be a dictionary with the same format as the state_dict of the MinMaxTracker model, i.e. it should be a dictionary:
{
'min': torch.Tensor([aggregated_minimum]),
'max': torch.Tensor([aggregated_maximum])
}
class MinMaxAggregator(Aggregator):
    def __init__(self):
        super(MinMaxAggregator, self).__init__()
        self.aggregator_name = "MinMaxAggregator"

    def aggregate(self, model_params: dict, weights: dict, *args, **kwargs):
        # Two separate lists (a chained `a = b = list()` would make both names share one list)
        all_minimums, all_maximums = [], []
        for node_id, min_max in model_params.items():
            all_minimums.append(min_max['min'])
            all_maximums.append(min_max['max'])
        # weights are not needed: min and max do not depend on sample sizes
        return {'min': min(all_minimums), 'max': max(all_maximums)}
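One Python detail is worth keeping in mind when collecting per-node values into two lists: a chained assignment such as `a = b = list()` binds both names to the same list object. The sketch below illustrates this with made-up per-node results (`toy_params` is hypothetical, and plain floats stand in for tensors), and then aggregates with two independent lists.

```python
# Chained assignment binds both names to the SAME list object.
mins = maxs = list()
mins.append(1.0)
same_object = mins is maxs  # True: maxs now also contains 1.0

# Separate lists keep the two collections independent.
node_mins, node_maxs = [], []
toy_params = {  # made-up per-node results
    "node_a": {"min": -4.25, "max": 900.0},
    "node_b": {"min": 0.0, "max": 10791.5},
}
for node_id, min_max in toy_params.items():
    node_mins.append(min_max["min"])
    node_maxs.append(min_max["max"])

aggregated = {"min": min(node_mins), "max": max(node_maxs)}
print(same_object, aggregated)
```

For a min/max aggregation the shared list happens to give the same answer, because the minimum and maximum of the combined values are the global extremes. It would silently give wrong results for most other statistics, so separate lists are the safer habit.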
Define the experiment#
tags = ['ixi-jupyter-sharkovsky']
exp = Experiment(tags=tags,
                 model_args={},
                 training_plan_class=FedMinMaxTrainingPlan,
                 training_args={},
                 round_limit=1, # just a single round, with a pass over the whole dataset
                 aggregator=MinMaxAggregator(),
                 tensorboard=False,
                 save_breakpoints=False
                )
2023-07-03 15:24:37,093 fedbiomed INFO - Searching dataset with data tags: ['ixi-jupyter-sharkovsky'] for all nodes
2023-07-03 15:24:47,106 fedbiomed INFO - Node selected for training -> node_10797f2f-2524-4595-a1c6-f3c67e03add1
2023-07-03 15:24:47,107 fedbiomed INFO - Node selected for training -> node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
2023-07-03 15:24:47,108 fedbiomed INFO - Checking data quality of federated datasets...
Secure RNG turned off. This is perfectly fine for experimentation as it allows for much faster training performance, but remember to turn it on and retrain one last time before production with ``secure_mode`` turned on.
2023-07-03 15:24:47,109 fedbiomed DEBUG - using native torch optimizer
2023-07-03 15:24:47,110 fedbiomed DEBUG - Model file has been saved: /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0033/my_model_9fa23f7d-e9a4-4505-957b-5e27cac51a77.py
2023-07-03 15:24:47,122 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0033/my_model_9fa23f7d-e9a4-4505-957b-5e27cac51a77.py successful, with status code 201
2023-07-03 15:24:47,134 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0033/aggregated_params_2ecfce57-76c6-4d35-aa13-fefb9d81fb83.mpk successful, with status code 201
exp.run_once()
2023-07-03 15:24:47,139 fedbiomed INFO - Sampled nodes in round 0 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:24:47,139 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '7de014e1-0fac-4346-81f6-b2901f7d71c8', 'training_args': {'optimizer_args': {}, 'batch_size': 1, 'epochs': None, 'num_updates': None, 'dry_run': False, 'batch_maxnum': None, 'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'log_interval': 10, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {}, 'round': 0, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_9fa23f7d-e9a4-4505-957b-5e27cac51a77.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_2ecfce57-76c6-4d35-aa13-fefb9d81fb83.mpk', 'training_plan_class': 'FedMinMaxTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:24:47,140 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:24:47,140 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '7de014e1-0fac-4346-81f6-b2901f7d71c8', 'training_args': {'optimizer_args': {}, 'batch_size': 1, 'epochs': None, 'num_updates': None, 'dry_run': False, 'batch_maxnum': None, 'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'log_interval': 10, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {}, 'round': 0, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_9fa23f7d-e9a4-4505-957b-5e27cac51a77.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_2ecfce57-76c6-4d35-aa13-fefb9d81fb83.mpk', 'training_plan_class': 'FedMinMaxTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:24:47,141 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:25:02,153 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_be11c61b-3591-41b3-a048-1ebb38235ad3.mpk
2023-07-03 15:25:02,157 fedbiomed DEBUG - download of file node_params_07404e40-61c4-4e73-9db7-ce5aea25609a.mpk successful, with status code 200
2023-07-03 15:25:02,158 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_2b570b81-000f-476d-a829-640f02d5a575.mpk
2023-07-03 15:25:02,161 fedbiomed DEBUG - download of file node_params_d171010e-01eb-4428-9392-0723cdf5e208.mpk successful, with status code 200
2023-07-03 15:25:02,163 fedbiomed INFO - Nodes that successfully reply in round 0 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:25:02,179 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0033/aggregated_params_578f21f9-e589-4eef-b756-ccb9cc54cb7a.mpk successful, with status code 201
2023-07-03 15:25:02,179 fedbiomed INFO - Saved aggregated params for round 0 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0033/aggregated_params_578f21f9-e589-4eef-b756-ccb9cc54cb7a.mpk
1
Try it yourself!#
How can you access the minimum and maximum over the whole federation, after aggregation?
Hint: the experiment holds a copy of the training plan, which contains the model.
fed_min = exp.training_plan().model().state_dict()['min']
fed_max = exp.training_plan().model().state_dict()['max']
Federated Histogram: TrainingPlan#
We want to compute the average histogram of pixel values, where the average is computed over the different images.
Our strategy: training_routine#
We will implement a training_routine that does not really perform any training. Instead, it does one full pass over the whole dataset to compute the average histogram per image.
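As a rough illustration of what "average histogram per image" means, the pure-Python sketch below accumulates one histogram per image and divides by the number of images at the end. The helpers `histogram` and `average_histogram` and the toy pixel values are assumptions made for this example; the training plan itself relies on np.histogram.

```python
def histogram(values, lo, hi, nbins):
    """Count values per bin; the last bin includes its right edge."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), nbins - 1)
            counts[idx] += 1
    return counts


def average_histogram(images, lo, hi, nbins):
    """Sum the per-image histograms, then divide by the number of images."""
    total = [0] * nbins
    count = 0
    for image in images:
        count += 1
        for i, c in enumerate(histogram(image, lo, hi, nbins)):
            total[i] += c
    if count == 0:
        return [0.0] * nbins, 0
    return [t / count for t in total], count


# Two made-up "images", each flattened to a list of pixel values.
toy_images = [[0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 3.0, 4.0]]
avg_hist, n_images = average_histogram(toy_images, lo=0.0, hi=4.0, nbins=2)
print(avg_hist, n_images)
```

Returning the image count alongside the average matters for the aggregation step, since the per-node averages are later combined in proportion to the number of samples.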
Our strategy: the FedHistogram model#
We will create a model that inherits from nn.Module but is not a neural network. Instead, it will simply contain a dictionary to store the histogram, the minimum and maximum pixel values (computed before) and the number of bins. Note that the real “parameter” of this model, i.e. the one that will be updated, is the histogram data. All the rest are static values that will be neither updated nor aggregated. We will use model_arguments and training_arguments to pass these values.
The init_model function of the TrainingPlan simply needs to return an instance of FedHistogram.
The init_optimizer function will return a dummy torch optimizer, since there is no actual optimization happening during our training.
Our strategy: training_data#
The training_data function will use MedicalFolderDataset.
Try it yourself!#
Fill in the code for FedHistogram, init_model and training_routine.
class FedHistogramTrainingPlan(TorchTrainingPlan):

    class FedHistogram(nn.Module):
        def __init__(self, bin_min=0., bin_max=1e+5, nbins=10):
            super().__init__()
            # Only 'hist' is updated; the other entries are static metadata
            self.hist_data = {'hist': torch.Tensor([np.nan]),
                              'bin_min': torch.Tensor([bin_min]),
                              'bin_max': torch.Tensor([bin_max]),
                              'nbins': torch.Tensor([nbins])}

        def state_dict(self):
            return self.hist_data

        def load_state_dict(self, params, strict=False):
            self.hist_data = params

            # Mimic the object that nn.Module.load_state_dict normally returns
            class MockMissingKeys:
                def __init__(self):
                    self.missing_keys = []
                    self.unexpected_keys = []
            return MockMissingKeys()

        def named_parameters(self):
            return self.hist_data.items()

    def init_model(self, model_args):
        return FedHistogramTrainingPlan.FedHistogram(
            bin_min = model_args['bin_min'],
            bin_max = model_args['bin_max'],
            nbins = model_args['nbins']
        )

    def training_step(self, *args, **kwargs):
        pass

    def init_optimizer(self, optimizer_arguments):
        # Dummy optimizer: nothing is actually optimized
        return SGD([torch.Tensor([0])], lr=0.)

    def init_dependencies(self):
        deps = [
            "from monai.transforms import (Compose, NormalizeIntensity, AddChannel, Resize, AsDiscrete)",
            "from fedbiomed.common.data import MedicalFolderDataset",
            'import numpy as np',
            'from torch.optim import SGD'
        ]
        return deps

    @staticmethod
    def demographics_transform(demographics: dict):
        return {}

    def training_data(self, batch_size = 4):
        # The training_data creates the Dataloader to be used for training in the general class Torchnn of fedbiomed
        common_shape = (44, 44, 56)
        training_transform = Compose([AddChannel(), Resize(common_shape)])
        target_transform = Compose([AddChannel(), Resize(common_shape), AsDiscrete(to_onehot=2)])

        dataset = MedicalFolderDataset(
            root=self.dataset_path,
            data_modalities='T1',
            target_modalities='label',
            transform=training_transform,
            target_transform=target_transform,
            demographics_transform=FedHistogramTrainingPlan.demographics_transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': False}
        return DataManager(dataset, **train_kwargs)

    def training_routine(self,
                         history_monitor = None,
                         node_args = None):
        hist_metadata = self._model.get_weights()
        nbins = int(hist_metadata['nbins'].numpy().tolist()[0])
        bin_min = hist_metadata['bin_min'].numpy().tolist()[0]
        bin_max = hist_metadata['bin_max'].numpy().tolist()[0]
        count = 0
        hist = hist_metadata['hist']
        # One full pass over the local data, accumulating bin counts
        for data, target in self.training_data_loader:
            images, demographics = data
            image = images['T1']
            count += 1
            if count==1 :
                hist, _ = np.histogram(image.flatten().numpy(),
                                       bins=nbins,
                                       range=(bin_min, bin_max))
            else:
                tmp_hist_, _ = np.histogram(image.flatten().numpy(),
                                            bins=nbins,
                                            range=(bin_min, bin_max))
                hist += tmp_hist_
        # Divide the accumulated counts by the number of iterations to get the local average
        self._model.set_weights({'hist': hist / np.array([float(count)]),
                                 'nbins': hist_metadata['nbins'],
                                 'bin_min': hist_metadata['bin_min'],
                                 'bin_max': hist_metadata['bin_max']})
        return count
Federated Histogram: Aggregator#
The Aggregator must implement an aggregate function that returns a dictionary of model parameters.
In this case it is important to compute a weighted average of the histograms returned from each node (think about why a non-weighted mean is biased). For this, you can use the weights argument of the aggregate function.
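A small worked example may help to see why the plain mean is biased. The sample sizes below (305 and 69) are the ones reported for the two IXI sites in Task 1; the two-bin histograms, the node labels and the helper functions are made up for illustration.

```python
# Sample sizes taken from the dataset listing in Task 1.
sample_sizes = {"site_1": 305, "site_2": 69}
total = sum(sample_sizes.values())
weights = {node: n / total for node, n in sample_sizes.items()}

# Made-up per-node average histograms with two bins each.
node_hists = {"site_1": [10.0, 0.0], "site_2": [0.0, 10.0]}


def weighted_mean(hists, weights):
    """Combine per-node histograms in proportion to their sample sizes."""
    nbins = len(next(iter(hists.values())))
    return [sum(weights[n] * hists[n][i] for n in hists) for i in range(nbins)]


def plain_mean(hists):
    """Unweighted mean: every node counts the same, whatever its size."""
    nbins = len(next(iter(hists.values())))
    return [sum(h[i] for h in hists.values()) / len(hists) for i in range(nbins)]


weighted = weighted_mean(node_hists, weights)
unweighted = plain_mean(node_hists)
print(weighted, unweighted)
```

The unweighted mean gives both bins the same height, as if both sites held the same number of images. The weighted mean matches what would be obtained by averaging over all 374 images pooled together, so the larger site dominates.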
Try it yourself!#
Fill in the code for the aggregate function.
Reminder, the inputs to aggregate are:
model_params: a dictionary {node_id: model_params_after_training}, where model_params_after_training is the state_dict of the FedHistogram model after local training on the node.
weights: a dictionary {node_id: weight} where the weight is a float between 0 and 1 computed as the proportion of the number of samples in the node to the total number of samples in the federation.
class HistAggregator(Aggregator):
    def __init__(self):
        super(HistAggregator, self).__init__()
        self.aggregator_name = "HistAggregator"

    def aggregate(self, model_params: dict, weights: dict, *args, **kwargs):
        hist = None
        # Weighted sum of the per-node average histograms
        for node_id, hist_data in model_params.items():
            if hist is None:
                hist = hist_data['hist']*weights[node_id]
            else:
                hist += hist_data['hist']*weights[node_id]
        # The static metadata is identical on all nodes, so the last node's copy is returned
        return {'hist': hist,
                'nbins': hist_data['nbins'],
                'bin_min': hist_data['bin_min'],
                'bin_max': hist_data['bin_max']}
Define the experiment#
This time we will use model_args to pass to the node the federated minimum and maximum (computed before) as well as the number of bins.
Our computed values for the federated minimum and maximum are not serializable because they are torch.Tensor: you need to find a way to convert them to regular python float.
model_args = {
    'bin_min': fed_min.detach().numpy().tolist(),
    'bin_max': fed_max.detach().numpy().tolist(),
    'nbins': 100
}
tags = ['ixi-jupyter-sharkovsky']
exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=FedHistogramTrainingPlan,
                 training_args={},
                 round_limit=1,
                 aggregator=HistAggregator(),
                 tensorboard=False,
                 save_breakpoints=False
                )
2023-07-03 15:25:08,221 fedbiomed INFO - Searching dataset with data tags: ['ixi-jupyter-sharkovsky'] for all nodes
2023-07-03 15:25:18,234 fedbiomed INFO - Node selected for training -> node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
2023-07-03 15:25:18,235 fedbiomed INFO - Node selected for training -> node_10797f2f-2524-4595-a1c6-f3c67e03add1
2023-07-03 15:25:18,238 fedbiomed INFO - Checking data quality of federated datasets...
Secure RNG turned off. This is perfectly fine for experimentation as it allows for much faster training performance, but remember to turn it on and retrain one last time before production with ``secure_mode`` turned on.
2023-07-03 15:25:18,239 fedbiomed DEBUG - using native torch optimizer
2023-07-03 15:25:18,240 fedbiomed DEBUG - Model file has been saved: /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0034/my_model_70d7233b-531d-4d27-b244-706d6027db39.py
2023-07-03 15:25:18,250 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0034/my_model_70d7233b-531d-4d27-b244-706d6027db39.py successful, with status code 201
2023-07-03 15:25:18,259 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0034/aggregated_params_f9a3f411-0759-4ab9-b40a-e32857417ce2.mpk successful, with status code 201
exp.run_once()
2023-07-03 15:25:18,264 fedbiomed INFO - Sampled nodes in round 0 ['node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1', 'node_10797f2f-2524-4595-a1c6-f3c67e03add1']
2023-07-03 15:25:18,264 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': 'f055eee4-41aa-4562-aad1-44f0e78d0a7c', 'training_args': {'optimizer_args': {}, 'batch_size': 1, 'epochs': None, 'num_updates': None, 'dry_run': False, 'batch_maxnum': None, 'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'log_interval': 10, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'bin_min': -4.25, 'bin_max': 10791.6669921875, 'nbins': 100}, 'round': 0, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_70d7233b-531d-4d27-b244-706d6027db39.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_f9a3f411-0759-4ab9-b40a-e32857417ce2.mpk', 'training_plan_class': 'FedHistogramTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24'}
-----------------------------------------------------------------
2023-07-03 15:25:18,264 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:25:18,265 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': 'f055eee4-41aa-4562-aad1-44f0e78d0a7c', 'training_args': {'optimizer_args': {}, 'batch_size': 1, 'epochs': None, 'num_updates': None, 'dry_run': False, 'batch_maxnum': None, 'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'log_interval': 10, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'bin_min': -4.25, 'bin_max': 10791.6669921875, 'nbins': 100}, 'round': 0, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_70d7233b-531d-4d27-b244-706d6027db39.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_f9a3f411-0759-4ab9-b40a-e32857417ce2.mpk', 'training_plan_class': 'FedHistogramTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:25:18,265 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:25:33,282 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_d2036983-9321-46a1-a9b1-07e0142a166c.mpk
2023-07-03 15:25:33,286 fedbiomed DEBUG - download of file node_params_9bda9355-2b96-4a4e-af7c-7658158bda80.mpk successful, with status code 200
2023-07-03 15:25:33,287 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_6d6be718-3bb1-49bf-80e1-0ecf50dc4dd9.mpk
2023-07-03 15:25:33,290 fedbiomed DEBUG - download of file node_params_d5437393-40a2-46b1-9395-57811dc6ad43.mpk successful, with status code 200
2023-07-03 15:25:33,291 fedbiomed INFO - Nodes that successfully reply in round 0 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:25:33,302 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0034/aggregated_params_9cfb69df-b812-42d5-85f0-b1963402e0d9.mpk successful, with status code 201
2023-07-03 15:25:33,302 fedbiomed INFO - Saved aggregated params for round 0 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0034/aggregated_params_9cfb69df-b812-42d5-85f0-b1963402e0d9.mpk
1
Sanity check#
It is always good to check the outputs of our calculations. What should the sum of all values in the aggregated histogram be equal to?
sum_of_pixel_counts = exp.training_plan().model().state_dict()['hist'].sum()
expected_value = 44*44*56
print(f"The sum of average pixel counts {sum_of_pixel_counts} "
f"should be equal to the total image size {expected_value}")
The sum of average pixel counts 108415.99999999999 should be equal to the total image size 108416
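The reasoning behind this check can be sketched in a few lines. Every voxel of an image lands in exactly one bin, so each per-image histogram sums to the image size, and any weighted average of such histograms keeps that sum. The counts and weights below are made up for illustration, and weighting by the number of images per node is an assumption about how the average is formed.

```python
import math

N_VOXELS = 44 * 44 * 56  # voxels per image after resizing to (44, 44, 56)

def weighted_average(histograms, weights):
    """Element-wise weighted mean of equally sized histograms."""
    total = sum(weights)
    return [
        sum(w * count for w, count in zip(weights, column)) / total
        for column in zip(*histograms)
    ]

# Toy per-node average histograms with three bins each (made-up counts).
# Each one sums to N_VOXELS because every voxel falls in exactly one bin.
node_a_hist = [95000.0, 13000.0, 416.0]
node_b_hist = [108000.0, 400.0, 16.0]

# Hypothetical weights standing in for the number of images on each node.
aggregated = weighted_average([node_a_hist, node_b_hist], weights=[2, 1])

print(sum(aggregated), N_VOXELS)
assert math.isclose(sum(aggregated), N_VOXELS)
```

The tiny gap between 108415.99999999999 and 108416 in the output above is floating-point rounding, which is why a tolerance-based comparison such as math.isclose is preferable to strict equality.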
Plot the histograms#
The aggregated histogram#
Try it yourself#
Using the values from model_args:
obtain the size (the width) of each bin
compute the array of bin edges
plot the histogram using the ax.bar function
fig, ax = plt.subplots(figsize=(12,6))
width_histogram_bin = (model_args['bin_max'] - model_args['bin_min'])/model_args['nbins']
bin_edges = np.arange(start=model_args['bin_min'],
stop=model_args['bin_max'],
step=width_histogram_bin)
ax.bar(bin_edges,
exp.training_plan().model().state_dict()['hist'],
width=0.99*width_histogram_bin)
_ = ax.set_ylabel('Average count per image')
_ = ax.set_xlabel('Pixel value')
_ = ax.set_title('Aggregated histogram')
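As a worked version of the bin arithmetic, the sketch below computes the bin width and the left edge of every bin from an integer index. The helper name left_bin_edges is hypothetical; the numbers are the model_args values shown in the request log above. Indexing by integer guarantees exactly nbins edges, whereas a floating-point step in np.arange can, in general, yield one element more or less than expected.

```python
model_args = {'bin_min': -4.25, 'bin_max': 10791.6669921875, 'nbins': 100}

def left_bin_edges(bin_min, bin_max, nbins):
    """Return the bin width and the left edge of each of the nbins bins."""
    width = (bin_max - bin_min) / nbins
    edges = [bin_min + i * width for i in range(nbins)]
    return width, edges

width, edges = left_bin_edges(**model_args)

print(f"bin width: {width}")
print(f"first left edge: {edges[0]}, last left edge: {edges[-1]}")
print(f"number of bins: {len(edges)}")
```

Note that ax.bar centers each bar on its x position by default. When the x positions are left edges, passing align='edge' makes the bars cover the actual bin intervals.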
The node-wise histograms#
Try it yourself#
You can access the models’ state_dicts (after training) through the exp.training_replies() function. The output is a dictionary of the format {round: node_replies}, where node_replies is a list of replies. Each reply is a dictionary, where node_id and params are the most important keys for this task.
fig, ax = plt.subplots(2,1,figsize=(12,6))
width_histogram_bar = (model_args['bin_max'] - model_args['bin_min'])/model_args['nbins']
bin_edges = np.arange(start=model_args['bin_min'],
stop=model_args['bin_max'],
step=width_histogram_bar)
for i in range(2):
    hist_data = exp.training_replies()[0][i]['params']['hist']
    node_id = exp.training_replies()[0][i]['node_id']
    ax[i].bar(bin_edges, hist_data, width=0.99*width_histogram_bar)
    _ = ax[i].set_ylabel('Average count per image')
    _ = ax[i].set_title(f'Histogram for {node_id}')
_ = ax[1].set_xlabel('Pixel value')
fig.tight_layout()
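To make the {round: node_replies} structure concrete, here is a minimal sketch that works on a mock dictionary. The node names and histogram values are invented, and histograms_by_node is a hypothetical helper; only the node_id and params keys come from the description above.

```python
# Mock of the structure described above: {round: [reply, reply, ...]}.
mock_replies = {
    0: [
        {'node_id': 'node_A', 'params': {'hist': [3.0, 1.0, 0.0]}},
        {'node_id': 'node_B', 'params': {'hist': [2.0, 1.5, 0.5]}},
    ]
}

def histograms_by_node(replies, round_idx, key='hist'):
    """Map each replying node's id to the requested entry of its params."""
    return {
        reply['node_id']: reply['params'][key]
        for reply in replies[round_idx]
    }

per_node = histograms_by_node(mock_replies, round_idx=0)
for node_id, hist in per_node.items():
    print(node_id, hist)
```

Iterating over the replies rather than over range(2) keeps the plotting code valid if a different number of nodes answers in a given round.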
Try it yourself!#
What happens if we normalize the images while loading them? Add NormalizeIntensity() as an additional transform for the loaded images in the TrainingPlan, and check how the histogram changes shape.
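As a hint for this exercise, the sketch below mimics what intensity normalization does with its default settings: subtract the mean and divide by the standard deviation. It is a plain-Python illustration rather than MONAI's implementation, the use of the population standard deviation is an assumption, and the intensity values are made up.

```python
import statistics

def normalize_intensity(values):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Constant image: only center it, to avoid dividing by zero.
        return [v - mean for v in values]
    return [(v - mean) / std for v in values]

raw = [0.0, 100.0, 5000.0, 10791.0]  # made-up voxel intensities
normalized = normalize_intensity(raw)

print(normalized)
print(statistics.fmean(normalized), statistics.pstdev(normalized))
```

Because normalized values sit within a few units of zero, they would nearly all fall into the first bin of a histogram whose range still spans the raw intensities. The bin_min and bin_max entries of model_args therefore probably need to be adapted to see a meaningful shape.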
Task 4: Training a UNet model for the brain segmentation task #
Create a Training Plan#
We create a training plan that incorporates the UNet model.
Define the neural network model#
We recommend using MONAI’s UNet implementation.
We define the model in the init_model function of the training plan; the forward pass is then called on self.model().
Define the loss function#
The loss function is based on the Dice loss.
Carole H Sudre, Wenqi Li, Tom Vercauteren, Sebastien Ourselin, and M Jorge Cardoso. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In Deep learning in medical image analysis and multimodal learning for clinical decision support, pages 240–248. Springer, 2017.
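For intuition, a soft Dice loss for a single foreground channel can be sketched as follows. This is a simplified illustration over flattened voxel lists, not MONAI's DiceLoss; the function name and the value of the smoothing constant are assumptions chosen for the example.

```python
def soft_dice_loss(probabilities, target, smooth=1e-5):
    """1 - Dice overlap between predicted probabilities and a binary mask."""
    intersection = sum(p * t for p, t in zip(probabilities, target))
    denominator = sum(probabilities) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)

mask = [1.0, 0.0, 1.0, 0.0]

perfect = soft_dice_loss([1.0, 0.0, 1.0, 0.0], mask)    # full overlap
disjoint = soft_dice_loss([0.0, 1.0, 0.0, 1.0], mask)   # no overlap
uncertain = soft_dice_loss([0.5, 0.5, 0.5, 0.5], mask)  # uninformative output

print(perfect, disjoint, uncertain)
```

The loss goes to 0 for a perfect prediction and approaches 1 when prediction and mask do not overlap, independently of how many background voxels the image contains. That property is what makes it attractive for unbalanced segmentation targets.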
Define training step#
Here we take as input one batch of (data, target), train the model and compute the loss function.
Note that the MedicalFolderDataset class returns data as a tuple of (images, demographics), where:
images is a dict of {modality: image} (after image transformations)
demographics is a dict of {column_name: values}, where the column names are taken from the demographics csv file
while the target is a dict of {modality: image} (after target transformations).
In our case, the modality used is T1 for the input images, while the modality used for the target is label. We also ignore the values of the demographics data during training because the UNet model only takes images as input, but the code provided still shows the recommended way to handle such data.
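The batch layout described above can be sketched with stand-in values. Strings replace the image tensors, the demographics column names AGE and SEX are hypothetical, and unpack_batch is an illustrative helper rather than part of Fed-BioMed.

```python
# data is a tuple (images, demographics); target is a dict keyed by modality.
data = (
    {'T1': '<batch of T1 images>'},
    {'AGE': [43, 61], 'SEX': ['F', 'M']},  # hypothetical csv columns
)
target = {'label': '<batch of label masks>'}

def unpack_batch(data, target, data_modality='T1', target_modality='label'):
    """Split one batch into input images, demographics and target images."""
    images, demographics = data
    return images[data_modality], demographics, target[target_modality]

img, demographics, label = unpack_batch(data, target)
print(img)
print(sorted(demographics))
print(label)
```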
class UNetTrainingPlan(TorchTrainingPlan):

    def init_model(self, model_args):
        n_base_filters = model_args.get('base_filters',10)
        return UNet(
            spatial_dims = model_args.get('dimensions',3),
            in_channels = model_args.get('in_channels',1),
            out_channels = model_args.get('out_channels',2),
            channels = (n_base_filters,
                        2*n_base_filters,
                        4*n_base_filters),
            strides = (2,2),
            kernel_size=3,
            up_kernel_size=3,
            num_res_units=0,
            act='PRELU',
            norm='INSTANCE',
            dropout=0.0,
            bias=True,
            adn_ordering='NDA'
        )

    def init_optimizer(self, optimizer_args):
        if optimizer_args.get('opt_name', 'not specified') == 'adamw':
            optimizer = AdamW(self.model().parameters(), lr=optimizer_args.get('lr', 0.001))
        else:
            optimizer = SGD(self.model().parameters(), lr=optimizer_args.get('lr', 0.001))
        return optimizer

    def init_dependencies(self):
        deps = [
            "from monai.transforms import (Compose, NormalizeIntensity, AddChannel, Resize, AsDiscrete)",
            "from monai.losses.dice import DiceLoss",
            "import torch.nn as nn",
            'import torch.nn.functional as F',
            "from fedbiomed.common.data import MedicalFolderDataset",
            'import numpy as np',
            'from torch.optim import AdamW, SGD',
            'from monai.networks.nets import UNet',
            'from fedbiomed.common.logger import logger']
        return deps

    @staticmethod
    def get_dice_loss(output, target, epsilon=1e-9):
        loss = DiceLoss(include_background=False, sigmoid=False)
        return loss(output, target)

    @staticmethod
    def demographics_transform(demographics: dict):
        return {}

    def training_data(self, batch_size = 4):
        # The training_data creates the Dataloader to be used for training in the general class Torchnn of fedbiomed
        common_shape = (44, 44, 56)
        training_transform = Compose([AddChannel(), Resize(common_shape), NormalizeIntensity(),])
        target_transform = Compose([AddChannel(), Resize(common_shape), AsDiscrete(to_onehot=2)])
        dataset = MedicalFolderDataset(
            root=self.dataset_path,
            data_modalities='T1',
            target_modalities='label',
            transform=training_transform,
            target_transform=target_transform,
            demographics_transform=UNetTrainingPlan.demographics_transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        return DataManager(dataset, **train_kwargs)

    def training_step(self, data, target):
        # this function must return the loss to backward it
        img = data[0]['T1']
        demographics = data[1]
        logits = self.model().forward(img)
        output = F.softmax(logits, dim=1)
        loss = UNetTrainingPlan.get_dice_loss(output, target['label'])
        avg_loss = loss.mean()
        return avg_loss

    def testing_step(self, data, target):
        img = data[0]['T1']
        demographics = data[1]
        target = target['label']
        logits = self.model().forward(img)
        prediction = F.softmax(logits, dim=1)
        loss = UNetTrainingPlan.get_dice_loss(prediction, target)
        avg_loss = loss.mean()  # average per batch
        return avg_loss
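One design detail worth checking when changing common_shape or strides: in encoder-decoder networks with skip connections, a common rule of thumb is that every spatial dimension should be divisible by the product of the strides, so that the upsampled feature maps line up with the skipped ones. The sketch below applies that rule to the values used in the training plan. The helper name is hypothetical, and whether a particular shape actually fails depends on the layer configuration.

```python
import math

def shape_matches_strides(shape, strides):
    """Check that each spatial dimension is divisible by the total stride."""
    total_stride = math.prod(strides)
    return all(dim % total_stride == 0 for dim in shape)

common_shape = (44, 44, 56)  # from training_data in the training plan
strides = (2, 2)             # from init_model in the training plan

print(shape_matches_strides(common_shape, strides))
print(shape_matches_strides((45, 44, 56), strides))
```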
Define Parameters#
Here you can define model_args and training_args, two dictionaries that contain parameters and hyperparameters for training and model definition. This provides a flexible way to explore the hyperparameter space without changing the TrainingPlan, which has potentially been fixed and validated by the clinical partners.
Try it yourself!#
Change any of the parameters below to explore the space of hyperparameters. Please be mindful of the limited available resources when changing parameters that may require more computing power, such as base_filters and especially batch_size.
model_args = {
    'in_channels': 1,
    'out_channels': 2,
    'dimensions': 3,
    'base_filters': 10,
}
training_args = {
    'batch_size': 4,
    'num_updates': 8,
    'dry_run': False,
    'log_interval': 2,
    'test_ratio' : 0.1,
    'test_on_global_updates': True,
    'test_on_local_updates': False,
    'optimizer_args': {
        'opt_name': 'adamw',
        'lr': 0.001
    }
}
num_rounds = 15
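A quick calculation shows what these values imply for the workload of each node. The sample and update counts follow directly from the parameters; the logging rule is an assumption inferred from the training log further down in this tutorial, where iterations 1, 2, 4, 6 and 8 are reported.

```python
training_args = {'batch_size': 4, 'num_updates': 8, 'log_interval': 2}
num_rounds = 15

# Samples processed by one node in one round (matches "Samples: 32/32").
samples_per_round = training_args['batch_size'] * training_args['num_updates']

# Optimizer steps performed by one node over the whole experiment.
total_updates_per_node = training_args['num_updates'] * num_rounds

# Assumed logging rule: report the first iteration, then every
# log_interval-th iteration after that.
logged_iterations = [
    i for i in range(1, training_args['num_updates'] + 1)
    if i == 1 or i % training_args['log_interval'] == 0
]

print(samples_per_round, total_updates_per_node, logged_iterations)
```

Doubling batch_size doubles the number of samples per round and the memory needed per step, which is why it is the parameter to change most carefully on shared hardware.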
Dry run your TrainingPlan locally#
Since we have a holdout dataset available locally, we are going to test that the TrainingPlan is able to run locally, before we perform the federated training.
Try it yourself!#
First, create a dataloader following these steps:
instantiate a UNetTrainingPlan object
call post_init on the training plan. Note: you will need to instantiate a TrainingArgs object from the training_args dict
set the training plan’s dataset_path to /datasets/ixi/holdout
call the training_data function from the loaded experiment’s training plan to obtain a TorchDataManager
use the _dataset attribute of the data manager to instantiate a torch DataLoader (set a small batch size)
Then, perform one training iteration to check that it completes without errors:
create a for loop iterating on the dataloader
call the training plan’s training_step method
break after the first iteration
dryrun_tp = UNetTrainingPlan()
dryrun_tp.post_init(model_args, TrainingArgs(training_args, only_required=False))
dryrun_tp.dataset_path = '/datasets/ixi/holdout'
dryrun_dataloader = DataLoader(dryrun_tp.training_data()._dataset, batch_size=2)
for (inputs, targets) in dryrun_dataloader:
    avg_loss = dryrun_tp.training_step(inputs, targets)
    break
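The single-iteration check above can also be wrapped in a small reusable helper that verifies the returned loss is a finite number. The sketch below is framework-agnostic: dry_run_one_batch, fake_loader and fake_step are hypothetical stand-ins, and in the notebook the real dataloader and the training plan's training_step would take their place.

```python
import math

def dry_run_one_batch(training_step, dataloader):
    """Run training_step on the first batch only and return a finite loss."""
    for inputs, targets in dataloader:
        loss = float(training_step(inputs, targets))
        if not math.isfinite(loss):
            raise ValueError(f"training step returned a non-finite loss: {loss}")
        return loss
    raise ValueError("the dataloader yielded no batches")

# Stand-ins for a real dataloader and training step.
fake_loader = [([1.0, 2.0], [1.0, 1.0]), ([3.0, 4.0], [3.0, 3.0])]

def fake_step(inputs, targets):
    # Mean absolute error, used here only to produce a number.
    return sum(abs(x - y) for x, y in zip(inputs, targets)) / len(inputs)

first_loss = dry_run_one_batch(fake_step, fake_loader)
print(first_loss)
```

A NaN or infinite loss on the very first batch usually points to a problem in the transforms or the loss setup, which is much cheaper to discover locally than after sending the training plan to the nodes.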
Secure RNG turned off. This is perfectly fine for experimentation as it allows for much faster training performance, but remember to turn it on and retrain one last time before production with ``secure_mode`` turned on.
2023-07-03 15:25:34,004 fedbiomed DEBUG - using native torch optimizer
<class 'monai.transforms.utility.array.AddChannel'>: Class `AddChannel` has been deprecated since version 0.8. please use MetaTensor data type and monai.transforms.EnsureChannelFirst instead.
tags = ['ixi-jupyter-sharkovsky']
exp = Experiment(tags=tags,
model_args=model_args,
training_plan_class=UNetTrainingPlan,
training_args=training_args,
round_limit=num_rounds,
aggregator=FedAverage(),
tensorboard=True,
save_breakpoints=True
)
2023-07-03 15:25:34,228 fedbiomed INFO - Searching dataset with data tags: ['ixi-jupyter-sharkovsky'] for all nodes
2023-07-03 15:25:44,242 fedbiomed INFO - Node selected for training -> node_10797f2f-2524-4595-a1c6-f3c67e03add1
2023-07-03 15:25:44,243 fedbiomed INFO - Node selected for training -> node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
2023-07-03 15:25:44,245 fedbiomed INFO - Checking data quality of federated datasets...
2023-07-03 15:25:44,249 fedbiomed DEBUG - using native torch optimizer
2023-07-03 15:25:44,250 fedbiomed DEBUG - Model file has been saved: /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py
2023-07-03 15:25:44,260 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py successful, with status code 201
2023-07-03 15:25:44,272 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_7055d84b-aace-4dc4-9662-91e55e544301.mpk successful, with status code 201
Run tensorboard#
Follow the instructions to obtain a port number, and run the commands below
tensorboard_dir = environ['TENSORBOARD_RESULTS_DIR']
tensorboard --logdir "$tensorboard_dir" --host 0.0.0.0 --port 6006
The tensorboard plots should look like this:
Run the experiment#
print(f"Saving breakpoints to {exp.experimentation_folder()}")
Saving breakpoints to Experiment_0035
exp.run()
2023-07-03 15:25:45,809 fedbiomed INFO - Sampled nodes in round 0 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:25:45,809 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 0, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_7055d84b-aace-4dc4-9662-91e55e544301.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:25:45,810 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:25:45,810 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 0, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_7055d84b-aace-4dc4-9662-91e55e544301.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:25:45,811 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:25:46,075 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 1 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.669393
---------
2023-07-03 15:25:46,283 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 1 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.674166
---------
2023-07-03 15:25:46,429 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 1 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.652919
---------
2023-07-03 15:25:46,736 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 1 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.621792
---------
2023-07-03 15:25:46,737 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 1 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.668537
---------
2023-07-03 15:25:46,910 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 1 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.666687
---------
2023-07-03 15:25:47,046 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 1 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.574756
---------
2023-07-03 15:25:47,054 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 1 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.657234
---------
2023-07-03 15:25:47,352 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 1 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.564428
---------
2023-07-03 15:25:47,361 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 1 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.614738
---------
2023-07-03 15:25:47,600 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 1 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.580021
---------
2023-07-03 15:25:47,833 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 1 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.551721
---------
2023-07-03 15:25:55,824 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_14f857d4-2825-4986-acf5-cb161fd72167.mpk
2023-07-03 15:25:55,829 fedbiomed DEBUG - download of file node_params_27bb3fa4-2838-4b13-8048-2aca18e48b48.mpk successful, with status code 200
2023-07-03 15:25:55,830 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_a3918182-53d9-4fbc-b8c0-d684c33883da.mpk
2023-07-03 15:25:55,833 fedbiomed DEBUG - download of file node_params_5066607e-e2d8-4d6b-8d15-7caa1c776142.mpk successful, with status code 200
2023-07-03 15:25:55,835 fedbiomed INFO - Nodes that successfully reply in round 0 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:25:55,852 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_5a192df2-9162-467b-bccc-4979b1201e3c.mpk successful, with status code 201
2023-07-03 15:25:55,853 fedbiomed INFO - Saved aggregated params for round 0 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_5a192df2-9162-467b-bccc-4979b1201e3c.mpk
2023-07-03 15:25:55,856 fedbiomed INFO - breakpoint for round 0 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0000
2023-07-03 15:25:55,856 fedbiomed INFO - Sampled nodes in round 1 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:25:55,857 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 1, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_5a192df2-9162-467b-bccc-4979b1201e3c.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:25:55,857 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:25:55,858 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 1, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_5a192df2-9162-467b-bccc-4979b1201e3c.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:25:55,859 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:25:56,044 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 2 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.558731
---------
2023-07-03 15:25:56,190 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 2 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.548774
---------
2023-07-03 15:25:56,337 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 2 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.528328
---------
2023-07-03 15:25:56,624 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 2 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.500181
---------
2023-07-03 15:25:56,725 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 2 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.535123
---------
2023-07-03 15:25:56,874 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 2 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.539535
---------
2023-07-03 15:25:56,958 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 2 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.506337
---------
2023-07-03 15:25:57,026 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 2 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.524126
---------
2023-07-03 15:25:57,248 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 2 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.467335
---------
2023-07-03 15:25:57,316 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 2 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.502197
---------
2023-07-03 15:25:57,573 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 2 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.474423
---------
2023-07-03 15:25:57,807 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 2 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.461409
---------
2023-07-03 15:26:05,871 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_ac3b4469-a81a-4026-bc85-bc94ca8e76f3.mpk
2023-07-03 15:26:05,875 fedbiomed DEBUG - download of file node_params_edb7aa45-4445-49d9-9734-17f0a20dd3e4.mpk successful, with status code 200
2023-07-03 15:26:05,876 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_e99f6b93-801a-44f1-9664-fed7d3e840b0.mpk
2023-07-03 15:26:05,879 fedbiomed DEBUG - download of file node_params_0afd316c-bfeb-4f63-9e2b-ec50c8ce5f1c.mpk successful, with status code 200
2023-07-03 15:26:05,880 fedbiomed INFO - Nodes that successfully reply in round 1 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:05,895 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_3f384489-f597-4372-b489-4ae7268fa84f.mpk successful, with status code 201
2023-07-03 15:26:05,895 fedbiomed INFO - Saved aggregated params for round 1 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_3f384489-f597-4372-b489-4ae7268fa84f.mpk
2023-07-03 15:26:05,899 fedbiomed INFO - breakpoint for round 1 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0001
2023-07-03 15:26:05,899 fedbiomed INFO - Sampled nodes in round 2 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:05,900 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 2, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_3f384489-f597-4372-b489-4ae7268fa84f.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:26:05,900 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:05,901 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 2, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_3f384489-f597-4372-b489-4ae7268fa84f.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:26:05,901 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:06,085 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 3 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.467660
---------
2023-07-03 15:26:06,225 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 3 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.456158
---------
2023-07-03 15:26:06,368 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 3 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.449278
---------
2023-07-03 15:26:06,654 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 3 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.440290
---------
2023-07-03 15:26:06,765 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 3 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.448526
---------
2023-07-03 15:26:06,911 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 3 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.457173
---------
2023-07-03 15:26:06,976 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 3 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.424287
---------
2023-07-03 15:26:07,056 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 3 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.447752
---------
2023-07-03 15:26:07,293 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 3 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.407967
---------
2023-07-03 15:26:07,345 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 3 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.425157
---------
2023-07-03 15:26:07,582 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 3 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.423018
---------
2023-07-03 15:26:07,816 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 3 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.418149
---------
2023-07-03 15:26:15,913 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_97a851a7-8e31-4280-a338-9dbcf4b49716.mpk
2023-07-03 15:26:15,918 fedbiomed DEBUG - download of file node_params_b225d01f-8b4d-4926-a797-f1eb3c20b8a6.mpk successful, with status code 200
2023-07-03 15:26:15,919 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_a0e8b335-39ec-4403-877c-b8a1eda08101.mpk
2023-07-03 15:26:15,922 fedbiomed DEBUG - download of file node_params_3ab4fda3-e4e3-463c-bdad-ceb9f7e61d13.mpk successful, with status code 200
2023-07-03 15:26:15,924 fedbiomed INFO - Nodes that successfully reply in round 2 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:15,939 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_90d0a3a6-6809-4576-939a-15ea78760e3a.mpk successful, with status code 201
2023-07-03 15:26:15,939 fedbiomed INFO - Saved aggregated params for round 2 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_90d0a3a6-6809-4576-939a-15ea78760e3a.mpk
2023-07-03 15:26:15,944 fedbiomed INFO - breakpoint for round 2 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0002
2023-07-03 15:26:15,944 fedbiomed INFO - Sampled nodes in round 3 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:15,945 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 3, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_90d0a3a6-6809-4576-939a-15ea78760e3a.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:26:15,945 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:15,946 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 3, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_90d0a3a6-6809-4576-939a-15ea78760e3a.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:26:15,946 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:16,131 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 4 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.403672
---------
2023-07-03 15:26:16,274 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 4 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.400611
---------
2023-07-03 15:26:16,416 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 4 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.398721
---------
2023-07-03 15:26:16,699 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 4 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.382506
---------
2023-07-03 15:26:16,840 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 4 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.393137
---------
2023-07-03 15:26:16,984 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 4 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.393294
---------
2023-07-03 15:26:17,020 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 4 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.369901
---------
2023-07-03 15:26:17,132 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 4 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.392746
---------
2023-07-03 15:26:17,314 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 4 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.375875
---------
2023-07-03 15:26:17,427 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 4 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.375468
---------
2023-07-03 15:26:17,664 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 4 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.369481
---------
2023-07-03 15:26:17,901 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 4 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.354816
---------
2023-07-03 15:26:25,959 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_f2965acd-b381-42bd-bf27-28243bf09053.mpk
2023-07-03 15:26:25,963 fedbiomed DEBUG - download of file node_params_99a181ef-e33a-4f6a-8c5e-2ee531ffce4c.mpk successful, with status code 200
2023-07-03 15:26:25,964 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_f1167c66-8be6-4635-a53b-ae802a37ca67.mpk
2023-07-03 15:26:25,967 fedbiomed DEBUG - download of file node_params_45337c54-1d24-482c-82b5-599786a679e9.mpk successful, with status code 200
2023-07-03 15:26:25,969 fedbiomed INFO - Nodes that successfully reply in round 3 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:25,984 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_93402a18-b0d8-42aa-874c-7304f9f196e9.mpk successful, with status code 201
2023-07-03 15:26:25,985 fedbiomed INFO - Saved aggregated params for round 3 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_93402a18-b0d8-42aa-874c-7304f9f196e9.mpk
2023-07-03 15:26:25,991 fedbiomed INFO - breakpoint for round 3 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0003
2023-07-03 15:26:25,992 fedbiomed INFO - Sampled nodes in round 4 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:25,992 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 4, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_93402a18-b0d8-42aa-874c-7304f9f196e9.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:26:25,993 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:25,994 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 4, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_93402a18-b0d8-42aa-874c-7304f9f196e9.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:26:25,994 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:26,181 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 5 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.351257
---------
2023-07-03 15:26:26,327 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 5 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.374600
---------
2023-07-03 15:26:26,470 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 5 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.367414
---------
2023-07-03 15:26:26,756 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 5 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.360439
---------
2023-07-03 15:26:26,882 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 5 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.344389
---------
2023-07-03 15:26:27,028 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 5 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.348085
---------
2023-07-03 15:26:27,060 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 5 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.332643
---------
2023-07-03 15:26:27,180 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 5 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.363724
---------
2023-07-03 15:26:27,353 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 5 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.334190
---------
2023-07-03 15:26:27,469 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 5 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.335185
---------
2023-07-03 15:26:27,706 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 5 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.348670
---------
2023-07-03 15:26:27,941 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 5 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.316970
---------
2023-07-03 15:26:36,006 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_bacf3723-e82c-4d94-94f0-feae048f577d.mpk
2023-07-03 15:26:36,010 fedbiomed DEBUG - download of file node_params_39c9f36d-ac3b-409d-a2b5-bdeab4b8cdb6.mpk successful, with status code 200
2023-07-03 15:26:36,011 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_e8315e0e-8109-44f3-86d6-0ba6b1811840.mpk
2023-07-03 15:26:36,014 fedbiomed DEBUG - download of file node_params_4e8b39da-f93b-4ca5-a29a-c0741a5f8ed7.mpk successful, with status code 200
2023-07-03 15:26:36,016 fedbiomed INFO - Nodes that successfully reply in round 4 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:36,033 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_eed04bb6-d633-4ea0-ab89-602d8d28c669.mpk successful, with status code 201
2023-07-03 15:26:36,034 fedbiomed INFO - Saved aggregated params for round 4 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_eed04bb6-d633-4ea0-ab89-602d8d28c669.mpk
2023-07-03 15:26:36,040 fedbiomed INFO - breakpoint for round 4 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0004
2023-07-03 15:26:36,041 fedbiomed INFO - Sampled nodes in round 5 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:36,041 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 5, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_eed04bb6-d633-4ea0-ab89-602d8d28c669.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:26:36,042 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:36,042 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 5, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_eed04bb6-d633-4ea0-ab89-602d8d28c669.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:26:36,043 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:36,228 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 6 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.346341
---------
2023-07-03 15:26:36,370 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 6 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.352756
---------
2023-07-03 15:26:36,506 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 6 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.318557
---------
2023-07-03 15:26:36,788 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 6 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.306386
---------
2023-07-03 15:26:36,936 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 6 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.317866
---------
2023-07-03 15:26:37,085 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 6 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.316483
---------
2023-07-03 15:26:37,102 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 6 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.299668
---------
2023-07-03 15:26:37,237 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 6 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.311033
---------
2023-07-03 15:26:37,416 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 6 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.306905
---------
2023-07-03 15:26:37,525 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 6 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.287525
---------
2023-07-03 15:26:37,767 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 6 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.296751
---------
2023-07-03 15:26:37,998 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 6 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.280906
---------
2023-07-03 15:26:46,055 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_f3ef3bc2-dda5-4238-a019-c9a4d01fee21.mpk
2023-07-03 15:26:46,059 fedbiomed DEBUG - download of file node_params_491515ee-b3ac-465a-8928-965f1d20bc66.mpk successful, with status code 200
2023-07-03 15:26:46,060 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_61cf76e4-2e6d-46c1-9385-ecec86903889.mpk
2023-07-03 15:26:46,063 fedbiomed DEBUG - download of file node_params_20a0e85a-babe-49d9-bf29-7dba29d04d51.mpk successful, with status code 200
2023-07-03 15:26:46,064 fedbiomed INFO - Nodes that successfully reply in round 5 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:46,079 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_3d345bc3-0b1a-46e8-a121-36778af85492.mpk successful, with status code 201
2023-07-03 15:26:46,079 fedbiomed INFO - Saved aggregated params for round 5 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_3d345bc3-0b1a-46e8-a121-36778af85492.mpk
2023-07-03 15:26:46,087 fedbiomed INFO - breakpoint for round 5 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0005
2023-07-03 15:26:46,088 fedbiomed INFO - Sampled nodes in round 6 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:46,089 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 6, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_3d345bc3-0b1a-46e8-a121-36778af85492.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:26:46,089 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:46,090 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 6, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_3d345bc3-0b1a-46e8-a121-36778af85492.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:26:46,091 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:46,271 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 7 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.290740
---------
2023-07-03 15:26:46,413 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 7 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.293899
---------
2023-07-03 15:26:46,555 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 7 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.305884
---------
2023-07-03 15:26:46,834 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 7 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.280895
---------
2023-07-03 15:26:47,059 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 7 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.278067
---------
2023-07-03 15:26:47,145 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 7 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.269260
---------
2023-07-03 15:26:47,202 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 7 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.281160
---------
2023-07-03 15:26:47,346 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 7 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.280498
---------
2023-07-03 15:26:47,448 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 7 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.261392
---------
2023-07-03 15:26:47,618 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 7 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.268134
---------
2023-07-03 15:26:47,850 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 7 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.255430
---------
2023-07-03 15:26:48,080 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 7 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.252412
---------
2023-07-03 15:26:56,103 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_0c22111e-1921-43a5-b255-e755b8020787.mpk
2023-07-03 15:26:56,107 fedbiomed DEBUG - download of file node_params_972367aa-e104-47c3-8082-8e2d1cb5dc9d.mpk successful, with status code 200
2023-07-03 15:26:56,108 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_a743af30-fba4-477a-a616-5c8471953b17.mpk
2023-07-03 15:26:56,111 fedbiomed DEBUG - download of file node_params_78f7ea96-48d1-404e-b2ee-ed7ae63853e6.mpk successful, with status code 200
2023-07-03 15:26:56,112 fedbiomed INFO - Nodes that successfully reply in round 6 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:56,127 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_10235386-b138-44ff-94f7-a6d6c71470c3.mpk successful, with status code 201
2023-07-03 15:26:56,127 fedbiomed INFO - Saved aggregated params for round 6 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_10235386-b138-44ff-94f7-a6d6c71470c3.mpk
2023-07-03 15:26:56,137 fedbiomed INFO - breakpoint for round 6 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0006
2023-07-03 15:26:56,137 fedbiomed INFO - Sampled nodes in round 7 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:26:56,138 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 7, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_10235386-b138-44ff-94f7-a6d6c71470c3.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:26:56,138 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:56,139 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 7, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_10235386-b138-44ff-94f7-a6d6c71470c3.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:26:56,139 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:26:56,325 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 8 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.260062
---------
2023-07-03 15:26:56,469 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 8 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.276170
---------
2023-07-03 15:26:56,616 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 8 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.266267
---------
2023-07-03 15:26:56,895 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 8 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.257818
---------
2023-07-03 15:26:57,031 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 8 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.248313
---------
2023-07-03 15:26:57,177 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 8 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.243596
---------
2023-07-03 15:26:57,215 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 8 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.249228
---------
2023-07-03 15:26:57,322 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 8 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.265798
---------
2023-07-03 15:26:57,497 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 8 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.243402
---------
2023-07-03 15:26:57,590 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 8 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.255732
---------
2023-07-03 15:26:57,828 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 8 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.240263
---------
2023-07-03 15:26:58,057 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 8 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.240697
---------
2023-07-03 15:27:06,152 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_dc23bfd6-9b70-4932-b051-e4b1c17aa6e1.mpk
2023-07-03 15:27:06,156 fedbiomed DEBUG - download of file node_params_d9af3bd4-87a1-4dbc-96ec-5f511e24c9b8.mpk successful, with status code 200
2023-07-03 15:27:06,157 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_d2f32ef3-32fa-4c25-9b34-066e68521816.mpk
2023-07-03 15:27:06,160 fedbiomed DEBUG - download of file node_params_ea3a9f32-c492-4121-b788-daa59715ebc1.mpk successful, with status code 200
2023-07-03 15:27:06,161 fedbiomed INFO - Nodes that successfully reply in round 7 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:06,176 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_aadfc76b-44db-4d37-9b3b-1b591defa61e.mpk successful, with status code 201
2023-07-03 15:27:06,176 fedbiomed INFO - Saved aggregated params for round 7 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_aadfc76b-44db-4d37-9b3b-1b591defa61e.mpk
2023-07-03 15:27:06,186 fedbiomed INFO - breakpoint for round 7 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0007
2023-07-03 15:27:06,187 fedbiomed INFO - Sampled nodes in round 8 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:06,187 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 8, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_aadfc76b-44db-4d37-9b3b-1b591defa61e.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:27:06,187 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:06,188 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 8, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_aadfc76b-44db-4d37-9b3b-1b591defa61e.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:27:06,189 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:06,376 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 9 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.253103
---------
2023-07-03 15:27:06,521 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 9 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.254661
---------
2023-07-03 15:27:06,667 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 9 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.242832
---------
2023-07-03 15:27:06,964 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 9 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.232194
---------
2023-07-03 15:27:07,072 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 9 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.230317
---------
2023-07-03 15:27:07,223 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 9 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.239177
---------
2023-07-03 15:27:07,267 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 9 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.220112
---------
2023-07-03 15:27:07,365 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 9 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.234501
---------
2023-07-03 15:27:07,561 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 9 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.204340
---------
2023-07-03 15:27:07,649 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 9 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.213800
---------
2023-07-03 15:27:07,881 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 9 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.214610
---------
2023-07-03 15:27:08,113 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 9 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.207103
---------
2023-07-03 15:27:16,201 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_070610ff-8415-468f-8a95-bd6baaa1623d.mpk
2023-07-03 15:27:16,205 fedbiomed DEBUG - download of file node_params_3f6dde66-d8ff-4a67-8f5f-8703c63516ba.mpk successful, with status code 200
2023-07-03 15:27:16,206 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_1f7f2f7a-1b29-453e-9595-7fe79ab9757e.mpk
2023-07-03 15:27:16,210 fedbiomed DEBUG - download of file node_params_e8d19be2-2972-4b93-90a1-208b72cf5776.mpk successful, with status code 200
2023-07-03 15:27:16,211 fedbiomed INFO - Nodes that successfully reply in round 8 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:16,225 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_9be2ed78-2916-4b40-992b-9be12b20621b.mpk successful, with status code 201
2023-07-03 15:27:16,225 fedbiomed INFO - Saved aggregated params for round 8 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_9be2ed78-2916-4b40-992b-9be12b20621b.mpk
2023-07-03 15:27:16,236 fedbiomed INFO - breakpoint for round 8 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0008
2023-07-03 15:27:16,236 fedbiomed INFO - Sampled nodes in round 9 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:16,237 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 9, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_9be2ed78-2916-4b40-992b-9be12b20621b.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:27:16,237 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:16,238 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 9, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_9be2ed78-2916-4b40-992b-9be12b20621b.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:27:16,238 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:16,421 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 10 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.219118
---------
2023-07-03 15:27:16,563 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 10 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.217602
---------
2023-07-03 15:27:16,705 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 10 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.234990
---------
2023-07-03 15:27:16,994 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 10 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.209549
---------
2023-07-03 15:27:17,110 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 10 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.206117
---------
2023-07-03 15:27:17,269 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 10 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.202606
---------
2023-07-03 15:27:17,316 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 10 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.205439
---------
2023-07-03 15:27:17,413 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 10 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.210446
---------
2023-07-03 15:27:17,615 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 10 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.203125
---------
2023-07-03 15:27:17,707 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 10 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.200621
---------
2023-07-03 15:27:17,943 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 10 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.193945
---------
2023-07-03 15:27:18,176 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 10 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.179735
---------
2023-07-03 15:27:26,250 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_71fb7a70-1916-4c7d-8582-2ba69371b787.mpk
2023-07-03 15:27:26,255 fedbiomed DEBUG - download of file node_params_cf060acc-caa2-4da5-b24a-cfe23433138a.mpk successful, with status code 200
2023-07-03 15:27:26,256 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_238965d9-c731-4fe6-a8bd-9dbd9cfd9515.mpk
2023-07-03 15:27:26,259 fedbiomed DEBUG - download of file node_params_5c9260a4-c00e-408d-add3-6ecf46cf6aab.mpk successful, with status code 200
2023-07-03 15:27:26,261 fedbiomed INFO - Nodes that successfully reply in round 9 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:26,275 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_8a903bea-9b6d-4b61-a84e-05008409b176.mpk successful, with status code 201
2023-07-03 15:27:26,275 fedbiomed INFO - Saved aggregated params for round 9 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_8a903bea-9b6d-4b61-a84e-05008409b176.mpk
2023-07-03 15:27:26,288 fedbiomed INFO - breakpoint for round 9 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0009
2023-07-03 15:27:26,289 fedbiomed INFO - Sampled nodes in round 10 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:26,289 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 10, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_8a903bea-9b6d-4b61-a84e-05008409b176.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:27:26,290 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:26,291 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 10, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_8a903bea-9b6d-4b61-a84e-05008409b176.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:27:26,291 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:26,474 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 11 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.206012
---------
2023-07-03 15:27:26,616 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 11 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.211716
---------
2023-07-03 15:27:26,760 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 11 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.211098
---------
2023-07-03 15:27:27,046 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 11 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.187615
---------
2023-07-03 15:27:27,164 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 11 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.181723
---------
2023-07-03 15:27:27,309 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 11 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.186365
---------
2023-07-03 15:27:27,361 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 11 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.204853
---------
2023-07-03 15:27:27,472 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 11 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.191783
---------
2023-07-03 15:27:27,666 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 11 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.180141
---------
2023-07-03 15:27:27,765 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 11 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.192469
---------
2023-07-03 15:27:27,997 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 11 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.178307
---------
2023-07-03 15:27:28,229 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 11 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.175291
---------
2023-07-03 15:27:36,304 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_6a04bbac-c7bc-43b7-bd6b-c86edd0e1a52.mpk
2023-07-03 15:27:36,308 fedbiomed DEBUG - download of file node_params_6a84dc84-e2af-4f62-aecc-34d6f26868f5.mpk successful, with status code 200
2023-07-03 15:27:36,309 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_bbb3bf74-3773-4559-af65-e1049684567d.mpk
2023-07-03 15:27:36,312 fedbiomed DEBUG - download of file node_params_dc045cc4-94da-4cf2-9874-ffae879292b6.mpk successful, with status code 200
2023-07-03 15:27:36,313 fedbiomed INFO - Nodes that successfully reply in round 10 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:36,328 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_1acd6e7b-7051-4172-ae6f-d75836dfed8b.mpk successful, with status code 201
2023-07-03 15:27:36,329 fedbiomed INFO - Saved aggregated params for round 10 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_1acd6e7b-7051-4172-ae6f-d75836dfed8b.mpk
2023-07-03 15:27:36,341 fedbiomed INFO - breakpoint for round 10 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0010
2023-07-03 15:27:36,342 fedbiomed INFO - Sampled nodes in round 11 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:36,342 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 11, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_1acd6e7b-7051-4172-ae6f-d75836dfed8b.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:27:36,342 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:36,343 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 11, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_1acd6e7b-7051-4172-ae6f-d75836dfed8b.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:27:36,344 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:36,529 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 12 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.192321
---------
2023-07-03 15:27:36,672 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 12 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.175604
---------
2023-07-03 15:27:36,817 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 12 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.192354
---------
2023-07-03 15:27:37,099 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 12 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.180088
---------
2023-07-03 15:27:37,217 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 12 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.169724
---------
2023-07-03 15:27:37,365 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 12 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.173665
---------
2023-07-03 15:27:37,430 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 12 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.161954
---------
2023-07-03 15:27:37,512 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 12 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.174298
---------
2023-07-03 15:27:37,743 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 12 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.155939
---------
2023-07-03 15:27:37,807 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 12 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.167927
---------
2023-07-03 15:27:38,045 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 12 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.166235
---------
2023-07-03 15:27:38,283 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 12 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.155272
---------
2023-07-03 15:27:46,356 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_22fedfae-4ae1-4ce2-af84-ba7c0ad08edf.mpk
2023-07-03 15:27:46,360 fedbiomed DEBUG - download of file node_params_15f8efab-7613-4d5d-8897-d4b2bb1338a6.mpk successful, with status code 200
2023-07-03 15:27:46,361 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_9ce14dd2-a62b-4b2b-8961-126111930632.mpk
2023-07-03 15:27:46,364 fedbiomed DEBUG - download of file node_params_2fa7abfc-ccda-4518-af2e-6dd7c5853ae6.mpk successful, with status code 200
2023-07-03 15:27:46,366 fedbiomed INFO - Nodes that successfully reply in round 11 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:46,383 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_2e1c891a-b0ae-48f1-86b4-a0d889eb42a1.mpk successful, with status code 201
2023-07-03 15:27:46,383 fedbiomed INFO - Saved aggregated params for round 11 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_2e1c891a-b0ae-48f1-86b4-a0d889eb42a1.mpk
2023-07-03 15:27:46,399 fedbiomed INFO - breakpoint for round 11 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0011
2023-07-03 15:27:46,399 fedbiomed INFO - Sampled nodes in round 12 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:46,400 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 12, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_2e1c891a-b0ae-48f1-86b4-a0d889eb42a1.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:27:46,400 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:46,401 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 12, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_2e1c891a-b0ae-48f1-86b4-a0d889eb42a1.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:27:46,401 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:46,580 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 13 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.168423
---------
2023-07-03 15:27:46,724 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 13 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.159756
---------
2023-07-03 15:27:46,872 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 13 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.175157
---------
2023-07-03 15:27:47,152 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 13 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.160788
---------
2023-07-03 15:27:47,336 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 13 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.153615
---------
2023-07-03 15:27:47,467 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 13 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.164476
---------
2023-07-03 15:27:47,482 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 13 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.153208
---------
2023-07-03 15:27:47,635 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 13 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.144645
---------
2023-07-03 15:27:47,763 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 13 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.148582
---------
2023-07-03 15:27:47,925 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 13 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.147024
---------
2023-07-03 15:27:48,209 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 13 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.143274
---------
2023-07-03 15:27:48,454 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 13 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.135392
---------
2023-07-03 15:27:56,413 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_29dfc20e-46ca-4a5f-a571-e806ebf69007.mpk
2023-07-03 15:27:56,417 fedbiomed DEBUG - download of file node_params_54822ef5-b9b4-4e14-a005-770b7a15d694.mpk successful, with status code 200
2023-07-03 15:27:56,418 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_0a5af092-5a05-43e7-bf4d-a872036ffe04.mpk
2023-07-03 15:27:56,421 fedbiomed DEBUG - download of file node_params_beca141f-633a-45c4-b05c-d8195b6860e2.mpk successful, with status code 200
2023-07-03 15:27:56,423 fedbiomed INFO - Nodes that successfully reply in round 12 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:56,437 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_6d539017-1a60-40c6-87a1-96f99c24b30c.mpk successful, with status code 201
2023-07-03 15:27:56,437 fedbiomed INFO - Saved aggregated params for round 12 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_6d539017-1a60-40c6-87a1-96f99c24b30c.mpk
2023-07-03 15:27:56,454 fedbiomed INFO - breakpoint for round 12 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0012
2023-07-03 15:27:56,454 fedbiomed INFO - Sampled nodes in round 13 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:27:56,454 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 13, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_6d539017-1a60-40c6-87a1-96f99c24b30c.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:27:56,455 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:56,456 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 13, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_6d539017-1a60-40c6-87a1-96f99c24b30c.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:27:56,456 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:27:56,651 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 14 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.146723
---------
2023-07-03 15:27:56,792 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 14 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.161997
---------
2023-07-03 15:27:56,931 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 14 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.202094
---------
2023-07-03 15:27:57,208 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 14 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.139733
---------
2023-07-03 15:27:57,370 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 14 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.136182
---------
2023-07-03 15:27:57,516 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 14 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.138592
---------
2023-07-03 15:27:57,530 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 14 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.162032
---------
2023-07-03 15:27:57,652 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 14 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.154214
---------
2023-07-03 15:27:57,825 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 14 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.172374
---------
2023-07-03 15:27:57,931 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 14 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.143040
---------
2023-07-03 15:27:58,164 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 14 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.152959
---------
2023-07-03 15:27:58,393 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 14 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.129842
---------
2023-07-03 15:28:06,468 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_44b763a0-2b35-4704-9e66-a35217f26b71.mpk
2023-07-03 15:28:06,472 fedbiomed DEBUG - download of file node_params_fc09ba74-d573-4277-bfe2-3ca21b6d3b06.mpk successful, with status code 200
2023-07-03 15:28:06,473 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_301b8b8f-f271-4399-aa35-344b41139265.mpk
2023-07-03 15:28:06,475 fedbiomed DEBUG - download of file node_params_9f4b375d-4bfb-49f7-a1a1-b0ee12e0bf4b.mpk successful, with status code 200
2023-07-03 15:28:06,477 fedbiomed INFO - Nodes that successfully reply in round 13 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:28:06,492 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_4113e476-cbbb-4b11-8733-5977233319c6.mpk successful, with status code 201
2023-07-03 15:28:06,492 fedbiomed INFO - Saved aggregated params for round 13 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_4113e476-cbbb-4b11-8733-5977233319c6.mpk
2023-07-03 15:28:06,510 fedbiomed INFO - breakpoint for round 13 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0013
2023-07-03 15:28:06,510 fedbiomed INFO - Sampled nodes in round 14 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:28:06,511 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 14, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_4113e476-cbbb-4b11-8733-5977233319c6.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_ae4dd6a8-1227-4379-8d3c-7a2caa0d9503'}
-----------------------------------------------------------------
2023-07-03 15:28:06,511 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:28:06,512 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: : Perform training with the arguments: {'researcher_id': 'researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87', 'job_id': '68a226ac-6cdb-4a84-a21f-60303da18ecc', 'training_args': {'batch_size': 4, 'num_updates': 8, 'dry_run': False, 'log_interval': 2, 'test_ratio': 0.1, 'test_on_global_updates': True, 'test_on_local_updates': False, 'optimizer_args': {'opt_name': 'adamw', 'lr': 0.001}, 'epochs': None, 'batch_maxnum': None, 'test_metric': None, 'test_metric_args': {}, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True}, 'training': True, 'model_args': {'in_channels': 1, 'out_channels': 2, 'dimensions': 3, 'base_filters': 10}, 'round': 14, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/07/03/my_model_fc0d2921-9797-4a5c-94fa-4beda077391d.py', 'params_url': 'http://localhost:8844/media/uploads/2023/07/03/aggregated_params_4113e476-cbbb-4b11-8733-5977233319c6.mpk', 'training_plan_class': 'UNetTrainingPlan', 'dataset_id': 'dataset_0e27e24b-46a6-4440-a189-e5bfd80a0f24', 'protocol_version': '1'}
-----------------------------------------------------------------
2023-07-03 15:28:06,512 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:28:06,699 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 15 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.137983
---------
2023-07-03 15:28:06,845 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 15 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.149709
---------
2023-07-03 15:28:06,992 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 15 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.160152
---------
2023-07-03 15:28:07,287 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 15 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.134322
---------
2023-07-03 15:28:07,386 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 15 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.130700
---------
2023-07-03 15:28:07,531 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 15 | Iteration: 1/8 (12%) | Samples: 4/32
Loss: 0.125600
---------
2023-07-03 15:28:07,587 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 15 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.131024
---------
2023-07-03 15:28:07,680 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 15 | Iteration: 2/8 (25%) | Samples: 8/32
Loss: 0.135730
---------
2023-07-03 15:28:07,882 fedbiomed INFO - TRAINING
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 15 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.130599
---------
2023-07-03 15:28:07,971 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 15 | Iteration: 4/8 (50%) | Samples: 16/32
Loss: 0.130924
---------
2023-07-03 15:28:08,204 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 15 | Iteration: 6/8 (75%) | Samples: 24/32
Loss: 0.114719
---------
2023-07-03 15:28:08,438 fedbiomed INFO - TRAINING
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 15 | Iteration: 8/8 (100%) | Samples: 32/32
Loss: 0.120319
---------
2023-07-03 15:28:16,525 fedbiomed INFO - Downloading model params after training on node_10797f2f-2524-4595-a1c6-f3c67e03add1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_cb213f31-0cac-4a9f-b435-0cd4f1aedcba.mpk
2023-07-03 15:28:16,529 fedbiomed DEBUG - download of file node_params_ceef462c-f386-482a-8d94-146ec0c85a26.mpk successful, with status code 200
2023-07-03 15:28:16,530 fedbiomed INFO - Downloading model params after training on node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1 - from http://localhost:8844/media/uploads/2023/07/03/node_params_c04a81c2-7eb1-46c8-9933-89d20fb37d2d.mpk
2023-07-03 15:28:16,533 fedbiomed DEBUG - download of file node_params_476676c1-eaa1-48eb-861c-383dd28d0095.mpk successful, with status code 200
2023-07-03 15:28:16,534 fedbiomed INFO - Nodes that successfully reply in round 14 ['node_10797f2f-2524-4595-a1c6-f3c67e03add1', 'node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1']
2023-07-03 15:28:16,548 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_5b849352-a583-4a67-8e3a-43fc3ddb9632.mpk successful, with status code 201
2023-07-03 15:28:16,549 fedbiomed INFO - Saved aggregated params for round 14 in /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_5b849352-a583-4a67-8e3a-43fc3ddb9632.mpk
2023-07-03 15:28:16,568 fedbiomed INFO - breakpoint for round 14 saved at /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0014
2023-07-03 15:28:16,568 fedbiomed INFO - Sending request
To: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Request: :Perform final validation on aggregated parameters
-----------------------------------------------------------------
2023-07-03 15:28:16,569 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:28:16,569 fedbiomed INFO - Sending request
To: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Request: :Perform final validation on aggregated parameters
-----------------------------------------------------------------
2023-07-03 15:28:16,570 fedbiomed DEBUG - researcher_dba292ff-efe1-40ad-a1c6-1a4d5f6bbd87
2023-07-03 15:28:16,756 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_10797f2f-2524-4595-a1c6-f3c67e03add1
Round 16 | Iteration: 1/1 (100%) | Samples: 6/6
Custom: 0.133379
---------
2023-07-03 15:28:17,402 fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: node_e27ab041-6133-4e2f-b0ce-afa0a0c59fa1
Round 16 | Iteration: 1/1 (100%) | Samples: 30/30
Custom: 0.118207
---------
15
Task 5: Validate on a local holdout set #
To stay consistent and keep things simple, we reuse as much of the existing code as possible. Note that this process assumes the held-out data is stored locally on this machine.
Create an instance of the global model#
First, we create an instance of the model using the parameters from the latest aggregation round.
exp_folder = exp.experimentation_folder()
#exp_folder = 'Experiment_0030'
breakpoint_num = num_rounds - 1
breakpoint_dir = os.path.join(environ['EXPERIMENTS_DIR'], exp_folder, f'breakpoint_{breakpoint_num:04d}')
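As a small illustration of the naming used above: the logs show breakpoints numbered from zero, so after `num_rounds` rounds the last one sits in the folder with index `num_rounds - 1`, zero-padded to four digits. The sketch below rebuilds that path with the standard library only. The helper name `breakpoint_path` and the example root directory are hypothetical and not part of Fed-BioMed.

```python
import os

def breakpoint_path(experiments_dir, exp_folder, num_rounds):
    """Return the folder of the last breakpoint written after `num_rounds` rounds.

    Hypothetical helper for illustration; it only mirrors the f-string used above.
    """
    breakpoint_num = num_rounds - 1  # rounds are indexed from 0
    return os.path.join(experiments_dir, exp_folder, f'breakpoint_{breakpoint_num:04d}')

# Example root directory (an assumption; the tutorial reads it from environ['EXPERIMENTS_DIR'])
path = breakpoint_path('var/experiments', 'Experiment_0035', num_rounds=15)
print(os.path.basename(path))  # breakpoint_0014
```

With 15 rounds this matches the `breakpoint_0014` folder reported in the training logs.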
Try it yourself!#
Call the post_init function to initialize the model in the newly loaded experiment’s training plan (note: you will need to create a TrainingArgs object).
Extract the model.
Call the model’s load_state_dict, passing as argument the model weights obtained from the loaded experiment’s _aggregated_params.
loaded_exp = Experiment.load_breakpoint(breakpoint_dir)
loaded_exp.training_plan().post_init(model_args, TrainingArgs(training_args, only_required=False))
valid_model = loaded_exp.training_plan().model()
valid_model.load_state_dict(loaded_exp._aggregated_params[num_rounds-1]['params']['model_weights'])
2023-07-03 15:28:26,595 fedbiomed DEBUG - found json file containing states at breakpoint_0014.json
2023-07-03 15:28:26,596 fedbiomed DEBUG - Experiment not fully configured yet: no valid training plan, training_plan_class=UNetTrainingPlan training_plan_class_path=None
2023-07-03 15:28:26,597 fedbiomed INFO - Checking data quality of federated datasets...
Secure RNG turned off. This is perfectly fine for experimentation as it allows for much faster training performance, but remember to turn it on and retrain one last time before production with ``secure_mode`` turned on.
2023-07-03 15:28:26,601 fedbiomed DEBUG - using native torch optimizer
2023-07-03 15:28:26,603 fedbiomed DEBUG - Model file has been saved: /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/my_model_7e777cc4-4831-4475-b13a-086530169ea5.py
2023-07-03 15:28:26,612 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/my_model_7e777cc4-4831-4475-b13a-086530169ea5.py successful, with status code 201
2023-07-03 15:28:26,623 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/aggregated_params_9bfda251-434c-4483-80cc-06d79f695b6a.mpk successful, with status code 201
2023-07-03 15:28:26,624 fedbiomed INFO - Removing tensorboard logs from previous experiment
2023-07-03 15:28:26,638 fedbiomed DEBUG - HTTP POST request of file /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0014/aggregated_params_current.mpk successful, with status code 201
2023-07-03 15:28:26,643 fedbiomed INFO - Experimentation reload from /home/jupyter-sharkovsky/fedbiomed/var/experiments/Experiment_0035/breakpoint_0014 successful!
2023-07-03 15:28:26,646 fedbiomed DEBUG - using native torch optimizer
<All keys matched successfully>
Define a validation data loader#
We extract the validation data loader from the training plan as well. This requires some knowledge about the internals of the MedicalFolderDataset class. At the end of the process, calling the split function with a ratio of 0 will return a data loader that loads all of the data.
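To see why a ratio of 0 yields a loader over the full dataset, here is a minimal sketch of a ratio-based split in plain Python. The function `split_indices` is a hypothetical stand-in written for this illustration. It is not Fed-BioMed's implementation, whose details may differ.

```python
import random

def split_indices(n_samples, test_ratio, seed=0):
    """Shuffle sample indices and split them into (train, test) by `test_ratio`.

    Hypothetical stand-in for a data manager's split; not Fed-BioMed's code.
    """
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError('test_ratio must be between 0 and 1')
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_ratio)  # number of samples reserved for testing
    return indices[n_test:], indices[:n_test]

# With a ratio of 0, nothing is reserved for testing: every sample stays in the first part.
train_idx, test_idx = split_indices(10, test_ratio=0.0)
print(len(train_idx), len(test_idx))  # 10 0
```

Under this assumption, the first part of the split is simply the whole dataset, which is what we want for a local holdout set.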
Try it yourself!#
Call the training_data function from the loaded experiment’s training plan to obtain a TorchDataManager.
Use the _dataset attribute of the data manager to instantiate a torch DataLoader.
Note: use a batch size of 1 for local validation.
loaded_exp.training_plan().dataset_path = '/datasets/ixi/holdout'
val_data_manager = loaded_exp.training_plan().training_data(batch_size=1)
val_data_loader = DataLoader(val_data_manager._dataset, batch_size=1)
<class 'monai.transforms.utility.array.AddChannel'>: Class `AddChannel` has been deprecated since version 0.8. please use MetaTensor data type and monai.transforms.EnsureChannelFirst instead.
Compute the loss on validation images#
Try it yourself!#
Iterate over the validation dataset using the DataLoader defined above.
Compute the predictions by manually calling the model’s forward method and F.softmax.
Compute the loss by calling the UNetTrainingPlan.get_dice_loss function.
Store the loss values in a list.
Attention: do not forget to call valid_model.eval() and to use the torch.no_grad context.
losses = []
valid_model.eval()
with torch.no_grad():
    for (images, demographics), targets in val_data_loader:
        image = images['T1']
        target = targets['label']
        logits = valid_model.forward(image)
        prediction = F.softmax(logits, dim=1)
        loss = UNetTrainingPlan.get_dice_loss(prediction, target)
        losses.append(loss.detach().numpy())
loss_array = np.array(losses)
print(f'Minimum: {loss_array.min():0.3f} '
      f'Mean: {loss_array.mean():0.3f} '
      f'StdDev: {loss_array.std():0.3f} '
      f'Maximum: {loss_array.max():0.3f} ')
Minimum: 0.111 Mean: 0.123 StdDev: 0.007 Maximum: 0.152
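For intuition about the numbers above, the following is a minimal sketch of a soft Dice loss for a single class, written in plain Python over flattened voxels. It is a common textbook formulation and only an assumption about what `UNetTrainingPlan.get_dice_loss` computes; the tutorial's own function is defined in the training plan and may differ (for example in smoothing or in how classes are averaged). The name `soft_dice_loss` and the toy inputs are hypothetical.

```python
def soft_dice_loss(probabilities, targets, smooth=1e-5):
    """Soft Dice loss for one class over flattened voxels.

    probabilities: predicted foreground probabilities in [0, 1]
    targets: ground-truth labels (0 or 1), same length as `probabilities`
    """
    if len(probabilities) != len(targets):
        raise ValueError('probabilities and targets must have the same length')
    intersection = sum(p * t for p, t in zip(probabilities, targets))
    denominator = sum(probabilities) + sum(targets)
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - dice  # 0 means perfect overlap, values near 1 mean no overlap

# Toy example: four voxels, prediction identical to the ground truth vs. fully disjoint
perfect = soft_dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
disjoint = soft_dice_loss([0.0, 1.0, 0.0, 1.0], [1, 0, 1, 0])
print(f'perfect: {perfect:0.3f}  disjoint: {disjoint:0.3f}')
```

Under this formulation lower is better, so a mean loss close to 0.12 would correspond to a fairly large overlap between prediction and ground truth.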
Visualize the outputs#
As a bonus, we visualize the outputs of our model on the holdout dataset.
val_data_loader_iter = iter(val_data_loader)
Try it yourself!#
Create a grid of 4x2 images. In each row, plot a slice of the original on the left overlaid with the ground truth, and the same slice of the original on the right overlaid with the prediction.
Insert the loss value in the title of the image.
Remember that to compute the predictions you need to pass the whole 3D image to model.forward, not just a slice.
fig, ax = plt.subplots(4,2, figsize=(8,16))
slice_to_plot = 24
for i in range(4):
    (image_modalities, demographics), target = next(val_data_loader_iter)
    # Left column: original slice overlaid with the ground-truth label
    img = image_modalities['T1'][0, 0, ..., slice_to_plot]
    ax[i][0].imshow(img, cmap='bone', interpolation='none')
    label = target['label'][0, 1, ..., slice_to_plot]
    plot = ax[i][0].imshow(np.ma.masked_where(
        label < 0.5,
        label), cmap='winter', alpha=0.5, interpolation='none', vmin=0., vmax=1.)
    ax[i][0].invert_yaxis()
    ax[i][0].set_title('Ground truth overlay')
    # Right column: same slice overlaid with the prediction computed on the whole 3D volume
    ax[i][1].imshow(img, cmap='bone', interpolation='none')
    prediction_3d = F.softmax(valid_model.forward(image_modalities['T1']), dim=1)
    prediction = prediction_3d[0, 1, ..., slice_to_plot].detach().numpy()
    plot = ax[i][1].imshow(np.ma.masked_where(
        prediction < 0.5,
        prediction), cmap='winter', alpha=0.5, interpolation='none', vmin=0., vmax=1.)
    ax[i][1].invert_yaxis()
    ax[i][1].set_title(f'Prediction overlay - loss: {losses[i]:0.3f}')
fig.tight_layout(pad=3.0)
fig.colorbar(plot, ax=ax, location='bottom')
<matplotlib.colorbar.Colorbar at 0x7ff94ae6d340>
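The overlay above hides every voxel whose foreground probability is below 0.5. With only two output channels, this threshold makes the same decision as picking the channel with the larger logit. The sketch below checks this in plain Python; the logit values are made up for illustration and do not come from the trained model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of per-class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical (background, foreground) logits for three voxels
voxel_logits = [(2.0, -1.0), (0.3, 0.9), (-0.5, 3.0)]

for background, foreground in voxel_logits:
    prob_fg = softmax([background, foreground])[1]
    shown = prob_fg >= 0.5  # the plot masks out values below 0.5
    # For two classes, thresholding at 0.5 agrees with taking the larger logit
    assert shown == (foreground >= background)
    print(f'p(foreground)={prob_fg:0.3f} shown={shown}')
```

With more than two classes this equivalence no longer holds in general, so an argmax over the channel dimension would be the safer choice there.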