When hospitals cannot share data, can deep learning models still be trained collaboratively?

13.05.2026

A stroke patient arrives at the emergency room. The doctor must quickly locate the damaged tissue on brain MRI images and decide on a treatment plan. An AI model that automatically marks the lesion area could save valuable time. However, training such a model requires imaging data from multiple hospitals, and privacy regulations prevent hospitals from sharing patient images. This study explored whether federated learning could let multiple institutions train an AI model together without sharing any patient data, and applied it to the challenging task of stroke lesion segmentation.

Every year, millions of people worldwide are affected by stroke. Ischemic stroke accounts for approximately 87% of all cases (Luo et al. 2024). When a stroke occurs, brain tissue begins to sustain damage as it loses blood supply. Doctors rely on magnetic resonance imaging (MRI) to determine the location and extent of the damage. Different MRI sequences capture different aspects of the injury. For instance, diffusion-weighted imaging (DWI) is particularly sensitive to early tissue changes, while the fluid-attenuated inversion recovery (FLAIR) sequence can provide supplementary information about the lesion.

Manual delineation of these damaged areas on brain images, known as lesion segmentation, is extremely time-consuming and requires specialized knowledge. AI models have the potential to automate this process, but they need to learn from a large number of different samples to perform reliably.

Why can't hospitals share data directly?

Medical images contain sensitive patient information, and strict privacy regulations limit how hospitals handle this data. Even if institutions wish to collaborate on improving AI tools, they usually cannot transfer patient images to a centralized location. At the same time, a single hospital may have too few stroke cases to train a reliable AI model on its own.

Federated learning offers a way out. Each hospital trains the AI model using its own local data and only sends the model updates to a central server to be combined. The original patient data never leaves the hospital. The server integrates updates from all parties into an improved global model, which is then distributed back to each hospital for further training.
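The combining step on the server can be illustrated with a short sketch. It uses federated averaging, one common aggregation rule, in which each hospital's parameters are weighted by its number of local cases. The article does not state which rule the study used, so the rule, the function name federated_average, the parameter name "conv.weight" and the case counts below are all assumptions made for illustration.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Combine client model parameters into one global set of parameters.

    client_weights: list of dicts mapping parameter name -> NumPy array
    client_sizes:   number of local training cases at each client
    Each client contributes in proportion to its share of the total cases.
    """
    total = float(sum(client_sizes))
    averaged = {}
    for name in client_weights[0]:
        averaged[name] = sum(
            weights[name] * (size / total)
            for weights, size in zip(client_weights, client_sizes)
        )
    return averaged

# Hypothetical parameters from three hospitals after one round of local training.
# Only these numbers travel to the server; the images stay at each hospital.
hospital_updates = [
    {"conv.weight": np.array([1.0, 2.0])},
    {"conv.weight": np.array([3.0, 4.0])},
    {"conv.weight": np.array([5.0, 6.0])},
]
hospital_sizes = [50, 100, 50]  # assumed local case counts

global_weights = federated_average(hospital_updates, hospital_sizes)
print(global_weights["conv.weight"])
```

In a real system this step repeats over many rounds: the server sends the averaged parameters back, each hospital continues training from them, and the cycle starts again.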

Why is it so difficult to segment stroke lesions?

Stroke lesions are not uniform. Among different patients, the size, shape, location and boundary clarity of lesions vary greatly. Some lesions are large and well-defined, while others are small, elongated or have blurry boundaries. Even experienced radiologists sometimes disagree on the exact boundaries of a lesion.

To add further complexity, the appearance of stroke lesions on MRI changes over time (Luo et al. 2024). Images taken one day after onset and those taken one week later look completely different. This high variability makes stroke lesion segmentation one of the most challenging problems in medical image analysis.

What did this research do?

This study used the publicly available ISLES 2022 dataset, which contains 250 stroke MRI cases from multiple clinical centers, each with expert-annotated lesion boundaries (Hernandez Petzsche et al. 2022). To simulate a multi-institutional collaboration scenario, the data was distributed across three virtual participants. Each trained the model locally, and a central server periodically combined their updates. After training, the study evaluated the final model on a set of independent cases not used during training.
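A simulated setup of this kind can be sketched in a few lines. The example shuffles 250 case identifiers, holds out a test set and deals the remaining cases to three virtual participants. The identifiers, the 20% test fraction, the random seed and the function name split_cases are hypothetical; the article does not report how the study actually divided the data.

```python
import random

def split_cases(case_ids, num_clients=3, test_fraction=0.2, seed=0):
    """Hold out a test set, then deal the remaining cases to the clients."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)

    num_test = int(len(shuffled) * test_fraction)
    test_cases = shuffled[:num_test]
    train_cases = shuffled[num_test:]

    # Round-robin assignment keeps the client sizes within one case of each other.
    clients = [train_cases[i::num_clients] for i in range(num_clients)]
    return clients, test_cases

# Placeholder identifiers, not the dataset's real file names.
case_ids = [f"case-{i:04d}" for i in range(1, 251)]
clients, test_cases = split_cases(case_ids)

for index, cases in enumerate(clients, start=1):
    print(f"participant {index}: {len(cases)} cases")
print(f"held-out test set: {len(test_cases)} cases")
```

An even, random split like this is the simplest case. Real hospitals differ in scanner types and patient populations, so their data would be less uniform than this simulation suggests.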

What did the results show?

The model achieved a mean Dice of 0.58 on the validation set and 0.56 on the independent test set. Dice is a metric for measuring overlap, with 1.0 indicating a perfect match between the model prediction and the expert annotation, and 0.0 indicating no overlap at all. The two results were very close, which suggests that the model did not simply memorize the training data but maintained similar performance on unseen cases.
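For readers who want to see how the metric is computed, the Dice score is twice the number of voxels marked by both the model and the expert, divided by the total number of voxels marked by each. The sketch below applies this to two tiny binary masks. The function name dice_score and the example masks are illustrative, and returning 1.0 when both masks are empty is a convention assumed here, not something taken from the study.

```python
import numpy as np

def dice_score(prediction, reference):
    """Dice overlap between two binary masks of the same shape."""
    prediction = np.asarray(prediction, dtype=bool)
    reference = np.asarray(reference, dtype=bool)

    denominator = prediction.sum() + reference.sum()
    if denominator == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0

    intersection = np.logical_and(prediction, reference).sum()
    return 2.0 * intersection / denominator

# Toy 2 x 3 "images": 1 marks lesion, 0 marks healthy tissue.
predicted_mask = np.array([[0, 1, 1],
                           [0, 1, 0]])
expert_mask = np.array([[0, 1, 0],
                        [0, 1, 1]])

score = dice_score(predicted_mask, expert_mask)
print(f"Dice: {score:.2f}")
```

Here each mask marks three voxels and two of them coincide, so the score is 2 × 2 / (3 + 3), or about 0.67.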

However, performance varied widely across individual cases. Approximately one-third scored above 0.80, showing strong agreement with expert annotations, while about 40% scored below 0.50. Well-defined lesions tended to receive higher scores, while small or poorly defined lesions were more challenging for the model. This shows that stroke lesion segmentation remains a difficult task, especially when dealing with atypical cases.

A gap remains relative to the Dice of 0.82 achieved by DeepISLES (de la Rosa et al. 2025). However, this comparison is not entirely fair. DeepISLES combined multiple independently optimized models trained on larger and more diverse data, while this study used a single model in a small-scale simulation.

What does this mean for the future?

The main value of this study lies not in achieving the highest segmentation accuracy, but in demonstrating that federated learning can work for stroke MRI segmentation. The entire workflow was built on open-source tools and produced stable, reproducible results.

For hospitals and clinical researchers, this means that collaborative training of AI models without sharing patient data is technically feasible. Future research can further advance this direction by including more participating institutions, testing different collaboration strategies and improving the evaluation system (Rangel & Martinez 2025).

This study is not the final answer, but it shows that the direction is viable. As privacy restrictions continue to tighten, collaborative approaches that do not rely on data sharing may become increasingly important.

References

de la Rosa, E.; Reyes, M.; Liew, S.-L. et al. 2025. DeepISLES: a clinically validated ischemic stroke segmentation model from the ISLES’22 challenge. Nature Communications 16(1), 7357. Available at: https://doi.org/10.1038/s41467-025-62373-x. Accessed 24 April 2026.

Hernandez Petzsche, M. R.; de la Rosa, E.; Hanning, U. et al. 2022. ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset. Scientific Data 9(1), 762. Available at: https://doi.org/10.48550/arXiv.2206.06694. Accessed 24 April 2026.

Jiang Yan. 2026. Federated learning for multimodal MRI-based ischemic stroke segmentation. Bachelor’s Thesis. Turku University of Applied Sciences. Available at: https://urn.fi/URN:NBN:fi:amk-202604308587

Luo, J.; Dai, P.; He, Z. et al. 2024. Deep learning models for ischemic stroke lesion segmentation in medical images: A survey. Computers in Biology and Medicine 175, 108509. Available at: https://doi.org/10.1016/j.compbiomed.2024.108509. Accessed 24 April 2026.

Rangel, E. & Martinez, F. 2025. Federative ischemic stroke segmentation as alternative to overcome domain-shift multi-institution challenges. arXiv. Available at: https://arxiv.org/abs/2508.18296. Accessed 24 April 2026.

Image sources

The MRI data shown are from the ISLES 2022 public dataset (Hernandez Petzsche et al. 2022), and the visualizations were generated by the author. All other figures were created by the author.

This article is part of the publications of the Turku UAS research group INSIGHT – Intelligent Sensing and Computing Technologies.
