Federated learning across patient data from distinct sites
The E2CC-AI4RDP project aims to improve the diagnosis, prognosis and treatment of rare diseases using computational approaches and cutting-edge developments based on large-scale analysis of healthcare Big Data coupled with Artificial Intelligence. It involves the development and validation of security and access protocols, learning algorithms and robust AI based on federated learning.
Solutions to be developed and digital technologies planned
The E2CC-AI4RDP project is a use case of the larger E2CC project, which will provide the core technological framework for interconnecting with Cloud providers, including Cybersecurity, Decarbonization, Orchestration and Platform functions. The project will implement its own use case, aimed at improving and validating the standardized E2CC platform.
In the EU, up to 36 million people live with a rare disease, and around 80% of rare diseases are genetic in origin. To date, more than 5,000 different rare genetic diseases have been described. However, for around 50% of these diseases, the responsible gene or genetic variant has not yet been identified, and patients and their families are therefore often faced with what is known as "diagnostic wandering", despite the genetic tests available.
There is therefore an urgent need to improve and accelerate early diagnosis, and approaches based on artificial intelligence for the clinical assessment of genetic variants could represent a major advance. The E2CC project will provide the technological basis for the E2CC-AI4RDP project to carry out secure and efficient federated learning strategies across independent health and research institutions, which should ultimately enhance predictive performance.
Contribution to increasing energy efficiency and sustainability in Europe
Health and genomics data are sensitive and subject to numerous restrictions on their exchange and use. The E2CC-AI4RDP project will make it possible to implement federated learning strategies across secure European cloud environments, where no data are exchanged between sites, only learning parameters.
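To illustrate what exchanging only learning parameters can look like, here is a minimal sketch of federated averaging (FedAvg), a common federated learning scheme. It is not the project's algorithm, which the text does not specify. The toy linear model and the names local_update, federated_average and run_rounds are assumptions made for illustration. Each site trains on its own records and returns only its model weights, which a coordinator averages.

```python
from typing import List, Sequence, Tuple

Weights = List[float]               # [slope, intercept] of a toy linear model
Record = Tuple[float, float]        # (feature, target) pair held by one site


def local_update(weights: Weights, data: Sequence[Record],
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Train on one site's private data; only the new weights are returned."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error for y = w * x + b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: Sequence[Weights],
                      sizes: Sequence[int]) -> Weights:
    """Average site weights, weighting each site by its number of records."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * s for u, s in zip(updates, sizes)) / total
            for i in range(dim)]


def run_rounds(site_data: Sequence[Sequence[Record]],
               rounds: int = 200) -> Weights:
    """Coordinator loop: broadcast weights, collect updates, average them."""
    global_weights: Weights = [0.0, 0.0]
    sizes = [len(d) for d in site_data]
    for _ in range(rounds):
        # Raw records never leave local_update; only weights come back.
        updates = [local_update(global_weights, d) for d in site_data]
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    # Two hypothetical sites whose records follow y = 2x + 1
    site_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
    site_b = [(1.5, 4.0), (2.0, 5.0)]
    print(run_rounds([site_a, site_b]))
```

In a real deployment the weights would travel over an authenticated, encrypted channel, and further protections such as secure aggregation may be needed, since model parameters can themselves leak information.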
The E2CC-AI4RDP project will build upon the E2CC project platform. The E2CC design is driven by energy efficiency concerns, in order to deliver sustainable Edge-Cloud based solutions managed by a decarbonization platform.
This platform will evaluate the environmental impact of IT infrastructure by collecting carbon emission data. It will calculate the footprint and energy consumption of both hardware and software, in order to enable an end-to-end understanding of emissions and automated energy-aware scheduling.
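The idea of footprint calculation and energy-aware scheduling can be sketched as follows. This is a simplified illustration, not the E2CC platform's method, which the text does not describe. It assumes that emissions are estimated as energy consumed (kWh) multiplied by the carbon intensity of the local grid (gCO2e per kWh), and that a job goes to the site with the lowest estimate. The Site fields, the function names and the figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Site:
    name: str
    power_watts: float        # assumed average power draw while running the job
    carbon_intensity: float   # assumed grid carbon intensity, gCO2e per kWh


def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy consumed by a node drawing power_watts for the given duration."""
    return power_watts * hours / 1000.0


def emissions_g(site: Site, hours: float) -> float:
    """Estimated emissions in gCO2e: energy used times grid carbon intensity."""
    return energy_kwh(site.power_watts, hours) * site.carbon_intensity


def pick_greenest(sites: Sequence[Site], hours: float) -> Site:
    """Energy-aware placement: choose the site with the lowest estimate."""
    return min(sites, key=lambda s: emissions_g(s, hours))


if __name__ == "__main__":
    # Hypothetical sites and figures, for illustration only
    sites = [
        Site("site-a", power_watts=300.0, carbon_intensity=50.0),
        Site("site-b", power_watts=200.0, carbon_intensity=400.0),
    ]
    best = pick_greenest(sites, hours=2.0)
    print(best.name, emissions_g(best, hours=2.0))
```

In this example the site that draws more power is still preferred, because its grid is far less carbon-intensive. This is the kind of trade-off that an end-to-end view of emissions makes visible.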
Prospects and plan to transfer results into practice
Institut Imagine has an in-house innovation and development department that will monitor developments linked to the project, the need to protect the results and the valorization strategy.
The way in which the results are protected (patent, APP filing, open-source licence, etc.) will depend on their type: methods, software, or biological results from the project that are likely to have diagnostic or therapeutic applications.
As for methods and data analysis software in particular, open-source strategies could be explored, while at the same time allowing the results to be exploited commercially. Institut Imagine has experience of this type of strategy. In a similar context, linked to the development of the 'Dr Warehouse' solution, the code of which is distributed as open source, a start-up (CODOC) was created by Imagine with the aim of enabling the rapid deployment of this solution to the hospital community by offering installation, training, maintenance and development services for specific modules.
AI4RDP use case and impact
The main use case of the E2CC-AI4RDP project involves federated learning on large amounts of distributed genomics and health data, with the aim of stratifying rare-disease patients for differential diagnosis and personalized clinical follow-up.
Its developments will have a major impact on biomedical research, since they will enable the development of technical and logistical resources for the secure use of Big Data; they will also have an impact on patients, care and the care pathway, since the developments will make it possible to speed up the diagnosis, prognosis and treatment of patients suffering from genetic diseases.
These general developments will have an influence on other pathologies and will be exploitable beyond Institut Imagine, on a European scale.