MedPerf Logo

Clinically Impactful Machine Learning

About our Platform

MedPerf is an open-source platform for benchmarking AI models to deliver clinical efficacy. How it works and who’s involved:

  • 1. Benchmark Committee

    Domain experts organize and kick off an initiative.

  • 2. Researchers

    Algorithm holders prepare their algorithms to test against real-world clinical data.

  • 3. Clinicians

    Data providers run the algorithms against their population data.

  • 4. Results

    Proven global benchmark data assists in future medical research.

Powered by MLCommons

What’s New

  • MedPerf Paper Published in Nature Machine Intelligence

    MedPerf, an open benchmarking platform for medical AI, is now published in Nature Machine Intelligence.

  • MedPerf Supports a Large Neuro-Oncology Study

    RANO, the world's clinical authority for neuro-oncology, will use MedPerf to assess brain tumor treatment outcomes.

  • MedPerf Supports BraTS 2023

    MedPerf will be used again for the BraTS 2023 challenge.


Our Goal

Build powerful medical benchmarks securely

MedPerf is an open-source framework for benchmarking AI models to deliver clinical efficacy while prioritizing patient privacy and mitigating legal and regulatory risks. It enables federated evaluation in which AI models are securely distributed to various facilities for evaluation.

The MedPerf approach lets healthcare organizations assess and verify the performance of AI models in an efficient, human-supervised process, without sharing any patient data across facilities. This reduces the risks and costs associated with data sharing, with the aim of improving medical and patient outcomes.
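The federated evaluation flow can be illustrated with a minimal Python sketch. This is not MedPerf's API: the `Site` class, `federated_evaluation`, and the toy `threshold_model` are hypothetical names introduced only for illustration. The sketch shows the general idea that the model travels to each facility and only summary metrics travel back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from MedPerf.
Model = Callable[[float], int]   # maps one feature value to a predicted label
Record = Tuple[float, int]       # (feature, true label); stays on site


@dataclass
class Site:
    name: str
    records: List[Record]        # private data, never returned to the caller

    def evaluate_locally(self, model: Model) -> Dict[str, float]:
        """Run the model on local data and return only summary metrics."""
        if not self.records:
            raise ValueError(f"site {self.name} has no records")
        correct = sum(1 for x, y in self.records if model(x) == y)
        return {
            "n": float(len(self.records)),
            "accuracy": correct / len(self.records),
        }


def federated_evaluation(model: Model, sites: List[Site]) -> Dict[str, Dict[str, float]]:
    """Send the same model to every site and collect per-site metrics."""
    return {site.name: site.evaluate_locally(model) for site in sites}


def threshold_model(x: float) -> int:
    """Toy stand-in for a trained medical AI model."""
    return 1 if x >= 0.5 else 0


sites = [
    Site("hospital_a", [(0.9, 1), (0.2, 0), (0.7, 1), (0.4, 1)]),
    Site("hospital_b", [(0.1, 0), (0.8, 1)]),
]
report = federated_evaluation(threshold_model, sites)
print(report)
```

In this sketch the caller only ever sees the metric dictionaries, never the `records` held by each site, which is the property the paragraph above describes.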

MedPerf provides the end-to-end toolchain you’ll need to get involved: from organizing the experiment, to carrying it out, to producing results. We provide the database storage, a REST API, researcher tooling to package algorithms to run with any data, and clinician tooling to easily run the algorithms without patient data ever leaving their premises.

MedPerf is licensed under the Apache License, making it free to use, redistribute, or modify for any purpose.

1. Benchmark Committee

Improve a healthcare problem through validated experiments

A benchmark committee consists of groups of experts (e.g., clinicians, patient representative groups, regulators) and data or model owners wishing to drive the evaluation of their model or data. Initiating or joining a benchmark committee in the MedPerf environment allows you to:

  • Hold a leading role in the MedPerf community by defining specifications of a benchmark for a particular medical AI task.
  • Get support to create a strong community around a specific area.
  • Create guidelines for generating impactful AI in a specific area.
  • Help improve best practices in your area of interest.

Interested in forming or joining a benchmark committee?

Initiate or join a benchmark committee

2. Researchers

Test algorithms against real-world data

If you are an AI researcher or software vendor who holds a trained medical AI model and wants to evaluate its performance, with MedPerf you can:

  • Measure model performance on private datasets that you would otherwise never have access to.
  • Connect with clinicians who can help you increase the performance of your model.
  • Demonstrate that results were obtained reliably and can be reproduced.

Federated Tumor Segmentation (FeTS) Initiative
Task: Brain Tumor Segmentation
Clinical data: from 31 locations worldwide
Figure: Global map with markers for all clinical locations

Interested in testing your algorithm?

Join as a researcher

3. Clinicians

Evaluate how well AI models perform on your patients’ data

Data providers include hospitals, medical practices, research organizations, and healthcare insurance providers that own medical data. If you fit into this category, you can take advantage of MedPerf to:

  • Evaluate how well AI models perform on your patient population’s data.
  • Connect to researchers to help them improve medical machine learning in a specific domain.
  • Help define impactful medical machine learning benchmarks, which turn into real-world results.

Interested in connecting with a machine learning initiative?

Join as a clinician

4. Results

Trusted results for future clinical impact

Benchmark results enable leaders to improve healthcare, including patient outcomes, clinical workflows, and costs.

  • Well-defined and clinically meaningful benchmark metrics can identify gaps and drive innovation and impact.
  • Benchmark results can be publicly available or private, depending on the specification of the benchmark.
  • Benchmark results preserve privacy because only aggregated values (i.e., one score) are reported, not individual patient data.
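As a rough illustration of what an aggregated result can look like, the sketch below combines per-site scores into one number using a sample-weighted mean. The aggregation rule is an assumption made for this example, since the text above does not specify one, and `aggregate_score` and the sample values are hypothetical.

```python
from typing import List, Tuple


def aggregate_score(site_results: List[Tuple[int, float]]) -> float:
    """Combine (sample_count, score) pairs into one sample-weighted mean.

    Assumed aggregation rule for illustration only.
    """
    total = sum(n for n, _ in site_results)
    if total == 0:
        raise ValueError("no samples to aggregate")
    return sum(n * score for n, score in site_results) / total


# Hypothetical per-site results: (number of cases, site-level score).
results = [(4, 0.75), (2, 1.0), (10, 0.9)]
overall = aggregate_score(results)
print(round(overall, 4))
```

Weighting by sample count keeps a small site from counting as much as a large one; an unweighted mean would be an equally valid choice if each site should count equally.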

Trusting the Results:

The benchmark committee provides the right tasks and metrics; the medical machine learning research community provides state-of-the-art algorithms; and the evaluating hospitals provide real-world data with meaningful diversity. Together, these make our results forward-looking and clinic-ready.

Learn More

Want to get further involved?

We have an internal working group focused on current projects and goals, and we can also connect you with other groups to start a new experiment.


© 2020 – 2024 MLCommons