JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA

Zeyu Zhang¹,²,*, Xuyin Qi³, Mingxi Chen⁴, Guangxi Li⁵, Ryan Pham³, Ayub Qassim¹, Ella Berry¹, Zhibin Liao³, Owen Siggs¹, Robert Mclaughlin³, Jamie Craig¹, Minh-Son To¹

¹ Flinders University ² The Australian National University
³ The University of Adelaide ⁴ Guangdong Technion - Israel Institute of Technology
⁵ The University of Sydney

MIUA 2024 Oral

*Work done while a student researcher at the Flinders Health and Medical Research Institute, Flinders University.

News:

(06/18/2024) 🎉 Our paper has been selected as an oral presentation at MIUA 2024!

(05/14/2024) 🎉 Our paper has been accepted to MIUA 2024!

Abstract

The oxygen saturation level in the blood (SaO₂) is crucial for health, particularly in relation to sleep-related breathing disorders. However, continuous monitoring of SaO₂ is time-consuming and highly variable depending on patients' conditions. Recently, optical coherence tomography angiography (OCTA) has shown promising development in rapidly and effectively screening eye-related lesions, offering the potential for diagnosing sleep-related disorders. To bridge this gap, our paper presents three key contributions. Firstly, we propose JointViT, a novel model based on the Vision Transformer architecture, incorporating a joint loss function for supervision. Secondly, we introduce a balancing augmentation technique during data preprocessing to improve the model's performance, particularly on the long-tail distribution within the OCTA dataset. Lastly, through comprehensive experiments on the OCTA dataset, our proposed method significantly outperforms other state-of-the-art methods, achieving improvements of up to 12.28% in overall accuracy. This advancement lays the groundwork for the future utilization of OCTA in diagnosing sleep-related disorders.

Methodology

The figure illustrates the pipeline of our proposed JointViT, which comprises a balancing augmentation step and a plain Vision Transformer supervised with a joint loss. Classes are denoted by numbers in the Kermany v3 dataset and by letters in Prog-OCTA. GT stands for ground truth.

Dataset

The figure illustrates OCTA instances corresponding to each SaO₂ level in the Prog-OCTA dataset.

The figure illustrates the SaO₂ values of patients in the Prog-OCTA dataset. The distribution is imbalanced, and the average SaO₂ is visibly lower than that of healthy individuals.

The SaO₂ classification shown in this table is widely accepted in health-related research and was proposed in well-established guidelines.

The figure shows the long-tailed and imbalanced distribution of SaO₂ classes in Prog-OCTA, with the borderline low class being predominant.

The figures depict the four categories of OCT images included in Kermany v3: diabetic macular edema (DME), characterized by fluid accumulation in the macula due to diabetes; choroidal neovascularization (CNV), involving abnormal growth of new blood vessels from the choroid into or beneath the retina; drusen, indicated by the accumulation of deposits composed of lipids and proteins in the retina; and normal instances.

The table displays the number of instances in each class within the training set and testing set of the Kermany database v3.

The pie chart shows the proportion of different classes of OCT in the Kermany database v3.

Comparative Studies

We compared our proposed JointViT on the Prog-OCTA dataset against well-established 2D and 3D methods that are widely used in medical image recognition. The results demonstrate that our method significantly outperforms the others in predicting oxygen saturation levels from imbalanced OCTA data.

Ablation Studies

The ablation results in the table show the impact of varying the joint loss coefficient (λ) on model performance. The model achieves its best performance when λ is set to 0.99.

The figure visualizes the relationship between joint loss coefficient (λ) values and overall model performance, which indicates the optimal λ value is 0.99.

The table presents the results of conducting ablations involving the integration of different loss functions within the joint loss framework. The results indicate that our jointly designed loss function, which combines BCE and MSE losses, consistently achieves superior performance compared to alternative configurations.
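
The combination of BCE and MSE losses described above can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes that the BCE term supervises the predicted class probabilities, that the MSE term supervises a predicted SaO₂ value, and that λ weights the BCE term while (1 − λ) weights the MSE term. The source does not state which term λ multiplies, and all function and parameter names below are hypothetical.

```python
import math


def bce_loss(probs, target_onehot, eps=1e-12):
    """Mean binary cross-entropy between per-class probabilities and a one-hot target."""
    total = 0.0
    for p, t in zip(probs, target_onehot):
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log() never receives 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def mse_loss(pred_value, true_value):
    """Squared error between a predicted and a ground-truth scalar value."""
    return (pred_value - true_value) ** 2


def joint_loss(probs, target_onehot, pred_value, true_value, lam=0.99):
    """Weighted sum of the two terms.

    Assumption: lam multiplies the classification (BCE) term and (1 - lam)
    multiplies the regression (MSE) term. The default 0.99 is the optimal
    coefficient reported in the ablation.
    """
    classification = bce_loss(probs, target_onehot)
    regression = mse_loss(pred_value, true_value)
    return lam * classification + (1.0 - lam) * regression


# Illustrative call with made-up numbers: two classes, SaO2 scaled to [0, 1].
example = joint_loss([0.5, 0.5], [1, 0], pred_value=0.9, true_value=1.0)
```

Under this assumed weighting, setting `lam=1.0` reduces the objective to pure classification and `lam=0.0` to pure regression, so a coefficient of 0.99 keeps the class labels dominant while the SaO₂ values contribute a small auxiliary signal.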

We further explored the efficacy of employing the second-best backbone from the comparative studies. Our findings show that our approach, which uses a ViT backbone, remains the top-performing method.

The table demonstrates the significant performance improvement achieved by incorporating post-training with OCT images from the Kermany database v3, compared to the absence of post-training, thereby validating its efficacy as an initialization step for downstream OCTA recognition tasks.

The table showcases the efficacy of balancing augmentation in enhancing model performance, with notable improvements observed in various metrics.
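
One simple way to realize a balancing augmentation on a long-tailed dataset is sketched below. This is a sketch under stated assumptions, not the procedure used in the paper: it assumes that minority classes are topped up with augmented copies until every class matches the largest one, and it uses a horizontal flip as a stand-in augmentation. The function names and the choice of augmentation are hypothetical.

```python
import random
from collections import Counter


def horizontal_flip(image, rng):
    """Hypothetical augmentation: mirror a 2D image stored as a list of rows."""
    return [row[::-1] for row in image]


def balance_by_augmentation(samples, labels, augment, seed=0):
    """Add augmented copies of minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is brought up to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = list(samples), list(labels)
    for label, items in by_class.items():
        for _ in range(target - len(items)):
            out_samples.append(augment(rng.choice(items), rng))
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: class "A" is the tail, class "B" is the head.
images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 0], [1, 2]], [[3, 4], [5, 6]]]
classes = ["A", "B", "B", "B"]
balanced_images, balanced_classes = balance_by_augmentation(images, classes, horizontal_flip)
class_counts = Counter(balanced_classes)
```

The original samples are kept unchanged and only the tail classes receive synthetic copies, so a step like this should be applied to the training split only and never to the test split.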

Poster

BibTeX

@inproceedings{zhang2024jointvit,
  title={{JointViT}: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed {OCTA}},
  author={Zhang, Zeyu and Qi, Xuyin and Chen, Mingxi and Li, Guangxi and Pham, Ryan and Qassim, Ayub and Berry, Ella and Liao, Zhibin and Siggs, Owen and Mclaughlin, Robert and others},
  booktitle={Annual Conference on Medical Image Understanding and Analysis},
  pages={158--172},
  year={2024},
  organization={Springer}
}