Abstract

The oxygen saturation level in the blood (SaO2) is crucial for health, particularly in relation to sleep-related breathing disorders. However, continuous monitoring of SaO2 is time-consuming and highly variable depending on patients' conditions. Recently, optical coherence tomography angiography (OCTA) has shown promise for rapidly and effectively screening eye-related lesions, which suggests it could also help diagnose sleep-related disorders. To explore this potential, our paper presents three key contributions. Firstly, we propose JointViT, a novel model based on the Vision Transformer architecture, incorporating a joint loss function for supervision. Secondly, we introduce a balancing augmentation technique during data preprocessing to improve the model's performance, particularly on the long-tail distribution within the OCTA dataset. Lastly, comprehensive experiments on the OCTA dataset show that our proposed method significantly outperforms other state-of-the-art methods, achieving improvements of up to 12.28% in overall accuracy. This advancement lays the groundwork for the future utilization of OCTA in diagnosing sleep-related disorders.

Methodology

 


The figure illustrates the pipeline of our proposed JointViT, which comprises a balancing augmentation step and a plain Vision Transformer supervised with a joint loss. Classes are denoted by numbers in the Kermany v3 dataset and by letters in Prog-OCTA. GT stands for ground truth.
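The balancing augmentation step could be sketched roughly as follows. This page does not spell out the procedure, so the sketch assumes a simple strategy: each under-represented class is oversampled with augmented copies until it matches the largest class. The function names `balance_by_augmentation` and `horizontal_flip`, and the choice of a flip as the augmentation, are hypothetical and used only for illustration.

```python
import random
from collections import Counter, defaultdict


def balance_by_augmentation(samples, labels, augment, seed=0):
    """Oversample every class up to the size of the largest class.

    Assumption: extra items are made by applying `augment` to randomly
    chosen members of the under-represented class. The actual JointViT
    procedure may differ.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Every class is grown to the size of the most frequent class.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [augment(rng.choice(items), rng)
                 for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def horizontal_flip(image, rng):
    """Hypothetical augmentation: mirror a list-of-rows image left to right."""
    return [row[::-1] for row in image]


if __name__ == "__main__":
    # Toy long-tailed set: class "a" dominates, "b" and "c" are rare.
    images = [[[i, i + 1]] for i in range(6)]
    labels = ["a", "a", "a", "a", "b", "c"]
    _, balanced_labels = balance_by_augmentation(images, labels, horizontal_flip)
    print(Counter(balanced_labels))
```

In the toy run, every class ends up with as many items as the largest class. Real OCTA volumes would need augmentations chosen to preserve clinically relevant structure.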


Dataset

 


The figure illustrates OCTA instances corresponding to each SaO2 level in the Prog-OCTA dataset.


 

 


The figure illustrates the SaO2 values of patients in the Prog-OCTA dataset. The distribution is imbalanced, and the average SaO2 appears lower than that of healthy individuals.


 

 


The SaO2 classification shown in this table follows well-established guidelines and is widely accepted in health-related research.


 

 


The figure shows the long-tailed and imbalanced distribution of SaO2 classes in Prog-OCTA, with the borderline low class being predominant.


 

 


The figures depict the four categories of OCT images included in Kermany v3: diabetic macular edema (DME), characterized by fluid accumulation in the macula due to diabetes; choroidal neovascularization (CNV), involving abnormal growth of blood vessels in the retina; drusen, indicated by the accumulation of deposits composed of lipids and proteins in the retina; and normal instances.


 

 


The table displays the number of instances in each class within the training set and testing set of the Kermany database v3.


 

 


The pie chart shows the proportion of different classes of OCT in the Kermany database v3.


Comparative Studies

 


We compared our proposed JointViT on the Prog-OCTA dataset with well-established 2D and 3D methods that are widely used in medical image recognition. The results demonstrate that our method significantly outperforms the others in predicting saturation levels from imbalanced OCTA data.


Ablation Studies

 


The ablation results in the table show the impact of varying the joint loss coefficient (λ) on model performance. The model achieves its best performance when λ is set to 0.99.


 

 


The figure visualizes the relationship between the joint loss coefficient (λ) and overall model performance, which peaks at λ = 0.99.


 

 


The table presents the results of conducting ablations involving the integration of different loss functions within the joint loss framework. The results indicate that our jointly designed loss function, which combines BCE and MSE losses, consistently achieves superior performance compared to alternative configurations.
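A joint loss of this kind could be sketched as a weighted sum of the two terms. The exact formulation is not given on this page, so the sketch makes several assumptions: the BCE term supervises the predicted SaO2 class, the MSE term supervises a regressed SaO2 value, and the coefficient λ (`lam`) weights the BCE term while 1 − λ weights the MSE term. All function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over per-class probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def mean_squared_error(preds, targets):
    """Mean squared error between predicted and ground-truth values."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def joint_loss(class_probs, class_targets, value_preds, value_targets, lam=0.99):
    """Weighted sum of BCE and MSE.

    Assumption: `lam` weights the BCE (class) term and 1 - lam weights
    the MSE (value) term; the original method may assign them differently.
    """
    bce = binary_cross_entropy(class_probs, class_targets)
    mse = mean_squared_error(value_preds, value_targets)
    return lam * bce + (1.0 - lam) * mse


if __name__ == "__main__":
    # One sample: four class probabilities with a one-hot target, plus a
    # predicted and a true SaO2 value (illustrative numbers only).
    loss = joint_loss([0.1, 0.7, 0.1, 0.1], [0, 1, 0, 0], [93.0], [95.0])
    print(f"joint loss: {loss:.4f}")
```

Under these assumptions, λ = 0.99 leaves only a 1% weight on the MSE term, which would keep a large-magnitude regression error from dominating the class term.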


 

 


We further explored the efficacy of employing the second-best backbone from the comparative studies. Our findings reveal that our approach, which uses a ViT backbone, remains the top-performing method.


 

 


The table demonstrates the significant performance improvement achieved by post-training on OCT images from the Kermany database v3, compared with no post-training. This validates post-training as an initialization step for downstream OCTA recognition tasks.


 

 


The table showcases the efficacy of balancing augmentation in enhancing model performance, with notable improvements observed in various metrics.


BibTeX

@article{zhang2024jointvit,
  title={JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA},
  author={Zhang, Zeyu and Qi, Xuyin and Chen, Mingxi and Li, Guangxi and Pham, Ryan and Zuhair, Ayub and Berry, Ella and Liao, Zhibin and Siggs, Owen and Mclaughlin, Robert and others},
  journal={arXiv preprint arXiv:2404.11525},
  year={2024}
}