Alzheimer Detection using Squeeze and Excitation Model
Slice Aware Vision Transformer with Squeeze and Excitation for MRI Based
Alzheimer’s Progression
Sheraz Waseem, Umair Amir
github.com/UmairAmir/Alzheimer-Detection-23-
Abstract
Accurate staging of Alzheimer's Disease (AD) from MRI data remains a clinically significant yet technically challenging task due to subtle, spatially diffuse brain changes. We propose a novel Slice-Aware Vision Transformer (SE-ViT) architecture that integrates a Squeeze and Excitation (SE) module to rank and select the most diagnostically salient MRI slices prior to classification. Using the OASIS-2 dataset, we benchmark four progressively refined models: a baseline ViT (E-0), a binary SE-ViT (E-1), a direct 4-class SE-ViT (E-2), and a hierarchical SE-ViT (E-3) that mirrors clinical diagnostic pipelines. Results show that the SE module improves model specificity and early-stage recall, particularly for the clinically critical "Very Mild" cohort. The final hierarchical model achieves 74% accuracy and a 0.72 F1 score, outperforming the baseline by over 10 percentage points. Our framework offers interpretable, anatomically grounded predictions and sets the foundation for future extensions incorporating temporal modeling and multimodal fusion.
1. Introduction
Alzheimer's disease (AD) is a long-term brain disorder that causes memory loss and thinking problems and worsens over time. Tracking how the disease progresses, from normal aging through the early and later stages of dementia, is very important for starting treatment early and for testing new drugs. MRI scans help doctors see changes in the brain, such as shrinking of the hippocampus, enlarged brain cavities (ventricles), and thinning of the brain's outer layer (cortex).
However, these changes are often subtle, vary from person to person, and appear in different parts of the brain at
different times. Traditional analysis methods that rely on
manually selected features or specific brain regions often
fail to fully capture this complex and detailed information.
2. Research in the Field
Recent advances in deep learning have shifted the paradigm
towards end-to-end representation learning. Convolutional
Neural Networks (CNNs), Graph Neural Networks (GNNs),
and survival analysis hybrids have reported encouraging
results, but exhibit two persistent limitations:
1. Global token misweighting: Three-dimensional CNNs process full MRI volumes indiscriminately, allocating equal importance to every slice, even though medial temporal slices typically carry far more pathological signal than superior or inferior sections.
2. Limited long-range contextual modelling: CNN kernels possess finite receptive fields; capturing distant, cross-regional dependencies (e.g., simultaneous hippocampal and ventricular changes) requires very deep architectures with heavy parameter counts.
To address these shortcomings, Vision Transformers (ViTs)
have emerged as attractive alternatives. By decomposing
input images into patch tokens and processing them through
self-attention, ViTs model global interactions irrespective
of spatial distance, providing state-of-the-art performance
on natural image benchmarks and increasingly on medical
imaging tasks. Nevertheless, vanilla ViTs remain slice-agnostic when applied to volumetric MRI stacks, treating each token identically and ignoring domain knowledge about slice saliency.
2.1. Project Motivation
We propose a Slice-Aware ViT framework enhanced with a
Squeeze and Excitation (SE) gating mechanism that:
1. Learns slice importance weights via channel-wise recalibration, effectively ranking 256 axial slices and
selecting the top-k salient ones during inference.
2. Trains jointly in a pipeline regime, where SE parameters are optimized using the downstream ViT classification loss, ensuring end-to-end gradient flow.
3. Supports both direct four-way classification and hierarchical staging, offering two complementary strategies:
• Experiment 1: Single-head ViT with a 4-class
softmax (Nondemented, Very Mild, Mild, Moderate Demented).
• Experiment 2: Two-stage model that first distinguishes Demented vs. Nondemented, then refines
Demented cases into severity sub-classes.
Our tests show that adding the SE module improves the overall F1 score and per-class sensitivity, especially for the less common Very Mild stage, while keeping the model size reasonably small.

3. Dataset
The selection of an appropriate dataset is crucial for research involving medical imaging, particularly for studies on conditions like Alzheimer's disease. Two prominent datasets frequently utilized in this domain are the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS). A comparison of these datasets, highlighting their key characteristics, is presented in the table below.

3.1. Dataset Comparison

Figure 1. Dataset Comparison between ADNI and OASIS

3.2. Selection Rationale
Although ADNI offers a larger cohort, we opted for OASIS-2 because of several key advantages:

• Complete longitudinal metadata: CDR scores are available for every visit, enabling precise disease-progression labels.
• Homogeneous acquisition protocol: all scans were captured on a Siemens Vision 1.5 T scanner using identical MPRAGE parameters (TR = 9.7 ms, TE = 4 ms, flip = 10°), reducing scanner-induced variance.
• Elderly-focused cohort: subjects aged 60–96 years better represent the typical onset window for Alzheimer's pathology.
• Ethical and licensing clarity: the CC BY-SA licence permits derivative works without additional Institutional Data Use Agreements.

3.3. OASIS-2 Characteristics
The table below describes the characteristics of the OASIS-2 dataset.

Figure 2. OASIS 2 Characteristics

Figure 3. OASIS 2 Age wise distribution

Figure 4. OASIS 2 Slices

Figure 5. OASIS 2 Different Patients
3.4. Preprocessing Applied
Before feeding the MRI data into our model, we implemented a comprehensive preprocessing pipeline to standardize the images and remove potential confounding factors.
This multi-step approach was carefully designed to enhance
pathological features while minimizing technical variations
that could interfere with disease classification. Our preprocessing workflow consisted of the following sequential
steps:
1. Spatial normalization: resample to 1 mm³ isotropic, then center-crop to 256 × 256 × 128.
2. Intensity normalization: z-score per volume (µ = 0, σ = 1).
3. Slice extraction: 256 axial slices retained for the SE-ViT pipeline.

This harmonized pipeline ensures that the model learns pathology-specific features rather than scanner artifacts or intensity drift.

3.5. Limitations & Mitigation
While OASIS-2 provides valuable advantages for our research, we acknowledge several limitations of our dataset selection. These constraints represent important considerations that could affect the generalizability and scope of our findings, and we have identified strategies to address them in future work:

• Sample Size: OASIS-2 is modest compared with the full ADNI dataset.
• Demographic Bias: predominantly U.S. Caucasian cohort; future work will validate on cross-site datasets.
• Single Modality: only T1-weighted MRI is used; multimodal fusion (e.g., PET, fMRI) remains future scope.

By rigorously comparing ADNI and OASIS-2, we selected the dataset that offers the most consistent longitudinal ground truth and acquisition homogeneity, thereby maximizing internal validity for slice-aware transformer training.

4. Methodology

4.1. Data Pipeline Overview
Our entire workflow is organized as a slice-aware, end-to-end learning pipeline. Each 3D T1-weighted MRI volume in OASIS-2 is first standardized through a reproducible pre-processing stack; subsequently, a Squeeze and Excitation (SE) gate ranks axial slices by diagnostic salience. The gated subset then feeds a custom Vision Transformer (ViT) that performs either (i) direct four-way staging or (ii) a hierarchical two-stage classification, depending on the experiment.

Figure 6. Workflow Pipeline

4.2. Slice-Aware Gating with Squeeze and Excitation
Let $S \in \mathbb{R}^{256 \times H \times W}$ be the stack of axial slices for a subject. The SE block performs:

1. Squeeze:
$$z_i = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} S_{i,h,w}, \quad i = 1, \dots, 256$$
producing a 256-dimensional global descriptor $z$.

2. Excitation:
$$w = \sigma(\mathrm{MLP}(z)), \quad w \in (0, 1)^{256}$$
where MLP is a fully connected bottleneck with dimensions 256 → 64 → 256 and ReLU activation, and $\sigma$ is the sigmoid function.

3. Top-k selection:
$$\operatorname{arg\,top}_3(w)$$
yields the indices of the three most informative slices, $S'$. All other slices are zeroed, ensuring backpropagation still flows through $w$.

The SE parameters $\theta_{\mathrm{SE}}$ are learned jointly with the ViT parameters $\theta_{\mathrm{ViT}}$ via the downstream cross-entropy loss:
$$\mathcal{L} = -\sum_{c=1}^{C} y_c \log \hat{y}_c, \qquad \hat{y} = f_{\mathrm{ViT}}(S'; \theta_{\mathrm{ViT}})$$
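The squeeze, excitation, and top-k selection steps can be sketched in NumPy as follows. This is a minimal illustration with random, untrained weights; in the actual model the bottleneck MLP parameters are learned jointly with the ViT, and the helper name `se_gate` is ours.

```python
import numpy as np

def se_gate(S, W1, b1, W2, b2, k=3):
    """Slice-level SE gating: squeeze -> excitation -> top-k masking.
    S is a (256, H, W) stack of axial slices."""
    # Squeeze: global average pool each slice to a single scalar
    z = S.mean(axis=(1, 2))                       # (256,)
    # Excitation: 256 -> 64 -> 256 bottleneck MLP with ReLU, then sigmoid
    h = np.maximum(z @ W1 + b1, 0.0)              # (64,)
    w = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # (256,), each weight in (0, 1)
    # Top-k selection: keep the k highest-weighted slices, zero the rest
    top = np.argsort(w)[-k:]
    mask = np.zeros_like(w)
    mask[top] = 1.0
    # Multiplying the kept slices by w keeps the gate differentiable
    # in an autograd setting, so gradients still flow through w
    S_gated = S * (mask * w)[:, None, None]
    return S_gated, w, np.sort(top)

# Tiny usage example with random weights
rng = np.random.default_rng(0)
S = rng.standard_normal((256, 8, 8))
W1, b1 = rng.standard_normal((256, 64)) * 0.1, np.zeros(64)
W2, b2 = rng.standard_normal((64, 256)) * 0.1, np.zeros(256)
S_gated, w, top = se_gate(S, W1, b1, W2, b2, k=3)
```

In a PyTorch implementation the hard mask would typically be combined with the soft weights exactly as above, so the zeroed slices contribute nothing forward while the selected slices carry gradient back into the excitation MLP.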
4.3. Vision Transformer Backbone
Our model uses a Vision Transformer (ViT) architecture as the backbone, with the following specifications:

• Patch projection: each 128 × 128 slice is split into 16 × 16 patches, yielding 64 tokens per slice. Tokens are flattened and linearly embedded to d = 768.
• Positional encodings: learnable, added at the slice-token level to retain intra-slice geometry.
• Transformer encoder: 12 layers, 12-head self-attention, feed-forward width 3072 with GELU activation.
• Classification scheme:
  – E-0/E-1 (binary) → single-neuron sigmoid.
  – E-2 (multiclass) → 4-neuron softmax.
  – E-3 (hierarchical) → Stage 1: sigmoid; Stage 2: 3-way softmax attached to the same CLS embedding, enabling parameter sharing.

4.4. Training Details

Figure 7. Training Details

4.5. Experiments
Our experimental methodology involved four distinct model configurations (E-0 through E-3) to systematically evaluate different aspects of Alzheimer's disease progression classification. Each experiment was designed to build upon insights from the previous one, progressively enhancing the model architecture to address specific challenges in MRI-based AD staging.

4.5.1. Experiment E-0: Baseline Assessment
Our investigation began with a baseline experiment utilizing a pre-existing Vision Transformer model from the Hugging Face repository (fawadkhan/ViT_FineTuned_on_ImagesOASIS), which had been previously fine-tuned on the OASIS dataset. This model implemented a standard ViT architecture with pre-trained weights, focusing solely on binary classification (demented vs. non-demented). We evaluated this model on our curated subset of OASIS-2 data to establish baseline performance metrics.

The results revealed suboptimal generalization, with accuracy falling significantly below reported benchmarks. Through error analysis, we identified several contributing factors:

• Divergent preprocessing protocols between the original model training and our implementation
• Lack of slice-specific attention mechanisms to focus on diagnostically relevant brain regions
• Inability to distinguish between progressive stages of Alzheimer's disease

4.5.2. Experiment E-1: Custom SE-ViT for Binary Classification
Building upon lessons from E-0, we developed a custom Vision Transformer implementation with an integrated Squeeze and Excitation (SE) gating mechanism. This model maintained the binary classification objective but introduced several key innovations:

• Implementation of an end-to-end differentiable pipeline with full control over all architectural components
• Introduction of the novel SE gating module to automatically identify and prioritize the most diagnostically informative MRI slices

The SE module specifically addressed the "all slices are equal" limitation of conventional ViTs, enabling the model to focus computational resources on medial temporal regions where AD pathology is most evident. Binary classification performance improved significantly with this approach, achieving a 7.2% increase in F1 score compared to E-0.
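The patch tokenization described in §4.3 can be sketched as below. This is a NumPy illustration using a random projection matrix; the function name and parameters are ours, and the learned projection in the real model is a trained linear layer.

```python
import numpy as np

def patchify(slice_img, patch=16, d_model=768, proj=None, rng=None):
    """Split a square slice into non-overlapping patches and embed each
    patch linearly, as in a standard ViT patch projection."""
    H, W = slice_img.shape
    assert H % patch == 0 and W % patch == 0
    n = (H // patch) * (W // patch)               # number of tokens per slice
    # Rearrange (H, W) -> (n_patches, patch*patch)
    patches = (slice_img
               .reshape(H // patch, patch, W // patch, patch)
               .transpose(0, 2, 1, 3)
               .reshape(n, patch * patch))
    if proj is None:                              # random projection for the sketch
        rng = rng or np.random.default_rng(0)
        proj = rng.standard_normal((patch * patch, d_model)) / np.sqrt(patch * patch)
    return patches @ proj                         # (n, d_model) token embeddings

tokens = patchify(np.zeros((128, 128)))
# A 128x128 slice with 16x16 patches gives 64 tokens of dimension 768
```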
4.5.3. Experiment E-2: Direct Multi-class Progression Staging
Experiment E-2 extended our SE-ViT architecture to perform direct four-way classification, distinguishing between the Nondemented, Very Mild, Mild, and Moderate-to-Severe Dementia classes. The model architecture remained consistent with E-1, with the following modifications:

• Replacement of the sigmoid output layer with a 4-class softmax
• Implementation of class-weighted cross-entropy loss to account for class imbalance
• Additional regularization via label smoothing (ε = 0.1) to enhance generalization
• A modified learning rate schedule with a longer warm-up period to stabilize multi-class training

This experiment tested the hypothesis that direct multi-class staging could leverage inter-class relationships and subtle progression markers that might be lost in binary classification. While overall accuracy remained competitive, we observed challenges in differentiating between adjacent severity classes, particularly between the Very Mild and Mild categories.

4.5.4. Experiment E-3: Hierarchical Staging Approach
Our final experiment implemented a hierarchical classification strategy that mirrors clinical diagnostic procedures. The SE-ViT architecture first performed binary classification (demented vs. non-demented), and then, only for subjects classified as demented, further differentiated between the Very Mild, Mild, and Moderate-to-Severe categories using a secondary three-class softmax.

This approach offered several advantages:

• Better alignment with the clinical decision process
• Mitigation of class-imbalance effects by separating out the initial binary decision
• Specialized feature representations for severity staging
• Parameter sharing between stages, reducing overall model complexity

In particular, the hierarchical classifier achieved superior performance for the challenging Very Mild class (a 9% improvement in recall), which has particular clinical value for early intervention. The model maintained high precision across all severity levels while reducing false-positive classifications within the critical mild cognitive impairment spectrum.

Each experiment was carried out using identical train/validation splits (80%/20%) stratified by class to ensure fair comparison. All models were trained until convergence, with 5–20 epochs.

Figure 8. Experiments
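The E-2 loss modifications (class weighting plus label smoothing) can be sketched as follows. The inverse-frequency weighting scheme shown here is a common default and an assumption on our part, as the exact weights are not specified; the class counts in the example are purely illustrative.

```python
import numpy as np

def weighted_smoothed_ce(logits, label, class_counts, eps=0.1):
    """Class-weighted cross-entropy with label smoothing for C classes."""
    C = logits.shape[0]
    # Softmax probabilities (shifted for numerical stability)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Smoothed target: 1 - eps on the true class, eps spread over the rest
    t = np.full(C, eps / (C - 1))
    t[label] = 1.0 - eps
    # Inverse-frequency class weight (assumed scheme), normalized to mean 1
    wts = class_counts.sum() / (C * class_counts)
    return -wts[label] * (t * np.log(p)).sum()

# Illustrative counts: 4 classes with heavy imbalance in the rarest stage
counts = np.array([190.0, 70.0, 60.0, 16.0])
loss = weighted_smoothed_ce(np.array([2.0, 0.5, 0.2, -1.0]),
                            label=3, class_counts=counts)
```

Because the rarest class receives the largest weight, a misclassified Moderate case contributes far more to the loss than a misclassified Nondemented case with the same logits.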
5. Results
5.0.1. Experiment E-0: Baseline Assessment Result
Our baseline experiment served as the initial benchmark to evaluate how well existing ViT architectures could generalize to our curated OASIS-2 subset. The model achieved an accuracy of 64.1% and an F1 score of 0.643 on the held-out test set. The corresponding confusion matrix is shown in Figure 9.

Figure 9. Confusion matrix for baseline ViT model on test set.

From the matrix, we observe that the model correctly identified 15 out of 23 Nondemented patients (true negatives) and misclassified 8 as Demented (false positives). Among 16 Demented patients, 10 were correctly classified (true positives), while 6 were misclassified as Nondemented (false negatives).

These results highlight a noticeable asymmetry in classification performance. While the model showed moderate ability to detect dementia, it struggled with specificity, as evidenced by the high number of false positives. This underperformance likely stems from the model's lack of slice-specific attention, which we introduced in our next experiment.

5.0.2. Experiment E-1: SE-ViT Binary Classification Result
In Experiment E-1, we introduced our custom Slice-Aware Vision Transformer model enhanced with a Squeeze and Excitation (SE) gating mechanism. Unlike the baseline ViT, this model was trained end-to-end with full control over the preprocessing pipeline, slice selection, and architectural parameters.

The SE-ViT model achieved a notable improvement in performance, with an accuracy of 71.7% and an F1 score of 0.716 on the held-out test set. The corresponding confusion matrix is shown in Figure 10.

Figure 10. Confusion matrix for SE-ViT model (E-1) on test set.

The model correctly classified 21 out of 23 Nondemented patients (true negatives) and misclassified only 2 as Demented (false positives). Among the 16 Demented patients, 9 were correctly identified (true positives), while 7 were misclassified as Nondemented (false negatives).

Compared to the baseline (E-0), the SE-ViT model significantly reduced false positives, demonstrating improved specificity. This improvement is attributed to the SE module's ability to rank and prioritize diagnostically informative axial slices, primarily from the medial temporal lobe, during training and inference.

These results confirm that incorporating slice-level attention not only aligns better with neuropathological expectations but also enhances the model's ability to focus on disease-relevant patterns while ignoring non-contributory slices.

Figure 11 shows the frequency with which different axial slices were selected by the SE module during validation. Notably, slices near indices 140–180 were chosen most often, aligning with the location of medial temporal structures like the hippocampus, which are known to exhibit early signs of atrophy in Alzheimer's disease.

Figure 11. Most frequently selected slices by SE module on the validation set. Peaks around slices 140–180 reflect medial temporal lobe importance.
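The E-0 metrics can be reproduced directly from the confusion-matrix counts reported in §5.0.1. The short sketch below recovers both numbers; the support-weighted F1 averaging is our inference from the reported 0.643.

```python
# Reproduce the E-0 metrics from its confusion-matrix counts:
# Nondemented: 15 correct, 8 misclassified; Demented: 10 correct, 6 misclassified.
tp, fn = 10, 6   # Demented (positive class)
tn, fp = 15, 8   # Nondemented (negative class)

accuracy = (tp + tn) / (tp + tn + fp + fn)

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Per-class F1, then support-weighted average over the 39 test subjects
f1_dem = f1(tp / (tp + fp), tp / (tp + fn))      # Demented, support 16
f1_non = f1(tn / (tn + fn), tn / (tn + fp))      # Nondemented, support 23
f1_weighted = (16 * f1_dem + 23 * f1_non) / 39

print(round(accuracy, 3), round(f1_weighted, 3))  # 0.641 0.643
```

The match with the reported 64.1% accuracy and 0.643 F1 suggests the paper's F1 scores are support-weighted averages over both classes.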
5.0.3. Experiment E-2: Direct Multi-class Progression Staging Result
In Experiment E-2, we adapted the SE-ViT architecture to perform direct four-way classification of Alzheimer's disease progression using the Clinical Dementia Rating (CDR) scale. The model predicted one of four stages: Nondemented (CDR 0), Very Mild (CDR 0.5), Mild (CDR 1), and Moderate or worse (CDR 2+). This setup sought to evaluate the model's ability to distinguish fine-grained disease severity levels directly.

The model achieved an accuracy of 64.1% and an F1 score of 0.58 on the validation set. The corresponding confusion matrix is shown in Figure 12.

Figure 12. Confusion matrix for SE-ViT (E-2) in direct 4-class progression staging.

The model correctly classified 20 out of 22 Nondemented patients. However, performance degraded across the dementia spectrum: only 5 out of 12 Very Mild cases were correctly identified, with the rest misclassified as Nondemented; all Mild cases were misclassified, predominantly as Nondemented; and no Moderate cases were detected.

While overall accuracy remained comparable to the baseline, the model struggled to separate adjacent CDR stages. This confusion is likely due to subtle anatomical differences between early-stage classes, which the model found difficult to resolve in a flat classification setup. The severe underperformance on Moderate cases further suggests a need for a stronger inductive bias or staging structure. These shortcomings motivated the hierarchical setup explored in Experiment E-3, which separates dementia detection and severity estimation into distinct, dedicated tasks.

5.0.4. Experiment E-3: Hierarchical Staging Result
In Experiment E-3, we adopted a hierarchical classification strategy that mirrors real-world diagnostic workflows: the model first performed binary classification (Demented vs. Nondemented), followed by a second-stage classifier that determined the severity level for demented cases. Both stages were trained jointly in a multi-head architecture, allowing parameter sharing and optimizing for both coarse detection and fine-grained staging.

The model achieved an accuracy of 74.0% and an F1 score of 0.72 on the validation set. The resulting 4-class confusion matrix is presented in Figure 13.

Figure 13. Confusion matrix for SE-ViT (E-3) hierarchical staging model.

The model correctly classified 21 out of 23 Nondemented subjects. Most notably, it achieved:

• Improved sensitivity to the Very Mild class: 6 out of 11 cases correctly classified (vs. 5/12 in E-2).
• Mild-class detection with limited accuracy: 1 correct, 3 misclassified as Nondemented or Very Mild.
• No Moderate cases detected, likely due to class imbalance and limited training samples for that category.

These results represent a significant improvement in early-stage detection, particularly for the Very Mild class, which holds high clinical value for early intervention. The separation of detection and staging allowed the model to specialize in both tasks independently, leading to a more interpretable and effective framework compared to flat softmax classification.
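The two-stage decision rule used in E-3 can be sketched as follows. This is a minimal NumPy illustration; the logits passed in are stand-ins for the outputs of the sigmoid and 3-way softmax heads attached to the shared CLS embedding described in §4.3, and the threshold of 0.5 is an assumed default.

```python
import numpy as np

STAGES = ["Nondemented", "Very Mild", "Mild", "Moderate+"]

def hierarchical_predict(binary_logit, severity_logits, threshold=0.5):
    """Stage 1: sigmoid Demented-vs-Nondemented gate.
    Stage 2: 3-way softmax over severity, applied only to Demented cases."""
    p_dem = 1.0 / (1.0 + np.exp(-binary_logit))
    if p_dem < threshold:
        return "Nondemented"
    # Softmax over the three severity levels (shifted for stability)
    sev = np.exp(severity_logits - severity_logits.max())
    sev /= sev.sum()
    return STAGES[1 + int(np.argmax(sev))]

# Usage: a confidently demented case whose severity head favors "Very Mild"
label = hierarchical_predict(2.0, np.array([1.5, 0.3, -0.8]))
# -> "Very Mild"
```

Note that the severity head is never consulted for subjects gated out at stage 1, which is how the hierarchy isolates the binary decision from the class-imbalanced staging problem.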
5.0.5. Overall Comparison Across Experiments
To consolidate insights from our experimental pipeline, we present a summary comparison in Figure 14. Each model reflects a stepwise refinement in architecture and training strategy, progressing from a baseline ViT to slice-aware and hierarchically structured approaches.

Figure 14. Experiment results

As seen in Figure 14, the baseline model (E-0) struggled with false positives and lacked contextual focus, motivating the integration of slice-level attention in E-1. The SE-ViT binary classifier (E-1) significantly improved specificity by leveraging medial temporal saliency. However, direct four-way staging in E-2 introduced confusion between adjacent classes, particularly Mild and Moderate. The hierarchical model (E-3) achieved the best balance between accuracy and clinical relevance, demonstrating improved sensitivity to early dementia stages and clearer separation of classification responsibilities. This stepwise evolution confirms the value of both anatomical priors and architectural structuring in medical imaging pipelines.

6. Discussion

6.1. Interpreting the Performance Gains
The experimental series demonstrates that explicit slice-level attention is a decisive factor in MRI-based Alzheimer's staging.

Figure 15. Interpreting the Performance Gains

These findings align with neuropathological evidence: the mesial temporal lobe manifests the earliest and most pronounced atrophy in AD. By allowing the SE gate to up-weight such slices, we implicitly encode domain knowledge without manual ROI segmentation, preserving end-to-end differentiability.

6.2. Clinical Relevance

• Early-stage sensitivity: SE-ViT improves recall for the Very Mild group, arguably the most clinically valuable cohort. Detecting such prodromal changes can facilitate earlier lifestyle or pharmacological interventions.
• Workflow synergy: the hierarchical model mirrors the diagnostic pipeline, in which radiologists first confirm "presence of dementia" (binary) before assigning severity (ordinal). Embedding this structure in the model yields a more intuitive user experience.
6.3. Limitations
1. Cohort Size & Demographics: OASIS-2 involves
≈150 subjects, predominantly Caucasian. This limits
statistical power and external validity across ethnic
groups or scanner vendors.
2. Longitudinal Ignorance: Our current framework
treats each session independently; temporal progression cues (atrophy trajectory) are not explicitly modeled.
3. Single Modality Constraint: MRI alone may not capture metabolic changes detectable via PET or CSF
biomarkers; multimodal fusion could enhance staging
accuracy.
4. Hyperparameter Sensitivity: Top-k = 3 was selected
heuristically. Although ablation shows it outperforms
k = 1, broader search (k = 2–5) may further optimize
the trade-off between information retention and noise.
5. Potential Slice Order Bias: Axial ordering is fixed;
pathologies in oblique planes might be overlooked. Incorporating multi-plane slices (axial, coronal, sagittal)
may provide a fuller context.
6.4. Future Work
• Temporal Transformers: Explore models like TimeSformer or recurrent Vision Transformers to capture how
a patient’s condition changes over time across multiple
scan sessions.
• Multimodal Extension: Combine MRI with other
data sources such as FDG-PET scans and genetic information using cross-modal attention to build a more
comprehensive diagnostic tool.
• Uncertainty Quantification: Add methods such as Monte Carlo dropout or deep ensembles to estimate how confident the model is in its predictions, an important step toward clinical reliability.
• Broader Applications: Our combined approach of SE
and Vision Transformers can also be applied to other
medical fields that use 3D scan data, such as CT or
MRI scans for cancer detection or orthopedic analysis.
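The uncertainty-quantification idea above can be illustrated with a toy Monte Carlo dropout loop: average the softmax predictions of many stochastic forward passes and use their spread as an uncertainty estimate. The single dropout-plus-linear "model" here is purely a stand-in of our own devising, not the SE-ViT.

```python
import numpy as np

def mc_dropout_predict(x, W, n_samples=100, p_drop=0.5, rng=None):
    """Toy MC dropout: average softmax predictions over stochastic
    forward passes of a single dropout + linear layer."""
    rng = rng or np.random.default_rng(0)
    probs = []
    for _ in range(n_samples):
        mask = rng.random(x.shape) > p_drop       # units kept this pass
        h = (x * mask) / (1 - p_drop)             # inverted-dropout scaling
        logits = h @ W
        e = np.exp(logits - logits.max())         # stable softmax
        probs.append(e / e.sum())
    probs = np.stack(probs)
    # Mean = prediction; per-class std = a simple uncertainty estimate
    return probs.mean(axis=0), probs.std(axis=0)

rng = np.random.default_rng(1)
mean_p, std_p = mc_dropout_predict(rng.standard_normal(32),
                                   rng.standard_normal((32, 4)))
```

A high per-class standard deviation flags predictions that should be deferred to a clinician rather than trusted automatically.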
7. Conclusion
To conclude, the proposed SE-ViT pipeline advances MRI-based Alzheimer's staging by combining slice-level saliency learning with global transformer reasoning. Its SE module highlights diagnostically relevant slices without manual ROI selection, while the hierarchical classification approach improves early-stage detection, particularly for Very Mild cases. Moreover, its modular and generalizable design extends to other 3D imaging tasks such as tumor staging, orthopedic analysis, and organ assessment, supporting future integration with multimodal data and temporal modeling.
8. Contributions
Dr. Murtaza Taj supervised the research, provided expert guidance throughout the project, and offered critical feedback on the model design and evaluation. Sheraz and Umair contributed equally to the work, each designing and conducting two key experiments, including model training, analysis, and result interpretation. Yahya Khawaja offered architectural insights and advised on the training pipeline and experimental setup. All authors discussed the results and contributed to the final manuscript.
References

Abunadi, I. (2022). Deep and hybrid learning of MRI diagnosis for early detection of the progression stages in Alzheimer's disease. Connection Science, 34(1). https://doi.org/10.1080/-

Malik, I., Iqbal, A., Gu, Y. H., & Al-Antari, M. A. (2024). Deep learning for Alzheimer's disease prediction: A comprehensive review. Diagnostics, 14(12), 1281. https://doi.org/10.3390/diagnostics-

Li, H., Habes, M., Wolk, D. A., & Fan, Y. (2019). A deep learning model for early prediction of Alzheimer's disease dementia based on hippocampal magnetic resonance imaging data. Alzheimer's & Dementia, 15(8). https://doi.org/10.1016/j.jalz-

Kim, M., Kim, J., Qu, J., Huang, H., Long, Q., Sohn, K., Kim, D., & Shen, L. (2021). Interpretable temporal graph neural network for prognostic prediction of Alzheimer's disease using longitudinal neuroimaging data. In 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). https://doi.org/10.1109/bibm-

Ocasio, E., & Duong, T. Q. (2021). Deep learning prediction of mild cognitive impairment conversion to Alzheimer's disease at 3 years after diagnosis using longitudinal and whole-brain 3D MRI. PeerJ Computer Science, 7, e560. https://doi.org/10.7717/peerj-cs.560

fawadkhan. (n.d.). ViT FineTuned on ImagesOASIS. Hugging Face. https://huggingface.co/fawadkhan/ViT_FineTuned_on_ImagesOASIS