publications | Md Rezwanul Haque

2025

ICCV 2025
A Signer-Invariant Conformer and Multi-Scale Fusion Transformer for Continuous Sign Language Recognition

Md Rezwanul Haque, Md Milon Islam, SM Raju, and 1 more author

In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Abs Bib HTML PDF

Continuous Sign Language Recognition (CSLR) faces multiple challenges, including significant inter-signer variability and poor generalization to novel sentence structures. Traditional solutions frequently fail to handle these issues efficiently. For overcoming these constraints, we propose a dual-architecture framework. For the Signer-Independent (SI) challenge, we propose a Signer-Invariant Conformer that combines convolutions with multi-head self-attention to learn robust, signer-agnostic representations from pose-based skeletal keypoints. For the Unseen-Sentences (US) task, we designed a Multi-Scale Fusion Transformer with a novel dual-path temporal encoder that captures both fine-grained posture dynamics, enabling the model’s ability to comprehend novel grammatical compositions. Experiments on the challenging Isharah-1000 dataset establish a new standard for both CSLR benchmarks. The proposed conformer architecture achieves a Word Error Rate (WER) of 13.07% on the SI challenge, a reduction of 13.53% from the state-of-the-art. On the US task, the transformer model scores a WER of 47.78%, surpassing previous work. In the SignEval 2025 CSLR challenge, our team placed 2nd in the US task and 4th in the SI task, demonstrating the performance of these models. The findings validate our key hypothesis: that developing task-specific networks designed for the particular challenges of CSLR leads to considerable performance improvements and establishes a new baseline for further research. The source code is available at: https://github.com/rezwanh001/MSLR-Pose86K-CSLR-Isharah.
@inproceedings{haque2025signer, title = {A Signer-Invariant Conformer and Multi-Scale Fusion Transformer for Continuous Sign Language Recognition}, author = {Haque, Md Rezwanul and Islam, Md Milon and Raju, SM and Karray, Fakhri}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision}, pages = {4931--4940}, year = {2025}, }
ICCV 2025
FusionEnsemble-Net: An Attention-Based Ensemble of Spatiotemporal Networks for Multimodal Sign Language Recognition

Md Milon Islam, Md Rezwanul Haque, SM Raju, and 1 more author

In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Abs Bib HTML PDF

Accurate recognition of sign language in healthcare communication poses a significant challenge, requiring frameworks that can accurately interpret complex multimodal gestures. To deal with this, we propose FusionEnsemble-Net, a novel attention-based ensemble of spatiotemporal networks that dynamically fuses visual and motion data to enhance recognition accuracy. The proposed approach processes RGB video and range Doppler map radar modalities synchronously through four different spatiotemporal networks. For each network, features from both modalities are continuously fused using an attention-based fusion module before being fed into an ensemble of classifiers. Finally, the outputs of these four different fused channels are combined in an ensemble classification head, thereby enhancing the model’s robustness. Experiments demonstrate that FusionEnsemble-Net outperforms state-of-the-art approaches with a test accuracy of 99.44% on the large-scale MultiMeDaLIS dataset for Italian Sign Language. Our findings indicate that an ensemble of diverse spatiotemporal networks, unified by attention-based fusion, yields a robust and accurate framework for complex, multimodal isolated gesture recognition tasks. The source code is available at: https://github.com/rezwanh001/Multimodal-Isolated-Italian-Sign-Language-Recognition.
@inproceedings{islam2025fusionensemble, title = {FusionEnsemble-Net: An Attention-Based Ensemble of Spatiotemporal Networks for Multimodal Sign Language Recognition}, author = {Islam, Md Milon and Haque, Md Rezwanul and Raju, SM and Karray, Fakhri}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision}, pages = {4924--4930}, year = {2025}, }
IEEE SMC 2025
MDD-Net: Multimodal Depression Detection through Mutual Transformer

Md Rezwanul Haque, Md Milon Islam, SM Raju, and 3 more authors

arXiv preprint arXiv:2508.08093, 2025

Abs Bib HTML PDF

Depression is a major mental health condition that severely impacts the emotional and physical well-being of individuals. The simple nature of data collection from social media platforms has attracted significant interest in properly utilizing this information for mental health research. A Multimodal Depression Detection Network (MDD-Net), utilizing acoustic and visual data obtained from social media networks, is proposed in this work where mutual transformers are exploited to efficiently extract and fuse multimodal features for efficient depression detection. The MDD-Net consists of four core modules: an acoustic feature extraction module for retrieving relevant acoustic attributes, a visual feature extraction module for extracting significant high-level patterns, a mutual transformer for computing the correlations among the generated features and fusing these features from multiple modalities, and a detection layer for detecting depression using the fused feature representations. The extensive experiments are performed using the multimodal D-Vlog dataset, and the findings reveal that the developed multimodal depression detection network surpasses the state-of-the-art by up to 17.37% for F1-Score, demonstrating the greater performance of the proposed system. The source code is accessible at https://github.com/rezwanh001/Multimodal-Depression-Detection.
@article{haque2025mdd, title = {MDD-Net: Multimodal Depression Detection through Mutual Transformer}, author = {Haque, Md Rezwanul and Islam, Md Milon and Raju, SM and Altaheri, Hamdi and Nassar, Lobna and Karray, Fakhri}, journal = {arXiv preprint arXiv:2508.08093}, year = {2025}, }
IEEE SMC 2025
MMFformer: Multimodal Fusion Transformer Network for Depression Detection

Md Rezwanul Haque, Md Milon Islam, SM Raju, and 3 more authors

arXiv preprint arXiv:2508.06701, 2025

Abs Bib HTML PDF

Depression is a serious mental health illness that significantly affects an individual’s well-being and quality of life, making early detection crucial for adequate care and treatment. Detecting depression is often difficult, as it is based primarily on subjective evaluations during clinical interviews. Hence, the early diagnosis of depression, thanks to the content of social networks, has become a prominent research area. The extensive and diverse nature of user-generated information poses a significant challenge, limiting the accurate extraction of relevant temporal information and the effective fusion of data across multiple modalities. This paper introduces MMFformer, a multimodal depression detection network designed to retrieve depressive spatio-temporal high-level patterns from multimodal social media information. The transformer network with residual connections captures spatial features from videos, and a transformer encoder is exploited to design important temporal dynamics in audio. Moreover, the fusion architecture fused the extracted features through late and intermediate fusion strategies to find out the most relevant intermodal correlations among them. Finally, the proposed network is assessed on two large-scale depression detection datasets, and the results clearly reveal that it surpasses existing state-of-the-art approaches, improving the F1-Score by 13.92% for D-Vlog dataset and 7.74% for LMVD dataset. The code is made available publicly at https://github.com/rezwanh001/Large-Scale-Multimodal-Depression-Detection.
@article{haque2025mmfformer, title = {MMFformer: Multimodal Fusion Transformer Network for Depression Detection}, author = {Haque, Md Rezwanul and Islam, Md Milon and Raju, SM and Altaheri, Hamdi and Nassar, Lobna and Karray, Fakhri}, journal = {arXiv preprint arXiv:2508.06701}, year = {2025}, }
IJCNN 2025
GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning

SM Raju, Md Milon Islam, Md Rezwanul Haque, and 2 more authors

arXiv preprint arXiv:2507.07006, 2025

Abs Bib HTML PDF

Microscopic assessment of histopathology images is vital for accurate cancer diagnosis and treatment. Whole Slide Image (WSI) classification and captioning have become crucial tasks in computer-aided pathology. However, microscopic WSI face challenges such as redundant patches and unknown patch positions due to subjective pathologist captures. Moreover, generating automatic pathology captions remains a significant challenge. To address these issues, we introduce a novel GNN-ViTCap framework for classification and caption generation from histopathological microscopic images. First, a visual feature extractor generates patch embeddings. Redundant patches are then removed by dynamically clustering these embeddings using deep embedded clustering and selecting representative patches via a scalar dot attention mechanism. We build a graph by connecting each node to its nearest neighbors in the similarity matrix and apply a graph neural network to capture both local and global context. The aggregated image embeddings are projected into the language model’s input space through a linear layer and combined with caption tokens to fine-tune a large language model. We validate our method on the BreakHis and PatchGastric datasets. GNN-ViTCap achieves an F1 score of 0.934 and an AUC of 0.963 for classification, along with a BLEU-4 score of 0.811 and a METEOR score of 0.569 for captioning. Experimental results demonstrate that GNN-ViTCap outperforms state of the art approaches, offering a reliable and efficient solution for microscopy based patient diagnosis.
@article{raju2025gnn, title = {GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning}, author = {Raju, SM and Islam, Md Milon and Haque, Md Rezwanul and Altaheri, Hamdi and Karray, Fakhri}, journal = {arXiv preprint arXiv:2507.07006}, year = {2025}, }

2023

ICCIT 2023

Body Weight Estimation Using Smartphone Based Photoplethysmography Signal

Md Rezwanul Haque, MHC Ritom, AA Noman, and 3 more authors

In 2023 26th International Conference on Computer and Information Technology (ICCIT), 2023

Bib HTML

@inproceedings{haque2023body,
  title = {Body Weight Estimation Using Smartphone Based Photoplethysmography Signal},
  author = {Haque, Md Rezwanul and Ritom, MHC and Noman, AA and Haque, E and Ahmed, F and others},
  booktitle = {2023 26th International Conference on Computer and Information Technology (ICCIT)},
  pages = {1--6},
  year = {2023},
  organization = {IEEE},
}

ATC 2023

Smartphone Based BP Level Monitoring System Using DNN Model

Md Rezwanul Haque, Abdullah Al Noman, Emranul Haque, and 2 more authors

In 2023 International Conference on Advanced Technologies for Communications (ATC), 2023

Bib HTML

@inproceedings{haque2023smartphone,
  title = {Smartphone Based BP Level Monitoring System Using DNN Model},
  author = {Haque, Md Rezwanul and Al Noman, Abdullah and Haque, Emranul and Ahmed, Feroz and others},
  booktitle = {2023 International Conference on Advanced Technologies for Communications (ATC)},
  pages = {12--18},
  year = {2023},
  organization = {IEEE},
}

ICDAR 2023

Badlad: A large multi-domain bengali document layout analysis dataset

Md Istiak Hossain Shihab, Md Rakibul Hasan, Mahfuzur Rahman Emon, and 10 more authors

In International Conference on Document Analysis and Recognition, 2023

Bib HTML

@inproceedings{shihab2023badlad,
  title = {Badlad: A large multi-domain bengali document layout analysis dataset},
  author = {Shihab, Md Istiak Hossain and Hasan, Md Rakibul and Emon, Mahfuzur Rahman and Hossen, Syed Mobassir and Ansary, Md Nazmuddoha and Ahmed, Intesur and Rakib, Fazle Rabbi and Dhruvo, Shahriar Elahi and Dip, Souhardya Saha and Pavel, Akib Hasan and Meghla, Marsia Haque and Haque, Md Rezwanul and others},
  booktitle = {International Conference on Document Analysis and Recognition},
  pages = {326--341},
  year = {2023},
  organization = {Springer},
}

2021

IEEE Access

Corrections to" A Novel Technique for Non-Invasive Measurement of Human Blood Component Levels From Fingertip Video Using DNN Based Models".

Md Rezwanul Haque, SM Taslim Uddin Raju, MD Asaf-Uddowla Golap, and 1 more author

IEEE Access, 2021

Bib PDF

@article{haque2021corrections,
  title = {Corrections to" A Novel Technique for Non-Invasive Measurement of Human Blood Component Levels From Fingertip Video Using DNN Based Models".},
  author = {Haque, Md Rezwanul and Raju, SM Taslim Uddin and Golap, MD Asaf-Uddowla and Hashem, MMA},
  journal = {IEEE Access},
  volume = {9},
  pages = {84178--84179},
  year = {2021},
}

Springer

Prediction of cervical cancer from behavior risk using machine learning techniques

Laboni Akter, Md Milon Islam, Mabrook S Al-Rakhami, and 1 more author

SN Computer Science, 2021

Bib HTML

@article{akter2021prediction,
  title = {Prediction of cervical cancer from behavior risk using machine learning techniques},
  author = {Akter, Laboni and Islam, Md Milon and Al-Rakhami, Mabrook S and Haque, Md Rezwanul},
  journal = {SN Computer Science},
  volume = {2},
  number = {3},
  pages = {177},
  year = {2021},
  publisher = {Springer},
}

Elsevier

Hemoglobin and glucose level estimation from PPG characteristics features of fingertip video using MGGP-based model

Md Asaf-uddowla Golap, SM Taslim Uddin Raju, Md Rezwanul Haque, and 1 more author

Biomedical Signal Processing and Control, 2021

Bib HTML PDF

@article{golap2021hemoglobin,
  title = {Hemoglobin and glucose level estimation from PPG characteristics features of fingertip video using MGGP-based model},
  author = {Golap, Md Asaf-uddowla and Raju, SM Taslim Uddin and Haque, Md Rezwanul and Hashem, MMA},
  journal = {Biomedical Signal Processing and Control},
  volume = {67},
  pages = {102478},
  year = {2021},
  publisher = {Elsevier},
}

Springer

Scalable telehealth services to combat novel coronavirus (COVID-19) pandemic

Shah Muhammad Azmat Ullah, Md Milon Islam, Saifuddin Mahmud, and 3 more authors

SN Computer Science, 2021

Bib HTML

@article{ullah2021scalable,
  title = {Scalable telehealth services to combat novel coronavirus (COVID-19) pandemic},
  author = {Ullah, Shah Muhammad Azmat and Islam, Md Milon and Mahmud, Saifuddin and Nooruddin, Sheikh and Raju, SM Taslim Uddin and Haque, Md Rezwanul},
  journal = {SN Computer Science},
  volume = {2},
  pages = {1--8},
  year = {2021},
  publisher = {Springer},
}

IEEE Access
A novel technique for non-invasive measurement of human blood component levels from fingertip video using DNN based models

Md Rezwanul Haque, SM Taslim Uddin Raju, Md Asaf-Uddowla Golap, and 1 more author

IEEE Access, 2021

Abs Bib HTML PDF

Blood components such as hemoglobin, glucose, and creatinine are essential for monitoring one’s health condition. The current blood component measurement approaches still depend on invasive techniques that are painful and uncomfortable for patients. To facilitate measurement at home, we proposed a novel non-invasive technique to measure blood hemoglobin, glucose, and creatinine levels based on Photoplethysmography (PPG) signals using Deep Neural Networks (DNN). Fingertip videos from 93 subjects have been collected using a smartphone. The PPG signal is generated from each video, and 46 characteristic features are then extracted from the PPG signal, its derivatives (1st and 2nd), and from Fourier analysis. Additionally, age and gender are also included as features due to their significant effects on hemoglobin, glucose, and creatinine. A correlation-based feature selection (CFS) using genetic algorithms (GA) has been used to select the optimal features to avoid redundancy and overfitting. Finally, DNN-based models have been developed to estimate the blood Hemoglobin (Hb), Glucose (Gl), and Creatinine (Cr) levels from the selected features. The approach provides the best-estimated accuracy of R² = 0.922 for Hb, R² = 0.902 for Gl, and R² = 0.969 for Cr. Experimental results show that the proposed method is a suitable technique to be used clinically to measure human blood component levels without taking blood samples. This paper also reveals that smartphone-based PPG signals have great potential to measure different blood components.
@article{haque2021novel, title = {A novel technique for non-invasive measurement of human blood component levels from fingertip video using DNN based models}, author = {Haque, Md Rezwanul and Raju, SM Taslim Uddin and Golap, Md Asaf-Uddowla and Hashem, MMA}, journal = {IEEE Access}, volume = {9}, pages = {19025--19042}, year = {2021}, publisher = {IEEE}, }

2020

Springer

Deep learning applications to combat novel coronavirus (COVID-19) pandemic

Amanullah Asraf, Md Zabirul Islam, Md Rezwanul Haque, and 1 more author

SN Computer Science, 2020

Bib HTML

@article{asraf2020deep,
  title = {Deep learning applications to combat novel coronavirus (COVID-19) pandemic},
  author = {Asraf, Amanullah and Islam, Md Zabirul and Haque, Md Rezwanul and Islam, Md Milon},
  journal = {SN Computer Science},
  volume = {1},
  pages = {1--7},
  year = {2020},
  publisher = {Springer},
}

Springer

Breast cancer prediction: a comparative study using machine learning techniques

Md Milon Islam, Md Rezwanul Haque, Hasib Iqbal, and 3 more authors

SN Computer Science, 2020

Bib HTML

@article{islam2020breast,
  title = {Breast cancer prediction: a comparative study using machine learning techniques},
  author = {Islam, Md Milon and Haque, Md Rezwanul and Iqbal, Hasib and Hasan, Md Munirul and Hasan, Mahmudul and Kabir, Muhammad Nomani},
  journal = {SN Computer Science},
  volume = {1},
  pages = {1--14},
  year = {2020},
  publisher = {Springer},
}

2019

MECS Press

A computer vision based lane detection approach

Md Rezwanul Haque, Md Milon Islam, Kazi Saeed Alam, and 2 more authors

International Journal of Image, Graphics and Signal Processing, 2019

Bib PDF

@article{haque2019computer,
  title = {A computer vision based lane detection approach},
  author = {Haque, Md Rezwanul and Islam, Md Milon and Alam, Kazi Saeed and Iqbal, Hasib and Shaik, Md Ebrahim},
  journal = {International Journal of Image, Graphics and Signal Processing},
  volume = {10},
  number = {3},
  pages = {27},
  year = {2019},
  publisher = {Modern Education and Computer Science Press},
}

2018

IC4ME2 2018

Performance evaluation of random forests and artificial neural networks for the classification of liver disorder

Md Rezwanul Haque, Md Milon Islam, Hasib Iqbal, and 2 more authors

In 2018 international conference on computer, communication, chemical, material and electronic engineering (IC4ME2), 2018

Bib HTML

@inproceedings{haque2018performance,
  title = {Performance evaluation of random forests and artificial neural networks for the classification of liver disorder},
  author = {Haque, Md Rezwanul and Islam, Md Milon and Iqbal, Hasib and Reza, Md Sumon and Hasan, Md Kamrul},
  booktitle = {2018 international conference on computer, communication, chemical, material and electronic engineering (IC4ME2)},
  pages = {1--5},
  year = {2018},
  organization = {IEEE},
}

2017

R10-HTC 2017

Prediction of breast cancer using support vector machine and K-Nearest neighbors

Md Milon Islam, Hasib Iqbal, Md Rezwanul Haque, and 1 more author

In 2017 IEEE region 10 humanitarian technology conference (R10-HTC), 2017

Bib HTML

@inproceedings{islam2017prediction,
  title = {Prediction of breast cancer using support vector machine and K-Nearest neighbors},
  author = {Islam, Md Milon and Iqbal, Hasib and Haque, Md Rezwanul and Hasan, Md Kamrul},
  booktitle = {2017 IEEE region 10 humanitarian technology conference (R10-HTC)},
  pages = {226--229},
  year = {2017},
  organization = {IEEE},
}