Continuous Sign Language Recognition (CSLR) faces multiple challenges, including significant inter-signer variability and poor generalization to novel sentence structures. Traditional solutions frequently fail to handle these issues efficiently. To overcome these limitations, we propose a dual-architecture framework. For the Signer-Independent (SI) challenge, we propose a Signer-Invariant Conformer that combines convolutions with multi-head self-attention to learn robust, signer-agnostic representations from pose-based skeletal keypoints. For the Unseen-Sentences (US) task, we designed a Multi-Scale Fusion Transformer with a novel dual-path temporal encoder that captures fine-grained posture dynamics, enabling the model to comprehend novel grammatical compositions. Experiments on the challenging Isharah-1000 dataset establish a new standard for both CSLR benchmarks. The proposed Conformer architecture achieves a Word Error Rate (WER) of 13.07% on the SI challenge, a 13.53% reduction from the state-of-the-art. On the US task, the Transformer model scores a WER of 47.78%, surpassing previous work. In the SignEval 2025 CSLR challenge, our team placed 2nd in the US task and 4th in the SI task, demonstrating the effectiveness of these models. The findings validate our key hypothesis: designing task-specific networks for the particular challenges of CSLR yields considerable performance improvements and establishes a new baseline for further research. The source code is available at: https://github.com/rezwanh001/MSLR-Pose86K-CSLR-Isharah.
@article{haque2025signer,title={A Signer-Invariant Conformer and Multi-Scale Fusion Transformer for Continuous Sign Language Recognition},author={Haque, Md Rezwanul and Islam, Md Milon and Raju, SM and Karray, Fakhri},journal={arXiv preprint arXiv:2508.09372},year={2025},}
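To make the Conformer design above concrete, the following is a minimal, hypothetical PyTorch sketch of one encoder block of the kind the SI model builds on: a half-step feed-forward module, multi-head self-attention, and a depthwise temporal convolution over pose-keypoint frames. The model dimension, head count, kernel size, and input shape are illustrative assumptions, not the paper's hyperparameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvModule(nn.Module):
    """Depthwise temporal convolution over the frame axis (assumed design)."""
    def __init__(self, d_model, kernel=31):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.pw1 = nn.Conv1d(d_model, 2 * d_model, 1)   # pointwise expand
        self.glu = nn.GLU(dim=1)
        self.dw = nn.Conv1d(d_model, d_model, kernel,
                            padding=kernel // 2, groups=d_model)
        self.bn = nn.BatchNorm1d(d_model)
        self.pw2 = nn.Conv1d(d_model, d_model, 1)       # pointwise project

    def forward(self, x):                    # x: (batch, frames, d_model)
        y = self.norm(x).transpose(1, 2)     # -> (batch, d_model, frames)
        y = self.glu(self.pw1(y))
        y = self.pw2(F.silu(self.bn(self.dw(y))))
        return x + y.transpose(1, 2)         # residual back to (B, T, C)

class ConformerBlock(nn.Module):
    """Half-step FFN -> self-attention -> conv module -> half-step FFN."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.ff1 = nn.Sequential(nn.LayerNorm(d_model),
                                 nn.Linear(d_model, 4 * d_model), nn.SiLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = ConvModule(d_model)
        self.ff2 = nn.Sequential(nn.LayerNorm(d_model),
                                 nn.Linear(d_model, 4 * d_model), nn.SiLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.out_norm = nn.LayerNorm(d_model)

    def forward(self, x):                    # x: (batch, frames, d_model)
        x = x + 0.5 * self.ff1(x)
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        x = self.conv(x)
        x = x + 0.5 * self.ff2(x)
        return self.out_norm(x)

frames = torch.randn(2, 100, 256)  # e.g. 100 frames of projected keypoints
print(ConformerBlock()(frames).shape)        # torch.Size([2, 100, 256])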
ICCV 2025 Workshop
FusionEnsemble-Net: An Attention-Based Ensemble of Spatiotemporal Networks for Multimodal Sign Language Recognition
Md Milon Islam, Md Rezwanul Haque, SM Raju, and 1 more author
Accurate recognition of sign language in healthcare communication poses a significant challenge, requiring frameworks that can reliably interpret complex multimodal gestures. To address this, we propose FusionEnsemble-Net, a novel attention-based ensemble of spatiotemporal networks that dynamically fuses visual and motion data to enhance recognition accuracy. The proposed approach processes RGB video and range-Doppler map radar modalities synchronously through four different spatiotemporal networks. For each network, features from both modalities are continuously fused using an attention-based fusion module before being fed into an ensemble of classifiers. Finally, the outputs of these four fused channels are combined in an ensemble classification head, thereby enhancing the model’s robustness. Experiments demonstrate that FusionEnsemble-Net outperforms state-of-the-art approaches with a test accuracy of 99.44% on the large-scale MultiMeDaLIS dataset for Italian Sign Language. Our findings indicate that an ensemble of diverse spatiotemporal networks, unified by attention-based fusion, yields a robust and accurate framework for complex, multimodal isolated gesture recognition tasks. The source code is available at: https://github.com/rezwanh001/Multimodal-Isolated-Italian-Sign-Language-Recognition.
@article{islam2025fusionensemble,title={FusionEnsemble-Net: An Attention-Based Ensemble of Spatiotemporal Networks for Multimodal Sign Language Recognition},author={Islam, Md Milon and Haque, Md Rezwanul and Raju, SM and Karray, Fakhri},journal={arXiv preprint arXiv:2508.09362},year={2025},}
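As an illustration of the attention-based fusion step described above, here is a small, hypothetical sketch that learns per-timestep weights for the RGB and radar feature streams and mixes them; the feature size and gating design are assumptions rather than the paper's exact module.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, d=512):
        super().__init__()
        # Learn a per-timestep weight for each modality from their concatenation.
        self.gate = nn.Sequential(nn.Linear(2 * d, d), nn.Tanh(), nn.Linear(d, 2))

    def forward(self, rgb, radar):           # each: (batch, timesteps, d)
        w = torch.softmax(self.gate(torch.cat([rgb, radar], dim=-1)), dim=-1)
        # Convex combination of the two streams at every timestep.
        return w[..., :1] * rgb + w[..., 1:] * radar   # (batch, timesteps, d)

rgb, radar = torch.randn(2, 16, 512), torch.randn(2, 16, 512)
print(AttentionFusion()(rgb, radar).shape)   # torch.Size([2, 16, 512])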
IEEE SMC 2025
MDD-Net: Multimodal Depression Detection through Mutual Transformer
Md Rezwanul Haque, Md Milon Islam, SM Raju, and 3 more authors
Depression is a major mental health condition that severely impacts the emotional and physical well-being of individuals. The ease of data collection from social media platforms has attracted significant interest in properly utilizing this information for mental health research. In this work, we propose a Multimodal Depression Detection Network (MDD-Net) that utilizes acoustic and visual data obtained from social media networks, where mutual transformers are exploited to extract and fuse multimodal features for efficient depression detection. The MDD-Net consists of four core modules: an acoustic feature extraction module for retrieving relevant acoustic attributes, a visual feature extraction module for extracting significant high-level patterns, a mutual transformer for computing correlations among the generated features and fusing them across modalities, and a detection layer for detecting depression from the fused feature representations. Extensive experiments are performed on the multimodal D-Vlog dataset, and the findings reveal that the developed network surpasses the state-of-the-art by up to 17.37% in F1-Score, demonstrating the superior performance of the proposed system. The source code is accessible at https://github.com/rezwanh001/Multimodal-Depression-Detection.
@article{haque2025mdd,title={MDD-Net: Multimodal Depression Detection through Mutual Transformer},author={Haque, Md Rezwanul and Islam, Md Milon and Raju, SM and Altaheri, Hamdi and Nassar, Lobna and Karray, Fakhri},journal={arXiv preprint arXiv:2508.08093},year={2025},}
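The mutual-transformer idea above can be pictured as a pair of cross-attention passes in which each modality queries the other. The following hypothetical PyTorch snippet illustrates this; the dimensions, pooling, and layer layout are assumptions, not the published architecture.

import torch
import torch.nn as nn

class MutualTransformer(nn.Module):
    def __init__(self, d=256, n_heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(d)
        self.norm_v = nn.LayerNorm(d)

    def forward(self, audio, video):         # (B, Ta, d), (B, Tv, d)
        # Audio queries attend over visual keys/values, and vice versa.
        a, _ = self.a2v(audio, video, video)
        v, _ = self.v2a(video, audio, audio)
        a = self.norm_a(audio + a)
        v = self.norm_v(video + v)
        # Pool and concatenate the mutually-attended streams for detection.
        return torch.cat([a.mean(dim=1), v.mean(dim=1)], dim=-1)  # (B, 2d)

audio, video = torch.randn(4, 50, 256), torch.randn(4, 30, 256)
print(MutualTransformer()(audio, video).shape)   # torch.Size([4, 512])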
IEEE SMC 2025
MMFformer: Multimodal Fusion Transformer Network for Depression Detection
Md Rezwanul Haque, Md Milon Islam, SM Raju, and 3 more authors
Depression is a serious mental illness that significantly affects an individual’s well-being and quality of life, making early detection crucial for adequate care and treatment. Detecting depression is often difficult, as it relies primarily on subjective evaluations during clinical interviews. Hence, early diagnosis of depression from social media content has become a prominent research area. The extensive and diverse nature of user-generated information poses a significant challenge, limiting the accurate extraction of relevant temporal information and the effective fusion of data across multiple modalities. This paper introduces MMFformer, a multimodal depression detection network designed to retrieve depressive spatio-temporal high-level patterns from multimodal social media information. A transformer network with residual connections captures spatial features from videos, and a transformer encoder is exploited to capture important temporal dynamics in audio. Moreover, the fusion architecture fuses the extracted features through late and intermediate fusion strategies to identify the most relevant intermodal correlations. Finally, the proposed network is assessed on two large-scale depression detection datasets, and the results clearly reveal that it surpasses existing state-of-the-art approaches, improving the F1-Score by 13.92% on the D-Vlog dataset and 7.74% on the LMVD dataset. The code is made publicly available at https://github.com/rezwanh001/Large-Scale-Multimodal-Depression-Detection.
@article{haque2025mmfformer,title={MMFformer: Multimodal Fusion Transformer Network for Depression Detection},author={Haque, Md Rezwanul and Islam, Md Milon and Raju, SM and Altaheri, Hamdi and Nassar, Lobna and Karray, Fakhri},journal={arXiv preprint arXiv:2508.06701},year={2025},}
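To clarify the two fusion strategies mentioned in the abstract, here is a brief, hypothetical sketch contrasting intermediate (feature-level) and late (decision-level) fusion; the dimensions and the two-class head are illustrative assumptions.

import torch
import torch.nn as nn

def intermediate_fusion(video_feat, audio_feat, head):
    # Fuse at the feature level, then classify once.
    return head(torch.cat([video_feat, audio_feat], dim=-1))

def late_fusion(video_logits, audio_logits, w=0.5):
    # Fuse at the decision level with a convex combination of logits.
    return w * video_logits + (1 - w) * audio_logits

head = nn.Linear(512, 2)  # assumes 256-d features per modality, 2 classes
v, a = torch.randn(8, 256), torch.randn(8, 256)
print(intermediate_fusion(v, a, head).shape)     # torch.Size([8, 2])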
IJCNN 2025
GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning
SM Raju, Md Milon Islam, Md Rezwanul Haque, and 2 more authors
Microscopic assessment of histopathology images is vital for accurate cancer diagnosis and treatment. Whole Slide Image (WSI) classification and captioning have become crucial tasks in computer-aided pathology. However, microscopic WSIs face challenges such as redundant patches and unknown patch positions due to subjective pathologist captures. Moreover, generating automatic pathology captions remains a significant challenge. To address these issues, we introduce GNN-ViTCap, a novel framework for classification and caption generation from histopathological microscopic images. First, a visual feature extractor generates patch embeddings. Redundant patches are then removed by dynamically clustering these embeddings using deep embedded clustering and selecting representative patches via a scalar dot attention mechanism. We build a graph by connecting each node to its nearest neighbors in the similarity matrix and apply a graph neural network to capture both local and global context. The aggregated image embeddings are projected into the language model’s input space through a linear layer and combined with caption tokens to fine-tune a large language model. We validate our method on the BreakHis and PatchGastric datasets. GNN-ViTCap achieves an F1-score of 0.934 and an AUC of 0.963 for classification, along with a BLEU-4 score of 0.811 and a METEOR score of 0.569 for captioning. Experimental results demonstrate that GNN-ViTCap outperforms state-of-the-art approaches, offering a reliable and efficient solution for microscopy-based patient diagnosis.
@article{raju2025gnn,title={GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning},author={Raju, SM and Islam, Md Milon and Haque, Md Rezwanul and Altaheri, Hamdi and Karray, Fakhri},journal={arXiv preprint arXiv:2507.07006},year={2025},}
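The graph-construction step described above (connect each patch to its nearest neighbors in the similarity matrix, then aggregate with a GNN) can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration; k, the embedding size, and the message-passing layer are assumptions, not the paper's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

def knn_adjacency(emb, k=8):
    # emb: (N, d) patch embeddings; returns a row-normalized (N, N) adjacency.
    unit = F.normalize(emb, dim=-1)
    sim = unit @ unit.T                                 # cosine similarities
    idx = sim.topk(k + 1, dim=-1).indices[:, 1:]        # drop the self-match
    adj = torch.zeros_like(sim).scatter_(1, idx, 1.0)
    return adj / adj.sum(dim=-1, keepdim=True)

class GNNLayer(nn.Module):
    def __init__(self, d=384):
        super().__init__()
        self.lin = nn.Linear(2 * d, d)

    def forward(self, x, adj):
        neigh = adj @ x                                 # mean of neighbor features
        return F.relu(self.lin(torch.cat([x, neigh], dim=-1)))

x = torch.randn(100, 384)          # 100 representative patch embeddings
h = GNNLayer()(x, knn_adjacency(x))
slide_embedding = h.mean(dim=0)    # pooled slide-level representation, (384,)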
2023
ICCIT 2023
Body Weight Estimation Using Smartphone Based Photoplethysmography Signal
Md Rezwanul Haque, MHC Ritom, AA Noman, and 3 more authors
In 2023 26th International Conference on Computer and Information Technology (ICCIT), 2023
@inproceedings{haque2023body,title={Body Weight Estimation Using Smartphone Based Photoplethysmography Signal},author={Haque, Md Rezwanul and Ritom, MHC and Noman, AA and Haque, E and Ahmed, F and others},booktitle={2023 26th International Conference on Computer and Information Technology (ICCIT)},pages={1--6},year={2023},organization={IEEE},}
ATC 2023
Smartphone Based BP Level Monitoring System Using DNN Model
Md Rezwanul Haque, Abdullah Al Noman, Emranul Haque, and 2 more authors
In 2023 International Conference on Advanced Technologies for Communications (ATC), 2023
@inproceedings{haque2023smartphone,title={Smartphone Based BP Level Monitoring System Using DNN Model},author={Haque, Md Rezwanul and Al Noman, Abdullah and Haque, Emranul and Ahmed, Feroz and others},booktitle={2023 International Conference on Advanced Technologies for Communications (ATC)},pages={12--18},year={2023},organization={IEEE},}
ICDAR 2023
BaDLAD: A large multi-domain Bengali document layout analysis dataset
Md Istiak Hossain Shihab, Md Rakibul Hasan, Mahfuzur Rahman Emon, and 10 more authors
In International Conference on Document Analysis and Recognition, 2023
@inproceedings{shihab2023badlad,title={BaDLAD: A large multi-domain Bengali document layout analysis dataset},author={Shihab, Md Istiak Hossain and Hasan, Md Rakibul and Emon, Mahfuzur Rahman and Hossen, Syed Mobassir and Ansary, Md Nazmuddoha and Ahmed, Intesur and Rakib, Fazle Rabbi and Dhruvo, Shahriar Elahi and Dip, Souhardya Saha and Pavel, Akib Hasan and Meghla, Marsia Haque and Haque, Md Rezwanul and others},booktitle={International Conference on Document Analysis and Recognition},pages={326--341},year={2023},organization={Springer},}
2021
IEEE Access
Corrections to" A Novel Technique for Non-Invasive Measurement of Human Blood Component Levels From Fingertip Video Using DNN Based Models".
Md Rezwanul Haque, SM Taslim Uddin Raju, MD Asaf-Uddowla Golap, and 1 more author
@article{haque2021corrections,title={Corrections to" A Novel Technique for Non-Invasive Measurement of Human Blood Component Levels From Fingertip Video Using DNN Based Models".},author={Haque, Md Rezwanul and Raju, SM Taslim Uddin and Golap, MD Asaf-Uddowla and Hashem, MMA},journal={IEEE Access},volume={9},pages={84178--84179},year={2021},}
Springer
Prediction of cervical cancer from behavior risk using machine learning techniques
Laboni Akter, Md Milon Islam, Mabrook S Al-Rakhami, and 1 more author
@article{akter2021prediction,title={Prediction of cervical cancer from behavior risk using machine learning techniques},author={Akter, Laboni and Islam, Md Milon and Al-Rakhami, Mabrook S and Haque, Md Rezwanul},journal={SN Computer Science},volume={2},number={3},pages={177},year={2021},publisher={Springer},}
Elsevier
Hemoglobin and glucose level estimation from PPG characteristics features of fingertip video using MGGP-based model
Md Asaf-uddowla Golap, SM Taslim Uddin Raju, Md Rezwanul Haque, and 1 more author
@article{golap2021hemoglobin,title={Hemoglobin and glucose level estimation from PPG characteristics features of fingertip video using MGGP-based model},author={Golap, Md Asaf-uddowla and Raju, SM Taslim Uddin and Haque, Md Rezwanul and Hashem, MMA},journal={Biomedical Signal Processing and Control},volume={67},pages={102478},year={2021},publisher={Elsevier},}
Springer
Scalable telehealth services to combat novel coronavirus (COVID-19) pandemic
Shah Muhammad Azmat Ullah, Md Milon Islam, Saifuddin Mahmud, and 3 more authors
@article{ullah2021scalable,title={Scalable telehealth services to combat novel coronavirus (COVID-19) pandemic},author={Ullah, Shah Muhammad Azmat and Islam, Md Milon and Mahmud, Saifuddin and Nooruddin, Sheikh and Raju, SM Taslim Uddin and Haque, Md Rezwanul},journal={SN Computer Science},volume={2},pages={1--8},year={2021},publisher={Springer},}
IEEE Access
A novel technique for non-invasive measurement of human blood component levels from fingertip video using DNN based models
Md Rezwanul Haque, SM Taslim Uddin Raju, Md Asaf-Uddowla Golap, and 1 more author
Blood components such as hemoglobin, glucose, and creatinine are essential for monitoring one’s health condition. Current blood component measurement approaches still depend on invasive techniques that are painful and uncomfortable for patients. To facilitate measurement at home, we propose a novel non-invasive technique to measure blood hemoglobin, glucose, and creatinine levels from Photoplethysmography (PPG) signals using Deep Neural Networks (DNN). Fingertip videos from 93 subjects were collected using a smartphone. A PPG signal is generated from each video, and 46 characteristic features are extracted from the PPG signal, its derivatives (1st and 2nd), and Fourier analysis. Age and gender are also included as features due to their significant effects on hemoglobin, glucose, and creatinine. Correlation-based feature selection (CFS) with a genetic algorithm (GA) is used to select the optimal features and avoid redundancy and overfitting. Finally, DNN-based models are developed to estimate the blood Hemoglobin (Hb), Glucose (Gl), and Creatinine (Cr) levels from the selected features. The approach achieves estimation accuracies of R² = 0.922 for Hb, R² = 0.902 for Gl, and R² = 0.969 for Cr. Experimental results show that the proposed method is suitable for clinical use to measure human blood component levels without drawing blood samples. This paper also reveals that smartphone-based PPG signals have great potential for measuring different blood components.
@article{haque2021novel,title={A novel technique for non-invasive measurement of human blood component levels from fingertip video using DNN based models},author={Haque, Md Rezwanul and Raju, SM Taslim Uddin and Golap, Md Asaf-Uddowla and Hashem, MMA},journal={IEEE Access},volume={9},pages={19025--19042},year={2021},publisher={IEEE},}
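As a rough illustration of the estimation stage, the sketch below follows the shape of the pipeline described above: a small per-target DNN regressor over the selected PPG features. The layer sizes and the number of selected features are assumptions, not the paper's architecture.

import torch
import torch.nn as nn

def make_regressor(n_sel=20):
    # One DNN per target (Hb, Gl, or Cr), predicting a single scalar level.
    return nn.Sequential(
        nn.Linear(n_sel, 64), nn.ReLU(),
        nn.Linear(64, 32), nn.ReLU(),
        nn.Linear(32, 1),
    )

hb_model = make_regressor()
features = torch.randn(4, 20)   # a batch of selected PPG feature vectors
print(hb_model(features).shape) # torch.Size([4, 1])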
2020
Springer
Deep learning applications to combat novel coronavirus (COVID-19) pandemic
Amanullah Asraf, Md Zabirul Islam, Md Rezwanul Haque, and 1 more author
@article{asraf2020deep,title={Deep learning applications to combat novel coronavirus (COVID-19) pandemic},author={Asraf, Amanullah and Islam, Md Zabirul and Haque, Md Rezwanul and Islam, Md Milon},journal={SN Computer Science},volume={1},number={6},pages={363},year={2020},publisher={Springer},}
2019
MECS Press
A computer vision based lane detection approach
Md Rezwanul Haque, Md Milon Islam, Kazi Saeed Alam, and 2 more authors
International Journal of Image, Graphics and Signal Processing, 2019
@article{haque2019computer,title={A computer vision based lane detection approach},author={Haque, Md Rezwanul and Islam, Md Milon and Alam, Kazi Saeed and Iqbal, Hasib and Shaik, Md Ebrahim},journal={International Journal of Image, Graphics and Signal Processing},volume={10},number={3},pages={27},year={2019},publisher={Modern Education and Computer Science Press},}
2018
IC4ME2 2018
Performance evaluation of random forests and artificial neural networks for the classification of liver disorder
Md Rezwanul Haque, Md Milon Islam, Hasib Iqbal, and 2 more authors
In 2018 international conference on computer, communication, chemical, material and electronic engineering (IC4ME2), 2018
@inproceedings{haque2018performance,title={Performance evaluation of random forests and artificial neural networks for the classification of liver disorder},author={Haque, Md Rezwanul and Islam, Md Milon and Iqbal, Hasib and Reza, Md Sumon and Hasan, Md Kamrul},booktitle={2018 international conference on computer, communication, chemical, material and electronic engineering (IC4ME2)},pages={1--5},year={2018},organization={IEEE},}
2017
R10-HTC 2017
Prediction of breast cancer using support vector machine and K-Nearest neighbors
Md Milon Islam, Hasib Iqbal, Md Rezwanul Haque, and 1 more author
In 2017 IEEE region 10 humanitarian technology conference (R10-HTC), 2017
@inproceedings{islam2017prediction,title={Prediction of breast cancer using support vector machine and K-Nearest neighbors},author={Islam, Md Milon and Iqbal, Hasib and Haque, Md Rezwanul and Hasan, Md Kamrul},booktitle={2017 IEEE region 10 humanitarian technology conference (R10-HTC)},pages={226--229},year={2017},organization={IEEE},}