Scalable Real-Time Sentiment Analysis on Massive Social Media Streams Using Parallel and Distributed Computing

Hafsa Maryam; Ahmad Farid

doi:10.22161/eec.106.1

International Journal of Electrical, Electronics and Computers (IJEEC), Vol. 10, Issue 6, November-December 2025, Pages 1-6, doi:10.22161/eec.106.1

Article Info: Received: 06 Oct 2025; Accepted: 08 Nov 2025; Date of Publication: 18 Nov 2025

Abstract:

The rapid growth of social media streams has intensified the need for scalable, low-latency sentiment analysis pipelines that can operate under high-volume, real-time constraints. This paper proposes a distributed framework built on Apache Spark that performs massively parallel processing of text streams and integrates a fine-tuned large language model (LLM), Grok-4, for sentiment classification. The system combines micro-batch streaming, distributed tokenization, and GPU-accelerated model serving to achieve real-time inference at scale. Experiments on a 10-node cluster, using a synthetic dataset of 10,000 tweets extended to 1.2 million streaming events, demonstrate substantial performance gains: compared with single-node baselines, our approach achieves a 5.4× improvement in distributed training throughput and a 4.7× reduction in inference time. The streaming pipeline sustains 2,100 tweets per second with a median end-to-end latency of 120 ms, meeting the real-time requirements of high-volume applications. The fine-tuned Grok-4 model attains 92.8% sentiment classification accuracy, 8.5 percentage points higher than conventional machine learning baselines, while preserving high throughput. Comparative analysis shows that the framework scales nearly linearly with cluster size and remains robust to executor failures and network-induced delays. These results highlight the effectiveness of combining parallel and distributed computing with LLM-based natural language understanding for high-frequency social data analytics. The proposed architecture provides a practical foundation for scalable deployments in domains such as public health surveillance, financial market monitoring, and real-time situational awareness.
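The micro-batch streaming idea summarized above can be illustrated with a small, single-process sketch. It is not the authors' Spark and Grok-4 pipeline. The names `Event`, `classify`, `micro_batches`, and `run_stream`, the keyword-based classifier, the fixed per-event cost model, and the simulated timestamps are all hypothetical stand-ins. The sketch groups timestamped events into fixed-interval micro-batches, classifies each batch once its window closes, and reports throughput and median end-to-end latency, the two streaming metrics quoted in the abstract.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in vocabulary; the paper uses a fine-tuned LLM instead.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


@dataclass
class Event:
    arrival_ms: float  # simulated time the tweet entered the stream
    text: str


def classify(text: str) -> str:
    """Toy keyword classifier (assumption, not the paper's model)."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def micro_batches(
    events: Iterable[Event], interval_ms: float
) -> Iterator[Tuple[float, List[Event]]]:
    """Group events into consecutive fixed-length windows; yield (window_end_ms, batch).

    Assumes events arrive in non-decreasing arrival_ms order. Empty windows are skipped.
    """
    batch: List[Event] = []
    window_end = interval_ms
    for ev in events:
        # Close (and emit) every window that ends at or before this event's arrival.
        while ev.arrival_ms >= window_end:
            if batch:
                yield window_end, batch
                batch = []
            window_end += interval_ms
        batch.append(ev)
    if batch:
        yield window_end, batch


def run_stream(
    events: List[Event], interval_ms: float, per_event_cost_ms: float
) -> Tuple[List[str], float, float]:
    """Return (labels, throughput in events/s, median end-to-end latency in ms).

    A single simulated worker starts a batch when its window closes and the worker
    is free; processing time is per_event_cost_ms * batch size (hypothetical model).
    """
    if not events:
        raise ValueError("events must be non-empty")
    labels: List[str] = []
    latencies: List[float] = []
    busy_until = 0.0
    for window_end, batch in micro_batches(events, interval_ms):
        start = max(window_end, busy_until)
        finish = start + per_event_cost_ms * len(batch)
        busy_until = finish
        labels.extend(classify(ev.text) for ev in batch)
        latencies.extend(finish - ev.arrival_ms for ev in batch)
    span_s = (busy_until - events[0].arrival_ms) / 1000.0
    throughput = len(events) / span_s if span_s > 0 else float("inf")
    return labels, throughput, median(latencies)


if __name__ == "__main__":
    texts = ["I love this", "awful service", "just a tweet", "great day"]
    stream = [Event(arrival_ms=10.0 * i, text=texts[i % 4]) for i in range(20)]
    labels, tput, p50 = run_stream(stream, interval_ms=50.0, per_event_cost_ms=1.0)
    print(labels[:4], round(tput, 1), p50)
```

Because every event waits for its window to close, the batch interval sets a floor on latency. Real engines such as Spark Structured Streaming also pay a fixed per-batch overhead (scheduling, serialization) that longer trigger intervals amortize; this sketch does not model that overhead.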

Keywords:

Big data, distributed stream processing, real-time sentiment analysis, machine learning, parallel and distributed computing, Apache Spark, sentiment analysis, large language models
