Hafsa Maryam, Ahmad Farid
International Journal of Electrical, Electronics and Computers (IJECC), Vol-10, Issue-6, November-December 2025, Pages 1-6, DOI: 10.22161/eec.106.1
Article Info: Received: 06 Oct 2025; Accepted: 08 Nov 2025; Date of Publication: 18 Nov 2025
The rapid growth of social media streams has intensified the need for scalable, low-latency sentiment analysis pipelines that can operate under high-volume, real-time constraints. This paper proposes a distributed framework built on Apache Spark that processes text streams in parallel and integrates a fine-tuned large language model (LLM), Grok-4, for sentiment classification. The system employs micro-batch streaming, distributed tokenization, and GPU-accelerated model serving to achieve real-time inference at scale. Experiments on a 10-node cluster, using a synthetic dataset of 10,000 tweets extended to 1.2 million streaming events, demonstrate substantial performance gains: a 5.4× improvement in distributed training throughput and a 4.7× reduction in inference time compared with single-node baselines. The streaming pipeline sustains 2,100 tweets per second with an end-to-end median latency of 120 ms, satisfying real-time constraints for high-volume applications. The fine-tuned Grok-4 model attains 92.8% sentiment classification accuracy, 8.5 percentage points above conventional machine learning baselines, while preserving high throughput. Comparative analysis shows that the framework scales nearly linearly with cluster size and remains robust to executor failures and network-induced delays. The results highlight the effectiveness of combining parallel and distributed computing with advanced LLM-based natural language understanding for high-frequency social data analytics. The proposed architecture provides a practical foundation for scalable deployments in domains such as public health surveillance, financial market monitoring, and real-time situational awareness systems.
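The micro-batch idea mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: in the described system Apache Spark would handle the windowing and distribution, and the fine-tuned LLM would replace the keyword classifier. The names `micro_batches`, `toy_sentiment`, and `run_pipeline`, the word lists, and the 0.1-second interval are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, tweet text)


def micro_batches(events: Iterable[Event], interval: float) -> List[List[Event]]:
    """Group events into consecutive time windows of `interval` seconds."""
    windows = {}
    for arrival, text in events:
        window_id = int(arrival // interval)
        windows.setdefault(window_id, []).append((arrival, text))
    # Emit batches in time order; empty windows are simply skipped.
    return [windows[w] for w in sorted(windows)]


def toy_sentiment(text: str) -> str:
    """Hypothetical placeholder classifier standing in for the LLM."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    words = set(text.lower().split())
    score = len(words & positive) - len(words & negative)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def run_pipeline(
    events: Iterable[Event],
    interval: float,
    classify: Callable[[str], str] = toy_sentiment,
) -> List[str]:
    """Classify every event, one micro-batch at a time."""
    labels: List[str] = []
    for batch in micro_batches(events, interval):
        # In a distributed setting each batch would be scored in parallel.
        labels.extend(classify(text) for _, text in batch)
    return labels


if __name__ == "__main__":
    sample = [(0.01, "love it"), (0.05, "awful day"), (0.12, "just a tweet")]
    print(run_pipeline(sample, interval=0.1))
```

Grouping by window keeps per-batch overhead (for example a single model call per batch) amortized over many tweets, which is the usual motivation for micro-batching over per-event processing.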