ISSN: 2456-2319

International Journal of Electrical, Electronics and Computers (IJEEC)

Scalable AI Model Deployment and Management on Serverless Cloud Architecture

Prudhvi Naayini


International Journal of Electrical, Electronics and Computers (IJEEC), Vol-9, Issue-1, January-February 2024, Pages 1-12, DOI: 10.22161/eec.91.1


Article Info: Received: 18 Dec 2023; Accepted: 21 Jan 2024; Date of Publication: 30 Jan 2024


Scalable deployment of deep learning models in the cloud faces challenges in balancing performance, cost, and manageability. This paper investigates serverless cloud architecture for AI model inference, focusing on AWS technologies such as AWS Lambda, API Gateway, and Kubernetes-based serverless extensions (e.g., AWS EKS with Knative). We first outline the limitations of traditional, server-based model hosting to motivate the serverless approach. Then, we present novel strategies for scalable model serving: an adaptive resource provisioning algorithm, intelligent model caching, and efficient model sharding. Our methodology includes pseudo-code and architectural diagrams that illustrate these techniques on AWS. Analytical modeling and simulation using AWS performance and cost metrics validate that the proposed system can automatically scale to thousands of concurrent requests while maintaining low latency. In addition, an in-depth threat model is developed to address security and privacy concerns. Finally, real-world case studies (e.g., real-time video analytics, recommendation engines, and fraud detection) are described to demonstrate the practical viability of the approach, and a detailed cost analysis is presented. Future research directions include advanced scheduling algorithms and serverless training frameworks.
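The serving strategies themselves are developed in the body of the paper; as an illustration of the simplest of them, the sketch below shows what warm-container model caching looks like in an AWS Lambda inference function behind API Gateway. This is a minimal Python sketch under assumed names: the MODEL_BUCKET, MODEL_KEY, and ONNX model are hypothetical placeholders, not artifacts from the paper.

import json
import os

import boto3
import numpy as np  # numpy and onnxruntime ship via a Lambda layer or container image

# Hypothetical bucket, key, and model; substitute real values.
MODEL_BUCKET = os.environ.get("MODEL_BUCKET", "example-model-bucket")
MODEL_KEY = os.environ.get("MODEL_KEY", "models/classifier.onnx")
LOCAL_PATH = "/tmp/model.onnx"  # /tmp persists while the container stays warm

_session = None  # module-level cache shared across warm invocations


def _load_model():
    """Download and deserialize the model on cold start only."""
    global _session
    if _session is None:
        import onnxruntime as ort
        if not os.path.exists(LOCAL_PATH):
            boto3.client("s3").download_file(MODEL_BUCKET, MODEL_KEY, LOCAL_PATH)
        _session = ort.InferenceSession(LOCAL_PATH)
    return _session


def handler(event, context):
    """API Gateway proxy integration: expects a JSON body with an 'inputs' array."""
    session = _load_model()
    inputs = np.asarray(json.loads(event["body"])["inputs"], dtype=np.float32)
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: inputs})
    return {"statusCode": 200,
            "body": json.dumps({"outputs": outputs[0].tolist()})}

Because the module-level _session outlives a single invocation, only cold starts pay the S3 download and deserialization cost; the paper's intelligent caching and adaptive provisioning strategies can be read as generalizations of this pattern, with eviction and pre-warming added.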

Keywords: Serverless computing, AWS Lambda, Knative, deep learning inference, scalability, model serving, cloud architecture, security, cost analysis, Kubernetes.
