[1] S. Venkataraman, “AI goes serverless: Are systems ready?” ACM SIGARCH, Aug. 2023. [Online]. Available: https://www.sigarch.org/ai-goes-serverless-are-systems-ready/
[2] J. Gu, Y. Zhu, P. Wang, M. Chadha, and M. Gerndt, “FaST-GShare: Enabling efficient spatio-temporal GPU sharing in serverless computing for deep learning inference,” in Proceedings of the 52nd International Conference on Parallel Processing, 2023, pp. 635–644. [Online]. Available: https://arxiv.org/abs/2309.00558
[3] AWS Lambda Developer Guide, Best Practices for Working with AWS Lambda Functions, AWS, 2023. [Online]. Available: https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
[4] M. Yu, Z. Jiang, H. C. Ng, W. Wang, R. Chen, and B. Li, “Gillis: Serving large neural networks in serverless functions with automatic model partitioning,” in Proceedings of IEEE ICDCS, 2021, pp. 138–148. [Online]. Available: https://ieeexplore.ieee.org/document/9546452
[5] W.-Q. Ren, Y.-B. Qu, C. Dong, Y.-Q. Jing, H. Sun, Q.-H. Wu, and S. Guo, “A survey on collaborative DNN inference for edge intelligence,” Machine Intelligence Research, vol. 20, no. 3, pp. 370–395, 2023. [Online]. Available: https://link.springer.com/article/10.1007/s11633-022-1391-7
[6] Kubeflow Authors, What is KServe?, Kubeflow KServe Documentation, Sep. 2021. [Online]. Available: https://www.kubeflow.org/docs/external-add-ons/kserve/introduction/
[7] K. Kojs, “A survey of serverless machine learning model inference,” arXiv preprint arXiv:2311.13587, 2023. [Online]. Available: https://arxiv.org/abs/2311.13587
[8] Y. Yang, L. Zhao, Y. Li, H. Zhang, J. Li, M. Zhao, X. Chen, and K. Li, “INFless: a native serverless system for low-latency, high-throughput inference,” in Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. Association for Computing Machinery, 2022, pp. 768–781. [Online]. Available: https://doi.org/10.1145/3503222.3507709
[9] Y. Yu, J. Liu, H. Liu, B. Yu, and Y. Wang, “Faaswap: Cost-effective pre-warming of serverless functions using learning-based scheduling,” 2023. [Online]. Available: https://arxiv.org/abs/2306.03622
[10] C. McKinnel, “Massively parallel machine learning inference using AWS Lambda,” McKinnel.me Blog, Apr. 2021. [Online]. Available: https://mckinnel.me/massively-parallel-machine-learning-inference-using-aws-lambda.html
[11] A. Gallego, U. Odyurt, Y. Cheng, Y. Wang, and Z. Zhao, “Machine learning inference on serverless platforms using model decomposition,” in Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing, 2023, pp. 1–6. [Online]. Available: https://repository.ubn.ru.nl/bitstream/handle/2066/308588/308588.pdf?sequence=1
[12] M. Li, X. Zhang, J. Guo, and F. Li, “Cloud–edge collaborative inference with network pruning,” Electronics, vol. 12, no. 17, 2023. [Online]. Available: https://www.mdpi.com/2079-9292/12/17/3598
[13] D. Narayanan, A. Harlap, A. Phanishayee, V. Seshadri, N. R. Devanur, G. R. Ganger, P. B. Gibbons, and M. Zaharia, “PipeDream: generalized pipeline parallelism for DNN training,” in Proceedings of the 27th ACM Symposium on Operating Systems Principles. Association for Computing Machinery, 2019, pp. 1–15. [Online]. Available: https://doi.org/10.1145/3341301.3359646
[14] L. Zeng, X. Chen, Z. Zhou, L. Yang, and J. Zhang, “CoEdge: Cooperative DNN inference with adaptive workload partitioning over heterogeneous edge devices,” IEEE/ACM Transactions on Networking, vol. 29, no. 2, pp. 595–608, 2021.
[15] M. Golec, S. S. Gill, F. Cuadrado, A. K. Parlikad, M. Xu, H. Wu, and S. Uhlig, “ATOM: AI-powered sustainable resource management for serverless edge computing environments,” IEEE Transactions on Sustainable Computing, vol. 9, no. 6, pp. 817–829, 2023. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10376318
[16] R. Rajkumar, “Designing a serverless recommender in AWS,” Medium, Jan. 2021. [Online]. Available: https://d-s-brambila.medium.com/designing-a-serverless-recommender-in-aws-fcf2de9a807e
[17] V. Ishakian, V. Muthusamy, and A. Slominski, “Serving deep learning models in a serverless platform,” in 2018 IEEE International Conference on Cloud Engineering (IC2E), 2018, pp. 257–262. [Online]. Available: https://arxiv.org/abs/1710.08460
[18] AWS Whitepaper, Security Overview of AWS Lambda, AWS, Nov. 2022. [Online]. Available: https://docs.aws.amazon.com/whitepapers/latest/security-overview-aws-lambda/
[19] V. Ishakian, V. Muthusamy, and A. Slominski, “Serving deep learning models in a serverless platform,” in 2018 IEEE International Conference on Cloud Engineering (IC2E), 2018, pp. 257–262. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/8360337
[20] S. S. Katreddy, “Event-driven cloud architectures for real-time data processing,” Economic Sciences, vol. 13, no. 1, 2017. [Online]. Available: https://economic-sciences.com/index.php/journal/article/view/176
[21] A. Agache, M. Brooker, et al., “Firecracker: Lightweight virtualization for serverless applications,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020, pp. 419–434. [Online]. Available: https://www.usenix.org/conference/nsdi20/presentation/agache
[22] NVIDIA, “TensorRT: High-performance deep learning inference optimizer and runtime,” 2023. [Online]. Available: https://developer.nvidia.com/tensorrt
[23] P. Kairouz, B. McMahan, B. Avent, et al., “Advances and open problems in federated learning,” Foundations and Trends in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021. [Online]. Available: https://doi.org/10.1561/2200000083
[24] M. Li, D. G. Andersen, and A. J. Smola, “Scaling distributed machine learning with the parameter server,” in 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, pp. 583–598. [Online]. Available: https://www.usenix.org/conference/osdi14/technical-sessions/presentation/li_mu