# 06-technical-discussion
w
Hello everyone! I have a question about the architecture of our AI system. We use a Python server in production to handle our core AI logic (together with Azure ML), because of all the data-science-oriented libraries. We use the sentence-transformers package, which pulls in PyTorch as a dependency (plus the NVIDIA CUDA libraries). After checking, these packages have a combined size of over 3 GB, which is a lot: it's very slow to upload to our container registry, and it also makes pods slow to launch when scaling up. Because we run on Kubernetes, we want to be able to scale up and down fairly fast. I was thinking about extracting the functionality that uses the huge libraries into a separate microservice, so it would scale only when it really had to, but that could create bottlenecks and would also increase the complexity of the whole system. Maybe it's better to just accept the huge image size when scaling the pods? Any advice on how to deal with issues like these, or on what the best AI system architecture is for this kind of deployment?
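From what I can tell, most of that 3 GB is the CUDA-enabled PyTorch wheel (that's the NVIDIA part). If we could get away with CPU inference, installing torch from the CPU-only wheel index would shrink the image by a couple of GB. A rough sketch of the build; the base image and the `server.py` entrypoint are just placeholders, not our real setup:

```dockerfile
FROM python:3.11-slim

# Install torch from the CPU-only wheel index first, so the later
# sentence-transformers install sees torch as already satisfied and
# does not pull in the multi-GB CUDA build.
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu \
 && pip install --no-cache-dir sentence-transformers

WORKDIR /app
COPY . .
CMD ["python", "server.py"]
```

(Not sure yet whether CPU latency would be acceptable for us, though.)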
v
Are these services in the request path?
w
Yes, both services would sit in the same request path. Why?
v
I was trying to see if you can move some of the things off the request path. Btw, may I ask why the PyTorch dependency is so large?
In the request path, you should only be running inference, right?
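If you do end up splitting the heavy part out, the extracted service can stay very small. A minimal sketch of an embedding microservice; FastAPI, the endpoint name, and the model name are my assumptions, not your actual stack:

```python
# embedding_service.py -- only this deployment needs the large
# torch/sentence-transformers image; everything else stays slim
# and calls it over HTTP.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()

# Load the model once at process startup (model name is an example).
model = SentenceTransformer("all-MiniLM-L6-v2")

class EmbedRequest(BaseModel):
    texts: list[str]

@app.post("/embed")
def embed(req: EmbedRequest) -> dict:
    # encode() returns a numpy array; convert it for JSON serialization.
    vectors = model.encode(req.texts)
    return {"embeddings": [v.tolist() for v in vectors]}
```

You'd run it with `uvicorn embedding_service:app`, and the main service would call `POST /embed` instead of importing torch itself, so only this pod carries the big image when you scale.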