# 06-technical-discussion
w
Hello everyone! I have a question about the architecture of our AI system. We use a Python server in production to handle our core AI logic (together with Azure ML), because of all the data-science-oriented libraries. We use the sentence-transformers package, which pulls in PyTorch as a dependency (plus the NVIDIA CUDA libraries). After checking, these packages have a combined size of over 3 GB, which is a lot: it's very slow to upload to our container registry, and it also makes pods slow to launch when scaling up. Because we run on Kubernetes, we want to be able to scale up and down fairly fast. I was thinking about extracting the functionality that uses the huge libraries into a separate microservice, so it would scale only when it really had to, but that could create bottlenecks and would also increase the complexity of the whole system. Maybe it's better to just accept the huge image size when scaling the pods? Any advice on how to deal with issues like these, or on what the best AI system architecture is for this kind of deployment?
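From what I can tell, most of that 3 GB is the CUDA-enabled PyTorch wheel (that's the NVIDIA part). If we could get away with CPU inference, installing torch from the CPU-only wheel index would shrink the image by a couple of GB. A rough sketch of the build; the base image and the `server.py` entrypoint are just placeholders, not our real setup:

```dockerfile
FROM python:3.11-slim

# Install torch from the CPU-only wheel index first, so the later
# sentence-transformers install sees torch as already satisfied and
# does not pull in the multi-GB CUDA build.
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu \
 && pip install --no-cache-dir sentence-transformers

WORKDIR /app
COPY . .
CMD ["python", "server.py"]
```

(Not sure yet whether CPU latency would be acceptable for us, though.)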
v
Are these services in the request path?
w
Yes, both services would sit in the same request path. Why?
v
I was trying to see if you can move some of the things off the request path. Btw, may I ask why the PyTorch dependency is so large?
In the request path, you should only be running inference, right?
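If you do end up splitting the heavy part out, the extracted service can stay very small. A minimal sketch of an embedding microservice; FastAPI, the endpoint name, and the model name are my assumptions, not your actual stack:

```python
# embedding_service.py -- only this deployment needs the large
# torch/sentence-transformers image; everything else stays slim
# and calls it over HTTP.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()

# Load the model once at process startup (model name is an example).
model = SentenceTransformer("all-MiniLM-L6-v2")

class EmbedRequest(BaseModel):
    texts: list[str]

@app.post("/embed")
def embed(req: EmbedRequest) -> dict:
    # encode() returns a numpy array; convert it for JSON serialization.
    vectors = model.encode(req.texts)
    return {"embeddings": [v.tolist() for v in vectors]}
```

You'd run it with `uvicorn embedding_service:app`, and the main service would call `POST /embed` instead of importing torch itself, so only this pod carries the big image when you scale.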