2023 / Week 39

  • With Nx.Serving, you can easily batch multiple inference requests together.
  • This helps for several reasons. One of the main ones is the overhead of transferring data from the CPU to the GPU. The CPU and GPU have separate memory, so a variable stored in CPU memory cannot be accessed directly by a GPU kernel; it has to be copied over first. With batching, you pay for that copy once per batch instead of once per request, so you don't have to do the copying again and again!
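To make the "copy once per batch" idea concrete, here is a minimal Python sketch. It is not Nx.Serving's API and does not touch a real GPU. FakeDevice, copy_to_device, run_model, serve_one_by_one and serve_batched are all hypothetical names. The fake device simply counts how many host-to-device copies are made, so the two serving strategies can be compared.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a GPU that only counts CPU -> GPU copies."""
    transfers: int = 0

    def copy_to_device(self, rows: List[List[float]]) -> List[List[float]]:
        # Each call models one host-to-device copy, regardless of how many rows it carries.
        self.transfers += 1
        return [list(r) for r in rows]

    def run_model(self, rows: List[List[float]]) -> List[float]:
        # Toy "model": sum each input row.
        return [sum(r) for r in rows]


def serve_one_by_one(device: FakeDevice, requests: List[List[float]]) -> List[float]:
    # One copy per request: the overhead grows with the number of requests.
    results: List[float] = []
    for req in requests:
        results.extend(device.run_model(device.copy_to_device([req])))
    return results


def serve_batched(device: FakeDevice, requests: List[List[float]], batch_size: int) -> List[float]:
    # Group requests into batches so each copy carries up to batch_size requests.
    results: List[float] = []
    for start in range(0, len(requests), batch_size):
        batch = requests[start:start + batch_size]
        results.extend(device.run_model(device.copy_to_device(batch)))
    return results


requests = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
single, batched = FakeDevice(), FakeDevice()
same = serve_one_by_one(single, requests) == serve_batched(batched, requests, batch_size=4)
print(same, single.transfers, batched.transfers)  # True 5 2
```

Both strategies return the same results. With 5 requests and a batch size of 4, the batched path makes 2 copies instead of 5. A real serving layer also has to decide how long to wait for a batch to fill up, which this sketch ignores.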
 
Versova Beach, Mumbai