Any tips for preparing for an ML system design interview?

Do you have any tips for preparing for a system design (SD) interview focused on machine learning?

I’ve come across some helpful SD mock interviews on YouTube, but they aren’t ML-oriented. The ones that do focus on ML mostly discuss choosing algorithms rather than designing the system around them.

I’m looking for suggestions, including a set of standard questions to ask that help decide things like the appropriate database, scalability considerations, and whether to use microservices. For example, starting with:

  • Where is the data located and how accessible is it? This would help determine whether we can query a database directly or whether we need to expose an API for data uploads.
  • Who will use the output and how? This informs whether the model needs to serve batch or streaming predictions (which shapes the efficiency requirements), and whether output storage or an API for consumption is necessary.
  • What types of data are available? This guides decisions on database design (structured data), blob storage (unstructured data), and potential caching for low-latency requirements.
  • How many users/predictions are expected? This informs whether a microservices architecture is necessary or if a monolithic approach could suffice.
  • How much data needs to be processed? This helps determine database type and whether sharding might be required for scaling.
  • Is there access to a training set? The answer depends on the nature of the problem, but it is crucial to consider.
  • What metrics are best to evaluate the model’s performance?
  • How important is model explainability? This discussion informs the trade-off between black-box models and more traditional, interpretable approaches.
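One way to practice with the checklist above is to write down the answers for a concrete scenario and trace which design choices they imply. Below is a rough Python sketch of that mapping. The field names, every threshold, and the 3x peak-to-average traffic ratio are hypothetical placeholders chosen for illustration, not standard values; in a real interview those numbers come from the requirements you clarify with the interviewer.

```python
from dataclasses import dataclass

# All thresholds below are hypothetical placeholders for illustration;
# real cutoffs come from the SLAs and budget discussed in the interview.
ONLINE_LATENCY_MS = 1_000   # budgets below this suggest request-time (online) inference
CACHE_LATENCY_MS = 100      # budgets below this suggest adding a cache
MICROSERVICE_QPS = 1_000    # peak QPS above this suggests splitting into services
SHARDING_DATA_TB = 5.0      # data volume above this suggests sharding/partitioning
PEAK_TO_AVERAGE = 3.0       # assumed ratio of peak traffic to average traffic
SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class Requirements:
    """Answers to the checklist questions (field names are hypothetical)."""
    data_in_queryable_db: bool   # Where is the data and how accessible is it?
    latency_budget_ms: float     # Who will use the output and how?
    unstructured_data: bool      # What types of data are available?
    predictions_per_day: int     # How many users/predictions are expected?
    data_volume_tb: float        # How much data needs to be processed?
    has_labels: bool             # Is there access to a training set?
    needs_explainability: bool   # How important is model explainability?


def peak_qps(predictions_per_day: int, peak_to_average: float = PEAK_TO_AVERAGE) -> float:
    """Back-of-envelope peak queries per second from a daily prediction count."""
    return predictions_per_day / SECONDS_PER_DAY * peak_to_average


def suggest_design(req: Requirements) -> dict:
    """Map checklist answers to rough starting points for the design discussion."""
    qps = peak_qps(req.predictions_per_day)
    return {
        "ingestion": "query the database directly" if req.data_in_queryable_db
                     else "expose an upload API / ETL pipeline",
        "inference": "online (streaming) serving" if req.latency_budget_ms < ONLINE_LATENCY_MS
                     else "batch jobs with stored outputs",
        "storage": "blob storage plus a metadata DB" if req.unstructured_data
                   else "structured (relational/columnar) DB",
        "cache": req.latency_budget_ms < CACHE_LATENCY_MS,
        "architecture": "separate services" if qps > MICROSERVICE_QPS else "monolith",
        "sharding": req.data_volume_tb > SHARDING_DATA_TB,
        "training": "supervised learning" if req.has_labels
                    else "plan for labeling or unsupervised methods",
        "model_family": "interpretable models first" if req.needs_explainability
                        else "black-box models acceptable",
        "peak_qps": round(qps, 1),
    }


if __name__ == "__main__":
    example = Requirements(
        data_in_queryable_db=True,
        latency_budget_ms=50,
        unstructured_data=False,
        predictions_per_day=10_000_000,
        data_volume_tb=2.0,
        has_labels=True,
        needs_explainability=False,
    )
    for key, value in suggest_design(example).items():
        print(f"{key}: {value}")
```

In the example, 10 million predictions a day works out to roughly 347 peak QPS under the assumed 3x peak factor, which by these placeholder thresholds still fits a monolith. The value of the exercise is showing the interviewer the chain from requirements to architecture, not the specific cutoffs.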

If you have additional insights or alternative approaches to preparing for SD interviews, I’d love to hear them!

Hello! You can enhance your design abilities by working through sample machine learning system design problems, available online or through resources such as this course: https://www.tryexponent.com/courses/ml-system-design

There should be a fast track for people who have a lot of demonstrated expertise. You don’t need five rounds of expert interviews to decide whether you want someone: if they have papers, code, and good references, you should be able to tell right away. Save the full process for people without that kind of track record.

Thank you for sharing; this looks fantastic! I’ll take a look.


Consider starting with questions like these: Where is the data? Who will use the output, and how? What data do we have available? How many users and predictions do we need to serve? How much data do we have? Do we have access to a training set? What are the best metrics for evaluating the model?