Do you have any tips for preparing for a system design (SD) interview focused on machine learning?
I’ve come across some helpful SD mock interviews on YouTube, but they aren’t ML-oriented. The ones that do focus on ML mostly discuss choosing algorithms rather than designing the system around them.
I’m looking for suggestions, including a standard set of questions to ask that would help decide things like the appropriate database, how to scale, and whether to use microservices. For example, I would start with:
- Where is the data located and how accessible is it? This helps determine whether we can query a database directly or need to expose an API for data uploads.
- Who will use the output and how? This informs whether the model needs to serve batch or streaming predictions (and therefore the efficiency requirements), and whether we need storage for the outputs or an API to consume them.
- What types of data are available? This guides decisions on database design for structured data, blob storage for unstructured data, and possible caching if latency requirements are strict.
- How many users/predictions are expected? This informs whether a microservices architecture is necessary or whether a monolith would suffice.
- How much data needs to be processed? This helps determine the database type and whether sharding is needed to scale.
- Is there access to a training set? The answer depends on the nature of the problem, but it is crucial to establish early.
- What metrics are best to evaluate the model’s performance?
- How important is model explainability? This informs the trade-off between black-box models and more traditional, interpretable approaches.
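To make the checklist above more concrete, here is a minimal Python sketch that maps answers to the infrastructure questions onto rough design choices. The `Requirements` fields, the `suggest_design` helper, and both thresholds are hypothetical, made up for illustration; real cut-offs depend on the system, the team, and the available infrastructure.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
MICROSERVICES_QPS_THRESHOLD = 1_000
SHARDING_TB_THRESHOLD = 10.0


@dataclass
class Requirements:
    """Hypothetical answers to the clarifying questions."""
    data_in_queryable_db: bool    # Where is the data and how accessible is it?
    realtime_consumers: bool      # Who will use the output and how?
    has_structured_data: bool     # What types of data are available?
    has_unstructured_data: bool
    low_latency_required: bool
    predictions_per_second: int   # How many users/predictions are expected?
    data_volume_tb: float         # How much data needs to be processed?


def suggest_design(req: Requirements) -> list[str]:
    """Turn the answers into a list of rough design choices (heuristic sketch)."""
    choices = []
    # Data access: query in place, or accept uploads through an API.
    if req.data_in_queryable_db:
        choices.append("query the database directly")
    else:
        choices.append("expose an upload API for incoming data")
    # Consumers: real-time users need online serving; otherwise batch is enough.
    if req.realtime_consumers:
        choices.append("streaming predictions behind a serving API")
    else:
        choices.append("batch predictions written to an output store")
    # Storage by data type, plus caching when latency matters.
    if req.has_structured_data:
        choices.append("database for structured data")
    if req.has_unstructured_data:
        choices.append("blob storage for unstructured data")
    if req.low_latency_required:
        choices.append("cache for low-latency reads")
    # Load: split into services only above the (hypothetical) threshold.
    if req.predictions_per_second >= MICROSERVICES_QPS_THRESHOLD:
        choices.append("microservices")
    else:
        choices.append("monolith")
    # Volume: shard only above the (hypothetical) threshold.
    if req.data_volume_tb >= SHARDING_TB_THRESHOLD:
        choices.append("sharded database")
    return choices


if __name__ == "__main__":
    req = Requirements(
        data_in_queryable_db=False,
        realtime_consumers=True,
        has_structured_data=True,
        has_unstructured_data=True,
        low_latency_required=True,
        predictions_per_second=5_000,
        data_volume_tb=2.0,
    )
    for choice in suggest_design(req):
        print("-", choice)
```

Each rule corresponds to one question in the list, so the mapping can grow as new questions are added.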
If you have additional insights or alternative approaches to preparing for SD interviews, I’d love to hear them!