Accountabilities:
- Design, develop, and optimize advanced model serving architectures focused on high throughput, low latency, and efficient memory utilization.
- Build scalable inference pipelines capable of running across cloud, edge, and resource-constrained environments.
- Conduct controlled inference experiments in simulated and production environments to evaluate system performance and reliability.
- Monitor and analyze key performance metrics such as latency, throughput, memory consumption, token response time, and error rates.
- Develop and maintain benchmarking methodologies and performance validation frameworks for AI inference systems.
- Identify bottlenecks in serving pipelines, including batch processing inefficiencies, network overhead, and excessive memory usage.
- Optimize inference frameworks and deployment strategies for scalability, resilience, and operational efficiency.
- Collaborate with cross-functional engineering and research teams to integrate optimized inference solutions into production environments.
- Create high-quality testing datasets and deployment scenarios that reflect real-world operational challenges.
- Continuously improve inference infrastructure through experimentation, iteration, and adoption of cutting-edge AI serving techniques.
Requirements:
- Strong experience in AI/ML engineering with a focus on inference optimization, model serving, or AI systems performance.
- Deep understanding of model deployment architectures and inference frameworks for large-scale AI applications.
- Expertise in optimizing latency, throughput, scalability, and memory footprint in production AI systems.
- Hands-on experience with performance monitoring, benchmarking, profiling, and bottleneck analysis.
- Strong knowledge of advanced AI model architectures, including multi-modal systems and resource-efficient models.
- Experience building and deploying AI systems across cloud, edge, or low-resource hardware environments.
- Proficiency in programming languages commonly used in AI infrastructure and optimization workflows.
- Strong analytical and problem-solving abilities with a research-oriented mindset.
- Ability to work independently in a highly distributed and fast-moving global environment.
- Excellent English communication skills and ability to collaborate across technical and non-technical teams.
- Passion for innovation, experimentation, and scalable AI infrastructure development.
Benefits:
- Fully remote global work environment with flexible location options.
- Opportunity to work on cutting-edge AI, blockchain, and fintech technologies.
- Collaborative international team of highly skilled engineers and researchers.
- Exposure to innovative projects involving AI infrastructure, digital finance, and decentralized technologies.
- High-impact role with significant technical ownership and influence on product direction.
- Fast-paced and innovation-driven culture focused on experimentation and growth.
- Opportunities for continuous learning and professional development.
- Work environment that values autonomy, creativity, and technical excellence.
- Participation in projects with global reach and real-world scalability challenges.
Applications closed.
