Senior AI Engineer @ Jeeves - Remote Position

What You'll Do:

Design, build, and maintain production-grade LLM integration pipelines — including retrieval-augmented generation (RAG), prompt engineering, output parsing, and chain orchestration.
Develop and operate AI features within Jeeves's core financial products: spend categorization, document extraction, anomaly detection, financial Q&A, and automated reconciliation.
Implement structured output validation, fallback handling, and confidence scoring to ensure AI decisions meet reliability standards for financial use cases.
Evaluate and integrate AI frameworks and tools (LangChain, LlamaIndex, OpenAI API, Anthropic API, HuggingFace, vector databases) and advocate for the right tool for the job.
Establish prompt versioning and evaluation practices to ensure AI outputs remain accurate and consistent as models and data evolve.

Design and maintain vector search pipelines using databases such as Pinecone, Weaviate, or pgvector to power semantic search and RAG-based features.
Build document ingestion and chunking pipelines for Jeeves's financial data — processing invoices, receipts, policy documents, and transaction records.
Optimize retrieval quality through embedding model selection, chunk strategy, metadata filtering, and re-ranking techniques.

Collaborate with data scientists to take trained ML models from experimental notebooks to production serving infrastructure.
Build and maintain model serving endpoints with appropriate latency SLOs, input validation, and output monitoring.
Implement model performance monitoring and data drift detection to ensure production models remain accurate over time.
Support model retraining workflows by designing clean data pipelines and feature engineering steps that can be continuously updated.

Integrate AI services cleanly with Jeeves's backend microservices — designing clear API contracts, circuit breakers, and graceful degradation patterns.
Write high-quality, testable backend code in Python, Go, or Node.js to power AI-integrated features.
Instrument AI components with structured logging, distributed tracing, latency dashboards, and alerting to ensure operational visibility.
Build human-in-the-loop review workflows for AI decisions that require oversight — particularly for high-value financial actions.

Partner with Product, Backend Engineering, and Data Science to define the AI roadmap and translate requirements into reliable systems.
Contribute to a culture of quality by writing design docs, reviewing peers' AI system designs, and sharing learnings openly.
Help grow the AI engineering practice at Jeeves by establishing patterns, tooling, and best practices that the broader team can build on.

Requirements:

Bachelor's degree in Computer Science, Engineering, or a related field — or equivalent practical experience.
5+ years of professional software engineering experience, with at least 3 years focused on AI/ML systems in production.
Hands-on experience building and deploying LLM-powered applications using APIs such as OpenAI, Anthropic, or Cohere in a production environment.
Experience designing and operating RAG pipelines, including chunking strategies, embedding models, and vector database integration (Pinecone, Weaviate, pgvector, or similar).
Strong proficiency in Python for AI/ML workloads; familiarity with at least one AI orchestration framework (LangChain, LlamaIndex, or equivalent).
Experience with ML model serving infrastructure: REST or gRPC inference endpoints, input/output validation, latency budgeting, and monitoring.
Solid backend engineering fundamentals: REST APIs, relational databases (PostgreSQL preferred), async patterns, and cloud infrastructure (AWS, GCP, or Azure).
Experience with observability tooling: structured logging, distributed tracing, and building dashboards for AI system health.

Nice to Have:

Experience in fintech, financial services, or any regulated industry where AI reliability and auditability are critical.
Familiarity with prompt evaluation frameworks, A/B testing AI outputs, and tracking model performance degradation in production.
Experience with ML lifecycle management tools: MLflow, Weights & Biases, Vertex AI, or SageMaker.
Knowledge of real-time data streaming (Kafka, Kinesis) for event-driven AI pipelines.
Contributions to open-source AI tooling, published technical writing, or talks at AI/ML conferences.
Prior startup or scale-up experience — comfortable with ambiguity and building foundational systems from scratch.