You will:
- Design and implement data processing systems, including data warehouses, data lakes, and real-time processing platforms;
- Configure and manage technologies such as Hadoop, Spark, and Kafka, as well as cloud environments across Azure, AWS, and GCP;
- Build and maintain automated ETL/ELT processes for data collection, cleansing, and transformation;
- Ensure seamless, reliable data flow between diverse systems and sources, with a strong focus on data quality and consistency;
- Optimize data systems for high-volume, high-velocity workloads;
- Design and implement distributed computing solutions that maintain performance at scale, proactively identifying and resolving bottlenecks.
Requirements:
- Hands-on experience with PySpark for large-scale data processing;
- Strong knowledge of Apache Kafka for real-time data streaming;
- Experience with cloud platforms (Azure, AWS, and/or GCP);
- Proven ability to design and optimize ETL/ELT pipelines;
- Familiarity with Hadoop ecosystems and distributed computing principles;
- Solid understanding of data warehouse and data lake architectures;
- Nice-to-haves: experience with infrastructure-as-code tools (Terraform, Bicep), knowledge of data governance and security best practices, and exposure to orchestration tools such as Apache Airflow or Azure Data Factory.
Benefits:
- Flexible Working Hours – Manage your workday with flexibility and the option to work from home when needed, while enjoying our city-centre office as a convenient, collaborative workspace;
- Culture & Connection – From team bonding activities like Christmas parties and summer events to spontaneous celebrations, monthly breakfasts, or team lunches, we celebrate wins—big or small—together;
- Competitive salary.