Role Overview
We are seeking a highly skilled Senior Data Engineer with strong expertise in dimensional data modeling
and modern data architecture patterns to design, build, and maintain scalable, analytics-ready data
platforms on AWS (and GCP). The ideal candidate has hands-on experience designing star and
snowflake schemas, building curated data layers for BI and analytics, and implementing robust ETL/ELT
pipelines that transform raw data into trusted, modeled datasets.
Responsibilities
- Design, build, and maintain scalable ETL/ELT pipelines to ingest, transform, and curate data
from multiple source systems into AWS data platforms.
- Design and implement dimensional data models (star and snowflake schemas) optimized for
analytics and BI.
- Build and maintain analytics-ready data warehouses and data lakes using Amazon S3 and
Amazon Redshift.
- Implement Change Data Capture (CDC) patterns to support incremental and near-real-time data
processing.
- Develop and manage data transformations using dbt, AWS Glue (PySpark), and SQL-based
workflows.
- Orchestrate data pipelines using Apache Airflow (Amazon MWAA), AWS Step Functions, and AWS Lambda.
- Optimize production-grade SQL for performance, scalability, and cost efficiency in Redshift.
- Translate business requirements into technical data solutions, collaborating closely with
analysts and stakeholders.
- Monitor, troubleshoot, and resolve data pipeline and performance issues across AWS services.
- Ensure data quality, reliability, security, and governance across all data pipelines.
- Support streaming or event-driven use cases using Kinesis or Kafka (Amazon MSK) where
applicable.
Requirements
- Bachelor’s degree in Computer Science, Data Engineering, or a related field.
- 4+ years of experience in data engineering or related roles.
- Strong experience with dimensional modeling, including star and snowflake schemas.
- Hands-on experience with Amazon Redshift, including schema design, performance tuning, and
query optimization.
- Experience building and maintaining data warehouses and data lakes on AWS.
- Experience with CDC patterns and incremental data processing.
- Proficiency in SQL (writing complex, production-grade queries).
- Proficiency in Python and/or PySpark.