Boston, MA · Job # 8559BK
You will work with a Vice President of Engineering who values critical thinking and the ability to challenge ideas, and who is a fine mentor and teacher.
You will be part of a cross-functional, autonomous data team, collaborating closely with analysts, data scientists, and engineers. You will be deeply involved in building out a data lake.
As part of that effort, you will leverage your mastery of Spark and ETL to transition workloads from a legacy relational data warehouse into modern, open-source data processing platforms.
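To give a flavor of the migration work described above: a Spark ETL job typically lands data in a Hive-style partitioned lake layout. The sketch below is purely illustrative (a real job would use PySpark's `df.write.partitionBy(...)` with Parquet; the column names and the CSV stand-in here are hypothetical), and shows the `key=value` directory convention such a job produces, which engines like Athena and Spark use for partition pruning:

```python
# Illustrative sketch of the Hive-style partitioned layout a Spark ETL job
# would produce (e.g. via df.write.partitionBy("event_date")).
# Column names are hypothetical, and plain CSV stands in for Parquet so the
# sketch stays dependency-free.
import csv
from pathlib import Path
from tempfile import mkdtemp

rows = [
    {"event_date": "2023-01-01", "user_id": "u1", "amount": "9.99"},
    {"event_date": "2023-01-01", "user_id": "u2", "amount": "4.50"},
    {"event_date": "2023-01-02", "user_id": "u1", "amount": "12.00"},
]

lake_root = Path(mkdtemp()) / "events"

# Group rows by partition key and append to one file per partition directory,
# mirroring the key=value path convention used by Spark, Hive, and Athena.
for row in rows:
    part_dir = lake_root / f"event_date={row['event_date']}"
    part_dir.mkdir(parents=True, exist_ok=True)
    out = part_dir / "part-00000.csv"
    is_new_file = not out.exists()
    with out.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["user_id", "amount"])
        if is_new_file:
            writer.writeheader()
        writer.writerow({"user_id": row["user_id"], "amount": row["amount"]})

partitions = sorted(p.name for p in lake_root.iterdir())
print(partitions)  # one directory per distinct event_date value
```

Because the partition value lives in the directory path rather than in the data files, queries filtered on `event_date` can skip entire directories, which is the core pattern behind the "partitioning" requirement below.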
Who you are today:
You have excellent problem-solving skills, strong attention to detail, and the technical know-how to independently build solutions from start to finish.
You are a true data expert, someone who is comfortable working with and shaping datasets of varying latency, size, and format.
We would like you to have:
- 5+ years of hands-on industry experience with Python and AWS
- 2+ years of proven ability developing ETL in Spark (PySpark preferred)
- Advanced SQL (ANSI SQL or Transact-SQL)
- Working knowledge of Data Lake patterns: partitioning, multi-step transformations, data cataloging
- Working knowledge of self-describing, compressed data file formats: Parquet, Avro
- Working knowledge of event streaming platforms: Kinesis, Kafka, Flink
- Working knowledge of Domain Driven Design (DDD) and event storming
- Experience with AWS data processing services: EMR, Athena, Redshift
- Experience with AWS serverless infrastructure: API Gateway, Lambda, DynamoDB, S3
- Experience with NoSQL/non-relational databases, especially document stores
- Experience building data models intended for data visualization solutions
- Demonstrable experience encoding business logic in well-structured data models that have been successfully applied to BI
Desired skills include:
- Experience coding in Java or Scala
- Docker or other containerization tooling
- Exposure to CI/CD using Git-based deployment automation
- Infrastructure-as-code: Terraform, CloudFormation
- Experience working on a Scrum team
- Experience using global data catalogs for either end user reference or data automation
- Experience with relational modeling, star schemas, and Kimball data warehousing
Apply for This Position