Specialist, GSF DnA Data Engineer

EyeBio

Software Engineering, Data Science

Hyderabad, Telangana, India

Posted on May 20, 2026

Job Description

The Opportunity

  • Based in Hyderabad, join a global healthcare biopharma company and be part of a 130-year legacy of success backed by ethical integrity, forward momentum, and an inspiring mission to achieve new milestones in global healthcare.

  • Be part of an organization driven by digital technology and data-backed approaches that support a diversified portfolio of prescription medicines, vaccines, and animal health products.

  • Drive innovation and execution excellence. Join a team that is passionate about using data, analytics, and insights to drive decision-making, and that creates custom software to help us tackle some of the world's greatest health threats.

Our Technology Centers focus on creating a space where teams can come together to deliver business solutions that save and improve lives. An integral part of our company’s IT operating model, Tech Centers are globally distributed locations where each IT division has employees to enable our digital transformation journey and drive business outcomes. These locations, in addition to the other sites, are essential to supporting our business and strategy.

A focused group of leaders in each Tech Center helps ensure we can manage and improve each location: from investing in the growth, success, and well-being of our people, to making sure colleagues from each IT division feel a sense of belonging, to managing critical emergencies. Together, we leverage the strength of our team to collaborate globally, optimize connections, and share best practices across the Tech Centers.

Role Overview

We are hiring a hands-on Data Engineer who can design, build, and operate production-grade data platforms and pipelines end to end. You will deliver reliable, governed, secure, and analytics-ready data by implementing modern data warehousing and lakehouse patterns on AWS and Databricks, with a strong focus on data quality, dimensional modeling, and scalable ETL/ELT. This role partners closely with analytics, data science, and business stakeholders to translate requirements into robust datasets, while applying engineering best practices such as testing, code reviews, CI/CD, and observability.

What you will do

  • Design, build, and operate batch and streaming data pipelines to ingest data from multiple sources into an AWS data lake / lakehouse and data warehouse.
  • Develop and maintain ETL/ELT transformations using Python, PySpark, and SQL; optimize jobs for performance, cost, and reliability.
  • Partner with Data Analysts, Data Scientists, and business stakeholders to understand use cases and deliver curated, analytics-ready datasets and features.
  • Implement data quality controls (validation rules, reconciliation, anomaly checks), define SLAs/SLOs, and contribute to metadata, lineage, and data catalog practices.
  • Use orchestration and observability to run pipelines reliably (e.g., Databricks Workflows, AWS Step Functions, scheduling, logging, monitoring, alerting).
  • Apply engineering best practices: unit/integration testing, automated data tests, code reviews, and quality gates within CI/CD.
  • Model and publish data for BI/analytics using dimensional modeling (star/snowflake), facts & dimensions, and slowly changing dimensions (SCD).
  • Write and tune advanced SQL for profiling, transformations, and performance troubleshooting across large datasets.
  • Build on AWS using services such as S3, Glue, Lambda, Step Functions, EMR, and CloudWatch; follow security best practices (IAM, encryption, least privilege).
  • Provision and manage cloud resources using Infrastructure as Code (e.g., Terraform) across dev/test/prod environments.
  • Package and deploy workloads using Docker (and where applicable ECS/Fargate); manage dependencies and runtime configurations.
  • Use GitHub for version control (branching strategies, pull requests, code reviews) and set up CI/CD for automated build, test, and deployment.
  • Develop scalable processing on Databricks / Apache Spark using PySpark and lakehouse concepts (e.g., Delta Lake, ACID, schema evolution).
  • Use notebooks (e.g., Jupyter/Databricks) for exploration and PoCs, then productionize solutions with reusable modules, tests, and deployment pipelines.
  • Work in an Agile delivery model (planning, daily sync, reviews, retros), providing accurate estimates and proactively managing risks/dependencies.
  • Create and maintain technical documentation (data contracts, pipeline specs, runbooks) and support operational handoffs.

What you should have

  • Primary skills: AWS, Databricks, Python, PySpark
  • Secondary skills: CI/CD, SQL
  • 5+ years of hands-on experience in data engineering building production pipelines and data platforms.
  • Strong AWS experience: S3, Glue, Lambda, Step Functions, EMR (and/or ECS/Fargate), plus CloudWatch; solid grasp of IAM and encryption.
  • Nice to have: AWS certification (Developer/Architect) or equivalent demonstrated expertise.
  • Experience working in Agile teams; strong collaboration, communication, and stakeholder management skills.
  • Experience with Databricks and lakehouse capabilities (e.g., Delta Lake, job/workflow orchestration, cluster tuning) is strongly preferred.
  • Strong SQL skills including complex joins/window functions, data profiling, and performance tuning; understanding of dimensional modeling concepts.
  • Proficient in Python and PySpark with solid Spark fundamentals (partitioning, shuffle, caching, file formats) and ability to debug/optimize.
  • Strong with GitHub, CI/CD concepts, and engineering practices (code reviews, branching, release management); working knowledge of Docker and Terraform.
  • Demonstrated ability to work across teams, drive alignment, and take ownership to deliver outcomes (including production support/on-call as needed).
  • Nice to have: experience with data quality/testing frameworks (e.g., Great Expectations/Deequ) and data governance practices (catalog, lineage, access controls).
  • Nice to have: experience with orchestration tools (e.g., Airflow), streaming (Kafka/Kinesis), and modern table formats (Delta/Iceberg/Hudi).
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience).

Our technology teams operate as business partners, proposing ideas and innovative solutions that enable new organizational capabilities. We collaborate internationally to deliver services and solutions that help everyone be more productive and enable innovation.

Who we are

We are known as Merck & Co., Inc., Rahway, New Jersey, USA in the United States and Canada and MSD everywhere else. For more than a century, we have been bringing forward medicines and vaccines for many of the world's most challenging diseases. Today, our company continues to be at the forefront of research to deliver innovative health solutions and advance the prevention and treatment of diseases that threaten people and animals around the world.

What we look for

Imagine getting up in the morning for a job as important as helping to save and improve lives around the world. Here, you have that opportunity. You can put your empathy, creativity, digital mastery, or scientific genius to work in collaboration with a diverse group of colleagues who work to bring hope to countless people battling some of the most challenging diseases of our time. Our team is constantly evolving, so if you are among the intellectually curious, join us and start making your impact today.

#HYDIT2025

Required Skills:

Agile Methodology, Best Practices Research, Build Automation, Business Intelligence (BI), Database Administration, Data Engineering, Data Governance, Data Management, Data Modeling, Data Profiling, Data Quality Control, Data Visualization, Data Warehouse Management, Design Applications, Dimensional Modeling, Information Management, PySpark, Python (Programming Language), Software Development, Software Development Life Cycle (SDLC), Stakeholder Management, System Designs, Technical Writing Documentation

Preferred Skills:

Search Firm Representatives Please Read Carefully
Merck & Co., Inc., Rahway, NJ, USA, also known as Merck Sharp & Dohme LLC, Rahway, NJ, USA, does not accept unsolicited assistance from search firms for employment opportunities. All CVs / resumes submitted by search firms to any employee at our company without a valid written search agreement in place for this position will be deemed the sole property of our company. No fee will be paid in the event a candidate is hired by our company as a result of an agency referral where no pre-existing agreement is in place. Where agency agreements are in place, introductions are position specific. Please, no phone calls or emails.

Employee Status:

Regular

Relocation:

VISA Sponsorship:

Travel Requirements:

Flexible Work Arrangements:

Hybrid

Shift:

Valid Driving License:

Hazardous Material(s):

Job Posting End Date:

05/26/2026

*A job posting is effective until 11:59:59PM on the day BEFORE the listed job posting end date. Please ensure you apply to a job posting no later than the day BEFORE the job posting end date.