Working as a Data Flow Manager, you will:
- Design, implement, evaluate, and support sophisticated data processing pipelines within Cloudera DataFlow (Apache NiFi), managing everything from data ingestion and transformation to enrichment and routing.
- Develop and fine-tune Change Data Capture (CDC) architectures to enable real-time or near-real-time data processing, utilizing tools like NiFi, Kafka, and Debezium or SQL CDC connectors.
- Establish seamless integrations between internal architectures and external data sources using various protocols, including REST API, JDBC, and Kafka.
- Oversee data structures and lifecycles by maintaining Avro schemas and managing metadata as well as data lineage through Apache Atlas.
- Ensure robust data security and governance by configuring appropriate policies and utilizing tools such as Apache Ranger.
- Actively monitor pipeline performance, troubleshoot stability issues, and drive the full automation of data flow processes.
- Partner closely with business stakeholders, architects, and fellow engineers to define system architecture, draft comprehensive flow documentation, and author Standard Operating Procedures (SOPs) and runbooks.
- Support system enhancements by participating in the upgrade processes for platforms like CDP, NiFi, and Kafka, and evaluating new data connectors.