Senior Software Engineer - Stream Storage (Apache Fluss)

Ververica GmbH

Job Overview

Location

Remote

Employment Type

Full-time

Work Arrangement

Remote

Sector

Information Technology & Software

Experience Level

Senior (5-8 years)

Application Deadline

May 3, 2026

About the Company

Ververica GmbH is the driving force behind Apache Flink®, the leading open-source stream processing framework. We are pioneers in defining and engineering stream processing at massive scale, handling billions of events per second with millisecond latency.

Our platform is engineered in Europe, emphasizing sovereignty and discipline. It unifies data movement, processing, and storage, offering a solution that is up to 10x faster and 40% more cost-effective than standard open-source Flink. We provide a no-lock-in, no-latency, and no-compromise experience, deployable across any infrastructure – cloud, on-premise, or your own environment.

Global brands rely on Ververica for mission-critical real-time workloads that demand unwavering reliability. Join us to build the future of data processing with the very architects who established the industry standard.

Job Description

Ververica GmbH is pioneering the future of stream-native storage, moving beyond traditional log-based systems to embrace table semantics for enhanced data management. We are developing high-performance distributed systems that are crucial for powering real-time analytics, sophisticated streaming pipelines, and robust transactional workloads.

As a Senior Software Engineer on our Stream Storage team, you will play a key role in shaping the core infrastructure of our stream storage solutions. This includes contributing directly to the open-source Apache Fluss project and building internal, production-grade systems that leverage its capabilities. This position offers a unique opportunity to work at the intersection of distributed systems, advanced storage engines, cutting-edge streaming technologies, and database internals.

Your responsibilities will encompass designing and implementing distributed storage components for streaming tables, managing the full table lifecycle including schema evolution, ingestion, compaction, and retention. You will also enhance the Fluss Lakehouse ecosystem, contribute features and fixes to the Apache Fluss OSS project, and actively participate in design discussions and Fluss Improvement Proposals (FIPs). Furthermore, you will focus on improving the performance and reliability of the Fluss table engine and its integration with streaming engines like Flink. Engaging with the open-source community through pull request reviews and discussions is also a vital part of this role.

Internally, you will build essential tooling and services on top of Fluss, improve observability through metrics and logging, optimize production deployments, and contribute to benchmarking and testing frameworks. This role requires a strong foundation in systems languages such as Java, Go, or Rust, coupled with extensive experience in building distributed systems and a deep understanding of consensus and replication mechanisms, storage engine principles, and streaming systems.

Required Skills

Java, Go, Rust, Distributed Systems, Storage Systems, Consensus Algorithms, Raft, Paxos, LSM Trees, B-Trees, WAL, Compaction, Streaming Systems, Kafka, Pulsar, Flink, Transactional Systems, Consistency Models, Debugging, Production Systems

Key Responsibilities

Design and implement distributed storage components for streaming tables
Work on table lifecycle: schema evolution, ingestion, compaction, retention, indexing
Enhance support for the Fluss Lakehouse ecosystem
Contribute features and fixes to the Apache Fluss OSS project
Participate in design discussions and Fluss Improvement Proposals (FIPs)
Improve Fluss table engine performance and reliability
Enhance integration with streaming engines (e.g., Flink)
Engage with the open-source community via PR reviews and discussions
Build internal tooling and services on top of Fluss
Improve observability (metrics, logging, failure diagnostics)
Optimize production deployments
Contribute to benchmarking and testing frameworks

Qualifications

5+ years building distributed systems or storage systems
Strong experience in Java, Go, Rust, or similar systems languages
Good understanding of consensus and replication (Raft, Paxos, etc.)
Good understanding of storage engines (LSM trees, B-Trees, WAL, compaction)
Good understanding of streaming systems (Kafka, Pulsar, Flink, etc.)
Good understanding of transactional systems and consistency models
Experience debugging production distributed systems
Experience with database internals or stream processing engines (Strongly Preferred)
Familiarity with table formats (Iceberg, Hudi, Delta, etc.) (Strongly Preferred)
Contributions to open-source projects, ideally Apache Software Foundation (ASF) projects (Strongly Preferred)
Experience with Flink or streaming SQL engines (Strongly Preferred)

Benefits & Perks

Work on table-first stream storage, not just message logs
Direct impact on Apache Fluss open-source evolution
Solve complex consistency and performance trade-offs
Influence architectural decisions in a fast-moving space
Collaborate with engineers passionate about distributed systems

How to Apply

To apply for this role, click the Apply button on this page and follow the instructions.

The global landscape of real-time data processing is rapidly evolving, with a growing demand for robust, stream-native storage solutions. Ververica GmbH is at the forefront, developing next-generation systems that prioritize table semantics over simple logs. This role is pivotal in advancing high-performance distributed systems essential for real-time analytics and streaming pipelines. You will engage with core stream storage infrastructure, contribute to the Apache Fluss open-source project, and build internal production-grade systems. Your work will directly impact the efficiency and reliability of data processing, influencing business ROI through enhanced analytical capabilities and scalable data management. Key technical areas include distributed systems design, storage engine optimization, and stream processing integration.