Stream Processing with Confluent Cloud for Apache Flink¶
Apache Flink® is a powerful, scalable stream processing framework for running complex, stateful, low-latency streaming applications on large volumes of data. Flink excels at high-performance, mission-critical streaming workloads and is used by many companies for production stream processing applications, making it the de facto industry standard for stream processing.
Confluent Cloud provides a cloud-native, serverless service for Flink that enables simple, scalable, and secure stream processing that integrates seamlessly with Apache Kafka®. Your Kafka topics appear automatically as queryable Flink tables, with schemas and metadata attached by Confluent Cloud.
Confluent Cloud for Apache Flink currently supports Flink SQL.
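Because topics surface automatically as tables, a streaming query can be as simple as a `SELECT` statement. The following sketch assumes a hypothetical topic named `orders` with `order_id`, `customer_id`, and `price` fields:

```sql
-- `orders` is a Kafka topic, exposed automatically as a Flink table.
-- This continuous query filters high-value orders as they arrive.
SELECT order_id, customer_id, price
FROM orders
WHERE price > 100;
```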
What is Confluent Cloud for Apache Flink?¶
Confluent Cloud for Apache Flink is Flink re-imagined as a truly cloud-native service. Confluent’s fully managed Flink service enables you to:
- Easily filter, join, and enrich your data streams with Flink
- Enable high-performance and efficient stream processing at any scale, without the complexities of managing infrastructure
- Experience Kafka and Flink as a unified platform, with fully integrated monitoring, security, and governance
When bringing Flink to Confluent Cloud, the goal was to provide a uniquely serverless experience that is superior to merely “cloud-hosted” Flink. Just as Kafka on Confluent Cloud goes beyond open-source Kafka with the Kora engine, which showcases Confluent’s engineering expertise in building cloud-native data systems, Confluent’s goal is to deliver the same simplicity, security, and scalability for Flink that you expect for Kafka.
Confluent Cloud for Apache Flink is engineered to be:
- Cloud-native: Flink is fully managed on Confluent Cloud and autoscales up and down with your workloads.
- Complete: Flink is integrated deeply with Confluent Cloud to provide an enterprise-ready experience.
- Everywhere: Flink is available in AWS, Azure, and Google Cloud.
To get started with Confluent Cloud for Apache Flink, see the quick start.
Confluent Cloud for Apache Flink is cloud-native¶
Confluent Cloud for Apache Flink provides a cloud-native experience for Flink. This means you can focus fully on your business logic, encapsulated in Flink SQL statements, and Confluent Cloud takes care of what’s needed to run them in a secure, resource-efficient and fault-tolerant manner. You don’t need to know about or interact with Flink clusters, state backends, checkpointing, or any of the other aspects that are usually involved when operating a production-ready Flink deployment.
- Fully Managed
- On Confluent Cloud, you don’t need to choose a runtime version of Flink. You’re always using the latest version and benefit from continuous improvements and innovations. All of your running statements automatically and transparently receive security patches and minor upgrades of the Flink runtime.
- Autoscaling
- All of your Flink SQL statements on Confluent Cloud are monitored continuously and auto-scaled to keep up with the rate of their input topics. The resources required by a statement depend on its complexity and the throughput of topics it reads from.
- Usage-based billing
- You pay only for what you use, not what you provision. Flink compute in Confluent Cloud is elastic: once you stop using the compute resources, they are deallocated, and you no longer pay for them. Coupled with the elasticity provided by scale-to-zero, you can benefit from unbounded scalability while maintaining cost efficiency. For more information, see Billing.
Confluent Cloud for Apache Flink is complete¶
Confluent has integrated Flink deeply with Confluent Cloud to provide an enterprise-ready, complete experience that enables data discovery and processing using familiar SQL semantics.
Flink is a regional service¶
Confluent Cloud for Apache Flink is a regional service, and you can create compute pools in any of the supported regions. Compute pools represent a set of resources that scale automatically between zero and their maximum size to provide all of the power required by your statements. A compute pool is bound to a region, and the resources provided by a compute pool are shared among all statements that use them.
While compute pools are created within an environment, you can query data in any topic in your Confluent Cloud organization, even if the data is in a different environment, as long as it’s in the same region. This enables Flink to do cross-cluster, cross-environment queries while providing low latency. Of course, access control with RBAC still determines the data that can be read or written.
Flink can read from and write to any Kafka cluster in the same region, but by design, Confluent Cloud doesn’t allow you to query across regions. This helps you to avoid expensive data transfer charges, and also protects data locality and sovereignty by keeping reads and writes in-region.
For a list of available regions, see Supported Cloud Regions.
Metadata mapping between Kafka cluster, topics, schemas, and Flink¶
Kafka topics and schemas are always in sync with Flink, simplifying how you can process your data. Any topic created in Kafka is visible directly as a table in Flink, and any table created in Flink is visible as a topic in Kafka. Effectively, Flink provides a SQL interface on top of Confluent Cloud.
Because Flink follows the SQL standard, the terminology is slightly different from Kafka. The following table shows the mapping between Kafka and Flink terminology.
Kafka | Flink | Notes
---|---|---
Environment | Catalog | Flink can query and join data across environments (catalogs).
Cluster | Database | Flink can query and join data across clusters (databases).
Topic + Schema | Table | Kafka topics and Flink tables are always in sync. You never need to declare tables manually for existing topics. Creating a table in Flink creates a topic and the associated schema.
As a result, when you start using Flink, you can directly access all of the environments, clusters, and topics that you already have in Confluent Cloud, without any additional metadata creation.
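The mapping above means you can address any topic with standard SQL qualifiers. A sketch, using hypothetical names (environment `prod`, cluster `sales`, topic `orders`):

```sql
-- Select the environment (catalog) and cluster (database) to work in.
USE CATALOG prod;
USE sales;

-- Query the topic as a table.
SELECT * FROM orders;

-- Or skip the USE statements and address the table by its
-- fully qualified name.
SELECT * FROM prod.sales.orders;
```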
Compared with Open Source Flink, the main difference is that the Data Definition Language (DDL) statements related to catalogs, databases, and tables act on physical objects and not only on metadata. For example, when you create a table in Flink, the corresponding topic and schema are created immediately in Confluent Cloud.
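For example, a `CREATE TABLE` statement like the following hypothetical one creates both the backing Kafka topic and the associated schema in Schema Registry, with no connector configuration required:

```sql
-- Creating this table also creates a `clicks` topic and registers
-- its schema in Schema Registry. All names here are illustrative.
CREATE TABLE clicks (
  user_id STRING,
  url STRING,
  ts TIMESTAMP_LTZ(3)
);
```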
Confluent Cloud provides a unified approach to metadata management. There is one object definition, and Flink integrates directly with this definition, avoiding unnecessary duplication of metadata and making all topics immediately queryable with Flink SQL. Also, any existing schemas in Schema Registry are used to surface fully-defined entities in Confluent Cloud. If you’re already on Confluent Cloud, you automatically see tables that are ready to query with Flink, simplifying data discovery and exploration.
Observability¶
Confluent Cloud provides you with a curated set of metrics, exposing them through Confluent’s existing metrics API. If you have established observability platforms in place, Confluent Cloud provides first-class integrations with New Relic, Datadog, Grafana Cloud, and Dynatrace.
You can also monitor workloads directly within the Confluent Cloud Console. Clicking into a compute pool gives you insight into the health and performance of your applications, in addition to the resource consumption of your compute pool.
Security¶
Confluent Cloud for Apache Flink has a deep integration with Role-Based Access Control (RBAC), ensuring that you can easily discover and process the data you’re authorized to access, and no other data.
Access from Flink to the data¶
- For ad-hoc queries, you can use your user account, because the permissions of the current user are applied automatically without any additional configuration.
- For long-running INSERT INTO statements that need to run 24/7, you should use a service account, so the statements are not affected by a user leaving the company or changing teams.
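A long-running statement of this kind is typically a continuous `INSERT INTO` pipeline. A sketch, assuming hypothetical `orders` and `orders_eur` tables and an illustrative conversion rate:

```sql
-- A continuously running pipeline: convert order prices and write
-- them to a second topic. Run this under a service account so it
-- isn't tied to an individual user's lifecycle.
INSERT INTO orders_eur
SELECT
  order_id,
  customer_id,
  price * 0.92 AS price_eur  -- illustrative USD→EUR rate
FROM orders;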
Access to Flink¶
To manage Flink access, Confluent has introduced two new roles. In both cases, RBAC of the user on the underlying data is still applied.
- FlinkDeveloper: basic access to Flink, enabling users to query data and manage their own statements.
- FlinkAdmin: role that enables creating and managing Flink compute pools.
Service accounts¶
Use a service account to run statements permanently. If you want to run a statement with service account permissions, an OrgAdmin must create an Assigner role binding for the user on the service account. For more information, see Grant a service account and user permission to run SQL statements.
Private networking¶
Confluent Cloud for Apache Flink supports private networking on AWS and provides a simple, secure, and flexible solution that enables new scenarios while keeping your data securely in private networking.
All Kafka clusters in AWS are supported, with any type of connectivity (public, Private Links, VPC Peering, and Transit Gateway).
- Simple to set up
- Confluent makes private networking and security easy to configure and use. By simply adding a Private Link Attachment to your Confluent Cloud environments, you can use Flink and Kafka seamlessly with private networking. This not only enables Flink to access all private Kafka clusters in your organization but also ensures that all Flink resources are secured and can only be accessed over private networking.
- Secure by default
- When you enable Flink private networking for an environment, all Flink resources in the environment are secured by private networking, so unauthorized parties can’t create or modify resources or see sensitive data, like Flink SQL query text. Private networking protects statement data from unauthorized access and prevents mission-critical statements from being stopped by unauthorized users. Confluent ensures that data stored in clusters located in private networking can’t be exfiltrated to a location on public networking, safeguarding against unintentional or malicious data exfiltration scenarios.
- Flexible
- With Confluent Cloud for Apache Flink, you can process data across different environments and clusters, for example, by joining data from two clusters and writing the result to a third, or by routing data between clusters. You can also read data from both public and private sources and write it to a private cluster, for example, augmenting secured transactions with public data. The Confluent network security policy ensures that private data can’t be written to a cluster that’s accessible on the public internet.
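The join-two-clusters-and-write-to-a-third pattern can be sketched with fully qualified names. All environment, cluster, and table names below are hypothetical:

```sql
-- Join data from two clusters (`txn_db`, `ref_db`) and write the
-- result to a table in a third cluster (`out_db`), all within the
-- same environment (`my_env`) and region.
INSERT INTO my_env.out_db.enriched_txns
SELECT
  t.txn_id,
  t.amount,
  r.risk_score
FROM my_env.txn_db.txns AS t
JOIN my_env.ref_db.risk_scores AS r
  ON t.account_id = r.account_id;
```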
For more information, see Enable Private Networking.
Program Flink with SQL, Java, and Python¶
Confluent Cloud for Apache Flink supports programming your streaming applications in these languages:
- Flink SQL
- Java
- Python
Install Confluent for VS Code to access Smart Project Templates that accelerate project setup by providing ready-to-use templates tailored for common development patterns. These templates enable you to launch new projects quickly with minimal configuration, significantly reducing setup time.