Data Portal on Confluent Cloud

Data Portal is a self-service interface for discovering, exploring, and accessing Apache Kafka® topics on Confluent Cloud.

Building new streaming applications and pipelines on top of Kafka can be slow and inefficient when teams lack visibility into what data exists, where it comes from, and who can grant access. Data Portal leverages Stream Catalog and Stream Lineage to help data users interact with their organization’s data streams efficiently and collaboratively.

With Data Portal, data practitioners can:

  • Search and discover existing topics with the help of topic metadata, and drill down to understand the data they hold (without access to the actual data).
  • Request access to topics through an approval workflow that connects the data user with the data owner and with the admins who can approve the request.
  • View and use data in topics (once access is granted) to build new streaming applications and pipelines.

The following sections walk through each step of this journey, from the perspectives of both the data user and the topic owner.

Demo: Data Portal and Flink

This video introduction to a real-world use case for Stream Governance highlights various aspects of Stream Catalog, Stream Lineage, data discovery, and data quality. Learn how Data Portal and Apache Flink® in Confluent Cloud can help developers and data practitioners find the data they need to quickly create new data products.

Prerequisites and notes

  • Data Portal is available in Confluent Cloud for users with a Stream Governance package enabled in their environments.
  • Add user-generated metadata to topics to make them discoverable and present them effectively in Data Portal. In particular, add a description, tags, business metadata, and an owner name and email to each topic (see the sketch after this list).
  • The topic access request workflow is not available for topics on Basic clusters.
  • The collaboration workflow for topic access requests through email depends on topics having owner names and emails attached.
  • Users need Stream Catalog search permissions to use Data Portal; at a minimum, the DataDiscovery role. In the “add new user” workflow, the DataDiscovery role is pre-selected by default to give new users permission to use Data Portal.
  • To approve access requests to topics, users need permission to grant read and write access on topics; specifically, one of the ResourceOwner, CloudClusterAdmin, EnvironmentAdmin, or OrganizationAdmin roles.
  • To query data with Apache Flink®, users need query permissions on one or more compute pools; at a minimum, the FlinkDeveloper role.
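
For illustration, topic metadata can also be managed programmatically through the Stream Catalog REST API, which is served from the environment’s Schema Registry endpoint. The following Python sketch adds a description to a topic and attaches an existing tag; the endpoint, credentials, qualified-name format, and tag name are all placeholder assumptions, so check the Stream Catalog API reference for the exact shapes.

    import requests

    # Placeholder assumptions for this sketch: the environment's Schema
    # Registry endpoint, a Schema Registry API key, and a topic qualified
    # name of the form "<governance-id>:<kafka-cluster-id>:<topic-name>".
    CATALOG = "https://psrc-xxxxx.us-east-2.aws.confluent.cloud"
    AUTH = ("<SR_API_KEY>", "<SR_API_SECRET>")
    QUALIFIED_NAME = "lsrc-xxxxx:lkc-xxxxx:purchase-orders"

    # Set a description on the topic entity.
    requests.put(
        f"{CATALOG}/catalog/v1/entity",
        auth=AUTH,
        json={
            "entity": {
                "typeName": "kafka_topic",
                "attributes": {
                    "qualifiedName": QUALIFIED_NAME,
                    "description": "Purchase orders from the web storefront.",
                },
            }
        },
    ).raise_for_status()

    # Attach a previously defined tag (here, a hypothetical "PII" tag).
    requests.post(
        f"{CATALOG}/catalog/v1/entity/tags",
        auth=AUTH,
        json=[
            {
                "entityType": "kafka_topic",
                "entityName": QUALIFIED_NAME,
                "typeName": "PII",
            }
        ],
    ).raise_for_status()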

Search and discover existing topics

To get started, sign in to Confluent Cloud and click Data portal in the left menu.

Users with Stream Catalog search permissions can use Data Portal. At a minimum, a user must have the DataDiscovery RBAC role.
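
As a sketch, an admin could grant this role programmatically through the Confluent Cloud IAM role-bindings API. The principal, organization, and environment IDs below are placeholders, and the CRN pattern format is an assumption to verify against the IAM API reference.

    import requests

    # Placeholder Cloud API key with permission to manage role bindings.
    CLOUD_AUTH = ("<CLOUD_API_KEY>", "<CLOUD_API_SECRET>")

    # Grant DataDiscovery on an environment to a (hypothetical) user.
    requests.post(
        "https://api.confluent.cloud/iam/v2/role-bindings",
        auth=CLOUD_AUTH,
        json={
            "principal": "User:u-abc123",
            "role_name": "DataDiscovery",
            "crn_pattern": "crn://confluent.cloud/organization=<org-id>/environment=<env-id>",
        },
    ).raise_for_status()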

By default, the Discover tab on the Data Portal shows a curated view of all available Kafka topics by environment.

../_images/dg-data-portal-starting-view.png

You can search for topics by name or tag, or browse topics by associated tag, creation date, and modified date.

Each topic card on the page summarizes the topic: its name, data location (environment, cluster, cloud provider, and region), description, tags, and creation and modification dates.

../_images/dg-data-portal-purchase-orders.png

Click a tag pill, or click View all, to open an advanced search page where you can apply additional filters to narrow the topic results. Available filters include tags, business metadata, cloud provider, region, and other technical topic metadata.

../_images/dg-data-portal-filter-cards.png
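
Discovery is also available programmatically. This sketch runs a basic search against the Stream Catalog API; the endpoint and credentials are placeholders, and the query parameter names are assumptions to confirm against the catalog’s basic-search reference.

    import requests

    CATALOG = "https://psrc-xxxxx.us-east-2.aws.confluent.cloud"  # placeholder
    AUTH = ("<SR_API_KEY>", "<SR_API_SECRET>")

    # Search for Kafka topics whose metadata matches "orders".
    resp = requests.get(
        f"{CATALOG}/catalog/v1/search/basic",
        auth=AUTH,
        params={"type": "kafka_topic", "query": "orders"},
    )
    resp.raise_for_status()
    for entity in resp.json().get("entities", []):
        print(entity["attributes"]["qualifiedName"])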

You can also sort topic results alphabetically, by creation date, or by modified date.

../_images/dg-data-portal-filter-by-newest.png

To learn more about a topic, click its card. A summary of the topic is shown in the right panel, including the following information:

  • Name
  • Environment
  • Cluster
  • Cloud provider
  • Cloud region
  • Description
  • Schema (with a link to view the schema in full screen)
  • Link to view the topic’s lineage
  • Owner name and email address
  • Business metadata
  • Technical metadata (created date, retention period, and so on)

../_images/dg-data-portal-topic-card-select.png

If you already have read access to the topic, you see the last message produced and a link to the topic message browser. You can also set up a client or query the topic with Flink SQL from the actions section at the top.

../_images/dg-data-portal-topic-full-access.png

If you don’t have read access to the topic, you see a Request access button in the actions section.

../_images/dg-data-portal-request-access.png

Request access to topics

When you click Request access, you are prompted to select the type of topic access you need (Read only or Read and Write) and to leave an optional message for the person who will review your request.

Read only maps to granting the DeveloperRead RBAC role; Read and Write maps to granting both the DeveloperRead and DeveloperWrite RBAC roles.

../_images/dg-data-portal-request-access-purchase-orders.png
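
In other words, approving a Read and Write request amounts to creating two topic-scoped role bindings. The sketch below shows the equivalent calls with hypothetical IDs; the topic-level CRN pattern is an assumption to check against the IAM API reference.

    import requests

    CLOUD_AUTH = ("<CLOUD_API_KEY>", "<CLOUD_API_SECRET>")  # placeholder
    # Hypothetical topic-level CRN pattern.
    TOPIC_CRN = (
        "crn://confluent.cloud/organization=<org-id>/environment=<env-id>"
        "/cloud-cluster=lkc-xxxxx/kafka=lkc-xxxxx/topic=purchase-orders"
    )

    # A Read only approval would grant just DeveloperRead.
    for role in ("DeveloperRead", "DeveloperWrite"):
        requests.post(
            "https://api.confluent.cloud/iam/v2/role-bindings",
            auth=CLOUD_AUTH,
            json={
                "principal": "User:u-abc123",
                "role_name": role,
                "crn_pattern": TOPIC_CRN,
            },
        ).raise_for_status()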

When you submit a request to access a topic, you get a confirmation message.

../_images/dg-data-portal-request-sent.png

If the topic has an owner email, an email with your access request is sent to the topic owner. All requests also appear on the Access requests page under Accounts & access, where any user with permission to grant topic access can review them.

When someone reviews your request, you will receive an email notifying you of the approval or rejection of your request.

../_images/dg-data-portal-topic-request-granted.png

If your request is approved, your account is granted DeveloperRead, and DeveloperWrite if requested, on the topic.

Requests expire after 30 days. If that happens, you receive an email notifying you of the expiration.

You now have more visibility into the topic and can start working with the data immediately. If you have both read and write privileges, you can develop clients that produce to and consume from the topic, and you can query the topic with Flink SQL.

(Data Owners) Manage access to your topics

When a request for access to a topic is submitted, the topic owner receives an email to review the request (if an owner email is associated with the topic).

../_images/dg-data-portal-topic-request-to-admin.png

Additionally, all requests for access to topics appear in the Access requests section under Accounts & access. From here, any user with permission to grant access (for example, any admin) can approve or deny the request.

../_images/dg-data-portal-access-requests-manage.png

The data owner of the topic or any admin can:

  • View the request in an approval queue, along with the message submitted by the requestor.
  • Approve or deny access to the topic.

Requests expire after 30 days. If that happens, the requesting user receives an email notification of the expiration.

The Past requests tab under Access requests shows past requests (approved, rejected, or expired) from the last 90 days.

../_images/dg-data-portal-admin-past-requests.png

Once the data owner or an admin approves the request, the data user receives an email indicating that access to the topic has been granted.

View and use data in topics

With read access to the topic, you can see the last message produced, along with a link to the topic message browser.

../_images/dg-data-portal-topic-full-access.png

You can also set up a client or query the topic with Flink SQL from the actions section at the top.

Click Set up a client to go to the clients page in the Cloud Console, where you can find instructions for building an event-driven application in the programming language of your choice.

../_images/dg-data-portal-client.png
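
For example, a minimal Python consumer for a topic on Confluent Cloud looks like the following sketch, using the confluent-kafka client library; the bootstrap server, API key, group ID, and topic name are placeholders to replace with values from the client setup instructions.

    from confluent_kafka import Consumer

    # Placeholder connection details; copy the real values from the
    # client setup instructions in the Cloud Console.
    consumer = Consumer({
        "bootstrap.servers": "pkc-xxxxx.us-east-2.aws.confluent.cloud:9092",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "<API_KEY>",
        "sasl.password": "<API_SECRET>",
        "group.id": "purchase-orders-reader",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["purchase-orders"])  # hypothetical topic name

    try:
        while True:
            msg = consumer.poll(1.0)  # wait up to one second for a message
            if msg is None:
                continue
            if msg.error():
                print(f"Consumer error: {msg.error()}")
                continue
            print(msg.value().decode("utf-8"))
    finally:
        consumer.close()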

Click Query to work with Flink SQL. If a Flink compute pool already exists in the same region as the topic, this takes you to a new workspace with a prepopulated query: SELECT * FROM <TOPIC_NAME> LIMIT 10;.

../_images/dg-data-portal-flink.png

If more than one Flink compute pool exists in the topic’s region, you are prompted to select one to run the query.

../_images/dg-data-portal-flink-select-pool.png

You can use Flink SQL to filter, join, and enrich your Kafka data streams. To learn more, see the blog post on Confluent Cloud for Apache Flink and the SQL documentation.
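
As a minimal sketch, a statement along the following lines filters a stream of orders and enriches it by joining a second topic-backed table; all table and column names here are hypothetical, so substitute tables from your own catalog before running it in a workspace.

    -- Hypothetical Flink SQL: keep high-value orders and join in
    -- customer details. Replace table and column names with your own.
    SELECT o.order_id, o.amount, c.customer_name
    FROM purchase_orders AS o
    JOIN customers AS c
      ON o.customer_id = c.customer_id
    WHERE o.amount > 100;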