Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata¶
The key to unlocking the value within data in motion and increasing productivity across an organization is a self-service tool for data discovery.
Available through both the Cloud Console and API (Stream Catalog REST API Usage and Examples on Confluent Cloud and Stream Catalog GraphQL API Usage and Examples on Confluent Cloud), Stream Catalog allows users across teams to collaborate within a centralized, organized library designed for sharing, finding, and understanding the data needed to drive their projects forward quickly and efficiently. It’s like a digital library for data in motion allowing any user, experienced with Kafka or not, to search for what they need, find what’s already been built, and put it to use right away.
With classifications, teams can steadily increase the value of the business’s Stream Catalog by adding contextual details to data; for example, by labeling schema fields as “PII” (personally identifiable information) or “Sensitive”.
The following high-level sections are included here:
- Search entities and tags
- Tag entities, data, and schemas
- Business metadata
- Access control (RBAC) for Stream Catalog
Search entities and tags¶
Stream Catalog in Confluent Cloud centralizes the metadata for all entities and makes it available for search and discovery through the UI and APIs. Entity types available on the Stream Catalog are:
- Schemas
- Schema subject name
- Schema record name
- Schema field name
- Topics
- Connectors
- Clusters
- Environments
- Streaming data pipelines
- Apache Flink® compute pools
These are also listed by their sub-class names under Entity types in the Catalog API Usage Guide.
This guide provides example workflows for some but not all entity types. All follow the same pattern with regard to workflows and API examples. The full list of supported entities is provided with more detail in Entity types.
You can also search on tags, which will return results for all catalog-supported entities with a specified tag. To learn more, see Tag entities, data, and schemas and Search for entities with a given tag.
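The same kind of search can be scripted against the catalog API. The following is a minimal sketch that only builds the request URL and does not send it. The endpoint path `/catalog/v1/search/basic`, the parameter names `types`, `query`, and `tag`, and the type names `sr_schema` and `sr_field` are assumptions based on the public Stream Catalog REST API examples and should be checked against the API reference. The host name is a hypothetical placeholder.

```python
from urllib.parse import urlencode


def build_search_url(sr_url, types=(), query=None, tag=None):
    """Build a basic-search URL for the catalog (assumed endpoint and parameters)."""
    params = []
    if types:
        # Assumption: several entity types are passed as one comma-separated value.
        params.append(("types", ",".join(types)))
    if query:
        params.append(("query", query))
    if tag:
        params.append(("tag", tag))
    base = sr_url.rstrip("/") + "/catalog/v1/search/basic"
    # Keep commas readable instead of percent-encoding them.
    return base + ("?" + urlencode(params, safe=",") if params else "")


# Hypothetical Schema Registry endpoint; replace with your own.
url = build_search_url(
    "https://psrc-example.confluent.cloud",
    types=["sr_schema", "sr_field"],
    query="stock",
)
print(url)
```

Keeping URL construction separate from the HTTP call makes it easy to test the filters without credentials.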
Global search¶
To try it out, log into Confluent Cloud in an environment where you have Schema Registry enabled, and start typing the name of a schema subject, data record, data field, or tag into the search bar at the top. You will get results as you type. Hit enter to select an entity.
Note that you will get hits from other entities, such as topics, on the same search. Entity names that include the word you type also show (for example, the tag my_stocks is shown when you search for stock).
Click a search result to navigate to the entity. For example:
- Click stocks-value (listed as one of the subjects under Schema in the search hits) to navigate to the value schema for the stocks topic.
- Click stocks (listed under Topic in the search hits) to navigate to the stocks Kafka topic.
If an entity (other than a schema) of the same name exists on multiple clusters, all are shown with associated environments.
Schema subjects live at the level of an environment, in per-environment Schema Registry clusters. If a schema subject of the same name exists in multiple environments, all are shown.
Depending on the entity selected, different filters are available in the advanced search.
For example, on topics you can filter on values for the following criteria to further narrow your topic search:
- Environments
- Tags
- Date created
- Date modified
- Retention time
As another example, on connectors you can filter on values for the following:
- Environments
- Tags
- Date created
- Date modified
- Category
- Plugin Type
These aspects of an entity also come into play for role-based access control (RBAC) in that you can provide access to some aspects of an entity and not others. To learn more, see Access control (RBAC) for Stream Catalog.
Search with filters¶
Click See all.. or hit return on your initial search results to view a more detailed list, with an option to apply more filters.
This view shows “All Results” for a search on stocks.
You can filter this further by selecting an entity on the left menu; for example, “Schemas”.
Also, you can use the filters across the top to filter by “Environment”, “Tag”, or “Entities”.
This example narrows the search result based on entity type to show only schema subject names that include stock.
This example filters on schema subjects, records, and fields in the demos environment that are tagged with my_stocks.
You can change the search on this detailed list view to perform an advanced search using filters.
For example, if you worked through the steps in the Quick Start to create a schema for the employees topic, search for employees in schemas. (Make sure you clear any other filters, such as tags, that may not match the employees schema.)
Searching on stocks across environments shows the following hits for topics.
As another example, searching on tags across environments shows the following hit for a tagged connector.
Note that advanced searches on entities include filters for objects specific to each entity type. For example, connectors include filters on resources such as connector “Category” and “Plug-in type”, in addition to the standard “Environments”, “Date created”, and “Date modified” filters.
Tag entities, data, and schemas¶
Stream Catalog enables entity tagging. Tags are searchable like any other entity.
What is it?¶
A fundamental aspect of governance is the ability to organize data based on a shared vocabulary, including multiple concepts and categories. Confluent Cloud now provides the option to create and apply tags to schemas and fine-grained entities like data records and fields. On this version of Confluent Cloud, you can:
- Create instances of provided tags (Public, Private, Sensitive, PII) and custom (“free form”) tags
- Associate tags with schema versions, records, and fields
- Apply multiple tags to a single field, record, or schema version
How to use it¶
You can create and work with tags through either the Confluent Cloud Console, as described below, or through the Confluent Cloud REST API, as described in Tags API examples.
Important
Tag definitions with attributes created with the Confluent Cloud CATALOG API (V1) are currently not accessible through the Confluent Cloud Console (UI) for search, update, and so on. Tags with attributes created through the API must be managed through the API, as described in Stream Catalog REST API Usage and Examples on Confluent Cloud. (Tags created without attributes through the API will show on the UI.)
How tags work with schema versioning¶
When you apply tags to schemas, you always apply them to a particular version of a schema. As you modify schemas, they evolve to newer versions. Tags that you applied to previous versions of a schema are automatically propagated to new versions.
For example, if you applied the tag my_stocks starting with version 2 of a schema, that tag would propagate to versions 3, 4, and so on, but version 1, which never had the my_stocks tag applied, would not be tagged unless you went back and explicitly added it to version 1.
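The propagation rule can be pictured with a small in-memory model. This is an illustrative sketch, not Confluent's implementation: the function name `propagate_tags` is hypothetical, and the model assumes each tag was applied before any later version was registered and ignores tag removal.

```python
def propagate_tags(versions, explicit):
    """Return {version: tags} after carrying tags forward to later versions.

    versions: iterable of schema version numbers.
    explicit: {version: set of tags applied by hand to that version}.
    """
    result = {}
    carried = set()
    for version in sorted(versions):
        # A version inherits everything carried so far plus its own explicit tags.
        carried = carried | explicit.get(version, set())
        result[version] = set(carried)
    return result


# The my_stocks tag is first applied to version 2 of a four-version schema.
tags_by_version = propagate_tags([1, 2, 3, 4], {2: {"my_stocks"}})
for version, tags in tags_by_version.items():
    print(version, sorted(tags))
```

Version 1 stays untagged because nothing is carried backward, which matches the example in the text.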
Inline and External tags¶
Stream Catalog and Schema Registry make use of both inline and external tags.
- Inline tags are embedded directly in a schema.
- External tags are specified external to a schema.
Only inline tags show up as the tags of a field in the Confluent Cloud Console. External tags do not show up in the UI.
So, for example, if you are using the Maven plugin to register a schema with tags included, the specified tags must be:
- Already specified in the catalog, either on the Confluent Cloud Console as described below or through the Stream Catalog REST API
- Embedded directly in the schema, also either through the Confluent Cloud Console or the Stream Catalog REST API.
Embedded tags look like this, within a schema definition:
"confluent:tags": [ "PII", "PRIVATE" ]
To learn more about inline and external tags, see the section on Tags in Data Contracts.
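To make the idea of inline tags concrete, here is a minimal sketch that reads an Avro-style schema and collects the tags embedded at the record and field level. The schema, its names, and the helper `inline_tags` are hypothetical; placing the `confluent:tags` property on a record or field is an assumption drawn from the snippet above.

```python
import json

# Hypothetical schema with inline tags on the record and on one field.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Account",
  "namespace": "com.example",
  "confluent:tags": ["PRIVATE"],
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "balance", "type": "double", "confluent:tags": ["PII", "PRIVATE"]}
  ]
}
""")


def inline_tags(schema):
    """Map the record name and top-level field names to their inline tags."""
    found = {}
    if "confluent:tags" in schema:
        found[schema["name"]] = list(schema["confluent:tags"])
    for field in schema.get("fields", []):
        tags = field.get("confluent:tags")
        if tags:
            found[schema["name"] + "." + field["name"]] = list(tags)
    return found


print(inline_tags(SCHEMA))
```

The sketch only walks top-level fields; nested records would need a recursive walk.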
View available tags¶
To find tags that are already available in an environment, do one of the following:
Select an environment and click Tags on the right panel.
Navigate to a topic or schema and click under the Tags label on the right panel to get a drop-down list of tags available to add to the selected entity.
Create tags¶
To create a tag:
Select an environment and click Tags on the right panel.
Click Create tag.
When the name and description are properly filled in, the Create button is active. Click Create to create the tag with the current name and description.
View available tags.
After a tag is created, it shows in the list under Tags.
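For comparison with the Console steps, a tag definition can also be expressed as a JSON payload for the REST API. The sketch below only builds the payload. The field names `entityTypes`, `name`, and `description`, the type names, and the endpoint `POST /catalog/v1/types/tagdefs` are assumptions based on the public Tags API examples; confirm them in the API reference before use.

```python
import json


def tag_definition(name, description,
                   entity_types=("sr_schema", "sr_record", "sr_field")):
    """Build the request body for creating one tag definition (assumed shape)."""
    return [{
        "entityTypes": list(entity_types),
        "name": name,
        "description": description,
    }]


# Assumed target: POST <schema-registry-url>/catalog/v1/types/tagdefs
body = json.dumps(tag_definition("my_stocks", "Stock trading data"))
print(body)
```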
Apply tags¶
Tags can be applied to any catalog-supported entity type. In the current release, you can tag schemas, data records, data fields, topics, and connectors.
Example: Apply a tag to a schema record or field¶
To apply a tag:
Navigate to the entity for which you want to apply the tag.
Using schemas as an example, there are a few different ways to do this:
- From the Search bar, start typing the name of a schema, record, field, or topic where you want to apply a tag or business metadata.
- From the same environment level view click Schemas on the right side panel, and select a schema from the list.
- From within a cluster, navigate to a topic, then click the Schema tab for that topic.
On the Schemas tab for a topic, add tags as follows.
Click Tag fields
To add tags to a selected record or field, expand the tree view of the schema (the default view), click the plus icon next to the entity, and select a tag from the drop-down list of available tags. You can add multiple tags to multiple elements of the schema. (You can also delete tags.)
- Click Save.
A new schema version is created, which incorporates your current tag updates.
View applied tags.
Applied tags show next to the schema version, records, and fields with which they are associated.
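Applying several tags to one entity can be sketched as a list of assignments, one per tag. The body shape (`entityType`, `entityName`, `typeName`) and the endpoint `POST /catalog/v1/entity/tags` are assumptions based on the public Tags API examples. The entity name is a hypothetical placeholder, because the qualified-name format depends on the entity type.

```python
def tag_assignments(entity_type, entity_name, tag_names):
    """Build one assignment per tag for a single entity (assumed body shape)."""
    return [
        {"entityType": entity_type, "entityName": entity_name, "typeName": tag}
        for tag in tag_names
    ]


# Assumed target: POST <schema-registry-url>/catalog/v1/entity/tags
# "<field-qualified-name>" is a placeholder, not a real identifier.
payload = tag_assignments("sr_field", "<field-qualified-name>", ["PII", "Sensitive"])
print(payload)
```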
Example: Apply a tag to a topic¶
To apply a tag to a topic:
Navigate to a topic for which you want to apply a tag.
On the tab for the topic, click Tags in the right menu, and select a tag.
View applied tags.
Applied tags show next to the topic with which they are associated.
Remove a tag from an entity¶
To remove a previously applied tag from a schema record or field:
- Navigate to the schema that includes the tag.
- Click Tag fields.
- Click the delete icon (x) on the applied tag(s).
- Click Save.
A new schema version is created, which incorporates the tag updates (in this case, deleted tags).
To remove a tag from a topic:
- Navigate to a topic.
- Click Tags in the right menu to get the list of applied tags.
- Click the delete icon (x) next to the tags you want to remove from the topic.
Edit a tag¶
To edit a tag description:
- Select an environment and click Tags on the right side panel.
- Select the tag you want to edit from the Tag management list.
- Click the edit icon next to the description, edit, and click Save.
Tip
You cannot edit the name of an existing tag, only its description. To rename a tag, remove it from any entities to which it is applied, delete the tag, and create a new one.
Delete a tag¶
If you want to delete a tag, first make sure that the tag is not currently applied to any entities. If the tag is in use, the delete operation will fail.
To delete a tag from an environment:
- Select an environment and click Tags on the right side panel.
- Select the tag you want to delete from the Tag management list.
- Click the trashcan icon in the upper right.
- If the tag is in use (applied to one or more entities), you will get a warning and the tag will not be deleted.
- If the tag is not in use, it is deleted.
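The delete rule can be modeled in a few lines. This is an in-memory illustration only, with hypothetical class and method names. It shows why a tag must be removed from every entity before its definition can be deleted.

```python
class TagInUseError(Exception):
    """Raised when deleting a tag that is still applied to an entity."""


class TagRegistry:
    def __init__(self):
        self.definitions = set()
        self._applied = {}  # tag name -> set of entity names

    def create(self, tag):
        self.definitions.add(tag)

    def apply(self, tag, entity):
        if tag not in self.definitions:
            raise KeyError(tag)
        self._applied.setdefault(tag, set()).add(entity)

    def remove(self, tag, entity):
        self._applied.get(tag, set()).discard(entity)

    def delete(self, tag):
        # Deletion is refused while any entity still carries the tag.
        if self._applied.get(tag):
            raise TagInUseError(tag)
        self.definitions.discard(tag)
        self._applied.pop(tag, None)


registry = TagRegistry()
registry.create("my_stocks")
registry.apply("my_stocks", "stocks-value")
try:
    registry.delete("my_stocks")
except TagInUseError:
    print("tag is in use; remove it from entities first")
registry.remove("my_stocks", "stocks-value")
registry.delete("my_stocks")
```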
Search for entities with a given tag¶
As shown in Search entities and tags, tags are now discoverable through the global search.
This means that you can search for a tag name (or part of the tag name), and the search will return all entities that have that tag applied. From there, you can drill down into the resource as with any other search.
For example, searching on stock returns the my_stocks and buy_stocks tags in the results.
Click one of these to get a list of all entities tagged accordingly.
For example, click my_stocks under Tag in the results to get a list of all entities tagged with my_stocks.
Click an entity to drill down. For example, click StockTrade to drill down into the schema that has a field tagged with my_stocks.
Tip
You must switch to the tree view of the schema to see record and field level tags. The raw schema view, which is the default, does not show them. The tree and raw schema view buttons are on the top left next to the schema search field, as highlighted in the illustration below.
Here is another example showing a more specific search for entities tagged PII (personally identifiable information).
A search for the tag PII provides these search results.
Scroll down to find balance.
Drill down on balance, which is a tagged field in the account-value schema.
(Remember to switch to the tree view to see the record and field level tags.)
Business metadata¶
What is it?¶
Business metadata is a collection of attributes in the form of key-value pairs that provide more contextual information to entities across the platform. Suppose you want to document or find out:
- Which team is responsible for a particular schema?
- Which product domain does a schema belong to?
- What is the GitHub location for a schema?
These are all examples of context that owners can attach to data as metadata, and that users can then discover to better understand the entities they work with. You can assign business metadata to a schema.
For example, you can create a collection named Domain that includes the attributes Name, Team_owner, and Slack_contact. Once users assign a business metadata collection to an entity like a topic, they can enter attribute values tailored to that specific entity.
Each customer will have their own business metadata concepts. Here are some examples of ideas for business metadata:
Collection | Attributes |
---|---|
Team |
|
Domain |
|
Owner |
|
Data_product |
|
github |
|
How is business metadata different from tags?¶
A tag (described in Tag entities, data, and schemas) is a word or acronym you can associate with an entity to provide additional context in terms of meaning, classification, and organization. For example, you can create a tag named PII, Sensitive, or Public, and assign it to a topic or a schema.
Tags help to build a shared vocabulary, support data discovery and compliance, and are a great way to enrich entities with user-generated metadata.
However, tags are a less flexible kind of metadata than business metadata because when you attach a tag to an entity (like a topic or schema), you cannot add extra information at attach time. With tags, a user can mark a topic as PII, but with business metadata they can express more information, such as "this topic has owner=david".
That said, one does not replace the other: business metadata is mainly used for defining extra information for entities, while tags are used for organizing and classifying entities. In general, although tags and business metadata are closely related, they are different concepts and are created and used in a different way.
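The difference can be illustrated with two small data structures. This is a conceptual sketch with hypothetical class names, not a representation of how the catalog stores either kind of metadata.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TagAssignment:
    """A tag marks an entity; it carries no values of its own."""
    entity: str
    tag: str


@dataclass
class BusinessMetadataAssignment:
    """Business metadata attaches a collection plus per-entity attribute values."""
    entity: str
    collection: str
    attributes: dict = field(default_factory=dict)


# Classify the topic with a tag, then describe it with business metadata.
pii = TagAssignment(entity="stocks", tag="PII")
ownership = BusinessMetadataAssignment(
    entity="stocks", collection="Owner", attributes={"owner": "david"}
)
print(pii)
print(ownership)
```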
Why is it important?¶
Business metadata allows entities on the platform to be more self-descriptive and helps data consumers understand what those entities mean and are used for.
How business metadata works with schema versioning¶
When you apply business metadata, you always apply it to a particular version of a schema. As you modify schemas, they evolve to newer versions. Business metadata that you applied to previous versions of a schema is automatically propagated to new versions.
For example, if you applied a location label and attributes starting with version 2 of a schema, that location metadata would propagate to versions 3, 4, and so on, but version 1, which never had that label applied, would not have any metadata unless you went back and explicitly added it to version 1.
Examples¶
To learn more about using business metadata in context of a real-world use case, check out the Demo in the Stream Governance overview. You can tune in at about 6:00 minutes into the video for a cursory overview of the application being presented, followed by a discussion of how to add business metadata to the schemas.
How to use it¶
You can create and apply business metadata through the Confluent Cloud Console as described in the sections below or through the Confluent Cloud REST API, as described in Business metadata API examples.
Create business metadata and add attributes¶
Tip
Business metadata is only available in the Advanced package for Stream Governance. If you do not currently have this package, you can upgrade directly from the Confluent Cloud Console. Click Upgrade now on the right menu for an environment.
To create business metadata:
Select an environment, and click Business metadata on the right side panel.
If this is the first time you’ve created business metadata on this cluster, click Get started.
Otherwise, click Create business metadata.
On already created metadata, there is also an option to add new attributes to the currently selected label. To do so, click Create attribute.
Fill in values for the metadata label name, description, and attributes, then click Create.
Like tags, naming rules for business metadata labels and attributes require that these names start with a letter and are followed by alphanumeric or _ characters.
The metadata you created is listed, with its label name on the left menu.
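The naming rule for labels and attributes given above can be checked before you submit a name. The sketch below is one reading of that rule as a regular expression. It assumes "letter" and "alphanumeric" mean ASCII characters, which the guide does not state explicitly, and the helper name `is_valid_name` is hypothetical.

```python
import re

# A letter first, then any number of letters, digits, or underscores.
NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_name(name):
    """Return True if the whole name satisfies the assumed naming rule."""
    return NAME_RULE.fullmatch(name) is not None


for candidate in ["Team_owner", "Slack_contact", "2fast", "_hidden", "my-tag"]:
    print(candidate, is_valid_name(candidate))
```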
View available business metadata¶
To view all available business metadata:
Select an environment and click Business Metadata on the right side panel.
Apply business metadata to an entity¶
Business metadata can be applied to any catalog-supported entity types, including schemas, data fields, data records, topics, and connectors.
Example: Apply business metadata to a schema¶
This example shows how to apply business metadata to a schema.
To apply business metadata to a schema:
Navigate to a schema for which you want to apply business metadata.
There are a few different ways to do this:
- From the Search bar, start typing the name of a schema where you want to apply a tag or business metadata.
- From the same environment level view, click Schemas on the right side panel.
- From within a cluster, navigate to a topic, then click the Schema tab for that topic.
If needed, select the specific schema version to which you want to apply the business metadata.
The schema version is shown on the top left. By default, the latest (current) version is selected.
Click Add business metadata on the schema Overview panel.
On the Add business metadata dialog, select the data and attributes to associate with the currently displayed schema version. Note that:
- On this dialog, you have the option to apply multiple labels (business metadata) to this same schema version by clicking + Add business metadata at the bottom of the dialog.
- You cannot create new business metadata from this dialog; only add already existing labels and attributes. If you want to create new labels and attributes, you must do so from the View & manage business metadata screen.
When you have added all of the business metadata labels and attributes desired, click Continue to apply them to the selected schema version.
The business metadata you applied to this schema version is displayed on the lower right.
Example: Apply business metadata to a topic¶
Here is another example, showing how to apply business metadata to a topic.
Navigate to the topic to which you want to apply the metadata.
For example, go to <Environment> -> <Cluster> -> Topics on left menu, to list topics, then click the topic you want.
On the tab for the topic, click Business Metadata in the right menu, and select a tag.
On the Add business metadata dialog, select the data and attributes to associate with the topic. Note that:
- On this dialog, you have the option to apply multiple labels (business metadata) to this same topic by clicking + Add business metadata at the bottom of the dialog.
- You cannot create new business metadata from this dialog; only add already existing labels and attributes. If you want to create new labels and attributes, you must do so from the View & manage business metadata screen.
When you have added all of the business metadata labels and attributes desired, click Continue to apply them to the topic.
The business metadata you applied to the topic is now associated, as shown on the right menu under Business Metadata.
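The same assignment can be sketched as a REST payload. As in the dialog, only attributes that already exist on the label are accepted. The body shape (`entityType`, `entityName`, `typeName`, `attributes`), the type name `kafka_topic`, and the endpoint `POST /catalog/v1/entity/businessmetadata` are assumptions based on the public Business metadata API examples. The entity name and attribute values are hypothetical placeholders.

```python
# Attributes defined on the Domain collection used earlier in this guide.
DOMAIN_ATTRIBUTES = {"Name", "Team_owner", "Slack_contact"}


def business_metadata_assignment(entity_type, entity_name, collection,
                                 allowed, values):
    """Build the assignment body, rejecting attributes the label does not define."""
    unknown = set(values) - set(allowed)
    if unknown:
        raise ValueError(
            f"attributes not defined on {collection}: {sorted(unknown)}"
        )
    return [{
        "entityType": entity_type,
        "entityName": entity_name,
        "typeName": collection,
        "attributes": dict(values),
    }]


# Assumed target: POST <schema-registry-url>/catalog/v1/entity/businessmetadata
# "<topic-qualified-name>" is a placeholder, not a real identifier.
assignment = business_metadata_assignment(
    "kafka_topic", "<topic-qualified-name>", "Domain", DOMAIN_ATTRIBUTES,
    {"Name": "trading", "Team_owner": "stocks-team"},
)
print(assignment)
```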
Edit a business metadata¶
To edit existing business metadata:
- Navigate to the Environment.
- Click View & manage business metadata on the right panel.
- Select the label you want to edit from the list.
- Edit the description and/or add attributes.
- Click Save for each option.
Tip
You cannot delete attributes from an existing metadata definition/labels; only add them. Your other option for reconstructing a business metadata definition is to remove it from any entities to which it is applied, delete the definition/label, and create a new one, adding only the attributes you want it to include.
Delete business metadata¶
If you want to delete a metadata group, first make sure that the label is not currently applied to any entities. If the label is in use, the delete operation will fail.
To delete a metadata group from an environment:
- Navigate to the Environment.
- Click View & manage business metadata.
- Select the group label you want to delete from the list.
- Click the trash can icon in the upper right.
- If the label is in use (applied to one or more entities), you will get a warning and the label will not be deleted.
- If the label is not in use, it is deleted.
Search for labels¶
If you have a long list of business metadata labels, you might want to search for label names in the Search bar above the list. The predictive search shows matching labels as you type.
Access control (RBAC) for Stream Catalog¶
Role-Based Access Control (RBAC) enables administrators to set up and manage user access to Schema Registry subjects and topics. This allows multiple users to collaborate with different access levels to various resources.
The following table shows how RBAC roles map to Stream Catalog resources. For details on how to manage RBAC for these resources, see List the role bindings for a principal and Predefined RBAC Roles on Confluent Cloud.
Role | Scope | Tags & business metadata: DEFINE, MANAGE | Tags & business metadata: WRITE, DELETE | Tags & business metadata: READ | Catalog search APIs (READ) | Catalog global search on UI (READ) | Data portal |
---|---|---|---|---|---|---|---|
CloudClusterAdmin | Cluster | ||||||
ResourceOwner | Resource | ||||||
DeveloperManage | Resource | ||||||
DeveloperWrite | Resource | ||||||
DeveloperRead | Resource | ||||||
OrganizationAdmin | Organization | All | ✔ | ✔ | ✔ | ✔ | ✔ |
EnvironmentAdmin | Environment | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
DataDiscovery | Environment | ✔ | ✔ | ✔ | ✔ | ||
DataSteward | Environment | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
Operator | Organization, Environment | ✔ | ✔ | ✔ |
Table Legend:
- ✔ = Yes
- Blank space = No
- Catalog search APIs are documented here in the Confluent Cloud API reference: Search by Attribute, Search by Fulltext Query, and the Catalog REST API examples guide under Searching.
- Catalog global search from the UI is described in this user guide under Global search.
Note
Granular RBAC roles for Operator (Cluster scope), MetricsViewer (Cluster scope), ResourceOwner, CloudClusterAdmin, DeveloperManage, DeveloperWrite, and DeveloperRead are not currently available. Therefore, these roles currently cannot view the topic and schema metadata (which includes technical metadata such as date created, date modified, and so on), nor can they view user-defined metadata such as description, tags, and business metadata. Also, these roles currently cannot use the Stream Catalog REST APIs or the GraphQL APIs. That said, customers should not depend on this behavior to restrict access, as this is a temporary limitation.