Improving performance in Schema Registry

Adding Knox as a load balancer can increase the performance of Schema Registry.

You can increase the performance of Schema Registry by adding Apache Knox, the Cloudera load balancer, to your Schema Registry configuration. The gain offsets the minor performance cost of having no server-side caching.

By default, Schema Registry does not have server-side caching. Each time a client sends a query, Schema Registry reads the data directly from the database.

The lack of server-side caching creates a minor performance impact. However, it enables Schema Registry to be stateless and be distributed over multiple hosts. All hosts use the same database. This enables a client to insert a schema on host A and query the same schema on host B.
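As a rough illustration of why statelessness makes this possible, consider the following toy model. It is not how Schema Registry is implemented; the class names, method names, and the sample schema are hypothetical, and an in-memory dictionary stands in for the shared database. Because neither host keeps a local cache, a schema written through one host is immediately visible through the other.

```python
class SharedDatabase:
    """Stands in for the single database that every host uses (assumption)."""

    def __init__(self):
        self._schemas = {}

    def insert(self, name, schema_text):
        self._schemas[name] = schema_text

    def select(self, name):
        return self._schemas.get(name)


class StatelessHost:
    """A host with no server-side cache: every call goes to the database."""

    def __init__(self, host_name, database):
        self.host_name = host_name
        self._database = database

    def register_schema(self, name, schema_text):
        self._database.insert(name, schema_text)

    def get_schema(self, name):
        return self._database.select(name)


database = SharedDatabase()
host_a = StatelessHost("host-a", database)
host_b = StatelessHost("host-b", database)

# Insert on host A, then query the same schema on host B.
host_a.register_schema("truck_events", '{"type": "string"}')
result = host_b.get_schema("truck_events")
print(result)
```

If each host cached schemas locally instead, host B could return stale or missing data until its cache was refreshed, which is the trade-off the text describes.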

Similarly, you can store JAR files in HDFS, so that a file can still be retrieved even if one of the hosts is down.

To make Schema Registry highly available in this setup, include a load balancer on top of the services. Clients communicate through the load balancer. The load balancer then determines which host to send the requests to. If one host is down, the request is sent to another host. In CDP, the load balancer is Apache Knox.
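The failover behavior described above can be sketched as follows. This is a simplified illustration, not Knox's actual algorithm: the function name, the `HostDownError` exception, and the round-robin host selection are assumptions, and the two parameters are only loosely modeled on the `maxFailoverAttempts` and `failoverSleep` settings used in the configuration later in this topic.

```python
import time


class HostDownError(Exception):
    """Raised by the send function when a host cannot be reached."""


def send_with_failover(hosts, send, max_failover_attempts=3,
                       failover_sleep_ms=1000, sleep=time.sleep):
    """Try hosts in turn until one answers or the failover budget is used up."""
    if not hosts:
        raise ValueError("at least one host is required")
    last_error = None
    # One initial attempt plus up to max_failover_attempts failovers.
    for attempt in range(max_failover_attempts + 1):
        host = hosts[attempt % len(hosts)]
        try:
            return send(host)
        except HostDownError as error:
            last_error = error
            if attempt < max_failover_attempts:
                sleep(failover_sleep_ms / 1000)
    raise last_error


def fake_send(host):
    """Hypothetical transport: host-a is down, every other host answers."""
    if host == "host-a":
        raise HostDownError(host)
    return f"response from {host}"


# Skip the real sleep so the example runs instantly.
reply = send_with_failover(["host-a", "host-b"], fake_send, sleep=lambda s: None)
print(reply)  # response from host-b
```

The client sees a single successful response even though the first host was unavailable, which is the effect the load balancer provides.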

The following diagram shows the flow of a client query and response:


Integrating Schema Registry with Knox

Follow these steps to add the Knox load balancer to your Schema Registry configuration:

  1. In Cloudera Manager, navigate to Knox service > Configuration.
  2. Add the following property to the Knox Gateway Advanced Configuration Snippet (Safety Valve) for conf/cdp-resources.xml configuration:
    ```xml
    <property>
      <name>providerConfigs:sso</name>
      <value>role=ha#ha.name=HaProvider#ha.param.SCHEMA-REGISTRY=enabled=true;maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000</value>
    </property>
    ```
  3. Click Save Changes, and restart the Knox service.
  4. Verify that the changes are reflected in the cdp-proxy.xml file.
  5. To verify that the Knox load balancer works, disable one Schema Registry host and check that you can still access Schema Registry through Knox.
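For step 4, one way to check the generated topology is to parse it and look for the HA provider parameter. The sketch below assumes that cdp-proxy.xml follows the standard Knox topology layout, in which a `provider` element has `role`, `name`, and `param` children. The sample XML is illustrative rather than copied from a real deployment, and the function name `ha_settings` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative topology fragment (assumption), not taken from a real cluster.
SAMPLE_TOPOLOGY = """\
<topology>
  <gateway>
    <provider>
      <role>ha</role>
      <name>HaProvider</name>
      <enabled>true</enabled>
      <param>
        <name>SCHEMA-REGISTRY</name>
        <value>enabled=true;maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000</value>
      </param>
    </provider>
  </gateway>
</topology>
"""


def ha_settings(topology_xml, service="SCHEMA-REGISTRY"):
    """Return the HA settings for a service as a dict, or None if absent."""
    root = ET.fromstring(topology_xml)
    for provider in root.iter("provider"):
        if provider.findtext("role", "").strip() != "ha":
            continue
        for param in provider.iter("param"):
            if param.findtext("name", "").strip() == service:
                value = param.findtext("value", "")
                # Split "key=value;key=value" into a dictionary.
                pairs = (item.split("=", 1) for item in value.split(";") if "=" in item)
                return {key.strip(): val.strip() for key, val in pairs}
    return None


settings = ha_settings(SAMPLE_TOPOLOGY)
print(settings)
```

To check a real deployment, pass the contents of your cdp-proxy.xml file to `ha_settings` instead of the sample string. A return value of `None` suggests that the HA provider entry for Schema Registry was not written to the topology.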