Striim 3.9.4 / 3.9.5 documentation

Running the CDC sample apps

The CDC sample applications demonstrate Striim's data migration capabilities using a PostgreSQL instance in a Docker container.

Note

To run these sample applications, we recommend using a computer with 12 GB or more memory. If you run them on a computer with 8 GB, watch memory usage closely and close other applications as necessary to avoid running out of memory. Do not attempt to run them on a computer with less than 8 GB.

Installing Striim with the CDC sample apps

To install the CDC sample apps in a DEB, RPM, TGZ, or ZIP Striim installation, see Installing the CDC sample apps in an existing Striim server.

  1. If you have already installed Docker Desktop, skip this step.

    In Windows 10 or OS X, install Docker Desktop.

    In Linux, install Docker CE.

  2. If you are on OS X or Linux, skip this step, as it will be done for you by the Striim installer.

    In Windows 10, open a command prompt and enter:

    docker run --name striimpostgres -d -p 5432:5432 striim/striim-postgres-cdc:latest

    This will download and start a Docker container with a PostgreSQL instance configured to serve as a source and target for the sample applications.

  3. Install Striim as described in Evaluating on Microsoft Windows or Evaluating on Mac OS X or Linux.

    • On the Select Installation Packages page (step 3), leave the PostgreSQL pack selected.

    • On the Kafka configuration page, choose "start Kafka locally."

CDC_sample_apps.png

When Striim has launched, log in and go to the Apps page. The Postgres and Kafka icons should be green, indicating that the apps are ready to run. If instead they are red, see Troubleshooting the CDC sample apps.

Installing the CDC sample apps in an existing Striim server

Use the following method to install the CDC sample apps when the Striim server was installed from a DEB, RPM, TGZ, or ZIP package. This may also be used to reiinitialize the PostgreSQL instance, Kafka topic, and SampleOutput directory if the sample apps become unusable.

  1. If you have already installed Docker Desktop, skip this step.

    In Windows 10 or OS X, install Docker Desktop.

    In Linux, install Docker CE.

  2. On Windows only, if Striim's internal Kafka instance is not running, start it (see Configuring Kafka).

  3. Run the striim/Samples/Scripts/InitializePostgresSamples.sh script or on Windows InitializePostgresSamples.bat batch file.

About the CDC sample apps

There are three groups of applications:

  • SamplesDB demonstrates a SQL CDC source to database target pipeline, replicating data from one set of PostgreSQL tables to another set. The two applications that are similar to their real-world equivalents are:

    • PostgresToPostgresInitialLoad150KRows uses DatabaseReader and DatabaseWriter to replicate 150,000 existing records from the customer, nation, and region tables to the customertarget, nationtarget, and regiontarget tables. In a real-world application, the source and target would typically be different databases. For example, the source might be Oracle and the target might be Amazon Redshift; Azure SQL Data Warehouse, PostgreSQL, or SQL DB; Google BigQuery, Cloud SQL, or Spanner; or Snowflake.DatabaseReaderDatabaseWriter

    • PostgresToPosgresCDC uses PostgreSQLReader (see PostgreSQL) and DatabaseWriter to continuously update the target tables with changes to the source.DatabaseWriter

  • SamplesDB2Kafka demonstrates a typical SQL CDC source to database target pipeline, replicating data from a set of PostgreSQL tables to a Kafka topic. The two applications that are similar to their real-world equivalents are:

    • PostgresToKafkaInitialLoad150KRows uses DatabaseReader and KafkaWriter to replicate 150,000 existing records from the PostgreSQL customer, nation, and region tables to messages in a Kafka topic called kafkaPostgresTopic. In a real-world application, the target would be an external Kafka instance, either on-premise or in the cloud.DatabaseReader

    • PostgresToKafkaCDC uses PostgreSQLReader (see PostgreSQL) and KafkaWriter to continuously update the Kafka topic with changes to the PostgreSQL source tables. Note that updates and deletes in PostgreSQL create new messages in Kafka rather than updating or deleting previous messages relating to those rows.

  • SamplesDB2File demonstrates a typical SQL CDC source to file target pipeline, replicating data from a set of PostgreSQL tables to files. The two applications that are similar to their real-world equivalents are:

    • PostgresToFileInitialLoad150KRows uses DatabaseReader and FileWriter to replicate 150,000 existing records from the PostgreSQL customer, nation, and region tables to files in striim/SampleOutput.. In a real-world application, the target directory would typically be on another host, perhaps in AWS S3, Azure Blob Storage or HD Insight Hadoop, or Google Cloud Storage.DatabaseReader

    • PostgresToFileCDC uses PostgreSQLReader (see PostgreSQL) and FileWriter to continuously update the files with changes to the PostgreSQL source tables. Note that updates and deletes in PostgreSQL add new entries to the target files rather than updating or deleting previous entries relating to those rows.

Striim provides wizards to help you create similar applications for many source-target combinations (see Creating apps using templates).

The other applications use open processors (see Creating an open processor component) and other custom components to manage the PostgreSQL instance and generate inserts, updates, and deletes. In a real-world application, the source database would be updated by users and other applications.

  • ValidatePostgres, ValidateKafka, and ValidateFile verify that the sources and targets used by the other apps are available.

  • Execute250Inserts adds 250 rows to the source tables and stops automatically.

  • Execute250Updates changes 250 rows in the source tables and stops automatically.

  • Execute250Deletes removes 250 rows from the source tables and stops automatically.

  • ResetPostgresSample, ResetKafkaSample, and ResetFileSample clear all the data created by the other apps, leaving the apps, PostgreSQL tables, Kafka, and SampleOutput directory in their original states.

Running the applications

When Striim, the PostgreSQL instance in Docker, and Kafka are running, you can use the PostgreSQL sample applications. The process is the same for all three sets of applications.

  1. Deploy and start the ValidatePostgres, ValidateKafka, and ValidateFile applications and leave them running.

  2. Deploy and start the InitialLoad application.

    initial_load.png
  3. When you see the alert that initial load has completed, undeploy and stop the InitialLoad application.

  4. Deploy and start the CDC application.

  5. Once the CDC application is running, deploy and start the Execute250Inserts application. It will add 250 rows to the customer table, give you an alert, and stop automatically. The CDC app will replicate the rows to the target.

    insert_alert.png
  6. Run Execute250Updates. It will update a random range of 250 rows in the customer table, give you an alert, and stop automatically. The PostgreSQL CDC app will replicate the changes to the corresponding rows in the customertarget table. PostgresToKafkaCDC will add messages describing the updates to the target topic. PostgresToFileCDC will add entries describing the updates to the files in SampleOutput.

    update_alert.png
  7. Run Execute250Deletes. It will delete the first 250 rows in the customer table, give you an alert containing the , and stop automatically. The PostgreSQL CDC app will delete the corresponding rows in the customertarget table. PostgresToKafkaCDC will add messages describing the deletes to the target topic. PostgresToFileCDC will add entries describing the deletes to the files in SampleOutput.

    delete_alert.png

Verifying PostgreSQL to PostgreSQL replication

To view the results of the load, insert, update, and delete commands in the PostgreSQL target, use any PostgreSQL client to log in to localhost:5432 with username striim and password striim.

Alternatively, you can access virtual machine's command line and run psql:

  1. In a Docker Quickstart, OS X, or Linux terminal, enter:

    docker exec -it striimpostgres /bin/bash
  2. When you see the bash prompt, enter:

    psql -U striim -d webaction

Before running PostgresToPostgresInitialLoad150KRows, the customer table has 150,000 rows and customertarget has none:

webaction=# select count(*) from customer;
 count  
--------
 150000
(1 row)

webaction=# select count(*) from customertarget;
 count 
-------
     0
(1 row)

After running PostgresToPostgresInitialLoad150KRows, customertarget has 150,000 rows:

webaction=# select count(*) from customertarget;
 count  
--------
 150000
(1 row)

After stopping PostgresToPostgresInitialLoad150KRows, starting PostgresToPosgresCDC, and running Execute250Inserts:

webaction=# select count(*) from customertarget;
 count  
--------
 150250
(1 row)

After running Execute250Updates:

webaction=# select * from customer where c_custkey=113981;
 c_custkey |       c_name       |             c_address              | c_nationkey ...    
-----------+--------------------+------------------------------------+------------ ...
    113981 | Customer#000113981 | kpxLWwaZh3DpOr Qudn1OKolRYyIlFshOG |           4 ...
(1 row)

webaction=# select * from customertarget where c_custkey=113981;
 c_custkey |       c_name       |             c_address              | c_nationkey ...    
-----------+--------------------+------------------------------------+------------ ...
    113981 | Customer#000113981 | kpxLWwaZh3DpOr Qudn1OKolRYyIlFshOG |           4 ...
(1 row)

After running Execute250Deletes:

webaction=# select * from customer where c_custkey=1;
 c_custkey | c_name | c_address | c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment 
-----------+--------+-----------+-------------+---------+-----------+--------------+-----------
(0 rows)

webaction=# select * from customertarget where c_custkey=1;
 c_custkey | c_name | c_address | c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment 
-----------+--------+-----------+-------------+---------+-----------+--------------+-----------
(0 rows)

Viewing Kafka target data

To see the output of PostgresToKafkaCDC, use Kafka Tool or a similar viewer. The Kafka cluster name is the same as your Striim cluster name. The Kafka version is 0.11.

Viewing file target data

The output of PostgresToFileCDC is in striim/SampleOutput.

Running the applications again at a later time

  1. In a Docker Quickstart, OS X, or Linux terminal, enter:

    docker start striimpostgres
  2. If Striim's internal Kafka instance is not running, start it (see Configuring Kafka).

  3. Deploy and start the ValidatePostgres, ValidateKafka, and ValidateFile applications and leave them running.

  4. Deploy and start the ResetPostgresSample, ResetKafkaSample, and ResetFileSample apps, then when they have completed undeploy them.

Troubleshooting the CDC sample apps

If the apps don't work, or stop working, try running the initialization script or batch file as described in Installing the CDC sample apps in an existing Striim server. If that does not resolve the problem, Contact Striim support.