Striim 3.9.4 / 3.9.5 documentation

ADLSGen2Writer

Writes to files in an Azure Data Lake Storage Gen2 file system. A common use case is to write data from on-premises sources to an ADLS staging area from which it can be consumed by Azure-based analytics tools.

When you create the Gen2 storage account, set Storage account kind to StorageV2 and enable Hierarchical namespace.

| property | type | default value | notes |
|---|---|---|---|
| Account Name | java.lang.String | | The storage account name. |
| Compression Type | java.lang.String | | Set to gzip when the input is in gzip format. Otherwise, leave blank. |
| Directory | java.lang.String | | The full path to the directory in which to write the files. See Setting output names and rollover / upload policies for advanced options. |
| File Name | java.lang.String | | The base name of the files to be written. See Setting output names and rollover / upload policies. |
| File System Name | java.lang.String | | The ADLS Gen2 file system where the files will be written. |
| Rollover on DDL | java.lang.Boolean | True | Has effect only when the input stream is the output stream of a MySQLReader or OracleReader source. With the default value of True, rolls over to a new file when a DDL event is received. Set to False to keep writing to the same file. |
| SAS Token | com.webaction.security.Password | | The SAS token for a shared access signature for the storage account. Allowed services must include Blob, allowed resource types must include Object, and allowed permissions must include Write and Create. Remove the ? from the beginning of the SAS token. Note that SAS tokens have an expiration date; see Best practices when using SAS. |
| Upload Policy | java.lang.String | eventcount:10000, interval:5m | See Setting output names and rollover / upload policies. Keep these settings low enough that individual uploads do not exceed the underlying Microsoft REST API's limit of 100 MB for a single operation. When the application is stopped, any remaining data in the upload buffer is discarded. |

For best performance, Microsoft recommends uploads between 4 and 16 MB. Setting UploadPolicy to filesize:16M accomplishes that. However, if there is a long gap between events, some events will not be written to ADLS for some time. For example, if Striim receives events only during working hours, the last events received at the end of the day on Friday would not be written until Monday morning.
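Based on that sizing guidance, the default interval-based upload policy can be replaced with a size-based trigger. The following sketch reuses the property names from the sample application at the end of this section; the account, file system, and stream names are placeholders:

```
CREATE TARGET ADLSGen2SizedTarget USING ADLSGen2Writer (
  accountname:'mystorageaccount',
  sastoken:'<SAS token without the leading ?>',
  filesystemname:'myfilesystem',
  directory:'mydir',
  filename:'myfile.json',
  uploadpolicy: 'filesize:16M'
)
FORMAT USING JSONFormatter ()
INPUT FROM PosSource_TransformedStream;
```

With filesize:16M, each upload stays within Microsoft's recommended 4-16 MB range; if your source can go quiet for long stretches, consider combining a size trigger with an interval trigger so buffered events are not held indefinitely.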

This adapter has a choice of formatters. See Supported writer-formatter combinations for more information.
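For instance, to write delimited output instead of JSON, the same target could be declared with DSVFormatter, assuming that writer-formatter combination is supported in your Striim version (check the table referenced above); the names below are placeholders:

```
CREATE TARGET ADLSGen2CsvTarget USING ADLSGen2Writer (
  accountname:'mystorageaccount',
  sastoken:'<SAS token without the leading ?>',
  filesystemname:'myfilesystem',
  directory:'mydir',
  filename:'myfile.csv'
)
FORMAT USING DSVFormatter ()
INPUT FROM PosSource_TransformedStream;
```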

Sample application:

CREATE APPLICATION ADLSGen2Test;

CREATE SOURCE PosSource USING FileReader (
  wildcard: 'PosDataPreview.csv',
  directory: 'Samples/PosApp/appData',
  positionByEOF:false )
PARSE USING DSVParser (
  header:Yes,
  trimquote:false )
OUTPUT TO PosSource_Stream;

CREATE CQ PosSource_Stream_CQ
INSERT INTO PosSource_TransformedStream
SELECT TO_STRING(data[1]) AS MerchantId,
  TO_DATE(data[4]) AS DateTime,
  TO_DOUBLE(data[7]) AS AuthAmount,
  TO_STRING(data[9]) AS Zip
FROM PosSource_Stream;

CREATE TARGET ADLSGen2Target USING ADLSGen2Writer (
  accountname:'mystorageaccount',
  sastoken:'********************************************',
  filesystemname:'myfilesystem',
  directory:'mydir',
  filename:'myfile.json',
  uploadpolicy: 'interval:15s'
)
FORMAT USING JSONFormatter ()
INPUT FROM PosSource_TransformedStream;

END APPLICATION ADLSGen2Test;