Striim 3.9.4 / 3.9.5 documentation

Using analytics and regression functions

The com.webaction.analytics classes provide several regression functions that let you make predictions based on the regression model you choose. To get started, add the following IMPORT statements to your TQL:

IMPORT STATIC com.webaction.analytics.regressions.LeastSquares.*;
IMPORT STATIC com.webaction.analytics.regressions.LeastSquaresPredictor.*;
IMPORT STATIC com.webaction.analytics.regressions.PredictionResult.*;
IMPORT STATIC com.webaction.analytics.regressions.Helpers.*;

These imports provide the following regression functions, along with supporting helper methods. The regression functions are aggregate functions: use them in a CQ that selects from a window, whose contents serve as the training set for the regression algorithm.

| Regression function | Description |
| --- | --- |
| SIMPLE_LINEAR_REGRESSION(Object yArg, Object xArg) | Performs simple linear regression using a single independent variable and a single dependent variable. |
| CONSTRAINED_SIMPLE_LINEAR_REGRESSION(Object yArg, Object xArg) | Performs simple linear regression, constrained so that the fitted line passes through the "rightmost" (i.e. newest) point in the window. |
| MULTIPLE_LINEAR_REGRESSION(Object yArg, Object... xArgs) | Performs multiple linear regression using multiple independent variables and a single dependent variable. |
| CONSTRAINED_MULTIPLE_LINEAR_REGRESSION(int numFixedPoints, Object yArg, Object... xArgs) | Performs multiple linear regression, constrained so that the fitted hyperplane passes through the desired number of "rightmost" (i.e. newest) points in the window. |
| POLYNOMIAL_REGRESSION(int degree, Object yArg, Object xArg) | Performs polynomial regression, creating a nonlinear model using a single independent variable and a single dependent variable. |
| CONSTRAINED_POLYNOMIAL_REGRESSION(int numFixedPoints, int degree, Object yArg, Object xArg) | Performs polynomial regression, constrained so that the fitted polynomial passes through the desired number of "rightmost" (i.e. newest) points in the window. |
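To make the difference between the plain and constrained variants concrete, here is a minimal Python sketch of the underlying math. It is an illustration under stated assumptions, not Striim's implementation: the Python function names are hypothetical stand-ins, the unconstrained fit is textbook ordinary least squares, and the constrained fit is assumed to minimize squared error among lines that pass through the newest point (taken here to be the last element of the input lists).

```python
def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def constrained_simple_linear_regression(xs, ys):
    """Least squares line forced through the newest point (last element).

    Assumption for illustration: shift the origin to the newest point and
    fit a line with no intercept in the shifted coordinates.
    """
    if len(xs) < 2:
        raise ValueError("need at least two points")
    x0, y0 = xs[-1], ys[-1]
    den = sum((x - x0) ** 2 for x in xs)
    if den == 0:
        raise ValueError("x values must not all be equal")
    num = sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
    slope = num / den
    return y0 - slope * x0, slope
```

For the points (0, 0), (1, 1), (2, 4), the unconstrained fit has slope 2, while the constrained fit has slope 2.2 and passes exactly through the newest point (2, 4). Anchoring the line to the newest point can be useful in time-series prediction because the extrapolation then starts from the most recent observed value.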

The InventoryPredictor example shows how to use these regression functions in a prediction application that sends alerts when the inventory is predicted to run low. The general approach to using regression in time-series predictions is as follows:

  1. Create a window.

  2. Specify the regression model.

  3. Specify the dependent variable (the value to predict) and the independent variable (in the following example, the timestamp).

  4. Determine whether there are enough points to make a valid prediction.
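The steps above can be sketched in Python as a "fit, check readiness, then predict" pattern. This is only an illustration: fit_line and predict_or_zero are hypothetical names, the readiness rule (at least two points with distinct x values) is an assumption, and the zero fallback merely mirrors the role that ZERO_PREDICTION_RESULT plays in the TQL syntax.

```python
def fit_line(points):
    """Steps 2 and 3: least squares line through (x, y) pairs.

    Returns (intercept, slope), or None when the fit is undetermined
    (fewer than two points, or all x values identical).
    """
    n = len(points)
    if n < 2:
        return None
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def predict_or_zero(points, x_future):
    """Step 4: predict only when the model is ready, else return 0.0."""
    model = fit_line(points)  # points play the role of the window (step 1)
    if model is None:
        return 0.0            # not enough data for a valid prediction
    intercept, slope = model
    return intercept + slope * x_future
```

For example, predict_or_zero([(0, 10), (1, 8), (2, 6)], 4) extrapolates the falling inventory to 2.0, while a window holding a single point falls back to 0.0.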

The syntax is:

SELECT [ISTREAM] [CONSTRAINED_]
  {SIMPLE_LINEAR_REGRESSION | MULTIPLE_LINEAR_REGRESSION | POLYNOMIAL_REGRESSION}
    (properties)
    AS pred,
      CASE IS_PREDICTOR_READY(pred)
        WHEN TRUE THEN PREDICT[_MULTIPLE](properties)
        ELSE ZERO_PREDICTION_RESULT([properties])
    END AS result,
    output1,
    output2,
    [COMPUTE_BOUNDS(properties),]
    [{GREATER|LESS}_THAN_PROBABILITY()], ...

In this first section of the TQL, we import the required com.webaction.analytics classes, create the application (InventoryPredictor), define the event type (InventoryType) and the inventory stream (InventoryStream), read events into that stream from a JSON file (JSONSource), and finally create a two-hour window over the stream, partitioned by SKU (InventoryChangesWindow):

IMPORT STATIC com.webaction.analytics.regressions.LeastSquares.*;
IMPORT STATIC com.webaction.analytics.regressions.LeastSquaresPredictor.*;
IMPORT STATIC com.webaction.analytics.regressions.PredictionResult.*;
IMPORT STATIC com.webaction.analytics.regressions.Helpers.*;

CREATE APPLICATION InventoryPredictor;

CREATE TYPE InventoryType(
  ts org.joda.time.DateTime,
  SKU java.lang.String,
  inventory java.lang.Double,
  location_id java.lang.Integer
);
CREATE STREAM InventoryStream OF InventoryType;
 
CREATE SOURCE JSONSource USING FileReader (
  directory: 'Samples',
  WildCard: 'inventory.json',
  positionByEOF: false
)
PARSE USING JSONParser (
  eventType: 'InventoryType'
)
OUTPUT TO InventoryStream;

CREATE WINDOW InventoryChangesWindow OVER InventoryStream KEEP WITHIN 2 HOUR PARTITION BY SKU;
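As a rough mental model of what a window declared with KEEP WITHIN 2 HOUR PARTITION BY SKU holds, the following Python sketch keeps a separate time-bounded buffer per key. It is an assumption-laden illustration, not a description of Striim internals: PartitionedTimeWindow is a hypothetical class, timestamps are plain seconds, and eviction here is measured against the newest event in the same partition.

```python
from collections import defaultdict, deque


class PartitionedTimeWindow:
    """Keeps, per key, only the events from the last keep_seconds."""

    def __init__(self, keep_seconds):
        self.keep_seconds = keep_seconds
        self.partitions = defaultdict(deque)  # key -> deque of (ts, value)

    def add(self, key, ts, value):
        """Add an event and return the current contents for its key."""
        part = self.partitions[key]
        part.append((ts, value))
        # Evict events older than keep_seconds relative to the newest event.
        while part and part[0][0] < ts - self.keep_seconds:
            part.popleft()
        return list(part)
```

Because each SKU has its own buffer, a regression computed over one partition is trained only on that SKU's recent inventory readings.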

This example uses simple linear regression to model inventory as a function of the current system time. We begin by setting up an AlertStream that will carry alerts based on the inventory predictions:

CREATE STREAM AlertStream OF Global.AlertEvent;

CREATE CQ AlertOnInventoryPredictions
INSERT INTO AlertStream
. . .

The SELECT statement that follows builds the inventory alert messages from two Boolean flags that indicate whether the inventory is predicted to be low (isInventoryLow) or OK (isInventoryOK), each requiring more than 90% confidence from the prediction model:

SELECT 
  "Low Inventory Alert",
  SKU,
  CASE WHEN isInventoryLow THEN "warning" WHEN isInventoryOK THEN "info" END,
  CASE WHEN isInventoryLow THEN "raise" WHEN isInventoryOK THEN "cancel" END,
  CASE WHEN isInventoryLow THEN "Inventory for SKU " + SKU + 
      " is expected to run low within 2 hours (90% confidence)."
    WHEN isInventoryOK THEN "Inventory status for SKU " + SKU +
      " looks OK for the next 2 hours (90% confidence)."
    END ...
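The three CASE expressions above map the two flags to a severity, an alert action, and a message. A minimal Python sketch of the same mapping follows; alert_fields is a hypothetical helper, and returning None when neither flag is set stands in for rows that the HAVING clause of the full query filters out.

```python
def alert_fields(sku, is_inventory_low, is_inventory_ok):
    """Return (name, key, severity, flag, message), or None if no alert."""
    name = "Low Inventory Alert"
    # As in the CASE expressions, the "low" branch is checked first.
    if is_inventory_low:
        message = ("Inventory for SKU " + sku +
                   " is expected to run low within 2 hours (90% confidence).")
        return (name, sku, "warning", "raise", message)
    if is_inventory_ok:
        message = ("Inventory status for SKU " + sku +
                   " looks OK for the next 2 hours (90% confidence).")
        return (name, sku, "info", "cancel", message)
    return None
```

Pairing "warning" with "raise" and "info" with "cancel" means that a later OK prediction can clear an alert that an earlier low prediction raised.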

The next part of the SELECT statement sets up the simple linear regression model (pred) based on the inventory variable (inventory) and a timestamp variable (ts):

FROM (SELECT ISTREAM
  SKU,
  DNOW() AS ts,
  SIMPLE_LINEAR_REGRESSION(inventory, ts) AS pred, ...

Next we determine whether there are enough points to make a valid prediction by calling the IS_PREDICTOR_READY function. If there are, we call the LESS_THAN_PROBABILITY (or GREATER_THAN_PROBABILITY) function to compute the probability that the inventory predicted for two hours from now is below (or above) a critical threshold, which is 5 in this example. If the computed probability is higher than 90%, we raise the corresponding flag (isInventoryLow or isInventoryOK).

IS_PREDICTOR_READY(pred) AND LESS_THAN_PROBABILITY(PREDICT(pred,
  DADD(ts, DHOURS(2))), 5) > 0.9 AS isInventoryLow,
IS_PREDICTOR_READY(pred) AND GREATER_THAN_PROBABILITY(PREDICT(pred, 
  DADD(ts, DHOURS(2))), 5) > 0.9 AS isInventoryOK, ...

Note that the PREDICT function evaluates the prediction model (pred) at the given value of the independent variable, here the timestamp two hours ahead, and returns a PredictionResult object that is then passed to the probability utility functions.
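To illustrate how a predicted value and its uncertainty can be turned into a probability, here is a hedged Python sketch. It is not Striim's implementation: the function names are hypothetical stand-ins, the uncertainty is the textbook prediction standard error for simple linear regression, and the probability uses a normal approximation (the distribution Striim actually uses is not stated in this document).

```python
from math import sqrt
from statistics import NormalDist


def predict_with_error(xs, ys, x0):
    """Return (predicted mean, prediction standard error) at x0."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points to estimate the error")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    se = sqrt(s2 * (1 + 1 / n + (x0 - x_mean) ** 2 / sxx))
    return intercept + slope * x0, se


def less_than_probability(mean, se, threshold):
    """P(value < threshold) under a normal approximation (assumption)."""
    if se == 0:
        return 1.0 if mean < threshold else 0.0
    return NormalDist(mean, se).cdf(threshold)


def greater_than_probability(mean, se, threshold):
    """P(value > threshold) under the same approximation."""
    return 1.0 - less_than_probability(mean, se, threshold)
```

With this model, a flag such as isInventoryLow corresponds to less_than_probability(mean, se, 5) > 0.9. The two probabilities sum to 1, so when the prediction is too uncertain neither exceeds 0.9 and no alert is produced.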

Here is the entire CREATE CQ statement:

CREATE CQ AlertOnInventoryPredictions
INSERT INTO AlertStream
SELECT "Low Inventory Alert",
  SKU,
  CASE WHEN isInventoryLow THEN "warning" WHEN isInventoryOK THEN "info" END,
  CASE WHEN isInventoryLow THEN "raise" WHEN isInventoryOK THEN "cancel" END,
  CASE
  WHEN isInventoryLow THEN "Inventory for SKU " + SKU + 
    " is expected to run low within 2 hours (90% confidence)."
  WHEN isInventoryOK THEN "Inventory status for SKU " + SKU + 
    " looks OK for the next 2 hours (90% confidence)."
  END 
FROM
  (SELECT ISTREAM
    SKU,
    DNOW() AS ts,
    SIMPLE_LINEAR_REGRESSION(inventory, ts) AS pred,
    IS_PREDICTOR_READY(pred) AND LESS_THAN_PROBABILITY(PREDICT(pred,
      DADD(ts, DHOURS(2))), 5) > 0.9 AS isInventoryLow,
    IS_PREDICTOR_READY(pred) AND GREATER_THAN_PROBABILITY(PREDICT(pred,
      DADD(ts, DHOURS(2))), 5) > 0.9 AS isInventoryOK
  FROM InventoryChangesWindow
  GROUP BY SKU 
  HAVING isInventoryLow OR isInventoryOK)
  AS subQuery;

With the prediction CQ in place, we can subscribe to AlertStream to send an email alert whenever there is more than 90% confidence that the inventory will run low:

CREATE SUBSCRIPTION InventoryEmailAlert USING EmailAdapter (
  smtp_auth: false,
  smtpurl: "smtp.company.com",
  subject: "Low Inventory Alert",
  emailList: "sysadmin@company.com",
  senderEmail: "striim@company.com"
)
INPUT FROM AlertStream;
END APPLICATION InventoryPredictor;