#### Using analytics and regression functions

The `com.webaction.analytics` classes provide several regression functions that let you make predictions based on the regression model you choose. To get started, add the following `IMPORT` statements to your TQL:

```
IMPORT STATIC com.webaction.analytics.regressions.LeastSquares.*;
IMPORT STATIC com.webaction.analytics.regressions.LeastSquaresPredictor.*;
IMPORT STATIC com.webaction.analytics.regressions.PredictionResult.*;
IMPORT STATIC com.webaction.analytics.regressions.Helpers.*;
```

These imports provide the regression functions listed below, along with supporting helper methods. Use these aggregate functions in a CQ that selects from a window; the window's contents serve as the training set for the regression algorithm.

| Regression function | Description |
|---|---|
| `SIMPLE_LINEAR_REGRESSION` | Performs simple linear regression using a single independent variable and a single dependent variable. |
| `CONSTRAINED_SIMPLE_LINEAR_REGRESSION` | Performs simple linear regression, constrained so that the fitted line passes through the "rightmost" (i.e., newest) point in the window. |
| `MULTIPLE_LINEAR_REGRESSION` | Performs multiple linear regression using multiple independent variables and a single dependent variable. |
| `CONSTRAINED_MULTIPLE_LINEAR_REGRESSION` | Performs multiple linear regression, constrained so that the fitted hyperplane passes through the desired number of "rightmost" (i.e., newest) points in the window. |
| `POLYNOMIAL_REGRESSION` | Performs polynomial regression, creating a nonlinear model using a single independent variable and a single dependent variable. |
| `CONSTRAINED_POLYNOMIAL_REGRESSION` | Performs polynomial regression, constrained so that the fitted polynomial passes through the desired number of "rightmost" (i.e., newest) points in the window. |
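To make the difference between an unconstrained fit and a `CONSTRAINED_` fit concrete, here is a minimal Python sketch of the least-squares math in the single-variable case. It is an illustration under our own assumptions, not Striim's implementation, and the function names are hypothetical. The constrained variant forces the line through the newest point by measuring every point relative to that point and fitting only a slope.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Unconstrained least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def fit_line_through_newest(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least squares fit constrained to pass through the last (newest) point."""
    x0, y0 = xs[-1], ys[-1]
    # Measure every point relative to the anchor, then fit y - y0 = slope * (x - x0).
    dxs = [x - x0 for x in xs]
    dys = [y - y0 for y in ys]
    sxx = sum(dx * dx for dx in dxs)
    if sxx == 0:
        raise ValueError("need at least one x value different from the newest one")
    slope = sum(dx * dy for dx, dy in zip(dxs, dys)) / sxx
    return y0 - slope * x0, slope


# Hypothetical readings: time in hours and inventory level.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [10.0, 9.0, 5.0, 5.0]
free_fit = fit_line(xs, ys)                     # best overall fit
anchored_fit = fit_line_through_newest(xs, ys)  # exact at the newest point
```

The unconstrained line (intercept about 10.1, slope about -1.9) predicts 4.4 at hour 3 and so misses the newest reading of 5.0, whereas the anchored line reproduces that reading exactly. That is the trade-off the constrained variants make in favor of the most recent data.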

The InventoryPredictor example below shows how to use these regression functions in a prediction algorithm that sends alerts when inventory is predicted to run low. The general approach to using regression for time-series prediction is as follows:

1. Create a window.
2. Specify the regression model.
3. Specify the variable to be predicted (the dependent variable) and the independent variable (in the following example, the timestamp).
4. Determine whether there are enough points to make a valid prediction.

The syntax is:

```
SELECT [ISTREAM]
  [CONSTRAINED_] {SIMPLE_LINEAR_REGRESSION | MULTIPLE_LINEAR_REGRESSION | POLYNOMIAL_REGRESSION} (properties) AS pred,
  CASE IS_PREDICTOR_READY(pred)
    WHEN TRUE THEN PREDICT[_MULTIPLE](properties)
    ELSE ZERO_PREDICTION_RESULT([properties])
  END AS result,
  output1,
  output2,
  [COMPUTE_BOUNDS(properties),]
  [{GREATER|LESS}_THAN_PROBABILITY()],
  ...
```

In this first section of the TQL, we import the required `com.webaction.analytics` classes, create the application (`InventoryPredictor`), define the event type (`InventoryType`), create the inventory stream (`InventoryStream`) and the JSON file source that feeds it, and finally create a two-hour window over that stream, partitioned by SKU (`InventoryChangesWindow`):

```
IMPORT STATIC com.webaction.analytics.regressions.LeastSquares.*;
IMPORT STATIC com.webaction.analytics.regressions.LeastSquaresPredictor.*;
IMPORT STATIC com.webaction.analytics.regressions.PredictionResult.*;
IMPORT STATIC com.webaction.analytics.regressions.Helpers.*;

CREATE APPLICATION InventoryPredictor;

CREATE TYPE InventoryType (
  ts org.joda.time.DateTime,
  SKU java.lang.String,
  inventory java.lang.Double,
  location_id java.lang.Integer
);

CREATE STREAM InventoryStream OF InventoryType;

CREATE SOURCE JSONSource USING FileReader (
  directory: 'Samples',
  WildCard: 'inventory.json',
  positionByEOF: false
)
PARSE USING JSONParser (
  eventType: 'InventoryType'
)
OUTPUT TO InventoryStream;

CREATE WINDOW InventoryChangesWindow
OVER InventoryStream
KEEP WITHIN 2 HOUR
PARTITION BY SKU;
```
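Conceptually, `KEEP WITHIN 2 HOUR PARTITION BY SKU` maintains a separate two-hour history for each SKU, so each SKU gets its own regression. The Python sketch below models that idea. The class name is hypothetical, and the sketch evicts based on the events' own timestamps and assumes they arrive in order, which may not match how Striim's time-based windows actually track time.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, List, Tuple

Point = Tuple[datetime, float]


class PartitionedTimeWindow:
    """Per-key sliding window that keeps only events within `span` of the newest one."""

    def __init__(self, span: timedelta) -> None:
        self.span = span
        self.partitions: Dict[str, Deque[Point]] = defaultdict(deque)

    def add(self, sku: str, ts: datetime, inventory: float) -> None:
        part = self.partitions[sku]
        part.append((ts, inventory))
        # Drop events more than `span` older than this one (in-order arrival assumed).
        while part[0][0] < ts - self.span:
            part.popleft()

    def points(self, sku: str) -> List[Point]:
        return list(self.partitions.get(sku, ()))


window = PartitionedTimeWindow(timedelta(hours=2))
start = datetime(2024, 1, 1, 8, 0)
window.add("SKU-1", start, 40.0)
window.add("SKU-1", start + timedelta(hours=1), 35.0)
window.add("SKU-2", start + timedelta(hours=1, minutes=30), 90.0)
window.add("SKU-1", start + timedelta(hours=2, minutes=30), 28.0)
```

When the 10:30 reading for `SKU-1` arrives, the 08:00 reading falls outside the two-hour span and is evicted, while `SKU-2` keeps its own independent history.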

In this example, we use simple linear regression to build a model that predicts inventory as a function of the current system time. We begin by setting up an `AlertStream` for alerts based on the inventory predictions:

```
CREATE STREAM AlertStream OF Global.AlertEvent;

CREATE CQ AlertOnInventoryPredictions
INSERT INTO AlertStream
. . .
```

The `SELECT` statement that follows builds the alert messages from two Boolean variables: `isInventoryLow`, which indicates that inventory is expected to run low, and `isInventoryOK`, which indicates that it is expected to stay sufficient. Each is set only when the prediction model assigns that outcome a probability above 90%:

```
SELECT "Low Inventory Alert",
  SKU,
  CASE
    WHEN isInventoryLow THEN "warning"
    WHEN isInventoryOK THEN "info"
  END,
  CASE
    WHEN isInventoryLow THEN "raise"
    WHEN isInventoryOK THEN "cancel"
  END,
  CASE
    WHEN isInventoryLow THEN "Inventory for SKU " + SKU + " is expected to run low within 2 hours (90% confidence)."
    WHEN isInventoryOK THEN "Inventory status for SKU " + SKU + " looks OK for the next 2 hours (90% confidence)."
  END
...
```
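The three `CASE` expressions translate the two flags into a severity, an action, and a message. The Python function below mirrors that mapping for reference. It is an illustrative sketch with a hypothetical name; it returns the values in the same order as the `SELECT` list and returns `None` when neither flag is set, which in the CQ corresponds to rows that the `HAVING` clause filters out.

```python
from typing import Optional, Tuple

AlertFields = Tuple[str, str, str, str, str]


def build_alert(sku: str, is_inventory_low: bool, is_inventory_ok: bool) -> Optional[AlertFields]:
    """Return (name, SKU, severity, action, message) in the order of the SELECT list."""
    if is_inventory_low:  # checked first, as in the CASE expressions
        return ("Low Inventory Alert", sku, "warning", "raise",
                "Inventory for SKU " + sku + " is expected to run low within 2 hours (90% confidence).")
    if is_inventory_ok:
        return ("Low Inventory Alert", sku, "info", "cancel",
                "Inventory status for SKU " + sku + " looks OK for the next 2 hours (90% confidence).")
    return None  # neither flag set: no alert row
```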

The next part of the `SELECT` statement sets up the simple linear regression model (`pred`) based on the inventory variable (`inventory`) and a timestamp variable (`ts`):

```
FROM (SELECT ISTREAM SKU,
  DNOW() AS ts,
  SIMPLE_LINEAR_REGRESSION(inventory, ts) AS pred,
  ...
```

Next, we determine whether there are enough points to make a valid prediction by calling the `IS_PREDICTOR_READY` function. If there are, we predict the inventory two hours ahead (`DADD(ts, DHOURS(2))`) and compute the probability that it will fall below (or exceed) a critical threshold, here 5, by calling the `LESS_THAN_PROBABILITY` (or `GREATER_THAN_PROBABILITY`) function. If the computed probability is higher than 90%, we set the corresponding flag (`isInventoryLow` or `isInventoryOK`).
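The probability test can be sketched as follows, assuming the prediction is summarized by a predicted value and a standard error and that the prediction error is normally distributed. That distributional assumption and the function names are ours; the actual `LESS_THAN_PROBABILITY` and `GREATER_THAN_PROBABILITY` implementations may use a different distribution (for example, Student's t). The sketch uses the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def less_than_probability(predicted: float, std_error: float, threshold: float) -> float:
    """P(actual < threshold), assuming actual ~ Normal(predicted, std_error)."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    return NormalDist(mu=predicted, sigma=std_error).cdf(threshold)


def greater_than_probability(predicted: float, std_error: float, threshold: float) -> float:
    """P(actual > threshold) under the same normal assumption."""
    return 1.0 - less_than_probability(predicted, std_error, threshold)


# Hypothetical prediction two hours ahead: about 3 units, with a standard error of 1.
p_low = less_than_probability(predicted=3.0, std_error=1.0, threshold=5.0)
is_inventory_low = p_low > 0.9
is_inventory_ok = greater_than_probability(3.0, 1.0, 5.0) > 0.9
```

Here the predicted 3 units sit two standard errors below the threshold of 5, so `p_low` is about 0.977 and only `is_inventory_low` is set.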

```
IS_PREDICTOR_READY(pred)
  AND LESS_THAN_PROBABILITY(PREDICT(pred, DADD(ts, DHOURS(2))), 5) > 0.9 AS isInventoryLow,
IS_PREDICTOR_READY(pred)
  AND GREATER_THAN_PROBABILITY(PREDICT(pred, DADD(ts, DHOURS(2))), 5) > 0.9 AS isInventoryOK,
...
```

Note that the `PREDICT` function evaluates the model at the given value of the independent variable (here, two hours after `ts`) and returns a `PredictionResult` object, which is then passed to the probability functions.

Here is the complete `CREATE CQ` statement:

```
CREATE CQ AlertOnInventoryPredictions
INSERT INTO AlertStream
SELECT "Low Inventory Alert",
  SKU,
  CASE
    WHEN isInventoryLow THEN "warning"
    WHEN isInventoryOK THEN "info"
  END,
  CASE
    WHEN isInventoryLow THEN "raise"
    WHEN isInventoryOK THEN "cancel"
  END,
  CASE
    WHEN isInventoryLow THEN "Inventory for SKU " + SKU + " is expected to run low within 2 hours (90% confidence)."
    WHEN isInventoryOK THEN "Inventory status for SKU " + SKU + " looks OK for the next 2 hours (90% confidence)."
  END
FROM (
  SELECT ISTREAM SKU,
    DNOW() AS ts,
    SIMPLE_LINEAR_REGRESSION(inventory, ts) AS pred,
    IS_PREDICTOR_READY(pred)
      AND LESS_THAN_PROBABILITY(PREDICT(pred, DADD(ts, DHOURS(2))), 5) > 0.9 AS isInventoryLow,
    IS_PREDICTOR_READY(pred)
      AND GREATER_THAN_PROBABILITY(PREDICT(pred, DADD(ts, DHOURS(2))), 5) > 0.9 AS isInventoryOK
  FROM InventoryChangesWindow
  GROUP BY SKU
  HAVING isInventoryLow OR isInventoryOK
) AS subQuery;
```

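Having created the prediction model, we can send email alerts from `AlertStream`: an alert is raised when there is more than a 90% probability that inventory will run low, and cancelled when there is more than a 90% probability that it will stay above the threshold: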

```
CREATE SUBSCRIPTION InventoryEmailAlert USING EmailAdapter (
  smtp_auth: false,
  smtpurl: "smtp.company.com",
  subject: "Low Inventory Alert",
  emailList: "sysadmin@company.com",
  senderEmail: "striim@company.com"
)
INPUT FROM AlertStream;

END APPLICATION InventoryPredictor;
```