Thursday, January 10, 2019

Data Analytics

Data Analytics - The data analytics application is a state-of-the-art MultiVariate Statistical Analysis (MVSA) and neural network tool for building statistical models expressly designed to assist in abnormal situations. The tool enables the interactive creation of statistical and neural models that give early warning of situations that appear abnormal in a statistical sense. These models can be used standalone or in conjunction with any appropriate DA components. Data analytics supports data analysis and model synthesis for multivariable processes and has been designed to operate in an intuitive, interactive fashion.


Models
Four model types are defined for Data analytics applications. They are:

  • PCA Models   – Principal Component Analysis (PCA) models are the statistical engine of the EED application.
  • Fuzzy Models  – Fuzzy Logic models are used as post processors to normalize, shape and tune the results of the PCA models.
  • DA Models     – This model type is really only a shell and is used to wrap the PCA and Fuzzy models.
  • Neural Models – Neural network models are generic models that use a Stochastic Gradient Descent (SGD) solver. The models used here incorporate an activation function that is similar to the ReLU function but has a continuous derivative and no zero outputs; it is sometimes referred to as a rectifier. Capability also exists for an SGD solution using a Long Short-Term Memory (LSTM) formulation.

DA Model performance is given in terms of Cumulative/Incremental Variance, Eigenvector, Q and Q limit, T2 and T2 limit, Score Plots (confidence ellipse), Worst Actors, Key Contributors, Prediction errors and Event prediction. Non-stationary noisy data is handled by special filters.
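As a rough illustration of the T2 and Q statistics listed above, the following sketch fits a PCA model with NumPy and evaluates both statistics on new data. It is a minimal sketch, not the application's implementation: the function names `fit_pca` and `monitor` are hypothetical, the data is assumed to be auto-scaled, and the limits are left to the reader (an empirical percentile of the training values is one simple assumption).

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a PCA model on training data X (rows are samples)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                      # auto-scale each variable
    # SVD of the scaled data; rows of Vt are the eigenvectors (loadings).
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)         # variance along each component
    return {
        "mean": mean,
        "std": std,
        "P": Vt[:n_components].T,             # retained loadings
        "eigvals": eigvals[:n_components],
        "cumvar": np.cumsum(eigvals) / eigvals.sum(),  # cumulative variance
    }

def monitor(model, X):
    """Return scores, T2 and Q (squared prediction error) for each row of X."""
    Z = (X - model["mean"]) / model["std"]
    T = Z @ model["P"]                        # scores in the PCA sub-space
    t2 = np.sum(T**2 / model["eigvals"], axis=1)
    residual = Z - T @ model["P"].T           # part not explained by the model
    q = np.sum(residual**2, axis=1)
    return T, t2, q
```

A sample that breaks the correlation structure seen in training shows up as a large Q value, while a sample that follows the structure but lies far from the training mean shows up as a large T2 value.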



In the preceding screen capture, the evolution of the scores as a function of time is plotted against the first two principal components. The confidence ellipse represents an area of acceptable behavior. Green points represent data from the training set, while blue points represent fresh data never seen by the model. In addition, scores may also be displayed in higher dimension spaces (hypercube) as shown below.



Predicting Events

In the event or indicator plot shown below, the X-axis represents the data index and the Y-axis represents the indicator value. The brown trend is the indicator value calculated from the PCA model over the data ranges selected for the fuzzy model. User-entered events are shown as green boxes and predicted events as red boxes. The fuzzy lower limit, fuzzy upper limit and final fuzzy limit are shown as purple lines. The blue line is the QLimit if the indicator type is QResidual, or the T2Limit if the indicator type is T2Residual. The light blue trend is the fuzzy output obtained by applying the fuzzy membership function to the indicator values. The fuzzy output values lie between 0 and 1, and the fuzzy output trend has been normalized according to the original indicator values.
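The mapping from indicator values to fuzzy outputs can be sketched as follows. The shape of the membership function is not stated here, so a simple linear ramp between the fuzzy lower and upper limits is assumed; the names `ramp_membership` and `predict_events` and the `threshold` parameter are hypothetical and serve only as an illustration.

```python
def ramp_membership(value, lower, upper):
    """Map an indicator value to [0, 1] with a linear ramp (assumed shape)."""
    if upper <= lower:
        raise ValueError("upper limit must exceed lower limit")
    if value <= lower:
        return 0.0
    if value >= upper:
        return 1.0
    return (value - lower) / (upper - lower)

def predict_events(indicator, lower, upper, threshold=0.5):
    """Return fuzzy outputs and (start, end) index ranges of predicted events.

    An event is assumed to be any run of consecutive samples whose fuzzy
    output is at or above the threshold; both indices are inclusive.
    """
    outputs = [ramp_membership(v, lower, upper) for v in indicator]
    events = []
    start = None
    for i, out in enumerate(outputs):
        if out >= threshold and start is None:
            start = i                          # event begins
        elif out < threshold and start is not None:
            events.append((start, i - 1))      # event ended on previous sample
            start = None
    if start is not None:                      # event still open at end of data
        events.append((start, len(outputs) - 1))
    return outputs, events
```

For example, `predict_events([0.5, 1.0, 2.0, 3.0, 4.0, 1.0, 0.2, 3.5], lower=1.0, upper=3.0)` flags two events, one covering indexes 2 through 4 and one at index 7.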




Neural Network Models - These models can be used either alone or in conjunction with other models. An important feature of these models is the activation function. The activation function and its derivative are shown in the following figure.



The functions are completely defined in terms of the activation length (in this case 15) and the activation ratio (in this case 0.2). The neural net is defined in terms of six parameters: network dimensions, learning rate, number of epochs, number of mini-batches, activation ratio and activation length. Application to the well-known MNIST benchmark data is shown below.

MNIST Benchmark data

The setup of the neural network model for the MNIST data set is shown below. The data is composed of 60,000 elements. Of these elements, 50,000 are used for training and 10,000 for test validation. Each element is a matrix of 28x28 pixels representing one of the numbers 0 through 9.


As shown, the network is composed of an input layer of 784 neurons (to accommodate the 28x28 pixels of each number), an output layer of 10 neurons to represent the numbers 0 through 9, and two arbitrary hidden layers, both of which use 150 neurons. The activation parameters correspond to the rectifier shown above. The number of mini-batches is 10 and the learning rate is 3.0. With this very simple setup it is easy to obtain accuracies greater than 98% for this data set. The solution at each epoch is shown in the following illustration. The first value is the number of correct predictions, while the second is the total number of validation elements. The highest accuracy is stored as the final value.
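A network of this shape trained by mini-batch SGD can be sketched in a few lines of NumPy. This is an illustrative sketch only, under several assumptions: the exact activation function is not given here, so the standard softplus function (smooth, strictly positive, continuous derivative) is swapped in as a stand-in; a softmax output with a cross-entropy cost is assumed; the mini-batch setting is treated as a batch size; and all function names are hypothetical.

```python
import numpy as np

def softplus(z):
    # Stand-in for the rectifier: strictly positive with a continuous derivative.
    return np.logaddexp(0.0, z)

def softplus_prime(z):
    # Derivative of softplus is the logistic function (written in a stable form).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def init_network(sizes, rng):
    """sizes such as [784, 150, 150, 10]; returns a list of (weights, biases)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return the pre-activations and activations of every layer."""
    zs, activations = [], [X]
    for i, (W, b) in enumerate(params):
        z = activations[-1] @ W + b
        zs.append(z)
        if i < len(params) - 1:
            a = softplus(z)                           # hidden layers
        else:
            e = np.exp(z - z.max(axis=1, keepdims=True))
            a = e / e.sum(axis=1, keepdims=True)      # softmax output
        activations.append(a)
    return zs, activations

def sgd_epoch(params, X, Y, batch_size, lr, rng):
    """One pass over the data in shuffled mini-batches (Y is one-hot)."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        zs, acts = forward(params, X[idx])
        delta = (acts[-1] - Y[idx]) / len(idx)        # softmax + cross-entropy
        for layer in range(len(params) - 1, -1, -1):
            W, b = params[layer]
            grad_W = acts[layer].T @ delta
            grad_b = delta.sum(axis=0)
            if layer > 0:                             # back-propagate with old W
                delta = (delta @ W.T) * softplus_prime(zs[layer - 1])
            params[layer] = (W - lr * grad_W, b - lr * grad_b)
    return params

def accuracy(params, X, labels):
    predictions = forward(params, X)[1][-1].argmax(axis=1)
    return float(np.mean(predictions == labels))
```

Calling `init_network([784, 150, 150, 10], rng)` reproduces the layer dimensions described above; the learning rate and batch size are left as arguments because suitable values depend on the activation and cost function actually used.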



A few selected predictions are given in the following figures. It can be seen that the handwritten numerals vary dramatically and can be rather difficult to identify correctly.

The first figure is the numeral 3.


The next two figures are the numeral 4.



The next four figures are all the numeral 5.





And finally, the next two figures are the numeral 8 followed by a figure of the numeral 9.





Moons Benchmark data

This data set is available in the open literature and demonstrates the ability to handle classification tasks, at least on a toy data set. In the following figure we can see that there are two classes of data, represented by the two different colored circles.


The next figure shows a contour map of the neural net classification. Here we can see that the classification missed only one point.

Another classification problem with three different classes is shown below.
Finally, the contour map for this data shows a high quality classification.


Tuesday, January 8, 2019

Soft Sensing

Soft Sensing - Soft sensing can be thought of as a regression tool whose main task is to synthesize static/dynamic, linear/nonlinear models from plant data. Here it is assumed that the inputs are potentially cross-correlated and quite possibly degenerate. As such, a host of tools are provided to obtain practical models under these conditions. A variety of different models are supported, and the intent is to add new model types and analysis tools as appropriate. In addition to dealing explicitly with correlated inputs, the regression tools provide both statistics and metrics that allow a meaningful interpretation of model quality. This information can be used to compare and contrast different model and input types.
 
Principal component analysis used to select the dimension of the input sub-space
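One common way to choose the sub-space dimension is to keep the smallest number of principal components that captures a desired fraction of the variance. The sketch below shows that rule using an SVD; the function name `subspace_dimension` and the default of 95% are assumptions made for illustration, not settings taken from the tool.

```python
import numpy as np

def subspace_dimension(X, captured_variance=0.95):
    """Smallest number of components whose cumulative variance meets the target."""
    Xc = X - X.mean(axis=0)                     # center each input
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values, descending
    explained = np.cumsum(s**2) / np.sum(s**2)  # cumulative variance fraction
    # First index where the cumulative fraction reaches the target, as a count.
    k = int(np.searchsorted(explained, captured_variance)) + 1
    return min(k, len(s)), explained
```

With strongly cross-correlated inputs the returned dimension is typically much smaller than the number of inputs, which is what makes the subsequent regression well conditioned.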

Models
  1. OLS – Ordinary Least Squares. Conventional least squares models.
  2. WLS – Weighted Least Squares, more commonly called Robust Regression. These models are desensitized to outliers in the data. While the robust regression models require a nonlinear solution, the final model is in fact linear.
  3. PLS – Partial Least Squares. PLS models can be either linear or nonlinear depending on the selection of the subspace regression. A polynomial model is assumed for the subspace regression. Its order is user selectable.
  4. DSS – Dynamic Sub-Space. If dynamics are desirable in the inferential calculation, then DSS models can be used. Model structure, delay and order are determined automatically. All models are ranked and displayed in Laplace domain form. Since a sub-space solution is used, the user specifies the desired level of captured variance in the solution. The dimension of the sub-space is then automatically determined using an SVD factorization.
  5. UES – Nonlinear User Entered System of Equations. The UES model provides a solution to a nonlinear regression problem with constraints. The user can define the nonlinear relationship and the input variables of interest.
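The remark that robust regression needs a nonlinear solution but yields a linear model can be illustrated with iteratively reweighted least squares (IRLS). The sketch below is one standard way to do this, using Huber weights and a MAD scale estimate; it is an assumption that the tool works along similar lines, and the function names are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def robust_regression(X, y, tuning=1.345, iterations=50, tol=1e-8):
    """Huber-weighted IRLS. The fitted model is still linear in the inputs."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)        # start from OLS
    for _ in range(iterations):
        r = y - A @ coef
        # Robust scale from the median absolute deviation of the residuals.
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / (tuning * scale)
        # Huber weights: 1 for small residuals, decaying as 1/u for large ones.
        w = np.where(u <= 1.0, 1.0, 1.0 / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new_coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef
```

Each pass is an ordinary weighted least squares solve; the nonlinearity comes only from recomputing the weights from the residuals, which is why outliers lose influence while the final model remains a plain linear equation.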


Monday, January 7, 2019

Control

Control - Multivariate control is focused on an active set method for solving the dynamic QP problem. This approach utilizes a novel method for calculating the null space and the subsequent Lagrangian multipliers. Although it is applicable to conventional MPC formulations, the target MPC solution is an alternate formulation in which SS targets are not required for the dynamic solution. In addition, the formulation deals directly with both numerical ill-conditioning and, more importantly, the problem of ill-conditioned plants.
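The role of the null space and the Lagrangian multipliers in an active set method can be illustrated with the textbook null-space solution of an equality-constrained QP, which is the sub-problem solved at each active set iteration. This is a generic sketch using a QR factorization, not the novel method referred to above; the function name `solve_eq_qp` is hypothetical, and the active constraints are assumed to be linearly independent with a positive definite reduced Hessian.

```python
import numpy as np

def solve_eq_qp(H, g, A, b):
    """Minimize 0.5 x'Hx + g'x subject to A x = b using the null-space method.

    Returns the solution x and the multipliers lam satisfying H x + g = A' lam.
    """
    m, n = A.shape
    # Full QR of A': the first m columns of Q span range(A'), the rest null(A).
    Q, R = np.linalg.qr(A.T, mode="complete")
    Y, Z = Q[:, :m], Q[:, m:]
    R1 = R[:m, :]
    # Particular solution satisfying the constraints: A (Y p) = R1' p = b.
    x = Y @ np.linalg.solve(R1.T, b)
    if Z.shape[1] > 0:
        # Unconstrained problem in the null-space coordinates z.
        reduced_H = Z.T @ H @ Z
        reduced_g = Z.T @ (H @ x + g)
        x = x + Z @ np.linalg.solve(reduced_H, -reduced_g)
    # Multipliers from the range-space part of the stationarity condition.
    lam = np.linalg.solve(R1, Y.T @ (H @ x + g))
    return x, lam
```

In an active set method for constraints of the form A x ≥ b, the sign of each multiplier under this convention indicates whether the corresponding constraint should stay in the active set (non-negative) or is a candidate to be dropped (negative).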

Control Profile 


Control Law Formulation with Constraints

Control Performance


Saturday, January 5, 2019

Overview

J. W. M. Consultants can provide help and guidance in areas such as identification, control, optimization, soft sensing, data analysis and data analytics. In this overview post we start with a high level description of the identification approach. This will be followed by subsequent posts describing the remaining key elements of our expertise.

  • Identification - Practical application in open and closed-loop identification. The method used here has been motivated primarily by commercial needs. The formulation has been shaped by more than ten years of practical experience as a practitioner and provider of commercial identification and control software. While recent academic advances have influenced this formulation, the plethora of academic approaches can be counterintuitive, as many do not map conveniently into a practical formulation because their underlying assumptions are unrealistic in practical applications.
General closed-loop block diagram

Generic process model structure used to define both direct and indirect sub-structures

Generic solution for linear and nonlinear systems

Prediction under non-stationary operation (red-model, green process)

5x4 Closed loop data

Model Matrix error - shaded region is difference between actual and identified model using closed-loop data shown above