Prepared by: EyeHub partner National Physical Laboratory. Date: March 2014
The EyeHub Internet of Things Ecosystem Demonstrator is led by Flexeye Ltd (http://www.flexeye.com). It is co-funded by the UK Government's Technology Strategy Board (http://www.innovateuk.org). The views expressed in this publication are those of the author(s) and not necessarily those of the Technology Strategy Board.
Table of Contents
Introduction
Inference and decision-making in the presence of variation and uncertainty
The metrology paradigm: traceability, calibration and uncertainty
Degrees of freedom, (effective) redundancy and resilience
Model uncertainty and statistical learning
Information integrity
References
Introduction
The internet of things (IoT) concept, as interpreted within the project, has two main building blocks: i) many cheap sensors measuring a number of variables in the ecosystem and returning digital representations of the estimated measured variables, and ii) the availability of the measured data on a common internet platform. Anyone who wants to know something about the ecosystem can then interrogate the data, using data analysis algorithms to extract the relevant information from the available data. Often, the required information must be assembled from a number of data streams associated with different system variables, rather than a single variable; hence the importance of having all the data streams available on a common platform.

The term data transformation refers to the process of converting raw data into usable information. Data transformation includes data curation, i.e., storing and making available the data; data assimilation, i.e., updating the current estimates of the system using new data; data interpretation; data visualisation; data fusion; and data modelling and simulation techniques. By usable information, we mean information on the basis of which inferences and decisions can be made. Usually, data transformation operates at a number of levels. At each level, inputs are aggregated and the relevant information summarised to form inputs to the next level. The goal of the data transformation is to convert the raw input data into information or knowledge to be acted upon. The data transformation process, involving data analysis and inference algorithms, hopefully adds value, and its outputs are often referred to as information products, reflecting this added value. A weather forecast, for example, can be thought of as an information product assembled from observational data and weather models. If data is being transformed into information, what should we be able to say about the fidelity of the transformation process?
If the algorithms applied in the transformation process are based on incorrect models of the underlying ecosystem, then it is quite possible that valid data is being transformed into invalid information. This document looks at issues associated with making sure the data transformation process produces valid information: information that can be assessed and acted upon on the basis of risks that can be quantified. We aim for a decision-making process that can be defended (if necessary, in court), so that peers examining a decision will be able to say that the information used to make the decision was well founded and that others would have made the same decision on the basis of the evidence that was available (or indeed not available).
uncertainties associated with climate models increase the risk that decisions associated with climate change mitigation might not bring about the desired amelioration.
at each stage, the uncertainty contributions are reasonably well understood. Furthermore, the validity of the processes is constantly tested using inter-laboratory comparisons (ILCs) in which different laboratories measure the same artefacts and compare their results. The Mutual Recognition Arrangement (MRA) [5] uses ILCs, usually referred to as key comparisons, to compare the capabilities of National Metrology Institutes (NMIs), such as NPL, and to provide a mechanism by which measurements made in one country can be recognised in another.
Degrees of freedom, (effective) redundancy and resilience

Many of the challenges associated with environmental monitoring, smart infrastructure and structural health monitoring necessarily involve measurements outside the laboratory, which are subject to far more influence factors, usually only partially understood. These systems are characterised by a large number of degrees of freedom, the exact number of which might be very difficult to estimate and may indeed evolve over time. However, if we have a large number of measurements of variables associated with the system, we may hope to have sufficient information to estimate the state of the ecosystem. If the system has, say, 200 degrees of freedom, then we will need at least 200 measurements (or other information items) in order to estimate the state of the system.

We use the term redundancy to describe the situation where there are more measurements than degrees of freedom (unknown parameters). In this situation, some of the measurements will be redundant in the sense that the state of the system can be estimated from a subset of the measurements. However, it will usually be better to use all the measurements to estimate the system parameters so that, e.g., measurement noise is averaged out. In order for this averaging process to be effective, it is necessary to have an estimate of the uncertainty associated with each data point so that each piece of information is given an appropriate weight. Similarly, in integrating data from a number of data streams, it is necessary to be able to weight each data stream appropriately. Without these uncertainty estimates, it is quite possible that a weighted averaging process would lead to worse estimates than those based on a subset of the data. In cases where some of the sources are providing rogue data, it is important (but often difficult) to identify these rogue sources and eliminate or de-weight them so that they do not compromise the averaging process.
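As a concrete sketch of this weighting, consider combining several sensor readings of the same quantity by inverse-variance weighting, with a simple residual check to flag possible rogue sources. The readings, uncertainties and the 3-sigma threshold below are illustrative assumptions, not values from the project.

```python
import numpy as np

# Hypothetical readings of the same quantity from five sensors,
# each with its own stated standard uncertainty u_i.
y = np.array([20.1, 19.8, 20.3, 20.0, 21.5])   # measurements
u = np.array([0.2, 0.2, 0.3, 0.2, 0.2])        # standard uncertainties

# Inverse-variance weighting: each reading is weighted by 1/u_i^2,
# so noisier sources contribute less to the combined estimate.
w = 1.0 / u**2
estimate = np.sum(w * y) / np.sum(w)
u_est = np.sqrt(1.0 / np.sum(w))   # uncertainty of the weighted mean

# A crude resilience check: flag readings more than 3 combined
# standard deviations away from the consensus as potentially rogue.
score = np.abs(y - estimate) / np.sqrt(u**2 + u_est**2)
rogue = score > 3.0
```

In practice, a flagged source would be removed or de-weighted and the estimate recomputed; without the stated uncertainties u_i, neither the weighting nor the rogue check is possible.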
A simple equating of the number of sensor measurements with the (estimated) number of degrees of freedom of the ecosystem is not sufficient to show that all parameters of the system can be estimated, since it is possible (and in fact quite likely) that some of the measurements are providing information that replicates information already available from other measurements or information sources, leaving other aspects of the system unresolved. We use the term effective redundancy to describe the degree to which parameters are estimated on the basis of more than one data point. A sensitivity analysis can be used to determine the contribution of each measurement source to the estimation of each parameter specifying the state of the system, and hence to evaluate the effective redundancy. We use the term resilience to denote the ability of the data transformation process to detect and account for rogue sources of data; in a model-fitting context [2], the term robust estimation is also used [32, 33]. If an estimate of a parameter value depends on only one data point, then there is no corroborative information available to validate that data point; if the data point is spurious, there would be no way of knowing, and inferences based on such data could well be wrong. The more effective redundancy there is, the more scope there is for detecting rogue data and building resilient systems.
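For a linear observation model y = A theta, the sensitivity analysis described above amounts to inspecting the observation (design) matrix A: its column rank decides whether all parameters are estimable, and the number of measurements with non-negligible sensitivity to each parameter gives a crude measure of effective redundancy. The matrix below is a made-up illustration.

```python
import numpy as np

# Hypothetical linear observation model y = A @ theta: 5 measurements
# of a system with 3 unknown parameters. Entry A[i, j] is the
# sensitivity of measurement i to parameter j.
A = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],   # parameter 3 is seen by this measurement alone
])

# All parameters are estimable if and only if A has full column rank.
estimable = np.linalg.matrix_rank(A) == A.shape[1]

# Effective redundancy (crudely): how many measurements carry
# non-negligible sensitivity to each parameter.
support = np.count_nonzero(np.abs(A) > 1e-12, axis=0)
```

Here the system is estimable, but the third parameter rests on a single measurement (support of 1): there is no corroborating data, so a fault in that one sensor would pass undetected.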
Model uncertainty and statistical learning

For complex ecosystems, the uncertainties associated with the model of the system are large initially, but can be reduced as more observations of the system are gathered and analysed. Traditional uncertainty evaluation methodologies such as the GUM [7] do not cope well with the concept of uncertainty associated with a model. Their main focus is on how uncertainty associated with measured data propagates through to parameters derived from the data for a fixed, known model of the system. So how can we account for model uncertainty?

Consider the following example. The response y of a system to a stimulus variable x is expected to be approximately linear, so a response model of the form y = a1 + a2 x could be appropriate. However, it is also suspected that some systematic effects present in the system could lead to quadratic or even higher-order responses. Thus we may wish to consider modelling the response in terms of a higher-order polynomial, y = a1 + a2 x + ... + an x^(n-1), for some n. Given response data (xi, yi), i = 1, ..., m, we could fit in turn each polynomial for n = 2, ..., m and choose the n that provides the best fit to the data. This approach is referred to as model selection: from a number of competing models, choose the best one on the basis of the evidence supplied by the data [15, 37, 54]. The selection of the best model is usually based on criteria [1, 34, 47] that attempt to balance the goodness of fit to the data against finding the most economical acceptable fit, i.e., one that minimises the complexity of the model, often related to the degrees of freedom (number of free parameters) associated with the model. In practical terms, the measured data will not be exact, and we will be satisfied with a model fit that passes close enough to the data that the difference between the model fit and the measured data can be easily accounted for in terms of measurement noise.
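A minimal sketch of this selection procedure uses Akaike's information criterion [1] to balance goodness of fit against the number of free parameters; the simulated quadratic data and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated response data: truly quadratic, observed with noise.
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 0.05, x.size)

def aic(x, y, n_params):
    """Fit a polynomial with n_params coefficients by least squares
    and return the AIC under a Gaussian noise model."""
    coeffs = np.polyfit(x, y, deg=n_params - 1)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    m = y.size
    return m * np.log(rss / m) + 2 * n_params

# Fit polynomials of increasing order and keep the AIC-best one.
scores = {n: aic(x, y, n) for n in range(2, 7)}
best_n = min(scores, key=scores.get)
```

A straight line (n = 2) leaves residuals far larger than the measurement noise and is penalised accordingly, so the criterion should settle on a model with at least a quadratic term.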
In the example of fitting a polynomial to data, we are concerned with explaining the response in terms of basis functions x^j of the stimulus variable x. For many complex systems, we wish to explain a response y in terms of functions fj(x1, ..., xp) of a possibly large number p of stimulus variables, e.g.,

y = a0 + a1 x1 + ... + ap xp,

involving linear functions of the stimulus variables xj, but mixed components such as xj xk or higher-order terms might also be included. Algorithms such as principal components analysis (PCA) and partial least squares (PLS) attempt to model the data using a reduced number of linear combinations of (functions of) the variables xj; see, e.g., [28]. These combinations will generally involve all the variables, which means that all of them have to be measured and taken into account. In many cases, we wish to provide an adequate description of the behaviour of the system in terms of as few of the variables as possible (ideally those determining the degrees of freedom of the system), an approach referred to as subset selection. For even a modest number of variables, testing all possible subsets can very quickly become computationally infeasible. The LASSO algorithm (least absolute shrinkage and selection operator [53]) arrives at a subset by minimising a function of the form
F(a, λ) = Σi ( yi − Σj aj fj(xi1, ..., xip) )^2 + λ Σj |aj|.

The first summand provides a measure of the goodness of fit of the model to the data, while the second summand is a penalty term that penalises nonzero coefficients aj. The size of the penalty is controlled by λ. At the solution of this optimisation problem, only those terms with nonzero coefficients aj are included in the model. A related algorithm is the least angle regression algorithm (LARS, [24]), a stepwise procedure in which the functions fj(x1, ..., xp) are included in the model, one by one, until a suitable fit to the data is achieved. The LARS algorithm is remarkably efficient and can be tuned to give the LASSO solution [28]. Subset selection algorithms are important in determining the key variables that need to be measured in order to understand the response behaviour.

Model selection algorithms can also be used to take into account the possibility of rogue sensors. Suppose there are m sensors providing data yi and associated uncertainties ui. The initial model M0 may be that all the sensors are working correctly, so that ui is a valid statement of the uncertainty associated with yi. The next category of models Mi, i = 1, ..., m, is such that Mi regards the ith sensor as unreliable, so that ui = ∞. Additional models Mij, etc., may be put forward. The selected model will indicate which sensor measurements should be included in the analysis. In the metrology context, such approaches, using a computationally efficient scheme, have been applied to the analysis of inter-laboratory comparison data [20] and sensor network data [17]. After the best model has been chosen using model selection, standard uncertainty calculations can be applied using this model.
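A computationally cheap variant of this idea is to search greedily for the largest subset of sensors that is mutually consistent, in the spirit of [20]; the greedy strategy, the data and the hard-coded chi-squared critical values below are illustrative simplifications rather than the scheme of the cited work.

```python
import numpy as np

# 95 % critical values of the chi-squared distribution for small
# degrees of freedom, hard-coded to keep the sketch dependency-free.
CHI2_95 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07, 6: 12.59}

def largest_consistent_subset(y, u):
    """Greedily drop the sensor with the largest normalised residual
    until the remaining data pass a chi-squared consistency test."""
    idx = list(range(len(y)))
    while len(idx) > 1:
        ys, us = y[idx], u[idx]
        w = 1.0 / us**2
        mean = np.sum(w * ys) / np.sum(w)        # weighted consensus
        chi2 = np.sum(((ys - mean) / us) ** 2)   # observed chi-squared
        if chi2 <= CHI2_95[len(idx) - 1]:
            return idx, mean                     # consistent subset found
        # Equivalent to adopting model M_i for the worst-fitting sensor:
        # treat it as unreliable and exclude it from the consensus.
        worst = idx[int(np.argmax(np.abs(ys - mean) / us))]
        idx.remove(worst)
    return idx, float(y[idx[0]])

# Hypothetical comparison data: sensor 2 is discrepant.
y = np.array([10.1, 9.9, 12.5, 10.0, 10.2])
u = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
keep, consensus = largest_consistent_subset(y, u)
```

Once the consistent subset has been identified, the standard weighted-mean uncertainty calculation applies to the retained sensors as described above.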
This approach regards the accepted model as exact and therefore as contributing no uncertainty to the derived information, but the uncertainty associated with the model has been at least partially accounted for by the fact that alternative models have been considered and rejected only because the data does not sufficiently support them. If the data provides strong evidence that one model is to be preferred above all others, this approach is effective and valid. What happens if the data is consistent with more than one of the proposed models? The model averaging approach [31, 43] does what its name suggests: it produces a model that averages across all the proposed models using a weighting scheme that reflects the degree of support for each model provided by the data. Model averaging agrees with model selection if only one weight is non-zero, corresponding to the selected model. Model averaging is most easily interpreted in a Bayesian framework [3, 4, 13, 27, 50] in which we give a prior estimate of the belief in each possible model and use the data to update these beliefs. The model predictions are then derived from an averaging process using the posterior beliefs, i.e., the beliefs derived after the data has been analysed. The space of all potential models is parameterised by additional hyper-parameters that are given a prior probability distribution, and this distribution is updated as more data becomes available. The term statistical learning (other terms used similarly are data mining and machine learning [12, 28, 29, 38, 44]) is used to denote analysis methods that build as much as possible on the observed data rather than on prior models of how the system is expected to
behave. The term data assimilation refers to the cyclical updating of information on the basis of observations for systems that evolve over time, e.g., the daily updating of weather forecasts. The state of the system is estimated on the basis of two sources of information: firstly, a model that predicts the current state on the basis of information from the past and, secondly, observations made of the current state. The predictions and the observations are then combined to form a best estimate of the current state and, if necessary, a prediction for the next time step. In order to combine the two sources of information effectively, it is necessary to have estimates of the uncertainties associated with each source. The Kalman filter [36, 51] is an early example of this type of approach, and it and its generalisations have proved effective in a wide range of applications, e.g., [14, 25, 46, 52].

If the underlying system changes, e.g., through the addition of a new subsystem or the failure of another subsystem, then the observed data will likely not be consistent with the current description of the model. If this is detected, then the belief in the current model should be reduced so that the new response behaviour can be learnt. Change point or tipping point analysis is concerned with detecting qualitative changes in a system response; much of statistical process control is about the early detection [41, 45, 49] of a change in the behaviour of a manufacturing process.

The word learning reflects the fact that, as more data is gathered and analysed, the range of possible models that are consistent with the data is reduced, so that over time a better understanding of how the ecosystem behaves is developed and used to make better predictions, inferences and decisions. Model selection can be regarded as a learning process: the model to be used has been derived on the basis of information learnt from the data.
Model averaging is a learning process: the prior belief in the potential models is updated on the basis of the data as it becomes available.
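The predict/update cycle of data assimilation described above can be sketched with a minimal one-dimensional Kalman filter; the random-walk state model and the noise variances below are illustrative assumptions.

```python
import numpy as np

def kalman_1d(observations, q=0.01, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a slowly drifting (random walk) state.
    q: process noise variance, r: observation noise variance."""
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: the model carries the state forward; uncertainty grows.
        p = p + q
        # Update: blend prediction and observation, weighted by their
        # respective uncertainties through the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return np.array(estimates), p

# Noisy observations of a (here constant) quantity.
rng = np.random.default_rng(3)
z = 5.0 + rng.normal(0.0, 0.5, 100)
est, p_final = kalman_1d(z)
```

The gain k is exactly the uncertainty-based weighting discussed above: when the prediction is uncertain relative to the observation, the observation dominates the update, and vice versa.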
Information integrity
We use the term information integrity to reflect the degree of belief or trust we have in the information and how it was derived. Integrity comprises the following components:

Uncertainty quantification. The data transformation process will generally involve assembling information from different data sources. Uncertainty estimates are required at each stage so that each data stream is given an appropriate weight. The uncertainties at each stage will propagate through to the uncertainties associated with the output information. However, the uncertainties associated with the output information should also take into account uncertainties associated with the models used in the data transformation, and uncertainties associated with prior assumptions, not just the uncertainties associated with the measured data.

Validation and self-validation. The data transformation processes will be based on models of the ecosystem. It is necessary to show that these models do reflect the actual behaviour of the system, e.g., by showing that, on the basis of a subset of the data, the model can estimate, to within the stated uncertainties, the response of the system as monitored by other data streams. Self-validation refers to the ability of the data transformation process to detect when the underlying model of the system fails to
apply. If insufficient information is available, it is possible that unresolved degrees of freedom associated with the system can invalidate inferences derived from the output information. The system should be able to diagnose common failure modes and, ideally, have sufficient redundancy to be able to detect less common or unforeseen failure modes.

Documentation. In order to justify a decision, it is necessary to understand the information used to make the decision and how it was arrived at. The input data will likely be one component, but it is also necessary to document the models that form the basis of the data transformation process and the evidence available that these models are appropriate for the ecosystem at hand.

We may regard the measurement infrastructure as a high-integrity system. The components of the system are well understood (measurement instruments are designed according to accepted physical theory), the analysis of the data associated with the system is undertaken using known methods, such as those described in the GUM and its supplements [8, 9], all stages are documented and the documentation is available (calibration certificates), and components in the system are regularly tested (periodic re-calibrations, inter-laboratory comparisons) to ensure that they are operating as expected. The measurements made by a supermarket may not be particularly accurate, but we can be largely confident that they are in accordance with their agreed operating performance on the basis of data with known provenance. Statistical process control can lead to high-integrity systems: state-of-the-art manufacturing systems can deliver defect rates measured in parts per million. Information integrity can be compromised in many ways: invalid data, the use of models that do not accord with the actual behaviour of the ecosystem, etc. Documentation is required to assess the integrity of information arising from a data transformation process.
Often the raw data is easy to store (although the sheer volume may be problematic). What is more difficult to store is the metadata that provides the correct context for the data, e.g., the calibration history of the sensors providing the data, the location of the sensors, and what is happening in the immediate environment of the sensors. Even more difficult to document properly are the models and algorithms that form the basis of the data transformation process. If the data transformation process is regarded as a black box, then the integrity associated with the outputs of the black box is very difficult to assess, even if all the inputs are known to be valid.
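The validation idea described under information integrity, estimating the response monitored by some data streams from a model fitted to others and checking agreement to within the stated uncertainties, can be sketched as follows; the linear sensor stream and uncertainty value are simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical sensor stream: a linear trend observed with a known
# standard uncertainty u on every reading.
t = np.linspace(0.0, 10.0, 40)
u = 0.2
y = 1.0 + 0.3 * t + rng.normal(0.0, u, t.size)

# Fit the model on every other reading...
train = np.arange(t.size) % 2 == 0
coeffs = np.polyfit(t[train], y[train], 1)

# ...and validate against the held-out readings: residuals should be
# consistent with the stated uncertainty.
resid = y[~train] - np.polyval(coeffs, t[~train])
rms = np.sqrt(np.mean(resid**2))          # should be close to u
consistent = bool(np.all(np.abs(resid) < 3 * u))
```

If the residuals were systematically larger than the stated uncertainties, either the model or the uncertainty statements would be suspect; self-validation automates exactly this kind of check.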
References
[1] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19:716–723, 1974.
[2] R. M. Barker, M. G. Cox, A. B. Forbes, and P. M. Harris. Software Support for Metrology Best Practice Guide No. 4: Modelling Discrete Data and Experimental Data Analysis. Technical Report DEM-ES 018, National Physical Laboratory, Teddington, 2007.
[3] J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer, New York, 2nd edition, 1985.
[4] J. M. Bernardo and A. F. M. Smith. Bayesian Theory. Wiley, New York, 1994.
[5] BIPM. Mutual recognition of national measurement standards and of calibration and measurement certificates issued by national metrology institutes. Technical report, Bureau International des Poids et Mesures, Sèvres, France, 1999.
[6] BIPM. The International System of Units (SI). Bureau International des Poids et Mesures, Paris, 8th edition, 2005.
[7] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML. Evaluation of measurement data: Guide to the expression of uncertainty in measurement. Joint Committee for Guides in Metrology, JCGM 100:2008.
[8] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML. Evaluation of measurement data. Supplement 1 to the Guide to the expression of uncertainty in measurement: propagation of distributions using a Monte Carlo method. Joint Committee for Guides in Metrology, JCGM 101:2008.
[9] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML. Evaluation of measurement data. Supplement 2 to the Guide to the expression of uncertainty in measurement: extension to any number of output quantities. Joint Committee for Guides in Metrology, JCGM 102:2011.
[10] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML. International vocabulary of metrology: basic and general concepts and associated terms. Joint Committee for Guides in Metrology, JCGM 200, 2008.
[11] R. T. Birge. Probable values of the general physical constants. Reviews of Modern Physics, 1:1–73, 1929.
[12] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[13] G. E. P. Box and G. C. Tiao. Bayesian Inference in Statistical Analysis. Wiley, New York, Wiley Classics Library edition, 1992 (first published 1973).
[14] R. G. Brown and P. Y. C. Hwang. Introduction to Random Signals and Applied Kalman Filtering. Wiley, New York, 3rd edition, 1997.
[15] H. Chipman, E. I. George, and R. E. McCulloch. The Practical Implementation of Bayesian Model Selection. Institute of Mathematical Statistics, Beachwood, Ohio, 2001.
[16] A. G. Chunovkina, C. Elster, I. Lira, and W. Wöger. Analysis of key comparison data and laboratory biases. Metrologia, 45(2):211–216, 2008.
[17] M. A. Collett, M. G. Cox, M. Duta, T. J. Esward, P. M. Harris, and M. Henry. The application of self-validation to wireless sensor networks. Meas. Sci. Technol., 19, 2008.
[18] M. G. Cox. A discussion of approaches for determining a reference value in the analysis of key-comparison data. In P. Ciarlini, A. B. Forbes, F. Pavese, and D. Richter, editors, Advanced Mathematical Tools in Metrology IV, pages 45–65, Singapore, 2000. World Scientific.
[19] M. G. Cox, A. B. Forbes, J. Flowers, and P. M. Harris. Least squares adjustment in the presence of discrepant data. In P. Ciarlini, M. G. Cox, F. Pavese, and G. B. Rossi, editors, Advanced Mathematical and Computational Tools in Metrology VI, pages 37–51, Singapore, 2004. World Scientific.
[20] M. G. Cox. The evaluation of key comparison data: determining the largest consistent subset. Metrologia, 44(3):187–200, 2007.
[21] W. E. Deming. On probability as a basis for action. The American Statistician, 29(4):146–152, 1975.
[22] T. Doiron and J. Beers. The Gauge Block Handbook. Monograph 180, National Institute of Standards and Technology, Gaithersburg, 2005.
[23] J. A. Eddy. The Maunder minimum. Science, 192(4245):1189–1202, 1976.
[24] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32:407–499, 2004.
[25] G. Evensen. Data Assimilation: The Ensemble Kalman Filter. Springer, 2009.
[26] A. B. Forbes and C. Perruchet. Measurement systems analysis: concepts and computational approaches. In IMEKO World Congress, September 18–22, 2006, Rio de Janeiro, 2006.
[27] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman & Hall/CRC, Boca Raton, FL, 2nd edition, 2004.
[28] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, New York, 2nd edition, 2011.
[29] S. Haykin. Neural Networks: A Comprehensive Foundation. Macmillan, New York, 2nd edition, 1999.
[30] J. D. Hays, J. Imbrie, and N. J. Shackleton. Variations in the Earth's orbit: pacemaker of the ice ages. Science, 194(4270):1121–1132, 1976.
[31] J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky. Bayesian model averaging: a tutorial. Statistical Science, 14:382–401, 1999.
[32] P. J. Huber. Robust estimation of a location parameter. Annals of Mathematical Statistics, 35:73–101, 1964.
[33] P. J. Huber. Robust Statistics. Wiley, New York, 1980.
[34] C. M. Hurvich and C.-L. Tsai. Regression and time series model selection in small samples. Biometrika, 76:297–307, 1989.
[35] International Organization for Standardization, Geneva. ISO 11462: Guidelines for implementation of statistical process control (SPC). Part 1: Elements of SPC, 2001.
[36] R. E. Kalman. A new approach to linear filtering and prediction problems. Trans. ASME J. Basic Engr., pages 35–45, 1960.
[37] H. Linhart and W. Zucchini. Model Selection. Wiley, New York, 1986.
[38] D. J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.
[39] D. C. Montgomery. Design and Analysis of Experiments. John Wiley & Sons, New York, 5th edition, 1997.
[40] R. C. Paule and J. Mandel. Consensus values and weighting factors. J. Res. Natl. Bur. Stand., 87:377–385, 1982.
[41] M. Pollak. Optimal detection of a change in distribution. Annals of Statistics, 13(1):206–227, 1985.
[42] F. Pukelsheim. Optimal Design of Experiments. SIAM, Philadelphia, 2006. Reproduction of the 1993 book published by John Wiley and Sons, New York.
[43] A. E. Raftery, D. Madigan, and J. A. Hoeting. Bayesian model averaging for linear regression. Journal of the American Statistical Association, 92:179–191, 1997.
[44] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, Mass., 2006.
[45] S. Roberts. A comparison of some control chart procedures. Technometrics, 8(3):411–430, 1966.
[46] S. K. Sahu and K. V. Mardia. A Bayesian kriged Kalman model for short-term forecasting of air pollution levels. Applied Statistics, 54:223–244, 2005.
[47] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6:461–464, 1978.
[48] W. A. Shewhart. Economic Control of Quality of Manufactured Product. D. Van Nostrand Company, New York, 1931.
[49] A. N. Shiryaev. On optimum methods in quickest detection problems. Theory of Probability and Its Applications, 8:22–46, 1963.
[50] D. S. Sivia. Data Analysis: A Bayesian Tutorial. Clarendon Press, Oxford, 1996.
[51] H. W. Sorenson. Least-squares estimation: from Gauss to Kalman. IEEE Spectrum, 7:63–68, July 1970.
[52] H. W. Sorenson, editor. Kalman Filtering: Theory and Application. IEEE, New York, 1985.
[53] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc., Series B, 58:267–288, 1996.
[54] X.-S. Yang and A. B. Forbes. Model and feature selection in metrology data approximation. In E. H. Georgoulis, A. Iske, and J. Levesley, editors, Approximation Algorithms for Complex Systems, pages 293–307, Heidelberg, 2010. Springer-Verlag.