
The Internet of Things and Information Integrity

Prepared by: EyeHub partner National Physical Laboratory. Date: March 2014

The EyeHub Internet of Things Ecosystem Demonstrator is led by Flexeye Ltd (http://www.flexeye.com). It is co-funded by the UK Government's Technology Strategy Board (http://www.innovateuk.org). The views expressed in this publication are those of the author(s) and not necessarily those of the Technology Strategy Board.

Table of Contents
Introduction
Inference and decision-making in the presence of variation and uncertainty
The metrology paradigm: traceability, calibration and uncertainty
Degrees of freedom, (effective) redundancy and resilience
Model uncertainty and statistical learning
Information integrity
References


Introduction
The internet of things (IoT) concept, as interpreted within the project, has two main building blocks: i) many cheap sensors measuring a number of variables in the ecosystem and returning digital representations of the estimated measured variables, and ii) the availability of the measured data in a common internet platform. Anyone who wants to know something about the ecosystem can then interrogate the data, using data analysis algorithms to extract the relevant information from the available data. Often, the required information must be assembled from a number of data streams associated with different system variables, rather than from a single variable; hence the importance of having all the data streams available in a common platform.

The term data transformation describes the process of converting raw data into usable information. Data transformation includes data curation, i.e., storing and making available the data, data assimilation, i.e., updating the current estimates of the system using new data, data interpretation, data visualisation, data fusion, and data modelling and simulation techniques. By usable information, we mean information on the basis of which inferences and decisions can be made. Usually, data transformation operates at a number of levels. At each level, inputs are aggregated and the relevant information summarised to form inputs to the next level. The goal of the data transformation is to convert the raw input data into information or knowledge to be acted upon. The data transformation process, involving data analysis and inference algorithms, hopefully adds value, and the outputs of the data transformation process are often referred to as information products, reflecting this added value. A weather forecast can be thought of as an information product assembled from observational data and weather models.

If data is being transformed into information, what should we be able to say about the fidelity of the transformation process? If the algorithms applied in the transformation process are based on incorrect models of the underlying ecosystem, then it is quite possible that valid data is being transformed into invalid information. This document looks at issues associated with making sure the data transformation process produces valid information: information that can be assessed and acted upon on the basis of risks that can be quantified. We aim for a decision-making process that can be defended (if necessary, in court), so that peers examining a decision will be able to say that the information used to make the decision was well founded and that others would have made the same decision on the basis of the evidence that was available (or indeed not available).


Inference and decision-making in the presence of variation and uncertainty


The output of the data transformation process is information on the basis of which inferences and decisions can be made. In practically all ecosystems, the information derived cannot be regarded as exact for a number of reasons, e.g., i) the input sensor data will have associated uncertainties, ii) the input data is incomplete, with some variables that have an impact on the behaviour of the ecosystem unmeasured, iii) the data transformation process will generally be based on models of the ecosystem that are approximate, and iv) aspects of the system are subject to special causes such as sensor failure. In order to make informed inferences and decisions it is necessary to assess the quality of the information that has been derived, e.g., by estimating the uncertainties associated with the outputs of the data transformation. Uncertainty evaluation is usually based on an underlying model, but there are likely to be uncertainties associated with the model and other prior assumptions that also have to be taken into account. Finally, the ecosystem itself is likely to be subject to inherent variation that limits the accuracy of information derived about the system.

If decisions are made on the basis of information subject to uncertainty, then there is a possibility that a wrong decision will be made. A wrong decision will generally incur a cost, such as scrapping a conforming part or passing a non-conforming part. Often the costs are asymmetric: the cost of putting too much fuel in an aeroplane relates to the inefficiency of carrying extra load around, while not putting enough fuel in involves costs operating on a completely different scale. Decision theory [3] uses a statistical model associated with the data/information and a cost model associated with wrong decisions in order to set thresholds that minimise the expected total cost. If the expected decision costs are judged to be too high, then we can ask what new information would be most useful in reducing them. The science of experimental design (DoE, design of experiments) is aimed at answering this sort of question [39, 42]. In fact, it is usually most sensible to value information on the basis of the expected reduction in cost. In this way, the cost of the new information (measurement systems, etc.) can be balanced against the information gain and corresponding reduction in risk.

Statistical process control (SPC) [35] is an engineering application of decision theory in the presence of uncertainty and variation. It was recognised that a manufacturing process will necessarily have an inherent variability that cannot be eliminated [21, 48]. If the process stays within the limits of the inherent variation, no action is required to change the system. Only when these limits are exceeded in a statistically significant way is action required to check whether some aspect of the system is being affected by a special cause, such as a tool breaking. Without an understanding and quantification of the inherent variation of the system, there was a tendency to over-control the system, adjusting it in order to try to account for the random variations, which led to a great expense of effort and, more often than not, an increase in the total variation of the system.
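To make the cost-based thresholding idea concrete, the following is a minimal Python sketch (not part of the project software) that chooses a guard-banded acceptance threshold for a single measured quantity by minimising the expected cost of wrong decisions, assuming normally distributed measurement errors. The function names and all numerical values are illustrative assumptions.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * math.erfc((mu - x) / (sigma * math.sqrt(2.0)))

def expected_cost(threshold, upper_limit, u_meas, c_false_accept, c_false_reject, true_values):
    """Average decision cost over a set of plausible true values.

    A part whose true value exceeds upper_limit is non-conforming.  A part is
    accepted when its *measured* value is below threshold; measurement noise
    with standard uncertainty u_meas makes both false accepts and false
    rejects possible."""
    total = 0.0
    for t in true_values:
        p_accept = normal_cdf(threshold, mu=t, sigma=u_meas)
        if t > upper_limit:
            total += c_false_accept * p_accept          # accepting a bad part
        else:
            total += c_false_reject * (1.0 - p_accept)  # rejecting a good part
    return total / len(true_values)

# Illustrative numbers: tolerance limit 10.0, measurement uncertainty 0.2,
# a false accept costing 50 times more than a false reject.
true_values = [10.0 + 0.05 * k for k in range(-40, 41)]   # crude prior over true values
candidates = [10.0 - 0.01 * k for k in range(0, 61)]      # guard-banded acceptance thresholds
best = min(candidates, key=lambda th: expected_cost(th, 10.0, 0.2, 50.0, 1.0, true_values))
print(f"acceptance threshold minimising expected cost: {best:.2f}")
```

With asymmetric costs such as these, the optimal threshold sits inside the tolerance limit, i.e., the decision rule is deliberately conservative in proportion to the measurement uncertainty.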
The determination of trends in key climate variables is made extremely difficult by the large cyclical variations operating at many different timescales: daily, lunar, seasonal, yearly, eleven-year sunspot cycles [23], up to 100,000-year cycles associated with the dynamical evolution of the solar system [30]. These influence factors, as well as, for example, uncertainties associated with climate models, increase the risk that decisions associated with climate change mitigation might not bring about the desired amelioration.

The metrology paradigm: traceability, calibration and uncertainty


The International Vocabulary of Metrology (VIM) [10] defines (metrological) traceability as the property of a measurement result whereby the result can be related to a reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty. The same document defines calibration as an operation that, under specified conditions, in a first step, establishes a relation between the quantity values with measurement uncertainties provided by measurement standards and corresponding indications with associated measurement uncertainties and, in a second step, uses this information to establish a relation for obtaining a measurement result from an indication. Thus, calibration is generally a two-stage process, the first characterising the response of an instrument given known values of the stimulus variable, e.g., the response of a weighing machine when measuring standards of known mass, and the second converting the measured response of the instrument to an estimate of the stimulus variable that gave rise to the response.

The VIM defines measurement uncertainty as a non-negative parameter characterising the dispersion of the quantity values being attributed to a measurand, based on the information used. This definition allows for considerable latitude in its interpretation. The Guide to the expression of Uncertainty in Measurement (GUM) [7] provides a much tighter specification and adopts the view that information about the value of a quantity is encoded in a probability distribution, usually defined in terms of a probability density function. The best estimate of a quantity is given by the mean of this distribution while the standard uncertainty is given by the standard deviation of the distribution (assuming they exist).

The metrology infrastructure is constructed to ensure that measurements made at any time and any place are comparable and can be related to the same standard units, e.g., those defined by the International System of Units (SI) [6]. Thus, when we go to a supermarket to buy a kilogram of potatoes, we can be sure that, within a stated uncertainty, the weight of the potatoes can be related in a documented, traceable manner to the SI standard of mass, the International Prototype Kilogram (IPK) kept at the Bureau International des Poids et Mesures (BIPM) in Sèvres, near Paris. The chain of calibrations, starting with the comparison of national standards with the IPK through to the testing of supermarket scales using working standards, can be thought of as a flow of data in which information about masses at one level of the calibration is converted to information about masses at the next lower level. A mass balance is essentially an instrument that converts information associated with a calibrated standard to information about the mass artefact under test. The transfer of information is not perfect because the operation of the balance is not perfect. The uncertainty associated with the mass being calibrated will incorporate the uncertainty associated with the calibrated standard but will also have a component arising from the uncertainty contribution of the balance, i.e., the uncertainty associated with the data transformation process. As we move further down the calibration chain the uncertainties increase. However,


at each stage, the uncertainty contributions are reasonably well understood. Furthermore, the validity of the processes is constantly tested using inter-laboratory comparisons (ILCs) in which different laboratories measure the same artefacts and compare their results. The Mutual Recognition Arrangement (MRA) [5] uses ILCs, usually referred to as key comparisons, to compare the capabilities of National Metrology Institutes (NMIs), such as NPL, and provides a mechanism by which measurements made in one country can be recognised in another.
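As an illustration of how uncertainty grows down such a chain, here is a minimal sketch assuming uncorrelated stage contributions combined in quadrature in the GUM manner; the stage values are invented for illustration and are not real calibration data.

```python
import math

def propagate_down_chain(u_reference, u_stages):
    """Combined standard uncertainty after each calibration stage.

    Each stage inherits the uncertainty of the standard above it and adds its
    own contribution (comparator noise, environment, ...) in quadrature, as is
    done in the GUM for uncorrelated effects."""
    u = u_reference
    history = [u]
    for u_stage in u_stages:
        u = math.sqrt(u**2 + u_stage**2)
        history.append(u)
    return history

# Illustrative (not real) numbers, in milligrams, for a 1 kg mass chain:
# primary standard -> national standard -> accredited laboratory -> works standard.
levels = ["primary", "national", "accredited lab", "works standard"]
for level, u in zip(levels, propagate_down_chain(0.002, [0.005, 0.020, 0.100])):
    print(f"{level:15s} u = {u:.3f} mg")
```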

Degrees of freedom, (effective) redundancy and resilience


The number of degrees of freedom associated with an ecosystem is a rough measure of the complexity of the system. The number of degrees of freedom equates approximately to the number of pieces of information required in order to specify the state of the system. Consider an example from metrology. Gauge blocks are precision-engineered straight metal bars with nominally flat, parallel faces that are used as length artefacts [22]. Gauge blocks were introduced by the Swedish engineer C. E. Johansson in the late 19th century and were soon adopted by manufacturers such as Henry Ford. Today they are still used widely as high-accuracy length standards (with a calibrated accuracy of better than 1 part in 10^7) and can be credited with enabling the development of high precision engineering. The gauge block is used to define a length at the standard temperature of 20 °C. If we assume the temperature of the gauge block is 20 °C, then there is nominally one degree of freedom associated with the system, its length. One length measurement is required to estimate the state of the system. If the gauge block is not held at 20 °C, then it is necessary to know its temperature and coefficient of thermal expansion, two further degrees of freedom requiring further measurements, e.g., the measured lengths at two temperatures. If we no longer assume that the gauge block is straight and the end faces are parallel and flat, additional degrees of freedom are required to specify the non-ideal geometry. We may also need to relax the assumption that the temperature of the gauge is constant along its length. A typical calibration of a gauge block will list ten to twenty influence factors or degrees of freedom that need to be estimated in order to determine the calibrated length.

The high accuracies achieved within national measurement institutes are largely due to the elimination of troublesome environmental influence factors using highly specialised laboratory facilities in which these influence factors are tightly controlled. The number of degrees of freedom associated with such systems is modest, usually a few tens of influence factors, and the number of measurements required to estimate the state of the system is equally modest. Although the measurement infrastructure can be regarded as a tightly monitored and controlled ecosystem, it operates in the real world. Measurement systems are subject to environmental factors, particularly temperature, and it is difficult to eliminate or account for their effects completely; instruments can develop faults; operators can make mistakes. Even at the higher levels of the measurement chain, inter-laboratory comparisons provide data in which the results are not self-consistent within the stated uncertainties, and steps have to be taken to resolve the inconsistencies, either to arrive at a consensus value for the artefact being measured [11, 18, 20, 40] or to diagnose where and why the inconsistencies arise [16, 19, 26] and hopefully improve the performance of the individual laboratories concerned.
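Returning to the gauge block example, the sketch below shows how the two extra degrees of freedom (temperature and expansion coefficient) enter the reduction of a measured length to 20 °C and contribute to its uncertainty; the numerical values are illustrative, not NPL calibration data.

```python
import math

def length_at_20C(l_measured, temp_C, alpha, u_temp, u_alpha):
    """Reduce a measured gauge block length to 20 degC with a rough standard uncertainty.

    l_measured : length measured at temperature temp_C (mm)
    alpha      : coefficient of linear thermal expansion (1/degC)
    u_temp     : standard uncertainty of the temperature (degC)
    u_alpha    : standard uncertainty of alpha (1/degC)
    """
    dT = temp_C - 20.0
    l20 = l_measured / (1.0 + alpha * dT)
    # First-order sensitivity coefficients for the two extra degrees of freedom,
    # temperature and expansion coefficient (GUM-style propagation).
    u = math.hypot(l20 * alpha * u_temp, l20 * dT * u_alpha)
    return l20, u

# Illustrative values for a nominal 100 mm steel gauge block measured at 20.5 degC.
l20, u = length_at_20C(100.00021, 20.5, 11.5e-6, u_temp=0.05, u_alpha=1.0e-6)
print(f"length at 20 degC: {l20:.5f} mm, standard uncertainty ~ {u * 1e6:.1f} nm")
```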


Many of the challenges associated with environmental monitoring, smart infrastructure and structural health monitoring necessarily involve measurements outside the laboratory, which are subject to far more influence factors that are usually only partially understood. These systems are characterised by a high number of degrees of freedom, the exact number of which might be very difficult to estimate and may indeed evolve over time. However, if we have a large number of measurements of variables associated with the system, we may hope that we have sufficient information to estimate the state of the ecosystem. If the system has, say, 200 degrees of freedom, then we will need at least 200 measurements (or other information items) in order to estimate the state of the system.

We use the term redundancy to describe the situation where there are more measurements than degrees of freedom (unknown parameters). In this situation, some of the measurements will be redundant in the sense that the state of the system can be estimated from a subset of the measurements. However, it will usually be better to use all the measurements to estimate the system parameters so that, e.g., measurement noise is averaged out. In order for this averaging process to be effective it is necessary to have an estimate of the uncertainties associated with each data point so that each piece of information is given an appropriate weight. Similarly, in integrating data from a number of data streams, it is necessary to be able to weight each data stream appropriately. Without these uncertainty estimates, it is quite possible that a weighted averaging process would lead to worse estimates than those based on a subset of the data. In cases where some of the sources are providing rogue data, it is important (but often difficult) to identify these rogue sources and eliminate or de-weight them so that they do not compromise the averaging process.

A simple equating of the number of sensor measurements with the (estimated) number of degrees of freedom of the ecosystem is not sufficient to show that all parameters of the system can be estimated, since it is possible (and in fact quite likely) that some of the measurements made are providing information that replicates information already available from other measurements or information sources, leaving other aspects of the system unresolved. We use the term effective redundancy to describe the degree to which parameters are estimated on the basis of more than one data point. A sensitivity analysis can be used to determine the contribution of each measurement source to the estimation of each parameter specifying the state of the system, and to evaluate the effective redundancy. We use the term resilience to denote the ability of the data transformation process to detect and account for rogue sources of data. In a model fitting context [2] the term robust estimation is also used [32, 33]. If an estimate of a parameter value depends on only one data point, then there is no corroborative information available to validate this data point; if the data point is spurious there would be no way of knowing, and inferences based on such data could well be wrong. The more effective redundancy there is, the more scope there is for detecting rogue data and building resilient systems.
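The following is a minimal sketch of uncertainty-weighted averaging together with a crude resilience check. The median-based flagging rule and the numbers are illustrative assumptions, not a recommended procedure; its only purpose is to show how redundancy plus uncertainty estimates allow a rogue sensor to be detected and de-weighted.

```python
def weighted_mean(values, uncertainties):
    """Inverse-variance weighted mean and its standard uncertainty."""
    weights = [1.0 / u**2 for u in uncertainties]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean, (1.0 / sum(weights)) ** 0.5

def flag_rogues(values, uncertainties, k=3.0):
    """Indices of points lying more than k standard uncertainties from the median.

    The median is used as the reference because it is resistant to a small
    number of rogue values."""
    med = sorted(values)[len(values) // 2]
    return [i for i, (v, u) in enumerate(zip(values, uncertainties)) if abs(v - med) > k * u]

# Five sensors nominally observing the same quantity; the fourth has drifted.
values        = [20.1, 19.9, 20.0, 23.4, 20.2]
uncertainties = [0.2, 0.2, 0.3, 0.2, 0.2]
bad = flag_rogues(values, uncertainties)
keep = [i for i in range(len(values)) if i not in bad]
est, u_est = weighted_mean([values[i] for i in keep], [uncertainties[i] for i in keep])
print(f"suspect sensors: {bad}; estimate from the rest: {est:.2f} +/- {u_est:.2f}")
```

With no redundancy (a single sensor per quantity) no such check is possible, which is the point made above about effective redundancy and resilience.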

Model uncertainty and statistical learning


In traditional engineering disciplines in which systems behave according to well-known principles, e.g., Newtonian mechanics, the main sources of uncertainty lie in the accuracies associated with the measured data; very little of the uncertainty is associated with the


models. For complex ecosystems, the uncertainties associated with the model of the system are large initially, but can be reduced as more observations about the system are gathered and analysed. Traditional uncertainty evaluation methodologies such as the GUM [7] do not cope well with the concept of uncertainty associated with a model. Their main focus is on how uncertainty associated with measured data propagates through to parameters derived from the data for a fixed, known model of the system. So how can we account for model uncertainty?

Consider the following example. The response y of a system to a stimulus variable x is expected to be approximately linear, so a response model of the form y = a_1 + a_2 x could be appropriate. However, it is also suspected that some systematic effects present in the system could lead to quadratic or even higher order responses. Thus we may wish to consider modelling the response in terms of a higher order polynomial, y = a_1 + a_2 x + ... + a_n x^(n-1), for some n. Given response data (x_i, y_i), i = 1, ..., m, we could fit in turn each polynomial for n = 2, ..., m and choose the n that provides the best fit to the data. This approach is referred to as model selection: from a number of competing models, choose the best one on the basis of the evidence supplied by the data [15, 37, 54]. The selection of the best model is usually based on criteria [1, 34, 47] that attempt to balance the goodness of fit to the data against finding the most economical acceptable fit to the data, i.e., one that minimises the complexity of the model, often related to the degrees of freedom (number of free parameters) associated with the model. In practical terms, the measured data will not be exact and we will be satisfied with a model fit that passes close enough to the data that the difference between the model fit and the measured data can easily be accounted for in terms of measurement noise.

In the example of fitting a polynomial to data we are concerned with explaining the response in terms of basis functions x^j of the stimulus variable x. For many complex systems, we wish to explain a response y in terms of functions f_j(x_1, ..., x_p) of a possibly large number p of stimulus variables, e.g.,

    y = a_0 + Σ_{j=1}^{p} a_j x_j,

involving linear functions of the stimulus variables x_j, but mixed components such as x_j x_k or higher order terms might also be included. Algorithms such as principal components analysis (PCA) and partial least squares (PLS) attempt to model the data using a reduced number of linear combinations of (functions of) the variables x_j; see, e.g., [28]. These combinations will generally involve all the variables, which means that all have to be taken into account. In many cases, we wish to provide an adequate description of the behaviour of the system in terms of as few of the variables as possible (ideally those determining the degrees of freedom of the system), an approach referred to as subset selection. For even a modest number of variables, testing all possible subsets can very quickly become computationally infeasible. The LASSO algorithm (least absolute shrinkage and selection operator [53]) arrives at a subset by minimising a function of the form


    F(a, λ) = Σ_i ( y_i - Σ_j a_j f_j(x_{i1}, ..., x_{ip}) )^2 + λ Σ_j |a_j|.

The first summand provides a measure of the goodness of fit of the model to the data, while the second summand is a penalty term that penalises nonzero coefficients a_j. The size of the penalty is controlled by λ. At the solution of this optimisation problem, only those terms with nonzero coefficients a_j are included in the model. A related algorithm is the least angle regression algorithm (LARS, [24]), a stepwise procedure in which the functions f_j(x_1, ..., x_p) are included in the model, one by one, until a suitable fit to the data is achieved. The LARS algorithm is remarkably efficient and can be tuned to give the LASSO solution [28]. Subset selection algorithms are important in determining key variables that need to be measured in order to understand the response behaviour.

Model selection algorithms can be used to take into account the possibility of rogue sensors. Suppose there are m sensors providing data y_i and associated uncertainties u_i. The initial model M_0 may be that all the sensors are working correctly, so that u_i is a valid statement of the uncertainty associated with y_i. The next category of models, M_i, i = 1, ..., m, is such that M_i regards the ith sensor as unreliable, so that u_i = ∞, i.e., the ith sensor's data is given no weight. Additional models M_{ij}, etc., may be put forward. The selected model will indicate which sensor measurements should be included in the analysis. In the metrology context, such approaches, using a computationally efficient scheme, have been applied to the analysis of inter-laboratory comparison data [20] and sensor network data [17]. After the best model has been chosen using model selection, standard uncertainty calculations can be applied using this model. The approach regards the accepted model as exact and therefore contributing no uncertainty to derived information, but the uncertainty associated with the model has been at least partially accounted for by the fact that alternative models have been considered and rejected only because the data does not sufficiently support them.

If the data provides strong evidence that one model is to be preferred above all others, this approach is effective and valid. What happens if the data is consistent with more than one of the proposed models? The model averaging approach [31, 43] does what it says: it produces a model that averages across all the proposed models using a weighting scheme that reflects the degree of support for each model provided by the data. Model averaging agrees with model selection if only one weight is non-zero, corresponding to the selected model. Model averaging is most easily interpreted in a Bayesian framework [3, 4, 13, 27, 50] in which we give a prior estimate of the belief in each possible model and use the data to update these beliefs. The model predictions are then derived from an averaging process using the posterior beliefs, i.e., the beliefs derived after the data has been analysed. The space of all potential models is parameterised by additional hyper-parameters that are given a prior probability distribution, and this distribution is updated as more data becomes available.

The term statistical learning (other terms used similarly are data mining and machine learning [12, 28, 29, 38, 44]) is used to denote analysis methods that build as much as possible on the observed data rather than on prior models of how the system is expected to behave.
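Returning to the LASSO idea above, the following is a minimal sketch using scikit-learn's Lasso estimator on synthetic data; the parameter alpha plays the role of λ in the penalty, and all the data and settings are invented for illustration.

```python
# Requires numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Ten candidate stimulus variables, but the response really depends on only two of them.
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 4] + rng.normal(scale=0.1, size=n)

# The penalty weight (alpha here, lambda in the text) controls how strongly
# non-zero coefficients are penalised and hence how small the selected subset is.
model = Lasso(alpha=0.05).fit(X, y)
selected = [j for j, a in enumerate(model.coef_) if abs(a) > 1e-6]
print("variables retained by the LASSO:", selected)   # typically [0, 4]
```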


The term data assimilation refers to the cyclical updating of the information on the basis of observations, e.g., the daily updating of weather forecasts, for systems that evolve over time. The state of the system is estimated on the basis of two sources of information: firstly, a model that predicts the current state on the basis of information from the past and, secondly, observations made of the current state. The predictions and the observations are then combined to form a best estimate of the current state and, if necessary, a prediction for the next time step. In order to combine the two sources of information effectively, it is necessary to have estimates of the uncertainties associated with each source of information. The Kalman filter [36, 51] is an early example of this type of approach and it, and its generalisations, have proved effective in a wide range of applications, e.g., [14, 25, 46, 52]. If the underlying system changes, e.g., through the addition of a new subsystem or the failure of another subsystem, then the observed data will likely not be consistent with the current description of the model. If this is detected, then the belief in the current model should be reduced so that the new response behaviour can be learnt. Change point or tipping point analysis is concerned with detecting qualitative changes in a system response. Much of statistical process control is about the early detection [41, 45, 49] of a change in the behaviour of the manufacturing process.

The word learning reflects the fact that, as more data is gathered and analysed, the range of possible models that are consistent with the data is reduced, so that over time a better understanding of how the ecosystem behaves is developed and used to make better predictions, inferences and decisions. Model selection can be regarded as a learning process: the model to be used has been derived on the basis of information learnt from the data. Model averaging is a learning process: the prior belief in the potential models is updated on the basis of the data as it becomes available.
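The scalar case illustrates the predict/assimilate cycle. The sketch below combines a model prediction and an observation of the same quantity with weights set by their uncertainties (the scalar Kalman update); the process-noise term and all numbers are invented for illustration.

```python
def assimilate(x_pred, u_pred, y_obs, u_obs):
    """Combine a prediction and an observation of the same scalar quantity,
    weighting each by its variance (the scalar Kalman update)."""
    gain = u_pred**2 / (u_pred**2 + u_obs**2)
    x_new = x_pred + gain * (y_obs - x_pred)
    u_new = ((1.0 - gain) * u_pred**2) ** 0.5
    return x_new, u_new

# A quantity assumed constant apart from slow drift; each cycle the prediction
# uncertainty is inflated a little before the new observation is assimilated.
x, u = 20.0, 1.0                      # initial estimate and its standard uncertainty
for y in [20.4, 19.8, 20.1, 20.3]:    # successive noisy observations, u_obs = 0.3
    u = (u**2 + 0.05**2) ** 0.5       # process noise: the state may have drifted
    x, u = assimilate(x, u, y, 0.3)
    print(f"estimate {x:.2f} +/- {u:.2f}")
```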

Information integrity
We use the term information integrity to reflect the degree of belief or trust we have in the information and how it was derived. Integrity comprises the following components:

Uncertainty quantification. The data transformation process will generally involve assembling information from different data sources. Uncertainty estimates at each stage are required so that each data stream is given an appropriate weight. The uncertainties at each stage will propagate through to the uncertainties associated with the output information. However, the uncertainties associated with the output information should also take into account uncertainties associated with the models used in the data transformation, and uncertainties associated with prior assumptions, not just the uncertainties associated with measured data.

Validation and self-validation. The data transformation processes will be based on models of the ecosystem. It is necessary to show that these models do reflect the actual behaviour of the system, e.g., by showing that, on the basis of a subset of the data, the model can estimate, to within the stated uncertainties, the response of the system as monitored by other data streams. Self-validation refers to the ability of the data transformation process to detect when the underlying model of the system fails to


apply. If insufficient information is available, it is possible that unresolved degrees of freedom associated with the system can invalidate inferences derived from the output information. The system should be able to diagnose common failure modes and ideally have sufficient redundancy to be able to detect less common or unforeseen failure modes.

Documentation. In order to justify a decision, it is necessary to understand the information used to make the decision and how it was arrived at. The input data will likely be one component, but it is also necessary to document the models that form the basis of the data transformation process and the evidence available that these models are appropriate for the ecosystem at hand.

We may regard the measurement infrastructure as a high-integrity system. The components of the system are well understood (measurement instruments are designed according to accepted physical theory), the analysis of the data associated with the system is undertaken using known methods, such as those described in the GUM and its supplements [8, 9], all stages are documented, the documentation is available (calibration certificates), and components of the system are regularly tested (periodic re-calibrations, inter-laboratory comparisons) to ensure that they are operating as expected. The measurements made by a supermarket may not be particularly accurate, but we can be largely confident that they are in accordance with their agreed operating performance, on the basis of data with known provenance. Statistical process control can lead to high-integrity systems: state-of-the-art manufacturing systems can deliver defect rates measured in parts per million.

Information integrity can be compromised in many ways: invalid data, the use of models that do not accord with the actual behaviour of the ecosystem, etc. Documentation is required to assess the integrity of information arising from a data transformation process. Often the raw data is easy to store (although the sheer volume may be problematic). What is more difficult to store is the metadata that provides the correct context for the data, e.g., the calibration history of the sensors providing the data, the location of the sensors, and what is happening in the immediate environment of the sensors. Even more difficult to document properly are the models and algorithms that form the basis of the data transformation process. If the data transformation process is regarded as a black box, then the integrity associated with the outputs of the black box is very difficult to assess, even if all the inputs are known to be valid.
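As an illustration of the validation idea above (checking that a model reproduces held-back data streams to within the stated uncertainties), the sketch below applies a simple consistency test: an observation and a prediction are judged compatible when their difference lies within a small multiple of the combined standard uncertainty. The data and the coverage factor k = 2 are illustrative assumptions.

```python
def consistent(y_observed, u_observed, y_predicted, u_predicted, k=2.0):
    """Is an observation consistent with a model prediction, given both uncertainties?

    The values are judged consistent when their difference is within k combined
    standard uncertainties (k = 2 corresponds roughly to 95 % coverage for
    normally distributed errors)."""
    return abs(y_observed - y_predicted) <= k * (u_observed**2 + u_predicted**2) ** 0.5

# Validate a model against a data stream that was held back from the fitting:
held_out  = [(20.1, 0.2), (19.8, 0.2), (21.6, 0.2)]   # (observation, standard uncertainty)
predicted = [(20.0, 0.1), (19.9, 0.1), (20.2, 0.1)]   # model predictions for the same points
failures = [i for i, ((y, uy), (p, up)) in enumerate(zip(held_out, predicted))
            if not consistent(y, uy, p, up)]
print("inconsistent points (possible model failure or rogue data):", failures)
```

Repeated failures of such a check indicate either that the model no longer applies (a self-validation trigger) or that a data source has become unreliable; either way, the stated uncertainties are what make the check possible.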

References
[1] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19:716–723, 1974.
[2] R. M. Barker, M. G. Cox, A. B. Forbes, and P. M. Harris. Software Support for Metrology Best Practice Guide No. 4: Modelling Discrete Data and Experimental Data Analysis. Technical Report DEM-ES 018, National Physical Laboratory, Teddington, 2007.
[3] J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer, New York, 2nd edition, 1985.


[4] J. M. Bernardo and A. F. M. Smith. Bayesian Theory. Wiley, New York, 1994.
[5] BIPM. Mutual recognition of national measurement standards and of calibration and measurement certificates issued by national metrology institutes. Technical report, Bureau International des Poids et Mesures, Sèvres, France, 1999.
[6] BIPM. The International System of Units (SI). Bureau International des Poids et Mesures, Paris, 8th edition, 2005.
[7] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML. Evaluation of measurement data – Guide to the expression of uncertainty in measurement. Joint Committee for Guides in Metrology, JCGM 100:2008.
[8] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML. Evaluation of measurement data – Supplement 1 to the Guide to the expression of uncertainty in measurement – Propagation of distributions using a Monte Carlo method. Joint Committee for Guides in Metrology, JCGM 101:2008.
[9] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML. Evaluation of measurement data – Supplement 2 to the Guide to the expression of uncertainty in measurement – Extension to any number of output quantities. Joint Committee for Guides in Metrology, JCGM 102:2011.
[10] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML. International vocabulary of metrology – basic and general concepts and associated terms. Joint Committee for Guides in Metrology, JCGM 200, 2008.
[11] R. T. Birge. Probable values of the general physical constants. Rev. Mod. Phys., 1:1–73, 1929.
[12] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[13] G. E. P. Box and G. C. Tiao. Bayesian Inference in Statistical Analysis. Wiley Classics Library edition, Wiley, New York, 1992 (first published 1973).
[14] R. G. Brown and P. Y. C. Hwang. Introduction to Random Signals and Applied Kalman Filtering. Wiley, New York, 3rd edition, 1997.
[15] H. Chipman, E. I. George, and R. E. McCulloch. The Practical Implementation of Bayesian Model Selection. Institute of Mathematical Statistics, Beachwood, Ohio, 2001.
[16] A. G. Chunovkina, C. Elster, I. Lira, and W. Wöger. Analysis of key comparison data and laboratory biases. Metrologia, 45(2):211–216, 2008.
[17] M. A. Collett, M. G. Cox, M. Duta, T. J. Esward, P. M. Harris, and M. Henry. The application of self-validation to wireless sensor networks. Meas. Sci. Technol., 19, 2008.
[18] M. G. Cox. A discussion of approaches for determining a reference value in the analysis of key-comparison data. In P. Ciarlini, A. B. Forbes, F. Pavese, and D. Richter, editors, Advanced Mathematical Tools in Metrology IV, pages 45–65, Singapore, 2000. World Scientific.
[19] M. G. Cox, A. B. Forbes, J. Flowers, and P. M. Harris. Least squares adjustment in the presence of discrepant data. In P. Ciarlini, M. G. Cox, F. Pavese, and G. B. Rossi, editors, Advanced Mathematical and Computational Tools in Metrology VI, pages 37–51, Singapore, 2004. World Scientific.
[20] M. G. Cox. The evaluation of key comparison data: determining the largest consistent subset. Metrologia, 44(3):187–200, 2007.
[21] W. E. Deming. On probability as a basis for action. The American Statistician, 29(4):146–152, 1975.


[22] T. Doiron and J. Beers. The Gauge Block Handbook. Technical Report Monograph 180, National Institute of Standards and Technology, Gaithersburg, 2005.
[23] J. A. Eddy. The Maunder minimum. Science, 192(4245):1189–1202, 1976.
[24] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32:407–499, 2004.
[25] G. Evensen. Data Assimilation: The Ensemble Kalman Filter. Springer, 2009.
[26] A. B. Forbes and C. Perruchet. Measurement systems analysis: concepts and computational approaches. In IMEKO World Congress, September 18–22, 2006, Rio de Janeiro, 2006.
[27] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman & Hall/CRC, Boca Raton, Fl., 2nd edition, 2004.
[28] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, New York, 2nd edition, 2011.
[29] S. Haykin. Neural Networks: A Comprehensive Foundation. Macmillan, New York, 2nd edition, 1999.
[30] J. D. Hays, J. Imbrie, and N. J. Shackleton. Variations in the Earth's orbit: pacemaker of the ice ages. Science, 194(4270):1121–1132, 1976.
[31] J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky. Bayesian model averaging: a tutorial. Statistical Science, 14:382–401, 1999.
[32] P. J. Huber. Robust estimation of a location parameter. Ann. Math. Stat., 35:73–101, 1964.
[33] P. J. Huber. Robust Statistics. Wiley, New York, 1980.
[34] C. M. Hurvich and C. Tsai. Regression and time series model selection in small samples. Biometrika, 76:297–307, 1989.
[35] International Organization for Standardization, Geneva. ISO 11462: Guidelines for implementation of statistical process control (SPC) – Part 1: Elements of SPC, 2001.
[36] R. E. Kalman. A new approach to linear filtering and prediction problems. Trans. ASME J. Basic Engr., pages 35–45, 1960.
[37] H. Linhart and W. Zucchini. Model Selection. Wiley, New York, 1986.
[38] D. J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.
[39] D. C. Montgomery. Design and Analysis of Experiments. John Wiley & Sons, New York, 5th edition, 1997.
[40] R. C. Paule and J. Mandel. Consensus values and weighting factors. J. Res. Natl. Bur. Stand., 87:377–385, 1982.
[41] M. Pollak. Optimal detection of a change in distribution. Annals of Statistics, 13(1):206–227, 1985.
[42] F. Pukelsheim. Optimal Design of Experiments. SIAM, Philadelphia, 2006. Reproduction of the 1993 book published by John Wiley and Sons, New York.
[43] A. E. Raftery, D. Madigan, and J. A. Hoeting. Bayesian model averaging for linear regression. Journal of the American Statistical Association, 92:179–191, 1997.
[44] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, Mass., 2006.
[45] S. Roberts. A comparison of some control chart procedures. Technometrics, 8(3):411–430, 1966.
[46] S. K. Sahu and K. V. Mardia. A Bayesian kriged Kalman model for short-term forecasting of air pollution levels. Applied Statistics, 54:223–244, 2005.


[47] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6:461–464, 1978.
[48] W. A. Shewhart. Economic Control of Quality of Manufactured Product. D. Van Nostrand Company, New York, 1931.
[49] A. N. Shiryaev. On optimum methods for quickest detection problems. Theory of Probability and Its Applications, 8:22–46, 1963.
[50] D. S. Sivia. Data Analysis: A Bayesian Tutorial. Clarendon Press, Oxford, 1996.
[51] H. W. Sorenson. Least-squares estimation: from Gauss to Kalman. IEEE Spectrum, 7:63–68, July 1970.
[52] H. W. Sorenson, editor. Kalman Filtering: Theory and Application. IEEE, New York, 1985.
[53] R. Tibshirani. Regression shrinkage and selection via the LASSO. J. Roy. Stat. Soc., Series B, 58:267–288, 1996.
[54] X.-S. Yang and A. B. Forbes. Model and feature selection in metrology data approximation. In E. H. Georgoulis, A. Iske, and J. Levesley, editors, Approximation Algorithms for Complex Systems, pages 293–307, Heidelberg, 2010. Springer-Verlag.

