Calculus of Thought: Neuromorphic Logistic Regression in Cognitive Machines
About this ebook
Calculus of Thought: Neuromorphic Logistic Regression in Cognitive Machines is a must-read for all scientists, presenting a very simple computational method designed to simulate big-data neural processing. The book is inspired by Gottfried Leibniz's idea of the Calculus Ratiocinator: that machine computation should be developed to simulate human cognitive processes, thus avoiding problematic subjective bias in analytic solutions to practical and scientific problems.
The reduced error logistic regression (RELR) method is proposed as such a "Calculus of Thought." The book reviews how RELR's completely automated processing may parallel important aspects of explicit and implicit learning in neural processes. It emphasizes that RELR is really just a simple adjustment to already widely used logistic regression, and presents new RELR applications that go well beyond standard logistic regression in prediction and explanation. Readers will learn how RELR solves some of the most basic problems in today's big and small data related to high dimensionality, multicollinearity, and cognitive bias in capricious outcomes commonly involving human behavior.
- Provides a high-level introduction and detailed reviews of the neural, statistical and machine learning knowledge base as a foundation for a new era of smarter machines
- Argues that smarter machine learning to handle both explanation and prediction without cognitive bias must have a foundation in cognitive neuroscience and must embody similar explicit and implicit learning principles that occur in the brain
Daniel M Rice
Daniel M. Rice is Principal and Senior Scientist of Rice Analytics. He founded the business in early 1996 as a sole proprietorship, but it was incorporated into its current structure in 2006. Prior to 1996, he was an assistant professor at the University of California-Irvine and the University of Southern California. Dan has almost 25 years of research project and advanced statistical modeling experience for major organizations that include the National Institute on Aging, Eli Lilly, Anheuser-Busch, Sears Portrait Studios, Hewlett-Packard, UBS, and Bank of America. He has a Ph.D. from the University of New Hampshire in Cognitive Neuroscience and Postdoctoral training in Applied Statistics from the University of California-Irvine. Dan is a previous recipient of an Individual National Research Service Award from the National Institutes of Health and is author of more than 20 publications, many of which are in conference proceedings and peer-reviewed journals in cognitive neuroscience and statistics.
Calculus of Thought
Neuromorphic Logistic Regression in Cognitive Machines
Daniel M. Rice
Table of Contents
Cover image
Title page
Copyright
Preface
A Personal Perspective
Chapter 1. Calculus Ratiocinator
Abstract
1 A Fundamental Problem with the Widely Used Methods
2 Ensemble Models and Cognitive Processing in Playing Jeopardy
3 The Brain's Explicit and Implicit Learning
4 Two Distinct Modeling Cultures and Machine Intelligence
5 Logistic Regression and the Calculus Ratiocinator Problem
Chapter 2. Most Likely Inference
Abstract
1 The Jaynes Maximum Entropy Principle
2 Maximum Entropy and Standard Maximum Likelihood Logistic Regression
3 Discrete Choice, Logit Error, and Correlated Observations
4 RELR and the Logit Error
5 RELR and the Jaynes Principle
Chapter 3. Probability Learning and Memory
Abstract
1 Bayesian Online Learning and Memory
2 Most Probable Features
3 Implicit RELR
4 Explicit RELR
Chapter 4. Causal Reasoning
Abstract
1 Propensity Score Matching
2 RELR's Outcome Score Matching
3 An Example of RELR's Causal Reasoning
4 Comparison to Other Bayesian and Causal Methods
Chapter 5. Neural Calculus
Abstract
1 RELR as a Neural Computational Model
2 RELR and Neural Dynamics
3 Small Samples in Neural Learning
4 What about Artificial Neural Networks?
Chapter 6. Oscillating Neural Synchrony
Abstract
1 The EEG and Neural Synchrony
2 Neural Synchrony, Parsimony, and Grandmother Cells
3 Gestalt Pragnanz and Oscillating Neural Synchrony
4 RELR and Spike-Timing-Dependent Plasticity
5 Attention and Neural Synchrony
6 Metrical Rhythm in Oscillating Neural Synchrony
7 Higher Frequency Gamma Oscillations
Chapter 7. Alzheimer's and Mind–Brain Problems
Abstract
1 Neuroplasticity Selection in Development and Aging
2 Brain and Cognitive Changes in Very Early Alzheimer's Disease
3 A RELR Model of Recent Episodic and Semantic Memory
4 What Causes the Medial Temporal Lobe Disturbance in Early Alzheimer's?
5 The Mind–Brain Problem
Chapter 8. Let Us Calculate
Abstract
1 Human Decision Bias and the Calculus Ratiocinator
2 When the Experts are Wrong
3 When Predictive Models Crash
4 The Promise of Cognitive Machines
Appendix
A1 RELR Maximum Entropy Formulation
A2 Derivation of RELR Logit from Errors-in-Variables Considerations
A3 Methodology for Pew 2004 Election Weekend Model Study
A4 Derivation of Posterior Probabilities in RELR's Sequential Online Learning
A5 Chain Rule Derivation of Explicit RELR Feature Importance
A6 Further Details on the Explicit RELR Low Birth Weight Model in Chapter 3
A7 Zero Intercepts in Perfectly Balanced Stratified Samples
A8 Detailed Steps in RELR's Causal Machine Learning Method
Notes and References
Index
Copyright
Academic Press is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
© 2014 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher
Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: permissions@elsevier.com. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material
Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-410407-5
For information on all Academic Press publications visit our web site at store.elsevier.com
Printed and bound in USA
14 15 16 17 10 9 8 7 6 5 4 3 2 1
Preface
A Personal Perspective
I first heard the phrase Calculus of Thought as a second year graduate student in 1981 when I took a seminar called Brain and Behavior taught by our professor James Davis at the University of New Hampshire. He gave us a thorough grounding in the cognitive neuroscience of that day, and he spent significant time on various models of neural computation. The goal of cognitive research in neuroscience, he constantly reminded us, was to discover the neural calculus, which he took to be a complete and unifying understanding of how the brain performs computation in diverse functions such as coordinated motor action, perception, learning and memory, decision making, and causal reasoning. He suggested that if we had such a calculus, then there would be truly amazing practical artificial intelligence applications beyond our wildest dreams. This was before the widespread popularity of artificial neural network methods in the mid-1980s, as none of the quantitative models that we learned about were artificial neural networks, but instead were grounded in empirical data in neural systems like the primary visual cortex and the basal ganglia. He admitted that the models that we were taught fell way short of such a calculus, but he did fuel an idea within me that has stayed for more than 30 years.
I went on to do my doctoral dissertation with my thesis advisor Earl Hagstrom on timing relationships in auditory attention processing as reflected by the scalp-recorded electroencephalographic (EEG) alpha rhythm. During the early and mid-1980s, there was not a lot of interest in the EEG as a window into normal human cognition, as the prevailing sentiment was that the brain's electric field potentials were too gross a measure to contain valuable information. We now have abundant evidence that brain field potential measures, such as the EEG alpha rhythm, are sensitive to oscillating and synchronous neural signals that do reflect the rhythm and time course of cognitive processing. My dissertation study was one of a handful of initial studies to document a parallelism between cognition and oscillating neural synchrony as measured by the EEG.¹ Through that dissertation study, and the advice that I also got from other faculty committee members, John Limber and Rebecca Warner, I learned the importance of well-controlled experimental findings in providing more reliable explanations. Yet this dissertation did not address how basic cognitive computations might be performed by oscillating neural synchrony mechanisms, and my thoughts have wandered back to how this Calculus of Thought might work ever since.
I did postdoctoral fellowship research with Monte Buchsbaum at the University of California-Irvine Brain Imaging Center. Much of our work focused on abnormal temporal lobe EEG slowing in probable Alzheimer's patients and related temporal lobe slowing in nondemented older adults. We published one paper in 1990 that replicated other studies, including one from Monte's lab, showing abnormal temporal lobe EEG slowing in Alzheimer's patients compared to nondemented elderly, but we critically refined the methodology to get a higher fidelity measurement.² We published a second paper in 1991 that used this refined methodology to make the first claim that Alzheimer's disease must have an average preclinical period of at least 10 years.³ Our basic finding was that a milder form of the temporal lobe EEG slowing observed in Alzheimer's patients was seen in nondemented older adults with minor memory impairment. This memory impairment had exactly the same profile that neuropsychologist Brenda Milner had observed in the famous medial temporal lobe patient H.M., the most famous clinical case study in neuroscience:⁴ normal immediate recall ability, but dramatically greater forgetting a short time later. Other than this memory deficit, we could find nothing else wrong with these nondemented older adults on cognitive and intelligence tests. Given the nature of the memory deficit and the location of the EEG abnormality, we suggested that the focus of this abnormality must be in the medial temporal lobe of the brain. Given the prevalence of such EEG slowing in nondemented older adults and the prevalence of Alzheimer's disease, we calculated that this must be a preclinical sign of Alzheimer's disease present 10 years before the clinical diagnosis.
We had no idea how abnormal neural synchrony might be related to dysfunction in memory computations, but ever since my thoughts have wandered back to how this Calculus of Thought could go awry early in Alzheimer's disease.
Our claim that there is a long preclinical period in Alzheimer's disease with a major focal abnormality originating in the medial temporal lobe memory system has now become the prevailing view in Alzheimer's research.⁵,⁶,⁷ The evidence for the long preclinical period is such that the National Institute on Aging issued new diagnostic recommendations in 2011 to incorporate preclinical Alzheimer's disease into the clinical Alzheimer's disease diagnosis.⁸ Yet, when I moved to a new assistant professor position at the University of Southern California (USC) in the early 1990s and tried to get National Institutes of Health (NIH) funding to validate this hypothesis with a longitudinal, prospective study, my proposals were repeatedly not funded. It was not controversial that Alzheimer's starts with medial temporal lobe memory system problems, as almost everyone believed that was true. However, I was a complete newcomer, and the idea that Alzheimer's had such a long preclinical period was just too unexpected to be accepted as reasonable by the quorum needed to get NIH funding. It was then that I began to have recollections of Thomas Kuhn's theory about bias in science, taught by my undergraduate history of science professor Charles Bonwell many years earlier. Kuhn warned that the established scientific community is often overinvested in the basic theory in its field, which he called the paradigm.⁹ Kuhn argued that this bias precludes most scientists from considering new results and hypotheses that are inconsistent with the paradigm. In the early 1990s, the entire Alzheimer's clinical diagnostic process and pharmaceutical research were concerned only with actual demented patients. So any result or explanation suggesting a long, 10-year preclinical period was not yet welcome. As it turns out, the only hope to prevent and treat Alzheimer's disease now seems to be in this very early period when the disease is still mild,¹⁰ as drug studies in actual demented patients have not really worked. At that time, I was not yet familiar with Leibniz's concept of the Calculus Ratiocinator, a thought calculus machine designed to generate explanations that avoid such bias. I did, though, start to think that any unbiased machine that could generate the most reasonable hypotheses based upon all available data would be useful.
Fortunately, I had wider research interests than Alzheimer's disease, as I had an interest in quantitative modeling of the brain's electric activity as a means to understand the brain's computations. One day while still at UC-Irvine, I attended a seminar given by a graduate student on maximum entropy and information theory in a group organized by mathematical cognitive psychologists Duncan Luce and Bill Batchelder. I then began to study maximum entropy on my own and became interested in the possibility that this could be the basic computational process within neurons and the brain. When I ended up teaching at USC a few years later, I was fortunate enough to collaborate with engineering professor Manbir Singh and his graduate student Deepak Khosla on modeling the EEG with the maximum entropy method.¹¹ In our maximum entropy modeling, Deepak taught me a very interesting new way to smooth error out of EEG models using what we now call L2 norm regularization. Yet, I also began to think that there might be a better approach based upon probability theory to ultimately reduce error in regression models of the brain's neural computation. This thinking eventually led to the reduced error logistic regression (RELR) method that is the proposed Calculus of Thought, which is the subject of this book.
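The L2 norm regularization mentioned above can be illustrated with a short sketch. This is not the authors' EEG modeling code, only a generic illustration on synthetic data of how an L2 ("ridge") penalty smooths noise out of logistic-style coefficient estimates:

```python
import numpy as np

# Illustration only (synthetic data): L2 ("ridge") regularization smooths
# noise out of logistic-regression coefficients. This is the generic penalty
# idea, not the EEG inverse method described in the text.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
true_w = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_w))))

def fit_logistic(X, y, l2=0.0, lr=0.1, steps=3000):
    """Gradient descent on (negative log-likelihood)/n + (l2/2)*||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
    return w

w_mle = fit_logistic(X, y)            # plain maximum likelihood
w_ridge = fit_logistic(X, y, l2=1.0)  # L2-penalized ("smoothed")

# The penalty shrinks coefficients toward zero, damping noise-driven weights.
print(np.linalg.norm(w_mle), np.linalg.norm(w_ridge))
```

The design choice here is the standard one: the penalty trades a little bias for a large reduction in variance, which is what "smoothing error out of a model" amounts to.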
In April of 1992, I had a bird's eye view of the Los Angeles (LA) riots through my third floor laboratory windows at USC that faced the south central section of the city. I watched shops and houses burn, and I was shocked by the magnitude of the violence. But I also began to wonder whether human social behavior might be determined probabilistically, in ways similar to how causal mechanisms determine cognitive neural processes like attention and memory, so that it might be possible to predict and explain such behavior. After the riots, I listened to the heated debates about the causal forces involved in the 1992 LA riots, and again I began to wonder how objective such causal explanations of human behavior could ever be, given extremely strong biases. This was also true of most explanations of human behavior that I saw offered in social science, whether conservative or liberal. So it became clear to me that bias was the most significant problem in social science predictions and explanations of human behavior, and I began to believe that an unbiased machine learning methodology would be a huge benefit to the understanding of human behavior outcomes. However, I did not yet make the connection that a data-driven quantitative methodology that models neural computations could be the basis of this unbiased Calculus Ratiocinator machine.
Eventually, it became clear that my proposed longitudinal, prospective study to test the putative long preclinical period in Alzheimer's disease would not be funded. So, because NIH funding was required to get tenure at USC, I realized that I had better find another career. I resigned from USC in 1995, years before my tenure decision would have been made. Even though I had a contract that would be renewed automatically until at least 1999, I saw no reason to be dead weight for several years and then face a negative tenure decision. Instead, I wanted to get started on a more entrepreneurial career where I would have more control over my fate. So I pursued an applied analytics career. I have often enjoyed this career more than my academic career because the rewards and feedback are more immediate. Also, there is much more emphasis on problem solving as the goal rather than writing a paper or getting a grant, although sales ability is obviously always valued. Yet I eventually realized that the same bias problems that limit understanding of human behavior in academic neural, behavioral, and social science also limit practical business analytics. Time and time again I noticed decisions based upon biased predictive or explanatory models dictated by incorrect, questionable assumptions, so again I noticed a need for a data-driven Calculus Ratiocinator to generate reasonable and testable hypotheses and conclusions that avoid human bias.
Much of my applied analytic work has involved logistic regression, and I eventually learned that maximum entropy and maximum likelihood methods yield identical results in modeling the kind of binary signals that both neurons and business decisions produce. Thus, at some point I realized that I could continue my research into the RELR Calculus of Thought. But instead of focusing on the brain's computations, I could focus on practical real world business applications using the same computational method that I believed neurons use: the RELR method. In so doing, I eventually realized that RELR could be useful as the Calculus Ratiocinator that Leibniz had suggested is needed to remove biased answers to important, real world questions.
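The equivalence mentioned here can be checked numerically: at the maximum likelihood solution of a logistic regression, the model's predicted feature moments match the observed moments, which is exactly the constraint set of the corresponding maximum entropy problem. A minimal sketch on synthetic data (an illustration of the standard duality only, not RELR itself):

```python
import numpy as np

# Illustration only (synthetic data, not RELR): for binary outcomes, the
# maximum likelihood logistic fit is the dual of a maximum entropy problem.
# At the optimum, predicted feature moments equal observed feature moments,
# which are exactly the maximum entropy constraints.
rng = np.random.default_rng(1)
n = 300
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 3))])  # intercept + 3 features
w_true = np.array([0.3, 1.0, -0.5, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ w_true)))

# Newton's method for the logistic maximum likelihood estimate
w = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (y - p)                      # score: observed minus predicted moments
    H = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information matrix
    w += np.linalg.solve(H, grad)

p = 1.0 / (1.0 + np.exp(-X @ w))
# Moment matching at the optimum: sum_i x_i * p_i == sum_i x_i * y_i
print(np.max(np.abs(X.T @ p - X.T @ y)))  # effectively zero
```

Because the score equation sets observed minus predicted moments to zero, driving it to convergence makes the moment-matching residual vanish to machine precision.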
Many of today's data scientists have a background in statistics, pure mathematics, computer science, physics, engineering, or operations research. Yet these are academic areas that are not focused on studying human behavior, but instead on quantitative and technical issues related to more mechanical data processes. Many other analytic scientists, along with many analytic executives, have a background in behaviorally oriented fields like economics, marketing, business, psychology, and sociology. These academic areas do focus on studying human behavior, but are not heavily quantitatively oriented. There is a real need for a theoretical bridge between the more quantitative and more behavioral knowledge areas, and that is the intention of this book. So I believe that this book could appeal both to quantitative and to behaviorally oriented analytic professionals and executives, and fill important knowledge gaps in each case.
Unless analytic professionals today have a specific background in cognitive or computational neuroscience, it is unlikely that they will have a very good understanding of neuroscience and how the brain may compute cognition and behavior. Yet data-driven modeling that seeks to solve the Turing problem and mimic the performance of the human brain without its propensity for bias and error is probably the implicit goal of all machine-learning applications today. In fact, this book argues that cognitive machines will need to be neuromorphic, or based upon neuroscience, in order to simulate aspects of human cognition. So this book covers the most fundamental and important concepts in modern cognitive neuroscience, including neural dynamics, implicit and explicit learning, neural synchrony, Hebbian spike-timing-dependent plasticity, and neural Darwinism. Although this book presents a unified view of these fundamental neuroscience concepts in relationship to RELR computation mechanisms, it also should serve as a much needed introductory overview of these fundamental cognitive neuroscience principles for analytic professionals who are interested in neuromorphic cognitive machines.
Theories in science can simplify a subject and make that subject more understandable and applicable in practice, even when theories ultimately are modified with new data. So this book may help analytic professionals understand fundamental neural computational theory and cognitive neuroscientists understand practical issues surrounding real world machine learning applications. It is true that the theory of the brain's computation in this book should be viewed as a question rather than an answer. But, if cognitive neuroscientists and analytic professionals all start asking questions about whether the brain and machine learning each can work according to the information theory principles that are laid out in this book, then this book will have served its purpose.
Daniel M. Rice
St Louis, MO (USA)
Autumn, 2013
Chapter 1
Calculus Ratiocinator
Abstract
There is more need than ever to implement Leibniz's Calculus Ratiocinator suggestion concerning a machine that simulates human cognition but without the inherent subjective biases of humans. This need is seen in how predictive models based upon observational data often vary widely across different blinded modelers or across the same standard automated variable selection methods. This unreliability is amplified in today's Big Data with very high-dimensional sets of confounding candidate variables. The single biggest reason for this unreliability is uncontrolled error that is especially prevalent with highly multicollinear input variables. Modelers must therefore make arbitrary or biased subjective choices to overcome these problems, because widely used automated variable selection methods like standard stepwise methods are simply not built to handle such error. The stacked ensemble method that averages many different elementary models is reviewed as one way to avoid such bias and error and generate a reliable prediction, but it has disadvantages, including lack of automation and lack of transparent, parsimonious, and understandable solutions. A form of logistic regression that also models error events as a component of the maximum likelihood estimation, called Reduced Error Logistic Regression (RELR), is introduced as a method that avoids this multicollinearity error. An important neuromorphic property of RELR is that it shows stable explicit and implicit learning in small training samples and high-dimensional inputs, as observed in neurons. Other important neuromorphic properties of RELR consistent with a Calculus Ratiocinator machine are also introduced, including the ability to produce unbiased, automatic, stable maximum probability solutions and stable causal reasoning based upon matched sample quasi-experiments.
Given RELR's connection to information theory, these stability properties are the basis of the new stable information theory that is reviewed in this book with wide ranging causal and predictive analytics applications.
Keywords
Analytic science; Big data; Calculus Ratiocinator; Causal analytics; Causality; Cognitive neuroscience; Cognitive science; Data mining; Ensemble learning; Explanation; Explicit learning and memory; High dimension data; Implicit learning and memory; Information theory; Logistic regression; Machine learning; Matching experiment; Maximum entropy; Maximum likelihood; Multicollinearity; Neuromorphic; Neuroscience; Observational data; Outcome score matching; Prediction; Predictive analytics; Propensity score; Quasi-experiment; Randomized controlled experiment; Reduced error logistic regression (RELR); Stable information theory
It is obvious that if we could find characters or signs suited for expressing all our thoughts as clearly and as exactly as arithmetic expresses numbers or geometry expresses lines, we could do in all matters insofar as they are subject to reasoning all that we can do in arithmetic and geometry. For all investigations which depend on reasoning would be carried out by transposing these characters and by a species of calculus.
Gottfried Leibniz, Preface to the General Science, 1677.¹
Contents
1. A Fundamental Problem with the Widely Used Methods
2. Ensemble Models and Cognitive Processing in Playing Jeopardy
3. The Brain's Explicit and Implicit Learning
4. Two Distinct Modeling Cultures and Machine Intelligence
5. Logistic Regression and the Calculus Ratiocinator Problem
Toward the end of his life, starting in 1703, Gottfried Leibniz engaged in a 12-year feud with Isaac Newton over who first invented the calculus and who committed plagiarism. All serious scholarship now indicates that Newton and Leibniz developed calculus independently.² Yet stories about Leibniz's invention of calculus usually focus on this priority dispute with Newton and give much less attention to how Leibniz's vision of calculus differed substantially from Newton's. Whereas Newton was trained in mathematical physics and remained associated with academia during the most creative time in his career, Leibniz's early academic failings in math led him to become a lawyer by training and an entrepreneur by profession.³ So Leibniz's deep mathematical insights that led to calculus occurred away from a university professional association. Unlike Newton, whose mathematical interests seemed entirely tied to physics, Leibniz clearly had a much broader goal for calculus, with applications in areas well beyond physics that might seem to have nothing to do with mathematics. His dream application was a Calculus Ratiocinator, which is synonymous with Calculus of Thought.⁴ This can be interpreted as a very precise mathematical model of cognition that could be automated in a machine to answer any important philosophical, scientific, or practical question that traditionally would be answered with human subjective conjecture.⁵ Leibniz proposed that if we had such a cognitive calculus, we could just say Let us calculate⁶ and always find the most reasonable answers, uncontaminated by human bias.
In a sense, this concept of the Calculus Ratiocinator foreshadows today's predictive analytic technology.⁷ Predictive analytics are widely used today to generate better-than-chance longer-term projections for more stable physical and biological outcomes like climate change, schizophrenia, Parkinson's disease, Alzheimer's disease, diabetes, cancer, and optimal crop yields, and even good short-term projections for less stable social outcomes like marriage satisfaction, divorce, successful parenting, crime, successful businesses, satisfied customers, great employees, successful ad campaigns, stock price changes, and loan decisions, among many others. Until the widespread practice of predictive analytics that came with the introduction of computers in the past century, most of these outcomes were thought to be too capricious to have anything to do with mathematics. Instead, they were traditionally addressed with speculative and biased hypotheses or intuitions, often rooted in culture or philosophy (Fig. 1.1).
Figure 1.1 Gottfried Wilhelm Leibniz.⁸
Until very recently, standard computer technology could evaluate only a small number of predictive features and observations. But we are now in an era of big data and high-performance massively parallel computing, so our predictive models should now become much more powerful. It seems reasonable to expect that the traditional methods that worked to select important predictive features from small data will scale to high-dimensional data and suddenly select predictive models that are much more accurate and insightful. This would give us a new and much more powerful big data machine intelligence technology that is everything Leibniz imagined in a Calculus Ratiocinator. Big data massively parallel technology should thus theoretically allow completely new data-driven cognitive machines to predict and explain capricious outcomes in science, medicine, business, and government.
Unfortunately, it is not this simple, because observation samples are still fairly small in most of today's predictive analytic applications. One reason is that most real-world data are not representative samples of the population to which one wishes to generalize. For example, the people who visit Facebook or search on Google might not be a good representative sample of many populations, so smaller representative samples will need to be drawn if the analytics are to generalize well. Another problem is that many real-world data are not independent observations, but instead are often repeated observations from the same individuals. For this reason, data also need to be down-sampled significantly to yield independent observations. Still another problem is that even when there are many millions of independent, representative observations, there is usually a much smaller number of individuals who do things like respond to a particular type of cancer drug, commit fraud, or respond to a recent advertising promotion. The informative sample for a predictive model is the group of targeted individuals plus a group of similar size that did not show such a response, but these are not usually "big" samples in terms of numbers of observations. So the biggest limitation of big data, in the sense of a large number of observations, is that most real-world data are not "big" and instead have limited numbers of observations. This is especially true because most predictive models are not built from Facebook or Google data.⁹
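The sampling steps just described, reducing repeated observations to one per individual and then balancing a rare outcome against a similar-sized comparison group, can be sketched in a few lines. This is only an illustrative sketch, not code from the book; the field names, event counts, and response rule are invented for the example.

```python
import random

# Hypothetical raw event log: many repeated observations per individual,
# with a rare positive outcome (e.g., responded to a promotion).
# person_id and responded are invented fields for this illustration.
raw = [
    {"person_id": i % 1000, "responded": (i % 1000) < 30}
    for i in range(50_000)  # 50,000 rows, but only 1000 distinct people
]

# Step 1: de-duplicate to one observation per individual, so that the
# rows are (approximately) independent observations.
by_person = {}
for row in raw:
    by_person.setdefault(row["person_id"], row)
independent = list(by_person.values())

# Step 2: balance the rare responders against an equal-sized random
# sample of non-responders; this pair is the informative sample.
random.seed(0)
responders = [r for r in independent if r["responded"]]
nonresponders = [r for r in independent if not r["responded"]]
modeling_sample = responders + random.sample(nonresponders, len(responders))

print(len(raw), len(independent), len(modeling_sample))  # 50000 1000 60
```

Note how the "big" raw data of 50,000 rows shrinks to a modeling sample of only 60 informative observations once independence and balance are enforced, which is the chapter's point about most real-world samples not being big.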
Still, most real-world data are "big" in another sense: they are very high dimensional, given that interactions between variables and nonlinear effects are also predictive features. Previously, we did not have the technology to evaluate high dimensions of potentially predictive variables rapidly enough to be useful. The slower processing that was responsible for this "curse of dimensionality" is now behind us. So many might believe that this suddenly allows the evaluation of almost unfathomably high dimensions of data for the selection of important features in much more accurate and smarter big data predictive models, simply by applying the traditional widely used methods.
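To get a feel for how quickly interactions and nonlinear terms inflate dimensionality, one can count the candidate features for p base variables. The particular feature set counted below (every main effect, every pairwise interaction, and a squared term per variable) is an assumption of this example, not a prescription from the book; richer feature sets grow even faster.

```python
from math import comb

# Illustrative count: p main effects + C(p, 2) pairwise interactions
# + p squared terms, so dimensionality grows roughly quadratically in p.
def candidate_features(p: int) -> int:
    return p + comb(p, 2) + p

for p in (10, 100, 1000):
    print(f"{p} base variables -> {candidate_features(p)} candidate features")
# 10 -> 65, 100 -> 5150, 1000 -> 501500
```

Even a modest 1000 measured variables thus yields half a million candidate features, which is the sense in which otherwise small samples are "big" in dimensionality.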
Unfortunately, the traditional widely used methods often do not give unbiased or non-arbitrary predictions and explanations, and this problem will become ever more apparent with today's high-dimension data.
1 A Fundamental Problem with the Widely Used Methods
There is one glaring problem with today's widely used predictive analytic methods that stands in the way of our new data-driven science. This problem is inconsistent with Leibniz's idea of an automated machine that can reproduce the very computations of human cognition, but without the subjective biases of humans. The problem is suggested by the fact that there are probably at least hundreds of predictive analytic methods in use today. Each method makes differing assumptions that would not be agreed upon by all, and all have at least one and sometimes many arbitrary parameters. This arbitrary diversity is defended by those who believe a "no free lunch" theorem that argues that there is no one best method across all situations.¹⁰,¹¹ Yet, when predictive modelers test various arbitrary algorithms based upon these methods to get a best model for a specific situation, they obviously test only a tiny subset of the possibilities. So unless there is an obvious, very simple best model, different modelers will almost always produce substantially different arbitrary models from the same data.
As examples of this problem of arbitrary methods, there are different types of decision tree methods, like CHAID and CART, which use different statistical tests to determine branching. Even with the very same method, different user-provided parameters for splitting the branches of the tree will often give quite different decision trees that generate very different predictions and explanations. Likewise, there are many widely used regression variable selection methods, like stepwise and LASSO logistic regression, that differ in the arbitrary assumptions and parameters employed in how one selects important "explanatory" variables. Even with the very same regression method, different user choices for these parameters will almost always generate widely differing explanations and often substantially differing predictions. Other methods, like Principal Component Analysis (PCA), Variable Clustering, and Factor Analysis, attempt to avoid the variable selection problem by greatly reducing the dimensionality of the variables. These methods work well when the data match their underlying assumptions, but most behavioral data will not be easily modeled with those assumptions, such as the orthogonal components of PCA, or the assumption in the other methods that one knows how to rotate the components to be nonorthogonal when there are an infinite number of possible rotations. Likewise, there are many other methods, like Bayesian Networks, Partial Least Squares, and Structural Equation Modeling, that modelers often use to make explanatory inferences; these each make differing arbitrary assumptions that often generate wide diversity in explanations and predictions. Finally, there are a large number of fairly black-box methods, like Support Vector Machines, Artificial Neural Networks, Random Forests, Stochastic Gradient Boosting, and various Genetic Algorithms, that are not completely transparent in explaining how their predictions are formed, although some measure of variable importance often can be obtained. These methods can generate quite different predictions and important variables simply because of differing assumptions across the methods or differing user-defined modeling parameters within the methods.
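The sensitivity to user-chosen parameters can be made concrete with a small simulation. The sketch below is illustrative and not from the book: it fits an L1-penalized logistic regression, in the spirit of LASSO, by proximal gradient descent (ISTA) on data with two strongly correlated predictors, at two arbitrary penalty strengths. The data-generating model, the sample size, and both penalty values are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
z = rng.normal(size=n)
# Three predictors, two of them strongly correlated (multicollinearity).
x1 = z + 0.1 * rng.normal(size=n)
x2 = z + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
# True outcome depends on x1 and x3 only.
p = 1 / (1 + np.exp(-(1.0 * x1 + 0.5 * x3)))
y = rng.binomial(1, p)

def l1_logistic(X, y, lam, steps=5000, lr=0.01):
    """ISTA: gradient step on the logistic loss, then soft-threshold (L1 prox)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        pred = 1 / (1 + np.exp(-X @ w))
        grad = X.T @ (pred - y) / len(y)
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

for lam in (0.01, 1.0):
    w = l1_logistic(X, y, lam)
    selected = [f"x{i+1}" for i in range(3) if abs(w[i]) > 1e-6]
    print(f"lambda={lam}: selected {selected}")
```

With a small penalty, several variables are retained; with a heavy enough penalty, nothing is selected at all. Which variables end up labeled "explanatory" is thus decided by an arbitrary user parameter rather than by the data alone, exactly the arbitrariness the text describes.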
Because there are so many methods, and because all require unsubstantiated modeling assumptions along with arbitrary user-defined parameters, if you gave exactly the same data to 100 different predictive modelers, you would likely get 100 completely different models unless the solution was very simple. These differing models often would make very different predictions and would almost always generate different explanations, to the extent that the method produces transparent models that can be interpreted. In cases where regression methods are used and raw interaction or nonlinear effects are parsimoniously selected without accompanying main effects, the model's predictions are even likely to depend on how variables are scaled, so that currency in Dollars versus Euros would give different predictions.¹² Because of such variability, which can even defy basic principles of logic, it is unreasonable to interpret any of these arbitrary models as reflecting a causal and/or most probable explanation or prediction.
Because the widely used methods yield arbitrary and even illogical models in many cases, hardly can we say "Let us calculate" to answer important questions such as the most likely contribution of environmental versus genetic versus other biological factors in causing Parkinson's disease, Alzheimer's disease, prostate cancer, breast cancer, and so on. Hardly can we say "Let us calculate" when we wish to provide a most likely explanation for why there is climate change, or why certain genetic and environmental markers correlate with diseases, or why our business is suddenly losing customers, or how we may decrease costs and yet improve quality in health care. Hardly can we say "Let us calculate" when we wish to know the extent to which sexual orientation and other average gender differences are determined by biological factors or by social factors, when we wish to know whether stricter gun control policies would have a positive or negative impact on crime and murder rates, or when we wish to know whether austerity as an economic intervention tool is helpful or hurtful. Because our widely used predictive analytic methods are so influenced by completely subjective human choices, predictive model explanations and predictions about human diseases, climate change, and business and social outcomes will have substantial variability simply due to our cognitive biases and/or our arbitrary modeling methods. The most important questions of our day concern various economic, social, medical, and environmental outcomes related to human behavior by cause or effect, yet our widely used predictive analytic methods cannot answer these questions reliably.
Even when the very same method is used to select variables, the important variables that the model selects as the basis of explanation are likely to vary across independent observation samples. This sampling variability will be especially prevalent if the observations available to train the model are limited, if there are many possible features that are candidates for explanatory variables, and if there is more than a modest correlation between at least some of those candidate variables. This problem of correlation between variables, or multicollinearity, is ultimately the real culprit, and it is almost always seen with human behavior outcomes. Unlike many physical phenomena, behavioral outcomes usually cannot be understood in terms of easy-to-separate, uncorrelated causal components. Models based upon randomized controlled experimental selection methods can avoid this multicollinearity problem through designs that yield orthogonal variables.¹³ Yet, most of today's predictive analytic applications necessarily must deal with observational data, as randomized experiments are usually simply not possible with human behavior in real-world