Statistical
Point Estimation, Models Based on Summarization, Bayes Theorem, Decision Tree, Hypothesis Testing, Regression and Correlation
Point Estimation
Point estimate: an estimate of a population parameter given as a single number. It may be made by calculating the parameter for a sample, and it may be used to predict values for missing data.
Example: a relation R contains 100 employees, of whom 99 have salary information. The mean salary of these 99 is $50,000, so $50,000 is used as the value of the remaining employee's salary.
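The salary example amounts to mean imputation: the sample mean of the known values serves as the point estimate for the missing one. A minimal sketch, assuming missing values are stored as `None`; the function name `impute_with_mean` and the salary figures are hypothetical, chosen so that the 99 known salaries average $50,000.

```python
from statistics import mean
from typing import List, Optional


def impute_with_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the sample mean of the known entries."""
    known = [v for v in values if v is not None]
    estimate = mean(known)  # point estimate of the population mean
    return [estimate if v is None else v for v in values]


# Hypothetical data: 99 known salaries whose mean is 50,000, plus one missing value.
salaries = [45_000] * 49 + [55_000] * 49 + [50_000] + [None]
filled = impute_with_mean(salaries)
```

Here `filled[-1]` holds the estimated salary of the 100th employee.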
Estimation Error
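Bias: the difference between the expected value of the estimator $\hat{\Theta}$ and the actual value of the parameter $\Theta$:
$$\text{Bias} = E[\hat{\Theta}] - \Theta$$
Mean Squared Error (MSE): the expected value of the squared difference between the estimate and the actual value:
$$\text{MSE}(\hat{\Theta}) = E\left[(\hat{\Theta} - \Theta)^2\right]$$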
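Both quantities can be approximated by replacing the expectation with an average over the estimates an estimator produces on repeated samples. A minimal sketch, assuming such a list of estimates is already available; the figures below are hypothetical. It also checks the standard decomposition MSE = variance of the estimates + bias².

```python
from statistics import fmean, pvariance


def bias(estimates, true_value):
    """Average estimate minus the true parameter value (approximates E[est] - Theta)."""
    return fmean(estimates) - true_value


def mse(estimates, true_value):
    """Average squared error of the estimates (approximates E[(est - Theta)^2])."""
    return fmean((e - true_value) ** 2 for e in estimates)


# Hypothetical estimates of a mean salary computed on five different samples.
estimates = [48_000, 51_000, 50_500, 49_000, 52_500]
theta = 50_000  # assumed true population value

b = bias(estimates, theta)   # 200.0
m = mse(estimates, theta)    # 2,500,000.0
# Decomposition: MSE equals the (population) variance of the estimates plus bias squared.
decomposed = pvariance(estimates) + b ** 2
```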
Bayes Theorem
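Posterior probability: $P(h_1 \mid x_i)$. Prior probability: $P(h_1)$. Bayes theorem:
$$P(h_1 \mid x_i) = \frac{P(x_i \mid h_1)\,P(h_1)}{P(x_i)}$$
where, over all hypotheses $h_j$, $P(x_i) = \sum_j P(x_i \mid h_j)\,P(h_j)$.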
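The theorem translates directly into code: multiply each prior $P(h_j)$ by its likelihood $P(x_i \mid h_j)$, then normalize by their sum, which is $P(x_i)$. A minimal sketch, assuming the hypotheses are mutually exclusive and exhaustive; the function name `posterior` and the numbers in the usage line are hypothetical.

```python
def posterior(priors, likelihoods):
    """Return P(h_j | x_i) for every hypothesis h_j.

    priors[j]      = P(h_j)
    likelihoods[j] = P(x_i | h_j)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(x_i | h_j) * P(h_j)
    evidence = sum(joint)  # P(x_i), by the law of total probability
    return [j / evidence for j in joint]


# Hypothetical: two hypotheses with priors 0.3 / 0.7 and likelihoods 0.9 / 0.1.
post = posterior([0.3, 0.7], [0.9, 0.1])  # roughly [0.794, 0.206]
```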
Decision Tree
A decision tree consists of a root followed by internal nodes, each labeled with a question; the arcs leaving a node cover all possible responses to that question, and each leaf gives a prediction. Decision trees are used in classification and clustering methods to break a problem down into increasingly discrete subsets, working from general to more specific information.
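The structure described above can be sketched as a small recursive data type: internal nodes hold a question and one child per possible answer, leaves hold a label, and classification walks from the root to a leaf. This is a minimal sketch; the `Node` class, the `classify` function, and the salary/years tree are all hypothetical illustrations, not a specific tree-induction algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Node:
    # Internal node: a question that maps a record to one of the answers in `children`.
    question: Optional[Callable[[dict], str]] = None
    # One arc (child) per possible answer to the question.
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Leaf node: the predicted class.
    label: Optional[str] = None


def classify(node: Node, record: dict) -> str:
    """Follow the answers from the root until a leaf is reached."""
    while node.label is None:
        node = node.children[node.question(record)]
    return node.label


# Hypothetical tree: general question at the root, more specific one below it.
tree = Node(
    question=lambda r: "yes" if r["salary"] >= 50_000 else "no",
    children={
        "yes": Node(label="high"),
        "no": Node(
            question=lambda r: "yes" if r["years"] >= 5 else "no",
            children={"yes": Node(label="medium"), "no": Node(label="low")},
        ),
    },
)
```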
Hypothesis Testing
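Find a model to explain behavior by creating and then testing a hypothesis about the data. This is the exact opposite of the usual data mining (DM) approach, which looks for patterns in the data without first stating a hypothesis. H0: null hypothesis, the hypothesis to be tested. H1: alternative hypothesis.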
Chi-Square Test
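One technique for performing hypothesis testing. It is used to test the association between two observed variables and to determine whether a set of observed values is statistically different from the expected values. The chi-squared statistic is defined as:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where $O$ are the observed values and $E$ the expected values.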
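A minimal sketch of the statistic, plus the common contingency-table form used to test association between two variables, where each expected count is (row total × column total) / grand total. The function names and the example counts are hypothetical. The statistic is compared with a critical value from a chi-squared table; for 1 degree of freedom at the 5% significance level that value is 3.841, so the table below would lead to rejecting H0 (no association).

```python
def chi_square(observed, expected):
    """Chi-squared statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_independence(table):
    """Chi-squared test of association for a contingency table (list of rows).

    Returns the statistic and its degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count under H0
            stat += (o - e) ** 2 / e
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2x2 table of observed counts.
stat, dof = chi_square_independence([[10, 20], [20, 10]])  # about 6.67 with 1 dof
reject_h0 = stat > 3.841  # 5% critical value for 1 degree of freedom
```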
Regression
Predict future values based on past values by fitting a set of points to a curve. Linear regression assumes that a linear relationship exists:
$$y = c_0 + c_1 x_1 + \cdots + c_n x_n$$
n input variables (called regressors or predictors); one output variable (called the response); n+1 constants, chosen during the modeling process to match the input examples.
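For a single predictor (n = 1) the constants $c_0$ and $c_1$ have a closed form. The slides do not say how the constants are chosen; this minimal sketch assumes ordinary least squares, the usual choice. The function names and data points are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y = c0 + c1 * x; returns (c0, c1)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    c1 = sxy / sxx            # slope
    c0 = y_bar - c1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return c0, c1


def predict(c0, c1, x):
    """Predict a future value from the fitted line."""
    return c0 + c1 * x


# Hypothetical past values lying exactly on y = 1 + 2x.
c0, c1 = fit_simple_linear([0, 1, 2], [1, 3, 5])
```

With these points the fit recovers `c0 = 1.0` and `c1 = 2.0`, and `predict(c0, c1, 3)` extrapolates to 7.0.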
Correlation
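Examine the degree to which the values of two variables behave similarly. Correlation coefficient r:
$$r = \frac{\sum_i (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_i (x_i - \bar{X})^2 \, \sum_i (y_i - \bar{Y})^2}}$$
where $\bar{X}$ and $\bar{Y}$ are the means of the two variables.
1 = perfect correlation; -1 = perfect but opposite correlation; 0 = no (linear) correlation.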
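The coefficient can be computed term by term from the formula. A minimal sketch in plain Python; the function name `pearson_r` and the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r of two equally long sequences."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sqrt(sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys))
    return num / den


r_pos = pearson_r([1, 2, 3], [2, 4, 6])        # perfect correlation: 1
r_neg = pearson_r([1, 2, 3], [6, 4, 2])        # perfect but opposite: -1
r_none = pearson_r([1, 2, 3, 4], [1, 3, 3, 1])  # no linear correlation: 0
```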
Neural Computing
Neural networks use many connected nodes to examine large amounts of data and find patterns, so that large volumes of data can be processed quickly. They can be used to model complex relationships between inputs and outputs or to find patterns in data. Using neural networks as a tool, data warehousing firms harvest information from datasets. Neural networks essentially comprise three pieces: the architecture or model, the learning algorithm, and the activation functions.
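The three pieces can be seen even in the smallest possible network, a single neuron. In this minimal sketch, assumed for illustration only, the architecture is one neuron computing a weighted sum plus a bias, the activation function is a step (threshold) function, and the learning algorithm is the classic perceptron update rule; it is trained on the logical AND function, which is hypothetical example data.

```python
def step(z):
    """Activation function: output 1 if the weighted input reaches the threshold, else 0."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    """Architecture: a single neuron, weighted sum of the inputs plus a bias."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, epochs=20, lr=1):
    """Learning algorithm: perceptron rule, nudge weights toward each misclassified target."""
    n = len(samples[0][0])
    weights, bias = [0] * n, 0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            err = target - predict(weights, bias, x)
            if err:
                errors += 1
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
        if errors == 0:  # every sample classified correctly: stop early
            break
    return weights, bias


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
```

A single neuron can only learn linearly separable patterns such as AND; modeling more complex relationships requires more nodes and layers.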
Intelligent Agents
An intelligent agent is software that assists people and acts on their behalf. Intelligent agents work by allowing people to delegate to the agent software work that they could have done themselves. They are special types of software applications used for data filtering and analysis, information brokering, condition monitoring and alarm generation, workflow management, personal assistance, simulation, gaming, etc.
Genetic Algorithms
Genetic algorithms work on the principle of expanding the set of possible outcomes: given a fixed number of possible outcomes (candidate solutions), they seek to define new and better solutions. In data mining they are used for clustering and association rules.
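A minimal sketch of the idea, assuming one common variant: candidate solutions are bit strings, new solutions are produced by tournament selection, one-point crossover, and bit-flip mutation, and the best solution of each generation is carried over unchanged (elitism). The fitness function (count of 1 bits, often called OneMax) and all parameter values are hypothetical stand-ins; a clustering or association-rule miner would plug in its own encoding and fitness.

```python
import random


def fitness(bits):
    """Hypothetical fitness: number of 1 bits (higher is better)."""
    return sum(bits)


def tournament(pop, rng, k=3):
    """Selection: pick the fittest of k randomly chosen candidates."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover: prefix of one parent joined to the suffix of the other."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(length=20, pop_size=30, generations=50, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: keep the current best as is
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, mutation_rate, rng))
        pop = children
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history


best, history = run_ga(seed=1)
```

Because the best solution always survives, the best fitness in `history` never decreases from one generation to the next.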