Decision tree methodology
is a commonly used data mining method for establishing classification systems
based on multiple covariates or for developing prediction algorithms for a
target variable.
The basic concept of the decision tree
1. Nodes. There are three types of nodes. (Lu and Song, 2017)
A root node, also called a decision
node, represents a choice that will result in the subdivision of all records
into two or more mutually exclusive subsets.
Internal nodes, also called chance
nodes, represent one of the possible choices available at that point in
the tree structure. The top edge of the node is connected to its parent node
and the bottom edge is connected to its child nodes or leaf nodes.
Leaf nodes, also called end nodes, represent
the final result of a combination of decisions or events.
2. Branches. (Lu and Song, 2017)
Branches represent chance outcomes or events that originate from root
nodes and internal nodes.
A decision tree model is formed using a hierarchy of
branches. Each path from the root node through internal nodes to a leaf node
represents a classification decision rule.
These decision tree pathways can also be represented as ‘if-then’ rules.
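The node types and root-to-leaf paths described above can be illustrated with a
minimal sketch. The class names, the variables (age, bmi), the thresholds and
the labels are all hypothetical and chosen only for illustration; they do not
come from Lu and Song (2017).

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf (end) node: holds the final outcome of a path."""
    label: str


@dataclass
class Node:
    """Root or internal node: tests one input variable against a threshold."""
    feature: str
    threshold: float
    left: Union["Node", Leaf]   # branch taken when value <= threshold
    right: Union["Node", Leaf]  # branch taken when value > threshold


def classify(tree, record):
    """Follow one path from the root to a leaf and return its label."""
    node = tree
    while isinstance(node, Node):
        node = node.left if record[node.feature] <= node.threshold else node.right
    return node.label


def rules(tree, conditions=()):
    """List every root-to-leaf path as an 'if ... then ...' rule."""
    if isinstance(tree, Leaf):
        premise = " and ".join(conditions) or "always"
        return [f"if {premise} then {tree.label}"]
    left = rules(tree.left, conditions + (f"{tree.feature} <= {tree.threshold}",))
    right = rules(tree.right, conditions + (f"{tree.feature} > {tree.threshold}",))
    return left + right


# Hypothetical tree: variable names and thresholds are illustrative only.
tree = Node("age", 40, Leaf("low risk"),
            Node("bmi", 30, Leaf("medium risk"), Leaf("high risk")))

print(classify(tree, {"age": 55, "bmi": 33}))
for rule in rules(tree):
    print(rule)
```

Each printed rule corresponds to exactly one leaf, so a tree with three leaves
yields three rules.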
3. Splitting. (Lu and Song, 2017)
Only the input
variables related to the target variable are used to split parent nodes
into purer child nodes of the target variable.
Both discrete input
variables and continuous input variables which are collapsed into two or more
categories can be used.
When building the
model one must first identify the most important input variables, and then
split records at the root node and at subsequent internal nodes into two or
more categories or ‘bins’ based on the status of these variables.
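One way to make "purer child nodes" concrete is to score each candidate split
with an impurity measure. The sketch below assumes Gini impurity and a binary
split on a single continuous variable; the text does not say which purity
measure is meant, and the data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, higher for more mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Try each midpoint between sorted distinct values as a threshold and
    return (threshold, weighted_impurity) of the purest binary split."""
    distinct = sorted(set(values))
    best = (None, gini(labels))  # no split: impurity of the parent node
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical records: one continuous input and a binary target.
ages = [22, 25, 31, 47, 52, 60]
target = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(ages, target))
```

Here the threshold midway between 31 and 47 separates the two classes
completely, so both child nodes are pure.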
Types of decision tree
Classification tree analysis is when the predicted
outcome is the class to which the data belongs.
Regression tree analysis is when the
predicted outcome can be considered a real number (e.g. the price of a house,
or a patient’s length of stay in a hospital).
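The difference between the two tree types shows up in what a leaf returns. As a
minimal sketch, assuming a classification leaf predicts the majority class and
a regression leaf predicts the mean of its records (both are common choices,
not stated in the text; the function names and data are hypothetical):

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Classification tree: a leaf predicts the most common class."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Regression tree: a leaf predicts a real number, here the mean."""
    return mean(values)


# Hypothetical records that ended up in one leaf.
print(classification_leaf(["spam", "ham", "spam"]))
print(regression_leaf([3, 5, 4]))  # e.g. lengths of stay in days
```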
Decision trees can express complex alternatives clearly
and quickly. Furthermore, a decision tree can be modified without much
difficulty as new information becomes available. A decision tree can be set up
to examine how changing input values affect different decision alternatives.
Standard decision tree notation is easy to adopt. Competing alternatives can be
compared, even without complete information, in terms of risk and probable
value. (Anon, 2017)
2. Logistic Regression
Logistic regression is used to find the
probability of event=Success and event=Failure. We should use logistic
regression when the dependent variable is binary (0/1, True/False, Yes/No).
The logistic model is used to estimate the probability of a binary response
based on one or more predictor (or independent) variables (features).
It allows one to say that the presence of a risk factor increases the odds of
a given outcome by a specific factor.
Logistic regression doesn’t require a
linear relationship between dependent and independent variables. It can handle
various types of relationships because it applies a non-linear log
transformation to the predicted odds.
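The statement that a risk factor multiplies the odds by a specific factor can
be checked numerically. The sketch below assumes a model with a single binary
predictor; the coefficient values b0 and b1 are hypothetical and chosen only
for illustration.

```python
import math


def probability(x, b0, b1):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def odds(p):
    """Odds of success: p / (1 - p)."""
    return p / (1.0 - p)


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.0, 0.7

without_risk = odds(probability(0, b0, b1))  # risk factor absent (x = 0)
with_risk = odds(probability(1, b0, b1))     # risk factor present (x = 1)

# The ratio of the two odds equals exp(b1), whatever the value of b0.
print(with_risk / without_risk, math.exp(b1))
```

The log of the odds is b0 + b1 * x, which is linear in x even though the
probability itself is not.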
Types of logistic regression
Binary logistic regression (Wiley, 2011)
Binary logistic regression is used when the dependent variable is
dichotomous and the independent variables are either continuous or categorical.
When the dependent variable is not dichotomous and comprises more than two
categories, a multinomial logistic regression can be used.
Multinomial logistic regression (Wiley, 2011)
Multinomial logistic regression is the
regression analysis to conduct when the dependent variable is nominal with more
than two levels. Thus it is an extension of logistic regression, which
analyses dichotomous (binary) dependents.
Multinomial
regression is used to describe data and to explain the relationship between one
dependent nominal variable and one or more continuous-level (interval or ratio
scale) independent variables.
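For a nominal outcome with more than two levels, one common formulation gives
each category its own linear score and converts the scores to probabilities
with the softmax function. The sketch below assumes that formulation; the
function names, weights and biases are hypothetical and are not fitted to any
data.

```python
import math


def softmax(scores):
    """Turn one linear score per category into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, weights, biases):
    """One score per category: bias + sum of weight * input."""
    scores = [b + sum(w * xi for w, xi in zip(ws, x))
              for ws, b in zip(weights, biases)]
    return softmax(scores)


# Hypothetical coefficients for three nominal categories and two inputs.
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.0]]
biases = [0.0, 0.1, -0.1]
probs = predict_proba([1.2, 3.4], weights, biases)
print(probs, sum(probs))
```

With only two categories this reduces to the binary logistic model.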
Logistic regression does not assume a linear relationship between
the independent variables and the dependent variable, and it may handle
nonlinear effects. The dependent variable need not be normally distributed. It
does not require that the independents be interval and unbounded. Logistic
regression comes at a cost: it requires much more data to
achieve stable, meaningful results. With standard linear regression,
typically 20 data points per predictor is considered the lower bound.
For logistic regression, at least 50 data points per predictor is necessary to
achieve stable results (Wiley, 2011).
3) Neural Network
A neural network is a method of computing
based on the interaction of multiple connected processing elements. It is able
to deal with incomplete information. When an element of the neural network
fails, the network can continue to operate because of its parallel nature.
(Liu, Yang and Ramsay, 2011)
Basic concept of the
neural network (Liu, Yang and Ramsay, 2011)
Understanding and modelling the operations of
single neurons or small neuronal circuits, e.g. minicolumns.
Modelling information processing in actual
brain systems, e.g. auditory tract.
Modelling human perception and cognition.
Artificial Neural Networks
Used in pattern recognition, adaptive
control, time series prediction, etc.
Areas contributing to artificial neural networks are statistical pattern
recognition, computational learning theory, computational neuroscience,
dynamical systems theory and nonlinear optimisation.
Types of neural network
Feed-forward neural networks are the commonest type of neural
network in practical applications. The first layer is the input and the last
layer is the output.
If there is more than one hidden layer, we
call them ‘deep’ neural networks. They compute a series of transformations that
change the similarities between cases.
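A network whose first layer is the input, whose last layer is the output and
which has hidden layers in between can be sketched as a chain of layer
computations. The sketch below assumes sigmoid activations and uses
hypothetical weights; it shows only the forward pass, not training.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases, activation):
    """One layer: each unit applies an activation to a weighted sum."""
    return [activation(b + sum(w * x for w, x in zip(ws, inputs)))
            for ws, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass the input through every layer in order; there are no cycles."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases, sigmoid)
    return activations


# Hypothetical weights: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
network = [
    ([[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]], [0.0, 0.1, -0.1]),
    ([[0.3, -0.6, 0.2], [0.5, 0.4, -0.1]], [0.05, -0.05]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 0.5], network)
print(output)
```

Because this example has two hidden layers, it would count as a ‘deep’ network
in the sense used above.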
Recurrent networks have directed cycles in their
connection graph. That means you can sometimes get back to where you started by
following the arrows.
They can have complicated dynamics and this can
make them very difficult to train.
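A directed cycle can be illustrated with a single unit whose output is fed back
into itself at the next time step. The sketch below assumes a tanh activation
and hypothetical weights (w_in, w_rec, bias); it is a simplified illustration,
not a trainable model.

```python
import math


def rnn_step(x, h, w_in, w_rec, bias):
    """New hidden state from the current input and the previous state.
    The w_rec * h term is the directed cycle: the unit feeds itself."""
    return math.tanh(w_in * x + w_rec * h + bias)


def run(sequence, w_in=0.8, w_rec=0.5, bias=0.0):
    """Unroll a single recurrent unit over a sequence of inputs."""
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states


# The same input value gives different states at different time steps,
# because each state depends on the whole history so far.
print(run([1.0, 1.0, 1.0]))
```

This dependence on history is one source of the complicated dynamics mentioned
above.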
A neural network can perform tasks that a linear program cannot. A neural
network learns and does not need to be reprogrammed. It can be applied in a
wide range of applications. Neural networks
require less formal statistical training and are able to implicitly detect
complex nonlinear relationships between dependent and independent