root/MGET/Branches/Jason/PythonPackage/dist/TracOnlineDocumentation/Documentation/ArcGISReference/TreeModel.FitToArcGISTable.html @ 842

Revision 842, 59.1 KB (checked in by jjr8, 20 months ago)

Rebuilt installation packages again. This will be merged with the Trunk and released as MGET 0.8a28.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><link rel="stylesheet" type="text/css" href="81help.css?format=raw" /><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Fit Tree Model</title></head><body><table style="margin-top:-1em; margin-bottom:0; padding:0; margin-left:-1em"><tr><td style="background:white"><img width="875" height="70" alt="ArcToolbox banner" src="AHBanner_ArcToolbox.gif?format=raw" /></td></tr></table><h1>Fit Tree Model</h1><p></p><p>Fits a tree model to data in an ArcGIS table.</p><p>Tree models were first introduced by Breiman et al. (1984) in the
classic Classification and Regression Tree (CART) software and are
frequently referenced by that name. Since that time, the original
methods have been reimplemented in R and many other statistical
programs. This tool fits tree models using the R rpart package by
Terry M. Therneau and Elizabeth J. Atkinson, and plots them using the
R rpart.plot package by Stephen Milborrow.</p><p><b>References</b></p><p>Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984). Classification and regression trees. Chapman &amp; Hall/CRC.</p><p><a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a></p><p><a href="http://cran.r-project.org/web/packages/rpart/rpart.pdf">R rpart package documentation</a></p><br /><p><h2><img width="11" height="11" border="0" src="sm_arrow_down.gif?format=raw" /> Command line syntax</h2></p><div Class="expand" id="id103139">TreeModelFitToArcGISTable_GeoEco &lt;inputTable&gt; &lt;outputModelFile&gt; &lt;formula&gt; &lt;ANOVA | Class | Exp | Poisson&gt; {where} {allowMissingCovariates} {minSplit} {minBucket} {cp} {maxCompete} {maxSurrogate} {2 | 0 | 1} {0 | 1} {xval} {maxDepth} {Minimum error | 1-SE rule | Interactive | User specified CP} {pruningCP} {xColumnName} {yColumnName} {zColumnName} {mColumnName} {writeSummaryFile} {writeDiagnosticPlots} {writeTreePlot} {writePrunedTreePlot} {png | emf} {res} {width} {height} {pointSize} {bg} {0 | 1 | 2 | 3 | 4} {1 | 0 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9} {percentage} {under} {clipRightLabels} {fallenLeaves} {0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9} {branch} {uniform} {digits} {varlen} {faclen} {cex} {tweak} {compress} {ycompress} <br /><br /><b>Parameters</b><br /><table width="100%" border="0" cellpadding="5"><tbody><tr><th width="40%"><b>Expression</b></th><th width="60%"><b>Explanation</b></th></tr><tr><td class="info">&lt;inputTable&gt;</td><td class="info" align="left"><p>ArcGIS table, table view, feature class, or feature layer
10containing the data for which the model should be fitted.</p></td></tr><tr><td class="info">&lt;outputModelFile&gt;</td><td class="info" align="left"><p>Output file to receive the fitted model. The file will not be in a
11user-readable format. After the model is fitted, you can provide the
12file to other tools that perform further analysis or visualization of
13the fitted model.</p><p>It is suggested, but not required, that you give the file an .Rdata
14extension.</p></td></tr><tr><td class="info">&lt;formula&gt;</td><td class="info" align="left"><p>Formula that specifies the table field that is the response
15variable and the table fields that are the terms of the model.</p><p>The formula must be in the format expected by the R glm function:</p><dl><dt></dt><dd><pre>response ~ term1 + term2 + ... + termN</pre></dd></dl><p>response is the table field that will be modeled as the response
16variable and the terms are the table fields that will serve as the
17predictor variables. The field names are case sensitive. If any field
18used in the formula is NULL for a given row, that row will not be used
19in fitting the model.</p><p>For example, if you have a field Presence that indicates the presence
20or absence of a species (1 or 0) and you want to model it in terms of
21sampled environmental covariates stored in the SST, ChlDensity, and
22Depth fields, you would use the formula:</p><dl><dt></dt><dd><pre>Presence ~ SST + ChlDensity + Depth</pre></dd></dl><p>By default, all terms are treated as continuous variables. To indicate
23that a term should be treated as a categorical variable, use the
24factor function. For example, if SubstrateType is an integer code that
25should be treated as categorical:</p><dl><dt></dt><dd><pre>Presence ~ SST + ChlDensity + Depth + factor(SubstrateType)</pre></dd></dl><p>The model terms may also use these operators:</p><ul><li><p>The : operator denotes the interaction of variables a and b. For
26example: a:b.</p></li></ul><ul><li><p>The * operator denotes "crossing". For example, a*b is identical to
27a+b+a:b.</p></li></ul><ul><li><p>The ^ operator denotes crossing to the Nth degree. For example,
28(a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a
29formula containing the main effects for a, b and c together with
30their second-order interactions.</p></li></ul><ul><li><p>The %in% operator indicates that the terms on its left are nested
31within those on the right. For example a + b %in% a expands to the
32formula a + a:b.</p></li></ul><ul><li><p>The - operator (minus) removes the specified terms, so that
(a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also
be used to remove the intercept term: y ~ x - 1 is a line through the
origin. A model with no intercept can also be specified as y ~ x + 0
36or y ~ 0 + x.</p></li></ul><p>While formulae usually involve just variable names, they can also
37involve arithmetic expressions. The formula log(y) ~ a + log(x) is
38quite legal. When such arithmetic expressions involve operators which
39are also used symbolically in model formulae, there can be confusion
40between arithmetic and symbolic operator use.</p><p>To avoid this confusion, the function I() can be used to bracket those
41portions of a model formula where the operators are used in their
42arithmetic sense. For example, in the formula y ~ a + I(b+c), the term
43b+c is to be interpreted as the sum of b and c.</p><p>Please see the topics "glm" and "formula" in the R documentation for
44more information.</p></td></tr><tr><td class="info">&lt;ANOVA | Class | Exp | Poisson&gt;</td><td class="info" align="left"><p>Method to use for splitting the tree, one of:</p><ul><li><p>ANOVA - Use this method to build a regression tree, i.e. when you
45are modeling a continuous response variable, such as the abundance
46of a species. With this method, the splits will be chosen to
47maximize the between-groups sum-of-squares in a simple analysis of
48variance.</p></li></ul><ul><li><p>Class - Use this method to build a classification tree, i.e. when
49you are modeling a categorical response variable, such as the
50presence or absence of a species. When this method is selected, the
51response variable is assumed to be categorical and the R factor
52function is automatically applied to it.</p></li></ul><ul><li><p>Exp - Use this method to build a regression tree using exponential
53scaling. For more information about this method, please see the
54references below.</p></li></ul><ul><li><p>Poisson - Use this method to build a regression tree using Poisson
55regression, which is appropriate for event rate data. For more
56information about this method, please see the references below.</p></li></ul><p><b>References</b></p><p><a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a></p><p><a href="http://cran.r-project.org/web/packages/rpart/rpart.pdf">R rpart package documentation</a></p></td></tr><tr><td class="info">{where}</td><td class="info" align="left"><p>SQL WHERE clause expression that specifies the subset of rows to
57process. If this parameter is not provided, all of the rows will be
58processed. If this parameter is provided but the underlying database
59does not support WHERE clauses, an error will be raised.</p><p>The exact syntax of this expression depends on the underlying
60database. ESRI recommends you reference fields using the following
61syntax:</p><ul><li><p>If you're querying ArcInfo coverages, shapefiles, INFO tables or
62dBASE tables (.dbf files), enclose field names in double quotes in
63the SQL expression: "MY_FIELD".</p></li></ul><ul><li><p>If you're querying Microsoft Access tables or personal
64geodatabase tables, enclose field names in square brackets:
65[MY_FIELD].</p></li></ul><ul><li><p>If you're querying ArcSDE geodatabase tables, an ArcIMS feature
66class, or an ArcIMS image service sublayer, don't enclose field
67names: MY_FIELD.</p></li></ul></td></tr><tr><td class="info">{allowMissingCovariates}</td><td class="info" align="left"><p>If this option is enabled (the default), records will be included
68in the model fitting process so long as they have a value for the
69response variable and at least one predictor variable. If this option
70is disabled, records must have values for the response variable and
71all predictor variables in order to be included.</p><p>The R rpart package that is used to fit the model has the novel
72capability of allowing records that are missing some data to still
73participate in the model fitting process. For more information about
74how this works, please see
75<a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a>.</p></td></tr><tr><td class="info">{minSplit}</td><td class="info" align="left"><p>The minimum number of observations that must exist in a node of
76the tree in order for a split of that node to be attempted. The
77default value, 20, was taken from the R rpart package that is used to
78fit the model.</p></td></tr><tr><td class="info">{minBucket}</td><td class="info" align="left"><p>The minimum number of observations that may be in any leaf node of
79the tree. The default value, 7, was taken from the R rpart package
that is used to fit the model. By default, rpart sets this
parameter to one third of minSplit, rounded to the nearest integer.</p></td></tr><tr><td class="info">{cp}</td><td class="info" align="left"><p>Any split that does not decrease the overall lack of fit by a
factor of this parameter will not be attempted. For instance, with
ANOVA splitting, this means that the overall R-squared must increase by
84this parameter at each step. The main role of this parameter is to
85save computing time by pruning off splits that are obviously not
86worthwhile. Essentially, you inform the tool that any split which does
87not improve the fit by this parameter will likely be pruned off by
88cross-validation, and that hence the tool need not pursue it.</p><p>The default value, 0.01, was taken from the R rpart package that is
89used to fit the model.</p></td></tr><tr><td class="info">{maxCompete}</td><td class="info" align="left"><p>The number of competitor splits to retain in the output. It is
90useful to know not just which split was chosen, but which variable
91came in second, third, etc. The default value, 4, was taken from the R
92rpart package that is used to fit the model.</p></td></tr><tr><td class="info">{maxSurrogate}</td><td class="info" align="left"><p>The number of surrogate splits to retain in the output. If this is
93set to zero the compute time will be shortened, since approximately
94half of the computational time (other than setup) is used in the
95search for surrogate splits. The default value, 5, was taken from the
96R rpart package that is used to fit the model.</p></td></tr><tr><td class="info">{2 | 0 | 1}</td><td class="info" align="left"><p>The method for using surrogates in the splitting process, one of:</p><ul><li><p>0 - display only; an observation with a missing value for the
97primary split rule is not sent further down the tree.</p></li></ul><ul><li><p>1 - use surrogates, in order, to split subjects missing the primary
98variable; if all surrogates are missing the observation is not
99split.</p></li></ul><ul><li><p>2 - if all surrogates are missing, then send the observation in the
majority direction. This is the recommendation of Breiman et al.</p></li></ul><p>The default value, 2, was taken from the R rpart package that is used
101to fit the model.</p></td></tr><tr><td class="info">{0 | 1}</td><td class="info" align="left"><p>The method used to select the best surrogate, one of:</p><ul><li><p>0 - the tool uses the total number of correct classifications for a
102potential surrogate variable.</p></li></ul><ul><li><p>1 - the tool uses the percent correct, calculated over the
103non-missing values of the surrogate.</p></li></ul><p>The default value, 0, was taken from the R rpart package that is used
104to fit the model. This value more severely penalizes covariates with a
105large number of missing values.</p></td></tr><tr><td class="info">{xval}</td><td class="info" align="left"><p>The number of cross-validations to perform.</p><p>The R rpart package that is used to fit the model uses a default of
10610, but we have found that so few iterations can cause the calculated
107cross-validation error to differ substantially over several runs of
108the tool using identical input data and parameter values. In one case,
109we observed the cross-validation errors to vary by over 10%. Because
110the cross-validation errors are often used to prune the tree, we
111believe it is important to have accurate estimates of them, so we
112increased the default to 1000. This will cause complicated models to
113run substantially slower. If you find your model is too slow, decrease
114the value.</p></td></tr><tr><td class="info">{maxDepth}</td><td class="info" align="left"><p>The maximum depth of any node of the final tree, with the root
115node counted as depth 0. The default value, 30, was taken from the R
116rpart package that is used to fit the model.</p></td></tr><tr><td class="info">{Minimum error | 1-SE rule | Interactive | User specified CP}</td><td class="info" align="left"><p>Method to use for pruning the tree, one of:</p><ul><li><p>Minimum error - The tool will prune the tree using the complexity
117parameter associated with the minimum cross-validation error.</p></li></ul><ul><li><p>1-SE rule - The tool will prune the tree using the complexity
118parameter obtained using the 1-SE rule described by Therneau and
119Atkinson in chapter 4 of
120<a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a>.
121A plot of the mean cross-validation errors versus candidate values
122of the complexity parameter often has an initial sharp drop followed
123by a relatively flat plateau and then a slow rise. (This tool
124produces that plot as an optional diagnostic output.) According to
125the 1-SE rule, any cross-validation error within one standard error
126of the minimum cross-validation error is considered equivalent to
127the minimum (i.e. considered to be part of the flat plateau). The
1281-SE rule chooses the largest complexity parameter that yields a
cross-validation error equivalent to the minimum. This results in an
optimal tree, i.e. the tree with the fewest splits that
131yields a cross-validation error equivalent to the minimum.</p></li></ul><ul><li><p>Interactive - The tool will display the unpruned tree in a window,
132allowing you to prune it interactively with the mouse. If you click
133on a split it will be marked as deleted. If you click on an
134already-deleted split it will be undeleted (if its parent is not
135deleted). Information about the node is printed as you click. When
136you have finished pruning, click on the QUIT button.</p></li></ul><ul><li><p>User specified CP - The tool will prune the tree using the
137complexity parameter you specify below.</p></li></ul><p>If this parameter is omitted, the tree will not be pruned.</p></td></tr><tr><td class="info">{pruningCP}</td><td class="info" align="left"><p>Complexity parameter for pruning the tree. This parameter is only
used when the Pruning Method is set to 'User specified CP'.</p></td></tr><tr><td class="info">{xColumnName}</td><td class="info" align="left"><p>Name to use in the formula for the X coordinates of point
139features. If the input table is a point feature class or layer, the X
140coordinates will be extracted from the points and be accessible in the
141formula using the name provided for this parameter.</p></td></tr><tr><td class="info">{yColumnName}</td><td class="info" align="left"><p>Name to use in the formula for the Y coordinates of point
142features. If the input table is a point feature class or layer, the Y
143coordinates will be extracted from the points and be accessible in the
144formula using the name provided for this parameter.</p></td></tr><tr><td class="info">{zColumnName}</td><td class="info" align="left"><p>Name to use in the formula for the Z coordinates of point
145features. If the input table is a point feature class or layer that
146has Z coordinates, the Z coordinates will be extracted from the points
147and be accessible in the formula using the name provided for this
148parameter.</p></td></tr><tr><td class="info">{mColumnName}</td><td class="info" align="left"><p>Name to use in the formula for the measure values of point
149features. If the input table is a point feature class or layer that
150has measure values, the measure values will be extracted from the
151points and be accessible in the formula using the name provided for this
152parameter.</p></td></tr><tr><td class="info">{writeSummaryFile}</td><td class="info" align="left"><p>If True, this tool will write summary information about the fitted
153model to a text file. (This is the same information that the tool
154outputs as log messages.) The file will have the name X_summary.txt,
155where X is the name of the output model file, minus any
156extension.</p></td></tr><tr><td class="info">{writeDiagnosticPlots}</td><td class="info" align="left"><p>If True, this tool will write diagnostic plots:</p><ul><li><p>X_cp.Y - visual representation of the cross-validation results for
157the unpruned tree, to assist you with choosing a Complexity
158Parameter for pruning the tree. The x-axis represents possible
159choices for the Complexity Parameter and the y-axis represents the
160means and standard deviations of the errors in the cross-validated
161prediction that would result. The dashed horizontal line is drawn 1
162standard error above the minimum of the curve. A good choice of the
163Complexity Parameter for pruning is the leftmost value for which the
164mean error lies below the line. This value will be chosen
165automatically if the Pruning Method parameter is set to '1-SE rule'.</p></li></ul><ul><li><p>X_rsquare.Y - two-panel plot only produced for the ANOVA splitting
method. The first panel shows the r-square (both apparent and
estimated from cross-validation) versus the number of splits. The
168second panel shows the mean error in the cross-validated prediction
169versus the number of splits (this is essentially the same plot as
170the X_cp.Y plot described above). Both panels are produced for the
171unpruned tree.</p></li></ul><ul><li><p>X_residuals.Y - plot of the residuals vs. the fitted values for the
172unpruned tree.</p></li></ul><ul><li><p>X_pruned_residuals.Y - plot of the residuals vs. the fitted values
173for the pruned tree. This plot will only be produced if the tree is
174pruned.</p></li></ul><p>In the file names above, X is the name of the output model file, minus
175any extension, and Y is the extension of the selected output plot
176format.</p></td></tr><tr><td class="info">{writeTreePlot}</td><td class="info" align="left"><p>If True, this tool will write a plot of the unpruned tree to a
177file having the name X_unpruned_tree.Y, where X is the name of the
178output model file minus the extension and Y is the extension of the
179selected output plot format.</p></td></tr><tr><td class="info">{writePrunedTreePlot}</td><td class="info" align="left"><p>If True, this tool will write a plot of the pruned tree to a file
180having the name X_pruned_tree.Y, where X is the name of the output
181model file minus the extension and Y is the extension of the selected
182output plot format. This plot will only be produced if the tree is
183pruned.</p></td></tr><tr><td class="info">{png | emf}</td><td class="info" align="left"><p>Plot file format, one of:</p><ul><li><p>emf - Windows enhanced metafile (EMF) format. This is a vector
184format that may be printed and resized without any pixelation and is
185therefore suitable for use in printable documents that recognize
186this format (e.g. Microsoft Word or Microsoft Visio).</p></li></ul><ul><li><p>png - Portable network graphics (PNG) format. This is a compressed,
187lossless, highly portable raster format suitable for use in web
188pages or other locations where a raster format is desired. Most
189scientific journals accept PNG; they typically request that files
190have a resolution of at least 1000 DPI.</p></li></ul></td></tr><tr><td class="info">{res}</td><td class="info" align="left"><p>PNG plot file resolution, in dots per inch (DPI). The default is
191set to a high value (1000) because this is the minimum resolution
192typically required by scientific journals that accept figures in PNG
193format.</p><p>This parameter is ignored for EMF format because it is a vector
194format.</p></td></tr><tr><td class="info">{width}</td><td class="info" align="left"><p>Plot file width in inches (for EMF format) or pixels (for PNG
format).</p></td></tr><tr><td class="info">{height}</td><td class="info" align="left"><p>Plot file height in inches (for EMF format) or pixels (for PNG
196format).</p></td></tr><tr><td class="info">{pointSize}</td><td class="info" align="left"><p>The default pointsize of text in diagnostic plots (the size of the
197text in tree plots is controlled by a different parameter).</p></td></tr><tr><td class="info">{bg}</td><td class="info" align="left"><p>PNG plot file background color. The color must be a valid name in
198R's color palette, or "transparent" if there is no background color.
This parameter is ignored if the plot file format is EMF.</p></td></tr><tr><td class="info">{0 | 1 | 2 | 3 | 4}</td><td class="info" align="left"><p>Type of tree plots to create, one of:</p><ul><li><p>0 - The default. Draw a split label at each split and a node label
200at each leaf.</p></li></ul><ul><li><p>1 - Label all nodes, not just leaves.</p></li></ul><ul><li><p>2 - Like 1 but draw the split labels below the node labels. Similar
201to the plots in the CART book.</p></li></ul><ul><li><p>3 - Draw separate split labels for the left and right directions.</p></li></ul><ul><li><p>4 - Like 3 but label all nodes, not just leaves.</p></li></ul></td></tr><tr><td class="info">{1 | 0 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9}</td><td class="info" align="left"><p>Extra information to display at the nodes, one of:</p><ul><li><p>0 - No extra information.</p></li></ul><ul><li><p>1 - The default. Display the number of observations that fall in the
202node (per class for Class models; prefixed by the number of events
203for Poisson and Exp models).</p></li></ul><ul><li><p>2 - Class models: display the classification rate at the node, expressed as the
204number of correct classifications and the number of observations in the node.
205Poisson and Exp models: display the number of events.</p></li></ul><ul><li><p>3 - Class models only: misclassification rate at the node, expressed
206as the number of incorrect classifications and the number of
207observations in the node.</p></li></ul><ul><li><p>4 - Class models only: probability per class of observations in the
208node (conditioned on the node, sum across a node is 1).</p></li></ul><ul><li><p>5 - Class models only: like 4 but do not display the fitted class.</p></li></ul><ul><li><p>6 - Class models only: the probability of the second class only.
209Useful for binary responses.</p></li></ul><ul><li><p>7 - Class models only: like 6 but do not display the fitted class.</p></li></ul><ul><li><p>8 - Class models only: the probability of the fitted class.</p></li></ul><ul><li><p>9 - Class models only: the probabilities times the fraction of
210observations in the node (the probability relative to all
211observations, sum across all leaves is 1).</p></li></ul></td></tr><tr><td class="info">{percentage}</td><td class="info" align="left"><p>If True, the default, nodes will be labeled with the percentage of
212observations in the node. The percentage will be displayed below the
213"extra information" (if any is requested).</p></td></tr><tr><td class="info">{under}</td><td class="info" align="left"><p>If True, the default, extra information and percentage of
214observations will be displayed below the nodes. If False, they will be
215displayed within the nodes' boxes.</p><p>This parameter is ignored if neither extra information nor percentage
216of observations are requested.</p></td></tr><tr><td class="info">{clipRightLabels}</td><td class="info" align="left"><p>If True, the default, the right-hand split labels on plots of type
2173 or 4 will not include "variable=". If False, the right-hand labels
will include "variable=", just like the left-hand labels.</p><p>This parameter is ignored if the plot type is not 3 or 4.</p></td></tr><tr><td class="info">{fallenLeaves}</td><td class="info" align="left"><p>If True, all leaf nodes will be displayed at the bottom. If False,
219the default, leaf nodes will be displayed where they would normally
220appear.</p></td></tr><tr><td class="info">{0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9}</td><td class="info" align="left"><p>Type of branches to draw. If zero, the default, the tool will draw
221conventional branches having a constant narrow width. If nonzero, the tool
222will draw "wide branches", with branch widths proportional to the
223specified parameter, one of:</p><ul><li><p>1 - Deviance</p></li></ul><ul><li><p>2 - Square root of deviance</p></li></ul><ul><li><p>3 - Deviance / number of observations</p></li></ul><ul><li><p>4 - Square root of (deviance / number of observations)</p></li></ul><ul><li><p>5 - Number of observations</p></li></ul><ul><li><p>6 - Complexity parameter</p></li></ul><ul><li><p>7 - Absolute value of the predicted value</p></li></ul><ul><li><p>8 - Predicted value minus the minimum predicted value</p></li></ul><ul><li><p>9 - Constant wide width, for checking the visual distortion that
224results when wide branches are drawn at different angles</p></li></ul></td></tr><tr><td class="info">{branch}</td><td class="info" align="left"><p>Controls the shape of the branches from parent to child nodes. Any
number from 0 to 1 is allowed. A value of 1 gives square-shouldered
branches and a value of 0 gives V-shaped branches, with other values being
227intermediate.</p><p>Note that if the Branch Type parameter is nonzero, the Branch Shape
228parameter will be rounded to 1 or 0 (e.g. a Branch Shape of 0.75 will
229be rounded to 1).</p></td></tr><tr><td class="info">{uniform}</td><td class="info" align="left"><p>If True, the default, the vertical spacing of the nodes will be
230uniform. If False, the nodes will be spaced proportionally to the fit
231(more precisely, to the difference between a node's deviance and the
232sum of its children's deviances). Small spaces must be expanded to
233leave room for the labels.</p><p>Note: if this parameter is False and the Text Magnification Factor is
234omitted (the default), very small text can sometimes result.</p></td></tr><tr><td class="info">{digits}</td><td class="info" align="left"><p>Number of significant digits to display in floating-point numbers.</p><p>Probabilities and percentages are treated specially. Probabilities are
235displayed with the specified number of digits after the decimal point
236(by default 2 digits). Percentages are displayed with the specified
237number of digits minus 2 after the decimal point (by default no
238digits).</p></td></tr><tr><td class="info">{varlen}</td><td class="info" align="left"><p>Length of variable names in text at the splits (and, for class
239responses, the class displayed at the node). There are three
240possibilities:</p><ul><li><p>0 - The default. Use full names.</p></li></ul><ul><li><p>&gt;0 - Use an abbreviation algorithm to shorten the names to at
241least the specified number, such that they remain unique.</p></li></ul><ul><li><p>&lt;0 - Truncate names to the shortest length where they are still
242unique, but never truncate to shorter than the specified number
243(e.g. the value -5 means never truncate to shorter than 5
244characters).</p></li></ul></td></tr><tr><td class="info">{faclen}</td><td class="info" align="left"><p>Length of factor level names (i.e. categorical variable values) in
245splits. There are four possibilities:</p><ul><li><p>0 - The default. Use full names.</p></li></ul><ul><li><p>1 - Represent factor levels with alphabetic characters (a for the
246first level, b for the second, and so on).</p></li></ul><ul><li><p>&gt;1 - Use an abbreviation algorithm to shorten the names to at
247least the specified number, such that they remain unique.</p></li></ul><ul><li><p>&lt;0 - Truncate names to the shortest length where they are still
248unique, but never truncate to shorter than the specified number
249(e.g. the value -5 means never truncate to shorter than 5
250characters).</p></li></ul></td></tr><tr><td class="info">{cex}</td><td class="info" align="left"><p>A numerical value giving the amount by which text should be
251magnified relative to the default. If omitted, the default, the text
252size will be calculated automatically.</p><p>The default automatic calculation means that this seemingly innocuous
253argument has a far reaching effect. If necessary it will trigger the
254node shifting engine to get a decent type size (see the Compress Tree
255Vertically parameter).</p></td></tr><tr><td class="info">{tweak}</td><td class="info" align="left"><p>Adjust the (possibly automatically calculated) Text Magnification
256Factor. For example, use 1.1 to make the text 10% larger. The default
257is 1, meaning no adjustment.</p><p>Note that font sizes are discrete, so the Text Magnification Factor
258you ask for may not be the one you get. And a small tweak may not
259actually change the type size or change it more than you want.</p></td></tr><tr><td class="info">{compress}</td><td class="info" align="left"><p>If True, the default, the tree will be compressed horizontally by
260shifting nodes horizontally where space is available.</p></td></tr><tr><td class="info">{ycompress}</td><td class="info" align="left"><p>If True, the default, and the initial automatically calculated
261Text Magnification Factor is less than 0.7, crowded labels will be
262shifted vertically where space is available. This often allows
considerably larger text.</p><p>Set this parameter to False if you feel the resulting plot is too
264messy. The shifting algorithm may work a little better (allowing
265larger text) for plot types 1, 2, and 3.</p></td></tr></tbody></table></div><p><h2><img width="11" height="11" border="0" src="sm_arrow_down.gif?format=raw" /> Scripting syntax</h2></p><div Class="expand" id="TEST">TreeModelFitToArcGISTable_GeoEco (inputTable, outputModelFile, formula, method, where, allowMissingCovariates, minSplit, minBucket, cp, maxCompete, maxSurrogate, useSurrogate, surrogateStyle, xval, maxDepth, pruningMethod, pruningCP, xColumnName, yColumnName, zColumnName, mColumnName, writeSummaryFile, writeDiagnosticPlots, writeTreePlot, writePrunedTreePlot, plotFileFormat, res, width, height, pointSize, bg, treePlotType, extra, percentage, under, clipRightLabels, fallenLeaves, branchType, branch, uniform, digits, varlen, faclen, cex, tweak, compress, ycompress) <br /><br /><b>Parameters</b><br /><table width="100%" border="0" cellpadding="5"><tbody><tr><th width="40%"><b>Expression</b></th><th width="60%"><b>Explanation</b></th></tr><tr><td class="info">Input table (Required) </td><td class="info" align="left"><p>ArcGIS table, table view, feature class, or feature layer
266containing the data for which the model should be fitted.</p></td></tr><tr><td class="info">Output model file (Required) </td><td class="info" align="left"><p>Output file to receive the fitted model. The file will not be in a
267user-readable format. After the model is fitted, you can provide the
268file to other tools that perform further analysis or visualization of
269the fitted model.</p><p>It is suggested, but not required, that you give the file an .Rdata
270extension.</p></td></tr><tr><td class="info">Formula (Required) </td><td class="info" align="left"><p>Formula that specifies the table field that is the response
271variable and the table fields that are the terms of the model.</p><p>The formula must be in the format expected by the R glm function:</p><dl><dt></dt><dd><pre>response ~ term1 + term2 + ... + termN</pre></dd></dl><p>response is the table field that will be modeled as the response
272variable and the terms are the table fields that will serve as the
273predictor variables. The field names are case sensitive. If any field
274used in the formula is NULL for a given row, that row will not be used
275in fitting the model.</p><p>For example, if you have a field Presence that indicates the presence
276or absence of a species (1 or 0) and you want to model it in terms of
277sampled environmental covariates stored in the SST, ChlDensity, and
278Depth fields, you would use the formula:</p><dl><dt></dt><dd><pre>Presence ~ SST + ChlDensity + Depth</pre></dd></dl><p>By default, all terms are treated as continuous variables. To indicate
279that a term should be treated as a categorical variable, use the
280factor function. For example, if SubstrateType is an integer code that
281should be treated as categorical:</p><dl><dt></dt><dd><pre>Presence ~ SST + ChlDensity + Depth + factor(SubstrateType)</pre></dd></dl><p>The model terms may also use these operators:</p><ul><li><p>The : operator denotes the interaction of variables a and b. For
282example: a:b.</p></li></ul><ul><li><p>The * operator denotes "crossing". For example, a*b is identical to
283a+b+a:b.</p></li></ul><ul><li><p>The ^ operator denotes crossing to the Nth degree. For example,
284(a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a
285formula containing the main effects for a, b and c together with
286their second-order interactions.</p></li></ul><ul><li><p>The %in% operator indicates that the terms on its left are nested
287within those on the right. For example a + b %in% a expands to the
288formula a + a:b.</p></li></ul><ul><li><p>The - operator (minus) removes the specified terms, so that
(a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also be
used to remove the intercept term: y ~ x - 1 is a line through the
origin. A model with no intercept can also be specified as y ~ x + 0
292or y ~ 0 + x.</p></li></ul><p>While formulae usually involve just variable names, they can also
293involve arithmetic expressions. The formula log(y) ~ a + log(x) is
294quite legal. When such arithmetic expressions involve operators which
295are also used symbolically in model formulae, there can be confusion
296between arithmetic and symbolic operator use.</p><p>To avoid this confusion, the function I() can be used to bracket those
297portions of a model formula where the operators are used in their
298arithmetic sense. For example, in the formula y ~ a + I(b+c), the term
299b+c is to be interpreted as the sum of b and c.</p><p>Please see the topics "glm" and "formula" in the R documentation for
300more information.</p></td></tr><tr><td class="info">Splitting method (Required) </td><td class="info" align="left"><p>Method to use for splitting the tree, one of:</p><ul><li><p>ANOVA - Use this method to build a regression tree, i.e. when you
301are modeling a continuous response variable, such as the abundance
302of a species. With this method, the splits will be chosen to
303maximize the between-groups sum-of-squares in a simple analysis of
304variance.</p></li></ul><ul><li><p>Class - Use this method to build a classification tree, i.e. when
305you are modeling a categorical response variable, such as the
306presence or absence of a species. When this method is selected, the
307response variable is assumed to be categorical and the R factor
308function is automatically applied to it.</p></li></ul><ul><li><p>Exp - Use this method to build a regression tree using exponential
309scaling. For more information about this method, please see the
310references below.</p></li></ul><ul><li><p>Poisson - Use this method to build a regression tree using Poisson
311regression, which is appropriate for event rate data. For more
312information about this method, please see the references below.</p></li></ul><p><b>References</b></p><p><a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a></p><p><a href="http://cran.r-project.org/web/packages/rpart/rpart.pdf">R rpart package documentation</a></p></td></tr><tr><td class="info">Where clause (Optional) </td><td class="info" align="left"><p>SQL WHERE clause expression that specifies the subset of rows to
313process. If this parameter is not provided, all of the rows will be
314processed. If this parameter is provided but the underlying database
315does not support WHERE clauses, an error will be raised.</p><p>The exact syntax of this expression depends on the underlying
316database. ESRI recommends you reference fields using the following
317syntax:</p><ul><li><p>If you're querying ArcInfo coverages, shapefiles, INFO tables or
318dBASE tables (.dbf files), enclose field names in double quotes in
319the SQL expression: "MY_FIELD".</p></li></ul><ul><li><p>If you're querying Microsoft Access tables or personal
320geodatabase tables, enclose field names in square brackets:
321[MY_FIELD].</p></li></ul><ul><li><p>If you're querying ArcSDE geodatabase tables, an ArcIMS feature
322class, or an ArcIMS image service sublayer, don't enclose field
323names: MY_FIELD.</p></li></ul></td></tr><tr><td class="info">Include records that are missing covariates (Optional) </td><td class="info" align="left"><p>If this option is enabled (the default), records will be included
324in the model fitting process so long as they have a value for the
325response variable and at least one predictor variable. If this option
326is disabled, records must have values for the response variable and
327all predictor variables in order to be included.</p><p>The R rpart package that is used to fit the model has the novel
328capability of allowing records that are missing some data to still
329participate in the model fitting process. For more information about
330how this works, please see
331<a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a>.</p></td></tr><tr><td class="info">Minimum number of observations to attempt a split (Optional) </td><td class="info" align="left"><p>The minimum number of observations that must exist in a node of
332the tree in order for a split of that node to be attempted. The
333default value, 20, was taken from the R rpart package that is used to
334fit the model.</p></td></tr><tr><td class="info">Minimum number of observations in a leaf node (Optional) </td><td class="info" align="left"><p>The minimum number of observations that may be in any leaf node of
335the tree. The default value, 7, was taken from the R rpart package
336that is used to fit the model. By default, rpart recommends that this
337parameter be set to one third of the previous parameter.</p></td></tr><tr><td class="info">Complexity parameter (Optional) </td><td class="info" align="left"><p>Any split that does not decrease the overall lack of fit by a
338factor of this parameter will not be attempted. For instance, with
ANOVA splitting, this means that the overall R-squared must increase by
340this parameter at each step. The main role of this parameter is to
341save computing time by pruning off splits that are obviously not
342worthwhile. Essentially, you inform the tool that any split which does
343not improve the fit by this parameter will likely be pruned off by
344cross-validation, and that hence the tool need not pursue it.</p><p>The default value, 0.01, was taken from the R rpart package that is
345used to fit the model.</p></td></tr><tr><td class="info">Number of competitor splits to retain (Optional) </td><td class="info" align="left"><p>The number of competitor splits to retain in the output. It is
346useful to know not just which split was chosen, but which variable
347came in second, third, etc. The default value, 4, was taken from the R
348rpart package that is used to fit the model.</p></td></tr><tr><td class="info">Number of surrogate splits to retain (Optional) </td><td class="info" align="left"><p>The number of surrogate splits to retain in the output. If this is
349set to zero the compute time will be shortened, since approximately
350half of the computational time (other than setup) is used in the
351search for surrogate splits. The default value, 5, was taken from the
352R rpart package that is used to fit the model.</p></td></tr><tr><td class="info">Surrogate usage method (Optional) </td><td class="info" align="left"><p>The method for using surrogates in the splitting process, one of:</p><ul><li><p>0 - display only; an observation with a missing value for the
353primary split rule is not sent further down the tree.</p></li></ul><ul><li><p>1 - use surrogates, in order, to split subjects missing the primary
354variable; if all surrogates are missing the observation is not
355split.</p></li></ul><ul><li><p>2 - if all surrogates are missing, then send the observation in the
majority direction. This is the recommendation of Breiman et al.</p></li></ul><p>The default value, 2, was taken from the R rpart package that is used
357to fit the model.</p></td></tr><tr><td class="info">Surrogate selection method (Optional) </td><td class="info" align="left"><p>The method used to select the best surrogate, one of:</p><ul><li><p>0 - the tool uses the total number of correct classifications for a
358potential surrogate variable.</p></li></ul><ul><li><p>1 - the tool uses the percent correct, calculated over the
359non-missing values of the surrogate.</p></li></ul><p>The default value, 0, was taken from the R rpart package that is used
360to fit the model. This value more severely penalizes covariates with a
361large number of missing values.</p></td></tr><tr><td class="info">Number of cross-validations (Optional) </td><td class="info" align="left"><p>The number of cross-validations to perform.</p><p>The R rpart package that is used to fit the model uses a default of
36210, but we have found that so few iterations can cause the calculated
363cross-validation error to differ substantially over several runs of
364the tool using identical input data and parameter values. In one case,
365we observed the cross-validation errors to vary by over 10%. Because
366the cross-validation errors are often used to prune the tree, we
367believe it is important to have accurate estimates of them, so we
368increased the default to 1000. This will cause complicated models to
369run substantially slower. If you find your model is too slow, decrease
370the value.</p></td></tr><tr><td class="info">Maximum tree depth (Optional) </td><td class="info" align="left"><p>The maximum depth of any node of the final tree, with the root
371node counted as depth 0. The default value, 30, was taken from the R
372rpart package that is used to fit the model.</p></td></tr><tr><td class="info">Pruning method (Optional) </td><td class="info" align="left"><p>Method to use for pruning the tree, one of:</p><ul><li><p>Minimum error - The tool will prune the tree using the complexity
373parameter associated with the minimum cross-validation error.</p></li></ul><ul><li><p>1-SE rule - The tool will prune the tree using the complexity
374parameter obtained using the 1-SE rule described by Therneau and
375Atkinson in chapter 4 of
376<a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a>.
377A plot of the mean cross-validation errors versus candidate values
378of the complexity parameter often has an initial sharp drop followed
379by a relatively flat plateau and then a slow rise. (This tool
380produces that plot as an optional diagnostic output.) According to
381the 1-SE rule, any cross-validation error within one standard error
382of the minimum cross-validation error is considered equivalent to
383the minimum (i.e. considered to be part of the flat plateau). The
3841-SE rule chooses the largest complexity parameter that yields a
cross-validation error equivalent to the minimum. This results in an
386optimal tree, i.e. the tree with the fewest number of splits that
387yields a cross-validation error equivalent to the minimum.</p></li></ul><ul><li><p>Interactive - The tool will display the unpruned tree in a window,
388allowing you to prune it interactively with the mouse. If you click
389on a split it will be marked as deleted. If you click on an
390already-deleted split it will be undeleted (if its parent is not
391deleted). Information about the node is printed as you click. When
392you have finished pruning, click on the QUIT button.</p></li></ul><ul><li><p>User specified CP - The tool will prune the tree using the
393complexity parameter you specify below.</p></li></ul><p>If this parameter is omitted, the tree will not be pruned.</p></td></tr><tr><td class="info">Complexity parameter for pruning (Optional) </td><td class="info" align="left"><p>Complexity parameter for pruning the tree. This parameter is only
used when the Pruning Method is set to 'User specified CP'.</p></td></tr><tr><td class="info">Name to use for X coordinates of points (Optional) </td><td class="info" align="left"><p>Name to use in the formula for the X coordinates of point
395features. If the input table is a point feature class or layer, the X
396coordinates will be extracted from the points and be accessible in the
397formula using the name provided for this parameter.</p></td></tr><tr><td class="info">Name to use for Y coordinates of points (Optional) </td><td class="info" align="left"><p>Name to use in the formula for the Y coordinates of point
398features. If the input table is a point feature class or layer, the Y
399coordinates will be extracted from the points and be accessible in the
400formula using the name provided for this parameter.</p></td></tr><tr><td class="info">Name to use for Z coordinates of points (Optional) </td><td class="info" align="left"><p>Name to use in the formula for the Z coordinates of point
401features. If the input table is a point feature class or layer that
402has Z coordinates, the Z coordinates will be extracted from the points
403and be accessible in the formula using the name provided for this
404parameter.</p></td></tr><tr><td class="info">Name to use for M values of points (Optional) </td><td class="info" align="left"><p>Name to use in the formula for the measure values of point
405features. If the input table is a point feature class or layer that
406has measure values, the measure values will be extracted from the
407points and be accessible in the formula using the name provided for this
408parameter.</p></td></tr><tr><td class="info">Write model summary file (Optional) </td><td class="info" align="left"><p>If True, this tool will write summary information about the fitted
409model to a text file. (This is the same information that the tool
410outputs as log messages.) The file will have the name X_summary.txt,
411where X is the name of the output model file, minus any
412extension.</p></td></tr><tr><td class="info">Write diagnostic plots (Optional) </td><td class="info" align="left"><p>If True, this tool will write diagnostic plots:</p><ul><li><p>X_cp.Y - visual representation of the cross-validation results for
413the unpruned tree, to assist you with choosing a Complexity
414Parameter for pruning the tree. The x-axis represents possible
415choices for the Complexity Parameter and the y-axis represents the
416means and standard deviations of the errors in the cross-validated
417prediction that would result. The dashed horizontal line is drawn 1
418standard error above the minimum of the curve. A good choice of the
419Complexity Parameter for pruning is the leftmost value for which the
420mean error lies below the line. This value will be chosen
421automatically if the Pruning Method parameter is set to '1-SE rule'.</p></li></ul><ul><li><p>X_rsquare.Y - two-panel plot only produced for the ANOVA splitting
method. The first panel shows the r-square (both the apparent value and
the value estimated from cross-validation) versus the number of splits. The
424second panel shows the mean error in the cross-validated prediction
425versus the number of splits (this is essentially the same plot as
426the X_cp.Y plot described above). Both panels are produced for the
427unpruned tree.</p></li></ul><ul><li><p>X_residuals.Y - plot of the residuals vs. the fitted values for the
428unpruned tree.</p></li></ul><ul><li><p>X_pruned_residuals.Y - plot of the residuals vs. the fitted values
429for the pruned tree. This plot will only be produced if the tree is
430pruned.</p></li></ul><p>In the file names above, X is the name of the output model file, minus
431any extension, and Y is the extension of the selected output plot
432format.</p></td></tr><tr><td class="info">Write tree plot (Optional) </td><td class="info" align="left"><p>If True, this tool will write a plot of the unpruned tree to a
433file having the name X_unpruned_tree.Y, where X is the name of the
434output model file minus the extension and Y is the extension of the
435selected output plot format.</p></td></tr><tr><td class="info">Write pruned tree plot (Optional) </td><td class="info" align="left"><p>If True, this tool will write a plot of the pruned tree to a file
436having the name X_pruned_tree.Y, where X is the name of the output
437model file minus the extension and Y is the extension of the selected
438output plot format. This plot will only be produced if the tree is
439pruned.</p></td></tr><tr><td class="info">Plot file format (Optional) </td><td class="info" align="left"><p>Plot file format, one of:</p><ul><li><p>emf - Windows enhanced metafile (EMF) format. This is a vector
440format that may be printed and resized without any pixelation and is
441therefore suitable for use in printable documents that recognize
442this format (e.g. Microsoft Word or Microsoft Visio).</p></li></ul><ul><li><p>png - Portable network graphics (PNG) format. This is a compressed,
443lossless, highly portable raster format suitable for use in web
444pages or other locations where a raster format is desired. Most
445scientific journals accept PNG; they typically request that files
446have a resolution of at least 1000 DPI.</p></li></ul></td></tr><tr><td class="info">Plot resolution, in DPI (Optional) </td><td class="info" align="left"><p>PNG plot file resolution, in dots per inch (DPI). The default is
447set to a high value (1000) because this is the minimum resolution
448typically required by scientific journals that accept figures in PNG
449format.</p><p>This parameter is ignored for EMF format because it is a vector
450format.</p></td></tr><tr><td class="info">Plot width (Optional) </td><td class="info" align="left"><p>Plot file width in inches (for EMF format) or pixels (for PNG
format).</p></td></tr><tr><td class="info">Plot height (Optional) </td><td class="info" align="left"><p>Plot file height in inches (for EMF format) or pixels (for PNG
452format).</p></td></tr><tr><td class="info">Default pointsize of text in diagnostic plots (Optional) </td><td class="info" align="left"><p>The default pointsize of text in diagnostic plots (the size of the
453text in tree plots is controlled by a different parameter).</p></td></tr><tr><td class="info">Plot background color (Optional) </td><td class="info" align="left"><p>PNG plot file background color. The color must be a valid name in
454R's color palette, or "transparent" if there is no background color.
This parameter is ignored if the plot file format is EMF.</p></td></tr><tr><td class="info">Plot type (Optional) </td><td class="info" align="left"><p>Type of tree plots to create, one of:</p><ul><li><p>0 - The default. Draw a split label at each split and a node label
456at each leaf.</p></li></ul><ul><li><p>1 - Label all nodes, not just leaves.</p></li></ul><ul><li><p>2 - Like 1 but draw the split labels below the node labels. Similar
457to the plots in the CART book.</p></li></ul><ul><li><p>3 - Draw separate split labels for the left and right directions.</p></li></ul><ul><li><p>4 - Like 3 but label all nodes, not just leaves.</p></li></ul></td></tr><tr><td class="info">Extra information (Optional) </td><td class="info" align="left"><p>Extra information to display at the nodes, one of:</p><ul><li><p>0 - No extra information.</p></li></ul><ul><li><p>1 - The default. Display the number of observations that fall in the
458node (per class for Class models; prefixed by the number of events
459for Poisson and Exp models).</p></li></ul><ul><li><p>2 - Class models: display the classification rate at the node, expressed as the
460number of correct classifications and the number of observations in the node.
461Poisson and Exp models: display the number of events.</p></li></ul><ul><li><p>3 - Class models only: misclassification rate at the node, expressed
462as the number of incorrect classifications and the number of
463observations in the node.</p></li></ul><ul><li><p>4 - Class models only: probability per class of observations in the
464node (conditioned on the node, sum across a node is 1).</p></li></ul><ul><li><p>5 - Class models only: like 4 but do not display the fitted class.</p></li></ul><ul><li><p>6 - Class models only: the probability of the second class only.
465Useful for binary responses.</p></li></ul><ul><li><p>7 - Class models only: like 6 but do not display the fitted class.</p></li></ul><ul><li><p>8 - Class models only: the probability of the fitted class.</p></li></ul><ul><li><p>9 - Class models only: the probabilities times the fraction of
466observations in the node (the probability relative to all
467observations, sum across all leaves is 1).</p></li></ul></td></tr><tr><td class="info">Display percentage of observations (Optional) </td><td class="info" align="left"><p>If True, the default, nodes will be labeled with the percentage of
468observations in the node. The percentage will be displayed below the
469"extra information" (if any is requested).</p></td></tr><tr><td class="info">Display extra text under node boxes (Optional) </td><td class="info" align="left"><p>If True, the default, extra information and percentage of
470observations will be displayed below the nodes. If False, they will be
471displayed within the nodes' boxes.</p><p>This parameter is ignored if neither extra information nor percentage
of observations is requested.</p></td></tr><tr><td class="info">Clip right-hand split labels (Optional) </td><td class="info" align="left"><p>If True, the default, the right-hand split labels on plots of type
4733 or 4 will not include "variable=". If False, the right-hand labels
will include "variable=", just like the left-hand labels.</p><p>This parameter is ignored if the plot type is not 3 or 4.</p></td></tr><tr><td class="info">Display leaves at bottom (Optional) </td><td class="info" align="left"><p>If True, all leaf nodes will be displayed at the bottom. If False,
475the default, leaf nodes will be displayed where they would normally
476appear.</p></td></tr><tr><td class="info">Branch type (Optional) </td><td class="info" align="left"><p>Type of branches to draw. If zero, the default, the tool will draw
477conventional branches having a constant narrow width. If nonzero, the tool
478will draw "wide branches", with branch widths proportional to the
479specified parameter, one of:</p><ul><li><p>1 - Deviance</p></li></ul><ul><li><p>2 - Square root of deviance</p></li></ul><ul><li><p>3 - Deviance / number of observations</p></li></ul><ul><li><p>4 - Square root of (deviance / number of observations)</p></li></ul><ul><li><p>5 - Number of observations</p></li></ul><ul><li><p>6 - Complexity parameter</p></li></ul><ul><li><p>7 - Absolute value of the predicted value</p></li></ul><ul><li><p>8 - Predicted value minus the minimum predicted value</p></li></ul><ul><li><p>9 - Constant wide width, for checking the visual distortion that
480results when wide branches are drawn at different angles</p></li></ul></td></tr><tr><td class="info">Branch shape (Optional) </td><td class="info" align="left"><p>Controls the shape of the branches from parent to child nodes. Any
number from 0 to 1 is allowed. A value of 1 gives square-shouldered
branches, a value of 0 gives V-shaped branches, with other values being
483intermediate.</p><p>Note that if the Branch Type parameter is nonzero, the Branch Shape
484parameter will be rounded to 1 or 0 (e.g. a Branch Shape of 0.75 will
485be rounded to 1).</p></td></tr><tr><td class="info">Use uniform vertical spacing (Optional) </td><td class="info" align="left"><p>If True, the default, the vertical spacing of the nodes will be
486uniform. If False, the nodes will be spaced proportionally to the fit
487(more precisely, to the difference between a node's deviance and the
488sum of its children's deviances). Small spaces must be expanded to
489leave room for the labels.</p><p>Note: if this parameter is False and the Text Magnification Factor is
490omitted (the default), very small text can sometimes result.</p></td></tr><tr><td class="info">Significant digits for labels (Optional) </td><td class="info" align="left"><p>Number of significant digits to display in floating-point numbers.</p><p>Probabilities and percentages are treated specially. Probabilities are
491displayed with the specified number of digits after the decimal point
492(by default 2 digits). Percentages are displayed with the specified
493number of digits minus 2 after the decimal point (by default no
494digits).</p></td></tr><tr><td class="info">Length of variable names at splits (Optional) </td><td class="info" align="left"><p>Length of variable names in text at the splits (and, for class
495responses, the class displayed at the node). There are three
496possibilities:</p><ul><li><p>0 - The default. Use full names.</p></li></ul><ul><li><p>&gt;0 - Use an abbreviation algorithm to shorten the names to at
497least the specified number, such that they remain unique.</p></li></ul><ul><li><p>&lt;0 - Truncate names to the shortest length where they are still
498unique, but never truncate to shorter than the specified number
499(e.g. the value -5 means never truncate to shorter than 5
500characters).</p></li></ul></td></tr><tr><td class="info">Length of factor level names in splits (Optional) </td><td class="info" align="left"><p>Length of factor level names (i.e. categorical variable values) in
501splits. There are four possibilities:</p><ul><li><p>0 - The default. Use full names.</p></li></ul><ul><li><p>1 - Represent factor levels with alphabetic characters (a for the
502first level, b for the second, and so on).</p></li></ul><ul><li><p>&gt;1 - Use an abbreviation algorithm to shorten the names to at
503least the specified number, such that they remain unique.</p></li></ul><ul><li><p>&lt;0 - Truncate names to the shortest length where they are still
504unique, but never truncate to shorter than the specified number
505(e.g. the value -5 means never truncate to shorter than 5
506characters).</p></li></ul></td></tr><tr><td class="info">Text magnification factor (Optional) </td><td class="info" align="left"><p>A numerical value giving the amount by which text should be
507magnified relative to the default. If omitted, the default, the text
508size will be calculated automatically.</p><p>The default automatic calculation means that this seemingly innocuous
509argument has a far reaching effect. If necessary it will trigger the
510node shifting engine to get a decent type size (see the Compress Tree
511Vertically parameter).</p></td></tr><tr><td class="info">Tweak text magnification (Optional) </td><td class="info" align="left"><p>Adjust the (possibly automatically calculated) Text Magnification
512Factor. For example, use 1.1 to make the text 10% larger. The default
513is 1, meaning no adjustment.</p><p>Note that font sizes are discrete, so the Text Magnification Factor
514you ask for may not be the one you get. And a small tweak may not
actually change the type size, or may change it more than you want.</p></td></tr><tr><td class="info">Compress tree horizontally (Optional) </td><td class="info" align="left"><p>If True, the default, the tree will be compressed horizontally by
516shifting nodes horizontally where space is available.</p></td></tr><tr><td class="info">Compress tree vertically (Optional) </td><td class="info" align="left"><p>If True, the default, and the initial automatically calculated
517Text Magnification Factor is less than 0.7, crowded labels will be
518shifted vertically where space is available. This often allows
considerably larger text.</p><p>Set this parameter to False if you feel the resulting plot is too
520messy. The shifting algorithm may work a little better (allowing
521larger text) for plot types 1, 2, and 3.</p></td></tr></tbody></table></div></body></html>
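The difference between the 'Minimum error' and '1-SE rule' pruning methods described under the Pruning method parameter can be illustrated with a small, self-contained Python sketch. It is not the tool's implementation. The function names, the (cp, xerror, xstd) row layout, and the example numbers are all hypothetical and chosen only for illustration, and it is an assumption that "within one standard error" is evaluated with a less-than-or-equal comparison against the standard error of the minimum-error row.

```python
# Illustrative sketch only: rows are (cp, xerror, xstd), i.e. a candidate
# complexity parameter, its mean cross-validation error, and the standard
# error of that mean. The numbers below are made up for the example.
example_cp_table = [
    (0.30, 1.00, 0.05),
    (0.10, 0.75, 0.05),
    (0.05, 0.62, 0.04),
    (0.02, 0.60, 0.04),
    (0.01, 0.61, 0.04),
]


def select_cp_min_error(cp_table):
    """Return the cp whose mean cross-validation error is smallest."""
    if not cp_table:
        raise ValueError("cp_table must not be empty")
    best_row = min(cp_table, key=lambda row: row[1])
    return best_row[0]


def select_cp_one_se(cp_table):
    """Return the largest cp whose error is within 1 SE of the minimum."""
    if not cp_table:
        raise ValueError("cp_table must not be empty")
    best_row = min(cp_table, key=lambda row: row[1])
    # Assumption: the threshold is the minimum error plus the standard
    # error reported for that same row.
    threshold = best_row[1] + best_row[2]
    equivalent = [row[0] for row in cp_table if row[1] <= threshold]
    # A larger cp corresponds to a smaller tree, so take the maximum.
    return max(equivalent)


if __name__ == "__main__":
    print("Minimum error cp:", select_cp_min_error(example_cp_table))
    print("1-SE rule cp:", select_cp_one_se(example_cp_table))
```

With these example numbers the minimum-error choice is the row with cp 0.02, while the 1-SE rule accepts every row whose error is at most 0.64 and therefore picks the larger cp of 0.05, giving a tree with fewer splits.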