| 1 | <?xml version="1.0" encoding="utf-8"?> |
|---|
| 2 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
|---|
| 3 | <html xmlns="http://www.w3.org/1999/xhtml"><head><link rel="stylesheet" type="text/css" href="81help.css?format=raw" /><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Fit Tree Model</title></head><body><table style="margin-top:-1em; margin-bottom:0; padding:0; margin-left:-1em"><tr><td style="background:white"><img width="875" height="70" alt="ArcToolbox banner" src="AHBanner_ArcToolbox.gif?format=raw" /></td></tr></table><h1>Fit Tree Model</h1><p></p><p>Fits a tree model to data in an ArcGIS table.</p><p>Tree models were first introduced by Breiman et al. (1984) in the |
|---|
| 4 | classic Classification and Regression Tree (CART) software and are |
|---|
| 5 | frequently referenced by that name. Since that time, the original |
|---|
| 6 | methods have been reimplemented in R and many other statistical |
|---|
| 7 | programs. This tool fits tree models using the R rpart package by |
|---|
| 8 | Terry M. Therneau and Elizabeth J. Atkinson, and plots them using the |
|---|
| 9 | R rpart.plot package by Stephen Milborrow.</p><p><b>References</b></p><p>Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984). Classification and regression trees. Chapman & Hall/CRC.</p><p><a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a></p><p><a href="http://cran.r-project.org/web/packages/rpart/rpart.pdf">R rpart package documentation</a></p><br /><p><h2><img width="11" height="11" border="0" src="sm_arrow_down.gif?format=raw" /> Command line syntax</h2></p><div Class="expand" id="id103139">TreeModelFitToArcGISTable_GeoEco <inputTable> <outputModelFile> <formula> <ANOVA | Class | Exp | Poisson> {where} {allowMissingCovariates} {minSplit} {minBucket} {cp} {maxCompete} {maxSurrogate} {2 | 0 | 1} {0 | 1} {xval} {maxDepth} {Minimum error | 1-SE rule | Interactive | User specified CP} {pruningCP} {xColumnName} {yColumnName} {zColumnName} {mColumnName} {writeSummaryFile} {writeDiagnosticPlots} {writeTreePlot} {writePrunedTreePlot} {png | emf} {res} {width} {height} {pointSize} {bg} {0 | 1 | 2 | 3 | 4} {1 | 0 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9} {percentage} {under} {clipRightLabels} {fallenLeaves} {0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9} {branch} {uniform} {digits} {varlen} {faclen} {cex} {tweak} {compress} {ycompress} <br /><br /><b>Parameters</b><br /><table width="100%" border="0" cellpadding="5"><tbody><tr><th width="40%"><b>Expression</b></th><th width="60%"><b>Explanation</b></th></tr><tr><td class="info"><inputTable></td><td class="info" align="left"><p>ArcGIS table, table view, feature class, or feature layer |
|---|
| 10 | containing the data for which the model should be fitted.</p></td></tr><tr><td class="info"><outputModelFile></td><td class="info" align="left"><p>Output file to receive the fitted model. The file will not be in a |
|---|
| 11 | user-readable format. After the model is fitted, you can provide the |
|---|
| 12 | file to other tools that perform further analysis or visualization of |
|---|
| 13 | the fitted model.</p><p>It is suggested, but not required, that you give the file an .Rdata |
|---|
| 14 | extension.</p></td></tr><tr><td class="info"><formula></td><td class="info" align="left"><p>Formula that specifies the table field that is the response |
|---|
| 15 | variable and the table fields that are the terms of the model.</p><p>The formula must be in the format expected by the R glm function:</p><dl><dt></dt><dd><pre>response ~ term1 + term2 + ... + termN</pre></dd></dl><p>response is the table field that will be modeled as the response |
|---|
| 16 | variable and the terms are the table fields that will serve as the |
|---|
| 17 | predictor variables. The field names are case sensitive. If any field |
|---|
| 18 | used in the formula is NULL for a given row, that row will not be used |
|---|
| 19 | in fitting the model.</p><p>For example, if you have a field Presence that indicates the presence |
|---|
| 20 | or absence of a species (1 or 0) and you want to model it in terms of |
|---|
| 21 | sampled environmental covariates stored in the SST, ChlDensity, and |
|---|
| 22 | Depth fields, you would use the formula:</p><dl><dt></dt><dd><pre>Presence ~ SST + ChlDensity + Depth</pre></dd></dl><p>By default, all terms are treated as continuous variables. To indicate |
|---|
| 23 | that a term should be treated as a categorical variable, use the |
|---|
| 24 | factor function. For example, if SubstrateType is an integer code that |
|---|
| 25 | should be treated as categorical:</p><dl><dt></dt><dd><pre>Presence ~ SST + ChlDensity + Depth + factor(SubstrateType)</pre></dd></dl><p>The model terms may also use these operators:</p><ul><li><p>The : operator denotes the interaction of variables a and b. For |
|---|
| 26 | example: a:b.</p></li></ul><ul><li><p>The * operator denotes "crossing". For example, a*b is identical to |
|---|
| 27 | a+b+a:b.</p></li></ul><ul><li><p>The ^ operator denotes crossing to the Nth degree. For example, |
|---|
| 28 | (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a |
|---|
| 29 | formula containing the main effects for a, b and c together with |
|---|
| 30 | their second-order interactions.</p></li></ul><ul><li><p>The %in% operator indicates that the terms on its left are nested |
|---|
| 31 | within those on the right. For example a + b %in% a expands to the |
|---|
| 32 | formula a + a:b.</p></li></ul><ul><li><p>The - operator (minus) removes the specified terms, so that |
|---|
| 33 | (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also |
|---|
| 34 | be used to remove the intercept term: y ~ x - 1 is a line through the |
|---|
| 35 | origin. A model with no intercept can also be specified as y ~ x + 0 |
|---|
| 36 | or y ~ 0 + x.</p></li></ul><p>While formulae usually involve just variable names, they can also |
|---|
| 37 | involve arithmetic expressions. The formula log(y) ~ a + log(x) is |
|---|
| 38 | quite legal. When such arithmetic expressions involve operators which |
|---|
| 39 | are also used symbolically in model formulae, there can be confusion |
|---|
| 40 | between arithmetic and symbolic operator use.</p><p>To avoid this confusion, the function I() can be used to bracket those |
|---|
| 41 | portions of a model formula where the operators are used in their |
|---|
| 42 | arithmetic sense. For example, in the formula y ~ a + I(b+c), the term |
|---|
| 43 | b+c is to be interpreted as the sum of b and c.</p><p>Please see the topics "glm" and "formula" in the R documentation for |
|---|
| 44 | more information.</p></td></tr><tr><td class="info"><ANOVA | Class | Exp | Poisson></td><td class="info" align="left"><p>Method to use for splitting the tree, one of:</p><ul><li><p>ANOVA - Use this method to build a regression tree, i.e. when you |
|---|
| 45 | are modeling a continuous response variable, such as the abundance |
|---|
| 46 | of a species. With this method, the splits will be chosen to |
|---|
| 47 | maximize the between-groups sum-of-squares in a simple analysis of |
|---|
| 48 | variance.</p></li></ul><ul><li><p>Class - Use this method to build a classification tree, i.e. when |
|---|
| 49 | you are modeling a categorical response variable, such as the |
|---|
| 50 | presence or absence of a species. When this method is selected, the |
|---|
| 51 | response variable is assumed to be categorical and the R factor |
|---|
| 52 | function is automatically applied to it.</p></li></ul><ul><li><p>Exp - Use this method to build a regression tree using exponential |
|---|
| 53 | scaling. For more information about this method, please see the |
|---|
| 54 | references below.</p></li></ul><ul><li><p>Poisson - Use this method to build a regression tree using Poisson |
|---|
| 55 | regression, which is appropriate for event rate data. For more |
|---|
| 56 | information about this method, please see the references below.</p></li></ul><p><b>References</b></p><p><a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a></p><p><a href="http://cran.r-project.org/web/packages/rpart/rpart.pdf">R rpart package documentation</a></p></td></tr><tr><td class="info">{where}</td><td class="info" align="left"><p>SQL WHERE clause expression that specifies the subset of rows to |
|---|
| 57 | process. If this parameter is not provided, all of the rows will be |
|---|
| 58 | processed. If this parameter is provided but the underlying database |
|---|
| 59 | does not support WHERE clauses, an error will be raised.</p><p>The exact syntax of this expression depends on the underlying |
|---|
| 60 | database. ESRI recommends you reference fields using the following |
|---|
| 61 | syntax:</p><ul><li><p>If you're querying ArcInfo coverages, shapefiles, INFO tables or |
|---|
| 62 | dBASE tables (.dbf files), enclose field names in double quotes in |
|---|
| 63 | the SQL expression: "MY_FIELD".</p></li></ul><ul><li><p>If you're querying Microsoft Access tables or personal |
|---|
| 64 | geodatabase tables, enclose field names in square brackets: |
|---|
| 65 | [MY_FIELD].</p></li></ul><ul><li><p>If you're querying ArcSDE geodatabase tables, an ArcIMS feature |
|---|
| 66 | class, or an ArcIMS image service sublayer, don't enclose field |
|---|
| 67 | names: MY_FIELD.</p></li></ul></td></tr><tr><td class="info">{allowMissingCovariates}</td><td class="info" align="left"><p>If this option is enabled (the default), records will be included |
|---|
| 68 | in the model fitting process so long as they have a value for the |
|---|
| 69 | response variable and at least one predictor variable. If this option |
|---|
| 70 | is disabled, records must have values for the response variable and |
|---|
| 71 | all predictor variables in order to be included.</p><p>The R rpart package that is used to fit the model has the novel |
|---|
| 72 | capability of allowing records that are missing some data to still |
|---|
| 73 | participate in the model fitting process. For more information about |
|---|
| 74 | how this works, please see |
|---|
| 75 | <a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a>.</p></td></tr><tr><td class="info">{minSplit}</td><td class="info" align="left"><p>The minimum number of observations that must exist in a node of |
|---|
| 76 | the tree in order for a split of that node to be attempted. The |
|---|
| 77 | default value, 20, was taken from the R rpart package that is used to |
|---|
| 78 | fit the model.</p></td></tr><tr><td class="info">{minBucket}</td><td class="info" align="left"><p>The minimum number of observations that may be in any leaf node of |
|---|
| 79 | the tree. The default value, 7, was taken from the R rpart package |
|---|
| 80 | that is used to fit the model. By default, rpart recommends that this |
|---|
| 81 | parameter be set to one third of the previous parameter.</p></td></tr><tr><td class="info">{cp}</td><td class="info" align="left"><p>Any split that does not decrease the overall lack of fit by a |
|---|
| 82 | factor of this parameter will not be attempted. For instance, with |
|---|
| 83 | ANOVA splitting, this means that the overall Rsquare must increase by |
|---|
| 84 | this parameter at each step. The main role of this parameter is to |
|---|
| 85 | save computing time by pruning off splits that are obviously not |
|---|
| 86 | worthwhile. Essentially, you inform the tool that any split which does |
|---|
| 87 | not improve the fit by this parameter will likely be pruned off by |
|---|
| 88 | cross-validation, and that hence the tool need not pursue it.</p><p>The default value, 0.01, was taken from the R rpart package that is |
|---|
| 89 | used to fit the model.</p></td></tr><tr><td class="info">{maxCompete}</td><td class="info" align="left"><p>The number of competitor splits to retain in the output. It is |
|---|
| 90 | useful to know not just which split was chosen, but which variable |
|---|
| 91 | came in second, third, etc. The default value, 4, was taken from the R |
|---|
| 92 | rpart package that is used to fit the model.</p></td></tr><tr><td class="info">{maxSurrogate}</td><td class="info" align="left"><p>The number of surrogate splits to retain in the output. If this is |
|---|
| 93 | set to zero the compute time will be shortened, since approximately |
|---|
| 94 | half of the computational time (other than setup) is used in the |
|---|
| 95 | search for surrogate splits. The default value, 5, was taken from the |
|---|
| 96 | R rpart package that is used to fit the model.</p></td></tr><tr><td class="info">{2 | 0 | 1}</td><td class="info" align="left"><p>The method for using surrogates in the splitting process, one of:</p><ul><li><p>0 - display only; an observation with a missing value for the |
|---|
| 97 | primary split rule is not sent further down the tree.</p></li></ul><ul><li><p>1 - use surrogates, in order, to split subjects missing the primary |
|---|
| 98 | variable; if all surrogates are missing the observation is not |
|---|
| 99 | split.</p></li></ul><ul><li><p>2 - if all surrogates are missing, then send the observation in the |
|---|
| 100 | majority direction. This is the recommendation of Breiman et al.</p></li></ul><p>The default value, 2, was taken from the R rpart package that is used |
|---|
| 101 | to fit the model.</p></td></tr><tr><td class="info">{0 | 1}</td><td class="info" align="left"><p>The method used to select the best surrogate, one of:</p><ul><li><p>0 - the tool uses the total number of correct classifications for a |
|---|
| 102 | potential surrogate variable.</p></li></ul><ul><li><p>1 - the tool uses the percent correct, calculated over the |
|---|
| 103 | non-missing values of the surrogate.</p></li></ul><p>The default value, 0, was taken from the R rpart package that is used |
|---|
| 104 | to fit the model. This value more severely penalizes covariates with a |
|---|
| 105 | large number of missing values.</p></td></tr><tr><td class="info">{xval}</td><td class="info" align="left"><p>The number of cross-validations to perform.</p><p>The R rpart package that is used to fit the model uses a default of |
|---|
| 106 | 10, but we have found that so few iterations can cause the calculated |
|---|
| 107 | cross-validation error to differ substantially over several runs of |
|---|
| 108 | the tool using identical input data and parameter values. In one case, |
|---|
| 109 | we observed the cross-validation errors to vary by over 10%. Because |
|---|
| 110 | the cross-validation errors are often used to prune the tree, we |
|---|
| 111 | believe it is important to have accurate estimates of them, so we |
|---|
| 112 | increased the default to 1000. This will cause complicated models to |
|---|
| 113 | run substantially slower. If you find your model is too slow, decrease |
|---|
| 114 | the value.</p></td></tr><tr><td class="info">{maxDepth}</td><td class="info" align="left"><p>The maximum depth of any node of the final tree, with the root |
|---|
| 115 | node counted as depth 0. The default value, 30, was taken from the R |
|---|
| 116 | rpart package that is used to fit the model.</p></td></tr><tr><td class="info">{Minimum error | 1-SE rule | Interactive | User specified CP}</td><td class="info" align="left"><p>Method to use for pruning the tree, one of:</p><ul><li><p>Minimum error - The tool will prune the tree using the complexity |
|---|
| 117 | parameter associated with the minimum cross-validation error.</p></li></ul><ul><li><p>1-SE rule - The tool will prune the tree using the complexity |
|---|
| 118 | parameter obtained using the 1-SE rule described by Therneau and |
|---|
| 119 | Atkinson in chapter 4 of |
|---|
| 120 | <a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a>. |
|---|
| 121 | A plot of the mean cross-validation errors versus candidate values |
|---|
| 122 | of the complexity parameter often has an initial sharp drop followed |
|---|
| 123 | by a relatively flat plateau and then a slow rise. (This tool |
|---|
| 124 | produces that plot as an optional diagnostic output.) According to |
|---|
| 125 | the 1-SE rule, any cross-validation error within one standard error |
|---|
| 126 | of the minimum cross-validation error is considered equivalent to |
|---|
| 127 | the minimum (i.e. considered to be part of the flat plateau). The |
|---|
| 128 | 1-SE rule chooses the largest complexity parameter that yields a |
|---|
| 129 | cross-validation error equivalent to the minimum. This results in an |
|---|
| 130 | optimal tree, i.e. the tree with the fewest splits that |
|---|
| 131 | yields a cross-validation error equivalent to the minimum.</p></li></ul><ul><li><p>Interactive - The tool will display the unpruned tree in a window, |
|---|
| 132 | allowing you to prune it interactively with the mouse. If you click |
|---|
| 133 | on a split it will be marked as deleted. If you click on an |
|---|
| 134 | already-deleted split it will be undeleted (if its parent is not |
|---|
| 135 | deleted). Information about the node is printed as you click. When |
|---|
| 136 | you have finished pruning, click on the QUIT button.</p></li></ul><ul><li><p>User specified CP - The tool will prune the tree using the |
|---|
| 137 | complexity parameter you specify below.</p></li></ul><p>If this parameter is omitted, the tree will not be pruned.</p></td></tr><tr><td class="info">{pruningCP}</td><td class="info" align="left"><p>Complexity parameter for pruning the tree. This parameter is only |
|---|
| 138 | used when the Pruning Method is set to 'User specified CP'.</p></td></tr><tr><td class="info">{xColumnName}</td><td class="info" align="left"><p>Name to use in the formula for the X coordinates of point |
|---|
| 139 | features. If the input table is a point feature class or layer, the X |
|---|
| 140 | coordinates will be extracted from the points and be accessible in the |
|---|
| 141 | formula using the name provided for this parameter.</p></td></tr><tr><td class="info">{yColumnName}</td><td class="info" align="left"><p>Name to use in the formula for the Y coordinates of point |
|---|
| 142 | features. If the input table is a point feature class or layer, the Y |
|---|
| 143 | coordinates will be extracted from the points and be accessible in the |
|---|
| 144 | formula using the name provided for this parameter.</p></td></tr><tr><td class="info">{zColumnName}</td><td class="info" align="left"><p>Name to use in the formula for the Z coordinates of point |
|---|
| 145 | features. If the input table is a point feature class or layer that |
|---|
| 146 | has Z coordinates, the Z coordinates will be extracted from the points |
|---|
| 147 | and be accessible in the formula using the name provided for this |
|---|
| 148 | parameter.</p></td></tr><tr><td class="info">{mColumnName}</td><td class="info" align="left"><p>Name to use in the formula for the measure values of point |
|---|
| 149 | features. If the input table is a point feature class or layer that |
|---|
| 150 | has measure values, the measure values will be extracted from the |
|---|
| 151 | points and be accessible in the formula using the name provided for this |
|---|
| 152 | parameter.</p></td></tr><tr><td class="info">{writeSummaryFile}</td><td class="info" align="left"><p>If True, this tool will write summary information about the fitted |
|---|
| 153 | model to a text file. (This is the same information that the tool |
|---|
| 154 | outputs as log messages.) The file will have the name X_summary.txt, |
|---|
| 155 | where X is the name of the output model file, minus any |
|---|
| 156 | extension.</p></td></tr><tr><td class="info">{writeDiagnosticPlots}</td><td class="info" align="left"><p>If True, this tool will write diagnostic plots:</p><ul><li><p>X_cp.Y - visual representation of the cross-validation results for |
|---|
| 157 | the unpruned tree, to assist you with choosing a Complexity |
|---|
| 158 | Parameter for pruning the tree. The x-axis represents possible |
|---|
| 159 | choices for the Complexity Parameter and the y-axis represents the |
|---|
| 160 | means and standard deviations of the errors in the cross-validated |
|---|
| 161 | prediction that would result. The dashed horizontal line is drawn 1 |
|---|
| 162 | standard error above the minimum of the curve. A good choice of the |
|---|
| 163 | Complexity Parameter for pruning is the leftmost value for which the |
|---|
| 164 | mean error lies below the line. This value will be chosen |
|---|
| 165 | automatically if the Pruning Method parameter is set to '1-SE rule'.</p></li></ul><ul><li><p>X_rsquare.Y - two-panel plot only produced for the ANOVA splitting |
|---|
| 166 | method. The first panel shows the r-square (both the apparent value and |
|---|
| 167 | the value estimated from cross-validation) versus the number of splits. The |
|---|
| 168 | second panel shows the mean error in the cross-validated prediction |
|---|
| 169 | versus the number of splits (this is essentially the same plot as |
|---|
| 170 | the X_cp.Y plot described above). Both panels are produced for the |
|---|
| 171 | unpruned tree.</p></li></ul><ul><li><p>X_residuals.Y - plot of the residuals vs. the fitted values for the |
|---|
| 172 | unpruned tree.</p></li></ul><ul><li><p>X_pruned_residuals.Y - plot of the residuals vs. the fitted values |
|---|
| 173 | for the pruned tree. This plot will only be produced if the tree is |
|---|
| 174 | pruned.</p></li></ul><p>In the file names above, X is the name of the output model file, minus |
|---|
| 175 | any extension, and Y is the extension of the selected output plot |
|---|
| 176 | format.</p></td></tr><tr><td class="info">{writeTreePlot}</td><td class="info" align="left"><p>If True, this tool will write a plot of the unpruned tree to a |
|---|
| 177 | file having the name X_unpruned_tree.Y, where X is the name of the |
|---|
| 178 | output model file minus the extension and Y is the extension of the |
|---|
| 179 | selected output plot format.</p></td></tr><tr><td class="info">{writePrunedTreePlot}</td><td class="info" align="left"><p>If True, this tool will write a plot of the pruned tree to a file |
|---|
| 180 | having the name X_pruned_tree.Y, where X is the name of the output |
|---|
| 181 | model file minus the extension and Y is the extension of the selected |
|---|
| 182 | output plot format. This plot will only be produced if the tree is |
|---|
| 183 | pruned.</p></td></tr><tr><td class="info">{png | emf}</td><td class="info" align="left"><p>Plot file format, one of:</p><ul><li><p>emf - Windows enhanced metafile (EMF) format. This is a vector |
|---|
| 184 | format that may be printed and resized without any pixelation and is |
|---|
| 185 | therefore suitable for use in printable documents that recognize |
|---|
| 186 | this format (e.g. Microsoft Word or Microsoft Visio).</p></li></ul><ul><li><p>png - Portable network graphics (PNG) format. This is a compressed, |
|---|
| 187 | lossless, highly portable raster format suitable for use in web |
|---|
| 188 | pages or other locations where a raster format is desired. Most |
|---|
| 189 | scientific journals accept PNG; they typically request that files |
|---|
| 190 | have a resolution of at least 1000 DPI.</p></li></ul></td></tr><tr><td class="info">{res}</td><td class="info" align="left"><p>PNG plot file resolution, in dots per inch (DPI). The default is |
|---|
| 191 | set to a high value (1000) because this is the minimum resolution |
|---|
| 192 | typically required by scientific journals that accept figures in PNG |
|---|
| 193 | format.</p><p>This parameter is ignored for EMF format because it is a vector |
|---|
| 194 | format.</p></td></tr><tr><td class="info">{width}</td><td class="info" align="left"><p>Plot file width in inches (for EMF format) or pixels (for PNG |
|---|
| 195 | format).</p></td></tr><tr><td class="info">{height}</td><td class="info" align="left"><p>Plot file height in inches (for EMF format) or pixels (for PNG |
|---|
| 196 | format).</p></td></tr><tr><td class="info">{pointSize}</td><td class="info" align="left"><p>The default pointsize of text in diagnostic plots (the size of the |
|---|
| 197 | text in tree plots is controlled by a different parameter).</p></td></tr><tr><td class="info">{bg}</td><td class="info" align="left"><p>PNG plot file background color. The color must be a valid name in |
|---|
| 198 | R's color palette, or "transparent" if there is no background color. |
|---|
| 199 | This parameter is ignored if the plot file format is EMF.</p></td></tr><tr><td class="info">{0 | 1 | 2 | 3 | 4}</td><td class="info" align="left"><p>Type of tree plots to create, one of:</p><ul><li><p>0 - The default. Draw a split label at each split and a node label |
|---|
| 200 | at each leaf.</p></li></ul><ul><li><p>1 - Label all nodes, not just leaves.</p></li></ul><ul><li><p>2 - Like 1 but draw the split labels below the node labels. Similar |
|---|
| 201 | to the plots in the CART book.</p></li></ul><ul><li><p>3 - Draw separate split labels for the left and right directions.</p></li></ul><ul><li><p>4 - Like 3 but label all nodes, not just leaves.</p></li></ul></td></tr><tr><td class="info">{1 | 0 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9}</td><td class="info" align="left"><p>Extra information to display at the nodes, one of:</p><ul><li><p>0 - No extra information.</p></li></ul><ul><li><p>1 - The default. Display the number of observations that fall in the |
|---|
| 202 | node (per class for Class models; prefixed by the number of events |
|---|
| 203 | for Poisson and Exp models).</p></li></ul><ul><li><p>2 - Class models: display the classification rate at the node, expressed as the |
|---|
| 204 | number of correct classifications and the number of observations in the node. |
|---|
| 205 | Poisson and Exp models: display the number of events.</p></li></ul><ul><li><p>3 - Class models only: misclassification rate at the node, expressed |
|---|
| 206 | as the number of incorrect classifications and the number of |
|---|
| 207 | observations in the node.</p></li></ul><ul><li><p>4 - Class models only: probability per class of observations in the |
|---|
| 208 | node (conditioned on the node, sum across a node is 1).</p></li></ul><ul><li><p>5 - Class models only: like 4 but do not display the fitted class.</p></li></ul><ul><li><p>6 - Class models only: the probability of the second class only. |
|---|
| 209 | Useful for binary responses.</p></li></ul><ul><li><p>7 - Class models only: like 6 but do not display the fitted class.</p></li></ul><ul><li><p>8 - Class models only: the probability of the fitted class.</p></li></ul><ul><li><p>9 - Class models only: the probabilities times the fraction of |
|---|
| 210 | observations in the node (the probability relative to all |
|---|
| 211 | observations, sum across all leaves is 1).</p></li></ul></td></tr><tr><td class="info">{percentage}</td><td class="info" align="left"><p>If True, the default, nodes will be labeled with the percentage of |
|---|
| 212 | observations in the node. The percentage will be displayed below the |
|---|
| 213 | "extra information" (if any is requested).</p></td></tr><tr><td class="info">{under}</td><td class="info" align="left"><p>If True, the default, extra information and percentage of |
|---|
| 214 | observations will be displayed below the nodes. If False, they will be |
|---|
| 215 | displayed within the nodes' boxes.</p><p>This parameter is ignored if neither extra information nor percentage |
|---|
| 216 | of observations are requested.</p></td></tr><tr><td class="info">{clipRightLabels}</td><td class="info" align="left"><p>If True, the default, the right-hand split labels on plots of type |
|---|
| 217 | 3 or 4 will not include "variable=". If False, the right-hand labels |
|---|
| 218 | will include "variable=", just like the left-hand labels.</p><p>This parameter is ignored if the plot type is not 3 or 4.</p></td></tr><tr><td class="info">{fallenLeaves}</td><td class="info" align="left"><p>If True, all leaf nodes will be displayed at the bottom. If False, |
|---|
| 219 | the default, leaf nodes will be displayed where they would normally |
|---|
| 220 | appear.</p></td></tr><tr><td class="info">{0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9}</td><td class="info" align="left"><p>Type of branches to draw. If zero, the default, the tool will draw |
|---|
| 221 | conventional branches having a constant narrow width. If nonzero, the tool |
|---|
| 222 | will draw "wide branches", with branch widths proportional to the |
|---|
| 223 | specified parameter, one of:</p><ul><li><p>1 - Deviance</p></li></ul><ul><li><p>2 - Square root of deviance</p></li></ul><ul><li><p>3 - Deviance / number of observations</p></li></ul><ul><li><p>4 - Square root of (deviance / number of observations)</p></li></ul><ul><li><p>5 - Number of observations</p></li></ul><ul><li><p>6 - Complexity parameter</p></li></ul><ul><li><p>7 - Absolute value of the predicted value</p></li></ul><ul><li><p>8 - Predicted value minus the minimum predicted value</p></li></ul><ul><li><p>9 - Constant wide width, for checking the visual distortion that |
|---|
| 224 | results when wide branches are drawn at different angles</p></li></ul></td></tr><tr><td class="info">{branch}</td><td class="info" align="left"><p>Controls the shape of the branches from parent to child nodes. Any |
|---|
| 225 | number from 0 to 1 is allowed. A value of 1 gives square shouldered |
|---|
| 226 | branches, a value of 0 gives V-shaped branches, with other values being |
|---|
| 227 | intermediate.</p><p>Note that if the Branch Type parameter is nonzero, the Branch Shape |
|---|
| 228 | parameter will be rounded to 1 or 0 (e.g. a Branch Shape of 0.75 will |
|---|
| 229 | be rounded to 1).</p></td></tr><tr><td class="info">{uniform}</td><td class="info" align="left"><p>If True, the default, the vertical spacing of the nodes will be |
|---|
| 230 | uniform. If False, the nodes will be spaced proportionally to the fit |
|---|
| 231 | (more precisely, to the difference between a node's deviance and the |
|---|
| 232 | sum of its children's deviances). Small spaces must be expanded to |
|---|
| 233 | leave room for the labels.</p><p>Note: if this parameter is False and the Text Magnification Factor is |
|---|
| 234 | omitted (the default), very small text can sometimes result.</p></td></tr><tr><td class="info">{digits}</td><td class="info" align="left"><p>Number of significant digits to display in floating-point numbers.</p><p>Probabilities and percentages are treated specially. Probabilities are |
|---|
| 235 | displayed with the specified number of digits after the decimal point |
|---|
| 236 | (by default 2 digits). Percentages are displayed with the specified |
|---|
| 237 | number of digits minus 2 after the decimal point (by default no |
|---|
| 238 | digits).</p></td></tr><tr><td class="info">{varlen}</td><td class="info" align="left"><p>Length of variable names in text at the splits (and, for class |
|---|
| 239 | responses, the class displayed at the node). There are three |
|---|
| 240 | possibilities:</p><ul><li><p>0 - The default. Use full names.</p></li></ul><ul><li><p>>0 - Use an abbreviation algorithm to shorten the names to at |
|---|
| 241 | least the specified number, such that they remain unique.</p></li></ul><ul><li><p><0 - Truncate names to the shortest length where they are still |
|---|
| 242 | unique, but never truncate to shorter than the specified number |
|---|
| 243 | (e.g. the value -5 means never truncate to shorter than 5 |
|---|
| 244 | characters).</p></li></ul></td></tr><tr><td class="info">{faclen}</td><td class="info" align="left"><p>Length of factor level names (i.e. categorical variable values) in |
|---|
| 245 | splits. There are four possibilities:</p><ul><li><p>0 - The default. Use full names.</p></li></ul><ul><li><p>1 - Represent factor levels with alphabetic characters (a for the |
|---|
| 246 | first level, b for the second, and so on).</p></li></ul><ul><li><p>>1 - Use an abbreviation algorithm to shorten the names to at |
|---|
| 247 | least the specified number, such that they remain unique.</p></li></ul><ul><li><p><0 - Truncate names to the shortest length where they are still |
|---|
| 248 | unique, but never truncate to shorter than the specified number |
|---|
| 249 | (e.g. the value -5 means never truncate to shorter than 5 |
|---|
| 250 | characters).</p></li></ul></td></tr><tr><td class="info">{cex}</td><td class="info" align="left"><p>A numerical value giving the amount by which text should be |
|---|
| 251 | magnified relative to the default. If omitted, the default, the text |
|---|
| 252 | size will be calculated automatically.</p><p>The default automatic calculation means that this seemingly innocuous |
|---|
| 253 | argument has a far-reaching effect. If necessary, it will trigger the |
|---|
| 254 | node shifting engine to get a decent type size (see the Compress Tree |
|---|
| 255 | Vertically parameter).</p></td></tr><tr><td class="info">{tweak}</td><td class="info" align="left"><p>Adjust the (possibly automatically calculated) Text Magnification |
|---|
| 256 | Factor. For example, use 1.1 to make the text 10% larger. The default |
|---|
| 257 | is 1, meaning no adjustment.</p><p>Note that font sizes are discrete, so the Text Magnification Factor |
|---|
| 258 | you ask for may not be the one you get. And a small tweak may not |
|---|
| 259 | actually change the type size, or may change it more than you want.</p></td></tr><tr><td class="info">{compress}</td><td class="info" align="left"><p>If True, the default, the tree will be compressed horizontally by |
|---|
| 260 | shifting nodes horizontally where space is available.</p></td></tr><tr><td class="info">{ycompress}</td><td class="info" align="left"><p>If True, the default, and the initial automatically calculated |
|---|
| 261 | Text Magnification Factor is less than 0.7, crowded labels will be |
|---|
| 262 | shifted vertically where space is available. This often allows |
|---|
| 263 | considerably larger text.</p><p>Set this parameter to False if you feel the resulting plot is too |
|---|
| 264 | messy. The shifting algorithm may work a little better (allowing |
|---|
| 265 | larger text) for plot types 1, 2, and 3.</p></td></tr></tbody></table></div><p><h2><img width="11" height="11" border="0" src="sm_arrow_down.gif?format=raw" /> Scripting syntax</h2></p><div Class="expand" id="TEST">TreeModelFitToArcGISTable_GeoEco (inputTable, outputModelFile, formula, method, where, allowMissingCovariates, minSplit, minBucket, cp, maxCompete, maxSurrogate, useSurrogate, surrogateStyle, xval, maxDepth, pruningMethod, pruningCP, xColumnName, yColumnName, zColumnName, mColumnName, writeSummaryFile, writeDiagnosticPlots, writeTreePlot, writePrunedTreePlot, plotFileFormat, res, width, height, pointSize, bg, treePlotType, extra, percentage, under, clipRightLabels, fallenLeaves, branchType, branch, uniform, digits, varlen, faclen, cex, tweak, compress, ycompress) <br /><br /><b>Parameters</b><br /><table width="100%" border="0" cellpadding="5"><tbody><tr><th width="40%"><b>Expression</b></th><th width="60%"><b>Explanation</b></th></tr><tr><td class="info">Input table (Required) </td><td class="info" align="left"><p>ArcGIS table, table view, feature class, or feature layer |
|---|
| 266 | containing the data for which the model should be fitted.</p></td></tr><tr><td class="info">Output model file (Required) </td><td class="info" align="left"><p>Output file to receive the fitted model. The file will not be in a |
|---|
| 267 | user-readable format. After the model is fitted, you can provide the |
|---|
| 268 | file to other tools that perform further analysis or visualization of |
|---|
| 269 | the fitted model.</p><p>It is suggested, but not required, that you give the file an .Rdata |
|---|
| 270 | extension.</p></td></tr><tr><td class="info">Formula (Required) </td><td class="info" align="left"><p>Formula that specifies the table field that is the response |
|---|
| 271 | variable and the table fields that are the terms of the model.</p><p>The formula must be in the format expected by the R glm function:</p><dl><dt></dt><dd><pre>response ~ term1 + term2 + ... + termN</pre></dd></dl><p>response is the table field that will be modeled as the response |
|---|
| 272 | variable and the terms are the table fields that will serve as the |
|---|
| 273 | predictor variables. The field names are case sensitive. If any field |
|---|
| 274 | used in the formula is NULL for a given row, that row will not be used |
|---|
| 275 | in fitting the model.</p><p>For example, if you have a field Presence that indicates the presence |
|---|
| 276 | or absence of a species (1 or 0) and you want to model it in terms of |
|---|
| 277 | sampled environmental covariates stored in the SST, ChlDensity, and |
|---|
| 278 | Depth fields, you would use the formula:</p><dl><dt></dt><dd><pre>Presence ~ SST + ChlDensity + Depth</pre></dd></dl><p>By default, all terms are treated as continuous variables. To indicate |
|---|
| 279 | that a term should be treated as a categorical variable, use the |
|---|
| 280 | factor function. For example, if SubstrateType is an integer code that |
|---|
| 281 | should be treated as categorical:</p><dl><dt></dt><dd><pre>Presence ~ SST + ChlDensity + Depth + factor(SubstrateType)</pre></dd></dl><p>The model terms may also use these operators:</p><ul><li><p>The : operator denotes the interaction of variables a and b. For |
|---|
| 282 | example: a:b.</p></li></ul><ul><li><p>The * operator denotes "crossing". For example, a*b is identical to |
|---|
| 283 | a+b+a:b.</p></li></ul><ul><li><p>The ^ operator denotes crossing to the Nth degree. For example, |
|---|
| 284 | (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a |
|---|
| 285 | formula containing the main effects for a, b and c together with |
|---|
| 286 | their second-order interactions.</p></li></ul><ul><li><p>The %in% operator indicates that the terms on its left are nested |
|---|
| 287 | within those on the right. For example a + b %in% a expands to the |
|---|
| 288 | formula a + a:b.</p></li></ul><ul><li><p>The - operator (minus) removes the specified terms, so that |
|---|
| 289 | (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also |
|---|
| 290 | be used to remove the intercept term: y ~ x - 1 is a line through the |
|---|
| 291 | origin. A model with no intercept can also be specified as y ~ x + 0 |
|---|
| 292 | or y ~ 0 + x.</p></li></ul><p>While formulae usually involve just variable names, they can also |
|---|
| 293 | involve arithmetic expressions. The formula log(y) ~ a + log(x) is |
|---|
| 294 | quite legal. When such arithmetic expressions involve operators which |
|---|
| 295 | are also used symbolically in model formulae, there can be confusion |
|---|
| 296 | between arithmetic and symbolic operator use.</p><p>To avoid this confusion, the function I() can be used to bracket those |
|---|
| 297 | portions of a model formula where the operators are used in their |
|---|
| 298 | arithmetic sense. For example, in the formula y ~ a + I(b+c), the term |
|---|
| 299 | b+c is to be interpreted as the sum of b and c.</p><p>Please see the topics "glm" and "formula" in the R documentation for |
|---|
| 300 | more information.</p></td></tr><tr><td class="info">Splitting method (Required) </td><td class="info" align="left"><p>Method to use for splitting the tree, one of:</p><ul><li><p>ANOVA - Use this method to build a regression tree, i.e. when you |
|---|
| 301 | are modeling a continuous response variable, such as the abundance |
|---|
| 302 | of a species. With this method, the splits will be chosen to |
|---|
| 303 | maximize the between-groups sum-of-squares in a simple analysis of |
|---|
| 304 | variance.</p></li></ul><ul><li><p>Class - Use this method to build a classification tree, i.e. when |
|---|
| 305 | you are modeling a categorical response variable, such as the |
|---|
| 306 | presence or absence of a species. When this method is selected, the |
|---|
| 307 | response variable is assumed to be categorical and the R factor |
|---|
| 308 | function is automatically applied to it.</p></li></ul><ul><li><p>Exp - Use this method to build a regression tree using exponential |
|---|
| 309 | scaling. For more information about this method, please see the |
|---|
| 310 | references below.</p></li></ul><ul><li><p>Poisson - Use this method to build a regression tree using Poisson |
|---|
| 311 | regression, which is appropriate for event rate data. For more |
|---|
| 312 | information about this method, please see the references below.</p></li></ul><p><b>References</b></p><p><a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a></p><p><a href="http://cran.r-project.org/web/packages/rpart/rpart.pdf">R rpart package documentation</a></p></td></tr><tr><td class="info">Where clause (Optional) </td><td class="info" align="left"><p>SQL WHERE clause expression that specifies the subset of rows to |
|---|
| 313 | process. If this parameter is not provided, all of the rows will be |
|---|
| 314 | processed. If this parameter is provided but the underlying database |
|---|
| 315 | does not support WHERE clauses, an error will be raised.</p><p>The exact syntax of this expression depends on the underlying |
|---|
| 316 | database. ESRI recommends you reference fields using the following |
|---|
| 317 | syntax:</p><ul><li><p>If you're querying ArcInfo coverages, shapefiles, INFO tables or |
|---|
| 318 | dBASE tables (.dbf files), enclose field names in double quotes in |
|---|
| 319 | the SQL expression: "MY_FIELD".</p></li></ul><ul><li><p>If you're querying Microsoft Access tables or personal |
|---|
| 320 | geodatabase tables, enclose field names in square brackets: |
|---|
| 321 | [MY_FIELD].</p></li></ul><ul><li><p>If you're querying ArcSDE geodatabase tables, an ArcIMS feature |
|---|
| 322 | class, or an ArcIMS image service sublayer, don't enclose field |
|---|
| 323 | names: MY_FIELD.</p></li></ul></td></tr><tr><td class="info">Include records that are missing covariates (Optional) </td><td class="info" align="left"><p>If this option is enabled (the default), records will be included |
|---|
| 324 | in the model fitting process so long as they have a value for the |
|---|
| 325 | response variable and at least one predictor variable. If this option |
|---|
| 326 | is disabled, records must have values for the response variable and |
|---|
| 327 | all predictor variables in order to be included.</p><p>The R rpart package that is used to fit the model has the novel |
|---|
| 328 | capability of allowing records that are missing some data to still |
|---|
| 329 | participate in the model fitting process. For more information about |
|---|
| 330 | how this works, please see |
|---|
| 331 | <a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a>.</p></td></tr><tr><td class="info">Minimum number of observations to attempt a split (Optional) </td><td class="info" align="left"><p>The minimum number of observations that must exist in a node of |
|---|
| 332 | the tree in order for a split of that node to be attempted. The |
|---|
| 333 | default value, 20, was taken from the R rpart package that is used to |
|---|
| 334 | fit the model.</p></td></tr><tr><td class="info">Minimum number of observations in a leaf node (Optional) </td><td class="info" align="left"><p>The minimum number of observations that may be in any leaf node of |
|---|
| 335 | the tree. The default value, 7, was taken from the R rpart package |
|---|
| 336 | that is used to fit the model. By default, rpart recommends that this |
|---|
| 337 | parameter be set to one third of the previous parameter.</p></td></tr><tr><td class="info">Complexity parameter (Optional) </td><td class="info" align="left"><p>Any split that does not decrease the overall lack of fit by a |
|---|
| 338 | factor of this parameter will not be attempted. For instance, with |
|---|
| 339 | ANOVA splitting, this means that the overall R-squared must increase by |
|---|
| 340 | this parameter at each step. The main role of this parameter is to |
|---|
| 341 | save computing time by pruning off splits that are obviously not |
|---|
| 342 | worthwhile. Essentially, you inform the tool that any split which does |
|---|
| 343 | not improve the fit by this parameter will likely be pruned off by |
|---|
| 344 | cross-validation, and that hence the tool need not pursue it.</p><p>The default value, 0.01, was taken from the R rpart package that is |
|---|
| 345 | used to fit the model.</p></td></tr><tr><td class="info">Number of competitor splits to retain (Optional) </td><td class="info" align="left"><p>The number of competitor splits to retain in the output. It is |
|---|
| 346 | useful to know not just which split was chosen, but which variable |
|---|
| 347 | came in second, third, etc. The default value, 4, was taken from the R |
|---|
| 348 | rpart package that is used to fit the model.</p></td></tr><tr><td class="info">Number of surrogate splits to retain (Optional) </td><td class="info" align="left"><p>The number of surrogate splits to retain in the output. If this is |
|---|
| 349 | set to zero the compute time will be shortened, since approximately |
|---|
| 350 | half of the computational time (other than setup) is used in the |
|---|
| 351 | search for surrogate splits. The default value, 5, was taken from the |
|---|
| 352 | R rpart package that is used to fit the model.</p></td></tr><tr><td class="info">Surrogate usage method (Optional) </td><td class="info" align="left"><p>The method for using surrogates in the splitting process, one of:</p><ul><li><p>0 - display only; an observation with a missing value for the |
|---|
| 353 | primary split rule is not sent further down the tree.</p></li></ul><ul><li><p>1 - use surrogates, in order, to split subjects missing the primary |
|---|
| 354 | variable; if all surrogates are missing the observation is not |
|---|
| 355 | split.</p></li></ul><ul><li><p>2 - if all surrogates are missing, then send the observation in the |
|---|
| 356 | majority direction. This is the recommendation of Breiman et al.</p></li></ul><p>The default value, 2, was taken from the R rpart package that is used |
|---|
| 357 | to fit the model.</p></td></tr><tr><td class="info">Surrogate selection method (Optional) </td><td class="info" align="left"><p>The method used to select the best surrogate, one of:</p><ul><li><p>0 - the tool uses the total number of correct classifications for a |
|---|
| 358 | potential surrogate variable.</p></li></ul><ul><li><p>1 - the tool uses the percent correct, calculated over the |
|---|
| 359 | non-missing values of the surrogate.</p></li></ul><p>The default value, 0, was taken from the R rpart package that is used |
|---|
| 360 | to fit the model. This value more severely penalizes covariates with a |
|---|
| 361 | large number of missing values.</p></td></tr><tr><td class="info">Number of cross-validations (Optional) </td><td class="info" align="left"><p>The number of cross-validations to perform.</p><p>The R rpart package that is used to fit the model uses a default of |
|---|
| 362 | 10, but we have found that so few iterations can cause the calculated |
|---|
| 363 | cross-validation error to differ substantially over several runs of |
|---|
| 364 | the tool using identical input data and parameter values. In one case, |
|---|
| 365 | we observed the cross-validation errors to vary by over 10%. Because |
|---|
| 366 | the cross-validation errors are often used to prune the tree, we |
|---|
| 367 | believe it is important to have accurate estimates of them, so we |
|---|
| 368 | increased the default to 1000. This will cause complicated models to |
|---|
| 369 | run substantially slower. If you find your model is too slow, decrease |
|---|
| 370 | the value.</p></td></tr><tr><td class="info">Maximum tree depth (Optional) </td><td class="info" align="left"><p>The maximum depth of any node of the final tree, with the root |
|---|
| 371 | node counted as depth 0. The default value, 30, was taken from the R |
|---|
| 372 | rpart package that is used to fit the model.</p></td></tr><tr><td class="info">Pruning method (Optional) </td><td class="info" align="left"><p>Method to use for pruning the tree, one of:</p><ul><li><p>Minimum error - The tool will prune the tree using the complexity |
|---|
| 373 | parameter associated with the minimum cross-validation error.</p></li></ul><ul><li><p>1-SE rule - The tool will prune the tree using the complexity |
|---|
| 374 | parameter obtained using the 1-SE rule described by Therneau and |
|---|
| 375 | Atkinson in chapter 4 of |
|---|
| 376 | <a href="http://www.mayo.edu/hsr/techrpt/61.pdf">An Introduction to Recursive Partitioning Using the RPART Routines</a>. |
|---|
| 377 | A plot of the mean cross-validation errors versus candidate values |
|---|
| 378 | of the complexity parameter often has an initial sharp drop followed |
|---|
| 379 | by a relatively flat plateau and then a slow rise. (This tool |
|---|
| 380 | produces that plot as an optional diagnostic output.) According to |
|---|
| 381 | the 1-SE rule, any cross-validation error within one standard error |
|---|
| 382 | of the minimum cross-validation error is considered equivalent to |
|---|
| 383 | the minimum (i.e. considered to be part of the flat plateau). The |
|---|
| 384 | 1-SE rule chooses the largest complexity parameter that yields a |
|---|
| 385 | cross-validation error equivalent to the minimum. This results in an |
|---|
| 386 | optimal tree, i.e. the tree with the fewest number of splits that |
|---|
| 387 | yields a cross-validation error equivalent to the minimum.</p></li></ul><ul><li><p>Interactive - The tool will display the unpruned tree in a window, |
|---|
| 388 | allowing you to prune it interactively with the mouse. If you click |
|---|
| 389 | on a split it will be marked as deleted. If you click on an |
|---|
| 390 | already-deleted split it will be undeleted (if its parent is not |
|---|
| 391 | deleted). Information about the node is printed as you click. When |
|---|
| 392 | you have finished pruning, click on the QUIT button.</p></li></ul><ul><li><p>User specified CP - The tool will prune the tree using the |
|---|
| 393 | complexity parameter you specify below.</p></li></ul><p>If this parameter is omitted, the tree will not be pruned.</p></td></tr><tr><td class="info">Complexity parameter for pruning (Optional) </td><td class="info" align="left"><p>Complexity parameter for pruning the tree. This parameter is only |
|---|
| 394 | used when the Pruning Method is set to 'User specified CP'.</p></td></tr><tr><td class="info">Name to use for X coordinates of points (Optional) </td><td class="info" align="left"><p>Name to use in the formula for the X coordinates of point |
|---|
| 395 | features. If the input table is a point feature class or layer, the X |
|---|
| 396 | coordinates will be extracted from the points and be accessible in the |
|---|
| 397 | formula using the name provided for this parameter.</p></td></tr><tr><td class="info">Name to use for Y coordinates of points (Optional) </td><td class="info" align="left"><p>Name to use in the formula for the Y coordinates of point |
|---|
| 398 | features. If the input table is a point feature class or layer, the Y |
|---|
| 399 | coordinates will be extracted from the points and be accessible in the |
|---|
| 400 | formula using the name provided for this parameter.</p></td></tr><tr><td class="info">Name to use for Z coordinates of points (Optional) </td><td class="info" align="left"><p>Name to use in the formula for the Z coordinates of point |
|---|
| 401 | features. If the input table is a point feature class or layer that |
|---|
| 402 | has Z coordinates, the Z coordinates will be extracted from the points |
|---|
| 403 | and be accessible in the formula using the name provided for this |
|---|
| 404 | parameter.</p></td></tr><tr><td class="info">Name to use for M values of points (Optional) </td><td class="info" align="left"><p>Name to use in the formula for the measure values of point |
|---|
| 405 | features. If the input table is a point feature class or layer that |
|---|
| 406 | has measure values, the measure values will be extracted from the |
|---|
| 407 | points and be accessible in the formula using the name provided for this |
|---|
| 408 | parameter.</p></td></tr><tr><td class="info">Write model summary file (Optional) </td><td class="info" align="left"><p>If True, this tool will write summary information about the fitted |
|---|
| 409 | model to a text file. (This is the same information that the tool |
|---|
| 410 | outputs as log messages.) The file will have the name X_summary.txt, |
|---|
| 411 | where X is the name of the output model file, minus any |
|---|
| 412 | extension.</p></td></tr><tr><td class="info">Write diagnostic plots (Optional) </td><td class="info" align="left"><p>If True, this tool will write diagnostic plots:</p><ul><li><p>X_cp.Y - visual representation of the cross-validation results for |
|---|
| 413 | the unpruned tree, to assist you with choosing a Complexity |
|---|
| 414 | Parameter for pruning the tree. The x-axis represents possible |
|---|
| 415 | choices for the Complexity Parameter and the y-axis represents the |
|---|
| 416 | means and standard deviations of the errors in the cross-validated |
|---|
| 417 | prediction that would result. The dashed horizontal line is drawn 1 |
|---|
| 418 | standard error above the minimum of the curve. A good choice of the |
|---|
| 419 | Complexity Parameter for pruning is the leftmost value for which the |
|---|
| 420 | mean error lies below the line. This value will be chosen |
|---|
| 421 | automatically if the Pruning Method parameter is set to '1-SE rule'.</p></li></ul><ul><li><p>X_rsquare.Y - two-panel plot only produced for the ANOVA splitting |
|---|
| 422 | method. The first panel shows the r-square (both apparent and |
|---|
| 423 | apparent from cross-validation) versus the number of splits. The |
|---|
| 424 | second panel shows the mean error in the cross-validated prediction |
|---|
| 425 | versus the number of splits (this is essentially the same plot as |
|---|
| 426 | the X_cp.Y plot described above). Both panels are produced for the |
|---|
| 427 | unpruned tree.</p></li></ul><ul><li><p>X_residuals.Y - plot of the residuals vs. the fitted values for the |
|---|
| 428 | unpruned tree.</p></li></ul><ul><li><p>X_pruned_residuals.Y - plot of the residuals vs. the fitted values |
|---|
| 429 | for the pruned tree. This plot will only be produced if the tree is |
|---|
| 430 | pruned.</p></li></ul><p>In the file names above, X is the name of the output model file, minus |
|---|
| 431 | any extension, and Y is the extension of the selected output plot |
|---|
| 432 | format.</p></td></tr><tr><td class="info">Write tree plot (Optional) </td><td class="info" align="left"><p>If True, this tool will write a plot of the unpruned tree to a |
|---|
| 433 | file having the name X_unpruned_tree.Y, where X is the name of the |
|---|
| 434 | output model file minus the extension and Y is the extension of the |
|---|
| 435 | selected output plot format.</p></td></tr><tr><td class="info">Write pruned tree plot (Optional) </td><td class="info" align="left"><p>If True, this tool will write a plot of the pruned tree to a file |
|---|
| 436 | having the name X_pruned_tree.Y, where X is the name of the output |
|---|
| 437 | model file minus the extension and Y is the extension of the selected |
|---|
| 438 | output plot format. This plot will only be produced if the tree is |
|---|
| 439 | pruned.</p></td></tr><tr><td class="info">Plot file format (Optional) </td><td class="info" align="left"><p>Plot file format, one of:</p><ul><li><p>emf - Windows enhanced metafile (EMF) format. This is a vector |
|---|
| 440 | format that may be printed and resized without any pixelation and is |
|---|
| 441 | therefore suitable for use in printable documents that recognize |
|---|
| 442 | this format (e.g. Microsoft Word or Microsoft Visio).</p></li></ul><ul><li><p>png - Portable network graphics (PNG) format. This is a compressed, |
|---|
| 443 | lossless, highly portable raster format suitable for use in web |
|---|
| 444 | pages or other locations where a raster format is desired. Most |
|---|
| 445 | scientific journals accept PNG; they typically request that files |
|---|
| 446 | have a resolution of at least 1000 DPI.</p></li></ul></td></tr><tr><td class="info">Plot resolution, in DPI (Optional) </td><td class="info" align="left"><p>PNG plot file resolution, in dots per inch (DPI). The default is |
|---|
| 447 | set to a high value (1000) because this is the minimum resolution |
|---|
| 448 | typically required by scientific journals that accept figures in PNG |
|---|
| 449 | format.</p><p>This parameter is ignored for EMF format because it is a vector |
|---|
| 450 | format.</p></td></tr><tr><td class="info">Plot width (Optional) </td><td class="info" align="left"><p>Plot file width in inches (for EMF format) or pixels (for PNG |
|---|
| 451 | format).</p></td></tr><tr><td class="info">Plot height (Optional) </td><td class="info" align="left"><p>Plot file height in inches (for EMF format) or pixels (for PNG |
|---|
| 452 | format).</p></td></tr><tr><td class="info">Default pointsize of text in diagnostic plots (Optional) </td><td class="info" align="left"><p>The default pointsize of text in diagnostic plots (the size of the |
|---|
| 453 | text in tree plots is controlled by a different parameter).</p></td></tr><tr><td class="info">Plot background color (Optional) </td><td class="info" align="left"><p>PNG plot file background color. The color must be a valid name in |
|---|
| 454 | R's color palette, or "transparent" if there is no background color. |
|---|
| 455 | This parameter is ignored if the plot file format is EMF.</p></td></tr><tr><td class="info">Plot type (Optional) </td><td class="info" align="left"><p>Type of tree plots to create, one of:</p><ul><li><p>0 - The default. Draw a split label at each split and a node label |
|---|
| 456 | at each leaf.</p></li></ul><ul><li><p>1 - Label all nodes, not just leaves.</p></li></ul><ul><li><p>2 - Like 1 but draw the split labels below the node labels. Similar |
|---|
| 457 | to the plots in the CART book.</p></li></ul><ul><li><p>3 - Draw separate split labels for the left and right directions.</p></li></ul><ul><li><p>4 - Like 3 but label all nodes, not just leaves.</p></li></ul></td></tr><tr><td class="info">Extra information (Optional) </td><td class="info" align="left"><p>Extra information to display at the nodes, one of:</p><ul><li><p>0 - No extra information.</p></li></ul><ul><li><p>1 - The default. Display the number of observations that fall in the |
|---|
| 458 | node (per class for Class models; prefixed by the number of events |
|---|
| 459 | for Poisson and Exp models).</p></li></ul><ul><li><p>2 - Class models: display the classification rate at the node, expressed as the |
|---|
| 460 | number of correct classifications and the number of observations in the node. |
|---|
| 461 | Poisson and Exp models: display the number of events.</p></li></ul><ul><li><p>3 - Class models only: misclassification rate at the node, expressed |
|---|
| 462 | as the number of incorrect classifications and the number of |
|---|
| 463 | observations in the node.</p></li></ul><ul><li><p>4 - Class models only: probability per class of observations in the |
|---|
| 464 | node (conditioned on the node, sum across a node is 1).</p></li></ul><ul><li><p>5 - Class models only: like 4 but do not display the fitted class.</p></li></ul><ul><li><p>6 - Class models only: the probability of the second class only. |
|---|
| 465 | Useful for binary responses.</p></li></ul><ul><li><p>7 - Class models only: like 6 but do not display the fitted class.</p></li></ul><ul><li><p>8 - Class models only: the probability of the fitted class.</p></li></ul><ul><li><p>9 - Class models only: the probabilities times the fraction of |
|---|
| 466 | observations in the node (the probability relative to all |
|---|
| 467 | observations, sum across all leaves is 1).</p></li></ul></td></tr><tr><td class="info">Display percentage of observations (Optional) </td><td class="info" align="left"><p>If True, the default, nodes will be labeled with the percentage of |
|---|
| 468 | observations in the node. The percentage will be displayed below the |
|---|
| 469 | "extra information" (if any is requested).</p></td></tr><tr><td class="info">Display extra text under node boxes (Optional) </td><td class="info" align="left"><p>If True, the default, extra information and percentage of |
|---|
| 470 | observations will be displayed below the nodes. If False, they will be |
|---|
| 471 | displayed within the nodes' boxes.</p><p>This parameter is ignored if neither extra information nor percentage |
|---|
| 472 | of observations are requested.</p></td></tr><tr><td class="info">Clip right-hand split labels (Optional) </td><td class="info" align="left"><p>If True, the default, the right-hand split labels on plots of type |
|---|
| 473 | 3 or 4 will not include "variable=". If False, the right-hand labels |
|---|
| 474 | will include "variable=", just like the left-hand labels.</p><p>This parameter is ignored if the plot type is not 3 or 4.</p></td></tr><tr><td class="info">Display leaves at bottom (Optional) </td><td class="info" align="left"><p>If True, all leaf nodes will be displayed at the bottom. If False, |
|---|
| 475 | the default, leaf nodes will be displayed where they would normally |
|---|
| 476 | appear.</p></td></tr><tr><td class="info">Branch type (Optional) </td><td class="info" align="left"><p>Type of branches to draw. If zero, the default, the tool will draw |
|---|
| 477 | conventional branches having a constant narrow width. If nonzero, the tool |
|---|
| 478 | will draw "wide branches", with branch widths proportional to the |
|---|
| 479 | specified parameter, one of:</p><ul><li><p>1 - Deviance</p></li></ul><ul><li><p>2 - Square root of deviance</p></li></ul><ul><li><p>3 - Deviance / number of observations</p></li></ul><ul><li><p>4 - Square root of (deviance / number of observations)</p></li></ul><ul><li><p>5 - Number of observations</p></li></ul><ul><li><p>6 - Complexity parameter</p></li></ul><ul><li><p>7 - Absolute value of the predicted value</p></li></ul><ul><li><p>8 - Predicted value minus the minimum predicted value</p></li></ul><ul><li><p>9 - Constant wide width, for checking the visual distortion that |
|---|
| 480 | results when wide branches are drawn at different angles</p></li></ul></td></tr><tr><td class="info">Branch shape (Optional) </td><td class="info" align="left"><p>Controls the shape of the branches from parent to child nodes. Any |
|---|
| 481 | number from 0 to 1 is allowed. A value of 1 gives square shouldered |
|---|
| 482 | branches, a value of 0 gives V shaped branches, with other values being |
|---|
| 483 | intermediate.</p><p>Note that if the Branch Type parameter is nonzero, the Branch Shape |
|---|
| 484 | parameter will be rounded to 1 or 0 (e.g. a Branch Shape of 0.75 will |
|---|
| 485 | be rounded to 1).</p></td></tr><tr><td class="info">Use uniform vertical spacing (Optional) </td><td class="info" align="left"><p>If True, the default, the vertical spacing of the nodes will be |
|---|
| 486 | uniform. If False, the nodes will be spaced proportionally to the fit |
|---|
| 487 | (more precisely, to the difference between a node's deviance and the |
|---|
| 488 | sum of its children's deviances). Small spaces are expanded automatically to |
|---|
| 489 | leave room for the labels.</p><p>Note: if this parameter is False and the Text Magnification Factor is |
|---|
| 490 | omitted (the default), very small text can sometimes result.</p></td></tr><tr><td class="info">Significant digits for labels (Optional) </td><td class="info" align="left"><p>Number of significant digits to display in floating-point numbers.</p><p>Probabilities and percentages are treated specially. Probabilities are |
|---|
| 491 | displayed with the specified number of digits after the decimal point |
|---|
| 492 | (by default 2 digits). Percentages are displayed with the specified |
|---|
| 493 | number of digits minus 2 after the decimal point (by default no |
|---|
| 494 | digits).</p></td></tr><tr><td class="info">Length of variable names at splits (Optional) </td><td class="info" align="left"><p>Length of variable names in text at the splits (and, for class |
|---|
| 495 | responses, the class displayed at the node). There are three |
|---|
| 496 | possibilities:</p><ul><li><p>0 - The default. Use full names.</p></li></ul><ul><li><p>>0 - Use an abbreviation algorithm to shorten the names to at |
|---|
| 497 | least the specified number of characters, such that they remain unique.</p></li></ul><ul><li><p><0 - Truncate names to the shortest length where they are still |
|---|
| 498 | unique, but never truncate to shorter than the absolute value of the specified number |
|---|
| 499 | (e.g. the value -5 means never truncate to shorter than 5 |
|---|
| 500 | characters).</p></li></ul></td></tr><tr><td class="info">Length of factor level names in splits (Optional) </td><td class="info" align="left"><p>Length of factor level names (i.e. categorical variable values) in |
|---|
| 501 | splits. There are four possibilities:</p><ul><li><p>0 - The default. Use full names.</p></li></ul><ul><li><p>1 - Represent factor levels with alphabetic characters (a for the |
|---|
| 502 | first level, b for the second, and so on).</p></li></ul><ul><li><p>>1 - Use an abbreviation algorithm to shorten the names to at |
|---|
| 503 | least the specified number of characters, such that they remain unique.</p></li></ul><ul><li><p><0 - Truncate names to the shortest length where they are still |
|---|
| 504 | unique, but never truncate to shorter than the absolute value of the specified number |
|---|
| 505 | (e.g. the value -5 means never truncate to shorter than 5 |
|---|
| 506 | characters).</p></li></ul></td></tr><tr><td class="info">Text magnification factor (Optional) </td><td class="info" align="left"><p>A numerical value giving the amount by which text should be |
|---|
| 507 | magnified relative to the default. If omitted, the default, the text |
|---|
| 508 | size will be calculated automatically.</p><p>The default automatic calculation means that this seemingly innocuous |
|---|
| 509 | parameter has a far-reaching effect. If necessary, it will trigger the |
|---|
| 510 | node shifting engine to get a decent type size (see the Compress Tree |
|---|
| 511 | Vertically parameter).</p></td></tr><tr><td class="info">Tweak text magnification (Optional) </td><td class="info" align="left"><p>Adjust the (possibly automatically calculated) Text Magnification |
|---|
| 512 | Factor. For example, use 1.1 to make the text 10% larger. The default |
|---|
| 513 | is 1, meaning no adjustment.</p><p>Note that font sizes are discrete, so the Text Magnification Factor |
|---|
| 514 | you ask for may not be the one you get. A small tweak may either leave |
|---|
| 515 | the type size unchanged or change it more than you want.</p></td></tr><tr><td class="info">Compress tree horizontally (Optional) </td><td class="info" align="left"><p>If True, the default, the tree will be compressed horizontally by |
|---|
| 516 | shifting nodes horizontally where space is available.</p></td></tr><tr><td class="info">Compress tree vertically (Optional) </td><td class="info" align="left"><p>If True, the default, and the initial automatically calculated |
|---|
| 517 | Text Magnification Factor is less than 0.7, crowded labels will be |
|---|
| 518 | shifted vertically where space is available. This often allows |
|---|
| 519 | considerably larger text.</p><p>Set this parameter to False if you feel the resulting plot is too |
|---|
| 520 | messy. The shifting algorithm may work a little better (allowing |
|---|
| 521 | larger text) for plot types 1, 2, and 3.</p></td></tr></tbody></table></div></body></html> |
|---|