# Each command in the session below is preceded by a brief hashed comment.
# For more detailed descriptions of individual commands, see the "*.readme"
# files.

# The following is a complete list of the OC1-related files.
$ ls
README              gendata.readme       oc1.h
aaai93-paper.ps     impurity_measures.c  oc1.tar
announce            linear.data          perturb.c
classify.c          linear.test          prune.c
classify_util.c     linear.train         sample.dt
compute_impurity.c  load_data.c          sample_session
display.c           makefile             source_code.readme
display.readme      mktree.c             train_util.c
gendata.c           mktree.readme        util.c

# Run the "make mktree", "make gendata" and "make display" commands to
# compile everything.

# The simplest use of mktree.
# Induces a decision tree for the dataset linear.train, and stores it.
# The decision trees induced by mktree are always written to a file,
# so as not to waste effort. When cross validation is used, only the first
# decision tree is written to a file. The default file name is
# <training data file name>.dt.
# In the following command, most of the parameters to mktree have default
# values. The parameters number of dimensions, number of categories and
# number of examples are computed from the datafile.
$ mktree -tlinear.train
Pruned decision tree written to linear.train.dt.
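# Aside: a rough sketch of the layout a training file such as linear.train
# is assumed to have; see mktree.readme for the authoritative description.
# Each line is expected to hold one example: the attribute values, separated
# by whitespace, with the integer category label as the last field. The two
# lines below show illustrative values only, not the actual contents of
# linear.train.
#
#     0.31  0.72  1
#     0.94  0.10  2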
# The following command chooses 48 points randomly out of the file
# linear.train, builds a decision tree for them, and estimates the
# classification accuracy on the rest of the points (32 of them) in the file
# using this tree. As the pruning portion is specified as 0, no pruning is
# done.
$ mktree -tlinear.train -n48 -p0
Unpruned decision tree written to linear.train.dt.
Leaf Count = 2, Tree Depth = 1
Classification accuracy = 87.5000   Leaf count = 2.0   Tree Depth = 1.0   (Without pruning)
Class 1 : Accuracy = 88.889 (16/18)
Class 2 : Accuracy = 85.714 (12/14)

# "Induce a decision tree for the dataset linear.train, and classify the
# data points in linear.test, using the above decision tree". Both linear.train
# and linear.test consist of points that are labelled with categories, so it
# is possible to compute the accuracy of classification.
$ mktree -tlinear.train -Tlinear.test
Pruned decision tree written to linear.train.dt.
Leaf Count = 2, Tree Depth = 1
Classification accuracy = 100.0000   Leaf count = 2.0   Tree Depth = 1.0   (With pruning)
Class 1 : Accuracy = 100.000 (12/12)
Class 2 : Accuracy = 100.000 (8/8)

# Instead of rebuilding the decision tree for each experiment, we can
# read it off a file, as shown below.
$ mktree -Dlinear.train.dt -Tlinear.test
Classification accuracy = 100.0000   Leaf count = 2.0   Tree Depth = 1.0   (With pruning)
Class 1 : Accuracy = 100.000 (12/12)
Class 2 : Accuracy = 100.000 (8/8)

# A more complicated use of mktree.
# Do a 5-fold cross validation on linear.data,
# restarting from at most 30 different random hyperplanes
# at each node, trying at most 25 random perturbations at
# each local minimum.
# Abbreviations in the output : TD=Tree Depth, LC=Leaf Count, acc=accuracy,
# "1:13/13(100.000)" = 13 out of the 13 points in class 1 in the test set
# are correctly classified, giving 100% accuracy.
$ mktree -tlinear.data -V5 -i30 -m25
Pruned decision tree 1 written to linear.data.dt.
Fold 1: LC = 2  TD = 1, Acc = 80.00
Fold 2: LC = 2  TD = 1, Acc = 100.00
Fold 3: LC = 2  TD = 1, Acc = 100.00
Fold 4: LC = 2  TD = 1, Acc = 100.00
Fold 5: LC = 2  TD = 1, Acc = 100.00
Using 5-fold cross validation:
Classification accuracy = 96.0000   Leaf count = 2.0   Tree Depth = 1.0   (With pruning)
Class 1 : Accuracy = 96.000 (48/50)
Class 2 : Accuracy = 96.000 (48/50)
Standard deviation of leaf counts = 0.000
Standard deviation of accuracy = 17.889

# Generate 200 random data points that will be perfectly classified by the
# decision tree in linear.data.dt, and write the points to linear.test2, in
# verbose mode.
$ gendata -v -Dlinear.data.dt -n200 -olinear.test2
Decision tree read from linear.data.dt.
Number of dimensions = 2, Number of classes = 2
200 random data points generated.
Category 1 : 105 points
Category 2 : 95 points
Output written to linear.test2.

# Build an axis parallel decision tree on the set linear.test2.
# Verbose mode. Output the decision tree to linear.ap.dt.
$ mktree -tlinear.test2 -a -Dlinear.ap.dt -v
200 training examples loaded from linear.test2.
180 used for training, 20 for pruning.
Dimensions = 2, Categories = 2
Root hyperplane found.
"l" hyperplane found.
"ll" hyperplane found.
"lll" hyperplane found.
"r" hyperplane found.
"rr" hyperplane found.
"rrr" hyperplane found.
"rrrr" hyperplane found.
"rrrrl" hyperplane found.
"rrrrll" hyperplane found.
Error Complexity Pruning:
Tree 1: TD=7.0 LC=11.0 acc=95.000  1:13/13(100.000)  2:6/7(85.714)
Tree 2: TD=5.0 LC=9.0  acc=95.000  1:13/13(100.000)  2:6/7(85.714)
Tree 3: TD=5.0 LC=8.0  acc=95.000  1:13/13(100.000)  2:6/7(85.714)
Tree 4: TD=3.0 LC=6.0  acc=95.000  1:13/13(100.000)  2:6/7(85.714)
Tree 5: TD=3.0 LC=4.0  acc=95.000  1:13/13(100.000)  2:6/7(85.714)
Tree 6: TD=2.0 LC=3.0  acc=95.000  1:13/13(100.000)  2:6/7(85.714)
Tree 7: TD=1.0 LC=2.0  acc=90.000  1:13/13(100.000)  2:5/7(71.429)
Tree 6 Selected.
Pruning Done.
Pruned decision tree written to linear.ap.dt.
Leaf Count = 3, Tree Depth = 2

# Display the oblique tree that generated linear.test2, along with the data
# points, in the bottom half of the page (uppermost y coordinate = 300),
# as a PostScript(R) file linear.o.ps.
$ display -tlinear.test2 -Dlinear.data.dt -Y300 > linear.o.ps

# Display the axis parallel tree generated using linear.test2, along with the
# data, in the top half of the page (lowermost y coordinate = 320),
# as a PostScript(R) file linear.ap.ps.
$ display -tlinear.test2 -Dlinear.ap.dt -y320 > linear.ap.ps

# Now, the files linear.*.ps can be viewed, using any standard PostScript(R)
# previewer, to contrast the axis parallel tree construction with the
# oblique one.
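# Aside: the steps above can be chained into one small end-to-end experiment.
# This is only a convenience sketch reusing flags already demonstrated in
# this session; the linear.exp* file names are hypothetical, and mktree is
# assumed to follow the default <training data file name>.dt naming rule
# described earlier.
$ mktree -tlinear.train -Tlinear.test            # induce an oblique tree and test it
$ gendata -Dlinear.train.dt -n200 -olinear.exp   # sample 200 points from that tree
$ mktree -tlinear.exp -a                         # fit an axis parallel tree to the sample
$ display -tlinear.exp -Dlinear.exp.dt > linear.exp.ps   # draw the tree and the points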