"gendata", as the name suggests, is a program to generate data. It can do the following types of things : 1. Generate random examples, with d real-valued attributes. ----------- 2. Label the examples with random category numbers, where the user specifies the # of categories. 3. Read in a decision tree and classify the examples with it. The main use of gendata is to experiment with artificial datasets. We can, for instance, start with a decision tree named DT1; generate random data points (examples) and label them according to DT1; use "mktree" to induce a decision tree DT2 on the data points, and use "display" to compare DT2 with DT1. This comparison may give the user some insight about parameter settings for OC1. For example, if the root hyperplane in DT2 is consistently getting stuck in a local minimum, resulting in a bad overall tree, you can try increasing the number_of_iterations or the max_no_of_random_perturbations (the i and m options of mktree, respectively). The following is a list of "gendata" options. The file "sample_session" in this directory lists some typical calls to "gendata". -D : File containing the decision tree For sample of the decision tree format, see sample.dt. If no file name is specified, or if there is some problem with reading the tree from the specified file, "gendata" labels the generated data points randomly, or leaves them unlabeled if -u option is specified. This option overrides the -d and -c options, as the decision tree also lists the number of dimensions and the number of categories. gendata does not even allow you to specify this option along with -d and -c options on the command line. -s : integer seed for the random number generator Default : system default for srand48() system call. -o : file to write the generated data. Default=stdout. You can re-direct this to a file, of course. -n : number of examples (data points) to be generated There is no default for this; the user must specify this number. 
-d : Number of dimensions.
     If a decision tree is specified with the -D option, the number of
     dimensions is read from it. Otherwise, default = 2.

-c : Number of categories.
     If a decision tree is specified with the -D option, the number of
     categories is read from it. If not, and if the -u option is not
     chosen, default = 2.

-u : Generate unlabeled points (default = off).
     Overrides the -D option.

-v : Verbose output (default = FALSE).
     If true, gendata prints extra information about what it is doing.

-a : All generated attribute values will be greater than or equal to
     this number. Default = 0.

-b : All generated attribute values will be less than or equal to
     this number. Default = 1.
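
To make the roles of -n, -d, -c, -a, -b, -s and -u concrete, the following
is a rough Python sketch of random generation and random labeling. It is
not gendata's code. The function name generate_examples is hypothetical,
and both the uniform sampling and the numbering of categories from 1 are
assumptions; this file only says that points and labels are "random".

```python
import random


def generate_examples(n, d=2, c=2, low=0.0, high=1.0, seed=None, unlabeled=False):
    """Return n examples as (attributes, label) pairs.

    Parameters mirror the gendata options: n (-n), d (-d), c (-c),
    low (-a), high (-b), seed (-s), unlabeled (-u).
    """
    rng = random.Random(seed)  # a private generator, so runs are repeatable
    examples = []
    for _ in range(n):
        # ASSUMPTION: attribute values are drawn uniformly from [low, high].
        attributes = [rng.uniform(low, high) for _ in range(d)]
        # ASSUMPTION: categories are numbered 1..c; None means "unlabeled".
        label = None if unlabeled else rng.randint(1, c)
        examples.append((attributes, label))
    return examples


if __name__ == "__main__":
    # Roughly what "-n 5 -d 3 -c 2 -s 42" might correspond to.
    for attributes, label in generate_examples(5, d=3, c=2, seed=42):
        print(" ".join(f"{value:.4f}" for value in attributes), label)
```

The printed layout (attribute values followed by the label on one line) is
also only illustrative; see sample_session for real calls and output.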