| .names file created by George John, October 1994
| This data is ALMOST the same as the original crx dataset Quinlan
| used in C4.5, but 
| * missing values have been replaced with the medians,
|   which is unfair to the algorithms that can deal
|   with missing data well.  Replacing an attribute by its mean/median
|   value is known to be one of the poorest methods of handling missing values.
| * attribute 4 is removed (I checked -- in the entire dataset attributes
|   4 and 5 were completely correlated)
| * categorical attribute values are numbered in increasing likelihood
|   of being class + and treated as numeric in the StatLog tests.
|   Strange.
|
|1. TITLE: 
|	Australian Credit Approval
|
|2. USE IN STATLOG
|
|	2.1- Testing Mode		
|		10-Fold Cross Validation
|
|	2.2- Special Preprocessing	
|		Yes (See REMARKS)
|
|	2.3- Test Results
|
|		Algorithm	Success Rate
|		---------	------------
|		Cal5		86.900
|		Itrule		86.300
|		LogDisc		85.900
|		Discrim		85.900
|		Dipol92		85.900
|		Radial		85.500
|		Cart		85.500
|		Castle		85.200
|		Bayes		84.900
|		IndCart		84.800
|		BackProp	84.600
|		C4.5		84.500
|		Smart		84.200
|		BayTree		82.900
|		KNN		81.900
|		Ac2		81.900
|		NewId		81.900
|		LVQ		80.300
|		Alloc80		79.900
|		Cn2		79.600
|		QuaDisc		79.300
|		Default		56.000
|		Cascade		0.000
|		Kohonen		0.000
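|
|	A minimal sketch of the 10-fold partition named in 2.1, assuming
|	plain Python, a fixed shuffle seed, and folds of near-equal size.
|	The helper name k_fold_indices is hypothetical; the fold assignment
|	actually used in the StatLog trials is not documented in this file.

```python
import random


def k_fold_indices(n_examples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# 690 is the number of examples given in section 4.
splits = list(k_fold_indices(690))
```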
|
|3. SOURCES and PAST USAGE
|  
|   	3.1 ORIGINAL SOURCE
|    		(confidential)
|    		Submitted by quinlan@cs.su.oz.au
|
|	3.2 PAST USAGE
|	   See Quinlan,
|    	     * "Simplifying decision trees", Int J Man-Machine Studies 27,
|      	   Dec 1987, pp. 221-234.
|   	     * "C4.5: Programs for Machine Learning", Morgan Kaufmann, Oct 1992
|  
|	3.3 RELEVANT INFORMATION
|
|    	This file concerns credit card applications.  All attribute names
|    	and values have been changed to meaningless symbols to protect
|    	confidentiality of the data.
|  
|    	This dataset is interesting because there is a good mix of
|    	attributes -- continuous, nominal with small numbers of
|    	values, and nominal with larger numbers of values.  There
|    	were originally a few missing values, but these have all
|    	been replaced by the overall median.
|
|4. DATASET DESCRIPTION 
|   
|   	NUMBER OF EXAMPLES
|		Total no. =  690
|   	NUMBER OF CLASSES: 2
|	    	0,1 (-,+)
|
|  		Class Distribution: 
|    		+: 307 (44.5%)    CLASS 1
|    		-: 383 (55.5%)    CLASS 0
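|
|		The percentages above can be re-derived from the counts.  The
|		sketch below also computes the share of the most frequent class,
|		which is close to the 56.000 listed for Default in 2.3; reading
|		Default as "always predict the majority class" is an assumption,
|		not something this file states.

```python
# Class counts as listed in the class distribution.
counts = {"+": 307, "-": 383}
total = sum(counts.values())

# Share of each class in percent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Most frequent class and its share on the full dataset.
majority_label = max(counts, key=counts.get)
majority_rate = shares[majority_label]
```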
|
|   	NUMBER OF ATTRIBUTES
|	 	14  (6 Continuous, 8 Categorical)
|
|    	A1:	0,1    CATEGORICAL
|        	a,b
|    	A2:	continuous.
|    	A3:	continuous.
|    	A4:	1,2,3         CATEGORICAL
|        	p,g,gg
|    	A5:	1,2,3,4,5,6,7,8,9,10,11,12,13,14    CATEGORICAL
|        	ff,d,i,k,j,aa,m,c,w,e,q,r,cc,x
|
|    	A6:	1,2,3,4,5,6,7,8,9    CATEGORICAL
|        	ff,dd,j,bb,v,n,o,h,z
|
|    	A7:	continuous.
|    	A8:	1, 0       CATEGORICAL
|        	t, f.
|    	A9: 	1, 0	    CATEGORICAL
|        	t, f.
|    	A10:	continuous.
|    	A11:  	1, 0	    CATEGORICAL
|          	t, f.
|    	A12:    1, 2, 3    CATEGORICAL
|            	s, g, p 
|    	A13:	continuous.
|    	A14:	continuous.
|	
|5. REMARKS:
|
|	Missing Attribute Values:
|    		37 cases (5%) had one or more missing values.  The numbers of
|    		missing values for particular attributes were:
|
|    		A1:  12
|    		A2:  12
|    		A4:   6
|    		A5:   6
|    		A6:   9
|    		A7:   9
|    		A14: 13
|    
|    		These were replaced by the mode of the attribute (categorical)
|    		or the mean of the attribute (continuous).
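|
|    		A minimal sketch of the replacement rule stated above (mode for
|    		categorical attributes, mean for continuous ones), assuming
|    		missing entries are represented as None.  The function name
|    		impute_column and the toy columns are hypothetical and are not
|    		taken from the dataset or from the StatLog preprocessing code.

```python
from collections import Counter
from statistics import mean


def impute_column(values, categorical):
    """Replace None entries with the column mode (categorical) or mean (continuous)."""
    observed = [v for v in values if v is not None]
    if categorical:
        # Most frequent observed value; ties are broken by first occurrence.
        fill = Counter(observed).most_common(1)[0][0]
    else:
        fill = mean(observed)
    return [fill if v is None else v for v in values]


# Toy columns for illustration only.
toy_categorical = ["t", "f", None, "t"]
toy_continuous = [1.0, None, 3.0]
filled_categorical = impute_column(toy_categorical, categorical=True)
filled_continuous = impute_column(toy_continuous, categorical=False)
```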
|                           
|   	There is no cost matrix.
|
|
|_____________________________________________________________________________
|
|Three remarks relating to the StatLog version:
|       
|        The labels have been changed for the convenience of the statistical
|	algorithms.  For example, attribute 4 originally had 3 labels p,g,gg
|	and these have been changed to labels 1,2,3.
|
|1.  Attributes 4 and 5 of the original were apparently identical,
|    so attribute 4 of the original was removed
|    (for the convenience of the statistical algorithms).
|
|2.  Where attributes were categorical, the categories were given numerical
|    labels in the order of the relative risk of being class "+".  Treat 
|    as categorical if thought desirable.   All StatLog trials treated
|    these variables as numerical.
|    
|3.  A stepwise regression procedure strongly suggests that only
|    attributes A5, A8, A9, A13 and A14 are relevant.   Improved results
|    are often obtained if only these five attributes are used.
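|
|    Remarks 2 and 3 can be combined in a short sketch: keep only the five
|    attributes named in remark 3 and map the numeric codes of the
|    categorical ones back to their symbols.  It assumes each record is a
|    dict keyed by attribute name and that codes and symbols in section 4
|    pair up by position.  The function name select_and_decode and the
|    example record are hypothetical.

```python
# Symbol tables copied from the attribute list in section 4,
# assuming the i-th code corresponds to the i-th symbol.
A5_SYMBOLS = "ff d i k j aa m c w e q r cc x".split()
LABELS = {
    "A5": {code: sym for code, sym in enumerate(A5_SYMBOLS, start=1)},
    "A8": {1: "t", 0: "f"},
    "A9": {1: "t", 0: "f"},
}

# Attributes that remark 3 reports as relevant.
RELEVANT = ["A5", "A8", "A9", "A13", "A14"]


def select_and_decode(row):
    """Keep the five relevant attributes; decode categorical codes to symbols."""
    out = {}
    for name in RELEVANT:
        value = row[name]
        out[name] = LABELS[name][int(value)] if name in LABELS else float(value)
    return out


# Hypothetical record, not taken from the dataset.
example = {f"A{i}": 0 for i in range(1, 15)}
example.update({"A5": 4, "A8": 1, "A9": 0, "A13": 100, "A14": 1213})
decoded = select_and_decode(example)
```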
|     
|CONTACTS
|	statlog-adm@ncc.up.pt
|	bob@stams.strathclyde.ac.uk
|	
|	See README file for general information 
|                        
|================================================================================
|

0,1.
A1:	0,1.
A2:	continuous.
A3:	continuous.
A4:	1,2,3.
A5:	1, 2,3,4,5, 6,7,8,9,10,11,12,13,14.
A6:	 1, 2,3, 4,5,6,7,8,9.
A7:	continuous.
A8:	1, 0.
A9: 	1, 0.
A10:	continuous.
A11:  	1, 0.
A12:    1, 2, 3.
A13:	continuous.
A14:	continuous.