edu.cmu.minorthird.classify
Class SampleDatasets

java.lang.Object
  extended by edu.cmu.minorthird.classify.SampleDatasets

public class SampleDatasets
extends java.lang.Object

Some sample inputs for learners.

Author:
William Cohen

Field Summary
static java.lang.String[] negTest
           
static java.lang.String[] negTrain
           
static java.lang.String[] posTest
           
static java.lang.String[] posTrain
           
 
Constructor Summary
SampleDatasets()
           
 
Method Summary
static void main(java.lang.String[] args)
           
static Dataset makeLogisticRegressionData(java.util.Random rand, int m, double a, double b)
          Data useful for testing univariate logistic regression.
static Dataset makeNumericData(java.util.Random r, int dim, int m)
          Random data with m examples, whose class is defined by a simple boolean combination of thresholds over two dimensions, plus up to 5 irrelevant dimensions.
static Dataset makeSparseNumericData(java.util.Random r, int m)
          Sparse numeric data: some feature values are 1.0 and the rest are zero.
static Dataset makeToy3ClassData(java.util.Random random, int numInstances)
          Makes a sample three-class dataset.
static SequenceDataset makeToySequenceData()
           
static SequenceDataset makeToySequenceData(java.lang.String[] lines)
           
static SequenceDataset makeToySequenceTestData()
           
static Dataset sampleData(java.lang.String name, boolean isTest)
           
static Dataset toyBayesExtremeTest()
           
static Dataset toyBayesExtremeTrain()
           
static Dataset toyBayesExtremeUnlabeledTrain()
           
static Dataset toyBayesTest()
          Test data for a trivial classification problem.
static Dataset toyBayesTrain()
          Training data for a trivial classification problem.
static Dataset toyTest()
          Test data for a trivial classification problem.
static Dataset toyTrain()
          Training data for a trivial classification problem.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

posTrain

public static final java.lang.String[] posTrain

negTrain

public static final java.lang.String[] negTrain

posTest

public static final java.lang.String[] posTest

negTest

public static final java.lang.String[] negTest

Constructor Detail

SampleDatasets

public SampleDatasets()

Method Detail

toyTrain

public static Dataset toyTrain()
Training data for a trivial classification problem.


toyTest

public static Dataset toyTest()
Test data for a trivial classification problem.


toyBayesExtremeTrain

public static Dataset toyBayesExtremeTrain()

toyBayesExtremeTest

public static Dataset toyBayesExtremeTest()

toyBayesExtremeUnlabeledTrain

public static Dataset toyBayesExtremeUnlabeledTrain()

toyBayesTrain

public static Dataset toyBayesTrain()
Training data for a trivial classification problem.


toyBayesTest

public static Dataset toyBayesTest()
Test data for a trivial classification problem.


makeSparseNumericData

public static Dataset makeSparseNumericData(java.util.Random r,
                                            int m)
Sparse numeric data: some feature values are 1.0 and the rest are zero.
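One common way to hold data like this is to store only the features whose value is 1.0 and treat any missing feature as zero. The sketch below is an illustration of that representation, not MinorThird's implementation: the class `SparseDataSketch`, the feature names `f0`, `f1`, ..., and the `numFeatures` and `density` parameters are all hypothetical, and since the labeling rule of the real method is not described here, labels are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class SparseDataSketch {
    /**
     * Builds m sparse instances over numFeatures hypothetical features.
     * Each feature is 1.0 with probability `density`, otherwise 0.0;
     * zero-valued features are simply omitted from the map.
     */
    static List<Map<String, Double>> makeSparseData(Random r, int m, int numFeatures, double density) {
        List<Map<String, Double>> rows = new ArrayList<>(m);
        for (int i = 0; i < m; i++) {
            Map<String, Double> row = new TreeMap<>();
            for (int f = 0; f < numFeatures; f++) {
                if (r.nextDouble() < density) {
                    row.put("f" + f, 1.0); // only non-zero values are stored
                }
            }
            rows.add(row);
        }
        return rows;
    }

    /** Looks up a feature value, treating absent features as 0.0. */
    static double value(Map<String, Double> row, String feature) {
        return row.getOrDefault(feature, 0.0);
    }

    public static void main(String[] args) {
        List<Map<String, Double>> rows = makeSparseData(new Random(0), 3, 10, 0.2);
        for (Map<String, Double> row : rows) {
            System.out.println(row);
        }
    }
}
```

Storing only the 1.0 entries keeps memory proportional to the number of non-zero features rather than to the full feature count.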


makeNumericData

public static Dataset makeNumericData(java.util.Random r,
                                      int dim,
                                      int m)
Random data with m examples, whose class is defined by a simple boolean combination of thresholds over two dimensions, plus up to 5 irrelevant dimensions.
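A generator of this kind can be sketched under stated assumptions: features are uniform on [0, 1), the two relevant dimensions are the first two, the boolean combination is a hypothetical conjunction `x[0] > 0.5 && x[1] > 0.5`, and every dimension past the second is irrelevant noise. The class and method names are illustrative only, and how the real method limits the irrelevant dimensions to at most five is not documented here.

```java
import java.util.Random;

public class ThresholdDataSketch {
    /** Hypothetical labeling rule: positive iff x[0] > 0.5 AND x[1] > 0.5. */
    static boolean label(double[] x) {
        return x[0] > 0.5 && x[1] > 0.5;
    }

    /** Returns m rows of dim uniform features; only the first two affect the label. */
    static double[][] makeNumericData(Random r, int dim, int m) {
        if (dim < 2) {
            throw new IllegalArgumentException("need at least two dimensions");
        }
        double[][] rows = new double[m][dim];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < dim; j++) {
                rows[i][j] = r.nextDouble(); // dimensions 2..dim-1 are pure noise
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        double[][] rows = makeNumericData(new Random(42), 7, 1000);
        int pos = 0;
        for (double[] x : rows) {
            if (label(x)) pos++;
        }
        System.out.println(pos + " of " + rows.length + " positive");
    }
}
```

Under this assumed rule roughly a quarter of the examples are positive, since each of the two thresholds is passed with probability one half. A learner that selects features well should ignore the noise dimensions.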


makeLogisticRegressionData

public static Dataset makeLogisticRegressionData(java.util.Random rand,
                                                 int m,
                                                 double a,
                                                 double b)
Data useful for testing univariate logistic regression. The dataset contains m examples, each with a single uniformly distributed numeric feature x. Each example is labeled positive with probability logistic(a*x + b), where logistic(z) = 1/(1 + exp(-z)).
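The generation procedure described above can be sketched in plain Java. This is a minimal illustration, not MinorThird's implementation: it returns a list of a hypothetical `Example` holder instead of a `Dataset`, and it assumes x is drawn uniformly from [0, 1), since the range is not stated here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class LogisticDataSketch {
    /** Hypothetical stand-in for one labeled example: one numeric feature and a binary label. */
    static final class Example {
        final double x;
        final boolean positive;

        Example(double x, boolean positive) {
            this.x = x;
            this.positive = positive;
        }
    }

    /** logistic(z) = 1 / (1 + e^(-z)) */
    static double logistic(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * Draws m examples; x is uniform on [0, 1) (assumed range), and each
     * example is positive with probability logistic(a*x + b).
     */
    static List<Example> makeLogisticData(Random rand, int m, double a, double b) {
        List<Example> data = new ArrayList<>(m);
        for (int i = 0; i < m; i++) {
            double x = rand.nextDouble();
            boolean positive = rand.nextDouble() < logistic(a * x + b);
            data.add(new Example(x, positive));
        }
        return data;
    }

    public static void main(String[] args) {
        List<Example> data = makeLogisticData(new Random(0), 1000, 4.0, -2.0);
        long pos = data.stream().filter(e -> e.positive).count();
        System.out.println("positive fraction: " + (double) pos / data.size());
    }
}
```

With a = 4 and b = -2 the logit a*x + b ranges symmetrically over [-2, 2), so roughly half the examples should come out positive; a fitted univariate logistic model on such data should recover weights near a and b.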


makeToySequenceData

public static SequenceDataset makeToySequenceData()

makeToySequenceTestData

public static SequenceDataset makeToySequenceTestData()

makeToySequenceData

public static SequenceDataset makeToySequenceData(java.lang.String[] lines)

makeToy3ClassData

public static Dataset makeToy3ClassData(java.util.Random random,
                                        int numInstances)
Makes a sample three-class dataset.

Parameters:
random - A random number generator for building the dataset.
numInstances - The number of instances to be created.
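Taking the random number generator as a parameter presumably lets callers pass a seeded `java.util.Random`, so the same dataset is rebuilt on every run. The sketch below shows that effect with a hypothetical generation scheme that is not documented for the real method: a class is picked uniformly from three hypothetical labels `a`, `b`, `c`, and a single feature is drawn from a Gaussian centered on the class index with standard deviation 0.3. It returns a plain list rather than a MinorThird `Dataset`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ThreeClassSketch {
    static final String[] LABELS = {"a", "b", "c"}; // hypothetical class names

    /** Hypothetical holder for one instance: a single feature and its class label. */
    static final class Instance {
        final double x;
        final String label;

        Instance(double x, String label) {
            this.x = x;
            this.label = label;
        }
    }

    /**
     * Picks one of three classes uniformly, then draws a single feature from a
     * Gaussian centered on that class's index (0, 1 or 2) with std. dev. 0.3.
     */
    static List<Instance> makeToy3ClassData(Random random, int numInstances) {
        List<Instance> data = new ArrayList<>(numInstances);
        for (int i = 0; i < numInstances; i++) {
            int k = random.nextInt(LABELS.length);
            double x = k + 0.3 * random.nextGaussian();
            data.add(new Instance(x, LABELS[k]));
        }
        return data;
    }

    public static void main(String[] args) {
        // Two generators with the same seed yield identical datasets.
        List<Instance> d1 = makeToy3ClassData(new Random(123), 5);
        List<Instance> d2 = makeToy3ClassData(new Random(123), 5);
        for (int i = 0; i < d1.size(); i++) {
            System.out.println(d1.get(i).label + " " + d1.get(i).x
                    + " same=" + (d1.get(i).x == d2.get(i).x));
        }
    }
}
```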

sampleData

public static Dataset sampleData(java.lang.String name,
                                 boolean isTest)

main

public static void main(java.lang.String[] args)