edu.cmu.minorthird.classify.experiments
Class WebmasterSplitter<T>

java.lang.Object
  extended by edu.cmu.minorthird.classify.experiments.WebmasterSplitter<T>
All Implemented Interfaces:
Splitter<T>

public class WebmasterSplitter<T>
extends java.lang.Object
implements Splitter<T>

A splitter that stratifies samples according to an arbitrary "profile" property and ensures that train/test splits do not cross boundaries defined by the "user" and "request" properties. It first does a random split according to users, then a stratified split according to requests (with the stratification done according to profiles).
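
The user-level part of this idea can be illustrated with a minimal sketch. It is not the WebmasterSplitter implementation: the class UserBoundarySplitSketch, the Example type, and the testFraction and seed parameters are all hypothetical, and the request/profile stratification step is left out. It only shows how a split can be made so that no user appears on both sides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration only; not the WebmasterSplitter implementation.
class UserBoundarySplitSketch {

    /** One example tagged with the user it belongs to (hypothetical type). */
    static final class Example {
        final String msgId;
        final String userId;

        Example(String msgId, String userId) {
            this.msgId = msgId;
            this.userId = userId;
        }
    }

    /** Collects the distinct user ids in first-seen order. */
    static Set<String> distinctUsers(List<Example> examples) {
        Set<String> users = new LinkedHashSet<>();
        for (Example e : examples) {
            users.add(e.userId);
        }
        return users;
    }

    /**
     * Shuffles the distinct users and returns roughly testFraction of them.
     * Every example of a chosen user is meant to go to the test side.
     */
    static Set<String> chooseTestUsers(List<Example> examples, double testFraction, long seed) {
        List<String> users = new ArrayList<>(distinctUsers(examples));
        Collections.shuffle(users, new Random(seed));
        int numTest = (int) Math.round(users.size() * testFraction);
        return new HashSet<>(users.subList(0, numTest));
    }

    /** Returns the examples whose user is in testUsers (wantTest) or not in it. */
    static List<Example> select(List<Example> examples, Set<String> testUsers, boolean wantTest) {
        List<Example> out = new ArrayList<>();
        for (Example e : examples) {
            if (testUsers.contains(e.userId) == wantTest) {
                out.add(e);
            }
        }
        return out;
    }
}
```

Because the split is decided per user rather than per example, the train and test sides cannot share a user, at the cost of the test side only approximating the requested fraction of examples.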

Constraints on splitting are defined by a file with multiple lines of the form

 msgId userId requestId profileId
 
where each Id is a String.
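
As a rough sketch of how lines in this form could be read, the following parses each line into its four Ids. The class ConstraintFileSketch and the Constraint type are hypothetical, and two details are assumptions not stated on this page: that fields are separated by whitespace, and that blank lines are skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the parsing done inside WebmasterSplitter may differ.
class ConstraintFileSketch {

    /** One line of the constraint file: msgId userId requestId profileId. */
    static final class Constraint {
        final String msgId;
        final String userId;
        final String requestId;
        final String profileId;

        Constraint(String msgId, String userId, String requestId, String profileId) {
            this.msgId = msgId;
            this.userId = userId;
            this.requestId = requestId;
            this.profileId = profileId;
        }
    }

    /** Parses the lines of a constraint file, rejecting malformed lines. */
    static List<Constraint> parseLines(List<String> lines) {
        List<Constraint> out = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines carry no constraint
            }
            String[] f = trimmed.split("\\s+"); // assumption: whitespace-separated fields
            if (f.length != 4) {
                throw new IllegalArgumentException(
                        "expected 4 fields but found " + f.length + ": " + line);
            }
            out.add(new Constraint(f[0], f[1], f[2], f[3]));
        }
        return out;
    }
}
```

The lines themselves could be obtained with java.nio.file.Files.readAllLines on the path given as constraintFileName.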

The main purpose of this class is to split webmaster data, hence the name.

Author:
William Cohen

Constructor Summary
WebmasterSplitter(java.lang.String constraintFileName, double fraction, int folds)
           
 
Method Summary
 int getNumPartitions()
          Return the number of partitions produced by the last call to split().
 java.util.Iterator<T> getTest(int k)
          Return an iterator over the test cases in the k-th split.
 java.util.Iterator<T> getTrain(int k)
          Return an iterator over the training cases in the k-th split.
static void main(java.lang.String[] args)
           
 void split(java.util.Iterator<T> it)
          Split the iterator into a number of train/test partitions.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

WebmasterSplitter

public WebmasterSplitter(java.lang.String constraintFileName,
                         double fraction,
                         int folds)
Method Detail

split

public void split(java.util.Iterator<T> it)
Description copied from interface: Splitter
Split the iterator into a number of train/test partitions.

Specified by:
split in interface Splitter<T>

getNumPartitions

public int getNumPartitions()
Description copied from interface: Splitter
Return the number of partitions produced by the last call to split().

Specified by:
getNumPartitions in interface Splitter<T>

getTrain

public java.util.Iterator<T> getTrain(int k)
Description copied from interface: Splitter
Return an iterator over the training cases in the k-th split.

Specified by:
getTrain in interface Splitter<T>

getTest

public java.util.Iterator<T> getTest(int k)
Description copied from interface: Splitter
Return an iterator over the test cases in the k-th split.

Specified by:
getTest in interface Splitter<T>
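
The four methods above suggest a call pattern: call split once, then walk the partitions and read the train and test iterators for each. The sketch below shows that pattern against a hypothetical stand-in interface, SplitterLike, which only mirrors the signatures documented here so that the example compiles without the minorthird library. HoldOutLast is a toy implementation made up for the demo, and the assumption that k runs from 0 to getNumPartitions() - 1 is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the call pattern; not part of the minorthird library.
class SplitterUsageSketch {

    /** Stand-in mirroring the four Splitter<T> methods documented on this page. */
    interface SplitterLike<T> {
        void split(Iterator<T> it);
        int getNumPartitions();
        Iterator<T> getTrain(int k);
        Iterator<T> getTest(int k);
    }

    /** Toy implementation for the demo: one partition, last item held out for test. */
    static final class HoldOutLast<T> implements SplitterLike<T> {
        private final List<T> train = new ArrayList<>();
        private final List<T> test = new ArrayList<>();

        public void split(Iterator<T> it) {
            train.clear();
            test.clear();
            while (it.hasNext()) {
                train.add(it.next());
            }
            if (!train.isEmpty()) {
                test.add(train.remove(train.size() - 1));
            }
        }

        public int getNumPartitions() { return 1; }
        public Iterator<T> getTrain(int k) { return train.iterator(); }
        public Iterator<T> getTest(int k) { return test.iterator(); }
    }

    /** Calls split once, then returns {trainCount, testCount} for each partition. */
    static <T> List<int[]> partitionSizes(SplitterLike<T> splitter, Iterator<T> data) {
        splitter.split(data);
        List<int[]> sizes = new ArrayList<>();
        // Assumption: partitions are indexed from 0.
        for (int k = 0; k < splitter.getNumPartitions(); k++) {
            sizes.add(new int[] { count(splitter.getTrain(k)), count(splitter.getTest(k)) });
        }
        return sizes;
    }

    /** Drains an iterator and returns how many items it produced. */
    private static int count(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }
}
```

With the real library, a WebmasterSplitter built from a constraint file name, a fraction, and a fold count would presumably take the place of HoldOutLast in the same loop.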

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

main

public static void main(java.lang.String[] args)