[Reel Two]

The building blocks of Reel Two solutions (the Classifier, Entity Extractor, and Web & Search products) can also be used individually to solve critical information problems.

Data Sets

Reel Two is constantly engaged in developing new datasets that test the performance and robustness of the Classification System technology. The sample datasets included here are from a variety of applications in different industries, and demonstrate the Classification System's ability to work with both text and nominal data formats.

The table lists statistics for each dataset, including "Build Time" and "F-Measure". Build Time is the time taken to load, model, and evaluate a dataset (using leave-one-out evaluation) on a Windows XP computer with a 1 GHz Celeron processor and 256 MB of memory. F-Measure is the micro-averaged F-measure across all categories in the dataset.
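The two evaluation terms above can be illustrated with a minimal sketch. It is not the Classification System's implementation, which this page does not describe. The function names, the majority-vote stand-in classifier, and the four toy documents are all hypothetical and exist only to show how leave-one-out evaluation and a micro-averaged F-measure fit together.

```python
from collections import Counter


def micro_f_measure(true_labels, predicted_labels):
    """Pool true/false positives and false negatives over all categories,
    then compute a single F-measure from the pooled counts."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        truth, predicted = set(truth), set(predicted)
        tp += len(truth & predicted)   # categories correctly assigned
        fp += len(predicted - truth)   # categories wrongly assigned
        fn += len(truth - predicted)   # categories missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def leave_one_out(instances, labels, train, predict):
    """Hold out each instance in turn, train on the rest,
    and predict the held-out instance."""
    predictions = []
    for i in range(len(instances)):
        train_x = instances[:i] + instances[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        predictions.append(predict(model, instances[i]))
    return predictions


# Hypothetical stand-in classifier: always predicts the most common label set.
def train_majority(train_instances, train_labels):
    counts = Counter(frozenset(label_set) for label_set in train_labels)
    return counts.most_common(1)[0][0]


def predict_majority(model, instance):
    return set(model)


# Toy data invented for this sketch (not taken from Jaguar.ratz).
documents = [
    "jaguar xj sedan",
    "jaguar e-type engine",
    "jaguar rainforest habitat",
    "jaguar dealership",
]
labels = [{"car"}, {"car"}, {"cat"}, {"car"}]

predictions = leave_one_out(documents, labels, train_majority, predict_majority)
score = micro_f_measure(labels, predictions)
print(f"micro-averaged F-measure: {score:.4f}")
```

Because the counts are pooled before the F-measure is computed, large categories weigh more heavily in the result than small ones. Each label is represented as a set so the same function also covers datasets in which a document can belong to several categories.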

Dataset / Categories / Instances / Build Time / F-Measure
Reuters-21578 (Top 10)

The Reuters News research dataset is a compilation of news stories from Reuters News organized into a number of topics. Identifying the documents from the 10 largest categories is one of the most popular text categorization tests.

Download: Reuters.ratz

License: Restricted

Original Dataset: Maintained by David Lewis of AT&T.

Categories: 10; Instances: 2,535; Build Time: 15 seconds; F-Measure: 0.9121
Gene Ontology (GO) MEDLINE Abstracts

The GO dataset is a collection of MEDLINE research abstracts that have been classified according to the Gene Ontology, a structure encoding information about gene products and functions.

Download: Gene Ontology.ratz

License: Restricted

Original Dataset: Maintained by the United States National Library of Medicine.

Categories: 72; Instances: 2,721; Build Time: 45 seconds; F-Measure: 0.7242
Jaguar: Car or Cat

Reel Two created this dataset as a basic demonstration of the categorization task. The dataset consists of documents containing the word "Jaguar"; the task is to decide whether each document is about the car or the cat.

Download: Jaguar.ratz

License: Public Domain

Categories: 2; Instances: 200; Build Time: n/a; F-Measure: n/a
Language Recognition

The Reel Two Classification System supports 25 languages via the built-in facilities of the Java programming language. This dataset was created by Reel Two to demonstrate that capability. News was sampled from a variety of news sources around the world.

Download: Multilingual.ratz

License:

Categories: 25; Instances: 626; Build Time: 10 seconds; F-Measure: 0.9774
Steel Annealing

Download: Anneal.ratz

License: Public Domain

Original Dataset: Maintained by the University of California, Irvine (UCI).

Categories: 6; Instances: 798; Build Time: 5 seconds; F-Measure: 0.9211
Diabetes Detection

Download: Diabetes.ratz

License:

Original Dataset: Maintained by the University of California, Irvine (UCI).

Categories: 2; Instances: 421; Build Time: 2 seconds; F-Measure: 0.7981
Gene Splicing

Download: Splice.ratz

License:

Original Dataset: Maintained by the University of California, Irvine (UCI).

Categories: 3; Instances: 3,190; Build Time: 15 seconds; F-Measure: 0.9128