Datasets information

SentEval_Ru allows you to evaluate your sentence embeddings as features for the following tasks:

Task	Type
MRPC	paraphrase detection
SST/dialog-2016	three-class sentiment analysis
SST/binary	binary sentiment analysis
Tags classifier	tags classifier
Readability classifier	readability classifier
Poems classifier	tag classifier
Proza classifier	tag classifier
Genre classification	tag classifier
TREC (translated to Russian)	question-type classification
SICK-E (translated to Russian)	natural language inference
STS (translated to Russian)	semantic textual similarity

Each task folder contains datasets in .csv format. The test datasets are structured as follows:

MRPC

Tab-separated input files with the structure s1 | s2 | label, where s1 and s2 are sentences.

The system participating in this task should determine how semantically similar s1 and s2 are, returning a binary similarity score: 0 or 1.
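As a minimal sketch of reading this layout, the snippet below parses s1 | s2 | label rows with Python's standard csv module. The sample rows are invented for illustration, and the code assumes the files have no header row and that quote characters in sentences should be kept literally; adjust both if the actual files differ.

```python
import csv
import io

# Hypothetical rows in the s1 | s2 | label layout (not taken from the dataset).
SAMPLE = (
    "Кошка спит на диване.\tНа диване спит кошка.\t1\n"
    "Идёт дождь.\tМагазин закрыт.\t0\n"
)


def read_pairs(handle):
    """Parse tab-separated rows into (s1, s2, label) tuples."""
    # QUOTE_NONE keeps quotation marks inside sentences as ordinary characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed rows
        s1, s2, label = row
        pairs.append((s1, s2, int(label)))
    return pairs


pairs = read_pairs(io.StringIO(SAMPLE))
```

For a real file, an open file handle (for example one opened with encoding="utf-8" and newline="") can be passed in place of the io.StringIO object.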

SST/dialog-2016

Tab-separated input files with the structure id | sentence | label.

The system participating in this task should classify the polarity of a given sentence: positive (1), negative (-1), or neutral (0).
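Many classifiers expect class indices starting at 0 rather than the values -1, 0, and 1. The sketch below shows one possible mapping; the index order and the helper name encode_label are illustrative choices, not part of SentEval_Ru.

```python
# Raw labels as described above: -1 negative, 0 neutral, 1 positive.
# The target indices 0..2 are an arbitrary but fixed choice for illustration.
LABEL_TO_INDEX = {-1: 0, 0: 1, 1: 2}
INDEX_TO_NAME = ["negative", "neutral", "positive"]


def encode_label(raw):
    """Convert a raw label (string or int) into a class index."""
    return LABEL_TO_INDEX[int(raw)]


def decode_label(index):
    """Convert a class index back into a readable polarity name."""
    return INDEX_TO_NAME[index]


encoded = [encode_label(value) for value in ["-1", "0", "1"]]
names = [decode_label(index) for index in encoded]
```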

SST/binary

Tab-separated input files with the structure sentence | label.

The system participating in this task should classify the polarity of a given sentence: positive (1) or negative (-1).

Tags classifier

Tab-separated input files with the structure sentences | label.

See the possible labels in labels.csv.

Readability classifier

Tab-separated input files with the structure sentences | label.

The system participating in this task should predict the reading difficulty of a text on a scale of [1..10].

Proza classifier

Tab-separated input files with the structure sentences | label.

The system participating in this task should classify the genre of a prose text (proza). See the possible labels in labels.csv.

Poems classifier

Tab-separated input files with the structure sentences | label.

The system participating in this task should classify a poem's genre. See the possible labels in labels.csv.

Genre classification

Tab-separated input files with the structure sentences | label.

The system participating in this task should classify a movie's genre. See the possible labels in genre_numeration.csv.

TREC

Tab-separated input files with the structure sentence | label.

The system participating in this task should classify the answer type that a question expects.

See the possible labels in the paper.

Read more in Learning Question Classifiers.

Example: for the question "What Canadian city has the largest population?", the goal is to classify it as having the answer type city.
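This README does not name a scoring metric; classification tasks like this one are often scored by accuracy, so the sketch below assumes that. The label strings other than city are hypothetical and may not match the dataset's actual label set.

```python
def accuracy(gold, predicted):
    """Return the fraction of predictions that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty list")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical gold and predicted answer types for four questions.
gold = ["city", "person", "city", "date"]
predicted = ["city", "person", "date", "date"]
score = accuracy(gold, predicted)  # three of the four predictions match
```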

SICK-E

Tab-separated input files with the structure s1 | s2 | label, where s1 and s2 are sentences.

The system participating in this task should compute how semantically similar s1 and s2 are, returning a similarity score in the range [1..5].
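Graded similarity scores are commonly compared against gold scores with Pearson correlation. The README does not state which metric is used, so treat the following as a sketch under that assumption; the scores in the example are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gold scores in [1..5] and model predictions.
gold_scores = [1.0, 2.5, 4.0, 5.0]
predicted_scores = [1.2, 2.0, 3.9, 4.8]
correlation = pearson(gold_scores, predicted_scores)
```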

STS

Tab-separated input files with the structure from | label | s1 | s2, where s1 and s2 are sentences.

The system participating in this task should measure the degree of semantic equivalence between two sentences, s1 and s2.

The from field contains the source of the data. See STS 2012, STS 2013, and STS 2014.
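Because each row carries its source in the from field, results can be broken down per source. The sketch below groups rows that way; the source strings, scores, and sentences are invented, and it assumes there is no header row and that the label parses as a float.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the from | label | s1 | s2 layout.
SAMPLE = (
    "STS2012\t4.5\tМужчина играет на гитаре.\tЧеловек играет на гитаре.\n"
    "STS2014\t1.0\tДети бегут по парку.\tЖенщина режет лук.\n"
    "STS2012\t3.0\tСобака лает.\tСобака громко лает на кошку.\n"
)


def group_by_source(handle):
    """Return {source: [(label, s1, s2), ...]} from tab-separated rows."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    groups = defaultdict(list)
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed rows
        source, label, s1, s2 = row
        groups[source].append((float(label), s1, s2))
    return dict(groups)


groups = group_by_source(io.StringIO(SAMPLE))
```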