Treebanks

A Treebank is a corpus of dependency-annotated sentences in one or more files.

DependencyTrees.Treebank (Type)
Treebank

A lazy reader for a file containing annotated dependency parse trees.

Iterating (i.e., using a for-loop) produces one tree at a time. The treebank's read_sentence field is called on the IO stream to read until a sentence boundary (by default two blank lines), and the parse field is called on the resulting string to read a DependencyTree.
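The lazy reading described above can be illustrated with a small standalone sketch. It is not the package's implementation: `LazySentences` and `forms` are hypothetical names, and the sketch assumes sentences are separated by an empty line, as in CoNLL files. Only Julia's standard library is used.

```julia
# Illustrative sketch only; LazySentences is a hypothetical type,
# not the Treebank type from DependencyTrees.
struct LazySentences{F}
    path::String
    parse::F    # called on each raw sentence block
end

# Each call to `iterate` reads exactly one sentence block from the open stream.
# The stream is closed once the end of the file is reached; breaking out of a
# loop early leaves it open until it is garbage-collected.
function Base.iterate(s::LazySentences, io::IO = open(s.path))
    while !eof(io)
        # Assumed boundary: an empty line, i.e. two consecutive newlines.
        block = strip(readuntil(io, "\n\n"))
        isempty(block) || return (s.parse(block), io)
    end
    close(io)
    return nothing
end

# The number of sentences is unknown until the file has been read.
Base.IteratorSize(::Type{<:LazySentences}) = Base.SizeUnknown()

# Usage: write a tiny two-sentence file and read it back one block at a time.
path, io = mktemp()
write(io, "1\tEconomic\n2\tnews\n\n1\tHello\n\n")
close(io)

# Hypothetical parser: keep only the second column (the word form) of each line.
forms(block) = [split(line, '\t')[2] for line in split(block, '\n')]
sentences = collect(LazySentences(path, forms))
```

Because the stream itself serves as the iteration state, nothing beyond the current sentence has to be held in memory.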

DependencyTrees.Treebank (Method)
Treebank(file)

Read file as a treebank.

The file extension should correspond to one of the supported formats:

  • ".conllu"
  • ".conllx"
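As a rough illustration of choosing a format from the file extension, the sketch below maps the two extensions listed above to a format name. `format_for` is a hypothetical helper written for this example. It is not part of DependencyTrees, and the package may resolve extensions differently.

```julia
# Illustrative sketch only; format_for is a hypothetical helper.
function format_for(file::AbstractString)
    ext = splitext(file)[2]    # e.g. ".conllu"; empty if there is no extension
    ext == ".conllu" && return :conllu
    ext == ".conllx" && return :conllx
    throw(ArgumentError("unsupported treebank extension: \"$ext\""))
end

format_for("data/train.conllu")    # returns :conllu
```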

Iterating over a treebank reads sentences one at a time:

treebank = Treebank("data/news.conll", conllu)

for tree in treebank
    # ...
end

tree = first(treebank)

# output
┌────────────── 0 ROOT
│           ┌─► 1 Economic
│        ┌─►└── 2 news
└─►┌──┌──└───── 3 had
   │  │     ┌─► 4 little
   │  └─►┌──└── 5 effect
   │  ┌──└────► 6 on
   │  │     ┌─► 7 financial
   │  └────►└── 8 markets
   └──────────► 9 .

CoNLL-U

CoNLL-X

DependencyTrees.conllx (Function)
conllx(text)

Read a dependency tree from text (in CoNLL-X format).

For more details on the format, see the CoNLL-X paper [4].
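For orientation, CoNLL-X text has one token per line with ten tab-separated columns: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL. The sketch below reads a few of those columns by hand. `Token` and `parse_conllx_tokens` are hypothetical names, not the `conllx` function documented here, and the two-token input with its tags and labels is made up for illustration.

```julia
# Illustrative sketch only; Token and parse_conllx_tokens are hypothetical names.
struct Token
    id::Int
    form::String
    head::Int        # ID of the head token; 0 marks the root
    deprel::String
end

function parse_conllx_tokens(text::AbstractString)
    tokens = Token[]
    for line in split(strip(text), '\n')
        cols = split(line, '\t')
        length(cols) == 10 ||
            throw(ArgumentError("expected 10 tab-separated columns, got $(length(cols))"))
        # Columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        push!(tokens, Token(parse(Int, cols[1]), cols[2], parse(Int, cols[7]), cols[8]))
    end
    return tokens
end

# Made-up fragment; "_" marks an unspecified field.
text = """
1\tEconomic\t_\tADJ\tJJ\t_\t2\tamod\t_\t_
2\tnews\t_\tNOUN\tNN\t_\t0\troot\t_\t_
"""

tokens = parse_conllx_tokens(text)
```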

conllx(tree::DependencyTree)

Serialize a dependency tree to CoNLL-X format.
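Going the other way, each token becomes one line of ten tab-separated fields. The sketch below writes such lines from plain tuples rather than from a `DependencyTree`. `conllx_line` is a hypothetical helper, and the rows are made up for illustration.

```julia
# Illustrative sketch only; conllx_line is a hypothetical helper,
# not the conllx method documented above.
function conllx_line(id::Int, form::AbstractString, head::Int, deprel::AbstractString)
    # Fields not tracked here (LEMMA, CPOSTAG, POSTAG, FEATS, PHEAD, PDEPREL)
    # are written as "_".
    return join((id, form, "_", "_", "_", "_", head, deprel, "_", "_"), '\t')
end

# Made-up rows: (ID, FORM, HEAD, DEPREL).
rows = [(1, "Economic", 2, "amod"), (2, "news", 0, "root")]

# One line per token, with a trailing newline after the last token.
out = join((conllx_line(r...) for r in rows), '\n') * "\n"
```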
