Treebanks
A Treebank is a corpus of dependency-annotated sentences in one or more files.
DependencyTrees.Treebank — Type
Treebank
A lazy reader for a file containing annotated dependency parse trees.
Iterating (i.e., using a for-loop) produces one tree at a time. The treebank's read_sentence field is called on the IO stream to read until a sentence boundary (by default two blank lines), and the parse field is called on the resulting string to read a DependencyTree.
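The read-then-parse loop described above can be sketched without the package. This is a minimal illustration under stated assumptions, not the package's implementation: `LazyBank` and `read_block` are hypothetical names, and a sentence boundary is taken to be a single blank line. Only the `read_sentence` and `parse` field names come from the description above.

```julia
# Hypothetical stand-in for a lazy treebank reader (not the package's type).
struct LazyBank{R,P}
    path::String
    read_sentence::R   # IO -> String: reads the text of one sentence
    parse::P           # String -> parsed sentence
end

# Assumed boundary rule: read lines until a blank line or end of file.
function read_block(io::IO)
    lines = String[]
    while !eof(io)
        line = readline(io)
        if isempty(strip(line))
            isempty(lines) || break   # skip leading blanks, stop at a boundary
        else
            push!(lines, line)
        end
    end
    return join(lines, "\n")
end

# The open stream is the iteration state, so only one sentence is held at a time.
# If iteration stops early, the stream stays open until it is garbage collected.
function Base.iterate(bank::LazyBank, io::IO = open(bank.path))
    text = bank.read_sentence(io)
    if isempty(text)
        close(io)
        return nothing
    end
    return bank.parse(text), io
end

# The number of sentences is unknown until the file has been read.
Base.IteratorSize(::Type{<:LazyBank}) = Base.SizeUnknown()

# Usage: a temporary file with two sentences, "parsed" into token lines.
path, io = mktemp()
write(io, "1\tEconomic\n2\tnews\n\n1\tMarkets\n")
close(io)

bank = LazyBank(path, read_block, text -> split(text, "\n"))
for sentence in bank
    println(length(sentence), " token line(s)")
end
```

Keeping the stream as the iteration state is what makes the reader lazy: nothing is read from disk until the loop asks for the next sentence.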
DependencyTrees.Treebank — Method
Treebank(file)
Read file as a treebank.
The file extension should be one of the following supported formats:
- ".conllu"
- ".conllx"
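One way to picture the extension check is a small lookup from file extension to format. The sketch below is an assumption about the general technique, not the package's code. `format_for` is a hypothetical helper, and returning a symbol is an illustrative choice.

```julia
# Hypothetical helper: choose a format name from the file extension.
function format_for(path::AbstractString)
    ext = splitext(path)[2]            # e.g. ".conllu"
    ext == ".conllu" && return :conllu
    ext == ".conllx" && return :conllx
    throw(ArgumentError("unsupported treebank extension: \"$ext\""))
end

format_for("data/news.conllu")   # :conllu
```

Under this sketch a path such as "data/news.conll" is rejected, because its extension is not in the list.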
Iterating over a treebank reads sentences one at a time:
treebank = Treebank("data/news.conll", conllu)
for tree in treebank
    # ...
end
tree = first(treebank)
# output
┌────────────── 0 ROOT
│           ┌─► 1 Economic
│        ┌─►└── 2 news
└─►┌──┌──└───── 3 had
   │  │     ┌─► 4 little
   │  └─►┌──└── 5 effect
   │  ┌──└────► 6 on
   │  │     ┌─► 7 financial
   │  └────►└── 8 markets
   └──────────► 9 .
CoNLL-U
DependencyTrees.conllu — Function
conllu(text)
Read an annotated sentence (DependencyTree) from CoNLL-U format.
conllu(tree::DependencyTree)
Serialize a dependency tree to CoNLL-U format.
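To make the two directions concrete, here is a rough sketch of reading and writing the CoNLL-U token lines themselves, independent of the package. CoNLL-U stores one token per line in ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), and lines starting with `#` are comments. `read_rows` and `write_rows` are hypothetical names, the sample annotation is made up for illustration, and the sketch builds plain named tuples rather than a DependencyTree.

```julia
const CONLLU_FIELDS = (:id, :form, :lemma, :upos, :xpos,
                       :feats, :head, :deprel, :deps, :misc)

# Hypothetical reader: one named tuple of strings per token line.
function read_rows(text::AbstractString)
    rows = NamedTuple{CONLLU_FIELDS,NTuple{10,String}}[]
    for line in split(text, '\n')
        # Skip blank lines and comment lines.
        (isempty(strip(line)) || startswith(line, "#")) && continue
        cols = split(line, '\t')
        length(cols) == 10 ||
            throw(ArgumentError("expected 10 columns, got $(length(cols))"))
        push!(rows, NamedTuple{CONLLU_FIELDS}(Tuple(String.(cols))))
    end
    return rows
end

# Hypothetical writer: tab-join each row, newline-join the rows.
write_rows(rows) = join((join(values(row), '\t') for row in rows), '\n')

# Made-up sample annotation, for illustration only.
sample = """
# text = Economic news
1\tEconomic\teconomic\tADJ\tJJ\t_\t2\tamod\t_\t_
2\tnews\tnews\tNOUN\tNN\t_\t0\troot\t_\t_
"""

rows = read_rows(sample)
println(rows[1].form, " -> head ", rows[1].head)
println(write_rows(rows))
```

Comments are dropped on reading in this sketch, so writing the rows back reproduces the token lines but not the `# text` line.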
CoNLL-X
DependencyTrees.conllx — Function
conllx(text)
Read a dependency tree from text (in CoNLL-X format).
For more details on the format, see the CoNLL-X paper [4].
conllx(tree::DependencyTree)
Serialize a dependency tree to CoNLL-X format.
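As a rough illustration of what the format carries, the sketch below writes the example sentence shown earlier as CoNLL-X lines and reads the head indices back. CoNLL-X also uses ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with `_` for unspecified values. The helper names are hypothetical, the heads are read off the tree drawing above, and every other column is left as `_`. None of this is the package's own code.

```julia
forms = ["Economic", "news", "had", "little", "effect",
         "on", "financial", "markets", "."]
heads = [2, 3, 0, 5, 3, 5, 8, 6, 3]   # 0 marks the root's dependent

# Hypothetical writer: fill only ID, FORM and HEAD, leave the rest as "_".
conllx_line(id, form, head) =
    join((id, form, "_", "_", "_", "_", head, "_", "_", "_"), '\t')

text = join((conllx_line(i, forms[i], heads[i]) for i in eachindex(forms)), '\n')

# Hypothetical reader: HEAD is the 7th column of every token line.
read_heads(text) = [parse(Int, split(line, '\t')[7]) for line in split(text, '\n')]

# Dependents of a token are the positions whose head points at it.
dependents(heads, id) = findall(==(id), heads)

println(forms[dependents(read_heads(text), 3)])   # dependents of "had"
```

The head column alone is enough to recover the arcs of the tree. The labels would come from the DEPREL column, which this sketch leaves blank.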