ConstituencyTrees.jl
ConstituencyTrees.jl is a Julia package for working with constituency trees of natural language sentences (also called parse trees or syntax trees).
Trees
ConstituencyTree{T}
Constituency parse tree of a natural language sentence.
ConstituencyTrees.@tree_str (Macro)
@tree_str(str)
String macro for reading a constituency parse tree from bracketed format.
tree"(S (NP (DT the) (N cat)) (VP (V ate)))"
ConstituencyTrees.Brackets.read_tree (Function)
read_tree(str)
read_tree(reader, string)
Read a constituency parse tree from bracketed format.
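To make the bracketed format concrete, here is a minimal, self-contained sketch of how such a string can be read into a nested structure. It does not use or reproduce the package's implementation: `Node`, `tokenize`, `parse_node`, and `sketch_read` are hypothetical names introduced only for illustration, and the sketch assumes well-formed input in which every opening bracket is followed by a label.

```julia
# Hypothetical stand-in node type; this is not the package's ConstituencyTree.
struct Node
    label::String
    children::Vector{Node}
end

# Split a bracketed string into "(", ")" and symbol tokens.
tokenize(str) = [m.match for m in eachmatch(r"\(|\)|[^\s()]+", str)]

# Recursive-descent parse starting at tokens[i]; returns (node, next index).
function parse_node(tokens, i)
    if tokens[i] == "("
        label = String(tokens[i + 1])   # the symbol right after "(" is the label
        i += 2
        children = Node[]
        while tokens[i] != ")"
            child, i = parse_node(tokens, i)
            push!(children, child)
        end
        return Node(label, children), i + 1   # skip the closing ")"
    end
    # A bare symbol is a leaf (a word).
    return Node(String(tokens[i]), Node[]), i + 1
end

# Assumes balanced brackets; malformed input raises a BoundsError.
sketch_read(str) = first(parse_node(tokenize(str), 1))

t = sketch_read("(S (NP (DT the) (N cat)) (VP (V ate)))")
```

In this sketch, `t.label` is the root category and each word ends up as a childless `Node` under its part-of-speech node.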
ConstituencyTrees.pprint (Function)
pprint([io::IO,] tree; indent=2, multiline=true)
Print a constituency parse tree in bracketed format.
Arguments
- io::IO: stream to write the tree to
- tree: the tree to print
Keywords
- multiline=true: whether to include newlines in the string representation
- indent=2: how many characters of whitespace to use in indentation (ignored if multiline is false)
Returns
- bracketed String representation of the parse tree
ConstituencyTrees.POS (Function)
POS(tree)
Iterator for part-of-speech tagged words.
julia> POS(tree"(S (NP (DT the) (N cat)) (VP (V ate)))") |> collect
3-element Array{Any,1}:
("DT", "the")
("N", "cat")
("V", "ate")
Arguments
- tree: the tree to search
Returns
- POSIterator: a lazy iterator over (POS, token) pairs
ConstituencyTrees.Words (Function)
Words(tree)
Iterator for words in a sentence.
julia> Words(tree"(S (NP (DT the) (N cat)) (VP (V ate)))") |> collect
3-element Array{Any,1}:
"the"
"cat"
"ate"
Arguments
- tree: the tree to search
Returns
- WordsIterator: an iterator over tokens
ConstituencyTrees.productions (Function)
productions(tree; search=PreOrderDFS, nonterminal=identity, terminal=identity)
Return a vector of (lhs, rhs) productions from a constituency parse tree.
Arguments
- tree: the tree to search
Keywords
- search=PreOrderDFS: the traversal order used to visit nodes
- nonterminal=identity: a function to call on nonterminal symbols
- terminal=identity: a function to call on terminal symbols
Returns
- a Vector of (lhs, rhs) tuples
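The idea of collecting (lhs, rhs) productions in pre-order can be sketched without the package as follows. This is an illustration under stated assumptions, not the package's code: `Node`, `leaf`, `sketch_productions`, and `collect_productions!` are hypothetical names, the right-hand side is represented as a vector of child symbols, and lexical productions such as DT -> the are included. Whether the package makes the same choices is not stated above.

```julia
# Hypothetical stand-in node type; this is not the package's ConstituencyTree.
struct Node
    label::String
    children::Vector{Node}
end
leaf(word) = Node(word, Node[])

# Visit nodes in pre-order and record one (lhs, rhs) pair per internal node.
function collect_productions!(out, node, nonterminal, terminal)
    isempty(node.children) && return out   # words have no production of their own
    rhs = Any[isempty(c.children) ? terminal(c.label) : nonterminal(c.label)
              for c in node.children]
    push!(out, (nonterminal(node.label), rhs))
    for c in node.children
        collect_productions!(out, c, nonterminal, terminal)
    end
    return out
end

function sketch_productions(node::Node; nonterminal=identity, terminal=identity)
    out = Tuple{Any,Vector{Any}}[]
    return collect_productions!(out, node, nonterminal, terminal)
end

# (S (NP (DT the) (N cat)) (VP (V ate))) built by hand
tree = Node("S", [
    Node("NP", [Node("DT", [leaf("the")]), Node("N", [leaf("cat")])]),
    Node("VP", [Node("V", [leaf("ate")])]),
])

for (lhs, rhs) in sketch_productions(tree; terminal=uppercase)
    println(lhs, " -> ", join(rhs, " "))
end
```

Passing `terminal=uppercase` mirrors the role of the `terminal` keyword described above: the function is applied to every word before it is placed in a right-hand side.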
Treebanks
ConstituencyTrees.Treebank (Type)
Treebank(corpus)
A Treebank is an iterator over a corpus of trees in bracketed format.
Transformations
ConstituencyTrees.chomsky_normal_form (Function)
chomsky_normal_form(tree, factor=RightFactored(), labelf)
Convert a tree into Chomsky normal form.
Arguments
- tree: the constituency tree to convert
- factor=RightFactored(): can be LeftFactored() or RightFactored()
- labelf: function (called like labelf(tree, children))
Returns
- a tree of the same type as its argument
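The binarization step behind a right-factored conversion can be sketched as follows. This is only an illustration and makes several assumptions: `Node`, `leaf`, `sketch_label`, and `right_binarize` are hypothetical names, the `A|<C-D>` label scheme is borrowed from the convention used by NLTK's tree transforms rather than taken from this package, and the sketch only splits nodes with more than two children (it does not treat unary chains).

```julia
# Hypothetical stand-in node type; this is not the package's ConstituencyTree.
struct Node
    label::String
    children::Vector{Node}
end
leaf(word) = Node(word, Node[])

# Hypothetical label function, called like labelf(node, children):
# parent label plus the labels of the children grouped under the new node.
sketch_label(node, children) =
    node.label * "|<" * join((c.label for c in children), "-") * ">"

# Right-factored binarization: A -> B C D becomes A -> B A|<C-D>, A|<C-D> -> C D.
function right_binarize(node::Node, labelf=sketch_label)
    kids = Node[right_binarize(c, labelf) for c in node.children]
    length(kids) <= 2 && return Node(node.label, kids)
    # Fold from the right: the last two children form the innermost new node.
    acc = Node(labelf(node, kids[end-1:end]), kids[end-1:end])
    for i in (length(kids) - 2):-1:2
        acc = Node(labelf(node, kids[i:end]), [kids[i], acc])
    end
    return Node(node.label, [kids[1], acc])
end

flat = Node("A", [leaf("b"), leaf("c"), leaf("d"), leaf("e")])
binary = right_binarize(flat)
```

A left-factored variant would fold from the left instead, so that the new intermediate nodes appear as the first child rather than the last.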
ConstituencyTrees.collapse_unary (Function)
collapse_unary(tree, labelf=unary_label; collapse_pos=false, collapse_root=false)
Transform a parse tree by collapsing all single-child nodes.
Arguments
- tree: parse tree to transform
- labelf: function called to create a new label representing the collapsed nodes
Keywords
- collapse_pos=false: whether to collapse (POS word) nodes
- collapse_root=false: whether to collapse the top-level root node
Returns
- a transformed tree of the same type as its argument
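A rough, self-contained sketch of unary collapsing is shown below. It is not the package's implementation: `Node`, `leaf`, `join_labels`, and `sketch_collapse` are hypothetical names, joining labels with "+" is an assumption (the behavior of `unary_label` is not described above), and the sketch treats every node alike, so it has no counterpart to `collapse_root`.

```julia
# Hypothetical stand-in node type; this is not the package's ConstituencyTree.
struct Node
    label::String
    children::Vector{Node}
end
leaf(word) = Node(word, Node[])
isleaf(n) = isempty(n.children)
# A (POS word) node: exactly one child, and that child is a word.
ispos(n) = length(n.children) == 1 && isleaf(n.children[1])

# Hypothetical label function: join the collapsed labels with "+".
join_labels(parent, child) = parent * "+" * child

function sketch_collapse(node::Node; labelf=join_labels, collapse_pos=false)
    isleaf(node) && return node
    # Merge this node with its only child while that child is an internal node.
    while length(node.children) == 1 && !isleaf(node.children[1])
        child = node.children[1]
        ispos(child) && !collapse_pos && break   # keep (POS word) nodes intact
        node = Node(labelf(node.label, child.label), child.children)
    end
    kids = Node[sketch_collapse(c; labelf=labelf, collapse_pos=collapse_pos)
                for c in node.children]
    return Node(node.label, kids)
end

chain = Node("S", [Node("VP", [Node("V", [leaf("ate")])])])
kept = sketch_collapse(chain)                       # (S+VP (V ate))
merged = sketch_collapse(chain; collapse_pos=true)  # (S+VP+V ate)
```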