ConstituencyTrees.jl
ConstituencyTrees.jl is a Julia package for working with constituency trees of natural language sentences (also called parse trees or syntax trees).
Trees
ConstituencyTree{T}
Constituency parse tree of a natural language sentence.
ConstituencyTrees.@tree_str — Macro.
@tree_str(str)
String macro for reading a constituency parse tree from bracketed format.
tree"(S (NP (DT the) (N cat)) (VP (V ate)))"
ConstituencyTrees.Brackets.read_tree — Function.
read_tree(str)
read_tree(reader, string)
Read a constituency parse tree from bracketed format.
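To illustrate what reading the bracketed format involves, here is a minimal recursive-descent reader in plain Julia. It does not use ConstituencyTrees.jl: MiniTree, bracket_tokens, parse_node and read_bracketed are hypothetical names for this sketch, and it assumes well-formed input (balanced parentheses, no parentheses inside words).

```julia
# Hypothetical minimal tree type, for illustration only (not ConstituencyTree).
struct MiniTree
    label::String
    children::Vector{MiniTree}  # empty for a leaf (a word)
end

# Split a bracketed string into "(", ")" and symbol tokens.
bracket_tokens(str::AbstractString) =
    split(replace(replace(str, "(" => " ( "), ")" => " ) "))

# Parse the subtree whose "(" is at tokens[i].
# Returns the subtree and the index just past its closing ")".
function parse_node(tokens, i::Int)
    tokens[i] == "(" || error("expected '(' at token $i")
    label = String(tokens[i + 1])
    i += 2
    children = MiniTree[]
    while tokens[i] != ")"
        if tokens[i] == "("
            child, i = parse_node(tokens, i)
        else
            child = MiniTree(String(tokens[i]), MiniTree[])  # a word (leaf)
            i += 1
        end
        push!(children, child)
    end
    return MiniTree(label, children), i + 1
end

# Read one tree from a string such as "(S (NP (DT the) (N cat)) (VP (V ate)))".
read_bracketed(str::AbstractString) = first(parse_node(bracket_tokens(str), 1))
```

Each "(" opens a node whose first token is its label; any other token is a word, stored as a leaf with no children.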
ConstituencyTrees.pprint — Function.
pprint([io::IO,] tree; indent=2, multiline=true)
Print a constituency parse tree in bracketed format.
Arguments
io::IO: stream to write the tree to (optional)
tree: the tree to print
Keywords
multiline=true: whether to include newlines in the string representation
indent=2: how many characters of whitespace to use in indentation (ignored if multiline is false)
Returns
- bracketed String representation of the parse tree
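The effect of the multiline and indent keywords can be sketched with a small printer over a hypothetical MiniTree type. The layout here is an assumption, not the package's pprint: every nonterminal child starts a new line indented by indent spaces per level, while a word stays on the same line as its part-of-speech tag. The names write_bracketed, leaf and branch are hypothetical.

```julia
# Hypothetical minimal tree type, for illustration only (not ConstituencyTree).
struct MiniTree
    label::String
    children::Vector{MiniTree}  # empty for a leaf (a word)
end

leaf(word) = MiniTree(word, MiniTree[])
branch(label, kids...) = MiniTree(label, MiniTree[kids...])
isleaf(t::MiniTree) = isempty(t.children)

# Write `t` in bracketed format. With multiline=true, every nonterminal child
# starts a new line indented by `indent` spaces per level; words stay inline.
function write_bracketed(io::IO, t::MiniTree; indent=2, multiline=true, depth=0)
    if isleaf(t)
        print(io, t.label)
        return nothing
    end
    print(io, "(", t.label)
    for c in t.children
        if multiline && !isleaf(c)
            print(io, "\n", " "^(indent * (depth + 1)))
        else
            print(io, " ")
        end
        write_bracketed(io, c; indent=indent, multiline=multiline, depth=depth + 1)
    end
    print(io, ")")
    return nothing
end

# Convenience form that returns the bracketed text as a String.
write_bracketed(t::MiniTree; kw...) = sprint(io -> write_bracketed(io, t; kw...))

t = branch("S", branch("NP", branch("DT", leaf("the")), branch("N", leaf("cat"))),
                branch("VP", branch("V", leaf("ate"))))
write_bracketed(t; multiline=false)  # "(S (NP (DT the) (N cat)) (VP (V ate)))"
print(write_bracketed(t))
```

With multiline=false the output is the single-line bracketed string; with the defaults, each of NP, DT, N, VP and V begins its own indented line.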
ConstituencyTrees.POS — Function.
POS(tree)
Iterator for part-of-speech tagged words.
julia> POS(tree"(S (NP (DT the) (N cat)) (VP (V ate)))") |> collect
3-element Array{Any,1}:
("DT", "the")
("N", "cat")
("V", "ate")
Arguments
tree: the tree to search
Returns
POSIterator: a lazy iterator over (POS, token) pairs
ConstituencyTrees.Words — Function.
Words(tree)
Iterator for words in a sentence.
julia> Words(tree"(S (NP (DT the) (N cat)) (VP (V ate)))") |> collect
3-element Array{Any,1}:
"the"
"cat"
"ate"
Arguments
tree: the tree to search
Returns
WordsIterator: an iterator over tokens
ConstituencyTrees.productions — Function.
productions(tree; search=PreOrderDFS, nonterminal=identity, terminal=identity)
Return a vector of (lhs, rhs) productions from a constituency parse tree.
Arguments
tree: the tree to search
Keywords
search=PreOrderDFS: the tree traversal used to visit nodes
nonterminal=identity: a function to call on nonterminal symbols
terminal=identity: a function to call on terminal symbols
Returns
- a Vector of (lhs, rhs) tuples
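As a rough model of what productions computes, the sketch below walks a hypothetical MiniTree in pre-order and emits one (lhs, rhs) pair per internal node, applying nonterminal and terminal to the symbols. The name collect_productions and the choice of a tuple for rhs are assumptions; the package's own element types may differ.

```julia
# Hypothetical minimal tree type, for illustration only (not ConstituencyTree).
struct MiniTree
    label::String
    children::Vector{MiniTree}  # empty for a leaf (a word)
end

leaf(word) = MiniTree(word, MiniTree[])
branch(label, kids...) = MiniTree(label, MiniTree[kids...])

# Collect (lhs, rhs) productions in pre-order (parent before children).
# Each internal node yields one production; rhs is a tuple of child labels,
# where leaf children (words) are terminals and all others are nonterminals.
function collect_productions(t::MiniTree; nonterminal=identity, terminal=identity)
    out = Tuple{Any,Tuple}[]
    symbol(c) = isempty(c.children) ? terminal(c.label) : nonterminal(c.label)
    function visit(n::MiniTree)
        isempty(n.children) && return
        push!(out, (nonterminal(n.label), Tuple(map(symbol, n.children))))
        foreach(visit, n.children)
    end
    visit(t)
    return out
end

t = branch("S", branch("NP", branch("DT", leaf("the")), branch("N", leaf("cat"))),
                branch("VP", branch("V", leaf("ate"))))
collect_productions(t)[1]  # ("S", ("NP", "VP"))
```

For this tree the sketch yields six productions, from S → NP VP down to V → ate.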
Treebanks
ConstituencyTrees.Treebank — Type.
Treebank(corpus)
A Treebank is an iterator over a corpus of trees in bracketed format.
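A treebank file usually holds many bracketed trees one after another. One simple way to divide such a corpus into individual trees is to track parenthesis depth. The sketch below is an assumption about the general technique, not the package's Treebank: the helper split_bracketed is hypothetical, and it returns a Vector rather than iterating lazily.

```julia
# Split a corpus holding several bracketed trees into one string per tree by
# tracking parenthesis depth. Assumes balanced parentheses and that "(" and ")"
# never occur inside words.
function split_bracketed(corpus::AbstractString)
    trees = String[]
    depth = 0
    start = 0
    for i in eachindex(corpus)
        c = corpus[i]
        if c == '('
            depth == 0 && (start = i)  # a new top-level tree begins here
            depth += 1
        elseif c == ')'
            depth -= 1
            depth == 0 && push!(trees, String(corpus[start:i]))
        end
    end
    return trees
end
```

Each resulting string is a single bracketed tree that can then be passed to read_tree.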
Transformations
ConstituencyTrees.chomsky_normal_form — Function.
chomsky_normal_form(tree, factor=RightFactored(), labelf)
Convert a tree into Chomsky normal form.
Arguments
tree: the constituency tree to convert
factor=RightFactored(): can be LeftFactored() or RightFactored()
labelf: function that builds the labels of the new nodes introduced by factoring, called like labelf(tree, children)
Returns
- a tree of the same type as its argument
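With right factoring, a node with more than two children is binarized by introducing intermediate nodes whose labels come from labelf. The sketch below shows this on a hypothetical MiniTree. The A|<C-D> label scheme in factored_label is modeled on NLTK's convention and is an assumption, not necessarily what the package produces; only right factoring is shown, and unary nodes are left alone.

```julia
# Hypothetical minimal tree type, for illustration only (not ConstituencyTree).
struct MiniTree
    label::String
    children::Vector{MiniTree}  # empty for a leaf (a word)
end

leaf(word) = MiniTree(word, MiniTree[])
branch(label, kids...) = MiniTree(label, MiniTree[kids...])

# Hypothetical label function in the style labelf(tree, children):
# the new node for children C, D under parent A is labeled "A|<C-D>".
factored_label(tree, children) =
    string(tree.label, "|<", join([c.label for c in children], "-"), ">")

# Right factoring: A -> B C D becomes A -> B A|<C-D> and A|<C-D> -> C D.
function factor_right(parent::MiniTree, kids::Vector{MiniTree}, labelf)
    length(kids) <= 2 && return kids
    rest = kids[2:end]
    inner = MiniTree(labelf(parent, rest), factor_right(parent, rest, labelf))
    return MiniTree[kids[1], inner]
end

# Binarize every node with more than two children, bottom-up.
# Unary nodes are left unchanged.
function binarize_right(t::MiniTree; labelf=factored_label)
    isempty(t.children) && return t
    kids = MiniTree[binarize_right(c; labelf=labelf) for c in t.children]
    return MiniTree(t.label, factor_right(t, kids, labelf))
end
```

For (NP (DT the) (JJ big) (JJ black) (N cat)) the sketch yields NP → DT NP|<JJ-JJ-N>, then NP|<JJ-JJ-N> → JJ NP|<JJ-N>, then NP|<JJ-N> → JJ N.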
ConstituencyTrees.collapse_unary — Function.
collapse_unary(tree, labelf=unary_label; collapse_pos=false, collapse_root=false)
Transform a parse tree by collapsing all single-child nodes.
Arguments
tree: parse tree to transform
labelf=unary_label: function called to create a new label representing the collapsed nodes
Keywords
collapse_pos=false: whether to collapse (POS word) nodes
collapse_root=false: whether to collapse the top-level root node
Returns
- a transformed tree of the same type as its argument
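A sketch of unary collapsing on a hypothetical MiniTree: a node with a single nonterminal child is merged with that child, and the merged label comes from labelf. The A+B format of join_labels is an assumption, and the handling of collapse_pos and collapse_root follows the keyword descriptions above rather than the package's code; collapse_unary_sketch is a hypothetical name.

```julia
# Hypothetical minimal tree type, for illustration only (not ConstituencyTree).
struct MiniTree
    label::String
    children::Vector{MiniTree}  # empty for a leaf (a word)
end

leaf(word) = MiniTree(word, MiniTree[])
branch(label, kids...) = MiniTree(label, MiniTree[kids...])

# A preterminal (POS) node has exactly one child, and that child is a word.
ispos(t::MiniTree) = length(t.children) == 1 && isempty(t.children[1].children)

# Hypothetical label function: merge parent and child labels as "A+B".
join_labels(parent::MiniTree, child::MiniTree) = parent.label * "+" * child.label

function collapse_unary_sketch(t::MiniTree, labelf=join_labels;
                               collapse_pos=false, collapse_root=false)
    function go(n::MiniTree, isroot::Bool)
        isempty(n.children) && return n
        if !isroot || collapse_root
            # Repeatedly merge a single nonterminal child into this node.
            # Unless collapse_pos is set, stop above a (POS word) child.
            while length(n.children) == 1 && !isempty(n.children[1].children) &&
                  (collapse_pos || !ispos(n.children[1]))
                child = n.children[1]
                n = MiniTree(labelf(n, child), child.children)
            end
        end
        return MiniTree(n.label, MiniTree[go(c, false) for c in n.children])
    end
    return go(t, true)
end
```

On (S (NP (NP (DT the) (N cat))) (VP (V ate))) the default settings merge the NP chain into NP+NP but keep VP above its (V ate) node; with collapse_pos=true that node becomes VP+V as well.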