ConstituencyTrees.jl

ConstituencyTrees.jl is a Julia package for working with constituency trees of natural language sentences (also called parse trees or syntax trees).

Trees

ConstituencyTree{T}

Constituency parse tree of a natural language sentence.

@tree_str(str)

String macro for reading a constituency parse tree from bracketed format.

tree"(S (NP (DT the) (N cat)) (VP (V ate)))"
read_tree(str)
read_tree(reader, string)

Read a constituency parse tree from bracketed format.
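
To make the bracketed format concrete, here is a minimal sketch of a recursive-descent reader for it. It is not the package's implementation: `SketchNode`, `tokenize`, `parse_node`, and `parse_bracketed` are hypothetical names introduced only for this illustration, and the sketch does no error handling for unbalanced brackets.

```julia
# Hypothetical stand-in for a tree node; NOT the package's ConstituencyTree type.
struct SketchNode
    label::String
    children::Vector{SketchNode}
end

# Split the input into "(", ")" and label/word tokens.
tokenize(str) = [m.match for m in eachmatch(r"\(|\)|[^\s()]+", str)]

# Parse one subtree starting at tokens[i]; return the node and the next index.
function parse_node(tokens, i)
    if tokens[i] == "("
        label = String(tokens[i + 1])      # the symbol right after "("
        children = SketchNode[]
        i += 2
        while tokens[i] != ")"             # read children until the matching ")"
            child, i = parse_node(tokens, i)
            push!(children, child)
        end
        return SketchNode(label, children), i + 1
    else
        # A bare token is a word (a leaf with no children).
        return SketchNode(String(tokens[i]), SketchNode[]), i + 1
    end
end

parse_bracketed(str) = first(parse_node(tokenize(str), 1))

t = parse_bracketed("(S (NP (DT the) (N cat)) (VP (V ate)))")
```

In this sketch every opening bracket starts a node whose first token is its label, and every remaining token up to the matching closing bracket is either a nested node or a word.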

pprint([io::IO,] tree; indent=2, multiline=true)

Print a constituency parse tree in bracketed format.

Arguments

  • io: optional IO stream to write the tree to
  • tree: the tree to print

Keywords

  • multiline=true: whether to include newlines in string representation
  • indent=2: how many characters of whitespace to use in indentation (ignored if multiline is false)

Returns

  • bracketed String representation of the parse tree
POS(tree)

Iterator for part-of-speech tagged words.

julia> POS(tree"(S (NP (DT the) (N cat)) (VP (V ate)))") |> collect
3-element Array{Any,1}:
 ("DT", "the")
 ("N", "cat")
 ("V", "ate")

Arguments

  • tree: the tree to search

Returns

  • POSIterator: a lazy iterator over (POS, token) pairs
Words(tree)

Iterator for words in a sentence.

julia> Words(tree"(S (NP (DT the) (N cat)) (VP (V ate)))") |> collect
3-element Array{Any,1}:
 "the"
 "cat"
 "ate"

Arguments

  • tree: the tree to search

Returns

  • WordsIterator: an iterator over tokens
productions(tree; search=PreOrderDFS, nonterminal=identity, terminal=identity)

Return a vector of (lhs, rhs) productions from a constituency parse tree.

Arguments

  • tree: the tree to search

Keywords

  • search=PreOrderDFS: the traversal used to visit the nodes of the tree
  • nonterminal=identity: a function to call on nonterminal symbols
  • terminal=identity: a function to call on terminal symbols

Returns

  • a Vector of (lhs, rhs) tuples
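
As an illustration of what (lhs, rhs) productions look like, the following sketch collects them with a pre-order traversal of a small hand-built tree. It is an assumption-laden stand-in rather than the package's code: `SketchNode`, `leaf`, `isleaf`, `sketch_productions`, and `collect_rules!` are hypothetical names, and the sketch assumes that lexical rules such as DT → the are included in the result.

```julia
# Hypothetical stand-in for a tree node; NOT the package's ConstituencyTree type.
struct SketchNode
    label::String
    children::Vector{SketchNode}
end
leaf(word) = SketchNode(word, SketchNode[])
isleaf(n) = isempty(n.children)

# Visit a node before its children (pre-order) and record one rule per internal node.
function collect_rules!(rules, node, nonterminal, terminal)
    isleaf(node) && return rules
    rhs = Any[isleaf(c) ? terminal(c.label) : nonterminal(c.label) for c in node.children]
    push!(rules, (nonterminal(node.label), rhs))
    for c in node.children
        collect_rules!(rules, c, nonterminal, terminal)
    end
    return rules
end

function sketch_productions(node; nonterminal=identity, terminal=identity)
    return collect_rules!(Tuple{Any,Vector{Any}}[], node, nonterminal, terminal)
end

tree = SketchNode("S", [
    SketchNode("NP", [SketchNode("DT", [leaf("the")]), SketchNode("N", [leaf("cat")])]),
    SketchNode("VP", [SketchNode("V", [leaf("ate")])]),
])

rules = sketch_productions(tree)
# Rules in visiting order: S → NP VP, NP → DT N, DT → the, N → cat, VP → V, V → ate

upper = sketch_productions(tree; terminal=uppercase)
```

Passing a function as `terminal` or `nonterminal` rewrites the symbols as they are collected; here `uppercase` turns the word `the` into `THE` while leaving the category labels untouched.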

Treebanks

Treebank(corpus)

A Treebank is an iterator over a corpus of trees in bracketed format.


Transformations

chomsky_normal_form(tree, factor=RightFactored(), labelf)

Convert a tree into Chomsky normal form.

Arguments

  • tree: the constituency tree to convert
  • factor=RightFactored(): can be LeftFactored() or RightFactored()
  • labelf: labeling function, called as labelf(tree, children)

Returns

  • a tree of the same type as its argument
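
The binarization step of such a conversion can be sketched as follows. This is only an illustration under stated assumptions, not the package's algorithm: it covers binarization alone, it assumes that right factoring means keeping the first child and nesting the remaining children under a new node on the right, and `SketchNode`, `sketch_label`, `factor_right`, and `binarize_right` are hypothetical names. The `parent|<A-B>` label scheme is likewise an assumption.

```julia
# Hypothetical stand-in for a tree node; NOT the package's ConstituencyTree type.
struct SketchNode
    label::String
    children::Vector{SketchNode}
end
leaf(word) = SketchNode(word, SketchNode[])

# Hypothetical label for an introduced node, e.g. "NP|<JJ-N>".
sketch_label(parent, rest) = parent.label * "|<" * join((c.label for c in rest), "-") * ">"

# Keep the first child and push the remaining children under a new right-hand node.
function factor_right(parent, label, kids, labelf)
    length(kids) <= 2 && return SketchNode(label, kids)
    rest = kids[2:end]
    inner = factor_right(parent, labelf(parent, rest), rest, labelf)
    return SketchNode(label, [kids[1], inner])
end

# Binarize the children first, then the node itself.
function binarize_right(node; labelf=sketch_label)
    kids = SketchNode[binarize_right(c; labelf=labelf) for c in node.children]
    return factor_right(node, node.label, kids, labelf)
end

np = SketchNode("NP", [
    SketchNode("DT", [leaf("the")]),
    SketchNode("JJ", [leaf("big")]),
    SketchNode("JJ", [leaf("black")]),
    SketchNode("N", [leaf("cat")]),
])

b = binarize_right(np)
```

With these assumptions, the four-child NP becomes (NP (DT the) (NP|<JJ-JJ-N> (JJ big) (NP|<JJ-N> (JJ black) (N cat)))), so every node has at most two children.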
collapse_unary(tree, labelf=unary_label; collapse_pos=false, collapse_root=false)

Transform a parse tree by collapsing all single-child nodes.

Arguments

  • tree: parse tree to transform
  • labelf: function called to create a new label representing the collapsed nodes

Keywords

  • collapse_pos=false: whether to collapse (POS word) nodes
  • collapse_root=false: whether to collapse the top-level root node

Returns

  • a transformed tree of the same type as its argument
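
A rough sketch of how the two keywords could interact is shown below. It is an assumption about the behavior, not the package's code: `SketchNode`, `merge_chain`, and `sketch_collapse` are hypothetical names, joining labels with "+" is an assumed stand-in for whatever `unary_label` produces, and `collapse_root=false` is interpreted as leaving the root node itself unmerged.

```julia
# Hypothetical stand-in for a tree node; NOT the package's ConstituencyTree type.
struct SketchNode
    label::String
    children::Vector{SketchNode}
end
leaf(word) = SketchNode(word, SketchNode[])
isleaf(n) = isempty(n.children)
# A (POS word) node has exactly one child, and that child is a word.
ispos(n) = length(n.children) == 1 && isleaf(n.children[1])

# Merge a chain of single-child nodes into one node, then recurse into what remains.
function merge_chain(node, collapse_pos)
    label, kids = node.label, node.children
    while length(kids) == 1 && !isleaf(kids[1]) && (collapse_pos || !ispos(kids[1]))
        label = label * "+" * kids[1].label   # assumed label scheme
        kids = kids[1].children
    end
    return SketchNode(label, SketchNode[merge_chain(c, collapse_pos) for c in kids])
end

function sketch_collapse(node; collapse_pos=false, collapse_root=false)
    collapse_root && return merge_chain(node, collapse_pos)
    # Leave the root as it is and only transform the subtrees below it.
    return SketchNode(node.label, SketchNode[merge_chain(c, collapse_pos) for c in node.children])
end

tree = SketchNode("S", [
    SketchNode("NP", [SketchNode("NP", [SketchNode("N", [leaf("cats")])])]),
    SketchNode("VP", [SketchNode("V", [leaf("sleep")])]),
])

c1 = sketch_collapse(tree)
c2 = sketch_collapse(tree; collapse_pos=true)
```

Under these assumptions the default call yields (S (NP+NP (N cats)) (VP (V sleep))), while `collapse_pos=true` also folds the POS nodes into the chain, giving (S (NP+NP+N cats) (VP+V sleep)).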