I am working on representing knowledge extracted from a paragraph and ranking it to produce abstractive summaries. I have implemented dependency parsing using Stanford NLP, which produces a graph in DOT format as output.

The dependency parses of the following two sentences are shown below.

Sentence 1 - John is a computer scientist

Dot format -

digraph G{
edge [dir=forward]
node [shape=plaintext]

0 [label="0 (None)"]
0 -> 5 [label="root"]
1 [label="1 (John)"]
2 [label="2 (is)"]
3 [label="3 (a)"]
4 [label="4 (computer)"]
5 [label="5 (scientist)"]
5 -> 2 [label="cop"]
5 -> 4 [label="compound"]
5 -> 3 [label="det"]
5 -> 1 [label="nsubj"]
}

Sentence 2 - John has an elder sister named Mary.

Dot format -

digraph G{
edge [dir=forward]
node [shape=plaintext]

0 [label="0 (None)"]
0 -> 2 [label="root"]
1 [label="1 (John)"]
2 [label="2 (has)"]
2 -> 5 [label="dobj"]
2 -> 1 [label="nsubj"]
3 [label="3 (an)"]
4 [label="4 (elder)"]
5 [label="5 (sister)"]
5 -> 6 [label="acl"]
5 -> 3 [label="det"]
5 -> 4 [label="amod"]
6 [label="6 (named)"]
6 -> 7 [label="dobj"]
7 [label="7 (Mary)"]
}

Now I want to merge these two graphs at their common node, John. I am currently using graphviz to load a DOT graph like this:

from graphviz import Source
s = Source(dotGraph, filename=filepath, format="png")

But there seems to be no functionality for merging graphs in either Graphviz or NetworkX. How can this be done?

Adam Bittlingmayer
Riken Shah
  • Commissioner Gordon, turn on the Merge Signal. This is a case for the Biosyntax Squad. – jlawler Feb 04 '17 at 15:51
  • What would your expected output look like? How exactly does that help you achieve your stated purpose? – Lefty G Balogh Feb 05 '17 at 13:38
  • The goal is to have all the information related to a particular entity, in the same graph. Further, it can be ranked and used to generate summaries. – Riken Shah Feb 06 '17 at 06:21
  • I am a theoretical DG guy. I cannot comment on the computational side of what you are doing. I would, though, like to point out that the Stanford annotation scheme is controversial. For instance, your first dependency tree shows scientist as the root of the sentence. From a linguistic point of view, a stronger case can be made for viewing the finite verb is as the root. – Tim Osborne Apr 12 '20 at 15:14

1 Answer

Since you are using CoreNLP to generate the dependency trees, one good way to tackle your problem would be to use Tsurgeon, a library for manipulating parse trees.

Tsurgeon is a (parse) tree transformation language. (Also check Tregex and Semgrex on the same link.)
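
If staying in Python is preferable, the merge can also be done directly on the DOT text before handing it to graphviz. The following is only a minimal sketch using the standard library: the helper names (parse_dot, merge_parses, to_dot) are hypothetical, it assumes the DOT lines look exactly like the ones in the question, and it assumes that two tokens with the same surface word (here "John") refer to the same entity, which in general would need coreference resolution.

```python
import re

# Patterns for the two kinds of lines in the DOT output shown in the question.
NODE_RE = re.compile(r'^\s*(\d+)\s*\[label="\d+ \((.*)\)"\]\s*$')
EDGE_RE = re.compile(r'^\s*(\d+)\s*->\s*(\d+)\s*\[label="([^"]*)"\]\s*$')

def parse_dot(dot):
    """Return ({token_id: word}, [(head_id, dependent_id, relation)])."""
    words, edges = {}, []
    for line in dot.splitlines():
        node = NODE_RE.match(line)
        edge = EDGE_RE.match(line)
        if node:
            words[node.group(1)] = node.group(2)
        elif edge:
            edges.append(edge.groups())
    return words, edges

def merge_parses(dots, shared):
    """Merge parses; tokens whose word is in `shared` collapse into one node."""
    nodes, edges = {}, []
    for i, dot in enumerate(dots, start=1):
        words, sent_edges = parse_dot(dot)
        # Shared words keep a global name; all other tokens get a
        # per-sentence prefix so that ids from different parses do not clash.
        name = {t: (w if w in shared else f"s{i}_{t}") for t, w in words.items()}
        for t, w in words.items():
            nodes[name[t]] = w
        edges += [(name[h], name[d], rel) for h, d, rel in sent_edges]
    return nodes, edges

def to_dot(nodes, edges):
    """Serialise the merged graph back to DOT text."""
    lines = ["digraph G{", "edge [dir=forward]", "node [shape=plaintext]"]
    lines += [f'"{n}" [label="{w}"]' for n, w in nodes.items()]
    lines += [f'"{h}" -> "{d}" [label="{rel}"]' for h, d, rel in edges]
    lines.append("}")
    return "\n".join(lines)

S1 = '''digraph G{
0 [label="0 (None)"]
0 -> 5 [label="root"]
1 [label="1 (John)"]
2 [label="2 (is)"]
3 [label="3 (a)"]
4 [label="4 (computer)"]
5 [label="5 (scientist)"]
5 -> 2 [label="cop"]
5 -> 4 [label="compound"]
5 -> 3 [label="det"]
5 -> 1 [label="nsubj"]
}'''

S2 = '''digraph G{
0 [label="0 (None)"]
0 -> 2 [label="root"]
1 [label="1 (John)"]
2 [label="2 (has)"]
2 -> 5 [label="dobj"]
2 -> 1 [label="nsubj"]
3 [label="3 (an)"]
4 [label="4 (elder)"]
5 [label="5 (sister)"]
5 -> 6 [label="acl"]
5 -> 3 [label="det"]
5 -> 4 [label="amod"]
6 [label="6 (named)"]
6 -> 7 [label="dobj"]
7 [label="7 (Mary)"]
}'''

nodes, edges = merge_parses([S1, S2], shared={"John"})
merged_dot = to_dot(nodes, edges)
```

The resulting merged_dot string can be passed to Source(...) in the same way as dotGraph in the question. Merging only an explicit set of shared words, rather than every repeated word, keeps function words such as determiners from being collapsed by accident.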

Caxton