Tutorial: Using GraphTransliterator

Note

Python code on this page: tutorial.py. Jupyter Notebook: tutorial.ipynb.

Graph Transliterator is designed to allow you to quickly develop rules for transliterating between languages and scripts. In this tutorial you will use a portion of Graph Transliterator's features, including its token matching, class-based matching, and on-match rules, using the GraphTransliterator class.

Tutorial Overview

The task for this tutorial will be to design a transliterator between the ITRANS (Indian languages TRANSliteration) encoding for Devanagari (Hindi) and standard Unicode. ITRANS was developed as a means to transliterate Indic-language text using the Latin alphabet and punctuation marks before there were Unicode fonts.

The Devanagari alphabet is an abugida (alphasyllabary), where each “syllable” is a separate symbol. Vowels, except for the default अ (“a”), are written as a vowel sign that attaches to a consonant. At the start of a word, they instead have a distinct, independent shape. Consonants in sequence, without intermediary vowels, change their shape and are joined together. In Unicode, that is accomplished by placing the Virama character between them.
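To make these pieces concrete, the following minimal sketch builds them by Unicode character name using only Python's standard unicodedata module. The variable names are illustrative and are not part of Graph Transliterator.

```python
import unicodedata

ka = unicodedata.lookup("DEVANAGARI LETTER KA")            # consonant with inherent "a"
ya = unicodedata.lookup("DEVANAGARI LETTER YA")            # another consonant
virama = unicodedata.lookup("DEVANAGARI SIGN VIRAMA")      # suppresses the inherent vowel
aa_letter = unicodedata.lookup("DEVANAGARI LETTER AA")     # independent (word-initial) vowel
aa_sign = unicodedata.lookup("DEVANAGARI VOWEL SIGN AA")   # dependent vowel sign

# "kyA": the virama joins k and y into a cluster, then the vowel sign attaches.
kyaa = ka + virama + ya + aa_sign
print(kyaa)
print([unicodedata.name(char) for char in kyaa])
```

The cluster is four code points long even though it is rendered as a single syllable.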

Graph Transliterator works by first converting the input text into a series of tokens. In this tutorial you will define the tokens of ITRANS and the token classes needed to write rules for conversion.
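As a rough illustration of what tokenizing means here, the sketch below splits ITRANS input with a greedy longest-match strategy. It is an assumption-laden stand-in, not Graph Transliterator's own tokenizer: the function name tokenize, its parameters, and the small token list are all hypothetical. The padding with the default whitespace token mirrors the behavior this tutorial describes in its whitespace settings.

```python
import re

def tokenize(text, tokens, whitespace=" "):
    """Greedy longest-match tokenizer (illustrative only)."""
    # Try longer tokens first so that "kh" wins over "k" followed by "h".
    alternatives = sorted(tokens, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(token) for token in alternatives))
    result, pos = [], 0
    while pos < len(text):
        match = pattern.match(text, pos)
        if match is None:
            raise ValueError(f"unrecognizable input at position {pos}: {text[pos]!r}")
        result.append(match.group())
        pos = match.end()
    # Pad with the default whitespace token at the start and end.
    return [whitespace] + result + [whitespace]

itrans_tokens = ["k", "kh", "a", "aa", "A", "y", " "]
print(tokenize("khaa kyA", itrans_tokens))
```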

Graph Transliterator allows rule matching by preceding tokens, tokens, and following tokens. It allows token classes to precede or follow any specific tokens. For this task, you will use a preceding token class to identify when to write vowel signs as opposed to full vowel characters.

Graph Transliterator also allows the insertion of strings between matches involving particular token classes. This transliterator will need to insert the virama character between transliteration rules ending with consonants in order to create consonant clusters.

Configuring

Here you will parameterize the Graph Transliterator using its “easy reading” format, which uses YAML. It maps to a dictionary containing up to five keys: tokens, rules, onmatch_rules (optional), whitespace, and metadata (optional).
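The shape of that dictionary can be sketched in plain Python. The helper check_top_level_keys below is hypothetical and only checks the five top-level keys named above; it says nothing about how Graph Transliterator itself validates settings.

```python
REQUIRED_KEYS = {"tokens", "rules", "whitespace"}
OPTIONAL_KEYS = {"onmatch_rules", "metadata"}

def check_top_level_keys(settings):
    """Return (missing, unexpected) sets of top-level keys."""
    keys = set(settings)
    missing = REQUIRED_KEYS - keys
    unexpected = keys - REQUIRED_KEYS - OPTIONAL_KEYS
    return missing, unexpected

# A tiny settings dictionary with only the required keys.
settings = {
    "tokens": {"k": ["consonant"], "A": ["vowel"], " ": ["wb", "whitespace"]},
    "rules": {
        "k": "\N{DEVANAGARI LETTER KA}",
        "A": "\N{DEVANAGARI LETTER AA}",
        " ": " ",
    },
    "whitespace": {"default": " ", "consolidate": False, "token_class": "whitespace"},
}
print(check_top_level_keys(settings))
```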

Token Definitions

Graph Transliterator tokenizes its input before transliterating. The tokens section will map the input tokens to their token classes. The main class you will need is one for consonants, so you can use consonant as the class. Graph Transliterator also requires a dedicated whitespace class, so you can use whitespace.

Graph Transliterator allows the use of Unicode character names in files using \N{UNICODE CHARACTER NAME HERE} notation. You can enter the Unicode characters using that notation or directly. YAML will also unescape \u####, where #### is the hexadecimal notation for a character.

Here is a subsection of that definition:

tokens:
  k: [consonant]
  kh: [consonant]
  "\N{LATIN SMALL LETTER N WITH DOT ABOVE}": [consonant]
  a: [vowel]
  aa: [vowel]
  A: [vowel]
  ' ': [wb,whitespace]
  "\t": [wb,whitespace]
  .N: [vowel_sign]
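The \N{...} notation used in the definition above follows the same naming scheme as Python string literals. The sketch below shows one way such notation could be expanded in text read from a file, using the standard re and unicodedata modules. The function expand_charnames is hypothetical, and the library's own handling may differ.

```python
import re
import unicodedata

def expand_charnames(text):
    r"""Replace each \N{NAME} in text with the named Unicode character."""
    return re.sub(
        r"\\N\{([^}]+)\}",
        lambda match: unicodedata.lookup(match.group(1)),
        text,
    )

# A raw string keeps the notation as literal text, as it would appear in a file.
raw = r"\N{DEVANAGARI LETTER KA}\N{DEVANAGARI SIGN VIRAMA}"
expanded = expand_charnames(raw)
print([f"U+{ord(char):04X}" for char in expanded])
```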

Transliteration Rule Definitions

Rule definitions in the “easy reading” format are also a dictionary, where each rule is a key and its production (the string the rule outputs) is the value. For this task, you just need to match individual tokens and also any preceding token classes:

rules:
  b: \N{DEVANAGARI LETTER BA}
  <consonant> A: \N{DEVANAGARI VOWEL SIGN AA}
  A: \N{DEVANAGARI LETTER AA}

These rules will replace “b” with its Devanagari equivalent (ब), and “A” with the vowel sign ा when it follows a consonant, or otherwise with the full letter आ, as at the start of a word (following a token of class “wb”, for word break). Graph Transliterator automatically sorts rules by how many tokens are required for them to be matched, and it picks the one that requires the most tokens. So the rule for an “A” following a consonant would be tried before the rule for an “A” after any other token. Graph Transliterator will also check for ambiguity in these rules, unless check_ambiguity is set to False.
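The idea of preferring the rule that requires the most tokens can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the library's graph-based matcher: rules are limited to one optional preceding class and one token, and the names RULES, TOKEN_CLASSES, specificity, and pick_rule are hypothetical.

```python
# Each rule: (required preceding class or None, token, production).
RULES = [
    (None, "A", "\N{DEVANAGARI LETTER AA}"),
    ("consonant", "A", "\N{DEVANAGARI VOWEL SIGN AA}"),
    (None, "b", "\N{DEVANAGARI LETTER BA}"),
]
TOKEN_CLASSES = {"b": {"consonant"}, "A": {"vowel"}, " ": {"wb", "whitespace"}}

def specificity(rule):
    """Count what the rule requires: its token plus an optional preceding class."""
    prev_class, _token, _production = rule
    return 1 + (prev_class is not None)

def pick_rule(prev_token, token):
    """Return the production of the most specific rule that matches."""
    for prev_class, rule_token, production in sorted(RULES, key=specificity, reverse=True):
        if rule_token != token:
            continue
        if prev_class is None or prev_class in TOKEN_CLASSES.get(prev_token, set()):
            return production
    raise ValueError(f"no rule matches token {token!r}")

print(pick_rule("b", "A") == "\N{DEVANAGARI VOWEL SIGN AA}")   # after a consonant
print(pick_rule(" ", "A") == "\N{DEVANAGARI LETTER AA}")       # after a word break
```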

While not necessary for this tutorial, Graph Transliterator can also require matching of specific previous or following tokens and also classes preceding and following those tokens, e.g.

k a r (U M g A <wb>): k,a,r_followed_by_U,M,g,A_and_a_wordbreak
s o (n a): s,o_followed_by_n,a
(<wb> p y) aa r: aa,r_preceded_by_a_wordbreak,p,and_y
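To show how the parts of such rule keys relate, here is a hypothetical parser for the forms used in this tutorial. It assumes that parentheses enclose the preceding or following context, that classes are written in angle brackets, and that no token contains parentheses or whitespace. It is a reading aid only and does not reproduce the library's actual grammar.

```python
import re

RULE_KEY = re.compile(
    r"^\s*(?:\((?P<prev>[^)]*)\)\s*)?"    # optional "(...)" before the tokens
    r"(?P<body>[^()]+?)\s*"               # the tokens to be matched
    r"(?:\((?P<next>[^)]*)\))?\s*$"       # optional "(...)" after the tokens
)

def is_class(item):
    return item.startswith("<") and item.endswith(">")

def parse_rule_key(key):
    match = RULE_KEY.match(key)
    if match is None:
        raise ValueError(f"cannot parse rule key: {key!r}")
    prev = (match.group("prev") or "").split()
    body = match.group("body").split()
    following = (match.group("next") or "").split()
    # Classes written outside parentheses, as in "<consonant> A".
    while body and is_class(body[0]):
        prev.append(body.pop(0))
    while body and is_class(body[-1]):
        following.insert(0, body.pop())
    return {
        "prev_classes": [item[1:-1] for item in prev if is_class(item)],
        "prev_tokens": [item for item in prev if not is_class(item)],
        "tokens": body,
        "next_tokens": [item for item in following if not is_class(item)],
        "next_classes": [item[1:-1] for item in following if is_class(item)],
    }

print(parse_rule_key("(<wb> p y) aa r"))
print(parse_rule_key("k a r (U M g A <wb>)"))
```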

Here is a subsection of the rules:

rules:
  "\t": "\t"
  ' ': ' '
  ',': ','
  .D: "\N{DEVANAGARI LETTER DDDHA}"
  <consonant> A: "\N{DEVANAGARI VOWEL SIGN AA}"
  "\N{LATIN SMALL LETTER N WITH DOT ABOVE}": "\N{DEVANAGARI LETTER NGA}"

On Match Rule Definitions

You will want to insert the Virama character between consonants so that they will join together in Unicode output. To do so, add an “onmatch_rules” section:

onmatch_rules:
  - <consonant> + <consonant>: "\N{DEVANAGARI SIGN VIRAMA}"

Unlike the tokens and rules, the onmatch rules are ordered: the first rule matched is applied. In YAML, they consist of a list of dictionaries, each with a single key and value. The key gives the token classes that end the previous match and start the current match, separated by ` + `, which marks the point between the two matches. The value is the production string to be inserted at that point. So in the input string kyA, which would tokenize as [' ','k','y','A',' '], a virama character would be inserted when y is matched, as it is of class “consonant” and the previously matched transliteration rule, for “k”, ends with a “consonant”.
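The effect of this on-match rule on kyA can be traced with a short sketch. It assumes single-token matches whose class is already known, and the names matches and apply_onmatch are hypothetical. The library applies on-match rules inside its own matching loop, which is not reproduced here.

```python
VIRAMA = "\N{DEVANAGARI SIGN VIRAMA}"

# (matched token, its class, production) for the input "kyA".
matches = [
    ("k", "consonant", "\N{DEVANAGARI LETTER KA}"),
    ("y", "consonant", "\N{DEVANAGARI LETTER YA}"),
    ("A", "vowel", "\N{DEVANAGARI VOWEL SIGN AA}"),
]

def apply_onmatch(matches, prev_class="consonant", next_class="consonant", insert=VIRAMA):
    """Join productions, inserting a string between matches of the given classes."""
    output = []
    previous = None
    for _token, token_class, production in matches:
        if previous == prev_class and token_class == next_class:
            output.append(insert)   # goes in before the current production
        output.append(production)
        previous = token_class
    return "".join(output)

result = apply_onmatch(matches)
print(result)
print([f"U+{ord(char):04X}" for char in result])
```

The virama lands between the productions for k and y only; the vowel sign for A follows the consonant directly.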

Whitespace Definitions

The final required setup parameter is for whitespace. These include the default whitespace token, which is temporarily added before and after the input tokens; the consolidate option to replace sequential whitespace characters with a single default whitespace character; and the token_class of whitespace tokens:

whitespace:
  consolidate: false
  default: ' '
  token_class: whitespace
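The following sketch illustrates what these three settings control, working on a list of tokens. It is a guess at the behavior described above rather than the library's code: the function prepare_tokens is hypothetical, and treating a lone non-default whitespace token as a run to be normalized is an assumption of this sketch.

```python
from itertools import groupby

def prepare_tokens(tokens, whitespace_tokens, default=" ", consolidate=False):
    """Optionally collapse whitespace runs, then pad with the default token."""
    prepared = []
    for is_whitespace, run in groupby(tokens, key=lambda token: token in whitespace_tokens):
        if is_whitespace and consolidate:
            prepared.append(default)   # a whole run becomes one default token
        else:
            prepared.extend(run)
    # The default whitespace token is added before and after the input tokens.
    return [default] + prepared + [default]

whitespace_tokens = {" ", "\t"}          # tokens of class "whitespace"
tokens = ["k", " ", "\t", "A"]
print(prepare_tokens(tokens, whitespace_tokens, consolidate=False))
print(prepare_tokens(tokens, whitespace_tokens, consolidate=True))
```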

Metadata Definitions

Graph Transliterator also allows metadata to be added to its settings:

metadata:
  title: "ITRANS Devanagari to Unicode"
  version: "0.1.0"

Creating a Transliterator

Now that the settings are ready, you can create a Graph Transliterator. Since you have been using the “easy reading” format, you can use GraphTransliterator.from_yaml_file() to read from a specific file or GraphTransliterator.from_yaml() to read from a YAML string. You can also read from the loaded contents of an “easy reading” YAML file using GraphTransliterator.from_dict(). Graph Transliterator will convert those settings into basic Python types and then return a GraphTransliterator:

from graphtransliterator import GraphTransliterator
easyreading_yaml = """
tokens:
  k: [consonant]
  kh: [consonant]
  g: [consonant]
  gh: [consonant]
  ~N: [consonant]
  "\N{LATIN SMALL LETTER N WITH DOT ABOVE}": [consonant]
  ch: [consonant]
  chh: [consonant]
  Ch: [consonant]
  j: [consonant]
  jh: [consonant]
  ~n: [consonant]
  T: [consonant]
  Th: [consonant]
  D: [consonant]
  Dh: [consonant]
  N: [consonant]
  t: [consonant]
  th: [consonant]
  d: [consonant]
  dh: [consonant]
  n: [consonant]
  ^n: [consonant]
  p: [consonant]
  ph: [consonant]
  b: [consonant]
  bh: [consonant]
  m: [consonant]
  y: [consonant]
  r: [consonant]
  R: [consonant]
  l: [consonant]
  ld: [consonant]
  L: [consonant]
  zh: [consonant]
  v: [consonant]
  sh: [consonant]
  Sh: [consonant]
  s: [consonant]
  h: [consonant]
  x: [consonant]
  kSh: [consonant]
  GY: [consonant]
  j~n: [consonant]
  dny: [consonant]
  q: [consonant]
  K: [consonant]
  G: [consonant]
  J: [consonant]
  z: [consonant]
  .D: [consonant]
  .Dh: [consonant]
  f: [consonant]
  Y: [consonant]
  a: [vowel]
  aa: [vowel]
  A: [vowel]
  i: [vowel]
  ii: [vowel]
  I: [vowel]
  ee: [vowel]
  u: [vowel]
  uu: [vowel]
  U: [vowel]
  RRi: [vowel]
  R^i: [vowel]
  LLi: [vowel]
  L^i: [vowel]
  RRI: [vowel]
  LLI: [vowel]
  a.c: [vowel]
  ^e: [vowel]
  e: [vowel]
  ai: [vowel]
  A.c: [vowel]
  ^o: [vowel]
  o: [vowel]
  au: [vowel]
  ' ': [wb,whitespace]
  "\t": [wb,whitespace]
  ',': [wb]
  .h: [wb]
  H: [wb]
  OM: [wb]
  AUM: [wb]
  '|': [wb]
  '||': [wb]
  '0': [wb]
  '1': [wb]
  '2': [wb]
  '3': [wb]
  '4': [wb]
  '5': [wb]
  '6': [wb]
  '7': [wb]
  '8': [wb]
  '9': [wb]
  Rs.: [wb]
  ~Rs.: [wb]
  .a: [wb]
  a.e: [vowel_sign]
  .N: [vowel_sign]
  .n: [vowel_sign]
  M: [vowel_sign]
  .m: [vowel_sign]
rules:
  "\t": "\t"
  ' ': ' '
  ',': ','
  .D: "\N{DEVANAGARI LETTER DDDHA}"
  .Dh: "\N{DEVANAGARI LETTER RHA}"
  .N: "\N{DEVANAGARI SIGN CANDRABINDU}"
  .a: "\N{DEVANAGARI SIGN AVAGRAHA}"
  .h: "\N{DEVANAGARI SIGN VIRAMA}\N{ZERO WIDTH NON-JOINER}"
  .m: "\N{DEVANAGARI SIGN ANUSVARA}"
  .n: "\N{DEVANAGARI SIGN ANUSVARA}"
  '0': "\N{DEVANAGARI DIGIT ZERO}"
  '1': "\N{DEVANAGARI DIGIT ONE}"
  '2': "\N{DEVANAGARI DIGIT TWO}"
  '3': "\N{DEVANAGARI DIGIT THREE}"
  '4': "\N{DEVANAGARI DIGIT FOUR}"
  '5': "\N{DEVANAGARI DIGIT FIVE}"
  '6': "\N{DEVANAGARI DIGIT SIX}"
  '7': "\N{DEVANAGARI DIGIT SEVEN}"
  '8': "\N{DEVANAGARI DIGIT EIGHT}"
  '9': "\N{DEVANAGARI DIGIT NINE}"
  <consonant> A: "\N{DEVANAGARI VOWEL SIGN AA}"
  <consonant> A.c: "\N{DEVANAGARI VOWEL SIGN CANDRA O}"
  <consonant> I: "\N{DEVANAGARI VOWEL SIGN II}"
  <consonant> LLI: "\N{DEVANAGARI VOWEL SIGN VOCALIC LL}"
  <consonant> LLi: "\N{DEVANAGARI VOWEL SIGN VOCALIC L}"
  <consonant> L^i: "\N{DEVANAGARI VOWEL SIGN VOCALIC L}"
  <consonant> RRI: "\N{DEVANAGARI VOWEL SIGN VOCALIC RR}"
  <consonant> RRi: "\N{DEVANAGARI VOWEL SIGN VOCALIC R}"
  <consonant> R^i: "\N{DEVANAGARI VOWEL SIGN VOCALIC R}"
  <consonant> U: "\N{DEVANAGARI VOWEL SIGN UU}"
  <consonant> ^e: "\N{DEVANAGARI VOWEL SIGN SHORT E}"
  <consonant> ^o: "\N{DEVANAGARI VOWEL SIGN SHORT O}"
  <consonant> a: ''
  <consonant> a.c: "\N{DEVANAGARI VOWEL SIGN CANDRA E}"
  <consonant> aa: "\N{DEVANAGARI VOWEL SIGN AA}"
  <consonant> ai: "\N{DEVANAGARI VOWEL SIGN AI}"
  <consonant> au: "\N{DEVANAGARI VOWEL SIGN AU}"
  <consonant> e: "\N{DEVANAGARI VOWEL SIGN E}"
  <consonant> ee: "\N{DEVANAGARI VOWEL SIGN II}"
  <consonant> i: "\N{DEVANAGARI VOWEL SIGN I}"
  <consonant> ii: "\N{DEVANAGARI VOWEL SIGN II}"
  <consonant> o: "\N{DEVANAGARI VOWEL SIGN O}"
  <consonant> u: "\N{DEVANAGARI VOWEL SIGN U}"
  <consonant> uu: "\N{DEVANAGARI VOWEL SIGN UU}"
  A: "\N{DEVANAGARI LETTER AA}"
  A.c: "\N{DEVANAGARI LETTER CANDRA O}"
  AUM: "\N{DEVANAGARI OM}"
  Ch: "\N{DEVANAGARI LETTER CHA}"
  D: "\N{DEVANAGARI LETTER DDA}"
  Dh: "\N{DEVANAGARI LETTER DDHA}"
  G: "\N{DEVANAGARI LETTER GHHA}"
  GY: "\N{DEVANAGARI LETTER JA}\N{DEVANAGARI SIGN VIRAMA}\N{DEVANAGARI LETTER NYA}"
  H: "\N{DEVANAGARI SIGN VISARGA}"
  I: "\N{DEVANAGARI LETTER II}"
  J: "\N{DEVANAGARI LETTER ZA}"
  K: "\N{DEVANAGARI LETTER KHHA}"
  L: "\N{DEVANAGARI LETTER LLA}"
  LLI: "\N{DEVANAGARI LETTER VOCALIC LL}"
  LLi: "\N{DEVANAGARI LETTER VOCALIC L}"
  L^i: "\N{DEVANAGARI LETTER VOCALIC L}"
  M: "\N{DEVANAGARI SIGN ANUSVARA}"
  N: "\N{DEVANAGARI LETTER NNA}"
  OM: "\N{DEVANAGARI OM}"
  R: "\N{DEVANAGARI LETTER RRA}"
  RRI: "\N{DEVANAGARI LETTER VOCALIC RR}"
  RRi: "\N{DEVANAGARI LETTER VOCALIC R}"
  R^i: "\N{DEVANAGARI LETTER VOCALIC R}"
  Rs.: "\N{INDIAN RUPEE SIGN}"
  Sh: "\N{DEVANAGARI LETTER SSA}"
  T: "\N{DEVANAGARI LETTER TTA}"
  Th: "\N{DEVANAGARI LETTER TTHA}"
  U: "\N{DEVANAGARI LETTER UU}"
  Y: "\N{DEVANAGARI LETTER YYA}"
  ^e: "\N{DEVANAGARI LETTER SHORT E}"
  ^n: "\N{DEVANAGARI LETTER NNNA}"
  ^o: "\N{DEVANAGARI LETTER SHORT O}"
  a: "\N{DEVANAGARI LETTER A}"
  a.c: "\N{DEVANAGARI LETTER CANDRA E}"
  a.e: "\N{DEVANAGARI LETTER CANDRA A}"
  aa: "\N{DEVANAGARI LETTER AA}"
  ai: "\N{DEVANAGARI LETTER AI}"
  au: "\N{DEVANAGARI LETTER AU}"
  b: "\N{DEVANAGARI LETTER BA}"
  bh: "\N{DEVANAGARI LETTER BHA}"
  ch: "\N{DEVANAGARI LETTER CA}"
  chh: "\N{DEVANAGARI LETTER CHA}"
  d: "\N{DEVANAGARI LETTER DA}"
  dh: "\N{DEVANAGARI LETTER DHA}"
  dny: "\N{DEVANAGARI LETTER JA}\N{DEVANAGARI SIGN VIRAMA}\N{DEVANAGARI LETTER NYA}"
  e: "\N{DEVANAGARI LETTER E}"
  ee: "\N{DEVANAGARI LETTER II}"
  f: "\N{DEVANAGARI LETTER FA}"
  g: "\N{DEVANAGARI LETTER GA}"
  gh: "\N{DEVANAGARI LETTER GHA}"
  h: "\N{DEVANAGARI LETTER HA}"
  i: "\N{DEVANAGARI LETTER I}"
  ii: "\N{DEVANAGARI LETTER II}"
  j: "\N{DEVANAGARI LETTER JA}"
  jh: "\N{DEVANAGARI LETTER JHA}"
  j~n: "\N{DEVANAGARI LETTER JA}\N{DEVANAGARI SIGN VIRAMA}\N{DEVANAGARI LETTER NYA}"
  k: "\N{DEVANAGARI LETTER KA}"
  kSh: "\N{DEVANAGARI LETTER KA}\N{DEVANAGARI SIGN VIRAMA}\N{DEVANAGARI LETTER SSA}"
  kh: "\N{DEVANAGARI LETTER KHA}"
  l: "\N{DEVANAGARI LETTER LA}"
  ld: "\N{DEVANAGARI LETTER LLA}"
  m: "\N{DEVANAGARI LETTER MA}"
  n: "\N{DEVANAGARI LETTER NA}"
  o: "\N{DEVANAGARI LETTER O}"
  p: "\N{DEVANAGARI LETTER PA}"
  ph: "\N{DEVANAGARI LETTER PHA}"
  q: "\N{DEVANAGARI LETTER QA}"
  r: "\N{DEVANAGARI LETTER RA}"
  s: "\N{DEVANAGARI LETTER SA}"
  sh: "\N{DEVANAGARI LETTER SHA}"
  t: "\N{DEVANAGARI LETTER TA}"
  th: "\N{DEVANAGARI LETTER THA}"
  u: "\N{DEVANAGARI LETTER U}"
  uu: "\N{DEVANAGARI LETTER UU}"
  v: "\N{DEVANAGARI LETTER VA}"
  x: "\N{DEVANAGARI LETTER KA}\N{DEVANAGARI SIGN VIRAMA}\N{DEVANAGARI LETTER SSA}"
  y: "\N{DEVANAGARI LETTER YA}"
  z: "\N{DEVANAGARI LETTER ZA}"
  zh: "\N{DEVANAGARI LETTER LLLA}"
  '|': "\N{DEVANAGARI DANDA}"
  '||': "\N{DEVANAGARI DOUBLE DANDA}"
  ~N: "\N{DEVANAGARI LETTER NGA}"
  ~Rs.: "\N{INDIAN RUPEE SIGN}"
  ~n: "\N{DEVANAGARI LETTER NYA}"
  "\N{LATIN SMALL LETTER N WITH DOT ABOVE}": "\N{DEVANAGARI LETTER NGA}"
onmatch_rules:
- <consonant> + <consonant>: "\N{DEVANAGARI SIGN VIRAMA}"
whitespace:
  consolidate: false
  default: ' '
  token_class: whitespace
metadata:
  title: ITRANS to Unicode
  version: 0.1.0
"""
gt = GraphTransliterator.from_yaml(easyreading_yaml)

Transliterating

With the transliterator created, you can now transliterate using GraphTransliterator.transliterate():

gt.transliterate("aaj mausam ba.Daa beiimaan hai, aaj mausam")
'आज मौसम बड़ा बेईमान है, आज मौसम'
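When checking output like this, it can help to look at the individual code points, since Devanagari clusters and vowel signs are not always easy to tell apart on screen. The sketch below uses only the standard unicodedata module and hard-codes the first word of the result above instead of calling the transliterator.

```python
import unicodedata

# First word of the result above, written as escapes to make the code points explicit.
first_word = "\u0906\u091c"

for char in first_word:
    print(f"U+{ord(char):04X} {unicodedata.name(char)}")

names = [unicodedata.name(char) for char in first_word]
```

The same loop works on any string returned by the transliterator.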

Other Information

Graph Transliterator has a few other tools built in that are for more specialized applications.

If you want the details of the most recent transliteration, access GraphTransliterator.last_matched_rules to get the list of rules that were matched:

gt.last_matched_rules
[TransliterationRule(production='आ', prev_classes=None, prev_tokens=None, tokens=['aa'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='ज', prev_classes=None, prev_tokens=None, tokens=['j'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='म', prev_classes=None, prev_tokens=None, tokens=['m'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='ौ', prev_classes=['consonant'], prev_tokens=None, tokens=['au'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production='स', prev_classes=None, prev_tokens=None, tokens=['s'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='', prev_classes=['consonant'], prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production='म', prev_classes=None, prev_tokens=None, tokens=['m'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='ब', prev_classes=None, prev_tokens=None, tokens=['b'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='', prev_classes=['consonant'], prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production='ड़', prev_classes=None, prev_tokens=None, tokens=['.D'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='ा', prev_classes=['consonant'], prev_tokens=None, tokens=['aa'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='ब', prev_classes=None, prev_tokens=None, tokens=['b'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='े', prev_classes=['consonant'], prev_tokens=None, tokens=['e'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production='ई', prev_classes=None, prev_tokens=None, tokens=['ii'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='म', prev_classes=None, prev_tokens=None, tokens=['m'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='ा', prev_classes=['consonant'], prev_tokens=None, tokens=['aa'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production='न', prev_classes=None, prev_tokens=None, tokens=['n'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='ह', prev_classes=None, prev_tokens=None, tokens=['h'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='ै', prev_classes=['consonant'], prev_tokens=None, tokens=['ai'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production=',', prev_classes=None, prev_tokens=None, tokens=[','], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='आ', prev_classes=None, prev_tokens=None, tokens=['aa'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='ज', prev_classes=None, prev_tokens=None, tokens=['j'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='म', prev_classes=None, prev_tokens=None, tokens=['m'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='ौ', prev_classes=['consonant'], prev_tokens=None, tokens=['au'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production='स', prev_classes=None, prev_tokens=None, tokens=['s'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
 TransliterationRule(production='', prev_classes=['consonant'], prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production='म', prev_classes=None, prev_tokens=None, tokens=['m'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
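The two cost values in this output are consistent with the formula log2(1 + 1/(1 + n)), where n counts what the rule requires: one token gives the higher cost, and one token plus one preceding class gives the lower cost. That formula is inferred here from the printed numbers and should be treated as an assumption, not as documented behavior. The function name rule_cost is hypothetical.

```python
import math

def rule_cost(required_count):
    """Cost that decreases as a rule requires more tokens and classes (assumed formula)."""
    return math.log2(1 + 1 / (1 + required_count))

token_only = rule_cost(1)              # e.g. the rule for 'm'
token_and_prev_class = rule_cost(2)    # e.g. '<consonant> au'
print(token_only, token_and_prev_class)
```

A lower cost for more demanding rules matches the earlier statement that the rule requiring the most tokens is picked first.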

Or if you just want to know the tokens matched by each rule, check GraphTransliterator.last_matched_rule_tokens:

gt.last_matched_rule_tokens
[['aa'],
 ['j'],
 [' '],
 ['m'],
 ['au'],
 ['s'],
 ['a'],
 ['m'],
 [' '],
 ['b'],
 ['a'],
 ['.D'],
 ['aa'],
 [' '],
 ['b'],
 ['e'],
 ['ii'],
 ['m'],
 ['aa'],
 ['n'],
 [' '],
 ['h'],
 ['ai'],
 [','],
 [' '],
 ['aa'],
 ['j'],
 [' '],
 ['m'],
 ['au'],
 ['s'],
 ['a'],
 ['m']]
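Because each entry is the list of tokens consumed by one rule, flattening the list in order gives back the tokenized input. The sketch below does this for the first eight entries shown above, copied by hand into a plain Python list.

```python
# First eight entries of the list above, copied by hand.
matched_tokens = [["aa"], ["j"], [" "], ["m"], ["au"], ["s"], ["a"], ["m"]]

# Flatten in order: every token of every matched rule, joined back together.
reconstructed = "".join(token for rule_tokens in matched_tokens for token in rule_tokens)
print(reconstructed)
```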

You can access the directed tree used by GraphTransliterator using GraphTransliterator.graph:

gt.graph
<graphtransliterator.graphs.DirectedGraph at 0x7f30141f5340>