Tutorial: Using GraphTransliterator
Note
Python code on this page: tutorial.py
Jupyter Notebook: tutorial.ipynb
Graph Transliterator is designed to allow you to quickly develop rules for
transliterating between languages and scripts. In this tutorial you will use a
portion of Graph Transliterator's features, including its token matching,
class-based matching, and on-match rules, using the GraphTransliterator
class.
Tutorial Overview
The task for this tutorial will be to design a transliterator between the ITRANS (Indian languages TRANSliteration) encoding for Devanagari (Hindi) and standard Unicode. ITRANS developed as a means to transliterate Indic-language text using the Latin alphabet and punctuation marks before there were Unicode fonts.
The Devanagari alphabet is an abugida (alphasyllabary), where each “syllable” is a separate symbol. Vowels, except for the default अ (“a”), have a unique sign that connects to a consonant. At the start of a word, they have a distinct shape. Consonants in sequence, without intermediary vowels, change their shape and are joined together. In Unicode, that is accomplished by using the Virama character.
Graph Transliterator works by first converting the input text into a series of tokens. In this tutorial you will define the tokens of ITRANS and the token classes needed to write rules for conversion.
Graph Transliterator allows rule matching by preceding tokens, tokens, and following tokens. It allows token classes to precede or follow any specific tokens. For this task, you will use a preceding token class to identify when to write vowel signs as opposed to full vowel characters.
Graph Transliterator also allows the insertion of strings between matches involving particular token classes. This transliterator will need to insert the virama character between transliteration rules ending with consonants in order to create consonant clusters.
Configuring
Here you will parameterize the Graph Transliterator using its “easy reading”
format, which uses YAML. It maps to a dictionary containing up to five keys:
tokens, rules, onmatch_rules (optional), whitespace, and metadata (optional).
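The key layout described above can be sketched as a plain Python dictionary. The `check_keys` helper below is hypothetical and only illustrates which top-level keys are required and which are optional; it is not part of Graph Transliterator, and the library's own validation may well differ.

```python
# Hypothetical helper for illustration; not part of Graph Transliterator.
REQUIRED_KEYS = {"tokens", "rules", "whitespace"}
OPTIONAL_KEYS = {"onmatch_rules", "metadata"}


def check_keys(settings):
    """Return (missing, unknown) top-level keys, each as a sorted list."""
    keys = set(settings)
    missing = sorted(REQUIRED_KEYS - keys)
    unknown = sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS)
    return missing, unknown


# A tiny settings dictionary with only the three required keys.
settings = {
    "tokens": {"k": ["consonant"], " ": ["wb", "whitespace"]},
    "rules": {"k": "\N{DEVANAGARI LETTER KA}", " ": " "},
    "whitespace": {
        "default": " ",
        "consolidate": False,
        "token_class": "whitespace",
    },
}

print(check_keys(settings))                    # ([], [])
print(check_keys({"tokens": {}, "rule": {}}))  # (['rules', 'whitespace'], ['rule'])
```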
Token Definitions
Graph Transliterator tokenizes its input before transliterating. The tokens
section will map the input tokens to their token classes. The main class you
will need is one for consonants, so you can use consonant as the class.
Graph Transliterator also requires a dedicated whitespace class, so you can use
whitespace.
Graph Transliterator allows the use of Unicode character names in files using \N{UNICODE CHARACTER NAME HERE} notation. You can enter the Unicode characters using that notation or directly. YAML will also unescape \u####, where #### is the hexadecimal notation for a character.
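Python string literals understand the same two notations, so the standard `unicodedata` module is a convenient way to check a character name before putting it in a settings file. This is a small sketch using only the standard library; it does not involve Graph Transliterator itself.

```python
import unicodedata

# Python resolves \N{...} escapes in ordinary (non-raw) string literals.
virama = "\N{DEVANAGARI SIGN VIRAMA}"

# unicodedata.lookup() resolves the same name at run time.
assert virama == unicodedata.lookup("DEVANAGARI SIGN VIRAMA")

# The \u#### form names the same character by its hexadecimal code point.
assert virama == "\u094d"

# unicodedata.name() goes the other way, from character to name.
n_dot_above = unicodedata.name("\u1e45")
print(n_dot_above)  # LATIN SMALL LETTER N WITH DOT ABOVE
```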
Here is a subsection of that definition:
tokens:
  k: [consonant]
  kh: [consonant]
  "\N{LATIN SMALL LETTER N WITH DOT ABOVE}": [consonant]
  a: [vowel]
  aa: [vowel]
  A: [vowel]
  ' ': [wb,whitespace]
  "\t": [wb,whitespace]
  .N: [vowel_sign]
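To see why both k and kh can be tokens, here is a minimal tokenizer sketch. It assumes greedy longest-match tokenization and wraps the result in the default whitespace token, as described later in this tutorial. The `tokenize` function is hypothetical and is not the library's implementation.

```python
def tokenize(text, tokens, default_whitespace=" "):
    """Split text into known tokens, preferring the longest match.

    Illustrative sketch only: assumes greedy longest-match behaviour.
    """
    # Try longer tokens first so that "kh" wins over "k".
    candidates = sorted(tokens, key=len, reverse=True)
    result = [default_whitespace]
    pos = 0
    while pos < len(text):
        for token in candidates:
            if text.startswith(token, pos):
                result.append(token)
                pos += len(token)
                break
        else:
            raise ValueError(f"unrecognizable input at position {pos}")
    result.append(default_whitespace)
    return result


# A small subset of the token definitions, mapped to their classes.
TOKENS = {
    "k": ["consonant"],
    "kh": ["consonant"],
    "y": ["consonant"],
    "a": ["vowel"],
    "aa": ["vowel"],
    "A": ["vowel"],
    " ": ["wb", "whitespace"],
}

print(tokenize("kyA", TOKENS))   # [' ', 'k', 'y', 'A', ' ']
print(tokenize("khaa", TOKENS))  # [' ', 'kh', 'aa', ' ']
```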
Transliteration Rule Definitions
The rule definitions in Graph Transliterator in “easy reading” format are also a dictionary, where each rule is a key and its production (what the rule should output) is the value. For this task, you just need to match individual tokens and also any preceding token classes:
rules:
  b: \N{DEVANAGARI LETTER BA}
  <consonant> A: \N{DEVANAGARI VOWEL SIGN AA}
  A: \N{DEVANAGARI LETTER AA}
These rules will replace “b” with its Devanagari equivalent (ब), and “A” with
the vowel sign ा if it follows a token of class “consonant” or otherwise with
the full letter आ, as at the start of a word. Graph Transliterator
automatically sorts rules by how many tokens are required for them to be
matched, and it picks the one that requires the most tokens. So the “A”
following a consonant would be matched before an “A” after any other
character. Graph Transliterator will also check for ambiguity in these rules,
unless check_ambiguity is set to False.
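The "most tokens first" ordering can be sketched in a few lines of plain Python. The tuple layout and the `match` function below are assumptions made for illustration; Graph Transliterator stores and searches its rules in a graph, not a sorted list.

```python
# Each rule: (required previous token classes, tokens to match, production).
RULES = [
    ((), ("A",), "\N{DEVANAGARI LETTER AA}"),
    (("consonant",), ("A",), "\N{DEVANAGARI VOWEL SIGN AA}"),
    ((), ("b",), "\N{DEVANAGARI LETTER BA}"),
]
TOKEN_CLASSES = {
    "b": {"consonant"},
    "A": {"vowel"},
    " ": {"wb", "whitespace"},
}


def required(rule):
    """Number of tokens a rule needs, counting class constraints."""
    prev_classes, tokens, _ = rule
    return len(prev_classes) + len(tokens)


# Rules that require more tokens are tried first.
ORDERED = sorted(RULES, key=required, reverse=True)


def match(tokens, pos):
    """Return the production of the first ordered rule matching at pos."""
    for prev_classes, rule_tokens, production in ORDERED:
        if tuple(tokens[pos:pos + len(rule_tokens)]) != rule_tokens:
            continue
        start = pos - len(prev_classes)
        if start < 0:
            continue
        previous = tokens[start:pos]
        if all(c in TOKEN_CLASSES[t] for c, t in zip(prev_classes, previous)):
            return production
    return None


after_consonant = match([" ", "b", "A", " "], 2)
word_initial = match([" ", "A", " "], 1)
print(after_consonant == "\N{DEVANAGARI VOWEL SIGN AA}")  # True
print(word_initial == "\N{DEVANAGARI LETTER AA}")         # True
```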
While not necessary for this tutorial, Graph Transliterator can also require matching of specific previous or following tokens and also classes preceding and following those tokens, e.g.
k a r (U M g A <wb>): k,a,r_followed_by_U,M,g,A_and_a_wordbreak
s o (n a): s,o_followed_by_n,a
(<wb> p y) aa r: aa,r_preceded_by_a_wordbreak,p,and_y
Here is a subsection of the rules:
rules:
  "\t": "\t"
  ' ': ' '
  ',': ','
  .D: "\N{DEVANAGARI LETTER DDDHA}"
  <consonant> A: "\N{DEVANAGARI VOWEL SIGN AA}"
  "\N{LATIN SMALL LETTER N WITH DOT ABOVE}": "\N{DEVANAGARI LETTER NGA}"
On Match Rule Definitions
You will want to insert the Virama character between consonants so that they will join together in Unicode output. To do so, add an “onmatch_rules” section:
onmatch_rules:
- <consonant> + <consonant>: "\N{DEVANAGARI SIGN VIRAMA}"
Unlike the tokens and rules, the on-match rules are ordered. The first rule
matched is applied. In YAML, they consist of a list of dictionaries, each with
a single key and value. The value is the production string to be inserted
between matches, and the ` + ` in the key marks the point between the two
matches where it is inserted. So in the input string kyA, which would tokenize
as [' ','k','y','A',' '], a virama character would be inserted when y is
matched, as it is of class “consonant” and the previously matched
transliteration rule, for “k”, ends with a “consonant”.
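The kyA example can be traced by hand with a short sketch. It hard-codes the single `<consonant> + <consonant>` rule and assumes the check compares the last token of the previous match with the first token of the current one. The function name and data layout are hypothetical, not the library's API.

```python
VIRAMA = "\N{DEVANAGARI SIGN VIRAMA}"

# Token classes for the tokens used in this example.
CLASSES = {"k": {"consonant"}, "y": {"consonant"}, "A": {"vowel"}}

# (tokens matched by a rule, production) pairs, in match order, for "kyA".
MATCHES = [
    (["k"], "\N{DEVANAGARI LETTER KA}"),
    (["y"], "\N{DEVANAGARI LETTER YA}"),
    (["A"], "\N{DEVANAGARI VOWEL SIGN AA}"),
]


def join_with_onmatch(matches):
    """Concatenate productions, inserting a virama between consonants."""
    output = []
    previous_last = None  # last token of the previously matched rule
    for tokens, production in matches:
        if (
            previous_last is not None
            and "consonant" in CLASSES[previous_last]
            and "consonant" in CLASSES[tokens[0]]
        ):
            output.append(VIRAMA)
        output.append(production)
        previous_last = tokens[-1]
    return "".join(output)


result = join_with_onmatch(MATCHES)
print(result)  # क्या
```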
Whitespace Definitions
The final required setup parameter is for whitespace. Its settings include the
default whitespace token, which is temporarily added before and after the
input tokens; the consolidate option to replace sequential whitespace
characters with a single default whitespace character; and the token_class
of whitespace tokens:
whitespace:
  consolidate: false
  default: ' '
  token_class: whitespace
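As a rough sketch of what the consolidate option describes, the function below replaces each run of two or more whitespace tokens with a single default token. Leaving a lone whitespace token unchanged is an assumption of this sketch, and the function is hypothetical; the library's actual handling may differ, for example at the start and end of the input.

```python
from itertools import groupby


def consolidate_whitespace(tokens, whitespace_tokens, default=" "):
    """Replace runs of two or more whitespace tokens with one default token.

    Illustrative only: single whitespace tokens are left unchanged here.
    """
    result = []
    # groupby() yields consecutive runs of whitespace / non-whitespace tokens.
    for is_ws, run in groupby(tokens, key=lambda t: t in whitespace_tokens):
        run = list(run)
        if is_ws and len(run) > 1:
            result.append(default)
        else:
            result.extend(run)
    return result


WHITESPACE = {" ", "\t"}
print(consolidate_whitespace(["a", " ", "\t", " ", "b"], WHITESPACE))
# ['a', ' ', 'b']
```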
Metadata Definitions
Graph Transliterator also allows metadata to be added to its settings:
metadata:
  title: "ITRANS Devanagari to Unicode"
  version: "0.1.0"
Creating a Transliterator
Now that the settings are ready, you can create a Graph Transliterator.
Since you have been using the “easy reading” format, you can use
GraphTransliterator.from_yaml_file() to read from a specific file or
GraphTransliterator.from_yaml() to read from a YAML string. You can also read
from the loaded contents of an “easy reading” YAML file using
GraphTransliterator.from_dict(). Graph Transliterator will convert those
settings into basic Python types and then return a GraphTransliterator:
from graphtransliterator import GraphTransliterator
easyreading_yaml = """
tokens:
  k: [consonant]
  kh: [consonant]
  g: [consonant]
  gh: [consonant]
  ~N: [consonant]
  "\N{LATIN SMALL LETTER N WITH DOT ABOVE}": [consonant]
  ch: [consonant]
  chh: [consonant]
  Ch: [consonant]
  j: [consonant]
  jh: [consonant]
  ~n: [consonant]
  T: [consonant]
  Th: [consonant]
  D: [consonant]
  Dh: [consonant]
  N: [consonant]
  t: [consonant]
  th: [consonant]
  d: [consonant]
  dh: [consonant]
  n: [consonant]
  ^n: [consonant]
  p: [consonant]
  ph: [consonant]
  b: [consonant]
  bh: [consonant]
  m: [consonant]
  y: [consonant]
  r: [consonant]
  R: [consonant]
  l: [consonant]
  ld: [consonant]
  L: [consonant]
  zh: [consonant]
  v: [consonant]
  sh: [consonant]
  Sh: [consonant]
  s: [consonant]
  h: [consonant]
  x: [consonant]
  kSh: [consonant]
  GY: [consonant]
  j~n: [consonant]
  dny: [consonant]
  q: [consonant]
  K: [consonant]
  G: [consonant]
  J: [consonant]
  z: [consonant]
  .D: [consonant]
  .Dh: [consonant]
  f: [consonant]
  Y: [consonant]
  a: [vowel]
  aa: [vowel]
  A: [vowel]
  i: [vowel]
  ii: [vowel]
  I: [vowel]
  ee: [vowel]
  u: [vowel]
  uu: [vowel]
  U: [vowel]
  RRi: [vowel]
  R^i: [vowel]
  LLi: [vowel]
  L^i: [vowel]
  RRI: [vowel]
  LLI: [vowel]
  a.c: [vowel]
  ^e: [vowel]
  e: [vowel]
  ai: [vowel]
  A.c: [vowel]
  ^o: [vowel]
  o: [vowel]
  au: [vowel]
  ' ': [wb,whitespace]
  "\t": [wb,whitespace]
  ',': [wb]
  .h: [wb]
  H: [wb]
  OM: [wb]
  AUM: [wb]
  '|': [wb]
  '||': [wb]
  '0': [wb]
  '1': [wb]
  '2': [wb]
  '3': [wb]
  '4': [wb]
  '5': [wb]
  '6': [wb]
  '7': [wb]
  '8': [wb]
  '9': [wb]
  Rs.: [wb]
  ~Rs.: [wb]
  .a: [wb]
  a.e: [vowel_sign]
  .N: [vowel_sign]
  .n: [vowel_sign]
  M: [vowel_sign]
  .m: [vowel_sign]
rules:
  "\t": "\t"
  ' ': ' '
  ',': ','
  .D: "\N{DEVANAGARI LETTER DDDHA}"
  .Dh: "\N{DEVANAGARI LETTER RHA}"
  .N: "\N{DEVANAGARI SIGN CANDRABINDU}"
  .a: "\N{DEVANAGARI SIGN AVAGRAHA}"
  .h: "\N{DEVANAGARI SIGN VIRAMA}\N{ZERO WIDTH NON-JOINER}"
  .m: "\N{DEVANAGARI SIGN ANUSVARA}"
  .n: "\N{DEVANAGARI SIGN ANUSVARA}"
  '0': "\N{DEVANAGARI DIGIT ZERO}"
  '1': "\N{DEVANAGARI DIGIT ONE}"
  '2': "\N{DEVANAGARI DIGIT TWO}"
  '3': "\N{DEVANAGARI DIGIT THREE}"
  '4': "\N{DEVANAGARI DIGIT FOUR}"
  '5': "\N{DEVANAGARI DIGIT FIVE}"
  '6': "\N{DEVANAGARI DIGIT SIX}"
  '7': "\N{DEVANAGARI DIGIT SEVEN}"
  '8': "\N{DEVANAGARI DIGIT EIGHT}"
  '9': "\N{DEVANAGARI DIGIT NINE}"
  <consonant> A: "\N{DEVANAGARI VOWEL SIGN AA}"
  <consonant> A.c: "\N{DEVANAGARI VOWEL SIGN CANDRA O}"
  <consonant> I: "\N{DEVANAGARI VOWEL SIGN II}"
  <consonant> LLI: "\N{DEVANAGARI VOWEL SIGN VOCALIC LL}"
  <consonant> LLi: "\N{DEVANAGARI VOWEL SIGN VOCALIC L}"
  <consonant> L^i: "\N{DEVANAGARI VOWEL SIGN VOCALIC L}"
  <consonant> RRI: "\N{DEVANAGARI VOWEL SIGN VOCALIC RR}"
  <consonant> RRi: "\N{DEVANAGARI VOWEL SIGN VOCALIC R}"
  <consonant> R^i: "\N{DEVANAGARI VOWEL SIGN VOCALIC R}"
  <consonant> U: "\N{DEVANAGARI VOWEL SIGN UU}"
  <consonant> ^e: "\N{DEVANAGARI VOWEL SIGN SHORT E}"
  <consonant> ^o: "\N{DEVANAGARI VOWEL SIGN SHORT O}"
  <consonant> a: ''
  <consonant> a.c: "\N{DEVANAGARI VOWEL SIGN CANDRA E}"
  <consonant> aa: "\N{DEVANAGARI VOWEL SIGN AA}"
  <consonant> ai: "\N{DEVANAGARI VOWEL SIGN AI}"
  <consonant> au: "\N{DEVANAGARI VOWEL SIGN AU}"
  <consonant> e: "\N{DEVANAGARI VOWEL SIGN E}"
  <consonant> ee: "\N{DEVANAGARI VOWEL SIGN II}"
  <consonant> i: "\N{DEVANAGARI VOWEL SIGN I}"
  <consonant> ii: "\N{DEVANAGARI VOWEL SIGN II}"
  <consonant> o: "\N{DEVANAGARI VOWEL SIGN O}"
  <consonant> u: "\N{DEVANAGARI VOWEL SIGN U}"
  <consonant> uu: "\N{DEVANAGARI VOWEL SIGN UU}"
  A: "\N{DEVANAGARI LETTER AA}"
  A.c: "\N{DEVANAGARI LETTER CANDRA O}"
  AUM: "\N{DEVANAGARI OM}"
  Ch: "\N{DEVANAGARI LETTER CHA}"
  D: "\N{DEVANAGARI LETTER DDA}"
  Dh: "\N{DEVANAGARI LETTER DDHA}"
  G: "\N{DEVANAGARI LETTER GHHA}"
  GY: "\N{DEVANAGARI LETTER JA}\N{DEVANAGARI SIGN VIRAMA}\N{DEVANAGARI LETTER NYA}"
  H: "\N{DEVANAGARI SIGN VISARGA}"
  I: "\N{DEVANAGARI LETTER II}"
  J: "\N{DEVANAGARI LETTER ZA}"
  K: "\N{DEVANAGARI LETTER KHHA}"
  L: "\N{DEVANAGARI LETTER LLA}"
  LLI: "\N{DEVANAGARI LETTER VOCALIC LL}"
  LLi: "\N{DEVANAGARI LETTER VOCALIC L}"
  L^i: "\N{DEVANAGARI LETTER VOCALIC L}"
  M: "\N{DEVANAGARI SIGN ANUSVARA}"
  N: "\N{DEVANAGARI LETTER NNA}"
  OM: "\N{DEVANAGARI OM}"
  R: "\N{DEVANAGARI LETTER RRA}"
  RRI: "\N{DEVANAGARI LETTER VOCALIC RR}"
  RRi: "\N{DEVANAGARI LETTER VOCALIC R}"
  R^i: "\N{DEVANAGARI LETTER VOCALIC R}"
  Rs.: "\N{INDIAN RUPEE SIGN}"
  Sh: "\N{DEVANAGARI LETTER SSA}"
  T: "\N{DEVANAGARI LETTER TTA}"
  Th: "\N{DEVANAGARI LETTER TTHA}"
  U: "\N{DEVANAGARI LETTER UU}"
  Y: "\N{DEVANAGARI LETTER YYA}"
  ^e: "\N{DEVANAGARI LETTER SHORT E}"
  ^n: "\N{DEVANAGARI LETTER NNNA}"
  ^o: "\N{DEVANAGARI LETTER SHORT O}"
  a: "\N{DEVANAGARI LETTER A}"
  a.c: "\N{DEVANAGARI LETTER CANDRA E}"
  a.e: "\N{DEVANAGARI LETTER CANDRA A}"
  aa: "\N{DEVANAGARI LETTER AA}"
  ai: "\N{DEVANAGARI LETTER AI}"
  au: "\N{DEVANAGARI LETTER AU}"
  b: "\N{DEVANAGARI LETTER BA}"
  bh: "\N{DEVANAGARI LETTER BHA}"
  ch: "\N{DEVANAGARI LETTER CA}"
  chh: "\N{DEVANAGARI LETTER CHA}"
  d: "\N{DEVANAGARI LETTER DA}"
  dh: "\N{DEVANAGARI LETTER DHA}"
  dny: "\N{DEVANAGARI LETTER JA}\N{DEVANAGARI SIGN VIRAMA}\N{DEVANAGARI LETTER NYA}"
  e: "\N{DEVANAGARI LETTER E}"
  ee: "\N{DEVANAGARI LETTER II}"
  f: "\N{DEVANAGARI LETTER FA}"
  g: "\N{DEVANAGARI LETTER GA}"
  gh: "\N{DEVANAGARI LETTER GHA}"
  h: "\N{DEVANAGARI LETTER HA}"
  i: "\N{DEVANAGARI LETTER I}"
  ii: "\N{DEVANAGARI LETTER II}"
  j: "\N{DEVANAGARI LETTER JA}"
  jh: "\N{DEVANAGARI LETTER JHA}"
  j~n: "\N{DEVANAGARI LETTER JA}\N{DEVANAGARI SIGN VIRAMA}\N{DEVANAGARI LETTER NYA}"
  k: "\N{DEVANAGARI LETTER KA}"
  kSh: "\N{DEVANAGARI LETTER KA}\N{DEVANAGARI SIGN VIRAMA}\N{DEVANAGARI LETTER SSA}"
  kh: "\N{DEVANAGARI LETTER KHA}"
  l: "\N{DEVANAGARI LETTER LA}"
  ld: "\N{DEVANAGARI LETTER LLA}"
  m: "\N{DEVANAGARI LETTER MA}"
  n: "\N{DEVANAGARI LETTER NA}"
  o: "\N{DEVANAGARI LETTER O}"
  p: "\N{DEVANAGARI LETTER PA}"
  ph: "\N{DEVANAGARI LETTER PHA}"
  q: "\N{DEVANAGARI LETTER QA}"
  r: "\N{DEVANAGARI LETTER RA}"
  s: "\N{DEVANAGARI LETTER SA}"
  sh: "\N{DEVANAGARI LETTER SHA}"
  t: "\N{DEVANAGARI LETTER TA}"
  th: "\N{DEVANAGARI LETTER THA}"
  u: "\N{DEVANAGARI LETTER U}"
  uu: "\N{DEVANAGARI LETTER UU}"
  v: "\N{DEVANAGARI LETTER VA}"
  x: "\N{DEVANAGARI LETTER KA}\N{DEVANAGARI SIGN VIRAMA}\N{DEVANAGARI LETTER SSA}"
  y: "\N{DEVANAGARI LETTER YA}"
  z: "\N{DEVANAGARI LETTER ZA}"
  zh: "\N{DEVANAGARI LETTER LLLA}"
  '|': "\N{DEVANAGARI DANDA}"
  '||': "\N{DEVANAGARI DOUBLE DANDA}"
  ~N: "\N{DEVANAGARI LETTER NGA}"
  ~Rs.: "\N{INDIAN RUPEE SIGN}"
  ~n: "\N{DEVANAGARI LETTER NYA}"
  "\N{LATIN SMALL LETTER N WITH DOT ABOVE}": "\N{DEVANAGARI LETTER NGA}"
onmatch_rules:
  - <consonant> + <consonant>: "\N{DEVANAGARI SIGN VIRAMA}"
whitespace:
  consolidate: false
  default: ' '
  token_class: whitespace
metadata:
  title: ITRANS to Unicode
  version: 0.1.0
"""
gt = GraphTransliterator.from_yaml(easyreading_yaml)
Transliterating
With the transliterator created, you can now transliterate using
GraphTransliterator.transliterate():
gt.transliterate("aaj mausam ba.Daa beiimaan hai, aaj mausam")
'आज मौसम बड़ा बेईमान है, आज मौसम'
Other Information
Graph Transliterator has a few other tools built in that are for more specialized applications.
If you want to receive the details of the most recent transliteration, access
GraphTransliterator.last_matched_rules to get the list of rules matched:
gt.last_matched_rules
[TransliterationRule(production='आ', prev_classes=None, prev_tokens=None, tokens=['aa'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='ज', prev_classes=None, prev_tokens=None, tokens=['j'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='म', prev_classes=None, prev_tokens=None, tokens=['m'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='ौ', prev_classes=['consonant'], prev_tokens=None, tokens=['au'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
TransliterationRule(production='स', prev_classes=None, prev_tokens=None, tokens=['s'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='', prev_classes=['consonant'], prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
TransliterationRule(production='म', prev_classes=None, prev_tokens=None, tokens=['m'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='ब', prev_classes=None, prev_tokens=None, tokens=['b'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='', prev_classes=['consonant'], prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
TransliterationRule(production='ड़', prev_classes=None, prev_tokens=None, tokens=['.D'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='ा', prev_classes=['consonant'], prev_tokens=None, tokens=['aa'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='ब', prev_classes=None, prev_tokens=None, tokens=['b'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='े', prev_classes=['consonant'], prev_tokens=None, tokens=['e'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
TransliterationRule(production='ई', prev_classes=None, prev_tokens=None, tokens=['ii'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='म', prev_classes=None, prev_tokens=None, tokens=['m'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='ा', prev_classes=['consonant'], prev_tokens=None, tokens=['aa'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
TransliterationRule(production='न', prev_classes=None, prev_tokens=None, tokens=['n'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='ह', prev_classes=None, prev_tokens=None, tokens=['h'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='ै', prev_classes=['consonant'], prev_tokens=None, tokens=['ai'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
TransliterationRule(production=',', prev_classes=None, prev_tokens=None, tokens=[','], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='आ', prev_classes=None, prev_tokens=None, tokens=['aa'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='ज', prev_classes=None, prev_tokens=None, tokens=['j'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='म', prev_classes=None, prev_tokens=None, tokens=['m'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='ौ', prev_classes=['consonant'], prev_tokens=None, tokens=['au'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
TransliterationRule(production='स', prev_classes=None, prev_tokens=None, tokens=['s'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
TransliterationRule(production='', prev_classes=['consonant'], prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
TransliterationRule(production='म', prev_classes=None, prev_tokens=None, tokens=['m'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
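Only two distinct cost values appear in the output above. As an observation about these numbers, and not a statement of the library's internals, both are consistent with log2(1 + 1/(1 + n)), where n counts the tokens a rule requires, including a preceding class. Under that reading, rules needing more tokens get a lower cost, which fits their being tried first. The `cost` function below is a hypothetical reconstruction.

```python
import math


def cost(required_tokens):
    """Hypothetical reconstruction of the cost values printed above."""
    return math.log2(1 + 1 / (1 + required_tokens))


single_token = cost(1)      # e.g. the rule for 'j'
class_plus_token = cost(2)  # e.g. the rule for <consonant> au

# Compare against the values shown in the output, allowing for rounding.
print(math.isclose(single_token, 0.5849625007211562))       # True
print(math.isclose(class_plus_token, 0.41503749927884376))  # True
```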
If you only want to know the tokens matched by each rule, check
GraphTransliterator.last_matched_rule_tokens:
gt.last_matched_rule_tokens
[['aa'],
['j'],
[' '],
['m'],
['au'],
['s'],
['a'],
['m'],
[' '],
['b'],
['a'],
['.D'],
['aa'],
[' '],
['b'],
['e'],
['ii'],
['m'],
['aa'],
['n'],
[' '],
['h'],
['ai'],
[','],
[' '],
['aa'],
['j'],
[' '],
['m'],
['au'],
['s'],
['a'],
['m']]
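Because each inner list holds the input tokens consumed by one rule, flattening the list and joining the tokens gives back the corresponding input text. The snippet below does this for the first eight entries of the output above, copied in by hand so that it runs without the library.

```python
# First eight entries of the matched-token list shown above.
matched = [["aa"], ["j"], [" "], ["m"], ["au"], ["s"], ["a"], ["m"]]

# Flatten the per-rule token lists into one token sequence.
flat = [token for group in matched for token in group]

reconstructed = "".join(flat)
print(reconstructed)  # aaj mausam
```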
You can access the directed tree used by GraphTransliterator using
GraphTransliterator.graph:
gt.graph
<graphtransliterator.graphs.DirectedGraph at 0x7f30141f5340>