API Reference

The full API reference for all public classes and functions is given below.

Public members can (and should) be imported from graphtransliterator:

from graphtransliterator import GraphTransliterator

Bundled transliterators require that graphtransliterator.transliterators be imported explicitly:

import graphtransliterator.transliterators as transliterators
transliterators.iter_names()

Core Classes

class graphtransliterator.GraphTransliterator(tokens, rules, whitespace, onmatch_rules=None, metadata=None, ignore_errors=False, check_ambiguity=True, onmatch_rules_lookup=None, tokens_by_class=None, graph=None, tokenizer_pattern=None, graphtransliterator_version=None, **kwargs)[source]

A graph-based transliteration tool that lets you convert the symbols of one language or script to those of another using rules that you define.

Transliteration of tokens of an input string to an output string is configured by: a set of input token types with classes, pattern-matching rules involving sequences of tokens as well as preceding or following tokens and token classes, insertion rules between matches, and optional consolidation of whitespace. Rules are ordered by specificity.

Note

This constructor does not validate settings and should typically not be called directly. Use from_dict() instead. For “easy reading” support, use from_easyreading_dict(), from_yaml(), or from_yaml_file(). Keyword parameters used here (ignore_errors, check_ambiguity) can be passed from those other constructors.

Parameters:
  • tokens (dict of {str: set of str}) – Mapping of input token types to token classes

  • rules (list of TransliterationRule) – List of transliteration rules ordered by cost

  • onmatch_rules (list of OnMatchRule, or None) – Rules for output to be inserted between tokens of certain classes when a transliteration rule has been matched but before its production string has been added to the output

  • whitespace (WhitespaceRules) – Rules for handling whitespace

  • metadata (dict or None) – Metadata settings

  • ignore_errors (bool, optional) – If true, transliteration errors are ignored and do not raise an exception. The default is false.

  • check_ambiguity (bool, optional) – If true (default), transliteration rules are checked for ambiguity. load() and loads() do not check ambiguity by default.

  • onmatch_rules_lookup (dict of {str: dict of {str: list of int}}, optional) – OnMatchRules lookup, used internally, will be generated if not present.

  • tokens_by_class (dict of {str: set of str}, optional) – Tokens by class, used internally, will be generated if not present.

  • graph (DirectedGraph, optional) – Directed graph used by Graph Transliterator, will be generated if not present.

  • tokenizer_pattern (str, optional) – Regular expression pattern for input string tokenization, will be generated if not present.

  • graphtransliterator_version (str, optional) – Version of graphtransliterator, added by dump() and dumps().
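
The internal lookups above are described as generated from the other settings when absent. As an illustration only, and not the library's own code, tokens_by_class could be derived by inverting the tokens mapping. The helper name tokens_by_class_from is hypothetical.

```python
from collections import defaultdict


def tokens_by_class_from(tokens):
    """Invert {token: set of classes} into {class: set of tokens}.

    Hypothetical helper for illustration; not part of graphtransliterator.
    """
    by_class = defaultdict(set)
    for token, token_classes in tokens.items():
        for token_class in token_classes:
            by_class[token_class].add(token)
    return dict(by_class)


# Same token settings as the constructor example in this section.
tokens = {"a": {"vowel"}, " ": {"wb"}}
lookup = tokens_by_class_from(tokens)
```

With the token settings used throughout this section, lookup maps "vowel" to the token "a" and "wb" to the space token.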

Example

from graphtransliterator import GraphTransliterator, OnMatchRule, TransliterationRule, WhitespaceRules
settings = {'tokens': {'a': {'vowel'}, ' ': {'wb'}}, 'onmatch_rules': [OnMatchRule(prev_classes=['vowel'], next_classes=['vowel'], production=',')], 'rules': [TransliterationRule(production='A', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562), TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562)], 'metadata': {'author': 'Author McAuthorson'}, 'whitespace': WhitespaceRules(default=' ', token_class='wb', consolidate=False)}
gt = GraphTransliterator(**settings)
gt.transliterate('a')
'A'

See also

from_dict

Constructor from dictionary of settings

from_easyreading_dict

Constructor from dictionary in “easy reading” format

from_yaml

Constructor from YAML string in “easy reading” format

from_yaml_file

Constructor from YAML file in “easy reading” format

dump(compression_level=0)[source]

Dump configuration of Graph Transliterator to Python data types.

Compression is turned off by default.

Parameters:

compression_level (int) – One of 0 (default, no compression), 1 (compression including graph), or 2 (compression without graph)

Returns:

GraphTransliterator configuration as a dictionary with keys:

"tokens"

Mappings of tokens to their classes (OrderedDict of {str: list of str})

"rules"

Transliteration rules in direct format (list of dict of {str: str})

"whitespace"

Whitespace settings (dict of {str: str})

"onmatch_rules"

On match rules (list of OrderedDict)

"metadata"

Dictionary of metadata (dict)

"ignore_errors"

Ignore errors in transliteration (bool)

"onmatch_rules_lookup"

Dictionary keyed by current token to previous token containing a list of indexes of applicable OnmatchRule to try (dict of {str: dict of {str: list of int}})

"tokens_by_class"

Tokens keyed by token class, used internally (dict of {str: list of str})

"graph"

Serialization of DirectedGraph (dict)

"tokenizer_pattern"

Regular expression for tokenizing (str)

"graphtransliterator_version"

Module version of graphtransliterator (str)

Return type:

OrderedDict

Example

yaml_ = '''
tokens:
  a: [vowel]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: " "
  consolidate: false
  token_class: wb
onmatch_rules:
  - <vowel> + <vowel>: ','  # add a comma between vowels
metadata:
  author: "Author McAuthorson"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.dump()
OrderedDict([('tokens', {'a': ['vowel'], ' ': ['wb']}),
             ('rules',
              [OrderedDict([('production', 'A'),
                            ('tokens', ['a']),
                            ('cost', 0.5849625007211562)]),
               OrderedDict([('production', ' '),
                            ('tokens', [' ']),
                            ('cost', 0.5849625007211562)])]),
             ('whitespace',
              {'default': ' ', 'token_class': 'wb', 'consolidate': False}),
             ('onmatch_rules',
              [OrderedDict([('prev_classes', ['vowel']),
                            ('next_classes', ['vowel']),
                            ('production', ',')])]),
             ('metadata', {'author': 'Author McAuthorson'}),
             ('ignore_errors', False),
             ('onmatch_rules_lookup', {'a': {'a': [0]}}),
             ('tokens_by_class', {'vowel': ['a'], 'wb': [' ']}),
             ('graph',
              {'node': [{'type': 'Start',
                 'ordered_children': {'a': [1], ' ': [3]}},
                {'token': 'a',
                 'type': 'token',
                 'ordered_children': {'__rules__': [2]}},
                {'type': 'rule', 'accepting': True, 'rule_key': 0},
                {'token': ' ',
                 'type': 'token',
                 'ordered_children': {'__rules__': [4]}},
                {'type': 'rule', 'accepting': True, 'rule_key': 1}],
               'edge': {0: {1: {'token': 'a', 'cost': 0.5849625007211562},
                 3: {'token': ' ', 'cost': 0.5849625007211562}},
                1: {2: {'cost': 0.5849625007211562}},
                3: {4: {'cost': 0.5849625007211562}}},
               'edge_list': [(0, 1), (0, 3), (1, 2), (3, 4)]}),
             ('tokenizer_pattern', '(a|\\ )'),
             ('graphtransliterator_version', '1.2.4')])
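
To make the "graph" entry in the output above easier to read, here is a minimal sketch that walks the serialized node list for a single token. It copies only the node data shown above and assumes the start node is at index 0. The function rule_key_for is hypothetical, and the library's own traversal (multi-token rules, context constraints, edge costs) is more involved.

```python
# Node list copied from the "graph" entry of the dump() output above.
graph = {
    "node": [
        {"type": "Start", "ordered_children": {"a": [1], " ": [3]}},
        {"token": "a", "type": "token", "ordered_children": {"__rules__": [2]}},
        {"type": "rule", "accepting": True, "rule_key": 0},
        {"token": " ", "type": "token", "ordered_children": {"__rules__": [4]}},
        {"type": "rule", "accepting": True, "rule_key": 1},
    ]
}


def rule_key_for(token, graph):
    """Follow Start -> token node -> rule node for one token (illustrative)."""
    nodes = graph["node"]
    start = nodes[0]  # assumption: the start node is stored first
    for token_node_key in start["ordered_children"].get(token, []):
        token_node = nodes[token_node_key]
        for rule_node_key in token_node["ordered_children"].get("__rules__", []):
            rule_node = nodes[rule_node_key]
            if rule_node.get("accepting"):
                return rule_node["rule_key"]
    return None  # no path for this token


key_for_a = rule_key_for("a", graph)
key_for_space = rule_key_for(" ", graph)
```

Here key_for_a and key_for_space are indexes into the "rules" list of the same dump, and an unknown token yields None.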

See also

dumps

Dump Graph Transliterator configuration to JSON string

load

Load Graph Transliteration from configuration in Python data types

loads

Load Graph Transliteration from configuration as a JSON string

dumps(compression_level=2)[source]

Dump settings of Graph Transliterator to JavaScript Object Notation (JSON).

Compression is turned on by default.

Parameters:
  • compression_level (int) – One of 0 (no compression), 1 (compression including graph), or 2 (default, compression without graph)

  • separators (tuple of str) – Separators used by json.dumps(), default is compact

Returns:

JSON string

Return type:

str

Examples

yaml_ = '''
  tokens:
    a: [vowel]
    ' ': [wb]
  rules:
    a: A
    ' ': ' '
  whitespace:
    default: " "
    consolidate: false
    token_class: wb
  onmatch_rules:
    - <vowel> + <vowel>: ','  # add a comma between vowels
  metadata:
    author: "Author McAuthorson"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.dumps()
'{"graphtransliterator_version":"1.2.4","compressed_settings":[["vowel","wb"],[" ","a"],[[1],[0]],[["A",0,0,[1],0,0,-1],[" ",0,0,[0],0,0,-1]],[" ","wb",0],[[[0],[0],","]],{"author":"Author McAuthorson"},null]}'
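
The compressed string above is ordinary JSON, so it can be inspected with the standard library alone. The sketch below parses the exact string from the example. It does not interpret the compressed layout, which is an internal format.

```python
import json

# Output of gt.dumps() copied from the example above.
dumped = (
    '{"graphtransliterator_version":"1.2.4","compressed_settings":'
    '[["vowel","wb"],[" ","a"],[[1],[0]],'
    '[["A",0,0,[1],0,0,-1],[" ",0,0,[0],0,0,-1]],'
    '[" ","wb",0],[[[0],[0],","]],{"author":"Author McAuthorson"},null]}'
)

settings = json.loads(dumped)
version = settings["graphtransliterator_version"]
compressed = settings["compressed_settings"]
top_level_keys = sorted(settings)
```

At compression level 2 the top level holds only the version and a compressed_settings list, with no "graph" key, which matches the description "compression without graph".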

See also

dump

Dump Graph Transliterator configuration to Python data types

load

Load Graph Transliteration from configuration in Python data types

loads

Load Graph Transliteration from configuration as a JSON string

static from_dict(dict_settings, **kwargs)[source]

Generate GraphTransliterator from dict settings.

Parameters:

dict_settings (dict) – Dictionary of settings

Returns:

Graph transliterator

Return type:

GraphTransliterator

static from_easyreading_dict(easyreading_settings, **kwargs)[source]

Constructs GraphTransliterator from a dictionary of settings in “easy reading” format, i.e. the loaded contents of a YAML string.

Parameters:

easyreading_settings (dict) –

Settings dictionary in easy reading format with keys:

"tokens"

Mappings of tokens to their classes (dict of {str: list of str})

"rules"

Transliteration rules in “easy reading” format (list of dict of {str: str})

"onmatch_rules"

On match rules in “easy reading” format (dict of {str: str}, optional)

"whitespace"

Whitespace definitions, including default whitespace token, class of whitespace tokens, and whether or not to consolidate (dict of {'default': str, 'token_class': str, 'consolidate': bool}, optional)

"metadata"

Dictionary of metadata (dict, optional)

Returns:

Graph Transliterator

Return type:

GraphTransliterator

Note

Called by from_yaml().

Example

tokens = {
    'ab': ['class_ab'],
    ' ': ['wb']
}
whitespace = {
    'default': ' ',
    'token_class': 'wb',
    'consolidate': True
}
onmatch_rules = [
    {'<class_ab> + <class_ab>': ','}
]
rules = {'ab': 'AB',
         ' ': '_'}
settings = {'tokens': tokens,
            'rules': rules,
            'whitespace': whitespace,
            'onmatch_rules': onmatch_rules}
gt = GraphTransliterator.from_easyreading_dict(settings)
gt.transliterate("ab abab")
'AB_AB,AB'
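
The "easy reading" onmatch key in the example above, '<class_ab> + <class_ab>', lists the classes before and after a "+". A rough sketch of how such a key could be split into the prev_classes and next_classes of an OnMatchRule follows. The function parse_onmatch_key and its regular expression are assumptions for illustration, not the library's parser.

```python
import re

# Assumption: class names are written as <name> using word characters only.
CLASS_PATTERN = re.compile(r"<(\w+)>")


def parse_onmatch_key(key):
    """Split '<c1> + <c2>' into (prev_classes, next_classes) lists."""
    prev_part, _, next_part = key.partition("+")
    return CLASS_PATTERN.findall(prev_part), CLASS_PATTERN.findall(next_part)


prev_classes, next_classes = parse_onmatch_key("<class_ab> + <class_ab>")
```

For the key in the example, both lists contain the single class 'class_ab'.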

See also

from_yaml

Constructor from YAML string in “easy reading” format

from_yaml_file

Constructor from YAML file in “easy reading” format

static from_yaml(yaml_str, charnames_escaped=True, **kwargs)[source]

Construct GraphTransliterator from a YAML str.

Parameters:
  • yaml_str (str) – YAML mappings of tokens, rules, and (optionally) onmatch_rules

  • charnames_escaped (bool) – Unescape Unicode during YAML read (default True)

Note

Called by from_yaml_file() and calls from_easyreading_dict().

Example

yaml_ = '''
tokens:
  a: [class1]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: ' '
  consolidate: True
  token_class: wb
onmatch_rules:
  - <class1> + <class1>: "+"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.transliterate("a aa")
'A A+A'

See also

from_easyreading_dict

Constructor from dictionary in “easy reading” format

from_yaml

Constructor from YAML string in “easy reading” format

from_yaml_file

Constructor from YAML file in “easy reading” format

static from_yaml_file(yaml_filename, **kwargs)[source]

Construct GraphTransliterator from YAML file.

Parameters:

yaml_filename (str) – Name of YAML file, containing tokens, rules, and (optionally) onmatch_rules

Note

Calls from_yaml().

See also

from_yaml

Constructor from YAML string in “easy reading” format

from_easyreading_dict

Constructor from dictionary in “easy reading” format

property graph

Graph used in transliteration.

Type:

DirectedGraph

property graphtransliterator_version

Graph Transliterator version.

Type:

str

property ignore_errors

Ignore transliteration errors setting.

Type:

bool

property last_input_tokens

Last tokenization of the input string, with whitespace at start and end.

Type:

list of str

property last_matched_rule_tokens

Last matched tokens for each rule.

Type:

list of list of str

property last_matched_rules

Last transliteration rules matched.

Type:

list of TransliterationRule

static load(settings, **kwargs)[source]

Create GraphTransliterator from settings as Python data types.

Parameters:

settings

GraphTransliterator configuration as a dictionary with keys:

"tokens"

Mappings of tokens to their classes (dict of {str: list of str})

"rules"

Transliteration rules in direct format (list of OrderedDict of {str: str})

"whitespace"

Whitespace settings (dict of {str: str})

"onmatch_rules"

On match rules (list of OrderedDict, optional)

"metadata"

Dictionary of metadata (dict, optional)

"ignore_errors"

Ignore errors. (bool, optional)

"onmatch_rules_lookup"

Dictionary keyed by current token to previous token containing a list of indexes of applicable OnmatchRule to try (dict of {str: dict of {str: list of int}}, optional)

"tokens_by_class"

Tokens keyed by token class, used internally (dict of {str: list of str}, optional)

"graph"

Serialization of DirectedGraph (dict, optional)

"tokenizer_pattern"

Regular expression for tokenizing (str, optional)

"graphtransliterator_version"

Module version of graphtransliterator (str, optional)

Returns:

Graph Transliterator

Return type:

GraphTransliterator

Example

from collections import OrderedDict
settings = {'tokens': {'a': ['vowel'], ' ': ['wb']},
 'rules': [OrderedDict([('production', 'A'),
               # Can be compacted, removing None values
               # ('prev_tokens', None),
               ('tokens', ['a']),
               ('next_classes', None),
               ('next_tokens', None),
               ('cost', 0.5849625007211562)]),
  OrderedDict([('production', ' '),
               ('prev_classes', None),
               ('prev_tokens', None),
               ('tokens', [' ']),
               ('next_classes', None),
               ('next_tokens', None),
               ('cost', 0.5849625007211562)])],
 'whitespace': {'default': ' ', 'token_class': 'wb', 'consolidate': False},
 'onmatch_rules': [OrderedDict([('prev_classes', ['vowel']),
               ('next_classes', ['vowel']),
               ('production', ',')])],
 'metadata': {'author': 'Author McAuthorson'},
 'onmatch_rules_lookup': {'a': {'a': [0]}},
 'tokens_by_class': {'vowel': ['a'], 'wb': [' ']},
 'graph': {'edge': {0: {1: {'token': 'a', 'cost': 0.5849625007211562},
    3: {'token': ' ', 'cost': 0.5849625007211562}},
   1: {2: {'cost': 0.5849625007211562}},
   3: {4: {'cost': 0.5849625007211562}}},
  'node': [{'type': 'Start', 'ordered_children': {'a': [1], ' ': [3]}},
   {'type': 'token', 'token': 'a', 'ordered_children': {'__rules__': [2]}},
   {'type': 'rule',
    'rule_key': 0,
    'accepting': True,
    'ordered_children': {}},
   {'type': 'token', 'token': ' ', 'ordered_children': {'__rules__': [4]}},
   {'type': 'rule',
    'rule_key': 1,
    'accepting': True,
    'ordered_children': {}}],
  'edge_list': [(0, 1), (1, 2), (0, 3), (3, 4)]},
 'tokenizer_pattern': '(a|\\ )',
 'graphtransliterator_version': '0.3.3'}
gt = GraphTransliterator.load(settings)
gt.transliterate('aa')
'A,A'
# can be compacted
settings.pop('onmatch_rules_lookup')
GraphTransliterator.load(settings).transliterate('aa')
'A,A'

See also

dump

Dump Graph Transliterator configuration to Python data types

dumps

Dump Graph Transliterator configuration to JSON string

loads

Load Graph Transliteration from configuration as a JSON string

static loads(settings, **kwargs)[source]

Create GraphTransliterator from JavaScript Object Notation (JSON) string.

Parameters:

settings – JSON settings for GraphTransliterator

Returns:

Graph Transliterator

Return type:

GraphTransliterator

Example

JSON_settings = '''{"tokens": {"a": ["vowel"], " ": ["wb"]}, "rules": [{"production": "A", "prev_classes": null, "prev_tokens": null, "tokens": ["a"], "next_classes": null, "next_tokens": null, "cost": 0.5849625007211562}, {"production": " ", "prev_classes": null, "prev_tokens": null, "tokens": [" "], "next_classes": null, "next_tokens": null, "cost": 0.5849625007211562}], "whitespace": {"default": " ", "token_class": "wb", "consolidate": false}, "onmatch_rules": [{"prev_classes": ["vowel"], "next_classes": ["vowel"], "production": ","}], "metadata": {"author": "Author McAuthorson"}, "ignore_errors": false, "onmatch_rules_lookup": {"a": {"a": [0]}}, "tokens_by_class": {"vowel": ["a"], "wb": [" "]}, "graph": {"node": [{"type": "Start", "ordered_children": {"a": [1], " ": [3]}}, {"type": "token", "token": "a", "ordered_children": {"__rules__": [2]}}, {"type": "rule", "rule_key": 0, "accepting": true, "ordered_children": {}}, {"type": "token", "token": " ", "ordered_children": {"__rules__": [4]}}, {"type": "rule", "rule_key": 1, "accepting": true, "ordered_children": {}}], "edge": {"0": {"1": {"token": "a", "cost": 0.5849625007211562}, "3": {"token": " ", "cost": 0.5849625007211562}}, "1": {"2": {"cost": 0.5849625007211562}}, "3": {"4": {"cost": 0.5849625007211562}}}, "edge_list": [[0, 1], [1, 2], [0, 3], [3, 4]]}, "tokenizer_pattern": "(a| )", "graphtransliterator_version": "1.2.2"}'''

gt = GraphTransliterator.loads(JSON_settings)
gt.transliterate('a')
'A'

See also

dump

Dump Graph Transliterator configuration to Python data types

dumps

Dump Graph Transliterator configuration to JSON string

load

Load Graph Transliteration from configuration in Python data types

match_at(token_i, tokens, match_all=False)[source]

Match best (least costly) transliteration rule at a given index in the input tokens and return the index to that rule. Optionally, return all rules that match.

Parameters:
  • token_i (int) – Location in tokens at which to begin

  • tokens (list of str) – List of tokens

  • match_all (bool, optional) – If true, return the index of all rules matching at the given index. The default is false.

Returns:

Index of matching transliteration rule in GraphTransliterator.rules or None. Returns a list of int or an empty list if match_all is true.

Return type:

int, None, or list of int

Note

Expects a whitespace token at the beginning and end of tokens.

Examples

gt = GraphTransliterator.from_yaml('''
        tokens:
            a: []
            a a: []
            ' ': [wb]
        rules:
            a: <A>
            a a: <AA>
        whitespace:
            default: ' '
            consolidate: True
            token_class: wb
''')
tokens = gt.tokenize("aa")
tokens # whitespace added to ends
[' ', 'a', 'a', ' ']
gt.match_at(1, tokens) # returns index to rule
0
gt.rules[gt.match_at(1, tokens)] # actual rule
TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.41503749927884376)
gt.match_at(1, tokens, match_all=True) # index to rules, with match_all
[0, 1]
[gt.rules[_] for _ in gt.match_at(1, tokens, match_all=True)]
[TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
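
The return values in the example above can be reproduced with a simplified stand-in that scans a cost-sorted rule list in order. This is only a sketch of the selection behavior: the library matches through its graph and also honors previous and next tokens and classes, which are ignored here. The name match_at_sketch and the dict-based rules are hypothetical.

```python
# Rules from the example above, already sorted by cost (least costly first).
rules = [
    {"tokens": ["a", "a"], "production": "<AA>", "cost": 0.41503749927884376},
    {"tokens": ["a"], "production": "<A>", "cost": 0.5849625007211562},
]


def match_at_sketch(token_i, tokens, rules, match_all=False):
    """Return the index of the first matching rule, or all matching indexes."""
    matches = []
    for rule_i, rule in enumerate(rules):
        rule_tokens = rule["tokens"]
        # Compare the rule's tokens with the slice starting at token_i.
        if tokens[token_i:token_i + len(rule_tokens)] == rule_tokens:
            if not match_all:
                return rule_i
            matches.append(rule_i)
    return matches if match_all else None


tokens = [" ", "a", "a", " "]  # whitespace token at both ends
best = match_at_sketch(1, tokens, rules)
every = match_at_sketch(1, tokens, rules, match_all=True)
```

Because the list is ordered by cost, the first hit is the least costly rule, and match_all simply keeps scanning.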

property metadata

Metadata of transliterator

Type:

dict

property onmatch_rules

Rules for productions between matches.

Type:

list of OnMatchRule

property onmatch_rules_lookup

On Match Rules lookup

Type:

dict

property productions

List of productions of each transliteration rule.

Type:

list of str

pruned_of(productions)[source]

Remove transliteration rules with specific output productions.

Parameters:

productions (str, or list of str) – list of productions to remove

Returns:

Graph transliterator pruned of certain productions.

Return type:

graphtransliterator.GraphTransliterator

Note

Uses original initialization parameters to construct a new GraphTransliterator.

Examples

gt = GraphTransliterator.from_yaml('''
        tokens:
            a: []
            a a: []
            ' ': [wb]
        rules:
            a: <A>
            a a: <AA>
        whitespace:
            default: ' '
            consolidate: True
            token_class: wb
''')
gt.rules
[TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
gt.pruned_of('<AA>').rules
[TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
gt.pruned_of(['<A>', '<AA>']).rules
[]

property rules

Transliteration rules sorted by cost.

Type:

list of TransliterationRule

tokenize(input)[source]

Tokenizes an input string.

Adds initial and trailing whitespace, which can be consolidated.

Parameters:

input (str) – String to tokenize

Returns:

List of tokens, with default whitespace token at beginning and end.

Return type:

list of str

Raises:

ValueError – Unrecognizable input, such as a character that is not in a token

Examples

tokens = {'ab': ['class_ab'], ' ': ['wb']}
whitespace = {'default': ' ', 'token_class': 'wb', 'consolidate': True}
rules = {'ab': 'AB', ' ': '_'}
settings = {'tokens': tokens, 'rules': rules, 'whitespace': whitespace}
gt = GraphTransliterator.from_easyreading_dict(settings)
gt.tokenize('ab ')
[' ', 'ab', ' ']

property tokenizer_pattern

Tokenizer pattern from transliterator

Type:

str
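
A tokenizer pattern such as the '(a|\\ )' shown in the dump() example is an alternation of escaped tokens. The sketch below builds a comparable pattern and tokenizes with it. Trying longer tokens first is an assumption, the function names are hypothetical, and whitespace consolidation is left out, so this is not a reproduction of tokenize().

```python
import re


def build_tokenizer_pattern(tokens):
    """Join escaped tokens into one alternation, longest tokens first."""
    ordered = sorted(tokens, key=len, reverse=True)
    return "(" + "|".join(re.escape(token) for token in ordered) + ")"


def tokenize_sketch(text, tokens, default_whitespace=" "):
    """Split text into known tokens and pad both ends with whitespace."""
    pattern = re.compile(build_tokenizer_pattern(tokens))
    found = []
    position = 0
    while position < len(text):
        match = pattern.match(text, position)
        if match is None:
            # Mirrors the documented behavior of raising on unknown input.
            raise ValueError(
                f"Unrecognizable input at index {position}: {text[position]!r}"
            )
        found.append(match.group(0))
        position = match.end()
    return [default_whitespace] + found + [default_whitespace]


example_tokens = tokenize_sketch("ab ab", ["ab", " "])
```

Input containing a character outside every token, such as "abc" here, raises ValueError in this sketch.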

property tokens

Mappings of tokens to their classes.

Type:

dict of {str: set of str}

property tokens_by_class

Tokens keyed by token class, used internally.

Type:

dict of {str: list of str}

transliterate(input)[source]

Transliterate an input string into an output string.

Parameters:

input (str) – Input string to transliterate

Returns:

Transliteration output string

Return type:

str

Raises:

ValueError – Cannot parse input

Note

Whitespace will be temporarily appended to start and end of input string.

Example

GraphTransliterator.from_yaml(
'''
tokens:
  a: []
  ' ': [wb]
rules:
  a: A
  ' ': '_'
whitespace:
  default: ' '
  consolidate: True
  token_class: wb
''').transliterate("a a")
'A_A'

property whitespace

Whitespace rules.

Type:

WhitespaceRules

class graphtransliterator.CoverageTransliterator(*args, **kwargs)[source]

Subclass of GraphTransliterator that logs visits to graph and onmatch rules.

Used to confirm that tests cover the entire graph and onmatch_rules.

check_coverage(raise_exception=True)[source]

Check coverage of graph and onmatch rules.

First checks graph coverage, then checks onmatch rules.

check_onmatchrules_coverage(raise_exception=True)[source]

Check coverage of onmatch rules.

clear_visited()[source]

Clear visited flags from graph and onmatch_rules.

Bundled Transliterators

graphtransliterator.transliterators

Bundled transliterators are loaded by explicitly importing graphtransliterator.transliterators. Each is an instance of graphtransliterator.bundled.Bundled.

class graphtransliterator.transliterators.Bundled(*args, **kwargs)[source]

Subclass of GraphTransliterator used for bundled Graph Transliterator.

property directory

Directory of bundled transliterator, used to load settings.

from_JSON(check_ambiguity=False, coverage=False, **kwargs)[source]

Initialize from bundled JSON file (best for speed).

Parameters:
  • check_ambiguity (bool) – Whether ambiguity should be checked. Default is False.

  • coverage (bool) – Whether test coverage should be checked. Default is False.

from_YAML(check_ambiguity=True, coverage=True, **kwargs)[source]

Initialize from bundled YAML file (best for development).

Parameters:
  • check_ambiguity (bool) – Whether ambiguity should be checked. Default is True.

  • coverage (bool) – Whether test coverage should be checked. Default is True.

generate_yaml_tests(file=None)[source]

Generates YAML tests with complete coverage.

Uses the first token in a class as a sample. Assumes for onmatch rules that the first sample token in a class has a unique production, which may not be the case. These should be checked and edited.

load_yaml_tests()[source]

Iterator for YAML tests.

Assumes tests are found in subdirectory tests of module with name NAME_tests.yaml, e.g. source_to_target/tests/source_to_target_tests.yaml.

property name

Name of bundled transliterator, e.g. ‘Example’

classmethod new(method='json', **kwargs)[source]

Return a new class instance from method (json/yaml).

Parameters:

method (str (json or yaml)) – How to load bundled transliterator, JSON or YAML.

run_tests(transliteration_tests)[source]

Run transliteration tests.

Parameters:

transliteration_tests (dict of {str:str}) – Dictionary of test from source -> correct target.

run_yaml_tests()[source]

Run YAML tests in MODULE/tests/MODULE_tests.yaml

property yaml_tests_filen

Metadata of transliterator

Type:

dict

class graphtransliterator.transliterators.Example(**kwargs)[source]

Example Bundled Graph Transliterator.

class graphtransliterator.transliterators.ITRANSDevanagariToUnicode(**kwargs)[source]

ITRANS Devanagari to Unicode Transliterator.

class graphtransliterator.transliterators.MetadataSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]

Schema for Bundled metadata.

graphtransliterator.transliterators.iter_names()[source]

Iterate through bundled transliterator names.

graphtransliterator.transliterators.iter_transliterators(**kwds)[source]

Iterate through instances of bundled transliterators.

Graph Classes

class graphtransliterator.DirectedGraph(node=None, edge=None, edge_list=None)[source]

A very basic dictionary- and list-based directed graph. Nodes are a list of dictionaries of node data. Edges are nested dictionaries keyed from the head -> tail -> edge properties. An edge list is maintained. Can be exported as a dictionary.

node

List of node data

Type:

list of dict

edge

Mapping from head to tail of edge, holding edge data

Type:

dict of {int: dict of {int: dict}}

edge_list

List of head and tail of each edge

Type:

list of tuple of (int, int)
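
The three attributes above can be pictured with plain Python containers. The class below is a hypothetical stand-in (TinyDirectedGraph is not part of the library) that mirrors the documented return values and ValueError conditions of add_node() and add_edge(). It is a sketch of the data layout, not the library's implementation.

```python
class TinyDirectedGraph:
    """Illustrative list- and dict-based directed graph."""

    def __init__(self):
        self.node = []       # list of dict: node data, indexed by node key
        self.edge = {}       # {head: {tail: edge data}}
        self.edge_list = []  # list of (head, tail) tuples

    def add_node(self, node_data=None):
        node_data = {} if node_data is None else node_data
        if not isinstance(node_data, dict):
            raise ValueError("node_data must be a dict")
        self.node.append(node_data)
        return len(self.node) - 1, node_data

    def add_edge(self, head, tail, edge_data=None):
        edge_data = {} if edge_data is None else edge_data
        if not isinstance(edge_data, dict):
            raise ValueError("edge_data must be a dict")
        for index in (head, tail):
            if not isinstance(index, int) or not 0 <= index < len(self.node):
                raise ValueError(f"invalid node index: {index!r}")
        self.edge.setdefault(head, {})[tail] = edge_data
        self.edge_list.append((head, tail))
        return edge_data


g = TinyDirectedGraph()
first = g.add_node()
second = g.add_node({"datakey1": "data value"})
created = g.add_edge(0, 1, {"data_key_1": "some edge data here"})
```

Keeping edge_list alongside the nested edge dictionary trades a little redundancy for cheap iteration over all edges.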

Examples

from graphtransliterator import DirectedGraph
DirectedGraph()
<graphtransliterator.graphs.DirectedGraph at 0x7f4e404ee840>

add_edge(head, tail, edge_data=None)[source]

Add an edge to a graph and return its attributes as dict.

Parameters:
  • head (int) – Index of head of edge

  • tail (int) – Index of tail of edge

  • edge_data (dict, default {}) – Edge data

Returns:

Data of created edge

Return type:

dict

Raises:

ValueError – Invalid head or tail, or edge_data is not a dict.

Examples

g = DirectedGraph()
g.add_node()
(0, {})
g.add_node()
(1, {})
g.add_edge(0,1, {'data_key_1': 'some edge data here'})
{'data_key_1': 'some edge data here'}
g.edge
{0: {1: {'data_key_1': 'some edge data here'}}}

add_node(node_data=None)[source]

Create node and return (int, dict) of node key and object.

Parameters:

node_data (dict, default {}) – Data to be stored in created node

Returns:

Index of created node and its data

Return type:

tuple of (int, dict)

Raises:

ValueError – node_data is not a dict

Examples

g = DirectedGraph()
g.add_node()
(0, {})
g.add_node({'datakey1': 'data value'})
(1, {'datakey1': 'data value'})
g.node
[{}, {'datakey1': 'data value'}]

class graphtransliterator.VisitLoggingDirectedGraph(graph)[source]

A DirectedGraph that logs visits to all nodes and edges.

Used to measure the coverage of tests for bundled transliterators.

check_coverage(raise_exception=True)[source]

Checks that all nodes and edges are visited.

Parameters:

raise_exception (bool, default True) – Raise IncompleteGraphCoverageException if coverage is incomplete

Raises:

IncompleteGraphCoverageException – Not all nodes/edges of a graph have been visited.

clear_visited()[source]

Clear all visited attributes on nodes and edges.

Rule Classes

class graphtransliterator.TransliterationRule(production, prev_classes, prev_tokens, tokens, next_tokens, next_classes, cost)[source]

A transliteration rule containing the specific match conditions and string output to be produced, as well as the rule’s cost.

production

Output produced on match of rule

Type:

str

prev_classes

List of previous token classes to be matched before tokens or, if they exist, prev_tokens

Type:

list of str, or None

prev_tokens

List of tokens to be matched before tokens

Type:

list of str, or None

tokens

List of tokens to match

Type:

list of str

next_tokens

List of tokens to match after tokens

Type:

list of str, or None

next_classes

List of token classes to match after tokens or, if they exist, next_tokens

Type:

list of str, or None

cost

Cost of the rule, where less specific rules are more costly

Type:

float
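
The cost values printed in the examples of this reference (0.5849625007211562 for a one-token rule, 0.41503749927884376 for a two-token rule) are consistent with log2(1 + 1/(1 + n)), where n is the number of tokens matched. That formula is inferred from those two values and should be read as an assumption. The examples contain no context constraints, so how previous and next tokens or classes contribute is not shown here. The function name cost_sketch is hypothetical.

```python
import math


def cost_sketch(n_matched):
    """Cost that shrinks as a rule matches more items (inferred formula)."""
    return math.log2(1 + 1 / (1 + n_matched))


one_token = cost_sketch(1)   # rule matching a single token
two_tokens = cost_sketch(2)  # rule matching two tokens
```

Under this reading a more specific rule receives a lower cost, which is why the two-token rule is listed before the one-token rule in gt.rules.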

class graphtransliterator.OnMatchRule(prev_classes, next_classes, production)[source]

Rules about adding text between certain combinations of matched rules.

When a transliteration rule has been found and before its production is added to the output, the production string of an OnMatchRule is added if the previously matched tokens and the current tokens are of the specified classes.

prev_classes

List of previously matched token classes required

Type:

list of str

next_classes

List of current and following token classes required

Type:

list of str

production

String to be added before the current rule's production

Type:

str
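
The insertion behavior described above can be sketched with single-token rules. Everything in the block (the tuple-based rules, onmatch_insertion, transliterate_sketch) is hypothetical and simplified; it only shows when an on-match production is emitted relative to the rule's own production.

```python
token_classes = {"a": {"vowel"}, " ": {"wb"}}
productions = {"a": "A", " ": " "}
# (prev_classes, next_classes, production), as in the examples of this page.
onmatch_rules = [(["vowel"], ["vowel"], ",")]


def onmatch_insertion(prev_tokens, next_tokens):
    """Return the production of the first on-match rule that applies."""
    for prev_classes, next_classes, production in onmatch_rules:
        if len(prev_classes) > len(prev_tokens):
            continue
        if len(next_classes) > len(next_tokens):
            continue
        tail = prev_tokens[len(prev_tokens) - len(prev_classes):]
        head = next_tokens[:len(next_classes)]
        prev_ok = all(c in token_classes[t] for c, t in zip(prev_classes, tail))
        next_ok = all(c in token_classes[t] for c, t in zip(next_classes, head))
        if prev_ok and next_ok:
            return production
    return ""


def transliterate_sketch(tokens):
    """Emit any on-match production before each token's own production."""
    output = []
    for i, token in enumerate(tokens):
        output.append(onmatch_insertion(tokens[:i], tokens[i:]))
        output.append(productions[token])
    return "".join(output)


joined = transliterate_sketch(["a", "a"])
spaced = transliterate_sketch(["a", " ", "a"])
```

Two adjacent vowels receive the comma between them, while a whitespace token between them prevents the rule from applying.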

class graphtransliterator.WhitespaceRules(default, token_class, consolidate)[source]

Whitespace rules of GraphTransliterator.

default

Default whitespace token

Type:

str

token_class

Whitespace token class

Type:

str

consolidate

Consolidate consecutive whitespace tokens and render as a single instance of the specified default whitespace token.

Type:

bool
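
As a small illustration of the consolidate setting, the sketch below collapses runs of whitespace tokens into one default whitespace token. The function consolidate_whitespace and the tab token are hypothetical; the library applies consolidation during tokenization and its details may differ.

```python
def consolidate_whitespace(tokens, whitespace_tokens, default=" "):
    """Replace each run of whitespace tokens with a single default token."""
    result = []
    previous_was_whitespace = False
    for token in tokens:
        if token in whitespace_tokens:
            if not previous_was_whitespace:
                result.append(default)
            previous_was_whitespace = True
        else:
            result.append(token)
            previous_was_whitespace = False
    return result


raw = [" ", " ", "ab", "\t", " ", " "]
consolidated = consolidate_whitespace(raw, {" ", "\t"})
```

This matches the tokenize() example earlier in this reference, where 'ab ' with consolidation enabled yields a single whitespace token on each side of 'ab'.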

Exceptions

exception graphtransliterator.GraphTransliteratorException[source]

Base exception class. All Graph Transliterator-specific exceptions should subclass this class.

exception graphtransliterator.AmbiguousTransliterationRulesException[source]

Raised when multiple transliteration rules can match the same pattern. Details of ambiguities are given in a logging.warning().

exception graphtransliterator.NoMatchingTransliterationRuleException[source]

Raised when no transliteration rule can be matched at a particular location in the input string’s tokens. Details of the location are given in a logging.warning().

exception graphtransliterator.UnrecognizableInputTokenException[source]

Raised when a character in the input string does not correspond to any tokens in the GraphTransliterator’s token settings. Details of the location are given in a logging.warning().

Schemas

class graphtransliterator.DirectedGraphSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]

Schema for DirectedGraph.

Validates graph somewhat rigorously.

class graphtransliterator.EasyReadingSettingsSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]

Schema for easy reading settings.

Provides initial validation based on easy reading format.

class graphtransliterator.GraphTransliteratorSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]

Schema for Graph Transliterator.

class graphtransliterator.OnMatchRuleSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]

Schema for OnMatchRule.

class graphtransliterator.SettingsSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]

Schema for settings in dictionary format.

Performs validation.

class graphtransliterator.TransliterationRuleSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]

Schema for TransliterationRule.

class graphtransliterator.WhitespaceDictSettingsSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]

Schema for Whitespace definition as a dict.

class graphtransliterator.WhitespaceSettingsSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]

Schema for Whitespace definition that loads as WhitespaceRules.