API Reference

The full API reference for all public classes and functions follows.

Public members can (and should) be imported from graphtransliterator:

from graphtransliterator import GraphTransliterator

Bundled transliterators require that graphtransliterator.transliterators be imported:

import graphtransliterator.transliterators
graphtransliterator.transliterators.iter_names()

Core Classes

class graphtransliterator.GraphTransliterator(tokens, rules, whitespace, onmatch_rules=None, metadata=None, ignore_errors=False, check_ambiguity=True, onmatch_rules_lookup=None, tokens_by_class=None, graph=None, tokenizer_pattern=None, graphtransliterator_version=None, **kwargs)

A graph-based transliteration tool that lets you convert the symbols of one language or script to those of another using rules that you define.

Transliteration of the tokens of an input string into an output string is configured by a set of input token types with classes; pattern-matching rules involving sequences of tokens, as well as preceding or following tokens and token classes; insertion rules between matches; and optional consolidation of whitespace. Rules are ordered by specificity.

Note

This constructor does not validate settings and should typically not be called directly. Use from_dict() instead. For “easy reading” support, use from_easyreading_dict(), from_yaml(), or from_yaml_file(). Keyword parameters used here (ignore_errors, check_ambiguity) can be passed from those other constructors.

Parameters
  • tokens (dict of {str: set of str}) – Mapping of input token types to token classes

  • rules (list of TransliterationRule) – list of transliteration rules ordered by cost

  • onmatch_rules (list of OnMatchRule, or None) – Rules for output to be inserted between tokens of certain classes when a transliteration rule has been matched but before its production string has been added to the output

  • whitespace (WhitespaceRules) – Rules for handling whitespace

  • metadata (dict or None) – Metadata settings

  • ignore_errors (bool, optional) – If true, transliteration errors are ignored and do not raise an exception. The default is false.

  • check_ambiguity (bool, optional) – If true (default), transliteration rules are checked for ambiguity. load() and loads() do not check ambiguity by default.

  • onmatch_rules_lookup (dict of {str: dict of {str: list of int}}, optional) – OnMatchRules lookup, used internally, will be generated if not present.

  • tokens_by_class (dict of {str: set of str}, optional) – Tokens by class, used internally, will be generated if not present.

  • graph (DirectedGraph, optional) – Directed graph used by Graph Transliterator, will be generated if not present.

  • tokenizer_pattern (str, optional) – Regular expression pattern for input string tokenization, will be generated if not present.

  • graphtransliterator_version (str, optional) – Version of graphtransliterator, added by dump() and dumps().

Example

from graphtransliterator import GraphTransliterator, OnMatchRule, TransliterationRule, WhitespaceRules
settings = {'tokens': {'a': {'vowel'}, ' ': {'wb'}},
            'onmatch_rules': [OnMatchRule(prev_classes=['vowel'], next_classes=['vowel'], production=',')],
            'rules': [TransliterationRule(production='A', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
                      TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562)],
            'metadata': {'author': 'Author McAuthorson'},
            'whitespace': WhitespaceRules(default=' ', token_class='wb', consolidate=False)}
gt = GraphTransliterator(**settings)
gt.transliterate('a')
'A'

See also

from_dict

Constructor from dictionary of settings

from_easyreading_dict

Constructor from dictionary in “easy reading” format

from_yaml

Constructor from YAML string in “easy reading” format

from_yaml_file

Constructor from YAML file in “easy reading” format

dump(compression_level=0)

Dump configuration of Graph Transliterator to Python data types.

Compression is turned off by default.

Parameters

compression_level (int) – One of 0 (default, no compression), 1 (compression including graph), or 2 (compression without graph)

Returns

GraphTransliterator configuration as a dictionary with keys:

"tokens"

Mappings of tokens to their classes (OrderedDict of {str: list of str})

"rules"

Transliteration rules in direct format (list of dict of {str: str})

"whitespace"

Whitespace settings (dict of {str: str})

"onmatch_rules"

On match rules (list of OrderedDict)

"metadata"

Dictionary of metadata (dict)

"ignore_errors"

Ignore errors in transliteration (bool)

"onmatch_rules_lookup"

Dictionary keyed by current token and then by previous token, containing a list of indexes of applicable OnMatchRules to try (dict of {str: dict of {str: list of int}})

"tokens_by_class"

Tokens keyed by token class, used internally (dict of {str: list of str})

"graph"

Serialization of DirectedGraph (dict)

"tokenizer_pattern"

Regular expression for tokenizing (str)

"graphtransliterator_version"

Module version of graphtransliterator (str)

Return type

OrderedDict

Example

yaml_ = '''
tokens:
  a: [vowel]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: " "
  consolidate: false
  token_class: wb
onmatch_rules:
  - <vowel> + <vowel>: ','  # add a comma between vowels
metadata:
  author: "Author McAuthorson"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.dump()
OrderedDict([('tokens', {'a': ['vowel'], ' ': ['wb']}),
             ('rules',
              [OrderedDict([('production', 'A'),
                            ('tokens', ['a']),
                            ('cost', 0.5849625007211562)]),
               OrderedDict([('production', ' '),
                            ('tokens', [' ']),
                            ('cost', 0.5849625007211562)])]),
             ('whitespace',
              {'token_class': 'wb', 'default': ' ', 'consolidate': False}),
             ('onmatch_rules',
              [OrderedDict([('prev_classes', ['vowel']),
                            ('next_classes', ['vowel']),
                            ('production', ',')])]),
             ('metadata', {'author': 'Author McAuthorson'}),
             ('ignore_errors', False),
             ('onmatch_rules_lookup', {'a': {'a': [0]}}),
             ('tokens_by_class', {'vowel': ['a'], 'wb': [' ']}),
             ('graph',
              {'edge': {0: {1: {'token': 'a', 'cost': 0.5849625007211562},
                 3: {'token': ' ', 'cost': 0.5849625007211562}},
                1: {2: {'cost': 0.5849625007211562}},
                3: {4: {'cost': 0.5849625007211562}}},
               'edge_list': [(0, 1), (0, 3), (1, 2), (3, 4)],
               'node': [{'ordered_children': {'a': [1], ' ': [3]},
                 'type': 'Start'},
                {'token': 'a',
                 'ordered_children': {'__rules__': [2]},
                 'type': 'token'},
                {'accepting': True, 'type': 'rule', 'rule_key': 0},
                {'token': ' ',
                 'ordered_children': {'__rules__': [4]},
                 'type': 'token'},
                {'accepting': True, 'type': 'rule', 'rule_key': 1}]}),
             ('tokenizer_pattern', '(a|\\ )'),
             ('graphtransliterator_version', '1.2.2')])

See also

dumps

Dump Graph Transliterator configuration to JSON string

load

Load Graph Transliteration from configuration in Python data types

loads

Load Graph Transliteration from configuration as a JSON string

dumps(compression_level=2)

Dump settings of Graph Transliterator to JavaScript Object Notation (JSON).

Compression is turned on by default.

Parameters
  • compression_level (int) – One of 0 (no compression), 1 (compression including graph), or 2 (default, compression without graph)

  • separators (tuple of str) – Separators used by json.dumps(); the default is compact

Returns

JSON string

Return type

str

Examples

yaml_ = '''
  tokens:
    a: [vowel]
    ' ': [wb]
  rules:
    a: A
    ' ': ' '
  whitespace:
    default: " "
    consolidate: false
    token_class: wb
  onmatch_rules:
    - <vowel> + <vowel>: ','  # add a comma between vowels
  metadata:
    author: "Author McAuthorson"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.dumps()
'{"graphtransliterator_version":"1.2.2","compressed_settings":[["vowel","wb"],[" ","a"],[[1],[0]],[["A",0,0,[1],0,0,-1],[" ",0,0,[0],0,0,-1]],[" ","wb",0],[[[0],[0],","]],{"author":"Author McAuthorson"},null]}'

See also

dump

Dump Graph Transliterator configuration to Python data types

load

Load Graph Transliteration from configuration in Python data types

loads

Load Graph Transliteration from configuration as a JSON string

static from_dict(dict_settings, **kwargs)

Generate GraphTransliterator from dict settings.

Parameters

dict_settings (dict) – Dictionary of settings

Returns

Graph transliterator

Return type

GraphTransliterator

static from_easyreading_dict(easyreading_settings, **kwargs)

Constructs GraphTransliterator from a dictionary of settings in “easy reading” format, i.e. the loaded contents of a YAML string.

Parameters

easyreading_settings (dict) –

Settings dictionary in easy reading format with keys:

"tokens"

Mappings of tokens to their classes (dict of {str: list of str})

"rules"

Transliteration rules in “easy reading” format (dict of {str: str})

"onmatch_rules"

On match rules in “easy reading” format (list of dict of {str: str}, optional)

"whitespace"

Whitespace definitions, including the default whitespace token, the class of whitespace tokens, and whether or not to consolidate (dict of {‘default’: str, ‘token_class’: str, ‘consolidate’: bool}, optional)

"metadata"

Dictionary of metadata (dict, optional)

Returns

Graph Transliterator

Return type

GraphTransliterator

Note

Called by from_yaml().

Example

tokens = {
    'ab': ['class_ab'],
    ' ': ['wb']
}
whitespace = {
    'default': ' ',
    'token_class': 'wb',
    'consolidate': True
}
onmatch_rules = [
    {'<class_ab> + <class_ab>': ','}
]
rules = {'ab': 'AB',
         ' ': '_'}
settings = {'tokens': tokens,
            'rules': rules,
            'whitespace': whitespace,
            'onmatch_rules': onmatch_rules}
gt = GraphTransliterator.from_easyreading_dict(settings)
gt.transliterate("ab abab")
'AB_AB,AB'
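
In the “easy reading” format, each onmatch rule is a one-entry mapping. Its key names the previous and next token classes in angle brackets, separated by "+". The sketch below shows how such keys could be turned into (prev_classes, next_classes, production) triples. It is not the library's parser. The helper name parse_onmatch_rules is hypothetical, and it handles only the "<class> + <class>" form used in these examples.

```python
import re
from typing import Dict, List, Tuple

# Matches a class name written in angle brackets, e.g. "<class_ab>".
CLASS_RE = re.compile(r"<([^<>]+)>")


def parse_onmatch_rules(
    rules: List[Dict[str, str]]
) -> List[Tuple[List[str], List[str], str]]:
    """Hypothetical parser for easy-reading onmatch rules such as
    {'<class_ab> + <class_ab>': ','}; only the '<...> + <...>' form is handled."""
    parsed = []
    for rule in rules:
        for key, production in rule.items():
            parts = key.split("+")
            if len(parts) != 2:
                raise ValueError(f"Expected exactly one '+' in {key!r}")
            prev_classes = CLASS_RE.findall(parts[0])
            next_classes = CLASS_RE.findall(parts[1])
            if not prev_classes or not next_classes:
                raise ValueError(f"Expected <class> names on both sides of {key!r}")
            parsed.append((prev_classes, next_classes, production))
    return parsed


print(parse_onmatch_rules([{"<class_ab> + <class_ab>": ","}]))
# [(['class_ab'], ['class_ab'], ',')]
```

The result has the same shape as the OnMatchRule shown in the dump() output. There, "<vowel> + <vowel>: ','" becomes prev_classes=['vowel'], next_classes=['vowel'], production=','.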

See also

from_yaml

Constructor from YAML string in “easy reading” format

from_yaml_file

Constructor from YAML file in “easy reading” format

static from_yaml(yaml_str, charnames_escaped=True, **kwargs)

Construct GraphTransliterator from a YAML string.

Parameters
  • yaml_str (str) – YAML mappings of tokens, rules, and (optionally) onmatch_rules

  • charnames_escaped (bool) – Unescape Unicode during YAML read (default True)

Note

Called by from_yaml_file() and calls from_easyreading_dict().

Example

yaml_ = '''
tokens:
  a: [class1]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: ' '
  consolidate: True
  token_class: wb
onmatch_rules:
  - <class1> + <class1>: "+"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.transliterate("a aa")
'A A+A'

See also

from_easyreading_dict

Constructor from dictionary in “easy reading” format

from_yaml_file

Constructor from YAML file in “easy reading” format

static from_yaml_file(yaml_filename, **kwargs)

Construct GraphTransliterator from YAML file.

Parameters

yaml_filename (str) – Name of YAML file, containing tokens, rules, and (optionally) onmatch_rules

Note

Calls from_yaml().

See also

from_yaml

Constructor from YAML string in “easy reading” format

from_easyreading_dict

Constructor from dictionary in “easy reading” format

property graph

Graph used in transliteration.

Type

DirectedGraph

property graphtransliterator_version

Graph Transliterator version.

Type

str

property ignore_errors

Ignore transliteration errors setting.

Type

bool

property last_input_tokens

Last tokenization of the input string, with whitespace at start and end.

Type

list of str

property last_matched_rule_tokens

Last matched tokens for each rule.

Type

list of list of str

property last_matched_rules

Last transliteration rules matched.

Type

list of TransliterationRule

static load(settings, **kwargs)

Create GraphTransliterator from settings as Python data types.

Parameters

settings

GraphTransliterator configuration as a dictionary with keys:

"tokens"

Mappings of tokens to their classes (dict of {str: list of str})

"rules"

Transliteration rules in direct format (list of OrderedDict of {str: str})

"whitespace"

Whitespace settings (dict of {str: str})

"onmatch_rules"

On match rules (list of OrderedDict, optional)

"metadata"

Dictionary of metadata (dict, optional)

"ignore_errors"

Ignore errors in transliteration (bool, optional)

"onmatch_rules_lookup"

Dictionary keyed by current token and then by previous token, containing a list of indexes of applicable OnMatchRules to try (dict of {str: dict of {str: list of int}}, optional)

"tokens_by_class"

Tokens keyed by token class, used internally (dict of {str: list of str}, optional)

"graph"

Serialization of DirectedGraph (dict, optional)

"tokenizer_pattern"

Regular expression for tokenizing (str, optional)

"graphtransliterator_version"

Module version of graphtransliterator (str, optional)

Returns

Graph Transliterator

Return type

GraphTransliterator

Example

from collections import OrderedDict
settings = {'tokens': {'a': ['vowel'], ' ': ['wb']},
 'rules': [OrderedDict([('production', 'A'),
               # Can be compacted, removing None values
               # ('prev_tokens', None),
               ('tokens', ['a']),
               ('next_classes', None),
               ('next_tokens', None),
               ('cost', 0.5849625007211562)]),
  OrderedDict([('production', ' '),
               ('prev_classes', None),
               ('prev_tokens', None),
               ('tokens', [' ']),
               ('next_classes', None),
               ('next_tokens', None),
               ('cost', 0.5849625007211562)])],
 'whitespace': {'default': ' ', 'token_class': 'wb', 'consolidate': False},
 'onmatch_rules': [OrderedDict([('prev_classes', ['vowel']),
               ('next_classes', ['vowel']),
               ('production', ',')])],
 'metadata': {'author': 'Author McAuthorson'},
 'onmatch_rules_lookup': {'a': {'a': [0]}},
 'tokens_by_class': {'vowel': ['a'], 'wb': [' ']},
 'graph': {'edge': {0: {1: {'token': 'a', 'cost': 0.5849625007211562},
    3: {'token': ' ', 'cost': 0.5849625007211562}},
   1: {2: {'cost': 0.5849625007211562}},
   3: {4: {'cost': 0.5849625007211562}}},
  'node': [{'type': 'Start', 'ordered_children': {'a': [1], ' ': [3]}},
   {'type': 'token', 'token': 'a', 'ordered_children': {'__rules__': [2]}},
   {'type': 'rule',
    'rule_key': 0,
    'accepting': True,
    'ordered_children': {}},
   {'type': 'token', 'token': ' ', 'ordered_children': {'__rules__': [4]}},
   {'type': 'rule',
    'rule_key': 1,
    'accepting': True,
    'ordered_children': {}}],
  'edge_list': [(0, 1), (1, 2), (0, 3), (3, 4)]},
 'tokenizer_pattern': '(a|\\ )',
 'graphtransliterator_version': '0.3.3'}
gt = GraphTransliterator.load(settings)
gt.transliterate('aa')
'A,A'
# can be compacted: onmatch_rules_lookup is regenerated if not present
del settings['onmatch_rules_lookup']
GraphTransliterator.load(settings).transliterate('aa')
'A,A'

See also

dump

Dump Graph Transliterator configuration to Python data types

dumps

Dump Graph Transliterator configuration to JSON string

loads

Load Graph Transliteration from configuration as a JSON string

static loads(settings, **kwargs)

Create GraphTransliterator from JavaScript Object Notation (JSON) string.

Parameters

settings – JSON settings for GraphTransliterator

Returns

Graph Transliterator

Return type

GraphTransliterator

Example

JSON_settings = '''{"tokens": {"a": ["vowel"], " ": ["wb"]}, "rules": [{"production": "A", "prev_classes": null, "prev_tokens": null, "tokens": ["a"], "next_classes": null, "next_tokens": null, "cost": 0.5849625007211562}, {"production": " ", "prev_classes": null, "prev_tokens": null, "tokens": [" "], "next_classes": null, "next_tokens": null, "cost": 0.5849625007211562}], "whitespace": {"default": " ", "token_class": "wb", "consolidate": false}, "onmatch_rules": [{"prev_classes": ["vowel"], "next_classes": ["vowel"], "production": ","}], "metadata": {"author": "Author McAuthorson"}, "ignore_errors": false, "onmatch_rules_lookup": {"a": {"a": [0]}}, "tokens_by_class": {"vowel": ["a"], "wb": [" "]}, "graph": {"node": [{"type": "Start", "ordered_children": {"a": [1], " ": [3]}}, {"type": "token", "token": "a", "ordered_children": {"__rules__": [2]}}, {"type": "rule", "rule_key": 0, "accepting": true, "ordered_children": {}}, {"type": "token", "token": " ", "ordered_children": {"__rules__": [4]}}, {"type": "rule", "rule_key": 1, "accepting": true, "ordered_children": {}}], "edge": {"0": {"1": {"token": "a", "cost": 0.5849625007211562}, "3": {"token": " ", "cost": 0.5849625007211562}}, "1": {"2": {"cost": 0.5849625007211562}}, "3": {"4": {"cost": 0.5849625007211562}}}, "edge_list": [[0, 1], [1, 2], [0, 3], [3, 4]]}, "tokenizer_pattern": "(a| )", "graphtransliterator_version": "1.2.2"}'''

gt = GraphTransliterator.loads(JSON_settings)
gt.transliterate('a')
'A'
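
In the JSON passed to loads(), the graph's integer node keys appear as strings ("0", "1", and so on), and edge_list pairs appear as arrays. This happens because JSON object keys are always strings and JSON has no tuple type. The sketch below restores the Python-side shape seen in load() and dump() using only the standard json module. The helper graph_from_json is hypothetical and not part of graphtransliterator, which does its own deserialization.

```python
import json
from typing import Any, Dict


def graph_from_json(text: str) -> Dict[str, Any]:
    """Hypothetical helper: rebuild int edge keys and tuple edges after json.loads."""
    raw = json.loads(text)
    edge = {
        int(head): {int(tail): data for tail, data in tails.items()}
        for head, tails in raw["edge"].items()
    }
    edge_list = [tuple(pair) for pair in raw["edge_list"]]
    return {"node": raw["node"], "edge": edge, "edge_list": edge_list}


# A graph serialization in the form shown by dump(): int keys, tuple edges.
graph = {
    "node": [
        {"type": "Start", "ordered_children": {"a": [1]}},
        {"type": "token", "token": "a", "ordered_children": {"__rules__": [2]}},
        {"type": "rule", "rule_key": 0, "accepting": True, "ordered_children": {}},
    ],
    "edge": {0: {1: {"token": "a", "cost": 0.5849625007211562}},
             1: {2: {"cost": 0.5849625007211562}}},
    "edge_list": [(0, 1), (1, 2)],
}

text = json.dumps(graph)
print('"0"' in text)                   # True: JSON turned the int keys into strings
print(graph_from_json(text) == graph)  # True: the original shape is restored
```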

See also

dump

Dump Graph Transliterator configuration to Python data types

dumps

Dump Graph Transliterator configuration to JSON string

load

Load Graph Transliteration from configuration in Python data types

match_at(token_i, tokens, match_all=False)

Match best (least costly) transliteration rule at a given index in the input tokens and return the index to that rule. Optionally, return all rules that match.

Parameters
  • token_i (int) – Location in tokens at which to begin

  • tokens (list of str) – List of tokens

  • match_all (bool, optional) – If true, return the index of all rules matching at the given index. The default is false.

Returns

Index of matching transliteration rule in GraphTransliterator.rules or None. Returns a list of int or an empty list if match_all is true.

Return type

int, None, or list of int

Note

Expects a whitespace token at the beginning and end of tokens.

Examples

gt = GraphTransliterator.from_yaml('''
        tokens:
            a: []
            a a: []
            ' ': [wb]
        rules:
            a: <A>
            a a: <AA>
        whitespace:
            default: ' '
            consolidate: True
            token_class: wb
''')
tokens = gt.tokenize("aa")
tokens # whitespace added to ends
[' ', 'a', 'a', ' ']
gt.match_at(1, tokens) # returns index to rule
0
gt.rules[gt.match_at(1, tokens)] # actual rule
TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.4150374992788437)
gt.match_at(1, tokens, match_all=True) # index to rules, with match_all
[0, 1]
[gt.rules[_] for _ in gt.match_at(1, tokens, match_all=True)]
[TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.4150374992788437),
 TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
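
match_at() works through the transliterator's graph. The sketch below illustrates the same selection rule more simply. It scans a plain list of rules, keeps those whose tokens (and optional previous or next tokens) match at token_i, and returns the least costly one. When match_all is true it returns all of them, ordered by cost. The Rule class and the functions are hypothetical. Token classes are omitted, and a linear scan stands in for the library's graph traversal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """Hypothetical stand-in for TransliterationRule; token classes are omitted."""
    production: str
    tokens: List[str]
    cost: float
    prev_tokens: Optional[List[str]] = None
    next_tokens: Optional[List[str]] = None


def matches(rule: Rule, token_i: int, tokens: List[str]) -> bool:
    """True if the rule's tokens, and any context tokens, match at token_i."""
    end = token_i + len(rule.tokens)
    if tokens[token_i:end] != rule.tokens:
        return False
    if rule.prev_tokens:
        start = token_i - len(rule.prev_tokens)
        if start < 0 or tokens[start:token_i] != rule.prev_tokens:
            return False
    if rule.next_tokens and tokens[end:end + len(rule.next_tokens)] != rule.next_tokens:
        return False
    return True


def match_at(token_i: int, tokens: List[str], rules: List[Rule],
             match_all: bool = False):
    """Index of the least costly matching rule (or None); all matches if match_all."""
    hits = [i for i, rule in enumerate(rules) if matches(rule, token_i, tokens)]
    if match_all:
        return sorted(hits, key=lambda i: rules[i].cost)
    return min(hits, key=lambda i: rules[i].cost) if hits else None


rules = [
    Rule("<AA>", ["a", "a"], 0.4150374992788437),
    Rule("<A>", ["a"], 0.5849625007211562),
]
tokens = [" ", "a", "a", " "]
print(match_at(1, tokens, rules))                  # 0
print(match_at(1, tokens, rules, match_all=True))  # [0, 1]
print(match_at(2, tokens, rules))                  # 1
```

The results for position 1 agree with the match_at() example above, where the two-token rule '<AA>' wins because it is cheaper.
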
property metadata

Metadata of transliterator

Type

dict

property onmatch_rules

Rules for productions between matches.

Type

list of OnMatchRules

property onmatch_rules_lookup

On Match Rules lookup

Type

dict

property productions

List of productions of each transliteration rule.

Type

list of str

pruned_of(productions)

Remove transliteration rules with specific output productions.

Parameters

productions (str, or list of str) – list of productions to remove

Returns

Graph transliterator pruned of certain productions.

Return type

graphtransliterator.GraphTransliterator

Note

Uses original initialization parameters to construct a new GraphTransliterator.

Examples

gt = GraphTransliterator.from_yaml('''
        tokens:
            a: []
            a a: []
            ' ': [wb]
        rules:
            a: <A>
            a a: <AA>
        whitespace:
            default: ' '
            consolidate: True
            token_class: wb
''')
gt.rules
[TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.4150374992788437),
 TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
gt.pruned_of('<AA>').rules
[TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
gt.pruned_of(['<A>', '<AA>']).rules
[]
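
pruned_of() accepts either a single production or a list of them, and it builds a new transliterator from the original settings without the matching rules. The filtering step on its own can be sketched as follows. pruned_rules and the minimal Rule tuple are hypothetical stand-ins, and the sketch does not rebuild a graph.

```python
from typing import List, NamedTuple, Union


class Rule(NamedTuple):
    """Hypothetical stand-in for TransliterationRule with only the fields used here."""
    production: str
    tokens: List[str]
    cost: float


def pruned_rules(rules: List[Rule], productions: Union[str, List[str]]) -> List[Rule]:
    """Return the rules whose production is not in productions, keeping their order."""
    if isinstance(productions, str):
        productions = [productions]  # accept a single production, like pruned_of()
    unwanted = set(productions)
    return [rule for rule in rules if rule.production not in unwanted]


rules = [
    Rule("<AA>", ["a", "a"], 0.4150374992788437),
    Rule("<A>", ["a"], 0.5849625007211562),
]
print([r.production for r in pruned_rules(rules, "<AA>")])           # ['<A>']
print([r.production for r in pruned_rules(rules, ["<A>", "<AA>"])])  # []
```
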
property rules

Transliteration rules sorted by cost.

Type

list of TransliterationRule

tokenize(input)

Tokenizes an input string.

Adds initial and trailing whitespace, which can be consolidated.

Parameters

input (str) – String to tokenize

Returns

List of tokens, with default whitespace token at beginning and end.

Return type

list of str

Raises

ValueError – Unrecognizable input, such as a character that is not in a token

Examples

tokens = {'ab': ['class_ab'], ' ': ['wb']}
whitespace = {'default': ' ', 'token_class': 'wb', 'consolidate': True}
rules = {'ab': 'AB', ' ': '_'}
settings = {'tokens': tokens, 'rules': rules, 'whitespace': whitespace}
gt = GraphTransliterator.from_easyreading_dict(settings)
gt.tokenize('ab ')
[' ', 'ab', ' ']
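
The tokenizer_pattern shown by dump(), '(a|\\ )', is an alternation of regex-escaped tokens. Below is a minimal sketch of tokenization along those lines, under these assumptions:
- tokens are tried longest first, so a multi-character token such as 'ab' wins over a shorter prefix;
- input is consumed left to right, and an unrecognized character raises ValueError;
- with consolidation on, a run of whitespace tokens collapses to one default token.

build_tokenizer_pattern and tokenize are hypothetical helpers, not the library's implementation.

```python
import re
from typing import Iterable, List, Set


def build_tokenizer_pattern(tokens: Iterable[str]) -> str:
    """Alternation of escaped tokens, longest first (hypothetical helper)."""
    ordered = sorted(tokens, key=len, reverse=True)  # stable for equal lengths
    return "(" + "|".join(re.escape(token) for token in ordered) + ")"


def tokenize(text: str, tokens: Iterable[str], whitespace_tokens: Set[str],
             default: str = " ", consolidate: bool = True) -> List[str]:
    """Split text into known tokens, padded with the default whitespace token."""
    pattern = re.compile(build_tokenizer_pattern(tokens))
    found = []
    pos = 0
    while pos < len(text):
        match = pattern.match(text, pos)
        if match is None:
            raise ValueError(f"Unrecognizable input at index {pos}: {text[pos]!r}")
        found.append(match.group(0))
        pos = match.end()
    padded = [default] + found + [default]
    if not consolidate:
        return padded
    result: List[str] = []
    for token in padded:
        if token in whitespace_tokens:
            if result and result[-1] in whitespace_tokens:
                continue  # already inside a whitespace run
            result.append(default)  # render the run as one default token
        else:
            result.append(token)
    return result


print(build_tokenizer_pattern(["a", " "]))   # (a|\ )
print(tokenize("ab ", ["ab", " "], {" "}))   # [' ', 'ab', ' ']
print(tokenize("aa", ["a", " "], {" "}))     # [' ', 'a', 'a', ' ']
```

The last two calls reproduce the tokenize() output above and the tokens used in the match_at() example.
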
property tokenizer_pattern

Tokenizer pattern from transliterator

Type

str

property tokens

Mappings of tokens to their classes.

Type

dict of {str: set of str}

property tokens_by_class

Tokens keyed by token class.

Type

dict of {str: list of str}

transliterate(input)

Transliterate an input string into an output string.

Parameters

input (str) – Input string to transliterate

Returns

Transliteration output string

Return type

str

Raises

ValueError – Cannot parse input

Note

Whitespace is temporarily added to the start and end of the input string.

Example

GraphTransliterator.from_yaml(
'''
tokens:
  a: []
  ' ': [wb]
rules:
  a: A
  ' ': '_'
whitespace:
  default: ' '
  consolidate: True
  token_class: wb
''').transliterate("a a")
'A_A'

property whitespace

Whitespace rules.

Type

WhitespaceRules

class graphtransliterator.CoverageTransliterator(*args, **kwargs)

Subclass of GraphTransliterator that logs visits to the graph and to onmatch rules.

Used to confirm that tests cover the entire graph and all onmatch rules.

check_coverage(raise_exception=True)

Check coverage of graph and onmatch rules.

First checks graph coverage, then checks onmatch rules.

check_onmatchrules_coverage(raise_exception=True)

Check coverage of onmatch rules.

clear_visited()

Clear visited flags from graph and onmatch_rules.

Bundled Transliterators

graphtransliterator.transliterators

Bundled transliterators are loaded by explicitly importing graphtransliterator.transliterators. Each is an instance of graphtransliterator.bundled.Bundled.

class graphtransliterator.transliterators.Bundled(*args, **kwargs)

Subclass of GraphTransliterator used for bundled transliterators.

property directory

Directory of bundled transliterator, used to load settings.

from_JSON(check_ambiguity=False, coverage=False, **kwargs)

Initialize from bundled JSON file (best for speed).

Parameters
  • check_ambiguity (bool) – Whether to check for ambiguity. Default is False.

  • coverage (bool) – Whether to check test coverage. Default is False.

from_YAML(check_ambiguity=True, coverage=True, **kwargs)

Initialize from bundled YAML file (best for development).

Parameters
  • check_ambiguity (bool) – Whether to check for ambiguity. Default is True.

  • coverage (bool) – Whether to check test coverage. Default is True.

generate_yaml_tests(file=None)

Generate YAML tests with complete coverage.

Uses the first token in a class as a sample. For onmatch rules, it assumes that the first sample token in a class has a unique production, which may not be the case, so the generated tests should be checked and edited.

load_yaml_tests()

Iterator for YAML tests.

Assumes tests are found in the tests subdirectory of the module, in a file named NAME_tests.yaml, e.g. source_to_target/tests/source_to_target_tests.yaml.

property name

Name of bundled transliterator, e.g. ‘Example’

classmethod new(method='json', **kwargs)

Return a new class instance from method (json/yaml).

Parameters

method (str (json or yaml)) – How to load bundled transliterator, JSON or YAML.

run_tests(transliteration_tests)

Run transliteration tests.

Parameters

transliteration_tests (dict of {str: str}) – Dictionary of tests mapping each source string to its correct target.

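run_tests() takes a dictionary that maps each source string to its expected target. The sketch below shows one way such a check could work for any transliterate callable. It collects every mismatch rather than stopping at the first. run_transliteration_tests and the toy callable are hypothetical, and the bundled method may report failures differently.

```python
from typing import Callable, Dict, List, Tuple


def run_transliteration_tests(
    transliterate: Callable[[str], str], transliteration_tests: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (source, expected, actual) for every test whose output differs."""
    failures = []
    for source, expected in transliteration_tests.items():
        actual = transliterate(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


# A toy stand-in for gt.transliterate: uppercase 'a' and map ' ' to '_'.
def toy(text: str) -> str:
    return text.replace("a", "A").replace(" ", "_")


print(run_transliteration_tests(toy, {"a a": "A_A", "aa": "A,A"}))
# [('aa', 'A,A', 'AA')]
```
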
run_yaml_tests()

Run YAML tests in MODULE/tests/MODULE_tests.yaml.

property yaml_tests_filen

Path of the YAML tests file, e.g. source_to_target/tests/source_to_target_tests.yaml.

Type

str

class graphtransliterator.transliterators.Example(**kwargs)

Example Bundled Graph Transliterator.

class graphtransliterator.transliterators.ITRANSDevanagariToUnicode(**kwargs)

ITRANS Devanagari to Unicode Transliterator.

class graphtransliterator.transliterators.MetadataSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)

Schema for Bundled metadata.

graphtransliterator.transliterators.iter_names()

Iterate through bundled transliterator names.

graphtransliterator.transliterators.iter_transliterators(**kwds)

Iterate through instances of bundled transliterators.

Graph Classes

class graphtransliterator.DirectedGraph(node=None, edge=None, edge_list=None)

A very basic dictionary- and list-based directed graph. Nodes are stored as a list of dictionaries of node data. Edges are stored as nested dictionaries keyed by head, then by tail, holding the edge properties. An edge list is also maintained. The graph can be exported as a dictionary.

node

List of node data

Type

list of dict

edge

Mapping from head to tail of edge, holding edge data

Type

dict of {int: dict of {int: dict}}

edge_list

List of head and tail of each edge

Type

list of tuple of (int, int)

Examples

from graphtransliterator import DirectedGraph
DirectedGraph()
<graphtransliterator.graphs.DirectedGraph at 0x7ff8d83354b0>

add_edge(head, tail, edge_data=None)

Add an edge to a graph and return its attributes as dict.

Parameters
  • head (int) – Index of head of edge

  • tail (int) – Index of tail of edge

  • edge_data (dict, default {}) – Edge data

Returns

Data of created edge

Return type

dict

Raises

ValueError – Invalid head or tail, or edge_data is not a dict.

Examples

g = DirectedGraph()
g.add_node()
(0, {})
g.add_node()
(1, {})
g.add_edge(0, 1, {'data_key_1': 'some edge data here'})
{'data_key_1': 'some edge data here'}
g.edge
{0: {1: {'data_key_1': 'some edge data here'}}}

add_node(node_data=None)

Create node and return (int, dict) of node key and object.

Parameters

node_data (dict, default {}) – Data to be stored in created node

Returns

Index of created node and its data

Return type

tuple of (int, dict)

Raises

ValueError – node_data is not a dict

Examples

g = DirectedGraph()
g.add_node()
(0, {})
g.add_node({'datakey1': 'data value'})
(1, {'datakey1': 'data value'})
g.node
[{}, {'datakey1': 'data value'}]
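
The node list, nested edge dictionary, and edge list described above fit in a few lines. MiniDirectedGraph is a hypothetical, simplified class written for illustration, not the library's DirectedGraph. It reproduces the add_node() and add_edge() behaviour shown in the examples, including the ValueError cases listed under Raises.

```python
from typing import Dict, List, Optional, Tuple


class MiniDirectedGraph:
    """Hypothetical sketch of a dict- and list-based directed graph."""

    def __init__(self) -> None:
        self.node: List[dict] = []                  # node data, indexed by position
        self.edge: Dict[int, Dict[int, dict]] = {}  # head -> tail -> edge data
        self.edge_list: List[Tuple[int, int]] = []  # (head, tail) in insertion order

    def add_node(self, node_data: Optional[dict] = None) -> Tuple[int, dict]:
        if node_data is None:
            node_data = {}
        if not isinstance(node_data, dict):
            raise ValueError("node_data is not a dict")
        self.node.append(node_data)
        return len(self.node) - 1, node_data

    def add_edge(self, head: int, tail: int, edge_data: Optional[dict] = None) -> dict:
        if edge_data is None:
            edge_data = {}
        if not isinstance(edge_data, dict):
            raise ValueError("edge_data is not a dict")
        for index in (head, tail):
            if not isinstance(index, int) or not 0 <= index < len(self.node):
                raise ValueError(f"Invalid node index: {index!r}")
        self.edge.setdefault(head, {})[tail] = edge_data
        self.edge_list.append((head, tail))
        return edge_data

    def to_dict(self) -> dict:
        """Export the graph as a plain dictionary."""
        return {"node": self.node, "edge": self.edge, "edge_list": self.edge_list}


g = MiniDirectedGraph()
print(g.add_node())                            # (0, {})
print(g.add_node({"datakey1": "data value"}))  # (1, {'datakey1': 'data value'})
g.add_edge(0, 1, {"data_key_1": "some edge data here"})
print(g.edge)       # {0: {1: {'data_key_1': 'some edge data here'}}}
print(g.edge_list)  # [(0, 1)]
```

Keeping a separate edge_list next to the nested edge dictionary trades a little memory for a cheap, ordered iteration over all edges. The dump() output above shows a graph of this shape.
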
class graphtransliterator.VisitLoggingDirectedGraph(graph)

A DirectedGraph that logs visits to all nodes and edges.

Used to measure the coverage of tests for bundled transliterators.

check_coverage(raise_exception=True)

Checks that all nodes and edges are visited.

Parameters

raise_exception (bool, default True) – Raise IncompleteGraphCoverageException if not all nodes and edges have been visited

Raises

IncompleteGraphCoverageException – Not all nodes/edges of a graph have been visited.

clear_visited()

Clear all visited attributes on nodes and edges.
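
The coverage check described above amounts to recording each node and edge that transliteration passes through, then comparing those records against the whole graph. A minimal sketch of that bookkeeping follows. VisitLog and IncompleteCoverageError are hypothetical names; the library itself raises IncompleteGraphCoverageException. Returning a bool when raise_exception is false is an assumption of this sketch.

```python
from typing import Iterable, Set, Tuple


class IncompleteCoverageError(Exception):
    """Hypothetical stand-in for IncompleteGraphCoverageException."""


class VisitLog:
    """Hypothetical tracker of visited nodes and edges for a graph."""

    def __init__(self, num_nodes: int, edges: Iterable[Tuple[int, int]]) -> None:
        self.all_nodes: Set[int] = set(range(num_nodes))
        self.all_edges: Set[Tuple[int, int]] = set(edges)
        self.clear_visited()

    def clear_visited(self) -> None:
        self.visited_nodes: Set[int] = set()
        self.visited_edges: Set[Tuple[int, int]] = set()

    def visit_edge(self, head: int, tail: int) -> None:
        # Following an edge also visits both of its endpoints.
        self.visited_edges.add((head, tail))
        self.visited_nodes.update((head, tail))

    def check_coverage(self, raise_exception: bool = True) -> bool:
        missing_nodes = self.all_nodes - self.visited_nodes
        missing_edges = self.all_edges - self.visited_edges
        if missing_nodes or missing_edges:
            if raise_exception:
                raise IncompleteCoverageError(
                    f"unvisited nodes {sorted(missing_nodes)}, edges {sorted(missing_edges)}"
                )
            return False
        return True


log = VisitLog(5, [(0, 1), (1, 2), (0, 3), (3, 4)])  # edge_list from the dump() example
log.visit_edge(0, 1)
log.visit_edge(1, 2)
print(log.check_coverage(raise_exception=False))  # False: the ' ' branch was never taken
log.visit_edge(0, 3)
log.visit_edge(3, 4)
print(log.check_coverage())  # True
```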

Rule Classes

class graphtransliterator.TransliterationRule(production, prev_classes, prev_tokens, tokens, next_tokens, next_classes, cost)

A transliteration rule containing the specific match conditions and string output to be produced, as well as the rule’s cost.

production

Output produced on match of rule

Type

str

prev_classes

List of previous token classes to be matched before tokens or, if they exist, prev_tokens

Type

list of str, or None

prev_tokens

List of tokens to be matched before tokens

Type

list of str, or None

tokens

List of tokens to match

Type

list of str

next_tokens

List of tokens to match after tokens

Type

list of str, or None

next_classes

List of token classes to match after tokens or, if they exist, next_tokens

Type

list of str, or None

cost

Cost of the rule, where less specific rules are more costly

Type

float
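
The costs printed in the examples on this page fit the formula log2(1 + 1/(1 + n)), where n is the number of tokens a rule matches. A one-token rule gets log2(1.5) ≈ 0.585, and the two-token rule 'a a' gets log2(4/3) ≈ 0.415, so longer, more specific rules are cheaper. The sketch below checks that arithmetic. Treat the formula as an inference from those numbers only. How the library also counts previous and next tokens or classes is not shown here, so rule_cost and its single argument are assumptions.

```python
import math


def rule_cost(n: int) -> float:
    """Inferred cost for a rule matching n tokens (an assumption, see above)."""
    return math.log2(1 + 1 / (1 + n))


print(rule_cost(1))  # cost of a one-token rule such as 'a'
print(rule_cost(2))  # cost of a two-token rule such as 'a a'
# More specific (longer) rules cost less, so they sort first in GraphTransliterator.rules.
print(rule_cost(2) < rule_cost(1))  # True
```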

class graphtransliterator.OnMatchRule(prev_classes, next_classes, production)

Rules about adding text between certain combinations of matched rules.

When a transliteration rule has been matched, and before its production is added to the output, the production string of an OnMatchRule is added if the previously matched tokens and the current tokens are of the specified classes.

prev_classes

List of previously matched token classes required

Type

list of str

next_classes

List of current and following token classes required

Type

list of str

production

String to add before the production of the current rule

Type

str
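
To make the insertion timing concrete, here is a simplified transliteration loop that applies onmatch rules between matches. It is a hypothetical sketch, not the library's algorithm, and it works as follows:
- it takes already padded tokens;
- it picks the longest rule key that matches, instead of the least costly graph match;
- it compares prev_classes with the tokens just before the current position, and next_classes with the current and following tokens;
- it uses only the first onmatch rule that applies.

With the settings from the from_yaml() example, it yields 'A A+A' for 'a aa'.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical simplified onmatch rule: (prev_classes, next_classes, production)
OnMatch = Tuple[List[str], List[str], str]


def classes_match(required: List[str], tokens: List[str],
                  token_classes: Dict[str, Set[str]]) -> bool:
    """True if each token belongs to the corresponding required class."""
    return len(tokens) == len(required) and all(
        cls in token_classes[tok] for cls, tok in zip(required, tokens)
    )


def transliterate(tokens: List[str], rules: Dict[Tuple[str, ...], str],
                  token_classes: Dict[str, Set[str]],
                  onmatch_rules: List[OnMatch]) -> str:
    """Transliterate tokens padded with whitespace at both ends (sketch)."""
    out: List[str] = []
    longest = max(len(key) for key in rules)
    i, end = 1, len(tokens) - 1  # skip the padding tokens at both ends
    while i < end:
        # Longest match first: a stand-in for least-cost matching.
        for size in range(min(longest, end - i), 0, -1):
            key = tuple(tokens[i:i + size])
            if key in rules:
                break
        else:
            raise ValueError(f"No rule matches at token {i}: {tokens[i]!r}")
        if i > 1:  # something was matched before, so onmatch rules may apply
            for prev_classes, next_classes, production in onmatch_rules:
                start = i - len(prev_classes)
                before = tokens[start:i] if start >= 1 else []  # exclude leading padding
                after = tokens[i:i + len(next_classes)]
                if (classes_match(prev_classes, before, token_classes)
                        and classes_match(next_classes, after, token_classes)):
                    out.append(production)
                    break  # first applicable onmatch rule only
        out.append(rules[key])
        i += size
    return "".join(out)


token_classes = {"a": {"class1"}, " ": {"wb"}}
rules = {("a",): "A", (" ",): " "}
onmatch = [(["class1"], ["class1"], "+")]
print(transliterate([" ", "a", " ", "a", "a", " "], rules, token_classes, onmatch))
# A A+A
```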

class graphtransliterator.WhitespaceRules(default, token_class, consolidate)

Whitespace rules of GraphTransliterator.

default

Default whitespace token

Type

str

token_class

Whitespace token class

Type

str

consolidate

Consolidate consecutive whitespace tokens and render as a single instance of the specified default whitespace token.

Type

bool

Exceptions

exception graphtransliterator.GraphTransliteratorException

Base exception class. All Graph Transliterator-specific exceptions should subclass this class.

exception graphtransliterator.AmbiguousTransliterationRulesException

Raised when multiple transliteration rules can match the same pattern. Details of ambiguities are given in a logging.warning().

exception graphtransliterator.NoMatchingTransliterationRuleException

Raised when no transliteration rule can be matched at a particular location in the input string’s tokens. Details of the location are given in a logging.warning().

exception graphtransliterator.UnrecognizableInputTokenException

Raised when a character in the input string does not correspond to any tokens in the GraphTransliterator’s token settings. Details of the location are given in a logging.warning().

Schemas

class graphtransliterator.DirectedGraphSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)

Schema for DirectedGraph.

Validates graph somewhat rigorously.

class graphtransliterator.EasyReadingSettingsSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)

Schema for easy reading settings.

Provides initial validation based on easy reading format.

class graphtransliterator.GraphTransliteratorSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)

Schema for Graph Transliterator.

class graphtransliterator.OnMatchRuleSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)

Schema for OnMatchRule.

class graphtransliterator.SettingsSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)

Schema for settings in dictionary format.

Performs validation.

class graphtransliterator.TransliterationRuleSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)

Schema for TransliterationRule.

class graphtransliterator.WhitespaceDictSettingsSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)

Schema for Whitespace definition as a dict.

class graphtransliterator.WhitespaceSettingsSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)

Schema for Whitespace definition that loads as WhitespaceRules.