API Reference

The full API reference for all public classes and functions is below.

Public members can (and should) be imported from graphtransliterator:

from graphtransliterator import GraphTransliterator

Bundled transliterators require that graphtransliterator.transliterators be imported:

import graphtransliterator.transliterators as transliterators
transliterators.iter_names()

Core Classes

class graphtransliterator.GraphTransliterator(tokens, rules, whitespace, onmatch_rules=None, metadata=None, ignore_errors=False, check_ambiguity=True, onmatch_rules_lookup=None, tokens_by_class=None, graph=None, tokenizer_pattern=None, graphtransliterator_version=None, **kwargs)[source]

A graph-based transliteration tool that lets you convert the symbols of one language or script to those of another using rules that you define.

Transliteration of tokens of an input string to an output string is configured by: a set of input token types with classes, pattern-matching rules involving sequences of tokens as well as preceding or following tokens and token classes, insertion rules between matches, and optional consolidation of whitespace. Rules are ordered by specificity.

Note

This constructor does not validate settings and should typically not be called directly. Use from_dict() instead. For “easy reading” support, use from_easyreading_dict(), from_yaml(), or from_yaml_file(). Keyword parameters used here (ignore_errors, check_ambiguity) can be passed from those other constructors.

Parameters:

tokens (dict of {str: set of str}) – Mapping of input token types to token classes
rules (list of TransliterationRule) – list of transliteration rules ordered by cost
onmatch_rules (list of OnMatchRule, or None) – Rules for output to be inserted between tokens of certain classes when a transliteration rule has been matched but before its production string has been added to the output
whitespace (WhitespaceRules) – Rules for handling whitespace
metadata (dict or None) – Metadata settings
ignore_errors (bool, optional) – If true, transliteration errors are ignored and do not raise an exception. The default is false.
check_ambiguity (bool, optional) – If true (default), transliteration rules are checked for ambiguity. load() and loads() do not check ambiguity by default.
onmatch_rules_lookup (dict of {str: dict of {str: list of int}}, optional) – OnMatchRules lookup, used internally, will be generated if not present.
tokens_by_class (dict of {str: set of str}, optional) – Tokens by class, used internally, will be generated if not present.
graph (DirectedGraph, optional) – Directed graph used by Graph Transliterator, will be generated if not present.
tokenizer_pattern (str, optional) – Regular expression pattern for input string tokenization, will be generated if not present.
graphtransliterator_version (str, optional) – Version of graphtransliterator, added by dump() and dumps().

Example

from graphtransliterator import GraphTransliterator, OnMatchRule, TransliterationRule, WhitespaceRules
settings = {'tokens': {'a': {'vowel'}, ' ': {'wb'}}, 'onmatch_rules': [OnMatchRule(prev_classes=['vowel'], next_classes=['vowel'], production=',')], 'rules': [TransliterationRule(production='A', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562), TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562)], 'metadata': {'author': 'Author McAuthorson'}, 'whitespace': WhitespaceRules(default=' ', token_class='wb', consolidate=False)}
gt = GraphTransliterator(**settings)
gt.transliterate('a')

'A'

See also

from_dict: Constructor from dictionary of settings
from_easyreading_dict: Constructor from dictionary in “easy reading” format
from_yaml: Constructor from YAML string in “easy reading” format
from_yaml_file: Constructor from YAML file in “easy reading” format

dump(compression_level=0)[source]

Dump configuration of Graph Transliterator to Python data types.

Compression is turned off by default.

Parameters:

compression_level (int) – One of 0 (default, no compression), 1 (compression including graph), or 2 (compression without graph)

Returns:

GraphTransliterator configuration as a dictionary with keys:

"tokens"
Mappings of tokens to their classes (OrderedDict of {str: list of str})

"rules"
Transliteration rules in direct format (list of dict of {str: str})

"whitespace"
Whitespace settings (dict of {str: str})

"onmatch_rules"
On match rules (list of OrderedDict)

"metadata"
Dictionary of metadata (dict)

"ignore_errors"
Ignore errors in transliteration (bool)

"onmatch_rules_lookup"
Dictionary keyed by current token to previous token containing a list of indexes of applicable OnMatchRule to try (dict of {str: dict of {str: list of int}})

"tokens_by_class"
Tokens keyed by token class, used internally (dict of {str: list of str})

"graph"
Serialization of DirectedGraph (dict)

"tokenizer_pattern"
Regular expression for tokenizing (str)

"graphtransliterator_version"
Module version of graphtransliterator (str)

Return type:

OrderedDict

Example

yaml_ = '''
tokens:
  a: [vowel]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: " "
  consolidate: false
  token_class: wb
onmatch_rules:
  - <vowel> + <vowel>: ','  # add a comma between vowels
metadata:
  author: "Author McAuthorson"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.dump()

OrderedDict([('tokens', {'a': ['vowel'], ' ': ['wb']}),
             ('rules',
              [OrderedDict([('production', 'A'),
                            ('tokens', ['a']),
                            ('cost', 0.5849625007211562)]),
               OrderedDict([('production', ' '),
                            ('tokens', [' ']),
                            ('cost', 0.5849625007211562)])]),
             ('whitespace',
              {'default': ' ', 'token_class': 'wb', 'consolidate': False}),
             ('onmatch_rules',
              [OrderedDict([('prev_classes', ['vowel']),
                            ('next_classes', ['vowel']),
                            ('production', ',')])]),
             ('metadata', {'author': 'Author McAuthorson'}),
             ('ignore_errors', False),
             ('onmatch_rules_lookup', {'a': {'a': [0]}}),
             ('tokens_by_class', {'vowel': ['a'], 'wb': [' ']}),
             ('graph',
              {'node': [{'type': 'Start',
                 'ordered_children': {'a': [1], ' ': [3]}},
                {'token': 'a',
                 'type': 'token',
                 'ordered_children': {'__rules__': [2]}},
                {'type': 'rule', 'accepting': True, 'rule_key': 0},
                {'token': ' ',
                 'type': 'token',
                 'ordered_children': {'__rules__': [4]}},
                {'type': 'rule', 'accepting': True, 'rule_key': 1}],
               'edge': {0: {1: {'token': 'a', 'cost': 0.5849625007211562},
                 3: {'token': ' ', 'cost': 0.5849625007211562}},
                1: {2: {'cost': 0.5849625007211562}},
                3: {4: {'cost': 0.5849625007211562}}},
               'edge_list': [(0, 1), (0, 3), (1, 2), (3, 4)]}),
             ('tokenizer_pattern', '(a|\\ )'),
             ('graphtransliterator_version', '1.2.4')])

See also

dumps: Dump Graph Transliterator configuration to JSON string
load: Load Graph Transliteration from configuration in Python data types
loads: Load Graph Transliteration from configuration as a JSON string

dumps(compression_level=2)[source]

Dump settings of Graph Transliterator to JavaScript Object Notation (JSON).

Compression is turned on by default.

Parameters:

compression_level (int) – One of 0 (no compression), 1 (compression including graph), or 2 (default, compression without graph)
separators (tuple of str) – Separators used by json.dumps(), default is compact

Returns:

JSON string

Return type:

str

Examples

yaml_ = '''
  tokens:
    a: [vowel]
    ' ': [wb]
  rules:
    a: A
    ' ': ' '
  whitespace:
    default: " "
    consolidate: false
    token_class: wb
  onmatch_rules:
    - <vowel> + <vowel>: ','  # add a comma between vowels
  metadata:
    author: "Author McAuthorson"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.dumps()

'{"graphtransliterator_version":"1.2.4","compressed_settings":[["vowel","wb"],[" ","a"],[[1],[0]],[["A",0,0,[1],0,0,-1],[" ",0,0,[0],0,0,-1]],[" ","wb",0],[[[0],[0],","]],{"author":"Author McAuthorson"},null]}'

See also

dump: Dump Graph Transliterator configuration to Python data types
load: Load Graph Transliteration from configuration in Python data types
loads: Load Graph Transliteration from configuration as a JSON string

static from_dict(dict_settings, **kwargs)[source]

Generate GraphTransliterator from dict settings.

Parameters:: dict_settings (dict) – Dictionary of settings
Returns:: Graph transliterator
Return type:: GraphTransliterator

static from_easyreading_dict(easyreading_settings, **kwargs)[source]

Constructs GraphTransliterator from a dictionary of settings in “easy reading” format, i.e. the loaded contents of a YAML string.

Parameters:

easyreading_settings (dict) –

Settings dictionary in easy reading format with keys:

"tokens"
Mappings of tokens to their classes (dict of {str: list of str})

"rules"
Transliteration rules in “easy reading” format (list of dict of {str: str})

"onmatch_rules"
On match rules in “easy reading” format (dict of {str: str}, optional)

"whitespace"
Whitespace definitions, including default whitespace token, class of whitespace tokens, and whether or not to consolidate (dict of {'default': str, 'token_class': str, 'consolidate': bool}, optional)

"metadata"
Dictionary of metadata (dict, optional)

Returns:

Graph Transliterator

Return type:

GraphTransliterator

Note

Called by from_yaml().

Example

tokens = {
    'ab': ['class_ab'],
    ' ': ['wb']
}
whitespace = {
    'default': ' ',
    'token_class': 'wb',
    'consolidate': True
}
onmatch_rules = [
    {'<class_ab> + <class_ab>': ','}
]
rules = {'ab': 'AB',
         ' ': '_'}
settings = {'tokens': tokens,
            'rules': rules,
            'whitespace': whitespace,
            'onmatch_rules': onmatch_rules}
gt = GraphTransliterator.from_easyreading_dict(settings)
gt.transliterate("ab abab")

'AB_AB,AB'

See also

from_yaml: Constructor from YAML string in “easy reading” format
from_yaml_file: Constructor from YAML file in “easy reading” format

static from_yaml(yaml_str, charnames_escaped=True, **kwargs)[source]

Construct GraphTransliterator from a YAML str.

Parameters:

yaml_str (str) – YAML mappings of tokens, rules, and (optionally) onmatch_rules
charnames_escaped (boolean) – Unescape Unicode during YAML read (default True)

Note

Called by from_yaml_file() and calls from_easyreading_dict().

Example

yaml_ = '''
tokens:
  a: [class1]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: ' '
  consolidate: True
  token_class: wb
onmatch_rules:
  - <class1> + <class1>: "+"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.transliterate("a aa")

'A A+A'

See also

from_easyreading_dict: Constructor from dictionary in “easy reading” format
from_yaml_file: Constructor from YAML file in “easy reading” format

static from_yaml_file(yaml_filename, **kwargs)[source]

Construct GraphTransliterator from YAML file.

Parameters:: yaml_filename (str) – Name of YAML file, containing tokens, rules, and (optionally) onmatch_rules

Note

Calls from_yaml().

See also

from_yaml: Constructor from YAML string in “easy reading” format
from_easyreading_dict: Constructor from dictionary in “easy reading” format

property graph

Graph used in transliteration.

Type:: DirectedGraph

property graphtransliterator_version

Graph Transliterator version.

Type:: str

property ignore_errors

Ignore transliteration errors setting.

Type:: bool

property last_input_tokens

Last tokenization of the input string, with whitespace at start and end.

Type:: list of str

property last_matched_rule_tokens

Last matched tokens for each rule.

Type:: list of list of str

property last_matched_rules

Last transliteration rules matched.

Type:: list of TransliterationRule

static load(settings, **kwargs)[source]

Create GraphTransliterator from settings as Python data types.

Parameters:

settings –

GraphTransliterator configuration as a dictionary with keys:

"tokens"
Mappings of tokens to their classes (dict of {str: list of str})

"rules"
Transliteration rules in direct format (list of OrderedDict of {str: str})

"whitespace"
Whitespace settings (dict of {str: str})

"onmatch_rules"
On match rules (list of OrderedDict, optional)

"metadata"
Dictionary of metadata (dict, optional)

"ignore_errors"
Ignore errors. (bool, optional)

"onmatch_rules_lookup"
Dictionary keyed by current token to previous token containing a list of indexes of applicable OnMatchRule to try (dict of {str: dict of {str: list of int}}, optional)

"tokens_by_class"
Tokens keyed by token class, used internally (dict of {str: list of str}, optional)

"graph"
Serialization of DirectedGraph (dict, optional)

"tokenizer_pattern"
Regular expression for tokenizing (str, optional)

"graphtransliterator_version"
Module version of graphtransliterator (str, optional)

Returns:

Graph Transliterator

Return type:

GraphTransliterator

Example

from collections import OrderedDict
settings = {'tokens': {'a': ['vowel'], ' ': ['wb']},
 'rules': [OrderedDict([('production', 'A'),
               # Can be compacted, removing None values
               # ('prev_tokens', None),
               ('tokens', ['a']),
               ('next_classes', None),
               ('next_tokens', None),
               ('cost', 0.5849625007211562)]),
  OrderedDict([('production', ' '),
               ('prev_classes', None),
               ('prev_tokens', None),
               ('tokens', [' ']),
               ('next_classes', None),
               ('next_tokens', None),
               ('cost', 0.5849625007211562)])],
 'whitespace': {'default': ' ', 'token_class': 'wb', 'consolidate': False},
 'onmatch_rules': [OrderedDict([('prev_classes', ['vowel']),
               ('next_classes', ['vowel']),
               ('production', ',')])],
 'metadata': {'author': 'Author McAuthorson'},
 'onmatch_rules_lookup': {'a': {'a': [0]}},
 'tokens_by_class': {'vowel': ['a'], 'wb': [' ']},
 'graph': {'edge': {0: {1: {'token': 'a', 'cost': 0.5849625007211562},
    3: {'token': ' ', 'cost': 0.5849625007211562}},
   1: {2: {'cost': 0.5849625007211562}},
   3: {4: {'cost': 0.5849625007211562}}},
  'node': [{'type': 'Start', 'ordered_children': {'a': [1], ' ': [3]}},
   {'type': 'token', 'token': 'a', 'ordered_children': {'__rules__': [2]}},
   {'type': 'rule',
    'rule_key': 0,
    'accepting': True,
    'ordered_children': {}},
   {'type': 'token', 'token': ' ', 'ordered_children': {'__rules__': [4]}},
   {'type': 'rule',
    'rule_key': 1,
    'accepting': True,
    'ordered_children': {}}],
  'edge_list': [(0, 1), (1, 2), (0, 3), (3, 4)]},
 'tokenizer_pattern': '(a|\\ )',
 'graphtransliterator_version': '0.3.3'}
gt = GraphTransliterator.load(settings)
gt.transliterate('aa')

'A,A'

# can be compacted
settings.pop('onmatch_rules_lookup')
GraphTransliterator.load(settings).transliterate('aa')

'A,A'

See also

dump: Dump Graph Transliterator configuration to Python data types
dumps: Dump Graph Transliterator configuration to JSON string
loads: Load Graph Transliteration from configuration as a JSON string

static loads(settings, **kwargs)[source]

Create GraphTransliterator from JavaScript Object Notation (JSON) string.

Parameters:: settings – JSON settings for GraphTransliterator
Returns:: Graph Transliterator
Return type:: GraphTransliterator

Example

JSON_settings = '''{"tokens": {"a": ["vowel"], " ": ["wb"]}, "rules": [{"production": "A", "prev_classes": null, "prev_tokens": null, "tokens": ["a"], "next_classes": null, "next_tokens": null, "cost": 0.5849625007211562}, {"production": " ", "prev_classes": null, "prev_tokens": null, "tokens": [" "], "next_classes": null, "next_tokens": null, "cost": 0.5849625007211562}], "whitespace": {"default": " ", "token_class": "wb", "consolidate": false}, "onmatch_rules": [{"prev_classes": ["vowel"], "next_classes": ["vowel"], "production": ","}], "metadata": {"author": "Author McAuthorson"}, "ignore_errors": false, "onmatch_rules_lookup": {"a": {"a": [0]}}, "tokens_by_class": {"vowel": ["a"], "wb": [" "]}, "graph": {"node": [{"type": "Start", "ordered_children": {"a": [1], " ": [3]}}, {"type": "token", "token": "a", "ordered_children": {"__rules__": [2]}}, {"type": "rule", "rule_key": 0, "accepting": true, "ordered_children": {}}, {"type": "token", "token": " ", "ordered_children": {"__rules__": [4]}}, {"type": "rule", "rule_key": 1, "accepting": true, "ordered_children": {}}], "edge": {"0": {"1": {"token": "a", "cost": 0.5849625007211562}, "3": {"token": " ", "cost": 0.5849625007211562}}, "1": {"2": {"cost": 0.5849625007211562}}, "3": {"4": {"cost": 0.5849625007211562}}}, "edge_list": [[0, 1], [1, 2], [0, 3], [3, 4]]}, "tokenizer_pattern": "(a| )", "graphtransliterator_version": "1.2.2"}'''

gt = GraphTransliterator.loads(JSON_settings)
gt.transliterate('a')

'A'

See also

dump: Dump Graph Transliterator configuration to Python data types
dumps: Dump Graph Transliterator configuration to JSON string
load: Load Graph Transliteration from configuration in Python data types
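
A detail visible when comparing the dump() output with the JSON settings above: the graph's "edge" mapping is keyed by integers in Python, while JSON object keys are always strings. The following is a minimal standard-library sketch of that round trip. The helper restore_int_keys is hypothetical and written only for illustration; how the library itself restores the keys is not described in this reference.

```python
import json

# Edge mapping shaped like the "edge" entry of a dumped graph (head -> tail -> data).
edge = {0: {1: {"token": "a"}, 3: {"token": " "}}, 1: {2: {}}}

# Compact separators remove the spaces json.dumps() adds after "," and ":" by default.
compact = json.dumps(edge, separators=(",", ":"))

# After decoding, the integer keys have become strings.
decoded = json.loads(compact)


def restore_int_keys(edge_dict):
    """Hypothetical helper: convert head and tail keys back to int."""
    return {
        int(head): {int(tail): data for tail, data in tails.items()}
        for head, tails in edge_dict.items()
    }


restored = restore_int_keys(decoded)
```

Here decoded is keyed by "0" and "1", and restored compares equal to the original edge mapping.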

match_at(token_i, tokens, match_all=False)[source]

Match best (least costly) transliteration rule at a given index in the input tokens and return the index to that rule. Optionally, return all rules that match.

Parameters:

token_i (int) – Location in tokens at which to begin
tokens (list of str) – List of tokens
match_all (bool, optional) – If true, return the index of all rules matching at the given index. The default is false.

Returns:

Index of matching transliteration rule in GraphTransliterator.rules or None. Returns a list of int or an empty list if match_all is true.

Return type:

int, None, or list of int

Note

Expects whitespace tokens at the beginning and end of tokens.

Examples

gt = GraphTransliterator.from_yaml('''
        tokens:
            a: []
            a a: []
            ' ': [wb]
        rules:
            a: <A>
            a a: <AA>
        whitespace:
            default: ' '
            consolidate: True
            token_class: wb
''')
tokens = gt.tokenize("aa")
tokens # whitespace added to ends

[' ', 'a', 'a', ' ']

gt.match_at(1, tokens) # returns index to rule

0

gt.rules[gt.match_at(1, tokens)] # actual rule

TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.41503749927884376)

gt.match_at(1, tokens, match_all=True) # index to rules, with match_all

[0, 1]

[gt.rules[_] for _ in gt.match_at(1, tokens, match_all=True)]

[TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
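
The selection described for match_at() can be illustrated without a graph. The following is a minimal sketch, not the library's implementation: it scans a cost-sorted list linearly, whereas GraphTransliterator searches its graph, and it ignores previous and following tokens and classes. SketchRule and sketch_match_at are hypothetical names, and the costs are rounded placeholders.

```python
from typing import List, NamedTuple, Optional, Union


class SketchRule(NamedTuple):
    """Hypothetical stand-in for a rule that only matches a token sequence."""

    tokens: List[str]
    production: str
    cost: float


def sketch_match_at(
    rules: List[SketchRule],
    token_i: int,
    tokens: List[str],
    match_all: bool = False,
) -> Union[Optional[int], List[int]]:
    """Return the index of the least costly matching rule, or all matching indexes."""
    matches = [
        index
        for index, rule in enumerate(rules)  # rules are assumed sorted by cost
        if tokens[token_i:token_i + len(rule.tokens)] == rule.tokens
    ]
    if match_all:
        return matches
    return matches[0] if matches else None


# The more specific two-token rule has the lower cost and sorts first.
rules = sorted(
    [SketchRule(["a"], "<A>", 0.585), SketchRule(["a", "a"], "<AA>", 0.415)],
    key=lambda rule: rule.cost,
)
tokens = [" ", "a", "a", " "]
best = sketch_match_at(rules, 1, tokens)
all_matches = sketch_match_at(rules, 1, tokens, match_all=True)
```

With these inputs best is 0 and all_matches is [0, 1], mirroring the example above.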

property metadata

Metadata of transliterator

Type:: dict

property onmatch_rules

Rules for productions between matches.

Type:: list of OnMatchRule

property onmatch_rules_lookup

On Match Rules lookup

Type:: dict

property productions

List of productions of each transliteration rule.

Type:: list of str

pruned_of(productions)[source]

Remove transliteration rules with specific output productions.

Parameters:: productions (str, or list of str) – list of productions to remove
Returns:: Graph transliterator pruned of certain productions.
Return type:: graphtransliterator.GraphTransliterator

Note

Uses original initialization parameters to construct a new GraphTransliterator.

Examples

gt = GraphTransliterator.from_yaml('''
        tokens:
            a: []
            a a: []
            ' ': [wb]
        rules:
            a: <A>
            a a: <AA>
        whitespace:
            default: ' '
            consolidate: True
            token_class: wb
''')
gt.rules

[TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.41503749927884376),
 TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]

gt.pruned_of('<AA>').rules

[TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]

gt.pruned_of(['<A>', '<AA>']).rules

[]
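
The filtering step behind pruned_of() can be sketched as follows. This is an illustration under assumptions, not the library's code: SketchRule and sketch_pruned_rules are hypothetical names, and the sketch only filters a rule list, while pruned_of() builds a new GraphTransliterator from the original initialization parameters.

```python
from typing import List, NamedTuple, Union


class SketchRule(NamedTuple):
    """Hypothetical stand-in holding only the fields needed here."""

    tokens: List[str]
    production: str


def sketch_pruned_rules(
    rules: List[SketchRule], productions: Union[str, List[str]]
) -> List[SketchRule]:
    """Return the rules whose production is not among those to remove."""
    if isinstance(productions, str):
        productions = [productions]  # accept a single production as well as a list
    unwanted = set(productions)
    return [rule for rule in rules if rule.production not in unwanted]


rules = [SketchRule(["a", "a"], "<AA>"), SketchRule(["a"], "<A>")]
without_aa = sketch_pruned_rules(rules, "<AA>")
without_any = sketch_pruned_rules(rules, ["<A>", "<AA>"])
```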

property rules

Transliteration rules sorted by cost.

Type:: list of TransliterationRule

tokenize(input)[source]

Tokenizes an input string.

Adds initial and trailing whitespace, which can be consolidated.

Parameters:: input (str) – String to tokenize
Returns:: List of tokens, with default whitespace token at beginning and end.
Return type:: list of str
Raises:: ValueError – Unrecognizable input, such as a character that is not in a token

Examples

tokens = {'ab': ['class_ab'], ' ': ['wb']}
whitespace = {'default': ' ', 'token_class': 'wb', 'consolidate': True}
rules = {'ab': 'AB', ' ': '_'}
settings = {'tokens': tokens, 'rules': rules, 'whitespace': whitespace}
gt = GraphTransliterator.from_easyreading_dict(settings)
gt.tokenize('ab ')

[' ', 'ab', ' ']
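
The behavior described for tokenize() (a regular expression built from the tokens, whitespace added at both ends, optional consolidation) can be sketched with the standard library. This is a minimal sketch under assumptions, not the library's tokenizer: sketch_tokenize and its parameters are hypothetical, and ordering the alternatives longest-first is a choice made here so that longer tokens win over their prefixes.

```python
import re
from typing import Iterable, List


def sketch_tokenize(
    text: str,
    tokens: Iterable[str],
    whitespace_tokens: Iterable[str] = (" ",),
    default: str = " ",
    consolidate: bool = True,
) -> List[str]:
    """Split text into known tokens, padded with the default whitespace token."""
    ordered = sorted(tokens, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(token) for token in ordered))
    whitespace = set(whitespace_tokens)

    output = [default]  # leading whitespace token
    position = 0
    while position < len(text):
        match = pattern.match(text, position)
        if match is None or match.end() == position:
            raise ValueError(f"Unrecognizable input at position {position}")
        token = match.group(0)
        position = match.end()
        if consolidate and token in whitespace:
            if output[-1] in whitespace:
                continue  # collapse a run of whitespace tokens
            token = default  # render whitespace as the default token
        output.append(token)

    # Trailing whitespace token, unless consolidation already left one in place.
    if not (consolidate and output[-1] in whitespace):
        output.append(default)
    return output


result = sketch_tokenize("ab ", ["ab", " "])
```

For the input 'ab ' the sketch yields [' ', 'ab', ' '], the same list as the example above.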

property tokenizer_pattern

Tokenizer pattern from transliterator

Type:: str

property tokens

Mappings of tokens to their classes.

Type:: dict of {str: set of str}

property tokens_by_class

Tokens keyed by token class, used internally.

Type:: dict of {str: list of str}
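
The relationship between tokens and tokens_by_class is an inversion of the token-to-classes mapping. A minimal sketch, assuming plain dicts as input; sketch_tokens_by_class is a hypothetical name and the library may build its mapping differently:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def sketch_tokens_by_class(tokens: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Invert {token: classes} into {class: [tokens]}."""
    by_class = defaultdict(list)
    for token, classes in tokens.items():
        for token_class in sorted(classes):  # sorted for a stable order with sets
            by_class[token_class].append(token)
    return dict(by_class)


inverted = sketch_tokens_by_class({"a": {"vowel"}, " ": {"wb"}})
```

For the settings used throughout this reference, inverted is {'vowel': ['a'], 'wb': [' ']}, which matches the "tokens_by_class" entry in the dump() example.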

transliterate(input)[source]

Transliterate an input string into an output string.

Parameters:: input (str) – Input string to transliterate
Returns:: Transliteration output string
Return type:: str
Raises:: ValueError – Cannot parse input

Note

Whitespace will be temporarily appended to start and end of input string.

Example

GraphTransliterator.from_yaml(
'''
tokens:
  a: []
  ' ': [wb]
rules:
  a: A
  ' ': '_'
whitespace:
  default: ' '
  consolidate: True
  token_class: wb
''').transliterate("a a")

'A_A'

property whitespace

Whitespace rules.

Type:: WhitespaceRules

class graphtransliterator.CoverageTransliterator(*args, **kwargs)[source]

Subclass of GraphTransliterator that logs visits to graph and onmatch rules.

Used to confirm that tests cover the entire graph and onmatch_rules.

check_coverage(raise_exception=True)[source]

Check coverage of graph and onmatch rules.

First checks graph coverage, then checks onmatch rules.

check_onmatchrules_coverage(raise_exception=True)[source]: Check coverage of onmatch rules.

clear_visited()[source]: Clear visited flags from graph and onmatch_rules.

Bundled Transliterators

graphtransliterator.transliterators

Bundled transliterators are loaded by explicitly importing graphtransliterator.transliterators. Each is an instance of graphtransliterator.transliterators.Bundled.

class graphtransliterator.transliterators.Bundled(*args, **kwargs)[source]

Subclass of GraphTransliterator used for bundled Graph Transliterator.

property directory: Directory of bundled transliterator, used to load settings.

from_JSON(check_ambiguity=False, coverage=False, **kwargs)[source]

Initialize from bundled JSON file (best for speed).

Parameters:

check_ambiguity (bool) – Should ambiguity be checked. Default is False.
coverage (bool) – Should test coverage be checked. Default is False.

from_YAML(check_ambiguity=True, coverage=True, **kwargs)[source]

Initialize from bundled YAML file (best for development).

Parameters:

check_ambiguity (bool) – Should ambiguity be checked. Default is True.
coverage (bool) – Should test coverage be checked. Default is True.

generate_yaml_tests(file=None)[source]

Generates YAML tests with complete coverage.

Uses the first token in a class as a sample. Assumes for onmatch rules that the first sample token in a class has a unique production, which may not be the case. These should be checked and edited.

load_yaml_tests()[source]

Iterator for YAML tests.

Assumes tests are found in subdirectory tests of module with name NAME_tests.yaml, e.g. source_to_target/tests/source_to_target_tests.yaml.

property name: Name of bundled transliterator, e.g. ‘Example’

classmethod new(method='json', **kwargs)[source]

Return a new class instance from method (json/yaml).

Parameters:: method (str (json or yaml)) – How to load bundled transliterator, JSON or YAML.

run_tests(transliteration_tests)[source]

Run transliteration tests.

Parameters:: transliteration_tests (dict of {str: str}) – Dictionary of tests mapping each source string to its correct target.

run_yaml_tests()[source]: Run YAML tests in MODULE/tests/MODULE_tests.yaml

property yaml_tests_filen

Metadata of transliterator

Type:: dict

class graphtransliterator.transliterators.Example(**kwargs)[source]: Example Bundled Graph Transliterator.

class graphtransliterator.transliterators.ITRANSDevanagariToUnicode(**kwargs)[source]: ITRANS Devanagari to Unicode Transliterator.

class graphtransliterator.transliterators.MetadataSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]: Schema for Bundled metadata.

graphtransliterator.transliterators.iter_names()[source]: Iterate through bundled transliterator names.

graphtransliterator.transliterators.iter_transliterators(**kwds)[source]: Iterate through instances of bundled transliterators.

Graph Classes

class graphtransliterator.DirectedGraph(node=None, edge=None, edge_list=None)[source]

A very basic dictionary- and list-based directed graph. Nodes are a list of dictionaries of node data. Edges are nested dictionaries keyed from the head -> tail -> edge properties. An edge list is maintained. Can be exported as a dictionary.

node

List of node data

Type:: list of dict

edge

Mapping from head to tail of edge, holding edge data

Type:: dict of {int: dict of {int: dict}}

edge_list

List of head and tail of each edge

Type:: list of tuple of (int, int)

Examples

from graphtransliterator import DirectedGraph
DirectedGraph()

<graphtransliterator.graphs.DirectedGraph at 0x7f4e404ee840>

add_edge(head, tail, edge_data=None)[source]

Add an edge to a graph and return its attributes as dict.

Parameters:

head (int) – Index of head of edge
tail (int) – Index of tail of edge
edge_data (dict, default {}) – Edge data

Returns:

Data of created edge

Return type:

dict

Raises:

ValueError – Invalid head or tail, or edge_data is not a dict.

Examples

g = DirectedGraph()
g.add_node()

(0, {})

g.add_node()

(1, {})

g.add_edge(0,1, {'data_key_1': 'some edge data here'})

{'data_key_1': 'some edge data here'}

g.edge

{0: {1: {'data_key_1': 'some edge data here'}}}

add_node(node_data=None)[source]

Create node and return (int, dict) of node key and object.

Parameters:: node_data (dict, default {}) – Data to be stored in created node
Returns:: Index of created node and its data
Return type:: tuple of (int, dict)
Raises:: ValueError – node_data is not a dict

Examples

g = DirectedGraph()
g.add_node()

(0, {})

g.add_node({'datakey1': 'data value'})

(1, {'datakey1': 'data value'})

g.node

[{}, {'datakey1': 'data value'}]
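
The data layout described for DirectedGraph (a list of node dicts, nested edge dicts keyed head -> tail, and an edge list) can be sketched in a few lines. This is an illustrative reimplementation under assumptions, not the library's source: SketchDirectedGraph and its to_dict() method are hypothetical names, and the validation shown is only one way to satisfy the documented ValueError cases.

```python
class SketchDirectedGraph:
    """Hypothetical list- and dict-based directed graph."""

    def __init__(self):
        self.node = []       # list of node data dicts
        self.edge = {}       # head -> tail -> edge data dict
        self.edge_list = []  # (head, tail) tuples in insertion order

    def add_node(self, node_data=None):
        """Append a node and return its index and data."""
        if node_data is None:
            node_data = {}
        if not isinstance(node_data, dict):
            raise ValueError("node_data must be a dict")
        self.node.append(node_data)
        return len(self.node) - 1, node_data

    def add_edge(self, head, tail, edge_data=None):
        """Connect two existing nodes and return the edge data."""
        if edge_data is None:
            edge_data = {}
        if not isinstance(edge_data, dict):
            raise ValueError("edge_data must be a dict")
        for index in (head, tail):
            if not isinstance(index, int) or not 0 <= index < len(self.node):
                raise ValueError(f"Invalid node index: {index!r}")
        self.edge.setdefault(head, {})[tail] = edge_data
        self.edge_list.append((head, tail))
        return edge_data

    def to_dict(self):
        """Export the graph as a plain dictionary."""
        return {"node": self.node, "edge": self.edge, "edge_list": self.edge_list}


g = SketchDirectedGraph()
first = g.add_node()
second = g.add_node({"datakey1": "data value"})
created = g.add_edge(0, 1, {"data_key_1": "some edge data here"})
```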

class graphtransliterator.VisitLoggingDirectedGraph(graph)[source]

A DirectedGraph that logs visits to all nodes and edges.

Used to measure the coverage of tests for bundled transliterators.

check_coverage(raise_exception=True)[source]

Checks that all nodes and edges are visited.

Parameters:: raise_exception (bool, default True) – Raise IncompleteGraphCoverageException if coverage is incomplete
Raises:: IncompleteGraphCoverageException – Not all nodes/edges of a graph have been visited.

clear_visited()[source]: Clear all visited attributes on nodes and edges.

Rule Classes

class graphtransliterator.TransliterationRule(production, prev_classes, prev_tokens, tokens, next_tokens, next_classes, cost)[source]

A transliteration rule containing the specific match conditions and string output to be produced, as well as the rule’s cost.

production

Output produced on match of rule

Type:: str

prev_classes

List of previous token classes to be matched before tokens or, if they exist, prev_tokens

Type:: list of str, or None

prev_tokens

List of tokens to be matched before tokens

Type:: list of str, or None

tokens

List of tokens to match

Type:: list of str

next_tokens

List of tokens to match after tokens

Type:: list of str, or None

next_classes

List of token classes to match after tokens or, if they exist, next_tokens

Type:: list of str, or None

cost

Cost of the rule, where less specific rules are more costly

Type:: float
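
The cost values that appear in this reference's examples (0.5849625007211562 for a rule matching one token, 0.41503749927884376 for a rule matching two) are consistent with log2(1 + 1/(1 + n)), where n is the number of tokens the rule requires. The sketch below assumes that formula; it is inferred from the example values, not stated in this section, and sketch_cost is a hypothetical name.

```python
import math


def sketch_cost(required_matches: int) -> float:
    """Assumed cost: the more a rule must match, the lower its cost."""
    return math.log2(1 + 1 / (1 + required_matches))


one_token = sketch_cost(1)   # roughly 0.585
two_tokens = sketch_cost(2)  # roughly 0.415
```

Under this assumption a more specific rule always costs less than a less specific one, which is why the two-token rule sorts before the one-token rule in the examples.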

class graphtransliterator.OnMatchRule(prev_classes, next_classes, production)[source]

Rules about adding text between certain combinations of matched rules.

When a transliteration rule has been found and before its production is added to the output, the production string of an OnMatchRule is added if previously matched tokens and current tokens are of the specified classes.

prev_classes

List of previously matched token classes required

Type:: list of str

next_classes

List of current and following token classes required

Type:: list of str

production

String to be added before current rule

Type:: str
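
The insertion behavior can be sketched for the simple case where every rule matches exactly one token. This is an illustration under assumptions, not the library's algorithm: sketch_transliterate is a hypothetical name, on-match rules are given as plain (prev_classes, next_classes, production) tuples, and multi-token rules are not handled.

```python
from typing import Dict, List, Sequence, Set, Tuple

OnMatch = Tuple[Sequence[str], Sequence[str], str]


def sketch_transliterate(
    tokens: List[str],
    classes: Dict[str, Set[str]],
    productions: Dict[str, str],
    onmatch_rules: List[OnMatch],
) -> str:
    """Join per-token productions, inserting on-match strings where classes line up."""
    output = []
    for i, token in enumerate(tokens):
        for prev_classes, next_classes, insertion in onmatch_rules:
            previous = tokens[max(0, i - len(prev_classes)):i]
            following = tokens[i:i + len(next_classes)]
            if (
                len(previous) == len(prev_classes)
                and len(following) == len(next_classes)
                and all(c in classes[t] for t, c in zip(previous, prev_classes))
                and all(c in classes[t] for t, c in zip(following, next_classes))
            ):
                output.append(insertion)  # added before the current production
                break  # only the first applicable on-match rule is used here
        output.append(productions[token])
    return "".join(output)


result = sketch_transliterate(
    ["ab", " ", "ab", "ab"],
    {"ab": {"class_ab"}, " ": {"wb"}},
    {"ab": "AB", " ": "_"},
    [(["class_ab"], ["class_ab"], ",")],
)
```

The token list here omits the padding whitespace that tokenize() adds; result is 'AB_AB,AB', the same output as the from_easyreading_dict() example.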

class graphtransliterator.WhitespaceRules(default, token_class, consolidate)[source]

Whitespace rules of GraphTransliterator.

default

Default whitespace token

Type:: str

token_class

Whitespace token class

Type:: str

consolidate

Consolidate consecutive whitespace tokens and render as a single instance of the specified default whitespace token.

Type:: bool

Exceptions

exception graphtransliterator.GraphTransliteratorException[source]: Base exception class. All Graph Transliterator-specific exceptions should subclass this class.

exception graphtransliterator.AmbiguousTransliterationRulesException[source]: Raised when multiple transliteration rules can match the same pattern. Details of ambiguities are given in a logging.warning().

exception graphtransliterator.NoMatchingTransliterationRuleException[source]: Raised when no transliteration rule can be matched at a particular location in the input string’s tokens. Details of the location are given in a logging.warning().

exception graphtransliterator.UnrecognizableInputTokenException[source]: Raised when a character in the input string does not correspond to any tokens in the GraphTransliterator’s token settings. Details of the location are given in a logging.warning().

Schemas

Schema for DirectedGraph.

Validates graph somewhat rigorously.

Schema for easy reading settings.

Provides initial validation based on easy reading format.

class graphtransliterator.GraphTransliteratorSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]: Schema for Graph Transliterator.

class graphtransliterator.OnMatchRuleSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]: Schema for OnMatchRule.

Schema for settings in dictionary format.

Performs validation.

class graphtransliterator.TransliterationRuleSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]: Schema for TransliterationRule.

class graphtransliterator.WhitespaceDictSettingsSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]: Schema for Whitespace definition as a dict.

class graphtransliterator.WhitespaceSettingsSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]: Schema for Whitespace definition that loads as WhitespaceRules.