API Reference
The full API reference for all public classes and functions is listed below.
Public members can (and should) be imported from graphtransliterator:
from graphtransliterator import GraphTransliterator
Bundled transliterators require that graphtransliterator.transliterators be imported:
from graphtransliterator import transliterators
transliterators.iter_names()
Core Classes
- class graphtransliterator.GraphTransliterator(tokens, rules, whitespace, onmatch_rules=None, metadata=None, ignore_errors=False, check_ambiguity=True, onmatch_rules_lookup=None, tokens_by_class=None, graph=None, tokenizer_pattern=None, graphtransliterator_version=None, **kwargs)[source]
A graph-based transliteration tool that lets you convert the symbols of one language or script to those of another using rules that you define.
Transliteration of tokens of an input string to an output string is configured by: a set of input token types with classes, pattern-matching rules involving sequences of tokens as well as preceding or following tokens and token classes, insertion rules between matches, and optional consolidation of whitespace. Rules are ordered by specificity.
Note
This constructor does not validate settings and should typically not be called directly. Use from_dict() instead. For “easy reading” support, use from_easyreading_dict(), from_yaml(), or from_yaml_file(). Keyword parameters used here (ignore_errors, check_ambiguity) can be passed from those other constructors.
- Parameters:
tokens (dict of {str: set of str}) – Mapping of input token types to token classes
rules (list of TransliterationRule) – list of transliteration rules ordered by cost
onmatch_rules (list of OnMatchRule, or None) – Rules for output to be inserted between tokens of certain classes when a transliteration rule has been matched but before its production string has been added to the output
whitespace (WhitespaceRules) – Rules for handling whitespace
metadata (dict or None) – Metadata settings
ignore_errors (bool, optional) – If true, transliteration errors are ignored and do not raise an exception. The default is false.
check_ambiguity (bool, optional) – If true (default), transliteration rules are checked for ambiguity.
load() and loads() do not check ambiguity by default.
onmatch_rules_lookup (dict of {str: dict of {str: list of int}}, optional) – OnMatchRules lookup, used internally, will be generated if not present.
tokens_by_class (dict of {str: set of str}, optional) – Tokens by class, used internally, will be generated if not present.
graph (DirectedGraph, optional) – Directed graph used by Graph Transliterator, will be generated if not present.
tokenizer_pattern (str, optional) – Regular expression pattern for input string tokenization, will be generated if not present.
graphtransliterator_version (str, optional) – Version of graphtransliterator, added by dump() and dumps().
Example
from graphtransliterator import GraphTransliterator, OnMatchRule, TransliterationRule, WhitespaceRules
settings = {'tokens': {'a': {'vowel'}, ' ': {'wb'}},
            'onmatch_rules': [OnMatchRule(prev_classes=['vowel'], next_classes=['vowel'], production=',')],
            'rules': [TransliterationRule(production='A', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
                      TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562)],
            'metadata': {'author': 'Author McAuthorson'},
            'whitespace': WhitespaceRules(default=' ', token_class='wb', consolidate=False)}
gt = GraphTransliterator(**settings)
gt.transliterate('a')
'A'
See also
from_dict – Constructor from dictionary of settings
from_easyreading_dict – Constructor from dictionary in “easy reading” format
from_yaml – Constructor from YAML string in “easy reading” format
from_yaml_file – Constructor from YAML file in “easy reading” format
- dump(compression_level=0)[source]
Dump configuration of Graph Transliterator to Python data types.
Compression is turned off by default.
- Parameters:
compression_level (int) – One of 0 (default, no compression), 1 (compression including graph), or 2 (compression without graph)
- Returns:
GraphTransliterator configuration as a dictionary with keys:
"tokens" – Mappings of tokens to their classes (OrderedDict of {str: list of str})
"rules" – Transliteration rules in direct format (list of dict of {str: str})
"whitespace" – Whitespace settings (dict of {str: str})
"onmatch_rules" – On match rules (list of OrderedDict)
"metadata" – Dictionary of metadata (dict)
"ignore_errors" – Ignore errors in transliteration (bool)
"onmatch_rules_lookup" – Dictionary keyed by current token, then previous token, holding a list of indexes of applicable OnMatchRule entries to try (dict of {str: dict of {str: list of int}})
"tokens_by_class" – Tokens keyed by token class, used internally (dict of {str: list of str})
"graph" – Serialization of DirectedGraph (dict)
"tokenizer_pattern" – Regular expression for tokenizing (str)
"graphtransliterator_version" – Module version of graphtransliterator (str)
- Return type:
OrderedDict
Example
yaml_ = '''
tokens:
  a: [vowel]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: " "
  consolidate: false
  token_class: wb
onmatch_rules:
  - <vowel> + <vowel>: ','  # add a comma between vowels
metadata:
  author: "Author McAuthorson"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.dump()
OrderedDict([('tokens', {'a': ['vowel'], ' ': ['wb']}), ('rules', [OrderedDict([('production', 'A'), ('tokens', ['a']), ('cost', 0.5849625007211562)]), OrderedDict([('production', ' '), ('tokens', [' ']), ('cost', 0.5849625007211562)])]), ('whitespace', {'default': ' ', 'token_class': 'wb', 'consolidate': False}), ('onmatch_rules', [OrderedDict([('prev_classes', ['vowel']), ('next_classes', ['vowel']), ('production', ',')])]), ('metadata', {'author': 'Author McAuthorson'}), ('ignore_errors', False), ('onmatch_rules_lookup', {'a': {'a': [0]}}), ('tokens_by_class', {'vowel': ['a'], 'wb': [' ']}), ('graph', {'node': [{'type': 'Start', 'ordered_children': {'a': [1], ' ': [3]}}, {'token': 'a', 'type': 'token', 'ordered_children': {'__rules__': [2]}}, {'type': 'rule', 'accepting': True, 'rule_key': 0}, {'token': ' ', 'type': 'token', 'ordered_children': {'__rules__': [4]}}, {'type': 'rule', 'accepting': True, 'rule_key': 1}], 'edge': {0: {1: {'token': 'a', 'cost': 0.5849625007211562}, 3: {'token': ' ', 'cost': 0.5849625007211562}}, 1: {2: {'cost': 0.5849625007211562}}, 3: {4: {'cost': 0.5849625007211562}}}, 'edge_list': [(0, 1), (0, 3), (1, 2), (3, 4)]}), ('tokenizer_pattern', '(a|\\ )'), ('graphtransliterator_version', '1.2.4')])
- dumps(compression_level=2)[source]
Dump settings of Graph Transliterator to JavaScript Object Notation (JSON).
Compression is turned on by default.
- Parameters:
compression_level (int) – One of 0 (no compression), 1 (compression including graph), or 2 (default, compression without graph)
separators (tuple of str) – Separators used by json.dumps(); the default is compact
- Returns:
JSON string
- Return type:
str
Examples
yaml_ = '''
tokens:
  a: [vowel]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: " "
  consolidate: false
  token_class: wb
onmatch_rules:
  - <vowel> + <vowel>: ','  # add a comma between vowels
metadata:
  author: "Author McAuthorson"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.dumps()
'{"graphtransliterator_version":"1.2.4","compressed_settings":[["vowel","wb"],[" ","a"],[[1],[0]],[["A",0,0,[1],0,0,-1],[" ",0,0,[0],0,0,-1]],[" ","wb",0],[[[0],[0],","]],{"author":"Author McAuthorson"},null]}'
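The JSON above contains no spaces after commas or colons. Judging from that output, the default separators appear to be the compact pair (',', ':'); that exact pair is an inference, not something stated here. The standard-library sketch below shows the difference the separators argument of json.dumps() makes, using a small hypothetical settings fragment.

```python
import json

# Hypothetical fragment of dumped settings, used only to compare separator styles
fragment = {"graphtransliterator_version": "1.2.4", "metadata": {"author": "Author McAuthorson"}}

default_json = json.dumps(fragment)                         # uses ', ' and ': '
compact_json = json.dumps(fragment, separators=(",", ":"))  # no whitespace, as in the dumps() output above

print(default_json)
print(compact_json)
```

Both strings decode to the same object; the compact form is simply shorter.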
- static from_dict(dict_settings, **kwargs)[source]
Generate GraphTransliterator from dict settings.
- Parameters:
dict_settings (dict) – Dictionary of settings
- Returns:
Graph transliterator
- Return type:
GraphTransliterator
- static from_easyreading_dict(easyreading_settings, **kwargs)[source]
Constructs GraphTransliterator from a dictionary of settings in “easy reading” format, i.e. the loaded contents of a YAML string.
- Parameters:
easyreading_settings (dict) –
Settings dictionary in easy reading format with keys:
"tokens" – Mappings of tokens to their classes (dict of {str: list of str})
"rules" – Transliteration rules in “easy reading” format (dict of {str: str})
"onmatch_rules" – On match rules in “easy reading” format (list of dict of {str: str}, optional)
"whitespace" – Whitespace definitions, including default whitespace token, class of whitespace tokens, and whether or not to consolidate (dict of {'default': str, 'token_class': str, 'consolidate': bool}, optional)
"metadata" – Dictionary of metadata (dict, optional)
- Returns:
Graph Transliterator
- Return type:
GraphTransliterator
Note
Called by from_yaml().
Example
tokens = {
    'ab': ['class_ab'],
    ' ': ['wb']
}
whitespace = {
    'default': ' ',
    'token_class': 'wb',
    'consolidate': True
}
onmatch_rules = [
    {'<class_ab> + <class_ab>': ','}
]
rules = {'ab': 'AB',
         ' ': '_'}
settings = {'tokens': tokens,
            'rules': rules,
            'whitespace': whitespace,
            'onmatch_rules': onmatch_rules}
gt = GraphTransliterator.from_easyreading_dict(settings)
gt.transliterate("ab abab")
'AB_AB,AB'
See also
from_yaml – Constructor from YAML string in “easy reading” format
from_yaml_file – Constructor from YAML file in “easy reading” format
- static from_yaml(yaml_str, charnames_escaped=True, **kwargs)[source]
Construct GraphTransliterator from a YAML str.
- Parameters:
yaml_str (str) – YAML mappings of tokens, rules, and (optionally) onmatch_rules
charnames_escaped (boolean) – Unescape Unicode during YAML read (default True)
Note
Called by from_yaml_file() and calls from_easyreading_dict().
Example
yaml_ = '''
tokens:
  a: [class1]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: ' '
  consolidate: True
  token_class: wb
onmatch_rules:
  - <class1> + <class1>: "+"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.transliterate("a aa")
'A A+A'
See also
from_easyreading_dict – Constructor from dictionary in “easy reading” format
from_yaml – Constructor from YAML string in “easy reading” format
from_yaml_file – Constructor from YAML file in “easy reading” format
- static from_yaml_file(yaml_filename, **kwargs)[source]
Construct GraphTransliterator from YAML file.
- Parameters:
yaml_filename (str) – Name of YAML file, containing tokens, rules, and (optionally) onmatch_rules
Note
Calls from_yaml().
See also
from_yaml – Constructor from YAML string in “easy reading” format
from_easyreading_dict – Constructor from dictionary in “easy reading” format
- property graph
Graph used in transliteration.
- Type:
DirectedGraph
- property graphtransliterator_version
Graph Transliterator version.
- Type:
str
- property ignore_errors
Ignore transliteration errors setting.
- Type:
bool
- property last_input_tokens
Last tokenization of the input string, with whitespace at start and end.
- Type:
list of str
- property last_matched_rule_tokens
Last matched tokens for each rule.
- Type:
list of list of str
- property last_matched_rules
Last transliteration rules matched.
- Type:
list of TransliterationRule
- static load(settings, **kwargs)[source]
Create GraphTransliterator from settings as Python data types.
- Parameters:
settings –
GraphTransliterator configuration as a dictionary with keys:
"tokens" – Mappings of tokens to their classes (dict of {str: list of str})
"rules" – Transliteration rules in direct format (list of OrderedDict of {str: str})
"whitespace" – Whitespace settings (dict of {str: str})
"onmatch_rules" – On match rules (list of OrderedDict, optional)
"metadata" – Dictionary of metadata (dict, optional)
"ignore_errors" – Ignore errors (bool, optional)
"onmatch_rules_lookup" – Dictionary keyed by current token, then previous token, holding a list of indexes of applicable OnMatchRule entries to try (dict of {str: dict of {str: list of int}}, optional)
"tokens_by_class" – Tokens keyed by token class, used internally (dict of {str: list of str}, optional)
"graph" – Serialization of DirectedGraph (dict, optional)
"tokenizer_pattern" – Regular expression for tokenizing (str, optional)
"graphtransliterator_version" – Module version of graphtransliterator (str, optional)
- Returns:
Graph Transliterator
- Return type:
GraphTransliterator
Example
from collections import OrderedDict
settings = {'tokens': {'a': ['vowel'], ' ': ['wb']},
            'rules': [OrderedDict([('production', 'A'),
                                   # Can be compacted, removing None values
                                   # ('prev_tokens', None),
                                   ('tokens', ['a']),
                                   ('next_classes', None),
                                   ('next_tokens', None),
                                   ('cost', 0.5849625007211562)]),
                      OrderedDict([('production', ' '),
                                   ('prev_classes', None),
                                   ('prev_tokens', None),
                                   ('tokens', [' ']),
                                   ('next_classes', None),
                                   ('next_tokens', None),
                                   ('cost', 0.5849625007211562)])],
            'whitespace': {'default': ' ', 'token_class': 'wb', 'consolidate': False},
            'onmatch_rules': [OrderedDict([('prev_classes', ['vowel']),
                                           ('next_classes', ['vowel']),
                                           ('production', ',')])],
            'metadata': {'author': 'Author McAuthorson'},
            'onmatch_rules_lookup': {'a': {'a': [0]}},
            'tokens_by_class': {'vowel': ['a'], 'wb': [' ']},
            'graph': {'edge': {0: {1: {'token': 'a', 'cost': 0.5849625007211562},
                                   3: {'token': ' ', 'cost': 0.5849625007211562}},
                               1: {2: {'cost': 0.5849625007211562}},
                               3: {4: {'cost': 0.5849625007211562}}},
                      'node': [{'type': 'Start', 'ordered_children': {'a': [1], ' ': [3]}},
                               {'type': 'token', 'token': 'a', 'ordered_children': {'__rules__': [2]}},
                               {'type': 'rule',
                                'rule_key': 0,
                                'accepting': True,
                                'ordered_children': {}},
                               {'type': 'token', 'token': ' ', 'ordered_children': {'__rules__': [4]}},
                               {'type': 'rule',
                                'rule_key': 1,
                                'accepting': True,
                                'ordered_children': {}}],
                      'edge_list': [(0, 1), (1, 2), (0, 3), (3, 4)]},
            'tokenizer_pattern': '(a|\\ )',  # escaped backslash avoids an invalid escape sequence
            'graphtransliterator_version': '0.3.3'}
gt = GraphTransliterator.load(settings)
gt.transliterate('aa')
'A,A'
# can be compacted
settings.pop('onmatch_rules_lookup')
GraphTransliterator.load(settings).transliterate('aa')
'A,A'
- static loads(settings, **kwargs)[source]
Create GraphTransliterator from JavaScript Object Notation (JSON) string.
- Parameters:
settings – JSON settings for GraphTransliterator
- Returns:
Graph Transliterator
- Return type:
GraphTransliterator
Example
JSON_settings = '''{"tokens": {"a": ["vowel"], " ": ["wb"]}, "rules": [{"production": "A", "prev_classes": null, "prev_tokens": null, "tokens": ["a"], "next_classes": null, "next_tokens": null, "cost": 0.5849625007211562}, {"production": " ", "prev_classes": null, "prev_tokens": null, "tokens": [" "], "next_classes": null, "next_tokens": null, "cost": 0.5849625007211562}], "whitespace": {"default": " ", "token_class": "wb", "consolidate": false}, "onmatch_rules": [{"prev_classes": ["vowel"], "next_classes": ["vowel"], "production": ","}], "metadata": {"author": "Author McAuthorson"}, "ignore_errors": false, "onmatch_rules_lookup": {"a": {"a": [0]}}, "tokens_by_class": {"vowel": ["a"], "wb": [" "]}, "graph": {"node": [{"type": "Start", "ordered_children": {"a": [1], " ": [3]}}, {"type": "token", "token": "a", "ordered_children": {"__rules__": [2]}}, {"type": "rule", "rule_key": 0, "accepting": true, "ordered_children": {}}, {"type": "token", "token": " ", "ordered_children": {"__rules__": [4]}}, {"type": "rule", "rule_key": 1, "accepting": true, "ordered_children": {}}], "edge": {"0": {"1": {"token": "a", "cost": 0.5849625007211562}, "3": {"token": " ", "cost": 0.5849625007211562}}, "1": {"2": {"cost": 0.5849625007211562}}, "3": {"4": {"cost": 0.5849625007211562}}}, "edge_list": [[0, 1], [1, 2], [0, 3], [3, 4]]}, "tokenizer_pattern": "(a| )", "graphtransliterator_version": "1.2.2"}'''

gt = GraphTransliterator.loads(JSON_settings)
gt.transliterate('a')
'A'
- match_at(token_i, tokens, match_all=False)[source]
Match best (least costly) transliteration rule at a given index in the input tokens and return the index to that rule. Optionally, return all rules that match.
- Parameters:
token_i (int) – Location in tokens at which to begin
tokens (list of str) – List of tokens
match_all (bool, optional) – If true, return the indexes of all rules matching at the given index. The default is false.
- Returns:
Index of matching transliteration rule in GraphTransliterator.rules, or None. If match_all is true, returns a list of int, which is empty if no rule matches.
- Return type:
int, None, or list of int
Note
Expects a whitespace token at the beginning and end of tokens.
Examples
gt = GraphTransliterator.from_yaml('''
  tokens:
    a: []
    a a: []
    ' ': [wb]
  rules:
    a: <A>
    a a: <AA>
  whitespace:
    default: ' '
    consolidate: True
    token_class: wb
''')
tokens = gt.tokenize("aa")
tokens  # whitespace added to ends
[' ', 'a', 'a', ' ']
gt.match_at(1, tokens)  # returns index to rule
0
gt.rules[gt.match_at(1, tokens)]  # actual rule
TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.41503749927884376)
gt.match_at(1, tokens, match_all=True)  # index to rules, with match_all
[0, 1]
[gt.rules[_] for _ in gt.match_at(1, tokens, match_all=True)]
[TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.41503749927884376), TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
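The lookup that match_at() performs can be approximated with a linear scan. The sketch below assumes the rule list is already sorted by cost, as GraphTransliterator.rules is, and it ignores previous and next tokens and classes entirely; the library itself walks its DirectedGraph instead. match_at_sketch is a hypothetical name.

```python
from typing import List, Optional, Sequence, Union

def match_at_sketch(
    token_i: int,
    tokens: Sequence[str],
    rule_tokens: Sequence[Sequence[str]],
    match_all: bool = False,
) -> Union[Optional[int], List[int]]:
    """Return the index of the first (least costly) matching rule, or all matching indexes.

    rule_tokens holds the `tokens` of each rule, already sorted by cost.
    Context conditions (prev/next tokens and classes) are ignored in this sketch.
    """
    matches = [
        i
        for i, wanted in enumerate(rule_tokens)
        if list(tokens[token_i:token_i + len(wanted)]) == list(wanted)
    ]
    if match_all:
        return matches
    return matches[0] if matches else None

# Same data as the example above: rule 0 is <AA>, rule 1 is <A>
rules = [["a", "a"], ["a"]]
tokens = [" ", "a", "a", " "]
print(match_at_sketch(1, tokens, rules))                  # 0
print(match_at_sketch(1, tokens, rules, match_all=True))  # [0, 1]
```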
- property metadata
Metadata of transliterator
- Type:
dict
- property onmatch_rules
Rules for productions between matches.
- Type:
list of OnMatchRule
- property onmatch_rules_lookup
On Match Rules lookup
- Type:
dict
- property productions
List of productions of each transliteration rule.
- Type:
list of str
- pruned_of(productions)[source]
Remove transliteration rules with specific output productions.
- Parameters:
productions (str, or list of str) – list of productions to remove
- Returns:
Graph transliterator pruned of certain productions.
- Return type:
GraphTransliterator
Note
Uses the original initialization parameters to construct a new GraphTransliterator.
Examples
gt = GraphTransliterator.from_yaml('''
  tokens:
    a: []
    a a: []
    ' ': [wb]
  rules:
    a: <A>
    a a: <AA>
  whitespace:
    default: ' '
    consolidate: True
    token_class: wb
''')
gt.rules
[TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.41503749927884376), TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
gt.pruned_of('<AA>').rules
[TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
gt.pruned_of(['<A>', '<AA>']).rules
[]
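pruned_of() accepts either a single production or a list of them. The filtering step alone can be sketched as below; Rule is a hypothetical namedtuple standing in for TransliterationRule, and prune_rules is a hypothetical helper. The real method additionally rebuilds a full GraphTransliterator from its original parameters.

```python
from collections import namedtuple
from typing import List, Sequence, Union

# Hypothetical stand-in with only the fields this sketch needs
Rule = namedtuple("Rule", ["production", "tokens"])

def prune_rules(rules: Sequence[Rule], productions: Union[str, List[str]]) -> List[Rule]:
    """Drop every rule whose production is in `productions` (a str or a list of str)."""
    if isinstance(productions, str):
        productions = [productions]
    to_remove = set(productions)
    return [rule for rule in rules if rule.production not in to_remove]

rules = [Rule("<AA>", ["a", "a"]), Rule("<A>", ["a"])]
print([r.production for r in prune_rules(rules, "<AA>")])          # ['<A>']
print([r.production for r in prune_rules(rules, ["<A>", "<AA>"])])  # []
```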
- property rules
Transliteration rules sorted by cost.
- Type:
list of TransliterationRule
- tokenize(input)[source]
Tokenizes an input string.
Adds initial and trailing whitespace, which can be consolidated.
- Parameters:
input (str) – String to tokenize
- Returns:
List of tokens, with default whitespace token at beginning and end.
- Return type:
list of str
- Raises:
ValueError – Unrecognizable input, such as a character that is not in a token
Examples
tokens = {'ab': ['class_ab'], ' ': ['wb']}
whitespace = {'default': ' ', 'token_class': 'wb', 'consolidate': True}
rules = {'ab': 'AB', ' ': '_'}
settings = {'tokens': tokens, 'rules': rules, 'whitespace': whitespace}
gt = GraphTransliterator.from_easyreading_dict(settings)
gt.tokenize('ab ')
[' ', 'ab', ' ']
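The tokenizer_pattern property and the '(a|\\ )' value in the dump() output suggest tokenization by a regular-expression alternation of escaped tokens. The standard-library sketch below follows that idea under stated assumptions: longer tokens are tried first, unmatched input raises ValueError, and consolidation turns each run of whitespace tokens into one default token. make_pattern and tokenize_sketch are hypothetical names, not library functions.

```python
import re
from typing import Iterable, List, Set

def make_pattern(tokens: Iterable[str]) -> "re.Pattern[str]":
    # Longest tokens first so a multi-character token wins over its prefix
    ordered = sorted(tokens, key=len, reverse=True)
    return re.compile("(" + "|".join(re.escape(t) for t in ordered) + ")")

def tokenize_sketch(text: str, pattern: "re.Pattern[str]", ws_tokens: Set[str],
                    default: str = " ", consolidate: bool = True) -> List[str]:
    tokens = [default]  # whitespace padding at the start
    pos = 0
    while pos < len(text):
        match = pattern.match(text, pos)
        if match is None:
            raise ValueError(f"Unrecognizable input at index {pos}: {text[pos]!r}")
        tokens.append(match.group(0))
        pos = match.end()
    tokens.append(default)  # whitespace padding at the end
    if consolidate:
        collapsed: List[str] = []
        prev_was_ws = False
        for tok in tokens:
            is_ws = tok in ws_tokens
            if is_ws and prev_was_ws:
                continue  # one default token already emitted for this run
            collapsed.append(default if is_ws else tok)
            prev_was_ws = is_ws
        tokens = collapsed
    return tokens

pattern = make_pattern(["ab", " "])
print(tokenize_sketch("ab ", pattern, {" "}))  # [' ', 'ab', ' ']
```

With consolidation on, the trailing space in "ab " merges with the end padding, matching the tokenize() example above.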
- property tokenizer_pattern
Tokenizer pattern from transliterator
- Type:
str
- property tokens
Mappings of tokens to their classes.
- Type:
dict of {str: set of str}
- property tokens_by_class
Tokens keyed by token class.
- Type:
dict of {str: list of str}
- transliterate(input)[source]
Transliterate an input string into an output string.
- Parameters:
input (str) – Input string to transliterate
- Returns:
Transliteration output string
- Return type:
str
- Raises:
ValueError – Cannot parse input
Note
Whitespace is temporarily added to the start and end of the input string.
Example
GraphTransliterator.from_yaml(
'''
tokens:
  a: []
  ' ': [wb]
rules:
  a: A
  ' ': '_'
whitespace:
  default: ' '
  consolidate: True
  token_class: wb
''').transliterate("a a")
'A_A'
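To make the interplay of rules, costs and onmatch rules concrete, here is a much-simplified, standard-library sketch that transliterates already-padded tokens. It is not the library's graph search: it ignores previous/next context, treats a rule with more tokens as cheaper (consistent with the costs shown in this reference), and compares only the last matched token and the current token against each onmatch rule. transliterate_sketch and its argument layout are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

def transliterate_sketch(
    tokens: Sequence[str],
    rules: Dict[Tuple[str, ...], str],
    classes: Dict[str, Set[str]],
    onmatch: List[Tuple[str, str, str]],
) -> str:
    """Greedy, context-free transliteration of whitespace-padded tokens.

    rules maps a tuple of tokens to its production; classes maps each token to
    its classes; onmatch holds (previous class, current class, inserted text).
    """
    # A rule with more tokens has a lower cost, so try longer rules first
    ordered = sorted(rules.items(), key=lambda item: len(item[0]), reverse=True)
    end = len(tokens) - 1  # index of the trailing whitespace padding
    out: List[str] = []
    prev_token: Optional[str] = None
    i = 1  # skip the leading whitespace padding
    while i < end:
        for rule_tokens, production in ordered:
            n = len(rule_tokens)
            if i + n <= end and tuple(tokens[i:i + n]) == rule_tokens:
                break
        else:
            raise ValueError(f"No matching rule at token index {i}")
        if prev_token is not None:
            for prev_cls, curr_cls, inserted in onmatch:
                if prev_cls in classes[prev_token] and curr_cls in classes[tokens[i]]:
                    out.append(inserted)  # goes before the rule's production
                    break
        out.append(production)
        prev_token = tokens[i + n - 1]
        i += n
    return "".join(out)

# Data from the from_easyreading_dict() example, pre-tokenized with padding
rules = {("ab",): "AB", (" ",): "_"}
classes = {"ab": {"class_ab"}, " ": {"wb"}}
onmatch = [("class_ab", "class_ab", ",")]
print(transliterate_sketch([" ", "ab", " ", "ab", "ab", " "], rules, classes, onmatch))  # AB_AB,AB
```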
- property whitespace
Whitespace rules.
- Type:
WhitespaceRules
- class graphtransliterator.CoverageTransliterator(*args, **kwargs)[source]
Subclass of GraphTransliterator that logs visits to the graph and to onmatch rules.
Used to confirm that tests cover the entire graph and onmatch_rules.
Bundled Transliterators
graphtransliterator.transliterators
Bundled transliterators are loaded by explicitly importing graphtransliterator.transliterators. Each is an instance of graphtransliterator.bundled.Bundled.
- class graphtransliterator.transliterators.Bundled(*args, **kwargs)[source]
Subclass of GraphTransliterator used for bundled Graph Transliterator.
- property directory
Directory of bundled transliterator, used to load settings.
- from_JSON(check_ambiguity=False, coverage=False, **kwargs)[source]
Initialize from bundled JSON file (best for speed).
- Parameters:
check_ambiguity (bool) – Whether to check for ambiguity. Default is False.
coverage (bool) – Whether to check test coverage. Default is False.
- from_YAML(check_ambiguity=True, coverage=True, **kwargs)[source]
Initialize from bundled YAML file (best for development).
- Parameters:
check_ambiguity (bool) – Whether to check for ambiguity. Default is True.
coverage (bool) – Whether to check test coverage. Default is True.
- generate_yaml_tests(file=None)[source]
Generates YAML tests with complete coverage.
Uses the first token in each class as a sample. For onmatch rules, assumes that the first sample token in a class has a unique production, which may not be the case, so the generated tests should be checked and edited.
- load_yaml_tests()[source]
Iterator for YAML tests.
Assumes tests are found in the tests subdirectory of the module, in a file named NAME_tests.yaml, e.g. source_to_target/tests/source_to_target_tests.yaml.
- property name
Name of bundled transliterator, e.g. ‘Example’
- classmethod new(method='json', **kwargs)[source]
Return a new class instance from method (json/yaml).
- Parameters:
method (str (json or yaml)) – How to load bundled transliterator, JSON or YAML.
- run_tests(transliteration_tests)[source]
Run transliteration tests.
- Parameters:
transliteration_tests (dict of {str: str}) – Dictionary of tests mapping each source string to its correct target.
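The shape of a transliteration_tests dictionary, and one way to check it, can be sketched without the library. How run_tests() itself reports failures is not described here, so this hypothetical checker simply collects mismatches; upper_stub is a hypothetical stand-in for a transliterator's transliterate() method.

```python
from typing import Callable, Dict, List, Tuple

def check_transliterations(
    transliterate: Callable[[str], str], tests: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (source, expected, actual) for every test whose output differs."""
    failures = []
    for source, expected in tests.items():
        actual = transliterate(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures

def upper_stub(text: str) -> str:
    # Hypothetical stand-in for GraphTransliterator.transliterate
    return text.upper().replace(" ", "_")

print(check_transliterations(upper_stub, {"a a": "A_A", "ab": "AB", "b": "C"}))
# [('b', 'C', 'B')]
```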
- property yaml_tests_filen
Metadata of transliterator
- Type:
dict
- class graphtransliterator.transliterators.Example(**kwargs)[source]
Example Bundled Graph Transliterator.
- class graphtransliterator.transliterators.ITRANSDevanagariToUnicode(**kwargs)[source]
ITRANS Devanagari to Unicode Transliterator.
- class graphtransliterator.transliterators.MetadataSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]
Schema for Bundled metadata.
Graph Classes
- class graphtransliterator.DirectedGraph(node=None, edge=None, edge_list=None)[source]
A very basic dictionary- and list-based directed graph. Nodes are a list of dictionaries of node data. Edges are nested dictionaries keyed from the head -> tail -> edge properties. An edge list is maintained. Can be exported as a dictionary.
- node
List of node data
- Type:
list of dict
- edge
Mapping from head to tail of edge, holding edge data
- Type:
dict of {int: dict of {int: dict}}
- edge_list
List of head and tail of each edge
- Type:
list of tuple of (int, int)
Examples
from graphtransliterator import DirectedGraph
DirectedGraph()
<graphtransliterator.graphs.DirectedGraph at 0x7f4e404ee840>
- add_edge(head, tail, edge_data=None)[source]
Add an edge to a graph and return its attributes as dict.
- Parameters:
head (int) – Index of head of edge
tail (int) – Index of tail of edge
edge_data (dict, default {}) – Edge data
- Returns:
Data of created edge
- Return type:
dict
- Raises:
ValueError – Invalid head or tail, or edge_data is not a dict.
Examples
g = DirectedGraph()
g.add_node()
(0, {})
g.add_node()
(1, {})
g.add_edge(0, 1, {'data_key_1': 'some edge data here'})
{'data_key_1': 'some edge data here'}
g.edge
{0: {1: {'data_key_1': 'some edge data here'}}}
- add_node(node_data=None)[source]
Create node and return (int, dict) of node key and object.
- Parameters:
node_data (dict, default {}) – Data to be stored in created node
- Returns:
Index of created node and its data
- Return type:
tuple of (int, dict)
- Raises:
ValueError – node_data is not a dict
Examples
g = DirectedGraph()
g.add_node()
(0, {})
g.add_node({'datakey1': 'data value'})
(1, {'datakey1': 'data value'})
g.node
[{}, {'datakey1': 'data value'}]
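The layout described for DirectedGraph (nodes as a list of dicts, edges as nested head -> tail -> data dicts, plus an edge list) fits in a few lines of standard Python. MiniDirectedGraph below is a hypothetical illustration that reproduces the outputs shown for add_node() and add_edge(); it is not the library's implementation, and its validation rules are assumptions.

```python
from typing import Dict, List, Optional, Tuple

class MiniDirectedGraph:
    """Hypothetical dict- and list-based directed graph mirroring DirectedGraph's layout."""

    def __init__(self) -> None:
        self.node: List[dict] = []                   # node index -> node data
        self.edge: Dict[int, Dict[int, dict]] = {}   # head -> tail -> edge data
        self.edge_list: List[Tuple[int, int]] = []   # (head, tail) in insertion order

    def add_node(self, node_data: Optional[dict] = None) -> Tuple[int, dict]:
        if node_data is None:
            node_data = {}
        if not isinstance(node_data, dict):
            raise ValueError("node_data is not a dict")
        self.node.append(node_data)
        return len(self.node) - 1, node_data

    def add_edge(self, head: int, tail: int, edge_data: Optional[dict] = None) -> dict:
        if edge_data is None:
            edge_data = {}
        if not isinstance(edge_data, dict):
            raise ValueError("edge_data is not a dict")
        for index in (head, tail):
            if not (isinstance(index, int) and 0 <= index < len(self.node)):
                raise ValueError(f"Invalid node index: {index!r}")
        self.edge.setdefault(head, {})[tail] = edge_data
        self.edge_list.append((head, tail))
        return edge_data

    def to_dict(self) -> dict:
        # Same three keys as the "graph" entry in the dump() output
        return {"node": self.node, "edge": self.edge, "edge_list": self.edge_list}

g = MiniDirectedGraph()
print(g.add_node())  # (0, {})
print(g.add_node())  # (1, {})
print(g.add_edge(0, 1, {'data_key_1': 'some edge data here'}))  # {'data_key_1': 'some edge data here'}
print(g.edge)        # {0: {1: {'data_key_1': 'some edge data here'}}}
```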
- class graphtransliterator.VisitLoggingDirectedGraph(graph)[source]
A DirectedGraph that logs visits to all nodes and edges.
Used to measure the coverage of tests for bundled transliterators.
Rule Classes
- class graphtransliterator.TransliterationRule(production, prev_classes, prev_tokens, tokens, next_tokens, next_classes, cost)[source]
A transliteration rule containing the specific match conditions and string output to be produced, as well as the rule’s cost.
- production
Output produced on match of rule
- Type:
str
- prev_classes
List of previous token classes to be matched before tokens or, if they exist, prev_tokens
- Type:
list of str, or None
- prev_tokens
List of tokens to be matched before tokens
- Type:
list of str, or None
- tokens
List of tokens to match
- Type:
list of str
- next_tokens
List of tokens to match after tokens
- Type:
list of str, or None
- next_classes
List of token classes to match after tokens or, if they exist, next_tokens
- Type:
list of str, or None
- cost
Cost of the rule, where less specific rules are more costly
- Type:
float
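The costs printed in this reference, 0.5849625007211562 for the one-token rules and 0.41503749927884376 for the two-token rule <AA>, match log2(1 + 1/(1 + n)) for a rule with n tokens. The sketch below assumes that formula; how previous/next tokens and classes contribute to n is not covered here, and rule_cost is a hypothetical helper. Because the cost falls as n grows, more specific rules sort ahead of less specific ones.

```python
import math

def rule_cost(n_tokens: int) -> float:
    """Assumed cost model: fewer tokens (less specific) -> higher cost."""
    return math.log2(1 + 1 / (1 + n_tokens))

for n in (1, 2, 3):
    print(n, rule_cost(n))
```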
- class graphtransliterator.OnMatchRule(prev_classes, next_classes, production)[source]
Rules about adding text between certain combinations of matched rules.
When a transliteration rule has been found and before its production is added to the output, the production string of an OnMatchRule is added if the previously matched tokens and the current tokens are of the specified classes.
- prev_classes
List of previously matched token classes required
- Type:
list of str
- next_classes
List of current and following token classes required
- Type:
list of str
- production
String to be added before the current rule
- Type:
str
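dump() reports an onmatch_rules_lookup of {'a': {'a': [0]}} for a single <vowel> + <vowel> rule, described as keyed by current token, then previous token, holding indexes of OnMatchRule entries to try. The hypothetical sketch below builds a table of that shape from tokens_by_class. It assumes only the last previous class and the first next class are used as keys; the library's exact construction may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def build_onmatch_lookup(
    onmatch_rules: Sequence[Tuple[List[str], List[str]]],
    tokens_by_class: Dict[str, List[str]],
) -> Dict[str, Dict[str, List[int]]]:
    """Map current token -> previous token -> indexes of onmatch rules to try.

    onmatch_rules holds (prev_classes, next_classes) for each rule, in rule order.
    """
    lookup = defaultdict(lambda: defaultdict(list))
    for index, (prev_classes, next_classes) in enumerate(onmatch_rules):
        # Assumption: key on the classes adjacent to the match boundary
        for curr_token in tokens_by_class[next_classes[0]]:
            for prev_token in tokens_by_class[prev_classes[-1]]:
                lookup[curr_token][prev_token].append(index)
    # Convert the nested defaultdicts to plain dicts
    return {curr: dict(prevs) for curr, prevs in lookup.items()}

print(build_onmatch_lookup([(["vowel"], ["vowel"])], {"vowel": ["a"], "wb": [" "]}))
# {'a': {'a': [0]}}
```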
- class graphtransliterator.WhitespaceRules(default, token_class, consolidate)[source]
Whitespace rules of GraphTransliterator.
- default
Default whitespace token
- Type:
str
- token_class
Whitespace token class
- Type:
str
- consolidate
Consolidate consecutive whitespace tokens and render as a single instance of the specified default whitespace token.
- Type:
bool
Exceptions
- exception graphtransliterator.GraphTransliteratorException[source]
Base exception class. All Graph Transliterator-specific exceptions should subclass this class.
- exception graphtransliterator.AmbiguousTransliterationRulesException[source]
Raised when multiple transliteration rules can match the same pattern. Details of ambiguities are given in a logging.warning().
- exception graphtransliterator.NoMatchingTransliterationRuleException[source]
Raised when no transliteration rule can be matched at a particular location in the input string’s tokens. Details of the location are given in a logging.warning().
- exception graphtransliterator.UnrecognizableInputTokenException[source]
Raised when a character in the input string does not correspond to any tokens in the GraphTransliterator’s token settings. Details of the location are given in a logging.warning().
Schemas
- class graphtransliterator.DirectedGraphSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]
Schema for DirectedGraph.
Validates graph somewhat rigorously.
- class graphtransliterator.EasyReadingSettingsSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]
Schema for easy reading settings.
Provides initial validation based on easy reading format.
- class graphtransliterator.GraphTransliteratorSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]
Schema for Graph Transliterator.
- class graphtransliterator.OnMatchRuleSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]
Schema for OnMatchRule.
- class graphtransliterator.SettingsSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]
Schema for settings in dictionary format.
Performs validation.
- class graphtransliterator.TransliterationRuleSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]
Schema for TransliterationRule.
- class graphtransliterator.WhitespaceDictSettingsSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]
Schema for Whitespace definition as a dict.
- class graphtransliterator.WhitespaceSettingsSchema(*, only: Sequence[str] | AbstractSet[str] | None = None, exclude: Sequence[str] | AbstractSet[str] = (), many: bool = False, context: dict | None = None, load_only: Sequence[str] | AbstractSet[str] = (), dump_only: Sequence[str] | AbstractSet[str] = (), partial: bool | Sequence[str] | AbstractSet[str] | None = None, unknown: str | None = None)[source]
Schema for Whitespace definition that loads as WhitespaceRules.