API Reference¶
The full API reference for all public classes and functions is below.
Public members can (and should) be imported from graphtransliterator:

from graphtransliterator import GraphTransliterator

Bundled transliterators require that graphtransliterator.transliterators be imported:

from graphtransliterator import transliterators
transliterators.iter_names()
Core Classes¶
- class graphtransliterator.GraphTransliterator(tokens, rules, whitespace, onmatch_rules=None, metadata=None, ignore_errors=False, check_ambiguity=True, onmatch_rules_lookup=None, tokens_by_class=None, graph=None, tokenizer_pattern=None, graphtransliterator_version=None, **kwargs)[source]¶
A graph-based transliteration tool that lets you convert the symbols of one language or script to those of another using rules that you define.
Transliteration of tokens of an input string to an output string is configured by: a set of input token types with classes, pattern-matching rules involving sequences of tokens as well as preceding or following tokens and token classes, insertion rules between matches, and optional consolidation of whitespace. Rules are ordered by specificity.
Note
This constructor does not validate settings and should typically not be called directly. Use from_dict() instead. For “easy reading” support, use from_easyreading_dict(), from_yaml(), or from_yaml_file(). Keyword parameters used here (ignore_errors, check_ambiguity) can be passed from those other constructors.
- Parameters
tokens (dict of {str: set of str}) – Mapping of input token types to token classes
rules (list of TransliterationRule) – list of transliteration rules ordered by cost
onmatch_rules (list of OnMatchRule, or None) – Rules for output to be inserted between tokens of certain classes when a transliteration rule has been matched but before its production string has been added to the output
whitespace (WhitespaceRules) – Rules for handling whitespace
metadata (dict or None) – Metadata settings
ignore_errors (bool, optional) – If true, transliteration errors are ignored and do not raise an exception. The default is false.
check_ambiguity (bool, optional) – If true (default), transliteration rules are checked for ambiguity.
load() and loads() do not check ambiguity by default.
onmatch_rules_lookup (dict of {str: dict of {str: list of int}}, optional) – OnMatchRule lookup, used internally; will be generated if not present.
tokens_by_class (dict of {str: set of str}, optional) – Tokens by class, used internally, will be generated if not present.
graph (DirectedGraph, optional) – Directed graph used by Graph Transliterator, will be generated if not present.
tokenizer_pattern (str, optional) – Regular expression pattern for input string tokenization, will be generated if not present.
graphtransliterator_version (str, optional) – Version of graphtransliterator, added by dump() and dumps().
Example
from graphtransliterator import GraphTransliterator, OnMatchRule, TransliterationRule, WhitespaceRules
settings = {'tokens': {'a': {'vowel'}, ' ': {'wb'}},
            'onmatch_rules': [OnMatchRule(prev_classes=['vowel'], next_classes=['vowel'], production=',')],
            'rules': [TransliterationRule(production='A', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562),
                      TransliterationRule(production=' ', prev_classes=None, prev_tokens=None, tokens=[' '], next_tokens=None, next_classes=None, cost=0.5849625007211562)],
            'metadata': {'author': 'Author McAuthorson'},
            'whitespace': WhitespaceRules(default=' ', token_class='wb', consolidate=False)}
gt = GraphTransliterator(**settings)
gt.transliterate('a')
'A'
See also
from_dict
Constructor from dictionary of settings
from_easyreading_dict
Constructor from dictionary in “easy reading” format
from_yaml
Constructor from YAML string in “easy reading” format
from_yaml_file
Constructor from YAML file in “easy reading” format
- dump(compression_level=0)[source]¶
Dump configuration of Graph Transliterator to Python data types.
Compression is turned off by default.
- Parameters
compression_level (int) – One of 0 (default, no compression), 1 (compression including graph), or 2 (compression without graph)
- Returns
GraphTransliterator configuration as a dictionary with keys:
"tokens"
Mappings of tokens to their classes (OrderedDict of {str: list of str})
"rules"
Transliteration rules in direct format (list of dict of {str: str})
"whitespace"
Whitespace settings (dict of {str: str})
"onmatch_rules"
On match rules (list of OrderedDict)
"metadata"
Dictionary of metadata (dict)
"ignore_errors"
Ignore errors in transliteration (bool)
"onmatch_rules_lookup"
Dictionary keyed by current token to previous token containing a list of indexes of applicable OnMatchRule to try (dict of {str: dict of {str: list of int}})
"tokens_by_class"
Tokens keyed by token class, used internally (dict of {str: list of str})
"graph"
Serialization of DirectedGraph (dict)
"tokenizer_pattern"
Regular expression for tokenizing (str)
"graphtransliterator_version"
Module version of graphtransliterator (str)
- Return type
OrderedDict
Example
yaml_ = '''
tokens:
  a: [vowel]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: " "
  consolidate: false
  token_class: wb
onmatch_rules:
  - <vowel> + <vowel>: ','  # add a comma between vowels
metadata:
  author: "Author McAuthorson"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.dump()
OrderedDict([('tokens', {'a': ['vowel'], ' ': ['wb']}), ('rules', [OrderedDict([('production', 'A'), ('tokens', ['a']), ('cost', 0.5849625007211562)]), OrderedDict([('production', ' '), ('tokens', [' ']), ('cost', 0.5849625007211562)])]), ('whitespace', {'token_class': 'wb', 'default': ' ', 'consolidate': False}), ('onmatch_rules', [OrderedDict([('prev_classes', ['vowel']), ('next_classes', ['vowel']), ('production', ',')])]), ('metadata', {'author': 'Author McAuthorson'}), ('ignore_errors', False), ('onmatch_rules_lookup', {'a': {'a': [0]}}), ('tokens_by_class', {'vowel': ['a'], 'wb': [' ']}), ('graph', {'edge': {0: {1: {'token': 'a', 'cost': 0.5849625007211562}, 3: {'token': ' ', 'cost': 0.5849625007211562}}, 1: {2: {'cost': 0.5849625007211562}}, 3: {4: {'cost': 0.5849625007211562}}}, 'edge_list': [(0, 1), (0, 3), (1, 2), (3, 4)], 'node': [{'ordered_children': {'a': [1], ' ': [3]}, 'type': 'Start'}, {'token': 'a', 'ordered_children': {'__rules__': [2]}, 'type': 'token'}, {'accepting': True, 'type': 'rule', 'rule_key': 0}, {'token': ' ', 'ordered_children': {'__rules__': [4]}, 'type': 'token'}, {'accepting': True, 'type': 'rule', 'rule_key': 1}]}), ('tokenizer_pattern', '(a|\\ )'), ('graphtransliterator_version', '1.2.2')])
- dumps(compression_level=2)[source]¶
Dump settings of Graph Transliterator to JavaScript Object Notation (JSON). Compression is turned on by default.
- Parameters
compression_level (int) – One of 0 (no compression), 1 (compression including graph), or 2 (default, compression without graph)
separators (tuple of str) – Separators used by json.dumps(); the default is compact
- Returns
JSON string
- Return type
str
Examples
yaml_ = '''
tokens:
  a: [vowel]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: " "
  consolidate: false
  token_class: wb
onmatch_rules:
  - <vowel> + <vowel>: ','  # add a comma between vowels
metadata:
  author: "Author McAuthorson"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.dumps()
'{"graphtransliterator_version":"1.2.2","compressed_settings":[["vowel","wb"],[" ","a"],[[1],[0]],[["A",0,0,[1],0,0,-1],[" ",0,0,[0],0,0,-1]],[" ","wb",0],[[[0],[0],","]],{"author":"Author McAuthorson"},null]}'
- static from_dict(dict_settings, **kwargs)[source]¶
Generate GraphTransliterator from dict settings.
- Parameters
dict_settings (dict) – Dictionary of settings
- Returns
Graph transliterator
- Return type
GraphTransliterator
- static from_easyreading_dict(easyreading_settings, **kwargs)[source]¶
Constructs GraphTransliterator from a dictionary of settings in “easy reading” format, i.e. the loaded contents of a YAML string.
- Parameters
easyreading_settings (dict) –
Settings dictionary in easy reading format with keys:
"tokens"
Mappings of tokens to their classes (dict of {str: list of str})
"rules"
Transliteration rules in “easy reading” format (list of dict of {str: str})
"onmatch_rules"
On match rules in “easy reading” format (dict of {str: str}, optional)
"whitespace"
Whitespace definitions, including default whitespace token, class of whitespace tokens, and whether or not to consolidate (dict of {'default': str, 'token_class': str, 'consolidate': bool}, optional)
"metadata"
Dictionary of metadata (dict, optional)
- Returns
Graph Transliterator
- Return type
GraphTransliterator
Note
Called by from_yaml().
Example
tokens = {
    'ab': ['class_ab'],
    ' ': ['wb']
}
whitespace = {
    'default': ' ',
    'token_class': 'wb',
    'consolidate': True
}
onmatch_rules = [
    {'<class_ab> + <class_ab>': ','}
]
rules = {'ab': 'AB',
         ' ': '_'}
settings = {'tokens': tokens,
            'rules': rules,
            'whitespace': whitespace,
            'onmatch_rules': onmatch_rules}
gt = GraphTransliterator.from_easyreading_dict(settings)
gt.transliterate("ab abab")
'AB_AB,AB'
See also
from_yaml
Constructor from YAML string in “easy reading” format
from_yaml_file
Constructor from YAML file in “easy reading” format
- static from_yaml(yaml_str, charnames_escaped=True, **kwargs)[source]¶
Construct GraphTransliterator from a YAML str.
- Parameters
yaml_str (str) – YAML mappings of tokens, rules, and (optionally) onmatch_rules
charnames_escaped (bool) – Unescape Unicode character names during YAML read (default is True)
Note
Called by from_yaml_file() and calls from_easyreading_dict().
Example
yaml_ = '''
tokens:
  a: [class1]
  ' ': [wb]
rules:
  a: A
  ' ': ' '
whitespace:
  default: ' '
  consolidate: True
  token_class: wb
onmatch_rules:
  - <class1> + <class1>: "+"
'''
gt = GraphTransliterator.from_yaml(yaml_)
gt.transliterate("a aa")
'A A+A'
See also
from_easyreading_dict
Constructor from dictionary in “easy reading” format
from_yaml_file
Constructor from YAML file in “easy reading” format
- static from_yaml_file(yaml_filename, **kwargs)[source]¶
Construct GraphTransliterator from YAML file.
- Parameters
yaml_filename (str) – Name of YAML file, containing tokens, rules, and (optionally) onmatch_rules
Note
Calls from_yaml().
See also
from_yaml
Constructor from YAML string in “easy reading” format
from_easyreading_dict
Constructor from dictionary in “easy reading” format
- property graph¶
Graph used in transliteration.
- Type
DirectedGraph
- property graphtransliterator_version¶
Graph Transliterator version.
- Type
str
- property ignore_errors¶
Ignore transliteration errors setting.
- Type
bool
- property last_input_tokens¶
Last tokenization of the input string, with whitespace at start and end.
- Type
list of str
- property last_matched_rule_tokens¶
Last matched tokens for each rule.
- Type
list of list of str
- property last_matched_rules¶
Last transliteration rules matched.
- Type
list of TransliterationRule
- static load(settings, **kwargs)[source]¶
Create GraphTransliterator from settings as Python data types.
- Parameters
settings –
GraphTransliterator configuration as a dictionary with keys:
"tokens"
Mappings of tokens to their classes (dict of {str: list of str})
"rules"
Transliteration rules in direct format (list of OrderedDict of {str: str})
"whitespace"
Whitespace settings (dict of {str: str})
"onmatch_rules"
On match rules (list of OrderedDict, optional)
"metadata"
Dictionary of metadata (dict, optional)
"ignore_errors"
Ignore errors. (bool, optional)
"onmatch_rules_lookup"
Dictionary keyed by current token to previous token containing a list of indexes of applicable OnMatchRule to try (dict of {str: dict of {str: list of int}}, optional)
"tokens_by_class"
Tokens keyed by token class, used internally (dict of {str: list of str}, optional)
"graph"
Serialization of DirectedGraph (dict, optional)
"tokenizer_pattern"
Regular expression for tokenizing (str, optional)
"graphtransliterator_version"
Module version of graphtransliterator (str, optional)
- Returns
Graph Transliterator
- Return type
GraphTransliterator
Example
from collections import OrderedDict
settings = {'tokens': {'a': ['vowel'], ' ': ['wb']},
            'rules': [OrderedDict([('production', 'A'),
                                   # Can be compacted, removing None values
                                   # ('prev_tokens', None),
                                   ('tokens', ['a']),
                                   ('next_classes', None),
                                   ('next_tokens', None),
                                   ('cost', 0.5849625007211562)]),
                      OrderedDict([('production', ' '),
                                   ('prev_classes', None),
                                   ('prev_tokens', None),
                                   ('tokens', [' ']),
                                   ('next_classes', None),
                                   ('next_tokens', None),
                                   ('cost', 0.5849625007211562)])],
            'whitespace': {'default': ' ', 'token_class': 'wb', 'consolidate': False},
            'onmatch_rules': [OrderedDict([('prev_classes', ['vowel']),
                                           ('next_classes', ['vowel']),
                                           ('production', ',')])],
            'metadata': {'author': 'Author McAuthorson'},
            'onmatch_rules_lookup': {'a': {'a': [0]}},
            'tokens_by_class': {'vowel': ['a'], 'wb': [' ']},
            'graph': {'edge': {0: {1: {'token': 'a', 'cost': 0.5849625007211562},
                                   3: {'token': ' ', 'cost': 0.5849625007211562}},
                               1: {2: {'cost': 0.5849625007211562}},
                               3: {4: {'cost': 0.5849625007211562}}},
                      'node': [{'type': 'Start', 'ordered_children': {'a': [1], ' ': [3]}},
                               {'type': 'token', 'token': 'a', 'ordered_children': {'__rules__': [2]}},
                               {'type': 'rule',
                                'rule_key': 0,
                                'accepting': True,
                                'ordered_children': {}},
                               {'type': 'token', 'token': ' ', 'ordered_children': {'__rules__': [4]}},
                               {'type': 'rule',
                                'rule_key': 1,
                                'accepting': True,
                                'ordered_children': {}}],
                      'edge_list': [(0, 1), (1, 2), (0, 3), (3, 4)]},
            'tokenizer_pattern': '(a|\\ )',
            'graphtransliterator_version': '0.3.3'}
gt = GraphTransliterator.load(settings)
gt.transliterate('aa')
'A,A'
# can be compacted
settings.pop('onmatch_rules_lookup')
GraphTransliterator.load(settings).transliterate('aa')
'A,A'
- static loads(settings, **kwargs)[source]¶
Create GraphTransliterator from JavaScript Object Notation (JSON) string.
- Parameters
settings – JSON settings for GraphTransliterator
- Returns
Graph Transliterator
- Return type
GraphTransliterator
Example
123JSON_settings = '''{"tokens": {"a": ["vowel"], " ": ["wb"]}, "rules": [{"production": "A", "prev_classes": null, "prev_tokens": null, "tokens": ["a"], "next_classes": null, "next_tokens": null, "cost": 0.5849625007211562}, {"production": " ", "prev_classes": null, "prev_tokens": null, "tokens": [" "], "next_classes": null, "next_tokens": null, "cost": 0.5849625007211562}], "whitespace": {"default": " ", "token_class": "wb", "consolidate": false}, "onmatch_rules": [{"prev_classes": ["vowel"], "next_classes": ["vowel"], "production": ","}], "metadata": {"author": "Author McAuthorson"}, "ignore_errors": false, "onmatch_rules_lookup": {"a": {"a": [0]}}, "tokens_by_class": {"vowel": ["a"], "wb": [" "]}, "graph": {"node": [{"type": "Start", "ordered_children": {"a": [1], " ": [3]}}, {"type": "token", "token": "a", "ordered_children": {"__rules__": [2]}}, {"type": "rule", "rule_key": 0, "accepting": true, "ordered_children": {}}, {"type": "token", "token": " ", "ordered_children": {"__rules__": [4]}}, {"type": "rule", "rule_key": 1, "accepting": true, "ordered_children": {}}], "edge": {"0": {"1": {"token": "a", "cost": 0.5849625007211562}, "3": {"token": " ", "cost": 0.5849625007211562}}, "1": {"2": {"cost": 0.5849625007211562}}, "3": {"4": {"cost": 0.5849625007211562}}}, "edge_list": [[0, 1], [1, 2], [0, 3], [3, 4]]}, "tokenizer_pattern": "(a| )", "graphtransliterator_version": "1.2.2"}''' 124 125gt = GraphTransliterator.loads(JSON_settings) 126gt.transliterate('a')
'A'
- match_at(token_i, tokens, match_all=False)[source]¶
Match best (least costly) transliteration rule at a given index in the input tokens and return the index to that rule. Optionally, return all rules that match.
- Parameters
token_i (int) – Location in tokens at which to begin
tokens (list of str) – List of tokens
match_all (bool, optional) – If true, return the index of all rules matching at the given index. The default is false.
- Returns
Index of matching transliteration rule in GraphTransliterator.rules, or None. Returns a list of int or an empty list if match_all is true.
- Return type
int, None, or list of int
Note
Expects whitespace tokens at the beginning and end of tokens.
Examples
gt = GraphTransliterator.from_yaml('''
  tokens:
    a: []
    a a: []
    ' ': [wb]
  rules:
    a: <A>
    a a: <AA>
  whitespace:
    default: ' '
    consolidate: True
    token_class: wb
''')
tokens = gt.tokenize("aa")
tokens  # whitespace added to ends
[' ', 'a', 'a', ' ']
gt.match_at(1, tokens)  # returns index to rule
0
gt.rules[gt.match_at(1, tokens)]  # actual rule
TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.4150374992788437)
gt.match_at(1, tokens, match_all=True)  # index to rules, with match_all
[0, 1]
[gt.rules[_] for _ in gt.match_at(1, tokens, match_all=True)]
[TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.4150374992788437), TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
- property metadata¶
Metadata of transliterator
- Type
dict
- property onmatch_rules¶
Rules for productions between matches.
- Type
list of OnMatchRule
- property onmatch_rules_lookup¶
OnMatchRule lookup.
- Type
dict
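The lookup's nesting order can be shown with plain dictionaries. This is an illustrative sketch, not library code; the {'a': {'a': [0]}} value is taken from the dump() example earlier on this page.

```python
# Sketch: reading an onmatch_rules_lookup, which is keyed by the current
# token, then by the previous token, yielding indexes of OnMatchRule
# entries to try.
lookup = {'a': {'a': [0]}}

prev_token, curr_token = 'a', 'a'
candidates = lookup.get(curr_token, {}).get(prev_token, [])
print(candidates)  # indexes into onmatch_rules
```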
- property productions¶
List of productions of each transliteration rule.
- Type
list of str
- pruned_of(productions)[source]¶
Remove transliteration rules with specific output productions.
- Parameters
productions (str, or list of str) – Production or list of productions to remove
- Returns
Graph transliterator pruned of certain productions.
- Return type
GraphTransliterator
Note
Uses original initialization parameters to construct a new GraphTransliterator.
Examples
gt = GraphTransliterator.from_yaml('''
  tokens:
    a: []
    a a: []
    ' ': [wb]
  rules:
    a: <A>
    a a: <AA>
  whitespace:
    default: ' '
    consolidate: True
    token_class: wb
''')
gt.rules
[TransliterationRule(production='<AA>', prev_classes=None, prev_tokens=None, tokens=['a', 'a'], next_tokens=None, next_classes=None, cost=0.4150374992788437), TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
gt.pruned_of('<AA>').rules
[TransliterationRule(production='<A>', prev_classes=None, prev_tokens=None, tokens=['a'], next_tokens=None, next_classes=None, cost=0.5849625007211562)]
gt.pruned_of(['<A>', '<AA>']).rules
[]
- property rules¶
Transliteration rules sorted by cost.
- Type
list of TransliterationRule
- tokenize(input)[source]¶
Tokenizes an input string.
Adds initial and trailing whitespace, which can be consolidated.
- Parameters
input (str) – String to tokenize
- Returns
List of tokens, with default whitespace token at beginning and end.
- Return type
list of str
- Raises
ValueError – Unrecognizable input, such as a character that is not in a token
Examples
tokens = {'ab': ['class_ab'], ' ': ['wb']}
whitespace = {'default': ' ', 'token_class': 'wb', 'consolidate': True}
rules = {'ab': 'AB', ' ': '_'}
settings = {'tokens': tokens, 'rules': rules, 'whitespace': whitespace}
gt = GraphTransliterator.from_easyreading_dict(settings)
gt.tokenize('ab ')
[' ', 'ab', ' ']
- property tokenizer_pattern¶
Tokenizer pattern from transliterator
- Type
str
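As an illustration of how such a pattern behaves, the sketch below builds a comparable regular expression with Python's re module. The token types and variable names here are assumptions for illustration only; listing longer token types first makes the alternation prefer the longest match.

```python
import re

# Token types sorted longest-first so the regex alternation prefers
# the longest match ('a a' before 'a').
token_types = ['a a', 'a', ' ']
pattern = '(' + '|'.join(re.escape(t) for t in sorted(token_types, key=len, reverse=True)) + ')'
tokenizer = re.compile(pattern)

tokens_out = [m.group(0) for m in tokenizer.finditer('a aa')]
print(tokens_out)  # ['a a', 'a']
```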
- property tokens¶
Mappings of tokens to their classes.
- Type
dict of {str: set of str}
- property tokens_by_class¶
Tokens keyed by token class, used internally.
- Type
dict of {str: list of str}
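This mapping can be derived by inverting the tokens setting, as the sketch below shows. This is illustrative only (not the library's own code); the values mirror those seen in the dump() example above.

```python
# Invert a tokens mapping ({token: classes}) into tokens_by_class
# ({class: tokens}).
tokens = {'a': {'vowel'}, ' ': {'wb'}}
tokens_by_class = {}
for token, classes in tokens.items():
    for cls in classes:
        tokens_by_class.setdefault(cls, []).append(token)
print(tokens_by_class)  # {'vowel': ['a'], 'wb': [' ']}
```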
- transliterate(input)[source]¶
Transliterate an input string into an output string.
- Parameters
input (str) – Input string to transliterate
- Returns
Transliteration output string
- Return type
str
- Raises
ValueError – Cannot parse input
Note
Whitespace will be temporarily appended to start and end of input string.
Example
GraphTransliterator.from_yaml(
'''
tokens:
  a: []
  ' ': [wb]
rules:
  a: A
  ' ': '_'
whitespace:
  default: ' '
  consolidate: True
  token_class: wb
''').transliterate("a a")
'A_A'
- property whitespace¶
Whitespace rules.
- Type
WhitespaceRules
- class graphtransliterator.CoverageTransliterator(*args, **kwargs)[source]¶
Subclass of GraphTransliterator that logs visits to graph and on_match rules.
Used to confirm that tests cover the entire graph and onmatch_rules.
Bundled Transliterators¶
graphtransliterator.transliterators¶
Bundled transliterators are loaded by explicitly importing graphtransliterator.transliterators. Each is an instance of graphtransliterator.bundled.Bundled.
- class graphtransliterator.transliterators.Bundled(*args, **kwargs)[source]¶
Subclass of GraphTransliterator used for bundled Graph Transliterator.
- property directory¶
Directory of bundled transliterator, used to load settings.
- from_JSON(check_ambiguity=False, coverage=False, **kwargs)[source]¶
Initialize from bundled JSON file (best for speed).
- Parameters
check_ambiguity (bool) – Whether ambiguity should be checked. The default is False.
coverage (bool) – Whether test coverage should be checked. The default is False.
- from_YAML(check_ambiguity=True, coverage=True, **kwargs)[source]¶
Initialize from bundled YAML file (best for development).
- Parameters
check_ambiguity (bool) – Whether ambiguity should be checked. The default is True.
coverage (bool) – Whether test coverage should be checked. The default is True.
- generate_yaml_tests(file=None)[source]¶
Generates YAML tests with complete coverage.
Uses the first token in a class as a sample. Assumes for onmatch rules that the first sample token in a class has a unique production, which may not be the case. These should be checked and edited.
- load_yaml_tests()[source]¶
Iterator for YAML tests.
Assumes tests are found in the tests subdirectory of the module, with the name NAME_tests.yaml, e.g. source_to_target/tests/source_to_target_tests.yaml.
- property name¶
Name of bundled transliterator, e.g. ‘Example’
- classmethod new(method='json', **kwargs)[source]¶
Return a new class instance from method (json/yaml).
- Parameters
method (str) – How to load the bundled transliterator: "json" or "yaml".
- run_tests(transliteration_tests)[source]¶
Run transliteration tests.
- Parameters
transliteration_tests (dict of {str: str}) – Dictionary of tests mapping source input to correct target output.
- property yaml_tests_filen¶
Filename of bundled YAML tests.
- Type
str
- class graphtransliterator.transliterators.Example(**kwargs)[source]¶
Example Bundled Graph Transliterator.
- class graphtransliterator.transliterators.ITRANSDevanagariToUnicode(**kwargs)[source]¶
ITRANS Devanagari to Unicode Transliterator.
- class graphtransliterator.transliterators.MetadataSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)[source]¶
Schema for Bundled metadata.
Graph Classes¶
- class graphtransliterator.DirectedGraph(node=None, edge=None, edge_list=None)[source]¶
A very basic dictionary- and list-based directed graph. Nodes are a list of dictionaries of node data. Edges are nested dictionaries keyed from the head -> tail -> edge properties. An edge list is maintained. Can be exported as a dictionary.
- node¶
List of node data
- Type
list of dict
- edge¶
Mapping from head to tail of edge, holding edge data
- Type
dict of {int: dict of {int: dict}}
- edge_list¶
List of head and tail of each edge
- Type
list of tuple of (int, int)
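The three attributes above can be sketched with plain data (illustrative values only, not produced by the class):

```python
# node: list of node-data dicts; edge: head -> tail -> edge data;
# edge_list: (head, tail) pairs. Values here are illustrative.
node = [{'type': 'Start'}, {'type': 'token', 'token': 'a'}]
edge = {0: {1: {'token': 'a', 'cost': 0.5849625007211562}}}
edge_list = [(0, 1)]

# Edge data is reached by indexing head, then tail.
print(edge[0][1]['token'])  # 'a'
```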
Examples
from graphtransliterator import DirectedGraph
DirectedGraph()
<graphtransliterator.graphs.DirectedGraph at 0x7ff8d83354b0>
- add_edge(head, tail, edge_data=None)[source]¶
Add an edge to a graph and return its attributes as dict.
- Parameters
head (int) – Index of head of edge
tail (int) – Index of tail of edge
edge_data (dict, default {}) – Edge data
- Returns
Data of created edge
- Return type
dict
- Raises
ValueError – Invalid head or tail, or edge_data is not a dict.
Examples
g = DirectedGraph()
g.add_node()
(0, {})
g.add_node()
(1, {})
g.add_edge(0, 1, {'data_key_1': 'some edge data here'})
{'data_key_1': 'some edge data here'}
g.edge
{0: {1: {'data_key_1': 'some edge data here'}}}
- add_node(node_data=None)[source]¶
Create node and return (int, dict) of node key and object.
- Parameters
node_data (dict, default {}) – Data to be stored in created node
- Returns
Index of created node and its data
- Return type
tuple of (int, dict)
- Raises
ValueError – node_data is not a dict
Examples
g = DirectedGraph()
g.add_node()
(0, {})
g.add_node({'datakey1': 'data value'})
(1, {'datakey1': 'data value'})
g.node
[{}, {'datakey1': 'data value'}]
- class graphtransliterator.VisitLoggingDirectedGraph(graph)[source]¶
A DirectedGraph that logs visits to all nodes and edges.
Used to measure the coverage of tests for bundled transliterators.
Rule Classes¶
- class graphtransliterator.TransliterationRule(production, prev_classes, prev_tokens, tokens, next_tokens, next_classes, cost)[source]¶
A transliteration rule containing the specific match conditions and string output to be produced, as well as the rule’s cost.
- production¶
Output produced on match of rule
- Type
str
- prev_classes¶
List of previous token classes to be matched before tokens or, if they exist, prev_tokens
- Type
list of str, or None
- prev_tokens¶
List of tokens to be matched before tokens
- Type
list of str, or None
- tokens¶
List of tokens to match
- Type
list of str
- next_tokens¶
List of tokens to match after tokens
- Type
list of str, or None
- next_classes¶
List of token classes to match after tokens or, if they exist, next_tokens
- Type
list of str, or None
- cost¶
Cost of the rule, where less specific rules are more costly
- Type
float
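The cost values shown in the examples on this page are consistent with a formula of the form log2(1 + 1/(1 + n)), where n is the number of tokens a rule matches: a one-token rule costs about 0.585 and a two-token rule about 0.415. The helper below is a sketch of that relationship under this assumption, not the library's own function.

```python
import math

def rule_cost(n_matched):
    """Sketch: a cost consistent with the examples on this page, where
    less specific rules (fewer matched tokens) are more costly."""
    return math.log2(1 + 1 / (1 + n_matched))

print(rule_cost(1))  # ~0.585, the single-token rule cost above
print(rule_cost(2))  # ~0.415, the two-token ('a a') rule cost above
```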
- class graphtransliterator.OnMatchRule(prev_classes, next_classes, production)[source]¶
Rules about adding text between certain combinations of matched rules.
When a transliteration rule has been matched and before its production is added to the output, the production string of an OnMatchRule is added if the previously matched tokens and the current tokens are of the specified classes.
- prev_classes¶
List of previously matched token classes required
- Type
list of str
- next_classes¶
List of current and following token classes required
- Type
list of str
- production¶
String to be added before the current rule's production
- Type
str
- class graphtransliterator.WhitespaceRules(default, token_class, consolidate)[source]¶
Whitespace rules of GraphTransliterator.
- default¶
Default whitespace token
- Type
str
- token_class¶
Whitespace token class
- Type
str
- consolidate¶
If true, consolidate consecutive whitespace tokens and render them as a single instance of the specified default whitespace token.
- Type
bool
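As an illustration of the consolidate behavior, the sketch below (an assumption for illustration; the function name and approach are not part of the API) collapses runs of whitespace-class tokens into a single default token:

```python
# Collapse runs of whitespace-class tokens into one default token,
# as the consolidate=True setting is described to do.
def consolidate_whitespace(tokens, ws_tokens=(' ',), default=' '):
    out = []
    for t in tokens:
        if t in ws_tokens:
            if out and out[-1] == default:
                continue  # skip repeated whitespace
            out.append(default)
        else:
            out.append(t)
    return out

print(consolidate_whitespace(['a', ' ', ' ', 'a']))  # ['a', ' ', 'a']
```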
Exceptions¶
- exception graphtransliterator.GraphTransliteratorException[source]¶
Base exception class. All Graph Transliterator-specific exceptions should subclass this class.
- exception graphtransliterator.AmbiguousTransliterationRulesException[source]¶
Raised when multiple transliteration rules can match the same pattern. Details of ambiguities are given in a logging.warning().
- exception graphtransliterator.NoMatchingTransliterationRuleException[source]¶
Raised when no transliteration rule can be matched at a particular location in the input string’s tokens. Details of the location are given in a logging.warning().
- exception graphtransliterator.UnrecognizableInputTokenException[source]¶
Raised when a character in the input string does not correspond to any tokens in the GraphTransliterator’s token settings. Details of the location are given in a logging.warning().
Schemas¶
- class graphtransliterator.DirectedGraphSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)[source]¶
Schema for DirectedGraph.
Validates the graph somewhat rigorously.
- class graphtransliterator.EasyReadingSettingsSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)[source]¶
Schema for easy reading settings.
Provides initial validation based on easy reading format.
- class graphtransliterator.GraphTransliteratorSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)[source]¶
Schema for Graph Transliterator.
- class graphtransliterator.OnMatchRuleSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)[source]¶
Schema for OnMatchRule.
- class graphtransliterator.SettingsSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)[source]¶
Schema for settings in dictionary format.
Performs validation.
- class graphtransliterator.TransliterationRuleSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)[source]¶
Schema for TransliterationRule.
- class graphtransliterator.WhitespaceDictSettingsSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)[source]¶
Schema for Whitespace definition as a dict.
- class graphtransliterator.WhitespaceSettingsSchema(*, only: Optional[Union[Sequence[str], Set[str]]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Optional[Dict] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: Optional[str] = None)[source]¶
Schema for Whitespace definition that loads as WhitespaceRules.