Bundled Transliterators
Note
Python code on this page: bundled.py
Jupyter Notebook: bundled.ipynb
Graph Transliterator includes bundled transliterators, implemented through Bundled, a subclass of GraphTransliterator. They can be used as follows:
import graphtransliterator.transliterators as transliterators
example_transliterator = transliterators.Example()
example_transliterator.transliterate('a')
'A'
To access the bundled transliterators, use the iterator transliterators.iter_transliterators(), which yields transliterator instances:
bundled_iterator = transliterators.iter_transliterators()
next(bundled_iterator)
<example.Example at 0x7fc8a86ca0d0>
To access the names of transliterator classes, use the iterator transliterators.iter_names():
bundled_names_iterator = transliterators.iter_names()
next(bundled_names_iterator)
'Example'
The actual bundled transliterators are submodules of graphtransliterator.transliterators, but they are loaded into the namespace of transliterators:
from graphtransliterator.transliterators import Example
Each instance of Bundled contains a directory attribute:
transliterator = Example()
transliterator.directory
'/home/docs/checkouts/readthedocs.org/user_builds/graphtransliterator/checkouts/latest/graphtransliterator/transliterators/example'
Each directory contains an easy-reading YAML file that you can view:
tokens:
  a: [vowel]
  ' ': [whitespace]
  b: [consonant]
rules:
  a: A
  b: B
  ' ': ' '
  (<consonant> a) b (a <consonant>): "!B!"
onmatch_rules:
  - <vowel> + <vowel>: ","
whitespace:
  consolidate: False
  default: " "
  token_class: whitespace
metadata:
  name: example
  version: 1.0.0
  description: "An Example Bundled Transliterator"
  url: https://github.com/seanpue/graphtransliterator/tree/master/transliterator/sample
  author: Author McAuthorson
  author_email: author_mcauthorson@msu.edu
  license: MIT License
  keywords:
    - example
  project_urls:
    Documentation: https://github.com/seanpue/graphtransliterator/tree/master/graphtransliterator/transliterators/example
    Source: https://github.com/seanpue/graphtransliterator/tree/graphtransliterator/transliterators/example
    Tracker: https://github.com/seanpue/graphtransliterator/issues
There is also a JSON dump of the transliterator for quick loading:
{"graphtransliterator_version":"1.2.0","compressed_settings":[["consonant","vowel","whitespace"],[" ","a","b"],[[2],[1],[0]],[["!B!",[0],[1],[2],[1],[0],-5],["A",0,0,[1],0,0,-1],["B",0,0,[2],0,0,-1],[" ",0,0,[0],0,0,-1]],[" ","whitespace",0],[[[1],[1],","]],{"name":"example","version":"1.0.0","description":"An Example Bundled Transliterator","url":"https://github.com/seanpue/graphtransliterator/tree/master/transliterator/sample","author":"Author McAuthorson","author_email":"author_mcauthorson@msu.edu","license":"MIT License","keywords":["example"],"project_urls":{"Documentation":"https://github.com/seanpue/graphtransliterator/tree/master/graphtransliterator/transliterators/example","Source":"https://github.com/seanpue/graphtransliterator/tree/graphtransliterator/transliterators/example","Tracker":"https://github.com/seanpue/graphtransliterator/issues"}},null]}
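The dump above can be inspected with the standard library alone. The following is a minimal sketch, not a library API: the helper peek_metadata is hypothetical, and it assumes that the metadata dictionary is the seventh item (index 6) of compressed_settings, as it is in the dump shown here. That position may differ in other versions of the format.

```python
import json


def peek_metadata(json_text):
    """Return (library version, metadata dict) from a dump laid out as above."""
    dump = json.loads(json_text)
    settings = dump["compressed_settings"]
    # Assumption: metadata sits at index 6, as in the dump on this page.
    return dump["graphtransliterator_version"], settings[6]


# Shortened stand-in with the same layout as the dump above (rules omitted).
sample = json.dumps({
    "graphtransliterator_version": "1.2.0",
    "compressed_settings": [
        ["consonant", "vowel", "whitespace"],
        [" ", "a", "b"],
        [[2], [1], [0]],
        [],
        [" ", "whitespace", 0],
        [[[1], [1], ","]],
        {"name": "example", "version": "1.0.0"},
        None,
    ],
})

version, metadata = peek_metadata(sample)
print(version, metadata["name"])
```

For actual transliteration, load the file through Graph Transliterator itself rather than reading the compressed settings by hand.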
Test Coverage of Bundled Transliterators
Each bundled transliterator requires rigorous testing: every node and edge, as well as any onmatch rules, if applicable, must be visited. A separate subclass of GraphTransliterator, CoverageTransliterator, is used during testing.
It logs visits to nodes, edges, and onmatch rules. The tests are found in a subdirectory of the transliterator named “tests”. They are in a YAML file consisting of a dictionary keyed from transliteration input to correct output, e.g.:
# YAML declaration of tests for bundled Graph Transliterator
# These are in the form of a dictionary.
# The key is the source text, and the value is the correct transliteration.
' ': ' '
a: A
aa: A,A
babab: BA!B!AB
b: B
Once the tests are completed, Graph Transliterator checks that all components of the graph and all of the onmatch rules have been visited.
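The idea behind coverage checking can be illustrated with a small stand-alone sketch. It is not the library's CoverageTransliterator: the class ToyCoverageTracker and the function run_tests are hypothetical, and the sketch tracks only single-token rules rather than graph nodes, edges, and onmatch rules.

```python
class ToyCoverageTracker:
    """Hypothetical stand-in that records which single-token rules fired."""

    def __init__(self, rules):
        self.rules = dict(rules)  # maps each input token to its output
        self.visited = set()      # tokens whose rule has been used

    def transliterate(self, text):
        output = []
        for token in text:
            output.append(self.rules[token])  # KeyError on an unknown token
            self.visited.add(token)
        return "".join(output)

    def unvisited(self):
        """Return the tokens whose rules were never exercised."""
        return sorted(set(self.rules) - self.visited)


def run_tests(tracker, tests):
    """Check each input/output pair, then report rules never exercised."""
    failures = []
    for source, expected in tests.items():
        actual = tracker.transliterate(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures, tracker.unvisited()


tracker = ToyCoverageTracker({"a": "A", "b": "B", " ": " "})
# The whitespace rule is deliberately left untested here.
failures, missed = run_tests(tracker, {"a": "A", "b": "B"})
print(failures, missed)
```

In this sketch every test passes, yet the whitespace rule is reported as unvisited, which is the kind of gap the coverage check is meant to catch.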
Class Structure and Naming Conventions
Each transliterator must include a class definition in a submodule of transliterators.
The class name of each transliterator must be unique and follow camel-case conventions, e.g. SourceToTarget. File and directory names should, if applicable, be lowercased as source_to_target.
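The mapping from class name to file name can be sketched with a regular expression. The helper class_name_to_module_name is hypothetical and is shown only to illustrate the convention. It treats every capital letter as the start of a new word, so a name containing an acronym would be split letter by letter.

```python
import re


def class_name_to_module_name(class_name):
    """Convert a camel-case class name to a lowercased, underscored name."""
    # Insert an underscore before each capital letter except the first one,
    # then lowercase the result.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


module_name = class_name_to_module_name("SourceToTarget")
print(module_name)
```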
The bundled files should follow this directory structure, where {{source_to_target}} is the name of the transliterator:
transliterators
└── {{source_to_target}}
    ├── __init__.py
    ├── {{source_to_target}}.json
    ├── {{source_to_target}}.yaml
    └── tests
        ├── test_{{source_to_target}}.py
        └── {{source_to_target}}_tests.yaml
The bundled transliterator will:
- include both an easy-reading YAML file {{source_to_target}}.yaml and a JSON file {{source_to_target}}.json.
- have tests in a YAML format, consisting of a dictionary keyed from transliteration input to correct output, in {{source_to_target}}_tests.yaml. The tests must provide complete coverage of the graph: every node, every edge, and every on-match rule must be visited during the course of the tests.
- include metadata about the transliterator in its easy-reading YAML file.
- have an optional custom test file test_{{source_to_target}}.py. This is useful during development.
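A quick way to check a new transliterator against the required files described above is sketched below. The helpers expected_files and missing_files are hypothetical, not part of the library. The sketch assumes that the tests directory sits inside the transliterator's own directory, and it leaves out the optional custom test file.

```python
import tempfile
from pathlib import Path


def expected_files(name):
    """Required relative paths for a transliterator named e.g. 'source_to_target'."""
    return [
        Path(name) / "__init__.py",
        Path(name) / f"{name}.json",
        Path(name) / f"{name}.yaml",
        Path(name) / "tests" / f"{name}_tests.yaml",
    ]


def missing_files(transliterators_dir, name):
    """Return the required paths (POSIX style) that are not present on disk."""
    root = Path(transliterators_dir)
    return [p.as_posix() for p in expected_files(name) if not (root / p).is_file()]


# Build an incomplete layout in a temporary directory and check it.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "source_to_target"
    (package / "tests").mkdir(parents=True)
    (package / "__init__.py").touch()
    (package / "source_to_target.yaml").touch()
    missing = missing_files(tmp, "source_to_target")

print(missing)
```

Here the JSON dump and the YAML test file are reported as missing, because only the package file and the easy-reading YAML file were created.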
Metadata Requirements
Each Bundled transliterator can include the following metadata fields. These fields are a subset of the metadata of setuptools.
- name (str)
Name of the transliterator, e.g. “source_to_target”.
- version (str, optional)
Version of the transliterator. Semantic versioning (https://semver.org) is recommended.
- url (str, optional)
URL for the transliterator, e.g. a GitHub repository.
- author (str, optional)
Author of the transliterator.
- author_email (str, optional)
E-mail address of the author.
- maintainer (str, optional)
Name of the maintainer.
- maintainer_email (str, optional)
E-mail address of the maintainer.
- license (str, optional)
License of the transliterator. An open-source license is required for inclusion in this project.
- keywords (list of str, optional)
List of keywords.
- project_urls (dict of {str: str}, optional)
Dictionary of project URLs, e.g. Documentation, Source, etc.
Metadata is validated using a BundledMetadataSchema found in transliterators.schemas.
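The field types listed above can be checked with a few lines of plain Python. This is only a sketch of the kind of check involved, not the library's BundledMetadataSchema. The function check_metadata is hypothetical, and it also accepts description, which appears in the example metadata although it is not in the list above.

```python
# Fields expected to hold a single string. Treating "description" as an
# optional string is an assumption based on the example metadata.
STRING_FIELDS = (
    "name", "version", "description", "url", "author", "author_email",
    "maintainer", "maintainer_email", "license",
)


def check_metadata(metadata):
    """Return a list of problems found in a metadata dict (empty if none)."""
    problems = []
    if "name" not in metadata:
        problems.append("name is required")
    for field in STRING_FIELDS:
        if field in metadata and not isinstance(metadata[field], str):
            problems.append(f"{field} must be a str")
    keywords = metadata.get("keywords", [])
    if not (isinstance(keywords, list)
            and all(isinstance(k, str) for k in keywords)):
        problems.append("keywords must be a list of str")
    urls = metadata.get("project_urls", {})
    if not (isinstance(urls, dict)
            and all(isinstance(k, str) and isinstance(v, str)
                    for k, v in urls.items())):
        problems.append("project_urls must be a dict of {str: str}")
    return problems


good = check_metadata(
    {"name": "example", "version": "1.0.0", "keywords": ["example"]}
)
bad = check_metadata({"version": 1.0, "keywords": "example"})
print(good, bad)
```

A version given as a number rather than a string is flagged in the second call, which is an easy mistake to make when a YAML value such as 1.0 is left unquoted.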
To browse metadata, you can use iter_transliterators():
import pprint
transliterator = next(transliterators.iter_transliterators())
pprint.pprint(transliterator.metadata)
{'author': 'Author McAuthorson',
 'author_email': 'author_mcauthorson@msu.edu',
 'description': 'An Example Bundled Transliterator',
 'keywords': ['example'],
 'license': 'MIT License',
 'name': 'example',
 'project_urls': {'Documentation': 'https://github.com/seanpue/graphtransliterator/tree/master/graphtransliterator/transliterators/example',
                  'Source': 'https://github.com/seanpue/graphtransliterator/tree/graphtransliterator/transliterators/example',
                  'Tracker': 'https://github.com/seanpue/graphtransliterator/issues'},
 'url': 'https://github.com/seanpue/graphtransliterator/tree/master/transliterator/sample',
 'version': '1.0.0'}