Bundled Transliterators

Graph Transliterator includes bundled transliterators, which are built on Bundled, a subclass of GraphTransliterator. They can be used as follows:

import graphtransliterator.transliterators as transliterators
example_transliterator = transliterators.Example()
example_transliterator.transliterate('a')
'A'

To access instances of the bundled transliterators, use the iterator transliterators.iter_transliterators():

bundled_iterator = transliterators.iter_transliterators()
next(bundled_iterator)
<example.Example at 0x7fc8a86ca0d0>

To access the names of transliterator classes, use the iterator transliterators.iter_names():

bundled_names_iterator = transliterators.iter_names()
next(bundled_names_iterator)
'Example'

The bundled transliterators are defined in submodules of graphtransliterator.transliterators, but their classes are also loaded into the transliterators namespace, so they can be imported directly:

from graphtransliterator.transliterators import Example

Each instance of Bundled contains a directory attribute:

transliterator = Example()
transliterator.directory
'/home/docs/checkouts/readthedocs.org/user_builds/graphtransliterator/checkouts/latest/graphtransliterator/transliterators/example'

Each directory contains an easy-reading YAML file that defines the transliterator and that you can view:

tokens:
  a: [vowel]
  ' ': [whitespace]
  b: [consonant]
rules:
  a: A
  b: B
  ' ': ' '
  (<consonant> a) b (a <consonant>):  "!B!"
onmatch_rules:
  - <vowel> + <vowel>: ","
whitespace:
  consolidate: False
  default: " "
  token_class: whitespace
metadata:
  name: example
  version: 1.0.0
  description: "An Example Bundled Transliterator"
  url: https://github.com/seanpue/graphtransliterator/tree/master/transliterator/sample
  author: Author McAuthorson
  author_email: author_mcauthorson@msu.edu
  license: MIT License
  keywords:
    - example
  project_urls:
    Documentation: https://github.com/seanpue/graphtransliterator/tree/master/graphtransliterator/transliterators/example
    Source: https://github.com/seanpue/graphtransliterator/tree/graphtransliterator/transliterators/example
    Tracker: https://github.com/seanpue/graphtransliterator/issues

There is also a JSON dump of the transliterator for quick loading:

{"graphtransliterator_version":"1.2.0","compressed_settings":[["consonant","vowel","whitespace"],[" ","a","b"],[[2],[1],[0]],[["!B!",[0],[1],[2],[1],[0],-5],["A",0,0,[1],0,0,-1],["B",0,0,[2],0,0,-1],[" ",0,0,[0],0,0,-1]],[" ","whitespace",0],[[[1],[1],","]],{"name":"example","version":"1.0.0","description":"An Example Bundled Transliterator","url":"https://github.com/seanpue/graphtransliterator/tree/master/transliterator/sample","author":"Author McAuthorson","author_email":"author_mcauthorson@msu.edu","license":"MIT License","keywords":["example"],"project_urls":{"Documentation":"https://github.com/seanpue/graphtransliterator/tree/master/graphtransliterator/transliterators/example","Source":"https://github.com/seanpue/graphtransliterator/tree/graphtransliterator/transliterators/example","Tracker":"https://github.com/seanpue/graphtransliterator/issues"}},null]}
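The dump is ordinary JSON, so it can be inspected with the standard library alone. The following is a minimal sketch that parses an abbreviated copy of the dump shown above (the metadata is trimmed to two fields for brevity). Reading the first two entries of compressed_settings as token classes and tokens is an inference from this example, not a documented format.

```python
import json

# Abbreviated copy of the dump shown above; metadata trimmed to two fields.
dump = (
    '{"graphtransliterator_version":"1.2.0","compressed_settings":['
    '["consonant","vowel","whitespace"],[" ","a","b"],[[2],[1],[0]],'
    '[["!B!",[0],[1],[2],[1],[0],-5],["A",0,0,[1],0,0,-1],'
    '["B",0,0,[2],0,0,-1],[" ",0,0,[0],0,0,-1]],'
    '[" ","whitespace",0],[[[1],[1],","]],'
    '{"name":"example","version":"1.0.0"},null]}'
)

data = json.loads(dump)
settings = data["compressed_settings"]

# Assumption read off the example: the first two entries hold the
# token classes and the tokens.
token_classes, tokens = settings[0], settings[1]

# The metadata is the only dictionary in the list, so find it by type
# rather than relying on a fixed position.
metadata = next(item for item in settings if isinstance(item, dict))

print(data["graphtransliterator_version"], metadata["name"], tokens)
```

Looking the metadata up by type keeps the sketch from depending on the exact position of each entry in the compressed list.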

Test Coverage of Bundled Transliterators

Each bundled transliterator requires rigorous testing: every node and edge, as well as any onmatch rules, if applicable, must be visited. A separate subclass CoverageTransliterator of GraphTransliterator is used during testing.

It logs visits to nodes, edges, and onmatch rules. The tests are found in a subdirectory of the transliterator named “tests”. They are in a YAML file consisting of a dictionary keyed from transliteration input to correct output, e.g.:

# YAML declaration of tests for bundled Graph Transliterator
# These are in the form of a dictionary.
# The key is the source text, and the value is the correct transliteration.
' ': ' '
a: A
aa: A,A
babab: BA!B!AB
b: B

Once the tests are completed, Graph Transliterator checks that all components of the graph and all of the onmatch rules have been visited.
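The idea of coverage checking can be illustrated with a toy stand-in. The sketch below is not the library's CoverageTransliterator: ToyCoverageTransliterator and run_tests are hypothetical names, the rules are plain string replacements with no graph or context, and the tests are given as a Python dictionary rather than loaded from YAML. It only shows the principle of logging visits and then reporting anything left unvisited.

```python
class ToyCoverageTransliterator:
    """Hypothetical stand-in that records which rules have been used."""

    def __init__(self, rules):
        self.rules = dict(rules)  # source token -> output string
        self.visited = set()      # source tokens whose rule has matched

    def transliterate(self, text):
        output = []
        position = 0
        # Try longer tokens first so the longest match wins.
        tokens = sorted(self.rules, key=len, reverse=True)
        while position < len(text):
            for token in tokens:
                if text.startswith(token, position):
                    self.visited.add(token)  # log the visit
                    output.append(self.rules[token])
                    position += len(token)
                    break
            else:
                raise ValueError(f"no rule matches at position {position}")
        return "".join(output)

    def unvisited(self):
        return set(self.rules) - self.visited


def run_tests(transliterator, tests):
    """Run each input/output pair, then return the rules never visited."""
    for source, expected in tests.items():
        result = transliterator.transliterate(source)
        assert result == expected, (source, result, expected)
    return transliterator.unvisited()


rules = {"a": "A", "b": "B", " ": " "}
partial = run_tests(ToyCoverageTransliterator(rules), {"a": "A", "b": "B"})
full = run_tests(ToyCoverageTransliterator(rules), {"a": "A", "b": "B", " ": " "})
print(partial, full)
```

Here the first test set leaves the whitespace rule unvisited, so it would count as incomplete coverage; the second visits every rule.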

Class Structure and Naming Conventions

Each transliterator must include a class definition in a submodule of transliterators.

The class name of each transliterator must be unique and follow camel-case conventions, e.g. SourceToTarget. File and directory names should, if applicable, be lowercased as source_to_target.
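As an illustration of the naming convention, the following sketch derives the lowercased file and directory name from a camel-case class name. The helper class_name_to_module_name is hypothetical and is not part of Graph Transliterator; it assumes each word in the class name starts with a single capital letter.

```python
import re


def class_name_to_module_name(class_name):
    """Convert a camel-case class name to a lowercase, underscored name."""
    # Insert an underscore before every capital letter except the first
    # character, then lowercase the whole string.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


print(class_name_to_module_name("SourceToTarget"))
```

Names with runs of capitals (acronyms, for example) would be split letter by letter under this rule, so they would need separate handling.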

The bundled files should follow this directory structure, where {{source_to_target}} is the name of the transliterator:

transliterators
└── {{source_to_target}}
    ├── __init__.py
    ├── {{source_to_target}}.json
    ├── {{source_to_target}}.yaml
    └── tests
        ├── test_{{source_to_target}}.py
        └── {{source_to_target}}_tests.yaml

The bundled transliterator will:

  • include both an easy-reading YAML file {{source_to_target}}.yaml and a JSON file {{source_to_target}}.json.

  • have tests in a YAML format consisting of a dictionary keyed from transliteration input to correct output in {{source_to_target}}_tests.yaml. The tests must provide complete coverage of the graph: every node, every edge, and every on-match rule must be visited during the course of the tests.

  • include metadata about the transliterator in its easy-reading YAML file.

  • have an optional custom test file test_{{source_to_target}}.py. This is useful during development.
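The required files can be checked mechanically. The sketch below uses hypothetical helpers, required_files and missing_files, that are not part of the library. It assumes the tests directory sits inside the transliterator's own directory, as the test coverage section describes, and it leaves out the optional custom test file.

```python
import tempfile
from pathlib import Path


def required_files(name):
    """Return the required files, relative to the transliterators directory."""
    base = Path(name)
    return [
        base / "__init__.py",
        base / f"{name}.json",
        base / f"{name}.yaml",
        base / "tests" / f"{name}_tests.yaml",
    ]


def missing_files(transliterators_dir, name):
    """List the required files that do not exist under transliterators_dir."""
    root = Path(transliterators_dir)
    return [
        path.as_posix()
        for path in required_files(name)
        if not (root / path).is_file()
    ]


# Usage: build an incomplete package in a temporary directory and check it.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "source_to_target"
    package.mkdir()
    (package / "__init__.py").touch()
    (package / "source_to_target.yaml").touch()
    missing = missing_files(tmp, "source_to_target")

print(missing)
```

In this usage example the JSON dump and the YAML test file are reported as missing, since only the module file and the easy-reading YAML file were created.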

Metadata Requirements

Each Bundled transliterator can include the following metadata fields. These fields are a subset of the metadata of setuptools.

name (str)

Name of the transliterator, e.g. “source_to_target”.

version (str, optional)

Version of the transliterator. Semantic versioning (https://semver.org) is recommended.

url (str, optional)

URL for the transliterator, e.g. a GitHub repository.

author (str, optional)

Author of the transliterator.

author_email (str, optional)

E-mail address of the author.

maintainer (str, optional)

Name of the maintainer.

maintainer_email (str, optional)

E-mail address of the maintainer.

license (str, optional)

License of the transliterator. An open-source license is required for inclusion in this project.

keywords (list of str, optional)

List of keywords.

project_urls (dict of {str: str}, optional)

Dictionary of project URLs, e.g. Documentation, Source, etc.

Metadata is validated using a BundledMetadataSchema found in transliterators.schemas.
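To make the field requirements concrete, here is a standard-library sketch of the same kind of check. It is not BundledMetadataSchema and does not reproduce its behavior: check_metadata and FIELD_TYPES are hypothetical names, and the rules are taken only from the field list above (name is required, everything else is optional, and each field has the stated type). Fields not in the list are ignored.

```python
# Hypothetical field table built from the list above; not the library's schema.
FIELD_TYPES = {
    "name": str,
    "version": str,
    "url": str,
    "author": str,
    "author_email": str,
    "maintainer": str,
    "maintainer_email": str,
    "license": str,
    "keywords": list,
    "project_urls": dict,
}


def check_metadata(metadata):
    """Return a list of error messages; an empty list means no problems."""
    errors = []
    if "name" not in metadata:
        errors.append("name: required field is missing")
    for field, expected in FIELD_TYPES.items():
        if field not in metadata:
            continue  # every field other than name is optional
        value = metadata[field]
        if not isinstance(value, expected):
            errors.append(f"{field}: expected {expected.__name__}")
        elif field == "keywords" and not all(isinstance(k, str) for k in value):
            errors.append("keywords: expected list of str")
    return errors


good = check_metadata(
    {"name": "example", "version": "1.0.0", "keywords": ["example"]}
)
bad = check_metadata({"version": 1, "keywords": ["ok", 2]})
print(good, bad)
```

Returning a list of messages rather than raising on the first problem lets a contributor see every issue in the metadata at once.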

To browse metadata, you can use iter_transliterators():

import pprint
transliterator = next(transliterators.iter_transliterators())
pprint.pprint(transliterator.metadata)
{'author': 'Author McAuthorson',
 'author_email': 'author_mcauthorson@msu.edu',
 'description': 'An Example Bundled Transliterator',
 'keywords': ['example'],
 'license': 'MIT License',
 'name': 'example',
 'project_urls': {'Documentation': 'https://github.com/seanpue/graphtransliterator/tree/master/graphtransliterator/transliterators/example',
                  'Source': 'https://github.com/seanpue/graphtransliterator/tree/graphtransliterator/transliterators/example',
                  'Tracker': 'https://github.com/seanpue/graphtransliterator/issues'},
 'url': 'https://github.com/seanpue/graphtransliterator/tree/master/transliterator/sample',
 'version': '1.0.0'}