Advanced Tutorial: Bundling a Transliterator

This advanced tutorial builds upon the original tutorial to show you how to bundle a transliterator for inclusion in Graph Transliterator.

Contributions to Graph Transliterator are strongly encouraged!

You will make a very simple transliterator while going through the steps of bundling it into Graph Transliterator.

Git Basics: Fork, Branch, Sync, Commit

Fork

The first thing to do, if you have not already, is to create a fork of Graph Transliterator. See https://help.github.com/en/articles/fork-a-repo

(From here on out, we will be using the command line.)

After creating a fork, clone your forked repo:

git clone https://github.com/YOUR-USERNAME/graphtransliterator

Branch

Once you have done that, go into that directory and create a new branch:

cd graphtransliterator
git checkout -b [name_of_your_transliterator_branch]

For this example, you can use the branch a_to_b:

cd graphtransliterator
git checkout -b a_to_b

Then, push that branch to the origin (your personal GitHub fork):

git push origin [name_of_your_transliterator_branch]

Here that would be:

git push origin a_to_b

Next, add a remote upstream for Graph Transliterator (the official Graph Transliterator repo):

git remote add upstream https://github.com/seanpue/graphtransliterator.git

Sync

To update your local copy of the remote (the official Graph Transliterator repo), run:

git fetch upstream

To sync your personal fork with the remote, run:

git merge upstream/master

See https://help.github.com/en/articles/syncing-a-fork for more info. You can run the previous two commands at any time.

Commit

After staging your changes with git add, you can commit them by running:

git commit -m 'comment here about the commit'

Adding A Transliterator

To add a transliterator, the next step is to create a subdirectory in transliterators. For this tutorial, you can make a subdirectory named a_to_b.

Note that this will be under graphtransliterator/transliterators, so from the root directory enter:

cd graphtransliterator/transliterators
mkdir [name_of_your_transliterator]
cd [name_of_your_transliterator]

For this example, you would enter:

cd graphtransliterator/transliterators
mkdir a_to_b
cd a_to_b

In the graphtransliterator/transliterators/[name_of_your_transliterator] directory, you will add:

  • an __init__.py

  • a YAML file in the “easy reading format”

  • a JSON file that is a serialization of the transliterator (optional)

  • a tests directory including a file named [name_of_your_transliterator]_tests.yaml

  • a Python test named test_[name_of_your_transliterator].py (optional)

Here is a tree showing the file organization:

transliterators
└── {{source_to_target}}
    ├── __init__.py
    ├── {{source_to_target}}.json
    ├── {{source_to_target}}.yaml
    └── tests
        ├── test_{{source_to_target}}.py
        └── {{source_to_target}}_tests.yaml
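As a quick sanity check of this layout, the following is a minimal sketch that reports which required files are still missing. The helper missing_files is hypothetical and not part of Graph Transliterator. It follows the list above in placing the tests directory inside the transliterator's own directory, and it skips the files marked optional.

```python
from pathlib import Path


def missing_files(transliterators_dir, name):
    """Return the required files that are absent for the transliterator `name`.

    Hypothetical helper, not part of Graph Transliterator. Only the files
    that the list above does not mark as optional are checked.
    """
    base = Path(transliterators_dir) / name
    required = [
        base / "__init__.py",
        base / f"{name}.yaml",
        base / "tests" / f"{name}_tests.yaml",
    ]
    # Report paths relative to the transliterator's directory.
    return [p.relative_to(base).as_posix() for p in required if not p.is_file()]


if __name__ == "__main__":
    # Run from the repository root; an empty list means nothing is missing.
    print(missing_files("graphtransliterator/transliterators", "a_to_b"))
```

Running it from the repository root before committing gives an early hint about forgotten files. It does not replace the project's own tests.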

YAML File

The YAML file should contain the “easy reading” version of your transliterator. For this example, create a file called a_to_b.yaml. Also add a metadata field to the YAML file, following the guidelines.

tokens:
  a: [a_class]
  ' ': [whitespace]
rules:
  a: A
onmatch_rules:
  - <a_class> + <a_class>: ","
whitespace:
  default: ' '
  token_class: whitespace
  consolidate: false
metadata:
  name: A to B
  version: 0.0.1
  url: http://website_of_project.com
  author: Your Name is Optional
  author_email: your_email@is_option.al
  maintainer: Maintainer's Name is Optional
  maintainer_email: maintainers_email@is_option.al
  license: MIT or Other Open Source License
  keywords: [add, keywords, here, as, a, list]
  project_urls:
     Documentation: https://link_to_documentation.html
     Source: https://link_to_sourcecode.html
     Tracker: https://link_to_issue_tracker.html

For most use cases, the project_urls can link to the Graph Transliterator GitHub page.
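To make the rules in a_to_b.yaml concrete, here is a library-free sketch of the output they are meant to produce. It is only a hand-written stand-in: it does not import Graph Transliterator and does not reflect how the library matches rules internally. The names TOKEN_CLASSES, RULES, ONMATCH_RULES, and transliterate_sketch are made up for this illustration. Under this reading of the rules, each a becomes A, and the on-match rule inserts a comma between two consecutive a_class tokens.

```python
# Hand-written stand-in for the rules in a_to_b.yaml. It does NOT use
# Graph Transliterator and is not how the library matches rules internally.
TOKEN_CLASSES = {"a": {"a_class"}, " ": {"whitespace"}}
RULES = {"a": "A"}
# (previous token class, current token class) -> string inserted before output
ONMATCH_RULES = {("a_class", "a_class"): ","}


def transliterate_sketch(text):
    """Apply single-token rules, inserting on-match output between matches."""
    output = []
    previous = None
    for token in text:
        if token not in RULES:
            raise ValueError(f"no rule matches token {token!r}")
        if previous is not None:
            for (prev_class, curr_class), insert in ONMATCH_RULES.items():
                prev_ok = prev_class in TOKEN_CLASSES[previous]
                curr_ok = curr_class in TOKEN_CLASSES[token]
                if prev_ok and curr_ok:
                    output.append(insert)
                    break
        output.append(RULES[token])
        previous = token
    return "".join(output)


print(transliterate_sketch("a"))   # A
print(transliterate_sketch("aa"))  # A,A
```

Input and output pairs such as these are also natural candidates for the YAML test file.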

JSON File

To create a JSON file, you can use the command line interface:

$ graphtransliterator dump --from yaml_file a_to_b.yaml > a_to_b.json

Alternatively, you can use the make-json command:

$ graphtransliterator make-json AToB

The JSON file loads more quickly than the YAML one, but it is not necessary during development.

__init__.py

The __init__.py will create the bundled transliterator, which is a subclass of GraphTransliterator named Bundled.

Following convention, you need to name your transliterator’s class in CamelCase. For this example, it would be AToB:

from graphtransliterator.transliterators import Bundled

class AToB(Bundled):
    """
    A to B Bundled Graph Transliterator
    """

    def __init__(self, **kwargs):
        """Initialize transliterator from YAML."""
        self.from_YAML(
            **kwargs
        )  # defaults to check_ambiguity=True, check_coverage=True
        # When ready, remove the previous lines and initialize more quickly from JSON:
        # self.init_from_JSON(**kwargs) # check_ambiguity=False, check_coverage=False

When you load the bundled transliterator from YAML using from_YAML, it will check for ambiguity and also check the coverage of the tests. You can turn those features off temporarily here.

When a transliterator is added into Graph Transliterator, it will likely be set to load from JSON by default. Tests will check for ambiguity and coverage.
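The CamelCase naming convention for the class can be sketched as a small conversion from the directory name. The function class_name_for is a hypothetical helper for illustration, not something Graph Transliterator provides. It only capitalizes the first letter of each underscore-separated part, so names containing acronyms may need manual adjustment.

```python
def class_name_for(directory_name):
    """Convert a snake_case directory name such as 'a_to_b' to CamelCase.

    Hypothetical helper for illustration; not part of Graph Transliterator.
    """
    parts = [part for part in directory_name.split("_") if part]
    # Upper-case only the first letter so existing capitals are preserved.
    return "".join(part[0].upper() + part[1:] for part in parts)


print(class_name_for("a_to_b"))  # AToB
```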

Tests

Graph Transliterator requires that all bundled transliterators have tests that visit every edge and node of the internal graph and that use all on-match rules. The test file should be a YAML file defining a dictionary that maps each input string to its correct output.
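The shape of the test file, a mapping from input to expected output, can be pictured in Python as a plain dictionary. The sketch below compares such a mapping against any callable that transliterates a string. Both failing_cases and stand_in are hypothetical names, and stand_in merely imitates the a_to_b example rather than calling Graph Transliterator. Whether these two cases satisfy the coverage requirement is for the project's own test command to confirm.

```python
def failing_cases(transliterate, tests):
    """Return (input, expected, actual) for every test whose output differs."""
    failures = []
    for source, expected in tests.items():
        actual = transliterate(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


# The same mapping a YAML test file would hold, written as a Python dict.
tests = {"a": "A", "aa": "A,A"}


def stand_in(text):
    """Hypothetical stand-in for a transliterator's transliterate method."""
    return ",".join("A" for _ in text)


print(failing_cases(stand_in, tests))  # []
```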

You can test the transliterator as you are developing it by adding YAML tests and running the command:

graphtransliterator test [name_of_your_transliterator]

Tests can be generated using the command line interface:

mkdir tests
graphtransliterator generate-tests --from bundled [name_of_your_transliterator] > tests/[name_of_your_transliterator]_tests.yaml

Testing the Transliterator

You should test the transliterator to make sure everything is correct, including its metadata. To do that, navigate back to the root directory of graphtransliterator and execute the command:

py.test tests/test_transliterators.py

You can also run the complete suite of tests by running:

tox

Pushing Your Transliterator

When you are finished with a version of your transliterator, sync your branch with the remote and commit your changes to your GitHub branch once again. Then you can make a pull request to include the transliterator in Graph Transliterator. You can do that from the Graph Transliterator GitHub page. See https://help.github.com/en/articles/creating-a-pull-request-from-a-fork.