indicate package

Submodules

indicate.base module

indicate.decoder module

class indicate.decoder.Decoder(vocab_size: int, embedding_dim: int, dec_units: int, batch_sz: int, max_length_input: int, max_length_output: int, attention_type: str = 'luong') → None

Bases: Model

__init__(vocab_size: int, embedding_dim: int, dec_units: int, batch_sz: int, max_length_input: int, max_length_output: int, attention_type: str = 'luong') → None
build_attention_mechanism(memory: Tensor | None) → Layer
Return type:

Layer

call(inputs: Tensor, initial_state: list[Tensor], encoder_outputs: Tensor) → tuple[Tensor, tuple[Tensor, Tensor], Tensor]
Return type:

tuple[Tensor, tuple[Tensor, Tensor], Tensor]

build_initial_state(batch_sz: int, encoder_state: list[Tensor]) → list[Tensor]
Return type:

list[Tensor]

indicate.encoder module

class indicate.encoder.Encoder(vocab_size: int, embedding_dim: int, enc_units: int, batch_sz: int) → None

Bases: Model

__init__(vocab_size: int, embedding_dim: int, enc_units: int, batch_sz: int) → None
call(x: Tensor, hidden: list[Tensor], training: bool | None = None) → tuple[Tensor, Tensor, Tensor]
Return type:

tuple[Tensor, Tensor, Tensor]

initialize_hidden_state() → list[Tensor]
Return type:

list[Tensor]

indicate.hindi2english module

class indicate.hindi2english.HindiToEnglish

Bases: object

MODELFN: str = 'data/hindi_to_english/saved_weights/'
INPUT_VOCAB: str = 'data/hindi_to_english/hindi_tokens.json'
TARGET_VOCAB: str = 'data/hindi_to_english/english_tokens.json'
embedding_dim: int = 256
units: int = 1024
BATCH_SIZE: int = 64
BUFFER_SIZE: int = 120000
max_length_input: int = 47
max_length_output: int = 173
START_TOKEN: str = '^'
END_TOKEN: str = '$'
input_lang_tokenizer: Any | None = None
target_lang_tokenizer: Any | None = None
encoder: Encoder | None = None
decoder: Decoder | None = None
classmethod get_model_path() → str
Return type:

str

classmethod get_input_vocab() → str
Return type:

str

classmethod get_target_vocab() → str
Return type:

str

classmethod transliterate(input: str) → str

Transliterate from Hindi to English.

Parameters:

input (str) – Hindi text

Returns:

English text

Return type:

str

Raises:
  • TypeError – If input is None

  • ValueError – If input is not a string

  • RuntimeError – If model loading fails

indicate.logging module

indicate.logging.get_logger() → Logger
Return type:

Logger

indicate.transliterate module

Transliteration module with backward compatibility.

This module maintains the original API while delegating to the new Click-based CLI.

indicate.transliterate.hindi2english(input: str) → str

Transliterate from Hindi to English.

Parameters:

input (str) – Hindi text

Returns:

English text

Return type:

str

Raises:
  • TypeError – If input is None

  • ValueError – If input is not a string

  • RuntimeError – If model loading fails

indicate.transliterate.main(argv: list[str] | None = None) → int

Legacy entry point for backward compatibility.

Return type:

int

indicate.utils module

indicate.utils.sequence_to_chars(tokenizer: Any, sequence: Tensor) → str

Convert a sequence of indices back to characters.

Return type:

str
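As a rough illustration of what mapping indices back to characters involves, the sketch below uses a stand-in tokenizer and a plain list of integers instead of a Tensor. The `index_word` mapping mirrors the attribute of the same name on Keras' `Tokenizer`; `ToyTokenizer`, `indices_to_text`, and the rule of skipping padding (index 0) and the `^`/`$` markers are assumptions made for this example, not the library's implementation.

```python
from typing import Iterable


class ToyTokenizer:
    """Hypothetical stand-in exposing an index-to-character mapping."""

    def __init__(self, chars: str) -> None:
        # Index 0 is reserved for padding, so real characters start at 1.
        self.index_word = {i: ch for i, ch in enumerate(chars, start=1)}


def indices_to_text(tokenizer: ToyTokenizer, sequence: Iterable[int],
                    skip: str = "^$") -> str:
    """Map each index to its character, dropping padding and marker tokens."""
    out = []
    for idx in sequence:
        ch = tokenizer.index_word.get(int(idx))
        if ch is None or ch in skip:
            continue  # padding (0), unknown index, or start/end marker
        out.append(ch)
    return "".join(out)


tok = ToyTokenizer("^$abc")  # 1:'^', 2:'$', 3:'a', 4:'b', 5:'c'
print(indices_to_text(tok, [1, 3, 4, 5, 2, 0, 0]))  # abc
```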

indicate.utils.evaluate_sentence(sentence: str, units: int, input_lang_tokenizer: Any, target_lang_tokenizer: Any, encoder: Any, decoder: Any, max_length_input: int) → Tensor

Evaluate/translate a single sentence.

Return type:

Tensor

indicate.utils.translate(sentence: str, units: int, input_lang_tokenizer: Any, target_lang_tokenizer: Any, encoder: Any, decoder: Any, max_length_input: int) → str

Translate a sentence from source to target language.

Return type:

str

Module contents

indicate.hindi2english(input: str) → str

Transliterate from Hindi to English.

Parameters:

input (str) – Hindi text

Returns:

English text

Return type:

str

Raises:
  • TypeError – If input is None

  • ValueError – If input is not a string

  • RuntimeError – If model loading fails
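A caller may want to handle the three documented exceptions separately. The sketch below shows one possible pattern. It runs without the package installed because `fake_hindi2english` is a hypothetical stub that only mimics the error contract listed under Raises; in real use, `indicate.hindi2english` would be passed instead. `safe_transliterate` is likewise an illustrative helper, not part of the library.

```python
from typing import Callable, Optional


def safe_transliterate(fn: Callable[[str], str], text: object) -> Optional[str]:
    """Call a transliteration function, returning None on documented errors."""
    try:
        return fn(text)  # type: ignore[arg-type]
    except (TypeError, ValueError) as exc:
        # Bad input: None or a non-string value, per the Raises section.
        print(f"invalid input: {exc}")
    except RuntimeError as exc:
        # The model weights could not be loaded.
        print(f"model unavailable: {exc}")
    return None


def fake_hindi2english(text: str) -> str:
    """Hypothetical stub that mimics the documented error contract."""
    if text is None:
        raise TypeError("input is None")
    if not isinstance(text, str):
        raise ValueError("input is not a string")
    return "<transliterated>"  # placeholder, not a real model prediction


print(safe_transliterate(fake_hindi2english, "राजशेखर"))
print(safe_transliterate(fake_hindi2english, None))
```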

class indicate.IndicLLMTransliterator(source_lang: str, target_lang: str, provider: str | None = None, model: str | None = None, api_key: str | None = None, temperature: float = 0.3, cache_examples: bool = True)

Bases: object

LLM-based transliterator for Indic languages.

DEFAULT_MODELS = {'anthropic': 'claude-3-opus-20240229', 'cohere': 'command-r-plus', 'google': 'gemini-pro', 'openai': 'gpt-4-turbo-preview'}
INDIC_LANGUAGES = {'bengali': {'iso': 'bn', 'native': 'বাংলা', 'script': 'bengali'}, 'english': {'iso': 'en', 'native': 'English', 'script': 'latin'}, 'gujarati': {'iso': 'gu', 'native': 'ગુજરાતી', 'script': 'gujarati'}, 'hindi': {'iso': 'hi', 'native': 'हिन्दी', 'script': 'devanagari'}, 'kannada': {'iso': 'kn', 'native': 'ಕನ್ನಡ', 'script': 'kannada'}, 'malayalam': {'iso': 'ml', 'native': 'മലയാളം', 'script': 'malayalam'}, 'marathi': {'iso': 'mr', 'native': 'मराठी', 'script': 'devanagari'}, 'odia': {'iso': 'or', 'native': 'ଓଡ଼ିଆ', 'script': 'odia'}, 'punjabi': {'iso': 'pa', 'native': 'ਪੰਜਾਬੀ', 'script': 'gurmukhi'}, 'sanskrit': {'iso': 'sa', 'native': 'संस्कृतम्', 'script': 'devanagari'}, 'tamil': {'iso': 'ta', 'native': 'தமிழ்', 'script': 'tamil'}, 'telugu': {'iso': 'te', 'native': 'తెలుగు', 'script': 'telugu'}, 'urdu': {'iso': 'ur', 'native': 'اردو', 'script': 'arabic'}}
__init__(source_lang: str, target_lang: str, provider: str | None = None, model: str | None = None, api_key: str | None = None, temperature: float = 0.3, cache_examples: bool = True)

Initialize the Indic LLM transliterator.

Parameters:
  • source_lang – Source language (e.g., ‘hindi’, ‘tamil’)

  • target_lang – Target language (e.g., ‘english’)

  • provider – LLM provider (openai, anthropic, etc.). Auto-detected if not provided.

  • model – Specific model to use. Uses provider defaults if not provided.

  • api_key – API key. Uses environment variables if not provided.

  • temperature – LLM temperature for consistency (lower = more consistent).

  • cache_examples – Whether to cache generated few-shot examples.
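The `model` parameter falls back to a provider default when omitted. A minimal sketch of that fallback, assuming it is a plain lookup in `DEFAULT_MODELS` (the values below are copied from the attribute documented above); `resolve_model` is a hypothetical helper, and how the class handles an unknown provider is an assumption here.

```python
from typing import Optional

# Copied from the DEFAULT_MODELS attribute documented above.
DEFAULT_MODELS = {
    "anthropic": "claude-3-opus-20240229",
    "cohere": "command-r-plus",
    "google": "gemini-pro",
    "openai": "gpt-4-turbo-preview",
}


def resolve_model(provider: str, model: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit model wins, else the provider default."""
    if model is not None:
        return model
    try:
        return DEFAULT_MODELS[provider]
    except KeyError:
        # Assumed behavior: reject providers without a known default.
        raise ValueError(f"unknown provider: {provider!r}") from None


print(resolve_model("openai"))
print(resolve_model("anthropic", model="my-custom-model"))
```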

generate_few_shot_examples(num_examples: int = 5) → list[dict[str, str]]

Generate few-shot transliteration examples for the language pair.

Return type:

list[dict[str, str]]

Parameters:

num_examples – Number of examples to generate.

Returns:

List of dictionaries with ‘source’ and ‘target’ keys.
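To show how a list of 'source'/'target' dictionaries could feed a few-shot prompt, here is a small sketch. The prompt wording, the `format_few_shot_prompt` helper, and the sample word pairs are all assumptions for illustration; the prompt the class actually sends is not documented here.

```python
def format_few_shot_prompt(examples: list[dict[str, str]], text: str) -> str:
    """Hypothetical prompt builder: one 'source -> target' line per example."""
    lines = ["Transliterate the input, following these examples:"]
    for ex in examples:
        lines.append(f"{ex['source']} -> {ex['target']}")
    # The final line leaves the target blank for the model to complete.
    lines.append(f"{text} ->")
    return "\n".join(lines)


# Hand-written sample pairs in the documented return shape.
examples = [
    {"source": "नमस्ते", "target": "namaste"},
    {"source": "भारत", "target": "bharat"},
]
print(format_few_shot_prompt(examples, "दिल्ली"))
```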

transliterate(text: str, use_few_shot: bool = True, num_examples: int = 5) → str

Transliterate text from source language to target language.

Return type:

str

Parameters:
  • text – Text to transliterate.

  • use_few_shot – Whether to use few-shot examples.

  • num_examples – Number of few-shot examples to use.

Returns:

Transliterated text.

transliterate_batch(texts: list[str], batch_size: int = 10, use_few_shot: bool = True) → list[str]

Transliterate multiple texts efficiently.

Return type:

list[str]

Parameters:
  • texts – List of texts to transliterate.

  • batch_size – Number of texts to process in one API call.

  • use_few_shot – Whether to use few-shot examples.

Returns:

List of transliterated texts.
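Since `batch_size` is the number of texts sent in one API call, the input list presumably has to be split into consecutive groups. The sketch below shows one common way to do that; `chunked` is a hypothetical helper, and the method's actual batching logic may differ.

```python
from typing import Iterator


def chunked(texts: list[str], batch_size: int = 10) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


texts = [f"text-{i}" for i in range(23)]
batches = list(chunked(texts, batch_size=10))
print([len(b) for b in batches])  # [10, 10, 3]
```

With 23 inputs and a batch size of 10, this would mean three API calls rather than 23.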

indicate.detect_indic_script(text: str) → str | None

Auto-detect Indic script from Unicode ranges.

Return type:

str | None

Parameters:

text – Text to analyze.

Returns:

Detected script name or None if not Indic.
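Detection from Unicode ranges can be sketched with the standard Unicode block boundaries for nine Brahmic scripts. The `detect_script` function, its majority-vote rule, and the choice of blocks are assumptions for this example; the library's function may cover other ranges or break ties differently.

```python
from collections import Counter
from typing import Optional

# Unicode block boundaries (inclusive) for nine Brahmic scripts.
SCRIPT_RANGES = {
    "devanagari": (0x0900, 0x097F),
    "bengali": (0x0980, 0x09FF),
    "gurmukhi": (0x0A00, 0x0A7F),
    "gujarati": (0x0A80, 0x0AFF),
    "odia": (0x0B00, 0x0B7F),
    "tamil": (0x0B80, 0x0BFF),
    "telugu": (0x0C00, 0x0C7F),
    "kannada": (0x0C80, 0x0CFF),
    "malayalam": (0x0D00, 0x0D7F),
}


def detect_script(text: str) -> Optional[str]:
    """Return the script with the most characters in text, or None."""
    counts: Counter = Counter()
    for ch in text:
        cp = ord(ch)
        for name, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= cp <= hi:
                counts[name] += 1
                break  # blocks do not overlap, so stop at the first match
    if not counts:
        return None  # no character fell inside any listed block
    return counts.most_common(1)[0][0]


print(detect_script("नमस्ते"))  # devanagari
print(detect_script("தமிழ்"))   # tamil
print(detect_script("hello"))   # None
```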

indicate.detect_language_from_script(text: str) → str | None

Detect the most likely language based on script and context.

Return type:

str | None

Parameters:

text – Text to analyze.

Returns:

Detected language name or None.
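Script alone does not always determine the language, which is presumably why context is also consulted. The sketch below groups the languages from the `INDIC_LANGUAGES` table by script to show where the ambiguity lies; `languages_by_script` is a hypothetical helper, and the way the library resolves ties is not shown here.

```python
from collections import defaultdict

# Language -> script pairs taken from the INDIC_LANGUAGES table above
# (English/Latin omitted).
LANGUAGE_SCRIPTS = {
    "bengali": "bengali", "gujarati": "gujarati", "hindi": "devanagari",
    "kannada": "kannada", "malayalam": "malayalam", "marathi": "devanagari",
    "odia": "odia", "punjabi": "gurmukhi", "sanskrit": "devanagari",
    "tamil": "tamil", "telugu": "telugu", "urdu": "arabic",
}


def languages_by_script() -> dict[str, list[str]]:
    """Invert the table: each script maps to the languages written in it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for language, script in LANGUAGE_SCRIPTS.items():
        groups[script].append(language)
    return dict(groups)


candidates = languages_by_script()
print(candidates["devanagari"])  # ['hindi', 'marathi', 'sanskrit']
print(candidates["tamil"])       # ['tamil']
```

Devanagari text has three candidate languages in this table, so a script match narrows the answer but cannot settle it without further cues.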