indicate package

Submodules

indicate.base module

indicate.decoder module

class indicate.decoder.Decoder(vocab_size: int, embedding_dim: int, dec_units: int, batch_sz: int, max_length_input: int, max_length_output: int, attention_type: str = 'luong') → None

Bases: Model

__init__(vocab_size: int, embedding_dim: int, dec_units: int, batch_sz: int, max_length_input: int, max_length_output: int, attention_type: str = 'luong') → None

build_attention_mechanism(memory: Tensor | None) → Layer

Return type: Layer

call(inputs: Tensor, initial_state: list[Tensor], encoder_outputs: Tensor) → tuple[Tensor, tuple[Tensor, Tensor], Tensor]

Return type: tuple[Tensor, tuple[Tensor, Tensor], Tensor]

build_initial_state(batch_sz: int, encoder_state: list[Tensor]) → list[Tensor]

Return type: list[Tensor]

indicate.encoder module

class indicate.encoder.Encoder(vocab_size: int, embedding_dim: int, enc_units: int, batch_sz: int) → None

Bases: Model

__init__(vocab_size: int, embedding_dim: int, enc_units: int, batch_sz: int) → None

call(x: Tensor, hidden: list[Tensor], training: bool | None = None) → tuple[Tensor, Tensor, Tensor]

Return type: tuple[Tensor, Tensor, Tensor]

initialize_hidden_state() → list[Tensor]

Return type: list[Tensor]

indicate.hindi2english module

class indicate.hindi2english.HindiToEnglish

Bases: object

MODELFN: str = 'data/hindi_to_english/saved_weights/'

INPUT_VOCAB: str = 'data/hindi_to_english/hindi_tokens.json'

TARGET_VOCAB: str = 'data/hindi_to_english/english_tokens.json'

embedding_dim: int = 256

units: int = 1024

BATCH_SIZE: int = 64

BUFFER_SIZE: int = 120000

max_length_input: int = 47

max_length_output: int = 173

START_TOKEN: str = '^'

END_TOKEN: str = '$'

input_lang_tokenizer: Any | None = None

target_lang_tokenizer: Any | None = None

encoder: Encoder | None = None

decoder: Decoder | None = None

classmethod get_model_path() → str

Return type: str

classmethod get_input_vocab() → str

Return type: str

classmethod get_target_vocab() → str

Return type: str

classmethod transliterate(input: str) → str

Transliterate from Hindi to English.

Parameters:

input (str) – Hindi text

Returns:

English text

Return type:

str

Raises:

TypeError – If input is None
ValueError – If input is not a string
RuntimeError – If model loading fails
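
The error contract listed above can be handled at the call site roughly as follows. This is a minimal sketch, not the library's code: `fake_transliterate` is a hypothetical stand-in that mimics only the documented checks, and `try_transliterate` is an illustrative helper. In real use, `HindiToEnglish.transliterate` would be passed in place of the stand-in.

```python
from typing import Callable, Optional


def fake_transliterate(text: str) -> str:
    """Hypothetical stand-in that mirrors only the documented Raises section."""
    if text is None:
        raise TypeError("input must not be None")
    if not isinstance(text, str):
        raise ValueError("input must be a string")
    return text  # a real model would return the transliterated text here


def try_transliterate(text: object, fn: Callable[[str], str]) -> Optional[str]:
    """Return fn(text), or None if fn rejects the input or the model fails."""
    try:
        return fn(text)  # type: ignore[arg-type]
    except (TypeError, ValueError, RuntimeError):
        # The three exception types named in the Raises section.
        return None
```

Returning None keeps a batch job running when a single input is bad; callers who need the reason should catch the exceptions themselves instead.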

indicate.logging module

indicate.logging.get_logger() → Logger

Return type: Logger

indicate.transliterate module

Transliteration module with backward compatibility.

This module maintains the original API while delegating to the new Click-based CLI.

indicate.transliterate.hindi2english(input: str) → str

Transliterate from Hindi to English.

Parameters:

input (str) – Hindi text

Returns:

English text

Return type:

str

Raises:

TypeError – If input is None
ValueError – If input is not a string
RuntimeError – If model loading fails

indicate.transliterate.main(argv: list[str] | None = None) → int

Legacy entry point for backward compatibility.

Return type: int

indicate.utils module

indicate.utils.sequence_to_chars(tokenizer: Any, sequence: Tensor) → str

Convert a sequence of indices back to characters.

Return type: str
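
To illustrate the index-to-character conversion, here is a minimal sketch that uses a plain dictionary and a list of integers instead of a tokenizer and a Tensor. The name `indices_to_chars` and the mapping are hypothetical; the sketch assumes a Keras-style tokenizer whose index-to-character table (for example `index_word`) could be passed in, which the source does not confirm.

```python
from typing import Dict, Iterable


def indices_to_chars(index_to_char: Dict[int, str], sequence: Iterable[int]) -> str:
    """Join the character for each index, skipping indices absent from the mapping.

    In this sketch index 0 is treated as padding simply by leaving it out
    of the mapping; that convention is an assumption.
    """
    return "".join(index_to_char[i] for i in sequence if i in index_to_char)


# Hypothetical vocabulary using the documented START_TOKEN '^' and END_TOKEN '$'.
vocab = {1: "^", 2: "r", 3: "a", 4: "m", 5: "$"}
decoded = indices_to_chars(vocab, [1, 2, 3, 4, 5, 0, 0])
```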

indicate.utils.evaluate_sentence(sentence: str, units: int, input_lang_tokenizer: Any, target_lang_tokenizer: Any, encoder: Any, decoder: Any, max_length_input: int) → Tensor

Evaluate/translate a single sentence.

Return type: Tensor

indicate.utils.translate(sentence: str, units: int, input_lang_tokenizer: Any, target_lang_tokenizer: Any, encoder: Any, decoder: Any, max_length_input: int) → str

Translate a sentence from source to target language.

Return type: str

Module contents

indicate.hindi2english(input: str) → str

Transliterate from Hindi to English.

Parameters:

input (str) – Hindi text

Returns:

English text

Return type:

str

Raises:

TypeError – If input is None
ValueError – If input is not a string
RuntimeError – If model loading fails

class indicate.IndicLLMTransliterator(source_lang: str, target_lang: str, provider: str | None = None, model: str | None = None, api_key: str | None = None, temperature: float = 0.3, cache_examples: bool = True)

Bases: object

LLM-based transliterator for Indic languages.

DEFAULT_MODELS = {'anthropic': 'claude-3-opus-20240229', 'cohere': 'command-r-plus', 'google': 'gemini-pro', 'openai': 'gpt-4-turbo-preview'}

INDIC_LANGUAGES = {'bengali': {'iso': 'bn', 'native': 'বাংলা', 'script': 'bengali'}, 'english': {'iso': 'en', 'native': 'English', 'script': 'latin'}, 'gujarati': {'iso': 'gu', 'native': 'ગુજરાતી', 'script': 'gujarati'}, 'hindi': {'iso': 'hi', 'native': 'हिन्दी', 'script': 'devanagari'}, 'kannada': {'iso': 'kn', 'native': 'ಕನ್ನಡ', 'script': 'kannada'}, 'malayalam': {'iso': 'ml', 'native': 'മലയാളം', 'script': 'malayalam'}, 'marathi': {'iso': 'mr', 'native': 'मराठी', 'script': 'devanagari'}, 'odia': {'iso': 'or', 'native': 'ଓଡ଼ିଆ', 'script': 'odia'}, 'punjabi': {'iso': 'pa', 'native': 'ਪੰਜਾਬੀ', 'script': 'gurmukhi'}, 'sanskrit': {'iso': 'sa', 'native': 'संस्कृतम्', 'script': 'devanagari'}, 'tamil': {'iso': 'ta', 'native': 'தமிழ்', 'script': 'tamil'}, 'telugu': {'iso': 'te', 'native': 'తెలుగు', 'script': 'telugu'}, 'urdu': {'iso': 'ur', 'native': 'اردو', 'script': 'arabic'}}

__init__(source_lang: str, target_lang: str, provider: str | None = None, model: str | None = None, api_key: str | None = None, temperature: float = 0.3, cache_examples: bool = True)

Initialize the Indic LLM transliterator.

Parameters:

source_lang – Source language (e.g., ‘hindi’, ‘tamil’)
target_lang – Target language (e.g., ‘english’)
provider – LLM provider (openai, anthropic, etc.). Auto-detected if not provided.
model – Specific model to use. Uses provider defaults if not provided.
api_key – API key. Uses environment variables if not provided.
temperature – LLM temperature for consistency (lower = more consistent).
cache_examples – Whether to cache generated few-shot examples.
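
The fallback described for the model parameter ("Uses provider defaults if not provided") can be pictured with the sketch below. The dictionary is copied from the documented DEFAULT_MODELS attribute; the function name `resolve_model` and the ValueError for an unknown provider are assumptions made for illustration, not the library's confirmed behavior.

```python
from typing import Dict, Optional

# Copied from the DEFAULT_MODELS attribute documented for IndicLLMTransliterator.
DEFAULT_MODELS: Dict[str, str] = {
    "anthropic": "claude-3-opus-20240229",
    "cohere": "command-r-plus",
    "google": "gemini-pro",
    "openai": "gpt-4-turbo-preview",
}


def resolve_model(provider: str, model: Optional[str] = None) -> str:
    """Return the explicit model if given, otherwise the provider's default."""
    if model is not None:
        return model
    try:
        return DEFAULT_MODELS[provider]
    except KeyError:
        # Assumption: an unknown provider is treated as a caller error.
        raise ValueError(f"unknown provider: {provider!r}") from None
```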

generate_few_shot_examples(num_examples: int = 5) → list[dict[str, str]]

Generate few-shot transliteration examples for the language pair.

Return type: list[dict[str, str]]
Parameters: num_examples – Number of examples to generate.
Returns: List of dictionaries with ‘source’ and ‘target’ keys.

transliterate(text: str, use_few_shot: bool = True, num_examples: int = 5) → str

Transliterate text from source language to target language.

Return type:

str

Parameters:

text – Text to transliterate.
use_few_shot – Whether to use few-shot examples.
num_examples – Number of few-shot examples to use.

Returns:

Transliterated text.
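
One way few-shot examples with 'source' and 'target' keys could be folded into a prompt is sketched below. The prompt layout and the function name `build_few_shot_prompt` are purely illustrative assumptions; the source does not describe the prompt the library actually sends.

```python
from typing import Dict, List


def build_few_shot_prompt(
    examples: List[Dict[str, str]],
    text: str,
    source_lang: str,
    target_lang: str,
) -> str:
    """Build a plain-text prompt: instruction, example pairs, then the query."""
    lines = [f"Transliterate from {source_lang} to {target_lang}."]
    for example in examples:
        lines.append(f"{source_lang}: {example['source']}")
        lines.append(f"{target_lang}: {example['target']}")
    # The final pair is left open for the model to complete.
    lines.append(f"{source_lang}: {text}")
    lines.append(f"{target_lang}:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    [{"source": "नमस्ते", "target": "namaste"}], "भारत", "hindi", "english"
)
```

With `use_few_shot=False` the equivalent would be an empty `examples` list, which leaves only the instruction and the open query.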

transliterate_batch(texts: list[str], batch_size: int = 10, use_few_shot: bool = True) → list[str]

Transliterate multiple texts efficiently.

Return type:

list[str]

Parameters:

texts – List of texts to transliterate.
batch_size – Number of texts to process in one API call.
use_few_shot – Whether to use few-shot examples.

Returns:

List of transliterated texts.
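
The effect of batch_size can be sketched as simple chunking of the input list, with each chunk corresponding to one API call. The helper name `chunked` is hypothetical, and how the library actually groups texts is not stated in the source.

```python
from typing import Iterator, List


def chunked(texts: List[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


batches = list(chunked(["a", "b", "c", "d", "e"], batch_size=2))
```

Under this scheme, 25 texts with the default batch size of 10 would need three calls (10, 10, and 5 texts).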

indicate.detect_indic_script(text: str) → str | None

Auto-detect Indic script from Unicode ranges.

Return type: str | None
Parameters: text – Text to analyze.
Returns: Detected script name or None if not Indic.
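
Detection from Unicode ranges can be sketched as below. This is a hedged illustration, not the library's implementation: the function name `detect_script`, the majority-vote rule, and the decision to include the Arabic block (the script listed for Urdu in INDIC_LANGUAGES) are assumptions. The ranges themselves are the standard Unicode block boundaries.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Inclusive Unicode block ranges for the scripts named in INDIC_LANGUAGES.
SCRIPT_RANGES: Dict[str, Tuple[int, int]] = {
    "arabic": (0x0600, 0x06FF),
    "devanagari": (0x0900, 0x097F),
    "bengali": (0x0980, 0x09FF),
    "gurmukhi": (0x0A00, 0x0A7F),
    "gujarati": (0x0A80, 0x0AFF),
    "odia": (0x0B00, 0x0B7F),
    "tamil": (0x0B80, 0x0BFF),
    "telugu": (0x0C00, 0x0C7F),
    "kannada": (0x0C80, 0x0CFF),
    "malayalam": (0x0D00, 0x0D7F),
}


def detect_script(text: str) -> Optional[str]:
    """Return the script with the most characters in text, or None if none match."""
    counts: Counter = Counter()
    for ch in text:
        code_point = ord(ch)
        for script, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[script] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Counting characters rather than checking only the first one makes the sketch tolerant of mixed text such as a Hindi sentence containing Latin digits or punctuation.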

indicate.detect_language_from_script(text: str) → str | None

Detect the most likely language based on script and context.

Return type: str | None
Parameters: text – Text to analyze.
Returns: Detected language name or None.
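
Script alone does not always identify a language, which is presumably why this function also considers context. The sketch below inverts the script field of the documented INDIC_LANGUAGES table (omitting English) to show the ambiguity; `languages_for_script` is a hypothetical helper, and how the library resolves the ambiguity is not stated in the source.

```python
from typing import Dict, List

# Script for each language, taken from the documented INDIC_LANGUAGES table.
LANGUAGE_SCRIPTS: Dict[str, str] = {
    "bengali": "bengali",
    "gujarati": "gujarati",
    "hindi": "devanagari",
    "kannada": "kannada",
    "malayalam": "malayalam",
    "marathi": "devanagari",
    "odia": "odia",
    "punjabi": "gurmukhi",
    "sanskrit": "devanagari",
    "tamil": "tamil",
    "telugu": "telugu",
    "urdu": "arabic",
}


def languages_for_script(script: str) -> List[str]:
    """Return every language in the table written in script, sorted by name."""
    return sorted(lang for lang, s in LANGUAGE_SCRIPTS.items() if s == script)


candidates = languages_for_script("devanagari")
```

Devanagari maps to three candidate languages in the table, while scripts such as Tamil or Gurmukhi map to exactly one.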