indicate package
Submodules
indicate.base module
indicate.decoder module
- class indicate.decoder.Decoder(vocab_size: int, embedding_dim: int, dec_units: int, batch_sz: int, max_length_input: int, max_length_output: int, attention_type: str = 'luong') → None
Bases: Model
- __init__(vocab_size: int, embedding_dim: int, dec_units: int, batch_sz: int, max_length_input: int, max_length_output: int, attention_type: str = 'luong') → None
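The `attention_type` parameter defaults to `'luong'`, i.e. Luong-style multiplicative attention. The NumPy sketch below illustrates Luong's "general" score, score(h_t, h_s) = h_tᵀ W h_s, for a single decoder step. It assumes that particular variant (Luong et al. also describe "dot" and "concat" scores), and the function name and array shapes are hypothetical, not taken from `indicate.decoder`.

```python
import numpy as np


def luong_general_attention(decoder_state, encoder_outputs, W):
    """Hypothetical sketch of Luong 'general' attention for one decoder step.

    decoder_state:   (batch, dec_units)                    current decoder state h_t
    encoder_outputs: (batch, max_length_input, enc_units)  encoder states h_s
    W:               (dec_units, enc_units)                learned weight matrix
    Returns (context, weights) shaped (batch, enc_units) and (batch, max_length_input).
    """
    # Project h_t into the encoder space: (batch, enc_units)
    projected = decoder_state @ W
    # Alignment score for every source position: (batch, max_length_input)
    scores = np.einsum("be,bte->bt", projected, encoder_outputs)
    # Numerically stable softmax over source positions
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Context vector: attention-weighted sum of the encoder states
    context = np.einsum("bt,bte->be", weights, encoder_outputs)
    return context, weights
```

In a full decoder, the context vector would typically be combined with the decoder state before predicting the next output character.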
indicate.encoder module
- class indicate.encoder.Encoder(vocab_size: int, embedding_dim: int, enc_units: int, batch_sz: int) → None
Bases: Model
- call(x: Tensor, hidden: list[Tensor], training: bool | None = None) → tuple[Tensor, Tensor, Tensor]
- Return type:
tuple[Tensor, Tensor, Tensor]
- Return type:
list[Tensor]
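`Encoder.call` takes `hidden` as a `list[Tensor]` and returns three tensors. That pattern is consistent with, though not confirmed to be, an LSTM encoder that returns its output sequence plus a final hidden state `h` and cell state `c`. Under that assumption, the initial `hidden` argument would be a pair of zero tensors of shape `(batch_sz, enc_units)`. The helper below is a hypothetical NumPy illustration, not part of `indicate.encoder`.

```python
import numpy as np


def initial_hidden_state(batch_sz: int, enc_units: int) -> list:
    """Hypothetical helper: zero [h, c] states for an LSTM-style encoder.

    Assumption: under the LSTM reading, call() would then return
      output: (batch_sz, seq_len, enc_units)
      h, c:   (batch_sz, enc_units) each
    """
    # One zero tensor for the hidden state h and one for the cell state c.
    return [np.zeros((batch_sz, enc_units), dtype=np.float32) for _ in range(2)]
```

With the `HindiToEnglish` defaults (`BATCH_SIZE = 64`, `units = 1024`), this would yield two `(64, 1024)` arrays.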
indicate.hindi2english module
- class indicate.hindi2english.HindiToEnglish
Bases: object
- MODELFN: str = 'data/hindi_to_english/saved_weights/'
- INPUT_VOCAB: str = 'data/hindi_to_english/hindi_tokens.json'
- TARGET_VOCAB: str = 'data/hindi_to_english/english_tokens.json'
- embedding_dim: int = 256
- units: int = 1024
- BATCH_SIZE: int = 64
- BUFFER_SIZE: int = 120000
- max_length_input: int = 47
- max_length_output: int = 173
- START_TOKEN: str = '^'
- END_TOKEN: str = '$'
- input_lang_tokenizer: Any | None = None
- target_lang_tokenizer: Any | None = None
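The `START_TOKEN`, `END_TOKEN`, and `max_length_input` attributes suggest character-level input framed by `^` and `$` and padded to 47 positions. A minimal sketch of that preprocessing follows. It assumes a per-character index mapping with 0 reserved for padding, and `encode_word` is a hypothetical name, not a documented `indicate` function.

```python
START_TOKEN = "^"      # documented value of HindiToEnglish.START_TOKEN
END_TOKEN = "$"        # documented value of HindiToEnglish.END_TOKEN
MAX_LENGTH_INPUT = 47  # documented value of HindiToEnglish.max_length_input


def encode_word(word: str, char_to_index: dict, max_len: int = MAX_LENGTH_INPUT) -> list:
    """Hypothetical preprocessing: frame the word with start/end markers,
    map each character to an index, and right-pad with 0 up to max_len."""
    chars = START_TOKEN + word + END_TOKEN
    if len(chars) > max_len:
        # This sketch rejects over-long input; the real code may truncate instead.
        raise ValueError(f"input too long: {len(chars)} > {max_len} characters")
    indices = [char_to_index[c] for c in chars]
    # Assumption: index 0 is reserved for padding (as with Keras tokenizers).
    return indices + [0] * (max_len - len(indices))
```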
- classmethod transliterate(input: str) → str
Transliterate from Hindi to English.
- Parameters:
input (str) – Hindi text
- Returns:
output (str) – English text
- Return type:
str
- Raises:
TypeError – If input is None
ValueError – If input is not a string
RuntimeError – If model loading fails
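The `input_lang_tokenizer` and `target_lang_tokenizer` class attributes default to `None`, and `transliterate` is a classmethod that can raise `RuntimeError` when model loading fails. One pattern consistent with that is to load resources on the first call and cache them on the class. The sketch below assumes this pattern rather than reproducing `indicate`'s implementation; `LazyModelHolder` and `_load` are hypothetical names, and `_load` is a stand-in that does not read real files.

```python
from typing import Any, Optional


class LazyModelHolder:
    """Hypothetical sketch of class-level lazy loading; not indicate's code."""

    # Mirrors the documented class attributes, which default to None.
    input_lang_tokenizer: Optional[Any] = None
    target_lang_tokenizer: Optional[Any] = None
    load_calls = 0  # counts real load attempts, for illustration only

    @staticmethod
    def _load(path: str) -> Any:
        # Stand-in loader; a real one would parse the JSON vocabulary file.
        return {"source": path}

    @classmethod
    def _ensure_loaded(cls) -> None:
        # Load once; later calls reuse the cached class attributes.
        if cls.input_lang_tokenizer is not None and cls.target_lang_tokenizer is not None:
            return
        cls.load_calls += 1
        try:
            cls.input_lang_tokenizer = cls._load("data/hindi_to_english/hindi_tokens.json")
            cls.target_lang_tokenizer = cls._load("data/hindi_to_english/english_tokens.json")
        except Exception as exc:
            # Surface any loading problem as the documented RuntimeError.
            raise RuntimeError("model loading failed") from exc

    @classmethod
    def transliterate(cls, text: str) -> str:
        cls._ensure_loaded()
        return text  # placeholder: real code would encode, run the model, and decode
```

Caching on the class keeps repeated calls cheap, at the cost of holding the resources for the lifetime of the process.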
indicate.logging module
indicate.transliterate module
Transliteration module with backward compatibility.
This module maintains the original API while delegating to the new Click-based CLI.
- indicate.transliterate.hindi2english(input: str) → str
Transliterate from Hindi to English.
- Parameters:
input (str) – Hindi text
- Returns:
output (str) – English text
- Return type:
str
- Raises:
TypeError – If input is None
ValueError – If input is not a string
RuntimeError – If model loading fails
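The Raises section documents a specific input contract: `TypeError` for `None` and `ValueError` for any other non-string. A hypothetical check mirroring that contract (useful, for example, as a test double) could look like this; `validate_input` is not an `indicate` function.

```python
from typing import Any


def validate_input(value: Any) -> str:
    """Hypothetical check mirroring the documented Raises section of hindi2english."""
    if value is None:
        raise TypeError("input must not be None")
    if not isinstance(value, str):
        # Documented as ValueError (not TypeError) for non-string input.
        raise ValueError(f"input must be a string, got {type(value).__name__}")
    return value
```

Because the two failure modes use different exception types, callers that want to handle any bad input should catch both, e.g. `except (TypeError, ValueError):`.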
indicate.utils module
- indicate.utils.sequence_to_chars(tokenizer: Any, sequence: Tensor) → str
Convert a sequence of indices back to characters.
- Return type:
str
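`sequence_to_chars` takes a tokenizer and a tensor of indices. The sketch below assumes a Keras-style tokenizer exposing an `index_word` dictionary (the signature only says `Any`), and that the `^`/`$` markers from `HindiToEnglish` frame each sequence. It is a hypothetical re-implementation that takes the mapping directly so it runs without TensorFlow.

```python
from typing import Iterable, Mapping


def sequence_to_chars_sketch(index_word: Mapping[int, str], sequence: Iterable[int],
                             start_token: str = "^", end_token: str = "$") -> str:
    """Hypothetical: map indices to characters, skipping padding (0) and the
    start marker, and stopping at the end marker."""
    chars = []
    for idx in sequence:
        idx = int(idx)  # accepts plain ints and NumPy integers
        if idx == 0:
            continue  # padding
        ch = index_word.get(idx, "")  # unknown indices contribute nothing
        if ch == end_token:
            break
        if ch == start_token:
            continue
        chars.append(ch)
    return "".join(chars)
```

For example, with `index_word = {1: "^", 2: "$", 3: "n", 4: "a", 5: "m"}`, the sequence `[1, 3, 4, 5, 4, 2, 0, 0]` decodes to `"nama"`.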
Module contents
- indicate.hindi2english(input: str) → str
Transliterate from Hindi to English.
- Parameters:
input (str) – Hindi text
- Returns:
output (str) – English text
- Return type:
str
- Raises:
TypeError – If input is None
ValueError – If input is not a string
RuntimeError – If model loading fails
- class indicate.IndicLLMTransliterator(source_lang: str, target_lang: str, provider: str | None = None, model: str | None = None, api_key: str | None = None, temperature: float = 0.3, cache_examples: bool = True)
Bases: object
LLM-based transliterator for Indic languages.
- DEFAULT_MODELS = {'anthropic': 'claude-3-opus-20240229', 'cohere': 'command-r-plus', 'google': 'gemini-pro', 'openai': 'gpt-4-turbo-preview'}
- INDIC_LANGUAGES = {'bengali': {'iso': 'bn', 'native': 'বাংলা', 'script': 'bengali'}, 'english': {'iso': 'en', 'native': 'English', 'script': 'latin'}, 'gujarati': {'iso': 'gu', 'native': 'ગુજરાતી', 'script': 'gujarati'}, 'hindi': {'iso': 'hi', 'native': 'हिन्दी', 'script': 'devanagari'}, 'kannada': {'iso': 'kn', 'native': 'ಕನ್ನಡ', 'script': 'kannada'}, 'malayalam': {'iso': 'ml', 'native': 'മലയാളം', 'script': 'malayalam'}, 'marathi': {'iso': 'mr', 'native': 'मराठी', 'script': 'devanagari'}, 'odia': {'iso': 'or', 'native': 'ଓଡ଼ିଆ', 'script': 'odia'}, 'punjabi': {'iso': 'pa', 'native': 'ਪੰਜਾਬੀ', 'script': 'gurmukhi'}, 'sanskrit': {'iso': 'sa', 'native': 'संस्कृतम्', 'script': 'devanagari'}, 'tamil': {'iso': 'ta', 'native': 'தமிழ்', 'script': 'tamil'}, 'telugu': {'iso': 'te', 'native': 'తెలుగు', 'script': 'telugu'}, 'urdu': {'iso': 'ur', 'native': 'اردو', 'script': 'arabic'}}
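Language names are the lowercase keys of `INDIC_LANGUAGES`, and several languages share a script (Hindi, Marathi, and Sanskrit all map to `devanagari`). Whether the constructor lowercases or validates `source_lang` and `target_lang` is not documented. The hypothetical helper below shows one way to look up a pair in this mapping, using a subset of the documented entries.

```python
# Subset of the documented INDIC_LANGUAGES mapping, copied for a self-contained example.
INDIC_LANGUAGES = {
    "english": {"iso": "en", "native": "English", "script": "latin"},
    "hindi": {"iso": "hi", "native": "हिन्दी", "script": "devanagari"},
    "marathi": {"iso": "mr", "native": "मराठी", "script": "devanagari"},
    "tamil": {"iso": "ta", "native": "தமிழ்", "script": "tamil"},
    "urdu": {"iso": "ur", "native": "اردو", "script": "arabic"},
}


def describe_pair(source_lang: str, target_lang: str) -> str:
    """Hypothetical helper: validate a language pair and summarise its scripts."""
    try:
        src = INDIC_LANGUAGES[source_lang.lower()]
        tgt = INDIC_LANGUAGES[target_lang.lower()]
    except KeyError as exc:
        raise ValueError(f"unsupported language: {exc.args[0]}") from None
    return f"{src['iso']}->{tgt['iso']} ({src['script']} to {tgt['script']})"
```

For instance, `describe_pair("Hindi", "english")` returns `"hi->en (devanagari to latin)"`.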
- __init__(source_lang: str, target_lang: str, provider: str | None = None, model: str | None = None, api_key: str | None = None, temperature: float = 0.3, cache_examples: bool = True)
Initialize the Indic LLM transliterator.
- Parameters:
source_lang – Source language (e.g., ‘hindi’, ‘tamil’)
target_lang – Target language (e.g., ‘english’)
provider – LLM provider (openai, anthropic, etc.). Auto-detected if not provided.
model – Specific model to use. Uses provider defaults if not provided.
api_key – API key. Uses environment variables if not provided.
temperature – LLM temperature for consistency (lower = more consistent).
cache_examples – Whether to cache generated few-shot examples.
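The constructor auto-detects `provider`, falls back to `DEFAULT_MODELS` for `model`, and reads `api_key` from environment variables when these arguments are omitted. The exact variable names and detection order are not documented. The sketch below assumes the conventional names used by each provider's own SDK and checks them in a fixed order; `resolve_provider` and `ENV_KEYS` are hypothetical.

```python
import os
from typing import Mapping, Optional

# Documented defaults (IndicLLMTransliterator.DEFAULT_MODELS).
DEFAULT_MODELS = {
    "anthropic": "claude-3-opus-20240229",
    "cohere": "command-r-plus",
    "google": "gemini-pro",
    "openai": "gpt-4-turbo-preview",
}

# ASSUMPTION: conventional variable names and lookup order; indicate's lookup is undocumented.
ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "cohere": "COHERE_API_KEY",
}


def resolve_provider(provider: Optional[str] = None, model: Optional[str] = None,
                     api_key: Optional[str] = None,
                     env: Optional[Mapping[str, str]] = None) -> tuple:
    """Hypothetical resolution of (provider, model, api_key)."""
    env = os.environ if env is None else env
    if provider is None:
        # Pick the first provider (in ENV_KEYS order) whose key is set.
        provider = next((p for p, var in ENV_KEYS.items() if env.get(var)), None)
        if provider is None:
            raise ValueError("no provider given and no API key found in the environment")
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown provider: {provider}")
    key = api_key or env.get(ENV_KEYS[provider])
    if not key:
        raise ValueError(f"no API key for provider {provider}")
    return provider, model or DEFAULT_MODELS[provider], key
```

Accepting an explicit `env` mapping keeps the sketch testable without touching the real process environment.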
- generate_few_shot_examples(num_examples: int = 5) → list[dict[str, str]]
Generate few-shot transliteration examples for the language pair.
- Parameters:
num_examples – Number of examples to generate.
- Returns:
List of dictionaries with ‘source’ and ‘target’ keys.
- Return type:
list[dict[str, str]]
- transliterate(text: str, use_few_shot: bool = True, num_examples: int = 5) → str
Transliterate text from source language to target language.
- Parameters:
text – Text to transliterate.
use_few_shot – Whether to use few-shot examples.
num_examples – Number of few-shot examples to use.
- Returns:
Transliterated text.
- Return type:
str
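When `use_few_shot` is true, examples in the documented `{'source': ..., 'target': ...}` shape are presumably placed in the prompt ahead of the text to transliterate. The prompt wording `indicate` actually uses is not documented. The sketch below is a hypothetical layout, and `build_prompt` is not an `indicate` function.

```python
from typing import Optional


def build_prompt(text: str, source_lang: str, target_lang: str,
                 examples: Optional[list] = None) -> str:
    """Hypothetical prompt layout; indicate's real prompt text is not documented."""
    lines = [f"Transliterate the following text from {source_lang} to {target_lang}.",
             "Preserve pronunciation; do not translate."]
    # Few-shot examples use the documented {'source': ..., 'target': ...} shape.
    for ex in examples or []:
        lines.append(f"{source_lang}: {ex['source']}")
        lines.append(f"{target_lang}: {ex['target']}")
    # End with the query and an open slot for the model's answer.
    lines.append(f"{source_lang}: {text}")
    lines.append(f"{target_lang}:")
    return "\n".join(lines)
```

Ending the prompt with an empty `english:` slot mirrors the example pairs, which nudges the model to answer in the same format.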
- transliterate_batch(texts: list[str], batch_size: int = 10, use_few_shot: bool = True) → list[str]
Transliterate multiple texts, sending up to batch_size texts in each API call.
- Parameters:
texts – List of texts to transliterate.
batch_size – Number of texts to process in one API call.
use_few_shot – Whether to use few-shot examples.
- Returns:
List of transliterated texts.
- Return type:
list[str]
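Because `batch_size` is the number of texts per API call, 23 texts with the default of 10 would need three calls (10, 10, and 3 texts). A minimal sketch of that chunking follows, with `call_api` as a hypothetical stand-in for a single LLM request. The length check is a design choice of this sketch, not documented behavior.

```python
from typing import Callable, Iterator


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def transliterate_in_batches(texts: list, call_api: Callable[[list], list],
                             batch_size: int = 10) -> list:
    """Hypothetical batching: one call_api request per chunk, results kept in input order."""
    results = []
    for batch in chunked(texts, batch_size):
        out = call_api(batch)
        if len(out) != len(batch):
            # An LLM may merge or drop lines; fail loudly instead of misaligning outputs.
            raise RuntimeError(f"expected {len(batch)} results, got {len(out)}")
        results.extend(out)
    return results
```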