User:Alex brollo/Transliteration tables

Roman to Bengali

@Bodhisattwa: I see that you need Roman-to-Bengali transliteration. I found many standards for romanizing Indic scripts, but few, if any, for "Indicizing" Roman scripts; perhaps because knowledge of the Roman script is very common among Indic writers, whereas knowledge of Indic scripts among Roman-script writers is "a little bit uncommon" o_O.

Strange to say, I think it is a difficult task, since the Roman script has many national variants, which differ mainly in their use of diacritics; European languages also use different alphabets, such as Roman, Cyrillic and Greek. English hardly uses diacritics at all, but it is an unusual case. So, how can we go on? Do you have any reference? --Alex brollo (talk) 15:15, 27 October 2016 (UTC)

@Alex brollo:, the only book on Bengali Wikisource written in romanized script is Márk Likhita Susamácár. Its rules are given on this page and this page, which we can use as a starting reference and expand gradually if needed. -- বোধিসত্ত্ব (talk) 16:14, 27 October 2016 (UTC)

@Bodhisattwa: Very useful and inspiring, even if some "reverse engineering" of the rules is necessary. I'll review the English part of those two pages; please review the Bengali characters carefully, so that we can use them as a safe starting point. Do those romanization rules follow any of the previously listed methods, or are they "original"? --Alex brollo (talk) 13:12, 28 October 2016 (UTC)

@Alex brollo:, the method is a kind of mixture of all the existing systems, with some originality, of course. I need some time to tabulate all the methods in a single table. -- বোধিসত্ত্ব (talk) 13:45, 28 October 2016 (UTC)

The rules of Márk Likhita Susamácár are very complex (I feel they are a mixture of character-based and phonetic transliteration). I suppose one possible trick would be to build a JS tool that adds a second, editable, approximate Bengali transliteration as parameter 2 of a template whose parameter 1 holds the original romanized text. Something like this (I use random Bengali text just to show the rough idea!): {{trans|susamácárer}} -> select, click the tool -> {{trans|susamácárer|ব্যবহারকারী আলা}}. The tool should only run when the trans template has exactly one parameter. --Alex brollo (talk) 14:00, 28 October 2016 (UTC)
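A minimal sketch of the proposed tool's core logic, written in Python only for illustration (the real tool would be JavaScript). It assumes the template is called `trans`, as in the example above, and takes a `transliterate` function as a hypothetical placeholder for whatever conversion the tool uses. Templates that already have a second parameter are left alone, as proposed.

```python
import re
from typing import Callable

# {{trans|...}} with exactly one parameter: no further "|" (or braces) before "}}".
ONE_PARAM = re.compile(r"\{\{trans\|([^|{}]*)\}\}")


def fill_trans(wikitext: str, transliterate: Callable[[str], str]) -> str:
    """Add an editable Bengali guess as parameter 2 of every one-parameter {{trans}}."""
    def add_second(match):
        roman = match.group(1)
        return "{{trans|" + roman + "|" + transliterate(roman) + "}}"

    return ONE_PARAM.sub(add_second, wikitext)
```

For example, `fill_trans(text, my_engine)` would turn `{{trans|susamácárer}}` into `{{trans|susamácárer|…}}` while leaving an already filled `{{trans|x|y}}` untouched, so running the tool twice is harmless.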

@Alex brollo:, I have tabulated all the rules in a single table (excluding the special rules of Márk Likhita Susamácár). As the table shows, Devanagari and Bengali map onto each other almost one-to-one, so transliterating between them is very easy. For Roman scripts, most rules are shared by all the systems; there are only some exceptions, which can be sorted out gradually. -- বোধিসত্ত্ব (talk) 15:04, 28 October 2016 (UTC)
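Devanagari and Bengali convert so easily because their Unicode blocks (U+0900 to U+097F and U+0980 to U+09FF) share a parallel, ISCII-derived layout, so most characters differ by exactly 0x80. A minimal sketch under that assumption, with the exceptions handled explicitly: व has no separate Bengali letter and is written ব, precomposed nukta letters such as क़ are decomposed first (giving ক + ়, as in the Qa row), and characters whose parallel Bengali slot is unassigned (short ऎ, candra ऍ, the danda ।) are left unchanged. The function name is hypothetical.

```python
import unicodedata

OFFSET = 0x80  # distance between parallel Devanagari and Bengali code points

# Ranges where the parallel layout holds: signs, vowels, consonants, vowel signs
# and virama (U+0901-U+094D), then vocalic RR/LL, their signs, dandas and digits.
SHIFT_RANGES = [(0x0901, 0x094D), (0x0960, 0x096F)]

# Bengali has no separate VA: व is written with ব (BA).
SPECIAL = {"\u0935": "\u09AC"}


def _map_char(ch: str) -> str:
    if ch in SPECIAL:
        return SPECIAL[ch]
    cp = ord(ch)
    for lo, hi in SHIFT_RANGES:
        if lo <= cp <= hi:
            target = chr(cp + OFFSET)
            # Keep the Devanagari character if the Bengali slot is unassigned.
            return target if unicodedata.name(target, None) else ch
    return ch  # spaces, Latin text, punctuation: unchanged


def devanagari_to_bengali(text: str) -> str:
    # NFD splits precomposed nukta letters (क़ -> क + ़) before mapping.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(_map_char(ch) for ch in decomposed)
```

The danda stays as U+0964 because Bengali reuses the Devanagari danda, which matches the remark further down that । is the same in both scripts.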

All methods in a single table

All systems
Name	ISO	IAST	HK	NLK	ITRANS	Mark Likhita	Devanagari	Bengali
A	a	a A	a	a	a	a A	अ	অ
Aa	ā	ā Ā	A	ā	aa	á Á	आ ा	আ া
I	i	i I	i	i	i	i I	इ ि	ই ি
Ii	ī	ī Ī	I	ī	ii I	í Í	ई ी	ঈ ী
U	u	u U	u	u	u	u U	उ ु	উ ু
Uu	ū	ū Ū	U	ū	uu U	ú Ú	ऊ ू	ঊ ূ
R-vocalic	r̥	ṛ Ṛ	R	ṛ	RRi R^i	ri	ऋ ृ	ঋ ৃ
Rr-vocalic	r̥̄	ṝ Ṝ	RR		RRI R^I		ॠ ॄ	ৠ ৄ
L-vocalic	l̥	ḷ Ḷ	lR		LLi L^i		ऌ ॢ	ঌ ৢ
Ll-vocalic	l̥̄	ḹ Ḹ	lRR		LLI L^I		ॡ ॣ	ৡ ৣ
E-short	e						ऎ ॆ
E	ē	e E	e	ē	e	e	ए े	এ ে
E-candra	ê						ऍ ॅ
Ai	ai	ai Ai	ai	ai	ai	ai	ऐ ै	ঐ ৈ
O-short	o						ऒ ॊ
O	ō	o O	o	ō	o	o	ओ ो	ও ো
O-candra	ô						ऑ ॉ
Au	au	au Au	au	au	au	au	औ ौ	ঔ ৌ
M-anusvara	ṁ	ṃ Ṃ	M	ṃ	.m N		ं	ং
M-candrabindu	m̐				.N	ṇ	ँ	ঁ
H-visarga	ḥ	ḥ Ḥ	H	ḥ	H		ः	ঃ
Ka	k	k K	k	k	k	k	क्	ক্‌
Kha	kh	kh Kh	kh	kh	kh	kh	ख्	খ্‌
Ga	g	g G	g	g	g	g	ग्	গ্‌
Gha	gh	gh Gh	gh	gh	gh	gh	घ्	ঘ্‌
Nga	ṅ	ṅ Ṅ	G	ṅ	~N	n ṅ	ङ्	ঙ্‌
Ca	c	c C	c	c	ch	c	च्	চ্‌
Cha	ch	ch Ch	ch	ch	Ch	ch	छ्	ছ্‌
Ja	j	j J	j	j	j	j	ज्	জ্‌
Jha	jh	jh Jh	jh	jh	jh	jh	झ्	ঝ্‌
Nya	ñ	ñ Ñ	J	ñ	~n	n ñ	ञ्	ঞ্‌
Tta	ṭ	ṭ Ṭ	T	ṭ	T	ṭ	ट्	ট্‌
Ttha	ṭh	ṭh Ṭh	Th	ṭh	Th	ṭh	ठ्	ঠ্‌
Dda	ḍ	ḍ Ḍ	D	ḍ	D	ḍ	ड्	ড্‌
Rra	ṛ					ṛ	ड़्	ড়্‌
Ddha	ḍh	ḍh Ḍh	Dh	ḍh	Dh	ḍh	ढ्	ঢ্‌
Rrha	ṛh					ṛh	ढ़्	ঢ়্‌
Nna	ṇ	ṇ Ṇ	N	ṇ	N	n	ण्	ণ্‌
Ta	t	t T	t	t	t	t	त्	ত্‌
Tha	th	th Th	th	th	th	th	थ्	থ্‌
Da	d	d D	d	d	d	d	द्	দ্‌
Dha	dh	dh Dh	dh	dh	dh	dh	ध्	ধ্‌
Na	n	n N	n	n	n	n	न्	ন্‌
Pa	p	p P	p	p	p	p	प्	প্‌
Pha	ph	ph Ph	ph	ph	ph	ph	फ्	ফ্‌
Ba	b	b B	b	b	b	b	ब्	ব্‌
Bha	bh	bh Bh	bh	bh	bh	bh	भ्	ভ্‌
Ma	m	m M	m	m	m	m	म्	ম্‌
Ya	y	y Y	y	ẏ y	y	j y	य्	য্‌
Yya	ẏ					y	य़्	য়্‌
Ra	r	r R	r	r	r	r	र्	র্‌
La	l	l L	l	l	l	l	ल्	ল্‌
Va	v	v V	v	v	v w	v	व्	ব্‌
Sha	ś	ś Ś	z	ś	sh	sh	श्	শ্‌
Ssa	ṣ	ṣ Ṣ	S	ṣ	Sh	sh	ष्	ষ্‌
Sa	s	s S	s	s	s	s	स्	স্‌
Ha	h	h H	h	h	h	h	ह्	হ্‌
Avagraha	’				.a		ऽ	ঽ
Qa	q						क़्	ক়্
Khha	k͟h						ख़्	খ়্
Ghha	ġ						ग़्	গ়্
Za	z						ज़्	জ়্
Fa	f						फ़्	ফ়্

A few important rules

অ, আ, ই, ঈ, উ, ঊ, ঋ, ৠ, ঌ, ৡ, এ, ঐ, ও, ঔ are vowels. The rest, apart from the signs ং, ঁ, ঃ and ঽ, are consonants.
When a word starts with a vowel, the vowel is written in its full (independent) form, i.e. অ, আ, ই, etc.
When a vowel follows a consonant, its short form (vowel sign) is written instead, e.g. ক্‌ + অ = ক, খ্‌ + ই = খি, গ্‌ + আ = গা; the inherent অ has no sign at all.
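The table and the rules above are enough for a first transliteration engine. A minimal sketch, assuming greedy longest-match tokenization over a subset of the Mark Likhita column (input lower-cased and NFC-composed first): a consonant followed by another consonant gets a hasanta, a consonant followed by a vowel takes the vowel sign, a vowel with no consonant before it keeps its full form, and a word-final consonant is left without hasanta. It also applies rule 5 from the later list (ri after a vowel is রি). All names are hypothetical and the other special rules are not included.

```python
import unicodedata

HASANTA = "\u09CD"  # Bengali virama; joins consonants into conjuncts

# Roman vowel -> (full form, vowel sign); the inherent "a" has no sign.
VOWELS = {
    "a": ("অ", ""), "á": ("আ", "\u09BE"), "i": ("ই", "\u09BF"),
    "í": ("ঈ", "\u09C0"), "u": ("উ", "\u09C1"), "ú": ("ঊ", "\u09C2"),
    "ri": ("ঋ", "\u09C3"), "e": ("এ", "\u09C7"), "ai": ("ঐ", "\u09C8"),
    "o": ("ও", "\u09CB"), "au": ("ঔ", "\u09CC"),
}
CONSONANTS = {
    "k": "ক", "kh": "খ", "g": "গ", "gh": "ঘ", "c": "চ", "ch": "ছ",
    "j": "জ", "jh": "ঝ", "ṭ": "ট", "ṭh": "ঠ", "ḍ": "ড", "ḍh": "ঢ",
    "t": "ত", "th": "থ", "d": "দ", "dh": "ধ", "n": "ন", "p": "প",
    "ph": "ফ", "b": "ব", "bh": "ভ", "m": "ম", "y": "য়", "r": "র",
    "l": "ল", "v": "ব", "sh": "শ", "s": "স", "h": "হ",
}
MAX_KEY = max(len(key) for key in (*VOWELS, *CONSONANTS))


def _longest_match(text: str, i: int):
    """Return the longest table key starting at position i, or None."""
    for size in range(MAX_KEY, 0, -1):
        key = text[i:i + size]
        if key in VOWELS or key in CONSONANTS:
            return key
    return None


def transliterate(text: str) -> str:
    text = unicodedata.normalize("NFC", text.lower())
    out = []
    pending = False      # last output is a consonant still waiting for its vowel
    after_vowel = False  # a vowel closed the previous syllable in this word
    i = 0
    while i < len(text):
        key = _longest_match(text, i)
        if key is None:
            # Space, punctuation or unknown character: copy it, start a new word.
            out.append(text[i])
            pending = after_vowel = False
            i += 1
            continue
        if key in CONSONANTS:
            if pending:
                out.append(HASANTA)  # consonant cluster -> conjunct
            out.append(CONSONANTS[key])
            pending, after_vowel = True, False
        else:
            full, sign = VOWELS[key]
            if pending:
                out.append(sign)  # vowel joined to a consonant: short form
            elif key == "ri" and after_vowel:
                out.append("\u09B0\u09BF")  # rule 5: র + ি
            else:
                out.append(full)  # word-initial vowel: full form
            pending, after_vowel = False, True
        i += len(key)
    return "".join(out)
```

With this sketch `transliterate("Márk")` gives মার্ক and `transliterate("susamácárer")` gives সুসমাচারের; ambiguous keys such as sh (শ or ষ) simply take the first table value, which is what the later rules refine.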

Number system

Arabic	Devanagari	Bengali
0	०	০
1	१	১
2	२	২
3	३	৩
4	४	৪
5	५	৫
6	६	৬
7	७	৭
8	८	৮
9	९	৯
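Because the digits 0 to 9 occupy consecutive code points in both scripts (starting at U+0966 ० and U+09E6 ০), the number table above reduces to one translation table per script. A minimal sketch; the constant and function names are hypothetical.

```python
BENGALI_ZERO = 0x09E6     # ০ BENGALI DIGIT ZERO; ১..৯ follow consecutively
DEVANAGARI_ZERO = 0x0966  # ० DEVANAGARI DIGIT ZERO; likewise


def digit_table(zero: int) -> dict:
    """str.translate table mapping ASCII 0-9 onto the digits starting at `zero`."""
    return str.maketrans({str(d): chr(zero + d) for d in range(10)})


TO_BENGALI = digit_table(BENGALI_ZERO)
TO_DEVANAGARI = digit_table(DEVANAGARI_ZERO)
```

Usage: `"1816".translate(TO_BENGALI)` gives ১৮১৬; letters and punctuation pass through unchanged.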

New addition

@Alex brollo: - The letter 'x' is missing from the table and needs to be added: in Devanagari it is क्स and in Bengali it is ক্স. -- বোধিসত্ত্ব (talk) 11:41, 30 October 2016 (UTC)

@Alex brollo: I also forgot to mention that the Roman full stop (.) becomes । (danda) in both Bengali and Devanagari. -- বোধিসত্ত্ব (talk) 12:59, 30 October 2016 (UTC)
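Both additions can be sketched as small steps around the main conversion: x is respelled as ks in the Roman text, so the ordinary k and s rules yield ক্স, and full stops are turned into the danda afterwards. Only a full stop followed by whitespace or the end of the text is converted, so a decimal point such as 3.5 survives; that restriction, like the function names, is an assumption of this sketch.

```python
import re

DANDA = "\u0964"  # ।, used by both Devanagari and Bengali


def preprocess_roman(text: str) -> str:
    """Spell x as ks before transliteration, so it comes out as ক্স (ক + ্ + স)."""
    return text.replace("x", "ks").replace("X", "Ks")


def full_stops_to_danda(text: str) -> str:
    """Turn a sentence-ending full stop (before whitespace or end of text) into ।."""
    return re.sub(r"\.(?=\s|$)", DANDA, text)
```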

The former is not a big issue (we have to add the digits too); the latter is terrible, since it conflicts with template markup! Another toxic character is the = sign, as you know.... --Alex brollo (talk) 18:12, 30 October 2016 (UTC)

@Alex brollo: The Bengali full stop (।, U+0964) and the pipe (|) are different characters, so I think it won't be a problem for templates. -- বোধিসত্ত্ব (talk) 20:29, 30 October 2016 (UTC)

@Bodhisattwa: Thanks! I just found that out from a good blog post. So the only remaining issue is a well-known one: the = character inside template parameters. --Alex brollo (talk) 12:09, 1 November 2016 (UTC)

@Alex brollo:, = is the same in both scripts. বোধিসত্ত্ব (talk) 12:12, 1 November 2016 (UTC)
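On the = problem: MediaWiki treats the text before the first = in a template parameter as the parameter's name, so a positional value such as a=b is read as a parameter called a. Writing the positional names explicitly (1=, 2=) avoids this, because everything after the first = then becomes the value; note that named values have surrounding whitespace trimmed. A minimal sketch of a hypothetical helper that builds the trans call this way; a pipe would still need escaping (for example with the {{!}} magic word), so the helper refuses such input instead of guessing.

```python
def trans_call(roman: str, bengali: str) -> str:
    """Build {{trans|...|...}} wikitext that survives "=" in either value."""
    for value in (roman, bengali):
        if "|" in value or "}}" in value:
            raise ValueError("pipes and '}}' still need escaping, e.g. {{!}}")
    # Explicit names: MediaWiki splits each parameter only at its first "=".
    return "{{trans|1=" + roman + "|2=" + bengali + "}}"
```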

Second try runs better

OK, I think the main transliteration rules have now been implemented. In view mode, as you asked, there is a sidebar link that only appears if the page contains an HTML element with class="tr". The click event on the text has been removed. --Alex brollo (talk) 22:08, 1 November 2016 (UTC)
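In the browser, the condition for showing the sidebar link is a single DOM query (for example `document.querySelector(".tr") !== null`). The same test can be sketched on a page's saved HTML with Python's standard `html.parser`; it matches whole class tokens, so class="a tr b" counts while class="trans" does not. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TrFinder(HTMLParser):
    """Records whether any element carries the CSS class "tr"."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Compare whole class tokens, not substrings.
            if name == "class" and value and "tr" in value.split():
                self.found = True


def show_transliteration_link(html: str) -> bool:
    finder = TrFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```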

@Alex brollo:, this is just awesome. The major issues have been solved; only a few disambiguations and fine refinements are needed now. I am listing them below, one by one. -- বোধিসত্ত্ব (talk) 05:26, 2 November 2016 (UTC)

No.	rule	formula	example
1	if k is conjoined with sh, then sh = ষ্‌ (not শ্‌)	ksh = ক্‌ + ষ্‌ = ক্ষ্	ksha = ক্‌ + ষ্‌ + অ = ক্ষ, kshi = ক্‌ + ষ্‌ + ই = ক্ষি, etc.
2	if j is conjoined with n, then n = ঞ্‌ (not ঙ্, ণ্‌ or ন্)	jn = জ্‌ + ঞ্‌ = জ্ঞ্	jna = জ্‌ + ঞ্‌ + অ = জ্ঞ, jní = জ্‌ + ঞ্‌ + ঈ = জ্ঞী, etc.
3	if sh is conjoined with n, then sh = ষ্‌ (not শ্‌) and n = ণ্‌ (not ঙ্, ঞ্‌ or ন্)	shn = ষ্‌ + ণ্‌ = ষ্ণ্‌	shna = ষ্‌ + ণ্‌ + অ = ষ্ণ
4	if another consonant is conjoined with y, then y = য্‌ (not য়্)	k + y = ক্‌ + য্‌ = ক্য্‌	k + y + a = ক্‌ + য্‌ + অ = ক্য, kh + y + i = খ্‌ + য্‌ + ই = খ্যি
5	if ri follows a vowel, then ri = r + i = র্‌ + ই = রি (not ঋ)	a + ri = অ + রি = অরি	k + a + ri = ক্‌ + অ + র্‌ + ই = করি
6	if n is conjoined with g, then n = ঙ্‌ (not ন্‌)	ng = ঙ্‌ + গ্‌ = ঙ্গ্‌	ng + a = ঙ্গ্‌ + অ = ঙ্গ
7	if ṇ is conjoined with g, then ṇg = ং	ṇg = ং	r + a + ṇg = র্‌ + অ + ং = রং

Do you think the listed changes can be applied as substitutions on the Bengali transcription, or do they need a refinement of some step in the main "transliteration engine"?

Just to let you know, our work has been very inspiring and has shed light on some Italian script issues; it will turn out useful! For example, old Italian and Latin texts need some transliteration and some conjuncts (ligatures); we simply ignored them as mysterious and esoteric problems... ;-) Some examples: ct -> c‍t; st -> s‍t; fi -> f‍i; OE -> O‍E. Alex brollo (talk) 08:00, 2 November 2016 (UTC)

@Alex brollo:, as these are special rules, I think adding a little more logic to the transliteration engine to cover the above changes will refine the output.

Wow! It's great to know that your endeavor here has helped to solve Italian Wikisource issues. -- বোধিসত্ত্ব (talk) 12:56, 2 November 2016 (UTC)
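Putting the logic into the engine, as suggested, can be sketched as a step that receives a run of Roman consonants (already split into tokens such as k, sh, n) and picks the Bengali letters from each adjacent pair, which covers rules 1 to 4 and 6. Rules 5 and 7 depend on the surrounding vowels and are not handled here. The token names, the small `BASE` table and `render_cluster` are hypothetical; the hasanta is a plain U+09CD without the zero-width non-joiner shown in the table's ক্‌-style entries, so fonts render the conjunct.

```python
HASANTA = "\u09CD"

# Default letter for each Roman consonant token (a small, hypothetical subset).
BASE = {
    "k": "ক", "g": "গ", "j": "জ", "n": "ন", "ṇ": "ণ", "sh": "শ",
    "s": "স", "y": "য়", "r": "র", "t": "ত", "p": "প", "m": "ম",
}

# Rules 1, 2, 3 and 6: a specific pair changes the letters used for both members.
PAIR_RULES = {
    ("k", "sh"): ("ক", "ষ"),  # rule 1: ksh -> ক্ষ
    ("j", "n"): ("জ", "ঞ"),   # rule 2: jn -> জ্ঞ
    ("sh", "n"): ("ষ", "ণ"),  # rule 3: shn -> ষ্ণ
    ("n", "g"): ("ঙ", "গ"),   # rule 6: ng -> ঙ্গ
}


def render_cluster(consonants: list) -> str:
    """Turn a run of Roman consonant tokens into one Bengali conjunct."""
    letters = [BASE[c] for c in consonants]
    for i in range(len(consonants) - 1):
        pair = (consonants[i], consonants[i + 1])
        if pair in PAIR_RULES:
            letters[i], letters[i + 1] = PAIR_RULES[pair]
    # Rule 4: y after another consonant is য (not য়).
    for i in range(1, len(consonants)):
        if consonants[i] == "y":
            letters[i] = "য"
    return HASANTA.join(letters)
```

Keeping the pair rules in a table rather than in `if` branches makes it easy to add the further groups of rules as they are collected.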

@Bodhisattwa: OK; I'll try some changes to the data flow in the script; perhaps they will make it easier to manage exceptions to the usual rules and to keep track of the transcription steps. Alex brollo (talk) 23:10, 2 November 2016 (UTC)

@Bodhisattwa: I have some work to do for friends on it.source, please be patient. In the meantime I'm thinking about the neatest way to implement exceptions. I presume the rules would best be implemented after step 2 and before step 3, but some could also be implemented in step 1. I'll try it in t1.js, the "test script". --Alex brollo (talk) 23:29, 4 November 2016 (UTC)

Sure @Alex brollo:, no problem. Please take your time. -- বোধিসত্ত্ব (talk) 06:02, 6 November 2016 (UTC)

@Bodhisattwa:

Done, please take a good look at the transliterated pages of Mark Likhita. For some reason MediaWiki:Epub.css is not exported into the ePub yet; I think this is due to some exotic cache management. --Alex brollo (talk) 23:32, 11 November 2016 (UTC)

@Alex brollo:, now we have some complex rules involving ষ and ণ. I am tabulating the simpler rules below; the more complex ones can be corrected manually. বোধিসত্ত্ব (talk) 14:02, 12 November 2016 (UTC)

No.	rule	formula	example
8	if sh is conjoined with k, then sh = ষ্‌ (not শ্‌)	shk = ষ্‌ + ক্‌ = ষ্ক্‌	shka = ষ্‌ + ক্‌ + অ = ষ্ক
9	if sh is conjoined with p, then sh = ষ্‌ (not শ্‌)	shp = ষ্‌ + প্‌ = ষ্প্‌	shpa = ষ্‌ + প্‌ + অ = ষ্প
10	if sh is conjoined with ph, then sh = ষ্‌ (not শ্‌)	shph = ষ্‌ + ফ্‌ = ষ্ফ্‌	shpha = ষ্‌ + ফ্‌ + অ = ষ্ফ
11	if sh is conjoined with ṭ, then sh = ষ্‌ (not শ্‌)	shṭ = ষ্‌ + ট্‌ = ষ্ট্‌	shṭa = ষ্‌ + ট্‌ + অ = ষ্ট
12	if sh is conjoined with ṭh, then sh = ষ্‌ (not শ্‌)	shṭh = ষ্‌ + ঠ্‌ = ষ্ঠ্‌	shṭha = ষ্‌ + ঠ্‌ + অ = ষ্ঠ
13	if sh is conjoined with t, then sh = ষ্‌ (not শ্‌) and t = ট্‌ (not ত্‌) [Mark Likhita special rule]	sht = ষ্‌ + ট্‌ = ষ্ট্‌	shta = ষ্‌ + ট্‌ + অ = ষ্ট
14	if sh is conjoined with th, then sh = ষ্‌ (not শ্‌) and th = ঠ্‌ (not থ্‌) [Mark Likhita special rule]	shth = ষ্‌ + ঠ্‌ = ষ্ঠ্‌	shtha = ষ্‌ + ঠ্‌ + অ = ষ্ঠ
15	if n is conjoined with ṭ, then n = ণ্‌ (not ন্‌)	nṭ = ণ্‌ + ট্‌ = ণ্ট্‌	nṭa = ণ্‌ + ট্‌ + অ = ণ্ট
16	if n is conjoined with ṭh, then n = ণ্‌ (not ন্‌)	nṭh = ণ্‌ + ঠ্‌ = ণ্ঠ্‌	nṭha = ণ্‌ + ঠ্‌ + অ = ণ্ঠ
17	if n is conjoined with ḍ, then n = ণ্‌ (not ন্‌)	nḍ = ণ্‌ + ড্‌ = ণ্ড্‌	nḍa = ণ্‌ + ড্‌ + অ = ণ্ড
18	if n is conjoined with ḍh, then n = ণ্‌ (not ন্‌)	nḍh = ণ্‌ + ঢ্‌ = ণ্ঢ্‌	nḍha = ণ্‌ + ঢ্‌ + অ = ণ্ঢ
19	if n is conjoined with t, then n = ন্‌ (not ণ্‌)	nt = ন্‌ + ত্‌ = ন্ত্‌	nta = ন্‌ + ত্‌ + অ = ন্ত
20	if n is conjoined with th, then n = ন্‌ (not ণ্‌)	nth = ন্‌ + থ্‌ = ন্থ্‌	ntha = ন্‌ + থ্‌ + অ = ন্থ
21	if n is conjoined with d, then n = ন্‌ (not ণ্‌)	nd = ন্‌ + দ্‌ = ন্দ্‌	nda = ন্‌ + দ্‌ + অ = ন্দ
22	if n is conjoined with dh, then n = ন্‌ (not ণ্‌)	ndh = ন্‌ + ধ্‌ = ন্ধ্‌	ndha = ন্‌ + ধ্‌ + অ = ন্ধ
23	if sh follows ঋ, then sh = ষ্‌ (not শ্‌)	rish = ঋ + ষ্‌ = ঋষ্‌	rishi = ঋ + ষ্‌ + ই = ঋষি
24	if n follows ঋ, then n = ণ্‌ (not ন্‌)	rin = ঋ + ণ্‌ = ঋণ্‌	rina = ঋ + ণ্‌ + অ = ঋণ
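Rules 8 to 24 fall into a few groups, so they can also be sketched as regular-expression substitutions on the Bengali output, one pattern per group. A minimal sketch, assuming the engine first writes the default letters (শ for sh, ন for n) joined with a plain hasanta (U+09CD); rules 19 to 22 keep the default ন and therefore need no pattern, and rules 13 and 14, being Mark Likhita special rules, could be switched off for other texts. Names are hypothetical.

```python
import re

HASANTA = "\u09CD"

RULES = [
    # Rules 8-12: শ before ক, প, ফ, ট, ঠ becomes ষ.
    (re.compile("শ" + HASANTA + "([কপফটঠ])"), "ষ" + HASANTA + r"\1"),
    # Rules 13-14 (Mark Likhita special): sht -> ষ্ট, shth -> ষ্ঠ.
    (re.compile("শ" + HASANTA + "ত"), "ষ" + HASANTA + "ট"),
    (re.compile("শ" + HASANTA + "থ"), "ষ" + HASANTA + "ঠ"),
    # Rules 15-18: ন before ট, ঠ, ড, ঢ becomes ণ.
    (re.compile("ন" + HASANTA + "([টঠডঢ])"), "ণ" + HASANTA + r"\1"),
    # Rules 23-24: right after ঋ, শ becomes ষ and ন becomes ণ.
    (re.compile("ঋশ"), "ঋষ"),
    (re.compile("ঋন"), "ঋণ"),
]


def apply_rules(bengali: str) -> str:
    """Apply the grouped ষ/ণ corrections in order."""
    for pattern, replacement in RULES:
        bengali = pattern.sub(replacement, bengali)
    return bengali
```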

OK: I see that these are groups of rules, so the job is easier. In the meantime, I guess a gadget to add, change or remove diacritics on any character with one click would be useful. I have just installed it (as a gadget) on la.source. It is perfectly useless for Bengali, but it can be very useful for fixing the Mark Likhita Roman text. Would you like to try it? --Alex brollo (talk) 18:10, 13 November 2016 (UTC)
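The core of a one-click diacritic tool is a small Unicode operation: decompose the character (NFD), drop its combining marks, add the new mark, and recompose (NFC) so that a precomposed letter such as á or ṭ is produced where one exists. A minimal sketch of that operation in Python; the gadget itself runs in the browser and its code is not shown here, so the function name and the two marks chosen (acute for the long vowels, dot below for retroflex consonants, as in the Mark Likhita column) are assumptions.

```python
import unicodedata
from typing import Optional

ACUTE = "\u0301"      # á, í, ú
DOT_BELOW = "\u0323"  # ṭ, ḍ, ṇ, ṛ


def set_diacritic(char: str, mark: Optional[str]) -> str:
    """Replace any diacritics on `char` with `mark`, or strip them if mark is None."""
    base = "".join(c for c in unicodedata.normalize("NFD", char)
                   if not unicodedata.combining(c))
    if mark is not None:
        base += mark
    # Recompose into a single precomposed character where Unicode has one.
    return unicodedata.normalize("NFC", base)
```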

@Alex brollo:, thanks a lot for the link. I have added the gadget here. -- বোধিসত্ত্ব (talk) 14:22, 14 November 2016 (UTC)
