Non-English Letters in Nicks
 


Started by WAJIM, 06 January, 2010, 14:39:43


WAJIM

There is a problem with using non-English letters in user nicks, Russian (CP-1251) in my case.  :-\

The hub doesn't match lowercase and uppercase letters. Example: if user1 is on the hub with nick "Вася" and then user2 connects with nick "ВАСЯ"/"вася"/"вАСЯ"/"ВасЯ", the hub's nick check evaluates something like lcase("Вася") != lcase("ВАСЯ"), which is incorrect! By contrast, lcase("Vasya") == lcase("VASYA"), which is correct. As a result the hub confuses MyInfo/IP between such users and doesn't kick the registered user when a new user with a similar nick joins the hub.

ANSI (CP-1251) letter ranges: lowercase 0xE0-0xFF (а-я) and 0xB8 (ё); uppercase 0xC0-0xDF (А-Я) and 0xA8 (Ё).
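
The mismatch can be reproduced outside the hub. Below is a minimal sketch in C, assuming the process is still in the startup "C" locale (which is the case when setlocale() is never called or has no effect). The helper nicks_equal_ci and the two sample arrays are hypothetical names, not PtokaX code, and the nicks are written as raw CP-1251 bytes.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison built on the standard tolower().
 * Returns 1 when the two byte strings match after lowercasing, else 0.
 * (nicks_equal_ci is a hypothetical helper, not a PtokaX function.) */
int nicks_equal_ci(const char *a, const char *b)
{
    size_t i;
    for (i = 0;; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (tolower(ca) != tolower(cb))
            return 0;
        if (ca == '\0')
            return 1;
    }
}

/* "Вася" and "ВАСЯ" as raw CP-1251 bytes, so the source file encoding does not matter. */
const char nick_mixed[] = "\xC2\xE0\xF1\xFF";
const char nick_upper[] = "\xC2\xC0\xD1\xDF";
```

In the "C" locale tolower() folds only A-Z, so nicks_equal_ci("Vasya", "VASYA") returns 1 while nicks_equal_ci(nick_mixed, nick_upper) returns 0.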

PPK

PtokaX uses the tolower function when hashing a nick, and stricmp when comparing two nicks. That means nicks are hashed in lowercase and compared case-insensitively ::)
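
To illustrate why the hashing step matters, here is a rough sketch in C of hashing a nick through tolower(). The FNV-1a hash and the name hash_nick_lower are assumptions chosen for illustration only; the hash PtokaX really uses is not stated in this thread.

```c
#include <ctype.h>
#include <stdint.h>

/* Hash a nick after lowercasing each byte with tolower().
 * FNV-1a is used purely for illustration; it is an assumption,
 * not PtokaX's actual hash function. */
uint32_t hash_nick_lower(const char *nick)
{
    uint32_t h = 2166136261u;               /* FNV-1a offset basis */
    for (; *nick != '\0'; nick++) {
        h ^= (uint32_t)tolower((unsigned char)*nick);
        h *= 16777619u;                     /* FNV-1a prime */
    }
    return h;
}
```

If tolower() leaves a byte unchanged, the upper and lower forms of that letter feed different values into the hash, so two nicks that differ only by case will generally land in different buckets and never reach the case-insensitive comparison.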
"Most of you are familiar with the virtues of a programmer. There are three, of course: laziness, impatience, and hubris." - Larry Wall

WAJIM

#2
Quote from: PPK on 06 January, 2010, 22:39:17
PtokaX uses the tolower function when hashing a nick, and stricmp when comparing two nicks. That means nicks are hashed in lowercase and compared case-insensitively ::)
In my case tolower works incorrectly for Russian letters.

It is necessary to use a locale-dependent tolower() function; look at the 2nd parameter...  :-\

PS: In Lua, os.setlocale("rus") makes string.lower() work fine for Russian letters.  ::)

PPK

Then the system where the hub is running has a locale other than Russian...

WAJIM

Is it possible to make the locale switchable in the hub's options?  ::)

PPK

It is possible. But I don't want to go that way, because then we would need switchable locales in clients too (I know that some have that already). The future is in Unicode ::)

WAJIM

#6
Quote from: PPK on 07 January, 2010, 00:08:34
It is possible. But I don't want to go that way, because then we would need switchable locales in clients too (I know that some have that already).
It is not necessary; clients work fine with CP1251. The problem is in the hub's tolower() function.

Can you add a conversion table to the options, like "AaBbCcDdEeFfGg...", as an optional tolower() replacement? Each admin could specify the letter conversion rules.  ::)
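
The pair-string idea could look roughly like this in C. It is only a sketch: build_lower_table and tolower_tab are hypothetical names, and the string format (each uppercase byte immediately followed by its lowercase form) is an assumption based on the "AaBbCc..." example above.

```c
#include <stddef.h>

/* Fill a 256-entry table from a pair string such as "AaBbCc...":
 * each uppercase byte is followed by its lowercase form.
 * Bytes not mentioned in the string map to themselves.
 * (build_lower_table and tolower_tab are hypothetical names.) */
void build_lower_table(unsigned char table[256], const char *pairs)
{
    size_t i;
    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
    /* Consume the string two bytes at a time; a trailing odd byte is ignored. */
    for (i = 0; pairs[i] != '\0' && pairs[i + 1] != '\0'; i += 2)
        table[(unsigned char)pairs[i]] = (unsigned char)pairs[i + 1];
}

/* Table-driven replacement for tolower(); independent of the C locale. */
unsigned char tolower_tab(const unsigned char table[256], unsigned char c)
{
    return table[c];
}
```

The table would be built once when the option is loaded, after which each lookup is a single array access, so an admin-supplied string for any single-byte codepage costs nothing extra per nick.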

PPK

Quote from: WAJIM on 07 January, 2010, 13:09:14
The problem is in the hub's tolower() function.
The hub uses the same tolower function as Lua does in its string.lower(). As you said, when you set the locale in Lua it works correctly. It works correctly for me when the system locale is set correctly (the same thing is why clients work correctly) ::)

PPK

And maybe there really was a bug here... one magic line was missing. Then 0.4.1.2 should fix it  :o

WAJIM

#9
Quote from: PPK on 09 January, 2010, 18:35:02
As you said, when you set the locale in Lua it works correctly.
Yes, but only after adding these lines to every script:
function OnStartup()
    os.setlocale("rus")
end

Quote
And maybe there really was a bug here... one magic line was missing.
:o

WAJIM

#10
I have just checked 0.4.1.2...
It seems that the locale is not the one for my country...  :'( Some letters are not lowercased correctly...
I use this function for additional nick checking.  :-\
function OnStartup()
    -- Use the Russian locale so string.lower() handles CP1251 Cyrillic letters
    os.setlocale("rus")
end

function ValidateNickArrival(user, data)
    local _, _, nick = string.find(data, "^$ValidateNick (.+)|$")
    if not nick then return end -- malformed command, leave it to the hub
    local lnick = string.lower(nick)
    -- Only nicks containing Cyrillic letters need the extra check
    if not string.find(lnick, "[абвгдеёжзийклмнопрстуфхцчшщъыьэюя]") then return end
    -- Reject a nick that differs from a registered nick only by letter case
    for _, reg in ipairs(RegMan.GetRegs()) do
        if string.lower(reg.sNick) == lnick and reg.sNick ~= nick then
            Core.SendToUser(user, "*** Your nick ("..nick..") does not exactly match the registered nick.|"..
                                  "*** If you registered this nick, change it to: "..reg.sNick.."|"..
                                  "*** If you did not register this nick, change it to a different one.")
            Core.Disconnect(user)
            return true
        end
    end
    -- Reject a nick that differs from an online user's nick only by letter case
    for _, usr in ipairs(Core.GetOnlineUsers()) do
        if string.lower(usr.sNick) == lnick and usr.sNick ~= nick then
            Core.SendToUser(user, "$ValidateDenide "..nick)
            Core.Disconnect(user)
            return true
        end
    end
end

WAJIM

#11
Quote from: Mutor on 10 January, 2010, 15:52:44
If you would, please insert your nick in the script, run it and post the results.
LC_MONETARY=Russian_Russia.866
LC_TIME=Russian_Russia.866
LC_NUMERIC=Russian_Russia.866
LC_COLLATE=Russian_Russia.866
LC_CTYPE=Russian_Russia.866

But that is wrong, because all clients use the 1251 (ANSI) codepage instead of 866 (OEM).  :-\

WAJIM

Quote from: Mutor on 10 January, 2010, 16:30:58
Then you should check/adjust the settings for your System.
Control Panel -> Regional and Language Options
Everything is OK there; 1251 and 866 are both checked and fixed.

866 - OEM Russian codepage for DOS applications
1251 - ANSI Russian codepage for Windows applications

I am surprised that PtokaX uses the OEM codepage.  :o

Mutor, in your case codepage 437 is wrong too. Your codepage should be something like 1252 for English Windows.

PPK

Quote from: WAJIM on 10 January, 2010, 16:46:21
I am surprised that PtokaX uses the OEM codepage.  :o
Because that is what the system gives it as the default system locale :(
This is in the Unix documentation:
Quote
Internationalised programs must call setlocale() to initiate a specific language operation. This can be done by calling setlocale() as follows:
setlocale(LC_ALL, "");
and in MSDN:
Quote
setlocale( LC_ALL, "" );
Sets the locale to the default, which is the user-default ANSI code page obtained from the operating system.
That is what was added in 0.4.1.2 and was missing in previous versions :-\
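
For reference, a minimal sketch in C of that "one magic line" together with a check of what the runtime actually selected. The function name init_default_locale is hypothetical; setlocale(), LC_ALL and LC_CTYPE are standard C.

```c
#include <locale.h>
#include <stddef.h>

/* Switch from the startup "C" locale to the environment's default locale,
 * then report the name now in effect for character classification.
 * (init_default_locale is a hypothetical helper.) */
const char *init_default_locale(void)
{
    /* "" selects the user/system default. setlocale() returns NULL if the
     * request cannot be honoured, in which case the previous locale stays active. */
    setlocale(LC_ALL, "");
    /* Passing NULL only queries the current setting. */
    return setlocale(LC_CTYPE, NULL);
}
```

Logging the returned name at startup shows directly which codepage tolower() will use, which is the same information the LC_CTYPE line in the script output gives.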
"Most of you are familiar with the virtues of a programmer. There are three, of course: laziness, impatience, and hubris." - Larry Wall

PPK

Hmm, I got crazy results with Mutor's script  ::)
Windows 32-bit version:
Quote
LC_MONETARY=Czech_Czech Republic.852
LC_TIME=Czech_Czech Republic.852
LC_NUMERIC=Czech_Czech Republic.852
LC_COLLATE=Czech_Czech Republic.852
LC_CTYPE=Czech_Czech Republic.852
Windows 64-bit version:
Quote
Czech_Czech Republic.1250
Looks like Borland (used to compile the 32-bit version) has buggy locales :'(

WAJIM

Quote from: Mutor on 10 January, 2010, 16:55:34
437 is fine; I have all the conversion tables loaded that I need.
That is because you're using only Latin letters, which have the same codes in the 437/866/1251/1252 codepages.  ::)

WAJIM

Quote from: PPK on 10 January, 2010, 16:51:36
Because that is what the system gives it as the default system locale :(
This is in the Unix documentation: [...] and in MSDN: [...] That is what was added in 0.4.1.2 and was missing in previous versions :-\
http://msdn.microsoft.com/en-us/library/x99tb11d%28VS.71%29.aspx
PPK, try using:
setlocale(LC_ALL, ".ACP");

PPK

Quote from: WAJIM on 10 January, 2010, 17:07:09
PPK, try using:
setlocale(LC_ALL, ".ACP");

That doesn't help. I checked the clients and they use the same thing that I added to PtokaX. It works in them, and it works in 64-bit PtokaX on Windows because the MS compiler handles locales correctly ::)
Quote from: Mutor on 10 January, 2010, 17:09:10
PPK: I don't think it's buggy, I think it simply has to do with what the system returns.
Of course it is buggy. Same Windows, different results from code generated by different compilers.

WAJIM

Quote from: PPK on 10 January, 2010, 17:14:59
That doesn't help. I checked the clients and they use the same thing that I added to PtokaX. It works in them, and it works in 64-bit PtokaX on Windows because the MS compiler handles locales correctly ::)
What if you specify the codepage manually, like:
setlocale(LC_ALL, ".1250");

Does it work in win32?

PPK

Quote from: WAJIM on 10 January, 2010, 17:22:23
What if you specify the codepage manually, like:
setlocale(LC_ALL, ".1250");

Does it work in win32?
No, I tested "", ".ACP" and ".1250", all with the same result :'(
Quote
LC_MONETARY=Czech_Czech Republic.852
LC_TIME=Czech_Czech Republic.852
LC_NUMERIC=Czech_Czech Republic.852
LC_COLLATE=Czech_Czech Republic.852
LC_CTYPE=Czech_Czech Republic.852

WAJIM

#20
Quote from: PPK on 10 January, 2010, 17:26:08
No, I tested "", ".ACP" and ".1250", all with the same result :'(
I found this..
Quote
Borland C++ currently supports only the "C" locale, so calling this function has no effect.
:'(

PPK

Oh, that's nice.. I wanted to move from Borland to the MS compiler anyway. That will fix this and allow a 64-bit GUI version for Windows and Unicode support. The problem is that I need to rewrite the GUI.. again :-X

WAJIM

PPK, the problem is only in the tolower function; everything else works fine for me.

Is it possible to replace all tolower calls with a self-made tolower_loc function that converts codes through an optional user-defined lookup table?

It's only a few lines of code...  ::)

Enuri

To PPK:

Tolower conversion rules for Russian characters:

1) Byte 168 -> 184 (Ё -> ё)
2) Bytes 192-223 -> +32. (А-Я -> а-я)
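
The two rules above translate almost directly into C. This is a sketch under the assumption that nicks are single-byte CP1251 strings; tolower_cp1251 and nick_cmp_cp1251 are hypothetical names, and ASCII A-Z is folded as well so the function can stand in for plain tolower().

```c
/* Lowercase one CP1251 byte without consulting the C locale.
 * Rules as listed above: 168 -> 184 (Ё -> ё), 192..223 -> +32 (А-Я -> а-я).
 * ASCII A-Z is handled too, matching plain tolower() in the "C" locale.
 * (tolower_cp1251 and nick_cmp_cp1251 are hypothetical names.) */
unsigned char tolower_cp1251(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return (unsigned char)(c + 32);
    if (c == 168)
        return 184;
    if (c >= 192 && c <= 223)
        return (unsigned char)(c + 32);
    return c;
}

/* stricmp-style comparison: 0 when equal ignoring case, otherwise the
 * difference of the first pair of lowercased bytes that do not match. */
int nick_cmp_cp1251(const char *a, const char *b)
{
    for (;; a++, b++) {
        unsigned char ca = tolower_cp1251((unsigned char)*a);
        unsigned char cb = tolower_cp1251((unsigned char)*b);
        if (ca != cb)
            return (int)ca - (int)cb;
        if (ca == '\0')
            return 0;
    }
}
```

With these rules nick_cmp_cp1251 treats the CP1251 byte strings for "Вася" and "ВАСЯ" as equal. The mapping is hard-coded for one codepage, so it would give wrong results for hubs whose clients use a different single-byte encoding.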

WAJIM

Quote from: Enuri on 10 January, 2010, 18:49:48
To PPK: Tolower conversion rules for Russian characters:
Besides Russians there are other peoples in the world... The conversion needs to be made universal..  ::)
