Author Topic: Non-English Letters in Nicks  (Read 11016 times)

Offline WAJIM

  • Member
  • ***
  • Posts: 41
  • Karma: +3/-5
Non-English Letters in Nicks
« on: 06 January, 2010, 14:39:43 »
There is a problem when using non-English letters in user nicks, Russian (CP-1251) in my case.  :-\

The hub doesn't match lowercase and uppercase letters. Example: if user1 is on the hub with the nick "Вася" and then user2 connects with the nick "ВАСЯ"/"вася"/"вАСЯ"/"ВасЯ", the nick check in the hub's code gives something like lcase("Вася") != lcase("ВАСЯ"), which is incorrect! By contrast, lcase("Vasya") == lcase("VASYA"), which is correct. As a result, the hub confuses MyInfo/IP between such users and doesn't kick the registered user when a new user with a similar nick comes to the hub.

ANSI (CP-1251) letter ranges: 0xE0-0xFF (а-я) and 0xB8 (ё) are lowercase; 0xC0-0xDF (А-Я) and 0xA8 (Ё) are uppercase.
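
The mismatch can be reproduced with a small C sketch. It assumes nicks are CP-1251 byte strings and that the process is still in the default "C" locale; nick_equal_ci and the other names are hypothetical and are not taken from PtokaX's source.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper (not PtokaX code): case-insensitive comparison that
   relies on tolower() from the currently active C locale. */
static int nick_equal_ci(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return 0;   /* bytes differ even after lowercasing */
        if (ca == 0)
            return 1;   /* both strings ended at the same position */
    }
}

/* CP-1251 bytes of the two nicks from the example. */
static const char NICK_MIXED[] = "\xC2\xE0\xF1\xFF"; /* "Вася" */
static const char NICK_UPPER[] = "\xC2\xC0\xD1\xDF"; /* "ВАСЯ" */

/* Returns 1 if the two Cyrillic nicks compare equal under the "C" locale. */
static int cyrillic_nicks_match_in_c_locale(void)
{
    setlocale(LC_ALL, "C");
    return nick_equal_ci(NICK_MIXED, NICK_UPPER);
}
```

In the "C" locale tolower() only maps A-Z, so the Latin pair "Vasya"/"VASYA" compares equal while the Cyrillic pair does not.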
« Last Edit: 06 January, 2010, 14:42:09 by WAJIM »

PtokaX forum

Offline PPK

  • Administrator
  • Emperor
  • *****
  • Posts: 1 475
  • Karma: +209/-22
  • PtokaX developer
Re: Non-English Letters in Nicks
« Reply #1 on: 06 January, 2010, 22:39:17 »
PtokaX uses the tolower function when hashing a nick, and stricmp when comparing two nicks. That means nicks are hashed in lowercase and compared case-insensitively ::)
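
To illustrate the hashing half of that statement, here is a sketch of a hash computed over lowercased bytes. The FNV-1a hash and the name nick_hash are assumptions chosen for the example; PtokaX's real hash function may be different.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical stand-in for the hub's nick hash: 32-bit FNV-1a over the
   lowercased bytes of the nick. */
static uint32_t nick_hash(const char *nick)
{
    uint32_t h = 2166136261u;               /* FNV offset basis */
    for (; *nick != '\0'; nick++) {
        /* tolower() follows the active locale, so bytes it does not
           recognise as uppercase letters are hashed unchanged */
        h ^= (uint32_t)tolower((unsigned char)*nick);
        h *= 16777619u;                     /* FNV prime */
    }
    return h;
}
```

Two nicks land on the same hash only if tolower() maps them to the same bytes, which is why the result depends on the locale in effect.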
"Most of you are familiar with the virtues of a programmer. There are three, of course: laziness, impatience, and hubris." - Larry Wall

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #2 on: 06 January, 2010, 22:51:20 »
Quote
PtokaX uses the tolower function when hashing a nick, and stricmp when comparing two nicks. That means nicks are hashed in lowercase and compared case-insensitively ::)

In my case tolower works incorrectly for Russian letters.

It is necessary to use a locale-dependent tolower() function; look at the 2nd parameter...  :-\

PS: In Lua, os.setlocale("rus") makes string.lower() work fine for Russian letters.  ::)
« Last Edit: 06 January, 2010, 22:58:43 by WAJIM »

Offline PPK

Re: Non-English Letters in Nicks
« Reply #3 on: 06 January, 2010, 23:25:36 »
Then the system where the hub is running has a locale other than Russian...

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #4 on: 07 January, 2010, 00:02:18 »
Is it possible to make the locale switchable in the hub's options?  ::)

Offline PPK

Re: Non-English Letters in Nicks
« Reply #5 on: 07 January, 2010, 00:08:34 »
It is possible. But I don't want to go that way, because then we would need switchable locales in clients too (I know that some have that already). The future is in Unicode ::)

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #6 on: 07 January, 2010, 13:09:14 »
Quote
It is possible. But I don't want to go that way, because then we would need switchable locales in clients too (I know that some have that already).

That is not necessary; clients work fine with CP1251. The problem is in the hub's tolower() function.

Can you add a conversion table to the options, like "AaBbCcDdEeFfGg...", as an optional tolower() replacement? Each admin could then specify the letter conversion rules.  ::)
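
A rough sketch of how such an option could work, assuming the setting is a single-byte string of upper/lower pairs as in the "AaBbCc..." example. The function names are invented for illustration and no such option exists in the hub as far as this thread shows.

```c
#include <stddef.h>

/* Hypothetical: build a 256-entry lowercase table from a string of
   "upper, lower" byte pairs such as "AaBbCc...". */
static void build_lower_table(unsigned char table[256], const char *pairs)
{
    size_t i;
    /* Start with the identity mapping so unlisted bytes stay unchanged. */
    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
    /* Each complete pair maps its first byte to its second byte;
       a trailing unpaired byte is ignored. */
    for (i = 0; pairs[i] != '\0' && pairs[i + 1] != '\0'; i += 2)
        table[(unsigned char)pairs[i]] = (unsigned char)pairs[i + 1];
}

/* Lowercase a nick in place through the admin-supplied table. */
static void lower_in_place(const unsigned char table[256], char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)table[(unsigned char)*s];
}
```

The table is built once when the option is loaded, so each lookup afterwards is a single array access and does not depend on the C locale at all.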
« Last Edit: 08 January, 2010, 15:57:49 by WAJIM »

Offline PPK

Re: Non-English Letters in Nicks
« Reply #7 on: 09 January, 2010, 18:35:02 »
Quote
the problem is in the hub's tolower() function.

The hub uses the same tolower function as Lua does in its string.lower(). As you said, when you set the locale in Lua it works correctly. It works correctly for me when the system locale is set correctly (the same thing is why clients work correctly) ::)

Offline PPK

Re: Non-English Letters in Nicks
« Reply #8 on: 09 January, 2010, 19:06:46 »
And maybe there really was a bug here... one magic line was missing. 0.4.1.2 should fix it  :o

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #9 on: 09 January, 2010, 19:38:35 »
Quote
As you said, when you set the locale in Lua it works correctly.

Yes, but only after adding these lines to every script:
Code:
function OnStartup()
    os.setlocale("rus")
end

Quote
And maybe here was really bug... missing one magic line.
:o
« Last Edit: 09 January, 2010, 19:49:15 by WAJIM »

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #10 on: 10 January, 2010, 13:41:45 »
I have just checked 0.4.1.2...
It seems the locale is not the one for my country...  :'( Some letters are not lowercased correctly...
I use this function for additional nick checking.  :-\
Code:
function OnStartup()
    -- Windows locale name; lets string.lower() handle CP1251 Cyrillic letters
    os.setlocale("rus")
end

function ValidateNickArrival(user, data)
    local _, _, nick = string.find(data, "^%$ValidateNick (.+)|$")
    -- Nothing to check if the command did not parse
    if not nick then return end
    local lnick = string.lower(nick)
    -- The extra check is only needed for nicks containing Cyrillic letters
    if not string.find(lnick, "[абвгдеёжзийклмнопрстуфхцчшщъыьэюя]") then return end

    -- Reject a nick that differs only by letter case from a registered nick
    for _, reg in ipairs(RegMan.GetRegs()) do
        if string.lower(reg.sNick) == lnick and reg.sNick ~= nick then
            Core.SendToUser(user, "*** Your nick ("..nick..") does not exactly match the registered nick.|"..
                                  "*** If you registered this nick, change it to: "..reg.sNick.."|"..
                                  "*** If you did not register this nick, change it to a DIFFERENT one.")
            Core.Disconnect(user)
            return true
        end
    end

    -- Reject a nick that differs only by letter case from an online user's nick
    for _, online in ipairs(Core.GetOnlineUsers()) do
        if string.lower(online.sNick) == lnick and online.sNick ~= nick then
            Core.SendToUser(user, "$ValidateDenide "..nick)
            Core.Disconnect(user)
            return true
        end
    end
end
« Last Edit: 10 January, 2010, 13:49:19 by WAJIM »

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #11 on: 10 January, 2010, 16:04:13 »
Quote
If you would, please insert your nick in the script, run it and post results.

Code:
LC_MONETARY=Russian_Russia.866
LC_TIME=Russian_Russia.866
LC_NUMERIC=Russian_Russia.866
LC_COLLATE=Russian_Russia.866
LC_CTYPE=Russian_Russia.866
But that's wrong, because all clients use the 1251 (ANSI) codepage instead of 866 (OEM).  :-\
« Last Edit: 10 January, 2010, 16:05:49 by WAJIM »

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #12 on: 10 January, 2010, 16:46:21 »
Quote
Then you should check/adjust the settings for your System.
Control Panel -> Regional and Language Options

Everything is OK there; 1251 and 866 are checked and fixed.

866 - OEM Russian codepage for DOS applications
1251 - ANSI Russian codepage for Windows applications

I am surprised that PtokaX uses the OEM codepage.  :o

American Idiot, in your case codepage 437 is wrong too. Your codepage should be something like 1252 for English Windows.

Offline PPK

Re: Non-English Letters in Nicks
« Reply #13 on: 10 January, 2010, 16:51:36 »
Quote
I am surprised that PtokaX uses the OEM codepage.  :o

Because that is what the system gives it as the default system locale :(
This is in the Unix documentation:
Quote
Internationalised programs must call setlocale() to initiate a specific language operation. This can be done by calling setlocale() as follows:
setlocale(LC_ALL, "");
and in MSDN:
Quote
setlocale( LC_ALL, "" );
Sets the locale to the default, which is the user-default ANSI code page obtained from the operating system.
That is what was added in 0.4.1.2 and was missing in previous versions :-\
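
A minimal sketch of that startup call, assuming a standard C runtime. The helper name init_locale is hypothetical; the point is that the result can be queried afterwards to see which locale tolower() will really use.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical startup helper: adopt the environment's default locale and
   return the name of the LC_CTYPE locale that is in effect afterwards.
   *ok is set to 1 if the runtime accepted the request, 0 otherwise. */
static const char *init_locale(int *ok)
{
    /* "" asks the runtime for the user/system default instead of "C". */
    *ok = setlocale(LC_ALL, "") != NULL;
    /* A NULL locale argument only queries the current setting. */
    return setlocale(LC_CTYPE, NULL);
}
```

Printing the returned name at startup is one way to notice a codepage such as 866 being selected where 1251 was expected, as in the results posted in this thread.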
"Most of you are familiar with the virtues of a programmer. There are three, of course: laziness, impatience, and hubris." - Larry Wall

Offline PPK

Re: Non-English Letters in Nicks
« Reply #14 on: 10 January, 2010, 16:58:04 »
Hmm, I got crazy results with American Idiot's script  ::)
Windows 32-bit version:
Quote
LC_MONETARY=Czech_Czech Republic.852
LC_TIME=Czech_Czech Republic.852
LC_NUMERIC=Czech_Czech Republic.852
LC_COLLATE=Czech_Czech Republic.852
LC_CTYPE=Czech_Czech Republic.852
Windows 64-bit version:
Quote
Czech_Czech Republic.1250
Looks like Borland (used to compile the 32-bit version) has buggy locales :'(

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #15 on: 10 January, 2010, 17:00:28 »
Quote
437 is fine I have all the conversion tables loaded that I need.

Because you're using only Latin letters, which have the same codes in the 437/866/1251/1252 codepages.  ::)

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #16 on: 10 January, 2010, 17:07:09 »
Quote
Because that is what the system gives it as the default system locale :(
This is in the Unix documentation: [...] and in MSDN: [...] That is what was added in 0.4.1.2 and was missing in previous versions :-\

http://msdn.microsoft.com/en-us/library/x99tb11d%28VS.71%29.aspx
PPK, try using:
Code:
setlocale(LC_ALL, ".ACP");

Offline PPK

Re: Non-English Letters in Nicks
« Reply #17 on: 10 January, 2010, 17:14:59 »
Quote
PPK, try using:
Code:
setlocale(LC_ALL, ".ACP");

That doesn't help. I checked the clients and they use the same thing that I added to PtokaX. It works in them, and it works in 64-bit PtokaX on Windows because the MS compiler handles locales correctly ::)
Quote
PPK: I don't think it's buggy, I think it simply has to do with what the system returns.

Of course it is buggy. Same Windows, different results from code generated by different compilers.

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #18 on: 10 January, 2010, 17:22:23 »
Quote
That doesn't help. I checked the clients and they use the same thing that I added to PtokaX. It works in them, and it works in 64-bit PtokaX on Windows because the MS compiler handles locales correctly ::)

What if the codepage is specified manually, like:
Code:
setlocale(LC_ALL, ".1250");
Does that work in win32?

Offline PPK

Re: Non-English Letters in Nicks
« Reply #19 on: 10 January, 2010, 17:26:08 »
Quote
What if the codepage is specified manually, like:
Code:
setlocale(LC_ALL, ".1250");
Does that work in win32?

No, I tested "", ".ACP" and ".1250", all with the same result :'(
Quote
LC_MONETARY=Czech_Czech Republic.852
LC_TIME=Czech_Czech Republic.852
LC_NUMERIC=Czech_Czech Republic.852
LC_COLLATE=Czech_Czech Republic.852
LC_CTYPE=Czech_Czech Republic.852

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #20 on: 10 January, 2010, 17:38:45 »
Quote
No, I tested "", ".ACP" and ".1250", all with the same result :'(

I found this..
Quote
Borland C++ currently supports only the "C" locale, so calling this function will have no effect.
:'(
« Last Edit: 10 January, 2010, 17:50:54 by WAJIM »

Offline PPK

Re: Non-English Letters in Nicks
« Reply #21 on: 10 January, 2010, 17:55:09 »
Oh, that's nice.. I wanted to move from Borland to the MS compiler anyway. That will fix this and allow a 64-bit GUI version for Windows and Unicode support. The problem is that I need to rewrite the GUI.. again :-X

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #22 on: 10 January, 2010, 18:06:50 »
PPK, the problem is only in the tolower function; everything else works fine for me.

Is it possible to replace all tolower calls with a self-made tolower_loc function that converts codes through an optional user-defined lookup table?

It's only a few lines of code...  ::)

Offline Enuri

  • Newbie
  • *
  • Posts: 9
  • Karma: +1/-0
Re: Non-English Letters in Nicks
« Reply #23 on: 10 January, 2010, 18:49:48 »
2 PPK:

Tolower conversion rules for Russian characters:

1) Byte 168 -> 184 (Ё -> ё)
2) Bytes 192-223 -> +32. (А-Я -> а-я)
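
These two rules translate into a few lines of C. This is only a sketch for CP-1251: the name tolower_cp1251 is made up, the ASCII branch is an added assumption, and other codepages are not covered.

```c
/* Hypothetical CP-1251 lowercase mapping built from the two rules above,
   plus plain ASCII A-Z. All other bytes are returned unchanged. */
static unsigned char tolower_cp1251(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return (unsigned char)(c + ('a' - 'A'));  /* ASCII letters */
    if (c == 168)
        return 184;                               /* rule 1: 168 -> 184 */
    if (c >= 192 && c <= 223)
        return (unsigned char)(c + 32);           /* rule 2: 192-223 -> +32 */
    return c;
}
```

Because the mapping is hard-coded, it gives the same result regardless of the C locale or compiler in use.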

Offline WAJIM

Re: Non-English Letters in Nicks
« Reply #24 on: 10 January, 2010, 18:58:04 »
Quote
2 PPK: Tolower conversion rules for Russian characters:

There are other people in the world besides Russians... The conversion needs to be made universal..  ::)
