PtokaX forum

PtokaX => Bugs => Topic started by: WAJIM on 06 January, 2010, 14:39:43

Title: Non-English Letters in Nicks
Post by: WAJIM on 06 January, 2010, 14:39:43
There is a problem with using non-English letters in user nicks, Russian (CP-1251) in my case.  :-\

The hub doesn't match lowercase and uppercase letters. Example: if user1 is on the hub with the nick "Вася" and then user2 connects with the nick "ВАСЯ"/"вася"/"вАСЯ"/"ВасЯ", the hub's nick check does something like lcase("Вася") != lcase("ВАСЯ"), which is incorrect! Meanwhile lcase("Vasya") == lcase("VASYA"), which is correct. As a result the hub confuses MyInfo/IP between such users and doesn't kick the registered user when a new user with a similar nick comes onto the hub.

Letter ranges in ANSI (CP-1251): 0xE0-0xFF (а-я) and 0xB8 (ё) are lowercase; 0xC0-0xDF (А-Я) and 0xA8 (Ё) are uppercase.
Title: Re: Non-English Letters in Nicks
Post by: PPK on 06 January, 2010, 22:39:17
PtokaX uses the tolower function when hashing a nick, and stricmp when comparing two nicks. That means nicks are hashed in lower case and compared case-insensitively ::)
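A minimal sketch of why this still fails for Russian nicks under the default "C" locale, assuming nicks are single-byte CP-1251 strings. The helper name nick_equal_nocase is hypothetical and is not PtokaX code; it only mimics a stricmp-style comparison that goes through tolower():

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not taken from PtokaX: compares two nicks byte by
   byte through tolower(), the way a stricmp-style comparison does.
   Returns 1 if the nicks are equal ignoring case, 0 otherwise. */
static int nick_equal_nocase(const char *a, const char *b)
{
    size_t i;
    for (i = 0;; i++) {
        /* Cast to unsigned char: passing a negative char to tolower()
           is undefined behaviour. */
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (tolower(ca) != tolower(cb))
            return 0;
        if (ca == '\0')
            return 1;
    }
}
```

In the "C" locale, tolower() only maps A-Z, so nick_equal_nocase("Vasya", "VASYA") returns 1, while the CP-1251 bytes of "Вася" ("\xC2\xE0\xF1\xFF") and "ВАСЯ" ("\xC2\xC0\xD1\xDF") compare as different.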
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 06 January, 2010, 22:51:20
Quote from: PPK on 06 January, 2010, 22:39:17
PtokaX uses the tolower function when hashing a nick, and stricmp when comparing two nicks. That means nicks are hashed in lower case and compared case-insensitively ::)
In my case tolower works incorrectly for Russian letters.

It is necessary to use a locale-dependent tolower() function, look at the 2nd parameter...  :-\

PS: In Lua, os.setlocale("rus") makes string.lower() work fine for Russian letters.  ::)
Title: Re: Non-English Letters in Nicks
Post by: PPK on 06 January, 2010, 23:25:36
Then the system where the hub is running has a locale other than Russian...
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 07 January, 2010, 00:02:18
Is it possible to make the locale switchable in the hub's options?  ::)
Title: Re: Non-English Letters in Nicks
Post by: PPK on 07 January, 2010, 00:08:34
It is possible. But I don't want to go that way, because then we need switchable locales in clients too (I know that some have that already). The future is in Unicode ::)
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 07 January, 2010, 13:09:14
Quote from: PPK on 07 January, 2010, 00:08:34
It is possible. But I don't want to go that way, because then we need switchable locales in clients too (I know that some have that already).
It is not necessary, clients work fine with CP1251; the problem is in the hub's tolower() function.

Can you add a conversion table to the options, like "AaBbCcDdEeFfGg...", as an optional tolower() replacement? Each admin could specify the letter conversion rules.  ::)
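The table idea above could be sketched roughly as follows. This is only an illustration under assumptions: the function names build_lower_table and lower_with_table are hypothetical, and the pair string is assumed to hold each uppercase byte immediately followed by its lowercase form, as in "AaBbCc...".

```c
#include <stddef.h>

/* Hypothetical sketch: fill a 256-entry lowercase table from a pair string
   such as "AaBbCc", where each uppercase byte is followed by its lowercase
   form. Bytes not mentioned in the string map to themselves. */
static void build_lower_table(unsigned char table[256], const char *pairs)
{
    size_t i;
    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
    /* Walk the string two bytes at a time; a trailing odd byte is ignored. */
    for (i = 0; pairs[i] != '\0' && pairs[i + 1] != '\0'; i += 2)
        table[(unsigned char)pairs[i]] = (unsigned char)pairs[i + 1];
}

/* Lowercase a nick in place using the table instead of tolower(). */
static void lower_with_table(const unsigned char table[256], char *nick)
{
    for (; *nick != '\0'; nick++)
        *nick = (char)table[(unsigned char)*nick];
}
```

Because the table is indexed by byte value, the lookup does not depend on the C runtime locale at all; the admin's pair string alone decides which bytes count as upper and lower case.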
Title: Re: Non-English Letters in Nicks
Post by: PPK on 09 January, 2010, 18:35:02
Quote from: WAJIM on 07 January, 2010, 13:09:14
the problem is in the hub's tolower() function.
The hub uses the same tolower function as Lua does in its string.lower(). As you said, when you set the locale in Lua it works correctly. It works correctly for me when the system locale is set correctly (the same thing that makes clients work correctly) ::)
Title: Re: Non-English Letters in Nicks
Post by: PPK on 09 January, 2010, 19:06:46
And maybe there really was a bug here... one magic line was missing. Then 0.4.1.2 should fix it  :o
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 09 January, 2010, 19:38:35
Quote from: PPK on 09 January, 2010, 18:35:02
As you said, when you set the locale in Lua it works correctly.
Yes, but only after these lines in every script:
function OnStartup()
os.setlocale("rus")
end


Quote
And maybe there really was a bug here... one magic line was missing.
:o
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 10 January, 2010, 13:41:45
I have just checked 0.4.1.2...
It seems that the locale is not the one for my country...  :'( Some letters are not lowercased correctly...
I use this function for additional nick checking.  :-\
function OnStartup()
    -- Make string.lower() locale-aware for Russian letters
    os.setlocale("rus")
end

function ValidateNickArrival(user, data)
    local _, _, nick = string.find(data, "^$ValidateNick (.+)|$")
    -- Skip the check if the command could not be parsed
    if not nick then return end
    local lnick = string.lower(nick)
    -- Only nicks with Russian letters need the extra check
    -- (the script file is assumed to be saved in CP-1251)
    if string.find(lnick, "[абвгдеёжзийклмнопрстуфхцчшщъыьэюя]") then
        -- Registered nick with the same letters but different case
        for _, reg in ipairs(RegMan.GetRegs()) do
            local v = reg.sNick
            if string.lower(v) == lnick and v ~= nick then
                Core.SendToUser(user, "*** Your nick ("..nick..") does not exactly match the registered nick.|"..
                                      "*** If you registered this nick, change it to: "..v.."|"..
                                      "*** If you did not register this nick, change it to a DIFFERENT one.")
                Core.Disconnect(user)
                return true
            end
        end
        -- Online user with the same letters but different case
        for _, usr in ipairs(Core.GetOnlineUsers()) do
            local v = usr.sNick
            if string.lower(v) == lnick and v ~= nick then
                Core.SendToUser(user, "$ValidateDenide "..nick)
                Core.Disconnect(user)
                return true
            end
        end
    end
end
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 10 January, 2010, 16:04:13
Quote from: Mutor on 10 January, 2010, 15:52:44
If you would, please insert your nick in the script , run it and post results.
LC_MONETARY=Russian_Russia.866
LC_TIME=Russian_Russia.866
LC_NUMERIC=Russian_Russia.866
LC_COLLATE=Russian_Russia.866
LC_CTYPE=Russian_Russia.866

But it's wrong, because all clients use the 1251 (ANSI) codepage instead of 866 (OEM).  :-\
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 10 January, 2010, 16:46:21
Quote from: Mutor on 10 January, 2010, 16:30:58
Then you should check/adjust the settings for your System.
Control Panel -> Regional and Language Options
There everything is OK, 1251 and 866 are checked and fixed.

866 - OEM Russian codepage for DOS applications
1251 - ANSI Russian codepage for Windows applications

I am surprised that PtokaX uses the OEM codepage.  :o

Mutor, in your case codepage 437 is wrong too. Your codepage should be something like 1252 for English Windows.
Title: Re: Non-English Letters in Nicks
Post by: PPK on 10 January, 2010, 16:51:36
Quote from: WAJIM on 10 January, 2010, 16:46:21
I am surprised that PtokaX uses the OEM codepage.  :o
Because that is what the system gives it as the default system locale :(
This is in the unix documentation:
Quote
Internationalised programs must call setlocale() to initiate a specific language operation. This can be done by calling setlocale() as follows:
setlocale(LC_ALL, "");
and in MSDN is:
Quote
setlocale( LC_ALL, "" );
Sets the locale to the default, which is the user-default ANSI code page obtained from the operating system.
That is what was added to 0.4.1.2 and was missing in previous versions :-\
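A small sketch of that call, plus a way to see which locale the C runtime really ended up with. The helper name apply_locale is hypothetical; it only uses the standard setlocale() function, where a NULL second argument queries the current setting without changing it:

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch the whole program to the named locale and
   return the resulting LC_CTYPE name, or NULL if the request was rejected.
   Passing "" asks for the environment's default locale. */
static const char *apply_locale(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return NULL;
    /* A NULL second argument only queries the current setting. */
    return setlocale(LC_CTYPE, NULL);
}
```

Calling apply_locale("") once at startup and logging the returned string would show whether the runtime picked the expected codepage; the exact name format depends on the platform and compiler.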
Title: Re: Non-English Letters in Nicks
Post by: PPK on 10 January, 2010, 16:58:04
Hmm, I got crazy results with Mutor's script  ::)
Windows 32-bit version:
Quote
LC_MONETARY=Czech_Czech Republic.852
LC_TIME=Czech_Czech Republic.852
LC_NUMERIC=Czech_Czech Republic.852
LC_COLLATE=Czech_Czech Republic.852
LC_CTYPE=Czech_Czech Republic.852
Windows 64-bit version:
Quote
Czech_Czech Republic.1250
Looks like Borland (used to compile the 32-bit version) has buggy locales :'(
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 10 January, 2010, 17:00:28
Quote from: Mutor on 10 January, 2010, 16:55:34
437 is fine I have all the conversion tables loaded that I need.
Because you're using only Latin letters, which have the same codes in the 437/866/1251/1252 codepages.  ::)
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 10 January, 2010, 17:07:09
Quote from: PPK on 10 January, 2010, 16:51:36
Because that is what the system gives it as the default system locale :(
This is in the unix documentation: and in MSDN is: That is what was added to 0.4.1.2 and was missing in previous versions :-\
http://msdn.microsoft.com/en-us/library/x99tb11d%28VS.71%29.aspx
PPK, try to use:
setlocale(LC_ALL, ".ACP");
Title: Re: Non-English Letters in Nicks
Post by: PPK on 10 January, 2010, 17:14:59
Quote from: WAJIM on 10 January, 2010, 17:07:09
PPK, try to use:
setlocale(LC_ALL, ".ACP");
That doesn't help. I checked the clients and they use the same thing that I added to PtokaX. It works in them, and it works in 64-bit PtokaX on Windows because the MS compiler handles locales correctly ::)
Quote from: Mutor on 10 January, 2010, 17:09:10
PPK: I don't think it's buggy, I think it simply has to do with what the system returns.
Of course it is buggy. Same Windows, different results from code generated by different compilers.
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 10 January, 2010, 17:22:23
Quote from: PPK on 10 January, 2010, 17:14:59
That doesn't help. I checked the clients and they use the same thing that I added to PtokaX. It works in them, and it works in 64-bit PtokaX on Windows because the MS compiler handles locales correctly ::)
If the codepage is specified manually, like:
setlocale(LC_ALL, ".1250");
does it work on win32?
Title: Re: Non-English Letters in Nicks
Post by: PPK on 10 January, 2010, 17:26:08
Quote from: WAJIM on 10 January, 2010, 17:22:23
If the codepage is specified manually, like:
setlocale(LC_ALL, ".1250");
does it work on win32?
No, I tested "", ".ACP" and ".1250", with the same result :'(
Quote
LC_MONETARY=Czech_Czech Republic.852
LC_TIME=Czech_Czech Republic.852
LC_NUMERIC=Czech_Czech Republic.852
LC_COLLATE=Czech_Czech Republic.852
LC_CTYPE=Czech_Czech Republic.852
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 10 January, 2010, 17:38:45
Quote from: PPK on 10 January, 2010, 17:26:08
No, I tested "", ".ACP" and ".1250", with the same result :'(
I found this...
Quote
Borland C++ currently supports only the "C" locale, so calling this function has no effect.
:'(
Title: Re: Non-English Letters in Nicks
Post by: PPK on 10 January, 2010, 17:55:09
Oh, that's nice... I wanted to move from Borland to the MS compiler anyway. That will fix this and allow a 64-bit GUI version for Windows and Unicode support. The problem is that I need to rewrite the GUI... again :-X
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 10 January, 2010, 18:06:50
PPK, the problem is only in the tolower function, everything else works fine for me.

Is it possible to replace all tolower calls with a self-made tolower_loc function that converts through an optional user-defined lookup table?

It's only a few lines of code...  ::)
Title: Re: Non-English Letters in Nicks
Post by: Enuri on 10 January, 2010, 18:49:48
2 PPK:

Tolower conversion rules for russian characters:

1) Byte 168 -> 184 (Ё -> ё)
2) Bytes 192-223 -> +32. (А-Я -> а-я)
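The two rules above, written out as a rough sketch for a single CP-1251 byte. The function name cp1251_tolower is hypothetical; the ASCII branch is an addition so that Latin nicks keep working without the C library's tolower():

```c
/* Sketch of the rules above for one CP-1251 byte.
   The name cp1251_tolower is hypothetical, not part of PtokaX. */
static unsigned char cp1251_tolower(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return (unsigned char)(c + 32);  /* ASCII A-Z -> a-z */
    if (c == 168)
        return 184;                      /* Ё -> ё */
    if (c >= 192 && c <= 223)
        return (unsigned char)(c + 32);  /* А-Я -> а-я */
    return c;                            /* everything else is unchanged */
}
```

This covers only Russian letters in CP-1251; other codepages would need their own rules.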
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 10 January, 2010, 18:58:04
Quote from: Enuri on 10 January, 2010, 18:49:48
2 PPK: Tolower conversion rules for russian characters:
Besides Russians there are other people in the world... The conversion needs to be made universal..  ::)
Title: Re: Non-English Letters in Nicks
Post by: PPK on 15 January, 2010, 21:05:17
32-bit Windows service compiled with the MS compiler http://www.PtokaX.org/files/0.4.1.2-service-msvc.7z
Result of Mutor's script:
Quote
Czech_Czech Republic.1250
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 16 January, 2010, 17:33:08
Quote from: PPK on 15 January, 2010, 21:05:17
32-bit Windows service compiled with the MS compiler
Good!  :D Waiting for GUI-version on MSVC...  ;D
Title: Re: Non-English Letters in Nicks
Post by: PPK on 17 January, 2010, 02:30:05
You will wait a few months (maybe years :P); for now it doesn't look much like a usable GUI  ::)
(http://www.ptokax.org/files/PtokaX-WinApi-Gui.png)
Title: Re: Non-English Letters in Nicks
Post by: WAJIM on 17 January, 2010, 10:27:28
Quote from: PPK on 17 January, 2010, 02:30:05
for now it doesn't look much like a usable GUI  ::)
:-X