Language Script needed

Archive of notes and cross input on Dev versions of Coranto (1.25.1 and Older)

Moderators: Dale Ray, SrNupsen, Bluetooth, Jackanape

Post by LoneOwl » Wed Apr 27, 2005 1:01 am

What Parahead's suggesting is that I write some code for how Coranto will handle the language files. I think what you're referring to is how people will create the language files.

If you're wanting me to write it, don't count on it being done soon. I might find out tomorrow or the day after how much information I'll be able to get recovered for my laptop, let alone when I'll get it back.

Also, when it comes to Coranto, how about we make it standard to require language files, and everything else, to use UTF-8? Perl 5.8 and above supports it fully, so working with it would be much easier, and UTF-8 is probably the most commonly supported encoding after ASCII. When dealing with internationalization, it'd be best to use the most international encoding, in my opinion. UTF-8 is probably far more widely supported than the other Unicode encodings, after all.
LoneOwl
 
Posts: 1465
Joined: Sun Mar 10, 2002 2:57 am
Location: That one place, you know?

Post by Parahead » Wed Apr 27, 2005 5:37 am

SomeGuyNamedJim wrote:Is it safe to assume that LoneOwl will be making this script and I can delete mine?
No. As LoneOwl says, your code is for creating language files; the Locale::Maketext module (or the Coranto version of it) is for dealing with those files.

LoneOwl wrote:Also, when it comes to Coranto, how about we make it standard to require language files, and everything else, to use UTF-8? Perl 5.8 and above supports it fully, so working with it would be much easier, and UTF-8 is probably the most commonly supported encoding after ASCII. When dealing with internationalization, it'd be best to use the most international encoding, in my opinion. UTF-8 is probably far more widely supported than the other Unicode encodings, after all.
And if you want to rely on Perl 5.8 for UTF-8, how is that different from depending on Locale::Maketext, which is included in Perl 5.8 as well?
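
For readers who have not used the module, here is a minimal sketch of what relying on Locale::Maketext could look like. The CorantoDemo::L10N class names are made up for illustration and are not part of Coranto; only the Locale::Maketext calls (get_handle, maketext, the _AUTO lexicon entry and the quant bracket notation) come from the module itself.

```perl
use strict;
use warnings;

# Hypothetical project base class (not a real Coranto module).
package CorantoDemo::L10N;
use base 'Locale::Maketext';

# Hypothetical English language class.
package CorantoDemo::L10N::en;
our @ISA = ('CorantoDemo::L10N');
# _AUTO => 1 lets any unknown key act as its own translation.
our %Lexicon = ( _AUTO => 1 );

package main;

my $lh = CorantoDemo::L10N->get_handle('en')
    or die "No language handle available\n";

# The quant notation picks the zero, singular or plural form.
for my $count (0, 1, 3) {
    print $lh->maketext('I found [quant,_1,file,files,no files]', $count), "\n";
}
```

Each language would get its own subclass with a filled-in %Lexicon; the English class can stay almost empty because of _AUTO.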
Yes, I am still around...
www.parahead.com/coranto/
Parahead
 
Posts: 4837
Joined: Fri Jan 12, 2007 8:54 pm
Location: Stockholm - Sweden

Post by LoneOwl » Wed Apr 27, 2005 11:25 pm

Well, it's more that UTF-8 is a common standard and more universally supported than other encodings. Perl supporting it isn't a necessity, but I think we can at least agree that having an encoding supported by probably all up-to-date graphical browsers is certainly a very helpful thing. When it comes to the language files, if the encoding is UTF-8 then processing them could end up being simpler. Perl 5.8.0 has full UTF-8 support; earlier versions had 'use utf8' and partial support, but support all the same.
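
As a rough illustration of why a fixed encoding simplifies processing, the sketch below reads a UTF-8 file through a decoding layer, as available from Perl 5.8. The "key=value" line format and the temporary file are assumptions made for the example; the real crlang.pl is Perl source, not this format.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical one-line language file, written as raw UTF-8 bytes.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "Saturday=L\xC3\xB6rdag\n";   # Swedish for Saturday, UTF-8 encoded
close $out or die "close failed: $!";

# Decode while reading so Perl sees characters rather than bytes.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

my ($key, $value) = split /=/, $line, 2;
printf "%s has %d characters\n", $key, length $value;
```

Without the :encoding(UTF-8) layer, length would count the seven bytes instead of the six characters.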

Plus, HTML's &#xxxx; character references are Unicode code points, so converting all non-ASCII characters to those references could help portability further. A tad more complex in some areas, but best in the end in my opinion.
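
A minimal sketch of that conversion, assuming the text has already been decoded into characters; to_char_refs is a made-up helper name, not an existing Coranto sub.

```perl
use strict;
use warnings;

# Replace every character outside ASCII with a decimal numeric reference.
sub to_char_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# The input must be decoded characters, not raw UTF-8 bytes.
my $swedish = "L\x{F6}rdag";
print to_char_refs($swedish), "\n";
```

The output is pure ASCII, so it survives whatever charset the page is eventually served with.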

Eh, it's a complex issue.... If only everyone would speak English.... Ethnocentrism works when you're an American.

Post by Parahead » Thu Apr 28, 2005 1:46 pm

LoneOwl wrote:Eh, it's a complex issue.... If only everyone would speak English.... Ethnocentrism works when you're an American.
I agree it is a complex issue... Anyway, I have started to change the current language files to use an approach *similar* to the Locale::Maketext thing, but without using it, which gives us better support for older Perl installations. And if we later on decide to use a home-made version of the Locale::Maketext module, it will be easier to switch as well.

The idea is basically to replace the small 'keys' in the language files and use the whole English sentence as the key. If the sentence isn't translated, the key is used instead, which gives us an approach that degrades gracefully. Basically, everywhere $Messages{'UserLoggedOut'} is currently used, a call to a sub is made instead, like CRprintmsg('User has logged out'); CRprintmsg does the appropriate tests and is a single point of code that can be replaced in the future if we need to improve it.
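
A minimal sketch of the fallback idea, under the assumption that %Messages maps English sentences to translations. The body of CRprintmsg and the CRlookupmsg helper are illustrative guesses, not the actual Coranto code.

```perl
use strict;
use warnings;

# Hypothetical lexicon: English sentence => translation (Swedish here).
our %Messages = (
    'File not found' => 'Filen hittades inte',
);

# Return the translation when one exists, otherwise the English key itself.
sub CRlookupmsg {
    my ($sentence) = @_;
    return exists $Messages{$sentence} ? $Messages{$sentence} : $sentence;
}

# Single point of output, so the lookup strategy can be swapped later.
sub CRprintmsg {
    my ($sentence) = @_;
    print CRlookupmsg($sentence), "\n";
}

CRprintmsg('File not found');        # translated
CRprintmsg('User has logged out');   # no translation: falls back to the key
```

An untranslated sentence therefore still prints readable English instead of a blank.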

Post by Parahead » Sat Apr 30, 2005 9:59 pm

SomeGuyNamedJim, if you are still interested in contributing the translation script, please let me know. The language file does look a bit different now, since I have incorporated the latest ideas in this thread about the keys in the hash being actual sentences. It would be great and really appreciated if you would like to participate in developing the translation script, but I think we need to have a closer discussion of the actual layout of cradmin.pl to get it to work smoothly with your translation script.

Post by SomeGuyNamedJim » Sun May 01, 2005 4:13 am

I've been working on this for a while. It's not fully functional. I've run into a few problems which I am having a hard time fixing.

Handling multiple hashes is not a problem. The new crlang.pl file which I was sent also contains a few other changes which the original did not have (and these are the things that I am having trouble with).

First, the new file contains arrays. Working with them alone isn't very problematic; however, handling the comments for those arrays is proving to be very problematic. The number and complexity of regular expressions required to handle most scenarios is currently beyond me.

Second, the new file contains entries with multiple lines. As you can see, I am using JavaScript extensively to handle the "Default" buttons. JavaScript doesn't like line breaks and whenever it sees one it spits out an error. I have been trying to write some JavaScript functions to deal with the line breaks, but so far I have had little success.

That is my progress update so far. I will continue with my attempts to figure out these problems.
SomeGuyNamedJim
 
Posts: 73
Joined: Thu Jan 20, 2005 3:12 am

Post by Parahead » Sun May 01, 2005 9:22 am

SomeGuyNamedJim wrote:I've been working on this for a while. It's not fully functional. I've run into a few problems which I am having a hard time fixing.
I like it, it looks really nice! :-D Regarding the problems you are having a hard time fixing, that is what I meant by "we need to have a closer discussion of the actual layout of crlang.pl". The thing is that it isn't final, and if you have suggestions on how to make things easier, that would be great.

SomeGuyNamedJim wrote:Handling multiple hashes is not a problem. The new crlang.pl file which I was sent also contains a few other changes which the original did not have (and these are the things that I am having trouble with).
The new crlang.pl file I am working with, which has been updated in line with the discussion above in this thread, doesn't use several hashes, so another form of "header" for each section would be a good thing to implement. Maybe have a remark line like:
Code: Select all
#HEAD: Admin Messages

And maybe even be able to specify a subsection so it is visually distinctive in your tool? Something like:
Code: Select all
#SECTION: Date Formats

What do you think about that idea?

And also, one entry in the hash now looks like this instead:
Code: Select all
q~General Date Format~
  => q~General Date Format~,

Previously the 'key' was a single word with underscores, like General_Date_Format; now it is the actual English sentence, which will not work that well as the name of an input field in the form... Do you think something like this would be good then:
Code: Select all
q~General Date Format~ # Gen_Date_Format
  => q~General Date Format~,

To give you a unique key for the input fields? Or?
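
To make the proposal concrete, here is a rough sketch of how a translation tool might pick up the #HEAD: and #SECTION: remark lines and the short id in the trailing comment. The parsing rules and the @fields structure are assumptions for illustration; the actual file layout was still under discussion.

```perl
use strict;
use warnings;

# Hypothetical excerpt of a language file using the proposed markers.
my @lines = (
    '#HEAD: Admin Messages',
    '#SECTION: Date Formats',
    'q~General Date Format~ # Gen_Date_Format',
    '  => q~General Date Format~,',
);

my ($head, $section, @fields);
for my $line (@lines) {
    if    ($line =~ /^#HEAD:\s*(.+)$/)    { $head    = $1 }
    elsif ($line =~ /^#SECTION:\s*(.+)$/) { $section = $1 }
    elsif ($line =~ /^q~(.*?)~\s*#\s*(\w+)\s*$/) {
        # Key line: English sentence plus the short id for the form field.
        push @fields, { head => $head, section => $section,
                        sentence => $1, id => $2 };
    }
}

printf "%s / %s: %s (%s)\n", @{$_}{qw(head section sentence id)} for @fields;
```

The short id gives the HTML form a safe field name while the sentence stays the hash key.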

SomeGuyNamedJim wrote:First, the new file contains arrays. Working with them alone isn't very problematic; however, handling the comments for those arrays is proving to be very problematic. The number and complexity of regular expressions required to handle most scenarios is currently beyond me.
Can't we put each entry on a separate line, then? Like:
Code: Select all
@Week_Days = ( # The days of the week
q~Sunday~, # Sunday
q~Monday~,
q~Tuesday~,
q~Wednesday~,
q~Thursday~,
q~Friday~,
q~Saturday~
);
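
With one entry per line, a single regular expression may be enough to separate each value from its optional comment. The sketch below assumes that values never contain a ~ character and that comments always start with #; both are assumptions about the file, not confirmed rules.

```perl
use strict;
use warnings;

# Shortened copy of the array layout, one entry per line.
my @source = (
    '@Week_Days = ( # The days of the week',
    'q~Sunday~, # Sunday',
    'q~Monday~,',
    'q~Saturday~',
    ');',
);

my @entries;
for my $line (@source) {
    # q~value~, then an optional comma, then an optional trailing comment.
    next unless $line =~ /^\s*q~(.*?)~\s*,?\s*(?:#\s*(.*))?$/;
    push @entries, { value => $1, comment => defined $2 ? $2 : '' };
}

printf "%-10s %s\n", $_->{value}, $_->{comment} for @entries;
```

The opening and closing lines of the array do not match the pattern, so they are skipped without special handling.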



SomeGuyNamedJim wrote:Second, the new file contains entries with multiple lines. As you can see, I am using JavaScript extensively to handle the "Default" buttons. JavaScript doesn't like line breaks and whenever it sees one it spits out an error. I have been trying to write some JavaScript functions to deal with the line breaks, but so far I have had little success.
Isn't it possible to replace the line breaks with a space in the Perl code, before it is output as an HTML page? Since the key is now the English sentence, not many fields are split over several lines (and those that are look different, see below), but a way of presenting them in a textarea would be great anyway, I think. So that if a sentence is longer than X chars it is presented in a textarea?
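
Both suggestions could be sketched roughly as follows. The 60-character threshold, the sub names and the HTML layout are placeholders, and the values are assumed to be HTML-escaped already, as the entities in the language file suggest.

```perl
use strict;
use warnings;

my $MAX_INPUT_CHARS = 60;   # hypothetical value for the "X chars" threshold

# Collapse every line break (and the whitespace around it) into one space.
sub flatten_message {
    my ($text) = @_;
    $text =~ s/\s*[\r\n]+\s*/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Pick the form control based on the flattened length.
sub field_html {
    my ($name, $text) = @_;
    my $flat = flatten_message($text);
    return length($flat) > $MAX_INPUT_CHARS
        ? qq{<textarea name="$name">$flat</textarea>}
        : qq{<input type="text" name="$name" value="$flat">};
}

print field_html('Gen_Date_Format', "General\n   Date Format"), "\n";
```

Because the line breaks are gone before the page is generated, the JavaScript behind the "Default" buttons never sees them.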

And as I mentioned, the fields that *are* split over several lines look like this now:
Code: Select all
'_DATE_SETTINGS_INTRO' => <<'EOSENTENCE', # Introduction to the date settings
   The following settings allow you to configure how dates &amp;
   times appear. The general date format controls the date used in news items.
   The internal date format controls the date used on Coranto administrative pages.
   The three archive date formats control the date used to label monthly, weekly, and daily archives
   respectively. To insert a component of the date or time, use &lt;Field: Name&gt; where Name is one of:
   <b>Year</b>, <b>TwoDigitYear</b>, <b>Month_Name</b>, <b>Month_Number</b>, <b>TwoDigitMonth</b>,
   <b>Weekday</b>, <b>Day</b>, <b>TwoDigitDay</b>, <b>Hour</b>,
   <b>TwoDigitHour</b>, <b>Minute</b>, <b>Second</b>, <b>AMPM</b>, or <b>Time_Zone</b>.
   Remember that spacing and capitalization
   matter: &lt;Field: Day&gt; is valid, but &lt;field:day&gt is not (it contains three errors, actually).
EOSENTENCE


SomeGuyNamedJim wrote:That is my progress update so far. I will continue with my attempts to figure out these problems.
I think you have come far; I hope I don't screw things up by changing the format of crlang.pl that much. Like I said earlier, my intention is to get you more involved in the exact design of the file to make things smoother on your end... An example file which includes all the things I have mentioned in this post can be found here:
http://www.parahead.com/coranto/crlang2.pl

Post by LoneOwl » Mon May 02, 2005 2:05 pm

I would say making the crlang file easier to use internally is more important than making it easy for a script to write. All in all, it wouldn't be extraordinarily difficult if it's simple hashes (if push comes to shove, try the Data::Dumper module for cues, using Data::Dumper->Dump). For more complex stuff, it's the kind of thing I do for fun for little other reason than to sharpen my skills, or at least that's how it seems to end up anyway.
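
For the simple-hash case, a sketch of the Data::Dumper->Dump route might look like this. The two-entry %Messages hash is invented for the example; Sortkeys and Indent are ordinary Data::Dumper settings chosen here only to keep the output stable and compact.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical translated entries collected from the translation form.
my %Messages = (
    'File'  => 'Fil',
    'Files' => 'Filer',
);

local $Data::Dumper::Sortkeys = 1;   # stable key order between runs
local $Data::Dumper::Indent   = 1;   # compact, fixed indentation

# The '*Messages' name makes Dump emit "%Messages = ( ... );".
my $perl_source = Data::Dumper->Dump([ \%Messages ], [ '*Messages' ]);
print $perl_source;

# Round trip: evaluating the dumped text rebuilds the same hash.
my %reloaded = do {
    my %Messages;
    eval $perl_source;
    die $@ if $@;
    %Messages;
};
```

Comments and section markers are not preserved by a dump, so this only covers the data, not the layout of a hand-edited file.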

I think having the keys be long sentences has pros and cons. It would make the English version easier (if you keep the "split" sentence format), and small in scale if you used a tied hash that would return the key as the value except for long keys. The downside is it would force other languages to use more memory. An option that I think may have some decent potential is something like this.
Code: Select all
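package CRLang;

# Constructor: one object per language file.
sub new {
    bless {
        Language    => 'English',
        LangTag     => 'en-us',
    }, shift;
}

# One sub per message; $User is assumed to be a package global set elsewhere.
sub A_MESSAGE {
    return "Hello, $User!  How are you?";
}

package main;
$Language = CRLang->new();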
All in all, the right method is variable. It depends in part on how it's going to be used by Coranto. And I get the impression that how it's going to be used will be influenced by how the language file is written.
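
The tied-hash idea mentioned earlier in this post could be sketched as below, in the simplest form where every missing key returns itself. CRLang::Fallback is a made-up package name, and the special handling of long keys is left out.

```perl
use strict;
use warnings;

# Hypothetical tied-hash class: FETCH falls back to the key itself.
package CRLang::Fallback;
require Tie::Hash;
our @ISA = ('Tie::StdHash');

sub FETCH {
    my ($self, $key) = @_;
    return exists $self->{$key} ? $self->{$key} : $key;
}

package main;

tie my %Messages, 'CRLang::Fallback';
$Messages{'File not found'} = 'Filen hittades inte';

print $Messages{'File not found'}, "\n";        # stored translation
print $Messages{'User has logged out'}, "\n";   # untranslated: key comes back
```

With this, an English language file needs no entries at all, since every lookup already returns the sentence used as the key.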

Oh, and Parahead, you forgot a comma and a semicolon.

Post by Parahead » Tue May 03, 2005 6:28 am

LoneOwl wrote:I would say making the crlang file easier to use internally is more important than making it easy for a script to write.
I totally agree! Still, if simple stuff can be added to make life easier when dealing with it during the translation process, it doesn't hurt, that's all...

LoneOwl wrote:I think having the keys be long sentences has pros and cons. It would make the English version easier (if you keep the "split" sentence format), and small in scale if you used a tied hash that would return the key as the value except for long keys. The downside is it would force other languages to use more memory.
Another benefit of using sentences as keys is that the actual code in the core is much more readable. I don't fully understand your reasoning regarding English vs. other languages; the format would be the same, thus the same memory usage? Or am I missing something (as usual)? Regardless, do you feel that the memory usage will have a big enough impact to be a factor to consider?

LoneOwl wrote:An option that I think may have some decent potential is something like this.
Code: Select all
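package CRLang;

# Constructor: one object per language file.
sub new {
    bless {
        Language    => 'English',
        LangTag     => 'en-us',
    }, shift;
}

# One sub per message; $User is assumed to be a package global set elsewhere.
sub A_MESSAGE {
    return "Hello, $User!  How are you?";
}

package main;
$Language = CRLang->new();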
Am I understanding you correctly that each message should then have its own sub? Used like:
Code: Select all
print $Language->A_MESSAGE();
I agree, this would be a good option if using short keys as the sentence identifiers. Personally, that point is my biggest concern: having the core files full of short keys, which makes it hard to maintain the code. That is why I think sentences as keys are better.

LoneOwl wrote:Oh, and Parahead, you forgot a comma and a semicolon.
Yeah, yeah... Fixed... ;-)

I have played around with this approach myself:
Code: Select all
# ADDED IN CRLIB.PL

# Look up a sentence in %Messages and fill in its placeholders.
sub CRgetmsg {
 my ($str, @variables) = @_;
 # Fetch the translated sentence if it exists; otherwise keep the key
 $str = $Messages{$str} if(defined $Messages{$str});
 # Replace [_X] with corresponding value
 $str =~ s/\[_(\d+)\]/$variables[${1}-1]/ge;
 # Replace [quant,_X,singular,plural,negative] with correct quantification
 $str =~ s/\[quant,_(\d+),([^\]]+)\]/CRquantmsg($variables[${1}-1], ${2})/ge;
 return $str;
}


# Build the quantified text from a "singular,plural,negative" list;
# plural and negative (the zero form) are optional.
sub CRquantmsg {
 my ($nbr, $quantifiers) = @_;
 my ($singular, $plural, $negative) = split(/,/,$quantifiers);

 if( ($nbr == 0) && (defined($negative))) {
  return $negative;
 } elsif($nbr == 1) {
  return "1 $singular";
 } elsif(defined($plural)) {
  return "$nbr $plural";
 } else {
  # No plural given: fall back to the English "s" suffix
  return ("$nbr " . $singular . "s");
 }
}

# Then where you would like a message to print:
print CRgetmsg('I found [quant,_1,directory,directories,no directories] when searching', $nbr_of_hits);
The sentence is the key, making a good fallback and the code readable.

Post by LoneOwl » Tue May 03, 2005 1:51 pm

Parahead wrote:Another benefit of using sentences as keys is that the actual code in the core is much more readable. I don't fully understand your reasoning regarding English vs. other languages; the format would be the same, thus the same memory usage? Or am I missing something (as usual)? Regardless, do you feel that the memory usage will have a big enough impact to be a factor to consider?
Well, although for your approach it wouldn't impact it too much directly, it would have the potential for perhaps doubling the memory needed for the internationalization. All the keys are stored in English, and as a result only for other languages would additional memory be needed. A further potential downside to having the keys be in English is if for some reason a key is changed(perhaps to increase clarity), all language files would need to be updated as a result, and not just an English language file.
Parahead wrote:Am I understanding you correctly that each message should then have its own sub? Used like:
Code: Select all
print $Language->A_MESSAGE();
I agree, this would be a good option if using short keys as the sentence identifiers. Personally, that point is my biggest concern: having the core files full of short keys, which makes it hard to maintain the code. That is why I think sentences as keys are better.
Well for trickier areas, comments can always be added. ;-) A lot of Coranto could probably stand to use some comments.

With your idea(s), have you written a crlang yet in another language? It can help clear up issues. Preferably, languages with different grammars will give the broadest picture of how well it will work to incorporate other languages.

Post by Parahead » Tue May 03, 2005 5:21 pm

LoneOwl wrote:Well, although for your approach it wouldn't impact it too much directly, it would have the potential for perhaps doubling the memory needed for the internationalization. All the keys are stored in English, and as a result only for other languages would additional memory be needed.
Well, depending on how you look at it, the English file could potentially look the same as the other files, just to keep all the files looking the same. So both the key and the value would be defined in the English file as well. Well, basically the English file could be empty, but I think it is better to keep all language files looking the same. And yes, basically a language file could just contain the parts that a certain user would like to have translated. For example, if a user would only like to present the 'normal' user interface translated but can live with the admin interface being in English, those sentences could be left out.

LoneOwl wrote:A further potential downside to having the keys be in English is if for some reason a key is changed(perhaps to increase clarity), all language files would need to be updated as a result, and not just an English language file.
Yes, I have thought about that myself and I agree it could be a problem. It isn't that often we change sentences within Coranto, though? And if the sentence changed, it may be necessary to update the sentence in the other language files anyway to reflect the change, not just the key. And potentially one *could* just update the sentence, not the key, even for the English file, depending on how big the change to the sentence is.

LoneOwl wrote:With your idea(s), have you written a crlang yet in another language? It can help clear up issues. Preferably, languages with different grammars will give the broadest picture of how well it will work to incorporate other languages.
Nope, not yet. I have noticed that cerb didn't extract as many sentences from the core as one would have hoped, so I have been looking into extracting as much as possible and creating an English file as I go... It would be good to have a tool when doing the actual translating of the crlang.pl file; that is why I have wanted to work a little more closely with SomeGuyNamedJim, to be able to make the language file work with his tool in a nice fashion. Then, when we start to translate and notice that some language requires other handling or support than, for example, the quant-sub, it can be added as well. (Actually there exists a bool-sub, but I left it out in the example above to keep things simple.)

Post by LoneOwl » Wed May 04, 2005 3:09 pm

Parahead wrote:Well, depending on how you look at it, the English file could potentially look the same as the other files, just to keep all the files looking the same. So both the key and the value would be defined in the English file as well. Well, basically the English file could be empty, but I think it is better to keep all language files looking the same. And yes, basically a language file could just contain the parts that a certain user would like to have translated. For example, if a user would only like to present the 'normal' user interface translated but can live with the admin interface being in English, those sentences could be left out.
So if you'd still have an English language file, why not have the short keys, and include a comment? I think all language files should include all portions of the language when possible. Different users can always prefer different languages after all, some admin and some not.
Parahead wrote:Yes, I have thought about that myself and I agree it could be a problem. It isn't that often we change sentences within Coranto, though? And if the sentence changed, it may be necessary to update the sentence in the other language files anyway to reflect the change, not just the key. And potentially one *could* just update the sentence, not the key, even for the English file, depending on how big the change to the sentence is.
I know it's not often, and probably only happens when something is added. If the purpose is clarity, it'd probably be language specific. When it's translated, it may not be an exact translation. Plus this also has the possibility of duplicating sentences for English, once as the "key", and again in the language file, as the key and value again...

Parahead wrote:Nope, not yet. I have noticed that cerb didn't extract as many sentences from the core as one would have hoped, so I have been looking into extracting as much as possible and creating an English file as I go... It would be good to have a tool when doing the actual translating of the crlang.pl file; that is why I have wanted to work a little more closely with SomeGuyNamedJim, to be able to make the language file work with his tool in a nice fashion. Then, when we start to translate and notice that some language requires other handling or support than, for example, the quant-sub, it can be added as well. (Actually there exists a bool-sub, but I left it out in the example above to keep things simple.)
I really think implementing other languages would be the best way to find out the most practical approach. About your quant modifier, instead of plural or singular(which leaves out zero, because zero may or may not be considered plural in all languages), it'd be best to just pass the quantity. Shouldn't be limited to just the style of English, or Indo-European languages.... Too many languages, too many language possibilities.....too many annoyances with internationalization....
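
One possible reading of "just pass the quantity" is that each language supplies its own rule for choosing a word form from the number alone. The sketch below is built on that assumption; %PluralRule, %Forms and quantify are invented names, and only English and French are shown (French treats zero as singular, English does not).

```perl
use strict;
use warnings;

# Hypothetical per-language rules: each receives only the quantity and
# returns the index of the word form to use.
my %PluralRule = (
    en => sub { $_[0] == 1 ? 0 : 1 },   # 0 files, 1 file, 2 files
    fr => sub { $_[0] <= 1 ? 0 : 1 },   # 0 fichier, 1 fichier, 2 fichiers
);

# Word forms per language, in the order the rule indexes them.
my %Forms = (
    en => [ 'file',    'files'    ],
    fr => [ 'fichier', 'fichiers' ],
);

sub quantify {
    my ($lang, $n) = @_;
    my $index = $PluralRule{$lang}->($n);
    return "$n $Forms{$lang}[$index]";
}

for my $lang (qw(en fr)) {
    print join(', ', map { quantify($lang, $_) } 0, 1, 2), "\n";
}
```

A language with more than two forms would only need a longer list and a rule returning more index values.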

Post by Parahead » Wed May 04, 2005 6:07 pm

LoneOwl wrote:So if you'd still have an English language file, why not have the short keys, and include a comment?
Because the key is the fallback sentence if it can't be found in the currently chosen language file. For example, with the 'current' (1.30.11) approach, if the key is missing in a language file the output is blank. And also, the code in the core is easier to read if using real sentences, and then you don't need comments.

LoneOwl wrote:I think all language files should include all portions of the language when possible.
Yes, I totally agree; I just said that it is *possible* to translate just parts of the sentences... The language file is quite large and some people may choose to translate only part of it if their specific language isn't supported, that's all.

LoneOwl wrote:I really think implementing other languages would be the best way to find out the most practical approach.
I agree and that is more or less what I say too. But in order for people to easily translate something we need an interface, which SomeGuyNamedJim is building? And we also need sentences to work with; which sentences would we have people 'try out' in different languages (I have some 'heavy' example strings if someone would like to try)? But since people have already translated Coranto into several languages with the old method without complaining, one could guess it works? Since I just add improved functionality? Or maybe that is what makes things trickier... ;-)

LoneOwl wrote:About your quant modifier, instead of plural or singular(which leaves out zero, because zero may or may not be considered plural in all languages), it'd be best to just pass the quantity. Shouldn't be limited to just the style of English, or Indo-European languages....
The quant method I use *does* consider the zero situation, check out the code again... ;-) And what do you mean by 'pass the quantity'? The quant-thing is supposed to be written during the translation process as well, and the actual value *is* passed at runtime. Below is the language file:
Code: Select all
'I found [quant,_1,file,files,no files]' # english
=> 'Jag hittade [quant,_1,fil,filer,inga filer]' # swedish

This will be output as:
Code: Select all
I found no files => Jag hittade inga filer
I found 1 file => Jag hittade 1 fil
I found 3 files => Jag hittade 3 filer
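
The three Swedish results can be reproduced with a compact stand-in for the lookup and quant handling described in this thread. The sub names getmsg and quant are placeholders for this example, not the CRgetmsg and CRquantmsg code from crlib.pl.

```perl
use strict;
use warnings;

# Language file entry: English sentence => Swedish translation.
my %Messages = (
    'I found [quant,_1,file,files,no files]'
        => 'Jag hittade [quant,_1,fil,filer,inga filer]',
);

# Choose the zero, singular or plural form for a number.
sub quant {
    my ($n, $singular, $plural, $zero) = @_;
    return $zero if $n == 0 && defined $zero;
    return "1 $singular" if $n == 1;
    return "$n " . (defined $plural ? $plural : $singular . 's');
}

# Translate the sentence if possible, then expand its placeholders.
sub getmsg {
    my ($str, @vars) = @_;
    $str = $Messages{$str} if defined $Messages{$str};
    $str =~ s{\[quant,_(\d+),([^\]]+)\]}{
        my ($idx, $forms) = ($1, $2);
        quant($vars[$idx - 1], split /,/, $forms);
    }ge;
    $str =~ s/\[_(\d+)\]/$vars[$1 - 1]/g;
    return $str;
}

print getmsg('I found [quant,_1,file,files,no files]', $_), "\n" for 0, 1, 3;
```

The translator writes the word forms inside the brackets, and the number itself only arrives when the message is printed.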


LoneOwl wrote:Too many languages, too many language possibilities.....too many annoyances with internationalization....
I agree that I18N has many annoyances, but I think the current translation method is worse than what I suggest?

Post by SrNupsen » Wed May 04, 2005 7:32 pm

As we all know, there are about 5.500 languages in the world. Let's not try to make Coranto portable to all of them :wink:

SrNupsen
-----------------------------------------------------------------------------------------------------
Coranto is free software. I am available for custom work or troubleshooting.

http://www.sundaune.no - transkripsjon, webdesign, nettsider, tekstbyrå
http://www.vagbladet.no - satire, politikk, kultur, sport, nettavis
-----------------------------------------------------------------------------------------------------
SrNupsen
 
Posts: 2229
Joined: Tue Jan 09, 2007 6:46 pm
Location: Nesodden, outside Oslo, Norway

Post by Parahead » Wed May 04, 2005 8:47 pm

SrNupsen wrote:As we all know, there are about 5.500 languages in the world. Let's not try to make Coranto portable to all of them :wink:
I know... :-D The main goal from my side is to make it possible to translate all of Coranto, which currently isn't the case. Secondly, making it easy to use the language support in the core without losing readability, and keeping it fairly simple to maintain. Then adding support for addons so that they can use the same method if their authors would like to add localisation support. And while doing those things, adding an improved sentence structure (like the quant-thing) is just a bonus... ;-) At least this is how I look at it.
