RSS feed builder and international characters


Postby Tanus » Tue Nov 20, 2007 9:19 am

I just installed the RSS feed builder 0.06a by Elvii. I am using it with Coranto 1.25.1.

It seems to be working fine: the XML file is generated and looks correct. However, I have trouble with international characters (more specifically, the three Norwegian letters). I have only tried it with IE7, but it complains about invalid characters in the XML file. When I look into it, the error points to the CDATA section where the Norwegian letters appear. When I manually remove the Norwegian letters, the rest works fine and looks nice.

As far as I can see, most of the settings in my RSS profile are the defaults (except those needed for personalisation). Turning "Strip HTML" on or off gives the same problem.

Is it the UTF-8 setting that should be changed? Any suggestions?
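IE7's complaint is what you would expect if the file declares UTF-8 but actually contains ISO-8859-1 (Latin-1) bytes, which is where the thread ends up. As an illustration only (Coranto itself is Perl, and the helper names here are hypothetical), this Python sketch prints how each Norwegian letter is stored in the two encodings and checks whether Latin-1 bytes are valid UTF-8:

```python
from __future__ import annotations

# Illustration only: how the Norwegian letters are stored as bytes.
LETTERS = "æøåÆØÅ"


def byte_table(text: str) -> list[tuple[str, bytes, bytes]]:
    """Return (character, ISO-8859-1 bytes, UTF-8 bytes) for each character."""
    return [(ch, ch.encode("iso-8859-1"), ch.encode("utf-8")) for ch in text]


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for ch, latin1, utf8 in byte_table(LETTERS):
        # e.g. "æ: ISO-8859-1 e6 | UTF-8 c3 a6"
        print(f"{ch}: ISO-8859-1 {latin1.hex(' ')} | UTF-8 {utf8.hex(' ')}")
    # A word written in ISO-8859-1 is not valid UTF-8.
    print(decodes_as_utf8("blåbær".encode("iso-8859-1")))
```

Each letter is one byte in ISO-8859-1 but two bytes in UTF-8, so a UTF-8 parser reading single Latin-1 bytes such as `e6` finds malformed sequences and reports invalid characters.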
Tanus
 
Posts: 10
Joined: Fri Feb 23, 2007 7:57 pm
Location: Oslo

Postby SrNupsen » Tue Nov 20, 2007 10:19 am

The original RSS feed addon that you are using is quite old. I'd recommend you take a look at this solution instead, allowing you to create perfectly valid RSS 2.0 simply by using a style and a template.
-----------------------------------------------------------------------------------------------------
Coranto is free software. I am available for custom work or troubleshooting.

http://www.sundaune.no - transcription, web design, websites, copywriting agency
http://www.vagbladet.no - satire, politics, culture, sport, online newspaper
-----------------------------------------------------------------------------------------------------
SrNupsen
 
Posts: 2229
Joined: Tue Jan 09, 2007 6:46 pm
Location: Nesodden, outside Oslo, Norway

Postby Tanus » Tue Nov 20, 2007 11:07 am

Thanks for a quick reply. I will try it tomorrow and post back.

Postby Tanus » Tue Nov 20, 2007 11:08 pm

I have now switched from the addon to the profile-and-style approach as described. At first I got the same problem, but I have now discovered the cause: the UTF-8 encoding declaration in the header of the XML file. When I changed it to ISO-8859-1, it worked flawlessly :). I have not tried it, but I assume the addon would have worked with the same small alteration.
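The fix works because the XML declaration tells the parser how to read the bytes that follow. A minimal Python sketch, assuming a hypothetical feed body stored as ISO-8859-1 (as Tanus's file was), feeds the same bytes to a standard XML parser under both declarations:

```python
from typing import Optional
import xml.etree.ElementTree as ET

# Hypothetical channel title, stored as ISO-8859-1 bytes as in the original feed.
BODY = "<rss><channel><title>Blåbær og rømme</title></channel></rss>".encode("iso-8859-1")


def feed_bytes(declared_encoding: str) -> bytes:
    """Prefix the same body bytes with an XML declaration naming an encoding."""
    header = f'<?xml version="1.0" encoding="{declared_encoding}"?>'
    return header.encode("ascii") + BODY


def parse_title(data: bytes) -> Optional[str]:
    """Return the channel title, or None if the parser rejects the document."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None
    return root.findtext("channel/title")


if __name__ == "__main__":
    print(parse_title(feed_bytes("UTF-8")))       # None: the bytes are not UTF-8
    print(parse_title(feed_bytes("ISO-8859-1")))  # the title decodes correctly
```

What matters is that the declaration matches the bytes; keeping the UTF-8 declaration and writing the file as UTF-8 would be the other consistent combination.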

Since I am using 1.25.1, I installed the addon that renames the output file, as described in the wiki version referenced in the post. (I think the terms "profile" and "style" should be swapped in the referenced article?)

I also made some small customisations for my website, such as translating some phrases.

One final thought: The addon seems somewhat easier to install, but using styles and profiles is more flexible and still easy to do.

Postby Tanus » Tue Nov 20, 2007 11:23 pm

By the way:

I added the "Category" feature by adding the following line to the Style, just after the "Description" line in the "item" part:
<category><![CDATA[<Field: Category>]]></category>

As I am using the "(default)" category as my main category, I substituted this with a more meaningful name by adding this line
$Category =~ s/\(default\)/meaningfulname/sg;
just after the handling of the subject and text attributes.
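A Python sketch of what the new Style line produces, assuming the category value has already been read from the item (`meaningfulname` is the placeholder from the post). It also guards against one CDATA pitfall: a CDATA section ends at the first `]]>`, so any such sequence in the text has to be split across two sections.

```python
def category_element(category: str, default_name: str = "meaningfulname") -> str:
    """Build an RSS <category> element as the Style line above does.

    "meaningfulname" is the placeholder from the post, not a real category.
    """
    # Literal replacement, equivalent to the Perl s/\(default\)/meaningfulname/sg
    name = category.replace("(default)", default_name)
    # A CDATA section ends at the first "]]>", so split any occurrence
    # across two CDATA sections.
    safe = name.replace("]]>", "]]]]><![CDATA[>")
    return f"<category><![CDATA[{safe}]]></category>"


if __name__ == "__main__":
    print(category_element("(default)"))
    print(category_element("Nyheter"))
```

Category names like "(default)" never contain `]]>`, so the split is only a safety net.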

Postby Musicvid » Thu Nov 22, 2007 4:35 am

The wiki version is older and is kept there for reference; the most recent version is at the forum link provided by SrNupsen. "For best results" you should use the most current version, although you will still need the file-renamer with 1.25.

That could explain your encoding declaration problem, since "lots" of improvements have been made in that area since the wiki article was created.

Here is a bit more discussion on the utf-8 / 8859-1 / 1252 issues.
http://www.coranto.org/forum/viewtopic.php?t=10023
I think that as html and xml get closer to being totally compatible, these things will get solved by consensus, if not by rule ....

Glad it is working for you. Some new "best practices" for RSS came out just last month, and I haven't updated the RSS Style to reflect those latest recommendations. However, here is the result of my work so far on these minor issues.
http://www.coranto.org/forum/viewtopic.php?t=10073

Let me know how it works for you; a link to your feed would help me check for bugs.
Last edited by Musicvid on Thu Nov 22, 2007 6:02 pm, edited 4 times in total.
Musicvid
 
Posts: 138
Joined: Wed Jan 17, 2007 1:05 am
Location: Western America

Postby Musicvid » Thu Nov 22, 2007 4:50 am

I just noticed that your site is the same one that wanted some local language support for my other RSS template for Calendarscript. I made the changes about a year ago, but never heard back from you.

Have you tried the latest version of the RSS template for Calendarscript?
http://www.calendarscript.com/support/f ... 527.0.html
Do the date/time functions now work in Norwegian with the latest update? Please let me know, thanks!
Last edited by Musicvid on Thu Nov 22, 2007 6:00 pm, edited 2 times in total.

Postby Musicvid » Thu Nov 22, 2007 5:22 am

Tanus wrote:
As I am using the "(default)" category as my main category I substituted this with a more meaningful name by adding this line
$Category =~ s/\(default\)/meaningfulname/sg;

You should change this line to:
Code:
my $Category = $Category;
$Category =~ s/\(default\)/meaningfulname/sg;
This will localize the search/replace to the Profile(s) calling the RSS Style. Otherwise, you change the value of the global variable $Category for every style processed from that point forward, which could cause havoc elsewhere in the script.
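In Perl, `my $Category = $Category;` works because a `my` variable only becomes visible after the statement that declares it, so the right-hand side still reads the outer value. The effect can be sketched in Python, assuming a hypothetical shared dictionary standing in for the script-wide variable that every style sees (this is an analogy, not Coranto's code):

```python
# Hypothetical stand-in for a script-wide variable shared by every style.
shared = {"Category": "(default)"}


def rss_style_global() -> str:
    # Like the original Perl line: modifies the shared value in place.
    shared["Category"] = shared["Category"].replace("(default)", "meaningfulname")
    return shared["Category"]


def rss_style_local() -> str:
    # Like `my $Category = $Category;`: work on a local copy only.
    category = shared["Category"]
    category = category.replace("(default)", "meaningfulname")
    return category


def html_style() -> str:
    # A later style that expects the original category name.
    return shared["Category"]


if __name__ == "__main__":
    print(rss_style_local(), html_style())   # meaningfulname (default)
    print(rss_style_global(), html_style())  # meaningfulname meaningfulname
```

After the global version runs, every later style sees the renamed category, which is the "havoc" described above.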

Postby Tanus » Fri Nov 23, 2007 7:12 am

Thank you, Musicvid, for your concern and advice.
I had, as you correctly noted, forgotten to declare the variable. Oops!

I used your code from the post referenced by SrNupsen, but I had to read the wiki for some additional guidance, specifically about the file-renamer addon.

I found the post on the utf-8 / 8859-1 / 1252 issues and read it a couple of times. I have now read it again, and I think I should give it another try by adding some lines to the preserve routine. I made the following modifications to the replace routine and got it working (I would have had to add the capital letters as well...):

Code:
   $string =~ s/\xE6/ae/sg; #æ
   $string =~ s/\xF8/oe/sg; #ø
   $string =~ s/\xE5/aa/sg; #å

However, the substitutes aren't very good, so I think it might work to make a similar addition to the preserve routine, replacing the letters with the correct hex values instead (I will need to look them up).
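For comparison, the same transliteration can be written as a single lookup table. This Python sketch adds the capital letters the post says would also be needed; "Ae", "Oe" and "Aa" as the capital forms are an assumption. It also shows why the substitutes "aren't that good": the mapping is lossy, so the original spelling cannot be recovered.

```python
# Transliteration table for the replace routine above, extended with the
# capital letters ("Ae", "Oe", "Aa" are assumed capital forms).
TRANSLIT = str.maketrans({
    "æ": "ae", "ø": "oe", "å": "aa",
    "Æ": "Ae", "Ø": "Oe", "Å": "Aa",
})


def transliterate(text: str) -> str:
    """Replace the Norwegian letters with two-letter ASCII substitutes."""
    return text.translate(TRANSLIT)


if __name__ == "__main__":
    print(transliterate("blåbær"))            # blaabaer
    print(transliterate("Ærlig Øl på Åsen"))  # Aerlig Oel paa Aasen
```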

I am more than happy to try the new RSS "best practices", but I didn't quite follow where to put the code.

My feed as it currently stands, before the alteration mentioned above, is at http://www.nskl.no/nsklfeed.xml

I am really sorry about leaving the Calendarscript feed thread open! I am using Calendarscript on my site and have had good experiences with it. However, I ran out of time when I was working on the feed (too much other stuff to do!), and afterwards I forgot to follow it up. I will have another look once I have finished updating my use of Coranto, and I am sure your advice and code will help me on the way. I will follow this up in the other thread, hopefully before the end of the year...

Postby Musicvid » Fri Nov 23, 2007 3:30 pm

Thank you so much for the followup! I write the code for my own use, but it is nice to know someone else finds it useful.

On further investigation of your feed and plugging those characters into my own test feed, it looks like changing the declaration line in the template to ISO-8859-1 as you have done is the best solution. Of course this can cause a minor validation complaint, but the extended characters appear correctly in my browsers. So no need to put replacement characters in the subs. (edited)

Of course, this would seem to make the subs unnecessary, right? Well, it appears that way if you open the 8859-1-declared feeds in a Mozilla or Firefox variant: all the characters appear correctly. But wait, IE 5.5 (and perhaps newer versions) honors the encoding sent in the server's HTTP header, so the characters are replaced with little boxes. So the subs are still needed. But Microsoft invented the Windows-1252 character set, etc. . . . AARGHH!!
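The ISO-8859-1 / Windows-1252 tangle comes down to one byte range. The two encodings agree everywhere except 0x80-0x9F, where ISO-8859-1 has invisible control characters and Windows-1252 has printable characters such as curly quotes and the euro sign. A short Python check (an illustration, not part of the RSS Style):

```python
from __future__ import annotations


def differing_bytes() -> list[int]:
    """Byte values that decode differently under ISO-8859-1 and Windows-1252."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("iso-8859-1")
        # Five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in
        # Windows-1252; "replace" turns them into U+FFFD instead of raising.
        cp1252 = raw.decode("cp1252", errors="replace")
        if latin1 != cp1252:
            diffs.append(value)
    return diffs


if __name__ == "__main__":
    diffs = differing_bytes()
    print(f"{len(diffs)} bytes differ: 0x{diffs[0]:02X}..0x{diffs[-1]:02X}")
    print(bytes([0xE6, 0xF8, 0xE5]).decode("cp1252"))  # æøå in both encodings
    print(bytes([0x80]).decode("cp1252"))               # € only in Windows-1252
```

The Norwegian letters sit outside that range, which is consistent with the ISO-8859-1 declaration being enough for this feed.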

I don't always "agree" with some of the best-practices guidelines, but I will post the full code in this thread soon for testing. Would you have time to try it and report back? Whenever I use external modules, I worry that they may not be installed on everyone's servers.

Thanks for getting back to me on the Calendarscript template too. I'll keep an eye out in that forum to see how the language localization works with date/time functions. Basically what I did is substitute the local month and weekday names and abbreviations specified in your calendar settings for the strftime names which are in English.
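The localization idea can be sketched in Python. The Norwegian name lists below are hypothetical stand-ins for the names configured in the calendar settings. Rather than substituting into strftime's English output, this sketch indexes the lists by weekday and month number, which gives the same result without depending on the server's locale. Note that RSS 2.0's pubDate uses RFC 822 dates with English day and month abbreviations, so localized names like these belong in displayed text rather than in pubDate.

```python
from datetime import date

# Hypothetical local names standing in for those configured in the calendar
# settings (Norwegian shown here).
MONTHS_NO = ["januar", "februar", "mars", "april", "mai", "juni", "juli",
             "august", "september", "oktober", "november", "desember"]
WEEKDAYS_NO = ["mandag", "tirsdag", "onsdag", "torsdag", "fredag",
               "lørdag", "søndag"]  # date.weekday(): Monday == 0


def local_date(d: date) -> str:
    """Format a date similar to '%A %d. %B %Y' (day not zero-padded),
    using the local name lists instead of strftime's English names."""
    return f"{WEEKDAYS_NO[d.weekday()]} {d.day}. {MONTHS_NO[d.month - 1]} {d.year}"


if __name__ == "__main__":
    print(local_date(date(2007, 11, 20)))  # tirsdag 20. november 2007
```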

You sound very knowledgeable on this stuff, so I'll welcome any thoughts or suggestions you have about RSS for either of these scripts.

Postby Tanus » Sat Nov 24, 2007 11:44 am

ISO-8859-1 at least seems to be the easiest way of solving the problem. However, I have now added the following lines to the preserve1252 routine and have got the description part of the feed right.

Code:
    $string =~ s/\xE6/&#xE6;/sg; #æ
    $string =~ s/\xF8/&#xF8;/sg; #ø
    $string =~ s/\xE5/&#xE5;/sg; #å
    $string =~ s/\xC6/&#xC6;/sg; #Æ
    $string =~ s/\xD8/&#xD8;/sg; #Ø
    $string =~ s/\xC5/&#xC5;/sg; #Å

It puzzles me somewhat, as I am really only adding the ampersand...
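A likely explanation for the puzzle: a numeric character reference such as `&#xE6;` names a Unicode code point, and ISO-8859-1 byte values 0x00-0xFF coincide with Unicode code points U+0000-U+00FF, so the hex digits stay the same and only the reference syntax is added. Because the result is pure ASCII, it parses correctly whatever encoding the file declares. A Python sketch of the same idea (the helper names are hypothetical; it assumes the ASCII part of the text is already XML-safe):

```python
import xml.etree.ElementTree as ET


def to_char_refs(text: str) -> str:
    """Replace each non-ASCII character with a hex character reference.

    Assumes the ASCII part is already XML-safe (no bare & or <).
    """
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


def title_text(fragment: str, declared_encoding: str) -> str:
    """Parse <title>fragment</title> under the given XML declaration."""
    doc = f'<?xml version="1.0" encoding="{declared_encoding}"?><title>{fragment}</title>'
    return ET.fromstring(doc.encode("ascii")).text


if __name__ == "__main__":
    refs = to_char_refs("blåbær")
    print(refs)                             # bl&#xE5;b&#xE6;r
    print(title_text(refs, "UTF-8"))        # blåbær
    print(title_text(refs, "ISO-8859-1"))   # blåbær
```

Python's built-in `"blåbær".encode("ascii", "xmlcharrefreplace")` does the same with decimal references (`bl&#229;b&#230;r`). The byte-equals-code-point shortcut does not hold for Windows-1252 bytes 0x80-0x9F; for example, 0x80 is the euro sign, U+20AC.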

I still struggle with the title-part, but haven't really given it a decent try.

If you update the style, give me a couple of weeks and I think I should be able to test it. Just make sure that I am notified (for instance, in this thread). I only have Windows XP on my computers, but with both Norwegian and English (UK) setups, and a couple of different browsers. (Until now I have only worked on the feed problem from the Norwegian one, with IE7 installed.) I am using a server through an ISP; it is a Linux server running Apache. (edited)

I have an idea for the feed: I would like to link to a page that looks like my main index page and shows the full text of the requested feed item. I have prototyped it and tested my theory. Since I am using HTML with frames, I think I will need to put a link to viewnews (with a different template) and the wanted item in the frame definition. To achieve this, I am thinking of creating a new Field definition, something like <Field: Call-called-url-again Tmpl: xxx>.

I will start a separate thread on this.