The Crystal Hall
Non-ASCII characters in stories [message #23281] Fri, 23 January 2009 20:54
CNash
Messages: 73
Registered: September 2008
I've noticed that while most of the stories are encoded in regular ASCII/ANSI Windows format, there are some that are UTF-8 (Unicode) or otherwise use non-ASCII characters like the "curly quotes" that don't render correctly in editors and readers that aren't set up to handle them (showing up either as garbage characters or question marks). As I'm using a text reader program that doesn't render non-ASCII characters, I have to run each story through a converter program before I can read it properly.

I really don't mean to complain when all of this is presented for free (and I certainly wouldn't expect anyone to go back through stories and change them to suit my whims!), but if it's not too much trouble, could future stories be uploaded with only ASCII-standard characters?
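A minimal sketch of the kind of conversion described above, assuming the story text has already been decoded correctly. The `ASCII_FALLBACKS` table and the `to_ascii` name are hypothetical, not taken from any converter mentioned in this thread.

```python
# Hypothetical fallback table; extend it for whatever characters the stories use.
ASCII_FALLBACKS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}


def to_ascii(text: str) -> str:
    """Swap common typographic characters for ASCII, then mark leftovers with '?'."""
    table = str.maketrans(ASCII_FALLBACKS)
    return text.translate(table).encode("ascii", errors="replace").decode("ascii")


print(to_ascii("\u201cDad,\u201d he said\u2026"))
```

Anything not covered by the table still comes out as a question mark, so the table is where the real work is.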


There are worlds where the sky is burning, and the sea's asleep, and the rivers dream. People made of smoke, and cities made of song. Somewhere there's danger, somewhere there's injustice, and somewhere else the tea's getting cold...
Re: Non-ASCII characters in stories [message #23284 is a reply to message #23281] Fri, 23 January 2009 21:00
oljak.eru
Messages: 1341
Registered: December 2008
There's an easy solution to this - write a script that automatically converts any new stories from Windows-1252 into UTF-8 and in the process removes the Windows-1252 Content-Type meta element. Run it through all contents of the site once, then just have it run on any new content as it's added to the site.
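A rough sketch of such a script, assuming each story is a single HTML file whose bytes really are Windows-1252. The function name `cp1252_to_utf8` and the regular expression are illustrative only; a real site script would also need to walk the story directory and write the files back.

```python
import re

# Matches a <meta http-equiv="Content-Type" ...> element (assumed form of the
# charset declaration in the story files).
META_CHARSET = re.compile(
    rb"<meta[^>]*http-equiv=[\"']?content-type[\"']?[^>]*>\s*",
    re.IGNORECASE,
)


def cp1252_to_utf8(raw: bytes) -> bytes:
    """Drop the Content-Type meta element, then re-encode cp1252 bytes as UTF-8."""
    without_meta = META_CHARSET.sub(b"", raw)
    # Raises UnicodeDecodeError on the few byte values cp1252 leaves undefined.
    return without_meta.decode("cp1252").encode("utf-8")


page = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=windows-1252"></head>'
    b"<body>\x93Dad,\x94</body></html>"
)
print(cp1252_to_utf8(page).decode("utf-8"))
```

Running it twice on the same file would corrupt the text, so a real version should record which stories have already been converted.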


“I am SO level-headed! And anyone who says different is going to have to answer to... The CABBIT OF DOOM!” -Jade
Re: Non-ASCII characters in stories [message #23293 is a reply to message #23281] Fri, 23 January 2009 22:44
Goldie_HUnter
Messages: 192
Registered: December 2006

I find that when I post my stories, I have saved them in plain .txt format and it seems to be better. I found out on the first part of my story that there were funky symbols that required changing the encoding in Firefox to a different one before it would come out correctly. But for the most part, things come cleanly when I d/l other people's stories.

goldie


I'm not a complete idiot - - Some parts are just missing.
Re: Non-ASCII characters in stories [message #23325 is a reply to message #23281] Sat, 24 January 2009 08:53
Kristin Darken
Messages: 567
Registered: January 2005
Location: California


The down side to using pure text is that it significantly limits formatting options that vastly improve readability. Font size changes, italics, bold, and so on are all frequently used in stories written for 'standard' publication. Without the ability to use italics, for example, telepathy (or subspace communicators) suddenly have to be explained and a standard combination of punctuation used to delineate where it starts and stops.

It's web publication and for the most part, it's written to take advantage of the format options available... some of the stories in the past have even had embedded graphics and so on. Forcing it to pure text is a bit like taking a paperback novel and tearing out all the paper that doesn't have words printed on it. It's the same book... it's probably still readable, but it might not give the same impression or read exactly the same. Any good tech or textbook writer (and a lot of the rest of us) knows that the white space can be just as important as the words.




Kristin Darken

Once upon a time...
Re: Non-ASCII characters in stories [message #23327 is a reply to message #23325] Sat, 24 January 2009 09:13
Warren
Messages: 1555
Registered: January 2005
Location: Wet wonderful Washington

I've gone around on the formatting issue previously. The last time was for vision impaired people and section breaks.

I am formatting for the most readers' use. Admittedly, I mess up occasionally and don't run the story file through HTML Tidy.

But the stats are available here:
http://www.crystalhall.org/sitestats/

You can see the difference in who's using what by checking out the settings section. By default it's only showing data for ONE day.


Sometimes writing with geeks is like eating Jello with a chainsaw. Interesting but painful.
Re: Non-ASCII characters in stories [message #23440 is a reply to message #23281] Sun, 25 January 2009 17:19
CNash
Messages: 73
Registered: September 2008
I hadn't considered formatting like italics or bold because... well, I didn't actually know that the stories had it, as I read them all in plain text format and thus never see it! :)

But I do see your point, and as I said, I wouldn't want anyone to do unreasonable amounts of work for the benefit of a minority (which might have only one member!). I'll continue as I've been doing.

One thing, though - double-spaces between words seem to occasionally show up as a special character and produce question marks. According to NoteTab, which I'm using to batch-convert the stories from HTML to text and replace curly quotes etc., the character converts to ASCII and produces this:

á

I can filter this, but I'm not sure why it's there in the first place. Really, I'm a bit of a newbie when it comes to character sets...
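One plausible explanation, not confirmed for NoteTab: the non-breaking space is byte 0xA0 in Windows-1252, and the same byte is "á" in the old DOS code page 437. The sketch below only shows that the two code pages disagree about that byte; it assumes nothing about how the converter actually works.

```python
NBSP_BYTE = b"\xa0"

# The same single byte, read under two different code pages.
as_windows = NBSP_BYTE.decode("cp1252")  # non-breaking space, U+00A0
as_dos = NBSP_BYTE.decode("cp437")       # a with acute accent, U+00E1

print(repr(as_windows), repr(as_dos))
```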


Re: Non-ASCII characters in stories [message #23443 is a reply to message #23440] Sun, 25 January 2009 17:49
storyreader2005
Messages: 88
Registered: July 2005
Location: Ohio
Quote:

One thing, though - double-spaces between words seem to occasionally show up as a special character and produce question marks.

One of the characters in that double space is a "non-breaking space", written as &nbsp; in HTML code.

That is because two regular spaces in a row do not show up in an HTML page; browsers display only a single space.
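A small sketch of that behaviour, assuming a simplified model in which a browser collapses runs of ordinary whitespace but leaves the non-breaking space alone. `collapse_like_browser` is a made-up helper, not a real browser API.

```python
import html
import re


def collapse_like_browser(text: str) -> str:
    """Collapse runs of ordinary whitespace to one space; U+00A0 is left untouched."""
    return re.sub(r"[ \t\r\n\f]+", " ", text)


two_plain = collapse_like_browser("sentence.  Next")
with_nbsp = collapse_like_browser(html.unescape("sentence.&nbsp; Next"))

print(repr(two_plain))   # only one space survives
print(repr(with_nbsp))   # the non-breaking space plus one regular space survive
```

When converting to plain text, replacing "\xa0" with an ordinary space after unescaping keeps the double spacing without the special character.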
Re: Non-ASCII characters in stories [message #23565 is a reply to message #23281] Mon, 26 January 2009 13:21
Rabiata
Messages: 521
Registered: July 2008
Location: Germany
About formatting:

HTML formatting is not the problem here, as the HTML tags like <i>sometext</i> (italics in this case) consist of ASCII characters that match in all the common character encodings. I have yet to see a browser that fails to understand these.

The real problem is that ASCII only defines the character codes 0-127, and character encodings frequently differ for the codes 128-255. That leads to a mess that has not been fully cleaned up yet through standardization.
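To illustrate the point, here is a short sketch comparing three single-byte encodings chosen purely as examples (Windows-1252, ISO-8859-1, and DOS code page 437). The tag bytes decode identically everywhere; one byte above 127 does not.

```python
ASCII_BYTES = b"<i>sometext</i>"   # every byte is below 128
HIGH_BYTE = b"\x93"                # one byte in the 128-255 range

encodings = ["cp1252", "latin-1", "cp437"]

ascii_views = {enc: ASCII_BYTES.decode(enc) for enc in encodings}
high_views = {enc: HIGH_BYTE.decode(enc) for enc in encodings}

for enc in encodings:
    print(enc, ascii_views[enc], repr(high_views[enc]))
```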
Re: Non-ASCII characters in stories [message #23567 is a reply to message #23565] Mon, 26 January 2009 13:33
oljak.eru
Messages: 1341
Registered: December 2008
Rabiata wrote on Mon, 26 January 2009 19:21

The real problem is that ASCII only defines the character codes 0-127, and character encodings frequently differ for the codes 128-255. That leads to a mess that has not been fully cleaned up yet through standardization.
We have a solution that works for nearly any situation - Unicode. Just encode the content of the HTML in UTF-8, then attach an HTTP Content-Type header that tells the browser it's encoded in UTF-8.

The problem here is that these documents are encoded using Windows-1252 but the server is attaching an HTTP Content-Type header that tells the browser the document is UTF-8. But every character in the document is available in the UTF-8 encoding, so the easiest way to fix the problem is to encode the document in UTF-8 instead. There's probably a setting in whatever program the canon authors use to generate UTF-8 instead of Windows-1252, which should solve the problem. Another way to solve the problem is to script an encoding conversion that happens automatically when stories are uploaded.
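The mismatch can be reproduced in a few lines. This is only a sketch of the decoding step, with the header shown as a plain string; it does not talk to any server, and the sample text is made up.

```python
story = "\u201cDad,\u201d"  # "Dad," wrapped in curly quotes

# What happens now: cp1252 bytes on disk, but the reader is told to expect UTF-8.
as_stored = story.encode("cp1252")
misread = as_stored.decode("utf-8", errors="replace")

# The suggested fix: store UTF-8 so the bytes match the declared charset.
header = "Content-Type: text/html; charset=utf-8"
fixed = story.encode("utf-8").decode("utf-8")

print(repr(misread))  # curly quotes become replacement characters
print(repr(fixed))    # round-trips cleanly
```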


Re: Non-ASCII characters in stories [message #23569 is a reply to message #23281] Mon, 26 January 2009 13:44
Warren
Messages: 1555
Registered: January 2005
Location: Wet wonderful Washington

Simple fact of the matter is that I forget sometimes to run a story through HTML Tidy, which converts “Dad,”

into

 & #8220;Dad,& #8221;


Spaced apart so it would show correctly. If I left it closed up (i.e., the & next to the #), it would read the character code and show the smart quotes. It also applies to the apostrophe and single quotes.
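The same output can be reproduced with Python's built-in `xmlcharrefreplace` error handler. This is only a sketch of the end result, not a description of how HTML Tidy works internally; `to_numeric_entities` is a made-up name.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("\u201cDad,\u201d"))
```

Because the result is pure ASCII, it reads the same no matter which character set the server or browser assumes.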

Microsoft Word relies on the font to show which goes where and converts on the fly to smart quotes instead of using " and '.

When I convert from a document or RTF, I have to save it as a filtered HTML file to get most of the Microsoft formatting removed. At this point it's using & ldquo; and & rdquo; (again spaced apart) for left and right quotes, which don't show up correctly in all browsers.

Then, after I edit the file in Dreamweaver to incorporate section breaks and the other CSS formatting used, I try to remember to run the whole thing through HTML Tidy, which always gets stuff I missed.

After updating various story listing pages on the site, I upload the updated pages and the new story page to the site.


As I've said before, "I'm shooting to make the most people happy." If you have vision problems requiring a reader, convert to a palm reader format or convert the file to an audio file to listen to while driving. I'm sorry if it doesn't meet your standards. I can't make EVERYONE happy. So I try to make as many as possible happy by reducing the formatting to very minimal, which allows you to override it at your browser.


Re: Non-ASCII characters in stories [message #23570 is a reply to message #23567] Mon, 26 January 2009 13:49
XaltatunOfAcheron
Messages: 1930
Registered: July 2005
Location: Atlantis
oljak.eru wrote on Mon, 26 January 2009 11:33

Rabiata wrote on Mon, 26 January 2009 19:21

The real problem is that ASCII only defines the character codes 0-127, and character encodings frequently differ for the codes 128-255. That leads to a mess that has not been fully cleaned up yet through standardization.
We have a solution that works for nearly any situation - Unicode. Just encode the content of the HTML in UTF-8, then attach an HTTP Content-Type header that tells the browser it's encoded in UTF-8.

The problem here is that these documents are encoded using Windows-1252 but the server is attaching an HTTP Content-Type header that tells the browser the document is UTF-8. But every character in the document is available in the UTF-8 encoding, so the easiest way to fix the problem is to encode the document in UTF-8 instead. There's probably a setting in whatever program the canon authors use to generate UTF-8 instead of Windows-1252, which should solve the problem. Another way to solve the problem is to script an encoding conversion that happens automatically when stories are uploaded.


Actually, Warren has been handling the issue quite well - it's just that he forgot to run the story through HTMLTidy the last couple of times. This happens, especially when you've got an elderly relative who's on death's doorstep.

Bob Arnold runs the server (not Warren), and he's the guy who needs to look into changing the character type it's putting out to Windows-1252. Since (almost) all the authors use Word or an equivalent to create their stories, this would simplify things all around. However, it's his call, and he may very well have considerations I don't know about.

Xaltatun



Oxymoron: Jumbo Shrimp
Impossible: Sustainable Growth
Re: Non-ASCII characters in stories [message #23768 is a reply to message #23281] Tue, 27 January 2009 18:10
CNash
Messages: 73
Registered: September 2008
Warren, it's really not that big of a deal. I apologize if I seemed insistent or annoying; as I've said, I really don't expect any changes to come from my whining! It's more idle curiosity on my part. I know it must seem like some upstart lurker is complaining for no real reason, but I didn't intend it that way.

