accented char problem #32885 17 Jun 20 01:26 AM
Jorge Tavares - UmZero (OP)
Member
Joined: Jun 2001
Posts: 3,376
Hi Jack,

I'm having a problem with the Portuguese character ú, which comes out as ûû.
I could have posted this in the Printing topic, but I've also seen it in the browser when sending data via CGI, so I don't think it's related to printing.
Some additional details:
1. All other accented characters are fine.
2. It displays correctly in VUE, as well as in Excel, where it was originally written.

Any suggestions?

Thanks


Jorge Tavares

UmZero - SoftwareHouse
Brasil/Portugal
Re: accented char problem [Re: Jorge Tavares - UmZero] #32886 17 Jun 20 04:31 AM
Jack McGregor
Member
Joined: Jun 2001
Posts: 11,645
Are we talking about the Latin 1 chr(250)?
I could believe that there might be a special issue with it, since it is used as part of a sequence for defining exitcodes, i.e. chr(7)+chr(250)+"123.", which is why you often see that character in the CmdLib column of control dumps.
But I'm having trouble reproducing the problem.
Here for example is a simple program outputting a message containing that character in three ways...

Code
++include'once ashinc:ashell.def

    ? tab(-1,0);"¿Qué dices tú?"
    
    tprint tab(3,1);"¿Qué dices tú?"
    
    xcall AUI, AUI_CONTROL, CTLOP_ADD, "txt", "¿Qué dices tú?", MBST_ENABLE, &
        MBF_STATIC, NUL_CMD$, NUL_FUNC$, NUL_CSTATUS, &
        5,10,6,30, NUL_FGC, NUL_BGC, &
        NUL_FONTATTR, 200
    end

[Linked Image]

Do you get the problem with this simple program? If so, it would seem to be some kind of configuration issue (miame.ini?) or environment issue (language/regional settings?).

Re: accented char problem [Re: Jorge Tavares - UmZero] #32889 17 Jun 20 12:48 PM
Jorge Tavares - UmZero (OP)
Good morning Jack,
(This is my second reply; a lost connection sent the first one into a black hole.)

The virtue of simple things is that they remove the surrounding noise.
I didn't go into much detail in my initial post and, obviously, your simple test couldn't fail here.
But your simple test helped me untangle the amalgam in my code and find the culprit, which has nothing to do with printing or displaying data, but with assigning variables.

map1 x$,x,0
dimx $item, ordmap(varstr; varstr)
x$ = "Número"
$item(" ") = x$
? $item(" ")
result:
Nûûmero
solution (use varx):
dimx $item, ordmap(varstr; varx)

To complete the example, strange as it may be: giving the x variable a fixed size results in the output being padded with ú.
map1 x$,x,10
...
result:
Nûûmeroúúúú

I've tried every possible assignment from the x variable to strings, and all of them were fine except the ordmap varstr case, so maybe you'll want to fix it, even though varx is what should be used here, no?

I have a related topic, but I'll post it in the proper place.

Thanks for lighting the way to the solution.
(And now, a copy before I click Post ;-)


Jorge Tavares

UmZero - SoftwareHouse
Brasil/Portugal
Re: accented char problem [Re: Jorge Tavares - UmZero] #32893 17 Jun 20 04:31 PM
Jack McGregor
This hopefully doesn't make you feel any better, but I've had that same experience too many times to count. (Each time I try to remember to copy and paste a complicated post to notepad first, but naturally you don't think of that until too late.) Sometimes however, I'm pleasantly surprised to find that I can use the back-arrow in the browser to recover the un-submitted text.

Anyway, back to the real problem. It's actually a pretty good brain teaser, but you essentially solved it.

The underlying explanation is a bit convoluted though, and in the process of reviewing it, I think I've come up with a minor refinement which will reduce the chance of this kind of confusion happening again.

Here's the issue:

The underlying ordmap data structure doesn't currently support variable-length binary data, i.e. it only supports strings. (It could potentially support fixed-length binary data, but that would require a separate source code template for each size, which isn't practical given the way we're currently using it.) So the workaround was to internally convert binary ("blob" or X) data to string when storing it in a map. This is the real difference between ordmap(varstr;varstr) and ordmap(varstr;varx): the latter format triggers a "Blob2String" and "String2Blob" conversion when going in and out of the map, respectively.

There are lots of ways to do this, but since the only character that's really a problem when treating binary data as a string is the null byte, it seemed overkill (from both CPU and memory overhead perspectives) to use something like MIME (3:4) or hex (1:2) encoding, so I came up with my own highly optimized conversion that leaves 252 of the 256 possible byte values alone. Null was being converted to chr(250), and then a real chr(250) was being "escaped" in a scheme that also involved chr(251) and chr(252). I chose those characters after a simple statistical analysis suggested they were the least common characters to appear in a representative sample of data files. But I didn't think about the fact that they were accented "u" characters, partly because I didn't think it made any difference, since the conversion should be transparent.
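
For readers who want to see the idea in concrete form, here is a minimal Python sketch of a null-free byte escaping of the kind described above. It is not A-Shell's actual implementation: the function names are hypothetical, and only two mappings are taken from this thread (null becomes chr(250), and a real chr(250) becomes chr(251)+chr(251), as inferred from the "ú" to "ûû" output). The other two escape pairs are invented for illustration.

```python
NUL_MARK = 250  # stands in for a null byte (stated in the post)
ESC = 251       # escape prefix (assumption)

# Second byte of each two-byte escape pair. Only 250 -> (251, 251) is
# inferred from the thread; the pairs for 251 and 252 are made up.
_ESCAPES = {250: 251, 251: 252, 252: 250}
_UNESCAPES = {second: original for original, second in _ESCAPES.items()}


def blob_to_string(blob: bytes) -> bytes:
    """Encode arbitrary bytes so the result contains no null byte."""
    out = bytearray()
    for b in blob:
        if b == 0:
            out.append(NUL_MARK)
        elif b in _ESCAPES:
            out += bytes([ESC, _ESCAPES[b]])
        else:
            out.append(b)  # all other byte values pass through unchanged
    return bytes(out)


def string_to_blob(s: bytes) -> bytes:
    """Reverse blob_to_string()."""
    out = bytearray()
    i = 0
    while i < len(s):
        b = s[i]
        if b == NUL_MARK:
            out.append(0)
        elif b == ESC and i + 1 < len(s) and s[i + 1] in _UNESCAPES:
            out.append(_UNESCAPES[s[i + 1]])
            i += 1  # consume the second byte of the escape pair
        else:
            out.append(b)
        i += 1
    return bytes(out)


encoded = blob_to_string("Número".encode("latin-1"))
print(encoded.decode("latin-1"))  # Nûûmero
```

Viewed as Latin-1 text, the encoded form shows exactly the doubled û from the original report, which is harmless only as long as the reverse conversion is always applied on the way out.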

But, you managed to find the crack in that scheme, relying on the natural overlap/confusion between S and X types.

When assigning to an ordmap, if the source is not string, A-Shell is stringifying it, i.e. using Blob2String(src) on it, regardless of whether the destination is ordmap(varstr;varstr) or ordmap(varstr;varx). My reasoning was that since the ordmap can only store string data, there's no harm in doing the conversion, and if we don't we risk failing to store the entire data due to embedded nulls. But I may not have thought that completely through, as your example points out, since when reading data out of an ordmap, A-Shell is only doing the String2Blob() reverse conversion if it's an ordmap(varstr;varx). The result is what you've uncovered, i.e. that copying X data into a map results in the blob-to-string conversion, but copying data out of a map only triggers the reverse string-to-blob conversion if the map was (varstr;varx).
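
The asymmetry described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `SketchOrdmap` and its methods are hypothetical names, not an A-Shell API, and the stand-in codec covers just enough of the escaping (null to chr(250), chr(250) to chr(251)+chr(251), plus an invented pair for chr(251)) to reproduce the output reported earlier in the thread.

```python
import re

# Stand-in codec; details beyond the null and chr(250) mappings are assumed.
_ENC = {0: b"\xfa", 250: b"\xfb\xfb", 251: b"\xfb\xfc"}
_DEC = {b"\xfa": b"\x00", b"\xfb\xfb": b"\xfa", b"\xfb\xfc": b"\xfb"}


def blob_to_string(blob: bytes) -> bytes:
    return b"".join(_ENC.get(b, bytes([b])) for b in blob)


def string_to_blob(s: bytes) -> bytes:
    return re.sub(rb"\xfb[\xfb\xfc]|\xfa", lambda m: _DEC[m.group(0)], s)


class SketchOrdmap:
    """Toy model of the store/fetch behaviour described in the post."""

    def __init__(self, value_type: str):
        self.value_type = value_type  # "varstr" or "varx"
        self._items = {}

    def store(self, key: str, value: bytes, source_is_x: bool) -> None:
        # A non-string (X) source is stringified regardless of the map type.
        self._items[key] = blob_to_string(value) if source_is_x else value

    def fetch(self, key: str) -> bytes:
        raw = self._items[key]
        # The reverse conversion only happens for (varstr;varx) maps.
        return string_to_blob(raw) if self.value_type == "varx" else raw


# An X variable of size 10 holding "Número" (6 bytes) plus 4 null bytes.
x = "Número".encode("latin-1") + b"\x00" * 4

as_varstr = SketchOrdmap("varstr")
as_varstr.store(" ", x, source_is_x=True)
print(as_varstr.fetch(" ").decode("latin-1"))  # Nûûmeroúúúú

as_varx = SketchOrdmap("varx")
as_varx.store(" ", x, source_is_x=True)
print(as_varx.fetch(" ") == x)  # True
```

In this model, either change to the varstr case (skipping the conversion in `store`, or always reversing it in `fetch`) makes the round trip symmetric again.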

I think this calls for one or both of the following refinements:
  • Either stop using Blob2String(src) when copying to ordmap(varstr;varstr), or always use the reverse conversion when copying out of the map.
  • Change the special characters used in the string/blob conversion to ones less likely to occur in accented text (e.g., division sign, multiplication sign, feminine ordinal sign).


Now that you've solved your problem, I guess there's no immediate urgency, but I suspect the issue will surface again eventually, so it should probably be dealt with.

Re: accented char problem [Re: Jorge Tavares - UmZero] #32901 17 Jun 20 06:56 PM
Jorge Tavares - UmZero (OP)
Thank you very much for the details, always interesting and useful to help understand and fix some issues.

As for the browser incident, unfortunately, the back arrow didn't work this time, it just gave me the first paragraph :-)

Abraço (best regards)


Jorge Tavares

UmZero - SoftwareHouse
Brasil/Portugal

