Ideal Web National Language Considerations

Products

Datacom, DATACOM - AD, Ideal, Datacom/AD, Datacom/DB, Datacom/Server

Issue/Introduction

Mainframes use EBCDIC and Web browsers use ASCII. How is the translation between the two handled in an Ideal/Web environment, and what extra considerations apply to web-based applications?

Environment

Release: Ideal for Datacom

Resolution

Note first that the header of an HTTP request is always pure US-ASCII: every character falls in the range x'00' to x'7F', which is identical in all ASCII-based code pages. This portion of a request can therefore always be translated to EBCDIC with the same translate table, and the CICS Web Interface (CWI) does this before passing the header information to the Analyzer exit. If the URL contains any characters outside that range, the browser has escaped them as "%xx", where xx is the hexadecimal value of the character in the customer's code page. Many characters within the 0-127 range are also escaped this way to avoid conflicts with the parsing of the request.
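The two steps described above (undoing the "%xx" escapes, then translating to EBCDIC) can be illustrated with a short Python sketch. Python is used purely for illustration and is not part of Ideal or CICS. The function name and the choice of ISO-8859-1 on the browser side and EBCDIC code page 037 on the host side are assumptions made for the example.

```python
from urllib.parse import unquote

def url_to_ebcdic(raw_path, client_charset="iso-8859-1", host_codepage="cp037"):
    """Percent-decode a URL path, then translate the result to EBCDIC.

    Hypothetical helper: the code page names are assumptions, not values
    taken from CICS or Ideal.
    """
    # The raw path is pure US-ASCII; each "%xx" escape carries one byte
    # expressed in the browser's code page.
    text = unquote(raw_path, encoding=client_charset, errors="strict")
    return text.encode(host_codepage)

# Header text is plain US-ASCII, so a single table is enough to translate it.
method_ebcdic = "GET".encode("cp037")

# The same escape means different characters in different code pages.
latin1_char = unquote("%B3", encoding="iso-8859-1")  # superscript three
latin2_char = unquote("%B3", encoding="iso-8859-2")  # l with stroke
```

The last two lines show why the code page matters: the escape "%B3" only becomes a character once a code page has been chosen.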

The body of the request may be encoded in several ways, depending on the content. The Content-Encoding header field describes an encoding applied to the entire body, and the body can also be split into multiple sections, each with its own encoding. Any such decoding must be done before the content's character set comes into play. Normally the request does not use any encoding, and CWI does nothing in this area, but sites using multi-byte character sets will have additional work to do here.
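The ordering requirement (undo the content encoding first, apply the character set second) can be sketched as follows. This is an illustrative Python sketch, not CWI or Ideal code; the function name, the set of encodings handled, and the ISO-8859-1 default are assumptions.

```python
import gzip
import zlib

def decode_body(raw, content_encoding=None, charset="iso-8859-1"):
    """Undo the Content-Encoding first, then interpret bytes as characters."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        raw = zlib.decompress(raw)
    elif encoding != "identity":
        raise ValueError("unsupported Content-Encoding: " + encoding)
    # Only now is the character set meaningful.
    return raw.decode(charset)
```

Applying the character set before decompression would treat compressed bytes as text and corrupt the content, which is why the order is fixed.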

The Content-Type header field may contain a charset= subfield that declares the character set used in the request body. If it is omitted, the character set is assumed to be ISO-8859-1. The Broadcom-supplied samples work with ISO-8859-1, so they contain no logic to check for any other value. If this is not your native character set, you will need to take extra action to accommodate that: you can add code to the Analyzer to look for this information, and CICS will use it to select the translate table for converting the request body to EBCDIC.
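The kind of check an Analyzer would need can be sketched in Python as a rough guide to the logic, not as Analyzer code. The sketch extracts the charset= subfield, falls back to ISO-8859-1 when it is absent, and picks a host code page from a lookup table. The table contents and function names are hypothetical; a real site would use its own code page pairs.

```python
from email.message import Message

# Hypothetical pairing of browser-side character sets with EBCDIC code pages.
EBCDIC_FOR = {
    "iso-8859-1": "cp037",
    "iso-8859-7": "cp875",
    "iso-8859-9": "cp1026",
}

def body_charset(content_type, default="iso-8859-1"):
    """Return the declared charset in lowercase, or the default if absent."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)

def body_to_ebcdic(body, content_type):
    """Translate a request body to EBCDIC using the declared charset."""
    charset = body_charset(content_type)
    host_codepage = EBCDIC_FOR[charset]  # KeyError signals an unsupported charset
    return body.decode(charset).encode(host_codepage)
```

Raising an error for an unlisted character set is a deliberate choice in this sketch: silently using the wrong table would corrupt every character above x'7F'.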

HTTP POST requests from a browser also "URL-encode" the parameter data in the body, in the same way described earlier for characters in the URL: "%xx" sequences represent characters that are outside the standard range or that conflict with the HTTP syntax. These are translated to EBCDIC by a translate table in @I$IPOST when you use that routine to retrieve the parameter values, and you need to modify that table if your site does not use ISO-8859-1.
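The effect of using the wrong table can be seen in a small Python analogy. This sketch does not reproduce @I$IPOST; it only decodes a URL-encoded POST body with a caller-supplied character set, and the function name and sample field names are invented for the example.

```python
from urllib.parse import parse_qs

def post_params(body, charset="iso-8859-1"):
    """Decode a URL-encoded POST body using the given character set."""
    return parse_qs(body, keep_blank_values=True,
                    encoding=charset, errors="strict")

# "+" stands for a space and "%xx" for a byte in the browser's code page.
latin1 = post_params("name=Jos%E9+Garc%EDa&city=M%E1laga")

# The same byte decodes differently under another code page.
as_latin1 = post_params("n=%B3")
as_latin2 = post_params("n=%B3", "iso-8859-2")
```

Decoding "%B3" as ISO-8859-1 when the browser sent ISO-8859-2 yields a valid but wrong character, so the mismatch does not raise an error and is easy to miss.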

The same Content-Type header field is also present in the response, so an Ideal application that produces a Web page or XML document in a local character set should include the charset= declaration so that the browser or XML parser processes it correctly. All the sample code that Ideal provides relies on the default value of ISO-8859-1, so the declaration is omitted there. CICS uses the customer code page it picked up on the way in to translate the response body back from EBCDIC to ASCII.
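On the response side, the declaration and the bytes actually sent must agree. The following Python sketch illustrates that pairing; it is not Ideal or CICS code, and the function name and the text/html media type are assumptions.

```python
def build_response(page, charset="iso-8859-1"):
    """Encode a page and build headers that declare the same charset."""
    body = page.encode(charset)
    headers = {
        # The declared charset must match the encoding of the body bytes.
        "Content-Type": "text/html; charset=" + charset.upper(),
        "Content-Length": str(len(body)),
    }
    return headers, body
```

For example, `build_response("Łódź", "iso-8859-2")` produces a header declaring ISO-8859-2 and a body in that code page; without the declaration, a browser that assumes ISO-8859-1 would display different characters.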

Ideal applications that produce content containing URLs (such as the WEBDEMO6 sample) have to do their own URL-encoding to produce valid content. The sample code performs a simple translation of space to "+", because that was the only significant character in the data field involved. The same application handling names in other character sets would have to convert the additional characters involved, both outgoing as part of the link and incoming as a parameter to the next request.
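The difference between translating only the space and full URL-encoding can be sketched in Python. This is an illustration only, not the WEBDEMO6 code; the function names, the "/app" path, and the NAME parameter are invented for the example, and ISO-8859-1 is assumed as the code page.

```python
from urllib.parse import quote_plus

def link_param(value, charset="iso-8859-1"):
    """URL-encode one parameter value: space becomes "+", others "%xx"."""
    return quote_plus(value, encoding=charset)

def make_link(base, name, value, charset="iso-8859-1"):
    """Build a link carrying one encoded parameter (hypothetical helper)."""
    return base + "?" + name + "=" + link_param(value, charset)

# Translating only the space leaves "&" intact, which splits the parameter.
space_only = "A&B Co".replace(" ", "+")
full = link_param("A&B Co")
accented = make_link("/app", "NAME", "M\u00fcller")
```

The value encoded on the way out has to be decoded with the same code page when it comes back as a parameter on the next request.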