Text Encoding and Escaped Characters

Every so often someone reports a bug that AjaxMin is turning escaped characters into question marks, boxes, or some other invalid characters. If you are seeing this, IT IS NOT A BUG IN AJAXMIN, BUT SIMPLY A TEXT ENCODING MISMATCH ISSUE. AjaxMin outputs UTF-8 by default; if you require a different encoding, please use the -enc:out switch.
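To make the symptom concrete, here is a minimal Python sketch of what an encoding mismatch looks like. It has nothing to do with AjaxMin's internals; it only shows that the same UTF-8 bytes read back correctly when decoded as UTF-8 and turn into junk when a consumer assumes a different encoding. The sample string and the choice of Windows-1252 as the "wrong" encoding are assumptions made for illustration.

```python
# Illustration only: the bytes are fine, the reader's assumption is wrong.
original = 'content: "é";'
utf8_bytes = original.encode("utf-8")  # "é" becomes the two bytes C3 A9

# A consumer that assumes Windows-1252 turns each byte into its own character.
as_cp1252 = utf8_bytes.decode("cp1252")

# A consumer that assumes ASCII cannot map bytes >= 0x80 at all; with
# errors="replace" each such byte becomes U+FFFD, often drawn as "?" or a box.
as_ascii = utf8_bytes.decode("ascii", errors="replace")

# Decoding with the encoding that was actually used recovers the text.
round_trip = utf8_bytes.decode("utf-8")

print(as_cp1252)   # content: "Ã©";
print(as_ascii)
print(round_trip)  # content: "é";
```

If your garbled output resembles the first two lines, the bytes in the file are most likely correct and something downstream is decoding them with the wrong encoding.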

By default, AjaxMin expects input text to be in the UTF-8 text encoding, and it will output its results as UTF-8 with no Byte Order Mark (BOM). The Unicode Standard neither requires nor recommends the use of a BOM in UTF-8 files. If you require a BOM at the front of your UTF-8 output file, explicitly specify the UTF-8 output encoding with the -enc:out utf-8 switch.
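If you are unsure whether a given file carries a BOM, checking its first three bytes settles it. The following is a small Python sketch under the assumption that you just want a yes/no answer; the helper name has_utf8_bom is made up for this example and is not part of AjaxMin.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


text = 'var s = "é";'
without_bom = text.encode("utf-8")      # plain UTF-8, no BOM
with_bom = text.encode("utf-8-sig")     # same bytes with a BOM prepended

print(has_utf8_bom(without_bom))  # False
print(has_utf8_bom(with_bom))     # True
```

To inspect a real file, read it in binary mode and pass the first three bytes, for example has_utf8_bom(open(path, "rb").read(3)), where path is whatever output file you want to check.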

Because the UTF-8 text encoding can represent every character in the Unicode character set, there is no need to escape any character (for JavaScript: \uXXXX, and for CSS: \XXXXXX) in the default output files generated by AjaxMin. That does assume, however, that the server hosting your files correctly understands their encoding and correctly serves them as UTF-8. If your server is serving JavaScript and/or CSS files as ASCII, so that non-ASCII characters correctly encoded in UTF-8 become garbled when served to the client, simply specify the ASCII output encoding with the -enc:out ascii switch and AjaxMin will automatically escape all non-ASCII characters. UTF-8 is the de facto text encoding of the Internet, so if you are having this problem, it is preferable that you fix your server. However, one does not always have the ability to change server settings and/or encodings, so forcing AjaxMin output to ASCII is sometimes the safest solution.
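For readers who have not seen the two escape forms before, the sketch below shows in Python what escaping non-ASCII characters looks like for each language. This is not AjaxMin's implementation, only an approximation of the idea: the function names escape_js and escape_css are invented for this example, and it assumes the non-ASCII characters sit in places where escapes are legal, such as string literals.

```python
def escape_js(text: str) -> str:
    """Replace every non-ASCII character with JavaScript \\uXXXX escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            # Characters beyond U+FFFF need a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)


def escape_css(text: str) -> str:
    """Replace every non-ASCII character with a six-digit CSS \\XXXXXX escape."""
    # Padding to six hex digits means no terminating space is needed.
    return "".join(ch if ord(ch) < 0x80 else f"\\{ord(ch):06X}" for ch in text)


print(escape_js('var s = "é";'))          # var s = "\u00E9";
print(escape_css('content: "\uf000";'))   # content: "\00F000";
```

The result contains only ASCII bytes, so it reads the same whether the server labels the file as ASCII, UTF-8, or most other common encodings.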

And just for the record, YES: I have used and continue to use libraries such as font-awesome (which really is awesome) and bootstrap in my website code; and YES: they work just fine after being run through AjaxMin with the default UTF-8 encoding. Again, if you are experiencing issues with the UTF-8 encoded content properties in your AjaxMin output, the issue is with your server, or with some other processing step in your build process. You can either fix the server settings, change your build process, or just change the AjaxMin output to be encoded with the ASCII text encoding. If you continue to have problems, please create a new Discussion item, not a new Issue, as the problem is [most likely] not with AjaxMin per se.

One common build-process issue I’ve seen is the piping of AjaxMin output. Sometimes console pipes will treat the text streamed through them as ASCII, so when AjaxMin outputs UTF-8 encoded text, the pipe sends it to the next process garbled. Whenever piping output of AjaxMin to another process, it’s a good idea to use the ASCII output encoding unless you know for sure that your console pipes can support UTF-8 streams. I’ve also seen files generated by AjaxMin get concatenated together using console commands, like copy, that generate ASCII output files. If your input files are UTF-8 encoded but the console app/command encodes the output as ASCII, you may get garbled extended characters. Again, any kind of console manipulation of AjaxMin output in your build process should be scrutinized to make sure the processes support UTF-8, and if not, AjaxMin should be set to output as ASCII.
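If the concatenation step is the culprit, one alternative is to do the joining in a small script that names the encoding explicitly at both ends. The Python sketch below is one way to do that, not a recommendation from the AjaxMin project: the function name concat_utf8 and the newline separator are assumptions made for this example.

```python
from pathlib import Path


def concat_utf8(sources, destination):
    """Join text files, reading and writing UTF-8 explicitly."""
    parts = []
    for source in sources:
        # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present,
        # so a BOM never ends up in the middle of the combined file.
        parts.append(Path(source).read_text(encoding="utf-8-sig"))
    # newline="" writes the "\n" separators unchanged on every platform.
    with open(destination, "w", encoding="utf-8", newline="") as handle:
        handle.write("\n".join(parts))
```

A call such as concat_utf8(["a.min.css", "b.min.css"], "site.min.css") (file names are placeholders) writes the combined file as UTF-8 without a BOM. Because decoding is strict, an input that is not valid UTF-8 raises UnicodeDecodeError instead of silently producing garbled characters.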

Last edited Feb 25, 2014 at 4:53 PM by ronlo, version 3

Comments

gfox1984 Feb 9 at 7:28 AM 
How do you do that using the Minfier API in C#?