This small utility returns the character encoding of an input stream containing an XML document. The encoding is detected using the guidelines given in the W3C XML specification, and the method was designed to be as fast as possible, avoiding extensive string operations and regular expressions.
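To make the detection idea concrete, here is a minimal sketch of the byte-signature check described in the non-normative Appendix F of the XML 1.0 specification. It is not the code from detectEncoding.java: the class and method names are hypothetical, the returned names are the java.nio charset names, and the sketch stops short of reading the encoding="..." declaration that a complete detector would parse for ASCII-compatible documents.

```java
// Hypothetical illustration, not the contents of detectEncoding.java.
final class XmlPrefixGuess {

    /** Guesses an encoding from the first bytes of an XML document. */
    static String guess(byte[] p) {
        // Signatures with a byte order mark.
        if (startsWith(p, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(p, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        // Must be tested before the UTF-16LE mark, which is a prefix of it.
        if (startsWith(p, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(p, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(p, 0xFF, 0xFE)) return "UTF-16LE";

        // No byte order mark: look at how "<?" is laid out in memory.
        if (startsWith(p, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(p, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";

        // ASCII-compatible or unknown: a full detector would now read the
        // encoding declaration; this sketch simply assumes the XML default.
        return "UTF-8";
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Only byte comparisons are involved, which is in keeping with the stated goal of avoiding string operations and regular expressions.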
A sample usage scenario would be:
    InputStream in = ...;
    String encoding = detectEncoding(in);
    BufferedReader reader = new BufferedReader(new InputStreamReader(in, encoding));
From that point on you can happily read text from the input stream through the reader.
detectEncoding() takes a single parameter: an InputStream containing the data to be read. The stream must support mark()/reset(); if it does not, the caller should wrap it in a BufferedInputStream before invoking the method. After the call, the stream is positioned at the < character (that is, any byte order mark at the start of the stream has been skipped).
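The stream handling described above can be sketched as follows. This is an illustration under assumptions rather than the utility's own code: XmlStreamPrep, ensureMarkSupported and skipBom are hypothetical names, and only the UTF-8 and UTF-16 byte order marks are handled.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of detectEncoding.java.
final class XmlStreamPrep {

    /** Returns the stream itself if it supports mark()/reset(), else a buffered wrapper. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /**
     * Peeks at the first three bytes and consumes a UTF-8 or UTF-16 byte
     * order mark if one is present. Returns the number of bytes skipped.
     * The stream must support mark()/reset().
     */
    static int skipBom(InputStream in) throws IOException {
        in.mark(3);
        int b0 = in.read();
        int b1 = in.read();
        int b2 = in.read();
        in.reset(); // rewind so that nothing has been consumed yet

        int bomLength = 0;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            bomLength = 3; // UTF-8
        } else if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
            bomLength = 2; // UTF-16, big- or little-endian
        }
        for (int i = 0; i < bomLength; i++) {
            in.read(); // consume the mark byte by byte
        }
        return bomLength;
    }
}
```

A caller would pass the result of ensureMarkSupported(...) on to the detection call and keep using that same wrapped stream afterwards, since a BufferedInputStream may already have read ahead from the underlying stream.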
The method returns the detected encoding as a String, using the canonical name for the java.io API (see Supported Encodings).
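Java keeps two naming schemes for charsets, one for the java.io and java.lang APIs and one for java.nio, which is why the kind of name returned matters. The sketch below shows how to inspect both using standard JDK calls; the EncodingNames class and its method names are hypothetical and are not taken from detectEncoding.java.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only.
final class EncodingNames {

    /** Name reported by the java.io API for the given charset. */
    static String javaIoName(String charsetName) throws IOException {
        try (InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), charsetName)) {
            // getEncoding() must be called before the reader is closed.
            return reader.getEncoding();
        }
    }

    /** Canonical name used by the java.nio API for the given charset. */
    static String javaNioName(String charsetName) {
        return Charset.forName(charsetName).name();
    }
}
```

For example, javaIoName("UTF-8") yields "UTF8", while javaNioName("UTF8") yields "UTF-8". InputStreamReader accepts either spelling, so a name in the java.io form can be passed to it directly.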
detectEncoding.java (5 KB)