April 27, 2013

Telling whether a string is encoded in GBK or UTF-8

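import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {
    private static final Charset GBK = Charset.forName("GBK");

    // Returns the UTF-8 decoding if the bytes survive a UTF-8 round trip, else the GBK decoding.
    public static String guess(byte[] bytes) {
        String guess = new String(bytes, GBK);
        // Malformed UTF-8 decodes to U+FFFD, so re-encoding changes those bytes.
        byte[] roundTrip = new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        String verify = new String(roundTrip, GBK);
        if (!verify.equals(guess)) {
            return guess; // bytes changed: not valid UTF-8, assume GBK
        }
        return new String(bytes, StandardCharsets.UTF_8); // valid UTF-8
    }
}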

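This code guesses the encoding of a byte array. It decodes the bytes as UTF-8, encodes the result back to UTF-8, and then decodes both the original and the round-tripped bytes as GBK. Malformed UTF-8 turns into U+FFFD during decoding, so the round trip changes the bytes of anything that is not valid UTF-8. If the two GBK strings differ, the input is assumed to be GBK; otherwise it is treated as UTF-8.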
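The same decision ("valid UTF-8 means UTF-8, anything else means GBK") can be made more explicitly with a strict CharsetDecoder that reports malformed input instead of replacing it. This is a swapped-in technique, not the author's method; the class and method names StrictGuess, isValidUtf8 and guessStrict are hypothetical, and it assumes the JVM ships the GBK charset (OpenJDK does, but the Java spec only guarantees a few standard charsets):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictGuess {
    // Hypothetical helper: true if the bytes are well-formed UTF-8.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of inserting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Same rule as guess(): decode as UTF-8 if valid, otherwise fall back to GBK.
    public static String guessStrict(byte[] bytes) {
        return isValidUtf8(bytes)
                ? new String(bytes, StandardCharsets.UTF_8)
                : new String(bytes, Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        String str = "参数test";
        byte[] gbk = str.getBytes(Charset.forName("GBK"));
        byte[] utf8 = str.getBytes(StandardCharsets.UTF_8);
        System.out.println(guessStrict(gbk).equals(str));  // true
        System.out.println(guessStrict(utf8).equals(str)); // true
    }
}
```

Comparing bytes directly (or letting the decoder throw) avoids the extra GBK decoding step that the original uses only to compare the two byte arrays.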

// Test: the same text encoded as GBK and as UTF-8 should be guessed to the same string.
// Place this in the class that defines guess(); it needs java.nio.charset.Charset imported.
public static void main(String[] args) {
    String str = "参数test";
    byte[] gbkbytes = str.getBytes(Charset.forName("GBK"));
    byte[] utf8bytes = str.getBytes(Charset.forName("UTF-8"));
    System.out.println(guess(gbkbytes).equals(guess(utf8bytes)));
}

It prints true, so the test passes.
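To see why the test passes, one can compare each byte array with its own UTF-8 round trip. This is a small demo, not part of the original post; RoundTripDemo, utf8RoundTrip and hex are made-up names, and the GBK charset is assumed to be available:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // Decode as UTF-8, then encode again; malformed input comes back as EF BF BD (U+FFFD).
    public static byte[] utf8RoundTrip(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated uppercase hex, e.g. "74 65".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String str = "参数test";
        byte[] gbk = str.getBytes(Charset.forName("GBK"));
        byte[] utf8 = str.getBytes(StandardCharsets.UTF_8);
        // UTF-8 input survives the round trip unchanged.
        System.out.println(Arrays.equals(utf8, utf8RoundTrip(utf8))); // true
        // GBK input does not: its Chinese bytes are not well-formed UTF-8.
        System.out.println(Arrays.equals(gbk, utf8RoundTrip(gbk)));   // false
        System.out.println(hex(gbk) + "  ->  " + hex(utf8RoundTrip(gbk)));
    }
}
```

The last line shows the replacement bytes EF BF BD appearing where the GBK-encoded Chinese characters used to be.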

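This is a heuristic and is not guaranteed to be 100% correct: bytes written as GBK that also happen to form valid UTF-8 will be identified as UTF-8.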
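A concrete case: the two bytes 0xC2 0xA1 decode cleanly both as UTF-8 (U+00A1, the inverted exclamation mark) and as a GBK double-byte character, so guess() would return the UTF-8 reading even if the bytes came from GBK text. A minimal demo, not from the original post; the class name AmbiguousBytes is made up and the GBK charset is assumed to be available:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AmbiguousBytes {
    public static void main(String[] args) {
        // 0xC2 0xA1 is a valid two-byte UTF-8 sequence (U+00A1)
        // and also lies in the GBK double-byte range.
        byte[] bytes = {(byte) 0xC2, (byte) 0xA1};
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        String asGbk = new String(bytes, Charset.forName("GBK"));
        System.out.println(asUtf8.equals("\u00A1")); // true
        // The two readings differ, but nothing in the bytes says which one was meant.
        System.out.println(asUtf8.equals(asGbk));    // false
        System.out.println(asUtf8 + " vs " + asGbk);
    }
}
```

Short inputs are the most at risk; the longer the text, the less likely GBK bytes are to stay well-formed UTF-8 throughout.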