diff options
author | Hyeseong Kim <cometkim.kr@gmail.com> | 2018-05-01 01:54:11 +0900 |
---|---|---|
committer | Christopher Speller <crspeller@gmail.com> | 2018-04-30 09:54:11 -0700 |
commit | e73f1d73143ebba9c7e80d21c45bba9b61f2611c (patch) | |
tree | e84e85d12308ae5624643ff7768f55be45c3350c /vendor/golang.org/x/text/encoding/internal/identifier/identifier.go | |
parent | 9c5815ee41f29e27774d17382d9a4bd10d208545 (diff) | |
download | chat-e73f1d73143ebba9c7e80d21c45bba9b61f2611c.tar.gz chat-e73f1d73143ebba9c7e80d21c45bba9b61f2611c.tar.bz2 chat-e73f1d73143ebba9c7e80d21c45bba9b61f2611c.zip |
MM-9072/MM-10185 Force-convert the encoding of OpenGraph metadata to UTF-8 (#8631)
* Force-convert non-UTF8 HTML to UTF8 before opengraph processing
* Split the force-encoding function
* Add benchmark Test for the forceHTMLEncodingToUTF8()
```
Running tool: /home/comet/go-v1.9.2/bin/go test -benchmem -run=^$ github.com/mattermost/mattermost-server/app -bench ^BenchmarkForceHTMLEncodingToUTF8$
[03:32:58 KST 2018/04/21] [INFO] (github.com/mattermost/mattermost-server/app.TestMain:28) -test.run used, not creating temporary containers
goos: linux
goarch: amd64
pkg: github.com/mattermost/mattermost-server/app
BenchmarkForceHTMLEncodingToUTF8/with_converting-4 100000 11201 ns/op 18704 B/op 32 allocs/op
BenchmarkForceHTMLEncodingToUTF8/without_converting-4 300000 3931 ns/op 4632 B/op 13 allocs/op
PASS
ok github.com/mattermost/mattermost-server/app 2.703s
Success: Benchmarks passed.
```
* Remove an unnecessary constraint
* Add pre-check if content-type header is already utf-8
* Move the checking for utf-8 into forceHTMLEncodingToUTF8() for testing
* Revert df3f347213faa0d023c26d201fa6531f46391086..HEAD, without Gopkg.lock
Diffstat (limited to 'vendor/golang.org/x/text/encoding/internal/identifier/identifier.go')
-rw-r--r-- | vendor/golang.org/x/text/encoding/internal/identifier/identifier.go | 81 |
1 files changed, 81 insertions, 0 deletions
diff --git a/vendor/golang.org/x/text/encoding/internal/identifier/identifier.go b/vendor/golang.org/x/text/encoding/internal/identifier/identifier.go new file mode 100644 index 000000000..7351b4ef8 --- /dev/null +++ b/vendor/golang.org/x/text/encoding/internal/identifier/identifier.go @@ -0,0 +1,81 @@ +// Copyright 2015 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +//go:generate go run gen.go + +// Package identifier defines the contract between implementations of Encoding +// and Index by defining identifiers that uniquely identify standardized coded +// character sets (CCS) and character encoding schemes (CES), which we will +// together refer to as encodings, for which Encoding implementations provide +// converters to and from UTF-8. This package is typically only of concern to +// implementers of Indexes and Encodings. +// +// One part of the identifier is the MIB code, which is defined by IANA and +// uniquely identifies a CCS or CES. Each code is associated with data that +// references authorities, official documentation as well as aliases and MIME +// names. +// +// Not all CESs are covered by the IANA registry. The "other" string that is +// returned by ID can be used to identify other character sets or versions of +// existing ones. +// +// It is recommended that each package that provides a set of Encodings provide +// the All and Common variables to reference all supported encodings and +// commonly used subset. This allows Index implementations to include all +// available encodings without explicitly referencing or knowing about them. +package identifier + +// Note: this package is internal, but could be made public if there is a need +// for writing third-party Indexes and Encodings. + +// References: +// - http://source.icu-project.org/repos/icu/icu/trunk/source/data/mappings/convrtrs.txt +// - http://www.iana.org/assignments/character-sets/character-sets.xhtml +// - http://www.iana.org/assignments/ianacharset-mib/ianacharset-mib +// - http://www.ietf.org/rfc/rfc2978.txt +// - http://www.unicode.org/reports/tr22/ +// - http://www.w3.org/TR/encoding/ +// - https://encoding.spec.whatwg.org/ +// - https://encoding.spec.whatwg.org/encodings.json +// - https://tools.ietf.org/html/rfc6657#section-5 + +// Interface can be implemented by Encodings to define the CCS or CES for which +// it implements conversions. +type Interface interface { + // ID returns an encoding identifier. Exactly one of the mib and other + // values should be non-zero. + // + // In the usual case it is only necessary to indicate the MIB code. The + // other string can be used to specify encodings for which there is no MIB, + // such as "x-mac-dingbat". + // + // The other string may only contain the characters a-z, A-Z, 0-9, - and _. + ID() (mib MIB, other string) + + // NOTE: the restrictions on the encoding are to allow extending the syntax + // with additional information such as versions, vendors and other variants. +} + +// A MIB identifies an encoding. It is derived from the IANA MIB codes and adds +// some identifiers for some encodings that are not covered by the IANA +// standard. +// +// See http://www.iana.org/assignments/ianacharset-mib. +type MIB uint16 + +// These additional MIB types are not defined in IANA. They are added because +// they are common and defined within the text repo. +const ( + // Unofficial marks the start of encodings not registered by IANA. + Unofficial MIB = 10000 + iota + + // Replacement is the WhatWG replacement encoding. + Replacement + + // XUserDefined is the code for x-user-defined. + XUserDefined + + // MacintoshCyrillic is the code for x-mac-cyrillic. + MacintoshCyrillic +) |