Preserve BOM in web editor (#28935) (#28959)

Backport #28935 by @silverwind

The `ToUTF8*` functions were stripping the BOM, even though a BOM is valid
in UTF-8, so stripping must be optional depending on the use case. This
PR does the following:

- Add an options struct to all `ToUTF8*` functions that, by default,
strips the BOM to preserve existing behaviour
- Remove the `ToUTF8` function, which was dead code
- Rename `ToUTF8WithErr` to `ToUTF8`
- Preserve the BOM in the Monaco Editor
- Remove an unnecessary newline in the textarea value. Browsers seem to
ignore it, but it's better not to rely on this behaviour.

Fixes: https://github.com/go-gitea/gitea/issues/28743
Related: https://github.com/go-gitea/gitea/issues/6716, which seems to
have once introduced a mechanism that strips and re-adds the BOM, but
from what I can tell, this mechanism was removed at some point after
that PR.

Co-authored-by: silverwind <me@silverwind.io>
(cherry picked from commit b8e6cffd31)
Giteabot 2024-01-28 06:23:31 +08:00 committed by Earl Warren
parent afdce6bd45
commit b272224918
No known key found for this signature in database
GPG key ID: 0579CB2928A78A00
13 changed files with 69 additions and 134 deletions

@@ -174,7 +174,7 @@ func (b *Indexer) addUpdate(ctx context.Context, batchWriter git.WriteCloserErro
return batch.Index(id, &RepoIndexerData{
RepoID: repo.ID,
CommitID: commitSha,
-Content: string(charset.ToUTF8DropErrors(fileContents)),
+Content: string(charset.ToUTF8DropErrors(fileContents, charset.ConvertOpts{})),
Language: analyze.GetCodeLanguage(update.Filename, fileContents),
UpdatedAt: time.Now().UTC(),
})

@@ -135,7 +135,7 @@ func (b *Indexer) addUpdate(ctx context.Context, batchWriter git.WriteCloserErro
Id(id).
Doc(map[string]any{
"repo_id": repo.ID,
-"content": string(charset.ToUTF8DropErrors(fileContents)),
+"content": string(charset.ToUTF8DropErrors(fileContents, charset.ConvertOpts{})),
"commit_id": sha,
"language": analyze.GetCodeLanguage(update.Filename, fileContents),
"updated_at": timeutil.TimeStampNow(),