This repository has been archived by the owner on Sep 18, 2021. It is now read-only.

Tweet length counter seems to be conflicting with Twitter's "Character Counting" guideline about Unicode normalization #133

Open
hakatashi opened this issue Sep 21, 2014 · 0 comments

@hakatashi

Twitter's official Character Counting guideline states that "Tweet length is measured by the number of codepoints in the NFC normalized version of the text". Accordingly, "café" (U+0063 U+0061 U+0066 U+0065 U+0301) should be normalized to "café" (U+0063 U+0061 U+0066 U+00E9) and counted as 4 characters. The JavaScript implementation does not do this, as a simple test shows:

var twitter = require('twitter-text');
console.log(twitter.getTweetLength('cafe\u0301'));

This prints "5", and a tweet consisting of "café" repeated 30 times (which should be counted as 120 characters) is rejected as invalid.

By comparison, the Ruby implementation behaves differently:

require "twitter-text"
include Twitter::Validation
p tweet_length("cafe" + 0x0301.chr("UTF-8"))

This prints "4", and the same 30-fold repetition passes validation.
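
For reference, the counting rule quoted from the guideline could be sketched in JavaScript roughly as follows. This is only an illustration, not twitter-text's actual code: `nfcCodePointLength` is a hypothetical helper name, it assumes a runtime that provides ES2015 `String.prototype.normalize` (older runtimes would need a polyfill), and it ignores the URL-shortening adjustments a real tweet length calculation also applies.

```javascript
// Hypothetical helper, not part of twitter-text: counts the code points
// in the NFC-normalized form of the text.
function nfcCodePointLength(text) {
  // Assumes ES2015 String.prototype.normalize is available.
  const normalized = text.normalize('NFC');
  // Spreading a string iterates by code point, so a surrogate pair counts once.
  return [...normalized].length;
}

console.log('cafe\u0301'.length);                          // 5 (UTF-16 code units, not normalized)
console.log(nfcCodePointLength('cafe\u0301'));             // 4
console.log(nfcCodePointLength('cafe\u0301'.repeat(30)));  // 120
```

With this kind of normalization applied first, the decomposed and precomposed spellings of "café" would yield the same count, which matches what the Ruby snippet above reports.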
