extractUrls doesn't handle Non-Latin characters #131

eipark · 2014-07-29T20:33:14Z

This seems to be an intentional decision, but extractUrls does not handle links that contain non-Latin characters. For instance, URLs that would not be properly extracted:

extractUrls would return:

According to the README, the only reason for this seems to be that in Japanese, Korean, and Chinese text, links are sometimes not followed by a space. The behavior is consistent with what I see on twitter.com. To me it seems like extractUrls should be simpler and just delimit on spaces, which would allow uncommon characters, since that is the more common use case (correct me if I'm wrong on that, though). And for twitter.com itself, since links are highlighted as you type them, Asian tweeters will know to put a space between links and their text.

Was there some discussion on going one way or the other on this?


jakl · 2014-07-30T00:19:14Z

Yes, I think you raise some good points, and we're working on linking more Unicode characters as valid URLs. Separating by spaces would be my vote because it's fairly standard/expected. #Simplify

eipark · 2014-07-30T18:21:43Z

Any chance this is in the works, @jakl? Just changing that method itself is relatively trivial, but it also has implications for getTweetLength, and I imagine there'd be a bit of Twitter-internal non-code change as well.

jakl · 2014-07-30T18:31:15Z

It's a longer term effort - and I've been pressed for time by many other projects.
I'll keep this issue open, and make sure it gets proper visibility internally.
Also any changes need to be reflected across rb/java/objc/conformance too.

eipark · 2014-07-30T18:51:57Z

Yeah makes sense - thanks for taking a look.

eipark · 2014-10-14T20:08:20Z

Hi - any update on this? Thanks.
