Using variation selectors, you can conceal messages of arbitrary length within emoji, or indeed any unicode character. "To be clear, this is an abuse of unicode and you shouldn't do it," writes Paul Butler. "If your mind is wandering to practical use cases for this, shut it down."
Most unicode characters do not have variations associated with them. Since unicode is an evolving standard and aims to be future-compatible, variation selectors are supposed to be preserved during transformations, even if their meaning is not known by the code handling them. So the codepoint U+0067 ("g") followed by U+FE01 (VS-2) renders as a lowercase "g", exactly the same as U+0067 alone. But if you copy and paste it, the variation selector will tag along with it.
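To make the mechanism concrete, here is a minimal Python sketch of one way to pack bytes into variation selectors. The mapping is an assumption chosen for illustration: bytes 0-15 go to U+FE00..U+FE0F (VS-1 to VS-16) and bytes 16-255 go to U+E0100..U+E01EF (VS-17 to VS-256). The function names `hide` and `reveal` are hypothetical, not part of any library.

```python
def byte_to_selector(b: int) -> str:
    """Map one byte to one variation selector (assumed mapping, see lead-in)."""
    if not 0 <= b <= 255:
        raise ValueError("expected a byte value in 0..255")
    if b < 16:
        return chr(0xFE00 + b)          # VS-1 .. VS-16
    return chr(0xE0100 + (b - 16))      # VS-17 .. VS-256


def selector_to_byte(ch: str):
    """Inverse of byte_to_selector; returns None for any other character."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def hide(carrier: str, message: str) -> str:
    """Append the UTF-8 bytes of message to carrier as variation selectors."""
    payload = "".join(byte_to_selector(b) for b in message.encode("utf-8"))
    return carrier + payload


def reveal(text: str) -> str:
    """Collect every variation selector in text and decode it as UTF-8."""
    values = (selector_to_byte(ch) for ch in text)
    return bytes(v for v in values if v is not None).decode("utf-8")


stego = hide("g", "hi")
print(len(stego))      # one visible character plus one selector per hidden byte
print(reveal(stego))
```

Whether the payload survives a given copy and paste depends on the applications involved, so treat a round trip through any real editor or messaging app as something to test rather than assume.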
The hidden data can be detected in text by machines looking for that sort of thing, so don't get carried away. The fun thing now might be to go looking in existing text to see who has been using this technique.
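Detection can be sketched in a few lines of Python. A single variation selector after a character is often legitimate (U+FE0F commonly follows emoji), so this sketch only flags runs of two or more consecutive selectors. That threshold is a heuristic assumption, and `find_selector_runs` is a hypothetical helper name.

```python
def is_variation_selector(ch: str) -> bool:
    """True for U+FE00..U+FE0F and U+E0100..U+E01EF."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def find_selector_runs(text: str, min_run: int = 2):
    """Return (start_index, length) for each run of >= min_run selectors."""
    runs = []
    start = None
    for i, ch in enumerate(text):
        if is_variation_selector(ch):
            if start is None:
                start = i               # a new run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Handle a run that reaches the end of the text.
    if start is not None and len(text) - start >= min_run:
        runs.append((start, len(text) - start))
    return runs


sample = "a" + "\ufe01\ufe02\ufe03" + "b"
print(find_selector_runs(sample))
```

A scanner like this will miss payloads spread one selector at a time across many characters, so it is a starting point rather than a complete detector.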
Previously:
• Steganographically hiding secret messages in fake fingerprints
• The Smiths vinyl contains hidden secret messages
• Hiding secret messages in whale song