Tile format: Difference between revisions

From Our World of Text Wiki
Jump to navigation Jump to search
m (Protected "Tile format" ([Edit=Allow only autoconfirmed users] (indefinite) [Move=Allow only autoconfirmed users] (indefinite)))
m (Removed protection from "Tile format")
(No difference)

Revision as of 23:45, 22 January 2023

A tile is comprised of several data structures that provide several key information about the tile. The most important bit of information is the content, which stores the tile's textual information.

  • Tile coordinates (X, Y)
  • Content (128 cells), including text decorations
  • Text color
  • Background color
  • Cell protection values
  • Per-cell links (url links and coordinate links)
  • Writability (protection value for the whole tile)

Coordinates

A cell within a tile is addressable in two ways. The most common way is by using a set of coordinates with (0, 0) starting at the top left cell of a tile. For instance, the cell at the bottom right corner would have the coordinates (15, 7). The second but less common way is to use a cell index (0 - 127), which starts at the top left cell and goes left to right, top to bottom.

The coordinates of a tile are signed, limited by the maximum and minimum safe values of a double-precision number. The minimum and maximum values are -9007199254740991 and 9007199254740991 respectively. Cartesian coordinates are used, with Y becoming negative as you go up, and positive as you go down. X becomes positive as you go right, and negative as you go left.

Content

A tile is made up of 128 cells, with each cell only being able to contain one character. A cell can support a UTF-8 character with surrogate-pair handling, plus up to sixteen combining characters after it. A cell cannot have a lone surrogate pair, and it cannot have a lone combining character or start with one. A cell also cannot have any NUL (\0) characters in it.

During storage and transmission, the content is a string that must be properly split into an array of 128 strings for proper processing,

Because of the ability to support surrogate pairs, a cell can support any character beyond the Basic Multilingual Plane, which includes Emoji characters. For backwards compatibility purposes, Javascript handles strings by 16-bit code points. This makes working with Emoji characters difficult whenever a string containing such characters is split using the String.prototype.split function. While a simple way to solve this problem is by using the [...string] syntax and the charCodeAt/fromCharCode functions, it's useful to know how to properly split a string containing surrogate characters.

In Unicode, all characters have a code point which is a numerical value uniquely referring to that character. For instance, the character 'A' has a code point of 65 (41 in hex), and the character '♫' has a code point of 9835 (266B in hex). Because Javascript strings are encoded as UTF-16 code points, code points range from 0 to 65535. This clearly isn't enough for all characters, which is why specific ranges have been reserved to reference characters beyond that range. The solution is to put two surrogate characters together, with the left one ranging from 0xD800 to 0xDBFF (1024 values), and the right one ranging from 0xDC00 to 0xDFFF (1024 values). This allows for a combination of 1,048,576 extra code points. On top of our existing 65,536 code points, we can now reference 1,114,112 total code points, not excluding the surrogate characters.

A character can also be followed by combining characters, which can provide accent marks and diacritical marks to a character. Combining characters can be stacked up to sixteen times and can also follow a surrogate pair. The supported hex ranges are 0300-036F, 1DC0-1DFF, 20D0-20FF, and FE20-FE2F. In total, 240 unique combing code points can be supported.

The range 20F0 to 20FF is mostly unused by the Unicode standard and is used to store text formatting data to save resources. These are the last sixteen code points under the "Combining Diacritical Marks for Symbols" block. The decision to store text formatting data as combining characters rather than a separate data structure was done to save time during development since no server-side changes needed to be done to add text decoration support. The only text decoration modes that are supported are bold, italic, underline, and strikethrough. A 4-bit integer is used to store the text decoration data, with bit 0 storing strikethrough, bit 1 storing underline, bit 2 storing italic, and bit 3 storing bold. Prior to doing processing, it's recommended to normalize the integer by subtracting the base (0x20F0)

  • Bold: code >> 3 & 1
  • Italic: code >> 2 & 1
  • Underline: code >> 1 & 1
  • Strikethrough: code & 1

To construct the integer, the formula bold << 3 | italic << 2 | under << 1 | strike is used. You must ensure that the base (0x20F0) is added to it before converting it into a character. For text decorations to work properly, the combining character must be at the very end of the cell string. If there is more than one, they must all be truncated prior to processing.

Color & Background color

Each cell of a tile can have text color and background color assigned to it. An array of 128 integers is used to store the color information of a cell. The integers are 24-bit unsigned integers (0-16777215) that refer to a RGB color value.