Tile format

From Our World of Text Wiki
Revision as of 01:06, 23 January 2023 by Owot (talk | contribs) (minor typo)

A tile consists of several data structures that together describe it. The most important of these is the content, which stores the tile's text.

  • Tile coordinates (X, Y)
  • Content (128 cells), including text decorations
  • Text color
  • Background color
  • Per-cell protection values
  • Per-cell links (url links and coordinate links)
  • Writability (protection value for the whole tile)

Coordinates

A cell within a tile is addressable in two ways. The most common way is by a pair of coordinates, with (0, 0) at the top left cell of the tile. A tile is 16 cells wide and 8 cells tall, so the cell at the bottom right corner has the coordinates (15, 7). The second, less common way is to use a cell index (0 - 127), which starts at the top left cell and goes left to right, top to bottom.
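The two addressing schemes can be converted into one another with simple arithmetic. The following is a minimal sketch; the function and constant names are hypothetical and do not come from the OWOT source code.

```javascript
// Tile dimensions implied by the text: 16 columns x 8 rows = 128 cells.
const TILE_COLS = 16;
const TILE_ROWS = 8;

// Convert in-tile cell coordinates (x: 0-15, y: 0-7) to a cell index (0-127).
function cellIndexFromCoords(x, y) {
  if (!Number.isInteger(x) || !Number.isInteger(y) ||
      x < 0 || x >= TILE_COLS || y < 0 || y >= TILE_ROWS) {
    throw new RangeError("cell coordinates out of range");
  }
  return y * TILE_COLS + x;
}

// Convert a cell index (0-127) back to in-tile cell coordinates.
function cellCoordsFromIndex(index) {
  if (!Number.isInteger(index) || index < 0 || index >= TILE_COLS * TILE_ROWS) {
    throw new RangeError("cell index out of range");
  }
  return { x: index % TILE_COLS, y: Math.floor(index / TILE_COLS) };
}
```

For example, the bottom right cell (15, 7) maps to index 127, and index 16 is the first cell of the second row.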

The coordinates of a tile are signed integers, limited by the minimum and maximum safe integer values of a double-precision floating-point number. These values are -9007199254740991 and 9007199254740991 respectively. Cartesian coordinates are used, with Y becoming negative as you go up, and positive as you go down. X becomes positive as you go right, and negative as you go left.
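In Javascript, these limits correspond to Number.MIN_SAFE_INTEGER and Number.MAX_SAFE_INTEGER. A range check could be sketched as follows; the function name is hypothetical.

```javascript
// Returns true if the value is an integer within the safe range
// (-9007199254740991 to 9007199254740991) described above.
function isValidTileCoord(value) {
  return Number.isSafeInteger(value);
}

// Both tile coordinates must pass the check.
function isValidTilePosition(tileX, tileY) {
  return isValidTileCoord(tileX) && isValidTileCoord(tileY);
}
```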

Content

A tile is made up of 128 cells, with each cell able to contain only one character. A cell can hold a single Unicode character, which may be encoded as a surrogate pair, plus up to sixteen combining characters after it. A cell cannot contain a lone (unpaired) surrogate, and it cannot consist of or start with a lone combining character. A cell also cannot contain any NUL (\0) characters.

During storage and transmission, the content is a single string, which must be split into an array of 128 strings (one per cell) before processing. The string is under the "content" property of a tile that is being transmitted through the network and is mandatory for an existing tile.

Because surrogate pairs are supported, a cell can hold any character beyond the Basic Multilingual Plane, which includes Emoji characters. For backwards compatibility purposes, Javascript indexes strings by 16-bit code units rather than by code points. This makes working with Emoji characters difficult whenever a string containing them is split using the String.prototype.split function with an empty separator, since each surrogate pair is broken into two halves. While a simple way to solve this problem is by using the [...string] syntax and the codePointAt/fromCodePoint functions, it's useful to know how to properly split a string containing surrogate characters.
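The difference between the two splitting approaches can be seen in a short example. The variable names are illustrative only.

```javascript
// "a", U+1F600 (an Emoji outside the Basic Multilingual Plane), "b"
const text = "a\u{1F600}b";

// split("") cuts between 16-bit code units, so the Emoji is broken
// into two lone surrogates.
const byCodeUnits = text.split("");

// The spread syntax iterates by code point, so the Emoji stays whole.
const byCodePoints = [...text];

console.log(byCodeUnits.length);  // 4
console.log(byCodePoints.length); // 3
```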

In Unicode, all characters have a code point, which is a numerical value uniquely referring to that character. For instance, the character 'A' has a code point of 65 (41 in hex), and the character '♫' has a code point of 9835 (266B in hex). Because Javascript strings are sequences of UTF-16 code units, each unit can only range from 0 to 65535. This clearly isn't enough for all characters, which is why specific ranges have been reserved to reference characters beyond that range. The solution is to put two surrogate characters together, with the first (high) one ranging from 0xD800 to 0xDBFF (1024 values), and the second (low) one ranging from 0xDC00 to 0xDFFF (1024 values). This allows for 1024 × 1024 = 1,048,576 extra code points. On top of the existing 65,536 code points, a total of 1,114,112 code points can now be referenced, including the surrogate code points themselves.
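The surrogate arithmetic described above can be sketched as follows. This is the standard UTF-16 calculation; the function names are hypothetical.

```javascript
// True if the 16-bit code unit is a high (first) surrogate.
function isHighSurrogate(unit) {
  return unit >= 0xD800 && unit <= 0xDBFF;
}

// True if the 16-bit code unit is a low (second) surrogate.
function isLowSurrogate(unit) {
  return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Combine a high and low surrogate into a single code point.
function combineSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// Split a code point above 0xFFFF into its [high, low] surrogate pair.
function splitCodePoint(codePoint) {
  const offset = codePoint - 0x10000;
  return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
}
```

For instance, the pair 0xD83D, 0xDE00 combines into the code point 0x1F600, and splitting 0x1F600 yields the same pair again.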

A character can also be followed by combining characters, which can provide accent marks and diacritical marks to a character. Combining characters can be stacked up to sixteen times and can also follow a surrogate pair. The supported hex ranges are 0300-036F, 1DC0-1DFF, 20D0-20FF, and FE20-FE2F. In total, 240 unique combining code points can be supported.
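Putting the rules together, a content string can be split into cells by grouping each base character with the combining characters that follow it. The sketch below is only an illustration, not the actual OWOT implementation: the names are hypothetical, it relies on for...of iterating by code point, and it does not enforce the sixteen-character limit or reject lone surrogates. A combining character at the very start of the string is placed in its own cell here, which is an assumption.

```javascript
// Combining character ranges listed in the text (240 code points in total).
const COMBINING_RANGES = [
  [0x0300, 0x036F],
  [0x1DC0, 0x1DFF],
  [0x20D0, 0x20FF],
  [0xFE20, 0xFE2F],
];

// True if the code point falls in one of the supported combining ranges.
function isCombining(codePoint) {
  return COMBINING_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

// Split a content string into an array of cell strings.
function splitContent(content) {
  const cells = [];
  for (const ch of content) { // for...of keeps surrogate pairs together
    const codePoint = ch.codePointAt(0);
    if (isCombining(codePoint) && cells.length > 0) {
      cells[cells.length - 1] += ch; // attach to the preceding cell
    } else {
      cells.push(ch); // start a new cell
    }
  }
  return cells;
}
```

A well-formed tile should produce exactly 128 cells when passed through such a function.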

The range 20F0 to 20FF is mostly unused by the Unicode standard and is used to store text formatting data to save some resources. These are the last sixteen code points under the "Combining Diacritical Marks for Symbols" Unicode block. The decision to store text formatting data as combining characters rather than in a separate data structure was made to save time during development, since no server-side changes were needed to add text decoration support. The only text decoration modes that are supported are bold, italic, underline, and strikethrough. A 4-bit integer is used to store the text decoration data, with bit 0 storing strikethrough, bit 1 storing underline, bit 2 storing italic, and bit 3 storing bold. Prior to processing, it's recommended to normalize the integer by subtracting the base (0x20F0). Each flag can then be extracted from the normalized code:

  • Bold: code >> 3 & 1
  • Italic: code >> 2 & 1
  • Underline: code >> 1 & 1
  • Strikethrough: code & 1

To construct the integer, the formula bold << 3 | italic << 2 | under << 1 | strike is used. You must ensure that the base (0x20F0) is added to it before converting it into a character. For text decorations to work properly, the combining character must be at the very end of the cell string. If there is more than one text decoration combining character at the end of the string, they must all be truncated prior to processing.
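The encoding and decoding steps can be sketched as follows. The function names are hypothetical. The stripping helper removes every trailing decoration character from a cell string, which is one reading of the truncation rule above.

```javascript
const DECORATION_BASE = 0x20F0;

// Build the decoration combining character from four boolean flags.
function encodeDecoration({ bold = false, italic = false, under = false, strike = false }) {
  const bits = (bold ? 1 : 0) << 3 | (italic ? 1 : 0) << 2 | (under ? 1 : 0) << 1 | (strike ? 1 : 0);
  return String.fromCharCode(DECORATION_BASE + bits);
}

// Decode a decoration character; returns null if it is outside 20F0-20FF.
function decodeDecoration(char) {
  const code = char.charCodeAt(0) - DECORATION_BASE; // normalize
  if (!(code >= 0 && code <= 15)) return null;
  return {
    bold: (code >> 3 & 1) === 1,
    italic: (code >> 2 & 1) === 1,
    under: (code >> 1 & 1) === 1,
    strike: (code & 1) === 1,
  };
}

// Remove all decoration characters from the end of a cell string.
function stripTrailingDecorations(cell) {
  let end = cell.length;
  while (end > 0) {
    const code = cell.charCodeAt(end - 1);
    if (code < DECORATION_BASE || code > DECORATION_BASE + 15) break;
    end--;
  }
  return cell.slice(0, end);
}
```

For example, bold plus underline gives the integer 10, which becomes the character U+20FA once the base is added.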

Color & Background color

Each cell in a tile can have a text color and a background color assigned to it. An array of 128 integers is used to store the color information of all cells in a tile. The integers are 24-bit unsigned integers (0-16777215) that refer to an RGB color value. In the case of background colors, the integer -1 is used to denote the lack of any assigned background color. The two arrays are stored under the "properties" object of a tile. The text color array is stored as "color", and the background color array is stored as "bcolor". Both arrays are optional.
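A color integer can be unpacked into its channels with bit shifts. The sketch below assumes the common 0xRRGGBB layout (red in the highest byte), which this page does not state explicitly; the function names are hypothetical.

```javascript
// Unpack a 24-bit color integer into its channels.
// Assumption: the layout is 0xRRGGBB.
function colorToRgb(color) {
  return { r: color >> 16 & 255, g: color >> 8 & 255, b: color & 255 };
}

// Pack three 0-255 channels into a 24-bit color integer.
function rgbToColor(r, g, b) {
  return r << 16 | g << 8 | b;
}

// Convert a color integer to a CSS hex string.
// Returns null for -1, which marks an unset background color.
function colorToCss(color) {
  if (color === -1) return null;
  return "#" + color.toString(16).padStart(6, "0");
}
```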

Writability

The writability of a tile indicates the protection level of a tile. It is the "writability" property under "properties" in a network-transmitted tile. The following list displays the meaning of each value:

  • null
    • This tile has the default protection, which is inherited from the world. For instance, if the world is set to be writable by members only, then the tile's writability is effectively that of a member-protected tile.
  • 0
    • This tile is public and is not affected by the world's writability status.
  • 1
    • This tile can only be modified by members.
  • 2
    • This tile can only be modified by the owner.
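Resolving the effective protection level of a tile could be sketched as below. The sketch assumes that the world's own writability uses the same 0/1/2 scale as tiles, which this page does not confirm; the names are hypothetical.

```javascript
// Human-readable labels for the three explicit writability values.
const WRITABILITY_LABELS = ["public", "members only", "owner only"];

// Resolve the effective writability of a tile.
// Assumption: worldWritability uses the same 0/1/2 scale as tiles.
function effectiveWritability(tileWritability, worldWritability) {
  // null means the tile inherits the world's setting
  return tileWritability === null ? worldWritability : tileWritability;
}
```

With these assumptions, a tile with a writability of null in a members-only world resolves to 1, while a tile with a writability of 0 stays public regardless of the world setting.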

Per-Cell Protection

Each cell in a tile can have a protection value assigned to it. This data is stored as a string under the "char" property under "properties". The first character of the "char" string indicates the format, with "@" indicating the base64 type, "#" indicating numerical values, and "x" indicating 16-bit padded hexadecimal. In the base64 format, the rest of the string is a base64 encoded bitfield of the protection status of each cell. Two bits are given to each cell, allowing for up to four different values.

The base64 table as used by the format is as follows: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

In order to decode the base64 string, the "@" character must first be stripped. Then, all remaining characters must be iterated through, converting each base64 character to its index within the base64 table. Each base64 character holds information for three cells. To retrieve the values for those three cells, the respective formulas are used: idx >> 4 & 3; idx >> 2 & 3; idx & 3. To allow for the proper encoding of protection information in base64, the writability values are shifted up by one and null is assigned 0. When decoding, the numbers are shifted down by one and what was originally zero is assigned null. To re-encode the string, the "@" character must first be added to indicate the base64 type. Then, the following formula is used for each group of three cells: c1 << 4 | c2 << 2 | c3. If no cells are left to fill a group, the value 0 is substituted. The final step is to convert the value into a base64 character and then append it to the string.
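The decoding and encoding steps above can be sketched as follows. Only the "@" (base64) format is handled, and the function names are hypothetical rather than taken from the OWOT source code.

```javascript
const B64_TABLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode an "@"-prefixed string into an array of per-cell
// protection values (null, 0, 1 or 2).
function decodeCellProtection(str, cellCount = 128) {
  if (str[0] !== "@") throw new Error("only the base64 (@) format is handled here");
  const values = [];
  for (let i = 1; i < str.length; i++) {
    const idx = B64_TABLE.indexOf(str[i]);
    if (idx === -1) throw new Error("invalid base64 character");
    for (const shift of [4, 2, 0]) {
      if (values.length >= cellCount) break; // ignore padding in the last character
      const raw = idx >> shift & 3;
      values.push(raw === 0 ? null : raw - 1); // shift down by one; 0 becomes null
    }
  }
  return values;
}

// Encode an array of per-cell protection values back into a string.
function encodeCellProtection(values) {
  // shift up by one; null (or a missing cell) becomes 0
  const raw = (v) => (v === null || v === undefined ? 0 : v + 1);
  let str = "@";
  for (let i = 0; i < values.length; i += 3) {
    const idx = raw(values[i]) << 4 | raw(values[i + 1]) << 2 | raw(values[i + 2]);
    str += B64_TABLE[idx];
  }
  return str;
}
```

Since 128 cells are stored three per character, an encoded string is 44 characters long: the "@" marker followed by 43 base64 characters, the last of which carries one unused slot.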

Per-Cell Links