module watt.markdown.lines

Split a file into line objects.

Code Map

//! Split a file into line objects.
module watt.markdown.lines;


//! Represents a line of input.
struct Line
{
public:
	iterationCount: size_t;
	listCount: size_t;


public:
	fn set(str: string, idx: size_t) { }
	fn unchanged() bool { }
	fn toString() string { }
	fn empty() bool { }
	//! Length of the advanced portion.
	fn length() size_t { }
	//! Retrieve the nth character from the start.
	fn opIndex(n: size_t) char { }
	//! Return a slice of the advanced string, starting from a (inclusive),
	//! going to b (exclusive). If the slice is out of range, an empty string
	//! will be returned.
	fn slice(a: size_t, b: size_t) string { }
	//! Advance the line forward by n characters. Tabs encountered will be
	//! expanded. If n is greater than the rest of the line, the result will be
	//! an empty string.
	fn advance(n: size_t) { }
	fn leadingWhitespace() size_t { }
	fn realLeadingWhitespace() size_t { }
	fn realContiguousWhitespace(a: size_t) size_t { }
	fn contiguousWhitespace(a: size_t) size_t { }
	//! Call advance while this line is non-empty and starts with whitespace.
	fn consumeWhitespace() size_t { }
	//! Call advance if this line is non-empty and starts with a given
	//! character.
	fn consumeChar(c: char) bool { }
	//! Make this line empty.
	fn clear() { }
	//! Remove whitespace from both sides of the underlying string. The
	//! advanced string index is reset.
	fn strip() { }
	//! Replace a with b.
	fn replace(a: string, b: string) { }
	//! Remove (from the underlying string) any trailing whitespace
	//! characters.
	fn stripRight() { }
	fn removeLast() { }


public:
	//! Given a string, return a list of Line structures, split by the \n
	//! characters present in the string.
	static fn split(src: string) Line[] { }
}

struct Line

Represents a line of input.

CommonMark treats tabs as tabs, with a tab stop of 4. The tab stop is measured against the whole line, including parts that a given parsing function won't see.

Consider the following: a '>', two tabs, and the string "content".

 >		content

The rules for blockquotes state that a blockquote is a '>', an optional space, and then the rest of the line as the content.

The rules for indented code blocks state that an indented codeblock is four spaces of indentation, followed by the content of the codeblock. If the rule were "tabs are expanded to four spaces", the above would be trivial.

><tab><tab>content

Becomes (where '.' is a space)

>........content

The '> ' is consumed when the blockquote is parsed, and then

.......content

is parsed as a code block, with the result being

...content

A codeblock with content indented by three spaces. But that is not how CommonMark handles tabs.
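The naive rule described above can be sketched in a few lines. This is an illustration in Python, not code from this module; the function names `naive_expand` and `naive_quoted_code` are hypothetical.

```python
def naive_expand(line: str) -> str:
    """Replace every tab with four spaces, ignoring tab stops."""
    return line.replace("\t", "    ")


def naive_quoted_code(line: str) -> str:
    """Apply the blockquote rule, then the indented code block rule."""
    text = naive_expand(line)
    # Blockquote: drop the '>' and one optional space.
    if text.startswith(">"):
        text = text[1:]
        if text.startswith(" "):
            text = text[1:]
    # Indented code block: drop four spaces of indentation.
    if text.startswith("    "):
        text = text[4:]
    return text


print(repr(naive_quoted_code(">\t\tcontent")))  # '   content'
```

The result keeps three spaces of indentation, which is the outcome the text above says CommonMark does not produce.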

Tabs are expanded to spaces only where they have to be (for example, when removing leading whitespace, as above), and they expand as tab stops: each tab advances to the next multiple of 4 columns, counted from the start of the line, so the '>' counts too. So (given that | is invisible and represents groups of four characters):

><tab><tab>c|onte|nt

Becomes (where '.' is a space)

>...|<tab>con|tent

The '> ' is removed, as before (the text in parens has been removed, but we still need to consider it so as to not break tab stops).

(>.)..|<tab>con|tent

Then the codeblock parser needs to remove four spaces of indentation, and as we only have two spaces in front, we need to expand the next tab as well.

(>.)..|....|cont|ent

And remove four spaces.

(>...)|(..)..|cont|ent

And that's how '><tab><tab>content' becomes '..content' in a quoted codeblock. Simple!

Anyway, we need to be able to look at the entire string to remove leading whitespace, but the following parsing functions need to see what's been removed. The expansion of tabs occurs in place.
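The walkthrough above can be reproduced with a small sketch. It is written in Python for illustration and is not the implementation in this module; the free function `advance`, its `column` parameter, and the `TAB_STOP` constant are assumptions made for the example.

```python
from typing import Tuple

TAB_STOP = 4  # tab stop width used by CommonMark


def advance(text: str, column: int, n: int) -> Tuple[str, int]:
    """Drop n columns from the front of text, which starts at `column`.

    A tab is expanded only when it is reached, and only up to the next
    tab stop measured from the start of the whole line.
    """
    while n > 0 and text:
        if text[0] == "\t":
            width = TAB_STOP - column % TAB_STOP
            text = " " * width + text[1:]
        text = text[1:]
        column += 1
        n -= 1
    return text, column


text, col = advance(">\t\tcontent", 0, 1)  # blockquote marker '>'
text, col = advance(text, col, 1)          # the optional space
text, col = advance(text, col, 4)          # code block indentation
print(repr(text))  # '  content'
```

Carrying the column alongside the remaining text is what lets later steps expand tabs correctly even though the earlier characters are gone.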

fn split(src: string) Line[]

Given a string, return a list of Line structures, split by the \n characters present in the string.

fn toString() string

Return

The advanced string.

fn empty() bool

Return

True if the advanced string is empty.

fn length() size_t

Length of the advanced portion.

fn opIndex(n: size_t) char

Retrieve the nth character from the start.

fn slice(a: size_t, b: size_t) string

Return a slice of the advanced string, starting from a (inclusive), going to b (exclusive). If the slice is out of range, an empty string will be returned.
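A minimal sketch of the described behaviour, in Python for illustration. The name `safe_slice` is hypothetical, and treating "out of range" as "a is past b, or b is past the end" is an assumption; the documentation does not spell out the exact check.

```python
def safe_slice(text: str, a: int, b: int) -> str:
    """Return text[a:b], or "" when the range does not fit inside text.

    a and b are assumed to be non-negative, as size_t values would be.
    """
    if a > b or b > len(text):
        return ""
    return text[a:b]


print(repr(safe_slice("content", 0, 3)))   # 'con'
print(repr(safe_slice("content", 2, 99)))  # ''
```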

fn advance(n: size_t)

Advance the line forward by n characters. Tabs encountered will be expanded. If n is greater than the rest of the line, the result will be an empty string.

fn consumeWhitespace() size_t

Call advance while this line is non-empty and starts with whitespace.

fn consumeChar(c: char) bool

Call advance if this line is non-empty and starts with a given character.

Return

True if the line was advanced.
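How the two consume functions might build on advance can be sketched as follows. This is a Python stand-in, not the module's code: the class `LineSketch` and its fields are hypothetical, and the assumption that consumeWhitespace returns the number of columns consumed is not confirmed by the documentation.

```python
TAB_STOP = 4


class LineSketch:
    """Hypothetical stand-in for Line, for illustration only."""

    def __init__(self, text: str) -> None:
        self.text = text   # the advanced (remaining) string
        self.column = 0    # columns already consumed from the line

    def advance(self, n: int) -> None:
        # Drop n columns, expanding a tab only when it is reached.
        while n > 0 and self.text:
            if self.text[0] == "\t":
                width = TAB_STOP - self.column % TAB_STOP
                self.text = " " * width + self.text[1:]
            self.text = self.text[1:]
            self.column += 1
            n -= 1

    def consume_char(self, c: str) -> bool:
        # Advance by one only if the line is non-empty and starts with c.
        if self.text and self.text[0] == c:
            self.advance(1)
            return True
        return False

    def consume_whitespace(self) -> int:
        # Assumption: the return value counts the columns consumed.
        count = 0
        while self.text and self.text[0] in " \t":
            self.advance(1)
            count += 1
        return count
```

In this sketch a tab that follows a '>' counts as three columns of whitespace, because it only runs to the next tab stop.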

fn clear()

Make this line empty.

fn strip()

Remove whitespace from both sides of the underlying string. The advanced string index is reset.

fn replace(a: string, b: string)

Replace a with b.

fn stripRight()

Remove (from the underlying string) any trailing whitespace characters.