Split a file into line objects.
Code Map
//! Split a file into line objects.
module watt.markdown.lines;
//! Represents a line of input.
struct Line
{
public:
iterationCount: size_t;
listCount: size_t;
public:
fn set(str: string, idx: size_t) { }
fn unchanged() bool { }
fn toString() string { }
fn empty() bool { }
//! Length of the advanced portion.
fn length() size_t { }
//! Retrieve the nth character from the start.
fn opIndex(n: size_t) char { }
//! Return a slice of the advanced string, starting from a (inclusive),
//! going to b (exclusive). If the slice is out of range, an empty string
//! will be returned.
fn slice(a: size_t, b: size_t) string { }
//! Advance the line forward by n characters. Tabs encountered will be
//! expanded. If n is greater than the rest of the line, the result will be
//! an empty string.
fn advance(n: size_t) { }
fn leadingWhitespace() size_t { }
fn realLeadingWhitespace() size_t { }
fn realContiguousWhitespace(a: size_t) size_t { }
fn contiguousWhitespace(a: size_t) size_t { }
//! Call advance while this line is non-empty and starts with whitespace.
fn consumeWhitespace() size_t { }
//! Call advance if this line is non-empty and starts with a given
//! character.
fn consumeChar(c: char) bool { }
//! Make this line empty.
fn clear() { }
//! Remove whitespace from both sides of the underlying string. The
//! advanced string index is reset.
fn strip() { }
//! Replace a with b.
fn replace(a: string, b: string) { }
//! Remove (from the underlying string) any trailing whitespace
//! characters.
fn stripRight() { }
fn removeLast() { }
public:
//! Given a string, return a list of Line structures, split by the \n
//! characters present in the string.
static fn split(src: string) Line[] { }
}
Represents a line of input.
CommonMark does not expand tabs to a fixed number of spaces; it treats them as tabs with a tab stop of 4. The tab stop is measured against the whole line, including the parts that a given parsing function never sees.
Consider the following: a '>', two tabs, and the string "content".
><tab><tab>content
The rules for block quotes state that a block quote is a '>', an optional space, and then the rest of the line as the content.
The rules for indented code blocks state that an indented code block is four spaces of indentation, and then the content of the code block. If the rule were "tabs are expanded to four spaces", the above would be trivial.
><tab><tab>content
Becomes (where '.' is space)
>........content
The '> ' is consumed when the blockquote is parsed, and then
.......content
Is parsed as a code block, with the result being
...content
A codeblock with content indented by three spaces. But that is not how CommonMark handles tabs.
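The naive reading above can be checked with a short sketch. It is written in Python rather than Volt so that it runs standalone, and `naive_quote_then_code` is a hypothetical helper name, not part of watt.markdown.lines. It models the rule that CommonMark does not use.

```python
def naive_quote_then_code(line: str) -> str:
    """Model the *incorrect* rule: every tab becomes exactly four spaces."""
    expanded = line.replace("\t", " " * 4)
    # Block quote marker: '>' followed by one optional space.
    assert expanded.startswith(">")
    rest = expanded[1:]
    if rest.startswith(" "):
        rest = rest[1:]
    # Indented code block: strip four spaces of indentation.
    assert rest.startswith(" " * 4)
    return rest[4:]


result = naive_quote_then_code(">\t\tcontent")
print(repr(result))  # '   content' (three leading spaces)
```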
Tabs are expanded to spaces only where they have to be (when you're removing leading whitespace, as above, for example), and they are expanded as tab stops: each tab rounds the column up to the next multiple of 4, considering the whole line.
><tab><tab>c|onte|nt
Becomes (where '.' is space and '|' marks a tab stop)
>...|<tab>con|tent
The '> ' is removed, as before. The text in parentheses has been removed, but we still need to account for it so that the tab stops do not shift.
(>.)..|<tab>con|tent
Then the code block parser needs to remove four spaces of indentation, and as we only have two spaces at the front, we need to expand the next tab as well.
(>.)..|....|cont|ent
And remove four spaces.
(>...)|(..)..|cont|ent
And that's how '><tab><tab>content' becomes '..content' (content indented by two spaces) in a quoted code block. Simple!
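The tab stop arithmetic in this walkthrough can be reproduced with a small sketch. It is in Python rather than Volt so that it runs standalone; `tab_widths` is a hypothetical helper, and `str.expandtabs` stands in for the lazy, in-place expansion described here (it expands the whole line at once, which gives the same columns).

```python
TAB_STOP = 4  # CommonMark tab stop


def tab_widths(line: str) -> list:
    """Return how many columns each tab in `line` occupies."""
    widths = []
    col = 0
    for ch in line:
        if ch == "\t":
            # A tab runs up to the next multiple of TAB_STOP.
            width = TAB_STOP - (col % TAB_STOP)
            widths.append(width)
            col += width
        else:
            col += 1
    return widths


line = ">\t\tcontent"
widths = tab_widths(line)             # first tab is 3 wide, second is 4
expanded = line.expandtabs(TAB_STOP)  # '>' + 7 spaces + 'content'
after_quote = expanded[2:]            # drop '>' and the optional space
after_indent = after_quote[4:]        # drop four columns of indentation
print(widths, repr(after_indent))     # [3, 4] '  content'
```

The first tab is only three columns wide because the '>' already occupies column 0, which is why two spaces survive instead of three.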
Anyway, we need to be able to look at the entire string to remove leading whitespace, but the following parsing functions need to see what's been removed. The expansion of tabs occurs in place.
Given a string, return a list of Line structures, split by the \n characters present in the string.
Return
The advanced string.
Return
True if the advanced string is empty.
Length of the advanced portion.
Retrieve the nth character from the start.
Return a slice of the advanced string, starting from a (inclusive), going to b (exclusive). If the slice is out of range, an empty string will be returned.
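As a rough illustration of that contract, here is a Python sketch (not the Volt implementation). `safe_slice` is a hypothetical name, and what exactly counts as "out of range" (here: a greater than b, or b past the end) is an assumption.

```python
def safe_slice(text: str, a: int, b: int) -> str:
    """Return text[a:b], or '' if the range does not fit inside text."""
    # Assumed meaning of "out of range": reversed bounds or an end
    # index beyond the string. Negative values cannot occur with size_t.
    if a < 0 or a > b or b > len(text):
        return ""
    return text[a:b]


print(repr(safe_slice("content", 0, 3)))   # 'con'
print(repr(safe_slice("content", 3, 99)))  # ''
```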
Advance the line forward by n characters. Tabs encountered will be expanded. If n is greater than the rest of the line, the result will be an empty string.
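One way to picture this behaviour is the toy model below, written in Python so that it runs standalone. `LineModel` and its fields are hypothetical and are not the actual Line struct; the sketch assumes that the position only ever moves through `advance`, so every character before it is already a non-tab and the index equals the column.

```python
TAB_STOP = 4


class LineModel:
    """Toy model of a line with an advancing start position."""

    def __init__(self, text: str):
        self.text = text  # underlying string; tabs are expanded in place
        self.index = 0    # start of the advanced portion

    def advanced(self) -> str:
        return self.text[self.index:]

    def advance(self, n: int) -> None:
        for _ in range(n):
            if self.index >= len(self.text):
                break  # past the end: the advanced string is empty
            if self.text[self.index] == "\t":
                # Expand this tab to the next tab stop, measured from
                # the start of the whole line, then step one column.
                width = TAB_STOP - (self.index % TAB_STOP)
                self.text = (self.text[:self.index] + " " * width
                             + self.text[self.index + 1:])
            self.index += 1


line = LineModel(">\t\tcontent")
line.advance(2)               # consume '>' and one space of the first tab
print(repr(line.advanced()))  # '  \tcontent'
line.advance(4)               # consume four columns of indentation
print(repr(line.advanced()))  # '  content'
```

Advancing past the end simply leaves an empty advanced string, matching the description above.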
Call advance while this line is non-empty and starts with whitespace.
Call advance if this line is non-empty and starts with a given character.
Return
True if the line was advanced.
Make this line empty.
Remove whitespace from both sides of the underlying string. The advanced string index is reset.
Replace a with b.
Remove (from the underlying string) any trailing whitespace characters.