MarkupUsingBrackets
Things the markup code should do
Use only brackets?
This is some text. This is [b bold text]. A line (or group of lines) flanked by blank lines is a paragraph.
[table [row [td ] ] ]
Pass 1 evaluates line types and page structure
Assume that a new section MUST start with a single command on a line by itself: [table], [list], [numlist], [quote], [item], [comment], [code], [html], and [/]. Then we do multiple passes through the Wiki text. The first pass just divides the text into section commands and the text between those commands.
// Break the Wiki text down by section (because different sections must be
// handled differently).
$string = str_replace("\r\n", "\n", $string);
$string = str_replace("\r", "\n", $string);
$lines = explode("\n", $string);
$i = 0;
$sections = array('');
$cmds = array(
    'quote', 'table', 'list', 'numlist', 'item',
    'comment', 'code', 'html', '/'
);
foreach ($lines as $line) {
    $cmd = trim($line);
    // A section command is a known word in brackets on a line by itself.
    $word = (strlen($cmd) > 2 && $cmd[0] == '[' && substr($cmd, -1) == ']')
        ? substr($cmd, 1, -1)
        : '';
    if (in_array($word, $cmds, true)) {
        // Odd indexes hold section commands; even indexes hold text.
        $sections[$i + 1] = $word;
        $i += 2;
        $sections[$i] = '';
    } else {
        $sections[$i] .= "$line\n";
    }
}
// Handle newline commands according to section: "[<<]" and "[>>]" work
// in every section except "[code]" and "[html]".
$stack = array(''); // level 0 is the top-level (unnamed) section
$sp = 0;
$i = 0;
$count = count($sections);
while ($i < $count) {
    $line = $sections[$i];
    // For all sections except [code] and [html], resolve newline
    // commands: [>>] at the start of a line fuses the line to the
    // end of the preceding line, and [<<] inserts a newline, thus
    // splitting a line in two.
    if ($stack[$sp] != 'code' and $stack[$sp] != 'html') {
        $line = str_replace("\n[>>]", '', $line);
        $line = str_replace('[<<]', "\n", $line);
    }
    // For all sections except [html], convert HTML characters ('<',
    // '>', and '&') into HTML entities. (Yes, including those inside
    // embedded bracket commands.) '&' must be converted first, or the
    // '&' of the entities just created would be converted again.
    if ($stack[$sp] != 'html') {
        $line = str_replace('&', '&amp;', $line);
        $line = str_replace('<', '&lt;', $line);
        $line = str_replace('>', '&gt;', $line);
    }
    $sections[$i] = $line;
    // As long as we haven't reached the last section already, get the
    // next section name. If it is "/", pop the previous section name
    // from the stack; otherwise push the new section name onto the
    // stack.
    if ($i + 1 < $count) {
        $cmd = $sections[$i + 1];
        if ($cmd == '/') {
            --$sp; if ($sp < 0) $sp = 0;
        } else {
            ++$sp; $stack[$sp] = $cmd;
        }
    }
    // Always advance to the next text item, so the loop also ends
    // after the last section.
    $i += 2;
}
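To see what the first pass produces, here is a minimal sketch that wraps the splitting step in a function. The name split_sections() and the sample text are assumptions made for illustration; they are not part of the notes above.

```php
<?php
// Hypothetical helper: splits Wiki text into alternating slots.
// Even indexes hold text; odd indexes hold section command names.
function split_sections(string $string): array
{
    $cmds = array('quote', 'table', 'list', 'numlist', 'item',
                  'comment', 'code', 'html', '/');
    // Normalize line endings first ("\r\n" is replaced before "\r").
    $string = str_replace(array("\r\n", "\r"), "\n", $string);
    $sections = array('');
    $i = 0;
    foreach (explode("\n", $string) as $line) {
        $cmd = trim($line);
        // A command is a known word in brackets on a line by itself.
        $word = (strlen($cmd) > 2 && $cmd[0] === '[' && substr($cmd, -1) === ']')
            ? substr($cmd, 1, -1)
            : '';
        if (in_array($word, $cmds, true)) {
            $sections[$i + 1] = $word;  // command slot
            $i += 2;
            $sections[$i] = '';         // fresh text slot
        } else {
            $sections[$i] .= "$line\n";
        }
    }
    return $sections;
}

$demo = split_sections("Intro text.\n[code]\nif (a < b)\n[/]\nDone.");
print_r($demo);
```

For this sample the array holds "Intro text.", code, "if (a < b)", /, and "Done." in slots 0 through 4 (each text slot ends with a newline), so the second loop can walk it two slots at a time.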
After this we can do other things. We can start adding some HTML tags right away:
Normalize line endings
// Make all end-of-line characters consistent
$string = str_replace("\r\n", "\n", $string);
$string = str_replace("\r", "\n", $string);
// Resolve newline commands
$string = str_replace("\n[>>]", "", $string);
$string = str_replace("[<<]", "\n", $string);
$string = str_replace("[:", "\n[:", $string);
// Split Wiki text into lines
$lines = explode("\n", $string);
$output = '';
// Section flags, to be set when a section command is seen.
$pre = FALSE;  // TRUE inside [:code:] or [:html:]
$html = FALSE; // TRUE inside [:html:]
foreach ($lines as $line) {
    $sp = 0; // stack pointer
    $stack = array('');
    $space = FALSE;
    $i = 0;
    $count = strlen($line);
    do {
        while ($i < $count) {
            $c = $line[$i++];
            // If current section is NOT [:code:] or [:html:],
            // compress spans of whitespace into single spaces.
            if ($pre == FALSE && ($c == ' ' or $c == "\t")) {
                if ($space == FALSE) {
                    $stack[$sp] .= ' ';
                    $space = TRUE;
                }
                continue;
            }
            $space = FALSE;
            // If character begins a command, go up one
            // stack level. If character ends a command,
            // execute the command and append the result
            // to the text on the next stack level down.
            // (This has the effect of running innermost
            // nested commands first.)
            if ($c == '[') {
                ++$sp;
                $stack[$sp] = '';
            } elseif ($c == ']') {
                // If user entered too many closing
                // brackets, stack would "underflow"
                // and crash this program. Excess
                // closing brackets are ignored.
                if ($sp > 0) {
                    $result = command($stack[$sp]);
                    --$sp;
                    $stack[$sp] .= $result;
                }
            }
            // If character is an HTML special character,
            // and current section is not [:html:], then
            // replace character with an HTML entity.
            elseif ($c == '<' && $html == FALSE) {
                $stack[$sp] .= '&lt;';
            } elseif ($c == '>' && $html == FALSE) {
                $stack[$sp] .= '&gt;';
            } elseif ($c == '&' && $html == FALSE) {
                $stack[$sp] .= '&amp;';
            }
            // Otherwise just append the character to the
            // text at the current stack level.
            else {
                $stack[$sp] .= $c;
            }
        }
        // If we've reached the end of the line and still have
        // items on the stack, then append closing brackets to
        // the line so that the stack can be cleaned up neatly.
        if ($sp > 0) {
            $count += $sp;
            $line .= str_repeat(']', $sp);
        }
    } while ($i < $count);
    // We now have a line of HTML at the bottom of the stack.
    $output .= $stack[0];
}
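The loop above hands the text between each pair of brackets to command(), which is not defined anywhere in these notes. One possible shape, following the "cmd_" naming idea jotted down under Older stuff, is sketched below. The handlers cmd_b() and cmd_em() and the pass-through behavior for unknown names are assumptions for illustration only.

```php
<?php
// Hypothetical handlers: each receives the text after the command name.
function cmd_b(string $text): string  { return "<b>$text</b>"; }
function cmd_em(string $text): string { return "<em>$text</em>"; }

// Dispatch "name rest of text" to cmd_name("rest of text").
function command(string $body): string
{
    // Split the bracket contents into the first word and the remainder.
    $parts = explode(' ', trim($body), 2);
    $name = $parts[0];
    $text = isset($parts[1]) ? $parts[1] : '';
    $func = 'cmd_' . $name;
    // A handler name must begin with a letter and be alphanumeric.
    // Anything else, or an unknown name, is passed through unchanged.
    if (preg_match('/^[a-zA-Z][a-zA-Z0-9]*$/', $name) === 1
        && function_exists($func)) {
        return $func($text);
    }
    return '[' . $body . ']';
}

echo command('b bold text'), "\n";   // <b>bold text</b>
echo command('nosuch thing'), "\n";  // [nosuch thing]
```

Because the "cmd_" prefix is always added, Wiki text can only ever reach functions that were written as command handlers.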
Command() might itself keep its own stack, for HTML tags that haven't yet been closed. If the command is enclosed in colons, then some or all of these tags might be pulled from the stack and sent to output. Etc.
Ruminations
Code to search for "[:...:]":
function test_regexp() {
    $subject = 'see if [:code it:] is [:not:] found';
    $pattern = '/\[\:[a-zA-Z_0-9]+\:\]/';
    preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
    echo "<pre>\n";
    print_r($matches);
    echo "</pre>\n";
}
RESULTS:
Array
(
    [0] => Array
        (
            [0] => [:not:]
            [1] => 22
        )

)
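The test finds only [:not:] because the character class excludes the space in [:code it:], and because preg_match() stops at the first match. A variant that collects every command is sketched below, assuming that command text may contain spaces but never a colon or a bracket. The function name find_section_commands() is hypothetical.

```php
<?php
// Find every [:...:] command and the offset where it starts.
function find_section_commands(string $subject): array
{
    // Assumption: command text excludes ':', '[' and ']'.
    $pattern = '/\[:([^:\[\]]+):\]/';
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
    $found = array();
    foreach ($matches[0] as $k => $whole) {
        $found[] = array(
            'name'   => $matches[1][$k][0], // text between the colons
            'offset' => $whole[1],          // offset of the opening '['
        );
    }
    return $found;
}

print_r(find_section_commands('see if [:code it:] is [:not:] found'));
```

On the same subject string this reports "code it" at offset 7 and "not" at offset 22.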
Older stuff
// Make all end-of-line characters consistent
$string = str_replace("\r\n", "\n", $string);
$string = str_replace("\r", "\n", $string);
// Convert non-alphanumeric commands into alphanumeric commands
$string = str_replace('[<<]', '[newline]', $string);
$string = str_replace('[>>]', '[mergeline]', $string);
// Split Wiki text into lines
$lines = explode("\n", $string);
$section = ''; // name of the current section ('' = none)
foreach ($lines as $line) {
    // Trim line
    $trimmed = trim($line);
    // If command on line by itself, it may be a new-section command.
    if ($trimmed != '' && $trimmed[0] == '[' && substr($trimmed, -1) == ']') {
        $trimmed = substr($trimmed, 0, -1); // remove final ']'
        if (substr($trimmed, 1, 1) == '/') {
            $closing = true;
            $trimmed = substr($trimmed, 2);
        } else {
            $closing = false;
            $trimmed = substr($trimmed, 1);
        }
        // Dispatch on section name (handlers still to be written).
        if ($trimmed == 'comment') {
        } elseif ($trimmed == 'quote') {
        } elseif ($trimmed == 'list') {
        } elseif ($trimmed == 'numlist') {
        } elseif ($trimmed == 'item') {
        } elseif ($trimmed == 'table') {
        } elseif ($trimmed == 'code') {
        } elseif ($trimmed == 'html') {
        } else {
        }
    }
    // Trim line
    if ($section != 'code' and $section != 'html') {
        $line = trim($line);
    }
    // Need also check for section-command,
    // e.g., "[/code]" or "[/html]" on a line by itself.
    // (Shouldn't we split line into parts in brackets
    // vs. parts out of brackets here?)
    // Convert HTML characters into HTML entities
    // QUESTION!!!! Do we do this inside brackets?????
    if ($section != 'html') {
        $line = str_replace('&', '&amp;', $line);
        $line = str_replace('<', '&lt;', $line);
        $line = str_replace('>', '&gt;', $line);
    }
    // For each bracket on the line, starting from the left:
    // function command($name):
    // Create function name "cmd_" + first word in brackets
    // (but if word has nonalphanumeric characters, just look
    // word up in glossary of text substitutions).
    // Find end of command in line; return offset into line
    // to next character to process.
    // At end of line, do special processing (???)
}
Any two consecutive lines are fused by ending the first line with two backslashes. Any line can be split into two lines by entering [<] at the point where the line should be split. Therefore, my code must remove all instances of two-backslashes-and-a-linebreak, and then replace each instance of "[<]" (not inside a tag) with a newline.
Markup scheme
I rely mainly on commands in square brackets. I'll allow some conventional Wiki formatting commands, but only a handful. Commands should be nestable, so that a field that calculates a sum can be nested inside a formatting command, e.g., [em [sum vowelcount conscount]]. (Here I'm assuming that fields can also set constants, e.g., [set vowelcount 8].) Note that a command cannot cover text on more than one line (unless it is a section command on a line by itself), so my code must close any commands that the user has neglected to close before the end of the line. Example commands:
Hard-coded commands are nonalphanumeric commands (user-defined commands must begin with a letter):
Markup sections
A markup section is enclosed in an opening section command and a closing section command. Each command must be on a line by itself; otherwise the command is treated as inline (and its influence ends at the end of the line it is on). The section commands are:
Note that if [quote] or [table] or [code] is on a line with other text, then it would produce something inline, something that would be terminated at the end of the line even if the [/quote] or [/table] or [/code] command is missing.
Formatting codes (might not be needed after all)
My markup will use primarily commands in brackets instead of conventional Wiki formatting codes, but some formatting codes are handy:
To make links or other things, you'd use bracket commands. I can dispense with bold, italic, and bold italic! Just use "I must [em really] stress..." or "A [term collie] is..." or "[critical WARNING!]". Also just use [h1] through [h6]. This leaves you with just [...], and [/...] for end tags, and [\...] for literal brackets.
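As a rough check that the bracket-only scheme hangs together, the sketch below evaluates one line of markup with nested commands, running innermost commands first and closing any command left open at the end of the line. Everything here is an assumption for illustration: the function names run_command() and render_line(), the HTML chosen for [em], and the value 13 for conscount. HTML escaping is left out and would have to happen in an earlier pass.

```php
<?php
// Hypothetical command table. $vars holds values stored by [set name value].
function run_command(string $body, array &$vars): string
{
    $parts = explode(' ', trim($body), 2);
    $name = $parts[0];
    $text = isset($parts[1]) ? $parts[1] : '';
    switch ($name) {
        case 'set': // [set name value] stores a value and outputs nothing
            $kv = explode(' ', $text, 2);
            $vars[$kv[0]] = isset($kv[1]) ? $kv[1] : '';
            return '';
        case 'sum': // [sum a b ...] adds the named values
            $total = 0;
            foreach (preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $key) {
                $total += isset($vars[$key]) ? (int) $vars[$key] : 0;
            }
            return (string) $total;
        case 'em':
            return "<em>$text</em>";
        default:    // unknown command: leave it visible
            return "[$body]";
    }
}

function render_line(string $line, array &$vars): string
{
    $stack = array('');
    $sp = 0;
    $len = strlen($line);
    for ($i = 0; $i < $len; $i++) {
        $c = $line[$i];
        if ($c === '[') {
            $stack[++$sp] = '';      // open a new nesting level
        } elseif ($c === ']') {
            if ($sp > 0) {           // excess closing brackets are ignored
                $result = run_command($stack[$sp], $vars);
                $sp--;
                $stack[$sp] .= $result;
            }
        } else {
            $stack[$sp] .= $c;
        }
    }
    // Close any commands the user neglected to close on this line.
    while ($sp > 0) {
        $result = run_command($stack[$sp], $vars);
        $sp--;
        $stack[$sp] .= $result;
    }
    return $stack[0];
}

$vars = array();
render_line('[set vowelcount 8][set conscount 13]', $vars);
echo render_line('Total: [em [sum vowelcount conscount]]', $vars), "\n";
echo render_line('I must [em really', $vars), "\n";
```

The first echo prints "Total: <em>21</em>", because [sum ...] runs before the [em ...] that encloses it. The second prints "I must <em>really</em>" even though the closing bracket is missing.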