<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Karig</title>
	<atom:link href="http://karig.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://karig.net</link>
	<description>My humble home on the Web</description>
	<lastBuildDate>Thu, 17 Dec 2009 20:24:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Marks and tracks</title>
		<link>http://karig.net/2009/12/marks-and-tracks/</link>
		<comments>http://karig.net/2009/12/marks-and-tracks/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 21:05:42 +0000</pubDate>
		<dc:creator>Karig</dc:creator>
				<category><![CDATA[algorithms]]></category>

		<guid isPermaLink="false">http://karig.net/?p=758</guid>
		<description><![CDATA[I'll want my text editor to be able to internally "mark" words and phrases and characters within the text. Here I discuss the basics of a system for doing this. I'll need this system to implement such features as syntax highlighting, background spellchecking, bookmarks, and highlighting the results of previous searches.]]></description>
			<content:encoded><![CDATA[<p><em>In </em><a href="/2009/11/textbuffer-overview/"><em>a previous post</em></a><em>, I listed some of the features I wanted my TextBuffer class to have. One of these was &#8220;tags.&#8221; I decided to call these &#8220;marks&#8221; instead.</em></p>
<p><strong><em>Marks</em></strong> are something that my text editor will have under the hood; they are not a feature that would be directly visible to the user. Specifically, a mark is a chunk of data that is associated with a <strong><em>span</em></strong> of text (typically a single word or phrase) within a file being edited. Marks are not saved with the text as part of the file, but the file, while open, could have thousands of marks associated with various parts of the text. Each mark remains associated with its span even as the text is being edited. Marks would be useful for implementing syntax highlighting, bookmarks, and other features that a modern text editor would be expected to offer.</p>
<p>Each mark belongs to a <strong><em>track</em></strong>, which is conceived as being of the same length as the main text. Each point on a track corresponds to a character position within the main text. Each mark occupies a contiguous run of points on its track, so each mark corresponds to the characters at those positions: the mark&#8217;s span. Each point on a track is occupied by no more than one mark, so marks on the same track cannot overlap. Thus, if a span of text is to have two marks, the second mark has to belong to a separate track. The editor can generate as many of these separate tracks as needed.</p>
<p>Here is an example involving three tracks of marks:</p>
<pre>Track 1: art noun adverb_ verb_ noun art adjective noun
Track 2: U                      U                      PB
Track 3:    O    O       O     O    O   O         O     M
Text:    The girl quickly found Fido the yellowish mutt.&lt;</pre>
<p>(Note that the less-than sign at the end of the text represents a newline character.)</p>
<p>Take a look at each track in this example:</p>
<ul>
<li>Track 1 marks each word, each contiguous sequence of letters, with its part of speech: article, noun, verb, adjective, or adverb. Each mark in this track takes up the length of its corresponding word.</li>
<li>Track 2 assigns a class to each character other than a space or a lowercase letter — &#8220;U&#8221; marks each uppercase letter, &#8220;P&#8221; marks each punctuation character, and &#8220;B&#8221; marks the mandatory-break character at the end. In this case, each mark takes up no more than one character.</li>
<li>Finally, Track 3 marks the characters where the line can be broken on a wordwrapped display: &#8220;O&#8221; marks optional-break characters like spaces, while &#8220;M&#8221; marks mandatory-break characters like newlines. Again, each mark corresponds to a single character.</li>
</ul>
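<p>To make the idea concrete, here is a minimal Python sketch (Python only for illustration; the function name and record format are hypothetical) that builds a track like Track 3 from raw text. It assumes spaces and tabs are optional breaks (&#8220;O&#8221;) and newlines are mandatory breaks (&#8220;M&#8221;). Each marked character gets its own one-character mark, and runs of unmarked characters are merged into a single record with a blank name.</p>

```python
def break_track(text):
    """Build a Track 3-style list of (name, length) records for `text`.

    Assumed classes: "O" (optional break) for spaces and tabs,
    "M" (mandatory break) for newlines; everything else is unmarked ("").
    """
    records = []
    for ch in text:
        if ch == "\n":
            name = "M"
        elif ch in " \t":
            name = "O"
        else:
            name = ""
        if name == "" and records and records[-1][0] == "":
            # Extend the current run of unmarked text.
            records[-1] = ("", records[-1][1] + 1)
        else:
            # Each marked character gets its own one-character mark.
            records.append((name, 1))
    return records


track3 = break_track("The girl quickly found Fido the yellowish mutt.\n")
print(track3[:4])  # [('', 3), ('O', 1), ('', 4), ('O', 1)]
```

<p>The lengths in the resulting records add up to the length of the text, which is what keeps a track aligned with the text it describes.</p>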
<p>Note that each mark has a <strong><em>name</em></strong>. This name in effect assigns a class or category to the mark&#8217;s span. Track 1 marks some words as nouns and others as articles; Track 2 marks some characters as uppercase letters; Track 3 marks each space as an optional-break character to make wordwrap easier to implement. Mark names allow the text editor to treat different spans in the same way if they have the same mark name, but differently if their mark names are different.</p>
<p>(Another way of thinking about this, if you&#8217;re familiar with object-oriented programming, is that a track corresponds to an object property (part of speech, character type, line-break type), and a mark corresponds to a value assigned to that property.)</p>
<p>You can see how this would be useful in the bowels of a text editor. Marks would be useful not only for marking the locations of line-break opportunities or of different types of characters; they would also be useful for syntax highlighting (marking the locations of various classes of words or phrases) and for spellchecking as you type.</p>
<p>Marks would likely be most useful for marking up just the text that is about to be displayed on the screen: finding where each line of text can be wrapped, determining the &#8220;classes&#8221; of words so that the editor knows what font and style to use when drawing the text, finding misspellings, and so on. Marking up the rest of the file would be a time-consuming chore that would generate a lot of data that would then have to be stored somewhere in case it is needed later, so this kind of busywork is best avoided.</p>
<h3>Implementation</h3>
<p>The text buffer contains nothing but text from the open file. Loading and saving files is simpler if the buffer contains only the text loaded from or saved to files. Search and replace is also simpler if the buffer contains all of the data, and only the data, to be searched. (Syntax highlighting relies on searches in the background, so searches need to be fast, and they are faster when they are simpler.) So marks and tracks are stored elsewhere.</p>
<p>Marks never overlap within a track, so a track could be represented as an array of short records, where each record contains just a mark name (as String) and a span length (as Integer). If the mark name is blank, then the span is just unmarked text. So the data for Track 1 above would look like this:</p>
<pre>Name       Length     | (Corresponding
as String  as Integer |  span)
---------  ---------- | ---------------
article        3      | "The"
               1      | " "
noun           4      | "girl"
               1      | " "
adverb         7      | "quickly"
               1      | " "
verb           5      | "found"
               1      | " "
noun           4      | "Fido"
               1      | " "
article        3      | "the"
               1      | " "
adjective      9      | "yellowish"
               1      | " "
noun           4      | "mutt"
               2      | "." + EndOfLine</pre>
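<p>The table can be written down directly as data. This Python sketch (names hypothetical) stores Track 1 as a list of (name, length) pairs and recovers each span by walking the lengths from the start of the text. A blank name means unmarked text.</p>

```python
# Track 1 from the example, as (mark name, span length) records.
# An empty name means the span is unmarked text.
TEXT = "The girl quickly found Fido the yellowish mutt.\n"
TRACK1 = [
    ("article", 3), ("", 1), ("noun", 4), ("", 1), ("adverb", 7), ("", 1),
    ("verb", 5), ("", 1), ("noun", 4), ("", 1), ("article", 3), ("", 1),
    ("adjective", 9), ("", 1), ("noun", 4), ("", 2),
]


def spans(text, track):
    """Yield (name, span_text) pairs by walking the track's lengths."""
    pos = 0
    for name, length in track:
        yield name, text[pos:pos + length]
        pos += length
    # A valid track covers the whole text exactly.
    assert pos == len(text), "track length must equal text length"


for name, span in spans(TEXT, TRACK1):
    if name:
        print(f"{name:10} {span!r}")
```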
<p>I&#8217;m storing lengths, not offsets, to make it easier to keep marks and spans together when text is inserted or deleted. If text is typed in the middle of a span, then all that needs to happen is to increase the span length in the mark&#8217;s record by the number of characters typed (and to decrease it when characters within the span are deleted); the records for all the other spans stay as they are. If I need to find the mark that corresponds to a specific offset within the text, I just keep adding span lengths until the sum reaches the offset.</p>
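<p>Here is a sketch of those two operations in Python. It makes two simplifying assumptions: offsets are 0-based, and text typed inside a span simply joins that span. The function names are hypothetical.</p>

```python
def find_record(track, offset):
    """Return (record index, offset within that span) for a 0-based offset.

    Walks the records, adding span lengths until the running total
    passes the offset.
    """
    total = 0
    for i, (_, length) in enumerate(track):
        if offset in range(total, total + length):
            return i, offset - total
        total += length
    raise IndexError("offset is past the end of the track")


def insert_text(track, offset, count):
    """Record an insertion of `count` characters at `offset`.

    Simplifying assumption: the inserted text joins the span that
    contains `offset`, so only that one record changes.
    """
    i, _ = find_record(track, offset)
    name, length = track[i]
    track[i] = (name, length + count)


track = [("article", 3), ("", 1), ("noun", 4), ("", 1)]
insert_text(track, 6, 2)  # type two characters inside "girl"
print(track)  # [('article', 3), ('', 1), ('noun', 6), ('', 1)]
```

<p>An insertion changes only one record, however far into the text it happens. The sketch does not handle typing at the very end of the text. Typing exactly on a boundary between two spans goes to the later span, which may not always be what is wanted.</p>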
<p>I&#8217;ll have more to say about this later. I&#8217;ll need to spell out exactly how this information is laid out in memory. I&#8217;ll also have to figure out how I want this information saved into temp files and then reloaded when the mark information is needed. But that&#8217;s for later. Right now I just wanted to get something posted.</p>
]]></content:encoded>
			<wfw:commentRss>http://karig.net/2009/12/marks-and-tracks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>String vs. MemoryBlock</title>
		<link>http://karig.net/2009/11/string-vs-memoryblock/</link>
		<comments>http://karig.net/2009/11/string-vs-memoryblock/#comments</comments>
		<pubDate>Sat, 28 Nov 2009 06:23:41 +0000</pubDate>
		<dc:creator>Karig</dc:creator>
				<category><![CDATA[REALbasic]]></category>

		<guid isPermaLink="false">http://karig.net/?p=538</guid>
		<description><![CDATA[My text editor has to have a way to store text in memory. REALbasic provides two classes for this: String and MemoryBlock. For my purposes, MemoryBlock is better. ]]></description>
			<content:encoded><![CDATA[<p>My text editor has to have a way to store text in memory. REALbasic provides two classes for this: String and MemoryBlock. You might think at first that Strings are for text and MemoryBlocks are for binary data, but for my purposes, MemoryBlock is better.</p>
<h3>String objects</h3>
<p>Strings in REALbasic are immutable — that is, you can&#8217;t write code to just reach inside a String and rearrange its contents. If you write code to change the text in a String, your program will always create a new String object and copy all of the text from the old String into the new String. That&#8217;s just how REALbasic Strings work. The whole point of a text editor is to allow you to change the text in a file at any time, so this operation should be efficient, and REALbasic&#8217;s String class is not efficient for this.</p>
<p>This is how you insert a new piece of text into a String:</p>
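<pre>Dim buffer, newText as String
Dim offset as Integer
//...
// Build a new String from three pieces; the old buffer is then released.
buffer = Left( buffer, offset ) _
    + newText _
    + Right( buffer, Len( buffer ) - offset )</pre>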
<p>Here is what happens under the hood whenever the above code is run: REALbasic creates a new &#8220;buffer&#8221; object, copies all of the data from the old &#8220;buffer&#8221; and &#8220;newText&#8221; objects into the new &#8220;buffer&#8221; object, returns the new &#8220;buffer&#8221; object, and marks the old &#8220;buffer&#8221; object as empty space to be reclaimed by the operating system. This isn&#8217;t so bad if &#8220;buffer&#8221; is always a small String, but what if the String is the text from a 200KB file? Then every time the user types something, the editor has to borrow a 200KB block from the operating system to build the new object, then copy 200KB of data from the existing buffer into the new object, then let the REALbasic framework &#8220;collect&#8221; the old buffer so the OS can have its 200KB block back. This takes up both CPU time and memory. Surely there is a way to insert new text that takes up less?</p>
<p>There is, of course. Instead of keeping a file&#8217;s contents in memory as a single big long String, the editor could split the file&#8217;s contents up into many short Strings and store references to those Strings in an array. Inserting a String into an array actually inserts only a reference to the String, not its contents, so insertion is quick. Now the code to insert new text into the buffer looks like this:</p>
<pre>Dim buffer() as String
Dim newText, leftPart, rightPart as String
Dim index, offset as Integer
//...
leftPart = Left( buffer( index ), offset )
rightPart = Right( buffer( index ), _
    Len( buffer( index ) ) - offset )
// Replace the old segment with its left part, then add the new text
// and the right part after it.
buffer( index ) = leftPart
buffer.Insert index + 1, newText
buffer.Insert index + 2, rightPart</pre>
<p>The code is longer, but it runs faster because there is less going on under the hood: New text is inserted by splitting only a single segment of the file and inserting the new text in between the parts. It isn&#8217;t necessary to append the parts together into a long String again, because the array keeps the parts in order.</p>
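<p>The REALbasic snippet leaves out how <code>index</code> and <code>offset</code> are found. Here is a Python sketch of the whole operation, using hypothetical names and 0-based positions. It walks the segment lengths to find the segment that contains the insertion point, and then splits only that segment.</p>

```python
def insert_into_segments(segments, pos, new_text):
    """Insert new_text at 0-based position pos in a list of string segments.

    Only the one segment containing pos is split; the other segments are
    untouched, and the list itself holds references, not copies.
    """
    start = 0
    for index, seg in enumerate(segments):
        if pos in range(start, start + len(seg) + 1):
            offset = pos - start
            # Replace the old segment with its left part, then add the
            # new text and the right part after it.
            segments[index] = seg[:offset]
            segments.insert(index + 1, new_text)
            segments.insert(index + 2, seg[offset:])
            return
        start += len(seg)
    raise IndexError("position is past the end of the buffer")


buf = ["The girl found ", "Fido."]
insert_into_segments(buf, 9, "quickly ")
print(buf)  # ['The girl ', 'quickly ', 'found ', 'Fido.']
```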
<p>This clearly alleviates the problem, but it does not solve it: To insert new data into the buffer, you still have to copy the text of the segment being split into two new Strings.</p>
<p>In addition, there is another problem: Searching the buffer becomes very awkward when the buffer consists of lots of fragments in an array instead of one great big long string. So in this case the editor would probably append all of these items into a single string when a search is about to be performed. Again, the data in the many segments of the buffer would need to be copied into a separate String object just to perform a search. As before, this takes time that the editor could spend more productively doing something else, and it doubles the editor&#8217;s memory requirements because it has to make a second copy of all of the text data. There has to be a better way.</p>
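<p>The awkwardness is easy to demonstrate. A phrase can straddle the boundary between two segments, so searching each segment on its own misses it. Here is a small Python illustration (the helper name is hypothetical):</p>

```python
def find_naive(segments, needle):
    """Search each segment separately (misses matches across boundaries)."""
    start = 0
    for seg in segments:
        i = seg.find(needle)
        if i != -1:
            return start + i
        start += len(seg)
    return -1


segments = ["The girl qui", "ckly found Fido."]
print(find_naive(segments, "quickly"))    # -1: the match straddles two segments
print("".join(segments).find("quickly"))  # 9, but only after copying all the text
```

<p>Joining the segments finds the match, but only by building a second copy of all the text.</p>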
<p><i>(Side note: Theodore H. Smith offers a REALbasic plugin called <a href="http://www.elfdata.com/plugin/">ElfData</a>. The plugin offers special string classes (ElfData and FastString) that let you split and recombine strings much more efficiently than REALbasic&#8217;s String class, mostly by cutting down on the amount of data copying and memory allocation that has to be done at runtime. However, the problem (from my perspective) is the same: The ElfData and FastString classes still work like String objects. To insert text into a long string, you still have to break the long string apart and copy all of the text from the various pieces into a new object, and you need space in memory for two copies of your data to do that. A text editor would use memory more efficiently if it could avoid creating a second copy of the text altogether.)</i></p>
<h3>MemoryBlock objects</h3>
<p>Unlike String objects, MemoryBlocks in REALbasic can be updated in place: You can insert new text into a MemoryBlock without the REALbasic framework recreating the whole MemoryBlock each time. Further, as <a href="http://www.declaresub.com/article/149/realbasics-secret-string-buffer">Charles Yeomans points out</a>, if you use a BinaryStream object as the front end for a MemoryBlock, you end up with a buffer that acts like a file: you can &#8220;seek&#8221; to a given position anywhere within the buffer and read or write data at that point, and if you write data beyond the end of the buffer, then the buffer will be made large enough to hold the new data.</p>
<p>When you write ten bytes of data into the MemoryBlock, then ten bytes of space within the MemoryBlock, beginning with the current position, will be overwritten. If the current position is ten or more bytes from the end of the buffer, then no data in the MemoryBlock is copied or moved, so the write operation is extremely fast. However, if the current position is less than ten bytes from the end of the buffer, then a new MemoryBlock has to be allocated, and the data in the old MemoryBlock has to be copied into the new one. If you use a BinaryStream object as the front end for a MemoryBlock, as explained in Charles Yeomans&#8217;s article, the new MemoryBlock will be twice the size of the old one, so the need to move data to a new MemoryBlock should not arise often.</p>
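<p>The growth behaviour described above can be sketched in Python with a <code>bytearray</code>. This is not how REALbasic implements BinaryStream or MemoryBlock. The sketch only assumes the doubling strategy described in the linked article, and the class name is hypothetical. A write within the current capacity overwrites bytes in place, and only a write past the end triggers a copy into a larger block.</p>

```python
class GrowableBuffer:
    """Sketch of a write buffer that doubles its capacity when full."""

    def __init__(self, capacity=16):
        self.data = bytearray(capacity)
        self.length = 0          # bytes actually in use
        self.reallocations = 0   # how many times data had to be copied

    def write_at(self, pos, payload):
        end = pos + len(payload)
        if end > len(self.data):
            # Not enough room: double the capacity until the write fits,
            # then copy the existing data once into the new block.
            new_cap = len(self.data)
            while end > new_cap:
                new_cap *= 2
            grown = bytearray(new_cap)
            grown[:self.length] = self.data[:self.length]
            self.data = grown
            self.reallocations += 1
        # Overwrite in place; no other bytes are moved.
        self.data[pos:end] = payload
        self.length = max(self.length, end)


buf = GrowableBuffer()
for _ in range(1000):
    buf.write_at(buf.length, b"x")
print(buf.length, len(buf.data), buf.reallocations)  # 1000 1024 6
```

<p>Appending 1,000 bytes one at a time, starting from a 16-byte block, takes only six reallocations, because each reallocation doubles the capacity.</p>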
<p>But if you want to <em>insert</em> text into a MemoryBlock, you don&#8217;t want to overwrite the text already in there. So it would seem that you&#8217;d still need to copy existing text to a new location to make room for the new text. However, if the text to be overwritten is actually considered garbage and not part of the text you want to keep, then you can &#8220;insert&#8221; your text extremely quickly. This is the basis for the <a href="http://en.wikipedia.org/wiki/Gap_buffer">gap buffer</a> that is used in many text editors. A gap buffer is a text buffer with three regions: the text before the current position, the text after the current position, and the &#8220;gap&#8221; in between. The gap is of course just a stretch of nonsense text that isn&#8217;t considered part of the main text and that is present to be overwritten with whatever new text the user types or pastes in.</p>
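<p>Here is a minimal gap buffer in Python, as a sketch of the technique rather than a finished design. The class and method names are hypothetical, text is stored as bytes, and there is no bounds checking. Inserting copies the new text into the gap, and deleting just widens the gap. Moving the cursor moves the gap by shifting only the bytes between the old and new positions.</p>

```python
class GapBuffer:
    """Minimal gap buffer over a bytearray.

    buf[:gap_start] is the text before the cursor, buf[gap_end:] the text
    after it; the bytes in between are the gap and hold garbage.
    """

    def __init__(self, text=b"", gap=8):
        self.buf = bytearray(text) + bytearray(gap)
        self.gap_start = len(text)
        self.gap_end = len(self.buf)

    def move_gap(self, pos):
        """Move the gap so it starts at text position pos."""
        gap = self.gap_end - self.gap_start
        if pos > self.gap_start:
            # Shift the bytes just after the gap to just before it.
            n = pos - self.gap_start
            self.buf[self.gap_start:self.gap_start + n] = \
                self.buf[self.gap_end:self.gap_end + n]
        elif self.gap_start > pos:
            # Shift the bytes just before the gap to just after it.
            n = self.gap_start - pos
            self.buf[self.gap_end - n:self.gap_end] = self.buf[pos:self.gap_start]
        self.gap_start = pos
        self.gap_end = pos + gap

    def _grow(self, needed):
        # Enlarge the gap; this is the only step that reallocates.
        extra = max(needed, len(self.buf))
        self.buf[self.gap_end:self.gap_end] = bytearray(extra)
        self.gap_end += extra

    def insert(self, pos, data):
        self.move_gap(pos)
        if len(data) > self.gap_end - self.gap_start:
            self._grow(len(data))
        # Overwrite garbage in the gap with the new text.
        self.buf[self.gap_start:self.gap_start + len(data)] = data
        self.gap_start += len(data)

    def delete(self, pos, count):
        # Deleting just widens the gap over the removed bytes.
        self.move_gap(pos)
        self.gap_end = min(self.gap_end + count, len(self.buf))

    def text(self):
        return bytes(self.buf[:self.gap_start] + self.buf[self.gap_end:])


gb = GapBuffer(b"The girl found Fido.")
gb.insert(9, b"quickly ")
gb.delete(0, 4)
print(gb.text())  # b'girl quickly found Fido.'
```

<p>The buffer is reallocated only when an insertion is larger than the gap. When that happens, the sketch grows the gap by at least the current buffer size, so such reallocations stay rare.</p>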
<p>If you need to search the text being edited, the search might be easier if the gap is removed, for example by moving the gap to the end of the buffer. This of course requires copying the text that follows the gap to an earlier position within the buffer. However, no new objects are created, so the editor would not need to borrow more memory from the operating system to perform this operation.</p>
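<p>Closing the gap can be done entirely inside the existing buffer. In this Python sketch (the function name is hypothetical), the bytes after the gap are shifted left one at a time. That is slow in Python, but it allocates no second copy of the text. The search then runs over the contiguous text with <code>bytearray.find</code>, whose optional start and end arguments limit it to the live text.</p>

```python
def close_gap(buf, gap_start, gap_end):
    """Shift the text after the gap left so the gap ends up at the end.

    Copies within the same bytearray (left-to-right is safe because the
    destination is always before the source). Returns the text length.
    """
    n = len(buf) - gap_end
    for i in range(n):
        buf[gap_start + i] = buf[gap_end + i]
    return gap_start + n


buf = bytearray(b"The girl ____found Fido.")  # "____" stands for the gap
length = close_gap(buf, 9, 13)
print(buf.find(b"girl found", 0, length))  # 4
```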
<h3>MemoryBlocks are best</h3>
<p>Clearly, if you want to write a text editor in REALbasic, it is better to use a MemoryBlock to store the text being edited. Inserting and deleting text will be much faster and will require less memory than if you were to try to store the text in String objects.</p>
]]></content:encoded>
			<wfw:commentRss>http://karig.net/2009/11/string-vs-memoryblock/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TextBuffer: Overview</title>
		<link>http://karig.net/2009/11/textbuffer-overview/</link>
		<comments>http://karig.net/2009/11/textbuffer-overview/#comments</comments>
		<pubDate>Sat, 14 Nov 2009 15:46:15 +0000</pubDate>
		<dc:creator>Karig</dc:creator>
				<category><![CDATA[classes and objects]]></category>

		<guid isPermaLink="false">http://karig.net/?p=476</guid>
		<description><![CDATA[My text editor is going to need a buffer system. What I want is a buffer that offers features like the following: the ability to attach "tags" to specific words or phrases within the text, the ability to record changes (and thus allow both undo and macro recording), the automatic saving of changes to temporary files, and the ability to handle files too large to load into memory.]]></description>
			<content:encoded><![CDATA[<p><em>[<strong>Update 17 December 2009:</strong> I've changed some of the terminology here. "Tags" are now "</em><a href="/2009/12/marks-and-tracks/"><em>marks</em></a><em>"; marks are a part of the innards of the text editor, while the word "tags" ought to be reserved for a feature that the user might use (such as "<a href="http://en.wikipedia.org/wiki/Ctags">ctags</a>"). In addition, "entries" are now "sections" because a section can contain any type of data, while entries, notes, articles, and similar things each contain data of a specific kind, or data that have certain characteristics: You'd expect an "entry" to have a date, or a byline, or both; you'd expect an "article" to have an essay-like structure with an introductory part and a conclusion; and so on. I've updated the text below to use the new terms.]</em></p>
<p>This is just a quick overview. I&#8217;ll flesh out the ideas here in later posts.</p>
<p><a href="/2009/10/i-want-to-write-a-text-editor/">My text editor</a> is going to need a buffer system. I need a buffer for at least two things:</p>
<ul>
<li><a href="/2009/10/why-a-custom-text-control/">My custom text control</a> will need a buffer to cache the text being displayed in the editor window, so that it can redraw the text when the window has to be redrawn. The buffer also needs to hold font and style information, so the control knows what fonts to use to redraw the text. (I&#8217;ll need this so that I can implement syntax highlighting.)</li>
<li>The editor itself needs a buffer to hold the text of the file being edited.</li>
</ul>
<p>I&#8217;ve thought this over. What I want is a buffer that offers the following:</p>
<h3>General buffer functionality</h3>
<p>Ideally the text buffer should be useful outside the context of a text-editor application. Code that uses the buffer should be able to insert or delete text anywhere within the buffer, search the buffer for phrases or regular expressions, and extract substrings. I&#8217;m thinking of giving the buffer the same functions available for strings in REALbasic: the &#8220;+&#8221; operator to append data to the end of the buffer, &#8220;Mid&#8221; and other functions to extract substrings, &#8220;InStr&#8221; to find substrings, and so on. Some other functions that a text editor needs might also be useful in a general-purpose buffer, such as counting the number of words or lines.</p>
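<p>Here is a Python sketch of what such an interface might look like. The class and method names are hypothetical. <code>+=</code> stands in for REALbasic&#8217;s &#8220;+&#8221;, and <code>mid</code> and <code>instr</code> use 1-based positions and return 0 for &#8220;not found&#8221;, in the spirit of REALbasic&#8217;s Mid and InStr; the real functions differ in details. Storage is a plain string purely for brevity.</p>

```python
class TextBuffer:
    """Hypothetical string-like API for a text buffer (1-based positions)."""

    def __init__(self, text=""):
        self._text = text  # naive storage, for illustration only

    def __iadd__(self, more):
        """buf += "more" appends to the end of the buffer."""
        self._text += more
        return self

    def __len__(self):
        return len(self._text)

    def mid(self, start, length=None):
        """Substring starting at 1-based position start."""
        begin = start - 1
        end = len(self._text) if length is None else begin + length
        return self._text[begin:end]

    def instr(self, needle, start=1):
        """1-based position of needle at or after start, or 0 if absent."""
        return self._text.find(needle, start - 1) + 1

    def insert(self, pos, text):
        """Insert text before 1-based position pos."""
        i = pos - 1
        self._text = self._text[:i] + text + self._text[i:]

    def delete(self, pos, length):
        """Remove length characters starting at 1-based position pos."""
        i = pos - 1
        self._text = self._text[:i] + self._text[i + length:]

    def word_count(self):
        return len(self._text.split())

    def line_count(self):
        return self._text.count("\n") + 1 if self._text else 0


buf = TextBuffer("The girl found Fido.")
buf += "\nFido barked."
print(buf.instr("Fido"), buf.mid(5, 4), buf.word_count(), buf.line_count())
# 16 girl 6 2
```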
<h3>Sections</h3>
<p>I want an application that can treat a single long file as many sections of text. The characters that determine where one section ends and the next begins should be selected by the user. For example, you might use a line of hyphens to separate one section from the next, so all of the lines between two lines of hyphens would constitute a single section.</p>
<p>The significance of sections is that they would be like records in a database: You can get a subset of the complete file by specifying a phrase or a regular expression that each section must contain if it is to be part of the subset. The subset would contain only complete sections, not just the individual lines containing the search matches. You could then decide what to do with the sections in the subset — print them, sort them, export them to a script or a command-line program, or whatever.</p>
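<p>Here is a minimal Python sketch of section filtering. It assumes, hypothetically, that sections are separated by lines consisting of exactly ten hyphens; in the editor, the separator would be whatever the user chooses. Whole sections are returned, not just the lines that match.</p>

```python
import re


def split_sections(text, separator_line="-" * 10):
    """Split text into sections at lines consisting solely of separator_line."""
    sections, current = [], []
    for line in text.splitlines(keepends=True):
        if line.rstrip("\r\n") == separator_line:
            sections.append("".join(current))
            current = []
        else:
            current.append(line)
    sections.append("".join(current))
    return sections


def matching_sections(text, pattern, separator_line="-" * 10):
    """Return whole sections (not single lines) that contain a regex match."""
    rx = re.compile(pattern)
    return [s for s in split_sections(text, separator_line) if rx.search(s)]


notes = """Buy milk
----------
Idea: elastic tabstops
in the editor
----------
Call Bob
"""
print(matching_sections(notes, r"tabstop"))
# ['Idea: elastic tabstops\nin the editor\n']
```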
<p>In essence, if you were to take advantage of sections, you could use a text file as a simple database of notes or code snippets, a journal, or a card file.</p>
<h3>Marks</h3>
<p>A text editor often needs to mark spans (substrings) of text within the complete document. A span within a source-code file might be a keyword or a comment and thus should be displayed in a style distinct from the style used for the rest of the text. A span might be a bit of text that the user wants to return to later, so it needs to be marked internally with a bookmark. In each case, this extra information associated with a span of text needs to remain attached to the span while the user is editing the document. In fact, even if an edit changes the text of the span itself, the information needs to remain associated with whatever part of the span is left in the document.</p>
<p>This extra information is called a &#8220;mark.&#8221; Marks would be useful for associating style information with spans, for marking the results of previously run searches, for storing bookmarks within the text, and for other things.</p>
<p>Marks would be stored separately from the text in the buffer. The text in the buffer is not mixed with any other kind of information, so that searches through the text don&#8217;t require any kind of preparation beforehand and in fact can be performed at almost any time. (I strongly suspect that my text editor will need to do a lot of searches in the background, particularly if I want such things as syntax highlighting, line and word counts, and so on, so I want searches to be as fast as possible.)</p>
<h3>Recording</h3>
<p>The buffer should record each change that is made to the text in the buffer. Both text insertions and text deletions should be recorded, along with the locations where each insertion and each deletion occurred.</p>
<p>This facility would provide the foundation for several features desirable in a text editor, most notably undo/redo, review of changes made to a file, and macros.</p>
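<p>Here is a Python sketch of such a change log (all names hypothetical). Each change records whether text was inserted or deleted, where, and what text was involved. Storing the deleted text is what makes a deletion reversible: undo applies the inverse change, and redo reapplies the original.</p>

```python
from dataclasses import dataclass


@dataclass
class Change:
    kind: str  # "insert" or "delete"
    pos: int   # 0-based offset where the change happened
    text: str  # the text inserted, or the text that was deleted


class RecordingBuffer:
    """Sketch: every edit is logged so it can be undone and redone."""

    def __init__(self, text=""):
        self.text = text
        self.undo_log = []
        self.redo_log = []

    def _apply(self, change):
        if change.kind == "insert":
            self.text = self.text[:change.pos] + change.text + self.text[change.pos:]
        else:
            end = change.pos + len(change.text)
            self.text = self.text[:change.pos] + self.text[end:]

    @staticmethod
    def _inverse(change):
        kind = "delete" if change.kind == "insert" else "insert"
        return Change(kind, change.pos, change.text)

    def _record(self, change):
        self._apply(change)
        self.undo_log.append(change)
        self.redo_log.clear()  # a new edit invalidates the redo history

    def insert(self, pos, text):
        self._record(Change("insert", pos, text))

    def delete(self, pos, length):
        self._record(Change("delete", pos, self.text[pos:pos + length]))

    def undo(self):
        change = self.undo_log.pop()
        self._apply(self._inverse(change))
        self.redo_log.append(change)

    def redo(self):
        change = self.redo_log.pop()
        self._apply(change)
        self.undo_log.append(change)


buf = RecordingBuffer("The girl found Fido.")
buf.insert(9, "quickly ")
buf.delete(0, 4)
print(buf.text)  # girl quickly found Fido.
buf.undo()
buf.undo()
print(buf.text)  # The girl found Fido.
buf.redo()
print(buf.text)  # The girl quickly found Fido.
```

<p>The same list of changes could also be shown to the user for review or replayed later, which is where change review and macros would start.</p>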
<h3>Autosave</h3>
<p>I want the buffer to save changes automatically, but <em>without</em> overwriting the original file. So the buffer needs two &#8220;stores&#8221; or places to load and save text — the original file or text source, and a temporary one for holding changed text until the user decides whether to discard the changes or to save them as a new file and overwrite the original.</p>
<p>If the &#8220;temp store&#8221; is somewhere other than memory, such as a folder on your hard disk or thumb drive, then you have some insurance against losing your changes in the event of a system crash or a power outage. Because the changes are kept separate from the original file, you still have the option, after a crash, to either commit the changes to the original file or to discard them.</p>
<p>This storage would be handled by two objects other than the buffer itself. The buffer wouldn&#8217;t know and wouldn&#8217;t care how these two objects load and save data. So although the original store would usually be a file on a hard disk, and although the temp store would usually be a folder, either store (or both) could access data on a website or an FTP site. On the other hand, if an application only needs the buffer for a few seconds, either store (or both) might access data in data structures in memory and never read or write anything on disk. This will help make the buffer system more useful in more situations than just as part of a text-editor application.</p>
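<p>The separation between the buffer and its two stores could look something like this Python sketch. The <code>Store</code> interface and all class names are hypothetical. A file-, folder-, or FTP-backed store would implement the same three methods, while the in-memory store here never touches the disk.</p>

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Somewhere a buffer can load and save text; the buffer never knows how."""

    @abstractmethod
    def load(self, key): ...

    @abstractmethod
    def save(self, key, data): ...

    @abstractmethod
    def remove(self, key): ...


class MemoryStore(Store):
    """Keeps data in a dict, so nothing is ever written to disk."""

    def __init__(self):
        self._items = {}

    def load(self, key):
        return self._items.get(key)

    def save(self, key, data):
        self._items[key] = data

    def remove(self, key):
        self._items.pop(key, None)


class AutosavingBuffer:
    def __init__(self, original, temp, key):
        self.original, self.temp, self.key = original, temp, key
        # Unsaved changes left in the temp store (e.g. after a crash) win.
        pending = temp.load(key)
        self.text = pending if pending is not None else (original.load(key) or "")

    def edit(self, new_text):
        self.text = new_text
        self.temp.save(self.key, new_text)  # autosave; the original is untouched

    def commit(self):
        self.original.save(self.key, self.text)
        self.temp.remove(self.key)

    def discard(self):
        self.temp.remove(self.key)
        self.text = self.original.load(self.key) or ""


disk, temp = MemoryStore(), MemoryStore()
disk.save("notes.txt", "Original text")
buf = AutosavingBuffer(disk, temp, "notes.txt")
buf.edit("Changed text")
print(disk.load("notes.txt"), "|", temp.load("notes.txt"))
# Original text | Changed text
recovered = AutosavingBuffer(disk, temp, "notes.txt")  # e.g. after a crash
print(recovered.text)  # Changed text
```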
<h3>Segments</h3>
<p>I want the buffer to be able to load and edit files too large to load completely into memory without bogging down the computer. Of course, the only way to do that is to treat such a file as an array of smaller chunks or &#8220;segments&#8221; and then load only the segments that the user is reading or editing.</p>
<p>Note that segments are not sections:</p>
<ul>
<li>Segments are intended to be of roughly similar size, though segments will grow and shrink as the user edits them. If a segment grows too large, the software will split it in two; if it gets too small, the software will merge it with a neighboring segment. Segments exist to simplify the processing that the software has to do when working with very large files or strings.</li>
<li>Sections can be of any size useful to the user, and it is the user who determines where a section begins and ends.</li>
</ul>
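<p>The split-and-merge rule for segments can be sketched in Python as follows. The size limits are hypothetical and deliberately tiny so the example is readable; real limits would be measured in kilobytes.</p>

```python
MAX_SEGMENT = 8  # hypothetical thresholds, tiny so the demo is readable
MIN_SEGMENT = 3


def rebalance(segments, i):
    """Split or merge segment i after an edit so sizes stay in range."""
    seg = segments[i]
    if len(seg) > MAX_SEGMENT:
        # Too large: split into two halves.
        half = len(seg) // 2
        segments[i:i + 1] = [seg[:half], seg[half:]]
    elif len(segments) > 1 and MIN_SEGMENT > len(seg):
        # Too small: merge with the following neighbour
        # (or the previous one if this is the last segment).
        j = i + 1 if i + 1 != len(segments) else i - 1
        lo, hi = min(i, j), max(i, j)
        segments[lo:hi + 1] = [segments[lo] + segments[hi]]
        rebalance(segments, lo)  # the merged segment may now be too large


segs = ["The girl", " found", " Fido."]
segs[0] = "The girl quickly"  # an edit made segment 0 too large
rebalance(segs, 0)
print(segs)  # ['The girl', ' quickly', ' found', ' Fido.']
```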
<p>Segments would also simplify some of the other work that a text editor would need to do, such as updating the locations of mark spans. Since marks themselves are stored outside the text, my buffer code would need to do some extra work after each edit to ensure that each mark remains attached to its span. If each mark stored its span&#8217;s location as a single integer, as an offset in bytes from the start of the file, then editing very large files would be problematic: an insertion near the start of the file would change the offset of every mark after it, so every mark in the whole huge file might need updating. If instead a span&#8217;s location is an offset from the start of a segment, then a change in the text affects only the marks whose spans begin within that one segment. So segments will be very useful in making the job of editing huge files manageable.</p>
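<p>Here is a Python sketch of segment-relative mark positions. The <code>Segment</code> class and the (name, start, length) mark layout are hypothetical. An insertion shifts or stretches only the marks in the segment where it happens, and marks in every other segment keep their stored positions.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    text: str
    # Marks as (name, start, length); start is relative to this segment.
    marks: list = field(default_factory=list)


def insert_in_segment(seg, offset, new_text):
    """Insert text into one segment and adjust only that segment's marks."""
    seg.text = seg.text[:offset] + new_text + seg.text[offset:]
    n = len(new_text)
    updated = []
    for name, start, length in seg.marks:
        if start >= offset:
            start += n   # mark lies at or after the insertion point
        elif offset in range(start, start + length):
            length += n  # insertion lands inside the mark's span
        updated.append((name, start, length))
    seg.marks = updated


segments = [
    Segment("The girl ", [("noun", 4, 4)]),
    Segment("found Fido.", [("verb", 0, 5), ("noun", 6, 4)]),
]
insert_in_segment(segments[1], 0, "quickly ")
print(segments[1].marks)  # [('verb', 8, 5), ('noun', 14, 4)]
print(segments[0].marks)  # [('noun', 4, 4)]
```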
<h3>Summary</h3>
<p>So these are the basic features I want: string-like editing and searching, sections, marks, recording of changes, automatic saving of changes, and the ability to tackle huge files. A buffer class with a feature set like this should be useful for building a fairly sophisticated text editor. I&#8217;ll flesh out each of these features in future posts.</p>
]]></content:encoded>
			<wfw:commentRss>http://karig.net/2009/11/textbuffer-overview/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why a custom text control?</title>
		<link>http://karig.net/2009/10/why-a-custom-text-control/</link>
		<comments>http://karig.net/2009/10/why-a-custom-text-control/#comments</comments>
		<pubDate>Thu, 15 Oct 2009 14:04:51 +0000</pubDate>
		<dc:creator>Karig</dc:creator>
				<category><![CDATA[text control]]></category>

		<guid isPermaLink="false">http://karig.net/?p=303</guid>
		<description><![CDATA[I want to roll my own text control, because the text control that comes with REALbasic doesn't offer the feature set I want.]]></description>
			<content:encoded><![CDATA[<p>REALbasic, like <a href="http://www.answers.com/topic/visual-basic-net">Visual Basic</a>, lets you build your application&#8217;s user interface by dragging controls (which might be called &#8220;<a href="http://en.wikipedia.org/wiki/GUI_widget">widgets</a>&#8221; in another environment) onto a window. One of these controls is the TextArea. (Until REALbasic 2009R3 came out, this was called the EditField. It is a <a href="http://en.wikipedia.org/wiki/Text_box">text box</a>.) This control is essentially a formatted-text editor: The user can type in and edit multiple lines of text, and multiple fonts and styles can be applied to different parts of the text. My text-editor application will obviously need something like this. So why am I considering rolling my own formatted-text-editor control instead?</p>
<p>I have several reasons:</p>
<ul>
<li><em>Handling of large amounts of text.</em> The TextArea is reportedly very slow when it has a huge amount of text in it. Although most text files my editor will handle will likely be under a megabyte in size, I&#8217;d like the editor to be able to work with huge files, tens or hundreds of megabytes in size, every once in a while.</li>
<li><em>Memory use.</em> The complete file has to be loaded into the TextArea. There is no provision to defer loading a part of the file until the user actually scrolls or jumps to that part, and deferred loading of this kind is necessary if the control is to support the editing of huge files.</li>
<li><em><a href="http://en.wikipedia.org/wiki/Word_wrap">Word wrap</a>.</em> The TextArea doesn&#8217;t let you turn off word wrap; a text editor should let you do this (the better to see the structure of source code with very long lines).</li>
<li><em><a href="http://en.wikipedia.org/wiki/Indentation#Indentation_in_programming">Indentation</a>.</em> You can indent the first line of a paragraph in a TextArea by inserting tab characters at the start of the first line in the paragraph, but the TextArea doesn&#8217;t provide anything else for indentation. Whenever a paragraph is wrapped, each line after the first one always starts at the left edge of the control; a text editor ought to be able to indent all of the lines in each paragraph. The structure of a text outline, for example, is easier to see if every line in each paragraph, and not just the first line in each, is indented.</li>
<li><em>Tabstops.</em> The TextArea supports only the conventional fixed-width tabstops. These tab stops are useful for indenting the first line of a paragraph but aren&#8217;t much use in arranging substrings of text separated by tabs. Variable-width tabstops are a feature of word processors, not text editors, and they usually have to be set by the user &mdash; but I&#8217;d like to have my editor support <a href="http://en.wikipedia.org/wiki/Elastic_tabstop">elastic tabstops</a>, which the editor can just calculate on the fly, and which the user doesn&#8217;t even need to set.</li>
<li><em>Showing invisible characters.</em> Text editors often have an option to display normally invisible characters using visible substitutes &mdash; spaces as raised dots (&ldquo;·&rdquo;, U+00B7, <span style="font-variant: small-caps;">middle dot</span>), or tab characters as right-pointing triangles (&ldquo;▶&rdquo;, U+25B6, <span style="font-variant: small-caps;">black right-pointing triangle</span>), or newlines as pilcrow signs (&ldquo;¶&rdquo;, U+00B6) or as left-pointing arrows (&ldquo;↵&rdquo;, U+21B5, <span style="font-variant: small-caps;">downwards arrow with corner leftwards</span>). TextArea doesn&#8217;t offer anything like this.</li>
<li><em>Background colors.</em> It would be nice if my text editor could use different background colors to mark spans of text that match ad-hoc searches. TextArea can show text in different colors, but it offers only one background color for selected text and one for unselected text. This isn&#8217;t a huge deal, but I&#8217;d like a little more flexibility: I&#8217;d like to use light or pastel background colors for search matches and a dark color for the actual selection.</li>
<li><em>Drawing events.</em> The TextArea does not offer any events that would allow a subclass to draw onto the canvas or to alter the positions of text on the canvas. Therefore I can&#8217;t create the kind of control I want by subclassing TextArea.</li>
<li><em>Hiding lines.</em> Code folding and outlining are two text-editor features that would require the control to display two lines while hiding the lines in between. The TextArea does not offer this, either.</li>
</ul>
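<p>Elastic tabstops need no user settings because the column widths can be computed from the text itself. Here is a simplified Python sketch of that calculation (the function name is hypothetical). It counts widths in characters, whereas a real control would measure pixel widths. It treats a column block as a run of consecutive lines that each have a tab-terminated cell in that column.</p>

```python
def elastic_align(lines, padding=2):
    """Align tab-separated cells in the spirit of elastic tabstops."""
    rows = [line.split("\t") for line in lines]
    # Text after the last tab on a line is not a cell.
    ncells = [len(r) - 1 for r in rows]
    widths = [[0] * n for n in ncells]
    for col in range(max(ncells, default=0)):
        i = 0
        while i != len(rows):
            if ncells[i] > col:
                # Find the block of consecutive lines that have this column.
                j = i
                while j != len(rows) and ncells[j] > col:
                    j += 1
                w = max(len(rows[k][col]) for k in range(i, j)) + padding
                for k in range(i, j):
                    widths[k][col] = w
                i = j
            else:
                i += 1
    return ["".join(cell.ljust(w) for cell, w in zip(r, ws)) + r[-1]
            for r, ws in zip(rows, widths)]


for line in elastic_align(["a\tbb\tc", "aaa\tb\tc", "x"]):
    print(repr(line))
```

<p>A line without tabs ends a column block, so the columns above and below it are sized independently.</p>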
<p>It is wonderful that REALbasic lets you write your own controls in REALbasic itself. But others have already written substitutes for the TextArea, such as True North Software&#8217;s <a href="http://www.truenorthsoftware.com/formattedtextcontrol/">FormattedText</a> control and Alex Restrepo&#8217;s <a href="http://homepage.mac.com/alexrestrepo/indexmain.html">CustomTextField</a>. Why not use one of them? Because they don&#8217;t offer the feature mix I want, and because I like the idea of having my own code that I can modify as I see fit and having the freedom to add the things I want.</p>
<p>So I&#8217;ll be spending time figuring out how to wrap lines of text on a Canvas control, how to display a blinking cursor, how to display text as selected as the user drags the mouse over the control, how the text should be structured in memory to support all of these features, and how to regression-test everything so I know it all works. Hopefully the results will be worth the effort.</p>
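<p>Wrapping lines on a Canvas comes down to measuring text and breaking lines between words. Here is a minimal greedy word-wrap sketch in Python, assuming words are separated by single spaces. The <tt>measure</tt> parameter is a hypothetical stand-in for whatever text-measuring call the drawing surface provides; passing Python&#8217;s <tt>len</tt> treats every character as one unit wide, which is handy for testing.</p>

```python
from typing import Callable, List


def wrap_line(text: str, max_width: float,
              measure: Callable[[str], float]) -> List[str]:
    """Greedily break one paragraph into display lines no wider than max_width.

    A word wider than max_width on its own gets a line to itself
    rather than being broken mid-word.
    """
    lines: List[str] = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            lines.append(current)  # current line is full; start a new one
            current = word
        else:
            current = candidate
    lines.append(current)
    return lines


if __name__ == "__main__":
    for line in wrap_line("the quick brown fox jumps over the lazy dog", 10, len):
        print(line)
```

<p>With <tt>len</tt> as the measure and a width of 10, <tt>wrap_line("the quick brown fox jumps", 10, len)</tt> returns <tt>["the quick", "brown fox", "jumps"]</tt>. A greedy wrap like this is fast and simple, but it makes no attempt to balance line lengths across a paragraph.</p>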
]]></content:encoded>
			<wfw:commentRss>http://karig.net/2009/10/why-a-custom-text-control/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why REALbasic?</title>
		<link>http://karig.net/2009/10/why-realbasic/</link>
		<comments>http://karig.net/2009/10/why-realbasic/#comments</comments>
		<pubDate>Wed, 07 Oct 2009 00:30:25 +0000</pubDate>
		<dc:creator>Karig</dc:creator>
				<category><![CDATA[REALbasic]]></category>

		<guid isPermaLink="false">http://karig.net/?p=283</guid>
		<description><![CDATA[My current plan is to write my text editor in REALbasic. REALbasic as a language is easy to learn, easy to write, and easy to read. You can create your own GUI controls, you can compile your application for Windows and Linux and the Mac all at once, and you can even write plugins for REALbasic in C++ if REALbasic isn't fast enough.]]></description>
			<content:encoded><![CDATA[<p>My current plan is to write my text editor in <a href="http://www.realsoftware.com/realbasic/">REALbasic</a>.</p>
<p>I like REALbasic. The language is easy to learn, easy to write, and easy to read. The REALbasic IDE provides lots of pre-built controls for building your program&#8217;s user interface, but you can also create your own controls from the ground up. If you have the Professional version of REALbasic, you can write your program once and use it to build executables for Windows, Mac OS X, and Linux all at once. And if some part of your program should prove to be unacceptably slow, you have the option of writing a REALbasic plugin in C++ to speed things up.</p>
<h3>What&#8217;s good about REALbasic?</h3>
<p>There is a lot to like about this language.</p>
<ul>
<li>The language is easy to pick up. It&#8217;s BASIC, after all.</li>
<li>The language is object-oriented. You can create your own classes, or subclasses of existing classes. You can define interfaces as in Java and then create classes that implement those interfaces.</li>
<li>Objects are reference-counted and are deallocated automatically when no other object points to them.</li>
<li>Features of the language include:
<ul>
<li><strong>Basic data types:</strong> Integer, Single, Double, Boolean, Color, Date, Currency, String, Array, Variant, Dictionary (actually a <a href="http://en.wikipedia.org/wiki/Hash_table">hash table</a>), MemoryBlock. REALbasic has many functions for dealing with variables of each type.</li>
<li><strong>Low-level data types</strong> for working with binary files or C functions in a library or the operating system: Int8, Int16, Int32, Int64; UInt8 (Byte), UInt16, UInt32, UInt64; Ptr; CString, PString, WString.</li>
<li><strong>Literal syntax</strong> not only for hexadecimal (<tt>&amp;hFF</tt>), binary (<tt>&amp;b11101101</tt>), and octal (<tt>&amp;o137</tt>) numbers, but also for colors (<tt>&amp;cFF6F00</tt> for orange) and Unicode codepoints (<tt>&amp;u2014</tt> for an em dash).</li>
<li><strong>Delegates</strong> are essentially typesafe function pointers. A Delegate can point at any function (or object method) with a specific list of argument types and a specific return type: for example, any function that takes a first argument &#8220;As Integer&#8221; and a second argument &#8220;As String&#8221;, takes no other arguments, and returns a Boolean. Because a delegate can be reassigned at runtime, you can use delegates to change what your program does in response to a keystroke or a button press.</li>
<li>Support for <strong>exceptions</strong>. You can define your own exception classes, use the Raise statement to throw an exception, and Try&#8230;Catch&#8230;Finally to catch and handle exceptions.</li>
<li><strong>Declare</strong> lets you call subroutines or functions in DLLs (on Windows) or shared libraries (on Mac OS X or Linux).</li>
<li><strong>Assigns</strong> lets you set up a Sub to be called with &#8220;theSub(a, b) = c&#8221; instead of &#8220;theSub(a, b, c)&#8221;.</li>
<li><strong>Extends</strong> lets you set up a Sub or Function to be called with &#8220;a.theSub(b, c)&#8221; instead of &#8220;theSub(a, b, c)&#8221;.</li>
<li><strong>#If&#8230;#Endif</strong> allows conditional compilation. This is most useful for including or &#8220;commenting out&#8221; code depending on whether (1) a constant is defined, (2) the application is running within the IDE, or (3) the application is being compiled for Windows, the Mac, or Linux.</li>
</ul>
</li>
<li>The IDE lets you build your GUI by dragging controls onto a window and setting their properties, just as <a href="http://en.wikipedia.org/wiki/Visual_Basic">Visual Basic</a> does.</li>
<li>If the controls supplied with REALbasic don&#8217;t do what you want, you can subclass an existing control and write event handlers to change how it works, or you can even subclass the Canvas control and create your own custom control from the ground up.</li>
<li>REALbasic comes with classes for all sorts of things &#8212; standard dialog boxes (for picking folders, files, or colors), regular expressions, XML, databases, shell commands, graphics (lines, shapes, text), networking (including sending data to and receiving data from websites), conversion between text encodings, cooperative (not pre-emptive) threads, sound, editing video (via QuickTime), etc.</li>
<li>There is even a command to make the computer speak the text you pass to it. (This works on Windows and on the Mac, but not on Linux.)</li>
<li>RBscript is a scaled-down version of REALbasic that you can compile into your application so that users can run scripts. Even better, these scripts are compiled, not interpreted, before being run.</li>
<li>REALbasic (the Professional edition of it at any rate) is cross-platform &#8212; you can build applications for Windows, for Mac OS X, and for Linux from a single project file.</li>
<li>REALbasic code is compiled into native machine code, not interpreted.</li>
<li>You don&#8217;t have to pay any royalties for distributing your applications.</li>
<li>REALbasic Personal Edition for Linux is free. If you have a Linux machine, you can write and test REALbasic code on it and not pay a cent to REAL Software until you&#8217;re ready to build executables for Windows or the Mac.</li>
<li>If the code that REALbasic produces isn&#8217;t fast enough, you can write a plugin for REALbasic in C++ containing code to alter the contents of your REALbasic objects.</li>
<li>Many third-party developers have written their own plugins and controls and offer them either for sale or free of charge.</li>
</ul>
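<p>Since REALbasic&#8217;s syntax for declaring a delegate isn&#8217;t shown here, the following Python sketch illustrates only the underlying idea: a named function type (&#8220;takes an Integer and a String, returns a Boolean&#8221;) and a table of handlers that can be swapped at runtime. The names <tt>KeyHandler</tt>, <tt>key_map</tt>, <tt>on_key_down</tt>, and the handlers are hypothetical. Unlike a REALbasic delegate, Python checks the <tt>Callable</tt> type only with an external checker such as mypy, not when the program runs.</p>

```python
from typing import Callable, Dict, List

# Analogue of a delegate type: any function taking (Integer, String)
# and returning Boolean. Python enforces this only via a static checker.
KeyHandler = Callable[[int, str], bool]

log: List[str] = []  # records what each handler did, for illustration


def insert_text(count: int, text: str) -> bool:
    """Pretend to insert text count times; report the key as handled."""
    log.append(text * count)
    return True


def beep(count: int, text: str) -> bool:
    """Ignore the key; report it as not handled."""
    log.append("beep")
    return False


# Hypothetical key map: which handler runs for which key.
key_map: Dict[str, KeyHandler] = {"x": insert_text}


def on_key_down(key: str) -> bool:
    """Dispatch a keystroke to whatever handler is currently bound."""
    handler = key_map.get(key, beep)
    return handler(1, key)


if __name__ == "__main__":
    on_key_down("x")     # runs insert_text
    key_map["x"] = beep  # rebind at runtime
    on_key_down("x")     # now runs beep
    print(log)           # ['x', 'beep']
```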
<h3>What&#8217;s bad about REALbasic?</h3>
<p>There are some drawbacks to REALbasic, though none of these are showstoppers for me personally.</p>
<ul>
<li>Every version of REALbasic other than the Personal Edition for Linux is commercial software. If you want to compile for any platform other than Linux, you have to <a href="http://www.realsoftware.com/store/">pay up</a>.</li>
<li>REALbasic&#8217;s IDE can be constraining at times. You can only see one function or one class method in the IDE at a time. People accustomed to seeing two or more function definitions at once in a text editor may chafe at this.</li>
<li>REALbasic&#8217;s controls reflect the lowest common denominator of the equivalent controls on the three platforms. For example, REALbasic offers a TextArea for letting the user edit styled text, but the TextArea does not have built-in undo; you have to supply that.</li>
<li>REALbasic&#8217;s executables are large. A do-nothing GUI application is around 1.5 MB because REALbasic compiles a large part of the runtime framework into the application. Even a do-nothing console application is about a megabyte in size.</li>
<li>REALbasic does not support <a href="http://developer.apple.com/cocoa/">Cocoa</a> (although a version that <em>does</em> support Cocoa is expected by the end of the year), so any Mac OS X application built with REALbasic right now relies on <a href="http://developer.apple.com/carbon/">Carbon</a>. (This is one reason why developers who develop only for the Mac often prefer <a href="http://developer.apple.com/mac/library/documentation/Cocoa/Conceptual/ObjectiveC/Introduction/introObjectiveC.html">Objective-C</a> and <a href="http://developer.apple.com/tools/xcode/">Xcode</a> over REALbasic.)</li>
<li>REALbasic currently produces only 32-bit code, not 64-bit code.</li>
<li>Documentation for the plugin API is spotty, and actually writing a plugin appears to be something of a dark art. (I might try gathering this information and writing up a how-to on building a plugin with gcc in a later post.)</li>
<li>REAL Software sometimes adds new features to REALbasic and releases them before they are ready. (The latest release, 2009R4, has new report-building functionality that is apparently <a href="http://forums.realsoftware.com/viewtopic.php?f=1&amp;t=30272">not yet ready</a> <a href="http://forums.realsoftware.com/viewtopic.php?f=1&amp;t=30282">for prime time</a>.)</li>
</ul>
<h3>Good enough</h3>
<p>Every programming language and development package involves tradeoffs, and REALbasic is no exception. I like its good points and I can live with its bad points, so I&#8217;m going with REALbasic.</p>
<h3>Links</h3>
<ul>
<li>Markus Winter <a href="http://www.realsoftware.com/support/whyrealbasic.php">gives his take on REALbasic</a>.</li>
<li>Thomas Tempelmann warns those new to REALbasic about some <a href="http://www.tempel.org/REALbasicAnnoyances">annoyances</a> to watch for.</li>
<li>Many Mac-only developers <a href="http://www.cocoadev.com/index.pl?RealBasic">prefer Cocoa and Xcode over REALbasic</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://karig.net/2009/10/why-realbasic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>I want to write a text editor</title>
		<link>http://karig.net/2009/10/i-want-to-write-a-text-editor/</link>
		<comments>http://karig.net/2009/10/i-want-to-write-a-text-editor/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 19:56:33 +0000</pubDate>
		<dc:creator>Karig</dc:creator>
				<category><![CDATA[thoughts]]></category>

		<guid isPermaLink="false">http://karig.net/?p=262</guid>
		<description><![CDATA[I want to write my own text editor because I am not completely satisfied with other text editors I've used, and because I have some ideas for features I'd like to see in a text editor.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m thinking about writing my own text editor as a hobby.</p>
<h3>What I want the editor for</h3>
<p>I like to keep notes and outlines on things.</p>
<p>I would prefer to keep my notes and outlines as plain text if I can. Plain text is an <a href="http://www.openformats.org/main">open format</a> and the most widely supported data format on Earth. There are <a href="http://en.wikipedia.org/wiki/Comparison_of_text_editors">dozens if not hundreds of text editors in existence</a> for virtually every operating system on the planet, so if you should for whatever reason lose the ability to use your favorite text editor, there&#8217;s always another one available.</p>
<h3>But why write yet another text editor?</h3>
<p>For one thing, I&#8217;m not 100% satisfied with most of the text editors I&#8217;ve tried so far, which are generally geared toward programmers. (I want to do coding, of course, but I also want to do other things with the editor.) For another thing, I have some ideas for features I&#8217;d like to see in a text editor. For another thing still, I&#8217;d like to have a text editor that works not just on one OS, but on all three of the major platforms — Windows, Mac OS X, and Linux. Finally, I just want to see if I can do this. <img src='http://karig.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>This is a list of some features I&#8217;d like to have:</p>
<ul>
<li>I want the ability to keep many individual notes, and I want to be able to see only the notes that match certain criteria. I want to keep all of the notes in one place — a single text file. So I want a text editor that can tell (or be told) where one note ends and the next begins.</li>
<li>I also want the editor to make manipulating notes as simple as manipulating files with a file manager. (Perhaps the editor would have a display mode where notes are displayed as icons. I&#8217;ll have to think more on this.)</li>
<li>I want the editor to support <a href="http://en.wikipedia.org/wiki/Unicode">Unicode</a>. Although 99.9% of my text would be plain <a href="http://en.wikipedia.org/wiki/ASCII">ASCII</a>, occasionally I&#8217;ll want to use letters with diacritics, or special symbols, or even an occasional <a href="http://en.wikipedia.org/wiki/International_Phonetic_Alphabet">IPA</a> or Greek or Japanese character.</li>
<li>I want the editor to save changes automatically, without altering the original file. When a file is first opened, the editor should create a working folder for the file and copy the file&#8217;s contents into the folder. (This copying should be done in a thread so that the user doesn&#8217;t have to wait for the process to finish before he can start editing.) Any changes made to the file should be saved automatically to the copy in the working folder; the original file is left untouched until the user specifies that the original should be overwritten with the working version. This also ensures that changes made will survive even if the system crashes and must be rebooted.</li>
<li>I want the editor to save a history of changes into the working folder, so that changes can be undone, redone, and reviewed even after the system is rebooted.</li>
<li>I want the editor to handle huge files gracefully. If the file to be edited is too big to load into memory, then the editor should copy the file&#8217;s contents into multiple smaller files, each small enough to load. This means that the editor would be able to handle files of any size (as long as the disk has enough space for a copy of the original file).</li>
<li>I want the editor to make it simple and easy to create and reuse regular expressions for searches and filters. You should be able to use regular expressions like Lego blocks and piece them together to create more complex regular expressions, then give a name to the result and save it so you can use it later.</li>
<li>The editor should support incremental search, so that the editor jumps to the next match for a phrase as you type it.</li>
<li>I want the editor to use stylesheets. A stylesheet in this case consists of a series of rules. Each rule spells out some text to look for and the font, style, and color to use to display the text. (The font need not be monospace; you should be able to use whatever font suits you.) The text to find might be a direct match for a regular expression, or it might be whatever text lies between matches for a pair of regular expressions.</li>
<li>Stylesheets should also support rules for automatic indentation, such as what indentation to apply if the current line begins with a specific character, or what indentation to apply to the new line after the user hits Enter.</li>
<li>I want the editor to support different stylesheets for different sections within the same text.</li>
<li>I want the editor to support <a href="http://nickgravgaard.com/elastictabstops/">elastic tabstops</a> so that &#8220;fields&#8221; within adjacent lines line up in the display, letting me create &#8220;tables&#8221; on the fly.</li>
<li>I want it to be easy to record, play back, write, edit, and debug macros from within the editor. I&#8217;m considering devising my own macro language for this.</li>
<li>I want to be able to enter commands just by typing, or by highlighting some text in the file and pressing a key. These commands could be built into the editor, or they could be macros or external scripts.</li>
<li>And of course I&#8217;ll want most of the other things that are expected in a text editor: line numbers, the ability to filter text through an external script, conversion of text from one encoding to another, easy insertion of text clips, autocomplete, and so on.</li>
</ul>
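<p>One way the &#8220;Lego block&#8221; idea for regular expressions might work is to keep a library of named fragments and let a fragment refer to others by name. The Python sketch below assumes a hypothetical <tt>{{name}}</tt> notation for such references; each reference is expanded recursively and wrapped in a non-capturing group so that an alternation inside one block can&#8217;t leak into the pattern around it. The block names and the sample notes are made up for illustration.</p>

```python
import re
from typing import Dict, Tuple

# Hypothetical library of named regex blocks. {{name}} refers to another block.
library: Dict[str, str] = {
    "digit2": r"[0-9]{2}",
    "digit4": r"[0-9]{4}",
    "date": r"{{digit4}}-{{digit2}}-{{digit2}}",
    "dated_note": r"^\[{{date}}\] .+$",
}

REFERENCE = re.compile(r"\{\{(\w+)\}\}")


def expand(name: str, seen: Tuple[str, ...] = ()) -> str:
    """Return the regex for block `name` with all {{...}} references inlined."""
    if name in seen:
        raise ValueError(f"circular reference to block {name!r}")
    return REFERENCE.sub(
        lambda m: "(?:" + expand(m.group(1), seen + (name,)) + ")",
        library[name],
    )


def compile_block(name: str) -> "re.Pattern[str]":
    """Compile a named block; MULTILINE lets ^ and $ match at each line."""
    return re.compile(expand(name), re.MULTILINE)


if __name__ == "__main__":
    notes = "[2009-10-03] Buy milk\nno date here\n[2009-10-4] bad date\n"
    print(compile_block("dated_note").findall(notes))
```

<p>Saving a composed expression under a new name would then just be another entry in the library, which could be written to disk between sessions.</p>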
<p>I&#8217;m sure I&#8217;ll come up with other ideas, but this is the basic feature set I want.</p>
]]></content:encoded>
			<wfw:commentRss>http://karig.net/2009/10/i-want-to-write-a-text-editor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hello world!</title>
		<link>http://karig.net/2009/09/hello-world/</link>
		<comments>http://karig.net/2009/09/hello-world/#comments</comments>
		<pubDate>Sat, 26 Sep 2009 10:01:28 +0000</pubDate>
		<dc:creator>Karig</dc:creator>
				<category><![CDATA[thoughts]]></category>

		<guid isPermaLink="false">http://karig.net//?p=1</guid>
		<description><![CDATA[I decided to start a blog. It would be cool to be able to write about what I've read or learned about something and then have casual visitors with similar interests drop by and comment.]]></description>
			<content:encoded><![CDATA[<p>I decided to start a blog. It would be cool to be able to write about what I&#8217;ve read or learned about something and then have casual visitors with similar interests drop by and comment.</p>
<p>What I&#8217;ll probably do is try to make each post (after this one) an essay, something longer than just a few sentences, something with some meat in it, instead of just running to the &#8220;Edit Post&#8221; page every time I learn something new. This would probably also make reading my stuff a little more interesting. <img src='http://karig.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://karig.net/2009/09/hello-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
