My text editor has to have a way to store text in memory. REALbasic provides two classes for this: String and MemoryBlock. You might think at first that Strings are for text and MemoryBlocks are for binary data, but for my purposes, MemoryBlock is better.
String objects
Strings in REALbasic are immutable — that is, you can’t write code to just reach inside a String and rearrange its contents. If you write code to change the text in a String, your program will always create a new String object and copy all of the text from the old String into the new String. That’s just how REALbasic Strings work. The whole point of a text editor is to allow you to change the text in a file at any time, so this operation should be efficient, and REALbasic’s String class is not efficient for this.
This is how you insert a new piece of text into a String:
Dim offset as Integer
//...
buffer = Left( buffer, offset ) _
+ newText _
+ Right( buffer, Len( buffer ) - offset )
Here is what happens under the hood whenever the above code runs: REALbasic creates a new “buffer” object, copies all of the data from the old “buffer” and “newText” objects into it, returns the new object, and marks the old “buffer” object as free space to be reclaimed. This isn’t so bad if “buffer” is always a small String, but what if the String holds the text of a 200 KB file? Then every time the user types something, the editor has to borrow a 200 KB block from the operating system to build the new object, copy 200 KB of data from the existing buffer into it, and then let the REALbasic framework “collect” the old buffer so the OS can have its 200 KB block back. This costs both CPU time and memory. Surely there is a cheaper way to insert new text?
There is, of course. Instead of keeping a file’s contents in memory as a single big long String, the editor could split the file’s contents up into many short Strings and store references to those Strings in an array. Inserting a String into an array actually inserts only the location of the String, not its contents, so insertion is quick. Now the code to insert new text into the buffer looks like this:
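Dim index, offset as Integer
Dim leftPart, rightPart as String
//...
// Split the affected segment at the insertion point
leftPart = Left( buffer( index ), offset )
rightPart = Right( buffer( index ), _
Len( buffer( index ) ) - offset )
// Overwrite the original segment with its left part so that
// the unsplit segment does not linger in the array, then
// insert the new text and the right part after it
buffer( index ) = leftPart
buffer.Insert index+1, newText
buffer.Insert index+2, rightPart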
The code is longer, but it runs faster because there is less going on under the hood: New text is inserted by splitting only a single segment of the file and inserting the new text in between the parts. It isn’t necessary to append the parts together into a long String again, because the array keeps the parts in order.
This clearly alleviates the problem, but it does not solve it: To insert new data into the buffer, you still have to create a copy of data you already have.
In addition, there is another problem: Searching the buffer becomes very awkward when the buffer consists of lots of fragments in an array instead of one great big long String, because a match can begin in one fragment and end in another. So the editor would probably join all of the fragments into a single String whenever a search is about to be performed. Again, the data in the many segments of the buffer would have to be copied into a separate String object just to perform a search. As before, this takes time that the editor could spend more productively doing something else, and it doubles the editor’s memory requirements because it has to make a second copy of all of the text data. There has to be a better way.
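To make the trade-off concrete, here is a minimal sketch of the array-of-segments idea. It is written in Python rather than REALbasic purely as an illustration; the function names insert_text and find are hypothetical and not part of any library. It shows that an insertion touches only one segment, while a search that may cross segment boundaries has to join everything into one string first.

```python
def insert_text(segments, index, offset, new_text):
    """Split segments[index] at offset and put new_text between the halves."""
    segment = segments[index]
    left, right = segment[:offset], segment[offset:]
    # Only this one segment is copied; all other segments stay untouched.
    segments[index:index + 1] = [left, new_text, right]


def find(segments, needle):
    """Search across segment boundaries by joining into one string first."""
    # The join copies every character of every segment: this is the
    # extra time and memory cost of searching a segmented buffer.
    return "".join(segments).find(needle)


segments = ["Hello, ", "wonderful world"]
insert_text(segments, 1, 10, "big ")
position = find(segments, "ful big wor")  # this match spans three segments
```

After the insertion the list holds four short strings, and the search only succeeds because the segments were glued back together into a temporary copy.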
(Side note: Theodore H. Smith offers a REALbasic plugin called ElfData. The plugin offers special string classes (ElfData and FastString) that let you split and recombine strings much more efficiently than REALbasic’s String class, mostly by cutting down on the amount of data copying and memory allocation that has to be done at runtime. However, the problem (from my perspective) is the same: The ElfData and FastString classes still work like String objects. To insert text into a long string, you still have to break the long string apart and copy all of the text from the various pieces into a new object, and you need space in memory for two copies of your data to do that. A text editor would use memory more efficiently if it could avoid creating a second copy of the text altogether.)
MemoryBlock objects
Unlike String objects, MemoryBlocks in REALbasic can be updated in place: You can write new data into a MemoryBlock without the REALbasic framework recreating the whole MemoryBlock each time. Further, as Charles Yeomans points out, if you use a BinaryStream object as the front end for a MemoryBlock, you end up with a buffer that acts like a file: you can “seek” to a given position anywhere within the buffer and read or write data at that point, and if you write data beyond the end of the buffer, then the buffer will be made large enough to hold the new data.
When you write ten bytes of data into the MemoryBlock, then ten bytes of space within the MemoryBlock, beginning with the current position, will be overwritten. If the current position is ten or more bytes from the end of the buffer, then no data in the MemoryBlock is copied or moved, so the write operation is extremely fast. However, if the current position is fewer than ten bytes from the end of the buffer, then a new MemoryBlock has to be allocated, and the data in the old MemoryBlock has to be copied into the new one. If you use a BinaryStream object as the front end for a MemoryBlock, as explained in Charles Yeomans’s article, the new MemoryBlock will be twice the size of the old one, so the need to move data to a new MemoryBlock should not arise often.
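The behaviour described above can be sketched roughly as follows. This is Python, not REALbasic, and the GrowableBuffer class with its seek and write methods is a hypothetical stand-in for the BinaryStream-over-MemoryBlock combination, not its actual API. The doubling policy is taken from the description above; everything else is an assumption made for the sake of the example.

```python
class GrowableBuffer:
    """A byte buffer with a file-like position that doubles when it runs out."""

    def __init__(self, capacity=16):
        self.data = bytearray(max(1, capacity))  # allocated storage
        self.length = 0                          # bytes actually in use
        self.position = 0                        # current read/write offset

    def seek(self, position):
        self.position = position

    def write(self, payload):
        end = self.position + len(payload)
        if end > len(self.data):
            # Not enough room: allocate a bigger block and copy the old data.
            new_capacity = len(self.data)
            while new_capacity < end:
                new_capacity *= 2
            grown = bytearray(new_capacity)
            grown[:self.length] = self.data[:self.length]
            self.data = grown
        # Overwrite in place; no other bytes in the block are moved.
        self.data[self.position:end] = payload
        self.position = end
        self.length = max(self.length, end)

    def contents(self):
        return bytes(self.data[:self.length])


buf = GrowableBuffer(capacity=8)
buf.write(b"abcdefgh")   # fits exactly, so nothing is reallocated
buf.seek(2)
buf.write(b"XY")         # plain overwrite in the middle of the block
buf.seek(8)
buf.write(b"ij")         # runs past the end, so the block doubles to 16
```

Only the last write pays for an allocation and a copy; the first two touch nothing but the bytes being written.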
But if you want to insert text into a MemoryBlock, you don’t want to overwrite the text already in there. So it would seem that you’d still need to copy existing text to a new location to make room for the new text. However, if the text to be overwritten is actually considered garbage and not part of the text you want to keep, then you can “insert” your text extremely quickly. This is the basis for the gap buffer that is used in many text editors. A gap buffer is a text buffer with three regions: the text before the current position, the text after the current position, and the “gap” in between. The gap is of course just a stretch of nonsense text that isn’t considered part of the main text and that is present to be overwritten with whatever new text the user types or pastes in.
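Here is a minimal gap buffer sketch to illustrate the three regions. It is written in Python rather than REALbasic, and the GapBuffer class and its method names are hypothetical; a real editor would add error handling and more operations. The buffer holds the text before the cursor, then the gap, then the text after the cursor.

```python
class GapBuffer:
    """Text before the cursor, a gap of junk bytes, then text after the cursor."""

    def __init__(self, text=b"", gap_size=16):
        self.buf = bytearray(text) + bytearray(gap_size)
        self.gap_start = len(text)      # first junk byte
        self.gap_end = len(self.buf)    # first byte after the gap

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_cursor(self, position):
        """Slide the gap so that it begins at `position` in the text."""
        if not 0 <= position <= len(self):
            raise IndexError("cursor position out of range")
        if position < self.gap_start:
            # Move the bytes between position and the gap to the gap's far end.
            count = self.gap_start - position
            self.buf[self.gap_end - count:self.gap_end] = \
                self.buf[position:self.gap_start]
            self.gap_start -= count
            self.gap_end -= count
        elif position > self.gap_start:
            # Move the bytes just after the gap to the gap's near end.
            count = position - self.gap_start
            self.buf[self.gap_start:self.gap_start + count] = \
                self.buf[self.gap_end:self.gap_end + count]
            self.gap_start += count
            self.gap_end += count

    def insert(self, payload):
        """Overwrite junk at the start of the gap; existing text does not move."""
        if len(payload) > self.gap_end - self.gap_start:
            self._grow(len(payload))
        end = self.gap_start + len(payload)
        self.buf[self.gap_start:end] = payload
        self.gap_start = end

    def delete_before_cursor(self, count):
        """Deleting just widens the gap backwards; nothing is copied."""
        self.gap_start -= min(count, self.gap_start)

    def _grow(self, needed):
        # Widen the gap by at least the current buffer size (roughly doubling).
        extra = max(needed, len(self.buf))
        self.buf[self.gap_end:self.gap_end] = bytearray(extra)
        self.gap_end += extra

    def text(self):
        return bytes(self.buf[:self.gap_start] + self.buf[self.gap_end:])


gb = GapBuffer(b"Hello world", gap_size=4)
gb.move_cursor(5)
gb.insert(b",")            # fits in the gap, so no text is moved
gb.move_cursor(len(gb))
gb.insert(b"!!!!")         # gap is too small here, so it is widened first
gb.delete_before_cursor(3)
```

Typing at the cursor only overwrites junk bytes. Text is copied when the cursor moves, and then only the stretch between the old and new cursor positions is copied.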
If you need to search the text being edited, the search might be easier if the gap is removed. This of course requires copying text within the buffer to remove the gap. However, no new objects are created, so the editor would not need to borrow more memory from the operating system to perform this operation.
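Closing the gap before a search might look like the sketch below. It is in Python rather than REALbasic, and the function name close_gap and its parameters are hypothetical. The text after the gap is slid down over the gap inside the same buffer, so the buffer neither grows nor gets replaced.

```python
def close_gap(buf, gap_start, gap_end):
    """Slide the text after the gap down so the live text is contiguous.

    Returns the length of the text, which now occupies buf[0:length].
    The leftover space at the end of buf becomes the new gap.
    """
    tail_length = len(buf) - gap_end
    # Copying forward is safe because each destination index is never
    # greater than its source index.
    for i in range(tail_length):
        buf[gap_start + i] = buf[gap_end + i]
    return gap_start + tail_length


buf = bytearray(b"Hello,____ world")   # the four underscores stand for the gap
text_length = close_gap(buf, 6, 10)
match = buf.find(b"o, w", 0, text_length)  # search only the live text
```

Once the search is done, the editor can move the gap back to the cursor position before the next edit.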
MemoryBlocks are best
Clearly, if you want to write a text editor in REALbasic, it is better to use a MemoryBlock to store the text being edited. Inserting and deleting text will be much faster and will require less memory than if you were to try to store the text in String objects.