[Update 17 December 2009: I've changed some of the terminology here. "Tags" are now "marks"; marks are a part of the innards of the text editor, while the word "tags" ought to be reserved for a feature that the user might use (such as "ctags"). In addition, "entries" are now "sections" because a section can contain any type of data, while entries, notes, articles, and similar things each contain data of a specific kind, or data that have certain characteristics: You'd expect an "entry" to have a date, or a byline, or both; you'd expect an "article" to have an essay-like structure with an introductory part and a conclusion; and so on. I've updated the text below to use the new terms.]
This is just a quick overview. I’ll flesh out the ideas here in later posts.
My text editor is going to need a buffer system. I need a buffer for at least two things:
- My custom text control will need a buffer to cache the text being displayed in the editor window, so that it can redraw the text when the window has to be redrawn. The buffer also needs to hold font and style information, so the control knows what fonts to use to redraw the text. (I’ll need this so that I can implement syntax highlighting.)
- The editor itself needs a buffer to hold the text of the file being edited.
I’ve thought this over. What I want is a buffer that offers the following:
General buffer functionality
Ideally the text buffer should be useful outside the context of a text-editor application. The user should be able to insert or delete text anywhere within the buffer, search the buffer for phrases or regular expressions, and extract substrings. I’m thinking of giving the buffer the same functions available for strings in REALbasic — the “+” operator to append data to the end of the buffer, “Mid” and other functions to extract substrings, “InStr” to find substrings, and so on. Some other functions that a text editor needs might also be useful in a general-purpose buffer, such as counting the number of words or lines.
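To make that list concrete, here is a minimal sketch of what such an interface might look like. It is written in Python rather than REALbasic, and every name in it (`TextBuffer`, `mid`, `instr`, and so on) is a placeholder of mine, not a settled design. It assumes 1-based positions in the BASIC tradition and a case-sensitive search, and it keeps the whole text in one string, which a real buffer would not do.

```python
class TextBuffer:
    """Hypothetical general-purpose text buffer; all names are illustrative."""

    def __init__(self, text=""):
        self._text = text

    def __add__(self, other):
        # "+" appends data to the end and returns a new buffer.
        return TextBuffer(self._text + str(other))

    def __str__(self):
        return self._text

    def __len__(self):
        return len(self._text)

    def insert(self, pos, text):
        # pos is 1-based; inserting at len + 1 appends.
        i = max(pos - 1, 0)
        self._text = self._text[:i] + text + self._text[i:]

    def delete(self, pos, length):
        # Remove `length` characters starting at 1-based position `pos`.
        i = max(pos - 1, 0)
        self._text = self._text[:i] + self._text[i + length:]

    def mid(self, start, length=None):
        # Extract a substring starting at 1-based position `start`.
        i = max(start - 1, 0)
        if length is None:
            return self._text[i:]
        return self._text[i:i + length]

    def instr(self, find, start=1):
        # Return the 1-based position of `find`, or 0 if it is absent.
        return self._text.find(find, max(start - 1, 0)) + 1

    def line_count(self):
        return len(self._text.splitlines())

    def word_count(self):
        # Words are taken to be whitespace-separated runs (an assumption).
        return len(self._text.split())
```

With this sketch, `TextBuffer("hello world").mid(7, 5)` extracts `"world"`, and `instr` reports 0 when the phrase is not found.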
Sections
I want an application that can treat a single long file as many sections of text. The characters that determine where one section ends and the next begins should be selected by the user. For example, you might use a line of hyphens to separate one section from the next, so all of the lines between two lines of hyphens would constitute a single section.
The significance of sections is that they would be like records in a database: You can get a subset of the complete file by specifying a phrase or a regular expression that each section must contain if it is to be part of the subset. The subset would contain only complete sections, not just the individual lines containing the search matches. You could then decide what to do with the sections in the subset — print them, sort them, export them to a script or a command-line program, or whatever.
In essence, if you were to take advantage of sections, you could use a text file as a simple database of notes or code snippets, a journal, or a card file.
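As a rough illustration of the record-like behavior described above, the following Python sketch splits a text into sections at user-chosen separator lines and then selects the sections that match a regular expression. The function names and the "three or more hyphens" separator rule are assumptions made for the example only.

```python
import re


def split_sections(text, is_separator):
    """Split text into sections at every line for which is_separator is true."""
    sections = []
    current = []
    for line in text.splitlines():
        if is_separator(line):
            sections.append("\n".join(current))
            current = []
        else:
            current.append(line)
    sections.append("\n".join(current))
    # Drop empty sections, such as the one before a leading separator.
    return [s for s in sections if s.strip()]


def select_sections(sections, pattern):
    """Return the complete sections that contain a match for pattern."""
    rx = re.compile(pattern)
    return [s for s in sections if rx.search(s)]


def hyphen_line(line):
    # Example separator rule: a line made of three or more hyphens.
    return re.fullmatch(r"-{3,}", line.strip()) is not None
```

Note that `select_sections` returns whole sections, not the matching lines, so a search for "Bob" in a notes file yields every note that mentions him, intact.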
Marks
A text editor often needs to mark spans (substrings) of text within the complete document. A span within a source-code file might be a keyword or a comment and thus should be displayed in a style distinct from the style used for the rest of the text. A span might be a bit of text that the user wants to return to later, so that it needs to be marked internally with a bookmark. In each case, this extra information associated with a span of text needs to remain attached to the span while the user is editing the document. Even if the editing changes the text of the span itself, the information needs to remain associated with whatever part of the span is left in the document.
This extra information is called a “mark.” Marks would be useful for associating style information with spans, for marking the results of previously run searches, for storing bookmarks within the text, and for other things.
Marks would be stored separately from the text in the buffer. The text in the buffer is not mixed with any other kind of information, so that searches through the text don’t require any kind of preparation beforehand and in fact can be performed at almost any time. (I strongly suspect that my text editor will need to do a lot of searches in the background, particularly if I want such things as syntax highlighting, line and word counts, and so on, so I want searches to be as fast as possible.)
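Here is one way the separation might work, sketched in Python. The text stays a plain string, the marks live in their own list, and each edit adjusts the mark offsets so that a mark stays attached to whatever is left of its span. The class names and the policy for edits at a span's boundaries are my assumptions, not a finished design.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    start: int  # 0-based offset of the first character in the span
    end: int    # offset just past the last character in the span
    kind: str   # e.g. "keyword", "comment", "bookmark"


class MarkedBuffer:
    """Hypothetical buffer that keeps marks apart from the text."""

    def __init__(self, text=""):
        self.text = text   # plain text only, so it can be searched directly
        self.marks = []    # marks are stored outside the text

    def add_mark(self, start, end, kind):
        mark = Mark(start, end, kind)
        self.marks.append(mark)
        return mark

    def insert(self, pos, s):
        self.text = self.text[:pos] + s + self.text[pos:]
        n = len(s)
        for m in self.marks:
            if m.start >= pos:
                # Insertion at or before the span: move the whole span.
                m.start += n
                m.end += n
            elif m.end > pos:
                # Insertion strictly inside the span: the span grows.
                m.end += n

    def delete(self, pos, length):
        end = pos + length
        self.text = self.text[:pos] + self.text[end:]
        for m in self.marks:
            m.start = self._shift(m.start, pos, end)
            m.end = self._shift(m.end, pos, end)

    @staticmethod
    def _shift(offset, pos, end):
        if offset <= pos:
            return offset                 # before the deleted range
        if offset >= end:
            return offset - (end - pos)   # after it: slide left
        return pos                        # inside it: collapse to the cut
```

If a span is deleted entirely, its mark collapses to a zero-length span at the cut; whether such a mark should be kept or dropped is a decision this sketch leaves open.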
Recording
The buffer should record each change that is made to the text in the buffer. Both text insertions and text deletions should be recorded, along with the locations where each insertion and each deletion occurred.
This facility would provide the foundation for several features desirable in a text editor, most notably undo/redo, review of changes made to a file, and macros.
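A small Python sketch shows how a change log could support undo and redo. Each change records its kind, its location, and the text involved, which is enough to reverse it. The names `Change` and `RecordingBuffer` are hypothetical, and the sketch ignores refinements such as grouping keystrokes into a single undo step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    kind: str  # "insert" or "delete"
    pos: int   # 0-based offset where the change occurred
    text: str  # the text that was inserted or deleted


class RecordingBuffer:
    """Hypothetical buffer that records every insertion and deletion."""

    def __init__(self, text=""):
        self.text = text
        self.log = []     # applied changes, oldest first
        self.undone = []  # changes available for redo

    def insert(self, pos, s):
        self._record(Change("insert", pos, s))

    def delete(self, pos, length):
        # Keep the removed text so that the deletion can be undone.
        self._record(Change("delete", pos, self.text[pos:pos + length]))

    def _record(self, change):
        self._apply(change)
        self.log.append(change)
        self.undone.clear()  # a fresh edit invalidates the redo history

    def _apply(self, c):
        if c.kind == "insert":
            self.text = self.text[:c.pos] + c.text + self.text[c.pos:]
        else:
            self.text = self.text[:c.pos] + self.text[c.pos + len(c.text):]

    @staticmethod
    def _inverse(c):
        kind = "delete" if c.kind == "insert" else "insert"
        return Change(kind, c.pos, c.text)

    def undo(self):
        if not self.log:
            return False
        change = self.log.pop()
        self._apply(self._inverse(change))
        self.undone.append(change)
        return True

    def redo(self):
        if not self.undone:
            return False
        change = self.undone.pop()
        self._apply(change)
        self.log.append(change)
        return True
```

The same log could be read back for a review of changes, or replayed against other text as a crude macro.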
Autosave
I want the buffer to save changes automatically, but without overwriting the original file. So the buffer needs two “stores,” or places to load and save text: the original file or text source, and a temporary one for holding changed text until the user decides whether to discard the changes or to save them, either as a new file or by overwriting the original.
If the “temp store” is somewhere other than memory, such as a folder on your hard disk or thumb drive, then you have some insurance against losing your changes in the event of a system crash or a power outage. Because the changes are kept separate from the original file, you still have the option, after a crash, either to commit the changes to the original file or to discard them.
This storage would be handled by two objects other than the buffer itself. The buffer wouldn’t know and wouldn’t care how these two objects load and save data. So although the original store would usually be a file on a hard disk, and although the temp store would usually be a folder, either store (or both) could access data on a website or an FTP site. On the other hand, if an application only needs the buffer for a few seconds, either store (or both) might access data in data structures in memory and never read or write anything on disk. This will help make the buffer system more useful in more situations than just as part of a text-editor application.
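The two-store arrangement might be sketched like this in Python. The `Store` interface, its three methods, and the `AutosaveBuffer` class are all assumptions of mine. For brevity the temp store here holds the complete changed text rather than individual changes, and only an in-memory store is shown; a file, folder, or FTP store would implement the same three methods.

```python
from typing import Optional, Protocol


class Store(Protocol):
    """Anything that can load and save text; the buffer knows nothing more."""

    def load(self) -> Optional[str]: ...
    def save(self, text: str) -> None: ...
    def clear(self) -> None: ...


class MemoryStore:
    """Store that keeps its data in memory and never touches the disk."""

    def __init__(self, text=None):
        self._text = text

    def load(self):
        return self._text

    def save(self, text):
        self._text = text

    def clear(self):
        self._text = None


class AutosaveBuffer:
    """Hypothetical buffer with an original store and a temp store."""

    def __init__(self, original: Store, temp: Store):
        self.original = original
        self.temp = temp
        pending = temp.load()
        # Leftover data in the temp store means uncommitted changes
        # survived from an earlier session, for example after a crash.
        self.recovered = pending is not None
        self.text = pending if self.recovered else (original.load() or "")

    def edit(self, new_text):
        self.text = new_text
        self.temp.save(new_text)  # autosave; the original is untouched

    def commit(self):
        self.original.save(self.text)
        self.temp.clear()

    def discard(self):
        self.temp.clear()
        self.text = self.original.load() or ""
```

Because the buffer talks only to the `Store` interface, swapping the in-memory store for one backed by a disk folder would not change the buffer code at all.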
Segments
I want the buffer to be able to load and edit files too large to load completely into memory without bogging down the computer. Of course, the only way to do that is to treat such a file as an array of smaller chunks or “segments” and then load only the segments that the user is reading or editing.
Note that segments are not sections:
- Segments are intended to be of roughly similar size, though segments will grow and shrink as the user edits them. If a segment grows too large, the software will split it in two; if it gets too small, the software will merge it with a neighboring segment. Segments exist to simplify the processing that the software has to do when working with very large files or strings.
- Sections can be of any size useful to the user, and it is the user who determines where a section begins and ends.
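The split-and-merge behavior of segments could look something like the following Python sketch. The class name, the tiny size limits, and the choice of which neighbor to merge with are illustrative assumptions. The sketch also keeps every segment in memory; loading segments on demand is left out.

```python
class SegmentedText:
    """Hypothetical text store made of roughly equal-sized segments."""

    def __init__(self, text="", max_size=8, min_size=2):
        self.max_size = max_size
        self.min_size = min_size
        self.segments = [text[i:i + max_size]
                         for i in range(0, len(text), max_size)] or [""]

    def __str__(self):
        return "".join(self.segments)

    def _locate(self, pos):
        # Map an offset in the whole text to (segment index, local offset).
        for i, seg in enumerate(self.segments):
            if pos <= len(seg):
                return i, pos
            pos -= len(seg)
        raise IndexError("position past end of text")

    def insert(self, pos, s):
        i, off = self._locate(pos)
        seg = self.segments[i]
        self.segments[i] = seg[:off] + s + seg[off:]
        self._split(i)

    def delete(self, pos, length):
        i, off = self._locate(pos)
        while length > 0 and i < len(self.segments):
            seg = self.segments[i]
            n = min(length, len(seg) - off)
            self.segments[i] = seg[:off] + seg[off + n:]
            length -= n
            i, off = i + 1, 0
        self._merge_small()

    def _split(self, i):
        # Split an oversized segment in two, repeating on each half.
        seg = self.segments[i]
        if len(seg) <= self.max_size:
            return
        half = len(seg) // 2
        self.segments[i:i + 1] = [seg[:half], seg[half:]]
        self._split(i + 1)  # right half first, so index i stays valid
        self._split(i)

    def _merge_small(self):
        # Merge each undersized segment with a neighboring segment.
        i = 0
        while len(self.segments) > 1 and i < len(self.segments):
            if len(self.segments[i]) < self.min_size:
                lo = i - 1 if i > 0 else i
                merged = self.segments[lo] + self.segments[lo + 1]
                self.segments[lo:lo + 2] = [merged]
                self._split(lo)
                i = lo
            else:
                i += 1
```

None of this is visible to the user: the joined text is the same whatever the segment boundaries happen to be, which is exactly what distinguishes segments from sections.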
Segments would also simplify some of the other work that a text editor needs to do, such as updating the locations of marked spans. Since marks are stored outside the text, my buffer code would need to do some extra work after each edit to ensure that each mark remains attached to its span. If each mark stored its span’s location as a single integer, an offset in bytes from the start of the file, then editing very large files would be problematic, because a single edit might affect every mark in the whole huge file. If instead a span’s location is an offset from the start of a segment, then a change in the text affects only the marks whose spans begin within that one segment. So segments will be very useful in making the job of editing huge files manageable.
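The benefit of segment-relative offsets can be shown with a short Python sketch. Each segment owns the marks that begin inside it, and each mark stores only its offset from the start of that segment. All names are hypothetical, and the sketch tracks only where a span begins, counting characters rather than bytes.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    text: str
    # Marks that begin in this segment; offsets are relative to the segment.
    marks: list = field(default_factory=list)


class SegmentList:
    """Hypothetical list of segments with segment-relative marks."""

    def __init__(self, chunks):
        self.segments = [Segment(c) for c in chunks]

    def add_mark(self, seg_index, offset, kind):
        mark = {"offset": offset, "kind": kind}
        self.segments[seg_index].marks.append(mark)
        return mark

    def insert(self, seg_index, offset, s):
        """Insert text and return how many marks had to be updated."""
        seg = self.segments[seg_index]
        seg.text = seg.text[:offset] + s + seg.text[offset:]
        touched = 0
        for m in seg.marks:  # only this one segment's marks are visited
            if m["offset"] >= offset:
                m["offset"] += len(s)
                touched += 1
        return touched

    def absolute(self, seg_index, mark):
        # Compute the offset from the start of the whole text on demand.
        before = sum(len(s.text) for s in self.segments[:seg_index])
        return before + mark["offset"]
```

An insertion in the first segment leaves the stored offsets of marks in later segments untouched, even though their absolute positions have moved; the absolute position is derived from the segment lengths only when it is needed.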
Summary
So these are the basic features I want: general-purpose string functions, sections, marks, recording of changes, automatic storage of changes, and the ability to tackle huge files. A buffer class with a feature set like this should be useful for building a fairly sophisticated text editor. I’ll flesh out each of these features in future posts.