Colino

("Colino" is the Italian word for a strainer, which is more or less the function of my software -- to "strain out" the records you don't want to see. Besides, I just thought the word was cool. :) See also: Code)

I am working on Colino because I want to keep notes on my computer in a certain way.

  • I want to organize my notes on various topics into large text files. I want each note to go into its own "record" within a text file. That means that I need a convention for marking where one record within a file ends and the next begins.
  • I want to add fields to any record, and then be able to retrieve only those records that have particular fields.
  • I want to use text files, not word-processor files and not a database, because text files are editable on any computer platform, and text editors are as common as dirt, while other file formats are more subject to being "orphaned" in the long run.
  • I want the conventions for marking record boundaries and record fields to work alongside other markup schemes (namely Markdown, but also Textile, XHTML, TeX, POD, etc.). It is only necessary that my software be able to distinguish boundaries and fields from the markup used within records; the software would retrieve records before they are to be translated into formatted text.

More than one version of Colino

Clearly I'll be writing my record extraction software at least twice -- once in PHP, to be run from the website, and once again in JavaScript, to be run from within EmEditor. (I might also decide one day to write a UNIX-style CLI version in C.)

So I need to define exactly what the program does, so that each version of the program does the exact same thing.

Input parameters

The software takes the following as input:

  • (f=) The filename of the record file to filter.
  • (c=) The string of criteria to apply to the records.
  • (s=) A list of names of fields on which to sort the records.
  • (l=) An optional leader string to output before the first record.
  • (b=) An optional string to output between records.
  • (t=) An optional trailer string to output after the last record.

The software sends to output each record that matches the criteria.
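As a rough illustration of how these parameters might fit together, here is a minimal Python sketch (not the PHP or JavaScript version). The function name `render`, the `(fields, text)` pair used for each record, and the `matches` callback are all hypothetical. It also assumes that sort keys compare as plain strings, that a missing field sorts as an empty string, and that the leader and trailer are emitted even when no record matches; none of that is settled above.

```python
def render(records, matches, sort_fields=(), leader="", between="", trailer=""):
    """Return the output text for the records accepted by `matches`.

    `records` is a list of (fields, text) pairs, a hypothetical in-memory
    form of an already-parsed record file.
    """
    # Keep only the records that satisfy the criteria (c=).
    kept = [rec for rec in records if matches(rec)]
    # Sort on the named fields (s=).
    # Assumption: values compare as strings; a missing field sorts as "".
    kept.sort(key=lambda rec: tuple(rec[0].get(name, "") for name in sort_fields))
    # Leader (l=) first, separator (b=) between records, trailer (t=) at the end.
    return leader + between.join(text for _, text in kept) + trailer


if __name__ == "__main__":
    notes = [
        ({"year": "1830"}, "Second note\n"),
        ({"year": "1800"}, "First note\n"),
    ]
    print(render(notes, lambda rec: True, sort_fields=["year"],
                 leader="BEGIN\n", between="* * *\n", trailer="END\n"), end="")
```

Passing the criteria in as a callback keeps this step independent of how the criterion string is parsed.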

Record-file format

A record file is simply a text file, with certain lines used to mark the boundaries between records. A record is simply a sequence of lines of text. Some of these lines may be freeform fields.

  • A line beginning with four hyphens and a hash mark (----#) indicates the start of a new record. Any text after the initial hash mark is ignored.
  • A line beginning with four hyphens and a colon (----:) is a field within a record. The first word after the colon is the field name; any text after that, up to the end of the line, is the field value.
  • A line beginning with four hyphens followed by anything other than a hash mark or a colon is just another line of record text.
  • The four hyphens have to be the first four characters on the line; otherwise the software does not treat the line as a marker, and it is just another line of record text.

These conventions were selected to make it easy to use markup (Markdown, Textile, etc.) within each record.
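The format rules above could be sketched in Python roughly as follows. The names `Record` and `parse_records` are hypothetical, and the sketch makes three assumptions the rules do not settle: lines before the first `----#` marker form an implicit first record, a repeated field name keeps its last value, and field lines stay in the record's text.

```python
from dataclasses import dataclass, field

RECORD_MARK = "----#"  # starts a new record
FIELD_MARK = "----:"   # marks a field line within a record


@dataclass
class Record:
    lines: list = field(default_factory=list)   # all lines of the record, field lines included
    fields: dict = field(default_factory=dict)  # field name -> field value


def parse_records(text):
    """Split the text of a record file into a list of Record objects."""
    records = []
    current = None
    for line in text.splitlines():
        if line.startswith(RECORD_MARK):
            # Any text after the hash mark is ignored; the line only opens a record.
            current = Record()
            records.append(current)
            continue
        if current is None:
            # Assumption: text before the first marker is an implicit first record.
            current = Record()
            records.append(current)
        if line.startswith(FIELD_MARK):
            # First word after the colon is the name; the rest of the line is the value.
            parts = line[len(FIELD_MARK):].split(None, 1)
            if parts:
                value = parts[1].strip() if len(parts) > 1 else ""
                current.fields[parts[0]] = value  # assumption: last value wins
        # Markers must start in column one, so an indented "----:" is plain text.
        current.lines.append(line)
    return records
```

Because only the first five characters of a line are inspected, whatever markup appears inside a record passes through untouched.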

Criterion-string format

This is an example of a string of criteria:

	tactics "long days" year > 1796 year < 1837 "name of author" [ Don

This specifies the following:

  • Find the word "tactics" in the record text.
  • Find the phrase "long days" in the record text.
  • Find the "year" field and verify that its value is greater than 1796.
  • Find the "year" field and verify that its value is less than 1837.
  • Find the "name_of_author" field and verify that its value begins with "don".

Commas make the string more readable, but the software ignores them:

	tactics, "long days", year > 1796, year < 1837, "name of author" [ Don

The rules for parsing the criteria string are simple:

  • A single word out of quotes is a term.
  • A group of words in quotes is a term.
  • An operator is a "word" consisting of nonalphanumeric characters.
    • An operator not recognized by the software is treated as meaning "contains".
    • An operator following a term already determined to be a field value is ignored.
  • A term followed by an operator is considered to be a field name.
    • If the term is a phrase, its spaces are replaced by underscores.
  • A term preceded by an operator is considered to be a field value.
  • A term elsewhere is to be sought within the record text.
  • The record text is represented by a blank string.
    • "" [ "once upon a time" -- record text must begin with that phrase.
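One possible reading of these parsing rules, sketched in Python. The names `tokenize`, `is_operator` and `parse_criteria` are hypothetical, as is the output shape: a list of `(field, operator, value)` triples in which a field of `""` stands for the record text and a bare term becomes a "contains" test on it. The sketch assumes operators are separated from terms by whitespace, and it treats a term as a field name only when an operator and then a value follow it.

```python
import re

# A quoted phrase, or a run of characters that are not spaces, quotes or commas.
TOKEN_RE = re.compile(r'"([^"]*)"|([^\s",]+)')


def tokenize(criteria):
    """Return (text, was_quoted) pairs; commas are dropped as separators."""
    tokens = []
    for match in TOKEN_RE.finditer(criteria):
        if match.group(1) is not None:
            tokens.append((match.group(1), True))
        else:
            tokens.append((match.group(2), False))
    return tokens


def is_operator(token):
    """An operator is an unquoted word with no alphanumeric characters."""
    text, quoted = token
    return not quoted and not any(ch.isalnum() for ch in text)


def parse_criteria(criteria):
    """Turn a criterion string into (field, operator, value) triples."""
    tokens = tokenize(criteria)
    triples = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if is_operator(token):
            # No field name precedes this operator (for example, it follows
            # a field value), so it is ignored.
            i += 1
            continue
        has_op = i + 1 < len(tokens) and is_operator(tokens[i + 1])
        has_value = i + 2 < len(tokens) and not is_operator(tokens[i + 2])
        if has_op and has_value:
            # Term followed by an operator: a field name (spaces become underscores).
            name = token[0].replace(" ", "_")
            triples.append((name, tokens[i + 1][0], tokens[i + 2][0]))
            i += 3
        else:
            # Term elsewhere: sought within the record text.
            triples.append(("", "^", token[0]))
            i += 1
    return triples
```

With this sketch, the example string `tactics "long days" year > 1796` parses to `("", "^", "tactics")`, `("", "^", "long days")` and `("year", ">", "1796")`. Letter case is preserved here; any case folding is left to whatever evaluates the triples.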

The operators recognized are:

	=	equal to
	==	equal to
	<>	not equal to
	<	less than
	<=	less than or equal to
	>	greater than
	>=	greater than or equal to
	[	begins with
	]	ends with
	^	contains

Any of these operators may be preceded by ! to invert the operator's meaning. Thus != means "not equal to", !^ means "does not contain", etc.
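The operator table and the ! prefix could be applied with something like the following Python sketch. The function name `compare` is hypothetical. Two behaviours here are assumptions rather than anything stated above: values are compared without regard to letter case (suggested by the "Don" example), and the equality and ordering operators compare numerically when both sides parse as numbers, falling back to string comparison otherwise.

```python
import operator

OPERATORS = {
    "=": operator.eq,
    "==": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "[": lambda actual, wanted: actual.startswith(wanted),  # begins with
    "]": lambda actual, wanted: actual.endswith(wanted),    # ends with
    "^": lambda actual, wanted: wanted in actual,           # contains
}
# Operators for which a numeric comparison is attempted first (assumption).
NUMERIC_FIRST = {"=", "==", "<>", "<", "<=", ">", ">="}


def compare(actual, op, wanted):
    """Test a field value (or the record text) against one criterion."""
    negate = False
    if len(op) > 1 and op.startswith("!"):
        # A leading ! inverts the operator's meaning.
        negate, op = True, op[1:]
    if op not in OPERATORS:
        # An unrecognized operator is treated as "contains".
        op = "^"
    # Assumption: comparisons ignore letter case and surrounding whitespace.
    a, b = actual.strip().lower(), wanted.strip().lower()
    if op in NUMERIC_FIRST:
        try:
            a, b = float(a), float(b)
        except ValueError:
            pass  # at least one side is not a number; compare as strings
    result = OPERATORS[op](a, b)
    return not result if negate else result
```

The numeric attempt matters for a criterion such as year < 1837: compared as strings, "900" would sort after "1837".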


On keeping everything in one big text file, see also:

Page last modified on February 25, 2008, at 08:19 AM