Subject: Re: markup syntax for member pages
Newsgroups: lugnet.admin.general, lugnet.general
Date: Sat, 3 Jul 1999 00:36:05 GMT
In lugnet.admin.general, Micah Jaffe
<zeade+usenet+.gov+.mil@boink.stanford.edu> writes:
[Just a side note: I ended up registering on LUGNET to post this.  Hi, everybody!]

Hi!  Welcome!

I'll try to respond to things point-by-point below...


I work in online publishing of scientific journals, and well you drew me out
of the woodwork.  You've obviously put a lot of thought into providing a
means of presenting formatted text for member pages, but it sounds like
you're loading the shotgun for the fly on the wall.  I've created (and work
with) a few different kinds of proprietary mark-up/scripting languages.
I've worked a lot with SGML-based parsers and know my way around regex
engines nicely.  I'd have to say, when I consider "rolling my own", I look
first to see if it can be done another way.

Me too.  But just because something -can- be done another way doesn't mean
the other way is necessarily better.  I can grind wheat with the butt-end of
a Coke bottle, but I'll probably have more luck doing it with something
specially designed for grinding wheat.

To the shotgun analogy, I can probably kill a fly with a loaded shotgun a
lot easier than I can with a fly-swatter.  But the gun takes more skill,
more preparation time, is a bit more dangerous, and is a lot easier to make
a big mess with.  :-)


Here's things to consider:

1.  Support.  What you create may start out small and easy, but maintaining
    a parser can be a PITA.  Granted, the type of markup you describe
    doesn't show signs of feeping creaturism, but it's *always* important to
    consider how easy it'll be to 1) extend and 2) have others support
    (codewise).

Except that in this case, if radical improvements are needed, it's trivial
to "upgrade" existing pages from whatever form they currently exist in, into
HTML.  Or to keep a choice between two (or N) input languages.  So for that
simple reason, it doesn't seem gross to me at all not to worry about
extensions.  I totally see eye-to-eye with your underlying philosophy, but
this is a funny kind of case.


2.  HTML is already there, and most who'd be interested in creating a
    members web page will very likely know enough basic HTML to do the
    job.  There are many, many, many tools to help validate/filter HTML tags
    (hell, regexes will do a lot for you).  Creating a proprietary language
    means you have to help the end user learn it.  There are mountains of
    tutorials and documentation on HTML to which nearly every question can be
    referred.  Your own language == you're the customer support desk.
    This is *very* important, in that the more people you have using your
    language, the more people need help from you.

Points well taken!  But offering HTML is a double-edged sword -- I can
easily foresee far more support problems with people asking why the heck
<BLINK> and <IMG> and <APPLET> and <SCRIPT> are broken, than with people not
understanding a simple specialized alternative markup.  The thing is, it
oughtn't really -seem- or -feel- like a markup language -- it should just
feel like editing ASCII text, but with a bit of magic formatting, like
automatically converting indented and starred-items to lists.
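
A minimal sketch of the "starred items become a list" idea, in Python.  It
assumes (hypothetically; these rules are not from LUGNET's actual code) that
a list item is any line whose first non-blank characters are "* ", that other
non-blank lines are joined into paragraphs, and that blank lines separate
blocks.

```python
import html


def render_plain(text: str) -> str:
    """Convert plain ASCII text to HTML, turning runs of '* ' lines into <UL> lists.

    Hypothetical rules: a line starting (after optional indentation) with '* '
    is a list item; other non-blank lines form <P> paragraphs; blank lines
    end the current paragraph or list.
    """
    out = []
    para = []
    items = []

    def flush_para():
        if para:
            out.append("<P>" + " ".join(para) + "</P>")
            para.clear()

    def flush_list():
        if items:
            out.append("<UL>" + "".join(f"<LI>{i}</LI>" for i in items) + "</UL>")
            items.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("* "):
            flush_para()
            items.append(html.escape(stripped[2:].strip()))
        elif stripped:
            flush_list()
            para.append(html.escape(stripped))
        else:
            flush_para()
            flush_list()
    flush_para()
    flush_list()
    return "\n".join(out)
```

For example, "Things I build:" followed by the lines "  * castles" and
"  * trains" comes out as a paragraph and a two-item <UL>, while the raw text
still reads naturally if it is posted to a newsgroup unchanged.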

And that's related to another huge advantage of avoiding HTML -- items can
be included in news postings and still be perfectly readable.  For example,
one way that people could watch the flow/stream of updates to pages is
through a special newsgroup which could be set up just for (automated)
postings of changes.  Now people could tune into this special newsgroup and
actually read the content, since it would basically look like normal ASCII
text.  Or maybe really important things on a group-by-group basis (at the
newsgroup homepages on the web) could actually be posted into the groups
whenever a significant change is made.  This even has a lot of exciting
related possibilities with regard to FAQ lists.


3.  Who are you trying to defend against?  These are member pages, which
    would theoretically only be able to be modified by accepted members of
    the group.  Inappropriate or abusive use of HTML can be a violation of
    membership, etc.

How do you apologize to someone who's been the victim of a malicious and
sneakily embedded script, after the fact?


    Also, as above, you can create filters which only
    allow a strict subset of HTML.  You could even get very picky and create
    your own DTD which contains only the desired subset of HTML you want,
    and, as someone else mentioned, use nsgmls (along with sgmlspl, a VERY
    awesome Perl parser extension; I have more info if you're interested).
    This is what I would do if I were very concerned about
    security/integrity, yet wanted to use an "open" mark-up.

It doesn't really matter whether the markup is "open" or not (whatever that
means for a markup language) -- only that it be readable as well as friendly
to non-computer people.  HTML is great for what it does, but it can look
like gibberish when it's read as plain text.  And it's friendly enough for
geeks, but try explaining it to your grandmother.
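
For comparison, the strict-HTML-subset filter Micah describes might look
roughly like the sketch below.  Note that it swaps in Python's standard
html.parser module for the nsgmls/DTD approach he suggests, and the allowed
tag and attribute sets are hypothetical examples.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical subset; a real deployment would choose its own list.
ALLOWED_TAGS = {"b", "i", "u", "p", "ul", "li", "h3", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_CONTENT = {"script", "style"}  # drop these tags AND everything inside them
SAFE_PREFIXES = ("http://", "https://", "/")


class SubsetFilter(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.drop_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.drop_depth += 1
            return
        if tag not in ALLOWED_TAGS or self.drop_depth:
            return
        kept = ""
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_PREFIXES):
                continue  # e.g. javascript: URLs
            kept += f' {name}="{escape(value)}"'
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.drop_depth = max(0, self.drop_depth - 1)
        elif tag in ALLOWED_TAGS and not self.drop_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.drop_depth:
            self.out.append(escape(data))


def filter_html(src: str) -> str:
    parser = SubsetFilter()
    parser.feed(src)
    parser.close()
    return "".join(parser.out)
```

Unlike a real DTD validator, this sketch only drops disallowed tags and unsafe
href values; it does not check that the remaining tags are properly nested.
It also shows Todd's support worry in miniature: <BLINK> simply vanishes
without telling the author why.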


The only benefit I can think of to not using HTML is that it's not really a
good language to compose with.

That's the main thing.  Speed and ease of updating/entry.  The news-posting
possibility is the other small thing, and avoiding misleading people by
using a subset of HTML is another big thing.


But as you already stated, the type of text
formatting you want available is fairly limited, so the amount of tagging
that people will have to do is limited.

To me, that says that HTML is even less appropriate.

Think about links...and the huge complexity difference between

   <A HREF="http://www.lugnet.com/cad/dev/">CAD Development</A>

and

   </cad/dev/>

which would both display the same way on the screen.
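
One way the </cad/dev/> shorthand might be expanded into the full anchor tag.
The PAGE_TITLES table is hypothetical (the post doesn't say where link text
like "CAD Development" would come from); when a path has no entry, this
sketch falls back to showing the path itself.

```python
import html
import re

BASE_URL = "http://www.lugnet.com"  # taken from the example above
# Hypothetical title lookup; a real system would use its own page index.
PAGE_TITLES = {"/cad/dev/": "CAD Development"}

# "<", then a path starting with "/", then ">", with no spaces or brackets inside.
LINK = re.compile(r"<(/[^<>\s]*)>")


def expand_links(text: str) -> str:
    """Expand </path/> shorthand into a full <A HREF=...> anchor."""

    def to_anchor(m):
        path = m.group(1)
        title = PAGE_TITLES.get(path, path)
        return f'<A HREF="{BASE_URL}{html.escape(path)}">{html.escape(title)}</A>'

    return LINK.sub(to_anchor, text)
```

Because the pattern requires a "/" right after the "<", ordinary text such as
"a < b > c" is left alone.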


  See \i{Spot} run.         See <I>Spot</I> run.         italics
  See \b{Spot} run.         See <B>Spot</B> run.         boldface
  See \i{\b{Spot} run}.     See <I><B>Spot</B> run</I>.  italics & boldface
  \sect{Spot Drinks Blood}  <H3>Spot Drinks Blood</H3>   section header

Mmmm, LaTeX.  Easier to type out than HTML, but not the easiest thing in the
world to help people learn (although I assume you'd be using an extremely
limited subset).

Oh, I wasn't considering this -- it was just an example.  It's from TeX,
later LaTeX, and then ripped off by Microsoft and called RTF (Ripped-off TeX
Format ;-) -- although I probably got a couple of the details wrong because
I haven't used it in 20 years.
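
Although this syntax wasn't under consideration, a tiny recursive-descent
sketch shows how its nesting (the \i{\b{Spot} run} row) maps onto nested HTML
tags.  Only the three commands from the table are recognized; everything else
is treated as literal text, and bare braces act as invisible groups, as in
TeX.

```python
import html

# Command name -> (opening tag, closing tag), taken from the comparison table.
TAGS = {"i": ("<I>", "</I>"), "b": ("<B>", "</B>"), "sect": ("<H3>", "</H3>")}


def tex_to_html(src: str) -> str:
    """Convert \\i{...}, \\b{...} and \\sect{...} markup (nesting allowed) to HTML."""
    result, _ = _parse(src, 0, top=True)
    return result


def _parse(src: str, pos: int, top: bool):
    """Parse from pos until a closing brace (or end of input at top level).

    Returns (html, position just after the consumed text).
    """
    parts = []
    while pos < len(src):
        ch = src[pos]
        if ch == "}" and not top:
            return "".join(parts), pos + 1
        if ch == "{":
            # A bare group: keep its contents, drop the braces.
            inner, pos = _parse(src, pos + 1, top=False)
            parts.append(inner)
            continue
        if ch == "\\":
            j = pos + 1
            while j < len(src) and src[j].isalpha():
                j += 1
            name = src[pos + 1:j]
            if name in TAGS and j < len(src) and src[j] == "{":
                inner, pos = _parse(src, j + 1, top=False)
                open_tag, close_tag = TAGS[name]
                parts.append(open_tag + inner + close_tag)
                continue
        # Anything else (including unknown commands) is literal text.
        parts.append(html.escape(ch))
        pos += 1
    return "".join(parts), pos
```

Nesting falls out naturally here: each \i{ or \b{ simply recurses until it
reaches its own matching }.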


  See (i Spot) run.         See <I>Spot</I> run.         italics
  See (b Spot) run.         See <B>Spot</B> run.         boldface
  See (i (b Spot) run).     See <I><B>Spot</B> run</I>.  italics & boldface
  (sect Spot Drinks Blood)  <H3>Spot Drinks Blood</H3>   section header

Never seen it before; looks Lispy.

Yeah, I think it comes from a Lisp textbook explaining how something like
that could be done in Lisp.  It's not very froody.


  See ''Spot'' run.         See <I>Spot</I> run.         italics
  See '''Spot''' run.       See <B>Spot</B> run.         boldface
  -n/a-                     See <I><B>Spot</B> run</I>.  italics & boldface
  '''Spot Drinks Blood'''   <H3>Spot Drinks Blood</H3>   section header

Don't like at all.  Where have you seen this used?

This isn't very froody, either.  I think it came from a program called
"wiki."


The reason I like it is because it seems intuitive, friendly, relatively
easy to type, and because normal text rarely uses [, ], {, }, <, and >.

Now, what's shown above there is only { } and [ ] ...  That leaves < > for
hyperlinks!

Simple, but this system is not very expandable (every time you need a new
convention, you need to use a new character of some sort).

Very true.  Doubling up []'s to make things like

   [[This is a section title]]

is a reasonable option, but mixing and matching bracket-types probably only
leads to confusion.  So it's not very expandable.  But I don't think it
needs to be expandable.  Normally I would agree, but it's a very, very low
risk to take for what I see as a big gain.
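
A sketch of the doubled-bracket section title.  It assumes (hypothetically)
that [[...]] must occupy a whole line and maps to <H3>, matching the
section-header rows in the earlier tables; single brackets elsewhere in a line
are left alone.

```python
import html
import re

# Hypothetical rule: a line that is entirely [[...]] becomes a section header.
SECTION = re.compile(r"^\s*\[\[([^\[\]]+)\]\]\s*$")


def render_line(line: str) -> str:
    """Render one line of member-page text, promoting [[title]] lines to <H3>."""
    m = SECTION.match(line)
    if m:
        return "<H3>" + html.escape(m.group(1).strip()) + "</H3>"
    return html.escape(line)
```

Requiring the brackets to fill the whole line keeps an accidental "[[" in the
middle of a sentence from turning into a header.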

If you think about the types of things that appear in printed text
(journals, magazines, books, newspapers, etc.), the formatting possibilities
we see today haven't really changed in a very, very long time.  You get
paragraphs, italics (fairly common), maybe some boldface once in a while,
and of course different levels of section headers.  Then once in a while
there are bulleted lists and maybe a sidebar or something like that.  And
underlining.  I suppose _foo bar_ could easily enough be taken to mean that
"foo bar" gets underlined, to solve that.  See how much easier that is to
read and write than HTML tags?  It's hardly even a markup language -- and
that's really the goal.  (I should never have used the word "markup" in the
subject line of my first post on this thread, heh heh.  :-)
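
The _foo bar_ underlining convention could be handled with a single regular
expression.  This sketch assumes the underscores must sit at word boundaries,
so identifiers such as my_var_name are left untouched, and that the text has
already been HTML-escaped.

```python
import re

# The underscores must not touch a word character on the outside.
UNDERLINE = re.compile(r"(?<!\w)_([^_\n]+?)_(?!\w)")


def underline(text: str) -> str:
    """Turn _foo bar_ into <U>foo bar</U> (assumes text is already HTML-escaped)."""
    return UNDERLINE.sub(r"<U>\1</U>", text)
```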


Anyone know of any alternative markup possibilities worth considering?
Any success stories?  Horror stories?

Also, another question, what about text that has been parsed?  How are
members to edit old content?  Do you have to have another parser to parse
back from HTML to the proprietary markup?

No, no -- it's just stored raw.  It gets stored exactly as they typed it, so
when they save it and come back to edit it, it's byte-for-byte the same in
the edit box.  Then it just gets transformed on-the-fly into HTML whenever
the page is displayed.

That's the way the news articles here on the system work, for example.
They're stored as raw NNTP articles, not as HTML, and the only time HTML
exists is for the zillionth of a second that it takes to load the NNTP
article into memory, convert it to HTML, and then dump it to STDOUT (which
redirects to a socket connected to the user's browser).  Then that HTML code
in the server's memory simply vanishes from the universe until it's asked
for again.
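
A minimal sketch of the "store raw, render on display" design, using Python's
standard email.parser module to split an article's headers from its body.
The in-memory STORE dictionary and the save()/display() helpers are
hypothetical stand-ins for the real article spool; the point is only that the
raw text is kept byte-for-byte and the HTML is built fresh on each request,
then discarded.

```python
import html
from email.parser import Parser

STORE = {}  # hypothetical in-memory stand-in for the article spool


def save(num: int, raw_article: str) -> None:
    """Store the article exactly as posted; no HTML is ever saved."""
    STORE[num] = raw_article


def article_to_html(raw_article: str) -> str:
    """Build an HTML page from a raw article's headers and body."""
    msg = Parser().parsestr(raw_article)
    subject = html.escape(msg.get("Subject", "(no subject)"))
    author = html.escape(msg.get("From", "(unknown)"))
    body = html.escape(msg.get_payload())
    return (f"<HTML><HEAD><TITLE>{subject}</TITLE></HEAD><BODY>"
            f"<H3>{subject}</H3><P>From: {author}</P>"
            f"<PRE>{body}</PRE></BODY></HTML>")


def display(num: int) -> str:
    """Render on demand; the returned HTML is not kept anywhere."""
    return article_to_html(STORE[num])
```

Editing then needs no reverse parser: the edit box is filled straight from
STORE, which still holds exactly what the member typed.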


I can't offer any true horror stories along the lines of "oh my god, using
my own markup language cost me my job."  But if you view the above as merely
a side-project, it won't be.  It all comes down to a matter of time and what
you want to do with it.

Yup -- it's a very specialized purpose, and it has requirements that go
beyond what HTML can do (namely to be readable as plain text).  HTML is
great, but it's really the wrong tool for this job.


I contend using HTML will be easier to build,
maintain and support.  A validation system can be created via a DTD that
contains an HTML subset using nsgmls/sgmlspl; this will take more time (if
you aren't familiar with it already), although this security aspect is not
something I see as a vital concern.

Micah, thanks so much for your insights, experiences, and opinions.  What
you've given, overall, sounds to me like a really great argument -for- HTML,
from your experienced and wise background.  HTML now seems to me more like a
good second choice instead of a tenth choice.  I'm happy to learn that
people have had good experiences with off-the-shelf HTML parsers, especially
ones which allow subsets.

--Todd



Message is in Reply To:
  Re: markup syntax for member pages (2-Jul-99, to lugnet.admin.general, lugnet.general)
