Markup language

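SGML is a specialized markup language that was used to produce the electronic edition of the Oxford English Dictionary. This not only lets users run more complex queries, but also makes the results easy to convert into HyperText Markup Language.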

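A markup language is a text encoding that combines text with additional information about that text, exposing the document's structure and the details of how its data should be processed. The additional information (for example, about the text's structure or presentation) is intermingled with the original text but is identified by markup. The most widely used markup languages today are HyperText Markup Language (HTML) and eXtensible Markup Language (XML). Markup languages are widely used in web pages and web applications. Markup originated in the publishing industry, where authors, editors, and publishers used it to describe the typographic format of published works.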

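Classes of markup languages

Markup languages are commonly divided into three classes: presentational, procedural, and descriptive.

Presentational markup

Presentational markup conveys a document's structural information through the way the text is laid out when it is encoded. For example, in a text file the title may need to appear in a particular format (centered, enlarged, and so on), which means the title has to be marked in some way. Word processors and desktop publishing products can sometimes infer this kind of structural information automatically, but most plain-text editors, such as those used for wikis, cannot yet do so.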

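Procedural markup

Procedural markup is also generally concerned with the presentation of text, but it is usually visible to the person editing the text and is interpreted by software in the order in which it appears. To format a title, a series of formatting directives is inserted immediately before the title text, instructing the computer to switch to a centered display mode and to enlarge and embolden the typeface. Directives that end the formatting follow immediately after the title text; in more advanced systems, macros or a stack model make this process more flexible. In most cases, a procedural markup facility includes a Turing-complete programming language. Examples of procedural markup languages are nroff, troff, TeX, Lout, and PostScript. Procedural markup is widely used in professional publishing, where publishers choose among different markup languages to meet their publishing requirements.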
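
The in-order, state-changing behaviour described above can be sketched with a toy interpreter. The directive names (`.center`, `.bold`) and the `on`/`off` syntax are hypothetical, invented for this illustration; they are not the command set of troff, TeX, or any other real system.

```python
def interpret(lines):
    """Read lines in order: directives change state, text is emitted with the current state."""
    state = {"center": False, "bold": False}
    output = []
    for line in lines:
        if line.startswith("."):
            # A directive such as ".center on" switches a mode for all following text.
            name, _, value = line[1:].partition(" ")
            if name not in state:
                raise ValueError(f"unknown directive: {line!r}")
            state[name] = value.strip() == "on"
        else:
            # Ordinary text is tagged with whatever modes are active right now.
            output.append((line, state["center"], state["bold"]))
    return output


document = [
    ".center on",
    ".bold on",
    "Anatidae",
    ".bold off",
    ".center off",
    "The family includes ducks, geese, and swans.",
]

formatted = interpret(document)
for text, centered, bold in formatted:
    print(f"centered={centered} bold={bold}: {text}")
```

Nothing in the input says that "Anatidae" is a title; the interpreter only knows which modes were switched on when it reached that line, which is the defining trait of procedural markup.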

Descriptive markup

Descriptive markup, also called semantic markup, applies labels to fragments of text without necessarily mandating any particular display or other processing semantics. For example, the Atom syndication language provides markup to label the "updated" time-stamp, which is an assertion from the publisher as to when some item of information was last changed. While the Atom specification discusses the meaning of the "updated" timestamp, and specifies the markup used to identify it, it makes no assertions about whether or how it might be presented to a user. Software might put this markup to a variety of uses, including many not foreseen by the designers of the Atom language. SGML and XML are systems explicitly designed to support the design of descriptive markup languages.
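
As a minimal sketch of this point, the following Python reads the "updated" label from a small Atom entry and puts it to two unrelated uses. The entry content, the cutoff date, and the variable names are made up for illustration; only the Atom namespace URI and the `updated` element come from the Atom format itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# A made-up entry; the markup labels the timestamp but says nothing about display.
entry_xml = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example item</title>
  <updated>2003-12-13T18:30:02Z</updated>
</entry>"""

entry = ET.fromstring(entry_xml)
title = entry.find(ATOM_NS + "title").text
updated_text = entry.find(ATOM_NS + "updated").text

# This sketch assumes the "Z" (UTC) form of the timestamp shown above.
stamp = datetime.strptime(updated_text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

# Use 1: show the date to a reader.
display_line = f"{title} (last changed {stamp.date().isoformat()})"

# Use 2: decide, without showing anything, whether the item predates a cutoff.
cutoff = datetime(2004, 1, 1, tzinfo=timezone.utc)
is_older_than_cutoff = stamp < cutoff

print(display_line)
print(is_older_than_cutoff)
```

Both uses rely on the same label, and neither is prescribed by the markup.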

In practice, the classes of markup usually co-occur in any given system. For example, HTML contains markup elements which are purely procedural (for example b for bold) and others which are purely descriptive ("blockquote", or the "href=" attribute). HTML also includes the PRE element, which encloses areas of presentational markup to be laid out exactly as typed.

Sets of markup elements and rules for their use are commonly developed by standards bodies to support the kinds of documents used in particular industries or communities. One of the earliest of these was CALS, used by the US military for technical manuals. Industries with large-scale documentation requirements soon followed suit, developing tag-sets for aircraft, telecommunications, automotive, and computer hardware manuals. This led to delivering many such manuals solely in electronic form; some companies were able to produce printed, online, and CD-based manuals all from a single (descriptive markup) source. A notable example was Sun Microsystems, where Jon Bosak (who later founded the XML committee) decided on SGML for multi-target documentation delivery, achieving considerable cost savings.

Markup languages now abound; among the more widely known are DocBook, MathML, SVG, Open eBook, TEI, and XBRL. Many are for various kinds of text documents, but specialized languages are used in many other domains.

Generic markup is another term for descriptive markup. Most modern descriptive markup systems structure documents into trees, while also providing some means for embedding cross-references. Because of this, documents can be readily treated as databases, in which the database system is aware of the structure (not "blobs" as in the past). Because they do not have such strict schemas as relational databases, however, they are commonly called "semi-structured databases".

In the third millennium, great interest has arisen in document structures that are not trees. For example, ancient and sacred literature commonly has a rhetorical or prose structure (stories, pericopes, paragraphs, and so on), as well as a reference structure (books, chapters, verses, lines). Since the boundaries of these units often cross, they cannot readily be encoded using tree-structured markup systems. Among the document modeling systems that support such structures are MECS (developed for encoding the works of Wittgenstein), aspects of the TEI Guidelines, LMNL, and CLIX.

A primary virtue of descriptive markup is considered to be its flexibility: if the fragments of text are labeled as to "what they are" as opposed to "how they should be displayed", software may be written to process these fragments in useful ways not anticipated by the designers of the languages. For example, HTML's hyperlinks, originally designed for activation by a human following a link, are also widely used by Web search engines both in discovering new material to index and in estimating the popularity of Web resources.
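
The search-engine case can be illustrated with a short sketch that collects link targets from a page using Python's standard `html.parser` module. The page content and the `LinkCollector` class name are assumptions made for this example; a real crawler does far more than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# A made-up page fragment.
page = (
    '<p>See <a href="http://example.org/ducks">ducks</a> and '
    '<a href="http://example.org/geese">geese</a>.</p>'
)

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)
```

The links were written for human readers to follow, yet the same labels let a program discover further material without any change to the page.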

Descriptive markup also facilitates the simpler task of reformatting a document as needed, because the format specification is not intertwined with the content. For example, italics might be used both for emphasis, and to indicate foreign words. However, if both are merely tagged (presentationally or procedurally) as italic, this ambiguity cannot readily be sorted out. If a decision is later made not to italicize foreign words, there is nothing for it but to review all italic portions and sort them out one by one. However, if the two cases were (descriptively or generically) tagged differently to begin with, either can be reformatted without interfering with the other.
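
The italics example can be sketched as follows. The element names `emph` and `foreign`, the sample sentence, and the use of asterisks to stand for italics are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Emphasis and foreign words are tagged differently, although both might print as italic.
source = "<p>It is <emph>not</emph> a duck; the French word is <foreign>oie</foreign>.</p>"


def render(xml_text, italic_tags):
    """Flatten one paragraph, wrapping the text of tags in italic_tags in asterisks.

    Only one level of nesting is handled, which is enough for this sketch.
    """
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        text = child.text or ""
        parts.append(f"*{text}*" if child.tag in italic_tags else text)
        parts.append(child.tail or "")
    return "".join(parts)


# House style A: italicize both emphasis and foreign words.
both_italic = render(source, {"emph", "foreign"})
# House style B: stop italicizing foreign words; the source text is unchanged.
emph_only = render(source, {"emph"})

print(both_italic)
print(emph_only)
```

Changing the style is a one-line change to the set of italic tags; had both spans been tagged simply as italic, each one would need to be reviewed by hand.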

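History

The term "markup" derives from the traditional publishing practice of "marking up" a manuscript, that is, adding symbols in the margins of the manuscript to convey printing instructions. For a long time this work was done by specialists ("markup men") and proofreaders, who marked the manuscript to indicate which typeface, style, and size to use, and then passed it on to others for typesetting by hand.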

GenCode

The idea of "markup languages" was apparently first presented by publishing executive William W. Tunnicliffe at a conference in 1967, although he preferred to call it "generic coding." Tunnicliffe would later lead the development of a standard called GenCode for the publishing industry. Book designer Stanley Fish also published speculation along similar lines in the late 1960s. Brian Reid, in his 1980 dissertation at Carnegie Mellon University, developed the theory and a working implementation of descriptive markup in actual use. However, IBM researcher Charles Goldfarb is more commonly seen today as the "father" of markup languages, because of his work on IBM GML, and then as chair of the International Organization for Standardization committee that developed SGML, the first widely used descriptive markup system. Goldfarb hit upon the basic idea while working on an early project to help a newspaper computerize its workflow, although the published record does not clarify when. He later became familiar with the work of Tunnicliffe and Fish, and heard an early talk by Reid which further sparked his interest.

It must be noted that the details of the early history of descriptive markup languages are hotly debated. However, it is clear that the notion was independently discovered several times throughout the 70s (and possibly the late 60s), and became an important practice in the late 80s. [citation needed]


Some early examples of markup languages available outside the publishing industry can be found in typesetting tools on Unix systems such as troff and nroff. In these systems, formatting commands were inserted into the document text so that typesetting software could format the text according to the editor's specifications. Getting a document to print correctly was an iterative process of trial and error. [citation needed]

Availability of WYSIWYG ("what you see is what you get") publishing software supplanted much use of these languages among casual users, though serious publishing work still uses markup to specify the non-visual structure of texts.

TeX

Another major publishing standard is TeX, created and continuously refined by Donald Knuth in the 1970s and 80s. TeX concentrated on detailed layout of text and font descriptions in order to typeset mathematical books in professional quality. This required Knuth to spend considerable time investigating the art of typesetting. However, TeX requires considerable skill from the user, so that it is mainly used in academia, where it is a de-facto standard in many scientific disciplines. A TeX macro package known as LaTeX provides a descriptive markup system on top of TeX, and is widely used.

SGML

The first language to make a clear and clean distinction between structure and presentation was certainly Scribe, developed by Brian Reid and described in his doctoral thesis in 1980[1]. Scribe was revolutionary in a number of ways, not least that it introduced the idea of styles separated from the marked up document, and of a grammar controlling the usage of descriptive elements. Scribe influenced the development of Generalized Markup Language (later SGML) and is a direct ancestor to HTML and LaTeX.

In the early 1980s, the idea that markup should be focused on the structural aspects of a document and leave the visual presentation of that structure to the interpreter led to the creation of SGML. The language was developed by a committee chaired by Goldfarb. It incorporated ideas from many different sources, including Tunnicliffe's project, GenCode. Sharon Adler, Anders Berglund, and James D. Mason were also key members of the SGML committee.

SGML specified a syntax for including the markup in documents, as well as one for separately describing what tags were allowed, and where (the Document Type Definition (DTD) or schema). This allowed authors to create and use any markup they wished, selecting tags that made the most sense to them and were named in their own natural languages. Thus, SGML is properly a meta-language, and many particular markup languages are derived from it. From the late 80s on, most substantial new markup languages have been based on the SGML system, including for example TEI and DocBook. SGML was promulgated as an International Standard, ISO 8879, by the International Organization for Standardization in 1986.

SGML found wide acceptance and use in fields with very large-scale documentation requirements. However, it was generally found to be cumbersome and difficult to learn, a side effect of attempting to do too much and be too flexible. For example, SGML made end tags (or start-tags, or even both) optional in certain contexts, because it was thought that markup would be done manually by overworked support staff who would appreciate saving keystrokes.

HTML

An example of HTML
Main article: HTML

By 1991, it appeared to many that SGML would be limited to commercial and data-based applications while WYSIWYG tools (which stored documents in proprietary binary formats) would suffice for other document processing applications.

The situation changed when Sir Tim Berners-Lee, learning of SGML from co-worker Anders Berglund and others at CERN, used SGML syntax to create HTML. HTML resembles other SGML-based tag languages, although it began as simpler than most and a formal DTD was not developed until later. DeRose[2] argues that HTML's use of descriptive markup (and SGML in particular) was a major factor in the success of the Web, because of the flexibility and extensibility that it enabled (other factors include the notion of URLs and the free distribution of browsers). HTML is quite likely the most used markup language in the world today.

However, HTML's status as a markup language is disputed by some computer scientists. The argument for this is that HTML restricts the placement of tags, requiring them to be either fully nested inside of other tags, or the root tag of the document. Because of this, these scientists would suggest instead that HTML is a container language, following a Hierarchical model.

XML

Main article: XML

Another, newer, markup language that is now widely used is XML (Extensible Markup Language). XML was developed by the World Wide Web Consortium, in a committee created and chaired by Jon Bosak. The main purpose of XML was to simplify SGML by focusing on a particular problem — documents on the Internet.[3] XML remains a meta-language like SGML, allowing users to create any tags needed (hence "extensible") and then describing those tags and their permitted uses.

XML adoption was helped because every XML document is also an SGML document, and existing SGML users and software could switch to XML fairly easily. However, XML eliminated many of the more complex features of SGML, easing learning and implementation (while increasing markup size and reducing readability). Other improvements rectified some SGML problems in international settings, and made it possible to parse and interpret document hierarchy even if no schema is available.
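
The last point, recovering the document hierarchy with no schema available, can be sketched with Python's standard `xml.etree.ElementTree` parser. The `recipe` vocabulary is made up for this example and no DTD or schema is supplied for it.

```python
import xml.etree.ElementTree as ET

# A document in an invented vocabulary; the parser has never seen a schema for it.
xml_text = (
    "<recipe><title>Toast</title>"
    "<step>Slice bread</step><step>Toast it</step></recipe>"
)


def outline(element, depth=0):
    """Return the element hierarchy as indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(xml_text))
print("\n".join(tree_lines))
```

Because every element in XML is explicitly opened and closed, the nesting can be read directly from the text.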

XML was designed primarily for semi-structured environments such as documents and publications. However, it appeared to hit a sweet spot between simplicity and flexibility, and was rapidly adopted for many other uses. XML is now widely used for communicating data between applications.

XHTML

Main article: XHTML

Since January 2000 all W3C Recommendations for HTML have been based on XML rather than SGML, using the abbreviation XHTML (the eXtensible Hypertext Markup Language). The language specification requires that XHTML Web documents must be "well-formed" XML documents – this allows for more rigorous and robust documents while using tags familiar from HTML.

One of the most noticeable differences between HTML and XHTML is the rule that all tags must be closed: 'empty' HTML tags such as <br> must either be closed with a regular end-tag or replaced by the special form <br /> (the space before the '/' is not required by XML; it is conventionally included for compatibility with older HTML browsers). Another is that all attribute values in tags must be quoted.
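
Both rules follow from XML well-formedness, which can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample fragments are assumptions for illustration. It tests well-formedness only, not validity against the XHTML specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


html_style = "<p>line one<br>line two</p>"           # <br> is never closed
xhtml_style = "<p>line one<br />line two</p>"        # empty-element form
unquoted_attr = "<p class=intro>text</p>"            # attribute value not quoted
quoted_attr = '<p class="intro">text</p>'            # attribute value quoted

for name, fragment in [
    ("html_style", html_style),
    ("xhtml_style", xhtml_style),
    ("unquoted_attr", unquoted_attr),
    ("quoted_attr", quoted_attr),
]:
    print(name, is_well_formed(fragment))
```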

Other XML-based applications

There are other XML-based applications as well, such as RDF, XForms, DocBook, SOAP, and the Web Ontology Language (OWL). For more, see the list of XML markup languages.

Features

A common feature of many markup languages is that they intermix the text of a document with markup instructions in the same data stream or file. Here, for example, is a small section of text marked up in HTML:

<h1> Anatidae </h1>
<p>
The family <i>Anatidae</i> includes ducks, geese, and swans,
but <em>not</em> the closely-related screamers.
</p>

The codes enclosed in angle-brackets <like this> are markup instructions (known as tags), while the text between these instructions is the actual text of the document. The codes "h1", "p", and "em" are examples of structural markup, in that they describe the intended purpose or meaning of the text they include. Specifically, "h1" means "this is a first-level heading", "p" means "this is a paragraph", and "em" means "this is an emphasized word". A device reading such structural markup may apply its own rules or styles for presenting it, using larger type, boldface, indentation, or whatever style it prefers. The "i" instruction is an example of presentational markup. It specifies the exact appearance of the text (in this case, the use of an italic typeface) without specifying the reason for that appearance.
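
The separation between markup instructions and document text can be sketched with Python's standard `html.parser` module, applied to the same fragment. The `MarkupSplitter` class name is hypothetical, and the split into structural and presentational tags simply restates the classification given in the paragraph above.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Record start-tag names and document text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


snippet = """<h1> Anatidae </h1>
<p>
The family <i>Anatidae</i> includes ducks, geese, and swans,
but <em>not</em> the closely-related screamers.
</p>"""

splitter = MarkupSplitter()
splitter.feed(snippet)
splitter.close()

# Collapse runs of whitespace so the text reads as one line.
plain_text = " ".join("".join(splitter.text).split())

# Classification taken from the surrounding prose: "i" is presentational, the rest structural.
kinds = {tag: ("presentational" if tag == "i" else "structural") for tag in splitter.tags}

print(splitter.tags)
print(plain_text)
print(kinds)
```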

The Text Encoding Initiative (TEI) has published extensive guidelines for how to encode texts of interest in the humanities and social sciences, developed through years of international cooperative work. These guidelines are used by countless projects encoding historical documents, the works of particular scholars, periods, or genres, and so on.

Other uses

While the idea of markup language originated with text documents, there is an increasing usage of markup languages in areas like vector graphics, web services, content syndication, and user interfaces. Most of these are XML applications, because XML is a clean, well-formatted, and extensible language. The use of XML has also led to the possibility of combining multiple markup languages into a single profile, like XHTML+SMIL and XHTML+MathML+SVG.

References

  1. Reid, Brian. "Scribe: A Document Specification Language and its Compiler." Ph.D. thesis, Carnegie-Mellon University, Pittsburgh PA, 1980. Also available as Technical Report CMU-CS-81-100.
  2. DeRose, Steven J. "The SGML FAQ Book." Boston: Kluwer Academic Publishers, 1997. ISBN 0-7923-9943-9.
  3. Extensible Markup Language (XML). http://www.w3.org/TR/2004/REC-xml11-20040204/

Sources

  • TEI guidelines
  • Markup systems and the future of scholarly text processing by James H. Coombs, Allen H. Renear, and Steven J. DeRose. Originally published in the November 1987 CACM, and reprinted several times in other forums, this article introduced many of the concepts now used in discussing markup languages, and lays out the basic arguments for the superior usability of descriptive markup.
