計算機とその周辺: What I Talk About When I Talk About Computers: Working with XML (Part 15) [Unicode][Common Lisp]


 Continuing the workflow series. I want to understand closure-xml, but
 I don't know what the term "runes" means, so I'm reading the source.

 closure-common.asd :

 It uses char-code-limit and similar checks to determine whether the
 implementation supports Unicode internally. Features are added to
 *features* according to the result.

 ---
 No Unicode support
   :rune-is-integer
 Unicode (UTF-16) support?
   Appears to support Unicode, but surrogate pairs are handled oddly.
   :rune-is-character
 Unicode (UTF-16) support
   :rune-is-utf-16
   :rune-is-character
 ---
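
 The detection described above can be sketched roughly as follows.
 This is a minimal illustration, not the code in closure-common.asd:
 the helper names and the probe code points 50000 and 70000 (one
 inside the 16-bit range, one beyond it) are assumptions made for the
 example.

```lisp
;; Hypothetical helpers for illustration; not taken from closure-common.asd.
(defun code-is-character-p (code)
  "True if CODE is below CHAR-CODE-LIMIT and CODE-CHAR returns a character."
  (and (< code char-code-limit)
       (code-char code)
       t))

(defun guess-rune-features ()
  "Return the feature keywords this sketch would choose for the running Lisp."
  (cond ((not (code-is-character-p 50000))
         ;; No wide characters: runes would have to be plain integers.
         (list :rune-is-integer))
        ((code-is-character-p 70000)
         ;; Character codes beyond the 16-bit range exist.
         (list :rune-is-character))
        (t
         ;; 16-bit characters only, so strings would hold UTF-16 code units.
         (list :rune-is-utf-16 :rune-is-character))))
```

 Evaluating (guess-rune-features) at the REPL shows which of the three
 cases applies. Adding the keywords to *features* with pushnew is what
 would make them visible to #+ and #- afterwards.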

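 Then there is this:

 #-rune-is-character
 (format t "~&;;; Building Closure with (UNSIGNED-BYTE 16) RUNES~%")

 #+rune-is-character
 (format t "~&;;; Building Closure with CHARACTER RUNES~%")

 So apparently runes are either (unsigned-byte 16) integers or
 characters. The source files listed in the defsystem also change
 depending on these features.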

---
(defsystem :closure-common
    :default-component-class closure-source-file
    :serial t
    :components
    ((:file "package")
     (:file "definline")
     (:file runes
            :pathname
             #-rune-is-character "runes"
             #+rune-is-character "characters")
     #+rune-is-integer (:file "utf8")
     (:file "syntax")
     #-x&y-streams-are-stream (:file "encodings")
     #-x&y-streams-are-stream (:file "encodings-data")
     #-x&y-streams-are-stream (:file "xstream")
     #-x&y-streams-are-stream (:file "ystream")
     #+x&y-streams-are-stream (:file #+scl "stream-scl")
     (:file "hax"))
    :depends-on (#-scl :trivial-gray-streams
         #+rune-is-character :babel))
---
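
 The #+ and #- prefixes in the defsystem are ordinary reader
 conditionals, so the choice is made when the form is read. A small
 sketch of the same mechanism follows. The function names are
 hypothetical and exist only for this example.

```lisp
;; Hypothetical functions; the first mirrors the :pathname choice above.
(defun rune-source-file ()
  "Name of the file that would supply the rune definitions in this image."
  ;; Exactly one of the two strings survives reading, depending on
  ;; whether :rune-is-character is on *features* at read time.
  #-rune-is-character "runes"
  #+rune-is-character "characters")

(defun rune-is-character-p ()
  "Run-time check of the same feature."
  (and (member :rune-is-character *features*) t))
```

 In an image where closure-common has not been loaded, the feature is
 absent, so rune-source-file reads as a function returning "runes".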

 Incidentally, on acl (Allegro CL) the result was

 Unicode (UTF-16) support
   :rune-is-utf-16
   :rune-is-character

 Now, a comparison of characters.lisp and runes.lisp.

 First, the types.

--- characters.lisp --- 
(deftype rune () #-lispworks 'character #+lispworks 'lw:simple-char)
(deftype rod () '(vector rune))
(deftype simple-rod () '(simple-array rune))
---  

--- runes.lisp ---  
(deftype rune () '(unsigned-byte 16))
(deftype rod () '(array rune (*)))
(deftype simple-rod () '(simple-array rune (*)))
---   

 In characters.lisp, the types are just aliases for the
 implementation's own types.
 In runes.lisp, a rune is a 16-bit unsigned integer.
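
 To see the difference in practice, the sketch below holds the same
 text both ways. The names int-rune, int-rod, string-to-int-rod and
 int-rod-to-string are made up for this example, and the conversion
 assumes every character code fits in 16 bits.

```lisp
;; Hypothetical names; these stand in for runes.lisp-style definitions.
(deftype int-rune () '(unsigned-byte 16))
(deftype int-rod () '(array int-rune (*)))

(defun string-to-int-rod (string)
  "Copy STRING into a fresh vector of 16-bit character codes.
Assumes every character code in STRING is below 65536."
  (map-into (make-array (length string) :element-type 'int-rune)
            #'char-code
            string))

(defun int-rod-to-string (rod)
  "Inverse of STRING-TO-INT-ROD."
  (map 'string #'code-char rod))
```

 With character runes a rod is simply a string, so neither conversion
 would be needed. With integer runes every string has to be copied
 like this before the parser can work on it.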

 Next, functions and the like.
 
--- characters.lisp --- 
(definline digit-rune-p (char &optional (radix 10))
  (digit-char-p char radix))
---

--- runes.lisp --- 
(definline digit-rune-p (char &optional (radix 10))
  (cond ((<= #.(char-code #\0) char #.(char-code #\9))
         (and (< (- char #.(char-code #\0)) radix)
              (- char #.(char-code #\0))))
        ((<= #.(char-code #\A) char #.(char-code #\Z))
         (and (< (- char #.(char-code #\A) -10) radix)
              (- char #.(char-code #\A) -10)))
        ((<= #.(char-code #\a) char #.(char-code #\z))
         (and (< (- char #.(char-code #\a) -10) radix)
              (- char #.(char-code #\a) -10))) ))
---

 Here too, characters.lisp just aliases the implementation's built-in
 function, while runes.lisp implements the logic itself.
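
 As a sanity check, the integer-based logic can be compared against
 digit-char-p directly. The function below is a rewrite for
 illustration, not the runes.lisp source: it uses a plain defun instead
 of definline, and the name int-digit-rune-p is hypothetical.

```lisp
;; Hypothetical re-implementation that takes a character code, not a character.
(defun int-digit-rune-p (code &optional (radix 10))
  "Return the digit weight of CODE in RADIX, or NIL if it is not a digit."
  (let ((weight (cond ((<= (char-code #\0) code (char-code #\9))
                       (- code (char-code #\0)))
                      ((<= (char-code #\A) code (char-code #\Z))
                       (+ 10 (- code (char-code #\A))))
                      ((<= (char-code #\a) code (char-code #\z))
                       (+ 10 (- code (char-code #\a)))))))
    ;; A weight only counts as a digit when it is below the radix.
    (and weight (< weight radix) weight)))

(defun same-as-digit-char-p (char radix)
  "True if the integer version agrees with DIGIT-CHAR-P for CHAR and RADIX."
  (eql (int-digit-rune-p (char-code char) radix)
       (digit-char-p char radix)))
```

 For example, (same-as-digit-char-p #\f 16) compares the two
 implementations on a hexadecimal digit.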

 Now, functions with a utf8 prefix show up occasionally in the cxml
 documentation. What are they?

 First, the defsystem has

      #+rune-is-integer (:file "utf8")

 so this is presumably functionality for implementations without
 Unicode support. Next, a look at utf8.lisp.

--- utf8.lisp ---
(deftype rune () 'character)
(deftype rod () '(vector rune))
(deftype simple-rod () '(simple-array rune))
---

 I see. So runes.lisp emulates UTF-16 inside cxml, while utf8.lisp
 emulates UTF-8.
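
 To make the two representations concrete, here is a sketch of how a
 single code point breaks down into 16-bit units versus UTF-8 octets.
 Both functions are hypothetical helpers written for this post, not
 part of cxml, and they do no validation of their input.

```lisp
;; Hypothetical helpers; no range or surrogate validation is performed.
(defun utf16-units (code)
  "Return the list of 16-bit code units that encode the code point CODE."
  (if (< code #x10000)
      (list code)
      ;; Outside the BMP: split into a high and a low surrogate.
      (let ((v (- code #x10000)))
        (list (logior #xD800 (ash v -10))
              (logior #xDC00 (logand v #x3FF))))))

(defun utf8-octets (code)
  "Return the list of octets that encode the code point CODE in UTF-8."
  (flet ((tail (shift)
           ;; Continuation octet: 10xxxxxx carrying six payload bits.
           (logior #x80 (logand (ash code (- shift)) #x3F))))
    (cond ((< code #x80) (list code))
          ((< code #x800)
           (list (logior #xC0 (ash code -6)) (tail 0)))
          ((< code #x10000)
           (list (logior #xE0 (ash code -12)) (tail 6) (tail 0)))
          (t
           (list (logior #xF0 (ash code -18)) (tail 12) (tail 6) (tail 0))))))
```

 For U+3042 the first function yields one unit and the second three
 octets. For a code point beyond U+FFFF the first yields a surrogate
 pair and the second four octets.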

 With that, I think I have enough of a foundation to work through the
 cxml documentation.

Slow and steady.

Monday, February 2, 2009