計算機とその周辺: What I Talk About When I Talk About Computers: February 2009

Friday, February 27, 2009

【Subversion】Sorting out the observed behavior on Ubuntu and OSX

After getting an IV drip I feel somewhat better...

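My goal was never to dig deep into Subversion itself; I just wanted to understand it properly before using it.
From that point of view, I now have a fairly good grasp of the internal mechanics of character encoding, which was my main concern, so it seems enough to pin down the rest phenomenologically, by observed behavior. Let me sort out those observations.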

I only look at Japanese here.

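Repository: unpatched OSX
- Messages
  - from:OSX:OK:NFC
  - to:Ubuntu:OK
  - from:Ubuntu:OK:NFC
  - to:OSX:OK
- File names
  - from:OSX:NG:?: svn add works, but svn status reports an inconsistent state and svn ci is not possible.
  - from:Ubuntu:OK: db/revs contains NFC
  - to:OSX:NG: the name is NFD on the file system. Terminal displays it without garbling, but Screen and Emacs garble it. Moreover, regardless of the terminal, svn status ends up in the following state: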
```
$ svn status
?      が.txt
!      が.txt
$ 
```
    That is, the same-looking name is listed twice, once as unversioned (?) and once as missing (!).

Hmm. As a repository host, OSX seems to be fine even without the patch.
The OSX character encoding problem appears to be strictly a problem of the working directory on OSX.
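
To make the NFC/NFD distinction concrete, here is a small Python sketch. It is not part of Subversion, and `describe` is a name made up for this illustration. It only shows that the visually identical name が.txt has two different UTF-8 byte sequences depending on the normalization form.

```python
import unicodedata


def describe(name: str) -> dict:
    """Return the NFC and NFD forms of a name as UTF-8 bytes (illustrative helper)."""
    nfc = unicodedata.normalize("NFC", name).encode("utf-8")
    nfd = unicodedata.normalize("NFD", name).encode("utf-8")
    return {"nfc_bytes": nfc, "nfd_bytes": nfd, "same_bytes": nfc == nfd}


if __name__ == "__main__":
    info = describe("\u304c.txt")  # U+304C is HIRAGANA LETTER GA
    print("NFC:", info["nfc_bytes"].hex(" "))
    print("NFD:", info["nfd_bytes"].hex(" "))
    print("identical bytes?", info["same_bytes"])
```

Any code that compares file names byte by byte will therefore treat the two forms as two different names.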

Let me summarize.

Repository: Ubuntu/GNU/Linux
Client: patched OSX
Client: Ubuntu/GNU/Linux

I have been running this combination in various places for some time, so it is known to be consistent.

What I learned this time is that the combination

Repository: unpatched OSX / patched OSX
Client: patched OSX
Client: Ubuntu/GNU/Linux

also appears to be consistent.

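And the problems are concentrated on the client side:

1. svn status runs into an NFC/NFD mismatch, and the working directory cannot be handled properly.
2. Terminal can display NFD file names without garbling, but screen -U, Emacs dired, and Emacs shell mode garble them.

Of these, the first is fatal.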
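
As a rough model of the first symptom, the following Python sketch compares the names recorded in the working copy metadata with the names found on disk, byte for byte. This is only an assumption about the shape of the problem, not how svn status is actually implemented; `simulate_status` is a hypothetical helper.

```python
import unicodedata


def simulate_status(recorded_names, names_on_disk):
    """Byte-exact comparison of recorded names against names on disk.

    Returns (missing, unversioned): names that are recorded but not found
    on disk (shown as '!'), and names on disk that are not recorded ('?').
    """
    recorded = {n.encode("utf-8") for n in recorded_names}
    on_disk = {n.encode("utf-8") for n in names_on_disk}
    return sorted(recorded - on_disk), sorted(on_disk - recorded)


if __name__ == "__main__":
    nfc_name = unicodedata.normalize("NFC", "\u304c.txt")  # as stored in the repository
    nfd_name = unicodedata.normalize("NFD", "\u304c.txt")  # as returned by an NFD file system
    missing, unversioned = simulate_status([nfc_name], [nfd_name])
    for name in unversioned:
        print("?     ", name.decode("utf-8"))
    for name in missing:
        print("!     ", name.decode("utf-8"))
```

With one NFC name recorded and the same name in NFD on disk, this model produces one "?" line and one "!" line, which has the same shape as the svn status output quoted earlier.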

From a short-term point of view, OSX's adoption of NFD looks like a mistake.

Still wobbly.

【Subversion】How Subversion handles encodings

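Just to write a summary of this size, I actually had to read quite a lot of source code and struggled badly.
Having no design documents other than the source itself is painful.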


Encoding handling in Subversion
-------------------------------

The foundation of character encoding handling in Subversion is the following function:

static svn_error_t *
convert_cstring(const char **dest,
                const char *src,
                xlate_handle_node_t *node,
                apr_pool_t *pool);

Here, xlate_handle_node_t *node holds the apr_xlate_t *handle
that defines the character encoding conversion. Incidentally,
apr_xlate is APR's I18N translation library.

typedef struct xlate_handle_node_t {
  apr_xlate_t *handle;
  /* FALSE if the handle is not valid, since its pool is being
     destroyed. */
  svn_boolean_t valid;
  /* The name of a char encoding or APR_LOCALE_CHARSET. */
  const char *frompage, *topage;
  struct xlate_handle_node_t *next;
} xlate_handle_node_t;

The handle looks like this.

struct apr_xlate_t {
    apr_pool_t *pool;
    char *frompage;
    char *topage;
    char *sbcs_table;
    iconv_t ich;
};

Here char *frompage specifies the source encoding and
char *topage specifies the destination encoding.

Now, the handle is created by get_ntou_xlate_handle_node.

static svn_error_t *
get_ntou_xlate_handle_node(xlate_handle_node_t **ret, apr_pool_t *pool)
{
  return get_xlate_handle_node(ret, SVN_APR_UTF8_CHARSET,
                               SVN_APR_LOCALE_CHARSET,
                               SVN_UTF_NTOU_XLATE_HANDLE, pool);
}

This is a wrapper; the real work is done by the following.

static svn_error_t *
get_xlate_handle_node(xlate_handle_node_t **ret,
                      const char *topage, const char *frompage,
                      const char *userdata_key,
                      apr_pool_t *pool);

Inside it, the handle is first opened:

apr_xlate_open(&handle, topage, frompage, pool);

  As for how apr_xlate_open fills in the handle:

  new->ich = iconv_open(topage, frompage);

  In other words, it boils down to obtaining a conversion
  descriptor from iconv for the given pair of topage and
  frompage.


Next, xlate_handle_node_t **ret is initialized:

  *ret = apr_palloc(pool, sizeof(xlate_handle_node_t));
  (*ret)->handle = handle;
  (*ret)->valid = TRUE;
  (*ret)->frompage = ((frompage != SVN_APR_LOCALE_CHARSET)
                      ? apr_pstrdup(pool, frompage) : frompage);
  (*ret)->topage = ((topage != SVN_APR_LOCALE_CHARSET)
                    ? apr_pstrdup(pool, topage) : topage);
  (*ret)->next = NULL;

That completes the initialization.

The handle member of the xlate_handle_node_t object created
here is then used as the convset argument of
apr_xlate_conv_buffer.


APU_DECLARE(apr_status_t) apr_xlate_conv_buffer(apr_xlate_t *convset,
                                                const char *inbuf,
                                                apr_size_t *inbytes_left,
                                                char *outbuf,
                                                apr_size_t *outbytes_left);

Inside this function, the thing that actually performs the
character encoding conversion is iconv. (win32 has no iconv,
so the bundled apr_iconv is used there instead.)

translated = iconv(convset->ich, (ICONV_INBUF_TYPE)&inbufptr,
                   inbytes_left, &outbufptr, outbytes_left);

Now,

we know what is inside convert_cstring: it is iconv.
Next, let us check how its topage and frompage are supplied.
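
As an aside, the (topage, frompage) idea can be sketched in Python. This is only an analogy: Python's codecs stand in for iconv, `convert_bytes` is a hypothetical helper rather than a Subversion or APR function, and EUC-JP is just an assumed example of a non-UTF-8 native encoding.

```python
def convert_bytes(src: bytes, topage: str, frompage: str) -> bytes:
    """Convert src from encoding `frompage` to encoding `topage`.

    Roughly what iconv_open(topage, frompage) followed by iconv() does,
    with Python's codec machinery standing in for iconv.
    """
    return src.decode(frompage).encode(topage)


if __name__ == "__main__":
    native = "\u304c.txt".encode("euc_jp")            # file name in an assumed EUC-JP locale
    utf8 = convert_bytes(native, "utf-8", "euc_jp")   # native -> UTF-8 direction
    back = convert_bytes(utf8, "euc_jp", "utf-8")     # UTF-8 -> native direction
    print(native.hex(" "), "->", utf8.hex(" "), "->", back.hex(" "))
```

The two directions differ only in which encoding is passed as topage and which as frompage.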

The representative entry points are the following two functions:

svn_error_t *
svn_utf_cstring_to_utf8(const char **dest,
                        const char *src,
                        apr_pool_t *pool)
{
  xlate_handle_node_t *node;
  svn_error_t *err;

  SVN_ERR(get_ntou_xlate_handle_node(&node, pool));
  err = convert_cstring(dest, src, node, pool);
  put_xlate_handle_node(node, SVN_UTF_NTOU_XLATE_HANDLE, pool);
  SVN_ERR(err);
  SVN_ERR(check_cstring_utf8(*dest, pool));

  return SVN_NO_ERROR;
}

svn_error_t *
svn_utf_cstring_from_utf8(const char **dest,
                          const char *src,
                          apr_pool_t *pool)
{
  xlate_handle_node_t *node;
  svn_error_t *err;

  SVN_ERR(check_utf8(src, strlen(src), pool));

  SVN_ERR(get_uton_xlate_handle_node(&node, pool));
  err = convert_cstring(dest, src, node, pool);
  put_xlate_handle_node(node, SVN_UTF_UTON_XLATE_HANDLE, pool);

  return err;
}

These two are almost symmetric.

They are called from, for example, the svn_path_* functions.


svn_error_t *
svn_path_cstring_to_utf8(const char **path_utf8,
                         const char *path_apr,
                         apr_pool_t *pool)
{
  svn_boolean_t path_is_utf8;
  SVN_ERR(get_path_encoding(&path_is_utf8, pool));
  if (path_is_utf8)
    {
      *path_utf8 = apr_pstrdup(pool, path_apr);
      return SVN_NO_ERROR;
    }
  else
    return svn_utf_cstring_to_utf8(path_utf8, path_apr, pool);
}

svn_error_t *
svn_path_cstring_from_utf8(const char **path_apr,
                           const char *path_utf8,
                           apr_pool_t *pool)
{
  svn_boolean_t path_is_utf8;
  SVN_ERR(get_path_encoding(&path_is_utf8, pool));
  if (path_is_utf8)
    {
      *path_apr = apr_pstrdup(pool, path_utf8);
      return SVN_NO_ERROR;
    }
  else
    return svn_utf_cstring_from_utf8(path_apr, path_utf8, pool);
}

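The important point here is that the behavior changes
depending on whether APR's internal encoding is UTF-8.
(Incidentally, svn_path_cstring_to_utf8 is where the
Core Foundation patch that makes svn work properly on OSX
is applied.)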

get_path_encoding(&path_is_utf8, pool)

This does not ask about the encoding of the given path_*;
it asks what APR's internal encoding is.

If it is UTF-8:

svn_path_cstring_to_utf8 just does
  *path_utf8 = apr_pstrdup(pool, path_apr);

svn_path_cstring_from_utf8 just does
  *path_apr = apr_pstrdup(pool, path_utf8);

If it is not UTF-8:

svn_path_cstring_to_utf8 calls
  svn_utf_cstring_to_utf8.

svn_path_cstring_from_utf8 calls
  svn_utf_cstring_from_utf8.

In other words:

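  * If APR's internal encoding is UTF-8, that is taken as
    evidence that the encoding of the environment (the OS
    and so on) is UTF-8.

  * Inside SVN, the encoding is UTF-8.

  * However, when APR judges the environment to be UTF-8,
    the UTF-8 byte sequence given from outside is taken in
    without conversion. When APR judges the environment not
    to be UTF-8, the data is converted by svn_utf_cstring_*
    and friends before being taken in.

Put differently:

  * There are two ways in which data is taken in as UTF-8,
    but neither cares about NFD/NFC. Whether file names and
    path strings are handled as NFD or NFC is not defined
    anywhere within Subversion; it depends on the
    environment in which Subversion is used.

That is the upshot.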
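
The branching just described can be modeled in a few lines of Python. This is a sketch of the logic only, not Subversion's API: `path_cstring_to_utf8` here is a Python stand-in whose second parameter plays the role of the answer from get_path_encoding, and the EUC-JP case is an assumed example.

```python
import codecs


def path_cstring_to_utf8(path_native: bytes, native_encoding: str) -> bytes:
    """Model of the to-UTF-8 direction: copy if already UTF-8, else convert."""
    if codecs.lookup(native_encoding).name == "utf-8":
        # Equivalent of the apr_pstrdup branch: the bytes are copied untouched,
        # so whatever normalization form came in is what goes out.
        return bytes(path_native)
    # Equivalent of the svn_utf_cstring_to_utf8 branch: a real conversion.
    return path_native.decode(native_encoding).encode("utf-8")


if __name__ == "__main__":
    nfd = "\u304b\u3099.txt".encode("utf-8")  # KA + combining dakuten (NFD)
    nfc = "\u304c.txt".encode("utf-8")        # precomposed GA (NFC)
    print(path_cstring_to_utf8(nfd, "UTF-8") == nfd)  # NFD stays NFD
    print(path_cstring_to_utf8(nfc, "UTF-8") == nfc)  # NFC stays NFC
```

Neither branch normalizes anything, which is the point made in the last bullet above.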



So, how does the svn client add a new file to the working
directory, and how does it handle the path name along the
way? Let us trace it.



First, the body of the svn add command is the following function.

/* This implements the `svn_opt_subcommand_t' interface. */
svn_error_t *
svn_cl__add(apr_getopt_t *os,
            void *baton,
            apr_pool_t *pool);

It does various things, but as far as strings (our current
concern) go, the important structure is:

  apr_array_header_t *targets;

This array is created by:

  SVN_ERR(svn_cl__args_to_target_array_print_reserved(&targets, os,
                                                      opt_state->targets, 
                                                      pool));

After that, the following loop runs:

  for (i = 0; i < targets->nelts; i++)
    {
      const char *target = APR_ARRAY_IDX(targets, i, const char *);

      svn_pool_clear(subpool);
      SVN_ERR(svn_cl__check_cancel(ctx->cancel_baton));
      SVN_ERR(svn_cl__try
              (svn_client_add4(target,
                               opt_state->depth,
                               opt_state->force, opt_state->no_ignore,
                               opt_state->parents, ctx, subpool),
               NULL, opt_state->quiet,
               SVN_ERR_ENTRY_EXISTS,
               SVN_ERR_WC_PATH_NOT_FOUND,
               SVN_NO_ERROR));
    }

Here svn_client_add4 is called, which adds the information
to entries.

Now, let us check how targets is built.

svn_error_t *
svn_cl__args_to_target_array_print_reserved(apr_array_header_t **targets,
                                            apr_getopt_t *os,
                                            apr_array_header_t *known_targets,
                                            apr_pool_t
                                            *pool);

simply calls

svn_opt_args_to_target_array3(targets, os,
                              known_targets, pool);

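that is, it is a wrapper. Let us look at
svn_opt_args_to_target_array3. As one would expect, it has
comments about encoding, so I reproduce it as is (my own
notes are in the starred comment boxes).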


svn_error_t *
svn_opt_args_to_target_array3(apr_array_header_t **targets_p,
                              apr_getopt_t *os,
                              apr_array_header_t *known_targets,
                              apr_pool_t *pool)
{
  int i;
  svn_error_t *err = SVN_NO_ERROR;
  apr_array_header_t *input_targets =
    apr_array_make(pool, DEFAULT_ARRAY_SIZE, sizeof(const char *));
  apr_array_header_t *output_targets =
    apr_array_make(pool, DEFAULT_ARRAY_SIZE, sizeof(const char *));

  /* Step 1:  create a master array of targets that are in UTF-8
     encoding, and come from concatenating the targets left by apr_getopt,
     plus any extra targets (e.g., from the --targets switch.) */

  for (; os->ind < os->argc; os->ind++)
    {
      /* The apr_getopt targets are still in native encoding. */
      const char *raw_target = os->argv[os->ind];
      SVN_ERR(svn_utf_cstring_to_utf8
      /* *****************************************
         A UTF-8 conversion is applied here (though it
         may turn out to be a no-op). Either way the
         result is guaranteed to be UTF-8.
        *****************************************/
              ((const char **) apr_array_push(input_targets),
               raw_target, pool));
    }

  if (known_targets)
    {
      for (i = 0; i < known_targets->nelts; i++)
        {
          /* The --targets array have already been converted to UTF-8,
             because we needed to split up the list with svn_cstring_split. */
          const char *utf8_target = APR_ARRAY_IDX(known_targets,
                                                  i, const char *);
          APR_ARRAY_PUSH(input_targets, const char *) = utf8_target;
          /* ********************************
             known_targets is unpacked here and
             absorbed into input_targets.
             ******************************** */
        }
    }

  /* Step 2:  process each target.  */

  for (i = 0; i < input_targets->nelts; i++)
    {
      const char *utf8_target = APR_ARRAY_IDX(input_targets, i, const char *);
      const char *peg_start = NULL; /* pointer to the peg revision, if any */
      const char *target;      /* after all processing is finished */
      int j;

      /* Remove a peg revision, if any, in the target so that it can
         be properly canonicalized, otherwise the canonicalization
         does not treat a ".@BASE" as a "." with a BASE peg revision,
         and it is not canonicalized to "@BASE".  If any peg revision
         exists, it is appended to the final canonicalized path or
         URL.  Do not use svn_opt_parse_path() because the resulting
         peg revision is a structure that would have to be converted
         back into a string.  Converting from a string date to the
         apr_time_t field in the svn_opt_revision_value_t and back to
         a string would not necessarily preserve the exact bytes of
         the input date, so its easier just to keep it in string
         form. */
      for (j = (strlen(utf8_target) - 1); j >= 0; --j)
        {
          /* If we hit a path separator, stop looking.  This is OK
              only because our revision specifiers can't contain
              '/'. */
          if (utf8_target[j] == '/')
            break;
          if (utf8_target[j] == '@')
            {
              peg_start = utf8_target + j;
              break;
            }
        }
      if (peg_start)
        utf8_target = apr_pstrmemdup(pool,
                                     utf8_target,
                                     peg_start - utf8_target);

      /* URLs and wc-paths get treated differently. */
      if (svn_path_is_url(utf8_target))
        /* *******************************
           This only checks for the form
           (scheme)://(optional_stuff).
           ******************************* */
        {
          /* No need to canonicalize a URL's case or path separators. */

          /* Convert to URI. */
          target = svn_path_uri_from_iri(utf8_target, pool);
          /* ***************************
             This just does the usual URI-encoding,
             for the UTF-8 part.
             *************************** */
          /* Auto-escape some ASCII characters. */
          target = svn_path_uri_autoescape(target, pool);
          /* ***************************
             This also just does the usual URI-encoding,
             for the ASCII part.
             *************************** */

          /* The above doesn't guarantee a valid URI. */
          if (! svn_path_is_uri_safe(target))
            return svn_error_createf(SVN_ERR_BAD_URL, 0,
                                     _("URL '%s' is not properly URI-encoded"),
                                     utf8_target);

          /* Verify that no backpaths are present in the URL. */
          if (svn_path_is_backpath_present(target))
            return svn_error_createf(SVN_ERR_BAD_URL, 0,
                                     _("URL '%s' contains a '..' element"),
                                     utf8_target);

          /* strip any trailing '/' */
          target = svn_path_canonicalize(target, pool);
          /* ***************************
             One target, done.
             *************************** */
        }
      else  /* not a url, so treat as a path */
        {
          const char *apr_target;
          const char *base_name;
          char *truenamed_target; /* APR-encoded */
          apr_status_t apr_err;

          /* canonicalize case, and change all separators to '/'. */
          SVN_ERR(svn_path_cstring_from_utf8(&apr_target, utf8_target,
                                             pool));
          /* *************************************
             Convert to APR's internal representation.
             If that representation is UTF-8, this is
             just a copy.
             ************************************* */
          apr_err = apr_filepath_merge(&truenamed_target, "", apr_target,
                                       APR_FILEPATH_TRUENAME, pool);

          if (!apr_err)
            /* We have a canonicalized APR-encoded target now. */
            apr_target = truenamed_target;
          else if (APR_STATUS_IS_ENOENT(apr_err))
            /* It's okay for the file to not exist, that just means we
               have to accept the case given to the client. We'll use
               the original APR-encoded target. */
            ;
          else
            return svn_error_createf(apr_err, NULL,
                                     _("Error resolving case of '%s'"),
                                     svn_path_local_style(utf8_target,
                                                          pool));

          /* convert back to UTF-8. */
          SVN_ERR(svn_path_cstring_to_utf8(&target, apr_target, pool));
          /* *************************************
             Convert from APR's internal representation
             to UTF-8. If that representation is UTF-8,
             this is just a copy.
             ************************************* */
          target = svn_path_canonicalize(target, pool);
          /* ***************************
             One target, done.
             (There is a skip check right after this,
             though.)
             *************************** */

          /* If the target has the same name as a Subversion
             working copy administrative dir, skip it. */
          base_name = svn_path_basename(target, pool);
          /* FIXME:
             The canonical list of administrative directory names is
             maintained in libsvn_wc/adm_files.c:svn_wc_set_adm_dir().
             That list can't be used here, because that use would
             create a circular dependency between libsvn_wc and
             libsvn_subr.  Make sure changes to the lists are always
             synchronized! */
          if (0 == strcmp(base_name, ".svn")
              || 0 == strcmp(base_name, "_svn"))
            {
              err = svn_error_createf(SVN_ERR_RESERVED_FILENAME_SPECIFIED,
                                      err, _("'%s' ends in a reserved name"),
                                      target);
              continue;
            }
        }

      /* Append the peg revision back to the canonicalized target if
         there was a peg revision. */
      if (peg_start)
        target = apr_pstrcat(pool, target, peg_start, NULL);

      APR_ARRAY_PUSH(output_targets, const char *) = target;
      /* ***************************
         target is registered in output_targets.
         *************************** */
    }


  /* kff todo: need to remove redundancies from targets before
     passing it to the cmd_func. */

  *targets_p = output_targets;
   /* ***************************
      targets is complete.
      *************************** */

  return err;
}


Now we understand how targets is built. In the end, in a
UTF-8 environment, a path that comes in as NFD stays NFD,
and a path that comes in as NFC stays NFC.
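
As a side note on Step 2 of the function above, the backward scan that separates a peg revision from the target can be sketched as follows. This is a Python rendering of the loop for illustration only; `split_peg_revision` is a hypothetical name. It works on str rather than bytes, which is safe here because the bytes for '/' and '@' never occur inside a UTF-8 multibyte sequence.

```python
def split_peg_revision(target: str):
    """Split 'path@PEG' into ('path', '@PEG'); the peg part is '' if absent.

    Scans from the end and stops at the first '/' so that an '@' in an
    earlier path component is never mistaken for a peg revision.
    """
    for j in range(len(target) - 1, -1, -1):
        if target[j] == "/":
            break
        if target[j] == "@":
            return target[:j], target[j:]
    return target, ""


if __name__ == "__main__":
    for t in ["dir/file.txt@BASE", ".@BASE", "user@host/file", "\u304c.txt@3"]:
        print(t, "->", split_peg_revision(t))
```

Like the rest of the function, this step only cuts the string; it does not change the normalization form of the path part.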

Next, add4 receives this and does its work.
The call to add4 was:


              (svn_client_add4(target,
                               opt_state->depth,
                               opt_state->force, opt_state->no_ignore,
                               opt_state->parents, ctx, subpool),


Each element of the targets array built above is passed in
as target.

svn_client_add4 is declared as

svn_error_t *
svn_client_add4(const char *path,
                svn_depth_t depth,
                svn_boolean_t force,
                svn_boolean_t no_ignore,
                svn_boolean_t add_parents,
                svn_client_ctx_t *ctx,
                apr_pool_t *pool);

and its main work is done by

  err = add(path, depth, force, no_ignore, adm_access, ctx, pool);

The path argument is passed straight through as the path argument of add.
The interface of add is

static svn_error_t *
add(const char *path,
    svn_depth_t depth,
    svn_boolean_t force,
    svn_boolean_t no_ignore,
    svn_wc_adm_access_t *adm_access,
    svn_client_ctx_t *ctx,
    apr_pool_t *pool);

and when the target of add is a file (for now I only follow
the file case),

    err = add_file(path, ctx, adm_access, pool);

does the real work. Let us look at add_file.


static svn_error_t *
add_file(const char *path,
         svn_client_ctx_t *ctx,
         svn_wc_adm_access_t *adm_access,
         apr_pool_t *pool)
{
  apr_hash_t* properties;
  apr_hash_index_t *hi;
  const char *mimetype;
  svn_node_kind_t kind;
  svn_boolean_t is_special;

  /* Check to see if this is a special file. */
  SVN_ERR(svn_io_check_special_path(path, &kind, &is_special, pool));

  if (is_special)
    mimetype = NULL;
  else
    /* Get automatic properties */
    /* This may fail on write-only files:
       we open them to estimate file type.
       That's why we postpone the add until after this step. */
    SVN_ERR(svn_client__get_auto_props(&properties, &mimetype, path, ctx,
                                       pool));

  /* Add the file */
  SVN_ERR(svn_wc_add2(path, adm_access, NULL, SVN_INVALID_REVNUM,
                      ctx->cancel_func, ctx->cancel_baton,
                      NULL, NULL, pool));
  /* **************************************
    svn_wc_add2 is called here. This is where the
    real work happens.
    *************************************** */

/* ... rest omitted ... */
}

So path is handed over unchanged, this time to svn_wc_add2.
Let us look at svn_wc_add2.


svn_error_t *
svn_wc_add2(const char *path,
            svn_wc_adm_access_t *parent_access,
            const char *copyfrom_url,
            svn_revnum_t copyfrom_rev,
            svn_cancel_func_t cancel_func,
            void *cancel_baton,
            svn_wc_notify_func2_t notify_func,
            void *notify_baton,
            apr_pool_t *pool)
{
  const char *parent_dir, *base_name;
  const svn_wc_entry_t *orig_entry, *parent_entry;
  svn_wc_entry_t tmp_entry;
  svn_boolean_t is_replace = FALSE;
  svn_node_kind_t kind;
  apr_uint64_t modify_flags = 0;
  svn_wc_adm_access_t *adm_access;

  SVN_ERR(svn_path_check_valid(path, pool));
  /* *******************************
     This only checks that path contains no control
     characters.
     ******************************* */

  /* Make sure something's there. */
  SVN_ERR(svn_io_check_path(path, &kind, pool));
  /* *******************************
     The body of svn_io_check_path is io_check_path.
     It applies svn_path_cstring_from_utf8 to path once
     and then calls
     apr_stat(&finfo, path_apr, flags, pool);
     Depending on the value of flags, apr_stat hits the
     file with either lstat or stat.

     According to man 2 stat, nothing is said about the
     encoding of the file name passed as an argument.

     So even with UTF-8, whether the lookup should be
     made in NFC or in NFD is up to the OS. On OSX this
     path is NFD, so what is given to stat is NFD as
     well. I do not know whether stat works correctly
     with that. I will try an experiment later.
     ******************************* */

/* ... omitted ... */

  if (adm_access)
    SVN_ERR(svn_wc_entry(&orig_entry, path, adm_access, TRUE, pool));
    /* *******************************
       Sets orig_entry to a svn_wc_entry_t object. For
       a file, this calls
       SVN_ERR(svn_wc__adm_retrieve_internal(&dir_access, adm_access, path, pool));

       svn_wc__adm_retrieve_internal searches adm_access
       and constructs dir_access.

       if (associated->set)
         *adm_access = apr_hash_get(associated->set, path, APR_HASH_KEY_STRING);

       Because of this, if the svn_wc_adm_access_t
       structure that associated points to has a set (a
       hash), then ... I do not really understand this
       part. Or rather, I cannot understand it without a
       better grasp of how svn_wc_adm_access_t is used,
       and understanding that feels like understanding
       all of Subversion. I cannot afford that much
       time. What to do?
       ******************************* */
  else
    orig_entry = NULL;

/* ... omitted ... */

  /* Split off the base_name from the parent directory. */
  svn_path_split(path, &parent_dir, &base_name, pool);
  /* *************************************
     base_name is extracted from path here.
     For a file, this is the file name.
     ************************************* */

/* ... omitted ... */

  /* Now, add the entry for this item to the parent_dir's
     entries file, marking it for addition. */
  SVN_ERR(svn_wc__entry_modify(parent_access, base_name, &tmp_entry,
                               modify_flags, TRUE, pool));
 /* *******************************
    The entries file is rewritten here, using
    base_name.
    ******************************* */

/* ... rest omitted ... */
}

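Quite a lot happens before we get to base_name, but no
encoding conversion appears to take place. One thing that
bothers me: there is a place that checks for a clash with
files already registered in entries, and I have not worked
out what is compared with what there. An encoding problem
could conceivably arise in that comparison.

Still, even on OSX, svn add of e.g. "が.txt" itself did
work, so I will assume that does not happen. On OSX the
problems start with svn status.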
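
The question left open in the svn_io_check_path note above (does stat accept both NFC and NFD?) can be probed with a small experiment. The sketch below is an assumption about how one might test it, not something taken from Subversion; `probe_stat_normalization` is a hypothetical helper, and the results depend on the OS and file system, so no expected output is shown.

```python
import os
import tempfile
import unicodedata


def probe_stat_normalization(base_name: str = "\u304c.txt") -> dict:
    """Create a file under its NFC name, then look it up under both forms."""
    nfc = unicodedata.normalize("NFC", base_name)
    nfd = unicodedata.normalize("NFD", base_name)
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, nfc), "w", encoding="utf-8") as f:
            f.write("test\n")
        listed = os.listdir(tmp)[0]  # the name as the file system reports it
        return {
            "stat_nfc_ok": os.path.exists(os.path.join(tmp, nfc)),
            "stat_nfd_ok": os.path.exists(os.path.join(tmp, nfd)),
            "listed_is_nfc": listed == nfc,
            "listed_is_nfd": listed == nfd,
        }


if __name__ == "__main__":
    for key, value in probe_stat_normalization().items():
        print(key, value)
```

If the lookup succeeds under both forms while the listing returns only one of them, that would be consistent with svn add working while a byte-exact comparison in svn status fails.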

Now, the call

  SVN_ERR(svn_wc__entry_modify(parent_access, base_name, &tmp_entry,
                               modify_flags, TRUE, pool));

is what we have to look at next.
The interface of svn_wc__entry_modify is

svn_error_t *
svn_wc__entry_modify(svn_wc_adm_access_t *adm_access,
                     const char *name,
                     svn_wc_entry_t *entry,
                     apr_uint64_t modify_flags,
                     svn_boolean_t do_sync,
                     apr_pool_t *pool);

and it starts with

  apr_hash_t *entries, *entries_nohidden;
  svn_boolean_t entry_was_deleted_p = FALSE;
  /* Load ADM_ACCESS's whole entries file. */
  SVN_ERR(svn_wc_entries_read(&entries, adm_access, TRUE, pool));
  SVN_ERR(svn_wc_entries_read(&entries_nohidden, adm_access, FALSE, pool));

which reads in the entries file.

As for name, it is first defaulted:

  if (name == NULL)
    name = SVN_WC_ENTRY_THIS_DIR;

After that comes:

  /* If the entry wasn't just removed from the entries hash, fold the
     changes into the entry. */
  if (! entry_was_deleted_p)
    {
      fold_entry(entries, name, modify_flags, entry,
                 svn_wc_adm_access_pool(adm_access));
      if (entries != entries_nohidden)
        fold_entry(entries_nohidden, name, modify_flags, entry,
                   svn_wc_adm_access_pool(adm_access));
    }

which uses fold_entry to fold entry into entries under the
key name. As for what fold_entry does:

/* Update an entry NAME in ENTRIES, according to the combination of
   entry data found in ENTRY and masked by MODIFY_FLAGS. If the entry
   already exists, the requested changes will be folded (merged) into
   the entry's existing state.  If the entry doesn't exist, the entry
   will be created with exactly those properties described by the set
   of changes. Also cleanups meaningless fields combinations.

   POOL may be used to allocate memory referenced by ENTRIES.
 */
static void
fold_entry(apr_hash_t *entries,
           const char *name,
           apr_uint64_t modify_flags,
           svn_wc_entry_t *entry,
           apr_pool_t *pool);

So there it is. This function never touches the encoding
of name.

And finally,

    SVN_ERR(svn_wc__entries_write(entries, adm_access, pool));

writes entries, now including the new entry, out to the
entries file via svn_wc__entries_write.



So svn_wc__entries_write needs investigating.

Let us look at svn_wc__entries_write.


svn_error_t *
svn_wc__entries_write(apr_hash_t *entries,
                      svn_wc_adm_access_t *adm_access,
                      apr_pool_t *pool)
{
  svn_error_t *err = SVN_NO_ERROR;
  svn_stringbuf_t *bigstr = NULL;
  apr_file_t *outfile = NULL;
  apr_hash_index_t *hi;
  svn_wc_entry_t *this_dir;

  SVN_ERR(svn_wc__adm_write_check(adm_access));

  /* Get a copy of the "this dir" entry for comparison purposes. */
  this_dir = apr_hash_get(entries, SVN_WC_ENTRY_THIS_DIR,
                          APR_HASH_KEY_STRING);

  /* If there is no "this dir" entry, something is wrong. */
  if (! this_dir)
    return svn_error_createf(SVN_ERR_ENTRY_NOT_FOUND, NULL,
                             _("No default entry in directory '%s'"),
                             svn_path_local_style
                             (svn_wc_adm_access_path(adm_access), pool));

  /* Open entries file for writing.  It's important we don't use APR_EXCL
   * here.  Consider what happens if a log file is interrupted, it may
   * leave a .svn/tmp/entries file behind.  Then when cleanup reruns the
   * log file, and it attempts to modify the entries file, APR_EXCL would
   * cause an error that prevents cleanup running.  We don't use log file
   * tags such as SVN_WC__LOG_MV to move entries files so any existing file
   * is not "valuable".
   */
  SVN_ERR(svn_wc__open_adm_file(&outfile,
                                svn_wc_adm_access_path(adm_access),
                                SVN_WC__ADM_ENTRIES,
                                (APR_WRITE | APR_CREATE),
                                pool));

  if (svn_wc__adm_wc_format(adm_access) > SVN_WC__XML_ENTRIES_VERSION)
    {
      apr_pool_t *subpool = svn_pool_create(pool);
      bigstr = svn_stringbuf_createf(pool, "%d\n",
                                     svn_wc__adm_wc_format(adm_access));  //#### The string buffer holding the contents of the entries file is created here.
      /* Write out "this dir" */
      write_entry(bigstr, this_dir, SVN_WC_ENTRY_THIS_DIR, this_dir, pool); //#### Write the info for the starting directory.

      for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi)) //#### Process the entries (hash) passed as the argument one by one.
        {
          const void *key;
          void *val;
          svn_wc_entry_t *this_entry;

          svn_pool_clear(subpool);

          /* Get the entry and make sure its attributes are up-to-date. */
          apr_hash_this(hi, &key, NULL, &val); //#### apr_hash_this fills in key. apr_hash_this needs looking into.
          this_entry = val;

          /* Don't rewrite the "this dir" entry! */
          if (! strcmp(key, SVN_WC_ENTRY_THIS_DIR ))
            continue;

          /* Append the entry to BIGSTR */
          write_entry(bigstr, this_entry, key, this_dir, subpool); //#### this_entry is written out to the string here, with key as an argument.
        }

      svn_pool_destroy(subpool);
    }
  else
    /* This is needed during cleanup of a not yet upgraded WC. */
    write_entries_xml(&bigstr, entries, this_dir, pool);

  SVN_ERR_W(svn_io_file_write_full(outfile, bigstr->data,
                                   bigstr->len, NULL, pool),  //#### bigstr is written to outfile here.
            apr_psprintf(pool,
                         _("Error writing to '%s'"),
                         svn_path_local_style
                         (svn_wc_adm_access_path(adm_access), pool)));

  err = svn_wc__close_adm_file(outfile,
                               svn_wc_adm_access_path(adm_access),
                               SVN_WC__ADM_ENTRIES, 1, pool);

  svn_wc__adm_access_set_entries(adm_access, TRUE, entries);
  svn_wc__adm_access_set_entries(adm_access, FALSE, NULL);

  return err;
}


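The gist of this function: it pulls entries out of the hash
one by one with apr_hash_this, writes each into a buffer
with write_entry, and writes the buffer to the file with
svn_io_file_write_full.

According to its documentation, apr_hash_this does this:
"Get the current entry's details from the iteration state."

So

apr_hash_this(hi, &key, NULL, &val);

simply hands the key back as is.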


Next, let us examine write_entry.


/* Append a single entry ENTRY to the string OUTPUT, using the
   entry for "this dir" THIS_DIR for comparison/optimization.
   Allocations are done in POOL.  */
static void
write_entry(svn_stringbuf_t *buf,
            svn_wc_entry_t *entry,
            const char *name, //#### key is passed in here.
            svn_wc_entry_t *this_dir,
            apr_pool_t *pool)
{
  const char *valuestr;
  svn_revnum_t valuerev;
  svn_boolean_t is_this_dir = strcmp(name, SVN_WC_ENTRY_THIS_DIR) == 0;
  svn_boolean_t is_subdir = ! is_this_dir && (entry->kind == svn_node_dir);

  assert(name);

  /* Name. */
  write_str(buf, name, pool); //#### The key received is passed straight to write_str.

  /* Kind. */
  switch (entry->kind)
    {
    case svn_node_dir:
      write_val(buf, SVN_WC__ENTRIES_ATTR_DIR_STR,
                 sizeof(SVN_WC__ENTRIES_ATTR_DIR_STR) - 1);
      break;

    case svn_node_none:
      write_val(buf, NULL, 0);
      break;

    case svn_node_file:
    case svn_node_unknown:
    default:
      write_val(buf, SVN_WC__ENTRIES_ATTR_FILE_STR,
                 sizeof(SVN_WC__ENTRIES_ATTR_FILE_STR) - 1);
      break;
    }

  /* Revision. */
  if (is_this_dir || (! is_subdir && entry->revision != this_dir->revision))
    valuerev = entry->revision;
  else
    valuerev = SVN_INVALID_REVNUM;
  write_revnum(buf, valuerev, pool);

  /* URL. */
  if (is_this_dir ||
      (! is_subdir && strcmp(svn_path_url_add_component(this_dir->url, name,
                                                        pool),
                             entry->url) != 0))
    valuestr = entry->url;
  else
    valuestr = NULL;
  write_str(buf, valuestr, pool); //#### Likewise for the URL: entry->url goes straight to write_str.

  /* Repository root. */
  if (! is_subdir
      && (is_this_dir
          || (this_dir->repos == NULL
              || (entry->repos
                  && strcmp(this_dir->repos, entry->repos) != 0))))
    valuestr = entry->repos;
  else
    valuestr = NULL;
  write_str(buf, valuestr, pool);

  //... omitted ...

  /* Remove redundant separators at the end of the entry. */
  while (buf->len > 1 && buf->data[buf->len - 2] == '\n')
    buf->len--;

  svn_stringbuf_appendbytes(buf, "\f\n", 2);
}


name and entry->url were both handled by write_str,
so let us examine write_str.


/* If STR is non-null, append STR to BUF, terminating it with a
   newline, escaping bytes that needs escaping, using POOL for
   temporary allocations.  Else if STR is null, just append the
   terminating newline. */
static void
write_str(svn_stringbuf_t *buf, const char *str, apr_pool_t *pool)
{
  const char *start = str;
  if (str)
    {
      while (*str)
        {
          /* Escape control characters and | and \. */
          if (svn_ctype_iscntrl(*str) || *str == '\\')
            {
              svn_stringbuf_appendbytes(buf, start, str - start);
              svn_stringbuf_appendcstr(buf,
                                       apr_psprintf(pool, "\\x%02x", *str));
              start = str + 1;
            }
          ++str;
        }
      svn_stringbuf_appendbytes(buf, start, str - start); //#### Everything except the escaped control characters is simply appended here with appendbytes.
    }
  svn_stringbuf_appendbytes(buf, "\n", 1);
}


Apart from the escaping, write_str merely appends bytes with svn_stringbuf_appendbytes.
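
To check that reading, here is a Python model of write_str. It is a sketch under one stated assumption: that svn_ctype_iscntrl is true only for ASCII control characters (bytes below 0x20 and 0x7f), so the bytes of UTF-8 multibyte characters are never escaped. The Python function merely mirrors the C function's name and is not Subversion code.

```python
def write_str(buf: bytearray, data) -> None:
    """Append `data` (bytes or None) to buf with \\xNN escaping, then a newline."""
    if data is not None:
        for byte in data:
            # Assumption: only ASCII control characters count as "control".
            is_control = byte < 0x20 or byte == 0x7F
            if is_control or byte == 0x5C:  # 0x5C is the backslash
                buf += b"\\x%02x" % byte
            else:
                buf.append(byte)
    buf += b"\n"


if __name__ == "__main__":
    out = bytearray()
    write_str(out, "\u304b\u3099.txt".encode("utf-8"))  # an NFD name
    print(bytes(out).hex(" "))  # the name bytes come out untouched, plus 0a
```

Under that assumption an NFD name goes into the buffer as NFD and an NFC name as NFC; only control characters and backslashes are rewritten.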

This branch ends in a leaf here. The next branch of the processing in svn_wc__entries_write is
svn_io_file_write_full.

Let us look at svn_io_file_write_full.


svn_error_t *
svn_io_file_write_full(apr_file_t *file, const void *buf,
                       apr_size_t nbytes, apr_size_t *bytes_written,
                       apr_pool_t *pool)
{
  apr_status_t rv = apr_file_write_full(file, buf, nbytes, bytes_written);

#ifdef WIN32
#define MAXBUFSIZE 30*1024
  if (rv == APR_FROM_OS_ERROR(ERROR_NOT_ENOUGH_MEMORY)
      && nbytes > MAXBUFSIZE)
    {
      apr_size_t bw = 0;
      *bytes_written = 0;

      do {
           rv = apr_file_write_full(file, buf, 
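                                 nbytes > MAXBUFSIZE ? MAXBUFSIZE : nbytes, &bw); //#### On WIN32 the write is retried here in chunks; otherwise the apr_file_write_full call at the top does the actual writing.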
        *bytes_written += bw;
        buf = (char *)buf + bw;
        nbytes -= bw;
      } while (rv == APR_SUCCESS && nbytes > 0);
    }
#undef MAXBUFSIZE
#endif

  return do_io_file_wrapper_cleanup
    (file, rv,
     N_("Can't write to file '%s'"),
     N_("Can't write to stream"),
     pool);
}


This turned out to be a wrapper around apr_file_write_full.
apr_file_write_full is documented as

------
apr_status_t apr_file_write_full( apr_file_t * thefile,
                                  const void * buf,
                                  apr_size_t nbytes,
                                  apr_size_t * bytes_written
     ) 
     
     Write data to the specified file, ensuring that all of the data is written before returning.

     Parameters:
     thefile The file descriptor to write to.
     buf The buffer which contains the data.
     nbytes The number of bytes to write.
     bytes_written If non-NULL, this will contain the number of bytes written.
------

so presumably it just writes out the byte sequence in buf.


So, what was svn_wc__entries_write after all?

* svn_wc__entries_write writes the already-built entries (a
  hash) out to the entries file. Along the way, the name and
  other fields stored in entries (the hash) are left
  unprocessed.


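That was another long trip, but here is what it shows. When svn add
adds a file to the working directory, the file name written to
.svn/entries is whatever arrived on the command line. In

$ svn add が.txt

if "が.txt" is NFD it is stored as NFD, and if it is NFC it is stored
as NFC.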

Looking at the entries file after actually running svn add が.txt on
OSX:

$ od -t x1 -t c entries
... (earlier part omitted) ...
0000360    36  34  31  61  63  61  64  66  33  0a  0c  0a  e3  81  8c  2e
           6   4   1   a   c   a   d   f   3  \n  \f  \n  が  **  **   .
0000400    74  78  74  0a  66  69  6c  65  0a  0a  0a  0a  61  64  64  0a
           t   x   t  \n   f   i   l   e  \n  \n  \n  \n   a   d   d  \n
0000420    0c  0a                                                        
          \f  \n                                                        
0000422
$ 

As shown, it is NFC. Does that mean command-line arguments are
handled as NFC on OSX?

Let's check with GDB.

int main(int argc, char *argv [])
{
     return (0);
}

Compiling this, running it with "が.txt" as the argument, and looking
at it in GDB gives

(gdb) x/3cx argv[1]
0xbffff3c5: 0xe3 0x81 0x8c

It is indeed NFC. This could be an artifact of GDB itself, but I'll
let that slide. (I'm exhausted...)

This may be one of the two causes of the NFD/NFC problem on OSX.
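To rule out GDB as the culprit, the bytes can also be dumped by the program itself. This is a minimal sketch; hex_bytes is a hypothetical helper of mine, not something from Subversion or APR.

```c
#include <stdio.h>

/* Write the bytes of S into OUT as space-separated lowercase hex.
   Returns 0 on success, -1 if OUT (capacity OUTSIZE) is too small. */
static int
hex_bytes(char *out, size_t outsize, const char *s)
{
  size_t n = 0;

  if (outsize == 0)
    return -1;
  out[0] = '\0';
  for (; *s; ++s)
    {
      /* The first byte has no leading space; later ones do. */
      const char *fmt = (n == 0) ? "%02x" : " %02x";
      size_t need = (n == 0) ? 3 : 4;   /* characters written plus the NUL */

      if (n + need > outsize)
        return -1;
      n += (size_t)snprintf(out + n, outsize - n, fmt,
                            (unsigned int)(unsigned char)*s);
    }
  return 0;
}
```

Called from main as hex_bytes(buf, sizeof buf, argv[1]) and printed with puts(buf), an NFC argument "が.txt" would show up as e3 81 8c 2e 74 78 74, and an NFD one would start with e3 81 8b e3 82 99.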


When readdir reads the contents of a directory, file names come back
in NFD. When the same file name is given to svn add, the argument
arrives in NFC.

Things are becoming a lot clearer.

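- Subversion's internal encoding is UTF-8. However, it never tracks whether a string is NFD or NFC; it handles each as-is.
- When Subversion does convert encodings, it converts paths received from outside to UTF-8, on the assumption that APR's internal encoding equals the encoding of the environment.
- The conversion engine is iconv.
- On OSX, arguments seem to be passed from the command line to applications in NFC. As confirmed earlier, readdir returns file names in NFD. I suspect this is where the mismatch comes from.
- I am also curious whether stat(2) is NFD or NFC.
- The patch that makes subversion work properly on OSX changes svn_path_cstring_to_utf8 so that it always goes through the Core Foundation function that converts to NFC.
- If everything is forced to NFC there, strings inside Subversion are presumably guaranteed to always be NFC.
- In that case the files on the file system, whether the working files or the ones under text-base/, are NFD, while seen through subversion's eyes everything, including the contents of entries, looks uniformly NFC.
- Another situation with svn without the CF patch: what happens if, say, an NFC "が.txt" is checked in on a Linux machine and then checked out on OSX?
- Presumably the OSX file-creation API automatically converts a file name supplied in NFC to NFD.
- And the contents of entries are presumably NFC.
- That mismatch is presumably what produces an unusable working directory.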
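The mismatch itself is easy to reproduce in isolation. The sketch below assumes that names are matched byte for byte, which is my guess at what happens when a name from readdir is looked up among the names in entries; find_name is hypothetical and is not Subversion code.

```c
#include <string.h>

/* "が.txt" in the two normalization forms seen in this investigation. */
static const char NAME_NFC[] = "\xe3\x81\x8c.txt";             /* U+304C */
static const char NAME_NFD[] = "\xe3\x81\x8b\xe3\x82\x99.txt"; /* U+304B U+3099 */

/* Hypothetical byte-wise lookup of NAME among COUNT known names.
   Returns the index of the match, or -1 when nothing matches. */
static int
find_name(const char *const *known, int count, const char *name)
{
  int i;

  for (i = 0; i < count; ++i)
    if (strcmp(known[i], name) == 0)
      return i;
  return -1;
}
```

With entries holding only NAME_NFC, looking up NAME_NFD returns -1. Under that assumption the file on disk looks unversioned and the versioned name looks missing, which would explain the same name appearing once with ? and once with ! in svn status.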

Now, how far should I dig from here... Maybe this is enough...
I'm completely worn out, though... Will I make it to Shibuya.lisp on Saturday...

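Feeling unwell

No good. I have pushed too hard on reading the Subversion source and on work, and I feel terrible...
With the Subversion source, the further I follow it the more new concepts I have to understand, and I still cannot see the exit...

If a nap does not fix it, I should go to the hospital.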

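Monday, February 23, 2009

Checking how UTF-8 is handled in C


/*

A quick investigation.

First, when Emacs writes the hiragana "が" to a file, is there any
difference between Linux and OSX? Compare the two files with od.

OSX first.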

----
$ od -t x1 -t c HIRAGANA-LETTER-GA.txt 
0000000    61  62  63  64  0a  e3  81  8c  0a                            
           a   b   c   d  \n  が  **  **  \n                            
0000011
----

Next, Linux.

----
$ od -t x1 -t c HIRAGANA-LETTER-GA.txt 
0000000 61 62 63 64 0a e3 81 8c 0a
          a   b   c   d  \n 343 201 214  \n
0000011
----

No difference.

Now, let's interpret e3 81 8c as UTF-8.

In bits:

----
CL-USER(13): (dolist  (n (list #xe3 #x81 #x8c))
              (format t "~b " n))
11100011 10000001 10001100 
NIL
----

As UTF-8, this bit pattern has the form

1110yyyy 10yxxxxx 10xxxxxx

and represents a code point of up to 16 bits.

Extracting the bits that belong to the code point:

0011 000001 001100

Repacking them as right-aligned octets:

00110000 01001100

----
CL-USER(14): (dolist (n (list #b00110000 #b01001100))
               (format t "~x " n))
30 4c 
NIL
----

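So this is the NFC "が" (U+304C).

In short, the OS takes no interest in file contents, so what we are
seeing is Emacs and od both handling the character as UTF-8 in NFC,
consistently.

Next I want to check how gcc handles characters up through the
preprocessor.

In my experiment in the previous chapter, gcc did not fully conform
to C99. When multibyte characters are used in identifiers in a source
file, they have to be converted to universal character names
(\uxxxx) before preprocessing, but gcc does not seem to do that.
Let's confirm.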

*/
----
#include <stdio.h>
int main(void)
{
     int が;
     for (が=1;が<10;が++) 
          printf("%d ", が);

     return (0);
}
----
/*

Try it with this source. Here is its od.

----
$ od -t x1 -t c i18n-character-name-2.c
0000000    23  69  6e  63  6c  75  64  65  20  3c  73  74  64  69  6f  2e
           #   i   n   c   l   u   d   e       <   s   t   d   i   o   .
0000020    68  3e  0a  69  6e  74  20  6d  61  69  6e  28  76  6f  69  64
           h   >  \n   i   n   t       m   a   i   n   (   v   o   i   d
0000040    29  0a  7b  0a  20  20  20  20  20  69  6e  74  20  e3  81  8c
           )  \n   {  \n                       i   n   t      が  **  **
0000060    3b  0a  20  20  20  20  20  66  6f  72  20  28  e3  81  8c  3d
           ;  \n                       f   o   r       (  が  **  **   =
0000100    31  3b  e3  81  8c  3c  31  30  3b  e3  81  8c  2b  2b  29  20
           1   ;  が  **  **   <   1   0   ;  が  **  **   +   +   )    
0000120    0a  20  20  20  20  20  20  20  20  20  20  70  72  69  6e  74
          \n                                           p   r   i   n   t
0000140    66  28  22  25  64  20  22  2c  20  e3  81  8c  29  3b  0a  0a
           f   (   "   %   d       "   ,      が  **  **   )   ;  \n  \n
0000160    20  20  20  20  20  72  65  74  75  72  6e  20  28  30  29  3b
                               r   e   t   u   r   n       (   0   )   ;
0000200    0a  7d  0a                                                    
          \n   }  \n                                                    
0000203
$ 
----

Process it with gcc -E.

The output is long because of the include, but in any case no conversion seems to happen.

*/
----
# 1 "i18n-character-name-2.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "i18n-character-name-2.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 64 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/_types.h" 1 3 4
# 27 "/usr/include/_types.h" 3 4
# 1 "/usr/include/sys/_types.h" 1 3 4
# 32 "/usr/include/sys/_types.h" 3 4
# 1 "/usr/include/sys/cdefs.h" 1 3 4
# 33 "/usr/include/sys/_types.h" 2 3 4
# 1 "/usr/include/machine/_types.h" 1 3 4
# 34 "/usr/include/machine/_types.h" 3 4
# 1 "/usr/include/i386/_types.h" 1 3 4
# 37 "/usr/include/i386/_types.h" 3 4
typedef signed char __int8_t;


... (snip) ...


static __inline int __sputc(int _c, FILE *_p) {
 if (--_p->_w >= 0 || (_p->_w >= _p->_lbfsize && (char)_c != '\n'))
  return (*_p->_p++ = _c);
 else
  return (__swbuf(_c, _p));
}
# 2 "i18n-character-name-2.c" 2
int main(void)
{
     int が;
     for (が=1;が<10;が++)
          printf("%d ", が);

     return (0);
}
---
/*

Then does gcc do the right thing inside a string constant?
To check, modify the same source.

*/
----
#include <stdio.h>
int main(void)
{
     int i;
     for (i=1;i<10;i++) 
          printf("%d が", i);

     return (0);
}
----
/*

Oops, no good. As shown below, nothing is converted.

*/
----
... (earlier part omitted) ...

static __inline int __sputc(int _c, FILE *_p) {
 if (--_p->_w >= 0 || (_p->_w >= _p->_lbfsize && (char)_c != '\n'))
  return (*_p->_p++ = _c);
 else
  return (__swbuf(_c, _p));
}
# 2 "i18n-character-name-3.c" 2
int main(void)
{
     int i;
     for (i=1;i<10;i++)
          printf("%d が", i);

     return (0);
}
----
/*

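So gcc's lexical processing does not appear to be implemented exactly
as C99 requires. It looks as if the source character set is simply
UTF-8 itself.

Next I want to check how file names are handled. Unlike file
contents, this is where the OS gets involved. Create a file named
"が.txt" and check the behavior with a program that opens it. The
API is the C standard library. My prediction is that this works with
no mismatch: the standard library used with gcc should handle the
file name argument of fopen in whatever way OSX needs when running
on OSX.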

----が.txt---
abcd
が
-------------

*/
----ga-open.c----
#include <stdio.h>

int main(void)
{
     FILE *fp;
     fp = fopen("が.txt", "r");

     if (fp == NULL)
          printf("failed.\n");
     else {
          printf("Succeed.\n");
          fclose(fp);
     }
     return (0);
}
--------
/*

----
$ gcc -std=c99 ga-open.c
$ ./a.out
Succeed.
$ 
----

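It worked.

What about obtaining file names? Standard C has no way to get the
names of the files in a directory, so from here on it depends on the
implementation. If the implementation/environment supports POSIX or
SUS, though, using those APIs gives a reasonable degree of
portability.

This is the territory of Advanced Programming in the UNIX
Environment (apue.2e), so I'll take a sample from apue.2e.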

----ls1.c----
*/
#include "apue.h"
#include <dirent.h>

int
main(int argc, char *argv[])
{
 DIR    *dp;
 struct dirent *dirp;

 if (argc != 2)
  err_quit("usage: ls directory_name");

 if ((dp = opendir(argv[1])) == NULL)
  err_sys("can't open %s", argv[1]);
 while ((dirp = readdir(dp)) != NULL)
  printf("%s\n", dirp->d_name);

 closedir(dp);
 exit(0);
}
/*
--------

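Compiling the source from the apue.2e site on Leopard takes some
adjustment. When apue.2e was published, OSX was at 10.3.x. As it
moved on to 10.4 and 10.5, OSX improved its POSIX support, and that
seems to have caused some confusion around the headers.

After a few adjustments I ran make and then compiled ls1.c with the
following command.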

gcc -ansi -I/Users/aka/scratch/c-ref/apue.2e/include -Wall -g -DMACOS  -L../lib ls1.c ../lib/libapue.a -o ls1

Then, with that ls1, get from the OS the contents of the ga-test
directory, which contains が.txt.



(gdb) n
..
(gdb) p dirp->d_name
$10 = "..\000\000?\r\000\024\000\b\nが.txt\000*", '\0' <repeats 231 times>
(gdb) n
(gdb) p dirp->d_name
$11 = "が.txt\000*", '\0' <repeats 243 times>
(gdb) p/x dirp->d_name
$12 = {0xe3, 0x81, 0x8b, 0xe3, 0x82, 0x99, 0x2e, 0x74, 0x78, 0x74, 0x0, 0x2a, 0x0 <repeats 244 times>}
(gdb) 

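The bytes sitting in memory here,

0xe3, 0x81, 0x8b, 0xe3, 0x82, 0x99

are the "が". Reading them as UTF-8 in the same way as before gives

U+304b U+3099 
[HIRAGANA LETTER KA] [COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK]

Here NFD shows up!

After that the program just prints with printf and %s, so whether
this UTF-8 (NFD) is displayed correctly depends on whatever does the
displaying. With the chain ls1->gdb->emacs->screen->Terminal used
above it is in fact not displayed correctly: a strange character
appears between the "が" and the "." of が.txt. (That glitch vanished
from the gdb output above when I copied and pasted it.)
Directly in Terminal there is no problem, and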

------------
$ ./ls1 ga-test
.
..
が.txt
$ 
------------

is the result.

Having come this far, I need to check where apue.2e's readdir comes
from.

According to apue.2e, it is POSIX.1.
So is it in man? man readdir: there it is.

-----------
DIRECTORY(3)             BSD Library Functions Manual             DIRECTORY(3)

NAME
     closedir, dirfd, opendir, readdir, readdir_r, rewinddir, seekdir, telldir --
     directory operations

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <dirent.h>

...
-----------

So is dirent in man too? It is.

-----------
DIR(5)                      BSD File Formats Manual                     DIR(5)

NAME
     dir, dirent -- directory file format

SYNOPSIS
     #include <sys/types.h>
     #include <sys/dir.h>

DESCRIPTION
     Directories provide a convenient hierarchical method of grouping files while
     obscuring the underlying details of the storage medium.  A directory file is
     differentiated from a plain file by a flag in its inode(5) entry.  It consists
     of records (directory entries) each of which contains information about a file
     ...
-----------

and it gives

-----------
*/
          struct dirent {
               ino_t     d_ino;                /* file number of entry */
               u_int64_t d_seekoff;            /* length of this record */
               u_int16_t d_reclen;             /* length of this record */
               u_int16_t d_namlen;             /* length of string in d_name */
               u_int8_t  d_type;               /* file type, see below */
               char      d_name[MAXPATHLEN];   /* name must be no longer than this */
          };
/*
-----------

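and that is all; nothing says what the contents of d_name should be.
Well, file systems differ from environment to environment.

So when the OS and an application exchange file names through this
POSIX.1 readdir, the application has to find out somewhere which
encoding the OS has chosen and use the names accordingly.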
 */

With this, I suppose I am ready to dig into how Subversion handles file names.
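The hand decoding done in this entry can be written down as a small C function. This is a sketch for the three-byte UTF-8 form only, which is all that "が" needs; decode_utf8_3 is a hypothetical name, and overlong forms and surrogates are not rejected.

```c
/* Decode one UTF-8 sequence of the form 1110xxxx 10xxxxxx 10xxxxxx at S.
   Returns the code point, or -1 if S does not start with such a sequence.
   The checks run left to right, so a NUL terminator inside the sequence
   stops the function before it reads past the end of the string. */
static long
decode_utf8_3(const unsigned char *s)
{
  if ((s[0] & 0xf0) != 0xe0
      || (s[1] & 0xc0) != 0x80
      || (s[2] & 0xc0) != 0x80)
    return -1;

  return ((long)(s[0] & 0x0f) << 12)
         | ((long)(s[1] & 0x3f) << 6)
         | (long)(s[2] & 0x3f);
}
```

Applied to e3 81 8c it yields 0x304c, the single NFC code point. Applied to e3 81 8b and then to e3 82 99 it yields 0x304b and 0x3099, the base letter and the combining mark that readdir returned.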

こつこつ。

Sunday, February 22, 2009

【C:ARM5】2 Lexical Elements

Hmm. Just understanding the lexical material took quite a while...
Don't give up, don't give up.


/*

 2 字句要素 (Lexical Elements)

 - "This chapter describes the lexical structure
   of the C language--that is, the characters that
   may appear in a C source file and how they are
   collected into lexical units, or tokens."


 2.1 文字集合 (Character Set)

 - ソース文字集合(source character set)というとき、
   それはいわゆる文字集合そのままであり、エスケープ
   表記等は含んでいないことに注意。

 - ここでは文字コードは指定していない。文字集合の指
   定である。

 - 基本的にはISO/IEC 10646のBasic Latinブロックだよ。

 - 国によってはその国の文字集合にBasic Latinの中の
   文字が含まれていないこともある。そこでCには
   ISO/IEC 646-1083のInvariant Code Setだけで書ける
   ように、漏れた記号をInvariant code setの文字で表
   現する仕組みがあるよ。

 2.1.1 実行時文字集合 (Execution Character Set)

 - クロスコンパイルを考えよ。ソースをコンパイルする
   ときの環境の文字集合とプログラムが実行される環境
   の文字集合は必ずしも同じではない。

 - 実行時文字集合(execution character set)はバック
   スラッシュによるエスケープ表記も含む。

 - ソースをコンパイルするときの環境と実行するときの
   環境が同じなら、ソース文字集合と実行時文字集合は
   同じになる。

 2.1.2 空白類文字と行の終わり (Whitespace and Line Termination)

 - 空白類文字は隣り合う字句を分離する働きを持つ。

 - 論理ソース行の例。gccでコンパイルできた。
*/

int main(void)
{
     if (1==2) 1; el\
se 2;
     return (0);
}

/*

 2.1.3 文字コード (Character Encoding)

 - (実行時)文字集合に含まれる各文字には何らかの慣用
   的な数値表現が割り当てられている。これを文字コー
   ドと呼ぶ。

 - Cでは文字コードについていくつかの制約は課すが、
   細かな指定はしない。

 2.1.4 3文字表記 (Trigraphs)

 - ISO 646-1083 Invariant Code Set に含まれる文字だ
   けでCプログラムを書くための仕組み。

 - 3文字表記(trigraphs)の処理は、字句解析の前に実施
   される。かなり先頭の作業。

 - 3文字表記の例。gccでは-trigraphsオプションが必要。
*/

int main(void)
??<
     return (0);
??>

/*

 2.1.5 多バイト文字とワイド文字 (Multibyte and Wide Characters)

  - 非英語アルファベットに対応するためにワイド文字
    とワイド文字列がある。

  - ワイド文字とは、拡張文字集合(extended
    character set)の要素のバイナリ表現である。

  - 規格Cではワイド文字のためのwint_t型とwchar_t型
    がある。

  - ワイド文字は16ビットを占めるのが普通である。

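  - Apart from the null wide character, Standard C does not
    specify the extended character set.

  - External media such as files, and C source programs, are
    for the most part made up of byte-sized characters
    (byte-oriented).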

  - そのため多バイトコードという仕組みが使われる。
    多バイトコードとは、バイトの大きさの文字の並び
    とワイド文字の並びとの間でロケール固有の写像を
    行う方法。

  - 1つのワイド文字を、ソース文字集合または実行時文
    字集合に含まれる文字(の並び)によって表現したの
    が多バイト文字である。

  - 多バイト文字は通常のC文字列である。

  - 多バイト文字の形式や、多バイト文字とワイド文字
    の写像は処理系定義である。

  - 多バイト文字の方式には、状態依存型と状態独立型
    がある。

  - 規格Cは多バイト文字についていくつかの制約を課し
    ている。

  - 多バイト文字は次の場所で使ってよい。
    - 注釈
    - 識別子
    - ヘッダ名
    - 文字列定数
    - 文字定数

  - 「ソースの物理的表現の中の多バイト文字は、字句
    解析、プリプロセス処理、それどころか継続行の接
    合よりも前に認識され、ソース文字集合に翻訳され
    なければならない。」 う、これわからない。。。

 2.2 注釈 (Comments)

  - お、Cでも//って使えるんだ。試すと、、、使えた。
*/

// Program to compute the squares of
// the first 10 integers
#include <stdio.h>
void Squares( /* no arguments */)
{
     int i;
     /*
       Loop from 1 to 10,
       printing out the squares
     */
     for (i=1; i<=10; i++)
          printf("%d //squared// is %d\n",i,i*i);
}

int main(void)
{
     Squares();
     return (0);
}

/*

 2.3 字句 (Tokens)

 - やっとTokenだ。。。

 - 「Cプログラムを構成する文字は、以下に述べる規則
   に従って集められて字句となる」うーん。名文だ。

 - 字句は5種類。
   - 演算子
   - 分離子
   - 識別子
   - キーワード
   - 定数

 - 「コンパイラは左から右に文字を集めるとき、できる
   だけ長い字句を作ろうとする。」

 2.4 演算子と分離子 (Operators and Separators)

 - 単純演算子
   ! % ^ など
 - 複合代入演算子
   += -= など
 - その他の複合演算子
   -> ++ など
 - 分離子
   ( ) [ ] など
 - 代替綴り
   <% %> <: :> など

 2.5 識別子 (Identifiers)

 - 識別子(名前)はラテン大文字と小文字、数字、アンダ
   スコア、国際文字名および処理系定義の多バイト文字
   の並びである。ということはgccでUTF-8の日本語も使
   えるかも？ 試す。
*/

#include <stdio.h>
int main(void)
{
     int あ;
     for (あ=1;あ<10;あ++) 
          printf("%d ", あ);

     return (0);
}

/*

 - This does not work. It does not work with -std=c99 either.

 - Go with \u instead.

*/

#include <stdio.h>
int main(void)
{
     int \u3042; // あ
     for (\u3042=1;\u3042<10;\u3042++) 
          printf("%d ", \u3042);

     return (0);
}

/*

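 - Oh. This one is OK with -std=c99. Presumably gcc just does
   not do the conversion from multibyte characters to the
   source character set for us.

 - Watch out for limits on identifier length. Especially when
   identifiers are exposed to external programs such as the
   linker, those programs have limits of their own.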

 2.6 キーワード (Keywords)

 - 識別子の一部は規格にてキーワードとして定められて
   おり、普通の識別子として使ってはいけない。

 - キーワードの一覧

   auto _Bool break case char _Complex const
   continue default do double else enum extern
   float for goto if _Imaginary inline int long
   register restrict return short signed sizeof
   static struct switch typedef union unsigned
   void volatile while

 2.6.1 既定義識別子 (Predefined Identifiers)

 - __func__は既定義識別子であり、プログラマが定義し
   てはならない。


 2.7 定数 (Constants)

 - 別の言語ではリテラルとも言う。Cでは伝統的に
   Constantsと呼ぶ。

 - この節、型の概念とクロスリファレンスになっている
   なぁ。まあしょうがないんだろうな。

 2.7.1 整数定数 (Integer Constants)

 - まあ、ごちゃごちゃと。割愛。

 2.7.2 浮動小数点定数 (Floating-Point Constants)

 - これも割愛。

 2.7.3 文字定数 (Character Constants)

 - 文字定数は、一つ以上の文字をアポストロフィで囲んで書く。

 - 文字定数の前にLをつけるとワイド文字定数になる。

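 - The form is as follows.

   character-constant:
      ' c-char-sequence '
     L' c-char-sequence '

   c-char-sequence:
      c-char
      c-char-sequence c-char

   c-char:
      any character in the source character set except ', \ and newline
      escape-character
      universal-character-name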

 - 先頭にLを付けない文字定数はint型である。その値は、
   実行時文字集合におけるその文字のコードである。

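 - A character constant beginning with L has type wchar_t. A
   wide character constant usually consists of several
   characters and escape sequences that together represent
   one multibyte character. The mapping from a multibyte
   character to the corresponding wide character is
   implementation-defined. The mbtowc function performs that
   conversion at run time.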

 - 複数文字定数の意味は処理系定義である。

 2.7.4 文字列定数 (String Constants)

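 - The form is as follows.

   string-constant:
      " s-char-sequence "
     L" s-char-sequence "

   s-char-sequence:
      s-char
      s-char-sequence s-char

   s-char:
      any character in the source character set except ", \ and newline
      escape-character
      universal-character-name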

 - n文字を含む文字列定数の型はchar [n+1]である。最
   後はナル文字'\0'である。

 - n文字を含むワイド文字列定数の型はwchar_t [n+1]で
   ある。最後はナルワイド文字である。

 - 文字列定数の文字を保持しているメモリの内容を書き
   換えようとしてはいけない。

 2.7.5 エスケープ表記 (Escape Characters)

 - 割愛。

 2.7.6 文字エスケープコード (Character Escape Codes)

 - 割愛。

 2.7.7 数値エスケープコード (Numeric Escape Codes)

 - 割愛。

 2.8 C++との互換性 (C++ Compatibility)

 - 割愛。

 2.9 文字集合、レパートリおよびコードについて (On Character sets, Repertories and Encodings)

 - 国際文字名 (Universal Character Names)は、文字定
   数、文字列定数および識別子の中に任意のUCS-2文字
   またはUCS-4文字を書く書き方である。
 - コードはISO/IEC 10646に従う。

 2.10 練習問題 (Exercises)

 - 割愛。

*/

Cでの日本語の取り扱いを多少理解できた。

こつこつ。

Saturday, February 21, 2009

【Subversion】Subversion directory structure

Before reading the source, I actually ran Subversion and checked the behavior and role of the various files.


----------------------------
リポジトリの内部構造のまとめ
----------------------------


大枠のディレクトリ構造
----------------------

$ ls -laR
total 16
drwxr-xr-x   9 aka  staff  306  2 17 14:17 .
drwxr-xr-x   5 aka  staff  170  2 17 15:21 ..
-rw-r--r--   1 aka  staff  229  2 17 14:17 README.txt ; The usual explanatory file. No real content.
drwxr-xr-x   5 aka  staff  170  2 17 14:17 conf ; Holds the connection (ra) related configuration.
drwxr-xr-x   2 aka  staff   68  2 17 14:17 dav ; Directory for mod_dav_svn.
drwxr-sr-x  10 aka  staff  340  2 17 22:29 db ; The repository proper. The version-controlled data lives here. Only the contents of this directory differ between BDB and FSFS.
-r--r--r--   1 aka  staff    2  2 17 14:17 format ; A single integer: the version number of the repository layout, currently 5. It identifies the repository format and does not change in normal repository operation.
drwxr-xr-x  11 aka  staff  374  2 17 14:17 hooks ; Where hook scripts live (templates and installed ones).
drwxr-xr-x   4 aka  staff  136  2 17 14:17 locks ; Directory for lock management. Not used in versions above 1.2.x, however.

./conf:
total 24
drwxr-xr-x  5 aka  staff   170  2 17 14:17 .
drwxr-xr-x  9 aka  staff   306  2 17 14:17 ..
-rw-r--r--  1 aka  staff   684  2 17 14:17 authz
-rw-r--r--  1 aka  staff   309  2 17 14:17 passwd
-rw-r--r--  1 aka  staff  1457  2 17 14:17 svnserve.conf

./dav:
total 0
drwxr-xr-x  2 aka  staff   68  2 17 14:17 .
drwxr-xr-x  9 aka  staff  306  2 17 14:17 ..

./db:
total 32
drwxr-sr-x  10 aka  staff  340  2 17 22:29 .
drwxr-xr-x   9 aka  staff  306  2 17 14:17 ..
-rw-r--r--   1 aka  staff    6  2 17 22:29 current ; Three integers in a row, e.g. "6 5 1". The first seems to be the latest revision.
-r--r--r--   1 aka  staff    2  2 17 14:17 format ; A single integer, e.g. "2". It does not change. The version of the FSFS repository structure?
-rw-r--r--   1 aka  staff    5  2 17 14:17 fs-type ; File system type, e.g. "fsfs".
drwxr-xr-x   9 aka  staff  306  2 17 22:29 revprops ; Properties attached to revisions. (kept per revision)
drwxr-xr-x  11 aka  staff  374  2 17 22:29 revs ; The revision contents themselves. (kept per revision)
drwxr-xr-x   2 aka  staff   68  2 17 22:29 transactions ; Used during transaction processing??
-rw-r--r--   1 aka  staff   37  2 17 14:17 uuid ; A single UUID. Does not change. Apparently the UUID for / of the repository.
-rw-r--r--   1 aka  staff    0  2 17 14:17 write-lock ; Presumably something for write locking.

./db/revprops:
total 56
drwxr-xr-x   9 aka  staff  306  2 17 22:29 .
drwxr-sr-x  10 aka  staff  340  2 17 22:29 ..
-rw-r--r--   1 aka  staff   50  2 17 14:17 0
-rw-r--r--   1 aka  staff  110  2 17 15:15 1
-rw-r--r--   1 aka  staff  105  2 17 19:03 2
-rw-r--r--   1 aka  staff  108  2 17 19:30 3
-rw-r--r--   1 aka  staff  109  2 17 19:47 4
-rw-r--r--   1 aka  staff  123  2 17 22:21 5
-rw-r--r--   1 aka  staff  111  2 17 22:29 6

./db/revs:
total 3288
drwxr-xr-x  11 aka  staff      374  2 17 22:29 .
drwxr-sr-x  10 aka  staff      340  2 17 22:29 ..
-rw-r--r--   1 aka  staff      115  2 17 14:17 0
-rw-r--r--   1 aka  staff      385  2 17 15:15 1
-rw-r--r--   1 aka  staff      418  2 17 19:03 2
-rw-r--r--   1 aka  staff      427  2 17 19:30 3
-rw-r--r--   1 aka  staff      317  2 17 19:47 4
-rw-r--r--   1 aka  staff  1648983  2 17 22:21 5
-rw-r--r--   1 aka  staff      733  2 17 22:29 6

./db/transactions:
total 0
drwxr-xr-x   2 aka  staff   68  2 17 22:29 .
drwxr-sr-x  10 aka  staff  340  2 17 22:29 ..

./hooks:
total 72
drwxr-xr-x  11 aka  staff   374  2 17 14:17 .
drwxr-xr-x   9 aka  staff   306  2 17 14:17 ..
-rw-r--r--   1 aka  staff  2015  2 17 14:17 post-commit.tmpl
-rw-r--r--   1 aka  staff  1638  2 17 14:17 post-lock.tmpl
-rw-r--r--   1 aka  staff  2255  2 17 14:17 post-revprop-change.tmpl
-rw-r--r--   1 aka  staff  1567  2 17 14:17 post-unlock.tmpl
-rw-r--r--   1 aka  staff  2934  2 17 14:17 pre-commit.tmpl
-rw-r--r--   1 aka  staff  2038  2 17 14:17 pre-lock.tmpl
-rw-r--r--   1 aka  staff  2764  2 17 14:17 pre-revprop-change.tmpl
-rw-r--r--   1 aka  staff  1979  2 17 14:17 pre-unlock.tmpl
-rw-r--r--   1 aka  staff  2137  2 17 14:17 start-commit.tmpl

./locks:
total 16
drwxr-xr-x  4 aka  staff  136  2 17 14:17 .
drwxr-xr-x  9 aka  staff  306  2 17 14:17 ..
-rw-r--r--  1 aka  staff  139  2 17 14:17 db-logs.lock
-rw-r--r--  1 aka  staff  139  2 17 14:17 db.lock
$ 



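Structure of revs
-----------------

First, the file names are numbers. Each is a revision number,
and what the file records is the basic information needed to
construct that revision.

"Basic information" means, first, that the file contains all
the information about the layout of every directory and file
needed to construct the revision. This looks redundant at
first glance, but svn co always has to hand over the complete
layout information to build a working directory, so it is not
that wasteful.

Next, the contents of the files being placed are also recorded
in this file, in the following way.

A new file has its entire contents stored here. For a
modification to an existing file, if the change is small, only
the delta against the existing file is recorded, together with
the id of the revision the delta is based on. If the change is
large, everything is recorded, just as for a new file.

The overall structure is as follows.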

---- ---- ---- ----
DELTA
... (test-pdf の中身)
ENDREP
DELTA
... (test-file2 の中身)
ENDREP

id: 4.0.r5/1648331
type: file
count: 0
text: 5 0 723003 765532 d76ffb583e9cf560c70768ec5124aa69
cpath: /test-pdf
copyroot: 0 /

id: 3.0.r5/1648459
type: file
count: 0
text: 5 723016 925302 1128184 a59d4991a080313dcd6c5bd3d286e291
cpath: /test-file2
copyroot: 0 /

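;; Up to here is the file content information.
;; From here on is the information for this directory.
;; What the directory contains is expressed in a property-like form,
;; and each item is identified by its id (or revision, so to speak).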

PLAIN
K 8
test-dir
V 12
dir 2.0.r4/0
K 9
test-file
V 14
file 1.0.r3/70
K 10
test-file2
V 19
file 3.0.r5/1648459
K 8
test-pdf
V 19
file 4.0.r5/1648331
END
ENDREP
id: 0.0.r5/1648756
type: dir
pred: 0.0.r4/146
count: 5
text: 5 1648595 148 148 3ca827292b495f4de066b5732524ca7e
cpath: /
copyroot: 0 /
---- ---- ---- ----


Concretely, file content information is recorded in the following form.

---- ---- ---- ----
DELTA 1 0 34
SVN&...(binary)...&THIS IS THE SAME YES SAME TEST FILE.

ENDREP
---- ---- ---- ----



Structure of revprops
---------------------

Files under revprops/ are named the same way as in revs.

Their contents are the metadata of that revision, recorded in
property format as follows.

--------
$ cat test-repos.2/db/revprops/1
K 10
svn:author
V 3
aka
K 8
svn:date
V 27
2009-02-17T06:15:31.009624Z
K 7
svn:log
V 16
A first check in
END
$ 
--------
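Reading this property format back is straightforward. The sketch below assumes the layout is exactly what the dump above shows: a "K <length>" line, the key, a "V <length>" line, the value, repeated until "END". The names read_header and hashdump_get are hypothetical; Subversion has its own hash-reading code, which I have not looked at here.

```c
#include <stdlib.h>
#include <string.h>

/* Parse a header line "<TAG> <length>\n" at P.  On success store the
   length in *LEN and return a pointer just past the newline; on failure
   return NULL. */
static const char *
read_header(const char *p, char tag, size_t *len)
{
  char *end;

  if (p[0] != tag || p[1] != ' ')
    return NULL;
  *len = (size_t)strtoul(p + 2, &end, 10);
  if (end == p + 2 || *end != '\n')
    return NULL;
  return end + 1;
}

/* Look up KEY in the NUL-terminated dump BUF and copy its value into OUT.
   Returns the value length, or -1 if KEY is absent, the dump is
   malformed, or OUT (capacity OUTSIZE) is too small. */
static int
hashdump_get(const char *buf, const char *key, char *out, size_t outsize)
{
  const char *p = buf;
  size_t want = strlen(key);

  while (strncmp(p, "END\n", 4) != 0)
    {
      size_t klen, vlen;
      const char *k, *v;

      /* Each field must be followed by a newline inside the buffer. */
      k = read_header(p, 'K', &klen);
      if (k == NULL || strlen(k) <= klen || k[klen] != '\n')
        return -1;
      v = read_header(k + klen + 1, 'V', &vlen);
      if (v == NULL || strlen(v) <= vlen || v[vlen] != '\n')
        return -1;

      if (klen == want && memcmp(k, key, klen) == 0)
        {
          if (vlen + 1 > outsize)
            return -1;
          memcpy(out, v, vlen);
          out[vlen] = '\0';
          return (int)vlen;
        }
      p = v + vlen + 1;   /* move on to the next K/V pair */
    }
  return -1;
}
```

The lengths are byte counts, so a log message containing "が" would be counted as 3 bytes in NFC and 6 bytes in NFD.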


----------------------------------------
ワーキングディレクトリの内部構造のまとめ
----------------------------------------

大枠のディレクトリ構造
----------------------

$ ls -laR
total 0
drwxr-xr-x   3 aka  staff  102  2 17 15:02 .
drwxr-xr-x  12 aka  staff  408  2 17 22:28 ..
drwxr-xr-x   4 aka  staff  136  2 17 15:14 test-repos

./test-repos:
total 8
drwxr-xr-x  4 aka  staff  136  2 17 15:14 .
drwxr-xr-x  3 aka  staff  102  2 17 15:02 ..
drwxr-xr-x  8 aka  staff  272  2 17 18:57 .svn ; リポジトリの/ディレクトリに関する情報が入っている。
-rw-r--r--  1 aka  staff   22  2 17 15:10 test-file

./test-repos/.svn:
total 16
drwxr-xr-x  8 aka  staff  272  2 17 18:57 .
drwxr-xr-x  4 aka  staff  136  2 17 15:14 ..
-r--r--r--  1 aka  staff  381  2 17 18:57 entries ; .svnが設置されているディレクトリの内容に関する情報
-r--r--r--  1 aka  staff    2  2 17 15:02 format ; 整数値ひとつ。"8"など。通常の運用においては不変。
drwxr-xr-x  2 aka  staff   68  2 17 15:02 prop-base ; このディレクトリのプロパティについてcheck out時の情報。
drwxr-xr-x  2 aka  staff   68  2 17 15:02 props ; このディレクトリのプロパティについての情報。
drwxr-xr-x  3 aka  staff  102  2 17 15:15 text-base ; このディレクトリの内容(ファイルとかディレクトリとか)に関するcheck out時の情報。
drwxr-xr-x  5 aka  staff  170  2 17 18:57 tmp ; svnがローカルでいろいろ変更を加えるときの作業用一時ディレクトリ。

./test-repos/.svn/prop-base:
total 0
drwxr-xr-x  2 aka  staff   68  2 17 15:02 .
drwxr-xr-x  8 aka  staff  272  2 17 18:57 ..

./test-repos/.svn/props:
total 0
drwxr-xr-x  2 aka  staff   68  2 17 15:02 .
drwxr-xr-x  8 aka  staff  272  2 17 18:57 ..

./test-repos/.svn/text-base:
total 8
drwxr-xr-x  3 aka  staff  102  2 17 15:15 .
drwxr-xr-x  8 aka  staff  272  2 17 18:57 ..
-r--r--r--  1 aka  staff   22  2 17 15:10 test-file.svn-base ; coしたときのファイルそのもの。

./test-repos/.svn/tmp:
total 0
drwxr-xr-x  5 aka  staff  170  2 17 18:57 .
drwxr-xr-x  8 aka  staff  272  2 17 18:57 ..
drwxr-xr-x  2 aka  staff   68  2 17 15:02 prop-base
drwxr-xr-x  2 aka  staff   68  2 17 15:02 props
drwxr-xr-x  2 aka  staff   68  2 17 15:15 text-base

./test-repos/.svn/tmp/prop-base:
total 0
drwxr-xr-x  2 aka  staff   68  2 17 15:02 .
drwxr-xr-x  5 aka  staff  170  2 17 18:57 ..

./test-repos/.svn/tmp/props:
total 0
drwxr-xr-x  2 aka  staff   68  2 17 15:02 .
drwxr-xr-x  5 aka  staff  170  2 17 18:57 ..

./test-repos/.svn/tmp/text-base:
total 0
drwxr-xr-x  2 aka  staff   68  2 17 15:15 .
drwxr-xr-x  5 aka  staff  170  2 17 18:57 ..
$ 


entriesの構造
-------------


---- ---- ---- ----
8    ; おそらく.svnのformatの8。
     ; 空行
dir  ; 固定。.svnがいるdirectoryに関する記述開始。
1    ; Working Directoriesにもってきたときの、このdirectoryのリビジョン
file:///Users/aka/scratch/subversion/test/repos/test-repos; リポジトリ上のURI
file:///Users/aka/scratch/subversion/test/repos/test-repos; リポジトリ上のURI
     ; 空行  ↑なぜ同じものが二行あるのかわからない。
     ; 空行
     ; 空行
2009-02-17T06:15:31.009624Z ; 作成日付け？
1    ; 作成時のリビジョン
aka ; 作成ユーザ
     ; 空行
     ; 空行
svn:special svn:externals svn:needs-lock ; なんだろ？
     ; 空行
     ; 空行
     ; 空行
     ; 空行
     ; 空行
     ; 空行
     ; 空行
     ; 空行
     ; 空行
     ; 空行
     ; 空行
2c7b609e-ce01-4a14-978c-b57bba64d288 ; このdirectoryのUUID
^L ; エントリのdelimiter
test-dir ; エントリの名前
dir  ; エントリの種別
     ; 空行
     ; 空行
     ; 空行
add  ; WD上でsvn addされた場合はここに目印が入る。
   ; エントリのdelimiter
test-file; エントリの名前
file ; エントリの種別
3    ; このWDにもってきたときのリビジョン
     ; 空行
     ; 空行
     ; 空行
2009-02-17T10:29:34.000000Z ; なんの日付？
6230f0c4f2f42c6471c6c747e7b29b57 ; ハッシュのようだ
2009-02-17T10:30:14.853118Z ; リビジョン3のコミット時刻
3    ; なんのリビジョン？
aka  ; コミットユーザ
^L ; エントリのdelimiter
---- ---- ---- ----
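The record layout above can be walked with a few lines of C. This is a sketch under the assumptions visible in the listing: one format line, then records that each end with a form feed and a newline, with the entry name on the first line of each record (empty for the directory itself). entry_name_at is a hypothetical helper, not a Subversion function.

```c
#include <string.h>

/* Copy the name of the INDEX-th record (0-based) of an entries buffer
   into OUT.  Returns the name length, or -1 if there is no such record
   or OUT (capacity OUTSIZE) is too small. */
static int
entry_name_at(const char *buf, int index, char *out, size_t outsize)
{
  const char *p = strchr(buf, '\n');      /* end of the format line */

  if (p == NULL || index < 0)
    return -1;
  ++p;

  for (;;)
    {
      const char *rec_end = strstr(p, "\f\n");
      const char *name_end;
      size_t len;

      if (rec_end == NULL)
        return -1;                        /* no complete record left */
      if (index > 0)
        {
          --index;
          p = rec_end + 2;                /* skip to the next record */
          continue;
        }

      name_end = strchr(p, '\n');         /* the name is the first line */
      if (name_end == NULL || name_end > rec_end)
        name_end = rec_end;
      len = (size_t)(name_end - p);
      if (len + 1 > outsize)
        return -1;
      memcpy(out, p, len);
      out[len] = '\0';
      return (int)len;
    }
}
```

The name comes back as the raw bytes stored in the file, with no normalization applied, so comparing it with a name obtained from the file system only works if both happen to use the same form.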



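The file name encoding problem
------------------------------

File names appear in the following places.

Repository
  db/revs/n ; as a string inside the file

Working directory
  test-file ; on the file system
  .svn/entries ; as a string inside the file
  .svn/text-base/test-file.svn-base ; on the file system

The trouble on OSX probably comes from a difference between
the encoding of the names as strings inside entries and the
encoding svn uses when it handles test-file,
test-file.svn-base and the like on the file system.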

こつこつ。

2009年2月16日月曜日

【C:ARM5】1 入門


/*
 1 入門

 1.1 Cの進化
 1.2 Cのどの方言を使うべきか？
 1.3 Cプログラムの概観
 1.4 規格合致性
 1.5 構文記法


  - 実行時ライブラリという言葉、気になるな。

  - Standard C と Standard C++ の intersection を
    Clean C と言うのか。

  - 「普通、リンカはCに特化されていない。どのコン
    ピュータシステムもリンカは一つで、さまざまな言
    語で書かれたプログラムを処理する」なるほど。

  - そうだとすると、リンカの入力となるオブジェクト
    コードの構造も、コンピュータシステムにおいて単
    一なのか？　 ・・・単一なのだろう。ある言語がそ
    うじゃないコードにコンパイルされたとしたら、そ
    れはなんらかの仮想マシン上で動くものなのだろう。

  - 規格合致性には二種類あるよ。規格合致ホスト処理
    系であることと、規格合致フリースタンディング処
    理系であること。後者は埋め込みシステムなど、質
    素なターゲット環境向けなんだよ。

  - 終端記号は、プログラム中に書かれたとき、そのと
    おりに見えなければいけない記号だよ。なるほど、
    こういう言い方もあるな。
*/

こつこつ。

【C:ARM5】「Cリファレンスマニュアル」を読む

Subversionのソースを見ていたら、Cの復習がしたくなった。そこで以前から読みたいと思っていたSteeleのCリファレンスマニュアルをちょっと覗いてみようと思う。Lispを知りつくしたSteeleが書くCの本ということでとても興味があったのだ。できれば、自分なりのCommon Lispとの対比も交えていきたい。

【Subversion】Subversion Development


        * : [subversion] Subversion Development

        http://subversion.tigris.org/development.html

        を読む。ほぼ内容無し。

        The Big Pictureの図でアタリを確認する。

        - Working Copy Management Library
        - Repository Interface

        あたりがポイントになるのかな。しかしこれは本当にbig pictureで、実
        態どうだか、というところだ。

【Subversion】調査の基礎


        * : [subversion] 調査の基礎

        対象ソース
        subversion_1.5.1dfsg1.orig.tar.gz
         - apt-getで取ってきたもの。

        情報源
        http://subversion.tigris.org/
         - ここにいろいろある。

        ソースコードリーディングのツールや手法がいろいろありそうだが、あん
        まり知らない。。。とりあえず、etagsとoccurぐらいが手駒。あたりまえ
        だけど、自分の手駒でまずは攻めるしかない。。。

        まずは情報源の資料を読む。

【Subversion】Subversionの理解


        * : [subversion] subversionの調査

        ポータブルな環境構築においてリポジトリは重要である。リポジトリは
        とりあえずsubversion + svkでいこうと思う。

        subversionを取り扱うときには文字コードの問題がある。
        今まであいまいにしてきたが、ここで整理する。

        ファイルの中身は問題ではない。問題はファイル名だ。

        subversionのFSFSでは、db/にバージョン管理対象ファイルが設置される。

        ここではファイル名は管理ファイルの中にのみ書かれている。なのでリポ
        ジトリの管理としてはOSのファイル名管理APIは関係ない。

        続いて、svnクライアントとsubversionリポジトリとの間では何がどのよ
        うにやり取りされるのか、ということ。

        移送されるのはワーキングコピーであり、それはファイル実体と管理領域
        たる.svnである。

        .svnの中にはバージョン管理対象となっているファイル実体は無い。管理
        情報は管理ファイルの中にすべて記載される。

        うーん。もっとちゃんと調べないと、駄目だ。

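        I have a vague feel for these things, but no specification exists. So
        I tried reading the source, but there is a fair amount of it and it
        is C, so I cannot read it at the speed I read Common Lisp. After about
        five hours of wandering through the source I have a rough grasp, but
        to really understand it I need to read the source systematically.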

うーん。どうしたものか。
リポジトリの内部動作についてちゃんと理解していないで、自分の開発の信頼性をそれに預けることができるのだろうか？　できるかできないかよくわからない。

そこで、おそるおそるSubversionのソースコードリーディングをしてみることにする。無理そうだったら逃げる。

こつこつ。

【Ubuntu】ポータブルな制作環境準備

Ubuntuに引っ越しするには、現在のOSX上の制作環境自体をまずポータブルにせねば。これはそこそこしっかりやっておこうと思う。ここでしっかりやっておけば、複数の機体に跨がった制作も統一的にできるし、将来UbuntuからBSDに移りたいなんて思っても速やかに移れるから。


 * : [Unix][作業環境][mac][ubuntu]

 ~/local に制作環境をまとめて、制作環境をポータブルにする。

 

 ~/local/config/

   - 自分のリポジトリでバージョン管理
   - バックアップ非対象
 
 ~/におかれるdot的初期化ファイルについては、それ自体はそこに設置する。
 ただしそこに置いたものは、ユーザー設定としてはいわゆるload機能のみ
 とする。

 Emacs :
 ~/.emacs : Emacsの起動時に実行されるファイル
 (load-file "~/local/config/emacs.el")

 ACL :
 ~/.clinit.cl : ACLの起動時に実行されるファイル
 (load "~/local/config/clinit.cl")

 GNU Screen :
 ~/.screenrc : Screenの起動時に実行されるファイル

 screenの設定ファイルに外部ファイルのload機能は存在しない。
 そこで、bashの設定として、

 alias cscreen='screen -c ~/local/config/.screenrc'

 とする。

 bash :
 ~/.bash_profile : ログインシェルとしての起動時に実行されるファイル
  - OSXのTerminalではTerminalの起動毎に実行される。
  - ただしTerminal内でのbashの起動時には実行されない。
 
 ~/.bashrc : bashの起動時に(ログイン時含めていつでも)実行されるファイル
  - OSXのTerminal起動時には実行されない。
  - Terminal内のbash起動時毎に実行される。

 このようにOSXは乱れている。なので次のようにする。

 ~/.bash_profile :
 source ~/local/config/.bash_profile

 ~/.bashrc :
 source ~/local/config/.bashrc

 そしてOSXでは、

 ~/local/config/.bash_profile :
 source ~/local/config/.bash_parent_start
 source ~/local/config/.bash_every_start

 ~/local/config/.bashrc :
 source ~/local/config/.bash_every_start

 として、
 
 ~/local/config/.bash_parent_start
 ~/local/config/.bash_every_start

 にはそれぞれ、親起動時に一回のみ実行させたいことと、bashが呼ばれる
 たびに起動させたいことをそれぞれ書く。



 
 ~/local/lib/

   - 自分のリポジトリでバージョン管理
   - バックアップ非対象


 ライブラリファイルについて。

 - Libraries that are managed by a per-environment package management
 system follow that system.

 - 環境非依存のパッケージ管理機構によってライブラリを管理している場
 合は、~/local/lib/以下にファイルを設置する。

 - 手で管理しているライブラリも~/local/lib以下にファイルを設置する。

 
 lib内は、今のところ、次のような配置とする。

 catalog : XMLcatalogファイル
 cl : Common Lisp 手管理ライブラリ
 clbuild : Common Lisp clbuild 管理ライブラリ
 elisp : Emacs Lisp ライブラリ
 java : java ライブラリ
 xml : XMLスキーマファイル

 OS毎に分離が必要なものは各ディレクトリの中でディレクトリに分ける。


 ~/local/tmp/

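 A temporary home for files whose final location is undecided.
   - Version-controlled in my own repository
   - Not backed up

 
 ~/local/var/

 Holds repositories and the like.
   - Not version-controlled in my own repository
   - Backed up

 
 ~/local/bin/

 Handy shell commands I wrote myself.
   - Not version-controlled in my own repository
   - Backed up
   - Provide a script that builds (makes) the contents of bin.

 
 ~/local/work/

 Things related to production work.

 
 ~/local/work/scratch/

 Things related to my own day-to-day projects. I think of them as scribbles.
   - Version-controlled in my own repository
   - Not backed up

 
 ~/local/work/external/

 Things related to projects that involve other people. Those should
 have their own repositories.
   - Version-controlled in the project's repository
   - Not backed up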
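 The layout described so far can be created in one go. The following is a sketch under the assumption that a scratch directory stands in for $HOME; the variable names root and created are mine.

```shell
# Create the ~/local skeleton under a scratch root instead of the real $HOME.
root=$(mktemp -d)

# Top-level directories of the layout.
for d in config lib tmp var bin work/scratch work/external; do
  mkdir -p "$root/local/$d"
done

# Subdirectories of lib.
for d in catalog cl clbuild elisp java xml; do
  mkdir -p "$root/local/lib/$d"
done

# Count every directory from local/ downward, including local/ itself.
created=$(find "$root/local" -type d | wc -l | tr -d ' ')
echo "$created directories"    # prints: 15 directories

# Clean up afterwards with: rm -rf "$root"
```

 Replacing the scratch root with $HOME would create the real tree; mkdir -p leaves existing directories untouched, so the script can be rerun safely.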

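Having come this far, I realized that I know almost nothing about the internals of Subversion + SVK, which sit at the center of my revision control. So I started looking into Subversion, and that turned out to be a rabbit hole. The next entry is about that.

Slow and steady.

Sunday, February 15, 2009

【Ubuntu】Installing Parallels Tools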


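 First, preparation on the Ubuntu side.

 $ sudo aptitude install linux-headers-$(uname -r) build-essential

 Next, in the Parallels menu, choose

 Virtual Machines
  -> Install Parallels tool ...

 This mounts a CD-ROM and opens a window.
 The window does not matter; just run

 $ sudo sh /cdrom/install

 and then follow the GUI to finish the installation.

 The mouse pointer now moves seamlessly between OSX and the VM.
 The problem of the "|" and "`" keys not working is fixed.

 Screen-size tracking does not work. xrandr is available, but enlarging
 the Parallels window does not add any choices to its list of modes.

 After some digging, the Video RAM allocation turned out to be the key.
 With it set to 64MB, Dynamic Resolution started working, independently
 of xrandr. This really is convenient.

 I am starting to feel this setup is usable.

Slow and steady.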

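【Practical CL】22 LOOP for Black Belts

Opinions on the extended loop are apparently divided. As for me, my attitude is "if it is convenient, why not?" So let me try it out a little and see whether it is.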


;;;
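;;; 22 LOOP for Black Belts
;;;

;; I focus on code samples rather than explanations.

;; 22.1 The Parts of a LOOP

;; Nothing in particular.


;; 22.2 Iteration Control

;; This example cannot be run as is (list and something are placeholders).
;; (loop
;;     for item in list
;;     for i from 1 to 10
;;     do (something))


;; 22.3 Counting Loops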

(loop for i upto 10 collect i)
                                        ; (0 1 2 3 4 5 6 7 8 9 10)
(loop for i downto -10 collect i)
                                        ; Error: Don't know where to start stepping.
(loop for i from 0 downto -10 collect i)
                                        ; (0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10)


;; 22.4 Looping Over Collections and Packages

(loop for i in (list 3 4 5 6) collect i)
                                        ; (3 4 5 6)
(loop for i in (list 3 4 5 6) by #'cddr collect i)
                                        ; (3 5)
(loop for i on (list 3 4 5 6) collect i)
                                        ; ((3 4 5 6) (4 5 6) (5 6) (6))
(loop for i on (list 3 4 5 6) by #'cddr collect i)
                                        ; ((3 4 5 6) (5 6))

(loop for x across "abcd" collect x)
                                        ; (#\a #\b #\c #\d)

(defparameter *v* excl::*package-table*)

(loop for x across *v* collect x)
                                        ; (NIL #<The COMMON-LISP package> #<The KEYWORD package> #<The EXCL package> #<The ACLMOP package> #<The SYSTEM package>
                                        ;  #<The GARBAGE package> #<The COMMON-LISP-USER package> #<The TOP-LEVEL package> #<The COMPILER package> #<The FOREIGN-FUNCTIONS package>
                                        ;  #<The DEBUGGER package> #<The MULTIPROCESSING package> #<The DEFSYSTEM package> #<The LEP package> #<The LEP-IO package>
                                        ;  #<The NULL-PACKAGE-REPLY-SESSION package> #<The ACL-SOCKET package> #<The EXCL.SCM package> #<The CROSS-REFERENCE package>
                                        ;  #<The PROFILER package> #<The INSPECT package> #<The NET.URI package> #<The ASDF package> #<The UTIL.AKA package>
                                        ;  #<The ALEXANDRIA.0.DEV package> #<The CXML-SYSTEM package> #<The TRIVIAL-GRAY-STREAMS-SYSTEM package>
                                        ;  #<The CLOSURE-COMMON-SYSTEM package> #<The PURI-SYSTEM package> #<The TRIVIAL-GRAY-STREAMS package> #<The BABEL-ENCODINGS package>
                                        ;  #<The BABEL package> #<The RUNES package> #<The UTF8-RUNES package> #<The RUNES-ENCODING package> #<The HAX package> #<The PURI package>
                                        ;  #<The CXML package> #<The SAX package> #<The CXML-XMLS package> #<The KLACKS package> #<The DOM package> #<The RUNE-DOM package>
                                        ;  #<The DOMTEST package> #<The DOMTEST-TESTS package> #<The XMLCONF package> NIL NIL NIL)


;; Look for the hash tables used by the implementation.
(do-all-symbols (sym)
  (when (handler-case
          (symbol-value sym)
        (condition (c) nil))
    (when (typep (symbol-value sym) 'hash-table)
      (print sym))))
                                        ; ACL-SOCKET::*HOSTNAME-CACHE* 
                                        ; ACL-SOCKET::*IPADDR-CACHE* 
                                        ; ACL-SOCKET::*PORT-CACHE* 
                                        ; ASDF::*DEFINED-SYSTEMS* 
                                        ; BABEL::*STRING-VECTOR-MAPPINGS* 
                                        ; BABEL-ENCODINGS::*ABSTRACT-MAPPINGS* 
                                        ; BABEL-ENCODINGS::*CHARACTER-ENCODINGS* 
                                        ; COMPILER::*LOC-PARAM-CONS* 
                                        ; CXML:*DTD-CACHE* 
                                        ; EXCL::*ADVISE-HASH-TABLE* 
                                        ; EXCL::*EQL-SPECIALIZER-TABLE* 
                                        ; EXCL::*PACKAGE-NAMES* 
                                        ; EXCL::*LAP-EMITTERS* 
                                        ; EXCL::*FSPEC->PATHNAME* 
                                        ; EXCL::.SAVED-ENTRY-POINTS. 
                                        ; EXCL::*PATHNAME->FSPECS* 
                                        ; EXCL::*LONG-METHOD-COMBINATION-FUNCTIONS* 
                                        ; EXCL::*PREVIOUS-NWRAPPERS* 
                                        ; EXCL::*ARG-INFO-TABLE* 
                                        ; EXCL::*EF-DUAL-CHANNEL-FUNCTIONS* 
                                        ; EXCL::*FWRAP-HASH-TABLE* 
                                        ; EXCL::*SHARED-ESLOTS* 
                                        ; EXCL::*EF-SINGLE-CHANNEL-FUNCTIONS* 
                                        ; EXCL::*LOGICAL-PATHNAME-TRANSLATIONS* 
                                        ; EXCL::*SETF-FUNCTION-HASHTABLE* 
                                        ; EXCL::*XP-PARSER-TABLE* 
                                        ; EXCL::*PROPERTY-HASH-TABLE* 
                                        ; EXCL::*EF-SINGLE-CHANNEL-DIRECT-FUNCTIONS* 
                                        ; EXCL::*FIND-CLASS* 
                                        ; EXCL::*NAME-TO-CHAR-TABLE* 
                                        ; EXCL::*ENCAPSULATION-HASH-TABLE* 
                                        ; EXCL::*SHARED-CONS-TABLE* 
                                        ; EXCL::.SET-READTABLES. 
                                        ; EXCL.SCM::*CHANGED-DEFINITIONS* 
                                        ; EXCL.SCM::*FILE-SECTIONS* 
                                        ; FOREIGN-FUNCTIONS::*ANON-IFTYPE-CACHE* 
                                        ; NET.URI::*URIS* 
                                        ; PURI::*URIS* 
                                        ; RUNES::*RUNE-NAMES* 
                                        ; RUNES-ENCODING::*NAMES* 
                                        ; RUNES-ENCODING::*ENCODINGS* 
                                        ; RUNES-ENCODING::*CHARSETS* 
                                        ; TOP-LEVEL::*COMMAND-HASH-TABLE* 
                                        ; NIL


(loop for k being the hash-keys in excl::*package-names* collect k)
                                        ; ("" "MULTIPROCESSING" "SI" "FOREIGN-FUNCTIONS" "CL-USER" "CXML-SYSTEM" "CL" "DEBUG" "DS" "NET.URI" ...)

(loop for k being the hash-keys in excl::*package-names* using (hash-value v) collect v)
                                        ; (#<The KEYWORD package> #<The MULTIPROCESSING package> #<The SYSTEM package>
                                        ;  #<The FOREIGN-FUNCTIONS package> #<The COMMON-LISP-USER package> #<The CXML-SYSTEM package>
                                        ;  #<The COMMON-LISP package> #<The DEBUGGER package> #<The DEFSYSTEM package> #<The NET.URI package> ...)
(loop for v being the hash-values in excl::*package-names* collect v)
                                        ; (#<The KEYWORD package> #<The MULTIPROCESSING package> #<The SYSTEM package>
                                        ;  #<The FOREIGN-FUNCTIONS package> #<The COMMON-LISP-USER package> #<The CXML-SYSTEM package>
                                        ;  #<The COMMON-LISP package> #<The DEBUGGER package> #<The DEFSYSTEM package> #<The NET.URI package> ...)

(loop for sym being the symbols in (find-package :cl) collect sym)
                                        ; (COMMON-LISP::P LOGCOUNT TIME ARITHMETIC-ERROR-OPERANDS NSUBST CHANGE-CLASS MAPHASH EVAL-WHEN BLOCK MEMBER-IF ...)

(loop for sym being the symbols in (find-package :cl) count sym)
                                        ; 978

(loop for sym being the present-symbols in (find-package :cl) count sym)
                                        ; 978

(loop for sym being the external-symbols in (find-package :cl) count sym)
                                        ; 977

(loop for sym being the external-symbols in (find-package :excl) count sym)
                                        ; 588

(loop for sym being the present-symbols in (find-package :excl) count sym)
                                        ; 6947

(loop repeat 5)                         ; NIL
(loop repeat 5
    for x = 0 then (1+ x)
    collect x)                          ; (0 1 2 3 4)
(loop repeat 5
    for x = 0 then y
    for y = 1 then (+ x y)
    collect y)                          ; (1 2 4 8 16)

(loop repeat 3 for k being the hash-keys in excl::*package-names* collect k)
                                        ; ("" "MULTIPROCESSING" "SI")

;; 22.6 Local Variables

;; Nothing in particular.


;; 22.7 Destructuring Variables

(loop for (a b) in '((1 2) (3 4) (5 6))
    do (format t "a: ~a; b: ~a~%" a b))
                                        ; a: 1; b: 2
                                        ; a: 3; b: 4
                                        ; a: 5; b: 6
                                        ; NIL

(loop for cons on (list 1 2 3 4 5)
    do (format t "~a" (car cons))
    when (cdr cons) do (format t ", "))
                                        ; 1, 2, 3, 4, 5
                                        ; NIL

(loop for (item . rest) on (list 1 2 3 4 5)
    do (format t "~a" item)
    when rest do (format t ", "))
                                        ; 1, 2, 3, 4, 5
                                        ; NIL

(defparameter *random* (loop repeat 100 collect (random 1000)))

(loop for i in *random*
    counting (evenp i) into evens
    counting (oddp i) into odds
    summing i into total
    maximizing i into max
    minimizing i into min
    finally (return (list min max total evens odds)))
                                        ; (0 998 54241 59 41)



;; 22.9 Unconditional Execution

(loop for i from 1 to 10 do (print i))
                                        ; 1 
                                        ; 2 
                                        ; 3 
                                        ; 4 
                                        ; 5 
                                        ; 6 
                                        ; 7 
                                        ; 8 
                                        ; 9 
                                        ; 10 
                                        ; NIL

(block outer
  (loop for i from 0 return 100)
  (print "This will print")
  200)
                                        ; "This will print" 
                                        ; 200

(block outer
  (loop for i from 0 do (return-from outer 100))
  (print "This won't print")
  200)                                  ; 100


;; 22.10 Conditional Execution

(loop for i from 1 to 10 do (when (evenp i) (print i)))
                                        ; 2 
                                        ; 4 
                                        ; 6 
                                        ; 8 
                                        ; 10 
                                        ; NIL

(loop for i from 1 to 10 when (evenp i) sum i)
                                        ; 30

(loop for key in (list "CXML" "RUNE" "CL" "DEBUG" "HOGE" "PIYO")
    when (gethash key excl::*package-names*) collect it)
                                        ; (#<The CXML package> #<The COMMON-LISP package> #<The DEBUGGER package>)


;; 22.11 Setting Up and Tearing Down

(loop named outer for p across excl::*package-table*
    do (when p
         (loop for sym being the symbols in p
               do (if (equal "CAR" (symbol-name sym))
                      (return-from outer p))))) ; #<The COMMON-LISP package>


;; 22.12 Termination Tests

(if (loop for n in (list 1 2 3 4) always (evenp n))
    (print "All numbers even."))        ; NIL

(if (loop for n in (mapcar #'(lambda (n) (* 2 n)) (list 1 2 3 4)) always (evenp n))
    (print "All numbers even."))        ; "All numbers even."

(if (loop for n in (list 1 3 11 17) never (evenp n))
    (print "All numbers odd."))         ; "All numbers odd."

(loop for char across "abc123" thereis (digit-char-p char)) ; 1
(loop for char across "abcdef" thereis (digit-char-p char)) ; NIL


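;; 22.13 Putting It All Together

Nothing in particular.

loop really is convenient.

Oh, at last I can move on to the second half. The second half is the practical chapters.
Slow and steady.

【Practical CL】21 Programming in the Large: Packages and Symbols

I have sorted out packages and symbols for myself on various occasions, so this is just a light review.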


;;;
;;; 21 Programming in the Large: Packages and Symbols
;;; 


;; 21.1 How the Reader Uses Packages

*package*                               ;#<The COMMON-LISP-USER package>
(find-symbol "CAR")                     ;CAR, :INHERITED
(find-symbol "CAR" (find-package "COMMON-LISP"))
                                        ;CAR, :EXTERNAL
(find-symbol "car")                     ;NIL, NIL

(find-symbol "PIYO")                    ;NIL, NIL
(intern "PIYO")                         ;PIYO, NIL
(find-symbol "PIYO")                    ;PIYO, :INTERNAL

(eql ':foo :foo)                        ;T

(symbol-name :foo)                      ;"FOO"

(eql '#:foo '#:foo)                     ;NIL

(symbol-name '#:foo)                    ;"FOO"
(symbol-package '#:foo)                 ;NIL

(defparameter *pair-1* (cons '#:foo (intern "FOO")))
(defparameter *pair-2* (cons '#:foo (intern "FOO")))

(eql *pair-1* *pair-2*)                 ;NIL
(equal *pair-1* *pair-2*)               ;NIL
(equal (car *pair-1*) (car *pair-2*))   ;NIL

(eql (car *pair-1*) (car *pair-2*))     ;NIL
(eql (cdr *pair-1*) (cdr *pair-2*))     ;T

(setf (symbol-value (car *pair-1*)) 1)  ;1
(symbol-value (car *pair-1*))           ;1

(defparameter *a* '#:foo)
(type-of *a*)                           ;SYMBOL
(symbol-value *a*)                      ;Error: Attempt to take the value of the unbound variable `#:FOO'.
(setf (symbol-value *a*) 10)
(symbol-value *a*)                      ;10

(defparameter *b* *a*)
(symbol-value *b*)                      ;10

(gensym)                                ;#:G7


;; 21.2 A Bit of Package and Symbol Vocabulary

;; Nothing in particular.


;; 21.3 Three Standard Packages

*package*                               ; #<The COMMON-LISP-USER package>
common-lisp:*package*                   ; #<The COMMON-LISP-USER package>
cl:*package*                            ; #<The COMMON-LISP-USER package>
:a                                      ;:A
keyword:a                               ;:A
(eql :a keyword:a)                      ;T


;; 21.4 Defining Your Own Packages

(defpackage :util.aka
  (:use :common-lisp))

excl::*package-table*

(in-package :util.aka)


;; 21.5 Packaging Reusable Libraries

(defpackage :util.aka
  (:use :common-lisp)
  (:export :open-db
           :save
           :store))

;; 21.6 Importing Individual Names

;; At the REPL
(in-package :util.aka)
(asdf:operate 'asdf:load-op :cxml)
(defpackage :util.aka
  (:use :common-lisp)
  (:import-from :cxml :parse))
(find-symbol "PARSE")                   ;PARSE, :INTERNAL

(defpackage :util.aka
  (:use :common-lisp :cxml)
  (:shadow :parse))


;; 21.7 Packaging Mechanics

;; Nothing in particular.

;; 21.8 Package Gotchas

;; Nothing in particular.

Slow and steady.

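Saturday, February 14, 2009

【Ubuntu】Starting to Build the Environment

On OSX, after all, I cannot go deep when studying Common Lisp. Simply put, it is because I cannot read the source code.

So I am starting to build an Ubuntu environment. I expect it to take quite a while. I will write notes little by little.

First, I installed Intrepid Ibex on Parallels Desktop on OSX. It is the desktop edition, since this is not for server use.

Actually, I am already stuck.

I chose "US" as the keyboard layout during installation, but on the KINESIS Advantage the "|" and "`" keys are not recognized...

Thursday, February 12, 2009

【Practical CL】20 The Special Operators

I worked through the special operators somewhat seriously.
I do not understand compilation at all, so I ran out of steam partway through.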


;;;
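;;; 20 The Special Operators
;;;

;; Chapter 20 has few examples, so I improvise my own.
;; I also add my own comments.

;; Which operators are special is a matter of choice, like the axioms of mathematics.
;; Choosing axioms is a convenient way to specify a language.


;; 20.1 Controlling Evaluation

(mapcar #'special-operator-p '(quote if progn))
                                        ; tally of special operators so far: 3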

;; quote
;; quote returns the Lisp object built by the reader.
;; Put another way, it bypasses eval.

(car (quote (cdr nil)))                 ; CDR
(car (cdr  nil))                        ; NIL

;; if
;; if departs from Lisp's basic evaluation rule.


(defun if-func (condition true-clause false-clause)
  (if condition
      true-clause
    false-clause))

(if-func nil
         (car 10)
         (cdr nil))
;; (car 10) is evaluated first, as a function argument, so this signals an error.

(if nil
    (car 10)
  (cdr nil))
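;; (car 10) is evaluated only when the condition is non-nil,
;; so no error is signaled.


;; progn
;; Departs from Lisp's basic evaluation rule by evaluating a sequence of forms.
;; Writing something with progn means that it is done for
;; its side effects.

;; Ordinary evaluation only evaluates a single Lisp form recursively.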
(map 'vector #'1+
  (append (cdr (list 1 2 3))
          (car '((10 20) 30))))

;; progn evaluates multiple Lisp forms in order.
(progn
  (print 1)
  (print 2)
  (print 3))

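;; A question that just occurred to me: when a file is loaded, the
;; text in it is read and evaluated form by form. Is that
;; sequential processing a progn? No. load is a function, and it
;; probably just repeats read and eval until the stream reaches
;; eof. Conversely, progn is the special operator that behaves
;; that way within the extent of the progn form.


;; 20.2 Manipulating the Lexical Environment

;; One way to think about what the lexical environment is: it is
;; part of the internal machinery of eval.
;; When doing Lisp there are two evals. One is in the programmer's
;; head. The other is inside the computer.

;; To realize eval on a computer, whether it is built from
;; hardware alone, from hardware plus software, or with Lisp
;; itself taking over from some point in the software, eval has
;; to be built partly out of something other than Lisp.

;; One of the components of that eval is the lexical environment.

;; So a programmer's access to the lexical environment carries a
;; "meaning" beyond handling ordinary Lisp forms and Lisp objects
;; at the REPL.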


(mapcar #'special-operator-p '(let let* function setq flet labels macrolet symbol-macrolet))
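                                        ; tally so far: 8 3
(let ((v 1))
  (incf v))
;; v is not evaluated; it becomes a variable name in the lexical
;; environment, and 1 is bound to it. Inside the let form, eval
;; treats the symbol v as a variable name, and evaluating v
;; returns the value bound to the variable.

(setq v 1)                              ; this setq is a special operator, but what it accesses is not the lexical environment.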
(let ((v 10)
      (w (incf v)))
  (incf w))                             ; 3

(setq v 1)
(let* ((v 10)
       (w (incf v)))
  (incf w))                             ; 12
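;; let and let* differ in the evaluation rule used when creating
;; the lexical variables: let evaluates every init form before
;; binding anything, while let* binds one variable at a time, so
;; later init forms see the earlier bindings.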


(flet ((add-10 (n)
         (incf n 10)))
  (add-10 100))                         ; 110
;; flet is the function version of let.

(defun add-10 (n)
  (incf n 2))
(add-10 1)                              ; 3
(flet ((add-10 (n)
         (if (< n 10)
             (add-10 (incf n))
           n)))
  (add-10 1))                           ; 4
(labels ((add-10 (n)
           (if (< n 10)
               (add-10 (incf n))
             n)))
  (add-10 1))                           ; 10
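;; flet is to labels roughly as let is to let*: only labels lets
;; a local function refer to itself.

;; Functions defined with flet and labels can access the (lexical)
;; environment where they are placed, so compared with functions
;; defined at top level they allow concise descriptions that
;; depend on the local context.

;; function
;; Extracts a function object from a symbol or a lambda expression.

(function car)                          ; function car
(function (lambda (x) x))               ; anonymous function

;; By the way, what was a function again?

;; A function (function object) is a Lisp object. By the basic
;; evaluation rule, when it is named by the car of a Lisp form,
;; eval hands it the evaluated cdr as its arguments. (Perhaps
;; function is this lookup made into a special operator?)
;; It performs processing according to the arguments it was given
;; and the definition inside the Lisp object, and returns a value.

;; A function may or may not form a closure.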

(defun hoge (x) x)
(type-of #'hoge)
(symbol-function 'hoge)                 ; function
(type-of #'hoge)                        ; FUNCTION
(defun make-piyo ()
  (let ((v 0))
    #'(lambda (x) x)))
(setf (symbol-function 'piyo) (make-piyo))
(symbol-function 'piyo)                 ; interpreted closure
(type-of #'piyo)                        ; FUNCTION


;; macro

;; Common Lisp has many things with macro in their name.
;; The main ones are as follows.
;; 
;; - macro
;; - compiler macro
;; - reader macro
;;   - dispatching macro
;; - symbol macro
;;
;; I do not have the energy right now to write up what a macro is, so I skip it.

;; macrolet
;; macrolet is to defmacro as flet is to defun.

;; I could not think of a macrolet example, so this one is from CLtL2.
(defun foo (x flag)
  (macrolet ((fudge (z)
               `(if flag
                    (* ,z ,z)
                  ,z)))
    (+ x
       (fudge x)
       (fudge (+ x 1)))))

(foo 2 nil)                             ; 7
(foo 2 t)                               ; 15


;; symbol-macrolet is, likewise, to define-symbol-macro.

(symbol-macrolet ((hoge 'piyo)
                  (moge #'car))
  (list hoge moge hoge moge))           ; (PIYO <function car> PIYO <function car>)


;; 20.3 Local Flow of Control

(mapcar #'special-operator-p '(block return-from tagbody go))
                                        ; tally so far: 4 8 3

;; block, return-from
;;

(block outer
  (print 'outer-1)
  (block inner
    (print 'inner-1)
    (return-from inner)
    (print 'inner-2)
    )
  (print 'outer-2))

(tagbody
  (go bottom)
 top
  (print 'top)
  (go out)
 middle
  (print 'middle)
  (go top)
 bottom
  (print 'bottom)
  (go middle)
 out)


;; 20.4 Unwinding the Stack
(mapcar #'special-operator-p '(catch throw unwind-protect))
                                        ; tally so far: 3 4 8 3

(defun foo ()
  (format t "Entering foo~%")
  (block a
    (format t " Entering BLOCK~%")
    (bar #'(lambda () (return-from a)))
    (format t " Leaving BLOCK~%"))
  (format t "Leaving foo~%"))

(defun bar (fn)
  (format t "  Entering bar~%")
  (baz fn)
  (format t "  Leaving bar~%"))

(defun baz (fn)
  (format t "  Entering baz~%")
  (funcall fn)
  (format t "  Leaving baz~%"))

; CL-USER(58): (foo)
; Entering foo
;  Entering BLOCK
;   Entering bar
;   Entering baz
; Leaving foo
; NIL
; CL-USER(59): 

(defparameter *obj* (cons nil nil))
(defun foo ()
  (format t "Entering foo~%")
  (catch *obj*
    (format t " Entering CATCH~%")
    (bar #'(lambda () (throw *obj* nil)))
    (format t " Leaving CATCH~%"))
  (format t "Leaving foo~%"))

(defun bar (fn)
  (format t "  Entering bar~%")
  (baz fn)
  (format t "  Leaving bar~%"))

(defun baz (fn)
  (format t "  Entering baz~%")
  (throw *obj* nil)
  (format t "  Leaving baz~%"))

; CL-USER(62): (foo)
; Entering foo
;  Entering CATCH
;   Entering bar
;   Entering baz
; Leaving foo
; NIL

;; unwind-protect

(defun foo ()
  (unwind-protect
      (progn
        (format t "Entering foo~%")
        (block a
          (format t " Entering BLOCK~%")
          (bar #'(lambda () (return-from a)))
          (format t " Leaving BLOCK~%"))
        (format t "Leaving foo~%"))
    (format t "Unwind-protect foo~%")))

(defun bar (fn)
  (unwind-protect
      (progn
        (format t "  Entering bar~%")
        (baz fn)
        (format t "  Leaving bar~%"))
    (format t "Unwind-protect bar~%")))

(defun baz (fn)
  (unwind-protect
      (progn
        (format t "  Entering baz~%")
        (funcall fn)
        (format t "  Leaving baz~%"))
    (format t "Unwind-protect baz~%")))

; CL-USER(64): (foo)
; Entering foo
;  Entering BLOCK
;   Entering bar
;   Entering baz
; Unwind-protect baz
; Unwind-protect bar
; Leaving foo
; Unwind-protect foo
; NIL
; CL-USER(65): 


;; 20.5 Multiple Values

(mapcar #'special-operator-p '(multiple-value-call multiple-value-prog1))
                                        ; tally so far: 2 3 4 8 3

;; multiple-value-call
(funcall #'+ (values 1 2) (values 3 4)) ; 4
(multiple-value-call #'+ (values 1 2) (values 3 4)) ; 10

;; multiple-value-prog1
(prog1                  
    (values 1 2)
  (values 3 4)
  (values 5 6))                         ; 1

(multiple-value-prog1                  
    (values 1 2)
  (values 3 4)
  (values 5 6))                         ; 1 2



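;; 20.6 EVAL-WHEN

(special-operator-p 'eval-when)
                                        ; tally so far: 1 2 3 4 8 3

;; load basically evaluates the top-level forms in order.
;; compile-file basically only compiles forms into compiled code
;; and does not evaluate them.
;;
;; However, some forms, such as symbol/package-related forms and
;; macros, must be evaluated ahead of compilation, or the file
;; cannot be compiled correctly.
;;
;; eval-when, motivated by this problem, is the way to specify
;; what gets evaluated when.


;; To think about compilation, the top level has to be made
;; clear first.

;; The top level is the REPL that the implementation provides.
;; The programmer and the implementation interact through this
;; REPL. Of course that is only the starting point; one may write
;; a program to interact in some other way (over HTTP, for
;; example).

;; Several special operators are designed to be convenient as
;; top-level forms.

;; Reading a source file with load is much like typing the forms
;; of the source file in by hand, one after another. The
;; difference is that it is a REL rather than a REPL: there is no
;; print step.

;; Now, compilation.

;; First, let me confirm what compile is. compile is a function
;; that translates a function, or a macro defined with defmacro,
;; into a language that the implementation can process more
;; efficiently.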

(defun hoge (x) x)

(symbol-function 'hoge)                 ; interpreted function hoge
(compile 'hoge)                         ; HOGE nil nil
(symbol-function 'hoge)                 ; function hoge
(disassemble 'hoge)
; ;; disassembly of #
; ;; formals: X

; ;; code start: #x1000ec1328:
;    0: 48 83 f8 01    cmp rax,$1
;    4: 74 01          jz 7
;    6: 06             (push es)        ; SYS::TRAP-ARGERR
;    7: 41 80 7f a7 00 cmpb [r15-89],$0  ; SYS::C_INTERRUPT-PENDING
;   12: 74 01          jz 15
;   14: 17             (pop ss)         ; SYS::TRAP-SIGNAL-HIT
;   15: f8             clc
;   16: 4c 8b 74 24 10 movq r14,[rsp+16]
;   21: c3             ret
(function-lambda-expression #'hoge)     ; (LAMBDA (X) (BLOCK HOGE X)) NIL HOGE


(defmacro piyo (x) x)
(symbol-function 'piyo)                 ; macro piyo
(macroexpand-1 '(piyo 3))               ; 3 T
(compile 'piyo)                         ; PIYO nil nil
(symbol-function 'piyo)                 ; macro piyo
(disassemble 'piyo)
; ;; disassembly of #
; ;; formals: EXCL::**MACROARG** EXCL::..ENVIRONMENT..
; ;; constant vector:
; 0: X
; 1: (X)

; ;; code start: #x1001256328:
;    0: 48 81 ec 98 00 subq rsp,$152     ; 19
;       00 00 
;    7: 4c 89 74 24 08 movq [rsp+8],r14
;   12: 48 83 f8 02    cmp rax,$2
;   16: 74 01          jz 19
;   18: 06             (push es)        ; SYS::TRAP-ARGERR
;   19: 41 80 7f a7 00 cmpb [r15-89],$0  ; SYS::C_INTERRUPT-PENDING
;   24: 74 01          jz 27
;   26: 17             (pop ss)         ; SYS::TRAP-SIGNAL-HIT
;   27: 48 8b d7       movq rdx,rdi
;   30: 48 89 94 24 80 movq [rsp+128],rdx        ; EXCL::**MACROARG**
;       00 00 00 
;   38: 48 c7 c7 08 00 movq rdi,$8       ; 1
;       00 00 
;   45: 48 c7 c6 08 00 movq rsi,$8       ; 1
;       00 00 
;   52: 49 8b 8f 67 fd movq rcx,[r15-665]        ; :MACRO
;       ff ff 
;   59: 49 8b af 77 fd movq rbp,[r15-649]        ; EXCL::DT-MACRO-ARGUMENT-CHECK
;       ff ff 
;   66: b0 04          movb al,$4
;   68: ff d3          call *ebx
;   70: 48 8b bc 24 80 movq rdi,[rsp+128]        ; EXCL::**MACROARG**
;       00 00 00 
;   78: 41 ff 57 67    call *[r15+103]   ; SYS::QCDR
;   82: 48 89 7c 24 78 movq [rsp+120],rdi        ; #:|g47|
;   87: 49 8b 76 36    movq rsi,[r14+54] ; X
;   91: 49 8b af 5f fd movq rbp,[r15-673]        ; EXCL::CAR-FUSSY
;       ff ff 
;   98: ff 53 d0       call *[rbx-48]
;  101: 48 89 bc 24 88 movq [rsp+136],rdi        ; X
;       00 00 00 
;  109: 48 8b 7c 24 78 movq rdi,[rsp+120]        ; #:|g47|
;  114: 41 ff 57 67    call *[r15+103]   ; SYS::QCDR
;  118: 48 8b f7       movq rsi,rdi
;  121: 49 8b 56 3e    movq rdx,[r14+62] ; (X)
;  125: 33 ff          xorl edi,edi
;  127: 49 8b af 57 fd movq rbp,[r15-681]        ; EXCL::LAMBDASCAN-MAXARGS
;       ff ff 
;  134: b0 03          movb al,$3
;  136: ff d3          call *ebx
;  138: 48 8b bc 24 88 movq rdi,[rsp+136]        ; X
;       00 00 00 
;  146: f8             clc
;  147: 48 8d a4 24 98 leaq rsp,[rsp+152]
;       00 00 00 
;  155: 4c 8b 74 24 10 movq r14,[rsp+16]
;  160: c3             ret
;  161: 90             nop

;; A little playing around
(compile 'foo '(lambda (x) x))          ; FOO NIL NIL
(symbol-function 'foo)                  ; function foo
(foo 3)                                 ; 3
(compile nil '(lambda (x) x))           ; Function anonymous lambda NIL NIL
(setq foo (compile nil '(lambda (x) x)))
(funcall foo 3)                         ; 3
(setq bar #'(lambda (x) x))
(funcall bar 3)                         ; 3
(setf bar (compile nil (symbol-value 'bar)))
(funcall bar 3)                         ;3

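;; The compiled function produced by compile is stored in the
;; symbol. That is, the function body is held in the symbol
;; whether it is an interpreted function or a compiled function.
;; Either way, using a function just means writing, at the REPL
;; or in a source file, a form that calls it. When that text is
;; converted to a Lisp form and evaluated, the function code is
;; looked up either way, and eval *uses* it to carry out the
;; evaluation.

;; This compile is the basis of compilation in Common Lisp.

;; Next, compile-file.

;; Loading a source file as is, and loading it after
;; compile-file, should differ in processing speed but should
;; not differ in what is processed.

;; As above, the specification of load is to evaluate the
;; top-level forms in order. How should compile-file be
;; specified so that the two do not diverge?

;; First, *package* must be the same whether the source or the
;; compiled code is loaded. If the source contains no form that
;; touches *package*, this is a matter of the programmer's
;; procedure: make sure *package* does not differ at load time.
;; If the source does contain forms that touch *package*, then
;; keeping the rest of the source consistent is a matter of how
;; the source is written, and eval-when is what achieves that.

;; Also, the state of the implementation at compile-file time
;; and its state at load time should not differ. Concretely,
;; the programmer should arrange the procedure so that external
;; symbols available at compile-file time are also available at
;; load time.

;; ... No good. I ran out of steam partway through.

;; I will do compilation properly some day...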



;; 20.7 Other Special Operators

(mapcar #'special-operator-p '(locally the load-time-value progv))
                                        ; tally so far: 4 2 1 3 4 8 3
                                        ; (+ 4 2 1 3 4 8 3) = 25 special operators in all

;; locally
(defun sample-function (y)              ; this y is regarded as special
  (declare (special y))
  (let ((y t))                          ; this y is regarded as lexical
    (list y
          (locally (declare (special y))
            ;; this next y is regarded as special
            y))))
(sample-function nil)                   ; (T NIL)


;; the

(the symbol 'a)
(the fixnum 1)
(the (values) (values 1 2))
(the (values integer) (values 1 2))
(the (values integer float symbol) (values 1 2.0 'a))
(the (values integer float symbol) (values 1 2 3)) ; error


;; load-time-value
(defvar *loaded-at* (get-universal-time))
(defun when-loaded () *loaded-at*)

(defun when-loaded () (load-time-value (get-universal-time)))
(when-loaded)

;; progv

(setq x 10
      y 20)
(progv '(x y) '(1 2)
  (list x y))                           ; (1 2)
(list x y)                              ; (10 20)

(defun hoge ()
  (+ nanja konja))
(progv '(nanja konja) '(1 2)
  (hoge))                               ; 3
(let ((nanja 1) (konja 2))
  (hoge))                               ; error

Slow and steady.
