Jinser Kafka [jsr-0001]
Land surveyor
I am a junior Computer Science major and a fancier of programming languages. Here is my WIP &'static CV.
NOTE: I always forget, so let me write it down here: press Ctrl + k or Cmd + k to open the ninja "telescope".
Working at IDEA as an intern, building tooling around MoonBit.
1. Blog posts [jsr-0003]
This is my blog, where I write down notes and lessons learned from work and study.
For now the posts are mostly filler.
Forester anatomy of \def [jsr-0006]
- May 19, 2024
- Jinser Kafka
A brief description of how \def is used in Forester, starting from the syntax.
Forester def of \def [fst-0005]
- May 19, 2024
- Jinser Kafka
Forester's grammar for def is roughly excerpted from lib/frontend/Grammar.mly in ocaml-forester.
%token <string> IDENT

let ident :=
  | ident = IDENT; { String.split_on_char '/' ident }

let bvar :=
  | x = TEXT; { [x] }

let binder == list(squares(bvar))

let textual_expr == list(locate(textual_node))

let arg == braces(textual_expr)

let textual_node :=
  | ~ = TEXT; <Code.Text>
  | ~ = WHITESPACE; <Code.Text>
  | ~ = head_node; <Fun.id>

let fun_spec == ~ = ident; ~ = binder; ~ = arg; <>

let head_node :=
  (* ... *)
  | DEF; (~,~,~) = fun_spec; <Code.Def>
From the grammar, the \def syntax consists of roughly three parts: ident, binder, and arg. Consider a complete \def instance:
Example. Forester example of \def with binder [fst-0004]
- May 19, 2024
- Jinser Kafka
\def\lam[x]{ \lambda\x\mathpunct{.} }
Let's look at the components one by one:
- \def is the keyword, marking the start of a macro definition.
- \lam is the ident, the name of the macro being defined.
- [x] is the binder, the position where parameter names are bound; here the bound parameter name is x.
- \lambda\x\mathpunct{.} is the arg. The parser calls it arg perhaps to suggest that it is the argument of \def, but in practice it can be read as the body of the macro being defined. Looking at the body, we can see the parameter bound in the binder being used: \x.
This macro named \lam, defined via \def, is designed to be used in a \(\LaTeX \) environment, which is entered with ##{ }.
##{
  \begin{aligned}
    \lam{x} \\
    \lam{x y} \\
    \lam{x}\lam{y} \\
  \end{aligned}
}
Note how arguments are passed to the \lam macro: write a pair of braces after the macro, and any characters inside the braces are passed to the macro as its argument. The three calls here pass x, x y, and then x and y, respectively.
##{
  \begin{aligned}
    \lambda\x\mathpunct{.} \\
    \lambda\x y\mathpunct{.} \\
    \lambda\x\mathpunct{.}\lambda\y\mathpunct{.} \\
  \end{aligned}
}
Looking at the expanded form, each argument appears exactly where \x used to be; this can be seen as a kind of rewriting or substitution.
The rendered result looks like this:
\[
  \begin{aligned}
    \lambda x\mathpunct{.} \\
    \lambda x y\mathpunct{.} \\
    \lambda x\mathpunct{.}\lambda y\mathpunct{.} \\
  \end{aligned}
\]
\def can take more than one parameter; writing any number of parameters in the binder position is allowed.
Example. Forester example of \def with multi binder [fst-0007]
- May 19, 2024
- Jinser Kafka
\def\SubSup[arg1][arg2]{#{_{\arg1}^{\arg2}}}
Each binder must be wrapped in square brackets, with each bound parameter name inside its own pair; the bound names can then be used in the arg.
\SubSup{x}{y} => \(_{x}^{y}\).
Arguments are likewise passed by writing multiple pairs of braces; they correspond to the parameters in order.
Finally, an even simpler example:
Example. Forester example of \def without binder [fst-0006]
- May 19, 2024
- Jinser Kafka
\def\forester{ [ocaml-forester](https://git.sr.ht/~jonsterling/ocaml-forester) }
In this example the binder is omitted, which means the macro binds no parameters. Using a macro that takes no arguments is simple and direct: just don't write braces to supply any.
\forester is an experimental forest markup language.
This expands to:
[ocaml-forester](https://git.sr.ht/~jonsterling/ocaml-forester) is an experimental forest markup language.
ocaml-forester is an experimental forest markup language.
Notes on Forester with Neovim / Nix [jsr-0004]
- May 1, 2024
- Jinser Kafka
The navigation problem of a flat file tree can be (partially) solved with virtual nesting; unless the forest is very large, it works quite well.
Forester nesting tree [fst-0001]
- April 9, 2024
- Jinser Kafka
Forester's trees are flat, and organizing them as a file tree is not recommended, so once there are many trees, navigating with a file tree becomes hard. I don't (yet) have the habit of using fuzzy-finder-style plugins, so I configured virtual file nesting at the file-tree plugin level instead.
Neo-tree file nesting [nvim-0001]
- April 9, 2024
- Jinser Kafka
:help neo-tree-file-nesting
nesting_rules = {
  ["forester"] = {
    pattern = "(.*)-0001%.tree$",
    files = { "%1-*.tree" },
  },
},
hls-0001.tree
hls-0002.tree
hls-0003.tree
hls-0004.tree
TREENAME-0001.tree
TREENAME-0002.tree
TREENAME-0003.tree
...
It works quite well. There is one small problem, though: git status is only rendered on files and on the folders containing them; the nesting parent is not a folder, and the pattern file, being just a file itself, shouldn't be rendered with its children's status either.
Neo-tree file nesting git-status issue [nvim-0002]
- April 9, 2024
- Jinser Kafka
expand nested files
trees/
hls-0001.tree
hls-0002.tree
hls-0003.tree
hls-0004.tree
...
TREENAME-0001.tree
TREENAME-0002.tree
TREENAME-0003.tree
...
collapse nested files
trees/
hls-0001.tree
...
TREENAME-0001.tree
It's a bit of a dilemma; so far I have thought of two approaches.
- Dynamic rendering: once the files are collapsed, render git status on the pattern file as well. This is probably problematic, though: folder status presumably also comes from git status, and it seems hard to decide what status the nesting pattern file should get.
- Virtual folders: for Forester's use case, wouldn't a virtual folder actually be better? But that still wouldn't let git compute the folder's status.
Fix Forester Nix flake on x86_64-darwin [jsr-0005]
- May 1, 2024
- Jinser Kafka
Forester gained Nix flake support in 3.0; unfortunately, neither jon nor kento uses x86_64-darwin. As the second-class citizen of the second-class citizen nix-darwin, x86_64-darwin was unluckily broken.
Fix Forester opam-nix ocaml-cf broken [fst-0002]
- May 1, 2024
- Jinser Kafka
The lib_gen/dune file of ocaml-cf appears to be written incorrectly, causing Forester to fail to build on x86_64-darwin.
(rule
 (targets detect.exe)
 (enabled_if
  (= %{system} macosx))
 (deps
  detect.c
  (package ctypes))
 (action
  (run %{cc} -I %{ocaml_where} -I %{lib:ctypes:} -o %{targets} %{deps})))
Done: 31% (18/58, 40 left) (jobs: 0)
File "lib_gen/dune", line 27, characters 0-185:
27 | (rule
28 |  (targets detect.exe)
29 |  (enabled_if
30 |   (= %{system} macosx))
31 |  (deps
32 |   detect.c
33 |   (package ctypes))
34 |  (action
35 |   (run %{cc} -I %{ocaml_where} -I %{lib:ctypes:} -o %{targets} %{deps})))
Error: File unavailable: /nix/store/xb81cf7qgnyvsyvp9nfj15dmqchc24gk-ctypes-0.22.0/doc/ctypes/CHANGES.md
I don't understand how dune works, but my guess is that (deps detect.c (package ctypes)) treats ctypes as a source dependency to be built. I simply deleted (package ctypes), and everything worked.
I opened a PR on GitHub.
Fix Forester Nix flake on x86_64-darwin link error [fst-0003]
- May 1, 2024
- Jinser Kafka
Building with forester's original Nix flake on x86_64-darwin, the final link stage fails with ld errors: Unresolved Symbol(s): `_preadv` and Unresolved Symbol(s): `_pwritev`.
Well, the cause is that on x86_64-darwin nixpkgs defaults to the 10.12 Apple SDK, a very old version that indeed lacks these two symbols. The fix is to override the default 10.12 SDK with a newer version, usually 11.
Opam-nix override stdenv [nix-0008]
- May 1, 2024
- Jinser Kafka
pkgs = import nixpkgs {
  inherit system;
  config.replaceStdenv = { pkgs }:
    if pkgs.stdenv.isDarwin then
      pkgs.overrideSDK pkgs.stdenv "11.0"
    else
      pkgs.stdenv;
};
...
scope = on.buildOpamProject' {
  repos = [ "${opam-repository}" ];
  inherit pkgs;
} ./. query;
There is a complete example here.
2. Translations [jsr-000N]
A list of translated articles; the dates given are translation dates.
Translation. Using zippers to handle huge trees [jsr-000Q]
- December 22, 2024
- Diego Olivier Fernandez Pons with contributions from Jinser Kafka
- Original
This article is translated from Diego Olivier Fernandez Pons's article titled "Using zippers to handle huge trees".
本文翻译自 Diego Olivier Fernandez Pons 题为《Using zippers to handle huge trees》的文章
tips: Click on the translation to expand and view the original text. 点击译文可展开查看原文。有修复原文 typo。
嗨,
Bonjour,
2003年4月9日星期三,Yang Shouxun 写道:
我不知道如何编写尾递归版本来构建树。如果连续属性不多且数据集也不是很大,那树会在堆栈溢出之前停止生长。
(译者按:连续属性是指其在数据集中可以取无限多值,通常是实数,例如身高、温度等。)
On Wednesday, 09 Apr 2003, Yang Shouxun wrote:
> I don't know how to write a tail recursive version to build trees.
> If there are not that many continuous attributes and the dataset is
> not so large, the tree stops growing before stack overflow.
2003 年 4 月 9 日星期三,Markus Mottl 写道:
这里的诀窍是使用延续传递风格(CPS):传递一个包含后续计算所需的所有内容的函数闭包(延续)。子函数不返回结果,而是使用结果调用延续,这使得函数成为尾递归。
On Wednesday 09 April 2003, Markus Mottl wrote:
> The trick is to use continuation passing style (CPS): you pass a
> function closure (continuation) containing everything that's needed
> in subsequent computations. Instead of returning a result, the
> sub-function calls the continuation with the result, which makes the
> functions tail-recursive.
zipper 是一种处理巨大(实际是无限)tree 的简单方法。
Zippers are a simple way to handle huge (in fact infinite) trees.
Zipper 的解释 Explanation of zippers [#257]
- December 22, 2024
- Diego Olivier Fernandez Pons
zipper 可以被视为一种“函数式指针”,因为它能够提供:
- 纯函数式且带类型的操作
- O(1) 的指向元素访问
- O(1) 的指针移动
Zippers may be seen as 'functional pointers' since they offer :
- purely functional and typed operations
- O(1) access to the pointed element
- O(1) pointer movements
限制是数据结构只允许有一个指针,且每次指针移动都会分配内存。
The restrictions are that only one pointer is allowed by data structure and every pointer movement allocates memory.
采用二叉搜索树的经典类型声明
Take a classical type declaration for binary search trees
type 'a tree = E | N of 'a tree * 'a * 'a tree * int
考虑一棵二叉搜索树和一个你想要指向的内部节点。要对指向的子树进行 O(1) 访问,它必须可以直接从数据结构的基础中获取。然后,树的其余部分必须保存在单独的地方。我们将沿着从根到指向的节点的路径来解构它
Consider a binary search tree and an inner node to which you want to
point. To have an O(1) access to the pointed subtree, it has to be
directly available from the base of the data structure. Then, the
rest of the tree must be kept in a separate place. We will deconstruct
it along the path from the root to the pointed node
type 'a path =
| Root
| Left of 'a * 'a tree * 'a path
| Right of 'a tree * 'a * 'a path
type 'a zipper = 'a tree * 'a path
zipper 的约束如前所述:
The zipper constraint as announced :
- the pointed subtree
- the rest of the tree breaked along the path to the root
接着我们定义指针的移动(数据结构中的每个指针各一个):
Then we define the pointer movements (one for each pointer in the data structure) :
exception Bottom
exception Top
(* To be replaced by a balancing constructor *)
let makeDTree = fun l v r -> N (l, v, r, 0)
let move_left = fun (tree, path) ->
match tree with
| E -> raise Bottom
| N (l, v, r, _) -> (l, Left (v, r, path))
let move_right = fun (tree, path) ->
match tree with
| E -> raise Bottom
| N (l, v, r, _) -> (r, Right (l, v, path))
let move_up = fun (tree, path) ->
match path with
| Root -> raise Top
| Left (v, r, tail) -> (makeDTree tree v r, tail)
| Right (l, v, tail) -> (makeDTree l v tree, tail)
现在我们可以通过以下过程构建任意大的树:
Now we can build an arbitrary large tree by the following procedure :
- build a tree of bounded depth
- choose the node which will be developed next
- move the current pointer to that node
- continue building the tree
该过程使用有界的堆栈空间。
This procedure uses a bounded stack space
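The bounded-stack building procedure above can be sketched concretely. This is my own C rendering of the mail's OCaml (the names and memory layout are my assumptions; NULL plays the role of the empty tree E): it grows a left spine of any size while both loops stay iterative, so only the heap-allocated path grows, never the native call stack.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Tree {
    struct Tree *left, *right;
    int value;
} Tree;

typedef enum { WENT_LEFT, WENT_RIGHT } Dir;

typedef struct Path {            /* one frame per step down from the root */
    Dir dir;
    int value;                   /* value of the node we stepped through  */
    Tree *other;                 /* sibling subtree left behind           */
    struct Path *up;             /* rest of the path; NULL means Root     */
} Path;

typedef struct { Tree *tree; Path *path; } Zipper;

static Tree *node(Tree *l, int v, Tree *r) {
    Tree *t = malloc(sizeof *t);
    t->left = l; t->value = v; t->right = r;
    return t;
}

/* Descend into the left child, pushing one heap-allocated path frame. */
static Zipper move_left(Zipper z) {
    Path *p = malloc(sizeof *p);
    p->dir = WENT_LEFT; p->value = z.tree->value;
    p->other = z.tree->right; p->up = z.path;
    return (Zipper){ z.tree->left, p };
}

/* Pop one path frame and rebuild the parent node around the subtree. */
static Zipper move_up(Zipper z) {
    Path *p = z.path;
    Tree *t = p->dir == WENT_LEFT ? node(z.tree, p->value, p->other)
                                  : node(p->other, p->value, z.tree);
    Zipper r = { t, p->up };
    free(p);
    return r;
}

/* Build an n-node left spine with a bounded native stack: graft a node
   at the pointed empty leaf, step into its (empty) left child, repeat;
   then walk back up to the root. */
static Tree *grow(int n) {
    Zipper z = { NULL, NULL };
    for (int i = 0; i < n; i++)
        z = move_left((Zipper){ node(NULL, i, NULL), z.path });
    while (z.path != NULL)       /* move_to_top, iteratively */
        z = move_up(z);
    return z.tree;
}

/* Left-spine length, iteratively, for checking the result. */
static int spine(const Tree *t) {
    int d = 0;
    while (t != NULL) { d++; t = t->left; }
    return d;
}
```

As the mail notes, every pointer movement allocates memory: each move_left allocates one path frame, which move_up frees when it pops it.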
相关工作 Related work [#258]
- December 22, 2024
- Diego Olivier Fernandez Pons
zipper 由 Gérard Huet 提出。JFP 上有一篇论文:G. Huet. The Zipper. J. Functional Programming, 7 (5), Sept 1997, pp. 549--554。
Zippers were invented by Gérard Huet. There is a paper on the JFP
G. Huet. The Zipper. J. Functional Programming, 7 (5), Sept 1997, pp. 549--554.
他在例子中使用了 n 叉树和二叉树。这两者主要的区别在于,在二叉树中,指针的组织方式不同(他的“左”操作在 Baire 中是 (left o up))。
(译者按:Baire 是 Diego Olivier Fernandez Pons 开发的 Caml 的数据结构库。)
He uses n-ary trees and binary trees in his examples. The main
difference is that in binary trees the pointers are not organized in
the same way (his 'left' operation is what in Baire is (left o up))
Ralf Hinze 试图为函数指针提供一个通用框架,称之为网(你提供数据结构和基本转换,数据结构完成其余的工作)。
Ralf Hinze has tried to give a general framework for functional
pointers named a web (you give your data structure and the basic
transformations and the data structure does the rest)
Ralf Hinze and Johan Jeuring. Functional Pearl: Weaving a Web. Journal of Functional Programming, 11(6):681-689, November 2001。
Ralf Hinze and Johan Jeuring. Functional Pearl: Weaving a Web. Journal
of Functional Programming, 11(6):681-689, November 2001.
这篇文章可以通过 Hinze 的主页在网上获取。在我看来,他的文章并没有真正令人信服,也不是很清楚。
Available on the net via Hinze's home page.
In my opinion his article is not really convincing and not very clear.
一些已经使用 zipper 的库:
所有这些代码都可以在网络上找到。
Several libraries already use zippers
- Zen (Gérard Huet, Caml) uses zippers to handle acyclic automata
minimization
- SML/NJ Standard library (John Reppy) uses zippers to handle deletion
in red-black trees
- MetaPRL (Caml) uses zippers to handle insertion and deletion in
splay and red-black trees
- Grammatical Framework (Aarne Ranta, Haskell) uses zippers to
navigate through n-ary trees.
All this code is available on the web.
代码示例 Examples of code [#259]
- December 22, 2024
- Diego Olivier Fernandez Pons
以下是从 Baire 中提取的一些常用二叉搜索树操作的示例(insert,delete,move_pointer_to,...)(当前版本 11 avril 2003,大量错误和损坏的代码,您可以在 Baire 的下载页面中找到它):
(译者按:Baire 相关资源基本已不可用。)
Here are some examples of usual binary search trees operations made
with zippers (insert, delete, move_pointer_to, ...) extracted from
Baire (current version 11 avril 2003, plenty of bugs and breaked
code, you will find it in Baire's download pages) :
(译者按:使用 ocamlformat 重新格式化,与原文的格式稍有不同。)
let rec move_to_top = function
| (tree, path) as pointer ->
(match path with
| Root -> pointer
| Left (v, r, tail) -> move_to_top (makeDTree tree v r, tail)
| Right (l, v, tail) -> move_to_top (makeDTree l v tree, tail))
;;
let rec move_to x = function
| (tree, path) as pointer ->
(match tree with
| E ->
(match path with
| Right (_, rv, _) when x <= rv -> move_to x (move_up pointer)
| Left (lv, _, _) when x >= lv -> move_to x (move_up pointer)
| _ -> pointer)
| N (_, v, _, _) ->
(match compare x v with
| n when n < 0 ->
(match path with
| Right (_, rv, _) when x < rv -> move_to x (move_up pointer)
| Right _ | Root | Left _ -> move_to x (move_left pointer))
| n when n > 0 ->
(match path with
| Left (lv, _, _) when x > lv -> move_to x (move_up pointer)
| Left _ | Root | Right _ -> move_to x (move_right pointer))
| _ -> pointer))
;;
let rec member_path x = function
| Right (l, v, tail) ->
(match compare x v with
| n when n < 0 -> member x l
| 0 -> true
| _ -> member_path x tail)
| Left (v, r, tail) ->
(match compare x v with
| n when n > 0 -> member x r
| 0 -> true
| _ -> member_path x tail)
| Root -> false
;;
let rec zipper_member x = function
| tree, path ->
(match tree with
| E -> member_path x path
| N (l, v, r, _) ->
(match compare x v with
| n when n < 0 ->
(match path with
| Right (_, rv, _) when x <= rv -> member_path x path
| Right _ | Root | Left _ -> member x l)
| n when n > 0 ->
(match path with
| Left (lv, _, _) when x >= lv -> member_path x path
| Left _ | Root | Right _ -> member x r)
| _ -> true))
;;
let current_tree = function
| tree, _ -> tree
;;
let current_value = function
| tree, _ ->
(match tree with
| E -> None
| N (_, v, _, _) -> Some v)
;;
let current_value' = function
| tree, _ ->
(match tree with
| E -> raise Empty
| N (_, v, _, _) -> v)
;;
let rec zipper_insert x = function
| (tree, path) as pointer ->
(match tree with
| E ->
(match path with
| Right (_, rv, _) when x <= rv -> zipper_insert x (move_up pointer)
| Left (lv, _, _) when x >= lv -> zipper_insert x (move_up pointer)
| _ -> makeTree E x E, path)
| N (_, v, _, _) ->
(match compare x v with
| n when n < 0 ->
(match path with
| Right (_, rv, _) when x < rv -> zipper_insert x (move_up pointer)
| Right _ | Root | Left _ -> zipper_insert x (move_left pointer))
| n when n > 0 ->
(match path with
| Left (lv, _, _) when x > lv -> zipper_insert x (move_up pointer)
| Left _ | Root | Right _ -> zipper_insert x (move_right pointer))
| _ -> pointer))
;;
let rec zipper_delete x = function
| (tree, path) as pointer ->
(match tree with
| E ->
(match path with
| Right (_, rv, _) when x <= rv -> zipper_delete x (move_up pointer)
| Left (lv, _, _) when x >= lv -> zipper_delete x (move_up pointer)
| _ -> pointer)
| N (l, v, r, _) ->
(match compare x v with
| n when n < 0 ->
(match path with
| Right (_, rv, _) when x < rv -> zipper_delete x (move_up pointer)
| Right _ | Root | Left _ -> zipper_delete x (move_left pointer))
| n when n > 0 ->
(match path with
| Left (lv, _, _) when x > lv -> zipper_delete x (move_up pointer)
| Left _ | Root | Right _ -> zipper_delete x (move_right pointer))
| _ -> move_to x (appendB l r, path)))
;;
let rec path_to_list result = function
| Root -> result
| Left (v, r, path) -> path_to_list (result @ (v :: to_ordered_list r)) path
| Right (l, v, path) -> path_to_list (to_ordered_list_rec (v :: result) l) path
;;
let zipper_to_list = function
| tree, path -> path_to_list (to_list tree) path
;;
Translation. Guile's reader, in guile [jsr-000R]
- December 22, 2024
- Andy Wingo with contributions from Jinser Kafka
- Original
This article is translated from Andy Wingo's article titled "guile's reader, in guile".
本文翻译自 Andy Wingo 题为《guile's reader, in guile》的文章
tips: Click on the translation to expand and view the original text. 点击译文可展开查看原文。
晚上好!今天的简短(大概?)记录是一些关于 guile 书呆子的内容。
Good evening! A brief note today about some Guile nargery.
历史的弧线 the arc of history [#255]
- December 22, 2024
- Andy Wingo
就像在你打开收音机并期望听到威豹乐队时开始出现的许多语言实现一样,Guile 也有下半部分和上半部分。下半部分是用 C 编写的,暴露了一个共享库和一个可执行文件,而上半部分是用语言本身(在 Guile 的情况下是“Scheme”)编写的,并在语言实现开始时由 C 代码以某种方式加载。
Like many language implementations that started life when you could turn on the radio and expect to hear Def Leppard, Guile has a bottom half and a top half. The bottom half is written in C and exposes a shared library and an executable, and the top half is written in the language itself (Scheme, in the case of Guile) and somehow loaded by the C code when the language implementation starts.
自 2010 年左右以来,我们一直在努力将用 C 语言编写的部分改用 Scheme 编写。上周的信件讨论了动态链接的实现的替换,从使用 libltdl 库替换为在低级 dlopen 包装器之上的 Scheme 实现。我以前写过关于在 Scheme 中重写 eval 的内容,更近期则谈到在 Scheme 中实现与 C 语言实现相同性能的道路有时是漫长的。
Since 2010 or so we have been working at replacing bits written in C with bits written in Scheme. Last week's missive was about replacing the implementation of dynamic-link from using the libltdl library to using Scheme on top of a low-level dlopen wrapper. I've written about rewriting eval in Scheme, and more recently about how the road to getting the performance of C implementations in Scheme has been sometimes long.
这些重写有堂吉诃德式的一面。我内心深处有一些关于正确与错误的感觉,并且我从根本上知道从 C 转向 Scheme 是正确的事情。很多时候这种感觉都是完全不理性的,在许多情况下也显得不合时宜⸺比如,如果你有一项需要为客户完成的任务,你需要坐下来思考从这里到目标的最小步骤,而直觉对于你如何到达那里并没有多大作用。但有一个项目可以让你按照自己喜欢的方式做某件事是美好的,即使这需要 10 年,那也没关系。
These rewrites have a quixotic aspect to them. I feel something in my gut about rightness and wrongness and I know at a base level that moving from C to Scheme is the right thing. Much of it is completely irrational and can be out of place in a lot of contexts -- like if you have a task to get done for a customer, you need to sit and think about minimal steps from here to the goal and the gut doesn't have much of a role to play in how you get there. But it's nice to have a project where you can do a thing in the way you'd like, and if it takes 10 years, that's fine.
不过除了难以言表的动机之外,用 Scheme 重写一些东西还是有具体的好处的。我发现 Scheme 代码更容易维护,嗯,而且相比 C 的常见陷阱,Scheme 显然更安全。如果有一天我重写 Guile 的垃圾收集器,这会减少我的工作量。而且,Scheme 代码还具有 C 语言所没有的功能:尾部调用、可恢复的分隔延续、运行时检测等等。
But besides the ineffable motivations, there are concrete advantages to rewriting something in Scheme. I find Scheme code to be more maintainable, yes, and more secure relative to the common pitfalls of C, obviously. It decreases the amount of work I will have when one day I rewrite Guile's garbage collector. But also, Scheme code gets things that C can't have: tail calls, resumable delimited continuations, run-time instrumentation, and so on.
以定界延续为例,大约五年前,我为 Guile 编写了一个以并行并发 ML 为模型的轻量级并发设施。它允许数百万条 fibers 存在于系统上。当一个 fiber 需要阻塞 I/O 操作(读或写)时,它会暂停其延续,并在操作变得可能时安排重启它。
Taking delimited continuations as an example, five years ago or so I wrote a lightweight concurrency facility for Guile, modelled on Parallel Concurrent ML. It lets millions of fibers to exist on a system. When a fiber would need to block on an I/O operation (read or write), instead it suspends its continuation, and arranges to restart it when the operation becomes possible.
为了让这一切成真,Guile 必须做出很多改变。首先是定界延续本身。后来是 Scheme 中 ports 设施上半部分的完全重写,以允许 port 操作挂起和恢复。可恢复 fibers 的许多障碍已被消除,但 Fibers 手册仍然列出了相当多的障碍。
A lot had to change in Guile for this to become a reality. Firstly, delimited continuations themselves. Later, a complete rewrite of the top half of the ports facility in Scheme, to allow port operations to suspend and resume. Many of the barriers to resumable fibers were removed, but the Fibers manual still names quite a few.
Scheme read, in Scheme [#256]
- December 22, 2024
- Andy Wingo
这就带来了我们今天的记录:我刚刚也在 Scheme 中重写了 Guile 的 reader!reader 是获取字符流并将其解析为 S 表达式的部分。它以前是用 C 写的,现在是 Scheme。
Which brings us to today's note: I just rewrote Guile's reader in Scheme too! The reader is the bit that takes a stream of characters and parses it into S-expressions. It was in C, and now is in Scheme.
这样做的主要动机之一是希望使 read 可挂起。通过此更改,现在可以在 fibers 上实现 REPL(读取-评估-打印循环)。
One of the primary motivators for this was to allow read to be suspendable. With this change, read-eval-print loops are now implementable on fibers.
另一个动机是最终修复 Guile 无法记录某些数据源位置的 bug。Guile 过去会使用弱键哈希表来使从 read 返回的数据与源位置相关联。但这仅适用于 fresh value,不适用于小整数或字符等立即数,也不适用于 keyword 和 symbol 等全局唯一的非立即数。因此对于这些,我们就没有任何源位置。
Another motivation was to finally fix a bug in which Guile couldn't record source locations for some kinds of datums. It used to be that Guile would use a weak-key hash table to associate datums returned from read with source locations. But this only works for fresh values, not for immediate values like small integers or characters, nor does it work for globally unique non-immediates like keywords and symbols. So for these, we just wouldn't have any source locations.
该问题的一个可靠解决方案是返回带注解的对象,而不是使用另外的表。由于 Scheme 的宏扩展器已经被设置为与带注解的对象(语法对象)一起使用,因此一个新的 read-syntax 接口会非常好用。
A robust solution to that problem is to return annotated objects rather than using a side table. Since Scheme's macro expander is already set to work with annotated objects (syntax objects), a new read-syntax interface would do us a treat.
在 C 语言中实现 read 很难做到。但在 Scheme 中实现 read 则毫无问题。不过,调整扩展器以期望在语法对象内包含源位置有些繁琐,且源位置信息的增加使得输出文件的大小增大了几个百分比⸺这在部分上是 .debug_lines DWARF 数据的增加带来的,但也和宏中语法对象的序列化源位置有关。
With read in C, this was hard to do. But with read in Scheme, it was no problem to implement. Adapting the expander to expect source locations inside syntax objects was a bit fiddly, though, and the resulting increase in source location information makes the output files bigger by a few percent -- due somewhat to the increased size of the .debug_lines DWARF data, but also due to serialized source locations for syntax objects in macros.
速度方面,目前切换到 Scheme 的 read 是一个退步。旧的 reader 在这台笔记本电脑上记录源位置时每秒大概可以解析 15 或 16 MB,或者关闭源位置,那么有 22 或 23 MB/s。新的 reader 在旧模式下,使用弱键侧表记录源位置的解析速度大概为 10.5 MB/s,关闭位置时为 13.5 MB/s。新的 read-syntax 速度大约是 12 MB/s。我们将在未来几个月继续优化这些性能,但与原来的 reader 编写时的情况不同的是,现在的 reader 主要在编译时使用。(它在读取 s 表达式作为数据时仍然有用,因此仍然有理由提升其速度。)
Speed-wise, switching to read in Scheme is a regression, currently. The old reader could parse around 15 or 16 megabytes per second when recording source locations on this laptop, or around 22 or 23 MB/s with source locations off. The new one parses more like 10.5 MB/s, or 13.5 MB/s with positions off, when in the old mode where it uses a weak-key side table to record source locations. The new read-syntax runs at around 12 MB/s. We'll be noodling at these in the coming months, but unlike when the original reader was written, at least now the reader is mainly used only at compile time. (It still has a role when reading s-expressions as data, so there is still a reason to make it fast.)
与 eval 的情况一样,在加载 Scheme 版本之前,我们仍然有一个 C 版本的 reader 可用于引导目的。这次重写令人高兴的是,我能够从 C reader 中删除与非默认词法语法相关的所有缺陷,很好地简化了未来的维护。
As is the case with eval, we still have a C version of the reader available for bootstrapping purposes, before the Scheme version is loaded. Happily, with this rewrite I was able to remove all of the cruft from the C reader related to non-default lexical syntax, which simplifies maintenance going forward.
尝试逐个 bug 重写的一个有趣方面是你会发现 bug 和意外行为。比如,事实证明,从出现以来,Guile 总是不需要终止分隔符地 read #t 和 #f,因此 read "(#t1)" 将得到列表 (#t 1)。很奇怪,对吧?更奇怪的是,当 #true 和 #false 别名被添加到语言中,Guile 决定默认支持它们,但以一种奇怪的向后兼容的方式…所以 "(#false1)" 读作 (#f 1) 但 "(#falsa1)" 读作 (#f alsa1)。诸如此类的事还有不少。
An interesting aspect of attempting to make a bug-for-bug rewrite is that you find bugs and unexpected behavior. For example, it turns out that since the dawn of time, Guile always read #t and #f without requiring a terminating delimiter, so reading "(#t1)" would result in the list (#t 1). Weird, right? Weirder still, when the #true and #false aliases were added to the language, Guile decided to support them by default, but in an oddly backwards-compatible way... so "(#false1)" reads as (#f 1) but "(#falsa1)" reads as (#f alsa1). Quite a few more things like that.
总的来说,这次重写似乎是成功的,没有引入新的行为,甚至产生了相同的错误。然而,对于回溯而言,情况并非如此,因为回溯可以暴露出 read 函数的内部实现,而之前由于 C 栈对 Scheme 是不透明的,这种情况并不会发生。因此,我们可能需要在调用 read 的地方添加更合理的错误处理,因为回溯信息无论如何都不是一个好的面向用户的错误反馈。
All in all it would seem to be a successful rewrite, introducing no new behavior, even producing the same errors. However, this is not the case for backtraces, which can expose the guts of read in cases where that previously wouldn't happen because the C stack was opaque to Scheme. Probably we will simply need to add more sensible error handling around callers to read, as a backtrace isn't a good user-facing error anyway.
好吧,今晚的闲聊已经够多了。祝大家 happy hacking,晚安!
OK enough rambling for this evening. Happy hacking to all and to all a good night!
Translation. Baby’s First Garbage Collector [jsr-000P]
- December 20, 2024
- Bob Nystrom with contributions from Jinser Kafka
- Original
This article is translated from Bob Nystrom's article titled "Baby’s First Garbage Collector".
本文翻译自 Bob Nystrom 题为《Baby’s First Garbage Collector》的文章
tips: Click on the translation to expand and view the original text. 点击译文可展开查看原文。
当我感到压力大、要做的事情太多时,我会产生一种矛盾的反应:通过找其他事情来逃避。通常这件事是一个我可以编写和完成的小型独立程序。
When I get stressed out and have too much to do, I have this paradoxical reaction where I escape from that by coming up with another thing to do. Usually it’s a tiny self-contained program that I can write and finish.
一天早上,我正对我正在写的书、我在工作中必须做的事情以及我正在为《Strange Loop》准备的演讲感到害怕,突然间,我想,“我应该写一个垃圾回收器。”
The other morning, I was freaking myself out about the book I’m working on (http://gameprogrammingpatterns.com/) and the stuff I have to do at work (http://dart.dev/) and a talk I’m preparing for Strange Loop (https://www.infoq.com/presentations/dart-introduction/), and all of the sudden, I thought, “I should write a garbage collector.”
是的,我意识到那段话让我看起来有多么疯狂。但我的搭错神经带来的是编程语言实现基础的免费教程!在大约一百行常规 C 代码中,我成功地写出了一个基础的标记清除收集器,它真的能够,你知道的,收集。
Yes, I realize how crazy that paragraph makes me seem. But my faulty wiring is your free tutorial on a fundamental piece of programming language implementation! In about a hundred lines of vanilla C, I managed to whip up a basic mark-and-sweep (https://en.wikipedia.org/wiki/Tracing_garbage_collection#Na%C3%AFve_mark-and-sweep) collector that actually, you know, collects.
一般认为,垃圾回收是编程中充满挑战的领域之一,不过在这篇文章里,我会引导你走上一条相对平坦的道路。(路上仍然可能有坑,但至少会浅一点)
Garbage collection is considered one of the more shark-infested waters of programming, but in this post, I’ll give you a nice kiddie pool to paddle around in. (There may still be sharks in it, but at least it will be shallower.)
少使用、物尽其用、循环再用 Reduce, reuse, recycle [#260]
- December 20, 2024
- Bob Nystrom
垃圾回收背后的基本思想是该语言(在大多数情况下)似乎可以访问无限的内存。开发者可以不断地分配、分配、分配,就像变魔术一样,内存永远不会用完。
The basic idea behind garbage collection is that the language (for the most part) appears to have access to infinite memory. The developer can just keep allocating and allocating and allocating and, as if by magic, never run out.
当然,计算机并不具有无限的内存。因此,实现这个魔术的方式是,当它需要分配一些内存并意识到内存不足时,它会回收垃圾。
Of course, machines don’t have infinite memory. So the way the implementation does this is that when it needs to allocate a bit of memory and it realizes it’s running low, it collects garbage.
在这个语境中,“垃圾”是指之前分配的、现在不再使用的内存。为了让无限内存的幻觉成立,语言需要对“不再被使用”的判断非常安全。如果在你的程序还试图访问某些对象时,它们就开始被回收,那可不好玩。
“Garbage” in this context means memory it previously allocated that is no longer being used. For the illusion of infinite memory to work, the language needs to be very safe about “no longer being used”. It would be no fun if random objects just started getting reclaimed while your program was trying to access them.
为了可回收,该语言必须确保程序无法再次使用该对象。如果程序无法获取该对象的引用,那么它显然无法再次使用它。所以“使用中”的定义实际上非常简单:
In order to be collectible, the language has to ensure there’s no way for the program to use that object again. If the program can’t get a reference to the object, then it obviously can’t use it again. So the definition of “in use” is actually pretty simple:
1. Any object being referenced by a variable still in scope is in use.
2. Any object referenced by another in-use object is in use.
第二条规则是递归规则。如果对象 A 被变量引用,并且它有一些引用对象 B 的字段,则 B 正在使用中,因为你可以通过 A 访问它。
The second rule is the recursive one. If object A is referenced by a variable, and it has some field that references object B, then B is in use since you can get to it through A.
最后我们得到的是可达对象的图⸺可达对象即所有可以从一个变量开始,遍历其对象来到达的对象。任何不在可达对象图中的对象对于程序来说都是死的,它的内存已经时机成熟,可以被回收了。
The end result is a graph of reachable objects—all of the objects in the world that you can get to by starting at a variable and traversing through objects. Any object not in that graph of reachable objects is dead to the program and its memory is ripe for a reaping.
标记清除 Marking and sweeping [#261]
- December 20, 2024
- Bob Nystrom
有很多不同的方法可以实现查找和回收所有未使用对象的过程,但其中最简单、也是最早被发明的算法称为“标记清除”。它是由 Lisp 和大胡子的发明者约翰·麦卡锡(John McCarthy)发明的,所以今天实现这个算法有点像与一位远古之神交流,希望不是以某种洛夫克拉夫特式的方式,否则你的思想和视网膜最终会被炸得一干二净。
There are a bunch of different ways (https://en.wikipedia.org/wiki/Tracing_garbage_collection) you can implement the process of finding and reclaiming all of the unused objects, but the simplest and first algorithm ever invented for it is called “mark-sweep”. It was invented by John McCarthy, the man who invented Lisp and beards, so implementing today is like communing with one of the Elder Gods, but hopefully not in some Lovecraftian way that ends with you having your mind and retinas blasted clean.
该算法的工作原理几乎与我们对可达性的定义完全相同:
The algorithm works almost exactly like our definition of reachability:
1. Starting at the roots, traverse the entire object graph. Every time you reach an object, set a “mark” bit on it to true.
2. Once that’s done, find all of the objects whose mark bits are not set and delete them.
就是这样。我知道,你本来可以想出这个的,对吧?如果你这么做了,你就会成为一篇被引用数百次的论文的作者。这件事给我们的教训是,要在 CS 领域出名,你不必想出真正晦涩难懂的东西,你只需首先想出明显的东西即可。
That’s it. I know, you could have come up with that, right? If you had, you’d be the author of a paper cited hundreds of times. The lesson here is that to be famous in CS, you don’t have to come up with really obscure stuff, you just have to come up with obvious stuff first.
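The two steps can be condensed into a miniature sketch. This is my own illustration, not the article's implementation (which it builds up properly in C later): an assumed object layout with up to two outgoing references, an intrusive list of every allocated object, and a sweep that merely counts garbage instead of freeing it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Obj {
    struct Obj *refs[2];   /* outgoing references (may be NULL)     */
    struct Obj *next;      /* intrusive list of all allocated objects */
    int marked;
} Obj;

/* Step 1: from a root, traverse the graph, setting mark bits.
   The marked check also terminates on cycles. */
static void mark(Obj *o) {
    if (o == NULL || o->marked) return;
    o->marked = 1;
    mark(o->refs[0]);
    mark(o->refs[1]);
}

/* Step 2: anything on the all-objects list whose mark bit is not set
   is garbage; here we count it and reset marks for the next cycle. */
static int sweep_count(Obj *all) {
    int garbage = 0;
    for (Obj *o = all; o != NULL; o = o->next) {
        if (!o->marked) garbage++;
        o->marked = 0;
    }
    return garbage;
}
```

For example, with b as the only root, b referencing a, and a third object c sitting on the list but unreachable, marking from b and then sweeping reports exactly one garbage object.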
一对对象 A pair of objects [#262]
- December 20, 2024
- Bob Nystrom
在我们开始实现这两个步骤之前,让我们先做一些准备工作。我们不会真正实现一种语言的解释器⸺没有语法分析器、字节码或任何诸如此类的蠢事⸺但我们确实需要一些最少量的代码来创建一些可供回收的垃圾。
Before we can get to implementing those two steps, let’s get a couple of preliminaries out of the way. We won’t be actually implementing an interpreter for a language—no parser, bytecode, or any of that foolishness—but we do need some minimal amount of code to create some garbage to collect.
假设我们正在为一种小语言编写一个解释器。它是动态类型的,有两种类型的对象:整数(int)和对(pair)。这是一个用来标记对象类型的枚举:
Let’s play pretend that we’re writing an interpreter for a little language. It’s dynamically typed, and has two types of objects: ints and pairs. Here’s an enum to identify an object’s type:
typedef enum {
  OBJ_INT,
  OBJ_PAIR
} ObjectType;
一个对可以是一对任何东西:两个整数、一个整数和另一个对,等等。仅凭这一点,你就能走得出奇地远。由于 VM 中的对象可以是其中任何一种,C 中实现它的典型方法是使用标签联合(tagged union)。我们这样定义它:
A pair can be a pair of anything, two ints, an int and another pair, whatever. You can go surprisingly far (http://www.flickr.com/photos/raganwald/212588975/) with just that. Since an object in the VM can be either of these, the typical way in C to implement it is with a tagged union (http://en.wikipedia.org/wiki/Tagged_union). We define it thusly:
typedef struct sObject {
  ObjectType type;

  union {
    /* OBJ_INT */
    int value;

    /* OBJ_PAIR */
    struct {
      struct sObject* head;
      struct sObject* tail;
    };
  };
} Object;
主要的 Object 结构中有一个 type 字段,用于标识它的值类型⸺整数或对。接着它有一个联合(union)来保存整数或对的数据。如果你的 C 已经生疏:联合是一个各字段在内存中相互重叠的结构体。由于给定的对象只能是整数或对之一,因此没有理由在单个对象中同时为全部三个字段分配内存。联合正是这么做的。非常好。
The main Object struct has a type field that identifies what kind of value it is—either an int or a pair. Then it has a union to hold the data for the int or pair. If your C is rusty, a union is a struct where the fields overlap in memory. Since a given object can only be an int or a pair, there’s no reason to have memory in a single object for all three fields at the same time. A union does that. Groovy.
迷你虚拟机
A minimal virtual machine
[#263]
- December 20, 2024
- Bob Nystrom
现在我们可以在一个小虚拟机中使用这个数据类型了。虚拟机在这个故事中的作用是维护一个栈,用来存储当前处于作用域内的变量。大多数语言的虚拟机要么是基于栈的(如 JVM 和 CLR),要么是基于寄存器的(如 Lua)。但在这两种情况下,实际上都仍然存在一个栈,用于存储局部变量和表达式求值中途所需的临时变量。我们把它显式而简单地建模如下:
Now we can use that datatype in a little virtual machine. The VM’s role in this story is to have a stack that stores the variables that are currently in scope. Most language VMs are either stack-based (like the JVM and CLR) or register-based (like Lua). In both cases, there is actually still a stack. It’s used to store local variables and temporary variables needed in the middle of an expression. We model that explicitly and simply like so:
#define STACK_MAX 256

typedef struct {
  Object* stack[STACK_MAX];
  int stackSize;
} VM;
现在已经有了基本的数据结构,让我们拼凑代码来创建一些东西。首先,编写一个创建并初始化虚拟机的函数:
Now that we’ve got our basic data structures in place, let’s slap together a bit of code to create some stuff. First, let’s write a function that creates and initializes a VM:
VM* newVM() {
  VM* vm = malloc(sizeof(VM));
  vm->stackSize = 0;
  return vm;
}
一旦我们有了虚拟机,我们就需要能够操作它的栈:
Once we have a VM, we need to be able to manipulate its stack:
void push(VM* vm, Object* value) {
  assert(vm->stackSize < STACK_MAX && "Stack overflow!");
  vm->stack[vm->stackSize++] = value;
}

Object* pop(VM* vm) {
  assert(vm->stackSize > 0 && "Stack underflow!");
  return vm->stack[--vm->stackSize];
}
现在我们可以将东西放入“变量”中,我们需要能够实际创建对象。首先,一个小辅助函数:
Now that we can stick stuff in “variables”, we need to be able to actually create objects. First, a little helper function:
Object* newObject(VM* vm, ObjectType type) {
  Object* object = malloc(sizeof(Object));
  object->type = type;
  return object;
}
这会执行实际的内存分配并设置类型标记。我们稍后会重新讨论这个问题。使用这个函数,我们可以编写其他函数将每种类型的对象推送到虚拟机的栈上:
That does the actual memory allocation and sets the type tag. We’ll be revisiting this in a bit. Using that, we can write functions to push each kind of object onto the VM’s stack:
void pushInt(VM* vm, int intValue) {
  Object* object = newObject(vm, OBJ_INT);
  object->value = intValue;
  push(vm, object);
}

Object* pushPair(VM* vm) {
  Object* object = newObject(vm, OBJ_PAIR);
  object->tail = pop(vm);
  object->head = pop(vm);
  push(vm, object);
  return object;
}
这就是我们的小虚拟机。如果我们有一个语法分析器和一个解释器来调用这些函数,我们手上就有了一门名副其实(honest to God)的语言。而且,如果我们有无限的内存,它甚至能够运行真正的程序。既然我们没有,那就开始回收一些垃圾吧。
And that’s it for our little VM. If we had a parser and an interpreter that called those functions, we’d have an honest to God language on our hands. And, if we had infinite memory, it would even be able to run real programs. Since we don’t, let’s start collecting some garbage.
标志标记
Marky mark
[#264]
- December 20, 2024
- Bob Nystrom
第一阶段是标记。我们需要遍历所有可达对象并设置它们的标记位。因此我们需要做的第一件事,是给 Object 添加一个标记位:
The first phase is marking. We need to walk all of the reachable objects and set their mark bit. The first thing we need then is to add a mark bit to Object:
typedef struct sObject {
  unsigned char marked;
  /* Previous stuff... */
} Object;
当我们创建新对象时,我们相应修改 newObject(),把 marked 初始化为零。
When we create a new object, we modify newObject() to initialize marked to zero.
为了标记所有可到达的对象,我们从内存中的变量开始,这意味着遍历栈。看起来像这样:
To mark all of the reachable objects, we start with the variables that are in memory, so that means walking the stack. That looks like this:
void markAll(VM* vm)
{
  for (int i = 0; i < vm->stackSize; i++) {
    mark(vm->stack[i]);
  }
}
它转而调用 mark。我们将分阶段构建这个函数。首先是:
That in turn calls mark. We’ll build that in phases. First:
void mark(Object* object) {
  object->marked = 1;
}
这是字面上最重要的一点(bit)。我们已将对象本身标记为可达。但记住,我们还需要处理对象中的引用⸺可达性是递归的。如果该对象是一个对,则它的两个字段也是可达的。处理递归很简单:
This is the most important bit, literally. We’ve marked the object itself as reachable. But remember we also need to handle references in objects—reachability is recursive. If the object is a pair, its two fields are reachable too. Handling that is simple:
void mark(Object* object) {
  object->marked = 1;

  if (object->type == OBJ_PAIR) {
    mark(object->head);
    mark(object->tail);
  }
}
你看到这里有一个 bug 了吗?我们现在正在递归,但我们没有检查成环。如果你有一堆在循环中相互指向的对,这将使 C 调用栈溢出并崩溃。
There’s a bug here. Do you see it? We’re recursing now, but we aren’t checking for cycles. If you have a bunch of pairs that point to each other in a loop, this will overflow the C callstack and crash.
为了解决这个问题,我们只需在到达已经处理过的对象时退出即可。所以完整的 mark() 函数是:
To handle that, we simply need to bail out if we get to an object that we’ve already processed. So the complete mark() function is:
void mark(Object* object) {
  /* If already marked, we're done. Check this first
     to avoid recursing on cycles in the object graph. */
  if (object->marked) return;

  object->marked = 1;

  if (object->type == OBJ_PAIR) {
    mark(object->head);
    mark(object->tail);
  }
}
现在我们可以调用 markAll(),它将正确标记内存中每个可达的对象。我们已经完成一半了!
Now we can call markAll() and it will correctly mark every reachable object in memory. We’re halfway done!
清楚清除
Sweepy sweep
[#265]
- December 20, 2024
- Bob Nystrom
下一阶段是遍历(sweep)我们分配过的所有对象,并释放其中未被标记的对象。但这里有一个问题:根据定义,所有未标记的对象都是不可达的,我们根本无法访问到它们!
The next phase is to sweep through all of the objects we’ve allocated and free any of them that aren’t marked. But there’s a problem here: all of the unmarked objects are, by definition, unreachable! We can’t get to them!
我们的虚拟机已经实现了语言中对象引用的语义,因此我们只在变量和对的字段中存储指向对象的指针。一旦某个对象不再被其中任何一个指向,虚拟机就彻底失去了它,实际上就泄漏了内存。
The VM has implemented the language’s semantics for object references, so we’re only storing pointers to objects in variables and the pair fields. As soon as an object is no longer pointed to by one of those, the VM has lost it entirely and actually leaked memory.
解决这个问题的技巧是,虚拟机可以拥有自己的对象引用,与语言用户可见的语义不同。换句话说,我们可以自己追踪它们。
The trick to solve this is that the VM can have its own references to objects that are distinct from the semantics that are visible to the language user. In other words, we can keep track of them ourselves.
最简单的方法是为我们分配过的每个对象维护一个链表。我们将 Object 本身扩展为该链表中的一个节点:
The simplest way to do this is to just maintain a linked list of every object we’ve ever allocated. We extend Object itself to be a node in that list:
typedef struct sObject {
  /* The next object in the list of all objects. */
  struct sObject* next;
  /* Previous stuff... */
} Object;
虚拟机追踪该链表的头节点:
The VM keeps track of the head of that list:
typedef struct {
  /* The first object in the list of all objects. */
  Object* firstObject;
  /* Previous stuff... */
} VM;
在 newVM() 中,我们确保将 firstObject 初始化为 NULL。每当我们创建一个对象时,我们都会将其添加到链表中:
In newVM() we make sure to initialize firstObject to NULL. Whenever we create an object, we add it to the list:
Object* newObject(VM* vm, ObjectType type) {
  Object* object = malloc(sizeof(Object));
  object->type = type;
  object->marked = 0;

  /* Insert it into the list of allocated objects. */
  object->next = vm->firstObject;
  vm->firstObject = object;

  return object;
}
这样,即使语言找不到对象,但语言实现仍然可以。要扫描并删除未标记的对象,我们遍历列表:
This way, even if the language can’t find an object, the language implementation still can. To sweep through and delete the unmarked objects, we traverse the list:
void sweep(VM* vm)
{
  Object** object = &vm->firstObject;
  while (*object) {
    if (!(*object)->marked) {
      /* This object wasn't reached, so remove it from the list
         and free it. */
      Object* unreached = *object;
      *object = unreached->next;
      free(unreached);
    } else {
      /* This object was reached, so unmark it (for the next GC)
         and move on to the next. */
      (*object)->marked = 0;
      object = &(*object)->next;
    }
  }
}
由于指向指针的指针,这段代码读起来有点棘手,但如果你仔细琢磨,会发现它非常直接。它只是遍历整个链表:每当遇到未标记的对象,就释放其内存并将其从链表中摘除。完成后,我们就删除了所有不可达的对象。
That code is a bit tricky to read because of that pointer to a pointer, but if you work through it, you can see it’s pretty straightforward. It just walks the entire linked list. Whenever it hits an object that isn’t marked, it frees its memory and removes it from the list. When this is done, we will have deleted every unreachable object.
恭喜!我们有了一个垃圾回收器!只差最后一块:实际调用它。首先让我们把这两个阶段包在一起:
Congratulations! We have a garbage collector! There’s just one missing piece: actually calling it. First let’s wrap the two phases together:
void gc(VM* vm) {
  markAll(vm);
  sweep(vm);
}
你找不到比这更直白的标记清除实现了。最棘手的部分是弄清楚什么时候真正调用它。“内存不足”到底意味着什么,尤其是在虚拟内存接近无限的现代计算机上?
You couldn’t ask for a more obvious mark-sweep implementation. The trickiest part is figuring out when to actually call this. What does “low on memory” even mean, especially on modern computers with near-infinite virtual memory?
事实证明,这里没有精确的正确或错误答案。这实际上取决于你使用虚拟机的目的以及它运行的硬件类型。为了使这个示例简单,我们将在一定次数的分配后进行回收。这实际上就是某些语言实现的工作原理,而且很容易实现。
It turns out there’s no precise right or wrong answer here. It really depends on what you’re using your VM for and what kind of hardware it runs on. To keep this example simple, we’ll just collect after a certain number of allocations. That’s actually how some language implementations work, and it’s easy to implement.
我们扩展 VM 来追踪我们创建了多少个对象:
We extend VM to track how many we’ve created:
typedef struct {
  /* The total number of currently allocated objects. */
  int numObjects;
  /* The number of objects required to trigger a GC. */
  int maxObjects;
  /* Previous stuff... */
} VM;
然后初始化它们:
And then initialize them:
VM* newVM() {
  /* Previous stuff... */
  vm->numObjects = 0;
  vm->maxObjects = INITIAL_GC_THRESHOLD;
  return vm;
}
INITIAL_GC_THRESHOLD 是我们触发第一次 GC 时的对象数量。较小的数字在内存方面更加保守,较大的数字在垃圾回收上花费的时间较少。按个人口味调整。
The INITIAL_GC_THRESHOLD will be the number of objects at which we kick off the first GC. A smaller number is more conservative with memory, a larger number spends less time on garbage collection. Adjust to taste.
每当我们创建一个对象时,都递增 numObjects,并在其达到最大值时运行一次回收:
Whenever we create an object, we increment numObjects and run a collection if it reaches the max:
Object* newObject(VM* vm, ObjectType type) {
  if (vm->numObjects == vm->maxObjects) gc(vm);

  /* Create object... */

  vm->numObjects++;
  return object;
}
我不会费心展示它,但我们还会调整 sweep(),在每次释放对象时递减 numObjects。最后,我们修改 gc() 来更新最大值:
I won’t bother showing it, but we’ll also tweak sweep() to decrement numObjects every time it frees one. Finally, we modify gc() to update the max:
void gc(VM* vm) {
  int numObjects = vm->numObjects;

  markAll(vm);
  sweep(vm);

  vm->maxObjects = vm->numObjects * 2;
}
每次回收后,我们都会根据回收后剩余的存活对象数量更新 maxObjects。这个乘数让我们的堆随着存活对象数量的增加而增长;同样,如果一堆对象最终被释放,它也会自动收缩。
After every collection, we update maxObjects based on the number of live objects left after the collection. The multiplier there lets our heap grow as the number of living objects increases. Likewise, it will shrink automatically if a bunch of objects end up being freed.
简单
Simple
[#266]
- December 20, 2024
- Bob Nystrom
你做到了!如果你跟着做了所有这些,那么你现在已经掌握了简单的垃圾回收算法。如果你想一次查看全部内容,在这里参阅完整代码。我在这里强调一下,虽然这个收集器很简单,但它不是一个玩具。
You made it! If you followed all of this, you’ve now got a handle on a simple garbage collection algorithm. If you want to see it all together, here’s the full code. Let me stress here that while this collector is simple, it isn’t a toy.
你可以在此基础上构建大量优化⸺在垃圾回收和编程语言中,优化占了 90% 的工作量⸺但这里的核心代码是一个货真价实的垃圾回收器。它与不久之前 Ruby 和 Lua 中的回收器都非常相似。你可以在生产代码中使用这样的东西。现在去构建一些很棒的东西吧!
There are a ton of optimizations you can build on top of this—in GCs and programming languages, optimization is 90% of the effort—but the core code here is a legitimate, real GC. It's very similar to the collectors that were in Ruby and Lua until recently. You can ship production code that uses something exactly like this. Now go build something awesome!
Translation. Zipper in scheme [jsr-000O]
- December 18, 2024
-
Oleg Kiselyov with contributions from Jinser Kafka
- Original
This article is translated from Oleg Kiselyov's article titled "zipper in scheme".
本文翻译自 Oleg Kiselyov 题为《zipper in scheme》的文章
Title added by translator. 标题由译者添加。
tips: Click on the translation to expand and view the original text. 点击译文可展开查看原文。
前言 [#267]
- December 18, 2024
- Oleg Kiselyov
zipper 是一种非常方便的数据结构,它允许我们替换复杂数据结构(例如树或 term)深处的项,而无需任何可变性。结果数据结构将尽可能多地与旧结构共享其组件 [见附录]。旧的数据结构仍然可用(这便于我们撤销该操作)。本质上,zipper 是一个“可更新”但纯函数式的数据结构游标。
Zipper is a very handy data structure that lets us replace an item deep in a complex data structure, e.g., a tree or a term, without any mutation. The resulting data structure will share as much of its components with the old structure as possible [see addendum]. The old data structure is still available (which can be handy if we wish to 'undo' the operation later on). Zipper is essentially an `updateable' and yet pure functional cursor into a data structure.
有用的参考:
Useful references:
http://www.nist.gov/dads/HTML/zipper.html
http://citeseer.ist.psu.edu/hinze01web.html
zipper 是具体化(reify)为数据结构的定界延续。不知何故,这个想法在 zipper 的文献中并不常见。因为 Scheme 对定界延续有一等(first-class)支持,我们可以更容易地派生和使用 zipper。
Zipper is a _delimited continuation_ reified as a data structure. Somehow that idea is not commonly discussed in the zipper literature. Because Scheme has first-class and delimited continuations, we can derive and use zipper far more easily.
本文将给出 zipper 的派生及其使用示例:交换两棵树的两个分支。该使用示例是遗传编程中典型的交叉操作。
Given below is a derivation of zipper and an example of its use: swapping two branches of two trees. The latter is a typical cross-over operation in genetic programming.
noelwelsh@gmail.com (Noel Welsh) 在消息中写
news:<c2b7813c.0410230826.3b0f71f0@posting.google.com>...
我想以纯函数式的方式进行交叉,我设想的算法是:
noelwelsh@gmail.com (Noel Welsh) wrote in message
news:<c2b7813c.0410230826.3b0f71f0@posting.google.com>...
I would like to do the crossover in a purely functional manner. The
algorithm I envisage is:
- Count the number of nodes for each of the two trees
- Choose an random integer between 0 ... (number of nodes - 1)
This is the node where crossover will take place.
正如上文指出的,我们不需要通过计数来从树中选择随机节点。选好节点后,我们可以使用 eq? 测试向下 zip 到树中的该节点。然而在下文中,为了简单起见,我们跳过随机选择,改为按深度优先顺序的节点索引来选择节点。
As pointed out earlier, we don't need counting to select a random node from a tree. After we selected the node, we can zip down to that node in the tree using the eq? test. In the following however, we skip the random selection for simplicity and thus we shall be selecting nodes by their index in the depth-first order.
派生 [#268]
- December 18, 2024
- Oleg Kiselyov
为了派生 zipper,我们首先编写更熟悉的深度优先遍历例程:
To derive zipper, we first write the familiar depth-first traversal routine:
Welcome to Scheme 48 1.1
> ,open srfi-9
> ,open escapes signals
> ,load /usr/local/lib/scheme48/misc/shift-reset.scm
; deterministic, left-to-right map
(define (map* f l)
  (if (null? l) l
      (cons (f (car l)) (map* f (cdr l)))))

(define (depth-first handle tree)
  (cond
    ((null? tree) tree)
    ((handle tree) => (lambda (new-tree) new-tree))
    ; the node was not handled -- descend
    ((not (pair? tree)) tree) ; an atom
    (else
      (cons (car tree) ; node name
            (map* (lambda (kid) (depth-first handle kid)) (cdr tree))))))
函数 handle 作为 depth-first 的第一个参数,接受一个节点,并应返回一个节点或 #f。在第一种情况下,返回的节点会替换结果树中的现有节点。如果 handle 返回 #f,则表明它拒绝处理该节点,因此我们保留该节点,并在可能的情况下下降处理它的子节点。
The function handle, the first-argument of depth-first, receives a node and should yield either a node or #f. In the first case, the return node replaces the existing node in the result tree. If the handle returned #f, it has declined to handle that node, so we keep that node and descend into it, if possible.
为了解其工作原理,我们定义两个示例树并打印出它们的节点:
To see how this works, we define two sample trees and print out their nodes:
(define tree1 '(a (b) (c (d 1 2)) e))
(define tree2 '(z (u) (v (w 10 12)) y))
(depth-first (lambda (node) (display node) (newline) #f) tree1)
==> prints
(a (b) (c (d 1 2)) e)
(b)
(c (d 1 2))
(d 1 2)
1
2
e
==> yields
'(a (b) (c (d 1 2)) e)
我们现在可以定义数据类型 zipper:
We can now define the zipper data structure:
(define-record-type zzipper
  (zipper curr-node k)
  zipper?
  (curr-node z-curr-node)
  (k z-k))
它包含两个字段:树的当前节点,以及延续。延续应该接收一个节点或 #f。在前一种情况,接收到的节点将替换现有节点;在后一种情况,我们保留现有节点并继续遍历。这个延续返回一个新的 zipper,或者一棵树(如果遍历完成)。可以看出,zipper 在某种意义上是函数 handle 的“逆”。
It contains two fields: the current node of a tree, and the continuation. The continuation should receive a node or #f. In the former case, the received node will replace the existing node. In the latter case, we keep the existing node and continue the traversal. The continuation returns either a new zipper, or a tree (if the traversal is finished). One can see that zipper is in a sense an 'inverse' of the function handle.
(define (zip-tree tree)
  (reset (depth-first (lambda (tree) (shift f (zipper tree f))) tree)))
正如所承诺的,zipper 确实是定界延续的一种表现形式。
As promised, zipper is indeed a manifestation of a delimited continuation.
我们应该指出,record zipper 和构造函数 zip-tree 都是 generic 的。它们本身既不依赖树的表示,也不依赖遍历策略。有关树数据结构及其遍历的所有信息都被封装在唯一一个函数 depth-first 中。我们只需改变 depth-first,就可以从深度优先策略切换到广度优先策略,或者从嵌套列表切换到嵌套向量实现的树。无论是 zipper、zip-tree,还是任何使用 zipper 的代码(见下文),都不需要任何修改。我们的 zipper 的这一特性与 Gerard Huet 的 zipper 派生形成了鲜明对比。在 Gerard Huet 的形式中,zipper 确实取决于数据类型的具体实现:zipper 是从数据类型 derived(双关语)的(译者按:派生和求导)。不同的数据类型(以及抽象数据类型的不同实现)将具有不同的对应 zipper 结构。在我们的形式中,zipper 是遍历函数上的 generic derivation(双关语)(译者按:派生和导数)。zipper 是遍历函数的导数,而且就只是机械的导数。因此,shift/reset 可以被视为遍历函数的导数运算符。
We should point out that both the zipper record and the constructor function zip-tree are _generic_. They by themselves depend neither on the representation of the tree nor on the traversal strategy. All the information about the tree data structure and its traversal is encapsulated in one single function depth-first. We can switch from depth-first to breadth-first strategy or from a nested list to a nested vector realization of trees just by changing depth-first. Neither zipper, nor zip-tree, nor any code that uses zipper (see below) will require any modifications. This property of our zipper is in a marked contrast with Gerard Huet's derivation of zipper. In Gerard Huet's formulation, zipper does depend on the concrete realization of the data type: zipper is derived (pun intended) from the data type. Different data types (and different realizations of an abstract data type) will have different corresponding zipper structures. In our formulation, zipper is a _generic_ derivation (pun intended) on the traversal function. Zipper is a derivative of the traversal function -- mechanical derivative at that. So, shift/reset can be considered traversal function derivative operators.
使用 [#269]
- December 18, 2024
- Oleg Kiselyov
我们现在可以用一种不同的方式打印树:
We can now print out the tree in a different way:
(define (print-tree tree)
  (do ((cursor (zip-tree tree) ((z-k cursor) #f)))
      ((not (zipper? cursor)))
    (display (z-curr-node cursor))
    (newline)))
我们使用 zipper(即 cursor,一个游标)逐个节点地检查整棵树。从某种意义上说,我们翻转了 depth-first 的操作。
we use zipper, which is a cursor, to examine all of the tree, node by node. In a sense, we have inverted the operation of depth-first.
(print-tree tree1)
; prints as before
(print-tree tree2)
(z (u) (v (w 10 12)) y)
(u)
(v (w 10 12))
(w 10 12)
10
12
y
引入一些有用的函数
We introduce a few helpful functions
(define (zip-all-the-way-up zipper)
  (if (zipper? zipper)
      (zip-all-the-way-up ((z-k zipper) (z-curr-node zipper)))
      zipper))

(define (locate-nth-node n tree)
  (do ((i 0 (+ 1 i)) (cursor (zip-tree tree) ((z-k cursor) #f)))
      ((and (= i n)
            (if (zipper? cursor) #t
                (error "too few nodes")))
       cursor)))
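上文提到过可以用 eq? 测试向下 zip 到选定的节点;用同样的机制可以补上这样一个辅助函数(locate-node 是译者补充的假设性示例,假定 node 确实是 tree 的子项,并依赖上面的 zipper 定义):

```scheme
; Zip down to a given node, comparing with eq? rather than counting.
; E.g. (locate-node (cadr tree1) tree1) stops at the (b) node.
(define (locate-node node tree)
  (do ((cursor (zip-tree tree) ((z-k cursor) #f)))
      ((if (zipper? cursor)
           (eq? (z-curr-node cursor) node)
           (error "node not found"))
       cursor)))
```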
我们已准备好做一些事:
And we are ready for some action:
; replace the 3-d node of tree1 with 'xxx
(let ((desired-node (locate-nth-node 3 tree1)))
  (display "Replacing the node: ")
  (display (z-curr-node desired-node))
  (newline)
  (zip-all-the-way-up ((z-k desired-node) 'xxx)))
==> prints
Replacing the node: (d 1 2)
==> yields
'(a (b) (c xxx) e)
它确实替换了,不是吗?
It did replace it, didn't it?
; cross-over of the 3d node of tree1 and 1st node of tree2
(let* ((desired-node1 (locate-nth-node 3 tree1))
       (_ (begin
            (display "Cross-over the node1: ")
            (display (z-curr-node desired-node1))
            (newline)))
       (desired-node2 (locate-nth-node 1 tree2))
       (_ (begin
            (display "Cross-over the node2: ")
            (display (z-curr-node desired-node2))
            (newline)))
       (new-tree1
        (zip-all-the-way-up ((z-k desired-node1)
                             (z-curr-node desired-node2))))
       (new-tree2
        (zip-all-the-way-up ((z-k desired-node2)
                             (z-curr-node desired-node1)))))
  (display "new tree1: ") (display new-tree1) (newline)
  (display "new tree2: ") (display new-tree2) (newline))
==> prints
Cross-over the node1: (d 1 2)
Cross-over the node2: (u)
new tree1: (a (b) (c (u)) e)
new tree2: (z (d 1 2) (v (w 10 12)) y)
嗯,看来可行...
Well, it seems to work...
如果我们交换 tree1 的第 3 个节点和 tree2 的第 5 个节点,我们得到
If we swap the 3d node of tree1 and the 5th node of tree2, we get
Cross-over the node1: (d 1 2)
Cross-over the node2: 12
new tree1: (a (b) (c 12) e)
new tree2: (z (u) (v (w 10 (d 1 2))) y)
总结 [#270]
- December 18, 2024
- Oleg Kiselyov
总而言之,定界延续非常有用。可以在任何 R5RS scheme 系统中进行模拟;但如果它本身受支持,性能会更好。Scheme48 本身就支持定界延续(Martin Gasbichler and Michael Sperber, ICFP 2002)。如果你最喜欢的 Scheme 系统默认没有提供它们,请向实现者投诉。支持哪个特定的定界延续运算符(shift、control、shift0、splitter、cupto 等)并不重要——所有这些的表达性都是相同的:
To conclude, delimited continuations are quite useful. They can be emulated in any R5RS Scheme system; yet it is better for performance if they are supported natively. Scheme48 does support delimited continuations natively (Martin Gasbichler and Michael Sperber, ICFP 2002). If your favorite Scheme system does not offer them by default, please complain to the implementors. It doesn't matter which particular delimited continuation operator (shift, control, shift0, splitter, cupto, etc) is supported -- all of them are equally expressible:
- Chung-chieh Shan, Scheme2004 workshop
- http://www.eecs.harvard.edu/~ccshan/recur/
附录 [#271]
- December 18, 2024
- Oleg Kiselyov
附录,2006 年 6 月 7 日 [受到 Andrew Wilcox 问题的启发] 更准确地说,zipper 与底层枚举器一样保留了共享。以下是最大共享保留枚举器。这两个函数应该取代本文中的函数。
Addendum, June 7, 2006 [inspired by a question from Andrew Wilcox] To be more precise, the zipper preserves sharing as much as the underlying enumerator does. The following is the maximal sharing preserving enumerator. Those two functions should replace the ones in the article.
; deterministic, left-to-right map
; It preserves sharing as much as possible: that is, if given the pair
; l == (cons h t), (and (eq? h (f h)) (eq? t (map* f t))) holds, then
; (eq? (map* f l) l) holds as well.
(define (map* f l)
  (if (null? l) l
      (let ((h (car l)) (t (cdr l)))
        (let ((h1 (f h)) (t1 (map* f t)))
          (if (and (eq? h1 h) (eq? t1 t)) l
              (cons h1 t1))))))

(define (depth-first handle tree)
  (cond
    ((null? tree) tree)
    ((handle tree) => (lambda (new-tree) new-tree))
    ; the node was not handled -- descend
    ((not (pair? tree)) tree) ; an atom
    (else
      (let ((kids1
             (map* (lambda (kid) (depth-first handle kid)) (cdr tree))))
        (if (eq? kids1 (cdr tree)) tree
            (cons (car tree) ; node name
                  kids1))))))
为了测试新的 depth-first 确实保留了共享,我们求值
To test that the new depth-first indeed preserves sharing, we evaluate
(eq? tree1
     (depth-first (lambda (node) (display node) (newline) #f) tree1))
以深度优先顺序打印所有节点后,它给出结果 #t。在这种情况下,depth-first 返回的树确实就是原来那棵树。
which, after printing all nodes in depth-first order, gives the result #t. The tree returned by depth-first in this case is indeed the original tree as it is.
zipper 代码无需任何更改,照原样工作,结果相同。为了测试共享保留,我们首先通过替换 tree2 中的第 6 个节点(即 y)来生成一棵树:
The zipper code needs no changes, and it works as it was, with the same results. To test the sharing preservation, we first produce a tree by replacing the 6th node (which is y) in tree2:
(define tree2*
  (let ((desired-node (locate-nth-node 6 tree2)))
    (display "Replacing the node: ")
    (display (z-curr-node desired-node))
    (newline)
    (zip-all-the-way-up ((z-k desired-node) 'newy))))
这是结果:
(z (u) (v (w 10 12)) newy)
here's the result: (z (u) (v (w 10 12)) newy)
现在,我们编写一个函数,它接受两棵树,同步遍历它们并打印出节点以及它们是否共享:
Now, we write a function that takes two trees, traverses them in lockstep and prints out the nodes and if they are shared:
(define (tree-compare-sharing t1 t2)
  (do ((cursor1 (zip-tree t1) ((z-k cursor1) #f))
       (cursor2 (zip-tree t2) ((z-k cursor2) #f)))
      ((cond
         ((and (zipper? cursor1) (zipper? cursor2)) #f)
         ((zipper? cursor1) (display "t2 finished early") #t)
         ((zipper? cursor2) (display "t1 finished early") #t)
         (else #t)))
    (let ((n1 (z-curr-node cursor1)) (n2 (z-curr-node cursor2)))
      (cond
        ((eq? n1 n2) (display "shared node: ") (display n1))
        (else (display "t1 node: ") (display n1) (newline)
              (display "t2 node: ") (display n2)))
      (newline))))
(tree-compare-sharing tree2 tree2*)
===>
t1 node: (z (u) (v (w 10 12)) y)
t2 node: (z (u) (v (w 10 12)) newy)
shared node: (u)
shared node: (v (w 10 12))
shared node: (w 10 12)
shared node: 10
shared node: 12
t1 node: y
t2 node: newy