~/data/org-attach by
using the org mode file attachment feature. To retrieve a file, I
usually use agenda search or other org packages which can go over your
org files and retrieve a headline.
But what about those rare moments when I only have a vague idea of what I'm looking for and can't hit the query exactly? Well... we can use a semantic search.
OpenAI recently opened a Large Language Model Embedding API. Roughly speaking, embedding takes a chunk of text and returns a fixed-size vector in the model's "knowledge space" (OpenAI's ada model returns a 1536-dimensional vector). Two vectors which are close to each other in this knowledge space should correspond to similar things.
The usual measure of closeness between two vectors is the "angle" between them. This is hard to imagine in 1536 dimensions, but to keep it short: the embeddings are normalized to unit length, so the dot product of two vectors equals the cosine of the angle between them and we can use it directly as the "similarity".
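To make the "angle" idea concrete, here is a minimal pure-Elisp sketch of the dot product and the cosine similarity. The function names are made up for illustration and are not part of org-graph; this is also the slow variant that the C module described later replaces.

```elisp
(defun my-dot-product (a b)
  "Return the dot product of the numeric vectors A and B.
Both vectors are assumed to have the same length."
  (let ((sum 0.0))
    (dotimes (i (length a))
      (setq sum (+ sum (* (aref a i) (aref b i)))))
    sum))

(defun my-cosine-similarity (a b)
  "Return the cosine of the angle between the vectors A and B.
For unit-length vectors this is the same as their dot product."
  (/ (my-dot-product a b)
     (* (sqrt (my-dot-product a a))
        (sqrt (my-dot-product b b)))))

(my-dot-product [1 2 3] [4 5 6])      ; => 32.0
(my-cosine-similarity [1 0] [0 1])    ; => 0.0, perpendicular
(my-cosine-similarity [1 0] [2 0])    ; => 1.0, same direction
```

A value near 1 means the two texts point in roughly the same direction in the knowledge space, and a value near 0 means they are unrelated.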
So the idea here is simple:
I wrote my own "org brain" clone which I called org-graph because I wasn't happy with any available solution. My biggest problem was that most systems either prescribed one file per note, or only worked with top level headlines, or had other artificial limitations. So it was logical this feature would live in that package as well.
But just after I got ready with all the preprocessing and data preparation and API implementation, there came a shock. Emacs math is SLOOOOOOW. So slow this "dot product" operation was taking ages and the interface sucked.
But then I remembered about dynamic modules. Not wanting to give up, I decided to write a C module for some good old linear algebra.
You start with a header and some initialization:
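#include <assert.h>
#include <emacs-module.h>

int plugin_is_GPL_compatible;

int
emacs_module_init (struct emacs_runtime *runtime)
{
  emacs_env *env = runtime->get_environment (runtime);
  /* the module's functions get registered here (omitted, see the
     full source) */
  return 0;
}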
Then you can start implementing the functions. I'm not going to repeat everything here; you can find the full C source on GitHub. The following function computes the dot product, which is really just a lot of multiplication and addition.
static emacs_value
dot_product (emacs_env *env, ptrdiff_t nargs, emacs_value *args, void *data)
{
  assert (nargs == 2);
  emacs_value a = args[0];
  emacs_value b = args[1];
  ptrdiff_t size = env->vec_size (env, a);
  double dp = 0;
  /* get_at is a helper from the full source; it reads element i of
     the vector as a double */
  for (ptrdiff_t i = 0; i < size; i++)
    {
      double first = get_at (env, a, i);
      double second = get_at (env, b, i);
      dp += first * second;
    }
  emacs_value result = env->make_float (env, dp);
  return result;
}
With a simple Makefile
.PHONY: all
all: dotproduct.so
dotproduct.o: dotproduct.c
	gcc -Wall -c dotproduct.c
dotproduct.so: dotproduct.o
	gcc -shared -o dotproduct.so dotproduct.o
we can build the module
> make
gcc -Wall -c dotproduct.c
gcc -shared -o dotproduct.so dotproduct.o
Finally, we load the module into Emacs with:
(module-load (expand-file-name "dotproduct.so"))
Armed with the now much faster math routines, I embedded about 2000 headlines and now I can search my files by just giving very vague queries---it works surprisingly well, even across multiple different natural languages (e.g. if I ask about Franz Kafka's Castle it will return the "Das Schloß" entry from my foreign language reading file).
(org-graph-openai-query "Book about journey to the center of the Earth")
The top 10 results. You can see it mixes the languages but all the things are somewhat related. However, it got the first hit exactly right.
[8de3805f-8971-404a-98d2-84305db1a444] 87.43 Voyage au centre de la Terre
[f128f559-74a6-4435-a357-aa7d35976097] 85.43 Nicolai Klimii Iter Subterraneum
[2e0f3f7a-73bd-4b0f-a588-c757432607ca] 85.26 Endurance: Shackleton's Incredible Voyage
[d2828ffb-343c-431f-8657-521ef32de079] 85.11 Vesmír v orechovej škrupinke
[724b7e0d-f804-4d1d-8767-c4d166492472] 84.83 Oheň nad hlubinou - Pád Straumli
[5e2b17a4-902a-4e55-87f3-25a18f239346] 84.78 The Earthsea
[5e5906eb-7ab6-4b86-b41f-2bd007ff8eba] 84.65 Tolkien: Sur les rivages de la Terre du Milieu
[c33a670e-8ec2-4886-9e12-eee71891d2ea] 84.21 Vladimir Ulrich - Bis ans Ende der Welt - Ein Pilgerbuch
[d3c57fb4-9458-487d-bcf7-9852b9fec3cf] 84.09 The Lost World
[5d0d82de-df40-43f7-9868-3470d1bd376d] 83.95 Oheň nad hlubinou - Planeta spárů
Here's another example, this time with a purposefully silly description of the expected book's title:
(org-graph-openai-query "Philosophical book where someone spoke in a particular way")
The top 10 results from my org files are:
[5d59a53e-fb81-4927-bce2-5096d9fb8417] 88.69 Thus Spoke Zarathustra
[15e61234-4d6b-413c-986d-391996f09a19] 87.82 Quotes
[7922112d-e871-419f-a4f2-68e892d5dad1] 87.04 Epistemology
[82e38b44-5ce1-416f-bd19-075aa70c7bf6] 86.96 Western Philosophy
[babcdf60-5840-4772-ba2b-314faa756997] 86.94 Nietzsche
[b192f472-d4f6-4596-8520-627b2c77e783] 86.90 Treatise on Human Nature
[b6f1d479-53fb-4e88-a474-024fbafab99e] 86.75 The Short History of Modern Philosophy
[1a366e71-eb03-472b-8c7c-d619494e9693] 86.62 The way of the bow
[2d738ef8-4a23-4b32-bcc1-0a1cc941c4be] 86.55 The Discourses
[2f00f564-af67-4d04-bf41-8cabe7764cdb] 86.36 The Life of Reason - Santayana
Not bad, eh :)
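The ranking behind result lists like the ones above can be sketched in a few lines. This is only an illustration, not the org-graph implementation: the function names and the shape of the candidate list (an alist of ID and embedding vector) are assumptions, and the dot product is done in plain Elisp instead of the C module.

```elisp
(require 'seq)

(defun my-embedding-score (a b)
  "Return the dot product of the embedding vectors A and B."
  (apply #'+ (seq-mapn #'* a b)))

(defun my-embedding-top-matches (query candidates n)
  "Return the N entries of CANDIDATES most similar to QUERY.
QUERY is an embedding vector.  CANDIDATES is an alist of
\(ID . VECTOR).  The result is a list of (ID . SCORE) sorted from
the best to the worst score."
  (let ((scored (mapcar (lambda (candidate)
                          (cons (car candidate)
                                (my-embedding-score query (cdr candidate))))
                        candidates)))
    (seq-take (sort scored (lambda (x y) (> (cdr x) (cdr y)))) n)))

;; Toy two-dimensional "embeddings" instead of real 1536-dimensional ones.
(my-embedding-top-matches
 [1.0 0.0]
 '(("a" . [1.0 0.0]) ("b" . [0.0 1.0]) ("c" . [0.6 0.8]))
 2)
;; => (("a" . 1.0) ("c" . 0.6))
```

Multiplying each score by 100 would give numbers on the same scale as the listings above.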
To prepare the embeddings, you can run the function
org-graph-compute-embeddings-for-buffer in an org buffer. Make sure
to set the environment variable OPENAI_TOKEN. Also be aware that this
will add the ID property to every headline as this is a way to track
the cached embedding to the particular headline. Make sure to back up
your org files before running this (you have them checked in to git
right... right?!). This will fire several requests to the API (about
20 headings per request), so be patient, it should take about a minute
for 500-600 headings.
You can discuss and ask questions on the discussions board on GitHub.
This blog post was inspired by GPT for second brains.
org-capture user and I use about 10 templates to save the
ideas/tasks to appropriate places (work / life / emacs / other
projects / reading...). Sometimes, however, it is quite difficult to
determine at the time of capture where to put the note, or it would
take a lot of time to categorize properly... or sometimes I'm just
lazy. For these situations I use a general refile.org file. Anything
I don't want to deal with right now goes there.
Then I often end up with 200+ notes in this file and I have to deal with it somehow during my weekly reviews. Many items I simply delete, but some I refine and then refile away to where they belong.
I use about 10 huge org files to store my data and simply calling
org-refile is very slow and the number of targets grows into tens of
thousands which makes the experience sub-optimal.
I've written a simple Elisp defmacro to generate specialized versions
of org-refile where I can limit the targets to one file or a subset of
files. This is done by let-binding the org-refile-targets variable and
then calling org-refile---it will pick up the new setting. I also
automatically clear the cache because during this process I often add
or move headlines around and the cache is most of the time stale. In
practice it's not a problem because refiling to just one file is
fast enough to rebuild the cache on the go.
(defmacro my-org-make-refile-command (fn-suffix refile-targets)
  "Generate a command to call `org-refile' with modified targets."
  `(defun ,(intern (concat "my-org-refile-" (symbol-name fn-suffix))) ()
     ,(format "`org-refile' to %S" refile-targets)
     (interactive)
     (org-refile-cache-clear)
     (let ((org-refile-target-verify-function nil)
           (org-refile-targets ,refile-targets))
       (call-interactively 'org-refile))))
It's quite straight-forward, we have a defun skeleton and we splice
the name and the target there. The expansion looks like this
(my-org-make-refile-command kb '(("~/data/documents/kb.org" :maxlevel . 9)))
;; expands to
(defun my-org-refile-kb nil
  "`org-refile' to (quote ((\"~/data/documents/kb.org\" :maxlevel . 9)))"
  (interactive)
  (org-refile-cache-clear)
  (let ((org-refile-target-verify-function nil)
        (org-refile-targets '(("~/data/documents/kb.org" :maxlevel . 9))))
    (call-interactively 'org-refile)))
Throw in a cool hydra and you're all set!
(my-org-make-refile-command kb '(("~/data/documents/kb.org" :maxlevel . 9)))
(my-org-make-refile-command reading '(("~/org/reading.org" :maxlevel . 9)))
(my-org-make-refile-command this-file `((,(buffer-file-name) :maxlevel . 9)))
(defhydra my-org-refile-hydra (:color blue :hint nil)
  "
_t_his file

Special files:
---------------------
_k_b.org
_r_eading.org"
  ("k" my-org-refile-kb)
  ("r" my-org-refile-reading)
  ("t" my-org-refile-this-file))
(bind-key "C-c r" #'my-org-refile-hydra/body org-mode-map)
People who know my rants on Emacs, and especially on font-lock-mode, know that I consider it a rather crappy hack. Parsing complex context-sensitive languages with a bunch of very weak regexes[2] just screams "This is a really bad idea!" Well, either way, I was always forced to admit that yes, it is a hack, but damn does it work in practice! Very rarely is there a problem you can't solve, and if the need comes, you can actually use arbitrary elisp code as the matcher so long as it sets match-data the same way re-search-forward would.
Today I had a problem I thought would finally prove my point about how bad font-lock is and that we should all bike-shed and invent totally awesome formal parsers... then I went back to the docstring and of course Emacs can actually solve the problem.
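To illustrate the "arbitrary elisp code as the matcher" escape hatch, here is a small sketch of a function matcher. It is not from my config: the function name and the choice of highlighting TODO only inside comments are made up for the example. Font-lock calls the function with the search limit and expects it to behave like re-search-forward, that is, return non-nil, set the match data and move point.

```elisp
(defun my-match-todo-in-comment (limit)
  "Search for the word TODO inside a comment before LIMIT.
Return non-nil and leave the match data set on success, just like
`re-search-forward' would."
  (let (found)
    (while (and (not found)
                (re-search-forward "\\_<TODO\\_>" limit t))
      ;; (nth 4 (syntax-ppss)) is non-nil when point is in a comment.
      (when (nth 4 (save-match-data (syntax-ppss)))
        (setq found t)))
    found))

;; The trailing t overrides the comment face already put on the text.
(font-lock-add-keywords
 'emacs-lisp-mode
 '((my-match-todo-in-comment 0 font-lock-warning-face t)))
```

A plain regexp cannot tell a TODO in code from a TODO in a comment, while the function can check any context it likes before accepting a match.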
The issue is the following: I'm writing a DSL which looks kind of like Haskell types, but written in sexps. So where in Haskell one writes
function :: Int -> String -> (String -> Int) -> [Float]
in my DSL it would look something like
(type function :: int -> string -> (string -> int) -> [float])
Now, how would I fontify those string and int occurrences only when they occur inside the type form? Turns out font lock supports Anchored matchers.
The anchored matchers work by first searching for an anchor and only then searching for the thing you want to highlight. This basically allows you to do look-ahead context-sensitive fontification, in the sense that the subsequent matchers are tried but if they fail the process continues from where the anchor match ended[3].
For the longest time I struggled to understand how the font-lock specifications worked because there are so many different ways to write them. What actually helped me to understand this once and for all was to simply look into the source code and read how it works. I remembered the recent post by Irreal on reading source code. It really is an effective way to learn, especially with software like Emacs being absolutely transparent about everything that is going on inside.
A font lock rule starts with a matcher followed by one or more HIGHLIGHT forms. A HIGHLIGHT form either specifies how to fontify a group matched by the matcher or is actually another matcher (this is the anchored matcher). The highlight forms are tried in order and applied one after another, whatever their type is.
The specification is not completely recursive because it only allows one level of nesting, so an anchored matcher cannot have other anchored matchers inside it. The anchored matcher has the following syntax:
(MATCHER PRE-MATCH-FORM POST-MATCH-FORM MATCH-HIGHLIGHT ...)
where MATCHER is the search regexp that is tried after the anchor was found, PRE-MATCH-FORM and POST-MATCH-FORM are executed before and after the MATCHER is run so you can set search limits and do other magic if necessary. MATCH-HIGHLIGHT are the usual forms with the groups and faces.
The cool and crucial ingredient is that the MATCHER is run in a cycle until the point goes after the limit. This means that we in a sense "fontify" the region from the anchor to the limit we provide (or end of line by default). We can then reset the position in the POST-MATCH-FORM so the next HIGHLIGHT (anchored matcher) will start from the beginning of the same "region" again. This allows us to define "region specific" font-locking. So cool!
The final annotated rule looks as follows:
(font-lock-add-keywords
 nil
 ;; the first regexp is the anchor of the fontification, meaning the
 ;; "starting point" of the region
 '(("(\\(type\\) +\\(\\(?:\\sw\\|\\s_\\)+\\) +::"
    ;; fontify the `type' as keyword
    (1 font-lock-keyword-face)
    ;; fontify the function name as function
    (2 font-lock-function-name-face)
    ;; look for symbols after the `::', they are types
    ("\\_<\\(\\(?:\\sw\\|\\s_\\)+\\)\\_>"
     ;; set the limit of search to the current `type' form only
     (save-excursion (up-list) (point))
     ;; when we found all the types in the region (`type' form) go
     ;; back to the `::' marker
     (re-search-backward "::")
     ;; fontify each matched symbol as type
     (0 font-lock-type-face))
    ;; when done with the symbols look for the arrows
    ("->"
     ;; we are starting from the `::' again, so set the same limit as
     ;; for the previous search (the `type' form)
     (save-excursion (up-list) (point))
     ;; do not move back when we've found all matches to ensure
     ;; forward progress.  At this point we are done with the form
     nil
     ;; fontify the found arrows as variables (whatever...)
     (0 font-lock-variable-name-face t)))))
And the forms are fontified in very much the same way as the Haskell code above (thanks to Emacs's amazing consistency with font-lock faces, another brilliant design decision).
(type function :: int -> string -> (string -> int) -> [float])
(type constant :: int)
(defun string (string int)
  "The keywords outside of the type form are *not* fontified!")
I repeat it here just for completeness:
function :: Int -> String -> (String -> Int) -> [Float]
constant :: Int
Awesome.
org-attach. Pretty much all my binary data
from the last fifteen years live somewhere under ~/data/org-attach
(set via org-attach-id-dir), further nested under the org headline ID.
After experimenting with many ways to organize data, including
tagsistant and other semantic filesystems, this is what stuck the
best:
Make a headline in some of your org files (I have various files such
as knowledgebase.org, bookmarks.org, movies.org, emacs.org, ...), hit
C-c C-a and attach the file to the "headline". To later search for it
you can use all the powerful indexing and search facilities of
org-mode. The whole directory is checked into git-annex and backed up to
various cloud providers and external drives.
I don't really care about where or how the data itself is stored and I
treat the org-attach directory as an opaque "blob"
store[4].
This works 99% of the time because I usually want to find the file
where I have some vague semantic idea of what it is and usually find
it via org interface and then open the attachment.
For the rare cases I can't figure out where I stored a file, I use the
usual locate or find utilities. When I finally get to the dired
buffer for this attachment, I usually want to visit its corresponding
headline to either add more keywords or somehow make it easier to find
this file again through the org interface.
So I wrote this simple utility function to jump back to the headline to edit it:
(defun my-org-attach-visit-headline-from-dired ()
  "Go to the headline corresponding to this org-attach directory."
  (interactive)
  (let* ((id-parts (last (split-string default-directory "/" t) 2))
         (id (apply #'concat id-parts)))
    (let ((m (org-id-find id 'marker)))
      (unless m (user-error "Cannot find entry with ID \"%s\"" id))
      (pop-to-buffer (marker-buffer m))
      (goto-char m)
      (move-marker m nil)
      (org-fold-show-context))))
Bind this to some free key in the dired mode map and you can jump back and forth with ease.
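The only non-obvious step in the function above is how the directory name turns back into an ID, so here is that step in isolation together with a possible binding. This is a sketch: the helper name, the example path and the C-c o key are all just illustrative choices, and it assumes the two-level layout where the ID is split across the last two path components.

```elisp
(defun my-org-attach-dir-to-id (directory)
  "Return the headline ID encoded in the org-attach DIRECTORY.
Assumes the ID is split across the last two path components."
  (apply #'concat (last (split-string directory "/" t) 2)))

(my-org-attach-dir-to-id
 "~/data/org-attach/8d/e3805f-8971-404a-98d2-84305db1a444/")
;; => "8de3805f-8971-404a-98d2-84305db1a444"

;; Hypothetical key choice, pick any key that is free in your setup.
(with-eval-after-load 'dired
  (define-key dired-mode-map (kbd "C-c o")
              #'my-org-attach-visit-headline-from-dired))
```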
[/] or [%] in the title
of a TODO task or parent item in a list, hit C-c C-c and Org will
calculate the progress for you.
* TODO foo [1/3]
** TODO one
** TODO two
** DONE three
- parent [66%]
  - [X] one
  - [X] two
  - [ ] three
These are fontified with org-checkbox-statistics-todo to make them
easily stand out. However, for some reason this face is not applied
in the Org agenda buffer.
Because the agenda buffer does not use font-lock for fontifying and
instead inserts already fontified text in the buffer directly, we
can't simply add a regexp with font-lock-add-keywords. But the
solution is nonetheless very straight-forward. Create a function
which will search for the regexp in the buffer and add the face text
property. Then add it to the org-agenda-finalize-hook and that's
that!
(defun my-fontify-progress-cookie ()
  "Fontify progress cookies in org agenda."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "\\[[[:digit:]]+/[[:digit:]]+\\]" nil t)
      (add-face-text-property (match-beginning 0) (match-end 0)
                              'org-checkbox-statistics-todo))
    (goto-char (point-min))
    (while (re-search-forward "\\[[[:digit:]]+%\\]" nil t)
      (add-face-text-property (match-beginning 0) (match-end 0)
                              'org-checkbox-statistics-todo))))
(add-hook 'org-agenda-finalize-hook 'my-fontify-progress-cookie)
org-agenda-set-restriction-lock is very useful for
speeding up agenda when working on a specific project (implemented as
a file or an Orgmode subtree). Personally, I use two agenda views,
one "quick" with 5 simple sections and one "full" with 10 rather
complicated sections.
The quick one lists all the actionable tasks, all the stuck tasks or notes that need to be processed and refiled. The full one lists all the tasks from the project, including hierarchical project dependencies, tasks on hold, bugs, waiting tasks and so on. The full view takes a lot more processing power and is not useful maybe 80% of the time when I simply want to find work to do next.
For the times when I want to get a complete overview over a project and do some light management or planning, I use the full agenda view.
One thing that kept bothering me was that the only option was to restrict to a file or a subtree, but nothing in between[5], such as a region spanning multiple subtrees. This matters to me because I'm not a huge fan of nesting headers just for the sake of nesting (flatter structures and graphs are much nicer for organization).
Luckily, the function org-agenda-set-restriction-lock is fairly
hackable. It uses overlays and markers for managing the restriction,
so all we need to do is grab the current active region's bounds and
set the org variables appropriately.
(defun my-org-agenda-set-restriction-lock (orig-fun &optional type)
  (if (not (use-region-p))
      ;; unless a region is active, use the original function for
      ;; cancel/file/subtree
      (funcall orig-fun type)
    ;; here we do approximately the same as subtree except find the
    ;; beginning of subtree at region's beginning and end of subtree
    ;; at region's end (could span multiple subtrees)
    (setq org-agenda-restrict (current-buffer))
    ;; use 'my-region to avoid potential future conflict
    (setq org-agenda-overriding-restriction 'my-region)
    (put 'org-agenda-files 'org-restrict
         (list (buffer-file-name (buffer-base-buffer))))
    (let ((beg (region-beginning))
          (end (region-end)))
      (save-excursion
        (goto-char beg)
        (org-back-to-heading t)
        (setq beg (point)))
      (save-excursion
        (goto-char end)
        (org-end-of-subtree t t)
        (setq end (point)))
      (move-overlay org-agenda-restriction-lock-overlay
                    beg
                    (if org-agenda-restriction-lock-highlight-subtree
                        end
                      (point-at-eol)))
      (move-marker org-agenda-restrict-begin beg)
      (move-marker org-agenda-restrict-end end))
    (message "Locking agenda restriction to region")
    (org-agenda-maybe-redo)))
(advice-add 'org-agenda-set-restriction-lock :around #'my-org-agenda-set-restriction-lock)
So I finally bit the bullet and decided to integrate Google Calendar into my org agenda. I didn't have to go a long way before finding org-gcal.el.
My setup is taken mostly from Using Emacs - 26 - Google Calendar, Org Agenda by the amazing Mike Zamansky. One difference from Mike's setup is that I'm using a one-way sync only, that is, I only fetch from Google Calendar and do not publish anything.
The reason is that I use multiple calendars (I basically have a Google account at every company I work for plus a personal calendar) and the workflow with events and inviting myself from one calendar to another as an attendee is too complex and fragile to trust to some automated tool. And I cannot afford to have my calendars break.
(use-package org-gcal
  :after org
  :config
  (setq org-gcal-client-id "781554523097-ocjovnfpqgtpoc4qv7ubr8c679t96bv7.apps.googleusercontent.com"
        org-gcal-client-secret "<<gcal-secret>>"
        org-gcal-file-alist '(("[email protected]" . "~/org/gcal.org"))
        org-gcal-header-alist '(("[email protected]" . "#+PROPERTY: TIMELINE_FACE \"pink\"\n"))
        org-gcal-auto-archive nil
        org-gcal-notify-p nil)
  (add-hook 'org-agenda-mode-hook 'org-gcal-fetch)
  (add-hook 'org-capture-after-finalize-hook 'org-gcal-fetch))
I'm also using org-timeline so I add some extra header arguments to the generated file to add a different color to the Google Calendar entries.
I don't use these logs very often in a review or retrospective, but they have helped me a bunch of times to figure out the circumstances of my past actions (e.g. rescheduling, postponing work etc.), so I find it worth spending 30 seconds jotting down a simple note as opposed to then trying to figure out everything from scratch for hours.
Especially useful for when you are not meeting a client's deadlines. A paper trail is good!
Also, since I journal daily and am somewhat obsessive about tracking my life, my settings here are pretty aggressive.
One thing that bugs me, not being a native English speaker, is
that when org-mode pops up the note buffer its input method resets to
English. Given that the past and current org maintainers
also don't speak English as a first language, I expected
there to be some setting to inherit the input method of the original
buffer[6]. Sadly, I
couldn't find it, so I decided to "roll my own".
Now here comes the part that blew my mind... I realized I wrote the whole code in under 2 minutes... whereas simply trying to read the manual and search the code would easily have taken more time[7]. This is the nice thing about being an Emacs power user. I wrote the code on the first try, registered it in a hook whose name I guessed, and it all worked flawlessly. Nice!
(defun my-org-inherit-input-method ()
  "Set the input method of this buffer to that of original's buffer."
  (let* ((note-buffer (marker-buffer org-log-note-marker))
         (im (with-current-buffer note-buffer current-input-method)))
    (set-input-method im)))
(add-hook 'org-log-buffer-setup-hook 'my-org-inherit-input-method)
Of course, I've spent thousands of hours learning Elisp, so I'm not sure where or when the time/productivity curves actually crossed.
org-extend-today-until. It does quite what you
would expect: you can tell org-mode when your "logical" midnight is.
For me, I rarely go to sleep before midnight so I set it to 4 am just to
be sure. This way even if it's already 0:15 and I refresh the agenda
view it still displays "yesterday".
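The idea of a "logical" midnight can be shown with a tiny helper that shifts the clock back by the configured number of hours before reading off the date. This is only an illustration of the concept with a made-up function name, not the code Org itself uses; the second argument plays the role of org-extend-today-until.

```elisp
(defun my-logical-date (time extend-until)
  "Return the date (YYYY-MM-DD) that TIME belongs to.
The day is considered to end at EXTEND-UNTIL o'clock instead of at
midnight, like with `org-extend-today-until'."
  (format-time-string "%Y-%m-%d"
                      (time-subtract time (* 3600 extend-until))))

;; 00:15 on 2019-01-16 still counts as the 15th when the day ends at 4 am.
(my-logical-date (encode-time '(0 15 0 16 1 2019 nil -1 nil)) 4)
;; => "2019-01-15"

;; 04:30 is past the cut-off, so it already belongs to the 16th.
(my-logical-date (encode-time '(0 30 4 16 1 2019 nil -1 nil)) 4)
;; => "2019-01-16"
```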
The trouble is that not a lot of org mode actually respects this
setting, so far the only things mentioned in the docstring are the
agenda day switch and something related to reading dates from the user
(I think through C-c .) but I can't see any difference in that. If
you are using the org modeline and summary clock for today's time
spent on a task this will also only count contributions from the
specified hour which is nice. There is probably more but I haven't
noticed yet.
Since I'm an org-agenda-clockreport-mode user, I want to have that
consistent with the modeline information. However, it goes through
entirely different machinery, so the easiest extension point is
simply to put an advice on the function which collects the data
(org-clock-get-table-data) and in case we are working in the agenda
scope adjust the :tstart and :tend properties to respect
org-extend-today-until.
(defun my-convert-org-today-to-timestamp (ts)
  "Convert TS to timestamp.

TS is an absolute number of days since 0001-12-31bce

The timestamp returned is in the format YYYY-MM-DD hh:mm.  The hour is
adjusted according to `org-extend-today-until'."
  (let ((ts-greg (calendar-gregorian-from-absolute ts)))
    (format "%4d-%02d-%02d %02d:00"
            (nth 2 ts-greg)
            (car ts-greg)
            (nth 1 ts-greg)
            org-extend-today-until)))
(defun my-org-clock-get-table-data-adjust-start (origfun file params)
  "Adjust the start and end arguments to respect `org-extend-today-until'."
  (when (and (eq (plist-get params :scope) 'agenda)
             (integerp (plist-get params :tstart)))
    (let ((ts (my-convert-org-today-to-timestamp (plist-get params :tstart)))
          (te (my-convert-org-today-to-timestamp (plist-get params :tend))))
      (setq params (plist-put params :tstart ts))
      (setq params (plist-put params :tend te))))
  (funcall origfun file params))
(advice-add 'org-clock-get-table-data :around #'my-org-clock-get-table-data-adjust-start)
Recently I've been adding some nice improvements to my org-timeline package which draws a visual representation of all the scheduled/clocked items (see README for visuals). I'll make sure it respects this setting as well. So far I've instinctively set it to start drawing at 5:00.
org-emphasis-alist) if they extend through at
most one newline. This is probably a performance optimization, one
wholly unnecessary on modern hardware.
As per this Stack Overflow post, I re-set the constant to 10 lines and can probably even increase it if necessary.
(setcar (nthcdr 4 org-emphasis-regexp-components) 10)
Before this starts to work you need to re-save org-emphasis-alist
through the customize interface because it is using a custom setter
org-set-emph-re to compute the regexps (or, gulp, restart Emacs).
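If clicking through customize feels like too much ceremony, calling the setter by hand from the init file should have the same effect. This is a sketch that assumes org-set-emph-re keeps its (VARIABLE VALUE) calling convention in your Org version, so check its docstring first. Org buffers that are already open may need to be reverted before they pick up the new regexps.

```elisp
(require 'org)

;; Allow emphasis markup to span up to 10 newlines.
(setcar (nthcdr 4 org-emphasis-regexp-components) 10)

;; Re-run the custom setter so the emphasis regexps are recomputed
;; from the modified components.
(org-set-emph-re 'org-emphasis-regexp-components
                 org-emphasis-regexp-components)
```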
Here I quote the answer in case it ever gets lost:
By default, org-mode allows a single newline. So if you want to be
able to add markup to text that spans more than two consecutive lines,
you'll need to modify this entry.
(setcar (nthcdr 4 org-emphasis-regexp-components) N)
... where N is the number of newlines you want to allow.
[1] Before we go further: you need to register on the OpenAI platform, and the API costs money. The good news is that it is extremely cheap. It will cost you $1 to embed 2.4 MILLION tokens. With a query being roughly 10 words, which corresponds to 10-15 tokens, one query will cost you about $0.000006. So it's pretty much free and you only need a credit card to formally register. You can also set a monthly spending limit of $0.01 and you would probably never run over the limit. Step 2 will cost based on how much data you have. I have about 200000 lines of org files and so far I have spent less than $0.50 including all the experimenting.
[2] The Emacs RE engine is a lot less powerful than PCRE engines: it doesn't support look-ahead or look-behind, among other less commonly used features.
[3] For those familiar with Parsec, this is basically the try combinator.
[4] This really removed a lot of the "create a perfect file hierarchy" anxiety that ultra-orderly people like me get all the time. I am no longer a slave to the perpetual fine-tuning of what is nested where. The files on the disk are actually stored in a flat two-level hierarchy determined by some hash or uuid. This is great! And the semantics of what the file is and how to find it is delegated to org mode. This is even greater because its metadata are so much richer than what you can store in the file system itself.
[5] While it is possible to restrict to a region from the org-agenda speed dial, I find it quite impractical and prefer to do the restrictions from the project's buffer.
[6] And really, 99% of the time, when you say "I'm going to write an org-extension", it already is in core.
[7] This is not the greatest engineering and you should almost always prefer a well-tested lib over your own... on the other hand, being a pragmatic professional, I value my time over code purity.
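The cost estimate from the footnote on OpenAI pricing is easy to check with a few lines of arithmetic. The function name is made up for this sketch, and the default rate of 2.4 million tokens per dollar is simply the figure quoted in that footnote, so adjust it if the pricing has changed.

```elisp
(defun my-embedding-cost (tokens &optional tokens-per-dollar)
  "Return the approximate price in dollars of embedding TOKENS tokens.
TOKENS-PER-DOLLAR defaults to 2.4 million, the rate quoted above."
  (/ (float tokens) (or tokens-per-dollar 2.4e6)))

(my-embedding-cost 15)       ; one query of about 10 words: roughly 6e-06 dollars
(my-embedding-cost 1000000)  ; a million tokens: roughly 0.42 dollars
```

So a single query costs a few millionths of a dollar, and it takes on the order of a hundred thousand queries to spend one dollar.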