Creating Discourse permalinks with shell commands
The other day, I deleted a bunch of categories in Discourse. In the process, I broke a bunch of links. I hate breaking links, but it just never occurred to me that deleting categories would break links too. But, of course, Discourse can’t know that I moved all these topics to school-specific tags that should replace the school-specific categories.
The pattern for redirection is extremely simple:
c/musical-theater-schools/american-university-mt/658
=> tag/american-university-mt
It’s not all that hard to do this by hand, but I have 58 permalinks I need to create! This sort of work is pretty mindless. I could probably do it while watching a single sitcom episode. But it’s also extremely easy to mess up if you aren’t paying enough attention. Writing a quick script can be nearly as easy and a lot more enjoyable. So that’s what I’m going to do in this case. (And make it take even longer by writing this post at the same time.)
Discourse supports permalinks and it’s not hard to reverse engineer the API for inserting them. The first thing I did was to insert a sample permalink using the admin interface:
From the developer interface, I see the request is to /admin/permalinks.json with the following payload:
url=c%2Fmusical-theater-schools%2Famerican-university-mt%2F658&permalink_type=tag_name&permalink_type_value=american-university-mt
Converting that into a more readable format:
url = "c/musical-theater-schools/american-university-mt/658"
permalink_type = "tag_name"
permalink_type_value = "american-university-mt"
So those are the three parameters I need. url is the URL I want to redirect, so it’s my input. permalink_type is static: I always want the link to point to a tag. permalink_type_value is the place I want to redirect to. In this case, it’s the tag name I want to point to instead of the category. Thankfully, I created the tags using the category slug, so I just need to extract that substring from the url.
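The step from the raw payload to the readable form can be done with a couple of standard tools. This is only a minimal sketch: decode_payload is a name made up for illustration, and it decodes nothing but %2F, which happens to be the only escape in the sample payload.

```shell
# Split a form-encoded payload into one key=value pair per line.
# Only %2F is decoded; a general decoder would need to handle every %XX escape.
decode_payload() {
  printf '%s\n' "$1" | tr '&' '\n' | sed 's|%2F|/|g'
}

decode_payload 'url=c%2Fmusical-theater-schools%2Famerican-university-mt%2F658&permalink_type=tag_name&permalink_type_value=american-university-mt'
```

Each of the three parameters ends up on its own line, which makes it easier to see which parts are fixed and which will vary.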
The steps I need to do are:
- Get a list of category URLs that I removed.
- Extract the tag name from each URL.
- Post a request to /admin/permalinks.json with the correct parameters.
When writing a shell script, it’s often easiest to start with the last step and work backward. So if I can post one request with hardcoded values, I know I can build a request based on one URL. And if I have a command that takes one URL as a parameter, I can iterate over a list of URLs.
To put it another way, shell scripts often take the form of a pipeline:
$ first_command | middle_command | last_command
Since the last command produces the output you want, it’s important to give it the input it needs. So there’s no point in starting the middle command (much less the first command) before you know what the last command is going to need.
The final step, in this case, is a call to the Discourse API. Whenever you hear “API” in the context of shell scripting, you should think “curl”. I’m going to skip over a few steps I took reminding myself how curl works and jump straight to the sample command I executed:
curl $DISCOURSE_HOST/admin/permalinks.json \
-H "Api-Username: $DISCOURSE_USER" \
-H "Api-Key: $DISCOURSE_API" \
-d "url=/c/musical-theater-schools/american-university-mt/658&permalink_type=tag_name&permalink_type_value=american-university-mt"
I’ve put a few bits of data behind environment variables to make life easier when I move from testing commands to executing them on production servers. $DISCOURSE_HOST is the host I’m working on. In my case it’s set to https://talk.collegeconfidential.com when I’m ready to try commands on production. $DISCOURSE_USER is my username on the site. (I’m CCadmin_Jon on College Confidential.) $DISCOURSE_API is an API key I generated by visiting /admin/api/keys on a Discourse site where I’m an admin. Only admins can create API keys (or permalinks).
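Since the command depends on three environment variables, it can help to fail early when one is missing. The following is a sketch, not part of the original script: check_env is a hypothetical helper name, and forum.example.com is a placeholder host.

```shell
# Abort with a clear message if any required variable is unset or empty.
# ${VAR:?message} prints the message and exits a non-interactive shell.
check_env() {
  : "${DISCOURSE_HOST:?set DISCOURSE_HOST, e.g. https://forum.example.com}"
  : "${DISCOURSE_USER:?set DISCOURSE_USER to an admin username}"
  : "${DISCOURSE_API:?set DISCOURSE_API to an admin API key}"
}
```

Calling check_env at the top of a script stops it before curl ever runs with an empty host or key.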
The part I need to change is the -d parameter. It should look somewhat familiar since it’s the same payload I got from the dev tools when using the site interface. It has two parts that are variable:
- /c/musical-theater-schools/american-university-mt/658
- american-university-mt
Extracting the category slug can be done several different ways. sed is probably the right choice, but I’m going to use something less obvious:
$ dirname /c/musical-theater-schools/american-university-mt/658 | xargs basename
american-university-mt
dirname selects everything before the last / and basename selects everything after the last slash. xargs is my go-to tool for chaining commands. If basename accepted arguments from standard input, it wouldn’t be necessary. (Spoiler alert: I’m going to use xargs again later on.)
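For comparison, here is a sketch of two other ways to get the same slug: one with sed and one with the shell's own parameter expansion. The function names slug_sed and slug_expand are made up for this example.

```shell
path=/c/musical-theater-schools/american-university-mt/658

# sed: drop the trailing "/<id>", then drop everything up to the last slash.
slug_sed() {
  printf '%s\n' "$1" | sed -e 's|/[^/]*$||' -e 's|^.*/||'
}

# Parameter expansion: no external commands at all.
slug_expand() {
  p=${1%/*}                  # strip the shortest trailing "/..." (the id)
  printf '%s\n' "${p##*/}"   # strip the longest leading ".../"
}

slug_sed "$path"
slug_expand "$path"
```

Both calls print american-university-mt, the same result as the dirname and basename pipeline.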
At this point, I’m going to start putting my script into a file so that it’s easier to edit. I’m going to call it create_permalink.ksh and it looks like this:
#!/usr/bin/env ksh
u=$1
t=`dirname $u| xargs basename`
curl $DISCOURSE_HOST/admin/permalinks.json \
-H "Api-Key: $DISCOURSE_API" \
-H "Api-Username: $DISCOURSE_USER" \
-d "url=$u&permalink_type=tag_name&permalink_type_value=$t"
Before I can use it the first time, I need to make it executable:
chmod +x create_permalink.ksh
Then I can test it:
$ ./create_permalink.ksh /c/musical-theater-schools/american-university-mt/658
{"status":500,"error":"Internal Server Error"}%
I’m getting an internal server error because I’ve already created this permalink. Actually I’ve created it several times for testing and removed it using the admin interface on the site.
Now there are some shell oddities in the script that you might not be familiar with. So I’ll go over it one command at a time:
#!/usr/bin/env ksh
This tells the system which interpreter to use. I’m a fan of ksh, but it should work with any Bourne shell descendant. I could have used #!/usr/bin/ksh, but the env form is slightly more portable because it finds ksh wherever it lives on the PATH.
u=$1
I’m putting the URL in a variable called $u. This command sets the variable using the value of the first input parameter. It’s a bit unnecessary since I could just use $1 everywhere in the script. But I got into the habit of assigning arguments to variables so that I could use "$@" for lists of parameters. It’s also slightly more readable if you name your variables instead of numbering them.
t=`dirname $u| xargs basename`
The tag name goes in $t. I’m using backticks (`) to capture the output of the command I mentioned above for extracting the category slug. The more modern method would be to use $(...) instead.
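As a sketch of that more modern form, the same assignment with $(...) looks like this. Quoting "$u" is an extra precaution added here, not something the original script does.

```shell
u=/c/musical-theater-schools/american-university-mt/658

# Same pipeline as the backtick version, written with $(...).
# Quoting "$u" stops the shell from splitting the path before dirname sees it.
t=$(dirname "$u" | xargs basename)
printf '%s\n' "$t"
```

This prints american-university-mt. Unlike backticks, $(...) can be nested without any escaping.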
curl $DISCOURSE_HOST/admin/permalinks.json \
-H "Api-Key: $DISCOURSE_API" \
-H "Api-Username: $DISCOURSE_USER" \
-d "url=$u&permalink_type=tag_name&permalink_type_value=$t"
This is a single curl command put on separate lines. The only thing I changed is making the URL and the tag string variables.
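One possible variation, sketched here under a few assumptions, is to let curl encode each field with --data-urlencode instead of building the whole payload by hand. The function name create_permalink and the CURL override are hypothetical; setting CURL=echo prints the arguments instead of sending a request, which is handy for checking the command first.

```shell
# CURL is a made-up override for this sketch: CURL=echo prints the
# arguments instead of contacting the server.
create_permalink() {
  u=$1
  t=$(dirname "$u" | xargs basename)
  "${CURL:-curl}" "$DISCOURSE_HOST/admin/permalinks.json" \
    -H "Api-Key: $DISCOURSE_API" \
    -H "Api-Username: $DISCOURSE_USER" \
    --data-urlencode "url=$u" \
    --data-urlencode "permalink_type=tag_name" \
    --data-urlencode "permalink_type_value=$t"
}
```

With CURL=echo set, create_permalink /c/musical-theater-schools/american-university-mt/658 shows the exact arguments curl would receive. The hand-built -d string works fine for these slugs. Encoding each field separately only matters if a value ever contains a character such as & or a space.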
Now I can do a bunch of permalinks by running this command several times. I could set up a for loop in my script:
for u in "$@"
do
t=`dirname $u| xargs basename`
curl $DISCOURSE_HOST/admin/permalinks.json \
-H "Api-Key: $DISCOURSE_API" \
-H "Api-Username: $DISCOURSE_USER" \
-d "url=$u&permalink_type=tag_name&permalink_type_value=$t"
done
Or I can pipe the list into xargs:
$ echo $urls | xargs -n 1 ./create_permalink.ksh
But how do I get a list of the categories I’ve already deleted? I figured out that there was a problem because we noticed 404s in Google’s Search Console, so I could pull the list from there. But I can also get a list from the Discourse logs, which is handy given that future mistakes might not be so easily tracked by a third party.
If you are an admin on a Discourse site, you can visit /admin/logs/staff_action_logs to get a complete list of actions staff have performed, including deleting categories. There’s also a button to export the logs as a CSV file. After downloading the file, I used less to browse it. The first two lines helped me understand the format:
staff_user,action,subject,created_at,details,context
CCadmin_Jon,entity_export,staff_action,2021-03-09 18:40:41 UTC,,
Typically the first line explains the columns, as you can see in this case. And the second line shows the very last thing that was logged, which was me requesting the log to be exported. So I know this log is reverse chronological with the most recent events listed first. That’s a bit atypical because normally log entries are appended to the CSV file as they happen. Discourse logs are stored in database tables, so they can be exported in whatever order makes the most sense.
Using / to search through the logs, I found one of my delete_category entries.
CCadmin_Jon,delete_category,,2021-02-21 00:37:52 UTC,"created_at: 2020-11-26 11:45:12 UTC
Unfortunately, that line doesn’t say which category was deleted. For that, I needed to read down a few more lines:
CCadmin_Jon,delete_category,,2021-02-21 00:37:52 UTC,"created_at: 2020-11-26 11:45:12 UTC
name: American University MT
permissions: {}
parent_category: Musical Theater Schools",/c/musical-theater-schools/american-university-mt/658
It took me a bit to figure out how this entry spanned multiple lines. The details entry includes a double quote, which signals the start of a string. Since there is no close quote before the end of the line, the next line is also part of the string. So the details column ends on line 4 with the closing quote. I’m most interested in the final column, context, which includes the path of the category I deleted. That’s going to be my input to create_permalink.ksh.
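The quoting rule described above can be turned into a small filter. This is only a sketch of one way to do it, with awk and a made-up function name, join_csv_records: a record is treated as complete once it contains an even number of double quotes. It is not a full CSV parser.

```shell
# Join physical lines into logical CSV records. Embedded newlines are
# replaced with a literal "\n" so that each record ends up on one line.
join_csv_records() {
  awk '
    {
      buf = (held ? buf "\\n" $0 : $0)
      quotes = gsub(/"/, "\"", buf)   # gsub returns how many quotes it saw
      if (quotes % 2 == 0) { print buf; held = 0 }
      else                 { held = 1 }
    }
  '
}

join_csv_records <<'EOF'
CCadmin_Jon,delete_category,,2021-02-21 00:37:52 UTC,"created_at: 2020-11-26 11:45:12 UTC
name: American University MT
permissions: {}
parent_category: Musical Theater Schools",/c/musical-theater-schools/american-university-mt/658
EOF
```

The four lines of the sample entry come out as one record, with the context column at the very end.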
Allowing newlines embedded in columns is pretty handy because it allows the log to show, for instance, the full body of a post that was deleted. For deleting a category, it shows some metadata about the category. But this throws a wrench in my plans to use grep to find the row where I deleted each category and extract the path. I didn’t sign up to parse complicated CSV files.
I could switch to the Database Explorer API. But why should I when I already got the data I need in the log export? I just need to be a little creative in how I get it. Look again at the line that contains the path I’m looking for:
parent_category: Musical Theater Schools",/c/musical-theater-schools/american-university-mt/658
Looking at other deleted categories, I noticed they all look the same right up to the slug. Just as importantly, only the lines for deleted Musical Theater School categories start with this string. That means I can find what I’m looking for by using this grep command:
grep 'parent_category: Musical Theater Schools",/c/musical-theater-schools/' staff-action-210309-184041.csv
This isn’t technically right. There is a universe in which this command will cause hard-to-debug problems. Which brings me to the first rule of script programming: All is fair in love and shell. Alternatively: There’s no wrong way to get the right answer. An awful lot of learning to be a programmer centers on the concept of correctness. That’s because you never know how your code will be used in the future. You have to make sure it’s robust against bad input that could make everything go very wrong.
But if you are writing a script for yourself and for a specific purpose, you don’t have to worry about it being used by someone else for some purpose where it might, I don’t know, delete their hard drive or something. But that puts the onus on you to make sure you feed your script good input. So the next thing I did was sort the list and eliminate any duplicates to make sure I had the right lines:
$ grep [big long string] [file] | sort -u
Finally, I need to pull out the path. This time I will use sed:
$ grep [big long string] [file] | sort -u | sed -e 's/^.*,//'
/c/musical-theater-schools/american-university-mt/658
/c/musical-theater-schools/baldwin-wallace-college-mt/662
/c/musical-theater-schools/ball-state-university-mt/661
...
The key bit is s/^.*,//. To work out what that means, you need to know regular expressions, which is a lot to learn. They are incredibly useful, however, so it’s worth the effort. This command substitutes (s/) the part of the string that matches a pattern (^.*,) with an empty string (//). The pattern (^.*,) begins at the start of the line (^), continues with any number of characters (.*), and ends with a comma (,). Because .* is greedy, the match runs through the last comma on the line, so we are left with everything following that last comma, which is exactly what we want to find.
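A quick way to see how the pattern behaves is to run it on a single line. The sketch below uses the log line shown earlier plus a made-up three-field string.

```shell
line='parent_category: Musical Theater Schools",/c/musical-theater-schools/american-university-mt/658'

# ^.*, is greedy: it runs from the start of the line through the LAST comma.
printf '%s\n' "$line" | sed -e 's/^.*,//'

# With several commas, only the text after the final one survives.
printf '%s\n' 'a,b,c' | sed -e 's/^.*,//'
```

The first command prints /c/musical-theater-schools/american-university-mt/658 and the second prints c. The approach works here because the path is the final column and contains no commas.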
After double- and triple-checking the output, I put the whole thing together and created my permalinks:
grep 'parent_category: Musical Theater Schools",/c/musical-theater-schools/' staff-action-210309-184041.csv \
| sort -u \
| sed -e 's/^.*,//' \
| xargs -n 1 ./create_permalink.ksh
There’s xargs again! -n 1 is only necessary if I don’t include the for u in "$@" loop in create_permalink.ksh. It tells xargs to repeatedly call the command with just one parameter each time it’s called.
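Before running the pipeline for real, it can be rehearsed by putting echo in front of the script name, so xargs prints each command line instead of executing it. This is a sketch: the temporary file and its three lines are made-up stand-ins for the real export, and dry_run is a hypothetical name.

```shell
# Stand-in for the real export; the contents are invented for the rehearsal.
log=$(mktemp)
cat > "$log" <<'EOF'
parent_category: Musical Theater Schools",/c/musical-theater-schools/american-university-mt/658
parent_category: Musical Theater Schools",/c/musical-theater-schools/ball-state-university-mt/661
parent_category: Musical Theater Schools",/c/musical-theater-schools/american-university-mt/658
EOF

# Dry run: echo prints the command line that xargs would have executed.
dry_run() {
  grep 'parent_category: Musical Theater Schools",/c/musical-theater-schools/' "$1" \
    | sort -u \
    | sed -e 's/^.*,//' \
    | xargs -n 1 echo ./create_permalink.ksh
}

dry_run "$log"
rm -f "$log"
```

With -n 1, the output has one ./create_permalink.ksh line per unique path, two in this case, since sort -u drops the duplicate. Without -n 1, xargs would pass several paths to a single invocation.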
At long last, I have 58 new permalinks. I can check there were no errors by looking at the output of each curl command:
{"permalink":{"id":406987,"url":"c/musical-theater-schools/american-university-mt/658","topic_id":null,"topic_title":null,"topic_url":null,"post_id":null,"post_url":null,"post_number":null,"post_topic_title":null,"category_id":null,"category_name":null,"category_url":null,"external_url":null,"tag_id":572,"tag_name":"american-university-mt","tag_url":"https://talk.collegeconfidential.com/tag/american-university-mt"}}%
I can also go to /admin/customize/permalinks on my Discourse host. Finally, I can check the formerly broken link and make sure it ends up in the right place.
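Reading 58 JSON responses by eye is tedious, so the check on the curl output could be scripted. The sketch below assumes responses look like the two shown in this post: a success carries a "permalink" object and a failure carries an "error" field. check_response is a made-up name, and the success sample is shortened from the real output.

```shell
# Classify one response body by the keys seen in the examples above.
check_response() {
  case $1 in
    *'"permalink":'*) echo ok ;;
    *'"error":'*)     echo failed ;;
    *)                echo unknown ;;
  esac
}

check_response '{"status":500,"error":"Internal Server Error"}'
check_response '{"permalink":{"id":406987,"tag_name":"american-university-mt"}}'
```

The first call prints failed and the second prints ok. This is plain string matching, not JSON parsing, so it is only as reliable as the assumption that every response resembles those two.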
If you followed me this far, I hope you enjoyed reading this and aren’t some sort of compulsive reader who somehow couldn’t change to another page to read something more enjoyable. I enjoy writing shell scripts to get work done and I certainly spent more time than needed on this one.