Creating an Arch Linux AWS AMI (Amazon Web Services - Amazon Machine Image)


Table of Contents

1. Overview
2. Requirements
3. Setup
3.1. Be root!
3.2. Create the image file
3.3. Create a loop device
3.4. Partition the loop device
3.5. Create a file system and mount it
4. Mirrors and base installation
4.1. Find a good mirror
4.2. Rolling back or downgrading to a previous version of Arch Linux
4.3. Update the package database
4.4. Install the core packages into our empty block device.
4.5. Create a filesystems table
5. System configuration
5.1. Set the locale
5.2. Set the clock to use UTC time
5.3. Install the base system
5.4. Install various packages
5.5. Set up the Linux audit framework
5.6. Running mkinitcpio
6. cloud-init
6.1. cloud-init overview
6.2. Logging for cloud-init
6.3. Setting up cloud.cfg
7. Final set up
7.1. Setting up grub
7.2. Miscellaneous locale
7.3. Enable services
7.4. Firewall
7.5. Miscellaneous
7.6. Network
8. Local testing
9. AWS installation
9.1. Copy to AWS S3
9.2. Launch an AWS instance and get the S3 image
9.3. Create an EBS volume
9.4. Create the AMI
9.5. Run the AMI

Copyright 2021 S. Sullivan, www.mathcom.com

1. Overview

The Arch Linux system gives the user detailed control over many aspects of Linux. Sometimes it’s useful to run Arch at AWS. But you first need a suitable Amazon Machine Image, or AMI.

Although there are pre-built AMIs at Uplink Labs, you may prefer the flexibility to create your own.

This guide gives the detailed process, using only the software and tools that come with Arch.

2. Requirements

You will need …

  • An account on AWS (Amazon Web Services)
  • A local workstation running Arch Linux
  • Knowledge of the AWS CLI (Command line interface)
  • Knowledge of the Linux shell CLI, for example the bash shell.

If you’re not at home in the command line world, you could use one of the pre-built AMIs from Uplink Labs.

There are three machines we’ll use …

  • work: A local Arch Linux workstation used for development
  • guest: The image we are creating.
  • awsTemp: An AWS instance we need briefly. It can be any flavor — Ubuntu is fine.

3. Setup

This procedure assumes you have Arch Linux running locally. We’ll call this the work machine.

3.1. Be root!

Many of the following steps assume you are running as root. To get there use:

sudo su

Warning

Running as root enables you to do a world of damage. Walk carefully! You have been warned!

To exit root, enter exit

3.2. Create the image file

Create a file that will contain the machine image. Typically this is 4G or 8G in size. There is no sense in making it larger, since when the image is launched at AWS it will be resized to the future user’s specs.

If the future user is using the web management console, at "Configure Instance Details" … "Add Storage" they will specify the storage size. If the future user is using the AWS CLI, they will specify the storage size in the run-instances command, for example,

aws ec2 run-instances ... \
  --block-device-mappings \
    "[ { \"DeviceName\": \"/dev/xvda\", \"Ebs\": { \"VolumeSize\": 8 } } ]"

Here we will use 8G. If you are installing a great deal of software or data in your AMI, you might choose a larger size.

dd if=/dev/zero of=imga.raw bs=1 count=0 seek=8G

The dd command reads from /dev/zero, an endless source of zero bytes, but with count=0 it copies nothing; seek=8G only sets the length of the new image, imga.raw, to 8G. This leaves the whole file empty, creating a sparse file. In a sparse file, only the non-empty parts of the file are actually written to disk. So while ls shows an 8G file, du shows that the actual disk usage is close to zero.

ls -l imga.raw            # returns: 8589934592
du imga.raw               # returns: 0
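
The same kind of sparse file can also be created with truncate. The sketch below assumes GNU coreutils (truncate and stat); make_sparse_image is a hypothetical helper name, not something shipped with Arch. It prints the apparent size next to the space actually allocated:

```shell
# make_sparse_image FILE SIZE  (hypothetical helper, assumes GNU coreutils)
# Creates FILE as a sparse file of SIZE and prints its apparent size and
# the number of bytes actually allocated on disk.
make_sparse_image() (
  img="$1"
  size="$2"
  truncate -s "$size" "$img" || exit 1
  apparent=$(stat -c %s "$img")                                  # file length in bytes
  allocated=$(( $(stat -c %b "$img") * $(stat -c %B "$img") ))   # blocks * block size
  echo "apparent=$apparent allocated=$allocated"
)

# Example: make_sparse_image imga.raw 8G
```

On most file systems the allocated figure stays at or near zero until data is written into the image.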

3.3. Create a loop device

As root, create a loop device for the file, so we can access it as a block device:

losetup -f --show imga.raw     # returns something like: /dev/loop0

Make sure the block device is found:

dvc=/dev/loop0          # Use the loop device returned by losetup

lsblk $dvc
  # Returns:
  # NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
  # loop0   7:0    0   8G  0 loop

3.4. Partition the loop device

As root, partition the disk. You can use either parted, the newer way of doing it, or fdisk, an older but still fine method.

With parted:

parted -s $dvc -- mklabel msdos
parted -s $dvc -- mkpart primary 0% 100%
parted -s $dvc -- toggle 1 boot
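parted -s $dvc -- print
  # Returns something like the following. (This sample comes from a
  # physical disk; your output will show the loop device and one partition.)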
  # Model: ATA TOSHIBA MK5055GS (scsi)
  # Disk /dev/sda: 500GB
  # Sector size (logical/physical): 512B/512B
  # Partition Table: msdos
  # Disk Flags:
  #
  # Number  Start   End     Size    Type     File system     Flags
  #  1      1049kB  30.0GB  30.0GB  primary  ext4            boot, esp
  #  2      30.0GB  80.0GB  50.0GB  primary  ext4
  #  3      80.0GB  490GB   410GB   primary  ext4
  #  4      490GB   500GB   10.1GB  primary  linux-swap(v1)

Or, with fdisk:

fdisk $dvc
o          # create a new empty DOS partition table
n          # new partition
p          # primary partition
1          # partition number
           # Enter to take default: first sector
           # Enter to take default: last sector (at max)
a          # make bootable
w          # write changes

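If lsblk does not yet show the new partition, or if you reboot and run losetup again, add the -P flag. The flag causes the kernel to scan the partition table and add a block device for the partition, like /dev/loop0p1, as well as for /dev/loop0. If the image is still attached, detach it first with losetup -d $dvc; otherwise losetup -f attaches it a second time on a new loop device.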

losetup -f -P --show imga.raw

Now, lsblk should show the new partition, /dev/loop0p1:

lsblk $dvc
  # Returns:
  # NAME      MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
  # loop0       7:0    0   8G  0 loop
  # └─loop0p1 259:0    0   8G  0 part

partition=/dev/loop0p1
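
The kernel names a partition by appending its number to the device name, inserting a p when the device name itself ends in a digit, as loop devices do. The following sketch of that rule uses a hypothetical helper, part_dev, so that the partition path does not have to be hard-coded when losetup returns something other than /dev/loop0:

```shell
# part_dev DEVICE NUMBER  (hypothetical helper)
# Prints the partition device name for DEVICE,
# for example: /dev/loop0 1 -> /dev/loop0p1
part_dev() (
  dev="$1"
  num="$2"
  case "$dev" in
    *[0-9]) echo "${dev}p${num}" ;;   # name ends in a digit: loop0 -> loop0p1
    *)      echo "${dev}${num}"  ;;   # name ends in a letter: sda -> sda1
  esac
)

# Example: partition=$(part_dev "$dvc" 1)
```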

3.5. Create a file system and mount it

Finally, make an ext4 file system on the partition and mount it:

mkfs.ext4 $partition
  # Returns:
  # mke2fs ...
  # Writing superblocks ... done

mount $partition /mnt
ls -al /mnt
  # Returns three entries:  .  ..  lost+found

4. Mirrors and base installation

4.1. Find a good mirror

The reflector package helps find high-performing nearby mirrors.

pacman -S reflector
reflector --help
reflector --list-countries

Specify the countries near you. Some examples are:

country='France,Germany'
country='FR,DE'
country='United Kingdom'
country='GB'
country='United States,Canada'
country='US,CA'
country='Taiwan,Singapore'
country='TW,SG'

Set the mirrorlist:

country='US,CA'        # for example
cp /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist.bk01

echo === reflector country: $country
reflector --country "$country" \
  --protocol https \
  --score 20 \
  --sort rate \
  --save /etc/pacman.d/mirrorlist

4.2. Rolling back or downgrading to a previous version of Arch Linux

Occasionally a person may want to install an old version of Arch. Fortunately the Arch project maintains a directory hierarchy for just that purpose.

For example, to install the Arch distribution as of 2020-01-01, use:

echo 'Server=https://archive.archlinux.org/repos/2020/01/01/$repo/os/$arch' > /etc/pacman.d/mirrorlist

See the discussions at rdeeson and archlinux.
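
To avoid typing the archive path by hand, the mirrorlist line can be built from a date. This is only a sketch; archive_server_line is a hypothetical helper, and it checks the shape of the date, not whether the archive actually holds a snapshot for that day:

```shell
# archive_server_line YYYY-MM-DD  (hypothetical helper)
# Prints a mirrorlist line for the Arch Linux Archive snapshot of that date.
archive_server_line() (
  day="$1"
  case "$day" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "usage: archive_server_line YYYY-MM-DD" >&2; exit 1 ;;
  esac
  year=${day%%-*}     # 2020-01-01 -> 2020
  rest=${day#*-}      # 2020-01-01 -> 01-01
  month=${rest%-*}
  dom=${rest#*-}
  # $repo and $arch stay literal on purpose; pacman expands them itself.
  printf 'Server=https://archive.archlinux.org/repos/%s/%s/%s/$repo/os/$arch\n' \
    "$year" "$month" "$dom"
)

# Example: archive_server_line 2020-01-01 > /etc/pacman.d/mirrorlist
```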

4.3. Update the package database

This will use the mirrorlist we set above.

pacman -Syy

4.4. Install the core packages into our empty block device.

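This may take a few minutes. The pacstrap command, like genfstab and arch-chroot used below, comes from the arch-install-scripts package; install it on the work machine first if it is missing.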

pacstrap /mnt base grub mkinitcpio

4.5. Create a filesystems table

genfstab -U -p /mnt >> /mnt/etc/fstab
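
Because the command above appends to fstab, running it twice leaves duplicate entries. A quick sanity check is to print the device spec of every entry mounted at /; with -U there should be exactly one line, starting with UUID=. The helper name fstab_root_spec is hypothetical:

```shell
# fstab_root_spec FSTAB  (hypothetical helper)
# Prints the first field (device spec) of each non-comment entry
# whose mount point is "/".
fstab_root_spec() (
  awk '$1 !~ /^#/ && $2 == "/" { print $1 }' "$1"
)

# Example: fstab_root_spec /mnt/etc/fstab
```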

5. System configuration

5.1. Set the locale

cp /mnt/etc/locale.gen /mnt/etc/locale.gen.bk01
vim /mnt/etc/locale.gen       # or nano or other programming editor
  # Uncomment the line corresponding to your locale.
  # For example, in the United States uncomment #en_US.UTF-8 UTF-8
  # In Spain, uncomment #es_ES.UTF-8 UTF-8
  # In Germany, uncomment #de_DE.UTF-8 UTF-8
  # etc.
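
If you prefer not to open an editor, the same change can be scripted. The sketch below is one possible approach, not part of the Arch tooling; enable_locale is a hypothetical helper that uncomments only the line beginning with the exact entry you pass:

```shell
# enable_locale FILE ENTRY  (hypothetical helper)
# Uncomments the locale.gen line that starts with "#ENTRY"; other lines
# are left untouched.
enable_locale() (
  file="$1"
  entry="$2"
  tmp=$(mktemp) || exit 1
  awk -v want="#$entry" 'index($0, want) == 1 { sub(/^#/, "") } { print }' \
    "$file" > "$tmp" || exit 1
  cat "$tmp" > "$file"    # overwrite in place, keeping the owner and mode
  rm -f "$tmp"
)

# Example: enable_locale /mnt/etc/locale.gen "en_US.UTF-8 UTF-8"
```

Passing the full entry, including the character set, keeps the match from also enabling similar lines such as en_US ISO-8859-1.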

Then set up the locale. The arch-chroot command is similar to chroot, but adds mounts for /proc, /sys, /dev, /dev/pts, /dev/shm, /run, /tmp.

arch-chroot /mnt /bin/bash -c "locale-gen"

5.2. Set the clock to use UTC time

ln -sf ../usr/share/zoneinfo/UTC /mnt/etc/localtime

5.3. Install the base system

Copy the work machine mirrorlist, that we set up above, into the guest.

cp /mnt/etc/pacman.d/mirrorlist /mnt/etc/pacman.d/mirrorlist.bk01
/bin/cp /etc/pacman.d/mirrorlist /mnt/etc/pacman.d/mirrorlist

Ensure the guest package database is up to date (according to our mirrorlist), and install linux on the guest.

arch-chroot /mnt /bin/bash -c "pacman -Sy"

arch-chroot /mnt /bin/bash -c \
  "pacman --needed --noconfirm -S linux linux-headers btrfs-progs dosfstools e2fsprogs"
arch-chroot /mnt /bin/bash -c \
  "pacman --needed --noconfirm -S exfatprogs ntfs-3g reiserfsprogs xfsprogs"

  # This gives many messages.  Warnings like these can be ignored:
  # ==> WARNING: Possibly missing firmware for module: csiostor
  # ==> WARNING: Possibly missing firmware for module: cxgb3

5.4. Install various packages

Install various packages we’ll need. I’ll discuss cloud-init more below.

arch-chroot /mnt /bin/bash -c \
  "pacman --needed --noconfirm -S man which lsof reflector base-devel multilib-devel python3 "

arch-chroot /mnt /bin/bash -c "pacman --needed --noconfirm -S audit irqbalance openssh haveged"

arch-chroot /mnt /bin/bash -c "pacman --needed --noconfirm -S rsync vim"

arch-chroot /mnt /bin/bash -c "pacman --needed --noconfirm -S cloud-init cloud-utils"
  # If this returns:
  #   error: netplan: signature from ... is unknown trust
  # then try:
  #   arch-chroot /mnt /bin/bash -c "pacman -S archlinux-keyring"

arch-chroot /mnt /bin/bash -c "pacman --needed --noconfirm -S aws-cli"

5.5. Set up the Linux audit framework

Thanks to Steven Noonan at Uplink Labs for much of this content. See the audit documentation at archlinux.

mkdir /mnt/etc/audit/rules.d

# Set audit rules
cat > /mnt/etc/audit/rules.d/audit.rules <<"EOF"
# From:
# https://security.blogoverflow.com/2013/01/a-brief-introduction-to-auditd/
# This file contains the auditctl rules that are loaded
# whenever the audit daemon is started via the initscripts.
# The rules are simply the parameters that would be passed
# to auditctl.
# First rule - delete all
-D

# Increase the buffers to survive stress events.
# Make this bigger for busy systems
-b 1024
-a always,exit -S adjtimex -S settimeofday -S stime -k time-change
-a always,exit -S clock_settime -k time-change
-a always,exit -S sethostname -S setdomainname -k system-locale
-w /etc/group -p wa -k identity
-w /etc/passwd -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/sudoers -p wa -k identity
-w /var/run/utmp -p wa -k session
-w /var/log/wtmp -p wa -k session
-w /var/log/btmp -p wa -k session
-w /etc/selinux/ -p wa -k MAC-policy

# Disable adding any additional rules.
# Note that adding new rules will require a reboot
-e 2

# from uplinklabs:
# -a never,task
EOF

Clean up and check it …

ls -al /mnt/etc/audit
chmod -R o-rwx /mnt/etc/audit
ls -al /mnt/etc/audit
less /mnt/etc/audit/rules.d/audit.rules

5.6. Running mkinitcpio

mkinitcpio will read the configuration files /etc/mkinitcpio.d/linux.preset and /etc/mkinitcpio.conf. It will generate an initial ramdisk in /boot/initramfs-linux.img.

mkinitcpio hooks are short scripts that extend the image build. A common hook is autodetect, which reduces the initramfs size by filtering out modules that the build machine does not need. However, in our case the guest will run in a different environment than the current machine. Using autodetect may eliminate modules that the guest needs, so we turn off autodetect.

# Blacklist the floppy to get rid of messages like:
# blk_update_request: I/O error, dev fd0, ...
# Buffer I/O error on dev fd0, ...
echo "blacklist floppy" > /mnt/etc/modprobe.d/blacklist-floppy.conf

# Include modules that may be needed in a variety
# of hypervisors, depending on where the guest is run.
MODULES=""

# Support power-off requests.
# ipmi is Intelligent Platform Management Interface,
# used to manage a machine outside the OS.
MODULES+="button ipmi-msghandler ipmi-poweroff"

# Support nvme, Non-Volatile Memory Express, a controller spec for SSDs
MODULES+=" nvme"

# Support the KVM, kernel-based virtual machine
MODULES+=" virtio virtio-blk virtio-net virtio-pci virtio-ring"

# Support the Xen virtual machine
MODULES+=" xen-blkfront xen-netfront xen-pcifront xen-privcmd"

# Support SR-IOV, single root i/o virtualization
MODULES+=" ixgbevf"

# Support for AWS EC2 ENA, Elastic Network Adapter
MODULES+=" ena"

# Set up mkinitcpio.conf:
# Set MODULES, and get rid of the floppy device.
sed -ri "s/^MODULES=.*/MODULES=($MODULES)/g" /mnt/etc/mkinitcpio.conf
sed -ri "s/^FILES=.*/FILES=(\/etc\/modprobe.d\/blacklist-floppy.conf)/g" /mnt/etc/mkinitcpio.conf

# Disable module auto-detection
ls /mnt/etc/mkinitcpio.d
mv /mnt/etc/mkinitcpio.d/linux.preset /mnt/etc/mkinitcpio.d/linux.preset.bk01

cat > /mnt/etc/mkinitcpio.d/linux.preset <<EOF
# mkinitcpio preset file for linux
ALL_config="/etc/mkinitcpio.conf"
ALL_kver="/boot/vmlinuz-linux"
PRESETS=('default')
default_image="/boot/initramfs-linux.img"
# Turn off autodetect:
default_options="-S autodetect"
EOF

# Finally, run mkinitcpio.
# Reads /etc/mkinitcpio.conf, /etc/mkinitcpio.d/*.preset.
# As specified in .conf, writes /boot/initramfs-linux.img
# Can ignore messages like:
# ==> WARNING: Possibly missing firmware for module: csiostor
# ...

time arch-chroot /mnt /bin/bash -c "mkinitcpio -P"

# Show the new /boot/initramfs-linux.img
ls /mnt/boot

# To display the contents of initramfs-linux.img, use:
# lsinitcpio /mnt/boot/initramfs-linux.img
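
Before editing the guest's real mkinitcpio.conf, the MODULES substitution can be tried on a scratch copy. The sketch below wraps the same sed expression in a hypothetical helper, set_modules; it assumes GNU sed and module names without slashes:

```shell
# set_modules CONF MODULE...  (hypothetical helper, assumes GNU sed)
# Rewrites the MODULES=(...) line of a mkinitcpio.conf-style file.
set_modules() (
  conf="$1"
  shift
  # "$*" joins the remaining arguments with single spaces.
  sed -ri "s/^MODULES=.*/MODULES=($*)/" "$conf"
)

# Dry run on a scratch copy rather than the guest's file:
#   cp /mnt/etc/mkinitcpio.conf /tmp/mkinitcpio.conf
#   set_modules /tmp/mkinitcpio.conf nvme ena
#   grep '^MODULES=' /tmp/mkinitcpio.conf
```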

6. cloud-init

6.1. cloud-init overview

The cloud-init system plays a crucial part in getting a guest running in the cloud. The documentation is at cloud-init.

During the guest boot process, the guest cloud-init package issues an HTTP request to a server hosted by the hypervisor. Typically this is at http://169.254.169.254. The hypervisor server provides two main types of information, as documented at cloud-init datasources.

  • meta-data: ami-id, hostname, instance-type, security-groups, etc.
  • user-data: name and ssh keys for the default user, startup scripts to be run, etc.

The guest cloud-init modules set the user’s password or .ssh/authorized_keys, so the user can ssh into the guest.

The cloud-init system contains about 60 modules. Each module runs at one of three stages during the boot process, as specified in /etc/cloud/cloud.cfg. The three stages are:

  • init: Set up hostname, users, groups, ssh
  • config: Set passwords, ntp, timezone
  • final: Install packages, run user scripts

Cloud-init modules and user scripts can run either:

  • per-boot: at every boot
  • per-instance: once, when a new instance is first booted
  • per-once: once, at first boot, even if the instance gets a new instance-id

Some module examples are:

  • init stage:

    • seed_random (per-instance): initialize the RNG seed
    • mounts (per-instance): add mounts and swap to /etc/fstab.
    • users-groups (per-instance): configures users, groups, etc
    • ssh (per-instance): ssh host keys, ssh authorized keys
  • config stage:

    • set-passwords (per-instance): enables and sets passwords
    • ntp (per-instance): set up NTP, the network time protocol
    • timezone (per-instance): set it
  • final stage:

    • package-update-upgrade-install (per-instance): exactly as it says
    • puppet (per-instance): install and start puppet
    • scripts-per-boot (per-boot): user scripts to run at every boot
    • scripts-per-instance (per-instance): user scripts to run on the first boot

The complete list of enabled modules is in the guest’s /etc/cloud/cloud.cfg.
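
To see which modules are enabled for one stage, the lists in cloud.cfg can be printed with a small awk filter. This sketch assumes the usual layout, where the stages are stored under the top-level keys cloud_init_modules, cloud_config_modules and cloud_final_modules with one "- name" item per line; list_stage_modules is a hypothetical helper and not a full YAML parser:

```shell
# list_stage_modules CFG STAGE  (hypothetical helper)
# STAGE is one of: init, config, final.
# Prints the uncommented list items under cloud_<STAGE>_modules.
list_stage_modules() (
  cfg="$1"
  stage="$2"
  awk -v key="cloud_${stage}_modules:" '
    $1 == key              { in_list = 1; next }   # start of the requested list
    in_list && /^[^ \t#-]/ { in_list = 0 }         # next top-level key ends it
    in_list && $1 == "-"   { sub(/^[ \t]*-[ \t]*/, ""); print }
  ' "$cfg"
)

# Example: list_stage_modules /mnt/etc/cloud/cloud.cfg init
```

Items that have been commented out with # are skipped, so the output reflects what will actually run.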

6.2. Logging for cloud-init

Normally cloud-init logging in the guest goes to

  /var/log/cloud-init-output.log     # brief
  /var/log/cloud-init.log            # detailed

This can be changed in the cloud-init config file in the guest: /etc/cloud/cloud.cfg.d/05_logging.cfg. See cloud-init logging.

6.3. Setting up cloud.cfg

The cloud-init locale module creates an invalid format in /etc/locale.gen, resulting in the error message:

cloud-init[324]: error: Bad entry 'LANG="en_US.UTF-8" '

The file in question is /etc/locale.gen

  • as generated: LANG="en_US.UTF-8"
  • should be: LANG=en_US.UTF-8

So we disable the cloud-init locale module by commenting it out.

cp /mnt/etc/cloud/cloud.cfg /mnt/etc/cloud/cloud.cfg.bk01
sed -ri '/- locale/s/^/#/' /mnt/etc/cloud/cloud.cfg

7. Final set up

7.1. Setting up grub

arch-chroot /mnt /bin/bash -c "grub-install --target=i386-pc --recheck ${dvc}"

# Set grub options
cp /mnt/etc/default/grub /mnt/etc/default/grub.bk01
sed -ri 's/GRUB_TIMEOUT=5/GRUB_TIMEOUT=3/' /mnt/etc/default/grub
sed -ri 's/^#GRUB_TERMINAL_OUTPUT/GRUB_TERMINAL_OUTPUT/' /mnt/etc/default/grub
sed -ri \
  "s/^GRUB_CMDLINE_LINUX_DEFAULT.*/GRUB_CMDLINE_LINUX_DEFAULT=\"console=ttyS0 earlyprintk=serial,ttyS0,keep loglevel=5 nomodeset\"/g" \
  /mnt/etc/default/grub
sed -ri '/^GRUB_TIMEOUT=/a GRUB_DISABLE_SUBMENU=y' /mnt/etc/default/grub

arch-chroot /mnt /bin/bash -c "grub-mkconfig > /boot/grub/grub.cfg"

7.2. Miscellaneous locale

Set the locale to the same one you uncommented in locale.gen (see above).

cat > /mnt/etc/vconsole.conf << "EOF"
KEYMAP=us
FONT=LatArCyrHeb-14
EOF

cp /etc/locale.conf /mnt/etc/locale.conf
# or
cat > /mnt/etc/locale.conf << "EOF"
LANG=en_US.UTF-8
EOF

7.3. Enable services

arch-chroot /mnt /bin/bash -c "systemctl enable systemd-timesyncd.service"
arch-chroot /mnt /bin/bash -c "systemctl enable nscd.service"
arch-chroot /mnt /bin/bash -c "systemctl enable auditd.service"
arch-chroot /mnt /bin/bash -c "systemctl enable haveged.service"
arch-chroot /mnt /bin/bash -c "systemctl enable irqbalance.service"
arch-chroot /mnt /bin/bash -c "systemctl enable sshd.service"
arch-chroot /mnt /bin/bash -c "systemctl enable cloud-init.service"
arch-chroot /mnt /bin/bash -c "systemctl enable cloud-config.service"
arch-chroot /mnt /bin/bash -c "systemctl enable cloud-final.service"

7.4. Firewall

Generally the hypervisor provides a firewall, such as AWS Security Groups. But if you want another layer of protection, you can set up UFW.

arch-chroot /mnt /bin/bash -c "pacman --needed --noconfirm -S ufw"
arch-chroot /mnt /bin/bash -c "ufw enable"

# *** Change the ip address below to your ip address ***
arch-chroot /mnt /bin/bash -c "ufw allow from 203.0.113.10/32 to any port 22"
arch-chroot /mnt /bin/bash -c "ufw status numbered"
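
Since a mistyped address in the allow rule can lock you out of SSH, it may be worth validating the address before handing it to ufw. The sketch below is a plain shell check of dotted-quad syntax; is_ipv4 is a hypothetical helper and does not verify that the address is really yours:

```shell
# is_ipv4 ADDRESS  (hypothetical helper)
# Succeeds only if ADDRESS is four dot-separated numbers, each 0..255.
is_ipv4() (
  addr="$1"
  case "$addr" in
    "" | *[!0-9.]* | .* | *. | *..*) exit 1 ;;   # empty, stray chars, empty octets
  esac
  IFS=.
  set -- $addr          # split on dots into $1..$n
  unset IFS
  [ "$#" -eq 4 ] || exit 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || exit 1
  done
)

# Example:
#   myip=203.0.113.10
#   is_ipv4 "$myip" && arch-chroot /mnt /bin/bash -c "ufw allow from $myip/32 to any port 22"
```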

7.5. Miscellaneous

# Change the default from 'graphical' to 'multi-user'
ln -sf ../../../../usr/lib/systemd/system/multi-user.target /mnt/etc/systemd/system/default.target

7.6. Network

The syntax is documented at systemd network.

cat > /mnt/etc/systemd/network/20-ethernet.network << "EOF"
[Match]
Name = en* eth*
[Network]
DHCP = yes
[DHCP]
UseMTU = yes
UseDNS = yes
UseDomains = yes
EOF

arch-chroot /mnt /bin/bash -c "systemctl enable systemd-networkd"
arch-chroot /mnt /bin/bash -c "systemctl enable systemd-resolved"
arch-chroot /mnt /bin/bash -c "systemctl enable sshd.service"

sed -ri '1i Port 22' /mnt/etc/ssh/sshd_config

8. Local testing

This completes the creation of the local image. Creating an AMI takes a few more steps.

If you plan to test the local image with a local VM such as qemu or xen, make a copy of the image first. The local testing will probably involve its own cloud-init, which may inhibit the AWS cloud-init later on.

9. AWS installation

9.1. Copy to AWS S3

Ensure you have the AWS CLI installed on your work machine:

pacman -S aws-cli

Also copy your AWS credentials to ~/.aws.

We want to copy imga.raw to AWS S3. But if we use the aws s3 cp command to copy it, the s3 cp command will copy the entire 8 GB, even though most of it is empty. The s3 cp command does not understand sparse files.

But tar does understand sparse. So tar it before copying it to S3:

tar cvSzf imga.tgz imga.raw
aws s3 cp imga.tgz s3://myBucket

Copying the tar file takes about 12% of the time of copying the original, so there is no need to upload imga.raw itself.
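
The effect of the S flag can be checked on a small scale before archiving the real image. The following sketch assumes GNU tar, truncate and stat; sparse_roundtrip is a hypothetical helper that archives a 4 MiB sparse file, restores it, and prints both sizes:

```shell
# sparse_roundtrip DIR  (hypothetical helper, assumes GNU tar and coreutils)
# Creates a 4 MiB sparse file in DIR, archives it with tar -S, restores it
# into DIR/out, and prints the archive size and the restored file size.
sparse_roundtrip() (
  cd "$1" || exit 1
  truncate -s 4M demo.raw || exit 1
  tar cSzf demo.tgz demo.raw || exit 1
  mkdir -p out
  tar xSzf demo.tgz -C out || exit 1
  echo "archive=$(stat -c %s demo.tgz) restored=$(stat -c %s out/demo.raw)"
)

# Example: sparse_roundtrip "$(mktemp -d)"
```

The archive should be a tiny fraction of the restored size, which is what makes the upload of imga.tgz so much faster.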

9.2. Launch an AWS instance and get the S3 image

If you don’t have a work instance at AWS, launch one. The Linux distribution doesn’t matter; Ubuntu is fine. We’ll call it awsTemp. It should have at least 12 GB disk space.

Note the instance id of awsTemp, for example "i-01234567890123456".

Log in to awsTemp, get the image from S3 and untar it.

aws s3 cp s3://myBucket/imga.tgz .
tar xvSzf imga.tgz

9.3. Create an EBS volume

Find the availability zone of awsTemp; an EBS volume can only be attached to an instance in the same zone. You can see all the possible availability zones by:

aws ec2 describe-availability-zones

Create the EBS volume. In this example we use zone us-west-2a.

aws ec2 create-volume \
  --availability-zone us-west-2a \
  --no-encrypted \
  --volume-type gp2 \
  --size 8

Note the returned VolumeId, for example "vol-11122233344455566".

Attach the volume to the running awsTemp instance.

aws ec2 attach-volume \
  --device /dev/xvdf \
  --instance-id i-01234567890123456 \
  --volume-id vol-11122233344455566

On awsTemp, copy the image to the new EBS volume:

dd if=imga.raw of=/dev/xvdf bs=512 conv=sparse,fsync

Detach the volume from the instance and create a snapshot:

aws ec2 detach-volume \
  --volume-id vol-11122233344455566
aws ec2 create-snapshot \
  --volume-id vol-11122233344455566

Creating a snapshot is an offloaded process. It typically takes several minutes to complete.

Note the SnapshotId, for example "snap-99988877766655544".

Occasionally check the status of the snapshot request by

aws ec2 describe-snapshots --snapshot-ids snap-99988877766655544

Wait until it responds "State: completed".
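
Checking by hand can be replaced with a small polling loop. The sketch below is generic: poll_until is a hypothetical helper that reruns any command until it prints the wanted word. The commented example pairs it with describe-snapshots and the CLI's --query and --output text options; the delay and retry count shown there are arbitrary assumptions. The AWS CLI also provides aws ec2 wait snapshot-completed, which may be all you need.

```shell
# poll_until WANT DELAY TRIES COMMAND...  (hypothetical helper)
# Runs COMMAND up to TRIES times, DELAY seconds apart, until its output
# equals WANT. Exit status: 0 on match, 1 on timeout, 2 if COMMAND fails.
poll_until() (
  want="$1"
  delay="$2"
  tries="$3"
  shift 3
  i=0
  while [ "$i" -lt "$tries" ]; do
    state=$("$@") || exit 2
    [ "$state" = "$want" ] && exit 0
    i=$((i + 1))
    sleep "$delay"
  done
  exit 1
)

# Example (needs AWS credentials):
#   poll_until completed 15 80 \
#     aws ec2 describe-snapshots --snapshot-ids snap-99988877766655544 \
#       --query 'Snapshots[0].State' --output text
```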

9.4. Create the AMI

The name is up to you, and may contain alphanumerics and ()[]./-_@

aws ec2 register-image \
  --architecture x86_64 \
  --block-device-mappings "DeviceName=/dev/xvda,Ebs={SnapshotId=snap-99988877766655544}" \
  --name my.name.for.imga \
  --root-device-name /dev/xvda \
  --virtualization-type hvm

Note the returned ImageId, such as "ami-01234567890123456"

We are done with the awsTemp machine, and it can be terminated.

9.5. Run the AMI

You can run it using either the AWS EC2 management console, or using the CLI.

An example CLI use is below. Supply your AWS key name, here shown as myKeyName, and your own subnet id and security group id. In this example we expand the root volume to 12 GB.

aws ec2 run-instances \
  --image-id ami-01234567890123456 \
  --count 1 \
  --instance-type t2.micro \
  --key-name myKeyName \
  --subnet-id mySubnetId \
  --security-group-ids mySecurityGroupId \
  --block-device-mappings \
    "[ { \"DeviceName\": \"/dev/xvda\", \"Ebs\": { \"VolumeSize\": 12 } } ]"
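
The escaped quotes in the JSON argument are easy to get wrong. One way around that, sketched here with a hypothetical helper named bdm_json, is to let printf build the string from the device name and the size in GiB:

```shell
# bdm_json DEVICE SIZE_GIB  (hypothetical helper)
# Prints a one-entry block device mapping as JSON.
bdm_json() (
  dev="$1"
  size="$2"
  case "$size" in
    "" | *[!0-9]*) echo "size must be a whole number of GiB" >&2; exit 1 ;;
  esac
  printf '[{"DeviceName":"%s","Ebs":{"VolumeSize":%s}}]\n' "$dev" "$size"
)

# Example:
#   aws ec2 run-instances ... --block-device-mappings "$(bdm_json /dev/xvda 12)"
```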