Reading mail

Connecting

I have never really dug into the inner workings of mail before, so I wasn’t quite sure what to expect. The connection itself was straightforward, though. Like any connection to an IMAP server, it needs the server, username, password and so on:

import imaplib

server = imaplib.IMAP4_SSL(s) # s is server address
server.login(n, p) # n = username, p = password
server.select("inbox", readonly=only_read_mails)
obj, data = server.search(None, '(FROM "%s" UNSEEN)' % f) # f = required from address

The search part above gives back a list (data). Its first element holds the message numbers of all the mails in the inbox that match the search query (from the given address and unseen).

The only thing to note here is the variable only_read_mails. If it is set to True, the mailbox is opened read-only and the mails will not be marked as read. This made it quicker to test the fetching and reading of mails, which brings me to:
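A minimal sketch of how that result can be turned into something to loop over. imaplib hands back the matches as one bytes string of space-separated message numbers, so it has to be split first. The helper name message_numbers is made up for this example.

```python
def message_numbers(data):
    """Split the SEARCH response from imaplib into single message numbers."""
    # data looks like [b'1 4 7']; with no matches it is [b'']
    if not data or not data[0]:
        return []
    return data[0].split()


# Each entry can be passed on to server.fetch() as it is.
numbers = message_numbers([b'1 4 7'])
```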

Fetching and reading

From there it was easy to download/fetch the mails. I ended up with:

typ, data = server.fetch(num, '(RFC822)')

The num parameter comes from a for loop and indicates which mail from the earlier search result should be fetched. The RFC822 part asks the server for the complete raw message, headers and body, in the standard internet message format. I don’t know much more about RFC822 than that. It was used in a lot of the examples and worked with the email library, so I kept using it and didn’t read further into it. Not good coding practice, but this is only for fun, after all.

The real work began with getting the stuff I wanted out of all the data there is in a mail. This is part of that data (the mail addresses have been removed):
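For illustration, the loop around that fetch call could look roughly like this. It is a sketch, not the exact code from the project: the function name fetch_raw_mails is made up, and server is assumed to be an object with the same fetch() method as an imaplib connection.

```python
def fetch_raw_mails(server, numbers):
    """Fetch every message number and return the raw bytes of each mail."""
    raw_mails = []
    for num in numbers:
        typ, data = server.fetch(num, '(RFC822)')
        if typ != 'OK':
            continue  # skip mails the server could not deliver
        # data[0] is a tuple: (response header, raw message bytes)
        raw_mails.append(data[0][1])
    return raw_mails
```

Checking typ before touching data avoids an error further down if a single fetch fails.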

Date: Fri, 06 Oct 2023 23:31:49 +0200
From: Emil Harder <---@---.--->
To: ---@---.---
Subject: Deploy test
User-Agent: K-9 Mail for Android
Message-ID: <---@---.--->
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary=----XOIDOQFDS4UYUVXEFUPVCF0TJEH7YJ
Content-Transfer-Encoding: 7bit

------XOIDOQFDS4UYUVXEFUPVCF0TJEH7YJ
Content-Type: text/plain;
 charset=utf-8
Content-Transfer-Encoding: quoted-printable

From=20mobile
------XOIDOQFDS4UYUVXEFUPVCF0TJEH7YJ
Content-Type: text/html;
 charset=utf-8
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE html><html><body><div dir=3D"auto">From mobile</div></body></html=>
------XOIDOQFDS4UYUVXEFUPVCF0TJEH7YJ--

The parts of the mail I wanted were the from address, the subject field and the body. I also realised that I could use the date from the mail as the publishing date for the Hugo page.

That’s all well and good. But when I tested it with a mail sent from Thunderbird, I got this:

Date: Fri, 6 Oct 2023 22:18:45 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Content-Language: da
To: ---@---.---
From: Emil Harder <---@---.--->
Subject: Efter test
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Stadig fyuldt?

Getting the right outputs

First of all: I’m glad I didn’t “just” try to handle the string with some splitting and regexp. That would have been a drag. Luckily the email library made it very easy to extract those parts. First the whole mail:

mail = email.message_from_bytes(data[0][1])

The date part was easy to get:

date = email.header.decode_header(mail["Date"])[0][0]
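The date header is still just a string at this point. As a sketch of how it, and the from address, could be turned into something more usable with the standard library's email.utils module: the address emil@example.com is a placeholder, since the real addresses are removed from the samples.

```python
import email
from email.utils import parseaddr, parsedate_to_datetime

# A cut-down version of the K-9 sample, with a placeholder address.
RAW = (
    b"Date: Fri, 06 Oct 2023 23:31:49 +0200\r\n"
    b"From: Emil Harder <emil@example.com>\r\n"
    b"Subject: Deploy test\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"From mobile\r\n"
)

mail = email.message_from_bytes(RAW)

# parseaddr splits "Name <address>" into its two halves.
name, address = parseaddr(mail["From"])

# parsedate_to_datetime gives a timezone-aware datetime object.
published = parsedate_to_datetime(mail["Date"])
```

A datetime object is easier to format for a publishing date than the raw header text, for example with published.isoformat().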

The subject (and the body, as you’ll see later) needs some decoding to be able to handle utf-8:

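text, encoding = email.header.decode_header(mail['subject'])[0]
# plain ASCII subjects come back as str, encoded ones as bytes
subject = text.decode(encoding or 'utf-8') if isinstance(text, bytes) else text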

There might be a prettier solution than this, with fewer variables and writes to memory. But again: it works, and I grew tired of testing.
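One possibly prettier option, sketched here with a made-up helper name decode_subject: email.header.make_header can join whatever decode_header returns, so the same line handles both plain and encoded subjects.

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject):
    """Return the subject as str, whatever mix of encodings the header uses."""
    # decode_header yields (fragment, charset) pairs; make_header joins them
    return str(make_header(decode_header(raw_subject)))


plain = decode_subject("Deploy test")
encoded = decode_subject("=?utf-8?q?Efter_t=C3=A6st?=")
```

The second subject is an invented example of how a mail client would encode a subject containing a non-ASCII letter.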

The thing that caught me out was the different types of email: the Content-Type part. In the first sample the Content-Type for the mail is multipart/alternative, while in the second it is text/plain (plus some other stuff).

While there might be other kinds of mail types, I don’t intend to use other mail clients than K-9 and Thunderbird. So I take the chance and only check for those two kinds in the code.

The email library can give the content type, which makes it easy to create an if statement:

if mail.is_multipart():
    ...
elif mail.get_content_type() == "text/plain":
    ...

While this isn’t the most elegant if statement I’ve made, it works. I could not find any “is_text” function, but I could have used something like:

mail.get_content_type() == "multipart/alternative"

~~I might do that one day, if the sloppiness will ever irritate me enough~~.

Scratch that! I did it. Now it looks better, and I can sleep again:

if mail.get_content_type() == "multipart/alternative":
    ...
elif mail.get_content_type() == "text/plain":
    ...

For the multipart mail, another for loop is used to traverse the different parts of the mail (the parts between the ------XOIDOQFDS4UYUVXEFUPVCF0TJEH7YJ lines). I used an if statement to look for the text/plain part. That can then be stored, just like when the whole mail is of the text/plain type:

body = part.get_payload(decode=True).decode('utf-8')
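Put together, the body handling could be sketched like this. It is not the exact code from the project: the function name plain_text_body is made up, and it uses walk() from the email library instead of two separate branches, since walk() visits the message itself as well as every part inside it.

```python
import email


def plain_text_body(mail):
    """Return the text/plain body of a parsed mail, or None if there is none."""
    # walk() yields the message itself first, then every nested part,
    # so one loop covers both multipart and plain-text mails.
    for part in mail.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            # Fall back to utf-8 when the part does not name a charset.
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return None


raw = b"Content-Type: text/plain; charset=UTF-8\r\n\r\nStadig fyuldt?\r\n"
body = plain_text_body(email.message_from_bytes(raw))
```

Reading the charset from the part rather than hard-coding utf-8 is a small safety net in case a client ever sends something else.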

That’s pretty much all there is to the “mail handling” part. The next post will be about the “text formatting” and “file creating” parts.