Introducing PagePagePage
Introduction
Sometimes, I want to publish a little something. Maybe a small article, maybe a few pictures, maybe a GPS recording from a hike. Possibly all of those together on a single page. Even though I can do it here, it's quite cumbersome: this is a static site generated from Markdown files by a site generator. That has some benefits, which is why I chose it in the first place. However, I cannot publish on the fly, and I clearly do not want to be in a walled garden such as Instagram.
I could install something like WordPress, but it's very heavy, and the web interface is not the most convenient to use on mobile. A mobile app may exist, but I think I have everything already installed on my phone, and I do not want to install something else.
And then came the enlightenment: Telegram's user interface and user experience (UI/UX) are very polished. There are a lot of features for styling text in a message, and it's possible to send individual images, create a group of photos, upload files, etc. It would be nice to simply chat on Telegram and have a website generated from what you send. And it turns out it's possible, thanks to Telegram bots!
A bot?
Telegram bots can do many different things. What I want from my bot is to generate a website from the messages it receives, whether they are text or images. Therefore, it needs to receive everything I send. For that, after creating a bot, I define a webhook: once a webhook is registered for my bot, Telegram's servers forward all messages sent to the bot to the URL of my choice.
After checking the documentation, it turns out that I need to register my webhook like this:
curl -F "url=https://b8171e08d567.ngrok-free.app/src/PagePagePage.php" https://api.telegram.org/botSECRET_BOT_ID:SECRET_BOT_KEY/setWebhook
Yep, I've decided on a PHP endpoint to receive all the messages from Telegram. It's widespread, it's easy to find a web host, and I naïvely thought I could use some free hosting with the French provider Free.fr. Plot twist, Telegram requires an HTTPS endpoint (which is good), and the personal homepages from Free are stuck in the past, with PHP 4 and no HTTPS at all. Too bad, I'll be using something else. For example, the current server at <gregoire.surrel.org>!
But I did not do that (for development). As you noticed, the address of the webhook is b8171e08d567.ngrok-free.app. Many developers have faced the same problem: "How can I develop a script reacting to a webhook when the requests cannot be routed to my laptop, which is hidden from the Internet behind a NAT?" There is no local Telegram simulator, but the nice people at ngrok provide a free service that forwards all the requests they receive on an address assigned to you directly to your computer. Yes, this sentence makes sense. Basically, run the ngrok client locally on your computer: it connects to the ngrok servers and keeps the connection alive for incoming requests.
Introducing the PagePagePage bot
My bot is called PagePagePage, and of course I've created a logo:
I've created the first draft of the service in a day or two, with a very simplified approach. For posterity, I've kept that very first version in the repository. There is zero structure, just a quick and dirty concept. Then I spent many more hours creating a nice-ish architecture, with a lot of features, split into many different files to isolate responsibilities.
Flow and architecture
Let's have a text diagram of the flow:
┌────────────────────────────┐
│      Telegram Webhook      │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│      PagePagePage.php      │
│ (bootstrap + kernel entry) │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│       BotKernel.php        │
│ (Main orchestrator logic)  │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│      UpdateParser.php      │
│ (Parses Telegram update →  │
│  DTO objects)              │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│     TelegramUpdate DTO     │
│ (Structured representation │
│  of the update)            │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│     CommandRouter.php      │
│ (Routes to correct command │
│  or callback handler)      │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│       CommandBus.php       │
│ (Dispatches command class) │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────────────────┐
│         [SpecificCommand].php          │
│ (e.g. StartCommand, CreateArticle...)  │
└────────────────────────────────────────┘
There are additional abstractions here and there because the code evolved from simple to more complex: there are some helper files, some configuration context is passed from place to place, etc. Nothing overwhelming, but still, more complexity than a single, flat file.
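To make the flow above more concrete, here is a minimal sketch of the same parse, route and dispatch chain. It is written in Python for brevity, not in the PHP the bot actually uses, and it is not the code from the repository: the function names, the handler table and the reply strings are all hypothetical. Only the shape of the payload (a `message` object holding a `chat` and an optional `text`) follows the Telegram Bot API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class TelegramUpdate:
    """Structured view of the few update fields this sketch cares about."""
    chat_id: int
    text: Optional[str]


def parse_update(payload: dict) -> TelegramUpdate:
    """UpdateParser step: turn the decoded webhook JSON into a DTO."""
    message = payload.get("message", {})
    return TelegramUpdate(
        chat_id=message.get("chat", {}).get("id", 0),
        text=message.get("text"),
    )


Handler = Callable[[TelegramUpdate], str]


def start_command(update: TelegramUpdate) -> str:
    return f"Welcome, chat {update.chat_id}!"


def create_article(update: TelegramUpdate) -> str:
    return f"Article stored: {update.text}"


# Hypothetical command table: a command name maps to the handler that runs it.
HANDLERS: Dict[str, Handler] = {"/start": start_command}


def route(update: TelegramUpdate) -> Handler:
    """CommandRouter step: pick a handler; plain text falls back to an article."""
    first_word = (update.text or "").split(" ", 1)[0]
    return HANDLERS.get(first_word, create_article)


def handle_webhook(payload: dict) -> str:
    """BotKernel step: parse the update, route it, then dispatch it."""
    update = parse_update(payload)
    return route(update)(update)


print(handle_webhook({"message": {"chat": {"id": 42}, "text": "/start"}}))
print(handle_webhook({"message": {"chat": {"id": 42}, "text": "Hello world"}}))
```

Keeping the parsing step separate from the handlers means each handler only ever sees a small, typed object instead of the raw JSON, which is presumably the point of the DTO layer in the diagram.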
Database
One pain point I had was the database: I first started by serializing all the data to a parsable file. I tried both PHP and JSON serialization, and while it worked well most of the time, it does NOT work well with concurrent requests! And those happen easily when sending multiple images in the same Telegram message. The bot downloads each image it receives, and if a new image arrives while the previous one is still being downloaded, both requests end up rewriting the same file and data gets lost in the "database".
Before you yell "of couuuuurse, you should never do that anyway, you deserved it", let me remind you that:
- It's a single-user service: one author, one database, one website. There is no concurrency to expect, except the unexpected kind, when Telegram sends multiple message events at once to the webhook.
- Similar tools (e.g. BlogoText or Shaarli) do or did use a plain text file as a database. And it's rock solid, and lowers the amount of dependencies required to run the tool!
Anyway, I could not properly and neatly serialize the different requests, so I've used the smallest and most widespread database engine available: SQLite.
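As an illustration of why SQLite removes the problem, here is a small sketch in Python (not the bot's PHP code) that simulates several webhook calls arriving at once. The table name, column names and file name are invented for the example. Each simulated request opens its own connection and writes inside a transaction, so SQLite's file locking serializes the writes instead of letting one overwrite another.

```python
import sqlite3
import tempfile
import threading
from pathlib import Path


def store_image(db_path: str, file_id: str) -> None:
    """Record one received image; each request opens its own connection."""
    # timeout: wait up to 10 seconds if another request holds the write lock.
    conn = sqlite3.connect(db_path, timeout=10)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO images (file_id) VALUES (?)", (file_id,))
    finally:
        conn.close()


def count_images(db_path: str) -> int:
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]
    finally:
        conn.close()


def demo(n_images: int = 8) -> int:
    """Simulate n_images webhook calls hitting the same database together."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = str(Path(tmp) / "user.sqlite")  # hypothetical file name
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE images (id INTEGER PRIMARY KEY, file_id TEXT NOT NULL)"
        )
        conn.commit()
        conn.close()

        threads = [
            threading.Thread(target=store_image, args=(db_path, f"photo-{i}"))
            for i in range(n_images)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return count_images(db_path)


print(demo())
```

With a hand-rolled file, each request would read the whole file, modify its copy and write it back, so the last writer wins. Here every insert is its own short transaction, and a request that finds the database locked simply waits.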
Security
The security does not exist. Because it does not need to exist.
Actually, I'm a troll, but not so much. First and foremost, the website created by PagePagePage is static. This means that there is no content that is rendered on the fly as all pages are generated ahead of time.
The only interactive part is the Telegram webhook for the bot. This is where Telegram is actually pretty smart: it is possible to register the webhook with a special secret token. All incoming requests from Telegram will bear this secret token, which you can validate. In other words, if an attacker finds the address of your webhook, it's still not possible to send fake requests as the attacker does not know the secret!
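For reference, the secret is passed as the `secret_token` parameter when calling `setWebhook`, and Telegram then sends it back in the `X-Telegram-Bot-Api-Secret-Token` header of every request. The check itself could look like the sketch below, written in Python rather than the bot's PHP; the function names are hypothetical and the headers are assumed to be available as a plain dictionary.

```python
import hmac
import secrets
from typing import Mapping

SECRET_HEADER = "X-Telegram-Bot-Api-Secret-Token"


def new_secret_token() -> str:
    """Generate a random token made only of A-Z, a-z, 0-9, _ and -."""
    return secrets.token_urlsafe(32)


def is_from_telegram(headers: Mapping[str, str], expected_token: str) -> bool:
    """Return True only if the request carries the registered secret token."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    received = normalised.get(SECRET_HEADER.lower(), "")
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(received.encode(), expected_token.encode())


token = new_secret_token()
print(is_from_telegram({SECRET_HEADER: token}, token))
print(is_from_telegram({SECRET_HEADER: "guess"}, token))
```

A request that fails the check can be rejected before any parsing happens, which keeps the attack surface of the webhook down to a single string comparison.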
File management
I built the service in such a way that all the data for a given Telegram user is stored in a single directory, as well as the potential log files. The service is literally split per-user. The database is in the user's personal directory. The generated website is in the user's personal directory. All the images and assets are in the user's personal directory.
The good question now is "How to expose only the generated website and assets without exposing the database to the world?"
One approach can be to copy all the generated website with the assets to a public directory. While it works, it's very wasteful. Also, it's unclear how to properly and efficiently regenerate the website without re-downloading the photos from Telegram.
The approach I've used relies heavily on symbolic links. A symbolic link is a special file that points to another file or directory in the filesystem. It means that I can create a link from the public folder directly to the generated site in the user's private folder. Similarly, I can store all the pictures and files sent by the user in an assets store and create symbolic links from the generated site to the assets store to get the data. Zero copy, full efficiency!
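Here is a rough sketch of that layout, in Python rather than PHP, run inside a temporary directory. The directory names (`users/1234`, `site`, `assets`, `public`) and the file names are made up for the example and are not necessarily those used by PagePagePage. It assumes a POSIX system where creating symbolic links needs no special privilege, and a web server configured to follow them.

```python
import tempfile
from pathlib import Path


def publish(private_site: Path, public_link: Path) -> None:
    """Expose a generated site by linking to it instead of copying it."""
    if public_link.is_symlink():
        public_link.unlink()  # replace a stale link from a previous run
    public_link.symlink_to(private_site, target_is_directory=True)


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical layout: everything for one user lives in one directory.
        user_dir = root / "users" / "1234"
        site = user_dir / "site"
        assets = user_dir / "assets"
        public = root / "public"
        for directory in (site, assets, public):
            directory.mkdir(parents=True)

        (user_dir / "database.sqlite").write_text("private")
        (assets / "photo.jpg").write_text("fake image bytes")
        (site / "index.html").write_text("<h1>Hello</h1>")

        # The generated site points at the asset store, no file is duplicated.
        (site / "photo.jpg").symlink_to(assets / "photo.jpg")
        # Only the site directory becomes reachable from the public folder.
        publish(site, public / "1234")

        # The database sits one level above the link target, so it stays hidden.
        assert not (public / "1234" / "database.sqlite").exists()
        return (public / "1234" / "photo.jpg").read_text()


print(demo())
```

Regenerating the site then only means rewriting the HTML and recreating the links; the files already downloaded from Telegram stay where they are in the assets store.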
Website rendering
I've had some fun when rendering the website, using features that are seldom used. For example, it is possible to use CSS colors that depend on the user's system theme. This works especially well with Firefox, while Chrome derivatives will default to a shade of blue.
Along with a dark/light mode that's automatically applied, it yields pretty neat results.
Here is a demo of using the CSS system colors to adapt to the user's theme.
Similarly, it is now possible to give some structure to CSS files thanks to native nesting. I have tried to create a clean stylesheet that follows the semantic structure of the HTML:
/* Main article container */
main {
  background: Canvas;

  article {
    section {
      p {
        margin: 1.2em 0;
      }

      code {
        font-family: monospace;
        background-color: ButtonFace;
        padding: 0.2em 0.4em;
        border-radius: var(--radius);
      }

      blockquote {
        margin: 1em 0;
        padding: 0.5em 1em;
        background-color: ButtonFace;
        border-left: 4px solid var(--accent);

        &[cite]::after {
          content: attr(cite);
          display: block;
          color: var(--lessaccent);
          text-align: right;
          font-style: italic;
        }
      }
      /* ... */
    }
  }
}
Yes, as you have noticed, the generated website is featureless: it is the shiny web3, without any blockchain, without NFTs, without the metaverse. It's also not the web with metrics, statistics, engagement, social anything.
It's the pure web made of simple pages, that are usable, easy to understand, quick to load and fast to display. The absence of features is a feature.
If you read French, you can read Korben's article on the IndieWeb, as PagePagePage is clearly created with that in mind. Or you can directly read the IndieWeb information at the source.
Special features
While the generated site does not contain any JavaScript to run, ensuring very fast display, there is an exception. When sending a .gpx file, the generation embeds a small map rendering the recorded GPS track.
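To give an idea of what a .gpx file contains, the sketch below reads the track points out of a GPX 1.1 document with Python's standard library. It only covers the extraction of coordinates: how PagePagePage actually parses the file and draws the map is not described here, so the function name and the sample track are purely illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# GPX 1.1 elements live in this XML namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def track_points(gpx_text: str) -> List[Tuple[float, float]]:
    """Return the (latitude, longitude) pair of every track point, in order."""
    root = ET.fromstring(gpx_text)
    return [
        (float(point.attrib["lat"]), float(point.attrib["lon"]))
        for point in root.iterfind(".//gpx:trkpt", GPX_NS)
    ]


# Made-up two-point track, just enough to exercise the parser.
SAMPLE = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="46.5197" lon="6.6323"/>
    <trkpt lat="46.5200" lon="6.6330"/>
  </trkseg></trk>
</gpx>"""

print(track_points(SAMPLE))
```

A list of coordinates like this is all a map widget needs to draw the track as a line.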
Also, not really a special feature, but something nice to have, is a syndication feed. This way, you can easily subscribe to any PagePagePage website and follow any author you like.
Putting it in production
The service is already in production on PagePagePage.org. Over one month, I have received only a single bug report. Otherwise, it has been flawless 👌
There are however some fun statistics regarding the access logs. Let's have a quick look at the "404 Page Not Found" errors:
/wp-admin/setup-config.php
/wordpress/wp-admin/setup-config.php
/robots.txt
//shop/wp-includes/wlwmanifest.xml
//xmlrpc.php
//test/wp-includes/wlwmanifest.xml
//wordpress/wp-includes/wlwmanifest.xml
//blog/wp-includes/wlwmanifest.xml
//wp/wp-includes/wlwmanifest.xml
//site/wp-includes/wlwmanifest.xml
Who says that WordPress sites are attacked on a very regular basis? I hope all WordPress administrators apply the security updates the day before they are released, for maximum safety!
Conclusion
I would be happy if you tried PagePagePage and reported any problems you might have.
As for closing words, I would like to say the following:
Do not use the PagePagePage.org service! Please host it yourself if you can!