Moving my blog from Jekyll to Hugo with Claude Code

This blog started in 2003 on SPIP. I later moved to Jekyll to get a simpler setup that did not need a database. Jekyll was a good choice at the time, but it has become cumbersome to maintain, especially with the need to run Ruby and its dependencies. I was stuck on old versions of Jekyll and Ruby, and each time I wanted to publish a new post, I had to spend time fighting with the environment.

Streamlining School Menu Extraction with Mistral's Latest OCR Technology

With the recent announcement of Mistral OCR, an old idea of using code to extract my son’s school menu has been revived.

The menu looks like this: [image: School Menu]

The goal is to extract the starter, main course, dessert, and snack for each day.

Manually extracting this information can be challenging because the document is designed to be visually appealing for humans, not machines. However, with Mistral’s new API, this task can now be accomplished with just a few lines of code.

Colvert

In my free time, I’m working on a toy project named Colvert. It lets me test some ideas and play with technologies I’m interested in (Python, DuckDB, HTMX). More importantly, it’s software I use for my own needs.

It provides a fast UX for exploring large CSV/Parquet files using SQL. Results refresh as you type, and you can get a chart with one click. It’s much faster than a spreadsheet, and as a developer I find SQL more comfortable.

There is a toy LLM integration for text-to-SQL. It’s a domain I want to explore more this year.

Use Common Crawl to access web data

Common Crawl is a non-profit that freely provides petabytes of web data, making it a goldmine for AI and data projects. Instead of crawling the web yourself, you can tap into their regularly updated archives hosted on AWS.

This guide shows you how to:

  • Access and query the dataset via HTTP, S3, or AWS Athena
  • Use the Common Crawl Index API to locate specific pages
  • Efficiently extract only the data you need without downloading terabytes
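The index lookup and the "only the data you need" step can be sketched in a few lines of Python. This is a minimal sketch, not the guide's own code: it assumes the public index server at index.commoncrawl.org and the HTTP data host at data.commoncrawl.org, and it assumes each index result is a JSON line carrying `filename`, `offset`, and `length` fields. The crawl ID used in the usage note is only an example, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoints for the index server and the HTTP data host.
INDEX_HOST = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"


def index_query_url(crawl_id: str, page_url: str) -> str:
    """Build an index query that asks for one JSON object per line."""
    query = urlencode({"url": page_url, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{query}"


def record_request(index_line: str) -> tuple[str, dict[str, str]]:
    """Turn one index result line into a (url, headers) pair for a ranged GET."""
    record = json.loads(index_line)
    offset = int(record["offset"])
    length = int(record["length"])
    # HTTP byte ranges are inclusive, hence the -1 on the end position.
    headers = {"Range": f"bytes={offset}-{offset + length - 1}"}
    return f"{DATA_HOST}/{record['filename']}", headers
```

Calling `index_query_url("CC-MAIN-2024-10", "example.com")` gives the URL to query. Passing one line of the response to `record_request` gives the archive URL and a `Range` header, so any HTTP client can download just that record instead of the whole archive file.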

Use a proxy with Waydroid

Waydroid is a project that lets you run Android applications on a Linux distribution. It’s a fork of the Anbox (Android in a Box) project. Android applications run in a container, so they avoid the overhead of an emulator.

This article explains how to route Waydroid traffic through a proxy and intercept it. This can be useful for reverse engineering an API or for security testing.
