
Last Update: May 2, 2025


By eric


Building an AI Knowledge Base: Using tyo-crawler to Convert Evernote Notes into Markdown

In the age of AI, a well-structured and easily accessible knowledge base is crucial. Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are changing how we interact with information, but their effectiveness relies heavily on the quality and format of the data they access. Many of us have accumulated vast amounts of information in note-taking applications like Evernote. While Evernote is great for personal organization, it isn't designed for seamless integration with AI systems.

This is where the need for a robust knowledge base in a format like Markdown becomes apparent. Markdown's simplicity and versatility make it an ideal format for AI consumption. In this article, we'll explore how to transform your Evernote notes into a structured Markdown knowledge base using tyo-crawler, a powerful open-source web crawler. This process will prepare your data for integration with various open-source RAG tools, enhancing the capabilities of your AI applications.

The Power of RAG and the Need for Structured Data

Retrieval-Augmented Generation (RAG) is a game-changer in the AI landscape. It enhances LLMs by allowing them to access and incorporate external knowledge, leading to more accurate, relevant, and grounded responses. As we discussed in our previous post, "Building a Retrieval-Augmented Generation System Using Open-Source Tools", open-source tools like Open Web UI, Verba, Onyx, RagFlow, and others provide robust frameworks for building RAG systems.

However, these tools are only as good as the data they consume. RAG systems thrive on structured, easily parsable data. While Evernote is excellent for capturing and organizing notes, its proprietary format isn't directly compatible with most RAG tools. This is where converting your Evernote notes to Markdown becomes essential.

Why Markdown?

Markdown is a lightweight markup language that's easy to read and write. Its simplicity makes it ideal for:

  • AI Consumption: RAG systems can easily parse and understand Markdown's structure.
  • Version Control: Markdown files can be tracked using version control systems like Git.
  • Interoperability: Markdown is widely supported across various platforms and tools.
  • Flexibility: It can be easily converted to other formats like HTML, PDF, etc.
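
To make the "AI Consumption" point concrete, here is a minimal sketch of how a note's Markdown structure can be recovered with a few lines of Python. It assumes the notes use `#`-style headings; the function name `split_by_headings` is hypothetical and is not part of tyo-crawler or any RAG tool. It also does not skip `#` lines inside code blocks, so treat it as an illustration rather than a full Markdown parser.

```python
import re

# Matches "#"-style headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown_text):
    """Return a list of (level, title, body) tuples, one per section."""
    sections = []
    level, title, body = 0, "", []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section before starting a new one.
            if title or any(part.strip() for part in body):
                sections.append((level, title, "\n".join(body).strip()))
            level, title, body = len(match.group(1)), match.group(2).strip(), []
        else:
            body.append(line)
    if title or any(part.strip() for part in body):
        sections.append((level, title, "\n".join(body).strip()))
    return sections


sample = "# Trip notes\nPacking list.\n\n## Flights\nDepart Monday.\n"
sections = split_by_headings(sample)
for level, title, body in sections:
    print(level, title, "->", body)
```

Each section carries its heading level and title, which a retrieval pipeline could store as metadata next to the text.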

Introducing tyo-crawler: Your Evernote-to-Markdown Bridge

tyo-crawler is a versatile, open-source web crawler designed for efficient data extraction and transformation. As detailed in our previous articles, "A Deep Dive into the tyo-crawler Web Crawler" and "How to Scrape Twitter (X) Data Using tyo-crawler", tyo-crawler excels at navigating complex websites, extracting specific data, and converting it into structured formats.

While tyo-crawler is primarily used for web scraping, its powerful parsing and transformation capabilities make it an excellent tool for converting Evernote notes to Markdown. Here's how we can leverage it:

For installation and usage instructions, refer to the tyo-crawler articles mentioned above. As in the Twitter scraping example, we will use the evernote processor and a specific set of actions to extract data from Evernote and convert it into Markdown.

Actions File for Evernote

tyo-crawler uses an actions.json file to define the actions it should perform while crawling. This file specifies how to interact with web pages, including logging in, navigating, and extracting data, so it is the main way to customize the crawl.

An example actions file for crawling Evernote notes, examples/evernote.example.json, is included in the tyo-crawler directory. You don't need to modify this file, but you can refer to it to understand how the crawling process works.

Preparing Evernote Links

Assuming you have a list of Evernote links, you can create a file named evernote_links.txt containing the URLs of your notes. Each link should be on a new line. For example:

```plaintext
https://www.evernote.com/shard/s123/nl/1234567890/abcd1234-5678-90ef-ghij-klmnopqrstuv
https://www.evernote.com/shard/s123/nl/1234567890/wxyz9876-5432-10ab-cdef-ghijklmnopqr
```
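
If the links were collected by hand, it can help to clean the list before crawling. The following is a small sketch, not something tyo-crawler requires: the helper name `clean_links` is hypothetical, and the rule "https and a host ending in evernote.com" is an assumption based on the example URLs above.

```python
from urllib.parse import urlparse


def clean_links(lines):
    """Return unique https Evernote URLs, keeping their original order."""
    seen = set()
    links = []
    for raw in lines:
        url = raw.strip()
        if not url:
            continue  # skip blank lines
        parsed = urlparse(url)
        host = parsed.hostname or ""
        # Assumption: valid note links live on evernote.com or a subdomain.
        is_evernote = host == "evernote.com" or host.endswith(".evernote.com")
        if parsed.scheme == "https" and is_evernote and url not in seen:
            seen.add(url)
            links.append(url)
    return links


raw_lines = [
    "https://www.evernote.com/shard/s123/nl/1234567890/abcd1234-5678-90ef-ghij-klmnopqrstuv\n",
    "\n",
    "https://www.evernote.com/shard/s123/nl/1234567890/abcd1234-5678-90ef-ghij-klmnopqrstuv\n",
    "http://example.com/not-a-note\n",
]
links = clean_links(raw_lines)
print("\n".join(links))
```

The cleaned list can then be written to evernote_links.txt, one URL per line.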

Running tyo-crawler

To run tyo-crawler with the Evernote processor, use the following command:

```bash
tyo-crawler --browser-wait-time 8000 --actions-file ./examples/evernote.example.json --with-cookies true --processor evernote --links-file ./evernote_links.txt
```

All converted Markdown files will be saved in the output directory.
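
If you want to run the crawl from a script, for example once per links file, the same command can be assembled in Python. This is only a sketch: `build_command` is a hypothetical helper, it uses only the flags shown in the command above, and it assumes `tyo-crawler` is available on your PATH.

```python
import shlex


def build_command(links_file,
                  actions_file="./examples/evernote.example.json",
                  wait_ms=8000):
    """Build the argument list for the Evernote crawl shown above."""
    return [
        "tyo-crawler",
        "--browser-wait-time", str(wait_ms),
        "--actions-file", actions_file,
        "--with-cookies", "true",
        "--processor", "evernote",
        "--links-file", links_file,
    ]


command = build_command("./evernote_links.txt")
print(shlex.join(command))  # inspect the command before running it
# To actually start the crawl:
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a file path contains spaces.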

Example Output

The output is a set of folders, one per Evernote note. Each folder is named after the note's title and contains the note's content as index.md, including its text and links to images, which are saved under ./[NOTE_TITLE]/resources/. For example, a note titled "My First Note" is saved as output/My_First_Note. The content is structured Markdown, which makes it easy to read and to integrate with other tools.
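
A quick way to check a finished run is to list the note folders and count their saved resources. The sketch below assumes the layout described above (one folder per note with an index.md and an optional resources/ folder); `list_notes` is a hypothetical helper, and the demo builds a throwaway directory instead of reading real crawler output.

```python
import tempfile
from pathlib import Path


def list_notes(output_dir):
    """Return (folder_name, resource_count) for each folder with an index.md."""
    notes = []
    for folder in sorted(Path(output_dir).iterdir()):
        if not (folder / "index.md").is_file():
            continue  # not a converted note
        resources = folder / "resources"
        count = 0
        if resources.is_dir():
            count = sum(1 for item in resources.iterdir() if item.is_file())
        notes.append((folder.name, count))
    return notes


# Demo on a temporary directory that mimics the described layout.
with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "My_First_Note"
    (note / "resources").mkdir(parents=True)
    (note / "index.md").write_text("# My First Note\n", encoding="utf-8")
    (note / "resources" / "image.png").write_bytes(b"")
    inventory = list_notes(tmp)

print(inventory)
```

On a real run, call `list_notes("output")` and compare the number of entries with the number of links you crawled.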

Organizing Your Markdown Knowledge Base

After the conversion, you'll have a collection of Markdown files. It's a good practice to organize them into folders based on topics or notebooks. This structure will make it easier to manage and use your knowledge base with RAG tools.
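
One possible way to do this is a small script that moves each note folder under a topic folder. The mapping from note to topic is something you supply yourself; `organize` and the topic names below are hypothetical, and the demo again runs on a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def organize(output_dir, topics):
    """Move note folders into output_dir/<topic>/; unmapped notes stay put."""
    output_dir = Path(output_dir)
    moved = []
    for note_name, topic in sorted(topics.items()):
        source = output_dir / note_name
        if not source.is_dir():
            continue  # note not found, nothing to move
        target_parent = output_dir / topic
        target_parent.mkdir(exist_ok=True)
        shutil.move(str(source), str(target_parent / note_name))
        moved.append(f"{topic}/{note_name}")
    return moved


# Demo: two fake notes sorted into two topics.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("My_First_Note", "Meeting_Minutes"):
        (Path(tmp) / name).mkdir()
        (Path(tmp) / name / "index.md").write_text(f"# {name}\n", encoding="utf-8")
    topics = {"My_First_Note": "personal", "Meeting_Minutes": "work"}
    moved = organize(tmp, topics)
    still_there = (Path(tmp) / "work" / "Meeting_Minutes" / "index.md").is_file()

print(moved)
```

Because each note folder moves as a whole, the relative links from index.md to its resources folder keep working.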

Integrating with RAG Tools

Now that your Evernote notes are in Markdown, you can easily integrate them with open-source RAG tools. Here's a brief overview of how you might do this with some of the tools we discussed earlier:

  • Open Web UI: Upload your Markdown files to Open Web UI's knowledge library. The system will index them, allowing you to retrieve relevant information during conversations.
  • Verba: Use Verba's document ingestion feature to add your Markdown files. Verba will create embeddings and enable citation-based answers.
  • Onyx: Connect Onyx to the folder containing your Markdown files. Onyx will automatically index the content and make it searchable.
  • RagFlow: Ingest your Markdown files into RagFlow, then start asking questions.
  • RAG Web UI: Upload your Markdown files to create a knowledge base, then chat with it.
  • MaxKB: Upload your Markdown files to create a knowledge base, then chat with it.
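
Several of these tools take files through an upload dialog, and every converted note is named index.md, so it can be convenient to first copy the notes into one folder under unique names. The following sketch shows one way to do that; `collect_for_upload` and the `__` separator are hypothetical choices, and none of this is required by the tools listed above. Images are not copied, so relative image links will not resolve in the flattened copies.

```python
import shutil
import tempfile
from pathlib import Path


def collect_for_upload(kb_dir, upload_dir):
    """Copy every index.md to upload_dir, named after its folder path."""
    kb_dir, upload_dir = Path(kb_dir), Path(upload_dir)
    upload_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for index_file in sorted(kb_dir.rglob("index.md")):
        parts = index_file.parent.relative_to(kb_dir).parts
        if not parts:
            continue  # an index.md directly in kb_dir has no note folder
        name = "__".join(parts) + ".md"  # e.g. work__Meeting_Minutes.md
        shutil.copyfile(index_file, upload_dir / name)
        copied.append(name)
    return copied


# Demo: one note inside a topic folder, one at the top level.
with tempfile.TemporaryDirectory() as tmp:
    kb = Path(tmp) / "kb"
    for parts in (("work", "Meeting_Minutes"), ("My_First_Note",)):
        folder = kb.joinpath(*parts)
        folder.mkdir(parents=True)
        (folder / "index.md").write_text("# note\n", encoding="utf-8")
    copied = collect_for_upload(kb, Path(tmp) / "upload")

print(copied)
```

The topic and note title survive in the file name, which keeps sources recognizable when a RAG tool cites them.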

Conclusion

Building an AI knowledge base is a crucial step in leveraging the power of RAG systems. By converting your Evernote notes to Markdown using tyo-crawler, you create a structured, AI-ready data source. This process not only enhances the capabilities of your RAG applications but also provides a more organized and accessible way to manage your personal or organizational knowledge.

tyo-crawler's flexibility and power make it an invaluable tool for data transformation tasks like this. As AI continues to evolve, having a well-structured knowledge base will become increasingly important. By taking the time to convert your notes to Markdown, you're setting yourself up for success in the AI-driven future.


Copyright ©  2025  TYO Lab