From Confluence to GitLab: How to turn your documentation into the foundation for internal AI use cases
Confluence out, Markdown in GitLab in, then off to an Amazon Bedrock knowledge base. On paper, a powerful setup. We show why the approach is strategically strong, where it fails in practice, and how to turn it into a real AI-ready knowledge system.

A lot of software teams are facing the same question right now: do we keep our documentation in Confluence, or do we move it as Markdown files into GitLab and make it ready for AI, versioning and knowledge base ingestion?
I had an interesting conversation about exactly this today. The idea: take existing Confluence documentation, push it through a Markdown exporter, version the files in GitLab and, in a second step, make them available via S3 to an Amazon Bedrock knowledge base.
On paper, that sounds like a very strong setup. In practice, it often is. But not automatically.
Why teams migrate Confluence documentation to Markdown
The appeal is easy to understand. Confluence is convenient for collaborative writing. But for long-lived technical documentation, clean change tracking and AI readiness, it is often not ideal. Git-based Markdown documentation has several real advantages here.
1. Versioning is finally taken seriously
In GitLab, documentation is no longer just lying around in a wiki – it becomes part of a controlled change process. Changes go through merge requests, are diffable, reviewable, and can be linked directly to releases, issues or architecture decisions.
Especially for architecture documentation, ADRs, runbooks or API docs, that is a real benefit. What matters there is not only what was documented, but also when, why and by whom something changed.
2. Markdown is far more LLM-friendly than wiki chaos
Markdown is not a silver bullet. But for large language models, it is usually much easier to process than inconsistent, organically grown Confluence pages full of rich text, macros, panels and embedded special formats.
Headings, lists, semantic sections and clearly structured content help later with chunking, retrieval quality and reuse in a knowledge base.
3. Documentation becomes AI-ready
For many teams, this is the actual driver right now. Anyone who stores technical documentation in GitLab in a structured way today and serves it via S3 is not just building a better doc repository. They are building the foundation for internal AI assistants, RAG systems and semantic search.
That makes the approach strategically interesting – not only from a documentation perspective, but also from a business perspective. Teams that structure their knowledge base early get a head start on every later AI use case.
Is Confluence to Markdown in GitLab really the better solution?
Often, yes – but not by default. The strategy is strong if you treat it as a Docs-as-Code and knowledge architecture initiative. It is weak if you treat it as a mere file format swap.
If you simply export unstructured, outdated or poorly maintained content from Confluence, you are not migrating knowledge. You are migrating chaos – into a new tool.
The technical advantages of the approach
Let us look at the technical side first.
GitLab as the source of truth is a strong model
When code, infrastructure and technical documentation live in the same place, the chance that documentation actually stays current goes up. Teams can couple docs more tightly to their delivery process. Ownership becomes clearer, and reviews carry more weight.
This is especially valuable where documentation lives close to the product:
- Architecture decisions (ADRs)
- Runbooks
- Operational knowledge
- API documentation
- Setup and deployment guides
- Developer onboarding
In these areas, Markdown in GitLab is often significantly more robust than classic wiki management.
S3 as the ingestion layer is architecturally clean
Going through S3 is the right call. It separates the editing and maintenance of documentation from the AI ingestion and serving layer. GitLab stays the editorial source. S3 is a clean delivery channel for Bedrock.
That is technically modular and reduces coupling. It also lets you plug in other retrieval or search systems later without rebuilding the underlying documentation foundation.
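As a sketch, the delivery channel can be a single GitLab CI job – the bucket name, branch and folder layout here are placeholders, not a recommendation:

```yaml
# Publishes the docs folder to the ingestion bucket on every merge to main.
# Bucket, branch and paths are placeholder assumptions -- adapt to your setup.
publish-docs:
  stage: deploy
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""]   # override the image's "aws" entrypoint for script jobs
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    # --delete removes files that no longer exist in Git, so the
    # knowledge base never keeps serving deleted pages.
    - aws s3 sync docs/ "s3://example-docs-ingestion/docs/" --delete
```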
Where the strategy can fail in practice
Now to the more important part: the risks. This is where a good idea separates from a working system.
1. Export is not the same as lossless migration
The biggest mistake in such projects is to assume that Confluence content can be transferred to Markdown without loss. That is rarely the case.
Once your Confluence documentation depends heavily on embedded macros, panels, drawings, special plugins, complex table constructs or elaborate layouts, you have to expect information loss.
That does not have to be a deal breaker. But it does mean: a migration needs quality control, not just automation.
Solution: migrate in waves, with triage and automated diff reports
Instead of "all at once", a three-stage triage works in practice. First: build an inventory (query the Confluence REST API, classify pages by last-edit date, author, space, size and macro usage). Second: sort pages into three buckets – "migrate as-is", "clean up before migration", "deliberately do not migrate". Third: after every export run, generate an automated diff report that lists, per page, which macros, embeds or tables were lost during conversion. That list goes straight to the page owners as a task – not to "the docs team".
Rule of thumb: ~60 % of Confluence content can be migrated automatically, ~25 % needs manual rework, and ~15 % should not be migrated at all. Teams that do not weed out that 15 % drag legacy clutter along permanently.
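The triage itself does not need heavy tooling. A minimal sketch of the bucketing logic – the thresholds, field names and macro blocklist are assumptions for illustration, not Confluence API output:

```python
from datetime import date

# Illustrative triage rules -- adjust thresholds and the macro list
# to what your own inventory actually shows.
MACRO_BLOCKLIST = {"drawio", "jiraissues", "pagetree"}

def triage(page: dict, today: date) -> str:
    """Classify an inventoried Confluence page into one of three buckets."""
    age_days = (today - page["last_edited"]).days
    if page["owner"] is None or age_days > 730:
        return "do-not-migrate"       # ownerless or stale: archive instead
    if MACRO_BLOCKLIST & set(page["macros"]):
        return "clean-up-first"       # uses macros that will not survive export
    return "migrate-as-is"

today = date(2026, 5, 1)
pages = [
    {"title": "Payment Runbook", "last_edited": date(2026, 2, 10),
     "owner": "team-platform", "macros": []},
    {"title": "Old Workshop Notes", "last_edited": date(2022, 5, 1),
     "owner": None, "macros": ["pagetree"]},
]
buckets = {p["title"]: triage(p, today) for p in pages}
```

In practice this runs over the full inventory export, and each bucket becomes its own task list for the owning teams.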
2. Images, diagrams and tables remain a special case
Many teams overestimate text-based documentation. In reality, critical knowledge often lives in:
- Architecture diagrams
- Screenshots
- PDF attachments
- Tables
- Draw.io or PlantUML content
If that knowledge is not properly described in text, a pure Markdown migration will remain incomplete for AI purposes.
Solution: diagram-as-code plus AI-generated image descriptions
Diagrams belong in the repo as code, not as binary images. Concretely: embed PlantUML, Mermaid or Draw.io XML directly in the Markdown file. That makes the diagram diffable, reviewable and readable by an LLM. For existing screenshots and images you do not want to redraw, a one-time vision pass is worth it: a multimodal model generates a textual description ("The diagram shows a payment service with three upstream consumers and an SQS queue as a buffer") that gets stored as alt text and as a follow-up Markdown paragraph below the image.
Practical tip: convert tables from Confluence into CSV or structured YAML before exporting, instead of squeezing them into a Markdown table. Complex tables are unreadable in Markdown – as files they are far more usable both for humans and for RAG.
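What that can look like in a migrated file – the Mermaid block is diffable source, and the legacy screenshot keeps a generated description as its textual fallback (file names and wording are illustrative):

````markdown
## Payment flow

```mermaid
flowchart LR
  Consumer -->|HTTP| PaymentService
  PaymentService -->|enqueue| Queue[(SQS buffer)]
```

![Legacy architecture overview](images/payments-overview.png)
*The diagram shows the payment service with three upstream consumers
and an SQS queue as a buffer. (Description generated by a vision model.)*
````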
3. Permissions quickly become a risk
Confluence often has very fine-grained permissions. An S3-plus-Bedrock setup is, in many cases, much coarser. That is not just a technical concern – it is a governance concern.
Anyone serving content from multiple spaces or teams centrally has to define very clearly:
- which content gets ingested
- who is allowed to query it
- what sensitivity levels exist
- whether multiple knowledge bases are needed
Without that design, a classic mistake is just around the corner: the AI suddenly has access to content that, organisationally, was never intended for that audience.
Solution: confidentiality level in the front matter + separate S3 prefixes per knowledge base
Permissions get decided before ingestion, not at the frontend. Concretely: every Markdown file carries a `confidentiality` field in its front matter (e.g. `public`, `internal`, `restricted`, `secret`). The CI pipeline routes files based on that field into different S3 prefixes. A separate Bedrock knowledge base is set up per sensitivity level, with IAM policies that define exactly which audience may query which KB.
For the migration this means: every page exported from Confluence automatically inherits the source space’s confidentiality level as a default. Owners can raise the level per file in the merge request, but never silently lower it – a CI check enforces that.
Side effect: with this setup, non-technical stakeholders can hit the `internal` KB through a simple AI chat, while developers additionally use the `restricted` KB through IDE integrations – without either audience seeing content it is not cleared for.
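A minimal sketch of that routing and the no-downgrade check – the level names, bucket and naive front matter parsing are simplifying assumptions (a real pipeline would use a YAML parser):

```python
import re

# Level names and bucket are examples; order defines sensitivity, low to high.
LEVELS = ["public", "internal", "restricted", "secret"]
FRONT_MATTER = re.compile(r"^---\n(.*?)\n---", re.S)

def confidentiality(markdown: str, space_default: str = "internal") -> str:
    """Read the confidentiality field from a file's front matter."""
    m = FRONT_MATTER.match(markdown)
    if m:
        for line in m.group(1).splitlines():
            if line.startswith("confidentiality:"):
                return line.split(":", 1)[1].strip().strip('"')
    return space_default   # exported pages inherit the source space's level

def s3_prefix(level: str) -> str:
    # One prefix -- and one Bedrock knowledge base -- per sensitivity level.
    return f"s3://example-docs-ingestion/{level}/"

def is_silent_downgrade(old_level: str, new_level: str) -> bool:
    """CI check: raising the level is allowed, lowering it must be blocked."""
    return LEVELS.index(new_level) < LEVELS.index(old_level)

doc = '---\ntitle: "Payment Runbook"\nconfidentiality: "restricted"\n---\n'
```

The CI job compares each file's level against the version on the target branch and fails the pipeline when `is_silent_downgrade` is true.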
4. Bad structure stays bad structure
Markdown does not turn bad documentation into good documentation. If pages were unclearly named, unmaintained, without owners or without a lifecycle, the problem persists. It just lives in GitLab now.
So the real question is not "How do we export Confluence to Markdown?", but "How do we structure knowledge so that humans and LLMs can reliably work with it?".
Solution: enforce doc types, templates and lifecycle status
Without a schema, sprawl stays sprawl. Define a small, hard list of document types before the migration – e.g. `adr`, `runbook`, `api`, `concept`, `onboarding`, `howto`. Each doc type has a template (mandatory headings, predefined front matter fields, empty example sections). A GitLab CI pipeline rejects merge requests when mandatory fields are missing or the heading structure does not match the doc type.
On top of that, every document carries a `lifecycle_status` (`active`, `deprecated`, `archived`) and a `last_reviewed_at`. A nightly job automatically flags documents that have not been reviewed for more than 6 months as "review due" – visible as an issue for the owning team. That makes maintenance a visible task, not a nice-to-have.
Plan a one-off cleanup sprint before exporting from Confluence: pages with no owner, no edit in the last 24 months or fewer than 5 views per quarter get archived, not migrated. That often reduces the migration volume by 30–50 %.
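What such a quality gate can look like as a CI step – the doc types and mandatory fields are the ones from this article, the check itself is a simplified sketch:

```python
# Sketch of a merge-request quality gate; a real setup would also
# validate the heading structure against the doc type's template.
DOC_TYPES = {"adr", "runbook", "api", "concept", "onboarding", "howto"}
MANDATORY = {"title", "doc_type", "owner", "lifecycle_status", "confidentiality"}

def validate_front_matter(fm: dict) -> list[str]:
    """Return a list of violations; an empty list lets the MR pass."""
    errors = [f"missing field: {field}" for field in sorted(MANDATORY - fm.keys())]
    if fm.get("doc_type") not in DOC_TYPES:
        errors.append(f"unknown doc_type: {fm.get('doc_type')}")
    return errors

ok = validate_front_matter({
    "title": "Payment Service Runbook", "doc_type": "runbook",
    "owner": "team-platform", "lifecycle_status": "active",
    "confidentiality": "internal",
})
bad = validate_front_matter({"title": "Some Notes"})
```

Run over every changed file in the MR, the pipeline fails as soon as any file returns a non-empty error list.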
The business perspective: why the approach is strategically strong
From a business angle, the approach is often even more interesting than from a technical one.
Less tool lock-in
Knowledge in Markdown files is more portable than knowledge in a proprietary wiki format. That reduces dependencies and increases your long-term ability to move.
Better governance
Git-based docs nudge teams toward ownership, review and clearer responsibilities. That is not always more comfortable, but it is often significantly more effective.
AI enablement with substance
A lot of organisations talk about AI without preparing their knowledge base for it. That is exactly why many internal AI projects fail later on. Teams that structure, version and tag their documentation early dramatically reduce the effort needed later for RAG, semantic search and internal knowledge assistants.
Faster answers, better onboarding
When documentation is well structured, AI systems are not the only ones that benefit. Humans find information faster, new team members onboard more efficiently, and the volume of operational questions drops. That is the actual business case.
Our verdict: pros and cons at a glance
In favour of the migration
- clean versioning and traceable changes
- documentation becomes reviewable and release-aligned
- better foundation for LLMs and RAG
- less vendor lock-in
- S3 and Bedrock fit architecturally well
- metadata, structured ingestion and reusability become easier
Against the migration
- Confluence macros and complex layouts do not always translate cleanly
- images, diagrams and embedded content need special handling
- non-technical stakeholders are often reluctant to work in Git-centric processes
- without governance, you only migrate chaos to a new medium
- permissions and sensitivity have to be rethought
The most important improvements for a really good setup
If you want to take this strategy seriously, I would not adopt it one-to-one – I would improve it.
1. Do not migrate everything into the same target system
Not every Confluence page belongs in GitLab. Technical docs, ADRs, runbooks and product-near content are a great fit for a Docs-as-Code model. For highly collaborative documentation with a short lifespan, other tools are often a better fit.
For example:
- Teams Notes for ongoing alignment, workshops or meeting documentation
- Miro boards for collaborative concept work, early ideas and visual workshops
- Ticket systems or project tools for operational tasks, status updates and short-term project communication
The goal stays clear: Confluence gets replaced, not run in parallel. Maintaining knowledge in two systems means properly maintaining it in neither. So the question is not whether Confluence stays – it is which document type goes into the right target system (GitLab, Teams, Miro, Jira).
2. Standardise front matter and metadata
Every file should carry structured metadata. That improves not only discoverability for humans, but also retrieval quality for AI systems later. A possible example:
```yaml
---
title: "Payment Service Runbook"
doc_type: "runbook"
system: "payments-platform"
owner: "team-platform"
team: "platform-engineering"
lifecycle_status: "active"
confidentiality: "internal"
tags: [payments, runbook, operations]
source_url: "https://confluence.example.com/display/PLAT/Payment+Service+Runbook"
last_reviewed_at: "2026-04-15"
---
```
That gives you a clear basis for governance, filtering and later knowledge base usage. Especially in larger organisations, this is often the difference between usable and unusable documentation.
3. Design chunking deliberately
The quality of your later AI answers depends heavily on how content is segmented. For technical docs, it is usually best to chunk along headings, logical sections and clearly separable knowledge units. A runbook step, an architecture decision or an API explanation should stay readable as its own meaningful unit.
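A minimal sketch of heading-based chunking – real pipelines add a secondary split for oversized sections, which is left out here:

```python
import re

def chunk_by_headings(markdown: str) -> list[dict]:
    """Split Markdown at headings so each chunk is one self-contained
    knowledge unit (a runbook step, an ADR, one API explanation)."""
    chunks, current = [], {"heading": "", "body": []}
    for line in markdown.splitlines():
        if re.match(r"^#{1,3} ", line):           # a new section starts here
            if current["body"]:
                chunks.append(current)
            current = {"heading": line.lstrip("# "), "body": []}
        else:
            current["body"].append(line)
    if current["body"]:
        chunks.append(current)
    return [{"heading": c["heading"], "text": "\n".join(c["body"]).strip()}
            for c in chunks]

doc = "# Runbook\nIntro.\n## Restart the service\nRun the restart command.\n"
chunks = chunk_by_headings(doc)
```

Keeping the heading attached to each chunk also gives the retriever useful context for free.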
4. Do not forget diagrams and visual content
If a diagram matters, it needs textual context. An architecture diagram without a description is of limited use to a retrieval system. Diagrams, screenshots and tables should always be accompanied by explanatory text. AI systems benefit – and so do new team members and non-expert readers.
5. Establish quality gates in GitLab
For example:
- mandatory metadata in the front matter
- automated link checks
- mandatory reviewers via CODEOWNERS
- an explicit owner field per document
- review date / required re-review after X months
- checks for dead references and missing attachments
Only then does "docs in Git" really become a scalable model.
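The link check, for instance, fits in a few lines – the `exists` callback abstracts the repo filesystem, and the file names are illustrative:

```python
import re
from typing import Callable

# Captures the link target up to the first ")" or "#" (anchors are ignored).
LINK = re.compile(r"\[[^\]]*\]\(([^)#]+)")

def dead_relative_links(markdown: str, exists: Callable[[str], bool]) -> list[str]:
    """Report relative link targets that do not resolve in the repo."""
    dead = []
    for target in LINK.findall(markdown):
        if target.startswith(("http://", "https://", "mailto:")):
            continue   # external links need a separate, network-based pass
        if not exists(target):
            dead.append(target)
    return dead

repo_files = {"runbooks/restart.md", "adr/0001-use-s3.md"}
dead = dead_relative_links(
    "See [restart](runbooks/restart.md) and [old ADR](adr/0009-gone.md).",
    exists=lambda target: target in repo_files,
)
```

In CI, `exists` would simply test against the checked-out working tree, and any non-empty result fails the merge request.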
Conclusion: a strong strategy – if you understand it correctly
Migrating Confluence to Markdown in GitLab is not just a format decision. It is an architectural decision about knowledge. Technically, the approach is strong. From a business perspective, it is often even stronger. But only if you do not treat it as a simple export job.
Done right, you get:
- better versioning
- more robust technical documentation
- a cleaner foundation for internal AI use cases
- less lock-in
- more governance
Done wrong, you get:
- Markdown files full of legacy clutter
- lost semantics
- poor retrieval results
- a new mess on top of the old problem
The right question is therefore not "Should we migrate Confluence to Markdown?", but "Which parts of our knowledge belong in a Docs-as-Code model – and how do we turn that into a truly AI-ready knowledge base?". That is where the difference between a doc migration and a real knowledge system begins.