From Confluence to GitLab: How to turn your documentation into the foundation for internal AI use cases
Confluence out, Markdown in GitLab in, then off to an Amazon Bedrock knowledge base. On paper, a powerful setup. We show why the approach is strategically strong, where it fails in practice, and how to turn it into a real AI-ready knowledge system.

A lot of software teams are facing the same question right now: do we keep our documentation in Confluence, or do we move it as Markdown files into GitLab and make it ready for AI, versioning and knowledge base ingestion?
I had an interesting conversation about exactly this today. The idea: take existing Confluence documentation, push it through a Markdown exporter, version the files in GitLab and, in a second step, make them available via S3 to an Amazon Bedrock knowledge base.
On paper, that sounds like a very strong setup. In practice, it often is. But not automatically.
Why teams migrate Confluence documentation to Markdown
The appeal is easy to understand. Confluence is convenient for collaborative writing. But for long-lived technical documentation, clean change tracking and AI readiness, it is often not ideal. Git-based Markdown documentation has several real advantages here.
1. Versioning is finally taken seriously
In GitLab, documentation is no longer just lying around in a wiki – it becomes part of a controlled change process. Changes go through merge requests, are diffable, reviewable, and can be linked directly to releases, issues or architecture decisions.
Especially for architecture documentation, ADRs, runbooks or API docs, that is a real benefit. What matters there is not only what was documented, but also when, why and by whom something changed.
2. Markdown is far more LLM-friendly than wiki chaos
Markdown is not a silver bullet. But for large language models, it is usually much easier to process than inconsistent, organically grown Confluence pages full of rich text, macros, panels and embedded special formats.
Headings, lists, semantic sections and clearly structured content help later with chunking, retrieval quality and reuse in a knowledge base.
3. Documentation becomes AI-ready
For many teams, this is the actual driver right now. Anyone who stores technical documentation in GitLab in a structured way today and serves it via S3 is not just building a better doc repository. They are building the foundation for internal AI assistants, RAG systems and semantic search.
That makes the approach strategically interesting – not only from a documentation perspective, but also from a business perspective. Teams that structure their knowledge base early get a head start on every later AI use case.
Is Confluence to Markdown in GitLab really the better solution?
Often, yes – but not by default. The strategy is strong if you treat it as a Docs-as-Code and knowledge architecture initiative. It is weak if you treat it as a mere file format swap.
If you simply export unstructured, outdated or poorly maintained content from Confluence, you are not migrating knowledge. You are migrating chaos – into a new tool.
The technical advantages of the approach
Let us look at the technical side first.
GitLab as the source of truth is a strong model
When code, infrastructure and technical documentation live in the same place, the chance that documentation actually stays current goes up. Teams can couple docs more tightly to their delivery process. Ownership becomes clearer, and reviews carry more weight.
This is especially valuable where documentation lives close to the product:
- Architecture decisions (ADRs)
- Runbooks
- Operational knowledge
- API documentation
- Setup and deployment guides
- Developer onboarding
In these areas, Markdown in GitLab is often significantly more robust than classic wiki management.
S3 as the ingestion layer is architecturally clean
Going through S3 is the right call. It separates the editing and maintenance of documentation from the AI ingestion and serving layer. GitLab stays the editorial source. S3 is a clean delivery channel for Bedrock.
That is technically modular and reduces coupling. It also lets you plug in other retrieval or search systems later without rebuilding the underlying documentation foundation.
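As a sketch, the delivery channel can be a single GitLab CI job – the bucket name, branch and folder layout here are placeholders, not a recommendation:

```yaml
# Publishes the docs folder to the ingestion bucket on every merge to main.
# Bucket, branch and paths are placeholder assumptions -- adapt to your setup.
publish-docs:
  stage: deploy
  image:
    name: amazon/aws-cli:latest
    entrypoint: [""]   # override the image's "aws" entrypoint for script jobs
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    # --delete removes files that no longer exist in Git, so the
    # knowledge base never keeps serving deleted pages.
    - aws s3 sync docs/ "s3://example-docs-ingestion/docs/" --delete
```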
Where the strategy can fail in practice
Now to the more important part: the risks. This is where a good idea separates from a working system.
1. Export is not the same as lossless migration
The biggest mistake in such projects is to assume that Confluence content can be transferred to Markdown without loss. That is rarely the case.
Once your Confluence documentation depends heavily on embedded macros, panels, drawings, special plugins, complex table constructs or elaborate layouts, you have to expect information loss.
That does not have to be a deal breaker. But it does mean: a migration needs quality control, not just automation.
Solution: migrate in waves, with triage and automated diff reports
Instead of "all at once", a three-stage triage works in practice. First: build an inventory (query the Confluence REST API, classify pages by last-edit date, author, space, size and macro usage). Second: sort pages into three buckets – "migrate as-is", "clean up before migration", "deliberately do not migrate". Third: after every export run, generate an automated diff report that lists, per page, which macros, embeds or tables were lost during conversion. That list goes straight to the page owners as a task – not to "the docs team".
Rule of thumb: ~60 % of Confluence content can be migrated automatically, ~25 % needs manual rework, and ~15 % should not be migrated at all. Teams that do not weed out that 15 % drag legacy clutter along permanently.
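The triage itself does not need heavy tooling. A minimal sketch of the bucketing logic – the thresholds, field names and macro blocklist are assumptions for illustration, not Confluence API output:

```python
from datetime import date

# Illustrative triage rules -- adjust thresholds and the macro list
# to what your own inventory actually shows.
MACRO_BLOCKLIST = {"drawio", "jiraissues", "pagetree"}

def triage(page: dict, today: date) -> str:
    """Classify an inventoried Confluence page into one of three buckets."""
    age_days = (today - page["last_edited"]).days
    if page["owner"] is None or age_days > 730:
        return "do-not-migrate"       # ownerless or stale: archive instead
    if MACRO_BLOCKLIST & set(page["macros"]):
        return "clean-up-first"       # uses macros that will not survive export
    return "migrate-as-is"

today = date(2026, 5, 1)
pages = [
    {"title": "Payment Runbook", "last_edited": date(2026, 2, 10),
     "owner": "team-platform", "macros": []},
    {"title": "Old Workshop Notes", "last_edited": date(2022, 5, 1),
     "owner": None, "macros": ["pagetree"]},
]
buckets = {p["title"]: triage(p, today) for p in pages}
```

In practice this runs over the full inventory export, and each bucket becomes its own task list for the owning teams.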
2. Images, diagrams and tables remain a special case
Many teams overestimate text-based documentation. In reality, critical knowledge often lives in:
- Architecture diagrams
- Screenshots
- PDF attachments
- Tables
- Draw.io or PlantUML content
If that knowledge is not properly described in text, a pure Markdown migration will remain incomplete for AI purposes.
Solution: diagram-as-code plus AI-generated image descriptions
Diagrams belong in the repo as code, not as binary images. Concretely: embed PlantUML, Mermaid or Draw.io XML directly in the Markdown file. That makes the diagram diffable, reviewable and readable by an LLM. For existing screenshots and images you do not want to redraw, a one-time vision pass is worth it: a multimodal model generates a textual description ("The diagram shows a payment service with three upstream consumers and an SQS queue as a buffer") that gets stored as alt text and as a follow-up Markdown paragraph below the image.
Practical tip: convert tables from Confluence into CSV or structured YAML before exporting, instead of squeezing them into a Markdown table. Complex tables are unreadable in Markdown – as files they are far more usable both for humans and for RAG.
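What that can look like in a migrated file – the Mermaid block is diffable source, and the legacy screenshot keeps a generated description as its textual fallback (file names and wording are illustrative):

````markdown
## Payment flow

```mermaid
flowchart LR
  Consumer -->|HTTP| PaymentService
  PaymentService -->|enqueue| Queue[(SQS buffer)]
```

![Legacy architecture overview](images/payments-overview.png)
*The diagram shows the payment service with three upstream consumers
and an SQS queue as a buffer. (Description generated by a vision model.)*
````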
3. Permissions quickly become a risk
Confluence often has very fine-grained permissions. An S3-plus-Bedrock setup is, in many cases, much coarser. That is not just a technical concern – it is a governance concern.
Anyone serving content from multiple spaces or teams centrally has to define very clearly:
- which content gets ingested
- who is allowed to query it
- what sensitivity levels exist
- whether multiple knowledge bases are needed
Without that design, a classic mistake is just around the corner: the AI suddenly has access to content that, organisationally, was never intended for that audience.
Solution: confidentiality level in the front matter + separate S3 prefixes per knowledge base
Permissions get decided before ingestion, not at the frontend. Concretely: every Markdown file carries a `confidentiality` field in its front matter (e.g. `public`, `internal`, `restricted`, `secret`). The CI pipeline routes files based on that field into different S3 prefixes. A separate Bedrock knowledge base is set up per sensitivity level, with IAM policies that define exactly which audience may query which KB.
For the migration this means: every page exported from Confluence automatically inherits the source space’s confidentiality level as a default. Owners can raise the level per file in the merge request, but never silently lower it – a CI check enforces that.
Side effect: with this setup, non-technical stakeholders can hit the `internal` KB through a simple AI chat, while developers additionally use the `restricted` KB through IDE integrations – without either audience seeing content it is not cleared for.
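A minimal sketch of that routing and the no-downgrade check – the level names, bucket and naive front matter parsing are simplifying assumptions (a real pipeline would use a YAML parser):

```python
import re

# Level names and bucket are examples; order defines sensitivity, low to high.
LEVELS = ["public", "internal", "restricted", "secret"]
FRONT_MATTER = re.compile(r"^---\n(.*?)\n---", re.S)

def confidentiality(markdown: str, space_default: str = "internal") -> str:
    """Read the confidentiality field from a file's front matter."""
    m = FRONT_MATTER.match(markdown)
    if m:
        for line in m.group(1).splitlines():
            if line.startswith("confidentiality:"):
                return line.split(":", 1)[1].strip().strip('"')
    return space_default   # exported pages inherit the source space's level

def s3_prefix(level: str) -> str:
    # One prefix -- and one Bedrock knowledge base -- per sensitivity level.
    return f"s3://example-docs-ingestion/{level}/"

def is_silent_downgrade(old_level: str, new_level: str) -> bool:
    """CI check: raising the level is allowed, lowering it must be blocked."""
    return LEVELS.index(new_level) < LEVELS.index(old_level)

doc = '---\ntitle: "Payment Runbook"\nconfidentiality: "restricted"\n---\n'
```

The CI job compares each file's level against the version on the target branch and fails the pipeline when `is_silent_downgrade` is true.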
4. Bad structure stays bad structure
Markdown does not turn bad documentation into good documentation. If pages were unclearly named, unmaintained, without owners or without a lifecycle, the problem persists. It just lives in GitLab now.
So the real question is not "How do we export Confluence to Markdown?", but "How do we structure knowledge so that humans and LLMs can reliably work with it?".
Solution: enforce doc types, templates and lifecycle status
Without a schema, sprawl stays sprawl. Define a small, hard list of document types before the migration – e.g. `adr`, `runbook`, `api`, `concept`, `onboarding`, `howto`. Each doc type has a template (mandatory headings, predefined front matter fields, empty example sections). A GitLab CI pipeline rejects merge requests when mandatory fields are missing or the heading structure does not match the doc type.
On top of that, every document carries a `lifecycle_status` (`active`, `deprecated`, `archived`) and a `last_reviewed_at`. A nightly job automatically flags documents that have not been reviewed for more than 6 months as "review due" – visible as an issue for the owning team. That makes maintenance a visible task, not a nice-to-have.
Plan a one-off cleanup sprint before exporting from Confluence: pages with no owner, no edit in the last 24 months or fewer than 5 views per quarter get archived, not migrated. That often reduces the migration volume by 30–50 %.
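What such a quality gate can look like as a CI step – the doc types and mandatory fields are the ones from this article, the check itself is a simplified sketch:

```python
# Sketch of a merge-request quality gate; a real setup would also
# validate the heading structure against the doc type's template.
DOC_TYPES = {"adr", "runbook", "api", "concept", "onboarding", "howto"}
MANDATORY = {"title", "doc_type", "owner", "lifecycle_status", "confidentiality"}

def validate_front_matter(fm: dict) -> list[str]:
    """Return a list of violations; an empty list lets the MR pass."""
    errors = [f"missing field: {field}" for field in sorted(MANDATORY - fm.keys())]
    if fm.get("doc_type") not in DOC_TYPES:
        errors.append(f"unknown doc_type: {fm.get('doc_type')}")
    return errors

ok = validate_front_matter({
    "title": "Payment Service Runbook", "doc_type": "runbook",
    "owner": "team-platform", "lifecycle_status": "active",
    "confidentiality": "internal",
})
bad = validate_front_matter({"title": "Some Notes"})
```

Run over every changed file in the MR, the pipeline fails as soon as any file returns a non-empty error list.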
The business perspective: why the approach is strategically strong
From a business angle, the approach is often even more interesting than from a technical one.
Less tool lock-in
Knowledge in Markdown files is more portable than knowledge in a proprietary wiki format. That reduces dependencies and increases your long-term ability to move.
Better governance
Git-based docs nudge teams toward ownership, review and clearer responsibilities. That is not always more comfortable, but it is often significantly more effective.
AI enablement with substance
A lot of organisations talk about AI without preparing their knowledge base for it. That is exactly why many internal AI projects fail later on. Teams that structure, version and tag their documentation early dramatically reduce the effort needed later for RAG, semantic search and internal knowledge assistants.
Faster answers, better onboarding
When documentation is well structured, AI systems are not the only ones that benefit. Humans find information faster, new team members onboard more efficiently, and the volume of operational questions drops. That is the actual business case.
Our verdict: pros and cons at a glance
In favour of the migration
- clean versioning and traceable changes
- documentation becomes reviewable and release-aligned
- better foundation for LLMs and RAG
- less vendor lock-in
- S3 and Bedrock fit architecturally well
- metadata, structured ingestion and reusability become easier
Against the migration
- Confluence macros and complex layouts do not always translate cleanly
- images, diagrams and embedded content need special handling
- non-technical stakeholders are often reluctant to work in Git-centric processes
- without governance, you only migrate chaos to a new medium
- permissions and sensitivity have to be rethought
The most important improvements for a really good setup
If you want to take this strategy seriously, I would not adopt it one-to-one – I would improve it.
1. Do not migrate everything into the same target system
Not every Confluence page belongs in GitLab. Technical docs, ADRs, runbooks and product-near content are a great fit for a Docs-as-Code model. For highly collaborative documentation with a short lifespan, other tools are often a better fit.
For example:
- Teams Notes for ongoing alignment, workshops or meeting documentation
- Miro boards for collaborative concept work, early ideas and visual workshops
- Ticket systems or project tools for operational tasks, status updates and short-term project communication
The goal stays clear: Confluence gets replaced, not run in parallel. Maintaining knowledge in two systems means properly maintaining it in neither. So the question is not whether Confluence stays – it is which document type goes into the right target system (GitLab, Teams, Miro, Jira).
2. Standardise front matter and metadata
Every file should carry structured metadata. That improves not only discoverability for humans, but also retrieval quality for AI systems later. A possible example:
```yaml
---
title: "Payment Service Runbook"
doc_type: "runbook"
system: "payments-platform"
owner: "team-platform"
team: "platform-engineering"
lifecycle_status: "active"
confidentiality: "internal"
tags: [payments, runbook, operations]
source_url: "https://confluence.example.com/display/PLAT/Payment+Service+Runbook"
last_reviewed_at: "2026-04-15"
---
```
That gives you a clear basis for governance, filtering and later knowledge base usage. Especially in larger organisations, this is often the difference between usable and unusable documentation.
3. Design chunking deliberately
The quality of your later AI answers depends heavily on how content is segmented. For technical docs, it is usually best to chunk along headings, logical sections and clearly separable knowledge units. A runbook step, an architecture decision or an API explanation should stay readable as its own meaningful unit.
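A minimal sketch of heading-based chunking – real pipelines add a secondary split for oversized sections, which is left out here:

```python
import re

def chunk_by_headings(markdown: str) -> list[dict]:
    """Split Markdown at headings so each chunk is one self-contained
    knowledge unit (a runbook step, an ADR, one API explanation)."""
    chunks, current = [], {"heading": "", "body": []}
    for line in markdown.splitlines():
        if re.match(r"^#{1,3} ", line):           # a new section starts here
            if current["body"]:
                chunks.append(current)
            current = {"heading": line.lstrip("# "), "body": []}
        else:
            current["body"].append(line)
    if current["body"]:
        chunks.append(current)
    return [{"heading": c["heading"], "text": "\n".join(c["body"]).strip()}
            for c in chunks]

doc = "# Runbook\nIntro.\n## Restart the service\nRun the restart command.\n"
chunks = chunk_by_headings(doc)
```

Keeping the heading attached to each chunk also gives the retriever useful context for free.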
4. Do not forget diagrams and visual content
If a diagram matters, it needs textual context. An architecture diagram without a description is of limited use to a retrieval system. Diagrams, screenshots and tables should always be accompanied by explanatory text. AI systems benefit – and so do new team members and non-expert readers.
5. Establish quality gates in GitLab
For example:
- mandatory metadata in the front matter
- automated link checks
- mandatory reviewers via CODEOWNERS
- an explicit owner field per document
- review date / required re-review after X months
- checks for dead references and missing attachments
Only then does "docs in Git" really become a scalable model.
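The link check, for instance, fits in a few lines – the `exists` callback abstracts the repo filesystem, and the file names are illustrative:

```python
import re
from typing import Callable

# Captures the link target up to the first ")" or "#" (anchors are ignored).
LINK = re.compile(r"\[[^\]]*\]\(([^)#]+)")

def dead_relative_links(markdown: str, exists: Callable[[str], bool]) -> list[str]:
    """Report relative link targets that do not resolve in the repo."""
    dead = []
    for target in LINK.findall(markdown):
        if target.startswith(("http://", "https://", "mailto:")):
            continue   # external links need a separate, network-based pass
        if not exists(target):
            dead.append(target)
    return dead

repo_files = {"runbooks/restart.md", "adr/0001-use-s3.md"}
dead = dead_relative_links(
    "See [restart](runbooks/restart.md) and [old ADR](adr/0009-gone.md).",
    exists=lambda target: target in repo_files,
)
```

In CI, `exists` would simply test against the checked-out working tree, and any non-empty result fails the merge request.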
Conclusion: a strong strategy – if you understand it correctly
Migrating Confluence to Markdown in GitLab is not just a format decision. It is an architectural decision about knowledge. Technically, the approach is strong. From a business perspective, it is often even stronger. But only if you do not treat it as a simple export job.
Done right, you get:
- better versioning
- more robust technical documentation
- a cleaner foundation for internal AI use cases
- less lock-in
- more governance
Done wrong, you get:
- Markdown files full of legacy clutter
- lost semantics
- poor retrieval results
- a new mess on top of the old problem
The right question is therefore not "Should we migrate Confluence to Markdown?", but "Which parts of our knowledge belong in a Docs-as-Code model – and how do we turn that into a truly AI-ready knowledge base?". That is where the difference between a doc migration and a real knowledge system begins.