Positioning

Technical documentation is often the first casualty of time pressure in data and DWH projects. Acceptances are signed, systems go live -- and documentation remains incomplete or outdated. The consequences show up later: during onboarding of new team members, when troubleshooting poorly documented ETL pipelines, or when operational responsibility changes hands. I bring technical documentation into every project as a first-class deliverable -- not as an afterthought, but as an integral part of the work.

My background is in database development, DWH engineering, and infrastructure operations. I write documentation not from the perspective of a pure technical writer, but as someone who has built and operated the underlying systems. This combination -- technical depth and documentation experience -- allows me to produce Data Dictionaries that are genuinely accurate, runbooks that actually work in the field, and architecture documents that make decisions comprehensible.

Concrete references: at an insurance company I contributed to and extended a web-based, XML-backed database documentation system. At a public sector / research organisation I created and maintained requirements and test documentation in a regulated DWH environment. For my own infrastructure setup I built a central operations documentation repository integrating automated configuration backups.

Documentation is also a question of format. Not every document belongs in a Word file or a Confluence wiki. Runbooks benefit from Git versioning. Data Dictionaries can be auto-generated from database metadata. Architecture concepts gain from diagrams maintained alongside the text. I choose format and tooling based on the intended use -- favouring open, long-lived standards.

Core principle: Technical documentation is only valuable when it is current, findable, and comprehensible. I build documentation structures that can be sustainably maintained -- through automation, versioning, and clear ownership.

Importance of Technical Documentation in DWH and Infrastructure

In data and DWH projects, documentation covers far more than technical descriptions of tables and fields. It includes the business meaning of the data, its origin (source systems, transformation logic), the dependencies between tables and ETL pipelines, the operational procedures for monitoring and error handling, and the test documentation required for formal acceptance. A complete documentation picture has at least four layers: the concept layer (architecture, decisions), the operations layer (setup, monitoring, configuration), the runbook layer (step-by-step for recurring tasks), and the data layer (Data Dictionary, lineage, metadata).

Four documentation layers: Concept, Operations, Runbook and Data Dictionary at a glance

The four layers of technical documentation: from architecture decisions and operations guides through runbooks to the complete Data Dictionary. All layers versioned in Git and published to the wiki or as static HTML.

Knowledge silos and their consequences

Knowledge silos arise when documentation lives in the heads of individuals rather than in written, findable form. In DWH environments this is especially critical: if only one person knows how a particular ETL package transforms source data, or which column in a fact table implements a given business rule, the team has a single point of failure. The consequences are onboarding times of weeks instead of days, errors in extensions due to missing context, and avoidable operational outages.

Documentation as a measurable investment

The effort invested in documentation pays back quickly: complete operations documentation significantly reduces onboarding times for new team members. Automated Data Dictionary generation from database metadata saves hours of manual maintenance per week. Versioned runbooks prevent critical steps from being forgotten under the stress of a production incident. Requirements documentation protects both parties -- client and service provider -- in disputes about the agreed scope of work.

Concept documentation: architecture decisions, system boundaries, integration points
Operations documentation: server overviews, configurations, contacts, SLAs
Runbooks: recurring tasks, error handling, emergency procedures step by step
Data Dictionary: tables, views, columns, data types, business meaning, relationships
Data Lineage: origin and transformation of data through all layers
Interface documentation: API parameters, data formats, external system interfaces
Requirements documentation: business requirements, acceptance criteria, open items
Test documentation: test cases, results, acceptance records

I recommend planning documentation from the start as a formal project deliverable -- with defined deliverables, assigned owners, and a review process that assures quality. Documentation created as a late afterthought is usually incomplete and goes stale rapidly.

Good documentation is not a question of effort, but of habit. Whoever builds documentation as an integral step into every development and operational process has less work in the long run -- not more.

Operations Documentation & Runbooks

Operations documentation is the institutional memory of an IT environment. It describes what exists (inventory, configurations, interfaces), how it is operated (processes, tasks, schedules), and what to do when things go wrong (runbooks, escalation paths). Without this foundation, operations depend on the knowledge of specific individuals -- a risk that materialises at the worst possible moment.

In my own infrastructure setup -- based on Proxmox VE, Debian, NGINX, WireGuard and various services -- I built a central operations documentation repository integrating automated configuration backups. Configuration changes are automatically saved as text files and version-controlled, so the current state is always traceable without manual documentation effort.

Structure of operations documentation

Useful operations documentation follows a consistent structure. At the top level sits the inventory: all servers, services, interfaces, access points and responsibilities. Below that come system-specific sections with configuration details, dependencies and operational notes. The third layer covers runbooks for recurring tasks and emergency procedures.

Runbooks: actionable and tested

A runbook is only valuable if it works in the field. That means: concrete step-by-step instructions, no room for interpretation, explicit requirements for permissions and tools, and a realistic execution time. Runbooks must be tested regularly -- a runbook that has not been executed in two years may well be outdated. Git versioning makes changes traceable and prevents different teams from working with different versions.

Markdown · Runbook template (operations documentation for a SQL Server service)

# Runbook: SQL Server Agent -- Restart after planned patching

## Metadata
- **Version**: 1.3
- **Last tested**: 2025-04-15
- **Tested by**: DBA team
- **Applies to**: SQL Server 2019 / 2022 on-premises

## Prerequisites
- Windows account with local administrator rights on the target server
- PowerShell 5.1+ or PowerShell 7+
- dbatools module installed (Import-Module dbatools)

## Procedure

### Step 1: Check and stop active jobs
```powershell
# Show all currently running SQL Agent jobs
Get-DbaAgentJob -SqlInstance 'sql-prod-01' | Where-Object { $_.CurrentRunStatus -eq 'Executing' }
# Stop jobs if necessary
Stop-DbaAgentJob -SqlInstance 'sql-prod-01' -Job 'ETL_Night_FullLoad'
```

### Step 2: Stop the service in a controlled manner
```powershell
Stop-DbaService -SqlInstance 'sql-prod-01' -Type Agent
# Status check after 30 seconds
Start-Sleep 30
Get-DbaService -SqlInstance 'sql-prod-01' | Select-Object ServiceName, State
```

### Step 3: Await patching (performed by patch team)

### Step 4: Restart and verify the service
```powershell
Start-DbaService -SqlInstance 'sql-prod-01' -Type Agent
# All jobs in status 'Idle' = restart successful
Get-DbaAgentJob -SqlInstance 'sql-prod-01' | Select-Object Name, CurrentRunStatus
```

## Error handling
| Symptom                       | Cause                              | Action                               |
|-------------------------------|-----------------------------------|--------------------------------------|
| Service does not start        | Missing service account password  | Check / renew password in AD         |
| Jobs stuck in Running status  | Hung job                          | Execute sp_stop_job via T-SQL        |
| Connection errors after start | Network firewall issue            | Verify port 1433 in firewall rules   |

## Escalation
If unresolved after 3 attempts: DBA on-call +49 173 XXXXXXX

Runbooks in Markdown format can be versioned directly in Git and published to Confluence or GitHub Pages. The error-handling table makes the runbook immediately usable in the field.

The runbook template above demonstrates the practical value of Markdown for operations documentation: clear structure, metadata header for version tracking, embedded code examples that can be copied directly, and an error-handling table. Such templates can be standardised across the organisation and used as the basis for new runbooks.

Runbooks that exist but have never been tested create false confidence. Testing a runbook once per quarter -- including emergency procedures -- is the only way to ensure the instructions actually work when needed.

Data Dictionary & Data Model Documentation

A Data Dictionary is the central reference for all tables, views, columns and relationships in a database environment. It answers the questions that business users and developers ask every day: What does the column 'customer_status' mean? What values can it hold? Where does it come from? Which tables depend on it? Without this information, every extension or troubleshooting effort involves unnecessary overhead.

During my work at an insurance company I contributed to a web-based, XML-backed database documentation system. The system stored metadata about tables, views and columns in structured form and generated an HTML view accessible to both developers and business users. This project sharpened my understanding of how a Data Dictionary must be both technically accurate and business-comprehensible.

Structure of a complete Data Dictionary

A complete Data Dictionary contains at the table level: physical name, logical name, business description, category (fact table / dimension / staging), source tables, and dependencies. At the column level: physical column name, logical name, data type, nullable, value range, business meaning, example values, and transformation rule (if derived). In addition: relationships (foreign keys), partitioning information, and version / change history.

SQL / T-SQL · Extract a Data Dictionary from sys views and INFORMATION_SCHEMA

-- Extracts a complete Data Dictionary from SQL Server system views.
-- Returns tables, columns, data types, nullable status and extended properties.
-- Extended properties (MS_Description) can be maintained in SSMS or via
-- sp_addextendedproperty and are read here as machine-readable metadata.

SELECT
    t.TABLE_SCHEMA                          AS schema_name,
    t.TABLE_NAME                            AS table_name,
    t.TABLE_TYPE                            AS object_type,      -- BASE TABLE or VIEW
    c.COLUMN_NAME                           AS column_name,
    c.ORDINAL_POSITION                      AS col_order,
    c.DATA_TYPE                             AS data_type,
    COALESCE(
        CAST(c.CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10)),
        CAST(c.NUMERIC_PRECISION        AS VARCHAR(10))
    )                                       AS type_detail,
    c.IS_NULLABLE                           AS is_nullable,
    -- Business description from Extended Properties (MS_Description)
    CAST(ep_t.value AS NVARCHAR(500))       AS table_description,
    CAST(ep_c.value AS NVARCHAR(500))       AS column_description
FROM INFORMATION_SCHEMA.TABLES  t
JOIN INFORMATION_SCHEMA.COLUMNS c
    ON  t.TABLE_SCHEMA = c.TABLE_SCHEMA
    AND t.TABLE_NAME   = c.TABLE_NAME
-- Extended Property at table level
LEFT JOIN sys.extended_properties ep_t
    ON  ep_t.major_id    = OBJECT_ID(t.TABLE_SCHEMA + '.' + t.TABLE_NAME)
    AND ep_t.minor_id    = 0
    AND ep_t.name        = 'MS_Description'
-- Extended Property at column level
LEFT JOIN sys.extended_properties ep_c
    ON  ep_c.major_id    = OBJECT_ID(t.TABLE_SCHEMA + '.' + t.TABLE_NAME)
    AND ep_c.minor_id    = c.ORDINAL_POSITION
    AND ep_c.name        = 'MS_Description'
WHERE t.TABLE_SCHEMA NOT IN ('sys','information_schema')
ORDER BY t.TABLE_SCHEMA, t.TABLE_NAME, c.ORDINAL_POSITION;

This query is the starting point for any automated documentation generation. The MS_Description fields can be maintained via SSMS, SSDT or sp_addextendedproperty and are thus available as a machine-readable metadata source.

Foreign keys and relationship diagrams

Alongside table and column metadata, relationships between tables are an essential component of the Data Dictionary. Foreign-key relationships can be extracted from sys.foreign_keys and sys.foreign_key_columns and visualised in a relationship diagram. In large data models a complete ER diagram is often too complex; thematic subdivision by business area or star schema is more practical.

A Data Dictionary that is maintained manually goes stale quickly. The combination of automated extraction from database metadata with manually added business descriptions is the optimal approach: technical accuracy through automation, business context through human curation.

Docs-as-Code: Markdown, Git and CI/CD

The 'Docs-as-Code' concept treats documentation like source code: written in text formats (Markdown, AsciiDoc, reStructuredText), versioned in Git, reviewed via pull requests, and automatically built and published via CI/CD. This approach has clear advantages over traditional methods (Word documents, unversioned wikis): every change is traceable, reviews can be integrated into the code review process, and the published documentation is always in sync with the repository.

Docs-as-Code workflow: from sources and metadata through generator and CI/CD to publication

The complete Docs-as-Code workflow: DB metadata and code comments are extracted, a generator (Sphinx, mkdocs, Python script) builds Markdown or HTML, the CI/CD pipeline runs review and tests, and the result is published to Confluence, GitHub Pages or as static HTML.

Markdown as a universal documentation format

Markdown is simple enough to be readable without tooling, and powerful enough to represent complex structures: tables, code blocks, images, links and nesting. With extensions (Mermaid diagrams in GitHub Markdown, Admonitions in mkdocs, Directives in Sphinx) Markdown becomes a fully capable format for technical content. The frontmatter header (YAML block at the start of the file) allows metadata such as author, creation date, version, and category to be captured in structured form.

Markdown · Frontmatter structure for technical documentation articles

---
# Frontmatter structure for technical documentation articles.
# Parsed by mkdocs, Sphinx, and many other tools.
title:       "DWH Customer Dimension Load Process"
slug:        dwh-customer-dimension-load
category:    ETL Documentation
version:     "2.1"
date:        2025-03-10
last_review: 2025-05-20
reviewed_by: Stefan Corsten
status:      active         # active | deprecated | in-progress
tags:
  - DWH
  - ETL
  - CustomerDimension
  - SSIS
related:
  - dwh-source-crm
  - ssis-deployment-process
---

# DWH Customer Dimension Load Process

## Purpose and overview
This article describes the SSIS-based load process for the customer dimension in the DWH.
Source system is the CRM (table `dbo.customers`), target is `dim.Customer` in the DWH.

## Business rule
- Only customers with `active_flag = 1` are loaded.
- Address changes are historicised as SCD Type 2.
- Missing postal codes are substituted with '00000'.

## Technical details
- SSIS package: `Load_DimCustomer.dtsx`
- Schedule: daily 02:30, SQL Agent job `DWH_NightLoad`
- Dependency: `Load_Staging_CRM` must have completed.
- Expected runtime: 8-12 minutes

## Error handling
If errors exceed 10 rejected rows: job aborts and alert sent to dwh-team@example.com

The frontmatter header makes metadata machine-readable: mkdocs automatically generates table-of-contents, search indexes and navigation trees from these fields. The review workflow is integrated directly into the header.

CI/CD for documentation

A CI/CD pipeline for documentation checks on every commit whether the documentation is valid (no broken links, no invalid frontmatter), complete (all required fields present) and consistent (links to other articles resolve correctly). After a successful check the documentation is automatically deployed to the target system -- without manual release steps.

CI YAML · mkdocs build and deployment in GitHub Actions / GitLab CI

# CI pipeline for Docs-as-Code with mkdocs.
# Triggered on every push to main or on pull requests.
# Compatible with GitHub Actions (file: .github/workflows/docs.yml)

name: Build and Deploy Documentation

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build-docs:
    runs-on: ubuntu-latest
    steps:
      # Step 1: Check out the repository
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full Git history for git-revision-date plugin

      # Step 2: Set up Python with mkdocs and plugins
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: |
          pip install mkdocs-material                       mkdocs-git-revision-date-localized-plugin                       mkdocs-minify-plugin                       pymdown-extensions

      # Step 3: Build docs and check for broken links
      - name: mkdocs build (strict mode -- fail on broken links)
        run: mkdocs build --strict

      # Step 4: Deploy only on pushes to main (not on pull requests)
      - name: Deploy to GitHub Pages
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: mkdocs gh-deploy --force

  # Optional: Sphinx-based build (for Python projects)
  # build-sphinx:
  #   runs-on: ubuntu-latest
  #   steps:
  #     - uses: actions/checkout@v4
  #     - run: pip install sphinx sphinx-rtd-theme
  #     - run: sphinx-build -b html docs/ docs/_build/html -W

This pipeline builds the documentation on every push, checks for broken links in strict mode, and deploys to GitHub Pages. For GitLab CI the structure is equivalent with '.gitlab-ci.yml' and a 'pages' job name.

Docs-as-Code does not mean that all documentation must be written in Markdown. It means that documentation deserves the same discipline as code: versioning, review, automated testing, and reproducible deployment.

Automated Documentation Generation from DB Metadata

The most labour-intensive documentation task in database projects is maintaining the Data Dictionary. Documenting hundreds of tables and thousands of columns manually is time-consuming and error-prone -- and outdated documentation is worse than none, because it actively misleads. The solution: automatic generation of the technical base structure from database metadata, combined with manually maintained business descriptions.

Data lineage and metadata documentation flow: from DB metadata via Python/PowerShell to publication

The documentation generation flow: DB metadata (sys.tables, INFORMATION_SCHEMA, SSIS catalogue) is extracted by a Python or PowerShell script, prepared as Markdown files, and published via CI/CD to Confluence or as static HTML.

Python-based Markdown generation from DB metadata

With Python, the SQL queries described above can be turned into complete Markdown documents: one file per table or schema group, with a column table, metadata header, relationship notes, and placeholder sections for manually added business descriptions. The script runs in the CI pipeline or as a SQL Agent job and automatically updates the documentation whenever the schema changes.

Python · Generate Markdown table documentation from DB metadata

#!/usr/bin/env python3
# Generates Markdown files as Data Dictionary documentation from SQL Server metadata.
# Connection via pyodbc, output is one Markdown file per table.
# Prerequisite: pip install pyodbc

import pyodbc
import os
import re

# Connection parameters (in production, read from KeyVault or environment variable)
CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=localhost;"
    "DATABASE=DWH_Production;"
    "Trusted_Connection=yes;"
)
OUT_DIR = "./docs/data-dictionary"

# SQL: load tables and columns with extended properties
SQL = '''
    SELECT
        t.TABLE_SCHEMA    AS schema_name,
        t.TABLE_NAME      AS table_name,
        c.COLUMN_NAME     AS col_name,
        c.ORDINAL_POSITION AS col_order,
        c.DATA_TYPE       AS data_type,
        COALESCE(
            CAST(c.CHARACTER_MAXIMUM_LENGTH AS NVARCHAR(10)),
            CAST(c.NUMERIC_PRECISION        AS NVARCHAR(10))
        )                 AS type_detail,
        c.IS_NULLABLE     AS nullable,
        CAST(ep_t.value AS NVARCHAR(500)) AS tbl_desc,
        CAST(ep_c.value AS NVARCHAR(500)) AS col_desc
    FROM INFORMATION_SCHEMA.TABLES  t
    JOIN INFORMATION_SCHEMA.COLUMNS c
        ON  t.TABLE_SCHEMA = c.TABLE_SCHEMA
        AND t.TABLE_NAME   = c.TABLE_NAME
    LEFT JOIN sys.extended_properties ep_t
        ON  ep_t.major_id = OBJECT_ID(t.TABLE_SCHEMA+'.'+t.TABLE_NAME)
        AND ep_t.minor_id = 0 AND ep_t.name = 'MS_Description'
    LEFT JOIN sys.extended_properties ep_c
        ON  ep_c.major_id = OBJECT_ID(t.TABLE_SCHEMA+'.'+t.TABLE_NAME)
        AND ep_c.minor_id = c.ORDINAL_POSITION AND ep_c.name = 'MS_Description'
    WHERE t.TABLE_SCHEMA NOT IN ('sys','information_schema')
    ORDER BY t.TABLE_SCHEMA, t.TABLE_NAME, c.ORDINAL_POSITION
'''

def sanitize(name):
    # Derive safe filenames from table names (alphanumeric + hyphen only)
    return re.sub(r'[^a-zA-Z0-9_-]', '_', name).lower()

def gen_markdown(schema, table, tbl_desc, columns):
    # Generate a Markdown file for a single table
    lines = [
        f"---",
        f"title: "{schema}.{table}"",
        f"category: Data Dictionary",
        f"generated: auto",
        f"---",
        f"",
        f"# {schema}.{table}",
        f"",
        f"{tbl_desc or '_No business description available -- please add._'}",
        f"",
        f"## Columns",
        f"",
        f"| # | Column name | Data type | Nullable | Description |",
        f"|---|-------------|-----------|----------|-------------|",
    ]
    for col in columns:
        dtype = col['data_type']
        if col['type_detail']:
            dtype += f"({col['type_detail']})"
        desc = col['col_desc'] or ''
        lines.append(
            f"| {col['col_order']} | `{col['col_name']}` | {dtype} "
            f"| {col['nullable']} | {desc} |"
        )
    lines += ["", "## Relationships", "", "_Please add manually._", ""]
    return "\n".join(lines)

def main():
    os.makedirs(OUT_DIR, exist_ok=True)
    conn = pyodbc.connect(CONN_STR)
    cursor = conn.cursor()
    cursor.execute(SQL)
    rows = cursor.fetchall()
    cols = [d[0] for d in cursor.description]

    # Group rows by table
    tables = {}
    for row in rows:
        rec = dict(zip(cols, row))
        key = (rec['schema_name'], rec['table_name'])
        if key not in tables:
            tables[key] = {'desc': rec['tbl_desc'], 'cols': []}
        tables[key]['cols'].append(rec)

    # Write one Markdown file per table
    for (schema, table), data in tables.items():
        md = gen_markdown(schema, table, data['desc'], data['cols'])
        fname = os.path.join(OUT_DIR, f"{sanitize(schema)}__{sanitize(table)}.md")
        with open(fname, 'w', encoding='utf-8') as f:
            f.write(md)
        print(f"Generated: {fname}")

    print(f"Done: {len(tables)} tables documented.")
    conn.close()

if __name__ == "__main__":
    main()

This script generates a complete Markdown file per table with header, column table and placeholder for manual additions. It can run daily via SQL Agent or CI pipeline to keep the documentation automatically up to date.

PowerShell alternative for Windows environments

In Windows-dominated environments the same approach can be implemented with PowerShell. The dbatools module provides Get-DbaDbTable and Get-DbaDbColumn, which deliver the metadata directly without SQL queries. Combined with Out-File or Set-Content, Markdown files can be generated directly from PowerShell.

Automation and human curation are not mutually exclusive. The script delivers the technical structure accurately and up to date. Business meaning, business rules and context come from people. This combination produces a Data Dictionary that is both correct and usable.

Confluence & Wiki Structuring

Confluence is the central knowledge management system in many organisations for technical and business documentation. Like any wiki, it can quickly become an unstructured repository: outdated pages, inconsistent naming, missing links, unclear ownership. A useful Confluence structure requires a well-thought-out concept from the start for spaces, page types, naming conventions and review processes.

Space concept and page hierarchy

I structure Confluence spaces by purpose: a technical space for operations documentation and runbooks, a project space for requirements and test documentation, and a data space for Data Dictionary and lineage information. Within each space the page hierarchy follows a consistent template: overview page (scope, links to sub-sections), system-specific pages, individual runbooks or dictionary entries.

Naming conventions and metadata labels

Consistent page titles and labels are the key to a searchable Confluence structure. Page titles follow a scheme: '[System] -- [Topic] -- [Type]', for example 'DWH-Prod -- CustomerDimension -- LoadProcess'. Labels enable cross-space search: 'runbook', 'data-dictionary', 'incident-response', 'sla-relevant'. Confluence macros such as 'Page Tree' and 'Content by Label' generate automatic overview pages without manual maintenance.

Migration from Word documents and SharePoint to Confluence

Migrating existing documentation from Word files or SharePoint to Confluence is a common scenario. The technical migration (import via API or Confluence import tool) is often simpler than the content review: sorting out outdated content, identifying missing information, creating links. I support such migrations both technically and in terms of content, and ensure the result is a navigable, current knowledge base -- not a digital archive of old documents.

Confluence space concept: technical docs, project docs, Data Dictionary as separate spaces
Page templates: consistent frontmatter, review date, responsible owner
Naming conventions: [System] -- [Topic] -- [Type] as a consistent scheme
Labels and macros: cross-space search, automatic overview pages
Migration: Word/SharePoint to Confluence with content review
Archiving process: mark outdated pages rather than deleting

A Confluence wiki is only as good as its maintenance discipline. Review cycles, clear owners and an archiving process for outdated content are just as important as the initial structure.

Data Lineage & Metadata Documentation

Data lineage describes the path of data through all transformation steps: from the source system through staging and the DWH to the BI front end. This information is equally indispensable for audits, troubleshooting, and compliance. When a KPI in a report is wrong, it must be traceable which source it came from, what transformation rules were applied, and where in the chain an error may have occurred.

Complete data lineage is rarely captured in a single tool. In SSIS-based environments the SSIS catalogue (SSISDB) provides information about package executions and data flows. In Azure Data Factory, activity logs and monitoring are available. I combine these sources with a Markdown-based lineage documentation that is human-readable and versioned in Git.

SSIS catalogue as a metadata source

The SSIS catalogue (SSISDB) is an underestimated metadata source. It contains not only execution logs but also structured information about package parameters, connections and package configuration. T-SQL queries on SSISDB.catalog can produce a complete inventory of all SSIS packages, their connections and parameters -- the foundation for automated lineage documentation.

Metadata from ADF and Azure Synapse

In Azure environments, Azure Data Factory and Synapse Analytics provide similar metadata via their REST APIs. Pipelines, datasets and linked services can be queried via Azure PowerShell or Python (azure-mgmt-datafactory) and translated into lineage documentation. This automatically generated base is supplemented with manually created flow diagrams that show business relationships not directly readable from metadata alone.

SSIS catalogue: packages, connections, parameters, execution logs as metadata source
ADF / Synapse: pipeline metadata via REST API or azure-mgmt-datafactory
Markdown lineage: tabular representation of source-target relationships
Mermaid diagrams: automatically generated flow diagrams from metadata
Version history: schema changes and transformation rules traceable over time
Audit support: lineage documentation as evidence for compliance requirements

A central lineage document that tabularly describes for each KPI / data field the source, the transformation rule, and the target is often sufficient for audit purposes. It does not need to be fully automated -- more important is that it is complete and current. A hybrid approach (automated base plus manual additions) is usually the best practical compromise.

Data lineage is not a nice-to-have, but a core element of data quality assurance. Without lineage documentation the origin of every KPI is a black box -- a situation that becomes expensive during audits, corrections and extensions.

Requirements & Test Documentation

Requirements documentation is especially important in DWH and data projects because business and technical requirements are tightly intertwined. A poorly documented business requirement leads to technical implementations that produce surprises at acceptance testing. I create requirements documentation that is comprehensible to both clients and developers -- with clear acceptance criteria that allow unambiguous testing of whether a requirement has been met.

At the public sector / research organisation I created and maintained requirements and test documentation in a strictly regulated DWH environment. The particular demands of the public sector for traceability, versioning and formal acceptance sharpened my understanding of quality requirements for documentation in regulated environments.

Requirements structure for DWH projects

A requirement in the DWH context should contain: unique ID, business context (source, target, purpose), technical description (transformation rule, data types, exceptions), acceptance criteria (testable, measurable), dependencies (other requirements, source systems), priority and status. This structure allows requirements to be linked to test cases and systematically worked through at acceptance testing.

Test documentation and acceptance records

Test documentation covers more than bare test results. It documents the test framework (which environment, which data, who tested), the test cases (what is being tested, what result is expected), the actual results and open items. Acceptance records formalise the client's decision and create a reliable basis for follow-on work.

Requirement ID and versioning: every change traceable
Acceptance criteria: measurable, testable, unambiguously formulated
Test cases: positive tests, negative tests, boundary value tests
Traceability matrix: requirement to test case to acceptance result
Acceptance record: date, tester, result, reservation, sign-off
Open items list: remaining defects with due date and responsible party

Requirements documentation protects both parties: the client against incomplete deliveries, and the service provider against scope creep. Clear acceptance criteria are the most effective measure against acceptance disputes.

Knowledge Transfer & Training Material

At the end of a project the internal team should be able to continue independently. This is my guiding principle, and it applies not only to operations documentation but also to active knowledge transfer: training sessions, pair-programming, annotated code reviews, and written tutorials that explain why something was implemented that way -- not just how.

Documentation as training material

Well-structured operations documentation and runbooks are simultaneously training material. A new colleague who finds a complete runbook for the monthly DWH load process can execute that process independently after one supervised run-through. That is more efficient than any formal training. I write documentation with this purpose explicitly in mind: understandable to someone who does not yet know the system.

Glossary and naming conventions

A glossary defining business terms, technical abbreviations and organisation-specific terminology is often the underestimated foundation of all other documentation. If 'customer' means something different in the source system than in the DWH, and nobody has explicitly documented that, misunderstandings arise at every level. Naming conventions for tables, columns, SSIS packages and variables -- written down and consistently applied -- significantly reduce the cognitive load when working with an unfamiliar system.

Multilingual documentation

In international projects, multilingual documentation is sometimes required: German operations documentation for the local IT team, English API documentation for external partners, Portuguese user documentation for Brazilian sites. I create documentation in German, English and Portuguese and understand the content challenges of multilingual documentation: terminology consistency, keeping versions in sync after changes, and the differences in writing style between technical texts in the three languages.

Pair programming and code reviews as informal knowledge transfer
Annotated tutorials: explain the Why, not just the How
Glossary: business and technical terms, abbreviations, organisation-specific terminology
Naming conventions: written, binding, with examples
Multilingual documentation: DE / EN / PT, terminologically consistent
Onboarding packages: documentation tree + runbooks + glossary for new team members

A team that works confidently and independently after the project ends is the best evidence of successful knowledge transfer. Documentation is the medium -- but it only takes effect when filled with real knowledge that is actively passed on.

Working approach

A documentation project starts with a stock-taking: what is already documented? What is missing? In what format does existing documentation exist? Who are the users of the documentation, and what questions must they be able to answer with it? This analysis typically takes one to two days and is the basis for a realistic implementation plan.

Stock-taking: existing docs, gaps, formats, user groups
Concept: documentation structure, tooling, versioning strategy
Automation: metadata extraction, CI/CD pipeline for doc build
Creation: operations docs, runbooks, Data Dictionary, requirements docs
Review: content review by subject-matter experts and technical review
Publication: deployment to Confluence, GitHub Pages or internal portal
Handover: maintenance processes, ownership, review cycles defined

I work remotely, in hybrid mode or on-site. For creating operations documentation and Data Dictionaries, remote work is well suited; for requirements workshops and knowledge transfer to internal teams, personal presence is often more effective. I adapt the working mode to the project.

An important aspect of my approach is involving the internal team from the start. Documentation that comes from outside and is not owned by the internal team will not be maintained. I actively involve subject-matter experts and developers -- through reviews, through joint development of glossary and naming conventions, and through handover of documentation tools and processes that the team can continue after the engagement.

Documentation is a team sport. The best results arise when subject-matter experts contribute their context, developers supply technical details, and an experienced documentation consultant provides the structure and ensures consistency. My role is to drive and facilitate this process.

Typical services in Technical Documentation

The service range in technical documentation and knowledge management spans from one-off documentation creation to building permanent structures and processes. Depending on the project phase and needs I take on individual areas or the complete spectrum.

Operations documentation: server inventories, configuration manuals, contact directories
Runbooks: step-by-step for load processes, monthly jobs, error handling, emergency procedures
Data Dictionary: extraction from DB metadata, business enrichment, HTML/Markdown/Confluence
Data Lineage: source-target documentation, transformation rules, lineage diagrams
Docs-as-Code: Markdown/AsciiDoc structures, mkdocs/Sphinx setup, CI/CD pipeline for docs
Automation: Python/PowerShell scripts for documentation generation from metadata
Confluence structuring: space concept, templates, naming conventions, migration
Requirements documentation: requirement IDs, acceptance criteria, traceability
Test documentation: test cases, acceptance records, open-items lists
Interface documentation: API descriptions, data format specifications
Glossary and naming conventions: organisation-wide terminology standards
Knowledge transfer: training, annotated code reviews, onboarding packages
Multilingual documentation: DE / EN / PT for international projects

I prefer to work on projects where documentation is understood as a quality attribute -- not as an unwelcome obligation. These are typically regulated environments, larger DWH new-build projects, migration projects where knowledge needs to be transferred, and infrastructure projects intended for long-term operation. The combination of technical depth and documentation experience allows me to produce documentation that is genuinely accurate and useful.

Whether as a dedicated documentation freelancer, as a technical author alongside development work, or as an advisor for building documentation processes: I adapt the collaboration to the project's needs. Remote work suits the bulk of the work; workshops and reviews occasionally benefit from personal presence.

Selected anonymised reference projects

Insurance / Reinsurance

DB Documentation System · Web/XML-based · Metadata maintenance · Extension

Extension of a web-based, XML-backed database documentation system. The system stored metadata about tables, views and columns in structured form and generated an HTML-based documentation view. Work included extending the metadata structure, integrating new schemas and improving the user interface for developers and business users. Result: complete, searchable Data Dictionary for the entire database landscape.

Public Sector / Research Organisation

Requirements documentation · Test documentation · DWH · regulated environment

Creation and maintenance of requirements and test documentation in a regulated DWH environment with strict requirements for traceability and formal acceptance. Documentation covered business requirement descriptions, acceptance criteria, test cases, acceptance records and a traceability matrix. The documentation was used in internal audits and data protection reviews.

Own infrastructure / Self-operated

Central operations documentation · automated config backup · Markdown/Git

Construction of central operations documentation for a Proxmox-based infrastructure with Debian, NGINX, WireGuard, Docker and additional services. Integration of automated configuration backups that regularly save configuration files to Git. The operations documentation is written in Markdown, versioned in Git and accessible as static HTML -- always current and without manual maintenance effort.

Industrial company / Mechanical engineering

SSIS documentation · deployment process docs · Azure DevOps · YAML

Documentation of the SSIS migration approach from SQL Server 2016 to 2022 and creation of process documentation for the newly introduced Azure DevOps build and deployment process. Documentation included technical specifications, release-management runbooks and an onboarding package for the internal team.

Frequently asked questions about technical documentation

What documentation formats do you prefer?

It depends on the intended use. For operations docs and runbooks I prefer Markdown in Git -- versionable, easy to maintain, publishable to Confluence or as HTML. For Data Dictionaries I combine auto-generated Markdown files with manual additions. For requirements documentation I use Confluence or structured Markdown templates. Word documents only when the client requires a formal acceptance document.

How do you deal with outdated documentation?

Outdated documentation is often more harmful than none. My approach: stock-taking with age stamp (when was this page last reviewed?), prioritisation by criticality (business-critical runbooks first), archiving outdated content with a marker rather than deletion, and setting up a review cycle (e.g. quarterly) with named owners.

Can you automatically generate documentation from database metadata?

Yes. With Python (pyodbc + pandas) or PowerShell (dbatools) I extract metadata from INFORMATION_SCHEMA, sys.tables, sys.columns and Extended Properties. The result is Markdown files with business descriptions as placeholders. This generation can run in a CI pipeline or as a SQL Agent job and keeps the technical base of the Data Dictionary automatically current.

What is Docs-as-Code and why is it useful?

Docs-as-Code means treating documentation like source code: written in Markdown or AsciiDoc, versioned in Git, reviewed via pull request, and automatically built and deployed via CI/CD. This is useful because it makes documentation traceable, reviewable and reproducible -- exactly the qualities that have long been standard in software development.

How do you support building a Data Dictionary?

In three steps: first, stock-taking of existing metadata (Extended Properties, existing documentation, team knowledge). Second, building the extraction -- SQL scripts or Python scripts that pull metadata from the database and translate it into Markdown or Confluence. Third, enrichment with business context through workshops with the business side. The result is a Data Dictionary that is both technically correct and business-comprehensible.

How do you version documentation?

Via Git. Every change to a runbook, a configuration description or a Data Dictionary entry is traceable as a commit -- who changed what, when, and why (commit message). Branches allow simultaneous work on different versions. Tags mark released documentation states for formal acceptances.

Can you create documentation in German and Portuguese as well?

Yes. German, English and Portuguese -- all three fluently, including technical texts with domain terminology. I understand the differences in writing style between technical texts in these three languages and ensure terminology consistency across languages.

How long does a typical documentation project take?

It depends strongly on the scope. Complete operations documentation for a medium DWH environment (20-30 systems) typically requires 4-8 weeks. A Data Dictionary for 100-200 tables with automation can be built in 2-4 weeks. Setting up a Docs-as-Code workflow with CI/CD is a 1-2-week project. I give concrete estimates after an initial stock-taking.

Technical Documentation & Knowledge Management -- structured, versioned, automated

Positioning

Importance of Technical Documentation in DWH and Infrastructure

Knowledge silos and their consequences

Documentation as a measurable investment

Operations Documentation & Runbooks

Structure of operations documentation

Runbooks: actionable and tested

Data Dictionary & Data Model Documentation

Structure of a complete Data Dictionary

Foreign keys and relationship diagrams

Docs-as-Code: Markdown, Git and CI/CD

Markdown as a universal documentation format

CI/CD for documentation

Automated Documentation Generation from DB Metadata

Python-based Markdown generation from DB metadata

PowerShell alternative for Windows environments

Confluence & Wiki Structuring

Space concept and page hierarchy

Naming conventions and metadata labels

Migration from Word documents and SharePoint to Confluence

Data Lineage & Metadata Documentation

SSIS catalogue as a metadata source

Metadata from ADF and Azure Synapse

Requirements & Test Documentation

Requirements structure for DWH projects

Test documentation and acceptance records

Knowledge Transfer & Training Material

Documentation as training material

Glossary and naming conventions

Multilingual documentation

Working approach

Typical services in Technical Documentation

Selected anonymised reference projects

Insurance / Reinsurance

Public Sector / Research Organisation

Own infrastructure / Self-operated

Industrial company / Mechanical engineering

Frequently asked questions about technical documentation

What documentation formats do you prefer?

How do you deal with outdated documentation?

Can you automatically generate documentation from database metadata?

What is Docs-as-Code and why is it useful?

How do you support building a Data Dictionary?

How do you version documentation?

Can you create documentation in German and Portuguese as well?

How long does a typical documentation project take?

Related services

Contact

Need support with ETL, Data Vault, BI architecture, SQL Server or Azure?