All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- node:20-alpine to node:22-alpine. Both the builder and final runtime stages now run apt-get upgrade at build time to patch known OS CVEs (libssl3, zlib1g, ncurses, libc). The final image user (greenkube, UID/GID 10001) is created with an explicit groupadd/useradd and /sbin/nologin shell.
- runAsNonRoot: true, runAsUser/Group: 10001, allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, capabilities.drop: [ALL], and seccompProfile.type: RuntimeDefault. /tmp directories served by emptyDir volumes (64 MiB each) to satisfy Python’s runtime tmp needs under a read-only root.
- runAsUser/Group: 70 (upstream requirement), readOnlyRootFilesystem: true, capabilities.drop: [ALL], seccompProfile.type: RuntimeDefault. /var/run/postgresql and /tmp mounted as emptyDir volumes. PostgreSQL upgraded from 17-alpine to 18-alpine for longer upstream lifecycle.
- POSTGRES_INITDB_ARGS set to --auth-host=scram-sha-256 --auth-local=scram-sha-256; this replaces the default md5 password hashing with the stronger SCRAM-SHA-256 protocol. Liveness and readiness probes added via pg_isready.
- secrets from the ClusterRole resource list, eliminating the critical RBAC over-permission (KSV-0041) that allowed the service account to read cluster-wide secrets.
- SecurityHeadersMiddleware (Starlette BaseHTTPMiddleware) added to the FastAPI app, injecting seven OWASP-recommended headers on every response: X-Content-Type-Options, X-Frame-Options, X-XSS-Protection, Referrer-Policy, Permissions-Policy, Cache-Control, and a strict Content-Security-Policy.
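As a rough illustration of how a response-header middleware of this kind works, here is a minimal sketch. It is a dependency-free pure-ASGI wrapper, not the Starlette BaseHTTPMiddleware subclass the entry names. The class name SecurityHeadersASGI, the demo_app stand-in, and every header value are assumptions made for the example, not GreenKube's actual settings.

```python
import asyncio

# Header values are illustrative assumptions, not GreenKube's actual settings.
SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "X-XSS-Protection": "0",
    "Referrer-Policy": "no-referrer",
    "Permissions-Policy": "geolocation=()",
    "Cache-Control": "no-store",
    "Content-Security-Policy": "default-src 'self'",
}


class SecurityHeadersASGI:
    """Pure-ASGI wrapper that sets the headers above on every HTTP response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass non-HTTP traffic (lifespan, websocket) through untouched.
            await self.app(scope, receive, send)
            return

        names = {name.lower().encode() for name in SECURITY_HEADERS}

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                # Drop same-named headers set by the app, then append ours.
                kept = [(k, v) for k, v in message.get("headers", [])
                        if k.lower() not in names]
                added = [(k.lower().encode(), v.encode())
                         for k, v in SECURITY_HEADERS.items()]
                message = {**message, "headers": kept + added}
            await send(message)

        await self.app(scope, receive, send_with_headers)


async def demo_app(scope, receive, send):
    """Hypothetical downstream app standing in for the real FastAPI app."""
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"ok"})


def collect_response_headers():
    """Run one fake request through the wrapper; return its headers as a dict."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/"}
    asyncio.run(SecurityHeadersASGI(demo_app)(scope, receive, send))
    return {k.decode(): v.decode() for k, v in sent[0]["headers"]}
```

Calling collect_response_headers() returns the app's own content-type header plus the seven security headers, which makes the wrapper easy to check without starting a server.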
- CORS is now restricted to GET, POST, OPTIONS methods and Authorization/Content-Type headers (previously wildcard).
- .github/workflows/security.yml workflow running on every push/PR to main/dev and weekly (Monday 06:00 UTC), with five jobs: Trivy image scan for the GreenKube image (exit 1 on CRITICAL/HIGH), Trivy image scan for PostgreSQL (informational), Trivy IaC config scan for Dockerfile + Helm chart, Trivy filesystem scan for Python dependencies, and npm audit for the frontend. SARIF results uploaded to GitHub Security.
- .trivyignore: Documents eight upstream-unfixable CVEs (gosu/Go-stdlib CVEs in the Alpine postgres image, one OpenSSL CMS CVE, and one zlib utility CVE) with justifications and a quarterly review date.
- secrets.existingSecret: New secrets.existingSecret value allows passing the name of a pre-created Kubernetes Secret instead of letting the chart manage one. When set, the chart skips Secret creation entirely and all secrets.* inline values are ignored; recommended for production to avoid storing credentials in values.yaml.
- SQLiteNodeRepository now implements a Slowly Changing Dimensions Type 2 pattern to deduplicate node records across collection cycles. A separate node_snapshots_scd table stores only rows where tracked columns (instance_type, vcpu, memory_gb, region, provider, zone) actually changed, avoiding write amplification on stable clusters. Migration 0003 creates this table and the associated indexes.
- scope column: recommendation_history table now includes a scope TEXT column (values: pod, namespace, node) to allow filtering recommendations by granularity. pod_name and namespace columns are nullable for node-scope and cluster-scope recommendations. Applied in migration 0003 for both PostgreSQL and SQLite.
- DB_POOL_MIN_SIZE (default: 2) and DB_POOL_MAX_SIZE (default: 10) environment variables control asyncpg’s connection pool bounds.
  Exposed as db.poolMinSize / db.poolMaxSize in helm-chart/values.yaml and propagated via configmap.yaml.
- DB_STATEMENT_TIMEOUT_MS environment variable (default: 30000 ms) sets a per-statement timeout on the PostgreSQL connection pool via server_settings. Exposed as db.statementTimeoutMs in helm-chart/values.yaml.
- combined_metrics(namespace, timestamp), namespace_cache(last_seen), and carbon_intensity_history(datetime) to accelerate the most frequent query patterns.
- helm-chart/Chart.yaml enriched with full Artifact Hub annotations: artifacthub.io/category, artifacthub.io/screenshots (6 screenshots), artifacthub.io/links, artifacthub.io/recommendations, artifacthub.io/changes, artifacthub.io/images (linux/amd64 + linux/arm64), artifacthub.io/maintainers, and artifacthub.io/readme (fixes “no README” on the listing page). Chart now includes keywords, home, sources, and maintainers fields for richer search indexing.
- artifacthub-repo.yml: Artifact Hub repository metadata file with repositoryID for Verified Publisher badge.
  Automatically copied to gh-pages by the release workflow alongside index.yaml.
- llms.txt (greenkube-website/public/): LLM/AI crawler guidance file following the llms.txt convention; enables AI assistants (Claude, ChatGPT, Perplexity) to understand GreenKube when crawling the website.
- assets/demo-report.png and assets/demo-settings.png added to README, Chart.yaml Artifact Hub screenshots, and llms.txt.
- scripts/pg_upgrade_17_to_18.sh: New maintenance script to upgrade an existing PostgreSQL 17 data directory to version 18 in-place using a Kubernetes Job and pg_upgrade --link, preserving all data with an automatic backup.
- aggregate_summary and aggregate_timeseries now correctly query both the raw combined_metrics table and the pre-aggregated hourly_metrics table, ensuring historical reports cover the full retention window without gaps at the boundary between live and archived data.
- METRICS_AGGREGATED_RETENTION_DAYS now defaults to -1 (infinite retention), preserving all historical data by default. This is the correct default for CSRD/ESRS E1 compliance, which requires multi-year reporting. Set an explicit positive integer to enforce a rolling window.
- init-pgrun-perms: Added readOnlyRootFilesystem: true to the PostgreSQL init container’s securityContext, resolving the HIGH misconfiguration finding.
- svelte, vite, rollup, picomatch, devalue, and @sveltejs/kit to their latest compatible versions, resolving all HIGH-severity advisories.
- table-format step (exit-code 1, visible in log) and a separate sarif step (exit-code 0, uploaded to GitHub Security tab).
  Added pull: true to the Docker build step so the base image layers are always pulled fresh from the registry, preventing stale GHA cache from hiding unfixed CVEs.
- artifacthub-repo.yml: Owner name and email corrected to match the actual GitHub account (Hugo Lelievre / hugo@greenkube.cloud).
- src/greenkube/storage/ package is split into three sub-packages (storage/postgres/, storage/sqlite/, and storage/elastic/), each with its own __init__.py. All cross-package imports updated. Test suite reorganized to mirror the new structure with dedicated tests/core/, tests/grafana/, and tests/helm/ directories.
- pyproject.toml: Added 20 SEO keywords, 5 new PyPI classifiers, and 4 additional project URLs (Documentation, Changelog, Docker Hub, Repository).
- release.yml: Release workflow now copies artifacthub-repo.yml to gh-pages on every release so Artifact Hub always picks up the latest metadata.
- scripts/sync_version.py: update_helm_chart_yaml() now also keeps the artifacthub.io/images annotation in sync with the new version on each release.
- NodeCollector: _detect_cloud_provider now recognises Scaleway nodes via k8s.scaleway.com/* labels (primary signal set by the Scaleway Cloud Controller Manager on every Kapsule node) and falls back to node.spec.provider_id starting with scaleway:// for clusters where those labels may be absent. _extract_node_pool returns k8s.scaleway.com/nodepool-name (with nodepool-id as a fallback). Scaleway region mappings (fr-par, nl-ams, pl-waw → Electricity Maps zones) and PUE profile (1.37) were already present in the data layer and are now fully wired up.
- HealthCheckService (src/greenkube/core/health.py) that performs periodic connectivity checks against all data sources: Prometheus, OpenCost, Electricity Maps, Boavizta, and Kubernetes.
  Each probe reports status (healthy, degraded, unreachable, unconfigured), latency, resolved URL, and whether the service was auto-discovered or manually configured.
- GET /api/v1/health/services endpoint: Returns aggregated health status for all data sources with per-service details. Supports ?force=true to bypass the 30-second cache and trigger fresh probes.
- GET /api/v1/health/services/{service_name} endpoint: Returns health status for a single named service.
- POST /api/v1/config/services endpoint: Allows updating service URLs (Prometheus, OpenCost, Boavizta) and the Electricity Maps token at runtime from the frontend. Changes are session-scoped and do not persist across pod restarts.
- ServiceHealth, HealthCheckResponse, and ServiceConfigUpdate Pydantic models in src/greenkube/models/health.py.
- HealthBadge component: Reusable Svelte component (frontend/src/lib/components/HealthBadge.svelte) for color-coded service health indicators.
- HealthPopup component: Modal component (frontend/src/lib/components/HealthPopup.svelte) for first-connection service configuration.
- --no-color flag (and NO_COLOR env var support) to disable Rich formatting for clean pipeline logs. New --fail-on-recommendations flag on greenkube recommend to exit with code 1 when recommendations are found. New --fail-on-co2-threshold and --fail-on-cost-threshold flags on greenkube report to enforce carbon/cost policy gates in CI/CD pipelines.
- frontend/tests/) with 133 tests across 8 files covering all JS utility modules (formatters, API client, Svelte stores, ECharts option builders) and Svelte components (StatCard, Card, DataState, HealthBadge) using @testing-library/svelte.
  Added npm test, npm run test:watch, and npm run test:coverage scripts.
- /report route in the SvelteKit SPA: a full-featured report builder that lets users configure time range (1 h → 1 y), namespace filter, aggregation (hourly/daily/weekly/monthly/yearly) and export format (CSV or JSON), preview totals before downloading, then trigger a direct browser download, with no CLI or kubectl exec required.
- GET /api/v1/report/summary endpoint: Returns a preview of the report (row count, unique pods/namespaces, CO₂e, embodied CO₂e, energy, cost) for the current filter/aggregation parameters.
- GET /api/v1/report/export endpoint: Streams a downloadable file (CSV or JSON) with correct Content-Disposition headers. Supports the same namespace, last, aggregate, and granularity parameters as the CLI greenkube report command.
- ReportSummaryResponse schema: New Pydantic response model in api/schemas.py.
- Config.get_pue_for_provider() now falls back to the raw DEFAULT_PUE environment variable (default 1.3) when a node’s cloud provider is absent or not in DATACENTER_PUE_PROFILES, instead of incorrectly re-resolving through self.DEFAULT_PUE (which returns the configured CLOUD_PROVIDER’s profile, e.g. AWS=1.15, even for unrelated unknown nodes). The estimation_reasons message now correctly reports 1.3 for unknown providers.
- CLOUD_PROVIDER default changed from aws to unknown: The env var and helm-chart/values.yaml previously defaulted to "aws", silently applying AWS’s PUE profile (1.15) on clusters where no cloud provider was configured. The default is now "unknown", which correctly triggers the DEFAULT_PUE fallback (1.3) and produces an explicit warning log instead of a silent wrong value.
- health.status === 'healthy' while the API returns "ok". Fixed to health.status === 'ok'.
- SustainabilityScorer class (src/greenkube/core/sustainability_score.py) computes a composite 0–100 score (100 = perfect cluster) across seven weighted dimensions:
  - grid_intensity × PUE; penalises both dirty grids and inefficient datacentres equally
- effective_intensity = grid_intensity × PUE so that a high-PUE datacenter (e.g. OVH=1.37) is penalised relative to a hyperscaler-efficient one (e.g. GCP=1.09) even on the same electrical grid. Invalid/missing PUE safely defaults to 1.0.
- SustainabilityResult Pydantic model: Carries overall_score and a dimension_scores dict for structured downstream consumption.
- greenkube_sustainability_score{cluster}: composite 0–100 score
- greenkube_sustainability_dimension_score{cluster, dimension}: per-dimension breakdown
- cluster, namespace, pod, node, and region labels, matching kube-state-metrics conventions and enabling seamless Grafana variable-based filtering.
- cluster and region drop-down template variables added to the pre-built Grafana dashboard for multi-cluster/multi-region environments.
- docs/sustainability-score.md: full description of the 7-dimension scoring model, formulas, reference thresholds, and PUE impact table.
- carbon_intensity dimension → carbon_efficiency: The scoring dimension was renamed and its formula extended to include PUE (effective_intensity = grid_intensity × PUE).
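The effective-intensity formula above can be sketched as follows. This is a minimal illustration, not the SustainabilityScorer code: the function name, the 50 gCO₂e/kWh grid value, and the rule that a non-numeric, NaN, or non-positive PUE counts as "invalid" are assumptions. The PUE figures 1.37 and 1.09 are the OVH and GCP examples quoted in the entries.

```python
def effective_intensity(grid_intensity: float, pue) -> float:
    """Return grid_intensity × PUE, treating an invalid or missing PUE as 1.0."""
    try:
        pue_value = float(pue)
    except (TypeError, ValueError):
        # None or a non-numeric string: fall back to a neutral PUE.
        pue_value = 1.0
    # Assumption: zero, negative, and NaN values are also treated as invalid.
    if not pue_value > 0:
        pue_value = 1.0
    return grid_intensity * pue_value


# Same hypothetical grid (50 gCO2e/kWh), different datacentre efficiency.
ovh = effective_intensity(50.0, 1.37)
gcp = effective_intensity(50.0, 1.09)
missing = effective_intensity(50.0, None)
```

On the same grid the higher-PUE profile yields the higher effective intensity, which is what lets the score penalise an inefficient datacentre even when the electricity mix is identical.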
- The raw Prometheus gauges greenkube_carbon_intensity_score and greenkube_carbon_intensity_zone are kept unchanged for backward compatibility.
- CLUSTER_NAME now propagated to the metrics endpoint so the cluster label is always populated.
- ci-cd.yml workflow with three focused workflows: ci.yml (lint & test on all PRs/pushes), dev-build.yml (dev Docker images on dev branch), release.yml (production builds triggered by semver git tags)
- dev-<sha> and dev-latest; release images use the semver version and latest
- vX.Y.Z tag is pushed; no more mutable version tags
- pre-install-check CRD validation job now uses a dedicated ServiceAccount created via a pre-install hook, fixing the race condition where the job started before the main ServiceAccount existed
- post-install-hook ready-check job now uses a dedicated ServiceAccount with its own hook lifecycle, preventing “serviceaccount not found” errors during fresh installs and upgrades
- topology.kubernetes.io/zone=nova (OpenStack default AZ name) is now ignored and the lookup falls through to the region label (GRA11, RBX8, …); numeric suffixes are stripped (GRA11 → GRA) before CSV lookup; all OVH data-centres now resolve to the correct Electricity Maps zone
- node.k8s.ovh/type (current OVHcloud MKS generation) are now correctly identified as provider ovh; the previous check only matched the legacy k8s.ovh.net/ prefix
- cloud_region_electricity_maps_mapping.csv with uppercase trigrams (GRA, RBX, SBG, WAW, BHS, LIM, ERI, VIN, HIL, YYZ, SGP, SYD, YNM) and all new-API long-form region IDs (eu-west-par, eu-west-gra, eu-central-waw, ca-east-bhs, us-east-vin, ap-southeast-sgp, ap-southeast-syd, ap-south-mum, …)
- ServiceMonitor and NetworkPolicy are now disabled by default; fresh installs no longer fail on clusters without the Prometheus Operator (monitoring.coreos.com/v1 CRD)
- pre-install-check hook that validates the Prometheus Operator CRD is present before creating a ServiceMonitor, with a clear actionable error message
- sum() to prevent duplicate series when multiple targets report the same metric
- pod_name=None) when saving to history; prevents integrity errors and irrelevant entries
- greenkube start hanging in Docker containers due to buffered stdout; invisible INFO logs now correctly flushed to the console
- monitoring section comments to distinguish GreenKube→Prometheus (automatic) from Prometheus→GreenKube (optional, for Grafana)
- /api/v1/metrics/summary and /api/v1/metrics/timeseries: aggregation now happens directly in the database (SQLite and PostgreSQL) instead of loading all rows into Python; typically 10–20× faster for large datasets and demo mode
- dashboards/greenkube-grafana.json with KPIs, time-series, per-namespace breakdown, node utilization, grid intensity, and recommendations panels
- /prometheus/metrics endpoint: Comprehensive metric exposition (CO₂e, cost, energy, CPU, memory, network, disk, restarts, nodes, grid intensity, recommendations) with correct label relabeling
- greenkube demo command generates 7 days of realistic sample data (22 pods, 5 namespaces) in a standalone SQLite instance; explore the dashboard without a live cluster
- CarbonIntensityRepository split: Dedicated repository implementations per backend (Postgres, SQLite, Elasticsearch) following the same pattern as other repositories
- CollectionOrchestrator, MetricAssembler, NodeZoneMapper, PrometheusResourceMapper, CostNormalizer, HistoricalRangeProcessor, EmbodiedEmissionsService
- CONTRIBUTING.md)
- docs/architecture.md
- GREENKUBE_API_KEY), configurable CORS origins, rate limiting via slowapi
- GET /api/v1/metrics now supports offset and limit query parameters
- HEALTHCHECK instruction for standalone usage
- helm test connectivity validation via test-connection.yaml
- preStop lifecycle hook on the API container
- parse_duration() utility used by both CLI and API
- Config.reload() for clean test isolation
- %-formatting throughout the codebase
- Recommendation model uses typed scope field instead of sentinel pod_name="*"
- recommend command now uses the unified recommendation engine (all 9 types) instead of legacy 2-type API
- recommend reads from database by default (consistent with API); added --live flag for real-time mode
- read_combined_metrics_from_database() called with correct parameter names (start_time/end_time)
- run_range() now divides range total by number of time steps
- USER_AGENT header dynamically reflects the actual package version
- DEFAULT_COST class attribute from Config
- recommendSystemNamespaces moved inside recommendations scope in values.yaml
- collect_detailed_info() now delegates to collect() to avoid inconsistent results
- .tgz artifacts from git tracking
- /api/v1/metrics/summary and /api/v1/metrics/timeseries: Aggregation is now performed directly in the database (SQLite and PostgreSQL) instead of loading all rows into Python objects; typically 10–20× faster for large datasets and demo mode
- /metrics)
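To illustrate the database-side aggregation described for /api/v1/metrics/summary, here is a minimal sketch using an in-memory SQLite database. The table name combined_metrics and its namespace and timestamp columns come from the entries above. The co2e_grams and total_cost columns, both function names, and the sample rows are hypothetical, and the real queries may look quite different.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a few hypothetical metric rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE combined_metrics ("
        "namespace TEXT, timestamp TEXT, co2e_grams REAL, total_cost REAL)"
    )
    conn.executemany(
        "INSERT INTO combined_metrics VALUES (?, ?, ?, ?)",
        [
            ("default", "2024-01-01T00:00:00Z", 1.5, 0.25),
            ("default", "2024-01-01T01:00:00Z", 2.5, 0.75),
            ("monitoring", "2024-01-01T00:00:00Z", 4.0, 0.5),
        ],
    )
    return conn


def summarize_by_namespace(conn: sqlite3.Connection):
    """Let the database compute the totals instead of summing rows in Python."""
    query = """
        SELECT namespace,
               SUM(co2e_grams) AS total_co2e,
               SUM(total_cost) AS total_cost,
               COUNT(*)        AS row_count
        FROM combined_metrics
        GROUP BY namespace
        ORDER BY namespace
    """
    return conn.execute(query).fetchall()


summary = summarize_by_namespace(build_demo_db())
```

Only one row per namespace crosses the database boundary, so the amount of data loaded into Python no longer grows with the number of stored samples.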