GEKS Index — Composite Measure of Biological Complexity

Kriger, B., & Espesset, D. (2026). The GEKS Index: A Composite Measure of Biological Complexity Across Informational, Structural, Functional, and Evolutionary Dimensions. Zenodo. https://doi.org/10.5281/zenodo.18775491

The Problem

Biological complexity is widely discussed but rarely measured. Is a crow more complex than a trout? Is an octopus more complex than a cricket? Intuitively, the answers seem obvious, but science requires more than intuition.

Existing approaches each capture only one facet of complexity. Shannon entropy measures informational diversity but cannot distinguish meaningful structure from random noise. Gell-Mann's effective complexity captures meaningful structure but is notoriously hard to compute for real organisms. Phylogenetic trait matrices can count evolutionary innovations, but the result depends entirely on which traits the researcher chooses to include. None of these alone provides a satisfying measure of biological complexity.

The GEKS Index addresses this by combining four complementary measures into a single weighted composite score. Each component captures a different dimension of what it means to be "complex," and together they provide a richer, more robust picture than any single metric alone.

The Formula

GEKS = α · S_norm + β · G_norm + γ · E_norm + δ · K_norm

where α + β + γ + δ = 1, and all components are normalized to [0, 1]

The index is a weighted sum of four normalized components. Each measures a distinct aspect of biological complexity. The weighting coefficients α, β, γ, δ are adjustable depending on the research context and available data. When all four are equal (0.25 each), no dimension is privileged over any other.
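
As a minimal sketch of the weighted sum above, the Python function below computes a GEKS score from four already-normalized components. The function name `geks` and its validation checks are illustrative assumptions, not part of the published method. The example inputs are the hypothetical E. coli values from the illustrative table later in this document.

```python
import math

def geks(s_norm, g_norm, e_norm, k_norm, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four normalized components.

    `weights` is (alpha, beta, gamma, delta), applied to S, G, E, K in that order.
    The function name and the validation are illustrative assumptions.
    """
    components = (s_norm, g_norm, e_norm, k_norm)
    if len(weights) != 4 or not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("expected four weights that sum to 1")
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("components must be normalized to [0, 1]")
    return sum(w * c for w, c in zip(weights, components))

# Hypothetical E. coli values from the illustrative table, balanced weights.
score = geks(0.45, 0.30, 0.05, 0.20)
print(f"{score:.2f}")
```

Rejecting weights that do not sum to 1 keeps the composite score inside [0, 1], which is what makes scores comparable across weight presets.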

The Four Components

S — Shannon

S_norm — Informational Diversity

S = − Σ p_i · log₂(p_i)

What it measures: How diverse and evenly distributed are the building blocks of the organism? Shannon entropy, originally developed in information theory by Claude Shannon (1948), quantifies the amount of "surprise" or "information" in a system. Applied to biology, it can measure the diversity of nucleotides, codons, protein domains, cell types, tissue types, or any discrete biological unit.

How it works: If all element types are equally frequent, entropy is maximal. If one type dominates, entropy is low. A genome with rich codon diversity scores higher than one dominated by a few repeated sequences.

p_i — frequency of the i-th element type in the system
Normalization: S_norm = S / S_max, where S_max = log₂(n) and n is the number of element types
S_norm → 1: maximum diversity (uniform distribution)
S_norm → 0: minimum diversity (one type dominates)
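
The definitions above can be sketched in Python as follows. This is only an illustration: the function name `shannon_norm`, the optional `n_types` argument, and the toy nucleotide strings are assumptions introduced here, and by default n is taken to be the number of element types actually observed.

```python
import math
from collections import Counter

def shannon_norm(units, n_types=None):
    """Normalized Shannon entropy S / log2(n) of a sequence of discrete units.

    Assumption: when `n_types` is not given, n is the number of distinct
    unit types actually observed in `units`.
    """
    counts = Counter(units)
    n = n_types if n_types is not None else len(counts)
    if n < len(counts):
        raise ValueError("n_types is smaller than the number of observed types")
    if n <= 1 or not counts:
        return 0.0  # a single type (or no data) carries no diversity
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    # Same quantity as S = -sum(p_i * log2(p_i)), written with log2(1 / p_i).
    entropy = sum(p * math.log2(1.0 / p) for p in probs)
    return entropy / math.log2(n)

print(shannon_norm("ACGT" * 25))        # four types, equally frequent
print(shannon_norm("A" * 97 + "CGT"))   # one type dominates
```

The first call gives the maximum value of 1 because all four types are equally frequent. The second gives a value close to 0 because one type dominates.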

Limitation: High Shannon entropy does not always mean high complexity. Random noise has maximum entropy but zero meaningful structure. This is precisely why the Gell-Mann component is essential.

G — Gell-Mann

G_norm — Effective Complexity

G = L(R) — length of the compressed description of the system's regularities

What it measures: How much meaningful, non-random structure does the organism contain? Effective complexity, as defined by Murray Gell-Mann and Seth Lloyd, separates the structured (regular) information in a system from its random (incompressible) component. It measures the length of the shortest possible description of the system's patterns and regularities, ignoring random variation.

How it works: A crystal has low effective complexity because its repeating pattern can be described very briefly. A random gas also has low effective complexity because it contains no regularities to describe. Biological systems sit between these extremes of perfect order and pure randomness, where effective complexity peaks: they contain vast amounts of non-trivial, hierarchical structure. A genome full of regulatory networks, feedback loops, and modular gene families has very high effective complexity.

L(R) — length of the description of the set of regularities R
Operational proxy: difference between uncompressed and compressed data size (e.g., genome before and after algorithmic compression)
Normalization: relative to the maximum value in the study sample
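
The operational proxy and normalization listed above could be sketched as below. The choice of zlib as the compressor, the function names, and the toy sequences are all assumptions made for illustration; the source does not prescribe a particular compression algorithm.

```python
import random
import zlib

def compression_savings(data: bytes) -> int:
    """Uncompressed size minus zlib-compressed size, in bytes (floored at 0)."""
    return max(len(data) - len(zlib.compress(data, 9)), 0)

def g_norm(sample):
    """Scale each sequence's savings by the largest savings in the sample."""
    raw = {name: compression_savings(seq) for name, seq in sample.items()}
    top = max(raw.values())
    return {name: (value / top if top else 0.0) for name, value in raw.items()}

rng = random.Random(0)  # fixed seed so the toy data is reproducible
sample = {
    "repeat": b"ACGT" * 1000,                                      # pure repetition
    "shuffled": bytes(rng.choice(b"ACGT") for _ in range(4000)),   # no pattern
}
print(g_norm(sample))
```

On this toy input the purely repetitive sequence yields the largest savings. The raw size difference therefore does not by itself separate trivial repetition from organized structure, and a real analysis would likely need a more careful approximation.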

Intuition: Shannon tells you how much total information is in the system. Gell-Mann tells you how much of that information is organized. Together, they distinguish a complex organism (high on both) from random noise (high Shannon, low Gell-Mann) and from a simple repetitive structure (low on both).

E — Espesset

E_norm — Morpho-Evolutionary Complexity

E = (1/K) · Σ_{k=1..K} ( Σ_j c_ij / N_k )

What it measures: How many evolutionary innovations has an organism accumulated over geological time? This component, proposed by David Espesset, repurposes phylogenetic character matrices — tables normally used to build evolutionary trees — as a tool for quantifying morphological and genetic novelty.

How it works: In a phylogenetic character matrix, each row represents a species and each column a trait. A "1" indicates the derived (evolutionarily novel) state; a "0" indicates the ancestral state. For each species, we sum the derived states and normalize by the number of traits in that matrix. This is then averaged across many different matrices — morphological, molecular, anatomical — to produce a robust mean score.

K — total number of phylogenetic matrices used
c_ij — value of the j-th character for species i (0 or 1) in a given matrix
N_k — number of characters in the k-th matrix (normalizing divisor)
Normalization: E_norm = E / E_max across the study sample
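
A minimal sketch of this calculation follows, assuming each matrix is stored as a mapping from species name to a list of 0/1 character states. The function names, the toy species, and the rule of skipping matrices that do not score a species are assumptions made here, not part of the source.

```python
def espesset_raw(matrices, species):
    """Mean fraction of derived (1) states for `species` across matrices.

    Each matrix is a dict mapping species name -> list of 0/1 character states.
    Assumption: matrices that do not score the species are skipped, so K is
    the number of matrices that include it.
    """
    fractions = [sum(m[species]) / len(m[species]) for m in matrices if species in m]
    if not fractions:
        raise ValueError(f"no matrix scores {species!r}")
    return sum(fractions) / len(fractions)

def espesset_norm(matrices, names):
    """E_norm = E / E_max across the listed species."""
    raw = {name: espesset_raw(matrices, name) for name in names}
    e_max = max(raw.values())
    return {name: (value / e_max if e_max else 0.0) for name, value in raw.items()}

# Toy matrices with made-up species and character states.
matrices = [
    {"sp_a": [1, 0, 1, 1], "sp_b": [0, 0, 1, 0]},
    {"sp_a": [1, 1], "sp_b": [0, 1]},
]
print(espesset_raw(matrices, "sp_a"))
print(espesset_norm(matrices, ["sp_a", "sp_b"]))
```

Dividing by N_k inside each matrix before averaging means a matrix with many characters does not outweigh a matrix with few.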

Why average across many matrices? Any single matrix reflects the researcher's choice of traits and can be biased. Averaging across hundreds of independently constructed matrices — morphological, molecular, developmental — dilutes individual biases and converges toward a more objective measure of accumulated evolutionary innovation.

K — Kriger

K_norm — Functional Complexity (System–Environment Interface)

K = L / L_max, where L ∈ {1, 2, ..., 10} and L_max = 10

What it measures: How sophisticated is the organism's relationship with its environment? The other three components measure what an organism is made of — its informational content, structural organization, and evolutionary novelty. The Kriger index measures what an organism does with its complexity: how it maintains its boundary with the environment, how it regulates its internal state, and how actively it reshapes the world around it.

How it works: Organisms are assigned to one of ten functional levels based on the sophistication of their system–environment interface. Each level adds a qualitatively new type of boundary between the organism and its world. Every level includes all preceding levels.

L	Name	What is new at the boundary	Example	K_norm
1	Membrane homeostasis	Lipid membrane as selective barrier. Passive and active transport maintain internal conditions.	Mycoplasma	0.10
2	Active motility	Organism selects its environment by directed movement (chemotaxis). Boundary is no longer static.	E. coli	0.20
3	Cell differentiation	Different cell types perform different functions at the boundary. Division of labour emerges.	Volvox, Anabaena	0.30
4	Tissue organisation	Differentiated tissues form multi-layered boundaries. Coordinated intercellular signalling.	Sponges, cnidarians	0.40
5	Organ systems	Specialised organs create nested homeostatic feedback loops. Multiple simultaneous regulatory channels.	Fish, insects	0.50
6	Sensory integration	Centralised nervous system integrates signals from multiple sensory modalities. Coordinated responses to complex stimuli.	Reptiles, amphibians	0.60
7	Cognitive modelling	Internal models of the environment enable anticipatory behaviour: learning, memory, problem-solving.	Crows, octopuses	0.70
8	Social cognition	The organism models other agents: their intentions, knowledge states, and likely actions (theory of mind).	Chimpanzees, dolphins	0.80
9	Symbolic communication	Internal models encoded in arbitrary symbols and transmitted through syntactic language. Knowledge transmissible across generations.	Early Homo	0.90
10	Cultural reflexivity	The individual is shaped by cumulative culture and consciously transforms it. Not merely transmitting culture — reflecting on, critiquing, and deliberately reconstructing it. Unique to modern humans.	Modern humans	1.00

L — functional level of the organism (1 to 10)
L_max = 10 (cultural reflexivity)
K_norm = L / 10, yielding values in the range [0.10, 1.00]
Fractional values are permitted for intermediate cases (e.g., L = 7.5 for a species with cognitive modelling and emerging social cognition)
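
The mapping from level to K_norm is simple enough to state as a short Python sketch. The function name `k_norm` and the range check are illustrative assumptions.

```python
L_MAX = 10  # cultural reflexivity, the top level of the scale

def k_norm(level):
    """K_norm = L / L_max; fractional levels are allowed for intermediate cases."""
    if not 1 <= level <= L_MAX:
        raise ValueError(f"level must lie between 1 and {L_MAX}")
    return level / L_MAX

print(k_norm(7))    # cognitive modelling
print(k_norm(7.5))  # cognitive modelling with emerging social cognition
```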

The organising principle: Each level adds a new type of boundary at the system–environment interface. Levels 1–2 are physical (membrane, motility). Levels 3–5 are structural (cell types, tissues, organs). Levels 6–7 are informational (sensory integration, predictive models). Levels 8–10 are social and symbolic (other minds, language, reflexive culture). A corvid is anatomically a “standard bird” yet scores Level 7 — the Kriger component captures what organisms do, not just what they have.

Why Four Dimensions?

Each component answers a fundamentally different question about the organism:

Component	Core Question	What It Captures
S (Shannon)	How diverse are its building blocks?	Informational variety
G (Gell-Mann)	How much of that information is organized?	Meaningful structure
E (Espesset)	How many evolutionary innovations has it accumulated?	Morpho-evolutionary novelty
K (Kriger)	What does it do with its complexity?	Functional organization

These four dimensions are partially independent. Two organisms can have the same overall GEKS score but for entirely different reasons. A bat and a crow, for example, may score identically — but the bat achieves this through morphological innovations (powered flight, echolocation), while the crow achieves it through cognitive sophistication (tool manufacture, causal reasoning). The GEKS Index captures this multidimensional profile, turning a single score into a window on qualitatively different kinds of complexity.

Choosing Weights

The weighting coefficients α, β, γ, δ must sum to 1 and are chosen based on the research question and data availability. The following presets are suggested starting points:

Scenario	α (S)	β (G)	γ (E)	δ (K)	Rationale
Balanced (default)	0.25	0.25	0.25	0.25	No dimension privileged; recommended when all data types are available
Genomic focus	0.35	0.35	0.10	0.20	Full genome available; information-theoretic measures most reliable
Paleontological focus	0.10	0.15	0.55	0.20	Fossil organisms; only morphological data available
Information-theoretic	0.30	0.40	0.10	0.20	Emphasis on structure vs. randomness distinction
Ecological / behavioral	0.15	0.15	0.20	0.50	Focus on cognitive ecology, niche construction, behavioral complexity

Researchers must always report their chosen weights. Sensitivity analysis — recalculating GEKS with different weight presets — is recommended to test the robustness of conclusions.
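
One way to run such a sensitivity analysis is sketched below in Python. The weight presets are copied from the table above, and the crow and bat component values are the hypothetical ones from the illustrative table later in this document. The function names are assumptions made for this sketch.

```python
# Weight presets (alpha, beta, gamma, delta) for S, G, E, K, from the table above.
PRESETS = {
    "Balanced": (0.25, 0.25, 0.25, 0.25),
    "Genomic focus": (0.35, 0.35, 0.10, 0.20),
    "Paleontological focus": (0.10, 0.15, 0.55, 0.20),
    "Information-theoretic": (0.30, 0.40, 0.10, 0.20),
    "Ecological / behavioral": (0.15, 0.15, 0.20, 0.50),
}

# Hypothetical (S_norm, G_norm, E_norm, K_norm) values, not empirical data.
ORGANISMS = {
    "Crow": (0.76, 0.72, 0.74, 0.80),
    "Bat": (0.80, 0.78, 0.85, 0.60),
}

def geks(components, weights):
    """Weighted sum of normalized components."""
    return sum(w * c for w, c in zip(weights, components))

def leaders(organisms, presets):
    """For each preset, return the organism with the highest GEKS score."""
    result = {}
    for preset, weights in presets.items():
        scores = {name: geks(c, weights) for name, c in organisms.items()}
        result[preset] = max(scores, key=scores.get)
    return result

for preset, name in leaders(ORGANISMS, PRESETS).items():
    print(f"{preset}: {name}")
```

With these hypothetical inputs the bat leads under the paleontological preset and the crow leads under the ecological preset, which is the kind of weight dependence a sensitivity analysis is meant to expose.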

Illustrative Example

The following table shows hypothetical GEKS values for a range of organisms, using balanced weights (α = β = γ = δ = 0.25). All values are illustrative, not empirical.

Organism	S_norm	G_norm	E_norm	K_norm	GEKS
E. coli	0.45	0.30	0.05	0.20	0.25
Earthworm	0.55	0.42	0.35	0.40	0.43
Cricket	0.62	0.55	0.52	0.60	0.57
Trout	0.68	0.60	0.58	0.60	0.62
Octopus	0.70	0.68	0.62	0.70	0.68
Pigeon	0.75	0.70	0.72	0.60	0.69
Crow	0.76	0.72	0.74	0.80	0.76
Bat	0.80	0.78	0.85	0.60	0.76
Chimpanzee	0.82	0.84	0.86	0.80	0.83
Human	0.82	0.85	0.88	1.00	0.89

Note the crow and bat: identical GEKS scores (0.76) achieved through different paths. The bat excels on Espesset (morphological innovations: powered flight, echolocation). The crow excels on Kriger (functional complexity: tool manufacture, causal reasoning, social cognition). The composite score is the same, but the complexity profiles are qualitatively different.

Properties

Strengths

Modularity. If one component is unavailable (e.g., no genomic data for a fossil), the formula reduces to three, two, or even one component with renormalized weights. The index degrades gracefully rather than becoming unusable.
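
A minimal sketch of this renormalization, assuming weights are kept in a dictionary keyed by component letter. The function name `renormalize` and the dictionary layout are illustrative choices, not part of the source.

```python
def renormalize(weights, available):
    """Drop weights for unavailable components and rescale the rest to sum to 1."""
    kept = {name: w for name, w in weights.items() if name in available}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no available component has a positive weight")
    return {name: w / total for name, w in kept.items()}

balanced = {"S": 0.25, "G": 0.25, "E": 0.25, "K": 0.25}

# A fossil with no genomic data: only the E and K components can be scored.
print(renormalize(balanced, {"E", "K"}))
```

Rescaling the remaining weights keeps the reduced index in [0, 1], although scores computed from different component subsets are not directly comparable.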

Multidimensionality. Four components cover informational (S), structural (G), evolutionary (E), and functional (K) aspects of complexity. Organisms with identical total scores may have entirely different complexity profiles — and this is informative, not a flaw.

Scalability. Each component can be computed at different levels of biological organization — from genome to cell to organism to ecosystem — allowing the index to be applied across scales.

Falsifiability. The index generates testable predictions. For example: on average, more phylogenetically derived lineages should exhibit higher GEKS scores. This prediction can be tested against empirical data.

Limitations

Weight selection is subjective. Different weight presets produce different rankings. This is mitigated by transparency (weights must be reported) and sensitivity analysis, but not eliminated entirely.

Normalization is partly sample-dependent. G_norm and E_norm are scaled to the maximum value in the study sample (S_norm and K_norm use the fixed maxima log₂(n) and L_max = 10), so GEKS values are meaningful only within a given comparison group, not as absolute measures.
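
The effect of sample-relative scaling can be seen in a small Python sketch. The raw scores and species names below are made up for illustration, and the function name is an assumption.

```python
def normalize_by_sample_max(raw):
    """Divide every raw score by the largest raw score in the sample."""
    top = max(raw.values())
    return {name: (value / top if top else 0.0) for name, value in raw.items()}

small_sample = {"sp_a": 0.2, "sp_b": 0.4}
large_sample = {**small_sample, "sp_c": 0.8}  # same species plus one newcomer

print(normalize_by_sample_max(small_sample)["sp_a"])
print(normalize_by_sample_max(large_sample)["sp_a"])
```

The raw score of sp_a is the same in both samples, yet its normalized value halves once a higher-scoring species is added, which is why scores from different comparison groups should not be mixed.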

Effective complexity is hard to compute. The Gell-Mann component requires approximation (e.g., via compression ratios), and different approximation methods may yield somewhat different results.

The Kriger scale is discrete. Ten levels inevitably simplify a continuous spectrum of cognitive and functional abilities. Fractional values help, but the boundaries between levels require careful, evidence-based justification.

This is not scalism. The GEKS Index does not rank organisms as "higher" or "lower," "superior" or "inferior." It quantifies accumulated innovations and functional organization along multiple independent axes. Two organisms with equal scores can be complex in entirely different ways, and a high score does not imply evolutionary "superiority."

Potential Applications

Comparative evolutionary biology. Compare rates of complexity increase across different lineages and geological periods. Which clades have complexified fastest? Has complexity evolution been constant or punctuated?

Paleobiology. Using the Espesset component with high weight, estimate relative complexity of fossil organisms from morphological data alone — the only data available for most of life's history.

Cognitive ecology. Using the Kriger component with high weight, investigate the relationship between functional complexity and ecological niche, social structure, brain size, or lifespan.

Astrobiology. If life is discovered elsewhere, the GEKS framework provides a ready-made scale for characterizing its complexity level — from single-celled homeostasis to (hypothetically) culture-forming cognition.

Punctuated equilibria. If complexity increases are concentrated at specific moments in geological time (as predicted by the theory of punctuated equilibria), this should be detectable as step-changes in GEKS scores across a phylogeny.
