Responsible AI Platform

Article 4 AI literacy evidence benchmark: a scorecard for organisations

TL;DR

An AI literacy certificate can be useful, but it is not a complete Article 4 evidence file. Strong evidence shows, per role, which AI systems people use, which risks matter, which training or guidance was completed, how knowledge was assessed and how management follows up. This scorecard gives eight criteria to assess your evidence level.

Article 4 of the EU AI Act has applied since 2 February 2025. Providers and deployers of AI systems must ensure a sufficient level of AI literacy for staff and other persons dealing with AI systems on their behalf. The European Commission explains that this should take into account technical knowledge, experience, education, training, the context in which the AI systems are used and the persons affected by those systems.

The practical question has changed. It is no longer "Do we need to do something about AI literacy?"; the answer to that is yes. The better question is:

How strong is our evidence that AI literacy has been organised in a role-based and risk-based way?

That is where a benchmark helps: not as a formal certification, but as a practical scorecard for leadership, compliance, HR, privacy, IT and AI governance.

Note: this scorecard is not a legal certification and does not guarantee compliance. It is a practical assessment framework to expose weak spots in your Article 4 evidence file.

What you need to evidence

Article 4 does not say that everyone must complete the same course. It also does not say that one online certificate is enough. The duty is contextual:

  • What is the person's role?
  • Which AI systems does that person use?
  • Which decisions or outputs are influenced?
  • Which risks matter for customers, citizens, candidates, workers or patients?
  • Which knowledge is needed to use the system responsibly?
  • How can the organisation later show that it took suitable measures?

That makes AI literacy a governance issue. Training is one measure. Evidence only becomes strong when training is connected to roles, systems, risks, assessment and follow-up.

See also the central guide to AI literacy, the practical page on Article 4 AI literacy evidence and the article on how to prove AI literacy to a supervisor.

The Article 4 evidence benchmark

Use the scorecard below with five levels, from 0 to 4:

  • 0 - Missing: no reliable evidence exists.
  • 1 - Ad hoc: something has been done, but not systematically.
  • 2 - Basic: the main elements exist, but they are not sufficiently role- or risk-based.
  • 3 - Strong: evidence is connected to roles, AI systems, risks and follow-up.
  • 4 - Audit-ready: evidence is current, repeatable, explainable and governed.

1. AI systems and use context

Question: do you know which AI systems staff use and in what context?

Score low when AI use is mostly informal: "we use Copilot", "marketing uses ChatGPT", "HR has a screening tool". Score high when there is an internal AI register with system name, purpose, owner, user group, risk category, vendor and relevant policy rules.

Strong evidence:

  • AI register or system inventory
  • Owner per system
  • Distinction between generic AI tools and process-critical AI
  • Link with risk, privacy and governance
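A register does not need special software to start. As a minimal sketch, assuming a simple Python record whose field names are our own (they mirror the attributes listed above, not any standard schema), the check below lists register entries that still have empty fields. Those empty fields are exactly the gaps a reviewer would notice first.

```python
from dataclasses import dataclass, field


# Hypothetical schema for one AI register entry; the field names mirror the
# attributes named in this article, not any standard or specific tool.
@dataclass
class AIRegisterEntry:
    system_name: str
    purpose: str
    owner: str           # accountable person or team
    user_group: str      # who works with the system
    risk_category: str   # e.g. "generic tool" or "process-critical"
    vendor: str
    policy_rules: list[str] = field(default_factory=list)


def missing_fields(entry: AIRegisterEntry) -> list[str]:
    """Return the names of empty fields, i.e. gaps in the register evidence."""
    return [name for name, value in vars(entry).items() if not value]


# Illustrative entries only.
register = [
    AIRegisterEntry("Copilot", "Drafting and summarising", "IT", "All staff",
                    "generic tool", "Microsoft", ["No confidential data"]),
    AIRegisterEntry("CV screening tool", "Pre-selecting candidates", "", "HR recruiters",
                    "process-critical", "", []),
]

for entry in register:
    gaps = missing_fields(entry)
    if gaps:
        print(f"{entry.system_name}: missing {', '.join(gaps)}")
```

The same check works on a spreadsheet export of the register; the point is that an incomplete entry is visible before anyone asks about it.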

2. Role matrix

Question: is it clear which knowledge level is needed per role?

A board member, HR recruiter, lawyer, developer and customer service employee do not need the same AI literacy level. A generic training for everyone is a start, but not mature evidence.

Strong evidence:

  • Role matrix per function group
  • Explanation of why that role needs that knowledge level
  • Specific attention for high-impact functions such as HR, legal, compliance, IT, leadership and customer contact
  • Separate route for people who assess AI output or prepare decisions
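A role matrix can also be kept as structured data, which makes missing explanations and the separate decision route easy to check. The sketch below assumes a hypothetical `RoleRequirement` structure; the roles, level labels and rationales are illustrative placeholders, not recommended levels.

```python
from typing import NamedTuple


class RoleRequirement(NamedTuple):
    level: str             # required AI literacy level (illustrative labels)
    rationale: str         # why this role needs that level
    high_impact: bool      # e.g. HR, legal, compliance, IT, leadership, customer contact
    assesses_output: bool  # assesses AI output or prepares decisions


# Hypothetical role matrix; roles, levels and rationales are placeholders.
ROLE_MATRIX = {
    "board member": RoleRequirement("strategic", "Sets AI strategy and accepts AI risk", True, False),
    "hr recruiter": RoleRequirement("advanced", "Relies on AI output when shortlisting candidates", True, True),
    "developer": RoleRequirement("technical", "Builds and integrates AI systems", True, False),
    "customer service": RoleRequirement("applied", "Drafts customer replies with generative AI", True, True),
    "general office staff": RoleRequirement("basic", "", False, False),
}


def roles_without_rationale(matrix: dict[str, RoleRequirement]) -> list[str]:
    """Roles whose required level is not yet explained (weak evidence)."""
    return [role for role, req in matrix.items() if not req.rationale.strip()]


def decision_route_roles(matrix: dict[str, RoleRequirement]) -> list[str]:
    """Roles that need the separate route for assessing output or preparing decisions."""
    return [role for role, req in matrix.items() if req.assesses_output]


print(roles_without_rationale(ROLE_MATRIX))
print(decision_route_roles(ROLE_MATRIX))
```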

3. Risk-based learning objectives

Question: are learning objectives linked to the risk of the AI use?

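Staff who only use generative AI for summaries need a different level from teams using AI in recruitment, credit, healthcare, education or public services. Article 4 asks for a sufficient level of AI literacy that takes the context of use into account, so learning objectives should follow the risk of that use.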

Strong evidence:

  • Learning objectives per risk or application category
  • Attention for bias, hallucinations, privacy, confidentiality, transparency and human control
  • Sector cases when the team works in HR, healthcare, finance, government or education
  • Update process when new AI tools are added

4. Training records

Question: can you show who completed what, and when?

Many organisations have delivered presentations, but cannot later show who attended, what was covered or whether the right roles were reached. That is weak evidence.

Strong evidence:

  • Employee, role, department and date
  • Module, topic or session
  • Trainer or source
  • Version of material
  • Renewal date or validity
  • Export for compliance, HR or audit

You can start with the AI Training Records template or the Article 4 Evidence Dossier Checklist.
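If records are kept in code or exported from an HR system, the fields listed above map naturally onto one record type. A minimal sketch, assuming a hypothetical `TrainingRecord` type and illustrative data, using Python's standard `csv` module for the audit export:

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date


# Hypothetical record type; one field per item in the evidence list above.
@dataclass
class TrainingRecord:
    employee: str
    role: str
    department: str
    completed_on: date
    module: str             # module, topic or session
    trainer_or_source: str
    material_version: str
    valid_until: date       # renewal date


def export_csv(records: list[TrainingRecord]) -> str:
    """Serialise training records to CSV text for compliance, HR or audit."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(TrainingRecord)])
    writer.writeheader()
    for record in records:
        # The csv module writes non-string values with str(), so dates become ISO strings.
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Illustrative record only.
records = [
    TrainingRecord("J. Jansen", "HR recruiter", "HR", date(2025, 3, 10),
                   "AI in recruitment", "Internal L&D", "v1.2", date(2026, 3, 10)),
]
csv_text = export_csv(records)
print(csv_text)
```

Keeping the material version and validity date in every row is what later makes refresh checks and audits possible without reconstructing history.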

5. Assessment and score

Question: do you know whether people actually understand the essentials?

Attendance is weaker evidence than assessment. A short quiz, scenario exercise or practical case shows more reliably whether staff can assess AI output critically.

Strong evidence:

  • Baseline assessment
  • Score per person or team
  • Retake or follow-up for low scores
  • Role-specific scenarios
  • Team reporting for management

Start with the AI literacy test. For team level, a structured assessment route is more practical.
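Team-level reporting and follow-up for low scores come down to simple arithmetic. The sketch below assumes illustrative results on a 0-100 scale and a hypothetical pass mark of 70; the threshold your organisation uses is a policy choice, not something Article 4 prescribes.

```python
from statistics import mean

PASS_MARK = 70  # hypothetical threshold in percent; set by your own policy

# Illustrative results: (person, team, score out of 100).
results = [
    ("anna", "HR", 85),
    ("bram", "HR", 55),
    ("chris", "Marketing", 72),
    ("dina", "Marketing", 64),
]


def needs_follow_up(results, pass_mark=PASS_MARK):
    """People scoring below the pass mark get a retake or follow-up."""
    return [person for person, _team, score in results if score < pass_mark]


def team_averages(results):
    """Average score per team, for management reporting."""
    by_team = {}
    for _person, team, score in results:
        by_team.setdefault(team, []).append(score)
    return {team: mean(scores) for team, scores in by_team.items()}


print(needs_follow_up(results))
print(team_averages(results))
```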

6. Policy connection

Question: does AI literacy connect to policy, register and governance?

Training without policy remains fragile. Staff need to know which tools are allowed, which data may not be entered, when human review is required and where incidents or doubts should be reported.

Strong evidence:

  • AI policy or AI use guideline
  • Link with AI register
  • Link with privacy, information security and procurement
  • Reporting route for incidents and doubts
  • Periodic review by governance or compliance

7. Management reporting

Question: can leadership see where the gaps are?

Article 4 is not just an HR action. It touches risk management, governance and supervision. Management should be able to see which teams lag behind and where additional measures are needed.

Strong evidence:

  • Coverage per department or role
  • Average score per team
  • Open actions
  • High-risk roles shown separately
  • Periodic reporting to leadership, risk committee or AI governance board
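Coverage per department is the share of staff with a training record. A minimal sketch, assuming hypothetical staff lists, a set of trained employees and a hypothetical set of high-risk departments, that also lists open actions and shows high-risk departments first:

```python
# Illustrative inputs: staff per department and who has a valid training record.
staff = {
    "Marketing": ["eva", "finn"],
    "HR": ["anna", "bram", "carla", "daan"],
}
trained = {"anna", "bram", "carla", "eva", "finn"}
HIGH_RISK_DEPARTMENTS = {"HR"}  # hypothetical classification


def coverage_report(staff, trained, high_risk):
    report = []
    for dept, people in staff.items():
        done = sum(1 for person in people if person in trained)
        report.append({
            "department": dept,
            "coverage_pct": round(100 * done / len(people)),
            "open_actions": [person for person in people if person not in trained],
            "high_risk": dept in high_risk,
        })
    # Stable sort: high-risk departments first, so they are reported separately.
    return sorted(report, key=lambda row: not row["high_risk"])


report = coverage_report(staff, trained, HIGH_RISK_DEPARTMENTS)
for row in report:
    print(row)
```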

8. Updating

Question: does the evidence stay current when AI use changes?

AI literacy ages quickly. New tools, new workflows and new regulatory guidance can change the required knowledge level.

Strong evidence:

  • Annual or semi-annual review
  • New training when new AI systems are introduced
  • Update after incidents or policy changes
  • Version control of material
  • Refresh cycle for critical roles
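Refresh triggers can be checked automatically when each record carries a validity date and a material version. The sketch below assumes a hypothetical table of current material versions and illustrative records; a record is flagged when it has expired or when it was completed on an older version of the material.

```python
from datetime import date

# Hypothetical current version per module, maintained under version control.
CURRENT_VERSION = {"AI in recruitment": "v2.0", "Generative AI basics": "v1.1"}

# Illustrative records: (employee, module, material_version, valid_until).
records = [
    ("anna", "AI in recruitment", "v1.2", date(2026, 3, 10)),
    ("bram", "Generative AI basics", "v1.1", date(2025, 1, 31)),
    ("eva", "Generative AI basics", "v1.1", date(2026, 6, 30)),
]


def refresh_needed(records, today):
    """Return (employee, module, reasons) for every record that needs a refresh."""
    due = []
    for employee, module, version, valid_until in records:
        reasons = []
        if valid_until < today:
            reasons.append("expired")
        # Modules missing from CURRENT_VERSION are not flagged as outdated.
        if version != CURRENT_VERSION.get(module, version):
            reasons.append("outdated material")
        if reasons:
            due.append((employee, module, reasons))
    return due


for item in refresh_needed(records, today=date(2025, 6, 1)):
    print(item)
```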

How to interpret your score

Add up the scores for the eight areas. With a maximum of 4 points per area, the maximum total is 32.

  • 0-8 (Vulnerable): There is little evidence. Start with inventory, role matrix and training records.
  • 9-16 (Basic): Measures exist, but the file is not yet easy to explain.
  • 17-24 (Defensible): The core is in place. Strengthen assessment, management reporting and updates.
  • 25-32 (Strong): Evidence is role-based, risk-based and useful for governance.

The key mistake is to look only at the total score. An organisation can score well on training records but poorly on the role matrix. Or it can have many certificates but no connection to the AI register. Those gaps determine how credible the evidence file is.
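The total and the warning about individual gaps can be combined in one check. A minimal sketch, assuming that a score of 0 (Missing) or 1 (Ad hoc) counts as a weak spot (our own threshold, not part of the benchmark), that adds up the eight criteria, maps the total to the score bands and lists weak criteria separately:

```python
CRITERIA = [
    "AI systems and use context",
    "Role matrix",
    "Risk-based learning objectives",
    "Training records",
    "Assessment and score",
    "Policy connection",
    "Management reporting",
    "Updating",
]

# Upper bound of each score band and its label.
BANDS = [(8, "Vulnerable"), (16, "Basic"), (24, "Defensible"), (32, "Strong")]

WEAK_THRESHOLD = 1  # hypothetical: 0 (Missing) and 1 (Ad hoc) count as weak


def interpret(scores: dict[str, int]) -> tuple[int, str, list[str]]:
    if set(scores) != set(CRITERIA):
        raise ValueError("score all eight criteria")
    if any(not 0 <= s <= 4 for s in scores.values()):
        raise ValueError("each criterion is scored from 0 to 4")
    total = sum(scores.values())
    band = next(label for upper, label in BANDS if total <= upper)
    # Weak criteria are listed regardless of the total.
    weak = [c for c in CRITERIA if scores[c] <= WEAK_THRESHOLD]
    return total, band, weak


# Illustrative self-assessment: strong training records, weak role matrix.
example = dict(zip(CRITERIA, [2, 1, 2, 4, 1, 3, 2, 2]))
print(interpret(example))
```

In this example the total lands in the Defensible band, yet the role matrix and assessment still stand out as the weak spots that determine how credible the file is.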

What a supervisor will probably want to understand

A supervisor will not only ask whether "a training" happened. The logical questions are more concrete:

  • Which AI systems are used?
  • Who works with them?
  • Which knowledge do those people need?
  • How was that determined?
  • Which measures were taken?
  • How were participation and assessment recorded?
  • What does the organisation do with low scores or missing attendance?
  • How is everything kept current?

A strong Article 4 file answers those questions quickly.

What to do now

Start small, but make the evidence solid from the beginning:

  1. List AI systems and generic AI tools.
  2. Build a role matrix for the most important user groups.
  3. Run a baseline assessment.
  4. Record training, score and follow-up per role.
  5. Report team gaps to management.
  6. Connect this to your AI register, AI policy and governance meeting.

For individual orientation, start with the AI literacy test. For teams, the logical next step is a route where assessment, learning path, certificate, training records and reporting stay together.

Next step: use LearnWize when you want team gaps, role-based learning paths, certificates and evidence records in one place: start the AI literacy scan. Use Embed AI when you also need governance, AI register, policy and evidence-file implementation: view the Article 4 Evidence Sprint.
