1.165 Stroke Order & Writing (CJK)#

SVG stroke order data, animated dictionaries, and stroke count databases for Chinese, Japanese, and Korean character learning


Explainer

Stroke Order & Writing: Domain Explainer#

Research ID: research-k6iy Date: 2026-01-29 Audience: Technical decision makers, product managers, architects without CJK language expertise


What This Solves#

The Problem: Chinese, Japanese, and Korean characters are made up of multiple strokes (individual pen movements), and there’s a specific sequence that experienced writers follow. Without guidance, learners often develop inconsistent or inefficient habits that are difficult to correct later.

Who Encounters This:

  • Educational platforms teaching CJK languages
  • Language learning apps adding writing practice features
  • Digital textbooks and reference materials
  • Handwriting recognition systems that need stroke order data
  • Calligraphy training applications

Why It Matters: Learning correct stroke order isn’t just about aesthetics. It affects:

  • Writing speed - The standard sequence is optimized for flow
  • Character recognition - Handwritten characters become more legible
  • Memory retention - The kinesthetic pattern helps learners remember characters
  • Cultural authenticity - Shows respect for the writing tradition

Getting the sequence wrong doesn’t prevent others from reading your writing (like spelling mistakes might), but learning the standard sequence from the start is far easier than breaking bad habits later.


Accessible Analogies#

Dance Choreography#

Learning to write complex characters is like learning dance choreography. There’s a specific sequence that experienced practitioners follow - you could technically reach the same final position by moving randomly, but:

  • The standard sequence flows naturally
  • It’s easier to perform at speed once memorized
  • Everyone learning the same sequence can practice together
  • Teachers can spot when you’re doing it wrong and correct you early

Just as dancers learn “step-ball-change” as a unit, Chinese writers learn common stroke patterns like “horizontal-then-vertical” or “left-falling-then-right-falling.”

Following a Recipe#

Think of stroke order like following a recipe’s sequence: you could add ingredients in any order and still end up with something edible, but the standard order exists because it:

  • Makes the process more efficient
  • Produces more consistent results
  • Makes it easier to troubleshoot when something goes wrong
  • Allows you to learn transferable techniques

Just as you learn “mise en place” (prep everything first) in cooking, you learn “outside-then-inside” in character writing.

Assembly Instructions#

Like assembling furniture, there’s often a “right” order that makes the process easier. You could attach parts in any sequence, but the manual’s sequence:

  • Prevents you from getting stuck (painted into a corner)
  • Makes the structure stable during assembly
  • Follows a logical progression that experienced builders recognize

Stroke order follows similar principles - certain sequences make the character easier to balance while writing and create natural momentum for the next stroke.


When You Need This#

You NEED stroke order data when:#

Building Educational Features:

  • Adding writing practice to a language learning app
  • Creating interactive worksheets or exercises
  • Building a handwriting evaluation system
  • Designing a character lookup tool for learners

Interactive Content:

  • Animated character demonstrations in digital textbooks
  • Step-by-step writing tutorials
  • Gamified learning experiences with stroke-by-stroke feedback
  • Calligraphy practice applications

Assessment Tools:

  • Evaluating whether learners are writing correctly
  • Providing real-time feedback during practice
  • Generating progress reports on writing accuracy

You DON’T need stroke order data when:#

  • Display-only applications (dictionaries that just show finished characters)
  • Reading comprehension tools (no writing practice involved)
  • Typography/font rendering (fonts handle display automatically)
  • Speech-focused learning (listening and speaking only)
  • Input methods (typing Chinese on a keyboard)

Decision Criteria:#

Ask yourself: “Will users be learning to write or evaluating their writing?” If yes, you need stroke order data. If users only need to recognize or type characters, you probably don’t.


Trade-offs#

Data Source Complexity Spectrum#

The options below run from lowest integration complexity to highest flexibility:

Option 1: Pre-built Library (Hanzi Writer)

  • ✅ Fastest integration (< 1 day)
  • ✅ Battle-tested and actively maintained
  • ✅ Handles animation and rendering automatically
  • ❌ Less control over appearance and behavior
  • ❌ Web-focused (mobile requires additional work)
  • Use when: You want to ship quickly with standard features

Option 2: Raw SVG Data (Make Me a Hanzi, KanjiVG)

  • ✅ Complete control over rendering and animation
  • ✅ Use on any platform (web, mobile, desktop)
  • ✅ Customize appearance and interaction patterns
  • ❌ Requires building animation system yourself
  • ❌ More initial development time (1-2 weeks)
  • Use when: You need custom behavior or non-web platforms

Option 3: Stroke Count/Metadata Only (CCDB API, Unihan)

  • ✅ Lightweight data (just numbers and metadata)
  • ✅ Fast lookups for reference purposes
  • ✅ Minimal integration effort
  • ❌ No visual demonstration of sequence
  • ❌ Cannot show animated writing
  • Use when: You only need stroke counts for sorting/searching

Build vs. Integrate#

Integrate Existing Data (Recommended):

  • Pros: Data already verified by language experts, covers thousands of characters, maintained by community
  • Cons: Must accept existing coverage (some rare characters missing)
  • Timeline: MVP in < 1 week

Build Your Own Dataset:

  • Pros: Complete control over coverage and accuracy
  • Cons: Requires language expertise, time-intensive (months), error-prone without validation
  • Timeline: 3-6 months for basic coverage
  • Reality check: Only consider this if you have native-level expertise in the target language AND need characters not covered by existing datasets

Verdict: Unless you’re a language institute with specific research needs, integrate existing open-source data. The community datasets are production-ready and cover 99%+ of use cases.


Implementation Reality#

Realistic Timeline Expectations#

Week 1: Research and Setup

  • Evaluate data sources for your needs
  • Verify licensing compatibility with your project
  • Set up development environment
  • Download and test datasets locally

Weeks 2-3: Core Development

  • Integrate stroke order library or build rendering system
  • Create basic practice interface
  • Implement character lookup and display
  • Build minimal progress tracking

Weeks 4-6: Polish and Testing

  • Add animations and visual feedback
  • Test across devices and browsers
  • Create learning content and exercises
  • Beta test with real learners

Reality Check: Getting a basic demo working takes days. Building a polished, learner-friendly experience takes weeks. Creating comprehensive curriculum content (selecting which characters to teach, in what order, with what exercises) takes months.

Team Skill Requirements#

Minimum Team:

  • 1 frontend developer (React/Vue/Angular or mobile native)
  • 1 backend developer (if building API, otherwise optional)
  • 1 content creator/language expert (part-time) for curriculum

Skills Needed:

  • Frontend: Working with SVG, animations (CSS/Canvas/WebGL)
  • Backend: REST APIs, database queries (if building lookup service)
  • Language: Understanding of target language writing system (or access to expert reviewer)

Can One Person Do This?: Yes, for an MVP. A full-stack developer with basic knowledge of the target language can build a functional prototype. However, quality content creation and cultural accuracy require native-level expertise.

Common Pitfalls#

Underestimating Mobile Performance:

  • SVG animations can be sluggish on older devices
  • Solution: Test early on low-end hardware, optimize rendering, consider Canvas instead of SVG for complex animations

Assuming All Characters Are Available:

  • Even comprehensive datasets have gaps (rare variants, historical forms)
  • Solution: Check coverage for YOUR specific character set early, have a fallback display for missing data

Ignoring Regional Variations:

  • Simplified vs. Traditional Chinese have different forms
  • Japanese kanji may differ from Chinese equivalents
  • Solution: Clearly define your target writing system upfront

Overlooking Licensing:

  • Some datasets have share-alike requirements (CC BY-SA)
  • Solution: Review licenses in Phase 1, ensure compliance with attribution requirements

First 90 Days: What to Expect#

Days 1-30: Building

  • You’ll have a working prototype that can display and animate characters
  • Expect excitement as you see characters come to life
  • Also expect frustration with edge cases and rendering quirks

Days 31-60: Testing

  • Beta testers will find bugs you never imagined
  • You’ll realize content creation (writing exercises, learning paths) is more work than the tech
  • Performance issues on real-world devices will surface

Days 61-90: Refining

  • You’ll iterate based on user feedback
  • The tech will feel stable, but content creation will feel endless
  • Marketing and user acquisition will become the bottleneck

Key Insight: The technical challenge of stroke order visualization is solved (libraries exist, data is available, integration is straightforward). The real work is creating engaging educational content and building a user base. Budget your time accordingly - 20% tech, 80% content and marketing.


Summary for Decision Makers#

The Data Exists: Open-source stroke order datasets cover 9,000+ Chinese characters and offer full coverage of standard Japanese kanji, with open licenses suitable for commercial use (check attribution and share-alike terms per dataset).

The Tools Are Ready: Libraries like Hanzi Writer make integration straightforward for web applications. Raw SVG data is available for custom implementations.

The Challenge Is Execution: Technology is not the bottleneck. Success depends on:

  • Creating effective learning content
  • Designing an engaging user experience
  • Acquiring and retaining learners

Time to First Demo: < 1 week for basic web implementation using existing libraries.

Time to Production: 6-8 weeks for a polished MVP with core features and initial content.

Skills Required: Frontend developer + language expert (or consultant) for content validation.

Cost: Primarily developer time and content creation - all stroke order data is free and open-source. No API costs or licensing fees for the core data.


Document Status: Complete Related Documents: See 01-discovery/ for detailed technical resources and implementation guides

S1: Rapid Discovery

Stroke Order Resources: Quick Reference#

Research ID: research-k6iy Date: 2026-01-29 Pass: S1 (Rapid Discovery)

TL;DR#

Need stroke order data for CJK characters? Start here:

| Resource | Language | Coverage | Best For |
| --- | --- | --- | --- |
| Hanzi Writer | Chinese | 9,000+ | Web apps (easiest) |
| Make Me a Hanzi | Chinese | 9,000+ | Custom implementations |
| KanjiVG | Japanese | Kanji | Production-ready SVGs |
| animCJK | CJK (all) | 7,672+ | Multi-language apps |
| CCDB API | Chinese | 20,902 | Stroke count lookups |

Quick Start#

Web App (5 minutes):

import HanziWriter from 'hanzi-writer';
const writer = HanziWriter.create('div-id', '你', {
  width: 100, height: 100
});
writer.animateCharacter();

Stroke Count Lookup:

  • API: http://ccdb.hemiola.com/characters/unicode/{codepoint}
  • Python: pip install cjklib
  • Database: ChineseStrokes (81,000 characters)
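The CCDB endpoint above is keyed by the character's Unicode codepoint in hexadecimal. A minimal sketch of building the lookup URL (the URL pattern is taken from the endpoint above; whether the API expects upper- or lowercase hex is worth verifying against the live service):

```javascript
// Build a CCDB lookup URL from a single CJK character.
// The API is keyed by the character's Unicode codepoint in hex.
function ccdbUrl(character) {
  const codepoint = character.codePointAt(0).toString(16);
  return `http://ccdb.hemiola.com/characters/unicode/${codepoint}`;
}

// Example: '你' is U+4F60
console.log(ccdbUrl('你')); // http://ccdb.hemiola.com/characters/unicode/4f60
```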

Licensing Quick Check#

| Resource | License | Commercial OK? |
| --- | --- | --- |
| Hanzi Writer | MIT | ✅ Yes |
| KanjiVG | CC BY-SA 3.0 | ✅ Yes (attribution + share-alike) |
| Make Me a Hanzi | Mixed | ⚠️ Check repo |
| animCJK | Open-source | ⚠️ Verify license |
| cjklib | LGPL | ✅ Yes |

Next Steps#

  1. For web apps: Start with Hanzi Writer (easiest integration)
  2. For custom needs: Use Make Me a Hanzi or KanjiVG SVGs directly
  3. For stroke counts: CCDB API or cjklib
  4. For deep dive: See S2-comprehensive for full catalog

Key Files Location#

  • S2-comprehensive/: Full catalog of all data sources
  • S3-need-driven/: Implementation guides and use cases
  • S4-strategic/: Implementation roadmap and metrics

S2: Comprehensive

Stroke Order Data Sources: Comprehensive Catalog#

Research ID: research-k6iy Date: 2026-01-29 Pass: S2 (Comprehensive Coverage) Purpose: Complete catalog of SVG stroke order data, stroke count databases, and animated dictionary resources for CJK characters


1. SVG Stroke Order Data Sources#

1.1 Make Me a Hanzi (Chinese Characters)#

Repository: skishore/makemeahanzi Website: makemeahanzi Coverage: 9,000+ most common simplified and traditional Chinese characters License: Multiple (see repository for details)

Key Features:

  • Stroke-order vector graphics for all characters
  • Dictionary data (definitions, pinyin)
  • Graphical data (stroke decomposition)
  • Experimental animated SVGs (svgs.tar.gz)
  • SVGs named by Unicode codepoint

Data Format:

  • dictionary.txt - character definitions, pronunciations
  • graphics.txt - stroke order and decomposition data
  • svgs.tar.gz - pre-rendered animated SVG files

Use Cases:

  • Foundation for building stroke order animation systems
  • Reference for stroke decomposition algorithms
  • Educational apps requiring accurate stroke order
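The graphics.txt file is newline-delimited JSON: one entry per character with its SVG stroke paths and median point sequences. A parsing sketch (the field names "character", "strokes", and "medians" follow the repository's documented format, but verify against the version you download):

```javascript
// Parse one line of Make Me a Hanzi's graphics.txt (newline-delimited JSON).
// Field names follow the repo's README; check them against your download.
function parseGraphicsLine(line) {
  const entry = JSON.parse(line);
  return {
    character: entry.character,
    strokeCount: entry.strokes.length, // one SVG path string per stroke
    medians: entry.medians             // point sequences tracing each stroke
  };
}

// A shortened sample in the same shape as a real entry:
const sampleLine =
  '{"character":"二","strokes":["M ...","M ..."],"medians":[[[1,2],[3,4]],[[5,6],[7,8]]]}';
const parsed = parseGraphicsLine(sampleLine);
console.log(parsed.character, parsed.strokeCount); // 二 2
```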

1.2 KanjiVG (Japanese Kanji)#

Repository: KanjiVG/kanjivg Website: kanjivg.tagaini.net Coverage: Japanese kanji characters License: Creative Commons Attribution-Share Alike 3.0

Key Features:

  • SVG file for each character with stroke shape and direction
  • Stroke order information
  • Component metadata (radicals, stroke types)
  • Variant forms included
  • Widely adopted (used by Duolingo, many dictionary sites)

Distribution:

  • Zip file with all non-variant SVG files
  • Individual files in repository
  • Vector graphics suitable for scaling

Notable Users:

  • Duolingo (language learning platform)
  • Multiple Japanese dictionary websites
  • Educational apps for kanji learning

1.3 HanziVG (Chinese Hanzi)#

Repository: Connum/hanzivg Goal: Become for Chinese what KanjiVG is for Japanese Coverage: Traditional and Simplified Chinese characters

Key Features:

  • SVG stroke order files with metadata
  • Radical information
  • Character component decomposition
  • Modeled after KanjiVG structure

Status: Active development, growing coverage


1.4 animCJK (Multi-Language)#

Repository: parsimonhi/animCJK Coverage: Chinese, Japanese (Kanji + Kana), Korean (Hanja) Total Characters: 7,672+ in Chinese simplified folder

Key Features:

  • Animated stroke order using SVG
  • Free and open-source
  • Multi-language support (CJK)
  • Organized by language:
    • svgsZhHans/ - Simplified Chinese (7,000 common + uncommon)
    • Traditional Chinese variants
    • Japanese Kanji and Kana
    • Korean Hanja
    • Basic strokes and components

Use Cases:

  • Universal CJK character applications
  • Cross-language learning platforms
  • Comparative stroke order analysis

1.5 Hanzi Writer (JavaScript Library + Data)#

Repository: chanind/hanzi-writer Website: hanziwriter.org Data Explorer: chanind.github.io/hanzi-writer-data Type: JavaScript library with accompanying SVG data

Key Features:

  • Free and open-source library for stroke order animations
  • Based on Make Me a Hanzi data
  • HTML5 + SVG rendering
  • Stroke order practice quizzes
  • Embeddable in web applications
  • Character data in separate repository

Technical Stack:

  • JavaScript/TypeScript
  • SVG rendering
  • No backend required

Use Cases:

  • Web-based character writing practice
  • Interactive quizzes
  • Browser-based learning applications

2. Online Animated Dictionaries#

2.1 strokeorder.info#

URL: strokeorder.info Format: Animated GIFs Coverage: 4,000+ characters

Features:

  • Pre-rendered animated GIFs
  • Instant playback (no JavaScript required)
  • Easy to embed in static sites

2.2 strokeorder.com#

URL: strokeorder.com

Features:

  • Type-to-animate interface
  • Automatic playback on character entry
  • Interactive stroke order display

2.3 Chinese Character Web API#

URL: ccdb.hemiola.com Type: RESTful API Data Source: Unihan Database (MySQL + PHP)

Key Features:

  • 20,902 characters (CJK Unified Ideographs range)
  • Stroke count information
  • Radical lookup (kRSKangXi field)
  • Programmatic access

Use Cases:

  • Backend for dictionary apps
  • Automated stroke count lookup
  • Character metadata retrieval

3. Stroke Count Databases#

3.1 Chinese Character Stroke Count Resources#

GitHub Repository: caiguanhao/ChineseStrokes Coverage: 81,000+ Chinese characters Purpose: Sort characters by stroke count

Key Features:

  • Comprehensive stroke count data
  • Suitable for dictionary lookup systems
  • Enables stroke-based search

Use Cases:

  • Implement radical/stroke lookup in dictionaries
  • Sort characters by complexity
  • Character learning progression systems

3.2 Unihan Database (kTotalStrokes)#

Source: Unicode Consortium Coverage: 101,996 CJK unified ideographs (as of Unicode 17.0) Field: kTotalStrokes

Note: Some errors exist in the data. Cross-reference recommended.

Access Methods:

  • Direct download from Unicode.org
  • Via libraries (cjklib, Python)
  • Through APIs (CCDB)
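Unihan ships as tab-separated text, with kTotalStrokes entries shaped like `U+4E00<TAB>kTotalStrokes<TAB>1`. A parsing sketch (the field can carry more than one value for locale-dependent counts; this sketch keeps only the first, which is a simplification):

```javascript
// Parse kTotalStrokes entries from a Unihan data file (tab-separated).
// Lines look like "U+4E00\tkTotalStrokes\t1"; comment lines start with "#".
// Multi-value entries (locale-dependent counts) are reduced to the first value.
function parseTotalStrokes(text) {
  const counts = new Map();
  for (const line of text.split('\n')) {
    if (line.startsWith('#') || line.trim() === '') continue;
    const [codepoint, field, value] = line.split('\t');
    if (field !== 'kTotalStrokes') continue;
    const char = String.fromCodePoint(parseInt(codepoint.slice(2), 16));
    counts.set(char, parseInt(value.split(' ')[0], 10));
  }
  return counts;
}

const unihanSample = '# comment\nU+4E00\tkTotalStrokes\t1\nU+4E8C\tkTotalStrokes\t2\n';
const strokeMap = parseTotalStrokes(unihanSample);
console.log(strokeMap.get('一'), strokeMap.get('二')); // 1 2
```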

3.3 cjklib (Python Library)#

PyPI: cjklib Documentation: cjklib.readthedocs.io

Key Features:

  • Language routines for Han characters (Chinese, Japanese, Korean, Vietnamese)
  • Character pronunciations
  • Radical information
  • Glyph component analysis
  • Stroke decomposition
  • Variant information
  • Locale-aware stroke counts (simplified vs. traditional)

Important: Stroke counts can vary by locale (traditional vs. simplified Chinese)

Use Cases:

  • Building Python-based dictionary tools
  • Linguistic analysis
  • Character decomposition systems

3.4 KRADFILE/RADKFILE (Kanji Radical Decomposition)#

Maintainer: Electronic Dictionary Research and Development Group (EDRDG) Website: edrdg.org/krad/kradinf.html License: EDRDG License Coverage: 6,355+ kanji (JIS X 0208-1997) + 5,801 (JIS X 0212)

Key Features:

  • Kanji decomposition into visual elements/radicals
  • Enables radical-based lookup
  • KRADFILE: Kanji → Radicals mapping
  • RADKFILE: Radicals → Kanji mapping (inverted, used by lookup software)

Historical Context:

  • Initial work by Michael Raine (1994/1995)
  • Revised by Jim Breen (1995)
  • Extended by Jim Rose (2007)

Use Cases:

  • Implement radical-based kanji search
  • Component-based learning systems
  • Dictionary lookup by visual elements
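RADKFILE is essentially KRADFILE inverted, so lookup software can go from a selected radical to candidate kanji. A sketch of building that inverted index (the object literal is illustrative data, not the actual file syntax):

```javascript
// Build a RADKFILE-style inverted index (radical -> kanji) from
// KRADFILE-style data (kanji -> component radicals).
function invertKradfile(kanjiToRadicals) {
  const radicalToKanji = new Map();
  for (const [kanji, radicals] of Object.entries(kanjiToRadicals)) {
    for (const radical of radicals) {
      if (!radicalToKanji.has(radical)) radicalToKanji.set(radical, []);
      radicalToKanji.get(radical).push(kanji);
    }
  }
  return radicalToKanji;
}

// Illustrative decompositions (not the real file format):
const krad = { '汰': ['氵', '大'], '汲': ['氵', '及'], '休': ['亻', '木'] };
const radk = invertKradfile(krad);
console.log(radk.get('氵')); // [ '汰', '汲' ]
```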

4. Reference Data#

4.1 Frequency and Stroke Count Tables#

Resource: technology.chtsai.org/charfreq

Available Data:

  • Characters sorted by frequency
  • Stroke counts for common characters
  • Statistical analysis

4.2 Wiktionary Appendix#

Resource: Wiktionary - Chinese total strokes

Features:

  • Community-maintained stroke count data
  • Free to use
  • Multiple character variants



Document Status: Complete Last Updated: 2026-01-29

S3: Need-Driven

Stroke Order Implementation Guide#

Research ID: research-k6iy Date: 2026-01-29 Pass: S3 (Need-Driven Application) Purpose: Practical guidance for implementing stroke order features in educational platforms


1. Implementation Considerations#

1.1 Licensing#

Open Licenses:

  • Make Me a Hanzi: Mixed licenses (check repository)
  • KanjiVG: CC BY-SA 3.0 (attribution + share-alike)
  • animCJK: Open-source (verify specific license)
  • KRADFILE: EDRDG License (check restrictions)

Action Items:

  • Review license terms before commercial use
  • Provide proper attribution
  • Comply with share-alike requirements where applicable

1.2 Data Formats#

SVG (Recommended for stroke order):

  • Scalable without quality loss
  • Embeddable in web/mobile apps
  • Supports animation paths
  • Lightweight

JSON (Recommended for metadata):

  • Easy to parse
  • Works with all modern platforms
  • Suitable for APIs

GIF (Legacy, limited use):

  • Pre-rendered animations
  • No customization
  • Larger file sizes

1.3 Technical Integration#

For Web Applications:

// Example: Hanzi Writer
import HanziWriter from 'hanzi-writer';

const writer = HanziWriter.create('character-target-div', '你', {
  width: 100,
  height: 100,
  padding: 5
});

writer.animateCharacter();

For Mobile Applications:

  • Embed SVG files directly
  • Use native SVG rendering libraries
  • Pre-cache common characters for offline use

For Backend Systems:

  • cjklib (Python) for character analysis
  • Chinese Character Web API for stroke counts
  • PostgreSQL with Unihan data for lookups

1.4 Performance Optimization#

Strategies:

  1. Lazy Loading: Load stroke data only when character is displayed
  2. Caching: Pre-cache common characters (top 3,000)
  3. CDN: Serve SVG files from CDN for faster delivery
  4. Progressive Enhancement: Show static character first, animate on interaction
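Lazy loading and caching combine naturally into a memoized loader. A sketch, where `fetchStrokeData` is a placeholder for whatever actually retrieves the SVG/JSON (CDN fetch, bundled file), not a real API:

```javascript
// Lazy-load stroke data on first use and cache it in memory afterwards.
// `fetchStrokeData` is a stand-in for the real retrieval function.
function makeStrokeDataLoader(fetchStrokeData) {
  const cache = new Map();
  return async function load(character) {
    if (!cache.has(character)) {
      cache.set(character, await fetchStrokeData(character));
    }
    return cache.get(character);
  };
}

// Usage: repeated lookups hit the underlying fetch only once per character.
let fetches = 0;
const load = makeStrokeDataLoader(async (ch) => { fetches++; return `data:${ch}`; });
load('你').then(() => load('你')).then((data) => {
  console.log(data, fetches); // data:你 1
});
```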

Estimated Data Sizes:

  • Per-character SVG: 2-10 KB
  • 1,000 characters: 2-10 MB
  • Full dataset (9,000+): 18-90 MB

2. Use Cases for Learning Applications#

2.1 Stroke Order Practice#

Features:

  • Display stroke-by-stroke animation
  • User traces character with finger/stylus
  • Real-time validation of stroke direction and order
  • Feedback on accuracy

Data Required:

  • SVG stroke paths (from Make Me a Hanzi or KanjiVG)
  • Stroke sequence metadata
  • Direction vectors
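One simple way to validate a traced stroke is to compare the drawn points against the reference median point-by-point and accept if the average distance is under a threshold. Real systems resample both paths to equal length and also check direction and order; this is a deliberately simplified sketch:

```javascript
// Naive stroke validation: average point-to-point distance between a drawn
// stroke and the reference median (e.g. from Make Me a Hanzi) under a
// threshold. Simplified: no resampling, no direction/order checks.
function strokeMatches(drawn, reference, threshold = 20) {
  const n = Math.min(drawn.length, reference.length);
  if (n === 0) return false;
  let total = 0;
  for (let i = 0; i < n; i++) {
    const dx = drawn[i][0] - reference[i][0];
    const dy = drawn[i][1] - reference[i][1];
    total += Math.hypot(dx, dy);
  }
  return total / n <= threshold;
}

const reference = [[10, 50], [50, 50], [90, 50]]; // a horizontal stroke
console.log(strokeMatches([[12, 48], [51, 53], [88, 49]], reference)); // true
console.log(strokeMatches([[10, 50], [10, 90], [10, 130]], reference)); // false
```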

2.2 Dictionary Lookup by Stroke Count#

Features:

  • Filter characters by total stroke count
  • Combine with radical lookup
  • Progressive narrowing (radical + stroke count)

Data Required:

  • Stroke count database (Unihan or ChineseStrokes)
  • Radical decomposition (KRADFILE)

Example Lookup:

User: "Radical 水 (water) + 7 strokes"
Result: 汰, 汲, 汴, 汾 (candidates)
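Progressive narrowing is just the intersection of two filters over the character table. A sketch over a tiny inline sample (in practice the table would come from Unihan/ChineseStrokes plus radical data such as KRADFILE):

```javascript
// Progressive narrowing: filter a character table by radical, then by
// total stroke count. The inline table is a tiny illustrative sample.
function lookup(table, { radical, strokes }) {
  return table
    .filter((e) => radical === undefined || e.radical === radical)
    .filter((e) => strokes === undefined || e.strokes === strokes)
    .map((e) => e.char);
}

const charTable = [
  { char: '江', radical: '水', strokes: 6 },
  { char: '河', radical: '水', strokes: 8 },
  { char: '好', radical: '女', strokes: 6 }
];

console.log(lookup(charTable, { radical: '水' }));             // [ '江', '河' ]
console.log(lookup(charTable, { radical: '水', strokes: 6 })); // [ '江' ]
```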

2.3 Handwriting Recognition Training#

Features:

  • Collect user stroke data
  • Train ML models for character recognition
  • Validate correct stroke order

Data Required:

  • Labeled stroke order sequences
  • Variant forms (different handwriting styles)
  • Stroke direction and timing

2.4 Gamified Learning#

Features:

  • “Draw the character” challenges
  • Timed stroke order races
  • Achievement badges for stroke accuracy

Engagement Mechanics:

  • Progress tracking (characters mastered)
  • Leaderboards (speed + accuracy)
  • Unlock levels based on stroke complexity

2.5 Adaptive Learning Paths#

Features:

  • Start with simple characters (few strokes)
  • Progress to complex characters
  • Focus on commonly confused characters

Data-Driven Approach:

  • Sort characters by stroke count (ascending)
  • Track user errors (confusion matrix)
  • Recommend practice based on weak points
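The ordering logic above can be sketched as a practice-queue builder: sort by stroke count (simple first), then promote the learner's most-missed characters. The error counts would come from the app's own tracking; everything here is illustrative:

```javascript
// Build a practice queue: most-missed characters first, then fewest
// strokes. `errorCounts` is assumed to come from the app's own tracking.
function buildPracticeQueue(chars, strokeCounts, errorCounts) {
  return [...chars].sort((a, b) => {
    const errDiff = (errorCounts[b] || 0) - (errorCounts[a] || 0);
    if (errDiff !== 0) return errDiff;        // most-missed first
    return strokeCounts[a] - strokeCounts[b]; // then fewest strokes
  });
}

const strokeCounts = { '一': 1, '学': 8, '写': 5 };
const errorCounts = { '写': 3 };
console.log(buildPracticeQueue(['学', '一', '写'], strokeCounts, errorCounts));
// [ '写', '一', '学' ]
```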

3. Integration with Educational Platforms#

3.1 Docusaurus Integration#

Approach:

  • Create MDX components for stroke order display
  • Embed Hanzi Writer or animCJK SVGs
  • Add interactive quizzes

Example MDX:

import StrokeOrder from '@site/src/components/StrokeOrder';

<StrokeOrder character="学" />

3.2 QRCards Certificate Integration#

Certificate Fields:

{
  "certification_info": {
    "type": "competency_badge",
    "name": "Hanzi Writing Fundamentals",
    "issued_date": "2026-XX-XX",
    "level": 1
  },
  "skills": {
    "characters_mastered": 500,
    "stroke_accuracy": "95%",
    "writing_speed": "15 chars/min"
  },
  "portfolio_evidence": [
    {
      "name": "Stroke Order Video",
      "url": "example.com/demo"
    }
  ]
}

3.3 Learning Path Design#

Beginner Path (8 weeks):

  • Week 1-2: Basic strokes (8 types)
  • Week 3-4: Simple characters (1-4 strokes)
  • Week 5-6: Radicals (214 traditional)
  • Week 7-8: Common characters (200 most frequent)

Intermediate Path (12 weeks):

  • Compound characters (5-12 strokes)
  • Stroke order rules and exceptions
  • Handwriting speed optimization
  • Character variants (simplified vs. traditional)

Advanced Path (16 weeks):

  • Complex characters (13+ strokes)
  • Calligraphy styles (kaishu, xingshu)
  • Historical forms
  • Error correction (common mistakes)

4.1 For Web-Based Learning Apps#

Frontend:

  • React/Next.js for UI
  • Hanzi Writer for character animations
  • SVG.js for custom stroke rendering

Backend:

  • Node.js API for character data
  • PostgreSQL with Unihan data
  • Redis for caching common characters

Data Storage:

  • CDN for SVG files (Cloudflare)
  • JSON API for metadata
  • User progress in database

4.2 For Mobile Apps#

iOS:

  • SwiftUI for UI
  • Core Graphics for SVG rendering
  • Local SQLite database with stroke data

Android:

  • Jetpack Compose for UI
  • AndroidX SVG libraries
  • Room database for offline data

Cross-Platform:

  • React Native + react-native-svg
  • Flutter + flutter_svg

5. Example Implementations#

5.1 Web Component (React)#

import React, { useEffect, useRef } from 'react';
import HanziWriter from 'hanzi-writer';

const StrokeOrderDisplay = ({ character }) => {
  const targetRef = useRef(null);
  const writerRef = useRef(null);

  useEffect(() => {
    if (targetRef.current) {
      writerRef.current = HanziWriter.create(targetRef.current, character, {
        width: 200,
        height: 200,
        padding: 10,
        showOutline: true,
        strokeAnimationSpeed: 1,
        delayBetweenStrokes: 300
      });
    }

    return () => {
      // Clear the previous writer's SVG so switching characters
      // doesn't stack multiple renderings in the same container
      if (targetRef.current) {
        targetRef.current.innerHTML = '';
      }
      writerRef.current = null;
    };
  }, [character]);

  const handleAnimate = () => {
    writerRef.current?.animateCharacter();
  };

  const handleQuiz = () => {
    writerRef.current?.quiz();
  };

  return (
    <div>
      <div ref={targetRef} />
      <button onClick={handleAnimate}>Animate</button>
      <button onClick={handleQuiz}>Practice</button>
    </div>
  );
};

export default StrokeOrderDisplay;

5.2 Backend API (Node.js + Express)#

const express = require('express');
const { Pool } = require('pg');

const app = express();
const pool = new Pool({
  connectionString: process.env.DATABASE_URL
});

// Get stroke count for a character
app.get('/api/strokes/:character', async (req, res) => {
  const { character } = req.params;
  const codepoint = character.codePointAt(0).toString(16).toUpperCase();

  try {
    const result = await pool.query(
      'SELECT stroke_count, radical FROM unihan WHERE codepoint = $1',
      [codepoint]
    );

    if (result.rows.length === 0) {
      return res.status(404).json({ error: 'Character not found' });
    }

    res.json(result.rows[0]);
  } catch (err) {
    // Express 4 does not catch rejected promises in async handlers
    res.status(500).json({ error: 'Database error' });
  }
});

// Search characters by stroke count
app.get('/api/search/strokes/:count', async (req, res) => {
  const count = parseInt(req.params.count, 10);
  if (Number.isNaN(count)) {
    return res.status(400).json({ error: 'Invalid stroke count' });
  }

  try {
    const result = await pool.query(
      'SELECT codepoint, character FROM unihan WHERE stroke_count = $1 LIMIT 100',
      [count]
    );
    res.json(result.rows);
  } catch (err) {
    res.status(500).json({ error: 'Database error' });
  }
});

app.listen(3000, () => {
  console.log('API running on port 3000');
});

5.3 Python Stroke Analysis#

from cjklib import characterlookup

cjk = characterlookup.CharacterLookup('C')  # 'C' for Chinese

# Get stroke count
character = '学'
stroke_count = cjk.getStrokeCount(character)
print(f"Stroke count for {character}: {stroke_count}")

# Get radicals
radicals = cjk.getCharacterRadicalResidualStrokeCount(character)
print(f"Radicals: {radicals}")

# Find characters by stroke count
chars_with_5_strokes = cjk.getCharactersForStrokeCount(5)
print(f"Characters with 5 strokes: {chars_with_5_strokes[:10]}")

6. Testing and Validation#

6.1 Data Quality Checks#

Validation Steps:

  1. Verify stroke count matches across data sources
  2. Check SVG files render correctly
  3. Validate stroke order follows standard conventions
  4. Test on different screen sizes

Automated Testing:

describe('Stroke Order Data', () => {
  test('SVG files exist for common characters', async () => {
    const commonChars = ['的', '一', '是', '不', '了'];

    for (const char of commonChars) {
      const svg = await loadCharacterSVG(char);
      expect(svg).toBeDefined();
      expect(svg).toContain('<path');
    }
  });

  test('Stroke counts match database', async () => {
    const testCases = [
      { char: '一', expectedStrokes: 1 },
      { char: '二', expectedStrokes: 2 },
      { char: '三', expectedStrokes: 3 }
    ];

    for (const { char, expectedStrokes } of testCases) {
      const count = await getStrokeCount(char);
      expect(count).toBe(expectedStrokes);
    }
  });
});

6.2 User Experience Testing#

Test Scenarios:

  • Stroke animation speed (too fast/slow?)
  • Touch responsiveness on mobile
  • Accuracy threshold for practice mode
  • Feedback clarity (correct/incorrect strokes)

Metrics to Track:

  • Animation load time
  • Practice completion rate
  • User accuracy over time
  • Session engagement duration

7. Deployment Checklist#

7.1 Data Preparation#

  • Download required datasets (Make Me a Hanzi, KanjiVG, etc.)
  • Process SVG files for CDN delivery
  • Set up database with Unihan data
  • Create character metadata JSON files
  • Implement caching strategy

7.2 Infrastructure#

  • Set up CDN for SVG files
  • Configure API endpoints
  • Set up Redis for caching
  • Configure database backups
  • Set up monitoring and logging

7.3 Integration#

  • Test Hanzi Writer integration
  • Verify mobile responsiveness
  • Test offline functionality
  • Validate cross-browser compatibility
  • Test performance under load

7.4 Content#

  • Create learning path content
  • Write exercise instructions
  • Prepare quiz questions
  • Create tutorial videos (optional)
  • Design achievement badges

Document Status: Complete Last Updated: 2026-01-29 Related: See S2-comprehensive for data sources, S4-strategic for roadmap

S4: Strategic

Stroke Order Implementation: Strategic Roadmap#

Research ID: research-k6iy Date: 2026-01-29 Pass: S4 (Strategic Planning) Purpose: High-level implementation strategy, research gaps, success metrics, and recommendations


1. Research Gaps and Future Work#

1.1 Missing Coverage#

Gaps:

  • Korean Hangul stroke order (limited resources)
  • Vietnamese Chu Nom characters
  • Historical Chinese variants
  • Regional variations in stroke order

Opportunities:

  • Crowdsource additional data
  • Partner with language institutes
  • Expand animCJK coverage

1.2 Quality Improvements#

Needed:

  • Error correction in Unihan stroke counts
  • Standardization across datasets
  • Variant form mapping (simplified ↔ traditional)
  • Handwriting style variations

1.3 AI/ML Applications#

Potential:

  • Stroke prediction models (next stroke suggestion)
  • Handwriting style transfer
  • Automated stroke order generation for rare characters
  • Personalized difficulty adaptation

2. Implementation Roadmap#

Phase 1: Data Acquisition (Week 1)#

Objectives:

  • Secure all required datasets
  • Verify licensing compatibility
  • Set up local development environment

Tasks:

  • Download Make Me a Hanzi dataset
  • Clone KanjiVG repository
  • Set up local mirror of CCDB API
  • Download ChineseStrokes database
  • Review license terms for commercial use

Deliverables:

  • Local data repository
  • License compliance documentation
  • Data inventory spreadsheet

Phase 2: Infrastructure Setup (Week 2)#

Objectives:

  • Build backend infrastructure
  • Set up data pipelines
  • Configure hosting and CDN

Tasks:

  • Set up PostgreSQL with Unihan data
  • Create CDN bucket for SVG files
  • Build REST API for character lookup
  • Implement caching layer (Redis)
  • Configure monitoring and logging

Deliverables:

  • API endpoints (stroke count, character lookup)
  • CDN with SVG files
  • Database with metadata
  • Performance monitoring dashboard

Phase 3: Frontend Development (Week 3-4)#

Objectives:

  • Build user-facing components
  • Implement interactive features
  • Ensure mobile responsiveness

Tasks:

  • Create Hanzi Writer integration
  • Build stroke order visualization component
  • Implement practice mode with validation
  • Add progress tracking
  • Design responsive layouts
  • Test cross-browser compatibility

Deliverables:

  • React components for stroke order display
  • Practice mode with scoring
  • Mobile-optimized interface
  • User progress tracking system
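
The "practice mode with scoring" deliverable reduces to grading each attempt. One way to sketch it (the function name and shape are assumptions, not Hanzi Writer's API) is to score an attempt from the character's total stroke count and the number of mis-drawn strokes, with a pass threshold matching the 80% accuracy targets used in the success metrics:

```javascript
// Score a practice attempt: fraction of strokes drawn correctly,
// plus a pass/fail flag against a configurable threshold (sketch).
function scoreAttempt(totalStrokes, mistakes, passThreshold = 0.8) {
  const correct = Math.max(totalStrokes - mistakes, 0); // never below zero
  const accuracy = correct / totalStrokes;
  return { accuracy, passed: accuracy >= passThreshold };
}

scoreAttempt(8, 1); // accuracy 0.875 → passed
scoreAttempt(8, 3); // accuracy 0.625 → not passed
```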

Phase 4: Content Creation (Weeks 5-6)#

Objectives:

  • Develop learning curriculum
  • Create exercises and assessments
  • Prepare supporting materials

Tasks:

  • Design learning path curriculum
  • Write exercises and quizzes
  • Create video tutorials (optional)
  • Develop grading rubrics
  • Design achievement badges
  • Write instructional content

Deliverables:

  • Structured learning paths (Beginner, Intermediate, Advanced)
  • 50+ practice exercises
  • Quiz bank (100+ questions)
  • Achievement system
  • Tutorial videos (if included)

Phase 5: Testing & Launch (Weeks 7-8)#

Objectives:

  • Validate functionality
  • Optimize performance
  • Launch pilot program

Tasks:

  • Beta test with 10 learners
  • Collect feedback on UX
  • Optimize performance
  • Launch pilot learning path
  • Monitor initial usage metrics
  • Iterate based on feedback

Deliverables:

  • Beta test report
  • Performance optimization results
  • Launch-ready platform
  • Initial user feedback summary

3. Success Metrics#

3.1 Engagement Metrics#

Daily Active Users (DAU):

  • Target: 50+ users within first month
  • Growth rate: 20% month-over-month

Characters Practiced per Session:

  • Target: 10-20 characters
  • Indicator of engagement depth

Session Duration:

  • Target: 15+ minutes average
  • Indicates meaningful practice time

Return Rate:

  • Target: 40%+ weekly return rate
  • Measures habit formation

3.2 Learning Outcome Metrics#

Stroke Accuracy Improvement:

  • Baseline: Initial assessment score
  • Target: 20%+ improvement after 4 weeks
  • Measure: Automated scoring of practice exercises

Character Retention Rate:

  • 1 week retention: 70%+ (characters practiced still remembered)
  • 1 month retention: 50%+ (long-term memory formation)
  • Measure: Periodic review quizzes

Writing Speed Increase:

  • Baseline: Characters per minute at start
  • Target: 30%+ improvement after 8 weeks
  • Measure: Timed writing exercises

Mastery Progression:

  • Beginner (1-4 strokes): 80%+ accuracy within 2 weeks
  • Intermediate (5-12 strokes): 80%+ accuracy within 6 weeks
  • Advanced (13+ strokes): 70%+ accuracy within 12 weeks
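
The stroke-count tiers above can be encoded as a small lookup helper (a sketch; the boundaries come straight from the mastery targets):

```javascript
// Map a character's stroke count to its mastery tier.
function masteryTier(strokeCount) {
  if (strokeCount <= 4) return 'Beginner';      // 1-4 strokes
  if (strokeCount <= 12) return 'Intermediate'; // 5-12 strokes
  return 'Advanced';                            // 13+ strokes
}

masteryTier(3);  // 'Beginner'
masteryTier(8);  // 'Intermediate'
masteryTier(15); // 'Advanced'
```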

3.3 Business Metrics#

Learning Path Completion Rate:

  • Target: 50%+ completion for enrolled users
  • Industry benchmark: 30-40% for online courses
  • Indicates content quality and engagement

Certificate Issuance Volume:

  • Target: 20+ certificates in first quarter
  • Demonstrates skill achievement
  • Marketing value (user testimonials)

User Satisfaction (NPS Score):

  • Target: NPS > 40 (good)
  • Stretch goal: NPS > 70 (excellent)
  • Measure: Post-learning path survey

Cost per Acquisition (CPA):

  • Baseline: Track marketing spend
  • Target: CPA < $10 for free tier users
  • Measure: Marketing spend / new users

Lifetime Value (LTV):

  • For paid tiers (if applicable)
  • Target: LTV > 3x CPA
  • Measure: Average revenue per user over 12 months
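
The two formulas above (CPA = marketing spend / new users; healthy LTV > 3× CPA) as helpers, with illustrative numbers only:

```javascript
// Business-metric helpers matching the definitions above (sketch).
const cpa = (marketingSpend, newUsers) => marketingSpend / newUsers;
const ltvHealthy = (ltv, cpaValue) => ltv > 3 * cpaValue;

const c = cpa(500, 100); // $5 per acquired user — under the $10 target
ltvHealthy(20, c);       // true: $20 LTV exceeds 3 × $5 CPA
```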

3.4 Technical Performance Metrics#

Page Load Time:

  • Target: < 2 seconds
  • Critical for user experience

API Response Time:

  • Stroke count lookup: < 100ms
  • Character metadata: < 200ms

CDN Cache Hit Rate:

  • Target: > 95% for SVG files
  • Reduces bandwidth costs

Error Rate:

  • Target: < 0.1% of requests
  • Monitoring critical for reliability

4. Risk Assessment and Mitigation#

4.1 Technical Risks#

Risk: Data quality issues (incorrect stroke orders)

  • Impact: Medium (user confusion, learning incorrect forms)
  • Probability: Low (using established datasets)
  • Mitigation: Cross-reference multiple sources, community validation

Risk: Performance issues at scale

  • Impact: High (poor user experience)
  • Probability: Medium (depends on infrastructure)
  • Mitigation: Load testing, CDN optimization, caching strategy

Risk: Mobile compatibility issues

  • Impact: High (majority of language learners use mobile)
  • Probability: Low (tested during development)
  • Mitigation: Responsive design, device testing matrix

4.2 Business Risks#

Risk: Low user adoption

  • Impact: High (project viability)
  • Probability: Medium (depends on marketing)
  • Mitigation: Beta testing, user feedback loops, marketing strategy

Risk: Licensing issues with data sources

  • Impact: High (legal liability)
  • Probability: Low (verified during Phase 1)
  • Mitigation: Legal review, proper attribution, license compliance

Risk: Competition from established platforms

  • Impact: Medium (market share)
  • Probability: High (Duolingo, Pleco, etc. exist)
  • Mitigation: Differentiation strategy, unique features, niche targeting

4.3 Operational Risks#

Risk: Content creation bottleneck

  • Impact: Medium (delays launch)
  • Probability: Medium (resource-intensive)
  • Mitigation: Prioritize core content, phase additional content

Risk: Maintenance burden for data updates

  • Impact: Low (gradual degradation)
  • Probability: Medium (Unicode updates, new characters)
  • Mitigation: Automated data refresh scripts, community contributions

5. Strategic Recommendations#

5.1 Minimum Viable Product (MVP)#

  1. Web-first approach using Hanzi Writer

    • Fastest time to market
    • Lowest development cost
    • Proven technology stack
  2. Focus on Chinese characters initially

    • Largest user base
    • Best data availability (Make Me a Hanzi)
    • Expand to Japanese/Korean later
  3. Core features only:

    • Stroke order animation
    • Practice mode with basic validation
    • Progress tracking (characters completed)
    • Single learning path (Beginner)

Rationale: Validate product-market fit before investing in advanced features.
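
The MVP's progress tracking ("characters completed") needs little more than a set of completed characters. A minimal sketch (a real implementation would persist to the backend):

```javascript
// Minimal "characters completed" tracker for the MVP (sketch).
function makeProgressTracker() {
  const completed = new Set();
  return {
    complete(ch) { completed.add(ch); },          // idempotent: a Set ignores repeats
    isComplete(ch) { return completed.has(ch); },
    count() { return completed.size; },
  };
}

const progress = makeProgressTracker();
progress.complete('永');
progress.complete('永'); // practicing the same character again doesn't double-count
progress.count(); // 1
```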


5.2 Differentiation Strategy#

How to Stand Out:

  1. Integration with existing platforms

    • Docusaurus plugin for documentation sites
    • Embeddable widgets for blogs/tutorials
    • API for third-party apps
  2. Credential-focused

    • Issue verifiable certificates (QRCards)
    • Portfolio evidence (practice videos)
    • LinkedIn-compatible badges
  3. Adaptive learning

    • Personalized difficulty adjustment
    • Focus on user’s weak points
    • Spaced repetition for retention
  4. Community features

    • Leaderboards (opt-in)
    • Shared progress achievements
    • Study groups / cohorts

5.3 Technology Choices#

Recommended Stack:

  • Frontend: Next.js + React

    • Server-side rendering for SEO
    • Fast page loads
    • Large ecosystem
  • Stroke Animation: Hanzi Writer

    • Battle-tested library
    • Active development
    • Good documentation
  • Backend: Node.js + Express + PostgreSQL

    • JavaScript everywhere (full-stack)
    • PostgreSQL for complex queries (stroke count + radical lookup)
    • Redis for caching
  • Hosting: Vercel (frontend) + Railway (backend)

    • Easy deployment
    • Auto-scaling
    • Good free tiers for MVP
  • CDN: Cloudflare

    • Free tier sufficient for MVP
    • Global distribution
    • DDoS protection

5.4 Go-to-Market Strategy#

Phase 1: Beta Launch (Weeks 1-4)

  • Recruit 10-20 beta testers
  • Offer free lifetime access for feedback
  • Iterate based on user input

Phase 2: Soft Launch (Weeks 5-8)

  • Launch on Product Hunt, Hacker News
  • Target language learning communities (Reddit, forums)
  • Content marketing (blog posts, tutorials)

Phase 3: Growth (Weeks 9-16)

  • SEO optimization for “Chinese stroke order” keywords
  • Partnership with language schools/tutors
  • Paid ads (Google, Facebook) if budget allows

Phase 4: Scale (Weeks 17+)

  • Expand to Japanese and Korean
  • Add advanced features (calligraphy styles, handwriting recognition)
  • Enterprise sales to educational institutions

5.5 Monetization Options#

Freemium Model (Recommended):

  • Free: Basic stroke order practice (200 characters)
  • Paid ($5/month): Full character set, certificates, advanced features

One-Time Purchase:

  • $29 for lifetime access to full content
  • Appeals to serious learners

Enterprise Licensing:

  • API access for third-party apps
  • White-label for educational institutions
  • Custom content for corporate training

6. Alternative Approaches#

6.1 If Limited Resources#

Approach: Start even smaller

  • Use Hanzi Writer demo page as MVP
  • Embed pre-existing tools (strokeorder.info)
  • Focus on content curation, not tech development
  • Validate demand before building custom platform

6.2 If Large Budget Available#

Approach: Build comprehensive platform from day one

  • Mobile apps (iOS + Android) alongside web
  • AI-powered handwriting recognition
  • Live tutoring integration
  • Gamification with 3D animations
  • Multi-language from launch (Chinese + Japanese + Korean)

6.3 If Targeting Niche Audience#

Approach: Specialize deeply

  • Focus on calligraphy enthusiasts (not general learners)
  • Historical script variants (seal script, clerical script)
  • Professional certification for Chinese teachers
  • Premium pricing, boutique experience

7. Conclusion#

7.1 Key Takeaways#

  1. Ecosystem is Mature: Open-source data for CJK stroke order is production-ready (Make Me a Hanzi, KanjiVG)

  2. Low Barrier to Entry: Hanzi Writer library makes web integration straightforward (< 1 week MVP)

  3. Market Validation: Existing platforms (Duolingo, Pleco) prove demand for stroke order features

  4. Differentiation Possible: Credentials, integration, and adaptive learning offer competitive advantage

  5. Execution Matters: Success depends more on product design and marketing than data availability


7.2 Next Steps#

Immediate (This Week):

  1. Select target language (Chinese recommended)
  2. Choose data source (Hanzi Writer for easiest start)
  3. Prototype stroke order component (1 day)
  4. Show to 3-5 potential users for feedback

Short-term (Weeks 2-4):

  • Build MVP with core features only
  • Beta test with 10 users
  • Validate product-market fit

Medium-term (Months 2-3):

  • Launch publicly
  • Iterate based on usage data
  • Expand content and features

Long-term (Months 4-12):

  • Scale to additional languages
  • Add advanced features (AI recognition, calligraphy)
  • Explore monetization strategies

7.3 Critical Success Factors#

  1. User Experience: Stroke animation must be smooth and intuitive
  2. Content Quality: Learning paths must be well-structured and effective
  3. Performance: Fast load times critical for mobile learners
  4. Engagement: Gamification and progress tracking keep users coming back
  5. Differentiation: Clear value proposition vs. existing platforms

7.4 Final Recommendation#

Start with Hanzi Writer for web-based Chinese stroke order practice.

  • Fastest path to MVP
  • Proven technology
  • Best data availability
  • Largest potential user base
  • Expandable to Japanese/Korean later

Once product-market fit is validated, invest in:

  • Mobile apps
  • Advanced features (AI recognition)
  • Multi-language expansion
  • Enterprise features

The data is ready. The tools exist. The market is proven. Success depends on execution.


Document Status: Complete Last Updated: 2026-01-29 Related: See S1-rapid for quick start, S2-comprehensive for data sources, S3-need-driven for implementation details

Published: 2026-03-06 Updated: 2026-03-06