1.165 Stroke Order & Writing (CJK)#
SVG stroke order data, animated dictionaries, and stroke count databases for Chinese, Japanese, and Korean character learning
Explainer
Stroke Order & Writing: Domain Explainer#
Research ID: research-k6iy
Date: 2026-01-29
Audience: Technical decision makers, product managers, architects without CJK language expertise
What This Solves#
The Problem: Chinese, Japanese, and Korean characters are made up of multiple strokes (individual pen movements), and there’s a specific sequence that experienced writers follow. Without guidance, learners often develop inconsistent or inefficient habits that are difficult to correct later.
Who Encounters This:
- Educational platforms teaching CJK languages
- Language learning apps adding writing practice features
- Digital textbooks and reference materials
- Handwriting recognition systems that need stroke order data
- Calligraphy training applications
Why It Matters: Learning correct stroke order isn’t just about aesthetics. It affects:
- Writing speed - The standard sequence is optimized for flow
- Character recognition - Handwritten characters become more legible
- Memory retention - The kinesthetic pattern helps learners remember characters
- Cultural authenticity - Shows respect for the writing tradition
Getting the sequence wrong doesn’t prevent others from reading your writing (the way spelling mistakes might), but learning the standard sequence from the start is far easier than breaking bad habits later.
Accessible Analogies#
Dance Choreography#
Learning to write complex characters is like learning dance choreography. There’s a specific sequence that experienced practitioners follow - you could technically reach the same final position by moving randomly, but:
- The standard sequence flows naturally
- It’s easier to perform at speed once memorized
- Everyone learning the same sequence can practice together
- Teachers can spot when you’re doing it wrong and correct you early
Just as dancers learn “step-ball-change” as a unit, Chinese writers learn common stroke patterns like “horizontal-then-vertical” or “left-falling-then-right-falling.”
Following a Recipe#
Think of stroke order like following a recipe’s sequence: you could add ingredients in any order and still end up with something edible, but the standard order exists because it:
- Makes the process more efficient
- Produces more consistent results
- Makes it easier to troubleshoot when something goes wrong
- Allows you to learn transferable techniques
Just as you learn “mise en place” (prep everything first) in cooking, you learn “outside-then-inside” in character writing.
Assembly Instructions#
Like assembling furniture, there’s often a “right” order that makes the process easier. You could attach parts in any sequence, but the manual’s sequence:
- Prevents you from getting stuck (painted into a corner)
- Makes the structure stable during assembly
- Follows a logical progression that experienced builders recognize
Stroke order follows similar principles - certain sequences make the character easier to balance while writing and create natural momentum for the next stroke.
When You Need This#
You NEED stroke order data when:#
Building Educational Features:
- Adding writing practice to a language learning app
- Creating interactive worksheets or exercises
- Building a handwriting evaluation system
- Designing a character lookup tool for learners
Interactive Content:
- Animated character demonstrations in digital textbooks
- Step-by-step writing tutorials
- Gamified learning experiences with stroke-by-stroke feedback
- Calligraphy practice applications
Assessment Tools:
- Evaluating whether learners are writing correctly
- Providing real-time feedback during practice
- Generating progress reports on writing accuracy
You DON’T need stroke order data when:#
- Display-only applications (dictionaries that just show finished characters)
- Reading comprehension tools (no writing practice involved)
- Typography/font rendering (fonts handle display automatically)
- Speech-focused learning (listening and speaking only)
- Input methods (typing Chinese on a keyboard)
Decision Criteria:#
Ask yourself: “Will users be learning to write or evaluating their writing?” If yes, you need stroke order data. If users only need to recognize or type characters, you probably don’t.
Trade-offs#
Data Source Complexity Spectrum#
Low Complexity → High Flexibility
Option 1: Pre-built Library (Hanzi Writer)
- ✅ Fastest integration (< 1 day)
- ✅ Battle-tested and actively maintained
- ✅ Handles animation and rendering automatically
- ❌ Less control over appearance and behavior
- ❌ Web-focused (mobile requires additional work)
- Use when: You want to ship quickly with standard features
Option 2: Raw SVG Data (Make Me a Hanzi, KanjiVG)
- ✅ Complete control over rendering and animation
- ✅ Use on any platform (web, mobile, desktop)
- ✅ Customize appearance and interaction patterns
- ❌ Requires building animation system yourself
- ❌ More initial development time (1-2 weeks)
- Use when: You need custom behavior or non-web platforms
Option 3: Stroke Count/Metadata Only (CCDB API, Unihan)
- ✅ Lightweight data (just numbers and metadata)
- ✅ Fast lookups for reference purposes
- ✅ Minimal integration effort
- ❌ No visual demonstration of sequence
- ❌ Cannot show animated writing
- Use when: You only need stroke counts for sorting/searching
Build vs. Integrate#
Integrate Existing Data (Recommended):
- Pros: Data already verified by language experts, covers thousands of characters, maintained by community
- Cons: Must accept existing coverage (some rare characters missing)
- Timeline: MVP in < 1 week
Build Your Own Dataset:
- Pros: Complete control over coverage and accuracy
- Cons: Requires language expertise, time-intensive (months), error-prone without validation
- Timeline: 3-6 months for basic coverage
- Reality check: Only consider this if you have native-level expertise in the target language AND need characters not covered by existing datasets
Verdict: Unless you’re a language institute with specific research needs, integrate existing open-source data. The community datasets are production-ready and cover 99%+ of use cases.
Implementation Reality#
Realistic Timeline Expectations#
Week 1: Research and Setup
- Evaluate data sources for your needs
- Verify licensing compatibility with your project
- Set up development environment
- Download and test datasets locally
Weeks 2-3: Core Development
- Integrate stroke order library or build rendering system
- Create basic practice interface
- Implement character lookup and display
- Build minimal progress tracking
Weeks 4-6: Polish and Testing
- Add animations and visual feedback
- Test across devices and browsers
- Create learning content and exercises
- Beta test with real learners
Reality Check: Getting a basic demo working takes days. Building a polished, learner-friendly experience takes weeks. Creating comprehensive curriculum content (selecting which characters to teach, in what order, with what exercises) takes months.
Team Skill Requirements#
Minimum Team:
- 1 frontend developer (React/Vue/Angular or mobile native)
- 1 backend developer (if building API, otherwise optional)
- 1 content creator/language expert (part-time) for curriculum
Skills Needed:
- Frontend: Working with SVG, animations (CSS/Canvas/WebGL)
- Backend: REST APIs, database queries (if building lookup service)
- Language: Understanding of target language writing system (or access to expert reviewer)
Can One Person Do This?: Yes, for an MVP. A full-stack developer with basic knowledge of the target language can build a functional prototype. However, quality content creation and cultural accuracy require native-level expertise.
Common Pitfalls#
Underestimating Mobile Performance:
- SVG animations can be sluggish on older devices
- Solution: Test early on low-end hardware, optimize rendering, consider Canvas instead of SVG for complex animations
Assuming All Characters Are Available:
- Even comprehensive datasets have gaps (rare variants, historical forms)
- Solution: Check coverage for YOUR specific character set early, have a fallback display for missing data
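The coverage check suggested above can be sketched in a few lines. This is a minimal illustration, not part of any dataset's tooling: `coverage_report`, the sample curriculum string, and the sample dataset contents are all hypothetical, and in practice the available set would be built from whichever dataset you load.

```python
def coverage_report(required_chars, available_chars):
    """Return (coverage_ratio, sorted_missing) for a set of required characters."""
    required = set(required_chars)
    available = set(available_chars)
    missing = sorted(required - available)
    if not required:
        return 1.0, missing
    return (len(required) - len(missing)) / len(required), missing


# Hypothetical curriculum and dataset contents, for illustration only.
curriculum = "你好学习龘"
dataset = {"你", "好", "学", "习", "中", "文"}

ratio, missing = coverage_report(curriculum, dataset)
print(f"coverage: {ratio:.0%}, missing: {missing}")
```

Running a report like this against the real character list early makes it clear how many characters need a fallback display.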
Ignoring Regional Variations:
- Simplified vs. Traditional Chinese have different forms
- Japanese kanji may differ from Chinese equivalents
- Solution: Clearly define your target writing system upfront
Overlooking Licensing:
- Some datasets have share-alike requirements (CC BY-SA)
- Solution: Review licenses in Phase 1, ensure compliance with attribution requirements
First 90 Days: What to Expect#
Days 1-30: Building
- You’ll have a working prototype that can display and animate characters
- Expect excitement as you see characters come to life
- Also expect frustration with edge cases and rendering quirks
Days 31-60: Testing
- Beta testers will find bugs you never imagined
- You’ll realize content creation (writing exercises, learning paths) is more work than the tech
- Performance issues on real-world devices will surface
Days 61-90: Refining
- You’ll iterate based on user feedback
- The tech will feel stable, but content creation will feel endless
- Marketing and user acquisition will become the bottleneck
Key Insight: The technical challenge of stroke order visualization is solved (libraries exist, data is available, integration is straightforward). The real work is creating engaging educational content and building a user base. Budget your time accordingly - 20% tech, 80% content and marketing.
Summary for Decision Makers#
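The Data Exists: Open-source stroke order datasets cover 9,000+ Chinese characters and, through KanjiVG, Japanese kanji. Licenses vary (MIT, CC BY-SA 3.0, and others): most allow commercial use, but some require attribution or share-alike, so check each dataset before shipping.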
The Tools Are Ready: Libraries like Hanzi Writer make integration straightforward for web applications. Raw SVG data is available for custom implementations.
The Challenge Is Execution: Technology is not the bottleneck. Success depends on:
- Creating effective learning content
- Designing an engaging user experience
- Acquiring and retaining learners
Time to First Demo: < 1 week for basic web implementation using existing libraries.
Time to Production: 6-8 weeks for a polished MVP with core features and initial content.
Skills Required: Frontend developer + language expert (or consultant) for content validation.
Cost: Primarily developer time and content creation - all stroke order data is free and open-source. No API costs or licensing fees for the core data.
Document Status: Complete
Related Documents: See 01-discovery/ for detailed technical resources and implementation guides
S1: Rapid Discovery
Stroke Order Resources: Quick Reference#
Research ID: research-k6iy
Date: 2026-01-29
Pass: S1 (Rapid Discovery)
TL;DR#
Need stroke order data for CJK characters? Start here:
| Resource | Language | Coverage | Best For |
|---|---|---|---|
| Hanzi Writer | Chinese | 9,000+ | Web apps (easiest) |
| Make Me a Hanzi | Chinese | 9,000+ | Custom implementations |
| KanjiVG | Japanese | Kanji | Production-ready SVGs |
| animCJK | CJK (all) | 7,672+ | Multi-language apps |
| CCDB API | Chinese | 20,902 | Stroke count lookups |
Quick Start#
Web App (5 minutes):
```javascript
import HanziWriter from 'hanzi-writer';

const writer = HanziWriter.create('div-id', '你', {
  width: 100, height: 100
});
writer.animateCharacter();
```
Stroke Count Lookup:
- API: http://ccdb.hemiola.com/characters/unicode/{codepoint}
- Python: pip install cjklib
- Database: ChineseStrokes (81,000 characters)
Licensing Quick Check#
| Resource | License | Commercial OK? |
|---|---|---|
| Hanzi Writer | MIT | ✅ Yes |
| KanjiVG | CC BY-SA 3.0 | ✅ Yes (with attribution and share-alike) |
| Make Me a Hanzi | Mixed | ⚠️ Check repo |
| animCJK | Open-source | ⚠️ Verify license |
| cjklib | LGPL | ✅ Yes |
Next Steps#
- For web apps: Start with Hanzi Writer (easiest integration)
- For custom needs: Use Make Me a Hanzi or KanjiVG SVGs directly
- For stroke counts: CCDB API or cjklib
- For deep dive: See S2-comprehensive for full catalog
Key Files Location#
- S2-comprehensive/: Full catalog of all data sources
- S3-need-driven/: Implementation guides and use cases
- S4-strategic/: Implementation roadmap and metrics
S2: Comprehensive
Stroke Order Data Sources: Comprehensive Catalog#
Research ID: research-k6iy
Date: 2026-01-29
Pass: S2 (Comprehensive Coverage)
Purpose: Complete catalog of SVG stroke order data, stroke count databases, and animated dictionary resources for CJK characters
1. SVG Stroke Order Data Sources#
1.1 Make Me a Hanzi (Chinese Characters)#
Repository: skishore/makemeahanzi
Website: makemeahanzi
Coverage: 9,000+ most common simplified and traditional Chinese characters
License: Multiple (see repository for details)
Key Features:
- Stroke-order vector graphics for all characters
- Dictionary data (definitions, pinyin)
- Graphical data (stroke decomposition)
- Experimental animated SVGs (svgs.tar.gz)
- SVGs named by Unicode codepoint
Data Format:
- dictionary.txt - character definitions, pronunciations
- graphics.txt - stroke order and decomposition data
- svgs.tar.gz - pre-rendered animated SVG files
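As a rough sketch of working with the stroke data file, the snippet below parses one record and counts its strokes. It assumes `graphics.txt` stores one JSON object per line with `character`, `strokes` (SVG path strings), and `medians` keys; verify that against the repository before relying on it. The sample record is invented for illustration and is not real dataset content.

```python
import json


def parse_graphics_line(line):
    """Parse one JSON record and return (character, strokes, medians)."""
    entry = json.loads(line)
    return entry["character"], entry["strokes"], entry["medians"]


# Invented sample record; real path data is far more detailed.
sample_line = json.dumps(
    {
        "character": "二",
        "strokes": ["M 10 30 L 90 30", "M 5 80 L 95 80"],
        "medians": [[[10, 30], [90, 30]], [[5, 80], [95, 80]]],
    },
    ensure_ascii=False,
)

char, strokes, medians = parse_graphics_line(sample_line)
print(char, "has", len(strokes), "strokes")
```

Because the list order is the stroke order, the stroke count falls out of `len(strokes)` without a separate lookup.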
Use Cases:
- Foundation for building stroke order animation systems
- Reference for stroke decomposition algorithms
- Educational apps requiring accurate stroke order
1.2 KanjiVG (Japanese Kanji)#
Repository: KanjiVG/kanjivg
Website: kanjivg.tagaini.net
Coverage: Japanese kanji characters
License: Creative Commons Attribution-Share Alike 3.0
Key Features:
- SVG file for each character with stroke shape and direction
- Stroke order information
- Component metadata (radicals, stroke types)
- Variant forms included
- Widely adopted (used by Duolingo, many dictionary sites)
Distribution:
- Zip file with all non-variant SVG files
- Individual files in repository
- Vector graphics suitable for scaling
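Locating the file for a given character is usually the first integration step. The helper below is a sketch under one assumption that should be checked against the downloaded archive: that each non-variant file is named after the character's Unicode codepoint as five lowercase, zero-padded hex digits. `kanjivg_filename` is a hypothetical name, not part of KanjiVG.

```python
def kanjivg_filename(char):
    """Build the assumed file name for a single character."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # Assumption: five-digit, zero-padded, lowercase hex codepoint.
    return f"{ord(char):05x}.svg"


print(kanjivg_filename("学"))
```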
Notable Users:
- Duolingo (language learning platform)
- Multiple Japanese dictionary websites
- Educational apps for kanji learning
1.3 HanziVG (Chinese Hanzi)#
Repository: Connum/hanzivg
Goal: Become for Chinese what KanjiVG is for Japanese
Coverage: Traditional and Simplified Chinese characters
Key Features:
- SVG stroke order files with metadata
- Radical information
- Character component decomposition
- Modeled after KanjiVG structure
Status: Active development, growing coverage
1.4 animCJK (Multi-Language)#
Repository: parsimonhi/animCJK
Coverage: Chinese, Japanese (Kanji + Kana), Korean (Hanja)
Total Characters: 7,672+ in Chinese simplified folder
Key Features:
- Animated stroke order using SVG
- Free and open-source
- Multi-language support (CJK)
- Organized by language:
  - svgsZhHans/ - Simplified Chinese (7,000 common + uncommon)
  - Traditional Chinese variants
  - Japanese Kanji and Kana
  - Korean Hanja
  - Basic strokes and components
Use Cases:
- Universal CJK character applications
- Cross-language learning platforms
- Comparative stroke order analysis
1.5 Hanzi Writer (JavaScript Library + Data)#
Repository: chanind/hanzi-writer
Website: hanziwriter.org
Data Explorer: chanind.github.io/hanzi-writer-data
Type: JavaScript library with accompanying SVG data
Key Features:
- Free and open-source library for stroke order animations
- Based on Make Me a Hanzi data
- HTML5 + SVG rendering
- Stroke order practice quizzes
- Embeddable in web applications
- Character data in separate repository
Technical Stack:
- JavaScript/TypeScript
- SVG rendering
- No backend required
Use Cases:
- Web-based character writing practice
- Interactive quizzes
- Browser-based learning applications
2. Online Animated Dictionaries#
2.1 strokeorder.info#
URL: strokeorder.info
Format: Animated GIFs
Coverage: 4,000+ characters
Features:
- Pre-rendered animated GIFs
- Instant playback (no JavaScript required)
- Easy to embed in static sites
2.2 strokeorder.com#
URL: strokeorder.com
Features:
- Type-to-animate interface
- Automatic playback on character entry
- Interactive stroke order display
2.3 Chinese Character Web API#
URL: ccdb.hemiola.com
Type: RESTful API
Data Source: Unihan Database (MySQL + PHP)
Key Features:
- 20,902 characters (CJK Unified Ideographs range)
- Stroke count information
- Radical lookup (kRSKangXi field)
- Programmatic access
Use Cases:
- Backend for dictionary apps
- Automated stroke count lookup
- Character metadata retrieval
3. Stroke Count Databases#
3.1 Chinese Character Stroke Count Resources#
GitHub Repository: caiguanhao/ChineseStrokes
Coverage: 81,000+ Chinese characters
Purpose: Sort characters by stroke count
Key Features:
- Comprehensive stroke count data
- Suitable for dictionary lookup systems
- Enables stroke-based search
Use Cases:
- Implement radical/stroke lookup in dictionaries
- Sort characters by complexity
- Character learning progression systems
3.2 Unihan Database (kTotalStrokes)#
Source: Unicode Consortium
Coverage: 101,996 CJK unified ideographs (as of Unicode 17.0)
Field: kTotalStrokes
Note: Some errors exist in the data. Cross-reference recommended.
Access Methods:
- Direct download from Unicode.org
- Via libraries (cjklib, Python)
- Through APIs (CCDB)
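The cross-referencing recommended above can be sketched as follows. The parser assumes the tab-separated Unihan layout (`U+XXXX`, field name, value) and takes the first value when a field lists several. The second source is a hypothetical hand-entered dictionary with one deliberately wrong entry so the mismatch report has something to show.

```python
def parse_total_strokes(lines):
    """Map character -> stroke count from Unihan-style kTotalStrokes lines."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        if field != "kTotalStrokes":
            continue
        char = chr(int(codepoint[2:], 16))  # "U+5B66" -> "学"
        counts[char] = int(value.split()[0])  # first value if several are listed
    return counts


def find_mismatches(primary, secondary):
    """Return {char: (primary_count, secondary_count)} where the sources disagree."""
    shared = primary.keys() & secondary.keys()
    return {c: (primary[c], secondary[c]) for c in shared if primary[c] != secondary[c]}


# Illustrative lines in the assumed layout.
unihan_lines = [
    "# comment line",
    "U+4E00\tkTotalStrokes\t1",
    "U+5B66\tkTotalStrokes\t8",
    "U+6C49\tkTotalStrokes\t5",
]
# Hypothetical second source; the count for 汉 is wrong on purpose.
other_source = {"一": 1, "学": 8, "汉": 6}

unihan_counts = parse_total_strokes(unihan_lines)
mismatches = find_mismatches(unihan_counts, other_source)
print(mismatches)
```

Characters flagged this way are candidates for manual review rather than automatic correction, since either source may be the one in error.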
3.3 cjklib (Python Library)#
PyPI: cjklib
Documentation: cjklib.readthedocs.io
Key Features:
- Language routines for Han characters (Chinese, Japanese, Korean, Vietnamese)
- Character pronunciations
- Radical information
- Glyph component analysis
- Stroke decomposition
- Variant information
- Locale-aware stroke counts (simplified vs. traditional)
Important: Stroke counts can vary by locale (traditional vs. simplified Chinese)
Use Cases:
- Building Python-based dictionary tools
- Linguistic analysis
- Character decomposition systems
3.4 KRADFILE/RADKFILE (Kanji Radical Decomposition)#
Maintainer: Electronic Dictionary Research and Development Group (EDRDG)
Website: edrdg.org/krad/kradinf.html
License: EDRDG License
Coverage: 6,355+ kanji (JIS X 0208-1997) + 5,801 (JIS X 0212)
Key Features:
- Kanji decomposition into visual elements/radicals
- Enables radical-based lookup
- KRADFILE: Kanji → Radicals mapping
- RADKFILE: Radicals → Kanji mapping (inverted, used by lookup software)
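The relationship between the two files is an index inversion, which the sketch below illustrates. It assumes KRADFILE-style lines of the form `kanji : part part ...` with `#` comment lines; the sample lines are simplified illustrations, not copied from the real file, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_krad_lines(lines):
    """Map kanji -> list of component parts from 'kanji : part part' lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        kanji, _, parts = line.partition(" : ")
        mapping[kanji] = parts.split()
    return mapping


def invert(kanji_to_parts):
    """Build the inverted index: part -> set of kanji containing it."""
    index = defaultdict(set)
    for kanji, parts in kanji_to_parts.items():
        for part in parts:
            index[part].add(kanji)
    return index


def lookup(index, parts):
    """Return kanji that contain every requested part."""
    sets = [index.get(p, set()) for p in parts]
    return set.intersection(*sets) if sets else set()


# Simplified sample lines for illustration only.
sample = ["# comment", "明 : 日 月", "早 : 日 十", "朝 : 十 日 月"]
index = invert(parse_krad_lines(sample))
print(sorted(lookup(index, ["日", "月"])))
```

Each additional part the user selects intersects another set, which is why lookup tools work from the inverted form.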
Historical Context:
- Initial work by Michael Raine (1994/1995)
- Revised by Jim Breen (1995)
- Extended by Jim Rose (2007)
Use Cases:
- Implement radical-based kanji search
- Component-based learning systems
- Dictionary lookup by visual elements
4. Reference Data#
4.1 Frequency and Stroke Count Tables#
Resource: technology.chtsai.org/charfreq
Available Data:
- Characters sorted by frequency
- Stroke counts for common characters
- Statistical analysis
4.2 Wiktionary Appendix#
Resource: Wiktionary - Chinese total strokes
Features:
- Community-maintained stroke count data
- Free to use
- Multiple character variants
References#
Primary Sources#
- Make Me a Hanzi - Chinese stroke order SVG data
- KanjiVG - Japanese kanji stroke order
- HanziVG - Chinese hanzi stroke order
- animCJK - Multi-language CJK animations
- Hanzi Writer - JavaScript library and data
APIs and Libraries#
- Chinese Character Web API - Unihan-based API
- cjklib - Python library for CJK processing
- ChineseStrokes - Stroke count database
Reference Databases#
- KRADFILE/RADKFILE - Kanji radical decomposition
- Frequency and Stroke Counts - Statistical data
- Wiktionary - Chinese total strokes - Community data
Online Tools#
- strokeorder.info - Animated GIF dictionary
- strokeorder.com - Interactive stroke order
- Hanzi Writer Data Explorer - Browse character data
Document Status: Complete Last Updated: 2026-01-29
S3: Need-Driven
Stroke Order Implementation Guide#
Research ID: research-k6iy
Date: 2026-01-29
Pass: S3 (Need-Driven Application)
Purpose: Practical guidance for implementing stroke order features in educational platforms
1. Implementation Considerations#
1.1 Licensing#
Open Licenses:
- Make Me a Hanzi: Mixed licenses (check repository)
- KanjiVG: CC BY-SA 3.0 (attribution + share-alike)
- animCJK: Open-source (verify specific license)
- KRADFILE: EDRDG License (check restrictions)
Action Items:
- Review license terms before commercial use
- Provide proper attribution
- Comply with share-alike requirements where applicable
1.2 Data Formats#
SVG (Recommended for stroke order):
- Scalable without quality loss
- Embeddable in web/mobile apps
- Supports animation paths
- Lightweight
JSON (Recommended for metadata):
- Easy to parse
- Works with all modern platforms
- Suitable for APIs
GIF (Legacy, limited use):
- Pre-rendered animations
- No customization
- Larger file sizes
1.3 Technical Integration#
For Web Applications:
```javascript
// Example: Hanzi Writer
import HanziWriter from 'hanzi-writer';

const writer = HanziWriter.create('character-target-div', '你', {
  width: 100,
  height: 100,
  padding: 5
});
writer.animateCharacter();
```
For Mobile Applications:
- Embed SVG files directly
- Use native SVG rendering libraries
- Pre-cache common characters for offline use
For Backend Systems:
- cjklib (Python) for character analysis
- Chinese Character Web API for stroke counts
- PostgreSQL with Unihan data for lookups
1.4 Performance Optimization#
Strategies:
- Lazy Loading: Load stroke data only when character is displayed
- Caching: Pre-cache common characters (top 3,000)
- CDN: Serve SVG files from CDN for faster delivery
- Progressive Enhancement: Show static character first, animate on interaction
Estimated Data Sizes:
- Per-character SVG: 2-10 KB
- 1,000 characters: 2-10 MB
- Full dataset (9,000+): 18-90 MB
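Lazy loading and caching can be combined in one small wrapper. The sketch below uses Python's `functools.lru_cache`; `read_stroke_data` is a hypothetical stand-in for whatever actually fetches a character's data (disk, CDN, database), and the `maxsize` of 3000 simply mirrors the "top 3,000" figure above.

```python
from functools import lru_cache

LOAD_CALLS = []  # records real loads so the caching effect is visible


def read_stroke_data(char):
    """Hypothetical slow loader; replace with a file, CDN, or database read."""
    LOAD_CALLS.append(char)
    return {"character": char, "strokes": []}


@lru_cache(maxsize=3000)
def get_stroke_data(char):
    """Load a character on first request, then serve it from memory."""
    return read_stroke_data(char)


get_stroke_data("你")
get_stroke_data("你")  # served from the cache, no second load
get_stroke_data("好")
print(LOAD_CALLS, get_stroke_data.cache_info())
```

Note that the cache hands back the same dictionary object on every hit, so callers should treat the returned data as read-only.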
2. Use Cases for Learning Applications#
2.1 Stroke Order Practice#
Features:
- Display stroke-by-stroke animation
- User traces character with finger/stylus
- Real-time validation of stroke direction and order
- Feedback on accuracy
Data Required:
- SVG stroke paths (from Make Me a Hanzi or KanjiVG)
- Stroke sequence metadata
- Direction vectors
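One simple way to validate stroke direction is to compare the overall direction of the user's stroke with the reference stroke using cosine similarity. This is an illustrative sketch, not how any particular library scores strokes: the function names and the 0.8 threshold are assumptions, and a production check would also compare position, length, and shape.

```python
import math


def direction(points):
    """Overall direction vector from the first to the last point of a stroke."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    return (x1 - x0, y1 - y0)


def cosine_similarity(a, b):
    """Cosine of the angle between two 2D vectors (0.0 if either has no length)."""
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0:
        return 0.0
    return (a[0] * b[0] + a[1] * b[1]) / norm


def stroke_matches(user_points, reference_points, threshold=0.8):
    """True when the user's stroke runs in roughly the reference direction."""
    similarity = cosine_similarity(direction(user_points), direction(reference_points))
    return similarity >= threshold


reference = [(0, 0), (100, 0)]            # horizontal stroke, left to right
forward = [(5, 2), (60, 4), (98, 6)]      # slightly wobbly, same direction
backward = list(reversed(forward))        # same shape drawn right to left

print(stroke_matches(forward, reference), stroke_matches(backward, reference))
```

Order validation then reduces to running this check against the reference stroke at the index the learner is expected to draw next.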
2.2 Dictionary Lookup by Stroke Count#
Features:
- Filter characters by total stroke count
- Combine with radical lookup
- Progressive narrowing (radical + stroke count)
Data Required:
- Stroke count database (Unihan or ChineseStrokes)
- Radical decomposition (KRADFILE)
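The progressive narrowing described above amounts to applying optional filters in sequence. In this sketch the lookup table is a small hand-entered sample and `narrow` is a hypothetical helper; a real implementation would query the stroke count and radical data sources listed above.

```python
def narrow(table, radical=None, total_strokes=None):
    """Return characters matching every filter that was supplied."""
    results = []
    for char, info in table.items():
        if radical is not None and info["radical"] != radical:
            continue
        if total_strokes is not None and info["strokes"] != total_strokes:
            continue
        results.append(char)
    return sorted(results)


# Hand-entered sample data for illustration only.
table = {
    "汰": {"radical": "水", "strokes": 7},
    "汾": {"radical": "水", "strokes": 7},
    "河": {"radical": "水", "strokes": 8},
    "林": {"radical": "木", "strokes": 8},
}

print(narrow(table, radical="水"))
print(narrow(table, radical="水", total_strokes=7))
```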
Example Lookup:
```
User: "Radical 水 (water) + 7 strokes"
Result: 汰, 汲, 汴, 汾 (candidates)
```
2.3 Handwriting Recognition Training#
Features:
- Collect user stroke data
- Train ML models for character recognition
- Validate correct stroke order
Data Required:
- Labeled stroke order sequences
- Variant forms (different handwriting styles)
- Stroke direction and timing
2.4 Gamified Learning#
Features:
- “Draw the character” challenges
- Timed stroke order races
- Achievement badges for stroke accuracy
Engagement Mechanics:
- Progress tracking (characters mastered)
- Leaderboards (speed + accuracy)
- Unlock levels based on stroke complexity
2.5 Adaptive Learning Paths#
Features:
- Start with simple characters (few strokes)
- Progress to complex characters
- Focus on commonly confused characters
Data-Driven Approach:
- Sort characters by stroke count (ascending)
- Track user errors (confusion matrix)
- Recommend practice based on weak points
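The data-driven approach above can be sketched as a ranking function: characters with the highest error rate come first, and ties go to the character with fewer strokes. The function name, the input shapes, and the tie-breaking rule are assumptions made for illustration.

```python
def recommend(attempts, stroke_counts, limit=3, min_attempts=1):
    """Rank characters for review.

    attempts maps char -> (errors, total_attempts);
    stroke_counts maps char -> total strokes.
    """
    scored = []
    for char, (errors, total) in attempts.items():
        if total < min_attempts:
            continue  # not enough data to judge this character
        error_rate = errors / total
        # Negate the rate so the default ascending sort puts the worst first.
        scored.append((-error_rate, stroke_counts.get(char, 0), char))
    scored.sort()
    return [char for _, _, char in scored[:limit]]


# Hypothetical learner history for a set of easily confused characters.
attempts = {"一": (0, 10), "己": (4, 10), "已": (6, 10), "巳": (6, 12)}
stroke_counts = {"一": 1, "己": 3, "已": 3, "巳": 3}

print(recommend(attempts, stroke_counts, limit=2))
```

A fuller confusion matrix would also record which character the learner wrote instead, so that confusable pairs can be practiced together.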
3. Integration with Educational Platforms#
3.1 Docusaurus Integration#
Approach:
- Create MDX components for stroke order display
- Embed Hanzi Writer or animCJK SVGs
- Add interactive quizzes
Example MDX:
```jsx
import StrokeOrder from '@site/src/components/StrokeOrder';

<StrokeOrder character="学" />
```
3.2 QRCards Certificate Integration#
Certificate Fields:
```json
{
  "certification_info": {
    "type": "competency_badge",
    "name": "Hanzi Writing Fundamentals",
    "issued_date": "2026-XX-XX",
    "level": 1
  },
  "skills": {
    "characters_mastered": 500,
    "stroke_accuracy": "95%",
    "writing_speed": "15 chars/min"
  },
  "portfolio_evidence": [
    {
      "name": "Stroke Order Video",
      "url": "example.com/demo"
    }
  ]
}
```
3.3 Learning Path Design#
Beginner Path (8 weeks):
- Week 1-2: Basic strokes (8 types)
- Week 3-4: Simple characters (1-4 strokes)
- Week 5-6: Radicals (the 214 traditional Kangxi radicals)
- Week 7-8: Common characters (200 most frequent)
Intermediate Path (12 weeks):
- Compound characters (5-12 strokes)
- Stroke order rules and exceptions
- Handwriting speed optimization
- Character variants (simplified vs. traditional)
Advanced Path (16 weeks):
- Complex characters (13+ strokes)
- Calligraphy styles (kaishu, xingshu)
- Historical forms
- Error correction (common mistakes)
4. Recommended Tech Stack#
4.1 For Web-Based Learning Apps#
Frontend:
- React/Next.js for UI
- Hanzi Writer for character animations
- SVG.js for custom stroke rendering
Backend:
- Node.js API for character data
- PostgreSQL with Unihan data
- Redis for caching common characters
Data Storage:
- CDN for SVG files (Cloudflare)
- JSON API for metadata
- User progress in database
4.2 For Mobile Apps#
iOS:
- SwiftUI for UI
- Core Graphics or CAShapeLayer to draw stroke paths parsed from the SVG data
- Local SQLite database with stroke data
Android:
- Jetpack Compose for UI
- VectorDrawable (AndroidX) or a third-party SVG library such as AndroidSVG
- Room database for offline data
Cross-Platform:
- React Native + react-native-svg
- Flutter + flutter_svg
5. Example Implementations#
5.1 Web Component (React)#
```jsx
import React, { useEffect, useRef } from 'react';
import HanziWriter from 'hanzi-writer';

const StrokeOrderDisplay = ({ character }) => {
  const targetRef = useRef(null);
  const writerRef = useRef(null);

  useEffect(() => {
    const target = targetRef.current;
    if (!target) return undefined;

    // Create a writer bound to the target element for the current character.
    writerRef.current = HanziWriter.create(target, character, {
      width: 200,
      height: 200,
      padding: 10,
      showOutline: true,
      strokeAnimationSpeed: 1,
      delayBetweenStrokes: 300
    });

    return () => {
      // Drop the old writer and clear its SVG so a character change
      // does not leave a stale drawing stacked in the same element.
      writerRef.current = null;
      target.innerHTML = '';
    };
  }, [character]);

  const handleAnimate = () => {
    writerRef.current?.animateCharacter();
  };

  const handleQuiz = () => {
    writerRef.current?.quiz();
  };

  return (
    <div>
      <div ref={targetRef} />
      <button onClick={handleAnimate}>Animate</button>
      <button onClick={handleQuiz}>Practice</button>
    </div>
  );
};

export default StrokeOrderDisplay;
```
5.2 Backend API (Node.js + Express)#
```javascript
const express = require('express');
const { Pool } = require('pg');

const app = express();
const pool = new Pool({
  connectionString: process.env.DATABASE_URL
});

// Get stroke count for a character
app.get('/api/strokes/:character', async (req, res) => {
  try {
    const { character } = req.params;
    // Codepoint as uppercase hex (e.g. "5B66"); must match how the table stores it
    const codepoint = character.codePointAt(0).toString(16).toUpperCase();
    const result = await pool.query(
      'SELECT stroke_count, radical FROM unihan WHERE codepoint = $1',
      [codepoint]
    );
    if (result.rows.length === 0) {
      return res.status(404).json({ error: 'Character not found' });
    }
    res.json(result.rows[0]);
  } catch (err) {
    // Without this, a failed query would leave the request hanging
    res.status(500).json({ error: 'Lookup failed' });
  }
});

// Search characters by stroke count
app.get('/api/search/strokes/:count', async (req, res) => {
  const count = Number.parseInt(req.params.count, 10);
  if (Number.isNaN(count)) {
    return res.status(400).json({ error: 'Stroke count must be an integer' });
  }
  try {
    const result = await pool.query(
      'SELECT codepoint, character FROM unihan WHERE stroke_count = $1 LIMIT 100',
      [count]
    );
    res.json(result.rows);
  } catch (err) {
    res.status(500).json({ error: 'Search failed' });
  }
});

app.listen(3000, () => {
  console.log('API running on port 3000');
});
```
5.3 Python Stroke Analysis#
```python
from cjklib import characterlookup

# 'C' selects the simplified Chinese locale; stroke counts can differ by locale
cjk = characterlookup.CharacterLookup('C')

# Get stroke count
character = '学'
stroke_count = cjk.getStrokeCount(character)
print(f"Stroke count for {character}: {stroke_count}")

# Get radical forms with their residual stroke counts
radicals = cjk.getCharacterRadicalResidualStrokeCount(character)
print(f"Radicals: {radicals}")

# Find characters by stroke count
chars_with_5_strokes = cjk.getCharactersForStrokeCount(5)
print(f"Characters with 5 strokes: {chars_with_5_strokes[:10]}")
```
6. Testing and Validation#
6.1 Data Quality Checks#
Validation Steps:
- Verify stroke count matches across data sources
- Check SVG files render correctly
- Validate stroke order follows standard conventions
- Test on different screen sizes
Automated Testing:
```javascript
// loadCharacterSVG and getStrokeCount are application-specific helpers
// (not shown here); import them from your own data-access module.
describe('Stroke Order Data', () => {
  test('SVG files exist for common characters', async () => {
    const commonChars = ['的', '一', '是', '不', '了'];
    for (const char of commonChars) {
      const svg = await loadCharacterSVG(char);
      expect(svg).toBeDefined();
      expect(svg).toContain('<path');
    }
  });

  test('Stroke counts match database', async () => {
    const testCases = [
      { char: '一', expectedStrokes: 1 },
      { char: '二', expectedStrokes: 2 },
      { char: '三', expectedStrokes: 3 }
    ];
    for (const { char, expectedStrokes } of testCases) {
      const count = await getStrokeCount(char);
      expect(count).toBe(expectedStrokes);
    }
  });
});
```
6.2 User Experience Testing#
Test Scenarios:
- Stroke animation speed (too fast/slow?)
- Touch responsiveness on mobile
- Accuracy threshold for practice mode
- Feedback clarity (correct/incorrect strokes)
Metrics to Track:
- Animation load time
- Practice completion rate
- User accuracy over time
- Session engagement duration
7. Deployment Checklist#
7.1 Data Preparation#
- Download required datasets (Make Me a Hanzi, KanjiVG, etc.)
- Process SVG files for CDN delivery
- Set up database with Unihan data
- Create character metadata JSON files
- Implement caching strategy
7.2 Infrastructure#
- Set up CDN for SVG files
- Configure API endpoints
- Set up Redis for caching
- Configure database backups
- Set up monitoring and logging
7.3 Integration#
- Test Hanzi Writer integration
- Verify mobile responsiveness
- Test offline functionality
- Validate cross-browser compatibility
- Test performance under load
7.4 Content#
- Create learning path content
- Write exercise instructions
- Prepare quiz questions
- Create tutorial videos (optional)
- Design achievement badges
Document Status: Complete Last Updated: 2026-01-29 Related: See S2-comprehensive for data sources, S4-strategic for roadmap
S4: Strategic
Stroke Order Implementation: Strategic Roadmap#
Research ID: research-k6iy
Date: 2026-01-29
Pass: S4 (Strategic Planning)
Purpose: High-level implementation strategy, research gaps, success metrics, and recommendations
1. Research Gaps and Future Work#
1.1 Missing Coverage#
Gaps:
- Korean Hangul stroke order (limited resources)
- Vietnamese Chu Nom characters
- Historical Chinese variants
- Regional variations in stroke order
Opportunities:
- Crowdsource additional data
- Partner with language institutes
- Expand animCJK coverage
1.2 Quality Improvements#
Needed:
- Error correction in Unihan stroke counts
- Standardization across datasets
- Variant form mapping (simplified ↔ traditional)
- Handwriting style variations
1.3 AI/ML Applications#
Potential:
- Stroke prediction models (next stroke suggestion)
- Handwriting style transfer
- Automated stroke order generation for rare characters
- Personalized difficulty adaptation
2. Implementation Roadmap#
Phase 1: Data Acquisition (Week 1)#
Objectives:
- Secure all required datasets
- Verify licensing compatibility
- Set up local development environment
Tasks:
- Download Make Me a Hanzi dataset
- Clone KanjiVG repository
- Set up local mirror of CCDB API
- Download ChineseStrokes database
- Review license terms for commercial use
Deliverables:
- Local data repository
- License compliance documentation
- Data inventory spreadsheet
Phase 2: Infrastructure Setup (Week 2)#
Objectives:
- Build backend infrastructure
- Set up data pipelines
- Configure hosting and CDN
Tasks:
- Set up PostgreSQL with Unihan data
- Create CDN bucket for SVG files
- Build REST API for character lookup
- Implement caching layer (Redis)
- Configure monitoring and logging
Deliverables:
- API endpoints (stroke count, character lookup)
- CDN with SVG files
- Database with metadata
- Performance monitoring dashboard
Phase 3: Frontend Development (Week 3-4)#
Objectives:
- Build user-facing components
- Implement interactive features
- Ensure mobile responsiveness
Tasks:
- Create Hanzi Writer integration
- Build stroke order visualization component
- Implement practice mode with validation
- Add progress tracking
- Design responsive layouts
- Test cross-browser compatibility
Deliverables:
- React components for stroke order display
- Practice mode with scoring
- Mobile-optimized interface
- User progress tracking system
Phase 4: Content Creation (Week 5-6)#
Objectives:
- Develop learning curriculum
- Create exercises and assessments
- Prepare supporting materials
Tasks:
- Design learning path curriculum
- Write exercises and quizzes
- Create video tutorials (optional)
- Develop grading rubrics
- Design achievement badges
- Write instructional content
Deliverables:
- Structured learning paths (Beginner, Intermediate, Advanced)
- 50+ practice exercises
- Quiz bank (100+ questions)
- Achievement system
- Tutorial videos (if included)
Phase 5: Testing & Launch (Week 7-8)#
Objectives:
- Validate functionality
- Optimize performance
- Launch pilot program
Tasks:
- Beta test with 10 learners
- Collect feedback on UX
- Optimize performance
- Launch pilot learning path
- Monitor initial usage metrics
- Iterate based on feedback
Deliverables:
- Beta test report
- Performance optimization results
- Launch-ready platform
- Initial user feedback summary
3. Success Metrics#
3.1 Engagement Metrics#
Daily Active Users (DAU):
- Target: 50+ users within first month
- Growth rate: 20% month-over-month
Characters Practiced per Session:
- Target: 10-20 characters
- Indicator of engagement depth
Session Duration:
- Target: 15+ minutes average
- Indicates meaningful practice time
Return Rate:
- Target: 40%+ weekly return rate
- Measures habit formation
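To make the engagement targets concrete, the sketch below projects DAU under compounding month-over-month growth and computes a weekly return rate from two sets of user IDs. Both function names are hypothetical, and the return-rate definition (share of last week's users who were active again this week) is an assumption, since the targets above do not define it precisely.

```python
def project_dau(initial, monthly_growth, months):
    """Return projected DAU for each month, starting with the initial value."""
    return [initial * (1 + monthly_growth) ** m for m in range(months)]


def weekly_return_rate(last_week_users, this_week_users):
    """Share of last week's active users who were active again this week."""
    if not last_week_users:
        raise ValueError("last_week_users must not be empty")
    returned = last_week_users & this_week_users
    return len(returned) / len(last_week_users)


# 50 DAU in month 1, growing 20% month-over-month for a year.
projection = project_dau(50, 0.20, 12)

# Two of last week's five users came back this week.
rate = weekly_return_rate({"a", "b", "c", "d", "e"}, {"a", "b", "x"})
```

Under these assumptions, a 20% monthly growth rate takes 50 DAU to roughly 370 by month 12, which is a useful sanity check on infrastructure sizing.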
3.2 Learning Outcome Metrics#
Stroke Accuracy Improvement:
- Baseline: Initial assessment score
- Target: 20%+ improvement after 4 weeks
- Measure: Automated scoring of practice exercises
Character Retention Rate:
- 1 week retention: 70%+ (characters practiced still remembered)
- 1 month retention: 50%+ (long-term memory formation)
- Measure: Periodic review quizzes
Writing Speed Increase:
- Baseline: Characters per minute at start
- Target: 30%+ improvement after 8 weeks
- Measure: Timed writing exercises
Mastery Progression:
- Beginner (1-4 strokes): 80%+ accuracy within 2 weeks
- Intermediate (5-12 strokes): 80%+ accuracy within 6 weeks
- Advanced (13+ strokes): 70%+ accuracy within 12 weeks
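The mastery bands and improvement targets translate directly into code. This sketch assumes the "20%+ improvement" targets are relative to the baseline score (not percentage points), which the text above leaves open; the function names are hypothetical.

```python
def mastery_tier(stroke_count):
    """Classify a character by stroke count using the bands listed above."""
    if stroke_count < 1:
        raise ValueError("stroke_count must be at least 1")
    if stroke_count <= 4:
        return "Beginner"
    if stroke_count <= 12:
        return "Intermediate"
    return "Advanced"


def relative_improvement(baseline, current):
    """Improvement as a fraction of the baseline score (0.25 means +25%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


# A learner who moves from 60% to 75% accuracy has improved by 25% relative
# to baseline, which would meet a 20% relative-improvement target.
improvement = relative_improvement(60, 75)
meets_target = improvement >= 0.20
```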
3.3 Business Metrics#
Learning Path Completion Rate:
- Target: 50%+ completion for enrolled users
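- Benchmark: reported completion rates for online courses vary widely by format, and self-paced open courses often fall well below 30-40%, so treat this target as ambitious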
- Indicates content quality and engagement
Certificate Issuance Volume:
- Target: 20+ certificates in first quarter
- Demonstrates skill achievement
- Marketing value (user testimonials)
User Satisfaction (NPS Score):
- Target: NPS > 40 (good)
- Stretch goal: NPS > 70 (excellent)
- Measure: Post-learning path survey
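For reference, NPS is computed from 0-10 "how likely are you to recommend" ratings as the percentage of promoters (9-10) minus the percentage of detractors (0-6). The function name below is hypothetical and the survey responses are made-up illustration data.

```python
def net_promoter_score(ratings):
    """Return NPS on the -100..100 scale from a list of 0-10 ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Illustrative survey: 6 promoters, 2 passives (7-8), 2 detractors.
sample_ratings = [10, 9, 9, 8, 7, 6, 10, 9, 3, 10]
nps = net_promoter_score(sample_ratings)
```

With only a handful of responses the score swings sharply per respondent, so small pilot cohorts should be read as directional only.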
Cost per Acquisition (CPA):
- Baseline: Track marketing spend
- Target: CPA < $10 for free tier users
- Measure: Marketing spend / new users
Lifetime Value (LTV):
- For paid tiers (if applicable)
- Target: LTV > 3x CPA
- Measure: Average revenue per user over 12 months
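The CPA and LTV targets reduce to two divisions. The sketch below uses the definitions given above (marketing spend divided by new users, and 12-month average revenue per user as LTV); the function names and dollar figures are hypothetical.

```python
def cost_per_acquisition(marketing_spend, new_users):
    """Marketing spend divided by the number of newly acquired users."""
    if new_users <= 0:
        raise ValueError("new_users must be positive")
    return marketing_spend / new_users


def ltv_to_cpa_ratio(avg_revenue_per_user_12m, cpa):
    """Ratio of 12-month revenue per user to acquisition cost."""
    if cpa <= 0:
        raise ValueError("cpa must be positive")
    return avg_revenue_per_user_12m / cpa


# Illustrative numbers: $500 of spend brings in 80 new users,
# and a paying user yields $25 over 12 months.
cpa = cost_per_acquisition(500.0, 80)
ratio = ltv_to_cpa_ratio(25.0, cpa)
meets_cpa_target = cpa < 10.0
meets_ltv_target = ratio > 3.0
```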
3.4 Technical Performance Metrics#
Page Load Time:
- Target: < 2 seconds
- Critical for user experience
API Response Time:
- Stroke count lookup: < 100ms
- Character metadata: < 200ms
CDN Cache Hit Rate:
- Target: > 95% for SVG files
- Reduces bandwidth costs
Error Rate:
- Target: < 0.1% of requests
- Monitoring critical for reliability
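These targets can be checked automatically from raw counters. The sketch below is one possible shape for such a check; evaluating latency at the 95th percentile is an assumption (the targets above do not name a percentile), and the function names, `TARGETS` keys, and sample numbers are hypothetical.

```python
import math

TARGETS = {
    "cache_hit_rate_min": 0.95,            # > 95% for SVG files
    "error_rate_max": 0.001,               # < 0.1% of requests
    "stroke_count_latency_ms_max": 100.0,  # < 100ms for stroke count lookup
}


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 1..100)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_targets(cache_hits, cache_misses, errors, requests, latencies_ms):
    """Return a pass/fail flag per target."""
    hit_rate = cache_hits / (cache_hits + cache_misses)
    error_rate = errors / requests
    p95 = percentile_nearest_rank(latencies_ms, 95)
    return {
        "cache_hit_rate": hit_rate > TARGETS["cache_hit_rate_min"],
        "error_rate": error_rate < TARGETS["error_rate_max"],
        "stroke_count_latency": p95 < TARGETS["stroke_count_latency_ms_max"],
    }


report = check_targets(
    cache_hits=9700,
    cache_misses=300,
    errors=4,
    requests=10000,
    latencies_ms=[12, 15, 20, 18, 95, 30, 22, 17, 14, 16],
)
```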
4. Risk Assessment and Mitigation#
4.1 Technical Risks#
Risk: Data quality issues (incorrect stroke orders)
- Impact: Medium (user confusion, learning incorrect forms)
- Probability: Low (using established datasets)
- Mitigation: Cross-reference multiple sources, community validation
Risk: Performance issues at scale
- Impact: High (poor user experience)
- Probability: Medium (depends on infrastructure)
- Mitigation: Load testing, CDN optimization, caching strategy
Risk: Mobile compatibility issues
- Impact: High (majority of language learners use mobile)
- Probability: Low (tested during development)
- Mitigation: Responsive design, device testing matrix
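The "cross-reference multiple sources" mitigation can start as a simple diff of stroke counts between two datasets. In the sketch below the dictionaries are made-up illustration data (one value is deliberately wrong), and `find_disagreements` is a hypothetical helper. Flagged characters still need human review, since some differences between sources may reflect legitimate regional standards rather than errors.

```python
def find_disagreements(source_a, source_b):
    """Return characters present in both sources whose stroke counts differ."""
    shared = source_a.keys() & source_b.keys()
    return {
        ch: (source_a[ch], source_b[ch])
        for ch in sorted(shared)
        if source_a[ch] != source_b[ch]
    }


# Illustrative data only; the second source has a deliberately wrong count for 火.
source_a = {"永": 5, "水": 4, "火": 4, "木": 4}
source_b = {"永": 5, "水": 4, "火": 5}

flagged = find_disagreements(source_a, source_b)

# Characters covered by only one source cannot be cross-checked at all.
unverified = sorted(source_a.keys() ^ source_b.keys())
```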
4.2 Business Risks#
Risk: Low user adoption
- Impact: High (project viability)
- Probability: Medium (depends on marketing)
- Mitigation: Beta testing, user feedback loops, marketing strategy
Risk: Licensing issues with data sources
- Impact: High (legal liability)
- Probability: Low (verified during Phase 1)
- Mitigation: Legal review, proper attribution, license compliance
Risk: Competition from established platforms
- Impact: Medium (market share)
- Probability: High (Duolingo, Pleco, etc. exist)
- Mitigation: Differentiation strategy, unique features, niche targeting
4.3 Operational Risks#
Risk: Content creation bottleneck
- Impact: Medium (delays launch)
- Probability: Medium (resource-intensive)
- Mitigation: Prioritize core content, phase additional content
Risk: Maintenance burden for data updates
- Impact: Low (gradual degradation)
- Probability: Medium (Unicode updates, new characters)
- Mitigation: Automated data refresh scripts, community contributions
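One way to prioritize the risks above is a simple impact-times-probability score. The Low/Medium/High ratings below are taken from this section; the 1-3 numeric mapping and the multiplication rule are assumptions chosen for illustration, not an established methodology of this report.

```python
LEVEL = {"Low": 1, "Medium": 2, "High": 3}  # assumed numeric mapping

# (risk, impact, probability) as rated in sections 4.1 to 4.3.
RISKS = [
    ("Data quality issues", "Medium", "Low"),
    ("Performance issues at scale", "High", "Medium"),
    ("Mobile compatibility issues", "High", "Low"),
    ("Low user adoption", "High", "Medium"),
    ("Licensing issues with data sources", "High", "Low"),
    ("Competition from established platforms", "Medium", "High"),
    ("Content creation bottleneck", "Medium", "Medium"),
    ("Maintenance burden for data updates", "Low", "Medium"),
]


def rank_risks(risks):
    """Sort risks by impact x probability, highest first (ties keep input order)."""
    scored = [(name, LEVEL[impact] * LEVEL[prob]) for name, impact, prob in risks]
    return sorted(scored, key=lambda item: -item[1])


ranked = rank_risks(RISKS)
top_priority = [name for name, score in ranked if score == ranked[0][1]]
```

Under this scoring, performance at scale, user adoption, and competition tie for the highest priority, which matches the emphasis on load testing and differentiation elsewhere in this document.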
5. Strategic Recommendations#
5.1 Recommended Starting Point#
Minimum Viable Product (MVP):
Web-first approach using Hanzi Writer
- Fastest time to market
- Lowest development cost
- Proven technology stack
Focus on Chinese characters initially
- Largest user base
- Best data availability (Make Me a Hanzi)
- Expand to Japanese/Korean later
Core features only:
- Stroke order animation
- Practice mode with basic validation
- Progress tracking (characters completed)
- Single learning path (Beginner)
Rationale: Validate product-market fit before investing in advanced features.
5.2 Differentiation Strategy#
How to Stand Out:
Integration with existing platforms
- Docusaurus plugin for documentation sites
- Embeddable widgets for blogs/tutorials
- API for third-party apps
Credential-focused
- Issue verifiable certificates (QRCards)
- Portfolio evidence (practice videos)
- LinkedIn-compatible badges
Adaptive learning
- Personalized difficulty adjustment
- Focus on user’s weak points
- Spaced repetition for retention
Community features
- Leaderboards (opt-in)
- Shared progress achievements
- Study groups / cohorts
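The spaced repetition item under adaptive learning can be sketched with a Leitner-style scheduler: a correct answer moves a character to a box with a longer review interval, and a mistake sends it back to the first box. This is one simple scheme among several; the interval values and all names below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Review interval in days for each box (assumed values for illustration).
INTERVALS_DAYS = [1, 2, 4, 8, 16]


@dataclass(frozen=True)
class Card:
    character: str
    box: int
    due: date


def review(card, correct, today):
    """Return the card's new state after a practice attempt."""
    if correct:
        box = min(card.box + 1, len(INTERVALS_DAYS) - 1)
    else:
        box = 0  # mistakes restart the schedule
    return Card(card.character, box, today + timedelta(days=INTERVALS_DAYS[box]))


def due_cards(cards, today):
    """Cards whose review date has arrived."""
    return [card for card in cards if card.due <= today]


start = Card("永", box=0, due=date(2026, 1, 29))
after_success = review(start, correct=True, today=date(2026, 1, 29))
after_failure = review(after_success, correct=False, today=date(2026, 1, 31))
```

Focusing on a learner's weak points falls out of the same structure: characters that keep returning to box 0 are the ones to surface first.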
5.3 Technology Choices#
Recommended Stack:
Frontend: Next.js + React
- Server-side rendering for SEO
- Fast page loads
- Large ecosystem
Stroke Animation: Hanzi Writer
- Battle-tested library
- Active development
- Good documentation
Backend: Node.js + Express + PostgreSQL
- JavaScript everywhere (full-stack)
- PostgreSQL for complex queries (stroke count + radical lookup)
- Redis for caching
Hosting: Vercel (frontend) + Railway (backend)
- Easy deployment
- Auto-scaling
- Good free tiers for MVP
CDN: Cloudflare
- Free tier sufficient for MVP
- Global distribution
- DDoS protection
5.4 Go-to-Market Strategy#
Phase 1: Beta Launch (Weeks 1-4)
- Recruit 10-20 beta testers
- Offer free lifetime access for feedback
- Iterate based on user input
Phase 2: Soft Launch (Weeks 5-8)
- Launch on Product Hunt, Hacker News
- Target language learning communities (Reddit, forums)
- Content marketing (blog posts, tutorials)
Phase 3: Growth (Weeks 9-16)
- SEO optimization for “Chinese stroke order” keywords
- Partnership with language schools/tutors
- Paid ads (Google, Facebook) if budget allows
Phase 4: Scale (Weeks 17+)
- Expand to Japanese and Korean
- Add advanced features (calligraphy styles, handwriting recognition)
- Enterprise sales to educational institutions
5.5 Monetization Options#
Freemium Model (Recommended):
- Free: Basic stroke order practice (200 characters)
- Paid ($5/month): Full character set, certificates, advanced features
One-Time Purchase:
- $29 for lifetime access to full content
- Appeals to serious learners
Enterprise Licensing:
- API access for third-party apps
- White-label for educational institutions
- Custom content for corporate training
6. Alternative Approaches#
6.1 If Limited Resources#
Approach: Start even smaller
- Use Hanzi Writer demo page as MVP
- Embed pre-existing tools (strokeorder.info)
- Focus on content curation, not tech development
- Validate demand before building custom platform
6.2 If Large Budget Available#
Approach: Build comprehensive platform from day one
- Mobile apps (iOS + Android) alongside web
- AI-powered handwriting recognition
- Live tutoring integration
- Gamification with 3D animations
- Multi-language from launch (Chinese + Japanese + Korean)
6.3 If Targeting Niche Audience#
Approach: Specialize deeply
- Focus on calligraphy enthusiasts (not general learners)
- Historical script variants (seal script, clerical script)
- Professional certification for Chinese teachers
- Premium pricing, boutique experience
7. Conclusion#
7.1 Key Takeaways#
- Ecosystem is Mature: Open-source data for CJK stroke order is production-ready (Make Me a Hanzi, KanjiVG)
- Low Barrier to Entry: The Hanzi Writer library makes web integration straightforward (< 1 week MVP)
- Market Validation: Existing platforms (Duolingo, Pleco) indicate real demand for stroke order features
- Differentiation Possible: Credentials, integration, and adaptive learning offer competitive advantage
- Execution Matters: Success depends more on product design and marketing than on data availability
7.2 Recommended Next Steps#
Immediate (This Week):
- Select target language (Chinese recommended)
- Choose data source (Make Me a Hanzi, as used by Hanzi Writer, for the easiest start)
- Prototype stroke order component (1 day)
- Show to 3-5 potential users for feedback
Short-term (Weeks 2-4):
- Build MVP with core features only
- Beta test with 10 users
- Validate product-market fit
Medium-term (Months 2-3):
- Launch publicly
- Iterate based on usage data
- Expand content and features
Long-term (Months 4-12):
- Scale to additional languages
- Add advanced features (AI recognition, calligraphy)
- Explore monetization strategies
7.3 Critical Success Factors#
- User Experience: Stroke animation must be smooth and intuitive
- Content Quality: Learning paths must be well-structured and effective
- Performance: Fast load times critical for mobile learners
- Engagement: Gamification and progress tracking keep users coming back
- Differentiation: Clear value proposition vs. existing platforms
7.4 Final Recommendation#
Start with Hanzi Writer for web-based Chinese stroke order practice.
- Fastest path to MVP
- Proven technology
- Best data availability
- Largest potential user base
- Expandable to Japanese/Korean later with additional data sources (e.g., KanjiVG for Japanese)
Once product-market fit is validated, invest in:
- Mobile apps
- Advanced features (AI recognition)
- Multi-language expansion
- Enterprise features
The data is ready. The tools exist. Existing products point to real demand. Success depends on execution.
Document Status: Complete Last Updated: 2026-01-29 Related: See S1-rapid for quick start, S2-comprehensive for data sources, S3-need-driven for implementation details