Commit 251b89f

Update dev documentation
1 parent 76068bd commit 251b89f

4 files changed

+54
-41
lines changed

‎.npmignore

+3
@@ -11,3 +11,6 @@ utils
 .eslintrc
 .prettierignore
 .prettierrc
+
+tsconfig.json
+.gitattributes

‎Readme.md

+2-21
@@ -81,27 +81,8 @@ To summary in one sentence:
 
 [More Benchmark Information](./docs/benchmark.md)
 
---
+---
 
 ## Developer
 
-```sh
-# Install
-yarn
-
-# Build
-yarn build
-
-# Test
-yarn test
-
-# Lint / Auto-fix code style problems
-yarn lint
-
-# Optional, used to generate src/profiles/* data from language dataset
-# Warning: This step is time consuming and require to install big datasets (described in ./docs/dev.md)
-yarn train
-
-# Optional, used to generate benchmark data/bench/*
-yarn bench
-```
+You want to **Contribute** or **Open a PR**, it's recommend to take a look [at the dev documentation](./docs/dev.md)

‎docs/dev.md

+48-18
@@ -1,30 +1,60 @@
 # Development
 
-## Setup
-
-To be able to train the model
-
-- Download the [Tatoeba sentence export](https://downloads.tatoeba.org/exports/sentences.tar.bz2)
-- Extract in `data/tatoeba.csv`
-
-- Download the [UDHR](https://unicode.org/udhr/assemblies/udhr_txt.zip)
-- Extract in `data/udhr/`
-
 ## Commands
 
 ```sh
-# install deps
+# Install
 yarn
 
-# train and generate language profiles
-yarn train
-
-# build the library
+# Build
 yarn build
 
-# code style linting
+# Test
+yarn test
+
+# Lint / Auto-fix code style problems
 yarn lint
+```
 
-# test
-yarn test
+---
+
+## Install issues
+
+For the moment the library has lot of dev-dependencies purely for the benchmark process.
+Some of those libraries need to compile native code, which can be problematic (gcc, gyp, python, ...)
+
+If you run into those issues, one of the easiest solution is to remove the problematic dependencies from `package.json` then try again to install.
+
+[like here](https://github.com/komodojp/tinyld/issues/10#issuecomment-1019085476)
+
+It will only cause issue with `yarn bench`, but everything else should still work normally
+
+---
+
+## Optional
+
+### 1. Generate profiles (`yarn train`)
+
+This step require lot of data and time, so it's optional and the result are store directly in git.
+
+This will analyse lot fo text in different language and build statistics to be able to identify the best features for each language
+
+To be able to train the model, you will need first to have the dataset locally
+
+```
+Download Datasets
+- Download the [Tatoeba sentence export](https://downloads.tatoeba.org/exports/sentences.tar.bz2)
+- Extract in `data/tatoeba.csv`
+- Download the [UDHR](https://unicode.org/udhr/assemblies/udhr_txt.zip)
+- Extract in `data/udhr/`
+
+Run yarn train
+- For each language, it will build statistics for words and n-grams
+- This goes through massive amount of data and will take time, prepare few coffee
+
+When your profile files are generated, you can run `yarn build` and you will have a build with those new data
 ```
+
+### 2. Generate benchmark data (`yarn bench`)
+
+This step require a bit of time, it will run lot of different test for a set of libraries to generate the benchmark page and diagrams.

‎package.json

+1-2
@@ -67,8 +67,7 @@
 "test:unit": "uvu tests",
 "test:dependencies": "yarn audit --level high || echo \"Run 'yarn update' to interactively update dependencies for this project\"",
 "test:lint": "eslint --ext .js,.ts ./ && prettier --config .prettierrc --ignore-path .prettierignore --check \"**/*.{ts,js}\"",
-"test:types": "tsc --noEmit",
-"update": "yarn upgrade-interactive"
+"test:types": "tsc --noEmit"
 },
 "devDependencies": {
 "@types/node": "^16.4.13",

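Note on the "Install issues" section added to `docs/dev.md`: it suggests removing the problematic dev-dependencies from `package.json` and installing again. A minimal sketch of that edit on a parsed `package.json` follows. The helper `dropDevDependencies` and the package name `some-native-lib` are hypothetical and are not part of this commit or of the project.

```typescript
// Hypothetical helper (not part of this commit): returns a copy of a parsed
// package.json without the named devDependencies. The input is not mutated.
type PackageJson = {
  devDependencies?: Record<string, string>;
  [key: string]: unknown;
};

function dropDevDependencies(pkg: PackageJson, names: string[]): PackageJson {
  const remaining: Record<string, string> = {};
  for (const [name, version] of Object.entries(pkg.devDependencies ?? {})) {
    // Keep every dependency that was not listed for removal.
    if (!names.includes(name)) remaining[name] = version;
  }
  return { ...pkg, devDependencies: remaining };
}

// "some-native-lib" is a placeholder for whichever package fails to compile.
const example: PackageJson = {
  name: "demo",
  devDependencies: { "@types/node": "^16.4.13", "some-native-lib": "^1.0.0" },
};
const cleaned = dropDevDependencies(example, ["some-native-lib"]);
console.log(Object.keys(cleaned.devDependencies ?? {}));
```

After writing the cleaned object back to `package.json`, running `yarn` again should skip the removed packages. According to the added documentation, only `yarn bench` is expected to be affected.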