Commit b18d4b6

Update README.md
1 parent 45c42ae commit b18d4b6

1 file changed, +2 -2 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 # Schema Matching by XGboost
-Using XGboost to perform schema matching task on tables. Support multi-language column names matching and can be used without column names.
+Using XGboost to perform schema matching task on tables. Support multi-language column names matching and can be used without column names. Both csv and json file type are supported.

 ## What is schema matching?

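The sentence added in this hunk says both csv and json inputs are accepted. As a rough illustration only, the sketch below shows one way a caller might locate the table pair inside the `-p` folder. The helper name `find_table_pair` and the choice to try csv before json are assumptions made for this example; the repository's own loading code is not shown in this commit.

```python
from pathlib import Path


def find_table_pair(folder):
    """Return (Table1, Table2) paths that share one extension.

    Hypothetical helper: tries ".csv" first, then ".json" (the order is an
    assumption, not something the README states).
    """
    folder = Path(folder)
    for ext in (".csv", ".json"):
        table1 = folder / f"Table1{ext}"
        table2 = folder / f"Table2{ext}"
        # Both files of the pair must exist with the same extension.
        if table1.is_file() and table2.is_file():
            return table1, table2
    raise FileNotFoundError(
        f"{folder} must contain Table1.csv and Table2.csv, "
        "or Table1.json and Table2.json"
    )
```

A folder holding only `Table1.csv` and `Table2.json` would raise `FileNotFoundError` here, since the README wording describes matching pairs.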
@@ -44,7 +44,7 @@ python cal_column_similarity.py -p Test\ Data/self -m model/2022-04-11-17-10-11
 python cal_column_similarity.py -p Test\ Data/authors -m model/2022-04-11-17-10-11 -t 0.9
 ```
 Parameters:
-- -p: Path to test data folder, must contain "Table1.csv" and "Table2.csv"
+- -p: Path to test data folder, must contain **"Table1.csv" and "Table2.csv" or "Table1.json" and "Table2.json"**.
 - -m: Path to trained model folder, which must contain at least one pair of ".model" file and ".threshold" file.
 - -t: Threshold, you can use this parameter to specify threshold value, suggest 0.9 for easy matching(column name very similar). Default value is calculated from training data, which is around 0.15-0.2. This value is used for difficult matching(column name masked or very different).
 - -s: Strategy, there are two options: "one-to-one" and "one-to-many". "one-to-one" means that one column can only be matched to one column. "one-to-many" means that there is no restrictions. Default is "one-to-many".
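To make the `-t` and `-s` descriptions concrete, here is a minimal sketch of how a threshold and the two strategies could act on a set of column-pair scores. It is not the repository's implementation: the function name `select_matches`, the `>=` comparison against the threshold, and the greedy highest-score-first rule for "one-to-one" are all assumptions chosen for illustration.

```python
def select_matches(scores, threshold=0.9, strategy="one-to-many"):
    """Pick column pairs whose score reaches the threshold.

    scores: dict mapping (column_in_table1, column_in_table2) -> similarity.
    Hypothetical helper; the greedy one-to-one rule below is an assumption.
    """
    # Keep pairs at or above the threshold, best score first.
    candidates = sorted(
        ((score, a, b) for (a, b), score in scores.items() if score >= threshold),
        key=lambda item: -item[0],
    )
    if strategy == "one-to-many":
        # No restriction: every pair above the threshold is reported.
        return [(a, b, score) for score, a, b in candidates]
    if strategy != "one-to-one":
        raise ValueError(f"unknown strategy: {strategy!r}")

    # One-to-one: each column may appear in at most one reported pair.
    used_left, used_right, matches = set(), set(), []
    for score, a, b in candidates:
        if a not in used_left and b not in used_right:
            matches.append((a, b, score))
            used_left.add(a)
            used_right.add(b)
    return matches


# Made-up scores, for illustration only.
scores = {
    ("id", "author_id"): 0.95,
    ("id", "paper_id"): 0.92,
    ("name", "author_name"): 0.97,
    ("name", "title"): 0.30,
}
print(select_matches(scores, threshold=0.9, strategy="one-to-many"))
print(select_matches(scores, threshold=0.9, strategy="one-to-one"))
```

With these made-up scores, "one-to-many" keeps both `id` pairs, while "one-to-one" keeps only the higher-scoring one because `id` has already been used.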
