Auto-detect CSV configuration
Add an "Autodetect" button to the CSV configuration section of the form. A JavaScript function inspects the text in the csvText text area and tries to automatically detect the following:
- separator char
- quote char
- escape char
- decimal mark
Sensible defaults to start from are tab for the separator, " for the quote char, \ for the escape char and . for the decimal mark.
Separator char
Simple version: try tab, comma (,), space and pipe (|) in this order and pick the first candidate for which the first 10 rows (or fewer) all split into the same number of columns.
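A minimal sketch of the simple version. The function name, its parameters and the default candidate list (tab, comma, space, pipe) are assumptions for illustration. It also assumes that a real separator must produce more than one column, since otherwise any character that is absent from the text would trivially give one column per row. Quoting is ignored at this stage.

```javascript
// Hypothetical helper; name, signature and defaults are illustrative only.
function detectSeparatorSimple(text, candidates = ["\t", ",", " ", "|"], maxRows = 10) {
  // Use the first maxRows non-empty lines (fewer if the text is shorter).
  const rows = text
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxRows);
  if (rows.length === 0) return null;

  for (const sep of candidates) {
    const counts = rows.map((row) => row.split(sep).length);
    const first = counts[0];
    // Assumption: a separator yields more than one column, and the
    // column count is identical for every sampled row.
    if (first > 1 && counts.every((n) => n === first)) return sep;
  }
  return null; // no candidate fits; the caller can fall back to a default
}
```

For example, `detectSeparatorSimple("a,b,c\n1,2,3")` returns `","`, and a text without any candidate character returns `null`.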
Text analysis version: build a map of character occurrences for each of the first few lines. It is unlikely that a letter or digit ([a-zA-Z0-9]) would be used as the separator, so those can be ignored. The character whose number of occurrences is most similar across all rows is most likely the separator.
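One possible reading of the text analysis version, as a sketch. "Most similar number of occurrences" is interpreted here as the lowest variance of the per-row counts, with ties broken by the higher mean. That metric, the requirement that the character appears in every sampled row, and the function name are all assumptions.

```javascript
// Hypothetical helper; the similarity metric (variance) is an assumption.
function detectSeparatorByCounts(text, maxRows = 10) {
  const rows = text
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxRows);
  if (rows.length === 0) return null;

  // One map per row: character -> number of occurrences in that row.
  const rowMaps = rows.map((row) => {
    const counts = new Map();
    for (const ch of row) {
      if (/[a-zA-Z0-9]/.test(ch)) continue; // letters and digits are ignored
      counts.set(ch, (counts.get(ch) || 0) + 1);
    }
    return counts;
  });

  let best = null;
  for (const ch of rowMaps[0].keys()) {
    const counts = rowMaps.map((m) => m.get(ch) || 0);
    // Assumption: a separator appears in every sampled row.
    if (counts.some((n) => n === 0)) continue;
    const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
    const variance =
      counts.reduce((a, n) => a + (n - mean) ** 2, 0) / counts.length;
    if (
      best === null ||
      variance < best.variance ||
      (variance === best.variance && mean > best.mean)
    ) {
      best = { ch, variance, mean };
    }
  }
  return best ? best.ch : null;
}
```

Unlike the simple version, this does not need a fixed candidate list, so it can also find separators such as `;`.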
Quote char
After selecting the separator, extract all column values of the first 10 rows.
Simple version: test whether " or ' appears as both the first and the last character of the column values.
Text analysis version: build a character occurrence map of the first and last characters of all column values, counting a value only when its first and last characters match. It is again unlikely that [a-zA-Z0-9] would be used as the quote char. The character that appears most often as both the first and last character is likely the quote char.
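Both quote-char variants can be sketched with one function. The name `detectQuoteChar` and its parameters are hypothetical. Passing a `candidates` array (for example `"` and `'`) gives the simple version, and omitting it gives the text analysis version. Rows are split naively on the separator, which is an assumption that breaks down when quoted values themselves contain the separator.

```javascript
// Hypothetical helper; name and signature are illustrative only.
function detectQuoteChar(text, separator, candidates = null, maxRows = 10) {
  // Column values of the first maxRows non-empty rows (naive split).
  const values = text
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxRows)
    .flatMap((row) => row.split(separator));

  const counts = new Map();
  for (const value of values) {
    if (value.length < 2) continue; // needs an opening and a closing char
    const first = value[0];
    const last = value[value.length - 1];
    if (first !== last) continue; // first and last must match
    if (/[a-zA-Z0-9]/.test(first)) continue; // unlikely quote chars
    if (candidates && !candidates.includes(first)) continue;
    counts.set(first, (counts.get(first) || 0) + 1);
  }

  // Pick the character that most often wraps a value.
  let best = null;
  let bestCount = 0;
  for (const [ch, n] of counts) {
    if (n > bestCount) {
      best = ch;
      bestCount = n;
    }
  }
  return best; // null when no value looks quoted
}
```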
Escape char
After selecting the separator and quote char, extract all column values of the first 10 rows that start and end with the quote char. Build an occurrence map of the character immediately before any occurrence of the quote char within the column values. The most frequent character in that map is the likely escape char.
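A sketch of the escape-char step, assuming the column values have already been extracted into an array. The function name is hypothetical. Two further assumptions: only quote chars strictly inside the enclosing quotes are inspected, and the most frequent preceding character is taken as the escape char.

```javascript
// Hypothetical helper; operates on already extracted column values.
function detectEscapeChar(values, quoteChar) {
  const counts = new Map();
  for (const value of values) {
    // Only values that start and end with the quote char are considered.
    if (value.length < 2) continue;
    if (!value.startsWith(quoteChar) || !value.endsWith(quoteChar)) continue;

    // Assumption: strip the enclosing quotes and look only at the inside.
    const inner = value.slice(1, -1);
    for (let i = 1; i < inner.length; i++) {
      if (inner[i] !== quoteChar) continue;
      const before = inner[i - 1];
      counts.set(before, (counts.get(before) || 0) + 1);
    }
  }

  let best = null;
  let bestCount = 0;
  for (const [ch, n] of counts) {
    if (n > bestCount) {
      best = ch;
      bestCount = n;
    }
  }
  return best; // null when no quoted value contains an inner quote char
}
```

When nothing is found, the caller can keep the default `\`. Note that with doubled quotes (`""`) this approach tends to report the quote char itself as the escape char.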
Decimal mark
Note: the decimal mark appears at most once in a number.
Extract all column values of the first 10 rows that contain at least one digit. Build a character occurrence map of all non-digit characters, ignoring +, - and letters ([a-zA-Z]). The decimal mark is the character that appears at most once in every column value that contains digits.
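A sketch of the decimal-mark step, again on an array of already extracted column values. The function name is hypothetical. Since a character that never occurs also occurs "at most once", the sketch additionally assumes that the decimal mark must occur in at least one value, and it prefers the character found in the most values.

```javascript
// Hypothetical helper; tie-breaking by number of values is an assumption.
function detectDecimalMark(values) {
  // Per character: number of values containing it, and the highest
  // number of occurrences seen within a single value.
  const stats = new Map();
  for (const value of values) {
    if (!/[0-9]/.test(value)) continue; // only values with a digit

    const perValue = new Map();
    for (const ch of value) {
      if (/[0-9a-zA-Z+\-]/.test(ch)) continue; // digits, letters, + and -
      perValue.set(ch, (perValue.get(ch) || 0) + 1);
    }
    for (const [ch, n] of perValue) {
      const s = stats.get(ch) || { valueCount: 0, maxPerValue: 0 };
      s.valueCount += 1;
      s.maxPerValue = Math.max(s.maxPerValue, n);
      stats.set(ch, s);
    }
  }

  let best = null;
  let bestCount = 0;
  for (const [ch, s] of stats) {
    // A decimal mark never appears more than once in one value.
    if (s.maxPerValue > 1) continue;
    if (s.valueCount > bestCount) {
      best = ch;
      bestCount = s.valueCount;
    }
  }
  return best; // null when undecided; the caller can default to "."
}
```

With values such as `1,234,567.89` the comma is ruled out because it occurs twice in one value. A sample containing only values like `1,234.5` stays ambiguous, so a default is still needed.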