pytsv package¶
Submodules¶
pytsv.configs module¶
configuration for this project
- class pytsv.configs.ConfigAggregateColumns[source]¶
Bases:
Config
Parameters to select which columns to aggregate
- aggregate_columns = []¶
- class pytsv.configs.ConfigBucketNumber[source]¶
Bases:
Config
Parameters to configure the bucket number for a histogram
- bucket_number = 10¶
- class pytsv.configs.ConfigCheckUnique[source]¶
Bases:
Config
Configure whether or not to check a column for uniqueness
- check_unique = True¶
- class pytsv.configs.ConfigColumn[source]¶
Bases:
Config
Parameters to select which column to work on
- column = <pytconf.param.Unique object>¶
- class pytsv.configs.ConfigColumns[source]¶
Bases:
Config
Parameters to select which columns to use
- columns = []¶
- class pytsv.configs.ConfigCsvToTsv[source]¶
Bases:
Config
Parameters to control the CSV to TSV conversion process
- check_num_fields = True¶
- replace_tabs_with_spaces = True¶
- set_max = True¶
- class pytsv.configs.ConfigFixTypes[source]¶
Bases:
Config
Parameters to control which fixes to apply to a TSV file.
- clean_edges = True¶
- lower_case = True¶
- remove_non_ascii = True¶
- sub_trailing = True¶
- class pytsv.configs.ConfigFloatingPoint[source]¶
Bases:
Config
Parameters to select whether to work with floating point or not
- floating_point = True¶
- class pytsv.configs.ConfigInputFile[source]¶
Bases:
Config
Parameters to specify input file
- input_file = <pytconf.param.Unique object>¶
- class pytsv.configs.ConfigInputFiles[source]¶
Bases:
Config
Parameters to specify input files
- input_files = []¶
- class pytsv.configs.ConfigJoin[source]¶
Bases:
Config
Parameters to configure a TSV join operation
- hash_file = <pytconf.param.Unique object>¶
- hash_key_column = <pytconf.param.Unique object>¶
- hash_value_column = <pytconf.param.Unique object>¶
- input_key_column = <pytconf.param.Unique object>¶
- output_add_unknown = False¶
- output_insert_column = <pytconf.param.Unique object>¶
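The field names above suggest a lookup-table (hash) join. The following is only a minimal sketch of that idea, not pytsv's implementation: it assumes the column parameters are zero-based integer indices, that `output_add_unknown` controls whether unmatched rows are kept, and it introduces a hypothetical `hash_join` helper and `unknown_value` placeholder that do not appear in the package.

```python
from typing import Dict, Iterable, Iterator, List


def hash_join(
    input_rows: Iterable[List[str]],
    hash_rows: Iterable[List[str]],
    input_key_column: int,
    hash_key_column: int,
    hash_value_column: int,
    output_insert_column: int,
    output_add_unknown: bool = False,
    unknown_value: str = "unknown",  # hypothetical placeholder, not from pytsv
) -> Iterator[List[str]]:
    """Join input rows against a key -> value table built from hash_rows."""
    # Build the in-memory lookup table from the "hash file" rows.
    lookup: Dict[str, str] = {
        row[hash_key_column]: row[hash_value_column] for row in hash_rows
    }
    for row in input_rows:
        key = row[input_key_column]
        if key in lookup:
            value = lookup[key]
        elif output_add_unknown:
            # Assumption: unmatched rows are kept with a placeholder value.
            value = unknown_value
        else:
            # Assumption: unmatched rows are dropped otherwise.
            continue
        out = list(row)
        out.insert(output_insert_column, value)
        yield out


hash_rows = [["1", "alice"], ["2", "bob"]]
input_rows = [["1", "10.5"], ["3", "7.0"]]

joined = list(hash_join(input_rows, hash_rows, 0, 0, 1, 1))
joined_with_unknown = list(
    hash_join(input_rows, hash_rows, 0, 0, 1, 1, output_add_unknown=True)
)
print(joined)
print(joined_with_unknown)
```

Building the lookup table once keeps the join a single pass over the input rows, at the cost of holding the whole hash file in memory.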
- class pytsv.configs.ConfigMajority[source]¶
Bases:
Config
Configure the parameters for the majority algorithm
- input_first_column = <pytconf.param.Unique object>¶
- input_multiplication_column = <pytconf.param.Unique object>¶
- input_second_column = <pytconf.param.Unique object>¶
- class pytsv.configs.ConfigMatchColumns[source]¶
Bases:
Config
Parameters to select which columns to match by
- match_columns = []¶
- class pytsv.configs.ConfigNumFields[source]¶
Bases:
Config
Parameter to configure the number of fields in a TSV file
- num_fields = None¶
- class pytsv.configs.ConfigOutputFile[source]¶
Bases:
Config
Parameters to configure the output file
- output_file = <pytconf.param.Unique object>¶
- class pytsv.configs.ConfigParallel[source]¶
Bases:
Config
Parameters to configure how things should run in parallel
- jobs = 8¶
- parallel = False¶
- class pytsv.configs.ConfigPattern[source]¶
Bases:
Config
Parameters to configure patterns of files generated
- final_pattern = '{key}.tsv.gz'¶
- pattern = '{key}_{i:04d}.tsv.gz'¶
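Both defaults are ordinary Python format strings. As a small illustration, assuming `key` is a string and `i` is an integer counter (how pytsv actually supplies these values is not documented here, and the `expand` helper is hypothetical), they expand as follows:

```python
# The two default pattern strings listed above.
pattern = "{key}_{i:04d}.tsv.gz"
final_pattern = "{key}.tsv.gz"


def expand(template: str, **fields) -> str:
    """Hypothetical helper: fill a pattern using str.format."""
    return template.format(**fields)


partial_name = expand(pattern, key="users", i=7)
final_name = expand(final_pattern, key="users")

print(partial_name)  # users_0007.tsv.gz
print(final_name)    # users.tsv.gz
```

The `04d` format spec zero-pads the counter to four digits, so generated names sort lexicographically in numeric order up to 9999.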
- class pytsv.configs.ConfigProgress[source]¶
Bases:
Config
Parameters to control progress reporting
- progress = True¶
- class pytsv.configs.ConfigReplace[source]¶
Bases:
Config
Configure whether you want replacements or not
- replace = False¶
- class pytsv.configs.ConfigSampleByColumnOld[source]¶
Bases:
Config
Parameters to configure the old sample by column algorithm
- hits_mode = False¶
- class pytsv.configs.ConfigSampleByTwoColumns[source]¶
Bases:
Config
Parameters for the sample by column command
- group_column = <pytconf.param.Unique object>¶
- class pytsv.configs.ConfigSampleColumn[source]¶
Bases:
Config
Configuration options for sampling
- sample_column = <pytconf.param.Unique object>¶
- class pytsv.configs.ConfigSampleSize[source]¶
Bases:
Config
Configure sample size
- size = <pytconf.param.Unique object>¶
- class pytsv.configs.ConfigTree[source]¶
Bases:
Config
Parameters to configure the tree to show
- child_column = <pytconf.param.Unique object>¶
- parent_column = <pytconf.param.Unique object>¶
- roots = []¶
pytsv.core module¶
- class pytsv.core.TsvReader(filename: str, mode: str = 'rt', use_any_format: bool = True, validate_all_lines_same_number_of_fields: bool = True, num_fields: int | None = None, skip_comments: bool = False, check_non_ascii: bool = False, newline: str | None = '\n')[source]¶
Bases:
object
- class pytsv.core.TsvWriter(filename: str, mode: str = 'wt', throw_exceptions: bool = False, sanitize: bool = True, fields_to_clean: List[int] | None = None, clean_edges: bool = True, sub_trailing: bool = True, remove_non_ascii: bool = True, lower_case: bool = True, check_num_fields: bool = True, num_fields: int | None = None, convert_to_string: bool = True, do_gzip: bool = False, filename_detect: bool = True)[source]¶
Bases:
object
- pytsv.core.clean(text: str, clean_edges: bool = True, sub_trailing: bool = True, remove_non_ascii: bool = True, lower_case: bool = True) str [source]¶
- pytsv.core.do_aggregate(input_file_names: Iterable[str], match_columns: List[int], aggregate_columns: List[int], output_file_name: str, floating_point: bool) None [source]¶
This function aggregates a bunch of input files by integers.
- Parameters: input_file_names, match_columns, aggregate_columns, output_file_name, floating_point
- Returns: None
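The docstring does not say which aggregation is applied. The sketch below assumes summation: rows are grouped by the values in `match_columns` and the values in `aggregate_columns` are summed per group. It works on in-memory rows rather than files, and `aggregate_rows` is a hypothetical name, not part of pytsv.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple, Union

Number = Union[int, float]


def aggregate_rows(
    rows: Iterable[List[str]],
    match_columns: List[int],
    aggregate_columns: List[int],
    floating_point: bool,
) -> List[List[str]]:
    """Group rows by match_columns and sum aggregate_columns per group."""
    # Parse values as float or int depending on the flag.
    convert = float if floating_point else int
    totals: Dict[Tuple[str, ...], List[Number]] = defaultdict(
        lambda: [convert(0)] * len(aggregate_columns)
    )
    for row in rows:
        key = tuple(row[i] for i in match_columns)
        sums = totals[key]
        for pos, col in enumerate(aggregate_columns):
            sums[pos] += convert(row[col])
    # One output row per group: key fields followed by the sums.
    return [list(key) + [str(v) for v in sums] for key, sums in totals.items()]


rows = [["a", "1"], ["b", "2"], ["a", "3"]]
as_ints = aggregate_rows(rows, [0], [1], floating_point=False)
as_floats = aggregate_rows(rows, [0], [1], floating_point=True)
print(as_ints)    # [['a', '4'], ['b', '2']]
print(as_floats)  # [['a', '4.0'], ['b', '2.0']]
```

Groups come out in first-seen order because Python dictionaries preserve insertion order.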
pytsv.main module¶
- class pytsv.main.JobInfo(check_not_ascii: bool, input_file: str, serial: int, progress: bool, pattern: str, columns: List[int])[source]¶
Bases:
object
- check_not_ascii: bool¶
- columns: List[int]¶
- input_file: str¶
- pattern: str¶
- progress: bool¶
- serial: int¶
- class pytsv.main.JobReturnValue(serial: int, files: Dict[str, str])[source]¶
Bases:
object
- files: Dict[str, str]¶
- serial: int¶
- class pytsv.main.MyEventTypes(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
Enum
- key_found = 1¶
- key_not_found = 0¶
- unknown_added = 2¶
- pytsv.main.check() None [source]¶
TODO: add the ability to report how many lines are bad and print their content
- pytsv.main.check_file(params_for_job: ParamsForJob) bool [source]¶
- pytsv.main.majority() None [source]¶
This means that if x1 appears with y2 more often than with any other value in column Y, then (x1, y2) will be in the output and no other entry with x1 will appear.
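The rule described above can be sketched as follows. This is an illustration under assumptions, not the code behind `pytsv.main.majority`: `majority_pairs` is a hypothetical function, columns are assumed to be zero-based indices, the optional multiplication column is assumed to hold an integer weight for each row, and ties are resolved in favour of the value seen first.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def majority_pairs(
    rows: Iterable[List[str]],
    first_column: int,
    second_column: int,
    multiplication_column: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """For each x, keep only the y it co-occurs with most often."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        # Assumption: the multiplication column is a per-row integer weight.
        weight = 1
        if multiplication_column is not None:
            weight = int(row[multiplication_column])
        counts[row[first_column]][row[second_column]] += weight
    # most_common(1) returns the single highest-count (y, count) pair.
    return [(x, c.most_common(1)[0][0]) for x, c in counts.items()]


rows = [
    ["x1", "y1", "1"],
    ["x1", "y2", "3"],
    ["x2", "y1", "2"],
    ["x1", "y1", "1"],
]
result = majority_pairs(rows, 0, 1, multiplication_column=2)
print(result)  # [('x1', 'y2'), ('x2', 'y1')]
```

Here x1 is seen with y1 at total weight 2 and with y2 at weight 3, so only (x1, y2) survives.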
- pytsv.main.process_single_file(job_info: JobInfo) JobReturnValue [source]¶
pytsv.static module¶
version which can be consumed from within the module