pytsv package¶

Submodules¶

pytsv.configs module¶

configuration for this project

class pytsv.configs.ConfigAggregateColumns[source]¶

Bases: Config

Parameters to select which columns to aggregate

aggregate_columns = []¶

class pytsv.configs.ConfigBucketNumber[source]¶

Bases: Config

Parameters to configure the bucket number for a histogram

bucket_number = 10¶

class pytsv.configs.ConfigCheckUnique[source]¶

Bases: Config

Configure whether or not to check a column for uniqueness

check_unique = True¶

class pytsv.configs.ConfigColumn[source]¶

Bases: Config

Parameters to select which column to work on

column = <pytconf.param.Unique object>¶

class pytsv.configs.ConfigColumns[source]¶

Bases: Config

Parameters to select which columns to use

columns = []¶

class pytsv.configs.ConfigCsvToTsv[source]¶

Bases: Config

Parameters to control the CSV to TSV conversion process

check_num_fields = True¶

replace_tabs_with_spaces = True¶

set_max = True¶

class pytsv.configs.ConfigFixTypes[source]¶

Bases: Config

Parameters to control which fixes to apply to a TSV file.

clean_edges = True¶

lower_case = True¶

remove_non_ascii = True¶

sub_trailing = True¶

class pytsv.configs.ConfigFloatingPoint[source]¶

Bases: Config

Parameters to select whether to work with floating point or not

floating_point = True¶

class pytsv.configs.ConfigInputFile[source]¶

Bases: Config

Parameters to specify input file

input_file = <pytconf.param.Unique object>¶

class pytsv.configs.ConfigInputFiles[source]¶

Bases: Config

Parameters to specify input files

input_files = []¶

class pytsv.configs.ConfigJoin[source]¶

Bases: Config

Parameters to configure a TSV join operation

hash_file = <pytconf.param.Unique object>¶

hash_key_column = <pytconf.param.Unique object>¶

hash_value_column = <pytconf.param.Unique object>¶

input_key_column = <pytconf.param.Unique object>¶

output_add_unknown = False¶

output_insert_column = <pytconf.param.Unique object>¶
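One plausible reading of these parameters is a lookup join: the hash file supplies key/value pairs, and each input row gets the matching value inserted as a new column. The sketch below works on in-memory rows and is not pytsv's implementation. The function name `join_rows`, the `unknown_value` placeholder, and the choice to drop unmatched rows when `output_add_unknown` is false are all assumptions made for illustration.

```python
from typing import Dict, List


def join_rows(
    rows: List[List[str]],
    hash_rows: List[List[str]],
    hash_key_column: int,
    hash_value_column: int,
    input_key_column: int,
    output_insert_column: int,
    output_add_unknown: bool = False,
    unknown_value: str = "unknown",  # hypothetical placeholder text
) -> List[List[str]]:
    # Build the lookup table from the rows of the "hash file".
    lookup: Dict[str, str] = {
        row[hash_key_column]: row[hash_value_column] for row in hash_rows
    }
    result = []
    for row in rows:
        key = row[input_key_column]
        if key in lookup:
            value = lookup[key]
        elif output_add_unknown:
            value = unknown_value
        else:
            # Assumption: rows without a match are dropped.
            continue
        new_row = list(row)
        new_row.insert(output_insert_column, value)
        result.append(new_row)
    return result


hash_rows = [["il", "Israel"], ["fr", "France"]]
rows = [["alice", "il"], ["bob", "xx"]]
joined = join_rows(rows, hash_rows, 0, 1, 1, 2)
joined_with_unknown = join_rows(
    rows, hash_rows, 0, 1, 1, 2, output_add_unknown=True
)
```

Here `joined` keeps only the row for `alice`, while `joined_with_unknown` also keeps `bob` with the placeholder value in the inserted column.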

class pytsv.configs.ConfigMajority[source]¶

Bases: Config

Parameters to configure the majority algorithm

input_first_column = <pytconf.param.Unique object>¶

input_multiplication_column = <pytconf.param.Unique object>¶

input_second_column = <pytconf.param.Unique object>¶

class pytsv.configs.ConfigMatchColumns[source]¶

Bases: Config

Parameters to select which columns to match by

match_columns = []¶

class pytsv.configs.ConfigNumFields[source]¶

Bases: Config

Parameter to configure the number of fields in a TSV file

num_fields = None¶

class pytsv.configs.ConfigOutputFile[source]¶

Bases: Config

Parameters to configure the output file

output_file = <pytconf.param.Unique object>¶

class pytsv.configs.ConfigParallel[source]¶

Bases: Config

Parameters to configure how things should run in parallel

jobs = 8¶

parallel = False¶

class pytsv.configs.ConfigPattern[source]¶

Bases: Config

Parameters to configure patterns of files generated

final_pattern = '{key}.tsv.gz'¶

pattern = '{key}_{i:04d}.tsv.gz'¶
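Both defaults are ordinary Python format strings. As a minimal illustration, assuming they are expanded with `str.format`, that `key` is a string, and that `i` is an integer serial number (the values below are made up):

```python
# Default patterns copied from ConfigPattern.
pattern = "{key}_{i:04d}.tsv.gz"
final_pattern = "{key}.tsv.gz"

# Hypothetical key and serial number, for illustration only.
partial_name = pattern.format(key="users", i=7)
final_name = final_pattern.format(key="users")

print(partial_name)  # users_0007.tsv.gz
print(final_name)    # users.tsv.gz
```

The `:04d` format spec zero-pads the serial to four digits, so generated file names sort in numeric order.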

class pytsv.configs.ConfigProgress[source]¶

Bases: Config

Parameters to control progress reporting

progress = True¶

class pytsv.configs.ConfigReplace[source]¶

Bases: Config

Configure whether you want replacements or not

replace = False¶

class pytsv.configs.ConfigSampleByColumnOld[source]¶

Bases: Config

Parameters to configure the old sample by column algorithm

hits_mode = False¶

class pytsv.configs.ConfigSampleByTwoColumns[source]¶

Bases: Config

Parameters for the sample by column command

group_column = <pytconf.param.Unique object>¶

class pytsv.configs.ConfigSampleColumn[source]¶

Bases: Config

Configuration options for sampling

sample_column = <pytconf.param.Unique object>¶

class pytsv.configs.ConfigSampleSize[source]¶

Bases: Config

Configure sample size

size = <pytconf.param.Unique object>¶

class pytsv.configs.ConfigTree[source]¶

Bases: Config

Parameters to configure the tree to show

child_column = <pytconf.param.Unique object>¶

parent_column = <pytconf.param.Unique object>¶

roots = []¶

class pytsv.configs.ConfigTsvReader[source]¶

Bases: Config

Parameters to configure a TSV reader object

check_non_ascii = False¶

validate_all_lines_same_number_of_fields = True¶

class pytsv.configs.ConfigWeightValue[source]¶

Bases: Config

Parameters to configure the weight and value columns

value_column = <pytconf.param.Unique object>¶

weight_column = <pytconf.param.Unique object>¶

pytsv.core module¶

core.py

class pytsv.core.TsvReader(filename: str, mode: str = 'rt', use_any_format: bool = True, validate_all_lines_same_number_of_fields: bool = True, num_fields: int | None = None, skip_comments: bool = False, check_non_ascii: bool = False, newline: str | None = '\n')[source]¶

Bases: object

close() → None[source]¶

class pytsv.core.TsvWriter(filename: str, mode: str = 'wt', throw_exceptions: bool = False, sanitize: bool = True, fields_to_clean: List[int] | None = None, clean_edges: bool = True, sub_trailing: bool = True, remove_non_ascii: bool = True, lower_case: bool = True, check_num_fields: bool = True, num_fields: int | None = None, convert_to_string: bool = True, do_gzip: bool = False, filename_detect: bool = True)[source]¶

Bases: object

close() → None[source]¶

write(input_list: Sequence[str]) → None[source]¶

pytsv.core.clean(text: str, clean_edges: bool = True, sub_trailing: bool = True, remove_non_ascii: bool = True, lower_case: bool = True) → str[source]¶

pytsv.core.do_aggregate(input_file_names: Iterable[str], match_columns: List[int], aggregate_columns: List[int], output_file_name: str, floating_point: bool) → None[source]¶

This function aggregates a bunch of input files by integers.

Parameters: input_file_names, match_columns, aggregate_columns, output_file_name, floating_point
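The docstring does not spell out the aggregation itself. A minimal sketch of one likely interpretation follows: rows are grouped by the values in `match_columns`, and the values in `aggregate_columns` are summed per group, as `float` or `int` depending on `floating_point`. The helper name `aggregate_rows` and the in-memory input are assumptions; this is not the library's code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_rows(
    rows: List[List[str]],
    match_columns: List[int],
    aggregate_columns: List[int],
    floating_point: bool,
) -> List[List[str]]:
    # Pick the numeric type used for the running sums.
    convert = float if floating_point else int
    totals: Dict[Tuple[str, ...], list] = defaultdict(
        lambda: [convert(0)] * len(aggregate_columns)
    )
    for row in rows:
        # The group key is the tuple of values in the match columns.
        key = tuple(row[i] for i in match_columns)
        sums = totals[key]
        for pos, col in enumerate(aggregate_columns):
            sums[pos] += convert(row[col])
    # One output row per group: key fields followed by the sums.
    return [list(key) + [str(v) for v in sums] for key, sums in totals.items()]


rows = [["a", "1", "2"], ["b", "5", "1"], ["a", "3", "4"]]
aggregated = aggregate_rows(rows, [0], [1, 2], floating_point=False)
```

Groups appear in the output in the order their keys are first seen, because Python dictionaries preserve insertion order.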

pytsv.core.group_by(input_file_names: Iterable[str], group_by_columns: List[int], collect_columns: List[int], output_file_template: str) → List[str][source]¶

pytsv.core.is_ascii(s: str) → bool[source]¶
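The function carries no docstring. Going by its name and signature, it presumably reports whether every character of `s` is in the ASCII range. A stand-alone sketch of such a check, under that assumption (the name `is_ascii_sketch` is hypothetical):

```python
def is_ascii_sketch(s: str) -> bool:
    # ASCII covers code points 0 through 127.
    return all(ord(c) < 128 for c in s)


plain = is_ascii_sketch("hello\tworld")
accented = is_ascii_sketch("caf\u00e9")
```

On Python 3.7 and later the built-in `str.isascii()` performs the same test.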

pytsv.core.write_data(data: List[List[str]], output_file_name: str) → None[source]¶

pytsv.core.write_dict(filename: str, d: Dict[str, str]) → None[source]¶

pytsv.main module¶

main.py

class pytsv.main.JobInfo(check_not_ascii: bool, input_file: str, serial: int, progress: bool, pattern: str, columns: List[int])[source]¶

Bases: object

check_not_ascii: bool¶

columns: List[int]¶

input_file: str¶

pattern: str¶

progress: bool¶

serial: int¶

class pytsv.main.JobReturnValue(serial: int, files: Dict[str, str])[source]¶

Bases: object

files: Dict[str, str]¶

serial: int¶

class pytsv.main.MyEventTypes(*values)[source]¶

Bases: Enum

key_found = 1¶

key_not_found = 0¶

unknown_added = 2¶

class pytsv.main.ParamsForJob[source]¶

Bases: object

pytsv.main.aggregate() → None[source]¶

pytsv.main.check() → None[source]¶

TODO: add ability to say how many lines are bad and print their content

pytsv.main.check_columns_unique() → None[source]¶

pytsv.main.check_file(params_for_job: ParamsForJob) → bool[source]¶

pytsv.main.clean_by_field_num() → None[source]¶

pytsv.main.csv_to_tsv() → None[source]¶

pytsv.main.cut() → None[source]¶

pytsv.main.drop_duplicates_by_columns() → None[source]¶

pytsv.main.fix_columns() → None[source]¶

pytsv.main.histogram_by_column() → None[source]¶

pytsv.main.join() → None[source]¶

pytsv.main.lc() → None[source]¶

pytsv.main.main()[source]¶

pytsv.main.majority() → None[source]¶

If x1 appears with y2 more often than with any other value in column Y, then the pair (x1, y2) will be in the output and no other entry with x1 will appear.
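The rule can be sketched with a per-key counter. This is an illustration of the idea, not pytsv's implementation: the function name `majority_pairs` is hypothetical, the input is an in-memory list of (x, y) pairs, and the `input_multiplication_column` weight from ConfigMajority is ignored.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def majority_pairs(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    # For every x, count how often each y appears alongside it.
    counts: Dict[str, Counter] = defaultdict(Counter)
    for x, y in pairs:
        counts[x][y] += 1
    # Keep only the most frequent y for each x.
    return {x: c.most_common(1)[0][0] for x, c in counts.items()}


pairs = [("x1", "y1"), ("x1", "y2"), ("x1", "y2"), ("x2", "y1")]
winners = majority_pairs(pairs)
print(winners)  # {'x1': 'y2', 'x2': 'y1'}
```

When two values are tied, `Counter.most_common` returns the one encountered first. How pytsv itself breaks ties is not stated in the source.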

pytsv.main.multiply() → None[source]¶

pytsv.main.process_single_file(job_info: JobInfo) → JobReturnValue[source]¶

pytsv.main.read() → None[source]¶

pytsv.main.remove_quotes() → None[source]¶

pytsv.main.sample_by_column() → None[source]¶

To run this you must supply a ‘value_column’ (the column which will be sampled) and a ‘weight_column’ which must be convertible to a floating point number.
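Weighted sampling of this kind can be sketched with the standard library. The example assumes sampling with replacement, in-memory rows, and a hypothetical helper name `sample_by_weight`. Whether pytsv samples with or without replacement is not stated here.

```python
import random
from typing import List, Optional


def sample_by_weight(
    rows: List[List[str]],
    value_column: int,
    weight_column: int,
    size: int,
    seed: Optional[int] = None,
) -> List[str]:
    rng = random.Random(seed)
    values = [row[value_column] for row in rows]
    # Each weight must be convertible to a floating point number.
    weights = [float(row[weight_column]) for row in rows]
    # random.choices draws with replacement, proportionally to the weights.
    return rng.choices(values, weights=weights, k=size)


rows = [["a", "1.0"], ["b", "0"], ["c", "3"]]
sample = sample_by_weight(rows, 0, 1, size=5, seed=0)
```

A row with weight 0, such as `b` above, is never drawn.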

pytsv.main.sample_by_column_old() → None[source]¶

pytsv.main.sample_by_two_columns() → None[source]¶

pytsv.main.split_by_columns() → None[source]¶

pytsv.main.split_by_columns_parallel() → None[source]¶

pytsv.main.sum_columns() → None[source]¶

pytsv.main.tree() → None[source]¶

You can also see only parts of the tree
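Showing only part of a tree presumably corresponds to the `roots` parameter of ConfigTree: rendering starts at the given nodes instead of at the top. A minimal sketch under that assumption follows. It takes (parent, child) pairs in memory and assumes the data contains no cycles. The name `render_tree` and the two-space indentation are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def render_tree(edges: List[Tuple[str, str]], roots: List[str]) -> List[str]:
    # Map each parent to its children, in input order.
    children: Dict[str, List[str]] = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    lines: List[str] = []

    def walk(node: str, depth: int) -> None:
        # Indent each node by two spaces per tree level.
        lines.append("  " * depth + node)
        for child in children.get(node, []):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("x", "y")]
full = render_tree(edges, ["a"])
partial = render_tree(edges, ["b"])
```

Passing `["b"]` as the roots renders only the subtree below `b`, leaving out `a`, `c`, and the unrelated `x` branch.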

pytsv.main.tsv_to_csv() → None[source]¶

pytsv.static module¶

version which can be consumed from within the module

Module contents¶

Initialize the module
