gdutils.extract
extract is a module in the gdutils package that provides the ExtractTable
class, which converts and extracts tabular data.
Module Functions
extract.read_file
gdutils.extract.read_file(filename: str, column: Optional[str] = None, value: Union[str, List[str], None] = None)
Returns an ExtractTable instance with a specified input filename.
Parameters:
- filename (str) – Name/path of input file of tabular data to read.
- column (str | None, optional, default = None) – Label of column to use as index for extracted table.
- value (str | List[str] | None, optional, default = None) – Value(s) of specified column in rows to extract.
Returns: An ExtractTable instance.
Return type: ExtractTable
Examples
>>> et1 = extract.read_file('example/input.shp')
>>> et2 = extract.read_file('example/file.csv', column='ID')
>>> et3 = extract.read_file('in.shp', column='foo', value='bar')
>>> et4 = extract.read_file('in.csv', column='X', value=['1','3'])
Class gdutils.extract.ExtractTable
class gdutils.extract.ExtractTable(infile: Union[str, geopandas.geodataframe.GeoDataFrame, pandas.core.frame.DataFrame, None] = None, outfile: Optional[str] = None, column: Optional[str] = None, value: Union[str, List[str], None] = None)
For extracting tabular data. Run help(extract.ExtractTable) to view docs.
Specifying outfile determines the filetype of the output table. Specifying column uses the given column as the output's index. Specifying value restricts the output to rows whose entry in the specified column matches the given value(s).
infile: Name/path of input file of tabular data to read.
Type: str, optional, default = None
outfile: Path of output file for writing.
Type: pathlib.Path, optional, default = None
column: Label of column to use as index for extracted table.
Type: str, optional, default = None
value: Value(s) of specified column in rows to extract.
Type: str | List[str], optional, default = None
Class Methods
extract.ExtractTable.__init__
gdutils.extract.ExtractTable.__init__(self, infile: Union[str, geopandas.geodataframe.GeoDataFrame, pandas.core.frame.DataFrame, None] = None, outfile: Optional[str] = None, column: Optional[str] = None, value: Union[str, List[str], None] = None)
ExtractTable initializer. Returns an ExtractTable instance.
Parameters:
- infile (str | gpd.GeoDataFrame | pd.DataFrame | None, optional, default = None) – Name/path of input file of tabular data to read, or a geopandas GeoDataFrame or pandas DataFrame.
- outfile (str | None, optional, default = None) – Name/path of output file for writing.
- column (str | None, optional, default = None) – Label of column to use as index for extracted table.
- value (str | List[str] | None, optional, default = None) – Value(s) of specified column in rows to extract.
Returns: An ExtractTable instance.
Return type: ExtractTable
See also
extract.read_file()
Examples
>>> et1 = extract.ExtractTable() # creates an empty ExtractTable instance
>>> et2 = extract.ExtractTable('example/input.shp') # initializes the input tabular data
>>> et3 = extract.ExtractTable('example/file.csv', column='ID')
# initializes the input tabular data and sets the column to use
# as the index
>>> et4 = extract.ExtractTable('input.xlsx', 'output.md')
# initializes the input tabular data and specifies the output
# file
>>> et5 = extract.ExtractTable('in.csv', 'out.tex', 'ID', '01')
# initializes input tabular data source, output file, column to
# use as index, and value that isolates subtable to extract
>>> et6 = extract.ExtractTable('in.shp', column='ID', value=['1', '3'])
# initializes input data source, column, and a list of values that
# isolate the subtable to extract
>>> et7 = extract.ExtractTable(gpd.GeoDataFrame()) # initializes the input data source as a geopandas GeoDataFrame
>>> et8 = extract.ExtractTable(pd.DataFrame()) # initializes the input data source as a pandas DataFrame
Instance Methods
extract.ExtractTable.extract
gdutils.extract.ExtractTable.extract(self) → geopandas.geodataframe.GeoDataFrame
Returns a GeoPandas GeoDataFrame containing extracted subtable.
Returns: A geopandas GeoDataFrame of the extracted table.
Return type: gpd.GeoDataFrame
Raises: RuntimeError – Raised if trying to extract from non-existent tabular data.
See also
extract.ExtractTable.extract_to_file()
Examples
>>> et = extract.read_file('input.csv')
>>> df1 = et.extract() # extracts a GeoDataFrame from a '.csv' file
>>> print(df1.head())
  Unnamed: 0 col1 col2 geometry
0       asdf    a    b     None
1       fdsa    c    d     None
2       lkjh    c    3     None

>>> et.column = 'col1' # sets index from column 'col1'
>>> print(et.extract().head())
     Unnamed: 0 col2 geometry
col1
a          asdf    b     None
c          fdsa    d     None
c          lkjh    3     None

>>> et.value = 'c' # sets the isolating value to 'c'
>>> print(et.extract().head())
     Unnamed: 0 col2 geometry
col1
c          fdsa    d     None
c          lkjh    3     None
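The row isolation shown in the examples above can be pictured with a small standalone sketch. It does not use gdutils: `isolate_rows` is a hypothetical helper written for illustration, and it only mirrors the documented behaviour that value may be a single string or a list of strings.

```python
from typing import Dict, List, Union

Row = Dict[str, str]


def isolate_rows(
    rows: List[Row],
    column: str,
    value: Union[str, List[str], None] = None,
) -> List[Row]:
    """Keep rows whose `column` entry matches `value` (hypothetical helper)."""
    if value is None:
        return list(rows)  # nothing to isolate: keep every row
    # Accept a single string or a list of strings, as the documented signature does.
    wanted = {value} if isinstance(value, str) else set(value)
    return [row for row in rows if row[column] in wanted]


# Sample rows mirroring the 'input.csv' table printed in the examples.
rows = [
    {"Unnamed: 0": "asdf", "col1": "a", "col2": "b"},
    {"Unnamed: 0": "fdsa", "col1": "c", "col2": "d"},
    {"Unnamed: 0": "lkjh", "col1": "c", "col2": "3"},
]

only_c = isolate_rows(rows, "col1", "c")
a_or_c = isolate_rows(rows, "col1", ["a", "c"])
print([row["Unnamed: 0"] for row in only_c])  # ['fdsa', 'lkjh']
```

Normalizing value to a set up front lets the same membership test serve both the single-string and the list form.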
extract.ExtractTable.extract_to_file
gdutils.extract.ExtractTable.extract_to_file(self, outfile: Optional[str] = None, driver: Optional[str] = None) → NoReturn
Writes the extracted tabular data to a file.
Given an optional Fiona-supported OGR driver, writes to file using the driver. If outfile is None, data is printed as plaintext to stdout.
Parameters:
- outfile (str | None, optional, default = None) – Name of file to write extracted data.
- driver (str | None, optional, default = None) – Name of the Fiona-supported OGR driver to use for file writing.
Raises: RuntimeError – Raised if unable to extract to output file.
See also
extract.ExtractTable.extract()
Examples
>>> et1 = extract.read_file('input.csv', 'col2', ['b', 'd'])
>>> et1.extract_to_file() # outputs the extracted table to standard output
     Unnamed: 0 col1
col2
b          asdf    a
d          fdsa    c

>>> et1.outfile = 'output.xlsx' # sets the output file to 'output.xlsx'
>>> et1.extract_to_file() # outputs the extracted Excel table to 'output.xlsx'

>>> et2 = extract.ExtractTable('input.shp', 'output', 'col1', 'square') # sets the output file to 'output'
>>> et2.extract_to_file(driver='ESRI Shapefile') # extracts table to 'output' in specified format of 'ESRI Shapefile'
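The rule that outfile determines the output filetype, with an explicit driver taking over when given, can be sketched as follows. This is an illustration only, not gdutils' actual dispatch: `FORMATS`, `guess_format`, the format labels, and the precedence of driver over the extension are all assumptions.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension-to-format table; gdutils' real mapping may differ.
FORMATS = {
    ".csv": "CSV",
    ".xlsx": "Excel",
    ".md": "Markdown",
    ".tex": "LaTeX",
    ".shp": "ESRI Shapefile",
}


def guess_format(outfile: Optional[str], driver: Optional[str] = None) -> str:
    """Pick an output format label (hypothetical helper)."""
    if driver is not None:
        return driver  # assumed: an explicit driver wins over the extension
    if outfile is None:
        return "stdout"  # documented: no outfile means plaintext on stdout
    return FORMATS.get(Path(outfile).suffix.lower(), "unknown")


print(guess_format("output.xlsx"))
print(guess_format(None))
print(guess_format("output", driver="ESRI Shapefile"))
```

A name such as 'output' has no suffix, which is why the last call needs a driver to resolve to a format at all.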
extract.ExtractTable.list_columns
gdutils.extract.ExtractTable.list_columns(self) → numpy.ndarray
Returns a list of all columns in the initialized source tabular data.
Returns: An array of column names in the initialized table.
Return type: np.ndarray
Raises: RuntimeError – Raised if trying to list columns from non-existent tabular data.
See also
extract.ExtractTable.list_values()
Examples
>>> et = extract.read_file('input.csv')
>>> cols = et.list_columns() # gets a list of columns from 'input.csv'
>>> print(cols)
['Unnamed: 0' 'col1' 'col2']
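The 'Unnamed: 0' label in the output above is the name pandas gives to a CSV column whose header cell is blank. The sketch below imitates that naming with the standard library only. It assumes 'input.csv' starts with an empty header cell, which the documentation does not state, and `column_labels` is a hypothetical helper rather than part of gdutils.

```python
import csv
import io
from typing import List


def column_labels(csv_text: str) -> List[str]:
    """Read the header row and label blank cells by position (hypothetical helper)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    # Blank header cells get a positional placeholder, imitating pandas' naming.
    return [name if name else f"Unnamed: {i}" for i, name in enumerate(header)]


# Assumed contents of 'input.csv': the first header cell is empty.
sample = ",col1,col2\nasdf,a,b\nfdsa,c,d\n"
print(column_labels(sample))  # ['Unnamed: 0', 'col1', 'col2']
```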
extract.ExtractTable.list_values
gdutils.extract.ExtractTable.list_values(self, column: Optional[str] = None, unique: Optional[bool] = False) → Union[numpy.ndarray, geopandas.array.GeometryArray]
Returns the values in the initialized column by default, or in the given column if one is specified. If unique is True, returns only the unique values.
Parameters:
- column (str | NoneType, optional, default = None) – Name of the column whose values are to be listed. If None, lists the values of the initialized column.
- unique (bool, optional, default = False) – If True, function lists only unique values.
Returns: An array of values in the given column of the initialized source table. If the column is the ‘geometry’ column of a geopandas GeoDataFrame, the return value is a GeometryArray.
Return type: np.ndarray | gpd.array.GeometryArray
Raises:
- RuntimeError – Raised if trying to list values from non-existent tabular data.
- KeyError – Raised if column does not exist in tabular data.
- RuntimeError – Raised if trying to list values from non-existent column.
See also
extract.ExtractTable.list_columns()
Examples
>>> et = extract.read_file('input.csv', 'col2')
>>> vals = et.list_values() # gets a list of values in 'col2' from 'input.csv'
>>> print(vals)
['b' 'd' '3' '5' '10']

>>> vals = et.list_values('col1') # gets a list of values in 'col1' from 'input.csv'
>>> print(vals)
['a' 'c' 'c' 'c' 'b']

>>> vals = et.list_values('col1', unique=True) # gets a list of unique values in 'col1' from 'input.csv'
>>> print(vals)
['a' 'c' 'b']
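In the last example the unique values come back in order of first appearance ('a', 'c', 'b') rather than sorted. A minimal standalone sketch of that kind of de-duplication is shown below. `unique_in_order` is a hypothetical helper, and whether gdutils computes unique values this way internally is an assumption based only on the example output.

```python
from typing import Hashable, Iterable, List


def unique_in_order(values: Iterable[Hashable]) -> List[Hashable]:
    """Drop repeats while keeping first-appearance order (hypothetical helper)."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


col1 = ["a", "c", "c", "c", "b"]
print(unique_in_order(col1))  # ['a', 'c', 'b']
print(sorted(set(col1)))      # ['a', 'b', 'c'], a sorted result for contrast
```

The distinction matters when comparing results against tools that sort their unique values, such as numpy.unique.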