edpop_explorer package¶

Module contents¶

class edpop_explorer.BasePreparedQuery¶

Bases: object

Empty base dataclass for prepared queries. For prepared queries that can be represented by a single string, do not inherit from this class but use a simple string instead.
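As a rough illustration of when a dataclass is preferable to a plain string, the sketch below defines a prepared query made up of several parts. BasePreparedQuery is redefined locally as a stand-in so that the snippet runs without the package installed; DateRangeQuery and its fields are invented for this example and are not part of edpop_explorer.

```python
from dataclasses import dataclass


@dataclass
class BasePreparedQuery:
    """Local stand-in for edpop_explorer.BasePreparedQuery (an empty dataclass)."""


@dataclass
class DateRangeQuery(BasePreparedQuery):
    """Hypothetical prepared query that cannot be expressed as one string."""

    term: str
    year_from: int
    year_to: int


# A reader could receive this object instead of a plain query string.
query = DateRangeQuery(term="almanac", year_from=1600, year_to=1650)
print(query)
```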

class edpop_explorer.BibliographicalRecord(from_reader: Type[Reader])¶

Bases: Record

Python representation of edpoprec:BibliographicalRecord.

This subclass adds fields that are specific for bibliographical records.

alternative_title: Field | None = None¶

bibliographical_format: Field | None = None¶

bookseller: Field | None = None¶

collation_formula: Field | None = None¶

contributors: List[Field] | None = None¶

dating: Field | None = None¶

digitization: List[Field] | None = None¶

extent: Field | None = None¶

fingerprint: Field | None = None¶

genres: List[Field] | None = None¶

holdings: List[Field] | None = None¶

languages: List[Field] | None = None¶

physical_description: Field | None = None¶

place_of_publication: Field | None = None¶

publisher_or_printer: Field | None = None¶

size: Field | None = None¶

title: Field | None = None¶

typographical_features: List[Field] | None = None¶

class edpop_explorer.BiographicalRecord(from_reader: Type[Reader])¶

Bases: Record

Python representation of edpoprec:BiographicalRecord.

This subclass adds fields that are specific for biographical records.

activities: List[Field] | None = None¶

activity_timespan: Field | None = None¶

gender: Field | None = None¶

name: Field | None = None¶

place_of_birth: Field | None = None¶

place_of_death: Field | None = None¶

places_of_activity: List[Field] | None = None¶

timespan: Field | None = None¶

variant_names: List[Field] | None = None¶

class edpop_explorer.CERLReader¶

Bases: GetByIdBasedOnQueryMixin, Reader

A generic reader class for the CERL databases on the data.cerl.org platform.

This is an abstract class – to use, derive from this class, set the API_URL, API_BY_ID_BASE_URL and LINK_BASE_URL constant attributes, and implement the _convert_record class method.

API_BY_ID_BASE_URL: str¶: The base URL of the API for retrieving single records, of the form https://data.cerl.org/<CATALOGUE>/.

API_URL: str¶: The base URL of the search API, of the form https://data.cerl.org/<CATALOGUE>/_search.

DEFAULT_RECORDS_PER_PAGE: int = 10¶: The number of records to fetch at a time using the fetch() method if not determined by user.

LINK_BASE_URL: str¶: The base URL for user-friendly representations of single records.

MAXIMUM_RECORDS_PER_PAGE: int | None = 100¶: Maximum number of records to fetch per page. If not set, there is no predefined limit.

additional_params: Dict[str, str] | None = None¶

fetch_range(range_to_fetch: range) → range¶

Fetch a specific range of records. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. If not all records of the specified range exist, only the records that exist will be fetched.

Parameters: range_to_fetch – The range of records to fetch. Ranges with a step value other than 1 are not supported, and the step may be ignored.
Returns: The range of record indexes that has actually been fetched.

classmethod transform_query(query) → str¶

Return a version of the query that is prepared for use in the API.

This method does not have to be called directly; instead prepare_query() can be used.

class edpop_explorer.DatabaseFileMixin¶

Bases: object

Mixin that adds a method prepare_data to a Reader class, which will make the database file available in the database_path attribute as a pathlib.Path object. If the constant attribute DATABASE_URL is given, the database will be downloaded from that URL if the data is not yet available. The database file will be (expected to be) stored in the application data directory using the filename specified in the constant attribute DATABASE_FILENAME, which has to be specified by the user of this mixin.

DATABASE_FILENAME: str¶: The filename (not the full path) under which the database is expected to be stored.

DATABASE_LICENSE: str | None = None¶: A URL that contains the license of the downloaded database file.

DATABASE_URL: str | None = None¶: The URL to download the database file from. If this attribute is None, automatically downloading the database file is not supported.

database_path: Path | None = None¶: The path to the database file. Will be set by the prepare_data method.

prepare_data() → None¶: Prepare the database file by confirming that it is available, and if not, by attempting to download it.
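The check-then-download behaviour described for prepare_data can be sketched with the standard library alone. Everything below is an assumption made for illustration: prepare_database and fake_download are hypothetical helpers, the URL and filename are placeholders, and the real mixin may report a missing file in a different way.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def prepare_database(
    data_dir: Path,
    filename: str,
    url: Optional[str],
    download: Callable[[str, Path], None],
) -> Path:
    """Return the path to the database file, downloading it if necessary."""
    path = data_dir / filename
    if path.exists():
        return path  # Already available: nothing to do.
    if url is None:
        # Assumption: the real mixin may signal this situation differently.
        raise FileNotFoundError(f"{filename} is missing and no URL is set")
    download(url, path)
    return path


def fake_download(url: str, target: Path) -> None:
    """Stand-in for a real HTTP download; writes a placeholder file."""
    target.write_text(f"downloaded from {url}")


with tempfile.TemporaryDirectory() as tmp:
    database_path = prepare_database(
        Path(tmp), "example.sqlite3", "https://example.org/db", fake_download
    )
    file_was_created = database_path.exists()
```

Passing the download function as a parameter keeps the sketch testable without network access.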

class edpop_explorer.DigitizationField(original_text: str)¶

Bases: Field

description: str | None = None¶

iiif_manifest: str | None = None¶

preview_url: str | None = None¶

property summary_text: str | None¶

url: str | None = None¶

class edpop_explorer.Field(original_text: str)¶

Bases: object

Python representation of edpoprec:Field.

This base class has two user-defined subfields: original_text (which is required and should be passed to the constructor) and unknown. User-defined subfields are simple object attributes and can be accessed directly. In addition, this base class defines an automatic subfield normalized_text, a read-only property that is only available if normalization is supported by the field, which is not the case for this base class. When normalization is not supported, it is still possible to set this field using the set_normalized_text method. Except for original_text, all subfields are optional and are None by default. Use to_graph() to obtain an RDF graph. The subject node is a blank node by default, but this may be overridden by setting the subject_node attribute.

Subclasses should override the _rdf_class attribute to the corresponding RDF class. Subclasses can define additional subfields by adding additional public attributes and by registering them in the SUBFIELDS constant attribute. For registering, a constructor __init__ should be defined that first calls the parent’s constructor and then adds the subfields one by one using self.SUBFIELDS.append(('<attribute-name>', EDPOPREC.<rdf-property-name>, '<datatype>')), where <datatype> is any of the datatypes defined in the DATATYPES constant of this module. Subclasses may furthermore define the _normalized_text private method.
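The registration pattern can be sketched without the package as follows. The Field class here is a local stand-in that mimics only the constructor and the SUBFIELDS list; ShelfmarkField, its library subfield, the '<rdf-property>' strings and the datatype names are placeholders invented for this example, not values taken from edpop_explorer.

```python
from typing import List, Optional, Tuple


class Field:
    """Local stand-in that mimics only the registration pattern."""

    def __init__(self, original_text: str) -> None:
        self.original_text: str = original_text
        self.unknown: Optional[bool] = None
        # Per-instance list of (attribute name, RDF property, datatype).
        self.SUBFIELDS: List[Tuple[str, str, str]] = [
            ("original_text", "<rdf-property>", "string"),
            ("unknown", "<rdf-property>", "boolean"),
        ]


class ShelfmarkField(Field):
    """Hypothetical subclass that adds one extra subfield."""

    library: Optional[str] = None  # Public attribute, None by default.

    def __init__(self, original_text: str) -> None:
        super().__init__(original_text)  # Call the parent constructor first.
        self.SUBFIELDS.append(("library", "<rdf-property>", "string"))


field = ShelfmarkField("UBU: MAG 123 B 4")
field.library = "Utrecht University Library"
registered_names = [name for name, _, _ in field.SUBFIELDS]
print(registered_names)
```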

authority_record: str | None = None¶: Subfield – may contain the URI of an authority record

normalize() → NormalizationResult¶: Perform normalization on this field, based on the normalizer attribute. Subclasses of Field may predefine a normalizer function, but this can always be overridden.

normalizer: Callable | None = None¶

original_text: str¶: Subfield – text of this field according to the original record.

subject_node: Node¶: This field’s subject node if converted to RDF. This is a blank node by default.

property summary_text: str | None¶

to_graph() → Graph¶: Create an rdflib RDF graph according to the current data.

unknown: bool | None = None¶: Subfield – indicates whether the value of this field is explicitly marked as unknown in the original record.

exception edpop_explorer.FieldError¶: Bases: Exception

class edpop_explorer.GetByIdBasedOnQueryMixin¶

Bases: ABC

Mixin for readers that are based on an API that has no special way of retrieving single records – instead, these readers fetch single records using a list query. To use, make sure to override the _prepare_get_by_id_query method, which defines the list query that should be used.

classmethod get_by_id(identifier: str) → Record¶

class edpop_explorer.LazyRecordMixin¶

Bases: ABC

Abstract mixin that adds an interface for lazy loading to a Record.

To use, implement the fetch() method and make sure that it fills the record’s data attributes and its Fields and that the fetched attribute is set to True.

abstractmethod fetch() → None¶

fetched: bool = False¶
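A minimal sketch of the lazy-loading contract is given below. LazyRecordMixin is redefined locally as a stand-in, and LazyTitleRecord with its title attribute is a hypothetical record class; a real implementation would query the catalogue inside fetch() and fill actual Field objects.

```python
from abc import ABC, abstractmethod
from typing import Optional


class LazyRecordMixin(ABC):
    """Local stand-in for the lazy-loading interface."""

    fetched: bool = False

    @abstractmethod
    def fetch(self) -> None:
        ...


class LazyTitleRecord(LazyRecordMixin):
    """Hypothetical record whose title is only loaded on demand."""

    def __init__(self, identifier: str) -> None:
        self.identifier = identifier
        self.title: Optional[str] = None

    def fetch(self) -> None:
        if self.fetched:
            return  # Avoid repeating a potentially slow lookup.
        # A real implementation would query the catalogue here.
        self.title = f"Title for {self.identifier}"
        self.fetched = True


record = LazyTitleRecord("rec0042")
fetched_before = record.fetched
record.fetch()
print(fetched_before, record.fetched, record.title)
```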

class edpop_explorer.LocationField(original_text: str)¶

Bases: Field

COUNTRY = rdflib.term.URIRef('https://dhstatic.hum.uu.nl/edpop-records/0.1.0/country')¶

LOCALITY = rdflib.term.URIRef('https://dhstatic.hum.uu.nl/edpop-records/0.1.0/locality')¶

location_type: URIRef | None = None¶

class edpop_explorer.Marc21BibliographicalReaderMixin¶: Bases: Reader, ABC

class edpop_explorer.Marc21BibliographicalRecord(from_reader: Type[Reader])¶

Bases: Marc21DataMixin, BibliographicalRecord

A combination of BibliographicalRecord and Marc21DataMixin.

class edpop_explorer.Marc21Data(fields: List[Marc21Field] = <factory>, controlfields: Dict[str, str] = <factory>, picafields: List[Marc21Field] = <factory>, raw: dict = <factory>)¶

Bases: RawData

Python representation of the data inside a Marc21 record

controlfields: Dict[str, str]¶

fields: List[Marc21Field]¶

get_all_subfields(fieldnumber: str, subfield: str, picaxml=False) → List[str]¶: Return a list of subfields that match the requested field number and subfield. May return an empty list if the field and subfield do not occur.

get_fields(fieldnumber: str, picaxml=False) → List[Marc21Field]¶: Return a list of fields with a given field number. May return an empty list if field does not occur.

get_first_field(fieldnumber: str, picaxml=False) → Marc21Field | None¶: Return the first occurrence of a field with a given field number. May be useful for fields that appear only once, such as 245. Return None if field is not found.

get_first_subfield(fieldnumber: str, subfield: str | tuple[str], picaxml=False) → str | None¶: Return the requested subfield of the first occurrence of a field with the given field number. Return None if field is not found or if the subfield is not present on the first occurrence of the field. subfield may be a tuple; in that case a concatenation of all given subfields is returned.
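The lookup semantics can be illustrated with a small stand-alone sketch. The Marc21Field dataclass below is a reduced stand-in, and get_first_subfield is written as a free function for brevity. How the real method joins multiple subfields is not stated here, so the space separator and the skipping of absent subfields are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Marc21Field:
    """Stand-in with only the attributes needed for the lookup."""

    fieldnumber: str
    subfields: Dict[str, str] = field(default_factory=dict)


def get_first_subfield(
    fields: List[Marc21Field],
    fieldnumber: str,
    subfield: Union[str, Tuple[str, ...]],
) -> Optional[str]:
    """Look up subfield(s) on the first field with the given number."""
    first = next((f for f in fields if f.fieldnumber == fieldnumber), None)
    if first is None:
        return None
    if isinstance(subfield, tuple):
        # Assumption: present subfields are joined with a single space.
        parts = [first.subfields[s] for s in subfield if s in first.subfields]
        return " ".join(parts) if parts else None
    return first.subfields.get(subfield)


fields = [
    Marc21Field("245", {"a": "Example title :", "b": "a subtitle"}),
    Marc21Field("700", {"a": "Doe, Jane"}),
    Marc21Field("700", {"a": "Roe, Richard"}),
]
print(get_first_subfield(fields, "245", ("a", "b")))
print(get_first_subfield(fields, "700", "a"))  # Only the first 700 field is used.
print(get_first_subfield(fields, "260", "a"))  # Field absent, so None.
```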

picafields: List[Marc21Field]¶

raw: dict¶

to_dict() → dict¶: Give a dict representation of the raw data.

class edpop_explorer.Marc21DataMixin¶

Bases: object

A mixin that adds a data attribute to a Record class to contain an instance of Marc21Data.

data: Marc21Data | None = None¶

show_record() → str¶

class edpop_explorer.Marc21Field(fieldnumber: str, indicator1: str | None = None, indicator2: str | None = None, subfields: Dict[str, str] = <factory>, description: str | None = None)¶

Bases: object

Python representation of a single field in a Marc21 record

description: str | None = None¶

fieldnumber: str¶

indicator1: str | None = None¶

indicator2: str | None = None¶

subfields: Dict[str, str]¶

exception edpop_explorer.NotFoundError¶: Bases: ReaderError

class edpop_explorer.RawData¶

Bases: ABC

Base class to store raw original data of a record. Only defines an abstract method to_dict.

abstractmethod to_dict() → dict¶: Give a dict representation of the raw data.

class edpop_explorer.Reader¶

Bases: ABC

Base reader class (abstract).

This abstract base class provides a common interface for all readers. To use, instantiate a subclass, set a query using the prepare_query() or set_query() method, call fetch() and subsequently fetch_next() until you have the number of results that you want. The attributes number_of_results, number_fetched and records will be updated after fetching.
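The usage pattern described above might look roughly like the following. InMemoryReader is a stand-in that searches a local list instead of calling an API. It borrows the attribute and method names from this page, but its internals, including the treatment of fetch_next() as an alias of fetch(), are assumptions made to keep the example runnable.

```python
from typing import Dict, List, Optional


class InMemoryReader:
    """Stand-in that uses the attribute and method names described above."""

    DEFAULT_RECORDS_PER_PAGE = 10

    def __init__(self, catalogue: List[str]) -> None:
        self._catalogue = catalogue
        self.prepared_query: Optional[str] = None
        self.records: Dict[int, str] = {}
        self.number_of_results: Optional[int] = None

    @property
    def number_fetched(self) -> int:
        return len(self.records)

    @property
    def fetching_exhausted(self) -> bool:
        return (
            self.number_of_results is not None
            and self.number_fetched >= self.number_of_results
        )

    def prepare_query(self, query: str) -> None:
        self.prepared_query = query.lower()

    def fetch(self, number: Optional[int] = None) -> range:
        hits = [t for t in self._catalogue if self.prepared_query in t.lower()]
        self.number_of_results = len(hits)
        start = self.number_fetched
        stop = min(start + (number or self.DEFAULT_RECORDS_PER_PAGE), len(hits))
        for index in range(start, stop):
            self.records[index] = hits[index]
        return range(start, stop)

    # Assumption: fetch_next() continues where the previous fetch stopped.
    fetch_next = fetch


reader = InMemoryReader([f"Almanac {year}" for year in range(1600, 1625)])
reader.prepare_query("almanac")
reader.fetch()
# Keep fetching until 20 records are available or nothing is left.
while not reader.fetching_exhausted and reader.number_fetched < 20:
    reader.fetch_next()
print(reader.number_of_results, reader.number_fetched)  # → 25 20
```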

To create a concrete reader, make a subclass that implements the fetch_range() and transform_query() methods and set the READERTYPE and CATALOG_URIREF attributes. fetch_range() should populate the records, number_of_results, number_fetched and range_fetched attributes.

ALLOW_EMPTY_QUERY = False¶: If True, it is possible to enter an empty query to fetch all records.

CATALOG_URIREF: URIRef | None = None¶

DEFAULT_RECORDS_PER_PAGE: int = 10¶: The number of records to fetch at a time using the fetch() method if not determined by user.

DESCRIPTION: str | None = None¶: Information about the contents of the corresponding catalogue, to be used in user interfaces.

FETCH_ALL_AT_ONCE = False¶: True if the reader is configured to fetch all records at once, even if the user only needs a subset.

IRI_PREFIX: str | None = None¶: The prefix to use to create an IRI out of a record identifier. If an IRI cannot be created with a simple prefix, the identifier_to_iri and iri_to_identifier methods have to be overridden.
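For the simple prefix case, the two conversions could be sketched as below. The prefix is a made-up example URL, and the functions are written as module-level helpers rather than class methods. Whether the real methods apply any escaping to the identifier is not stated here, so plain concatenation is an assumption.

```python
IRI_PREFIX = "https://example.org/catalogue/"  # Hypothetical prefix.


def identifier_to_iri(identifier: str) -> str:
    """Build an IRI by prepending the prefix (assumed behaviour)."""
    return IRI_PREFIX + identifier


def iri_to_identifier(iri: str) -> str:
    """Recover the identifier by stripping the prefix."""
    if not iri.startswith(IRI_PREFIX):
        raise ValueError(f"{iri!r} does not start with {IRI_PREFIX!r}")
    return iri[len(IRI_PREFIX):]


iri = identifier_to_iri("rec0042")
print(iri)
print(iri_to_identifier(iri))
```

A catalogue whose IRIs cannot be built this way would need both functions replaced together so that they remain inverses of each other.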

MAXIMUM_RECORDS_PER_PAGE: int | None = None¶: Maximum number of records to fetch per page. If not set, there is no predefined limit.

READERTYPE: str | None = None¶: The type of the reader, out of BIOGRAPHICAL and BIBLIOGRAPHICAL (defined in the edpop_explorer package).

SHORT_NAME: str | None = None¶: Short name of the corresponding catalogue, to be used in user interfaces.

adjust_start_record(start_number: int) → None¶

Skip the given number of first records and start fetching afterwards.

This functionality may be ignored by readers that can only load all records at once; generally these are readers that return lazy records.

classmethod catalog_to_graph() → Graph¶: Create an RDF representation of the catalog that this reader supports as an instance of EDPOPREC:Catalog.

fetch(number: int | None = None) → range¶: Perform an initial or subsequent query. Most readers fetch a limited number of records at once – this number depends on the reader but it may be adjusted using the number parameter. Other readers fetch all records at once and ignore the number parameter. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. Returns the range of record indexes that has been fetched.

abstractmethod fetch_range(range_to_fetch: range) → range¶

Fetch a specific range of records. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. If not all records of the specified range exist, only the records that exist will be fetched.

Parameters: range_to_fetch – The range of records to fetch. Ranges with a step value other than 1 are not supported, and the step may be ignored.
Returns: The range of record indexes that has actually been fetched.
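The relationship between the requested and the returned range can be sketched in isolation. The helper clamp_range is hypothetical and not part of the package; it only shows what a returned range might look like when fewer records exist than were requested.

```python
def clamp_range(range_to_fetch: range, number_of_results: int) -> range:
    """Return the part of range_to_fetch that lies within the available records.

    Hypothetical helper: the step of the requested range is ignored,
    in line with the note that only a step of 1 is supported.
    """
    start = min(range_to_fetch.start, number_of_results)
    stop = min(range_to_fetch.stop, number_of_results)
    return range(start, max(start, stop))


# Ten records are requested from index 5, but only 8 results exist.
fetched = clamp_range(range(5, 15), number_of_results=8)
print(list(fetched))  # → [5, 6, 7]
```

A caller can compare the length of the returned range with the requested one to detect that the end of the result set has been reached.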

property fetching_exhausted: bool¶: Return True if all results have been fetched.

property fetching_started: bool¶: True if fetching has started, otherwise False. As soon as fetching has started, changing the query is not possible anymore.

generate_identifier() → str¶

Generate an identifier for this reader that is unique for the combination of reader type and prepared query. This identifier can be used when the reader has to be reused across sessions by pickling and unpickling.

Note: while the identifier is guaranteed to be unique, there is no guarantee that the same combination of reader type and prepared query always yields the same identifier.

get(index: int, allow_fetching: bool = True) → Record¶

Get a record with a specific index. If the record is not yet available, fetch additional records to make it available.

Parameters:

index – The number of the record to get.
allow_fetching – Allow fetching the record from an external source if it was not yet fetched.

abstractmethod classmethod get_by_id(identifier: str) → Record¶: Get a single record by its identifier.

classmethod get_by_iri(iri: str) → Record¶: Get a single record by its IRI.

classmethod get_catalog_slug() → str | None¶

classmethod identifier_to_iri(identifier: str) → str¶

classmethod iri_to_identifier(iri: str) → str¶

property number_fetched: int¶: The number of results that have been fetched so far, or 0 if no fetch has been performed yet.

number_of_results: int | None = None¶: The total number of results for the query, or None if fetching has not yet started and the number is not yet known.

prepare_query(query: str) → None¶: Prepare a query for use by the reader’s API. Updates the prepared_query attribute.

prepared_query: str | BasePreparedQuery | None = None¶: A transformed version of the query, available after calling prepare_query() or set_query().

records: Dict[int, Record]¶: The records that have been fetched as instances of (a subclass of) Record.

set_query(query: str | BasePreparedQuery) → None¶: Set an exact query. Updates the prepared_query attribute.

abstractmethod classmethod transform_query(query: str) → str | BasePreparedQuery¶

Return a version of the query that is prepared for use in the API.

This method does not have to be called directly; instead prepare_query() can be used.

exception edpop_explorer.ReaderError¶

Bases: Exception

Generic exception for failures in the Reader class. More specific errors derive from this class.

class edpop_explorer.Record(from_reader: Type[Reader])¶

Bases: object

Python representation of edpoprec:Record.

This base class provides some basic attributes, an infrastructure to define fields and a method to convert the record to RDF. While this is a non-abstract base class, no fields are defined here; these should be added in subclasses. Instances of Record and its subclasses should be created by calling the constructor with the Reader class as the from_reader parameter and by setting the data, link, identifier and subject_node attributes (all are optional but recommended), as well as the fields that are defined by the subclass. Fields are set using the attribute with the same name: set them to an instance of Field or to None. The basic attributes and the fields are None by default.

Subclasses should override the _rdf_class attribute to the corresponding RDF class. They should define additional fields by adding additional public attributes defaulting to None and by registering them in the _fields attribute. For registering, a constructor __init__ should be defined that first calls the parent’s constructor and then adds the fields by adding tuples to _fields in the form ('<attribute-name>', EDPOPREC.<rdf-property-name>, <Field class name>).

data: None | dict | RawData = None¶: The raw original data of a record.

fetch() → None¶: Fetch the full contents of the record if this record works with lazy loading (i.e., if the record’s class derives from LazyRecordMixin). If the record is not lazy, this method does nothing.

from_reader: Type[Reader]¶: The Reader class that this record originates from, as passed to the constructor.

get_data_dict() → dict | None¶: Convenience function to get the record’s raw data as a dict, or None if it is not available.

identifier: str | None = None¶: Unique identifier used by the source catalog.

property iri: str | None¶: A stable IRI based on the identifier attribute. None if the identifier attribute is not set.

link: str | None = None¶: A user-friendly link where the user can find the record.

property subject_node: Node¶: A subject node based on the identifier attribute. If the identifier attribute is not set, a blank node.

to_graph() → Graph¶: Return an RDF graph for this record.

exception edpop_explorer.RecordError¶: Bases: Exception

class edpop_explorer.SRUMarc21BibliographicalReader¶

Bases: SRUMarc21Reader, Marc21BibliographicalReaderMixin, ABC

Subclass of SRUMarc21Reader that adds functionality to create instances of BibliographicalRecord.

This subclass assumes that the Marc21 data follows the standard Marc21 format for bibliographical data. See: https://www.loc.gov/marc/bibliographic/

READERTYPE: str | None = 'bibliographical'¶: The type of the reader, out of BIOGRAPHICAL and BIBLIOGRAPHICAL (defined in the edpop_explorer package).

records: List[Marc21BibliographicalRecord]¶: The records that have been fetched as instances of (a subclass of) Record.

class edpop_explorer.SRUMarc21Reader¶

Bases: SRUReader

Subclass of SRUReader that adds Marc21 functionality.

This class is still abstract; to create concrete readers, the _get_link(), _get_identifier() and _convert_record() methods should be implemented.

abstractmethod classmethod _convert_record(sruthirecord: dict) → Record¶: Convert the output of sruthi into an instance of (a subclass of) Record.

abstractmethod classmethod _get_link(data: Marc21Data) → str | None¶: Get a public URL according to the Marc21 data or None if it is not available.

abstractmethod classmethod _get_identifier(data: Marc21Data) → str | None¶: Get the unique identifier from the Marc21 data or None if it is not available.

marcxchange_prefix: str = ''¶

picaxml_prefix: str = 'info:srw/schema/5/picaXML-v1.0:'¶

class edpop_explorer.SRUReader¶

Bases: GetByIdBasedOnQueryMixin, Reader

Subclass of Reader that adds basic SRU functionality using the sruthi library.

This class is still abstract and subclasses should implement the transform_query() and _convert_record() methods and set the attributes sru_url and sru_version.

The _prepare_get_by_id_query() method by default returns the transformed version of the identifier as a query, which normally works, but this may be optimised by overriding it.

abstractmethod classmethod _convert_record(sruthirecord: dict) → Record¶: Convert the output of sruthi into an instance of (a subclass of) Record.

fetch_range(range_to_fetch: range) → range¶

Fetch a specific range of records. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. If not all records of the specified range exist, only the records that exist will be fetched.

Parameters: range_to_fetch – The range of records to fetch. Ranges with a step value other than 1 are not supported, and the step may be ignored.
Returns: The range of record indexes that has actually been fetched.

prepare_query(query) → None¶: Prepare a query for use by the reader’s API. Updates the prepared_query attribute.

query: str | None = None¶

session: Session¶: The Session object of the requests library.

sru_additional_schema: str | None = None¶: Additional SRU schemas for which an additional request is made. The results are merged in the SRU results. If None (default), do not make an additional request.

sru_schema: str | None = None¶: The requested SRU schema. If None (default), use the default schema of the SRU provider.

sru_url: str¶: URL of the SRU API.

sru_version: str¶: Version of the SRU protocol. Can be ‘1.1’ or ‘1.2’.

abstractmethod classmethod transform_query(query: str) → str¶

Return a version of the query that is prepared for use in the API.

This method does not have to be called directly; instead prepare_query() can be used.

edpop_explorer.bind_common_namespaces(graph: Graph) → None¶

Bind the RDF namespaces that are in use across this package to the specified graph.

These are: RDF, RDFS, EDPOPREC.
