edpop_explorer package

Module contents

class edpop_explorer.BasePreparedQuery

Bases: object

Empty base dataclass for prepared queries. For prepared queries that can be represented by a single string, do not inherit from this class but use a simple string instead.

class edpop_explorer.BibliographicalRecord(from_reader: Type[Reader])

Bases: Record

Python representation of edpoprec:BibliographicalRecord.

This subclass adds fields that are specific for bibliographical records.

alternative_title: Field | None = None
bibliographical_format: Field | None = None
bookseller: Field | None = None
collation_formula: Field | None = None
contributors: List[Field] | None = None
dating: Field | None = None
digitization: List[Field] | None = None
extent: Field | None = None
fingerprint: Field | None = None
genres: List[Field] | None = None
holdings: List[Field] | None = None
languages: List[Field] | None = None
physical_description: Field | None = None
place_of_publication: Field | None = None
publisher_or_printer: Field | None = None
size: Field | None = None
title: Field | None = None
typographical_features: List[Field] | None = None

class edpop_explorer.BiographicalRecord(from_reader: Type[Reader])

Bases: Record

Python representation of edpoprec:BiographicalRecord.

This subclass adds fields that are specific for biographical records.

activities: List[Field] | None = None
activity_timespan: Field | None = None
gender: Field | None = None
name: Field | None = None
place_of_birth: Field | None = None
place_of_death: Field | None = None
places_of_activity: List[Field] | None = None
timespan: Field | None = None
variant_names: List[Field] | None = None

class edpop_explorer.CERLReader

Bases: GetByIdBasedOnQueryMixin, Reader

A generic reader class for the CERL databases on the data.cerl.org platform.

This is an abstract class – to use, derive from this class, set the API_URL, API_BY_ID_BASE_URL and LINK_BASE_URL constant attributes, and implement the _convert_record class method.

API_BY_ID_BASE_URL: str

The base URL of the API for retrieving single records, of the form https://data.cerl.org/<CATALOGUE>/.

API_URL: str

The base URL of the search API, of the form https://data.cerl.org/<CATALOGUE>/_search.

DEFAULT_RECORDS_PER_PAGE: int = 10

The number of records to fetch at a time using the fetch() method if not specified by the user.

LINK_BASE_URL: str

The base URL for user-friendly representations of single records.

MAXIMUM_RECORDS_PER_PAGE: int | None = 100

Maximum number of records to fetch per page. If not set, there is no predefined limit.

additional_params: Dict[str, str] | None = None
fetch_range(range_to_fetch: range) → range

Fetch a specific range of records. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. If not all records of the specified range exist, only the records that exist will be fetched.

Parameters:

range_to_fetch – The range of records to fetch. Ranges with a step other than 1 are not supported; the step may be ignored.

Returns:

The range of record indexes that has actually been fetched.

classmethod transform_query(query) → str

Return a version of the query that is prepared for use in the API.

This method does not have to be called directly; instead prepare_query() can be used.

class edpop_explorer.DatabaseFileMixin

Bases: object

Mixin that adds a method prepare_data to a Reader class, which will make the database file available in the database_path attribute as a pathlib.Path object. If the constant attribute DATABASE_URL is given, the database will be downloaded from that URL if the data is not yet available. The database file will be (expected to be) stored in the application data directory using the filename specified in the constant attribute DATABASE_FILENAME, which has to be specified by the user of this mixin.

DATABASE_FILENAME: str

The filename (not the full path) under which the database is expected to be stored.

DATABASE_LICENSE: str | None = None

A URL that contains the license of the downloaded database file.

DATABASE_URL: str | None = None

The URL to download the database file from. If this attribute is None, automatically downloading the database file is not supported.

database_path: Path | None = None

The path to the database file. Will be set by the prepare_data method.

prepare_data() → None

Prepare the database file by confirming that it is available, and if not, by attempting to download it.
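The download-if-missing behaviour described above can be sketched without the package itself. The class below is a hypothetical stand-in, not the mixin's code: the data directory is passed in explicitly (the real mixin uses the application data directory), the filename is made up, and raising FileNotFoundError when DATABASE_URL is not set is an assumption.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Optional


class DatabaseFileSketch:
    """Hypothetical stand-in for a reader that uses DatabaseFileMixin."""

    DATABASE_FILENAME: str = "example.sqlite3"  # hypothetical filename
    DATABASE_URL: Optional[str] = None  # None: automatic download not supported
    database_path: Optional[Path] = None

    def __init__(self, data_dir: Path) -> None:
        # Assumption: the directory is passed in; the real mixin uses the
        # application data directory.
        self.data_dir = data_dir

    def prepare_data(self) -> None:
        path = self.data_dir / self.DATABASE_FILENAME
        if not path.exists():
            if self.DATABASE_URL is None:
                # Assumption: the real mixin may raise a different exception.
                raise FileNotFoundError(
                    f"{path} does not exist and DATABASE_URL is not set"
                )
            path.parent.mkdir(parents=True, exist_ok=True)
            # Stream the remote file into the expected location.
            with urllib.request.urlopen(self.DATABASE_URL) as response:
                with open(path, "wb") as out:
                    shutil.copyfileobj(response, out)
        self.database_path = path
```

Only the filename is a class-level constant; the full path is computed at run time, which is why database_path stays None until prepare_data() has succeeded.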

class edpop_explorer.DigitizationField(original_text: str)

Bases: Field

description: str | None = None
iiif_manifest: str | None = None
preview_url: str | None = None
property summary_text: str | None
url: str | None = None

class edpop_explorer.Field(original_text: str)

Bases: object

Python representation of edpoprec:Field.

This base class has the user-defined subfields original_text (which is required and should be passed to the constructor), unknown and authority_record. User-defined subfields are simple object attributes and can be accessed directly. In addition, this base class defines an automatic subfield normalized_text, a read-only property that is only available if the field supports normalization; this is not the case for this base class. For fields that do not support normalization, it is still possible to set this subfield using the set_normalized_text method. Except for original_text, all subfields are optional and are None by default. Use to_graph() to obtain an RDF graph. The subject node is a blank node by default, but this may be overridden by setting the subject_node attribute.

Subclasses should override the _rdf_class attribute with the corresponding RDF class. Subclasses can define additional subfields by adding public attributes and registering them in the SUBFIELDS constant attribute. To register them, define a constructor __init__ that first calls the parent’s constructor and then adds the subfields one by one using self.SUBFIELDS.append(('<attribute-name>', EDPOPREC.<rdf-property-name>, '<datatype>')), where <datatype> is any of the datatypes defined in the DATATYPES constant of this module. Subclasses may also define the private method _normalized_text.
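A minimal, self-contained sketch of the subfield registration pattern, using a stand-in FieldSketch instead of the real Field class. It assumes that the base constructor creates a per-instance SUBFIELDS list, so appending in a subclass does not affect other classes. The namespace IRI is taken from the LocationField constants in this document; the property names under it and the datatype strings are hypothetical.

```python
from typing import List, Optional, Tuple

# Namespace as used by LocationField.COUNTRY/LOCALITY; the real package uses an
# rdflib Namespace object, a plain string is used here for illustration.
EDPOPREC = "https://dhstatic.hum.uu.nl/edpop-records/0.1.0/"


class FieldSketch:
    """Hypothetical stand-in for edpop_explorer.Field."""

    def __init__(self, original_text: str) -> None:
        self.original_text = original_text
        self.unknown: Optional[bool] = None
        # Assumption: a fresh list per instance (property names hypothetical).
        self.SUBFIELDS: List[Tuple[str, str, str]] = [
            ("original_text", EDPOPREC + "originalText", "string"),
            ("unknown", EDPOPREC + "unknown", "boolean"),
        ]


class ShelfmarkField(FieldSketch):
    """Hypothetical field subclass with one extra subfield."""

    def __init__(self, original_text: str) -> None:
        super().__init__(original_text)  # parent constructor first
        self.shelfmark: Optional[str] = None  # new public attribute
        self.SUBFIELDS.append(("shelfmark", EDPOPREC + "shelfmark", "string"))
```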

authority_record: str | None = None

Subfield – may contain the URI of an authority record.

normalize() → NormalizationResult

Perform normalization on this field, based on the normalizer attribute. Subclasses of Field may predefine a normalizer function, but this can always be overridden.

normalizer: Callable | None = None
original_text: str

Subfield – text of this field according to the original record.

subject_node: Node

This field’s subject node if converted to RDF. This is a blank node by default.

property summary_text: str | None
to_graph() → Graph

Create an rdflib RDF graph according to the current data.

unknown: bool | None = None

Subfield – indicates whether the value of this field is explicitly marked as unknown in the original record.

exception edpop_explorer.FieldError

Bases: Exception

class edpop_explorer.GetByIdBasedOnQueryMixin

Bases: ABC

Mixin for readers that are based on an API that has no special way of retrieving single records – instead, these readers fetch single records using a list query. To use, make sure to override the _prepare_get_by_id_query method, which defines the list query that should be used.
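The following self-contained sketch shows the idea behind get_by_id() for such readers: the identifier is turned into a list query, the query is run, and the first result is returned. ListQueryReader, its toy "id=<identifier>" query syntax and the in-memory catalogue are hypothetical, and the real mixin's error handling may differ.

```python
from typing import Dict, List


class NotFoundError(Exception):
    """Stand-in for edpop_explorer.NotFoundError."""


class ListQueryReader:
    """Hypothetical reader whose API only supports list queries."""

    # Toy in-memory catalogue: identifier -> title.
    CATALOGUE: Dict[str, str] = {"rec1": "Title A", "rec2": "Title B"}

    def __init__(self) -> None:
        self.prepared_query = ""
        self.records: List[str] = []

    def set_query(self, query: str) -> None:
        self.prepared_query = query

    def fetch(self) -> None:
        # Toy query syntax: "id=<identifier>" selects a single identifier.
        key, _, value = self.prepared_query.partition("=")
        match = key == "id" and value in self.CATALOGUE
        self.records = [self.CATALOGUE[value]] if match else []

    @classmethod
    def _prepare_get_by_id_query(cls, identifier: str) -> str:
        # The hook a mixin user overrides: express "this identifier" as a list query.
        return f"id={identifier}"

    @classmethod
    def get_by_id(cls, identifier: str) -> str:
        reader = cls()
        reader.set_query(cls._prepare_get_by_id_query(identifier))
        reader.fetch()
        if not reader.records:
            raise NotFoundError(f"no record with identifier {identifier!r}")
        return reader.records[0]
```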

classmethod get_by_id(identifier: str) → Record

class edpop_explorer.LazyRecordMixin

Bases: ABC

Abstract mixin that adds an interface for lazy loading to a Record.

To use, implement the fetch() method and make sure that it fills the record’s data attributes and its Fields and that the fetched attribute is set to True.
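A self-contained sketch of a lazy record that follows this contract. LazyRecordSketch and LazyPersonRecord are hypothetical stand-ins, and the lookup table replaces what would normally be a request to an external source.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class LazyRecordSketch(ABC):
    """Stand-in for LazyRecordMixin: a fetch() hook plus a fetched flag."""

    fetched: bool = False

    @abstractmethod
    def fetch(self) -> None:
        """Load the record's data and fields, then set fetched to True."""


class LazyPersonRecord(LazyRecordSketch):
    """Hypothetical lazy biographical record."""

    # Toy data source standing in for an external API.
    SOURCE: Dict[str, Dict[str, str]] = {"p1": {"name": "Printer A"}}

    def __init__(self, identifier: str) -> None:
        self.identifier = identifier
        self.data: Optional[Dict[str, str]] = None
        self.name: Optional[str] = None  # would be a Field in a real record

    def fetch(self) -> None:
        if self.fetched:
            return  # already loaded; skip a second lookup
        self.data = self.SOURCE[self.identifier]
        self.name = self.data["name"]
        self.fetched = True
```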

abstractmethod fetch() → None
fetched: bool = False

class edpop_explorer.LocationField(original_text: str)

Bases: Field

COUNTRY = rdflib.term.URIRef('https://dhstatic.hum.uu.nl/edpop-records/0.1.0/country')
LOCALITY = rdflib.term.URIRef('https://dhstatic.hum.uu.nl/edpop-records/0.1.0/locality')
location_type: URIRef | None = None

class edpop_explorer.Marc21BibliographicalReaderMixin

Bases: Reader, ABC

class edpop_explorer.Marc21BibliographicalRecord(from_reader: Type[Reader])

Bases: Marc21DataMixin, BibliographicalRecord

A combination of BibliographicalRecord and Marc21DataMixin.

class edpop_explorer.Marc21Data(fields: List[Marc21Field] = <factory>, controlfields: Dict[str, str]=<factory>, picafields: List[Marc21Field] = <factory>, raw: dict = <factory>)

Bases: RawData

Python representation of the data inside a Marc21 record.

controlfields: Dict[str, str]
fields: List[Marc21Field]
get_all_subfields(fieldnumber: str, subfield: str, picaxml=False) → List[str]

Return a list of subfields that match the requested field number and subfield. May return an empty list if the field and subfield do not occur.

get_fields(fieldnumber: str, picaxml=False) → List[Marc21Field]

Return a list of fields with a given field number. May return an empty list if the field does not occur.

get_first_field(fieldnumber: str, picaxml=False) → Marc21Field | None

Return the first occurrence of a field with a given field number. May be useful for fields that appear only once, such as 245. Return None if the field is not found.

get_first_subfield(fieldnumber: str, subfield: str | tuple[str], picaxml=False) → str | None

Return the requested subfield of the first occurrence of a field with the given field number. Return None if the field is not found or if the subfield is not present on the first occurrence of the field. subfield may be a tuple; in that case, a concatenation of all given subfields is returned.

picafields: List[Marc21Field]
raw: dict
to_dict() → dict

Give a dict representation of the raw data.
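The lookup semantics of the Marc21Data accessors can be illustrated with simplified stand-in dataclasses (no PICA+ handling). The separator used when get_first_subfield() concatenates several subfields is not documented here; this sketch assumes a single space.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class MarcField:
    """Simplified stand-in for Marc21Field."""
    fieldnumber: str
    subfields: Dict[str, str] = field(default_factory=dict)


@dataclass
class MarcData:
    """Simplified stand-in for Marc21Data (no PICA+ support)."""
    fields: List[MarcField] = field(default_factory=list)

    def get_fields(self, fieldnumber: str) -> List[MarcField]:
        return [f for f in self.fields if f.fieldnumber == fieldnumber]

    def get_first_field(self, fieldnumber: str) -> Optional[MarcField]:
        matches = self.get_fields(fieldnumber)
        return matches[0] if matches else None

    def get_all_subfields(self, fieldnumber: str, subfield: str) -> List[str]:
        return [f.subfields[subfield] for f in self.get_fields(fieldnumber)
                if subfield in f.subfields]

    def get_first_subfield(self, fieldnumber: str,
                           subfield: Union[str, Tuple[str, ...]]) -> Optional[str]:
        first = self.get_first_field(fieldnumber)
        if first is None:
            return None
        codes = (subfield,) if isinstance(subfield, str) else subfield
        parts = [first.subfields[c] for c in codes if c in first.subfields]
        # Assumption: concatenated parts are joined with a single space.
        return " ".join(parts) if parts else None
```

For example, with a 245 (title statement) field holding subfields a and b, get_first_subfield("245", ("a", "b")) combines both into one string.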

class edpop_explorer.Marc21DataMixin

Bases: object

A mixin that adds a data attribute to a Record class to contain an instance of Marc21Data.

data: Marc21Data | None = None
show_record() → str

class edpop_explorer.Marc21Field(fieldnumber: str, indicator1: str | None = None, indicator2: str | None = None, subfields: Dict[str, str]=<factory>, description: str | None = None)

Bases: object

Python representation of a single field in a Marc21 record.

description: str | None = None
fieldnumber: str
indicator1: str | None = None
indicator2: str | None = None
subfields: Dict[str, str]

exception edpop_explorer.NotFoundError

Bases: ReaderError

class edpop_explorer.RawData

Bases: ABC

Base class to store raw original data of a record. Only defines an abstract method to_dict.

abstractmethod to_dict() → dict

Give a dict representation of the raw data.

class edpop_explorer.Reader

Bases: ABC

Base reader class (abstract).

This abstract base class provides a common interface for all readers. To use, instantiate a subclass, set a query using the prepare_query() or set_query() method, and call fetch() repeatedly until you have the number of results that you want. The attributes number_of_results, number_fetched and records will be updated after fetching.

To create a concrete reader, make a subclass that implements the fetch_range() and transform_query() methods and set the READERTYPE and CATALOG_URIREF attributes. fetch_range() should populate the records, number_of_results, number_fetched and range_fetched attributes.
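To illustrate the fetching contract, here is a toy reader over a Python list. It is reconstructed from the documentation only and does not subclass the real Reader. It assumes that records are fetched contiguously from index 0 (adjust_start_record() is not modelled) and that get() raises IndexError when a record is unavailable; both are assumptions.

```python
from typing import Dict, List, Optional


class InMemoryReader:
    """Toy reader mimicking the documented Reader contract over a list."""

    DEFAULT_RECORDS_PER_PAGE: int = 10
    MAXIMUM_RECORDS_PER_PAGE: Optional[int] = None

    def __init__(self, source: List[str]) -> None:
        self.source = source  # stands in for a remote catalogue
        self.records: Dict[int, str] = {}
        self.number_of_results: Optional[int] = None

    @property
    def number_fetched(self) -> int:
        return len(self.records)

    @property
    def fetching_exhausted(self) -> bool:
        return (self.number_of_results is not None
                and self.number_fetched >= self.number_of_results)

    def fetch_range(self, range_to_fetch: range) -> range:
        # Fetch only indexes that exist; the step of the range is ignored.
        stop = max(min(range_to_fetch.stop, len(self.source)), range_to_fetch.start)
        fetched = range(range_to_fetch.start, stop)
        for i in fetched:
            self.records[i] = self.source[i]
        self.number_of_results = len(self.source)
        return fetched

    def fetch(self, number: Optional[int] = None) -> range:
        # Next page: DEFAULT_RECORDS_PER_PAGE unless given, capped by the maximum.
        number = number or self.DEFAULT_RECORDS_PER_PAGE
        if self.MAXIMUM_RECORDS_PER_PAGE is not None:
            number = min(number, self.MAXIMUM_RECORDS_PER_PAGE)
        start = self.number_fetched
        return self.fetch_range(range(start, start + number))

    def get(self, index: int, allow_fetching: bool = True) -> str:
        # Keep fetching pages until the record is available or results run out.
        while index not in self.records:
            if not allow_fetching or self.fetching_exhausted:
                raise IndexError(index)
            self.fetch()
        return self.records[index]
```

With 25 source records, a first fetch() returns range(0, 10), and a following fetch(20) returns range(10, 25) because only existing records are fetched.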

ALLOW_EMPTY_QUERY = False

If True, it is possible to enter an empty query to fetch all records.

CATALOG_URIREF: URIRef | None = None
DEFAULT_RECORDS_PER_PAGE: int = 10

The number of records to fetch at a time using the fetch() method if not specified by the user.

DESCRIPTION: str | None = None

Information about the contents of the corresponding catalogue, to be used in user interfaces.

FETCH_ALL_AT_ONCE = False

True if the reader is configured to fetch all records at once, even if the user only needs a subset.

IRI_PREFIX: str | None = None

The prefix to use to create an IRI out of a record identifier. If an IRI cannot be created with a simple prefix, the identifier_to_iri and iri_to_identifier methods have to be overridden.
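When a simple prefix suffices, the two conversion methods reduce to string concatenation and prefix stripping. The prefix below is hypothetical, and raising ValueError for an IRI with a different prefix is an assumption about how a mismatch might be handled. Identifiers containing characters that are not allowed in IRIs would need escaping, which this sketch does not do.

```python
class PrefixIriSketch:
    """Hypothetical reader fragment using a plain IRI prefix."""

    IRI_PREFIX = "https://example.org/records/"  # hypothetical prefix

    @classmethod
    def identifier_to_iri(cls, identifier: str) -> str:
        return cls.IRI_PREFIX + identifier

    @classmethod
    def iri_to_identifier(cls, iri: str) -> str:
        # Assumption: reject IRIs that do not belong to this reader.
        if not iri.startswith(cls.IRI_PREFIX):
            raise ValueError(f"{iri!r} does not start with {cls.IRI_PREFIX!r}")
        return iri[len(cls.IRI_PREFIX):]
```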

MAXIMUM_RECORDS_PER_PAGE: int | None = None

Maximum number of records to fetch per page. If not set, there is no predefined limit.

READERTYPE: str | None = None

The type of the reader: either BIOGRAPHICAL or BIBLIOGRAPHICAL (defined in the edpop_explorer package).

SHORT_NAME: str | None = None

Short name of the corresponding catalogue, to be used in user interfaces.

adjust_start_record(start_number: int) → None

Skip the given number of first records and start fetching afterwards.

This functionality may be ignored by readers that can only load all records at once; generally these are readers that return lazy records.

classmethod catalog_to_graph() → Graph

Create an RDF representation of the catalog that this reader supports as an instance of EDPOPREC:Catalog.

fetch(number: int | None = None) → range

Perform an initial or subsequent query. Most readers fetch a limited number of records at once – this number depends on the reader but it may be adjusted using the number parameter. Other readers fetch all records at once and ignore the number parameter. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. Returns the range of record indexes that has been fetched.

abstractmethod fetch_range(range_to_fetch: range) → range

Fetch a specific range of records. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. If not all records of the specified range exist, only the records that exist will be fetched.

Parameters:

range_to_fetch – The range of records to fetch. Ranges with a step other than 1 are not supported; the step may be ignored.

Returns:

The range of record indexes that has actually been fetched.

property fetching_exhausted: bool

Return True if all results have been fetched.

property fetching_started: bool

True if fetching has started, otherwise False. Once fetching has started, the query can no longer be changed.

generate_identifier() → str

Generate an identifier for this reader that is unique for the combination of reader type and prepared query. This identifier can be used when the reader has to be reused across sessions by pickling and unpickling.

Note: while the identifier is guaranteed to be unique, it is not guaranteed to be stable: generating an identifier again for the same combination of reader type and prepared query may give a different result.

get(index: int, allow_fetching: bool = True) → Record

Get a record with a specific index. If the record is not yet available, fetch additional records to make it available.

Parameters:
  • index – The number of the record to get.

  • allow_fetching – Allow fetching the record from an external source if it was not yet fetched.

abstractmethod classmethod get_by_id(identifier: str) → Record

Get a single record by its identifier.

classmethod get_by_iri(iri: str) → Record

Get a single record by its IRI.

classmethod get_catalog_slug() → str | None
classmethod identifier_to_iri(identifier: str) → str
classmethod iri_to_identifier(iri: str) → str
property number_fetched: int

The number of results that have been fetched so far, or 0 if no fetch has been performed yet.

number_of_results: int | None = None

The total number of results for the query, or None if fetching has not yet started and the number is not yet known.

prepare_query(query: str) → None

Prepare a query for use by the reader’s API. Updates the prepared_query attribute.

prepared_query: str | BasePreparedQuery | None = None

A transformed version of the query, available after calling prepare_query() or set_query().

records: Dict[int, Record]

The records that have been fetched as instances of (a subclass of) Record.

set_query(query: str | BasePreparedQuery) → None

Set an exact query. Updates the prepared_query attribute.

abstractmethod classmethod transform_query(query: str) → str | BasePreparedQuery

Return a version of the query that is prepared for use in the API.

This method does not have to be called directly; instead prepare_query() can be used.
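The relationship between prepare_query(), set_query() and transform_query() can be sketched as follows. QuerySketchReader and YearRangeQuery are hypothetical, and the CQL-like "anywhere" index is made up. fetching_started is a plain attribute here rather than a property, and raising RuntimeError when the query is changed after fetching has started is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class YearRangeQuery:
    """Hypothetical structured query in the role of a BasePreparedQuery subclass."""
    terms: str
    year_from: Optional[int] = None


class QuerySketchReader:
    """Hypothetical reader showing how queries are prepared and set."""

    def __init__(self) -> None:
        self.prepared_query: Union[str, YearRangeQuery, None] = None
        self.fetching_started = False  # a property in the real Reader

    @classmethod
    def transform_query(cls, query: str) -> str:
        # Hypothetical transformation into a CQL-like full-text query.
        return f'anywhere="{query}"'

    def prepare_query(self, query: str) -> None:
        # Transform the user's query, then store it as the exact query.
        self.set_query(self.transform_query(query))

    def set_query(self, query: Union[str, YearRangeQuery]) -> None:
        if self.fetching_started:
            # Assumption: the real Reader may signal this differently.
            raise RuntimeError("the query cannot be changed once fetching has started")
        self.prepared_query = query
```

Use prepare_query() for free text from a user, and set_query() when the exact API query (or a structured query object) is already known.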

exception edpop_explorer.ReaderError

Bases: Exception

Generic exception for failures in Reader class. More specific errors derive from this class.

class edpop_explorer.Record(from_reader: Type[Reader])

Bases: object

Python representation of edpoprec:Record.

This base class provides some basic attributes, an infrastructure to define fields and a method to convert the record to RDF. While this is a non-abstract base class, no fields are defined here; these should be added in subclasses. Record and its subclasses should be created by calling the constructor with the Reader class as the from_reader parameter and by setting the data, link, identifier and subject_node attributes (all are optional but recommended), as well as the fields that are defined by the subclass. Fields are set using the attribute with the same name: set them to an instance of Field or to None. The basic attributes and the fields are None by default.

Subclasses should override the _rdf_class attribute with the corresponding RDF class. They should define additional fields by adding public attributes defaulting to None and registering them in the _fields attribute. To register them, define a constructor __init__ that first calls the parent’s constructor and then adds tuples of the form ('<attribute-name>', EDPOPREC.<rdf-property-name>, <Field class name>) to _fields.
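A self-contained sketch of the field registration pattern for records, using stand-in classes instead of the real Record and Field. The namespace IRI is taken from the LocationField constants in this document; the property IRI built from it and the filled_fields() helper are hypothetical.

```python
from typing import List, Optional, Tuple, Type

EDPOPREC = "https://dhstatic.hum.uu.nl/edpop-records/0.1.0/"


class FieldStub:
    """Minimal stand-in for edpop_explorer.Field."""

    def __init__(self, original_text: str) -> None:
        self.original_text = original_text


class RecordSketch:
    """Minimal stand-in for edpop_explorer.Record."""

    def __init__(self, from_reader: type) -> None:
        self.from_reader = from_reader
        self.identifier: Optional[str] = None
        self._fields: List[Tuple[str, str, Type[FieldStub]]] = []

    def filled_fields(self) -> dict:
        # Hypothetical helper: collect the registered fields that are set.
        return {name: getattr(self, name) for name, _, _ in self._fields
                if getattr(self, name) is not None}


class PamphletRecord(RecordSketch):
    """Hypothetical record subclass with one registered field."""

    def __init__(self, from_reader: type) -> None:
        super().__init__(from_reader)  # parent constructor first
        self.title: Optional[FieldStub] = None  # public attribute, None by default
        self._fields.append(("title", EDPOPREC + "title", FieldStub))
```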

data: None | dict | RawData = None

The raw original data of a record.

fetch() → None

Fetch the full contents of the record if this record works with lazy loading (i.e., if the record’s class derives from LazyRecordMixin). If the record is not lazy, this method does nothing.

from_reader: Type[Reader]

The Reader class from which this record was created, as passed to the constructor.

get_data_dict() → dict | None

Convenience function to get the record’s raw data as a dict, or None if it is not available.

identifier: str | None = None

Unique identifier used by the source catalog.

property iri: str | None

A stable IRI based on the identifier attribute. None if the identifier attribute is not set.

link: str | None = None

A user-friendly link where the user can find the record.

property subject_node: Node

A subject node based on the identifier attribute. If the identifier attribute is not set, a blank node.

to_graph() → Graph

Return an RDF graph for this record.

exception edpop_explorer.RecordError

Bases: Exception

class edpop_explorer.SRUMarc21BibliographicalReader

Bases: SRUMarc21Reader, Marc21BibliographicalReaderMixin, ABC

Subclass of SRUMarc21Reader that adds functionality to create instances of BibliographicalRecord.

This subclass assumes that the Marc21 data is according to the standard format of Marc21 for bibliographical data. See: https://www.loc.gov/marc/bibliographic/

READERTYPE: str | None = 'bibliographical'

The type of the reader: either BIOGRAPHICAL or BIBLIOGRAPHICAL (defined in the edpop_explorer package).

records: List[Marc21BibliographicalRecord]

The records that have been fetched as instances of (a subclass of) Record.

class edpop_explorer.SRUMarc21Reader

Bases: SRUReader

Subclass of SRUReader that adds Marc21 functionality.

This class is still abstract; to create concrete readers, implement the _get_link(), _get_identifier() and _convert_record() methods.

abstractmethod classmethod _convert_record(sruthirecord: dict) → Record

Convert the output of sruthi into an instance of (a subclass of) Record.

abstractmethod classmethod _get_link(data: Marc21Data) → str | None

Get a public URL according to the Marc21 data, or None if it is not available.

abstractmethod classmethod _get_identifier(data: Marc21Data) → str | None

Get the unique identifier from the Marc21 data or None if it is not available.
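A sketch of the two identifier and link hooks, using a simplified stand-in for Marc21Data that only holds control fields. Reading the identifier from control field 001 follows the MARC 21 convention for the control number, but which field a given catalogue actually uses is an assumption, as is the link template.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Marc21DataStub:
    """Simplified stand-in for Marc21Data: control fields only."""
    controlfields: Dict[str, str] = field(default_factory=dict)


class ExampleCatalogueReader:
    """Hypothetical reader fragment implementing the two hooks."""

    LINK_TEMPLATE = "https://catalogue.example.org/record/{}"  # hypothetical URL

    @classmethod
    def _get_identifier(cls, data: Marc21DataStub) -> Optional[str]:
        # MARC 21 control field 001 holds the record's control number.
        return data.controlfields.get("001")

    @classmethod
    def _get_link(cls, data: Marc21DataStub) -> Optional[str]:
        # Build the public URL from the identifier, if there is one.
        identifier = cls._get_identifier(data)
        return cls.LINK_TEMPLATE.format(identifier) if identifier else None
```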

marcxchange_prefix: str = ''
picaxml_prefix: str = 'info:srw/schema/5/picaXML-v1.0:'

class edpop_explorer.SRUReader

Bases: GetByIdBasedOnQueryMixin, Reader

Subclass of Reader that adds basic SRU functionality using the sruthi library.

This class is still abstract and subclasses should implement the transform_query() and _convert_record() methods and set the attributes sru_url and sru_version.

The _prepare_get_by_id_query() method by default returns the transformed version of the identifier as a query, which normally works, but this may be optimised by overriding it.
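The documented default of _prepare_get_by_id_query() and a possible override can be sketched like this. Both classes are hypothetical stand-ins, and the dc.identifier index is an assumption: which CQL indexes are available depends on the SRU server.

```python
class SruReaderSketch:
    """Hypothetical stand-in for an SRUReader subclass."""

    @classmethod
    def transform_query(cls, query: str) -> str:
        # Hypothetical: pass the text through unchanged as a CQL term.
        return query

    @classmethod
    def _prepare_get_by_id_query(cls, identifier: str) -> str:
        # Documented default: the transformed identifier is used as the query.
        return cls.transform_query(identifier)


class IdIndexReader(SruReaderSketch):
    """Hypothetical override that targets an identifier index directly."""

    @classmethod
    def _prepare_get_by_id_query(cls, identifier: str) -> str:
        # Assumption: the server supports the dc.identifier index.
        return f'dc.identifier="{identifier}"'
```

A targeted index query avoids matching the identifier in unrelated fields, which is the kind of optimisation the text refers to.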

abstractmethod classmethod _convert_record(sruthirecord: dict) → Record

Convert the output of sruthi into an instance of (a subclass of) Record.

fetch_range(range_to_fetch: range) → range

Fetch a specific range of records. After fetching, the records are available in the records attribute and the number_of_results attribute will be available. If not all records of the specified range exist, only the records that exist will be fetched.

Parameters:

range_to_fetch – The range of records to fetch. Ranges with a step other than 1 are not supported; the step may be ignored.

Returns:

The range of record indexes that has actually been fetched.

prepare_query(query) → None

Prepare a query for use by the reader’s API. Updates the prepared_query attribute.

query: str | None = None
session: Session

The Session object of the requests library.

sru_additional_schema: str | None = None

Additional SRU schemas for which an additional request is made. The results are merged in the SRU results. If None (default), do not make an additional request.

sru_schema: str | None = None

The requested SRU schema. If None (default), use the default schema of the SRU provider.

sru_url: str

URL of the SRU API.

sru_version: str

Version of the SRU protocol. Can be ‘1.1’ or ‘1.2’.

abstractmethod classmethod transform_query(query: str) → str

Return a version of the query that is prepared for use in the API.

This method does not have to be called directly; instead prepare_query() can be used.

edpop_explorer.bind_common_namespaces(graph: Graph) → None

Bind the RDF namespaces that are in use across this package to the specified graph.

These are: RDF, RDFS, EDPOPREC.