Maniphest T183145

Refresh SWAP notebook hardware
Closed, Resolved
Public
5 Estimated Story Points


The SWAP jupyter notebook hardware is old and OOW, and we need to replace it.

Perhaps along the way we should update Jupyter too? And/or consider ?

Event Timeline

Ottomata added a subtask: Unknown Object (Task). Dec 18 2017, 2:25 PM
Ottomata renamed this task from Refresh SWAP hardware to Refresh SWAP notebook hardware. Feb 28 2018, 4:51 PM
Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board.
Ottomata set the point value for this task to 5.

Change 419251 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/wheels/paws-internal@master] Update wheels for Debian Stretch

Change 419251 merged by Ottomata:
[operations/wheels/paws-internal@master] Update wheels for Debian Stretch

Change 419260 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/wheels/paws-internal@master] Update jupyterhub to 0.8.1 to work with newer singleuserauthenticator

Change 419260 merged by Ottomata:
[operations/wheels/paws-internal@master] Update jupyterhub to 0.8.1 to work with newer singleuserauthenticator

Change 419507 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Scripts to build jupyterhub based SWAP

Change 419507 merged by Ottomata:
[analytics/swap/deploy@master] Scripts to build jupyterhub based SWAP

Change 419509 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Add artifacts for initial build of swap

Change 419509 merged by Ottomata:
[analytics/swap/deploy@master] Add artifacts for initial build of swap

Change 419510 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Rename wheels_dir -> wheels

Change 419510 merged by Ottomata:
[analytics/swap/deploy@master] Rename wheels_dir -> wheels

Change 419656 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] [WIP] Puppetization for newer SWAP (JupyterHub) deployed via scap

Change 419821 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] now takes the destination venv path as $1

Change 419821 merged by Ottomata:
[analytics/swap/deploy@master] now takes the destination venv path as $1

Change 419656 merged by Ottomata:
[operations/puppet@production] Puppetization for newer SWAP (JupyterHub)

Change 419835 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use venv instead of jupyter-venv for user venv dirs

Change 419835 merged by Ottomata:
[operations/puppet@production] Use venv instead of jupyter-venv for user venv dirs

I have rsynced over user home directories from notebook1001 -> notebook1003, and am upgrading the default notebook venv ($HOME/venv) by:

# $wheels_path must already point at the directory of prebuilt wheels
for u in $(ls /home); do
    venv=/home/$u/venv
    if [ -d "$venv" ]; then
        echo "Upgrading $venv"
        sudo -u $u python3 -m venv --upgrade "$venv"
        sudo -u $u $venv/bin/pip install --upgrade --no-index --find-links=$wheels_path jupyterhub jupyter jupyterlab
    fi
done

Will this work? ¯\_(ツ)_/¯


Updated JupyterHub with JupyterLab beta installed on notebook1003 and notebook1004. notebook1003 home directories have been copied over.

WOoO let's try this out and test it. FYI: SPARK WORKS TOO!

Change 421298 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/jupyterhub/deploy@master] Update wheels with pyhive and impyla for default Hive access in prod

Change 421298 merged by Ottomata:
[analytics/jupyterhub/deploy@master] Update wheels with pyhive and impyla for default Hive access in prod

Change 421306 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install python3 statistics packages; configure user venvs with packages in puppet

Change 421306 merged by Ottomata:
[operations/puppet@production] Install python3 packages; configure user venvs from requirements

Change 421320 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix typo in jupyterhub config

Change 421320 merged by Ottomata:
[operations/puppet@production] Fix typo in jupyterhub config

Change 421353 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Allow user venv to use system-site-packages

Change 421353 merged by Ottomata:
[operations/puppet@production] Allow user venv to use system-site-packages

Ah, ok, a better user venv upgrade is:

for u in $(getent passwd | awk -F ':' '{print $1}'); do
    venv=/home/$u/venv
    if [ -d "$venv" ]; then
        echo "Upgrading $venv"
        sudo -u $u python3 -m venv --upgrade "$venv"
        # change include-system-site-packages to true
        test -f $venv/pyvenv.cfg && sed -i 's@include-system-site-packages = false@include-system-site-packages = true@' $venv/pyvenv.cfg
        sudo -u $u $venv/bin/pip install --upgrade --no-index --ignore-installed --find-links=$wheels_path --requirement=/srv/jupyterhub/deploy/frozen-requirements.txt
    fi
done

I've run this on all user venvs on notebook1003 and notebook1004.
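For anyone who prefers to check the pyvenv.cfg change before touching a real venv, here is a minimal Python sketch of the same substitution the sed line performs. The function name and the sample file contents are made up for illustration; only the include-system-site-packages key comes from the loop above.

```python
import re


def enable_system_site_packages(cfg_text):
    """Return pyvenv.cfg text with include-system-site-packages set to true.

    Hypothetical helper: it mirrors the sed substitution above and leaves
    every other line untouched.
    """
    return re.sub(
        r"^(include-system-site-packages[ \t]*=[ \t]*)false[ \t]*$",
        r"\1true",
        cfg_text,
        flags=re.MULTILINE,
    )


# Sample contents are an assumption, not copied from a real user venv.
sample = (
    "home = /usr/bin\n"
    "include-system-site-packages = false\n"
    "version = 3.5.3\n"
)
updated = enable_system_site_packages(sample)
print(updated)
```

Running the function twice gives the same result as running it once, so it is safe to re-apply to a venv that was already converted.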

Email sent (Subject: 'New SWAP (Jupyter Notebook) servers and updates!'). Timeline for notebook1001 deprecation: Monday April 2nd.

Small note for the record: I'm getting "Warning: JupyterHub seems to be served over an unsecured HTTP connection. We strongly recommend enabling HTTPS for JupyterHub" at the login screen. I guess that's rather inconsequential, considering that this goes through an SSH tunnel anyway, but I don't recall seeing the same message on notebook1001. Perhaps it's just a change in the new Jupyter version?

Yeah, if this wasn't happening before, it is almost certainly due to the JupyterHub version upgrade. Should be fine since it goes through ssh.

I have been using impyla on notebook1001 to run Hive queries, but this no longer works on notebook1003. Any ideas what might be wrong? See error message below (these two lines work without problem on notebook1001).

from impala.dbapi import connect

hive_conn = connect(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')
AttributeError                            Traceback (most recent call last)
<ipython-input-3-bb76209539e0> in <module>()
----> 1 hive_conn = connect(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')

~/venv/lib/python3.5/site-packages/impala/ in connect(host, port, database, timeout, use_ssl, ca_cert, auth_mechanism, user, password, kerberos_service_name, use_ldap, ldap_user, ldap_password, use_kerberos, protocol)
    145                           ca_cert=ca_cert, user=user, password=password,
    146                           kerberos_service_name=kerberos_service_name,
--> 147                           auth_mechanism=auth_mechanism)
    148     return hs2.HiveServer2Connection(service, default_db=database)

~/venv/lib/python3.5/site-packages/impala/ in connect(host, port, timeout, use_ssl, ca_cert, user, password, kerberos_service_name, auth_mechanism)
    756     transport = get_transport(sock, host, kerberos_service_name,
    757                               auth_mechanism, user, password)
--> 758
    759     protocol = TBinaryProtocol(transport)
    760     if six.PY2:

~/venv/lib/python3.5/site-packages/thrift_sasl/ in open(self)
     66   def open(self):
---> 67     if not self._trans.isOpen():

AttributeError: 'TSocket' object has no attribute 'isOpen'

This might help:

Hm, in the meantime, I’ve also installed pyhive, which I think has a
similar interface.

Try that?

I am not sure the format is compatible with impyla (e.g. is the cursor.description part mandatory, i.e. would it need to be added every time when swapping out impyla for pyhive in an existing notebook?).
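On the cursor.description question: both impyla and pyhive describe themselves as DB-API 2.0 (PEP 249) interfaces. Assuming that holds, description is a read-only attribute that carries column metadata after execute(), and reading it is optional. The sketch below uses the standard-library sqlite3 module as a stand-in, since it follows the same specification; the table and row are invented for the example, and nothing here talks to Hive.

```python
import sqlite3

# sqlite3 stands in for a Hive connection; only the DB-API calls matter here.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE pageview_hourly (page_title TEXT)")
cursor.execute("INSERT INTO pageview_hourly VALUES ('Main_Page')")

# The query works without ever touching cursor.description.
cursor.execute("SELECT page_title FROM pageview_hourly LIMIT 10")
rows = cursor.fetchall()

# description is only needed when you want column names or types.
columns = [col[0] for col in cursor.description]

print(columns, rows)
conn.close()
```

If that assumption is right, swapping impyla for pyhive in an existing notebook should mostly come down to changing the import and the connect() call, though driver-specific keyword arguments such as auth_mechanism would still differ.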

But in any case I can't get pyhive to work either right now. The example code from fails as follows (in a fresh notebook on notebook1003):

In [1]:
from pyhive import hive
cursor = hive.connect('analytics1003.eqiad.wmnet', 10000).cursor()
cursor.execute('SELECT page_title FROM wmf.pageview_hourly WHERE year=2017 and month=1 and day=1 and hour=0 LIMIT 10')
cursor.description
[('page_title', 'STRING_TYPE', None, None, None, None, True)]

AttributeError                            Traceback (most recent call last)
<ipython-input-1-3b4d63bb34fe> in <module>()
      1 from pyhive import hive
----> 2 cursor = hive.connect('analytics1003.eqiad.wmnet', 10000).cursor()
      3 cursor.execute('SELECT page_title FROM wmf.pageview_hourly WHERE year=2017 and month=1 and day=1 and hour=0 LIMIT 10')
      4 cursor.description
      5 [('page_title', 'STRING_TYPE', None, None, None, None, True)]

~/venv/lib/python3.5/site-packages/pyhive/ in connect(*args, **kwargs)
     62     :returns: a :py:class:`Connection` object.
     63     """
---> 64     return Connection(*args, **kwargs)

~/venv/lib/python3.5/site-packages/pyhive/ in __init__(self, host, port, username, database, auth, configuration, kerberos_service_name, password, thrift_transport)
    166                 username=username,
    167             )
--> 168             response = self._client.OpenSession(open_session_req)
    169             _check_status(response)
    170             assert response.sessionHandle is not None, "Expected a session from OpenSession"

~/venv/lib/python3.5/site-packages/TCLIService/ in OpenSession(self, req)
    185         """
    186         self.send_OpenSession(req)
--> 187         return self.recv_OpenSession()
    189     def send_OpenSession(self, req):

~/venv/lib/python3.5/site-packages/TCLIService/ in recv_OpenSession(self)
    197     def recv_OpenSession(self):
    198         iprot = self._iprot
--> 199         (fname, mtype, rseqid) = iprot.readMessageBegin()
    200         if mtype == TMessageType.EXCEPTION:
    201             x = TApplicationException()

~/venv/lib/python3.5/site-packages/thrift/protocol/ in readMessageBegin(self)
    133     def readMessageBegin(self):
--> 134         sz = self.readI32()
    135         if sz < 0:
    136             version = sz & TBinaryProtocol.VERSION_MASK

~/venv/lib/python3.5/site-packages/thrift/protocol/ in readI32(self)
    216     def readI32(self):
--> 217         buff = self.trans.readAll(4)
    218         val, = unpack('!i', buff)
    219         return val

AttributeError: 'TSaslClientTransport' object has no attribute 'readAll'

As for someone else on that ticket, it wasn't quite clear to me which exact versions (and pip commands) to use for that workaround. But after adapting the below from (on a similar-sounding topic), impyla appears to work for me now, albeit in an outdated version:

# cf.
!pip uninstall -y thrift
!pip uninstall -y impyla
!pip install thrift==0.9.3
!pip install impyla==0.13.8

Hmm, I'm having the same problem as @Tbayer, but that workaround isn't working for me.

> !pip show impyla
Name: impyla
Version: 0.13.8
> !pip show thrift
Name: thrift
Version: 0.9.3
> from impala.dbapi import connect
> impala_conn(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')
AttributeError                            Traceback (most recent call last)
<ipython-input-16-805f90863b9a> in <module>()
      1 from impala.dbapi import connect
----> 2 impala_conn(host='analytics1003.eqiad.wmnet', port=10000, auth_mechanism='PLAIN')

~/venv/lib/python3.5/site-packages/impala/ in connect(host, port, database, timeout, use_ssl, ca_cert, auth_mechanism, user, password, kerberos_service_name, use_ldap, ldap_user, ldap_password, use_kerberos, protocol)
    145                           ca_cert=ca_cert, user=user, password=password,
    146                           kerberos_service_name=kerberos_service_name,
--> 147                           auth_mechanism=auth_mechanism)
    148     return hs2.HiveServer2Connection(service, default_db=database)

~/venv/lib/python3.5/site-packages/impala/ in connect(host, port, timeout, use_ssl, ca_cert, user, password, kerberos_service_name, auth_mechanism)
    656     transport = get_transport(sock, host, kerberos_service_name,
    657                               auth_mechanism, user, password)
--> 658
    659     protocol = TBinaryProtocol(transport)
    660     if six.PY2:

~/venv/lib/python3.5/site-packages/thrift_sasl/ in open(self)
     66   def open(self):
---> 67     if not self._trans.isOpen():

AttributeError: 'TSocket' object has no attribute 'isOpen'

But in any case I can't get pyhive to work either right now

Hm, pyhive seems to work just fine for me:

from pyhive import hive
cursor = hive.connect('analytics1003.eqiad.wmnet', 10000).cursor()
cursor.execute('SELECT page_title FROM wmf.pageview_hourly WHERE year=2017 and month=1 and day=1 and hour=0 LIMIT 10')

I had the same errors you guys saw with impyla. After parsing a few of those tickets you linked to, this invocation seems to work:

pip uninstall -y impyla thriftpy thrift_sasl sasl thrift
pip install thriftpy==0.3.9 thrift-sasl==0.2.1 sasl==0.2.1 six bit_array impyla

This gets you the latest (0.14.1) impyla with versions of thrift-sasl, sasl, and thriftpy that should be compatible (not thrift, python3 wants thriftpy). I think impyla upstream needs to figure out their dependencies; pip install impyla should just work.
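Since the failures above depended on which thrift-related packages were actually present in the venv, a quick check from inside a notebook can save some guessing. This is a rough sketch, not something from the deploy repo: installed_versions is a hypothetical helper, and it falls back to pkg_resources because importlib.metadata only exists on Python 3.8 and later.

```python
def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    try:
        from importlib import metadata  # Python 3.8+
    except ImportError:
        metadata = None  # older interpreters, e.g. the 3.5 venvs in this thread

    versions = {}
    for name in names:
        if metadata is not None:
            try:
                versions[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                versions[name] = None
        else:
            import pkg_resources  # shipped with setuptools
            try:
                versions[name] = pkg_resources.get_distribution(name).version
            except pkg_resources.DistributionNotFound:
                versions[name] = None
    return versions


# Package names taken from the pip commands in this thread.
wanted = ["impyla", "thrift", "thriftpy", "thrift-sasl", "sasl"]
for name, version in sorted(installed_versions(wanted).items()):
    print(name, version or "not installed")
```

Comparing this output between a working venv and a broken one is a quicker way to spot a stray thrift install than reading tracebacks.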

Change 425878 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Mark notebook1001 as spare and remove unused paws_internal classes

Change 425878 merged by Ottomata:
[operations/puppet@production] Mark notebook1001 as spare and remove unused paws_internal classes

That seems to work; thank you!

Change 427385 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Remove unused jupyterhub_old module

Change 427385 merged by Ottomata:
[operations/puppet@production] Remove unused jupyterhub_old module

RobH closed subtask Unknown Object (Task) as Resolved. May 31 2018, 4:29 PM

Change 451060 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/swap/deploy@master] Update wheels with pyhive and impyla for default Hive access in prod

Change 451060 abandoned by Ottomata:
Update wheels with pyhive and impyla for default Hive access in prod

wrong repo